


D-Lib Magazine
January 2003

Volume 9 Number 1

ISSN 1082-9873

Open Archives Activities and Experiences in Europe

An Overview by the Open Archives Forum


Susanne Dobratz
Birgit Matthaei
Humboldt-University, Berlin, Germany
Electronic Publishing Group
University Library / Computing Centre
(dobratz | birgit.matthaei



The Open Archives Forum Project

The Open Archives Forum is a two-year accompanying measures project that started in October 2001 under the Information Society Technologies Programme of the European Commission's 5th Framework (IST-2001-320015). Project partners include UKOLN, University of Bath (United Kingdom); Istituto di Scienza e Tecnologie della Informazione, CNR (Italy); and the Computing Center of Humboldt University (Germany). UKOLN is the coordinator of the project.


Figure 1. Logo of the Open Archives Forum. For further information, please visit the Open Archives Forum web site at <>.

The Open Archives Forum is not another OAI implementation project. It is a clustering activity that targets existing open archives communities, as well as new communities, like IST projects or national initiatives planning or initiating open archives. The Open Archives Forum is a dissemination activity that aims to manage an exchange of experiences on open archives in general. The project investigates usage of open archives under different paradigms, and its aims are to make digital repositories more widely available, make them globally accessible, encourage people to share developments, and enable developing countries to obtain access to scientific and cultural heritage information.

The Open Archives Forum project supports established metadata repositories and supports new open archive data providers from communities such as cultural heritage institutions, museums, European digitization projects, research organizations, educational institutions, public libraries, community organizations and publishers as well as the commercial sector.

Benefits of a "Forum"

The Open Archives Forum serves as a tool to reach communities and, in particular, to inform them about what is happening with regard to metadata implementations in Europe. The creation of services such as Open Archives Initiative [1] service providers, aggregator services or other value-added services needs to be supported. The Forum's additional responsibility is to raise awareness of, and spur discussion on, major open archive issues: for example, agreeing on a common terminology for digital repositories, looking into metadata and full-text harvesting models, and analyzing user and community needs and conveying them to potential repository service developers in order to support the building of advanced services.

The primary focus of the Open Archives Forum is to prepare (European) projects for action, so they will develop solutions to potential problems and establish new business models within digital libraries. The Open Archives Forum project both disseminates information and encourages collaborative development of software. It furthermore supports European liaison with the OAI, especially through open exchange of information, and in that way helps build a community of interest.

In summary, the project objective is to create a forum for exchanging experiences about all aspects of the Open Archive approach. This includes workshops, information, and the organizational validation of existing technologies and strategies, as well as the technical validation of tools, software and interfaces.

How the Open Archives Forum works

The Open Archives Forum project is split into several "workpackages" including:

  • Workshops and Dissemination,
  • Organizational Validation, and
  • Technical Validation.

Workshops and Dissemination workpackage

The Open Archives Forum uses the Workshops and Dissemination workpackage to create interest within new communities. Four workshops were planned, and two have already taken place: the first in May 2002 in Pisa, Italy, and the second in December 2002 in Lisbon, Portugal.

By orienting workshops toward different communities, the Open Archives Forum hopes to raise awareness in communities that have not been as active as the e-prints community. These new, targeted communities could learn from the experiences of the e-prints community and of other communities already providing open archives, and they could also reuse existing tools and software. To support this goal, a survey was conducted in preparation for the first workshop. (The results of the survey will be discussed in detail later in this article.)

The December workshop in Lisbon was targeted to traditional libraries and archives. The theme of the workshop was "Providing Access to Hidden Resources", and the workshop goal was to explore whether and under what conditions the open archive approach is strategically relevant and technically viable for these two communities. In various presentations and breakout sessions, requirements, standards, best practices, and solutions to interoperability problems of the traditional archival and library communities were analyzed and compared with the features provided by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The intention was to share experiences in preparing the conditions for a wider availability of the resources now hidden in European libraries and archives. (You can find the detailed workshop program as well as abstracts and presentation slides at <>.)

The third workshop, which will be held in March 2003 in Berlin, Germany, targets communities conducting multimedia projects, as well as cultural heritage institutions and museums. The Open Archives Forum will bring these communities together to discuss interoperability and open access issues. The production of different media types, such as video, audio, images and animation, whether as genuinely digital productions or as digitized images, has reached the stage where the multimedia content needs to be stored and managed within digital libraries. The aim of the Berlin workshop is to discover and explore the specific requirements and demands that need to be addressed before opening the media archives via the Internet.

The Berlin workshop will consist of presentations given by invited speakers and small group breakout sessions where the participants can discuss key issues. A pre-conference tutorial on OAI-PMH implementation will be held for those not familiar with this protocol. In addition, a representative of the Open Archives Initiative (OAI) will be one of the workshop speakers and will provide an update of OAI activities. (In late January, further information about the Berlin workshop, as well as a registration form, will be available at <>.)

The fourth workshop, planned for September 2003 in Bath, UK, is envisioned as a big dissemination event where all communities can come together, and the workshop will be held with the close cooperation of other initiatives, like SPARC and LIBER.

Organisational Validation workpackage

The Organisational Validation workpackage explores old and new business models that may be used within an open repository and service environment. Topics found in this workpackage include: the cooperation of Data Providers needed to build networks of Service Providers; the terms and conditions of a metadata 'exchange' between Data and Service Providers; and the provision of added value services (metadata enhancement, auto classification, addition of OpenURLs).

Within this workpackage, issues of ownership as well as the digital rights management questions are discussed. For instance, how are Intellectual Property Rights (IPR) and copyright handled within digital repositories and services, and what impact does the open archives development have on authors and publishers? How can metadata be shared and exchanged, and what commercial agreements exist? Another important topic is that of the long-term sustainability of digital resources, metadata and scholarly communication.

To deal with these issues the Open Archives Forum established an email discussion group maintained by Dennis Nicholson (Strathclyde Univ. Glasgow).

Technical Validation workpackage

The Technical Validation workpackage examines the deployment of the OAI technical framework. It creates an information space in the form of a web-based database on projects, software, implementations, and services. The database allows interested parties to search for potential project partners, for metadata standards and for information about interoperability issues, as well as to share project developments. Furthermore, this workpackage provides information from those projects that have already dealt with the integration of existing technologies. For example, one can find out whether projects have determined unqualified Dublin Core to be sufficiently rich for their purposes, or how database management issues (concurrency and update, scalability, de-duplication) are being investigated. One can also find information about software and tools, and about the effort needed for an OAI implementation in terms of manpower, skills and time to set up a repository or service.

Overview of European Activities on OAI in relation to worldwide activities

The Open Archives Forum project conducted a survey of European activities involving OAI.

To see how European activities relate to worldwide activities, the following resources [2] were used:

  • Open Archives Initiative
  • Open Archives Forum
  • Signal-Hill OAI Site


Figure 2. Overview of OAI activity.

Considering where OAI originated, how far it has advanced there, and the correspondingly longer experience with it, the slight numerical predominance of European activity shown in Figure 2 is surprising. However, the more detailed country list shown in Figure 3 makes it clear that involvement is unequal among the European countries already engaged in OAI activities. (The Figure 2 and Figure 3 charts do not consider projects in development.)


Figure 3. Overview of European countries engaged in OAI implementation.

To obtain an overview of the software used to make repositories OAI compatible, the data from the OAI repositories were examined. Most OAI repositories used self-developed software. (See Figure 4.)


Figure 4. Software used in implementations

Especially in Europe, many implementers, both Data Providers and Service Providers, used the eprints software from the University of Southampton. Other software tools such as ETD-db, CDS, IMDI and arXiv were each mentioned only once.

Although the Open Archives Initiative deadline for upgrading to OAI-PMH 2.0 was 1 December 2002, survey results showed that more than half of the repositories were still using OAI-PMH Version 1.1 (up to the end of November 2002) and had not changed over to Version 2.0. (See Figure 5.)
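
Whether a repository has moved to Version 2.0 can be read from its response to the protocol's Identify request, which reports a protocolVersion value. The following minimal Python sketch shows one way such a check might be done. The sample response, the archive.example.org address and the function names are illustrative assumptions and are not taken from the survey.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated Identify response from an OAI-PMH 2.0 repository.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-11-30T12:00:00Z</responseDate>
  <request verb="Identify">http://archive.example.org/oai</request>
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://archive.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def protocol_version(identify_xml):
    """Return the text of the first protocolVersion element, or None."""
    root = ET.fromstring(identify_xml)
    for element in root.iter():
        # Drop the "{namespace}" prefix so that responses using the 1.x
        # and the 2.0 namespaces are handled by the same code.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "protocolVersion":
            return (element.text or "").strip()
    return None


def needs_upgrade(identify_xml):
    """True if the repository does not report protocol version 2.0."""
    return protocol_version(identify_xml) != "2.0"


print(protocol_version(SAMPLE_IDENTIFY))
print(needs_upgrade(SAMPLE_IDENTIFY))
```

In practice the response would be fetched from the repository's base URL rather than read from a string; the parsing step stays the same.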


Figure 5. Breakdown of versions of OAI-PMH software used.

Overview of European Open Archives Activities based on OAI implementations

In this article, we basically distinguish between different types of Data Providers: subject gateways, institutional repositories, library online public access catalogues (OPACs), commercial publishers and vendors, and media archives and museums. Service Providers offer either search services or extended services. The following section gives a partial overview of existing Data and Service Providers in Europe today.

Two examples of extended services [3]

  • CYCLADES - The EU project CYCLADES (CNR Pisa, Italy) builds an "Open Collaborative Virtual Archive Environment". It contains an access service to store information in a local database and a collection service to structure the user's workspace. CYCLADES provides search and browse facilities and allows filtering on individual user and community profiles. It also includes a recommendation service and a collaborative workspace.
  • TORII - A second extended service is TORII (TIPS, iCite) by SISSA, Italy. It provides a portal where tools and documents are collected under a unified access point. TORII includes a personal workspace for scientists, allowing them to rank documents according to user profiles and to measure impact factors. TORII contains the Okapi search engine and the iCite environment, which extracts citations from the documents to the archive.

Exemplary representation of different Data Provider types

We found OAI compliant repositories in: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Ireland, Portugal, Norway, The Netherlands, Russia, Sweden, Switzerland and the United Kingdom. Many institutional archives are planned or are being initiated. Some funded projects include Service Providers. In particular, the University of Southampton has projects with international impact. You can find detailed lists of OAI projects on some of the slides presented at the 2nd OA-Forum workshop in Lisbon [4].

Examples of European Data Providers: subject gateways [5]

  • Medicine: Behavioral and Brain Science Prints Interactive Archive, Cambridge University Press, United Kingdom
  • Economics: RePec, Research Papers in Economics, United Kingdom
  • Mathematics: Math-Net, University of Osnabrück, Germany
  • Physics: PhysNet/ PhysDoc, University of Oldenburg, Germany
  • Cognitive Science: CogPrints, University of Southampton, United Kingdom
  • Psychology: Psycoloquy, University of Southampton, United Kingdom
  • Educational Sciences: Education-line, University of Leeds, United Kingdom
  • Organic Agriculture: Organic Eprints, Denmark
  • Philosophy: Sammelpunkt. Elektronisch archivierte Theorie, Austria

Examples of European Data Providers: institutional repositories (using Germany as an example)

  • University library document server [6]: DuetT (Univ. Duisburg), edoc (Humboldt Univ., Berlin), eldorado (Univ. Dortmund), E-LIB (SuUB Bremen), HSSS (SLUB, TU Dresden), LMU (Univ. Munich), MONARCH (TU Chemnitz), Open Archive Portal (Univ. Tübingen), OPUS (Univ. Stuttgart)
  • Media server of universities [7]: timms (University Tübingen)
  • University library catalogues [8]: Univ. Library Oldenburg
  • Library service institutions for the region [9]: BSZ-BW (Bibliotheksservice-Zentrum Baden-Württemberg), Konstanz
  • Spreading initiative [10]: DINI (German Initiative for Networked Information), which produced recommendations to universities, libraries, etc., to follow the standards of the OAI released in December 2000, and which hosts workshops on different topics that bring together people working on the same questions in the field of networked computing in general. There is a DINI working group on OAI.

Technical Validation Questionnaire

The reasons behind the Open Archives Forum Technical Validation Questionnaire

The Open Archives Forum started a first Technical Validation Questionnaire [11] in preparation for the first Forum workshop in Pisa. The objective of the questionnaire was to provide an overview on the status, experiences and future plans regarding the workshop participants' OAI implementations. This first questionnaire was given exclusively to Pisa workshop participants.

The results of this small survey generated great interest in Pisa, and the Open Archives Forum project received feedback indicating that it would be a good idea to collect experiences from a broader spectrum of OAI implementers, as well as to learn more about the starting conditions of those who are planning an implementation or are just beginning one. The focus of interest was on fundamental questions such as:

  • Is there a great deal of common ground and therefore good conditions for cooperating and learning from each other, or are requirements so individual that it will be necessary for many isolated solutions to be developed?
  • Do the existing instruments for implementation fulfill all requirements or should tools and protocols be developed to cater to the needs of different communities?

Based on the need for answers to the above, in the second questionnaire we added or changed some questions and extended the period for responding to the questionnaire. In addition, we expanded the target audience for the questionnaire and subdivided the form to account for those projects that have not yet integrated OAI-PMH in addition to those who are experienced implementers.

This second, long-term survey will continue through autumn 2003.

Summary of early results from the second Technical Validation Questionnaire [12]

In the second Technical Validation Questionnaire we are asking for information about software used, implementation costs, coverage of the archive, and interoperability, experiences and expectations in different communities and in different countries.

Who has participated to date?

To date, 33 repositories have participated in the survey. Eleven of the repositories are not yet OAI implementers, but they are considering becoming implementers. (See Figure 6.)


Figure 6. Distribution of OAI repositories by country.

The responding repositories are distributed throughout Europe, with only two participants from overseas. More than a third of the survey respondents were from Germany or the UK.

Up to now, clearly more Data Providers than Service Providers have completed their OAI implementations. Judging by the number of implementations under development or being planned, however, many new services will soon be available. Many Data Providers used their implementation experiences to guide them in becoming Service Providers. (See Figures 7 and 8.)


Figure 7. Number of responding Data Providers and status of implementation.

  • 31% of active Data Providers are also Service Providers.
  • 54% of active Data Providers plan or are still developing Service Provider implementations.


Figure 8. Number of responding Service Providers and status of implementation.

If we look at the types of communities represented in the responses, it is remarkable that nearly half of the respondents came from libraries or archives. (See Figure 9.)


Figure 9. Types of OAI-implementing Communities responding to the survey.

Software used

As mentioned previously, the first block of the survey is made up of questions about technical infrastructure and software solutions. Prior to OAI implementation, the dominant programming languages used by respondents were Perl, Java and PHP, and the dominant databases were MySQL and Oracle. Practically no statements were made about interface and collection systems, so it is not possible to provide relevant information from the survey about those. However, it is significant that almost none of the organizations needed to replace existing software tools in order to become OAI compatible.

About 70% of the tools used to become OAI compatible were self-developed by Data Providers and Service Providers. Most of the Data Providers and Service Providers make their tools and source code available for others to use. These tools were developed mainly in Java and Perl; PHP and XML were also used frequently. A few implementers used the eprints software, which serves both Data and Service Providers. The eprints system is run in a centralized way although archives are distributed. Other tools, such as Perl implementations or DBUnion, were mentioned only once by survey respondents.

Implementation costs

After the questions regarding the software used, the next questionnaire subject block concerns implementation costs. With regard to the implementation skills needed, Data Providers as well as Service Providers focused on various combinations of the following five competencies:

  • System administration
  • Web server configuration
  • Knowledge of databases and SQL
  • Programming skill and knowledge
  • Experience with metadata

The survey results showed that most implementations were concluded within one quarter (three months) and that most were managed by one programmer. Where a few implementers reported higher expenditures, the reason was not directly connected to implementing the OAI specifications: the higher costs involved larger research projects, or were due to the construction of archives or the processing of greater amounts of data. When survey respondents estimated maintenance costs, these were limited to at most 5 person days, and most often were estimated to be one person day per month for a stable protocol.

These statements of expenditures were in line with the expectations of those who have not yet become OAI implementers, i.e., an implementation concluded within one quarter of a year by one programmer. However, their expectations of further maintenance for a stable protocol are higher: up to 20 days per month for one person. For the other survey questions on implementation costs, the answers and the expectations differ too much for trends to be recognizable. This includes expectations regarding easy integration of the data structures suggested by the OAI-PMH into existing infrastructure, the costs of adapting data to the OAI-PMH, and the expenditure needed to prepare data for Internet usage.

Resources offered and issues of interoperability

The next block on the survey questionnaire regards the range and kind of resources offered by the archive as well as interoperability.

Data Provider

The number of resources available from Data Providers spans a wide range, from 35 to several million documents. The occupied storage space ranged from 15 megabytes to 2 terabytes. Looking at both ranges, it is important to note that the storage capacity used has less to do with the amount of data than with the type of objects.

In the list of object types offered, it is again striking that full-text documents and metadata are what is mainly offered. This is due not only to longer experience with storing and evaluating text-based data. Of bigger concern is the cost of storing pictures and video files, which need stable, efficient databases. (See Figure 10.)


Figure 10. Types of objects offered by Data Providers.

The range of content types includes essentially the entire spectrum of scientific publications. There is a notably high interest in preprints, journal articles and theses. This provides evidence of a big need for a reasonable, fast way to access scientific information beyond conventional scientific publication forms. Other resources offered include library catalogues or video streams of university events. (See Figure 11.)


Figure 11. Types of content offered by Data Providers.

The most widely applied metadata format is Dublin Core. According to the respondents, library-specific formats such as MARC 21 are also used. However, a remarkably high number of formats are mentioned only once, such as Dublin Core Library Profile, DiTeD, CEOS CIP, AMF, RIS, MAB, SPECTRUM, TEI, and one internal format. (See Figure 12.)


Figure 12. Metadata formats used in OAI implementations.

Approximately half of the Data Providers offer full text or extracts of documents. If the openness of the interface must be reduced, there are two access-limitation strategies: either access is controlled, for example by checking IP addresses or by licensing, or the data output is limited.
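
As an illustration of the two strategies, the following Python sketch checks a client's IP address against a list of permitted networks and, for clients outside those networks, limits the output to metadata. The network ranges, the record layout and the field name full_text_url are hypothetical; the survey does not describe how respondents actually implemented their restrictions.

```python
import ipaddress

# Hypothetical networks that are allowed to retrieve full text.
# (These ranges are reserved for documentation and are placeholders.)
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def may_access_full_text(client_ip):
    """First strategy: access control based on the client's IP address."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def visible_fields(record, client_ip):
    """Second strategy: limit the data output for outside clients."""
    if may_access_full_text(client_ip):
        return dict(record)
    # Outside clients still see the metadata, but not the full-text link.
    return {key: value for key, value in record.items()
            if key != "full_text_url"}


record = {
    "title": "Example thesis",
    "full_text_url": "http://archive.example.org/docs/1234.pdf",
}
print(visible_fields(record, "192.0.2.17"))
print(visible_fields(record, "203.0.113.5"))
```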

Service Provider

Service Providers offer local or community-specific services for searching and browsing information. Some provide a workspace for managing documents and metadata, and for collaboration within groups of users. A number of survey answers referred to research projects. Service Providers regard paging functions as an urgent need. Their services search one or several sources through a single search interface. Other services offered include cross-linking and annotation.

Strategies for processing data harvested from Data Providers include using no provenance information, or filtering the harvester output and loading it into the local database. If Service Providers include information about Data Providers in their data output, they do so in one of three ways:

  1. When a metadata record is found, the user can also browse information on the archive from which the record came.
  2. There is no metadata processing; queries against the portal return data sets as harvested, including information about the original Data Provider.
  3. The metadata is parsed and converted to an intermediate format. The provenance information is encoded in the identifier.
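
The last approach in the list above, converting metadata to an intermediate format and encoding provenance in the identifier, might look roughly like the following Python sketch. The intermediate format (a plain dictionary), the "provider:local-id" identifier scheme and the provider name are assumptions made for illustration; respondents did not describe their formats in this detail.

```python
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical harvested record in the oai_dc format.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
</oai_dc:dc>"""


def to_intermediate(dc_xml, provider_id, local_id):
    """Convert a Dublin Core record into a dictionary of value lists.

    The provenance is kept by prefixing the identifier with the
    (hypothetical) name of the Data Provider the record came from.
    """
    root = ET.fromstring(dc_xml)
    record = {"id": f"{provider_id}:{local_id}"}
    for child in root:
        if child.tag.startswith(DC):
            field = child.tag[len(DC):]
            record.setdefault(field, []).append((child.text or "").strip())
    return record


def provider_of(record):
    """Recover the Data Provider from the encoded identifier."""
    return record["id"].split(":", 1)[0]


harvested = to_intermediate(SAMPLE, "edoc-example", "1234")
print(harvested)
print(provider_of(harvested))
```

Keeping the provider in the identifier means no separate provenance field has to be stored, at the cost of having to parse the identifier whenever the source is needed.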

Experiences and expectations

Data Provider

For Data Providers, the importance of the OAI technical framework is that it makes it possible to add services to existing ones, to replace existing services through the OAI interface, and to offer better retrieval.

The advantages of OAI are the ability to share scientific knowledge and to harvest other knowledge databases. OAI also enables the importation of metadata into library software and the wide dissemination of research results. The OAI implementation is simple, cheap and easy to adapt for internal project usage. Last but not least, in comparison to more complex protocols, it is a simple-to-implement facility for exchanging metadata.
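
Part of this simplicity is that an OAI-PMH request is an ordinary HTTP request whose arguments are passed as key=value pairs, with one of six verbs naming the operation. The Python sketch below builds such a request URL. The base URL and the helper name request_url are hypothetical, and the sketch does not check which arguments each verb permits.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def request_url(base_url, verb, **arguments):
    """Build the URL for an OAI-PMH request (argument checking omitted)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # "from" is a Python keyword, so callers pass it as "from_".
        params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


url = request_url("http://archive.example.org/oai", "ListRecords",
                  metadataPrefix="oai_dc", from_="2002-12-01")
print(url)
```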

Service Provider

Concerning the experiences of Service Providers, some survey respondents indicated that standardization presented a problem: The heterogeneity of metadata record content requires the Service Providers to expend a lot of effort to normalize the data to make it usable. The Service Providers believe that metadata normalization can be done less expensively by Data Providers. A possible solution to this problem might be the development of middleware tools that Service Providers could use for data normalization.
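
To make the idea of such middleware concrete, the following Python sketch normalizes two fields that commonly vary between repositories, dates and language names. It is only an assumption about what a normalization tool could do: the accepted date formats (read day-first), the language table and the function names are invented for this example, and unrecognized values are passed through unchanged.

```python
from datetime import datetime

# Assumed input formats; ambiguous dates are read day-first here.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%Y")

# Small, hypothetical mapping from language names to two-letter codes.
LANGUAGE_CODES = {
    "english": "en", "eng": "en", "en": "en",
    "german": "de", "deutsch": "de", "ger": "de", "de": "de",
}


def normalize_date(value):
    """Return an ISO 8601 date (or year) string, or None if unrecognized."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # A bare year stays a bare year; full dates become YYYY-MM-DD.
        return parsed.strftime("%Y" if fmt == "%Y" else "%Y-%m-%d")
    return None


def normalize_language(value):
    """Map a language name to its code, leaving unknown values as-is."""
    text = value.strip()
    return LANGUAGE_CODES.get(text.lower(), text)


def normalize_record(record):
    """Return a copy of the record with date and language normalized."""
    normalized = dict(record)
    if "date" in record:
        normalized["date"] = normalize_date(record["date"]) or record["date"]
    if "language" in record:
        normalized["language"] = normalize_language(record["language"])
    return normalized


print(normalize_record(
    {"title": "Example thesis", "date": "12.05.2002", "language": "Deutsch"}
))
```

The same functions could run on either side of the harvest, which is the point of the discussion above: the work is identical, and the open question is who carries it out.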

In addition to listing those problems, the Service Providers who responded to the survey stated that they have future plans to do the following:

  • Extend search and browse functions,
  • Export data in other formats such as XML,
  • Build document delivery services like print on demand,
  • Establish collaborative environments for users and groups of users such as discussion forums, annotations, awareness,
  • Extend existing services and build distributed services, and
  • Establish an exchange of different library catalogues in order to integrate the information into a virtual union catalogue for the whole country.

One library is creating a single catalogue of all its library catalogues: library OPAC, archives database, image database and Internet gateways.

Information sources

Another of the survey questions focused on the quality of information sources. Many of the respondents who are not yet OAI implementers say that it takes too much effort to find good information about metadata, and that it is especially difficult to find technical support. Some asked for an easy introduction to OAI-PMH.

Other participants recommended the following ways to find good information:

  • Search Websites [13] like that of OAI or, for the museums community, CIMI.
  • Read online journals like Ariadne and D-Lib Magazine [14].
  • Participate in conferences and workshops.
  • Initiate informal discussions with other gateway managers.
  • Experiment with test programs [15].

Help us to create an Information Resource

Information Resource Database

The Open Archives Forum project has set up a European authority registry for open archives (not restricted to OAI-compatible archives) that provides additional information about the content of those archives. The registry supports collaboration and dissemination by making available information about OAI-compliant Data and Service Providers, both EC-funded initiatives and other national initiatives in Europe. Users of the registry populate the database themselves.

For each provider, the database includes details regarding scope of project, content, collection development policy, metadata formats, version of OAI used or other protocol implemented, contact names, and tools in use. The database uses a self-registering interface with a priority on sustainability and automated input techniques.

The database provides information on:

  • Services for open archives
  • Metadata schemas and Interoperability
  • Open archives software tools
  • Current OAI implementations in Europe and beyond.

Contributions are sought

We welcome those with experience and knowledge of open archive implementation to contribute to the growing database for the benefit of the larger community of implementers.

Please register at the Information Resource Database: <>, and respond to the Technical Validation Questionnaire: <>.

Notes and References

[1] Open Archives Initiative: <>.

[2] Open Archives Initiative: <>,
Open Archives Forum: <>
Signal-Hill OAI Site: <> <>

[3] Extended services: CYCLADES: <>
TORII (TIPS, iCite): <>

[4] Overview of European Activities on OAI - Slides presented at the 2nd OA-Forum workshop in Lisbon: <>.

[5] Subject Gateways: Behavioral and Brain Science Prints Interactive Archive: <>
RePec, Research Papers in Economics: <>
Math-Net: <>
PhysNet/ PhysDoc: <>
CogPrints: <>
Psycoloquy: <>
Education-line: <>
Organic Eprints: <>
Sammelpunkt. Elektronisch archivierte Theorie: <>.

[6] University library document server: DuetT (Univ. Duisburg): <>
edoc (Humboldt Univ., Berlin): <>
eldorado (Univ. Dortmund): <>
E-LIB (SuUB Bremen): <>
HSSS (SLUB, TU Dresden): <>
LMU (Univ. Munich): <>
MONARCH (TU Chemnitz): <>
Open Archive Portal (Univ. Tübingen): <>
OPUS (Univ. Stuttgart): <>.

[7] Media server of universities: timms (University Tübingen): <>.

[8] University library catalogue: Univ. Library Oldenburg: <>.

[9] Library service institution for the region: BSZ-BW (Bibliotheksservice-Zentrum Baden-Württemberg): <>.

[10] Spreading initiative: DINI - German Initiative for Networked Information: <>.

[11] 1st Technical Validation Questionnaire: <>.

[12] Summary of some first results of the 2nd Technical Validation Questionnaire - Slides presented at the 2nd OA-Forum workshop in Lisbon: <>.

[13] Web information sources: <>, <>, <>, <>, <>, <>, <>.

[14] Online journals: Ariadne: <> / D-Lib Magazine: <doi:10.1045/dlib.magazine>.

[15] Test programme: <>.

Copyright © Susanne Dobratz and Birgit Matthaei


DOI: 10.1045/january2003-dobratz