D-Lib Magazine
The Magazine of Digital Library Research

January/February 2011
Volume 17, Number 1/2


Access to Research Data

Introduction From the Guest Editors

Jan Brase
German National Library of Science and Technology

Adam Farquhar
British Library






Scientists around the world are addressing the need to increase access to research data. Science is international and global cooperation is imperative. DataCite, launched in December 2009, is a growing association of more than a dozen members from 10 countries. It enables researchers to locate, identify, and cite research datasets with confidence, and plays a global leadership role in promoting the use of persistent identifiers for datasets. In June 2010, the first DataCite summer meeting took place in Hannover, Germany, and provided a forum for 25 speakers and nearly 100 participants from Europe, North America and Australia to exchange information on handling research data. This special issue of D-Lib Magazine includes eight articles derived from talks given at the summer meeting and one additional article on the quality of research data. Together, these articles provide a snapshot of the state of the art on these topics.



Access to research data has become a global imperative. Around the world, the organisations that fund scientific research at the national level are addressing the need to increase awareness of, and accessibility to, research data.

Nevertheless, science itself is international. Scientists participate in global unions, associate themselves with their disciplines, and collaborate in projects. They share results with colleagues all over the world and use information from many global sources.

To address the challenge of increasing access to research data, an effective approach must take this dual nature into account. It must encourage global cooperation for data access via national representatives:

  • a global cooperation, because scientists work globally and scientific data are created and accessed globally;
  • with national representatives, because most scientists are embedded in national funding structures and research organisations.

These insights underlay our work to establish DataCite — an international association that aims to support researchers by enabling them to locate, identify, and cite research datasets with confidence. DataCite was launched on December 1st 2009 in London. As of December 2010, its international membership has grown to include:

  • German National Library of Science and Technology (TIB)
  • Australian National Data Service (ANDS)
  • California Digital Library, USA
  • Canada Institute for Scientific and Technical Information (CISTI)
  • German National Library of Economics
  • German National Library of Medicine (ZB MED)
  • GESIS, Leibniz Institute for the Social Sciences
  • Library of the ETH Zürich, Switzerland
  • Library of TU Delft, Netherlands
  • L'Institut de l'Information Scientifique et Technique (INIST), France
  • Office of Scientific and Technical Information (OSTI), US Department of Energy
  • Purdue University, USA
  • Swedish National Data Service (SND)
  • Technical Information Centre of Denmark
  • The British Library

Further countries and organisations are encouraged to join the association.

DataCite plays a global leadership role in promoting the use of persistent identifiers for datasets. Through its members, it establishes and promotes common methods, best practices, and guidance. The member organisations work independently with data centres and other holders of research datasets in their own domains. Just as science is global while individual researchers work and publish locally, DataCite is global while its local partners offer services and advice where scientists need them.


Identifying datasets

As Nelson points out in [doi:10.1038/461160a], nothing exists in any useful way until it is identified. Providing datasets with identifiers is an essential prerequisite for citing, locating, retrieving, and using them, and even for receiving credit for creating them. As a consequence, DataCite has focused on providing identifiers for datasets that support these uses. While DataCite supports a variety of identifier schemes, the focus of its work has been to use Digital Object Identifiers (DOIs) for datasets.

A DOI® name is used to cite and link to electronic resources. DOI names are widely used for scientific information and are the most common method in place to identify scientific articles. A DOI name can refer to any sort of resource including research data. The DOI® System differs from other reference systems commonly used on the Internet, such as the URL. A DOI name is permanently linked to the resource itself, not just to the place where it is located — a DOI is a name, not an address.

A major advantage of using the DOI System for data is that scientists, publishers, and libraries can use the same syntax and technical infrastructure for datasets that are already in use for research articles. The DOI System offers persistent stable references to scientific content and an easy way to connect the article with the underlying data. For example:

The dataset:

Storz, D et al. (2009):
Planktic foraminiferal flux and faunal composition of sediment trap L1_K276 in the northeastern Atlantic. doi:10.1594/PANGAEA.724325

is a supplement to the article:

Storz, David; Schulz, Hartmut; Waniek, Joanna J; Schulz-Bull, Detlef; Kucera, Michal (2009): Seasonal and interannual variability of the planktic foraminiferal flux in the vicinity of the Azores Current. Deep-Sea Research Part I-Oceanographic Research Papers, 56(1), 107-124, doi:10.1016/j.dsr.2008.08.009
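
Because a DOI is a name rather than an address, software hands it to a resolver to obtain the current location of the resource. The following is a minimal Python sketch of that step for the two DOI names cited above. It assumes the public DOI proxy at https://doi.org/ (not named in this article), and the helper names split_doi and doi_to_url are hypothetical. The sketch only builds the link and does not contact the network.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy; the article itself does not name a resolver.
RESOLVER = "https://doi.org/"


def split_doi(name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix), dropping an optional 'doi:' label."""
    name = name.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    # A DOI name has the form <prefix>/<suffix>; the prefix starts with "10."
    prefix, sep, suffix = name.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI name: {name!r}")
    return prefix, suffix


def doi_to_url(name: str) -> str:
    """Build a resolver link for a DOI name (hypothetical helper)."""
    prefix, suffix = split_doi(name)
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return RESOLVER + prefix + "/" + quote(suffix, safe="/")


dataset_url = doi_to_url("doi:10.1594/PANGAEA.724325")
article_url = doi_to_url("doi:10.1016/j.dsr.2008.08.009")
```

Here dataset_url is https://doi.org/10.1594/PANGAEA.724325. The same function handles the article's DOI name, which illustrates the benefit of sharing one syntax and one infrastructure for datasets and articles.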

Since 2005, TIB has been a DOI Registration Agency with a focus on registering research data. TIB provides DOI registration and core metadata services for datasets, while the research data itself is held at data centres or other trusted institutions. These content holders are responsible for quality assurance, metadata creation, storage and access.


DataCite as an international initiative

At the end of 2006, TIB established contact with the Library of ETH Zürich in Switzerland, which was interested in using the same infrastructure and approach to assign DOI names to datasets. From this collaboration, the core idea of DataCite was born. By March 2009, nine organisations had signed a Memorandum of Understanding, and in December 2009 DataCite was founded. By December 2010, the DataCite members had together registered over one million scientific objects with DOI names.

In June 2010, the first DataCite summer meeting took place in Hannover, Germany. The theme of the meeting was "Making datasets visible and accessible", and it provided a forum for data centres to exchange experience, workflows and standards for the handling of research data. It brought together 25 speakers and nearly 100 participants from Europe, North America and Australia to discuss:

  • Metadata for Datasets: More than pure citation information?
  • Peer-review systems and the publication of datasets: Ensuring quality
  • Trustworthiness of data centres: A technological, a structural and a legal discussion
  • Best practice and examples: What can be done and is done worldwide?
  • Datasets and scholarly journals: A perfect combination?

In this special issue of D-Lib Magazine on data, you will find eight publications derived from talks given at the summer meeting, as well as this introduction and one additional article on the quality of research data. They provide a snapshot of the state of the art on these topics.

We hope to see you all at the next DataCite summer meeting in September 2011 in California.



We would like to thank Larry Lannom and Catherine Rey from D-Lib for giving us the opportunity to make this special issue a reality. Furthermore, we would like to thank Irina Sens from the German National Library of Science and Technology for her most valuable help in the editorial work.


About the Authors

Jan Brase has a degree in Mathematics and a PhD in Computer Science. His research background is in metadata, ontologies and digital libraries. Since 2005, he has been head of the DOI Registration Agency for research data at the German National Library of Science and Technology (TIB). He is also Managing Agent of DataCite. DataCite was founded in December 2009 and has set itself the goal of making online access to research data easier for scientists by promoting the acceptance of research data as individual, citable scientific objects.

Adam Farquhar is Head of Digital Library Technology at the British Library, where he initiated the Library's dataset programme and co-founded its digital preservation department. From 2006 to 2010, he led the EU co-funded Planets Digital Preservation project. He is President of DataCite, Chairman of the Open Planets Foundation, and Board member of the Digital Preservation Coalition. Prior to joining the Library, he was the principal knowledge management architect for Schlumberger (1998-2003) and research scientist at the Stanford Knowledge Systems Laboratory (1993-1998). He completed his PhD in Computer Sciences at the University of Texas at Austin (1993). His work focuses on improving the ways in which people can represent, find, share, use, exploit, and preserve digitally encoded knowledge.
