Volume 17, Number 1/2
Criteria for the Trustworthiness of Data Centres
Helmholtz Centre Potsdam German Research Centre for Geosciences
The use of persistent identifiers to identify data sets as part of the record of science implies that the data objects are persistent themselves. Scientific findings, historical documents and cultural achievements are, to a rapidly increasing extent, being presented in electronic form, in many cases exclusively so. Besides its invaluable advantages, this form also carries serious disadvantages. The rapid obsolescence of the technology required to read the information, combined with the frequently imperceptible physical decay of the media themselves, represents a serious threat to the preservation of the information content. Since research projects only run for a relatively short period of time, it is advisable to shift the burden of responsibility for long-term data curation from the individual researcher to a trusted data repository or archive. But what makes a data repository trustworthy? The trustworthiness of a digital repository can be tested and assessed on the basis of a criteria catalogue. These catalogues can also be used as a basis for developing a procedure for auditing and certifying the trustworthiness of digital repositories.
The rapid decay of URLs pointing to research resources was an important part of the motivation to use persistent identifiers instead of ephemeral URLs (see e.g. Wren, 2008; Lawrence et al., 2001). Surely, if we use persistent identifiers to identify digital objects as parts of the record of science these objects themselves need to be persistent and kept in long-term digital repositories and archives. How can the trustworthiness of a particular repository in a network of data repositories (e.g. DataCite data publication agents, World Data System, ESA Ground Segment, and others) be assessed?
In recent years, scientific findings, historical documents and cultural achievements have, to a rapidly increasing extent, been presented in electronic form, in many cases exclusively so. Besides its invaluable advantages, this form also carries serious disadvantages. In paper documents, content and representation come together as one unit, whereas in digital formats the content is separate from its representation and requires additional information and technology for the user to access it. Moreover, the underlying technology is still developing at an exceptionally fast pace. The rapid obsolescence of the technology required to read the information, combined with the frequently imperceptible physical decay of the media themselves, represents a serious threat to the preservation of the information content. This makes our digital assets particularly vulnerable. Given the tasks outlined above, only data centres prepared for long-term preservation can be considered trustworthy custodians of our digital heritage.
But what makes a data repository trustworthy? This paper will discuss the fundamentals of criteria catalogues for assessing the trustworthiness of an archive for digital research data and how these criteria can be transferred into audit and certification of research data repositories and archives.
Reference Model and Criteria Catalogues
In the project "Publication and Citation of Scientific Primary Data" (STD-DOI), which laid the conceptual and technical foundations for DataCite, the question arose of how to assess the trustworthiness of digital repositories. At the same time, other groups started to investigate the trustworthiness of digital archives, and various preservation organizations developed tools and metrics to help assess repositories. To bring these approaches to defining criteria for trustworthiness together, members of the digital archiving community developed the "Ten Principles for Minimum Requirements for Trustworthy Digital Preservation Repositories" (Center for Research Libraries (CRL) et al., 2007).
As early as 1994 it became apparent that criteria for the assessment of trustworthiness of digital archives were needed (Dobratz et al., 2008; Task Force on Archiving of Digital Information, 1996). In 1995 the International Organization for Standardization (ISO) approached the Consultative Committee for Space Data Systems (CCSDS) to develop a formal standard for the long-term preservation of data from space missions. In preparing a draft standard it became clear that a reference model was needed as a basis for further standard-building activities and that a reference model would solve cross-domain problems regarding the long-term preservation of digital materials (Rank et al., 2010). The outcome of this process was the Open Archival Information System Reference Model (OAIS-RM), known to most as the "OAIS model". This document went through several consultation and review phases and was published as an international standard (ISO 14721:2003). This standard is currently under review and a draft recommended practice was published in October 2009 (CCSDS, 2009).
Although designed for the curation of space data, the OAIS model aims to be as context-neutral as possible and deliberately avoids jargon from both the IT and archival professions. In this way, OAIS became a lingua franca for archival information systems that has since become widely adopted because it enables effective communication among projects on a national and international scale. With its general approach and universal applicability the OAIS model also served as a reference model for criteria catalogues for the assessment of the trustworthiness of digital archives. Among these, the most widely known catalogues are:
- Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) (Ambacher et al., 2007)
- Catalogue of Criteria for Trusted Digital Repositories (nestor Catalogue) (Dobratz et al., 2006, 2009)
The underlying principles of all of the above-mentioned criteria catalogues are derived from the fundamental concepts of quality management, as formulated in the ISO 9000 family of standards. These standards are designed to help organizations ensure they meet the needs of customers and other stakeholders (ISO, 2000). Key concepts in ISO 9000 that also apply to assessing the trustworthiness of digital archives are the documentation and transparency of activities surrounding the digital archive, the adequacy of the activities to the stated goals and the requirements of the designated user community, and the measurability of the degree of compliance of the archive activities with the criteria for trustworthiness (Dobratz et al., 2008).
The initiatives described above do not operate in isolation from each other. While the OAIS model has already been transferred into an ISO standard, activities to derive an international standard criteria catalogue for trustworthy digital repositories are still under way. Currently, TRAC is a work in progress in the ISO technical committee ISO TC20/SC13 and CCSDS (ISO/DIS 16363). The nestor Criteria Catalogue has been published as a draft standard by the German Institute for Standardization (DIN 31644). This activity is not in competition with ISO/DIS 16363 but is intended to complement the work on the ISO draft through the international standardisation structures of ISO and its national members. In summer 2010 representatives of the respective working groups in CCSDS, DANS and DIN signed a "Memorandum of Understanding" to strengthen cooperation between these initiatives (Giaretta et al., 2010).
Translating Criteria into Practice: Auditing and Certification of Digital Archives
Each digital repository has its own targets and specifications. On the other hand, the criteria catalogues for trusted digital repositories have to take a general approach and thus remain at a high level of abstraction. For application to a specific domain and archive instance, the evaluation criteria have to be translated into the specified context and aligned to the needs of the designated user community. At this point, where abstract criteria are translated into specific use cases, the principle of applicability becomes important.
An example of the translation of abstract criteria for the trustworthiness of digital repositories into a specific application is the set of "European LTDP Common Guidelines" that the European Space Agency Ground Segment Coordination Body (ESA GSCB) formulated for its ground segment data centres (Albani et al., 2010).
In a network of data repositories it is quite likely that not all repositories operate on the same technical level. Yet it may be important to define criteria for auditing the performance of the networked repositories. As the example of the CCSDS has already shown, the need to preserve data from space missions is particularly pressing; at the same time, space science has a long record of curating data. Data from space missions are not held in a central archive but are, at least initially, distributed among mission-specific data systems. In this setting the need arose to find common guidelines for the long-term preservation of these valuable scientific assets.
Within the European Space Agency (ESA), the ESA Centre for Earth Observation (EO) is the largest European EO data provider. It also operates as the European reference centre for EO payload data exploitation. Long-term preservation of these data and of the ability to discover, access and process them is a fundamental issue and a major challenge at programmatic, technological and operational levels. To harmonise its approach to long-term data preservation among participating data centres, the ESA Ground Segment Coordination Body (ESA GSCB), in cooperation with nestor, formulated a set of "European LTDP Common Guidelines".
The ESA "Common Guidelines" document directly addresses ESA ground segment data centres. Its criteria are referenced against the nestor Criteria Catalogue and other relevant standards (e.g. metadata encoding, security). Its structure follows the data life cycle. Early in the design process for the Common Guidelines, ESA GSCB recognised that not all data centres operate on the same technical level. At the same time, the requirements for long-term preservation may differ from case to case. To accommodate these differences among data centres, the ESA Common Guidelines introduce three levels of compliance. Each criterion is graded as essential, important, or optional. The criteria are then combined into profiles, or levels of compliance, with an entry level followed by two more advanced levels. To allow for future developments in long-term digital preservation, the grading scheme and levels of compliance can be extended with even more advanced levels.
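The grading scheme described above can be illustrated with a minimal sketch. The criterion identifiers, the level names, and above all the mapping from grades to levels are assumptions made for illustration; the Common Guidelines define their own profiles.

```python
from enum import Enum


class Grade(Enum):
    ESSENTIAL = 1
    IMPORTANT = 2
    OPTIONAL = 3


# Hypothetical profiles: the grades whose criteria must all be met at each
# level, ordered from entry level upwards. A further level could be appended.
LEVELS = [
    ("entry", {Grade.ESSENTIAL}),
    ("intermediate", {Grade.ESSENTIAL, Grade.IMPORTANT}),
    ("advanced", {Grade.ESSENTIAL, Grade.IMPORTANT, Grade.OPTIONAL}),
]


def compliance_level(criteria, fulfilled):
    """Return the highest level whose required criteria are all fulfilled.

    criteria:  dict mapping a criterion id to its Grade
    fulfilled: set of criterion ids that the data centre meets
    Returns None if not even the entry level is reached.
    """
    achieved = None
    for name, required_grades in LEVELS:
        required = {cid for cid, grade in criteria.items()
                    if grade in required_grades}
        if required <= fulfilled:
            achieved = name
        else:
            break  # levels build on each other, so stop at the first gap
    return achieved


# Made-up criteria for demonstration only.
example_criteria = {
    "C1": Grade.ESSENTIAL,
    "C2": Grade.ESSENTIAL,
    "C3": Grade.IMPORTANT,
    "C4": Grade.OPTIONAL,
}
print(compliance_level(example_criteria, {"C1", "C2", "C3"}))  # intermediate
```

Keeping the profiles in a single ordered list mirrors the extensibility noted above: a more advanced level can be added without changing the assessment logic.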
An approach similar to the European LTDP Common Guidelines is proposed in the European Framework for Audit and Certification of Digital Repositories, which was outlined in a Memorandum of Understanding between CCSDS, DANS and DIN (Giaretta et al., 2010). This framework defines three levels of trustworthiness:
- Basic Certification through the Data Seal of Approval (DSA).
- Extended Certification through DSA plus additional publicly available self-audit with an external review based on ISO 16363 (TRAC) or DIN 31644 (nestor).
- Formal Certification after full external audit and certification based on ISO 16363 (TRAC) or DIN 31644 (nestor).
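The three levels listed above can be expressed as a simple decision rule. This is only a sketch: the `Evidence` record and its field names are hypothetical, and it assumes that a full external audit is sufficient for the formal level, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical record of what a repository can demonstrate."""
    has_dsa: bool = False              # Data Seal of Approval awarded
    reviewed_self_audit: bool = False  # public self-audit with external review
    full_external_audit: bool = False  # full external audit and certification


def certification_level(evidence: Evidence) -> Optional[str]:
    """Map the evidence to the highest level of trustworthiness reached."""
    if evidence.full_external_audit:
        # Assumption: the audit alone qualifies for the formal level.
        return "formal"
    if evidence.has_dsa and evidence.reviewed_self_audit:
        return "extended"
    if evidence.has_dsa:
        return "basic"
    return None


print(certification_level(Evidence(has_dsa=True, reviewed_self_audit=True)))  # extended
```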
With a wider adoption of standard-based long-term data curation we will see more applications of criteria catalogues to specific data repositories.
The need for criteria to assess the trustworthiness of digital repositories was recognised by memory institutions and by data centres many years ago. This resulted in a number of initiatives aimed at developing criteria catalogues for trusted digital archives. Data centres, in particular those organised in networks of several data repositories and archives, have also shown interest in auditing and certification of their trustworthiness as long-term digital repositories. The need for certification has led to the initiation of standardisation processes through ISO and national standardisation bodies. The standardisation process and regular exchange between the main initiatives have helped these activities converge, which will lead to a harmonisation of the criteria catalogues. In addition, growing adoption of criteria catalogues for auditing of archives and networks of archives has provided useful feedback for the further development of criteria catalogues and auditing procedures for the certification of trusted digital archives.
The author would like to thank his colleagues in the nestor working group "Trusted Archives", in the project "Publication and Citation of Scientific Primary Data", and at ESA GSCB for the interesting and fruitful discussions. The author gratefully acknowledges support by the German Research Foundation (DFG) through the project "Publication and Citation of Scientific Primary Data" (STD-DOI), by the German Federal Ministry for Education and Research through nestor, and by ESA.
Albani, M., V. Beruti, M. Duplaa, C. Giguere, C. Velarde, E. Mikusch, M. Serra, J. Klump, and M. Schroeder (2010), Long term preservation of earth observation space data - European LTDP Common Guidelines (Version 1.1), European Space Agency, Ground Segment Coordination Body, Frascati, Italy. Available from: http://earth.esa.int/gscb/ltdp/EuropeanLTDPCommonGuidelines_Issue1.1.pdf
Ambacher, B. et al. (2007), Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), CRL Center for Research Libraries, Chicago, IL. Available from: http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf
 CCSDS (2009), Audit and certification of trustworthy digital repositories, Draft Recommended Practice, Red Book, Consultative Committee for Space Data Systems, Greenbelt, MD. Available from: http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206520R1/Attachments/652x0r1.pdf
 Center for Research Libraries (CRL), Digital Curation Centre (DCC), Digital Preservation Europe (DPE), and Competence Network for Digital Preservation (nestor) (2007), Ten Principles, Available from: http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
 Digital Curation Centre (DCC), and Digital Preservation Europe (DPE) (2007), DCC and DPE Digital Repository Audit Method Based on Risk Assessment (DRAMBORA), Digital Curation Centre, Edinburgh, UK. Available from: http://www.repositoryaudit.eu/download
 DINI AG Elektronisches Publizieren (2006), DINI-Certificate Document and Publication Services 2007 (Version 2.0), Deutsche Initiative für Netzwerkinformation (DINI), Göttingen, Germany. Available from: http://nbn-resolving.de/urn:nbn:de:kobv:11-10075687
 Dobratz, S. et al. (2006), Catalogue of Criteria for Trusted Digital Repositories, Die Deutsche Bibliothek, Frankfurt (Main), Germany. Available from: http://edoc.hu-berlin.de/series/nestor-materialien/8/PDF/8.pdf
 Dobratz, S. et al. (2009), Catalogue of Criteria for Trusted Digital Repositories, nestor materials, Deutsche Nationalbibliothek, Frankfurt (Main), Germany. [online] Available from: http://nbn-resolving.de/urn:nbn:de:0008-2010030806
Dobratz, S., P. Rödig, U. M. Borghoff, A. Schoger, and B. Rätzke (2008), The Use of Quality Management Standards in Trustworthy Digital Archives, In: Proceedings of the Fifth International Conference on Preservation of Digital Objects: Joining up and working: Tools and Methods for Digital Preservation, A. Farquhar (Ed.), 8 pp., British Library, London, UK. Available from: http://nbn-resolving.de/urn:nbn:de:kobv:11-10092248
Giaretta, D., H. Harmsen, and C. Keitel (2010), Memorandum of Understanding to Create a European Framework for Audit and Certification of Digital Repositories. [online]
ISO (2000), ISO 9000:2000: Quality management systems - Fundamentals and vocabulary, Standard, International Organization for Standardization (ISO), Geneva, Switzerland. Available from: http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=29280
 Lawrence, S., F. Coetzee, E. Glover, D. Pennock, G. Flake, F. Nielsen, R. Krovetz, A. Kruger, and L. Giles (2001), Persistence of Web References in Scientific Research, IEEE Computer, 34(2), 26-31. doi:10.1109/2.901164
 Rank, R. H., C. Cremidis, and K. R. McDonald (2010), Archive Standards: How Their Adoption Benefit Archive Systems, In: Standard-Based Data and Information Systems for Earth Observation, L. Di and H. K. Ramapriyan (Eds.), pp. 127-142, Springer Berlin Heidelberg, Heidelberg, Germany. doi:10.1007/978-3-540-88264-0_8
 Sesink, L., R. van Horik, and H. Harmsen (2008), Data Seal of Approval, Data Archiving and Networked Services (DANS), Den Haag, The Netherlands. Available from: http://www.datasealofapproval.org/
 Task Force on Archiving of Digital Information (1996), Preserving Digital Information, Commission on Preservation and Access and the Research Libraries Group, Mountain View, CA. Available from: http://www.rlg.org/legacy/ftpd/pub/archtf/final-report.pdf
Wren, J. D. (2008), URL decay in MEDLINE-a 4-year follow-up study, Bioinf., 24(11), 1381-1385, doi:10.1093/bioinformatics/btn127
About the Author
Jens Klump is e-Science Project Manager at the Helmholtz Centre Potsdam German Research Centre for Geosciences in Potsdam, Germany. As an "embedded scientist" with degrees in geology and oceanography, Dr. Klump joins geological research projects to determine their information needs and to help design new e-Science tools. He participated in the project "Publication and Citation of Scientific Primary Data", which laid the foundations for DataCite, and was a member of the "Trusted Archives" working groups of both the "Competence Network for Digital Preservation" (nestor) and the German Institute for Standardization (DIN).