Jennifer L. Marill
Edward C. Luczak
The National Institutes of Health (NIH) National Library of Medicine® (NLM) undertook an 18-month project to evaluate, test and recommend digital repository software and systems to support NLM's collection and preservation of a wide variety of digital objects. This article outlines the methodology NLM used to analyze the landscape of repository software and select three systems for in-depth testing. Finally, the article discusses the evaluation results and next steps for NLM. This project followed an earlier NLM working group, which created functional requirements and identified key policy issues for an NLM digital repository to aid in building NLM's collection in the digital environment.
In order to fulfill the National Library of Medicine's (NLM) mandate to collect, preserve and make accessible the scholarly and professional literature in the biomedical sciences, irrespective of format, the Library has deemed it essential to develop a robust repository infrastructure to manage a large amount of material in a variety of digital formats. A number of NLM's Library Operations program areas need such a digital repository to support their existing digital collections and to expand the ability to manage a growing amount of digitized and born-digital resources.
In May 2007, the Associate Director for Library Operations approved the creation of the Digital Repository Evaluation and Selection Working Group (WG) to evaluate commercial systems and open source software and select one (or a combination of systems and software) for use as an NLM digital repository. The group's work followed an earlier Digital Repository Working Group, which created functional requirements and identified key policy issues for an NLM digital repository to aid in building NLM's collection in the digital environment.
Project Scope, Deliverables and Working Guidelines
The scope of the Digital Repository Evaluation and Selection project was to perform an extensive evaluation of existing commercial and open source digital repository systems and software. The evaluation included those systems and software already identified by the Digital Repository Working Group, as well as any new or previously overlooked software. The evaluation was to include hands-on testing against a set of functional requirements based on the functional areas of the Open Archival Information System (OAIS) model (ingest, archival storage, data management, administration, preservation planning, and access), as specified in the NLM Digital Repository Policies and Functional Requirements Specification. The evaluation was also to include an assessment of the systems and software based on a set of non-functional requirements.
The primary deliverable of this project was a recommendation on which system or suite of software to implement for NLM's digital repository. The recommendation needed to take into account software costs and staffing resources necessary for a pilot or initial implementation. The WG was also charged with forwarding any policy issues that needed management resolution for either this project or implementation. Policy issues related to the priorities for digital preservation were outside the scope of this project.
The full project team held weekly 1.5-hour meetings. Sub-groups were created to meet separately and conduct specific analysis and testing tasks. Working documents and correspondence were posted on a project wiki. The WG included staff from many areas of Library Operations, including the History of Medicine Division, the Public Services Division, and the Technical Services Division; the Office of Computer and Communications Systems (OCCS); and one staff member each from the National Center for Biotechnology Information and the NIH Library. The project was originally expected to conclude within nine months to one year; however, the time required for hands-on testing extended the project to nearly 18 months.
The following working guidelines were developed to help further define the goals and scope of the NLM digital repository:
Staff from NLM's Library Operations will define the repository requirements and capabilities, and manage the lifecycle of NLM digital content.
The Working Group held its kick-off meeting on June 12, 2007, and completed all work by December 2, 2008. The project was divided into the following phases:
Initial Evaluation of Ten Systems and Software
Based on the work of the previous NLM Digital Repository Working Group, the WG scanned the literature and conducted investigations to construct a list of ten systems and software for initial evaluation. The ten systems included:
The WG then developed a set of "Master Evaluation Criteria" to provide a decision method for narrowing the ten systems to three for detailed consideration. At this point, tool functionality as described in available software documentation was considered as one of many factors in the down-selection process.
The Master Evaluation Criteria included:
Each criterion was assessed equally on a scale of 0 (the criterion is not present) to 3 (the criterion is present at a high level). After the functional and non-functional criteria above were addressed, the cost of software deployment, including the initial cost of the software plus the cost of software integration, modifications, and enhancements, was also rated on a scale of 0 (highest cost) to 3 (lowest cost).
To conduct these initial investigations, the WG was divided into four subgroups, and each subgroup evaluated two or three of the ten systems. Each subgroup presented its research findings and initial ratings to the full WG. The basis for each rating was discussed, and an effort was made to ensure that the criteria were evaluated consistently across all ten tools. The subgroups then finalized their ratings to reflect input received from discussions with the full Working Group.
All ten systems were ranked and three were identified for further consideration and in-depth testing: DigiTool, DSpace, and Fedora. Because Fedora has a limited user interface, the WG selected Fez, a Web interface to Fedora, to enable more effective testing.
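The rating and down-selection step described above can be illustrated with a minimal sketch. It assumes that the equally assessed criterion ratings and the cost rating are simply summed into one total per system; the article does not state how the ratings were combined, so that rule, the function names, and the placeholder systems and ratings below are all hypothetical and are not NLM's actual data.

```python
# Minimal sketch of a 0-3 rating scheme with a cost rating (0 = highest cost,
# 3 = lowest cost). The unweighted sum is an assumption, not NLM's documented method.

RATING_SCALE = (0, 1, 2, 3)


def total_rating(criterion_ratings, cost_rating):
    """Sum the criterion ratings and the cost rating for one system."""
    ratings = list(criterion_ratings.values()) + [cost_rating]
    for rating in ratings:
        if rating not in RATING_SCALE:
            raise ValueError(f"rating {rating!r} is outside the 0-3 scale")
    return sum(ratings)


def shortlist(candidates, keep=3):
    """Return the names of the `keep` highest-rated systems, best first."""
    ranked = sorted(
        candidates,
        key=lambda name: total_rating(*candidates[name]),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical placeholder systems and ratings, for illustration only.
candidates = {
    "System A": ({"criterion 1": 3, "criterion 2": 2}, 1),  # total 6
    "System B": ({"criterion 1": 1, "criterion 2": 1}, 3),  # total 5
    "System C": ({"criterion 1": 2, "criterion 2": 3}, 2),  # total 7
    "System D": ({"criterion 1": 0, "criterion 2": 1}, 0),  # total 1
}

print(shortlist(candidates))  # ['System C', 'System A', 'System B']
```

Because cost is rated on an inverted scale (3 means lowest cost), a higher total is better on every dimension, which is what allows a single descending sort in this sketch.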
In-Depth Testing of Three Systems
Using a staggered schedule, DSpace 1.4.2, DigiTool 3.0, and Fedora 2.2/Fez 2 Release Candidate 1 were installed on NLM servers for extensive hands-on testing. The WG established a ground rule that the latest production version of each system would be installed and tested. OCCS conducted demonstrations and tutorials for DSpace and Fedora, and Ex Libris provided training for DigiTool, so that members could familiarize themselves with the functionality of each system.
A Consolidated Digital Repository Test Plan was created based on the requirements enumerated in the NLM Digital Repository Policies and Functional Requirements Specification. The Test Plan contained 129 specific tests. Each test could be scored from 0 to 3, indicating the extent to which the test element could be successfully demonstrated or documented (0=none, 1=low, 2=moderate, 3=high). Each system could therefore receive a maximum total score of 387 (129 tests × 3) if all tests were scored as 3 (high). All the test elements were recorded in a spreadsheet for convenience.
Four subgroups of the WG (Access, Metadata and Standards, Preservation and Workflows, Technical Infrastructure) were formed to evaluate specific aspects of each system. Each test was allocated to one of the four subgroups, which was tasked with conducting that test on all three systems. Scores were totaled for each subgroup's set of test elements, and a cumulative score for each system was calculated by adding the four subgroup totals. In addition to the hands-on testing, the WG contacted numerous users and customers of all three systems to ask about software use, the size and nature of their repository collections, the size and skill sets of their repository teams, and related matters.
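The scoring arithmetic for the test plan can be sketched as follows. This is an illustrative sketch only: the subgroup names, the 0 to 3 scale, and the count of 129 tests come from the text above, while the function names and the sample scores are hypothetical and do not reproduce any actual test results.

```python
# Minimal sketch of the test plan scoring: each test is scored 0-3, scores are
# totaled per subgroup, and the subgroup totals are added into a cumulative score.

MAX_SCORE_PER_TEST = 3  # 0=none, 1=low, 2=moderate, 3=high
NUM_TESTS = 129

SUBGROUPS = (
    "Access",
    "Metadata and Standards",
    "Preservation and Workflows",
    "Technical Infrastructure",
)


def max_total(num_tests=NUM_TESTS):
    """Highest possible cumulative score if every test is scored 3 (high)."""
    return num_tests * MAX_SCORE_PER_TEST


def subgroup_totals(scored_tests):
    """Total the scores of (subgroup, score) pairs for each subgroup."""
    totals = {name: 0 for name in SUBGROUPS}
    for subgroup, score in scored_tests:
        if subgroup not in totals:
            raise ValueError(f"unknown subgroup: {subgroup!r}")
        if score not in range(MAX_SCORE_PER_TEST + 1):
            raise ValueError(f"score {score!r} is outside the 0-3 scale")
        totals[subgroup] += score
    return totals


def cumulative_score(scored_tests):
    """Add the four subgroup totals into one score for a system."""
    return sum(subgroup_totals(scored_tests).values())


# Hypothetical scores for a handful of tests on one system, for illustration only.
sample = [
    ("Access", 3),
    ("Access", 2),
    ("Metadata and Standards", 1),
    ("Technical Infrastructure", 3),
]

print(max_total())               # 387
print(cumulative_score(sample))  # 9
```

Keeping the per-subgroup totals separate from the cumulative score makes it possible to see where a system is strong or weak, rather than only comparing single overall numbers.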
Recommendations and Next Steps
After completion of all testing, the WG recommended that NLM select Fedora as the core system for the NLM digital repository. The WG was highly impressed with a number of Fedora's capabilities and characteristics, including its strong technology roadmap, an excellent underlying data model that can handle NLM's diverse materials, an active development community, adherence to standards, and use by leading institutions and libraries with similar digital project goals. Fedora is also seen as a low-risk choice for now, as it is open source and no license fees are involved.
The WG also recommended that work begin immediately on a Fedora pilot project using four identified collections of materials from NLM and the NIH Library. Most of these collections already have content files and metadata ready for loading into a repository. After an initial pilot phase of approximately six to eight months, the effort will be evaluated. NLM senior staff concurred with this recommendation, and work has already begun on the pilot implementation.
Implementation of the pilot using Fedora will provide real-world experience with actual NLM collections. The four pilot collections contain a variety of digital formats: digitized monographs on cholera dating from 1830 to 1890; digitized motion pictures of a historical nature; digitized images from important historical anatomical atlases; and a selection of annual reports from NIH Institutes and Centers.
The pilot will focus on Submission Information Package (SIP) creation, developing data models for the above material, and understanding metadata needs. The pilot will also investigate "companion" tools that work with Fedora, focusing on three areas: administrative interface tools (e.g., Fez, Muradora); file identification, verification and characterization tools (e.g., JHOVE, DROID); and user access tools such as page-turning software.
As each pilot collection is completed, NLM intends to evaluate its work with the following types of questions:
NLM has much work ahead of it, but the value of its in-depth evaluation and selection process has been significant. Using a set of well-defined evaluation criteria and test cases has enabled NLM to perform hands-on testing and develop an in-depth understanding of repository software prior to undertaking its initial implementation.
Members of the Digital Repository Evaluation and Selection Working Group were: Diane Boehr, Brooke Dine, John Doyle, Laurie DuQuette, Jenny Heiland-Luedtke, Felix Kong, Kathy Kwan, Edward Luczak (contractor), Jennifer Marill (chair), Michael North, Deborah Ozga, John Rees, and Doron Shalvi (contractor).
For More Information
Notes & References
1. National Library of Medicine. Digital Repository Policies and Functional Requirements Specification. March 16, 2007. <http://www.nlm.nih.gov/digitalrepository/NLM-DigRep-Requirements-rev032007.pdf>.
13. National Library of Medicine. Digital Repository Test Plan. <http://www.nlm.nih.gov/digitalrepository/Consolidated-DR-Testplan-Template.xls>.
17. National Library of Medicine. Recommendations on NLM Digital Repository Software. December 2, 2008. <http://www.nlm.nih.gov/digitalrepository/DRESWG-Report.pdf>.