Carol Minton Morris
It was cool and sunny earlier this spring as over 450 delegates from 35 countries made their way to the University of Southampton in the UK, where the School of Electronics and Computer Science was set to host the Third International Open Repositories Conference. This year's conference, held April 1-4, 2008, was sponsored by U.S. and European companies and organizations including Sun Microsystems, Electronics and Computer Science, University of Southampton, JISC, Texas Digital Library, Web Science Research Initiative, Repositories Support Project, Jorum, The Depot, and Enovation Solutions. Conference organizers from three open source repository software groups (EPrints,1 DSpace,2 and Fedora Commons3) were on hand to present solutions and to exchange ideas with attendees in day-long user group meetings and during informal poster sessions and receptions.
The theme "Practice and Innovation" was evident in the main conference program as well as in meetings and events held in conjunction with OR08, as presenters told stories of experimentation, often coupled with the challenges of running day-to-day repository services for their companies and institutions. In sessions focused on interoperability, legal issues, models, architecture and frameworks, national and international perspectives, scientific repositories, social networking, sustainability issues, usage, and Web 2.0, presenters shared both groundbreaking and tried-and-true use cases, solutions, and results of inquiries into how repositories can better meet the needs of the global scholarly and scientific communities that they serve.
The conference opened with an observation by conference co-chair Les Carr, University of Southampton. He suggested that the collective efforts of the delegates (repository managers, librarians, archivists, developers, project leaders and representatives from IT companies who are working directly with worldwide research and information producers) are creating a global web of knowledge.
Carr and his University of Southampton team have made all OR08 conference proceedings available in, not surprisingly, a repository. You may browse and download presentations and posters here: <http://pubs.or08.ecs.soton.ac.uk/view/subjects>.
Peter Murray-Rust, a Reader in Molecular Informatics at the University of Cambridge and Senior Research Fellow of Churchill College, gave the keynote address. He began by asking how many of the assembled attendees worked in laboratories. A small number of hands went up. "We have a data drought (in laboratories)," he said. "Permission barriers, caused 90% by people and only 10% by technology, are preventing direct access to up-to-date scientific research findings." Murray-Rust suggested that most discovery takes place along the long tail of science, among small groups of laboratory scientists who need access to current data. He also noted that there is a vast difference between the technical service needs of bench scientists and those of theoretical scientists. "You can't sit in a building with ivy growing up it and be removed from scientists in laboratories with their test tubes and small furry animals and call yourself a 'scientific repositorian,'" he said. Murray-Rust gave several examples of technologies and services that would help scientists in laboratories. His suggestions included RSS systems that email "active molecules" harvested and updated hourly by robots; incentives that compel scientists to contribute; text mining technologies that reclaim "lost" data from PDFs; getting involved in the scientific data collection workflow "upstream," closer to where data is created; and developing access to petabyte stores at universities so that there are facilities to hold scientific data at the source.
Dean Krafft, Cornell University, gave a presentation in the Interoperability session on NCore, an open-source platform for creating digital libraries united by a common data model and interoperable applications. Built on top of Fedora, NCore includes a suite of library management tools and services, as well as a variety of end-user tools for the collaborative creation of context and content. The NCore system already supports the National Science Digital Library (NSDL), implementing one of the largest production Fedora repositories in existence, and it is being considered for other new digital library implementations. Krafft's presentation was significant in scope and impact, and provided attendees with multiple ideas for extending off-the-shelf Web 2.0 tools to integrate with metadata and structured content in a digital library framework.
Sustainability affects what happens over time to virtually all aspects of personnel and technology associated with ongoing repository operations. Stuart Haber began the Sustainability sessions at OR08 with a presentation entitled "A Content Integrity Service for Digital Repositories." He suggested that it is critical to be able to verify that a document, or piece of content, is what it claims to be. His system creates a "witness" for each piece of a document (a paragraph, sentence, or speech, for example). The "witness" then computes a certificate of authenticity that is coupled and stored with the original document.
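The idea of per-piece witnesses can be illustrated with a minimal sketch. This is not Haber's service; it simply assumes that a document is split into paragraphs at blank lines, that each witness is a SHA-256 digest of one paragraph, and that the certificate is the ordered list of witnesses plus a digest over them. The names `witness`, `Certificate`, `certify`, and `verify` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def witness(piece: str) -> str:
    """Return a SHA-256 digest standing in for the 'witness' of one piece."""
    return hashlib.sha256(piece.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Certificate:
    """Hypothetical certificate stored alongside the original document."""
    witnesses: tuple  # one digest per piece, in document order
    root: str         # digest computed over all witnesses together


def certify(document: str) -> Certificate:
    # Assumption: pieces are paragraphs separated by blank lines.
    pieces = [p for p in document.split("\n\n") if p.strip()]
    digests = tuple(witness(p) for p in pieces)
    root = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return Certificate(digests, root)


def verify(document: str, cert: Certificate) -> bool:
    """Recompute the certificate and compare it with the stored one."""
    return certify(document) == cert
```

Because each piece has its own witness, a mismatch can in principle be traced to the paragraph that changed, rather than only flagging the whole document as altered.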
Mary Marlino, director of e-Science and the NCAR Library and former director of DLESE, spoke frankly from personal experience about what to do when the email arrives informing you that your funding will not continue. She explained her planning for sustainability and asked that people think openly about what happens when and if a funding agency might "pull the plug." She introduced DLESE, the Digital Library for Earth System Education, one of the first big digital library projects in the U.S. DLESE was a grassroots, community-led project with a completed collection of 13,500 digital educational resources organized into 41 thematic collections. As an organization, DLESE was a focal point for community action in geoscience education and developed numerous best practices around building education-based digital libraries.
The digital artifact is now being sustained in partnership with several groups: NSDL, DLS, and NCAR. DLESE will remain a valuable educational resource and an important partner in NSDL. Although the community governance element of DLESE is no longer funded, Marlino hopes that it will remain very much a vital community focal point, and anticipates continued active dialog with the community of DLESE users.
Les Carr presented "End of Life Scenarios for the Repositories of Virtual Organizations." In giving this talk he hoped that he would not be known as the "man who burns repositories." The alternative title was "Or: who cleans up when the party ends?" His investigation focused on whether a repository could be sufficiently simplified into a low-cost static website once its community of users disappears along with their requirements for active services. A persistent and permanent repository, backed by policies and institutional commitment, implies collecting and curating over time; it is not intended to be a fly-by-night dumping ground, even though events, organizations and funding streams are always in flux.
How old is old? How persistent is persistent? Carr talked about venerable institutions like the University of Oxford, for example, which existed in 1096 and perhaps even earlier. Seats of learning are by their nature institutions that can be counted on to last. Repositories created by virtual organizations that come about as temporary collaborations between independent partners, on the other hand, may last about a decade or less. Virtual organizations often come into being as a part of grant activities without the benefit of clear institutional affiliation. Carr suggests that the "squillions of dollars" spent on international, highly collaborative projects may ultimately equate to longer repository lifespans. If the institution that backs the repository, however, disappears, contents are often tied up in administrative and resource allocation knots, leaving information consumers without access unless inexpensive solutions are available.
Around the Edges
Conference organizers worked with the JISC CRIG (Common Repository Interface Group) to design and support "The Repository Challenge." Nineteen small teams of developers competed during OR08 to achieve goals set by repository managers and the user community. The teams were hard at work into the night on each day of the conference. The OR08 web site states, "We think this (Repository Challenge) is the first time ever that an effort has been made to get rapid cross-platform international development happening in the same place and at the same time with some of the best repository developers in the world!"
Winners were Tim Brody, University of Southampton, and team members Ben O'Steen, University of Oxford, and Dave Tarrant, University of Southampton, who modeled layered Fedora Commons and DSpace t-shirts while accepting the award at Thursday evening's banquet. Their application was 'Mining the ORE' (Object Reuse and Exchange Protocol). More information about the Repository Challenge may be found in the JISC Information Environment Team blog.4 Interactive flip-charts from OR08's Repository Challenge teams that illustrate highly collaborative and creative engineering processes may be found in CRIG's Flickr photostream.5
Open Archives Object Reuse and Exchange (ORE) at OR08
The OAI-ORE UK Open Day,6 funded by JISC, was the main event during day four. Conference delegates were there to find out about the latest improvements to the third alpha release of ORE, "specifications that allow distributed repositories to exchange information about their constituent digital objects."
Herbert Van de Sompel, Research Library, Los Alamos National Laboratory, opened the meeting with background and motivations for creating a new resource-centric infrastructure. He was followed by Carl Lagoze, Computing and Information Science, Cornell University, who described the underlying data model and gave an overview of implementation in Atom. Michael Nelson, Computer Science, Old Dominion University, talked about discovery of resource maps. Community members Robert Sanderson, University of Liverpool; Jim Downing, University of Cambridge Center for Molecular Informatics; and Thomas Place, DARE and University of Tilburg, gave quick overviews of their experiments with ORE. During the afternoon session Simeon Warner, Computing and Information Science, Cornell University, focused on practical issues including proxies, relationships, lineage, nesting, and implementation in RDF/XML and RDFa.
Web artifacts are seldom single entities. An individual web page, for example, might be composed of multiple bits of text, images, and media in several formats that together form a unit of related information on the internet. While information managers are in the business of developing applications that collect sets of related information like web pages, ORE standards provide a way to describe and exchange whole "constellations" of complex aggregations of scattered and varied information using a widely applicable description of sets of objects.
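The notion of describing a set of related web resources as a single unit can be sketched roughly as follows. This is an illustrative data structure only, not an implementation of the ORE specifications: the class names and the `example.org` URIs are hypothetical, and the `describes` and `aggregates` terms are assumed from the ORE vocabulary namespace, which may differ from the alpha drafts discussed at OR08.

```python
from dataclasses import dataclass, field

# Assumed namespace of the ORE vocabulary terms used below.
ORE = "http://www.openarchives.org/ore/terms/"


@dataclass
class Aggregation:
    """A named set of web resources that together form one logical unit."""
    uri: str
    resources: list = field(default_factory=list)


@dataclass
class ResourceMap:
    """A document that describes exactly one aggregation."""
    uri: str
    aggregation: Aggregation

    def triples(self):
        """Yield (subject, predicate, object) statements for the map."""
        yield (self.uri, ORE + "describes", self.aggregation.uri)
        for resource in self.aggregation.resources:
            yield (self.aggregation.uri, ORE + "aggregates", resource)


# Hypothetical example: a web page made of text, an image, and a video.
page = Aggregation(
    uri="http://example.org/page#aggregation",
    resources=[
        "http://example.org/page/text.html",
        "http://example.org/page/figure.png",
        "http://example.org/page/clip.mp4",
    ],
)
resource_map = ResourceMap("http://example.org/page/rem", page)
```

Iterating over `resource_map.triples()` yields one statement linking the map to the aggregation and one statement per aggregated resource, which is the kind of description that could then be serialized in Atom or RDF/XML.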
Save the Date
The Fourth International Open Repositories Conference 2009 will be hosted by the Georgia Institute of Technology in Atlanta May 18-21, 2009. Check for more information as it becomes available at <http://openrepositories.org/>.
4. JISC Information Environment Team Blog, <http://infteam.jiscinvolve.org/2008/04/18/open-repositories-2008-2/>.
6. OAI-ORE UK Open Day, <http://www.openarchives.org/ore/meetings/Soton/agenda.htm>.
Copyright © 2008 Carol Minton Morris