Carol Minton Morris
Atlanta is a stylish mix of change and contrast. Attractions such as a sleek skyline, traditional gardens, the world's largest fish tank, and the biggest drive-in restaurant (est. 1928) set the stage for the Fourth Annual International Conference on Open Repositories (OR09). The Conference was hosted by the Georgia Institute of Technology, where 331 attendees from 23 countries came together to talk about repository technologies that have evolved to include lightweight community solutions and emerging interoperability standards. The rapid growth of this technologically savvy southern city is matched by the rapid development and deployment of multiple significant open source and enterprise repository technologies and initiatives during the last year. From presentations about end-to-end repository solutions designed to preserve significant national and international cultural and scientific resources, to scholarly works and data, simple applications, sustainability strategies, cloud storage solutions, and on-the-spot developer breakthroughs, OR09 was a striking success.
This report is a "one-part-of-the-elephant" look at what happened. For a more complete view please visit the SMARTech Georgia Tech Institutional Repository where general session presentations, user group session presentations and posters are now available. Another interesting way to grok the Conference is to scan the stream of tweets tagged #OR09. A Twitter search reveals hundreds of tweets that contain additional context about formal and informal conference proceedings and gatherings. Here are additional OR09 reports:
A Few Facts
Of the 156 respondents who completed the OR09 Conference Survey, 37% identified themselves as "developers," 78% identified their workplace as "academic," and 35% have been in this field for 5 years or less. Almost all respondents liked the venue, including the excellent Georgia Tech wi-fi services and after-conference events, as well as the types and duration of presentations. Comments included a call for more web broadcasting of OR sessions and publication of conference papers that would make it possible for the community to access content without attending, as many institutions' travel budgets are being cut.
John Wilbanks, VP of Science, Creative Commons, said that knowledge needs a linked Web future in his talk, "Locks and Gears: Digital Repositories and the Digital Commons."
His view on intellectual property is that "the rental problem" means that things got less free, not freer, as the Internet emerged. "Renting copies of journals does not allow you to build a service," he said. This issue is complicated by operating system software that people don't know how to use, which is already part of stable systems that resist dramatic change.
He believes, "Our culture, our law, our technology must come together to make this work." To get there, Wilbanks suggested three action items going forward: solve locally, share globally; (use) standards, standards, standards; and lead by example.
On days one and two of the Conference, developers took part in a Repository Challenge competition centered around "Developer Lounge" gatherings, where prototypes and potential new ideas that demonstrated "the future of repositories" were discussed. To participate, competitors submitted 5-minute recordings of ideas and prototypes. The UK's Joint Information Systems Committee (JISC) and Microsoft awarded $2,000 each to the first-place (Tim Donohue, Mention It) and second-place (Rebecca Koeser, FedoraFS) winners. More information about the winning entries is available at JISC dev8D Repo Challenge Winners.
The four-day conference was packed with general sessions, DSpace, Fedora Commons and EPrints user group sessions, a DuraSpace reception, a poster reception, a banquet at the Georgia Aquarium, a developer challenge and awards ceremony, hallway chatter, intense collaborative "side" meetings, and so many Birds of a Feather sessions that they began to look like a mini-conference-within-a-conference.
Jeremy Frumkin opened one track of the general sessions with an overview of Global Registries Initiative work aimed at combining registry and repository technologies into an interoperable global network of registries. He said that while the potential for such a service is significant, forcing everyone to come to one set of standards is a non-starter. He suggested that content might be harvested locally and aggregated nationally or globally for a global registry strategy to evolve. He demonstrated this type of use case with the Ockham Digital Library Registry Service using the NSDL Data Repository content as an example.
"If we have author identifiers then we have a join point, a way to connect data between repositories," said Simeon Warner in describing the open, low cost, subject-spanning, "roll your own" author identifier system implemented at arXiv. Warner argued that the arXiv solution acknowledges the reality of collaborative work that is controlled in a repository by balancing privacy, attribution, and authority issues. arXiv is owned, operated and funded by the Cornell University Library.
In October of 2008, arXiv surpassed the 500,000 contributed articles mark. In spite of this ongoing success, submission to arXiv, like other repositories, is not painless. Warner explained that building simple yet strategic services on top of Atom streams is one way to promote the use and reuse of arXiv content. A new arXiv Facebook application allows users to gather all their arXiv articles and share them on their Facebook page. Since the service was launched in April 2009, 500 users have downloaded the application and have published 140 stories.
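As a rough illustration of the kind of lightweight service that can sit on top of an Atom stream, the Python sketch below reads an Atom feed and lists each entry's title and link. The sample feed, its titles and URLs, and the function name `list_entries` are all made up for illustration; this is not arXiv's actual feed or the code behind the Facebook application.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content; a real service would fetch this over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example author feed</title>
  <entry>
    <title>First example article</title>
    <link href="http://example.org/abs/0001"/>
  </entry>
  <entry>
    <title>Second example article</title>
    <link href="http://example.org/abs/0002"/>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return (title, link) pairs for every entry in an Atom feed string."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip()
        link = entry.find("atom:link", ATOM_NS)
        # An entry without a link element yields an empty string.
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return entries


if __name__ == "__main__":
    for title, href in list_entries(SAMPLE_FEED):
        print(f"{title}: {href}")
```

A list like this could then be handed to whatever sharing or display layer a service wants to build.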
Many Lightweight Views into Complex Repository Content: Enabling Rapid Application Development for Fedora Repositories
Matt Zumwalt believes that complex content will always be relevant. He said, "We want the data to outlast the software trimmings in the repository world. The open data movement is more important than the open software movement." He suggests that, for users to make use of content, what is required is a task-specific scope that allows many different people to have many complex views on data.
To accomplish this goal, he advocates not getting bogged down in implementing complicated software but rather looking for lightweight solutions such as Ruby-on-Rails Active Fedora, which creates lightweight Fedora-based applications. Zumwalt suggests that this approach avoids the Star Wars Death Star syndrome: "We don't need to complete the Death Star. It never got finished and then got blown up anyway."
To reach the goal of faculty participation in populating research repositories, Michael Witt suggests, "Knowing something about faculty research before talking tech." He reported on Purdue University and the University of Illinois, Urbana-Champaign investigations into "Which researchers are willing to share data, when, with whom, and under what conditions?" This formative research comes from an ongoing study based on 2 interviews with 20 faculty members who produce data in a variety of research domains.
Characterizing his University of Prince Edward Island (UPEI) Library technical team as "The crazy Canuck repository team," Mark Leggott, University Librarian at UPEI, asked the audience to consider how much money they have on hand to develop a repository research platform. UPEI is the only college in the smallest Canadian province, and yet research funding there has increased ten-fold in the last 10 years. Leggott claims that leveraging research to pay for infrastructure that supports multiple campus initiatives, such as digitizing all of the island's cultural heritage, "is going to be big for the library into the future."
UPEI has developed a strategic research plan to enlist the University in making a commitment to the entire research lifecycle with a single extensible hardware platform that amounts to providing a campus-wide "cloud-like facility."
Virtual Research Environments (VREs) based on Islandora, a platform that joins Drupal and Fedora, now enable over 50 research groups at UPEI; 30 are in active use at various levels. The current VRE research focus is on bioscience, but other scientific disciplines are emerging, as well as some implementations in the humanities.
Leggott concluded by inviting attendees to the Second Annual Red Island Repository Institute at UPEI July 20-24, 2009, where Chris Wilper, Thorny Staples and Matt Zumwalt (among others) will offer hands-on Fedora training.
Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons
Wendy White suggested that embedding repositories in the knowledge management process of an institution is a key incentive for building the social capital that will lead to greater participation in developing IRs.
John Kunze and Sayeed Choudhury gave two views of the NSF DataNet Program in "NSF DataNet: Curating Scientific Data." The program aims to address research and development challenges associated with developing and sustaining data curation infrastructure.
In his presentation, "DataONE (Observation Network for Earth): Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science," John Kunze pointed out that there are many complexities in the notion of changing global research. Data are massively dispersed, individual and citizen scientists are coming to different conclusions, integrating data sets takes a lot of work, data are subject to loss from unpredictable events (fire, flood, etc.), and we are running out of the ability to store all the data we are collecting in an expanding digital universe.
The DataONE model envisions diverse data nodes that feed into coordinating nodes that retain metadata catalogs with lots of redundancy, investigator tool kits and the ability to grow this network into many other nodes. This concept is a kind of non-repository, says Kunze: "With a micro-services approach to data preservation, preservation is not a place but rather a concept."
Sayeed Choudhury's presentation was entitled The Data Conservancy: A Digital Research and Curation Virtual Organization. This is a different kind of model from NSF's perspective, reported Choudhury. The Data Conservancy (DC) holds the view that scientific data sets are the new library material and will build on exemplar scientific projects with U.S. and international partners.
Choudhury asked the audience, "What do we mean by data curation?" He suggested that this activity is not an end, but a means, and not a rigid map, but a set of principles of navigation supported by systems coupled with infrastructure. "I have been thinking a great deal about modularity in technical architecture," he said. The overarching goal of DC is to support new forms of inquiry and learning to meet these challenges through the creation, implementation, and sustained management of an integrated and comprehensive data curation strategy.
Sandy Payette and Michele Kimpton announced that DSpace and Fedora, two of the largest providers of open source software for managing and providing access to digital content, have joined together to create DuraSpace, a three-pronged effort that will take the vision of combining business strategies with markets, products and community-driven open source software and services development into a start-up not-for-profit organization. DuraSpace will serve libraries, academia and knowledge-based organizations in the creation, management and preservation of digital content. Synergies between the DSpace and Fedora Commons organizations and communities made it possible to act on this vision as combined, like-minded organizations.
Payette emphasized that the new organization would be mission-centric and focused on developing software and solutions that best serve the needs of the broader scholarly community. The new organization will offer a diversified portfolio of strategies to forward the mission beyond specific repository technologies. Key goals are to support and look for synergies within the community, and to increase interoperability.
DuraSpace is an umbrella organization over product lines that will include DuraCloud, Fedora Commons, DSpace, and Mulgara. Development and support for each product will be led by key staff members who will be responsible to the community for each product's evolution.
Sustainability issues require a disciplined look at creating a balance between for-free and for-pay services. Les Carr explained that the EPrints team is able to deploy their platform to educational and open access communities by making sure that software development does not overshadow broader goals of Open Access to research and support for developing Web-enabled effects on scholarly and scientific institutions.
Saying that his group is focused on working with the academic community to make the scholarly communication process visible and accessible, Lee Dirks, Microsoft External Research, introduced Zentity, the Microsoft Research Output Platform. He emphasized that this "open edge" technology is not a revenue-generating plan for Microsoft; it is a value-add to software that most institutions have already purchased. Zentity takes advantage of the fact that many academic institutions have licensed Microsoft software and extends the functionality of that license in support of scholarly communication.
iRODS (Integrated Rule-Oriented Data System) data grid software interoperates with Fedora and promises to minimize the amount of labor needed to maintain a collection with regard to recovery and preservation of data. The iRODS storage module for Fedora can be a replacement for the Fedora local storage module or a standalone plug-in module independent of Fedora.
Many opportunities exist for the integration of iRODS with Fedora software. One example is the National Science Digital Library (NSDL) project in which massively archived web pages can be organized hierarchically in iRODS while Fedora manages NSDL information repositories such as the harvest system.
Naming, Branding and Promoting the Institutional Repository: A Social Marketing Approach from the Canadian Perspective
When it's built and ready for deposits, who will add content? Wayne Johnston has found that naming and branding institutional repositories using some common social marketing practices can make the difference in whether or not an IR is adopted and used. He discussed examples that included the Canadian Association of Research Libraries and the University of Guelph Library.
Northwestern University Library developed one of the oldest Fedora repositories to power the Encyclopedia of Chicago. In this presentation Bill Parod, Karen Miller, and Claire Stewart explained a mature workflow built on top of a robust repository that allows staff and users to interact with rich and varied collections in an iterative development process.
Alexey Maslov presented a strategy for mapping the OAI-ORE data model to the DSpace architecture, resulting in a "specialized and simple" method for building flexibility into Texas Digital Library workflows.
DuraCloud, a planned service of DuraSpace, aims to provide durable data in the compute cloud and was partially inspired by the special issue of Communications of the ACM, "Surviving the Data Deluge." Sandy Payette argued that placing the responsibility for data curation on cloud providers is not fair. There is a mission in being accountable for the public good in guaranteeing durable data. This is where the idea came from for DuraCloud, a Web-based service that makes stored digital content more durable, manageable, accessible, and easier to share.
Preservation is hard; even replication is not easy for everyone. Requirements for a DuraCloud service include: replication, easy and elastic provisioning of shared infrastructures, data mining and analysis facilities, and easy access to usable data. Virtualization has driven costs down, and DuraCloud builds on this.
Michele Kimpton explained that the initial service offering would include replication to up to three providers, a Web-based dashboard, data integrity checking and monitoring, the ability to push content from DSpace and Fedora repository platforms via plug-ins, a pay-per-use cost model and initial compute services on content.
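To make the replication and integrity-checking idea concrete, here is a minimal Python sketch that copies one item to three storage providers and then verifies each copy against a SHA-256 checksum. It is only an illustration under stated assumptions: `InMemoryProvider`, `replicate`, and `check_integrity` are hypothetical names invented here, the providers are plain in-memory dictionaries, and none of this reflects how DuraCloud itself is implemented.

```python
import hashlib


class InMemoryProvider:
    """Hypothetical stand-in for a cloud storage provider."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects.get(key)


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def replicate(key, data, providers):
    """Store data with every provider and return the checksum to verify against."""
    for provider in providers:
        provider.put(key, data)
    return sha256_hex(data)


def check_integrity(key, expected_digest, providers):
    """Return the names of providers whose copy is missing or does not match."""
    failed = []
    for provider in providers:
        copy = provider.get(key)
        if copy is None or sha256_hex(copy) != expected_digest:
            failed.append(provider.name)
    return failed


if __name__ == "__main__":
    providers = [InMemoryProvider(n) for n in ("provider-a", "provider-b", "provider-c")]
    digest = replicate("report.pdf", b"example content", providers)
    print(check_integrity("report.pdf", digest, providers))
```

A monitoring job could run a check like `check_integrity` on a schedule and re-copy from a good replica whenever a provider is reported as failed.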
DuraCloud is an open core service (core components will be available for others to build on and use) that will be licensed under the Apache open source license. The first two DuraCloud pilot partners are the New York Public Library and the Biodiversity Heritage Library.
In explaining the University of Southampton's hybrid model approach for distributed repository storage, Dave Tarrant said, "Clouds blow away." He argued for taking advantage of the best of local, archival and cloud options.
Supporting the 'Sharing Institution' - Practical Steps towards a More Open Teaching and Learning Culture
Jessie Hey and colleagues are creating a digital learning culture at the University of Southampton that could potentially pinpoint and utilize learning resources from global distributed repositories. EdShare is currently a vehicle for sharing educational materials more easily in a multi-disciplinary institution to support teaching workflows in an open academic atmosphere.
Richard Cave is passionate about transforming scientific journal publishing and has developed Topaz, an Open Source content modeling and storage framework that uses the Fedora Service Framework and Mulgara semantic technology as the core engine, and Ambra, a publishing application built on the Topaz framework.
The aDORe archive holds 100M bibliographic records, and weekly feeds add thousands of new ones. aDORe djatoka scales to manage this volume.
The Fifth Annual International Conference on Open Repositories - OR10
Next year's OR conference is scheduled to be held in Europe during the spring. Dates and location will be announced later this summer at <http://openrepositories.org>.
Copyright © 2009 Carol Minton Morris