Volume 19, Number 1/2
2012 CNI Fall Membership Meeting: Scholarship for the Future
Carol Minton Morris
The Coalition for Networked Information (CNI) Fall Member Meeting held on December 10-11, 2012 in Arlington, Virginia included presentations that explored how to manage, maintain, share, identify, preserve, make use of and build global communities around all types of digital content with a focus on research data.
The term "digital library" moved into the common lexicon during the 1990s. Back then intellectual property issues centered around providing some access to metadata and documents online. In 2013 there is a desire for granular control of content down to the level of individual data sets and single images coupled with the ability to provide open access and long-term preservation for large tracts of research data and all types of digital content. At the Coalition for Networked Information (CNI) Fall Member Meeting held on December 10-11, 2012, Clifford Lynch, CNI Executive Director, began with an overview of strategic CNI activities in which he reminded the audience that "Scholarly practice does not stand still" (Recording available: "CNI's Perspective on 2012 & 2013", Clifford Lynch). His remarks set the stage for presentations that explored how to manage, maintain, share, identify, make use of and build global communities around all types of digital content while providing access and the ability to preserve the public record of research and scholarship. Nimble strategies, technologies, and opportunities for networking and experimentation were highlighted in the proceedings.
A selection of project briefings appears below. The full schedule of events showing all topic areas covered can be found here.
Digital Preservation Network (DPN) Update
James Hilton, University of Virginia, reviewed the progress 60 institutional members have made towards establishing a federated preservation network, owned by and for institutions of higher education. DPN management will focus on continual validation to ensure that content can be replicated in case of disaster. The network is built from contributing nodes (local repositories), which ingest content, and federated replicating nodes, which serve as long-term digital preservation repositories for the contributing nodes.
An attendee posed a question about DPN as a US-based initiative. There was interest in where copyright lies because of the differences between US copyright laws and those of the European Union.
Hilton explained that the DPN philosophy is to "Line 'em up and start solving them" as replicating nodes become operational while acknowledging that real storage and policies have an inherent amount of complexity.
Another comment suggested that Internet 2 could serve as an operational model for DPN: I2 "fuels discovery and DPN preserves it".
Hilton feels that an immediate DPN benefit is in helping to facilitate inter-institutional conversations about our capacity to preserve scholarship. He mentioned the E-Science Institute (see below) as an example of an initiative that forces us to ask local questions about the future of scholarship, something that is broader than a single institution.
E-Science Institute: An Approach to the Challenge of Digital Research
Mackenzie Smith, University of California, Davis, explained that the ARL/DLF/DuraSpace E-Science Institute (ESI) was designed as a hands-on course to help academic research libraries match their strengths and natural abilities to all aspects of the data lifecycle. The terms "E-Science" and "E-Research" were commingled in this approach, in which participants did research on their campuses to come up with strategic agendas for their institutions and then exchanged ideas during the course. Taking a wide view of the data lifecycle offered participants a variety of ways to think about moving towards support for research and data preservation.
ESI was first offered in 2011 because demand for help in this area from research libraries had grown. The fall 2012 offering of the DuraSpace/ARL/DLF E-Science Institute was conducted online and in person, with a face-to-face capstone event that concluded on Dec 13 in Arlington, Virginia. To sign up to receive information about future offerings, visit the DuraSpace ESI Contact Form.
The Research Data Alliance: A Forum for Global Cooperation on Data Infrastructure
In introducing the motivation behind the Research Data Alliance (RDA), Fran Berman, Rensselaer Polytechnic Institute, explained that people have discovered data not just as a foundation; it is everything that's going on and represents a competitive advantage. Further, she pointed out that in creating access to data for research purposes "It's other people's data that matters." No single data policy will work for everyone, but failing to understand the urgency of managing and preserving data, waiting for standards, or relying on bad infrastructure is not a solution. The Research Data Alliance is a new organization that aims to jump start global data exchange with code, tools, and best practices.
Berman reminded the audience that to move forward it's important not to "point the guns inward". She warned, "If we're trying to boil the ocean then there's nowhere to go." She invited attendees to participate in the first RDA plenary meeting March 18-20 in Gothenburg, Sweden.
Academic Preservation Trust (APTrust)
Robin Ruggaber, University of Virginia, explained that the APTrust is a critical piece of emerging preservation infrastructure because our data is in jeopardy. Lack of diversity in geographic regions where data is kept, unexpected weather events and even lagging political support can cause fragile scholarly resources to disappear. It is not in the commercial interest to preserve content, leaving it up to higher education, and in particular libraries, to preserve resources for scholarship over the long-haul. That is why a consortium of academic institutions committed to the creation and management of academic and research content for multiple institutions came together to form the APTrust.
Twelve current partners make up the APTrust "Core implementation team" who agree that a successful effort will take more than storage solutions and technology. To make the APTrust sustainable, community building, business planning and marketing will be part of the effort.
Andrew Woods, DuraSpace, answered the provocative question, "What do you get from the APTrust?" In the context of ensuring that content survives into the future, APTrust partners gain access to redundant copies of their content, the ability to leverage economies of scale and generate audit reports, and access to an aggregated repository that will take advantage of DuraCloud technology for consortial work on top of the collection.
Woods also gave attendees a deep dive into the technical infrastructure. APTrust software will roll out in phases with an end-to-end data flow tie-in to DPN (Digital Preservation Network). The data flow works as follows:
- Local partner IRs are represented at the top layer in "3 flavors of repositories".
- Ingest packages and associated metadata are held in DuraCloud "Staging areas".
- When an ingest submission is complete and validated it then moves into a preservation space.
- Instant and ongoing bit checking along with the ability to store content with multiple providers will be available through DuraCloud.
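The staging, validation and bit-checking steps above can be illustrated with a minimal Python sketch. It is not APTrust or DuraCloud code: the `Store` class, the `promote` and `fixity_check` functions, the in-memory storage and the choice of SHA-256 manifests are all assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a staging area or storage provider."""
    name: str
    files: dict = field(default_factory=dict)


def promote(package: dict, manifest: dict, staging: Store, providers: list) -> bool:
    """Stage a package, validate it, then copy it to every preservation provider.

    `package` maps paths to bytes; `manifest` maps paths to expected checksums.
    Nothing reaches the preservation providers unless every manifest entry is
    present in staging and matches its checksum.
    """
    staging.files.update(package)
    for path, expected in manifest.items():
        if path not in staging.files or sha256_hex(staging.files[path]) != expected:
            return False  # incomplete or damaged submission stays in staging
    for provider in providers:
        for path in manifest:
            provider.files[path] = staging.files[path]
    for path in manifest:
        del staging.files[path]  # staging is cleared once content is preserved
    return True


def fixity_check(provider: Store, manifest: dict) -> list:
    """Return the paths that are missing or whose bytes no longer match the manifest."""
    return [
        path
        for path, expected in manifest.items()
        if path not in provider.files or sha256_hex(provider.files[path]) != expected
    ]
```

In this sketch, a scheduled job could call `fixity_check` against each provider and restore any reported path from a provider whose copy still validates, which is the practical benefit of storing content with more than one provider.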
The APTrust is looking for cost-effective disaster recovery, access services, long-term preservation, tools and best practices and hosting a portal on top of collections to enhance preservation strategies and opportunities for partners.
Building an Archival Identity Management Network: Transforming Archival Practice and Historical Research
Presenters offered views on related identity management systems and issues: Social Networks and Archival Context (SNAC) and Building a National Archival Authorities Infrastructure.
SNAC (2010-2014) highlights records of interest on the prototype homepage. Developers aim to find out if this kind of "Facebook for dead people," comprised of authority records pieced together from unstructured snippets of personal histories and events, is of general interest to a wide swath of users from 'high tech' to 'generally interested' people. The curation of archival records generated while people are living and working is the cornerstone of our cultural record.
The system leverages the Resource Description Framework (RDF) to bring this data together. Brian Tingle, California Digital Library, reminded the audience that "Lenny the Linkhead" wants linked and open data technical details while "Randy the Researcher" wants to find links, documents and relationships among researchers. The SNAC graph database focuses on use cases rather than on "split bits of data." SNAC wants to establish a cooperative community organization to gather and curate the raw stuff of research and to develop historic narratives, so that users can find primary resources and discover a person's life and work in one place.
Plans are for NARA to host the administration of the cooperative including business and governance. Technical infrastructure will be developed and hosted outside of NARA.
Using the Cloud for Backup, Storage, and Archiving: Decision Factors, Experiences, and Use Cases Explored
DuraCloud is a DuraSpace service that allows users to store, manage and archive data in the cloud. There are currently 4.8 million files and 31 terabytes of data in DuraCloud. Over the last year 6 versions of the DuraCloud open source software were released with 11 new major features that included SDSC integration and automated health checking. All features are rolled up into the open source code so that anyone may freely download a full version of DuraCloud software.
Geneva Henry, Rice University, uses DuraCloud as part of an overall digital preservation strategy at her institution. All DSpace repository contents are stored in DuraCloud, including digitization masters. Rice University was an early DuraCloud pilot institution. They found that they could easily store and manage faculty publications, digital projects, archives and special collections, theses and dissertations and more. Before content is stored in DuraCloud, files are packaged in archival information packages (AIPs). Capturing preservation metadata that reflects everything known about a record ensures resource provenance into the future. They are able to recreate their repository from what is stored in DuraCloud. Many questions around digital preservation do not have single answers. Henry believes that it is critical to be part of, and contribute to, ongoing experiments and new initiatives.
Holly Mercer, University of Tennessee, is using DuraCloud in the context of an overall digital preservation assessment focused on static collections. They have established a production workflow that sends content to local storage, syncs it to Amazon S3 and SDSC, and finally uploads it to DuraCloud. They have been pleased that DuraCloud maintains their institutional repository's hierarchical structure.
Mark Leggott, UPEI, Islandora and Discovery Garden, oversees 100+ institutions that are using the Islandora software. UPEI was also a pilot DuraCloud institution. The focus of their work with DuraCloud has been to pursue integration with the Islandora framework. DuraCloud has been added to the Islandora stack, aiming for a "single button disaster recovery" approach. They have five clients using Islandora and have built in "The Vault" to back up and preserve users' content with DuraCloud. The DuraCloud sync tool is running in continuous mode in the Islandora stack. Leggott is impressed with the quality of the software that DuraSpace creates and has found that if you have a corrupt file in Islandora you can use DuraCloud as a recovery mechanism.
The Future of Fedora
This session introduced a community initiative to make Fedora "The repository platform for the future." The project is called "Fedora Futures." The group came together in the last 5 months to undertake a 2-3 year software development project that will direct new resources toward a major Fedora overhaul.
Jonathan Markow, Chief Strategy Officer at DuraSpace, explained that as direct grant funding for Fedora development diminished, DuraSpace took on a stewardship role in organizing the community around maintaining and improving the software. While this strategy enabled several significant releases, the user community was left wondering why larger and more fundamental development issues were not being addressed. The simple answer is that available funding was not enough to pay for needed improvements. As a result the core Fedora code has become more difficult to maintain and cannot keep up with emerging current technical requirements.
Mark Leggott explained that Fedora Futures development would begin with two stages. Use cases will be gathered and details of a roadmap will be developed with approval by the Steering Committee. The Technology Team will take the lead in coordination with other standing committers. Raising funds to support the project will be led by a committee chaired by Jonathan Markow. Eddie Shin, MediaShelf, is leading the Technology Team effort to define the project. MediaShelf has been a large contributor to the Fedora code base.
Tom Cramer, Stanford, adopted Fedora in 2009 and is looking to establish a digital repository platform that works "for all of us" for the next 5-10 years. Fedora must work for the full range of institutions, be part of an ecosystem, be able to "go native" in the world of linked data, support existing IRs and provide new data management functions.
In taking a look at Fedora use case actors (those types of individuals who would interact with the software), Matthias Razum, head of EScience from FIZ Karlsruhe, asserted that there are four key roles: Curator, Administrator, Researcher and Developer. High level technical requirements that would be significant to all actors are:
- improved scalability and performance
- more flexible storage options
- support for dynamic metadata
- globally unique and reliable identifiers
- improved audit trail/capturing events
- single point of management for all storage systems.
Jonathan Markow extended an invitation to the entire Fedora community to participate in the effort. "The Futures group has catalyzed renewed development for Fedora, but for the effort to achieve its full potential for all stakeholders, we need to enlist not only Fedora's current committers, but also all adopters, sponsors and service providers." A prospectus is available to those who would like to get involved as contributors to this effort by contacting Jonathan Markow <email@example.com>. University of Virginia, Discovery Garden and University of Prince Edward Island, Stanford University, Columbia University, Oxford University's Bodleian Library, FIZ Karlsruhe, and MediaShelf have already pledged substantial resources to the project.
Closing Plenary & Looking Ahead
Hunter R. Rawlings III, President of the Association of American Universities and former president of Cornell University and University of Iowa, offered closing plenary remarks that centered around the question, "What is college for?" as the impact of massive open online courses (MOOCs) is beginning to be felt by higher education institutions. (Recording available: "Massive Online Courses As Drivers for Change", Lynne O'Brien, Duke University).
Rawlings contends that it's "Up for grabs". The traditional value proposition (getting an education to "Learn how to learn" by reading and analyzing complex literature and studying math and science) is now a distant second behind "Doing what it takes to get a job." Current educational policy initiatives support this view. Some states now require that university degrees cost no more than $10K and that majors be "sorted" by immediate earning power. These policies reduce getting an education to a utilitarian number. (Recording available: "What is College For? The Future of Higher Education", Hunter R. Rawlings III).
Information technology has significantly changed access to scholarly journals. Rawlings sees the need for development of public policy in this contentious arena. Publishing in general is in a state of flux as libraries purchase fewer monographs, books, journals and other materials. University leaders are looking for a "Novel means of solving this development". IT is also playing an essential role in MOOCs. More than 100K students are registered for some courses. Rawlings suggested that large-scale online education represents a major shift in the delivery of higher education.
For slides and recordings of CNI Fall Member Meeting 2012 presentations as they become available please check the CNI web site. The Spring 2013 CNI Membership Meeting will be held on April 4-5 (Thursday and Friday) at the Westin Riverwalk in San Antonio, TX. The call for proposals is open until February 21, 2013.
About the Author
Carol Minton Morris is Director of Marketing and Communications for DuraSpace, and is past Communications Director for the National Science Digital Library (2000-2009) and Fedora Commons (2007-2009). She leads editorial content and materials development and dissemination for DuraSpace publications, web sites, initiatives and online events, and helps connect open access, open source and open technologies people, projects and institutions to relevant news and information. She was the founding editor of NSDL Whiteboard Report (2000-2009) featuring information from NSDL projects and programs nationwide. Follow her at http://twitter.com/DuraSpace.