Carol Minton Morris
Baltimore, MD Bromo-Seltzer was invented in this town by Captain Isaac Emerson. To celebrate his tummy-taming elixir he built a clock tower in 1911 that was intended to look like the Palazzo Vecchio in Florence, Italy. The Baltimore version included a marvelous 51-foot lighted blue rotating bottle on top.
The big blue bottle is long gone, but the Bromo-Seltzer Tower remains. Preservationists, archivists, librarians and technology specialists might argue that the blue bottle has gone the way of other valuable parts and pieces of our cultural heritage as increasing amounts of digital information and data threaten to overrun the institutions whose job it is to preserve knowledge into the future. At the SPARC Institutional Repositories (SPARC IR) Conference and the Sun Microsystems Preservation and Archiving Special Interest Group (PASIG) meeting, attendees reviewed new solutions and grappled with the thorny issues around creating technology and policies to support durable knowledge for future generations in an era of burgeoning information.
SPARC IR, November 18
Living in a land of "plenty o' information" has caused libraries and institutional repositories from all parts of the world to examine policies, look for economies of scale, find new ways to disseminate intellectual products to provide greater service with public funds, and consider profit making initiatives to fund access to scholarly resources.
In SPARC IR morning sessions, David Prosser, Director, SPARC Europe; Syun Tutiya, Chiba University, Japan; and Bonnie Klein, Defense Technical Information Center, USA, presented different views of the legal and open access policy environments around access to data and information in their respective regions.
European information policy makers are interested in leading the charge towards making materials open and available as a way to stimulate economic development. The Berlin Declaration in Support of Open Access, for example, now has 255 signatories worldwide, including signatories from Germany, France, Austria, Sweden, China and others. The Wellcome Trust independently mandates deposit of the biomedical research it funds, as UK law does for publicly funded research; "any original research paper" is required to be deposited. Other European research organizations have also begun putting policies in place to require deposit of papers.
Japan is concerned with getting the technology right prior to instituting overall policy. Tutiya reported, "No policy is our policy." Assessment and establishing industry/society relationships are two aspects of how ideas around Open Access policies are evolving in his country. Japanese information managers are working towards being able to harvest metadata nationally, and they are concluding that "environments [for data and information] are more important."
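National metadata harvesting of the kind Tutiya described is typically built on the OAI-PMH protocol. The sketch below shows the two basic moves, issuing a ListRecords request and pulling titles out of the response; the endpoint URL and sample record are hypothetical stand-ins, not any real Japanese service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def extract_titles(response_xml):
    """Pull Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.iter(f"{{{DC_NS}}}title")]

# A minimal, hand-written sample response for illustration:
sample = f"""<OAI-PMH xmlns="{OAI_NS}">
  <ListRecords><record><metadata>
    <dc xmlns="{DC_NS}"><title>Example Paper</title></dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repo.example.jp/oai"))
print(extract_titles(sample))  # ['Example Paper']
```

A national aggregator repeats this request against every registered repository and merges the harvested records into one searchable index.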
CENDI is an interagency group of United States federal agencies whose work involves managing repositories, information centers and the U.S. Government Printing Office. All 13 CENDI agencies play a role in addressing science- and technology-based national priorities; 26 agencies fund about 1,000 grant programs.
Since 2001, online public information facilities such as Science.gov have been paid for out of pocket by participating agencies to provide greater citizen access to basic science research results. Version 5 of Science.gov has just been launched, and it includes a federated search over many federal repository databases.
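The federated search behind a portal like Science.gov fans a single query out to many source databases and merges the ranked results. This sketch illustrates the pattern with two stand-in source functions; the names, scoring and result shape are illustrative, not the Science.gov API.

```python
from concurrent.futures import ThreadPoolExecutor

def search_source_a(query):
    # Stand-in for one agency database.
    return [{"title": f"{query} overview", "score": 0.9, "source": "A"}]

def search_source_b(query):
    # Stand-in for a second agency database.
    return [{"title": f"{query} dataset", "score": 0.7, "source": "B"}]

def federated_search(query, sources):
    """Query every source in parallel, then merge and rank the results."""
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda s: s(query), sources)
    merged = [r for results in result_lists for r in results]
    return sorted(merged, key=lambda r: r["score"], reverse=True)

hits = federated_search("photovoltaics", [search_source_a, search_source_b])
print([h["source"] for h in hits])  # ['A', 'B']
```

The design choice is that no central index is built: each source stays authoritative for its own records, which is why such portals can span dozens of repositories without duplicating their content.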
WorldWideScience.org is an even larger collaborative effort. It was launched during the summer of 2007 with 15 partner countries. The "fundamental research" represented at World Wide Science can be defined as basic and applied research in science and engineering, the results of which are ordinarily published and shared broadly; about 70% of U.S. research falls under this category.
An ongoing discussion is how to structure "interim" scientific reporting. Early results are not always peer-reviewed and can include spotty data types and formats. Only 53% of grantees thought that early posting on a government web site was a good idea; concerns included premature disclosure of inventions and the fact that scholarly journals view web site postings as a form of "publication."
Klein concluded, "[US] Government agencies feel that they do not have the right to mandate Open Access."
Later in the morning of November 18, "Campus Publishing Strategies" sessions focused on using and understanding the evolution of scholarly publishing as a platform for scholarly discourse, as well as a process for developing intellectual products. Moderator Richard Fyffe, Grinnell College, reminded the audience that Clifford Lynch, Executive Director, Coalition for Networked Information (CNI), had cautioned against losing the role of institutions in establishing educational repositories.
Rea Devakos, Coordinator, Scholarly Communication Initiatives, University of Toronto Libraries, presented work on the Synergies Project, a national platform for scholarly publishing in Canada focused on the social sciences and humanities. She explained, "Context has been added as a fundamental characteristic of information." The Synergies consortium consists of five core member institutions led by Université de Montréal, and 16 regional partners. A statement on the project web site reads: "In bringing Canadian Social Sciences and Humanities research to the internet, Synergies will not only bring that research into the mainstream of worldwide research discourse but also it will legitimize online publication in Social Sciences and Humanities."
Devakos added that the University of Prince Edward Island (UPEI) will host the Access 2009 Conference, billed as "The premier library technology conference." UPEI recently released "Islandora," an open source framework that combines a Fedora digital repository with a Drupal front end, providing an easy way to build and manage repository content for use by web sites.
Catherine Mitchell, Director, eScholarship Publishing Group, California Digital Library, University of California, opened her remarks by suggesting that we stop talking about repositories. In her view, quantification standards, such as size and number of downloads, do not work because the comparative scales of institutions are so different. There are ten large campuses in the University of California system, for example. In spite of the enormous overall size of UC holdings and usage, she views the lack of visibility of e-scholarship and the associated lack of incentives for participating in open access deposit efforts as impediments to moving forward towards creating robust open access repositories.
The eScholarship model is to establish marketing campaigns at each institution, facilitated by an outreach and marketing coordinator who in turn creates user groups within each institution in collaboration with e-scholarship liaisons and local site administrators.
"Interface design," Mitchell said, "is the elephant in the middle of the room." Their new web site design is very simple. She said, "The homepage is a place where people seldom go, and it must be [designed] as a marketing tool."
How scholarly publishing models translate to small colleges and universities involves looking into the strategic reasons that the library might be involved as a publisher, according to Janet Sietman, Digital Commons Project Manager, and Teresa Fishel, Library Director, Macalester College. Located in St. Paul, Minnesota, Macalester has a population of 1,900 students, 57% of whom come from outside the Midwest. Macalester has 164 faculty members and 19 library staff members. The point of convergence with regard to how to provide scholarly publishing efforts on campus was in "Services."
Macalester's institutional repository is a showcase for student research and publications, and it provides visibility for faculty scholarship. Additionally, new opportunities for libraries were assessed as part of the IR planning process. The library saw that by applying traditional library cataloging, selection, and content management skills, they could begin to address the need for institutional change. These e-activities added value over time. They also discovered ways to leverage materials through external web services (Google services).
They expect a 400% increase in downloads from DigitalCommons@Macalester next year, as interest in born digital journals that are published from their repository grows. Faculty at this small institution see heightened visibility of research materials as an advantage going forward.
SPARC IR luncheon speaker Bob Witeck, CEO, Witeck-Combs Communications, Inc., Washington, D.C., welcomed fellow "smarty pants and know-it-alls," whom he viewed as fellow marketers from the library and scholarly publishing communities, while admitting that he himself had never delivered "real knowledge."
In collaborative efforts such as scholarly publishing, key marketing messages are an integral part of the work everyone needs to do together in order to succeed. Witeck suggests that fact-finding, like determining who cares about issues such as the reach of shared knowledge in digital formats and why that might matter in a world where science has been kept secret, is significant. He says, "Right now we may have a perfect storm of opportunity around making publicly funded science public, because it's all about trust and value."
In the discussion that followed, Les Carr, University of Southampton, pointed out that when you live in an institutional environment, it's all about proving that one thing is better than another thing. While he agrees that marketing or being able to tell a good story is useful, he finds that institutions are not skilled at using marketing tools such as creative design and media, or coming up with well-crafted marketing messages.
Witeck answered that the best messages might be articulated by people other than scholars, researchers or librarians. "Stop talking to yourselves," he advised. Bright young faculty and champions for new ideas are powerful messengers. Additionally, he believes that institutional leaders should be on board with the marketing messages, and carry those messages forward. Enlisting third parties, focusing on funders and working towards solution-driven business strategies are all part of overcoming institutional barriers to successful deployment of scholarly assets.
Sun PASIG November 19-20
Is there anyone out there who has an in-box, spam filter, hard drive, or update feed that is not brimming with outdated, digital junk? And are you even sure whether or not it's junk? Like old string, your institution may have a particular reason for keeping a collection of regularly updated data. It might even be an important reason. Welcome to the world of the Sun Preservation and Archiving Special Interest Group (PASIG) fall meeting.
Ask any systems administrator holding back a flood of content, and they will tell you that their finger in the digital dike is the only thing keeping your personal computing devices from being swept away by a rising virtual tide of data. At the Sun PASIG fall meeting, also held in Baltimore, use cases, data "floodwatch" metrics, technical architectures and storage strategies were examined and discussed as ways to move towards making use of, and tracking, what seem like "bazillions" of proliferating data points. To download Sun PASIG meeting presentations, please visit <http://events-at-sun.com/pasig_fall08/presentations.html>.
Mike Keller, University Librarian and Director of Academic Information Resources, Stanford University, opened Sun PASIG. He painted an optimistic picture of new directions that Sun Microsystems will take in the future, particularly with regard to creating an ongoing global best practices forum for high performance computing, and solutions for storage and easier-to-implement reference architectures. Keller introduced Ernie Ingles, Vice-Provost and Chief Librarian, University of Alberta, who suggested, "The future has not yet been preserved." He asked attendees to imagine laying the technical and social frameworks for preserving "memory objects" whose meaning will be far greater than any anonymous data for future generations.
National Digital Information Infrastructure and Preservation Program (NDIIPP)
Martha Anderson, Director of Program Management, NDIIPP, U.S. Library of Congress, presented a "Major Trends Overview." The Library of Congress's efforts to preserve U.S. heritage and knowledge take into account multiple dimensions of preservation from the vantage points of "today, tomorrow and forever." Though it is the stuff of 40-year-old science fiction, she suggested that reading Philip K. Dick's 1969 collection entitled The Preserving Machine would help attendees gain insight into some of the real issues involved in an expansive view of preserving knowledge today. Anderson said, "We [NDIIPP] want to bridge the present to the future, and we are building machines to help us do that."
Sun Storage Technologies
Sun systems and architecture specialists sketched out assumptions, solutions and new ideas around creating long-term facilities for large data acquisition, storage and management. Chris Wood, Storage CTO, Sun Microsystems, Inc., said, "Over time, systems, software and people will be replaced, and data will be preserved. Every component will fail or be swapped out." He went on, "Twenty years ago they were thinking machines, and then general purpose computing took over; specialized hardware will always lose. Multiple archive models must be supported." The model he presented supports LOCKSS and CLOCKSS, as well as ILS models. Wood believes that many types of architectures that support clouds or "federated services" will emerge over time.
Several Sun presenters emphasized the role of energy consumption in the digital preservation equation. Over time, the energy price tag for operating almost any piece of hardware will surpass its original cost. As the costs for storage systems decrease and energy prices increase, institutions will continue to be faced with deciding how much data they can afford to keep.
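The break-even point the Sun presenters alluded to is simple arithmetic. The figures below are illustrative assumptions, not numbers from the meeting, but they show how cumulative electricity cost can overtake a storage array's purchase price within its service life.

```python
# Back-of-the-envelope comparison of purchase price vs. energy cost.
purchase_price = 5000.0   # USD, assumed price of a storage array
power_draw_kw = 0.8       # assumed continuous draw, including cooling
price_per_kwh = 0.12      # assumed electricity price in USD

hours_per_year = 24 * 365
annual_energy_cost = power_draw_kw * hours_per_year * price_per_kwh
years_to_break_even = purchase_price / annual_energy_cost

print(round(annual_energy_cost, 2))   # 840.96 USD per year
print(round(years_to_break_even, 1))  # 5.9 years
```

Under these assumptions, energy spending passes the hardware cost in about six years, which is why "how much can we afford to keep?" becomes a recurring budget question rather than a one-time purchase decision.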
The DSpace Foundation and Fedora Commons Collaborate on "DuraSpace"
DSpace and Fedora Commons held several meetings at Sun PASIG. The first introduced the organizations' joint DuraSpace initiative. This six-month investigation, funded by the Mellon Foundation, is being led by the DSpace Foundation and Fedora Commons to determine the feasibility of establishing an open, durable store service layer to leverage repository development in computing and storage clouds. The idea behind DuraSpace is to provide a trusted, value-added service layer to augment the capabilities of generic storage providers by making stored digital content more durable, manageable, accessible and sharable.
The second part of the meeting was dedicated to a community discussion about establishing a professional development curriculum for existing and potential repository developers, managers and curators, with support from Sun Microsystems.
Outreach staff from DSpace and Fedora Commons explained that the key shared objective is to strengthen and engage repository user and developer communities worldwide. Attendees expressed interest in the concept of a repository professional development seminar series that would include preservation and archiving as part of an integrated curriculum, as well as one-off profiles, use cases and "how-to" topics. Seminar leaders and topics are being sought. Please contact Carissa Smith at <email@example.com> if you are interested in working on the joint seminar series.
DSpace and Fedora developers met at noon on November 19 to move towards a shared understanding of how the two popular repositories' technology development strategies might be brought closer together. By taking a step back from existing concepts of how each system operates, simple storage may be viewed as a logical first "rung" on a "ladder" towards interoperability. The four progressive rungs are: content blobs (the bottom rung); facts about blobs and their interrelationships; aggregations of blobs; and, at the top, enriched semantic understanding, with recommendations and best practices about how to expose content filtering back down to the bottom layer.
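The ladder's rungs can be pictured as layers of plain data structures. The sketch below is only an illustration of the concept; the class names and fields are invented for this example and are not a DSpace or Fedora API.

```python
from dataclasses import dataclass, field

@dataclass
class Blob:
    """Rung 1: an opaque unit of stored content."""
    blob_id: str
    data: bytes

@dataclass
class Fact:
    """Rung 2: an assertion about a blob or a relationship between blobs."""
    subject: str    # a blob id
    predicate: str
    obj: str

@dataclass
class Aggregation:
    """Rung 3: a grouping of blobs into a compound object."""
    agg_id: str
    members: list = field(default_factory=list)

# Rung 4, shared semantics, would be conventions layered on top:
# agreed vocabularies of predicates that both repositories understand.
b = Blob("blob:1", b"PDF bytes")
f = Fact("blob:1", "format", "application/pdf")
a = Aggregation("agg:article-42", members=["blob:1"])
print(a.members)  # ['blob:1']
```

The appeal of the ladder is that two systems can interoperate at the lowest rung they both support, exchanging raw blobs even before they agree on facts, aggregations or semantics.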
U.S. National Archives and Records Administration
Kenneth Thibodeau, Director, Electronic Records Archives Program, NHE, U.S. National Archives and Records Administration (NARA) (http://www.archives.gov/) opened afternoon sessions on November 19 by explaining that he woke up each day thinking, "Today's the day the sky is going to fall." His concern is understandable: NARA is responsible for digitally preserving and reliably transmitting digitally encoded information over time and technology in support of nothing less than protecting "Records [that] help us claim our rights and entitlements, hold our elected officials accountable for their actions, and document our history as a nation." And all this must be done in a scalable, extensible, and evolvable way while complying with 852 requirements statements.
As a small example of the volume of data for which NARA is legally responsible, Thibodeau explained that they will take legal ownership of 150 terabytes of data containing 100 million email messages when President Bush leaves office.
Thibodeau advocates a cyclical approach to systems development and production that is designed for growth, evolution, openness and closure, and he stresses that NARA systems cannot satisfy end users on their own. There is a need for information "brokers," such as university libraries, to provide user services. Thibodeau would like to make it easier for third parties to package and promote NARA resources and information.
Beyond Fedora 3.0
After the 2008 releases of Fedora 3.0 and 3.1, with a Content Model Architecture (CMA) providing an integrated structure for persisting and delivering the essential characteristics of digital objects, Sandy Payette, Executive Director of Fedora Commons, said, "We stood back and strategized."
She went on, "Waves of repository-enabled applications have emerged: institutional repository and digital library apps; collaborative 'Web 2.0' apps; eScience, eResearch, and data curation apps...We can build amazing private islands." Should we stay within this institution-specific and organization-specific island application development paradigm? She believes that repository communities must evolve from "systems" to "networks": integrated systems featuring distributed control and generic gateways, and that are more open and more reconfigurable.
Payette concluded by saying that new ways to expose content, backend storage abstraction, performance and scalability, as well as the content service ladder concept, promote greater interoperability. Strategic collaborations such as the DSpace and Fedora DuraSpace initiative will find ways to blend networks using repository infrastructure and services. "What we build today needs to be evolvable and organic. The only systems that last are the ones that change," she said.
The National Science Foundation Office of Cyberinfrastructure DataNet Program
Lucy Nowell, Program Director, Office of CyberInfrastructure (OCI), U.S. National Science Foundation says, "In 2007 the amount of information created surpassed, for the first time, our ability to preserve it on the planet." A single project funded by OCI, for example, might generate 30 terabytes of data per night.
OCI's answer to what to do about this wealth of information lies in establishing the Sustainable Digital Data Preservation and Access Network Partners (DataNet) program. DataNet is focused on creating a national framework, culture change, tools, resources, opportunities and exploration around the curation and use of data by building a "network of data networks" similar to the Internet. DataNet seeks to preserve the history of science by not only capturing "big science" data, but also by saving the long tail of small science that provides critical evidence of primary science.
A Very Large, Scalable, Data Architecture
The Church of Jesus Christ of Latter-day Saints (LDS) has established and maintains the largest genealogical library in the world to "preserve the heritage of mankind," and makes it freely available to the public. How much information is that? More than 10 billion names have been cataloged since 1894 from images of birth and death information recorded in family and public documents; those names represent about 10% of all the human beings who have ever lived on the planet. The near-term goal is to be able to publish 1 billion new genealogical record images every year.
Gary Wright, Digital Preservation Product Manager and Randy Stokes, Principal Engineer, FamilySearch, The Church of Jesus Christ of Latter-Day Saints, explained that records are available through the catalog of the Family History Library and may be accessed through the FamilySearch web site. Though recent record collection efforts are digitally based, the historical preservation strategy for collecting images of birth and death records from around the planet was to store them on microfilm. Currently that microfilm is being digitized and a new architecture for preserving the collection is being developed.
Complex systems for ingesting, disseminating, preserving and supporting tens to hundreds of petabytes of data, consisting of more than 100 billion objects, are currently being developed using the Sun StorageTek SL8500 Modular Library System.
PASIG in the Global Project Landscape
Clifford Lynch, Executive Director, Coalition for Networked Information (CNI), offered a final keynote summary at Sun PASIG. In the ongoing "tech vs. policy" conversation, he suggests that there are limits to what can be accomplished. There must be an emphasis on economic issues with an eye to what can be optimized and what can be left behind. We cannot pursue perfection, and a balance between better, faster and cheaper must be found.
Lynch is looking for economies of scale in federation models. With advances such as the Open Archives Initiative Object Reuse and Exchange (ORE), moving assets is getting easier. Services that can help to establish provenance and authenticity, along with more network trust models such as Lots of Copies Keep Stuff Safe (LOCKSS), are needed.
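OAI-ORE makes asset moving easier by describing a compound object as a "resource map" that lists everything the object aggregates. The sketch below builds a minimal resource map in ORE's Atom serialization; the repository URIs are hypothetical, and only the core `aggregates` relation is shown.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

def resource_map(aggregation_uri, part_uris):
    """Serialize an aggregation and its parts as an Atom entry."""
    ET.register_namespace("", ATOM)
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = aggregation_uri
    for uri in part_uris:
        # One link per aggregated resource, typed with the ORE relation.
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      rel=ORE_AGGREGATES, href=uri)
    return ET.tostring(entry, encoding="unicode")

xml = resource_map("http://repo.example.org/agg/1",
                   ["http://repo.example.org/article.pdf",
                    "http://repo.example.org/data.csv"])
print(xml.count("aggregates"))  # one typed link per part: 2
```

Because the map itself is just a document with URIs, a receiving repository can fetch it, resolve each part, and reassemble the compound object without sharing any internal storage model with the sender.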
"Cloud" storage could be an economic win, especially if facilities are located where energy costs are low. Lynch cautioned that there are no standards for cloud agreements: what are the services; how will risks be monetized; and how will media and format migrations be handled?
He reminded the audience, who perhaps were already on this page after three days of presentations about ubiquitous and persistent data, that there is a lack of framework to discuss "what to give up" as knowledge preservation issues loom. "It is an ugly conversation, but we are morally obligated to have it," Lynch concluded.
Copyright © 2009 Carol Minton Morris