D-Lib Magazine
The Magazine of Digital Library Research

T A B L E   O F   C O N T E N T S
S E P T E M B E R / O C T O B E R   2 0 1 2
Volume 18, Number 9/10

ISSN: 1082-9873




An Embarrassment of Riches
by Laurence Lannom, Corporation for National Research Initiatives



OpenAIREplus: the European Scholarly Communication Data Infrastructure
Article by Paolo Manghi, Institute of Information Science and Technologies, National Research Council, Pisa, Italy; Lukasz Bolikowski, University of Warsaw, Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw, Poland; Natalia Manola, National and Kapodistrian University of Athens, Department of Computer Science, Athens, Greece; Jochen Schirrwagen, Bielefeld University Library, Bielefeld, Germany; Tim Smith, CERN, Geneva, Switzerland

Abstract: OpenAIRE and OpenAIREplus (Open Access Infrastructure for Research in Europe) are EC-funded projects (December 2009 to May 2014) whose goals are to realize, enhance, and operate the Open Access European scholarly communication data infrastructure. This paper describes the high-level architecture and functionalities of that infrastructure, including services designed to collect, interlink, and provide access to peer-reviewed and non-peer-reviewed publications (from repositories), datasets (from dataset archives), and projects of the European Commission and national funding schemes (from CRIS systems).

The Data Conservancy Instance: Infrastructure and Organizational Services for Research Data Curation
Article by Matthew S. Mayernik, National Center for Atmospheric Research (NCAR); G. Sayeed Choudhury, Tim DiLauro, Elliot Metsger, Barbara Pralle, and Mike Rippin, Johns Hopkins University; Ruth Duerr, National Snow & Ice Data Center (NSIDC)

Abstract: Digital research data can only be managed and preserved over time through a sustained institutional commitment. Research data curation is a multi-faceted issue, requiring technologies, organizational structures, and human knowledge and skills to come together in complementary ways. This article provides a high-level description of the Data Conservancy Instance, an implementation of infrastructure and organizational services for data collection, storage, preservation, archiving, curation, and sharing. While comparable to institutional repository systems and disciplinary data repositories in some aspects, the DC Instance is distinguished by featuring a data-centric architecture, discipline-agnostic data model, and a data feature extraction framework that facilitates data integration and cross-disciplinary queries. The Data Conservancy Instance is intended to support, and be supported by, a skilled data curation staff, and to facilitate technical, financial, and human sustainability of organizational data curation services. The Johns Hopkins University Data Management Services (JHU DMS) are described as an example of how the Data Conservancy Instance can be deployed.

A Perspective on Resource Synchronization
Article by Herbert Van de Sompel, Robert Sanderson, and Martin Klein, Los Alamos National Laboratory; Michael L. Nelson, Old Dominion University; Bernhard Haslhofer and Simeon Warner, Cornell University; Carl Lagoze, University of Michigan

Abstract: Web applications frequently leverage resources made available by remote web servers. As resources are created, updated, deleted, or moved, these applications face challenges to remain in lockstep with changes on the server. Several approaches exist to help meet this challenge for use cases where "good enough" synchronization is acceptable. But when strict resource coverage or low synchronization latency is required, commonly accepted Web-based solutions remain elusive. This paper provides a perspective on the resource synchronization problem that results from inspiration gained from prior work, and initial insights resulting from the recently launched NISO/OAI ResourceSync effort.

Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment
Article by Sally Vermaaten, Statistics New Zealand; Brian Lavoie, OCLC; Priscilla Caplan, Florida Virtual Campus (FLVC)

Abstract: Developing a successful digital preservation strategy amounts to accounting for, and mitigating, the impact of various threats to the accessibility and usability of digital materials over time. Typologies of threats are practical tools that can aid in the development of preservation strategies. This paper proposes a new outcome-based model, the Simple Property-Oriented Threat (SPOT) Model for Risk Assessment, which defines six essential properties of successful digital preservation and identifies a limited set of threats which, if manifested, would seriously diminish the ability of a repository to achieve these properties. We demonstrate that the SPOT Model possesses the attributes of conceptual clarity, balanced granularity, comprehensiveness and simplicity, and provide examples of practical uses of the model and suggestions for future work.

Fulltext Geocoding Versus Spatial Metadata for Large Text Archives: Towards a Geographically Enriched Wikipedia
Article by Kalev H. Leetaru, University of Illinois

Abstract: The rise of "born geographic" information and the increasing creation and mediation of information in a spatial context have given rise to a demand for extracting and indexing the spatial information in large textual archives. Spatial indexing of archives has traditionally been a manual process, with human editors reading and assigning country-level metadata indicating the major spatial focus of a document. The demand for subnational saturation indexing of all geographic mentions in a document, coupled with the need to scale to archives totaling hundreds of billions of pages or those accessioning hundreds of millions of new items a day, requires automated approaches. Fulltext geocoding refers to the process of using software algorithms to parse through a document, identify textual mentions of locations, and, using databases of places and their approximate locations known as gazetteers, convert those mentions into mappable geographic coordinates. The basic workflow of a fulltext geocoding system is presented, together with an overview of the GNS and GNIS gazetteers that lie at the heart of nearly every global geocoding system. Finally, a case study comparing manually specified geographic indexing terms versus fulltext geocoding on the English-language edition of Wikipedia demonstrates the significant advantages of automated approaches, including finding that previous studies of Wikipedia's spatial focus using its human-provided spatial metadata have erroneously identified Europe as its focal point because of bias in the underlying metadata.



United Kingdom's Open Access Policy Urgently Needs a Tweak
Opinion by Stevan Harnad, Université du Québec à Montréal & University of Southampton

Abstract: The UK government, under the joint influence of the publisher lobby and short-sighted advice from Open Access (OA) advocates, has decided to make all UK research output OA within two years by diverting funds from UK research to pay publishers extra for (Gold) OA publishing, over and above what the UK (and the rest of the world) already pays publishers for journal subscriptions. This would merely be a needless waste of the UK's scarce research funds in exchange for OA, instead of strengthening the UK's existing mandate for cost-free (Green) OA self-archiving. But the UK has also been persuaded to require researchers to pick and pay for Gold OA, instead of leaving the Green/Gold choice to them. This requirement needs to be dropped to prevent perverse consequences, both locally and globally, for both the UK and OA.


C O N F E R E N C E   R E P O R T S

4,000+ Tweets Later: Looking Back at the Seventh International Conference on Open Repositories
Conference Report by Carol Minton Morris, DuraSpace

Abstract: Edinburgh's cool and misty weather set the stage for the Seventh Annual International Conference on Open Repositories (OR2012) in early July 2012, and contributed to the medieval atmosphere created by craggy hills, castles, and narrow, winding streets. The mists were parted inside the University of Edinburgh's Appleton Tower where 460 attendees from 40 countries were on hand to participate in this trademark formative, informal and inspirational conference.

AERI 2012 Digital Curation Pre-Conference
Conference Report by Alex H. Poole, Christopher A. Lee, and Angela P. Murillo, University of North Carolina at Chapel Hill

Abstract: Organized under the auspices of the DigCCurr II project funded by the Institute of Museum and Library Services (IMLS), a digital curation pre-conference symposium was held on July 8, 2012 at the University of California-Los Angeles in association with the Archival Education Research Institute (AERI). Seven digital curation experts from six institutions led the day's sessions, which focused on digital curation education. The symposium discussed the importance of curriculum development, mentoring, seeking funding, research strategies, and collaboration across disciplines and institutions both nationally and internationally.


N E W S   &   E V E N T S


In Brief: Short Items of Current Awareness

In the News: Recent Press Releases and Announcements

Clips & Pointers: Documents, Deadlines, Calls for Participation

Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research and Technologies

F E A T U R E D   D I G I T A L


National Library of Medicine's Digital Collections


["Save a Day" — film by the United States Public Health Service. Courtesy of the National Library of Medicine.]

["Drinking water" — a United States Navy training film. Courtesy of the National Library of Medicine.]


The materials that comprise the National Library of Medicine's Digital Collections are stored in the National Library of Medicine's (NLM) digital repository, one of several electronic storage systems within which digitized and born-digital objects created or acquired by NLM reside. The digital repository provides access to digital content not covered by PubMed Central and the NIH CIT Videocast project.

There are dozens of digital collections created by the History of Medicine Division that require long-term management and preservation, and NLM collection development and acquisitions staff are seeing an increasing availability of born-digital materials that NLM needs to add to its collections. The NLM preservation program has embraced digitization as a preservation format to replace microfilming.

The users of the NLM Digital Repository include both end-users, such as the general public and NIH staff, and NLM staff who work with and manage the content and repository system. A primary goal of the NLM Repository is to provide access to the material as an active, regularly used archive, as opposed to a dark archive accessible only under certain exceptions.

At the National Library of Medicine's Digital Collections web site, users can view 20 featured items from NLM's collections. These materials are displayed on the site's "Wall" (which requires the latest Adobe Flash Player to use). They include videos and digitized books on various subjects. To find additional content, users can browse or search by collection titles, subjects, authors, years, or languages. For the videos, users can opt to see the transcripts of the narration and can select closed captioning as well.

A suite of open source and NLM-created software supports the web site. All the content in Digital Collections is in the public domain and is freely available worldwide.


D - L I B   E D I T O R I A L   S T A F F

Laurence Lannom, Editor-in-Chief
Allison Powell, Associate Editor
Catherine Rey, Managing Editor
Bonita Wilson, Contributing Editor
