D-Lib Magazine
The Magazine of Digital Library Research

T A B L E   O F   C O N T E N T S
J A N U A R Y / F E B R U A R Y   2 0 1 5
Volume 21, Number 1/2

DOI: 10.1045/january2015-contents
ISSN: 1082-9873




2nd International Workshop on Linking and Contextualizing Publications and Datasets
Editorial by Laurence Lannom, Corporation for National Research Initiatives


Data as "First-class Citizens"
Guest Editorial by Łukasz Bolikowski, ICM, University of Warsaw, Poland; Nikos Houssos, National Documentation Centre / National Hellenic Research Foundation, Greece; Paolo Manghi, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy; and Jochen Schirrwagen, Bielefeld University Library, Germany



Semantic Enrichment and Search: A Case Study on Environmental Science Literature
Article by Kalina Bontcheva, University of Sheffield, UK; Johanna Kieniewicz and Stephen Andrews, British Library, UK; Michael Wallis, HR Wallingford, UK

Abstract: As information discovery needs become increasingly challenging, traditional keyword-based information retrieval methods are falling short of providing adequate support. The problem is often compounded by the poor quality of article metadata in some digital collections. This paper investigates automatic semantic enrichment and search methods as ways to meet these challenges. In particular, the benefits of enriching articles with knowledge from Linked Open Data resources are investigated, with a focus on the domain of environmental science. To help environmental science researchers carry out better semantic searches, a form-based semantic search interface is proposed. It enables researchers to benefit from the semantically enriched content, e.g. to carry out sophisticated location-based searches. The usability and ease of learning of this web interface were evaluated in a user-based study, the results of which are also reported.

A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing
Article by Laura Drăgan, Markus Luczak-Rösch, Elena Simperl, Heather Packer and Luc Moreau, University of Southampton, UK; Bettina Berendt, KU Leuven, Belgium

Abstract: In this paper we present opportunities to leverage crowdsourcing for capturing dataset citation graphs a posteriori. We describe a user study we carried out, which applied a possible crowdsourcing technique to collect this information from domain experts. We propose to publish the results as Linked Data using the W3C PROV standard, and we demonstrate how to do this with the web-based application we built for the study. Based on the results and feedback from this first study, we introduce a two-layered approach that combines information extraction technology and crowdsourcing in order to achieve both scalability (through the use of automatic tools) and accuracy (via human intelligence). In addition, non-experts can become involved in the process.

A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications
Article by Alessia Bardi and Paolo Manghi, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy

Abstract: Enhanced publications (EPs) can be generally conceived as digital publications "enriched with" or "linking to" related research results, such as research data, workflows, software, and possibly connections among them. Enhanced Publication Information Systems (EPISs) are information systems devised for the management of EPs in specific application domains. Currently, no framework supporting the realization of EPISs is known, and EPISs are typically realized "from scratch" by integrating general-purpose technologies (e.g. relational databases, file stores, triple stores) and Digital Library oriented software (e.g. repositories, cataloguing systems). Such an approach inevitably entails non-negligible realization and maintenance costs that could be reduced by adopting a more systematic approach. The framework proposed in this work addresses this task by providing EPIS developers with EP management tools that facilitate their work by hiding the complexity of the underlying technologies.

Science 2.0 Repositories: Time for a Change in Scholarly Communication
Article by Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy

Abstract: Information and communication technology (ICT) advances in research infrastructures are continuously changing the way research and scientific communication are performed. Scientists, funders, and organizations are moving the paradigm of "research publishing" well beyond traditional articles. The aim is to pursue a holistic approach where publishing includes any product (e.g. publications, datasets, experiments, software, web sites, blogs) resulting from a research activity and relevant to the interpretation, evaluation, and reuse of the activity or part of it. The implementation of this vision is today mainly inspired by literature-centric scientific communication workflows, which separate the "where" research is conducted from the "where" research is published and shared. In this paper we claim that this model cannot fit well with the scientific communication practice envisaged in Science 2.0 settings. We present the idea of Science 2.0 Repositories (SciRepos), which meet publishing requirements arising in Science 2.0 by blurring the distinction between the research life-cycle and research publishing. SciRepos interface with the ICT services of research infrastructures to intercept and publish research products while providing researchers with social networking tools for discovery, notification, sharing, discussion, and assessment of research products.

Data Citation Practices in the CRAWDAD Wireless Network Data Archive
Article by Tristan Henderson, University of St Andrews, UK and David Kotz, Dartmouth College, USA

Abstract: CRAWDAD (Community Resource for Archiving Wireless Data At Dartmouth) is a popular research data archive for wireless network data, archiving over 100 datasets used by over 6,500 users. In this paper we examine citation behaviour amongst 1,281 papers that use CRAWDAD datasets. We find that (in general) paper authors cite datasets in a manner that is sufficient for providing credit to dataset authors and also provides access to the datasets that were used. Only 11.5% of papers did not do so; common problems included (1) citing the canonical papers rather than the dataset, (2) describing the dataset using unclear identifiers, and (3) not providing URLs or pointers to datasets.

A Methodology for Citing Linked Open Data Subsets
Article by Gianmaria Silvello, University of Padua, Italy

Abstract: In this paper we discuss the problem of data citation with a specific focus on Linked Open Data. We outline the main requirements a data citation methodology must fulfill: (i) uniquely identify the cited objects; (ii) provide descriptive metadata; (iii) enable variable granularity citations; and (iv) produce both human- and machine-readable references. We propose a methodology based on named graphs and RDF quad semantics that allows us to create citation meta-graphs respecting the outlined requirements. We also present a compelling use case based on search engines experimental evaluation data and possible applications of the citation methodology.

Challenges in Matching Dataset Citation Strings to Datasets in Social Science
Article by Brigitte Mathiak and Katarina Boland, GESIS — Leibniz Institute for the Social Sciences, Germany

Abstract: Finding dataset citations in scientific publications to gain information on the usage of research data is an important step to increase the visibility of data and to give datasets more weight in the scientific community. Unlike publication impact, which is readily measured by citation counts, dataset citation remains a great unknown. In recent work, we introduced an algorithm to find dataset citations in full-text documents automatically, but, in fact, this is only half the task. Once the citation string has been found, it has to be matched to the correct DOI. This is more complicated than it sounds. In social science, survey datasets are typically recorded in a much more fine-grained way than they are cited, differentiating between years, versions, samples, modes of the interview, countries, even questionnaire variants. At the same time, the actual citation strings typically ignore these details. This poses a number of challenges to the matching of citation strings to datasets. In this paper, we discuss these challenges in more detail and present our ideas on how to solve them using an ontology for research datasets.

Enabling Living Systematic Reviews and Clinical Guidelines through Semantic Technologies
Article by Laura Slaughter, The Interventional Centre, Oslo University Hospital (OUS), Norway; Christopher Friis Berntsen and Linn Brandt, Internal Medicine Department, Innlandet Hospital Trust and MAGICorg, Norway; and Chris Mavergames, Informatics and Knowledge Management Department, The Cochrane Collaboration, Germany

Abstract: In clinical medicine, secondary research that produces systematic reviews and clinical practice guidelines is key to sound decision-making and quality care. Having machine-readable primary study publications, namely the methods and results of published human clinical trials, can greatly improve the process of summarizing and synthesizing knowledge in medicine. In this short introduction to the problem, we provide a brief review of the related literature on various efforts to produce semantic technologies for sharing and reusing content from clinical investigations (RCTs and other clinical primary studies). Using an illustrative case, we outline some of the metadata that needs to be captured in order to achieve some initial automation in the authorship of systematic reviews and clinical guidelines. In addition, we list desiderata that we believe are needed to reduce the time and costs of maintaining these documents. These include linking provenance information to a much longer scientific investigation lifecycle, one that incorporates a single study's role all the way through its use in clinical guideline recommendations for patient treatments.

Data without Peer: Examples of Data Peer Review in the Earth Sciences
Article by Sarah Callaghan, British Atmospheric Data Centre, UK

Abstract: Peer-review of data is an important process if data is to take its place as a first class research output. Much has been written about the theoretical aspects of peer review, but not as much about the actual process of doing it. This paper takes an experimental view, and selects seven datasets, all from the Earth Sciences and with DOIs from DataCite, and attempts to review them, with varying levels of success. Key issues identified from these case studies include the necessity of human readable metadata, accessibility of datasets, and permanence of links to and accessibility of metadata stored in other locations.

The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite
Article by Jan Brase and Irina Sens, German National Library of Science and Technology, Germany and Michael Lautenschlager, German Climate Computing Centre, Germany

Abstract: As part of a project initiated by the German Research Foundation (DFG), the German National Library of Science and Technology (TIB) assigned its first DOI names to scientific data in summer 2004. The goal was to use persistent identifiers as part of a broader effort to make scientific datasets citable research outputs. The effort begun by TIB led to the creation and funding of DataCite on 1 December 2009. During the past five years DataCite has grown into a global consortium that has assigned over four million DOI names to scientific datasets and other research artefacts. It is a successful cooperative effort led by scientists, librarians and researchers. This article highlights its development and gives an overview of DataCite's recent work.


N E W S   &   E V E N T S


In Brief: Short Items of Current Awareness

In the News: Recent Press Releases and Announcements

Clips & Pointers: Documents, Deadlines, Calls for Participation

Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research

F E A T U R E D   D I G I T A L   C O L L E C T I O N


Archive of Digital Art (ADA)


The Livingroom, Genre: Interactive Installation.
[Copyright Christa Sommerer, Laurent Mignonneau & Roberto Lopez-Gulliver. Used with the permission of The Archive of Digital Art. ]

The Moon is a Mirror. Genre: Digital Performance.
[Copyright Scott Hessels. Used with the permission of The Archive of Digital Art.]


The Archive of Digital Art (ADA) is a free scholarly database and online community dedicated to research on digital art. A pioneer in its field, it was founded in 1999 in Berlin as the Database of Virtual Art by art historian and media theorist Oliver Grau. In cooperation with established media artists, researchers, and institutions, it has documented the rapidly evolving world of digital art and its related fields for more than a decade, and today contains a selection of thousands of artworks at the intersection of art, science, and technology.

The Database of Virtual Art was dedicated to installation-based, interactive, processual and immersive artworks, focusing on the topics of virtual reality, telepresence, artificial life and robotics. From 2005 to 2013 the archive's scope was extended; today the Archive of Digital Art documents a wide range of digital works at the intersection of art and science.

The main objectives of the archive are the documentation of digital artworks and the fostering of research and international collaboration in the field. With an expanded concept of documentation, suited to the needs of processual, ephemeral, multimedia-based, interactive and fundamentally context-dependent artworks, the Archive of Digital Art ensures systematic documentation that delivers the information needed for preservation.

At present, the archive features hundreds of artists and scholars and thousands of artworks. With video documentation, technical data, artist statements, academic texts, bibliographical information, exhibitions and events, the Archive of Digital Art is the most comprehensive resource in the field. In addition to this multifarious collection of works, the database provides one of the most extensive bibliographies in the field, with more than 2,300 publications, as well as 800 documented institutions. The data is richly interlinked and constantly updated by an online community of selected artists and scholars who collaborate on the documentation.

In 2013, the archive was migrated to a web 2.0 environment. ADA now also provides community features and user-oriented applications that enable collective scientific exchange among artists, engineers, scholars and the public, fostering interdisciplinary, global collaborative analysis and a proactive process of knowledge transfer. Artists and scholars who meet certain artistic and scholarly standards (at least five exhibitions or publications) are invited to set up profiles and contribute to the documentation.

A newly developed Meta-Thesaurus search portal, enabling digital and classical artworks to be explored together, will be introduced in 2015.

Advisory Board:

Christiane Paul, Roy Ascott, Erkki Huhtamo, Gunalan Nadarajan, Martin Roth, et al.

ADA team:

Prof. Dr. habil. Oliver Grau
Michaela Seiser (Editor, Community Admin)
Sebastian Haller, Viola Rühse, Wendy Coones, Ann-Christin Renn (Editorial Team)


D - L I B   E D I T O R I A L   S T A F F

Laurence Lannom, Editor-in-Chief
Allison Powell, Associate Editor
Catherine Rey, Managing Editor
Bonita Wilson, Contributing Editor
