Volume 17, Number 3/4
Open Access, Open Data: Paradigm Shifts in the Changing Scholarly Communication Scenario
Elena Giglia, University of Turin
The Open Access Open Data conference was held December 13-14, 2010 in Cologne, Germany. The purpose of the conference was to examine the development of the Open Access movement over the last five years, to consider how it is expected to change within the next five to ten years, and to investigate further the "Open Data" movement that is rapidly growing in international importance. This report discusses some of the key ideas that emerged from the workshop discussion and debate.
The expert conference Open Access Open Data, held in Cologne (D) on December 13th and 14th, 2010, offered two days of intensive debate and exchange. Excellent contributions from the invited speakers alternated with sharp, stimulating interventions from participants, creating an inspiring atmosphere. The common thread was the changing scholarly communication system. This report presents some key insights that emerged from the debate.
Scholarly communication, digital age and the commons
Dieter Stein (Düsseldorf University, D) highlighted the free nature of scholarly communication and the still unexploited potential of the Net, stating that the concepts of access, visibility, and impact should be revisited from the perspective of Open Science, where "User Generated Content" calls for more "liquid" and "current" publications. Open Science stands for transparency and efficiency, made possible by new ways of producing science in which one can discuss and debate as the science is carried out, not only at the end of the work but already before the creation of a final "product". This new approach also requires revisiting the concept of copyright, because current copyright laws were designed for a print-on-paper age, not the digital age.
Following the same path, Rainer Kuhlen (Konstanz University, D) developed a fascinating argument for moving from the concept of "knowledge as commons" to a "knowledge ecology", because the sustainability of immaterial goods can only be achieved through open and free access and unrestricted use. The knowledge ecology aims at people-centred, inclusive and sustainable knowledge societies. Within the commons paradigm, this "knowledge ecology" and the idea of Open Access provide an alternative to existing commercial publishing models on the international information markets and to international copyright regulations; both of the latter have mainly emphasised the economic value of knowledge and information, whereas the genuine character of knowledge as a common-pool resource should also be taken into account. From this perspective, the "knowledge ecology" does not object to the commercial use of knowledge produced in public environments such as universities and research centres, but holds that publishing models are only acceptable when they acknowledge the status of knowledge as a commons, allowing free and open access for everyone. This commons must be based on sharing knowledge, producing new knowledge collaboratively, and providing future generations with the same access and usage rights. As a result, the commercial use of publicly produced knowledge should be the exception and open, free access the default; the public should be compensated when the commons "knowledge" is exploited commercially; and new property right rules are needed as knowledge is increasingly produced collaboratively. Nobody should have an exclusive right to knowledge: neither the author, nor the publisher, nor the reader. A new, sustainable business model compliant with this scenario is needed. Access is the real challenge, as knowledge is usable only if it is accessible.
Alma Swan (Key Perspectives and Enabling Open Scholarship, UK) stressed this concept of "access", offering a concise, factual and forceful definition of Open Access as "immediate, free (to use), free (of restrictions) access to the peer reviewed literature and data". "Immediate" means that no embargo is compatible with Open Access.
At the same time, Open Access offers many advantages to both individual researchers and institutions through the increased visibility and impact this new paradigm brings. Many examples from different disciplinary fields were shown, along with evidence aimed at dispelling common misunderstandings about Open Access itself. Open Access increases citations and boosts the flow of ideas by disseminating papers immediately, and it gives visibility to works previously confined to paper journals that could only be accessed if paid for. Download statistics from the Institutional Repository ORBi, supported by a strong Rectoral mandate at the University of Liège (B), show that usage comes from many Francophone countries which never before had access to the same journal articles. By connecting readers worldwide, Open Access can foster international and interdisciplinary collaborations; it also ensures access to academic findings for Small and Medium Enterprises, where much innovation is carried out. Open Access is completely different from vanity publishing or the "stick anything up on the Web" approach. Open Access means moving scholarly communication into the Web age, a view in perfect consonance with all of the voices heard during the conference.
Open Access and Open Data
Who better than Stevan Harnad (Southampton University, UK; Montreal University, CAN), one of the world's best known Open Access advocates, to discuss convergences and divergences between the two topics in the conference title: Open Access and Open Data? Harnad strongly supported the idea that self-archiving (the "Green road") is immediately feasible, promises to be the most cost-effective strategy, and relies only on the authors' willingness, which can (and ought to) be reinforced by institutional or funder mandates; the effectiveness of such mandates is demonstrated by the positive download and citation metrics that result. The free, immediate and permanent full-text online access stated in the Open Access definitions refers to the peer-reviewed literature. Access to any other academic material, data included, is desirable but not the focus of the Open Access paradigm; moreover, as it is not immediately feasible, it risks delaying the whole process. Datasets are similar to journal articles in that they often speak for themselves, but whilst the final work gains from being disseminated, researchers should generally be granted the right of first exploitation of their data. That is why Open Access can be mandated and Open Data cannot. Besides, science and scholarship are quite different from data gathering: they are supposed to be data interpretation. Once Open Access self-archiving is mandated and "Green" Open Access grows, data archiving too will grow, because the two are by nature complementary, and because of the power of global collaboration to accelerate and enhance research progress. Data, like journal article drafts, should be deposited locally in Institutional Repositories and then harvested centrally by subject-based repositories, which seems the most cost-effective way to maximise the dissemination of research outputs.
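The deposit-locally, harvest-centrally model Harnad describes typically rests on the OAI-PMH protocol that Institutional Repositories already expose. As a minimal sketch, assuming a hypothetical repository endpoint (the URL and set name below are illustrative, not real services), this is the kind of request a central subject-based repository would issue for an incremental harvest:

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc",
                          from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL, the kind of query a
    central subject repository issues when harvesting an Institutional
    Repository's metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # incremental harvest: records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict to one collection, e.g. deposited datasets
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, for illustration only
url = build_listrecords_url("https://repository.example.org/oai",
                            from_date="2010-12-01", set_spec="datasets")
```

A harvester would fetch this URL periodically and merge the returned Dublin Core records into its central index, which is what makes local deposit compatible with central, subject-based discovery.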
Malcolm Read (JISC, UK) agreed on the distinction between Open Access to research outputs (which can be mandated) and Open Data. JISC requires datasets to be kept available and correctly stored for ten years after publication, but so far not made openly available. Data provide a competitive advantage to the researchers who gathered them, so a right of first exploitation and mining should be provided. "Open" in the JISC perspective is a key issue: a framework in which strategic topics like Open Access, Open Data and Open Educational Resources can be addressed, based on the principle that publicly funded research ought to be publicly available. In the UK, an Open Access Implementation Group has been created, chaired by Prof. Martin Hall (Vice Chancellor of Salford University), to coordinate the efforts of all the actors involved (academic institutions, funders, Research Councils). Next steps towards Open Data should take into account keywords like integrity, reusability, long-term preservation, trust, and sustainability. Implementation guidelines are needed, as well as a seamless infrastructure for storage, search and retrieval, which can build on the existing network of Institutional Repositories. Along this path, some of the major concerns expressed by researchers are their lack of expertise in curating data, creating metadata and preserving data; the presence of legal and ethical constraints; and the fear of inappropriate use of datasets. Yet the effort is worth making, given the recognised potential of data sharing with the USA, Australia and Asian countries under the umbrella of the forthcoming European 8th Framework Program.
The ongoing 7th Framework Program and its Open Access Pilot Project were described by Celina Ramjoué (European Commission, DG Research and Innovation). She gave an overview of the roles of the European Commission as a policy-making body, a research-funding body and a capacity-building body, roles which together encourage a global approach aimed at improving access and dissemination to foster progress, enabling innovation through improved access, and increasing returns on investment in R&D. The European Research Area (ERA), as stated in the Lisbon Treaty, art. 179, envisages the so-called "Fifth Freedom", i.e., the freedom of circulation for researchers, scientific knowledge and technology. Open Access is a strategic keystone in delivering the ERA, as stated in the Digital Agenda for Europe, COM (2010) 245. Open Access to publicly funded research is a principle also stated in the Innovation Union, COM (2010) 546, in both Commitments 4 and 20, in order to promote openness and capitalise on Europe's creative potential. In this context, the EC is actively engaging all the member states with questionnaires, and hosted a Workshop in Brussels on November 25-26, 2010 to work out the next steps towards a sound Open Access strategy; among many issues, the need for mandates, for interoperability and standards, and for a specific copyright law for scientific publications, as well as the issue of Open Data, were all at the forefront. The recent report of the high-level expert group, Riding the wave: how Europe can gain from the rising tide of scientific data (October 2010), is a further step toward a 2030 vision in which "All of these principles, our vision, point in the direction of an infrastructure that supports seamless access, use, reuse, and trust of data. It suggests a future in which the data infrastructure becomes invisible, and the data themselves have become infrastructure: a valuable asset, on which science, technology, the economy and society can advance."
One of the keywords of the debate, reuse, was highlighted by Sünje Dallmeier-Tiessen (CERN, CH), both from the perspective of research integrity and in view of the acceleration in knowledge creation that data sharing can generate. Different scientific communities have different prerequisites and specifications for dealing with data, which is why tailored models are needed to meet those different needs. The contributions of information specialists and their expertise are fundamental to achieving proper treatment of data.
Publishing, searching and finding data: what's new?
During the two-day conference, some key issues in data treatment arose and many current and innovative practices were shared:
- New balances between record keeping and knowledge transfer: in the print age, both of these tasks were assigned to publishing. In the Web age, with a new article appearing in PubMed every 36 seconds, the exigencies of record keeping and the credit/acknowledgement economy, which drive the existing publishing market, can no longer meet the exigencies of knowledge transfer, simply because there is too much to read. Jan Velterop (Concept Alliance) proposed a fascinating model of "nanopublications", based on triplets of concepts describing the content. These are at the same time assertions and references: they provide an immediate label for a paper, but also allow serendipitous discovery of unexpected correlations between different and apparently unrelated works.
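A nanopublication's core is such a subject-predicate-object triplet. The following sketch (with invented example assertions; real nanopublications are expressed in RDF) shows how indexing triplets by their concepts lets assertions from unrelated papers surface together:

```python
from collections import defaultdict

# Each nanopublication's core assertion, reduced here to a plain
# (subject, predicate, object) tuple. The assertions are invented examples.
assertions = [
    ("malaria", "is_transmitted_by", "Anopheles mosquito"),
    ("artemisinin", "treats", "malaria"),
    ("Anopheles mosquito", "is_found_in", "sub-Saharan Africa"),
]

# Index every triple under both its subject and its object, so that
# works sharing a concept can be discovered even when the underlying
# papers never cite each other.
index = defaultdict(list)
for s, p, o in assertions:
    index[s].append((s, p, o))
    index[o].append((s, p, o))

# Looking up "malaria" surfaces both the transmission and the treatment
# assertions: a tiny instance of the serendipitous linking described above.
related = index["malaria"]
```

The design point is that the triplet is machine-comparable where a full-text abstract is not, which is what allows correlations to be found without anyone reading everything.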
- New integration between data and scholarly communication: presenting the current landscape of scholarly communication in Germany, Anita Eppelin (German National Library of Medicine, D) underlined two different methods for designing a future scenario: the institutional and the "grassroots" approaches. For the first approach, a Commission for the Future of the Information Infrastructure was set up in 2009 in Germany, working on licences, Open Access, Open Data, digitisation, long-term preservation and virtual research environments. For the second approach, there are promising scientist-driven, bottom-up projects such as the Pangea Publishing Network for Geoscientific & Environmental Data, the European Virtual Observatory and the European Southern Observatory in astronomy, ArtsHumanities.net as a hub for the digital Humanities, and CESSDA (Council of European Social Science Data Archives), an umbrella organisation for social science data archives across Europe, each facing the new challenges of data publishing. The underlying hope is that Open Access and Open Data will promote each other.
- Data publishing: although scholarly communication is, by common consent, more and more data-driven and data-intensive, data have not always been considered an integral part of a scientific article, as both Olaf Siegert (German National Library of Economics, D) and Gert G. Wagner (German Data Forum, D) highlighted. Datasets are often held in an external database with no connection to the original article. Martin Rasmussen (Copernicus Publications) presented some examples of scientific journals linking article and dataset via DOI, and an innovative new Open Access publication, Earth System Science Data (ESSD), launched in 2009. Its scope is to publish articles on original research datasets, furthering the reuse of high-quality data. A competitive advantage of this journal is the adoption of peer review specifically for data. Like all Copernicus publications, it applies the Public Peer Review process: the article is accepted after a rapid preliminary review and placed in a special section of the journal's homepage, at which point it is already readable and citable; it is then open for eight weeks to readers' comments ("public peer review") and to traditional reviewers, and is eventually revised by the author on the basis of the comments received and published in the journal issue. This process addresses both the need for rapid publication and for thorough examination, combining traditional peer review with the immediate dissemination of the Open Access paradigm.
- Findability: how many datasets are already available on the Web but somehow lost in cyberspace? Toby Green (OECD) gave a useful example of the integration of books, journals, and even datasets and tables in a single search engine within the OECD library, each with equal bibliographic status. Jan Brase (German National Library of Science and Technology) called for libraries to open their catalogues to non-textual materials, and presented DataCite, a consortium of 12 institutions worldwide that aims to promote the use of persistent identifiers for datasets. By assigning DOI names to datasets, data become citable and can easily be linked to from scientific publications; data integration within the text is an important aspect of scientific collaboration.
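The citability that DOI names give to datasets can be illustrated with a small sketch. The dataset record and the citation layout below are hypothetical illustrations, not DataCite's prescribed format, but resolution through the doi.org proxy is the standard DOI mechanism:

```python
def doi_to_url(doi):
    """Turn a DOI name into its resolvable form via the global DOI proxy."""
    return "https://doi.org/" + doi

def cite_dataset(creator, year, title, publisher, doi):
    """Format a dataset citation with a persistent link. The element
    order here is an illustrative assumption, not a DataCite rule."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_to_url(doi)}"

# Hypothetical dataset record, for illustration only
citation = cite_dataset("Smith, J.", 2010, "Baltic Sea temperature profiles",
                        "Example Data Centre", "10.1234/example.5678")
```

Because the DOI, unlike a plain URL, is registered with resolvable metadata, the same identifier can serve both as the citation target in an article's reference list and as the stable link back to the data.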
I am deeply indebted to Alma Swan. She knows all the reasons why.
About the Author
Elena Giglia has been working in academic libraries since 1991, first at the University of Milan and then at the University of Turin (Economics Library, Central Medical Library). She is now assigned to the Library System of the University of Turin. Her interests are in Open Access, electronic publishing, biomedical information-seeking strategies, and the integration of information sources and e-learning systems.