The Five Stars of Online Journal Articles — a Framework for Article Evaluation

Search D-Lib:

D-Lib Magazine

January/February 2012
Volume 18, Number 1/2
Table of Contents

The Five Stars of Online Journal Articles — a Framework for Article Evaluation

David Shotton
Department of Zoology, University of Oxford
david.shotton@zoo.ox.ac.uk

doi:10.1045/january2012-shotton

Printer-friendly Version

Abstract

I propose five factors — peer review, open access, enriched content, available datasets and machine-readable metadata — as the Five Stars of Online Journal Articles, a constellation of five independent criteria within a multi-dimensional publishing universe against which online journal articles can be evaluated, to see how well they match up with current visions for enhanced research communications. Achievement along each of these publishing axes can vary, analogous to the different stars within the constellation shining with varying luminosities. I suggest a five-point scale for each, by which a journal article can be evaluated, and provide diagrammatic representations for such evaluations. While the criteria adopted for these scales are somewhat arbitrary, and while the rating of a particular article on each axis may involve elements of subjective judgment, these Five Stars of Online Journal Articles provide a conceptual framework by which to judge the degree to which any article achieves or falls short of the ideal, which should be useful to authors, editors and publishers. I exemplify such evaluations using my own recent publications of relevance to semantic publishing.

1. Introduction

Many people will be familiar with Tim Berners-Lee's Five Stars of Linked Open Data (Text Box 1), incremental steps that categorise the publication of open data on the Web in levels of increasing usefulness, that encapsulate the present shared vision of the Semantic Web as a Web of Linked Open Data, and that individuals can use to rate their own data publication.

Tim Berners-Lee's Five Stars of Linked Open Data (from Berners-Lee, 2009)
	Make your data available on the web (whatever format), but with an open licence, to be Open Data.
	Make them available as machine-readable structured data (e.g. excel instead of image scan of a table).
	As (2), but use non-proprietary formats (e.g. CSV instead of excel).
	All of the above, plus: Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff.
	All the above, plus: Link your data to other people's data to provide context.

To complement these, I wish to propose the Five Stars of Online Journal Articles, in particular to characterize the potential for improvement to the primary medium of scholarly communication made possible by Web technologies. The background to these Five Stars of Online Journal Articles involves semantic publishing, considerations of the future of research communications, and the Semantic Web itself.

The Semantic Web

While proponents of the Semantic Web have on occasion appeared to resemble Old Testament prophets, whose messages of truth went unheeded by the general populace, uptake by a number of influential parties such as the BBC and skilful marketing of Semantic Web concepts under the banner of 'Linked Data' have recently brought Semantic Web technologies into more widespread acceptance. The principles are quite simple. If entities and their relationships can be identified and defined in machine-readable form by the use of unique URIs referencing publicly available and commonly accepted structured defined vocabularies (ontologies), and if each of these relationships is expressed as a simple subject — predicate — object statement (a 'triple'), following the syntax of the Resource Description Framework (RDF), then such statements can be combined into interconnected information networks (RDF graphs) in which the truth content of each original statement is maintained, thereby creating a web of knowledge, the Semantic Web.

Ontological descriptions of entities and their relationships enable data from independent sources to be integrated without ambiguity or loss of precision of meaning, a situation that would be impossible if the entities were to be described in other ways such as XML, where the lack of universally agreed meanings for markup terms frequently leads to confusion with respect to synonyms (for example, whether "creator" in one schema is equivalent to "composer" or to "choreographer" in another) or homonyms (for example, the potentially different meanings of the markup term "gift", meaning "present" in an English database but "poison" in a German one).

There now exist many powerful examples of how use of Semantic Web technologies 'under the hood' permit integration into unified and coherent services of data originally encoded using non-compatible metadata models and housed in heterogeneous databases. The best example with which I have personally been involved is CLAROS, "The World of Art on the Semantic Web", in which information describing ancient art objects housed in the world's museums is integrated from a number of scholarly sources (Kurtz et al., 2009). The benefits of Semantic Web technologies for libraries were recently discussed at the 2011 annual Semantic Web in Libraries meeting entitled Scholarly Communication in the Web of Data, held in Hamburg, Germany, in November 2011.

Semantic publishing

Journal publication, as the primary dissemination channel and public record of new research results, is a vital ingredient of the scholarly workflow, and its key commodity, the original research article, is of primary importance, since it provides a dated 'version of record' of the authors' hypotheses, supporting results and conclusions at the time of publication, validated by peer review, and as such becomes an immutable part of the scientific record. The basic format of the scientific journal article has changed little since its inception some 350 years ago. It remains a linear rhetorical narrative, in which authors attempt to persuade readers of the correctness of specific hypotheses by the presentation of experimental evidence selected from larger bodies of data.

At present, the majority of journal publishers use the Internet simply as a convenient mechanism for distributing journal articles in PDF format, providing electronic facsimiles of printed pages. While PDF documents are convenient for printing and off-line reading, the typical lack of any form of semantic enhancement or user interactivity, and the difficulties they present for machine interpretation, presently inhibits the development of automated services that could enrich the content of journal articles or link information between articles.

However, various initiatives have recently started to change this status quo, exploring how the Web can be used to enrich online scholarly communications in various ways that are not possible in print. For example, exemplars bearing semantic enhancements have been made of HTML versions of journal articles (Shotton, 2009; Shotton et al., 2009), text-mining Web services have been created that can automatically add semantic markup to named entities within HTML text (Pafilis et al., 2009) or pull back contextual information from cited papers (Wan et al., 2010), and 'smart' PDF readers such as Utopia Documents have been developed that provide annotation overlays to enrich the otherwise-static content of PDF articles (Attwood et al., 2010). Silvio Peroni and I have developed the SPAR (Semantic Publishing and Referencing) Ontologies to facilitate such developments (Shotton, 2010; Peroni and Shotton, 2011), and publishers, including the Royal Society of Chemistry's Project Prospect, Elsevier's Article of the Future, and Pensoft Journals, are starting to provide semantically enriched journal articles as part of their routine publishing workflows.

The publication of journal articles with such enhancements has come to be known as 'semantic publishing', a term that I define as the use of simple Web and Semantic Web technologies:

to enrich the content of an online research article — for example by means of interactive figures, re-orderable reference lists, semantic lenses (e.g. showing graphs when mousing over tables of numerical data, or animations when mousing over diagrams),
to enhance the meaning of the article's content — for example by means of semantic markup of named entities, with links to descriptive definitions of the meanings of terms and concepts, and to additional information concerning these entities (e.g. to Protein Database entries from particular protein names),
to provide links to other information sources of relevance to the article — for example authors' home pages, reagent suppliers' Web sites, and international organizations of relevance (e.g. to the World Health Organization from within an epidemiology paper),
to provide direct links to all of the article's cited references,
to provide access to the data within the published article in actionable form — for example as a downloadable spreadsheet or CSV file,
to link articles to the full research datasets that underpin them,
to facilitate integration of the article's data with semantically related scientific information elsewhere in the literature or on the Web, and
to aid discovery by openly publishing machine-readable descriptive metadata — for example detailing the article's bibliographic record, summarizing its content, and providing bibliographic details of its cited references.

The goal of such semantic publishing is that the data, information and knowledge described in the online article can more easily be found, extracted, combined and reused.

The future of research communication

Four key meetings were held during 2011, bringing together academics, computer scientists and scholarly publishers to discuss the future of scholarly communication. The first of these, a workshop entitled Beyond the PDF, organized and hosted in January 2011 by Philip Bourne at the University of California, San Diego, itself built on an earlier HyPER workshop organized in May 2010 in Amsterdam by Anita de Waard of Elsevier Labs (de Waard et al., 2009). It was followed by a meeting entitled Beyond Impact, organized by Cameron Neylon of STFC at the Wellcome Trust headquarters in London in May 2011, that considered alternative metrics to the journal impact factor for the evaluation of research — and particularly researcher — merit. In August 2011, a further meeting on The Future of Research Communication, organized by Phil Bourne of UCSD, Tim Clark of Harvard University, Robert Dale of Macquarie University, Anita de Waard of Elsevier Labs, Ivan Herman of the W3C, Eduard Hovy of the University of Southern California and myself, was held in Germany as a Schloss Dagstuhl Perspectives Workshop. This led to the formation of the Force11 Community dedicated to the improvement of research communication and e-scholarship, and to the publication in October 2011 of the Force11 White Paper (Bourne et al., 2011), that was submitted as evidence both to the Royal Society's Science as a Public Enterprise project and to the UK Cabinet Office's public consultation Making Open Data Real. Most recently, in October 2011, Microsoft Research and Harvard University jointly hosted a meeting in Cambridge, Massachusetts, entitled Transforming Scholarly Communication that took these ideas forward. The thinking undertaken at these meetings has contributed significantly to the formulation of the Five Stars of Online Journal Articles.

2. The Five Stars of Online Journal Articles

I propose five factors — peer review, open access, enriched content, available datasets and machine-readable metadata — as the Five Stars of Online Journal Articles, a constellation of five independent criteria within a multi-dimensional publishing universe against which online journal articles can be evaluated, designed to characterize the potential for improvement to the journal article made possible by Web technologies.

		Peer review Ensure your article is peer reviewed, to provide assurance of its scholarly value, quality and integrity.
		Open access Ensure others have cost-free open access both to read and to reuse your published article, to ensure its greatest possible readership and usefulness.
		Enriched content Use the full potential of Web technologies and Web standards to provide interactivity and semantic enrichment to the content of your online article.
		Available datasets Ensure that all the data supporting the results you report are published under an open license, with sufficient metadata to enable their re-interpretation and reuse.
		Machine-readable metadata Publish machine-readable metadata describing both your article and your cited references, so that these descriptions can be discovered and reused automatically.

While Tim Berners-Lee's Five Stars of Linked Open Data build one upon the other, representing degrees of achievement or completeness along the single axis of online data publication, the proposed Five Stars of Online Journal Articles are complementary, forming a constellation arranged along five independent axes within a multi-dimensional publishing universe, each of which can be evaluated on its own merits. Of course, the degree of achievement along each of these publishing axes can vary, equivalent to the different stars within the constellation shining with varying luminosities.

The Five Stars of Online Journal Articles thus encapsulate a richer vision. Each star is highly desirable in its own right, but it is only by achieving them all in combination that we will truly advance scholarly communication. Let us now consider how we might score performance of individual articles against each star. My comments are addressed primarily to authors, but it should be clear to everyone that realization of these publishing goals will require the active and enthusiastic collaboration of journal publishers and editors.

2.1 Peer review

Ensure your article is peer reviewed, to provide assurance of its scholarly value, quality and integrity.

Quality assurance of journal articles has traditionally been provided by anonymous pre-publication peer review. In my own experience, peer review has always been a positive experience, the reviews being fair and the modification made in response to reviewers' comments invariably improving the overall readability and quality of the articles. However, the practice is currently being seriously challenged, for several reasons. First, the system for undertaking pre-publication peer review is inefficient and protracted, labouring under the strain of an ever-increasing number of papers being submitted for publication. Second, the academics who are expected to undertake this activity for the benefit of their fellow academics, and without payment from the scholarly publishers, are increasingly reluctant to do so, since good reviewing takes effort for which reviewers receive scant reward in terms of academic recognition, when they are under pressure from other quarters. Third, the service has been criticised for failing to achieve its objective of ensuring that those papers accepted for publication are consistently of high quality. Finally, it has on occasion provided scope for extreme academic malpractice, by giving opportunity for the reviewer to delay the publication of a competitor's work while undertaking research based on stolen ideas, thereby giving the reviewer academic credit that rightly belongs to the authors of the paper under review.

Three approaches have been proposed to improve the peer-review process and guard against such malpractice. First, that reviewer anonymity should be withdrawn, not only to reduce misconduct, but also to permit academic credit to be awarded more transparently to the majority of good reviewers who give their time to this process. Second, that the reviews should be published along with the reviewed article, so that readers can see the contributions made by the reviewers to the final text. Third, most controversially, that the process of quality assurance should be decoupled from the act of publication.

A small number of journals have now adopted fully open reviewing for all their papers, with apparently satisfactory results. However, critics of this policy point out that, at least in some disciplines such as the humanities that have less openly critical cultures, the lack of anonymity may inhibit reviewers from delivering more forthright comments, thus weakening the review process.

Since publication can now be undertaken entirely on line, at considerably reduced cost compared to printing paper journals, there is no requirement that an article be prepared in its final form before being published. Post-publication peer review enables the responsibility for reviewing to be broadened from the two or three individuals selected by a journal's editorial staff to the wider academic community, whose feedback can then be incorporated into a revised version of the paper which is then re-published.

Such post-publication peer review is considered 'light-weight' by many scientists, and has been criticized as working well for controversial or high-interest papers, but less effectively for sound papers of more limited interest, not least because readers are under time pressures of their own, and are reluctant to engage in activities for which there are no established academic reward mechanisms. However, post-publication peer review is the rigorous norm for those who publish Internet specifications and Web standards documents — RFCs (Requests for Comments) published by the Internet Engineering Task Force (IETF) and Candidate Recommendations published by the World Wide Web Consortium (W3C) — where the whole purpose of the initial publication is to make these documents available for post-publication peer review for a specified period of time, during which comments and criticisms from any interested parties are received and acted upon, before the new standards are formally agreed and published.

Papers published on line can additionally receive comments and be subject to approval ratings by readers, with the quality of the paper being determined at least in part by its perceived usefulness, although in practice the uptake of opportunities to make such comments on published articles has been limited.

These alternative possibilities enable us to evaluate the process of peer review, the first of the Five Stars of Online Journal Articles, in terms of its effectiveness and openness, using the following simply five-point scale, from 0 to 4:

	0	No peer review The article is published without pre-publication peer review, for example in Nature Preceedings or on a preprint server such as arXiv.
	1	Pre-publication peer review The article has been critiqued and reviewed by two or more appropriate experts at the invitation of the journal editor, and has been accepted for publication on the basis of the reviewers' comments and suggestions. The reviewers remain anonymous and their reviews are not published. This is the situation with many journal articles. Some journals, for example PLoS ONE, do not require reviewers to evaluate the potential importance, impact or breadth of appeal of the article, only its scientific soundness.
	2	Responsive peer review The article has been critiqued and reviewed by two or more appropriate experts (either before or after initial publication), and the author has been able to make substantive responses to these comments, by means of a rebuttal of the reviewers' criticisms sent to the journal editor, or changes to the final published version of the paper in light of the reviewers' comments and suggestions. The reviewers remain anonymous and their reviews are not published. Most journals adopt this policy for their articles.
	3	Post-publication peer review In addition to responsive peer review of a journal article, readers are able to post comments after publication of the article. These may extend the discussion within the article, draw attention to additional relevant findings, or challenge the author's conclusions. These comments are available to other readers of the article, and the author is alerted and able to respond to them, thus constituting post-publication peer review. PLoS (Public Library of Science) journals, among others, enable this for their articles. Other forms of post-publication peer review are also possible, as for examples in RCFs and Candidate Recommendations.
	4	Open peer review The whole review process is entirely transparent. Each submitted manuscript is immediately made available on the journal's website. Reviews and comments from readers are welcomed, and are considered alongside the formal peer reviews solicited from experts by the journal. All the reviews, the author's responses, and the original and final versions of the article are published, and the appointed reviewers and editors are acknowledged by name in the final version. BMJ Open practices such open peer review, and the Semantic Web Journal strongly encourages it, while allowing reviewers the option of remaining anonymous.

2.2 Open Access

Ensure others have cost-free open access both to read and to reuse your published article, to ensure its greatest possible readership and usefulness.

The most fundamental change that the Internet has brought to scholarly publishing in recent years, over and above the move from print to online provision of journal articles, and the greatest challenge to the traditional business models of subscription access publishers, has been the growth of open access (OA) provision, in which articles are made available to readers without subscription or fee barriers. Without the technical possibility of using the Internet to deliver content cheaply, the open access movement would have been still-born.

As with peer review, varying degrees of access openness are possible, and careful distinctions have to be made. In particular, an open access article may be available to read without payment, but such an article may remain covered by copyright and license restrictions. These prevent all forms of transmission, reproduction and reuse beyond that allowed by the 'fair use' or 'fair dealing' principles of copyright law, thereby preventing reuse of the content for text mining, for the production of derivative works, or for commercial purposes, without the written permission of the copyright owners.

The nomenclature used to characterise different types of open access is confusing and variably employed. My understanding in this area has been guided by two particularly helpful blog posts on this subject by Peter Suber (2008) and Peter Murray-Rust (2011), who clearly distinguish two orthogonal axes of classification:

The location of the open access copy of the article, characterised as green versus gold: green open access means the OA article is available without payment from a site other than the journal Web site, e.g. from an institutional repository; while gold open access means that the article is available without payment from the journal's own Web site.

The type of open access, characterised as gratis versus libre: gratis open access signifies removal of the price barrier alone, giving a right to view the article; while libre open access signifies removal both of the price barrier and at least some of the permission barriers limiting reuse, giving a right to use the article.

While both imply 'free' (a potentially ambiguous word), gratis open access equates to 'free as in beer', while libre open access equates to 'free as in speech'. Gratis open access is thus a necessary but not a sufficient condition for libre open access.

The fundamental Open Access declarations that relate to scholarly publishing — the 2002 Budapest Open Access Initiative, the 2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, and the 2003 Bethesda Statement on Open Access Publishing — all defined OA in 'libre'-oriented prose, but many publishers' definitions of open access equate only to gratis open access. Clarity of what it means to be fully open is given by the Open Definition of the Open Knowledge Foundation:

"A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike."

Both green open access and gold open access articles can be either gratis open access or libre open access. Libre open access is most clearly specified by use of an explicit license, such as the Creative Commons Attribution License, that states clearly what rights are given to the reader/user of the article, or by the use of a rights waiver that places the article in the public domain. Without such a clear specification, it is wise to assume that any OA article is only 'gratis open access'.

Clearly, it is difficult to bring these two orthogonal classifications together into the single evaluation scale required for the second of the Five Stars, so I have taken the conservative approach of assuming that OA articles are only gratis open access, unless otherwise stated:

	0	No open access The article is published in a subscription access journal, inaccessible to those who lack personal or institutional subscriptions. The author is typically required to transfer copyright to the publisher, and self-archiving of the published article in an institutional repository or elsewhere is not permitted.
	1	Self-archiving green/gratis open access The subscription access journal permits the author to self-publish a preprint, the post-peer-review 'postprint', or a copy of the publisher's PDF of the article in an institutional repository or elsewhere without an embargo period, allowing third parties the chance to read the text without charge, although not to reuse it freely. The copy of the article on the journal web site remains available only to subscribers.
	2	Funder-mandated green/gratis open access In response to a mandate from the funder of the research project described in the article, and upon payment of a fee by the funding agency to the publisher, the publisher permits a copy of the article to be deposited in PubMed Central, allowing third parties the chance to read the text without charge, although not to reuse it freely. The copy of the article on the journal web site remains available only to subscribers.
	3	Author-pays gold/gratis open access In exchange for payment of a fee by the author (or the author's institution) to the publisher, the journal will publish the article on the journal web site so that third parties can read the article without payment, although they are not free to reuse the content. Journals that permit author-pays gold/gratis open access may additionally permit self-archiving of the published article in an institutional repository or elsewhere as green/gratis open access.
	4	Author-pays gold/libre open access In exchange for payment of a fee by the author (or the author's institution) to the publisher, the journal will publish the article on the journal web site under a Creative Commons or similar attribution license, making the article freely available for reading and for at least some forms of reuse, including self-archiving, on the condition that attribution is provided to the publisher's 'version of record'.

These categorizations require further explanatory comment:

No open access

A few subscription access journals, mostly in the biomedical sciences, permit the author to upload the published version of the article to an institutional repository or personal Web site (i.e. self-archiving green/gratis open access) after an embargo period of typically six to twelve months. Information about this is given by SHERPA/RoMEO. However, timely access to newly published research information remains strictly limited to journal subscribers.

Self-archiving green/gratis open access

In physics, mathematics and computer science, use of Cornell University's ArXiv preprint repository is the norm. ArXiv is an exemplary repository, since all its content is either made available under the Creative Commons Attribution license or the Creative Commons Attribution-Noncommercial-ShareAlike license, or has been placed in the public domain by association with the Creative Commons Public Domain Declaration. Thus clearly ArXiv content is green/libre open access.

However, for most research disciplines, where there is no culture of depositing preprints in a single subject-specific open archive before submission to journals for publication, green open access is a poor fourth among the open access choices. This is both because of the difficulty that potential readers have in finding the open access versions of articles scattered across institutional repositories (although new cross-repository search services such as CORE are improving that situation), and because the license arrangements for reuse of such content are typically unclear.

Consider, for example, the Oxford Research Archive, the institutional repository of the University of Oxford. ORA provides a helpful Copyright Guide for the benefit of authors depositing works in ORA, in which the copyright restrictions that publishers may place on published works are discussed, and where the possibility of using a Creative Commons licence is mentioned. However, its guidance to readers concerning ORA content is as follows:

"The full text of many of these items is freely available to be used in accordance with copyright and end-user permissions." (My emphasis)

Eprints Soton, the University of Southampton Institutional Research Repository, carries an identical statement, while Dspace@Cambridge, the institutional repository of the University of Cambridge, has a more restrictive blanket statement:

"Copyright and other intellectual property rights subsist in this site, in the Deposited Works and in any accompanying documentation and metadata. Unless otherwise noted, Deposited Works in DSpace@Cambridge are made freely available for access, printing and download for the purposes of non-commercial research or private study only. You may not further copy, reproduce, publish, ... or otherwise use a Deposited Work in whole or in part or in any manner or in any media without the express written permission from the appropriate rights owner(s) of the Deposited Work(s)." (My emphasis)

The metadata for the most recently deposited ORA research article, a PDF copy of Knight et al. (2011), The Puzzle of Migrant Labour Shortage and Rural Labour Surplus in China, contains no information about the open access status of this article, which one must therefore assume to be only gratis open access. That is confirmed by going to the original article on the Elsevier journal web site, China Economic Review 22 (4): 585-600 (December 2011) doi:10.1016/j.chieco.2011.01.006, where there is a link "Permissions and Reprints" that takes you to a page entitled "RightsLink Copyright Clearance Centre". There one can calculate the cost of reusing the article for purposes other than reading for research or private study. To use 15 print copies of the article as course material for teaching within the University of Oxford would cost £26.74, while to use 15 print copies of the article for training purposes within a commercial organization would cost £384.34.

One must assume that the situation is similar in other institutional repositories, and that items are available only as green/gratis open access unless there is an explicit libre open access license. Therefore, as Peter Murray-Rust concludes: "By default, unless the author/self-archivist makes a special effort, the reader (of an item in an institutional repository) has no rights of use over the deposited item." (Murray-Rust, 2011).

Funder-mandated green/gratis open access

Although the fee paid by the funding agency to the publisher to deposit a copy of the article in PubMed Central is substantial (typically $3,000-$5,000 per article), it is important to realize that this gains the reader only gratis open access rights. In particular, content is not available for text mining. Of the entire content of PubMed Central, some 2.3 million articles, only about 10% are within what PubMed Central terms the Open Access Subset, having some form of libre open access license — these coming mainly from publishers that themselves have a gold/libre open access policy. It is from the reference lists of those articles, rather than the full PubMed Central corpus, that we have created the Open Citations Corpus of some 6.3 million bibliographic references, referencing about 20% of all the biomedical articles published between 1950 and 2010, including the most important papers in every discipline. Published under a libre Creative Commons attribution license and expressed in RDF, these citations are available for human scrutiny and for automated querying via a SPARQL endpoint, and the entire corpus can also be downloaded for re-use.

Author-pays gold/gratis open access

The fees for author-pays gold/gratis open access are typically substantial — in the range of $500-$3,250 per article. The SHERPA/RoMEO Web site provides details. Since my Five Stars are designed to operate at the article level rather than the journal level, I do not distinguish here between articles that are made open access on an individual basis within what is otherwise a subscription access journal (sometimes called a 'hybrid' open access journal), and articles within a 'true' open access journal in which all the articles are open access. Elsevier calls the former arrangement 'sponsored access', and a journal in which all the articles are open access an 'author pays journal'. At the individual article level, however, there is no difference: the author pays a fee, and everyone can read the article freely on the publisher's Web site.

Rather, the issue is whether or not readers have reuse rights over the article, or whether only gold/gratis open access is granted. For example, Elsevier's policy concerning author-pays open access articles is stated on its Terms and Conditions page, which one can reach by clicking the Terms and Conditions footnote on the home page of its only open access journal, International Journal of Surgery Case Reports. Among other restrictions, this states:

"All content contained on or accessed from the Site ... is owned by Elsevier or its licensors and is protected by copyright, trademark and other intellectual property and unfair competition laws. You may not copy, display, distribute, ... or create other derivative works from ... all or any part of the Content ... except as otherwise expressly permitted under these Terms and Conditions, relevant license or subscription agreement or authorization by us. Unless expressly authorized by us, you may not ... automatically search, scrape, extract, deep link or index any Content."

Clearly, in exchange for the ~$3,000 Elsevier article sponsorship fee or author-pays journal article fee, readers are only getting gold/gratis open access. This low value for money is perhaps reflected in the fact that, in 2009, from within the 450 Elsevier journals offering 'sponsored access' to their articles, only 515 'sponsored access' articles were published.

Author-pays gold/libre open access

The fees for author-pays gold/libre open access are also typically substantial — again in the range of $500-$3,250 per article, but in this case the publisher allows third parties both to read all its articles on the journal Web site free of charge, and to reuse the content. As Peter Suber (2008) points out, libre open access encompasses a range of possibilities, corresponding to which permissions for reuse have been granted. For example, it might be possible to use the content of the article to create a derivative work, but not if it is to be used for commercial purposes. The scope for reuse is determined by the nature of the license under which the article is published. It is thus an over-simplification to call libre open access 'full open access'.

BioMed Central and The Public Library of Science (PLoS), the two major publishers of OA journals in the biomedical sciences, both use the most permissive open access attribution license for all the works they publish. (The BioMed Central Open Access license agreement bears a different name, but is otherwise identical to the Creative Commons Attribution License employed by PLoS.) Under that license, authors retain ownership of the copyright for their content, but allow third parties to download, reuse, reprint, modify, distribute, and/or copy the content for any purpose including commercial, as long as the original authors and source are cited. No permission is required from the authors or the publishers. This is clearly the most helpful situation for potential reusers of published articles. The semantic enhancements we were able to apply to the Reis et al. (2008) article in PLoS Neglected Tropical Diseases (Shotton et al., 2009) were made possible because the article was published under such a license.

2.3 Enriched content

Use the full potential of Web technologies and Web standards to provide interactivity and semantic enrichment to the content of your online article.

Web technology can be used to provide various semantic enhancements of scholarly journals articles, links to external information sources of relevance to the textual context, and different types of user interactivity, as outlined in the Semantic publishing section of the Introduction.

	0	No enhancements The article is published on line as an HTML or PDF document with no features beyond those that would be found in the print edition of the same article.
	1	Active Web links The online article contains functional Web links to information and Web sites of direct relevance to the article, for example to authors' home pages, suppliers' catalogues, databases and cited articles.
	2	Semantic enrichment of the text Key terms and concepts within the text are identified and distinguished, for example with mouse-over pop-ups providing semantic definitions, formulae, links to database entries, etc. pulled by live Web services. Reference lists have citation typing.
	3	'Lively' content The article contains interactive figures, semantic lenses revealing numerical data beneath graphs, pop-ups providing excerpts from cited papers relevant to the textual citation contexts, re-orderable reference lists, and other forms of interactive content.
	4	Data fusions ("mash-ups") Data within the article are integrated with pre-existing information (e.g. with similar data from other articles or databases), geographical location data from within the article are available as KML files for visualization in Google Maps, etc.

Since the various types of semantic enrichment possible and some of the means for achieving them have been detailed elsewhere (Shotton et al., 2009; Shotton and Portwin, 2009), they will not be discussed further here. As stated above, several publishers and journal editors are undertaking such enrichments. However, these would best be achieved during authoring. When writing an article, authors can easily achieve quick wins in terms of functionality by ensuring plentiful links to external Web resources are provided (e.g. to their own home pages, to reagent suppliers' catalogues, and to cited articles). An open source plugin to Word 2007 has been published that permits semantic markup of named entities according to chosen ontologies (Fink et al., 2010), and it is hoped that other such semantic authoring tools will soon become available.

2.4 Available datasets

Ensure that all the data supporting the results you report are fully published under an open license, with sufficient metadata to enable their re-interpretation and reuse.

Through the Brussels Declaration of STM Publishing, academic publishers have strongly endorsed the principal that research data relating to journal articles should be made freely available, to enable inspection of the data and validation of the claims made in the article, and to permit data reuse in other contexts. Particularly if the research has been undertaken with public funding, it is now increasingly held that research data should be regarded as a common good (Boulton et al., 2011; Wood et al., 2010), and mechanisms to facilitate their publication are being proposed (Greenberg et al., 2009; Van der Graaf and Waaijers, 2011; Bourne et al., 2011). However, in this commendable enthusiasm for openness, it is important to acknowledge the personal time and effort invested by the researchers who discover or create the data, and their moral right to have the first chance to explore, publish on and benefit academically from the data, before publishing them for the benefit of others.

It is also important to emphasize that the term 'data' should be interpreted here in very general terms, to encompass any outputs from a research investigation over and above the text of journal articles. Thus 'data' can include images, sound recordings and videos, graphs and diagrams, animations and simulations, mathematical models, protocols and workflow, and software, as well as numerical datasets.

The principles of how best to make data available on the Web have already been described by Tim Berners-Lee in his Five Stars of Linked Open Data. Some overlap is inevitable, but the following ratings are intended to reflect the nature of the data made available, and where, when and to whom that availability is granted.

	0	No published data The only data available are those that can be obtained by the reader from within the article itself. Figures and tables are not separately available for download, nor are any supporting datasets.
	1	Supplementary information files available Supplementary information files are available from the journal Web site, and/or the figures and tables containing research data within the article are available for download. However, the formats for these entities are not optimised for reuse — for example, figures and tables from PLoS journal articles are only available in TIFF or PNG image format.
	2	Article data downloadable in actionable form The data contained within the figures and tables of the article, and within its supplementary information files, are available in appropriate actionable formats, for example numerical data in downloadable numerical spreadsheets or CSV files.
	3	Underlying datasets published The full research datasets created during the research project, from which the sub-set of data included within the published article has been selected, are published in a permanent archive or repository, with a unique resolvable identifier (e.g. a URI or a DataCite DOI), with an open access data license or a CC Zero waiver and public domain dedication, and with sufficient descriptive metadata to enable their re-interpretation and reuse.
	4	Data available to peer-reviewers In cooperation with the journal, the datasets supporting a journal article are made available to peer reviewers, to assist in evaluation of the article. This is usually achieved privately, prior to the publication of these datasets at the same time as the article.

Where data are published is of great importance. Authors should bear in mind the very unsatisfactory nature of journal supplementary information files as repositories for valuable research data, in terms of openness, discoverability, curation, and reliable persistence (Evangelou et al., 2005; Anderson et al., 2006; Smit, 2011). As safer havens for published data, they should look instead to institutional repositories or, better, to subject-specific databases and repositories. For example, the Dryad Data Repository curates biological datasets linked to peer-reviewed journal articles, makes them available pre-publication to peer reviewers, and then publishes them, either at the same time as the article or after an optional embargo period, under a Creative Commons CC Zero open data waiver, with DataCite DOIs to permit proper citation and the award of academic credit.

2.5 Machine-readable metadata

Publish machine-readable metadata describing both your article and your cited references, so that these can be discovered automatically.

To date, publishers have employed a variety of proprietary XML-based informational models and document type definitions (DTDs) to mark up component parts of an electronic document (title, author list, abstract, etc.) in ways that assist the publishing process, but all too often even these basic metadata are not made available to readers, who are given only a PDF version of the article.

Modern Web information management techniques employing W3C standards such as RDF and OWL2 permit such information to be encoded using standard vocabularies in ways that permit computers to query metadata and integrate Web-based information from multiple resources in an automated manner. The SPAR (Semantic Publishing and Referencing) Ontologies are just some of the vocabularies being used for this purpose to describe scholarly publications (Peroni and Shotton, 2011).

Using these Web standards and vocabularies, it is possible to provide semantic descriptions of the structural and rhetorical components of the article using DoCO, the Document Components Ontology, and to create and publish machine-readable RDF metadata that describe the journal article itself, i.e. that encode the standard bibliographic information defining the article (authors, publication year, title, journal name, volume number, page numbers, DOI, etc.) using FaBiO, the FRBR-aligned Bibliographic Ontology and BiRO, the Bibliographic Reference Ontology. It is also possible similarly to encode bibliographic information for all the references within the article's reference list, and to use CiTO, the Citation Typing Ontology, both to assert the existence of a citation between the citing and the cited papers (i.e. <Paper A> cito:cites <Paper B> . ) and also to characterise the type or nature of that citation both factually and rhetorically (Shotton, 2010; Peroni and Shotton, 2011).

Of course, machine-readable metadata need not stop there. There are a growing number of checklists and minimum information standards specifying the information that should be included within research publications, or defining the metadata to describe articles or datasets within particular domains. One such example is MIIDI, a Minimal Information standard for reporting an Infectious Disease Investigation. Using the MIIDI Editor, metadata may be structured according to MIIDI to describe an infectious disease investigation and its research outputs, including journal articles and research datasets. For the former, the metadata can include statements about the main hypotheses of the research investigation and the principle conclusions described in the article, in addition to providing factual statements concerning the nature of the disease, the number of patients, etc.

The availability of article metadata can be rated on the following scale:

	0	No available metadata The article is published as a PDF document only. The XML markup used by the publisher during the article production, editing and publication workflow is not published.
	1	Structural markup available The XML markup of the structure of the document using the publisher's DTD, denoting title, author list, abstract, etc., is included in the XHTML version of the online article.
	2	Bibliographic and citation metadata available Full bibliographic metadata for the article and full citation metadata for its reference list are published as downloadable machine-readable files, or as linked open data in a triple store.
	3	Rich embedded markup Additional structural, rhetorical and semantic markup is available embedded within the online article, encoded as RDFa or in a similar machine-readable format.
	4	Structured article summary A machine-readable summary of the key facts, hypotheses, data and conclusions of the article is made freely available in both human- and machine-readable form, based on a minimal information standard appropriate for the domain.

There are several ways in which such metadata may be made available. As indicated above, structural markup may be included within the XHTML document itself. By using RDFa, it is also possible to embed semantic markup within the Web document in such a way that these machine-readable metadata become part of the Web of Linked Open Data. Other possibilities of embedded markup exist using microdata within HTML5 documents. Alternatively, bibliographic and citation metadata can accompany the relevant journal article as supplementary online RDF files: such files accompany Shotton (2010) and our enhanced version of Reis et al. (2008). However, as for the research datasets relating to the article, it is advantageous if the relevant metadata files are also submitted to appropriate linked open data repositories, such as those of the Open Bibliography Project and the Open Citation Corpus.

Detailed metadata describing the content of a paper can form the basis for a structured digital summary describing the essence of an article in both human- and machine-readable form, which can be published as an Open Research Report in an open access data journal (more strictly, in this case, a 'metadata journal'), while individual factual statements from a paper can be published as nanopublications (Groth et al., 2010).

3. Evaluating published articles against the Five Stars

While the criteria adopted for the evaluation scales presented in Section 2 are somewhat arbitrary, and while the rating of a particular article on each axis may involve elements of subjective judgments, these Five Stars of Online Journal Articles provide a conceptual framework by which to judge the degree to which any article achieves or falls short of the ideal, which should be useful to authors, editors and publishers, who should now ask themselves:

"How do my online journal articles rate against these Five Stars?"

As an exercise in 'drinking my own champagne', I have evaluated the Reis et al. (2008) article as it was before and after our semantic enhancements, and also my own recent publications, including this article, to provide exemplars. Each is rated with respect to each of the Five Stars of Online Journal Articles on the five-point scales given in Section 2. I present the results both by means of constellation diagrams, within which the stars have different magnitudes, and in tabular form, with an overall numerical rating for each paper. (Full bibliographic details of the following papers are given in the References section.)

Reis et al. (2008).

Journal: PLoS Neglected Tropical Diseases. Publisher: Public Library of Science.

This research paper contains different types of analysed data concerning the risk factors of contracting the disease leptospirosis for inhabitants of an urban slum in Salvador, Brazil. The underlying unpublished raw datasets contain confidential information about individuals' health, financial, familial and employment status.

Rating: Original version: http://dx.doi.org/10.1371/journal.pntd.0000228.

Peer review (P)	3	Responsive peer review, with the opportunity for readers to make post-publication comments.
Open access (O)	4	Gold/libre. Published in an open access journal with a Creative Commons attribution license, thus making possible the semantic enhancements made to the original by Shotton et al. (2009).
Enriched Content (E)	0	Article lacks useful web links in the text, and lacks direct links to most of the referenced papers.
Available datasets (A)	1	Figures and table within the article have their own DOIs, but are only downloadable as images, so that the data in the graphs and table are not available in actionable form. A Portuguese translation of the Abstract is available as a separate downloadable Word document.
Machine-readable metadata (M)	1	Structural markup is available in the XHTML version of the article.
Overall rating:	9

One week after its publication, I chose the Reis et al. (2008) paper for semantic enhancement, which was undertaken with the support of the authors and PLoS and is described in Shotton et al. (2009) and Shotton and Portwin (2009). I then republished the paper with these enhancements to act as an exemplar.

Rating: Semantically enhanced version: http://dx.doi.org/10.1371/journal.pntd.0000228.x001.

Peer review (P)	3	Responsive peer review of the original article, with the opportunity for readers to make post-publication comments.
Open access (O)	4	Gold/libre. Semantically enhanced version republished with a Creative Commons attribution license.
Enriched Content (E)	4	Of many types, as fully described in Shotton et al. (2009).
Available datasets (A)	2	Data for the table and two figures kindly provided by the authors, and made available in actionable form as downloadable Excel spreadsheets with their own DOIs. Various supplementary information files available, including Data Fusion supplements involving Google Maps and related data from other papers. Portuguese translation of the Abstract available as a separate XHTML document with semantic markup. An Article Summary is also available as a separate document. However, the statements contained within the Summary are not in machine-readable format and have no underlying metadata standard.
Machine-readable metadata (M)	3	Structural and semantic markup within the XHTML text; embedded RDFa provides basic bibliographic information; two downloadable RDF files with their own DOIs accompany the enhanced article, one giving full bibliographic metadata about the article and the second providing bibliographic details, citation typing information and citation frequencies for all the cited references.
Overall rating:	16

Shotton (2009).

Journal: Learned Publishing. Publisher: Association of Learned and Professional Society Publishers.

This paper describes and reviews the state of semantic publishing. There are no numerical data in the paper.

Rating:

Peer review (P)	2	Responsive peer review.
Open access (O)	1	Self-archiving green/gratis. Published in a subscription-access journal that permits authors to publish postprints elsewhere.
Enriched Content (E)	1	Plentiful web links within the text, and direct links to all referenced articles. No other forms of semantic enrichment.
Available datasets (A)	0	Not applicable for this article.
Machine-readable metadata (M)	0	None. Article only available in PDF.
Overall rating:	4

Shotton et al. (2009).

Journal: PLoS Computational Biology. Publisher: Public Library of Science.

This paper describes the semantic enhancements applied to Reis et al. (2009). As such, it contains no primary research data of its own.

Rating:

Peer review (P)	3	Responsive peer review, with the opportunity for readers to make post-publication comments. (N.B. The editor and one of the reviewers, who revealed her identity to the authors, went to exceptional lengths to help the authors improve the paper, involving repeated rounds of reviewing.)
Open access (O)	4	Gold/libre. Published in an open access journal with a Creative Commons attribution license.
Enriched Content (E)	1	Plentiful web links in the text, direct links to all referenced articles, figures bearing their own DOIs, and links to examples of semantic enhancement in our enhanced version of Reis et al. (2008). No semantic enhancements to the paper itself.
Available datasets (A)	1	Two supplementary information files having their own DOIs published with this article, giving technical details of the semantic enhancements made to Reis et al. (2008).
Machine-readable metadata (M)	1	Structural markup available in the XHTML version of the article.
Overall rating:	10

Shotton (2010).

Journal: J. Biomedical Semantics. Publisher: BioMed Central.

Rating:

Peer review (P)	3	Responsive peer review, with the opportunity for readers to make post-publication comments.
Open access (O)	4	Gold/libre. Published in an open access journal with a permissive attribution license.
Enriched Content (E)	1	Plentiful web links within the text, direct links to all referenced articles, and figures and tables bearing their own DOIs, but lacking semantic markup of text and lacking any interactivity.
Available datasets (A)	4	CiTO ontology available online, with a downloadable human-readable supplementary information file with its own DOI providing further explanations, all available to peer reviewers.
Machine-readable metadata (M)	2	Structural markup within the text available as XML. Downloadable RDF files with their own DOIs, one giving full bibliographic metadata about the article and the second providing bibliographic details, citation typing information and citation frequencies for the cited references. No semantic markup, no embedded RDFa and no article summary.
Overall rating:	14

Shotton (2012) (This article.)

Journal: D-Lib Magazine. Publisher: Corporation for National Research Initiatives (CNRI).

This article is a position paper presenting ideas (in terms of the FaBiO ontology, a fabio:proposition), and contains no research data.

While D-Lib Magazine does not undertake formal peer review of its articles, and has a policy of not publishing articles that have previously appeared elsewhere, this particular article has benefited from substantial comments made by colleagues (see Acknowledgements for further details) on a preprint of this paper that was published in Nature Preceedings in preparation for the Microsoft Research/Harvard University meeting in October 2010 entitled Transforming Scholarly Communication. I am grateful to the editorial team of D-Lib Magazine for their flexibility in accepting this article despite the fact that the preprint had already been published, since the comments received have constituted an effective post-publication responsive peer review of the preprint, and have stimulated a major revision and expansion of the text, resulting in significant enhancements to both the content and the quality of the resulting D-Lib article. As a result of these enhancements, the evaluation scales for some of the Five Stars have been amended, causing the ratings given above for Shotton (2009) and for Shotton et al. (2009) to be lowered relative to the scores given to those papers in the preprint.

Rating:

Peer review (P)	3	Post-publication responsive peer review of the preprint — see comments above.
Open access (O)	4	Gold/libre open access without author fee!
Enriched Content (E)	1	Plentiful Web links in text and to all references. No additional semantic enhancement of text.
Available datasets (A)	0	Not applicable.
Machine-readable metadata (M)	1	Structural markup in HTML only.
Overall rating:	9

The above ratings show that the nature of the article will influence the overall rating obtained. For example, reviews and position papers with no primary research data will always score low in terms of available datasets.

The Five Stars of Online Journal Articles Ontology, available from http://purl.org/spar/fivestars/, is a simple ontology written in OWL 2 DL that forms part of SPAR, a suite of Semantic Publishing and Referencing Ontologies (http://purl.org/spar/). It is intended for use by publishers and others wishing to encode Five Stars ratings, such as those shown above, in machine-readable form, so they can accompany other machine-readable metadata for the article. The following RDF graph, shown in turtle notation, gives the Five Stars ratings for this article:

@prefix fivestars: <http://purl.org/spar/fivestars/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dx.doi.org/10.1045/january2012-shotton>
	fivestars:hasPeerReviewRating "3"^^xsd:nonNegativeInteger ;
	fivestars:peerReviewRatingComment "Post-publication responsive peer review of the preprint." ;
	fivestars:hasOpenAccessRating "4"^^xsd:nonNegativeInteger ;
	fivestars:openAccessRatingComment "Gold/libre open access without author fee!" ;
	fivestars:hasEnhancedContentRating "1"^^xsd:nonNegativeInteger ;
	fivestars:enhancedContentRatingComment "Plentiful Web links in text and to all references. No additional semantic enhancement of text." ;
	fivestars:hasAvailableDatasetsRating "0"^^xsd:nonNegativeInteger ;
	fivestars:hasMachine-readableMetadataRating "1"^^xsd:nonNegativeInteger ;
	fivestars:machine-readableMetadataRatingComment "Structural markup in HTML only." ;
	fivestars:hasOverallFiveStarsRating "9"^^xsd:nonNegativeInteger ;
	fivestars:overallFiveStarsRatingComment "The nature of this article, being a position paper rather than a research paper with primary research data, has influenced the overall rating obtained." .

Ubiquity Press has indicated that it wishes to adopt such evaluations and give each of its published articles a Five Star rating. I encourage other publishers to do so too.

Acknowledgements

I am most grateful to Bob DuCharme who, inspired by Berners-Lee's Five Stars of Linked Open Data, challenged me to come up with five stars for semantic publishing, following a talk entitled Applying XML and Semantic Technologies to Liberate Infectious Disease Data that I gave at the recent Oxford XML Summer School. I thank Tanya Gray and Katherine Fletcher for feedback after reading a preliminary draft of this paper. I wish particularly to acknowledge the input made by Silvio Peroni, who insisted that I specify evaluation scales for all five stars, and whose proposals concerning peer review and open access I have incorporated; by those who participated in a brief but lively discussion of the Five Stars preprint on the Beyond the PDF mail list, particularly Cameron Neylon who suggested a radical revision of my original evaluation scale for peer review, Phillip Lord for wise remarks concerning post-publication peer review and RFCs, and Peter Murray-Rust for his insistence that the type of license under which Open Access publications are published is critical; and by Brian Hole of Ubiquity Press, both for his comments and for his general enthusiasm for the Five Star concept. Their suggestions have constituted an effective post-publication peer review of a preprint of this paper, as mentioned above, and I thank them all sincerely for taking the time and making the effort to supply these valuable critiques.

References

[1] Anderson NR, Tarczy-Hornoch P and Bumgarner RE (2006). On the persistence of supplementary resources in biomedical publications. BMC Bioinformatics 7: 260. http://dx.doi.org/10.1186/1471-2105-7-260.

[2] Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR and Thorne D (2010). Utopia documents: linking scholarly literature with research data. Bioinformatics 26: i568-i574. http://dx.doi.org/10.1093/bioinformatics/btq383.

[3] Berners-Lee T (2009). Linked data. Available at http://www.w3.org/DesignIssues/LinkedData.html.

[4] Boulton G, Rawlins M, Vallance P and Walport M (2011). Science as a public enterprise: the case for open data. The Lancet, 377: 1633-1635. http://dx.doi.org/10.1016/S0140-6736(11)60647-8.

[5] Bourne P, Clark T, Dale R, de Waard A, Herman I, Hovy E and Shotton D, on behalf of the Force11 community (2011). Force11 White Paper: Improving the Future of Research Communication and e-Scholarship. (Published 28 October 2011). http://force11.org/white_paper.

[6] de Waard A, Buckingham Shum S, Carusi A, Park J, Samwald M, and Sándor Á (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims. In: Proceedings 8th International Semantic Web Conference, Workshop on Semantic Web Applications in Scientific Discourse (26 Oct 2009, Washington DC.). Lecture Notes in Computer Science, Springer Verlag: Berlin. http://oro.open.ac.uk/18563/.

[7] Evangelou E, Trikalinos TA and Ioannidis JP (2005). Unavailability of online supplementary scientific information from articles published in major journals. FASEB J. 19: 1943-1944. http://dx.doi.org/10.1096/fj.05-4784lsf.

[8] Fink JL, Fernicola P, Chandran R, Parastatidis S, Wade A, Naim O, Quinn GB and Bourne PE (2010). Word add-in for ontology recognition: semantic enrichment of scientific literature. BMC Bioinformatics 11: 103. http://dx.doi.org/10.1186/1471-2105-11-103.

[9] Greenberg J, White HC, Carrier S and Scherle R (2009). A metadata best practice for a scientific data repository. Journal of Library Metadata 9 (3-4): 194-212. http://dx.doi.org/10.1080/19386380903405090.

[10] Groth P, Gibson A and Velterop J (2010). The anatomy of a nanopublication. Journal of Information Services and Use 30: 51-56. http://dx.doi.org/10.3233/ISU-2010-0613.

[11] Kurtz D, Parker G, Shotton D, Klyne G, Schroff F, Zisserman A and Wilks Y (2009). CLAROS — bringing classical art to a global public. Proc. IEEE e-Science Conference, Oxford, 9-11 December 2009, pp 20-27. http://doi.ieeecomputersociety.org/10.1109/e-Science.2009.11.

[12] Murray-Rust P (2011). Green and Gold Open Access? Libre and Gratis. Reasons why readers and re-users matter. Blog post.

[13] Pafilis E, O'Donoghue SI, Jensen LJ, Horn H, Kuhn M, Brown NP and Schneider R (2009). Reflect — augmented browsing for the life scientist. Nature Biotechnology 27: 508-510. http://dx.doi.org/10.1038/nbt0609-508.

[14] Peroni S and Shotton D (2011). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. (Submitted for publication). Preprint.

[15] Reis RB, Ribeiro GS, Felzemburgh RDM, Santana FS, Mohr S, Melendez AXTO, Queiroz A, Santos AC, Ravines RR, Tassinari WS, Carvalho MS, Reis MG and Ko AI (2008). Impact of environment and social gradient on Leptospira infection in urban slums. PLoS Neglected Tropical Diseases 2: e228. http://dx.doi.org/10.1371/journal.pntd.0000228.

[16] Shotton D (2009). Semantic Publishing: The coming revolution in scientific journal publishing. Learned Publishing 22: 85-94. http://dx.doi.org/10.1087/2009202. A post-print is available.

[17] Shotton D and Portwin K (2009). Technical implementation of the semantic enhancements applied to Reis et al. (2008) Impact of environment and social gradient on Leptospira infection in urban slums. PLoS Neglected Tropical Diseases 2(4): e228. Supporting Information File S1 to Shotton et al. (2009). http://dx.doi.org/10.1371/journal.pntd.0000228.x009.

[18] Shotton D, Portwin K, Klyne G, and Miles A (2009). Adventures in semantic publishing: exemplar semantic enhancement of a research article. PLoS Computational Biology 5: e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361.

[19] Shotton D (2010). CiTO, the Citation Typing Ontology. J. Biomedical Semantics 1 (Suppl. 1): S6. http://dx.doi.org/10.1186/2041-1480-1-S1-S6.

[20] Smit E (2011). Abelard and Héloise: Why data and publications belong together. D-Lib Magazine Volume 17, Number 1/2. http://dx.doi.org/10.1045/january2011-smit. (In particular, see Figure 12).

[21] Suber P (2008). Gratis and libre Open Access. SPARC Open Access Newsletter (August 2008 issue). http://www.arl.org/sparc/publications/articles/gratisandlibre.shtml.

[22] Van der Graaf M and Waaijers L (2011). A Surfboard for Riding the Wave. Towards a four country action programme on research data. A Knowledge Exchange Report. http://www.knowledge-exchange.info/Default.aspx?ID=469.

[23] Wan S, Paris C and Dale R (2010). Supporting browsing-specific information needs: Introducing the Citation-Sensitive In-Browser Summariser. Web Semantics: Science, Services and Agents on the World Wide Web 8: 196-202. http://dx.doi.org/10.1016/j.websem.2010.03.002.

[24] Wood J, Andersson T, Bachem A, Best C, Genova F, Lopez DR, Los W, Marinucci M, Romary L, Van de Sompel H, Vigen J, Wittenburg P, Giaretta D and Hudson RL (2010). Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data; A submission to the European Commission, October 2010. Available from http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf.

About the Author

David Shotton is a cell biologist by background, currently active in developing services for managing research data such as those of the DataFlow Project. He also works in semantic publishing, developing exemplar articles and the SPAR Ontologies for describing all aspects of publishing, bibliographic entities and citations. Recent developments include creation of the Open Citation Corpus, containing the bibliographic citations from the reference lists in all the Open Access articles within PubMed Central, and MIIDI, a Minimal Information standard for reporting an Infectious Disease Investigation. The intent is to use MIIDI, together with the MIIDI Editor that facilitates entry of MIIDI-compliant rich metadata, to create Open Research Reports, structured digital summaries of infectious disease datasets or research articles that can be published in an open access data journal. His other interests include using semantic technologies to assist data integration in the humanities, as exemplified by CLAROS, "The World of Art on the Semantic Web". More details are given in recent conference presentations, and in blog posts in his Open Citations and Semantic Publishing Blog.