

D-Lib Magazine
June 2003

Volume 9 Number 6

ISSN 1082-9873

DOI: A 2003 Progress Report

 

Norman Paskin
Director, International DOI Foundation
<n.paskin@doi.org>


Introduction

The International DOI Foundation (IDF) recently published the third edition of its DOI Handbook [1], which sets the scene for DOI's expansion into much wider applications. Edition 3 is not simply an updated user guide. A great deal has happened in the underlying technologies and in the practical deployment and development of DOIs (Digital Object Identifiers) since the last edition was published a year ago. Much of the program of technical work foreseen at the inception of DOIs has now been completed.

The initial simple implementation of DOI as a persistent name linked to redirection continues to grow, with nearly ten million DOIs assigned by several hundred organisations through a number of Registration Agencies in the USA, Europe, and Australasia, supporting large-scale business uses. Implementations of more sophisticated applications (offering associated services) have been developing well but on a smaller scale: a framework for building these has been completed as part of the latest release and promises to stimulate a new wave of growth. From its original starting point in text publishing, DOI has gradually been embraced by a number of communities: these include national libraries (a consortium of national libraries recently joined the IDF); government documentation (with the appointment of TSO The Stationery Office in the UK as a DOI agency and the announced intention of the EC Office of Publications to use DOIs); and non-English language markets (France, Germany, Spain, Italy, Korea). However, implementations in non-text sectors have been far slower to develop, though several are now under discussion.

The DOI community can point to several significant achievements over the past few years:

  • A practical, successful, open implementation of naming objects, treating content as information objects, not simply packets of bits;
  • The IDF's role in co-sponsoring, championing, and now implementing the <indecs>™ framework as a semantic tool for structured metadata — an essential step for treating content as information in Semantic-Web-like applications;
  • A template for building advanced applications, connecting resolution and metadata technologies, and offering hooks to web services and similar applications;
  • The development of a policy framework that allows multiple communities autonomy;
  • The practical implementation of DOIs with emerging related standards such as the OpenURL framework in contextual linking.

A number of issues remain to be solved. In the main these are no longer technical in nature, but more concerned with perception and outreach to other communities. They include: correctly positioning the DOI in the standards community as a practical implementation (based on standards, but more than standards); offering the benefits of DOI to other communities working in related identifier development whilst allowing them to remain largely autonomous; demonstrating how DOIs can complement, rather than compete with, other activities; and ensuring that a sustainable long-term infrastructure for any application (commercial and non-commercial alike) is in place.

Persistent, actionable identifiers with a fully managed sustainable infrastructure are not appropriate for every activity; but they are suitable for many, and where they are used, the key to providing a successful and widely adopted system is encouraging economy of scale (and so, where possible, convergence with other related efforts), flexibility of use, and a low barrier to use. DOI is well on the way to providing this, but success is not yet guaranteed without the further effort that is now being applied.

Historical perspective

Ten years have passed since the launch of the first popular web browser [2]. Almost immediately after its introduction, the side effects on information management of disappearing links became apparent [3] and led in 1995 to the first major effort to make resource discovery easier: the Dublin Core initiative [4, 5] (later, the 1998 launch of Google™ [6] was to change the world of resource discovery forever). In 1997 the BIC/EDItEUR paper "Unique identifiers: a brief introduction" [7] presaged the <indecs> [8] activity of 1998-2000 (which considered interoperable data across all e-commerce media, for all purposes). The Digital Object Identifier (DOI) [9] was launched five years ago to provide an extensible infrastructure for digital management of content; it was initially conceived as a tool for naming digital content for publishers, though from the outset it was designed as a generic tool. An earlier but related activity, which soon became key to DOI, was discussed in the paper by Kahn and Wilensky on digital objects [10], which examined the concept of identifiers (handles) and reconceptualized the Net, shifting the focus from the movement of data packets to the management of information. The Kahn-Wilensky paper, in turn, resulted from the earlier work by Kahn et al. at the Corporation for National Research Initiatives on the basic digital object idea, which resulted in the filing of a patent in 1993 [11]. After one year of the IDF's life, "DOI: Current Status and Outlook" summarised the way ahead [12], further articulated in "From one to many: the next stage in development of DOI functionality" [13], which explored the mechanisms and consequences of moving from "one DOI resolves to one URL" to "one DOI resolves to multiple data types" in a logical manner, and how identifiers are dealt with in local contexts. Much of the program of work foreseen then has now been completed.

The DOI is both a deployment and a development activity: the deployment of an initial implementation of persistent actionable identifiers (a DOI as a name for a piece of content, solving the broken link problem), and a development of a full implementation (using multiple resolution and interoperable metadata) as a common infrastructure for distributed content management at the level of "meaning" rather than "bits".

DOI is not a web-only tool but is designed to be applicable to any Internet activity [14], though the current visibility of the Web and the immediate link management problem mean that DOIs have been initially implemented there. An early successful example of the initial DOI implementation is CrossRef [15], launched in 1999, where scientific and other scholarly professional articles are assigned a DOI on publication that enables linkage from citation to source: it is now widely recognised as a major advance and is used in millions of articles by the majority of prestigious scientific and professional publications.

An example of the full capability of DOI is dynamic alerting to the availability of a new version of a document, demonstrated in Adobe Acrobat using DOI functionality [16]. The full implementation — which is seamlessly compatible with the initial implementation — has been gradually developed and introduced, but the recent publication of full documentation (and tools such as an API) now fully describes how anyone can create applications providing extensible DOI services through multiple resolution and the DOI metadata system. The completion of this step, marked by the recent publication of a third edition of the DOI Handbook (the third in as many years), is a major achievement for the International DOI Foundation, and a useful point at which to summarise the current position for a wider audience.

Building on standards

The International DOI Foundation aimed to use existing standards and, where these did not already exist, to join with others in developing them. Another aim was to develop a wholly extensible and interoperable system, not restricted to one sector, medium, or technology. Finally, it was essential to make a practical application: implementable yet based on sound principles. The elements needed to complete the full implementation were soon recognised. They included:

  • A framework of metadata interoperability that would allow DOI assigners to use any metadata scheme, yet still ensure semantic equivalence with DOIs assigned by others. This did not exist when the DOI concept was launched, but has since been developed as the <indecs> (interoperability of data in e-commerce) framework. That framework is being adopted in MPEG 21 as the basis of the Rights Data Dictionary and is proving influential elsewhere.
  • A framework for multiple resolution that would allow DOIs to be used by a wide audience without a presupposed understanding of what was done on assignment — assuming DOIs represented essentially unknown resources that had to be dealt with. The existing Handle System® was adopted to provide this flexible framework, and recent DOI work has laid on top of this a specification combining the power of the two technologies, by using Handle multiple resolution to associate DOIs with Application Profiles that include a metadata scheme based on <indecs> principles.
  • Equally importantly, the recognition that standards alone are not a working implementation; a sustainable infrastructure both of technology and policies was needed to harness them for practical deployment. A useful analogy is with the physical bar code system, where the standard bar code symbology and code readers are only a small part of a business infrastructure of managed allocation, policies, and commercial application tools, allowing a wide range of uses (both commercial and non-commercial): DOI has often been described as the bar code for the digital network.

A full exposition of the technology of the DOI System as it now exists can be found in the DOI Handbook. Some fundamental concepts of identification and metadata are discussed elsewhere in a recent review article [17].

The DOI today: what it does

A DOI persistently identifies an entity of relevance in an intellectual property transaction and associates the entity with relevant data and services. An entity can be identified at any arbitrary level of granularity. DOIs can be used to identify, for example, text, audio, images, and software, and in future could be used to identify the agreements and parties involved, though initial implementations have focussed on "creations". (While the scope of intellectual property transactions is quite broad, it is unlikely that DOIs would be appropriate for identifying entities such as people or natural objects or trucks unless they are involved in such a transaction.) DOIs can be used to identify free materials and transactions as well as entities of commercial value.

The DOI System offers a unique set of functionality:

  • Persistence – DOIs resolve to information (metadata) about the identified object in a manner that persists over changes in location, ownership, description methods, and other changeable attributes. If the object ceases to be available, the DOI at minimum indicates a valid but now defunct identifier.
  • Interoperability – Interoperability enables rich interlinking with related content, so as to increase the content's usefulness and visibility.
  • Extensibility – In this context, extensibility means the ability to later add new features and services.
  • Efficiency – Through single management of data for multiple output formats (platform independence) and class management of applications and services, efficiency is gained.
  • Dynamic updating – Metadata, applications and services need to be quickly and easily updated.

The benefits of this functionality, because it is essentially generic and so rather abstract, need to be translated into specific illustrations that make sense for a particular community. For example, DOIs in enterprise content management convey the benefits of knowing what you have and being able to find and use it efficiently [18]. DOIs for publishers provide improved discoverability, longer shelf life for access, and linking to related offerings. DOIs for citations improve the ability to create crosslinks in the publishing production process.

Some of the benefits are not in themselves unique to DOI. Persistence, for example, is a feature of the URN and URI specifications, of which DOI is essentially an implementation. Structured metadata allowing interoperability is a feature of several ontology efforts such as ABC [19]. But DOI is unique in being a combination of these benefits and a practical managed implementation delivering them.

The DOI System has four components:

  1. Numbering: assigning an alphanumeric string (a number or name) to the intellectual property entity that the DOI identifies. DOI is an implementation of URI (Uniform Resource Identifier, sometimes called Universal Resource Identifier) and URN (Uniform Resource Name). The numbering mechanism follows a syntax standardised as ANSI/NISO Z39.84-2000. The number may incorporate any existing identifier scheme (thereby retaining its construction, check digits, etc.) though for the purpose of the DOI System the string is "opaque" or meaningless. DOIs are not case-sensitive and have no fixed field length.
  2. Description: the entity that has been identified with a DOI is described through associated metadata. The DOI Metadata System is based on the <indecs> framework. The metadata available with an entity may be derived from many different metadata schemes; the metadata elements needed in a particular transaction depend on the nature of the transaction; some metadata is likely to be common to all applications and essential for initial recognition. From these principles developed the concept of a small kernel of metadata (compulsory for every DOI) and extended Application Profiles (specific to a group of DOIs) as well as the view that these should be interoperable (so that DOIs and services can be mixed and used from various sources) through common controlled definitions in a structured data dictionary (which enables mapping of existing metadata schemes).
  3. Resolution: the Internet technologies that make the identifier "actionable" on digital networks, by providing resolution services. These are currently based on the Handle System, a general-purpose distributed information system designed to provide an efficient, extensible, and secured global name service for use on networks such as the Internet. The Handle System includes an open set of protocols, a namespace, and a reference implementation of the protocols. The DOI System is one implementation of the Handle System.
  4. Policies: the rules that govern the operation of the system, in a social infrastructure. The social infrastructure defines the funding and ongoing operational requirements of the system as well as its day-to-day support and management.

These four components are used elsewhere: for example, there are other implementations of URIs, Handle identifiers, <indecs> metadata principles, and organisation policies; but DOI is unique in bringing together all the components in a fully implemented and managed system.
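The numbering component described above can be illustrated with a minimal sketch. It assumes the commonly described shape of a DOI string (a prefix beginning with "10.", a slash, then an opaque suffix) and applies the case-insensitivity noted in the text. The function names `parse_doi` and `same_doi` and the prefix `10.9999` are hypothetical, chosen only for illustration; this is not the IDF's own tooling, and the lower-casing used here is a simplification.

```python
from typing import NamedTuple


class ParsedDoi(NamedTuple):
    prefix: str  # portion before the first "/", e.g. "10.9999" (hypothetical)
    suffix: str  # opaque string after the first "/"; no fixed length


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI string into prefix and suffix at the first slash."""
    prefix, slash, suffix = doi.strip().partition("/")
    if not slash or not suffix:
        raise ValueError(f"missing '/' or empty suffix: {doi!r}")
    # Assumption: a DOI prefix starts with "10." followed by a registrant code.
    if not prefix.startswith("10.") or len(prefix) == 3:
        raise ValueError(f"prefix must look like '10.<registrant>': {doi!r}")
    return ParsedDoi(prefix, suffix)


def same_doi(a: str, b: str) -> bool:
    """Compare two DOIs without regard to letter case."""
    pa, pb = parse_doi(a), parse_doi(b)
    return (pa.prefix.lower(), pa.suffix.lower()) == (
        pb.prefix.lower(),
        pb.suffix.lower(),
    )


if __name__ == "__main__":
    # The suffix may embed an existing identifier; it is treated as opaque.
    print(parse_doi("10.9999/ISBN-0-00-000000-0"))
    print(same_doi("10.9999/Abc.Def", "10.9999/ABC.DEF"))
```

The suffix is never interpreted: even when it embeds an existing identifier, the sketch treats it as an opaque string, which mirrors the "opaque" property described in the numbering component.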

The DOI today: usage

The DOI System is deployed via Registration Agencies (RAs), which are empowered to assign DOIs for a community under the aegis of the IDF. Growing numbers of RAs have been appointed, in the US, Australasia, and Europe. They include a variety of organisations [20], both commercial and not-for-profit. Initial enthusiasm by early adopters led to dedicated and, in some cases, start-up companies building offerings around DOI (e.g., CrossRef, Content Directions, Inc.). More recently, we have seen RA appointments of existing mature businesses that simply wish to add DOI as one tool in their service offerings (e.g., TSO The Stationery Office, Copyright Agency Limited), or developing consortia of companies like MEDRA [21].

The number of DOIs assigned is fast approaching 10 million, from over 300 organisations. The initial CrossRef implementation remains the largest user, but there is growing interest elsewhere, and applications will be stimulated by the availability of full application tools. As each RA comes on board, it brings a whole community into DOI usage. A number of potential RAs and communities in potential areas of application are developing DOI proposals. Independent evaluation has been largely positive:

"...it is clear that DOI is on a roll. Its increasing relevance in educational markets, the opening up of the first government usage via the UK's TSO Ltd (The Stationery Office) as a registration agency, and now the announcement that Europe's three most innovative national libraries (Germany, the Netherlands and the UK's British Library) are joining the International DOI Foundation as an informal consortium demonstrate that clearly enough....from 2001 to 2003 DOI has steadily evolved from a single resolution solution to a multiple resolution environment and now to an effective way of marshalling and indicating metadata and associated data. Both elements are vital. Solving problems posed by content being held in identical form in several different locations is important, and it is even more important to point the user to this right place in an automated way. Helping users to discover the metadata that they need to make choices is critical to content organisation, and indeed at a basic level DOIs become a simple content organisation — or management — system in themselves. There is now enough potential in DOI to unlock usage in every content domain, and also to have a real effect on web organisation in the short term. Asked at the recent SIIA summit in New York whether he approved of DOIs and other persistent identifier schemes, Tim Berners-Lee answered strongly in the affirmative, as long as they were linked, as DOIs are, to URLs. For the foreseeable future, numbering objects and associating knowledge about the object with the number may be the only way of protecting users against the overwhelming ability of a networked society to drown in its own output." (EPS Update Note, 8th May 2003) [22]
"We predict that, within five years, the DOI standard will be used to tag any "published" material from any industry — that is, all content or information that is officially released for consumption, whether within or outside of your firewalls." (Patricia Seybold Group) [23]

Advanced DOI applications

The full DOI implementation combines multiple resolution and interoperable metadata. Resolution is the process in which an identifier is the input (a request) to a network service, which returns a specific output of one or more pieces of current information (state data) related to the identified entity: e.g., a location (URL). Multiple resolution, which is made possible by the Handle System used as the DOI resolution component, is the return as output of several pieces of current information related to a DOI-identified entity: specifically, at least one URL plus defined data structures allowing management (discussed in more detail below). Interoperable metadata refers to metadata elements and schemes that adhere to well-defined principles, including a common ontology basis, and so can be understood outside a particular metadata scheme.
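The difference between single and multiple resolution can be sketched with a toy in-memory lookup. This is an illustration only, not the Handle System protocol or its API: the `RECORDS` table, the `TypedValue` class, the "AP" type label, and all DOIs and URLs shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TypedValue:
    type: str  # kind of state data, e.g. "URL"
    data: str  # the current value for that type


# Hypothetical records standing in for a resolution service.
# Keys are stored lower-cased because DOIs are not case-sensitive.
RECORDS: Dict[str, List[TypedValue]] = {
    "10.9999/example-1": [
        TypedValue("URL", "http://publisher.example/articles/1"),
        TypedValue("URL", "http://mirror.example/articles/1"),
        TypedValue("AP", "10.AP/9"),
    ],
}


def resolve(doi: str, wanted_type: Optional[str] = None) -> List[TypedValue]:
    """Multiple resolution: return every current value, optionally by type."""
    values = RECORDS.get(doi.lower(), [])
    if wanted_type is None:
        return list(values)
    return [v for v in values if v.type == wanted_type]


def resolve_single(doi: str) -> Optional[str]:
    """Single resolution: return only the first URL, as a redirect target."""
    urls = resolve(doi, "URL")
    return urls[0].data if urls else None


if __name__ == "__main__":
    print(resolve_single("10.9999/EXAMPLE-1"))
    for value in resolve("10.9999/example-1"):
        print(value.type, value.data)
```

Under these assumptions, the initial implementation corresponds to `resolve_single`, while the full implementation exposes the whole typed list and leaves the choice among values to the requesting application.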

The basic approach to the full DOI implementation, now fully documented in the recent DOI Handbook, uses the concepts of DOIs, DOI Application Profiles (APs), and DOI Services. Each DOI is associated with one or more APs, and each AP is associated with one or more defined Services.

A DOI Application Profile is the functional specification of an application (or set of applications) of the DOI System to a class of intellectual property entities that share a common set of attributes. A DOI Service is a defined result from a defined action, i.e., do X and the result will be Y; this will frequently involve specific servers on the network, but we also include more abstract notions such as a defined method for comparing dates in documents with dates in DOI records. At minimum, one service (possibly the only one) is the provision of metadata for each DOI. Any additional services later associated with each DOI of a given AP would also be registered under the AP. In this way, one change to an AP affects all the DOIs in that AP.

The benefits of this set of tools in information management are readily apparent: DOI Application Profiles are a grouping mechanism and, through combination with resolution, the AP also becomes a level of indirection for services. Thus if the Acme Registration Agency registers one million DOIs all of Application Profile 9 and a year later adds one more service to the three services that were available from the start, only the single 10.AP/9 record needs updating, and not the one million DOIs already tagged with AP9. Similarly, additional services may be made available to existing APs: when a new service is created it will not be necessary to change every DOI to point to that service, nor will users have to find out about the service; the Application Profile of each DOI is the key to making that happen: add the new facility (service) to the Application Profile record and all of the millions of DOIs associated with that AP inherit the new facility.
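The indirection described above can be made concrete with a small sketch. It models only the bookkeeping: DOIs point at an Application Profile record, and services hang off that record, so adding a service is a single write. The `ApRegistry` class, the service names, and the `10.9999` prefix are hypothetical, and the sketch simplifies by giving each DOI exactly one AP, whereas the text allows one or more.

```python
from typing import Dict, List


class ApRegistry:
    """Toy registry: DOIs reference an AP record; services belong to the AP."""

    def __init__(self) -> None:
        self.services_by_ap: Dict[str, List[str]] = {}
        self.ap_by_doi: Dict[str, str] = {}
        self.record_writes = 0  # counts how many records were touched

    def register_ap(self, ap_id: str, services: List[str]) -> None:
        self.services_by_ap[ap_id] = list(services)
        self.record_writes += 1

    def register_doi(self, doi: str, ap_id: str) -> None:
        if ap_id not in self.services_by_ap:
            raise KeyError(f"unknown Application Profile: {ap_id}")
        self.ap_by_doi[doi] = ap_id
        self.record_writes += 1

    def add_service(self, ap_id: str, service: str) -> None:
        # One write to the AP record; no DOI record is modified.
        self.services_by_ap[ap_id].append(service)
        self.record_writes += 1

    def services_for(self, doi: str) -> List[str]:
        # Services are looked up through the AP at request time.
        return list(self.services_by_ap[self.ap_by_doi[doi]])


def demo(n_dois: int = 1000) -> int:
    """Register n_dois under 10.AP/9, add a service, return writes needed."""
    registry = ApRegistry()
    registry.register_ap("10.AP/9", ["metadata", "service-a", "service-b"])
    for i in range(n_dois):
        registry.register_doi(f"10.9999/item-{i}", "10.AP/9")
    before = registry.record_writes
    registry.add_service("10.AP/9", "service-c")
    # Every DOI now sees four services without having been rewritten.
    assert all(len(registry.services_for(d)) == 4 for d in registry.ap_by_doi)
    return registry.record_writes - before


if __name__ == "__main__":
    print(demo())
```

The design choice is the usual one for a level of indirection: the cost of adding a service does not depend on the number of DOIs, at the price of one extra lookup when a DOI's services are requested.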

The original conception of DOI was as a tool to aid in managing content for rights owners; indeed, it was simply "a number attached to a file". Fortunately, the founders of the activity realised that only a generic structured approach would suffice. Whilst the detail of DOI Application Profiles seems a long way from "a simple number", the connection is quite straightforward: to manage content of any form, one must be able to identify it (uniquely, and persistently), to precisely describe what it is (since it may not be an object to hand), and to use it in some predictable (hence structured) transaction on a digital network. DOI resolution and metadata are merely the cogs in the wheels that allow this behaviour.

DOI as a managed system

Like Domain Name registration, DOI assignment requires a fee and agreement to follow the defined standard and rules. This does not make the system closed, or commercial, but it does make it managed. The DOI Foundation is a not-for-profit organization, not a commercial operation; however, the system has costs that need to be met. Persistence is a function of organizations, not technology: to support a persistent identifier system, a persistent organization needs to exist. The principal concern of a persistent organization is continuing funding; hence the model selected for a long-term position for a DOI organization was a body that is not reliant on external sources, such as grants or membership, but is a self-funding system that can be supported in perpetuity from its own resources. The IDF is currently undergoing controlled migration from its initial member-funded organization (like W3C) to an organization that is operationally funded.

The implementation of the DOI System adds value, but the implementation necessarily incurs some resource costs in data management, infrastructure provision and governance, all of which contribute to persistence. The mechanism chosen to recoup those costs incurred by the organization is a self-funding "franchise" business model, as used by the physical bar code UCC/EAN system, and other proven systems. This is funded by a fee for participation (which may optionally be passed on to registrants, waived, or subsidised by the operating entity), but not for use of a DOI once issued.

To make such a system work effectively requires protection of the assets within the system (1) from illicit exploitation, and (2) for assured quality control. Illicit exploitation would include someone calling something a DOI when it is not part of the system; this could damage the financial health of the system (through avoided payment of an issuing fee), its quality (through poor data), or both. Preventing this exploitation requires the availability of legal remedies: specifically, DOI relies on copyright and trademark law to protect the DOI brand and reputation. DOI is not a patented system; the IDF has not developed any patent claims on the DOI System and does not rely on patent law for remedy.

The underlying technologies used by the DOI System also have similar considerations. Handle is used by IDF under licence from the Corporation for National Research Initiatives, which has certain intellectual property claims to protect against misuse of the Handle System; <indecs> intellectual property (IP) is assigned jointly and solely to IDF and EDItEUR and made available freely but under stated terms to others (an example being the <indecs>RDD work contributed to MPEG 21).

There is a widespread recognition of the advantages of assigning identifiers as well as a widespread misconception that an abstract, free specification (like a URN or URI) actually delivers a working system rather than a namespace that still needs to be populated and managed. URLs, for example, have a clear technical infrastructure (standards for how they are made) but a very loose social infrastructure (anyone can create them once a domain name has been obtained, with the result that they are unreliable: they have no guarantee of stability, let alone associated structured metadata). Product bar codes, Visa numbers, and DOIs have tighter social (business) infrastructures, with rules and regulations, costs of maintaining and policing data, and corresponding benefits of quality and reliability. From this need for management stem some misconceptions about the DOI funding and business model. The most common myths are:

  • Myth: DOI is for, run by, or only to the benefit of, commercial publishers. The publishing community was the first to see the benefits of persistent identification and to attempt to build an open system (rather than a system for, e.g., a library or a campus); several publishers have not only joined the IDF but provided initial loan funding, and the initial CrossRef application is in the publishing sector. However, there is nothing to prevent any other application, or any non-publisher involvement.
  • Myth: DOI is "a commercial packaging of something that is available for free elsewhere". The practical implementation offered by DOI is more than a collection of the underlying technical specifications.
  • Myth: DOI is "only for rights management". Whilst that was the initial impetus, since rights management requires an extensible system, it is in fact applicable for any use.
  • Myth: DOI is "untested" or unrelated to other activities. All of the components are proven in other contexts, and there are millions of working DOIs. DOI builds on Handle and <indecs>, and so it inherits the strengths and real-world testing of these: for example, the <indecs> approach has been validated by rigorous analysis in the MPEG 21 framework development. These underlying technologies (rather than DOI per se) are often appropriate to answer the question of "how the DOI relates to X".
  • Myth: DOI "allows only one business model" (seeing a swan and claiming that all birds are white and swim). As more applications are developed, the flexibility of a system that deliberately allows any business model will be appreciated.

Community autonomy

The requirement that the DOI be extensible across any medium and type of content requires the ability for decentralised application building. For a managed system, this implies subsidiarity: decisions about changes should be taken at the lowest level compatible with maintaining integrity of the system. The consequence for DOI deployment is that individual communities of interest should be empowered to use the DOI with a great deal of autonomy. Registration Agencies, based on market models like physical bar codes, effectively hold a "franchise" on the DOI. In exchange for a fee to the IDF, and a commitment to follow the ground rules of the DOI System, they are free to build their own offerings to a particular community, adding value services on top of DOI registration and charging fees for participation. Since DOIs are designed to serve a wide range of communities, it would be senseless for the IDF to impose any specific business model, or to analyse a specific community's detailed needs. However, it would equally be senseless if communities pretended the problems of managing digital information were unique to them and had nothing in common with others. If a community becomes an RA, or endorses an entity as an RA, it is free to set up any service and business model it wishes.

There is a fee for participation in the system to all RAs, based on number of DOIs assigned; nonetheless, the IDF has no say whatsoever in how RAs generate that fee (and so DOIs can be issued free or as part of a fee-bearing service). As an example, CrossRef is a service of PILA (Publishers International Linking Association), itself a non-profit consortium of some 200 publishers, both for-profit and not-for-profit. PILA has its own Board, business plan, governance, etc. The only mandatory connection it has with IDF is a formal agreement for the use of the DOI System, through which it obtains licenses to the Handle System and <indecs> Data Dictionary, DOI implementations, and the benefits of common DOI technology. PILA uses DOI as one part of what CrossRef builds; the rest of the technology, and its use, is entirely up to PILA. More importantly perhaps, the influence is in fact the other way around: rather than IDF exerting control on RAs, RAs exert an influence on IDF. CrossRef is on the IDF Board and has representation on the IDF working groups. Ultimately, the RAs will wholly control IDF as a federation. This structure means that every RA has a say in managing the common infrastructure of many applications. The more RAs, the more valuable such collaboration will be.

A new RA is also able to take advantage of existing DOI work and common infrastructure to save time and money and ensure future interoperability. An RA has far more influence than does an organization which developed its own "island of interoperability" by creating a separate "X-namespace" with its own rules and mechanisms: that would work within the island of X, but have no influence on what others were doing; therefore, such an organization would need gateways to Z, Y, etc.

Communities may also pool resources, developing several RAs yet using one RA as a common back-office, and this is being actively pursued by some DOI participants at present. To encourage communities to work on relevant applications, the IDF has encouraged the formation of working groups that can focus on specific areas and may build prototypes or construct full applications: for example, CrossRef was inspired by an early IDF working group.

Funding development

Beyond solving technical problems, the biggest issue for the IDF has been providing the resources, both initial and ongoing, needed to develop and build DOI's infrastructure.

Infrastructure for a community benefits all its members; funding the development of it is often a problem, and there is no "one size fits all" solution to how this should be done. There are many modern examples — 3G telephone networks, railways — struggling with the right model for supporting a common infrastructure. The Internet was largely a creation of central (US) government; the product bar code, a creation of a commercial consortium. The self-funding operational model is fine for a mature implementation, but bootstrapping it requires some initial funding: IDF chose a supporting membership model to do this, supplemented by loans from some of the early participants. IDF generates a growing proportion of its revenues (now approaching one half) from operational fees, and relies on other support for the remaining revenue until it achieves 100% self-funding. It is on target to achieve break-even on an annual basis for the first time this year, with revenues from RAs and from non-RA membership contributions (member organization numbers have varied within the range 30-50 over the past few years). The early years of development and launch, as with any start up, have necessitated support through loans that will be repaid from future system revenues. As participant numbers increase, the costs of participation should fall.

"Free open standards are a nice idea, but just as there is no such thing as a free lunch, there is no such thing as a free standard. Even many of the programmers who contributed to Linux did so through the beneficence of their employers or educational institutions, who furnished their equipment, Internet connections and so on....it's not just enough to throw a spec out there; it has to be resourced with market development, training, reference applications...and so on — all of which costs somebody money. The only question is who will pay for it. Plenty of technologically worthwhile would-be standards have withered and died because nobody would" [24].

Selling the concept of funding infrastructure is not easy until some practical implementations are available as examples: self-interest is a far more effective motivator than appealing to enlightened community benefit. For this reason, IDF has developed several prototypes and encouraged working groups in application areas. As one recent correspondent observed: "I've been a supporter of the DOI since my participation on the Enabling Technology committee of the AAP a number of years ago when it was first developed — but I have to say that I had no idea of its usefulness until I started working with publishers to implement it within their operations" [25] . So, potential funders may not see the need for support until a clear application is demonstrated; and once the applications exist, there may be temptation to feel that funding is no longer needed: IDF has seen both types of behaviour over the early years of its development, and some beneficiaries have hitched a free ride on the community's efforts.

Fortunately, several publishers and others (most recently, some national libraries) [26] have seen the longer term advantage of supporting DOI development. As DOI has now been in existence for a few years, we can see one clear practical example of proven benefit: CrossRef. Some publishers have funded membership of both IDF and CrossRef and, in some cases, have also provided start-up loans, as both IDF and PILA were start-up activities. But the start-up costs of the successful CrossRef system, even including the investment also made by several of the same publishers in IDF, were modest in comparison to global sales in the STM (Scientific, Technical and Medical) publishing market of around $9 billion [27] per year. DOI-enabled cross-linking has been widely recognized as adding value for users and therefore for publishers [28, 29], and the investment by the community of 200 publishers currently participating in CrossRef will continue to be handsomely repaid in the long term.

Metadata implications

One of the achievements of the DOI work has been the first public implementation of the <indecs> framework for interoperable metadata through a Data Dictionary.

DOIs can use metadata from any existing scheme. The DOI Metadata System uses the <indecs> work to enable semantic interoperability so that "cross-domain" tools and applications (those that reference DOIs across more than one AP) can interoperate consistently and effectively. Precise denoting of a metadata term is a necessary but not sufficient condition for interoperability: a further requirement is a tool for ensuring semantic equivalence of terms from disparate schemes through precise ontology definitions of what is being specified. IDF is currently completing work on the initial development of the <indecs> Data Dictionary (iDD) [30], collaborating with EDItEUR (the body responsible for the ONIX [31] XML book industry product information standards, which now extend into serials, audio and video product information). The iDD results from methodology developed from <indecs> to support the MPEG-21 Rights Data Dictionary standard, and provides semantic interoperability between different schemes and dictionaries describing intellectual property resources of any form. A detailed explanation of the technical basis of the iDD is given in the DOI Handbook.

The concept of a future "Semantic Web" is often taken to mean management of digital objects without any coordination other than by machines and underlying schemes contributed by community members. Currently, such automated digital object management is a long way off: there needs to be agreement on definitions for exchange of information or the process breaks down. Even should there be universal agreement at a particular point in time, definitions will evolve and mutate, as they do in spoken language. For robust usage now, any workable system has some inescapable costs in maintaining these definitions. In the DOI System, this is the role of the <indecs> Data Dictionary.

A standard metadata declaration is made for every resource identified with a DOI. This "kernel" is a set of mandatory elements (for anything other than the simplest use of DOI); the kernel differs from Dublin Core in that it is based on an underlying ontology through the iDD and is designed for uses beyond resource discovery, for which DC is inappropriate [32]. It is also possible to declare additional metadata appropriate to an application: this may be derived from (that is, mapped from) any metadata scheme. The creation of the iDD means that ONIX terms will be a readily usable source of such metadata, and it provides a mechanism for mapping any other scheme (e.g., SCORM for learning objects) through a central one-to-one mapping rather than multiple uncoordinated crosswalks between different schemes.
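The advantage of a central mapping over uncoordinated crosswalks can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each scheme needs one directed crosswalk to every other scheme in the uncoordinated case, and exactly one mapping to the central dictionary in the hub case. The function names are invented for this example.

```python
def pairwise_crosswalks(n_schemes: int) -> int:
    """Directed crosswalks needed when every scheme maps to every other one."""
    return n_schemes * (n_schemes - 1)


def hub_mappings(n_schemes: int) -> int:
    """Mappings needed when each scheme maps once to a central dictionary."""
    return n_schemes


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(f"{n} schemes: {pairwise_crosswalks(n)} crosswalks "
              f"vs {hub_mappings(n)} hub mappings")
```

Under these assumptions the pairwise count grows quadratically with the number of schemes, while the hub count grows linearly.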

In work currently underway, the terms mapped to the iDD include all terms in:

  • ONIX 2.1
  • the DOI Kernel (the mandatory metadata set for any DOI)
  • the DOI Resource Metadata Declaration (a recommended but not mandatory set of semantics, with an XML schema), which can be used as a standard form of extended DOI metadata declaration
  • the CrossRef Metadata Declaration, an initial DOI application widely used in the serials publishing sector with several million assigned DOIs from 200 publishers
  • the ISO MPEG-21 RDD (Rights Data Dictionary)

DOIs allow any alphanumeric, case-insensitive, syntax of unlimited length, and can therefore include the current <indecs> identifier (iid) used in the iDD, or an ONIX short alphanumeric tag. Further access and use of the iDD in a form searchable from the IDF Website to allow users to look up the meaning of terms is currently being planned. At a minimum, there will be two different "views" of the iDD available: the complete iDD and individual term sets (for example, the "CrossRef Application Profile Metadata Declaration" or "ONIX"), and these views will need to show a user all relevant information about a term (for example, its genealogical relationships within the iDD). It is also possible to declare "Restricted APs": special cases of an Application Profile in which the metadata set is not available to all users, but the mapping of these terms to iDD is still mandatory.
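As a rough illustration of the syntax point, the following sketch splits a DOI into its prefix and suffix and compares two DOIs without regard to case. It assumes the usual "10.<registrant>/<suffix>" shape and an optional "doi:" label, and it upper-cases ASCII text for comparison. The helper names are hypothetical and this is not an official validator.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first '/'."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]  # drop the optional "doi:" label
    prefix, sep, suffix = name.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def same_doi(a: str, b: str) -> bool:
    """Treat two DOIs as equal if they match after upper-casing."""
    pa, sa = split_doi(a)
    pb, sb = split_doi(b)
    return (pa.upper(), sa.upper()) == (pb.upper(), sb.upper())
```

For example, `same_doi("10.1045/June2003-Paskin", "doi:10.1045/june2003-paskin")` is true, and the suffix may be of any length.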

The value of the <indecs> Data Dictionary will lie in its basis for the provision of services based on the semantic development work already undertaken and to be undertaken in future, such as "semantic transformation" services that will allow metadata terms to be translated from one term set to another — for example, from an ONIX message to the metadata requirements of a specific DOI-AP. These services will be introduced in response to market demand and probably will be provided through a third party, including mapping to the iDD of all terms from DOI-APs proposed by DOI-RAs (DOI Registration Agencies). Mapping requires specialist semantic analysis and careful negotiation with the source scheme to ensure that there is agreement over the precise meaning of any term used in a DOI-AP. The iDD will also require continuing maintenance of terms, since no resource of this kind will be entirely static.
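A "semantic transformation" of this kind might be sketched as a two-step lookup through the central dictionary. Everything in the example is an assumption made for illustration: the scheme names, the term names and the "hub:" identifiers are invented, and a real service would involve far richer semantics than a flat lookup table.

```python
from typing import Dict, List, Tuple

# Hypothetical mappings: each scheme maps its own terms to hub terms once.
TO_HUB: Dict[str, Dict[str, str]] = {
    "scheme_a": {"Title": "hub:title", "Contributor": "hub:creator"},
}
FROM_HUB: Dict[str, Dict[str, str]] = {
    "scheme_b": {"hub:title": "name", "hub:creator": "author"},
}


def transform(record: Dict[str, str], source: str,
              target: str) -> Tuple[Dict[str, str], List[str]]:
    """Translate a record's terms from source to target via the hub.

    Returns the translated record and the list of terms with no mapping.
    """
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for term, value in record.items():
        hub_term = TO_HUB[source].get(term)
        target_term = FROM_HUB[target].get(hub_term) if hub_term else None
        if target_term is None:
            unmapped.append(term)  # needs human semantic analysis
        else:
            translated[target_term] = value
    return translated, unmapped
```

Terms that cannot be mapped are reported rather than guessed, which mirrors the point that mapping needs specialist analysis and agreement with the source scheme.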

DOIs have been used to identify creations, though in principle they could also be used to identify other entities, such as parties. For this reason IDF is participating in the InterParty [33] project under the European Commission's Information Society Technologies Programme (IST), which is considering interoperability of party identifiers.

Identifying at the appropriate level

Another achievement of DOI work has been rethinking the Net as management of information rather than as movement of data packets. Managing information on the Internet at the appropriate level is a recurring theme in the vision of the future of the Internet articulated by pioneers such as Robert Kahn [34] and Ray Ozzie [35].

DOI is not (only) an identifier of digital objects but (more widely) a digital identifier of objects — that is, it facilitates digital management of any entities (focussing on those involved in intellectual property transactions). Identification of non-digital entities, such as underlying abstractions (the "work") and physical manifestations, is also needed in expressing real world transactions [36], and any technology that considers only "digital representations" is inadequate for digital rights management. There is nothing new in using abstractions or representations in trading — we do it all the time with physical property: representations such as deeds and mortgages are altered (not the physical bricks, etc.) when a house changes hands [37]. Similarly with intellectual property, representations such as licences and files are traded. Digital trading of these pieces of property requires that each entity be uniquely and persistently identified, and associated with data.

The <indecs> framework recognises the concept of functional granularity ("it should be possible to identify an entity when there is a reason to distinguish it"); this is echoed in the DOI treatment of an identified entity as a first class object (an object in itself, not some attribute of an object). Whereas URLs are grouped by domain name and then by some hierarchical structure (originally based on file trees), DOIs offer a more finely grained approach to naming, where each name stands on its own, unconnected to any Domain Name System (DNS) or other hierarchy. The most common mechanism for resolution on the Internet is DNS (HTTP, as used in a URL, relies on DNS). The Handle System used by DOI uses TCP/IP but avoids the need to use the DNS, and this has significant advantages. One advantage is that names are not implicated in trademark disputes. Another advantage is flexibility over time as the document origins reflected in a hierarchy lose meaning, for example after a change in ownership. (If acmeco.com sells some assets to newco.com, all URL filenames beginning acmeco.com/ that pertain to the sale need to be changed.) This benefit has already been seen in the case of CrossRef, where millions of DOIs identified through the Academic Press IDEAL system were merged into Elsevier's Science Direct system when the companies merged. In order to manage DOIs, we have created tools that allow more flexible management of sets of DOIs, in a more useful way than as a fixed sub-domain: a DOI, DOI Application Profile and DOI services can all be thought of as layers of abstraction that allow this. Functionality such as URL partial redirection and relative URLs (which assume as "known" or inherited a part of a URL / domain name address) makes a lot of sense in the context of URLs. However, since DOIs deliberately have a more finely grained approach to naming things, functionality such as partial redirection is dealt with through tools that capitalise on that finer granularity: precise definition of components and their associated services.
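The benefit of a name that stands apart from any location hierarchy can be shown with a toy resolver. This is a minimal in-memory sketch, not the Handle System: the class, the DOI prefix 10.9999 and the example domains are all hypothetical.

```python
from typing import Dict


class ToyResolver:
    """In-memory table from a persistent name to its current location."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def register(self, doi: str, url: str) -> None:
        self._records[doi.upper()] = url

    def update(self, doi: str, url: str) -> None:
        key = doi.upper()
        if key not in self._records:
            raise KeyError(doi)
        self._records[key] = url  # the name itself never changes

    def resolve(self, doi: str) -> str:
        return self._records[doi.upper()]


if __name__ == "__main__":
    resolver = ToyResolver()
    resolver.register("10.9999/example.1", "http://acmeco.example/articles/1")
    # The asset is sold: only the record changes, not the published name.
    resolver.update("10.9999/example.1", "http://newco.example/archive/1")
    print(resolver.resolve("10.9999/example.1"))
```

Any citation that carries the DOI keeps working after the transfer, whereas a citation that embedded the original URL would need to be rewritten.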

Differential access

Whereas a physical object has context simply by where it resides — the copy of a book in my home bookcase has different "properties", such as ownership, from a copy in a bookstore — digital objects need this to be made explicit.

There are limitations to the simple initial implementation approach of DOIs resolving to one URL — which is why more complex applications have been developed. An obvious requirement is differential access: context-sensitive resolution, whereby the result received from resolving a DOI depends on some aspect of context (who is requesting this and what rights do they have: the problem of digital rights management). There are some cases of differential access already amenable to management through DOIs using multiple resolution and other techniques like parameter passing (adding a qualifying string that is in effect returned along with the resolved data).
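Multiple resolution and parameter passing might look roughly like the sketch below. It assumes a DOI record holding several typed values, a single piece of context (whether the requester is a subscriber), and a qualifying string appended to whatever is returned. The value types, the record contents and the function name are invented for illustration and do not describe the actual DOI resolution interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TypedValue:
    kind: str  # illustrative type label, e.g. "FULLTEXT" or "ABSTRACT"
    data: str


# Hypothetical record: one DOI associated with more than one value.
RECORDS: Dict[str, List[TypedValue]] = {
    "10.9999/EXAMPLE.1": [
        TypedValue("FULLTEXT", "http://publisher.example/fulltext/1"),
        TypedValue("ABSTRACT", "http://publisher.example/abstract/1"),
    ],
}


def resolve(doi: str, subscriber: bool, param: Optional[str] = None) -> str:
    """Pick a value by context; pass any qualifying string through."""
    wanted = "FULLTEXT" if subscriber else "ABSTRACT"
    for value in RECORDS[doi.upper()]:
        if value.kind == wanted:
            return value.data if param is None else f"{value.data}?{param}"
    raise LookupError(f"no {wanted} value for {doi}")
```

Here the same DOI yields the full text for a subscriber and the abstract for anyone else; a real rights decision would of course depend on much more than a boolean.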

There are business reasons why differential access is required. Some organisations with proprietary systems have resisted using DOIs with a single resolved URL for links, making users "detour" via their product interface to get to the full text: content providers have often assumed that their content, whether bibliographic citation, abstract or full text etc., would display within the provider's native interface that, in addition to the content itself, conveys important information (such as branding and simple rights use declarations) generally not yet embedded in individual records.

One issue recognised at an early stage in DOI development is the "appropriate copy problem" [38]: resolution is sometimes required not to any generic instance of a piece of data, but to a particular copy that may have certain access rights as a locally held library copy. A technique already deployed to solve this is the use of DOIs for global resolution along with OpenURL for local contextualisation. The OpenURL Framework for Context-Sensitive Services [39] describes a generic means for bundling contextualized identifier/metadata packages, designed to be extensible to new application domains and new transport methods [40]. As Herbert Van de Sompel, creator of OpenURL/SFX, recently put it: "When you don't have decent metadata, it's hard to provide decent services. That's why I am an enormous fan of unique identifiers for objects, and systems that allow you to obtain well-structured metadata by using those identifiers. For me the big deal of the DOI/CrossRef framework is not necessarily the links they provide, because that might be done in other ways. The crucial importance of that work is in the mere fact that objects are being identified, and that identifiers can lead to metadata about objects. That changes the whole game" [41]. The OpenURL/DOI combination has proved very successful and is now in practical use with DOIs assigned via CrossRef [42] through several commercial services like EBSCO's LinkSource, Serials Solutions' Article Linker, Ovid's LinkSolver, and others; these allow end users to access all of a library's digital resources, regardless of whether they reside locally or remotely. One can imagine similar services for consumers in other areas: when a customer signs up for a service, a cookie is set on their machine to identify them as a customer. When the user clicks a DOI, the resolution would read the cookie and send the link to the appropriate service; users signed up for multiple services or who want to see options across multiple services could be managed through a local profile service detailing which services they subscribe to.
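The redirection step described above can be sketched as a small link builder. Several things here are assumptions rather than details from this article: the local resolver address stands in for whatever a cookie or profile service would supply, the `id=doi:` query form follows the early OpenURL convention, and the global proxy address is the public DOI proxy as it exists today.

```python
from typing import Optional
from urllib.parse import quote, urlencode

GLOBAL_PROXY = "https://doi.org/"  # assumed public DOI proxy


def link_for(doi: str, local_resolver: Optional[str] = None) -> str:
    """Build a link for a DOI, preferring the user's local resolver.

    local_resolver is the base URL of an institution's link server,
    for example one read from a cookie (hypothetical in this sketch).
    """
    if local_resolver is None:
        # No local context known: fall back to global resolution.
        return GLOBAL_PROXY + quote(doi, safe="/")
    # Hand the identifier to the local service for contextualisation.
    return f"{local_resolver}?{urlencode({'id': 'doi:' + doi})}"
```

A user with no cookie is sent to the global proxy, while a user whose cookie names a library link server is sent there with the DOI attached, so that the library can choose the appropriate copy.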

Open issues and difficulties

The issues currently facing DOI are less technical than social. They include:

  • Positioning DOI in the world of standards. IDF has placed a strong emphasis on communicating with, and participating in, a wide range of related standards and consortia activities [43]. The DOI syntax was an early formalised standard [44]: although that standard is only applicable within the DOI System, and therefore could have remained part of the DOI Handbook alone, we aimed to show that it had been reviewed by an external standards body. Extending this to other standards communities is difficult for any fully implemented system, as it becomes not a "pure" standard but one loaded with specific implementation requirements and policies. For example, DOI is also an implementation of both the URI and URN concepts; but it is not a "pure" engineering standard: adding those difficulties to the already heated differences in the wider Internet community about the relevance of URI and URN related standards has led us to proceed only slowly with formalising this.
  • Marketing the benefits of DOI. A criticism that has been levelled at the DOI is that it has only a vague "value proposition" and that, for this reason, take-up by vendors and technology companies is slow: their understandable response is "when our customers ask for it, we'll support it". The benefits of DOI need to be interpreted in context: for example, the value of persistence is widely recognised by the library community but may not be immediately obvious as a cost-benefit case to the publisher community. To build a business case for each community is a large task and requires a deep understanding of their needs; a superficial approach could do DOI's credibility more harm than good. Because of limited resources, RAs are left to deal with their communities largely alone; IDF could certainly benefit from more resources for business development of the kind available to commercial companies like ContentGuard when promulgating standards like XrML. DOI also involves some issues that are initially simple but soon become complex; it does not lend itself easily to the ideal marketing approach of "a good bumper sticker, a buzz word, two good factoids, a good diagram, and two personal anecdotes" [45].
  • Naming and content management. One issue is, paradoxically, a result of recognition of the problem we have solved. When DOI began, issues of naming and content management were not well understood by the content communities, and no one was tackling them consistently. Now there are more efforts: they include MPEG-21 Multimedia Framework, the Semantic Web efforts, several technical standards group activities and numerous initiatives from parts of the content industries trying to solve the problem alone for their own communities, be they libraries, music recording industry, museums, or magazine publishers. Yet all these are part of a continuous spectrum of digital content that ultimately must interoperate. Much as this increased attention to the problem is to be welcomed, too much splintering of efforts in re-inventing the wheel is not good: consider what would have happened to the EAN/UCC bar code system if the grocery trade, furniture trade, shoe trade etc., had each devised their own solutions in isolation. Yet that is almost what is happening now, and IDF has made strenuous efforts to reach out to many different groups. Fortunately, there are signs that this message is getting across.
  • Autonomy. Another issue is how to work with the various efforts that have begun to use parts of what the DOI considers to be the whole solution, either alone or as part of a solution for a problem such as managed repositories. For example, users of persistent identifier handles in government [46] are now having to consider how they add appropriate and interoperable metadata; the DSpace project [47] is considering how it moves to a sustainable model for ongoing support; users of metadata are finding that their elements need to be interoperably identified [48]; and so forth. Can we offer them a solution through DOI, or will we be seen as attempting to interfere in their efforts? The approach to community autonomy and working groups we have developed is intended to make such a transition easy.
  • Business Model. The decision to create a self-sustaining system, not reliant on external indirect or direct funding, raised other issues. It created an initial perception that the DOI was in some way a commercial, or only-for-trade system, a perception that has been stubbornly slow to dissipate (perhaps influenced by the initial adoption by large publishers through CrossRef on the one hand, and the natural tendency of some developers to interpret "open" as "free"). Being a self-sustaining system also creates a barrier to participation: whilst the idea that anyone can join in providing they pay their way is equitable, everyone has funding issues and adding a further fee for entry is not welcomed: if we could have a rich sponsor, uptake would be much easier. It also created the need for a transition funding mechanism, through loans and sponsoring membership, to support the development of the system in the first few years until it stands on its own feet. Although good progress is being made in developing revenues from an operational system, this currently provides less than half of the DOI Foundation's budget (from zero in 2000); yet the economic changes of the past two years have been tough for any organisation reliant on discretionary membership funding, and so this remains a concern. It is unlikely that IDF will collapse if the bamboo stake holding the growing plant is removed while alternatives are being found, but it may lead to less than optimal growth. (The related issue that there is no free lunch needs no further exposition!)

As this article is published, the International DOI Foundation is about to embark on its sixth annual meeting. As in previous years, we will invite a wide selection of non-members, representing other communities, activities, or interested parties, to join us and share their experiences and engage in discussion. I look forward to being able to report further progress.

References

[1] DOI Handbook <http://www.doi.org/hb.html>.

[2] The Living Internet: Web Browser History. <http://livinginternet.com/?w/wi_browse.htm>.

[3] And are still with us: "the rate at which once-valid links start pointing at non-existent addresses — a process called 'link rot' — is as high as 16 percent in six months. That means that about one sixth of all links will break". Dowling, Thomas; "One Step at a Time", Library Journal Fall 2001, page 36. <http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA178131&publication=libraryjournal>.

[4] "History of the Dublin Core Metadata Initiative" <http://dublincore.org/about/history/>.

[5] Dekkers, Makx and Weibel, Stuart. "State of the Dublin Core Metadata Initiative, April 2003" D-Lib Magazine, April 2003, <doi:10.1045/april2003-weibel>.

[6] "History of Google" <http://www.google.com/corporate/history.html>.

[7] Green, Brian & Bide, Mark. (1997) "Unique Identifiers: a brief introduction". Book Industry Communication/EDItEUR <http://www.bic.org.uk/uniquid.html>.

[8] Rust, Godfrey, and Bide, Mark. "The <indecs> Metadata Framework: Principles, model and data dictionary." 2000. <http://www.indecs.org/pdf/framework.pdf>.

[9] International DOI Foundation Web Site <http://www.doi.org>.

[10] Kahn, Robert E. & Wilensky, Robert. "A Framework for Distributed Digital Object Services". Corporation for National Research Initiatives (CNRI), Reston, Virginia, 1995. <hdl:4263537/5001>.

[11] Pat. No. 6,135,646 – System for uniquely and persistently identifying, managing and tracking digital objects – 10/24/00.

[12] Paskin, Norman. "DOI: Current Status and Outlook". D-Lib Magazine, May 1999. [A summary of DOI progress as of mid-1999.] <doi:10.1045/may99-paskin>.

[13] Paskin, Norman and Lannom, Laurence. "From one to many". International DOI Foundation, August 2000. <doi:10.1000/190>.

[14] "Internet refers to the global information system that: (i) is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions/follow-ons; (ii) is able to support communications using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons, and/or other IP-compatible protocols; and (iii) provides, uses or makes accessible, either publicly or privately, high level services layered on the communications and related infrastructure described herein." October 24, 1995, Resolution of the U.S. Federal Networking Council quoted in Kahn, Robert E. & Cerf, Vinton G. "What is the Internet (And What makes it Work)?" December 1999 <http://www.cnri.reston.va.us/what_is_internet.html>.

[15] CrossRef Web Site <http://www.crossref.org>.

[16] HDL/DOI Plug-in for Adobe Acrobat and Acrobat Reader – Beta Version 1.0 <http://www.handle.net/hs-tools/adobe/download>.

[17] Paskin, Norman. "Identification and Metadata". (To be published in Digital Rights Management: Technical, Economical, Juridical, and Political Aspects in the European Union), in the series "Lecture Notes in Computer Science" (Springer-Verlag, 2003) <http://www.doi.org/topics/drm_paskin_20030113_b1.pdf>.

[18] Rosenblatt, Bill. "Enterprise Content Integration with the Digital Object Identifier: A Business Case for Information Publishers", June 2002. <http://doi.contentdirections.com/mr/cdi.jsp?doi=10.1220/whitepaper5>.

[19] Hunter, Jane. "MetaNet – A Metadata Term Thesaurus to Enable Semantic Interoperability Between Metadata Domains." <http://archive.dstc.edu.au/RDU/staff/jane-hunter/harmony/jodi_article.html>.

[20] DOI Registration Agencies: <http://www.doi.org/registration_agencies.html>.

[21] MEDRA (multilingual European DOI Registration Agency) <http://www.medra.org/Content/project.htm>.

[22] Worlock, David. "DOI: Starting a new Generation", EPS Update Note, 8th May 2003 <http://www.doi.org/news/EPS080503-DOIs.pdf>.

[23] Protecting Your Digital Assets: Technical Journal Publishers Lead the Way Using Digital Object Identifiers. Patricia Seybold Group, 2003 <http://www.doi.org/topics/Protect_Digital_Asset.pdf>.

[24] Rosenblatt, Bill. "Rights Expression Languages: The Key to DRM Interoperability"; The Seybold Report, Vol 2 No 24, April 2003 <http://www.seyboldreports.com/TSR/index.html>. [subscriber access]

[25] Brooks, Ken. Publishing Dimensions, Inc. (personal communication)

[26] IDF Press release, April 2003: "National Libraries Join International DOI Foundation" <http://www.doi.org/news/030417-Library.html>.

[27] Klimes, Ivan. "The science of science publishing"; imi insights, April 2003 1-4 <http://www.epsltd.com/IMI/IMI.htm> [subscriber access].

[28] Spedding, Vanessa. "Journal Cross-Linking – The Web's potential untangled". Research Information, July 2002. <http://www.researchinformation.info/feature1a.html>.

[29] Vogt, Sjoerd. "Investigative Report: Resolving the Links"; Information Today, April 2003, Vol 20, Issue 4, p.25.

[30] IDF Press release, April 2003: "Common Dictionary for DOI and ONIX Metadata" <http://www.doi.org/news/030415dictionarynews.pdf>.

[31] ONIX Web site <http://www.editeur.org/onix.html>.

[32] Lagoze, Carl. "Keeping Dublin Core Simple – Cross-Domain Discovery or Resource Description?" D-Lib Magazine, January 2001 <doi:10.1045/january2001-lagoze>.

[33] InterParty Web site <http://www.interparty.org>.

[34] Zaret, Elliot. "Internet Pioneer Urges Overhaul", Interview with Robert E. Kahn, CNRI. MSNBC, November 21, 2000. <http://www.doi.org/news/msnbc_rkahn_interview.pdf>.

[35] Farber, Dan. "It's time to rebuild the Internet". ZDNet Tech Update, May 21, 2003 <http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2913761,00.html>.

[36] Paskin, Norman. "On Making and Identifying a 'Copy'". D-Lib Magazine, January 2003 <doi:10.1045/january2003-paskin>.

[37] De Soto, Hernando. The Mystery of Capital: Why Capitalism triumphs in the west and fails everywhere else. Basic Books 2000. <http://www.ild.org.pe/tmoc/cp1-en.htm>.

[38] Caplan, Priscilla et al. "Linking to the Appropriate Copy: Report of a DOI-Based Prototype". D-Lib Magazine, September 2001. <doi:10.1045/september2001-caplan>.

[39] ANSI/NISO draft standard for trial use (Z39.88-2003) May 1 - Nov 30 2003 <http://library.caltech.edu/openurl/PubComDocs/Announce/20030416-Announce-Trial.htm>.

[40] NISO Committee AX Public Comments <http://library.caltech.edu/openurl/Public_Comments.htm>.

[41] Bruning, Dennis. "Interview with Herbert Van de Sompel, Creator of OpenURL/SFX"; The Charleston Advisor, Volume 4, Number 4, April 2003. <http://charlestonco.com/features.cfm?id=124&type=np>.

[42] "OpenURL and CrossRef" <http://www.crossref.org/03libraries/16openurl.html>.

[43] IDF "Alliances and Liaisons with Other Organizations" <http://www.doi.org/handbook_2000/governance.html#7.11>.

[44] ANSI/NISO Standard Z39.84-2000. "Syntax for the Digital Object Identifier": <http://www.niso.org/standards/standard_detail.cfm?std_id=480>.

[45] Nelson, Michael (Director Internet Technology and Strategy, IBM). "Industry Perspectives on the Future of the Cyberinfrastructure"; <http://www.dtic.mil/cendi/minutes/pa_1202.html>.

[46] "Getting a Handle on Federal Information: Persistent Identification Using Handles" <http://www.dtic.mil/cendi/activities/01_29_03_handles_overview.html>.

[47] Smith, MacKenzie, Barton, Mary, Branschofsky, Margret, McClellan, Greg, Walker, Julie Harford, Bass, Mick, Stuve, Dave, Tansley, Robert. "DSpace: An Open Source Dynamic Digital Repository", D-Lib Magazine, January 2003. <doi:10.1045/january2003-smith>.

[48] CORES Standards Interoperability Forum Resolution on Metadata Element Identifiers <http://www.cores-eu.net/interoperability/cores-resolution/cores-resolution.pdf>.

Copyright © International DOI Foundation Inc., 2003.

DOI: 10.1045/june2003-paskin