Volume 21, Number 7/8
The DOI Twenty Years On
(This Opinion piece presents the opinions of the author. It does not necessarily reflect the views of D-Lib Magazine or its publisher, the Corporation for National Research Initiatives.)
In a slightly unusual twist to our normal practice with opinion pieces we present, with permission, the transcript of a talk given by Mark Bide at a recently held IDF (International DOI Foundation) meeting. The IDF Registration Agencies such as CrossRef and DataCite gather twice a year to discuss ongoing efforts and to plan for the future. The summer meeting this year was celebrating the 20th anniversary of 'The Armati Report' that led fairly directly to the creation of the Digital Object Identifier (DOI). Mark, speaking in a personal capacity, takes a long view of the accomplishments of the last twenty years as well as the challenges that remain.
While I continue to wear a number of hats in the publishing industry (in collective licensing, in accessibility and inclusive publishing, in the Copyright Hub), I am making this presentation to representatives of the International DOI Foundation's Registration Agencies in a purely personal capacity. So, those of you who don't know me, and indeed those of you who do, are fully entitled to ask why Norman Paskin, Managing Agent for the IDF, might have asked me to start the conversation today.
The invitation came in the wake of my being asked to give a presentation to the US Patent and Trademark Office on identifiers and identification at a workshop on the deployment of the Copyright Hub despite the fact that I made it clear that I "don't do identification and standards stuff any more". And I really don't do this identification and standards stuff any more. So when Norman asked me to do this introductory "think piece", now quite a long way outside my comfort zone, I was properly flattered. Why me?
I guess because I have been concerned with the pursuit of the more efficient management of content and copyright for the better part of two decades. And, because unambiguous identification goes to the heart of any attempt to manage content and copyright on the network, I have been involved in identifiers since the mid-1990s, around the time when the Digital Object Identifier, the DOI, was launched. My best known, most frequently cited published paper, co-authored with my colleague Brian Green, was on identifiers and was published in 1996 [1]; it was at least in part a response to the Armati Report, which is the genesis of this twenty-year retrospective, although if I recall properly it was more of a response to a paper that Chris Burns wrote for the Association of American Publishers at around the same time [2, 3].
I subsequently became part of the team that delivered the <indecs> project, and later I was involved in the project that ultimately became DDEX, the music industry standards body. I have run a global trade standards organization in the publishing industry and in that role was responsible for an ISO identifier authority. I was also the original designer of the Linked Content Coalition project and was responsible for establishing that organization's governance model. So much of my professional life for the last twenty years has been engaged with the same issues that the founding fathers of the International DOI Foundation grappled with in the early 1990s, working in parallel with, and regularly in contact with, Norman Paskin, Larry Lannom of CNRI, Godfrey Rust of Rightscom, and many other people familiar to those in this room. So perhaps Norman's choice for a curtain raiser makes at least a little sense.
Re-reading the report that Douglas Armati wrote on information identification for the International Association of Scientific, Technical and Medical Publishers in June 1995 was truly fascinating, and (unlike any of you, I suspect) I actually went back and read the entire report. (Well, most of it.) There are some fascinating glimpses into the past, like the Appendix on The Common Copyright Data Model by my long-standing friend, colleague and collaborator Godfrey Rust (who was in those days still gainfully employed in the music industry before, like me, becoming a consultant, a role which another old friend describes as being "gainfully unemployed").
The Common Copyright Data Model was a very basic first iteration of a data model that will be immediately familiar to anyone who has followed Godfrey's work since: a model more fully articulated in the <indecs> project a few years after the Armati Report and, through several iterations, brought to what may now be regarded as a level of maturity in the work of the Linked Content Coalition. This model also lies at the core of all the IDF's metadata thinking.
In re-reading the report, what was most surprising to me was the extent to which (at least at some level) it could have been written yesterday. Take a look at the opportunities that he identified. These points have a surprising freshness about them.
- The economic value and strategic importance of intellectual property (IP) is growing rapidly
- IP is being traded more frequently in digital, networked environments
- A substantial global market exists for valuable IP delivered via networked digital devices
- Rapid, low cost, interactive access to these assets would be a boon to users
- Exploitation of this market provides growth opportunities for IP rights (IPR) owners as well as for suppliers of networks and digital devices
- An open market in IP assets would potentially add value to the portfolios of all participants
- Solutions enabling cost effective dynamic licensing offer the most promise
The focus is the economic opportunity that trading in copyright materials on the network represents, an opportunity not only for the copyright industries but also for society as a whole. And inevitably, the same is true of the threats: a loss of control, and ineffective and perhaps threatening proprietary solutions.
- Networked digital devices facilitate simple, quick, cheap reproduction of valuable IP assets
- Existing solutions provide limited protection for IPR owners
- Existing standards do not support licensing of data objects smaller than a complete work
- Proprietary identification, security and trading of IPR based assets is expensive
- Proprietary solutions do not allow open network trade in IP assets
- No effective means exist to identify unlicensed uses of IPR in open networked environments
- IPR owners have good reason to be concerned about losing control of their assets in this domain
- They are naturally reluctant to license use of their IP without adequate protection
I could have used precisely the same points in a SWOT analysis to support the <indecs> project in the late 1990s, to support the launch of the Linked Content Coalition project a decade or more later, or the creation of the Copyright Hub even more recently.
So, here is the first point for us to ponder today. We have known about the challenges facing the media with the migration of content distribution to the network for a long time, over twenty years, and we have even known quite a lot about what we needed to do about it. Yet many of the tools we need, and indeed which we already knew twenty years ago that we needed, are still at a relatively early stage in their implementation two decades later. In particular, Armati's principal theme was that identification (certain, unambiguous, actionable, interoperable identification) goes to the heart of the challenge of managing commercial business on the Internet. Here's what he said:
- Active development of identification schemes is underway in many industries
- Many projects assume a unifying scheme will emerge
- Pilot projects generally use bespoke or existing industry based identification systems
- Widespread commercial use of this approach will lead to "Tower of Babel" in open markets
- Crucial data object granularity issues not addressed
- Pilot projects are using file/document level identification and management only
To this audience, unlike in Washington earlier in the year when I had to present a sort of "Identifiers 101", I don't feel today that I need to spend time explaining why identifiers, and particularly standard identifiers, are important. But we have to accept that it is a clear sign of our failure as a community that we are still having to explain the importance of identification to regulators like the US Patent and Trademark Office, and to tell them how well (or, more sadly, for the most part how badly) we have done in the last twenty years.
Of course, the history of identifier standards in the media, and particularly in publishing, predates the Armati report by 25 years. We have had the ISBN standard for as long as I have been in publishing, and the UK's SBN reaches further back into the 1960s. Its impact on the industry was revolutionary; it remains close to universal in its implementation after all these years.
Here are some points about the ISBN that I have taken from a presentation that Norman Paskin gave about 10 years ago, which I thought you might find interesting in the context of today's conversation. Norman quoted from an article in The Bookseller by David Whitaker published in May 1967:
- "In 1965 the largest British book wholesaler, WH Smith, announced their intention to move their wholesaling and stock distribution operation to a purpose built warehouse in Swindon [in 1967]. To aid efficiency they would install a computer, and this would necessitate the giving of numbers to all books held in stock..."
- "The idea of numbering books is not new. One British publishing house has been giving numbers to its books for nearly a hundred years. What is an entirely new concept, however, is that numbers should be given to all books; that these numbers should be unique and non-changeable; and that they should be allocated according to a standard system..."
The global book industry owes an incalculable debt of gratitude to David Whitaker for his foresight and vision in introducing a system of standard identifiers to create an efficient supply chain, without which it is hard to imagine how the industry would operate at all.
Nearly 30 years after the work of David Whitaker, the Armati Report looked at current and emerging standards in the mid-1990s (for example, the ISWC for the identification of musical works was at that point in development) and made some clear recommendations. Here is a selection of them.
- STM publishers and others should further strengthen identification programs...
- Encourage WIPO ... IPA, FEP, AAP to work with STM in precompetitive standardization activity
- Encourage practical and financial participation from "all other affected industries"
- Build cross industry consensus on... "a set of universally acceptable identification protocols" by the end of 1997
- Establish an ISO standard
Why am I particularly drawing these recommendations to your attention? Look at the middle three bullet points. Armati called for collaboration not only within the publishing industry but also across what he called "all affected industries", the copyright/media industries as a whole, to arrive at a single universal identification scheme which could be used by all of us to create a unified infrastructure for the management of IP assets on the Internet. Does this sound at all familiar?
Bravely, or perhaps foolishly, Armati went much further than just making these recommendations. He went on to make some predictions about what the impact would be if his recommendations were fully implemented.
Niels Bohr, the Danish physicist, may or may not have coined the aphorism "making predictions is difficult, particularly about the future". But as someone who also spent a lot of the 1990s making predictions about the mid-term future (some of which were accurate, some profoundly otherwise, and every one of them wrong in terms of timing), I have learned a great deal about the humbling process of making predictions, particularly when others look at them a decade or two later.
Below are some headlines drawn from Armati's predictions, all of them dependent on the deployment of universal identification.
- 1997: comprehensive robust and flexible electronic copyright management systems
- 1998: licensed, controlled access commercial "internetworks"
- 2000: licensed, controlled general access commercial "internetworks"
- 2002(?): licensed "open access" networks (note that he didn't mean "open access" in the way we use that term today)
He foresaw a steady progress, driven by the increasingly universal deployment of identifiers towards an IP-aware Internet, one in which creativity could flourish because the management of copyright and licensing was built into the Internet at its core.
And by now, where did Armati think we would be?
- Increasing bandwidth: video assets "becoming more mobile"
- Around 2012, a sufficiently robust network (with universal identifiers deployed) to create a genuine market place in IPR assets
- Asset owners begin to encourage reuse of their content ... knowing they will be paid
- An international rights title registry operational
- Securitization of rights based assets becomes possible
In Armati's future, we would be in a very different world from the one we are in today, that's for certain. Some parts of his future vision came true early, particularly the widespread availability of sufficient bandwidth to support the distribution of audio-visual assets, and that was a prediction which at the time many people would not have accepted as at all likely to happen.
But these predictions didn't come true, so what happened? We can't blame Doug Armati, because his predictions were based on the universal implementation of a common identity model. And the truth is that we didn't collectively follow the recommendations that he made in 1995. As a result, while we have achieved a little of what Armati had in mind, it was nothing like enough to deliver the vision, which indeed still seems quite a long way off.
The key to managing copyright on the Internet, the core of the Armati paper, was and still is certainty of identification. This is still the vision of the UK Copyright Hub; you might want to watch this short video.
It isn't my job today to talk about what the Copyright Hub is doing, but the reality remains that in many media sectors we are still falling a long way short of the vision that Doug Armati communicated in his report to STM. The DOI, once accurately described by Godfrey Rust as "the most powerful identifier ever devised", had, and indeed still has, the technical capability and capacity to fulfill the Armati vision.
But, things didn't quite happen in the way Armati hoped.
Now, of course, some of you may not share the vision of the Copyright Hub, or of the founding fathers of the DOI, or of the Linked Content Coalition, or of the myriad other projects that have worked on this core issue of answering Charles Clark's famous "call to arms" on the future of copyright and creativity in the digital environment: "The answer to the machine lies in the machine". But I continue to believe that this vision represents the best available deal for society as a whole, and I hope many of you do too.
Today, shouldn't we be asking ourselves just one simple question: why have we not achieved so much more in the last twenty years? Enormous efforts have been made, and in some cases heroic budgets have been spent. But the impact has been at best patchy. There have been areas of huge success, but other places where progress has been painfully slow. We remain for the most part locked in silos, in a world where moving between industry sectors is still like changing the bogies on the trans-Siberian railway as we cross the border between one part of our industry and another, and move from one track gauge to another.
Why have we made so little progress in creating the cross-sectoral, interoperable identification standards that we have been seeking for the last twenty years?
I do not believe that the answers to this question are primarily technical. The massive success of the CrossRef implementation of the DOI (70 million DOIs and rising) shows that the technical approach that Armati had in mind in the 1990s works not only in theory but also in practice. But for many years it represented a signal success in a single, important, but circumscribed, market. There have of course been other DOI implementations, but none yet on anything approaching the scale of CrossRef, and it is only with the arrival of EIDR that we have a clear exemplar of a DOI implementation that takes us so firmly into a far-distant domain (as envisaged by Armati), with real and rapid scale and growing application.
So, if the barriers to achieving universality of identity management are not primarily technical, if the same identifier system can equally be used to identify scholarly journal articles and Hollywood movies, where do they lie? It is often assumed by those on the outside of the standards world that standards are simply about technology: the best technical solution will always win. In reality, what those of us who have worked in standards know (one of our guilty secrets) is that standards implementation is mostly about establishing acceptable social norms to which people willingly adhere. How does that come about? The answer is primarily encapsulated in a single word, a single concept: trust. There may also be commercial and cost issues to consider, but an exploration of governance, which is the primary builder of trust, is critical to any understanding of where we are today.
Why is governance so important? Because, while we all talk about "neutrality" in the standards world, there is another hidden and often unacknowledged secret in standards making: no standard can ever be wholly neutral. Every standard has an unspoken "point of view", a set of assumptions that informs its entire specification. From the point of view of an organization that implements a standard, it is critical that the standard is stable, because otherwise there is a risk in committing to implementation. It is also important that the specification represents a consensus "point of view" among those who implement it, or at least something close to a consensus. "Consensus" is an important word in standards setting, and ISO has a precise definition of what reaching consensus means in its process: it means bringing the discussion to a point where there is "no sustained opposition". Getting to consensus takes time, which is one of the reasons that standardization processes often seem interminable. And it is all the more difficult when the people with whom we have to create that consensus seem to come from different planets, and to have businesses that are a long way away from our primary commercial or other interests.
The governance of the IDF was initially dominated by STM publishers, completely understandably, since they had provided the financing for the IDF and for its first and major implementation, CrossRef. But to those "outside", that domination automatically made the DOI look like the wrong solution. There was a complete lack of trust. And building trust takes time, and willingness on both sides to try.
We may believe that the IDF's problems in terms of building trust are largely over, now that it has such a diverse group of RAs representing such diverse interests: not only scholarly publishers and movie studios, but libraries and government; not only in the US and Europe but also in Asia. But work remains to be done in bringing the other media sectors into the IDF if we are to see the vision of the DOI as the "universal identifier" realized in its next twenty years. And issues with trust are often amplified by commercial issues, and that has certainly been the case for the IDF.
The universal application of identifiers would obviously be much more easily accomplished were the identification system to be "free of charge", at least to those who have to apply the identifiers, because cost (even a very low unit cost) represents a barrier. But of course, identifiers and identification systems are never free. There is always an associated cost that somebody has to pay, and the cost of an identifier is the least of your worries in maintaining an identification system. All too often, people focus on direct, identifiable cost, and give little thought to benefit.
But it is also true that the costs of implementing any standard in any supply chain, and the benefits to be derived from its implementation, are bound to be unequally distributed. This can prove to be a considerable disincentive to implementation: making the supply chain more efficient for others seems like an act of philanthropy (although I would argue otherwise).
Nevertheless, adopting a new identifier involves assuming costs which organizations can find hard to justify in their individual, specific circumstances, not the least of which is the cost of changes to legacy systems and legacy mind-sets. And all at a time when the profitability of businesses, be they giant multinationals or individual professional photographers or singer-songwriters, is under huge pressure.
So, we have had some real successes over the last twenty years in meeting the Armati vision, many of them represented in this room, and considerable efforts are still being made to drive that original set of high-level objectives through to a satisfactory conclusion. We certainly haven't given up. But we may well not have another twenty years to get this right. Time is unlikely to be on our side.
[1] Bide, Mark & Brian Green. 1996. Unique Identifiers: a brief introduction. (A revised version of this paper, published in 1999, is available here.)
[2] Armati, Douglas. 1995. Information Identification: A report to the International Association of Scientific, Technical and Medical Publishers (STM): The Armati Report.
[3] Burns, Christopher. 1995. Copyright management and the NII: Report to the Enabling Technologies Committee of the Association of American Publishers.
About the Author
Mark Bide has worked in the publishing industry for over forty years, for twenty years in corporate roles and, since the early 1990s, as a consultant. He is now Chairman of the Publishers Licensing Society and co-Chair of the Copyright Licensing Agency. He is also a strategic consultant to the Copyright Hub.