Opinion

D-Lib Magazine
February 2002

Volume 8 Number 2

ISSN 1082-9873

Digital Preservation and Deep Infrastructure

 

Stewart Granger
University of Leeds
gkh12@dial.pipex.com


(This Opinion piece does not necessarily reflect the views of D-Lib Magazine, its publisher, the Corporation for National Research Initiatives, or its sponsor.)

 

Introduction: Deep Infrastructure in the Task Force Report

In the seminal Task Force Report on digital preservation [1], the phrase 'deep infrastructure' is used several times. This phrase immediately struck me, as I'm sure it did others, as well chosen and extremely suggestive. My aim in this article is to attempt to develop ideas that may help to explicate the concept of 'deep infrastructure'[2].

Although the report does convey an appropriate sense of how it is difficult to identify and make tangible the nature of the problems that need to be solved, I think it is fair to say that the Task Force report clearly identified only two main aspects of what the Task Force meant by deep infrastructure: the need for certified archives (and the implicit process of certification), and the notion of a 'fail safe mechanism'. With respect to the former, we have recently had the excellent RLG/OCLC report [3]. However, the latter aspect — that of a fail safe mechanism — is one aspect of the report with which I felt unhappy [4]. The picture of deep infrastructure I want to present is represented in Figure 1.

 

Image showing the author's conception of deep infrastructure

Figure 1: A view of deep infrastructure

 

The diagram in Figure 1 calls for some explanation. Obviously, it is intended to be a picture of what is desirable and not of what currently exists. In the remainder of this article, I will explain what I mean by the four main aspects of deep infrastructure labelled in the figure. The arrows are intended to represent the idea that the technical solutions to digital preservation problems should meet the requirements of users, that is, those who need to preserve digital content or have long-term access to it (arguably, directly or indirectly, that is all of us). This may sound like the most bland piece of software engineering — identify the requirements and then design and build the system to meet those requirements (bland but often not easy to follow!). What I want to argue, however, is that this is not the reality of the situation we currently face. At this time, technologies frequently are designed and developed more for the benefit of vendors than for users, and persons concerned with digital preservation are expected to jump through whatever hoops are required by those technologies. This controversial view will recur later in this article.

I hope that the diagram in Figure 1 will prove useful, but like all such efforts, it no doubt has its limitations. Nowhere on the diagram do the words 'economics' or 'costs' occur, but of course when one unpacks many of the central concepts, they will be seen to have many cost and economic implications. Moreover, it will be seen that the concepts imply decades of work before they are fully realized. One final caveat: in this paper I am attempting to paint the 'big picture' — I do not claim that this picture is complete.

1. Organizational Aspects of Digital Preservation Infrastructure

1.1 Certified Archives

In 1994, the Joint Commission on Preservation and Access/RLG Task Force on Archiving of Digital Information [1] began to describe and explore the nature of a reliable repository for digital materials.

The major findings of the CPA/RLG report included these points:

  • Long-term preservation of digital information on a scale adequate for the demands of future research and scholarship will require a deep infrastructure capable of supporting a distributed system of digital archives.
  • A critical component of the digital archiving infrastructure is the existence of a sufficient number of trusted organizations capable of storing, migrating and providing access to digital collections.
  • A process of certification of digital archives is needed to create an overall climate of trust about the prospects of preserving digital information.

As mentioned above, we now have the recent report from RLG/OCLC [3] on this aspect of digital preservation infrastructure. Here I will only list what the report contains. The report:

  • Proposes a definition of a trusted digital repository
  • Identifies the primary attributes of a trusted digital repository
  • Articulates a framework for the development of a certification program
  • Identifies the responsibilities of an OAIS-compliant digital repository
  • Informs the RLG/OCLC communities of other developments necessary to implement a reliable repository, and
  • Provides formal recommendations for future work.

In addition to certified archives, I believe that there is also a need for (at least) two other kinds of organization as discussed below.

1.2 Collaborative structures

Even a cursory examination of the problems of digital preservation indicates the positive need for collaboration amongst interested parties and institutions. It should be obvious that such collaboration is likely to facilitate cost savings, either by economies of scale or by other means. That, I think, is both true and important but I believe does not convey the scale of the problem confronting us. An example may help clarify what I mean. It has been suggested, convincingly to my mind, that future researchers could find it useful to have preserved some of the data that is being collected by supermarkets. [5] That small word 'some' however poses huge problems. The first is who selects the information to be preserved? The obvious answer — the likely users of the data — only raises more questions. Prima facie, both social scientists and historians would have such a potential interest, and this suggests there should be a collaborative mechanism whereby interested parties could sort out such issues. Fine in principle, but such collaborative mechanisms do not currently exist.

The second major problem the example illustrates is that the data creators are a different set of people from the potential users of the data. Would the data creators have any incentive to collaborate with the users? Some might argue that this is where the need for a 'fail safe' mechanism arises, but rather than pursue this, let me attempt to put the problem in a wider context. The wider context should perhaps start by reflecting on the wide variety of interests involved. One (admittedly skewed) way of thinking about the digital world can be represented as in the Venn diagram in Figure 2.

 

Chart showing digital domain

Figure 2. A view of the digital domain.

 

For the purposes of this article, one can think of the world as divided into three categories (libraries and archives, research communities, and commercial entities), each of which has subsets operating primarily in the digital domain.

The diagram indicates the existence of a plethora of differing motives and cultures: public vs private, commercial vs non-profit, national cultures and sub-cultures, to name a few. These differing motives and cultures will, of course, influence the organizational modus operandi when it comes to the digital world. It does not stop there. Given their different interests and cultures, digital communities are likely to have different priorities, e.g., with regard to access, rights management, interoperability, or preservation. And it does not stop there either, since, as noted above, the roles in the digital process (data creator to data user to data preserver) give rise to differing perspectives and interests. Undoubtedly, some of these interests will be in conflict, but many will simply be different.

It is against this background that collaborative structures need to be built. Where possible, these collaborative structures may lead to conflict resolution, but it would be optimistic, to say the least, to suppose that such resolution will always be possible. Nevertheless, there surely is scope for a vast amount of collaborative effort that may lead to solutions not possible for a single group or community to achieve on its own. So, unlike what is proposed in the next section, 'collaborative structures' are not organizations themselves so much as processes of communication between existing (or new) organizations. Perhaps initiatives such as the Digital Preservation Coalition may be seen as the first steps towards such structures [6].

Having written the above from an institutional perspective of digital preservation, it is also worth noting that digital preservation issues are increasingly important for individuals as well. Individuals are making ever greater investments in their use of digital objects. Consider their use of Personal Digital Assistants (PDAs), for example. PDAs, like other technologies, can become obsolete virtually overnight, even when they appear to be well established. This matters from an individual's point of view, but it also raises two points with respect to wider community interests. First, Cliff Lynch [10] makes the point that with respect to physical objects, individuals, rather than institutions, have often been the ones who preserved the objects that have later become recognized as significant. Second, it follows that we should be cautious with respect to solutions to digital preservation problems that would take the form of allowing special treatment for institutions (e.g., certified archives) but not for individuals.

1.3 Centralized Digital Preservation Research & Development Centers

Many actions an institution must undertake in order to preserve digital objects are such that the actions will need to be taken by the institution itself — or at least by a third party undertaking the actions on the institution's behalf. However, there are other actions not of this nature — actions that a few people could undertake on behalf of the many. For these actions, it would make sense for institutions to collaborate in supporting centralized research centers — if only for cost reasons. These actions could include:

  • Developing and maintaining emulators
  • Developing metadata tools
  • Providing data recovery services (commercial data recovery services exist but are very expensive).

Of course, establishing such centers, especially and preferably on an international basis, would be a large undertaking; nevertheless, it is clearly desirable to do so.

2. Legal Aspects

The legal environment in which digital preservation occurs most obviously raises two difficult subjects: Intellectual Property Rights (IPR) and electronic deposit.

2.1 IPR

In a reasonable world, there ought to be a balance between the interests of copyright holders on the one hand, and information consumers on the other. In the world of books, a reasonable balance has been achieved. Authors are entitled to royalties and are protected by copyright law. This balance depends on contingent practical factors, such as the fact that photocopying an entire book is an unattractive option in terms of both the effort required and the cost. Readers' interests are protected by policies of fair use and first sale [7]. First sale, in particular, makes public lending libraries possible. As reported by the 'Committee on Intellectual Property Rights and the Emerging Information Infrastructure' [8], 'going digital' threatens this balance. From the content provider's point of view, threats arise from the ease with which digital objects can be copied without loss of quality, and because the objects can be easily distributed across networks. Conversely, the information consumer may be threatened by the imposition of draconian conditions of use and by new business models (e.g., licensing or pay per view) that would lead to the demise of 'first sale'. With respect to the former, one e-book vendor recently included in its conditions of use an absurd prohibition on reading the work aloud!

I am particularly concerned that the introduction of Technical Protection Systems (TPSs) threatens to make the activities needed for digital preservation more difficult (and therefore more expensive), impossible, or illegal. A conclusion reached in the report, The Digital Dilemma [8], is that "…more legitimate reasons exist to circumvent access controls than are currently recognised in the Digital Millennium Copyright Act (DMCA)" [9].

The mention of technical protection systems points to a conclusion well articulated in The Digital Dilemma, which is that achieving a reasonable balance between the interests of users and those of content providers is not solely a legal issue; rather, it is getting a balance between the legal framework, the extant technology and the business model. With technology in its current state of flux, it is impossible for all these issues to be definitively resolved at the current time. Indeed, a recommendation from The Digital Dilemma [8] is that "Legislators should not contemplate an overhaul of intellectual property law and public policy at this time to permit the evolutionary process to play out" [p.16]. What is important now is that the digital preservation community be pro-active to ensure this process evolves in a way that does not harm the interests of preservation.

2.2 Electronic Deposit

The hoped-for evolution of intellectual property law and public policy mentioned above should include the development of systems for electronic deposit similar to those used for books. The Digital Dilemma recommends a task force to study the problem and, in an interesting footnote, envisages one possible outcome as follows:

"…one scenario might call for voluntary (or mandatory) deposit of digital works that are protected by copyright in the United States and those that are offered for sale under licence or, if distributed free of charge, are protected by a TPS. Such deposited copies would not be made available to the public by the depository as long as they are still offered to the public by the rights holders, except for viewing within the library itself (as is the case with hard-copy works). All deposited copies would be 'in the clear' (i.e., with no encryption or other access limiting mechanism)….. The intent here is to extend into the digital world the traditional balancing act of IP — providing enough control over the work to offer an incentive for creation, yet ensuring that in the long term all work becomes a part of the public intellectual record to the benefit of society as a whole. Providing for deposit of materials 'in the clear' may aid in dealing with problems of access that arise from technical protection mechanisms, as well as issues raised by archiving." [p.208]

3. Culture

"Some content providers seem to have ambitions more appropriate for some Orwellian dystopia." Clifford Lynch [10]

In this context, by 'culture' I mean sets of beliefs, attitudes and values with respect to issues impinging on digital preservation. Before looking at some disturbing new developments, it is worth reflecting on the culture of information technology in the recent past. Speaking as a user, that culture appears to me to have been lamentable, particularly in the largest information technology market, namely the world of the personal computer. I will mention only two aspects of this vendor-driven culture: reliability and forced obsolescence. For well over a decade, the operating systems of the most popular platforms have been notoriously unreliable and liable to crash. Modern versions of these operating systems are advertised as being more reliable, as though reliability were not thought to be an essential requirement in the first place. At the same time, users have been virtually compelled to upgrade both hardware and software constantly, regardless of their real needs, since after a relatively short time, failure to upgrade will result in having an unsupported system.

Against this background, it is not encouraging to see new information technology opportunities become available that could threaten users, in general, and digital preservation, in particular. Consider the following set of possible conflicts illustrated in Figure 3.

 

Diagram showing the possible conflicts between vendors and DL communities

Figure 3. Possible conflicts between vendors and digital communities.

 

It is a fact that obsolescence can be in some people's interests — at least if those interests are thought of narrowly in terms of commercial gain.

Faced with these threats, there is the consoling thought that if vendors attempt to push things too far, they will meet with consumer resistance. For example, if makers of e-book readers attempt to impose draconian conditions of use, they will kill the goose they hope is laying the golden egg. However, it is currently unclear how much consolation this provides, since these same vendors may be able to push things far enough to create problems for digital preservation before deterring potential consumers. Some of us believe that certain ways of making a profit are unacceptable — including actions that endanger our current or future digital heritage.

The threats referred to in the diagram in Figure 3 represent a pessimistic scenario. However, there is an alternative and optimistic scenario: although digital preservation is currently difficult and expensive, it could be made easier and cheaper as manufacturers respond to the demands of a pro-active digital preservation community. The development and adoption of standards would contribute toward making that optimistic scenario a reality. Additionally, what I mean by a pro-active digital preservation community is one that would not fatalistically accept a vendor-driven culture. Consider this statement:

"…standardisation sows the seeds of its own destruction by encouraging vendors to implement non-standard features in order to secure market share." [12]

I find this attitude astonishing — it is rather like a criminal saying 'well, if you didn't have laws I wouldn't have to break them.'

Perhaps part of the scepticism about standardization derives from the assumption by some that when one talks about standards, one must be talking about an ISO (International Organization for Standardization) or similar standardization process. ISO standardization is notoriously slow; chances are that by the time a standard is delivered, it is out of date, or at least that is the perception. I have two things to say about this. First, some ISO standards do prove their worth. Second, the ISO model is not the only model for standardization: the Internet Engineering Task Force (IETF) works on a quite different model. Network protocols (i.e., standards) are the sine qua non of the Internet. The Internet works, and it would be hard to claim that the Internet does not change and evolve quickly. (See the discussion by Lynch [13].)

Reference to the Internet should remind us that, in fact, standards are ubiquitous. In "Standards for DSEP (Deposit System for Electronic Publication)" [14], pointers are given to approximately 40 relevant standards (not including any to the plethora of data formats [15]). The old joke says that 'we love standards; that is why we have so many of them'. Indeed, this abundance of standards sometimes is a problem, and nowhere does the problem seem more acute than when it comes to metadata. Sometimes it seems that every user community has its 30-page specification for metadata, but only metadata for a specific purpose. I am aware of two specifications for digital preservation. How can we realistically have metadata that will support all the needs of an archive, i.e., for discovery, rights management, interoperability and preservation, to name the most obvious? Nevertheless, there are areas where new standards will be needed. When the promised land of the certified archive arrives, we will need standards for the certification process: standards that will provide quality assurance (perhaps similar to, or even a specialized version of, ISO 9001). The increasingly popular emerging OAIS standard (the Reference Model for an Open Archival Information System) lists 12 areas where standards are needed [16].

4. Technical Infrastructure

Most of what I placed in the box labelled 'technical infrastructure' in Figure 1 has already been explained in previous sections. I would just comment here on the 'etc etc'. This term is a placeholder, not only for things about which I have not thought but also, hopefully, for things not yet invented. As digital preservation requirements are further identified, I believe those requirements will include hardware and software tools, techniques or components to make digital preservation easier and cheaper. And if we succeed in changing the culture, vendors will be happy to implement them.

5. The Way Forward: the Dual Approach

Clearly the deep infrastructure advocated in this article will take many years to establish. For this reason, it makes sense for institutions to adopt a dual approach: a pragmatic approach and a strategic approach. The pragmatic approach would consist of preserving digital objects now as best one can. There are already several guides to best practice available covering many aspects of digital preservation. More contentiously, I assert that institutions should also adopt a strategic approach. This approach would deal with medium- to long-term issues and could include:

  • Identifying user requirements for digital preservation and setting the preservation agenda accordingly
  • Building collaborative structures
  • Developing standards
  • Creating trusted archives
  • Supporting centralized digital preservation Research & Development centers.

In terms used earlier in this paper, these activities would help change the culture to one in which our digital cultural heritage will be protected.

Notes and References

[1] Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, Commission on Preservation and Access and The Research Libraries Group, Inc., May 1, 1996, <http://www.rlg.org/ArchTF>.

[2] The views expressed in this article are entirely the responsibility of the author and do not necessarily reflect the views of my colleagues.

[3] "Attributes of a Trusted Digital Repository" available at: <http://www.rlg.org/longterm/attributes01.pdf>

[4] The other aspect was the report's assumption that migration is the only viable digital preservation strategy. For reasons to doubt this, see Stewart Granger, "Emulation as a Digital Preservation Strategy," D-Lib Magazine, October 2000, <http://www.dlib.org/dlib/october00/granger/10granger.html>. See also "Digital Preservation and Emulation: from theory to practice" in ICHIM 2001, Proceedings, Sept 3-7, Milan, ISBN 1-885626-24-X, also available at <http://dspace.dial.pipex.com/stewartg>.

[5] See Kelly Russell, "Preserving digital scholarly resources: progress through collaboration", Assignation, Volume 18, No. 2, ISSN 0265-2587.

[6] Information on the Coalition is available at: <http://www.jisc.ac.uk/dner/preservation/prescoalition.html>.

[7] To simplify two complex issues: fair use is the right to use copyright information without permission for certain limited purposes e.g., reviewing a work. First sale applies to a physical object such as a book — the copyright owner can control the sale of a book and set the price but once someone buys the book then they have full ownership of that copy and may dispose of it how they will.

[8] The Digital Dilemma, Computer Science and Telecommunications Board, National Research Council, National Academy Press, 2000, ISBN: 0-309-06499-6

[9] See [8] p.222

[10] Clifford Lynch, 2001, "The Battle to Define the Future of the Book in the Digital World," First Monday at: <http://firstmonday.org/issues/issue6_6/lynch/index.html>.

[11] I use 'fair use' and 'first sale' here simply as a placeholder to mean a reasonable balance between interests — I do not pre-judge what the optimum solution will be.

[12] Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation, Jeff Rothenberg, 1999. Available at <http://www.clir.org/pubs/reports/reports.html>.

[13] Clifford A. Lynch, "The Case for New Economic Models to Support Standardization Efforts" at <http://www.niso.org>.

[14] "Standards for a DSEP: Standards for the Implementation of a Deposit System for Electronic Publications" by Bendert Feenstra, Den Haag, NEDLIB Report 4, <http://www.kb.nl/nedlib/results/dsepstandards.rtf>.

[15] Not all of these are full international standards — some are RFCs.

[16] Reference Model for an Open Archival Information System (OAIS), Don Sawyer / NASA and Lou Reich / CSC report: Red Book, Issue, May, URL: <http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>.

Copyright 2002 Stewart Granger

DOI: 10.1045/february2002-granger