Digital Library Metrics:
Uncertainty and Failure

Position Paper for the D-lib Metrics WG (INITIAL DRAFT)
Carl Lagoze (lagoze@cs.cornell.edu)

Effective measurement of an entity first requires a reasonably precise definition of that entity. What are the characteristics of a digital library? What other entities resemble a digital library or could be considered components of it? Database systems? Traditional libraries? Information retrieval systems? There is, in fact, a broad spectrum of definitions of what constitutes a digital library (from a CD-ROM on a single desktop PC, to a mainframe-based Library Management System, to the World Wide Web itself). While each of these definitions is appropriate in a specific context, this working group should concentrate on measurement of federated digital library systems with the following characteristics:

A notable characteristic of such systems is the presence of uncertainty in their operation and the ever-present possibility of system failures. Our experience at Cornell with NCSTRL (Networked Computer Science Technical Report Library), a limited instantiation of such a federated digital library, has demonstrated a number of failure scenarios.

These are but a few of the possible sources of failure that will appear in future federated digital library systems. The success of the architecture or infrastructure for such systems will lie in its resiliency in the presence of failures. This resiliency may take a number of forms, including insulating the user from failures through backups and mirrors, limited asynchronicity, controlled degradation, and user assistance in failure recovery.

One of our initial goals in the metrics and measurement working group should be a clear enumeration of the failure modes in federated digital library systems. Once we have enumerated this set, we then need to enumerate the recovery models for failures, with a quantitative metric attached to each recovery model (for example, a failure response in which the user's desktop system crashes is obviously the least desirable!). Having done this, we can proceed to specify test suites that script these failure modes and measure system responses to the various types of failures using the recovery metrics.
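
As a rough illustration of the proposal above, the following Python sketch shows one way the three pieces (failure modes, recovery models with attached metrics, and a scripted test suite) might fit together. Everything in it is an assumption made for illustration: the two failure modes, the four recovery models and their numeric scores, and the names FailureMode, RecoveryModel, TrialResult, run_suite, and toy_system are hypothetical and are not drawn from NCSTRL or from any agreed working group vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class FailureMode(Enum):
    # Hypothetical failure modes, chosen only for illustration.
    SERVER_UNREACHABLE = "server_unreachable"
    INDEX_STALE = "index_stale"


class RecoveryModel(Enum):
    # Hypothetical recovery models; the value is an illustrative
    # score where higher means a more desirable recovery.
    MIRROR_FAILOVER = 3   # user is insulated from the failure
    DEGRADED_RESULT = 2   # controlled degradation
    USER_ASSISTED = 1     # user is helped to recover
    CLIENT_CRASH = 0      # least desirable outcome


@dataclass
class TrialResult:
    failure: FailureMode
    recovery: RecoveryModel

    @property
    def score(self) -> int:
        # The metric attached to the observed recovery model.
        return self.recovery.value


def run_suite(
    system: Callable[[FailureMode], RecoveryModel],
    failures: List[FailureMode],
) -> List[TrialResult]:
    # Script each failure mode against the system under test and
    # record which recovery model the system exhibited.
    return [TrialResult(f, system(f)) for f in failures]


def toy_system(failure: FailureMode) -> RecoveryModel:
    # Stand-in for a real system: fails over to a mirror when a
    # server is unreachable, otherwise returns degraded results.
    if failure is FailureMode.SERVER_UNREACHABLE:
        return RecoveryModel.MIRROR_FAILOVER
    return RecoveryModel.DEGRADED_RESULT


results = run_suite(toy_system, list(FailureMode))
mean_score = sum(r.score for r in results) / len(results)
```

In a real test suite the callable passed to run_suite would inject the failure into a live system and observe its behavior; the single ordinal score used here is only a placeholder for whatever recovery metrics the working group settles on.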