Digital Library Metrics:
Uncertainty and Failure

Position Paper for the D-lib Metrics WG (INITIAL DRAFT)
Carl Lagoze

Effective measurement of an entity first requires a reasonably precise definition of that entity. What are the characteristics of a digital library? What other entities resemble a digital library or could be considered components of it? Database systems? Traditional libraries? Information retrieval systems? There is, in fact, a broad spectrum of definitions of what constitutes a digital library (from a CD-ROM on a single desktop PC, to a mainframe-based Library Management System, to the World Wide Web itself). While each of these definitions is appropriate in a specific context, this working group should concentrate on measurement of federated digital library systems with the following characteristics:

Distributed services that are administered by independent parties.
Defined protocols for inter-operation between these services.
Heterogeneity throughout the system, such as different search engines, variety in the objects stored, variety in the schemes for protecting intellectual property, and variety in the types of user interfaces to the services.

A notable characteristic of such systems is uncertainty in their operation and the ever-present possibility of system failures. Our experience at Cornell with NCSTRL (Networked Computer Science Technical Report Library), a limited instantiation of such a federated digital library, has demonstrated a number of failure scenarios:

Network Partitioning - A server that is part of the federation becomes unavailable due to a failure in the network.
Server Failure - A server that is part of the federation becomes unavailable due to failure (hardware or software) of the server.
Network Latency - The response time of a server that is part of the federation becomes abnormally slow due to network overload (or other problems).
Server Latency - The response time of a server that is part of the federation becomes abnormally slow due to server overload (or other problems).
Protocol Failure - A server that is part of the federation responds to protocol requests with incorrect or invalid responses.
Administrative (Semantic) Failures - The quality of a service is compromised by poor administration; for example, metadata is of poor quality, document updates are done infrequently, etc.
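
The failure scenarios above can be written down as a small taxonomy, which is one way a measurement tool might label what it observes. The following is only a sketch under stated assumptions: the Probe fields, the classify function, and the 5-second slowness threshold are hypothetical and are not part of NCSTRL. Administrative (semantic) failures appear in the taxonomy but are not detected here, since they concern content quality rather than anything a single request can reveal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    NETWORK_PARTITION = "network partitioning"
    SERVER_FAILURE = "server failure"
    NETWORK_LATENCY = "network latency"
    SERVER_LATENCY = "server latency"
    PROTOCOL_FAILURE = "protocol failure"
    ADMINISTRATIVE_FAILURE = "administrative (semantic) failure"


@dataclass
class Probe:
    """One observation of a federated server (all fields are hypothetical)."""
    host_reachable: bool    # did the network path to the host work?
    service_answered: bool  # did the server process answer at all?
    network_rtt_s: float    # delay attributed to the network, in seconds
    service_time_s: float   # delay attributed to the server, in seconds
    response_valid: bool    # did the reply conform to the protocol?


def classify(probe: Probe, slow_threshold_s: float = 5.0) -> Optional[FailureMode]:
    """Map one probe to a failure mode, or None if nothing looks wrong."""
    if not probe.host_reachable:
        return FailureMode.NETWORK_PARTITION
    if not probe.service_answered:
        return FailureMode.SERVER_FAILURE
    if not probe.response_valid:
        return FailureMode.PROTOCOL_FAILURE
    if probe.network_rtt_s > slow_threshold_s:
        return FailureMode.NETWORK_LATENCY
    if probe.service_time_s > slow_threshold_s:
        return FailureMode.SERVER_LATENCY
    return None
```

In practice, separating network delay from server delay is itself a measurement problem; the sketch simply assumes the two values are available.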

These are but a few of the possible sources of failure that will appear in future federated digital library systems. The success of the architecture or infrastructure for such systems will lie in its resiliency in the presence of failures. This resiliency may take a number of forms, including insulating the user from failures through backups and mirrors, limited asynchronicity, controlled degradation, and user assistance in failure recovery.
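
Two of the resiliency characteristics named above, mirrors and controlled degradation, can be illustrated with a short sketch. It assumes that each collection in the federation is reachable through an ordered list of replicas, modeled here as plain Python callables that raise ConnectionError when unavailable. The names SearchResult, query_with_mirrors, and federated_search are hypothetical and describe no existing system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Replica = Callable[[str], List[str]]  # takes a query, returns matching hits


@dataclass
class SearchResult:
    hits: List[str] = field(default_factory=list)
    unavailable: List[str] = field(default_factory=list)

    @property
    def degraded(self) -> bool:
        """True when at least one collection could not be searched."""
        return bool(self.unavailable)


def query_with_mirrors(replicas: Sequence[Replica], query: str) -> List[str]:
    """Try the primary first, then each mirror; fail only if all fail."""
    last_error = None
    for replica in replicas:
        try:
            return replica(query)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("all replicas failed") from last_error


def federated_search(collections: Dict[str, Sequence[Replica]],
                     query: str) -> SearchResult:
    """Search every collection; report partial results instead of failing."""
    result = SearchResult()
    for name, replicas in collections.items():
        try:
            result.hits.extend(query_with_mirrors(replicas, query))
        except ConnectionError:
            result.unavailable.append(name)  # controlled degradation
    return result
```

A failure masked by a mirror is invisible to the user, while a collection with no working replica is reported by name in the result, so the user learns which part of the federation is missing rather than receiving a silently incomplete answer.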

One of our initial goals in the metrics and measurement working group should be a clear enumeration of the failure modes in federated digital library systems. Once we have enumerated this set, we then need to enumerate the recovery models for failures, with quantitative metrics attached to each recovery model (for example, a failure response in which the user's desktop system crashes is obviously the least desirable!). Having done this, we can proceed to specify test suites that script these failure modes and score the system's responses to each type of failure using the recovery metrics.
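
A minimal sketch of such a test suite follows. The four-level Recovery scale, the scenario names, and the toy_system stand-in are all assumptions made for illustration; the working group would need to agree on the actual recovery models and on the scores attached to them. The harness injects each scripted fault, records which recovery model the system exhibited, and times the response. An unhandled exception is scored as the worst case, mirroring the desktop-crash example above.

```python
import time
from enum import IntEnum
from typing import Callable, Dict, Tuple


class Recovery(IntEnum):
    """Hypothetical ordinal scale: a higher value is a better recovery."""
    CLIENT_CRASH = 0     # the user's system fails outright
    ERROR_REPORTED = 1   # the request fails, but the user is told why
    DEGRADED_RESULT = 2  # partial results, with the gap made visible
    MASKED = 3           # the failure is hidden, e.g. by a mirror


def run_suite(system: Callable[[str], Recovery],
              scenarios: Dict[str, str]) -> Dict[str, Tuple[Recovery, float]]:
    """Inject each scripted fault; record the recovery and elapsed seconds."""
    report = {}
    for failure_mode, fault in scenarios.items():
        start = time.perf_counter()
        try:
            outcome = system(fault)
        except Exception:
            outcome = Recovery.CLIENT_CRASH  # unhandled error is the worst case
        report[failure_mode] = (outcome, time.perf_counter() - start)
    return report


def toy_system(fault: str) -> Recovery:
    """Stand-in for a real system under test (entirely hypothetical)."""
    if fault == "primary_down":
        return Recovery.MASKED
    if fault == "all_replicas_down":
        return Recovery.DEGRADED_RESULT
    if fault == "malformed_reply":
        raise ValueError("unparseable reply")
    return Recovery.ERROR_REPORTED


scenarios = {
    "server failure": "primary_down",
    "network partitioning": "all_replicas_down",
    "protocol failure": "malformed_reply",
}
report = run_suite(toy_system, scenarios)
mean_score = sum(int(outcome) for outcome, _ in report.values()) / len(report)
```

An ordinal score per scenario keeps results comparable across systems, and the elapsed time gives a second axis for the latency-related failure modes.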