The purpose of this Working Group is to develop a consensus on an appropriate set of metrics to evaluate and compare the effectiveness of digital libraries and component technologies in a distributed environment. Initial emphasis will be on (a) information discovery with a human in the loop, and (b) retrieval in a heterogeneous world.


Much of digital library research is experimental or exploratory. Research projects lead to demonstrations, pilot systems, and eventually to deployment in production systems. Currently, there are few ways to evaluate the effectiveness of research, or to measure progress towards long-term goals. A notable exception is information retrieval, which has been greatly enhanced by the existence of the well-established measures of precision and recall. These metrics, in conjunction with standard corpora that can be used for testing and evaluation, have helped advance the state of the art by allowing researchers to compare and evaluate systems on a common basis.
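For readers less familiar with these two measures, a minimal sketch follows. It assumes the conventional definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The function name and the document identifiers are illustrative only and do not come from any particular evaluation suite.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the system returned (hypothetical IDs)
    relevant:  identifiers judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were returned

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data: four documents returned, three judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Both values describe a single query against a single collection, which is why they fit the "single site" case well.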

While these measures have been very useful in evaluating and comparing "single site" search and retrieval mechanisms, the complexity of the digital library environment demands a much richer set of metrics. Metrics are required to deal with issues such as the distributed nature of the digital library, the importance of user interfaces to the system, and the need for systems approaches to deal with heterogeneity amongst the various components of the digital library.

Working Group Objectives

The objective of this working group is to develop usable and useful metrics for the more complex world of distributed heterogeneous digital libraries. A parallel initiative plans to establish a test suite of library collections that can be used for collaborative research and as a set of corpora for evaluation. Efforts to accumulate evaluation software for standardization and reuse will also be encouraged.

The working group will initially focus on metrics for information discovery and retrieval. Information discovery in digital libraries is more complex than the classical problem of information retrieval. One difference is that users pursue a range of information-seeking tasks, each with its own criteria for success. In practice, most information discovery is an iterative process that includes searching, browsing, filtering, and other complex interactions between human understanding and computer processing. Current metrics measure the performance of discrete steps in this process, but not the overall success.

Likewise, retrieval in the distributed digital library environment requires increased attention. As library objects become more complex, it becomes increasingly common for a user to discover the existence of an object, but not be able to access it effectively. Problems include incorrect references (broken links), access restrictions, mismatches between the MIME types that are supported, and system incompatibilities.

Because of the importance of the human in the loop, we expect to draw metrics from a broad set of relevant fields, including fields as diverse as psychology, engineering, and human communications. Our ultimate objective is to be able to measure and document the impact of particular system concepts or features, in specific settings, for specific user communities with specific purposes. Therefore, ideal metrics will be meaningful to all stakeholders, reproducible, and inexpensive.

The D-Lib Working Group on Digital Library Metrics is co-chaired by William Arms and Barry Leiner.

