In its inaugural issue, PreText Magazine featured digital libraries. With Nancy Davenport, Paul Duguid, Andrew Odlyzko, and Robert Wilensky, I agreed to participate in the magazine's on-line forum. It was an interesting experience - partly from the way we teed off of comments that had been made, partly for the relative formality of those comments - sentences were balanced and paragraphs were clearly crafted - and partly for the interaction with the moderator, Dominic Gates, who carefully copied us all via e-mail as the comments were appended and amended, and who periodically tossed a firecracker into the circle. The following essay is the substance of my first contribution, which followed a discussion launched by Andrew Odlyzko on the tensions among conventional libraries of books and bricks, the problems posed by trying to archive the web, and the future of librarians and librarianship.
He concluded that "the combination of human intuition, skills, and knowledge will likely provide the most powerful information systems," just as the best chess will be played by a "combination of human tactical skills and computer tactical power." That is to say, the wetware in front of the monitor counts at least as much as the soft- and hardware behind it. Or perhaps more, since the software and hardware did not come about absent human intervention, and they embody human and cultural assumptions about engineering and the way the world works as surely as the messy technologies of literature and art. For example, at a recent conference, a very fine computer scientist told me, half laughingly, half plaintively, that all the hard work that went into designing a system seemed wasted, because users weren't using the system "the way they're supposed to." The writer's equivalent lament goes something like, "but can't they understand what I meant?"
The point here is not improving system design by better user and usability studies, which is a worthwhile discussion. Rather, I would prefer to pursue a somewhat larger issue, which is the existence of assumptions about the world embodied in the engineering and in particular those assumptions related to digital libraries and networked information. And there are many, one of the most fundamental being that we should treat all of the information accessible via the web - or the Internet - as an enormous library that should be equipped with the kinds of tools and services that we associate with an idealized view of a "library." An obvious illustration of this conception of the web is Dominic Gates' own story on the Universal Library - which is, in Robert Wilensky's lovely image, "everywhere and nowhere."
If we, indeed, take the library metaphor seriously for a moment, the first thing we realize is that there are many kinds of libraries and many kinds of users. And many ways we look for information. For example, when I'm working like a journalist, I frequently call rather than use e-mail because I am working against a tight deadline, or I want to be able to follow up on an answer in real time. However, some of us, myself included, enjoy the process of search as a way to refine and understand a topic; this is particularly useful when I am acting like a writer working on a new book. On the other hand, when I am behaving like an editor and I want to double-check the spelling of a name, a classic known-item search, then all I want is reliable information quickly. And most of the existing search engines do that very well for the authors I work with. So the issue is not simply whether existing search engines will scale to the web of the future, although that seems like a laudable albeit difficult goal. Rather, the fundamental issue concerns human behavior and technology.
Note, though, the importance of reliability in these examples. Traditional libraries, archives, and finding aids possess that attribute in their basic definition; it inheres in their collections policies, organizational structure, and selection of journals to abstract and index. For example, I once found a previously unidentified document by Robert E. Lee, who has come to personify the anguish of choosing among loyalties during the American Civil War (1861-1865) as well as dignity in defeat. The document was a routine maintenance report he had written while posted to Fort Hamilton, New York, in the 1840s, noticeable initially only because the handwriting was remarkably legible and the sentences were grammatical - at least until I reached the signature line. I found it in a box of similar reports, which I was dutifully reading for research on the Fort's batteries. It is not surprising that the document was filed there, nor that it had gone unidentified as a Lee autograph.
Now, I think that it is fair to argue that this is precisely the situation that the digital world promises to remedy. Details and facts can become findable more easily. But I worry about losing the value of context. In this example, where I was interested in the fortifications and not Lee, the fact that Lee wrote the report was less important than the information it contained. But for a moment, a vision of a talented young officer grinding away at routine tasks like many other talented young officers of his generation was instantly conveyed by virtue of the location of the document. Moreover, its location - in a box in the National Archives with similar documents - provided undisputed provenance and authentication.
I also think that it is fair to argue from this example that here's a situation in which archiving virtually everything worked. But while I applaud Brewster Kahle's pioneering efforts to archive the web and to come up with an innovative way of searching the extraordinary collection he is creating, it is worth pointing out what professional archivists have long known: not everything needs to be saved - even if the intellectual property issues were resolved. Heaven help us if I saved the "versions" of stories where the difference between one "version" and the next consisted of correcting the spelling. Particularly with technologies where repetitions can become conflated with measures of relevance, and where accuracy of the information is important and subject to change, based on new research, it seems to me that the human intervention becomes more rather than less important and hence discarding misleading information matters.
A good example of an attempt to evaluate sites is described in the most recent issue of the Tufts University Diet & Nutrition Letter (the site itself is at http://navigator.tufts.edu). The authors point out the dangers of misleading or inaccurate information and explain a ranking system that they devised which considers several dimensions: depth, accuracy, performance, and display. People do the evaluation, but the outcome is computational. Of course, this is domain-specific, rather like traditional pathfinders, subject bibliographies, or review articles in the print world. The virtue in the digital world is that maintaining currency in information contexts like health and medicine, where currency matters, is much easier.
Researchers engaged in the design of human-computer interfaces (HCI) would probably be among the first to agree that many user communities must be served and that HCI, search, and retrieval should be considered facets of the same problem rather than bounded disciplines (see, for example, last January's story by Bruce Croft, Ben Shneiderman, and Don Byrd). But I remain unconvinced that all issues can be solved by technology - although many can, and the process of thinking about it tells us much about ourselves and how we seek, use, and share information. Should we try to make it easy to go from one kind of data to the next? Where heterogeneity can be managed and disparate systems made interoperable? For example, where it is easy to find images of Leonardo's paintings, analyses of the deterioration of the paints and successive restorations, copies of his notebooks, and authoritative discussions of his life, which may now be physically distributed across several countries and in several languages? Of course. But the point is that boundaries and categories possess meaning, and traversing boundaries among heterogeneous data is useful. Moreover, we also learn by trial-and-error and by making mistakes, and we will need tools to support the quest as well as places for storing the answer. So to strive toward one great library, some wonderful castle in the air where all knowledge resides and one query will always yield The Answer, may be the siren's song rather than the holy grail.