The satellite telemetric images now arriving from Mars dramatically illustrate the power of networked digital information technologies. Not only can the scientists at the Jet Propulsion Laboratory see the images, but in close to real time, so can schoolchildren, avocational astronomers, and the suddenly curious, who range from artists who love the images to dreamers who just love the technology.
These images pose interesting challenges for the world of digital libraries and related technologies. Quite apart from the achievement of the data capture and transmission are the thorny issues of metadata, storage, cataloging, discovery, and retrieval. All of these are also part of the related questions of archiving and preservation, which now occupy no small sector of the web-watching population. So here is a rather interesting juxtaposition: On the one hand, the images represent advanced information technologies; on the other, they embody many traditional information management questions. What constitutes an adequate description such that the information can be found? And how much do we need to save? In addition to the image, for example, how much information about the project, or provenance, should the metadata include and at what level of granularity? The collection? The image, which is actually composed of tiles? Or the tiles themselves?
The first-order response is: everything. But this strategy, applied indiscriminately, can create inchoate information overload down the road, analogous to the famous - and possibly apocryphal - story about the British Public Record Office. Some years back, an archivist is said to have reached into an undifferentiated bundle of documents and pulled out one of the four now authenticated autographs of William Shakespeare. This says less about the organization of the archives than about the magnitude of managing public records that go back 1,000 years, more or less.
Over the years, libraries and archives have developed strategies to create manageable and coherent collections based on judgment and selection, weeding and deaccessioning. Indeed, there are voices in the Net world calling for similar consideration of digital information on-line and off. Nevertheless, it seems to me that the need for preservation, important though it may be, can obscure an interesting feature of the networked world, namely, that information can be created on the fly, that this information can and should be temporary, and that there exists a need to cope with the short term that should remain independent of the long term.
Consider the range of project data that can be created as a consequence of on-line collaboration. We can routinely work with an electronic whiteboard, save the notes, jottings, and scribbled equations as a separate file, and then send the file around a network of collaborators, each of whom can continue to work with parts of it, based on need. Yes, as Andreas Paecke argued last May and David Bearman and Jennifer Trant suggest in this issue devoted to conversion, preservation, and archiving, some of this secondary or informal information will be important to solving future problems. But not all of it will. For example, I cannot imagine that future researchers will really want or need to see the five or six drafts of this editorial that exist or the comments I received on it, which range from typos to wording to substance.
For the near term, we will need tools to store and manage data that are intentionally temporary. We can imagine temporary subject classification schemes based on jargon, date, and project personnel, meaningful to the participants but to no one else. These would have an intentionally short life; they would not be expected to solve the information problem traditionally managed by subject thesauri but rather to enable people to work together and to find relevant information in the near term. By separating the near-term and the temporary from the long-term and the permanent, the traditional objections to thesauri - that they are labor-intensive and subject-specific - would not arise. At the conclusion of the project, we can envision a systematic review of the project data to establish the permanent record and to discard what is temporary and irrelevant - like interim whiteboard notes, e-mail among staffers, and preliminary sketches and drawings.
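To make the idea concrete, here is a minimal sketch in Python of what such a tool might look like. It is purely illustrative: the names `ProjectItem`, `TemporaryIndex`, `find`, and `review`, and the notion of a `keep` flag set by a human reviewer, are my own assumptions and do not describe any existing system. The sketch shows informal, project-local tags used for near-term discovery, followed by an end-of-project review that separates the permanent record from what can be discarded.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProjectItem:
    """One piece of project data (hypothetical structure)."""
    name: str
    kind: str                      # e.g. "whiteboard", "email", "report"
    created: date
    tags: set[str] = field(default_factory=set)  # informal jargon, people, dates
    keep: bool = False             # set by a human reviewer, not by the tool


class TemporaryIndex:
    """A short-lived, project-local index; not a substitute for a thesaurus."""

    def __init__(self, project: str) -> None:
        self.project = project
        self.items: list[ProjectItem] = []

    def add(self, item: ProjectItem) -> None:
        self.items.append(item)

    def find(self, tag: str) -> list[ProjectItem]:
        # Near-term discovery: match on whatever tags the participants chose.
        return [item for item in self.items if tag in item.tags]

    def review(self) -> tuple[list[ProjectItem], list[ProjectItem]]:
        # End-of-project review: split into permanent record and discards.
        permanent = [item for item in self.items if item.keep]
        discarded = [item for item in self.items if not item.keep]
        return permanent, discarded
```

The selection itself stays a matter of human judgment: the tool only records the reviewer's decision and applies it when the project closes.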
We are all rightly concerned about temporary fixes that somehow become permanent. Still, the coming world of groupware means more and more informal information. Some of it may well merit archiving. But some will not. Indeed, thinking about what is temporary may help us clarify what it means to be permanent. Even if it does not, there is increasingly a place in the digital information environment for tools that help us manage transitory information, created literally on the fly.