William Y. Arms
Corporation for National Research Initiatives
D-Lib Magazine, April 1997
On March 9-11, 1997, the National Science Foundation (NSF) sponsored a "Planning Workshop for Research in Distributed Knowledge Environments (DKE's)." This story is based on one of two plenary papers given on March 10, 1997. The other was given by Ron Larsen and also appears in this issue. All slides, transcripts, and workshop notes will be made available shortly by the University of Michigan, School of Information.
Ron Larsen and I have been invited to explore the assumptions about digital libraries that are so deeply rooted in our thinking that we take them for granted. By challenging such assumptions, we hope to stimulate a creative agenda for the next generation of digital library research. Ron has agreed to explore the technical framework; I will examine our ingrained assumptions about the social, economic, organizational, and legal worlds that relate to digital libraries.
The talk takes the form of a series of questions, loosely grouped into eight topics. I offer no solutions, but I hope that the questions in themselves will help us relax our assumptions and stretch our minds.
A great deal of effort in digital libraries is motivated by implicit assumptions about the market for information. The creators and purveyors of high quality information are assumed to be driven primarily by a quest for revenue. If we accept this view of information, digital library research must pay close attention to payment systems, security, encryption, authentication, licensing agreements, and the resolution of legal questions of copyright and fair use.
But does the concept of information as a commodity conform to our experience in the on-line world? The enormously successful World Wide Web is based on a completely different view: open access to information. On the web, creators give away information. They invest their time and effort in making materials available, hoping that people will read them. Is open access the norm and payment a special case, rather than the other way round?
In our society, where information overload is a universal problem, creators must compete for the reader's attention. Consider, for example, government information, scientific research, advertising in the broadest sense, and commercial information. The creators of this information have incentives to make their information available free-of-charge. In the world of books, people paid for entertainment. With broadcast television, the network gives away the entertainment, so that the viewers will watch advertisements. Should we stop searching for models of digital libraries in which the reader pays, and focus on models where the creator or publisher pays?
Conventional wisdom states that new technology supplements the old, but does not replace it. This view implies that digital libraries will coexist with traditional materials forever. But is this assumption justified? Does it obstruct our willingness to be really creative in our research? In fact, history shows that new technology can drive out the old. Consider, for example, typewriters and gramophone records. Ten years after the introduction of superior technology, both were essentially dead. Where I work, the U.S. mail is following the same track to extinction. If the mail service suddenly ceased, we would hardly notice; it has been replaced by e-mail.
My colleague Amy Friedlander points out that, although the typewriter has become obsolete, the "QWERTY" keyboard survives. Gramophone records are dead, but the business of recorded music is essentially unchanged. Can we envisage a similar transmogrification of libraries, in which the old structures disappear, but some of the essence remains?
If so, what might disappear? Library buildings are expensive; they could disappear from university campuses. Scientific journals are in a period of rapid change. What evidence do we have that traditional journals are a cost-effective way to serve the research community in an electronic world? Over time, scientific journals could cease to be physical artifacts on paper, published by traditional organizations. With cuts in government expenditures and the development of government web sites, do we need the Government Printing Office? Should the Library of Congress celebrate the new millennium with an announcement that it will become purely digital?
In the computing industry, each generation of technology destroys the previous one. Minicomputers drove most mainframe companies out of business; personal computers destroyed the minicomputer firms; software companies and networking firms rise and fall.
Current changes to the information industry are equally big. Modern technology has greatly reduced the barriers to competition in many fields. This allows new organizations to emerge that will shock traditional organizations: self-publishing is a challenge to traditional publishing; Web search firms are a threat to abstracting and indexing services. Will the information industry follow the same pattern as computing, with large successful companies failing to react to change? Almost one hundred organizations are associated with the Digital Libraries Initiative. Ten years from now, how many will be out of business or reduced to a shell?
Photography provides an interesting parallel. To be a professional photographer requires a small capital investment for equipment, maybe a thousand dollars. This permits large numbers of freelance photographers. Printed information depends on large teams and expensive technology, which encourage large, hierarchical organizations. Internet information is more like photography.
Every organization in the information business is probably at risk. Organizations that do not focus on the needs of the real users - those who create information and those who use it - are particularly vulnerable. Not all will survive.
Digital library research makes many implicit assumptions about the future of the Internet. Some researchers seem very naive about the changes that are taking place; others preach doomsday. Personally, I am optimistic about the technical infrastructure. The technology is resilient and can absorb growth. The engineering process through the Internet Engineering Task Force (IETF) is under stress, but it is doing all right.
I am less sanguine about the forces of Mammon that have been unleashed by the rapid growth of the Internet. Fraud, junk mailings, invasion of privacy, terrorism, pornography, libel, advertising beer and tobacco to minors, sabotage -- these are serious issues. They are going to cause troubles over the next few years. Remember that the Internet is an international network. It has to merge the social norms and legal structures of every country in the world. We must expect dramatic changes in the network culture and be prepared to see regulations from around the world.
For many years systems developers have wrestled (successfully) with the task of building secure systems on an insecure network. Researchers are now facing the challenge of building applications with good performance on a network that has highly variable performance. Will the next research topic be how to build honorable applications on a dishonorable network?
I assume that text will remain a vital part of how we organize, store, transmit, and convey complex ideas. Over the past few years, we have seen a steady migration of text from paper to digital forms, and this can be expected to continue. But electronic text is subtly different from print. An electronic mail message differs from a letter or office memorandum. When writing an on-line article for D-Lib Magazine, I write differently than I would when writing the same material for paper. For example, I use shorter paragraphs and insert more subheadings.
Books are based on specific technology - paper, movable type, etc. Accumulated deep understanding of the technology has made the codex into a superb artifact. This understanding is manifest in the design and structure of printed books, the organization of ideas into chapters or articles, the use of front matter and indexes, and in the graphical design.
Much current research in digital libraries and electronic publishing attempts to mimic books and journals. Formats such as SGML and PDF were developed for this very purpose. Most electronic journals replicate conventions from the print world. These are important interim steps to smooth the transition to on-line information, but they remind me of Gutenberg's fonts, which were designed to mimic manuscript. Can we break away from the formats and assumptions of print and create something that is as well suited to the digital world as the book is to print?
Searching may be the most confusing of all the topics in digital libraries. Some of the confusion comes from a clash of personalities, between the traditions of cataloguing and indexing, and the optimism of researchers in natural language processing. However, the issues are more fundamental.
One important issue is that information is changing. Electronic information is more like software than traditional publication. A large software system consists of numerous files, specifications, and data sets, with intricate relationships among them; there are numerous versions, large numbers of small changes, and complex dependencies on external factors such as operating environments. Attempts to index and manage software by traditional library methods have been largely unsuccessful. As we index our digital libraries, should we be applying the methods of computer librarians to information, rather than the other way round?
Another issue is the cost of managing an ever increasing volume of material. Until recently, successful search systems involved human cataloging, indexing, or abstracting. These are expensive activities. Some crucial funding comes from the government, notably through the Library of Congress and the National Library of Medicine. What would happen if government cuts became really severe and the funding evaporated?
The Web search programs provide a search service with no human intervention. Some users love them; others despise them. Nobody denies their low cost. The programs are interesting technically, but the companies they have spawned are even more interesting. The companies have energy, expertise, rich investors, and are growing very fast. This past year has been a good one for raising capital, but the investors are looking for the companies to show profits. The companies have to be aggressive or they will fail. Some are going to be very big within ten years. What services will they develop? What existing services will they replace?
At the beginning of the Digital Libraries Initiative, a common question was what balance would emerge among libraries, publishers, and computing professionals. Perhaps this was the wrong question.
During the last few years, a new professional has emerged: the webmaster. At the first meeting of federal webmasters two years ago, over a thousand people showed up - a thousand web professionals where none had existed two years earlier. Webmasters are part librarian, part publisher, part designer, and part computing professional. Most are very energetic, very enthusiastic, very user-oriented - characteristics that are not always true of traditional librarians, publishers, and computing professionals. Are webmasters the professionals of the future?
Do we even need professionals? Users are publishing their own materials and managing their own libraries. Conventional wisdom states that this is a short-term phenomenon, that the future lies with professionals. Yet we all dial our own telephone calls and drive our own cars. Today, we all need professional help to set up our computers, connect to networks, and use libraries, but we do not need help to install a stereo or use a television. Will librarians, publishers, and computer centers go the way of switchboard operators and chauffeurs?
We all know the story of the tortoise and the hare. As researchers, our strength is that of the hare. We can race ahead to a new field, jump in, make a quick impression, and then move on.
The big corporations are the tortoises. When the shape of a new field is understood, these tortoises turn ideas into products quickly and effectively. The industries that have major interests in networked information -- computing, entertainment, publishing, and marketing -- have enormous resources. The biggest academic research program is tiny when compared with these industries. Our challenge in research is to make sure that we keep ahead of the tortoises. We must lead them and not drift into competition with them.
In preparing the research agenda for the next round of digital library research, the government is looking for hares. What should a hare do? Where are the open fields? Our task is to leap into these fields, find fruitful new approaches to digital libraries, and show the way to the tortoises.