Taking the Measure of the Net
This month, Clips and Pointers identifies several sites that attempt to estimate user-ship of the Internet. I am well aware that "user-ship" is not a word, but have used it nonetheless -- not because I like making up words on the fly (I don't), but because I would like to focus our attention on the environment of using the network as much as on the difficulties of counting how many of you there are.
But first, a word about demographics and metrics; they matter a lot. In the analog worlds of commercial print publishing and broadcast media, business models rely heavily on estimates of circulation (readership) and viewership -- the well-known Nielsen ratings. Through various means, these metrics attempt to answer two questions: how many viewers or readers are there? And who are they? Clearly, similar demographic issues underlie potential commercial applications of the Internet and its successors. Not surprisingly, much of the demographic work being done is, in fact, at least partially supported by business, either via a business school, e.g., the Hermes Project at the University of Michigan, or by business itself, e.g., Project 2000 at Vanderbilt University and the recent demographic study conducted jointly by CommerceNet and Nielsen Media.
D-Lib is an experimental magazine, and we are quite interested in demographic and user information. For example, we use it to assist in story selection: what stories do people read? What stories seem to have long legs, that is, stories we continue to see accessed month after month after month? Where should we strengthen coverage? Where should we cut back? When does usage concentrate? Not surprisingly, it concentrates during east coast work hours, roughly 8:00 am to 6:00 pm, and many of you seem to be associated with academic institutions. These findings, by the way, are generally consistent with other characterizations of Internet usage.
Time of day, pattern of file requests, and domain/sub-domain name are, of course, all characteristics of use that we can identify by looking at logs, that is, by looking at information internal to the operation of the magazine. This approach is passive from the user's perspective; we do not have to ask any additional questions. There are clear limitations, however. Domain names are not equivalent to users, for example, nor are total file requests. D-Lib's contents page comprises at least five files, assuming that readers have not suppressed images, so total requests per day provides a flattering albeit inaccurate view of our popularity. Obviously, we look elsewhere in the structure for a sense of how, and how widely, we are read.
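The kind of passive log analysis described above can be sketched in a few lines of code. The following is a minimal illustration, not our actual tooling: it assumes server entries in the Common Log Format, the sample log lines are hypothetical, and the top-level-domain extraction is deliberately crude (it fails for raw IP addresses). Note how it counts only pages, not inline images, to avoid the inflation problem just described.

```python
import re
from collections import Counter

# Hypothetical Common Log Format pattern: host, timestamp, request, status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<day>\d+)/(?P<month>\w+)/(?P<year>\d+)'
    r':(?P<hour>\d+):\d+:\d+ [^\]]+\] "GET (?P<path>\S+) [^"]*" \d+ \d+'
)

def summarize(log_lines):
    """Tally requests by hour of day and by top-level domain, counting only
    HTML pages so that inline image files do not inflate the totals."""
    by_hour = Counter()
    by_domain = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # ignore malformed entries and non-GET requests
        # Count pages, not hits: skip images and other supporting files.
        if not m.group('path').endswith(('.html', '/')):
            continue
        by_hour[int(m.group('hour'))] += 1
        # Crude top-level-domain extraction (e.g. "edu", "com").
        by_domain[m.group('host').rsplit('.', 1)[-1]] += 1
    return by_hour, by_domain

# Hypothetical sample entries: one page view plus its inline image, then
# a second page view from a different domain.
sample = [
    'alpha.cs.umich.edu - - [15/Jul/1996:09:12:01 -0400] '
    '"GET /contents.html HTTP/1.0" 200 4820',
    'alpha.cs.umich.edu - - [15/Jul/1996:09:12:02 -0400] '
    '"GET /images/logo.gif HTTP/1.0" 200 1204',
    'gw.example.com - - [15/Jul/1996:14:30:11 -0400] '
    '"GET / HTTP/1.0" 200 3310',
]
hours, domains = summarize(sample)
print(hours)    # two page views: one at 9 am, one at 2 pm; the GIF is excluded
print(domains)  # one request each from .edu and .com
```

Even this toy version exhibits the limitations noted above: the domain tally says nothing about how many distinct readers sit behind "edu", and a proxy or gateway host can conceal an entire campus.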
That said, the other approach is to survey actively, that is, to solicit information as many magazines do through registration systems or through participation in on-line surveys, or to undertake a survey in another medium, such as voice or mail. On-line surveys were pioneered by the Graphics, Visualization, and Usability (GVU) Center at Georgia Institute of Technology in collaboration with the Hermes Project at Michigan. An example of a study that combined telephone surveys with an Internet-based approach is the recent study completed by CommerceNet and Nielsen. Two other projects, HomeNet (reported in this magazine in October 1995) and the Blacksburg (Virginia) Electronic Village, are also examples of important research projects designed to look at the implications of ubiquitous, on-line communications, but they address questions more general than who is reading us and how many readers there are. The on-line survey approach, implemented by Blacksburg as well as by GVU/Hermes, potentially suffers from a self-selection bias, which researchers seek to offset by encouraging widespread participation. Yet one of the findings of the CommerceNet/Nielsen study is that this self-selection bias has, indeed, had a distorting effect.
The "inside/outside" survey problem is not dissimilar to the problems that librarians face when trying to evaluate patterns of usage, which is essential to collection development as well as to facilities maintenance. Circulation statistics, whether gathered at the desk or at re-shelving stations, tell you what books, bound journals, and microfiche have been selected -- not read -- and surveys of users encounter the usual sorts of negative responses, which range from tossing the questionnaire in the trash to hanging up on the telemarketer. Thus, there is also a self-selection bias to these approaches, although it is argued that the bias is smaller and more easily offset through controlled survey design up front.
Passive tracking purely through the communications technology raises issues relating to users' privacy. Nor am I happy with self-selection biases or with resorting to survey techniques that may irritate users. (My own first response to the little cards left in hotel rooms asking me if I enjoyed my stay is, "I did until I saw this card.") But what is most interesting to me is how little progress we have made. Many of the evaluation and measurement strategies have simply migrated from the analog to the digital world, taking along their problematic baggage. Perhaps this is telling us that people are people, and advances in technology don't change that. But too much is at stake to simply propagate old limitations, and research into the underlying human/computer interactions, with an eye to devising new means of statistical measurement and inference, would seem to be a valuable and necessary component of the digital libraries agenda.