Nancy A. Van House
Mark H. Butler
School of Information Management and Systems
University of California
Berkeley, CA 94720-4600
[vanhouse, markhb, lschiff]@info.sims.berkeley.edu
D-Lib Magazine, February 1996
This paper illustrates the application of user needs assessment and evaluation in iterative, user-centered design of a component of the University of California, Berkeley Digital Libraries project named Cypress. This report, while specific to Cypress and the UC Berkeley Digital Library project, is illustrative of the application of user-centered iterative design to digital libraries (DLs) generally. It demonstrates how a relatively straightforward, moderate level of effort to involve users, not just for feedback on an interface but for an understanding of how they do their work, resulted in a significant improvement in the design of a component of the UC Berkeley project. It also reports some of the principles that came out of our investigations and how they applied to this particular DL.
We deliberately call our part of the project "user needs assessment and evaluation," not simply "evaluation," to emphasize that this process takes place before, during, and after design, with the emphasis on before and during [Van House]. For user concerns to be incorporated into the DLs, the action is in design, not post hoc evaluation [Newell]. User-centered design, and iterative design, are advocated in the Human-Computer Interaction (HCI) literature but not as widely practiced as their proponents believe is necessary [Gould], [Norman], [Adler], [Greenbaum].
The UC Berkeley Digital Libraries project, funded under the NSF/NASA/ARPA Digital Libraries Initiative, focuses on publicly held environmental information, especially, at this point, information concerning water. The primary user is the Department of Water Resources, a branch of California's Resources Agency. Cypress is an image retrieval system that is linked to the larger UC Berkeley Digital Library.
Digital libraries can be described and evaluated on three key components: contents, functionality, and interface. Usability assessment, as the term is used in the HCI community, usually addresses only or primarily interface design [e.g. Nielsen, Usability Engineering], which is too narrow a basis for evaluating something as complex as a DL. Usability has been defined as "[a system's] capability in human functional terms to be used easily and effectively by the specified range of users, given specified training and support, to fulfill a specified range of tasks, within the specified range of environmental scenarios" (Shackel quoted in [Dillon], p.14).
Digital libraries support high-order cognitive work. Evaluation of a DL's effectiveness, therefore, has to be in terms of its impact on users' work. For DLs to be truly useful, designers need to first understand the larger context that determines users' information needs and purposes for using the DL, that is, the context of the users' work; the individual user's specific work and tasks; his or her information acts (including information searching, analyzing, repackaging); and, finally, his or her DL use.
To this end, designers have to be willing to engage in user-centered iterative design. This requires: (1) demonstrated successes in its use, to illustrate the ease and value of its application, (2) an array of methods for collecting, representing, and incorporating user needs, so that any project can adopt those that are most appropriate -- in terms of speed, level of effort, and issues addressed -- to the problem at hand, and (3) the building of partnerships between designers and usability assessors.
Cypress is an on-line database of about 13,000 color images and associated metadata from the Film Library of the California Department of Water Resources (DWR), a division of the California State Resources Agency. The Film Library commissions photography, and collects, maintains, catalogs, and provides access to images on a variety of subjects, primarily for DWR's employees but also for other agencies and the public. The Cypress images are a subset of the Film Library's collection of approximately half a million images, individually chosen based on the popularity of the subject matter and the quality of the image. Digitizing the slides for loading into Cypress is the first step of a long-term Film Library project to digitize all of its images and its image handling.
The precursor to Cypress was Chabot, developed as a master's thesis by Ginger Ogle in UC Berkeley's Computer Science Department and described in a paper by Ogle and Stonebraker [Ogle]. Cypress is available via the World Wide Web and is currently managed by Ogle. Other components of the UC Berkeley DL are linked to Cypress, but this discussion addresses Cypress alone. Cypress is under continual development, so the current implementation may include functionality that doesn't exist in the versions described in this paper.
Because Cypress is part of a research project and is not intended to be a shrink-wrapped product for users, a primary goal is to experiment with functionality by developing more sophisticated, efficient ways of doing image searches. Cypress combines the ability to search on both textual and, most notably, image attributes. One of the first image attributes made searchable was based on color analyses of each image. A more recent implementation adds the capability of searching for color "blobs"--clusters of color within an image that are likely to correspond to objects such as yellow flowers, a red car, or an orange fish. The combination of text and image attributes can be quite powerful: for example, a search on the text "Ronald Reagan" and the image attribute "mostly blue" retrieves images of then-Governor Reagan speaking on outdoor podiums with a background of blue sky.
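The combined text-and-color query described above can be illustrated with a minimal sketch. It is not the Cypress or Chabot implementation: the `ImageRecord` structure, the per-color pixel fractions, the sample records, and the 0.5 threshold for "mostly" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ImageRecord:
    """Hypothetical record: a caption plus precomputed color fractions."""
    image_id: int
    caption: str
    color_fractions: Dict[str, float]  # color name -> fraction of pixels


def is_mostly(record: ImageRecord, color: str, threshold: float = 0.5) -> bool:
    # "Mostly <color>" is assumed here to mean at least `threshold` of pixels.
    return record.color_fractions.get(color, 0.0) >= threshold


def search(records: List[ImageRecord], text: str, mostly_color: str) -> List[ImageRecord]:
    # Keep records that satisfy BOTH the text and the image attribute.
    needle = text.lower()
    return [
        r for r in records
        if needle in r.caption.lower() and is_mostly(r, mostly_color)
    ]


# Invented sample data, for illustration only.
records = [
    ImageRecord(1, "Governor Ronald Reagan speaking at an outdoor podium",
                {"blue": 0.62, "gray": 0.20}),
    ImageRecord(2, "Ronald Reagan at an indoor press conference",
                {"brown": 0.55, "blue": 0.05}),
    ImageRecord(3, "Aerial view of a reservoir", {"blue": 0.80}),
]

hits = search(records, "Ronald Reagan", "blue")
print([r.image_id for r in hits])
```

Neither attribute alone is selective in this sample: the text matches two records and the color matches two, but only one record satisfies both.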
We chose Cypress to explore user-based, iterative design and its benefits for several reasons. First, the image collection is very useful both on its own and as an adjunct to the other components of the UC Berkeley DL. Second, the collection is currently heavily used by a number of people within and outside of the DWR. Third, the DWR Film Library staff and management were eager to work with us; user-centered design is highly dependent on the willingness of users to engage in the process. And, finally, the primary designer and implementer of Cypress, Ginger Ogle, was extremely amenable to working with the user needs assessment and evaluation team.
The Film Library serves users outside of DWR, in other California state agencies and other organizations and individuals. Because these are not the primary clients of the Film Library, and because of the heterogeneity of their needs, uses of the images, and other characteristics relevant to design, we are not addressing at this time users outside of DWR.
For the purposes of this project, we define user requirements as the functionality needed by users to get their work done. Initially, we thought of the user requirements for Cypress as simply the retrieval of images with specific textual and/or graphical attributes. However, our work with various stakeholders has shown us that these requirements can and must be defined more specifically, and that they vary from group to group. For example, report writers may need to retrieve images of a particular dam with a human being in the scene to indicate scale. Graphic designers may want images of a given location but also in a specific range of colors to match the color scheme of the artifact they are creating. People preparing publications may be concerned with how well a particular image can be cropped to fit the available space while meeting content and aesthetic requirements.
Over the summer and fall of 1995, we interviewed various groups of users and observed them going about their regular work and using Cypress. Then we made re-design suggestions for Cypress. Some of the suggestions came directly from users; others came from our observations of users' difficulties.
Table 1 details the changes made to Cypress as a result of our observation of and interviews with users. This section describes at a higher level of generality our findings applicable to other DLs as well as the UC Berkeley DL. Many of these, not surprisingly, appear elsewhere in the literature. If they did not, this would raise serious issues of the generalizability and cumulativeness of DL research. The principles that appear in the literature, however, are easily reduced to platitudes. The emphasis here is on how these findings apply to a specific Digital Library and how they reinforce the need for design to be user-centered and iterative.
A. Leverage users' existing knowledge
The Film Library staff's extensive knowledge of the photo database was used to make their searching more efficient, such as allowing searching on all existing metadata fields and using existing field labels and terminology. Other users, however, did not share the Film Library staff's knowledge. The issue, then, is how to use existing knowledge when the users and their knowledge are heterogeneous.
Our solution was to provide two query forms and results displays: one for "insiders," such as the Film Library staff, consistent with the Film Library's existing system and providing minimal guidance; and one for "outsiders."
B. Provide enough information to guide users, not enough to confuse them.
Nielsen [Nielsen94] describes this as "aesthetic and minimalist design." This requires, of course, an understanding of what users know and find useful or confusing.
1. Simplify the query form
Outsiders were confused by the wealth of information and choices that insiders found useful. We created a simpler query form for outsiders by removing fields not useful to them, such as photographer name. People outside the film library also needed better labels for fields -- e.g. the original set of fields included one called "subject" and one called "category." Occasional users were confused about how to use the form to submit a query, so we added some instructions.
2. Display non-redundant information only.
Corollary: display the metadata that provides the most added information across retrieved items, and/or across elements for a single record.
Rather than display complete metadata for each image, for the outsiders we selected the most useful fields based on our interviews. What is useful depends on what users know and their purposes for using the DL. "Useful" turned out to be those elements that helped users to select among the images displayed. For example, instead of displaying the values for the fields searched on, which were the same for each retrieved image, we displayed the most distinctive field, which was the caption written for each image by the photographer. Metadata elements were often correlated; for example, subject and description are often very close, so only the description field was displayed.
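One way to make "non-redundant" concrete is to count how many distinct values each field takes across the retrieved set and to drop fields that were searched on or that never vary. The sketch below assumes that heuristic; in our project the fields were chosen from interviews, not computed, and the function name and sample records are hypothetical.

```python
def distinguishing_fields(results, searched_fields=()):
    """Rank metadata fields by how well they tell retrieved items apart.

    Fields that were searched on, or whose value is identical for every
    retrieved item, add no information to the display and are dropped.
    """
    # Collect field names in first-seen order.
    names = []
    for record in results:
        for name in record:
            if name not in names:
                names.append(name)

    distinct_counts = {}
    for name in names:
        if name in searched_fields:
            continue
        values = {record.get(name) for record in results}
        if len(values) > 1:
            distinct_counts[name] = len(values)

    # Most varied field first; sorted() is stable, so ties keep first-seen order.
    return sorted(distinct_counts, key=lambda n: -distinct_counts[n])


# Invented sample results for a search on category = "SWP".
results = [
    {"category": "SWP", "location": "Oroville", "caption": "Spillway at sunrise"},
    {"category": "SWP", "location": "Oroville", "caption": "Crew inspecting gates"},
    {"category": "SWP", "location": "Delta", "caption": "Pumping plant from levee"},
]

ranked = distinguishing_fields(results, searched_fields=("category",))
print(ranked)
```

In this sample the caption differs for every image and so ranks first, which is consistent with what our users told us they found most helpful.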
C. Match the system to the real world [Nielsen94]
Users of photographic images are used to viewing contact sheets to examine a set of related photos. When they saw the display of retrieved images, some users asked that the set be printable like a contact sheet; this is currently not possible but may become possible via Java "applets." Simulating an existing artifact can improve the integration of the DL with current work practices.
D. Rely on recognition, not recall [Nielsen94]
The list of categories, a controlled vocabulary, was presented as a pull-down from which users could choose. DWR employees share a common vocabulary (e.g. SWP for State Water Project) which was used to categorize photos.
Eliminate as much keying as possible.
When users share a vocabulary, shortcuts are possible. Employees of DWR would rather type "SWP," a familiar acronym, than "State Water Project." In addition, we provided a button for a major dichotomous field (aerial vs. ground photos).
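A shared vocabulary can be supported with a simple lookup that expands familiar acronyms before the query is run. This is a minimal sketch, not the Cypress code; the `ACRONYMS` table and the `expand_query` function are hypothetical names, and only the two acronyms mentioned in this paper are included.

```python
# Hypothetical acronym table; a real one would come from the users' vocabulary.
ACRONYMS = {
    "SWP": "State Water Project",
    "DWR": "Department of Water Resources",
}


def expand_query(query, acronyms=ACRONYMS):
    """Replace each known acronym (case-insensitive) with its full form."""
    return " ".join(acronyms.get(token.upper(), token) for token in query.split())


expanded = expand_query("swp aqueduct")
print(expanded)
```

Tokens that are not in the table pass through unchanged, so users who type the full phrase are unaffected.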
Always give users the ability to quickly review results to determine whether the search is on track, and to modify the search or bail out entirely. When transmitting images over a network, this is critical. The early version of Cypress downloaded the entire set of retrieved images, with no opportunity for the user to determine that the search had not yielded the needed results and to cancel it. For us, this meant segmenting the results set and letting users quickly see that "Ronald Reagan and mostly red" did not retrieve pictures of Mr. and Mrs. Reagan (despite the latter's well-known penchant for red dresses).
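Segmenting a results set can be sketched as a generator that hands back one page of results at a time, so that nothing beyond the first page is transmitted until the user asks for it. The function name, the page size, and the sample identifiers are assumptions for illustration.

```python
def segment(results, size):
    """Yield successive pages of at most `size` results."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(results), size):
        yield results[start:start + size]


image_ids = list(range(1, 11))   # ten retrieved images (invented)
pages = segment(image_ids, 4)

first_page = next(pages)         # only this page is needed to judge the search
print(first_page)

remaining = list(pages)          # fetched only if the user chooses to continue
print(remaining)
```

Because the generator is lazy, a user who sees from the first page that the search is off track can abandon it without the remaining images ever being requested.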
Specifically, a caption should be attached to the photo, not displayed elsewhere on the screen. In our first display, users had to scroll up and down to see the photo and then to see its associated metadata. More generally, the metadata should be displayed adjoining the object retrieved.
Film Library staff use 17-inch monitors, which are not common, so the "insiders'" display was four images across while the "outsiders'" was three across.
For Web users, telecommunications capacity must be taken into account; downloading the results in segmented subsets takes less time than downloading larger results sets. Many of our users have limited telecommunication bandwidth; the DL must accommodate constraints such as this or risk not being used.
Most of the principles above are, at first glance, platitudes. However, we derived them from actual problems that users were having. Furthermore, implementing them in an actual system requires that designers understand who the users are, how they do their work, the purposes for which they will use the DL, their needs and prior knowledge, and their response to the prototype: in other words, user-based, iterative design.
Our use of an iterative, user-centered approach resulted in great improvements to the design of Cypress. This approach raises important questions. At what level of specificity do you stop when creating interfaces for different groups of users? Is it sufficient to create three interfaces: one for those in the film library, one for those in DWR, and one for those outside of DWR? What about the different groups outside DWR?
The tension, of course, is between economy and customization of design. We addressed this through the rather common strategy of two interfaces: expert and novice, or insider and outsider, a crude but effective means of customizing the system for different audiences.
This kind of information could only have been acquired and incorporated into the design of Cypress by an iterative design process that allowed us to test successive approximations of the "final" system, and by means of user-centered design that allowed us to understand users, their work, and their needs and preferences for Cypress and fine-tune it accordingly.
Dillon, Andrew. (1994) Designing Usable Electronic Text: Ergonomic Aspects of Human Information Usage. Bristol, PA: Taylor and Francis, Inc.
Gould, John D. and Clayton Lewis. (1983) "Designing for usability -- key principles and what designers think" Human Factors in Computing Systems, CHI '83 Proceedings. New York: ACM.
Greenbaum, Joan, and Morten Kyng. (1991) Introduction: Situated Design. In Greenbaum and Kyng, eds. Design at Work: Cooperative Design of Computer Systems. Hillsdale, NJ: Lawrence Erlbaum.
Newell, Alan, and Stuart K. Card (1985). "The Prospects for Psychological Science in Human-Computer Interaction." Human Computer Interaction 1():209-242.
Nielsen, Jakob. (1993) "Iterative User-Interface Design" IEEE Computer, 26(11):32--41.
Nielsen, Jakob. (1994) "Heuristic Evaluation," Chapter 2 in Nielsen, Jakob, and Robert L. Mack, eds. Usability Inspection Methods. New York: Wiley. pp. 25--62.
Norman, Donald A. and Stephen W. Draper. (1986) User Centered System Design. Hillsdale, NJ: Lawrence Erlbaum Associates. pp. 31--61.
Ogle, Virginia E. and Michael Stonebraker. (1995) "Chabot: Retrieval from a Relational Database of Images," IEEE Computer, 28(9):40--48.
Van House, Nancy. (1995) User Needs Assessment and Evaluation for the UC Berkeley Electronic Environmental Library Project: a Preliminary Report. Digital Libraries '95: The Second International Conference on the Theory and Practice of Digital Libraries, June 11-13, 1995, Austin, TX.
Available at: http://csdl.tamu.edu/DL95/papers/vanhouse/vanhouse.html
|FEATURE||INITIAL STATE||1st Iteration||2nd Iteration|
-a subset of metadata fields
|all metadata fields||different, more appropriate subset of fields||same as 1st||relabelled, more self-explanatory|
|Other||NA||NA||NA||Button added for aerial/ground||Button added for aerial/ground|
|Images||Single column, lots of scrolling||3 across||3 across||4 across - for 17" monitors||3 across|
|Text||Complete text; below set of images||subject and category fields with each image||subject and category fields with each image||image with caption; complete text below||image with caption and # for ordering reproductions|