Sally Jo Cunningham
Real-life digital library systems require evaluation for many reasons, such as to support funding and sustainability, and to give digital librarians and digital library management feedback on ongoing library operations. Available academic evaluation techniques may be high-effort and seek higher levels of confidence and deeper claims than are needed or appropriate. Lightweight evaluation methods can therefore play an important role in digital library research. This workshop brought together a range of participants who were interested in effective evaluation techniques that can be applied with minimal expertise, specialist apparatus or financial cost.
There is an increasing emphasis by digital library funding agencies on effective evaluation. Successful digital library evaluation underpins both incremental improvements in services and long-term sustainability and viability (Reeves et al., 2004). Reviews of the digital library evaluation literature (Saracevic, 2004) suggest, however, that while there is a variety of ways to evaluate digital libraries, there is no substantial agreement on which approach works best. Digital libraries of different sizes, levels of development and maturity may require different evaluation models tailored to a variety of local conditions (Khoo et al., 2009).
Lightweight evaluation techniques are of relevance to both practitioner and academic digital librarians. In the case of digital library practitioners, while they may know how to run their digital library, they are often not evaluation specialists; as a result, evaluation is often a consideration left until the end of a project, or to external evaluators. As a consequence, these evaluation efforts often focus on summative evaluation, and ignore useful ongoing formative evaluation work. Designing any kind of digital library evaluation can also be resource intensive, and significant evaluation barriers for many digital libraries include the lack of resources such as funding, staff, and time (Bartolo et al., 2006). Library administrators may also be cautious with regard to summative outcomes-based evaluation, which can be seen as a form of surveillance (Frechtling, 2002).
In the case of digital library researchers, the focus of research is often on technical issues (e.g., information retrieval methods or software architecture) rather than on user-centered issues. When these researchers turn to user-based evaluations, they therefore often lack the necessary expertise to develop robust Human-Computer Interaction (HCI) experiments, and their goals are typically limited to "proof of concept" tests, rather than prescribing user motivations or cognitive impacts. Even for HCI-centered researchers, high-effort methods are often inappropriate when problems are not yet well understood, and an initial study is required to inform the goals of later, more detailed studies.
The development of lightweight evaluation know-how therefore addresses a number of interrelated issues in educational technology and digital library development for both digital librarians and researchers. There is substantial need for lightweight frameworks and techniques that digital librarians can use to evaluate digital libraries in ways that address their specific needs within local resource constraints. These frameworks and techniques should be capable of generating local and appropriate evaluation outcomes that are also relevant for agencies, funders, management, and other stakeholders.
To address these and related issues, this JCDL workshop brought together scholars and practitioners from a range of digital library perspectives, with an interest in user-centered evaluation approaches. Invited participants were asked to present and discuss issues associated with lightweight user-centered evaluation knowledge for digital librarians. By specifying "lightweight" evaluation, the organizers intended contributors to focus on methods that are both effective and less time- and effort-intensive. In suggesting "user-centered" evaluation, the organizers assumed that current evaluation models may not take digital librarians' needs into account, and that substantive evaluation results can be obtained from tools and methods that support digital librarians' individual concerns. By identifying evaluation "knowledge," the organizers intended to encourage discussion of practical knowledge that can be applied to evaluation questions on an ongoing basis. Finally, by mentioning "digital librarians," the organizers sought to include anyone involved with digital library, collection, or repository management and administration, regardless of their professional training or background. The workshop call for participation outlined these themes, and requested the submission of short position papers that explored various dimensions of lightweight evaluation techniques. The discussion of suitable case studies was encouraged. These position papers are being made available in this issue of D-Lib Magazine, along with this workshop report.
The workshop consisted of a series of panels and presentations that brought together academic and practitioner approaches to lightweight digital library evaluation. In addition, an exercise based on the use of personas and scenarios that modeled the design of a lightweight evaluation strategy was held. Some of the main issues raised by the presenters are summarized here; for further details, please see the individual papers archived below.
Khoo introduced the workshop with a review of a survey of evaluation practices in the National Science Digital Library in 2006 (Bartolo et al., 2006), noting that evaluation was an activity that projects often commenced with good intentions but left unfinished, usually as a result of limited funds, staff members, time, and evaluation skills.
Buchanan then provided an introduction to the concept of lightweight evaluation in human-computer interaction (HCI), based on the significant reduction of experimental complexity while at the same time retaining a high degree of validity. This can be achieved by reducing many of the scope issues involved in HCI experiments, by reducing the number of test subjects, by focusing on particular groups of users and on important/prioritized tasks, and also by using modified instruments for data collection and analysis that support "on-the-fly" coding during the experiment itself.
Wilson and schraefel (see Note 1) introduced Sii, a lightweight inspection technique for search interfaces. Building on HCI inspection techniques such as heuristic evaluations and cognitive walkthroughs, the authors adapted Belkin et al.'s model of information seeking tactics, which identifies 16 generic forms of information seeking behavior, to identify the number of clicks necessary to achieve each behavior in a particular interface. The technique permits the rapid expert evaluation of the usability of search interfaces for various "model" users, without the need for prolonged user-testing.
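The click-counting idea behind this kind of inspection can be illustrated with a minimal sketch. It is not the Sii tool itself: the tactic labels, interface names, and click counts below are invented for illustration and are not taken from Belkin et al.'s model or from the position paper. The sketch assumes an expert has recorded, for each tactic, how many clicks an interface needs (or that the tactic is not supported), and then compares interfaces on that basis.

```python
from statistics import mean

# Hypothetical tactic labels and click counts, invented for illustration only;
# they are neither the tactics in Belkin et al.'s model nor data from the Sii paper.
# None means the expert found no way to carry out the tactic in that interface.
CLICKS = {
    "interface_a": {"scan_by_keyword": 2, "browse_by_category": 4, "compare_items": None},
    "interface_b": {"scan_by_keyword": 3, "browse_by_category": 1, "compare_items": 5},
}


def summarize(clicks_by_tactic):
    """Return (mean clicks over supported tactics, sorted list of unsupported tactics)."""
    supported = [c for c in clicks_by_tactic.values() if c is not None]
    unsupported = sorted(t for t, c in clicks_by_tactic.items() if c is None)
    average = mean(supported) if supported else None
    return average, unsupported


for name, clicks in CLICKS.items():
    average, unsupported = summarize(clicks)
    print(f"{name}: mean clicks = {average}, unsupported tactics = {unsupported}")
```

Weighting the tactics differently for each "model" user would be a natural extension, but the weights would be a further assumption of the evaluator.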
Xu outlined the application of a requirements framework for information systems, the Zachman framework for information systems architecture. The Zachman framework supports the development of various profiles for different stakeholders involved in system development, and can be adapted to profile users. The different dimensions of these profiles can then be used to design questionnaires and surveys.
Morrill and McMartin provided an outline of the use of user panels in digital library evaluation. While more complex to organize than some other evaluation techniques, user panels permit the longitudinal evaluation of digital libraries and the identification of use issues that may be missed in "one-shot" evaluations. The authors provide initial lessons learned from the use of user panels with two digital libraries.
Morgan and Wolf traced the iterative development of a survey instrument for digital libraries. Noting that surveys can be aimed at different groups of digital library users, including non-users, and that these different groups can have different understandings of what digital libraries are and can do, they describe how one survey was iteratively developed to make it relevant and useful to various groups of digital library users.
Cunningham described a number of lightweight techniques and approaches to collecting and analyzing large-scale evaluation data. Cunningham's examples included carrying out a series of small, targeted studies of different aspects of the evaluation problem of interest, and triangulating the results; locating and analyzing existing relevant user data on the Web, such as that provided by bloggers and online communities who discuss the use of digital collections; and recruiting 'proxy' researchers, such as students in HCI classes, to gather ethnographic data in a range of settings.
Finally, Morrill described the benefits of target user profiles and user segmentation as part of the digital library evaluator's toolkit. Target user profiles include descriptions of users, and as such they can help to support and focus both formative and summative evaluation efforts by providing a "stake in the ground" for both evaluation and ongoing library development. Morrill provides examples of the use of target users from a corporation, a physical library, and a digital library.
These academic and practitioner perspectives provoked some lively discussions amongst workshop participants, often prompted by the practical questions and concerns raised by attendees interested in digital library management and evaluation. One discussion (subsequently carried on by email) concerned the possibilities for and barriers to sharing digital library evaluation data and case studies, perhaps in the form of a "Digital Library of Digital Library Evaluation" (as Morgan and Wolf put it). Such an idea is increasingly relevant, not least because digital library administrators and evaluators often feel that they are "re-inventing the wheel" when they approach the question of evaluation. Given that many projects may want to treat internal evaluation data as confidential, the mechanisms by which such evaluation data could be identified and shared need to be approached with care. An emergent issue from this discussion was the realization that different evaluators may understand and interpret evaluation data and needs in different ways, and thus might have different requirements from the same resources. This suggests, in turn, that any repository of digital library evaluation materials should describe its contents carefully along multiple axes in order to make it useful to the widest possible range of users. As a corollary, and in the spirit of this workshop, the repository metadata should also make these multiple axes understandable to non-experts and those interested in lightweight evaluation.
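As a rough illustration of what describing contents "along multiple axes" could mean in practice, the sketch below models a repository record with a few plain-language axes and a simple filter. The field names, axis values, and sample records are all hypothetical; the workshop did not propose a specific schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationRecord:
    """One shared evaluation resource, described along several hypothetical axes."""
    title: str
    method: str      # e.g. "survey", "user panel", "expert inspection"
    audience: str    # who the evaluation studied
    effort: str      # "low", "medium" or "high", in plain language for non-experts
    shareable: bool  # False when the project treats the data as confidential


# Invented sample records, for illustration only.
RECORDS = [
    EvaluationRecord("Survey instrument, version 3", "survey", "faculty", "low", True),
    EvaluationRecord("Two-year user panel notes", "user panel", "students", "high", False),
    EvaluationRecord("Search interface inspection", "expert inspection", "model users", "low", True),
]


def find(records, **axes):
    """Return shareable records whose fields match every requested axis value."""
    return [
        r for r in records
        if r.shareable and all(getattr(r, key) == value for key, value in axes.items())
    ]


for record in find(RECORDS, effort="low"):
    print(record.title, "-", record.method)
```

Filtering on the `shareable` flag first reflects the confidentiality concern raised in the discussion: confidential material never appears in search results, whichever axes a user searches on.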
While digital libraries have become established both as technologies and organizations, digital library evaluation techniques and practices still have some way to go in order to achieve the same level of maturity. Differences in evaluation models and approaches, combined with a number of resource barriers, indicate the need for the further refinement of digital library tools and techniques. The workshop described in this report brought together both academic researchers and practitioners in digital library evaluation, and initiated a discussion that will support the continued development of lightweight user-friendly evaluation knowledge for digital librarians.
Workshop Position Papers
Buchanan, George. "Towards Lightweight Digital Library Evaluations."
Morgan, Glenda, and Alan Wolf. "Issues in Refining Digital Library User Surveys for General Versus Specialized Audiences."
Morrill, Joshua. "If You Build It They Will Come...Maybe..."
Morrill, Joshua, and Flora McMartin. "Evaluating Digital Libraries With User Panels."
Wilson, Max, and m. c. schraefel. "Sii: the lightweight analytical search interface inspector."
Xu, Amanda. "Online Surveys for Collecting, Analyzing, Tracking and Evaluating User Responses on FocusOn Search and CategoryMap."
References
Bartolo, L., Diekema, A., Khoo, M., & McMartin, F. (2006). Evaluation practices in NSDL.
Frechtling, J. (2002). The 2002 user-friendly handbook for project evaluation. Arlington, VA: The National Science Foundation, Directorate for Education and Human Resources, Division of Research, Evaluation, and Communication. Retrieved June 1, 2009, from <http://www.nsf.gov/pubs/2002/nsf02057/start.htm>.
Khoo, M., Zia, L., & MacArthur, D. (2009). "Evaluating Impact: An Agency Perspective." In Papatheodorou, C., & G. Tsakonas (Eds.), Evaluating Digital Libraries, Oxford, U.K.: Chandos Publishing House. In Press.
Reeves, T., Apedoe, X., & Young, H. W. (2004). Evaluating Digital Libraries: A User-Friendly Guide. Boulder, CO: University Corporation for Atmospheric Research. Retrieved June 1, 2009, from <http://dlist.sir.arizona.edu/398/>.
Saracevic, T. (2004). "Evaluation of digital libraries: An overview." Presentation at the DELOS WP7 Workshop on the Evaluation of Digital Libraries, 4-5 October 2004, Department of Information Engineering, University of Padua, Italy. Retrieved June 1, 2009, from <http://www.scils.rutgers.edu/~tefko/DL_evaluation_Delos.pdf>.
Note 1: The lower-case spelling of schraefel is deliberate.
Copyright © 2009 Michael Khoo, George Buchanan, and Sally Jo Cunningham