Volume 20, Number 7/8
What Do Researchers Need?
Feedback On Use of Online Primary Source Materials
Jody L. DeRidder and Kathryn G. Matheny
University of Alabama Libraries
Cultural heritage institutions are increasingly providing online access to primary source materials for researchers. While the intent is to enable round-the-clock access from any location, few studies have examined the extent to which current web delivery is meeting the needs of users. Careful use of limited resources requires intelligent assessment of researcher needs in comparison to the actual online presentation, including access, retrieval and usage options. In the hopes of impacting future delivery methods and access development, this article describes the results of a qualitative study of 11 humanities faculty researchers at the University of Alabama, who describe and rate the importance of various issues encountered when using 29 participant-selected online databases.
Increasingly, humanities researchers are dependent upon web access to primary source materials during at least the preliminary portion of their work, and at times for their entire research project. Development of new databases and new online content is exploding, as research libraries, museums and others seek to provide access to their unique special collections materials online. With the voluminous increase in content comes increased difficulty in locating specific items of interest. Yet the basic functionality of digital library interfaces and the expectations of what users need have not changed much in the past twenty years.
Previous studies1 have clarified that user needs vary with the target population. As Borgman points out, "Experts have a variety of strategies and tactics to overcome poor design in digital libraries. Novices do not."2 If experienced researchers cannot effectively find and access the content they need, then it is likely that other, less experienced target audiences have even more difficulty. With this understanding, we developed a qualitative study of the needs of expert researchers utilizing online primary source materials. The desired outcomes of this study were to inform digital library software design, to obtain clarification of the most useful metadata fields for inclusion, and to identify gaps in services provided to the researcher which would suggest opportunities for improved support.
Despite expanded access to primary sources provided by online interfaces, historians have struggled with how best to employ them. Boonstra, Breure and Doorn summarized the 1994 findings of Denley and Everett, saying that "user unfriendliness, a steep learning curve, the feeling of a black box deploying a technology far away from generally accepted standards... have kept many potential users away."3 Ten years later, in a review of over eighty digital library evaluations, Saracevic found a serious gap between what users assume about digital libraries, and what digital library models assume about users, concluding, "Users are from Venus and digital libraries are from Mars."4 In the same year, Boonstra, Breure and Doorn found that the methodical and technical skills for computerized research were still fairly limited among humanities scholars.5 The problem extended beyond experienced scholars; a study of 111 students published the following year found that "in all instances digital library users' current views of their actual experiences are lower than their desired levels of usability."6
"Usability" itself is not a simple concept to measure. Different researchers have employed multiple variations of the components of usability, but they usually include some combination of the following: efficiency, effectiveness, navigation, appearance, terminology, and learnability.7 Thong, et al., proposed that usability (or the usefulness and ease of use) of a digital library is affected by the interface characteristics (terminology, screen design and navigation), the organizational context (relevance, visibility), and the individual differences of the user (computer self-efficacy, computer experience, and domain knowledge).8 Banati, et al., found that the usual criteria for web usability are often lacking an emotional component.9 They conclude, "Since usability is primarily a user oriented concept... we include criteria pertaining to concepts which can gauge the user's experience with the website," which include work satisfaction, emotional satisfaction, and trustworthiness.10 Ferreira and Pithan found that observing users' feelings and thoughts adds new perspectives to the analysis of usability.11 Borgman agrees, stating:
'User friendly' design addressed screen displays and functional capabilities but did not delve deeply into task motivation, much less into the relationship between a computer user and the work, educational, or leisure context from which the task arose. People were expected to adapt to systems, and considerable effort was devoted to user training. Today people have higher expectations of information systems.12
Borgman's view of usability criteria stems from the field of human-computer interaction (HCI): "systems should be easy to learn, tolerant of errors, flexible, adaptable, and appropriate and effective for the task."13 Measuring these variables requires more feedback from users, an approach embraced by "User-Centered Design."14
Lack's application of User-Centered Design demonstrates the benefits of understanding users' needs, behaviors, goals, and current environment when designing and modifying delivery systems.15 For example, Kachaluba, Brady and Critten found that, for some, poor image quality and missing images were major barriers to use.16 A study of a digital library interface commonly used for digitized archival material, CONTENTdm, confirmed that "useful and desirable" collections can be hard to find in a "confusing" interface, even among "those who have considerable experience using the Internet."17 While this is useful information, especially given CONTENTdm's widespread use, it is not generalizable to multiple interfaces. This kind of local or limited study is all too common.18 Research into scholars' needs must expand much further.
User needs extend beyond the usability of the interfaces or databases to include locating appropriate databases and collecting, storing, and collating material and information. Zhang spoke to the need for a more holistic approach to evaluating digital libraries, one that goes beyond the traditional library evaluation models: "...few address the effects of a DL at higher levels, including the extent to which a DL fits into or improves people's daily work/life."19 Such studies are rare.
Studies of researcher behavior in the context of digital libraries are even rarer. In a 2012 report20 on historian research practices, Rutner and Schonfeld described the immense challenges faced in locating primary source materials, and in managing their gathered research notes and files, noting that "Digital systems do not appear to address all the needs of even those scholars who seek to use them."21 Conway's qualitative study of expert researchers unearthed fascinating discoveries, such as their distaste for interpretation of any kind, and their extensive development of personal information management methods.22 When Audenaert and Furuta23 observed the extensive note-taking by scholars using source materials, they contended that developing and supporting the environment needed by scholars in the analysis of content is as important as search and retrieval.24 Chassanoff also recognized these issues in her 2013 study, stating that "Research tools need to be flexible enough so it ultimately does not matter whether historians access materials online or in person... it will become essential that they can seamlessly integrate tools for organizing, annotating, and analyzing primary source materials into their workflows."25 Development of such tools seems to be lagging far behind the need.
A recent survey of humanities scholars at Florida State University found that even the way in which access is provided has a tremendous impact on the researcher's work: "...the container determines the research method and process that the scholar uses to access and interpret the information."26 The more difficult it is for researchers to use systems, the more likely it is that these same systems will be avoided, and their contents underutilized. To ensure effective support for the development of knowledge, there is a pressing need to explore "how humanities researchers actually use and interact with different kinds of electronic and print materials."27
Our study seeks to deepen awareness of the specific needs of humanities researchers using an important and complex subset of electronic materials: primary source materials. The study examines participant encounters with a variety of databases, which have been selected by the researchers themselves. By assessing the gap between desired usability and actual experience in the researchers' normal work environment, we seek to identify clear directions for improvement.
Chowdhury and Chowdhury suggest that usability "must be judged on the basis of a digital library's intended goals."28 Since it can be deduced that a leading goal of provision of online access for primary source materials is to enable research, our hypothesis is the following: Current online interfaces to primary source materials do not fully meet the research needs of even experienced researchers. Since faculty members can reasonably be considered experienced researchers, we drew our study sample from this population.
Subject liaisons, archivists, and information services personnel at the University of Alabama Libraries were asked to recommend faculty members who use online primary source materials in their research. Email invitations were sent to sixty researchers thus identified, in order to solicit volunteers for this study. Each volunteer was asked to select one to three online interfaces that he or she regularly uses for access to primary source materials, and about which he or she had feedback to provide, either positive or negative. A spreadsheet was developed to track comments using the following labels: interface access, search, browse, results, item access, full content access, transfer of information, additional software used, serendipitous discovery, the ability to return to results, interface services, and other.
Eleven faculty researchers (see Appendix I) accepted the invitation. Each researcher was assigned a number (from one to eleven) for identification in all data collected. Structured interviews took place in the researcher's office, lasting from 40 minutes to two hours. When permitted and feasible, video capture was made of the computer desktop during the interview.
As researchers demonstrated how they used their selected interfaces, they were asked to describe what they liked and didn't like about each step of the process. At the point of selecting content, the researchers were asked to show how they collected and organized information found via these databases. The interviewer then reviewed the comments aloud to ensure they had been recorded correctly. Finally, each researcher was asked to rate his or her comments in terms of importance, where three was extremely important, two was somewhat important, and one was minimally important.
Twenty-seven different interfaces (see Appendix II) were selected by the 11 researchers. Of these, four were chosen twice (Digital Library of Georgia, Early English Books Online, American Periodicals and British Periodicals), and Scout (the University of Alabama Libraries discovery interface) was used three times. Although we asked each participant to select interfaces used in the discovery of primary source materials, confusion as to what constituted a primary source, and the difference between an interface providing access directly to content and an interface aggregator, was occasionally evident.
Utilizing the following definition for primary sources, we determined that three of the 27 interfaces chosen (EBSCOhost, Internet Archive, and JSTOR) were not focused on this type of resource:
Primary sources are original materials. They are from the time period involved and have not been filtered through interpretation or evaluation. Primary sources are original materials on which other research is based. They are usually the first formal appearance of results in physical, print or electronic format. They present original thinking, report a discovery, or share new information.29
Rather than correct the researchers, we decided that enough commonalities (search, browse, results lists, item access, and content transfer processes) existed between interfaces for primary and secondary sources (indeed, many interfaces include both) that the results from both would be useful, and the very lack of differentiation itself was telling. Additionally, portals to other search interfaces were treated as interfaces themselves, for clarity in these results. In the course of the interviews, some additional comments and observations were gathered about researcher workflow, the online research process, and broad needs not met by particular interfaces.
Collection and initial analysis of data followed the "grounded theory" approach as described by Pandit.30 Data was collected under conceptual labels and then grouped into categories. During the initial analysis of the 393 interface-specific comments, we organized this feedback into four divisions (see Appendix III): Display Functionality, Metadata, Information about the content or interface (beyond item metadata), and Coverage (of content, formats, etc.). Of these, the average importance ratings indicate that Metadata and Coverage (essentially, fundamental concerns of access) were somewhat more important than Display Functionality, though the latter garnered by far the most comments. Certainly there is some overlap between divisions, as the search, browse, limiting and faceting capabilities discussed under Display Functionality are dependent upon what Metadata is available, but this was the division that seemed to provide the most clarity. For each Division, the number of comments, interfaces, and researchers has been recorded, along with a simple average of ratings given on a one to three scale.
To some extent, the number of comments was affected by the method of analysis, which required splitting combination comments into separate ones in order to group feedback appropriately. For example, a researcher comment that location, format, and date were useful in the results list for each item would be split into three comments to enable tracking of the number of times a researcher referred to a particular metadata field. All divisions were further subdivided into categories and often into subcategories.
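The splitting and tallying described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual spreadsheet procedure: the sample comments, the interface names, the field names, and the rule that each split comment inherits the rating of the original combined comment are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw comments (not study data):
# (researcher id, interface, division, topics mentioned, rating on the 1-3 scale)
raw_comments = [
    (1, "Interface A", "Metadata", ["location", "format", "date"], 3),
    (2, "Interface B", "Display Functionality", ["search bar placement"], 2),
    (2, "Interface A", "Metadata", ["date"], 2),
]

def split_comments(comments):
    """Turn each combination comment into one record per topic.

    Assumption: every split record keeps the rating of the original comment.
    """
    return [
        {"researcher": r, "interface": i, "division": d, "topic": t, "rating": rating}
        for r, i, d, topics, rating in comments
        for t in topics
    ]

def summarize(records):
    """Per division: comment count, distinct interfaces, distinct researchers,
    and a simple (unweighted) average of the ratings."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["division"]].append(rec)
    return {
        division: {
            "comments": len(recs),
            "interfaces": len({rec["interface"] for rec in recs}),
            "researchers": len({rec["researcher"] for rec in recs}),
            "average_rating": round(mean(rec["rating"] for rec in recs), 2),
        }
        for division, recs in groups.items()
    }

records = split_comments(raw_comments)
summary = summarize(records)
for division, stats in summary.items():
    print(division, stats)
```

Under these assumptions, a single remark about location, format, and date counts as three Metadata comments, which is why the comment totals depend in part on the method of analysis.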
However, in formulating this report, we realized that our focus needed to return to the workflow as evidenced by the researchers. Also, the initial form of analysis left out the feedback which was not interface-specific. Therefore, in this article we address the groupings of comments as they apply in the sequence of steps taken by researchers to locate and collect useful information. This method of presentation and organization of analysis best informs future research by preserving a more holistic view of scholars' needs.
4. Scope and Limitations
This study included only 11 faculty researchers at the University of Alabama, from a group of potential participants identified by library staff, in the spring of 2013. The sample may not be representative of humanities researchers in general, and the findings may not be generalizable to faculty researchers at other institutions. The interfaces selected by the researchers may not be representative of all online primary source databases. The levels of importance provided by rankings are subjective to the researchers and may vary across the participants. As the study was composed of structured interviews and researcher comments have been summarized or paraphrased below, misinterpretation of some comments is possible.
5. Study Findings
Our findings are presented in the general order of the researcher workflow, with awareness that research is an iterative cycle. We begin with issues related to locating information, move on to exploring results, and finally explore content extraction and management. Given the relative dearth of information about the actual workflows of researchers, we hope that this more holistic approach may provide useful context as well as help to inform future research.
5.1 Locating Information
Once a researcher has begun to formulate a potential line of inquiry, the first hurdle is to locate sources of appropriate information. Once a particular database is found, the researcher must first be able to gain access and then determine whether or not it contains information useful to her research. During this initial exploration, the researcher often has general reactions to the display and navigation options. She then focuses on searching and browsing. The following subsections explore these issues in turn.
5.1.1 Finding the Database
Although our study began with an examination of how researchers used the databases they had pre-selected, we still gathered comments about this important topic. As Borgman says, "[i]nformation searching is a process that most people learn through experience,"31 making a certain amount of faltering part of the (recursive) learning process. However, those who seek to support scholars can and should seek to flatten this learning curve.
One very helpful suggestion was that scholars would be well served by a search service that provides citations of content which matches input search terms. Another researcher asked for updates about newly available portals and aggregations, and added that he wished projects were fully funded and ongoing, rather than limited short-term efforts that are never updated or expanded. An additional suggestion was for each database to enable users to sign up for email alerts when new content is added; the researcher who proposed this rated it as very important.
Two of our researchers commented that if an interface does not rank highly in Google search results, it must not be important. Similarly, high rankings in Google were equated with popularity and importance. This is a dangerous presupposition for trained scholars to make, as these assumptions may skew research findings. It also highlights the importance of search engine optimization for database content, to ensure the interface is easily findable in Google.
Another researcher surprised us by saying that he uses online databases primarily to search for analog content (including finding aids) and to identify networks of relationships between historical people of interest. For this researcher, the metadata is often his main research data, not just an auxiliary to the materials themselves. Hence, linked data implementations that elucidate relationships between historical figures could be remarkably helpful.
5.1.2 Access
Since our participants selected the databases about which they would comment, there were few issues with access noted in our study. Obviously, this is a critical area for most researchers, but this study's methodology prevented a reflection of this in our findings. Still, we did obtain 12 comments in this area.
One interface prompted comments from two researchers: Internet Explorer was not supported (the browser crashed), and a particular plugin (DjVu) was required before contents could be viewed. Both these issues caused considerable annoyance.
If login is required, the process should be seamless and at a reasonable point in the process. One researcher was grateful for the option of off-campus proxy access to an interface via login. Another argued that it should not be necessary to register at an interface if coming to it via the library website proxy, and a third expressed problems with staying logged in to an interface after leaving it and returning. Finally, one researcher complained that if a login will be necessary for download, the login dialog should be available at the point of download (the item level).
Interoperability was extremely important (3.0 rating) to three researchers, who commented on four different interfaces. One researcher liked that the interface allowed him to generate a search of the library catalogue from within it; another wished that links to articles not available in full text went to the library's Interlibrary Loan form, preferably with pre-filled data, rather than to an external site. When using the library aggregator portal, one researcher did not like that clicking on an item on the results list opened to the database, not the item, when she used an older Mac.
One researcher said that it would be incredibly helpful if related content and materials were cross-linked, as then he could traverse the trail of information regardless of location or database ownership. The fact that this has not yet been implemented informed his belief that the companies owning the databases are more interested in profits than in enabling scholarly results. He claimed that it's not only difficult for researchers to get what they need, it's unfair. Development of linked data across holdings and databases would provide a strong network for humanities research support, and is highly recommended. Potentially, access could be provided via "patron-driven acquisitions" or a pay-per-view model.
5.1.3 Scope and Coverage
Once a database or interface is located, the researcher must be able to determine whether or not it contains the type of information he needs. It is unrealistic to expect a researcher to invest the time to learn how to use a database simply to determine whether or not it contains what he seeks. If the scope and content of the database holdings are not readily available, the database likely will simply not be used. Five comments in this category were about the need for scope indicators to describe the range of content covered by an interface; four wanted visualizations; and another added that it should be easy to see at a glance what had been added recently. Comments included praise for a list of genres, desire for line graphs which would help represent the content visually, and thankfulness for a bar graph representing the decades. However, another did not find the timeline feature of her interface to be very useful, except perhaps for students, as insufficient consideration had been given to the variables selected.
We received 23 comments about coverage within databases, with an average ranking of 2.65, indicating moderate relative importance. Nine comments were on the availability and need for full text content. Image capture of text without OCR (optical character recognition) does not support search and retrieval, and access to transcripts is definitely preferred. Seven of the nine comments were rated "very important."
Fourteen additional comments on scope and coverage were more general, representing eight researchers and 11 interfaces. Three wanted more comprehensive coverage, and one wanted databases as results in conglomerated search results (rather than item-level results from those databases). Positive comments about the comprehensiveness of databases included gratitude for the following attributes:
- contains short run periodicals
- envelopes were digitized along with letters
- contains a good mix of sources (including contemporary, and not limited to academic content)
- has content not findable in Google Books
- contains finding aids, even if digitized items for those collections are not available
5.1.4 Display and Navigation
Intuitiveness was one of the more diverse categories of comments, as it reflects individual impressions of each interface's consistency, presentation, and general ease of use. This category received low average ratings across the board, suggesting these factors are more an annoyance than a serious impediment, yet annoyance alone may be sufficient reason to avoid use of a database.
General presentation was not terribly important, generating 12 comments with an average rating of 1.72. Seven were pleased by interfaces that were "clean" or "simple" to use, with a familiar design or smoother presentation. One researcher, on the other hand, wanted the interface in question to be less "confusing" and "overwhelming," and a second simply wanted a more attractive interface. One was offended by flash pages, and one wanted the feedback popup to wait until she was leaving the database. One researcher liked the front page slideshow presentations of items, but was confused that each one did not represent a different collection. Mouse-over descriptions were particularly desirable (three comments over three interfaces), but not when they created lingering dropdown menus.
Three researchers noted a lack of consistency (four comments over three interfaces). They were frustrated by the lack of general uniformity of interface presentation related to search, both across types of materials and across databases. One researcher was bothered by a lack of standardization in presentation due to content being aggregated from multiple sources, noting that the variation can be disorienting.
Where am I?
One of the worst things that a researcher can experience in an online interface is feeling lost and confused. We were appalled at how often this happened during our study. Six of our researchers were lost at least once, and four were lost twice (7 different interfaces). The need for obvious navigation tools (clearly visible breadcrumbs or navigation tools at top left, consistent branding and clarity about where one is in the interface) generated 15 comments, with an average rating of 2.26.
Even locating the search interface again was difficult for four of the researchers (six comments, five interfaces, average rating 2.83), and four of their comments asked for the search bar to be at the top of the screen, available on every page in the interface.
It should be obvious that linking is important. We received seven comments on this over five interfaces by four researchers, with an average rating of 2.71 (median and mode were both 3). Broken links were found by two of our researchers in two different databases. Researchers often save links, and it's helpful to be able to identify the content from the link; thus, two asked for less opaque, more helpful URLs. Hyperlinks between items or to additional data were appreciated, as were thumbnails that linked to larger images.
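As a worked check on how these summary figures fit together, the short Python sketch below enumerates every set of seven ratings that reproduces the reported mean, median, and mode. It assumes that all seven comments carried a whole-number rating of 1, 2, or 3; the individual ratings are not reported in this article, so the listed combinations are possibilities rather than study data.

```python
from itertools import combinations_with_replacement
from statistics import mean, median, mode

# Assumption: seven comments, each with an integer rating on the 1-3 scale.
# Keep only the combinations that match the reported summary statistics.
matches = [
    combo
    for combo in combinations_with_replacement((1, 2, 3), 7)
    if round(mean(combo), 2) == 2.71 and median(combo) == 3 and mode(combo) == 3
]

for combo in matches:
    print(combo, "sum =", sum(combo))
```

Under that assumption only two combinations qualify (six 3s with one 1, or five 3s with two 2s), so at least five of the seven comments would have received the top rating.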
5.1.5 Search and Browse
One of the primary usability properties described by Koohang and Ondracek,32 but which seems to be absent from the multitudes of other descriptions of usability, is the aspect of control. Users need to feel that they are in control of using a digital library; they want to have control over what the interface does and the choices that are made there. This need for a sense of control was very evident in researcher comments about search and retrieval. Often, our participants did not understand how search worked, or how to make it work, and their frustration was apparent in the comments we collected. Lack of consistency across interfaces exacerbates the problem: as one researcher stated, "I have difficulty remembering which databases have which rules for search." Other researchers echoed this sentiment. One went on to state that each database should offer a page explaining the basic search rules, such as truncation, capitalization, the level of forgiveness for spelling errors, and more.
Aggregations added an additional layer of complexity. One researcher was unclear as to how to search across multiple databases as opposed to only one; another was unable to locate the database she wanted, as a common acronym for it had not been indexed; a third was confused by the variation in results provided by the search; and a fourth was upset because the conglomerated results did not seem to be organized by relevance.
Startlingly, four of our researchers seemed to believe that every failure on their part to locate something was due to their own incompetence or ignorance, instead of a problem with the interface. These researchers were unwilling to ask for assistance, and were completely dependent upon the intuitiveness of the interface and available help pages that provide search instruction or suggestions that actually work. This distressing finding argues strongly for ongoing training, support, and online guidance even for those normally categorized as "expert researchers."
In the face of their difficulties in obtaining the desired search results, the researchers in our study were heavily dependent upon limiting and refining options to try to target the desired content. When these were unavailable or unhelpful, they would seek browse options, which often were nonexistent.
In this section we will cover in turn the researchers' efforts to figure out the interface, their experiences with the search engines and results lists, and their comments about browse options.
How does this work? Instructions and Explanations
The explosion in numbers of databases has made it unrealistic to expect each scholar to invest time in learning how to use each one and remember how each one works. Each database has different methods of retrieval, and this in itself has become a serious problem for researchers. A failure to provide helpful instructions and guidance often turns this problem into an impasse.
For our purposes, instructions are overt directives or statements of procedure that might help a user better operate within the interface. All nine comments that we collected in this category were negative, reflecting the desire for better understanding of interface operation. As one participant stated, "all databases need clear instructions for how to use each button on the interface."
Five of the comments were about searching. Two researchers needed help with basic search rules, for them to be clearer or for there to be a box explaining them (e.g., explaining truncation symbols). Two wanted instructions on how to search more efficiently, including knowing what terms are supported or indexed; another wanted to understand how to use the advanced search more efficiently. Two comments were about the "help" page: the interface should do what the help page says it does, and the "help" link should be more helpful. The other comments asked for instructions on how to use each button on the interface and a more obvious indicator that "hits are in red" in item view.
Six of the seven comments about explanatory material were negative, again calling attention to unmet needs. They represent five researchers and six interfaces and disparate concerns. Researchers wanted explanations of the criteria for classification of content by type as well as the formula used to determine relevancy ranking. One researcher said it makes him suspicious when the database won't reveal such things as how they rank results for relevancy. As a result, this researcher depends heavily on browse options. As Banati argues, "The relationship between usability and trust is a very complex relationship."33 Clearly, if that relationship deteriorates, it affects the research process. Easily accessible information as to how results are selected (especially when it involves relevancy rankings) would help to build trust and confidence in the value of information found.
Researchers also wanted access to an index of supported search terms, and an easily accessible, clear glossary of symbols and terms used in the presentation interface. Finally, they wanted an indication of which web browsers are supported and to know about the intended usefulness of the clickable tags. The sole positive comment was from a researcher who liked the "request a copy" link, especially that it was accompanied by a note on use restrictions.
Effectiveness: search accuracy and capabilities
As one researcher stated, "people are used to Google, so a poor results list is off-putting and makes people think that what they're looking for is not there." Another stated that "keyword and topical searches are difficult and unpredictable. The results aren't complete. It feels like there's no rhyme or reason to it." A third complained that her search "either returns extraneous results even with multiple refinements, or it does not return results I know are there." Two more were frustrated that common terms for the content in question had apparently not been included in the search index. Failure to obtain satisfying search results turns away scholars, or forces them to browse for content, if indeed browse is even supported.
Seven of the comments on search accuracy were positive, reflecting recognition of an adequate or high level of search accuracy; five of them were ranked "very important." Problems were reflected in two areas: results lists that were considered not accurate, complete, or predictable; and searches that arbitrarily exclude a search term despite the use of a Boolean AND, do not turn up good results for topical searches, or do not sort for relevancy so that the best match comes up first.
Most of the comments regarding search capabilities focused on broadening matches: researchers wanted support for variant spellings and fuzzy searches, common synonym recognition, and "more results like this" options. Two others expressed preferences for particular search capabilities: phrase searching and a default to keyword search. Finally, one comment was grateful for the quick, powerful search, but another wanted a better way to locate information within content that has "such poor metadata."
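To illustrate what support for variant spellings might look like, the following is a minimal sketch, not a description of any interface in the study. It assumes a small in-memory list of indexed terms and uses Python's standard `difflib` module to suggest close matches; the function name `fuzzy_lookup`, the sample terms, and the 0.8 similarity cutoff are all hypothetical choices.

```python
from difflib import get_close_matches

def fuzzy_lookup(query, index_terms, cutoff=0.8, limit=5):
    """Return indexed terms that closely resemble the query, ignoring case.

    `cutoff` is a similarity threshold between 0 and 1 (an assumed default).
    """
    # Map lowercased forms back to the terms as they appear in the index.
    lowered = {term.lower(): term for term in index_terms}
    matches = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[m] for m in matches]

# Hypothetical index containing a historical variant spelling.
terms = ["Tuscaloosa", "Tuskaloosa", "Tuscumbia", "Montgomery"]
print(fuzzy_lookup("Tuscaloosa", terms))
```

A production search engine would more likely handle this at index time (for example with stemming or an authority file of name variants), but the sketch shows the behavior researchers asked for: a query in one spelling also surfaces the other.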
Selectiveness: Fielded search and limiters
One of our researchers was very upset with an interface that limited the search to a particular kind of content, beyond his control. Fielded search and limiters are a particularly powerful way to return control to the user.
Only two researchers (both historians) commented on what fields would be helpful at the point of search (fourteen comments, four interfaces), but they ranked each one as highly important. Value was placed on publication by both researchers; others mentioned were location, year of creation, publication year, title, and institution, organization or library. One of the researchers commented at length about the variety of ways in which names can appear in metadata, wreaking havoc on search results. He would like to be able to search with the full author name, and also for a person mentioned in any way, such as that of sender, recipient, simply referenced or as the person to whom the publication was dedicated.
Limiting and Refining
The number of comments and their ranking indicated that faceting and limiting options were very important to the majority (eight) of our researchers. We received 35 comments over 15 interfaces: 26 comments were rated "very important," eight were rated "important," and only one was rated "somewhat important." Eleven comments were about the mere presence or absence of limiting and refining options, with an overwhelming preference for this functionality. One comment appreciated this option being present for basic search, without an additional click through to advanced search. One liked that multiple facets could be chosen at once, and another liked having checkboxes to do so, which can be removed to allow for reconstruction of the original search. One comment addressed an interface that had a default limit to a particular kind of content, which was undesirable to the researcher. Five of the researchers (two from Literature, three from History) commented on which metadata fields were most useful for limiting and refining:
||Number of Comments
|Full text availability
|Primary vs. Secondary sources
Date includes date range (three), year (two), date, and date of coverage as opposed to publication year. Content type includes fields described as type, media type, genre, or source type. Content location includes both source location and the ability to limit to locally available content.
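The faceting behavior researchers described (several facets chosen at once, each removable to recover the broader result set) can be sketched as follows. This is an illustrative sketch under stated assumptions: records are plain dictionaries, and the field names `content_type` and `year`, the sample records, and both function names are hypothetical.

```python
from collections import Counter

def facet_counts(records, field):
    """Count how many records carry each value of `field` (e.g. for display next to checkboxes)."""
    return Counter(r[field] for r in records if field in r)

def apply_facets(records, selected):
    """Keep records that match every selected facet.

    `selected` maps a field name to the set of values the user has checked.
    An empty mapping returns all records, which restores the original search.
    """
    return [
        r for r in records
        if all(r.get(field) in values for field, values in selected.items())
    ]

# Hypothetical result set.
records = [
    {"title": "Letter A", "content_type": "letter", "year": 1861},
    {"title": "Map B", "content_type": "map", "year": 1861},
    {"title": "Letter C", "content_type": "letter", "year": 1902},
]

selected = {"content_type": {"letter"}, "year": {1861}}
print([r["title"] for r in apply_facets(records, selected)])

# Unchecking the year facet widens the results again.
del selected["year"]
print([r["title"] for r in apply_facets(records, selected)])
```

Keeping the selections in a single mapping means that removing one checkbox never discards the others, which is the property the researchers valued.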
Once the researcher is presented with a list of results, the usability of that list is critical. One was very upset that different versions or editions of the same item were not grouped together in the result lists. Three of our researchers could not even find a link to the digitized content on the results page or even a clear indication of how to access that content (two researchers gave up, not knowing how to reach the digitized items). Three spoke to the need for provision of thumbnails. Other requests were for a choice of how to access the item, an indication of whether a digitized item was available, and for a click on the entry in the results list to bring the researcher straight to the first hit in the item, or directly to the PDF. Another simply wanted fewer mouse clicks before reaching the item. Only two of our researchers commented on a well-presented results page.
We received five comments about the need to see the number of results. Three said this was very important, and wanted this on the results page (two requested it at the top of the page). One wanted to know how many times a match to the search term was found (minimal importance), and one wanted to be able to choose how many results displayed on a page (very important).
Five of our researchers said that how results are sorted was very important (average rating 2.8). Four of five comments requested sorting by date, one of which also wanted to be able to sort by Parliamentary Recall Number. One researcher mentioned that this kind of sort helps reduce the effect of an overly exclusionary search engine. The remaining comment was from a researcher who did not like that a results page was sortable by columns.
For lists of content presented within a finding aid, one researcher requested more information in the folder, box, and section descriptions to assist him in determining whether to seek further.
Metadata in Results (20 Comments)
Three researchers made five comments about the need for item description on the results page, though one of these found the condition of the description to be unhelpful because it needed better editing. For three researchers, the date was important to very important in the results list. Two researchers stated that a preview or snippet from each item is very important, in order to determine whether the item is relevant without having to click on it. However, another was very confused by a keyword in context display that was drawn from OCR-derived content, saying it was "inscrutable," with "disconnected phrases run together." The researcher concluded, "It's a pain to try to make sense of this!"
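For readers unfamiliar with keyword-in-context displays, the following minimal sketch shows one way such snippets could be generated. It is an assumption-laden illustration, not the method used by any interface in the study: the function name `kwic`, the sample sentence, and the context width are hypothetical.

```python
import re

def kwic(text, term, width=30):
    """Return one snippet per match, with up to `width` characters of context on each side."""
    snippets = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        # Mark snippets that were cut from a longer passage.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end].strip() + suffix)
    return snippets

# Hypothetical transcript text.
sample = "The cotton trade shaped the port of Mobile."
print(kwic(sample, "cotton", width=10))
```

Because the snippet is cut directly from the stored text, uncorrected OCR will yield exactly the kind of "disconnected phrases" the researcher complained about; the display is only as readable as the underlying text.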
|Keyword in context
Once again, some fields were highly important, but only to a single researcher: information about who has cited the work, series detail, repository location (including box and folder numbers) and format.
When Search Fails: Browse
Thirteen comments expressed a desire for some sort of browse functionality, by categories (with one researcher adding that these should be selected "by someone who knows the field") or by location. Ideal ordering of browse was alphabetical (by author, title, or publication). One preferred chronological order in the browse list, while another simply wanted the "research resources" tab to be better organized.
One very positive experience with a browse involved a list of fewer than 20 categories of content, which the researcher said were drawn directly from the compiled book indices of the content of the database. Each category, when clicked, provided another layer of fewer than 20 subcategories, which again (when clicked) offered another set of sub-subcategories, all displayed within the same page so that context was not lost. All categories were alphabetized.
5.2 Working with the Content
Rarely do digital library developers think about what happens beyond the point at which researchers locate content in their databases, but that is where real barriers and challenges were evident in our study. The opportunity for new services and support in this area is pressing and rather critical.
5.2.1 The Online Item
Item Navigation and Operability
All but one researcher commented on item navigation and functionality; the ratings indicate that this is an important area for them, which makes sense, given the item's importance to their work. We received 11 comments about moving between sections of the document. Three wanted or approved of a Table of Contents-style navigation. Another three approved of other forms of outline-style navigation which allowed the researcher to see the location of search terms in the text or the location of the current section of text within the whole, and to navigate within that frame. Two stated that it was very important for original page numbers to be visible on each page. One researcher was very unhappy that the documents themselves were not chronologically arranged before compilation and digitization, which made his work difficult. Another appreciated the way compiled documents were displayed so that he could easily get a sense of the length of sections and the location of images, and he could easily access different parts of the document.
Additional single comments related to navigation included the following:
- navigation tabs should be visible within item view; how to close the image and return to results should be clear
- search within the item should jump to the first hit so that it is obvious that the search is complete
- text should be presented one page at a time
- paging buttons should appear at the top and bottom of the screen
- original facing pages should be available for view
One very specifically wanted to know how to access the finding aid from the item display.
Six researchers (eight comments, median and mode ratings were three) commented on content handling within six interfaces. Two liked encountering scrollable PDFs, although one wanted to control the amount of scrolling; one liked that PDFs opened in a new window; and another liked seeing a preview before download. One interface did a good job with image content but failed to handle text and maps well. One researcher found HTML useful for browse purposes; another objected to HTML as it did not include the original page numbers. Two of our researchers wanted to be able to save search results online, with the option of saving content and/or links and metadata. One also spoke to the need to be able to organize the content within the folder provided, for presentation.
Size was an important issue for seven of the eleven researchers, commenting on eight interfaces. Four comments were about the availability and proper functioning of zoom, including at multiple levels. Desired options included a separate visual to show where the user is zoomed in on the document, the ability to specify zoom percentage, and automatically maintaining the selected zoom level from one page to the next. One researcher found inclusion of a ruler in the image capture extremely helpful, as opposed to merely written dimensions.
Three of our researchers wanted a larger window on initial presentation, the option to maximize the window, and the ability to expand the viewer to fill the browser window. Readability was a real issue in two of the interfaces for two researchers: one commenter found multi-column format material difficult to read, and another wanted the option of larger type in the display.
Highlighting was very important to a few researchers. Two researchers made nine of ten comments on this topic, covering five databases (both median and mode rating were three). They wanted to see highlighted search terms in item view, with a usefully visible color (one preferred green to yellow). They also wanted highlighting to be used as a clear indication of where the search term appeared, either in the Table of Contents or in the document itself.
Five of our researchers expressed the need to see metadata alongside the item in item view or on the same page as the image. If there isn't room for much information on the page, concise metadata with an option to click for more was requested. Item-level metadata fields mentioned most were repository location, description, and creator, followed by date, publisher, title and subject. However, date, followed by description, was rated the most important. Interestingly, although three respondents mentioned the subject, its importance ranking was lowest of all the fields specified, perhaps reflecting Conway's finding34 that experienced researchers do not find interpretative metadata very helpful. This conjecture was supported by comments from one researcher, who stated that she didn't want or need interpretive descriptions and questioned their accuracy, as they are "contextualizing the item, and not always in relevant ways." In a similar comment, another researcher specifically stated that assigned titles were useless.
Again, certain researchers found specific metadata fields extremely important, which were not mentioned by others. Not included in the table below are single requests, rated very important, for origin and context (for art), commentary, and citation information (for text).
Two researchers stated simply that they wanted better metadata, and one of them specifically focused that complaint on maps. Unusual types of metadata should have accompanying explanations: one researcher was totally confused by the inclusion of "Cockton Titles," for example. Another found the Wellesley author attributions absolutely essential, for the articles he was reviewing were anonymously written. He explained that these came from a secondary reference source that contextualizes the journal and provides author identification and information.
Related to metadata are user annotations. A single researcher wanted the option to annotate and share annotations with others; another was confused by the "rating" display and wondered what it meant. One asked that annotations be linked within the document, not from an outside frame.
5.2.2 Extraction (57 Comments)
Content extraction was an important topic for our researchers. Every one of them collected information from the databases to combine with information already collected elsewhere. Researchers commented on text, image, and citation extraction, for printing or download, and particularly the need to select specific segments of content in addition to the whole. The PDF format was especially valued for this (for text, not images) as opposed to HTML display.
Another discovery was the heavy dependence on desktop search to locate downloaded files. If, as is normally the case, the file names do not reflect author and title, it was common for the researcher to be unable to locate the collected content. Of the six researchers who spoke of downloading PDFs, five of those renamed files upon download, in order to make it easier to locate the content using desktop search. Renaming content with author and at least a portion of the title was preferred; one renamed the PDFs with date and keywords; and one said his method varied according to his current system of organization. Two researchers mentioned downloading images, with one renaming those files in similar fashion.
A single researcher said it was very important to be able to download others' annotations. Annotations were rare in the interfaces selected, so notably few comments about them were gathered.
Text extraction was a hot topic, eliciting 22 comments from nine researchers over 13 interfaces, with an average rating of 2.50. Almost three quarters of comments in this area were negative, indicating a real need to improve services.
Nine comments spoke to the need to save or print content, including the option to select portions of content to save or print. One researcher mentioned that it was important to be able to make that selection at the point of printing, not on a previous screen, which required backtracking. Another rated the ability to download transcriptions as "very important." One was grateful for access to the OCR-derived content, saying it is better to cut-and-paste that, and then correct it, than have to transcribe a long text from scratch (although he added that it would certainly be better if the OCR'ed content were cleaned up).
Seven comments spoke to the need to copy-and-paste or extract text, including the need to extract the text in segments and to do so from page view. Three additional comments revealed that researchers do not like it when text is presented as an image, requiring them to clip and save, or save and crop, the image rather than cut and paste text. Two simply wanted an easier process or clear instructions for selecting that content in page view. One researcher was so frustrated by his inability to figure out how to cut and paste a quote that he said "I give up at this point."
Most of our comments about PDFs were negative, in that the interface did not provide them, or did not provide the downloadable file with the metadata they needed. One reason PDFs are desired for text is that they allow the researcher to see the original page numbers (for use in citation) as well as the context of content if it has been delivered in segments or paragraphs. One researcher specifically asked for scrolling to be supported within PDFs. Six researchers rated the availability of PDFs "very important."
Seven researchers made ten comments about the need for identifying information to be provided with the downloadable PDF; four said this was very important. Seven comments asked for citation information to be included in the PDF itself. Three asked for PDFs to have useful filenames upon download, made from author and title (either partial title or title keywords).
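The requested file naming (author plus part of the title) is straightforward to generate at download time. The sketch below is one possible approach under stated assumptions: the author string is in "Surname, Given" order, the first few title words are enough to identify the item, and the function name `download_filename` and its parameters are hypothetical.

```python
import re
import unicodedata

def download_filename(author, title, max_title_words=5, ext="pdf"):
    """Build a file name such as 'Surname_First-Words-Of-Title.pdf'."""
    def slug(text):
        # Strip accents, then replace runs of non-alphanumeric characters with hyphens.
        ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
        return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-")

    surname = author.split(",")[0]  # assumes "Surname, Given" order
    title_part = " ".join(title.split()[:max_title_words])
    return f"{slug(surname)}_{slug(title_part)}.{ext}"

print(download_filename("Borgman, Christine L.", "Designing Digital Libraries for Usability", max_title_words=3))
```

A name built this way is findable with ordinary desktop search, which addresses the renaming step that five of the six PDF-downloading researchers were performing by hand.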
We received ten comments about images, from six researchers reviewing seven interfaces; both the median and mode rating were three, indicating high importance. Two comments addressed the need for better quality images, including one comment about maps in particular. Three wanted access to the full image and just the image, without it being bound up in a PDF, requiring extraction. One asked for a more intuitive method for selecting and extracting images and two suggested particular printing options: to choose how many images to print on a page, and to choose to print a collection's images in bulk. Also valued by one researcher was the ability to select or "clip" just a portion of an image in order to save it. One researcher was totally unable to determine how to collect digital images of illustrations and gave up in frustration.
Most of the ten comments received (from five researchers over seven interfaces; average rating 2.5) reflected researchers who were pleased with the citation generation and extraction options open to them, including export to citation management software (such as EndNote or RefWorks) or by email, especially in bulk rather than one-by-one. However, one researcher expressed frustration with errors in automatically generated citations, and one was extremely upset with the citation emailing service. After clicking "mark all" on his result list and selecting "email list," he then received 42 separate emails, one per result, instead of a comprehensive list. This was far from helpful.
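The behavior this researcher expected, one message containing every marked citation, can be sketched with Python's standard `email` package. This is an illustrative sketch only: the function name `build_citation_email`, the placeholder addresses, and the numbered plain-text layout are assumptions, and actually sending the message (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage

def build_citation_email(recipient, citations, sender="noreply@example.org"):
    """Combine all marked citations into a single message instead of one message per result."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = sender  # placeholder address
    msg["Subject"] = f"{len(citations)} saved citations"
    # Number the citations so the list is easy to scan and refer back to.
    body = "\n\n".join(f"{i}. {c}" for i, c in enumerate(citations, start=1))
    msg.set_content(body)
    return msg

# Hypothetical marked results.
marked = ["Citation for result one", "Citation for result two", "Citation for result three"]
message = build_citation_email("researcher@example.org", marked)
print(message["Subject"])
```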
5.2.3 Content Management
Rutner and Schonfeld noted in their 2012 Ithaka study, "Nearly all historians face an ever-growing mass of paper and electronic resources, notes, writing and images. Organizing these materials in a consistent way so that they can be easily accessed throughout the research and writing process, typically over many years, is an enormous challenge."35 We found this to be true in our study as well, and hope that the following observations may shed light on the options needed in, and services provided by, digital library interfaces.
One of our most startling discoveries was that the vast majority of our participants used a lengthy Microsoft Word document to compile information, aggregating anything from citations and metadata to copied content and research notes. Of the eleven researchers, nine stated that they use this approach; the primary reason mentioned was to support searching across sources, and, secondarily, to have all their resources in one location for reading and ordering. The remaining two researchers also use Word for data aggregation, though they housed content in separate, smaller documents. Among the types of content transferred to Word are the following: source content (nine), citations (seven), metadata (three), and URL (one). Five researchers copy and paste content exclusively. Two manually transcribe content or create citations exclusively. Four researchers use both methods. These compiled Word documents become too large for email transfer, and since the researchers often worked both at home and at the office, multiple versions may exist on different computers and transfer media, creating confusion.
Citation management software gets mixed reviews. While six of 11 don't mention whether they use citation management software, four say they do and one explicitly does not, saying, "Citation software doesn't work with the way I organize my thoughts." EndNote was used by three researchers, and RefWorks and FileMaker Pro (a document management system) were also used by one person each. Their use of automatically generated citations in all cases lines up with their choice about citation management software. One researcher complained that EndNote on her older Mac doesn't recognize the citations from EndNote on her newer PC, so clearly there are issues here to overcome as well. Among those who didn't mention citation software, the use of automatically generated citations was variable, with two each for positive, negative, and not mentioned.
The overall organization of each researcher's content varied widely, from those who make note of just enough metadata to find the content again to those who rely on sifting through exports and downloads to those who meticulously record all needed data and metadata in a centralized location. But all researchers in the study were aware of the need to establish a system of content organization. As Rutner and Schonfeld observed:
The majority of interviewees said that a central challenge of their research is 'gaining intellectual control' over the content they have collected throughout their research process. From the interviews, it was clear that historians are interacting with a wide ecosystem of information, within which they are continuously collecting, interpreting, and attempting to organize and access for analysis.36
All eleven researchers in our study do have some functioning, highly personalized system of organization (see Appendix III). Given the organic, self-directed nature of research training, this is hardly surprising. However, it is clear that some if not most of our researchers could be helped in their personalized methods if more options were available to them. Almost every researcher interviewed collected citations and annotations (and often transcriptions) in one location, and downloaded documents elsewhere, which then had to be matched up during the research process. As Audenaert and Furuta point out, "While information seeking has received considerable attention, how best to support scholars' needs for externalizing knowledge is a question that has not been well studied."37 We hope this study instigates further research in this area.
The researchers we interviewed clearly had difficulty locating content within the selected interfaces. Landing pages rarely contained an overview of the content and scope of the database, which becomes critical in communicating to scholars whether it can be useful for them.
Often, search instructions provided by the interface were insufficient or incorrect. Search widgets rarely offered sufficient or even evident information about how the search process worked, at least not enough to engender trust or confidence in the provided mechanism. Faceting options and limiting options were highly valued, as were browse interfaces that enabled researchers to drill down to the desired content. Browse options were critical to continued use of the database when the researcher was unable to obtain desired results utilizing the search functions, but in many cases browse was not available; when available it sometimes failed to organize content in a way that was helpful for the researcher. Our study also supports the finding by Boonstra, et al.,38 that it would be helpful to see visualizations of content over changes in time and space.
All of the researchers collected information or content from the databases for use, either by printing, downloading, or other forms of extraction. Interfaces which offered PDFs (automatic paging, easy download) and easily extracted images were preferred. The gathering of digital data across multiple interfaces creates chaos for the research process, and all our participants struggled with this. High on the list of requests were citations within the PDFs and an automatic file name that reflects the author/creator and part of the title. The latter makes the content more findable within the researcher's operating system, and the included citation makes it far easier to ensure correct citation or even use in the research. Failure to include the citation means the researcher must store that information elsewhere (sometimes with researcher annotations) and be able to match up the PDF or image file name with the citation.
As stated in the introduction, the desired outcomes of this study were:
- to inform digital library software design,
- to obtain clarification of the most useful metadata fields for inclusion, and
- to identify gaps in services provided to the researcher which would suggest opportunities for improved support.
Our research indicated numerous suggestions for improving digital library design and services, which follow:
- Provide continued support and expansion of databases and aggregations over time
- Increase information:
- Offer email and front page notification systems of new content.
- Provide scope descriptions and visualizations on landing pages.
- Identify which web browsers are supported.
- Provide an index of common database search terms and an easily accessible, clear glossary of symbols.
- Make navigation at all levels more intuitive and standardized, with search boxes clearly visible at all times.
- Improve search:
- Provide faceting and limiting options as well as better instructions and descriptions of how the search works.
- Standardize services across databases, to reduce confusion and researchers having to train in each one and remember the differences.
- Explain the relevancy ranking used in search results, and make sure results are relevant.
- Provide search support of a variety of name versions, and specify what will work.
- Ensure search support for Boolean operators, topical searches, variant spellings and fuzzy searches, common synonym recognition, phrase searches, and a default to keyword search.
- Provide an index of supported search terms or recommendations on how to select effective search keywords.
- Provide "more results like this" options.
- Provide keyword searching over OCR content and transcripts.
- Enable sorting of results by date created.
- Provide browse options, particularly by date range and location, and visualizations of content; additionally, at the item level:
- Support linked data across databases and interfaces, to enable browsing of like content seamlessly.
- Provide clear instructions for how to select and download or print segments of text.
- Support download of transcriptions and OCR content.
- Provide PDF version of text documents, containing citation and with the file named for the author and part of the title.
- Ensure original page numbers are visible (and preferably in order).
- Do not provide image content in PDFs.
- Support selection and extraction of portions of images.
- Support generated citations that are accurate.
One of the most troublesome discoveries is that the researchers in our study had difficulty with the provision of images displaying text. Current methods of digitization of primary source materials focus heavily on image capture of text. Rarely is the staffing available to transcribe the documents, and OCR is almost useless for handwritten documents (unless it has been trained for a particular handwriting, which is not feasible for the majority of manuscript materials). One of our researchers transcribes everything, and would be grateful for access to the OCR-generated content, as cleaning it up would be simpler than total transcription. To truly support scholarly endeavors, new methods of capturing text from images need to be developed.
In terms of metadata, our study indicated that the following were the most useful fields to support in each functional area specified:
- Fielded search: publication, names (by role if possible), location, repository location, year, title, and government organization
- Limiting and refining: date of creation, content type, content location
- Result lists: keyword in context, date, description, title, creator, repository location, format, a link to who has cited it, series detail
- Item level: date, description, repository location, followed by publisher, creator, title, and (of least importance) subject
- Extracted/downloaded content: citations, and creator/title keywords in file names
Our results indicate that certain scholars deeply need some metadata that the majority never mention, which makes it difficult to weigh the results in terms of importance. For example, only two of the researchers (both in the discipline of history) commented on the metadata options needed for fielded search, but some of the fields they requested are rarely available (place of publication, who the item was dedicated to), and they ranked each one of highest importance.
This is an area which is ripe for research targeting scholars of specific fields; the results would enable digital libraries to better serve their desired audience(s). With too much generalizing across disciplines, we risk reducing the status quo of digital library development to a mediocre level which fails to meet the needs of those we most seek to serve.
Our study also identified a need for services which extend beyond the database interface:
- Provide web search and notification services, keyed to specific user interests, of new content, aggregations, and databases.
- Offer search services based on query that return citations of appropriate content found.
- Educate scholars about the relative value of content in Google results.
- Develop personal information management software that meets the needs of scholars, including access from multiple computers.
- Offer training on research information organization and management.
- Provide database and search instruction.
This last recommended service is critical. We were appalled to find that, at least among our participants, even experienced researchers now need training in searching. Since earlier research did not report such a need, we conjecture that search interfaces are not keeping pace with the quantity of content now available in databases, or with the current needs of users. Future research should test this finding against a broader population.
Our study seems to indicate that our hypothesis is correct: even experienced researchers have basic needs which are not met by current digital library interfaces, particularly for primary source materials. And inasmuch as the researchers we studied frequently proved to be using the interfaces in ways apparently unanticipated by developers, these observations may have major implications for the user experiences of less adept researchers.
While user training is certainly one potential solution, it cannot address all the difficulties faced by this group of researchers. As the quantity of online content burgeons, our study indicates that new and more effective discovery mechanisms need to be developed to effectively support research and discovery. Even within our small sample, the needs of the researchers varied, but several commonalities existed as well. Our study seems to indicate that online access services should be targeted as much as possible to the needs of the desired audience, but should share commonalities in search functionalities in particular, in order to reduce confusion.
Continued assessment of the ability to support scholarly research should incorporate the full cycle of the research workflow, and should focus heavily on identifying the target audiences' unmet needs.
1 Southern Historical Collection, University Library, University of North Carolina at Chapel Hill, Extending the Reach of Southern Sources: Proceeding to Large-Scale Digitization of Manuscript Collections, June 2009 (final grant report for the Andrew W. Mellon Foundation); Alison J. Head and Michael B. Eisenberg, "Lessons Learned: How College Students Seek Information in the Digital Age," Project Information Literacy Progress Report, 2009; Christopher J. Prom, "User Interactions with Electronic Finding Aids in a Controlled Setting," American Archivist 67 (Fall/Winter 2004): 234-68; Cory Nimer and J. Gordon Daines III, "What Do You Mean It Doesn't Make Sense? Redesigning Finding Aids from the User's Perspective," Journal of Archival Organization 6, no. 4 (2008): 216-232.
2 Christine L. Borgman, "Designing Digital Libraries for Usability," in Digital Library Use: Social Practice in Design and Evaluation, eds. Ann Peterson Bishop, Nancy A. Van House, and Barbara P. Buttenfield (Cambridge, Mass.: MIT Press, 2003): 110.
3 Onno Boonstra, Leen Breure and Peter Doorn, Past, Present and Future of Historical Information Science (The Hague: DANS, 2006), 50; Peter Denley, "Models, Sources and Users: Historical Database Design in the 1990s," History and Computing 6, no. 1 (1994): 33-43; James E. Everett, "Technical Review of Kleio 5.1.1: A Source-Oriented Data Processing System for Historical Document," Computers and the Humanities 29 (1995): 307-316.
4 Tefko Saracevic, "How were Digital Libraries Evaluated?", presented at DELOS WP7 Workshop on the Evaluation of Digital Libraries (2004): 9.
5 Boonstra, Breure and Doorn, Past, Present and Future of Historical Information Science, 9.
6 Alex Koohang and James Ondracek, "Users' Views about the Usability of Digital Libraries," British Journal of Educational Technology 36, no. 3 (2005): 415. http://doi.org/10.1111/j.1467-8535.2005.00472.x
7 Steven Buchanan and Adeola Salako, "Evaluating the Usability and Usefulness of a Digital Library," Library Review 58, no. 9 (2009): 638-651.
8 James Y. L. Thong, Weiyin Hong, and Kar-Yan Tam, "Understanding User Acceptance of Digital Libraries: What Are the Roles of Interface Characteristics, Organizational Context, and Individual Differences," International Journal of Human-Computer Studies 57 (2002): 215. http://doi.org/10.1016/S1071-5819(02)91024-4
9 Hema Banati, Punam Bedi, and P. S. Grover, "Evaluating Web Usability from the User's Perspective," Journal of Computer Science 2, no. 4 (2006): 314-317. http://doi.org/10.3844/jcssp.2006.314.317
10 Ibid., 314.
11 Sueli Mara Ferreira and Denise Nunes Pithan, "Usability of Digital Libraries: A Study Based on the Areas of Information Science and Human-Computer-Interaction," OCLC Systems & Services, 21, no. 4 (2005): 311-323. http://doi.org/10.1108/10650750510631695
12 Borgman, "Designing Digital Libraries for Usability," 88-89.
13 Ibid., 109.
14 Chadia Abras, Diane Maloney-Krichmar, and Jenny Preece, "User-Centered Design," in Berkshire Encyclopedia of Human-Computer Interaction, Thousand Oaks: Sage Publications 37.4 (2004): 445.
15 Rosalie Lack, "The Importance of User-Centered Design: Exploring Findings and Methods," Journal of Archival Organization 4, no. 1/2 (2007): 84. http://doi.org/10.1300/J201v04n01_05
16 Sarah Buck Kachaluba, Jessica Evans Brady, and Jessica Critten, "Developing Humanities Collections in the Digital Age: Exploring Humanities Faculty Engagement with Electronic and Print Resources," College & Research Libraries 75, no. 1 (2014): 101.
17 Maggie Dickson, "CONTENTdm Digital Collection Management Software and End-User Efficacy," Journal of Web Librarianship 2, no. 2-3 (2008): 369. http://doi.org/10.1080/19322900802190852
18 Milena Dobreva and Sudatta Chowdhury, "A User-Centric Evaluation of the Europeana Digital Library," in ICADL 2010, LNCS 6102, eds. G. Chowdhury, C. Khoo, and J. Hunter (Berlin: Springer-Verlag, 2010), 148-157; Gretchen Gueguen, "Digitized Special Collections and Multiple User Groups," Journal of Archival Organization 8, no. 2 (2010): 96-109. http://doi.org/10.1080/15332748.2010.513324; Elsa F. Kramer, "IUPUI Image Collection: A Usability Survey," OCLC Systems & Services 21, no. 4 (2005): 346-359. http://doi.org/10.1108/10650750510631712; Lisa R. Norberg, Kim Vassiliadis, Jean Ferguson, and Natasha Smith, "Sustainable Design for Multiple Audiences: The Usability Study and Iterative Redesign of the Documenting the American South Digital Library," OCLC Systems & Services 21, no. 4 (2005): 285-299. http://doi.org/10.1108/10650750510629625; Don Zimmerman and Dawn Bastian Paschal, "An Exploratory Usability Evaluation of Colorado State University Libraries' Digital Collections and the Western Waters Digital Library Web Sites," Journal of Academic Librarianship 35, no. 3 (2009): 227-240. http://doi.org/10.1016/j.acalib.2009.03.011
19 Ying Zhang, "Developing a Holistic Model for Digital Library Evaluation," Journal of the American Society for Information Science & Technology, 61, no. 1 (2010): 88. http://doi.org/10.1002/asi.21220.
20 Jennifer Rutner and Roger C. Schonfeld, "Supporting the Changing Research Practices of Historians," Ithaka S+R technical report for the National Endowment for the Humanities, U.S. Department of Commerce National Technical Information Service (Dec. 2012).
21 Rutner and Schonfeld, "Supporting the Changing Research Practices of Historians," 25.
22 Conway, "Modes of Seeing," 459.
23 Neal Audenaert and Richard Furuta, "What Humanists Want: How Scholars Use Source Materials," Proceedings of the 10th Annual Joint Conference: Digital Libraries (2010): 283. http://doi.org/10.1145/1816123.1816166
24 Audenaert and Furuta, "What Humanists Want," 289-291.
25 Alexandra Chassanoff, "Historians and the Use of Primary Source Materials in the Digital Age," American Archivist 76, no. 2 (2013): 472.
26 Kachaluba, Brady, and Critten, "Developing Humanities Collections," 102.
27 Ibid., 94.
28 Dobreva and Chowdhury, "A User-Centric Evaluation of the Europeana Digital Library," 148; G. G. Chowdhury and Sudatta Chowdhury, Introduction to Digital Libraries (London: Facet, 2003).
29 "Primary, Secondary, and Tertiary Sources," University of Maryland University Libraries.
30 Naresh R. Pandit, "The Creation of Theory: A Recent Application of the Grounded Theory," The Qualitative Report 2, no. 2 (1996).
31 Borgman, "Designing Digital Libraries for Usability," 103.
32 Koohang and Ondracek, "Users' Views about the Usability of Digital Libraries," 410.
33 Banati, Bedi, and Grover, "Evaluating Web Usability," 314.
34 Conway, "Modes of Seeing," 459.
35 Rutner and Schonfeld, "Supporting the Changing Research Practices of Historians," 40.
37 Audenaert and Furuta, "What Humanists Want," 289.
38 Boonstra, Breure, and Doorn, Past, Present and Future of Historical Information Science, 17.
39 Some of the researchers listed under History are faculty in the American Studies department (variously classed as a humanities or a social science discipline) but are categorized in this way because they are historians by procedure.
40 The University of Alabama Libraries digital repository.
41 At the time of the study, this database was still under the title Evans Digital Editions, 1639-1800.
42 The University of Alabama Libraries EBSCO-driven discovery interface.
Appendix I. Researcher Area of Expertise, Number of Interfaces and Interface Comments
[Table: researcher area of expertise, number of interfaces, number of comments]
Appendix II. Interfaces Selected
- Alphabetical listing with number of comments followed by the number of researchers in parentheses:
- Acumen40 (13 comments, 1 researcher)
- American Periodicals Series (21, 2)
- ArtSTOR (20, 1)
- British Periodicals (30, 2)
- Digital Library of Georgia (40, 2)
- Documenting the American South (22, 1)
- Early English Books Online (20, 1)
- EBSCO Host (8, 1)
- Early American Imprints: Series I, Evans, 1639-180041 (18, 1)
- Gateway Bayern (1, 1)
- Google Scholar (8, 1)
- Historical Abstracts (2, 1)
- HRAF (Human Relations Area Files) World Cultures (8, 1)
- Illustrated London News Historical Archive (2, 1)
- In Motion: The African-American Migration Experience (28, 1)
- Internet Archive (5, 1)
- JSTOR (13, 1)
- Kalliope (5, 1)
- Mark Twain Project (21, 1)
- Nation Archive (15, 1)
- Newspaper Archive (13, 1)
- Parliamentary Papers (13, 1)
- Project Muse (8, 1)
- Richmond Daily Dispatch (7, 1)
- Sabin Americana 1500-1926 (24, 1)
- Scout42 (10, 3)
- VD17 (18, 1)
- Interfaces Selected by Faculty Area of Expertise:
- American Periodicals Series Online (1740-1900)
- British Periodicals
- Digital Library of Georgia (2)
- Early English Books Online (EEBO)
- Early American Imprints: Series I, Evans, 1639-1800
- Google Scholar
- Historical Abstracts
- In Motion
- Internet Archive
- Nation Archive
- Newspaper Archive
- Parliamentary Papers
- Project Muse
- Richmond Daily Dispatch
- Sabin Americana (1500-1926)
- Scout (2)
- American Periodicals Series Online
- British Periodicals
- Documenting the American South
- Illustrated London News Historical Archive
- Mark Twain
- HRAF (Human Relations Area Files) World Cultures
Appendix III. Researcher Content Organization Details
The following are our observations of each researcher's system of organization for research materials:
Researcher 1 said she does not save text content because she expects it to be there if she returns to it, via a saved URL. She does, however, copy metadata and paste it into a Word document. To save images, she does a right-click and "save image as," naming the file something that will prompt her memory, which is noted in a Word document.
Researcher 2 downloads PDFs for reading but in general transcribes citation and text into a single lengthy Word document, organized chronologically. For longer documents, he OCRs the PDF and corrects the resulting text in Word. He names the PDF according to an admittedly inconsistent personal naming system and puts it and all documents into the same digital folder, along with personal and other material. The researcher makes a new digital version of the growing Word document every time it is modified. He sometimes emails documents to himself to retain copies, and he prints out each PDF and writes the citation on top (these are placed in folders in a stack, for reference). He may save his work to a home computer, work computer, or flash drive, so he must look in all such places to locate content, which he does by browsing rather than using his computer's search functionality.
Researcher 3 prints images of digitized manuscript materials, and then transcribes them into a Word document. To locate information within what he's collected, he uses "Find" to search his compiled Word document. He copies and pastes citations during the writing process, rather than during the research stage.
Researcher 4 right-clicks and saves images, then copies and pastes the URL into a Word document so he can find the image again. He rarely includes information about the origin of the image, as he uses these images mainly in his teaching (copying the information from Word to PowerPoint, which is then turned into a PDF).
Researcher 5 organizes by theme, with a single folder for each. She downloads images and cuts and pastes or manually enters text (from multiple sources) into Word documents titled for those themes. She states that she has no good way to match up images and text. She downloads articles in PDF form and extracts citations from them (which are then pasted into the Word document), renaming each PDF with the author name and the first two to three words of the title. When working, she reads a theme, makes notes, and at that point cuts and pastes quotes and citations out of the PDFs into Word.
Researcher 6 highlights text for cut and paste into another program for better readability. He puts all excerpts into a single Word document in the order collected and manually enters citations. He then numbers the quotes and prints out the resulting file for reference. He refers to the numbers when he's writing, and later returns to cut and paste in the appropriate content. He prefers to select his own excerpts and never uses citation buttons, as he doesn't trust the results to be in the form he needs. He sometimes uses an email service to send documents to himself.
Researcher 7 downloads the PDF, sometimes placing it in a folder named for the author, and renames it with the publishing date and a phrase describing the content so she can find it again. If she has access to a printer, she also prints the PDF immediately. She cuts and pastes title/author for use in creating a citation, and she counts on the PDF to have this information included in it; if it doesn't, she sometimes has to go back and look it up again. She sometimes copies and pastes metadata into an email to herself. She has not taken the time to learn how to export citations into EndNote. Usually she cuts and pastes citation content into EndNote, then transfers portions to the appropriate fields. Once in EndNote, she attaches the downloaded PDF.
Researcher 8 organizes information by where he found it, and mirrors the organization method of the repository from which it originated. He transcribes descriptions and other metadata from the interface, collecting all his information about people and their interrelationships into a single Word document that he can search. He keeps notes with the downloaded documents in the same folder, and gives them the same filename, based on author (if there is one) and title. He saves digital copies only to print and read them. If he doesn't have enough time to transcribe information from the document description, he requests digital reproductions. He doesn't trust automatically generated citations; instead he creates citations and manually enters them into the Word document. He keeps multiple copies of his documents and notes, including one in a relative's safe in another state.
Researcher 9 transcribes the information she wants, sometimes cuts and pastes citation information, then adds her reflections. She admits that she has no method of organization for her saved documents, so the title of the PDF is very important, especially when she works on her PC, where she can't search the full text. (On her Mac she can use Spotlight to search within the documents.) She keeps a copy of the full citation in her notes, and then later tries to match it up with the author/title on the downloaded PDF. If she can't, she returns to the interface to search for the information again. This researcher uses FileMaker Pro and EndNote.
Researcher 10 pastes image clips into a Word file, typing the citation above each one and dragging the image as large as he can while still keeping it and the citation within a half-sheet. He prints these Word files (up to 100 pages), then cuts the pages in half to make them into notecards, which he organizes in boxes according to topic. He retains the Word files for searching, but they're too large to email to himself due to the image files included.
Researcher 11 downloads PDFs and puts them in a common folder by project. He edits PDFs to keep just the cover page and the pages he needs for his research. He copies and pastes between the digital document and Word. He uses the automatic generator to get citations, uses the email function to send them to himself, and copies and pastes the citations into EndNote.
Appendix IV. Number of Comments by Division, Category, and Subcategory
Display Functionality (224 comments)
- Navigation (72)
- Organization of content (17)
- Orientation (15)
- Within item (14)
- Results list (11)
- Linking (8)
- Search box location (6)
- Extraction (57)
- Text (22)
- PDF (18)
- Citations (10)
- Image (10)
- Annotations (1)
- Intuitiveness (25)
- Presentation (13)
- Operability (8)
- Consistency (4)
- Search (23)
- Accuracy (12)
- Capabilities (11)
- Options (13)
- Sorting and ordering (5)
- Mouseover (3)
- Saving (3)
- Crowdsourcing (2)
- Size issues (13)
- Zoom (8)
- Window (3)
- Readability (2)
- Access (12)
- Interoperability (4)
- Login (4)
- Web access (2)
- Software (2)
- Highlighting (10)
Metadata by location (114 comments)
- Search (49)
- Limiting and refining (35)
- Fielded searching (14)
- Item (35)
- General (7)
- Repository location (5)
- Description (4)
- Creator (4)
- Date (3)
- Publisher (3)
- Subject (3)
- Title (2)
- Origin (for art) (1)
- Context (for art) (1)
- Commentary (1)
- Citation information (for text) (1)
- Results (20)
- Description (5)
- Keyword in context (3)
- Date (3)
- Title (2)
- Creator (2)
- Who has cited the work (1)
- Series detail (1)
- Repository location (1)
- Date (1)
- Format (1)
- PDF (10)
Information (33 comments)
- Scope and coverage indicator (11)
- Instructions (9)
- Explanations (7)
- Number of results (5)
- Alerts (1)
Coverage (22 comments)
- General (13)
- Full text (9)
About the Authors
Jody L. DeRidder is head of Digital Services at the University of Alabama Libraries, where she develops production digitization, usability studies, preservation policies and procedures, and methods of alternative access. Previously, she built and supported digital libraries at the University of Tennessee. When she's not automating processes and improving cross-departmental workflows, she reviews grant proposals, serves on the Society of American Archivists (SAA) Publications Board, and co-chairs the SAA Metadata and Digital Objects roundtable. She has an M.S. in both Information Science and Computer Science. ...
Kathryn Matheny is Digitization Outreach Coordinator at the University of Alabama Libraries, where she works to improve access to and promote the use of digitized materials through subject guides, digital exhibits, usability studies, and social media. While earning a PhD in English, she taught courses in composition and literature, and tutored students in remedial writing. She plans to put this experience to good use in information literacy instruction. She is currently pursuing an MLIS.