This article describes an oral history analog-to-digital reformatting pilot project conducted at the University of Kentucky Libraries for the purposes of preservation and access. The project includes master file creation and a custom interface for searching and retrieving Web mounted audio segments. Through a cost analysis of the project process, this article also explores what can be accomplished in this area with a large target collection and limited funding.
The University of Kentucky Libraries houses the Louie B. Nunn Center for Oral History, established in 1973. Over the last three and a half decades, more than 6,500 interviews comprising over 12,000 interview hours have been conducted on topics including the Frontier Nursing Service of Eastern Kentucky, Family Farming, Civil Rights, Veterans, and Kentucky politics and public policy. Staff and student workers have diligently transcribed, in first draft form, nearly half of the collection. In September of 2005, the Nunn Center and the Digital Programs Department embarked on a two-year pilot analog-to-digital conversion project with two main goals:
On both fronts, a desired solution needed to encompass digitization of the entire corpus of oral histories and include automation processes in order to accomplish goals without major funding. The project funding, which came from the Nunn Center's budget, covered 40 hours per week for students dedicated to the project and provided $5,000 to build an analog-to-digital conversion workstation.
Preserving Oral History
The decision to take on the analog-to-digital conversion of our oral history collection stemmed from a growing concern with magnetic tape as a preservation medium and the introduction of born digital oral history interviews. We had to build a digital preservation plan for our new born digital files; reformatting the older analog tapes seemed an obvious extension of that work as well as a necessary one when considering the impermanence of analog versions, their steady deterioration, and the increasing reliability of digital audio preservation [9, 11, 13]. Magnetic cassette tape in particular may remain stable for as little as ten years. The Nunn Center houses analog interviews far older than a decade, a real and growing preservation concern. But the Nunn Center is not unique in this regard: archives, libraries, and oral history programs worldwide are facing this same challenge. Clearly, archives with unique and significant audio collections need to move now toward preserving them for future user access [9, 2, 14].
Establishing a preservation standard for our digital audio files was not an easy task. Although a growing number of resources propose best practices for analog-to-digital reformatting of spoken word and other audio recordings, there is no authoritative, defined standard. Still, there are areas of concern or disagreement as well as those with obvious consensus, such as file format (uncompressed WAV) and number of channels (preferably two).
The most contentious area among audio specialists is by far digital capture at 16 bit (resolution) and 44.1 kHz (sample frequency rate) vs. 24 bit at 96 kHz (often noted as 16/44 and 24/96, respectively). In short, the higher the bit depth and sampling rate, the greater the capacity to capture information from the audio source, such as cassette or reel-to-reel tape. The choice may seem obvious until all the relevant factors are taken into account.
Humans have a rather dynamic audible perception range of roughly 20 Hz to 20 kHz, with 22.05 kHz generally considered the practical upper ceiling [7, 10, 8]. The Shannon-Nyquist sampling theorem, also called Nyquist's Law, states that to faithfully capture and reproduce a band-limited recording, the sample rate must be greater than twice the highest frequency in the signal; a recording with all frequency components below 22.05 kHz therefore requires sampling at a rate of at least 44.1 kHz. Knowing this to be true, audio digitization practitioners agree that sampling at anything less than 44.1 kHz is a disservice to the process.
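The arithmetic behind Nyquist's Law is simple enough to sketch. The short function below is purely illustrative (the function name and example frequencies are ours, not part of the project's toolchain):

```python
def nyquist_min_sample_rate(max_frequency_hz):
    """Minimum sample rate needed to faithfully capture a band-limited
    signal whose highest frequency component is max_frequency_hz.
    Per the Shannon-Nyquist theorem, the rate must exceed twice the
    highest frequency; twice is the practical floor used here."""
    return 2 * max_frequency_hz

# The CD-audio standard of 44.1 kHz covers the 22.05 kHz ceiling of
# human hearing:
print(nyquist_min_sample_rate(22_050))   # 44100
# Spoken word tops out around 5-6 kHz, far below that ceiling:
print(nyquist_min_sample_rate(6_000))    # 12000
```

The second call anticipates the point made below: speech occupies only a fraction of the frequency range that a 44.1 kHz sample rate can represent.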
But this is where a significant technological debate begins. Many think that digital conversion of magnetic tape above the 44.1 kHz threshold adds little more than noise or tape hiss; in particular, any information that it adds is not pertinent to the recording itself, if it can be heard by humans at all. A growing number of music and aural authorities reason that a higher bit depth and sample rate will indeed capture subtle tones greater than 22.05 kHz, which they contend are present and audible, especially in music recordings. It's not out of the realm of possibility that humans can "hear" frequencies above 22.05 kHz, subconsciously if nothing else. Other species certainly can: a dog's range is 50 Hz - 45 kHz, and a cat's is 45 Hz - 85 kHz; dolphins can hear as much as 200 kHz and elephants as little as 5 Hz. So an extreme tonal range exists whether we humans actually "hear" it all or not. The fact remains, however, that for spoken word recordings there is little evidence to suggest such aural subtleties are relevant or that they warrant the significant file size increase that higher capture settings would incur.
Furthermore, though no less significant, we considered the state of technology at the time the project began. Regardless of what tonalities may or may not exist, one fact was certain: most hardware and software wasn't designed to accommodate a signal above 16/44, making interoperability and long-term preservation of 24/96 files a problem. Many popular and effective software and hardware systems simply don't recognize bit depths and sample rates greater than 16 bit, 44.1 kHz. More importantly, technology, and the standards born of it, is often fueled by market design, as previous technological advances have shown. Take, for example, the war between Beta and VHS video tape in the 1980s. Though the experts touted Beta tape as a far superior medium, the consumer/market desire for VHS led to the extinction of the superior Beta model. It's entirely possible that audio digitization hardware and software capture requirements will take the same path. The more authorities embrace the possibilities offered by 24/96 (certainly if scientific evidence confirms their suspicions), the more likely hardware and software manufacturers will be to increase their signal capture capabilities. Perhaps then 24/96 would be a more prudent choice.

These issues were deciding factors for us. The entire corpus of analog recordings in the Nunn Center is spoken word only. These speech sounds have an upper frequency of only about 5,000 to 6,000 Hz, far less than current digital technology can record. If we were going to capture at 24/96, thus more than doubling our storage requirements and automated processing time, we would need a clear indication that at 16/44 we were losing information in the transfer from analog to digital. We decided that for analog magnetic cassette tape holding recorded speech, we would not capture at 24/96, and therefore adopted 16 bit and 44.1 kHz as our analog-to-digital conversion standard.
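The storage implications of the two capture settings are easy to verify with back-of-the-envelope arithmetic. The sketch below (our own illustration, not project code) computes the uncompressed PCM data rate for an hour of stereo audio at each setting:

```python
def wav_bytes_per_hour(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed PCM storage for one hour of audio:
    samples/second * bytes-per-sample * channels * 3600 seconds."""
    return sample_rate_hz * (bit_depth // 8) * channels * 3600

cd_quality = wav_bytes_per_hour(44_100, 16)   # 16/44 stereo
hi_res = wav_bytes_per_hour(96_000, 24)       # 24/96 stereo

print(cd_quality // 10**6)       # 635  (MB per interview hour)
print(hi_res // 10**6)           # 2073 (MB per interview hour)
print(round(hi_res / cd_quality, 2))   # 3.27
```

At roughly 635 MB per interview hour at 16/44, a 12,000-hour collection is a multi-terabyte undertaking even at the lower setting, which is why the choice mattered so much to us.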
Figure 1: Preservation Master File
File Format: PCM.wav
Building an Analog-to-Digital Audio Workstation
Once we decided on quality benchmarks for converting analog tapes to digital files, we needed hardware and software capable of meeting our specifications. To build our audio reformatting workstation we needed five main devices:
Figure 2 gives a specific listing of all the hardware parts involved.
Figure 2: Analog to Digital Audio Workstation
1. Digidesign 002 recording/mixing board and A/D converter
We chose the Digidesign 002 production system for several reasons. We could have easily chosen a smaller, simpler analog-to-digital device, but we preferred the Digidesign because we anticipate expanding beyond cassette tape conversion at some point. Many of our first generation oral histories are on reel-to-reel tape, for instance, and the expandability of the Digidesign allows multiple machines to operate without a patch bay or constant moving of cords. It's also easy to envision converting other formats such as vinyl discs and video, and the Digidesign can handle the number of machines necessary for this.
The unit is constructed around a mixing board with outstanding technical specifications in function and quality. It has excellent internal analog-to-digital and digital-to-analog converters up to 24-bit/96-kHz with an admirable dynamic range in the neighborhood of 114 dB, removing any need to use the PC's stock sound card or an additional sound card.
The DigiDesign 002 works best with its accompanying ProTools software. ProTools, while exceptional in functionality, is notoriously difficult to learn. We found that Adobe Audition, version 1.5, which is easier to learn and use, suited our needs far better. However, we lost a great deal of functionality with the DigiDesign by taking this route and eventually returned to ProTools.
Capturing the Master File
We decided early on to put the greatest amount of available human effort toward creating the master file for preservation of the audio. We favored a trained ear over automation so as to ensure that we would get the best master digital transfer from the analog tape. Working with an audio specialist employed by the local public radio station and a technician holding certification in audio engineering, we began capturing the audio from the first generation audio cassettes at 16 bit and 44.1 kHz and saving to a two-channel uncompressed WAV file format.
The process was broken down as follows. Before the conversion starts, the trained student operator opens a new file, turns the tape on, then monitors the sound for approximately two minutes to set the appropriate recording level. Once the level is set, the student rewinds the tape, opens another new file, then starts the recording. With record levels set, the digital conversion begins as a direct transfer to WAV, without extraneous hardware or software manipulation beyond that inherent in the system. It's important to recognize that every component of the analog-to-digital conversion chain introduces some amount of signal distortion that can affect the converted digital file to varying degrees; this should be taken into consideration when choosing hardware and software for the workstation, and then monitored throughout the digitization process.
The student operators monitor the signal both visually (watching the visual signal supplied by the ProTools software) and aurally (either with the Sennheiser HD-220 fully enclosed headphones or the Alesis One MK2 Studio Reference near field monitoring system). If the incoming signal from the cassette tape is too hot (also known as clipping and revealed by constant spiking on the graph), the sound becomes distorted and adjustments to lower the signal are required. Conversely, raising the recording level will correct a low signal from a cassette. Perhaps the most prevalent problem with analog tapes of oral histories, however, is the sheer lack of signal on the tape not necessarily due to tape deterioration but by less than stellar field recording. Unfortunately, boosting the recording level for low signal tapes also means an increase in ambient noise, tape hiss, and other annoying artifacts of the recording process such as car horns and slamming doors. Regrettably, little can be done to correct analog recordings that are, for whatever reason, marred by distortion from the beginning. In both of these situations, operators aim for a balance between the coveted clearly audible voice and the overwhelming detritus. Despite adverse possibilities, most recordings are of an ample, consistent level and require almost no adjustments once the initial record levels are set. This allows for less human error and a cleaner digital file.
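The clipping the operators watch for has a simple numerical signature: samples pinned at (or next to) full scale. A toy illustration, with hypothetical sample values (the function name and threshold choice are ours, and real monitoring is done visually and aurally, not with a script):

```python
def clipping_ratio(samples, bit_depth=16):
    """Fraction of samples at or within one step of full scale --
    a crude numerical stand-in for watching the meters spike.
    For signed 16-bit PCM, full scale is +/-32768."""
    full_scale = 2 ** (bit_depth - 1)
    clipped = sum(1 for s in samples if abs(s) >= full_scale - 1)
    return clipped / len(samples)

# A healthy signal sits well below full scale:
print(clipping_ratio([1200, -950, 3000, -2800]))      # 0.0
# A too-hot signal pins repeatedly at the rails:
print(clipping_ratio([32767, -32768, 32767, 1500]))   # 0.75
```

A nonzero ratio in the second case is the numerical counterpart of the "constant spiking on the graph" that tells an operator to lower the recording level.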
Any adjustments to the signal beyond the initial "master file" creation session are primarily done by student operators who have also been trained in the subtleties of sound. To be sure, no filtration or manipulation is ever applied to the master file. This initial analog-to-digital conversion file is treated as the preservation copy. A copy of a master file can be generated to which all filtration and manipulation alterations are applied. The product of this process is a second WAV file that we refer to as the "edited" file.
Each "edited" file is treated on its own merits rather than applying blanket settings to a host of files as a batch process. The operators who generate the master file are already familiar with the file's content and are best able to make the necessary adjustments to the audio signal. Though extensive adjustments, and certainly "restorative" adjustments, are heavily discouraged because of a lack of skill, time, and labor cost, we realized early in the project that some minimal filtration is desirable to improve the aesthetics of the majority of interviews that will be presented to the end user. Settings such as normalization and de-hissing are the most common adjustments used and, because they're applied to a surrogate file, they pose no threat to the preservation master copy. Both the master and edited WAV files are archived, and the edited file is used to automate MP3 derivative production.
We feel that documentation of this preservation process is paramount. For each audio conversion session, encompassing the transfer to digital of one side of a cassette tape, the student operator creates a configuration file in text format. This file holds all of the technical information dealing with the master file creation as well as cassette sleeve metadata and the filtration settings applied to the edited files; the latter will save valuable re-editing production time and labor costs should a file be lost or corrupted. Metadata about the conversion process is also kept in a production database. An MD5 checksum is created for each master file as well, allowing an automated process to verify at any future date that no bits have been lost.
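Computing an MD5 checksum for a large master file is straightforward with standard tooling. A minimal sketch (our own illustration; the function name is hypothetical, though `hashlib` is Python's standard checksum library):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, streaming it in 1 MB
    chunks so a multi-hundred-MB WAV master never has to fit in
    memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In a refreshment cycle, the hex string returned here is simply compared against the checksum recorded at capture time; any mismatch flags a corrupted copy.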
Figure 3: Workflow Diagram
Storage of Master Audio Files
Storage is a major consideration for any digital conversion project. For digital audio projects in particular, large files dictate large capacity storage systems and/or media. In keeping with the general assumption that multiple copies are better than one, we decided to rely on three storage methods encompassing four copies of every master and edited audio file. After making our master and edited files we push them via FTP to an automated mass storage, robotic tape drive system. This allows us to store one copy of our files off site. Two more copies are stored on separate 1.2 TB external hard drives, and a fourth copy is archived to DVD. Our copies on the external drives serve as our near-line primary backups, the tape drive our secondary near-line backup, and the DVD our last option off-line backup. We realize that optical technologies in particular are prone to problems involving writer/disc compatibility, writing speed issues, and dye formulation. None of these backup options can be trusted indefinitely. Some experts estimate a ten-year window for off-line storage approaches such as tape or optical media, not only because of deficiencies in the physical media but also because of software and hardware changes. Our refreshment process, based on a five-year time span, involves moving files to new media and analyzing checksum calculations for file integrity.
Considering and Facilitating Access
As mentioned earlier, access was a secondary goal for this project. It was our hope to use the high quality master digital audio files and automated processes to create Web files for end-user access. It immediately became clear, though, that for online access, more than just the digital audio files were needed. Metadata was necessary to describe the interviews and provide access points for users to perform searches. Full-text transcripts were also desirable for greater access to the interviews and in large measure because researchers and oral historians have traditionally relied on them for study. The digitized audio files are not solely a replacement for full-text transcripts but do serve as accoutrements to the text while providing entrée for younger researchers who increasingly insist on digital access. Our main focus with the interface is to give users the necessary tools they might need or want in order to get the most from our collections and their time. Taking stock of what we have at our disposal and what methodologies more established oral history collections on the Web were employing, we developed a functionality matrix with specified, distinctive automation and human processes. Since we had a student workforce to complete the project, we also established which human processes could be completed by student staff and which would need to be completed by specialized staff such as historians, oral historians, or library metadata specialists.
Figure 4: Functionality Matrix
What we already had at our disposal was an accession database for the entire oral history collection. This included most of the basic metadata elements we needed for online access. We also had first draft transcripts for roughly 2,600 interviews. One of our main approaches was to utilize the full text for searching. At the same time, we felt that getting users to the audio was most important, the audio revealing emotional content not apparent in the transcript. Also, no transcript is perfect, and the majority of those we had at our disposal were not quality controlled by an oral historian but were first drafts created by students and staff. We knew errors had to exist, but the job of establishing quality control over this amount of text was a daunting if not economically impossible task. Similarly, although we recognized the significant usefulness inherent in metadata, the analysis of the text to establish and annotate key concepts and assign controlled subject terms would require considerable time from a specialized staff, a cost we could not afford. This type of access is really only economically feasible for small specialized or partial collections.
Considering these issues, we decided to base our access system on the searching of the oral history metadata and full text of the interview transcripts, the results providing a list of interviews with suggested relevant audio segments of an easily digestible (and easily downloadable) duration. Rather than trying to identify key moments in the audio or breaking it up into logical segments, we store landmarks in its metadata: the line numbers in the transcript associated with five-minute intervals in the audio. In the search results, the user's search terms are highlighted and surrounded by nearby text in the transcript to give context to each search hit, and the line number that each search hit appears on is indicated (see Figure 5 below). The line number range for each five-minute segment of audio is also displayed, so the user can select the segment they want, and even estimate where in the segment they can hear their search term. A slide-bar is also available if the users would like to select their own audio segments by moving the sliders to (or typing in) their own start and end times. To create these landmarks in existing transcripts, a student worker listens to the audio around the five-minute marks, searches the transcript to find the text associated with that audio, and inserts a generic marker there. In the future, such markers can be placed in the transcript as it is being created. After markers have been inserted, a script is run on the interviews to extract and store the line numbers and to replace the markers with timestamps in the transcript. When a user requests an audio segment, the site extracts that segment from the full audio file, on the fly, or simply gets it from cache if it has been recently requested.
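The lookup from a transcript line number to its five-minute audio segment can be sketched with a small amount of code. The landmark values and function name below are hypothetical illustrations of the approach, not the actual DLXS customization:

```python
import bisect

SEGMENT_SECONDS = 300  # five-minute intervals

# Hypothetical landmark data: the transcript line number at which
# each five-minute mark of the audio falls (0:00, 5:00, 10:00, ...).
landmarks = [1, 88, 170, 251]

def segment_for_line(line_no):
    """Map a transcript line number to the (start, end) seconds of
    the five-minute audio segment that contains it, using the stored
    landmark line numbers."""
    idx = bisect.bisect_right(landmarks, line_no) - 1
    idx = max(idx, 0)
    return idx * SEGMENT_SECONDS, (idx + 1) * SEGMENT_SECONDS

# A search hit on line 120 falls in the second segment (5:00-10:00):
print(segment_for_line(120))   # (300, 600)
```

Because only the landmark line numbers are stored as metadata, the audio itself never needs to be pre-cut; the segment boundaries are resolved at request time.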
Another consideration for our project was scalability both in the technological infrastructure and work processes. Keep in mind that our aim was to devise a mode and method for digitizing and providing online access to all 12,000 plus interview hours in the analog oral history collection. We had no doubt that our work processes would handle the large scale as long as we could keep up with the storage requirements over time. We also knew that we would be employing the University of Michigan's DLXS content management system for the online interface. From our extensive experience with this system over the last five years, and our knowledge of other users' experiences, we knew that this was a system capable of providing access to very large collections and that it would allow us, eventually, to deliver the entire oral history collection in digital form. Another benefit of the DLXS system is that the data structures are XML-based and very portable to other systems and metadata formats.
As an extension of our delivery system, audio segments are extracted using SoX <http://sox.sourceforge.net/>, a free and open source sound processing program, along with the MAD <http://www.underbit.com/products/mad/> and LAME <http://lame.sourceforge.net/> libraries, which allow it to decode and create MP3 files.
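Segment extraction with SoX amounts to a `trim START DURATION` invocation on the full interview file. A sketch of assembling that command (the function name and file names are our illustrations; actually producing MP3 output requires SoX built with the LAME library, as noted above):

```python
def sox_trim_command(src, dest, start_seconds, duration_seconds):
    """Build the SoX command line for cutting one audio segment out
    of a full interview file. SoX's 'trim' effect takes a start
    offset and a duration, both in seconds."""
    return ["sox", src, dest,
            "trim", str(start_seconds), str(duration_seconds)]

# The 5:00-10:00 segment of an interview, delivered as a
# five-minute MP3:
cmd = sox_trim_command("interview042.wav", "interview042_seg2.mp3", 300, 300)
print(" ".join(cmd))
```

In a delivery script, a list like this would be handed to the operating system (e.g. via `subprocess.run`), with the resulting MP3 cached for subsequent requests.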
Figure 5: Interface Search Results Graphic
Preparation for the project occurred during August and September of 2005. During this time we completed extensive research to establish our "Best Practice" approach, which included consultation with several experienced audio conversion specialists, and built an analog-to-digital conversion workstation. We then hired and trained our student workforce.
Production of master digital audio WAV files began in earnest in October 2005. After 11 months, we had approximately 320 digitized interviews, so we turned our attention to the user interface. The interface creation and functionality took the better part of two months to perfect. Simultaneously, to support interface needs, we trained our student workforce on time stamp production.
Today, we continue to produce master and edited WAV files and derivative MP3 audio files, time stamp each corresponding transcript, and refine our workflow as technology dictates.
Project Staff and Cost Analysis at 14 Months (September 2005 to December 2006)
Our project staff includes three student workers who combined complete 40 hours per week of production involving analog-to-digital conversion, edited file and derivatives creation, backup procedures, and five-minute time stamping. At various stages of the project we have employed specialized staff for specific and one-time processes. These included collaboration with one of our digital lab technicians, Kopana Terry, who holds certification as an audio engineer; and with one of our graduate assistants, Kathryn Lybarger, who holds advanced degrees in math and computer science. Ms. Terry helped set up our workstation and trained the students on analog-to-digital conversion. Ms. Lybarger customized the DLXS software for our desired interface functionality. Jeffrey Suchanek, director of the Louie B. Nunn Center for Oral History, has served as a constant source of information about the character of oral histories and how they have been gathered over the years at the University of Kentucky. Eric Weig's primary role has been to coordinate the project, publish completed interviews within the DLXS system, and complete interface design for the access component of the project.
The content created through this effort is hosted by the Kentuckiana Digital Library <http://kdl.kyvl.org>, a state-wide gateway to rare and unique digital resources documenting the history and heritage of Kentucky. This allowed the project to avoid startup Web server and content management software costs. We are also in a unique and fortunate situation, having 2,600 first draft transcripts already completed. We have therefore avoided the expense of this process during the first year of production. Since these transcripts were produced over the last three decades, it is difficult to precisely calculate their cost. However, the production of new first draft transcripts is accomplished by students, and we know that it typically takes five hours to transcribe one hour of audio. What follows is a breakdown of costs associated with the first thirteen months of our audio reformatting project.
Cost of Personnel
40 hours per week of student labor for 52 weeks ($6.30 per hour):
10/01/05 - 10/13/06 - analog-to-digital conversion = $13,104
11/01/06 - 12/01/06 - time stamping (120 files time stamped) = $1,008
30 hours total of graduate student programmer time to customize the online interface ($8.00 per hour) = $240
15 hours per week of an audio engineer technician's time over 52 weeks of production to train and support students ($17.00 per hour) = $13,260
1 hour per week of project oversight by a librarian for 52 weeks ($30.00 per hour) = $1,560
Total personnel cost for first thirteen months is $29,172.
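The personnel line items above can be checked with a few lines of arithmetic (the dictionary keys are our own labels for the figures listed; the first item is simply 40 hours/week times 52 weeks times $6.30/hour):

```python
# Personnel line items, in dollars, from the breakdown above.
personnel = {
    "student conversion labor": 40 * 52 * 6.30,   # $13,104
    "student time stamping":    1_008,
    "graduate programmer":      30 * 8.00,        # $240
    "audio engineer trainer":   15 * 52 * 17.00,  # $13,260
    "librarian oversight":      1 * 52 * 30.00,   # $1,560
}

print(round(sum(personnel.values()), 2))   # 29172.0
```

The sum confirms the $29,172 personnel total; adding the $2,996.72 in storage costs gives the $32,168.72 production total reported below.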
Cost of Storage
Figure 6: Storage Calculation Chart
Total Storage Expended for WAV: 391,800 MB + 195,900 MB = 587,700 MB (approximately 588 GB)
Storage space required for the first thirteen months is 610 GB.
Breakdown of Cost
IBM Tivoli HSM Robotic Tape Drive Mass Data Storage: IBM quotes the system at $1,200 per TB = $1,200
2 1.2 TB portable external drives at $700 each = $1,400
116 MAM-A/Mitsui Gold DVD-R discs at $3.42 each = $396.72
Total storage cost for first thirteen months is $2,996.72.
Production Costs: Personnel and Storage
Total Interviews Converted: 327
Total files time stamped: 96
Total production cost for first thirteen months is $32,168.72.
Analog-to-digital audio reformatting projects involving spoken word recordings are becoming more feasible as best practice consensus emerges in the library community and digital processes become established preservation strategies. Over the last thirteen months, the analog-to-digital reformatting project at the University of Kentucky Libraries has successfully deployed a production workflow for analog-to-digital conversion to create master files for preservation as well as a workflow and interface to serve audio files to users on the Web. Our approach has been, in large part, to use a student workforce. This has been advantageous as cost has been kept to a minimum, making the full production of 12,000 interview hours less economically prohibitive.
1. Christel, M. G., Richardson, J., Wactlar, H. D. 2006. Facilitating access to large digital oral history archives through informedia technologies. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries. Online: <http://www.cs.cmu.edu/~hdw/jcdl06_p194_christel.pdf>.
2. Cohen, E. 2001. Folk heritage collections in crisis. Washington, DC: Council on Library and Information Resources and the Library of Congress. Online: <http://www.clir.org/pubs/reports/pub96/contents.html>.
3. Collaborative Digitization Program Digital Audio Working Group. 2006. Digital audio best practices. Online: <http://www.cdpheritage.org/digital/audio/documents/CDP_DABPv2_1.pdf>.
5. Grotke, R. W. 2004. Digitizing the world's largest collection of natural sounds: key factors to consider when transferring analog-based audio materials to digital formats. RLG DigiNews, Vol. 8, Number 1. Online: <http://www.rlg.org/en/page.php?Page_ID=13201>.
6. Gustman, S., Soergel, D., Oard, D., Byrne, W., Picheny, M., Ramabhadran, B., Greenberg, D. 2002. Supporting access to large digital oral history archives. Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries. Online: <http://www.glue.umd.edu/~oard/papers/jcdl02.pdf>.
7. Henderson, T. 2004. The physics classroom: sound properties and their perception. Retrieved February 20, 2007, from <http://www.physicsclassroom.com/Class/sound/u11l2a.html>.
8. MATRIX: Center for Humane Arts, Letters and Social Sciences at Michigan State University. 2001. Working paper on digitizing audio for the National Gallery of the Spoken Word and the African Online Digital Library. Retrieved May 13, 2007, from <http://www.aodl.org/audio.php>.
9. National Recording Preservation Board. 2006. Capturing analog sound for digital preservation: report of a roundtable discussion of best practices for transferring analog discs and tapes. Washington, DC: Council on Library and Information Resources. Online: <http://www.clir.org/pubs/abstract/pub137abst.html>.
10. Panasonic Corporation. Description of Panasonic HDD5 cassette tape. Retrieved February 20, 2007, from <https://eww.pavc.panasonic.co.jp/pro-av/sales_o/02products/tapes/tapes.html>.
11. Seadle, M. 2001. Sound practice: a report of the best practices for digital sound meeting. RLG DigiNews vol. 5, no. 2. Online: <http://www.rlg.org/preserv/diginews/diginews5-2.html#feature3>.
12. Shannon, C. E. 1998. Communication in the presence of noise. Proceedings IEEE, Vol. 86, No. 2.
13. Smith, A., Allen, D. R., and Allen, K. 2004. Survey of the state of audio collections in academic libraries. Washington, DC: Council on Library and Information Resources. Online: <http://www.clir.org/pubs/reports/pub128/pub128.pdf>.
15. White, R. W., Song, H., and Liu, J. 2006. Concept maps to support oral history search and use. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries. Online: <http://research.microsoft.com/~ryenw/papers/WhiteJCDL2006.pdf>.
Copyright © 2007 Eric Weig, Kopana Terry, and Kathryn Lybarger