Music to My Ears: The New York Philharmonic Digital Archives
The New York Philharmonic's Digital Archives made its debut with programs, scores and other documents dating from 1943-1970, the International Era, which traces Leonard Bernstein's association with the New York Philharmonic. The Philharmonic plans to digitize its entire collection of 8 million pages of documents and 7,000 hours of audio visual material, reflecting the Philharmonic's commitment to providing the broadest possible access to its collections. Before the launch of the Digital Archives, I met with the New York Philharmonic's Digital Archives Project Manager, Mitch Brodsky, to discuss this digitization project's mission and history, its open source content management systems, its metadata and the metadata's role in information retrieval, and digital asset management issues.
The New York Philharmonic's Digital Archives has launched with more than 3,200 programs, more than 1,000 scores marked by past conductors such as Bernstein and Mahler, and other documents from 1943-1970, known to the Philharmonic as the International Era, which traces Leonard Bernstein's association with the New York Philharmonic. The project plans to finish this first phase of digitization by 2012 (as per its $2.4 million grant from the Leon Levy Foundation) and, as additional content is digitized, to eventually offer the entire collection: 8 million pages of documents and 7,000 hours of audio visual material, including scores, videos, business documents, correspondence, and programs.
Spearheaded by the New York Philharmonic's Senior Archivist Barbara Haws, the Digital Archives reflects the Philharmonic's commitment to providing the broadest possible access to its collections. The project was structured and planned in 2009 with cataloging efforts that laid the groundwork for digitization, which began in 2010 and is ongoing.
In Fall 2010, before the launch of the Digital Archives, I met with the New York Philharmonic's Digital Archives Project Manager, Mitch Brodsky, to discuss this all-encompassing digitization project's mission and history; open source content management systems; metadata and its role in information retrieval; and digital asset management issues. The following report on our discussion is presented as an interview.
CT: What is the purpose and the scope of the New York Philharmonic Archives?
MB: Just like any institutional archives, the purpose of the New York Philharmonic Archives is to collect, preserve, and provide access to the collective memory of the organization. The NY Philharmonic was founded in 1842 as a cooperative of musicians, making it the oldest symphony orchestra in the United States and the third oldest in the world. We are very lucky that the founding musicians found it necessary to keep documentation of everything that went on, and we are even luckier that this material has not been lost over time. Thanks to the diligence of the institution and the embedded "save everything" culture within the Philharmonic, we have the largest, most comprehensive collection of its kind in the world.
CT: Including ticket stubs?
MB: Well, we don't keep all ticket stubs for every concert, but we do keep ticket stubs for important concerts, such as world premieres, soloist debuts, or other landmark events.
CT: You mentioned outside storage. The space that you showed me earlier is a relatively small space. How much of your current collection is located offsite?
MB: Roughly one third of our business record collection is offsite. Everything else (scores, images, programs, press clippings) is onsite.
CT: You need to make extensive use of every single spare inch?
MB: We do. When you're dealing with a project to digitize 1.3 million pages in a 3-year time span, you have to be very smart about how space is used so there is enough room to accommodate physical processing workflows while making sure nothing gets lost in the shuffle. We are moving huge quantities of material on and off the shelves every day, which can be dangerous. Every square inch of shelving, floorspace, and table space must have a purpose. Otherwise, things would get very disorganized very quickly. The project funding from the Leon Levy Foundation allowed us to actually recreate the space for the digitization project. We purchased computers, moved portions of the collection offsite, and made space for tables where the photography is done. But the important thing to emphasize when it comes to space is that it is all very carefully controlled.
CT: What are the current issues relating to the collection documentation, access, and storage materials, particularly as they pertain to performing arts material?
MB: The documentation of our collections is actually quite remarkable. Every program from December 7, 1842 until last night is described in great detail in a single database. The Playbill editor, orchestra librarian, personnel manager, and artistic administrator use the same database as the archives, helping to ensure consistency in descriptions. All business records (folders and boxes), audio, and video are kept in an Inmagic DB/Textworks database.
In terms of access, we are visited by scholars and musicians from all over the world, not to mention our own staff and orchestra members. Archival material through 1970 is open for research and we welcome visitors. The central purpose of the digitization project is to provide access to this wealth of material to anyone in the world, cutting out the expense of traveling to New York to do research here. In addition, the researcher is not under a time constraint to complete his or her work before we close the doors at 5:30. If an inspiration strikes in the middle of the night, anywhere in the world, the scholar has the opportunity to pursue it through the Digital Archives.
CT: What's going to happen after this digitization project ends, in regards to access to these materials? Will people still have physical access to them?
MB: There are no plans to formally change our access policies for anyone who would like to visit the archives. That said, if a researcher insists on utilizing a physical item that can be viewed on our site, I would have to ask why. We believe the Digital Archives site provides a user experience that is actually better than sitting at the reading room table with the physical item. Our imaging standards and the viewing application online were designed to allow users to see things they would otherwise need a magnifying glass to see. So, if a researcher feels that he or she needs to see the physical item, it means that we need to improve our techniques or presentation. Also, we have to consider that one purpose of digitizing material is to lessen handling of the physical items. So generally, unless there is a very compelling reason, we will avoid handling the physical items for research purposes once they are digitized.
CT: In the Herculean effort to identify, preserve and digitize the Philharmonic's holdings in its entirety for the public, what challenges have you faced establishing a digital workflow and other digitization procedures?
MB: Since we're digitizing such a huge quantity of material, 1.3 million pages, we had to take an approach that seems a little like an assembly line. In other words, we take each step that has to happen and apply it to batches of material, instead of always treating each item individually. Batch processing is very important to us. Some challenges that come up involve the little details that end up making a big difference to users. For instance, how do you replicate a paperclip on a group of digital pages in a folder? After a lot of experimenting, we ended up using little colored gems in the upper right corner of any pages in a folder that were grouped together physically. In order to know where to apply this gem, we had to place colored sheets in the folder to give instructions to the photographers. Thanks to the diligence of the archival staff leading up to this project, the metadata for each item was in good shape before we even started, and metadata application is often a big hang-up for institutions trying to do large-scale digitization projects.
CT: What is your process for selecting what gets digitized first?
MB: We divide the Philharmonic's history into three major time periods: The Founding Era (1842-1908), The Modern Era (1909-1943), and the International Era (1943-1970). Not only does the International Era frame the years of Leonard Bernstein's relationship with the Philharmonic from his debut to the end of his tenure as Music Director, but it was the time when the United States emerged as a world power, with New York serving as the cultural capital. As well, this time period is rich in audio and video sources so we can test our assumptions on how all of these different formats relate to each other. Ultimately, it was chosen as a result of a roundtable discussion that included librarians, historians, musicians, conductors, journalists, and students before the start of the project. The "About Us" section on the site gives a great description of the significance of this era in terms of the New York Philharmonic. Anyway, the hope is that when we are finished digitizing the International Era, we will continue with the other two periods, beginning with 1842.
CT: Have you dealt with ownership of separate copyright, duration, and scope of copyright on the Internet issues while undergoing this project?
MB: All of the original material in the business folders was produced as work for hire. As for the marked conducting scores, what we are making available are the conductor's markings. Our emphasis is on the interpreter. Where the underlying item is still in copyright, that copyright is still protected. Only scores with markings are viewable and none of them can be downloaded. We make a clear statement that these items are to be used for educational purposes only and have provided all the publishers' contact information for other uses.
CT: What questions arise about applying traditional archival definitions to performing arts digital preservation?
MB: Whether in the performing arts field or not, there are fundamental differences in how we must think about preservation of physical artifacts versus digital assets. In the physical archives world, one major way we preserve things is to leave them alone: apply metadata, file it in an acid-free container, put it on a shelf in a climate controlled facility, and never touch it again. We know that under the right conditions we can preserve paper for hundreds of years.
Digital preservation is really the opposite. It requires constant planning, auditing, migrating, assessing, re-planning, and so on. Technology changes so rapidly that no matter what storage mechanism is used, it will have to be changed within 5 years. Another challenge we have is that while it's easy for everyone to say "storage is cheap," enterprise-level storage and backup in a complex IT infrastructure is not. This project infrastructure was built to scale up to 2 petabytes of data. There is no cheap or easy way to store 2 petabytes, let alone backup, audit, maintain, and migrate that quantity of data.
However, since we are a performing arts institution that records nearly every one of its concerts, the challenge of how to archive all of this material has been in the forefront of our thinking for years. In the early 1980s, the Philharmonic started recording concerts on one of the earliest digital formats, which is obsolete now, simply because it captured the highest quality sound at the time. It's up to the Archives to make sure it's still accessible 100 years from now. For us, optical media is not big enough, hard drives are not reliable enough, and tape can be complex due to rotation schemes. We've found that the best we can do is keep our data on or near-line, and go through a constant cycle of assessing, planning and migrating. As technology and techniques improve, we will improve with them.
CT: What kind of software are you using to complete the project?
MB: Our online system is a combination of the Alfresco Enterprise content management system, Solr (search server), CodeIgniter (PHP framework), Vanilla forums (for item-level threaded discussion), PHP List (a mailing list application), and the GNU Bookreader (viewing application), all of which are open source and have been customized and configured for our project. We have a strong commitment to open source tools and software, and we have chosen platforms, such as Alfresco, that have strong developer communities and wide user bases. I also use a lot of open-source tools on the project management side.
CT: Which ones for example?
MB: I do all text editing with Notepad++. It is a great all-purpose editor that can detect what programming language you are working with and format your text accordingly (among lots of other great features). I also use the open-source LogExpert to follow server logs. Some other applications we use that are free for personal use and inexpensive for commercial use are ReNamer by Denis Kozlov and Karen's Directory Printer.
CT: Are you developing your own plug-ins or project templates for these open source applications?
MB: The best example of an open-source application that we built upon is the GNU Bookreader developed by the Internet Archive. We added a loupe magnifier, 90° rotation, the ability to call up book metadata, and audio/video players. The hope is that one day we will be able to share our implementation with other institutions looking to follow our model.
CT: You're digitizing your whole archive. Then what? What do you plan to do with all those digital assets?
MB: Part of our plan, in addition to making the material available online, is to utilize Alfresco for digital asset management internally, for everything from current concert recordings to email. We plan to use Alfresco to store our born-digital materials so that one day, Alfresco will be the single repository to bring together past, present, and future. We do not have any specific plans for re-purposing any of the digital assets created from this project, but it will obviously be a tremendous resource for the organization.
CT: At the New York Archives Conference last summer, Senior Archivist Barbara Haws gave a presentation describing a very highly tailored metadata schema, one specific to the New York Philharmonic's holdings. Tell me about that.
MB: I suppose you could call it the New York Philharmonic schema. Our philosophy on metadata is that we don't want to fit a square peg in a round hole. We know our data, and we know how people seek information in our data stores and why. The metadata is very specific and well-structured, so we don't see the point in mapping it all to some other system that doesn't mean anything to us. Also, the Philharmonic has developed its own style for music compositions that appear throughout the institution, on the web and in its publications, and these are at the core of our metadata in programs and scores.
For instance, what we track in our program metadata goes far beyond what any existing metadata model can give us, at least that I've heard of. We track not just the composers on a program and the works on a program and a soloist on a program, but we also build relationships between specific works on a program and the conductor and/or soloist that performed the work. This allows us to find the performance history of any artist in a flash. On the Digital Archives site, we tried to represent this using faceted search, which allows you to choose from categories on the left to narrow your search.
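As an editorial illustration of the relationships described above, here is a minimal Python sketch. It is not the Philharmonic's actual schema: the class names, field names, and placeholder records are all hypothetical, invented only to show how linking each work on a program to its conductor and soloists makes both an artist's performance history and simple facet counts easy to compute.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkPerformance:
    """One work on one program, linked to the artists who performed it (hypothetical model)."""
    composer: str
    title: str
    conductor: str
    soloists: tuple = ()


@dataclass
class Program:
    """A single concert program holding its performed works (hypothetical model)."""
    program_id: str
    date: str  # ISO date string so that plain sorting is chronological
    works: list = field(default_factory=list)


def performance_history(programs, artist):
    """Return sorted (date, composer, title) tuples for works the artist conducted or soloed in."""
    history = []
    for program in programs:
        for work in program.works:
            if artist == work.conductor or artist in work.soloists:
                history.append((program.date, work.composer, work.title))
    return sorted(history)


def facet_counts(programs, attribute):
    """Count works per value of one attribute, as a faceted search sidebar might."""
    return Counter(getattr(work, attribute) for p in programs for work in p.works)


# Placeholder records, invented purely for illustration.
programs = [
    Program("P1", "1960-01-01", [
        WorkPerformance("Composer A", "Symphony No. 1", "Conductor X"),
        WorkPerformance("Composer B", "Piano Concerto", "Conductor X", ("Soloist Y",)),
    ]),
    Program("P2", "1961-02-02", [
        WorkPerformance("Composer A", "Overture", "Conductor Z", ("Soloist Y",)),
    ]),
]

print(performance_history(programs, "Soloist Y"))
print(facet_counts(programs, "composer"))
```

Because the link is stored at the level of the individual work rather than the whole program, a soloist who appeared in only one piece on a concert is credited with that piece alone.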
CT: A lot of conversation is currently centered on the Semantic Web and Linked Data, and there is a push towards programming languages, and ways of highlighting relationships between creators and within objects and items to improve information retrieval. What are your thoughts on this?
MB: I think there's a lot of potential in the Semantic Web for data retrieval on a large scale. But it's still not clear to me on a practical level how this would help someone like me complete this project. What we're looking at is a database with relationships, and we need software that can index our terms, preserve our relationships within the index, and then allow searchers to find terms in our index and utilize relationships. It's very practical, and there are already well-tested and widely implemented search systems which we can utilize for our purpose, such as Solr/Lucene. For instance, we have composer names hyperlinked in our metadata online. When you click the name, the code makes a query to our index and gives you a list of all other places where that name occurs.
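As an editorial illustration, the hyperlinked-name lookup described above could be sketched roughly as follows. This is an assumption-laden example, not the project's actual code: the host, the core name `archives`, and the field name `composer` are hypothetical. Only the general shape of a Solr select request (the `q`, `rows`, and `wt` parameters) is standard.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host and core name; a real deployment would differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/archives/select"


def composer_query_url(name, rows=20):
    """Build a Solr select URL that asks for every indexed item naming the composer."""
    # Escape backslashes and double quotes so the name is safe inside a quoted phrase.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'composer:"{escaped}"',  # "composer" is an assumed field name
        "rows": rows,                  # maximum number of results to return
        "wt": "json",                  # ask Solr for a JSON response
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = composer_query_url("Gustav Mahler")
print(url)
# Decode the query string again to show the phrase query that Solr would receive.
print(parse_qs(urlsplit(url).query)["q"])
```

Clicking a name in the metadata would then amount to requesting a URL like this one and rendering the returned list of matching items.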
CT: Instances where this name pops up?
CT: I can imagine the discussion that must occur when it comes to setting limits, or to deciding where to stop in the design process.
MB: We never put limits on what we envision in an ideal world. But due to time constraints and in some cases, reliability of the technology, we need to prioritize which items we can tackle now and which items will be left for later. Since we are pioneers here, we are constantly learning and re-evaluating. The process of creating the Digital Archives is iterative and flexible. We often start out with a vision of how something's going to work, we start execution, we tweak our design (or our expectations), and eventually we deploy to the live environment. Often, what gets deployed in the end is a little different than the original vision.
CT: So there is a testing period after you develop the model. How long is your testing period? Two months, three months?
MB: We don't have a defined timeframe for testing. It is different depending on the task. We are still making modifications to how we process business records, and that workflow was thought up and executed for the first time about a year ago. Also, all digital images are proofed in a live environment by our archives staff so we are constantly seeing what works and what doesn't and implementing suggestions to make the search and viewing experience better. The point is, we have tried to develop a flexible system that we control; one that allows for incremental, cost effective and easily implemented changes as we learn and go forward. For application and web development, the process is more clear-cut: create specifications, allocate resources, deploy in the development environment, test, and eventually deploy to the production environment. Depending on the task, it can take anywhere from a day to two months.
CT: Are you documenting this whole procedure as you're going through with the project?
MB: Yes, we are careful to document everything we do in the execution of this project, from workflows to software to storage and beyond. Part of our responsibility is to share what we have learned in this process with others who might want to begin their own digitization projects. I hope that other archives, libraries, and museums will view our project as a viable model on which to base their own projects.
CT: What are the steps after the digitization, in regards to preservation of the materials and outreach, to let patrons know how to use this newly accessible, digital database that you're starting from the ground up?
MB: There have been numerous print and online articles that have driven a lot of traffic to the site (more than 30,000 unique visitors from around the world in the first month). We are working to develop partnerships with university history departments and music conservatories to create curricula that will include the digital material, and we're planning a symposium for the end of 2012. We want everyone to help us spread the word!
In terms of digital preservation, we are investigating various storage and backup systems and technologies. The data is being kept in enterprise Isilon storage, which gives us the ability to scale without sacrificing performance, and we are forming a digital preservation plan to determine how often to audit, backup, and migrate the data. From here, the repository will only grow. The hope is that in ten years, everything between 1842 and 1970 will be digitized.
CT: Even the ticket stubs?
MB: Even the ticket stubs.
Mitch Brodsky, Digital Archives Project Manager at New York Philharmonic, can be reached at firstname.lastname@example.org.
About the Author