Volume 5 Number 11
Encoded Archival Description
An Introduction and Overview
Daniel V. Pitti
Institute for Advanced Technology in the Humanities
University of Virginia
Encoded Archival Description (EAD)1 is an emerging standard used internationally in an increasing number of archives and manuscripts libraries to encode data describing corporate records and personal papers. The individual descriptions are variously called finding aids, guides, handlists, or catalogs. While archival description shares many objectives with bibliographic description, it differs from it in several essential ways. From its inception, EAD was based on SGML, and, with the release of EAD version 1.0 in 1998, it is also compliant with XML. EAD was, and continues to be, developed by the archival community. While development was initiated in the United States, international interest and contribution are increasing. EAD is currently administered and maintained jointly by the Society of American Archivists and the United States Library of Congress. Developers are currently exploring ways to internationalize the administration and maintenance of EAD to reflect and represent the expanding base of users.
There are many reasons for developing a community-based standard for the encoding of archival description. As archives increasingly employ computer and network technology to create and maintain essential, valuable information, they need reasonable assurance that the information they create will endure rapid changes in hardware and software. If archivists do not take this requirement into consideration, then they will find -- indeed, many have already found -- that information created yesterday is no longer usable today. Hardware- and software-independent encoding standards offer the only reasonable assurance of enduring information. In order to exploit archival description fully in a computer environment, archivists need to represent faithfully and accurately its intellectual nature and content. The logical components of archival description and their relations to one another need to be accurately identified in a machine-readable form to support sophisticated indexing, navigation, and display that provide thorough and accurate access to, and description and control of, archival materials. In addition to these benefits, uniform, standard descriptions will make it easier for archivists and researchers alike to readily identify and comprehend the essential components of archival description, thereby making the descriptions "far more useful than the present chaos of irregularities."2
The most appealing reason for standardizing the encoding of finding aids, however, is that standardization will support the long-cherished dream of providing archivists and both professional and public researchers universal, union access to primary resources. Standardization will make it possible to build union access (through union databases, but more ideally through union indexes) to archival descriptions originating in repositories throughout the world, which will enable users to discover or locate archival materials at any time and from any place. Such access will enable libraries and archives to easily share information about related but different records and collections, and dispersed records and collections. Standardized description will also enable the "virtual" reintegration of collections related by provenance, but dispersed in different repositories.
Archives share with libraries the responsibility to remember on behalf of others. Archives differ from libraries in the nature of the things remembered. Libraries collect individual published books and serials, or bounded sets of individual items. The books and journals libraries collect are not unique, at least not in ways that are of general interest. Multiple copies of one publication exist, and any given copy will generally satisfy as well as any other copy. The materials in archives and manuscript libraries are the unique records of corporate bodies and the papers of individuals and families. The records and papers are the unselfconscious byproducts of corporate bodies carrying out their functions and responsibilities, and of individuals or families living their lives. All records or documents generated by one corporate body or individual or family are referred to as a collection, or fonds3. A fonds is defined in the International Council on Archives (ICA) General International Standard Archival Description (ISAD(G))4 as:
The whole of the documents, regardless of form or medium, organically created and/or accumulated and used by a particular person, family, or corporate body in the course of that creator's activities and functions.
In contrast to the published items collected by libraries, the identifiable object of interest in the archive is a complex body of interrelated, unique materials. The fonds coheres and is identifiable because all of its records or papers share a common provenance, derived from one source and context. While a fonds may contain just one item, it frequently comprises hundreds or thousands, and sometimes millions. Like serial publications, many fonds are open, with additional materials added over time. The items in a fonds are frequently manuscripts and typescripts, but they may be in any form or medium: plans, drawings, charts, maps, photographs, audio, video, audio-video, electronic records of all kinds, and so on.
Archives and libraries differ not only in the nature of what they remember, but on behalf of whom they remember. While the accessibility of the Internet is rapidly changing the user communities of both libraries and archives, they have traditionally served different, though overlapping, communities. Libraries have generally served the public, and educational and scholarly communities. Many archives, in turn, serve the law, functioning as the institutional memory of specific corporate bodies. Government agencies, public institutions, and businesses have legal requirements pertaining to the keeping of records. Archives and manuscript libraries also remember on behalf of history, which is to say, they preserve a large portion of the raw material on which our historical understanding is based. Both legal and historical memory require a high degree of user confidence in the authenticity and integrity of records and documents. The materials in archives and manuscript libraries are evidence, both legal and historical.
The distinction between what and for whom libraries and archives remember accounts for the major differences in archival and bibliographic description. A bibliographic description, such as that found in a MARC record, represents an individual published item, and thus is item-level. There is a one-to-one correspondence between the description and the item. The description is based on, and is derived from, the physical item. Archival description represents a fonds, a complex body of materials, frequently in more than one form or medium, sharing a common provenance. The description involves a complex hierarchical and progressive analysis. It begins by describing the whole, then proceeds to identify and describe sub-components of the whole, and sub-components of sub-components, and so on. Frequently, but by no means always, the description terminates with a description of individual items. The description emphasizes the intellectual structure and content of the material, rather than its physical characteristics.
For the materials in a fonds to function as both legal and historical evidence, it is important to document the context of their creation, and provide an analysis of their internal structure and content. In archival terminology, this is called the respect des fonds. It encompasses two important principles: provenance and original order. In archival description, documenting provenance involves supplying an administrative history for a corporate body, or a biography for an individual or family. In both the history and biography, functional responsibilities and activities are described. To document the original order, a scope and content analysis provides details on the functions, activities, dates, and geographic areas covered; documentary arrangement and forms; and subjects represented. Archival description is thus collection- or fonds-level, and involves a detailed, hierarchical analysis of the whole and its sub-components, with an emphasis on provenance and the organization, arrangement, and content of the material.
It is worth noting that archival description also differs from bibliographic description in another significant way. Bibliographic descriptions, by and large, are brief. An entire description frequently can be contained on one card, or perhaps two or three. Archival description also can be brief -- especially if the unit of description is one item or the description constitutes only a summary -- but it also can be a thousand or more pages in length. Detailed archival descriptions average 15-30 pages in length.
Encoded Archival Description5
The standardization of archival description requires several interrelated standards.6 First, there needs to be a standardization of the essential components or categories of description, and the interrelation of these categories. This constitutes the intellectual semantics and syntax for archival description. This is essentially a structural framework which is comprehensive rather than prescriptive. ISAD(G) is the International Council on Archives' structural standard for archival description. Second, there needs to be a content standard, with specifications on required and optional categories, how to compose, and what to include in each category. Third, standard rules and authorities are needed for highly controlled information such as geographic, country, and language codes; personal, corporate, and family names; and subjects. Finally, there must be a standard communication format or syntax representing the structural standard. The communication standard enables information sharing between computers and between people. Encoded Archival Description (EAD), based on ISAD(G), is an archival description communication standard.
Encoded Archival Description is in the form of a Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML) Document Type Definition (DTD).7 SGML is a hardware- and software-independent standard maintained by the International Organization for Standardization (ISO) for developing encoding schemes for textual material. SGML was first published in 1986, and has enjoyed great success in government, industry, and academia. Because SGML is complex, with challenging features, programmers have found developing software difficult. XML is a compatible subset of SGML developed by the World Wide Web Consortium (W3C) and approved in February of 1998.8 XML and the companion standards, Extensible Stylesheet Language (XSL) and XML Linking Language (XLink), provide most of the functionality of SGML and related standards (DSSSL and HyTime) without the more complex and challenging features that have made SGML software development difficult. XML has been widely embraced by software developers and appears to be fostering the broad and diverse software market envisioned by its designers. As a DTD compliant with both SGML and XML, EAD is well positioned to take advantage of both existing SGML software and, as it emerges, XML software.
Reflecting ISAD(G), the EAD DTD emphasizes the hierarchical nature of archival description and inheritance of description. A diverse set of descriptive elements is available for describing the whole of a collection or fonds. Following the description of the whole, the same elements are available for describing components of the whole, components of the components, and so on. At each level of description, only that description which applies to the entire level is given. Each lower level inherits the description of the containing or superior level. For example, the name of the repository would only be given in the description of the whole, and not repeated in the description of sub-components.
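The inheritance principle described above can be illustrated with a minimal Python sketch. It assumes a heavily simplified fragment that has not been validated against the EAD DTD, and the titles and repository name are invented for illustration. The function name effective_description is hypothetical. The sketch resolves the description that applies at each level by combining what the level states itself with what it inherits from the levels that contain it.

```python
import xml.etree.ElementTree as ET

# Simplified, unvalidated fragment; all names are illustrative assumptions.
SAMPLE = """
<archdesc level="fonds">
  <did>
    <unittitle>Example Family Papers</unittitle>
    <repository>Example University Library</repository>
  </did>
  <dsc>
    <c level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>Letters, 1901-1910</unittitle></did>
      </c>
    </c>
  </dsc>
</archdesc>
"""

def effective_description(node, inherited=None):
    """Yield one dict per level: the level's own <did> data plus inherited data."""
    inherited = dict(inherited or {})
    own = {}
    did = node.find("did")
    if did is not None:
        for child in did:
            own[child.tag] = (child.text or "").strip()
    # A level's own statements override whatever it inherits.
    merged = {**inherited, **own}
    yield merged
    # A title identifies one level only, so it is not passed down.
    passed_down = {k: v for k, v in merged.items() if k != "unittitle"}
    # Components sit inside <dsc> at the top level and directly inside <c> below it.
    dsc = node.find("dsc")
    container = dsc if dsc is not None else node
    for component in container.findall("c"):
        yield from effective_description(component, passed_down)

resolved = list(effective_description(ET.fromstring(SAMPLE.strip())))
for entry in resolved:
    print(entry)
```

The repository is stated once, at the level of the whole, yet it is available for the file-level component because each lower level inherits from the level that contains it.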
The EAD DTD contains three high-level elements: the <eadheader>, <frontmatter>, and <archdesc>. The <eadheader> is used to document the archival description or finding aid, while the <frontmatter> is used to supply publishing information such as a title page, and other prefatory text. The <archdesc> contains the archival description itself, and thus constitutes the core of the EAD.
The <archdesc> contains several high-level descriptive categories which themselves contain more detailed descriptive categories. The most important of the high-level elements is the <did> or descriptive identification. The purpose of the <did> is to supply the information essential for the user to identify the materials and to make a reasonable judgment concerning their relevance. The <did> thus contains elements for identifying the title, creation dates, creator, extent, and holding repository, as well as an element for supplying an abstract of the scope and content of the materials and a brief biography or history of the creator. Following the <did> are elements for providing administrative information, such as restrictions on access or use (copyright); detailed biographies and histories, and scope and contents; related materials; controlled access; and so on. Finally, the <archdesc> contains an element that facilitates a detailed analysis of the components of a fonds, the <dsc> or description of subordinate components. Following the principle in ISAD(G) that all elements of description be available at all levels in the hierarchical description, the <dsc> contains a repeatable, recursive element, the <c> or component, that has all of the descriptive elements in it that the <archdesc> has. <c>s can thus be "nested" inside of <c>s to any level needed to fully describe all components of a fonds.
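The structure described above, with its three high-level elements and recursive components, can be sketched in Python as follows. The embedded document is a simplified skeleton that has not been validated against the EAD DTD; the identifier, titles, dates, and extent are invented, and the functions outline and describe are hypothetical helpers, not part of any EAD software.

```python
import xml.etree.ElementTree as ET

# Simplified skeleton for illustration; content values are assumptions.
EAD_SKETCH = """
<ead>
  <eadheader>
    <eadid>example-0001</eadid>
    <filedesc><titlestmt><titleproper>Guide to the Example Family Papers</titleproper></titlestmt></filedesc>
  </eadheader>
  <frontmatter>
    <titlepage><titleproper>Guide to the Example Family Papers</titleproper></titlepage>
  </frontmatter>
  <archdesc level="fonds">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1890-1950</unitdate>
      <origination>Example family</origination>
      <physdesc>12 boxes</physdesc>
      <repository>Example University Library</repository>
      <abstract>Correspondence and diaries of the Example family.</abstract>
    </did>
    <dsc type="combined">
      <c level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c level="file"><did><unittitle>Letters, 1901-1910</unittitle></did></c>
        <c level="file"><did><unittitle>Letters, 1911-1920</unittitle></did></c>
      </c>
      <c level="series">
        <did><unittitle>Diaries</unittitle></did>
      </c>
    </dsc>
  </archdesc>
</ead>
"""

def outline(component, depth):
    """Return indented outline lines for a <c> and every <c> nested inside it."""
    title = component.findtext("did/unittitle", default="(untitled)")
    level = component.get("level", "?")
    lines = [f"{'  ' * depth}{title} [{level}]"]
    for child in component.findall("c"):
        lines.extend(outline(child, depth + 1))
    return lines

def describe(ead_root):
    """Outline the whole description: the <archdesc>, then the <dsc> components."""
    archdesc = ead_root.find("archdesc")
    lines = [f"{archdesc.findtext('did/unittitle')} [{archdesc.get('level')}]"]
    for component in archdesc.findall("dsc/c"):
        lines.extend(outline(component, 1))
    return lines

root = ET.fromstring(EAD_SKETCH.strip())
lines = describe(root)
print("\n".join(lines))
```

Because the same function handles a component at any depth, the sketch mirrors the recursive design of the <c> element: no special case is needed for series, files, or items.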
In addition to representing standard archival description, EAD takes advantage of the digital medium to support linking the description to original digital materials and to digital representations of archival material. EAD thus can be used to provide direct access to manuscripts, correspondence, pictorial material, audio recordings, audio-visual materials, maps, and so forth. Such linking can be used to enhance the description by providing representative examples of the described materials, or for providing access to entire fonds.
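As a rough illustration of such linking, the following Python sketch gathers the links attached to a component and its sub-components. It assumes the <dao> (digital archival object) element with an href attribute, placed directly inside <c> for simplicity; the fragment is not validated against the DTD, the URLs are placeholders, and digital_objects is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, unvalidated fragment; titles and URLs are placeholders.
COMPONENT = """
<c level="file">
  <did><unittitle>Letters, 1901-1910</unittitle></did>
  <dao href="http://example.org/images/letter-001.jpg"/>
  <c level="item">
    <did><unittitle>Letter to a friend, 1905</unittitle></did>
    <dao href="http://example.org/images/letter-002.jpg"/>
  </c>
</c>
"""

def digital_objects(component):
    """Return (title, href) pairs for a component and all nested components."""
    found = []
    title = component.findtext("did/unittitle", default="(untitled)")
    for dao in component.findall("dao"):
        found.append((title, dao.get("href")))
    for child in component.findall("c"):
        found.extend(digital_objects(child))
    return found

links = digital_objects(ET.fromstring(COMPONENT.strip()))
for title, href in links:
    print(f"{title}: {href}")
```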
EAD represents a very early stage in the transformation of archival description using advanced technologies. It provides a means to create machine-readable versions of traditional archival description which in turn provides the archival community with the opportunity to experience and understand new technologies. To date, most EAD implementations concentrate on display, and relatively simple indexing. While network access and full-text indexing greatly enhance access to the descriptions and, through them, to the materials described, it is clear that much more is possible. Research is needed to understand whether traditional archival description will be effective in this new medium, what may need to be changed or added, and how to exploit the descriptive information fully.
While there are many opportunities for further research and development, there are two areas that have already generated great interest: authority control and language-specific versions of EAD. Though EAD accommodates biographical and historical information, there are clear advantages to creating and maintaining this information independent of archival description. Currently there is an international effort to develop a DTD based on ICA's International Standard Archival Authority Record for Corporate Bodies, Persons, and Families (ISAAR(CPF)) that would be compatible with EAD. A DTD based on ISAAR(CPF) would facilitate building an international biographical and historical database documenting corporate bodies, individuals, and families which would serve as the gateway to archival descriptions and resources, would be an important resource in itself, and would facilitate description of dispersed and complex fonds.
International use of EAD brings problems associated with languages. Archivists whose native language is not English cannot be expected to understand and apply a standard that is oriented to English. It will be necessary to create versions of the DTD and documentation in other languages. Fortunately, HyTime architectural form processing enables language-specific versions of the EAD DTD that can be mapped to the English version as a canonical form for communication and interchange.9 Such language-specific versions will facilitate further internationalization of EAD.
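The idea of mapping a language-specific version to the canonical English form can be illustrated with a small Python sketch. This is not HyTime architectural form processing: a plain renaming table stands in for it, only to show the direction of the mapping. The French element names are invented for illustration and do not come from any published version of the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical French element names mapped to canonical EAD element names.
FRENCH_TO_EAD = {
    "composant": "c",
    "identification": "did",
    "titre": "unittitle",
    "depot": "repository",
}

def to_canonical(element, mapping):
    """Rename an element and all its descendants in place; unknown names are kept."""
    element.tag = mapping.get(element.tag, element.tag)
    for child in element:
        to_canonical(child, mapping)
    return element

localized = ET.fromstring(
    "<composant><identification><titre>Correspondance</titre></identification></composant>"
)
canonical = ET.tostring(to_canonical(localized, FRENCH_TO_EAD), encoding="unicode")
print(canonical)
```

Only the element names change; the content stays in its original language, so a description encoded with localized names could still be interchanged in the canonical form.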
Since the release of the alpha version of EAD in February 1996, an increasing number of repositories have begun to use it for encoding machine-readable finding aids. In the United States, more than 30 repositories in California have formed a consortium called the Online Archive of California (OAC). Currently, there are approximately 70,000 pages of encoded finding aids describing more than 3,000 collections in the OAC database and, with the use of conversion services, the database is expected to double in size in the next two years. Texas, New Mexico, and Virginia are developing consortia modeled after the OAC. A large number of other American repositories are also employing EAD. Among them are the Library of Congress, Harvard University, Yale University, Duke University, the Minnesota Historical Society, University of North Carolina, University of Michigan, and the University of Virginia. An increasing number of Canadian repositories are also beginning to use or are contemplating the use of EAD, including the British Columbia Archives, York University, University of New Brunswick, and archival networks in Saskatchewan, Manitoba, and Ontario. In the United Kingdom, the largest repository using EAD is the Public Record Office, which is converting its 450,000-page guide describing in excess of 150 kilometers of material dating back 900 years. There are also EAD projects at Oxford University, University of Glasgow, University of Warwick, Durham University, and other universities. EAD projects are also underway in several European, Latin American, and Australian repositories. In the fall of 1998, the Research Libraries Group announced its Archival Resources service, which provides union, universal access to finding aids from participating repositories.10
While EAD is well along in development, like any standard it needs to, and will continue to, develop. EAD is currently jointly maintained and administered by the United States Library of Congress (LC) and the Society of American Archivists (SAA). LC is responsible for physical maintenance, while SAA is responsible for political and intellectual oversight. Within the Society of American Archivists, the EAD Working Group (EADWG) is directly responsible for ongoing development. The working group has representatives from SAA, LC, the Research Libraries Group, OCLC, and the ICA. In addition to the ICA representative, the working group has two international members, one each from Canada and the United Kingdom.
While standards present intellectual and technical challenges, they also present political challenges. A standard will only succeed if it realizes the principles and objectives of the community it serves. The standard will only reflect the principles and objectives if those responsible for its administration and maintenance represent the entire user community. It is anticipated that as use of EAD expands internationally, the membership of the EADWG will expand also, with political and intellectual oversight eventually passing to an international body.
1. EAD-related information and files are available at http://lcweb.loc.gov/ead/ and http://jefferson.village.virginia.edu/ead. The following publications are also available at http://www.archivists.org/catalog/index.html: Encoded Archival Description: Context, Theory, and Case Studies (Chicago: Society of American Archivists, 1998); Encoded Archival Description Tag Library: Version 1.0 (Chicago: Society of American Archivists, 1998); and Encoded Archival Description Application Guidelines: Version 1.0 (Chicago: Society of American Archivists, 1999). The Tag Library is also available at: http://lcweb.loc.gov/ead/tglib/tlhome.html.
2. Jewett, Charles C. Smithsonian Report on the Construction of Catalogues of Libraries and Their Publication by Means of Separate, Stereotyped Titles (Washington, D.C.: Smithsonian Institution, 1853), 9.
3. Archival terminology is highly problematic, as the international archival community is only in the initial stages of negotiating a common vocabulary. Americans prefer the term "collection" over "fonds." For most of the rest of the world, a "collection" is an intentional gathering of materials based on one or more criteria as opposed to materials "organically generated." In American usage, the intentionally gathered materials are "artificial collections." In Britain, "archive," in the singular, is used for "fonds," though sometimes also, and more recently, "collection" is used. Archival description is used for both "organically generated" fonds and "artificial collections."
4. See http://data1.archives.ca/ica/cgi-bin/ica?04_e
5. For a more detailed description of EAD, see the Encoded Archival Description Tag Library: Version 1.0: http://lcweb.loc.gov/ead/tglib/tlhome.html
6. The following analysis is loosely based on Standards for Archival Description: A Handbook (Chicago: Society of American Archivists, 1994). See http://www.archivists.org/catalog/stds99/index.html
7. For more information on both SGML and XML, see http://www.oasis-open.org/cover/sgml-xml.html
8. See http://www.w3.org/XML/
9. Gary F. Simons. "Using Architectural Processing to Derive Small Problem-specific XML Applications from Large Widely-used SGML Applications" in Markup Technologies '98 Conference Proceedings (Chicago: Graphic Communications Association, 1998), 51-59.
10. For lists of many current EAD sites see http://jefferson.village.virginia.edu/ead/sitesann.html and http://www.loc.gov/ead/eadsites.html
Copyright © 1999 Daniel V. Pitti
D-Lib Magazine Access Terms and Conditions