Brian F. Lavoie
Released in May 2005, the PREMIS Data Dictionary for Preservation Metadata was the first comprehensive specification for preservation metadata produced from an international, cross-domain consensus-building process. The PREMIS (Preservation Metadata: Implementation Strategies) working group, jointly sponsored by OCLC and RLG, consisted of more than 30 experts from 5 countries, representing libraries, archives, museums, government agencies, and the private sector. The working group was tasked with defining implementable, core preservation metadata, supported by guidelines and recommendations for its creation, management, and use, and applicable in a wide range of digital preservation contexts. Following its release, the PREMIS Data Dictionary won the 2005 Digital Preservation Award and the 2006 Society of American Archivists Preservation Publication Award.
The PREMIS Maintenance Activity was set up by the Library of Congress to coordinate the release and maintenance of the Data Dictionary and its associated XML schema, as well as to serve as a permanent Web home for PREMIS-related news, resources, and activities. The Maintenance Activity also set up mechanisms to encourage community feedback on the Dictionary, including a discussion list for those interested in implementing the Dictionary in their own digital archiving systems.
At the time of the Dictionary's release, the Maintenance Activity decided that it would be "frozen", with no further updates or changes, for an extended period of time. This would allow the Dictionary to circulate through the community, and afford potential implementers an opportunity to digest its contents and consider issues involved in its application to their local preservation systems. Although the Dictionary was developed with a strong emphasis on practical implementation and ease of use, the working group members recognized that like any new resource or technology, gaps would inevitably arise between conception and application, and that a revision of the Dictionary would eventually be necessary. The period in which the Dictionary was frozen would permit the revision to be based on a critical mass of experiences and feedback from the first-generation of implementers.
After about two years, the Maintenance Activity felt that enough feedback had accumulated to warrant undertaking the first revision of the Data Dictionary and its XML schema. The revision process began in October 2006, and ended with the release of the PREMIS Data Dictionary 2.0 in April 2008. This article briefly describes the revision process and its outcomes, including a summary of the major changes appearing in the new version of the Dictionary. It also describes some of the other activities, in addition to the revision, that the Maintenance Activity has undertaken since the initial release of the Dictionary, and discusses some areas for future work.
The PREMIS Editorial Committee and the Revision Process
In August 2006, the Maintenance Activity formed a ten-person Editorial Committee that, among other things, would be responsible for coordinating and approving revisions to the PREMIS Data Dictionary. The Editorial Committee comprises preservation metadata experts from a variety of institutions and countries, and is chaired by Rebecca Guenther of the Library of Congress. The Committee conducted its discussions via teleconference, meeting at least twice monthly over the course of the revision process.
To initiate the revision process, the Committee reviewed feedback on the Data Dictionary that had accumulated since its release in May 2005. This included postings to the PREMIS Implementers Group (PIG) discussion list, questions or comments sent directly to PREMIS working group members or to the Maintenance Activity, and discussion at conferences and meetings. In addition, useful feedback was received during various offerings of the PREMIS tutorial, as well as from a report describing the results of a survey of institutions implementing the Data Dictionary (the tutorials and the survey report will be discussed later in the article). It is important to emphasize that much of the feedback received was from individuals who were working with the Dictionary in the context of implementing it in their local archiving system, so suggestions for changes and updates for the most part were oriented toward improving the ease of application of the Dictionary, as well as making interpretation of the metadata defined in the Dictionary and the guidelines for its use more consistent and understandable. One of the fundamental principles underlying the development of the Dictionary was that it was to be a resource oriented toward practical use, and much of the commentary received was directed toward strengthening that aspect of the Dictionary.
The commencement of the revision process was formally announced in October 2006. The PIG discussion list was chosen as the primary forum through which the Editorial Committee would engage with the community at large. Interested parties were encouraged to submit additional suggestions for changes and updates for the Dictionary, and to comment on proposed revisions.
The Editorial Committee spent considerable time organizing and synthesizing the feedback into a list of proposed changes for the Data Dictionary. The Committee then worked its way through each item on the list, considering the merits of each in terms of how it might contribute toward improving the Dictionary, and depending on the outcome of the discussion, drafting a proposed revision to the Dictionary to address the comment. Not every suggested change was acted upon, but all were considered. The nature of the suggested changes ranged from minor corrections of errors or inconsistencies, to major re-workings of parts of the Dictionary. The Editorial Committee sought to conduct the revision process as openly as possible, in the sense of affording the community ample opportunity to participate in discussions of issues of particular interest; at the same time, however, the Committee did not want to unduly burden the process by requiring community discussion and consensus on every proposed revision. To balance the desire for community involvement with the need to keep the process moving forward at a reasonable pace, the Editorial Committee settled on a strategy for community engagement whereby proposed revisions the Committee felt were particularly significant were shared in draft form on the PIG list for comment prior to finalization, while relatively minor revisions were not. A list of finalized revisions to the Dictionary was maintained on the Maintenance Activity Web site so interested parties could be apprised of progress as the work advanced.
Highlights of the Revision
The Editorial Committee made changes and updates throughout the Data Dictionary. Many of these were fairly minor corrections and clarifications. Several of the changes, however, were much more significant. This section briefly summarizes the major changes to the Dictionary made during the revision process. A complete list of revisions is available at the Maintenance Activity Web site.
The semantic units defined in the Data Dictionary are organized within a simple data model. The data model includes five entities: Intellectual Entities, Objects, Events, Rights, and Agents. A PREMIS semantic unit can be viewed as a property of one of these entities. Thus, the Dictionary is a compilation of the information about these entities that repositories need to know in order to carry out long-term digital preservation activities.
Relationships between entities are documented through the metadata associated with each entity. For example, a relationship between an Object and an Event might be recorded using the linkingEventIdentifier semantic unit associated with the Object, which would include information sufficient to identify the Event in question. In the original version of the data model, most of these relationships were defined as bi-directional: for example, metadata associated with an Object could point to a related Event, or metadata associated with the Event could point to the Object. However, in two cases (relationships between Rights and Agents, and relationships between Events and Agents) the relationship was defined as uni-directional. In the former case, a relationship between a Right and an Agent could only be documented through metadata associated with the Right; in the latter case, a relationship between an Event and an Agent could only be documented through metadata associated with the Event.
The reasoning underlying these uni-directional relationships was that the PREMIS working group felt that detailed metadata about Agents was out of scope for the Dictionary. Therefore, Agent-related metadata was confined to identification and type (e.g., person, organization, system). Metadata pertaining to relationships involving Agents was allocated to other entities. In PREMIS 2.0, relationships in the data model have been generalized to exhibit bi-directionality in all cases, including those involving Agents. A general structure for recording this information is now applied to all entities, with all relationships defined as bi-directional. This simplifies the data model by making all entity relationships exhibit the same general structure, while at the same time introducing more flexibility in how these relationships can be recorded in practice: i.e., as reciprocal links between entities, or as a reference associated with either entity that is party to the relationship. The new PREMIS data model is shown in Figure 1.
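The generalized linking structure can be illustrated with a minimal Python sketch. It is not drawn from the Data Dictionary or its schema: the Entity class, the link and related helpers, and the identifiers are hypothetical names chosen for illustration. It shows only the idea that a relationship may be recorded as reciprocal links or as a single reference held by either party.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Illustrative stand-in for a PREMIS entity (Object, Event, Agent, Rights)."""
    kind: str
    identifier: str
    # Each link is a (kind, identifier) pair pointing at a related entity.
    links: set = field(default_factory=set)


def link(source, target, reciprocal=True):
    """Record a relationship on the source and, optionally, on the target too."""
    source.links.add((target.kind, target.identifier))
    if reciprocal:
        target.links.add((source.kind, source.identifier))


def related(a, b):
    """True if either entity carries a reference to the other."""
    return (b.kind, b.identifier) in a.links or (a.kind, a.identifier) in b.links


# Hypothetical identifiers; the reference is held only by the Agent,
# which the original data model did not allow for Event-Agent relationships.
event = Entity("Event", "event-001")
agent = Entity("Agent", "agent-042")
link(agent, event, reciprocal=False)
print(related(event, agent))
```

Because related checks both directions, a consumer of the metadata does not need to know which party to the relationship recorded it.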
PREMIS 2.0 includes a completely revised and expanded Rights entity. Although the PREMIS working group recognized that rights management was an important aspect of digital preservation, the original version of the Dictionary had limited provision for recording information pertaining to intellectual property rights. In this formulation, expression of rights was limited to a simple "permission statement", of the form "Agent A grants this permission for Object B". The Dictionary defined metadata that allowed implementers to identify the Object(s) and Agents associated with the permission, as well as to record the type and characteristics (e.g., duration) of the permission itself.
The limited treatment of rights metadata in the original version of the Dictionary reflected the view of the working group that little work had been done in determining what information was necessary to support rights management in a preservation context, and how that information should be schematized as metadata. Rather than expending a great deal of time and energy in crafting an elaborate set of metadata for rights expression that might soon be superseded by other work, the group decided instead to offer an abbreviated set of metadata that could be used to express some basic rights information, and might also serve as a starting point for further work in this area.
Following the publication of the first version of the Data Dictionary, the PREMIS Maintenance Activity commissioned a report that examined in detail the issue of rights expression in a digital preservation context, along with implications for metadata management. This report, along with several other sources, provided enough background and additional thinking on the topic that the Editorial Committee felt it would be useful to extend the Rights entity in the second version of the Dictionary. Accordingly, the Rights entity in PREMIS 2.0 permits a much richer description of rights statements than was previously possible using PREMIS metadata.
Like its original version, the Rights entity in PREMIS 2.0 is intended to support an automated process that determines if a particular preservation-related action is permissible in regard to an Object or set of Objects within the repository, as well as to record important information about the permission. However, key differences exist between the old and new versions of the Rights entity. In PREMIS 2.0, the permissionStatement container is replaced by a new rightsStatement container, which can be used to express three forms of intellectual property rights: those established by copyright, those established by license, and those established by statute. The Rights entity defines metadata applicable to all three forms of rights statement, such as identifiers, the nature, scope, and characteristics of the rights granted to the repository, the Object(s) to which the rights apply, and the Agents responsible for granting or administrating the rights. In addition, the new Rights entity defines metadata specific to copyright-, license-, and statute-based intellectual property rights. The result is a deeper, more nuanced description of rights in a digital preservation context, yet one that preserves the earlier version's practical orientation toward automated processing.
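As a rough illustration of the automated check described above, the following Python sketch models a rights statement with one of the three bases and tests whether an action on an Object is permitted. The class and field names, the action labels ("replicate", "migrate"), and the identifiers are assumptions made for this example; they are not the semantic unit names defined in the Dictionary.

```python
from dataclasses import dataclass
from enum import Enum


class RightsBasis(Enum):
    """The three forms of rights statement named in the text."""
    COPYRIGHT = "copyright"
    LICENSE = "license"
    STATUTE = "statute"


@dataclass(frozen=True)
class RightsStatement:
    """Hypothetical, simplified model of a rights statement."""
    identifier: str
    basis: RightsBasis
    granted_acts: frozenset   # actions the repository may take
    object_ids: frozenset     # Objects the statement applies to
    agent_ids: frozenset = frozenset()  # Agents granting or administering the rights


def is_permitted(statements, object_id, act):
    """True if any statement grants the act for the given Object."""
    return any(
        object_id in s.object_ids and act in s.granted_acts
        for s in statements
    )


# Hypothetical data: a license permitting replication and migration of one Object.
statements = [
    RightsStatement(
        identifier="rights-1",
        basis=RightsBasis.LICENSE,
        granted_acts=frozenset({"replicate", "migrate"}),
        object_ids=frozenset({"object-7"}),
        agent_ids=frozenset({"agent-042"}),
    )
]
print(is_permitted(statements, "object-7", "migrate"))
print(is_permitted(statements, "object-7", "disseminate"))
```

A real implementation would also need to handle the basis-specific metadata (for example, terms and dates) that the sketch omits.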
Significant Properties and Preservation Level
One of the most interesting aspects of preservation metadata is the concept of significant properties: characteristics of a digital object such as look, feel, functionality, and intellectual content designated by a repository as important to preserve over time. Significant properties are not intrinsic to a digital object; instead, they are subjectively determined by the context in which the preservation activity takes place. Thus, one repository might determine that the significant properties of a PDF document include not just the textual content, but the lay-out, internal linkages, and other aspects of the document's appearance and functionality, while another repository might conclude that simply preserving the document's intellectual content is enough to satisfy its preservation goals. Clearly the choice of significant properties will impact the preservation strategy adopted by the repository. Preserving an instance of a digital object faithful to the original's look, feel, and functionality might require a technique such as emulation, while preserving just the intellectual content might be accomplished through format migration.
Significant properties have received considerable attention of late, most recently at a workshop devoted to the topic sponsored by the UK's Digital Preservation Coalition. In the original version of the PREMIS Data Dictionary, a single semantic unit was defined to record significant properties. This semantic unit was unstructured by additional sub-units, and could be implemented in a variety of ways, ranging from detailed, prose descriptions of each significant property, to machine-processable codes. In PREMIS 2.0, the Editorial Committee decided to expand this limited treatment into a more detailed, structured set of semantic units for recording information about significant properties. In the new formulation, the single semantic unit has been replaced with two: one that allows the repository to declare the "facet" of the Object's characteristics to which the significant property applies (e.g., content, appearance, functionality, and so on), and another that describes the property itself. For example, a repository might record a facet/property pairing of "behavior" and "editable", respectively. Use of these semantic units should aid in the management of significant properties attributed to preserved Objects, and help ensure that these properties persist over time and are not compromised by preservation actions.
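One way a repository might use facet/property pairings is to compare the properties declared for an Object with those observed after a preservation action. The Python sketch below assumes this workflow; the SignificantProperty type, the compromised helper, and the ("content", "text") pairing are hypothetical, while the ("behavior", "editable") pairing is the example given above.

```python
from typing import NamedTuple


class SignificantProperty(NamedTuple):
    """A facet/property pairing, e.g. facet "behavior", value "editable"."""
    facet: str
    value: str


def compromised(declared, observed_after_action):
    """Return declared properties that are no longer observed, sorted by facet."""
    return sorted(set(declared) - set(observed_after_action))


# Properties the repository has designated as important to preserve.
declared = [
    SignificantProperty("behavior", "editable"),
    SignificantProperty("content", "text"),
]

# Hypothetical result of inspecting the Object after a format migration.
after_migration = [SignificantProperty("content", "text")]

for prop in compromised(declared, after_migration):
    print(f"lost: {prop.facet} = {prop.value}")
```

Structuring the pairing, rather than recording free prose, is what makes a comparison of this kind straightforward to automate.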
Related to the issue of significant properties is preservation level: a description of the preservation actions or objectives a repository intends to fulfill in regard to an Object in its custody. There are a variety of preservation levels that might be invoked on an Object, ranging from simple "bit-level preservation" aimed at nothing more than ensuring that the bits comprising the Object persist in an unaltered form over time, to more complex levels that seek not only to preserve the Object, but to ensure its long-term renderability and understandability as well. Choice of preservation level can stem from various contingencies, including the nature of the significant properties associated with an Object, or even the willingness of a repository or its clients to pay for more complex, and presumably more expensive, preservation strategies. Some repositories may only support a single preservation level that is applied to all Objects; others may support multiple levels. In the latter case especially, it is important to be able to associate the appropriate preservation level with each Object in the repository.
As with significant properties, the original version of the Data Dictionary confined information about preservation level to a single, unstructured semantic unit. The new version of the Data Dictionary provides for a richer, structured description of preservation level. The purpose is to afford repositories the capability to record not just the preservation level itself, but also important information concerning why, when, and in what context the preservation level assignment was made. In PREMIS 2.0, preservation level can be described with as many as four semantic units. In addition to identifying the preservation level attached to an Object, the Dictionary also defines semantic units that allow repositories to document the date the preservation level was assigned, and the rationale for its selection. The latter information can be particularly important to record when the chosen preservation level differs from standard repository policy. For example, it might be standard practice to assign a preservation level of "bit-level only" by default to all Objects ingested into the repository, except in special circumstances. The Dictionary provides a semantic unit with which the repository can record the nature of the circumstances necessitating departure from the norm.
In addition to value, date, and rationale, PREMIS 2.0 defines a semantic unit that allows the repository to indicate the context in which the preservation level was assigned, as a means of differentiating among multiple preservation levels assigned to the same Object. For example, an Object might be assigned a preservation level of "full preservation" (i.e., preserve the look, feel, and functionality of the Object) in the context of the repository's long-term plans for managing the Object. The Object might then be assigned a second, lower preservation level (e.g., "bit-level preservation") to indicate the repository's current capacity for preservation action. The two preservation levels, assigned to the same Object, can be differentiated by pairing them with semantic units that describe their respective contexts: in the first case the preservation level describes a future intention; in the second case, it describes current capability.
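The two-level example above can be sketched in Python as follows. The field names (value, context, date_assigned, rationale), the context labels, and the date are illustrative assumptions, not the names or values defined in the Data Dictionary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PreservationLevel:
    """Hypothetical record of one preservation level assignment."""
    value: str
    context: str
    date_assigned: Optional[date] = None
    rationale: Optional[str] = None


def level_for(levels, context):
    """Return the preservation level value recorded for a context, if any."""
    for level in levels:
        if level.context == context:
            return level.value
    return None


# Two levels assigned to the same Object, distinguished by context.
levels = [
    PreservationLevel(
        value="full preservation",
        context="future intention",
        date_assigned=date(2008, 4, 1),  # hypothetical date
        rationale="look, feel, and functionality designated as significant",
    ),
    PreservationLevel(
        value="bit-level preservation",
        context="current capability",
        date_assigned=date(2008, 4, 1),  # hypothetical date
    ),
]
print(level_for(levels, "current capability"))
print(level_for(levels, "future intention"))
```

Without the context field, the two assignments would appear to contradict one another; with it, each can be retrieved unambiguously.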
The Editorial Committee (as well as the original PREMIS working group) recognized that situations would arise where repositories would want to supplement PREMIS-defined metadata with additional (perhaps locally defined) metadata, or to extend or replace some PREMIS metadata with other schema that provide more structure or granularity. In the original version of the Data Dictionary, extending PREMIS metadata with non-PREMIS metadata was problematic when implementing the Dictionary with the accompanying PREMIS XML schemas, because the schemas lacked a mechanism for accommodating metadata from non-PREMIS specifications. PREMIS 2.0 corrects this problem by introducing a formal mechanism for extensibility when using the PREMIS schemas.
The extensibility mechanism was applied to seven PREMIS semantic units: significantProperties, objectCharacteristics, creatingApplication, environment, signatureInformation, eventOutcomeDetail, and rights. The Editorial Committee felt that these semantic units were the ones implementers would be most interested in extending with non-PREMIS metadata. For example, it is easy to imagine circumstances where implementers would wish to replace the metadata defined in signatureInformation with that defined in the W3C's XML Signatures specification. Similarly, other schemas are available for expressing rights information, such as the California Digital Library's copyrightMD schema; implementers might wish to utilize one of these to supplement or replace the rights metadata defined in the PREMIS Data Dictionary.
The extensibility mechanism for the PREMIS schemas involves the inclusion of a new "extension container" defined as a semantic component (i.e., sub-element) of each of the semantic units listed above. The name of the extension container combines the name of its corresponding extensible semantic unit, with the suffix "extension": thus, for significantProperties, the name of the extension container is significantPropertiesExtension. The extension container can be used to include metadata from schemas external to the PREMIS Data Dictionary. This externally defined metadata can supplement and/or replace PREMIS-defined metadata.
The extension container objectCharacteristicsExtension is a special case of the extensibility mechanism. This extension container was created specifically for the purpose of allowing implementers to include externally-defined, format-specific technical metadata, such as the NISO Z39.87 MIX schema defining technical metadata for digital still images. The Data Dictionary is technical- and implementation-neutral; because of this, format-specific technical metadata is out of scope for PREMIS-defined metadata. Nevertheless, technical metadata is often crucial for carrying out long-term preservation strategies. The Editorial Committee therefore wanted to make explicit provision for implementers to extend the Dictionary with technical metadata defined in other schemas. The objectCharacteristicsExtension container fulfills this need. Use of this extension container differs slightly from the other extension containers defined in the Dictionary. In this case, the objectCharacteristicsExtension container can be used to supplement, but not replace, other metadata defined in objectCharacteristics. This restriction reinforces the point that the extension is intended to cover a class of information that is not addressed in the Dictionary.
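The naming rule and the special case for objectCharacteristics can be summarized in a few lines of Python. The seven unit names come from the list above; the function names and the SUPPLEMENT_ONLY set are hypothetical helpers written for this sketch, not part of the PREMIS schema.

```python
# The seven semantic units to which the extensibility mechanism was applied.
EXTENSIBLE_UNITS = (
    "significantProperties",
    "objectCharacteristics",
    "creatingApplication",
    "environment",
    "signatureInformation",
    "eventOutcomeDetail",
    "rights",
)

# Units whose extension may supplement, but not replace, PREMIS-defined metadata.
SUPPLEMENT_ONLY = {"objectCharacteristics"}


def extension_container(unit):
    """Return the extension container name for an extensible semantic unit."""
    if unit not in EXTENSIBLE_UNITS:
        raise ValueError(f"{unit!r} is not an extensible semantic unit")
    return unit + "Extension"


def may_replace_premis_metadata(unit):
    """True if external metadata may replace the PREMIS-defined metadata."""
    extension_container(unit)  # raises ValueError for non-extensible units
    return unit not in SUPPLEMENT_ONLY


for unit in EXTENSIBLE_UNITS:
    print(extension_container(unit), may_replace_premis_metadata(unit))
```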
Additional Changes and Updates
The revisions discussed above are the most extensive changes found in PREMIS 2.0. Many other changes or updates have also been implemented in the Dictionary. Usage notes and clarifications were added for many semantic units. The layout and formatting of the Dictionary were updated to improve usability, and semantic units were numbered for easier reference. A number of changes were made to the PREMIS schemas as well. The schemas, originally consisting of five separate schemas in the first release, have now been combined into a single schema for simplified maintenance. In addition, PREMIS 2.0 includes new recommendations for expressing dates and times in the PREMIS schema, using ISO 8601 as a starting point and extending it to include forms of expression useful for preservation metadata not covered in the standard. Limited space prevents an exhaustive enumeration in this article of all changes or updates, and readers may find that some changes not discussed here are of particular significance for their local implementations. Like its earlier version, the new Data Dictionary is accompanied by additional material that offers detailed explanation, guidance, and recommendations for understanding and using PREMIS 2.0.
The Editorial Committee is always interested in feedback on the Data Dictionary or any other PREMIS-related resource. Please post comments or questions to the PREMIS Implementers Group (PIG) e-mail discussion list, which is open to all. Visit the PREMIS Maintenance Activity Web site for instructions on how to subscribe.
The revision of the Data Dictionary has been the primary focus of the PREMIS Maintenance Activity for the past eighteen months, but other activities have been undertaken as well, including the publication of two commissioned reports and educational outreach.
The Maintenance Activity has published two reports since the first release of the Data Dictionary. Both reports were commissioned through funding provided by the Library of Congress. The first, Rights in the PREMIS Data Model, authored by Karen Coyle and published in December 2006, is a detailed analysis of rights management in a digital preservation context. Coyle enumerates the various ways "the right to preserve" can be manifested in law and contracts, discusses current metadata specifications for rights expression, and relates all of it to the rights metadata defined in the original version of the PREMIS Data Dictionary. Coyle concludes with a number of recommendations for improving the PREMIS Rights entity, and indeed, many of her suggestions were incorporated into the new Rights entity in PREMIS 2.0.
The second report, Implementing the PREMIS Data Dictionary: A Survey of Approaches, authored by Deborah Woodyard-Robinson and published in June 2007, gathers the experiences of sixteen institutions in regard to implementing the PREMIS Data Dictionary in their local archiving systems. Among other topics, the report discusses current strategies for storing preservation metadata, the use of automated preservation metadata tools, the relevance of the PREMIS data model to local implementations, and the extent to which the semantic units defined in the Data Dictionary have been implemented in the surveyed institutions. The report also documents issues that emerged in the course of local implementation of the Dictionary. Although the report focuses on implementation issues associated with the first version of the Data Dictionary, much of it remains relevant and useful to potential implementers of PREMIS 2.0.
Both reports are freely accessible online, and may be downloaded from the Maintenance Activity Web site.
The Maintenance Activity has devoted considerable effort toward educational outreach in the digital preservation community. This effort has primarily taken the form of a series of tutorials aimed at acquainting participants with the PREMIS Data Dictionary, providing a comprehensive overview of the Dictionary's contents, and discussing key implementation issues. Often, the tutorials also include a panel of representatives from institutions and projects currently implementing the Dictionary, who share and discuss their preservation metadata experiences and perspectives. Since the release of the first version of the Data Dictionary, the Maintenance Activity has held open tutorials in Glasgow, Boston, Stockholm, Washington DC, and San Diego. Several additional tutorials have been held by special request at particular institutions. The materials used in the tutorials (slideshow presentations, hand-outs, examples, etc.) are available for download from the Maintenance Activity Web site.
In addition to the tutorials, PREMIS has been active at conferences and meetings, and has sponsored several birds-of-a-feather sessions. Outreach is maintained through other mechanisms as well, including contributions to the literature, and maintenance of the PREMIS Implementers Group e-mail discussion list. The Maintenance Activity will continue to engage in educational/outreach efforts such as these as a key component of its ongoing activities.
With the release of PREMIS 2.0, the PREMIS Maintenance Activity can now turn its attention and resources to other activities. Looking ahead, the Maintenance Activity has prioritized several areas for future work, focusing on improving implementation and use of the PREMIS Data Dictionary.
Registry of controlled vocabularies
Many semantic units in the Data Dictionary recommend the use of controlled vocabularies for populating values. The Dictionary also includes many "suggested values" that could be used as starter lists for these vocabularies. As more and more institutions implement PREMIS metadata, significant benefits could be realized by gathering PREMIS-related controlled vocabularies into a central registry. Institutions implementing PREMIS metadata will be interested in consulting the controlled vocabularies employed by other institutions; ideally, a registry of this kind could promote convergence on standard vocabularies for particular semantic units. The Maintenance Activity is establishing such a registry in the near future, populated initially by lists of suggested values for semantic units supplied in PREMIS 2.0. Implementers will be encouraged to contribute other vocabularies in use to the registry. A mechanism is under development to enable the identification of the source of these controlled vocabularies and to validate appropriate values using an XML schema. A registry of controlled vocabularies should be of considerable value to the community, both as a reference to inform implementation decisions, and as a means of encouraging convergence and standardization.
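To make the registry idea concrete, here is a small Python sketch of a structure that records which source contributed each vocabulary and checks candidate values against it. The article says only that such a mechanism is under development, so everything here is an assumption: the VocabularyRegistry class, its methods, the "agentType" key, and the source labels are hypothetical. The terms "person", "organization", and "system" are the Agent types mentioned earlier in this article.

```python
class VocabularyRegistry:
    """Hypothetical in-memory registry of controlled vocabularies."""

    def __init__(self):
        # semantic unit name -> {source name -> frozenset of terms}
        self._vocabularies = {}

    def register(self, semantic_unit, source, terms):
        """Add (or overwrite) the vocabulary a source uses for a semantic unit."""
        self._vocabularies.setdefault(semantic_unit, {})[source] = frozenset(terms)

    def sources_for(self, semantic_unit, value):
        """Return the sorted names of sources whose vocabulary contains the value."""
        by_source = self._vocabularies.get(semantic_unit, {})
        return sorted(s for s, terms in by_source.items() if value in terms)

    def is_valid(self, semantic_unit, value):
        """True if at least one registered vocabulary contains the value."""
        return bool(self.sources_for(semantic_unit, value))


registry = VocabularyRegistry()
# Initial population from suggested values, then a contributed local vocabulary.
registry.register("agentType", "suggested values", ["person", "organization", "system"])
registry.register("agentType", "example institution", ["person", "software"])

print(registry.sources_for("agentType", "person"))
print(registry.is_valid("agentType", "committee"))
```

Listing the sources that share a term is one simple way a registry could show where implementers are already converging on common values.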
PREMIS and METS
The PREMIS schema has been endorsed by the Metadata Encoding and Transmission Standard (METS) Editorial Board as an approved extension schema for METS. The METS schema is widely used by digital repositories as a packaging mechanism for objects and their associated metadata. A number of questions have emerged as to how the PREMIS Data Dictionary and schema should be used in conjunction with METS. For example, PREMIS metadata can be clustered as a single unit in a sub-element (e.g., technical metadata, digital provenance metadata) under the administrative metadata section of a METS document, or it can be distributed over multiple sections; is there an optimal approach? Moreover, some redundancy exists between structures defined in METS and semantic units defined in PREMIS; how should this be handled? These and many other implementation issues represent an obstacle to convergence on best practice for incorporating PREMIS metadata into METS documents. To resolve some of this uncertainty, the Maintenance Activity has convened a group of experts to develop a set of guidelines and recommendations for using PREMIS and METS. A working draft of their findings is now available online.
Many other areas relating to the PREMIS Data Dictionary, and to preservation metadata generally, would benefit from attention in the future. The number of institutions and projects implementing the Data Dictionary continues to grow; it would be useful to follow up the earlier PREMIS report and survey the current landscape of implementers, synthesizing their experiences and challenges. This would identify areas of the Dictionary that need to be improved or expanded; highlight implementation issues that need to be addressed; and raise awareness about emerging workflows for creating, maintaining, and using PREMIS metadata in digital archiving systems. Another important issue is the exchange of PREMIS-conformant metadata between repositories: in particular, what are the issues involved in extracting and exporting PREMIS metadata from one digital archiving system in ways that ensure it can be received, ingested, and understood by another archiving system? Automated tools for creating, managing, and processing PREMIS metadata, or indeed, preservation metadata generally, are another important area. Some tools have already emerged, such as JHOVE and the National Library of New Zealand's Metadata Extraction Tool, but nearly all are oriented toward managing technical metadata. Can tools be developed to support other forms of preservation metadata? A related point is the scope for registry services. Certain forms of preservation metadata, such as format and environment information, lend themselves to sharing and re-use; PRONOM and the Global Digital Format Registry are examples of how this kind of metadata can be collected and accessed through centralized registries. Further work could explore how registries such as these can be used to support the provision of PREMIS-conformant metadata, and also whether additional opportunities exist to share preservation metadata through registry services.
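The "distributed" option mentioned above can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the approach recommended in the guidelines: the namespace URIs, the MDTYPE value "PREMIS", the section IDs, and the empty premis:object and premis:event elements are assumptions that should be checked against the METS and PREMIS schemas, and the wrap helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for METS and the PREMIS 2.0 schema.
METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap(amd_sec, section, section_id, premis_element):
    """Place a PREMIS element inside mdWrap/xmlData under the named METS section."""
    sec = ET.SubElement(amd_sec, f"{{{METS_NS}}}{section}", {"ID": section_id})
    md_wrap = ET.SubElement(sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "PREMIS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_element)
    return sec


amd_sec = ET.Element(f"{{{METS_NS}}}amdSec")

# Empty placeholders; a real record would carry the PREMIS semantic units.
premis_object = ET.Element(f"{{{PREMIS_NS}}}object")
premis_event = ET.Element(f"{{{PREMIS_NS}}}event")

# Distributed approach: Object metadata under techMD, Event metadata under digiprovMD.
wrap(amd_sec, "techMD", "tech-1", premis_object)
wrap(amd_sec, "digiprovMD", "prov-1", premis_event)

print(ET.tostring(amd_sec, encoding="unicode"))
```

The clustered alternative would instead place a single PREMIS container holding all entities under one section; which arrangement is preferable is exactly the question the guidelines are meant to address.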
As with most activities, the number of opportunities for further work exceeds by a wide margin the resources available. The Maintenance Activity encourages the community to make use of the PIG discussion list to put forward suggestions on aspects of preservation metadata and PREMIS that warrant further work, and to share their views on which areas should be prioritized.
The author would like to thank colleagues on the PREMIS Editorial Committee for reviewing a draft of this article and providing many helpful comments.
 PREMIS Data Dictionary for Preservation Metadata, version 2.0, PREMIS Editorial Committee, March 2008, <http://www.loc.gov/standards/premis/v2/premis-2-0.pdf>.
 Digital Preservation Coalition - JISC/BL/DPC workshop - What to preserve? Significant Properties of Digital Objects, <http://www.dpconline.org/graphics/events/080407workshop.html>.
 Inside CDL: Rights Management, <http://www.cdlib.org/inside/projects/rights/schema/>.
 ISO 8601:2004 - Data elements and interchange formats -- Information interchange -- Representation of dates and times, <http://www.iso.org/iso/catalogue_detail?csnumber=40874>.
 Rights in the PREMIS Data Model: A Report for the Library of Congress, by Karen Coyle, December 2006, <http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf>.
 Implementing the PREMIS data dictionary: a survey of approaches, by Deborah Woodyard-Robinson, Woodyard-Robinson Holdings Ltd., For The PREMIS Maintenance Activity sponsored by the Library of Congress, 4 June 2007. <http://www.loc.gov/standards/premis/implementation-report-woodyard.pdf>.
 PREMIS Resources - PREMIS: Preservation Metadata Maintenance Activity (Library of Congress), <http://www.loc.gov/standards/premis/bibliography.html>.
 Guidelines for using PREMIS with METS, April 22, 2008, <http://www.loc.gov/premis/guidelines-premismets.pdf>.
Copyright © 2008 OCLC Online Computer Library Center