
D-Lib Magazine

September/October 2010
Volume 16, Number 9/10

 

A Checklist and a Case for Documenting PREMIS-METS Decisions in a METS Profile

Sally Vermaaten[1]
Statistics New Zealand
sally.vermaaten@stats.govt.nz

doi:10.1045/september2010-vermaaten

 

Abstract

Shared metadata practices foster preservation and interoperability in several ways. They facilitate inter-repository exchange, the development of reusable metadata tools, and repository self-assessments and audits. Despite the benefits of shared practices, there has been little convergence on best practices for a widely used metadata strategy, PREMIS in METS. This paper proposes documenting PREMIS-METS decisions in METS profiles as a beneficial internal practice and an efficient way of sharing and comparing metadata strategies, thereby facilitating best practices. The paper then introduces a tool to help implementers document PREMIS-METS decisions in a METS profile. This tool is a checklist of 13 key PREMIS-METS issues that a repository should consider documenting in their METS profiles. Each of the 13 issues is illustrated with examples from METS profiles currently registered with the Library of Congress.

 

Introduction

Organizations in nearly every field have found that valuable digital information can and, at times, must be re-used years after its creation. Consequently, the need for long-term access to digital information is growing. A crucial step toward long-term access to digital information is the use of standards for recording provenance and other metadata that supports preservation. Use of PREMIS (PREservation Metadata: Implementation Strategies) in the METS (Metadata Encoding and Transmission Standard) container format has become an accepted method for recording such descriptive, structural, administrative, and preservation metadata. Though many organizations are now implementing PREMIS in METS, there has been little consolidation of PREMIS-METS practices. This lack of consolidation complicates inter-repository exchange, the creation of tools that can be used across projects and repositories, and the ability to measure the relative aptness of a metadata strategy.[2]

Developing shared PREMIS-METS practices, like other best practices, entails examining and comparing different strategies. By comparing metadata strategies, implementers can identify commonalities and synthesize them in shared guidelines that may, for example, articulate minimal and optimal metadata requirements. Currently, however, there is no agreed upon mechanism for sharing PREMIS-METS decisions.

This paper proposes documenting PREMIS-METS metadata decisions in METS profiles as a beneficial internal practice and a powerful and efficient way of sharing and comparing metadata strategies and thereby facilitating best practices. The paper then presents a tool to help implementers document PREMIS-METS decisions in a METS profile. This tool is a checklist of 13 key PREMIS-METS issues that a repository should consider documenting in their METS profiles. Each of the 13 issues is illustrated with examples from METS profiles currently registered with the Library of Congress.

 

Benefits of shared practices

Using PREMIS and METS is not an "off-the-shelf" metadata solution. When a repository decides to use PREMIS and METS, it is not the end but rather the beginning of metadata decision-making work. Implementers of PREMIS in METS must decide:

  • which subsets of METS elements and PREMIS semantic units are applicable to their content and goals
  • how to approach redundancies and overlapping content in PREMIS and METS (e.g., both schemas call for the recording of size, message digest algorithm, etc.)
  • how to record PREMIS or METS information (e.g. use of controlled vocabularies, registries)
  • how to structure objects and at what level of granularity metadata will apply (e.g., will it describe a data set? a book? a page?)

Implementing PREMIS in METS involves complex metadata choices. However, individual institutions need not face these complicated decisions alone; after all, their colleagues face similar sets of decisions and may be developing solutions that they would also find useful. Sharing PREMIS-METS solutions and developing PREMIS-METS best practices would ensure that repositories are not reinventing the wheel.

In addition to easing the decision-making work of individual institutions, converging on shared PREMIS-METS strategies would support key interoperability and preservation functions, including:

  • Inter-repository exchange
  • Tool development
  • Self-assessment of metadata strategies
  • Identification of trusted digital repositories

As Pawletko, Caplan, and Kehoe point out,[3] inter-repository exchange is critical to several preservation goals: geographically distributing copies of data, moving to new technologies and systems, and handing off digital assets in a line of succession. In basic inter-repository exchange, repositories ingest, store, and export other repositories' rich Archival Information Packages (AIPs), which contain archived files and metadata. In basic exchange, a repository is not expected to be able to understand or use metadata from another repository's AIPs in their own system; they are simply expected to be able to ingest, export, and store the rich AIP as a package. The Towards Interoperable Preservation Repositories (TIPR) project has successfully carried out basic exchanges. Basic exchange can be conducted with minimal agreement on metadata encoding practices. However, in order to conduct an exchange in which one repository can "unwrap" another repository's rich AIPs and understand and use the metadata in them, agreement on more specific shared metadata practices will be critical.

Shared PREMIS-METS practices would also help foster the development of tools for metadata manipulation and creation that could be used across projects and repositories. Those using PREMIS-METS best practices would have more regularly structured metadata and could therefore share metadata manipulation tools. Tools could also automate the creation of some metadata in a form which could then be used by many repositories. This regular output could be produced by tools ranging from format registries and validation tools, to translational tools that allow metadata to be converted from one schema into another, to more comprehensive preservation systems. In short, adoption of PREMIS-METS best practices would increase the number of repositories that could benefit from any given tool designed to work with metadata adhering to those encoding practices. This increase in the number of potential users would logically stimulate the development of tools that could streamline metadata workflows.

Shared PREMIS-METS practices would also aid internal and external repository assessment by providing greater confidence in metadata strategies. Internally, PREMIS-METS best practices would give repository staff performing a self-assessment greater confidence in their metadata approach because that approach has been shown to work for others. Though best practices are not absolute and may need adjustment over time, they are proven strategies and they provide shared reference points — such as minimal and optimal guidelines — against which a repository may compare its own metadata strategy. Situations in which an organization feels it should not follow best practices could usefully signal a need to adjust the best practices or clarify the limits of the best practices' applicability. Externally, best practices would aid trusted digital repository auditors by providing more concrete evidence of the effectiveness of a repository's metadata strategy. In this way, best practices would allow auditors to more confidently distinguish trusted digital repositories from less robust programs.

 

Documentation: Getting to shared PREMIS-METS practices

Overall, there has been little consolidation of PREMIS-METS practices, despite the many benefits consolidation would bring. It is important to note, however, the work on shared practices that has been done thus far. In particular, the recently developed Guidelines for Using PREMIS with METS for Exchange[4] lays out PREMIS-METS best practices for Submission Information Packages (SIPs) and Dissemination Information Packages (DIPs). These guidelines have already facilitated the creation of a set of tools, the PREMIS-METS Toolbox.[5] The guidelines offer specific recommendations but still give implementers a great deal of freedom to accommodate the diversity of repository and content types that may be involved in exchange. The guidelines also represent an important step towards shared practices but, as two of their authors, Guenther and Wolfe, point out, they are not exhaustive.[6] To achieve greater interoperability, more detailed agreements on shared PREMIS-METS encoding practices are necessary.

PREMIS-METS best practices for specific repository or content types such as digital audio or geospatial data could provide helpful guidance to implementers while also lowering the barriers to repository interoperability and robust preservation. One reason these types of community best practices have not yet been developed is the relatively short time that PREMIS and METS have existed; the METS XML schema was created in 2001 and version 1.0 of PREMIS was released in 2005. While time is certainly a factor, two other noteworthy barriers to the establishment of community best practices are: 1) a design choice in both PREMIS and METS and 2) lack of a consistent and robust mechanism for communicating metadata strategies.

The first of these barriers to establishing PREMIS-METS best practices is the result of a carefully considered design choice for flexibility and local control in both PREMIS and METS. Flexibility is necessary in order to support a wide variety of uses in domains ranging from science data to digital libraries. As McDonough[7] and Guenther[8] have both pointed out, the flexible and modular nature of PREMIS and METS is essential but inevitably complicates interoperability by allowing implementers a great deal of freedom to customize. Given this flexibility, it is not surprising that implementers have developed few common solutions to issues ranging from controlled vocabularies to AIP structures.[9]

The second barrier, the lack of an agreed-upon mechanism for sharing and comparing PREMIS-METS choices, is a more actionable problem. As discussed earlier, developing shared practices entails examining and comparing different metadata strategies. Comparing complex metadata strategies is only feasible if choices and the rationale behind them are documented. As an Australian Partnership for Sustainable Repositories report points out, if metadata decisions are not documented, it is easy — even for those who made the decisions — to forget what metadata choices were made and why these choices were made.[10] Documenting PREMIS-METS decisions is therefore highly beneficial for both internal (metadata reviews, etc.) and external (convergence on shared practices) reasons.

To date, repositories have taken a variety of approaches to documenting PREMIS-METS implementations. One approach is use of private, internal documents created in whatever format repository staff deems most appropriate. This may take a variety of forms ranging from a standalone text file to a note in a larger resource such as a database or wiki. This type of local documentation is likely easier to create and keep up-to-date than more formal records. However, if the location and meaning of this documentation is not sufficiently clear, future repository staff may have difficulty finding and interpreting it. Moreover, private, internal documentation does not readily facilitate the sharing and comparing necessary for the development of best practices since outside parties do not have ready access to it and because its unique form may contain quite different information from other repositories' documentation.

A second way in which repositories document PREMIS-METS implementations is the use of policy guides and other local documentation that is made public via an institutional Web site or published reports. This type of documentation is relatively easy to create and update. Public posting also offers a higher degree of repository transparency than private documentation and makes it more likely that future repository staff will be able to locate the documents. However, local documents can vary in content and form across repositories, making them more difficult to compare. Additionally, those external to the repository may have difficulty locating the documents since they are posted in many different locations (i.e. each repository's Web site) rather than in a central location.

A third way that repositories document PREMIS-METS decisions is in METS profiles that are contributed to the public, central pool of METS profiles registered with the Library of Congress.[11] Registered METS profiles must follow a standard format, specifically, an ordered list of elements and an example METS document. Form and basic content guidelines for METS profiles are specified in the document, "METS Profile Components," that is available on the METS Web site.[12] METS profiles ideally contain enough detail for readers of a profile to understand metadata decisions and the rationale behind them and then construct a METS document according to the rules set out in a profile or related set of profiles. In other words, a robust METS profile can essentially function as a well-explained template.

PREMIS-implementing METS profiles offer important benefits as a form of documentation and communication, even though documenting PREMIS-METS decisions in a METS registered profile may be more time-consuming than using local documentation and sharing revisions requires re-registration. The public, central location of registered METS profiles makes them widely accessible, easy to locate, and indicative of a high degree of repository transparency. Additionally, the shared, regular structure of METS profiles aids in comprehending and comparing differences in PREMIS-METS strategies recorded in various profiles.[13] In short, the PREMIS-implementing METS registered profile is a simple but powerful mechanism for sharing and comparing PREMIS-METS decisions and, as a result, is particularly well suited to fostering PREMIS-METS best practices.

Despite their shared format, PREMIS-implementing METS profiles currently registered with the Library of Congress exhibit varying levels of detail about PREMIS-METS decisions: some offer rich detail and others only state that PREMIS is used in METS documents that conform to the profile. This unevenness in detail reveals a primary difficulty PREMIS-METS users currently face in developing METS profiles. Those creating a PREMIS-implementing METS profile are given little guidance about what PREMIS-METS decisions to document in their profile. The "METS Profile Components" document specifies the structure of a profile and proposes some general questions that repositories may choose to answer in their METS profile. However, the "METS Profile Components" document focuses on profile format and basic METS issues and understandably does not address the particular issues which arise when meshing PREMIS with METS. Given the potential of the PREMIS-implementing METS profile as a mechanism for converging on shared practice and the fact that PREMIS-METS implementers face a common set of core metadata decisions, it became clear that it would be both possible and highly beneficial to provide greater support to PREMIS users as they document their decisions in a METS profile.

 

A tool for PREMIS-implementing METS profiles

To help those implementing, or considering implementing, PREMIS in METS, the author designed a tool in the form of a checklist. This checklist presents and provides examples of 13 key PREMIS-METS issues that implementers should consider documenting in a METS profile when designing or revisiting a metadata strategy. This checklist is not a set of best practices; rather, it is more akin to an airplane pilot's pre-flight checklist: a mnemonic device and reference that prevents the user from overlooking an important part of a complex activity.

This 13-point checklist allows users to quickly benefit from a significant body of PREMIS-METS experience by consolidating (though certainly not replacing) a wide range of PREMIS-METS knowledge otherwise scattered across a host of resources, including the Guidelines for Using PREMIS with METS for Exchange,[14] case studies published in journals,[15] conference and workshop presentations, local documentation on institutional Web sites, and the "METS Profile Components" document.[16] The checklist is further enriched by insights and examples gleaned from an analysis, conducted by the author in 2009, of PREMIS usage in METS profiles registered with the Library of Congress (for more about this analysis, see Appendix A).

Each of the points in the checklist is followed by a discussion and one or more examples from METS profiles registered by institutions such as the National Library of Australia, University of Illinois, University of California San Diego, and the University of Southampton. XML examples have been condensed here (ellipses indicate where sections of XML have been omitted) but a standalone version of the checklist with expanded XML examples is available online at the PREMIS Maintenance Activity Web site.[17] The items in this checklist generally follow the order of elements stipulated in the "METS Profile Components" document.

 

The checklist

1. How does the profile relate to other METS profiles?

Discussion: An analysis of METS registered profiles showed that several repositories use more than one METS profile. It is therefore useful to specify the relationship of one METS profile to another (e.g., is it a sub-profile, parent or generic profile, or a sibling profile?). Locating detailed PREMIS-METS documentation in one profile which can then be inherited by other profiles that are more specific to a particular content type or object type can be a useful documentation strategy. It is also important to note if a profile supersedes an older version of the same profile.

Examples: Both the University of Illinois Urbana-Champaign and the institutions that created the Australian METS Profile use parent and sub-profiles that carefully specify the relationships between METS profiles. For example, the abstract of University of Illinois' profile for Web site captures notes:

This profile inherits much of its content from the 'ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability.' Unless otherwise noted below this profile must conform to the same rules as the parent profile.[18]

The abstract for the Australian METS Profile notes:

This profile describes the rules and requirements for using METS as an exchange format to support the collection and preservation of and access to content in Australian digital repositories. It is a generic profile not specific to a particular system or implementation. Repositories will need to develop and register sub-profiles that detail implementation-specific requirements.[19]

2. What schemas (PREMIS, MODS, MIX) are used and where are they located?

Discussion: According to the METS Profile Components document, a profile should explicitly record whether it uses PREMIS or other metadata schemas such as MODS or MIX, and where each schema is located.

Examples: The University of Southampton's RLUK 19th Century Pamphlets METS Profile notes their use of PREMIS (see Figure 1).[20]

 
Figure 1. University of Southampton notes their PREMIS use.
 

3. What controlled vocabularies for PREMIS semantic units are used and where are they located?

Discussion: The PREMIS Data Dictionary recommends placing controlled vocabularies in a common space or service where a vocabulary can be re-used by multiple repositories.[21] If a repository uses shared vocabularies, its METS profile should point to the relevant vocabularies and registries. If it is not possible to use a shared vocabulary service, repositories can document controlled vocabularies for PREMIS semantic units in the METS profile itself or in separate local documentation referenced in the profile. PREMIS semantic units that lend themselves to controlled vocabularies include but are not limited to: eventType, agentType, format, objectCategory, storageMedium, copyrightStatus, and rightsBasis.

Example: At the time of writing, there is no single central location where users can register and find shared controlled vocabularies for PREMIS semantic units or METS elements. However, the Library of Congress has recently developed an Authorities & Vocabularies service, www.id.loc.gov, which now supports shared controlled vocabularies for cryptographic hash functions, preservation events, and preservation level roles.

The University of California San Diego Simple Object METS profile[22] specifies usage of several controlled vocabularies recorded in separate documents on the UCSD Web site (see Figure 2).

 
Figure 2. A UCSD profile references multiple controlled vocabularies described in documents on their Web site.
 

4. Is PREMIS information wrapped into or referenced from the METS document?

Discussion: All PREMIS-implementing METS Profiles registered with the Library of Congress at the time of writing wrap metadata from other schemas into the METS document using METS mdWrap. However, it is also possible to store PREMIS semantic units outside of the METS document and reference them using the METS mdRef element. The Guidelines ultimately leave it up to the implementer to choose whether to wrap or reference PREMIS information (or both). However, if all PREMIS metadata is outside of the METS document, the Guidelines state that PREMIS linking identifier elements should be used to connect PREMIS and METS documents since ID/IDRefs may break. Regardless of the strategy adopted, this decision should be explicitly documented in the METS profile.
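As a rough illustration of the wrap-versus-reference distinction, the following Python sketch reports, for each administrative metadata section in a METS document, whether PREMIS content is wrapped (mdWrap) or referenced (mdRef). The sample document, its IDs, and the example.org URL are hypothetical and are not taken from any registered profile; the function name is likewise an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical sample: one PREMIS section wrapped inline, one referenced externally.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:amdSec ID="AMD1">
    <mets:techMD ID="TECH1">
      <mets:mdWrap MDTYPE="PREMIS:OBJECT"><mets:xmlData/></mets:mdWrap>
    </mets:techMD>
    <mets:digiprovMD ID="PROV1">
      <mets:mdRef MDTYPE="PREMIS:EVENT" LOCTYPE="URL"
                  xlink:href="http://example.org/premis/events.xml"/>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>"""


def premis_inclusion(mets_xml):
    """Map each metadata section ID to 'wrapped' or 'referenced' for PREMIS content."""
    root = ET.fromstring(mets_xml)
    result = {}
    for amd in root.iter(f"{{{METS_NS}}}amdSec"):
        for section in amd:              # techMD, rightsMD, sourceMD, digiprovMD
            for child in section:        # mdWrap or mdRef
                if not child.get("MDTYPE", "").startswith("PREMIS"):
                    continue
                if child.tag == f"{{{METS_NS}}}mdWrap":
                    result[section.get("ID")] = "wrapped"
                elif child.tag == f"{{{METS_NS}}}mdRef":
                    result[section.get("ID")] = "referenced"
    return result


print(premis_inclusion(SAMPLE))  # {'TECH1': 'wrapped', 'PROV1': 'referenced'}
```

A report of this kind could be compared against the strategy a profile declares, though a real check would also need to handle sections that carry no ID.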

Example: Figure 3 below shows a typical example of PREMIS wrapped in the METS manifest from a sample METS document appended to the University of Illinois' ECHO Dep METS Profile for Web Site Captures.[23]

 
Figure 3. The University of Illinois' ECHO Dep METS Profile for Web Site Captures wraps PREMIS into METS using mdWrap.
 

The University of Illinois documents the decision to wrap PREMIS into METS in a parent profile, the ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability (see Figure 4).[24] By stating its conformance to the parent profile, the Web Site Captures profile automatically inherits this requirement.

 
Figure 4. The parent profile (University of Illinois' ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability) mandates wrapping PREMIS into METS.
 

5. Is PREMIS information bundled or distributed in several places in the METS document?

Discussion: As the Guidelines point out, there are several places where PREMIS can be placed within a METS document. Because these placement decisions are complex, the decisions, and possibly the rationale for them, should be documented in a METS profile.

If keeping all PREMIS semantic units together, the Guidelines state that best practice is placing the entire package in digiprovMD with the premis element as a container. If splitting up PREMIS information across multiple sections of METS, the Guidelines state that the premis element should not be used. Instead, it is best practice to place:

  • premis:event under digiprovMD
  • premis:rights under rightsMD
  • premis:object under techMD or digiprovMD
  • premis:agent under digiprovMD or rightsMD (depending on whether the agent is connected with events or rights)
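The placement rules listed above can be restated as a small lookup table. The Python sketch below is only an illustration of that list for the distributed strategy; the table and function names are hypothetical, and section names follow the METS schema spelling (digiprovMD).

```python
# Placement rules for distributed PREMIS, restating the list above.
RECOMMENDED_SECTIONS = {
    "event": {"digiprovMD"},
    "rights": {"rightsMD"},
    "object": {"techMD", "digiprovMD"},
    "agent": {"digiprovMD", "rightsMD"},
}


def check_placement(placements):
    """Return the (premis_entity, mets_section) pairs outside the recommended sections.

    `placements` is a list of (premis_entity, mets_section) tuples describing
    where a profile puts each PREMIS entity.
    """
    return [
        (entity, section)
        for entity, section in placements
        if section not in RECOMMENDED_SECTIONS.get(entity, set())
    ]


# premis:object under techMD and premis:event under digiprovMD fit the rules.
print(check_placement([("object", "techMD"), ("event", "digiprovMD")]))  # []
# A hypothetical profile placing premis:rights under techMD would be flagged.
print(check_placement([("rights", "techMD")]))  # [('rights', 'techMD')]
```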

Example: All but one of the profiles examined (the Library of Congress' 2005 METS Profile for Recorded Events) distributed PREMIS semantic units in various METS amdSec subelements rather than using a single "package" of PREMIS information. For example, the National Library of Australia has placed premis:object under techMD and premis:event in digiprovMD (see Figure 5 below).[25]

 
Figure 5. The National Library of Australia splits up PREMIS information among multiple METS sections.
 

6. Is PREMIS information placed in separate amdSec elements or amdSec subelements?

Discussion: According to the Guidelines, implementers can choose to place PREMIS information within separate METS amdSecs or within one amdSec in different sub-elements (e.g. techMD, digiprovMD, rightsMD). The key thing to keep in mind here is that referencing one amdSec with PREMIS information in different sub-elements means referencing the amdSec and all of its children. Consider documenting the rationale for determining what goes in each amdSec or sub-element section and if there is a standard alphanumeric scheme that is used to identify sections.
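The point about referencing an amdSec versus one of its sub-elements can be made concrete with a short Python sketch. It assumes a hypothetical METS fragment with a single amdSec and made-up IDs; the function name is also an assumption.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical fragment: one amdSec holding three sub-elements.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="ADM1">
    <mets:techMD ID="TECH1"/>
    <mets:rightsMD ID="RIGHTS1"/>
    <mets:digiprovMD ID="PROV1"/>
  </mets:amdSec>
</mets:mets>"""


def sections_referenced(mets_xml, admid):
    """Return the IDs of the metadata sections that a single ADMID value pulls in."""
    root = ET.fromstring(mets_xml)
    for amd in root.iter(f"{{{METS_NS}}}amdSec"):
        if amd.get("ID") == admid:
            # Pointing at the amdSec itself brings in every child section.
            return [child.get("ID") for child in amd]
        for child in amd:
            if child.get("ID") == admid:
                # Pointing at a sub-element brings in only that section.
                return [admid]
    return []


print(sections_referenced(SAMPLE, "ADM1"))   # ['TECH1', 'RIGHTS1', 'PROV1']
print(sections_referenced(SAMPLE, "PROV1"))  # ['PROV1']
```

This is why a documented ID scheme matters: whether a file points at "ADM1" or at "PROV1" changes how much metadata is asserted to apply to it.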

Examples: University of California San Diego's theses and dissertations profile places different PREMIS entities in separate amdSec sub-elements and assigns each an ID such as "ADM1" or "ADM2" (Figure 6).[26]

 
Figure 6. The University of California San Diego's Electronic Theses and Dissertations Profile places PREMIS in separate amdSec sub-elements.
 

7. Is technical metadata recorded in separate techMD sections or with PREMIS objectCharacteristicsExtension?

Discussion: Technical, format-specific metadata such as that recorded in MIX or textMD can be wrapped or referenced in separate METS techMD sections. Alternatively, technical metadata can be wrapped or referenced in the PREMIS semantic unit objectCharacteristicsExtension. The Guidelines state that choosing either location (or both) is an implementation-specific decision. Repositories should document their decision in their profile and, if applicable, the naming scheme for their techMD sections so that the correct section can be referenced elsewhere in the METS document.

Examples: Figure 7 below shows an example of how the University of Illinois' ECHO Dep Profile for Preservation and Digital Repository Interoperability mandates wrapping technical metadata in separate techMD sections.[27] The profile then illustrates this requirement in an example METS document in an appendix to the profile.

 
Figure 7. The University of Illinois' ECHO Dep Profile for Preservation and Digital Repository Interoperability wraps technical metadata in separate techMD sections.
 

8. What PREMIS semantic units does the profile require or recommend?

Discussion: A repository should record any required or recommended PREMIS semantic units in the structural requirements section of a METS profile along with any required values and/or conditions for use of the semantic unit. If different sets of PREMIS semantic units are required for several different scenarios, content types, or functions, the PREMIS semantic units and the circumstances under which they apply can be articulated either in one METS profile or in separate, specific sub-profiles of a more general parent METS profile. As more METS profiles are developed, best practices that list required semantic units for minimal and optimum compliance may be based on patterns in profiles for similar content types or functions.

Examples: The University of California San Diego's Electronic Theses and Dissertations Profile lists several required PREMIS semantic units and values in the structural requirements section (Figure 8).[28]

 
Figure 8. Several required PREMIS semantic units and values are stipulated in the University of California San Diego's Electronic Theses and Dissertations Profile.
 

9. Are relationships between objects expressed using METS div elements, PREMIS relationships, or both?

Discussion: The Guidelines state that hierarchical structural relationships between objects should be expressed as nested div elements in the METS schema. However, if the purpose of your profile is preservation or you are expressing derivative (e.g. image B derived from image A) relationships, the Guidelines also recommend using PREMIS relationships.

Examples: The Australian METS Profile 1.0 uses the METS fileSec and structMap and PREMIS relationship semantic units to express derivative and other relationships (Figure 9).[29]

 
Figure 9. The Australian METS Profile 1.0 uses the METS fileSec and structMap and PREMIS relationship semantic units to express relationships.
 

10. What level of object does PREMIS information describe?

Discussion: PREMIS information should be connected to the object or objects to which it relates in the METS structural map. However, it is helpful to explicitly state in a profile in a more human-readable form what level of object PREMIS information describes. A diagram such as Figure 3 in Angela Dappert and Markus Enders' case study, "Using METS, PREMIS, and MODS for Archiving eJournals," may be valuable for describing how different metadata sections fit together in complex structures.[30]

Examples: The University of Southampton's RLUK 19th Century Pamphlets Profile states the level at which PREMIS information is recorded:

Technical metadata [is] at the file level, extracted from standard file information. There will be one instance per file of any kind listed in the fileSec. This metadata will be contained in an amdSec linked to the relevant file element.[31]

11. How are PREMIS linking identifiers, IDREFs, and PREMIS identifiers used?

Discussion: The Guidelines state that when PREMIS information is wrapped into the METS document, implementers should use the METS ID/IDRefs mechanism to connect files in the fileSec with the PREMIS information that relates to those files. If PREMIS information is outside of the METS document, using PREMIS identifier semantic units in addition to METS ID/IDRefs is recommended because IDRefs may break and PREMIS identifiers allow the implementer to record more detailed information if necessary. Repositories should document their strategy for linking between elements in PREMIS and METS. Repositories should also document how PREMIS identifiers, if used, are generated and at what level the identifiers are resolvable (locally, globally, or within each METS document).
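To illustrate how ID/IDRef links can break, the following Python sketch lists file elements whose ADMID values do not resolve to any ID inside an amdSec of the same METS document. The sample document and its IDs are hypothetical, as is the function name; this is a minimal sketch and not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical sample: FILE2 points at an ID that does not exist.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="ADM1">
    <mets:techMD ID="TECH1"/>
    <mets:digiprovMD ID="PROV1"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE1" ADMID="TECH1 PROV1"/>
      <mets:file ID="FILE2" ADMID="TECH9"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def dangling_admids(mets_xml):
    """Map each file ID to the ADMID references that resolve to nothing."""
    root = ET.fromstring(mets_xml)
    # Collect every ID declared on an amdSec or on anything nested inside one.
    known_ids = set()
    for amd in root.iter(f"{{{METS_NS}}}amdSec"):
        for element in amd.iter():
            if element.get("ID"):
                known_ids.add(element.get("ID"))
    missing = {}
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        # ADMID is a whitespace-separated list of ID references.
        unresolved = [ref for ref in file_el.get("ADMID", "").split()
                      if ref not in known_ids]
        if unresolved:
            missing[file_el.get("ID")] = unresolved
    return missing


print(dangling_admids(SAMPLE))  # {'FILE2': ['TECH9']}
```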

Examples: All METS profiles analyzed used METS IDs. Roughly half of the profiles also used PREMIS linking identifiers. Only one repository, University of Illinois, used PREMIS schema IDREFs (e.g. RelObjectXmlID). Figure 10 shows an example of the use of a PREMIS linking identifier from the Australian METS Profile.[32]

 
Figure 10. Example of use of a PREMIS linking identifier from the Australian METS Profile.
 

12. How are PREMIS-METS redundancies handled?

Discussion: A necessary result of a design requirement of both PREMIS and METS, namely that both are designed to be independent and modular so that use of one standard does not depend on use of another, is that PREMIS calls for users to record certain key information that is also recorded in METS. The Guidelines list some of these redundancies (see Table 1 below).[33]

 
                          | PREMIS                                 | METS
Size                      | in size under objectCharacteristics    | an attribute of file in the fileGrp
CHECKSUM and CHECKSUMTYPE | in fixity under objectCharacteristics  | attributes of file
MIMETYPE                  | in format under objectCharacteristics  | an attribute of file
 
Table 1: Some redundancies between PREMIS and METS
 

In most cases, the PREMIS semantic unit is more expressive than the METS element and therefore the Guidelines recommend using PREMIS or recording the information redundantly in PREMIS and METS, especially if the purpose of the metadata is preservation. The decision to record this information in PREMIS, METS or both should be specified in the METS profile.
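When information is recorded redundantly, the two copies can drift apart. The Python sketch below compares the METS file attributes SIZE, CHECKSUM, and CHECKSUMTYPE with the corresponding PREMIS values under objectCharacteristics and reports disagreements. It assumes PREMIS version 2 wrapped in the METS document and linked through ADMID; the sample fragment, its IDs, and its checksum values are hypothetical, and the fragment is trimmed for brevity rather than schema-valid. MIMETYPE is left out because a PREMIS format name is not necessarily a MIME type.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "premis": "info:lc/xmlns/premis-v2"}

# Hypothetical fragment in which the two checksum values disagree.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:premis="info:lc/xmlns/premis-v2">
  <mets:amdSec ID="ADM1"><mets:techMD ID="TECH1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT"><mets:xmlData>
      <premis:object><premis:objectCharacteristics>
        <premis:fixity>
          <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
          <premis:messageDigest>bbb222</premis:messageDigest>
        </premis:fixity>
        <premis:size>2048</premis:size>
      </premis:objectCharacteristics></premis:object>
    </mets:xmlData></mets:mdWrap>
  </mets:techMD></mets:amdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="FILE1" ADMID="TECH1" SIZE="2048"
               CHECKSUM="aaa111" CHECKSUMTYPE="MD5"/>
  </mets:fileGrp></mets:fileSec>
</mets:mets>"""


def redundancy_mismatches(mets_xml):
    """Map file IDs to {METS attribute: (METS value, PREMIS value)} where they differ."""
    root = ET.fromstring(mets_xml)
    report = {}
    for file_el in root.iterfind(".//mets:file", NS):
        for admid in file_el.get("ADMID", "").split():
            section = root.find(f".//*[@ID='{admid}']")
            if section is None:
                continue
            chars = section.find(".//premis:objectCharacteristics", NS)
            if chars is None:
                continue
            fixity = "premis:fixity/premis:"
            premis_values = {
                "SIZE": chars.findtext("premis:size", namespaces=NS),
                "CHECKSUM": chars.findtext(fixity + "messageDigest", namespaces=NS),
                "CHECKSUMTYPE": chars.findtext(fixity + "messageDigestAlgorithm",
                                               namespaces=NS),
            }
            # Only compare values that are present on both sides.
            diffs = {name: (file_el.get(name), value)
                     for name, value in premis_values.items()
                     if file_el.get(name) is not None and value is not None
                     and file_el.get(name) != value}
            if diffs:
                report[file_el.get("ID")] = diffs
    return report


print(redundancy_mismatches(SAMPLE))  # {'FILE1': {'CHECKSUM': ('aaa111', 'bbb222')}}
```

A repository that records values in only one place would see no comparisons at all, which is itself a useful confirmation of the strategy its profile documents.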

Examples: METS profiles currently registered with the Library of Congress showed great variance in handling redundancies; there seemed to be no one particularly popular strategy. For example, the University of California San Diego's Complex Object Profile uses PREMIS to record size and checksum information and uses both METS and PREMIS for mimetype information. The University of Southampton's RLUK 19th Century Pamphlets Profile, on the other hand, records size in both PREMIS and MIX, and checksum and mimetype information in METS.

13. What metadata tools or applications are used?

Discussion: According to the METS Profile Components document, a repository should record any tools used in the creation, transformation, or preservation of its PREMIS or METS metadata in its METS profile. This information is useful to future curators of the data who may not otherwise know that a specific tool was used. If the METS Profile is shared, other repositories could also benefit from knowing that a given tool is available and being used for a particular metadata operation.

Examples: In its ECHO Dep METS Profile for Web Site Captures, the University of Illinois declares that it inherits the tools specified in its parent profile, the ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability, and additionally records its use of Web Archivists Workbench (see Figure 11 below).[34]

 
Figure of Profile
Figure 11. The ECHO Dep METS Profile for Web Site Captures declares its use of Web Archivists Workbench.
 
 

Conclusion

As digital preservation repositories grow in size and complexity and as inter-repository exchange of digital objects moves from an exploratory to a routine activity, repository and metadata interoperability becomes critical. One way of achieving greater interoperability is through the development of shared metadata practices, and in particular, due to wide usage, shared PREMIS-in-METS practices. The development of PREMIS-METS best practices would positively affect inter-repository data exchange, tool development, repository self-assessment, and trusted digital repository audits.

Consistently documenting and sharing PREMIS-METS decisions in a central pool of METS profiles — namely the Library of Congress' registry of METS profiles — is one way that a shared view of PREMIS-METS best practices for particular content types or functions could be reached. The checklist presented in this paper helps implementers document PREMIS-METS decisions in a METS profile; it incorporates information from an analysis of METS profiles registered with the Library of Congress as well as an array of PREMIS-METS resources. By supporting documentation of PREMIS-METS issues in METS registered profiles, this checklist could also support comparisons of metadata strategies and convergence on PREMIS-METS best practices.

 

Appendix A: Highlights from an analysis of PREMIS usage in METS Registered Profiles

In July and August of 2009, the author conducted a study of PREMIS usage in METS profiles registered with the Library of Congress. This study examined commonalities and divergences in PREMIS-METS practice as recorded in the 15 METS registered profiles that currently implement PREMIS. Some highlights of this analysis are presented here:

  • Increasing adoption of PREMIS: METS profiles show increasing adoption of PREMIS over time. Only 2 of the 4 profiles registered in 2005 implemented PREMIS, but all 4 profiles registered in 2009 used PREMIS. Currently, 15 of the 23 profiles registered with the Library of Congress since the release of PREMIS in 2005 implement PREMIS. Though the sample is small, it reflects a logical adoption trajectory (see Figure 12).
 
Chart
Figure 12. Percent of METS profiles registered with the Library of Congress per year that implement PREMIS.
 
  • Redundantly required information: For information redundantly required by both the PREMIS and METS schemas such as size and checksum, METS registered profiles currently show a lack of agreement about whether to record using the PREMIS or METS element/semantic unit or both. The Guidelines provide guidance about which element or semantic unit to use but ultimately leave the choice up to implementer discretion.
  • Splitting up PREMIS information: Though the Guidelines are neutral on whether to distribute or keep all PREMIS information together, most METS Registered Profiles split up PREMIS metadata among several METS amdSec subelements (techMD, digiProvMD, and rightsMD) rather than keeping all PREMIS information together in one "package."
  • Increasing adoption of PREMIS rights: While the PREMIS object, agent, and event entities were frequently used from the beginning, the PREMIS rights entity was little used until the release of PREMIS v. 2 in 2008, which included more support for rights information. Three of the four profiles registered since PREMIS v. 2 have mandated PREMIS rights.
  • Use of parent profiles: Five profiles examined were used as part of metadata strategies that employed parent and sub-profiles. Parent profiles or "generic" METS profiles apply to all of a repository's METS documents and provide a consistent framework of metadata. More detailed metadata guidance is given in specific sub-profiles that are used in conjunction with the parent profile. A notable example of this strategy is the Australian METS Profile, which was cooperatively developed as a base "template" for all Australian METS implementations.[35]
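The adoption figures quoted in the first highlight above can be expressed as percentages with a short calculation. This is a minimal sketch that uses only the counts stated in the text; the per-year counts for 2006 through 2008 that underlie Figure 12 are not given here and are therefore not included. The dictionary labels and the function name percent are hypothetical.

```python
# (profiles implementing PREMIS, profiles registered), as stated in the text.
counts = {
    "registered in 2005": (2, 4),
    "registered in 2009": (4, 4),
    "all registered since 2005": (15, 23),
}


def percent(part, total):
    """Share of profiles implementing PREMIS, as a percentage to one decimal."""
    return round(100 * part / total, 1)


for label, (with_premis, registered) in counts.items():
    print(f"{label}: {percent(with_premis, registered)}%")
# registered in 2005: 50.0%
# registered in 2009: 100.0%
# all registered since 2005: 65.2%
```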
 

Notes

[1] This research was conducted as part of work with Brian Lavoie, who gave the author valuable encouragement and feedback. Much of the research was completed in an internship with OCLC Research that was part of an IMLS grant-funded project at the University of Michigan School of Information, of which the author is a recent graduate. The author then joined OCLC Research as a Research Assistant and completed this work in that capacity. Some preliminary findings of this research were presented in a poster at Sustainable Archives: AUSTIN 2009, the Joint Annual Meeting of the Society of American Archivists and the Council of State Archivists.

[2] It speaks to the importance of these goals that progress is being made toward them in spite of the complications of disparate metadata practices. For example, the Hub and Spoke Framework Tool Suite and the Towards Interoperable Preservation Repositories project are developing two different strategies for inter-repository exchange.

[3] Joseph Pawletko, Priscilla Caplan, Bill Kehoe. "Towards Interoperable Preservation Repositories," DLF Spring Forum May 5, 2009. Available at: http://wiki.fcla.edu:8000/TIPR/uploads/2/dlf-sp2009-tipr.1.pdf. (Accessed December 28, 2009.) Hereafter referenced as Pawletko, Caplan, and Kehoe.

[4] Guidelines for Using PREMIS with METS for Exchange. Revised September 17, 2008. Available at: http://www.loc.gov/standards/premis/guidelines-premismets.pdf. (Accessed November 9, 2009.) Hereafter referred to as the Guidelines.

[5] Developed at the Florida Center for Library Automation for the Library of Congress, the PREMIS-METS Toolbox allows users to validate their PREMIS-in-METS XML implementations against a machine interpretation of the Guidelines, create automatic description of a file in PREMIS, and convert PREMIS XML into METS XML and vice versa. PREMIS in METS Toolbox: http://pim.fcla.edu/.

[6] Guenther and Wolfe carefully delineate the scope of the Guidelines and express the need for more specific agreements on encoding practices: "Particular implementations will require more controlled decisions based on agreement between exchange partners." Guenther, Rebecca, Robert Wolfe. "Integrating Metadata Standards to Support Long-Term Preservation of Digital Assets: Developing Best Practices for Expressing Preservation Metadata in a Container Format," iPres 2009: The Sixth International Conference on the Preservation of Digital Objects, October 5-6, 2009. In iPres 2009: Proceedings (2009) p. 83-89.

[7] McDonough, Jerome. "Structural Metadata and the Social Limitation of Interoperability: A Sociotechnical View of XML and Digital Library Standards Development," Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). doi:10.4242/BalisageVol1.McDonough01.

[8] Guenther, Rebecca. "Battle of the Buzzwords: Flexibility vs. Interoperability When Implementing PREMIS in METS," D-Lib Magazine July/August 2008. doi:10.1045/july2008-guenther. (Accessed November 9, 2009.)

[9] In "Structural Metadata and the Social Limitation of Interoperability" (see citation in note 7), McDonough emphasizes the value of translations between metadata schemas (e.g. crosswalks) as a way of counteracting the flexibility of metadata standards. Though initially de-emphasizing the feasibility of shared practices or rules for description, McDonough later notes that establishing shared practices or rules for structural description might be a useful or even necessary step toward developing robust translational tools.

[10] Judith Pearce, David Pearson, Scott Yeadon and Megan Williams. Report of the METS Profile Development Project. Australian Partnership for Sustainable Repositories (APSR). November 2007. p. 13. Available at: http://www.apsr.edu.au/nla-mets/mets_profile_report.pdf.

[11] Library of Congress. METS Registered Profiles: http://www.loc.gov/standards/mets/mets-registered-profiles.html.

[12] METS Profile Components: http://www.loc.gov/standards/mets/profile_docs/components.html.

[13] An imperfect but still very useful "Index to registered METS Profiles by features used" allows you to view all the profiles that use, for example, Web Archivists Workbench, MODS, or MARC Country Codes: http://www.loc.gov/standards/mets/mets-divTree.html.

[14] Guidelines.

[15] Relevant case studies abound. One particularly rich example is Angela Dappert and Markus Enders, "Using METS, PREMIS and MODS for Archiving eJournals," D-Lib Magazine September/October 2008. Available at: doi:10.1045/september2008-dappert.

[16] METS Profile Components: http://www.loc.gov/standards/mets/profile_docs/components.html.

[17] A Checklist For Documenting PREMIS-METS Decisions In a METS Profile: http://www.loc.gov/standards/premis/premis_mets_checklist.pdf.

[18] University of Illinois at Urbana-Champaign's ECHO Dep METS Profile for Web Site Captures: http://www.loc.gov/standards/mets/profiles/00000016.xml.

[19] The Australian METS Profile: http://www.loc.gov/standards/mets/profiles/00000018.xml.

[20] The University of Southampton's RLUK 19th Century Pamphlets METS Profile: http://www.loc.gov/standards/mets/profiles/00000024.xml.

[21] PREMIS Data Dictionary. Version 2.0, March 2008. p. 18. Available at: http://www.loc.gov/standards/premis/v2/premis-report-2-0.pdf.

[22] University of California San Diego's Simple Object Profile: http://www.loc.gov/standards/mets/profiles/00000027.xml.

[23] See note 18.

[24] University of Illinois Urbana-Champaign's ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability: http://www.loc.gov/standards/mets/profiles/00000015.xml.

[25] See note 19.

[26] University of California San Diego's Electronic Theses and Dissertations Profile: http://www.loc.gov/standards/mets/profiles/00000026.xml.

[27] See note 24.

[28] See note 26.

[29] See note 19.

[30] See note 15.

[31] See note 20.

[32] See note 19.

[33] Guidelines.

[34] See note 18.

[35] For more information see Judith Pearce, David Pearson, Megan Williams, and Scott Yeadon, "The Australian METS Profile - A Journey About Metadata," D-Lib Magazine, March/April 2008. Available at: doi:10.1045/march2008-pearce. (Accessed November 10, 2009.)

 

About the Author


Sally Vermaaten is an Information Analyst with the Data and Metadata Team at Statistics New Zealand. Previously, she was a Research Assistant at OCLC, where she assisted Brian Lavoie with digital preservation research, data mining, and PREMIS Editorial Committee projects. Sally received a Master of Science in Information in 2010 from the University of Michigan School of Information. Prior to OCLC, Sally worked at the Harvard Law School Library and the University of Michigan Special Collections.

 
 