D-Lib Magazine
July/August 2003

Volume 9 Number 7/8

ISSN 1082-9873

Identifying Metadata Elements with URIs

The CORES Resolution

 

Thomas Baker
Birlinghoven Library, Fraunhofer-Gesellschaft
<thomas.baker@bi.fhg.de>

Makx Dekkers
PricewaterhouseCoopers
<mail@makxdekkers.com>

Abstract

On 18 November 2002, at a meeting organised by the CORES Project (Information Society Technologies Programme, European Union), several organisations regarded as maintenance authorities for metadata elements achieved consensus on a resolution to assign Uniform Resource Identifiers (URIs) to metadata elements as a useful first step towards the development of mapping infrastructures and interoperability services. The signatories of the CORES Resolution agreed to promote this consensus in their communities and beyond and to implement an action plan in the following six months. Six months having passed, the maintainers of GILS, ONIX, MARC 21, CERIF, DOI, IEEE/LOM, and Dublin Core report on their implementations of the resolution and highlight issues of relevance to establishing good-practice conventions for declaring, identifying, and maintaining metadata elements more generally. In June 2003, the resolution was also endorsed by the maintainers of UNIMARC.

Introduction

The "Resolution on Metadata Element Identifiers", or CORES Resolution, is an agreement among the maintenance organisations for several major metadata standards — GILS, ONIX, MARC 21, UNIMARC, CERIF, DOI®, IEEE/LOM, and Dublin Core — to identify their metadata elements using Uniform Resource Identifiers (URIs) [CORES-RESOLUTION]. The Uniform Resource Identifier, defined in the IETF RFC 2396 as "a compact string of characters for identifying an abstract or physical resource", has been promoted for use as a universal form of identification by the World Wide Web Consortium [URI, BERNERS-LEE]. The CORES Resolution, formulated at a meeting organised by the European project CORES in November 2002, included a commitment to publicise the consensus statement to a wider audience of metadata standards initiatives and to implement key points of the agreement within the following six months — specifically, to define URI assignment mechanisms, assign URIs to elements, and formulate policies for the persistence of those URIs.

This article marks the passage of six months by reporting on progress made in implementing this common action plan. After presenting the text of the CORES Resolution and its three "clarifications", the article summarises the position of each signatory organisation towards assigning URIs to its metadata elements, noting any practical or strategic problems that may have emerged. These progress reports were based on input from Thomas Baker, José Borbinha, Eliot Christian, Erik Duval, Keith Jeffery, Rebecca Guenther, and Norman Paskin. The article closes with a few general observations about these first steps towards the clarification of shared conventions for the identification of metadata elements and perhaps, one can hope, towards the ultimate goal of improving interoperability among a diversity of metadata communities.

1. The CORES "Resolution on Metadata Element Identifiers"

Whereas:
  • our metadata standards have "elements" — units of meaning comparable and mappable to elements of other standards,
We agree:
  • to assign Uniform Resource Identifiers to our elements;
  • to articulate and publish specific policies regarding the stability, persistence, and maintenance of the URIs assigned to the elements.
Clarifications
  1. This resolution promotes the use of URIs for identifying metadata elements. However, there is no expectation that these URIs will be used to represent those elements within particular application environments — e.g., to include them in instance metadata records or use them in SQL queries. Rather, the intent is to offer a common citation mechanism usable, when needed, for purposes of interoperability across standards.
  2. While the resolution focuses on the identification of individual metadata elements, URIs should also be used to identify other relevant entities at various levels of granularity, such as sets of elements (schemas) and the terms and sets of controlled vocabularies of metadata values. Deciding which of its own entities are important or salient enough to be assigned URIs is the prerogative of a particular Standards-Developing Organization.
  3. This resolution specifies the use of URIs as identifiers with no requirement or expectation that those URIs will reference anything on the World Wide Web, such as documentation pages or machine-understandable representations of metadata elements [CORES-RESOLUTION].

2. GILS

The Global Information Locator Service (GILS) standard is a simplified profile of the international standard for networked search, ISO 23950 (also known as ANSI/NISO Z39.50) [GILS, Z39-50]. A GILS-compliant server maps local data elements to well-known bibliographic concepts. This semantic mapping achieves broad interoperability of search across all manner of data and information. In addition, a GILS-compliant server must, on request, convert local structured data into one or another canonical form using a specific schema. Retrieval records generated with this GILS schema are known as "locator records" and are designed to help search catalogs, directories, and other types of metadata.

The GILS search concepts are part of the ISO 23950 "attribute set" known as Bib-1 [BIB1], and the data elements of the GILS schema were originally defined in the GILS Profile [GILS-PROFILE]. These concepts and data elements were formalized to follow the principles of ISO 11179, an international standard for metadata registries, as part of the process of establishing the ISO Basic Semantic Register (BSR) [GILS-SEMANTICS]. The international standards process has not converged on a common URI mechanism for referencing the entries in such a metadata registry. However, the GILS maintenance office (at the US Geological Survey) has suggested an approach that would be in keeping with the CORES Resolution.

The suggested approach builds on two crucial characteristics of the BSR: it can be reasonably well represented as an XML Schema or RDF Schema, and each of its many thousands of concepts and data elements is required to have a unique identifying number. For instance, a "keywords" data element could reference the BSR-registered GILS data element named "Subject-Terms-Uncontrolled". This GILS data element has the unique identifier "1133.541.1207.96", where "1133" corresponds to the English token "Information Resource", "541" to "Subject", "1207" to "Uncontrolled Term", and "96" to "Group". This data element could be referenced in an RDF Schema version of the ISO BSR through a URI such as

http://www.gils.net/bsr-gils.rdfs#1133.541.1207.96
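As a minimal sketch of this suggestion, the following Python joins the numeric components of a BSR identifier and attaches them as a fragment to the schema URI. The function names `bsr_uri` and `describe` and the small token table are illustrative assumptions; only the base URI, the example identifier, and the four English tokens come from the text above.

```python
# Base URI of the suggested RDF Schema version of the ISO BSR (from the text).
BSR_BASE = "http://www.gils.net/bsr-gils.rdfs"

# English tokens for the components of the example identifier (from the text).
# A real register would hold many thousands of entries; this table is a stub.
TOKENS = {
    1133: "Information Resource",
    541: "Subject",
    1207: "Uncontrolled Term",
    96: "Group",
}


def bsr_uri(components):
    """Join numeric BSR components with dots and attach them as a fragment."""
    identifier = ".".join(str(c) for c in components)
    return f"{BSR_BASE}#{identifier}"


def describe(components):
    """Return the English token for each component, in order."""
    return [TOKENS[c] for c in components]


subject_terms_uncontrolled = [1133, 541, 1207, 96]
print(bsr_uri(subject_terms_uncontrolled))
# prints http://www.gils.net/bsr-gils.rdfs#1133.541.1207.96
```

Because the identifier sits in the fragment, the whole register can be published as one schema document while each element remains individually citable.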

There is currently much activity aimed at implementing broad-scale semantic and metadata registries using the principles of ISO 11179, including work within OASIS Technical Committees. As many of the issues around semantic register services have yet to be worked out in the standards community, however, the GILS office cannot yet commit to more than a suggestion for how to reference GILS elements with URIs.

3. ONIX and DOI

The Digital Object Identifier System is a system for persistently identifying and managing intellectual property in the digital environment. The system is centered around the DOI — an identifier consisting of a prefix and a suffix separated by a slash (e.g., 10.1234/1234567), which persistently identifies a content object of relevance to an intellectual property transaction and associates it with relevant data and services. DOIs are also associated with attributes descriptive of the content object identified, and these are an essential component in the provision of services related to intellectual property. The DOI Metadata System provides a mechanism for mapping the terms of any known scheme into a common dictionary, the <indecs> Data Dictionary, to support the transformation of metadata from one scheme into the terms of another. The system is developed and maintained by the International DOI Foundation (IDF), a membership organisation aiming to provide a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic commerce, and enabling automated copyright management for all types of media [DOI].

ONIX is an international standard for representing and communicating book industry product information in electronic form [ONIX]. It provides an XML message format for exchanging information between systems, which may, internally, use different metadata systems. ONIX data elements have been defined for product information — message headers, reference and product numbers, authorship, subject, publisher, and the like — for books, videos, and related media products. As a standard, ONIX is maintained primarily by EDItEUR, a membership organisation focused on standards for electronic commerce in the book and serials industries.

3.1 A shared dictionary for IDF and EDItEUR

IDF and EDItEUR have supported the CORES Resolution as an important first step in the direction of interoperability. However, they both consider the precise identification of metadata elements to be a necessary but not sufficient condition for real interoperability. Rather, interoperability requires tools and mechanisms for ensuring the semantic equivalence of terms from disparate schemes.

As both communities have built on work of the original <indecs>™ Project (INteroperability of Data in E-Commerce Systems), IDF and EDItEUR are currently collaborating to improve interoperability between the DOI system and ONIX by mapping their metadata elements to the <indecs> Data Dictionary (iDD) [DOI-ONIX]. Based on <indecs> methodology, the iDD has been developed to support the MPEG21 Rights Data Dictionary by providing a semantic bridge between schemes and dictionaries describing intellectual property resources of any form. The iDD will provide a shared dictionary of metadata elements (terms) to underpin both the DOI Metadata System and ONIX.

A detailed explanation of the technical basis of the iDD is given in the recent DOI Handbook Edition 3 and in the full specification of the MPEG 21 Rights Data Dictionary [DOI-HANDBOOK, MPEG21]. All Terms used in ONIX messages and DOI Metadata Declarations will be mapped into the iDD, creating a network of equivalences and other relationships in support of metadata functions such as the transformation of metadata from the Terms of one scheme to another and the use of Terms from different schemes together in cross-domain applications. ONIX will use this as a tool for mapping terms from non-ONIX metadata schemes, while the DOI system will use this as its primary tool for associating metadata from various sources with DOI-identified entities for advanced functions such as multiple resolution.

Use of the iDD will mean that ONIX metadata will be usable in the DOI system (and vice versa) via a central, one-to-one mapping rather than through multiple uncoordinated crosswalks between different schemes. As currently planned, terms mapped to the iDD will include all of the terms in ONIX 2.1, the DOI Kernel (the mandatory metadata set associated with every DOI), the DOI Resource Metadata Declaration (an extended metadata set with an XML schema), the CrossRef Metadata Declaration (widely used in the serials publishing sector), and the ISO MPEG-21 Rights Data Dictionary. Further access and use of the iDD for searching and look-up functions will be determined by IDF and EDItEUR according to their community needs. The iDD will be searchable from the IDF Web site, but the business aspects of making this available beyond the circle of DOI users have yet to be determined.

3.2 URI assignment policies

Terms in the iDD will be assigned URIs in the form of DOIs [DOI-RFC]. Syntactically, a DOI can include any alphanumeric, case-insensitive string of unlimited length, which supports both an <indecs> identifier (iid) as used in the iDD and the short alphanumeric tags used in ONIX. In order to distinguish DOIs for elements from DOIs for content objects, IDF currently uses numeric prefixes for all content objects (e.g., 10.1234/), and reserves specific prefixes for administrative purposes, such as 10.ap/???? for Application Profiles and 10.ra/???? for Registration Agencies (where ???? indicates a variable suffix string). A likely prefix for identifying a metadata element in the <indecs> Data Dictionary would therefore be 10.iid/, though no final decision on this has yet been reached.

For the suffix, one option would be to adopt ONIX tags, such as <b245>. In URL and URI syntax, however, angle brackets require hex encoding (i.e., %3c and %3e), so it is preferable to omit these, as in the identifier string 10.iid/b245.

For example, the iDD element TranslatorSourceLanguage ("A Language from which a Translator worked in making a Translation") has the iid b252 and results in the identifier

doi:10.iid/b252

This string would be published as a URI and made resolvable for users with access to the iDD ontology repository. The iDD ontology repository, in turn, would express the logical relationships of this element to other defined elements, such as Creation, Translator, Translation, SourceOfTranslation, and Language, thus facilitating the transformation and integration of data among a diversity of DOI Application Profiles and ONIX message schemes.
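The suffix convention described above can be sketched in a few lines of Python. This is an illustration only: the prefix `10.iid` is the tentative one mentioned in the text (no final decision had been reached), and the function names `encoded_tag` and `element_doi` are hypothetical.

```python
from urllib.parse import quote

# Tentative prefix for iDD elements; the text notes that this is not final.
IID_PREFIX = "10.iid"


def encoded_tag(tag):
    """Percent-encode an ONIX tag as it would look if the brackets were kept."""
    return quote(tag, safe="")


def element_doi(tag):
    """Drop the angle brackets from an ONIX tag and form a doi: identifier."""
    suffix = tag.strip("<>")
    return f"doi:{IID_PREFIX}/{suffix}"


print(encoded_tag("<b245>"))   # prints %3Cb245%3E
print(element_doi("<b252>"))   # prints doi:10.iid/b252
```

Python emits the hex digits in upper case (`%3C`, `%3E`); percent-encoded octets are case-insensitive, so this is equivalent to the `%3c` and `%3e` mentioned above. Omitting the brackets avoids the encoding step altogether.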

4. MARC 21

MARC ("MAchine Readable Cataloging") was developed in the 1960s as a standard record format for computerised library catalogues. It has evolved over the years into "MARC 21" — a family of related standards for functions such as describing all types of resources in any physical format ("bibliographic records"), providing locations and holdings for those resources ("holdings records"), identifying people and organisations ("authority records"), and classifying the intellectual content of resources ("subject headings" and "classification"). MARC 21 code lists provide standardised designations for languages, countries, geographic areas, organisations, information sources, and types of relationships between names and works ("relators"). "MARC 21" refers both to a standard syntax and to a rich set of data elements. It may be expressed syntactically using the standard ISO 2709 (Format for Information Exchange), which defines how MARC records may be communicated, or using MARCXML, which is an XML schema that uses MARC content designators. The Metadata Object Description Schema (MODS), which derives from MARC, is an XML schema that includes a subset of MARC bibliographic data elements expressed as language-based tags. The MARC 21 family of standards is maintained by the US Library of Congress [MARC21, MODS]. The provision of persistent URIs for the "elements" of this large and diverse set of standards will proceed in stages.

4.1. The short term: URIs for MARC and MODS elements

In the short term, "http:" URIs will be assigned for MARC and MODS content designators (e.g., elements and sub-elements, and MARC 21 indicators and values). These URIs will enable any MARC or MODS element to be referenced without ambiguity. Initially, this assignment will take the form of a statement detailing methods for constructing the URIs along with policies regarding their persistence. Assignment methods may also be formulated to provide URIs for values in controlled MARC/MODS value lists. The precise methods have not been finalised because some issues remain undecided, so the examples below should be understood as tentative. It should be possible to establish persistence policies and URI assignment policies by the second half of 2003. Actually assigning the URIs will probably take less time for MODS elements than for all MARC elements. The entire process is expected to take perhaps one year.

As an initial test case, the Library of Congress will also publish the URIs for terms in the MARC Code List for Relators in a schema document using XML and Resource Description Framework (RDF). Made available in response to a request from the DCMI Usage Board, this document will declare semantic relationships between the MARC Relator terms and a term from Dublin Core (contributor) in a form that can easily be incorporated into RDF-based "registries" of metadata terms. Although some details remain to be clarified, work on this RDF schema is largely complete, so the relator list (with URIs for each entity) should be available shortly after the DCMI Usage Board meeting of June 2003.

4.1.1. MARC elements

MARC 21 has different "formats" — bibliographic, authority, holdings, classification, and community information — so the URI must identify a particular element with respect both to the MARC 21 namespace and to the particular format. Using a namespace established for elements expressed as ISO 2709 or MARCXML as a base, URIs for MARC elements might be constructed as follows:

http://www.loc.gov/marc.[format]/[fieldname].[subfield]

For example:

http://www.loc.gov/marc.bibliographic/245
http://www.loc.gov/marc.bibliographic/245.a
http://www.loc.gov/marc.bibliographic/008-s-03
http://www.loc.gov/marc.authority/008

would identify (respectively) field 245 of the MARC bibliographic format; subfield $a of field 245; sound recording 008 character position 03; and the 008 element of the MARC authority format.
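The field and subfield pattern can be expressed as a small helper. The sketch below is an assumption-laden illustration of the tentative template, not a Library of Congress tool: the function name `marc_uri` is hypothetical, and the character-position form (as in 008-s-03) is deliberately left out because the text does not spell out its rule.

```python
# Namespace stem taken from the tentative examples in the text.
MARC_BASE = "http://www.loc.gov/marc"


def marc_uri(marc_format, field, subfield=None):
    """Build a URI of the form http://www.loc.gov/marc.[format]/[field].[subfield].

    The subfield part is appended only when a subfield is given.
    """
    uri = f"{MARC_BASE}.{marc_format}/{field}"
    if subfield is not None:
        uri += f".{subfield}"
    return uri


print(marc_uri("bibliographic", "245"))        # field 245, bibliographic format
print(marc_uri("bibliographic", "245", "a"))   # subfield $a of field 245
print(marc_uri("authority", "008"))            # field 008, authority format
```

Keeping the format in the namespace part means that the same field number in two formats, such as 008, yields two distinct identifiers.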

4.1.2. MARC code lists

The MARC Code List for Relators is a list of roles that agents may assume with regard to a particular work, such as "author", "translator", or "censor". Each term on the Relators list will be assigned a URI of the following form:

http://www.loc.gov/marc.relators/adp

where the last element is the code representing the relator term (in this case, "adp" for "adaptor"). At a later date, this pattern of appending a code to a namespace prefix would be followed to construct URIs for values in other code lists. The MARC language codes present a special case, since they are equivalent to the codes in ISO 639-2/B, and will therefore be considered at a later date.

4.1.3. Source codes

MARC 21 uses codes to identify the source of a value (in Dublin Core terminology, these would be "encoding schemes"). The DCMI Libraries Working Group has recommended that certain sources be "registered" and given persistent identity. Several of the encoding schemes registered for use with Dublin Core elements will also exist in the MARC namespace (such as "lcsh", which in both MARC and Dublin Core stands for "Library of Congress Subject Headings"). If the Library of Congress were to assign a URI to the MARC "lcsh" (to which DCMI has already assigned a URI), then the mechanics and the etiquette of cross-referencing and perhaps of establishing a preference between the redundant identifiers would need to be worked out — an interesting test case for situations that surely will arise in other contexts as well.

4.1.4. MODS elements

The namespace for MODS to be used as a prefix for URIs is

http://www.loc.gov/mods

Some MODS elements are already used in the DCMI Library Application Profile, so URI assignment for these would be particularly useful. For example, Location and Edition might be identified as follows:

http://www.loc.gov/mods/location
http://www.loc.gov/mods/originInfo.edition

where edition is a sub-element of originInfo.

4.2. The longer term: URNs

For the longer term, the Library of Congress plans to explore the use of Uniform Resource Names (URNs) — a particular form of URI — in the context of a more comprehensive solution to the problem of identifying its elements. Compared to "http:" URIs, URNs are attractive for use as persistent element identifiers because their primary purpose is naming rather than resolution or retrieval. URNs for MARC and MODS elements could resemble the following:

URN:[urn namespace id]:marc.bibliographic/245.a
URN:[urn namespace id]:mods/originInfo.edition
URN:[urn namespace id]:marc.relators/adp
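To illustrate how the short-term "http:" URIs might carry over to this longer-term form, the sketch below rewrites one into the other. It assumes that the local part of the name stays unchanged, as the examples suggest. The function name `to_urn` is hypothetical, and because no URN namespace identifier had been chosen, the caller must supply one; the value `example` used here is a placeholder only.

```python
# Common stem of the tentative "http:" URIs for MARC and MODS elements.
LOC_HTTP_PREFIX = "http://www.loc.gov/"


def to_urn(http_uri, namespace_id):
    """Rewrite a tentative LC "http:" element URI as a URN.

    namespace_id is supplied by the caller because the text leaves the
    URN namespace identifier undecided.
    """
    if not http_uri.startswith(LOC_HTTP_PREFIX):
        raise ValueError(f"not a recognised element URI: {http_uri}")
    local_name = http_uri[len(LOC_HTTP_PREFIX):]
    return f"URN:{namespace_id}:{local_name}"


# "example" is a placeholder, not a namespace chosen by the Library of Congress.
print(to_urn("http://www.loc.gov/marc.bibliographic/245.a", "example"))
# prints URN:example:marc.bibliographic/245.a
```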

4.3. Identification of XML schemas

The Library of Congress plans to register handles to be used as persistent names for XML schemas it maintains. These handles will be expressed as URLs that resolve through a proxy server at the Library of Congress (http://hdl.loc.gov). For example:

http://hdl.loc.gov/loc.standards/mods
http://hdl.loc.gov/loc.standards/marc21.slim

5. UNIMARC

UNIMARC ("UNIversal MAchine Readable Catalogue") is a family of standards for bibliographic, authority, classification, and item description of resources. Originally promoted by the International Federation of Library Associations and Institutions (IFLA) to facilitate the exchange of records between national variants of MARC, UNIMARC was until recently maintained in the context of IFLA's Universal Bibliographic Control and International MARC Core Activity [UBCIM]. Since March 2003, UNIMARC has been maintained in a new UNIMARC Program hosted by the National Library of Portugal, which endorsed the CORES Resolution in June 2003. As UNIMARC shares the same basic model as MARC 21, the National Library of Portugal will coordinate its implementation of the CORES Resolution with the Library of Congress.

6. CERIF

The Common European Research Information Format (CERIF) provides a set of guidelines for the exchange of data about research projects along with information about related researchers, organisations, funding programmes, and publications [CERIF]. The format is intended for use in building and managing a Current Research Information System (CRIS). Originally developed under the auspices of the Innovation Directorate of the European Commission in the late 1980s for use by member states of the European Union, CERIF has been transferred to the custodianship of EuroCRIS, a not-for-profit membership forum for individuals and organisations concerned with the use of information technology in the conduct of research information systems [EUROCRIS].

6.1 The CERIF data model

The CERIF data model is defined in terms of entities, attributes, and relationships. There is a full CRIS data model, used as a template for building a new CRIS or as a blueprint for transforming a legacy CRIS; an exchange data model used for the exchange of data between CRISes; and a metadata data model used to describe a CRIS to a portal providing homogeneous access over a heterogeneous, distributed set of CRISes. Each model is based on entities (e.g., <Project>, <Person>, <OrgUnit>) with attributes (e.g., <ProjectId>, <Status>). Some attributes are defined as separate entities, such as <Title> (to support language variants) and some attributes, such as <Status>, must take their values from an entity containing an enumerated list of valid values.

Entities (with their attributes) are linked by relationships, which are themselves represented as entities with attributes. For example, the entity <Project> can be linked by a relationship to entity <Person>, the relationship being expressed as an entity with attributes <ProjectId> (to link to <Project>), <PersonId> (to link to <Person>), <Role> (role of the relationship e.g., as project leader, project member, project reviewer), <StartDateTime> (the start of the relationship between <Project> and <Person>), <EndDateTime> (the end of the relationship between <Project> and <Person>).

6.2 Assignment of URIs

That the "elements" of CERIF (in terms of the CORES Resolution) are embedded in a specific data-model context needs to be reflected in the URIs used to identify them. As of June 2003, EuroCRIS had reached a working consensus on the URIs

http://www.eurocris.org/cerif
http://www.eurocris.org/cerif/model
http://www.eurocris.org/cerif/domainontology

to identify and locate (respectively) CERIF as a whole, the CERIF data model, and a structured ontology in which each element is described formally (that is, machine-understandably) using DAML+OIL. The latter two URIs are to be extensible in the form

<entity>/<attribute>

as with the following URIs for <Acronym>:

http://www.eurocris.org/cerif/model/OrgUnit/Acronym
http://www.eurocris.org/cerif/domainontology/OrgUnit/Acronym

which define <Acronym> (respectively) both as an attribute of <OrgUnit> (an Organisational Unit such as an Institute, Company, or Department) and in terms of its relationships to other attributes or elements in the context of the CERIF Domain Ontology.
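A short Python sketch of this extension rule follows. The function name `cerif_uris` and the dictionary it returns are illustrative assumptions; the base URI and the two context names come from the working consensus described above.

```python
# Base URI identifying CERIF as a whole (from the text).
CERIF_BASE = "http://www.eurocris.org/cerif"

# The two contexts whose URIs are extensible with <entity>/<attribute>.
CONTEXTS = ("model", "domainontology")


def cerif_uris(entity, attribute):
    """Return one URI per context for the given attribute of an entity."""
    return {
        context: f"{CERIF_BASE}/{context}/{entity}/{attribute}"
        for context in CONTEXTS
    }


for context, uri in cerif_uris("OrgUnit", "Acronym").items():
    print(context, uri)
```

The same entity and attribute pair thus receives two identifiers, one per context, which reflects the point that a CERIF element is identified relative to a specific data-model setting.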

6.3 Resolution of URIs

At present, the EuroCRIS Web site is driven by a database, such that no individual page is accessed directly with a URI. EuroCRIS expects to use the URIs both as identifiers for its "elements" and as locators of machine-understandable resources. The Domain Ontology, for example, would provide a machine-understandable description of the entities, attributes, and associated logic constraints in order to ensure the integrity of the data model. The second URI for <Acronym> (above) would then serve not just to identify the element with respect to the ontology but also to locate and retrieve machine-understandable code that defines particular constraints on the element.

6.4 Finalising a URI assignment policy

EuroCRIS is currently testing variations in naming policy, for example to choose between <Result_Publication> and <ResultPublication> and between the following three variants:

http://www.eurocris.org/cerif/model/OrgUnit/Acronym
http://www.eurocris.org/cerif/domainontology/#OrgUnit/#Acronym
http://www.eurocris.org/cerif/domainontology/OrgUnit/#Acronym

Entities representing relationships may be named using underscores, as with <Person_OrgUnit>. The objective is to finalise a naming policy that most closely fits with existing applications and with emerging conventions for identifier syntax. The URIs will be maintained by EuroCRIS as long as EuroCRIS exists as an organisation, and should EuroCRIS cease to exist, an attempt would be made to transfer custodianship to another responsible body.

7. IEEE/LOM

Learning Object Metadata (LOM) is a standard for facilitating the search, evaluation, acquisition, and use of learning objects, for instance by learners or instructors. It is based on work in the projects ARIADNE and IMS and builds on work from the Dublin Core group [ARIADNE, IMS]. The LOM data elements describe a learning object and are grouped into categories. The Base Scheme consists of nine such categories: General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, and Classification. The LOM standard is maintained by the LOM Working Group of the Learning Technology Standards Committee (LTSC) of the Institute of Electrical and Electronics Engineers (IEEE) [LOM-WG].

As of June 2003, the LOM Working Group is balloting a specification for an XML binding of LOM, so a full account of URI assignment policies would be premature. However, five sets of issues under discussion in the working group can be presented in a general sense:

7.1 Form of URIs

The URI assignment policy should be extensible and should be able to evolve nicely with the standard set of elements to which it refers. In the case of LOM, this means considering the following aspects:

Should the namespace reflect the basically hierarchical nature of LOM? In that case, URIs would be of the form

http://ltsc.ieee.org/LOM/General/Identifier
http://ltsc.ieee.org/LOM/General/Title

However, this could be inappropriate for future versions, as it seems to suggest that the hierarchical structure will never be modified (for instance, by moving an element from one category to another) or abandoned altogether (for instance, to create a "flat space" of independent data elements). For these reasons, it may be more appropriate to have a flat namespace, as in

http://ltsc.ieee.org/LOM/Identifier
http://ltsc.ieee.org/LOM/Title

However, a flat namespace conflicts with current practice for naming data elements in LOM, because data elements with an identical name appear in different places in the base schema. For example

General.Identifier: identifies the learning object
Meta-Metadata.Identifier: identifies the metadata record

If both of these elements were referred to as

http://ltsc.ieee.org/LOM/Identifier

then the (important!) distinction between them would seem to be lost. Alternatively, then, one might refer to them as

http://ltsc.ieee.org/LOM/GeneralIdentifier
http://ltsc.ieee.org/LOM/MetadataIdentifier

but in this case, the labels from the standard base schema conceptual definition would need to be adapted for the XML binding.
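The trade-off between hierarchical and flat namespaces can be made concrete with a short Python check for colliding URIs. The three-element list, the function names, and the hierarchical URI for Meta-Metadata.Identifier are assumptions made for illustration; only the base URI and the element names come from the discussion above.

```python
from collections import defaultdict

LOM_BASE = "http://ltsc.ieee.org/LOM"

# A small sample of (category, element name) pairs from the LOM base schema.
ELEMENTS = [
    ("General", "Identifier"),
    ("General", "Title"),
    ("Meta-Metadata", "Identifier"),
]


def hierarchical(category, name):
    """URI that mirrors the category/element hierarchy."""
    return f"{LOM_BASE}/{category}/{name}"


def flat(category, name):
    """URI in a flat namespace; the category is ignored."""
    return f"{LOM_BASE}/{name}"


def collisions(scheme, elements):
    """Map each URI shared by more than one element to those elements."""
    by_uri = defaultdict(list)
    for category, name in elements:
        by_uri[scheme(category, name)].append((category, name))
    return {uri: group for uri, group in by_uri.items() if len(group) > 1}


print(collisions(hierarchical, ELEMENTS))  # no collisions
print(collisions(flat, ELEMENTS))          # the two Identifier elements collide
```

A check of this kind could be run over the full base schema to find every name that would need relabelling, as with GeneralIdentifier and MetadataIdentifier, before a flat namespace could be adopted.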

7.2 Representing versions in URIs

How should information about a version of LOM be reflected in the URIs? Two obvious candidates are

http://ltsc.ieee.org/LOMv1.0/<name_of_data_element>
http://ltsc.ieee.org/LOM/v1.0/<name_of_data_element>

However, an alternative could be

http://ltsc.ieee.org/LOM/<name_of_data_element>/LOMv1.0

This second approach seems to suggest a more independent evolution over different versions for individual data elements, whereas the earlier proposal seems to suggest a more consolidated evolution of the LOM schema as a whole.
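The practical difference between these patterns shows up when a version changes. The following Python sketch encodes each pattern as a function. The function names are hypothetical, and the version token is written as `v1.0` in all three patterns, which is an assumption about the intended form of the third.

```python
LTSC_BASE = "http://ltsc.ieee.org"


def version_in_namespace(name, version):
    """Version fused into the namespace name: .../LOMv1.0/<element>."""
    return f"{LTSC_BASE}/LOMv{version}/{name}"


def version_as_path_segment(name, version):
    """Version as its own path segment: .../LOM/v1.0/<element>."""
    return f"{LTSC_BASE}/LOM/v{version}/{name}"


def version_per_element(name, version):
    """Version attached after the element name: .../LOM/<element>/LOMv1.0."""
    return f"{LTSC_BASE}/LOM/{name}/LOMv{version}"


# With a per-element version, two elements can be cited at different versions
# while sharing the same stable stem http://ltsc.ieee.org/LOM/.
print(version_per_element("Title", "1.0"))
print(version_per_element("Identifier", "1.1"))

# With a schema-level version, every element URI changes together.
print(version_in_namespace("Title", "1.1"))
print(version_as_path_segment("Title", "1.1"))
```

The version number 1.1 is invented for the demonstration and does not refer to an actual LOM release.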

7.3 Maintenance authority

URIs need to be assigned in a way that is consistent across an organisation. In the case of IEEE LTSC LOM, this raises the question of whether the relevant organisation is the IEEE, the IEEE LTSC, or the IEEE LTSC LOM working group. The more general approach seems preferable, if only because it will lead to consistency across a wide spectrum of standards, which will decrease the learning time for developers who adopt the standards. On the other hand, aiming at a comprehensive consensus in the larger organisation implies a process that is more complex and time-consuming.

7.4 Relationship between URIs and schema bindings

A variant on the previous issue is the question of consistency of namespaces across different bindings of the LOM. The LOM working group is currently working both on a "straightforward" XML binding and on an RDF binding. The latter is itself bound to XML. On the one hand, it seems preferable to adopt the same URIs for the same LOM data elements in both bindings. On the other hand, one could question whether the data elements really are identical across both bindings. This relates to the first issue above, as RDF is more in line with a flat approach to URI assignment, whereas an XML schema would be more compatible with a hierarchical approach, as XML schemas essentially define tree structures.

7.5 Resolution of URIs

Should the URIs resolve to something, and, if so, to what? In principle, URIs need not resolve to anything if they are intended merely as globally unique identifiers — in this context, for data elements. However, users who think of URIs as URLs may click on them in their browsers, and if this yields the error message "not found", they may genuinely believe that a namespace is "broken". Even more serious is the fact that tools like XML parsers or anti-virus programs may try to resolve the URIs in namespace declarations and signal a "problem" when such a resolution fails. This seems to suggest that URIs should resolve to something. Some argue that they should resolve to a document putting the metadata elements into a context. Others argue that URIs should resolve to XML DTDs or Schemas, so that software tools can read the "definition" of the namespace.

Deciding on a long-term approach for URI assignment is not a trivial matter and will require further discussion in the LOM Working Group during the ballot for the XML binding and beyond.

8. Dublin Core

The Dublin Core is a set of fifteen generic metadata elements for discovering resources across a diversity of domains and languages. It was developed in the mid-1990s by a mixture of librarians and computer scientists, and its early workshops overlapped with efforts to clarify an architecture and data model for metadata in the Web more generally, efforts that led to the mutually influential development of the Resource Description Framework (RDF). The Dublin Core, along with related sets of metadata terms and technical specifications, is currently maintained by the Dublin Core Metadata Initiative, an open forum for the development of interoperable metadata solutions hosted on a not-for-profit basis by the Online Computer Library Center (OCLC) in Dublin, Ohio [DCMI]. The key specification for the Dublin Core has been formally endorsed as ISO 15836-2003 [DC-ISO].

8.1 URI assignment methods

The identification of DCMI metadata terms with URIs is governed by a Namespace Policy [DCMI-NAMESPACE]. This policy currently declares URIs for three DCMI namespaces:

http://purl.org/dc/elements/1.1/
http://purl.org/dc/terms/
http://purl.org/dc/dcmitype/

designating (respectively) the fifteen-element Dublin Core, all other DCMI elements and qualifiers, and a controlled vocabulary of values for the Dublin Core element Type. The URI for each DCMI term is constructed by appending the term name to the URI of a DCMI namespace. For example:

http://purl.org/dc/elements/1.1/title
http://purl.org/dc/terms/extent
http://purl.org/dc/dcmitype/Image

identify title (an element in the Dublin Core), extent (a "qualifier" term), and Image (in the DCMI Type Vocabulary). A persistence policy commits DCMI to the maintenance of formal documentation, over time, for any URIs so assigned.
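The construction rule described above can be sketched in a few lines of Python. Only the three namespace URIs and the three example terms come from the Namespace Policy as quoted here; the short dictionary keys and the function names term_uri and split_term_uri are assumptions made for this illustration.

```python
# Namespace URIs as quoted in the text above. The short keys are labels
# chosen for this sketch only.
DCMI_NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def term_uri(namespace_key: str, term_name: str) -> str:
    """Build a term URI by appending the term name to the namespace URI."""
    return DCMI_NAMESPACES[namespace_key] + term_name


def split_term_uri(uri: str) -> tuple:
    """Return (namespace URI, term name) for a URI in one of the namespaces above."""
    for namespace in DCMI_NAMESPACES.values():
        if uri.startswith(namespace):
            return namespace, uri[len(namespace):]
    raise ValueError(f"not in a known DCMI namespace: {uri}")
```

For instance, term_uri("dcterms", "extent") yields http://purl.org/dc/terms/extent, and split_term_uri recovers the namespace and term name from that string.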

8.2 Information conveyed in URI strings

Two significant issues were raised during the development of this policy. Firstly, it was suggested that each DCMI namespace URI should indicate the category of the DCMI terms identified within it. For example, it was proposed that different DCMI namespaces be used to partition DCMI elements from DCMI qualifiers, or to indicate that a particular term was originally defined by a particular working group or within a particular domain. Secondly, it was suggested that all DCMI namespace URIs carry versioning information (for example a date stamp) that would be updated as terms within the namespace change.

On the first issue, it was considered that the category of a DCMI term is not necessarily persistent. For example, terms defined initially by the education community might subsequently become useful to other communities. Associating particular URIs with particular categories of terms was not felt to be helpful to the long-term stability of DCMI namespaces or the URIs of DCMI terms within those namespaces.

On the second issue, it was again considered that embedding versioning information within the namespace URI was unlikely to be helpful to the long-term stability of DCMI namespaces or the URIs of DCMI terms within those namespaces. Rather, it was felt that versioning information should be carried in the formal documentation of the terms. The one exception to this rule, the DCMI namespace http://purl.org/dc/elements/1.1/, was retained because it was already in wide use when the Namespace Policy was finalised.

8.3 Editorial changes to URI-identified terms

As an alternative to reflecting version changes in URIs, the Namespace Policy articulates the implications for URI identification of various classes of editorial changes to term declarations, whereby "minor" or "substantive" errata may be corrected without consequence for URIs while changes of a semantic nature trigger the creation of a new term (and corresponding URI).
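The rule stated above can be expressed as a small decision function. This is an illustrative sketch only: the enum, the function name uri_after_change, and its parameters are assumptions, and the Namespace Policy itself should be consulted for how each class of change is actually defined.

```python
from enum import Enum


class ChangeClass(Enum):
    """Classes of editorial change named in the text; the enum is illustrative."""
    MINOR_ERRATUM = "minor"
    SUBSTANTIVE_ERRATUM = "substantive"
    SEMANTIC = "semantic"


def uri_after_change(namespace: str, term_name: str, change: ChangeClass,
                     new_term_name: str = "") -> str:
    """Return the URI that identifies the term after a change of the given class."""
    if change is ChangeClass.SEMANTIC:
        # A change of meaning creates a new term, and therefore a new URI.
        if not new_term_name or new_term_name == term_name:
            raise ValueError("a semantic change needs a new, distinct term name")
        return namespace + new_term_name
    # Errata are corrected in the documentation; the URI stays as it was.
    return namespace + term_name
```

With a hypothetical term exampleTerm, an erratum leaves the URI untouched, while a semantic change requires the caller to supply a new name such as exampleTermRevised.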

8.4 Resolution of URIs

The Namespace Policy specifies that all of the URIs identifying DCMI terms "will resolve to a machine-processable DCMI term declaration for all the terms within that namespace". As of June 2003, DCMI term URIs redirect to an RDF schema documenting that term's DCMI namespace, but the formulation in the Namespace Policy is sufficiently general to accommodate any future conventions for formal representation or content negotiation that might emerge [DC-RDF].

8.5 Practicalities and etiquette of recognising non-DCMI URIs

From the beginning, Dublin Core has defined itself as a small standard with broad applicability, and much of the effort in DCMI has gone towards clarifying how the core and its various extensions can be used together with more detailed or domain-specific vocabularies declared and maintained by specialised communities of expertise. The DCMI Usage Board has shown a bias towards keeping the core standard small through cooperation with maintainers of more specialised standards on forms of mutual recognition and support. As discussed in Section 4 above, DCMI is currently working with the Library of Congress on the mechanics of formally expressing that the MARC relator terms semantically refine the Dublin Core element contributor.

9. Towards forms of good practice

An Accompanying Measure under the Semantic Web call of the Information Society Technologies programme, the CORES Project was motivated in part by a desire to promote the development of metadata practice in support of W3C's vision of a Semantic Web, and in laying the groundwork for the Brussels meeting, the project greatly benefited from the technical advice of W3C's Semantic Web Activity. Although the meeting confirmed that some standards communities are focused more strongly on organising information within specific communities and applications than on the grander context of a global Web, the fact that such a diverse group of standards communities could find common ground with a technology fundamental to Web architecture should be supportive of that broader vision as well.

In the six months since the signing of the CORES Resolution, the signatories have worked towards translating their commitments into practical URI assignment and persistence policies. Given the need to evaluate the impact of design decisions and to build consensus in the communities behind the standards, it was perhaps too ambitious to expect that policies could be finalised and URIs assigned within just twenty-six weeks. However, having such a short deadline for such a specific set of tasks has highlighted a number of areas where forms of good practice have yet to emerge.

9.1 What to identify

Beyond mandating the assignment of URIs to "elements", the Resolution left it up to the signatories to decide exactly what that means in the context of a particular standard and which other entities, such as sets of elements or values in controlled vocabularies, should also be so identified. Some interesting questions have arisen in this regard:

  • Should the URI of an element reflect a hierarchical context within which it is embedded? In IEEE/LOM, for example, a learning object and the metadata record about that learning object each have an identifier. Should GeneralIdentifier and MetadataIdentifier be assigned separate URIs to express this distinction, or just one?
  • If organisation A creates a URI designating an entity maintained by organisation B, and organisation B then creates its own URI for the same entity, by what etiquette or mechanism can the redundant identifiers be cross-referenced or preferences declared? For example, DCMI has assigned a URI to a term used for indicating that a given value was taken from the Library of Congress Subject Headings, but the Library of Congress will now assign its own URI to designate the same vocabulary.
  • If semantically identical elements are shared across multiple element sets maintained by an organisation, should they each be assigned a separate URI or share one common URI? For example, MARC 21 and MODS share the concept "name", though one standard identifies the concept with a number while the other uses a word, leading to the formation of two separate URIs.
  • All participants recognise that metadata elements may evolve over time, but for several signatories the implications of such change for URI assignment are as yet unclear. Should successive historical versions of an element share a single, unchanging URI, or should each version be assigned its own URI, perhaps by embedding a version number in the URI string?

9.2 What URIs should resolve to

The Resolution leaves it to the signatory organisations to decide what the URIs should look like and explicitly says that no assumptions should be made that URIs resolve to something on the Web. Aside from the International DOI Foundation, which has proposed the URI scheme "doi:", and the Library of Congress, which is exploring the use of URNs for the long term, the signatories have assigned or propose to assign URIs beginning with the URI scheme "http:". Using URIs (identifier strings) that look exactly like URLs (addresses for retrieving something from the Web) inevitably raises the problem of resolution: users who click on a URI and get "Error: Not Found" may think the URI is broken. URNs are of interest to the Library of Congress as a longer-term solution to the problem of naming precisely because they do not entail such an expectation. The "doi:" scheme is to be associated with a particular set of resolution, conversion, and retrieval services.

9.3 Asserting semantic relationships

The Resolution is silent about how the URIs assigned can be used in asserting semantic relationships between elements in different sets. URIs were seen as a useful common basis for asserting the relationship of elements in a diversity of applications to shared ontologies such as the Basic Semantic Register or the <indecs> Data Dictionary, or to formally express the relationship between two element sets in the machine-processable and re-usable form of an RDF schema. Facilitating the expression and processing of such assertions in the interest of interoperability between different forms of metadata was seen by its signatories as the longer-term significance of the CORES Resolution.
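Once elements have URIs, a relationship between two of them can be written down as a simple statement of three URIs. The sketch below represents such statements as plain Python tuples and follows them transitively. It makes several assumptions: that refinement is expressed with the RDF Schema property rdfs:subPropertyOf, that the Dublin Core element contributor is identified by appending its name to the http://purl.org/dc/elements/1.1/ namespace, and that a relator term has a URI at all; the example.org URI is a hypothetical placeholder, not an identifier assigned by any signatory.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"

# Hypothetical placeholder URI for a more specific role term.
RELATOR_ILLUSTRATOR = "http://example.org/relators/illustrator"

# Each statement is a (subject, predicate, object) triple of URIs.
statements = {
    (RELATOR_ILLUSTRATOR, RDFS_SUBPROPERTY_OF, DC_CONTRIBUTOR),
}


def broader_properties(prop: str, triples: set) -> set:
    """Return every property that `prop` refines, directly or transitively."""
    found = set()
    pending = [prop]
    while pending:
        current = pending.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == RDFS_SUBPROPERTY_OF and obj not in found:
                found.add(obj)
                pending.append(obj)
    return found
```

An application that understands only Dublin Core could use broader_properties to treat a value recorded under the more specific term as a contributor.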

Acknowledgements

The meeting that resulted in the CORES Resolution was organised in the context of a Standards Interoperability Forum activity of the CORES project, an Accompanying Measure in the Information Society Technologies programme of the European Union [CORES]. The meeting was held at the conference center of the European Committee for Standardization (CEN) in Brussels on 18 November 2002. In attendance were Thomas Baker, Makx Dekkers, Rachel Heery, Tony Hegarty, Pete Johnston, and Jan Ricken (for CORES); Dave Beckett (SWAD Europe); Alan Danskin and Rebecca Guenther (MARC 21); Erik Duval and Wayne Hodgins (IEEE/LOM); Brian Green and David Martin (ONIX); Thomas Baker (DCMI); Keith Jeffery (CERIF); Pierre Paul Sondag (European Commission); and Shigeo Sugimoto (ULIS); with apologies from Norman Paskin (DOI), Karl Best (OASIS), Stephen Katz (FAO), Neil Day (MPEG), and Stuart Sutton (GEM). Eric Miller (W3C Semantic Web Activity) and Dave Beckett and Dan Brickley (W3C and SWAD Europe) provided extensive input and advice in the preparation of the Brussels meeting. The signatories of the CORES Resolution were Eliot Christian, Brian Green, Rebecca Guenther, Keith Jeffery, Norman Paskin, Robby Robson (IEEE/LOM), and Stuart Weibel (Dublin Core). In June 2003, José Borbinha of the National Library of Portugal endorsed the resolution on behalf of UNIMARC.

References

[ARIADNE] Erik Duval, Eddy Forte, Kris Cardinaels, Bart Verhoeven, Rafael Van Durm, Koen Hendrikx, Maria Wentland Forte, Norbert Ebel, Maciej Macowicz, Ken Warkentyne, Florence Haenni, The ARIADNE Knowledge Pool System: a Distributed Digital Library for Education, Communications of the ACM 44(5):73-78, May 2001.

[BERNERS-LEE] Universal Resource Identifiers — Axioms of Web Architecture, <http://www.w3.org/DesignIssues/Axioms.html>.

[BIB1] ISO 23950 Bib-1 Attribute Set, <http://lcweb.loc.gov/z3950/agency/defns/bib1.html>.

[CERIF] CERIF: the Common European Research Information Format, <http://www.cordis.lu/cerif/>.

[CORES] Standards Interoperability Forum, CORES Project, <http://www.cores-eu.net/interoperability/>.

[CORES-RESOLUTION] Resolution on Metadata Element Identifiers, CORES Standards Interoperability Forum, <http://www.cores-eu.net/interoperability/cores-resolution/cores-resolution.pdf>.

[DC-ISO] The Dublin Core Metadata Element Set, <http://www.niso.org/international/SC4/n515.pdf>.

[DC-RDF] DCMI Term Declarations Represented in RDF Schema Language, <http://dublincore.org/schemas/rdfs/>.

[DCMI] Dublin Core Metadata Initiative, <http://dublincore.org/>.

[DCMI-NAMESPACE] DCMI Namespace Policy, <http://dublincore.org/documents/dcmi-namespace/>.

[DOI] The Digital Object Identifier System, <http://www.doi.org>.

[DOI-CORES] "Metadata interoperability: DOI signs metadata agreement", DOI News (March 2003), <http://www.doi.org/news/4Mar2003-News.html#meta>.

[DOI-DLIB] Norman Paskin, "DOI: A 2003 Progress Report", D-Lib Magazine 9:6 (June 2003), <doi:10.1045/june2003-paskin>.

[DOI-HANDBOOK] DOI Handbook, Edition 3.0, first release May 2003, <http://www.doi.org/hb.html>.

[DOI-ONIX] "Common Dictionary for DOI and ONIX Metadata", DOI News (April 2003), <http://www.doi.org/news/030415dictionarynews.pdf>.

[DOI-RFC] Norman Paskin, Eamonn Neylon, Tony Hammond, Sam Sun, "The 'doi' URI Scheme for the Digital Object Identifier (DOI)", IETF Internet Draft (June 2003), <http://www.ietf.org/internet-drafts/draft-paskin-doi-uri-04.txt>.

[EUROCRIS] euroCRIS: Current Research Information Systems, <http://www.eurocris.org/>.

[GILS] Global Information Locator Service, <http://www.gils.net/>.

[GILS-PROFILE] Application Profile for the Government Information Locator Service (GILS), <http://www.gils.net/prof_v2.html>.

[GILS-SEMANTICS] Example Application of a Semantics Register, <http://www.gils.net/register-example.html>.

[IMS] IMS Global Learning Consortium, Inc., <http://www.imsproject.org>.

[ISO11179] ISO/IEC 11179 - Information Technology — Metadata Registries (MDR), <http://metadata-stds.org/11179>.

[LOM-WG] LOM Working Group, IEEE Learning Technology Standards Committee, <http://ltsc.ieee.org/wg12/>.

[MARC21] MARC Standards, <http://www.loc.gov/marc/>.

[MODS] MODS - Metadata Object Description Schema, <http://www.loc.gov/standards/mods/>.

[MPEG21] Jan Bormans, Keith Hill, eds., MPEG-21, Part 6 - Rights Data Dictionary. In MPEG-21 Overview v.5 (October 2002), ISO/IEC JTC1/SC29/WG11/N5231, <http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm#_Toc23297978>.

[ONIX] ONIX for Books, <http://www.editeur.org/onix.html>.

[RFC2396] T. Berners-Lee, R. Fielding, L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, (August 1998) <http://www.ietf.org/rfc/rfc2396.txt>.

[UBCIM] IFLA Universal Bibliographic Control and International MARC Core Activity, <http://www.ifla.org/VI/3/ubcim.htm>.

[Z39-50] Z39.50 — International Standard Maintenance Agency, <http://www.loc.gov/z3950/agency/>.

Copyright © Thomas Baker and Makx Dekkers

DOI: 10.1045/july2003-baker