
D-Lib Magazine
September 2000

Volume 6 Number 9

ISSN 1082-9873

RSLP Collection Description

 

Andy Powell
UKOLN, University of Bath
a.powell@ukoln.ac.uk

Michael Heaney
University Library Services Directorate, University of Oxford
michael.heaney@university-libraries.oxford.ac.uk

Lorcan Dempsey
DNER Director, Joint Information Systems Committee, King's College London
lorcan.dempsey@kcl.ac.uk


Introduction

The description of collections is becoming increasingly important in the context of networked information services and underpins the development of a collective resource. This view has emerged clearly through the MODELS project [1], where it has influenced the course of the clumps and hybrid libraries [2], which are working with collection and service descriptions, and in UKOLN's recent work on retrospective conversion [3]. In the latter case, a strong view is emerging that libraries need to complement item-based description with description at a higher level. A particular feature of this discussion is that such description would complement current work in the archives community, and that descriptions at this shared level of granularity would facilitate cross-domain working (while acknowledging that collections may mean different things in the different library, archival and other content models). This has been corroborated by recent work on research issues shared by libraries, archives and museums, which recognized that description at this level would support higher-level navigation of the cultural resource and selection of particular resources for further searching [4].

The creation of collection descriptions allows the owners or curators of collections to disclose information about their existence and availability to interested parties. Although collection descriptions may take the form of unstructured textual documents, for example a set of Web pages describing a collection, there are significant advantages in describing collections using structured, open, standardized, machine-readable formats. Such descriptions enable:

  • users to discover and locate collections of interest,
  • users to perform searches across multiple collections in a controlled way,
  • the refinement of distributed searching approaches based on the characteristics of candidate collections,
  • software to perform such tasks on behalf of users, based on known user preferences.

There are additional advantages where catalogues do not exist for collections, as a collection description may provide some indication to the remote user of content and coverage.

This article describes work undertaken as part of the RSLP Collection Description Project, a project funded by the UK Research Support Libraries Programme (RSLP) [5] with the aim of enabling all projects funded through the programme to describe collections in a consistent and machine readable way. With additional funding from OCLC, the project has developed a model of collections and their catalogues [6]. We have used this work to form the basis of a collection description metadata schema [7], implemented using the Resource Description Framework (RDF) [8]. A Web-based tool has been developed that allows the construction of RDF descriptions by filling in a Web form [9]. Associated with this tool there is a detailed set of data entry guidelines [10] and an enumerated list of collection types [11]. Future work will see the development of a Web robot that will harvest collection descriptions from project Web sites and make them available through a central search service.

Although it has its origin in the Research Support Libraries Programme, many of whose results will be digital resources of one kind or another, our work is not restricted to the description of digital collections. It is intended that the results of the project should be applicable to physical and digital collections of all kinds, including library, art and museum materials. It is by no means applicable only to the resources of large research libraries.

In the specific context of the Programme, the intention of the project was to offset the costs of not adopting a consistent, machine-readable description at an early stage. Such costs would have fallen on users and managers of collections alike:

  • For users, there would have been the burden of having to find and navigate particular Web sites individually and to interpret differently formatted descriptions, with limited opportunity for consistent, search-based approaches.
  • For managers, there would have been the burden of having to design their own local approaches, and at some future date of having to redo this work to conform to a consistent approach.

Our work suggests that requirements for collection description fall into three broad informational categories. The first is descriptive information about the collection, which may include the subject area, ownership, strengths and weaknesses, and sources of items within the collection. The second is information about how to access the collection, including physical access (in the case of library, museum or archival collections, for example) or networked access (in the case of digital collections). The third is the terms and conditions associated with access to the collection and to individual items within it.

The term collection can be applied to any aggregation of individual items. Collections are exemplified in the following, non-exhaustive, list: library collections; museum collections; archives; library, museum and archival catalogues; digital archives; Internet directories; Internet subject gateways; robot generated Web indexes; collections of text, images, sounds, datasets, software, other material or combinations of these (this includes databases, CD-ROMs and collections of Web resources); other collections of physical items.

This is a broad list of overlapping categories. However, it suggests the need for a planned approach, both so that the techniques adopted fit in well with broader resource discovery directions and so that they are flexible enough to cope with the many collection types that libraries will develop and to indicate the relationships between them. It is worth noting that the list includes collections of physical items, collections of digital surrogates of physical items and collections of born-digital items. It is also worth noting that some collections are actually catalogues (metadata) for other collections. For example, a library catalogue typically describes the items in one or more collections within a library. Finally, it is worth noting that collections are often composed of other collections.

An analytic model of collections and their catalogues

The collection description model is aimed in the first instance at those responsible for the development of collection descriptions. It is also a general contribution to the debate about metadata in the digital age. As described above, its initial use is to inform the construction of a demonstrator to which all relevant RSLP projects can feed information. Although the primary purpose of this model is to illumine the process of resource discovery by users, collection description also serves collection management purposes, particularly in discharging an institution's curatorial responsibilities. The work focuses primarily on the needs of libraries in describing their collections but also takes into account the requirements of other sectors. Collection description itself may take a variety of forms, and the model makes no presumption about the format of such a description.

The information landscape can be seen as a contour map in which there are mountains, hillocks, valleys, plains and plateaus. A large general collection of information, for example a research library, can be seen as a plateau, raised above the surrounding plain. A specialized collection of particular importance is like a sharp peak. Upon a plateau there might be undulations representing strengths and weaknesses. The scholar surveying this landscape is looking for the high points. A high point represents an area where the potential for gleaning desired information by visiting that spot (physically or by remote means) is greater than that of other areas. To continue the analogy, the scholar is concerned at the initial survey to identify areas rather than specific features - to identify rainforest rather than to retrieve an analysis of the canopy fauna of the Amazon basin. The model attempts to characterize that initial part of the process of information retrieval. The landscape is, however, multidimensional. Where one scholar may see a peak another may see a trough. The task is to devise mapping conventions that enable scholars to read the map of the landscape fruitfully, at the appropriate level of generality or specificity.

The IFLA study Functional Requirements for Bibliographic Records (FRBR) [12] identifies (pp. 8-9) four functions of records; progression through them may be seen as constituting a successful traverse of the information landscape and the attainment of one's goal. These are:

  • To find: to provide access points by which information can be found.
  • To identify: to describe something so as to enable users correctly to interpret records retrieved.
  • To select: to provide a means for users to choose from among the identified records.
  • To obtain: to acquire the identified materials.

The first two of these activities are associated with the traditional areas of catalogue codes, access and description. The relations they embody are characteristically static or at least persistent. A static model may adequately represent them - they are the map of the landscape. The second two reflect the more active operations involved in retrieving and using information; they are transactional in nature, and a dynamic or event-driven model may be more appropriate for them - they represent attempts to use the map to reach the areas of interest.

The model attempts to encompass the first two activities. There are, however, many links to be made between all the elements in the process of obtaining information, and these links may be expressed reciprocally. Determining and describing the part of a link that may be embodied in the model inevitably determines the nature of the complementary half, though the objects at the other end may not be described fully, or at all.

The list of example collections above can be categorized into those that are collections of entities (e.g. books) or of derived representations of entities (e.g. photographs of pieces of sculpture) on the one hand, and those that are collections of information about such entities. This article refers to a collection of entities as a 'Collection' and to a collection of information about such entities as a 'Collection-Description'.

Some types of Collection-Description can themselves be seen as Collections, in this case of metadata rather than primary information. The Creators, Producers, etc., of the secondary Collection will not necessarily be those of the Collection it catalogues, however. Moreover, the secondary Collection can have its own recursive Collection-Description. This article also uses Collection-Description to encompass both intellectually created resources and passive assemblages of data such as those gathered by robotic search engines.

The model says nothing explicit about the size of a Collection. It is possible to envisage a Collection consisting of one Item. Where an institution can choose between different degrees of aggregation in determining what are its Collections, there is no structure inherent in the model that requires or predisposes a particular level of aggregation. The institution should base its choices on its own pragmatic grounds, such as the level of detail required to make explicit those elements of the Collection-Description that the institution deems to be useful or necessary for the purposes of resource discovery or collection management. That is, institutions should adopt a functional granularity approach.

A highly simplified view of the model is presented here:

[Figure: RSLP Collection Description Model - simplified view]

Content is an intellectual creation, without reference to any particular instantiation. Item is the concrete (incorporating physical and electronic) realization of Content. Note that, in so far as the model is concerned with collections, the entities Content and Item are considered only to the extent that their types and attributes impinge upon Collection Description. In the vast majority of cases, too, the Items will coincide with what FRBR calls Items, not Manifestations. Item has been chosen as the most neutral term in preference to other terms which have been used such as 'Document' or 'Document-like Object'. Item can most easily embrace all of the concepts of physical and electronic, text and non-text, and human and natural creations.

A Collection is an aggregation of physical and/or electronic Items. A Location is the place (identified physically or electronically) where a Collection is held. Note that it is important to distinguish between the place and the institution responsible for the place; the latter is represented in this model by the term Administrator.

A Creator is responsible in some way for the existence of the intellectual Content of an Item. A Producer is responsible for the existence of the physical or electronic form in which an Item is realized.

A Collector gathers Items together. An Owner is the Agent who has legal possession of a Collection. An Administrator has responsibility for the physical or electronic environment in which a Collection is held.

The model separates Agents (Creator, Producer, Collector, Owner and Administrator), shown on the left-hand side, from Objects (Content, Item, Collection and Location), shown on the right. Agents are people or organizations. Agents initiate actions, for example they create Content, produce Items, gather Items into Collections and administer Locations. Agents have rights. Agents can have many roles at the same time, for example the Collector of a collection may also be its Owner. Agents also control the usage of the collections. They determine who has access rights to the Collection and its Location and who holds copyright and ownership.
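As a rough illustration only, the entities described above could be held in simple data structures like the ones below. This is a minimal sketch, not part of the RSLP work: the class and field names are assumptions chosen to mirror the terms in the text, Content is reduced to a title string, and the sample data is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """A person or organization; one Agent may fill several roles."""
    name: str


@dataclass
class Location:
    """The place (physical or electronic) where a Collection is held."""
    name: str
    administrator: Optional[Agent] = None  # responsible for the environment


@dataclass
class Item:
    """Concrete realization of Content (physical or electronic)."""
    content_title: str                 # stands in for the Content entity
    creator: Optional[Agent] = None    # responsible for the Content
    producer: Optional[Agent] = None   # responsible for the realized form


@dataclass
class Collection:
    """An aggregation of Items, held at one or more Locations."""
    title: str
    collector: Optional[Agent] = None  # gathered the Items together
    owner: Optional[Agent] = None      # has legal possession
    locations: List[Location] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)


# Hypothetical data: one Agent acts as both Collector and Owner.
collector = Agent("A. Collector")
collection = Collection(
    title="Example Collection",
    collector=collector,
    owner=collector,
    locations=[Location("Example Reading Room",
                        administrator=Agent("Example Library"))],
)
print(collection.collector is collection.owner)
```

Keeping the Administrator on the Location rather than on the Collection follows the distinction the text draws between a place and the institution responsible for it.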

Kinds of collection descriptions

The model defines four broad classes of collection description:

  • Unitary Finding-Aid: a Collection-Description which consists only of information about the Collection as a whole and does not provide information about the individual Items within it.
  • Hierarchic Finding-Aid: a Collection-Description which consists of information about the Collection as a whole, together with information about the individual Items within it and their Content, including contextual information about the relation of the Items and their Content to the Collection as a whole. Note that this type of Collection-Description is most often associated with archival collections where contextual information is necessary to the understanding of the Items.
  • Analytic Finding-Aid: a Collection-Description which consists of information about the individual Items within it and their Content. Note that this type of Collection-Description is typical of a library catalogue.
  • Indexing Finding-Aid: a Collection-Description that consists of information derived from the individual Items within it.

In three of the four identified types of Collection-Description the information conveyed is analytic: that is, the information is held in discrete packets (e.g. catalogue records). Although they may be brought together and presented as a result of a search, or may be organized in a particular sequence (e.g. by author's name), the packets are largely independent of each other.

However, two qualifications have to be made to the paragraph above. First, a Collection-Description may have some overall structure that reduces the autonomy of its constituent elements; that is, it may be necessary to know the placement of a catalogue record within the structure of the catalogue (its context) in order to interpret the record correctly. This is always true to some extent, and the participants in the Toronto conference on the principles and development of AACR [13] stressed the weaknesses in online catalogues resulting from the loss of such contextual information (for example, the ordering of result sets is often effectively arbitrary). It is particularly true for the established practices in cataloguing archival collections (see the rules for multilevel description in ISAD(G) [14]).

Second, with Internet resources the distinctions may become blurred. Take, for example, the existence of a site for the works of Kipling on the World Wide Web. Viewing the site as a whole, it may be said to be a collection of entities or derived representations of entities. However, if much the same list of links can be retrieved by a search on (say) Yahoo, does this make Yahoo a Collection in our definition instead of a Collection-Description? We take the view that ownership, administration and location are relevant to the definition of a collection. The fact that a catalogue can now be directly linked to the entities catalogued - that the searcher can move seamlessly from finding and identifying to selecting and obtaining - need not mean that the constituent elements of those processes have changed.

A Unitary Finding-Aid takes as its basis the information about the Collection as a whole - it makes no attempt to capture information about individual records except in so far as it is necessary to provide aggregate information (e.g. on limiting dates, or on the number of Items it contains).

An Analytic Finding-Aid lists the individual records comprising information about the intellectual Content and the Items in which it is realized. There may, in the individual records, be information about Collections, and the Finding-Aid may be searchable from that aspect, but that is not its focus. A library catalogue is typically an Analytic Finding-Aid.

An archival collection is more often described by a Hierarchic Finding-Aid, in which the individual Items and their Content are described, but firmly grounded within the overall arrangement of the Collection, e.g. grouping together all the letters, account books etc. in an ordered sequence or sequences. The Items are often not uniquely identifiable when considered in isolation, so the context of the Collection is an essential element in compiling the Collection-Description.

An Indexing Finding-Aid is characterized here as consisting of information derived from Items, by implication regardless of their Content. By this is meant that an Indexing Finding-Aid - such as a robotic search engine - will index the words in a document (or catalogue record) regardless of their context and without trying to identify the discrete elements of Content contained therein. The effects of this may be mitigated by the use of metadata tags in Web documents, but in so far as the engine uses such tags, it is creating an Analytic Finding-Aid (which may or may not be combined with the Indexing Finding-Aid). An online Analytic Finding-Aid may incorporate a keyword index that is, in effect, an Indexing Finding-Aid in this sense of the term. At the other end of the technological scale, a printed Calendar of a Collection may have its own printed Indexing Finding-Aid which lists, out of context, the names, places, etc., occurring in the Collection.
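The contrast between an Analytic and an Indexing Finding-Aid can be sketched in a few lines of code. The example below is illustrative only: the record structure, field names and sample records are assumptions, and real catalogues and search engines are far more elaborate. The word index ignores which element a word came from, while the analytic search looks only at an identified element.

```python
import re
from collections import defaultdict

# Hypothetical catalogue records: discrete packets with identified elements.
records = [
    {"id": "r1", "author": "Kipling, Rudyard", "title": "Kim"},
    {"id": "r2", "author": "Kipling, Rudyard", "title": "The Jungle Book"},
    {"id": "r3", "author": "Smith, Kim", "title": "A Book of Verse"},
]


def build_word_index(records):
    """Index every word in every field, ignoring which field it came from."""
    index = defaultdict(set)
    for record in records:
        for key, value in record.items():
            if key == "id":
                continue
            for word in re.findall(r"\w+", value.lower()):
                index[word].add(record["id"])
    return index


def analytic_search(records, field, text):
    """Search one identified element, as a catalogue search would."""
    return sorted(r["id"] for r in records if text.lower() in r[field].lower())


index = build_word_index(records)
print(sorted(index["kim"]))                      # ['r1', 'r3']
print(analytic_search(records, "title", "kim"))  # ['r1']
```

The word index returns the record whose author happens to be called Kim alongside the record for the novel, which is the loss of context the text describes.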

External relationships

Because it is a model of a single instance of a Collection, the model of Collection Description does not explicitly map external relationships. Such relationships are between instances of the model, and are not part of the internal structure of the model itself. They may, moreover, operate both at the Collection level and at the Collection-Description level. Relevant external relationships are:

  • Has-Version (including format distinctions; the distinctions made in CCSDS Reference Model for an Open Archival Information System (OAIS) [15] section 5.1.3 p. 5-4 may appropriately be used as instances of this class of Relationship)
  • Has-Part (relating parts to wholes; discussion has identified the desirability of distinguishing Diffuse)
  • Has-Complement (where a formerly unified Collection has been split into separate parts, possibly in separate institutions)
  • Has-Association (where another Collection is relevant but the relationship is not described by one of the above list. Typically relates to another Collection associated with one or more of the Agents of the Collection)
  • Has-Publication (where the collection is the basis of a published scholarly work). The associated Publication is unlikely to be present in a Collection Description resource so this External Relationship will not be reciprocal
  • Is-Described-By (relating a Collection or Collection-Description to a Collection-Description)
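Because these relationships hold between instances rather than inside one, they can be pictured as links stored alongside the descriptions. The following is a minimal sketch under that assumption; the `Link` structure, the helper functions and the identifiers are hypothetical, and only the relationship names come from the list above.

```python
from enum import Enum
from typing import List, NamedTuple


class Relation(Enum):
    """The external relationship classes named in the list above."""
    HAS_VERSION = "Has-Version"
    HAS_PART = "Has-Part"
    HAS_COMPLEMENT = "Has-Complement"
    HAS_ASSOCIATION = "Has-Association"
    HAS_PUBLICATION = "Has-Publication"
    IS_DESCRIBED_BY = "Is-Described-By"


class Link(NamedTuple):
    source: str         # identifier of one instance of the model
    relation: Relation
    target: str         # identifier of another instance or resource


# Hypothetical identifiers; links live outside the instances they connect.
LINKS: List[Link] = [
    Link("coll:A", Relation.HAS_PART, "coll:A1"),
    Link("coll:A", Relation.IS_DESCRIBED_BY, "cat:A"),
    Link("coll:A", Relation.HAS_PUBLICATION, "pub:monograph"),
]


def outgoing(identifier: str) -> List[Link]:
    """Relationships asserted by the given instance."""
    return [link for link in LINKS if link.source == identifier]


def incoming(identifier: str) -> List[Link]:
    """Relationships in which the given instance is the target."""
    return [link for link in LINKS if link.target == identifier]


for link in outgoing("coll:A"):
    print(link.source, link.relation.value, link.target)
# The publication is only ever a target: it asserts nothing in return.
print(outgoing("pub:monograph"))
```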

RSLP collection description schema

The schema presented here is intended to facilitate the simple description of Collections, Locations and Agents, i.e. of the emboldened entities in the above diagram. This includes:

  • descriptive attributes about the Collection,
  • descriptive attributes about the Location (or Locations),
  • identification and/or description of three kinds of related Agents, the Collector and Owner of the Collection and the Administrator of the Location,
  • external relationships - identification of collections and other resources that are related to the Collection being described.

In terms of the four collection description types listed above, this schema supports the creation of Unitary Finding-Aids.

Collection attributes:

| Attribute | RDF property | Definition |
| --- | --- | --- |
| General attributes | | |
| Title | dc:title | The name of the collection. |
| Identifier | dc:identifier | A formal identifier for the collection. |
| Description | dc:description | A description of the collection. |
| Strength | cld:strength (sub-property of dc:description) | An indication (free text or formalized) of the strength(s) of the collection. |
| Physical Characteristics | dc:format | The physical or digital characteristics of the collection. |
| Language | dc:language | The language of the items in the collection. |
| Type | dc:type | The type of the collection. |
| Access Control | cld:accessControl (sub-property of dc:rights) | A statement of any access restrictions placed on the collection, including allowed users, charges, etc. |
| Accrual Status | cld:accrualStatus (sub-property of dc:description) | A statement of accrual policy (closed, passive, active, partial/selective), accrual method (purchase, deposit) and accrual periodicity (closed, irregular, periodic). |
| Legal Status | cld:legalStatus (sub-property of dc:description) | A statement of the legal status of the collection. |
| Custodial History | cld:custodialHistory (sub-property of dc:description) | A statement of any changes in ownership and custody of the collection that are significant for its authenticity, integrity and interpretation. |
| Note | cld:note (sub-property of dc:description) | Any general information about the collection. |
| Location | cld:hasLocation (sub-property of dc:relation) | The identifier for the physical or online (digital) location of the collection. |
| Subject | | |
| Concept | dc:subject | A concept (keyword) of the items in the collection. |
| Object | cld:objectName (sub-property of dc:subject) | An object name associated with the items in the collection. |
| Name | cld:agentName (sub-property of dc:subject) | A personal or corporate name associated with the items in the collection. |
| Place | dcq:place (sub-property of dc:coverage) | The spatial coverage of the items in the collection. |
| Time | dcq:time (sub-property of dc:coverage) | The temporal coverage of the items in the collection. |
| Dates | | |
| Accumulation Date Range | cld:accumulationDateRange (sub-property of dc:date) | The range of dates over which the collection was accumulated. |
| Contents Date Range | cld:contentsDateRange (sub-property of dc:date) | The range of dates of the individual items within the collection. |
| Associated agents | | |
| Collector | dc:creator | The identifier for an agent who gathers (or gathered) the items in a collection together. |
| Owner | cld:owner | The identifier for an agent who has legal possession of the collection. |
| External relationships | | |
| Sub-collection | dcq:hasPart (sub-property of dc:relation) | The identifier or name of a second collection contained within the current collection. |
| Super-collection | dcq:isPartOf (sub-property of dc:relation) | The identifier or name of a second collection that contains the current collection. |
| Catalogue or description | cld:hasDescription (sub-property of dc:relation) | The identifier or name of a second collection that describes the current collection (for example, the catalogue for the current collection). |
| Described collection | cld:isDescriptionOf (sub-property of dc:relation) | The identifier or name of a second collection that is described by the current collection. |
| Associated collection | cld:hasAssociation (sub-property of dc:relation) | The identifier or name of a second collection that is associated by provenance with the current collection. |
| Associated publication | cld:hasPublication (sub-property of dc:relation) | The identifier or name of a publication that is based on the use, study, or analysis of the collection. |

Location attributes:

| Attribute | RDF property | Definition |
| --- | --- | --- |
| General attributes | | |
| Name | dc:title | The name of the location. |
| Identifier | dc:identifier | A formal identifier for the location. |
| Access Conditions | cld:accessConditions | Hours of access, classes of permitted user, etc. |
| Held collection | cld:isLocationOf (sub-property of dc:relation) | The identifier for a collection held at this physical or online (digital) location. |
| See also | cld:seeAlso (sub-property of dc:relation) | The identifier of a resource that provides further information about this location (typically the URL for an organizational home page). |
| Associated agents | | |
| Administrator | cld:administrator (sub-property of dc:publisher) | The identifier for an agent who has responsibility for the physical or electronic environment in which the collection is held. |
| Physical location | | |
| Postal address | cld:address (sub-property of dc:identifier) | The full postal address for the physical location of the physical collection. |
| Post/zip code | cld:postcode | The postcode or zip code for the physical location of the collection. |
| Country | cld:country | The country in which the collection is physically located. |
| Online location | | |
| Locator | dcq:locator (sub-property of dc:identifier) | The online location (URL) of an online (digital) collection. |

Agent attributes:

| Attribute | RDF property | Definition |
| --- | --- | --- |
| General attributes | | |
| Name | vcard:fn | The name of the agent. |
| Identifier | dc:identifier | A formal identifier for the agent. |
| Organisation name | vcard:org | The organizational name of, or affiliated with, the agent. |
| Role | vcard:role | The role (typically an organizational role) fulfilled by the agent. |
| Telephone number | vcard:voice (sub-property of vcard:tel) | The telephone number of the agent. |
| Fax number | vcard:fax (sub-property of vcard:tel) | The fax number of the agent. |
| Email address | vcard:email | The electronic mail address of the agent. |
| Agent History | cld:agentHistory | An administrative history of, or biographical details on, the agent. |

Note that the 'dc:', 'dcq:', 'cld:' and 'vcard:' prefixes used in the 'RDF property' column above provide a convenient short-hand representation for the full RDF property. See the example RDF description listed below for full details of the RDF properties used in RSLP collection descriptions.
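To make the short-hand concrete, the sketch below expands a prefixed name into a full property URI by appending the local name to the namespace URI. The namespace URIs are copied from the example RDF description in this article; the `expand` helper is a hypothetical name introduced for illustration.

```python
# Namespace URIs as declared in the article's example RDF description.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcq": "http://purl.org/dc/qualifiers/1.0/",
    "vcard": "http://www.imc.org/vcard/3.0/",
    "cld": "http://www.ukoln.ac.uk/metadata/rslp/1.0/",
}


def expand(qname: str) -> str:
    """Turn 'prefix:local' into the full property URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


print(expand("dc:title"))      # http://purl.org/dc/elements/1.1/title
print(expand("cld:strength"))  # http://www.ukoln.ac.uk/metadata/rslp/1.0/strength
```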

It should be noted that, wherever possible, properties have been taken from existing metadata schemas, notably the Dublin Core Metadata Element Set (DCMES) [16] and the vCard set of attributes [17]. The tables above also indicate where a property is a sub-property of an existing property in the Dublin Core or vCard schema.
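One practical consequence of declaring sub-properties is that software which only understands the parent property can still make some use of a description. The sketch below illustrates that idea; it is an assumption about how the information might be used, not something the project tool is stated to do. The mapping lists only a few rows from the tables above, and the function names are hypothetical.

```python
# A few sub-property relationships taken from the tables above.
SUB_PROPERTY_OF = {
    "cld:strength": "dc:description",
    "cld:accessControl": "dc:rights",
    "cld:accrualStatus": "dc:description",
    "cld:hasLocation": "dc:relation",
    "cld:agentName": "dc:subject",
    "dcq:place": "dc:coverage",
    "cld:accumulationDateRange": "dc:date",
    "vcard:voice": "vcard:tel",
}


def generalize(prop: str) -> str:
    """Follow sub-property links up to the most general property."""
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
    return prop


def generalize_description(description: dict) -> dict:
    """Re-key a description so every value sits under a general property."""
    general = {}
    for prop, values in description.items():
        general.setdefault(generalize(prop), []).extend(values)
    return general


# Values taken from the Morrison Collection example in this article.
description = {
    "dc:title": ["Morrison Collection of Chinese Books"],
    "cld:strength": ["classics, history, philosophy, literature"],
    "dcq:place": ["South China", "Macao"],
}
print(generalize_description(description))
```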

Syntax

The Resource Description Framework (RDF) is the W3C recommended architecture for metadata on the Web. RDF provides a mechanism for making simple metadata statements about resources (including both digital and physical resources) of the form - resource X has property Y with value Z. By grouping sets of these simple statements together, and by using the same mechanism to make statements about the sets of statements, it is possible to build up complex RDF descriptions of multiple resources and the relationships between them. Currently, the exchange of RDF descriptions on the Web is achieved by encoding them using the Extensible Markup Language (XML) [18].
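The "resource X has property Y with value Z" pattern can be sketched as a list of three-part statements. This is an illustrative simplification rather than an RDF implementation: the `Statement` type and `value_of` helper are hypothetical, while the identifiers and values are taken from the Morrison Collection example in this article. Note how the value of one statement can itself be a resource about which further statements are made.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    resource: str   # X
    prop: str       # Y
    value: str      # Z


COLLECTION = "urn:x-rslpcd:967715792-47835"
COLLECTOR = "urn:x-rslpcd:967715792-32366"

statements = [
    Statement(COLLECTION, "dc:title", "Morrison Collection of Chinese Books"),
    Statement(COLLECTION, "dc:language", "chi"),
    Statement(COLLECTION, "dc:creator", COLLECTOR),
    Statement(COLLECTOR, "vcard:fn", "Morrison, Robert, 1782-1834."),
]


def value_of(resource: str, prop: str) -> Optional[str]:
    """Return the first value stated for this resource and property."""
    for statement in statements:
        if statement.resource == resource and statement.prop == prop:
            return statement.value
    return None


# Follow the link from the collection to its collector, then read the name.
collector_id = value_of(COLLECTION, "dc:creator")
print(value_of(collector_id, "vcard:fn"))
```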

The RSLP Collection Description project chose to encode collection descriptions using the XML encoding of RDF, based on the attributes listed in the schema above. Full collection descriptions are partitioned into separate RDF descriptions of Collections, Locations, Collectors, Owners and Administrators. These separate descriptions are linked together to form a full description.
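The way the separate descriptions link together can be illustrated by reading them back with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a cut-down copy of the Morrison Collection example; it treats the file as plain XML rather than using a full RDF parser, and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the article's example, kept well-formed.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:vcard="http://www.imc.org/vcard/3.0/"
  xmlns:cld="http://www.ukoln.ac.uk/metadata/rslp/1.0/">
  <rdf:Description about="urn:x-rslpcd:967715792-47835">
    <dc:title>Morrison Collection of Chinese Books</dc:title>
    <dc:creator resource="urn:x-rslpcd:967715792-32366"/>
    <cld:owner resource="urn:x-rslpcd:967715792-62789"/>
  </rdf:Description>
  <rdf:Description about="urn:x-rslpcd:967715792-32366">
    <vcard:fn>Morrison, Robert, 1782-1834.</vcard:fn>
  </rdf:Description>
  <rdf:Description about="urn:x-rslpcd:967715792-62789">
    <vcard:org>School of Oriental and African Studies Library</vcard:org>
  </rdf:Description>
</rdf:RDF>
"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "vcard": "http://www.imc.org/vcard/3.0/",
    "cld": "http://www.ukoln.ac.uk/metadata/rslp/1.0/",
}

root = ET.fromstring(RDF_XML)

# Index each partial description by the identifier in its 'about' attribute.
by_id = {d.get("about"): d for d in root.findall("rdf:Description", NS)}

collection = by_id["urn:x-rslpcd:967715792-47835"]
title = collection.findtext("dc:title", namespaces=NS).strip()

# The 'resource' attributes point at the Collector and Owner descriptions.
collector_ref = collection.find("dc:creator", NS).get("resource")
owner_ref = collection.find("cld:owner", NS).get("resource")
collector = by_id[collector_ref].findtext("vcard:fn", namespaces=NS).strip()
owner = by_id[owner_ref].findtext("vcard:org", namespaces=NS).strip()

print(title)
print("Collector:", collector)
print("Owner:", owner)
```

Each `resource` attribute on the Collection carries the identifier of another description, so following it is a simple lookup in the `by_id` dictionary.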

An example RDF/XML description of the Morrison Collection of Chinese Books housed at the School of Oriental and African Studies Library, London follows:

   <?xml version="1.0"?>
   
    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcq="http://purl.org/dc/qualifiers/1.0/"
      xmlns:vcard="http://www.imc.org/vcard/3.0/"
      xmlns:cld="http://www.ukoln.ac.uk/metadata/rslp/1.0/">
      <rdf:Description about="urn:x-rslpcd:967715792-47835">
        <!-- Collection -->
        <dc:title>
          Morrison Collection of Chinese Books
        </dc:title>
        <dc:description>
          This collection comprises the Chinese books accumulated by Dr.
          Robert Morrison (1782 - 1834), the first Protestant missionary to
          China, during his sixteen years residence in Guangzhou and Macao
          between 1807 and 1823. Ten thousand Chinese-style thread-bound
          volumes cover a broad spectrum of subjects from early and mid-Qing
          China.
        </dc:description>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>
              LCSH
            </dcq:scheme>
            <rdf:value>
              Missionaries -- China
            </rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>
              LCSH
            </dcq:scheme>
            <rdf:value>
              Rare books -- China -- Bibliography -- Catalogs.
            </rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:scheme>
              LCSH
            </dcq:scheme>
            <rdf:value>
              Chinese Imprints -- Catalogs
            </rdf:value>
          </rdf:Description>
        </dc:subject>
        <cld:agentName>
          Morrison, Robert, 1782-1834.
        </cld:agentName>
        <cld:agentName>
          School of Oriental and African Studies
        </cld:agentName>
        <dcq:place>
          South China
        </dcq:place>
        <dcq:place>
          Macao
        </dcq:place>
        <dcq:time>
          Early to mid Qing
        </dcq:time>
        <cld:strength>
          classics, history, philosophy, literature
        </cld:strength>
        <dc:language>
          chi
        </dc:language>
        <dc:format>
          10,000 thread-bound volumes - manuscripts and folios
        </dc:format>
        <dc:type>
          Collection.Library.Special
        </dc:type>
        <cld:accumulationDateRange>
          1807-1823
        </cld:accumulationDateRange>
        <cld:contentsDateRange>
          1650-1825
        </cld:contentsDateRange>
        <dcq:hasPart>
          Literature collection within Morrison Collection of Chinese Books
        </dcq:hasPart>
        <dcq:hasPart>
          Classics collection within Morrison Collection of Chinese Books
        </dcq:hasPart>
        <dcq:hasPart>
          History collection within Morrison Collection of Chinese Books
        </dcq:hasPart>
        <dcq:hasPart>
          Philosophy collection within Morrison Collection of Chinese Books
        </dcq:hasPart>
        <dcq:isPartOf>
          SOAS Library
        </dcq:isPartOf>
        <cld:hasDescription>
          Catalogue of the Morrison Collection of Chinese Books (monograph)
        </cld:hasDescription>
        <cld:hasDescription>
          Items recorded in SOAS Library OPAC &lt;http://195.195.181.2/&gt;
        </cld:hasDescription>
        <cld:accrualStatus>
          deposit, closed
        </cld:accrualStatus>
        <cld:accessControl>
          A Library Guide to Membership is found at:
          &lt;http://www.soas.ac.uk/Library/Guides/membership.html&gt;
        </cld:accessControl>
        <cld:note>
          London Missionary Society held collection from 1825-1834, then UCL
          until 1922. Six "missing" books held at Bodleian library
        </cld:note>
        <dc:creator resource="urn:x-rslpcd:967715792-32366"/>
        <cld:owner resource="urn:x-rslpcd:967715792-62789"/>
        <cld:hasLocation resource="urn:x-rslpcd:967715792-16277"/>
      </rdf:Description>
      <rdf:Description about="urn:x-rslpcd:967715792-32366">
        <!-- Collector -->
        <vcard:fn>
          Morrison, Robert, 1782-1834.
        </vcard:fn>
      </rdf:Description>
      <rdf:Description about="urn:x-rslpcd:967715792-62789">
        <!-- Owner -->
        <vcard:org>
          School of Oriental and African Studies Library
        </vcard:org>
        <vcard:voice>
          +44 207 898 4163
        </vcard:voice>
        <vcard:fax>
          +44 207 898 4159
        </vcard:fax>
        <vcard:email>
          libenquiry@soas.ac.uk
        </vcard:email>
      </rdf:Description>
      <rdf:Description about="urn:x-rslpcd:967715792-16277">
        <!-- Location -->
        <dc:title>
          School of Oriental and African Studies Library
        </dc:title>
        <cld:address>
          Thornhaugh Street, Russell Square, LONDON WC1H 0XG.
        </cld:address>
        <cld:postcode>
          WC1H 0XG
        </cld:postcode>
        <cld:country>
          uk
        </cld:country>
        <cld:accessConditions>
          See opening hours for SOAS at:
          &lt;http://www.soas.ac.uk/Library/open.html&gt;. Reference only in the
          library.
        </cld:accessConditions>
        <cld:seeAlso>
          http://www.soas.ac.uk/Library/home.html
        </cld:seeAlso>
        <cld:isLocationOf resource="urn:x-rslpcd:967715792-47835"/>
      </rdf:Description>
    </rdf:RDF>


This encoding syntax follows the draft recommendations for encoding Dublin Core metadata within RDF [19] (current at the time of development), particularly in how the scheme associated with a particular value is encoded.
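The scheme convention used above (a nested rdf:Description holding a dcq:scheme and an rdf:value) can be read back with a few lines of code. The following Python sketch is illustrative only and is not part of the project's tool. The function name `qualified_values` is hypothetical, and the namespace URI bound to the `dcq` prefix is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The rdf and dc URIs are the standard published ones.
# The dcq URI is an assumption made for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcq": "http://purl.org/dc/qualifiers/1.0/",
}

# A cut-down fragment in the style of the Morrison Collection example.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcq="http://purl.org/dc/qualifiers/1.0/">
  <rdf:Description>
    <dc:subject>
      <rdf:Description>
        <dcq:scheme>
          LCSH
        </dcq:scheme>
        <rdf:value>
          Missionaries -- China
        </rdf:value>
      </rdf:Description>
    </dc:subject>
    <dc:language>
      chi
    </dc:language>
  </rdf:Description>
</rdf:RDF>
"""


def qualified_values(xml_text, prop):
    """Return (scheme, value) pairs for every `prop` element.

    A plain literal value yields (None, value). A value wrapped in a
    nested rdf:Description yields the scheme and value found inside it.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.findall(f".//{prop}", NS):
        inner = element.find("rdf:Description", NS)
        if inner is None:
            # Unqualified literal: strip the layout whitespace.
            pairs.append((None, (element.text or "").strip()))
        else:
            scheme = inner.findtext("dcq:scheme", default="", namespaces=NS).strip()
            value = inner.findtext("rdf:value", default="", namespaces=NS).strip()
            pairs.append((scheme or None, value))
    return pairs


if __name__ == "__main__":
    print(qualified_values(SAMPLE, "dc:subject"))
    print(qualified_values(SAMPLE, "dc:language"))
```

Treating plain and qualified values uniformly as pairs means that a consumer of the description does not need to know in advance which properties carry a scheme.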

Notice that the RDF descriptions above are about resources that are explicitly identified using a URI [20]. In most RSLP projects, however, the Collections, Locations and Agents being described do not already have URIs assigned to them. The URIs used in the above description were generated automatically, specifically for the purpose of creating the description in RDF.
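The article does not say how these identifiers were generated. As a rough sketch of one possible approach, the Python below mints identifiers of the same shape as those in the example. It assumes that the first number is a timestamp in seconds and the second a random five-digit suffix, which is a guess based only on the appearance of the URIs above. The function name `mint_urn` is hypothetical.

```python
import random
import re
import time

URN_PREFIX = "urn:x-rslpcd:"


def mint_urn(now=None, rng=random):
    """Mint an identifier shaped like urn:x-rslpcd:967715792-32366.

    Assumption: <seconds since the epoch>-<five random digits>.
    The scheme actually used by the project is not documented here.
    """
    seconds = int(time.time() if now is None else now)
    suffix = rng.randint(0, 99999)
    return f"{URN_PREFIX}{seconds}-{suffix:05d}"


if __name__ == "__main__":
    urn = mint_urn()
    # Check the shape of the result rather than a specific value.
    print(urn, bool(re.fullmatch(r"urn:x-rslpcd:\d+-\d{5}", urn)))
```

An identifier minted this way is only unique with high probability, so a real tool would need to check for collisions before assigning it to a Collection, Location or Agent.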

By encoding descriptions in RDF/XML and by making use of Dublin Core and vCard properties as far as possible, the project hopes to position RSLP collection description very closely alongside other emerging descriptive practice on the Web.

Implementation

The RSLP Collection Description Project has developed a simple Web-based tool that enables the creation and editing of fairly complex RDF collection descriptions. The tool has a number of example descriptions built into it and also contains embedded detailed help in the form of data-entry guidelines for each of the attributes.

RSLP Collection Description tool

The tool is freely available for use on the Web. It has been developed in Perl and the source code will be made available in the near future.

There has been some other, more experimental, implementation of the above schema using a relational database (Microsoft Access) and the ROADS suite of tools [21]. Although these implementations cannot yet produce an RDF/XML encoding of a collection description, there is no reason why they should not do so. This work has already been used successfully as the basis of further implementation by other RSLP projects and it is anticipated that further work will be carried out in this area.

Collection types

Work on the project has also resulted in the development of an enumerated set of collection types - terms that may be used as a value for the collection Type (dc:type) attribute in the above schema. The list is made up of the emboldened categories in the left-hand column of the following table. The categories are grouped into those that indicate the type of collection, those that indicate the curatorial environment in which the collection has been made, those that indicate the content of the collection and those that indicate the collection policy and/or usage.

Type

Collection

A set of items grouped physically, electronically and/or logically on the basis of a property or properties the items have in common.

Catalogue

(Analytic-Finding-Aid). A collection of individual records describing the items, and the intellectual content therein, of a second collection. There may, in the individual records, be information about collections but that is not the focus of the catalogue. Catalogues are typically created with significant human input.

Finding-Aid

(Hierarchic-Finding-Aid). A collection of records describing the individual items, and the intellectual content therein, of a second collection. The records are firmly grounded within the overall arrangement of the collection, e.g. grouping together all the letters, account books etc. in an ordered sequence or sequences. Items are often not uniquely identifiable when considered in isolation, so the context of the collection is an essential element in compiling the finding-aid. Finding-aids are typically created with significant human input.

Index

(Indexing-Finding-Aid). A collection of records consisting of information derived from items in a second collection, regardless of their content. That is, an index, for example a robotic search engine, will index the words in a document (or catalogue record) regardless of their context and without trying to identify the discrete elements of intellectual content contained therein. Indexes are typically generated automatically by a software robot or other harvesting technology.

Curatorial Environment

Library

A library collection (books, journals, etc.).

Museum

A museum collection (artifacts, etc.).

Archive

An archive is a whole that documents the life and work of an institution or individual, which has been retained in its original working order and is of known provenance.

Internet

A collection, catalogue or index of Internet resources.

Content

Text

A collection of items that are primarily words for reading. For example - books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre "text".

Image

A collection of items that are primarily symbolic visual representations other than text. For example - images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that "image" may include both electronic and physical representations.

Sound

A collection of items that are primarily audio. For example - music, speech, recorded sounds.

Dataset

A collection of items that primarily consist of structured information encoded in lists, tables, databases, etc., which will normally be in a format available for direct machine processing. For example - spreadsheets, databases, GIS data, MIDI data. Note that collections of items that are primarily unstructured numbers and words will normally be considered to be type "Collection.Text".

Software

A software repository.

InteractiveResource

A collection of resources that require interaction from the user to be understood, executed, or experienced. For example - forms on Web pages, applets, multimedia learning objects, chat services, virtual reality.

Event

A collection of non-persistent, time-based occurrences.

PhysicalObject

A collection of three-dimensional objects or substances that are not primarily text or image or one of the other types listed here. For example - people, computers, sculptures or wheat (!). Note that collections of digital representations of, or surrogates for, these things should use "Collection.Image", "Collection.Text" or one of the other collection types listed here.

Policy and/or Usage

Dispersed

A collection of material on a single subject, but not kept together and not referred to by a specific name.

Distributed

A collection that is shared among several libraries.

Special

A collection connected with local history, celebrities, industries, etc., or on a certain subject or period, or gathered for some particular reason.

Subject

A collection of material on a particular subject.

Form

A collection of materials of the same form.

User

A collection arranged specifically for a particular group of users.

Virtual

A collection of material on a particular subject that is made available online.

Working

A collection brought together for a particular project or exhibition that is then disbanded.

In this scheme, multiple categories may be selected as appropriate. Typically, zero or one category from each group will be selected (though there may be exceptions to this). Multiple categories should be concatenated together, separated by a '.', to form a string value - for example:

Collection.Library.Dispersed
Catalogue.Museum
Finding-Aid
Collection.Archive
Index.Internet
Catalogue.Internet.Subject
Collection.Image

Although the ordering of categories implies no hierarchy, it is suggested that categories be selected in the order shown here for consistency.
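The concatenation rule and the suggested ordering can be sketched in code. The Python below is an illustration only, not part of the project's tool. The category names come from the table above, while the function names `build_type` and `parse_type` are hypothetical. Because the scheme allows exceptions to the "zero or one per group" guideline, the sketch does not enforce it.

```python
# Groups and categories in the order in which they appear in the table.
GROUPS = [
    ("Type", ["Collection", "Catalogue", "Finding-Aid", "Index"]),
    ("Curatorial Environment", ["Library", "Museum", "Archive", "Internet"]),
    ("Content", ["Text", "Image", "Sound", "Dataset", "Software",
                 "InteractiveResource", "Event", "PhysicalObject"]),
    ("Policy and/or Usage", ["Dispersed", "Distributed", "Special", "Subject",
                             "Form", "User", "Virtual", "Working"]),
]

# Category name -> (group position, position within group, group name).
ORDER = {}
for group_index, (group_name, names) in enumerate(GROUPS):
    for name_index, name in enumerate(names):
        ORDER[name] = (group_index, name_index, group_name)


def build_type(*categories):
    """Join categories with '.' in the order suggested by the table."""
    unknown = [c for c in categories if c not in ORDER]
    if unknown:
        raise ValueError(f"unknown collection type categories: {unknown}")
    return ".".join(sorted(categories, key=lambda c: ORDER[c][:2]))


def parse_type(value):
    """Split a dc:type string and group its categories by table group."""
    grouped = {}
    for category in value.split("."):
        if category not in ORDER:
            raise ValueError(f"unknown collection type category: {category}")
        grouped.setdefault(ORDER[category][2], []).append(category)
    return grouped


if __name__ == "__main__":
    print(build_type("Dispersed", "Collection", "Library"))
    print(parse_type("Catalogue.Internet.Subject"))
```

Splitting on '.' is safe here because no category name contains a full stop. The hyphen in "Finding-Aid" is part of the name.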

Issues and conclusions

The RSLP collection description schema is not intended to be a replacement for richer archival description schemas, such as that offered by ISAD(G). Rather, it should be seen as a schema for making relatively simple collection descriptions in a wide variety of contexts - a Dublin Core for collection description. It is noted that several of the current RSLP projects will be contributing ISAD(G) conformant EAD descriptions to the UK Archives Hub [22] (or will be eligible to contribute descriptions to the Hub). We have recognized that it is not sensible to ask those projects to describe the same collections twice. To enable RSLP projects to describe collections once, a mapping from ISAD(G) to the RSLP collection description schema or vice versa is necessary, allowing collection descriptions in one format to be transformed into the other format. Mappings between ISAD(G) and the RSLP collection description schema, and tools to automate conversion between ISAD(G) conformant EAD and RSLP collection descriptions encoded in RDF are likely to be generally useful, particularly given the possibility that the RSLP collection description schema may be used outside of the RSLP context.
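To make the idea of a two-way mapping concrete, the Python below sketches a table-driven conversion between a handful of EAD element names and RSLP attributes. The pairings shown are assumptions chosen for illustration. They are not the project's mapping, and a real mapping would have to handle structure, repeated values and elements with no counterpart. The names `EAD_TO_RSLP`, `ead_to_rslp` and `rslp_to_ead` are hypothetical.

```python
# Illustrative pairings only - NOT a published ISAD(G)/EAD to RSLP mapping.
EAD_TO_RSLP = {
    "unittitle": "dc:title",
    "scopecontent": "dc:description",
    "physdesc": "dc:format",
    "unitdate": "cld:contentsDateRange",
    "accessrestrict": "cld:accessControl",
}

# The reverse table is derived, so the two directions cannot drift apart.
RSLP_TO_EAD = {rslp: ead for ead, rslp in EAD_TO_RSLP.items()}


def convert(record, table):
    """Rename the keys of `record` using `table`.

    Returns (mapped, unmapped) so that fields with no counterpart are
    reported to the caller rather than silently dropped.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = table.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


def ead_to_rslp(record):
    return convert(record, EAD_TO_RSLP)


def rslp_to_ead(description):
    return convert(description, RSLP_TO_EAD)


if __name__ == "__main__":
    ead_record = {
        "unittitle": "Morrison Collection of Chinese Books",
        "unitdate": "1650-1825",
        "physdesc": "10,000 thread-bound volumes - manuscripts and folios",
        "custodhist": "London Missionary Society held collection from 1825-1834",
    }
    mapped, unmapped = ead_to_rslp(ead_record)
    print(mapped)
    print(unmapped)
```

Returning the unmapped fields separately makes the loss of detail visible when a richer archival description is reduced to the simpler schema.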

The choice of RDF/XML as an encoding syntax has not been entirely trouble free. RDF is a fairly new development and there is not a great deal of significant implementation experience. The approach taken of inventing a URI and assigning it to a resource, specifically for the purpose of creating an RDF description, is relatively untested. Furthermore, recent recommendations made by the Dublin Core Metadata Initiative for element qualifiers [23] (schemes and attribute refinements) have been developed in parallel with our work and the conventions for their encoding in RDF are not yet fully mature. The qualifiers and syntax adopted in this area by the RSLP Collection Description project may well be incompatible with the conventions developed elsewhere in the future.

It might be argued that the project has not had sufficient resources to fully develop software tools that enable other RSLP projects to describe collections in a cost-effective and efficient way. This is largely true - such software development was never envisaged as part of the original project proposal. As a result, projects, particularly those that need to describe large numbers of collections, are left with the burden of developing their own RDF collection description tools. This is made more difficult by the general lack of off-the-shelf RDF-compliant tools. This is not an ideal situation.

However, the RSLP Collection Description Project has been successful in developing a model of collections and collection descriptions, in implementing that model using an RDF encoding and in providing the basis for deployment of that encoding by other RSLP projects.

References

[1] The library, the catalogue, the broker: brokering access to information in the hybrid library
Lorcan Dempsey
Information Landscapes for a Learning Society. London: Library Association, 1999.
<http://www.ukoln.ac.uk/dlis/models/publications/landscape/>

[2] eLib Phase 3 projects
<http://www.ukoln.ac.uk/services/elib/projects/>

[3] Full Disclosure: Releasing the value of library and archive collections
Ann Chapman, Nicholas Kingsley and Lorcan Dempsey.
<http://www.ukoln.ac.uk/services/lic/fulldisclosure/report.pdf>

[4] Scientific, Industrial, and Cultural Heritage: a shared approach: A research framework for digital libraries, museums and archives
Lorcan Dempsey
<http://www.ariadne.ac.uk/issue22/dempsey/>

[5] Research Support Libraries Programme
<http://www.rslp.ac.uk/>

[6] An Analytical Model of Collections and their Catalogues
Michael Heaney
<http://www.ukoln.ac.uk/metadata/rslp/model/>

[7] RSLP Collection Description Schema
Andy Powell
<http://www.ukoln.ac.uk/metadata/rslp/schema/>

[8] Resource Description Framework (RDF) Model and Syntax Specification
Ora Lassila, Ralph R. Swick, editors.
<http://www.w3.org/TR/REC-rdf-syntax/>

[9] RSLP Collection Description Tool
Andy Powell
<http://www.ukoln.ac.uk/metadata/rslp/tool/>

[10] RSLP Collection Description Data Entry Guidelines (draft)
Andy Powell
<http://www.ukoln.ac.uk/metadata/rslp/tool/?mode=printGuide>

[11] CLDT - an enumerated list of collection types
<http://www.ukoln.ac.uk/metadata/rslp/types/>

[12] Functional Requirements for Bibliographic Records
<http://www.ifla.org/VII/s13/frbr/frbr.pdf>

[13] The principles and future of AACR
Jean Weihs, editor.
(Ottawa; London; Chicago: Canadian Library Association; Library Association Publishing; American Library Association, 1998).
(Proceedings of the International Conference on the Principles and Future Development of AACR, Toronto, Ontario, Canada, October 23-25, 1997).

[14] ISAD(G): General International Standard Archival Description
<http://www.ica.org/cds/ISAD(G)E-pub.pdf>

[15] Reference Model for an Open Archival Information System (OAIS)
Consultative Committee for Space Data Systems, May 1999
Available from <http://www.ccsds.org/RP9905/RP9905.html>

[16] Dublin Core Metadata Element Set, Version 1.1: Reference Description
<http://purl.org/dc/documents/rec-dces-19990702.htm>

[17] RFC 2426 - vCard MIME Directory Profile
<http://www.imc.org/rfc2426>

[18] Extensible Markup Language (XML)
<http://www.w3.org/XML/>

[19] Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)
Eric Miller, Paul Miller and Dan Brickley
<http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/>

[20] Naming and Addressing: URIs, URLs, ...
<http://www.w3.org/Addressing/>

[21] Resource Organisation And Discovery in Subject-based services
<http://www.ilrt.bris.ac.uk/roads/>

[22] The UK Higher Education Archives Hub
<http://www.archiveshub.ac.uk/>

[23] Dublin Core Qualifiers
<http://purl.org/dc/documents/rec/dcmes-qualifiers-20000711.htm>

Copyright© 2000 Andy Powell, Michael Heaney, and Lorcan Dempsey

DOI: 10.1045/september2000-powell