
D-Lib Magazine
September 2000

Volume 6 Number 9

ISSN 1082-9873

Collection Level Description - the RIDING and Agora Experience

 

Dr. E.V. Brack
University of Sheffield
Western Bank
SHEFFIELD S10 2TN
Tel: +44 (0)114 222 1143
v.brack@shef.ac.uk

David Palmer
Agora/Assistant Librarian
University of East Anglia
Norwich NR4 7TJ
Tel: +44 (0)1603 593523
david.palmer@uea.ac.uk

Bridget Robinson
AGORA Communications Coordinator
UKOLN (UK Office for Library & Information Networking)
University of Bath, Bath BA2 7AY
Tel: +44 (0)1225 323343
b.r.robinson@ukoln.ac.uk


Abstract

This article examines the background to, and results of, the work of the eLib working group on Collection Level Descriptions (CLDs), and looks at the implementation of CLDs in two of the eLib Phase 3 Library projects: RIDING and Agora.

Introduction

The Clumps projects and Agora are based on the need to aid discovery of, and increase access to, the vast scholarly resources available to the Higher Education community, as proposed in the Anderson Report [1]. These projects are investigating the use of Z39.50 technology in opening up access to these resources, and providing "the means to locate and to gain access to material with reasonable ease, reasonable speed and at reasonable cost to individuals and individual institutions" as recommended by Anderson.

In order to accomplish this task, many of the projects within eLib Phase 3, especially the Clumps projects and the Hybrid Library projects, require metadata about the resources they are handling.

In general, knowledge of library collections, or any other types of collections, is not easy to obtain. The RIDING Gateway [2] intended to address this lack of information by including a searchable database of collection descriptions that would provide information on what was available. Agora’s entire approach to the access to, and organization of, the vast array of resources envisaged by a hybrid library is based upon the use of rich metadata as exemplified by collection descriptions.

eLib Working Group on CLDs

In September 1998, a group of interested people met in London to discuss the development of a scheme for describing collections. The group was convened under the auspices of UKOLN [3], and the first meeting attracted representatives from eLib projects, the JISC data centres, software vendors and the British Library. This meeting gave rise to a small national working group, tasked with putting together a set of collection description elements. The emphasis was on producing a simple, practical framework for describing collections in general. The time available for discussion was limited, as the projects, RIDING in particular, needed to have a basic framework ready for trial within a few months. Papers and discussions of this working party are available on the UKOLN web site at <http://www.ukoln.ac.uk/metadata/cld/>. These include an eLib supporting study: "Collection Level Description - a Review of Existing Practice".

Subsequently, the RIDING Clump Project set up a RIDING Collection Description Working Group to discuss the particular requirements for RIDING collections; this group developed a prototype scheme applicable to RIDING and created a number of collection descriptions for the Gateway. The Music Libraries Online and Agora projects, which use the same gateway software as RIDING, also required collection descriptions for their user interfaces, and from Summer 1999 met regularly with RIDING to develop the scheme further for general use. The scheme was refined and extended in the light of experience from all three projects; this eLib Collection Description group kept in close contact with the national working group to ensure that progress was made along common lines. Later in 1999, the other clumps projects, CAIRNS and M25, also contributed to the development of the scheme.

The purpose of the RIDING collection description scheme was to describe, in a standard manner, any type of collection -- physical or virtual (electronic), networked or otherwise. The collections could be of anything -- books and other library materials, art works, sculpture, living material, digital or physical items, so the scheme needed to be appropriately generic in order to encompass such diverse collections. The metadata elements of the scheme should:

  • allow users to discover, locate and access collections of interest;
  • allow users to perform searches across multiple collections in a controlled way;
  • allow software to perform such tasks on behalf of users based on known user preferences.

The first task was to agree on the definition of a 'collection', and the simplest working definition was:

a grouping of individual items or other collections

Obviously the distinctions here are often blurred: a collection may be made up of items, of other collections, or of both. An item may itself be made up of other items; for example, a catalogue is actually a collection of catalogue records, and a web page comprises text, images, etc., although people will think of a web page as a single item. It was left to the person who probably knows the collection best (i.e., the person in charge of the material) to decide whether it should be described as a collection.
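Because the working definition is recursive (a collection may contain items, other collections, or both), it maps naturally onto a recursive data structure. The following is a minimal Python sketch under that assumption; the class names `Item` and `Collection` and the example titles are hypothetical and are not part of any scheme described here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Item:
    """A single item, e.g. one catalogue record or one book."""
    title: str


@dataclass
class Collection:
    """A grouping of individual items or other collections."""
    title: str
    members: List[Union[Item, Collection]] = field(default_factory=list)

    def count_items(self) -> int:
        """Count the items at every level of nesting."""
        total = 0
        for member in self.members:
            if isinstance(member, Collection):
                total += member.count_items()  # recurse into a sub-collection
            else:
                total += 1
        return total


# Hypothetical example: a library containing one special collection and one loose item.
special = Collection("Fairground History", [Item("Poster A"), Item("Poster B")])
library = Collection("Example Library", [special, Item("Pamphlet")])
print(library.count_items())  # 3
```

The structure does not decide what counts as a collection; as the text notes, that judgement stays with the person in charge of the material.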

As the purpose underlying the development of the scheme was to aid discovery and location of resources, it was recognised that important additional information about access to a collection should be included. Access may be via one or more services, particularly in the case of electronic collections; for instance, the Medline database is available both on CD-ROM and on the web. Different terms and conditions of use may apply to the collection, depending on the service used to access it.
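One way to model this is to keep a single description per collection and attach one or more access services to it, each with its own terms. The sketch below assumes that shape. The Medline CD-ROM and web services come from the text, but the class names and the policy wording are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Service:
    """One route of access to a collection, with its own terms of use."""
    medium: str
    access_policy: str
    charging_policy: str


@dataclass
class CollectionDescription:
    title: str
    services: List[Service] = field(default_factory=list)

    def policies_by_medium(self) -> Dict[str, Tuple[str, str]]:
        """Map each access medium to its (access policy, charging policy)."""
        return {s.medium: (s.access_policy, s.charging_policy) for s in self.services}


# The two media come from the text; the policy wording is invented for illustration.
medline = CollectionDescription("Medline", [
    Service("CD-ROM", "Library terminals only", "No charge to members"),
    Service("Web", "Registered users only", "Institutional subscription"),
])
print(medline.policies_by_medium()["Web"])  # ('Registered users only', 'Institutional subscription')
```

Keeping access terms on the service rather than on the collection lets the same collection carry different conditions per route, which is the situation the paragraph describes.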

Various existing schemes for describing collections were studied at the beginning of the work, including Dublin Core [4], GILS (Global Information Locator Service) [5], and ISAD (G) (General International Standard Archival Description) [6]. Other eLib Phase 3 projects also worked on producing their own collection descriptions during this time, and the MALIBU Project's scheme was consulted in detail. Discussions with archivists and museum personnel also took place.

Simple Collection Description

The original Simple Collection Description Scheme produced by the national working group had 23 elements, 12 of which were taken from the Dublin Core metadata format; they were grouped into elements describing a collection, and those describing a service used to access a collection. A number of collections were catalogued using this first version of the scheme, and issues arising from this exercise were discussed via e-mail. The scheme was refined and a report was produced in October 1998 [7]. Discussions are ongoing and the scheme is constantly under revision; to date there are still several unresolved issues, for example the use of controlled lists for the Subject/Keywords element.

Implementation of Collection Descriptions by RIDING

Following on from the discussions by the national working group on Collection Description, RIDING personnel took the prototype collection descriptions that had been created and reduced the scheme to a subset of fields, choosing those that most frequently contained data in the collections described for the national working party report. This produced a set of 15 descriptive elements:

  1. Title
  2. Subject/Keywords
  3. Content Description
  4. Collection Administrator
  5. Collection Owner
  6. Publisher
  7. Language
  8. Coverage
  9. Creation Date
  10. Collection Type
  11. Relation
  12. Location and Access
  13. Access Times and Availability
  14. Access Policy
  15. Charging Policy
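The 15 elements above can be represented as a simple record type. The following Python sketch does this. The snake_case field names are hypothetical (the software labels of the real scheme are not given here), and treating only Title as required is an assumption made for illustration. The `filled_elements` helper mirrors the selection criterion described above: which fields actually contain data.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RidingCLD:
    """The 15 RIDING elements, in the order listed above.

    Field names are illustrative; only Title is made required here (an assumption).
    """
    title: str
    subject_keywords: Optional[str] = None
    content_description: Optional[str] = None
    collection_administrator: Optional[str] = None
    collection_owner: Optional[str] = None
    publisher: Optional[str] = None
    language: Optional[str] = None
    coverage: Optional[str] = None
    creation_date: Optional[str] = None
    collection_type: Optional[str] = None
    relation: Optional[str] = None
    location_and_access: Optional[str] = None
    access_times_and_availability: Optional[str] = None
    access_policy: Optional[str] = None
    charging_policy: Optional[str] = None

    def filled_elements(self) -> List[str]:
        """Names of the elements that actually contain data."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


opac = RidingCLD(title="Example Library OPAC", language="en", collection_type="catalogue")
print(len(fields(RidingCLD)))   # 15
print(opac.filled_elements())   # ['title', 'language', 'collection_type']
```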

The use of standard controlled lists of terms is encouraged, for example in the Language field (7), where the ISO 639 language codes are used. Controlled lists are also used for the Collection Type (10) and Relation (11) fields, but in these cases the working group has both created its own lists and added to an existing authority list based on the Dublin Core relation metadata elements (see Appendix B).
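Controlled lists make records checkable by software. The following is a minimal validation sketch under stated assumptions. It uses a small excerpt of the two-letter ISO 639-1 codes, although the text does not say which part of ISO 639 was adopted, and a hypothetical Collection Type list standing in for the real one in Appendix B.

```python
from typing import Dict, List

# A small excerpt of two-letter ISO 639-1 codes; a real system would load the full list.
ISO_639_1_SAMPLE = {"cy", "de", "en", "fr", "la"}
# Hypothetical Collection Type values; the project's actual list is given in Appendix B.
COLLECTION_TYPES = {"catalogue", "database", "gateway", "library"}


def validate(record: Dict[str, str]) -> List[str]:
    """Return the problems found in the controlled-list fields of one record."""
    problems = []
    language = record.get("language")
    if language is not None and language not in ISO_639_1_SAMPLE:
        problems.append(f"language: {language!r} is not a recognised ISO 639 code")
    collection_type = record.get("collection_type")
    if collection_type is not None and collection_type not in COLLECTION_TYPES:
        problems.append(f"collection_type: {collection_type!r} is not in the controlled list")
    return problems


print(validate({"language": "en", "collection_type": "catalogue"}))  # []
print(validate({"language": "English", "collection_type": "catalogue"}))
```

A free-text entry such as "English" is rejected, which is the kind of interpretive variation that controlled lists are meant to prevent.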

The RIDING Collection Description Working Group, after discussions and amendments of the proposed description elements, asked all ten RIDING project libraries to use the scheme to produce a minimum of five collection descriptions each, including one of their library as a whole and one of their OPAC (which is a collection of catalogue records). The results of this exercise were approximately 60 descriptions of collections held by the RIDING libraries; these were originally made available as browsable web pages at the RIDING Gateway and later as part of a searchable database.

There were obvious difficulties of interpretation in the way that people described their collections, and it also proved difficult to bear in mind (at least for librarians!) that the scheme is intended to be applicable to any type of collection, not just printed material. Further discussions looked at the use of fixed fields and authority lists versus free text, and added, merged and deleted elements.

The scheme implemented at the RIDING Gateway shows some differences from the eLib scheme, because the RIDING version was implemented while the eLib scheme was still under development.

The eLib Scheme

The discussions by the eLib CLD group so far have resulted in a Collection Description Scheme comprising 29 descriptive elements, only a few of which are mandatory. Where possible, authority lists are used rather than free text, and URLs are included if required. RIDING had specific needs for some fields to be mandatory, although other projects do not require this. The elements are in two groups: those describing the collection itself and those describing access to the collection. The elements of the scheme are listed in Appendix A. Each element has a number, a name (meaningful to humans), a label (for software), a semantic description, and specifications for field type and length, status, and searchability.
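Because each element carries its own machine-readable specification, the scheme itself can drive record checking. The following Python sketch assumes one possible representation of an element definition, built from the attributes listed above. The three entries, their numbers, lengths and statuses are invented and are not copied from Appendix A.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ElementDefinition:
    number: int
    name: str         # meaningful to humans
    label: str        # used by software
    description: str  # semantic description
    field_type: str   # e.g. "free text" or "authority list"
    max_length: int
    mandatory: bool   # the element's status
    searchable: bool


# Hypothetical entries: numbers, lengths and statuses are invented, not taken from Appendix A.
SCHEME: List[ElementDefinition] = [
    ElementDefinition(1, "Title", "title", "Name of the collection", "free text", 200, True, True),
    ElementDefinition(2, "Language", "language", "Language of the material", "authority list", 3, False, True),
    ElementDefinition(3, "Charging Policy", "charging", "Terms of payment", "free text", 500, False, False),
]


def check(record: Dict[str, str], scheme: List[ElementDefinition] = SCHEME) -> List[str]:
    """Report missing mandatory elements and over-long values."""
    problems = []
    for element in scheme:
        value = record.get(element.label, "")
        if element.mandatory and not value:
            problems.append(f"{element.name}: mandatory element missing")
        if len(value) > element.max_length:
            problems.append(f"{element.name}: longer than {element.max_length} characters")
    return problems


searchable_labels = [e.label for e in SCHEME if e.searchable]
print(searchable_labels)                 # ['title', 'language']
print(check({"language": "english"}))
```

Here the `mandatory` flag is the per-element status. This makes it straightforward for a project such as RIDING to mark more elements as required than other projects do, without changing the checking code.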

As far as possible elements from Dublin Core have been used, though one -- Coverage -- has been split into two elements, Spatial Coverage and Temporal Coverage.

Implementation of Collection Descriptions by Agora

The RIDING Project has been an important catalyst in the discussion and implementation of CLDs. As RIDING draws to a close in terms of ongoing development, Agora is now taking up the challenge and playing a key role in the creation of CLDs in relation to its landscape function.

Agora [8] is a consortium-based project led by the University of East Anglia; partners are UKOLN, Fretwell-Downing Informatics and CERLIM (the Centre for Research in Library and Information Management at Manchester Metropolitan University). The project also works with several associate groups: libraries, service providers and systems developers.

The project is developing a hybrid library management system (HLMS) to provide integrated access to distributed information services. In parallel with this it is also developing library skills and experience in the management of hybrid resources.

The Agora Development Framework

Agora is based on concepts that emerged from the MODELS Project, MOving to Distributed Environments for Library Services [9]. MODELS has been developing frameworks for managing distributed resources to enable truly integrated access. The central part of the Agora framework is a layer of 'broker' services or 'middleware' which shields the user from the complex and repetitive processes involved in interacting with individual services. The web provides the primary end-user access point to the Agora Gateway. Agora is based on Fretwell-Downing's VDX software, which is also the basis of the RIDING Project.

The concept of information landscapes is integral to the Agora organization and presentation of resources. The term "landscape" is used to describe a way of presenting different views of information resources to users according to their interests and needs. Agora is exploring the construction of information landscapes as part of its user-centred focus. In order to provide information landscaping, it is necessary to match information about users against information about resources -- Collection Level Descriptions. This work is leading to the integration of collection level descriptions as a metadata standard for describing resources and to facilitate discovery and organization of resources.
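Matching information about users against CLDs can be illustrated with a simple subject-overlap ranking. The sketch below is only that. It assumes that a user profile is a set of subject terms and that each CLD lists its subjects. It does not describe how Agora or VDX actually build landscapes, and all collection titles except Medline are hypothetical.

```python
from typing import Dict, List, Set


def build_landscape(user_subjects: Set[str], clds: List[Dict]) -> List[str]:
    """Titles of collections sharing subjects with the user, most overlap first."""
    scored = []
    for cld in clds:
        overlap = len(user_subjects & set(cld["subjects"]))
        if overlap:
            scored.append((overlap, cld["title"]))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


clds = [
    {"title": "Medline", "subjects": ["medicine", "biology"]},
    {"title": "Music Library OPAC", "subjects": ["music"]},
    {"title": "Science Gateway", "subjects": ["biology", "chemistry", "medicine"]},
]
print(build_landscape({"medicine", "chemistry"}, clds))  # ['Science Gateway', 'Medline']
```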

The Collection Level Descriptions in the Agora project have been created using the same schema as RIDING. All the participating Library Associates were sent scheme templates, guidelines for completion, and supporting documentation, which they used to complete CLDs describing their own collections and library catalogues. This was largely successful, although the descriptions would have benefited from more detail. The resulting CLDs have been loaded into Agora Release 1. Service Providers were also asked to complete CLDs relevant to their collections. The response from the suppliers was limited, but those that did respond provided full data. In the absence of supplier-sourced data, Agora Project staff input limited data, the intent being to provide only enough data to allow the system functionality to be tested.

There are currently 58 CLDs in the first release of the Agora HLMS; these include catalogues (library and internet), gateways, commercial databases and other mixed media. As yet, no sources that are not accessible electronically have been included in Release 1.

The CLDs are critical to the Agora concept of the HLMS. Technically, the CLDs are held within VDX and are inextricably linked to the target/database information in VDX. No resource is available for searching without a CLD and a link to a target/database.
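The rule in the last sentence can be expressed as a simple filter. The sketch below assumes that each CLD stores an optional target identifier and that the configured targets form a set. The identifiers are hypothetical, and VDX's internal data model is not described in this article.

```python
from typing import Dict, List, Optional, Set

# Hypothetical identifiers for illustration only.
clds: Dict[str, Dict[str, Optional[str]]] = {
    "uea-opac": {"title": "UEA Library Catalogue", "target": "z3950:uea"},
    "medline": {"title": "Medline", "target": None},  # a CLD with no target link
}
targets: Set[str] = {"z3950:uea", "z3950:other"}  # "z3950:other" has no CLD


def searchable(clds: Dict[str, Dict[str, Optional[str]]], targets: Set[str]) -> List[str]:
    """A resource is searchable only if it has a CLD linked to a configured target."""
    return sorted(rid for rid, cld in clds.items() if cld.get("target") in targets)


print(searchable(clds, targets))  # ['uea-opac']
```

Both failure cases are excluded: a CLD with no target link, and a target with no CLD.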

The CLDs can be used to provide a guide to the aggregation of resources into "landscapes" and as a guide to the resources themselves. However, there are some limits to their use as an "aggregator-searcher", as a search can be run on only one attribute at a time, e.g., Subject or Title or Collection Type. A total of nine separate attributes may be used as a search term. Another limitation is the experimental nature of the schema itself; for example, no existing "collection type" has been found to cover OCLC databases, and therefore they cannot, as yet, be searched by that attribute.
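A single-attribute search of the kind described here might look like the sketch below. Only three of the nine searchable attributes are named in the text, so only those appear. Case-insensitive substring matching and the attribute labels are assumptions. The example records are hypothetical and show why a CLD with an empty Collection Type cannot be found by that attribute.

```python
from typing import Dict, List, Optional

# Three of the nine searchable attributes, as named in the text; labels are assumed.
SEARCH_ATTRIBUTES = {"subject", "title", "collection_type"}


def search_one(clds: List[Dict[str, Optional[str]]], attribute: str, term: str) -> List[str]:
    """Titles of CLDs whose value for one attribute contains the term."""
    if attribute not in SEARCH_ATTRIBUTES:
        raise ValueError(f"unsupported search attribute: {attribute}")
    term = term.lower()
    return [c["title"] for c in clds if term in (c.get(attribute) or "").lower()]


clds = [
    {"title": "OCLC database", "subject": "general", "collection_type": None},
    {"title": "Library OPAC", "subject": "general", "collection_type": "catalogue"},
]
print(search_one(clds, "collection_type", "catalogue"))  # ['Library OPAC']
```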

The CLDs can be created and modified by each Associate site. It is hoped that this local editing facility will lead to the generation of fuller, richer CLDs as the project progresses. It also allows for the inclusion of additional targets and collections in response to user requirements.

User Case Studies

Release 1 of the Agora HLMS is currently providing a platform for User Case Studies. These are being carried out at five Library Associate sites. The studies are diverse in nature, examining a range of issues pertaining to the implementation and use of the HLMS. The studies address three areas: functionality of the system, how different user groups use the system, and the training/management tools required. Under the area of functionality special emphasis will be placed on the location and organization of resources within the system. This will in turn provide invaluable feedback as to the usefulness of CLDs. The results of the case studies will be published at the end of the year.

In the meantime, there will be a second release of Agora incorporating a more advanced CLD search facility, allowing combination of several CLD attributes in one search. The improved functionality and more intuitive interface should provide the project with additional information on the use and relevance of CLDs and their contribution to the hybrid library.
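Combining several attributes in one search can be sketched as an AND over per-attribute matches. This is an assumption: the article does not say how Release 2 combines criteria, so the function name, the matching rule and the records below are hypothetical.

```python
from typing import Dict, List, Optional


def search_many(clds: List[Dict[str, Optional[str]]], criteria: Dict[str, str]) -> List[str]:
    """Titles of CLDs that match every (attribute, term) pair, i.e. an AND combination."""
    def matches(cld: Dict[str, Optional[str]], attribute: str, term: str) -> bool:
        return term.lower() in (cld.get(attribute) or "").lower()

    return [c["title"] for c in clds
            if all(matches(c, a, t) for a, t in criteria.items())]


clds = [
    {"title": "Library OPAC", "subject": "music", "collection_type": "catalogue"},
    {"title": "Music Gateway", "subject": "music", "collection_type": "gateway"},
    {"title": "Science OPAC", "subject": "science", "collection_type": "catalogue"},
]
print(search_many(clds, {"subject": "music", "collection_type": "catalogue"}))  # ['Library OPAC']
```

With a single criterion this reduces to a one-attribute search, so it generalises the Release 1 behaviour rather than replacing it.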

Issues and Future Developments

The issues surrounding the development of CLDs fall into two distinct, albeit interconnected, areas: first, the theoretical development of CLDs and their components; and second, the implementation of CLDs in "real" HLMSs.

In considering the "theoretical" issues, several come to the forefront: the utility of the schema across collection types, the definition of collection, and the use of controlled vocabulary.

The scheme is very useful for describing special collections but is not yet completely suitable for large, general collections, e.g., for describing the whole of a university library’s holdings. Also requiring further discussion are issues such as how to define what is a collection and what is a subject strength; how to describe a collection that is not physically together; how to describe the level of a collection.

The use of controlled lists for the Subject element of the scheme has raised several important issues, which have been discussed within eLib, nationally and internationally. It was firmly agreed that a controlled list or subject classification should be used, but it was not clear which one. Subject classifications such as Dewey and the Library of Congress Subject Headings were too complex at the lower levels and too simple at the highest levels. Large, general library collections covering broad subjects such as Social Sciences, as well as collections with narrow definitions such as Fairground History, need to be included, and a hierarchical scheme is the only way in which these needs can be met.

However, the problem remains of finding a general list of subject fields that covers all subjects, both academic and popular, in enough detail to be useful. A number of other listings and approaches were also tried, but none was felt to be particularly useful, not least because they quite often showed an obvious academic bias.

The problem was discussed with the other participants in the eLib Collection Description scheme and also with the other clump projects, who suggested different solutions. A consensus has emerged that the solution offering the most promise is the subject headings that the BUBL Link database [10] has created. This list was originally based on Dewey and Library of Congress classifications but has been adjusted over several years to UK requirements. The BUBL scheme has roughly 170 main subject headings, all mapped to Dewey classes, and in addition has around 1400 narrower terms.
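A two-level scheme of broad headings and narrower terms lets a search on a broad heading also find collections indexed under its narrower terms. The following Python sketch assumes that structure. The headings and terms are a hypothetical excerpt, not the actual BUBL list. The Dewey numbers are top-level classes, shown only to illustrate the mapping.

```python
from typing import Dict, Optional, Set

# Hypothetical excerpt of a two-level scheme: broad heading -> Dewey class and narrower terms.
SUBJECTS: Dict[str, Dict] = {
    "Social sciences": {"dewey": "300", "narrower": ["Economics", "Education", "Sociology"]},
    "Arts and recreation": {"dewey": "700", "narrower": ["Music", "Fairground history"]},
}


def expand(term: str) -> Set[str]:
    """A broad heading matches itself plus all of its narrower terms."""
    if term in SUBJECTS:
        return {term, *SUBJECTS[term]["narrower"]}
    return {term}


def broader(term: str) -> Optional[str]:
    """The broad heading a narrower term sits under, if any."""
    for heading, entry in SUBJECTS.items():
        if term in entry["narrower"]:
            return heading
    return None


collections = {"Fairground Archive": "Fairground history", "Music Library": "Music", "Law Library": "Law"}
hits = sorted(name for name, subject in collections.items() if subject in expand("Arts and recreation"))
print(hits)  # ['Fairground Archive', 'Music Library']
```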

Further development is being proposed through funding for a project called HILT (High Level Thesaurus) under the Research Support for Libraries Programme [11], and led by Dennis Nicholson of Strathclyde University.

In the meantime, Agora and RIDING decided to adopt a pragmatic solution -- using a simple scheme with a limited number of broad subjects. It is based on the M25 and Research Assessment Exercise list of subject terms.

In turning to the implementation of CLDs, it should be noted that any comments from Agora will be very preliminary as the results of Release 1 user case studies have not yet been seen; these will be published in Autumn 2000. For example, it is not clear as yet whether CLDs are used more as search tools to identify collections or as metadata to simply describe collections. However, there are some issues that have already arisen which are either new, or reflect the issues identified in the "theoretical" development of CLDs.

Agora has found that the lack of controlled vocabulary has been both less, and more, of a problem than first anticipated: less, in that most Library Associates within the Project chose to use the values that were already in the system; and more, in that there has been little or no development within the Project of any theoretical underpinning for a controlled vocabulary. As noted above, the initial values entered into the Agora system were intended to facilitate compliance testing and were not in any way an attempt at a comprehensive or cohesive set of controlled language terms. Further work with Release 2 of the Agora HLMS is intended to address this issue.

Another issue is the possibility that any schema, not simply the one used in Agora, will be interpreted in different ways. This becomes a problem where suppliers are asked to provide CLDs, as they may well interpret the components of the schema differently from the system administrator, or from each other. Suppliers need to absorb clear content guidelines, and the system host and supplier have to agree on who has responsibility for, and authority over, any changes to CLD data. Because of the low response rate from suppliers, the extent of this problem, and the consequent issue of editing CLD data, have yet to be fully explored.

Another, albeit lesser, issue has been the use of the eLib Working Group schema itself. The Project itself is aware that development of CLDs may well be moving in another direction theoretically but given the necessity of "freezing" a version of CLD schemas for development and evaluation, Agora (Release 1) and RIDING do not necessarily reflect the most recent thinking in this area. The incorporation of CLDs within any HLMS in an operational sense will require some stability in the state of the schema and values within the schema.

Conclusion

The experience of both RIDING and Agora has shown that CLDs are useful and desirable but that, as with any new metadata scheme, there are still many issues to resolve, both in the development of the CLDs themselves and in their implementation in "real-life" situations. Use of controlled language, gaining commitment to a common schema, and the definition of "collection" and its component parts all need to be addressed within the context of the development of CLDs. The Release 2 phase of the Agora project will look at issues of implementation and use of CLDs and, it is hoped, will spur further research in this area by other projects or institutions.

References

[1] Report of the Group on a National/Regional Strategy for Library Provision for Researchers, HEFCE, 1995; <http://www.ukoln.ac.uk/services/elib/papers/other/anderson/>

[2] The RIDING Gateway: <http://www.riding.ac.uk/>

[3] UKOLN - UK Office for Library & Information Networking is funded by Resource: The Council for Museums, Archives & Libraries (the organization succeeding the Library and Information Commission), the Joint Information Systems Committee (JISC) and the European Union. UKOLN also receives support from the University of Bath, where it is based. <http://www.ukoln.ac.uk/>

[4] Dublin Core website: <http://purl.org/dc>

[5] GILS (Global Information Locator Service): <http://www.gils.net/>

[6] ISAD (G) (General International Standard Archival Description): <http://www.ica.org/cds/isad(g)e.html>

[7] UKOLN Collection Description Working Group: Work in progress - <http://www.ukoln.ac.uk/metadata/cld/wg-report/>

[8] The Agora Project website: <http://hosted.ukoln.ac.uk/agora>

[9] MODELS website: <http://www.ukoln.ac.uk/dlis/models/>

[10] BUBL Link database: <http://bubl.ac.uk/link/menus.html>

[11] Research Support for Libraries Programme: <http://www.rslp.ac.uk/>

Appendices

Appendix A: eLib Collection Description Scheme July 1999

Appendix B: Controlled Lists for Use with the eLib Collection Description

Copyright © 2000 Dr. E.V. Brack, David Palmer, and Bridget Robinson

DOI: 10.1045/september2000-brack