D-Lib Magazine
September 2004

Volume 10 Number 9

ISSN 1082-9873

Reengineering a National Resource Discovery Service

MODS Down Under

 

Roxanne Missingham
National Library of Australia
<rmissingham@nla.gov.au>


Abstract

Australian libraries have shared resources and records for over two decades through Kinetica, a service provided by the National Library of Australia. While this service has broadly met the needs of its users, comprising over 1000 Australian libraries, the Library is reengineering the service, using MODS (Metadata Object Description Schema) to improve coverage of online publications and records from specialist collections. This article describes the use of MODS to transform records for digital resources into MARC records for resource discovery.

Introduction

For the past twenty-three years, the National Library of Australia has provided an online service to support resource sharing and collaboration between Australian libraries. The development of a national online resource discovery service in 1980 fulfilled a major National Library goal of developing greater cooperation between libraries, and with the advent of the Internet these shared resources have become even more widely available.

The online service built upon a history of national initiatives providing discovery services to collections held around the nation; the first were published in print and microform, such as the National Union Catalogue of Monographs and Serials in Australian libraries: Social Sciences and Humanities. Next, the online Australian union catalogue was established in 1981, using the WLN software. In 1999, a web-based service, Kinetica, was established.

Kinetica enables Australian libraries, wherever they are, to contribute to the national online catalogue and benefit through shared collection building, cataloguing and interlending. Kinetica has also enabled libraries to link to other, external networks such as those of RLG, OCLC and the National Libraries of New Zealand and Singapore. In recent years the role of the service has expanded to support resource discovery for library users across the nation. Staff and students in approximately 70% of Australia's universities, as well as a number of public libraries, can now access the national union catalogue directly or through a portal.

The Kinetica service is an essential part of the Library's strategic aim to:

"provide rapid and easy access to the wealth of information that reside in libraries and other cultural institutions—and to break down the barriers that work against this" (1).

Australia is a large country, with a network of public, state, university, research and special libraries spread across 7.7 million square kilometres. It is roughly the size of the United States, excluding Alaska, and has a population of 20 million. Libraries in Australia have a strong history of cooperation, built on the recognition that the national collection would inevitably be distributed over libraries possessing a range of strengths.

At the heart of the Kinetica service is the Australian National Bibliographic Database (ANBD), which records more than 37.5 million holdings against more than 14 million bibliographic records.

The service will be undergoing considerable redevelopment through modularised implementation over the next two years, using contemporary standards and technologies. From 2003 to date, there has been a significant reengineering of the record ingestion process, the "Harvester project", which has involved the collection of records through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and other means. The Metadata Object Description Schema (MODS) has been used to transform Dublin Core (DC) records, enabling more effective contribution of records, particularly contribution of metadata for digital content.

Through these developments, the Library will enable users to find resources in Australian libraries, both physical and online, as easily as possible, and to access the full range of those information resources.

Collecting records of digital resources

A major twenty-first century trend has been the continued shift to online publication of information. In Australia this has been particularly profound, exemplified in the production of around 60% of all Commonwealth (federal) government publications solely in online form. For the National Library this has led to the development of a two-pronged strategy to ensure access to government publications:

  1. to digitally archive publications, and
  2. to ensure that descriptive information is available for resource discovery through Kinetica and other services.

While approximately 400,000 electronic resources were included in ANBD, the coverage of online Commonwealth government publications was recognised in 2002 as requiring significant improvement. In addition to government publications, the scope of the national union catalogue has evolved to include records from organisations that record resource descriptions in non-MARC formats, particularly in special areas such as music. For these organisations the use of a shorter format than MARC has made it possible to record both print and non-print materials efficiently. Information is also being sought from information providers who generally describe online resources in a DC-type format, for example, Australian Music Online (2). Turning the national library resource discovery service into a true "national resource discovery service" required a new approach. Ensuring that these electronic resources are discoverable is a considerable challenge.

Online access to government information has been accepted as fundamental for better government in Australia (3), and metadata for Commonwealth government resources was mandated to follow the AGLS (Australian Government Locator Service) standard, which is based to a large degree on the Dublin Core (DC) standard. AGLS adds four elements to DC and also uses qualifiers. While implementation is mandatory, government agencies have applied the AGLS standard in varying ways. Some have applied it to collection level records (for example, to a publications page), while others have used a very small number of fields at a more detailed level. Use of harvest control lists to provide the data is also mandated; however, of the approximately 700 Commonwealth government websites, only 100 have harvest control lists that can be harvested successfully (4).

The Library has also actively pursued the issue of preserving long-term access to online publications, including government publications. Selected online government publications are included in the Library's Australian Internet Archive, Pandora (5).

For those seeking government publications, it is difficult—if not impossible—to know which titles are available in print and which online. Information seekers usually do not know, or care, whether a desired publication is in print or digital format. The user's core issue is "finding" the publication and "getting" a copy. For truly easy access to these resources, all formats must be represented in an integrated resource discovery service.

There has been a significant increase in government publishing online over the past decade. Although the listing of online government publications on the ANBD has been encouraged only in the last four years, there has been notable expansion of coverage of these publications. A web input form was developed to encourage the contribution of these publication records. Libraries have been encouraged to record digital publications, and the National Library and government libraries have responded positively. There has also been a very significant contribution of records from the National Library's Pandora digital archive. These strategies, though effective, have not been sufficient to ensure an adequate level of recording of Commonwealth government publications.

The following graph indicates that the shift to online publishing is increasing markedly each year, and shows that coverage of online publications has, as described above, been a recent initiative.

[Figure: chart showing the shift to online publishing, coverage of government publications in the ANBD and the increase of government online publications from 0 in 2000 to 2500 in 2003]

Graph 1: ANBD coverage of Commonwealth government publications

To ensure that the ANBD would be a reliable and comprehensive resource for accessing both printed and electronic Commonwealth government publications, a pilot project was launched in January 2003 to improve access to these government documents in both formats. The project was based on moving from the contribution, through libraries, of MARC cataloguing records to a new approach supporting addition of AGLS metadata records directly from government web sites.

The pilot project had the following objectives:

  • improve discovery and access to online Commonwealth government publications through Kinetica services;
  • facilitate identification, harvesting and preservation of electronic Commonwealth publications through services such as the Library's Pandora archive;
  • maintain the Australian National Bibliographic Database as a primary and comprehensive resource for access to Commonwealth government publications;
  • enhance exposure of online Commonwealth publications through value-add services including Recent Australian Publications; and
  • promulgate metadata guidelines that will assist in the creation of consistent and quality metadata.

The initial phase of the pilot project involved data collection from participating agencies and also explored workflow options for agencies in the creation and exchange of metadata for their print and electronic government publications. Phase One participants were the Australian Bureau of Statistics, Department of Agriculture Forestry and Fisheries, Department of Environment and Heritage, Department of Health and Ageing, the National Occupational Health and Safety Commission, Parliamentary Library, and the Treasury.

Some of the issues explored in the pilot were:

  • definition of "published" resource (6);
  • scope of resources identified for metadata creation;
  • level of description of the resource (e.g., collection level, single item, analytics);
  • metadata content standards (Commonwealth government agencies are required to produce metadata in AGLS);
  • quality of metadata description; and
  • metadata harvesting issues (e.g., use of different systems platforms in each agency).

The results of the pilot can be found in a background paper (6). In summary, the coverage of the ANBD was expanded with records for online resources after a number of significant issues were addressed, such as the definition of "publication", which was based on existing government guidelines:

"A document intended for distribution to the general public, even if only a few copies" and "A document for online publication, which will only be available on the Internet, or documents as exposure drafts available online for public comment".

The definition was expanded through use of examples. Quality issues were found to be very significant, together with the level of application of metadata descriptions. A substantial amount of work was required for agencies to be able to provide files of title level records for inclusion in the national resource discovery service. A significant amount of work was also required by the National Library to transform the records for quality information retrieval. A more detailed report on the use of MODS for the records transformation follows.

Using MODS

One of the key pilot project challenges has been the conversion of the Dublin Core records into MARC for storage in the ANBD. After considerable investigation, the decision was made to use MODS (Metadata Object Description Schema) as the conversion standard for records. This decision rested on two benefits. First, transformation into MODS made it possible to enhance the records by using a profile for each contributing agency to add data such as publisher, author and material type. Second, it allowed the Library to build upon existing tools, so that its development work could concentrate on the harvesting of records and the conversion from DC to MODS.

The Library of Congress developed MODS as a scheme that stands part way between MARC and Dublin Core (7) in terms of complexity of records. It contains a subset of MARC fields and can be used to store descriptions created in a less resource-intensive manner than using full MARC. MODS is expressed in XML and has enabled the development of a range of new resource discovery services.

Guenther has proposed that MODS:

"...is particularly applicable to digital library objects that require rich descriptions compatible with existing ones in library catalogs, but not as complex as full MARC and thus easier and quicker to create...it provides the flexibility to be combined with other XML based standards such as METS to satisfy the needs for the digital library environment" (8).

Tennant has commented:

"The Metadata Object Description Schema (MODS) is being developed to carry selected data from existing MARC 21 records to enable the creation of original resource description records" (9).

For the ANBD, MODS was utilised as part of a conversion of records from AGLS (DC) to MARC, while enabling enhancement of the data to support more successful use in a resource discovery context. While this is an unusual application of MODS, it enabled the Library to innovate in the ingest of records from a variety of sources without requiring detailed human intervention for each record, and it also supported significant improvements in data quality and richness.

MODS was chosen as an intermediary format, rather than converting records directly into MARC, for two major reasons. The first reason is that the conversion from DC (AGLS) to MODS could be written quite easily for each contributor. The relationships between fields in these schemas are relatively simple. Complex coded data required for MARC records, such as leader and 008 fields, did not need to be programmed by Library staff if the initial conversion was from DC to MODS. Secondly, there was an existing stylesheet for conversion from MODS to MARC created by the Library of Congress. While suggestions for enhancement were made to the Library of Congress early in the project, the stylesheet proved to be a very robust mechanism for the conversion into MARC.

For this project Kinetica collected data from participating agencies in its original AGLS format, as XML, converted it to the MODS standard and then to MARC, as shown below:

[Figure: flowchart showing the process from AGLS metadata ingestion to converted records ready for distribution]

Figure 1: Record conversion process

"Harvesting" the records

The Library developed a record collection system that manages the record ingestion process through a profile compiled for each participant agency. Files of records are "harvested" from contributors using OAI-PMH or FTP. The participant agency profile records details of the contributing library, the method of file acquisition and collection events:

  • Contributor details: library symbol (NUC); Library name, short name; business contact details; technical contact details; notes.
  • Collection details:
    • sourceHost, sourcePort;
    • ftp details;
    • oai details;
    • authentication Userid; authentication Password;
    • source data schema;
    • collection schedule—manual or automatic (details of schedule for automatic);
    • nla notification address; and
    • notes.
  • Collection history: collection event information including status, record counts, start and finish times, timestamps and notes.

Each collection of records is then driven by the profile.
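As an illustration only, the profile described above could be held in a structure like the following. This is a minimal sketch, not the Library's implementation: the class names, field names, status values and sample values are all hypothetical, chosen to mirror the bullet list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class CollectionEvent:
    """One entry in the collection history (hypothetical shape)."""
    status: str               # e.g. "completed" or "failed" (assumed values)
    record_count: int
    started: datetime
    finished: datetime
    notes: str = ""


@dataclass
class ContributorProfile:
    """Hypothetical in-memory form of a participant agency profile."""
    # Contributor details
    nuc_symbol: str
    library_name: str
    short_name: str
    business_contact: str = ""
    technical_contact: str = ""
    # Collection details
    source_host: str = ""
    source_port: Optional[int] = None
    method: str = "oai"               # "oai" or "ftp"
    userid: str = ""
    password: str = ""
    source_schema: str = "agls"
    schedule: str = "manual"          # "manual" or "automatic"
    notification_address: str = ""
    notes: str = ""
    # Collection history
    history: List[CollectionEvent] = field(default_factory=list)

    def record_event(self, event: CollectionEvent) -> None:
        """Append a collection event to the history."""
        self.history.append(event)

    def last_successful(self) -> Optional[CollectionEvent]:
        """Return the most recently finished completed event, if any."""
        done = [e for e in self.history if e.status == "completed"]
        return max(done, key=lambda e: e.finished) if done else None


# Placeholder values, not a real contributor.
profile = ContributorProfile(
    nuc_symbol="XEX",
    library_name="Example Agency Library",
    short_name="EAL",
    source_host="metadata.example.gov.au",
    method="ftp",
)
profile.record_event(CollectionEvent(
    "failed", 0, datetime(2003, 12, 1, 9, 0), datetime(2003, 12, 1, 9, 1)))
profile.record_event(CollectionEvent(
    "completed", 120, datetime(2003, 12, 2, 9, 0), datetime(2003, 12, 2, 9, 5)))
```

Keeping the history on the profile itself means a scheduler could decide whether a new harvest is due by inspecting `last_successful()`.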

When files of records are acquired from contributors (in XML), they are converted to MODS. This conversion is written for each contributor, based upon test record sets. Only those fields used by the contributor are listed in the conversion program. Often quite extensive record testing is required, as records for different formats of material (such as serials and monographs) may have different issues. Separate profiles were required for each record supplier because of the wide variety of records. For example, data from the National Occupational Health and Safety Commission (NOHSC) contained more than 40 elements, while data from Environment Australia (EA) contained only 2 elements. Many of the NOHSC data elements could be ignored as not relevant to the ANBD, but the EA data required additional data to be inserted as defaults. The closer a participant's records are to full AGLS descriptions, the closer its AGLS to MODS conversion is to a standard mapping. Each conversion is written by mapping the fields contained in that contributor's records to MODS.

The MODS to MARC conversion uses a Library of Congress stylesheet (10).

An example of a record which was converted using MODS follows:

Record pre-conversion

<item>
      <title>Ashmore Reef National Nature Reserve and Cartier Island Marine Reserve Management Plans - 2002</title>
      <link>http://www.ea.gov.au/coasts/mpa/cartier/plan/index.html</link>
</item>

The first conversion results in the inclusion of new data generated by the profile—for example, corporate author (Environment Australia), resource type description, place of publication and publisher. The data is included to enhance information retrieval.

Record after conversion

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/"
xmlns:xlink="http://www.w3.org/TR/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/mods/
http://www.loc.gov/standards/mods/mods.xsd">
      <mods>
            <titleInfo>
                  <title>Ashmore Reef National Nature Reserve and Cartier Island Marine Reserve Management Plans
                  </title>
            </titleInfo>
            <name type="corporate">
                  <namePart>Environment Australia</namePart>
                  <role>
                        <text>creator</text>
                  </role>
            </name>
            <typeOfResource>software, multimedia</typeOfResource>
            <originInfo>
                  <place>
                        <code authority="marc">aca</code>
                        <text>[Canberra]</text>
                  </place>
                  <publisher>Environment Australia</publisher>
                  <issuance>monographic</issuance>
            </originInfo>
            <language authority="iso639-2b">eng</language>
            <physicalDescription>
                  <form authority="marcform">electronic</form>
            </physicalDescription>
            <identifier type="uri">http://www.ea.gov.au/coasts/mpa/cartier/plan/index.html
            </identifier>
            <recordInfo>
                  <recordContentSource>ASET</recordContentSource>
            </recordInfo>
      </mods>
</modsCollection>
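The step from the pre-conversion record to the MODS record can be sketched in a few lines. This is a hedged illustration, not the Library's conversion program: the function and dictionary names are hypothetical, the default values are simply copied from the converted record shown above, and the namespace is the one that record declares.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/"  # namespace declared in the record above

# Hypothetical per-contributor defaults; values copied from the record above.
EA_DEFAULTS = {
    "corporate_name": "Environment Australia",
    "type_of_resource": "software, multimedia",
    "place_code": "aca",
    "place_text": "[Canberra]",
    "publisher": "Environment Australia",
    "issuance": "monographic",
    "language": "eng",
    "form": "electronic",
    "content_source": "ASET",
}

ITEM_XML = """<item>
  <title>Ashmore Reef National Nature Reserve and Cartier Island Marine Reserve Management Plans - 2002</title>
  <link>http://www.ea.gov.au/coasts/mpa/cartier/plan/index.html</link>
</item>"""


def _sub(parent, tag, text=None, **attrs):
    """Add a namespaced MODS child element with optional text and attributes."""
    el = ET.SubElement(parent, "{%s}%s" % (MODS_NS, tag), attrs)
    if text is not None:
        el.text = text
    return el


def item_to_mods(item_xml, defaults):
    """Map a two-element <item> record to a MODS element, filling defaults."""
    item = ET.fromstring(item_xml)
    title = (item.findtext("title") or "").strip()
    link = (item.findtext("link") or "").strip()

    mods = ET.Element("{%s}mods" % MODS_NS)
    # Fields taken from the supplied record
    _sub(_sub(mods, "titleInfo"), "title", title)
    # Fields supplied by the contributor's defaults
    name = _sub(mods, "name", type="corporate")
    _sub(name, "namePart", defaults["corporate_name"])
    _sub(_sub(name, "role"), "text", "creator")
    _sub(mods, "typeOfResource", defaults["type_of_resource"])
    origin = _sub(mods, "originInfo")
    place = _sub(origin, "place")
    _sub(place, "code", defaults["place_code"], authority="marc")
    _sub(place, "text", defaults["place_text"])
    _sub(origin, "publisher", defaults["publisher"])
    _sub(origin, "issuance", defaults["issuance"])
    _sub(mods, "language", defaults["language"], authority="iso639-2b")
    _sub(_sub(mods, "physicalDescription"), "form",
         defaults["form"], authority="marcform")
    _sub(mods, "identifier", link, type="uri")
    _sub(_sub(mods, "recordInfo"), "recordContentSource",
         defaults["content_source"])
    return mods


mods = item_to_mods(ITEM_XML, EA_DEFAULTS)
```

The sketch copies the supplied title unchanged. The published MODS record also drops the trailing " - 2002" from the title, which this example does not attempt to reproduce.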

In converting the data to MARC, coded fields are included, again to enhance information retrieval.

MARC record

Leader     01037nam     2200217uu     4500
001 00000
008 031212s2003    aca|||||s||||f||||||eng||
245 10   $a Ashmore Reef National Nature Reserve and Cartier Island Marine
Reserve Management Plans $h[electronic resource]
110 2 $a Environment Australia
260   $a[Canberra] $bEnvironment Australia $c2003
856   $u http://www.ea.gov.au/coasts/mpa/cartier/plan/index.html
035   $a 00000
040   $aASET
 

The final record clearly has a richer set of descriptive and coded data. For searchers this can be used for greater precision in searching, which in a database with over 14 million bibliographic records is an important factor.
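To make the "coded data" concrete, the sketch below assembles a 40-character 008 field of the same shape as the one in the MARC record, using the MARC 21 positions for book-type material. It is an illustration under stated assumptions: the function name and parameters are hypothetical, the publication year is taken from the 260 field of the example (2003), and in the project itself this coding was produced by the Library of Congress stylesheet rather than by code like this.

```python
def build_008(date_entered, pub_year, place, form_of_item="s",
              gov_pub="f", language="eng"):
    """Assemble a 40-character MARC 21 008 field for a book-type record.

    Positions not set here are filled with "|" (no attempt to code).
    """
    fill = "|"
    material = [fill] * 17             # positions 18-34 (material specific)
    material[23 - 18] = form_of_item   # 23: form of item (s = electronic)
    material[28 - 18] = gov_pub        # 28: government publication (f = federal/national)
    result = (
        date_entered                   # 00-05: date entered on file (yymmdd)
        + "s"                          # 06: type of date (single known date)
        + str(pub_year).rjust(4)[:4]   # 07-10: date 1
        + "    "                       # 11-14: date 2 (blank)
        + place.ljust(3)[:3]           # 15-17: place of publication code
        + "".join(material)            # 18-34
        + language                     # 35-37: language code
        + fill                         # 38: modified record
        + fill                         # 39: cataloguing source
    )
    if len(result) != 40:
        raise ValueError("008 must be exactly 40 characters")
    return result


field_008 = build_008("031212", 2003, "aca")
```

The position-by-position comments show why a searcher can limit by form (electronic), by government publication and by language once the record is in MARC: each of these is a fixed, machine-readable slot rather than free text.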

The OAI-PMH harvester workflow is also being used to obtain records for special materials (for example, records of music resources from the Australian Music Centre). These records are converted for use in the ANBD and also for supply to MusicAustralia (11). While these conversions are of records that are much more complex than the AGLS record above, they demonstrate the power of a MODS conversion to produce data that can be stored as MARC for searching in a single repository.
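For readers unfamiliar with OAI-PMH, the following sketch shows the general shape of a harvest: a `ListRecords` request, and a response from which record identifiers and a resumption token are read. It assumes OAI-PMH version 2.0; the base URL, identifiers and sample response are invented for illustration, and the helper names are hypothetical rather than part of the Library's harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH 2.0 resumptionToken is an exclusive argument, so it is
    sent without metadataPrefix when continuing a partial list.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="",
                          namespaces=OAI_NS)
    return ids, (token.strip() or None)


# Invented sample response, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until the response carries no token, at which point the list is complete.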

The next challenge for the National Library will be to automate the harvest, conversion, loading and updating of this metadata onto the ANBD. A further goal is to automate the flow of this data to other repositories by exposing the metadata to harvesting by OAI-PMH services and to indexing by web search engines.

The Library's ultimate aim is to enable Australians to discover and link directly to online government and other resources. Providing good quality metadata on the ANBD is a large step towards ensuring that people discover the resources they seek no matter where the journey starts.

Conclusion

The National Library of Australia is using new technologies to reshape a large scale national resource discovery service. The reengineering of Kinetica is critical in supporting the Library's strategic goal of breaking down barriers to access to library collections and online resources. Kinetica will continue to be based on a centralised database, and the network of contributors will be able to use a combination of record supply processes.

This article has outlined the innovative use of MODS to support ingest of enhanced records to the ANBD. The National Library of Australia selected MODS as an interim format for the conversion on the basis of implementing an effective conversion process that built upon the work of the Library of Congress. The records, after conversion, are automatically passed to the Australian National Bibliographic Database for use in a large resource discovery service. After loading to the ANBD, the records can also be automatically reused in a variety of resource discovery services, such as Recent Australian Publications (a free monthly service) and MusicAustralia.

While MODS is generally used as a schema for storing records that are richer than DC but less rich than MARC, with imagination it can be used to transform records to enhance DC-like records for use in rich resource discovery services, such as a national union catalogue.

The use of standards such as AGLS, MODS and MARC has truly enabled interoperability to occur through reuse of descriptive records. The value added through the MODS transformation demonstrates a use of standards whose "invisibility is a testament to their effectiveness" (9).

For the end user, the success of the project can be seen in resource discovery services that deliver access to publications regardless of their format. In addition, enhanced records can also be returned to information creators to be made available for Internet search engines. This completes a circle of enhanced access through library resource discovery services and Internet search services.

References

(1) National Library of Australia, Directions for 2003-2005, 2002, <http://www.nla.gov.au/library/directions.html>.

(2) Australian Music Online <http://www.amo.org.au/>.

(3) Information Management Steering Committee, Management of Government Information as a National Strategic Resource, Canberra, Office of Government Information Technology, 1996, <http://www.nla.gov.au/imsc/>. The standard adopted for metadata is the AGLS (Australian Government Locator Service) standard, <http://www.naa.gov.au/recordkeeping/gov_online/agls/summary.html>.

(4) Thomas, Colin, Presentation to Australian Government Agencies Publications access seminar: 14 March 2004, <http://www.nla.gov.au/kinetica/summary.rtf>.

(5) National Library of Australia, Pandora: Australia's web archive, <http://pandora.nla.gov.au/index.html>.

(6) Kinetica, Commonwealth Government Metadata Pilot Paper, 2004, <http://www.nla.gov.au/kinetica/metadata.html>.

(7) Library of Congress, Metadata Object Description Schema, 2003, <http://www.loc.gov/standards/mods/>.

(8) Guenther, Rebecca, "MODS: Metadata Object Description Schema", 2003, p 1, 12.

(9) Tennant, Roy, "The engine of interoperability", Library Journal, December 15, 2003, p. 33.

(10) Library of Congress MODS to MARC stylesheet, <http://www.loc.gov/standards/marcxml/>.

(11) Music Australia <http://www.musicaustralia.org/>.

Copyright © 2004 Roxanne Missingham

doi:10.1045/september2004-missingham