XML in Libraries (Book Review May 2003)

D-Lib Magazine
May 2003

Volume 9 Number 5

ISSN 1082-9873

XML in Libraries

Reviewed by Priscilla Caplan, Assistant Director for Digital Library Services, Florida Center for Library Automation, <pcaplan@ufl.edu>

XML in Libraries

By Roy Tennant (Editor).
Paperback, ISBN: 1555704433, $75.00, 213 pages
Neal-Schuman Publishers, Inc.
August 2002

It seems to me that, contrary to the stereotypical view, librarians in general and academic librarians in particular have always been eager to adopt new technologies. We were using email and the Internet long before our friends who work in business, hospitals or law offices. We couldn't wait to find applications for small computers or to gopherize our CWISes. Sometimes we may even be a step too far in front of the curve, as evidenced by our early embraces of OSI and ebooks.

Reading XML in Libraries, edited by Roy Tennant, gave me once again a powerful sense of the vigor and creativity with which we seize upon new technologies. The book features a baker's dozen of short case studies describing various library-related applications using XML in some way. Considering that most of the applications described in this volume were developed two, three, even four years ago, it is hard to believe that XML celebrated its 5th birthday this February, and that XSL and XML Schema are barely two years old.

The State Library of Tasmania developed a central repository of XML-formatted metadata describing government resources for use in their portal of government services. New York University stores XML-encoded finding aids in Oracle. The University of Michigan converts word processing documents into XML for publishing online. Ohio State University transforms interlibrary loan requests into XML and uses XSL style sheets to print them. The Lane Medical Library at Stanford converts MARC records into XML for updating. Truly, a thousand flowers are blooming.

The case studies in XML in Libraries are grouped loosely by application area, with a minimal gloss by the editor. All of the essays follow a common format, with sections on Background; Project goals and justification; Project description; Problems and challenges, successes and failures; Plans; Tips and advice (lessons learned); Implications; Contacts; and Links and resources. The writing is unapologetically technical, containing detailed descriptions of application architecture, sample records, and occasional snippets of code. Sentences like, "This service also uses SOAP to expose its methods for remote execution," and "I'm quite certain that I will very soon find myself rewriting the search engine in Java using a JDBC interface, with Servlet delivery of the HTML, rather than in Perl DBI and CGI.pm," abound. This may scare off some readers while being just the ticket for others.

XML in Libraries is more than a celebration of a community eager to experiment. It offers a feel for the types of things that can be done with XML, exposes the reader to a number of models, tools, and protocols, and gives a good snapshot of the universe of XML-related development at a particular period of time. It also contains some lessons. For example, a number of the projects complain about the immaturity of development tools, poor XML support in browsers, incompatibilities between products, and problems with character set handling. Several applications were being redesigned or rewritten to improve performance, to take advantage of more stable technologies, or simply to incorporate lessons learned in the first iteration. While at this point XML support in browsers, databases, parsers and other utilities is much improved, these essays do give a strong sense of the challenges involved in adopting technologies while they are still in their infancy.

What XML in Libraries does not do, however, is contribute much to the larger debate over the proper role of XML in library bibliographic systems. Despite much "MARC is dead" alarmism and silliness all around, the library community is trying to grapple with serious questions with far-reaching consequences. How can bibliographic systems be redesigned to accommodate both MARC and XML-based metadata formats? To what extent does the expression of AACR2/MARC semantics in an XML-based format, such as MODS or XOBIS, allow (or require) the re-thinking of AACR2? How can we best interface our library business systems with the business systems of our suppliers?

Questions like these are not addressed in this volume. To the extent they are bumped up against at all, the answers are surprisingly casual. Stanford asks, "...once MARC records are reformatted [into XML], what does a library do with them?" but answers only that there are many tools for storing and querying XML. The State Library of Tasmania writes:

A system based on a data repository that would automatically generate site content required controlled and accurate resource descriptions (metadata). This metadata had to be highly specific, yet consistent in terms of access points, vocabularies, and subject terms. This controlled and structured metadata also had to be simple to enter, easy to index, and flexible in terms of output and reuse possibilities. For these reasons, we decided to build an XML system from scratch rather than using traditional library and MARC-based systems.

This is an explanation that certainly leaves one with more questions than insight.

This isn't really a criticism of XML in Libraries, as the volume doesn't purport to be more than a collection of useful case studies, but is mentioned to warn the reader who may be expecting something else. It also doesn't cover very common uses of XML, such as in OAI-compliant metadata harvesting, or the use of XML in descriptive metadata schemes like ONIX. There is a chapter on METS, but nothing on MODS, or on any of the Library of Congress' attempts to move MARC into the XML environment. For all that, the essays are informative, intensely practical, and uplifting in an odd sort of way. While libraries remain mindful of our ends, we can certainly have fun with our means.

CWIS: Campus-Wide Information System, an early-1990s ancestor of the campus portal.

HTTP: HyperText Transfer Protocol, a communications standard used in the Web environment.

MARC: A standard format for exchanging library cataloging records.

METS: Metadata Encoding and Transmission Standard, an XML schema for encoding metadata related to objects in a digital library.

MODS: Metadata Object Description Schema, an XML representation of MARC semantics under development by the Library of Congress.

ONIX: A set of XML schemas developed by publishers for communicating book trade information.

OSI: A network architecture and suite of protocols developed by ISO in the 1970s, useful now mostly as a teaching model.

SOAP: Simple Object Access Protocol, a standard for shipping XML messages over HTTP.

XOBIS: The XML Organic Bibliographic Information Schema, an XML representation of MARC semantics under development by the Medlane project of the Stanford University Medical Center.

XML: eXtensible Markup Language, a simplified subset of SGML suitable for use on the Web.

XSL: eXtensible Stylesheet Language, a standard for creating stylesheets to use with XML documents.

DOI: 10.1045/may2003-bookreview

Copyright © 2003 Priscilla Caplan
