Report on the 2009 Joint CENDI/NKOS Workshop
Knowledge Organization Systems: Managing to the Future
Marcia Lei Zeng
A Joint CENDI/NKOS Workshop, "Knowledge Organization Systems: Managing to the Future", was held at the National Agricultural Library in Beltsville, Maryland, United States, on October 22, 2009. The themes were: toward a shared development environment, toward interoperability, and toward ontologies and the Semantic Web. Ten invited speakers represented government, academic, and commercial organizations from a variety of disciplines.
The 2009 CENDI/NKOS Workshop, "Knowledge Organization Systems: Managing to the Future", was held at the National Agricultural Library in Beltsville, Maryland, on October 22, 2009. It was the ninth workshop on Networked Knowledge Organization Systems/Services sponsored by the NKOS group in the U.S. since 1998. NKOS is a community of over 300 practitioners from more than 10 countries who are interested in the use of knowledge organization systems (e.g., classifications, gazetteers, lexical databases, ontologies, taxonomies, and thesauri) in networked environments. The workshop's co-sponsor, CENDI, is an interagency working group of senior scientific and technical information managers from 13 U.S. federal agencies. CENDI's mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information-support systems. CENDI also co-sponsored the 2008 Joint CENDI/NKOS Workshop, which was held at the World Bank in Washington, D.C. The organizers of this year's workshop were Gail Hodge, Denise Bedford, Marcia Zeng, and Marjorie Hlava.
The full-day workshop was attended by 70 colleagues. Ten invited speakers represented government, academic, and commercial organizations from a variety of disciplines. The workshop organizers were especially grateful that Dr. Thomas Baker, Chief Information Officer of DCMI Ltd. and chair of the Dublin Core Metadata Initiative Usage Board, attended the workshop and gave a talk. All presentation materials are available on the workshop website.
Toward a shared development environment was the theme of the first session, moderated by Lori Finch of the National Agricultural Library.
Marjorie Hlava of Access Innovations, Inc. introduced Taxonomy ShareSpace. Taxonomy ShareSpace is a terminology registry that houses a collection of controlled vocabularies of all types and complexities. It is also a space for users to access, deposit, share, save, and discuss taxonomy resources. In addition, ShareSpace provides visitors with informative examples of taxonomy structure, shows the variety and scope of taxonomies (and other terminologies), and reveals the possibilities that can be achieved with a taxonomy.
Michael Pendleton of the U.S. Environmental Protection Agency (EPA) gave a presentation titled "EPA Terminology Services: Toward Better Terminology Management". After walking through the major features of the EPA Terminology Services, Pendleton emphasized the importance of stewards who own and maintain the vocabularies. EPA is trying to identify a steward for every active vocabulary. Current efforts of the EPA Terminology Services include enhancing repository content, identifying stewards, and exploring value-added opportunities.
Gail Rayburn from the Johns Hopkins Applied Physics Laboratory (APL) gave a presentation entitled "Subject Matter Expert Thesaurus Review: Improving Collaboration with Researchers". The APL Thesaurus currently has 41 major subject categories. Because of the diversity and volume of APL materials, subject matter expert review of the APL Thesaurus is mandatory. The Lab has developed a Web-based, stand-alone application, the Subject Matter Expert Thesaurus Review Application, to support and manage this review.
Toward interoperability was the theme of the second session, which was moderated by Marjorie Hlava of Access Innovations, Inc.
Jane Greenberg, University of North Carolina, reported first on "Interoperable Thesauri: The Challenges and Experiences of the HIVE Project". HIVE is the acronym for Helping Interdisciplinary Vocabulary Engineering. HIVE's technological infrastructure stores millions of concepts from different vocabularies and is preparing to make them available on the Web over HTTP. The project uses an automatic metadata generation approach to dynamically integrate discipline-specific controlled vocabularies encoded with the Simple Knowledge Organization System (SKOS). The goal is to provide efficient, affordable, interoperable, and user-friendly access to multiple vocabularies during metadata creation activities.
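To make the idea concrete, the following is a minimal sketch, not HIVE's actual implementation, of how concepts from several SKOS-encoded vocabularies might be pooled and matched against text during metadata creation. The vocabulary names, the example.org concept URIs, the sample labels, and the helper functions are all hypothetical; only the notions of preferred and alternative labels are taken from SKOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A simplified stand-in for a skos:Concept (hypothetical data model)."""
    uri: str
    vocabulary: str
    pref_label: str
    alt_labels: tuple = ()

    def labels(self):
        """Return the preferred label followed by any alternative labels."""
        return (self.pref_label,) + tuple(self.alt_labels)


# Hypothetical concepts from two hypothetical vocabularies.
CONCEPTS = [
    Concept("http://example.org/vocabA/c1", "vocabA", "Soil erosion", ("Erosion of soil",)),
    Concept("http://example.org/vocabA/c2", "vocabA", "Crop rotation"),
    Concept("http://example.org/vocabB/c9", "vocabB", "Erosion", ("Soil erosion",)),
]


def build_label_index(concepts):
    """Map each lowercased label to the concepts that carry it."""
    index = {}
    for concept in concepts:
        for label in concept.labels():
            index.setdefault(label.lower(), []).append(concept)
    return index


def suggest_concepts(text, index):
    """Return concepts whose labels occur in the text (naive substring match)."""
    lowered = text.lower()
    found = []
    for label, concepts in index.items():
        if label in lowered:
            for concept in concepts:
                if concept not in found:
                    found.append(concept)
    return found


if __name__ == "__main__":
    index = build_label_index(CONCEPTS)
    for concept in suggest_concepts("A study of soil erosion under crop rotation.", index):
        print(concept.vocabulary, concept.uri, concept.pref_label)
```

The substring match is deliberately naive. A real system would need tokenization, stemming, and ranking, but the sketch shows why a shared encoding matters: once every vocabulary exposes the same kinds of labels, a single index can serve candidate terms from all of them.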
Marcia Lei Zeng, Kent State University, reported on the progress of ISO 25964: Thesauri and Interoperability with Other Vocabularies. Part 1 of ISO 25964, Thesauri for information retrieval, is in ballot now (available from http://drafts.bsigroup.com/) and Part 2, Interoperability with other vocabularies, is in progress. She explained the difference between ISO 25964-1 and SKOS and introduced major issues for Part 2, such as addressing "non-symmetrical multilingual thesauri" and modeling various kinds of vocabularies. She also briefly talked about a proposal for development of a Dublin Core Application Profile for KOS Resources by a newly formed DCMI-NKOS task force. KOS attributes are an area that NKOS members have researched and worked on since 1996. At a minimum, metadata for KOS resources will describe specific characteristics of a KOS, facilitate the discovery of KOS resources, assist in the evaluation of such resources for a particular application or use, and enhance the sharing, reuse, and collaborative development of KOS resources.
Thomas Baker, Chief Information Officer of DCMI, spoke on Dublin Core application profiles in context, to relay the DCMI-NKOS proposal mentioned above. He revealed the newest development of Interoperability Levels for Dublin Core Metadata, a DCMI Recommended Resource. He explained the information to be shared at each level and illustrated interoperability layers from different perspectives: open and closed world, supporting technologies, deployed base, rate of growth, design choices, and pros and cons involved in designing applications for different types of interoperability. Along with the illustration of application levels, Dr. Baker also discussed the long-term benefit, that "data quality is independent of profiles used to create it."
Ed Summers of the Library of Congress demonstrated the functions of the LC Authorities and Vocabularies Service. The service will enable both humans and machines to access authority data at the Library of Congress via URIs. The Library of Congress Subject Headings (LCSH) are available from this service in HTML, RDF/XML, N-Triples, and JSON formats. The subject headings are also mapped to Rameau, a subject heading vocabulary used by the French National Library. The Library of Congress will release additional vocabularies such as the Thesaurus of Graphic Materials, the MARC Geographic Area Codes, the MARC Language Codes, and the MARC Relator Codes. Ed Summers also showed a parallel development, dewey.info, a linked data version of the Dewey Decimal Classification (DDC) Summaries in ten languages. He briefly gave an update on the "Owlification" (a reference to the OWL Web Ontology Language) of classes and notes.
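As an illustration of what machine access to such data can look like, here is a minimal sketch that reads a few N-Triples statements and pulls out a heading's preferred label and its mapping to a concept in another vocabulary. The example.org URIs and the sample heading are hypothetical placeholders, not real identifiers from the service, and the choice of skos:closeMatch for the mapping is an assumption made for the example. The regular expression handles only the simple triple shapes shown here, not the full N-Triples grammar.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample data; the example.org URIs are placeholders.
SAMPLE_NTRIPLES = """\
<http://example.org/lcsh/h1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Thesauri"@en .
<http://example.org/lcsh/h1> <http://www.w3.org/2004/02/skos/core#closeMatch> <http://example.org/rameau/r7> .
"""

# Matches '<s> <p> <o> .' or '<s> <p> "literal"@lang .' and nothing more elaborate.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)"(?:@([A-Za-z-]+))?)\s*\.\s*$'
)


def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples for the simple lines we understand."""
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything outside this sketch's scope
        subject, predicate, obj_uri, literal, _lang = match.groups()
        yield subject, predicate, (obj_uri if obj_uri is not None else literal)


def describe(subject, triples):
    """Collect the preferred label and close matches for one concept URI."""
    label, matches = None, []
    for s, p, o in triples:
        if s != subject:
            continue
        if p == SKOS + "prefLabel":
            label = o
        elif p == SKOS + "closeMatch":
            matches.append(o)
    return label, matches


if __name__ == "__main__":
    triples = list(parse_ntriples(SAMPLE_NTRIPLES))
    print(describe("http://example.org/lcsh/h1", triples))
```

In practice an RDF library would replace the hand-written parser. The point of the sketch is only that a URI-addressable heading, serialized in a line-oriented format, can be consumed by a short program without screen scraping.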
The last session of the Workshop had the theme Toward ontologies and the Semantic Web, and was moderated by Michael Pendleton of EPA.
"Evaluating ontology alignment techniques" was the title of the presentation by Willem Robert van Hage of the Vrije Universiteit, Amsterdam. He laid out the two approaches in combining vocabularies, ontology merging and ontology alignment. He analyzed the challenges of merging, including legal, management, security, and legacy issues. He encouraged people to publish linked data by creating it or by aligning with it.
Gilberto Fragoso of the National Cancer Institute (NCI) presented "NCI Thesaurus, managing towards an ontology". He began with background on the NCI Enterprise Vocabulary Services (EVS). EVS integrates different conceptual frameworks for clinical, basic and translational research, as well as creating terminological and taxonomic conventions across systems. Among the controlled terminology products used are the NCI Thesaurus, an ontology-like cancer-centric controlled terminology, NCI Metathesaurus, which maps biomedical vocabularies, and the new BiomedGT (Biomedical Grid Terminology). External vocabularies maintained and served include MedDRA, HL7, NDF-RT, LOINC, GO, Zebrafish, RadLex, etc. Details of the BiomedGT initial development were provided.
Finally, Denise Bedford of the World Bank, who is also the Goodyear Professor at Kent State University, presented "Ontology Summit Review 2008-2009". First she reported on the development of the architecture for the Open Ontology Repository (OOR), which was a major theme of the 2008 Ontology Summit. The theme of the 2009 Summit was ontology standards. The Summit focused on the intersection and common goals of two active communities, information standards and ontology and semantic technologies, which currently do not work together. Many challenges were raised throughout the online discussions leading up to the face-to-face Summit. Nine endorsed efforts, which demonstrate the advantages of incorporating ontology approaches when developing and applying a standard, were introduced. She concluded that the ontology community's goal is to focus at the 'concept' level, extending and enriching relationships among concepts, rather than at the KOS level, and that NKOS and IKOS are strong partners for this collaboration going forward.
Just before this CENDI/NKOS Workshop, the 8th European NKOS Workshop took place on October 1, 2009, in Corfu, Greece, as part of the 2009 European Conference on Digital Libraries (ECDL).
Information about events and publications of NKOS, how to subscribe to the NKOS list, and the archives of the previous programs and presentation materials are available at the international NKOS website at http://nkos.slis.kent.edu/.
The author would like to thank Gail Hodge, Senior Information Scientist at Information International Associates, Inc., for contributing to this report.
About the Author