Converging technological trends in the arena of geographical information within the traditional digital library communities have witnessed increasing demand for the establishment of Spatial Data Infrastructures (SDIs) at various levels: global, national and regional. The importance of geographic information is thus slowly becoming axiomatic, yet its significance within the realm of digital libraries remains somewhat opaque. This article argues that distinctions between the use of geographic information within and without digital libraries are somewhat artificial and favours a more holistic model predicated on building infrastructures for the access, discovery and delivery of geographic resources that are applicable to communities of users, of which the digital library community is a specific instance. We introduce the concept and vision of a community specific SDI (CSSDI), using the UK academic digital library community as an example, and describe the key components that have been implemented. It is argued that the critical elements for a CSSDI for the UK academic digital library community already exist and that the concept of a CSSDI may itself provide a focus around which other digital libraries might consolidate their efforts to better exploit geographically referenced resources.
There is increasing recognition that, in a knowledge-based economy, the effective use of geographic information (GI), which includes a wide range of spatially referenced or referenceable resources, is of critical importance. Traditionally, the use of GI has been viewed as a specialist topic. Now, however, a notable paradigm shift is underway, and the handling and processing of geospatial data is supported by emerging mainstream commercial, off-the-shelf data management tools (e.g., Oracle, DB2, MySQL, Postgres).
The exploitation of geospatial data within diverse policy environments, allied with the increasing attention being afforded to cross discipline social and environmental issues, has led to the demand for infrastructures to assist in the discovery, dissemination and exploitation of geospatial data assets.
Such infrastructures are more generally referred to as Spatial Data Infrastructures (SDIs) and may be defined more formally as "the relevant base collection of technologies, policies and institutional arrangements that facilitate the availability of and access to spatial data" (Nebert, 2001).
More fully, an SDI hosts geospatial data and metadata, provides a means of discovering and evaluating the data, provides methods to access the data and establishes the necessary licensing agreements between stakeholders to make use of the data. A community specific SDI (CSSDI) also has these characteristics, but a CSSDI is not defined by geographical scale; rather its unique characteristic is the way it meets the requirements of a discrete and well-defined sector or community.
What then is the relationship between a CSSDI and a digital library? According to a report by the D-Lib Working Group on Digital Library Metrics (Leiner, 1998), the term "digital library," while assuming a range of possible meanings, has a broad definition that includes a number of key characteristics and can be defined as "the collection of services and the collection of information objects that support users in dealing with information objects and the organization and presentation of those objects (available directly or indirectly) via electronic/digital means."
The commonality between a CSSDI and a digital library therefore hinges on the provision, organisation and presentation of services and objects to a target audience with the added refinement of content being GI-centric for a CSSDI. In effect a digital library provides an ideal environment within which to promulgate and institute a CSSDI. Of course, recognition of the significance of GI within digital libraries has to date been subservient to the more conventional information axes or ways of regarding information resources (by subject, author, keyword), due in large part to the traditional practices of the library cataloguing community. A CSSDI aims to complement traditional digital library practices by enabling geographic access to resources with the ultimate aim of making geography another standard search and use dimension.
Barriers to such adoption exist, and one of the key challenges is convincing the non-GI literate that a geographical approach has a role to play. Geography is both pervasive and synergistic and, for many aspects of resource discovery, represents an intuitive access methodology.
A CSSDI provides a conceptual focus for geographical referencing within digital libraries, and it incorporates mainstream GI thinking, providing a concrete GI application within a specific community context.
Characteristics of the UK Higher and Further Education (HFE) Academic Community
The provision of framework data onto which other thematic data can be portrayed is an essential ingredient of any SDI. As a distinct community within the UK economy, the academic community benefits from arrangements with the National Mapping Agency (Ordnance Survey of Great Britain (OSGB)). Under a unique agreement, the UK academic community has complete access to national coverage geospatial framework data. The significance of this is that it lays the foundation for a CSSDI that has little counterpart outside of the UK academic digital library.
Alongside data and tools, there is a need for a knowledgebase to support users of the data who are often relative GI novices. This needs to contain not only guidelines on how to use and integrate data but also explanations on basic concepts. It is the shared research and teaching agenda of the UK academic community that provides the rationale for the existence of an academic digital library and the motivation for a CSSDI.
Key Stakeholders in the UK Academic CSSDI
The evolution of the CSSDI within the UK digital library realm is an organic outgrowth of a range of independent, strategically guided activities under the aegis of a variety of distinct stakeholders. The main stakeholders are cited below and, while specific to the UK, other countries will no doubt have readily identifiable counterparts to the key stakeholders; e.g., the National Science Foundation in the U.S. Figure 1 indicates how each of these stakeholders maps into the concept of both a CSSDI and a digital library.
The JISC (Joint Information Systems Committee) is the strategic advisory committee working on behalf of the academic community's funding bodies in the UK. JISC itself funds the network infrastructure, information services, and development projects, collectively known as the JISC Information Environment (JISC IE) which is, in essence, the de facto UK academic digital library. The JISC IE provides academia with access to heterogeneous resources, ranging from bibliographic and multimedia to (crucially in this context) geospatial data and research data, which are disseminated primarily through two National Data Centres (EDINA and MIMAS). In the context of the CSSDI, the JISC IE provides a set of common policies for participating services and organisationally manages and oversees the provision of services and data to the community. Taken together, these services and policies provide the substrate within which the UK academic digital library is embedded.
Alongside JISC, CHEST (Combined Higher Education Software Team) acts as a focal point for the supply of software, data, information, training materials and other IT related products to the UK academic community. Notably, CHEST has negotiated access for staff and students to a variety of geospatial data; e.g., digital map data, digitised aerial photography and satellite data, as well as end user tools such as desktop geographic information system (GIS) software.
EDINA Geospatial Services
The focus of this article is the range of geospatial services provided by EDINA that are relevant to the UK CSSDI. These services and projects are outlined next and are presented in an order which reflects a discovery, locate, and use workflow.
Given the inherent complexity of geospatial digital data and the difficulties often experienced by its users in its discovery and manipulation, all of EDINA's discovery and access tools endeavour to abstract the complexities into simple point and click interfaces. The primary objective is to achieve maximum dissemination of valuable data (in terms of both cost and research potential) to the widest possible audience with the minimum of obstacle and delay.
Go-Geo! is an online resource discovery tool that allows for the identification and retrieval of spatial metadata records (Figure 2). These metadata records describe the content, quality, condition and other characteristics of geospatial datasets. Go-Geo!'s current geographic scope is the UK. It supports geospatial searching as well as the more traditional topic and keyword forms of searching, including support for both controlled vocabulary searching and free text searching. Users are able to search for datasets held by UK academic nodes and for resources outside academia via the national GIGateway portal and its directory services.
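The combination of geospatial and keyword searching described above can be illustrated with a minimal sketch. It is not Go-Geo!'s implementation: the record structure, the `search` function and the three sample records (with rough, made-up extents) are all assumptions introduced here, and a simple bounding-box overlap test stands in for whatever spatial matching the service actually performs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned extent in decimal degrees (west, south, east, north)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies wholly to one side of the other.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified metadata record for a dataset."""
    title: str
    keywords: Tuple[str, ...]
    extent: BBox


def search(records: Sequence[MetadataRecord],
           area: Optional[BBox] = None,
           keyword: Optional[str] = None) -> list:
    """Return records matching the search area and/or keyword, if given."""
    hits = []
    for rec in records:
        if area is not None and not rec.extent.intersects(area):
            continue
        if keyword is not None and keyword.lower() not in (
                k.lower() for k in rec.keywords):
            continue
        hits.append(rec)
    return hits


# Illustrative records only; titles and extents are invented for this sketch.
RECORDS = [
    MetadataRecord("Scottish land cover survey",
                   ("land cover", "environment"),
                   BBox(-7.7, 54.6, -0.7, 60.9)),
    MetadataRecord("Greater London air quality",
                   ("air quality", "environment"),
                   BBox(-0.52, 51.28, 0.34, 51.70)),
    MetadataRecord("Welsh parish boundaries",
                   ("boundaries",),
                   BBox(-5.4, 51.3, -2.6, 53.5)),
]

# A search box drawn roughly around Edinburgh.
edinburgh_area = BBox(-3.4, 55.8, -3.0, 56.0)
spatial_hits = search(RECORDS, area=edinburgh_area)
keyword_hits = search(RECORDS, keyword="environment")
combined_hits = search(RECORDS, area=edinburgh_area, keyword="environment")
```

Treating the spatial extent as one more filter alongside keywords mirrors the article's point that geography can sit beside subject and keyword as a search dimension.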
Go-Geo! is based on the Z39.50 protocol. It is a proof-of-concept project undertaken jointly with the UK Data Archive at the University of Essex. It demonstrates the utility of a data locating tool for geospatial resources and the critical role a more comprehensive one-stop shop geo-portal for spatial data and basic mapping services might provide. Metadata creation is as important as a discovery tool, but is often a task which data creators or custodians view as low priority. A core part of the project is looking at strategies for promoting metadata creation, while another aspect is looking at cost-effective mechanisms by which academic researchers could publish their data for use by others.
Launched in January 2000, EDINA's Digimap provides access to a wide range of OSGB data. Users are able to download the data to use with appropriate application software such as GIS or CAD (Medyckyj-Scott and Morris, 1998). However, for those with neither access to nor the skills to use GIS applications, a variety of online tools are provided. A simple tool allows users to view, customise and print maps of any location in Great Britain at a series of predefined scales (see Figure 3). More advanced tasks are supported through a set of tools and include producing maps at user-specified scale, combining datasets on a map, large format printing, and gazetteer functions. The service currently supports in excess of 17,000 users across nearly 84 institutions and provides onward delivery of data to users worth an estimated annual value of £6M. This clearly illustrates (all within the specific context of UK academia) key aspects of an SDI, namely, availability of and access to geospatial data.
The United Kingdom Boundary Outline and Reference Database for Education and Research Study (UKBORDERS) is one of the oldest of the EDINA services. It provides digitised boundary datasets of the UK for teachers and researchers to download and incorporate in their work. Data includes census, administrative, electoral, and postal boundaries, both contemporary and historic, and provides staff and students with a library of digital boundary data which they can download in many common proprietary geographical information system (GIS) formats. The ability to deliver real-time extraction and delivery to user-specified parameters relieves staff and students at institutions from having to undertake onerous data management and maintenance operations. The service is maintained on a 24/7 basis, supporting real-time on-demand data extraction and just-in-time delivery.
In CSSDI terms, both of the above services illustrate the capacity to store, manage and deliver large holdings of geospatial data as well as providing mechanisms for previewing and publishing. In terms of the characteristics of a digital library, both services admirably illustrate the provision and presentation aspects of DL services. Access to data is either free (UKBORDERS) or available at vastly reduced costs (Digimap). This is a non-trivial consideration, as the cost:benefits to the community (direct revenue and opportunity costs) mean that the relative overheads are kept at a low threshold, which encourages uptake and resource exploitation.
Users of the geospatial data within the UK academic digital library come from a wide range of disciplinary backgrounds, many with a limited tradition of either the application of, or teaching about, geospatial data. The e-MapScholar project (Ross, Medyckyj-Scott and Mackaness, 2003) was a consortium project that developed tools and learning and teaching materials to enhance and support the use of the geospatial data available within UK academia (Figure 5).
The project supported the needs of teaching staff by providing new, exciting and interactive learning materials using geospatial data. It has provided an infrastructure for a specific learning community and enhances the understanding of spatial issues that are crucial as underpinnings to a CSSDI. The project had a number of novel aspects such as the ability for teaching staff to customise the interactive tools embedded within the learning resources and localise the geographic maps and data displayed in those tools. Localisation was achieved by linking the tools to map and data servers using open standard interface specifications. Along with other web-based resources, it forms the kernel of a customisable knowledgebase to support the fuller exploitation of the CSSDI.
geoXwalk ("geo cross walk") (Reid, 2003) is a gazetteer service and shared terminology server providing machine-to-machine protocols to support geographic searching within the JISC IE. In effect, it is a service which enables simple and complex geographic searching of resources within the UK academic digital library. It provides critical middleware infrastructure through which other services within the digital library may undertake geographic querying.
It has long been recognised, at least within the GI community, that geography provides a powerful mechanism by which to search. Within traditional digital libraries, there has been some recognition of this with efforts to provide geographical indexes into resources. However, such efforts have generally been piecemeal and suffer from being too restrictive in the concept of geography; e.g., a placename is frequently used for simple text-based, pattern-matched searches. This is a very simplistic approach to geographic searching and fails to address the fact that there is little consistency in the geographic vocabulary used. In the UK, resources might be indexed (if at all) by placename, by postcode, by administrative code or some other arbitrary coding convention. This has led to a plethora of geographies used to index and search resources. In practical terms, what this means is that geography as a search dimension lacks both consistency and persistence, which makes reliability and repeatability in searches problematic.
There is, however, one geography that obviates all these problems: the coordinate system. In the UK this means the Ordnance Surveys' National Grids or, alternatively, a latitude/longitude encoding. By encoding coordinate representations of our various geographies, we arrive at a geography which is both consistent and persistent and provides a baseline against which comparisons of the various geographies may be conducted. This is the key to geographically enabling resources and is what underpins the ability of geoXwalk to resolve complex geographical queries. As the name suggests, geoXwalk can crosswalk the varying manifestations of geographies to translate one representation into another (see Figure 6). geoXwalk thus abstracts the geographic search function away from the calling service by providing a shared terminology service upon which other services within the digital library can call.
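The idea of using coordinates as the common baseline between geographies can be sketched in a few lines. This is an illustration only, not geoXwalk's design: the `Feature` structure, the `crosswalk` function, the fictional place "Northtown", the "ZZ" postcode districts and all coordinate values are assumptions made up for the example, and rectangular footprints stand in for real boundary geometry.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical gazetteer entry with a rectangular footprint.

    Coordinates are eastings/northings in metres on an arbitrary grid.
    """
    name: str
    kind: str      # e.g. "placename", "postcode district", "parish"
    min_e: float
    min_n: float
    max_e: float
    max_n: float


def overlaps(a: Feature, b: Feature) -> bool:
    """True when the two rectangular footprints share any area or edge."""
    return (a.min_e <= b.max_e and b.min_e <= a.max_e
            and a.min_n <= b.max_n and b.min_n <= a.max_n)


def crosswalk(gazetteer: List[Feature], name: str,
              source_kind: str, target_kind: str) -> List[str]:
    """Translate a named feature of one kind into features of another kind.

    The translation goes through coordinates: any target feature whose
    footprint overlaps the source footprint is considered a match.
    """
    sources = [f for f in gazetteer
               if f.name == name and f.kind == source_kind]
    targets = [f for f in gazetteer if f.kind == target_kind]
    return sorted({t.name for s in sources for t in targets
                   if overlaps(s, t)})


# Entirely fictional gazetteer content, for illustration only.
GAZETTEER = [
    Feature("Northtown", "placename", 1000, 1000, 3000, 3000),
    Feature("ZZ1", "postcode district", 0, 0, 2000, 2000),
    Feature("ZZ2", "postcode district", 2500, 2500, 5000, 5000),
    Feature("ZZ3", "postcode district", 6000, 6000, 8000, 8000),
    Feature("Northtown Parish", "parish", 500, 500, 3500, 3500),
]

districts = crosswalk(GAZETTEER, "Northtown",
                      "placename", "postcode district")
places = crosswalk(GAZETTEER, "ZZ3", "postcode district", "placename")
```

Because every geography is reduced to the same coordinate space, the calling service never needs to know how placenames relate to postcodes; it only asks for a translation.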
The core gazetteer and terminology services, in conjunction with another EDINA developed resource referred to as a geoparser, can also be used to derive geographic indexing for non- or partially georeferenced resources, using semi-automatic processes. The geoparser, using natural language processing techniques, attempts to identify occurrences of placenames within the resource and then relates these features to the gazetteer in order to turn words into numbers, i.e., by translating placenames into coordinates using the coordinate locations associated with placenames in the gazetteer. Inconsistent and non-persistent geographical references in the resource can thus be translated into a persistent coding scheme that can in turn be fed back into geoXwalk for additional geographic searching or, as was the original rationale behind the geoparser, used to update the resource's metadata in order to provide explicit georeferencing for use in future geographic based searches. The gazetteer can thus support geographic translation and geographic indexing within the digital library, both of which enhance the capacity to use geography as a key search parameter.
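The "words into numbers" step can be sketched as follows. The EDINA geoparser is described as using natural language processing techniques; this example substitutes a much cruder technique, exact single-word lookup against a tiny gazetteer, purely to show the flow from text to coordinates to a derived footprint. The function names are hypothetical and the latitude/longitude pairs are approximate values chosen for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]

# Tiny illustrative gazetteer: lower-cased placename -> approximate
# (latitude, longitude). Values are rough and for demonstration only.
GAZETTEER: Dict[str, LatLon] = {
    "edinburgh": (55.95, -3.19),
    "glasgow": (55.86, -4.25),
    "colchester": (51.89, 0.90),
}


def geoparse(text: str,
             gazetteer: Dict[str, LatLon] = GAZETTEER
             ) -> List[Tuple[str, int, LatLon]]:
    """Find single-word placenames in text and attach their coordinates.

    Returns (matched word, character offset, (lat, lon)) tuples. This is
    naive dictionary matching, not NLP: it cannot handle multi-word names
    or tell a place from a surname.
    """
    found = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group(0)
        coords = gazetteer.get(token.lower())
        if coords is not None:
            found.append((token, match.start(), coords))
    return found


def footprint(matches: List[Tuple[str, int, LatLon]]
              ) -> Optional[Tuple[float, float, float, float]]:
    """Derive a (west, south, east, north) box covering all matched places.

    Such a box could be written back into a resource's metadata as an
    explicit georeference.
    """
    if not matches:
        return None
    lats = [coords[0] for _, _, coords in matches]
    lons = [coords[1] for _, _, coords in matches]
    return (min(lons), min(lats), max(lons), max(lats))


abstract = "Survey data were collected in Edinburgh and Glasgow during 2003."
matches = geoparse(abstract)
extent = footprint(matches)
```

The derived extent is the kind of persistent coding the paragraph above describes: once stored with the resource, later geographic searches no longer depend on how the placenames happened to be written.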
geoXwalk is another proof of concept project being undertaken with the UK Data Archive and is currently undergoing pre-service trialing. Go-Geo! utilises geoXwalk to perform enhanced geographic searching and in so doing overcomes deficiencies that exist in extant metadata about geospatial data holdings.
Standards, Interoperability and Web Services
A common thread running through all of the services and projects outlined above is the adoption of open standards and protocols, in particular the Open GIS Consortium's (OGC) interoperability suite (Open GIS Consortium, 1998). Use of such standards and protocols allows for integration and sharing across different communities, of which the digital library community is but one. Their adoption also assists in development efforts and service delivery within the digital library, as a variety of such web services can be utilised together simply and efficiently in order to rapidly develop function-rich applications and services to meet changing requirements and circumstances. These emergent open standards are well aligned with the global standards models adopted more generally within digital libraries and therefore leverage technologies and vocabularies that are readily familiar.
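As an illustration of why open interface specifications make such services easy to combine, the sketch below builds a request URL for an OGC Web Map Service (WMS) GetMap operation. The article does not state which OGC interfaces the services use, so WMS 1.1.1 is chosen here only as a widely known example; the endpoint `https://maps.example.org/wms` and the layer name `boundaries` are hypothetical, and the bounding box values are arbitrary. EPSG:27700 is the code for the British National Grid.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from typing import Sequence, Tuple


def getmap_url(endpoint: str,
               layers: Sequence[str],
               bbox: Tuple[float, float, float, float],
               width: int,
               height: int,
               srs: str = "EPSG:27700",
               image_format: str = "image/png") -> str:
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                 # empty string requests default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer; no request is actually sent.
url = getmap_url("https://maps.example.org/wms",
                 ["boundaries"],
                 (320000, 670000, 330000, 680000),
                 512, 512)
```

Because the request is just a parameterised URL defined by a public specification, any client that can issue an HTTP GET (a learning resource, a portal, a desktop GIS) can ask any conforming server for a map without knowing how that server stores its data.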
We see CSSDI as a concept of use for communicating and planning. To talk solely of SDIs as defined by scale (i.e., global, national, regional) disguises the fact that unique policies and institutional arrangements apply to specific communities or sectors, and furthermore, that these have their own distinct requirements in terms of the technologies that are established to facilitate access to the geospatial data they require. Similarly, we have argued that there is a commonality between the vision of a CSSDI and the salient characteristics of a digital library. The UK academic digital library is an example of a community served by a CSSDI. There may also be a lesson here for digital library communities in that advancement within the wider GI community towards the establishment of SDIs might assist in focusing attention on the role that geography can play within digital libraries. The concept of a CSSDI to supplement the traditional view of a digital library may therefore act as a kernel around which diverse geo-centric projects and initiatives may crystallise and provide a conceptual holism with which to accelerate and promote the importance of geography as a key (but thus far neglected) search axis for resource discovery.
EDINA gratefully acknowledge the contribution of the UK Data Archive, University of Essex, with respect to joint projects mentioned in this article.
Leiner, B. (1998): The Scope of the Digital Library, Draft Prepared by Barry M. Leiner for the D-Lib Working Group on Digital Library Metrics, October 15, 1998.
Medyckyj-Scott, D. and Morris, B. (1998): "The virtual map library: Providing access to Ordnance Survey digital map data via the WWW for the UK higher education community." Computers, Environment and Urban Systems, 21, 1: pp. 31-45.
Nebert, D. (Ed.) (2001): "Developing Spatial Data Infrastructures: The SDI Cookbook." GSDI Cookbook, Version 1.1.
Open GIS Consortium (Ed.) (1998): The OpenGIS Guide (3rd ed) Introduction to Interoperable Geoprocessing and the OpenGIS Specification.
Reid, J. (2003): "geoXwalk - a gazetteer server and service for UK academia." In Koch, T. and Solvberg, I.T., eds., Research and Advanced Technology for Digital Libraries, 7th European Conference (ECDL 2003), August 17-22, 2003, Trondheim, Norway. Berlin: Springer, pp. 387-392.
Ross, R.S., Medyckyj-Scott, D. and Mackaness, W.A. (2003): "The e-MapScholar project - An example of interoperability in GIScience education." Proceedings of the 6th AGILE Conference on Geographic Information Science, April 24-26th, 2003, Lyon, France. Presses Polytechniques et Universitaires Romandes, pp. 231-237.
Spatial Data Infrastructures - <http://www.gsdi.org/>
Copyright © 2004 James S. Reid, Chris Higgins, David Medyckyj-Scott, and Andrew Robson