
D-Lib Magazine
October 1999
Volume 5 Number 10
ISSN 1082-9873
Reference Linking in a Hybrid Library Environment
Part 3: Generalizing the SFX solution in the "SFX@Ghent & SFX@LANL" experiment
Herbert Van de Sompel
Los Alamos National Laboratory - Research Library
herbert.vandesompel@rug.ac.be
Patrick Hochstenbach
Automation Department of the Central Library
University of Ghent, Belgium
patrick.hochstenbach@rug.ac.be
Abstract
This is the third in our series of papers on reference linking in a hybrid library environment. The first part described the state-of-the-art of reference linking and contrasted various approaches to the problem. It identified static and dynamic linking solutions, open and closed linking frameworks as well as just-in-case and just-in-time linking. The second part introduced SFX, a dynamic, just-in-time linking solution we built for our own purposes. However, we suggested that the underlying concepts were sufficiently generic to be applied in a wide range of digital libraries.
In this third part we show how this has been demonstrated conclusively in the "SFX@Ghent & SFX@LANL" experiment. In this experiment, local as well as remote distributed information resources of the digital library collections of the Research Library of the Los Alamos National Laboratory and the University of Ghent Library have been used as starting points for SFX-links into other parts of the collections. The SFX-framework has further been generalized in order to achieve a technology that can easily be transferred from one digital library environment to another and that minimizes the overhead in making the distributed information services that make up those libraries interoperable with SFX.
This third part starts with a presentation of the SFX problem statement in light of the recent discussions on reference linking. Next, it introduces the notion of global and local relevance of extended services as well as an architectural categorization of open linking frameworks, also referred to as frameworks that are supportive of selective resolution. Then, an in-depth description of the generalized SFX solution is given.
Rephrasing the SFX problem statement
The problem statement
It is relevant to rephrase the SFX problem statement in the context of the meetings and the subsequent reports and publications on reference linking organized by the Digital Library Federation (DLF), the National Information Standards Organization (NISO), the National Federation of Abstracting and Indexing Services (NFAIS), and the Society for Scholarly Publishing (SSP) (Caplan 1999a; Caplan 1999b; Caplan & Arms 1999; Needleman 1999).
The generic statement of the reference linking problem, as defined by the working group on reference linking was (Caplan 1999a; Caplan & Arms 1999):
Given the information in a standard citation, how does one get to the thing to which it refers?
However, the working group concentrated on a specific variation on this:
Given the information in a citation to a journal article, how does a user get from the citation to an appropriate copy of the article?
The SFX research also addresses these problems, but only as an instance of a more general problem that can be formulated as:
Given bibliographic metadata, how does one present relevant extended services for it?
Bibliographic metadata as a starting point
Clearly, the SFX research is not only concerned about information in a standard citation. Its starting point is bibliographic metadata in general. As such, information entities originating from typical scholarly resources such as records from abstracting & indexing databases, OPAC systems and preprint archives can be used as a starting point in the SFX problem statement. This is also the case for citations to both journal articles and books found in journal articles or books. But even fractional bibliographic metadata such as an author name taken from an e-mail message is a valid starting point in the SFX problem statement.
Extended services as a goal
A similar generalization holds for the target of the problem statement since the SFX research is not only concerned about linking to the full-text that corresponds to a citation in a journal article. It aims at the presentation of a variety of extended services for whichever metadata is used as a starting point. Extended services are services that present an information entity in a digital library -- defined as the link-source -- in the context of the entire information environment (Van de Sompel & Hochstenbach 1999a). For instance, for a given link-source record from an abstracting & indexing database, extended services can -- amongst others -- be the presentation of:
- the full-text of the paper that is abstracted in the link-source;
- a record abstracting the same publication taken from another abstracting & indexing database;
- citation information corresponding with the link-source;
- library holdings for the journal in which the article described by the link-source appeared.
Global and local relevance of extended services
The adjective relevant is of particular importance in the notion relevant extended services as used in the SFX problem statement. It actually has two meanings: relevance as a global notion and relevance as a local notion. In order to explain this, the following types of extended services are considered:
- full_text: a service providing the full-text that is referred to by a link-source;
- review: a service showing a book review for the item referred to by a link-source;
- abstract: a service that provides the abstract from an abstracting & indexing database for a link-source.
Relevant as a global notion must be interpreted as being opposed to irrelevant in every context. Certain aspects of extended services are independent of the context of an individual collection; they actually apply on a global level:
- full_text: If the publication year of an article is equal to or higher than that of the first electronic issue of the journal in which the article was published, a full_text service has global relevance. On the other hand, it never makes sense to present a full_text service for a link-source referring to a paper in a journal if the publication year of the paper is lower than the publication year of the first issue of the journal for which full-text is globally available. As such, the publication year of the first electronic issue is a constraint of global significance to the full_text service.
- review: It is always irrelevant to present a book review service if the link-source refers to a journal article. But, if the link-source describes a book, such a review service is globally relevant. In this case, the material type is a constraint of global significance for the review service.
- abstract: A constraint of global significance rules the relevance of an abstract service that looks up the abstract of a citation to a journal article in a particular abstracting & indexing database. Such a service is globally relevant if the journal in which the article is published is actually indexed in that abstracting & indexing database and is globally irrelevant otherwise.
Relevant as a local notion refers to the fact that other aspects of extended services are dependent on the boundaries of a certain digital library collection. Local relevance has two manifestations:
- Relevance related to the content of a local collection:
While certain services are relevant in a global sense, they can become irrelevant if the digital library collection does not contain the information resource(s) required to implement them. Even if a full-text service is globally relevant for a certain link-source, it might be considered to be irrelevant in the context of a certain digital library collection if the journal referred to by the link-source is not part of that collection. In the same way, an abstract service pointing to a particular abstracting & indexing database for a given link-source can be globally relevant, as described above. Still, such a service is of no local relevance if the user’s digital library does not provide access to an implementation of that particular database, while it can be of local relevance if the digital library does.
- Relevance related to the implementation of a local collection:
The relevance of extended services will also depend on the technical implementation of the information resource(s) required to create the services. When a full_text service is globally relevant -- an electronic edition of an article exists -- as well as relevant in relation to the content of a certain collection -- the users of the digital library are authorized to access the electronic edition -- it can be regarded as inappropriate to let the full_text service link to a full-text instance at a publisher’s site when the digital library holds an instance in its local storage. In the DLF reference linking discussion, this issue was given the name of "the Harvard problem" (Caplan 1999a). Similar problems occur in the broader scope of extended services. For instance, as shown before, an abstract service can be globally relevant -- the journal in which an article was published is abstracted in a particular abstracting & indexing database -- as well as relevant in relation to the content of the collection -- the local digital library does provide access to the particular database. Still, the service might be irrelevant in relation to the implementation, if the actual implementation of the database does not support a mechanism to link into it using the parameters required to do an abstract look-up.
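The interplay of global and local relevance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SFX implementation: the `LinkSource` and `Collection` shapes, the `FIRST_ELECTRONIC_YEAR` table and the journal name "Journal A" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkSource:
    """Bibliographic metadata used as a starting point (hypothetical shape)."""
    material_type: str                 # "article" or "book"
    journal: Optional[str] = None
    year: Optional[int] = None


@dataclass
class Collection:
    """One digital library collection (hypothetical shape)."""
    fulltext_journals: set = field(default_factory=set)  # content of the collection
    linkable_services: set = field(default_factory=set)  # services with a usable link-to mechanism


# Assumed global fact: publication year of the first electronic issue per journal.
FIRST_ELECTRONIC_YEAR = {"Journal A": 1996}


def full_text_globally_relevant(src: LinkSource) -> bool:
    """Global constraint: the article is not older than the first electronic issue."""
    first = FIRST_ELECTRONIC_YEAR.get(src.journal)
    return (src.material_type == "article" and first is not None
            and src.year is not None and src.year >= first)


def review_globally_relevant(src: LinkSource) -> bool:
    """Global constraint: a book review only makes sense for a book."""
    return src.material_type == "book"


def relevant_services(src: LinkSource, coll: Collection) -> list:
    """Keep a service only if it is globally and locally relevant."""
    services = []
    if (full_text_globally_relevant(src)
            and src.journal in coll.fulltext_journals       # local: content
            and "full_text" in coll.linkable_services):     # local: implementation
        services.append("full_text")
    if review_globally_relevant(src) and "review" in coll.linkable_services:
        services.append("review")
    return services
```

In this sketch a service is dropped as soon as one of the three tests fails, which mirrors the layering in the text: a global constraint first, then the content of the local collection, then its technical implementation.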
Systems supportive of selective resolution
Both issues regarding the local relevance of extended services indicate the need for open linking solutions that take the context of the local collection into account when links are presented to a user (Van de Sompel & Hochstenbach 1999a). When addressing the Harvard problem the DLF reference linking discussions have referred to open linking solutions as being supportive of selective resolution (Caplan & Arms 1999). From the above, it can be seen that the problem of local relevance of extended services is actually a generalization of certain aspects of the Harvard problem. As such, when a framework is able to present an approach to deal with the broader problem, the approach will also contain valuable elements to address the narrower Harvard problem.
Figure 1: Systems supportive of selective resolution
In relation to the Harvard problem, Caplan and Arms divide systems that support selective resolution into two categories:
- Systems with a non-local location database, to which institutions provide a profile describing their full-text collection. The profile controls the selection of links returned to users of that profile.
- Systems with an institutional location database describing the local full-text collection and a global location database as a fall back. In addition to that, there is a mechanism to pass resolution requests to the local resolver first and in the event of a local full-text instance not being found there, to the global resolver.
This categorization can further be generalized by:
- Broadening the scope of the services to be provided beyond the restriction to the full-text, taking into account all kinds of extended services.
- Identifying the crucial components of systems supportive of selective resolution (see Figure 1):
- The redirection mechanism that brings metadata of the link-source for which extended services are requested from the information resource to which the link-source belongs to the service component. The redirection mechanism addresses the problem that has been referred to as grabbing the link-source (Van de Sompel & Hochstenbach 1999a).
- The service component that takes metadata from whichever information resource in the digital library collection as an input, delivering extended services as an output. The service component is an extension of the location database referred to by Caplan and Arms.
- Recognizing that the order of the redirection is subject to variation:
- Redirection of the link-source metadata to the local service component first, using a central service component as a means to complete the set of services that can be presented.
- Redirection of the link-source metadata to the central service component, whose default services can be overwritten and/or completed after communication with the local service component.
| CATEGORY | SERVICE COMPONENT | REDIRECTION ORDER |
| --- | --- | --- |
| Category 1 | central | central |
| Category 2a | central & local | local => central |
| Category 2b | central & local | central => local |
| Category 3 | local | local |

Table 1: categorization of systems supportive of selective resolution
The resulting categorization is represented in Table 1, where 3 main categories of systems supporting selective resolution are shown, based on the nature of the service component and the redirection order:
- Category 1 only has a central service component and hence a central redirection mechanism. To some extent, this is the category under which the NCBI LinkOut solution resides. Still, since that solution is tied in with the PubMed database and cannot be used in connection with other resources, it can hardly be seen as a real service component in the sense described earlier.
- Category 2 has both a central and a local service component that contribute to the presentation of the services. Also, there is some form of communication between both. For this Category, it is possible to imagine both approaches regarding the redirection order mentioned above.
- Category 3 builds purely on a local service component and hence also needs a local redirection mechanism. The SFX implementations of both the Elektron and "SFX@Ghent & SFX@LANL" experiments fall within this Category.
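The two redirection orders of Category 2 can be illustrated with a small Python sketch. It assumes, purely for illustration, that a service component is a function mapping link-source metadata to a dictionary of service names and target URLs; the function names and the example.org URLs are hypothetical.

```python
def category_2a(link_source, local, central):
    """local => central: local services first, the central component completes the set."""
    services = dict(local(link_source))
    for name, target in central(link_source).items():
        services.setdefault(name, target)   # add only what the local component lacks
    return services


def category_2b(link_source, local, central):
    """central => local: central defaults, overwritten and/or completed locally."""
    services = dict(central(link_source))
    services.update(local(link_source))      # local entries overwrite central defaults
    return services


def central_component(link_source):
    # Hypothetical central defaults: full-text at the publisher, plus an abstract.
    return {"full_text": "https://publisher.example.org/article",
            "abstract": "https://central.example.org/abstract"}


def local_component(link_source):
    # Hypothetical local knowledge: a locally stored full-text copy.
    return {"full_text": "https://library.example.org/local-copy"}
```

In this simplified model both orders produce the same bundle, with the local full-text copy taking precedence; what differs is which component receives the link-source metadata first.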
The "SFX@Ghent & SFX@LANL" experiment
In the "SFX@Ghent & SFX@LANL" experiment (April 1999 - June 1999; henceforth referred to as Ghent&LANL), the Library Without Walls team of the Research Library at the Los Alamos National Laboratory (LANL) and the Automation Department of the Central Library at the University of Ghent have cooperated to illustrate the feasibility of the SFX approach as a means to provide extended services in a realistic and complex information environment.
The information environment in which Ghent&LANL has been conducted is dramatically different from the one of the first Elektron SFX experiment. To illustrate, Table 2 presents an overview of the information resources used in Ghent&LANL. The rows show the names of the information resources used in the experiment, the columns refer to the digital library collection. For each resource/collection combination the table indicates:
- The Type of resource: OPAC system, abstracting & indexing database (A&I), full-text collection (FTXT) or web-service (WWW);
- The Authority running the resource;
- Whether within the digital library collection, the resource is used as a Source. If so, information entities from the resource can be link-sources for which extended services can be requested. If a resource is a Source, the authority running it has made it SFX-aware;
- Whether within the digital library collection, the resource is used as a Target. If so, the resource is used to be linked into in order to provide extended services. If a resource is a Target, a link-to syntax has been developed by the authority running the resource, in order to allow for it to be the Target of dynamic SFX-links.
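| RESOURCE | Type | GHENT: Authority | GHENT: Source | GHENT: Target | LANL: Authority | LANL: Source | LANL: Target |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Advance | OPAC | - | - | - | LANL | yes | yes |
| Aleph 500 | OPAC | Ghent | yes | yes | - | - | - |
| Amazon.com | WWW | Amazon | no | yes | Amazon | no | yes |
| Antilope | OPAC | UA | no | yes | - | - | - |
| APS PROLA | FTXT | APS | yes | yes | APS | yes | yes |
| the arXiv | FTXT | LANL | yes | yes | LANL | yes | yes |
| BIOSIS | A&I | Ghent | yes | no | LANL | yes | no |
| Books in Print | A&I | Ghent | yes | yes | Ghent | yes | yes |
| Compendex | A&I | Ghent | yes | no | LANL | yes | no |
| Current Contents | A&I | Ghent | yes | yes | Ghent | yes | yes |
| EconLit | A&I | Ghent | yes | no | - | - | - |
| Genome base | A&I | NCBI | no | yes | NCBI | no | yes |
| Inspec | A&I | - | - | - | LANL | yes | no |
| Inspec | A&I | SP | no | yes | SP | no | yes |
| Ulrich’s | A&I | Ghent | yes | yes | - | - | - |
| LiSa | A&I | Ghent | yes | yes | - | - | - |
| MathSci | A&I | Ghent | yes | no | - | - | - |
| Medline | A&I | Ghent | yes | no | - | - | - |
| Medline | A&I | NCBI | no | yes | NCBI | no | yes |
| SciSearch | A&I | LANL | yes | yes | LANL | yes | yes |
| ScienceServer | FTXT | LANL | no | yes | LANL | no | yes |
| Various | FTXT | various | no | yes | various | no | yes |
| Wiley InterScience | FTXT | Wiley | yes | yes | Wiley | yes | yes |

Table 2: information resources in Ghent&LANL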
Some considerations regarding Table 2:
- As can be seen, some resources are available in both digital library collections, but run on different technical implementations. This is the case for BIOSIS and Compendex, which in Ghent run on a SilverPlatter ERL platform, while LANL -- at the time of the experiment -- used a Geac Advance implementation.
- For the purpose of this experiment, Ghent and LANL share some of their resources. Ghent makes its SilverPlatter ERL version of Books in Print and Current Contents available for LANL, whereas LANL opens access to its Topic implementation of the ISI Science Citation Index (SciSearch) and its ScienceServer storing the full-text of all Elsevier journals.
- Ghent uses two Medline versions: a locally stored ERL version as Source and the NCBI PubMed version as Target. Similarly, LANL uses two Inspec versions: the local Geac Advance implementation as a Source and an ERL implementation run by SilverPlatter in Boston as a Target. Time constraints that prevented the development of appropriate link-to syntaxes for the local versions are the reason for this peculiarity.
- Of special importance is the fact that some journals from the Wiley InterScience collection as well as the complete PROLA archive of the American Physical Society are made SFX-aware (Halstead 1999; Spilka 1999). Both Ghent and LANL can use citations in the full-text of these repositories as link-sources for SFX requests. Also, in the course of this experiment Wiley has implemented a link-to syntax that will be brought into production later in 1999. For the PROLA archive such a link-to syntax was already available.
- Some resource names require a little more explanation. Aleph 500 is the Ghent Integrated Library System, Advance is the one for LANL, while Antilope is the Belgian Union Catalogue of Serials run by the University of Antwerp. The header "various" refers to a variety of full-text repositories to which dynamic links are available in this experiment. This is -- amongst others -- the case for Academic Press, Company of Biologists, HighWire, Springer, American Chemical Society, etc. The arXiv is the Topic implementation of the Ginsparg arXiv e-print repository, developed by the Library Without Walls team of the LANL Library. It has also been made SFX-aware.
- As can be seen from careful exploration of Table 2, from the point of view of each digital library collection, the SFX-aware information resources are highly distributed. Some resources are run by the institutional library automation team while others are run remotely, actually by three external authorities. From the point of view of Ghent these authorities are LANL, Wiley and the American Physical Society; from the point of view of LANL, they are Ghent, Wiley and the American Physical Society.
From the above, it can be concluded that, given the number of resources involved, their distributed nature and the availability of multiple SFX service components, Ghent&LANL is a very realistic experiment.
The need for a generalization of the SFX components
Although the fundamental concepts of SFX -- dynamic linking, just-in-time linking and conceptual services (see Van de Sompel & Hochstenbach 1999b) -- have been left untouched for the Ghent&LANL experiment, the nature of its working environment and its goals have led to a strong generalization of the SFX components. The main impulses that inspired such a generalization and that distinguish the Ghent&LANL project from the Elektron experiment are:
- The extension of the digital library collection in which SFX was being tested beyond a well-controlled sub-collection of one institution. In Ghent&LANL, SFX is introduced in the realistic, complex and dissimilar digital libraries of two autonomous institutions each running their local SFX components;
- The extension of the scope of data for which extended services can be requested beyond the internally stored collections. Link-sources in Ghent&LANL also originate from resources held by external authorities;
- The extension of the datatypes for which extended services can be requested beyond abstracting & indexing databases and OPAC systems. Link-sources in Ghent&LANL can also be citations in journal articles;
- The accommodation of extended services linking into target resources, based on metadata in general, not only SICI-related metadata;
- The need for high transportability of the SFX solution between the digital library environments that are involved.
The redesign of the SFX solution for Ghent&LANL leads to an architecture with a clear separation between the redirection component and the service component. Both components obviously interoperate in order to achieve a functional system. But the redirection component can potentially operate in an environment with non-SFX service components, while the SFX service component can equally function with another redirection mechanism, as long as that supports delivery of link-source metadata to the SFX service component. Several functional building blocks in both components have also been generalized in order to address the problems that arise from the complexity of the Ghent&LANL environment. The overall approach of the generalized solution is shown in Figure 2 and will be explained in more detail in the remainder of this paper.
Information resources that can interoperate with SFX -- from now on referred to as SFX-aware systems -- insert an SFX-button for each link-source in the result set of a query. The just-in-time approach of SFX requires the user to click such an SFX-button when requesting extended services for a specific link-source record. In response to this click, the local SFX redirection component will fetch link-source metadata -- usually -- from the origin resource using whichever protocol it takes to do so. Next, link-source metadata as well as information on its origin will be converted into an interfacing format. At this point, the local redirection mechanism has fulfilled its task and is able to deliver this information in a consistent representation to the local SFX service component.
Figure 2: the local redirection and service components of the generalized SFX solution
The first task of the local service component is to parse the information, handed over by the local redirection component, into a normalized internal representation object. During this process, the original content can be enhanced and/or augmented. The resulting information object is then fed into the SFX evaluation process in which it will be compared to the SFX-database. The SFX-database is a special kind of linking database. Unlike traditional linking services, it does not contain any static links between "documents" (records/citations/full-text/etc.) of a collection. Rather, it contains a collection of conceptual services that express potential inter-relationships between documents at the level of the resource from which they originate. The SFX evaluation process determines the relevance of each of these conceptual services using the -- lack of -- content in the information object. Next, the resulting bundle of relevant services is sent back to the user in the SFX-menu-screen. Consistent with the just-in-time approach of SFX, only when the user decides to use a service from the bundle, will the service be resolved into a URL to which the user is being redirected.
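The evaluation step and the just-in-time resolution can be sketched as follows. This is a minimal Python illustration under assumptions, not the actual SFX-database: the `ConceptualService` structure, the field names (`issn`, `volume`, `spage`), the origin label "ERL/BX" and the example.org link-to syntaxes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import quote


@dataclass
class ConceptualService:
    name: str
    source: str                                 # origin resource the service applies to
    required: tuple                             # metadata fields that must be present
    resolve: Callable[[Dict[str, str]], str]    # builds a URL, called only on demand


def evaluate(info: Dict[str, str], sfx_db: List[ConceptualService]) -> List[ConceptualService]:
    """Keep services defined for the origin resource whose required fields are present."""
    return [s for s in sfx_db
            if s.source == info.get("source")
            and all(info.get(f) for f in s.required)]


# Hypothetical conceptual services for link-sources originating from one resource.
SFX_DB = [
    ConceptualService(
        "holdings", "ERL/BX", ("issn",),
        lambda m: "https://opac.example.org/holdings?issn=" + quote(m["issn"], safe="")),
    ConceptualService(
        "full_text", "ERL/BX", ("issn", "volume", "spage"),
        lambda m: "https://fulltext.example.org/%s/%s/%s" % (
            quote(m["issn"], safe=""), quote(m["volume"], safe=""), quote(m["spage"], safe=""))),
]

info = {"source": "ERL/BX", "issn": "0008-543X"}   # volume and start page are missing
menu = evaluate(info, SFX_DB)                      # what the SFX-menu-screen would list
chosen_url = menu[0].resolve(info)                 # resolved only after the user picks a service
```

Note that no URL is computed while the menu is built; `resolve` runs only for the service the user selects, which is the just-in-time behaviour described above.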
The SFX mechanism for local redirection
The task of the local redirection mechanism is to transport link-source metadata to the local redirection component, which interfaces with the local service component. In order to be able to interoperate with the SFX redirection mechanism, information resources need to be enhanced by the authorities running them in order to make them SFX-aware. The aim of this is to create the ability for information resources to insert an SFX-button targeted at the local redirection component for each link-source in the result set of a query into the resource. In the context of Ghent&LANL, the following are important considerations in this regard:
- (a) Many information resources that are involved in the experiment are also used in normal production at the very same time. This means that they are also approached by users that do not have access to an SFX service component. In order to prevent such a user from seeing an irrelevant SFX-button, an SFX-aware resource must be able to recognize whether the user has access to an SFX service component or not. Based on that information, the resource can insert an SFX-button or not.
- (b) Some information resources are approached by users from both digital library environments, hence with access to different SFX service components. An SFX-aware resource must be able to target the SFX-button at the appropriate local redirection component, in order for it to be able to deliver the link-source metadata from the origin information resource to the doorstep of the appropriate service component. This means that an SFX-aware resource must be able to parameterize the target of an SFX-button.
- (c) Upon receipt of a request for extended services from a user, the local redirection component must be able to fetch the link-source metadata from its origin resource. This means that the local redirection component has to be informed about the origin and the identity of the link-source in order to be able to take the appropriate steps. Given the number, distribution and diversity of the SFX-aware resources in Ghent&LANL, a consistent manner to communicate such information to the local service components is required.
- (d) Link-source metadata must be fetched from a wide variety of distributed information resources that support different access protocols. In addition to that, those resources will respond by sending link-source metadata formatted according to different metadata schemes. In order for the local redirection component to be able to interface in a generic manner with the local service component, a single metadata interchange format is desirable.
As the detailed description below shows, these issues are addressed as follows:
- For (a) and (b): the CookiePusher mechanism;
- For (c): the consistent SFX-URL structure;
- For (d): the SourceParser solution.
Making information resources SFX-aware
The authorities running information resources need to enhance their systems in order to make them SFX-aware. The complexity of the Ghent&LANL environment has called for a thoughtful exploration of ways to make resources SFX-aware, since only approaches that minimize the overhead for the authorities running the resources can be acceptable and workable. In the current implementation of the SFX redirection mechanism, they have to do this by:
- Installing the CookiePusher script delivered by the project managers of Ghent&LANL;
- Hyperlinking the SFX-buttons for link-sources using a URL that complies with a predefined format.
The CookiePusher
The CookiePusher script is a pragmatic solution introduced to dynamically notify an information resource about the existence and location of a local SFX redirection component in the environment of the user consulting the resource. The underlying idea is that an information resource could at any time access the location of a local redirection component, if its URL were written as a cookie in the browser of the user consulting the resource. The availability of this URL is essential, since the resource must be able to dynamically target the SFX-button at the appropriate local component. However, for reasons of security and privacy, such browser cookies can at most be read within the Internet domain of the server that has set the cookie (see Shishir 1996 pages 203-204). As such, it is impossible to set such a cookie so that it can be read by all information systems in a digital library collection when it consists of resources distributed over several domains, typically resources that are local and remote to the user’s institution.
In order to solve this problem, the first step in connecting to a resource is to request a server in the domain of the information resource to create an HTTP cookie. This detour is called the CookiePusher. The very simple CookiePusher script is installed in the domain of the information resource that has to be made SFX-aware. Rather than connecting immediately to the desired URL in the information resource, a connection is made to the resource’s CookiePusher first, sending values for the two parameters of the CookiePusher script:
- SFX_location: the URL of the local redirection component of the SFX solution;
- Redirect: the desired URL in the resource.
Upon receipt of these parameters, the CookiePusher will first read the URL of the local redirection component and will use it to set a cookie in the user’s browser. Since the CookiePusher is in the domain of the resource, that cookie will be readable by the resource. Next, the CookiePusher will redirect the user to the desired URL in the resource.
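The behaviour just described can be sketched as a small Python function. This is an assumption-laden illustration rather than the script distributed in the experiment: the function name `cookie_pusher`, the use of an HTTP 302 response and the choice to store the location URL-encoded in a cookie named `local_SFX` are assumptions of this sketch.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, quote


def cookie_pusher(query_string: str):
    """Return the status and headers a CookiePusher-like script could send."""
    params = parse_qs(query_string)            # parse_qs URL-decodes the values
    sfx_location = params["SFX_location"][0]   # URL of the local redirection component
    redirect = params["Redirect"][0]           # desired URL in the resource

    # Set the cookie from within the resource's own domain, so the resource can read it.
    cookie = SimpleCookie()
    cookie["local_SFX"] = quote(sfx_location, safe="")

    headers = [
        ("Set-Cookie", cookie["local_SFX"].OutputString()),
        ("Location", redirect),                # send the user on to the resource
    ]
    return "302 Found", headers


status, headers = cookie_pusher(
    "SFX_location=http%3A%2F%2Fisiserv.rug.ac.be%2Fcgi-bin%2Fsfx%2Fbin%2Fmenu.cgi"
    "&Redirect=http%3A%2F%2Fpublish.aps.org%2Fedaccess%2Fprolatest%2Ftext%2FPRD%2Fv52%2Fi1%2Fp15_1"
)
```

Because the response comes from a server in the resource's own domain, the browser will return the cookie on later requests to that resource, which is the point of the detour.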
As such, once the CookiePusher has been installed for a resource, the URL to connect to that resource will be changed to:
CookiePusher_URL?SFX_location=local_SFX&Redirect=service_URL
Where
- CookiePusher_URL is the URL of the CookiePusher script;
- local_SFX is the URL of the local SFX redirection component;
- service_URL is the desired URL in the information resource as used under normal -- non-SFX -- conditions. Such a URL can point at the initial search screen for an abstracting & indexing database, it can be a URL linking to an article at a publisher’s site, etc.;
- local_SFX and service_URL are URL-encoded.
For instance:
http://publish.aps.org/edaccess/prolatest/cookiepusher?SFX_location=http%3A%2F%2Fisiserv.rug.ac.be%2Fcgi-bin%2Fsfx%2Fbin%2Fmenu.cgi&Redirect=http%3A%2F%2Fpublish.aps.org%2Fedaccess%2Fprolatest%2Ftext%2FPRD%2Fv52%2Fi1%2Fp15_1
is the URL used to connect to an item in the APS/PROLA domain. The APS/PROLA CookiePusher will read the location of the local redirection component from the SFX_location parameter and will use this to set a cookie named local_SFX with value:
http%3A%2F%2Fisiserv.rug.ac.be%2Fcgi-bin%2Fsfx%2Fbin%2Fmenu.cgi
which is the encoded location of the Ghent local SFX redirection component. Next, it will redirect the user to the desired location in the APS/PROLA:
http://publish.aps.org/edaccess/prolatest/text/PRD/v52/i1/p15_1
From now on, at any point in the consultation, APS/PROLA will be able to read this cookie and use it to target -- in this case -- the Ghent redirection component.
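On the side of the digital library, the only change is the URL used to enter the resource. A short Python sketch of how such a URL could be assembled is given here; the helper name `cookiepusher_url` is hypothetical, and the values are those of the APS/PROLA example.

```python
from urllib.parse import quote


def cookiepusher_url(cookiepusher: str, local_sfx: str, service_url: str) -> str:
    """Wrap the normal entry URL of a resource in a CookiePusher detour."""
    return (f"{cookiepusher}"
            f"?SFX_location={quote(local_sfx, safe='')}"   # URL-encode both parameters
            f"&Redirect={quote(service_url, safe='')}")


entry = cookiepusher_url(
    "http://publish.aps.org/edaccess/prolatest/cookiepusher",
    "http://isiserv.rug.ac.be/cgi-bin/sfx/bin/menu.cgi",
    "http://publish.aps.org/edaccess/prolatest/text/PRD/v52/i1/p15_1",
)
```

Passing `safe=''` makes `quote` encode the slashes as well, which matches the encoded form shown in the example.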
The consistent SFX-URL structure
The essence of the detour made via the CookiePusher is the ability it creates for an information resource to know at any point whether the consulting user has access to a selective resolution system and, if so, what the location of its redirection component is. Based on that information, the resource can dynamically decide whether or not to insert an SFX-button for search results and if it does, which redirection component to target with the SFX-button. In order to make the many systems involved in the Ghent&LANL experiment interoperable with SFX, authorities running the systems have been asked to make the URL targeted by the SFX-button -- the SFX-URL -- compliant with the following format:
| FORM | SFX-URL |
| --- | --- |
| GENERAL | target?serviceDesc&objectDesc |
| DETAILED | local_SFX?vendorId=<theVendor>&databaseId=<theBase>&objectDesc=<theIdentifier> |

Table 3: the syntax of the SFX-URL
In Table 3
- target is the URL of the local redirection component of the SFX solution;
- serviceDesc uniquely defines the origin resource. It contains information on the vendor of the resource and on the resource itself. It is of the form:
vendorId=<theVendor>&databaseId=<theBase>.
serviceDesc information will play a crucial role at later stages of the SFX local redirection mechanism, as well as in the SFX-base which is central to the SFX service component.
- objectDesc contains information that relates to the identity of the link source. Its syntax and content are extremely flexible and are defined by the authority running the resource, which makes them dependent on the vendor and its database implementation. objectDesc typically contains the unique record identifier for a link-source in its origin resource. Alternatively, or in addition to that, it can contain SICI-like metadata. In some cases, it can even contain all metadata of the link-source.
- The parameter values <theVendor>, <theBase> and <theIdentifier> are URL-encoded.
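As an illustration of the syntax in Table 3, the following Python sketch assembles an SFX-URL from its parts. The function name build_sfx_url is hypothetical, and the choice to percent-encode every reserved character in the parameter values is an assumption. The text only states that the values are URL-encoded.

```python
from urllib.parse import quote, urlsplit, parse_qs


def build_sfx_url(target, vendor_id, database_id, object_desc):
    """Assemble target?serviceDesc&objectDesc with URL-encoded parameter values."""
    def enc(value):
        return quote(value, safe="")   # encode '&', '=', '/', ':' and spaces too

    service_desc = f"vendorId={enc(vendor_id)}&databaseId={enc(database_id)}"
    return f"{target}?{service_desc}&objectDesc={enc(object_desc)}"


if __name__ == "__main__":
    url = build_sfx_url(
        "http://vole.lanl.gov/cgi-bin/sfx/bin/menu.cgi",
        "LANLTopic",
        "arXiv",
        "fetchId=phys-9811004&objectId=physics/9811004",
    )
    print(url)
    # The receiving script gets the original objectDesc back after decoding.
    print(parse_qs(urlsplit(url).query)["objectDesc"][0])
```

Encoding the whole objectDesc as a single value keeps its internal '&' and '=' characters from being mistaken for separators of the SFX-URL itself.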
Figure 3 to Figure 6 show examples of link-sources taken from Sources in the Ghent and/or LANL collections, together with their SFX-URL. For reasons of readability, the parameter values are not shown in URL-encoded form. Instead, the parts that should be URL-encoded are indicated by enclosing them in a URLencode function.
SFX-URL for this link-source, pointing at the Ghent local redirection component:
http://isiserv.rug.ac.be/cgi-bin/sfx/bin/menu.cgi?vendorId=ERL&databaseId=BX
&objectDesc=URLencode(BX02 A:199900063465 I:0008-543X V:00085 S:000001 P:000065 Y:1999)
In the serviceDesc part of the URL, ERL refers to the SilverPlatter ERL implementation of BIOSIS, while BX is the family name of BIOSIS databases in the ERL environment. The objectDesc component contains several information elements in a tagged and fixed length representation. BX02 is the volume of the BIOSIS database where the link-source originates, while 199900063465 is the accession number, a unique record number of the link-source in BIOSIS. Other elements in the objectDesc are ISSN number, volume, issue, starting page and publication year.
Figure 3: a link-source from the Ghent ERL implementation of BIOSIS and its SFX-URL
SFX-URL for this link-source, pointing at the LANL local redirection component:
http://vole.lanl.gov/cgi-bin/sfx/bin/menu.cgi?vendorId=ADVANCE&databaseId=Biosis
&objectDesc= URLencode(fetchId=21179970&objectId=PREV199800135979&SICI=0016-6731(1998)148:2<645:TIOCTA>2.0.TX\;2-P)
The serviceDesc part of this URL is self-explanatory. The objectDesc component is tagged and fields can have variable lengths. The fetchId is the unique number of the link-source in the LANL implementation of BIOSIS, while the part of objectId after "PREV" is the BIOSIS accession number which is comparable to the A field in the SilverPlatter objectDesc of Figure 3. The SICI part contains a SICI for the link-source, from which ISSN, volume, issue, pagination and publication year can be derived.
Figure 4: a link-source from the LANL Advance implementation of BIOSIS and its SFX-URL
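How ISSN, volume, issue, starting page and publication year can be derived from a SICI such as the one in Figure 4 can be sketched as follows. The regular expression is an assumption that covers only the simple pattern visible in that example, not the full SICI standard, and the function name parse_sici is hypothetical.

```python
import re

# Covers only the shape ISSN(year)volume:issue<page:...> seen in Figure 4.
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"      # ISSN
    r"\((?P<year>\d{4})[^)]*\)"         # chronology, of which only the year is kept
    r"(?P<volume>[^:<]+)"               # volume
    r"(?::(?P<issue>[^<]+))?"           # optional issue
    r"<(?P<start_page>[^:>]+)"          # starting page
)


def parse_sici(sici):
    """Return a dict of citation elements, or None if the SICI is not recognized."""
    match = SICI_PATTERN.match(sici)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_sici("0016-6731(1998)148:2<645:TIOCTA>2.0.TX;2-P"))
```

A SICI that does not match returns None, which a SourceParser could treat as a signal to rely on the fetched record instead.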
SFX-URL for the third reference as a link-source, pointing at the Ghent local redirection component:
http://isiserv.rug.ac.be/cgi-bin/sfx/bin/menu.cgi? vendorId=Wiley&databaseId=WIS
&objectDesc= URLencode(TYPE=JCIT& SNM=Saven&FNM=A&SNM=Piro&FNM=L&ATL= The newer purine analogues for the treatment of hairy-cell leukemia.&JTL=N Engl J Med &PYR=1994&VID=330&PPF=691&PPL=7)
The serviceDesc component now refers to the Wiley InterScience collection. The objectDesc is tagged and starts with an indication of the material type of the reference -- a journal citation in this case -- followed by a tagged repetition of the full citation.
Figure 5: a link-source from Wiley InterScience and its SFX-URL
SFX-URL for the first link-source in the above result screen, pointing at the LANL local redirection component:
http://vole.lanl.gov/cgi-bin/sfx/bin/menu.cgi?vendorId=LANLTopic&databaseId=arXiv
&objectDesc= URLencode(fetchId=phys-9811004&objectId=physics/9811004)
The serviceDesc refers to the LANL Topic implementation of the Ginsparg e-print archive. The fetchId is the unique key for the record in that implementation, while the -- very similar -- objectId is the unique record number in Ginsparg’s implementation of the archive. No further metadata is available in the objectDesc.
Figure 6: a link-source from the arXiv and its SFX-URL
Fetching link-source metadata from an SFX-aware information resource with SourceParsers
The CookiePusher mechanism enables a resource to insert an SFX-button for each of the link-sources that are transferred to a user consulting the resource. The structure of the SFX-URL targeted by these SFX-buttons has been made consistent across resources to be of the form target?serviceDesc&objectDesc. When a user requests extended services by clicking such an SFX-button, a request is sent to the user's local SFX redirection component, which receives the serviceDesc and objectDesc values as parameters for the target script. The local component holds a collection of SourceParser scripts with names corresponding to valid serviceDesc values (see Table 4). Having analyzed the serviceDesc information, the target script will launch the appropriate SourceParser. This serviceDesc-specific SourceParser uniquely implements:
- The interpretation of the information contained in the objectDesc parameter based upon the syntax defined by the vendor (see examples in Figure 3 to Figure 6);
- The mechanism to fetch the link-source from its origin resource based on its origin and on the content of its objectDesc. Table 4 shows those fetch mechanisms for the examples of Figure 3 to Figure 6. As can be seen, no real fetching is required for the Wiley citations, since these are completely transferred in the objectDesc part of the SFX-URL. The same technique is used for citations in the PROLA archive. Both the Ghent and LANL BIOSIS implementations deliver some SICI-related metadata in the objectDesc. But since several extended services that SFX aims to deliver require more metadata, a fetch is required in order to obtain more complete information. Since the objectDesc for the arXiv only contains an identifier, a fetch is always required;
- The conversion of the fetched link-source metadata, that is expressed in the metadata scheme supported by the authority running the origin resource, into a metadata container compliant with the scheme of the unique metadata interchange format. This metadata container is the interface between the local redirection component and the local service component.
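| RESOURCE | serviceDesc (vendorId) | serviceDesc (databaseId) | SourceParser | Fetch protocol | Fetch key |
| --- | --- | --- | --- | --- | --- |
| the arXiv | LANLTopic | arXiv | S::LANLTopic:arXiv | HTTP | fetchId |
| BIOSIS | ERL | BX | S::ERL::BX | Z39.50 | A |
| BIOSIS | ADVANCE | Biosis | S::ADVANCE::Biosis | Z39.50 | fetchId |
| Wiley | Wiley | WIS | S::Wiley::WIS | none | none |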
Table 4: Some SFX-aware resources with their serviceDesc, Fetch protocol and Fetch key
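The selection of a SourceParser on the basis of the serviceDesc can be sketched in Python as shown below. This is an illustration only: the function names are hypothetical, only two of the parsers from Table 4 are shown, the actual fetch over Z39.50 or HTTP is left out, and the tag-splitting rules are assumptions derived from the examples in Figure 3 and Figure 5.

```python
def parse_erl_bx(object_desc):
    """Split the objectDesc of Figure 3: a database volume, then TAG:value fields."""
    volume_id, *tagged_fields = object_desc.split()
    record = {"dbVolume": volume_id}
    for tagged_field in tagged_fields:
        tag, _, value = tagged_field.partition(":")
        record[tag] = value
    # A complete SourceParser would now fetch the full record, using record["A"]
    # as the fetch key (see Table 4). That step is omitted in this sketch.
    return record


def parse_wiley_wis(object_desc):
    """Split the tagged citation of Figure 5. Repeated tags are kept in order."""
    record = {}
    for pair in object_desc.split("&"):
        tag, _, value = pair.partition("=")
        record.setdefault(tag.strip(), []).append(value.strip())
    return record   # no fetch needed: the objectDesc carries the full citation


# Keys are (vendorId, databaseId) pairs, that is, the serviceDesc.
SOURCE_PARSERS = {
    ("ERL", "BX"): parse_erl_bx,
    ("Wiley", "WIS"): parse_wiley_wis,
}


def run_source_parser(vendor_id, database_id, object_desc):
    """Launch the SourceParser registered for this serviceDesc."""
    parser = SOURCE_PARSERS.get((vendor_id, database_id))
    if parser is None:
        raise LookupError(f"no SourceParser for S::{vendor_id}::{database_id}")
    return parser(object_desc)


if __name__ == "__main__":
    print(run_source_parser(
        "ERL", "BX",
        "BX02 A:199900063465 I:0008-543X V:00085 S:000001 P:000065 Y:1999"))
    print(run_source_parser(
        "Wiley", "WIS",
        "TYPE=JCIT&SNM=Saven&FNM=A&SNM=Piro&FNM=L&PYR=1994&VID=330&PPF=691&PPL=7"))
```

Keeping the parsers in a table keyed on the serviceDesc means that making a new resource SFX-aware only requires registering one more parser.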
The SFX service component
The task of the local SFX service component starts at the point where the local redirection mechanism hands over the metadata container that contains, in a consistent representation:
- link-source metadata that became available through the local redirection mechanism;
- information on the origin of the link-source, basically serviceDesc information.
It is the task of the SFX service component to deliver extended services based on this information. The following are important considerations regarding the SFX service component in Ghent&LANL:
- (a) The amount and quality of link-source metadata that becomes available in the metadata container depends on the type of resource from which the link-source originated and on the amount of information that the authority running the origin resource allows and/or supports to be fetched. In some cases such metadata can be corrupt or lack information that is essential for the SFX evaluation process to adequately perform its task;
- (b) The SFX service component must be easily transportable between different digital library environments and remain easily manageable;
- (c) The SFX service component must ultimately deliver service links in a just-in-time manner.
As can be seen from a detailed description of the SFX service component, these problems have been approached by:
- For (a): the GenericRequest object;
- For (b): a generalization of the implementation of the SFX-database, that explicitly reflects the notion of global and local relevance of conceptual services as well as the notion of global and local Thresholds;
- For (c): the TargetParser solution.
The GenericRequest object
The service component will take the metadata container delivered by the local redirection mechanism as input and turn it into a normalized internal representation, called the GenericRequest object. Table 5 shows a representation of the GenericRequest object for the third citation in Figure 5. The GenericRequest object is an intelligent object that is able to check the validity of its own information elements based on pre-configured rules. It can also augment its content using information from a supporting database. For instance, the citation of Figure 5 contains neither an ISSN nor a full journal title, but only an abbreviated journal title. In this case, the GenericRequest object augments its content by retrieving the missing information from a supporting database. The GenericRequest object also contains a normalized version of the link-source metadata, as well as information about its origin.
At the time of the experiment, interoperability between the SFX local service component and non-SFX local redirection mechanisms was not an issue, since none existed. As such, for reasons of simplicity, the metadata scheme of the GenericRequest object has fulfilled the role of interfacing metadata scheme between the local redirection and the local service component in Ghent&LANL.
<perldata>
  <hash>
    <item key="rec$vendorId">Wiley</item>
    <item key="rec$databaseId">WIS</item>
    <item key="rec$dbId">Wiley::WIS</item>
    <item key="objectType">JOURNAL</item>
    <item key="@abbrevTitle">
      <array>
        <item key="0">N ENGL J MED</item>
      </array>
    </item>
    <item key="journalTitle">NEW ENGLAND JOURNAL OF MEDICINE</item>
    <item key="ISSN">0028-4793</item>
    <item key="year">1994</item>
    <item key="volume">330</item>
    <item key="startPage">691</item>
    <item key="endPage">7</item>
    <item key="@authLast">
      <array>
        <item key="0">Saven</item>
        <item key="1">Piro</item>
      </array>
    </item>
    <item key="@authInit">
      <array>
        <item key="0">A</item>
        <item key="1">L</item>
      </array>
    </item>
    <item key="articleTitle">The newer purine analogues for the treatment of hairy-cell leukemia.</item>
  </hash>
</perldata>
Table 5: Representation of an augmented GenericRequest object for the link-source of Figure 5
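The self-checking and augmenting behavior of the GenericRequest object can be sketched in Python as follows. This is not the SFX implementation: the class layout, the method names augment and problems, the two validity rules and the in-memory JOURNALS dictionary standing in for the supporting database are all assumptions made for illustration. The field names and values are taken from Table 5.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the supporting database, keyed on abbreviated title.
JOURNALS = {
    "N ENGL J MED": {
        "journalTitle": "NEW ENGLAND JOURNAL OF MEDICINE",
        "ISSN": "0028-4793",
    },
}

ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")


@dataclass
class GenericRequest:
    vendor_id: str
    database_id: str
    fields: dict = field(default_factory=dict)

    @property
    def db_id(self):
        """Origin of the link-source, as in the rec$dbId item of Table 5."""
        return f"{self.vendor_id}::{self.database_id}"

    def augment(self, journals=JOURNALS):
        """Add elements that are missing but known for the abbreviated title."""
        abbrev = self.fields.get("abbrevTitle", "").upper()
        for key, value in journals.get(abbrev, {}).items():
            self.fields.setdefault(key, value)   # never overwrite existing data
        return self

    def problems(self):
        """Apply the (assumed) pre-configured rules and list what fails."""
        issues = []
        issn = self.fields.get("ISSN")
        if issn is not None and not ISSN_PATTERN.match(issn):
            issues.append("ISSN is malformed")
        year = self.fields.get("year", "")
        if year and not (year.isdigit() and len(year) == 4):
            issues.append("year is not a four-digit number")
        return issues


if __name__ == "__main__":
    request = GenericRequest(
        "Wiley", "WIS",
        {"abbrevTitle": "N Engl J Med", "year": "1994",
         "volume": "330", "startPage": "691"},
    ).augment()
    print(request.db_id, request.fields["ISSN"], request.problems())
```

Because augment only fills in missing elements, metadata supplied by the origin resource always takes precedence over the supporting database.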
The SFX linking service and the SFX-base
As a result of the above, an instance of the GenericRequest object for the link-source for which extended services have been requested has become available to the SFX service component. It will be the task of this component to deliver the extended services to the user that has requested them. In this sense, the SFX service component is a linking service that, given a certain input "document", outputs "documents" related to the input. The SFX linking service is special, however, since it does not store static relationships between individual documents. Rather, it stores relationships between the resources from which the documents originate. In SFX, these relationships are called conceptual services and they are stored in the SFX-base. The SFX evaluation process will determine the relevance of each of these conceptual services based upon the information and origin of a link-source.
The requirement imposed on the Ghent&LANL implementation of the SFX service component to be easily transportable between different digital library environments has led to an important generalization of the design of the SFX-base. This has been achieved by explicitly reflecting the notion of global and local relevance of services in the implementation. A synthesized representation of the lay-out of the Ghent&LANL SFX-base is given in Figure 7.
Figure 7: Simplified lay-out of the SFX-base
Splitting the Colli table
As in the Elektron version of the SFX-base, the Source table contains the information resources that can be origins for link-sources. They are SFX-aware resources. In the Elektron version, the Colli contained conceptual services, directly coupled with the Target resources (see Table 2 in (Van de Sompel & Hochstenbach 1999b)). Such a set-up was not sufficiently generic and, in the current design, this Colli has been split. One table has kept the name Colli, the other has been named the Target table. The Target table contains those resources into which linking is possible. The Colli table, which connects the Source and Target tables, now expresses the type of service that relates Source resources to Target resources. Table 6 shows the types of services implemented in Ghent&LANL.
| COLLI SERVICES | FUNCTION |
| --- | --- |
| abstract | look-up of abstract information in an abstracting & indexing database for the item represented by the GenericRequest object |
| author | look-up of references by an author of the item represented by the GenericRequest object in an abstracting & indexing database |
| cited_author | look-up of citations to work by an author mentioned in the GenericRequest object |
| cited_reference | look-up of works citing the item represented by the GenericRequest object |
| full_text | link to the full-text of the item represented by the GenericRequest object |
| genome | look-up of sequence information found in the GenericRequest object |
| holding | holdings look-up in an OPAC system for the item represented by the GenericRequest object |
| review | look-up of a book review for the item represented by the GenericRequest object |
Table 6: Services in the Colli and their function
Taking advantage of the global relevance of conceptual services
It is not a coincidence that the resources shown as Source and/or Target carry their globally common names rather than those of their local implementations in Ghent or LANL. This is actually a reflection of the conclusion that services relating Source and Target resources have global relevance. It is globally relevant to deliver an abstract service that, given a link-source from BIOSIS shows the corresponding abstract from Medline. Such a conceptual service can be imagined regardless of the implementations of each of these resources in a specific digital library. Therefore, the Ghent&LANL SFX-base expresses the relationships between Sources and Targets at the level of global relevance: there is an abstract service connecting BIOSIS and Medline, regardless of their local implementations. A very limited number of examples of how such services of global relevance connect Source and Target is shown in Table 7.
COLLI

| SOURCE | SERVICE | TARGET |
| --- | --- | --- |
| APS/PROLA | abstract | Inspec |
| the arXiv | author | Inspec |
| BIOSIS | abstract | Medline |
| BIOSIS | genome | Genome Base |
| Current Contents | abstract | LiSa |
| EconLit | review | Books in Print |
| Inspec | full_text | Springer |
| Wiley | abstract | Medline |
| Wiley | cited_reference | Science Cit. Base |
Table 7: Examples of service relationships between Sources and Targets
Localization of services of global relevance
While the services shown in Table 7 are of global relevance, they do not take into account issues of relevance in relation to the local digital library collection. This localization of services of global relevance is achieved by:
- The introduction of fields referring to the local implementations, next to the globally common names.
As shown in Table 8 and Table 9, a key reflecting the serviceDesc values of the local implementations of resources -- found in the rec$dbId field of the GenericRequest object -- is added next to the global common name of the Sources. In the same way, at the Target side, the name of a local TargetParser is added next to the global name of which the local Target is an implementation. The TargetParser procedure implements the link-to syntax into the local implementation of the Target resource. It can be seen from Table 8 and Table 9 that Ghent and LANL use a different SourceParser for BIOSIS, which reflects that they have a different implementation. However, they share a TargetParser to provide the abstract service into Medline, since both have chosen the PubMed implementation as a Target to achieve this.
- Deactivating services of global relevance when they are not of local relevance.
When the Source or Target resource required to implement a certain service is not available in the digital library collection, when the local implementation of the Target resource does not support the link mechanism required to implement the service, or when local librarians decide the service to be of no use to their end-users, its flag will be set to inactive. The service will no longer be taken into account in the SFX evaluation process deciding on the local relevance of conceptual services. In Table 8 this is the case for services with Inspec as a Source since Ghent does not have an Inspec implementation in its collection. In Table 9, this is the case for services with LiSa as a Target, since LANL does not have access to a LiSa implementation.
| SOURCE (local) | SOURCE (global) | COLLI | TARGET (global) | TARGET (local) |
| --- | --- | --- | --- | --- |
| S::APS::PROLA | APS/PROLA | abstract | Inspec | T::ERL::IN |
| S::LANLTopic:arXiv | the arXiv | author | Inspec | T::ERL::IN |
| S::ERL::BX | BIOSIS | abstract | Medline | T::NCBI::PubMed |
| S::ERL::BX | BIOSIS | genome | Genome Base | T::NCBI::Genome |
| S::ERL::CCO | Current Contents | abstract | LiSa | T::ERL:LI |
| S::ERL::EC | EconLit | review | Books in Print | T::ERL::BOIP |
| inactive | Inspec | full_text | Springer | T::Springer::LINK |
| S::Wiley::WIS | Wiley | abstract | Medline | T::NCBI::PubMed |
| S::Wiley::WIS | Wiley | cited_reference | Science Cit. Base | T::CIC15:SciSearch |
Table 8: Localization of services from Table 7 for Ghent
| Source (local) | Source (global) | Colli | Target (global) | Target (local) |
| --- | --- | --- | --- | --- |
| S::APS::PROLA | APS/PROLA | abstract | Inspec | T::ERL::IN |
| S::LANLTopic:arXiv | the arXiv | author | Inspec | T::ERL::IN |
| S::Advance::Biosis | BIOSIS | abstract | Medline | T::NCBI::PubMed |
| S::Advance::Biosis | BIOSIS | genome | Genome Base | T::NCBI::Genome |
| S::ERL::CCO | Current Contents | abstract | LiSa | inactive |
| inactive | EconLit | review | Books in Print | T::ERL::BOIP |
| S::Advance::Inspec | Inspec | full_text | Springer LINK | T::Springer::LINK |
| S::Wiley::WIS | Wiley | abstract | Medline | T::NCBI::PubMed |
| S::Wiley::WIS | Wiley | cited_reference | Science Cit. Base | T::CIC15:SciSearch |
Table 9: Localization of services from Table 7 for LANL
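The effect of localization on the SFX evaluation process can be sketched in Python with a few rows from Table 8 and Table 9. The ColliRow structure, the use of None to represent an inactive entry and the function candidate_services are assumptions made for illustration. The sketch does not reproduce the actual layout of the SFX-base, and it ignores Thresholds.

```python
from typing import NamedTuple, Optional


class ColliRow(NamedTuple):
    source_local: Optional[str]    # local SourceParser key, None when inactive
    source_global: str
    service: str
    target_global: str
    target_local: Optional[str]    # local TargetParser name, None when inactive


# A few rows from Table 8 (Ghent) and Table 9 (LANL).
GHENT = [
    ColliRow("S::ERL::BX", "BIOSIS", "abstract", "Medline", "T::NCBI::PubMed"),
    ColliRow("S::ERL::BX", "BIOSIS", "genome", "Genome Base", "T::NCBI::Genome"),
    ColliRow(None, "Inspec", "full_text", "Springer", "T::Springer::LINK"),
]
LANL = [
    ColliRow("S::Advance::Biosis", "BIOSIS", "abstract", "Medline", "T::NCBI::PubMed"),
    ColliRow("S::ERL::CCO", "Current Contents", "abstract", "LiSa", None),
    ColliRow("S::Advance::Inspec", "Inspec", "full_text", "Springer LINK",
             "T::Springer::LINK"),
]


def candidate_services(colli, source_key):
    """Return (service, TargetParser) pairs that are locally active for a Source."""
    return [
        (row.service, row.target_local)
        for row in colli
        if row.source_local is not None
        and row.source_local == source_key
        and row.target_local is not None
    ]


if __name__ == "__main__":
    print(candidate_services(GHENT, "S::ERL::BX"))
    print(candidate_services(LANL, "S::ERL::CCO"))   # LiSa is inactive at LANL
```

Both sites keep the same globally relevant rows, and only the local columns differ, which is what makes the SFX-base transportable between environments.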
Global and local Thresholds
The relationships between Source and Target resources expressed by a service connection in the Colli are made subject to restrictions called Thresholds. These Thresholds are the means of fine-tuning conceptual services, so that services considered inappropriate are presented as rarely as possible. In order to illustrate this concept, two types of Thresholds are described:
- Thresholds expressed in terms of boundaries for the metadata elements that make up the GenericRequest object structure. Technically, these Thresholds are expressed as conditional statements using field names of the GenericRequest object. Such Thresholds are in many cases very simple, but they can as well be scripts of whic