Metasearch (also called parallel search, federated search, broadcast search, and cross-database search) has become commonplace in the information community's vocabulary. All of these terms speak to a common theme: searching and retrieving from multiple databases, sources, platforms, protocols, and vendors at the point of the user's request. Metasearch services rely on a variety of approaches, including open standards (such as NISO's Z39.50 and SRU/SRW), proprietary programming interfaces, and "screen scraping." However, the absence of widely supported standards, best practices, and tools makes the metasearch environment less efficient for the metasearch provider, the content provider, and ultimately the end user.
To spur the development of widely supported standards and best practices, the National Information Standards Organization (NISO) sponsored a Metasearch Initiative [1] in 2003 to enable:
- metasearch service providers to offer more effective and responsive services,
- content providers to deliver enhanced content and protect their intellectual property, and
- libraries to deliver a simple search (a.k.a. "Google") that covers the breadth of their vetted commercial and free resources.
The Access Management Task Group was one of three groups chartered by NISO as part of the Metasearch Initiative. The group focused on gathering requirements for metasearch authentication and access needs, inventorying existing processes, developing a series of formal use cases describing the access needs, recommending best practices given today's processes, and recommending and pursuing changes to current solutions to better support metasearch applications. In September 2005, the group issued its final report and recommendation [2]. This article summarizes the group's work and final recommendation.
The Access Management Process (AMP) is defined as the abstract communication between a principal (or a principal's agent, such as a metasearch engine) and a metasearch engine or service provider for the purpose of accessing a resource. An AMP defines the way a user (the "principal") authenticates (establishes the right to use an identity) and is authorized (establishes the right for that identity to perform an action) to access a service or resource. In the metasearch environment, there are two AMPs: one between the user and the metasearch engine, and one between the metasearch engine and the resource provider, as depicted in Figure 1.
Figure 1: Access Management Process in Metasearch
The AMP can require multiple steps and can be quite complex, as illustrated in Figure 2. A principal asserts credentials (typically in the form of a username and password) that allow it to establish the right to use a network identity, in a process generically called "authentication." Establishing the right to use a network identity does not, by itself, grant permission to perform an action. The principal must also be "authorized"; a common method used by service providers is to examine attributes of the network identity (does it represent a student, a faculty member, or a walk-in patron?). Authorization determines a user's entitlements and usually takes the form of a token or "certificate" that represents the completion of the AMP.
Figure 2: The Access Management Process
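The distinction between authentication and authorization can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the in-memory user store, the attribute names, the `ENTITLEMENTS` policy table, and the token format are assumptions made for illustration and are not drawn from any system the Task Group evaluated. A real deployment would, at a minimum, store salted password hashes and issue signed tokens.

```python
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical identity store: username -> (password, attributes).
# Plain-text passwords are used only to keep the sketch short.
USERS = {
    "asmith": ("correct horse", {"role": "student"}),
    "walkin01": ("guest pass", {"role": "walk-in"}),
}

# Hypothetical policy: which roles are entitled to search which resource.
ENTITLEMENTS = {
    "licensed-db": {"student", "faculty"},
    "open-catalog": {"student", "faculty", "walk-in"},
}


@dataclass(frozen=True)
class Token:
    """Stands for the successful completion of the access management process."""
    username: str
    resource: str
    value: str


def authenticate(username, password):
    """Step 1: establish the right to use a network identity."""
    record = USERS.get(username)
    if record is None:
        return None
    stored_password, attributes = record
    # Constant-time comparison of the asserted and stored credentials.
    if not hmac.compare_digest(stored_password.encode(), password.encode()):
        return None
    return attributes


def authorize(username, attributes, resource):
    """Step 2: examine identity attributes and issue a token if entitled."""
    if attributes["role"] in ENTITLEMENTS.get(resource, set()):
        return Token(username, resource, secrets.token_hex(16))
    return None


def access(username, password, resource):
    """Run both steps; return a Token on success, otherwise None."""
    attributes = authenticate(username, password)
    if attributes is None:
        return None  # authentication failed
    return authorize(username, attributes, resource)
```

In this sketch the walk-in patron authenticates successfully but is not authorized for the licensed database, which mirrors the point that establishing an identity does not by itself grant permission to act.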
In a metasearch environment, this process involves multiple "actors" (end user, authenticator, authentication release authority, authorizer, metasearch engine, data source), and an actor can play multiple roles. A variety of methods are currently in use to perform the steps of access management, including proprietary protocols, and the access management tokens from one stage cannot necessarily be passed on directly to the next stage. As such, the access management process in a metasearch environment can be a very resource-intensive operation, even more so than the search and retrieval operation that follows it. As metasearch use continues to grow, improved authentication and certification mechanisms are needed that would reduce the performance impact while sustaining or even improving access controls.
The Task Group used the following method to evaluate AMP solutions and to develop their recommendations:
- Survey of authentication methods in use
Existing authentication methods were identified through surveys and interviews of metasearch providers. Table 1 lists the methods that were considered for further evaluation and ranking. (Detailed discussions of these methods are included in Part II of the full report [2].)
Table 1: Authentication Methods
|Athens||a proprietary access management system for controlling access to web-based subscription services
|Cookies||a small bit of data that a web server directs a web browser to store, which is then returned by the browser to the server in subsequent resource requests
|IP Filtering||a method for controlling access to a server based on the Internet Protocol (IP) address of the incoming connection
|Kerberos||an IETF-defined network authentication protocol that utilizes a trusted third party, called a "keyserver", to perform the authentication of clients on a TCP/IP network using symmetric-key cryptography
|LDAP||an IETF-defined protocol for accessing directory type information utilizing a simplified (as compared to X.500) protocol
|NCIP||a protocol for the exchange of messages between and among applications to enable them to perform the functions necessary to lend and borrow items, to provide controlled access to electronic resources, and to facilitate co-operative management of these functions
|Proxy Server||an intermediary server that is used to provide additional security between a client and the end server by filtering or caching transactions in both directions
|Referring URL||a method for enabling authentication based on the URL of the source which provided the link
|Shibboleth||an implementation of OASIS SAML by Internet2 for the exchange of information about users between a web browser and web server in a secure and privacy-preserving manner
|SIP||a protocol to allow self-service machines in the library to exchange data with the library automation system
|Username & Password||a method of authentication requiring the matching of a username with its associated password
|X.509 Digital Certificates||a mechanism of utilizing public-key certificates for authentication
- Use cases
A comprehensive set of use cases was developed and then simplified to three metasearch-specific cases (described in more detail below).
- Environmental factors
A set of environmental factors was identified that are critical success factors in metasearch.
- Ranking of methods against use cases and environmental factors
Each method was ranked on a ten-point scale indicating how well it addressed each use case and environmental factor.
- Aggregation and modeling of rankings
The rankings were aggregated and modeled graphically to identify the best solutions.
The committee concluded its evaluations with a best practice recommendation.
Detailed use cases were developed that included an understanding of:
- Primary actor: the principal actor that calls upon system services to achieve a goal
- Stakeholders' behavior: the behaviors related to satisfying the stakeholders' interests
- Preconditions: what must always be true at the beginning of the use case scenario
- Indicators of success: what must be true for the successful completion of the scenario
- Main success scenario: the typical success path or flow for a successful scenario
- Alternate flows: other scenarios, branches, or decisions that may represent successful or failed scenarios
- Technology requirements: any technology-specific requirements for conducting the use case scenario
- Special requirements: additional behavioral or technical requirements related to the use case
- Frequency of occurrence: how often the use case scenario may need to be repeated
- Open issues: known issues for success in the use case
These detailed cases were then combined into three broadly defined situations in which the type of authentication or authorization system required by an information service provider affects a member of a subscribing organization (or community) attempting to access the information service via a metasearch engine:
- In-Network User: A user attempts to access a licensed database via the metasearch engine from a location that is on the network of the licensing organization (an "in-network" user).
- For most licensed information service products, the in-network user is the simplest case. The IP address of the machine making the request serves as both the credentials (authorized machine) and attributes (a machine from a subscribing organization) for the access management process. Someone who is permitted to access the physical resources of the licensing organization (the machine or network port) is assumed to be authorized to use the networked resources licensed to that organization. In the case of a resource that is licensed to just a particular group within a larger organization [3], an "in-network" user is one who is on the network of the subgroup that has licensed the material. In this use case, the resource provider assumes that the metasearch engine is screening requests and only performing searches on behalf of authorized users.
- Out-of-Network User: A user attempts to access a licensed database via the metasearch engine from a location that is not on the network of the licensing organization (an "out-of-network" user).
- The value of electronic resources to the end user is almost entirely one of convenience. Thus, while in-network use is the simplest to handle, it is out-of-network use that is usually of most importance to the users themselves; they want to be able to access the resources not just in the library, but in departmental offices or labs, or anywhere on campus. Further, they must be able to access the resources from home. In this use case, as in the first, the resource provider assumes that the metasearch engine is screening requests and only performing searches on behalf of authorized users.
- Credentialed Access: A user attempts to access a licensed database via the metasearch engine that relies on some sort of credentials to manage resource access.
- In this use case, the resource provider may assume that the metasearch engine is screening requests or the resource provider may demand some interaction with a trusted entity that proves the metasearch request is coming from an authorized user.
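The in-network and out-of-network cases turn on whether the requesting address falls inside a range reported by the licensing organization. The following Python sketch shows one way such a check might be written using the standard `ipaddress` module. The address ranges are reserved documentation ranges chosen for illustration, and the function names are hypothetical; this is not the implementation of any particular metasearch engine or resource provider.

```python
import ipaddress

# Hypothetical address ranges reported by a licensing organization.
# These are reserved documentation ranges, not real networks.
LICENSED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_in_network(client_ip, ranges=LICENSED_RANGES):
    """Return True if the connecting address falls in a licensed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ranges)


def classify_request(client_ip):
    """Label a request as one of the two location-based use cases."""
    return "in-network" if is_in_network(client_ip) else "out-of-network"
```

A request classified as out-of-network would then need some other route to access, such as a proxy server or a credential check, before the search is performed on the user's behalf.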
Although the authentication methods can be examined purely in terms of the user when evaluating suitability for a given use case, environmental factors play a critical role as well. These factors must be applied within three different contexts: the metasearch service provider, the information service (i.e., database) provider, and the licensing organization and its users.
Eleven environmental factors were identified as critical success factors in metasearching. These are summarized in Table 2.
- Suitability / Effectiveness: Is this authentication method suitable or effective at providing access control? Service providers will evaluate this in terms of reliability and security. Users will evaluate it in terms of their ability to access the licensed resources.
- Ease of Implementation: How easy is it to implement this authentication method? This factor can lead to very different rankings for service providers versus licensing organizations. For example, IP filtering can be very simple for a university to "implement," since all that is required is that a list of IP addresses or ranges be reported to the service provider. The provider, on the other hand, must maintain a database of authorized IP ranges and check all incoming connections against that database.
- Licensing Cost: How expensive is it to license any infrastructure necessary to implement the authentication method? For the most common systems deployed today, there is zero, or minimal, licensing cost. Newer and proprietary systems (such as Kerberos or SIP) may require users to acquire software licenses.
- Implementation Cost: How expensive is it to implement the authentication method? This is indirectly related to the ease of implementation. Systems that require client software to be installed on end-user computers (such as the X.509 digital certificate infrastructure) will be more expensive than more passive systems like IP filtering.
- Software Expertise Required: How much networking or programming expertise is required to implement and maintain the system? In some cases, individual end users may require a certain level of software expertise (for example, can the user successfully modify the proxy and security configuration of their web browser?).
- Security: How secure is the authentication method? Is it susceptible to spoofing, forging identities, or cracking?
- Maintainability: How much ongoing work is required to maintain the authentication system? What types of changes within the licensing organization require changes to the configuration of the system?
- Robustness: How robust is the authentication method? The working group members generally interpreted robustness as a combination of security, maintainability, and scalability. One authentication method is more robust than another if it can be set up and then left to run, with little ongoing attention required beyond monitoring its performance.
- Scalability: How scalable is the authentication method? Does it cope well with large numbers of users, licensing organizations, or parallel connections?
- Simplicity of Understanding: How simple to understand is the authentication method for the people involved? Having a clear model of how the authentication method works can often simplify support issues.
- Market Acceptance / Preexisting Implementations: How common is the authentication method? Does the licensing organization already have the necessary infrastructure in place to support the method? Does the information service provider have other clients already using the authentication method?
|Table 2: Summary List of Environmental Factors|
| 1.||Suitability / Effectiveness|| 7.||Maintainability
| 2.||Ease of Implementation|| 8.||Robustness
| 3.||Licensing Cost|| 9.||Scalability
| 4.||Implementation Cost|| 10.||Simplicity of Understanding
| 5.||Software Expertise Required|| 11.||Acceptance / Preexisting Implementation
| 6.||Security|| ||
Ranking of Authentication Methods
Each authentication method was ranked separately against the use cases and the environmental factors using a ten-point scale.
All of the rankings were combined into an average, and the rankings were graphed on a scatter plot with the X axis representing Use Case rankings and the Y axis representing Environmental Factor rankings (Figure 3). While ranking each method, the group was mindful of the different organizational contexts of metasearch applications. For instance, an access method such as Kerberos or Shibboleth might fit well in a college campus setting and deserve a higher mark if only considering that environment. However, some of the very attributes that make that method very effective in a campus setting make it inappropriate in a public library setting. The group's goal was to identify the best methods for universal adoption.
Methods to the right of the chart in Figure 3 below are considered better at satisfying the requirements of the use cases. Methods near the top of the chart performed better on the environmental factors. In general, the rankings should be considered to be relative ones, rather than absolute. For example, Shibboleth satisfied use case requirements better than Referring URL did, while IP Filtering ranked better on environmental factors than Shibboleth.
Figure 3: Relative Ranking of Authentication Methods
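The aggregation step described above (averaging each method's rankings and treating the two averages as coordinates) can be sketched in a few lines of Python. The method names and scores below are placeholders invented for illustration; they are not the Task Group's data, and the number of scores per method is abbreviated.

```python
from statistics import mean

# Placeholder scores on the ten-point scale; NOT the Task Group's data.
# method -> (use case rankings, environmental factor rankings)
RANKINGS = {
    "Method A": ([8, 6, 7], [9, 8, 7, 9]),
    "Method B": ([9, 9, 8], [3, 4, 2, 3]),
    "Method C": ([4, 2, 3], [9, 9, 10, 8]),
}


def plot_points(rankings):
    """Average each method's rankings into an (x, y) scatter plot point.

    x is the mean use case ranking; y is the mean environmental ranking.
    """
    return {
        method: (mean(use_case), mean(environment))
        for method, (use_case, environment) in rankings.items()
    }


def best_on(points, axis):
    """Return the method with the highest average on axis 0 (x) or 1 (y)."""
    return max(points, key=lambda method: points[method][axis])


points = plot_points(RANKINGS)
```

With these placeholder values, one method leads on the use case axis while a different one leads on the environmental axis, which is the kind of trade-off the scatter plot is meant to expose.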
It can be argued that the X-axis position on the graph should be considered more important, as it represents an authentication method's ability to meet the needs of the users. In many cases, a poor ranking on the Environmental Factors axis has more to do with the current implementation environment than with the method itself. For example, Shibboleth was the second-highest ranked method in terms of ability to meet the needs of the use cases, but it scored very poorly on the environmental factors because, at the time the ranking was begun, it had only been deployed in test environments and few vendors supported it. Since the UK's Joint Information Systems Committee (JISC) has announced plans to move from Athens (its current authorization system) to Shibboleth within the decade, Shibboleth's environmental rankings for Acceptance / Preexisting Implementations and Ease of Implementation are expected to improve dramatically [4].
Note that these rankings are focused on authentication in the metasearch environment alone. These should not be considered a generic ranking of the strength of the evaluated models.
The NISO Metasearch Initiative Task Group on Access Management recommends that, in the current environment, institutions in the process of acquiring new electronic resources should control access to those resources and support metasearch by implementing either:
- IP-Authentication with a Proxy Server (either "traditional" or "rewriting"), or
- Username/Password authentication.
Or they could implement both of these methods. These were the two highest-ranked authentication methods in terms of both environmental factors and user acceptability. They are the most widely supported by vendors, have the lowest implementation and maintenance costs, and are the simplest for smaller or less technically sophisticated organizations to implement. They also ensure that remote (i.e., off-site) users can access the resources of the institution with little difficulty.
While Athens and Shibboleth were both evaluated as more "usable," and IP Filtering had a higher ranking in terms of environmental factors, support for these methods was unbalanced: Athens and Shibboleth are not broadly deployed in the current environment (although Shibboleth is emerging); and IP Filtering doesn't provide an acceptable level of usability for off-site users, who are often the primary beneficiaries of an institution's networked resources.
During the development of this study, the Task Group determined that Shibboleth [5] had many features that make it a desirable alternative to IP authentication and username/password authentication. However, there are barriers to wide-scale adoption:
- While it is possible to implement Shibboleth using bilateral relationships between service providers and service consumers, its real strength comes when it is implemented in conjunction with a trust federation such as the U.S. InCommon or the Swiss SWITCH federations. A trust federation represents an agreement on the definition of terms such as "member" and "student," and interpretation of attributes between identity providers (a higher education institution, for example) and service providers (a subscription database service). Once implemented, federated identity management provides an improved user experience by eliminating proxy servers and multiple sign-ons; reduces human costs in developing and supporting multiple AMPs; addresses access management of remotely held content, as well as local content; and addresses privacy policies required at most institutions.
- The success of Shibboleth depends almost entirely on adoption at an institutional level; in the U.S., the library community rarely drives adoption of InCommon/Shibboleth. Trust federation policy and legal requirements mandate that campus IT and senior leaders be actively involved in implementation. Library content protection, where IP recognition has long been considered "good enough," now ranks relatively low compared to university requirements for institutional procurement, federal funding, and student records. To drive adoption, library leaders should initiate Shibboleth support within their institution; or, when it is driven from campus IT, they must demand to participate and represent the library's needs.
- Internationally, adoption of federated identity management has been more predominant. Federal governments and other large funding bodies have been able to mandate standards-based implementations on a broad scale. European markets are starting to lead Shibboleth adoption and acceptance.
- The current Shibboleth implementation, based on version 1.0 of SAML (the Security Assertion Markup Language), does not allow for "pass through" access to controlled resources required by the "Credentialed Access" use case described earlier. In a metasearch scenario, the resource provider needs a programmatic way to trust that the metasearch engine has not tampered with the user's credentials as presented to the resource provider. A resource provider may also need to know more about a user's attributes than is known by the metasearch engine in order to make an access control decision. Alternatively, an institution may choose to enforce a rule that provides less information to the resource provider than was offered to the metasearch engine.
- The OASIS standards body approved a new version of SAML earlier this year, and it describes a delegated authority profile that could be used in a user-to-metasearch-to-resource use case. The Internet2 Shibboleth core development team, recognizing the growing need for access management in distributed environments such as metasearch, grid computing, and information portals, has begun the work of implementing this portion of the OASIS SAML 2.0 specification. Cross-participation between the Shibboleth project team and the NISO Metasearch Initiative access management task group has been established, and work has started on the creation of use cases that express the needs of a metasearch environment.
- At the time of this writing, it is anticipated that a Shibboleth implementation in a metasearch environment will require the addition of web services security [6] as part of the implemented stack between the metasearch engine and the content provider. This complexity may further limit adoption rates.
- In the space between "good enough" (the status quo) and "ideal" (Shibboleth federations) lie many questions for our community. For libraries, there are questions like "Are IP address access management and proxy servers sufficient to meet your current and future needs?" and "How much more are you willing to spend on an implementation of a Shibboleth environment?" For content providers: "Are you satisfied with IP address access management and proxy servers for protecting your intellectual property?" and "Can you implement Shibboleth as a common access management system for interaction with metasearch engines (and possibly end-user access)?" And for metasearch service providers: "What kinds of requirements are you willing to satisfy?" and "What are you willing to charge?" The NISO Metasearch Initiative Task Group on Access Management encourages the broad community to discuss these questions. NISO is committed to working with the Shibboleth developers to develop practical solutions to the issues raised.
The following members of the NISO Metasearch Initiative Task Group on Access Management contributed to the work that is described in this article: Mike Teets, Chair (OCLC), Katie Anstock (OCLC PICA), Susan Campbell (CCLA), Frank Cervone (Northwestern University), Paul Cope (Auto-Graphics, Inc.), David Fiander (University of Western Ontario), Ted Koppel (Ex Libris), Peter Murray (OhioLINK), Mark Needleman (formerly SirsiDynix), Ed Riding (SirsiDynix), R. L. Scott (U.S. DOE, OSTI), Tim Shearer (UNC-Chapel Hill), and David Yakimischak (formerly JSTOR). The authors also gratefully acknowledge the assistance of Scott Cantor (Ohio State University) and Steven Carmody (Brown University) as well as the other members of the Internet2 Shibboleth core development team.
Notes and References
[1] NISO Metasearch Initiative Official Website, <http://www.niso.org/committees/MS_initiative.html>.
[2] NISO Metasearch Initiative Task Group 1. Ranking of Authentication and Access Methods Available to the Metasearch Environment. Bethesda, MD: NISO, September 13, 2005, <http://www.niso.org/standards/resources/MI-Access_Management.pdf>.
[3] For example, the law school has a license to Lexis/Nexis, which is restricted to members of that program only, rather than to the entire university.
[4] JISC. The Future position on Athens and Shibboleth. 9 Aug 2004. Accessed 26 Nov 2004, <http://www.jisc.ac.uk/index.cfm?name=jisc_athens_shibboleth_pos_news050804>.
[5] Cantor, Scott, ed. Shibboleth Architecture: Protocols and Profiles. Ann Arbor, MI: Internet2, September 10, 2005, <http://shibboleth.internet2.edu/docs/internet2-mace-shibboleth-arch-protocols-latest.pdf>.
[6] OASIS Web Services Security (WSS) Technical Committee, <http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wss>.
Copyright © 2006 National Information Standards Organization (NISO)