Integration of Services - Integration of Standards: Workshop Report, Koninklijke Bibliotheek, The Hague March 3, 2006


D-Lib Magazine
May 2006

Volume 12 Number 5

ISSN 1082-9873

Integration of Services - Integration of Standards

Workshop Report, Koninklijke Bibliotheek, The Hague March 3, 2006

Theo van Veen
Koninklijke Bibliotheek
<theo.vanveen@kb.nl>

Ray Denenberg
Library of Congress
Washington DC USA
<rden@loc.gov>

Introduction

On March 3, 2006, the Koninklijke Bibliotheek (KB) and the SRU Implementers Group held a workshop in The Hague with the theme "Integration of Services; Integration of Standards", following a two-day SRU (Search and Retrieval via URL) Implementers Group meeting. The purpose of the workshop was to hear about and discuss the next steps needed to improve integration of services and applications, with the main focus on integration with or via SRU. Two kinds of mechanisms were discussed: (1) protocols that have been standardised or formalised to some degree (for example, SRU, OpenURL and OAI) and (2) other services that might benefit from the standards or could be used in conjunction with the standards.

The reason for organizing this workshop as an extension of the SRU meeting was threefold:

(1) Integration of services means that the output of a service can be used as input to other services; in many cases search and retrieval plays an important role here.
(2) SRU is undergoing a standardization process with results that might be useful for other new standards.
(3) Some services might require extensions of the SRU standard.

It could not be expected, however, that a clear vision of the next steps would emerge directly at the end of the workshop, because integration of services as discussed at the workshop is quite new. In general, a workshop such as this generates discussions and thoughts in the minds of the participants that become clearer after time for reflection. Therefore, besides describing the individual presentations, this workshop report also includes the thoughts of the presenters that arose after the workshop. It is expected that this will help to provide insight into the issues that play a role in service integration. In addition, this workshop report includes a brief report by Ray Denenberg on the SRU Implementers Group meeting (see Appendix 1).

The subjects of the "Integration of Services; Integration of Standards" workshop presentations were balanced so that they covered the different stages in the process of 1) finding resources, 2) searching in resources, and then 3) linking to related or specific resources. These subjects were:

CQL, the query language for SRU searches
MXG, a NISO initiative subset of SRU for metasearching
ZeeRex, service descriptions for SRU and similar services
OpenSearch, a low-barrier but less functional competitor for SRU
The relationship between OpenURL and SRU in the D+ project
OpenURL and COinS to add semantics to web pages
Library 2.0 as a concept for integrating services from different providers
Record Update as a service using SRU
Service registries to make different services findable
Shibboleth as an authentication mechanism that might be used in relation to SRU or the data provided via SRU
WSDL, SOAP and REST as mechanisms underlying many web services
Generic service descriptions to define the relations between different services

(The acronyms used above are explained in the glossary in Appendix 2 at the end of this report.)

Summary of the Presentations

Below are the summaries of the different presentations. The PowerPoint and HTML presentations are at: <http://www.loc.gov/standards/sru/march06-meeting/presentations.html>.

CQL (Mike Taylor, Index Data)

Specifications such as OpenSearch facilitate interoperability by providing standardised syntax for searching. But the higher goal of interoperability at the semantic level further requires a common means of expressing rich queries. CQL (the Common Query Language of SRU) provides this.

OpenURL and COinS (Ross MacIntyre, MIMAS)

The OpenURL Framework for Context-Sensitive Services has now been endorsed as a NISO standard (Z39.88-2004).
This new standard has broadened the potential scope of OpenURL implementation beyond the scholarly information community, with the possibility of extension by registration of new formats and profiles for new domains as well as the introduction of an XML format. Furthermore, the OpenURL Framework has separated the details of the reference and its context, known as the ContextObject, from the means of transporting it across the network, which is the OpenURL. This separation enables use of the ContextObject within other applications. For example, if a ContextObject were to be embedded in a web page, other applications, such as web browser extensions (e.g., Openly's OpenURL Referrer), could provide extra functionality. This has led to the recent development of the COinS ('ContextObject in SPANs') specification, which embeds a ContextObject within an HTML 'span' element.

IESR: A Registry of Collections and Services (Ann Apps, MIMAS)

The UK JISC Information Environment Service Registry (IESR) (http://iesr.ac.uk) publicises collections of resources, along with details of services that provide access to them, in a machine-readable format, and also provides standalone 'transactional' services. It is a central registry, a middleware shared service intended primarily for machine-to-machine use, within the architecture of the Information Environment. Apps explained how the content of IESR is described, based on metadata standards, giving some examples of current descriptions. The IESR API provides access to the records in IESR via several interfaces. She indicated some possible ways in which IESR could be used. A dynamic portal could discover collections appropriate to an end-user and then provide an SRU metasearch over them, without the need for manual intervention to build resources into the portal. This would potentially widen the user's landscape of useful knowledge.
Use of IESR descriptions by harvesting, or by human discovery preparatory to manually plugging a resource into an application, is also expected. Apps finished by indicating future envisaged developments for sharing records across distributed service registries, and she also mentioned some integration issues that have arisen during the development of IESR.

Formal Descriptions of Non-standardised Services (Theo van Veen, Koninklijke Bibliotheek)

At KB we are exploring how to formalise the description of different kinds of services. The purpose is to let the presence of metadata terms in the output of one service trigger another service and use those metadata terms as input for that other service. A user agent will interpret these service descriptions and offer the user the functionality to link to other services using the metadata in a previous result as input. As a proof of concept, this has been demonstrated by means of an SRU client running in the browser and containing a user agent. Clicking on, for example, a creator field in the SRU response generates links to several services with the creator as input. The user may specify another file with service descriptions to control which services are offered for different metadata terms. It is expected that these service descriptions, when formalised, can be exchanged easily and may be useful for other applications as well.

Shibboleth and SRU (John Paschoud, PERSEUS Project)

Subtitled "You can't have everything you want, but the Web should know what you can have", this presentation examined possible interactions and mutual benefits between SRU and Shibboleth (for access management of non-public, Web-based resources). Paschoud explained how Shibboleth works over HTTP, and explored the questions:

Does SRU need Access Management? (Yes! Although the searchable interfaces to many resources are public, there are good examples of why access authorisation may be required in the course of a federated search).
What sorts of 'license terms' ('Attribute Acceptance Policies') might an SRU service impose? (These are likely to fit well with scoped affiliations with authorised end-user institutions providing federated search services; for example an eduPersonScopedAffiliation value of "staff@kb.nl").
Can an SRU service explain its access terms? (In a variety of possible ways).
How would an SRU client authenticate as a user? (This is challenging, but it is a requirement that SRU has in common with other proxied end-user services).

SRU Record Update (Janifer Gatenby, OCLC PICA)

The presentation by Gatenby covered various aspects of SRU Record Update, indicating its niche as an interactive protocol alongside other mechanisms such as the OAI-PMH push mechanism and batch loading. Interaction scenarios with SRU/SRW were examined and the current development between OCLC and OCLC PICA was covered.

OpenSearch, SRU and Google/Widgets: Database Considerations and Experience (Derek Lane, CSC)

EIMS is a catalog for EPA work products, projects and data. Providing access to catalog records in an efficient and accessible fashion has required us to track emerging standards for web-based search and provide commonly accepted simple XML representations. OpenSearch, A9's evolving standard for describing simple searches, has fit into existing RSS work easily. Lane described OpenSearch's new extensions for multi-field search and compared the technical and implementation properties of SRU and OpenSearch.

Library 2.0 (Ian Davis, Talis)

Fundamental to the concept of Library 2.0 is the shift from delivery of library services solely within the library building, or via the library's own web site, towards the embedding of discrete library functions within a range of contexts. This presentation and demonstration illustrated how providing library services using SRU and related technologies can sustain an ecosystem of new and innovative applications.

D+: A Common Server for SRU and OpenURL (Robin Taylor, Edinburgh University Library)

D+ is a software framework that brokers the searching of resources in distributed repositories. It is based on, and extends, the open source SRW/U server developed at OCLC. In addition, the server also acts as an 'OpenURL friendly' target by supporting queries conforming to version 0.1 of the OpenURL standard, rather than CQL. Taylor's presentation demonstrated the use of both query types in the context of a resource list application using D+ as the search web service.

Metasearch and SRU: MXG, the Metasearch XML Gateway (Ray Denenberg, Library of Congress)

NISO defines metasearch as "search and retrieval spanning multiple databases, sources, platforms, protocols, and vendors at one time." It cites the problem as follows: "Current systems require users to know how to select, access and search specific databases", and the goal is: "To create an environment that helps users find what they are seeking while minimizing what they need to know". In a more detailed elaboration, NISO attributes goals to metasearch entities as follows:

Metasearch provider (i.e., the metasearch engine): "offer more effective and responsive services";
Content/database provider: "deliver enhanced content and protect intellectual resources";
Client (i.e., library): "deliver services that distinguish them from Google and other free web services."

Of these entities, the main focus is on interaction between the metasearch engine and content provider, rather than between the library and metasearcher. The NISO Metasearch initiative has been charged with identifying or developing standards and best practices to improve interoperability between metasearch engines and content providers, and with identifying a simple search/retrieve protocol to help database providers interoperate more effectively with metasearching applications.
As part of the latter charge, task group 3 of the metasearch initiative has been charged with evaluating SRU for suitability as a protocol between metasearcher and content provider. As part of this process, the MXG (Metasearch XML Gateway) specification has been developed, for communication between metasearcher and content provider, based on SRU and CQL.

WSDL, UDDI, SOAP, REST: SOA Acronym Soup (Matthew Dovey, Oxford University)

There has been a lot of activity on Web Services and now Service Oriented Architectures over the past five years. Neither of these terms is particularly well defined. "Web Services" might be SOAP-based or REST-based; the OASIS definition of Service Oriented Architectures could also describe CORBA and DCOM. REST itself is often vague as to its meaning (e.g., SRU, whilst often described as REST, is really only REST-like!). Efforts such as the Web Service Interoperability Profile have tried to rectify some of the interoperability issues surrounding Web Services (particularly in the Web Service Description Language), but issues remain, especially as you move higher up the Web Service stack (UDDI, WS-Addressing, etc.). This presentation described what all these acronyms mean and which ones are "safe" or "risky" from an interoperability perspective.

Use of ZeeRex (Z39.92) to Describe Search and Retrieve Services (Robert Sanderson, University of Liverpool)

ZeeRex is an XML schema developed over the last five years to describe the semantics of a service that supports retrieval, and typically search. It models only the interactions, not the protocol's syntax (which is left up to ASN.1, WSDL or similar), and hence can be used to describe different methods of doing the same thing: issuing a search and retrieving matching records. It could be used to describe, for example, Z39.50, SRU, OpenSearch, OAI and even such things as FTP, which is not typically thought of as a type of information retrieval protocol.
This presentation primarily discussed modifications from the current SRU version ZeeRex 2.0 to the standardised Z39.92 and the advantages these modifications provide.

Conclusions and Thoughts Generated after the Workshop

Below are thoughts contributed by some of the workshop presenters after they had time to reflect on the issues discussed at the workshop. This is perhaps one of the most interesting results of the workshop, and it is hoped that the views expressed will contribute to new ways of integrating services and information for the user.

Ann Apps

There are still many resources that do not provide a machine-readable interface. SRU is not very widely adopted or known about outside of the SRU cognoscenti who attended this workshop and the preceding SRU Users Group meeting. It would seem a good idea to advertise SRU as a low-barrier, standard solution for the provision of machine access to resources. Advertising SRU services in online directories would encourage their use and improve the general profile of SRU. The semantic interoperability provided by SRU should be promoted as a means to implement dynamic middleware solutions, which cannot currently be achieved by the apparently ubiquitous SOAP Web Services.

The information environment includes a diverse range of technologies. Attempts to persuade people to converge on a single service protocol are likely to be futile. Activities aimed at encouraging interoperation between services of different types would seem a better use of effort. The ability to discover within a registry a wide range of resources and their service connection details should assist in the eventual integration of different service protocols within a general service oriented architecture.

Derek Lane

SRU is a mature standard. The standards meeting mostly concentrated on changes that could be ignored by simple implementations, and that is a good thing.
In an issue of direct interest to me (approximate counts), there were about four proposals of varying levels of complexity, and I was very glad that the simplest one was accepted.

Context management issues (loss of default technology implementation; loss of HTTP sessions) were echoed in the discussion. For example, record ids allow one to ignore HTTP sessions; XPath for sort is expensive if you have to do it at the boundary of a large system with non-XML internal structure; base URLs maintain context for resolving relative URLs outside of HTTP sessions; hit counts for subqueries are cheap for post-indexes but insanely expensive for SQL. As SRU is deployed in new environments I expect to see more issues of this type appear.

OpenSearch integration seems possible, at least so far. There is a profile of CQL that can be mapped to OpenSearch, so that an OpenSearch implementor (of which I expect there to be many) can have a simple implementation story: use the generic CQL to OpenSearch mapping and edit the code for formatting results one more time. This kind of implementation is attractive, but will not last over the years without some level of coordination between SRU and OpenSearch. It is also possible to add capability to SRU servers to speak OpenSearch. This could be worthwhile for sophisticated SRU servers to do (it increases the number of clients for existing implementations), but it will not directly affect the number of SRU servers available.

Ian Davis

I feel that there are huge advantages to be gained by adopting web-friendly service interfaces such as SRU. Although there are more performant or more robust protocols available, none have the inherent scalability advantages that derive from utilizing the world wide web infrastructure, nor do they have the benefit of ubiquitous client toolkits. I expect to see many more rich web applications using SRU natively from the client, although perhaps limited to a smaller syntax profile.
It would be interesting to explore the utility of SRU outside of its traditional focus, perhaps as a search interface to some of the larger community publishing or discussion sites. Aside from RSS and OpenSearch, there appears to be no standard machine interface for searching these types of content. One tactic could be to donate an SRU interface to a selected few of the open source CMSs.

I also think more work has to be undertaken around the area of security and identity, and I was very interested to see the presentation of Shibboleth. I would like to see how SRU can fit better with these kinds of mechanisms and also with future efforts such as the IETF DIX work.

Overall I was pleased to learn that there are few interoperability issues between different implementations of the standard, and there even seems to be some scope for interoperating with other protocols that address the same space. Future work should seek to preserve and build on this level of interoperability, and it should feature strongly in the prioritisation of new features.

John Paschoud

When I started preparing this presentation for the workshop, I was quite cynical about whether there was a significant requirement for authentication and authorisation to inter-operate with SRU (worth the significant development effort that will be required to implement such extensions). I came away convinced that there is a case for work on integration, and that machine or proxy clients (including SRU) need to be considered by the Shibboleth developers in future releases. The devolved / distributed functions of access management implemented by Shibboleth (and other SAML-based methods) share some problems in common with other services, such as identifying the host organisation of an end-user, so as to access the appropriate Shibboleth Identity-Provider, OpenURL resolver and other services.
Therefore I believe that the best solutions lie in development of collection-level registries (such as the IESR) to describe resources, and the widening of scope of corresponding registries (or "resolver-resolver" services, such as those of OCLC and Edina for OpenURL) to describe organisations with which users are affiliated.

Matthew Dovey

What became apparent during a number of the presentations is that there are more people using SRU, often in very imaginative and innovative ways (the European Library being a prime example), than we were previously aware of, and presumably there are many more systems using SRU somewhere under the covers. Although there is a list of known implementations and applications on the SRU website, it is currently a fairly sparse list, which is sometimes taken to indicate that the take-up of SRU is poor, but the workshop demonstrated that this is not reflected in reality. In this respect the workshop format was a success that needs to be taken forward. Previous SRU meetings have concentrated on technical aspects in the further development of the standard itself, which is often beyond the level of detail that implementers of SRU need, or that those currently assessing the value of SRU to their application need. A workshop purely aimed at demonstrations of applications (rather than just toolkits) and implementation issues is therefore crucial to promoting the use of SRU.

Rob Sanderson

There is a very real need to provide a roadmap for implementers in terms of how to progress from a quick and easy implementation to one which is more functional. SRU has to play nicely with the other protocols, especially where those protocols do related and useful jobs. SRU is not a replacement for OAI, OpenURL or A9's OpenSearch: it must both adapt to and promote the current environment, but be agile enough to adapt to future requirements that the environment presents.
On the other hand, the environment should also recognise the role that SRU does play in terms of providing a stable and functional protocol, capable of being integrated within applications and services, and capable of significantly furthering the goals of the 'Web 2.0' agenda. The exact mechanics of SRU itself are not fundamentally important; what matters is that the wisdom gained over 20 years of Z39.50 and SRU is not lost. The related semantic aspects (ZeeRex and CQL) are usable outside of the SRU protocol and this can be encouraged without fear of somehow lessening SRU. Quite the opposite in fact; syntactic interoperability is comparatively easy, whereas ensuring that communities have access to the same semantics is the challenge about which the library world has years of useful experience to share.

Theo van Veen

The coexistence of SRU, MXG and OpenSearch indicates that there are still ways for us to improve interoperability in search and retrieval. There is a need for a standard that is low-barrier to implement, while at the same time allowing the possibility of using complicated queries. We are dealing with a dilemma: metasearch engines want to broadcast a single query to many services, yet it is sometimes hard to convert a single query to an appropriate request for each individual service. With OpenSearch on one end of the spectrum, SRU on the other end, and MXG in the middle, we might end up with clients and services that need to support more than one protocol, and that is certainly not an ideal situation. The problem seems to be related to the support of CQL. Clients should be able to recognize that a service does not support CQL from the explain record (or in the worst case from the absence of an explain record). SRU services should recognize queries that are not CQL and treat them simply as a list of terms. In this way queries that are not CQL queries can be broadcast to all services, even those that don't support CQL.

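The server-side fallback suggested here, treating anything that is not CQL as a plain list of terms, might look roughly like the sketch below. It is an illustration only and not part of SRU or CQL: the function names and the heuristic (looking for relation operators, quotes, parentheses or boolean keywords) are assumptions made for this example, and a real service would more likely attempt a full CQL parse and fall back only when parsing fails.

```python
import re

# Hypothetical heuristic: characters and keywords that suggest CQL syntax.
_CQL_HINTS = re.compile(r'[=<>()"]|\b(?:and|or|not|prox)\b', re.IGNORECASE)


def looks_like_cql(query):
    """Return True if the query contains CQL-style operators or structure."""
    return bool(_CQL_HINTS.search(query))


def interpret(query):
    """Classify an incoming query as CQL or as a plain list of terms."""
    if looks_like_cql(query):
        return {"type": "cql", "query": query}
    # Not recognisably CQL: treat the input as whitespace-separated terms.
    return {"type": "terms", "terms": query.split()}


print(interpret("fish food"))
print(interpret('dc.title = "fish food"'))
```

A heuristic like this is ambiguous for input such as "cats and dogs", which is both a natural keyword query and valid CQL; that ambiguity is one reason coordination between the protocols matters.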
Another interesting item is the relation SRU has to Shibboleth. SRU responses, explain as well as searchRetrieve, might depend on the credentials obtained from authentication services. An interesting question is whether SRU could carry authentication information by using SRU's extension mechanism. For example, a requested service has to redirect the user to a "where are you from" service to obtain the address of the authentication service of the user's institution. When the requested service is an SRU service and when authentication information is available to the SRU client, it can be passed directly to the SRU service as extra parameters. Especially when a local SRU client is being used, it is convenient that the SRU client can rely on XML responses rather than being redirected to another page for authentication purposes.

Appendix 1: Report of the SRU Implementers Group Meeting (Ray Denenberg)

The SRU Implementers Group meeting preceding the "Integration of Services; Integration of Standards" workshop was very fruitful.

There will be a much-needed bibliographic index set developed, based on MODS semantics.

There will be an OpenURL profile, which will prescribe a mapping from these bibliographic indexes to OpenURL keys. The profile may also specify how an SRU response can facilitate the client process of formulating an OpenURL: an SRU client receives a record and wants to create an OpenURL where the object described by that record will be the referent. The client could request the record for that item in the appropriate OpenURL metadata format, which could then be used directly as the context object.

The "sort" proposal was accepted. It is felt to be a major improvement over the way sorting is done in SRU 1.1.

CQL (currently known as the "Common Query Language") will instead be the "Contextual Query Language".

SRU via POST is defined. SRU now has three forms: (1) via URL (as originally), (2) via POST, and (3) SRU over SOAP (formerly SRW, and the SRW acronym will be dropped).

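The URL and POST forms of SRU mentioned above can be pictured with the sketch below, which encodes one searchRetrieve request both ways using the Python standard library. The endpoint `http://sru.example.org/catalogue` is a placeholder rather than a real service. The parameter names are those of SRU 1.1 as commonly documented. This report does not give the details of the POST binding, so the form-encoded body shown here is an assumption. Nothing is sent over the network; the requests are only constructed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://sru.example.org/catalogue"  # placeholder endpoint (assumption)

params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.title = "digital library"',  # a CQL query
    "maximumRecords": "10",
}
encoded = urlencode(params)

# SRU via URL: the parameters travel in the query string of an HTTP GET.
get_request = Request(BASE_URL + "?" + encoded, method="GET")

# SRU via POST: the same parameters as a form-encoded body (assumed encoding).
post_request = Request(
    BASE_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.full_url)
print(post_request.get_method(), len(post_request.data), "bytes in body")
```

Keeping the parameter set identical across both forms means a client can switch to POST for long queries without changing how it builds the request.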
There is progress towards aligning SRU and OpenSearch. The strategy we discussed is to make OpenSearch requests legitimate SRU requests. (See: <http://www.loc.gov/standards/sru/march06-meeting/report.html>.) Then an SRU-friendly OpenSearch server will be able to do something intelligent when it gets an SRU-loaded OpenSearch request. There are clear advantages of SRU over OpenSearch: CQL, schemas, scan, and diagnostics.

An OAI over SRU profile will be defined. It will specify that a server support three indexes necessary for OAI: identifier, last modification date, and collection identifier.

A basic agreement was reached on how to incorporate various bits of information in a request or response. This would include hit and term counts returned by the server. A client will be able to indicate that it does not care whether or not the server includes the record count in the response. The reason for this is the concern that in some environments, counting the records accurately is expensive. A diagnostic will be defined to indicate that the reported record count is approximate.

The basic standardization plan presented, to take SRU to OASIS, was approved in principle. The philosophical basis for this decision is as follows:

The world clearly needs a single, well-defined, powerful protocol for searching by URL with results returned in XML.
Competing protocols are being developed; one of these will drive this standardization effort if SRU does not, and if so, it won't meet our needs.
We conclude that SRU needs to drive this effort, and needs to involve the other interested communities.
It follows that SRU standardization needs to occur in a mainstream standards body.
OASIS is probably the only mainstream standards body whose scope covers SRU.
OASIS is a neutral ground for merging competing de facto standards into an industry standard. It uses a lightweight process to promote industry consensus and unite disparate efforts.

We would first form a public discussion list to determine whether to form an OASIS Technical Committee, based on the likelihood that a standard would actually emerge from an OASIS TC: whether there are intrinsic, insurmountable differences of opinion; and whether other parties (A9, etc.) will participate. The discussion would also seek to determine how much change input from other parties will introduce, and how long it will take to get to a committee draft (the version prior to public comment and a vote of all OASIS members). The public list process might take roughly three months, the technical committee six months, and then it might take another three months for a standard to emerge.

We will likely first formalize the "easy" changes into SRU version 1.2 and take the more complex problems into the standardization process. The result of the OASIS standardization process would be version 2.0. Included for standardization along with SRU would be CQL, Scan, the Explain Operation (but not the Explain specification itself), and the mappings SRU over SOAP (i.e., SRW) and SRU via POST.

Appendix 2: Acronyms Used in this Workshop Report

ASN.1 (Abstract Syntax Notation One). Formal language for abstractly describing messages to be exchanged among an extensive range of applications.
COinS (ContextObjects in Spans). A standard to put contextual information in the HTML span tag.
CORBA (Common Object Request Broker Architecture). A standard communication protocol for sharing objects across distributed platforms.
CQL (Contextual Query Language). The query language for SRU.
DIX (Digital Identity Exchange). A still to be defined identity information exchange protocol.
IETF (Internet Engineering Task Force). Open international community concerned with the evolution of the Internet architecture.
MXG (Metasearch XML Gateway). A prescribed, dumbed-down subset of SRU.
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting).
OASIS. Organization for the Advancement of Structured Information Standards.
OpenSearch. A technology for publishing of search results in a format suitable for syndication.
OpenURL. A type of URL containing metadata in a standard format.
REST (Representational State Transfer). An architectural style for accessing Web services via HTTP.
SAML (Security Assertion Markup Language). An XML-based standard for exchange of authentication and authorisation information.
Shibboleth. An architecture and open-source implementation for federated identity-based authentication and authorization infrastructure.
SOAP (Simple Object Access Protocol). A protocol for exchanging XML-based messages over HTTP.
SRU (Search and Retrieval via URL). A protocol for search and retrieval via Web services.
UDDI (Universal Description, Discovery, and Integration). An XML-based registry of services.
WSDL (Web Services Description Language). An XML format for describing Web services.
ZeeRex (Z39.50 Explain, Explained and Re-Engineered in XML). A format for SRU and Z39.50 services to provide their own service description.

Copyright © 2006 Theo van Veen


doi:10.1045/may2006-vanveen