Libraries and the Long Tail: Some Thoughts about Libraries in a Network Age


D-Lib Magazine
April 2006

Volume 12 Number 4

ISSN 1082-9873


Lorcan Dempsey
Vice President, Research, and Chief Strategist
OCLC Online Computer Library Center, Inc.
<dempseyl@oclc.org>

Discussions of the long tail that I have seen or heard in the library community strike me as somewhat partial. Much of that discussion is about how libraries contain deep and rich collections, and about how their system-wide aggregation represents a very long tail of scholarly and cultural materials (a system may be at the level of a consortium, or a state, or a country). However, I am not sure that we have absorbed the real relevance of the long tail argument, which is about how well supply and demand are matched in a network environment. It is not enough for materials to be present within the system: they have to be readily accessible ('every reader his or her book', in Ranganathan's terms), potentially interested readers have to be aware of them ('every book its reader'), and the system for matching supply and demand has to be efficient ('save the time of the user') [1].

Think of two numbers in this context. One is about interlibrary lending (the flow of materials between libraries), and the other is about circulation (the flow of materials within a library).

The first is that interlibrary loans (ILLs) account for 1.7% of overall library circulations. This goes up to 4.7% if we just look at academic libraries [2]. What this suggests is that we could do a better job making it easier to find and obtain materials of interest wherever they are, or, in other words, of aggregating system-wide supply. The flow of materials from one library to another is very low when compared to the overall flow of materials within libraries. This might be what one would expect if the overlap between library collections were high. Last year, we at OCLC did some work looking at the aggregate collections of the Google 5 (G5) libraries. There we discovered that more than 60% of the G5 aggregate print book collection consists of books held by a single G5 library [3]. This suggests that collections are not as 'vanilla' as is sometimes thought.

The second number is about circulation.
We have also done some work looking at circulation data in two research libraries across several years. In each case, about 10% of books (we limited the investigation to English language books) accounted for about 90% of circulations. This shows that many books are not being borrowed (of course, some may be consulted in the library) [4].

These numbers suggest that many items in a specific collection may be underused, and that there is limited exchange of materials between collections. As we move forward, we will be increasingly asked if this is an optimal system-wide arrangement, especially as readers increasingly move to the network. We can think of requirements in the terms expressed by the following subset of Ranganathan's laws: books are for use; each book its reader; each reader his or her book; save the time of the user.

I want to look at some of these questions in more detail within a context established by the 'long tail' discussion [5].

The long tail

First a recap of the long tail argument, which since the publication of the original Chris Anderson Wired Magazine article has been much discussed [6]. The argument is about how the Internet changes markets. In the 'physical world', the costs of distribution, retail and consumption mean that an item has to generate enough sales to justify its use of scarce shelf, theatre or spectrum space. This leads to a limit on what is available through physical outlets and a corresponding limit on the selection potential of users. At the same time, the demand for a particular product or service is limited by the size of the population to which the physical location is accessible. This scarcity drives behaviors, about which we may have made mistaken assumptions:

"For too long we've been suffering the tyranny of lowest-common-denominator fare, subjected to brain-dead summer blockbusters and manufactured pop. Why? Economics. Many of our assumptions about popular taste are actually artifacts of poor supply-and-demand matching – a market response to inefficient distribution." [6]

These inefficiencies are mitigated in a network environment. And, accordingly, so the argument goes, we observe different behaviors with network services:

"Unlimited selection is revealing truths about what consumers want and how they want to get it in service after service, from DVDs at Netflix to music videos on Yahoo! Launch to songs in the iTunes Music Store and Rhapsody. People are going deep into the catalog, down the long, long list of available titles, far past what's available at Blockbuster Video, Tower Records, and Barnes & Noble. And the more they find, the more they like. As they wander further from the beaten path, they discover their taste is not as mainstream as they thought (or as they had been led to believe by marketing, a lack of alternatives, and a hit-driven culture)." [6]

Netflix, for example, aggregates supply as discussed here. It makes the long tail available for inspection. However, importantly, it also aggregates demand: a larger pool of potential users is available to inspect any particular item, increasing the chances that it will be borrowed by somebody. Anderson provided some interesting numbers to show the impact of this phenomenon in his original article, and these have been updated on his website [7]. He notes that the aggregation of the long tail is a major part of the business model of the leading Internet services (Amazon, eBay, Google, etc.). Google, for example, services the long tail of advertising – those for whom the bar was too high in earlier times of scarce column inches or broadcast minutes. And by aggregating demand, delivering a large volume of users, they increase the chances of the advertisement being seen by somebody to whom it is relevant.

Of course, merely being on the web is only a part of the issue. What the web allows is consolidation.
Anderson's examples are massive, consolidated web presences. As suggested a moment ago, this consolidation has two aspects: aggregation of supply and aggregation of demand. Each is important. Five things come to mind about the aggregation of supply and demand.

The first is transaction costs, the costs incurred – whether in attention, money, expertise or some other resource – in achieving one's goal. High transaction costs inhibit use: they increase the friction in the system; low transaction costs encourage use: they increase flow through the system. iTunes, for example, has low transaction costs. The burden of discovering tracks of interest, transacting for their use and downloading them is low. The tracks are immediately available. Netflix has higher transaction costs given the delays caused in the mail system, but still works to provide as frictionless a workflow as possible for the user. We can think of two aspects of transaction costs: search costs and fulfillment costs. How difficult is it to discover something, and once it is found, how difficult is it to acquire a service or an object?

The second thing to come to mind is the availability of consolidated data about choices and behaviors [8]. Netflix, Amazon, Rhapsody and others refine their service based on what they know of their users' choices, mined directly from the aggregated clickstream. This allows them to develop services that can further develop reflexively based on usage, and that can be tailored around particular behaviors and preferences. Furthermore, additional services can be built by leveraging this mined user data – recommender services, for example. These services potentially reduce transaction costs, because they use aggregate data about behaviors to better target their offerings.

The third thing to consider is inventory.
These large web presences consolidate inventory: they are not encumbered by the costs of massively redundant, just-in-case inventory, scattered through multiple physical delivery points. This consolidation may happen in virtue of the digital nature of the collections, as with iTunes. Or, where physical inventory is involved, as with Amazon, they can consolidate in strategic locations, or with particular suppliers, as inventory need not be tied to physical storefronts. They manifest their store through the management and presentation of data, not through the actual display of goods in a physical store. And, of course, consolidation of inventory may reduce transaction costs by streamlining fulfillment.

The fourth thing is about navigating the consolidated resource. Google introduced a major innovation with its ranking approach, by aggregating and mining the linking choices made by web page authors. Amazon is interested in rich interconnection through reviews, wish lists, reader selected lists, the various 'phrases' (capitalized and statistically improbable), and so on. Amazon provides a rich texture of suggestion [9]. In each case, simple aggregation is not good enough: also needed are effective ranking, recommending, and relating.

And finally, large web presences help aggregate demand. The level of use of a resource partly depends on the size of the population to which it is accessible. One aspect of the long tail argument is that the aggregation of demand – extending the population to which a resource is accessible – means that resources have a better chance of finding interested users. In other words, use will extend down the long tail. So, as discussed above, Netflix finds viewers for movies that might not move in a physical outlet, because Netflix aggregates demand across a larger population than a single physical store can.
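
The claim that a larger population gives a resource a better chance of finding an interested user can be illustrated with a toy calculation. The sketch below is not drawn from Anderson or from any library data. It assumes, purely for illustration, that each user independently has the same small probability p of wanting a given item, so the chance that at least one of N users wants it is 1 - (1 - p)^N. The function name and the numbers used are hypothetical.

```python
def chance_of_finding_a_reader(p_interest: float, population: int) -> float:
    """Probability that at least one user in the population wants the item.

    Simplifying assumption (not from the article): every user has the same
    probability `p_interest` of wanting the item, independently of the others.
    """
    if not 0.0 <= p_interest <= 1.0:
        raise ValueError("p_interest must be between 0 and 1")
    if population < 0:
        raise ValueError("population must be non-negative")
    # Complement of "nobody in the population wants the item".
    return 1.0 - (1.0 - p_interest) ** population


if __name__ == "__main__":
    p = 0.0001  # hypothetical: one user in ten thousand wants this title
    for n in (1_000, 100_000, 10_000_000):
        print(f"population {n:>10,}: {chance_of_finding_a_reader(p, n):.4f}")
```

With p held fixed, the probability rises steeply as the population grows, which is the sense in which aggregating demand pushes use further down the tail. Real demand is neither independent nor uniform across users, so the figures are indicative only.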
Google, iTunes, Amazon, eBay: the gravitational pull of these resources on the open web means that they have achieved a wide audience of potential buyers or sellers. This increases the chances that resources they disclose will rendezvous with interested consumers on the web. So, they aggregate demand by drawing users to them. However, increasingly they also go to users. Google, Amazon and eBay, for example, are very actively trying to reach into multiple user environments through the use of toolbars, APIs and other approaches.

Libraries and the long tail

So, now let's turn back to libraries, and focus on these two issues: the aggregation of supply, and the aggregation of demand. For convenience of discussion, I focus primarily on books, drawing in other resources occasionally. I hope readers can see how the discussion can be extended to cover other parts of collections.

The aggregation of supply and demand in libraries

Libraries have been subject to the same physical constraints as, say, bookstores, albeit within a different service context. The library collection is not limited to the current or the popular: the library has some responsibility to the historical record, to the full range of what has been made available as well as to what is now available. That responsibility varies by library type, and is variably exercised.

The library has met that responsibility in two ways: by assembling a local collection, and by participating in systems of extra- and inter-library provision. These latter systems may be organized in different ways; the resource-sharing consortium is a common pattern, and a library may belong to several.

The library collection is driven by local perception of need and available resources: collection development activities exist to balance resource and need. A large research library and a busy public library system will have different profiles, but both are influenced by physical constraint.
In the material world, the transaction costs of access to a distributed library collection are high, so those libraries that could afford it sought to amass large local collections in order to aggregate supply locally. Think of, for example, the large just-in-case research library collections. And, indeed, we are still measuring research library strengths by number of volumes. A busy public library may move towards the bookstore model. I was at a presentation recently about a busy public library system in an affluent suburban area. They turned over 15% of their stock per annum: they wanted stock to circulate and to keep it fresh for a demanding audience; just as in a bookstore, titles had to justify their occupation of limited shelf space.

Next I discuss the issues I mentioned above (transaction costs, data about choices and behaviors, inventory, navigation and aggregation of demand through major web presences) as they apply to libraries.

Transaction costs

A library user has a range of discovery tools and services that provide access to a fuller range of scholarly and learning materials. This in turn is supported by a well-developed apparatus of deposit libraries, resource sharing systems, union catalogs, cooperative collection development, document supply, and other collaborative and commercial services. This 'apparatus' may be imperfectly and intermittently articulated, but it is a significant achievement nonetheless. What an individual library may not be able to supply should be available within the overall system in which libraries participate.

However, this availability is bought at the expense of some complexity, which in turn means that the transaction costs of using the system are high enough that some needs go unrecognized or unmet. A library user may not be familiar with available tools or may not be aware that other materials are available. Local policies may restrict some types of access.
Thus, historically, one can say that while library services explicitly aim to aggregate supply and demand both to meet user needs and to maximize use of resources within an overall apparatus of provision (see Ranganathan's laws again), imperfect articulation of that apparatus means that users are variably served.

To make this more concrete, think about the D2D chain: discover, locate, request, deliver [10]. Here lack of integration increases transaction costs. By integration, I mean within processes (there are many discovery options, for example) and between processes (the processes are not always connected in well-seamed ways).

Discover. The discovery experience is a fragmented one. A user has a range of discovery tools available and may not always know which is the most suitable. This is especially the case with the journal literature, in which case the deployment of metasearch approaches is a partial response. Even for books, users may have to navigate a patchwork of catalogs to find what they are looking for; search costs are high [11]. What might one do? One approach is consolidation: fewer but larger pools of metadata to support discovery would help. Another is 'syndication', moving the metadata to where it might more readily rendezvous with the reader. I use syndication as a general term to include such ideas as letting metadata flow into citation managers, search engines and other resources, and exposing it in services upon which other applications may build. The latter is familiar to us from Amazon, which can make its data and services available in other interfaces through its APIs.

Locate. Having identified an item of interest, a user needs to find a service that will supply it. This may be as simple as noting a call number and walking to a shelf. Or it may involve a resolution service that actually provides several service options. Or it may involve a further discovery experience in a library resource if the item was originally found outside the library.
This latter case is especially interesting, as library users have many more discovery options outside the library than within it. What is needed is a way of connecting the discovery experience to a library service. Here COinS provides a potential approach, coupled with various browser tools [12].

Request. This is another transaction, which may involve one or more steps. It can be simple, as in placing a hold, or more complex if a form has to be filled out, and so on. Increasingly, libraries may want to route requests in several directions: allowing a user to buy from Amazon, initiate an ILL request, initiate a document supply request, or place a hold on the requested material.

Deliver. Again, several potential options exist for resource delivery, which can involve more or less difficulty depending on how the delivery options are presented and on the disposition of supplier and user. This ties interestingly to the inventory question, and I come back to this below.

You get the idea: at each stage, there are potentially many processes that need to be connected, and they potentially need to be connected to each other in different combinations. The better connected, the lower the transaction costs. Indeed, it is interesting to wonder if resolution services will move more to the center of library operations, as they are effectively 'service routers' connecting multiple discovery experiences to multiple fulfillment services.

Data about choice and behaviors

Transactional and behavioral data is used to adapt and improve systems. In the library community we have not yet fully exploited these opportunities. Examples of such data are holdings data (choices made by libraries), circulation and ILL data (choices made by users), and database usage data (choices made by users). We have few services yet that aggregate such data. Libraries are increasingly interested in using this data to refine services and build new services as discussed above.
Think of recommender services based on circulation data, for example. As new services and user behaviors co-evolve in changing digital spaces, it is likely that we will want to capture new forms of data.

Inventory

The historic library model has been physical distribution of materials to multiple locations so that the materials can be close to the point of need (as in the bookstore model). And again, in the network environment, of course, this model changes. Resources do not need to be distributed in advance of need; they can be held in consolidated stores, which, even with replication, do not require the physical buildings we now have. As we move forward, and as more materials are available electronically, we will see more interest in managing the print collection in a less costly way. We can see some of this discussion starting in relation to the mass digitization projects and the heightened interest in off-site storage solutions. In each case, there is a growing interest in being able to make investment choices that maximize impact – based, for example, on a better understanding of what is rare or common within the system as a whole, on what levels of use are made of materials, and so on. In fact, again looking forward some time, it would be good to have management support systems in place that make recommendations for moving to storage or digitization based on patterns of use, distribution across libraries, and an agreed policy framework.

There are two medium-term questions that are of great interest here. First, what future patterns of storage and delivery are optimal within a system (again, where a system may be a large library system, a state, a consortium, a country)? Think of arranging a system of repositories so that they are adjacent to good transport links, for example, collectively contracting with a delivery provider, and having better data intelligence for populating the repositories, based on patterns of use and demand.

Second, think of preservation.
Currently, we worry about the unknown long-term costs of digital preservation. However, what about the long-term costs of print preservation? I contend that for many libraries they will become unsustainable. If the use of large just-in-case collections declines, if the use of digital resources continues to rise, if mass digitization projects continue, then it becomes increasingly hard to justify the massive expense of maintaining multiple collections – especially where there is growing demand for scarce space. Long-term we may see a shift of cost from print to digital, but this can only be done if the costs of managing print can be reduced, which in turn means some consolidation of print collections.

As these questions push us towards a system-wide perspective, aggregating data about supply and demand gives a better sense of what is collectively held in the system, what is collectively being used in the system, and from this, how decisions about the optimum disposition of collections can be facilitated.

Navigation

Library aggregations have not exploited the structure of the data very effectively to support navigation. The interest in faceted browse, FRBR [13], recommendation, ranking by holdings or other data, and so on is testament to a realization that better ways to exploit the large bibliographic resource are needed. Ranking, recommending, and relating help connect readers to relevant materials and also help connect the more heavily used materials to potentially useful, but less used, materials.

Aggregation of demand

The library resource is fragmented. It is fragmented within the library (there are many databases to choose from; they may be organized in different ways in different libraries). It is fragmented across libraries, as discussed above. In the new network environment, this fragmentation reduces gravitational pull.
It means that resources are prospected by the persistent or knowledgeable user, but they may not be reached by others to whom, nevertheless, the resources are potentially useful. Additionally, the library resource cannot be very well assimilated into user workflows. The availability of RSS feeds, APIs, and other approaches is making it possible to insert the library into the user environment (rather than always expecting the user to come to the library environment), but we are only in the early stages in this regard.

There are two issues here. The first is that libraries may need to do more work to aggregate demand within their own institutions. One approach to this is to consolidate the library web presence (think of metasearch, for example) and to project library services into user workspaces (embedding database searches in course pages, for example).

The second issue is that it may be difficult for individual libraries to aggregate demand above the individual library level. Union catalogs and resource sharing systems have historically operated above an individual library level, and we are now seeing organizations that supply those services thinking about how to re-develop as major web presences that help aggregate demand (backed up by aggregated supply). Examples here are RedLightGreen, OpenWorldCat, and Libraries Australia. Library organizations are also very keen to be visible within the major web-based search engines and book selling sites. Of course, one way for a library to try to reach its local audience is to make its resources visible in these major web presences, which is where its users spend much of their time and attention.

This provides an interesting perspective from which to view Google Scholar and Google Book Search, in particular their interaction with libraries. Take Google Book Search: what Google is doing here is potentially aggregating demand for books, and it will be interesting to see what influence this has on their use.
Presumably a case has been made that there is potential interest in the full scope of those collections, or, in other words, in moving down the long bibliographic tail (and remember the figures I presented above about the current situation). They are also aggregating demand for books and journals through Google Scholar. And, to avoid frustrating users, they are aggregating supply behind the discovery experience. Hence they are working with resolver data and multiple suppliers to complete the locate/request/deliver chain for journal materials. In addition, they are working with OCLC to connect the Google Scholar discovery experience to the 'Find in a Library' option for fulfillment. What OCLC is doing is making metadata about those books available to the major search engines and routing users back to library services, to complete the D2D chain for books. To the extent that a large amount of material is made available through these services, Google is aggregating demand, aggregating supply, and reducing transaction costs.

Logistics and libraries

So, briefly, what are some consequences for libraries? Libraries have rich deep collections, and the aggregate library system is a major achievement. However, in our current network environment, libraries compete for scarce attention. This suggests that if the 'library long tail' is to be effectively prospected then the 'cost' of discovering and using library collections and services needs to be as low as possible.

This is a logistics issue. Logistics is about matching supply and demand in a timely fashion across a network of potentially many parties. Within a particular domain, this is what libraries have always done, and some of the recent innovation in libraries has been precisely to automate supply chains (think of resolution services, for example).

Here are some ways of improving aggregation of supply and demand:

Unify discovery experiences: Fragmentation is costly, and fewer but larger resources might help.
Project library discovery experience into other environments: search engines, browser tools, RSS aggregators, etc.

Better integrate D2D, both within operation (for example, combine request options – Amazon, place hold, ILL, ...) and between operations: The aim should be to be able to place a 'get it' button anywhere and guide the user through simple choices.

In the medium term, explore how 'inventory' and 'distribution' are managed across a system: This should be done whether a system is a library, a consortium, a state, or a country.

Utilize better 'intelligence' within the network: This involves better representing the entities within the network. It touches on the growing interest in 'registries' – registries of services (a registry of deep OPAC links, or OpenURL resolvers, or Z39.50 targets are examples here), registries of collections (a registry of database descriptions is an example), registries of institutions (see the very fine National Library of Australia Libraries Gateway for example), registries of policies (increasingly important, as libraries will organize within policy frameworks), and so on. In this context, it is interesting to reflect that the distinctive value of union catalogs is the holdings data: a union catalog is a registry of 'information object' data related to holding institutions. Collectively, the registry data discussed here will drive the applications that support 'library logistics'.

Provide transaction support: In an environment of multiple transactions between libraries it is useful to have a way of tracking and reconciling between libraries. OCLC's Fee Management service [14] is an example of a service that supports some classes of transaction. (Think of how PayPal has released various possibilities of interaction.)

Aggregate demand through significant web presences: If more users are exposed to library collections, the collections will be used more.
Of course, in some contexts demand from external users has been one reason for not more widely exposing collection information. However, the dynamics of the network have changed use. The major Internet search presences are often the first and last resorts of research, and fragmentation of library resources reduces their gravitational pull. Libraries are having to compete for the attention of their own users. They need to be in user environments, and the open web is now very much part of those environments. This leads to consideration of the discovery strategies mentioned.

Conclusion

Libraries collectively manage a long tail of research, learning and cultural materials. However, we need to do more work to make sure that this long tail is directly available to improve the work and lives of our users. Books, after all, are for use.

I mentioned Ranganathan near the beginning of this article. Ranganathan's five 'laws' have classic status in the library community. They express something that remains relevant even as contexts change. Think of 'book' as shorthand for the range of resources the library provides. I wrote about the 'long tail' in terms of aggregation of supply and aggregation of demand. In this context, aggregation of supply is about improving discovery and reducing transaction costs. It is about making it much easier to allow a reader to find it and get it, whatever 'it' is. Or, in other words, 'every reader his or her book'. Aggregation of demand is about mobilizing a community of users so that the chances of rendezvous between a resource and an interested user are increased. Or, in other words, 'every book its reader'. Finding better ways to match supply and demand in the open network will 'save the time of the user'.

How we do this is a part of a general reshaping of activities and organizations in a network environment. We need new services that operate at the network level, above the level of individual libraries.
These may consolidate D2D, or management of collections, or other services. They may be collaboratively sourced or provided by third parties. It does pose interesting questions about how resources are allocated to best achieve local impact and system-wide efficiencies. This change also shows that the library continues to be 'a growing organism'.

Acknowledgement

Discussion with my colleague Brian Lavoie improved this article; I remain responsible for its deficiencies.

Notes and References

[1] Ranganathan's Five Laws of Library Science remain a valuable touchstone <http://en.wikipedia.org/wiki/Five_laws_of_library_science>. They are listed there as: Books are for use. Every reader has his or her book. Every book has its reader. Save the time of the reader. The library is a growing organism.

[2] The source of this number is OCLC marketing, based on available data.

[3] Brian Lavoie, Lynn Silipigni Connaway and Lorcan Dempsey. Anatomy of Aggregate Collections: The Example of Google Print for Libraries. D-Lib Magazine, Vol. 11, No. 9, September 2005. <http://dx.doi.org/10.1045/september2005-lavoie>. (Reprinted in Zeitschrift für Bibliothekswesen und Bibliographie, Vol. 52, No. 6, 2005. pp 299-310).

[4] This data is from unpublished work by Lynn Silipigni Connaway and Edward T. O'Neill.

[5] This short article adapts an earlier piece: Lorcan Dempsey's Weblog. Libraries, Logistics and the Long Tail. February 15, 2006. <http://orweblog.oclc.org/archives/000949.html>. Some responses to that post are discussed in Lorcan Dempsey's Weblog. Systemwide activities and the long tail. <http://orweblog.oclc.org/archives/000955.html>.

[6] Chris Anderson. "The long tail". Wired Magazine, Issue 12.10 – October 2004. <http://www.wired.com/wired/archive/12.10/tail.html>.

[7] See Chris Anderson's long tail web site at <http://www.thelongtail.com/>.

[8] Usage data seems too flat an expression for what I mean here.
Elsewhere I have used the phrase 'intentional data', modeled after John Battelle's characterization of the 'database of intentions'. This is the accumulated usage data of the internet search engines. See: John Battelle. "The Search". New York: Portfolio, 2005. page 6. <http://www.worldcatlibraries.org/wcpa/isbn/1591840880>.

[9] Lorcan Dempsey's Weblog. The simple search box and the rich texture of suggestion. March 12, 2006. <http://orweblog.oclc.org/archives/000966.html>.

[10] Lorcan Dempsey's Weblog. Discovery, locate ... horizontal and vertical integration. November 20, 2005. <http://orweblog.oclc.org/archives/000865.html>.

[11] See the article by Swarthmore faculty member, Tim Burke, Burn the Catalog, January 20, 2004. <http://www.swarthmore.edu/SocSci/tburke1/perma12004.html>.

[12] COinS. OpenURL COinS: A Convention to Embed Bibliographic Metadata in HTML. <http://ocoins.info/>.

[13] FRBR (Functional Requirements for Bibliographic Records). See, What is FRBR? <http://www.loc.gov/cds/FRBR.html>.

[14] OCLC's Fee Management service, <http://www.oclc.org/resourcesharing/features/feemanagement/default.htm>.

Copyright © 2006 OCLC Online Computer Library Center, Inc.


doi:10.1045/april2006-dempsey