David E. Wojick
Walter L. Warnick
Bonnie C. Carroll
With the United States federal government spending over $130 billion annually for research and development, ways to increase the productivity of that research can have a significant return on investment. It is well known that all scientific advancement is based on work that has come before. Isaac Newton expressed this thought most eloquently in 1676, when he wrote, "If I have seen further than others, it is by standing on the shoulders of giants."
The process by which scientific knowledge is spread is called diffusion. Because scientific advancement builds on earlier work, it is important to better understand and measure the benefits of this diffusion of knowledge. In particular, it is important to understand whether advances in Internet searching, such as simultaneous, ranked searching of distributed digital collections made broadly available via the Internet, can speed up the diffusion of scientific knowledge and accelerate scientific progress. Near-term opportunities to further speed up knowledge diffusion continue to emerge. To help craft a strategy for converting these opportunities into reality, research is needed on the impact that speeding up knowledge diffusion has on the advancement of science.
This article discusses these issues and describes research being conducted by the Office of Scientific and Technical Information (OSTI) of the United States Department of Energy (DOE) under its strategic initiative, Innovations in Scientific Knowledge and Advancement (ISKA).
Diffusion of Scientific Knowledge
Almost all communication, whether spoken or written, constitutes the sharing of knowledge. Although much of this knowledge is personal and local, our civilization is based on the widespread use of general knowledge. One of the most eloquent proponents of the diffusion of knowledge was Thomas Jefferson, who in 1786 said, "I think by far the most important bill in our whole code is that for diffusion of knowledge among the people. No other sure foundation can be devised for the preservation of freedom and happiness." 
Scientific publication, in all its myriad forms, is a huge system of deliberate knowledge diffusion. As collectors, organizers, and disseminators of published knowledge, all libraries, including digital libraries, have knowledge diffusion as their primary mission.
Science is organized into various research communities, each pursuing a particular set of scientific problems within its discipline. Most of the knowledge produced within a given community stays within that special scientific arena because the nexus of interactions, both interpersonal and through publication exchange, is there. However, as we see from the increasingly interdisciplinary nature of research today, research results from one community are increasingly useful in a wide array of other communities as well.
With the goal of advancing science, the principal concerns are knowing how knowledge flows within science and knowing how new knowledge and technology flow into science. There are two different kinds of knowledge flow: (a) forward, or within-community, flow and (b) lateral, or between-communities, flow. These kinds of flow are of special interest because diffusion of scientific knowledge is essential for scientific progress, and it is believed that a corollary exists: as diffusion of knowledge is sped up, scientific progress will be accelerated as well.
Increasingly, scientific information is in digital format, and today Internet search is the principal means by which the outward flow of this information is facilitated and determined. To address the complexity of the search issue, we use the term global discovery for the act of searching across heterogeneous environments and distant communities.
There are thousands of scientific research communities, with millions of researchers and thousands of journals and multiple data sources. It is challenging for a particular scientist to identify the appropriate combination of digital resources to expand his or her research horizons, but this challenge can be better met through global discovery. Global discovery has the potential to facilitate the advancement of science. If scientists could easily discover the initial breakthroughs being made in communities other than their own, then scientific knowledge diffusion would be greatly accelerated. Thus, global discovery itself has become a necessary focus area for research.
Significant strides have been made in global discovery in recent years, but the vast majority of scientific information resources continue to be held in deep web databases that many search engines cannot fully access. Some search engines such as Google Scholar are attempting to change this situation by harvesting small parts of the deep web, but at the time of writing, this remains an effort in progress.
Bibliographic databases offer information about resources, but they contain only a tiny fraction of the total content of a resource, such as a 100-word abstract of a 10-page article or a 100-page report. The problem with this limited information is that it often leaves important content undiscoverable, because abstracts usually focus on community-specific aspects of the research, while more global aspects, such as mathematical technique, are barely touched on, if at all.
Thus, the basic problem is that while vast quantities of scientific information are available in principle via the web, there is still no simple way for a scientist to get to it all. The information is mostly available only in community-specific databases. Because of this distribution, in-depth global discovery still proceeds primarily on a community-by-community basis. In most cases, the labor involved in such searching has been prohibitive; thus, until now global discovery has been achieved only to a very limited degree.
However, another way to improve global discovery is emerging, one that promises to greatly extend every scientist's capability. This new approach involves the simultaneous, ranked, federated, full text search of multiple, scientific databases across many communities. In principle, all the scientific content accessible via the web can be searched at once this way; in actuality, we are still far from that goal, because federated search is difficult and expensive to set up.
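The idea of simultaneous, ranked, federated full-text search can be illustrated with a minimal sketch. It is not a description of how any OSTI system is implemented. The collection names, the sample documents, and the simple term-count scoring are all hypothetical, chosen only to show several independent collections being queried at once and their results being merged into a single ranked list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent, community-specific collections.
# A real federation would query remote databases over the network.
COLLECTIONS = {
    "physics_reports": [
        {"title": "Plasma turbulence report",
         "text": "wavelet analysis of plasma turbulence data"},
        {"title": "Reactor materials survey",
         "text": "corrosion of reactor materials under irradiation"},
    ],
    "biology_preprints": [
        {"title": "Gene expression rhythms",
         "text": "wavelet methods reveal rhythms; wavelet spectra of gene expression"},
    ],
    "geoscience_articles": [
        {"title": "Seismic noise study",
         "text": "seismic noise filtered with a wavelet transform"},
    ],
}


def search_collection(name, documents, query):
    """Search one collection's full text and score each matching document."""
    terms = query.lower().split()
    hits = []
    for doc in documents:
        words = doc["text"].lower().replace(";", " ").split()
        # Assumed scoring rule: number of occurrences of the query terms.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append({"source": name, "title": doc["title"], "score": score})
    return hits


def federated_search(query, collections=COLLECTIONS):
    """Query every collection concurrently, then merge into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(collections)) as pool:
        futures = [
            pool.submit(search_collection, name, docs, query)
            for name, docs in collections.items()
        ]
        merged = [hit for future in futures for hit in future.result()]
    # Rank across all sources: highest score first, ties broken by title.
    return sorted(merged, key=lambda hit: (-hit["score"], hit["title"]))


if __name__ == "__main__":
    for hit in federated_search("wavelet"):
        print(hit["score"], hit["source"], hit["title"])
```

In this toy example, a single query for a mathematical technique surfaces documents from three different communities in one ranked list, which is the kind of lateral discovery described above. The expense the text refers to lies in what the sketch omits: connecting to many heterogeneous remote databases and ranking their results on a comparable scale.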
The OSTI Initiative to Improve Global Discovery
The mission of the Office of Scientific and Technical Information (OSTI) is to facilitate science by disseminating information, especially research results. Today OSTI includes a digital library with over 10 million pages online. Over 100,000 DOE research reports are available online, as well as several large bibliographic collections and numerous other resources.
OSTI has also been a leader in the simultaneous, ranked, searchable federation or aggregation of large digital collections that reside elsewhere. This includes creating the ePrint Network, which aggregates 35 science preprint databases, as well as operating Science.gov, the portal supported by a 14-agency consortium with over 50 million pages online. Currently, OSTI is engaged in a strategic initiative, Innovations in Scientific Knowledge and Advancement (ISKA), which focuses on these and related innovations.
As the ISKA initiative matures, its goal is to significantly enable and accelerate advances in science. To date, ISKA's program has been modest, but it is nevertheless worth noting. It includes the following key components:
Science.gov (http://www.science.gov) readily shows the value of investigating a topic by drawing information resources from a number of disparate scientific domains. Science.gov searches the open access literature, especially the so-called "gray literature" of research reports, preprints, etc. But this is still just a small fraction of the potentially available science information, and Science.gov is highly selective in some ways. OSTI also has other, smaller federations of digital resources and libraries with global discovery capability. This interdisciplinary approach to scientific discovery is a small, yet important, blueprint of the global search facility of tomorrow. Specifically, what ISKA wants to show is that simultaneous, ranked search of distributed digital collections, made broadly available via the Internet, speeds up diffusion of scientific knowledge and accelerates scientific progress. In particular, it facilitates global discovery.
What We Know from Searching the Scientific Literature
Since the corollary to our premise that science depends on diffusion is that speeding up diffusion will accelerate scientific progress, it is important that ISKA have evidence to support the directions it is taking and the assertions it is making. We undertook a literature search to see what research is being done and what is known about the diffusion of scientific knowledge, especially with regard to the assertion that speeding up knowledge diffusion will accelerate scientific progress. In particular, we were looking for insights, methods, or tools that could be applied to measure or evaluate specific ways to speed up diffusion. Our findings were:
There are three strategies to better understand and promote the advancement of science:
All three strategies address the advancement-of-science corollary.
The OSTI ISKA initiative focuses on the third strategy, but in the process of setting out its agenda it has confronted both the need for conceptual context and the importance of reviewing the body of knowledge that exists. With $130 billion being spent annually on federal science and technology, this challenge is significant enough that it should be addressed more comprehensively. The payoff is that greater sharing of intellectual endeavors will enhance the discovery process and thereby enable scientific breakthroughs to occur more quickly, for the ultimate benefit of society.