Visualizing Keyword Distribution Across Multidisciplinary C-Space (Appendix)

By Donald Beagle

Appendix A: Literature Review

The question of classification and context in online searching has been discussed at least since the 1980s, when Elaine Svenonius described several options in her paper "Use of Classification in Online Retrieval" [1]. Not coincidentally, this was the period when the first generation of online catalogs revealed the growing user preference for keyword searching, today known as "googling," over the use of cross-referenced controlled vocabulary. Svenonius observed that classification might be used in online systems to increase the number of relevant documents retrieved by broadening a keyword search, or it could aid in focusing an online search by enabling browsing, or "...by contextualizing vague words, such as 'freedom,' within perspective hierarchies, the computer might guide a user from an ineptly or imprecisely articulated search request to one that is quite specific" [2]. As an example, Svenonius used the call numbers and captions of the DDC, describing a "hierarchy of meaning" in which higher-level captions are embedded or implied in lower ones, allowing one to broaden a search by successively dropping the right-most digits of a classification number. She then went on to speculate that a future use of online classification apparatus might be "...to provide collocation of a kind not possible in manual systems, viz., the collocation of documents that are like each other by virtue of sharing linguistic features in common such as similar index terms or similar citations or a similar natural language vocabulary" [3]. This, of course, is very similar to the Asimov-Beagle notion of keyword vector clusters.
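The broadening strategy Svenonius describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not her implementation: the function name broaden_ddc is hypothetical, the input is assumed to be a DDC-style number of three main digits with an optional decimal extension, and a digit dropped from the three-digit main number is replaced by a zero so that each result is still a well-formed class number.

```python
def broaden_ddc(number: str) -> list[str]:
    """Return the class number followed by successively broader classes.

    Assumes (for illustration) a DDC-style number such as "621.382":
    three main digits, optionally followed by a decimal extension.
    """
    main, _, decimals = number.partition(".")
    if len(main) != 3 or not (main + decimals).isdigit():
        raise ValueError(f"not a DDC-style number: {number!r}")

    chain = [number]
    # Drop decimal digits one at a time, right to left.
    while decimals:
        decimals = decimals[:-1]
        chain.append(f"{main}.{decimals}" if decimals else main)
    # Within the main number, zero out digits to reach the division and main class.
    for keep in (2, 1):
        broader = main[:keep] + "0" * (3 - keep)
        if broader != chain[-1]:
            chain.append(broader)
    return chain


print(broaden_ddc("621.382"))
# ['621.382', '621.38', '621.3', '621', '620', '600']
```

Each step in the returned list corresponds to a broader caption in the schedule, so a catalog could re-run the search at each level until enough relevant records are retrieved.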

Three years after the Svenonius article, Lois Mai Chan authored a study called "Library of Congress Classification as an Online Retrieval Tool: Potentials and Limitations" [4]. Noting a DDC-enhanced catalog built as a research project jointly sponsored by OCLC, Forest Press, and the Council on Library Resources, she wrote: "Test results for the DDC-enhanced test catalog were revealing, particularly in terms of how many unique subject-rich terms were generated. The study as a whole indicates that incorporating a classification scheme in the online catalog can provide enhanced subject access that is not possible through the alphabetical approach alone" [5]. Review of the findings suggests that much of the increased effectiveness of the DDC-enhanced catalog was related to its enlarged entry vocabulary, provided by way of captioning. Chan argued that the rich variety of terms in the LCC schedules and captions has high potential as a source for enriching entry vocabulary.

Beyond the question of enriched vocabularies, Chan followed Svenonius's thinking and went on to explore ways that LCC might be useful in expanding the range of strategies available to online searchers. She described these as: 1) shelf-order browsing, 2) full-schedule browsing, 3) coordinate subject browsing, where searchers could see arrays of coordinate subjects without their intermediate breakdowns, and 4) outline browsing, where hierarchical breakdowns are indicated by indentations of the schedule captions [6]. Notice that Chan's fourth option would make specific use of a basic visualization function.

An elaborate attempt to enrich online subject searching through classification was described in 1994 by Mary Micco and Rich Popp [7]. The ILSA project (Improving Library Subject Access) again involved the use of captions for contextual strategy and vocabulary enrichment. Here, natural language mapping was used to relate normal keywords used in searching with LCSH controlled vocabulary terms, forming keyword clusters. With all keywords indexed in a central dictionary, the user would then be guided from any specific keyword to all the subject clusters in which that term appeared. In turn, the subject headings anchoring these keyword clusters were then associated with DDC classification captions and ultimately the titles found in those classes. The ILSA interface was even designed around some basic visualization iconography, with branching symbols representing grouping, expansion, filtering, and tangential exploration. In their conclusion, Micco and Popp stated: "We have explored the relationship between the class number and the controlled subject headings or keywords assigned to the various topics....We have taken advantage of the captions for the classification numbers to provide a context for the subject headings retrieved" [8].
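The chain of associations described for ILSA (keyword, to subject-heading cluster, to class number and caption) can be pictured as a small data structure. The sketch below is a hypothetical illustration rather than the ILSA design itself: the Cluster type, the function names, and the sample records are all invented for the example, and the class numbers and captions are placeholders, not real LCSH or DDC data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    heading: str          # controlled subject heading anchoring the cluster
    keywords: frozenset   # natural-language keywords mapped to the heading
    class_number: str     # classification number tied to the heading
    caption: str          # caption supplying context for the heading


# Invented sample records; the class numbers and captions are placeholders.
SAMPLE_CLUSTERS = [
    Cluster("Civil liberty", frozenset({"freedom", "rights", "liberty"}),
            "class-A", "Sample caption: political science"),
    Cluster("Free will", frozenset({"freedom", "choice", "determinism"}),
            "class-B", "Sample caption: philosophy"),
    Cluster("Thermodynamics", frozenset({"entropy", "heat"}),
            "class-C", "Sample caption: physics"),
]


def build_dictionary(clusters):
    """Index every keyword to the clusters in which it appears."""
    index = defaultdict(list)
    for cluster in clusters:
        for word in cluster.keywords:
            index[word.lower()].append(cluster)
    return index


def contexts_for(keyword, index):
    """Return (heading, class number, caption) for each cluster containing keyword."""
    return [(c.heading, c.class_number, c.caption)
            for c in index.get(keyword.lower(), [])]


index = build_dictionary(SAMPLE_CLUSTERS)
for heading, number, caption in contexts_for("freedom", index):
    print(f"{heading} [{number}] {caption}")
```

With these sample records, a search on the vague keyword "freedom" surfaces two distinct contexts, which is the kind of caption-based disambiguation the ILSA authors describe.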

Appendix B: Theoretical Implications

Beyond these practical studies there exists a backdrop of theoretical discussion, including assertions (often by non-librarians) that subject classification is no longer a viable or necessary concept, particularly for digital collections. A full exploration of theory is beyond the scope of this paper, but I must at least acknowledge a few arguments and counter-arguments. Nicholas Rescher is an example of one who has argued for a system of order that is not hierarchical, but rather like chain-mail-work interlinkage reminiscent of medieval armor [9]. Rescher's school of thought has been continued by a number of theorists, including Roger Frye of the Santa Fe Institute [10]. Jeremy J. Shapiro and Shelly K. Hughes authored a brief but insightful conference paper summarizing several trends: a) the growth in the sheer volume of information, making general classes and descriptors less useful; b) taxonomic complexification in the sciences, pointing to new modes of cognitive inter-relations and ordering; c) interdisciplinary and transdisciplinary research that stresses internal relations of interdependence that are poorly grasped by disjunctive categories; and d) post-modern awareness of the limitations of rationalistic frameworks that were themselves based on neo-Platonic ontologies that asserted the primacy of the abstract over the concrete [11].

To supplement or replace such ontologies, Shapiro and Hughes offer the notion of personal meaning schemes, which they relate to the hypertext infrastructure of the World Wide Web, or in their words, "...a global system for organizing information and knowledge with a simple and viable non-hierarchical infrastructure" [12]. However, while the Shapiro-Hughes paper is thought-provoking, it fails to fully differentiate the top-down hierarchical schema of DDC from the bottom-up enumerative schema of LCC. The authors also overlook the fact that some hierarchical substructures are embedded within the Web, and that one can sometimes navigate among URL substrings by dropping right-most elements in a way similar to Svenonius's example of navigating DDC subclass strings. For example, one can go to the Shapiro-Hughes conference paper itself online at <http://www.iath.virginia.edu/ach-allc.99/proceedings/shapiro.html>, or one can shorten this URL string to <http://www.iath.virginia.edu/ach-allc.99> to get an overview of the more general scope of the International Humanities Computing Conference of which their paper was a part. And none of the above theorists has adequately accounted for citation network research that appears to lend classification an additional measure of validity (see Leydesdorff) [13].
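The URL analogy can be made concrete with a short Python sketch that drops right-most path elements one at a time, much as one drops right-most digits from a DDC number. The function name broaden_url is hypothetical, and the sketch only generates candidate addresses; whether a server actually returns a page at each shortened URL is not guaranteed.

```python
from urllib.parse import urlsplit, urlunsplit


def broaden_url(url: str) -> list[str]:
    """Return the URL followed by successively shorter parent URLs."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    chain = [url]
    while segments:
        segments.pop()  # drop the right-most path element
        path = "/" + "/".join(segments)
        chain.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return chain


for candidate in broaden_url(
    "http://www.iath.virginia.edu/ach-allc.99/proceedings/shapiro.html"
):
    print(candidate)
```

For the Shapiro-Hughes paper, the third address produced is the conference-level URL cited above, and the last is the host's root.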

My contribution to this discussion came ten years after the Asimov conversation, in "Libraries and the Implicate Order: A Contextual Approach to Theory" [14], where I suggested that knowledge growth exhibits a pattern of order described by the physicist David Bohm as implicate order [15]. Implicate order describes any case where the totality of a system or structure is encoded or represented within each constituent subsystem or substructure, such as the image of a tree extending its branches, or on another level, a living organism where each cell contains a DNA encoding of information formative to the whole. I further related this to the thought experiment of Harvard cosmologist David Layzer, who studied the flow of information in a system as a flow of probability fluid through a phase-space divided into macroscopic and microscopic cells. Layzer showed that probability fluid expands not by uniformly spreading while changing its density, "...but by sending out 'fingers' that grow longer and narrower and more numerous as the system evolves....As probability fluid extends fingers at smaller and smaller scales, the total hypervolume occupied by the fluid remains constant but the shape of the occupied region grows steadily more complex" [16].

The similarity between the images of knowledge as a tree extending its branches, a curriculum of learning where general trunks of disciplines extend into branching specializations, and the image of probability fluid expanding through phase-space by extending ever-finer fingers into smaller sub-domains, seems worth further study. Could a classification system map the extending fingers of probability fluid by meaningfully labeling regions of macroscopic and microscopic cells? Could we visualize a flow of knowledge expansion through c-space in a similar manner? I concluded the paper by borrowing a concept from Ingetraut Dahlberg's "Ontical Structures and Universal Classification" [17]. To capture knowledge growth, Dahlberg felt that the key feature of classification was not a fixed taxonomy of subject fields, but rules for the construction of new classificatory statements. The rules governing each classificatory statement would be formed according to the system's most general statement about the overall organization of knowledge. Dahlberg posited that it would thus be possible to show that the macrostructure of knowledge is mirrored in any microstructure (e.g., implicate order) by relating successive levels of classificatory statements.

My article prompted a number of responses in library literature. Rafael Capurro expanded on certain precepts in his 1989 article "Towards an Information Ecology" [18]. Joseph Nitecki offered further interpretation in his 1995 work "Philosophical Aspects of Library Information Science in Retrospect" [19], where he viewed the article as a dialectic research model expressed within an abstract process of theory-building. Archie L. Dick (whom Nitecki also characterized as a dialectician) revisited the question in his 1999 study "Epistemological Positions and Library and Information Science," referring to my article as an example of "holistic perspectivism" being applied to the growth of recorded knowledge [20].

In 1999, Albert-László Barabási and colleagues at Notre Dame reported that the growth of the World Wide Web, and more specifically the topology resulting from that growth, exhibits a pattern comparable to the growth of a living plant [21]. Barabási added that what prevents any such system in the real world from settling into a static equilibrium is the phenomenon of novelty. Years earlier, Layzer had similarly commented: "This view of the world evolving in time differs radically from the one that has dominated physics since the time of Newton. The present moment always contains an element of genuine novelty and the future is never wholly predictable" [22]. Layzer's theory implies that the universe is unfolding in time but not necessarily devolving toward a thermodynamic equilibrium. Instead, it may be becoming more complex and richer in information. Knowledge, libraries, and the World Wide Web may be viewed as particular examples, with c-space potentially offering a range of interfaces for visualizing and querying the unfolding complexity.

References

[1] Svenonius, Elaine. "Use of Classification in Online Retrieval." Library Resources & Technical Services. 27(1) Jan/Mar 1983. pp. 76-80.

[2] Svenonius. p. 79.

[3] Svenonius. p. 79.

[4] Chan, Lois Mai. "Library of Congress Classification as an Online Retrieval Tool: Potentials and Limitations." Information Technology & Libraries. September 1986. p. 181.

[5] Chan. p. 181.

[6] Chan. p. 186.

[7] Micco, Mary and Popp, Rich. "Improving Library Subject Access (ILSA): A Theory of Clustering Based in Classification." Library Hi Tech 45(12:1) 1994. pp. 55-66.

[8] Micco and Popp. p. 65.

[9] Rescher, Nicholas. Cognitive Systematization: A Systems-Theoretic Approach to a Coherentist Theory of Knowledge. Totowa: Rowman and Littlefield, 1979.

[10] Frye, Roger. "The Impact of Complexity Science on Science and Technology Information." Evolving Digital Libraries, Oct 21 - 22, 2002, Santa Fe, New Mexico, at <http://www.aisti.org/events/mini02/index.php>.

[11] Shapiro, Jeremy J. and Hughes, Shelly K. "The Personal Meaning Scheme as Principle of Information Ordering: Postmodernism, Transdisciplinarity, and the Ontology of Classification." 1999 Joint Annual Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, June 9-13, 1999, University of Virginia, Charlottesville, Virginia, at <http://www.iath.virginia.edu/ach-allc.99/proceedings/shapiro.html>.

[12] Shapiro and Hughes.

[13] Leydesdorff, Loet. "Dynamic and Evolutionary Updates of Classificatory Schemes in Scientific Journal Structures," Journal of the American Society for Information Science and Technology 53(12) October 2002. p. 991.

[14] Beagle, Donald. "Libraries and the Implicate Order: A Contextual Approach to Theory," Libri: International Library Review 38(1) March 1988. pp. 26-44.

[15] Bohm, David. Wholeness and the Implicate Order. London: Routledge & Kegan Paul, 1980.

[16] Layzer, David. "The Arrow of Time." Scientific American. 233(6) 1975. p. 63.

[17] Dahlberg, Ingetraut. Ontical Structures and Universal Classification. Bangalore: Sarada Ranganathan Endowment for Library Science, 1978.

[18] Capurro, Rafael. "Towards an Information Ecology." NORDINFO International Seminar "Information and Quality," August 23-25 1989, Royal School of Librarianship, Copenhagen. Proceedings: I. Wormell ed.: Information Quality. Definitions and Dimensions. London: Taylor Graham. 1990, pp. 122-139. Also at <http://www.capurro.de/nordinf.htm>.

[19] Nitecki, Joseph Z. Philosophical Aspects of Library Information Science in Retrospect. [5.4.10.4] Volume 2 of The Nitecki Trilogy. 1995. Available as ERIC 381 162. Also at: <http://www7.twu.edu/library/nitecki/aspects/ch-05.html>.

[20] Dick, Archie L. "Epistemological Positions and Library and Information Science." Library Quarterly 69(2) July 1999. p. 317.

[21] Cohen, David. "All the World's a Net," New Scientist, 174(2338) April 13, 2002.

[22] Layzer, p. 69.
