Beneath the Metadata: Some Philosophical Problems with Folksonomy

D-Lib Magazine
November 2006

Volume 12 Number 11

ISSN 1082-9873

Elaine Peterson
Associate Professor / Information Resources Specialist
Montana State University
<elainep@montana.edu>

Background

People have been trying to classify and organize information for thousands of years. There are many examples of cataloged items in ancient repositories, including items in the Library of Alexandria in Egypt. Taxonomy arose as an attempt to organize information about plants and animals in the physical world, and Aristotle is often considered the father of classification or taxonomy. In his Categories, he names Substances (nouns) and determines the nine distinctive things that can be said about a particular thing [1]. How we ultimately name something reflects the category to which we assign it. Through the development of categories, one is trying to answer the question, "What is it?" Taxonomic methodology has also become important in mathematical set theory through discussions of set, class, aggregate, and collection [2]. Neo-Aristotelian realists are as interested today in taxonomy as they are in ontology. Accurate classification is important in most, if not all, disciplines.

In today's networked world of digital information, classification has become very important. One gathers, collects, and shares resources, making the organization of databases and websites crucial. Items that are different or strange can become a barrier to networking [3]. Therefore, with the advent of the Internet, the structure and consistency of classification or indexing schemes have taken on new relevance.

Traditional Classification

Although they do not construct ontologies, catalogers and indexers are the inheritors of the Aristotelian tradition of categorizing things. Catalogers work with information (books or journal articles) instead of biological entities like animals, but there are similarities in the processes of classification they use. When a cataloger applies a subject term to a book or a keyword descriptor to a journal article, he or she attempts to apply specific, relevant terms to the work.
The cataloger is naming the work and distinguishing it from other works, yet is also grouping the work with similar entities. Keeping in mind contraries, particulars, and categories, a cataloger applies basic Aristotelian principles. A book on horses would receive the specific subject heading Horses even though many of the horses described in the book may be different from each other, e.g., Arabian horses vs. Thoroughbred horses, or white horses vs. black horses. Moreover, if one assigned the subject heading "white horse" to a photograph, it would be incorrect to also assign the subject heading "black horse". Aristotelian contraries do matter in traditional classification systems.

Perhaps the most important philosophical underpinning of traditional classification is the principle "A is not B". Even if a cataloger did not hold an underlying metaphysical stance that there is a particular way things are, the necessity of classifying and grouping physical objects has placed catalogers into that framework. Book A might be related to Book B, but a choice has to be made between them when classifying them. Classical cataloging is restrictive rather than expansive. It is irrelevant that digital items can reside in more than one place, since one is talking about a classification scheme, not about the items themselves.

Another foundation of classical cataloging is the priority of the author's intent. "The cataloger must envisage the needs of the reader, endeavoring in every way to make it a simple process for him to find books. He should, like the librarian, adopt a neutral stand between the reader and his books, giving emphasis to what the author intended to describe rather than to his own views" [4]. This quotation remains timely: it recognizes the needs of the information seeker, but settles on a preference for the author's intent.
The acceptance and prioritization of the author's intent as the way the item should be understood, and therefore classified, have traditionally been part of the practice of cataloging. Recognizing an author's intent can sometimes be difficult; nevertheless, the goal is to recognize the author's intent over others' interpretations.

Folksonomy

In the digital information world, folksonomy has emerged as an alternative to traditional classification. An article in Wikipedia states: "A folksonomy is an Internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorize content such as Web pages, online photographs, and Web links" [5]. The labels are called "tags", and they can make a search engine more effective in finding content because the vocabulary used for tags is user-generated rather than author-generated.

It seems appropriate to define "folksonomy" using Wikipedia, since Wikipedia itself is a good example of a social network of individuals contributing to a work. Wikipedia allows any person on the Internet to contribute articles to it without judgment from others. (Hammond and others have preferred to call folksonomies "social bookmarking", thereby emphasizing the social networking often achieved by a collaborative effort of assigning subjects and tagging an online piece [6].)

What can folksonomies or collaborative tagging achieve? "In contrast to professionally developed controlled vocabularies (also called taxonomies), folksonomies are unsystematic and, from an information scientist's point of view, unsophisticated; however, for Internet users, they dramatically lower content categorization costs because there is no complicated, hierarchically organized nomenclature to learn. One simply creates and applies tags on the fly" [7]. Guy and others recognize the overall importance of folksonomies, but also note some problems with them, including typographical errors or spelling variations [8].
The overall usefulness of folksonomies is not called into question, only how they can be refined without losing the openness that makes them so popular.

Weinberger and others claim that folksonomies provide benefits beyond cost savings. They view the social aspect of tagging as the feature of folksonomies that matters most to users. Weinberger points out that "readers, not just authors, get to tag objects. An author is an authority when it comes to what she intended her work to be about, but not about when [sic] it means to others. When it comes to searching, what a work means to the searcher is far more important than the author's intention" [9]. Weinberger also mentions as benefits financial savings and the elimination of bureaucracies of catalogers and indexers, but the salient feature is the relevance of the subject terms to the searcher rather than to the author. Hence, since there are multitudes of Internet searchers, a multitude of tags is possible and indeed welcome.

This candidness reveals more about the philosophy behind folksonomies. Philosophical relativism appears to be the underlying philosophy behind folksonomies. Because of those underpinnings, it is possible to jettison the limitations of a traditional classification statement such as "A is not B". In a folksonomy system, "A is relative to B", because each item's index terms will depend on the individual user and the tags he or she decides to use. A philosophy of relativism allows folksonomy to draw on many users with various perceptions to classify a document instead of relying on one individual cataloger to set the index terms for that item. Thus, classification terms become relative to each user. Certainly all individuals' perceptions are influenced by their own experiences and cultures, whereas the professional cataloger, even if trying to be unbiased, has only one viewpoint.
Yet to include all viewpoints opens up a classification scheme to the inconsistency that allows a work to be both about A and not about A. There is no question that an individual might have a personal, valid interpretation of a text. That is not the issue. The issue is that adding enough of those individual interpretations through tags can lead to inconsistencies within the classification scheme itself.

Take, for example, a dissertation entitled "Chemical effects of biofilm colonization on steel parts of automobiles in United States cities". Based on the author's intention, the keywords assigned to the dissertation by a traditional cataloger might be: Biofilms; Corrosion. However, an Internet reader could interpret the same dissertation as a work on the destruction caused by rust in the Rustbelt, and might then tag it: Destruction; Detroit. The folksonomy tag headings, Destruction; Detroit, would persist on the Web as access points to the dissertation.

A search of the Web reveals sites, including many university websites, where folksonomy tags are now being employed, including an increase in the use of folksonomies for classifying Electronic Theses and Dissertations (ETDs). Although folksonomy tags began with bookmarking personal web sites, or grouping digital images in Flickr, some are now using such tagging to index academic journal articles and university dissertations.

Although folksonomy practitioners point out some problems with the practice, the problems they raise typically center on language. Guy mentions linguistic issues. Weinberger admits that differing terms might be applied when employing folksonomies, but maintains that such differences will be minimal and eventually sorted out. For example, he refers to one user assigning a heading of "San Francisco", while another uses "Frisco".
Some advocates of folksonomies have recognized that a democratic approach to Web cataloging also contributes to the abundance of irrelevant or inaccurate information, usually referred to as "Meta Noise". Meta Noise can be inadvertent (spelling "white horse" as "whit horse"), inaccurate (tagging White Horse when the image is of a white cat), or irrelevant (using an esoteric tag known to very few).

Overall, many will view folksonomic classification of the Web, as Weinberger does, as "messy and inelegant and inefficient, but it will be Good Enough" [10]. If Weinberger means that it might be good for allowing individual users to supply their own tags, he might be correct. However, if he means that it will be good for the average user, his claim is questionable, since folksonomies will not produce an efficient index.

Some of the problems with folksonomies can be traced to problems inherent in relativism. The first is that folksonomy tags are not merely "messy"; they can be inaccurate. Because they assume a non-Aristotelian stance, the tags allow contraries to exist. If I tag an article with the subject "white horse" and you tag it "black horse", that is all right since both can coexist in a folksonomy classification scheme. The problem with relativism is the question: "relative to what?" Each Internet user is bringing to bear on the item a different linguistic and cultural background. Although this is an inherent strength of folksonomies (since it recognizes many valuable individual perspectives), it can also lead to the existence of contraries.

A folksonomy advocate might reply that this is not true since the tags are relative to each user. Yet, within the database itself, tagging allows an inconsistency to exist. This situation is, perhaps, the strongest criticism one could make of folksonomies. A dissertation displayed on the Internet could be assigned subject headings deemed true by some groups of readers, but those same headings could be deemed false by other readers.
Therefore, a folksonomy universe allows both true and false statements to coexist. Because tags are relativized, personal, idiosyncratic views can coexist and thrive in the form of tags, in spite of their inconsistencies. Readers of texts on the Internet become individual interpreters, despite the document author's intent.

Related to this is the problem of hermeneutics when multiple interpretations abound. As Eco once observed, "while it is a principle of hermeneutics that there are no facts, only interpretations", this does not prevent us from asking if there might not be "bad" interpretations. Because to say that there are no facts, only interpretations, certainly means that what appears to us as fact is the effect of interpretations but not that every possible interpretation produces something that, in light of subsequent interpretations, we are obliged to consider as fact [11]. As with the example given above of the dissertation about rust, personal interpretations and judgments might be wrong if one is considering the author's intent. Yet, a stated premise of advocates of folksonomy is that the searcher's interpretation of a document is actually more important than the intent of the document's author. Even should all interpretations be of equal worth, if users can continuously add tags to articles, at some point it is likely that the whole system will become unusable. A folksonomic system threatens to undermine its own usefulness.

A final criticism one could make of folksonomies as classification systems is that their advocates seem to assume everything on the Internet needs to be organized and classified. Anyone who has a home library knows that this is not necessarily true. Every day, individuals make critical assessments of the bits of information they encounter. Their first decision is whether or not to retain the information, and if so, how to organize it. Folksonomy advocates seem not to recognize that critical first decision about retention.
The free labor available to create folksonomies is appealing only to those who have already agreed that the entire Internet needs some organization and cataloging. However, rather than being retained and organized, many Internet items could be eliminated, ignored, or allowed to die off. Most people put flyers, ads, and newsletters into the wastebasket (physically or online), and would not bother to organize ephemera.

Concluding Comments

The choice to use folksonomy for organizing information on the Internet is not a simple, straightforward decision, but one with important underlying philosophical issues. Although folksonomy advocates are beginning to correct some linguistic and cultural variations when applying tags, inconsistencies within the folksonomic classification scheme will always persist. There are no right or wrong classification terms in a folksonomic world, and the system can break down when applied to databases of journal articles or dissertations. Folksonomists are confusing cataloging structure with personal opinions and subsequent social bookmarking. These are not the same thing, and they need to be separated.

A traditional classification scheme based on Aristotelian categories yields search results that are more exact. Traditional cataloging can be more time-consuming, and is by definition more limiting, but it does result in consistency within its scheme. Folksonomy allows for disparate opinions and the display of multicultural views; however, in the networked world of information retrieval, a display of all views can also lead to a breakdown of the system. One is reminded of the Borges story about the Chinese emperor who wanted an accurate map of China [12]. The resulting map was very accurate, but it was exactly the size of China. With its inclusiveness, it was of no help, and it finally disintegrated. Most information seekers want the most relevant hits when keying in a search query.
Folksonomy is a scheme based on philosophical relativism, and therefore it will always include the failings of relativism. A traditional classification scheme will consistently provide better results to information seekers.

References

[1] Jonathan Barnes, ed. The complete works of Aristotle. Princeton: Princeton University Press, 1984, p. 4.
[2] John R. Gregg. The language of taxonomy, an application of symbolic logic to the study of classification schemes. New York: Columbia University Press, 1954, p. viii.
[3] Lorcan Dempsey. The library and the network in the changing research and learning environment. Montana State University Academic Libraries Symposium, Sept. 29, 2006.
[4] Margaret Mann. Introduction to cataloging and classification of books. Chicago: American Library Association, 1930, p. 3.
[5] "Folksonomy." In Wikipedia. Retrieved November 10, 2006 from <http://en.wikipedia.org/wiki/Folksonomy>.
[6] Tony Hammond, et al. "Social bookmarking tools (I), a general review." D-Lib Magazine 11(4). Retrieved November 10, 2006 from <doi:10.1045/april2005-hammond>.
[7] "Folksonomy." In Wikipedia.
[8] Marieke Guy and Emma Tonkin. "Folksonomies: tidying up tags?" D-Lib Magazine 12(1). Retrieved November 10, 2006 from <doi:10.1045/january2006-guy>.
[9] David Weinberger. "Tagging and why it matters." Retrieved November 10, 2006 from <http://cyber.law.harvard.edu/home/2005-07>.
[10] Ibid.
[11] Umberto Eco. Kant and the platypus, essays on language and cognition. New York: Harcourt Brace, 1997, p. 48.
[12] Jorge Luis Borges. Everything and nothing. New York: New Directions, 1988.

Copyright © 2006 Elaine Peterson

doi:10.1045/november2006-peterson