The web is rapidly becoming both more open and more social through the provision of technologies that make it easier for end users to access resources and join in social networks. Social networks have pioneered online communities, allowing users to contribute to collective knowledge by tagging online resources. Tagging behavior increased dramatically between 2005 and 2007. This article reports on an investigation of social tagging using data gathered from Delicious, Flickr and YouTube for the years 2005, 2006 and 2007. Preliminary findings indicate both that it is possible to profile a social network through the analysis of tagging data and that Delicious is a more representative venue for analyzing the social tagging behavior of users than either Flickr or YouTube.
The web is undergoing a period of rapid transition from a space for presentation of syntactically formatted information to a more open and social platform that allows users to communicate knowledge and share resources. Pioneering technologies on the web make it easy for users to participate in social networks, leading to the development of online communities. In concert with Wikipedia, Google Maps, RSS feeds, mashup services and the interlinked social semantics that are the hallmarks of the current web revolution, social networks are actively contributing to the evolution of collective intelligence. Among the many social networks available on the web, Delicious, Flickr and YouTube are three of the best known and most popular. These networks established their prominent position either by hosting large numbers of online objects, by building a large user community, or by generating heavy web traffic. In 2007, Flickr hosted more than 2 billion images (Auchard, 2007); in 2006, Delicious had more than one million users (Robinson, 2006); and a scrape of YouTube in August 2006 indicated a total of 1.73 billion viewings of videos since YouTube's inception in 2005 (Gomes, 2006). The popularity of social networks has triggered a boom in social network research (e.g., Kipp & Campbell, 2006; Mika, 2007; Li, Guo, & Zhao, 2008; Lin, Chi, Zhu, Sundaram, & Tseng, 2008; Singla & Richardson, 2008). But the question remains as to whether analysis of user tagging behaviors can be used to reveal particular and distinctive characteristics of social networks.
This article investigates social tagging behavior (see Note 1) in Delicious, Flickr and YouTube for the years 2005, 2006 and 2007. It describes the crawler used to harvest tagging data from each of these three social networks. It then provides an analysis of the most popular tags and the evolution of tag use in each of these three social networks. It concludes with a discussion of the findings, which indicate that it is possible to profile a social network through analysis of tagging data and that Delicious is a more representative venue for future analysis of social tagging data and the tagging behaviors of users.
UTO Tag Crawler
To integrate tagging data from different social networks, we developed a tag crawler to crawl Delicious, Flickr and YouTube and to store data about tags and tagging behavior in RDF triples based on the Upper Tag Ontology (UTO) (Ding, Toma, Kang, Fried & Yan, 2008; Ding, Kang, Toma, Fried & Yan, 2008).
Our UTO tag crawler is based on the Smart and Simple Webcrawler framework developed by Torunski (2008), which provides the functionalities of maximum interactions, maximum depth, filter interface, and pluggable http connection libraries. The UTO crawler was designed as a multi-thread crawler to avoid timeouts and to make efficient use of available internet bandwidth. It includes two different parsers: The first parses a page and identifies links that should be visited or filtered out, while the second parses the HTML code to retrieve information about tags.
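The crawl strategy described above can be illustrated with a minimal sketch. It is not the UTO crawler itself: it is single-threaded, the function and class names are hypothetical, and the pluggable `fetch` callable and the in-memory site stand in for a real HTTP library and real pages. It shows only how a link parser, a link filter, a depth limit and a page limit fit together.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """First parser: collect href targets from anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


def crawl(start_url, fetch, link_filter, max_depth=2, max_pages=100):
    """Breadth-first crawl bounded by depth and by number of pages.

    fetch is a pluggable callable (url -> HTML string); it is an
    assumption of this sketch, standing in for an HTTP library.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html  # a second parser would extract tag data here
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen and link_filter(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# Tiny in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "http://example.org/tag": '<a href="/tag/design">design</a> <a href="/about">about</a>',
    "http://example.org/tag/design": '<a href="/tag/blog">blog</a>',
    "http://example.org/tag/blog": "",
}

pages = crawl(
    "http://example.org/tag",
    fetch=lambda url: FAKE_SITE.get(url, ""),
    link_filter=lambda url: "/tag" in url,
    max_depth=1,
)
print(sorted(pages))
```

With `max_depth=1` the sketch visits the tag cloud and the pages it links to directly, skips the filtered `/about` link, and never reaches pages two hops away.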
In Delicious, the crawler began with the tag cloud at <http://delicious.com/tag> and visited every tag in the cloud. For TagA in the tag cloud, the crawler visited <http://delicious.com/tag/tagA> and parsed the HTML code to grab information about bookmarks, taggers and related tags. For each bookmark, the crawler went to
For Flickr, the crawler accessed the tag cloud at <http://flickr.com/photos/tags> and visited each tag in the cloud. For each tag page (e.g., <http://www.flickr.com/photos/tags/party/>), information about related tags was collected. Each individual image on the tag page was visited and information about the image, the tags and the tagger was extracted. The crawling process continued with <http://www.flickr.com/photos/tags/party/?page=2>. To avoid duplicate visits, only links of the form
For YouTube, the crawler started from the main page at <http://youtube.com> and visited every available video page. For each video page, the crawler collected data about tags and taggers and then visited all links pointing to other video pages. In order to avoid visiting the same page more than once, query parts of links were ignored (e.g., <http://www.youtube.com/watch?v=X2IExa2A198> and <http://www.youtube.com/watch?v=X2IExa2A198&watch_response> lead to the same video). Figure 1 provides an overview of the UTO crawler. Detailed information about the UTO ontology and the UTO crawler is available in Ding, Toma et al. (2008) and Ding, Kang et al. (2008).
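One possible reading of the duplicate-avoidance rule for YouTube links is sketched below. It assumes that the video identifier in the `v` parameter is the only part of the query that matters and that everything else can be discarded; the function name is hypothetical and the actual rule used by the crawler may have differed.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def canonical_video_url(url):
    """Reduce a watch URL to scheme, host, path and the v parameter only."""
    parts = urlsplit(url)
    video_ids = parse_qs(parts.query).get("v")
    if not video_ids:
        return None  # not treated as a video page under this sketch's assumption
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "v=" + video_ids[0], ""))


a = canonical_video_url("http://www.youtube.com/watch?v=X2IExa2A198")
b = canonical_video_url("http://www.youtube.com/watch?v=X2IExa2A198&watch_response")
print(a == b)
```

Storing the canonical form in the set of visited links means that both example URLs from the text map to a single entry.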
We used the UTO crawler to retrieve tagging data from Delicious, Flickr and YouTube in September 2007. For all three social networks, tagging data harvested by the crawler included object, tagger, tag, date, comment and vote. The tagging data was then converted to RDF triples based on the UTO ontology (Ding, Kang et al., 2008). In total, the crawler retrieved approximately 21 million RDF triples for Delicious, 2.3 million RDF triples for Flickr, and 2.2 million RDF triples for YouTube.
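The conversion of a harvested tagging record into triples can be sketched as follows. The namespace and the property names (`hasObject`, `hasTagger`, `hasTag`, `hasDate`) are placeholders chosen for illustration and should not be read as the published UTO vocabulary, which is documented in Ding, Kang et al. (2008); comment and vote are omitted for brevity.

```python
from dataclasses import dataclass

UTO = "http://example.org/uto#"  # placeholder namespace, not the published one


@dataclass
class Tagging:
    tagging_id: str
    obj: str
    tagger: str
    tag: str
    date: str


def to_triples(t):
    """Return (subject, predicate, object) triples for one tagging event."""
    subject = UTO + "tagging/" + t.tagging_id
    return [
        (subject, UTO + "hasObject", t.obj),   # hypothetical property names
        (subject, UTO + "hasTagger", t.tagger),
        (subject, UTO + "hasTag", t.tag),
        (subject, UTO + "hasDate", t.date),
    ]


triples = to_triples(
    Tagging("t1", "http://example.org/page", "user42", "design", "2007-09-01")
)
for triple in triples:
    print(triple)
```

Because each tagging event becomes its own subject, one object tagged by many users yields many small groups of triples, which helps explain why triple counts far exceed object counts.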
Table 1 presents an overview of the data collected from each social network. The total dataset contains around 1 million bookmarks, 2.8 million taggers and 9.3 million tags from Delicious; around 300,000 photographs, 150,000 taggers and 1.4 million tags from Flickr; and around 500,000 videos, 200,000 taggers and 1.35 million tags from YouTube. The average number of tags per object ranges from a low of 2.74 in YouTube to a high of 9.31 in Delicious. The average number of tags a tagger assigns ranges from a low of 3.33 in Delicious to a high of 8.79 in Flickr. The average number of objects a tagger tags ranges from a low of 0.36 in Delicious to a high of 2.84 in YouTube. The seeming disparity reflected in the low average for objects tagged by taggers in Delicious is accounted for by the fact that, while users are required to provide a title when uploading bookmarks to Delicious, they are not required to include tags in the tag field. Thus there may be many bookmarks in Delicious that have titles but no tags. Combined data from the three social networks totals 1.8 million objects, 3.1 million taggers and 12.1 million tags, of which 648,368 tags are unique.
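The three averages are simple ratios of the totals. The sketch below assumes that this is how they were derived and uses the rounded Delicious totals quoted in the text, so its output only approximates the more precise values in Table 1; the function name is hypothetical.

```python
def tagging_averages(objects, taggers, tags):
    """Return (tags per object, tags per tagger, objects per tagger)."""
    return tags / objects, tags / taggers, objects / taggers


# Rounded Delicious totals as quoted in the text.
per_object, per_tagger, objects_per_tagger = tagging_averages(
    objects=1_000_000, taggers=2_800_000, tags=9_300_000
)
print(round(per_object, 2), round(per_tagger, 2), round(objects_per_tagger, 2))
```

The objects-per-tagger ratio falls below 1 for Delicious simply because the dataset contains more taggers than bookmarks.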
Power Law Distribution
We merged the tagging data from Delicious, Flickr and YouTube to form a single, comprehensive dataset. Using this combined dataset, we analyzed the tag frequency. Figure 2 demonstrates that the distribution of tag frequency follows a power law distribution that conforms to Zipf's Law. Table 2 shows the details of this distribution: Only 1,363 out of 648,368 unique tags (or approximately 0.2% of all unique tags assigned between 2005 and 2007) were assigned more than 1,000 times each, while 357,028 (or approximately 55% of all unique tags) were assigned only once.
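A rank-frequency analysis of this kind can be sketched in a few lines. The example below uses a synthetic sample rather than the study's data, and the least-squares fit on log-log axes is only one simple way to check for Zipf-like behavior (a slope near -1); it is not necessarily the method used to produce Figure 2. Function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tags):
    """Return tag frequencies sorted from most to least frequent."""
    return sorted(Counter(tags).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic sample: the tag at rank r occurs about 120 / r times.
sample = []
for rank in range(1, 21):
    sample.extend([f"tag{rank}"] * (120 // rank))

freqs = rank_frequency(sample)
singletons = sum(1 for f in freqs if f == 1)  # tags assigned only once
print(freqs[:5], singletons, round(loglog_slope(freqs), 2))
```

On real tagging data the long tail of singletons dominates the count of unique tags, which is the pattern reported in Table 2.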
In the combined dataset, the most frequently occurring tag is design, which accounts for 101,786 or nearly 1% of all tag occurrences. The second most frequently occurring tag is blog and accounts for 90,242 or 0.7% of the total tags assigned between 2005 and 2007. The 1,363 most frequently occurring tags account for a total of 6,210,163 tagging instances; these 1,363 tags comprise a core tagging vocabulary that represents more than 50% of the entire corpus of 12,077,183 tagging instances. (See Appendix for a list of the 1,363 tags that make up the combined core tagging vocabulary of Delicious, Flickr and YouTube). It is hoped that linguistic analysis of this core set of tags will be able to reveal features of the evolving vocabulary of tags in each social tagging network.
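The notion of a core vocabulary can be made concrete with a small sketch: select the tags used more than a threshold number of times and measure what share of all tagging instances they cover. The toy counts and the function name are illustrative only; the last lines recompute the reported share from the totals given in the text.

```python
from collections import Counter


def core_vocabulary(counts, min_uses):
    """Return tags used more than min_uses times and their share of all uses."""
    core = {tag: n for tag, n in counts.items() if n > min_uses}
    share = sum(core.values()) / sum(counts.values())
    return core, share


# Toy counts; the study applied a threshold of 1,000 uses to the real corpus.
counts = Counter({"design": 50, "blog": 30, "web": 15, "mycat": 3, "xyzzy": 2})
core, share = core_vocabulary(counts, min_uses=10)
print(sorted(core), share)

# The share reported in the text, recomputed from its own totals.
reported_share = 6_210_163 / 12_077_183
print(round(reported_share, 3))
```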
Social Tagging Analysis
In order to generate individual portraits of tag use and the composition of tag vocabularies in Delicious, Flickr and YouTube, the data from each social network were analyzed independently using three time frames (2005, 2006, 2007).
Table 3 shows the 20 most frequently assigned tags in Delicious for the years 2005, 2006, and 2007. These tag sets appear to be relatively stable across the three years. The tags xml, science, search, games, technology, and security appear among the top 20 tags for 2005 but are dropped from the lists of top 20 tags for 2006 and 2007; and the tags imported, research, and internet are dropped from the list of top 20 tags for 2007. The tags development, howto, tutorial and Web2.0 appear in the lists for both 2006 and 2007, and webdesign, free and opensource are introduced in 2007, pointing to the emergence of new trends in user interests. Overall, 85% of the top 20 tags are stable across 2006 and 2007, indicating that a shared social vocabulary may be emerging in Delicious.
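The stability figure can be read as the overlap between two top-k tag sets. The sketch below uses toy tag streams rather than the Delicious data, and its function names are hypothetical; the final line reproduces the 85% figure from the 17 of 20 tags that, according to the description above, are shared between the 2006 and 2007 lists.

```python
from collections import Counter


def top_k(tags, k):
    """Return the set of the k most frequent tags."""
    return {tag for tag, _ in Counter(tags).most_common(k)}


def stability(tags_year_a, tags_year_b, k):
    """Fraction of year A's top-k tags that are also in year B's top-k."""
    return len(top_k(tags_year_a, k) & top_k(tags_year_b, k)) / k


# Toy tag streams, not the study's data.
tags_2006 = ["design"] * 5 + ["blog"] * 4 + ["internet"] * 3 + ["java"] * 1
tags_2007 = ["design"] * 6 + ["blog"] * 5 + ["free"] * 4 + ["internet"] * 1
print(stability(tags_2006, tags_2007, k=3))

# Three tags leave the top 20 and three enter, so 17 of 20 are shared.
print(17 / 20)
```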
A profile of Delicious users can be generated through analysis of the lists of popular tags. The dominance of tags such as blog, web, programming, and design indicates key interests of Delicious users who are tagging bookmarks to store or share. While the tags music, video, art and news indicate a level of general interest that spans all three years, actual tagging evidence strongly supports the popular assumption that Delicious is a social network for individuals interested in the web and programming skills. Furthermore, the tags introduced in 2006 and 2007 indicate a growing interest in free or open source resources as well as tutorial and how-to resources that support learning programming languages or applications, and developing new computer skills.
Figure 3 shows the evolution of dominant topical tags used in the Delicious social network for the period 2005-2007. The tag Web2.0 shows the highest peak in both 2006 and 2007: The raw frequency with which Web2.0 was used to tag bookmarks increased 16 times in 2006 and 76 times in 2007 when compared with its raw tagging frequency in 2005. The tags showing the most dramatic increase in raw tagging frequency from 2006 to 2007 were webdesign, free and Web2.0, indicating growing interest in these topics on the part of Delicious taggers. The three tags with the least impressive increase in raw tagging frequency from 2006 to 2007 were java, programming, and music. While this might seem to indicate waning interest in these topics, only the ranking for java, which dropped from eighth most popular tag in 2005 to twentieth most popular in 2007 (Table 4), appears to support this conclusion. The tag programming drops from second position in 2005 and 2006 to fourth position in 2007; however, this is not a drop in popularity significant enough to justify any conclusions about waning interest on the part of Delicious taggers. The tag music does demonstrate a more dramatic drop in popularity from fourth position in 2005, to sixth position in 2006, and to tenth position in 2007 but the fact that Last.fm became one of the more popular social networks for sharing music during this period may help to explain why tagging with music decreased from 2005 through September 2007.
Table 5 shows the 20 most frequently used tags in Flickr for the years 2005, 2006, and 2007. In sharp contrast to the more topical tagging culture of Delicious, Flickr taggers like to tag photographs with dates, locations, colors, and seasons. Favorite locations in Flickr include Hong Kong (2005), Germany (2005), USA (2006 and 2007), London (2005-2007), California (2006), and Japan (2007). Favorite color tags are orange (2005), blue (2006 and 2007), red (2006 and 2007), green (2006 and 2007), and black-and-white (i.e., bw in 2007). The most frequently used tags for seasons are autumn and fall (2007). In addition, users also favor tagging photographs with the time of day (or lighting conditions), especially when the photographs are night views. With the exception of tags in the categories year, color and location, the top 20 tag sets differ widely across the three years.
Flickr taggers frequently assign informal tags to photographs (e.g., me), indicating that users may be tagging photographs for purposes of storing and retrieving them for their own use rather than with any intent to share them with others. When tagging photographs, users tend to emphasize the eye-catching features of an image such as color, subject (e.g., sky, water, beach and specific locations), and light conditions (e.g., night and nightview). Nonetheless, time (i.e., year, season or month), locations and colors are the major features of images tagged by users. It could be useful to analyze the tagging culture of Flickr in greater detail given that annotating images is an important area for image retrieval (see Note 2).
Figure 4 and Table 6 show the temporal history of tag popularity in Flickr for the period 2005-2007. In 2005 and 2006, tagging was not particularly popular in the Flickr community, with total tags of 3,598 in 2005 and 23,066 in 2006. However, as tagging became more popular on the web, tagging behavior changed dramatically in Flickr. There were 1,324,537 tags assigned by Flickr taggers through September 2007, roughly 57 times as many as were assigned in all of 2006. Raw tagging frequency for cannon, the second most popular tag in 2007, increased 203.5 times over its total use in 2006; but fall, the thirteenth most popular tag in 2007, showed the greatest jump, increasing 672.5 times over its raw frequency of assignment in 2006.
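Year-over-year growth of this kind is a plain ratio of counts. The sketch below applies a hypothetical helper to the Flickr totals quoted in the text; the same calculation applies to the per-tag frequencies in Table 6.

```python
def growth_factor(earlier_count, later_count):
    """How many times larger the later count is than the earlier one."""
    return later_count / earlier_count


# Flickr tag totals quoted in the text.
flickr_2005 = 3_598
flickr_2006 = 23_066
flickr_2007 = 1_324_537  # through September 2007

print(round(growth_factor(flickr_2005, flickr_2006), 1))
print(round(growth_factor(flickr_2006, flickr_2007), 1))
```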
Interestingly, an analysis of tagged photographs indicates that there are two major communities of Flickr taggers: One community contains non-professional photographers who appear to use Flickr as a platform for sharing photographs with friends and family, and they tag images so that the images can be retrieved by others; the second community consists of professional photographers who do not tag often but who frequently provide comments on photographs taken by other professionals.
Table 7 shows the 20 most popular tags in YouTube for the years 2005, 2006 and 2007. The topics that are most frequently tagged in this social network are music, videos, humor, sex and girls, apparently reflecting the broad interests of the general web community.
Tagging activity in YouTube increased dramatically between 2005 and 2007. The total number of tags assigned in YouTube increased from 4,735 in 2005, to 366,147 in 2006, to 1,073,042 in 2007: Tag use was 78.7 times greater in 2006 and 236.7 times greater in 2007 than it was in 2005. Compared with 2005, the tag [year] had the greatest increase in use in 2007, followed by new and sex/sexy, while dance showed the least increase between 2005 and 2007. The tag set in YouTube appears to be more stable than that of Flickr for the same time period, seemingly indicating that areas of user interest have remained fairly steady for the social web community as a whole (see Figure 5 and Table 8).
Summary and Conclusion
When comparing these three social networks, Delicious demonstrates the tightest connection to the use of tags as extended information about resources. In Delicious, every user can tag an object with the tag(s) of his or her own choice; and an object can be tagged many times and by many different users, thereby indicating that it "belongs" (or is highly relevant) to the Delicious community as a whole. Delicious exemplifies community tagging where anyone can tag (or bookmark) any online resource (Marlow et al., 2006). Other similar social networks include CiteULike and Connotea, where tagged resources are bibliographical records, and LibraryThing, where tagged resources are books.
Social networks such as Delicious, CiteULike, and LibraryThing are very different from Flickr, where a resource (photograph) is generally tagged only by the individual who uploads it. The major activity of other members of the Flickr community is to "comment on" or "vote for" resources by indicating that a particular photograph is a favorite image. Flickr also provides users with the ability to allow friends to tag photos they have uploaded; but this functionality limits tagging behavior and thus the development of a sense of community in that it prohibits open tagging by Flickr users at large. Because tagging a resource in Flickr is not generally open to everyone, Flickr cannot be considered a true community-based tagging system; rather, it is better thought of as a self-tagging system for users and their close friends. YouTube operates in a manner very similar to that of Flickr, allowing individuals to tag the resources (videos) they have uploaded while limiting the participation of other YouTube users to voting for resources by assigning "stars".
These differences in tagging rights have created differences not only in the role tags play in each system but also in the nature of the tags that are assigned (Marlow et al., 2006). Based on analyses of the top 20 tags in each of the three social networks, it is apparent that tags in Delicious are more content-oriented in that they are generally related to the topics of the resources bookmarked. The tags used in Flickr are more annotation-oriented in that they are generally related to the physical features of the photographs themselves, such as colors, lighting and location. While tags in Delicious are likely to reflect the intellectual content of resources and those in Flickr generally represent the physical features of photographs, tags in YouTube tend to focus on the medium or genre of resources (e.g., music, video, comedy, movie, tv) and on affective judgments (e.g., funny, sexy, hot, love, new).
The role of tags in Delicious is to represent bookmarked resources not only for future retrieval but also for sharing them with the larger community. Tags play a major role in Delicious: Without the tags assigned by users of the social network, there would be no means either to share bookmarks or to identify and retrieve resources, which are the main functions of Delicious. In contrast, tagging does not play a major role in Flickr. Because the decisions as to whether or not to tag a photograph and who may tag it are left to the individual uploading a photograph, tagging in Flickr is more of a secondary activity or side effect. Furthermore, photographs on Flickr can be searched for and retrieved by their titles and are ranked by comments or votes rather than by the number of tags assigned. This is also the case with YouTube in that videos are most frequently shared based on comments and votes rather than assigned tags. Indeed, it appears that many YouTube users may not understand the purpose of tagging: Instead of adding specific tags, users often enter descriptions of their videos in the tagging field, which accounts for the occurrence of helping words such as articles, prepositions and conjunctions (e.g., the, of, in, and) among the more popular tags in YouTube. Table 9 summarizes the characteristics of social networks that were identified in the analysis of Delicious, Flickr and YouTube.
Social tagging behaviors are also related to the community of users in each social network. Delicious gathers a community interested in IT-related topics. These individuals are interested in the content of bookmarked resources, and tagging provides a way for them to summarize this content. In such a situation, tagging becomes the key function of the system and plays a major role in sharing and retrieving bookmarks. Users of Flickr are more interested in commenting on and sharing their photographs with family and friends. Thus, rather than comprising a single, cohesive community, users in Flickr appear to be divided into two primary communities: professional photographers who upload photographs for comment and feedback from other professionals, and non-professional users for whom Flickr provides a place to store personal photographs and share them with close friends. Alternatively, the community of YouTube can be viewed as a snapshot of the entire Web community. YouTube is populated by individuals from all over the world who are of different ages and have many different interests. They come to YouTube with many different purposes and expectations, and many of them do not tag their videos because the role of tagging is overshadowed by rating and commenting.
After analyzing social tagging behavior in Delicious, Flickr and YouTube, it is apparent that tagging activities have increased tremendously from 2005 to 2007. An increasing number of individuals are using online social networks to tag resources for purposes of storage, access, and retrieval, both for themselves and for the purpose of sharing those resources with others. Through tag analysis, it is possible to develop a portrait of the social culture of a network and, in some cases, to identify trends of emerging or waning topical interests among users.
While tag sets in Delicious appeared to become more stable across the time frame of this study, it was also apparent that collective tagging vocabularies could benefit from both syntactic and semantic normalization of tags: For example, in YouTube in 2007 there were 2,796 uses of the tag girl and 1,851 uses of the tag girls. Normalization of singular and plural forms as well as acronyms and full names would increase the effectiveness of tags for retrieval purposes, as would standardization of the syntactical formation of tags (e.g., tag phrases with or without a space between individual terms). Perhaps as important is the introduction of user education regarding the choice of tags and their potential utility in social networks (Ackerman, James & Getz, 2007).
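A deliberately naive sketch of such normalization is given below. It lowercases tags, removes spaces, hyphens and underscores, and strips a trailing plural s. These rules and the function names are assumptions made for illustration, not a method proposed in this study. The girl and girls counts are those reported above; the remaining rows are invented.

```python
from collections import Counter


def normalize(tag):
    """Lowercase, drop spaces/hyphens/underscores, strip a simple plural s."""
    cleaned = tag.lower().replace(" ", "").replace("-", "").replace("_", "")
    if len(cleaned) > 3 and cleaned.endswith("s") and not cleaned.endswith("ss"):
        cleaned = cleaned[:-1]
    return cleaned


def merge_counts(counts):
    """Sum raw tag counts under their normalized form."""
    merged = Counter()
    for tag, n in counts.items():
        merged[normalize(tag)] += n
    return merged


# YouTube 2007 counts for girl/girls from the text; the other rows are made up.
raw = {"girl": 2796, "girls": 1851, "Web Design": 10, "webdesign": 7}
merged = merge_counts(raw)
print(merged["girl"], merged["webdesign"])
```

A rule this simple also has costs: it would collapse news into new, two tags that carry different meanings in these networks, so any practical normalization would need a dictionary or stemmer with exceptions.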
This study demonstrates that it is possible to profile a social network by analyzing data about tags and tagging behaviors in social networks. Thus, analysis confirms the popular assumption that the Delicious community is largely comprised of individuals interested in IT-oriented topics such as design and programming. In contrast, the Flickr community appears to contain two primary groups of users: professional photographers interested in feedback and non-professional photographers interested in sharing photographs with family and friends. In contrast to Delicious and Flickr, the YouTube community is very broad and can be best viewed as a self-selected subset of the general social web community. Tagging is a major activity in Delicious but not in Flickr and YouTube. Tagging in Delicious is used primarily for purposes of storing, retrieving and sharing online resources across the community; tagging in Flickr emphasizes indexing objects for retrieval by the tagger and his friends and associates; and tagging in YouTube is undertaken primarily for identifying the genre of a video and for indicating the tagger's affective reaction to it. Taggers want to represent the content of a resource in Delicious, but they tend to focus on the specific features of an image in Flickr and the genre of a video in YouTube.
In Delicious, changing trends in user interests can be identified and tracked by analyzing tag frequencies across time; in both Flickr and YouTube, however, such trends are not obvious, perhaps because the focus of tagging activities is not on the intellectual content of resources but on more superficial features such as color (in Flickr) or affective reactions (in YouTube). Thus, even though YouTube has been characterized as a subset of the general web population, the results of this research indicate that Delicious is a more representative venue for analyzing social tagging vocabularies and the tagging behaviors of users. This conclusion is supported by the finding that the community of users in Delicious is more cohesive than in Flickr or YouTube; by the dynamic behavior of users that supports tracking of emerging and waning interests within the Delicious community; and by the participatory focus on sharing that characterizes user tagging activity in Delicious.
The authors would like to thank the University of Innsbruck for its support of data collection and analysis. The authors are also very grateful for the technical support provided by Ioan Toma of the University of Innsbruck.
Notes
1. Social tagging is a method by which web users add keywords to online objects such as bookmarks, photos and videos. These added keywords are called tags. Web 2.0 technologies enable the large-scale, collective creation and management of tags, which can then be used to analyze different online social behaviors.
2. An interesting example of ongoing research on social annotation of images and videos is GWAP, the "games with a purpose" project at Carnegie Mellon, which is available at <http://www.gwap.com/gwap/>.
References
Ackerman, G., James, M., & Getz, C. T. (2007). The application of social bookmarking technology to the national intelligence domain. International Journal of Intelligence and Counterintelligence, 20, 678-698, <doi:10.1080/08850600701249808>.
Auchard, E. (2007, November 19). Flickr to map the world's latest photo hotspots. Reuters. Retrieved September 30, 2008, from <http://www.reuters.com/article/technologyNews/idUSHO94233920071119>.
Ding, Y., Kang, S., Toma, I., Fried, M., & Yan, Z. (2008). Integrating Social Tagging Data: Upper Tag Ontology. Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore. Available at <http://info.slis.indiana.edu/~dingying/Publication/SMC2008-UTO-cameraready.pdf>.
Ding, Y., Toma, I., Kang, S., Fried, M., & Yan, Z. (2008). Data Mediation and Interoperation in Social Web: Modeling, Crawling and Integrating Social Tagging Data. Proceedings of the Workshop on Social Web Search and Mining (SWSM2008), 17th International World Wide Web Conference, Beijing, China. Available at <http://keg.cs.tsinghua.edu.cn/SWSM2008/short%20papers/swsm08_submission_5.pdf>.
Gomes, L. (2006, August 30). Will All of Us Get Our 15 Minutes On a YouTube Video? The Wall Street Journal. Retrieved September 26, 2008, from <http://online.wsj.com/public/article/SB115689298168048904.html>.
Kipp, M. E., & Campbell, D. G. (2006). Patterns and inconsistencies in collaborative tagging systems: An examination of tagging practices. In Proceedings Annual General Meeting of the American Society for Information Science and Technology, November 3-8, 2006, Austin, Texas. [S.l.]: Richard B. Hill. Available from <http://eprints.rclis.org/archive/00008315/>.
Li, X., Guo, L., & Zhao, Y. (2008). Tag-based social interest discovery. In Proceedings of the 17th International World Wide Web Conference, April 21-25, 2008, Beijing, China (pp. 675-684). Retrieved January 31, 2009, from <http://www2008.org/papers/pdf/p675-liA.pdf>.
Lin, Y., Chi, Y., Zhu, S., Sundaram, H., & Tseng, B. (2008). FacetNet: A framework for analyzing communities and their evolutions in dynamic networks. In Proceedings of the 17th International World Wide Web Conference, April 21-25, 2008, Beijing, China (pp. 685-694). Retrieved January 31, 2009, from <http://www2008.org/papers/pdf/p685-linA.pdf>.
Marlow, C., Naaman, M., Boyd, D., & Davis, M. (2006). HT06, tagging paper, taxonomy, flickr, academic article, to read. In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia, August 22-25, 2006, Odense, Denmark (pp. 31-40). New York: ACM. Available from <http://portal.acm.org/citation.cfm?id=1149949>.
Robinson, B. (2006, September 25). Del.icio.us reports 1 million users post Yahoo! growth tops all of Digg. Message posted to <http://www.techcrunch.com/2006/09/25/del.icio.us-reports-
Singla, P., & Richardson, M. (2008). Yes, there is a correlation: From social networks to personal behavior on the web. In Proceedings of the 17th International World Wide Web Conference, April 21-25, 2008, Beijing, China (pp. 655-664). Retrieved January 31, 2009, from <http://www2008.org/papers/pdf/p655-singla.pdf>.
Copyright © 2009 Ying Ding, Elin K. Jacob, James Caverlee, Michael Fried, and Zhixiong Zhang