Tagging Full Text Searchable Articles: An Overview of Social Tagging Activity in Historic Australian Newspapers, August 2008 – August 2009
In August 2008, tagging was implemented on articles that were full text searchable within the National Library of Australia's historic Australian Newspapers service. During the first year, 500 users created over 100,000 tags, 38,000 of which were distinct. The tagging was very successful and the National Library will be extending the tagging functionality to all of its other collections before the end of 2009. In this article, the tagging activity, behaviors and outcomes are analyzed and compared with other research on image tagging.
Keywords: tagging, tags, social engagement, social metadata, user engagement, folksonomies, historic newspapers, Australian newspapers, full-text resources, web 2.0, user generated content, social tagging.
Commercial and non-profit organizations such as Flickr, YouTube, LibraryThing, and Amazon have responded to user needs by enabling tagging, commenting, rating and other social metadata (web 2.0) tools across books, videos, images and websites. Although these features have proved very popular with users, the library, archive, museum and gallery sectors have been slow to follow suit with their own collections. In the cultural heritage sector a fair amount of research and pilot testing has been carried out on social tagging as the precursor to perhaps implementing it more widely. The most notable and extensive research is undoubtedly that around steve.museum, an open source tagging tool. Two excellent reports have recently been published on two years of research (2006-2008) undertaken by steve.museum.
In August 2008, the National Library of Australia (NLA) implemented a tagging application for the first time on one of its own collections: the newly released Australian Newspapers beta service, which contained 1 million full-text searchable articles from historic Australian newspapers of 1803-1954. The number of articles increased to 5 million by the end of the year. The tagging was not done as an experiment and therefore was not controlled; the taggers were real users who wanted to tag for reasons of their own. Nevertheless, over 500 users created a tag pool of more than 102,000 total tags in the first year, of which 38,000 were different (distinct) tags.
Users could be registered or anonymous. During this time, all web 2.0 user activity, such as tagging, commenting and text correction, was monitored (but not moderated) by NLA through the gathering of statistics and communicating with users.
The Australian Newspapers service is the only online service from the National Library of Australia that has utilized web 2.0 features. The Library is also one of only two known large cultural heritage institutions in Australia that have enabled tagging across their collections, the other being the Powerhouse Museum. A survey carried out on Australian cultural heritage institutions in 2008 revealed that though many institutions were thinking about tagging, only two were actually doing it. It also reported that "institutions who have not implemented user tagging generally perceive many potential problems that institutions who have implemented user tagging do not report".
I have undertaken my own research into the tagging activity that occurred in the Australian Newspapers service over the first year. This article gives an overview of the public reaction to and utilization of the tagging facility in a full-text searchable collection, and provides statistics over a year's duration, observations on the use of tagging, and suggestions for future developments. These may be relevant for other libraries and collections that are considering implementation of tagging. I was also interested in finding out whether tagging activity and behavior might differ on full-text resources as compared to image collections, so I have compared the NLA findings with other recent research on tagging in image collections. This includes the steve.museum project research, where 1,621 users added 36,981 tags to 1,784 images of museum and gallery works between March 2007 and March 2008, and the Library of Congress Flickr pilot project, where 2,518 users added 67,176 tags to 4,615 photographs from their collection between January 2008 and October 2008. These can be considered a fair comparison to the Australian Newspapers service.
Tagging of resources in the Australian Newspapers collection was implemented for the primary purpose of improving the data quality of the resource. The success of the tagging was measured on three things:
By these measures, tagging of the resources was considered successful.
A secondary but very significant outcome was that the Library harnessed a high level of social engagement from its users. The Library is now extending the tagging functionality across all of its analogue and digital collections.
This article does not cover the public text correction feature in Australian Newspapers that was introduced along with the tagging and commenting features and was even more successful than the tagging. Text correction is fully covered in the 'Many Hands Make Light Work' report.
2. Tagging functionality and implementation in Australian Newspapers
The functionality given to users for tagging in Australian Newspapers included the following:
Although it was always intended that users would be able to search across tags, that ability has not yet been provided. Project priorities were diverted from implementing the agreed interface and functionality enhancements, including the searching of tags, to other more critical work. Therefore, during the entire period of this research tags were not searchable by the public, though they were browsable and could be viewed at article level. In addition, no guidelines for the creation or management of tags were provided, both because staff resources to develop them were lacking and because the team did not consider guidelines essential.
The data enhancements from tagging, commenting and text correction are stored as layers within the Lucene index (they do not overwrite existing data, even in the case of text corrections). The data layers can, in theory, be searched separately as user layers or in combination with library-provided metadata layers. At present, searching across tags and comments has not been enabled, but searching across text corrections has been (both user and library layers). Most other library collections that offer tagging do so on images or at item level, e.g. a book. The Australian Newspapers service is perhaps unique in that it has enabled tagging of searchable text at article level. When tagging was implemented, it was not anticipated that it would be used very much, since all the articles are full-text searchable; in this sense it is quite different from tagging an image collection. Nonetheless, article tagging has been used a great deal and has proved very popular with users.
Implementation of tagging was a relatively easy task and took little time. A challenging question, and one not yet answered, is this: if searching were enabled across the different user-generated layers and library layers together, how would the presence of a tag that matches the search term affect the relevancy ranking?
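As a thought experiment on that question, the sketch below is purely illustrative and is not the NLA implementation: the `Article` fields and the `tag_boost` parameter are invented for this example. It treats a matching tag as a fixed boost added to ordinary keyword-frequency scoring over the OCR and correction layers.

```python
from dataclasses import dataclass, field

# Hypothetical article record with separate data layers, loosely modelled
# on the layered storage described above. Field names are invented.
@dataclass
class Article:
    article_id: int
    ocr_text: str                              # library-supplied OCR layer
    corrections: str = ""                      # user text-correction layer
    tags: list = field(default_factory=list)   # user tag layer

def score(article, term, tag_boost=2.0):
    """Toy relevance score: each keyword hit in a text layer counts 1;
    a matching tag adds a configurable boost. One possible answer to
    the ranking question, not the NLA's."""
    term = term.lower()
    s = article.ocr_text.lower().split().count(term)
    s += article.corrections.lower().split().count(term)
    if term in (t.lower() for t in article.tags):
        s += tag_boost
    return s

def search(articles, term):
    """Return article ids with any match, best score first."""
    hits = [(score(a, term), a.article_id) for a in articles]
    return [aid for sc, aid in sorted(hits, reverse=True) if sc > 0]
```

Under this scheme an article that merely carries a matching tag can outrank one whose text mentions the term once, which illustrates why the boost value would need careful tuning before user layers were mixed into relevancy ranking.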
3. Staff resource required to support tagging
After initial development and implementation of tagging, no staff resource was required to support it, because the public tagging was not moderated. However, in the first year the Australian Newspapers Digitisation Program (ANDP) team decided to monitor tagging by gathering statistics and communicating with users, since this was the first implementation of tagging on a collection at the National Library of Australia. This was a task within the main project plan, since other activities, such as text correction, were also being monitored at that stage. In the first year around 20 hours were spent on monitoring and statistics gathering related to tagging. Once the service was officially launched and out of beta phase, there was no requirement to gather tagging statistics for reporting purposes or to monitor tagging. Social engagement is not measured at the National Library of Australia, nor is the improvement to data quality from the addition of user-generated content layers. I did feel, however, that it would have been desirable to have the staff resource to discuss, establish and write tagging guidelines, and to manually scan newly created tags each week to ensure that no abusive terms had been added. Neither of these things has yet been done. Should tagging guidelines be established and if, for example, it were decided to 'tidy up' tags, then existing tags would need to be retrospectively converted. This could be done largely by user volunteers rather than library staff. The tagging feature has been a big crowd-pleaser for public users, and it was a quick win for the Library that required very little work to implement and little to no support.
4. Usage of the Australian Newspapers service and tagging feature
The beta service was not publicized or promoted by the National Library of Australia. It was not originally intended to be in 'beta' version for a year, only for 3 months. Originally it was anticipated that relatively few users would become aware of the service and that they would agree to become 'testers' and give feedback in specific areas. As it turned out, the service was in 'beta' version for a year and thousands of users became aware of the service via viral marketing (mainly genealogy blogs) resulting in half a million users by the end of the year. Hundreds of users gave feedback multiple times and responded to specific queries, and the data in the service expanded considerably during beta phase. As users and data increased, tagging also increased. The tables below give an overview of service usage and activity.
Table 1: Australian Newspapers Service Usage, August 4 2008 – November 4 2009.
Table 2: Tagging Activity in Australian Newspapers - August 4 2008 - August 4 2009.
Table 3: Top 10 Tags, August 2008 – August 2009.
Table 4: Most common tags (based on number of times assigned and number of different users who assigned) grouped by type.
Table 5: Most tagged articles in the service as at 4 August 2009.
Table 6: Top 20 Taggers by number of tags created, August 4 2008 – August 6 2009.
5. Tagging Guidelines
Tagging took off from day one of release. After the first 12 weeks, around 14,000 tags had been added, and quite a few e-mails were received saying that there was tagging chaos. There was a strong expectation from users that, since this was a service run by a library, there would be some tagging rules, and that librarians would be monitoring and editing tags that did not adhere to the rules. The ANDP team took no action at this time other than telling users that there were no rules or guidelines for tagging. As time went by and users successfully used the other web 2.0 features (commenting and text correction) and understood that a certain level of control and monitoring was in their own hands, they began to suggest that they themselves should be able to monitor and edit other people's tags to help make them conform. The large majority of tags added were for people's names, and taggers mainly wanted to know how the names should be entered. There was an expectation that the library would want them in some kind of library-authorised format, for example surname first, and taggers worried that they were doing it wrong. Taggers could edit their own tags (for example to correct spelling mistakes or change the order of words in personal names). After about six months, when the ANDP team again confirmed it would not create guidelines, the taggers themselves brought order to the perceived chaos. Through common sense and their observation of other users' tagging activity, they clearly developed their own unwritten rules. Remarkably, they achieved this without being able to communicate with each other through the system. The unwritten rules they developed for tagging can be described as follows:
6. Observations on tagging activity
The total number of tags added in each three-month period was recorded during the first year of service availability. We were unsure whether the pattern and amount of tagging in the first three months would differ from that in later months, once the service and a tag cloud had been established. In the first three months there was a lot of confusion on the part of users, who were unsure what to put in their tags since there were no guidelines and no examples. Users also appeared to be unclear about the purpose of tags, where they would be able to view them, and whether it was possible to search or browse for tags. On reflection, if establishing a new tagging service, it would be preferable to seed a sample subject area with tags, so that users could see tags in action and have some examples to refer to, and also to provide guidelines for those who want them. All the taggers were real users who had discovered the service themselves and decided on their own to start tagging; they were not directed or encouraged in any way.
As a result of user requests, the tag length was increased from 30 to 60 characters, and the limit of 50 tags per article was removed. Some articles, for example family notices, were tagged with more than 50 names.
Once the tagging community had established its own unwritten commonsense guidelines, the tagging settled down. During that period the number of taggers did not increase much; it remained around 500+, and users consistently added about 10,000 tags a month. In the first three months, most of the tags created were distinct tags, and these were mostly used only once. This may be a normal pattern when a tag pool is being established. By the end of the year users were duplicating tag terms, so newly created tags were not always unique. 74% of the distinct tags were used only once, and most of these were personal names. This is noted because some information professionals are of the opinion that tags are only useful if used more than once; the taggers, however, do not seem to share that opinion. Tonkin's research on sample data from Flickr showed that single-use tags comprised 10–15% of the tags (and may be due to misspellings), so the incidence of single-use tags in Australian Newspapers is higher. Less than 1% of the distinct tags had been used 100 times or more. This is why the tag cloud looked more like 'tag fog' and was not useful: no words jumped out, and the cloud was mostly just a solid mass of names.
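The distinct-tag and single-use figures quoted above are straightforward to derive from a log of tagging actions. A minimal sketch (Python; the function name and the sample tag list are invented for illustration):

```python
from collections import Counter

def tag_pool_stats(tag_events):
    """Summarize a tag pool from a list of tag strings,
    one entry per tagging action."""
    counts = Counter(tag_events)
    total = len(tag_events)
    distinct = len(counts)
    single_use = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "distinct": distinct,
        "distinct_pct_of_total": round(100 * distinct / total, 1),
        "single_use_pct_of_distinct": round(100 * single_use / distinct, 1),
    }

# Invented sample: one name tagged twice, plus three single-use tags
stats = tag_pool_stats(
    ["john smith", "john smith", "mary jones", "gold", "flood"])
# stats["single_use_pct_of_distinct"] is 75.0 for this sample
```

Run over the real tag log, the same two percentages would reproduce the 37% distinct and 74% single-use figures reported in this section.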
At 12 weeks the tag fog had already developed: there were 18,000 tags in the tag cloud, most of which were distinct (used only once), and it was becoming impossible to browse the cloud or find items within it easily. Because there was no tag search functionality, people were using the internet browser's 'find' function to try to locate items in the cloud; a few weeks later this was taking on average 10 minutes because the page took so long to load, and the 'find' function became a very unsatisfactory option. Unfortunately, this could not be addressed during the first year. Despite the unsatisfactory nature of the tag cloud and the lack of guidelines, users continued to create and use tags at a far greater rate than was ever anticipated.
As expected, there was the usual range of spelling mistakes, inconsistencies in upper and lower case, variation in the description of dates, mixed use of singular and plural, and creation of non-dictionary-word tags, e.g. xx1. Tonkin's research into tagging inconsistencies shows that in Flickr and del.icio.us spelling mistakes (or terms not found in a range of dictionaries) appear in around a third of tags. We were not able to confirm this rate of spelling mistakes in the Australian Newspapers tags.
98% of the tags created were given the status of 'public', because users stated that they wanted to feel they might help the wider community; 2% were private. There was no discernible difference in the type of tags created as public versus private. 14% of the taggers used the private tag feature. There appeared to be two reasons for creating private tags: either users thought their tag would not be helpful to anyone else, or they did not want anyone else adding to 'their tag' because they were using their own tags to track their research progress.
92% of the tags were added by registered users and 8% by anonymous (unregistered) users. The research by the Library of Congress and by steve.museum also showed higher use of tagging by registered users than by anonymous users. 57% of the tag pool was created by the top 10 'super taggers'. Super taggers create a significantly higher number of tags than other users (usually thousands), and their presence is not unusual. This correlates with the findings of the Library of Congress Flickr project, where 40% of the tags were added by a group of 10 super-taggers. The top super-tagger entered more tags than all the anonymous users put together.
The overwhelming majority (estimated to be 80%) of distinct tags created were for personal names and were being used by genealogy researchers. This was clearly a different tagging pattern from that seen in museum and image collections, where subjects and geotags dominate. 37% of the tag pool was comprised of distinct tags. This was slightly higher than the findings of steve.museum, which had 32% distinct tags, and the Library of Congress, which had 21% distinct tags.
It was observed that far more users (approximately 10 times more) opted to correct text than to add a tag, and five times more articles were corrected than were tagged. This was perhaps because users understood that correcting the text had a more radical effect on search results than adding a tag did. Two of the four super-taggers, who were also super text correctors, said they added tags to articles at the same time as correcting text, because they thought it might help other people find things in a different way. They both said they were not using the tags for their own purposes, instead finding articles by keyword searching, but they hoped the tagging would help other people, and they found it easy to do as they went along. Other text correctors said they saw no point in tagging once they had corrected the words in which they were interested. A survey of the text correctors and user testing of the system revealed that many users were confused by the three interaction options available (tagging, commenting and text correction). They were sometimes unsure which one to choose or "which one was best". Many users had never used features like tagging, rating or reviewing before and did not understand the purpose of tagging. This certainly implied that the majority of users would use one feature or another but would rarely use all three together.
Users wanted to be able to see in the keyword search results list whether articles had been corrected or tagged. This was not implemented until the end of the year. Although no moderation took place, as far as the ANDP team was aware no abuse of tags occurred. Users were quick to report errors and inconsistencies, and since no users reported abuse, it was assumed there was none. The fear of abuse is probably unjustified, since both the steve.museum research and the Library of Congress Flickr project found only a tiny percentage of inappropriate tagging.
Our understanding of what the 'top tags' were (viewable from the browse page) was open to interpretation. At the end of the year, it was apparent that the most frequently created tags were quite different from the tags used by the most users. We were displaying the most frequently created tags, some of which had been used hundreds or thousands of times; but if Clay Shirky's hypothesis in his article 'Ontology is Overrated' is correct that users want to know "is anyone tagging it the way I do?", then they would find the second type more useful. Interestingly, there was a direct correlation between the second type (tags used by the most users) and the most frequently used search terms, i.e. the way people think when they are looking for things is the same as the way they think when they are describing things.
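The distinction between the two kinds of 'top tags' is easy to express in code. This sketch (Python; the data model is invented for illustration and is not the ANDP schema) ranks tags both by total number of assignments and by the number of different users who assigned them:

```python
from collections import Counter

def top_tags(assignments, n=10):
    """assignments: iterable of (user_id, tag) pairs, one per tagging
    action. Returns two rankings: most frequently created tags, and
    tags used by the most different users."""
    assignments = list(assignments)
    by_count = Counter(tag for _, tag in assignments)
    # De-duplicating (user, tag) pairs first counts each user once per tag
    by_users = Counter(tag for _, tag in set(assignments))
    return by_count.most_common(n), by_users.most_common(n)

# Invented sample: one user repeats 'flood'; two users each add 'gold'
created, shared = top_tags(
    [("u1", "flood"), ("u1", "flood"), ("u1", "flood"),
     ("u2", "gold"), ("u3", "gold")], n=1)
```

In this sample 'flood' tops the first ranking but 'gold' tops the second despite fewer total assignments, which is the measure Shirky's hypothesis suggests users actually care about.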
Users want the tags to be of benefit to everyone, and they think consistency, guidelines and moderation are the key to this. Whether they are right is hard to tell. Clay Shirky says that "Tagging gets better with scale". Perhaps we should not get too hung up on guidelines and just do it. Shirky also says, "If there is no shelf, then even imagining that there is one right way to organise things is an error". In a digital, shared space everything is different from a library with shelves.
A summary of the observations made during the first year of tagging is below:
7. Tagging enhancements suggested by public users
Within the first 12 weeks many of the public testers/users e-mailed the ANDP team saying that they urgently wanted the following:
Users' perceptions of the priority of these items did not change throughout the year. Other suggestions were also made; a complete list of all suggested enhancements for tagging is below. Those marked in green were implemented before the year ended. Enhancement requests for a single feature received from many users were given a high priority. The team all agreed that the ability to search tags and improvements to browsing the cloud were needed, but they could not act on these items because other more pressing priorities had to be addressed first.
Table 7: Suggested enhancements for tagging functionality.
8. Future development of tagging at the National Library of Australia
The National Library of Australia has decided that:
I have also suggested that the following activities take place:
The observations show that there were both similarities and differences in tagging activity and behaviours across a full-text collection as compared with the research done on tagging in image collections. Similarities included that registered users tag more than anonymous users; that distinct tags form 21-37% of the tag pool; that 40% or more of the tag pool is created by 'super-taggers' (the top 10 tag creators); that abuse of tags occurs rarely if at all; and that spelling mistakes occur fairly frequently if spell-checking or other mechanisms are not implemented at the point of tag creation. Notable differences were the higher percentage of distinct tags used only once (74% at NLA) and the predominant use of personal names in these tags. This is perhaps related to the type of resource (historic newspaper) rather than its format (full text), and the difference would likely be replicated if tagging were enabled across archive and manuscript collections. There was an expectation from users that, since this was a library service offering tagging, there would be some 'strict library rules' for creating tags, and users were surprised there were none. They quickly developed their own unwritten guidelines. Clay Shirky suggests that "Tagging gets better with scale", and libraries have plenty of scale in both content and users. We should not get too hung up on guidelines and quality. I agree with Shirky that "If there is no shelf, then even imagining that there is one right way to organise things is an error".
The experience of the National Library of Australia shows that tagging is a good thing: users want it, and it adds information to the data. It costs little to nothing and is relatively easy to implement; therefore, more libraries and archives should simply implement it across their entire collections. This is what the National Library of Australia will have done by the end of 2009.
2. Trant, J. (2009). Tagging, Folksonomy and Art Museums: Results of steve.museum's research. http://verne.steve.museum/SteveResearchReport2008.pdf.
4. Clayton, S., Morris, S., Venkatesha, A., Whitton, H. (2008). User Tagging of Online Cultural Heritage Items: A project report for the 2008 Cultural Management Development Program prepared by the Australian War Memorial, the National Library of Australia, the Royal Australian Mint and the National Archives of Australia. http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/930/1205.
5. Springer, M., Dulabahn, B., Michel, P., Natanson, B., Reser, D., Woodward, D., et al. (2008). For the Common Good: The Library of Congress Flickr Pilot Project. http://www.loc.gov/rr/print/flickr_report_final.pdf.
6. Holley, R. (2009) Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers, National Library of Australia, ISBN 9780642276940 http://www.nla.gov.au/ndp/project_details/documents/ANDP_ManyHands.pdf.
8. Shirky, C. (2005) Ontology is overrated: Categories, Links and Tags. http://www.shirky.com/writings/ontology_overrated.html.
9. RLG Social Metadata Working Group Background: http://www.oclc.org/programs/ourwork/renovating/changingmetadata/aggregating.htm. Overview of progress June 2009: http://www.oclc.org/programs/events/2009-06-02j.pdf.
About the Author