
Commentary


D-Lib Magazine
June 2005

Volume 11 Number 6

ISSN 1082-9873

Plenty of Room at the Bottom? Personal Digital Libraries and Collections

 

Neil Beagrie
<Nbeagrie@aol.com>


Abstract

People are capturing and storing an ever-increasing amount of digital information about or for themselves, including emails, documents, articles, portfolios of work, digital images, and audio and video recordings. Year on year, the computer processing, storage, and software tools available to individuals are increasing in power, volume, and ease of use. This more informal and increasingly empowered landscape of personal collection, dissemination, and digital memory raises many issues that will have major future impacts. This article provides a commentary on current research and emerging services in this area and discusses potential implications for individuals, libraries and their institutions.

Introduction

Two major trends underpin this article and the developments it outlines. The first is the exponential increase in computer processing power often referred to as "Moore's Law": roughly a doubling of the number of transistors on integrated circuits every 18 months for the same unit cost. This rapid growth in computing power is substantially reinforced by a similar long-term trend in storage, whose capacity keeps rising while its cost keeps falling. These developments are placing remarkable levels of computing power and storage within reach of individuals; personal digital stores of a terabyte or more are already almost within reach. Projecting current trends, in a few more years individuals may, for example, be able to store the equivalent of all the texts in the Library of Congress on their PC.
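As a rough illustration of how quickly this compounds, the short Python sketch below computes the growth factor implied by a doubling every 18 months. The function name and the chosen time horizons are illustrative assumptions, and real processor and storage trends only approximate a smooth exponential curve.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Return the multiplicative growth after `years`, assuming capacity
    doubles every `doubling_period_years` (1.5 years = 18 months)."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical horizons, chosen only to show the shape of the curve.
    for years in (3, 6, 10, 15):
        print(f"{years:>2} years: about x{growth_factor(years):,.0f}")
```

Under this idealized assumption, six years corresponds to a 16-fold increase and fifteen years to roughly a thousand-fold increase (2^10 = 1,024).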

The second trend is increasing consumer digital creativity and an appetite for digital content. This has been dubbed "Generation C [for Content]" by market analysts [1]. The Generation C phenomenon refers to a perceptible consumer shift from consumption to personal creation, customization, and co-production of digital content.

Similar to the past "democratization" of computing with the shift from centralized mainframes to personal computers, there appears to be an emerging democratization and personalization of digital content creation. In a recent report for the UK government, Spectrum Strategy Consultants predicted growth in weblogging, personal online journals, personal journalism operating on a mass scale, and interpersonal links such as picture, video and music sharing. The report forecasts increased consumption of "amateur" content and better links between amateurs and professional producers and publishers, to whom amateurs will be able to send content they have made themselves (e.g., text, photos and video clips) electronically [2].

The overall effect of these trends is that people are able to create, capture and store an ever-increasing amount of digital information about or for themselves, including emails, documents, portfolios of work, digital images, and audio and video recordings, and can easily edit, share, and distribute this material over the net via blogs, personal webpages, peer-to-peer networks, or shared services.

Although many wider consumer trends may seem remote from the academic sector, they have parallels in, and provide a broader context for, developments such as e-portfolios for students, self-archiving by academics, and interest amongst universities and colleges in applications of Creative Commons.

Defining Personal Digital Collections

Individuals have always used physical artifacts as external memory and reference aids. Over time these have ranged from personal journals and diaries, to photographs and photographic albums, to whole personal libraries of books, serials, clippings and off-prints. The urge to express individuality and creativity, and to spend substantial time developing personal collections of antiques, rare books, art, and ephemera, is also long-standing. It has not only shaped and defined personal collections but has also often been the foundation and lifeblood of most museum, library, and archive collections.

As personal collections shift from paper and analogue formats to hybrid and increasingly digital formats, personal digital collections are emerging. These personal collections are often composites drawing material from the individual's private life, work, and education, as well as from external communities and content sources. Ownership and intellectual property rights in such collections are therefore often diverse and complex. These collections are often composed of materials intended solely for private reference and use, and/or materials intended to be shared with others at work, or with other communities including family, friends, and interest groups.

[Figure 1 image: a Venn-style diagram showing that individuals' public and private digital collections reflect their private and public personas]

Figure 1. Personal Digital Collections are composed of information and content assembled by individuals from their private activities, work and external communities. They can be intended for private or public consumption and reflect both private and public personas of individuals.

The term "personal digital collection" is used here to distinguish these informal, diverse, and expanding collections accumulated and maintained by individuals. It covers what an individual accumulates and maintains, and excludes, for example, information about individuals that may be held in government sources such as census records, or reviews of an individual's work created and maintained by third parties. Personal digital collections may therefore be in part an informal "personal archive" of record; a "personal library" of externally generated articles, PowerPoint slides, music, video and monographs; or other materials, such as working papers or family photographs, intended either solely for personal access or for sharing with others. Although personal digital collections are in many ways similar to past personal papers and collections, radical divergences are emerging in a digital environment.

Capture

As noted by the Microsoft MyLifeBits research project, with the evolution of digital storage and capture devices it is now both theoretically and practically possible to envisage capturing all aspects of an individual's life digitally. MyLifeBits uses a combination of continuous digital capture via devices such as video cams and retrospective digitization of analogue sources [3]. Although MyLifeBits remains the best-known project, this is now a growing area of computer science and industry research. In the USA this led to the establishment in 2004 of the first ACM workshop on Continuous Archival and Retrieval of Personal Experiences – CARPE [4].

The approach of near-continuous and comprehensive capture has a number of strengths and weaknesses from the perspective of professional curators. The long-term management of digital materials differs in important ways from that of their analogue equivalents. One of the most important differences is that digital material requires continuous management and, for its long-term preservation, intervention at or close to the point of creation [5]. The traditional approach of acquiring material for a collection close to or at the end of an individual's life may pose significant challenges in a digital environment, including obsolete formats and media, missing data (email, webpages, etc.), and lost access gateways such as passwords. Continuous digital capture or synchronization may therefore have significant advantages in a digital environment, and it would be interesting to see further experimentation with, and evaluation of, this approach for collection development.

Selection, Narrative and Retrieval

A frequent criticism is that such passive capture methods fail to be selective and will overwhelm the user at the point of retrieval. Although this was undoubtedly true in an analogue environment, it is increasingly unclear that the argument holds for digital materials. The combination of cheap digital storage and very sophisticated retrieval tools is shifting the balance of costs: digitally, it is becoming cheaper to collect than to select, and cheaper to search than to organize [6].

Active personal collection by individuals may protect a significant body of material that might otherwise be lost, and allows for future appraisal and selection by long-term repositories. Selection of individual items will, of course, probably continue to play a role in some areas, with owners (or their executors) filtering out private material or irrelevant material such as spam.

For many purposes and for some content, selection and retrieval may matter less than narrative creation: the ability to edit, organize, and interpret massive collections for the user. Some of the tools and skills required for narrative creation may be drawn from television or video production, or from database visualisation. Continuous capture by a personal video camera, like the live footage of reality TV shows, would be incredibly tedious to watch; it is the editing, organization and interpretation of such material that makes it accessible. If future personal digital collections have real breadth and depth, tools supporting narrative creation and the overlay of specific views onto that material will be critically important.

Digital Continuity and Discontinuity

As digital content in personal collections continues to grow, particularly content that has been paid for, such as digital music or video, individual and public awareness of, and concern over, digital continuity seem likely to increase as well. At its most basic level, this concern is likely to focus on better provision and automation of backup. Tellingly, research on digital data loss suggests that a substantial amount of personal data is not backed up and that, on average, 6% of data held on all PCs is lost each year (more for laptops and mobile devices because of the higher incidence of theft) [7]. For any collection intended for access and use over a decade or more, this incremental accumulation of risk will become unacceptable, and its mitigation may become increasingly built into, and automated within, systems. Similarly, public awareness of, and resistance to, all but essential format migrations and their associated costs may increase.
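The point about incremental accumulation of risk can be made concrete with simple compounding. The Python sketch below assumes, purely for illustration, that the 6% average annual loss figure applies independently to each copy of an item every year (the cited research reports an average across PCs, not a per-item probability) and, for the backed-up case, that a lost copy is restored from a surviving one before the next year begins. Both assumptions are simplifications, and the function name is hypothetical.

```python
def survival_probability(annual_loss_rate: float, years: int, copies: int = 1) -> float:
    """Chance that an item is still recoverable after `years`.

    Illustrative assumptions only: each copy is lost independently with
    probability `annual_loss_rate` in any year, and a lost copy is re-created
    from a surviving one before the next year. The item is therefore lost
    only in a year in which every copy fails at once.
    """
    yearly_loss = annual_loss_rate ** copies  # all copies fail in the same year
    return (1 - yearly_loss) ** years


if __name__ == "__main__":
    for copies in (1, 2):
        p = survival_probability(0.06, 10, copies)
        label = "copy" if copies == 1 else "copies"
        print(f"{copies} {label}: {p:.1%} chance of surviving 10 years")
```

Under these assumptions, an item held in a single copy has only about a 54% chance of surviving ten years (0.94^10), whereas keeping a second independent copy raises this to roughly 96%.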

Digital systems are currently poorly adapted to what might be called individuals' discontinuity of interest. They focus on the immediate needs of users and offer little in the way of digital equivalents of physical storage spaces, in which material can be laid down and later rediscovered, forgotten or discarded. Personal interests in collections may change or lie dormant over time. In family history, for example, one of the largest and most rapidly growing personal pastimes, personal collections and material may go unused for many decades. Individuals with no interest early in life in historic material or its potential future uses are highly likely to become interested in it at a later stage. Digital systems should ultimately support digital memory.

Transmission of Memory

In this context, the experience of archives, libraries, and museums (often referred to as "memory institutions" because of their role in social and cultural memory) could enrich and interact with computing science research on "memories for life" and the development of memory systems for individuals. In the mass consumer market, current interest in family history may drive interest in transmitting the memories held in personal digital collections. These personal histories will also be of interest to academic research, but it is likely that the personal digital collections of leading creative writers and artists, politicians, and scientists will be the first to engage research libraries and archives in this area.

Digital Estates

The growth of personal digital information raises interesting issues for libraries, families and employers relating to "digital estates" following the death of individuals. This is not solely an issue of content and its value but also one of access (although it will be interesting to see whether the often very high value placed on the personal archives of correspondence and manuscripts of some creative authors will translate to the digital world of their email and electronic documents). Most personal digital collections can only be accessed via personal passwords and authentication. One of the best-known examples of the unexpected outcomes this can produce concerns the Norwegian Reidar Djupedal, who took to the grave the password he had chosen for the database indexing over 11,000 titles he had compiled at the Ivar Aasen Centre of Language and Culture in Norway. The case achieved world-wide publicity after the Centre's director made an international appeal for hackers to help identify the password. It took hackers only five hours to crack it and unlock access to the database; had they failed, it would have taken the Centre about four years of work to recreate the catalogue [8]. It does not seem too far-fetched to suggest that in time we may see the emergence of "digital executors" with access to secure digital safe-deposit boxes storing passwords and access rights.

Emerging Services and Research

Alongside personal digital collections, we are seeing the emergence of remote services that interact with them and are intended to support the operation of personal collection and information systems. These offer shared services for security, information management, and publishing. Many are beginning to form what might be called "information banks": secure or public extensions of personal digital collections.

In the commercial sector, emerging demand has already led to the growth of a number of services that help individuals and their employers begin to address the challenges of managing personal data on PCs. Several companies now offer online backup of digital data to a remote secure repository, using synchronization and encryption software to safeguard against data loss and to ensure privacy [9]. Others offer secure web-hosting of selected personal data, such as address books and contact details, which can then be centrally maintained and accessed from different mobile and fixed devices [10].

A desire to share digital images and documents has also led to rapid growth in software that lets individuals publish blogs, or digital images captured with mobile cameras and phones. Such information may be shared among immediate family, friends, and interest groups, or opened to everyone on the Web. Services such as Lifeblog for mobile phones [11] or Flickr [12] for sharing, categorizing, searching, and publishing digital images provide a number of tools for individuals, are proving highly popular, and are seeing sharp increases in their user base.

More recently, the Internet Archive and other partners have established Ourmedia. Individuals creating video, music, photos, audio clips and other personal media can store their content for free in perpetuity on Ourmedia's servers, as long as they're willing to share their works with a global audience. Ourmedia's goal is "to expose, advance and preserve digital creativity at the grassroots level." This is the first such service to explicitly offer long-term preservation as well as hosting services for personal and community content [13].

Although such services are in their infancy, there is also growing interest in these areas from computing science. In the UK, "Memories for Life" was recently recognized as a Grand Challenge for Computing Science by the UK Computing Research Committee and by the UK Foresight Cognitive Systems Group. This interest spans a wide range of potential applications, from digital memory aids for individuals to life-caching and personal digital agents [14]. In Austria, the Vienna University of Technology is also embarking on a related computing science project entitled "SemanticLIFE". This project aims to build a prototype Personal Information Management system designed to store, manage, and resolve information over one's lifetime [15].

To date there has been little related activity in major research repositories on personal digital collections – although one might expect them to be a major focus of future research. However, JISC has recently funded the PARADIGM project under its 04/04 digital preservation programme. This project is a partnership between the universities of Oxford and Manchester to provide a best-practice template for establishing long-term access to private digital papers of politicians. The project will work with the digital papers of at least two senior UK politicians [16].

In the area of e-learning, there has been increasing interest in the use of e-portfolios and student learning records. One future vision for e-portfolios suggests moving towards "Lifetime Personal Web-spaces", with every citizen granted a cradle-to-grave webspace that will enable connections among personal, educational, social, and business systems [17]. Such Lifetime Personal Web-spaces would share many characteristics of the personal digital collections, and of the shared services supporting them, described above.

Conclusions

Sixty years ago Vannevar Bush set out a remarkable and frequently quoted vision of the future. He wrote:

"Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory." [18].

Today Vannevar Bush's vision is close to becoming a reality, and this will have major future impacts.

The growing abundance of personal data and collections outlined in this article will present individuals with numerous challenges, including: how to physically secure such material, sometimes over decades; how to protect privacy; how to organize and extract information and use it effectively; and, for material intended to be shared, how to present it effectively and control access by different groups of users.

The shift towards personal collection, and towards services aimed at supporting activity from the desktop, will also lead to new forms of shared services, publishers and information banks, and will reinforce informal social networks and mechanisms of communication.

Informal sharing of such collections by academics has always been important for peers and contemporaries. Arguably their importance for current scholarship is growing along with the power and reach of software tools and communications available to individuals to create, manage, and disseminate them.

Similarly, the digital material in many of these personal collections is likely to be as significant for future users of historic collections as their paper equivalents are today, provided it survives for future access. Personal digital collections should therefore become a major area of interest for research collections.

In 1959, fourteen years after Vannevar Bush's article was published, Richard Feynman gave his classic lecture "There's Plenty of Room at the Bottom: An Invitation to Enter a New World of Physics", which outlined what was to become the field of nanotechnology [19]. The title of his lecture, and many of the sentiments behind it, seem in many ways apt for personal digital collections at this point in time. I hope this brief commentary will encourage more research and thought on personal digital collections and on their place in, and relationship to, digital libraries.

Acknowledgements

This is a personal commentary and does not necessarily reflect the views of the British Library or the Joint Information Systems Committee (JISC). I am deeply grateful to Maggie Jones, Daphne Charles, Andreas Rauber, Cliff Lynch, and many colleagues in the British Library, JISC, University College London, the Arts and Humanities Data Service, and the Memories for Life Network in various ways including valuable discussion on these themes. All opinions and any errors remain the responsibility of the author.

References and Notes

[1] See Generation C in the online newsletter of Trendwatching.com at: <http://www.trendwatching.com/trends/GENERATION_C.htm> or via this URL in the Wayback Machine in the Internet Archive at <http://www.archive.org>.

[2] Spectrum Strategy Consultants, 2004, BBC Online Review – Module 2: Future UK Internet Market Trends. Final report for DCMS, 16 March 2004, at:
<http://www.culture.gov.uk/NR/rdonlyres/25115125-9AFB-471B-8FB0-E538146CEEAE/0/BBConlineAnnex115.pdf>.

[3] MyLifeBits:
<http://research.microsoft.com/barc/mediapresence/MyLifeBits.aspx>.

[4] CARPE 2004:
<http://research.microsoft.com/CARPE2004/>.

[5] N. Beagrie and M. Jones, 2001, Preservation Management of Digital Materials: A Handbook, British Library. Online edition at:
<http://www.dpconline.org/graphics/handbook/>.

[6] I am grateful to Michael Lesk who first made this observation. See M. Lesk, 2003, Size Matters: Web and Book Archiving:
<http://www.scils.rutgers.edu/~lesk/spring05/lis553/eva-apr22.doc>.

[7] A good summary of backup market research and sources is available from Meganet:
<http://www.meganet.net/pdfs/onlinebkresearch.pdf>.

[8] This story was covered by a number of news organizations, for example the BBC:
<http://news.bbc.co.uk/1/hi/sci/tech/2038756.stm>.

[9] There is now a wide range of these services. One example is Data Deposit Box:
<http://www.datadepositbox.com/>.

[10] Again, there is a wide range of such services. One example is Collabrio's contact manager:
<http://www.myevents.com>.

[11] Nokia Lifeblog:
<http://www.nokia.com/lifeblog/>.

[12] Flickr:
<http://www.flickr.com>.

[13] Ourmedia:
<http://www.ourmedia.org>.

[14] Memories for Life Network:
<http://www.memoriesforlife.org>.

[15] M. Ahmed, H.H. Hoang, M.S. Karim, S. Khusro, M. Lanzenberger, K. Latif, E. Michlmayr, K. Mustofa, H.T. Nguyen, A. Rauber, A. Schatten, M.N. Tho, A.M. Tjoa, 2004, SemanticLIFE - A Framework for Managing Information of a Human Lifetime. Proceedings of the 6th International Conference on Information Integration and Web-based Applications and Services (IIWAS 2004), September 27-29, Jakarta, Indonesia.

[16] Paradigm:
<http://www.paradigm.ac.uk/>.

[17] E.R. Cohn and B.J. Hibbitts, 2004, Beyond the Electronic Portfolio: A Lifetime Personal Web Space, EDUCAUSE Quarterly, vol. 27, no. 4, 2004:
<http://www.educause.edu/apps/eq/eqm04/eqm0441.asp?bhcp=1>.

[18] V. Bush, 1945, As We May Think, The Atlantic Monthly, July 1945.

[19] R.P. Feynman, 1959, "There's Plenty of Room at the Bottom: An Invitation to Enter a New World of Physics." A transcript of his talk is available online and a published version appears in Caltech's Engineering and Science February 1960 issue: <http://www.zyvex.com/nanotech/feynman.html>.

Copyright © 2005 Neil Beagrie

doi:10.1045/june2005-beagrie