
D-Lib Magazine
September 2002

Volume 8 Number 9

ISSN 1082-9873

Coming to TERM

Designing the Texas Email Repository Model

 

Marlan Green
<marlangreen@mail.utexas.edu>

Sue Soy
<ssoy@gslis.utexas.edu>

Stan Gunn
<sgunn@gslis.utexas.edu>

Patricia Galloway
<galloway@gslis.utexas.edu>

Graduate School of Library and Information Science
The University of Texas at Austin


Abstract

This article explores access to and long-term preservation of digital records in state government settings using the Open Archival Information System (OAIS) Reference Model to design a repository for managing email records in the state of Texas. Through creation of a multi-agency digital repository, state government can provide for the lawful management of email records and build a large knowledge base that can be tapped for information that will enhance business and growth.

The proposed trusted repository concept ensures that records of a truly transitory nature are destroyed efficiently and legally rather than left to accumulate, unmanaged and in ever-increasing numbers, on networked computing systems. The repository preserves the official business records of state government and can expand access services to a customer base that includes the agencies within state government and the public it serves. Finally, it is anticipated that lessons learned in its implementation will allow its eventual expansion to encompass all the digital records of Texas state government.

1   Introduction

1.1  The Problem

Texas state government, like other government entities, is grappling with the problem of constructing a repository that preserves and provides access to electronic records and other digital objects. Email is a special problem because of its volume. The International Data Corporation estimates that 9.8 billion electronic messages are sent each day around the world [1]. In view of recent well-publicized problems with the management of government email, the Texas Department of Information Resources has been exploring the possibility of a single centralized solution for managing Texas state government email according to the standards set by the Texas State Library and Archives Commission. The authors, in a seminar on digital records preservation, undertook the challenge to:

  • describe systems and processes that ensure that executive, legislative, and judicial electronic recordkeeping systems meet uniform standards and produce actions that lawfully preserve and lawfully dispose of electronic mail;
  • suggest strategies for building a trusted repository for email records that will not just preserve the records, but can also offer features that will enable access and, over time, develop into a knowledge base; and
  • propose a system for secure email retention and use which can maximize the benefits of ongoing activities of state agencies and consequently improve service to the citizens served.

1.2  Current Environment Overview

This report describes a system that encompasses what to collect and how to collect, process, move, preserve, and access increasing numbers of email records created at nearly every desktop computer used to provide Texas state government services as well as at the desktops of citizens and businesses that use email to communicate with state government.

The Task Force on Archiving of Digital Information recommended that a collaborative system of certified digital archives be established in the United States [2] and since then, groups worldwide have developed models and strategies for distributed systems that collect, protect, and preserve digital information. Examples of research and model development include work reported by Berthon and Webb [3], Dollar [4], Duranti [5], Moore [6], van der Werf-Davelaar [7], and Underwood [8]. The Open Archival Information System (OAIS) Reference Model has attracted wide attention as a workable model because it provides the elements that research indicates are necessary: a closely audited, well documented, and constantly maintained and updated system. These elements are especially attractive to government. This model also has the advantage of being an ISO international standard [9].

Within the United States, practical implementations of OAIS-modeled digital repositories can be seen at Harvard and MIT libraries. Additionally, the National Archives and Records Administration (NARA) [10] is currently involved in developing an OAIS-based digital repository for electronic records in cooperation with the San Diego Supercomputer Center (SDSC) [11], where researchers have concluded that agencies of the federal government can create an email archive that could successfully grow and migrate to new technologies over time.

International examples of OAIS implementations include the Networked European Deposit Library (NEDLIB) [12] of the National Library of the Netherlands, the National Library of Australia [13], and the Royal Library of Sweden [14]. These projects provide leadership in the exploration of metadata, preservation methods, selection of records, retention, and access in the context of the OAIS Reference Model. The recently completed CEDARS digital archiving prototype [15] demonstrates that there is good reason to be confident that data in the form of a stream of bytes can be preserved indefinitely. The CEDARS Guides publication series [16] describes how to implement digital preservation systems based on actual experience with the OAIS Reference Model.

International researchers working on the InterPARES [17] project contributed research leading to development of strategies, policies, and metadata standards essential for the long-term preservation of electronic records. In addition, the Internet Message Format standards for electronic mail, RFC 822 and the recently revised version RFC 2822, provide the framework for record-creation metadata elements.

1.3  Classification of Email Records

Expanding e-government services will generate increasing amounts of email records involving every agency of state government. To meet the current and anticipated requirements for the retention of these records, the email repository must store and make available email records and their component parts for administrative use. Further, the storage processes must ensure that the repository is trustworthy, must implement records schedules to destroy and retain email records appropriately, and must be capable of preserving and retrieving those email records for the long term.

State records in Texas are defined as any written, photographic, machine-readable, or other recorded information created or received by or on behalf of a state agency or an elected state official that documents activities in the conduct of state business or use of public resources [18]. In Texas, all electronic messages are state records. At present, electronic mail messages sent or received by an agency are scheduled somewhat arbitrarily into three general series categories: (1) administrative correspondence, (2) general correspondence, and (3) transitory information. We find that some agencies as yet follow these schedules only sporadically [19].

The Texas Email Repository Model (TERM) anticipates massive numbers of email records produced using a variety of messaging platforms. As additional governmental functions are implemented through email messages, state agencies will need a finer level of granularity for classification of these records. This level is achievable by using standard email metadata to determine the record creator and then linking this identification to job function or job activity and an associated records schedule set. The use of this information, together with email content, may permit an adequate automatic classification of email records.
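The creator-based classification described above can be sketched roughly as follows. This is an illustration only: the two lookup tables, the example addresses, and the fallback label are hypothetical and are not part of TERM; the sketch assumes the message is already RFC 2822 compliant and uses only the From header, leaving content analysis aside.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical lookup tables. A real deployment would draw on agency
# personnel directories and approved records retention schedules.
JOB_FUNCTION_BY_SENDER = {
    "jdoe@agency.example": "procurement",
    "asmith@agency.example": "public information",
}
SERIES_BY_JOB_FUNCTION = {
    "procurement": "administrative correspondence",
    "public information": "general correspondence",
}
DEFAULT_SERIES = "unclassified (needs review)"  # hypothetical fallback label


def classify(raw_message: str) -> str:
    """Return a records series for a message based on its creator."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Jane Doe <jdoe@agency.example>" into name and address.
    _, address = parseaddr(msg.get("From", ""))
    job_function = JOB_FUNCTION_BY_SENDER.get(address.lower())
    return SERIES_BY_JOB_FUNCTION.get(job_function, DEFAULT_SERIES)
```

Messages whose creator cannot be matched fall through to a review category rather than being assigned a schedule by guesswork.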

Message flow is also important for setting up definitions of what to capture. TERM categorizes email records into three types of transactions based on message flow:

  • internal - state employee sending email records to state employee, where the official record is the email record created by the sender;
  • outgoing - state employee sending email records to the customer outside of an agency, where the official record is the message created and sent by the state employee; and
  • incoming - a customer outside the agency sending email records to a state employee, where the official record is the message received at the agency.
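A minimal sketch of how the three transaction types might be distinguished from header metadata is given below. It assumes, hypothetically, that state employees can be recognized by their mail domain (the AGENCY_DOMAINS set is invented for illustration), and it treats a message with any outside recipient as outgoing, which is a simplification the model itself does not prescribe.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical: assumes state addresses are identifiable by domain.
AGENCY_DOMAINS = {"agency.example"}


def _is_state(address: str) -> bool:
    """True when the address's domain belongs to a state agency."""
    return address.rpartition("@")[2].lower() in AGENCY_DOMAINS


def message_flow(raw_message: str) -> str:
    """Label a message as internal, outgoing, or incoming."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(recipient_headers)]
    if not _is_state(sender):
        return "incoming"
    if recipients and all(_is_state(r) for r in recipients):
        return "internal"
    # Assumption: any recipient outside state government makes it outgoing.
    return "outgoing"
```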

1.4  Storing Email in a Trusted Repository

TERM incorporates a preservation strategy with the following elements:

  • operating procedures followed at the agency level as well as in the trusted repository;
  • migration or emulation strategies for preserving records and their attachments in ways that ensure the integrity of the records and produce clear audit trails and reports;
  • backup and disaster recovery procedures that enable the business of the state to continue with minimal interruption;
  • metadata management features that link the component parts of records and describe their content, context, structure, and presentation requirements;
  • stable environmental controls; and
  • secure facilities with well-documented procedures and audit mechanisms.

This preservation strategy respects the archival bond [20] by providing evidence of the official business transactions of the state in an architecture that respects and maintains links and ties between the compound parts of the record as well as larger logical groupings of records.

TERM also has the potential for re-purposing the information in a searchable data warehouse. Enhanced systems of classification may emerge to refine and redefine records schedules by assessing what categories of activity and content are actually present in the email records. Such a system would lead to systematic classification, retention, and management of large volumes of email records from the point of their creation forward through time and would supplement e-government processes.

2   Building TERM

The RLG/OCLC report, Trusted Digital Repositories [21], reflects the preservation community's thinking about reliable and trustworthy digital archives. TERM is conceived as an OAIS-compliant centralized repository for all Texas state agencies to share, but it also takes advantage of the flexibility of the OAIS model by recognizing the possibility that multiple OAIS-compliant repositories can also exist. Repositories for short-term retention, based upon the same design elements and standards of trustworthiness but more limited functionally, should also be implemented within agencies, facilitating the secure movement of records to the statewide repository at designated intervals. TERM also provides essential protections for the electronic records of multiple state agencies using strategies that plan at the outset for disaster recovery, cessation of the repository, and a highly regulated environment.

2.1  The OAIS Reference Model Overview

The OAIS Reference Model illustrates the functions and information flows applicable to a trusted repository constructed to maintain safe long-term custody of electronic records simultaneously with access to them. The major functions of the model (Figure 1) are:

  • Ingest—receipt and verification of records;
  • Archival Storage—secure storage of records;
  • Data Management—secure management of records;
  • Access—provision of records in response to user queries;
  • Preservation—management of record integrity and security over time; and
  • Administration—management of internal and external relations.

[Figure 1: diagram of major functions of the OAIS Reference Model]

2.2  Repository Management Team for TERM

The Repository Management Team's responsibilities as defined by the OAIS model are to "set overall policy as one component in a broader policy domain" [22]. We found that careful consideration of this team's membership and of its members' roles is vital. A centralized email repository for Texas state government requires an active management team composed of all stakeholders who, acting together, can facilitate the cross-agency cooperation necessary for the success of the central repository and can provide support for an eventual centralized statewide electronic record preservation system. During the initial stages of TERM, the Repository Management Team's primary task is to foster an environment in which TERM can develop by:

  • promoting statewide commitment, funding, and involvement;
  • facilitating communication among stakeholders;
  • advocating adoption of non-proprietary mail messaging systems throughout state agencies;
  • encouraging compliance with current email use and retention guidelines;
  • developing disaster and cessation strategies; and
  • guiding selection of contractors and continuous oversight of contractor performance.

Figure 2 outlines the membership of this team, built upon the membership of the existing Records Management Interagency Coordinating Council.

[Figure 2: chart showing areas of responsibility]

2.3  Agency-Level Activities

Long-term preservation of electronic records begins with the creation of the email messages themselves. Any incoming, outgoing, or internal email (Figure 3) is captured by the email messaging system's SMTP server. Capture is accomplished by creating a duplicate message and then routing it to a temporary internal agency repository that implements a subset of OAIS structure and procedures. Proprietary systems will need to translate the message to be RFC 2822 compliant. This internal repository retains the email duplicates until a sufficient number have accumulated; it then notifies TERM, and the Ingest process begins. The agency-level activities should meet the same level of trustworthiness as an OAIS repository, and can be articulated as follows:

  • Ingest—capture copy from messaging system, place in storage, notify Data Management;
  • Archival Storage—maintain secure storage;
  • Data Management—track storage contents, notify TERM of Submission Information Package (SIP) readiness, provide records list to harvester;
  • Access—only by Data Management and TERM harvester;
  • Preservation—mainly security tasks, since records are only kept temporarily; and
  • Administration—includes Records Management Officer, IT management, agency legal counsel, agency executive representative, user representative.
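The capture-and-aggregate behavior of the temporary agency repository might look something like the sketch below. The class name, its fields, and the use of a simple message count as the readiness threshold are all assumptions made for illustration; the model leaves the actual trigger to the SIP agreement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgencyHoldingArea:
    """Temporary agency repository that batches captured duplicates."""

    batch_size: int  # hypothetical threshold, e.g. fixed by the SIP agreement
    pending: List[bytes] = field(default_factory=list)
    ready_sips: List[List[bytes]] = field(default_factory=list)

    def capture(self, rfc2822_message: bytes) -> bool:
        """Store a duplicate; return True when a SIP has become ready.

        A True result is the point at which Data Management would
        notify TERM that a package is available for harvesting.
        """
        self.pending.append(rfc2822_message)
        if len(self.pending) < self.batch_size:
            return False
        self.ready_sips.append(self.pending)
        self.pending = []
        return True
```

In this sketch a completed batch stays in ready_sips after notification, reflecting the idea that the agency keeps its copy until the central repository has taken custody.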

[Figure 3: chart showing agency-level messaging workflow]

Before the email can be accessioned by TERM, a number of arrangements must be made to accomplish the transfer of the digital data. The OAIS Reference Model identifies Information Packages for describing the digital information to be preserved as well as its necessary accompanying metadata (Figure 4). TERM will negotiate a Submission Information Package agreement with the state agency producing and submitting the electronic records.

[Figure 4: chart showing the information flow from agency to consumer]

The arrangements recorded in the SIP agreement between the repository and the email record-producing agency will address issues such as: representation metadata, retention periods, access rights, permissions and privacy, and preservation information. The SIP agreement may also include options designed to meet individual agency needs and demands stemming from e-government services. The varied nature of government activities requires a flexible system for the long-term retention of electronic records. Possibilities for variations within the SIP agreement include:

  • pre-classification of records at the agency prior to submission;
  • inclusion of digital signatures for authentication;
  • encryption specifications;
  • scheduling of data transfer based on regular intervals or size; and
  • creation of a robust local OAIS-modeled repository at the agency level.

The agreement also addresses requirements for handling email attachments as well as other digital objects.

To assure the security of TERM, we suggest that SIPs be harvested by a TERM agent rather than sent by the agencies. Once TERM has received the SIP, it will perform quality assurance on it, create internal log files, and track any errors that may have occurred during the transmission or validation process. TERM will verify receipt by sending the agency a report indicating that the package was authenticated and that it complied with the SIP agreement, thus permitting the agency to destroy the SIP on the local server. If the SIP does not comply, TERM will attempt to harvest it again. Upon successful receipt of a valid SIP, the originating agency's direct involvement ends and the Ingest process begins at TERM.
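One way to picture the quality-assurance step is a comparison of the harvested files against a manifest of digests supplied by the agency. The sketch below assumes such a manifest exists and that SHA-256 is the agreed digest; neither detail is specified by the model, and the function and field names are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex digest used here as the (assumed) fixity value."""
    return hashlib.sha256(data).hexdigest()


def validate_sip(manifest: Dict[str, str],
                 payload: Dict[str, bytes]) -> Dict[str, object]:
    """Check a harvested payload against the agency's digest manifest.

    Returns a small receipt; 'accepted' is True only when every listed
    file arrived intact and nothing unlisted was included.
    """
    errors: List[str] = []
    for name, expected in manifest.items():
        if name not in payload:
            errors.append(f"missing: {name}")
        elif sha256_hex(payload[name]) != expected:
            errors.append(f"checksum mismatch: {name}")
    for name in payload:
        if name not in manifest:
            errors.append(f"unexpected: {name}")
    return {"accepted": not errors, "errors": errors}
```

A receipt with accepted set to True would correspond to the report that lets the agency destroy its local copy; any other result would trigger a repeated harvest.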

2.3.1  Ingest Function

Ingest begins the process of preparing the contents of the SIP for storage and management inside TERM. The essential tasks of Ingest are: (1) generating the Archival Information Package (AIP) from the SIP for the Archival Storage function and (2) generating transformed, manipulatable records for the Data Management function of TERM.

The Ingest process transforms the SIP into an AIP. It is during this transformation process that the original bit stream is stored with identifying information in Archival Storage and the record is transformed into searchable form for TERM's repository management database (Figure 5). The AIP must have the ability to protect and produce the original bit stream by retaining authentication, access, and provenance information. A tested means of managing metadata and AIPs is to transform the email records into records marked up in Extensible Markup Language (XML). The use of XML accommodates not only descriptive, access, and preservation metadata, but also the original content of the email and its attachments. Further, the emerging use of XML for cross-platform e-commerce applications is leading to provisions for manipulating XML-encoded records in major database systems.
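As a rough illustration of the XML transformation, the sketch below wraps an RFC 2822 message in a small XML record that exposes the headers for searching and carries the untouched bit stream alongside them. The element and attribute names are invented for this example and do not represent a TERM schema; base64 is used only as one convenient way to embed arbitrary bytes in XML.

```python
import base64
import xml.etree.ElementTree as ET
from email import message_from_bytes


def email_to_xml(raw: bytes) -> str:
    """Wrap an email in an XML record (hypothetical element names)."""
    msg = message_from_bytes(raw)
    record = ET.Element("emailRecord")
    headers = ET.SubElement(record, "headers")
    for name, value in msg.items():
        # Each header becomes searchable record-creation metadata.
        ET.SubElement(headers, "header", name=name).text = str(value)
    # Keep the original bit stream so it can be reproduced exactly.
    original = ET.SubElement(record, "originalBitstream", encoding="base64")
    original.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(record, encoding="unicode")
```

Decoding the originalBitstream element returns the exact bytes that were submitted, which is the property the AIP needs in order to produce the original on demand.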

2.3.2  Archival Storage Function

Once created, the AIP is maintained in the archival storage environment of TERM using well-understood data-center administrative procedures (backups, media refreshment, offsite storage/replication) to ensure the protection of the AIPs and the trustworthiness of the repository. The transformed XML-encoded record is passed to the Data Management function.

2.3.3  Data Management Function

Deployment of TERM's Data Management entity depends significantly on vendor and platform choices. Since TERM will provide long-term preservation of multi-agency digital objects, non-proprietary software applications where available and open standards for database development are necessary for successful operation into the future.

The Data Management function can be thought of as managing a data warehouse. The initial task of Data Management is to receive information transmitted from Ingest. The transformed, manipulatable record will be stored in a searchable and retrievable form and probably additionally indexed. Ideally, a use-copy of the email would also be generated to be stored and remain ready for immediate access, but it is likely that on-the-fly generation will be more practical.

[Figure 5: chart showing workflow for extraction at Ingest]

2.3.4  Access Function

Because state agency concerns and statutory requirements dictate that agencies retain long-term intellectual control over their email records, and because public requests in the short term must be made through agencies, it is of paramount importance that access issues be resolved at the outset. Information regarding permissions for access and privacy requirements must be a part of every object contained in TERM. Issues related to shifts of ownership need to be scheduled (i.e., an automated shift of custody from agency ownership to ownership by the Texas State Library and Archives Commission at a specific point in a permanent record's lifecycle; see Figure 6). If Ingest is the first contact agencies have with TERM, then Access is their ongoing contact. Access provides the interface through which agency customers engage the system and maintain contact with copies of their data submissions.
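A scheduled shift of custody can be reduced to a date comparison, as in the sketch below. The function name and the idea of expressing the transfer point as a whole number of years after creation are assumptions for illustration; the actual trigger would come from the applicable records schedule.

```python
from datetime import date

STATE_ARCHIVES = "Texas State Library and Archives Commission"


def custodian_on(as_of: date, created: date, agency: str,
                 transfer_after_years: int) -> str:
    """Return who holds custody of a permanent record on a given day."""
    target_year = created.year + transfer_after_years
    try:
        transfer_date = created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        transfer_date = created.replace(year=target_year, day=28)
    return STATE_ARCHIVES if as_of >= transfer_date else agency
```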

Access receives requests for information, validates access permissions, and uses those requests to generate queries to be passed on to Data Management. Once Access receives the query result set, it generates a Dissemination Information Package (DIP). DIPs can represent one-time, even interactive requests or can be routinely generated for periodic delivery to TERM customers. If a certified copy of the original bit stream is required, a request is sent to Archival Storage and the appropriate objects are copied and returned to Access for dissemination to the customer, in whatever form the customer requests.
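The request-to-DIP flow just described can be sketched in a few lines. The Record fields, the permission rule (a record is released if it is public or owned by the requesting agency), and the shape of the returned package are all hypothetical simplifications; a subject-line substring match stands in for the query passed to Data Management.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    record_id: str
    owner_agency: str
    public: bool
    subject: str


def build_dip(requester_agency: str, term: str,
              holdings: List[Record]) -> Dict[str, object]:
    """Return a DIP listing the matching records the requester may see."""
    # Stand-in for the query that Access would pass to Data Management.
    matches = [r for r in holdings if term.lower() in r.subject.lower()]
    # Assumed permission rule: public records, or the requester's own.
    permitted = [r for r in matches
                 if r.public or r.owner_agency == requester_agency]
    return {
        "requester": requester_agency,
        "query": term,
        "record_ids": [r.record_id for r in permitted],
        "withheld": len(matches) - len(permitted),
    }
```

Reporting the number of withheld matches, rather than silently dropping them, is one design choice that would leave an audit trail for access decisions.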

Additionally, the Access function of TERM provides reports relevant to the functioning of TERM, on demand or by scheduled event, to the Administration, Data Management, and Archival Storage functions. Examples of reports include compilations of usage patterns, request types, or accounts receivable. These reports, over time, can inform and help refine the records management and classification processes used by the state of Texas.

[Figure 6: chart showing email record life cycle under TERM]

2.3.5  Preservation Function

Because of the relatively fragile nature of physical digital storage media and the transitory nature of proprietary digital formats, active preservation is a vital part of the OAIS Reference Model and consequently of TERM. All common forms of digital media, both magnetic and optical, decay over time. This deterioration, in turn, affects the integrity of digital information objects stored on the media. While some media may have a lifespan reasonable for brief retention periods, even short-term records may sometimes have to be moved to avoid technological obsolescence, and long-term preservation will require perpetual media refreshment and repackaging [24]. Preservation of the records, in the sense of moving them forward into the future across software and hardware changes, will be carried out by the appropriate method(s) in the continuum from emulation to migration [25].
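Media refreshment is only trustworthy if each copy is checked against the original. The sketch below shows one possible form of that check, assuming AIPs are stored as ordinary files and that a SHA-256 digest is an acceptable fixity measure; both are assumptions of this example rather than requirements of the model.

```python
import hashlib
import shutil
from pathlib import Path


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy an AIP file to new media and confirm the bits are unchanged.

    Returns the digest of the verified copy, which could be written to
    an audit log. Raises OSError if the copy does not match the source.
    """
    before = hashlib.sha256(source.read_bytes()).hexdigest()
    shutil.copyfile(source, destination)
    after = hashlib.sha256(destination.read_bytes()).hexdigest()
    if before != after:
        destination.unlink()  # do not leave a corrupt copy behind
        raise OSError(f"fixity check failed for {source.name}")
    return after
```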

2.3.6  Administration Function

Administration oversees and administers all of the functions of TERM. It has contact with every portion of the model and all those who use the repository. The Administration function of TERM is accountable for archival storage of digital objects. If all or part of the administrative role in TERM is provided by a qualified contractor, contract specifications will require full documentation of the contractor's repository management activities and regular review by external auditors. The Repository Management Team is ultimately responsible for monitoring TERM's Administration function.

Typical responsibilities of TERM Administration include:

  • setting physical access controls such as locks for doors, guards, and security procedures;
  • providing appropriate hardware and software;
  • monitoring error logs of data transfers to ensure integrity of data objects;
  • refreshing storage media;
  • performing migration tasks as required;
  • creating backup copies;
  • testing disaster plans and performing disaster recovery activities when necessary;
  • monitoring system performance;
  • activating requests for periodic or event-driven reports;
  • supplying user services including billing and user account setup; and
  • monitoring the designated user community using tools to gauge the effectiveness of TERM.

3   Conclusion

Hundreds of archival institutions exist in North America, and yet very few, public or private, are currently preserving electronic records for the long term. The greatest threat to the process may well be, as Stephens and Wallace [26] point out, the lack of organizational commitment to allocate the resources required to assure the preservation of electronic records. At present, it seems that Texas is poised to make this commitment, and the environment is certainly favorable for it.

There are potentially many advantages to be gained by the implementation of TERM. Work to construct TERM could help state agencies develop uniformity of practice and lead to uniformity in messaging systems. Additionally, building a highly regulated long-term repository for email records can lead to the development of a potentially valuable knowledge base for recognizing new patterns of records creation and use in support of emerging e-government functions. Over time, use of the search and retrieval capabilities available in a TERM-like repository could lead to revised retention requirements, new requirements for metadata types, insight into the value of recordkeeping in open environments, and development of automated classification systems. These developments would, in turn, inform decision-makers and would lead to new strategies for organizational structures within state governments that cost less to operate and provide more value. Most importantly, information and services accessible to the public would increase and improve.

4   Notes and References

[1] Katie Hafner, "The 30 Year Path of E-mail," The New York Times on the Web (06 December 2001): online, Internet, <http://www.nytimes.com>, 10 December 2001.

[2] John Garrett and Donald Waters, Chairs, Task Force on Archiving of Digital Information, Preserving Digital Information: Final Report and Recommendations. (Mountain View, CA: Commission on Preservation and Access and Research Libraries Group, 1996): online, Internet, <http://www.rlg.org/ArchTF/>, 20 July 2002.

[3] Hilary Berthon and Colin Webb, The Moving Frontier: Archiving, Preservation, and Tomorrow's Digital Heritage (Melbourne, Victoria: VALA Biennial Conference and Exhibition 16-18 February 2000): online, Internet, <http://www.nla.gov.au/nla/staffpaper/hberthon2.html>, 20 July 2002.

[4] Charles Dollar, Ensuring Access Over Time to Authentic Electronic Records: Strategy, Alternatives, and Best Practices (Sacramento, CA: National Association of Government Archives and Records Administrators, 16-19 July 1997).

[5] Luciana Duranti, "Reliability and Authenticity: The Concepts and Their Implications" Archivaria 39 (1995).

[6] Reagan Moore, Chaitan Baru, Arcot Rajasekar, Bertram Ludaescher, Richard Marciano, Michael Wan, Wayne Schroeder, and Amarnath Gupta, "Collection-Based Persistent Digital Archives - Part I," D-Lib Magazine 6 (March 2000): online, Internet, <http://www.dlib.org/dlib/march00/moore/03moore-pt1.html>, 17 July 2002.

[7] Titia van der Werf-Davelaar, "Long-term Preservation of Electronic Publications: The NEDLIB Project" D-Lib Magazine 5 (Sept. 1999): online, Internet, <http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html>, 17 July 2002.

[8] William E. Underwood, Analysis of Presidential Electronic Records: Final Report (Atlanta, GA: Computer Science and Information Technology Division, Georgia Tech Research Institute, Sept. 1999): online, Internet, <http://perpos.gtri.gatech.edu/perpos/Final_Report.pdf>, 20 July 2002.

[9] Consultative Committee for Space Data Systems (CCSDS), CCSDS 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). Blue Book. Issue I. (Washington DC: CCSDS, Jan. 2002). Adopted by ISO 14721 (2002): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>, 20 July 2002.

[10] Kenneth Thibodeau, "Building the Archives of the Future: Advances in Preserving Electronic Records at the National Archives and Records Administration" D-Lib Magazine 7 (Feb. 2001): online, Internet, <http://www.dlib.org/dlib/february01/thibodeau/02thibodeau.html>, 20 July 2002.

[11] Consultative Committee for Space Data Systems (CCSDS), CCSDS 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). Blue Book. Issue I. (Washington DC: CCSDS, Jan. 2002): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>, 20 July 2002.

[12] Networked European Deposit Library (NEDLIB) (The Hague, The Netherlands, 2002): online, Internet, <http://www.kb.nl/coop/nedlib/>, 20 July 2002.

[13] National Library of Australia, (Canberra, Australia, 2002): online, Internet, <http://www.nla.gov.au>, 20 July 2002.

[14] Royal Library of Sweden. Kulturarw3 Heritage Project, (Stockholm, Sweden: Royal Library of Sweden, 2002): online, Internet, <http://www.kb.se/kw3>, 20 July 2002.

[15] CEDARS Project, (Leeds, England: University of Leeds, 2002): online, Internet, <http://www.leeds.ac.uk/cedars>, 20 July 2002.

[16] CEDARS Publications, (Leeds, England: University of Leeds, 2002): online, Internet, <http://www.leeds.ac.uk/cedars/pubconf/pubconf.html>, 20 July 2002.

[17] InterPARES, (Vancouver: University of British Columbia, 2002): online, Internet, <http://www.interpares.org/researchplan.htm#domains>, 20 July 2002.

[18] State of Texas, Government Code Chapter 441. Libraries and Archives. Subchapter L. Preservation and Management of State Records and Other Historical Resources 441.180(11) (Austin, TX, 2002): online, Internet, <http://www.capitol.state.tx.us>, 20 July 2002.

[19] State of Texas. Texas State Library and Archives Commission. Email Policy Model for State Agencies (Austin, TX, 2002): online, Internet, <http://www.tsl.state.tx.us/slrm/recordspubs/email_model.html>, 20 July 2002.

[20] Heather MacNeil, Trusting Records: Legal, Historical and Diplomatic Perspectives (Dordrecht, The Netherlands: Kluwer Academic, 2000).

[21] RLG-OCLC, Trusted Digital Repositories: Attributes and Responsibilities, (Mountain View, CA: RLG, May 2002): online, Internet, <http://www.rlg.org/longterm/repositories.pdf>, 20 July 2002.

[22] Consultative Committee for Space Data Systems (CCSDS), CCSDS 651.0-W-1, Producer-Archive Interface Methodology Abstract Standard. (OAIS). White Book. Issue I. Draft Recommendation for Space Data System Standards (Washington DC: CCSDS, Dec. 2001): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/CCSDS-651.0-W-1.pdf>, 20 July 2002.

[23] Kenneth Thibodeau, The Unsteady State of the Art of Preserving Electronic Records (Florence, Italy: VI European Conference on Archives, May 2001): online, Internet, <http://www.nara.gov/era/europe/europe_index.html>, 15 Feb. 2002.

[24] Anne Kenney and Oya Y. Rieger, Moving Theory into Practice: Digital Imaging for Libraries and Archives (Mountain View, CA: Research Libraries Group, 2000).

[25] The State of Digital Preservation: An International Perspective, Conference Proceedings April 24-25, 2002 (Washington, DC: CLIR, 2002): online, Internet, <http://clir.org/pubs/reports/pub107/pub107.pdf>, 20 July 2002.

[26] David O. Stephens and Roderick C. Wallace, Electronic Records Retention: An Introduction (Prairie Village, KS: ARMA International, 1997).

 

Copyright © Marlan Green, Sue Soy, Stan Gunn and Patricia Galloway


DOI: 10.1045/september2002-galloway