Graduate School of Library and Information Science
This article explores access to and long-term preservation of digital records in state government settings, using the Open Archival Information System (OAIS) Reference Model to design a repository for managing email records in the state of Texas. By creating a multi-agency digital repository, state government can provide for the lawful management of email records and build a large knowledge base that can be tapped for information to support business and growth.
The proposed trusted repository would ensure that records of a truly transitory nature are destroyed efficiently and legally rather than left to accumulate, in ever-increasing and unwieldy numbers, on networked computer systems. The repository preserves the official business records of state government and can expand access services to a customer base that includes the agencies within state government and the public it serves. Finally, we anticipate that lessons learned in its implementation will allow the repository eventually to encompass all the digital records of Texas state government.
1.1 The Problem
Texas state government, like other government entities, is grappling with the problem of constructing a repository for the preservation of and access to electronic records and other digital objects. Email is a special problem because of its volume. The International Data Corporation estimates that 9.8 billion electronic messages are sent each day around the world. In view of recent well-publicized problems with the management of government email, the Texas Department of Information Resources has been exploring the possibility of a single centralized solution for managing Texas state government email according to the standards set by the Texas State Library and Archives Commission. The authors, in a seminar on digital records preservation, undertook the challenge to:
1.2 Current Environment Overview
This report describes a system that defines what to collect and how to collect, process, move, preserve, and provide access to the growing numbers of email records created at nearly every desktop computer used to deliver Texas state government services, as well as at the desktops of citizens and businesses that use email to communicate with state government.
The Task Force on Archiving of Digital Information recommended that a collaborative system of certified digital archives be established in the United States, and since then groups worldwide have developed models and strategies for distributed systems that collect, protect, and preserve digital information. Examples of research and model development include work reported by Berthon and Webb, Dollar, Duranti, Moore, van der Werf-Davelaar, and Underwood. The Open Archival Information System (OAIS) Reference Model has attracted wide attention as a workable model because it provides the elements that research indicates are necessary: a closely audited, well-documented, and constantly maintained and updated system. These elements are especially attractive to government. The model also has the advantage of being an international standard (ISO 14721).
Within the United States, practical implementations of OAIS-modeled digital repositories can be seen at the Harvard and MIT libraries. Additionally, the National Archives and Records Administration (NARA) is developing an OAIS-based digital repository for electronic records in cooperation with the San Diego Supercomputer Center (SDSC), where researchers have concluded that agencies of the federal government can create an email archive that could successfully grow and migrate to new technologies over time.
International examples of OAIS implementations include the Networked European Deposit Library (NEDLIB) of the National Library of the Netherlands, the National Library of Australia, and the Royal Library of Sweden. These projects provide leadership in exploring metadata, preservation methods, selection of records, retention, and access in the context of the OAIS Reference Model. The recently completed CEDARS digital archiving prototype demonstrates that there is good reason to be confident that data in the form of a stream of bytes can be preserved indefinitely. The CEDARS Guides publication series describes how to implement digital preservation systems based on actual experience with the OAIS Reference Model.
Researchers working on the international InterPARES project contributed research leading to the development of strategies, policies, and metadata standards essential for the long-term preservation of electronic records. In addition, the Internet Message Format standards for electronic mail, RFC 822 and its recent revision RFC 2822, provide the framework for record-creation metadata elements.
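To make the role of RFC 2822 concrete, the following Python sketch pulls a few header fields out of a message and treats them as record-creation metadata. The choice of fields, the function name `creation_metadata`, and the sample message are illustrative assumptions rather than part of any TERM specification; only the standard-library `email` package is used.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# RFC 2822 header fields treated here as record-creation metadata
# (this selection is an assumption, not a TERM specification).
CREATION_FIELDS = ("From", "To", "Cc", "Date", "Subject", "Message-ID")
ADDRESS_FIELDS = {"From", "To", "Cc"}


def creation_metadata(raw: str) -> dict:
    """Extract record-creation metadata from an RFC 2822 message."""
    msg = message_from_string(raw)
    meta = {}
    for name in CREATION_FIELDS:
        value = msg.get(name)
        if value is None:
            continue  # optional field absent from this message
        if name in ADDRESS_FIELDS:
            # getaddresses() splits "Name <addr>, ..." into (name, addr) pairs
            meta[name] = [addr for _, addr in getaddresses([value])]
        elif name == "Date":
            # normalize the RFC 2822 date to ISO 8601
            meta[name] = parsedate_to_datetime(value).isoformat()
        else:
            meta[name] = value.strip()
    return meta


SAMPLE = (
    "From: Jane Clerk <jclerk@agency.example>\n"
    "To: records@tsl.example\n"
    "Date: Mon, 15 Jul 2002 09:30:00 -0500\n"
    "Subject: Quarterly report\n"
    "Message-ID: <20020715093000.1234@agency.example>\n"
    "\n"
    "The quarterly report is attached.\n"
)

meta = creation_metadata(SAMPLE)
print(meta["Date"])  # 2002-07-15T09:30:00-05:00
```

Fields that a message lacks (here, Cc) are simply omitted, so the resulting record shows only what the sender's system actually supplied.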
1.3 Classification of Email Records
Expanding e-government services will generate increasing amounts of email records involving every agency of state government. To meet current and anticipated requirements for retaining these records, the email repository must store email records and their component parts and make them available for administrative use. Further, its storage processes must ensure that the repository is trustworthy, must apply records schedules to retain or destroy email records appropriately, and must be capable of preserving and retrieving those records over the long term.
State records in Texas are defined as any written, photographic, machine-readable, or other recorded information created or received by or on behalf of a state agency or an elected state official that documents activities in the conduct of state business or use of public resources. In Texas, all electronic messages are state records. At present, electronic mail messages sent or received by an agency are scheduled, somewhat arbitrarily, into three general series: (1) administrative correspondence, (2) general correspondence, and (3) transitory information; we find that some agencies as yet follow these schedules only sporadically.
The Texas Email Repository Model (TERM) anticipates very large numbers of email records produced on a variety of messaging platforms. As more governmental functions are carried out through email, state agencies will need a finer level of granularity for classifying these records. This level is achievable by using standard email metadata to identify the record creator and then linking that identity to a job function or job activity and its associated set of records schedules. Used together with the content of the email, this information may permit adequate automatic classification of email records.
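The metadata-driven half of this classification can be sketched as two lookups: sender address to job function, then job function to a records series and retention period. The tables, addresses, and retention figures below are hypothetical placeholders, and content-based analysis is not shown.

```python
from email.utils import parseaddr
from typing import Optional, Tuple

# Hypothetical lookup tables. In practice the creator-to-function map would come
# from agency personnel data and the series/retention values from the agency's
# approved records retention schedule; the figures below are placeholders.
CREATOR_TO_FUNCTION = {
    "jclerk@agency.example": "purchasing",
    "director@agency.example": "executive management",
}
FUNCTION_TO_SCHEDULE = {
    "purchasing": ("general correspondence", 3),  # (series, retention in years)
    "executive management": ("administrative correspondence", 10),
}
# Unknown creators are flagged for a records manager rather than guessed at,
# so that nothing is destroyed on the strength of a failed lookup.
NEEDS_REVIEW = ("needs manual review", None)


def classify(from_header: str) -> Tuple[str, Optional[int]]:
    """Map a message's From header to a records series and retention period."""
    _, address = parseaddr(from_header)
    function = CREATOR_TO_FUNCTION.get(address.lower())
    return FUNCTION_TO_SCHEDULE.get(function, NEEDS_REVIEW)
```

One design choice worth noting: an unknown creator is routed to manual review rather than defaulting to the transitory series, so a lookup failure can never by itself authorize destruction.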
Message flow is also important for setting up definitions of what to capture. TERM categorizes email records into three types of transactions based on message flow:
1.4 Storing Email in a Trusted Repository
TERM incorporates a preservation strategy with the following elements:
This preservation strategy respects the archival bond by providing evidence of the official business transactions of the state within an architecture that maintains the links between the component parts of each record as well as between records in larger logical groupings.
TERM also has the potential to repurpose the information in a searchable data warehouse. By assessing which categories of activity and content are actually present in the email records, enhanced systems of classification may emerge that refine and redefine records schedules. Such a system would support systematic classification, retention, and management of large volumes of email records from the point of their creation forward, and would supplement e-government processes.
2 Building TERM
The RLG/OCLC report Trusted Digital Repositories reflects the preservation community's thinking about reliable and trustworthy digital archives. TERM is conceived as an OAIS-compliant centralized repository shared by all Texas state agencies, but it also takes advantage of the flexibility of the OAIS model by recognizing that multiple OAIS-compliant repositories can coexist. Repositories for short-term retention, based on the same design elements and standards of trustworthiness but more limited in function, should also be implemented within agencies to facilitate the secure movement of records to the statewide repository at designated intervals. TERM also provides essential protections for the electronic records of multiple state agencies, using strategies that plan from the outset for disaster recovery, cessation of the repository, and a highly regulated environment.
2.1 The OAIS Reference Model Overview
The OAIS Reference Model illustrates the functions and information flows applicable to a trusted repository built to maintain safe long-term custody of electronic records while simultaneously providing access to them. The major functions of the model (Figure 1) are:
2.2 Repository Management Team for TERM
The Repository Management Team's responsibilities, as defined by the OAIS model, are to "set overall policy as one component in a broader policy domain." We found careful consideration of the team's membership and its members' roles to be vital. A centralized email repository for Texas state government requires an active management team composed of all stakeholders who, acting together, can facilitate the cross-agency cooperation necessary for the central repository's success and can support an eventual centralized statewide electronic records preservation system. During the initial stages of TERM, the Repository Management Team's primary task is to foster an environment in which TERM can develop by:
Figure 2 outlines the membership of this team, built upon the membership of the existing Records Management Interagency Coordinating Council.
2.3 Agency-Level Activities
Long-term preservation of electronic records begins with the creation of the email messages themselves. Any incoming, outgoing, or internal email (Figure 3) is captured by the messaging system's SMTP server. Capture is accomplished by creating a duplicate of the message and routing it to a temporary internal agency repository that implements a subset of the OAIS structure and procedures. Messages from proprietary systems will need to be translated into RFC 2822-compliant form. The internal repository retains the email duplicates until a sufficient number have accumulated; it then notifies TERM, and the Ingest process begins. Agency-level activities should meet the same level of trustworthiness as an OAIS repository, and can be articulated as follows:
Before TERM can accession the email, a number of arrangements must be made for the transfer of the digital data. The OAIS Reference Model identifies Information Packages for describing the digital information to be preserved together with its necessary accompanying metadata (Figure 4). TERM will negotiate a Submission Information Package (SIP) agreement with each state agency producing and submitting electronic records.
The arrangements recorded in the SIP agreement between the repository and the email record-producing agency will address issues such as representation metadata, retention periods, access rights, permissions and privacy, and preservation information. The SIP agreement may also include options designed to meet individual agency needs and demands arising from e-government services. Because government activities vary widely, the long-term retention of electronic records requires a flexible system. Possible variations within the SIP agreement include:
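A minimal Python sketch of the agency-side staging step is given here, assuming that the SMTP server hands each duplicate message to the staging repository as raw bytes. The class name, the `notify_term` callback, and the batch threshold are hypothetical; the only compliance check shown is the RFC 2822 requirement that every message carry an origination date and an originator (From) field.

```python
from email import message_from_bytes
from typing import Callable, List


class AgencyStagingRepository:
    """Hypothetical temporary agency repository for captured message duplicates."""

    def __init__(self, notify_term: Callable[[int], None], batch_size: int = 100):
        self.notify_term = notify_term  # assumption: how TERM is signalled
        self.batch_size = batch_size    # assumption: threshold that triggers Ingest
        self.pending: List[bytes] = []
        self.notified = False

    def capture(self, raw: bytes) -> None:
        """Accept a duplicate of a message routed from the SMTP server."""
        msg = message_from_bytes(raw)
        # RFC 2822 requires only the origination date and originator fields.
        if msg["Date"] is None or msg["From"] is None:
            raise ValueError("not RFC 2822 compliant: Date or From missing")
        self.pending.append(raw)  # the bytes are kept exactly as received
        if len(self.pending) >= self.batch_size and not self.notified:
            self.notified = True
            self.notify_term(len(self.pending))

    def release(self) -> List[bytes]:
        """Hand the batch over; call only after TERM confirms a valid SIP."""
        batch, self.pending, self.notified = self.pending, [], False
        return batch
```

Messages continue to accumulate after the notification and are held unchanged until `release()` is called, which in this sketch stands for the agency's deletion step once TERM has confirmed receipt.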
The agreement also addresses requirements for handling email attachments as well as other digital objects.
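The terms of a SIP agreement could be recorded in a structured form that the harvesting agent checks mechanically. The Python dataclass below is one hypothetical layout covering representation, retention, access, privacy, preservation, attachments, and agency-specific options; the field names and the compliance test are assumptions, and the three-year retention value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SIPAgreement:
    """Hypothetical record of the terms TERM negotiates with one agency."""
    agency: str
    representation_format: str        # e.g. RFC 2822 message text
    retention_years: Dict[str, int]   # records series -> retention period
    access_roles: Set[str]            # who may request copies
    contains_confidential: bool       # drives privacy handling at Access
    attachment_policy: str = "preserve as received"
    preservation_notes: str = ""
    agency_options: Dict[str, str] = field(default_factory=dict)

    def complies(self, package: Dict[str, str]) -> bool:
        """Check a package's declared format and series against the agreement."""
        return (package.get("format") == self.representation_format
                and package.get("series") in self.retention_years)


# Usage: an agreement with placeholder values, checked against one package
agreement = SIPAgreement(
    agency="Example Agency",
    representation_format="message/rfc822",
    retention_years={"general correspondence": 3},
    access_roles={"agency records officer"},
    contains_confidential=True,
)
print(agreement.complies({"format": "message/rfc822",
                          "series": "general correspondence"}))
```

Keeping agency-specific variations in a separate `agency_options` mapping lets the common terms stay uniform across agencies while still accommodating individual needs.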
To ensure the security of TERM, we suggest that SIPs be harvested by a TERM agent rather than sent by the agencies. Once TERM has received a SIP, it will perform quality assurance on it, create internal log files, and track any errors that occurred during transmission or validation. TERM will confirm receipt by sending the agency a report indicating that the package was authenticated and complies with the SIP agreement; this report permits the agency to destroy its copy of the SIP on the local server. If the SIP does not comply, the TERM agent will attempt to harvest it again. Once a valid SIP has been received, the originating agency's direct involvement ends and the Ingest process begins at TERM.
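The harvest-and-verify loop can be sketched as follows, assuming that the agency records a SHA-256 digest in its SIP manifest and that the TERM agent has some way (the hypothetical `fetch` callable) to pull the package. A mismatch triggers another attempt; after the last failure no receipt is issued, so the agency retains its copy.

```python
import hashlib
from typing import Callable, List, Tuple


def harvest_sip(fetch: Callable[[], bytes], declared_sha256: str,
                max_attempts: int = 3) -> Tuple[bytes, List[str]]:
    """Pull a SIP from an agency, verify its checksum, and retry on mismatch.

    fetch and declared_sha256 are assumptions: a callable that retrieves the
    package, and a digest the agency records in the SIP's manifest.
    """
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        payload = fetch()
        if hashlib.sha256(payload).hexdigest() == declared_sha256:
            log.append(f"attempt {attempt}: checksum verified")
            return payload, log  # a receipt can now authorize local deletion
        log.append(f"attempt {attempt}: checksum mismatch")
    # Never issue a receipt for an unverified package; the agency keeps its copy.
    raise RuntimeError(f"SIP not verified after {max_attempts} attempts: {log}")
```

The returned log corresponds to the internal log files mentioned above and could be attached to the report sent back to the agency.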
2.3.1 Ingest Function
Ingest begins the process of preparing the contents of the SIP for storage and management inside TERM. The essential tasks of Ingest are: (1) generating the Archival Information Package (AIP) from the SIP for the Archival Storage function and (2) generating transformed, manipulatable records for the Data Management function of TERM.
The Ingest process transforms the SIP into an AIP. During this transformation, the original bit stream is stored with identifying information in Archival Storage, and the record is transformed into searchable form for TERM's repository management database (Figure 5). The AIP must be able to protect and reproduce the original bit stream by retaining authentication, access, and provenance information. One tested means of managing metadata and AIPs is to transform the email records into records marked up in Extensible Markup Language (XML). XML accommodates not only descriptive, access, and preservation metadata but also the original content of the email and its attachments. Further, the emerging use of XML in cross-platform e-commerce applications is leading major database systems to provide for manipulating XML-encoded records.
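As an illustration of the SIP-to-AIP step, the Python sketch below wraps a single message in XML: a fixity digest, descriptive fields copied from the header, one entry per MIME part (body text and any attachments), and the original bit stream encoded in base64 so that it can be reproduced byte for byte. The element names are invented for this example and do not follow any published schema; a production AIP would more likely conform to an agreed metadata standard.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET
from email import message_from_bytes


def to_aip_xml(raw: bytes, agency: str) -> str:
    """Wrap one email's original bit stream and extracted metadata in XML."""
    msg = message_from_bytes(raw)
    aip = ET.Element("aip", {"agency": agency})

    # Fixity: digest of the untouched original, for later authentication
    fixity = ET.SubElement(aip, "fixity", {"algorithm": "SHA-256"})
    fixity.text = hashlib.sha256(raw).hexdigest()

    # Descriptive metadata copied from the RFC 2822 header
    header = ET.SubElement(aip, "header")
    for name in ("From", "To", "Date", "Subject", "Message-ID"):
        if msg[name] is not None:
            ET.SubElement(header, "field", {"name": name}).text = str(msg[name])

    # Structure: one entry per leaf MIME part (body text, attachments)
    parts = ET.SubElement(aip, "parts")
    for part in msg.walk():
        if not part.is_multipart():
            ET.SubElement(parts, "part", {
                "type": part.get_content_type(),
                "filename": part.get_filename() or "",
            })

    # The original bit stream, base64-encoded so it is carried unaltered
    original = ET.SubElement(aip, "original", {"encoding": "base64"})
    original.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(aip, encoding="unicode")
```

Because the original bytes and their digest travel inside the same package, a later reader can decode the `original` element and confirm against `fixity` that nothing has changed since Ingest.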
2.3.2 Archival Storage Function
Once created, the AIP is maintained in the archival storage environment of TERM using well-understood data-center administrative procedures (backups, media refreshment, offsite storage/replication) to ensure the protection of the AIPs and the trustworthiness of the repository. The transformed XML-encoded record is passed to the Data Management function.
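One routine in such a storage environment is a fixity audit across replicas. The sketch below assumes that each AIP's SHA-256 digest was recorded at Ingest and that copies are held at named sites (the site names are hypothetical); it reports which copies still match and restores damaged ones from an intact copy.

```python
import hashlib
from typing import Dict


def audit_replicas(recorded_sha256: str, replicas: Dict[str, bytes]) -> Dict[str, bool]:
    """Report, per storage site, whether its AIP copy still matches recorded fixity."""
    return {site: hashlib.sha256(data).hexdigest() == recorded_sha256
            for site, data in replicas.items()}


def repair_replicas(recorded_sha256: str, replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace any damaged copy with an intact one; fail if none survives."""
    status = audit_replicas(recorded_sha256, replicas)
    intact = next((replicas[site] for site, ok in status.items() if ok), None)
    if intact is None:
        raise RuntimeError("no intact copy of this AIP remains")
    return {site: data if status[site] else intact for site, data in replicas.items()}
```

Comparing every copy against the digest recorded at Ingest, rather than against one another, avoids having to decide which of two disagreeing copies is authentic.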
2.3.3 Data Management Function
Deployment of TERM's Data Management entity depends significantly on vendor and platform choices. Because TERM will provide long-term preservation of digital objects from multiple agencies, non-proprietary software applications (where available) and open standards for database development are necessary for its successful operation into the future.
The Data Management function can be thought of as managing a data warehouse. Its initial task is to receive the information transmitted from Ingest. The transformed, manipulatable record will be stored in searchable and retrievable form and will probably also be indexed. Ideally, a use copy of each email would also be generated and stored, ready for immediate access, but generating use copies on the fly is likely to be more practical.
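The searchable store and on-the-fly use copy might be sketched with Python's built-in `sqlite3` module. SQLite stands in here for whatever open-standard database TERM would actually use, and the table layout and identifiers are hypothetical. The use copy is assembled from indexed fields at request time rather than stored.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per email, populated from the XML record
    conn.execute("""CREATE TABLE IF NOT EXISTS email (
        aip_id TEXT PRIMARY KEY, agency TEXT, sender TEXT,
        sent TEXT, subject TEXT, body TEXT)""")
    conn.execute("CREATE INDEX IF NOT EXISTS email_sender ON email(sender)")


def find_by_sender(conn: sqlite3.Connection, sender: str) -> list:
    """Search the index; returns (aip_id, subject) pairs in date order."""
    return conn.execute(
        "SELECT aip_id, subject FROM email WHERE sender = ? ORDER BY sent",
        (sender,)).fetchall()


def use_copy(conn: sqlite3.Connection, aip_id: str) -> str:
    """Generate a plain-text use copy on the fly rather than storing one."""
    row = conn.execute(
        "SELECT sender, sent, subject, body FROM email WHERE aip_id = ?",
        (aip_id,)).fetchone()
    if row is None:
        raise KeyError(aip_id)
    sender, sent, subject, body = row
    return f"From: {sender}\nDate: {sent}\nSubject: {subject}\n\n{body}"


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO email VALUES (?, ?, ?, ?, ?, ?)", [
    ("aip-2", "Example Agency", "jclerk@agency.example", "2002-07-16", "Follow-up", "Second."),
    ("aip-1", "Example Agency", "jclerk@agency.example", "2002-07-15", "Quarterly report", "First."),
])
print(find_by_sender(conn, "jclerk@agency.example"))
# [('aip-1', 'Quarterly report'), ('aip-2', 'Follow-up')]
```

Generating the use copy on request trades a little computation at access time for not having to store and refresh a second copy of every message.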
2.3.4 Access Function
Because agency concerns and statutory requirements oblige agencies to retain long-term intellectual control over their email records, and because public requests must in the short term be made through the agencies, it is of paramount importance that access issues be resolved at the outset. Information about access permissions and privacy requirements must be part of every object contained in TERM. Shifts of ownership need to be scheduled (for example, an automated shift of custody from the agency to the Texas State Library and Archives Commission at a specific point in a permanent record's lifecycle; Figure 6). If Ingest is the agencies' first contact with TERM, Access is their ongoing contact: it provides the interface through which agency customers engage the system and maintain contact with copies of their data submissions.
Access receives requests for information, validates access permissions, and uses the requests to generate queries that are passed on to Data Management. Once Access receives the query result set, it generates a Dissemination Information Package (DIP). DIPs can answer one-time, even interactive, requests or can be generated routinely for periodic delivery to TERM customers. If a certified copy of the original bit stream is required, a request is sent to Archival Storage, and the appropriate objects are copied and returned to Access for dissemination to the customer in whatever form the customer requests.
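The scheduled shift of custody could be computed from each record's creation date. The sketch below assumes, hypothetically, that a permanent record passes from the agency to the Texas State Library and Archives Commission a fixed number of years after creation; the rule and the parameter names are illustrations, not statutory requirements.

```python
from datetime import date

TSLAC = "Texas State Library and Archives Commission"


def transfer_date(created: date, agency_custody_years: int) -> date:
    """Date on which custody of a permanent record passes to the state archives."""
    try:
        return created.replace(year=created.year + agency_custody_years)
    except ValueError:  # 29 February falling in a non-leap year
        return created.replace(year=created.year + agency_custody_years, day=28)


def custodian(created: date, agency_custody_years: int, permanent: bool,
              on: date, agency: str) -> str:
    """Who holds custody of a record on a given day under this hypothetical rule.

    Non-permanent records stay with the agency until destroyed under its schedule.
    """
    if permanent and on >= transfer_date(created, agency_custody_years):
        return TSLAC
    return agency
```

Evaluating custody as a function of date, rather than storing it as a flag that someone must remember to update, lets the shift happen automatically as the text proposes.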
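The Access sequence of permission check, query, and DIP assembly can be sketched in a few lines. The permission rule used here (the owning agency sees all of its records; other requesters see only unrestricted ones) is a hypothetical stand-in for whatever rules the SIP agreements establish, and the DIP is reduced to a list of AIP identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexedRecord:
    aip_id: str
    agency: str
    subject: str
    restricted: bool  # set from the permissions/privacy metadata in the AIP


def permitted(record: IndexedRecord, requester_agency: str) -> bool:
    """Hypothetical rule: owners see everything; others see unrestricted records."""
    return record.agency == requester_agency or not record.restricted


def build_dip(records: List[IndexedRecord], requester_agency: str, keyword: str) -> Dict:
    """Validate permissions, run the query, and package the result set."""
    hits = [r for r in records
            if keyword.lower() in r.subject.lower() and permitted(r, requester_agency)]
    return {"requester": requester_agency,
            "query": keyword,
            "items": [r.aip_id for r in hits]}
```

Because the permission test reads metadata carried with each record, the same query can return different DIPs to different requesters without any change to the stored objects.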
Additionally, the Access function of TERM provides reports relevant to the functioning of TERM, on demand or by scheduled event, to the Administration, Data Management, and Archival Storage functions. Examples of reports include compilations of usage patterns, request types, or accounts receivable. These reports, over time, can inform and help refine the records management and classification processes used by the state of Texas.
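Reports of this kind reduce largely to counting entries in the Access request log. The sketch below assumes a hypothetical log format in which each entry records a request type and the requesting agency.

```python
from collections import Counter
from typing import Dict, List


def usage_report(request_log: List[Dict[str, str]]) -> Dict[str, object]:
    """Summarize Access activity by request type and by requesting agency."""
    return {
        "total": len(request_log),
        "by_type": dict(Counter(e["type"] for e in request_log)),
        "by_agency": dict(Counter(e["agency"] for e in request_log)),
    }
```

Run periodically, such summaries would show which records series are actually consulted, which is the kind of evidence that could inform later revisions to classification and retention.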
2.3.5 Preservation Function
Because physical digital storage media are relatively fragile and proprietary digital formats are transitory, active preservation is a vital part of the OAIS Reference Model and consequently of TERM. All common forms of digital media, both magnetic and optical, decay over time, and this deterioration in turn affects the integrity of the digital information objects stored on them. Although some media may have a lifespan adequate for brief retention periods, even short-term records may sometimes have to be moved to avoid technological obsolescence, and long-term preservation will require perpetual media refreshment and repackaging. Preservation of the records, in the sense of carrying them forward across software and hardware changes, will be carried out by the appropriate method or methods along the continuum from emulation to migration.
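Deciding when to refresh media or migrate formats can be partly automated from storage records. In the sketch below, the refresh intervals and the at-risk format are placeholders rather than recommendations, and the function only flags candidates; choosing between migration and emulation for a flagged format remains a preservation-planning decision.

```python
from datetime import date
from typing import Dict, List, Tuple

# Placeholder planning values, not recommendations: real refresh intervals would
# come from media specifications and the repository's own error monitoring.
REFRESH_AFTER_YEARS = {"magnetic tape": 10, "optical disc": 5}
# Hypothetical watch list of formats judged at risk of obsolescence.
AT_RISK_FORMATS = {"application/x-legacy-mail"}


def preservation_actions(holdings: List[Dict], today: date) -> List[Tuple[str, str]]:
    """List refresh and migration actions due for each stored AIP.

    Raises KeyError for a media type missing from REFRESH_AFTER_YEARS,
    so that unplanned media are noticed rather than silently skipped.
    """
    actions = []
    for h in holdings:
        age_years = (today - h["written"]).days / 365.25
        if age_years >= REFRESH_AFTER_YEARS[h["media"]]:
            actions.append((h["id"], "refresh media"))
        if h["format"] in AT_RISK_FORMATS:
            actions.append((h["id"], "migrate format"))
    return actions
```

Keeping the two checks separate reflects the distinction in the text between decay of the physical media and obsolescence of the formats recorded on them.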
2.3.6 Administration Function
Administration oversees and administers all of the functions of TERM. It has contact with every portion of the model and all those who use the repository. The Administration function of TERM is accountable for archival storage of digital objects. If all or part of the administrative role in TERM is provided by a qualified contractor, contract specifications will require full documentation of the contractor's repository management activities and regular review by external auditors. The Repository Management Team is ultimately responsible for monitoring TERM's Administration function.
Typical responsibilities of TERM Administration include:
Hundreds of archival institutions exist in North America, yet very few, public or private, are currently preserving electronic records for the long term. The greatest threat to the process may well be, as Stephens and Wallace point out, the lack of organizational commitment to allocate the resources required to assure the preservation of electronic records. At present, Texas seems poised to make this commitment, and the environment is certainly favorable for it.
Implementing TERM offers many potential advantages. Work to construct TERM could help state agencies develop uniform practices and lead to greater uniformity in messaging systems. Additionally, building a highly regulated long-term repository for email records could produce a valuable knowledge base for recognizing new patterns of records creation and use in support of emerging e-government functions. Over time, use of the search and retrieval capabilities available in a TERM-like repository could lead to revised retention requirements, new requirements for metadata types, insight into the value of recordkeeping in open environments, and the development of automated classification systems. These developments would in turn inform decision-makers and lead to new strategies for organizational structures within state government that cost less to operate and provide more value. Most importantly, the information and services accessible to the public would increase and improve.
4 Notes and References
John Garrett and Donald Waters, Chairs, Task Force on Archiving of Digital Information, Preserving Digital Information: Final Report and Recommendations (Mountain View, CA: Commission on Preservation and Access and Research Libraries Group, 1996): online, Internet, <http://www.rlg.org/ArchTF/>, 20 July 2002.
Hilary Berthon and Colin Webb, The Moving Frontier: Archiving, Preservation, and Tomorrow's Digital Heritage (Melbourne, Victoria: VALA Biennial Conference and Exhibition, 16-18 February 2000): online, Internet, <http://www.nla.gov.au/nla/staffpaper/hberthon2.html>, 20 July 2002.
Charles Dollar, Ensuring Access Over Time to Authentic Electronic Records: Strategy, Alternatives, and Best Practices (Sacramento, CA: National Association of Government Archives and Records Administrators, 16-19 July 1997).
Luciana Duranti, "Reliability and Authenticity: The Concepts and Their Implications," Archivaria 39 (1995).
Reagan Moore, Chaitan Baru, Arcot Rajasekar, Bertram Ludaescher, Richard Marciano, Michael Wan, Wayne Schroeder, and Amarnath Gupta, "Collection-Based Persistent Digital Archives - Part I," D-Lib Magazine 6 (March 2000): online, Internet, <http://www.dlib.org/dlib/march00/moore/03moore-pt1.html>, 17 July 2002.
Titia van der Werf-Davelaar, "Long-term Preservation of Electronic Publications: The NEDLIB Project," D-Lib Magazine 5 (Sept. 1999): online, Internet, <http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html>, 17 July 2002.
William E. Underwood, Analysis of Presidential Electronic Records: Final Report (Atlanta, GA: Computer Science and Information Technology Division, Georgia Tech Research Institute, Sept. 1999): online, Internet, <http://perpos.gtri.gatech.edu/perpos/Final_Report.pdf>, 20 July 2002.
Consultative Committee for Space Data Systems (CCSDS), CCSDS 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). Blue Book. Issue I (Washington DC: CCSDS, Jan. 2002). Adopted by ISO 14721 (2002): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>, 20 July 2002.
Kenneth Thibodeau, "Building the Archives of the Future: Advances in Preserving Electronic Records at the National Archives and Records Administration," D-Lib Magazine 7 (Feb. 2001): online, Internet, <http://www.dlib.org/dlib/february01/thibodeau/02thibodeau.html>, 20 July 2002.
Consultative Committee for Space Data Systems (CCSDS), CCSDS 650.0-B-1 Reference Model for an Open Archival Information System (OAIS). Blue Book. Issue I (Washington DC: CCSDS, Jan. 2002): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html>, 20 July 2002.
CEDARS Publications (Leeds, England: University of Leeds, 2002): online, Internet, <http://www.leeds.ac.uk/cedars/pubconf/pubconf.html>, 20 July 2002.
InterPARES (Vancouver: University of British Columbia, 2002): online, Internet, <http://www.interpares.org/researchplan.htm#domains>, 20 July 2002.
State of Texas, Government Code Chapter 441. Libraries and Archives. Subchapter L. Preservation and Management of State Records and Other Historical Resources 441.180(11) (Austin, TX, 2002): online, Internet, <http://www.capitol.state.tx.us>, 20 July 2002.
State of Texas, Texas State Library and Archives Commission, Email Policy Model for State Agencies (Austin, TX, 2002): online, Internet, <http://www.tsl.state.tx.us/slrm/recordspubs/email_model.html>, 20 July 2002.
Heather MacNeil, Trusting Records: Legal, Historical and Diplomatic Perspectives (Dordrecht, The Netherlands: Kluwer Academic, 2000).
Consultative Committee for Space Data Systems (CCSDS), CCSDS 651.0-W-1, Producer-Archive Interface Methodology Abstract Standard (OAIS). White Book. Issue I. Draft Recommendation for Space Data System Standards (Washington DC: CCSDS, Dec. 2001): online, Internet, <http://ssdoo.gsfc.nasa.gov/nost/isoas/CCSDS-651.0-W-1.pdf>, 20 July 2002.
Kenneth Thibodeau, The Unsteady State of the Art of Preserving Electronic Records (Florence, Italy: VI European Conference on Archives, May 2001): online, Internet, <http://www.nara.gov/era/europe/europe_index.html>, 15 Feb. 2002.
Anne Kenney and Oya Y. Rieger, Moving Theory into Practice: Digital Imaging for Libraries and Archives (Mountain View, CA: Research Libraries Group, 2000).
The State of Digital Preservation: An International Perspective, Conference Proceedings, April 24-25, 2002 (Washington, DC: CLIR, 2002): online, Internet, <http://clir.org/pubs/reports/pub107/pub107.pdf>, 20 July 2002.
David O. Stephens and Roderick C. Wallace, Electronic Records Retention: An Introduction (Prairie Village, KS: ARMA International, 1997).
Copyright © Marlan Green, Sue Soy, Stan Gunn and Patricia Galloway