Safeguarding Digital Library Contents and Users

Document Access Control

Henry M. Gladney
IBM Almaden Research Center
San Jose, California 95120-6099
[email protected]

D-Lib Magazine, June 1997

ISSN 1082-9873

Abstract. Digital library services must protect document owners, users, and themselves against misuses of their contents. IBM Research is working on a suite of technical tools which address various perceived risks to library quality. Which tools are useful depends on the circumstances in which a document collection is held. If many independent users need library update privileges, an access control tool is essential.

This report sketches an access control method that mimics organizational practice by combining a subject tree with ad hoc role granting, that controls privileges for many operations independently, that treats privileged roles such as auditor and security officer like every other individual authorization, and that makes access control information part of ordinary objects. This Document Access Control Method (DACM) scales efficiently from very small to very large libraries, is functionally flexible, and can be built into a library or be implemented as an external reference monitor for any collection of information objects.

A realization exists, performs well, and minimizes human administration needed. A single library can implement different policies for different document classes, such as mandatory access controls (MAC) for defense documents and discretionary access controls (DAC) for other documents.

Introduction

In a prior article [Gladney97], we have sketched the risks faced by the owners and users of digitally stored documents and have shown how various technical measures help protect against accidental and deliberate breaches of the economic, quality, and confidentiality interests of the parties to Digital Library (DL) transactions. Which of several measures (encryption, marking, access control, etc.) are helpful depends on the kind of data to be safeguarded and their environment and circumstances of use.

Some information collections, such as those produced by cinema and broadcast media studios, embody the core business of their owners and are released outside their owners' enterprises only partially and under contractual constraints. Within the enterprise protection boundaries, the largest risks to such materials come from unauthorized or unintended actions by enterprise employees. Access control services built into the Digital Library (DL) can significantly reduce such risks.

Since the early 1970s, when we created one of the first access control subsystems [Gladney75], which became RACF® [IBM85] and is still used on most IBM mainframes and also on OS/2® LAN products, there has been little access control innovation practical enough to be introduced into products. (Access control in Unix® file systems is similar to RACF, as is Tivoli's recent distributed access control.) Other than our work [Gladney92] sketched below, nothing suitable for DL has emerged, not even within object-oriented databases [Rabbitti91]. Nor have well-known office delegation requirements [Moffett88] been effectively addressed. We report a scheme which satisfies these needs with remarkable efficiency. It has been implemented in an enterprise document management product [IBM94] which we are adapting for use in digital libraries.

What follows is our preferred design for the access control mechanism called for in our prior article [Gladney97]. This report summarizes a more complete article which has just been published [Gladney92]. We recommend that article to the reader interested in formal definitions, fine points, and careful analysis of the properties of our solution.

What Access Control Services Are Needed?

In principle, object access control is simply conformance to a rules array which records the privileges allowed to each subject for each object. Such arrays are, however, impractical because of the human effort needed to manage them. We can replace them with much smaller data structures which work well both for small systems (10 users and a few thousand objects) and also for large systems (thousands of users and millions of objects).

In addition to well-known needs [Gladney75], a comprehensive scheme must provide:

Decentralized administration of resource pools, because no single individual can know what protection everything needs;
Certifiable behavior at resource pool boundaries, so that service offerers can confidently commit to protect other people's data;
Smooth synthesis of mandatory and discretionary access control [Chokani92], as might be needed by a company with both military and commercial work;
Support for generally accepted accounting principles, which require that each person be limited to resources needed to discharge his responsibilities, that sensitive resources be modifiable only in partial steps by independent users, and that auditors can easily review user actions;
Ability to define what it means to be a specially privileged user, such as an auditor; an enterprise might require its own definition of the operations associated with each such role and constrain each instance to a limited data scope;
Proxy support, in which a human user acting for another human temporarily gets partial privileges of the principal;
As little clerical burden as possible; and
Upwards compatibility from prior methods such as RACF [IBM85], industry conventions such as OSF-DCE [OSF91] security, and formal standards such as POSIX [ISO91] security.

This paper is about access control; security is a bigger topic. We assume that improved communication security, document and user identity authentication, operating system basics, and the like are provided by well-known techniques. Our scope, jargon, and design conform to the applicable ISO security framework.

Access Control

Security is conformance to proper authorizations for data movement out of a store into other stores -- the rest of the world in Figure 1 -- and for changes made in this store responsive to external commands; the distinguished store is sometimes called the protected resource(s). Procedures which define the sole external interface to what is stored -- collectively called the resource manager -- are part of the protected resource and are responsive to commands -- bit strings passed from outside. Since distinct programs can effect the same state change, we allude to each equivalent procedure set as an operation.

A library is a protected resource containing data sets called objects or documents and a catalog which locates and describes each object in one or more records. Thus, a library is a specialized protected resource.

Figure 1. A protected resource and the rest of the world: The black portions describe any library or object store; the green portions are specific to access control.

Access control is a custodial contract governing the relationship of the store with the rest of the world. Specifically, it has to do with the execution of operations which deliver information, or which change the state in a way that potentially affects future information delivery. Access control is said to be in effect if the store state permits any past action or future permission to be traced to proper authority, and if such permissions faithfully reflect an articulated policy. The store is said to have integrity if its state conforms to articulated consistency rules. A storage subsystem which has integrity and access control is said to be secure.

A reference monitor is a subsystem which records who may do what, i.e., what is authorized, and provides conforming yes/no decisions responsive to queries by active system components. The ISO access control framework [ISO92] delineates how to partition the functionality. The reference monitor for a particular store can be embedded within the store itself or be part of another protected resource.

Access Control Information Structure

In what follows, the access control matrix size is handled by aggregating subjects and, separately, objects into classes; constraint to useful policies is provided by limiting interrelationships of subjects; and easily understood delegation rules are provided by emulating common business practices. A tabulation of access control rules would look something like:

OBJECT	USER	PRIVILEGES

id1	charlie	0001011111001
id1	robert	0010011111001
id2	charlie	1111111111001
id9	james	0001010111001
. . .	. . .	. . .

It relates objects and subjects to boolean vectors representing privileges. Such tables can be managed by relational database management systems. Every possible relationship between objects and subjects can be represented by this form of table. However, this form of table is not only too large to be practical, but it also permits relationship values that human users neither expect nor want. The rest of this paper describes a different form of the information that is not only practical, but also models the behaviors that enterprises have been found to want.
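The table above can be pictured as a sparse mapping from (object, user) pairs to privilege vectors. The following is a minimal sketch, not the article's implementation; the names `MATRIX`, `has_privilege`, and `matrix_cells` are hypothetical, and bit positions are counted from the left purely for illustration.

```python
# Hypothetical sketch: the full access matrix as a sparse mapping.
# Keys are (object id, user) pairs; values are the privilege bit strings
# copied from the table above.
MATRIX = {
    ("id1", "charlie"): "0001011111001",
    ("id1", "robert"):  "0010011111001",
    ("id2", "charlie"): "1111111111001",
    ("id9", "james"):   "0001010111001",
}


def has_privilege(obj: str, user: str, bit: int) -> bool:
    """Return True if `user` holds privilege number `bit` (0 = leftmost) on `obj`."""
    vector = MATRIX.get((obj, user))
    return vector is not None and vector[bit] == "1"


def matrix_cells(n_objects: int, n_users: int) -> int:
    """Number of (object, user) cells a fully populated matrix would need."""
    return n_objects * n_users
```

For a library with a million objects and a thousand users, `matrix_cells(1_000_000, 1000)` gives a billion cells, which illustrates why a row-per-pair table is impractical to administer by hand.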

This is done by aggregating rows in particular ways; to explain our choices in natural language, we must define some terms, starting with permission. A privilege can be access to an operation or omission of some normally required validity check or audit trail addition. Even though the words "privilege" and "permission" are synonyms in normal English usage, to help the reader distinguish what is tabulated from what is calculated we use privilege for an express access grant which is relatively persistent, i.e., a datum which is tabulated as part of the access control information, and permission for a value calculated by combining privileges, other resource state elements, and transitory circumstances. We collect objects and subjects into classes; in fact, this is essential for POSIX compliance [ISO91]. Aggregating subjects and objects into equivalence sets allows us to re-express the former table as one with far fewer rows:

OBJCLASS	USERGROUP	USERKIND	PRIVILEGES
class1id	charlie	user	0001011111001
class1id	robert	group	0010011111001
class1id		owner	0010011111001
class1id		public	0000001110000
class2id	charlie	group	1111111111001
class9id	james	department	0001010111001
. . .	. . .	. . .	. . .

Here each first column entry identifies an access equivalence set -- a set of objects with the same access control information. Each set of rows with a common first column value is the access control list for an equivalence set. The second column identifies a user, a group containing a user, an object owner, and so on; the third column indicates which interpretation is intended. A permission function interprets the tabulated information, adding logic which implements delegation as described below.
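A rough sketch of how a permission function might read such an access control list follows. It is illustrative only: the article does not say how several applicable rows are combined, so the bitwise OR used here is an assumption, and the delegation logic is omitted. The names `AclRow`, `ACLS`, and `permission` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRow:
    name: str        # user or group name; empty for owner and public rows
    kind: str        # "user", "group", "owner", or "public"
    privileges: int  # privilege bit vector held as an integer


# Rows for class1id, copied from the table above.
ACLS = {
    "class1id": [
        AclRow("charlie", "user",   0b0001011111001),
        AclRow("robert",  "group",  0b0010011111001),
        AclRow("",        "owner",  0b0010011111001),
        AclRow("",        "public", 0b0000001110000),
    ],
}


def permission(obj_class: str, user: str, groups: set, is_owner: bool) -> int:
    """OR together the privileges of every row that applies (an assumed rule)."""
    result = 0
    for row in ACLS.get(obj_class, []):
        applies = (
            (row.kind == "user" and row.name == user)
            or (row.kind == "group" and row.name in groups)
            or (row.kind == "owner" and is_owner)
            or row.kind == "public"
        )
        if applies:
            result |= row.privileges
    return result
```

Under these assumptions a subject with no matching row still receives the "public" vector, and an unknown object class yields an empty vector, i.e., no permissions.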

We treat each access control list as an attribute of the object identified in the first column, and call this an access control object. Any access control object can contain other document parts and be manipulated as other objects; i.e., access control objects are ordinary library objects. The combination of a permission function and the tabulation just described is a reference monitor in the sense called for by [ISO92].

Subjects, Roles, and Proxies

A subject is a potential resource user. (Terms describing human subjects, such as user and custodian, are mapped by data blocks bound to processes. Relationships, such as manager of, are mapped by cross-references among these blocks.) Formally, a subject is a privilege set for operations and data resources. Privileges are granted by other subjects; each subject has a parent and may have any number of children, collectively called the group of that subject. The subject graph is a delegation or grant hierarchy; i.e., delegation is realized as a directed acyclic graph. Each subject may grant only privileges which (s)he has (see below). Groups are not distinguished from individual subjects, but a subject can still give privilege by virtue of being a group member.

The subject tree root, called the custodian, is a surrogate for the person offering storage services and committing to users the proper treatment of data held. The custodian has unconditional access to everything in the store, i.e., no access control checks are performed for a user logged on as the custodian, who is therefore analogous to a UNIX root user.

Delegations are task-oriented relationships which recur within a community; examples are "is secretary to" and "is auditor for". A delegation is a set of privileges required to accomplish related tasks, and is represented by a named bit vector. For instance, an administrator allows subjects to connect to the library, defines their privileges, and sets some basic access control information; the custodian grants the privileges needed to meet these responsibilities to each administrator, who is distinguished from other subjects only by having certain uncommon privileges.

A subject can pass on privileges for well-defined tasks on circumscribed resources within the resources controlled by the granting subject; for instance, the manager-secretary relationship is homogeneous, but each manager can grant resources only from his own pool. Thus, a proxy is a subject's authorization that another subject may use a specific subset of the grantor's privileges.

A role is the exercise of a proxy, limited to the duration of a session; a user accesses a role by asking for it during logon. The act of connecting to a store (logging in) establishes an association between a user and a subject. Optionally, logging in also binds a role to the user. For the library session, the user gets the union of the privileges of the bound subject and the bound role.
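The proxy and role rules just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's data structures: `Subject`, `grant_proxy`, and `login` are hypothetical names, and privileges are modeled as small integer bit vectors.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subject:
    name: str
    privileges: int                               # operation privilege bit vector
    parent: Optional["Subject"] = None
    proxies: dict = field(default_factory=dict)   # role name -> (grantor name, mask)


def grant_proxy(grantor: Subject, grantee: Subject, role: str, mask: int) -> None:
    """Record a proxy; a grantor may pass on only privileges it holds itself."""
    if mask & ~grantor.privileges:
        raise PermissionError("proxy exceeds the grantor's privileges")
    grantee.proxies[role] = (grantor.name, mask)


def login(subject: Subject, role: Optional[str] = None) -> int:
    """Session privileges: the union of the subject's own and the bound role's."""
    session = subject.privileges
    if role is not None:
        _, mask = subject.proxies[role]
        session |= mask
    return session
```

For example, a manager holding `0b1110` can grant a secretary the subset `0b0110`; the secretary's session then carries `0b0111` when the role is requested at logon and only `0b0001` otherwise.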

Subjects' Operation Privileges and Delegations via Roles

Subjects and subject interrelationships are needed mainly for resource management. Part of each subject descriptor is a specification of which library operations it may use. Each subject s is added to the access control information by a SubjectDefine operator. The subject doing this must itself be authorized to use SubjectDefine; this is what we mean by "library administrator".

The privileges granted to s are not necessarily related to those of its parent in the subject tree, but are limited to those of the library administrator. A user can create a new subject only downwards in the subject tree. This rule applies not only to his own subordinate tree, but also to the subordinate tree of his current proxy grantor. Library administrators can employ this mechanism to enable upwards administration of subjects without granting other privileges high in the tree. For instance, the owner of the SALES node in Figure 2 could authorize his subordinate BILL to manage the SALES subject tree. As another example, a security auditor could receive a proxy to inspect but not change objects in an entire library even though this individual has update privileges only on a more limited set of objects.

Figure 2. Subject graph and delegation: for instance, the ordinary grant might be within the SALES department, and the proxy grant might be to an internal auditor, JOHN, who is a member of an audit group temporarily assigned by the sales manager to audit the machines group.
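The rules for defining subjects can be sketched as a single check. The sketch assumes a hypothetical bit position for the SubjectDefine privilege, treats the administrator's vector as the privileges in force for the session, and uses a simplified `Subject` class; none of these details are taken from the implementation.

```python
from typing import Optional

SUBJECT_DEFINE = 0b1000  # hypothetical bit for the SubjectDefine operation


class Subject:
    def __init__(self, name: str, privileges: int, parent: Optional["Subject"] = None):
        self.name = name
        self.privileges = privileges
        self.parent = parent


def is_at_or_below(node: Subject, ancestor: Subject) -> bool:
    """Walk from `node` towards the root, looking for `ancestor`."""
    current: Optional[Subject] = node
    while current is not None:
        if current is ancestor:
            return True
        current = current.parent
    return False


def subject_define(admin: Subject, name: str, privileges: int, parent: Subject,
                   proxy_grantor: Optional[Subject] = None) -> Subject:
    """Create a subject, enforcing the three rules described in the text."""
    if not admin.privileges & SUBJECT_DEFINE:
        raise PermissionError("caller is not a library administrator")
    if privileges & ~admin.privileges:
        raise PermissionError("new subject would exceed the administrator's privileges")
    # Only downwards: below the caller, or below the caller's current proxy grantor.
    roots = [admin] if proxy_grantor is None else [admin, proxy_grantor]
    if not any(is_at_or_below(parent, root) for root in roots):
        raise PermissionError("subjects can be created only downwards in the tree")
    return Subject(name, privileges, parent)
```

In this sketch BILL cannot add a subject directly under SALES on his own authority, but can do so once the SALES owner is supplied as his current proxy grantor, mirroring the upwards administration example in the text.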

The access paths for information used in permission decisions are collected in Figure 3, which suggests a relationship of DACM to object-oriented computing models.

Figure 3. Entity-relationship diagram describing access control data structures: an access control object is a special case of an ordinary object. Taken from Moffett [Moffett88].

Embedding Access Control in Subsystems

A popular line of thought, exemplified by OSF-DCE [Kumar91], emphasizes a network of mutually supportive resource managers, each providing a specialized service to multiple concurrent clients. Figure 1 is redrawn in Figure 4 to emphasize how such resource managers embed themselves in networks and interact. Each resource manager encapsulates a protected resource, providing a certified approximation to data quality management encompassed in the ACID (Atomicity, Consistency, Isolation, Durability) properties [Gray93].

Figure 4. Client/server structure for a protected resource: one way of providing the isolation demanded by Figure 1.

A DACM-protected library has been built [IBM94] with this structure, using relational database managers and file resource managers as the implementation vehicle. It has been used for 4 years by enterprise document management applications which have many similarities to DL applications. The client-server separation provides the world/library isolation needed (Figure 1). The programs and data that express and enforce library policies are in storage pools inaccessible to users except by way of library interface software, and thus themselves easily protected.

Properties of a Preferred Implementation

DACM is extremely flexible. A standard implementation creates a decision logic more by the contents of tables than by fixed programs. Standard DACM can be extended by modules that realize different policy classes, providing different policies for portions of a single library.

Flexibility, Extensibility, and Custom Reference Monitors

DACM treats subject relationships, subject-to-object relationships, and operation relationships differently. We know enough about generic delegation rules [Moffett88] to represent subjects and proxies in fixed data structures and programs. Subject-to-object relationships are left to users, who can tabulate but not otherwise manipulate them within DACM.

DACM avoids built-in operation relationships. Such relationships built explicitly or implicitly into other access control systems have invariably been unwelcome in some applications; for instance, the frequent assumption that write privilege should imply read privilege is unsatisfactory for an audit trail. Since DACM implementations represent privilege sets by bit vectors, an enterprise which wants to enforce certain relationships can limit which vector patterns are used.
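As an illustration of leaving operation relationships to the enterprise, the sketch below models privileges as a bit vector and checks a vector against optional, enterprise-chosen implication rules. The operation names `READ`, `WRITE`, and `DELETE` and the function `allowed_pattern` are hypothetical and are not part of DACM.

```python
from enum import IntFlag


class Priv(IntFlag):
    READ = 1
    WRITE = 2
    DELETE = 4


def allowed_pattern(vector: Priv, rules: list) -> bool:
    """Check a vector against (if_set, then_required) pairs chosen by an enterprise.

    With an empty rule list every bit pattern is acceptable, which corresponds
    to having no built-in operation relationships.
    """
    return all(
        not (vector & if_set) or (vector & required) == required
        for if_set, required in rules
    )
```

With no rules, a write-only vector such as `Priv.WRITE` is acceptable, as an audit trail needs. An enterprise that wants write to imply read would pass `[(Priv.WRITE, Priv.READ)]`, after which only vectors such as `Priv.READ | Priv.WRITE` pass the check.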

Any object can bind both an access control object and a permission function, and can choose its permission function either directly or via its access control object. This allows different rules for disjoint object sets; a single collection can mix access control policies, combining old methods with new ones and MAC with DAC, to meet the needs of different departments in a complex enterprise.
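One way to picture this binding is sketched below. It assumes a deliberately simplified discretionary rule (a per-user set of operations) and a simplified mandatory rule (clearance at least equal to classification); all class and function names are hypothetical, and denying access when no permission function is bound is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessControlObject:
    data: dict
    permission_fn: Optional[Callable] = None


@dataclass
class LibraryObject:
    name: str
    aco: AccessControlObject
    permission_fn: Optional[Callable] = None  # direct binding, if any


def dac(aco: AccessControlObject, subject: dict, operation: str) -> bool:
    """Simplified discretionary rule: look the operation up in a per-user set."""
    return operation in aco.data.get(subject["name"], set())


def mac(aco: AccessControlObject, subject: dict, operation: str) -> bool:
    """Simplified mandatory rule: clearance must reach the classification."""
    return subject["clearance"] >= aco.data["classification"]


def permitted(obj: LibraryObject, subject: dict, operation: str) -> bool:
    """Use the object's own permission function, else the one bound to its ACO."""
    fn = obj.permission_fn or obj.aco.permission_fn
    if fn is None:
        return False  # assumption: deny when no permission function is bound
    return fn(obj.aco, subject, operation)
```

A memo whose access control object binds `dac` and a plan whose access control object binds `mac` can then sit in the same collection, each decided by its own rule.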

DACM roles emulate job descriptions. Roles carry names like "secretary" because their instances are similar for all donor/donee pairs; here, "similar" means "have (approximately) the same privilege vector mask". Some examples of common meanings are:

Library Administrator	Permitted to administer subject definitions and to bypass checks dependent on object identifiers, in order to execute database backup and recovery.
Departmental Administrator	Permitted the same privileges as a library administrator, but only for objects owned by a particular department, e.g., owned by subjects in a subtree of Figure 2.
Auditor	Permitted to inspect data owned by a subject subset, but prevented from changing any data in the inspection domain.
Secretary	Permitted a subset of the privileges of some other single user.

The specific privileges of each role are likely to differ in different enterprises, but be uniform within each enterprise. Sharing privilege vector templates will be common, with graphic screen interfaces to define templates and to manage arcs in the delegation graph.

Economy of Administration

Ordinary computer users mostly become aware of security machinery indirectly, but they must often understand access control because they have to do something to take advantage of it. Managing DACM controls requires a modest effort and can be delegated to the users who best know what protection each resource pool deserves.

Library administration can be as centralized or decentralized as is wanted.
Each user community and each individual user can choose as little or as much control differentiation as is wanted. Access control objects are readily attached to objects, and can reflect subject groupings which parallel typical organizations.
A library service can define any number of distinct privileges, and can leave enforcing implied relationships to higher level software.
Changing an access control list is similar to other document editing tasks.
Administrators can arrange that most users get access controls attached automatically to the objects they create, and can leave other administration to technicians.
Users can readily determine why any particular action was or will be denied or allowed. Since delegation patterns are explicit (Figure 3), audits can be effected with simple utility programs acting on the security log (Figure 1).
In our implementation [IBM94], no new computer operation or database administration tasks are imposed by DACM.

In summary, the DACM model mimics conventional office patterns so closely that external interfaces and jargon can readily be designed for computer novices; users can choose different behavior for different circumstances, and can change specific rules whenever they want without unexpected side effects; and access management is so similar to other operations that it imposes few extra administrative chores.

Economy of Execution

Evaluation of the access decision requires only a short walk from a tree branch towards its root and some existence checks and boolean vector arithmetic. The cost of locating access control information is negligible because the root linkage is in each object's main catalog record. The potential performance degradation is the I/O overhead required to fetch access lists. A DACM implementation can make this overhead small in all practical circumstances and imperceptible in many interesting applications because DACM data structures will usually be small and heavily reused so that caching is feasible and effective. I/O overhead can be easily made small because:

Each user's operation privileges are fetched as part of library session creation and held in main memory for the duration of the session. This privilege vector is checked before accessing objects, saving time in the case of failed checks and costing next to nothing otherwise.
At the option of the library custodian, any subject is permitted any operation he has on objects he owns without further check.
Many objects will be protected by generic rules, e.g., in a public library, every object will have read privileges for most people and update privileges only for librarians. In such a library, fewer than ten access control objects will describe the protection pattern for most of the collection.
The worst case would be an application in which every user created his own access control lists. We believe that most users will create fewer than 10 lists with fewer than 10 subjects or groups in each, e.g., 1000 users' information will occupy less than 10 Mbyte.
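The worst-case estimate in the last item can be checked with simple arithmetic. The helper below is only a back-of-envelope sketch; the function name is hypothetical, and treating a Mbyte as one million bytes is an assumption.

```python
def bytes_per_entry(users: int, lists_per_user: int, entries_per_list: int,
                    budget_bytes: int) -> float:
    """Storage available per access list entry for a given total budget."""
    entries = users * lists_per_user * entries_per_list
    return budget_bytes / entries
```

With 1000 users, 10 lists each, and 10 entries per list there are at most 100,000 entries, so `bytes_per_entry(1000, 10, 10, 10_000_000)` leaves about 100 bytes per entry within the 10 Mbyte bound.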

Caching of evaluated permissions has been implemented in the user description in each library session [IBM94]. Retrieving the access control list for the first item in an access equivalence set contributes 10-15% delay to retrieving item catalog information. Subsequent access to equivalence set members imposes no perceptible overhead.
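The evaluation order and the effect of caching can be sketched as follows. This is not the caching scheme of [IBM94], which caches evaluated permissions in the session's user description; here, as a simpler stand-in, access control lists are cached per equivalence set with Python's `functools.lru_cache`. The store contents and all names are hypothetical.

```python
from functools import lru_cache

# Hypothetical store: one access control list shared by a whole equivalence set.
ACL_STORE = {"public-shelf": {"reader": 0b001, "librarian": 0b111}}
CATALOG = {
    "book-1": {"set": "public-shelf", "owner": "librarian"},
    "book-2": {"set": "public-shelf", "owner": "librarian"},
}


@lru_cache(maxsize=None)
def fetch_acl(set_id: str) -> dict:
    """Stand-in for the I/O that reads one access control list."""
    return ACL_STORE[set_id]


def permitted(session_privs: int, subject: str, obj_id: str, op_bit: int) -> bool:
    # 1. Session-held operation privileges are checked first; a failure costs no I/O.
    if not session_privs & op_bit:
        return False
    record = CATALOG[obj_id]
    # 2. Optional owner shortcut (a custodian-selected policy in the text).
    if record["owner"] == subject:
        return True
    # 3. Otherwise consult the (cached) access list of the equivalence set.
    return bool(fetch_acl(record["set"]).get(subject, 0) & op_bit)
```

In this sketch only the first access to a member of an equivalence set triggers a fetch; later accesses to other members of the same set are served from the cache, which can be observed through `fetch_acl.cache_info()`.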

Conclusions

Our Document Access Control Method (DACM) complements other essential digital library protection tools [Gladney97] and comes closer to meeting known requirements than prior work. DACM implementations can readily comply with all pertinent standards. The scheme for privilege delegation is novel, and the ability to mix different control models within a single protected resource is unique.

DACM scales well, allowing as little or as much control differentiation as is wanted. Librarians with relatively simple requirements can specify them quickly, possibly without consulting with end users. Users can mix fine differentiation for critical resources with uniform controls for most of a collection. Lightly-protected resources suffer no performance or administrative burden because sensitive resources happen to exist.

Massive data collections can be extremely valuable assets which, by the very act of collection, are put at risk from deliberate and accidental misuses. Whatever one estimates the risks to be, it seems prudent to minimize temptation by raising barriers, provided that these do not impede legitimate access and are not a nuisance to administer. We believe that DACM improves safety without being obtrusive.

Acknowledgements

DACM design evolved through conversations with Rene Furegati, Paul Hudecek, Marcel Schlatter, and Eldon Worley. The implementation was created by Tom Burket, John DiClemente, Kevin McBride, and Mike Vitale.

References

Chokani92. S. Chokhani, Trusted Products Evaluation, Comm. ACM 35(7), 64-76, (1992).

Gladney75. H.M. Gladney, E.L. Worley, and J.J. Myers, An Access Control Mechanism for Computer Resources, IBM Systems Journal 14, 212, (1975).

Gladney78. H.M. Gladney, Administrative Control of Computing Service. IBM Systems Journal 17, 151, (1978).

Gladney92. H.M. Gladney, Access Control for Large Collections, ACM Trans. Information Systems 15(2), 154-194, (April 1997).

Gladney97. H.M. Gladney and J.B. Lotspiech, Assuring Convenient Security and Data Quality, D-Lib Magazine, (May 1997).

Gray93. J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers, San Mateo, California, (1993).

IBM85. IBM Corporation, Resource Access Control Facility (RACF) General Information Manual, IBM Sys. Ref. Lib. GC28-0722, (1985).

IBM94. IBM Corporation. ImagePlus VisualInfo® General Information and Planning Guide, IBM Systems Ref. Lib. GK2T-1709, (1994).

ISO91. International Organization for Standardization (ISO), Draft Standard for Information Technology - Portable Operating System Interface (POSIX) - Security Interface, ISO/IEC JTC 1/SC 22/WG 15 N046R1 P1003.6 Draft 12, (September 1991).

ISO92. International Organization for Standardization, Information Retrieval, Transfer and Management for OSI: Access Control Framework, ISO/IEC JTC 1/SC 21/WG 1 N6947 Second CD 10181-3, (May 1992).

Kumar91. R. Kumar, OSF's Distributed Computing Environment, IBM AIXpert #2, 22-29, (Fall 1991).

Moffett88. J.D. Moffett and L.S. Sloman, The Source of Authority for Commercial Access Control, IEEE Computer, 59-69, (Feb. 1988).

OSF91. Anonymous, OSF® DCE Version 1.0: DCE Application Development Reference, Open Software Foundation, Cambridge Center, Cambridge, Mass. 02142 (December 1991).

Rabbitti91. F. Rabitti, E. Bertino, W. Kim, and D. Woelk, A Model of Authorization for Next-Generation Database Systems, ACM Trans. Database Systems 16(1), 88-131, (1991).

Copyright and Disclaimer Notice

© Copyright IBM Corp. 1997. All Rights Reserved. Copies may be printed and distributed, provided that no changes are made to the content, that the entire document including the attribution header and this copyright notice is printed or distributed, and that this is done free of charge. We have written for the usual reasons of scholarly communication. This report does allude to technologies in early phases of definition and development, including IBM property partially implemented in products. However, the information it provides is strictly on an as-is basis, without express or implied warranty of any kind, and without express or implied commitment to implement anything described or alluded to or provide any product or service. IBM reserves the right to change its plans, designs, and defined interfaces at any time. Therefore, use of the information in this report is at the reader's own risk. Intellectual property management is fraught with policy, legal, and economic issues. Nothing in this report should be construed as an adoption by IBM of any policy position or recommendation.

hdl:cnri.dlib/june97-gladney