A Secure Repository Design for Digital Libraries

Carl Lagoze
Digital Library Research Group
Cornell University

D-Lib Magazine, December 1995


Abstract

We describe a distributed object-based design for repositories in a digital library infrastructure. This design for Inter-operable Secure Object Stores, ISOS, defines the interfaces to secure repositories that inter-operate with other repositories, with clients, and with other services in the infrastructure. We define the interfaces to ISOS as class definitions in a distributed object system, such as CORBA or OLE. We also define an extension to CORBA security that is used by repositories to secure access to themselves and their contained objects.

Introduction and Background

The Random House Dictionary of the English Language, 1987, provides the following definition of the word infrastructure:

"the basic underlying framework or features of a system or organization"

Amy Friedlander's books on the history of infrastructure development [1,2], written for the Corporation for National Research Initiatives (CNRI), demonstrate that infrastructure development involves far more than advances in technology. Rather, infrastructures result from the complex interaction of existing legal, political, and economic frameworks with technical developments. This is certainly true for digital libraries, where any infrastructure proposal must accommodate the rich framework for dissemination of information that exists in our society. This framework embodies the following properties:

Researchers from the Digital Library Research Group at Cornell, the Computing and Communications Group at NCSA (University of Illinois), CNRI, and Xerox Corporation collaborated over the past several months to develop a design for repositories of objects in digital form, a fundamental component of digital libraries. The result of this work is a design for Inter-Operable Secure Object Stores (ISOS). We have recently finished the first stage of design work on ISOS. This story is an abridgement of the detailed design available in [3].

ISOS and the Kahn/Wilensky Architecture

Our starting point for this design is the framework articulated by Robert Kahn (CNRI) and Robert Wilensky (UC Berkeley) [4], as a result of the Advanced Research Projects Agency Computer Science Technical Report Project [5]. This work is commonly referred to as the Kahn/Wilensky architecture. Kahn/Wilensky broadly defines the components of an open system for storage, access, dissemination, and management of information in digital form. These components are as follows.

Kahn/Wilensky makes no assumptions about implementation details. This paper describes one possible design approach to the Kahn/Wilensky framework, based on the distributed object model. CORBA [6] and OLE [7] are two implementations of the distributed object model.

We have chosen the distributed object framework for a number of reasons. From a software engineering point of view, the object-oriented model is an excellent abstraction for separating interface (what we are concerned with when defining infrastructure) from implementation. The distributed object model allows us to extend the object-oriented model to a networked environment without tying the design to specific network transport and session layer protocols. Finally, CORBA, a well-developed and open distributed object framework, provides an extensible security architecture [8] on which we can build the ISOS security model.

Within this object-oriented framework, ISOS makes two contributions. First, it provides class definitions (instance variables and methods) for the concepts in Kahn/Wilensky: digital_object, dissemination, repository, data, and terms_and_conditions. These class definitions are the basis for interoperability among individual ISOS repositories and between these repositories and other digital library services. The methods are semantically equivalent to the operations of the Kahn/Wilensky Repository Access Protocol (RAP). Second, ISOS defines a uniform and extensible method for securing access to repositories, to digital objects, and to operations on digital objects. We define an object class, terms_and_conditions, that is an encapsulation of the stated terms and conditions that apply to access to repositories, digital objects, and disseminations. This terms_and_conditions class interacts with the CORBA security framework, allowing a repository to protect the objects that it contains.

Repositories are One Component of a Larger Digital Library Infrastructure

We make no claim that ISOS itself is a digital library, any more than a book warehouse is a library: a warehouse lacks the rich array of services that make a collection accessible to the patrons of a library. But ISOS is, in a sense, the specification for a set of distributed book (or, more generally, content) warehouses, or repositories. The repositories have the necessary security at the doors to ensure that the intellectual property they contain is adequately protected.

These repositories interact with and are used by many other services, the totality of which comprises a digital library infrastructure. These include: naming, security, authentication, and a host of services oriented toward end-user resource identification and management. ISOS is dependent upon a naming service, which provides mapping from globally-unique, persistent names to location(s). Our design assumes the CNRI handle service [9], on which we will base our implementation. ISOS security is also dependent upon authentication services, which certify the identity of entities (patrons, agents, other machines and services), and payment services, which effect the transfer of funds for transactions. Other services that may use ISOS as a foundation are finding services, browsing services, annotation services, and link services.

In the following paragraphs, we describe the components of ISOS, the classes in the ISOS type system, and then provide an example of interaction of the instances of these classes with the other services in a digital library infrastructure.

Digital object - A digital object is a content-independent package with a number of components. We describe the important ones below.

  1. Data is a package for the contents of the digital object. A fundamental part of the architecture is that the class digital object itself is content-independent; content-specific concerns are restricted to the class data contained within it.

  2. The handle is the globally-unique, persistent identifier for the digital object.
  3. Terms and conditions are an encapsulation of access rules on use of the object. We will describe these access rules in the context of the ISOS security architecture later in this article.

The digital object class contains the method get_dissemination, used by a client to derive a dissemination from the respective object.
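The following Python sketch shows how the three components and the get_dissemination method might fit together. It is a minimal, in-process illustration under stated assumptions, not the ISOS interface definition: the field names, the reduction of terms and conditions to a set of required credential names, the `portion` parameter used to deliver part of the content, and the handle strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermsAndConditions:
    # Hypothetical simplification: names of credentials a requester must hold.
    required: frozenset = frozenset()

    def permits(self, credentials: set) -> bool:
        return self.required <= credentials


@dataclass
class Data:
    bits: bytes  # primitive data: an uninterpreted stream of bits


@dataclass
class Dissemination:
    data: Data
    handle: str  # handle of the source digital object


@dataclass
class DigitalObject:
    handle: str  # globally unique, persistent name
    data: Data   # all content-specific concerns live in the data component
    terms: TermsAndConditions = field(default_factory=TermsAndConditions)

    def get_dissemination(self, credentials: set, portion: slice = slice(None)) -> Dissemination:
        # Check the object's terms and conditions before producing anything.
        if not self.terms.permits(credentials):
            raise PermissionError(f"terms and conditions of {self.handle} not met")
        # The dissemination may carry all of the content or only a portion of it.
        return Dissemination(Data(self.data.bits[portion]), self.handle)
```

Note that DigitalObject never inspects the bits it holds; only the data component would, which mirrors the content independence described above.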

Repository - Each instance of the class repository provides access, through the access_digital_object method, to a set of instances of the class digital object. The repository class has a number of components; we describe the two most important here.

  1. The handle is the unique identifier for the repository. A request to the name server to resolve the handle of a digital object returns the handles of one or more repositories in which the object resides. (Note that issues related to digital object replication are not covered by this design but must be resolved for the infrastructure to scale.) The repository class contains the method access_digital_object, which allows a client to then access a contained digital object using the object's handle.

  2. The terms and conditions are an encapsulation of the access rules for establishing a connection to the repository by a client (i.e., the beginning of a session with the repository). This is a level of security even before the client can access objects in the repository.
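The two levels of repository protection can be sketched as follows: terms are checked once when a session is opened, and access_digital_object then works only within an open session. This is a minimal sketch assuming that the repository's terms and conditions reduce to a set of required credentials and that sessions are plain integer identifiers; the connect method, the session scheme, and the handles are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class DigitalObject:
    handle: str
    content: bytes


@dataclass
class Repository:
    handle: str
    # Hypothetical simplification of the repository's terms and conditions:
    # credentials a client must present before a session is opened.
    session_terms: FrozenSet[str] = frozenset()
    objects: Dict[str, DigitalObject] = field(default_factory=dict)
    _sessions: Set[int] = field(default_factory=set, init=False, repr=False)
    _ids: itertools.count = field(default_factory=itertools.count, init=False, repr=False)

    def connect(self, credentials: Set[str]) -> int:
        # First level of protection, applied before any object is reachable.
        missing = self.session_terms - credentials
        if missing:
            raise PermissionError(f"cannot open session: missing {sorted(missing)}")
        session = next(self._ids)
        self._sessions.add(session)
        return session

    def access_digital_object(self, session: int, object_handle: str) -> DigitalObject:
        if session not in self._sessions:
            raise PermissionError("no open session with this repository")
        if object_handle not in self.objects:
            raise LookupError(f"{object_handle} is not held by {self.handle}")
        return self.objects[object_handle]
```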

Dissemination - An invocation of the get_dissemination method of a digital object produces an instance of the class dissemination at the calling client. There are five notable components of a dissemination.

  1. Data is a package for the content of the dissemination. Note that both a digital object and the disseminations derived from it contain data. This in no sense implies, however, that the data is the same in form (the bits themselves) or in type (the structure of the bits). Some examples illustrate the scope of this relationship. There may be a digital object which contains the fixed PostScript encoding of a computer science technical report. A dissemination of this digital object may contain that same PostScript encoding, or a portion thereof. There may also be a digital object that contains a program that is recording the current video image of a Senate session. A dissemination of this digital object might be the MPEG clip of the ten minutes of that session on September 1, 1995. Finally, there may be a digital object that contains a program, a dissemination of which might contain another program that interacts with the user, external services, and the source digital object in the repository.

  2. The handle is the identifier of the source digital object.
  3. The terms and conditions are an encapsulation of the stated terms and conditions that apply to the dissemination. These are derived from the access rules contained in the source digital object. Wrapping the delivered content with access rules gives us the ability to broaden the ISOS security framework to restrict access to disseminations. For example, the access rules might specify and enforce access only to the individual who originally requested the dissemination, or it might specify and enforce "read once" access. We are exploring mechanisms for doing this such as encrypting the data in the dissemination. The terms and conditions might specify a network-available "applet" in a "safe" language (e.g., Java [10] or Python [11]) that could decrypt the dissemination after authentication of the user, or some other action.
  4. The repository identifier is the unique identifier of the repository containing the source digital object.
  5. The transaction identifier is an identifier, unique to the source repository, that identifies the transaction that produced this dissemination.
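A dissemination that carries its own terms and conditions might look like the sketch below, which enforces the two example rules from the text: access only by the original requester, and "read once" access. It assumes a cooperative client that calls `read`; the field names and rules are hypothetical, and because a client holding plain bytes could bypass such a check, a real system would need a mechanism such as the encryption mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Dissemination:
    data: bytes
    handle: str          # handle of the source digital object
    repository_id: str   # repository that holds the source object
    transaction_id: int  # unique within the source repository
    # Hypothetical terms and conditions derived from the source object:
    requester: str       # only this principal may read the content
    read_once: bool = False
    _reads: int = field(default=0, init=False, repr=False)

    def read(self, principal: str) -> bytes:
        # Cooperative enforcement only: nothing here is encrypted.
        if principal != self.requester:
            raise PermissionError("only the original requester may read this dissemination")
        if self.read_once and self._reads:
            raise PermissionError("this read-once dissemination has already been read")
        self._reads += 1
        return self.data
```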
Data - This is the class that packages content of digital objects and disseminations. In its simplest form (the primitive class), data is just a stream of bits. However, we anticipate that there will be multiple sub-classes of data, each with specialized access methods. For example, there might be a special subclass for video data with a method that allows a user of the system to access a selected time slice of the video. We define two sub-classes of data, contained_digital_object and contained_handle, which allow a digital object to package or provide a reference to another digital object. Furthermore, we allow for composites (sets) of data allowing a single object to package multiple data items or reference or contain multiple digital objects.
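The data hierarchy can be sketched as a small class tree. The classes ContainedDigitalObject and ContainedHandle correspond to the sub-classes contained_digital_object and contained_handle named above; VideoData, its frames-per-second model, CompositeData, and the referenced_handles helper are hypothetical illustrations of a specialized subclass and of composites.

```python
from typing import Iterator, List


class Data:
    """Primitive data: a stream of bits."""
    def __init__(self, bits: bytes = b""):
        self.bits = bits


class VideoData(Data):
    """Hypothetical specialized subclass with a time-slice access method."""
    def __init__(self, frames: List[bytes], fps: int):
        super().__init__(b"".join(frames))
        self.frames, self.fps = frames, fps

    def time_slice(self, start_s: float, end_s: float) -> List[bytes]:
        return self.frames[int(start_s * self.fps):int(end_s * self.fps)]


class ContainedHandle(Data):
    """A reference to another digital object by its handle."""
    def __init__(self, handle: str):
        super().__init__()
        self.handle = handle


class ContainedDigitalObject(Data):
    """Packages another digital object inside this one."""
    def __init__(self, digital_object: object):
        super().__init__()
        self.digital_object = digital_object


class CompositeData(Data):
    """A set of data items packaged together; composites may nest."""
    def __init__(self, *items: Data):
        super().__init__()
        self.items = list(items)

    def referenced_handles(self) -> Iterator[str]:
        # Walk nested composites and report every handle reference.
        for item in self.items:
            if isinstance(item, ContainedHandle):
                yield item.handle
            elif isinstance(item, CompositeData):
                yield from item.referenced_handles()
```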

We demonstrate the interaction of ISOS and the remainder of a digital library infrastructure through the following hypothetical example.

Lucy is a computer science graduate student who wants to find research papers on "functional programming". She knows that CS Research Associates (CSRA) has done an excellent job indexing computer science research. CSRA has done this by obtaining permission (perhaps through licensing) from many computer science research repositories to browse their contents (perhaps using the methods defined in this article) and index the digital objects they contain. Using her browser, Lucy searches the CSRA index and sees a set of search "hits" on her screen, each of which is known to the browser through its unique handle. She chooses one of them, a paper in ACM TOPLAS. The following steps occur:

  1. Resolve the handle. The browser uses the name service to resolve the handle of the paper to one or more repository handle(s). The browser may then select one repository based on some user profile (cost-based, location-based) or other decision process.
  2. Resolve the repository handle. The browser again uses the name server to resolve the repository handle to the object identifier of the repository. This is necessary because handle system identifiers are in a separate name space from distributed object identifiers. There are two reasons for this. First, the handle system is designed to service a wide variety of objects, many of which are not in the distributed object domain. Second, handles are a many-to-many mapping (e.g., there may be replications of an object with multiple handles). Distributed object identifiers are a one-to-one mapping (although there are some plans in the CORBA community to fold object replication into the framework).
  3. Initiate a "session" with the repository. The terms and conditions associated with the repository may require that the client interact with external authentication services to certify that Lucy is who she claims to be (for example, the ACM may have special access rules for students at accredited Ph.D. institutions). The client may also interact with external payment services if there is a charge for accessing the repository (separate from the charge for individual objects).
  4. Download a dissemination. This may be the entire paper, or Lucy might specify that she only wants certain pages.
  5. View the paper. Lucy can now use a viewer program to view the contents of the dissemination. If the dissemination is protected by terms and conditions of its own, this may require interaction with payment and authentication services.
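Steps 1 through 4 can be sketched from the browser's side as below. This is a minimal sketch under stated assumptions: NameService is an in-memory stand-in for the handle service (not its actual protocol), the `orb` dictionary stands in for binding a distributed object identifier to a live reference, and StubRepository, the `choose` profile function, and all handle strings are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class NameService:
    """Hypothetical in-memory stand-in for the handle service."""

    def __init__(self, locations: Dict[str, List[str]], object_ids: Dict[str, str]):
        self.locations = locations    # digital object handle -> repository handles
        self.object_ids = object_ids  # repository handle -> distributed object identifier

    def resolve(self, handle: str) -> List[str]:
        return self.locations[handle]

    def resolve_repository(self, repo_handle: str) -> str:
        return self.object_ids[repo_handle]


class StubRepository:
    """Hypothetical repository that serves papers as lists of pages."""

    def __init__(self, required: set, papers: Dict[str, List[str]]):
        self.required, self.papers, self.sessions = required, papers, set()

    def connect(self, credentials: set) -> int:
        if not self.required <= credentials:
            raise PermissionError("repository terms and conditions not met")
        self.sessions.add(len(self.sessions))
        return len(self.sessions) - 1

    def get_dissemination(self, session: int, handle: str,
                          pages: Optional[List[int]] = None) -> List[str]:
        if session not in self.sessions:
            raise PermissionError("no open session")
        paper = self.papers[handle]
        return paper if pages is None else [paper[p] for p in pages]


def fetch(handle: str, names: NameService, orb: Dict[str, StubRepository], credentials: set,
          choose: Callable[[List[str]], str] = lambda repos: repos[0],
          pages: Optional[List[int]] = None) -> List[str]:
    # Step 1: object handle -> candidate repositories; pick one (e.g., by cost or location).
    repo_handle = choose(names.resolve(handle))
    # Step 2: repository handle -> distributed object identifier, a separate name space.
    repository = orb[names.resolve_repository(repo_handle)]
    # Step 3: open a session; the repository's terms and conditions are checked here.
    session = repository.connect(credentials)
    # Step 4: download a dissemination, optionally only some pages.
    return repository.get_dissemination(session, handle, pages)
```

Keeping the two resolution steps separate reflects the point made in step 2: handles and distributed object identifiers live in different name spaces.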

Building upon the CORBA Security Framework

The ISOS security architecture is a framework for the enforcement of security at four levels: access to repositories, access to digital objects within repositories, access to operations on digital objects (modify, get a dissemination, etc.), and access to disseminations that are produced by access requests on digital objects in repositories. We enforce this multi-level security uniformly through the use of a new object class, terms and conditions, that interacts with the CORBA security framework.
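The uniformity claim can be illustrated by a single check that is reused at every level. This sketch assumes, hypothetically, that each level's terms and conditions reduce to a set of required credentials stored in one table keyed by (level, target); the table contents, the "#operation" naming convention, and the function names are invented for illustration. The dissemination level would be checked later, at the client, when the content is viewed.

```python
from enum import Enum
from typing import Dict, FrozenSet, Set, Tuple


class Level(Enum):
    REPOSITORY = 1
    DIGITAL_OBJECT = 2
    OPERATION = 3
    DISSEMINATION = 4


# Hypothetical terms and conditions keyed by (level, target), each reduced
# to the set of credentials a requester must hold.
TERMS: Dict[Tuple[Level, str], FrozenSet[str]] = {
    (Level.REPOSITORY, "hdl:example/acm-repo"): frozenset({"authenticated"}),
    (Level.OPERATION, "hdl:example/toplas-1#modify"): frozenset({"authenticated", "editor"}),
    (Level.DISSEMINATION, "hdl:example/toplas-1"): frozenset({"authenticated"}),
}


def authorize(level: Level, target: str, credentials: Set[str]) -> bool:
    # The same check applies at every level; targets without terms are open.
    return TERMS.get((level, target), frozenset()) <= credentials


def authorize_request(repo: str, obj: str, operation: str, credentials: Set[str]) -> bool:
    checks = [
        (Level.REPOSITORY, repo),
        (Level.DIGITAL_OBJECT, obj),
        (Level.OPERATION, f"{obj}#{operation}"),
    ]
    return all(authorize(level, target, credentials) for level, target in checks)
```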

It is important to note that the stated terms and conditions for objects in a digital library infrastructure may go far beyond the conventional credential schemes of traditional security architectures, including the CORBA security framework. Traditional security architectures generally support individual or group membership and assume that the identities (credentials) of the potential principals (i.e., the people or agents who may access objects) are known in advance. This simple credential scheme is not realistic for digital objects in repositories, where the possible uses of an object are not known in advance. Some of the possible credential-related complexities are described below.

CORBA, or any distributed object framework, defines a model in which clients interact with objects that are distributed over the network. Clients have privilege attributes: the credentials that prove the identity of the client. For example, a credential might prove that a user is a student at a specific university that holds a license for access to some collection of objects. Objects have control attributes: the rules that determine who or what may access the object. For example, the control attributes for an object might specify that clients with the credential "student at Cornell University" have access to this object.
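The conventional check implied by this model is simple membership matching, which is exactly the scheme described above as too limited for digital libraries. The sketch below assumes, hypothetically, that privilege and control attributes are plain strings and that matching any one control attribute grants access; it is not the CORBA security API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Credential:
    principal: str
    privileges: FrozenSet[str]   # privilege attributes held by the client


@dataclass(frozen=True)
class ControlAttributes:
    allowed: FrozenSet[str]      # any one of these privileges grants access


def access_allowed(cred: Credential, controls: ControlAttributes) -> bool:
    # Conventional model: principals and groups are known in advance and
    # access is decided purely by membership.
    return bool(cred.privileges & controls.allowed)


cornell_only = ControlAttributes(frozenset({"student at Cornell University"}))
```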

In CORBA, the object request broker (ORB) is the foundation layer that mediates object invocation requests and method calls between clients and servers. CORBA implements security within the ORB through a construct called an interceptor. The interceptor is responsible for setting up a secure association between client and server (including encrypted communication) and for creating a set of security-related objects that encapsulate specific security functions. The credential object holds the authenticated identity for a user (proving the user is who they claim to be) and defines privilege attributes within the system. This is a generic interface that can be extended to a number of specific credential types (e.g., Kerberos). The security context object is created at both the client and server side of an object invocation. These objects encapsulate the state information for a secure association that is shared between a client and server. Finally, the access decision object (ADO) is a security monitor that is responsible for granting or denying access to an object and its methods (operations) based on the information in the credential object and security context objects. The ADO's decision is based on the security context of the invocation.
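The division of labor among these security objects can be modeled in a few lines of plain Python. This is a minimal sketch of the roles only: it is not the OMG CORBA Security interface, the class and method names are hypothetical, and the "servants" dictionary stands in for the ORB's dispatch to server objects.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Credentials:
    identity: str                # authenticated identity of the user
    privileges: FrozenSet[str]   # privilege attributes


@dataclass(frozen=True)
class SecurityContext:
    client: Credentials
    target: str
    operation: str


class AccessDecision:
    """Stand-in for the access decision object (ADO)."""

    def __init__(self, controls: Dict[Tuple[str, str], FrozenSet[str]]):
        self.controls = controls  # (target, operation) -> control attributes

    def access_allowed(self, ctx: SecurityContext) -> bool:
        allowed = self.controls.get((ctx.target, ctx.operation))
        # No control attributes means the operation is unprotected.
        return allowed is None or bool(ctx.client.privileges & allowed)


class Interceptor:
    """Stand-in for the server-side access control interceptor."""

    def __init__(self, ado: AccessDecision, servants: Dict[str, Any]):
        self.ado, self.servants = ado, servants

    def invoke(self, client: Credentials, target: str, operation: str, *args: Any) -> Any:
        # Every invocation is trapped here and judged in its security context.
        ctx = SecurityContext(client, target, operation)
        if not self.ado.access_allowed(ctx):
            raise PermissionError(f"{client.identity} may not call {operation} on {target}")
        return getattr(self.servants[target], operation)(*args)
```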

The foundation for extending CORBA security is the ISOS terms and conditions class, which is used by extensions to the CORBA security framework (described later). Each instance of this class (contained in a digital object, repository, or dissemination) packages the respective object's access rules in the form of rule sets that define the conditions for secure access to the object. Some initial work on formulating these rule sets has been done at the University of Illinois [12,13]. We are also examining KQML (Knowledge Query and Manipulation Language) [14] as the means for carrying on negotiations between clients and servers to resolve these terms and conditions. KQML, an outgrowth of the ARPA Knowledge Sharing Effort, is a language to support cooperative problem solving among intelligent agents.
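One simple way to encode a rule set with Boolean combinations of credentials is a small expression tree. The sketch below is an assumption for illustration only; it is neither the Illinois rule formalism [12,13] nor KQML. The Has, AllOf, and AnyOf node types and the ACM example rule are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Has:
    credential: str             # a single required credential


@dataclass(frozen=True)
class AllOf:
    rules: Tuple["Rule", ...]   # every sub-rule must hold


@dataclass(frozen=True)
class AnyOf:
    rules: Tuple["Rule", ...]   # at least one sub-rule must hold


Rule = Union[Has, AllOf, AnyOf]


def satisfied(rule: Rule, credentials: FrozenSet[str]) -> bool:
    if isinstance(rule, Has):
        return rule.credential in credentials
    if isinstance(rule, AllOf):
        return all(satisfied(r, credentials) for r in rule.rules)
    return any(satisfied(r, credentials) for r in rule.rules)


# Hypothetical rule: ACM members, or students at accredited Ph.D. institutions.
ACM_TERMS = AnyOf((Has("ACM member"),
                   AllOf((Has("student"), Has("accredited Ph.D. institution")))))
```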

The CORBA security framework described earlier is extensible via sub-typing of the objects that are responsible for enforcing security. Recall that the access decision object (ADO), on the server side, is the security monitor that is responsible for granting or denying access to an object. We propose to sub-type the ADO to allow it to interpret the rules in the instance of terms and conditions for the respective object. The ADO can then use these rules as the basis of its access decision, rather than the simple control attributes in the standard CORBA model. Recall also that the credential object (CO), on the client side, is responsible for keeping the authenticated identity of the user for the current security context. We propose to sub-type the CO to allow it to access and interpret the complex credentials defined earlier; that is, Boolean combinations of credentials, credentials requiring user negotiation, and credentials requiring negotiation with outside services.
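Sub-typing the ADO can be sketched as a subclass that consults an object's terms and conditions when it has any and otherwise falls back to standard control attributes. This assumes, hypothetically, that a rule set has already been reduced to a predicate over the client's credentials; the class names and that representation are illustrative and are not CORBA interfaces.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical representation: a rule set reduced to a predicate over credentials.
TermsRule = Callable[[FrozenSet[str]], bool]


class AccessDecisionObject:
    """Standard model: grant if any privilege attribute matches a control attribute."""

    def __init__(self, controls: Dict[str, FrozenSet[str]]):
        self.controls = controls

    def access_allowed(self, target: str, privileges: FrozenSet[str]) -> bool:
        # Targets with no control attributes are denied by default in this sketch.
        return bool(privileges & self.controls.get(target, frozenset()))


class IsosAccessDecisionObject(AccessDecisionObject):
    """Sub-type that interprets an object's terms and conditions when present."""

    def __init__(self, controls: Dict[str, FrozenSet[str]], terms: Dict[str, TermsRule]):
        super().__init__(controls)
        self.terms = terms

    def access_allowed(self, target: str, privileges: FrozenSet[str]) -> bool:
        rule = self.terms.get(target)
        if rule is None:
            # No terms and conditions: fall back to the standard control attributes.
            return super().access_allowed(target, privileges)
        return rule(privileges)
```

Because the sub-type keeps the parent's method signature, an ORB that calls access_allowed need not know which kind of ADO it holds.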

Interaction of these encoded terms and conditions with the CORBA security architecture will occur as follows. When a client attempts to create an invocation of an ISOS object (a repository or digital object), or to execute a method on an object (e.g., produce a dissemination), the request will be trapped by the server's (repository's) access control interceptor. If the object or method is protected by terms and conditions (i.e., it is a protected object), the server will translate the rule set into its internal set of control attributes, which the server's ADO will use to define the level of protection for the target object. We expect that some repositories, for efficiency's sake, might perform this translation when digital objects are deposited and cache the results in a "security database".
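Translating on deposit and caching the result might look like the sketch below. The rule syntax ("require <attribute>"), the translate function, and the dictionary used as the security database are all hypothetical; the point is only that the translation runs once per deposit rather than once per invocation.

```python
from typing import Dict, FrozenSet, List


def translate(rule_set: List[str]) -> FrozenSet[str]:
    """Hypothetical translation: each 'require <attribute>' line -> one control attribute."""
    attributes = set()
    for line in rule_set:
        verb, _, attribute = line.partition(" ")
        if verb != "require" or not attribute:
            raise ValueError(f"unsupported rule: {line!r}")
        attributes.add(attribute)
    return frozenset(attributes)


class Repository:
    def __init__(self) -> None:
        self.security_db: Dict[str, FrozenSet[str]] = {}  # handle -> cached control attributes
        self.translations = 0                              # how often translate() ran

    def deposit(self, handle: str, rule_set: List[str]) -> None:
        # Translate once, at deposit time, and cache the control attributes.
        self.security_db[handle] = translate(rule_set)
        self.translations += 1

    def intercept(self, handle: str, privileges: FrozenSet[str]) -> bool:
        # Each invocation only looks up the cache; unprotected objects are open.
        return self.security_db.get(handle, frozenset()) <= privileges
```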

The level of complexity of the terms and conditions determines what happens next. In a simple case the ADO might check credentials presented by the client in the access request, or might check (through a method call) those already residing in the client's CO. Assume a complex case, however, where the client has no way to determine the needed credentials in advance without interacting with the ADO. In this case, the ISOS-extended CO will need to execute a method on the ISOS-extended ADO to determine which credentials are required. In response to this method call, the ADO will send the CO the subset of the access rules that define the credentials it needs. These rules may state that the CO needs to obtain a new certificate from some authentication service ("prove you are a U.S. citizen"), with which it can then retry the object access or method invocation request. More complex rules might require that the client invoke an agent that interacts, over a secure channel, with an agent created by the repository. The result of this negotiation might be a new certificate proving that the client has completed the terms of the negotiation, which can then be submitted in a new access request.
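The simpler negotiation path (ask the ADO which credentials are missing, obtain certificates from outside services, retry) can be sketched as a bounded loop. This is a minimal sketch under stated assumptions: requirements are reduced to sets of credential names, an authentication service is modeled as a callable that decides whether to certify an identity, and the class names, method names, and max_rounds limit are hypothetical. Agent-to-agent negotiation is not modeled.

```python
from typing import Callable, Dict, FrozenSet, Set

Issuer = Callable[[str], bool]  # hypothetical: certifies an identity, or refuses


class ExtendedADO:
    def __init__(self, required: Dict[str, FrozenSet[str]]):
        self.required = required

    def access_allowed(self, target: str, creds: FrozenSet[str]) -> bool:
        return self.required.get(target, frozenset()) <= creds

    def required_credentials(self, target: str, creds: FrozenSet[str]) -> FrozenSet[str]:
        # Disclose only the part of the rules the client has not yet met.
        return self.required.get(target, frozenset()) - creds


class ExtendedCO:
    def __init__(self, identity: str, creds: Set[str], issuers: Dict[str, Issuer]):
        self.identity, self.creds, self.issuers = identity, set(creds), issuers

    def access(self, ado: ExtendedADO, target: str, max_rounds: int = 3) -> bool:
        for _ in range(max_rounds):
            current = frozenset(self.creds)
            if ado.access_allowed(target, current):
                return True
            needed = ado.required_credentials(target, current)
            # Ask the outside services for certificates covering what is missing.
            obtained = {c for c in needed
                        if c in self.issuers and self.issuers[c](self.identity)}
            if not obtained:
                return False  # nothing new could be obtained; retrying cannot help
            self.creds |= obtained
        return ado.access_allowed(target, frozenset(self.creds))
```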

Future Work

This story and the companion technical paper represent the completion of the initial ISOS design. The multiple groups that collaborated in this design will now undertake prototype implementations to verify the design and lay the groundwork for further development.

Of primary concern in the initial prototypes is how well the design inter-operates with other digital library services, both object-based and those based on other paradigms. Coordination with work being done by the six Digital Library Initiative sites will be especially important.

Acknowledgments

This is joint work with Robert McGrath of NCSA, Ed Overly of CNRI, and Nancy Yeager of NCSA. James Davis of the Design Research Institute (Xerox Corporation and Cornell University) and David Ely of CNRI are also major contributors to this work. The author wishes to thank Robert Kahn and Robert Wilensky, whose initial work on the digital object framework has enabled this research. The findings discussed in this paper were supported in part by the Advanced Research Projects Agency under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives (CNRI), and in part by funding from the National Science Foundation, the State of Illinois, and NASA Cooperative Agreement Notice (CAN) "Public Use of Earth and Space Science Over the Internet". Its content does not necessarily reflect the position or policy of any of the sponsoring parties, and no official endorsement should be inferred.


References

[1] Friedlander, Amy. 1995. Emerging Infrastructure: The Growth of Railroads. Corporation for National Research Initiatives. Reston, VA.
[2] Friedlander, Amy. 1995. Natural Monopoly and Universal Service. Corporation for National Research Initiatives. Reston, VA.
[3] Lagoze, Carl and McGrath, Robert and Overly, Ed and Yeager, Nancy. 1995. A Design for Inter-Operable Secure Object Stores (ISOS). Cornell Computer Science Technical Report TR95-1558. http://www.ncstrl.org/Dienst/UI/2.0/Describe/ncstrl.cornell%2fTR95-1558
[4] Kahn, Robert and Wilensky, Robert. 1995. "A Framework for Distributed Digital Object Services." http://www.cnri.reston.va.us/home/cstr/arch/k-w.html.
[5] http://www.cnri.reston.va.us/home/cstr.
[6] Object Management Group. 1993. "The Common Object Request Broker: Architecture and Specification." http://www.acl.lanl.gov/sunrise/DistComp/Objects/corba.html
[7] Brockschmidt, Kraig. 1995. Inside OLE 2, Second Edition. Microsoft Press.
[8] Object Management Group. 1995. CORBA Security.
[9] Corporation for National Research Initiatives. "Handles and the Handle System." http://www.cnri.reston.va.us/home/cstr/handle-intro.html.
[10] Sun Microsystems Computer Company. 1995. "The Java Language Environment." White Paper.
[11] http://www.python.org.
[12] Jones, E. and Ching, N. and Winslett, M. 1995. "Credentials for Privacy and Interoperation." Proceedings of the New Security Paradigms '95 Workshop.
[13] Winslett, M. and Smith, K. and Qian, K. 1994. "Formal Query Languages for Secure Relational Databases." ACM Transactions on Database Systems.
[14] Finin, Tim and Fritsson, Rich and McKay, Don. 1992. "A Language and Protocol to Support Intelligent Agent Interoperability." Proceedings of the CE & CALS Washington '92 Conference.
[15] Janssen, Bill and Severson, Denis and Spreitzer, Mike. 1995. ILU 1.8 Reference Manual. Xerox Palo Alto Research Center.


hdl://dlib.cnri/december95-lagoze