Next: Motivation Up: The CRISIS Wide Area Architecture Previous: The CRISIS Wide Area Architecture

Introduction

One of the promises of the Internet is to enable a new class of distributed applications that benefit from a seamless interface to global data and computational resources. A major obstacle to enabling such applications is the lack of a general, coherent, scalable, wide area security architecture. In this paper, we describe the architecture and implementation status of CRISIS, a wide area authentication and access control system. CRISIS forms the security subsystem of WebOS, a system that extends operating system abstractions such as security, remote process execution, resource management, and named persistent storage, to support wide area distributed applications.

Today, many wide area applications are limited by the lack of a general wide area security system. As one example, one of the goals of the WebOS project is to build a scalable SchoolNet service, to provide a safe place for the tens of millions of K-12 students in our states (California, Washington, and Texas) to learn and play. One challenge to making SchoolNet a reality is building scalable network services, for example, to provide a highly available email account to every student, without requiring a system administrator at every location. This paper focuses on an equally difficult challenge: maintaining the confidentiality and integrity of data and resources, for example, so that unauthorized people cannot obtain information about school children. As another example, we (as a geographically distributed development team) would like to use WebOS to allow us to seamlessly access any file or computational resource at any of our sites; once CRISIS is fully operational, we plan to rely on it to protect our development environment from external attacks. As a final example, we have built Rent-A-Server [Vahdat et al. 1997], a system to dynamically replicate and migrate Internet services, to gracefully handle bursty request patterns, and to exploit geographic locality to reduce latency and congestion. To be practical, however, Rent-A-Server requires the ability to securely access and control remote data and computational resources (e.g., CPUs and disks).

An initial approach for supporting secure access to remote resources is to simply employ an authenticated login protocol. Unfortunately, this approach is inadequate because many wide area applications require more fine-grained control over access to remote resources. Further, the administrative overhead of creating and maintaining separate accounts in all domains where users wish to run jobs can be prohibitive. For example, it would be difficult to use authenticated login to support a user job running on an anonymous compute server in a remote administrative domain that needs access to a single file on the user's home file system.

Another approach for supporting secure wide area applications is to add fine-grained rights transfer to an existing authentication system, such as Kerberos [Steiner et al. 1988]. However, while Kerberos has proven quite successful for local area networks in a single administrative domain, it faces a number of challenges when extended to the wide area. First, Kerberos has no redundancy; security is undermined if even a single authentication server or ticket granting server is compromised, allowing an adversary to impersonate any principal that shares a secret with the compromised authentication server. In the wide area, the number of such single points of failure scales with the size of the Internet. Further, Kerberos requires synchronous communication with the ticket granting server in order to set up communication between a client and server; in the wide area, synchronous communication with a hierarchy of ticket granting servers is required. Given that the Internet today is both slow and unreliable, this can have a significant effect on availability and performance as perceived by the end-user. Although Kerberos servers could conceivably be replicated to improve availability, the servers would need to be geographically distributed to hide Internet partitions, providing an intruder even more points of attack.

Public key cryptography seems to hold out the promise of improving availability and security in the wide area, by eliminating the need for synchronous communication with a trusted third party. The public key of every principal (user or machine) can be freely distributed; provided the public keys are known, two principals can always communicate if they are connected, regardless of the state of the rest of the Internet. Unfortunately, this also comes at a cost; any compromise of a private key requires that every entity on the Internet be informed of the compromise. This is analogous to Kerberos, in that the number of single points of failure (in this case, the number of private keys) scales with the size of the Internet.

In this paper, we present the design and implementation of CRISIS, a system for secure, authenticated access to wide area resources. To avoid an ad hoc design where features are thrown together in an attempt to prevent all known types of security attacks, our approach is the systematic application of a set of design principles. These principles are inspired by analogy with other areas of distributed systems, where scalability, performance and availability can be achieved through redundancy, caching, lightweight flexibility, and localized operations. A goal of the CRISIS architecture is to demonstrate that these principles can also be applied to increasing the security of wide area distributed systems.

Specifically, the principles underlying the design of CRISIS include:

CRISIS is loosely based on the DEC SRC security model [Lampson et al. 1991]. Relative to their work, one of our contributions is to simplify the model by using transfer certificates as the basis of fine-grained rights transfer across the wide area. Transfer certificates provide an intuitive model for both rights transfer and accountability, as they allow a complete description of the chain of reasoning associated with a transfer of rights. In addition, revocation is a first-class CRISIS operation; even privileges described by transfer certificates (which are typically valid only for a limited period of time) can be revoked immediately. CRISIS also provides for explicit reasoning about the state of loosely synchronized clocks, an important consideration for wide area applications. Further, CRISIS supports user-defined lightweight roles, to capture persistent collections of transferred rights (e.g., "Tom running a job on remote supercomputer"). Finally, in contrast to the DEC SRC work, which was implemented in the kernel of a platform that is no longer available, CRISIS is designed to run portably across multiple platforms, a requirement for a wide area security system to be useful in practice.
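To make the idea of a chain of transfer certificates concrete, the following is a minimal sketch in Python and not the CRISIS implementation. The names `TransferCert` and `chain_grants`, the certificate fields, and the conservative handling of clock skew are all assumptions introduced for illustration; signature verification is omitted entirely. It shows a verifier walking a chain from the resource owner to the requester, rejecting the request if any link is revoked, may have expired given the clock-skew bound, or passes on rights that its issuer did not hold.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class TransferCert:
    """Hypothetical transfer certificate: issuer hands rights to subject."""
    cert_id: str
    issuer: str
    subject: str
    rights: FrozenSet[str]
    not_after: float  # expiry time in seconds (limited validity period)


def chain_grants(chain: List[TransferCert], owner: str, requester: str,
                 right: str, now: float, revoked: Set[str],
                 max_skew: float = 0.0) -> bool:
    """Return True if the chain transfers `right` from owner to requester.

    Assumption: signatures were already checked elsewhere; this only
    models the chain of reasoning, expiry, and revocation.
    """
    holder = owner
    held: Optional[FrozenSet[str]] = None  # None: owner holds every right
    for cert in chain:
        if cert.issuer != holder:
            return False  # broken chain: issuer never received the rights
        if cert.cert_id in revoked:
            return False  # revocation takes effect immediately
        if now + max_skew > cert.not_after:
            return False  # may have expired, given loosely synced clocks
        if held is not None and not cert.rights <= held:
            return False  # cannot pass on rights that were not held
        holder, held = cert.subject, cert.rights
    # An empty chain transfers nothing, so held stays None and we refuse.
    return holder == requester and held is not None and right in held
```

Under these assumptions, a user could issue one certificate to a remote compute server for a single file, and the server could issue a second, shorter-lived certificate to a specific job; revoking the first certificate invalidates the whole chain.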

The rest of this paper describes CRISIS in more detail. We first provide some motivating examples for CRISIS along with a quick review of relevant technology in Sections 2 and 3. We then outline the CRISIS architecture in Section 4, followed by a detailed example of how CRISIS is used in Section 5. We evaluate the performance of our implementation in Section 6, and discuss related work in Section 7. We summarize our results in Section 8.


Amin Vahdat
12/10/1997