|
JVM '01 Paper   
[JVM '01 Tech Program Index]
Proof Linking: Distributed Verification of Java Classfiles in the Presence of Multiple ClassloadersPhilip W. L. Fong
AbstractTo offload the computational burden of bytecode verification within Java Virtual Machines (JVM), distributed verification systems may be created using any one of a number of verification protocols, based on such techniques as proof-carrying code and signed verification by trusted authorities. This paper advocates the adoption of a previously-proposed mobile code verification architecture, proof linking, as a standard infrastructure for performing distributed verification in the JVM. Proof linking not only supports both CLDC-style and signature-based distributed verification protocols, but it also provides interoperability between the two. To ground our work in the real-world requirements of Java bytecode verification, we also extend previous work on proof linking to handle multiple classloaders.
IntroductionSecurity is the cornerstone of trustworthy mobile code systems such as that of Java. In accepting arbitrary mobile code from unknown and potentially untrustworthy sources, a Java Virtual Machine (JVM) enforces type safety--the first line of defence in mobile code security--through a a link-time bytecode verification process. The bytecode verifier performs dataflow analysis and various structural analyses to guarantee that untrusted classfiles can be linked into the JVM without producing type confusion. We call this protection mechanism, in which a static code verification procedure is invoked dynamically by the runtime environment, proof-on-demand.Proof-on-demand is conceptually simple and allows the JVM to take full responsibility for assuring type safety even in the presence of dynamically generated code. However, proof-on-demand imposes a considerable computational burden on the JVM. The link-time overhead is significant enough that some authors hyperbolically compare it to a denial-of-service attack [12, p. 110]. Moreover, the architectural complexity of the JVM is greatly increased by the coupling of complex verification logic with lazy dynamic linking [7]. Compounding these concerns, the bytecode verifier also adds significantly to the JVM's memory footprint [16, Sec. 5.3.1]. Future computational platforms will likely include a vast array of small information appliances that have limited computational resources and demanding response-time requirements. Downloaded mobile code will continue to be popular to provide short-lived system extensions (see, for example, the mobile code language WMLScript [17] for the Wireless Application Protocol). With its stability and widespread acceptance, the Java platform--and specifically realizations thereof based on the Connected, Limited Device Configuration (CLDC) specification [16]--will likely become a major infrastructure for hosting mobile programs in small devices. In these contexts, however, the high resource requirements and architectural complexity of proof-on-demand implementations may become intolerable. The CDLC specification has hence rejected the proof-on-demand approach. Future systems will also likely see additional forms of run-time verification to provide enhanced levels of protection. As the pervasiveness of mobile code hosting environments increases, so too do the vulnerabilities and the potential consequences of these vulnerabilities. To counteract this, attention will turn to safety properties that go beyond simple ``type safety'' in ensuring system integrity. The complex program analyzers necessary to verify these additional properties may well become impractical even for a standard JVM, let alone a CLDC-compliant device. To address these issues, some or all of the verification burden may be offloaded to parties other than the mobile code hosting environment. This gives rise to a distributed verification system, in which a mobile code runtime environment shares some or all of its verification burden with certain remotely located facilities. Each facility interacts with the hosting environment by means of a verification protocol. A distributed verification system may in fact employ distinct verification protocols for different code units provided that an overall framework for protocol interoperation is defined. An individual verification protocol is thus a fixed scheme that orchestrates the communication and division of labour among the parties involved in the distributed verification of a code unit. For example, proof-on-demand is a trivial verification protocol that assigns the entire verification burden to the host environment. This paper is a contribution to the ongoing development of a standard distributed verification architecture for mobile code systems in general, and for the JVM in particular. The focus of this paper is twofold:
Distributed Verification: Related work and major issuesIn a distributed verification system, the machine hosting the mobile code runtime environment is called the code consumer. The party responsible for construction and distribution of mobile programs is the code producer. Code producers and consumers interact in various ways to define a verification protocol. As an alternative to proof-on-demand, two families of verification protocol have been proposed in the related literature.
Self-Certifying CodeThe first protocol family involves augmenting the untrusted code to make it self-certifying. This approach is exemplified in the work on proof-carrying code [14,13]. The protocol proceeds as follows. (i) The code consumers, or possibly an authority representing them, publish a safety policy in the form of a verification-condition generator. Given any mobile program, the generator computes a verification condition that must be shown to be true if the code is to be accepted as safe by consumers. (ii) To distribute a program, a code producer computes the verification condition from the code, proves the condition, and then attaches the proof to the program code when it is distributed. (iii) Upon receiving a mobile program, a consumer recomputes the verification condition, and then checks if the attached proof indeed establishes the verification condition. Execution is granted if proof checking succeeds. Since proof checking is often substantially easier than proof generation, this protocol induces less link-time overhead than proof-on-demand. Furthermore, since proof generation may now be performed once and for all on the producer side, difficult-to-prove safety properties may become affordable.In application to Java, the essential idea behind proof-carrying code is that the code producer can annotate a mobile program with static analysis results, so that a consumer may use the annotations to avoid performing a full bytecode verification. This idea has been applied to the verification of Java bytecode in various forms [15,9,1], and has further been incorporated in the stack map method of the CLDC specification [16, Sec. 5.3].
Signature-based MethodsA second family of distributed verification protocols is based on a very efficient and well-understood mechanism, namely, signature checking. Execution is granted to code that is signed by a trusted party.A major objection to these protocols is that, unlike a proof (or other kind of annotation), the semantics of a signature may not be well defined. Thus, there may be no protection against the possibility that signing authorities miscertify. Moreover, celebrity is required in the certification of mobile programs, making it hard for non-established developers to inspire trust. These objections are nicely addressed by a protocol which we call proof delegation [2,3]. The protocol proceeds as follow. (i) The code consumers, or more likely an authority representing them, publish a safety policy in the form of a static program analyzer that checks if a given mobile program is safe. The analyzer is encapsulated in a trusted coprocessor, for example, having the form factor of a PCMCIA card or a PCI card [8]. Attempts to physically tamper with the encapsulated analyzer or to extract the private encryption key in the hardware will render the hardware dysfunctional, or perhaps clear its memory [4]. The hardware is then distributed to code producers. (ii) To distribute software, a code producer submits mobile programs to the trusted program analyzer, which verifies the safety of the code, and digitally signs it. (iii) Upon arrival at a consumer site, the signature atttached to the program code will be authenticated. The bytecode verification of the proof-on-demand protocol is replaced by a simple and efficient signature-checking primitive. Using trusted coprocessors, proof delegation physically binds the signature to the formal properties enforced by a static program analyzer, thereby giving a well-defined semantics to the signature. However, in order to support signature-based verification protocols such as proof delegation, two further issues must be addressed.
In support of both self-certifying protocols and signature-based methods, this paper proposes a proof linking architecture with the following features.
The Proof Linking Architecture
Assumption, Notations and ModellingThe standard Java classloading semantics uses multiple classloaders to implement namespace partitioning. A loaded class is identified by both its classname and its defining classloader [10]. For the purpose of introducing the proof linking architecture and its application to distributed verification, however, this section makes the simplifying assumption that only a single classloader exists. This assumption will be relaxed in subsequent sections. As a result of the single-classloader simplification, symbolic class references and loaded classes are each denoted simply by classname. A member named of a class with type signature is identified by the symbolic reference ::. We do not differentiate a symbolic member reference and an actual member of a loaded class. We model the linking activities of the simplified JVM by a set of linking primitives, including, for example, ``load '', ``verify '', ``resolve in '', ``resolve :: in '', and so on. The JVM executes linking primitives in a concurrent, nondeterministic fashion, in accordance with a well-defined partial ordering enforced by the implementation. We call this partial order the linking strategy of the runtime environment. Further details are presented in Section 4.
A Motivating ExampleSignature-based verification protocols involve the remote verification of classfiles. Safety is in general a whole-program notion, and thus cannot be established by merely examining a single module. External dependencies (e.g., checking for assignment compatibility, computing the meet of two flow values, and so on) validated remotely may no longer hold when a classfile is linked into the runtime environment. The following example is illustrative. Suppose class defines a method . Suppose further that has a direct subclass , which in turn has a direct subclass . Assume that overrides the method . Say the method :: contains an invokespecial bytecode instruction that delegates the call to method ::. A bytecode verifier is supposed to check that class is a subclass of class . If the verification of class is performed at link time, the bytecode verifier will recursively invoke the classloader to bring in the classfiles for both and to validate the required subclassing relationship. However, if the verification of class is carried out remotely, this strategy will fail for two reasons. First, the remote verifier might not have access to the classfiles for and . Second, even if it does, the versions of and considered by the remote verifier may not be compatible with the ones actually loaded into the JVM. This configuration management issue motivates a two-stage verification architecture based on conditional certification and proof linking. In the first stage, remote verification is performed in a modular fashion to generate conditional certificates. These certificates specify the assumptions (dependencies) that must be checked to ascertain the safety of the code unit in question. In the second stage, the JVM performs proof linking to check the asserted conditions in the context of the classes actually loaded.
First Stage: Modular VerificationUntrusted classfiles are certified by a trusted verifier at a remote site. Instead of endorsing external dependencies that might be invalidated at runtime, the remote verifier computes, for each classfile, a conservative safety precondition summarizing the external dependencies that must hold at runtime in order for the classfile to be safe. The safety precondition is here represented as a conjunctive set of database queries. In the running example, during the remote verification of classfile , a trusted verifier will generate the query subclass(,) as the invokespecial instruction is scanned. The remote verifier may end up generating many such queries. The conjunctive set of all the queries formulated by the remote verification session becomes the safety precondition for endorsing the classfile being considered. In addition, the remote verifier also schedules each of the queries for evaluation. Each query describes a safety precondition for a particular linking primitive. For example, the query above, subclass(,), is associated with the linking primitive ``resolve :: in .'' In essence, this schedules the subclassing check for evaluation immediately prior to resolution of :: within class . Such a query is said to be the proof obligation for the associated primitive, representing a condition that must be met if the runtime system is to safely execute the corresponding linking primitive. We also say that a proof obligation is attached to its associated primitive. In order for the runtime system to discharge proof obligations, the remote verifier also computes, for each code unit, a set of clauses called commitments. The commitments are ground facts that describe the interface properties of the code unit. For example, during the remote verification of the classfile , the fact extends(,) is generated as one of the commitments. As we shall see below, this fact will be used to satisfy the subclassing query in our example. Prior to shipping a classfile, the remote verifier will package together the classfile itself, the obligations annotated with the associating primitives, and the commitments. The entire package is signed by the verifier using its private key. The signed package is then distributed to consumers. The obligations, commitments, and the signature can all be packaged inside a classfile using classfile attributes (see the JVM specification for details [11]).
|
subclass(,). | |
subclass(,) :- | |
extends(,), | |
subclass(,). |
extends(,).
extends(,).When the primitive resolve :: in is to be executed, the JVM will look up its attached obligations, among which the query subclass(,) will be found. The JVM then attempts to satisfy this query by consulting the facts and rules within the commitment database. The query succeeds and the primitive is executed (assuming that any other obligations are satisfied as well).
The scheme presented above nicely addresses the need for conditional certification. Even though the remote verifier is unable to validate the external dependencies of a class, it nevertheless can express them as proof obligations. The proof linking mechanism is invoked to discharge the obligations at runtime, when the necessary information has become available.
Proof linking also provides interoperability between distributed verification protocols such as proof delegation, proof-carrying code, and proof-on-demand. Because linking primitives communicate solely by attaching obligations and asserting commitments, they are highly modular. A local verifier may process classfiles annotated with CLDC-style stack maps to generate the appropriate proof obligations and commitments just as a remote verifier would do through bytecode analysis. In the event of a classfile that is neither signed by a remote verifier nor annotated with stack map, the local verifier may itself perform a complete bytecode verification to generate the necessary proof obligations and commitments. The proof linking mechanism checks and discharges obligations from each of these sources in the same way, without any need to know the verification protocol used for a particular classfile. As a result, a Java application could consist of a number of classfiles that are signed remotely, others that are ``pre-verified'', and still others that are completely uncertified. As long as each verification protocol uses the same verification interface for asserted proof obligations and commitments, interoperability is assured.
That the incremental proof linking procedure is as safe as proof-on-demand is not obvious. Could an obligation be generated after its attachment target is already executed? Could it be possible that checked obligations are falsified by subsequently generated commitments? Is it possible that an obligation fails only because the commitments necessary for satisfying it are generated too late? To address these concenrs we formulate the following three correctness conditions.
Throughout the presentation above, we have used a simple logic programming notation to represent the proof obligations, commitments, and initial theory. However, this abstract model is presented as such only to clarify ideas and to facilitate the articulation of correctness (see Section 5). By no means are we suggesting that an actual implementation employ a logic programming system in proof linking. Rather, commitments can be optimized as flags and pointers stored in the Class and Method objects inside the JVM. A recursive query such as subclassing could well be realized by a standard tree traversal algorithm. The obligation table need not even exist physically; obligations attached to a resolve primitive could be stored in the corresponding constant pool entry. Such optimizations were used in the prototype implementations reported previously [5,7].
The discussion above assumes a simplified JVM with only one classloader. We relax the assumption in this section, and analyze the interaction between proof linking and dynamic linking in the setting of multiple classloaders. It turns out that a systematic, straightforward set of extensions to the previously proposed model is sufficient to make proof linking work with multiple classloaders. This demonstrates that the proof linking technique is applicable to realistic mobile code environments and is orthogonal to Java's delegation-style classloading.
When a Java application attempts to load a class with a given name with a classloader , the initiating classloader of , may delegate the classloading task to another classloader, which, in turn, might delegate the task to yet another classloader. The classloader that eventually loads and defines is said to be its defining classloader. is uniquely identified by the pair . We also write to indicate the fact that initiates the loading of . When a symbolic reference is resolved in a class , the classloader will be used as the initiating classloader for class . We identify the member of a loaded class by the notation . Details of Java's classloading mechanism are described elsewhere [11, Chapter 5][10].
Since a loaded class is identified not only by its classname, but also by its defining classloader, a remote verifier has no way of naming the classes in the commitments and obligations it generates. To deal with this, we extend our scheme by the following set of reformulations:
We begin the discussion of our extended proof linking model by looking at its linking primitives. We define a linking primitive as a self-contained action which never activates other primitives as a subroutine, nor recursively invokes itself. The JVM executes a linking primitive using the following protocol:
Our multiple-classloader linking model contains the extended set of linking primitives in Figure 1. We inherit the two endorse primitives from our original work. They are auxiliary targets of obligation attachment. Since they do not contribute much to our present discussion, readers may safely identify them with class preparation. Only the verify primitives generate obligations, and only the verify and bind primitives generate commitments.
Two changes to the original primitive set have been made [7, Sec. 4.1]:
The JVM orders the nondeterministic, concurrent execution of linking primitives according to the constraints prescribed by the linking strategy found in Section 4.7.
The static type rules for Java under the single classloader assumption have been presented previously [7, Figure 6]. A straightforward mechanical translation to replace classnames with loaded class notations adapts these rules for the multiple classloader case. For example, consider the subclassing rule mentioned above.
This rule is transformed as follows.
subclass(,). subclass(,) :- extends(,), subclass(,).
A list of all the reformulated rules is available in a technical report [6, Figure 2].
subclass( , ). subclass( , ) :- extends( , ), subclass( , ).
Suppose that a classfile with classname is being verified remotely, and that extends a class with name . The verifier must assert a commitment specifying this subclassing relationship. However, at remote-certification time, the defining classloaders and for the classes and respectively are unknown, so the commitment cannot be phrased in terms of and . Three reformulations address this problem. First, the remote verification procedure instead formulates the commitment:
extends(this,)The relative reference this represents the class being verified. When the commitments are actually asserted into the commitment database by the local ``verify '' primitive, the defining classloader for class is known. Therefore, whenever ``verify '' asserts a commitment , it systematically tags the commitment as @ . For example, the commitment above will be asserted as:
extends(this,)@A list of all the reformulated commitments that a remote bytecode verifier should generate can be found in the technical report [6, Figure 4].
Second, execution of the bind primitive contributes binding information by asserting commitments. Whenever a ``bind to '' primitive terminates, it asserts the commitment `` ''. These facts will be used for explicit resolution of symbols in commitments and queries.
Third, the initial theory is augmented with translation rules that express how subgoals expressed in terms of loaded class notations may be satisified using corresponding goals using tagged commitments. For example, to evaluate queries of the form subclass( , ), we will eventually need to evaluate subgoals of the form extends( , ) using the tagged commitments above. To do so, the following translation rule is used.
The rule basically retrieves the corresponding tagged commitment, and then validates binding information by consulting the binding commitments. A similar translation rule is required for each predicate that may be asserted as a commitment. The formulation of these rules is straightforward and the complete set is presented in the technical report [6, Figure 6].
extends( , ) :- extends(this,)@ , .
As with commitments, a remote verifier cannot identify the defining classloader for the classes appearing in obligations. We then follow the same strategy and formulate obligations in terms of static classnames, and then rely on the local verify primitive to tag the obligations with the context in which they are to be evaluated. For example, the remote verifier may formulate an obligation of the following form:
subclass(,)When the local verify primitive processes this obligation, it tags the query with an evaluation context before attachment:
subclass(,)@Similar tagging is systematically applied to each obligation [6, Figure 7].
Translation rules transform tagged queries into queries in terms of loaded classes. For example, the following rule is required in the initial theory in order to handle all subclass/2 queries:
The translation rule basically resolves all the symbols in the tagged context, and evaluates a corresponding query in terms of loaded classes. The complete set of these translation rules is presented in the technical report [6, Figure 8].
subclass(,)@ :- , , subclass( , ).
Recall that, in the running example, the above subclassing obligation should be attached to the primitive ``resolve :: in '', which is identified by the loaded class reference . The remote verifier cannot completely identify the target primitives to which the obligation is attached. Fortunately, obligations are always attached to primitives that are operating on the class being verified [6, Figure 7]. The remote verifier may thus formulate the target of attachment in terms of place holders:
resolve :: in ___and rely on the local ``verify '' primitive to fill in , as it does when tagging obligations.
A linking strategy schedules the execution of linking primitives, and coordinates incremental proof linking. The following notations are used to express ordering constraints. For linking primitives and , the constraint ``'' requires that any execution of primitive should be preceded by the completion of primitive . The constraint `` if'' requires that execution of must not begin if holds and has not yet completed.
Java proof linking requires the ordering constraints shown in Figure 2. Except for the newly introduced Proper Resolution Property [PR], these constraints for the multiple-classloader case are extensions to the corresponding constraints for the single classloader case [7, Sec. 4.1].
To illustrate how the scheme above works, consider a refinement of the running example. Suppose class defines a method . Suppose further that has a direct subclass , which in turn has a direct subclass . Assume that overrides the method . Say the loaded method contains an invokespecial instruction that delegates the call to . The obligation subclass(,)@ will be attached to the primitive ``resolve :: in ''. When the obligation is checked, the subgoals in Figure 3 will be generated. The original obligation is shown as the top-level goal, annotated with ``resolve :: in '', the primitive to which the obligation is attached. We have also annotated all the innermost subgoals with the primitives that assert their matching commitments.
The deduction is successful because the commitments required by the innermost subgoals are already asserted at the time the obligation is checked, that is, at the time ``resolve :: in '' is executed. For example, subgoal 1.1 is satisfiable because, according to the Natural Progression Property [NP], the primitive ``bind to '' has already been executed. Also, subgoal 1.2 is satisfiable because
bindto | ||||
resolvein | ||||
resolve |
verify | ||||
bindto | ||||
bindto | ||||
endorse | ||||
resolve |
This example is really a skeleton for the proof of Completion, one of the three correctness criteria for proof linking. These criteria are considered in detail in the next section.
Given a well-defined linking strategy, proof linking is correct if we can establish the three correctness conditions: Safety, Monotonicity and Completion [7, Sec. 3.3].
Establishment of the Safety and Monotonicity properties follows the corresponding arguments for the single-classloader case [7, Sec. 4.3].
Completion has to be established on an obligation-by-obligation basis. Continuing with our running example in section 4.8, we consider an obligation subclass(,)@ that is attached to the primitive ``resolve :: in ''. Our goal is to show that, if the predicate subclass(,)@ eventually becomes provable, then it is necessarily provable before the primitive ``resolve :: in '' is executed.
Suppose that the obligation subclass(,)@ becomes provable at a certain point. Generalizing the proof found in Figure 3, the proof tree of the obligation contains the innermost subgoals in Figure 4.
We number the subgoals as (-), (-) and (). We want to show that the primitives that assert commitments satisfying these subgoals have all been executed prior to the execution of ``resolve :: in ''. As already explained in Section 4.8, the Proper Resolution Property [PR] guarantees that supporting commitment () is already in place. We use induction to show that commitments (-) and (-) are already asserted when the obligation is checked.
verify | ||||
bindto | ||||
resolve |
verify | ||||
bindto | ||||
bindto | ||||
endorse | ||||
resolve |
The linking strategy above suggests a natural implementation of proof linking in a JVM with multiple classloaders. In particular, a call to the defineClass method of class loader , with argument as the expected classname, will execute ``load '' and ``verify ''. A call to the loadClass method of class loader , requesting the loading of class , corresponds to the primitive ``bind to ''. The primitive ``endorse '' could be executed when the class `` '' is prepared [11, Sec. 5.4.2]. The primitive ``endorse '' could be executed right before the method is first resolved. The resolution primitives coincide with regular symbol resolution.
The translation rules, binding commitments, and tagging can be optimized readily. For example, a classloader usually has a hash table storing the classes whose loading it has initiated. Such a table can be reused to represent binding commitments. Also, tagging is just an abstract way to say that the tagged commitments/obligations are stored in the Class objects themselves. Notice that this suggests a very convenient way to retract commitments/obligations when a class is finalized.
As opposed to Sun's JVM implementation, which postpones bytecode verification until a class is linked, the implementation strategy above performs eager verification, that is, the local ``verify '' primitive is executed immediately after is defined. This is necessary to ensure that commitments are gathered as soon as possible. Sun's JVM performs one pass of verification at class definition time, postponing the second and third passes until link time.
We are in the process of implementing a proof linker in a CLDC-compliant KVM extended with multiple classloaders. The exercise has clarified a lot of our thoughts, and has confirmed the compatibility of proof linking and Java's delegation style classloading mechanism.
We advocate the adoption of the proof linking architecture as a standard framework for conducting distributed verification for the JVM. The architecture supports the notion of conditional certification essential for signature-based verification protocols, and offers interoperability among various distributed verification protocols. We have also extended our original proof linking model to account for the presence of multiple classloaders in the standard JVM, thereby showing that proof linking is applicable to complex mobile code environments such as J2SE.
This document was generated using the LaTeX2HTML translator Version 99.2beta8 (1.42)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 multi.tex
The translation was initiated by Philip Fong on 2001-02-26
Philip Fong 2001-02-26
This paper was originally published in the
Proceedings of the Javatm Virtual Machine Research and Technology Symposium,
April 23-24, 2001, Monterey, California, USA.
Last changed: 4 Jan. 2002 ml |
|