Our system continues to function correctly even when some replicas are compromised by an attacker; this is worthwhile because the growing reliance on online information services makes malicious attacks more likely and their consequences more serious. The system also survives nondeterministic software bugs and software bugs due to aging (e.g., memory leaks). Our approach improves on the usual technique of rebooting the system because it refreshes state automatically, staggers recovery so that individual replicas are highly unlikely to fail simultaneously, and has little impact on overall system performance. Section 4.7 discusses the types of faults tolerated by the system in more detail.
Because of recovery, our system can tolerate any number of faults over the lifetime of the system, provided fewer than 1/3 of the replicas become faulty within a window of vulnerability. The best that could be guaranteed previously was correct behavior if fewer than 1/3 of the replicas failed during the lifetime of a system. Our previous work [6] guaranteed this and other systems [26,16] provided weaker guarantees. Limiting the number of failures that can occur in a finite window is a synchrony assumption but such an assumption is unavoidable: since Byzantine-faulty replicas can discard the service state, we must bound the number of failures that can occur before recovery completes. But we require no synchrony assumptions to match the guarantee provided by previous systems. We compare our approach with other work in Section 7.
The window of vulnerability can be small (e.g., a few minutes) under normal conditions. Additionally, our algorithm provides detection of denial-of-service attacks aimed at increasing the window: replicas can time how long a recovery takes and alert their administrator if it exceeds some pre-established bound. Therefore, integrity can be preserved even when there is a denial-of-service attack.
The paper describes a number of new techniques needed to solve the problems that arise when providing recovery from Byzantine faults:
Proactive recovery. A Byzantine-faulty replica may appear to behave properly even when broken; therefore recovery must be proactive to prevent an attacker from compromising the service by corrupting 1/3 of the replicas without being detected. Our algorithm recovers replicas periodically independent of any failure detection mechanism. However a recovering replica may not be faulty and recovery must not cause it to become faulty, since otherwise the number of faulty replicas could exceed the bound required to provide safety. In fact, we need to allow the replica to continue participating in the request processing protocol while it is recovering, since this is sometimes required for it to complete the recovery.
Fresh messages. An attacker must be prevented from impersonating a replica that was faulty after it recovers. This can happen if the attacker learns the keys used to authenticate messages. Furthermore even if messages are signed using a secure cryptographic co-processor, an attacker might be able to authenticate bad messages while it controls a faulty replica; these messages could be replayed later to compromise safety. To solve this problem, we define a notion of authentication freshness and replicas reject messages that are not fresh. However, this leads to a further problem, since replicas may be unable to prove to a third party that some message they received is authentic (because it may no longer be fresh). All previous state-machine replication algorithms [26,16], including the one we described in [6], relied on such proofs. Our current algorithm does not, and this has the added advantage of enabling the use of symmetric cryptography for authentication of all protocol messages. This eliminates most use of public-key cryptography, the major performance bottleneck in previous systems.
Efficient state transfer. State transfer is harder in the presence of Byzantine faults and efficiency is crucial to enable frequent recovery with little impact on performance. To bring a recovering replica up to date, the state transfer mechanism checks the local copy of the state to determine which portions are both up-to-date and not corrupt. Then, it must ensure that any missing state it obtains from other replicas is correct. We have developed an efficient hierarchical state transfer mechanism based on hash chaining and incremental cryptography [1]; the mechanism tolerates Byzantine-faults and state modifications while transfers are in progress.
Our algorithm has been implemented as a generic program library with a simple interface. This library can be used to provide Byzantine-fault-tolerant versions of different services. The paper describes experiments that compare the performance of a replicated NFS implemented using the library with an unreplicated NFS. The results show that the performance of the replicated system without recovery is close to the performance of the unreplicated system. They also show that it is possible to recover replicas frequently to achieve a small window of vulnerability in the normal case (2 to 10 minutes) with little impact on service latency.
The rest of the paper is organized as follows. Section 2
presents our system model and lists our assumptions; Section 3
states the properties provided by our algorithm; and Section 4
describes the algorithm. Our implementation is described in Section 5
and some performance experiments are presented in Section 6.
Section 7 discusses related work. Our
conclusions are presented in Section 8.
Next:System
Model and Assumptions Up:Contents
Miguel Castro and Barbara Liskov, "Proactive Recovery in a Byzantine-Fault-Tolerant System", in Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, San Diego, USA, October 2000.