NSDI '04 Abstract
Pp. 323-336 of the Proceedings
Consistent and Automatic Replica Regeneration
Haifeng Yu, Intel Research Pittsburgh and Carnegie Mellon University; Amin Vahdat, University of California, San Diego
Abstract
Reducing management costs and improving the availability of
large-scale distributed systems require automatic replica regeneration, i.e., creating new replicas in response to replica
failures. A major challenge to regeneration is maintaining consistency
when the replica group changes. Doing so is particularly difficult
across the wide area where failure detection is complicated by network
congestion and node overload.
In this context, this paper presents Om, the first read/write
peer-to-peer wide-area storage system that achieves high availability
and manageability through online automatic regeneration while still preserving
consistency guarantees.
We achieve these properties through the following techniques. First,
by utilizing the limited view divergence property in today's
Internet and by adopting the witness
model, Om is able to regenerate from any single replica rather than
requiring a majority quorum, at the cost of a small (10^-6 in our
experiments) probability of violating consistency. As a result, Om can
deliver high availability with a small number of replicas, whereas
traditional designs would need significantly more replicas to reach the
same availability. Next, we distinguish failure-free reconfigurations
from failure-induced ones, enabling common reconfigurations to
proceed with a single round of communication. Finally, we use a lease graph
among the replicas and a two-phase write protocol to optimize for
reads: any single replica in Om can process a read.
Experiments on PlanetLab show that consistent regeneration in Om
completes in approximately 20 seconds.
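To illustrate why regenerating from any single replica, rather than from a majority quorum, needs fewer replicas for the same availability, the following back-of-the-envelope sketch compares the two survival conditions. It assumes replicas fail independently with a per-replica failure probability p and ignores the witness model's small consistency-violation probability. It is a simplified model, not the paper's analysis, and all function names are hypothetical.

```python
from math import comb


def availability_any_single(n: int, p: float) -> float:
    """Probability that at least one of n replicas survives, so regeneration
    from any single replica remains possible (independent failures assumed)."""
    return 1.0 - p ** n


def availability_majority(n: int, p: float) -> float:
    """Probability that a strict majority (more than n/2) of n replicas
    survives, as a majority-quorum design would require."""
    q = 1.0 - p                # per-replica survival probability
    need = n // 2 + 1          # smallest strict majority
    return sum(comb(n, k) * q ** k * p ** (n - k) for k in range(need, n + 1))


def min_replicas(target: float, p: float, avail, max_n: int = 1000) -> int:
    """Smallest replica count n whose availability under `avail` reaches
    `target`; raises if no n up to max_n suffices."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    for n in range(1, max_n + 1):
        if avail(n, p) >= target:
            return n
    raise ValueError("target not reachable within max_n replicas")


if __name__ == "__main__":
    p, target = 0.1, 0.99995
    print("any single replica:", min_replicas(target, p, availability_any_single))
    print("majority quorum:   ", min_replicas(target, p, availability_majority))
```

With p = 0.1 and a target availability of 0.99995, the single-replica condition is met with 5 replicas, while the majority condition needs noticeably more. Wide-area failures are often correlated, so these numbers only show the direction of the effect, not its exact size.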
The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.