NSDI '05 Abstract
Glacier: Highly Durable, Decentralized Storage Despite Massive Correlated
Failures
Andreas Haeberlen, Alan Mislove, and Peter Druschel, Rice University
Abstract
Decentralized storage systems aggregate the available disk space of
participating computers to provide a large storage
facility. These systems rely on data redundancy to ensure durable
storage despite node failures. However, existing systems
either assume independent node failures, or they rely on introspection to carefully place redundant data on nodes with low
expected failure correlation. Unfortunately, node failures are not
independent in practice and constructing an accurate failure model is
difficult in large-scale systems. At the same time, malicious worms
that propagate through the Internet pose a real threat of large-scale
correlated failures. Such rare but potentially catastrophic failures
must be considered when attempting to provide highly durable storage.
In this paper, we describe Glacier, a distributed storage system
that relies on massive redundancy to mask the effect of large-scale
correlated failures. Glacier is designed to aggressively minimize the
cost of this redundancy in space and time: erasure coding and garbage
collection reduce the storage cost; aggregation of small objects and
a loosely coupled maintenance protocol for redundant fragments
minimize the messaging cost. In one configuration, for instance, our system can
provide six-nines durable storage despite correlated failures of up to
60% of the storage nodes, at the cost of an eleven-fold storage
overhead and an average messaging overhead of only 4 messages per node
per minute during normal operation. Glacier is used as the storage
layer for an experimental serverless email system.
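
The trade-off between redundancy and durability described above can be illustrated with a small calculation. The following is a minimal sketch, not the paper's own model: it assumes an object is erasure-coded into n fragments stored on distinct nodes, that any k fragments suffice to recover it, and that each node is lost independently with probability f during a correlated failure. The values n = 48 and k = 5 are illustrative assumptions and do not appear in the abstract; f = 0.6 corresponds to the 60% failure figure quoted above.

```python
from math import comb


def loss_probability(n: int, k: int, f: float) -> float:
    """Probability that fewer than k of n fragments survive.

    Assumes each fragment sits on a different node and each node fails
    independently with probability f (a simplifying assumption).
    """
    p_survive = 1.0 - f
    # Sum the binomial probabilities of exactly i survivors for i < k.
    return sum(comb(n, i) * p_survive**i * f ** (n - i) for i in range(k))


# Hypothetical parameters chosen for illustration only.
n, k, f = 48, 5, 0.6
loss = loss_probability(n, k, f)
overhead = n / k  # raw storage overhead of the erasure code alone

print(f"loss probability: {loss:.3e}")
print(f"durability:       {1.0 - loss:.7f}")
print(f"raw overhead:     {overhead:.1f}x")
```

Under these assumptions the loss probability falls just below one in a million, which is the six-nines regime. The raw code overhead n/k computed here covers only the erasure code, so it need not match the eleven-fold total overhead reported in the abstract.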
The Proceedings are published as a collective work, © 2005 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.