OSDI '04 Abstract
Pp. 151–166 of the Proceedings
FUSE: Lightweight Guaranteed Distributed Failure Notification
John Dunagan, Microsoft Research; Nicholas J. A. Harvey, Massachusetts Institute of Technology; Michael B. Jones, Microsoft Research; Dejan Kostic, Duke University; Marvin Theimer and Alec Wolman, Microsoft Research
Abstract
FUSE is a lightweight failure notification service for building
distributed systems. Distributed systems built with FUSE are
guaranteed that failure notifications never fail. Whenever a failure
notification is triggered, all live members of the FUSE group will
hear a notification within a bounded period of time, irrespective of
node or communication failures.
In contrast to previous work on failure detection, the responsibility
for deciding that a failure has occurred is shared between the FUSE
service and the distributed application.
This allows applications to implement their own definitions of failure.
Our experience building a scalable
distributed event delivery system on an overlay network has
convinced us of the usefulness of this service. Our results
demonstrate that the network costs of each FUSE group can be small; in
particular, our overlay network implementation requires no additional
liveness-verifying ping traffic beyond that already needed to maintain
the overlay, making the steady state network load independent of the
number of active FUSE groups.
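
As an illustration of the group semantics described above, the following is a minimal in-process sketch, not the FUSE API or implementation. The class `FuseGroup` and its methods `register` and `signal_failure` are hypothetical names chosen for this example. The sketch assumes a single process, models liveness as a set passed in by the caller, and does not model network failures or the bounded delivery time. It shows only two ideas from the abstract: the application itself may decide that a failure has occurred, and one trigger notifies every live member of the group.

```python
from typing import Callable, Dict, List, Set


class FuseGroup:
    """Toy model of one failure-notification group (hypothetical API)."""

    def __init__(self, group_id: str, members: List[str]) -> None:
        self.group_id = group_id
        self.members: Set[str] = set(members)
        self.handlers: Dict[str, Callable[[str], None]] = {}
        self.failed = False

    def register(self, member: str, handler: Callable[[str], None]) -> None:
        """Attach the callback a member wants to run on group failure."""
        if member not in self.members:
            raise ValueError(f"{member} is not in group {self.group_id}")
        self.handlers[member] = handler

    def signal_failure(self, live: Set[str]) -> List[str]:
        """Trigger the notification; returns the members that were notified.

        A group fails at most once, so repeated triggers are no-ops.
        """
        if self.failed:
            return []
        self.failed = True
        notified = []
        for member in sorted(self.members & live):
            handler = self.handlers.get(member)
            if handler is not None:
                handler(self.group_id)
            notified.append(member)
        return notified


# Usage: node "c" has crashed; the application on "a" applies its own
# definition of failure (for example, a request that took too long)
# and explicitly triggers the notification.
received = []
group = FuseGroup("g1", ["a", "b", "c"])
for name in ["a", "b", "c"]:
    group.register(name, lambda gid, name=name: received.append((name, gid)))

notified = group.signal_failure(live={"a", "b"})
```

In this sketch the decision to call `signal_failure` belongs to the caller, which stands in for the shared responsibility between the service and the application; a real service would also trigger the notification itself when its own liveness checks fail.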
- View the full text of this paper in HTML and PDF.
Until December 2005, you will need your USENIX membership identification in order to access the full papers. The Proceedings are published as a collective work, © 2004 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.