NSDI '08 Abstract
Pp. 147160 of the Proceedings
San Fermín: Aggregating Large Data Sets Using a Binomial Swap Forest
Justin Cappos and John H. Hartman, University of Arizona
Abstract
San Fermín is a system for aggregating large amounts of data
from the nodes of large-scale distributed systems.
Each San Fermín node
individually
computes the aggregated result by
swapping data with other nodes to
dynamically
create its own binomial tree.
Nodes that fall behind abort their trees, thereby reducing overhead.
Having each node create its own binomial tree makes San Fermín
highly resilient to failures and ensures that
the internal nodes of the tree have high capacity,
thereby reducing completion time.
Compared to existing solutions, San Fermín handles large aggregations better,
has higher completeness when nodes fail,
computes the result faster,
and has better scalability.
We analyze the completion time, completeness,
and overhead of San Fermín versus existing solutions using analytical
models, simulation, and experimentation with a prototype built on
peer-to-peer system
deployed on PlanetLab.
Our evaluation shows that San Fermín is scalable both in the number of
nodes and in the aggregated data size.
San Fermín aggregates large amounts of data
significantly faster than existing solutions:
compared to SDIMS, an existing aggregation system, San Fermín computes
a 1MB result from 100 PlanetLab nodes in 61-76% of the time and from
2-6 times as many nodes.
Even if 10% of the nodes fail during aggregation, San Fermín
still includes the data from 97% of the nodes in the result and
does so
faster than the underlying peer-to-peer system recovers from failures.
- View the full text of this paper in HTML and PDF. Listen to the presentation in
MP3 format.
The Proceedings are published as a collective work, © 2008 by the USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks within this paper.
|