Next: Bibliography
Up: FastReplica: Efficient Large File
Previous: 4 Performance Evaluation
In recent years, the Web and Internet services have
moved from an architecture where data objects are located at a single
origin server or site to an architecture where objects are
replicated across multiple, geographically distributed servers. Client
requests for content are redirected to the best-suited replica rather
than the origin server. For large files, the replication process
across this distributed network of servers is a challenging and
resource-intensive problem on its own.
In this work, we introduce FastReplica for efficient and
reliable replication of large files in the Internet environment.
FastReplica partitions an original file into a set of subfiles and
uses a diversity of Internet paths among the receiving nodes to
propagate the subfiles within the replication set in order to reduce
the overall download time for the original content. We scale the
algorithm by clustering the nodes into a set of replication groups and
by arranging efficient group communication among them, i.e., by
building an overlay tree on top of those groups.
Through experiments on a prototype implementation, we demonstrate the
efficiency of FastReplica in the small in a wide-area
testbed. Since FastReplica in the small defines the iteration
step in the general algorithm, these performance results set the basis
for performance expectations of FastReplica in the large.
Two interesting and important questions for future research are "how to
better cluster the nodes into replication groups?" and "how to build
an efficient overlay tree on top of those groups?" Recent
research [19,14,22] shows that large-scale Internet
applications could benefit from incorporating IP-level topological
information in the construction of the overlay to significantly
improve overlay performance. In [19], a new distributed
technique is introduced where the nodes partition themselves into bins
in such a way that nodes within a given bin are relatively close to
one another in terms of network latency. It might be an interesting
technique for clustering "close" nodes into replication groups in
FastReplica.
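As a rough illustration of latency-based binning, the sketch below groups nodes by the order in which they rank a few reference hosts by round-trip time. The text above does not spell out the procedure of [19], so the use of reference ("landmark") hosts, the ordering rule, and the names landmark_bin and cluster_nodes are all assumptions made for this example.

```python
from collections import defaultdict


def landmark_bin(rtts_ms: list[float]) -> tuple[int, ...]:
    """Return a bin key: landmark indices sorted by increasing RTT (assumed rule)."""
    return tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))


def cluster_nodes(measurements: dict[str, list[float]]) -> dict[tuple[int, ...], list[str]]:
    """Group node names whose landmark orderings coincide."""
    groups = defaultdict(list)
    for node, rtts in measurements.items():
        groups[landmark_bin(rtts)].append(node)
    return dict(groups)


# Hypothetical RTT measurements (ms) from three nodes to three landmarks.
example = {
    "node-a": [10.0, 50.0, 90.0],
    "node-b": [12.0, 48.0, 95.0],
    "node-c": [80.0, 40.0, 15.0],
}
groups = cluster_nodes(example)
```

Here node-a and node-b see the landmarks in the same latency order and land in the same bin, which could then serve as a candidate replication group, while node-c falls into a different one.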
To analyze and validate future optimizations of
FastReplica, a large-scale Internet environment or testbed is needed.
In recent work [23], the authors propose ModelNet as a
comprehensive Internet emulation environment to evaluate
Internet-scale distributed systems. A new initiative within the
research community around PlanetLab [18] is aiming to
build a global testbed for developing and accessing new network
services. The introduction of such environments and large-scale
testbeds will help to support interesting scalability experiments in the
near future.
Acknowledgements: We would like to thank the HP Labs summer interns
who helped us to build an experimental wide-area testbed from their
university machines: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu,
Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan
Wickramasuriya. Their help is highly appreciated.
We would also
like to thank John Apostolopoulos for motivating discussions on
multiple descriptions for streaming media and John Sontag for his
active support of this work.
We would like to thank the anonymous
referees for useful remarks and insightful questions, and our shepherd
Srinivasan Seshan for constructive suggestions to improve the content
and presentation of the paper.