Next: Bibliography Up: FastReplica: Efficient Large File Previous: 4 Performance Evaluation

5 Conclusion

In recent years, the Web and Internet services have moved from an architecture where data objects are located at a single origin server or site to an architecture where objects are replicated across multiple, geographically distributed servers. Client requests for content are redirected to the best-suited replica rather than the origin server. For large files, the replication process across this distributed network of servers is a challenging and resource-intensive problem on its own.

In this work, we introduce FastReplica for efficient and reliable replication of large files in the Internet environment. FastReplica partitions an original file into a set of subfiles and uses a diversity of Internet paths among the receiving nodes to propagate the subfiles within the replication set in order to speed up the overall download time for the original content. We scale the algorithm by clustering the nodes into a set of replication groups and by arranging efficient group communication among them, i.e., by building an overlay tree on top of those groups. Through experiments on a prototype implementation, we demonstrate the efficiency of FastReplica in the small in a wide-area testbed. Since FastReplica in the small defines the iteration step in the general algorithm, these performance results set the basis for performance expectations of FastReplica in the large.

Interesting and important issues for future research are "how to better cluster the nodes in replication groups?" and "how to build an efficient overlay tree on top of those groups?" Recent research [19,14,22] shows that large-scale Internet applications could benefit from incorporating IP-level topological information in the construction of the overlay to significantly improve overlay performance. In [19], a new distributed technique is introduced where the nodes partition themselves into bins in such a way that nodes within a given bin are relatively close to one another in terms of network latency.
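
As a rough illustration of how latency-based binning might be used to form replication groups, the sketch below assumes that each node measures round-trip times to a small, fixed set of landmark hosts and uses the ordering of those landmarks by increasing latency as its bin. The landmark-ordering rule, the function names, and the RTT values are illustrative assumptions; they are not details given in this paper and are not part of the FastReplica prototype.

```python
from collections import defaultdict


def landmark_bin(rtts_ms):
    """Return a bin key: landmark indices ordered by increasing RTT.

    rtts_ms[i] is the (hypothetical) measured round-trip time, in
    milliseconds, from this node to landmark i.
    """
    return tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))


def group_by_bin(measurements):
    """Group node names whose landmark orderings coincide.

    Nodes that see the landmarks in the same latency order are assumed
    to be "close" and are candidates for one replication group.
    """
    bins = defaultdict(list)
    for node, rtts in measurements.items():
        bins[landmark_bin(rtts)].append(node)
    return dict(bins)


# Hypothetical RTT measurements (ms) to three landmarks.
measurements = {
    "node-a": [12.0, 80.0, 150.0],
    "node-b": [15.0, 95.0, 140.0],
    "node-c": [130.0, 70.0, 20.0],
    "node-d": [110.0, 60.0, 25.0],
}

groups = group_by_bin(measurements)
for key, members in groups.items():
    print(key, members)
```

In this toy input, node-a and node-b see the landmarks in the same latency order and fall into one bin, while node-c and node-d fall into another. A real deployment would also need to bound group size and handle bins with very few members, which this sketch leaves out.
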
Such binning might be an interesting technique for clustering "close" nodes into replication groups in FastReplica.

To analyze and validate future optimizations for FastReplica, a large-scale Internet environment or testbed is needed. In recent work [23], the authors propose ModelNet as a comprehensive Internet emulation environment for evaluating Internet-scale distributed systems. A new initiative within the research community around PlanetLab [18] aims to build a global testbed for developing and accessing new network services. The introduction of such environments and large-scale testbeds will help to support interesting scalability experiments in the near future.

Acknowledgements: We would like to thank the HPLabs summer interns who helped us build an experimental wide-area testbed from their university machines: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu, Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan Wickramasuriya. Their help is highly appreciated. We would also like to thank John Apostolopoulos for motivating discussions on multiple descriptions for streaming media and John Sontag for his active support of this work. We thank the anonymous referees for useful remarks and insightful questions, and our shepherd Srinivasan Seshan for constructive suggestions to improve the content and presentation of the paper.
