Blindfold: A System to "See No Evil" in Content Discovery
Existing content aggregators provide fast and efficient access to large volumes of shared data and serve as critical centralized components of many peer-to-peer systems, including content discovery for BitTorrent. The operators of these aggregators must devote significant human resources to manually vetting uploaded data to ensure compliance with copyright laws. This task does not scale with today's increasing demand for such services. In this paper, we introduce Blindfold, a scheme that ensures the operators of content aggregators are completely blind to the content that they are storing and serving, thereby eliminating the possibility of censoring content at the servers. It works by partitioning the search and upload operations into a series of dependent key-value operations across servers under different administrative domains, with the connections between servers obfuscated using captchas. We have implemented a prototype of Blindfold to show that it is a simple, feasible, and efficient system for serving content that is opaque to the storage servers.
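The chaining idea in this abstract can be sketched in a few lines. This is a hypothetical illustration, not Blindfold's actual protocol: it models each server as a key-value store, derives a deterministic key chain from a keyword, and spreads the links of the chain across administrative domains so that no single server can relate a stored key to the keyword or the final content. The captcha-based obfuscation of the inter-server links described in the paper is omitted here.

```python
import hashlib

def chain_keys(keyword: str, hops: int) -> list:
    """Derive a deterministic chain of opaque keys, one per server."""
    keys = []
    digest = keyword.encode()
    for i in range(hops):
        digest = hashlib.sha256(digest + str(i).encode()).digest()
        keys.append(digest.hex())
    return keys

def blind_upload(servers: list, keyword: str, value: str) -> None:
    """Store the value so each server holds only one link in the chain:
    server i maps key i to key i+1; the last server maps to the value."""
    keys = chain_keys(keyword, len(servers))
    for i, server in enumerate(servers):
        server[keys[i]] = keys[i + 1] if i + 1 < len(keys) else value

def blind_search(servers: list, keyword: str) -> str:
    """Recover the value by following one lookup per server."""
    key = chain_keys(keyword, len(servers))[0]
    for server in servers:
        key = server[key]
    return key
```

Each individual server only ever sees an opaque hash mapping to another opaque string, so no server except the last can even tell which lookup chain it participates in.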
AmazingStore: Available, Low-cost Online Storage Service Using Cloudlets
"Cloud-based" Internet services rely on the availability and reliability of managed data centers. Recent events indicate that data centers tend to create centralized points of failure, and providing resilience to large-scale faults remains a significant challenge for both providers and users of cloud infrastructures. Running data centers also incurs high hardware and network costs, particularly for storage-intensive applications such as data synchronization and backup. In this paper, we show how to improve data availability while reducing costs in storage clouds, by augmenting centralized clouds with an efficient client-side storage system. We introduce AmazingStore, a low-cost cloud storage system that provides high data availability while protecting against correlated failures. We describe our initial experiences with an already deployed prototype and outline opportunities in this modified cloud model.
Peer-to-Peer Bargaining in Container-Based Datacenters
In container-based datacenters, failure-prone components are sealed in pre-packaged shipping containers, and component failures over time reduce the availability of resources. From the perspective of services, application instances can usually be migrated across the boundaries of containers as virtual machines (VMs). In such an environment, it is sometimes beneficial to migrate application instances to take better advantage of available resources. In this paper, we first point out that application placement strategies that are singularly focused on the efficiency of resource usage may not be beneficial, especially when resources are over-provisioned in container-based datacenters. We believe that application instances should be placed based on the Buffet principle, using resources in an aggressive fashion. Once failures occur, application instances can be migrated across containers by allowing containers to bargain with each other in a peer-to-peer fashion, treating application instances with different resource requirements as "commodities" in a Nash bargaining game. We show that the resource utilization ratio can be improved by establishing such local Nash bargaining games, in which two containers bargain and trade VMs with each other to increase their own utilities. With simulations, we verify the effectiveness of the proposed bargaining games.
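The local bargaining step can be illustrated with a toy model. The utility function below is a hypothetical stand-in (distance of a container's load from a target utilization level), not the paper's actual formulation; the point it demonstrates is the acceptance rule of a Nash bargaining game: a trade happens only when both containers strictly gain over the no-trade disagreement point, making the product of their utility gains positive.

```python
def utility(capacity: float, load: float, target: float = 0.8) -> float:
    """Toy utility: negative distance from a target utilization level."""
    return -abs(load - capacity * target)

def nash_trade(cap_a: float, load_a: float,
               cap_b: float, load_b: float, vm_demand: float) -> bool:
    """Accept the trade (container A hands one VM of `vm_demand` to B)
    only if both utilities strictly improve, i.e. the Nash product of
    the gains over the disagreement point is positive."""
    gain_a = utility(cap_a, load_a - vm_demand) - utility(cap_a, load_a)
    gain_b = utility(cap_b, load_b + vm_demand) - utility(cap_b, load_b)
    return gain_a > 0 and gain_b > 0
```

With capacity 100 and target utilization 0.8, an overloaded container at load 95 and an underloaded one at load 60 both gain from moving a 10-unit VM, so the trade is accepted; moving the same VM from a container already below target is rejected.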
Estimating Peer Similarity using Distance of Shared Files
Peer-to-Peer (p2p) networks are used by millions of users for sharing content. As these networks become ever more popular, it becomes increasingly difficult to find useful content in the abundance of shared files. Modern p2p networks and similar social services must adopt new methods to help users efficiently locate content, and to this end approximate meta-data search and recommendation systems are utilized. However, meta-data is often missing or wrong, and recommender systems are ill-suited to p2p networks due to inherent difficulties such as implicit ranking, noise in user-generated content, and the extreme dimensions and sparseness of the network.
This paper attempts to bridge this gap by suggesting a new metric for peer similarity, which can be used to improve content search and recommendation in large-scale p2p networks and semi-centralized services, such as p2p IPTV. Unlike commonly used vector distance functions, which are shown to be unsuited to p2p networks due to the low overlap between peers, this work leverages a file similarity graph to estimate the similarity between peers that have little or no overlap of shared files. Using 100k peers sharing over 500k songs in the Gnutella network, we show the advantages of the proposed metric over commonly used geographical locality and vector distance measures.
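The core idea can be sketched as follows. The aggregation rule below (average inverse distance to the other peer's nearest file) is an illustrative assumption, not the paper's exact metric; it shows why a file-similarity graph helps where cosine distance fails: two peers with completely disjoint libraries still score above zero when their files are close in the graph.

```python
def file_distance(graph: dict, a: str, b: str) -> int:
    """Breadth-first hop distance between two files in the similarity graph."""
    if a == b:
        return 0
    frontier, seen, dist = {a}, {a}, 0
    while frontier:
        dist += 1
        frontier = {n for f in frontier for n in graph.get(f, ())} - seen
        if b in frontier:
            return dist
        seen |= frontier
    return len(graph)  # unreachable files: cap the distance at graph size

def peer_similarity(graph: dict, peer_a: set, peer_b: set) -> float:
    """Average inverse graph distance from each of A's files to B's
    nearest file; non-zero even when the peers share no files at all."""
    if not peer_a or not peer_b:
        return 0.0
    total = sum(1.0 / (1 + min(file_distance(graph, f, g) for g in peer_b))
                for f in peer_a)
    return total / len(peer_a)
```

A vector distance on the shared-file sets {x} and {z} would be zero; the graph-based score is 1/3 because x and z are two hops apart through y.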
Don't Love Thy Nearest Neighbor
Distributed applications often require finding nodes that satisfy location constraints. For instance, nodes in overlay networks may need to find their nearest neighbor by latency. Network coordinate systems, by their construction, are ideally suited for solving this problem. In the general case however, nearest neighbor algorithms built using network coordinates are not sufficient. The node closest to a theoretically optimal point can have a cost that is arbitrarily high. In this paper, we consider generalizations of node location using network coordinates. We show that finding nodes under location constraints can naturally be expressed as a cost optimization problem, where the cost of each node is determined by an arbitrary continuous function over node coordinates. We present the design of Sherpa, an overlay network that discovers the lowest cost node for distributed applications and show its applicability to locating rendezvous points for online game participants.
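The generalization described here (node location as cost minimization over coordinates rather than nearest-neighbor search) can be sketched directly. The cost function below, worst-case distance to a set of game participants, is one illustrative instance of an arbitrary continuous cost over node coordinates; the names are hypothetical, not Sherpa's API.

```python
import math

def lowest_cost_node(nodes: dict, cost) -> str:
    """Return the node whose network coordinates minimize the cost function."""
    return min(nodes, key=lambda n: cost(nodes[n]))

def rendezvous_cost(players: list):
    """Cost of a candidate rendezvous point: the worst-case distance
    (a latency proxy in coordinate space) to any game participant."""
    def cost(coord):
        return max(math.dist(coord, p) for p in players)
    return cost
```

Note that the minimizer here need not be anyone's nearest neighbor: the chosen node is the one balancing distance to all participants, which is exactly the kind of constraint plain nearest-neighbor search cannot express.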
SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks
This paper presents SplitQuest, a controlled and exhaustive search protocol that relies on a hybrid network structure to avoid unnecessary query replication and to speed up query propagation in P2P networks. In SplitQuest, peers are organized into replication groups, in which each peer shares its contents with all members, and queries are propagated only once to a group. By avoiding query duplication, directing queries to disjoint groups, and exploiting peers' heterogeneity, SplitQuest is able to achieve high recall and low response times, while incurring very low overhead. We simulate SplitQuest using both synthetic topologies and traces of real-world topologies and show that it outperforms the best known solution in number of messages, response time, number of hops, and success rate for query resolution, while being resilient to high churn rates. We also derive upper bounds on query routing for SplitQuest.
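The message-saving property of replication groups can be shown with a minimal sketch. This is an illustrative model, not SplitQuest's protocol: each group holds the union of its members' content, so sending one query per disjoint group is exhaustive while generating exactly one message per group.

```python
def search(groups: dict, query: str) -> tuple:
    """Probe one peer per disjoint replication group. Because every member
    shares its content with the whole group, a single probe per group
    covers all peers with no duplicate query messages."""
    hits, messages = [], 0
    for group_id, peers in groups.items():
        messages += 1  # exactly one query message per group
        group_index = set().union(*(p["files"] for p in peers))
        if query in group_index:
            hits.append(group_id)
    return hits, messages
```

A flooding search over the same five peers could generate up to one message per peer (plus duplicates over redundant links); here the message count equals the number of groups regardless of group size.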
Balancing Gossip Exchanges in Networks with Firewalls
Gossip protocols are an important building block of many large-scale systems. They have inherent load-balancing properties as long as nodes are deployed over a network with a "flat" topology, that is, a topology in which any pair of nodes may engage in a gossip exchange. Unfortunately, the Internet is not flat in this sense: firewalls and NAT boxes block many pairwise interactions. In particular, nodes that are behind a firewall can initiate communication with nodes on the public Internet, but not vice versa. This can easily unbalance the number of gossip exchanges in which nodes are involved: nodes in well-connected regions of the network tend to participate in many more interactions than other nodes and may suffer from resource exhaustion.
In this paper we present and evaluate a new approach to balancing gossip exchanges in networks with firewalls. Our solution requires only local information and has no coordination overhead, allowing nodes to participate in a similar number of gossip exchanges independent of the network topology.
A Middleware for Gossip Protocols
Gossip protocols are known to be highly robust in scenarios with high churn, but if the data that is being gossiped becomes corrupted, a protocol's very robustness can make it hard to fix the problem. All participants need to be taken down, any disk-based data needs to be scrubbed, the cause of the corruption needs to be fixed, and only then can participants be restarted. If even a single participant is skipped in this process, say because it was temporarily unreachable, then it can contaminate the entire system all over again. We describe the design and implementation of a new middleware for gossip protocols that addresses this problem. Our middleware offers the ability to update code dynamically and provides a small resilient core that allows updating code that has failed catastrophically. Our initial PlanetLab-based deployment demonstrates that the middleware is efficient.
StAN: Exploiting Shared Interests without Disclosing Them in Gossip-based Publish/Subscribe
Publish/subscribe mechanisms for scalable event dissemination are a core component of many distributed systems, ranging from Enterprise Application Integration middleware to news dissemination on the Internet. Hence, much research has been done on overlay networks for efficient decentralized topic-based routing. Specifically, in gossip-based dissemination, bringing nodes with shared interests closer in the overlay makes dissemination more efficient. Unfortunately, this usually requires fully disclosing interests to nearby nodes and impacts reliability due to clustering.
In this paper we address this by starting with multiple overlays, one for each subscribed topic, which then separately self-organize to maximize the number of shared physical links, thereby reducing message traffic and maintenance overhead. This is achieved without disclosing a node's topic subscriptions to any node that is not subscribed to the same topic and without impacting the robustness of the overlay. Besides presenting the overlay management protocol, we evaluate it using simulation in order to validate our results.
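The self-organizing step can be sketched with one hypothetical heuristic (this is an illustrative reading of the abstract, not StAN's protocol): when choosing neighbors for one topic's overlay, a node prefers peers it already links to in some other overlay, which collapses several logical links onto one physical link without ever telling a peer which other topics it subscribes to.

```python
def pick_neighbors(candidates: list, existing_links: set, degree: int) -> list:
    """Choose `degree` neighbors for one topic overlay, preferring peers
    this node is already connected to in any of its other overlays.
    Only candidates within this topic are considered, so nothing about
    the node's other subscriptions is revealed to anyone."""
    preferred = [c for c in candidates if c in existing_links]
    others = [c for c in candidates if c not in existing_links]
    return (preferred + others)[:degree]
```

Each per-topic choice is made independently, so the alignment of overlays emerges from repeated local decisions rather than from any exchange of subscription lists.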
Public and Private BitTorrent Communities: A Measurement Study
BitTorrent communities, both public and private, are immensely popular on the Internet, with tens of millions of users simultaneously active at any given moment. Public and private BitTorrent communities are managed in different ways—for instance, some private communities enforce sharing ratios, have strict rules for content management, have a certain level of community oversight, and maintain a strong sense of exclusiveness. In this paper, we present the results of extensive measurements of more than half a million peers in five communities, ranging from highly popular and well-known public communities to elite private communities that can only be joined by invitation. We observe that the performance experienced by downloaders in the private communities is by far superior to the performance in the public communities, and we observe significant differences in connectability, seeder/leecher ratio, and seeding duration. Based on our results, we conjecture that when effective ratio enforcement mechanisms are in place, BitTorrent's tit-for-tat mechanism is hardly influential anymore. Our multi-community, multi-swarm measurements are significantly broader and more extensive than any earlier measurement study of BitTorrent.
Comparing BitTorrent Clients in the Wild: The Case of Download Speed
BitTorrent (BT) is currently the main P2P protocol used for sharing large files over the Internet. It is therefore important to understand the performance characteristics of existing real-world BT implementations (clients). In this work, we measure the download speed of over 10 million BT users over one month. Surprisingly, our measurements show that the two most famous BT clients, namely uTorrent and Vuze (Azureus), achieve different download speeds for the same set of torrents. In particular, we observe that uTorrent users achieve 16% faster download speeds than users of Vuze in our data set. To shed light on the cause of this difference, we reverse engineer the two clients to infer their individual design choices. Our study shows that the two clients differ in two important areas: (a) how they manage their neighborhood, and (b) how they distribute their upload capacity to their peers. We speculate that the mismatch in download speeds can be attributed to these differences. We hope that our findings will open the door for new research efforts to better understand the impact of design choices on the performance of real-world BT implementations.
Power-law Revisited: A Large Scale Measurement Study of P2P Content Popularity
The popularity of content on the Internet is often said to follow a Zipf-like distribution. Different measurement studies showed, however, significantly different distributions depending on the measurement methodology they followed. We performed a large-scale measurement of the most popular peer-to-peer (P2P) content distribution system, BitTorrent, over eleven months. We collected data on a daily to weekly basis from 500 to 800 trackers, with information about 40 to 60 million peers that participated in the distribution of over 10 million torrents. Based on these measurements, we show how fundamental characteristics of the observed distribution of content popularity change depending on the measurement methodology and the length of the observation interval. We show that while short-term or small-scale measurements can conclude that content popularity exhibits a power-law tail, the tail is likely exponentially decreasing, especially over long time intervals.
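The distinction the abstract draws can be made concrete with a rough diagnostic: a power-law tail is a straight line of log-frequency against log-rank, while an exponential tail is a straight line of log-frequency against rank itself. Comparing least-squares residuals of the two fits is a simplified sketch of this idea, not the paper's statistical methodology.

```python
import math

def fit_residual(xs: list, ys: list) -> float:
    """Least-squares line fit of ys on xs; return sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def tail_kind(freqs: list) -> str:
    """Classify a ranked frequency list: 'power-law' if the log-log fit is
    straighter, 'exponential' if the log-linear fit is straighter."""
    ranks = list(range(1, len(freqs) + 1))
    logf = [math.log(f) for f in freqs]
    loglog = fit_residual([math.log(r) for r in ranks], logf)
    loglin = fit_residual(ranks, logf)
    return "power-law" if loglog < loglin else "exponential"
```

On synthetic data the two cases separate cleanly; on real traces the abstract's point is precisely that the answer depends on how long and how broadly one measures.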
Strange Bedfellows: Community Identification in BitTorrent
While P2P systems benefit from large numbers of interconnected nodes, each of these connections provides an opportunity for eavesdropping. Using only the connection patterns gathered from 10,000 BitTorrent (BT) users during a one-month period, we determine whether randomized connection patterns give rise to communities of users. Even though connections in BT require not only shared interest in content, but also concurrent sessions, we find that strong communities naturally form—users inside a typical community are 5 to 25 times more likely to connect to each other than to users outside. These strong communities enable guilt by association, where the behavior of an entire community of users can be inferred by monitoring one of its members. Our study shows that through a single observation point, an attacker trying to identify such communities can uncover 50% of the network within a distance of two hops. Finally, we propose and evaluate a practical solution that mitigates this threat.
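A statistic of the "5 to 25 times more likely" form can be computed as the ratio of intra-community edge density to cross-community edge density. The helper below is an illustrative definition assumed for this sketch, not necessarily the exact measure used in the paper; edges are undirected pairs listed once.

```python
def community_strength(edges: set, community: set, population: set) -> float:
    """Ratio of the probability that two community members are connected
    to the probability that a member connects to an outside user."""
    inside = {e for e in edges if e[0] in community and e[1] in community}
    crossing = {e for e in edges
                if (e[0] in community) != (e[1] in community)}
    n_in = len(community)
    n_out = len(population) - n_in
    p_in = len(inside) / (n_in * (n_in - 1) / 2)
    p_cross = len(crossing) / (n_in * n_out)
    return p_in / p_cross
```

A value of 9 would mean members are nine times more likely to connect to each other than to outsiders, which is the scale on which the 5-to-25x finding above is stated.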