Blindfold: A System to "See No Evil" in Content Discovery
Existing content aggregators provide fast and efficient access to large volumes of shared data and serve as critical centralized components of many peer-to-peer systems, including content discovery for BitTorrent. The operators of these aggregators must devote significant human resources to manually vetting uploaded data to ensure compliance with copyright laws. This task does not scale with today's increasing demand for such services. In this paper, we introduce Blindfold, a scheme that ensures the operators of content aggregators are completely blind to the content that they are storing and serving, thereby eliminating the possibility of censoring content at the servers. It works by partitioning the search and upload operations into a series of dependent key-value operations across servers under different administrative domains, with the connections between servers obfuscated using captchas. We have implemented a prototype of Blindfold to show that it is a simple, feasible, and efficient system for serving content that is opaque to the storage servers.
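The chaining idea in this abstract can be sketched in a few lines. This is a hypothetical illustration, not Blindfold's actual protocol: it models each server as a key-value store, derives a deterministic key chain from a keyword, and spreads the links of the chain across administrative domains so that no single server can relate a stored key to the keyword or the final content. The captcha-based obfuscation of the inter-server links described in the paper is omitted here.

```python
import hashlib

def chain_keys(keyword: str, hops: int) -> list:
    """Derive a deterministic chain of opaque keys, one per server."""
    keys = []
    digest = keyword.encode()
    for i in range(hops):
        digest = hashlib.sha256(digest + str(i).encode()).digest()
        keys.append(digest.hex())
    return keys

def blind_upload(servers: list, keyword: str, value: str) -> None:
    """Store the value so each server holds only one link in the chain:
    server i maps key i to key i+1; the last server maps to the value."""
    keys = chain_keys(keyword, len(servers))
    for i, server in enumerate(servers):
        server[keys[i]] = keys[i + 1] if i + 1 < len(keys) else value

def blind_search(servers: list, keyword: str) -> str:
    """Recover the value by following one lookup per server."""
    key = chain_keys(keyword, len(servers))[0]
    for server in servers:
        key = server[key]
    return key
```

Each individual server only ever sees an opaque hash mapping to another opaque string, so no server except the last can even tell which lookup chain it participates in.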
AmazingStore: Available, Low-cost Online Storage Service Using Cloudlets
"Cloud-based" Internet services rely on the availability and reliability of managed data centers. Recent events indicate that data centers tend to create centralized points of failure, and providing resilience to large-scale faults remains a significant challenge for both providers and users of cloud infrastructures. Running data centers also incurs high hardware and network costs, particularly for storage-intensive applications such as data synchronization and backup. In this paper, we show how to improve data availability while reducing costs in storage clouds, by augmenting centralized clouds with an efficient client-side storage system. We introduce AmazingStore, a low-cost cloud storage system that provides high data availability while protecting against correlated failures. We describe our initial experiences with an already deployed prototype and outline opportunities in this modified cloud model.
Peer-to-Peer Bargaining in Container-Based Datacenters
In container-based datacenters, failure-prone components are sealed in pre-packaged shipping containers, and component failures over time reduce the availability of resources. From the perspective of services, application instances can usually be migrated across the boundaries of containers as virtual machines (VMs). In such an environment, it is sometimes beneficial to migrate application instances to take better advantage of available resources. In this paper, we first point out that application placement strategies that are singularly focused on the efficiency of resource usage may not be beneficial, especially when resources are over-provisioned in container-based datacenters. We believe that application instances should be placed based on the Buffet principle, using resources in an aggressive fashion. Once failures occur, application instances can be migrated across containers by allowing containers to bargain with each other in a peer-to-peer fashion, treating application instances with different resource requirements as "commodities" in a Nash bargaining game. We show that the resource utilization ratio can be improved by establishing such local Nash bargaining games, in which two containers bargain and trade VMs with each other to increase their own utilities. With simulations, we verify the effectiveness of the proposed bargaining games.
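The local bargaining step can be illustrated with a toy model. The utility function below is a hypothetical stand-in (distance of a container's load from a target utilization level), not the paper's actual formulation; the point it demonstrates is the acceptance rule of a Nash bargaining game: a trade happens only when both containers strictly gain over the no-trade disagreement point, making the product of their utility gains positive.

```python
def utility(capacity: float, load: float, target: float = 0.8) -> float:
    """Toy utility: negative distance from a target utilization level."""
    return -abs(load - capacity * target)

def nash_trade(cap_a: float, load_a: float,
               cap_b: float, load_b: float, vm_demand: float) -> bool:
    """Accept the trade (container A hands one VM of `vm_demand` to B)
    only if both utilities strictly improve, i.e. the Nash product of
    the gains over the disagreement point is positive."""
    gain_a = utility(cap_a, load_a - vm_demand) - utility(cap_a, load_a)
    gain_b = utility(cap_b, load_b + vm_demand) - utility(cap_b, load_b)
    return gain_a > 0 and gain_b > 0
```

With capacity 100 and target utilization 0.8, an overloaded container at load 95 and an underloaded one at load 60 both gain from moving a 10-unit VM, so the trade is accepted; moving the same VM from a container already below target is rejected.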
Estimating Peer Similarity using Distance of Shared Files
Peer-to-Peer (p2p) networks are used by millions of users for sharing content. As these networks become ever more popular, it becomes increasingly difficult to find useful content in the abundance of shared files. Modern p2p networks and similar social services must adopt new methods to help users efficiently locate content, and to this end approximate meta-data search and recommendation systems are utilized. However, meta-data is often missing or wrong, and recommender systems are ill-suited to p2p networks due to inherent difficulties such as implicit ranking, noise in user-generated content, and the extreme dimensions and sparseness of the network.
This paper attempts to bridge this gap by suggesting a new metric for peer similarity, which can be used to improve content search and recommendation in large-scale p2p networks and semi-centralized services, such as p2p IPTV. Unlike commonly used vector distance functions, which are shown to be unsuited to p2p networks due to the low overlap between peers, this work leverages a file similarity graph to estimate the similarity between peers that have little or no overlap of shared files. Using 100k peers sharing over 500k songs in the Gnutella network, we show the advantages of the proposed metric over commonly used geographical locality and vector distance measures.
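The core idea can be sketched as follows. The aggregation rule below (average inverse distance to the other peer's nearest file) is an illustrative assumption, not the paper's exact metric; it shows why a file-similarity graph helps where cosine distance fails: two peers with completely disjoint libraries still score above zero when their files are close in the graph.

```python
def file_distance(graph: dict, a: str, b: str) -> int:
    """Breadth-first hop distance between two files in the similarity graph."""
    if a == b:
        return 0
    frontier, seen, dist = {a}, {a}, 0
    while frontier:
        dist += 1
        frontier = {n for f in frontier for n in graph.get(f, ())} - seen
        if b in frontier:
            return dist
        seen |= frontier
    return len(graph)  # unreachable files: cap the distance at graph size

def peer_similarity(graph: dict, peer_a: set, peer_b: set) -> float:
    """Average inverse graph distance from each of A's files to B's
    nearest file; non-zero even when the peers share no files at all."""
    if not peer_a or not peer_b:
        return 0.0
    total = sum(1.0 / (1 + min(file_distance(graph, f, g) for g in peer_b))
                for f in peer_a)
    return total / len(peer_a)
```

A vector distance on the shared-file sets {x} and {z} would be zero; the graph-based score is 1/3 because x and z are two hops apart through y.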
Don't Love Thy Nearest Neighbor
Distributed applications often require finding nodes that satisfy location constraints. For instance, nodes in overlay networks may need to find their nearest neighbor by latency. Network coordinate systems, by their construction, are ideally suited for solving this problem. In the general case however, nearest neighbor algorithms built using network coordinates are not sufficient. The node closest to a theoretically optimal point can have a cost that is arbitrarily high. In this paper, we consider generalizations of node location using network coordinates. We show that finding nodes under location constraints can naturally be expressed as a cost optimization problem, where the cost of each node is determined by an arbitrary continuous function over node coordinates. We present the design of Sherpa, an overlay network that discovers the lowest cost node for distributed applications and show its applicability to locating rendezvous points for online game participants.
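The generalization described here (node location as cost minimization over coordinates rather than nearest-neighbor search) can be sketched directly. The cost function below, worst-case distance to a set of game participants, is one illustrative instance of an arbitrary continuous cost over node coordinates; the names are hypothetical, not Sherpa's API.

```python
import math

def lowest_cost_node(nodes: dict, cost) -> str:
    """Return the node whose network coordinates minimize the cost function."""
    return min(nodes, key=lambda n: cost(nodes[n]))

def rendezvous_cost(players: list):
    """Cost of a candidate rendezvous point: the worst-case distance
    (a latency proxy in coordinate space) to any game participant."""
    def cost(coord):
        return max(math.dist(coord, p) for p in players)
    return cost
```

Note that the minimizer here need not be anyone's nearest neighbor: the chosen node is the one balancing distance to all participants, which is exactly the kind of constraint plain nearest-neighbor search cannot express.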
SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks
This paper presents SplitQuest, a controlled and exhaustive search protocol that relies on a hybrid network structure to avoid unnecessary query replication and to speed up query propagation in P2P networks. In SplitQuest, peers are organized into replication groups, in which each peer shares its contents with all members, and queries are propagated only once to a group. By avoiding query duplication, directing queries to disjoint groups, and exploiting peers' heterogeneity, SplitQuest is able to achieve high recall and low response times, while incurring very low overhead. We simulate SplitQuest using both synthetic topologies and traces of real-world topologies and show that it outperforms the best known solution in number of messages, response time, number of hops, and success rate for query resolution, while being resilient to high churn rates. We also derive upper bounds on query routing for SplitQuest.
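The message-saving property of replication groups can be shown with a minimal sketch. This is an illustrative model, not SplitQuest's protocol: each group holds the union of its members' content, so sending one query per disjoint group is exhaustive while generating exactly one message per group.

```python
def search(groups: dict, query: str) -> tuple:
    """Probe one peer per disjoint replication group. Because every member
    shares its content with the whole group, a single probe per group
    covers all peers with no duplicate query messages."""
    hits, messages = [], 0
    for group_id, peers in groups.items():
        messages += 1  # exactly one query message per group
        group_index = set().union(*(p["files"] for p in peers))
        if query in group_index:
            hits.append(group_id)
    return hits, messages
```

A flooding search over the same five peers could generate up to one message per peer (plus duplicates over redundant links); here the message count equals the number of groups regardless of group size.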
Balancing Gossip Exchanges in Networks with Firewalls
Gossip protocols are an important building block of many large-scale systems. They have inherent load-balancing properties as long as nodes are deployed over a network with a "flat" topology, that is, a topology in which any pair of nodes may engage in a gossip exchange. Unfortunately, the Internet is not flat in this sense: firewalls and NAT boxes block many pairwise interactions. In particular, nodes that are behind a firewall can initiate communication with nodes on the public Internet, but not vice versa. This can easily unbalance the number of gossip exchanges in which nodes are involved: nodes in well-connected regions of the network tend to participate in many more interactions than other nodes and may suffer from resource exhaustion.
In this paper we present and evaluate a new approach to balancing gossip exchanges in networks with firewalls. Our solution requires only local information and has no coordination overhead, allowing nodes to participate in a similar number of gossip exchanges independent of the network topology.
A Middleware for Gossip Protocols
Gossip protocols are known to be highly robust in scenarios with high churn, but if the data that is being gossiped becomes corrupted, a protocol's very robustness can make it hard to fix the problem. All participants need to be taken down, any disk-based data needs to be scrubbed, the cause of the corruption needs to be fixed, and only then can participants be restarted. If even a single participant is skipped in this process, say because it was temporarily unreachable, then it can contaminate the entire system all over again. We describe the design and implementation of a new middleware for gossip protocols that addresses this problem. Our middleware offers the ability to update code dynamically and provides a small resilient core that allows updating code that has failed catastrophically. Our initial PlanetLab-based deployment demonstrates that the middleware is efficient.
StAN: Exploiting Shared Interests without Disclosing Them in Gossip-based Publish/Subscribe
Publish/subscribe mechanisms for scalable event dissemination are a core component of many distributed systems, ranging from Enterprise Application Integration middleware to news dissemination on the Internet. Hence, much research has been done on overlay networks for efficient decentralized topic-based routing. Specifically, in gossip-based dissemination, bringing nodes with shared interests closer in the overlay makes dissemination more efficient. Unfortunately, this usually requires fully disclosing interests to nearby nodes and impacts reliability due to clustering.
In this paper we address this by starting with multiple overlays, one for each subscribed topic, which then separately self-organize to maximize the number of shared physical links, thereby reducing message traffic and maintenance overhead. This is achieved without disclosing a node's topic subscriptions to any node that is not subscribed to the same topic and without impacting the robustness of the overlay. Besides presenting the overlay management protocol, we evaluate it using simulation in order to validate our results.
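The self-organizing step can be sketched with one hypothetical heuristic (this is an illustrative reading of the abstract, not StAN's protocol): when choosing neighbors for one topic's overlay, a node prefers peers it already links to in some other overlay, which collapses several logical links onto one physical link without ever telling a peer which other topics it subscribes to.

```python
def pick_neighbors(candidates: list, existing_links: set, degree: int) -> list:
    """Choose `degree` neighbors for one topic overlay, preferring peers
    this node is already connected to in any of its other overlays.
    Only candidates within this topic are considered, so nothing about
    the node's other subscriptions is revealed to anyone."""
    preferred = [c for c in candidates if c in existing_links]
    others = [c for c in candidates if c not in existing_links]
    return (preferred + others)[:degree]
```

Each per-topic choice is made independently, so the alignment of overlays emerges from repeated local decisions rather than from any exchange of subscription lists.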
Public and Private BitTorrent Communities: A Measurement Study
BitTorrent communities, both public and private, are immensely popular on the Internet, with tens of millions of users simultaneously active at any given moment. Public and private BitTorrent communities are managed in different ways—for instance, some private communities enforce sharing ratios, have strict rules for content management, have a certain level of community oversight, and maintain a strong sense of exclusiveness. In this paper, we present the results of extensive measurements of more than half a million peers in five communities, ranging from highly popular and well-known public communities to elite private communities that can only be joined by invitation. We observe that the performance experienced by downloaders in the private communities is by far superior to the performance in the public communities, and we observe significant differences in connectability, seeder/leecher ratio, and seeding duration. Based on our results, we conjecture that when effective ratio enforcement mechanisms are in place, BitTorrent's tit-for-tat mechanism is hardly influential anymore. Our multi-community, multi-swarm measurements are significantly broader and more extensive than any earlier measurement study of BitTorrent.
Comparing BitTorrent Clients in the Wild: The Case of Download Speed
BitTorrent (BT) is currently the main P2P protocol used for sharing large files over the Internet. It is therefore important to understand the performance characteristics of existing real-world BT implementations (clients). In this work, we measure the download speed of over 10 million BT users over one month. Surprisingly, our measurements show that the two most famous BT clients, namely uTorrent and Vuze (Azureus), achieve different download speeds for the same set of torrents. In particular, we observe that uTorrent users achieve 16% faster download speeds than users of Vuze in our data set. To shed light on the cause of this difference, we reverse engineer the two clients to infer their individual design choices. Our study shows that the two clients differ in two important areas: (a) how they manage their neighborhood, and (b) how they distribute their upload capacity to their peers. We speculate that the mismatch in download speeds can be attributed to these differences. We hope that our findings will open the door for new research efforts to better understand the impact of design choices on the performance of real-world BT implementations.
Power-law Revisited: A Large Scale Measurement Study of P2P Content Popularity
The popularity of content on the Internet is often said to follow a Zipf-like distribution. Different measurement studies showed, however, significantly different distributions depending on the measurement methodology they followed. We performed a large-scale measurement of the most popular peer-to-peer (P2P) content distribution system, BitTorrent, over eleven months. We collected data on a daily to weekly basis from 500 to 800 trackers, with information about 40 to 60 million peers that participated in the distribution of over 10 million torrents. Based on these measurements, we show how fundamental characteristics of the observed distribution of content popularity change depending on the measurement methodology and the length of the observation interval. We show that while short-term or small-scale measurements can conclude that content popularity exhibits a power-law tail, the tail is likely exponentially decreasing, especially over long time intervals.
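The distinction the abstract draws can be made concrete with a rough diagnostic: a power-law tail is a straight line of log-frequency against log-rank, while an exponential tail is a straight line of log-frequency against rank itself. Comparing least-squares residuals of the two fits is a simplified sketch of this idea, not the paper's statistical methodology.

```python
import math

def fit_residual(xs: list, ys: list) -> float:
    """Least-squares line fit of ys on xs; return sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def tail_kind(freqs: list) -> str:
    """Classify a ranked frequency list: 'power-law' if the log-log fit is
    straighter, 'exponential' if the log-linear fit is straighter."""
    ranks = list(range(1, len(freqs) + 1))
    logf = [math.log(f) for f in freqs]
    loglog = fit_residual([math.log(r) for r in ranks], logf)
    loglin = fit_residual(ranks, logf)
    return "power-law" if loglog < loglin else "exponential"
```

On synthetic data the two cases separate cleanly; on real traces the abstract's point is precisely that the answer depends on how long and how broadly one measures.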
Strange Bedfellows: Community Identification in BitTorrent
While P2P systems benefit from large numbers of interconnected nodes, each of these connections provides an opportunity for eavesdropping. Using only the connection patterns gathered from 10,000 BitTorrent (BT) users during a one-month period, we determine whether randomized connection patterns give rise to communities of users. Even though connections in BT require not only shared interest in content, but also concurrent sessions, we find that strong communities naturally form—users inside a typical community are 5 to 25 times more likely to connect to each other than to users outside. These strong communities enable guilt by association, where the behavior of an entire community of users can be inferred by monitoring one of its members. Our study shows that through a single observation point, an attacker trying to identify such communities can uncover 50% of the network within a distance of two hops. Finally, we propose and evaluate a practical solution that mitigates this threat.
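A statistic of the "5 to 25 times more likely" form can be computed as the ratio of intra-community edge density to cross-community edge density. The helper below is an illustrative definition assumed for this sketch, not necessarily the exact measure used in the paper; edges are undirected pairs listed once.

```python
def community_strength(edges: set, community: set, population: set) -> float:
    """Ratio of the probability that two community members are connected
    to the probability that a member connects to an outside user."""
    inside = {e for e in edges if e[0] in community and e[1] in community}
    crossing = {e for e in edges
                if (e[0] in community) != (e[1] in community)}
    n_in = len(community)
    n_out = len(population) - n_in
    p_in = len(inside) / (n_in * (n_in - 1) / 2)
    p_cross = len(crossing) / (n_in * n_out)
    return p_in / p_cross
```

A value of 9 would mean members are nine times more likely to connect to each other than to outsiders, which is the scale on which the 5-to-25x finding above is stated.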