The protocol for creating additional
replicas for a file is run when a user tries to access
a file not present in her local node. Say that a user on node
wants to read file
. A read or write request is always
preceded by a directory lookup (during the open request) on
. Thus, to create a replica,
must replicate the file's parent
directory. This recursive step may continue all the way up to the root
directory. The locations of root replicas are maintained by
the membership service (Section 3.2).
Pangaea performs a short-cut replica
creation to transfer data from a nearby existing replica.
To create a replica of ,
first discovers
the file's gold replicas in the directory entry
during the path-name lookup.
then requests the file contents from the
gold replica closest to
(say
).
then finds a replica closest
to
among its own graph neighbors (say
, which may be
itself) and forwards the request to
, which in turn sends the
contents to
. At this point,
replies to the user and lets her
start accessing the replica.
This request forwarding is performed because
the directory only knows
's gold replicas, and there may be
a bronze replica closer to
than the gold ones.
The new copy must be integrated into the file's replica graph to be
able to propagate updates to and receive updates from other replicas.
Thus, in the background, chooses
existing
replicas of
, adds edges to them, and requests them to
add edges to the new replica in
. The selection of
peers must
satisfy three goals:
Pangaea satisfies all these goals simultaneously, as a replica can
have multiple edges. chooses three types of peers for the new
replica. First,
adds an edge to a random gold replica, preferably
one from a different region than
, to give that gold
replica more variety of regions in its neighbor set. Second, it asks
a random gold replica, say
, to pick the replica (among
's
immediate graph neighbors) closest to
. Third,
asks
to
choose
random replicas using random walks that start from
and perform a series of RPC calls along graph edges.
This protocol ensures that the resulting graph is
edge- and
node- connected, provided that it was
-connected before.
Parameter trades off availability and performance. A
small value increases the probability of graph disconnection (i.e.,
the probability that a replica cannot exchange updates with other
replicas) after node failures. A large value for
increases the
overhead of graph maintenance and update propagation by causing
duplicate update delivery. We found that
offers a good balance in
our prototype.