The workload characteristics of wide-area collaboration systems are not well
known. We thus created a synthetic benchmark modeled after a
bulletin-board system. In this benchmark, articles (files) are
continuously posted or updated from nodes chosen uniformly at random;
other randomly chosen nodes (i.e., users) fetch new articles not yet
read. A file system's performance is measured by two metrics: the
mean latency of reading a file that the server has never accessed before,
and the wide-area network bandwidth consumed when files are
updated. These two numbers depend only on the file size,
the number of existing replicas (since Pangaea can perform
short-cut creation), and the order in which these replicas are created
(since that order determines the shape of the graph). We choose an
article size of 50KB, a size typical in Usenet [29]. We
try to average out the last factor, the replica-creation order, by creating
and reading about 1000 random files for each sample point and computing the mean.
We run both article posters and readers at a constant rate
(5 articles posted or read per second), since our performance
metrics are independent of the request inter-arrival time.
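To make the measurement procedure concrete, the sketch below shows how such a benchmark driver might look. It is a minimal illustration under assumed interfaces, not our actual harness; the `client` object, its `post()`, `read()`, and `wan_bytes_sent()` calls, and the constants are hypothetical names chosen to match the parameters described above.

```python
import random
import time

# Illustrative sketch of the bulletin-board benchmark driver.
# The client interface below is hypothetical, not Pangaea's API.
ARTICLE_SIZE = 50 * 1024   # 50KB, typical of Usenet articles
REQUEST_RATE = 5           # articles posted or read per second
NUM_SAMPLES  = 1000        # random files per sample point

def run_benchmark(servers, client):
    latencies, wan_bytes = [], []
    for i in range(NUM_SAMPLES):
        poster = random.choice(servers)
        path = f"/bboard/article-{i}"

        before = client.wan_bytes_sent()            # hypothetical WAN-traffic counter
        client.post(poster, path, b"x" * ARTICLE_SIZE)
        wan_bytes.append(client.wan_bytes_sent() - before)

        reader = random.choice(servers)             # this node has never read the article
        start = time.monotonic()
        client.read(reader, path)                   # first access from this server
        latencies.append(time.monotonic() - start)

        time.sleep(1.0 / REQUEST_RATE)              # constant request rate

    # Mean first-read latency and mean WAN bytes consumed per update.
    return (sum(latencies) / len(latencies),
            sum(wan_bytes) / len(wan_bytes))
```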
In this benchmark, we run multiple servers on a single physical node to build a configuration of realistic size. To avoid overloading the CPU or the disk, we run six virtual servers on a type-B machine (Table 1) and three virtual servers on each of the other machines, for a total of 36 servers on 9 physical nodes. Figure 10 shows the simulated geographical distribution of nodes, modeled after HP's corporate network. For the same logistical reasons, we compare three versions of Pangaea rather than comparing against Coda:
We expect Pangaea's access latency to be reduced as more replicas are added, since that increases the chance of file contents being transferred to a new replica from a nearby existing replica. Figure 11 confirms this prediction. In contrast, the hub configuration shows no speedup no matter how many replicas of a file exist, because it always fetches data from the central replica.
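The difference between the two policies comes down to which replica a new node fetches the data from. The sketch below is illustrative only; the `rtt()` function and the replica lists are assumptions for exposition, not Pangaea's internal API.

```python
# Sketch of the two data-source policies compared above (assumed interfaces).

def fetch_source_pangaea(new_node, existing_replicas, rtt):
    # Pangaea-style short-cut creation: copy the file contents from the
    # closest existing replica, so latency drops as replicas accumulate.
    return min(existing_replicas, key=lambda r: rtt(new_node, r))

def fetch_source_hub(new_node, existing_replicas, central_replica):
    # Hub (star) configuration: always fetch from the central replica,
    # so extra replicas give no speedup.
    return central_replica
```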
Figure 12 shows the network bandwidth consumption during file
updates. Although all the systems consume the same total amount of
traffic per update (i.e., the file size × (number of replicas − 1)),
Pangaea uses far less wide-area network traffic, since it transfers data
preferentially along fast links using dynamic spanning-tree construction
(Section 5.1.3). This trend becomes more pronounced as more replicas
are created.
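A back-of-the-envelope sketch of this effect: under a simplified model in which nodes are grouped into sites connected by the WAN, a star (hub) topology crosses the WAN once per remote replica, whereas a spanning tree that prefers fast links crosses it roughly once per remote site. The site model and the one-transfer-per-site approximation are our assumptions for illustration, not a description of the implementation.

```python
# Rough model of WAN traffic per update; sites and placement are assumed.
FILE_SIZE = 50 * 1024

def star_wan_bytes(hub_site, replica_sites):
    # The hub pushes the update directly to every other replica; each replica
    # outside the hub's site costs one wide-area transfer.
    return sum(FILE_SIZE for s in replica_sites if s != hub_site)

def tree_wan_bytes(origin_site, replica_sites):
    # A fast-link-preferring spanning tree crosses the WAN roughly once per
    # remote site; further copies within a site travel over the LAN.
    remote_sites = set(replica_sites) - {origin_site}
    return FILE_SIZE * len(remote_sites)

# Example: 9 replicas spread over 3 sites, update originating at site "A".
sites = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(star_wan_bytes("A", sites[1:]))   # 6 * FILE_SIZE bytes cross the WAN
print(tree_wan_bytes("A", sites[1:]))   # 2 * FILE_SIZE bytes cross the WAN
```

In both cases the total traffic is the same (8 × FILE_SIZE for the eight receiving replicas); only the share carried by wide-area links differs.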
Figure 13 shows the time the pang configuration
took to propagate updates
to replicas of files during the same experiment. The "max" lines show
large fluctuations, because updates must travel multiple times over
300ms-RTT links using TCP. Both numbers are independent of the
number of replicas, because (for a given network configuration)
the propagation delay depends only on the graph diameter, which is
three in this configuration. We believe that the observed average and
maximum delays for propagating 50KB of contents over 300ms, 1Mb/s links are
reasonable. In fact, most of the time is spent waiting during
spanning-tree construction (Section 5.1.3); reducing the delay
parameter would shrink the propagation latency, but could
worsen the network bandwidth usage.
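The dependence on graph diameter, and the role of the tree-construction delay, can be illustrated with a rough cost model. All constants below (the per-hop wait, RTT, and bandwidth) are assumed example values; this is not Pangaea's implementation.

```python
# Rough per-hop model of update propagation delay (assumed constants).
FILE_SIZE_BITS = 50 * 1024 * 8

def per_hop_seconds(rtt_s, bandwidth_bps, tree_wait_s):
    # Each hop waits out the spanning-tree construction delay,
    # then ships the update over the link.
    return tree_wait_s + rtt_s + FILE_SIZE_BITS / bandwidth_bps

def propagation_delay(diameter, rtt_s=0.3, bandwidth_bps=1_000_000,
                      tree_wait_s=1.0):
    # Worst-case delay grows with the longest path in the replica graph,
    # independent of how many replicas hang off that path.
    return diameter * per_hop_seconds(rtt_s, bandwidth_bps, tree_wait_s)

print(propagation_delay(diameter=3))   # ~5.1 s with these assumed values
```

Shrinking `tree_wait_s` directly lowers the total, which mirrors the trade-off noted above: a smaller delay parameter speeds propagation but gives the tree-construction step less opportunity to route around slow links.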