A node that reports single-node attributes $A_1, A_2, \ldots, A_m$
periodically sends a tuple of all of its values for those attributes to the
DHT keys $k_1, k_2, \ldots, k_m$, with $k_i = f_i(v_i)$, where each $k_i$
is computed based on $v_i$, the corresponding value of $A_i$ at the time
the measurement is sent.
We call each such message a measurement report.
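For concreteness, the sketch below shows one way such a reporting loop
might look; the helpers \texttt{measure} and \texttt{dht\_put}, the table
\texttt{f} of mapping functions, and the interval shown are hypothetical
stand-ins rather than SWORD's actual interfaces.
\begin{verbatim}
# Illustrative sketch only: measure(), dht_put(), and the mapping
# functions in f are hypothetical stand-ins, not SWORD's actual API.
import time

REPORT_INTERVAL = 120  # seconds; the real interval is configurable

def report_loop(attributes, f, measure, dht_put):
    while True:
        # Tuple of current values for all reported attributes.
        values = {a: measure(a) for a in attributes}
        for a in attributes:
            k = f[a](values[a])   # k_i = f_i(v_i)
            dht_put(k, values)    # store the full tuple under each key
        time.sleep(REPORT_INTERVAL)
\end{verbatim}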
Associated with each attribute $A_i$ is a function $f_i$ that
maps the value of the attribute to a 160-bit DHT key.
SWORD provides default mapping functions for its 54 pre-configured
attributes and allows the administrator to specify new ones
when a new attribute is added to the system. The mapping
functions convert measured values from their native datatype and range
to the range of the DHT key space. The generated key is composed of
high-order ``attribute bits'' used to
partition attributes among
subsets of DHT nodes so as to bound the maximum number of nodes among which
values for an attribute are spread; and low-order ``value bits'' and
``random bits'' used to spread the expected range of
an attribute evenly among all nodes responsible for that attribute.
A second, active layer of load balancing can be added to these
passive techniques [3], but active load balancing is not used in our PlanetLab
deployment.
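As a rough illustration of such a mapping function, the following sketch
builds a 160-bit key from attribute, value, and random bits; the
particular bit widths and the clamping of out-of-range values are
assumptions for illustration, not SWORD's actual defaults.
\begin{verbatim}
import random

# Assumed bit widths for illustration; SWORD's actual split between
# attribute, value, and random bits is not specified here.
KEY_BITS, ATTR_BITS, VALUE_BITS = 160, 8, 24
RANDOM_BITS = KEY_BITS - ATTR_BITS - VALUE_BITS

def map_to_key(attr_id, value, lo, hi):
    # High-order attribute bits partition attributes among subsets
    # of DHT nodes.
    key = attr_id << (KEY_BITS - ATTR_BITS)
    # Value bits spread the expected range [lo, hi] of the attribute
    # evenly among the nodes responsible for it.
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    key |= int(frac * ((1 << VALUE_BITS) - 1)) << RANDOM_BITS
    # Low-order random bits scatter reports with identical values.
    key |= random.getrandbits(RANDOM_BITS)
    return key

# Example: map a load average of 0.42 (expected range 0-10) for a
# hypothetical attribute id 7.
key = map_to_key(7, 0.42, 0.0, 10.0)
\end{verbatim}
The random low-order bits ensure that many reports carrying identical
values do not all land on a single node.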
Upon receiving a measurement report, a server (DHT node) stores the tuple
in memory. Measurement reports time out after a configurable multiple of
the measurement report interval, so that the system removes information
about dead nodes and about nodes that have become the responsibility of
another DHT server due to nodes joining or leaving the DHT.

SWORD uses a multi-attribute range search mechanism similar to
Mercury's [3] to find nodes meeting the single-node requirements. Once the
``candidate nodes'' satisfying all single-node requirements have been
returned, the querying node obtains the inter-node measurements. SWORD
includes an implementation of the Vivaldi network coordinates
algorithm [5], and inter-node latencies are computed from the network
coordinates of the returned nodes. Finally, the single-node and inter-node
measurements are sent to the querying node's optimizer, which attempts to
find a penalty-minimizing assignment of candidate nodes to groups. The
number of candidate nodes returned from the distributed range query may be
larger than the number of nodes ultimately mapped to groups; conversely,
too few nodes may meet the query's requirements, in which case the request
cannot be satisfied.
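The sketch below shows the standard Vivaldi latency estimate: the
Euclidean distance between two coordinate vectors plus per-node height
terms from the Vivaldi model [5]. Whether SWORD's coordinates include
heights is an assumption here, and the data layout is illustrative.
\begin{verbatim}
import math

# The "height" term models access-link delay in the Vivaldi model [5];
# whether SWORD's coordinates include heights is an assumption here.
def predicted_latency(a, b):
    dist = math.sqrt(sum((x - y) ** 2
                         for x, y in zip(a["coords"], b["coords"])))
    return dist + a.get("height", 0.0) + b.get("height", 0.0)

n1 = {"coords": [12.0, -3.5], "height": 4.0}
n2 = {"coords": [-7.5, 9.0], "height": 6.5}
rtt_ms = predicted_latency(n1, n2)  # estimated inter-node latency (ms)
\end{verbatim}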