A node that reports single-node attributes $A_1, A_2, \ldots, A_m$ periodically sends a tuple of all of its values for those attributes to the DHT keys $k_1, k_2, \ldots, k_m$, with $k_i = f_i(v_i)$, where each $k_i$ is computed based on the corresponding value $v_i$ of $A_i$ at the time the measurement is sent. We call each such message a measurement report. Associated with each attribute $A_i$ is a function $f_i$ that maps the value of the attribute to a 160-bit DHT key. SWORD provides default mapping functions for its 54 pre-configured attributes and allows the administrator to specify new ones when a new attribute is added to the system. The mapping functions convert measured values from their native datatype and range to the range of the DHT key space. The generated key is composed of high-order ``attribute bits,'' used to partition attributes among subsets of DHT nodes so as to bound the maximum number of nodes among which values for an attribute are spread, and low-order ``value bits'' and ``random bits,'' used to spread the expected range of an attribute evenly among all nodes responsible for that attribute. A second, active layer of load balancing can be added to these passive techniques [3], but active load balancing is not used in our PlanetLab deployment.
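To make the key construction concrete, the sketch below shows one plausible realization of such a mapping function. It is illustrative only, not SWORD's actual code: the bit widths ATTR_BITS and RAND_BITS, the use of SHA-1 over the attribute name to derive the attribute bits, and the linear scaling of values into the value bits are all our assumptions.

```python
import hashlib
import random

KEY_BITS = 160   # length of a DHT key
ATTR_BITS = 8    # hypothetical width of the high-order attribute bits
RAND_BITS = 16   # hypothetical width of the low-order random bits
VALUE_BITS = KEY_BITS - ATTR_BITS - RAND_BITS

def map_to_key(attr_name: str, value: float, lo: float, hi: float) -> int:
    """Map a measured value with expected range [lo, hi] (lo < hi)
    to a 160-bit DHT key.

    High-order bits partition the key space by attribute, middle bits
    encode the value's position within its expected range, and
    low-order bits are random so that identical values still spread
    across the nodes responsible for the attribute.
    """
    # Attribute bits: a stable hash of the attribute name, truncated
    # to the top ATTR_BITS bits of the key.
    attr_hash = int.from_bytes(hashlib.sha1(attr_name.encode()).digest(), "big")
    attr_bits = attr_hash >> (KEY_BITS - ATTR_BITS)

    # Value bits: linearly scale the value into [0, 2^VALUE_BITS - 1],
    # clamping values that fall outside the expected range.
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    value_bits = int(frac * ((1 << VALUE_BITS) - 1))

    # Random bits: drawn fresh for each measurement report.
    rand_bits = random.getrandbits(RAND_BITS)

    return (attr_bits << (VALUE_BITS + RAND_BITS)) | (value_bits << RAND_BITS) | rand_bits
```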
Upon receiving a measurement report, a server (DHT node) stores the tuple in memory. Measurement reports time out after a configurable multiple of the measurement report interval, so that stale information is removed: information about dead nodes, and about nodes that have become the responsibility of another DHT server because of nodes joining or leaving the DHT. SWORD uses a multi-attribute range search mechanism similar to Mercury's [3] to find nodes meeting the single-node requirements. Once the ``candidate nodes'' satisfying all single-node requirements have been returned, the querying node obtains the inter-node measurements. SWORD includes an implementation of the Vivaldi network coordinates algorithm [5], and inter-node latencies are computed from the network coordinates of the returned nodes. Finally, the single-node and inter-node measurements are sent to the querying node's optimizer, which attempts to find a penalty-minimizing assignment of candidate nodes to groups. The number of candidate nodes returned from the distributed range query may be larger than the number of nodes ultimately mapped to groups; conversely, too few nodes may meet the query's requirements, in which case the request cannot be satisfied.
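As a rough illustration of the soft-state storage just described, the sketch below keeps reports in memory and drops any report older than a configurable multiple of the report interval. The constants, class, and method names are hypothetical, not taken from SWORD.

```python
import time

REPORT_INTERVAL_S = 120   # hypothetical measurement report interval
EXPIRY_MULTIPLE = 3       # hypothetical configurable timeout multiple

class ReportStore:
    """In-memory store of measurement reports with soft-state expiry."""

    def __init__(self):
        self.reports = {}  # reporting node id -> (tuple of values, arrival time)

    def put(self, node_id, values):
        self.reports[node_id] = (values, time.time())

    def expire(self):
        # Drop reports older than the timeout, so information about dead
        # nodes, or about nodes whose reports are now another DHT
        # server's responsibility, disappears on its own.
        cutoff = time.time() - EXPIRY_MULTIPLE * REPORT_INTERVAL_S
        self.reports = {n: (v, t) for n, (v, t) in self.reports.items()
                        if t >= cutoff}
```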
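Because inter-node latencies are derived from Vivaldi coordinates rather than measured directly at query time, the latency estimate reduces to a distance computation. The sketch below shows the standard Vivaldi prediction of [5]: the Euclidean distance between two nodes' coordinates, plus per-node ``heights'' in the height-vector variant; the function name and the representation of coordinates as tuples of floats are our assumptions.

```python
import math

def estimated_latency(coord_a, coord_b, height_a=0.0, height_b=0.0):
    """Predict the latency between two nodes from Vivaldi coordinates.

    Vivaldi embeds each node at a point in a low-dimensional Euclidean
    space; predicted latency is the distance between the two points.
    The height-vector variant adds a per-node height that models the
    node's access-link delay.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(coord_a, coord_b)))
    return dist + height_a + height_b
```

For example, with 2-D coordinates (3.0, 4.0) and (0.0, 0.0) and zero heights, the predicted latency is 5.0 in whatever units the coordinate space was fit to, typically milliseconds.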