As discussed in Section 4.2, the propagation of updates is beneficial only if the updates are justified; when a node's incentive to receive updates for a particular key fades, continuing update propagation to that node simply wastes network bandwidth. Therefore, each node needs an independent and decentralized way of controlling its intake of updates.
We base a node's incentive to receive updates for a key on the popularity of the key at the node. The more popular a key is, the more incentive there is to receive updates for that key, because updates for that key are more likely to be justified. For a key K, the popularity is the number of queries a node has received for K since the last update for K arrived at the node. (Note that the popularity metric is node-dependent and could be defined in another way such as with a moving average of query arrivals for K.)
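As a concrete illustration, the per-key popularity counter just described can be sketched as follows (a minimal sketch; the class and method names are our own, not part of CUP):

```python
class PopularityTracker:
    """Per-key popularity at a node: the number of queries received
    for a key since the last update for that key arrived."""

    def __init__(self):
        self.queries_since_update = {}  # key -> query count

    def on_query(self, key):
        # Each query for `key` raises its popularity at this node.
        self.queries_since_update[key] = self.queries_since_update.get(key, 0) + 1

    def on_update(self, key):
        # An update arrival ends the measurement interval: report the
        # popularity observed since the previous update, then reset.
        popularity = self.queries_since_update.get(key, 0)
        self.queries_since_update[key] = 0
        return popularity
```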
We examine two types of thresholds against which to test a key's popularity when making the cut-off decision: probability-based and history-based.
A probability-based threshold uses the distance of a node N from the
authority node A to approximate the probability that an update pushed
to N is justified. Per our cost model of Section 4.2, the
further N is from A, the less likely it is that an update pushed to N will be justified.
We examine two such thresholds, a linear one and a logarithmic one.
With a linear threshold, if an update for key K arrives at a node at
distance $d$ from the authority node and the node has received at
least $\alpha d$ queries for K since the last update, for some
constant $\alpha$, then K is considered popular and the node continues
to receive updates for K. Otherwise, the node cuts off its intake of
updates for K by pushing up a clear-bit message. The logarithmic
popularity threshold is similar: a key K is popular if the node has
received at least $\alpha \log d$ queries since the last update. The
logarithmic threshold is more lenient than the linear one in that it
increases at a slower rate as we move away from the root.
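A minimal sketch of these two cut-off tests, assuming the linear threshold requires $\alpha d$ queries and the logarithmic one $\alpha \log d$ (the symbol names, function names, and the base of the logarithm are our assumptions):

```python
import math

def is_popular_linear(queries_since_update, distance, alpha):
    # Linear threshold: a key stays popular at a node `distance` hops
    # from the authority node if at least alpha * distance queries
    # arrived since the last update.
    return queries_since_update >= alpha * distance

def is_popular_log(queries_since_update, distance, alpha):
    # Logarithmic threshold: needs only alpha * log(distance) queries,
    # which grows more slowly with distance, so it is more lenient.
    return queries_since_update >= alpha * math.log(max(distance, 1))

# A node whose key fails its test pushes up a clear-bit message to
# cut off its intake of updates for that key.
```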
A history-based threshold is one that is based on the recent history
of the last n update arrivals at the node. If within n
updates, the node has not received any queries, then the key is not
popular and the node pushes up a clear-bit message. A specific
example of a history-based policy is the ``second-chance
policy''. When an update arrives, if no queries have arrived since the
last update, the policy gives the key a ``second chance'' and waits
for the next update. If at the next update, still no queries for K
have been received, the node pushes a clear-bit message. The
philosophy behind this policy is that pushing these two updates down
from the node's parent costs the same as one query miss occurring at
the node, since a query miss incurs one hop up to the parent and one
hop down. This means that just one query arriving at the node between
the first update and the expiration of the second update is enough to
recover their propagation cost.
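The second-chance policy amounts to a small per-key state machine, sketched below (class and method names are illustrative, not from the paper):

```python
class SecondChance:
    """Cut off a key's updates only after two consecutive
    update arrivals with no intervening queries."""

    def __init__(self):
        self.queries = 0                 # queries since the last update
        self.had_empty_interval = False  # one query-less interval seen

    def on_query(self):
        self.queries += 1

    def on_update(self):
        """Called when an update arrives; returns True if the node
        should now push up a clear-bit message for this key."""
        if self.queries == 0:
            if self.had_empty_interval:
                return True              # second empty interval: cut off
            self.had_empty_interval = True   # grant the second chance
        else:
            self.had_empty_interval = False  # key is popular again
        self.queries = 0
        return False
```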
Table 1 compares PCX with CUP using the linear and
logarithmic policies for various values of $\alpha$, with CUP using second
chance, and with a version of CUP that does not use any cut-off policy
but instead pushes updates until the optimal push level is reached.
To determine the optimal push level we make CUP propagate updates to
all querying nodes that are at most $\ell$ hops from the authority
node. By varying the push level $\ell$, we determine the level which achieves
minimum total cost. This is shown by the row labeled ``optimal push
level'' and used as a baseline against which to compare PCX and CUP
with the cut-off policies described.
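The baseline search can be sketched as a simple sweep; `total_cost` is a hypothetical helper that runs CUP with updates pushed at most `level` hops from the authority node and returns the resulting total cost:

```python
def optimal_push_level(total_cost, max_level):
    """Return the push level in [0, max_level] that minimizes the
    total cost reported by `total_cost(level)` (assumed to simulate
    CUP pushing updates at most `level` hops from the authority)."""
    return min(range(max_level + 1), key=total_cost)

# Example with a toy convex cost curve (illustrative only):
best = optimal_push_level(lambda level: (level - 3) ** 2 + 5, 10)
```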
In Table 1 we show the cut-off policy results for a
network of 1024 nodes and Poisson rates of 1, 10, 100 and
1000 queries per second. In each table entry, the first number is the
total cost and the number in parentheses is the total cost normalized
by the total cost for PCX. First, we see that regardless of the
cut-off policy used, CUP outperforms PCX. Second, for the lower query
rates, the performance of the linear and the logarithmic policies is
greatly affected by the choice of the parameter $\alpha$, whereas for the
higher query rates the choice of $\alpha$ has a less dramatic
effect. These results show that choosing a priori an $\alpha$ value for the linear
and logarithmic policies that will perform well across all workloads
is difficult.
For the higher query rates, the history-based second-chance policy performs comparably to the probability-based policies, and for the lower query rates it outperforms them. In fact, across all rates, the second-chance policy achieves a total cost very near that of the optimal push level. In all remaining experiments, we use second-chance as the cut-off policy.