The correlation algorithm operates by comparing the results of two
executions of the grouping algorithms. Let Pt-1 and Pt be the
group sets generated by the grouping algorithm at time t-1 and t
respectively. The correlation algorithm updates the ID set of Pt,
so that
IDGt = IDGt-1, where and
, if and only if Gt and Gt-1 considered to represent the
same logical role. More specifically, the connection patterns of the
members of Gt and those of Gt-1 are very similar. The groups
correlation algorithm correlates the IDGt and IDGt-1 in a
meaningful manner and thus allow applications to preserve data specific
to a particular group.
The role correlation algorithm:
First, the correlation algorithm eliminates the differences between the
two host sets, It and It-1, so that it can compare the
connection patterns meaningfully. The algorithm computes the set of
nodes that existed at time t-1 but have been removed in time t
(
), and the set of nodes that only appear at
time t (
). All new nodes are removed from
It and deleted nodes are removed from It-1. Thus, the changes in
connection set of each host is only as a direct result of changing
connection patterns between the host and its neighbors (which existed at
time t).
Second, the algorithm heuristically identifies the set,
, of
nodes that are very like to play the same logical roles from t-1
to t. We say that the two nodes ht and ht-1 are highly likely
to be the same machine (i.e. it hasn't changed its logical role) if they
have the identical connection sets. Specifically,
. We will explain
shortly how we use the fact that a host
to our
advantage in computing the time varying similarity measure.
The role correlation algorithm will determine whether the two groups Gt and Gt-1 are the same group by heuristically computing the time-varying similarity measure and comparing against the pre-defined threshold. The group correlation algorithm operates as follows:
Step 1 decides whether the two groups Gt and
Gt-1 are identical based on the time varying similarity measure.
As in Section 4.2, we compute the similarity measure
based on the average number of connections between the groups and their
common neighbors. However, finding the common neighbor set between Gt
and Gt-1 is not trivial. This is because we cannot simply assume
that a neighbor
and a neighbor
are the same host even if they have the same host identifier. We use
the following techniques to identify the common neighbor set: