The correlation algorithm operates by comparing the results of two executions of the grouping algorithms. Let Pt-1 and Pt be the group sets generated by the grouping algorithm at time t-1 and t respectively. The correlation algorithm updates the ID set of Pt, so that IDGt = IDGt-1, where and , if and only if Gt and Gt-1 considered to represent the same logical role. More specifically, the connection patterns of the members of Gt and those of Gt-1 are very similar. The groups correlation algorithm correlates the IDGt and IDGt-1 in a meaningful manner and thus allow applications to preserve data specific to a particular group.
The role correlation algorithm:
First, the correlation algorithm eliminates the differences between the two host sets, It and It-1, so that it can compare the connection patterns meaningfully. The algorithm computes the set of nodes that existed at time t-1 but have been removed in time t ( ), and the set of nodes that only appear at time t ( ). All new nodes are removed from It and deleted nodes are removed from It-1. Thus, the changes in connection set of each host is only as a direct result of changing connection patterns between the host and its neighbors (which existed at time t).
Second, the algorithm heuristically identifies the set, , of nodes that are very like to play the same logical roles from t-1 to t. We say that the two nodes ht and ht-1 are highly likely to be the same machine (i.e. it hasn't changed its logical role) if they have the identical connection sets. Specifically, . We will explain shortly how we use the fact that a host to our advantage in computing the time varying similarity measure.
The role correlation algorithm will determine whether the two groups Gt and Gt-1 are the same group by heuristically computing the time-varying similarity measure and comparing against the pre-defined threshold. The group correlation algorithm operates as follows:
Step 1 decides whether the two groups Gt and Gt-1 are identical based on the time varying similarity measure. As in Section 4.2, we compute the similarity measure based on the average number of connections between the groups and their common neighbors. However, finding the common neighbor set between Gt and Gt-1 is not trivial. This is because we cannot simply assume that a neighbor and a neighbor are the same host even if they have the same host identifier. We use the following techniques to identify the common neighbor set: