Next: Defining Similarity Up: Role Classification of Hosts Previous: System Overview

Model

In this section, we develop a model for thinking about the grouping problem. We define the problem in the abstract, providing a model with several functions and parameters that can be adjusted to meet various goals. Later in the paper, we present and evaluate instantiations of these parameters.

Let I be the set of hosts in an enterprise network. We will use |I| to denote the number of hosts in I.
Let similarity be a commutative function that maps each pair of hosts in I to an integer greater than or equal to 0. Roughly speaking, if similarity(h₁, h₂) is high, then we would like our grouping algorithm to place the hosts h₁ and h₂ in the same group. Defining similarity so that it is both efficient to compute and yields a good grouping is at the heart of the problem addressed in this paper.
A partitioning P of I respects similarity if for all distinct groups $G_1, G_2 \in P$, all $h_1, h_2 \in G_1$, and all $h_3 \in G_2$,
- $\textit{similarity}(h_1, h_2) \geq \textit{similarity}(h_1, h_3)$
- $\textit{similarity}(h_1, h_2) \geq \textit{similarity}(h_2, h_3)$
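
To make the definition concrete, the following Python sketch checks whether a partitioning respects similarity. The score table `SCORES` and the host names are hypothetical, chosen only for illustration. The check assumes that h₁ and h₂ range over distinct hosts of the same group, since the definition does not say how a host compares to itself.

```python
from itertools import combinations, permutations

# Hypothetical symmetric similarity scores for four hosts (illustrative only).
SCORES = {
    frozenset({"a", "b"}): 5,
    frozenset({"c", "d"}): 4,
    frozenset({"a", "c"}): 1,
    frozenset({"a", "d"}): 0,
    frozenset({"b", "c"}): 2,
    frozenset({"b", "d"}): 1,
}


def similarity(h1, h2):
    """Commutative lookup: the unordered pair of hosts is the key."""
    return SCORES[frozenset({h1, h2})]


def respects_similarity(partition):
    """Check both inequalities for every ordered pair of distinct groups."""
    for g1, g2 in permutations(partition, 2):
        for h1, h2 in combinations(g1, 2):  # assumption: h1 != h2
            inside = similarity(h1, h2)
            for h3 in g2:
                if inside < similarity(h1, h3) or inside < similarity(h2, h3):
                    return False
    return True


good = [{"a", "b"}, {"c", "d"}]
bad = [{"a", "c"}, {"b", "d"}]
print(respects_similarity(good), respects_similarity(bad))
```

With these scores, the first partitioning passes and the second fails, because a and c are less similar to each other than a is to b.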

We extend this definition of similarity to define the average similarity between a host h₁ and a group G₂, avg_similarity(h₁, G₂), as the ratio of the sum of the similarity between h₁ and each $h_2 \in G_2$ to the number of hosts in G₂:

$$ \textit{avg\_similarity}(h_1, G_2) = \frac{\sum_{h_2 \in G_2} \textit{similarity}(h_1, h_2)}{|G_2|} $$
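
The formula translates directly into code. In this small sketch the two scores are hypothetical, and the host a is deliberately placed outside the group so that no assumption about the similarity of a host to itself is needed.

```python
# Hypothetical scores between host "a" and the two members of a group.
SCORES = {
    frozenset({"a", "c"}): 3,
    frozenset({"a", "d"}): 1,
}


def similarity(h1, h2):
    """Commutative lookup: the unordered pair of hosts is the key."""
    return SCORES[frozenset({h1, h2})]


def avg_similarity(h1, group):
    """Sum of similarity(h1, h2) over h2 in group, divided by |group|."""
    return sum(similarity(h1, h2) for h2 in group) / len(group)


print(avg_similarity("a", {"c", "d"}))  # (3 + 1) / 2 = 2.0
```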

A partitioning P of I respects avg_similarity if for all groups $G_1, G_2 \in P$ and all $h_1 \in G_1$, $\textit{avg\_similarity}(h_1, G_1) \geq \textit{avg\_similarity}(h_1, G_2)$.

Respecting similarity or avg_similarity is not sufficient to generate a useful partitioning of I. After all, a partitioning that puts all the nodes in one group, or one that puts each node in a separate group, respects similarity. We therefore provide a parameter that network administrators can use to control how aggressively the algorithm partitions I into groups.
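
The avg_similarity condition can be checked mechanically, as in the sketch below. Because the average over a host's own group includes the host itself, the sketch needs a value for the similarity of a host to itself. Purely for illustration, it assumes that similarity counts shared neighbors, which makes that value well defined. The `NEIGHBORS` table and this choice of similarity are assumptions of the example, not definitions from this section.

```python
# Hypothetical neighbor sets; similarity = number of shared neighbors
# (an assumption made only so that similarity(h, h) is defined).
NEIGHBORS = {
    "a": {"s1", "s2"},
    "b": {"s1", "s2"},
    "c": {"s3"},
    "d": {"s2", "s3"},
}


def similarity(h1, h2):
    return len(NEIGHBORS[h1] & NEIGHBORS[h2])


def avg_similarity(h1, group):
    return sum(similarity(h1, h2) for h2 in group) / len(group)


def respects_avg_similarity(partition):
    """Each host must be at least as close, on average, to its own group
    as to every group in the partitioning."""
    return all(
        avg_similarity(h1, g1) >= avg_similarity(h1, g2)
        for g1 in partition
        for h1 in g1
        for g2 in partition
    )


print(respects_avg_similarity([{"a", "b"}, {"c", "d"}]))
print(respects_avg_similarity([{"a", "c"}, {"b", "d"}]))
```

Under these assumptions the first partitioning passes. The second fails because a averages 1.0 against its own group {a, c} but 1.5 against {b, d}.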

Let S_min, the similarity threshold, be an integer greater than 0. A partitioning P respects similarity and S_min if it respects similarity and if, for every group $G \in P$ and all $h_1, h_2 \in G$, $\textit{similarity}(h_1, h_2) \geq S_{min}$.
A partitioning P of I is said to be maximal with respect to similarity and S_min if it respects similarity and S_min and there does not exist another partitioning of I that respects similarity and S_min and has fewer groups. Adjusting S_min controls the trade-off between the number of groups and their cohesion: raising S_min yields a maximal partitioning with at least as many groups, in which the members of each group are more similar to each other, whereas lowering it allows fewer, looser groups.
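
For a handful of hosts, maximal partitionings can be found by exhaustive search, which makes the effect of S_min easy to observe. The sketch below is a brute-force illustration only, not the grouping algorithm developed in this paper. It enumerates every partitioning, so it is practical only for very small I. The score table and helper names are hypothetical, and within-group checks assume distinct hosts.

```python
from itertools import combinations, permutations

# Hypothetical symmetric similarity scores (illustrative only).
SCORES = {
    frozenset({"a", "b"}): 5, frozenset({"c", "d"}): 4,
    frozenset({"a", "c"}): 1, frozenset({"a", "d"}): 0,
    frozenset({"b", "c"}): 2, frozenset({"b", "d"}): 1,
}


def similarity(h1, h2):
    return SCORES[frozenset({h1, h2})]


def partitions(hosts):
    """Yield every partitioning of the list `hosts` as a list of groups."""
    if not hosts:
        yield []
        return
    first, rest = hosts[0], hosts[1:]
    for smaller in partitions(rest):
        # Place `first` into each existing group in turn ...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ... or into a new group of its own.
        yield [[first]] + smaller


def respects_similarity(partition):
    for g1, g2 in permutations(partition, 2):
        for h1, h2 in combinations(g1, 2):
            inside = similarity(h1, h2)
            if any(inside < similarity(h1, h3) or inside < similarity(h2, h3)
                   for h3 in g2):
                return False
    return True


def respects_s_min(partition, s_min):
    """Every pair of distinct hosts in a group must reach the threshold."""
    return all(similarity(h1, h2) >= s_min
               for group in partition
               for h1, h2 in combinations(group, 2))


def maximal_partitions(hosts, s_min):
    """Valid partitionings with the fewest groups (never empty, because
    the all-singletons partitioning is always valid)."""
    valid = [p for p in partitions(hosts)
             if respects_similarity(p) and respects_s_min(p, s_min)]
    fewest = min(len(p) for p in valid)
    return [p for p in valid if len(p) == fewest]


for s_min in (1, 5):
    print(s_min, maximal_partitions(["a", "b", "c", "d"], s_min))
```

With these scores, S_min = 1 gives two groups, {a, b} and {c, d}, while S_min = 5 splits c and d apart and gives three groups.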



Godfrey Tan 2003-04-01