We evaluate the effectiveness of the role classification algorithm by comparing the groups formed by the algorithm against the logical roles that hosts play as determined by knowledgeable network administrators. For all the experiments, unless otherwise noted, we set user-defined thresholds, Shi = 80, Slo = 55, and Khi = 7. We examine how these thresholds affect the results in later sections.
Figure 4 shows some of the groups formed by the role classification algorithm running on the Mazu data and configured with the default parameters. Each circle in the figure represents a group and lists its members and its connections with other groups. Where possible, we have indicated the logical role of each host, which we obtained by asking the Mazu network administrator. (Of course, this logical information was not used in constructing the grouping.)
Observe that the role classification algorithm placed almost all engineering (eng) machines in a single group, 85. Also note that the number of connections of an engineering host varies from 4 to 9. Similarly, most machines used by sales, management (admin) and operations (ops) were placed in a single group, 87. The largest group, 80, contains new machines and test machines in the lab.
However, four hosts that are identified as engineering machines are placed in group 87 rather than group 85. The reason is that these machines do not communicate with a set of hosts that engineering machines in group 90 communicate with. As shown in Figure 4, each engineering machine in group 90 has, on average, one connection with group 10, which consists of a Unix mail server, and one connection with group 6, which consists of a source revision control server (not shown in the figure). On the other hand, almost every sales host in group 87 communicates with both the Microsoft Exchange server and the NT sever from group 71, but not with the Unix mail server nor the source revision control server. In fact, there are just two connections between group 87 and each of groups 6 and 10. The four engineering hosts in 87 had connection patterns very similar to those of sales hosts, so they were grouped accordingly. Most probably, these machines are used by engineering managers who do not perform engineering-related tasks such as coding, and use the Exchange mail server instead of the Unix mail server.
If we have the perfect knowledge of the logical structure of the
network, we can use that knowledge to quantify the resulting quality
of the groups produced by grouping algorithms. One simple yet
effective metric used in the cluster validation
literature [16,12] is Rand
Statistic, which is based on testing whether a pair of objects belongs
to the same group as decided by the grouping algorithm and according
to our knowledge. Let P and P* be the partitionings of hosts
produced by a grouping algorithm and based on our knowledge
respectively. Let SS, SD, DS and DD be the numbers of host
pairs that belong to the same group in both P* and P, to the same
group in P* and to different groups in P, to different groups in
P* and to the same group in P, and to different groups in both
P* and P respectively. SD and DS are indicative of how
different P is from P*. Rand Statistic
is between 0 and 1. The higher the value, the
more similar P and P* are.
For the Mazu network, we were able to ascertain the logical roles of all except 8 hosts. We worked closely with the Mazu network administrator to obtain P*, the ideal partitioning of hosts. We find that the partitioning produced by the grouping algorithm (with default parameters) achieves SS = 452, SD = 710, DS = 133, DD = 3856 and R = 0.8363. This shows that the results of the grouping algorithm reflect to a high degree our intuitive notion of the underlying structure of the network. We note that the reason for having a relatively high SD is because the algorithm identifies subsets of hosts within large groups as separate groups. For instance, the grouping algorithm produces a few different groups, each containing a single eng machine instead of putting them in group 80 (not shown in the figure). This is because those eng machines have the total number of connections far greater than the average number of connections that a host in group 80 does. Such distinction may prove useful in certain situations.
Table 1 lists the five largest groups produced by running
the grouping algorithm on the BigCompany network. Again, we relied on information
generated by the network administrator to help us understand whether the
groupings generated by the algorithm matched the logical structure of
the network. Group 1020 consists of desktops whose IP addresses are
managed by the DHCP server. Almost every machine in group 1020
communicates with approximately 85 of the machines in group 1075,
and vice-versa. This pattern suggests that it was appropriate for the
grouping algorithm to combine the machines in group 1075, which use
static IPs, into a group. Most machines in both groups run Microsoft
Windows. The high number of connections between the groups is due to
Windows file sharing, which uses the NetBIOS network protocol. File
sharing creates a large number of connections between the hosts in the
two groups, even though in both groups there is little intra-group
communication. We continue to investigate this interesting relationship
between the two groups. It is striking, and further proof of the need
for better analysis tools, that the network administrators we have
talked to themselves don't know why the groups are partitioned in this
way.
The grouping algorithm also correctly classifies all IP phones into one group, 1092. Group 1138 consists of web servers and other servers that desktops in group 1020 regularly communicate with. Group 1043 is the largest group, with 1519 members. Most machines in this group have a single connection (hence the role name idle), and that is to a host that opens connections to about 1,600 machines. With our help, BigCompany is currently investigating why this host scans about 45% of all the machines on the network. This example is another good use of how the role classification algorithm can be applied to understand networks and detect anomalous behavior.
Table 2 summarizes the grouping results of the two networks. Observe that the number of groups in the BigCompany network is 26 times smaller than the number of hosts. Unfortunately, we cannot use Rand Statistic to quantify the quality of the groups produced by the grouping algorithm since we don't have the perfect knowledge of the logical roles of each machine in the BigCompany network. Nevertheless, the network administrators at BigCompany report that they find them both useful and consistent with their intuitions about their networks. We are also in the process of analyzing a larger network owned by HugeCompany that consists of 49,041 hosts.