Our heuristics are different versions of greedy algorithms: a host repeatedly selects other hosts to include in Core until some condition is met. Hence we chose a representation that makes it easier for a greedy algorithm to find good candidates to include in Core. This representation is a three-level hierarchy.
The top level of the hierarchy is the operating system that a host runs, the second level includes the applications that run on that operating system, and the third level are hosts. Each host runs one operating system, and so each host is subordinate to its operating system in the hierarchy (we can represent hosts running multiple virtual machines as multiple virtual hosts in a straightforward manner). Since most applications run predominately on one platform, hosts that run a different operating system than are likely good candidates for including in Core. We call the first level the containers and the second level the sub-containers. Each sub-container contains a set of hosts. Figure 3 illustrates these abstractions using the configurations of Example 3.1.
More formally, let be the set of canonical operating system names and be the set of containers. Each host has an attribute that is the canonical name of the operating system on . The function maps operating system name to container; thus, is the container that contains .
Let denote the set of canonical names of the applications that are running on , and let be the canonical names of all of the applications. We denote with the set of sub-containers and with the function that maps a container to its sub-containers. The function maps a container and application to a sub-container; thus, for each , host is in each sub-container .
At this high level of abstraction, advertising a configuration is straightforward. Initially is empty. To advertise its configuration, a host first ensures that there is a container such that . Then, for each attribute , ensures that there is a sub-container containing .