Consider a system composed of a set of hosts each of which
is capable of holding certain objects. These
hosts can fail (for example, by crashing) and, to keep these objects
available, they need to be replicated. A simple replication strategy is
to determine the maximum number
of hosts that can fail at any time,
and then maintain more than
replicas of each object.
However, using more than replicas may lead to excessive
replication when host failures are correlated. As a simple example,
consider three hosts
where the failures of
and
are correlated while
fails independent of the other
hosts. If
fails, then the probability of
failing is
high. As a result, one might set
and thereby require
replicas. However, if we place replicas on
and
, the
object's availability may be acceptably high with just two replicas.
To better address issues of optimal replication in the face
of correlated failures, we have defined an abstraction that we call a
core [15]. A core is a minimal set of hosts such that, in
any execution, at least one host in the core does not fail. In the above
example, both and
are cores.
would not be a core since the probability of both failing is too
high and
would not be a core since it is not minimal.
Using this terminology, a central problem of informed replication is
the identification of cores based on the correlation of failures.
An Internet catastrophe causes hosts to fail in a correlated manner
because all hosts running the targeted software are vulnerable.
Operating systems and Web servers are examples of software commonly
exploited by Internet pathogens [27,36]. Hence we
characterize a host's vulnerabilities by the software they run. We
associate with each host a set of attributes, where each
attribute is a canonical name of a software package or system that the
host runs; in Section 3.2 below, we discuss the
tradeoffs of representing software packages at different
granularities. We call the combined representation of all attributes
of a host the configuration of the host. An example of a
configuration is Windows, IIS, IE
, where Windows
is a canonical name for an operating system, IIS for a Web
server package, and IE for a Web browser. Agreeing on canonical
names for attribute values is essential to ensure that dependencies of
host failures are appropriately captured.
An Internet pathogen can be characterized by the set of attributes
that it targets. Any host that has none of the attributes in
is
not susceptible to the pathogen. A core is a minimal set
of hosts
such that, for each pathogen, there is a host
in
that is not
susceptible to the pathogen. Internet pathogens often target a single
(possibly cross-platform) vulnerability, and the ones that target
multiple vulnerabilities target the same operating system. Assuming that any attribute is susceptible to attack, we can re-define a core using attributes:
a core is a minimal set
of processes such that no attribute is
common to all hosts in
. In Section 5.4, we relax this
assumption and show how to extend our results to tolerate pathogens
that can exploit multiple vulnerabilities.
To illustrate these concepts, consider the system described in
Example 3.1. In this system, hosts are characterized
by six attributes which we classify for clarity into operating system, Web
server, and Web browser.
Attributes: | Operating System = {Unix, Windows}; |
Web Server = {Apache, IIS}; | |
Web Browser = {IE, Netscape}. |
Hosts: |
![]() |
![]() |
|
![]() |
|
![]() |
Cores
![]() |
The smaller core might appear to be the better choice
since it requires less replication. Choosing the smallest core,
however, can have an adverse effect on individual hosts if many hosts
use this core for placing replicas. To represent this effect, we define
load to be the amount of storage a host provides to other
hosts. In environments where some configurations are rare, hosts with
the rare configurations may occur in a large percentage of the
smallest cores. Thus, hosts with rare configurations may have a
significantly higher load than the other hosts. Indeed, having a rare
configuration can increase a host's load even if the smallest core is not
selected. For example, in Example 3.1,
is the only host
that has a flavor of Unix as its operating system. Consequently,
is present in both cores.
To make our argument more concrete, consider the worms summarized in Table 1, which are well-known worms unleashed in the past three years. For each worm, given two hosts with one not running Windows or not running a specific server such as a Web server or a database, at least one survives the attack. With even a very modest amount of heterogeneity, our method of constructing cores includes such pairs of hosts.