Evaluation of collaborative worm containment on the DETER testbed
L. Li, P. Liu, Y.C. Jhi, G. Kesidis
Abstract: The advantage of collaborative containment over independent blocking or address blacklisting in worm defense has been advocated in previous worm studies. In this work, we evaluate two collaborative worm containment proposals and present some of the results of our DETER emulation experiments. In the first one, proactive worm containment (PWC), security agents block all suspicious hosts on the network on receiving alerts of a worm and afterwards run "relaxation analysis" on those blocked hosts. Emulation experiments evaluate PWC's ability to stop the propagation of fast local worms and to reduce the scan traffic of fast global scanning worms. The second proposal, which detects and contains a scanning worm based on the concept of dark ports, focuses on stealthy worms that target only specific local networks or enterprise networks. Emulation experiments run on the DETER testbed demonstrate the efficiency of local scanning worms and their elevated threat to enterprise networks. The effectiveness of a collaborative containment strategy based on dark port detection is evaluated using DETER emulation and compared with that of individual address blacklisting.
While most conventional worm containment proposals depend heavily on a novel detection technique or containment strategy, we believe that a collaborative approach in which end-hosts or sub-networks share security alerts and act proactively could be superior, especially in contexts where network security takes priority over a temporary loss of availability. Work in [18] and [3] has demonstrated the advantage of group defense over independent containment. In this study, we run emulation experiments to evaluate the effectiveness of two worm containment schemes we proposed that champion a collaborative approach. The first one, Proactive Worm Containment, or PWC [8], is based on two key concepts: early blocking and post-hoc relaxation. In abstract terms, the PWC system contains a suspicious host after an early-but-immature worm alert is raised by a worm agent running on the host. A potentially false alert is repaired by the subsequent relaxation phase. To be more proactive, the worm alert is propagated to all other worm agents in the network once it is raised. Upon receiving a propagated worm alert, an agent proactively blocks its host and starts the relaxation phase. Early containment reduces the number of outbound scans, which could otherwise infect another victim in the local network or elsewhere in the Internet.
The second proposal, dark port detection and containment, takes aim at a new kind of worm that specifically targets local or enterprise networks. To defend against this new threat, we have proposed a dark port detection scheme that can monitor all suspicious scans within an enterprise network as soon as those scans arrive at the local routers that are supposed to deliver the packets to their intended final targets. A selective or full block action can then be taken to block the hosts or cells that originate dark-port scans, or the full network, on specific port numbers. We run evaluation experiments to compare the new dark port proposal with individual blacklisting containment under various detection latencies.
This paper is organized as follows. Section 2 reviews related work on worm detection, worm containment, and enterprise worms. The procedure for running a worm experiment on the DETER testbed and the related tools are briefly discussed in section 3. In section 4, we introduce the PWC system and present evaluation results of PWC. In section 5.2, we show the threat of a new class of local-only scanning worms and introduce our new dark port detection framework. A cooperative containment strategy based on the new dark port detection is presented in section 6, together with results of its DETER evaluation. In the final section, we conclude the paper and list some future research topics.
Existing worm containment techniques can be roughly divided into the following classes. Rate limiting [25, 4] limits the sending rate of scan-like traffic at an infected host, which may introduce longer delays for normal traffic. Signature-based worm filtering [21, 17, 15, 9, 22, 26] relies on worm signatures to prevent scans from entering or leaving a LAN or host, but it may be evaded by a proper use of polymorphism [7]. PWC can effectively slow down or even stop the propagation of a fast scanning worm before a reliable worm signature is generated and applied. The idea of responding early, before there is enough evidence for worm containment, also appeared in the dynamic quarantine scheme in [27]. Unlike PWC, however, dynamic quarantine is not a collaborative approach, and it cancels containment on a suspicious host after a period of time without any further analysis.
In many cases, worm signature generation is helped by a worm detector. A honeypot or honeynet [5] can detect suspicious worm activities by monitoring scans that target unused or dark network addresses. For detection and defense in local or enterprise networks, a double honeypot can be used to collect worm samples for signature generation [22]. In [6], an enhanced honeypot-based detector, HoneyStat, combines network security alerts with computer OS alerts to improve detection accuracy and reduce false positives.
Another type of worm detection scheme, failed scan detection, also takes advantage of worms' blind scanning behavior. While early failed scan detectors relied on the ICMP packets returned from the destination host or network [16], most recent proposals detect failed scans by keeping logs of all connection requests at the sender site and singling out those requests that have not received a response after a pre-defined period of time [20, 24]. Compared with these failed scan detection proposals, our dark port detection can detect a failed local scan much faster, since the security agent at the destination site can report a failed scan right after its arrival and there is no need to wait for a connection to time out at the sending site.
Recently, network testbeds have become available, and researchers have begun to use them to simulate and emulate worms and study their behavior in the Internet [23] or in enterprise networks [13]. The advantage of cooperative or group defense over independent firewalls has been evaluated by simulation experiments such as [18].
Network testbeds such as DETER provide a simulation and emulation platform with a high level of flexibility and fidelity in terms of hardware and network configurations, code compatibility, and network metrics. In [13], we reported our virtual node approach, which leverages limited testbed resources to emulate worm propagation in a large enterprise network. The general method and steps of running a DETER emulation experiment are briefly reviewed here.
The ESVT GUI tool provides an integrated environment for conducting an interactive worm experiment or other network experiment on a testbed. It is a component-based topology editor, NS2/TCL script generator, worm experiment designer, and visualization tool for experimental results. As a first step, it can be used to draw the topology based on a prototype network. The toolbox includes network components such as computer/host nodes, switch/router nodes, network/Internet interfaces, and links. Computer nodes can be defined as susceptible or non-susceptible. To request network resources from the testbed and apply a network topology to it, an NS (network simulator) style TCL script file needs to be submitted to the testbed control plane. To simplify this task, the ESVT GUI has a TCL script generation function that converts a network topology into a DETER TCL script.
Employing a one-to-one emulation approach requires substantial resources that a normal testbed cannot provide. In [13], we compared our virtual node design with other virtualization methods such as VMWare and Emulab VM and concluded that the performance of the virtual node design in realistic LAN simulation is comparable with the all-real-node scenario, while consuming far fewer resources than other virtualization approaches.
In our virtualization approach, one switched LAN in the real topology is emulated by one virtual-node application running on a testbed host. The virtual node program consists of a number of virtual end-systems, each representing one real host and having a unique virtual IP address. Inside the program, each virtual end-system is implemented by a thread which simultaneously executes a user-defined procedure (such as worm traffic generator) and a background traffic generator. The parameters of background traffic of each virtual node that can be configured include the maximum sending rate, the number of simultaneous sessions, packet inter-arrival distribution, and port number/protocol distribution.
We adopted a design for the timing of incoming worm scans from the Internet interface similar to the one reported in [13, 12]. The Internet interface node recreates and injects scanning traffic into the enterprise network under test. The speed and pattern of worm traffic were based on simulation results from our extended KMSim model [1, 10]. The KMSim worm model, an extension of the Kermack-McKendrick epidemic model, takes enterprise networks as the unit of analysis so that various distributions of worm susceptibles and network topology characteristics can be accounted for. Suppose that the total scan rate of a worm, $S(t)$, is obtained from the KMSim simulation. Under the assumption that scans are uniformly distributed over the IPv4 address space, the scan rate from the Internet directed at the enterprise under simulation can be approximated as $(A/2^{32})\,S(t)$, where $A$ is the size of the address space of the enterprise network. The data source for the background traffic was mainly the enterprise traffic report in [11].
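The approximation $(A/2^{32})\,S(t)$ can be illustrated with a few lines of Python. This is only a sketch: the function name and the input values (a /16 enterprise and a total scan rate of one million scans per second) are hypothetical and are not taken from the KMSim results.

```python
IPV4_SPACE = 2**32  # size of the whole IPv4 address space


def enterprise_scan_rate(total_scan_rate: float, enterprise_addresses: int) -> float:
    """Scans per second expected to land in the enterprise: (A / 2^32) * S(t)."""
    return enterprise_addresses / IPV4_SPACE * total_scan_rate


# Hypothetical inputs: a /16 enterprise (A = 2^16) and S(t) = 1,000,000 scans/s.
rate = enterprise_scan_rate(1_000_000.0, 2**16)
print(f"{rate:.2f} scans/s reach the enterprise")
```

With these assumed inputs, only about 15 of the one million scans per second fall inside the enterprise address space, which is why the Internet interface node injects scans at a much lower rate than an infected host inside the network generates them.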
PWC is a proactive worm containment solution for enterprises. Motivated by the observation that a worm uses a sustained outgoing packet rate, PWC gains infection awareness seconds before a signature or filter can be generated [8] and broadcasts worm alerts to all worm agents. To overcome the denial-of-service that containment based on received worm alerts may cause, PWC performs relaxation analysis that detects and releases contained-but-uninfected hosts. PWC has the following features: minimal denial-of-service, signature-free operation, light weight, and evasion resilience.
Each host in an enterprise network protected by PWC runs a worm agent, or PWC agent, that detects and suppresses worm scans released from its host. The PWC agents are coordinated by a PWC manager that has two roles: first, it distributes authenticated worm alerts reported by the PWC agents to all the PWC agents in the enterprise network; second, it acts as the certificate authority for mutual authentication between each PWC agent and the PWC manager. PWC can handle multiple simultaneous worm alerts raised by different worms in one contain/relax procedure.
Similar to other worm detection techniques such as virus throttle [25], a PWC agent calculates the rate of the most recent outbound connection attempts sent to unique IP addresses in order to gain awareness of a worm infection on its host. When the rate exceeds λ connections per second, the agent raises an alert and reports it to the PWC manager. The PWC manager then propagates the alert to the rest of the PWC agents in the enterprise network. When a PWC agent receives an alert from the PWC manager, it authenticates the alert before accepting it.
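The rate test described above can be sketched as follows. The sliding one-second window, the class name, and the example addresses are assumptions made for illustration; the actual estimator used by PWC is specified in [8] and may differ.

```python
from collections import deque


class UniqueDestinationRateDetector:
    """Hypothetical sketch: alert when unique destinations per second exceed lambda."""

    def __init__(self, threshold_per_sec: float, window_sec: float = 1.0):
        self.threshold = threshold_per_sec
        self.window = window_sec
        self.attempts = deque()  # (timestamp, destination IP), oldest first

    def observe(self, timestamp: float, dst_ip: str) -> bool:
        """Record one outbound connection attempt; return True if an alert is raised."""
        self.attempts.append((timestamp, dst_ip))
        # Forget attempts that have fallen out of the sliding window.
        while self.attempts and self.attempts[0][0] <= timestamp - self.window:
            self.attempts.popleft()
        unique_destinations = {ip for _, ip in self.attempts}
        return len(unique_destinations) / self.window > self.threshold


# A host sends one attempt every 20 ms, each to a new address (assumed values).
detector = UniqueDestinationRateDetector(threshold_per_sec=20)
alerts = [detector.observe(i * 0.02, f"10.0.0.{i}") for i in range(25)]
first_alert = alerts.index(True)  # index of the attempt that triggers the alert
```

Counting unique destinations rather than raw packets means that repeated traffic to a single server does not push a benign host over the threshold.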
A PWC agent starts containing its host either on raising an alert or on accepting an alert propagated from the PWC manager. While a PWC agent is containing its host, it buffers TCP SYN and UDP packets sent to new destination addresses, while allowing traffic through existing TCP connections and packets to known destinations. On initiating containment, the PWC agent also starts relaxation analysis for a limited duration of τ seconds to check whether the outbound connection rate is sustained. If the relaxation analysis detects a sustained connection rate, the PWC agent silently discards the buffered connection attempts and performs relaxation analysis again; otherwise, the containment is relaxed and the buffered connection attempts are forwarded.
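The contain/relax cycle can be sketched as a small state holder. The class and method names are hypothetical, packets are reduced to `(destination, payload)` pairs, and the sketch assumes that any destination contacted before containment counts as "known"; the timing of the τ-second analysis itself is left to the caller.

```python
class PwcContainment:
    """Hypothetical sketch of buffering during containment and release on relaxation."""

    def __init__(self, known_destinations):
        self.known = set(known_destinations)
        self.contained = False
        self.buffer = []  # connection attempts held back during containment

    def start_containment(self):
        self.contained = True

    def send(self, dst_ip, packet):
        """Return the list of packets that actually leave the host."""
        if self.contained and dst_ip not in self.known:
            self.buffer.append((dst_ip, packet))  # new destination: hold it back
            return []
        self.known.add(dst_ip)
        return [(dst_ip, packet)]

    def finish_relaxation(self, rate_sustained: bool):
        """Apply the outcome of one relaxation analysis round."""
        if rate_sustained:
            self.buffer.clear()  # discard silently; the host stays contained
            return []
        released, self.buffer = self.buffer, []
        self.known.update(dst for dst, _ in released)
        self.contained = False  # containment is relaxed
        return released


host = PwcContainment(known_destinations={"10.0.0.1"})
host.start_containment()
to_known = host.send("10.0.0.1", "SYN")   # passes: known destination
to_new = host.send("10.0.0.9", "SYN")     # buffered: new destination
released = host.finish_relaxation(rate_sustained=False)
```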
To test the effect of PWC on containing fast scanning worms, we conducted two enterprise network emulation experiments on a network testbed, DETER [2]. The first experiment is to test whether PWC can effectively contain a fast local-scanning worm, while the second is to study how PWC limits high volume scanning traffic originated from suspicious hosts.
To conduct a detailed worm propagation experiment using emulation and simulation methods, the scale of the network has to be large enough to capture the interplay between the worm and the defense, and the configuration of the network has to be typical of enterprise networks. For this purpose, we used our ESVT toolkit [14] to design a hypothetical enterprise network, which included one Internet stub-link, 22 internal routers, 66 switched LANs, 7 servers, and more than 10,000 end-hosts. Of these hosts, 110 (about 1%) are susceptible to the worm that we intentionally injected into the network. Thanks to virtualization, emulating this enterprise network on the DETER testbed took only 93 physical nodes.
The prototype for the PWC evaluation implements three components: worm detection based on unique destination addresses, worm alert broadcast, and proactive containment upon receiving a worm alert. The parameters of PWC for the emulation experiments are as follows: a sending rate threshold of 20 scans/second and a vulnerability window of 1 second. We implemented the worm alert "broadcast" by building a list of LAN gateways (the virtual node programs) and sending a message to every address on this list sequentially.
The first experiment tested the effect of PWC on containing a local scanning worm. An Internet node injected infectious worm packets into the enterprise at rates following our Blaster simulation [12]. Inside the enterprise, the worm randomly chose an initial IP address and began to scan sequentially from that address. Without PWC, the infection ratio was high (72 out of 110 after 120 seconds) and the infection was rapid (about 10 seconds to reach the peak), as shown in Figure 4.3. With PWC enabled, we saw a marked difference in the same 120-second experiment: the number of infected hosts was reduced to 18 and the infection was much slower, suggesting that the fast local infection was contained and that most infections were caused by the incoming scans from the Internet node.
The second experiment studied the effect of PWC on reducing high-volume scanning traffic. In this experiment, infected hosts scanned global addresses at random. Because of this random scanning, most of the worm scanning traffic was directed to the Internet interface, where worm traffic may cause congestion and affect normal traffic. From Figure 4.3, we see that without PWC there were several peak traffic periods caused by worm scanning, and traffic as a whole suffered increased delay afterwards. With PWC, there was no abnormally high traffic volume and the aggregate traffic rate was smooth.
It is worth noting that relaxation analysis was not implemented and that no additional signature-based worm defense (such as EarlyBird) was deployed in the emulation. For this reason, the blocked hosts were kept blocked throughout the 120-second experiment. The maximum number of hosts that were blocked was about four hundred out of ten thousand. We believe there would be a significant improvement in terms of infection rate, availability recovery, and traffic filtering if those methods were deployed.
Most scanning worms choose the whole IPv4 address space as their target scanning space, though their scanning strategies may vary. As a result of this vast scanning space, even a very fast scanning worm cannot finish scanning the whole space efficiently from just one infected host. Take the Slammer worm as an example: at a speed of 10,000 scans per second, it would take one Slammer-infected host $2^{32}/10{,}000 \approx 429{,}496.7$ seconds to scan the whole space. From the worm's point of view, many concurrently active infected hosts can work together to scan the space jointly, so the total scan and infection time can be reduced. The efficiency with which a worm reaches full infection or scans the whole address space increases greatly when the worm is specifically designed to target an enterprise network, whose address space is much more limited. For a /16 network with 256 /24 subnets, a worm can adopt pure enterprise-wide random scanning, hybrid or preferential local random scanning, or enterprise-wide sequential scanning. We first run testbed emulation experiments to explore the efficiency of local scanning in an enterprise network environment and the danger that such local scanning worms pose to the network.
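The contrast between the two address spaces can be made concrete with the same arithmetic. The sketch below assumes a single host that never repeats a target (for example, a sequential scanner); the function name is hypothetical.

```python
def full_scan_seconds(address_space: int, scans_per_second: float) -> float:
    """Time for one host to probe every address once, with no repeated targets."""
    return address_space / scans_per_second


internet = full_scan_seconds(2**32, 10_000)    # whole IPv4 space
enterprise = full_scan_seconds(2**16, 10_000)  # a single /16 enterprise network

print(f"IPv4: {internet:,.1f} s ({internet / 86_400:.2f} days)")
print(f"/16:  {enterprise:.2f} s")
```

At Slammer's rate, the whole IPv4 space takes roughly five days for one host, whereas a /16 enterprise network is covered in under seven seconds.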
The topology we used for emulation is the same network we used for the PWC evaluation in section 4. We configured 959 hosts to be vulnerable to the worm attack and distributed them randomly among the 66 sub-networks. We ran three experiments on this topology with different scanning strategies: the first with pure enterprise-wide random scanning at a rate of 2 scans per second; the second with pure enterprise-wide sequential scanning, starting from a random address, at a rate of 2 scans per second; and the third with a hybrid strategy combining one enterprise-wide random scan per second with one sequential scan per second starting from a sub-network address. There was one initial seed infective in all three experiments. The results are depicted in Figure 4.
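The three scanning strategies can be sketched as target generators over a /16 address space. This is an illustrative sketch, not the emulation code: addresses are represented as integer offsets 0..65535, and the hybrid strategy is assumed to start its sequential part at the first address of the infected host's own /24, which is one possible reading of "starting from a sub-network address".

```python
import random
from itertools import islice

SPACE = 2**16  # addresses in the /16 enterprise, numbered 0..65535


def random_targets(rng, scans_per_second=2):
    """Pure enterprise-wide random scanning: yields one list of targets per second."""
    while True:
        yield [rng.randrange(SPACE) for _ in range(scans_per_second)]


def sequential_targets(start, scans_per_second=2):
    """Enterprise-wide sequential scanning from a start address, wrapping around."""
    nxt = start
    while True:
        yield [(nxt + i) % SPACE for i in range(scans_per_second)]
        nxt = (nxt + scans_per_second) % SPACE


def hybrid_targets(rng, own_address):
    """One random scan plus one sequential scan per second (assumed /24 start)."""
    nxt = own_address & ~0xFF  # first address of the host's own /24
    while True:
        yield [rng.randrange(SPACE), nxt]
        nxt = (nxt + 1) % SPACE


rng = random.Random(7)  # fixed seed so the sketch is reproducible
first_seconds = list(islice(sequential_targets(65535), 2))
hybrid_seconds = list(islice(hybrid_targets(rng, 0x1234), 2))
```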
We found that pure random scanning was more efficient than pure sequential scanning. The hybrid scanning strategy was the fastest of the three strategies, taking less than 200 seconds to infect the majority of susceptible hosts. For such a fast locally scanning worm, there is an urgent need for an effective early-warning detection scheme.
The danger of worms targeting enterprise networks has been demonstrated through emulation experiments. While there have been some discussions on the topic of enterprise worms and their detection and defense in the literature, none has specifically looked at a pure enterprise-targeting local worm. For this reason, we propose a dark port worm detection scheme and go over the main ideas of this scheme briefly here.
A dark port detection system is composed of a central security console and a series of soft firewalls. The soft firewalls are installed on the network components that connect a sub-network or cell to the enterprise backbone network, normally a router or a switch that can monitor all inbound and outgoing packets and block any of them. Instead of looking at outgoing packets or connection attempts at the sender site, the soft firewall inspects only incoming connection attempts for worm detection. The detection is rather simple: any connection attempt whose tuple (destinationIP:protocol:portnumber) is not on the soft firewall's safe list is deemed suspicious. The real task of detection thus lies in the creation and maintenance of the safe list, or white list, of services that are running on the hosts protected by the firewall. Figure 5 illustrates the proposed soft firewalls in an enterprise network. When firewall A, by checking its safe list of running services, detects a suspicious scan that tries to sneak into cell A, it sends an alert to the central security console. The security console issues containment orders to the relevant firewalls when some threshold has been reached. Firewall B can then block the corresponding host, a particular service port number, or all hosts in the cell.
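The safe-list check performed by a soft firewall can be sketched as a set lookup on the (destination IP, protocol, port) tuple. The class names, the alert fields, and the example addresses are hypothetical; how the safe list is built and kept current is outside this sketch.

```python
from typing import NamedTuple, Optional


class Service(NamedTuple):
    dst_ip: str
    protocol: str
    port: int


class SoftFirewall:
    """Hypothetical sketch of dark port detection on incoming connection attempts."""

    def __init__(self, cell: str, safe_list):
        self.cell = cell
        self.safe_list = set(safe_list)  # services known to run inside the cell

    def inspect(self, src_ip: str, dst_ip: str, protocol: str, port: int) -> Optional[dict]:
        """Return an alert for the central console, or None for a legitimate request."""
        if Service(dst_ip, protocol, port) in self.safe_list:
            return None
        return {"cell": self.cell, "src_ip": src_ip, "dst_ip": dst_ip,
                "protocol": protocol, "port": port}


# Assumed example: cell A hosts a single file server on TCP port 445.
firewall_a = SoftFirewall("A", [Service("10.1.1.10", "tcp", 445)])
legit = firewall_a.inspect("10.1.2.7", "10.1.1.10", "tcp", 445)
scan = firewall_a.inspect("10.1.2.7", "10.1.1.23", "tcp", 445)
```

In this sketch, a request to the listed file server passes silently, while the same port on a host that does not run the service produces an alert as soon as the packet arrives, without waiting for a timeout at the sender.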
Real-world traces collected on our lab computers showed that most intra-network sessions involved only a limited set of servers and port numbers. The traces in Table 1 were collected on two normal Windows user hosts and one Linux file server. We removed all packets related to L2R (local to remote) sessions and kept only unicast intra-enterprise sessions. Even the busiest host (trace 1) contacted only 14 distinct local IPs in a period of more than six hours, and these local sessions involved 13 TCP and UDP services. Because legitimate local traffic touches so few address and port combinations, a blind scanner should be detected efficiently by the proposed dark port detection.
The universal deployment of a dark port detector on sub-network firewalls facilitates the detection of single random scanners. Detection of a single scanner, however, is not the sole purpose of the dark port detection system and cannot be used as the evidence and rationale for a collaborative containment action. Only when the system detects an increasing number of hosts participating in scanning, and/or timing information indicates actual worm propagation, will the IDS conclude that a worm is spreading in the network. For this purpose, we need a sensitive detector for the propagation of worm infections.
Counting the number of distinct scanning hosts is a straightforward method for worm detection, but a simple count ignores infection timing information, which is an important indicator of threat severity. The inter-arrival time of successive worm scanners may be a good measure for detecting the existence and urgency of worm propagation. For a pure random-scanning worm in a /16 network, the expected time for a new infection can be calculated using a formula similar to equation (4) in [19]:
[Equation for the expected waiting time $T_k$ not recovered from the source.]
Here $N_s$ is the total number of susceptible hosts, $I_{k-1}$ is the number of infected hosts at the current time, and $T_k$ is the expected wait time for the next infection. The formula can be further simplified as $T_k = N / (\sigma I_{k-1} (N_s - I_{k-1}))$, where $N$ is the size of the local network address space ($2^{16}$ for a /16 network) and $\sigma$ is the scan rate of each infected host. It follows that, as long as fewer than half of the susceptible hosts are infected, the expected time for the next infection decreases as the number of infected hosts increases. This phenomenon of decreasing inter-arrival times can be contrasted with the behavior of background scan noise, where no consistent trend in the inter-arrival times of distinct scanners is expected. Using data collected by honeypot computers deployed in an actual enterprise network, we plotted the inter-arrival times of background scanners together with the inter-arrival times of a random scanning worm in a /16 network in Figure 6. To generate the worm inter-arrival times, we set the total number of susceptible hosts to 1,000 and the scan rate per worm victim to 2. From the figure, we can see that although the inter-arrival times of background scanners sometimes fell to a rather low value, they showed no consistent trend over a long period of time, in contrast to the strong decreasing trend in the early stages of worm propagation. This feature was the basis for the worm propagation detection in our emulation experiment.
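The simplified formula can be evaluated directly. The sketch below uses the parameters stated for Figure 6 (1,000 susceptible hosts, 2 scans per second per victim, a /16 address space) and assumes that σ denotes the per-host scan rate; the function name is hypothetical.

```python
def expected_wait(infected: int, susceptible_total: int,
                  address_space: int = 2**16, scan_rate: float = 2.0) -> float:
    """T_k = N / (sigma * I_{k-1} * (N_s - I_{k-1})), in seconds."""
    return address_space / (scan_rate * infected * (susceptible_total - infected))


# Expected waiting times for the first ten infections.
waits = [expected_wait(i, 1000) for i in range(1, 11)]
for i, t in enumerate(waits, start=1):
    print(f"I = {i:2d}: next infection expected after {t:6.2f} s")
```

With a single seed, the first new infection is expected after about 33 seconds, and the wait roughly halves when the infected count doubles. This steadily shrinking gap is the trend that distinguishes early worm propagation from background scan noise.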
When the aggregated level of suspicious scan activity rises above some pre-defined threshold, the central security console decides on appropriate actions to deter possible worm propagation. The set of actions could include doing nothing, blocking individual hosts from initiating any new connection attempt, blocking particular service port numbers from accepting new connections, or blocking all new service requests. The implications of blocking all new service requests, or all requests on some specific port numbers, could be severe, since doing so may disrupt important network transactions within the enterprise boundary. To avoid taking such drastic block actions unnecessarily, some simple intuitive or rule-of-thumb decision rules can be used. For example, when security alert analysis indicates that scanning activity is rather widespread within the network and the majority of scans focus on one service port number, the decision should be to send commands to all security agents to block any new connection requests targeting that particular port number.
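The rule-of-thumb in the example can be sketched as a small decision function. The thresholds `min_cells` and `majority` and the action labels are hypothetical placeholders; the text does not give concrete values, and only this single rule is modeled.

```python
from collections import Counter


def choose_action(alerts, min_cells=3, majority=0.5):
    """Pick a containment action from dark port alerts given as (source cell, port).

    Returns ("block_port", port) when scanning is widespread (at least min_cells
    distinct source cells) and more than `majority` of the scans hit one port;
    otherwise returns ("none", None).
    """
    if not alerts:
        return ("none", None)
    cells = {cell for cell, _ in alerts}
    port, hits = Counter(port for _, port in alerts).most_common(1)[0]
    if len(cells) >= min_cells and hits / len(alerts) > majority:
        return ("block_port", port)
    return ("none", None)


widespread = choose_action([("A", 445), ("B", 445), ("C", 445), ("C", 80)])
isolated = choose_action([("A", 445), ("A", 445)])
```

Restricting the block to the one port that dominates the alerts keeps other services available, which is the motivation for preferring a selective block over a full block.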
We built a prototype defense system that implemented only selective block actions and ran emulation experiments to test its effectiveness in containing a hypothetical local-only scanning worm. The network configuration and the density of the susceptible population were the same as in the propagation experiments in section 5.1. Each infected host sent out one enterprise-wide random scan and one sub-network random scan per second. We assumed zero false negatives in dark port detection, which means that each scan on a non-existent service port was reported to the central security console right after it arrived at the receiving soft firewall. The central security console maintained the following records: the number of dark port alerts per service port, the times at which new distinct IPs sent their first scan, and the number of distinct cells reported as sending scans per service port. The propagation detection method we used in the emulation was based on the decreasing inter-arrival times of infections introduced in the last section. The actual algorithm takes the form of a sequential likelihood ratio test; its details and evaluation will be published in another paper. When the worm propagation measurement rises above the threshold, the central security console issues a selective block command on the related port(s) to all soft firewalls. In real-world applications, the thresholds on the rate of new infections and on the number of distinct scan-sending cells should be adjusted according to the local network traffic to achieve satisfactory detection performance. Figure 7 shows the effect of our containment.
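The three records kept by the central security console can be sketched as a small data structure. The class, method, and field names are hypothetical, and the sequential likelihood ratio test itself is not reproduced here, since its details are not given in this paper; the sketch only derives the scanner inter-arrival times that such a test would consume.

```python
from collections import defaultdict


class ConsoleRecords:
    """Hypothetical sketch of the bookkeeping at the central security console."""

    def __init__(self):
        self.alerts_per_port = defaultdict(int)  # dark port alerts per service port
        self.first_seen = {}                     # source IP -> time of its first scan
        self.cells_per_port = defaultdict(set)   # distinct scanning cells per port

    def record(self, time: float, src_ip: str, cell: str, port: int) -> None:
        """Store one dark port alert reported by a soft firewall."""
        self.alerts_per_port[port] += 1
        self.first_seen.setdefault(src_ip, time)  # keep only the first sighting
        self.cells_per_port[port].add(cell)

    def inter_arrival_times(self):
        """Gaps between the first scans of successive distinct scanners."""
        times = sorted(self.first_seen.values())
        return [later - earlier for earlier, later in zip(times, times[1:])]


console = ConsoleRecords()
console.record(0.0, "10.0.1.5", "A", 445)
console.record(4.0, "10.0.2.9", "B", 445)
console.record(5.0, "10.0.1.5", "A", 445)  # repeat scanner: no new arrival
console.record(6.0, "10.0.3.7", "C", 445)
gaps = console.inter_arrival_times()
```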
As shown in the figure, collaborative containment performed well and outperformed individual blocking or address blacklisting, one of the few workable containment strategies for this local scanning worm. The dark port collaborative containment activated a selective port block on all hosts after a few hosts were infected, and no host was infected after the containment. Independent individual containment with a delay of 10 seconds resulted in a final size of about 60 infected hosts. Detection based only on failed scans is also less prone to counter-detection through benign scans forged by a "smart" worm. The drawback of a selective block on all cells is the potential service disruption on normal hosts. To fully account for the loss of network service caused by worm containment actions and to find an optimal defense strategy, we will need a quantitative evaluation framework for cost-benefit analysis of different containment strategies.
Testbed emulation experiments demonstrated the efficiency of locally scanning worms with advanced scanning strategies. Defending against this threat requires early detection and collaborative containment. Selective block containment based on a new dark port detection scheme showed promise in this regard in experiments run on the DETER testbed. In this article, we also presented the evaluation results of another collaborative containment scheme, which targets fast scanning worms. The experimental results demonstrated the effectiveness of collaborative containment against worm propagation.
Our future work includes fine-tuning the selective block strategy to reduce service disruption. Improving the communication security between collaborating security agents is another important task. We will also explore related issues of botnet emulation and defense evaluation.