Figure 3 illustrates an early deployment stage, where only eBay and Stanford have deployed RR and AITF. Traffic sent by Stanford hosts to eBay has path {* Agw Vgw eBay}. A malicious node spoofs this path and sends to eBay high-rate traffic that appears to be coming from Stanford. Vgw asks Agw to stop the flow; although Agw agrees, Vgw continues to see high-rate traffic from Stanford; it falsely concludes that Agw is non-cooperative and blocks all traffic from Stanford to eBay.

To prevent this abuse, we augment the recorded path with randomized components. Now each RR-enabled border router that forwards a packet writes on the packet (i) its IP address and (ii) a random value that depends on the packet's destination. By picking a sufficiently large and properly random value, a router can make it arbitrarily hard for malicious off-the-path nodes to guess it (see Section 5.4). For example, in Figure 3, traffic sent by Stanford to eBay has recorded path {* Agw:R1 Vgw:R2 eBay}, where R1 and R2 are random values inserted by Agw and Vgw, respectively. To avoid keeping per-destination state, each router computes the random value to insert in each packet as

R = hash(key, D),

where key is a local key, hash is a keyed hash function, and D is the packet's destination.

When Agw receives a request to block undesired flow F{P}, it checks whether it indeed forwarded F, i.e., whether P includes the correct random value. If so, Agw commits to filter F by responding to the handshake as described in Section 3.4. Otherwise, Agw responds with the "authentic" path P', i.e., the path that includes the correct random value. So, if the victim's gateway sends a filtering request with a spoofed path, the handshake consists of just two messages, depicted in Figure 4.
The victim's gateway uses the authentic path P' to recognize spoofed traffic that claims to be coming from Agw and block it. For example, in Figure 3, Vgw blocks everything that appears to be forwarded by Agw but does not include Agw's correct random value R1, i.e., Vgw blocks traffic with path {* Agw:!R1 Vgw eBay}. Note that in the specific example, Vgw can only block this traffic locally -- escalation is not an option, since no other domain has deployed AITF. We defer answering the following questions to later sections: (i) Exactly how large must each random component be? (ii) A router computes its random values based on a local key; how often does this key expire? (iii) When the local key changes, all random values communicated to victim gateways as part of the handshake become invalid; does the router notify the corresponding victim gateways?
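To make the random-component computation concrete, here is a minimal sketch of one way a border router could implement R = hash(key, D). The paper specifies only a keyed hash of the packet's destination; the choice of HMAC-SHA1, the 64-bit truncation (Section 5.4), and the key handling below are our assumptions.

```python
import hashlib
import hmac
import socket

# Hypothetical local key; in AITF it would be rotated every Tchange minutes.
LOCAL_KEY = b"border-router-local-secret"

def rr_random_value(dst_ip: str, key: bytes = LOCAL_KEY) -> int:
    """64-bit per-destination random value: a keyed hash of the destination D."""
    dst = socket.inet_aton(dst_ip)                        # D, the packet's destination
    digest = hmac.new(key, dst, hashlib.sha1).digest()    # keyed hash (HMAC-SHA1 here)
    return int.from_bytes(digest[:8], "big")              # truncate to 64 bits

# e.g., the value a gateway would record on every packet it forwards toward 192.0.2.10:
# print(hex(rr_random_value("192.0.2.10")))
```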
5 Evaluation

In this section, we use back-of-the-envelope calculations to show that (i) a victim can have an undesired flow blocked within milliseconds; (ii) the victim's gateway can block a certain number of undesired flows with two orders of magnitude fewer filters than flows and a reasonable amount of DRAM; and (iii) the probability of abusing AITF can be made arbitrarily low.

5.1 Filtering Response Time

We define filtering response time Tfr as the time that elapses from the moment the victim sends a filtering request against an undesired flow until the victim stops receiving the flow. With AITF, this is equal to the one-way delay from the victim to the victim's gateway plus the negligible overhead of processing the filtering request and installing the corresponding temporary filter. I.e., the filtering response time is a few milliseconds.

If there are compromised routers on the flow's path, they can agree to filter the flow and later break the agreement -- recall that each attack gateway is given two chances to cooperate. This results in the victim receiving "spikes" of the undesired flow after Tfr. Spikes are spaced out by at least Ttmp seconds; the number of spikes is bounded from above by 2×n, where n is the number of compromised routers on the flow's path. The effect of these spikes on the victim is insignificant, as we demonstrate with later simulation results.

5.2 Filtering Rate, Capacity and Gain

In this section, we examine three basic AITF properties: how much attack traffic it blocks, how fast it does so, and at what cost. To quantify these properties, we use three simple metrics: the filtering rate of a router is the number of flows that the router can block every second; the filtering capacity is the number of flows that the router can keep blocked simultaneously; the filtering gain is the number of blocked flows divided by the required number of filters:

G = (number of blocked flows) / (number of used filters).

For example, G = 1 flow/filter means that the router uses 1 filter to keep 1 flow blocked.

Victim's gateway -- best-case scenario: Assume all attack gateways cooperate and all attack sources comply with their gateways to avoid disconnection. In this case, every time Vgw satisfies 1 filtering request, it spends 1 filter for Ttmp seconds and causes 1 flow to be blocked for Tlong minutes. Thus, if Vgw uses Nfilters filters, it achieves filtering rate Nfilters / Ttmp flows/sec, filtering capacity Nfilters×Tlong / Ttmp flows, and filtering gain

G = Tlong / Ttmp flows/filter.

For example, assume Ttmp = 1 sec and Tlong = 10 min. With 10,000 filters, Vgw blocks 10,000 flows/sec and keeps 6,000,000 flows blocked simultaneously. If certain attack gateways and/or attack sources do not cooperate, undesired flows occur more than once. Suppose each undesired flow occurs on average n times. In this case, Vgw spends n temporary filters to cause 1 flow to be blocked for Tlong minutes. Thus, filtering rate, capacity and gain drop by a factor of n. For example, if all attack sources lie to their gateways, each undesired flow occurs twice. Then, with 10,000 filters, Vgw blocks 5,000 flows/sec and keeps 3,000,000 flows blocked simultaneously.

Victim's gateway -- worst-case scenario: Now assume that none of the attack gateways cooperates, filter utilization exceeds its threshold, Vgw escalates all filtering requests, and all escalation attempts fail. In this situation, Vgw blocks traffic from all attack gateways to the victim locally, which means that with 1 filter, Vgw keeps 1 flow blocked. I.e., Vgw achieves filtering gain G = 1.
So, the maximum number of filters ever needed on Vgw to protect a single victim equals the total number of potential attack gateways. Note that this holds even if all undesired flows are spoofed -- Vgw ends up accepting one flow from each alleged attack gateway, so it still needs as many filters as there can be attack gateways. Surprisingly, the number of potential attack gateways is only a few tens of thousands. According to BGP data from Route Views [3] (retrieved in February 2005), there are currently 19,230 Autonomous Systems (ASes); of these, the vast majority correspond to edge domains, while the rest are Internet Service Providers (ISPs). We processed the data with Gao's algorithm for inferring AS relationships [15] to get the number of providers per edge domain. Assuming a separate border router per customer-provider pair, there are only tens of thousands of border routers that could act as attack gateways.

To summarize, there may be millions of attack sources, but there are only tens of thousands of edge domains to host them. A router cannot accommodate a million filters, but it can accommodate tens of thousands. So, if eBay is under attack, eBay's gateway may be unable to locally block each attack source individually, but it is able to locally block each edge domain individually -- i.e., each edge domain that does not cooperate to block its own misbehaving clients.
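The best-case numbers of Section 5.2 can be reproduced with a few lines of arithmetic. The sketch below is ours (the paper gives only the formulas); the function name and parameters are illustrative.

```python
# Back-of-the-envelope check of the Section 5.2 metrics, using the paper's
# example values (Ttmp = 1 sec, Tlong = 10 min, 10,000 filters).
def vgw_filtering_metrics(n_filters, t_tmp_sec, t_long_sec, occurrences_per_flow=1):
    rate = n_filters / (t_tmp_sec * occurrences_per_flow)     # flows blocked per second
    gain = t_long_sec / (t_tmp_sec * occurrences_per_flow)    # blocked flows per filter
    capacity = n_filters * gain                               # flows kept blocked at once
    return rate, capacity, gain

# All parties cooperate: 10,000 flows/sec, 6,000,000 flows, gain 600.
print(vgw_filtering_metrics(10_000, t_tmp_sec=1, t_long_sec=600))
# All attack sources lie, so each flow occurs twice: 5,000 flows/sec, 3,000,000 flows, gain 300.
print(vgw_filtering_metrics(10_000, t_tmp_sec=1, t_long_sec=600, occurrences_per_flow=2))
```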
If an attack source does not comply, the corresponding flow recurs before Tlong minutes elapse, Agw completes a second handshake, and it again spends a temporary filter for Ttmp seconds. However, the attack source gets disconnected, which means that it does not come back online unless it has been cleaned and patched. Thus, if attack sources do not comply, the filtering gain of the attack gateway actually increases, because infected hosts get disconnected, and Agw does not have to filter their traffic again.

5.3 DRAM Requirements

Shadow memory: Both Vgw and Agw keep a shadow filter for each flow that has been blocked. So, the maximum number of used shadow filters equals the maximum number of flows simultaneously blocked (i.e., the filtering capacity). Each shadow entry has the form {A Agw:R ... Vgw V}. Assuming R is 64 bits long (see Section 5.4) and flows are classified/filtered based on the last 6 components of their path, each shadow entry is 320 bits wide (2×32 bits for the IP source and destination, 6×32 bits for the non-random RR path components, and 64 bits for R). So, to keep a million flows blocked, Vgw needs about 40 MB of DRAM. Note that DRAM is not the resource bottleneck; if it were not for the limited number of wire-speed filters, 40 GB of shadow memory would be enough to keep 1 billion undesired flows blocked.

Long-term filters on attack source: An attack source is not necessarily a malicious or compromised node; it is simply the sender of a traffic flow deemed undesired by its recipient. An "innocent" host classified as an attack source needs long-term filters to remember which receivers do not want its traffic (and avoid disconnection). Specifically, to satisfy n requests/sec from its provider, the host needs n×Tlong filters. However, as opposed to wire-speed filters on routers, software filters on end-hosts are not a scarce resource. Note that A does not risk disconnection for not satisfying requests beyond the agreed rate -- a correctly functioning provider does not overload a customer with filtering requests and then disconnect the customer for failing to satisfy them.

5.4 Probability of Abuse

One potential attack against AITF is to try to guess the randomized RR header. For example, consider the topology of Figure 3 and suppose a set of malicious nodes spoof Stanford addresses and send high-rate traffic to eBay. Vgw contacts Agw, gets the random value recorded by Agw on all packets to eBay, and blocks all spoofed traffic. However, by sending a sufficiently large number of messages to eBay, the malicious nodes can try to guess the correct random value by brute force. If they succeed, they can pretend to be Stanford hosts and potentially cause all traffic from Stanford to eBay to be classified as undesired and, thus, blocked.

To limit the probability of such abuse, Agw changes the process by which it computes its random values every Tchange minutes. If the random value is N bits long, to guess it with probability p, the malicious nodes must send p×2^N messages; the time it takes to send that many messages is bounded from below by the number of messages divided by eBay's maximum packet reception rate B. So, if the random value is N bits long, the malicious nodes can guess it during one Tchange interval with probability p = Tchange×B / 2^N.
The probability that the malicious nodes guess one random value within Tguess minutes is the probability that they guess the random value in any of the Tguess / Tchange intervals:

P = 1 - (1 - p)^(Tguess / Tchange) ≈ (Tguess / Tchange) × p, for small p.

For example, if eBay is connected through a 1 Gbps link, it receives up to 1.95 million packets/sec (to make our analysis conservative, we assume that eBay responds to all the messages sent by the malicious nodes; in reality, if a set of nodes send such high-rate traffic to eBay, eBay will consider their traffic undesired and use AITF to have it blocked). Suppose Tchange = 10 minutes and N = 64 bits. The probability that the malicious nodes guess one random value in a month is 2.74×10^-7.

Changing the process that computes the random value creates the following problem: Suppose Agw has communicated random values to a number of victim gateways; the moment it changes the process, all communicated random values become invalid. To avoid this problem, when Agw responds to a handshake, it communicates to the victim's gateway not only its current random value, but also the next Tlong / Tchange random values and when they will be valid. By choosing Tchange = Tlong, Agw needs to pre-compute only one random value. We should note that the size of the random value cannot be chosen solely based on the desired probability of abuse; it also affects the bandwidth overhead introduced by RR (because each RR-enabled border router adds a random value to each forwarded packet). We discuss RR bandwidth overhead in the Appendix, Section A.4.

Another potential attack against AITF is to try to compromise the 3-way handshake. For example, consider the topology of Figure 1 and suppose a set of malicious nodes pretend to be V's gateway and initiate 3-way handshakes with Agw, asking it to block all traffic to V. Assuming the malicious nodes are not on the path from Agw to V, they do not see Agw's responses and the included nonce, which is necessary to complete the handshake. However, by sending a sufficiently large number of messages, they can guess the correct nonce by brute force. Following reasoning similar to the above, the probability of guessing one nonce within Tguess minutes depends on Agw's maximum packet reception rate. For example, if Agw is connected through a 10 Gbps link, it receives up to 19.5 million packets/sec. Suppose Tchange = 10 minutes and the nonce size is 64 bits. The probability that the malicious nodes guess one nonce in a month is 2.74×10^-6. For a 128-bit nonce, the probability becomes 1.3×10^-25. Note that, unlike the random value, the nonce is not included in every packet forwarded by Agw; hence, the incentive to keep it small is less relevant.
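The probabilities quoted above can be checked with a short script. This is our own sketch, using the reconstructed formula and the linear approximation for small p; the packet rates are the ones given in the text (1.95 and 19.5 million packets/sec).

```python
# Numeric sketch of the brute-force probabilities in Section 5.4:
# p = Tchange*B / 2^N per interval, and an attacker gets Tguess/Tchange intervals.
def guess_probability(n_bits, pkts_per_sec, t_change_min, t_guess_min):
    p_per_interval = (t_change_min * 60 * pkts_per_sec) / 2 ** n_bits
    intervals = t_guess_min / t_change_min
    return min(1.0, intervals * p_per_interval)   # linear approximation for tiny p

MONTH_MIN = 30 * 24 * 60
# 64-bit random value, victim behind a 1 Gbps link (~1.95 Mpps):    ~2.7e-7
print(guess_probability(64, 1.95e6, t_change_min=10, t_guess_min=MONTH_MIN))
# 64-bit nonce, attack gateway behind a 10 Gbps link (~19.5 Mpps):  ~2.7e-6
print(guess_probability(64, 19.5e6, t_change_min=10, t_guess_min=MONTH_MIN))
# 128-bit nonce:                                                    ~1e-25
print(guess_probability(128, 19.5e6, t_change_min=10, t_guess_min=MONTH_MIN))
```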
5.5 Good Traffic Lost to Escalation

Escalation blocks all traffic from a non-cooperative attack gateway Agw to the victim; clearly, this can lead to loss of good traffic. The decision to escalate or not is made by the victim, because the victim is the only one who can quantify the value of Agw's good traffic versus the damage caused by Agw's attack traffic. For example, suppose eBay is under attack by a million attack sources, all connected through AOL; at the same time, it is serving 1,000 legitimate AOL clients. If the AOL gateway does not cooperate, only eBay can decide whether serving the 1,000 good AOL clients is worth sustaining the 1,000,000 bad ones.

Whether a flow is "escalatable" or not depends on the policy module. Hence, we cannot compute a general estimate of the percentage of good traffic lost to escalation. However, we do implement a simple policy module in our simulation, and show how its decisions affect the victim's good traffic.
6 Simulation Results

We use real Internet routing table data to build a realistic simulation topology. We simulate different attack scenarios, where multiple attack sources (up to a million) attack a single victim, connected through a 100 Mbps link; the victim's gateway uses up to 10,000 filters to protect the victim. For each scenario, we plot the bandwidth of the attack traffic that reaches the victim as well as the victim's goodput as a function of time, i.e., we show how fast attack traffic is blocked and how much of the victim's goodput is restored.

6.1 Framework

We built our simulator within the Dartmouth Scalable Simulation Framework (DaSSF) [2]. To create our topology, we downloaded Internet routing table data from the Route Views project site [3]. We map each AS and each edge network to a separate AITF domain -- we derive AS topology and peering relationships by applying Gao's algorithm for inferring inter-AS relationships [15] to the Route Views data; we derive edge network topology by roughly creating one edge network per advertised class A and class B prefix. Each AITF domain is represented by one AITF router. AITF routers are interconnected through OC-192 (9.953 Gbps) and OC-48 (2.488 Gbps) full-duplex links. End-hosts are connected to their routers through Fast (100 Mbps) and Thin (10 Mbps) Ethernet full-duplex links. Internet round-trip times average 200 msec. Host-to-router round-trip times average 20 msec. In all scenarios, Ttmp = 1 sec and Tlong = 2 min.

6.2 Filtering Response Time

Our first experiment demonstrates that AITF achieves a filtering response time equal to the one-way delay from the victim to its gateway, i.e., on the order of milliseconds. It also demonstrates that "lying" gateways are quickly detected and blocked; the worst each lying gateway can do is cause up to two "spikes" spaced out by at least Ttmp seconds.

Scenario 1: The victim receives a flooding attack by 10,000 attack sources, each behind its own attack gateway. The bandwidth of the attack (before defense) is 1 Gbps. All attack gateways are lying, i.e., they agree to block their undesired flows and then break the agreement.
Figure 5 illustrates that Vgw blocks attack traffic within milliseconds from the moment the attack is detected; attack traffic recurs twice and is completely blocked within 2 seconds. Vgw gives two chances to each attack gateway to honor its filtering agreement; when the attack gateways break their agreements twice, all their traffic to V is blocked. We run the experiment only for 10,000 undesired flows (which allows the victim to have all of them blocked within 1 sec), so that the "spike" effect due to the recurring flows is visible.

6.3 Filtering Gain

The next two experiments demonstrate that the victim's gateway can achieve filtering gain on the order of hundreds, i.e., the victim's gateway blocks two orders of magnitude more flows than it uses filters.

Scenario 2: The victim receives a flooding attack by 100,000 attack sources. The bandwidth of the attack (before defense) is 1 Gbps. The victim's goodput (before the attack) is approximately 80 Mbps. All attack gateways cooperate.

Scenario 3: Similar to scenario 2, but the victim receives a flooding attack by 1,000,000 attack sources.
Figures 6 and 7 show that, using 10,000 filters, Vgw blocks 100,000 flows in 10 seconds and 1,000,000 flows in 100 seconds. Without AITF, a router needs a million filters to block a million flows; these experiments demonstrate that, with AITF, Vgw needs only ten thousand filters to block a million flows. Hence, AITF reduces the number of filters required to block a certain number of flows by two orders of magnitude -- a critical improvement, since routers typically accommodate tens of thousands of filters, whereas DDoS attacks can easily consist of millions of flows. Figure 7 also reveals what happens after Tlong = 2 minutes. We assume that attack sources are "smart", i.e., they pause sending an undesired flow when so requested (to avoid disconnection) and they restart after Tlong = 2 minutes.

6.4 Escalation and Lost Goodput

The last two experiments illustrate the trade-off involved in policy decisions regarding non-cooperative gateways. The victim's gateway may decide to escalate and lose all traffic from a non-cooperative gateway to the victim; alternatively, it may decide to locally block as many undesired flows as possible and let the rest through. Although we did not implement a complete policy module, we choose two "extreme" scenarios, each favoring a different decision, implement the best policy for each scenario, and show the results.

Scenario 4: The victim receives a flooding attack, but, this time, half of the attack gateways are non-cooperative. Good and bad sources are collocated, i.e., evenly distributed behind the same gateways. The attack comes from 100,000 attack sources; attack bandwidth (before defense) is 1 Gbps. The victim's goodput (before the attack) is approximately 80 Mbps.
Figure 8 shows that, using 10,000 filters, Vgw restores 50% of the victim's goodput in 6 seconds. In this scenario, attack traffic has 10 times the rate of good traffic. The policy module chooses to block all traffic from non-cooperative attack gateways. This cannot make things any worse, since most good traffic is being dropped anyway due to the flood; on the contrary, it allows good traffic from cooperative gateways to get through. Scenario 5: In this scenario, all attack gateways are non-cooperative, and good and bad sources are collocated. However, the attack comes from fewer attack sources (20,000) and consumes lower bandwidth (160 Mbps, before defense). The victim's goodput (before the attack) is approximately 80 Mbps.
Figure 9 shows that, using 10,000 filters, Vgw restores 75% of the victim's goodput in a few milliseconds from the moment the attack is detected. This scenario is trickier than the previous one, because attack traffic has only twice the rate of good traffic. In this case, escalating would be disastrous -- it would drop goodput to 0, because good and bad sources are collocated. The policy module chooses to block as many flows as possible locally and let the rest through. This results in half the attack traffic being dropped and half getting through, which allows only 75% of the victim's goodput to get through. I.e., the victim prefers to sustain some attack traffic from non-cooperative attack gateways, rather than sacrificing all their good traffic.
7 Deployment

This section describes how AITF can be deployed in today's Internet. We also consider the incentives for early adopters and compatibility with legacy hosts.

7.1 Model

Rather than requiring every Internet router to support AITF, it is sufficient for the border routers between administrative domains to support it. We introduce the notion of an AITF domain as an administrative domain whose border routers are AITF-enabled.

An AITF domain has a filtering contract with each local end-host and peering domain. Such a contract specifies a maximum filtering request rate, i.e., the maximum rate at which the AITF domain can send/receive requests to block undesired flows to/from each end-host and peering domain. An AITF domain enforces the specified rates and indiscriminately drops messages from an end-host/domain when that party exceeds the agreed rate.

In a way, an AITF domain is the "dual" of a BGP Autonomous System (AS): ASes exchange routing information, which communicates their willingness to relay certain packets. Similarly, AITF domains exchange filtering information, which communicates their unwillingness to receive certain packets. However, an AITF domain differs from an AS, in that it exchanges messages with other AITF domains that are not adjacent to it -- recall that the victim's gateway talks directly to the attack gateway. It seems natural for every AS to map to a separate AITF domain. Popular public-access sites, like eBay or Amazon, may even deploy AITF on their corporate firewalls, to avoid sharing a victim's gateway with other customers of their AS.

Our position is that the filtering contract should be part of the Service Level Agreement (SLA) signed between the customer and the service provider. In this way, when a provider agrees to provide a certain amount of bandwidth to a customer, it also agrees to satisfy a certain rate of filtering requests coming from that customer. At the same time, the customer agrees to satisfy a certain rate of filtering requests coming from the provider. The customer-provider pair can be an end-host and an edge network, or an edge network and an ISP, or even a small ISP and a backbone network.

7.2 Incentive for Initial Deployment

It makes sense for AITF deployment to begin at the edges: An edge network that hosts potential DDoS victims (e.g., a web hosting domain) deploys AITF to protect its clients; an edge network that hosts potential attack sources (e.g., a campus network or a dialup provider) deploys AITF to maintain its connectivity to public-access sites, even when these sites are under attack.

For example, in Figure 3, we depicted an early deployment stage, where only eBay and Stanford have deployed AITF. Suppose a worm compromises millions of attack sources, uniformly distributed across the Internet, and commands them to send high-rate traffic to eBay. Without AITF, there is nothing eBay can do; almost all its good traffic is lost in the flood. With AITF, eBay identifies the undesired flows coming from Stanford; eBay's gateway exchanges a handshake with Stanford's gateway, which agrees to block its undesired flows to eBay; eBay's gateway accepts all traffic from Stanford and drops (or rate-limits) all the rest -- most good traffic to eBay is lost anyway due to the attack, so dropping traffic from AITF-unaware domains cannot make things worse. I.e., Stanford is the only domain that cooperates with eBay and, hence, the only domain to maintain its connectivity to eBay throughout the attack.
To conclude, the first domains to deploy AITF benefit, because they preserve their connectivity to each other in the face of DDoS attacks. As AITF deployment spreads, the benefit to AITF-enabled domains grows, because they maintain their connectivity to larger and larger portions of the Internet.

7.3 Compatibility with Legacy Hosts

AITF involves a rather draconian measure against attack sources: Either they stop sending an undesired flow or they get disconnected. One could argue that a malicious node M can abuse this measure to disconnect a legacy host L: M sends a filtering request to L's gateway to block all traffic from L to M; L's gateway sends a similar request to L (which is ignored, since L is AITF-unaware); then M tricks L into sending it traffic (e.g., sends an ICMP request); as a result, L's gateway disconnects L.

This scenario cannot happen. Recall that an AITF domain has a filtering contract with each of its customers; the contract specifies the maximum rate at which the domain sends filtering requests to the customer. An AITF-unaware customer by definition has agreed to rate 0; hence, its provider never sends filtering requests to it. A domain that deploys AITF either forces its end-hosts to deploy AITF as well, or accepts the cost of filtering their undesired flows; for example, L's gateway locally blocks all traffic from L to M for Tlong minutes. The second option is more incrementally deployable, but requires more filtering resources from an attack gateway -- namely, as many filters as undesired flows generated by its end-hosts. As a compromise, a provider can charge legacy customers that do not support AITF for the potential cost induced by their inability to block their undesired traffic themselves.
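The "rate 0 for AITF-unaware customers" rule falls out naturally if the filtering contract of Section 7.1 is enforced per customer. The paper does not prescribe a mechanism; the sketch below is one plausible implementation using a token bucket per end-host or peering domain (the class and names are ours).

```python
import time

class FilteringContract:
    """Per-customer enforcement of the agreed maximum filtering request rate."""

    def __init__(self, max_requests_per_sec: float, burst: int = 10):
        self.rate = max_requests_per_sec     # 0 for an AITF-unaware legacy customer
        self.burst = burst
        self.tokens = float(burst) if max_requests_per_sec > 0 else 0.0
        self.last = time.monotonic()

    def admit(self) -> bool:
        """True if this filtering request is within the contract; False means drop it."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# contracts = {customer_id: FilteringContract(agreed_rate)}; a legacy host gets
# FilteringContract(0), so no filtering request addressed to it is ever forwarded.
```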
8 Discussion of Additional Attacks

Malicious nodes collocated with the victim: The handshake between the victim's gateway and the attack gateway protects only the communication between these two entities; it does not protect the communication between the victim and the victim's gateway. A malicious end-host located on the same LAN as end-host V can clearly spoof V's address, send filtering requests as V, and disrupt V's communications.

An AITF domain can easily avoid such abuses by either preventing source address spoofing in its own network or authenticating local filtering requests. Note that the latter does not require any public key infrastructure; it only requires that each border router of an AITF domain share a secret with each of the domain's customers.

DDoS against AITF: One could argue that a set of malicious nodes can launch the following attack against router Agw: First, they pretend to be victim gateways and send a high rate of filtering requests to exhaust Agw's filters. Once Agw's filters are exhausted, a malicious node located behind Agw is commanded to start an undesired flow against, say, Google; Google asks Agw to stop, Agw has no filters left, and it gets disconnected from Google.

The first thing to note is that the attack gateway uses the 3-way handshake to verify the "authenticity" of a filtering request, i.e., that the requester is on the path to the alleged victim. Once a request is deemed authentic, the identity of the requesting victim gateway is established, and the attack gateway can accept or drop the corresponding request based on that. The attack gateway satisfies up to a certain rate of requests from each victim gateway, to avoid exhausting all its filters to satisfy a few demanding domains.

Now consider a set of malicious nodes seeking to attack router Agw. One way is to send to Agw a high rate of authentic filtering requests; the only thing they accomplish is to exhaust the quota of their own domains (note that an AITF-enabled border router can easily prevent its own clients from pretending to be victim gateways and exhausting its quota, simply by dropping all outgoing filtering requests -- recall that no node other than the border router itself is supposed to send filtering requests outside the local domain). The other way is to send to Agw a high rate of spoofed filtering requests; these requests are simply dropped. The only harm the malicious nodes can do is launch a SYN-flood-style attack, i.e., flood Agw with fake requests, hoping to exhaust the processing cycles devoted to 3-way handshakes. There are two steps to dealing with such attacks. The first is to implement the part of the attack gateway algorithm that deals with the 3-way handshake in the fast path -- it just involves hashing certain contents of a received message and comparing the result to a value included in the message. The second is to let Agw act as the victim: if a set of malicious nodes manage to flood Agw's AITF (or any other) hardware module, then Agw uses the RR headers of incoming traffic to identify the undesired flows and asks its own gateway to have them blocked.

Malicious on-the-path nodes: The handshake between the victim's gateway and the attack gateway prevents a malicious node M from installing a filter on router Agw to block certain traffic to router Vgw, as long as M is off the path from Agw to Vgw. A malicious node on the path from Agw to Vgw can clearly forge filtering requests and disrupt Agw-Vgw communication.
However, a malicious node on the path from Agw to Vgw can only be an Internet core router; such a malicious router can disrupt Agw-Vgw communication anyway, e.g., by blocking all their traffic. I.e., if a core router gets compromised, thousands of domains that connect through that router are at its mercy -- it makes little difference whether the router is AITF-enabled or not.
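As an illustration of the fast-path handshake check mentioned above (hash the contents of a received message and compare against an included value), here is a sketch of a stateless, SYN-cookie-style nonce. The field names, key handling, and use of HMAC-SHA1 are our assumptions, not the paper's wire format.

```python
import hashlib
import hmac

HANDSHAKE_KEY = b"agw-local-handshake-secret"     # known only to the attack gateway

def make_nonce(victim_gw_ip: str, flow_descriptor: bytes) -> bytes:
    """Nonce returned in the 2nd handshake message; no per-request state is kept."""
    msg = victim_gw_ip.encode() + b"|" + flow_descriptor
    return hmac.new(HANDSHAKE_KEY, msg, hashlib.sha1).digest()[:8]

def verify_final_message(victim_gw_ip: str, flow_descriptor: bytes, nonce: bytes) -> bool:
    """Fast-path check on the 3rd message: recompute the hash and compare."""
    return hmac.compare_digest(make_nonce(victim_gw_ip, flow_descriptor), nonce)
```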
9 Related Work

Packet Marking: The alternative to route record is probabilistic packet marking (PPM) [21,14]: Each participating router marks each forwarded packet with a certain probability p; a DDoS victim combines marks from multiple packets to identify the routers that forward attack traffic. Instead of using a separate header, PPM uses a lightly utilized IP header field (typically the 16-bit IP identifier); doing so facilitates deployment and decreases packet overhead. Although a promising and incrementally deployable traceback technique, PPM is less useful in actually blocking attack traffic. Identifying the routers that forward attack traffic is not enough; the victim's gateway Vgw must identify their traffic in order to block it. Even if attack gateway Agw agrees to block an undesired flow, Vgw must still identify Agw's traffic in order to verify that Agw is honoring the agreement. Therefore, Vgw must be able to identify the path followed by each incoming packet at wire speed. The only way to perform path-based wire-speed filtering is to have each packet's path deterministically recorded on the packet.

One could certainly argue for "compressing" the path -- no need to record one full 32-bit address per border router! In Path Identifier (Pi) [23], each participating router deterministically marks the forwarded packets, so that each packet obtains a "fingerprint" that reflects the entire path followed by the packet. Indeed, this approach enables the victim (or its gateway) to locally block undesired flows in the face of source address spoofing. However, it does not meet the two following requirements: (i) Vgw must know the addresses of the border routers on an undesired flow's path; otherwise, it must locally block all attack flows itself, which is typically beyond its capabilities. (ii) Vgw must be able to block attack traffic at different granularities, e.g., block all packets with path {* Agw ... Vgw V}.

Filtering: The Pushback scheme [18] uses hop-by-hop filter propagation to push filtering of undesired traffic away from the victim: The victim identifies the upstream routers that forward attack traffic to it and sends them filtering requests; the routers satisfy the requests, potentially identify the next upstream routers that forward attack traffic to the victim, and send them similar requests. Each Pushback router installs a single filter per victim and rate-limits all traffic to each victim. The main benefit of this approach is that it does not require knowledge of the attack paths, which makes it deployable without packet marking. However, when attack traffic and good traffic share common paths, and attack traffic is of much higher rate than good traffic (increasingly the case nowadays), rate-limiting all traffic to the victim sacrifices most of the victim's good traffic.

The Stateless Internet Flow Filter (SIFF) [9] divides all Internet traffic into two categories: privileged and non-privileged. A client establishes a privileged channel to a server through a capability exchange handshake; the client includes the capability in each subsequent packet it sends to the server; each router along the path verifies the capability and gives priority to privileged traffic. The main advantage of SIFF is that it does not require any filtering state in the routers. However, once a server is under attack, a new client must contend with attack traffic to establish a connection -- because channel establishment involves an exchange of non-privileged packets.
It also requires counter-measures to prevent malicious nodes from establishing privileged channels between themselves and flooding the network with privileged traffic.

Secure Overlays: The set of AITF-enabled border routers can be viewed as a "filtering overlay". Filtering overlays have also been suggested as a way to prevent DoS against critical applications (like Emergency Services) that are meant to be accessed only by authorized users [16,12]. In that context, the overlay nodes perform client authentication and relay traffic to a protected server, whose IP address is unknown outside the overlay.
10 Conclusion

We presented Active Internet Traffic Filtering (AITF), a mechanism for filtering highly distributed denial-of-service attacks. We showed that AITF can block a million undesired flows, while requiring only tens of thousands of wire-speed filters from each participating router -- an amount easily accommodated by today's routers. It also prevents abuse by malicious nodes seeking to disrupt other nodes' communications.

More specifically, we showed the following:

1. AITF offers filtering response time equal to the one-way delay from the victim to the victim's gateway. I.e., a victim can have an undesired flow blocked within milliseconds.

2. AITF offers filtering gain on the order of hundreds of blocked flows per used filter. I.e., a router can block two orders of magnitude more flows than it has wire-speed filters. For example, suppose eBay is receiving a million undesired flows; with 10,000 filters, eBay's gateway can have all flows blocked within 100 seconds. In the worst-case scenario, eBay's gateway blocks all traffic from each domain that hosts attack sources and refuses to filter their traffic, which (in today's Internet) requires a few tens of thousands of filters.

3. A set of malicious nodes practically cannot abuse AITF to disrupt communication from node A to node B, as long as they are not located on the path from A to B. This holds even during initial deployment, when most Internet domains are AITF-unaware.

The idea behind AITF is that the Internet does have enough filtering capacity to block large amounts of undesired flows -- it is just that this capacity is concentrated close to the attack sources. AITF enables service providers to "gain access" to this filtering capacity and couple it with a reasonable amount of their own filtering resources, in order to protect their customers in the face of increasingly distributed denial-of-service attacks.
11 Acknowledgments

We would like to thank Daniel Faria, Evan Greenberg, Dapeng Zhu, George Candea, Nagendra Modadugu, Costa Sapuntzakis, Tassos Argyros and our shepherd, Mema Roussopoulos, for their constructive feedback.
A Appendix: Route Record

A.1 Header Update

The RR header consists of three fields: the path is a list of (initially empty) slots; the size is the total number of slots in the path; the pointer points to the first empty slot in the path. The first RR-enabled border router on a packet's path (i) inserts an RR header in the packet; (ii) writes its own IP address and random value in the first slot; and (iii) sets the pointer to point to the second slot. Each subsequent RR-enabled border router that ingresses the packet into a new AS (i) reads the pointer; (ii) writes its own IP address and random value in the indicated slot; and (iii) increments the pointer to point to the next slot. If there is no room left in the RR header, the router drops the packet.

A.2 Hardware Support

Each RR-enabled border router updates the RR headers of forwarded packets as described in Section A.1. RR-header update requires (i) computing a keyed hash function on the packet's destination, (ii) reading and modifying the RR pointer, and (iii) writing the router's address and random value in the indicated slot. So, although the RR header has a variable length, its update requires reading/modifying a 4-bit pointer and a 96-bit path component (assuming 64-bit random values). Computing a 64-bit hash (like HMAC-SHA1) per forwarded packet can be easily done today at wire speed [11].

As mentioned in Section A.1, RR headers are not inserted by end-hosts; they are inserted by the first border router on each packet's path -- call it the packet's "gateway". So, when an edge network deploys RR, it upgrades its border routers to support RR-header insertion. A border router determines the required RR header size for each packet by looking up the AS path length to the packet's destination domain, as communicated through BGP. If the real AS path turns out to be longer than the communicated AS path, the packet gets dropped; the packet's gateway receives an ICMP message, increases the RR header size and retransmits the packet -- i.e., AS path length discovery is similar to MTU discovery using the "Don't Fragment" IP-header flag. Once a router discovers the real AS path length to a destination domain, it caches that value to avoid future retransmissions. RR-header insertion is similar to packet encapsulation, a well-studied operation, for which multiple hardware implementations already exist. Note that, as stated in Section 7.2, RR/AITF deployment starts at the edges; i.e., it is edge routers that insert RR headers, not core routers connected to Internet backbones. Edge routers are connected at best through OC-48 (2.488 Gbps) links; the Cisco 10000 Series OC-48c linecard already supports encapsulation.
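The header update in Section A.1 can be summarized in a few lines. The model below is a simplified software sketch (ours, not the paper's wire format): fixed-size slots, a pointer to the first empty slot, and a drop when the path does not fit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class RRHeader:
    size: int                                            # total number of slots
    pointer: int = 0                                     # index of the first empty slot
    path: List[Optional[Tuple[str, int]]] = field(default_factory=list)

    def __post_init__(self):
        self.path = [None] * self.size                   # initially empty slots

def rr_update(hdr: RRHeader, router_ip: str, random_value: int) -> bool:
    """Record this border router; returning False means the packet is dropped."""
    if hdr.pointer >= hdr.size:
        return False                                     # no room left in the header
    hdr.path[hdr.pointer] = (router_ip, random_value)
    hdr.pointer += 1
    return True

# The packet's "gateway" inserts the header, sizing it from the BGP AS-path length:
# hdr = RRHeader(size=bgp_as_path_length); rr_update(hdr, gw_ip, rr_random_value(dst_ip))
```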
A.3 Compatibility with Legacy Hosts

Before inserting an RR header in an outgoing packet, the packet's gateway must make sure that the receiving domain has deployed RR; otherwise, the receiver will not recognize the RR header and will drop the packet. Hence, each packet gateway keeps track of the destination ASes to which it forwards packets, and asks them whether they care to receive RR headers.

To avoid introducing latency, a packet gateway initially forwards packets without inserting RR headers and starts inserting them as soon as it concludes that the destination domain is RR-enabled. To avoid querying the destination domain on each packet, it periodically queries potential destination domains (e.g., once per hour) and caches their responses. Note that the number of potential destination domains can be no bigger than the number of Internet ASes -- 19,230 for the current Internet.
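A sketch of the per-destination bookkeeping described above; probe_async() is a hypothetical primitive for the "do you accept RR headers?" query (the paper does not define the query format), and the one-hour refresh mirrors the example in the text.

```python
import time

CACHE_TTL_SEC = 3600.0                    # re-query a destination AS about once per hour
_rr_capable = {}                          # dest_as -> (accepts_rr, last_checked)

def should_insert_rr(dest_as: int, probe_async) -> bool:
    """Per-packet decision: insert an RR header only once the destination is known RR-enabled."""
    now = time.monotonic()
    entry = _rr_capable.get(dest_as)
    if entry is None or now - entry[1] >= CACHE_TTL_SEC:
        probe_async(dest_as)              # refresh off the fast path; the reply updates the cache
    return bool(entry and entry[0])

def record_probe_reply(dest_as: int, accepts_rr: bool) -> None:
    _rr_capable[dest_as] = (accepts_rr, time.monotonic())
```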
A.4 Bandwidth Overhead

RR bandwidth overhead depends on the average number of ASes per packet path. Although we do not know this number, we can approximate it with the average number of AS-level hops between Internet ASes, which is close to 4 [17,10]. Assuming an average of 4 ASes per Internet path and 64-bit random values, route record introduces on average 49 extra bytes per packet. For an average packet size of 500 bytes [5], this leads to 10% bandwidth overhead.

Bandwidth overhead can be reduced to 5% with the following twist: Instead of each router adding a random value to the recorded path, only one router does so -- the first border router that forwards the packet into an RR-unaware domain. For example, consider the topology in Figure 1 and suppose only ANET, AISP and VNET have deployed AITF. If a packet is sent from A to V, the only router that adds a random value to the packet's header is X. This reduces bandwidth overhead, at the cost of restricting the routing of filtering requests: If V sends a filtering request against A's traffic, that request must be routed through X, so that X can verify the authenticity of the recorded path.
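The overhead figures above follow directly from the field sizes in Appendix A. The arithmetic below is ours; the single byte assumed for the size and pointer fields is an approximation consistent with the 49-byte total quoted in the text.

```python
# Extra bytes added by route record for n_as border routers on the path.
def rr_overhead_bytes(n_as: int, random_value_in_every_slot: bool) -> int:
    addr_bytes, rand_bytes = 4, 8                      # 32-bit address, 64-bit random value
    per_slot = addr_bytes + (rand_bytes if random_value_in_every_slot else 0)
    single_rand = 0 if random_value_in_every_slot else rand_bytes
    return 1 + n_as * per_slot + single_rand           # ~1 byte for the size and pointer fields

for every_slot in (True, False):
    extra = rr_overhead_bytes(4, every_slot)
    print(extra, "extra bytes ->", f"{extra / 500:.0%}", "of a 500-byte packet")
# -> 49 extra bytes (10%), and 25 extra bytes (5%) with a single random value
```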
This paper was originally published in the Proceedings of the 2005 USENIX Annual Technical Conference, April 10-15, 2005, Anaheim, CA, USA.