When deploying Internet servers in a multihomed environment, it is useful to be able to transparently direct connections initiated by external clients over a specific link, according to performance or other metrics. Recently, several route control device vendors have introduced features that use Domain Name System (DNS) resolution requests as a means to direct inbound client traffic over the desired link. In this scheme, it is assumed that the destination IP address used by the client determines which ISP link is used for the connection request. Hence, by responding with the appropriate IP address when the client makes a request to resolve a service name (e.g., www.service.com), the inbound link can be selected. This is very similar to using DNS as a server selection mechanism in content distribution networks [19].
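The selection step can be sketched as follows. This is a minimal, hypothetical illustration of the record-selection logic an authoritative nameserver might apply; the link names, addresses, and latency metric below are our own assumptions, not details from any vendor's implementation.

```python
# Illustrative ISP link table: each entry maps a link to the server
# address reachable over that link and a current performance metric.
# All names and values here are hypothetical.
ISP_LINKS = {
    "isp_a": {"addr": "198.51.100.10", "latency_ms": 40.0},
    "isp_b": {"addr": "203.0.113.10", "latency_ms": 85.0},
}

def select_inbound_address(links):
    """Return the A-record address for the best-performing link.

    Because the client connects to whichever address the nameserver
    returns, this choice implicitly selects the inbound ISP link.
    """
    best = min(links.values(), key=lambda link: link["latency_ms"])
    return best["addr"]

print(select_inbound_address(ISP_LINKS))  # 198.51.100.10 under these metrics
```

In practice the metric could be loss, throughput, or cost rather than latency; the point is only that the answer to a name resolution request encodes the link choice.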
While DNS is a convenient and relatively transparent mechanism, it is unclear whether it can respond quickly enough for dynamic route control. Responses to name resolution requests carry an associated time-to-live (TTL) value that determines how long the response should be cached by the client's local nameserver. Ideally, by setting the TTL to a very small value (e.g., 10 seconds, or even zero), it is possible to force external clients to resolve the IP address frequently, thus providing fast responsiveness. In practice, however, this is complicated by the behavior of the wide variety of applications and DNS servers deployed in the Internet. Many applications perform their own internal DNS caching that does not adhere to the expected behavior, and some older implementations of DNS software have been reported to ignore low TTL values. These artifacts make it difficult to predict how quickly clients will respond to changes communicated via DNS responses.
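The intended caching behavior can be made concrete with a small sketch. The class below models a TTL-honoring stub-resolver cache: once the TTL expires, the cached answer is discarded and the client must re-resolve, which is what gives a low TTL its responsiveness. This is an illustrative model, not the behavior of any particular resolver; as noted above, real applications and servers often deviate from it.

```python
import time

class DnsCache:
    """Minimal TTL-honoring resolver cache (illustrative model only)."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry_time)

    def put(self, name, address, ttl, now=None):
        now = time.monotonic() if now is None else now
        self._entries[name] = (address, now + ttl)

    def get(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]   # still fresh: reuse cached answer
        return None           # expired or absent: client must re-resolve

cache = DnsCache()
cache.put("www.service.com", "198.51.100.10", ttl=10, now=0.0)
print(cache.get("www.service.com", now=5.0))   # fresh: 198.51.100.10
print(cache.get("www.service.com", now=11.0))  # TTL expired: None
```

An implementation that ignores the TTL would keep returning the stale address after expiry, which is exactly the failure mode that makes DNS-based traffic control sluggish.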
In order to quantify the responsiveness of DNS in practice, we perform a simple analysis of client behavior in response to DNS changes during a large Web event. We collect logs from a set of Web caches that served requests for content related to a Summer 2003 sporting event with a global audience. During the event, when the request rate was very high, the authoritative nameservers directed all clients to the set of caches with a 10-minute TTL. After the event, the nameservers were updated to direct clients to lower-capacity origin servers. Ideally, all traffic to the caches should subside within 10 minutes.
Figure 12 shows the aggregate request volume to all caches over time, just before and after the DNS change, where requests were aggregated into 1-minute intervals. During the one-hour period after the DNS change, requests came from about unique client IP addresses and unique IP subnets. The number of subnets is computed by clustering client IP addresses using BGP tables obtained from [11,2].
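The clustering step described above can be sketched with the standard-library `ipaddress` module: each client IP is assigned to its longest matching prefix from the BGP table, and clients sharing a prefix form one subnet. The prefixes and addresses below are illustrative stand-ins for entries from a real BGP table.

```python
import ipaddress

# Hypothetical BGP prefixes; a real table from a routing feed would
# contain hundreds of thousands of entries.
PREFIXES = [ipaddress.ip_network(p) for p in
            ("192.0.2.0/24", "198.51.100.0/22", "198.51.100.0/24")]

def cluster(client_ips, prefixes):
    """Group client IPs by their longest-prefix-match BGP subnet."""
    clusters = {}
    for ip_str in client_ips:
        ip = ipaddress.ip_address(ip_str)
        matches = [net for net in prefixes if ip in net]
        if matches:
            # Longest-prefix match: the most specific covering prefix wins.
            best = max(matches, key=lambda net: net.prefixlen)
            clusters.setdefault(str(best), []).append(ip_str)
    return clusters

clients = ["192.0.2.7", "198.51.100.9", "198.51.101.3"]
print(cluster(clients, PREFIXES))
```

Note that 198.51.100.9 falls under both the /22 and the more specific /24, and is assigned to the /24; 198.51.101.3 is covered only by the /22.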
Figure 12(a) shows the last part of the trace, with a clear peak occurring on the last day of the event, followed by a period of relatively constant and sustained traffic, and finally a sharp drop-off corresponding to the time when the DNS records are updated. Figure 12(b) focuses on the time around the DNS update; the solid line denotes the time of the update and the dashed line the time when the 10-minute TTL expires. Between these times, the request volume decreases by 66%. The remaining third of the traffic decays very slowly over a period of more than 12 hours. While this analysis is not definitive, it does suggest that DNS is at best a coarse-grained mechanism for controlling traffic.