As described earlier, during our HTTP fetch tests, we measure the time of local DNS lookups. When local name servers are having problems, DNS lookups can take many seconds to finish, despite usually taking only a few milliseconds. We further investigate how DNS lookups behave on each proxy by looking at DNS failure rates and average response time for successful queries. If a DNS lookup takes longer than 5 seconds, we regard it as a DNS failure, since this value is the resolver's default timeout.
Figures 13 and 14 show the DNS failure rates and DNS average lookup time for successful queries on 2 of our sampling proxies, ny-1 on east coast and st-1 on west coast. DNS lookup time is usually short (generally well below 50ms), but there are spikes of 50-100ms. Recall that these lookups are only for the controlled set of the intra-CoDeeN ``fetch'' lookups. Since these mappings are stable, well-advertised, and cacheable, responses should be fast for well-behaved name servers. Anything more than tens of milliseconds implies the local nameservers are having problems. These statistics also help to reveal some major problems. For example, the st-1 proxy has a period of 100% DNS failure rate, which is due to the name server disappearing. The problem was resolved when the node's resolv.conf file was manually modified to point to working name servers.
Though DNS failure rates on individual proxies are relatively low, the combined impact of DNS failures on web content retrieval is alarming. Downloading a common web page often involves fetching the attached objects such as images, and the corresponding requests can be forwarded to different proxies. Supposing an HTTP session involves 20 proxies, Figure 15 shows the probability of incurring at least one DNS failure in the session. From the cumulative distribution we can see that for more than 40% of time we have DNS failure probability of at least 10%, which would lead to a pretty unpleasant surfing experience if we did not avoid nodes with DNS problems. Note that these problems often appear to stem from factors beyond our control - Figure 16 shows a DNS nameserver exhibiting periodic failure spikes. Such spikes are common across many nameservers, and we believe that they reflect a cron-initiated process running on the nameserver.
To avoid such problems, we have taken two approaches to reduce the impact of DNS lookups in CoDeeN. The first is a change in redirector policy that is intended to send all requests from a single page to the same reverse proxy node. If a request contains a valid ``referer'' header, it is used in the hashing process instead of the URL. If no such header exists, the last component of the URL is omitted when hashing. Both of these techniques will tend to send all requests from a single page to the same reverse proxy. Not only will this result in fewer DNS lookups, but it will also exploit persistent connections in HTTP. The second modification to reduce the problems stemming from DNS is a middleware DNS brokering service we have developed, called CoDNS. This layer can mask local DNS resolver failure by using remote resolvers, and is described elsewhere [17].