Our work is motivated by a related effort by Shaikh, et al. [15] examining the effectiveness of DNS-based server selection. They developed a method of finding client-LDNS associations using time correlations of DNS and HTTP requests from DNS and Web server logs. However, as they noted, the associations obtained using their method are inherently inaccurate due to clock skews, client DNS caching, and mishandling of TTLs. To resolve ambiguities, they used heuristics based on AS numbers and domain names to decide whether a client and a nameserver did in fact belong together. These heuristics removed misconfigured client-nameserver pairs but did not assure the correctness of the associations. They also obtained a set of 1090 client-LDNS associations from accounts with 9 commercial ISPs to study the proximity correlations.
In comparison, our method provides accurate associations, eliminating any need for validation. Furthermore, our study covers more than 4.2 million associations, with clients from a diverse set of ISPs, far exceeding their data set of 1090 associations.
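To make the ambiguity of time-correlation matching concrete, the following is a minimal sketch of the general idea, not the implementation used in [15]. The log format (timestamp, IP address), the 5-second window, the function name `correlate`, and the sample addresses are all assumptions chosen for illustration.

```python
from collections import defaultdict


def correlate(dns_events, http_events, window=5.0):
    """Pair each HTTP client with every nameserver whose DNS lookup
    preceded the HTTP request by at most `window` seconds.

    Both inputs are lists of (timestamp_seconds, ip) tuples; this
    format and the window size are illustrative assumptions.
    """
    candidates = defaultdict(set)
    for http_time, client_ip in http_events:
        for dns_time, ldns_ip in dns_events:
            # A lookup counts as a candidate only if it happened
            # shortly before the request.
            if 0.0 <= http_time - dns_time <= window:
                candidates[client_ip].add(ldns_ip)
    return dict(candidates)


# Hypothetical log excerpts (documentation-range addresses).
dns_log = [(100.0, "198.51.100.1"), (101.0, "203.0.113.1")]
http_log = [(102.0, "192.0.2.10"), (300.0, "192.0.2.20")]

pairs = correlate(dns_log, http_log)
```

In this example the first client is paired with two candidate nameservers, because both lookups fall inside the window, while the second client is paired with none, as would happen if it were served from a cached DNS response. Both outcomes reflect the kind of uncertainty that makes a separate validation step necessary for this class of method.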
More recently, Bestavros, et al. [4] have also developed a method for finding client-LDNS associations by assigning multiple IP addresses to a Web server and correlating DNS lookups with client IPs based on the server IP used. This method is slow in discovering client-LDNS pairs due to the limited number of IP addresses a Web server can have. In addition, their method is complicated to implement, requiring reassignment of server IPs and modification of the Web server.
Compared to both works, the distinguishing features of our measurement methodology are efficiency, nonintrusiveness, and accuracy. These features allowed us to collect more extensive data, which we used to evaluate the effectiveness of DNS-based server selection using four different proximity metrics in several real-world CDN settings. To our knowledge, we are the first to conduct such an exhaustive proximity evaluation between clients and their local DNS servers using such a representative data set. We are also not aware of other work examining the impact that the proximity between the local DNS server and the client has on DNS-based server selection in commercial CDNs.
There has been a recent effort within the IETF to categorize different mechanisms for request routing in CDNs [3]. DNS-based redirection is one of those mechanisms, and our methodology may prove useful in evaluating the effectiveness of this technique in that context.