Creating and managing a high-performance, Internet-scale Web service is a formidable challenge involving deployment of multiple Web servers in strategic locations throughout the network. The introduction of Content Distribution Networks (CDNs) has allowed organizations to overcome this challenge by outsourcing the distribution of their Web content. With CDNs, content providers need only supply an origin Web server -- the CDN distributes the content to end users through a set of CDN servers it has deployed in the network. Ideally, this reduces Web response times and download latencies in addition to providing overload protection and bandwidth savings.
In a well-designed CDN, servers are placed to avoid congested links and slow network paths. When a Web client requests content, the CDN dynamically chooses a server to which the request is routed, usually one that is appropriately close to the client. Note that this dynamic CDN request routing is an extra step that is not necessary for stand-alone Web servers. Efficient CDN server selection allows CDNs to offset the overhead of the dynamic routing step by taking advantage of improved connectivity to the end user. CDN server selection applies to both static and dynamic content. In the latter case, content can be dynamically assembled at the edge servers [1].
CDNs typically perform dynamic request routing using the Internet's Domain Name System (DNS) [11]. The DNS is a distributed directory whose primary role is to map fully qualified domain names (FQDNs) to IP addresses. To determine an FQDN's address, a DNS client sends a request to its local DNS server. The local DNS server resolves the request on behalf of the client by querying a set of authoritative DNS servers. When the local DNS server receives an answer to its request, it sends the result to the DNS client and caches it for future queries. Each DNS record has a time-to-live (TTL) field that tells the local DNS server how long it may cache the result.
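The caching behavior described above can be illustrated with a minimal sketch. It is not a real resolver: the class name `CachingResolver`, the `authoritative` callback (which stands in for the whole iterative lookup and returns an address with its TTL in seconds), and the injectable `clock` are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy local DNS server: answers from cache until the record's TTL expires."""

    def __init__(
        self,
        authoritative: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical stand-in for querying the authoritative servers;
        # returns (ip_address, ttl_seconds) for a given FQDN.
        self._authoritative = authoritative
        self._clock = clock
        # fqdn -> (ip_address, absolute expiry time)
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, fqdn: str) -> str:
        now = self._clock()
        entry = self._cache.get(fqdn)
        if entry is not None and now < entry[1]:
            # Cached answer is still within its TTL: no upstream query.
            return entry[0]
        # Cache miss or expired record: ask the authoritative side again.
        address, ttl = self._authoritative(fqdn)
        self.upstream_queries += 1
        self._cache[fqdn] = (address, now + ttl)
        return address
```

With a TTL of 20 seconds, two lookups made within that window cause a single upstream query; a lookup made after the TTL has elapsed triggers a second one.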
Normally, an authoritative DNS server's association from FQDNs to IP addresses is static. However, CDNs use modified authoritative DNS servers for CDN server selection. The results of a DNS query to one of these DNS servers may vary dynamically depending on factors such as the source of the request and the condition of the network. Typically, the CDN's authoritative DNS server maps the client's local DNS server address to a geographic region within a particular network and combines that with network and server load information to perform CDN server selection. To enable fast reaction to dynamic resource changes, the answer returned by the CDN's DNS server has a small TTL. This approach is largely transparent to the client, and works for any Web content (including both HTML and streaming media).
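As a rough sketch of this kind of selection, the following code maps a local DNS server address to a region and returns the least-loaded usable server in that region together with a short TTL. The prefix-to-region table, the server list, the load threshold, and the 20-second TTL are all hypothetical values chosen for illustration; real CDNs use proprietary inputs and policies that are not described here.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CdnServer:
    address: str
    region: str
    load: float  # 0.0 = idle, 1.0 = saturated


# Hypothetical mapping from local DNS server prefixes to regions.
REGION_BY_PREFIX = {
    ipaddress.ip_network("198.51.100.0/24"): "us-east",
    ipaddress.ip_network("203.0.113.0/24"): "eu-west",
}

# Hypothetical CDN server inventory with current load estimates.
SERVERS = [
    CdnServer("192.0.2.10", "us-east", 0.95),
    CdnServer("192.0.2.11", "us-east", 0.40),
    CdnServer("192.0.2.20", "eu-west", 0.10),
]

SHORT_TTL_SECONDS = 20  # assumed value; small so answers can change quickly


def region_of(ldns_ip: str) -> Optional[str]:
    """Return the region whose prefix contains the local DNS server address."""
    addr = ipaddress.ip_address(ldns_ip)
    for prefix, region in REGION_BY_PREFIX.items():
        if addr in prefix:
            return region
    return None


def select_server(
    ldns_ip: str, servers: List[CdnServer] = SERVERS, max_load: float = 0.9
) -> Tuple[str, int]:
    """Pick a server using only the local DNS server's address and load data."""
    region = region_of(ldns_ip)
    # Drop overloaded servers, unless that would leave nothing to answer with.
    usable = [s for s in servers if s.load < max_load] or list(servers)
    # Prefer servers in the requester's region; otherwise fall back to any.
    candidates = [s for s in usable if s.region == region] or usable
    best = min(candidates, key=lambda s: s.load)
    return best.address, SHORT_TTL_SECONDS
```

Note that the only location input is `ldns_ip`, the address of the local DNS server. The client's own address never reaches the selection logic.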
Although DNS-based server selection is transparent and general, it has two inherent limitations [15,4]. First, it is based on the implicit assumption that clients are close to their local DNS servers. The CDN DNS server performing dynamic request routing only has access to the client's local DNS server's IP address--it does not know the client's own IP address. However, the assumption that clients are close to their local DNS server may not be valid. For example, the client might be using a local DNS server hierarchy in which the outermost local DNS server that communicates with authoritative DNS servers may be far removed from clients; the client may have been configured with a local DNS server which is far away; or the client may be using a secondary local DNS server that is more distant from it than its primary local DNS server. Therefore, using only the local DNS server information to select CDN servers has the inherent risk of selecting a server farther away from the client than other available CDN servers.
The second inherent limitation of DNS-based server selection is that a single request from a local DNS server can represent differing numbers of Web clients -- this is called the hidden load factor [8]. The hidden load has implications for a CDN's load balancing algorithm. For example, a DNS request from a local DNS server of a large ISP may result in many more Web requests than a DNS request from a local DNS server of a small site. CDNs need to be able to properly weight individual DNS requests to distribute Web requests among their CDN servers. If the hidden load factors are known, load balancing algorithms described by Colajanni, et al. [7,8] can be easily deployed to achieve better load distribution. On the other hand, if the hidden load factors are not known, fine-grained request distribution may be difficult.
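To make the effect of hidden load concrete, the sketch below assigns local DNS servers to CDN servers while weighting each one by an assumed hidden load factor. This is a generic greedy heuristic written for illustration, not one of the algorithms of Colajanni, et al. [7,8]; the function name and the weights are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_by_hidden_load(
    hidden_load: Dict[str, float], servers: List[str]
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Greedily map each local DNS server to the CDN server with the
    least expected Web load so far.

    hidden_load: local DNS server -> expected Web requests per DNS answer
                 (assumed to be known; hypothetical values in practice).
    """
    expected = {s: 0.0 for s in servers}
    assignment: Dict[str, str] = {}
    # Place the heaviest local DNS servers first so they spread out.
    ordered = sorted(hidden_load.items(), key=lambda kv: kv[1], reverse=True)
    for ldns, weight in ordered:
        target = min(servers, key=lambda s: expected[s])
        assignment[ldns] = target
        expected[target] += weight
    return assignment, expected
```

If every DNS request were counted as one unit of load, a large ISP's local DNS server and a small site's would look identical, and a round-robin policy could place two heavy sources on the same CDN server. Weighting by the hidden load factor avoids this, provided the factors are known.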
We study the extent of the first limitation and its impact on CDN server selection. To this end, we developed a simple, non-intrusive, and efficient mapping technique to determine the associations between clients and local DNS servers. We deployed this technique on several sites to collect an extensive data set which we use to study the impact of proximity on DNS-based server selection using four different proximity metrics. We conclude that DNS is good for very coarse-grained server selection, since 64% of the associations belong to the same Autonomous System (AS). DNS is less useful for finer-grained server selection, since only 16% of clients use DNS servers in the same network-aware cluster [13] (based on BGP routing information). We also measure the CDN server distribution of several real-world CDNs to evaluate whether the proximity of a client to its local DNS server leads to potentially suboptimal CDN server selection decisions in practice. Our technique could also be used to determine hidden load factors by associating the HTTP request pattern in the Web server logs with the DNS request information.
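The two coarse proximity measures mentioned above (same AS and same network-aware cluster) can be sketched as follows. The code assumes that a cluster is identified by the longest matching BGP prefix and that each prefix has a single origin AS; the table contents, AS numbers, and function names are hypothetical and do not reproduce the measurement pipeline used in this paper.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

# Hypothetical BGP-derived table: prefix -> origin AS number.
BGP_TABLE = {
    "198.51.100.0/24": 64500,
    "198.51.100.0/25": 64500,
    "203.0.113.0/24": 64501,
}
_PARSED = [(ipaddress.ip_network(p), asn) for p, asn in BGP_TABLE.items()]


def longest_match(ip: str) -> Optional[Tuple[ipaddress.IPv4Network, int]]:
    """Return (prefix, origin AS) for the longest prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, asn) for net, asn in _PARSED if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


def proximity_fractions(
    associations: Iterable[Tuple[str, str]]
) -> Tuple[float, float]:
    """Fractions of (client, local DNS server) pairs that share an AS and
    that share a cluster. Pairs with an unmatched address are skipped."""
    same_as = same_cluster = total = 0
    for client_ip, ldns_ip in associations:
        client, ldns = longest_match(client_ip), longest_match(ldns_ip)
        if client is None or ldns is None:
            continue
        total += 1
        same_as += client[1] == ldns[1]
        same_cluster += client[0] == ldns[0]
    if total == 0:
        return 0.0, 0.0
    return same_as / total, same_cluster / total
```

Because a cluster is never larger than the AS that originates its prefix under these assumptions, the same-cluster fraction cannot exceed the same-AS fraction, which is consistent with the gap between the 64% and 16% figures reported above.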
Our work makes the following contributions. We developed a novel measurement methodology and architecture for accurately collecting the local DNS server IP addresses of Web clients. We demonstrated its successful deployment on several sites, including a large commercial site, and used it to collect a large database of associations. Based on this data, we performed an extensive analysis of the proximity between clients and their local DNS servers and found that a significant improvement in proximity is possible by configuring clients to use a closer local DNS server. Finally, we evaluated the impact of the proximity between clients and their local DNS servers on server selection in three of the largest commercially deployed CDNs. We conclude that DNS is good for very coarse-grained server selection, but less suitable for fine-grained request distribution.
The rest of the paper is organized as follows. Section 2 describes our methodology and measurement setup for gathering associations between clients and their local DNS servers. In Section 3, the association results are analyzed in detail to evaluate the proximity between the client and its local DNS server. Then, in Section 4, we study the impact of this proximity on DNS-based server selection in three of the largest commercially deployed CDNs. Related work is covered in Section 5. In Section 6, we discuss future work. Section 7 concludes.