There are three main components necessary to use our technique: a specialized authoritative DNS server, an HTTP redirector, and a one-pixel embedded transparent GIF image. To obtain a client population we solicited volunteer Web sites. All the volunteers had to do to participate in our study was to add a link to our one-pixel transparent GIF to the end of one or more of their commonly accessed Web pages. Assuming the experiment is hosted by us at example.com, this involves adding the following HTML code towards the end of a web page:
<img src="https://xxx.rd.example.com/tr.gif" height=1 width=1>
To let us attribute hits to individual sites, each participant replaces xxx in the URL with a site identifier (footnote 3). This also lets us add further volunteer sites without changing our Web or DNS server configuration.
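As an illustration of how hits could be attributed to sites, the sketch below recovers the site identifier from the Host header of a request to the redirector. It assumes that every name under rd.example.com reaches the same redirector (for example through a wildcard DNS record) and that the identifier is a single DNS label; the function name and the suffix constant are hypothetical, not part of the original setup.

```python
from typing import Optional

# Assumed zone under which every site identifier resolves to the redirector.
REDIRECTOR_SUFFIX = ".rd.example.com"


def site_id_from_host(host: str) -> Optional[str]:
    """Return the site identifier (the 'xxx' label) from a Host header, or None."""
    # Drop an optional port, a trailing dot, and any upper-case letters.
    name = host.split(":", 1)[0].rstrip(".").lower()
    if not name.endswith(REDIRECTOR_SUFFIX):
        return None
    label = name[: -len(REDIRECTOR_SUFFIX)]
    # Accept exactly one non-empty label in front of the suffix.
    if not label or "." in label:
        return None
    return label
```

With this kind of parsing, a new volunteer site only has to choose an unused label; nothing on the server side needs to be updated.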
When a Web client loads the one-pixel embedded image, our technique allows us to match the address of the local DNS server resolving host names on behalf of the client with the address of the client itself. This process is shown in Figure 1.
First, the client attempts to get the image from xxx.rd.example.com -- our HTTP redirector. Rather than serving the image, the redirector determines the client's IP address and issues an HTTP redirect to ipCLI.cs.example.com, where CLI is replaced with a string encoding the IP address of the client (step 2). Next, the client contacts its local DNS server to resolve this domain name (step 3). The client's local DNS server attempts to resolve ipCLI.cs.example.com by sending a DNS request to our authoritative DNS server (step 4). At this point our authoritative DNS server logs the IP address of the local DNS server and the client IP address embedded within the query. It then sends the address of the content server hosting the image back to the client's local DNS server (step 5). This resolution is passed on to the client (step 6), which retrieves the image from the content server (steps 7 and 8).
This measurement methodology has a limitation: it misses clients that do not fetch inlined images and clients that abort the page download before the DNS resolution for the embedded image is made. In these cases, we are unable to collect their local DNS server information.
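The encoding and logging steps described above can be sketched as follows. The text does not specify how the client address is encoded in the host name, so this example assumes an IPv4 address written with dashes instead of dots (for instance ip192-0-2-17.cs.example.com); the function names are hypothetical and the sketch is only one possible realization.

```python
import ipaddress
from typing import Optional, Tuple

# Zone answered by the specialized authoritative DNS server.
CONTENT_SUFFIX = ".cs.example.com"


def redirect_host(client_ip: str) -> str:
    """Redirector side: build the host name that embeds the client's IPv4 address."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on malformed input
    return "ip" + str(addr).replace(".", "-") + CONTENT_SUFFIX


def client_from_qname(qname: str) -> Optional[str]:
    """DNS side: recover the client address from a queried name, or None."""
    # Normalize case and the trailing root dot before matching.
    name = qname.rstrip(".").lower()
    if not (name.startswith("ip") and name.endswith(CONTENT_SUFFIX)):
        return None
    encoded = name[2 : -len(CONTENT_SUFFIX)]
    try:
        return str(ipaddress.IPv4Address(encoded.replace("-", ".")))
    except ValueError:
        return None


def association(qname: str, resolver_ip: str) -> Optional[Tuple[str, str]]:
    """Pair the embedded client address with the source address of the DNS query."""
    client = client_from_qname(qname)
    return (client, resolver_ip) if client else None
```

In this sketch the redirector would send the client to a URL on redirect_host(client_ip), and the authoritative DNS server would call association() with the queried name and the query's source address to produce one (client, local DNS server) record.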
Note that in some cases, a local DNS server hierarchy may exist. The local DNS server recorded in our measurement is the outermost local DNS server which directly contacts the authoritative DNS server for the example.com domain. In DNS-based server selection, the CDN's DNS server only sees the outermost local DNS server. In this study, this outermost DNS server is what we refer to as the "local DNS server."
This measurement approach is fully deterministic. It collects one association each time a new client visits a site with the embedded image. Multiple pages on the same site, or subsequent visits to the same page, may result in repeated retrievals of the calibrating image depending on the client's caching policy.
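Because repeated retrievals can log the same pairing more than once, an analysis step would typically collapse the log into distinct associations. The sketch below assumes the DNS log has already been parsed into (client, local DNS server) address pairs; the function name and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_associations(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each local DNS server address to the set of distinct clients seen behind it."""
    clients_by_ldns: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, ldns_ip in records:
        # A set silently absorbs duplicates caused by repeated image retrievals.
        clients_by_ldns[ldns_ip].add(client_ip)
    return dict(clients_by_ldns)
```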
Note that the redirector also logs client requests -- this information can be correlated with the DNS and web server logs to obtain the hidden load factors. Statistics on client browsing characteristics can also be gathered from the HTTP headers in the redirector log.
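One way such a correlation could be carried out is sketched below. This section does not define the hidden load factor, so the example assumes a simple reading: the number of HTTP requests logged at the redirector per DNS query received from each local DNS server. The log layouts and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def requests_per_dns_query(
    redirector_hits: Iterable[str],
    dns_queries: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Estimate HTTP requests per DNS query for each local DNS server.

    redirector_hits: client addresses, one per HTTP request in the redirector log.
    dns_queries: (client, local DNS server) pairs, one per query in the DNS log.
    """
    ldns_of_client: Dict[str, str] = {}
    queries_per_ldns: Counter = Counter()
    for client_ip, ldns_ip in dns_queries:
        ldns_of_client[client_ip] = ldns_ip  # the last observed resolver wins
        queries_per_ldns[ldns_ip] += 1

    hits_per_ldns: Counter = Counter()
    for client_ip in redirector_hits:
        ldns_ip = ldns_of_client.get(client_ip)
        if ldns_ip is not None:  # skip clients that never appear in the DNS log
            hits_per_ldns[ldns_ip] += 1

    return {ldns: hits_per_ldns[ldns] / n for ldns, n in queries_per_ldns.items()}
```

A ratio well above one for a given local DNS server would suggest that each of its queries stands in for many client requests, for example because responses are cached and shared among clients.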