A lower bound on referrals

**Figure 5:** **Bounding referrals from below**: User U retrieves `pageA.html` (message 1) and clicks on a link in it, causing `pageB.html` to be requested (message 2). Only if B responds (message 3) does the browser notify A of the referral (message 4).

Achieving the interaction of Figure 5 is more complicated than the simple tricks of Section 2. Our general strategy is as follows. When the user clicks the link to pageB.html in pageA.html, pageA.html opens a new browser window, named nextpage, and directs that pageB.html be displayed there. This enables JavaScript embedded in pageA.html to continue running in the original window while pageB.html is being loaded. The goal is then for the pageA.html script to detect when the nextpage window has received a response from site B (message 3 in Figure 5). Such a response indicates that B has received the HTTP request for pageB.html, including the Referer header field (so spelled in the HTTP specification) crediting A with the referral. When it detects this, the script in pageA.html causes a URL on site A to be requested, thereby notifying A of the referral (message 4 in Figure 5).

The complexity in this approach is in the means by which the pageA.html script detects that site B has responded. A first attempt might be for the script to set the onload event handler for the nextpage window when that window is created. The onload event handler is invoked when pageB.html finishes loading into the nextpage window (see [Fla98]). Thus, if pageA.html sets the onload event handler to be a function that notifies site A of the referral, this would achieve the exchange of Figure 5. While this works with NC4, it does not work with IE4: presumably for security reasons, IE4 clears the nextpage window's onload event handler before loading pageB.html, and so A is not notified when pageB.html has loaded. Moreover, NC4's failure to clear the onload event handler is arguably a weakness in its security model that should disappear in future versions of the browser (see [AM98]).
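The reason the first attempt fails in one browser but not the other can be captured in a small model. The sketch below is hypothetical and is not browser code. `MockWindow` and its `clearsHandlersOnNavigate` flag stand in for the NC4-like behavior (the handler survives) and the IE4-like behavior (the handler is cleared before pageB.html loads) described above.

```javascript
// Hypothetical model of the "first attempt": pageA.html installs an onload
// handler on the nextpage window and relies on it surviving navigation.
// MockWindow is an illustrative stand-in, not a browser API.
class MockWindow {
  constructor(clearsHandlersOnNavigate) {
    this.clearsHandlersOnNavigate = clearsHandlersOnNavigate;
    this.onload = null;
    this.url = "about:blank";
  }
  // Simulate loading a new document (e.g. pageB.html) into this window.
  navigate(url) {
    if (this.clearsHandlersOnNavigate) {
      this.onload = null; // IE4-like: handler discarded before the load
    }
    this.url = url;
    if (typeof this.onload === "function") {
      this.onload(); // fires only if the handler survived navigation
    }
  }
}

// First attempt: notify site A from the nextpage window's onload handler.
// Returns the notifications that site A would have received.
function firstAttempt(win, targetURL) {
  const notifications = [];
  win.onload = () => notifications.push(targetURL);
  win.navigate(targetURL);
  return notifications;
}

console.log(firstAttempt(new MockWindow(false), "http://siteB.example/pageB.html").length); // 1
console.log(firstAttempt(new MockWindow(true), "http://siteB.example/pageB.html").length);  // 0
```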

Fortunately, security mechanisms similar to those that cause this approach to fail for IE4 can be exploited in both browsers to achieve the effect we desire. The approach we take is for the script in pageA.html to periodically probe the JavaScript namespace of the nextpage window. Before B has responded, the browser allows these probes, because the nextpage window still holds a document created by the script in pageA.html. After B has responded, however, the window holds a document from site B. The browser's security mechanisms (see [Fla98, Chapter 21]) then disallow the probes, causing a JavaScript error. By specifying an appropriate error handler for this error, the script in pageA.html can notify site A of the referral. In the remainder of this section, we present an implementation of these ideas. For simplicity, our implementation is not fully general: it handles only the case in which pageA.html links to a single target site B. It can, however, be generalized so that pageA.html offers links to multiple target sites.
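The probing idea can be sketched outside a browser under a simplified same-origin rule. In this model, reading `location.href` succeeds only while the window's document comes from the prober's own origin. `ProbedWindow`, `ticksUntilDenied`, and the `siteA.example`/`siteB.example` origins are hypothetical. The text relies on a window-level error handler. This sketch swaps in `try`/`catch` so that the probe loop can run as ordinary code.

```javascript
// Hypothetical window model with a simplified same-origin check on location.
class ProbedWindow {
  constructor(proberOrigin) {
    this.proberOrigin = proberOrigin;
    this.docOrigin = proberOrigin; // placeholder page written by site A's script
    this.href = "about:blank";
  }
  // Site B's response arrives: the window now holds a foreign document.
  receiveResponse(url, origin) {
    this.href = url;
    this.docOrigin = origin;
  }
  get location() {
    if (this.docOrigin !== this.proberOrigin) {
      throw new Error("SecurityError: cross-origin access denied");
    }
    return { href: this.href };
  }
}

// One probe: true while access is allowed, false once it is denied.
// (try/catch replaces the window error handler used in the text.)
function probe(win) {
  try {
    void win.location.href;
    return true;
  } catch (e) {
    return false;
  }
}

// Drive probes tick by tick; each tick stands in for the 100 ms timer.
// respondAtTick is when B's response arrives (null means it never does).
function ticksUntilDenied(respondAtTick, maxTicks) {
  const win = new ProbedWindow("http://siteA.example");
  for (let tick = 0; tick < maxTicks; tick++) {
    if (tick === respondAtTick) {
      win.receiveResponse("http://siteB.example/pageB.html", "http://siteB.example");
    }
    if (!probe(win)) return tick; // site A would be notified at this point
  }
  return null; // B never responded: no notification
}

console.log(ticksUntilDenied(3, 10));    // 3
console.log(ticksUntilDenied(null, 10)); // null
```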

As in Section 2.2, our solution is structured around a file pageAcontents.html. This file holds the actual contents of the page that A wants to display to the user (including the link to B's page), and it is served to the user within pageA.html in a frameset. The file pageAcontents.html now looks as shown in Figure 6. It differs from the previous pageAcontents.html (Figure 3) in two ways: the link now has a target attribute, and its onClick event handler invokes setup instead of notify. Because of the latter change, clicking the link now invokes the setup function, which is defined in pageA.html as shown in Figure 7.

**Figure 6:** File `pageAcontents.html` for scheme of Section 3

**Figure 7:** File `pageA.html` for scheme of Section 3

When invoked, the setup function opens a new browser window named nextpage (as specified in the second argument of the window.open method call). This name is the value of the target attribute of the link to pageB.html in pageAcontents.html (see Figure 6), which means that pageB.html will be displayed in this new window when it is eventually retrieved. To ensure that the script in pageA.html is allowed to probe the namespace of nextpage until B has responded, setup initially writes a simple HTML page into nextpage, using the document.open, write, and close methods.
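The steps that setup performs can be sketched as follows. This is not the code of Figure 7. `openFn` and `startProbing` are hypothetical parameters that stand in for `window.open` and the probe loop, and `fakeOpen` is an illustrative fake. The feature string `"toolbar=no,location=no"` is an assumed stand-in for the elided third argument of the open call; `toolbar` and `location` are standard `window.open` feature names.

```javascript
// Hypothetical sketch of setup; openFn stands in for window.open.
let retrieved = null; // URL of pageB.html, kept for the later notify call

function setup(openFn, targetURL, startProbing) {
  retrieved = targetURL;
  // Open (or reuse) the window named "nextpage"; the link's target attribute
  // names the same window, so pageB.html will later be loaded into it.
  const win = openFn("", "nextpage", "toolbar=no,location=no");
  // Write a trivial page so that nextpage initially holds a document created
  // by pageA.html's own script, which the probes are allowed to read.
  win.document.open();
  win.document.write("<html><body>Loading...</body></html>");
  win.document.close();
  startProbing(win); // e.g. begin the 100 ms probe loop
  return win;
}

// Illustrative fake covering only the window members that setup touches.
function fakeOpen(url, name, features) {
  const written = [];
  return {
    name,
    features,
    written,
    document: { open() {}, write(s) { written.push(s); }, close() {} },
  };
}

let probed = null;
const w = setup(fakeOpen, "http://siteB.example/pageB.html", (win) => { probed = win; });
console.log(w.name, retrieved); // nextpage http://siteB.example/pageB.html
```

In a browser, `openFn` would simply forward to `window.open`, and `startProbing` would start the timer-driven probe.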

The probes into the namespace of the nextpage window are performed by the probe function. The probe function attempts to read a portion of the namespace of the nextpage window (in this case, its location.href property) that will cause an error after B has responded but will be allowed beforehand. If its read is allowed, then the probe function sets a timer so that it is invoked again 100 milliseconds later. Otherwise the error handler for the window in which this script is running, i.e., the window displaying pageA.html, is invoked. This error handler schedules an invocation of the notify function (see the line before the </script> tag in Figure 7). As in Section 2.2, this function invokes the record.cgi CGI script on site A with the URL of pageB.html, which was stored in the retrieved variable during the execution of setup. Again, the trick of assigning to the location property of an invisible frame is used to invoke record.cgi.
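The notification path (error handler, scheduled notify, request to record.cgi) can be sketched as below. Several details are assumptions and are not taken from Figure 7: the record.cgi path, the query parameter name `url`, the `schedule` and `getRetrieved` parameters, and the plain object standing in for the invisible frame. Returning `true` from the handler is also an assumption here. In browsers, a `true` return value from `window.onerror` suppresses the default error report.

```javascript
// Hypothetical path and parameter name; the text says only that notify
// requests record.cgi on site A with the URL of pageB.html.
const RECORD_CGI = "/cgi-bin/record.cgi";

function recordURL(retrievedURL) {
  return RECORD_CGI + "?url=" + encodeURIComponent(retrievedURL);
}

// Assigning to the invisible frame's location makes the browser request it.
function notify(hiddenFrame, retrievedURL) {
  hiddenFrame.location = recordURL(retrievedURL);
}

// Error handler for pageA.html's window: schedule notify rather than running
// it inside the handler, and return true to suppress the default error report.
function makeErrorHandler(schedule, hiddenFrame, getRetrieved) {
  return function onError() {
    schedule(() => notify(hiddenFrame, getRetrieved()));
    return true;
  };
}

const frame = { location: "" };
const handler = makeErrorHandler((fn) => fn(), frame, () => "http://siteB.example/pageB.html");
const ret = handler();
console.log(frame.location); // /cgi-bin/record.cgi?url=http%3A%2F%2FsiteB.example%2FpageB.html
```

In a browser, `schedule` would be something like `(fn) => setTimeout(fn, 0)`, and the handler would be installed as `window.onerror`.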

To summarize, this achieves the mechanism shown in Figure 5: when the link to pageB.html in pageAcontents.html is clicked by the user, (i) the setup function is invoked to open a new browser window; (ii) pageB.html is retrieved and displayed in that window (messages 2 and 3); (iii) an error is encountered in probe; (iv) the error handler is invoked to schedule notify, which (v) invokes record.cgi on site A with the URL of pageB.html (message 4). Moreover, if B does not receive message 2, then neither message 3 nor message 4 will be sent. Like the mechanisms of Section 2, this technique requires no cooperation from B for A to track the referrals its pages give to B. And again, this technique presents nothing out of the ordinary to the user; spawning new windows is not uncommon while following links to other sites.

The main factor limiting the accuracy of A's referral counting with this scheme appears to be the risk that the user closes the window containing pageA.html before notify is invoked. In this case, the script in pageA.html will be halted before site A is notified of the referral. Thus, at best we can claim that this mechanism reports a lower bound on the number of referrals that A gives to B. While there are other potential sources of inaccuracy, we believe that those of which we are aware can be discounted or virtually eliminated. For example, there is a risk that the user aborts the loading of pageB.html before receiving a response from site B and instead loads a different page in the nextpage window, in which case the referral notification to A would be erroneous. However, this risk is mitigated if the window is created with no location line or toolbar, as is achieved by the third argument of the open call in Figure 7. Another risk is that some party invokes record.cgi arbitrarily, and in particular when no referral has taken place. However, there seems to be little practical motivation for such "attacks".
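The summary and the main caveat above can be checked with a toy model of one click. The model records which messages of Figure 5 occur. It shows that message 4 never precedes message 3, and that closing pageA.html early turns the count into a lower bound. `referralMessages` and its two boolean inputs are hypothetical simplifications.

```javascript
// Toy model of the Figure 5 exchange for one click; message numbers follow
// Figure 5, and the boolean inputs are hypothetical simplifications.
function referralMessages({ bReceivesRequest, pageAWindowOpen }) {
  const sent = [1, 2]; // U retrieves pageA.html, then requests pageB.html
  if (!bReceivesRequest) return sent; // no response, so the probes never fail
  sent.push(3); // B responds; nextpage now shows B's document
  // Message 4 needs pageA.html's script to still be running; if the user has
  // closed that window, the referral happened but A never learns of it.
  if (pageAWindowOpen) sent.push(4);
  return sent;
}

const scenarios = [
  { bReceivesRequest: true, pageAWindowOpen: true },
  { bReceivesRequest: true, pageAWindowOpen: false }, // user closed pageA.html early
  { bReceivesRequest: false, pageAWindowOpen: true }, // request never reached B
];
const referrals = scenarios.filter((s) => s.bReceivesRequest).length;
const notified = scenarios.filter((s) => referralMessages(s).includes(4)).length;
console.log(referrals, notified); // 2 1
```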

Because this scheme reports only a lower bound on referrals, perhaps the most prudent use of it is in combination with that of Section 2. Combined in the obvious way, these mechanisms enable site A to maintain both an upper and a lower bound on the number of referrals that it has given to B, and typically these numbers should be very close to one another. A large gap between these bounds indicates to the webmaster of site A that she should examine the availability of site B. Even without further evidence, a large discrepancy between the upper and lower bounds may be good cause to stop advertising B's pages.
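Combining the two counts for one target site might look like the sketch below. `upper` is the count from the Section 2 mechanism and `lower` is the count from this section. The 10% threshold is a hypothetical policy knob, not a value from the text. Treating `lower > upper` as suspicious is likewise an assumption. Such a result should not arise from genuine referrals and might instead point to spurious record.cgi requests.

```javascript
// Hypothetical check combining the upper (Section 2) and lower (this section)
// referral counts for one target site B.
function assessTarget(upper, lower, maxRelativeGap = 0.1) {
  const gap = upper - lower;
  // Gap as a fraction of the upper bound; defined as 0 when nothing was counted.
  const relativeGap = upper === 0 ? 0 : gap / upper;
  // Flag large gaps, and also inverted bounds (possible spurious notifications).
  const investigate = lower > upper || relativeGap > maxRelativeGap;
  return { gap, relativeGap, investigate };
}

console.log(assessTarget(1000, 985));             // { gap: 15, relativeGap: 0.015, investigate: false }
console.log(assessTarget(1000, 700).investigate); // true
```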