We used several heuristics to improve the efficiency of
pre-fetching. While many of the heuristics listed below are simple
and intuitively obvious, we found that in combination they provided
a remarkable performance improvement in daily operation. This
section lists the heuristics we used.
1. Web Sessions: The notion of a Web session had one of the
greatest performance impacts. Without it, the pre-fetch engine
re-issued requests for the same documents whenever a user returned
to those pages later in the session. With it, documents are fetched
only once per session; subsequent requests are satisfied from the
cache, unless the user explicitly requests a fresh copy using the
RELOAD button (which issues an HTTP ``Pragma: no-cache'' header).
At the start of a session, the documents with the highest node
weights are hoarded. The idea of Web sessions is not new; current
browsers all have a notion of a Web session. (A sketch of
session-scoped fetching appears after this list.)
2. Hoard walking: Hoard-walking [14] periodically refreshes the
pages with the highest node weights. Because it pre-fetches both
the pages in the user-defined hoard file and the heaviest nodes in
the learnt database, a large fraction of typical user accesses can
be satisfied locally during disconnection. (A sketch of a hoard
walk appears after this list.)
3. Issue of pre-fetch requests: Pre-fetches are performed only when
the network is ``idle''. In our system, at most four pre-fetches
may be in progress at any time at the local proxy server, and at
most eight at the backbone proxy server. Furthermore, on the local
proxy server, pre-fetches are not started while any user request is
pending. We observed performance improvements from spacing out
(amortizing) pre-fetch requests in this way. Of course, withholding
pre-fetches while a request is pending might actually lower
performance, especially for requests with high delays. However,
giving priority to user requests eliminates the harmful effect of
pre-fetches tying up bandwidth exactly when it is needed. (A
throttling sketch appears after this list.)
4. Weighting edges: Using weighted edges, rather than node weights
alone, to drive pre-fetching ensures that the actual usage pattern
is captured. When a user visits a URL, we choose the outgoing edge
with the heaviest weight rather than the adjacent node with the
heaviest node weight. For example, suppose C was visited 2 times
from A and 100 times from B, while B was visited 10 times from A.
When the user is at A, B should be pre-fetched even though its node
weight (10) is smaller than C's (102): the edge from A to B (weight
10) is heavier than the edge from A to C (weight 2). (A sketch
combining this heuristic with the thresholds of heuristic 9 appears
after this list.)
5. Dynamically determining a document's dependents: We distinguish
a document from the images linked to it, which constitute its
dependents. Dependents do not appear in the user profile, because
they are fetched automatically whenever the document itself is
accessed, and because they change frequently over time. The
original implementation stored the URLs of the dependents; however,
because HTML documents change frequently, requests were being made
for documents that no longer existed. Determining the dependents
dynamically, from the document as it is fetched, removed those
redundant pre-fetch requests while keeping the user-profile graph
small. Furthermore, pre-fetch requests for dependents are issued
(at both the local proxy server and the backbone proxy server)
before the browser issues them. (A sketch appears after this list.)
6. Continued download of documents: Even after the user presses
``stop'', the local proxy server continues downloading the
document. This allows a user to click on another page while the
previous one finishes downloading in the background. It also serves
as a crude form of short-term, user-specified hoarding, or
``user-driven pre-fetch''. Although this effect is not captured in
the performance tests, we found it extremely useful when reading a
page with links we knew we wanted to visit: we would click on a
link and quickly press stop, which issued the pre-fetch request
while keeping the browser on the current page. Later, when we
eventually clicked on the link, the page came up instantly.
7. CGI scripts: CGI and other dynamic pages are pre-fetched and
retained in the cache for a short period of time. Ordinarily, CGI
pages are cached neither in browsers nor in proxies, so CGI latency
is not hidden. With this heuristic, we pre-fetch CGI pages and
cache them briefly in anticipation of an access. A CGI page is
deleted when its timeout expires or after it has been read once.
(A sketch of this read-once, timed cache appears after this list.)
8. HTTP redirections: HTTP response codes 301 and 302 indicate that
a URL has moved. When the local proxy server detects such a
response, it stores the redirection and supplies the correct
response to the browser directly, saving the browser from
reconnecting to the server. (A sketch appears after this list.)
9. Thresholds: Since many documents are accessed only once from a
parent document (e.g. a news item from a topic page), we impose a
minimum threshold on edge frequency before a node is pre-fetched.
We also impose a threshold on document size: documents larger than
1 megabyte are not pre-fetched in the current implementation. We
anticipate, however, that once size is included as a weight
parameter, this heuristic will be subsumed by the weighted-edge
heuristic (the sketch after this list applies both thresholds).
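
To make the heuristics above concrete, the sketches below
illustrate them in Python. They are minimal sketches under assumed
names and parameters, not excerpts from our implementation. The
first covers heuristic 1: each document is fetched at most once per
session, and the ``Pragma: no-cache'' header issued by RELOAD
forces a fresh fetch. The fetch callable is an assumed stand-in for
the network layer.

class WebSession:
    """Fetch each document at most once per session (heuristic 1)."""

    def __init__(self, fetch):
        self.fetch = fetch        # underlying network fetch (assumed)
        self.cache = {}           # URL -> document body

    def get(self, url, headers=None):
        # RELOAD sends "Pragma: no-cache"; honour it by re-fetching.
        force = (headers or {}).get("Pragma") == "no-cache"
        if force or url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]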
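
A hoard walk (heuristic 2) can then be expressed as refreshing the
user-defined hoard file together with the heaviest nodes of the
learnt database. The cut-off k = 20 is an assumed tuning parameter.

import heapq

def hoard_walk(session, hoard_urls, node_weights, k=20):
    """Refresh the hoard file plus the k heaviest learnt nodes
    (heuristic 2); intended to run periodically."""
    heaviest = heapq.nlargest(k, node_weights, key=node_weights.get)
    for url in list(hoard_urls) + heaviest:
        session.get(url, headers={"Pragma": "no-cache"})  # force refresh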
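
The limits of heuristic 3 amount to a counting semaphore (four
slots at the local proxy, eight at the backbone) plus, at the local
proxy only, a gate that holds pre-fetches back while user requests
are pending. Class and method names here are assumptions of the
sketch.

import threading

class PrefetchThrottle:
    """Cap concurrent pre-fetches and optionally defer them while
    user requests are pending (heuristic 3)."""

    def __init__(self, limit, defer_to_users=False):
        self.slots = threading.Semaphore(limit)   # 4 local, 8 backbone
        self.defer_to_users = defer_to_users
        self.pending_users = 0
        self.idle = threading.Condition()

    def user_request_started(self):
        with self.idle:
            self.pending_users += 1

    def user_request_finished(self):
        with self.idle:
            self.pending_users -= 1
            self.idle.notify_all()

    def prefetch(self, fetch, url):
        if self.defer_to_users:        # local proxy: users go first
            with self.idle:
                self.idle.wait_for(lambda: self.pending_users == 0)
        with self.slots:               # at most `limit` in flight
            return fetch(url)

local_throttle = PrefetchThrottle(limit=4, defer_to_users=True)
backbone_throttle = PrefetchThrottle(limit=8)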
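
Heuristics 4 and 9 combine naturally: from the current URL we take
the heaviest outgoing edge, ignoring edges below a minimum
frequency and documents over 1 megabyte. The graph encoding, the
min_edge value, and the sizes table are assumptions for
illustration; the example reproduces the A/B/C scenario of
heuristic 4.

# graph[u][v] = number of times v was visited from u
graph = {"A": {"B": 10, "C": 2}, "B": {"C": 100}}
sizes = {"B": 40_000, "C": 25_000}     # document sizes in bytes

def choose_prefetch(graph, current, sizes, min_edge=3,
                    max_bytes=1_000_000):
    """Heaviest outgoing edge from `current`, subject to the
    edge-frequency and 1 MB size thresholds (heuristics 4 and 9)."""
    candidates = [(w, v) for v, w in graph.get(current, {}).items()
                  if w >= min_edge and sizes.get(v, 0) <= max_bytes]
    return max(candidates)[1] if candidates else None

# C's node weight (2 + 100 = 102) exceeds B's (10), yet from A the
# edge A->B (10) outweighs A->C (2), so B is the right pre-fetch.
assert choose_prefetch(graph, "A", sizes) == "B"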
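
For heuristic 5, dependents are discovered by scanning the page as
it arrives rather than being stored in the profile. One plausible
realisation, using the standard html.parser module and extracting
only inline images:

from html.parser import HTMLParser
from urllib.parse import urljoin

class DependentFinder(HTMLParser):
    """Collect inline images, i.e. the dependents, of a fetched
    page (heuristic 5)."""

    def __init__(self):
        super().__init__()
        self.dependents = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.dependents.append(src)

def prefetch_dependents(session, url):
    """Parse the page as it is fetched and pre-fetch its images
    ahead of the browser's own requests."""
    finder = DependentFinder()
    finder.feed(session.get(url))
    for dep in finder.dependents:
        session.get(urljoin(url, dep))   # resolve relative URLs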
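
For heuristic 7, a pre-fetched dynamic page lives in the cache
until it is read once or its timeout expires, whichever comes
first. The 30-second timeout below is an assumed value.

import time

class CGICache:
    """Hold a pre-fetched dynamic page until it is read once or
    its timeout expires (heuristic 7)."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.entries = {}                    # URL -> (body, expiry)

    def put(self, url, body):
        self.entries[url] = (body, time.monotonic() + self.ttl)

    def take(self, url):
        entry = self.entries.pop(url, None)  # read once: remove on hit
        if entry is None:
            return None
        body, expiry = entry
        return body if time.monotonic() < expiry else None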
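
Finally, heuristic 8 reduces to a small table mapping moved URLs to
their new locations, consulted before contacting the server. The
hop limit, which guards against redirect loops, is an assumption of
this sketch.

class RedirectCache:
    """Remember 301/302 responses so the proxy can answer the
    browser without reconnecting to the origin server
    (heuristic 8)."""

    def __init__(self):
        self.moved = {}                  # old URL -> new URL

    def record(self, url, status, location):
        if status in (301, 302) and location:
            self.moved[url] = location

    def resolve(self, url, max_hops=5):
        hops = 0
        while url in self.moved and hops < max_hops:  # follow chains
            url = self.moved[url]
            hops += 1
        return url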
 
 
 
   