To exercise the API from both a functional and a performance perspective, we developed several modules that use various aspects of the interface, including one that implements ICAP. We were pleased with the simplicity of module development and the compactness of the code needed to implement various features. Initial development and testing took from a few hours to a few days per module. More detail about each module's behavior and implementation is provided below. Table 4 summarizes the code size of each module. Since the modules are freed from the task of implementing basic HTTP mechanisms, none of them is particularly large. The ``Total Lines'' count includes headers and comments, the ``Code Lines'' count excludes all blank lines and comments, and the ``Semicolons'' count gives a better sense of the number of actual C statements involved. All modules use the callback interface, and some spawn separate helper processes under their control.
Ad Remover - Ad images are replaced by dynamically rewriting
their request URLs while leaving the original HTML unmodified. On each client
request, the module uses a callback to compare the URL to a known
list of ad server URL patterns. Matching URLs are rewritten to point
to a cacheable blank image, leading to cache hits in the proxy for
all replaced ads. To account for
both explicitly-addressed and transparent proxies, the module constructs
the full URL from the first line of the
request and the Host line of the request header. On
modified requests, the Host header must be rewritten as
well, using the DR_Header functions.
Other uses for this module could include
replacing original ads with preferred ads.
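
The URL reconstruction and pattern match described above can be sketched in plain C. This is a minimal illustration, not the module's actual code: the pattern list, the function names build_full_url and is_ad_url, and the use of substring matching are all assumptions, and the DR_Header calls that would rewrite the request are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ad server patterns; a real module would load its own list. */
static const char *const ad_patterns[] = {
    "ads.example.com/",
    "/banners/",
};

/* Build the absolute URL of a request. An explicitly addressed proxy sees an
 * absolute URI in the request line; a transparent proxy sees only a path, so
 * the Host header supplies the server name. Only the "http://" scheme is
 * recognized in this sketch. Returns 0 on success, -1 if out is too small. */
static int build_full_url(const char *request_target, const char *host,
                          char *out, size_t out_len)
{
    int n;

    assert(request_target != NULL && host != NULL && out != NULL);
    if (strncmp(request_target, "http://", 7) == 0)
        n = snprintf(out, out_len, "%s", request_target);
    else
        n = snprintf(out, out_len, "http://%s%s", host, request_target);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Return 1 if the URL contains any known ad pattern, 0 otherwise. */
static int is_ad_url(const char *url)
{
    size_t i;

    for (i = 0; i < sizeof ad_patterns / sizeof ad_patterns[0]; i++)
        if (strstr(url, ad_patterns[i]) != NULL)
            return 1;
    return 0;
}
```

A request callback would call build_full_url with the request target and Host value, and rewrite the request only when is_ad_url returns 1.
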
Dynamic Compressor - This module invokes the zlib library from
callbacks to compress data from origin servers and then caches the
compressed version.
Clients use less bandwidth, and the proxy avoids
recompressing the object on every request by serving the cached compressed
version on future hits. This module checks the request method and Accept-Encoding
header to ensure that only
GET requests from browsers that accept compressed content are
considered. The response header is used to ensure that only full
responses (status code 200) of potentially compressible types
(non-images) are compressed. The header is also checked to ensure that
the response is not already being served in compressed form and is
not an object with multiple variants (since one of those variants may already
be in compressed form). Using the DR_Header functions, the
outbound response must be modified to remove the original
Content-Length header and to insert a Vary header to
indicate that multiple versions of the object may now exist.
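
The eligibility checks listed above amount to a single predicate over the request and response headers. The sketch below is an assumption-laden illustration: struct txn_info and should_compress are hypothetical names, the header values are assumed to have been extracted already, "gzip" is assumed to be the coding of interest, and the presence of a Vary header is used as a stand-in for "object with multiple variants".

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical summary of one transaction; NULL means "header absent". */
struct txn_info {
    const char *method;            /* request method, e.g. "GET" */
    const char *accept_encoding;   /* request Accept-Encoding value */
    int status;                    /* response status code */
    const char *content_type;      /* response Content-Type value */
    const char *content_encoding;  /* response Content-Encoding value */
    const char *vary;              /* response Vary value */
};

/* Return 1 if the response is a candidate for compression, 0 otherwise.
 * Simplification: the Accept-Encoding test is a case-sensitive substring
 * match and ignores q-values. */
static int should_compress(const struct txn_info *t)
{
    assert(t != NULL && t->method != NULL);
    if (strcmp(t->method, "GET") != 0)
        return 0;                                /* only GET requests */
    if (t->accept_encoding == NULL ||
        strstr(t->accept_encoding, "gzip") == NULL)
        return 0;                                /* client must accept it */
    if (t->status != 200)
        return 0;                                /* full responses only */
    if (t->content_type != NULL &&
        strncasecmp(t->content_type, "image/", 6) == 0)
        return 0;                                /* images rarely shrink */
    if (t->content_encoding != NULL)
        return 0;                                /* already encoded */
    if (t->vary != NULL)
        return 0;                                /* multiple variants */
    return 1;
}
```
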
Image Transcoder - All JPEG and GIF images are converted to
grayscale using the netpbm and ijpeg packages.
Since this task may be
time-consuming, it is performed in a separate helper process. The
module buffers the image until it is fully received, at which point it
sends the data to the helper for transcoding. The helper returns the
transcoded image, or the original data if transcoding fails. The
module kills and restarts the helper if the transcoding
library fails, and also limits the
number of images waiting for transcoding if the helper cannot keep up with
the incoming rate of images. The module uses the DR_FDPoll
functions to communicate with the helpers, the DR_Header
functions to modify the response, and the DR_RespBodyInject function
to inject content into an active connection.
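
One way to limit the number of images waiting for the helper is a small fixed-capacity queue. The sketch below is illustrative only: the limit MAX_PENDING, the type and function names, and the policy of passing an image through untouched when the queue is full are assumptions, since the text does not say what the module does with the overflow.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 4   /* hypothetical limit on images awaiting the helper */

/* Fixed-size FIFO of buffered images waiting to be sent to the helper. */
struct pending_queue {
    const void *items[MAX_PENDING];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of entries in use */
};

/* Queue an image for transcoding. Returns 1 on success, or 0 if the queue
 * is full, in which case the caller would forward the original image. */
static int pending_push(struct pending_queue *q, const void *image)
{
    assert(q != NULL);
    if (q->count == MAX_PENDING)
        return 0;
    q->items[(q->head + q->count) % MAX_PENDING] = image;
    q->count++;
    return 1;
}

/* Remove and return the oldest queued image, or NULL if none is waiting. */
static const void *pending_pop(struct pending_queue *q)
{
    const void *image;

    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    image = q->items[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    return image;
}
```
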
Text Injector - The main module scans the
response to find the end of the HTML head tag, and then calls out
to a helper process to determine what text should be inserted into the
page. The helper process currently only responds with a text line
containing the client IP address, but since it operates
asynchronously, it could conceivably produce targeted information
that takes longer to generate. The module passes data back to the
client as it scans the HTML, so very little delay is introduced. As
with the Image Transcoder module, this module uses the
DR_FDPoll, DR_Header, and DR_RespBodyInject
functions.
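
Because the response arrives in pieces, the scan for the insertion point has to carry its state from one chunk to the next. The following sketch shows one way to do that; it assumes the insertion point is the closing `</head>` tag, and the names head_scanner and head_scan are hypothetical. The simple reset on mismatch is correct for this particular marker because `<` appears only as its first character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char HEAD_END[] = "</head>";   /* assumed insertion marker */

/* Scanner state kept between chunks of one response. */
struct head_scanner {
    size_t matched;   /* marker characters matched so far */
};

/* Feed one chunk of the response body. Returns the offset within this chunk
 * just past the marker (where text could be injected), or -1 if the marker
 * was not completed in this chunk. Matching is case-insensitive. */
static long head_scan(struct head_scanner *s, const char *chunk, size_t len)
{
    const size_t plen = sizeof HEAD_END - 1;
    size_t i;

    assert(s != NULL && (chunk != NULL || len == 0));
    for (i = 0; i < len; i++) {
        char c = (char)tolower((unsigned char)chunk[i]);

        if (c == HEAD_END[s->matched])
            s->matched++;
        else
            s->matched = (c == HEAD_END[0]) ? 1 : 0;
        if (s->matched == plen) {
            s->matched = 0;
            return (long)(i + 1);
        }
    }
    return -1;
}
```

Everything before the returned offset can be passed to the client immediately, which matches the low-delay behavior described above.
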
Content Manager - This demonstration module accepts telnet
connections from the local machine and presents an interface to the
DR_Obj
content management functions. The administrator can query URLs, force
remote fetches, revalidate objects, and delete objects. Object
contents can also be displayed, and dummy object data can be forced
into the cache. The module uses the DR_FDPoll family of functions to
perform all processing in callback style even while waiting on data
from network connections.
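
A line-oriented administrative interface of this kind needs little more than a command parser in front of the content management calls. The sketch below is purely illustrative: the command verbs, the enum, and cm_parse are hypothetical, and the DR_Obj calls that each command would trigger are not shown.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command set mirroring the operations named in the text. */
enum cm_command { CM_QUERY, CM_FETCH, CM_REVALIDATE, CM_DELETE, CM_UNKNOWN };

/* Parse one input line of the form "verb url". The line is modified in place
 * to strip the trailing CR/LF. On success, *url points at the URL argument
 * inside the line; otherwise *url is NULL and CM_UNKNOWN is returned. */
static enum cm_command cm_parse(char *line, const char **url)
{
    static const char *const verbs[] = {
        "query", "fetch", "revalidate", "delete"
    };
    size_t i, vlen;

    assert(line != NULL && url != NULL);
    line[strcspn(line, "\r\n")] = '\0';
    for (i = 0; i < sizeof verbs / sizeof verbs[0]; i++) {
        vlen = strlen(verbs[i]);
        if (strncmp(line, verbs[i], vlen) == 0 &&
            line[vlen] == ' ' && line[vlen + 1] != '\0') {
            *url = line + vlen + 1;
            return (enum cm_command)i;
        }
    }
    *url = NULL;
    return CM_UNKNOWN;
}
```
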
ICAP Client - This module implements the ICAP 1.0 draft for
interaction with external servers that provide value-added
services [8]. The module must encapsulate HTTP
requests and responses in ICAP requests,
send those requests to ICAP servers, retrieve and parse responses,
and forward either the
original HTTP message or the modified message provided by the ICAP
server. All of this processing can be implemented through callbacks
with the assistance of the polling functions for network event
notification. Implementing ICAP as a module rather than as an integrated
part of the proxy core is particularly appropriate because the ICAP specifications
continue to evolve.
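
To make the encapsulation step concrete, the sketch below wraps a body-less HTTP request header in an ICAP request-modification message. It follows the message layout later published as RFC 3507 (REQMOD request line, Host header, and an Encapsulated header giving byte offsets), which may differ in detail from the draft the module targets. The function name icap_build_reqmod and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Wrap an HTTP request header (which must end with its blank line and have
 * no body) in an ICAP REQMOD request. The Encapsulated header states that
 * the HTTP request header starts at offset 0 and that no body follows it,
 * with "null-body" giving the offset at which a body would have started.
 * Returns the message length, or -1 if out is too small. */
static int icap_build_reqmod(const char *icap_host, const char *service,
                             const char *http_hdr, char *out, size_t out_len)
{
    int n;

    assert(icap_host != NULL && service != NULL);
    assert(http_hdr != NULL && out != NULL);
    n = snprintf(out, out_len,
                 "REQMOD icap://%s/%s ICAP/1.0\r\n"
                 "Host: %s\r\n"
                 "Encapsulated: req-hdr=0, null-body=%lu\r\n"
                 "\r\n"
                 "%s",
                 icap_host, service, icap_host,
                 (unsigned long)strlen(http_hdr), http_hdr);
    return (n >= 0 && (size_t)n < out_len) ? n : -1;
}
```

The ICAP server's reply would then be parsed for either an unmodified indication or a replacement HTTP message, which the module forwards in place of the original.
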