To experiment with the API, from both a functional and a performance perspective, we developed several modules that exercise various aspects of the interface, including a module that implements ICAP. We were pleased with how simple module development was and how little code was needed to implement each feature: initial development and testing of each module took from a few hours to a few days. Each module's behavior and implementation is described in more detail below, and Table 4 summarizes the code size of each module. Because the modules do not have to implement basic HTTP mechanisms themselves, none of them is particularly large. The "Total Lines" count includes headers and comments, the "Code Lines" count excludes all blank lines and comments, and the "Semicolons" count gives a better sense of the number of actual C statements involved. All modules use the callback interface, and some also spawn separate helper processes under their control.
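The three size metrics in Table 4 can be approximated mechanically. The sketch below is our own illustration, not necessarily how the table was produced: it counts every line, counts lines that contain something other than whitespace and comments, and counts semicolons that appear outside comments and string or character literals. It is approximate by design; for example, a `for` header contributes two semicolons, and preprocessor line splicing is ignored.

```c
#include <ctype.h>

/* Approximate size metrics for one C source file held in memory. */
struct loc_counts {
    int total;      /* every line, including blanks and comments */
    int code;       /* lines with at least one non-comment, non-blank char */
    int semicolons; /* ';' outside comments and literals */
};

struct loc_counts count_loc(const char *src)
{
    struct loc_counts c = {0, 0, 0};
    int in_block = 0;      /* inside a block comment */
    int in_line = 0;       /* inside a line comment */
    char quote = 0;        /* '"' or '\'' while inside a literal, else 0 */
    int line_has_code = 0;
    const char *p = src;

    while (*p != '\0') {
        char ch = *p;
        if (ch == '\n') {                  /* end of line: tally it */
            c.total++;
            if (line_has_code)
                c.code++;
            line_has_code = 0;
            in_line = 0;                   /* line comments end here */
            p++;
            continue;
        }
        if (in_line) { p++; continue; }
        if (in_block) {
            if (ch == '*' && p[1] == '/') { in_block = 0; p += 2; }
            else p++;
            continue;
        }
        if (quote) {                       /* literal text counts as code */
            line_has_code = 1;
            if (ch == '\\' && p[1] != '\0' && p[1] != '\n') { p += 2; continue; }
            if (ch == quote) quote = 0;
            p++;
            continue;
        }
        if (ch == '/' && p[1] == '/') { in_line = 1; p += 2; continue; }
        if (ch == '/' && p[1] == '*') { in_block = 1; p += 2; continue; }
        if (ch == '"' || ch == '\'') { quote = ch; line_has_code = 1; p++; continue; }
        if (ch == ';') c.semicolons++;
        if (!isspace((unsigned char)ch)) line_has_code = 1;
        p++;
    }
    if (p != src && p[-1] != '\n') {       /* last line lacks a newline */
        c.total++;
        if (line_has_code) c.code++;
    }
    return c;
}
```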
Ad Remover - Ad images are replaced by dynamically rewriting the URLs requested for them, leaving the original HTML unmodified. On each client request, a callback compares the URL against a list of known ad-server URL patterns. Matching URLs are rewritten to point to a cacheable blank image, so every replaced ad becomes a cache hit in the proxy. To handle both explicitly addressed and transparent proxying, the module constructs the full URL from the first line of the request and the Host header. When a request is modified, its Host header must be rewritten as well, using the DR_Header functions. The same module could also be used to replace the original ads with preferred ones.
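A minimal sketch of the URL check, assuming shell-style wildcard patterns matched with POSIX `fnmatch()`. The pattern list, the blank-image URL, and the function names are hypothetical, and the DR_Header calls that would actually rewrite the request are omitted. The full URL is rebuilt from the request target and the Host header because an explicitly addressed proxy receives an absolute URL in the request line, whereas a transparent proxy sees only a path.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ad-server patterns and replacement image. */
static const char *ad_patterns[] = {
    "http://ads.example.com/*",
    "http://*.adnet.example/banner/*",
};
static const char *blank_image_url = "http://proxy.example/blank.gif";

/* Build the full URL from the request-line target and the Host header.
 * Returns 0 on success, -1 if the result does not fit in out. */
int build_full_url(const char *target, const char *host, char *out, size_t outlen)
{
    int n;
    if (strncmp(target, "http://", 7) == 0)   /* explicit proxy: absolute URL */
        n = snprintf(out, outlen, "%s", target);
    else                                      /* transparent proxy: path only */
        n = snprintf(out, outlen, "http://%s%s", host, target);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Return the replacement URL if url matches an ad pattern, else NULL.
 * With flags 0, '*' in a pattern also matches '/'. */
const char *ad_rewrite(const char *url)
{
    size_t i;
    for (i = 0; i < sizeof ad_patterns / sizeof ad_patterns[0]; i++)
        if (fnmatch(ad_patterns[i], url, 0) == 0)
            return blank_image_url;
    return NULL;
}
```

When `ad_rewrite` returns a URL, both the request line and the Host header have to be changed to agree with it, which is the step the module performs through DR_Header.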
Dynamic Compressor - This module calls the zlib library from callbacks to compress data from origin servers and then caches the compressed version. Clients use less bandwidth, and because the compressed version is served on future cache hits, the proxy does not have to compress the object on every request. The module checks the request method and the Accept-Encoding header so that only GET requests from browsers that accept compressed content are considered. It checks the response header so that only full responses (status code 200) of potentially compressible types (non-images) are compressed. The response header is also checked to ensure that the response is not already being served in compressed form and is not an object with multiple variants (since one of those variants may already be compressed). Using the DR_Header functions, the module modifies the outbound response to remove the original Content-Length header and to insert a Vary header indicating that multiple versions of the object may now exist.
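The eligibility tests above amount to a pair of predicates over the request and response headers. The sketch below assumes the headers have already been parsed into plain strings (a hypothetical `struct msg_info`, where NULL means the header was absent). It treats any Accept-Encoding value that mentions `gzip` or `deflate` as acceptable, which is a simplification that ignores q-values such as `gzip;q=0`, and it takes the presence of any Vary header as the sign of an object with multiple variants.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical pre-parsed view of one request/response pair.
 * A NULL string means the header was not present. */
struct msg_info {
    const char *method;           /* request method, e.g. "GET" */
    const char *accept_encoding;  /* request Accept-Encoding */
    int status;                   /* response status code */
    const char *content_type;     /* response Content-Type */
    const char *content_encoding; /* response Content-Encoding */
    const char *vary;             /* response Vary */
};

/* Case-insensitive substring test (a simplified token check). */
static bool contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++)
        if (strncasecmp(hay, needle, n) == 0)
            return true;
    return false;
}

/* Request side: only GETs from clients that accept compressed content. */
bool request_eligible(const struct msg_info *m)
{
    return m->method != NULL && strcmp(m->method, "GET") == 0 &&
           m->accept_encoding != NULL &&
           (contains_ci(m->accept_encoding, "gzip") ||
            contains_ci(m->accept_encoding, "deflate"));
}

/* Response side: full 200 responses, not images, carrying no
 * Content-Encoding at all, and with no Vary header (no variants). */
bool response_eligible(const struct msg_info *m)
{
    if (m->status != 200)
        return false;
    if (m->content_type != NULL && strncasecmp(m->content_type, "image/", 6) == 0)
        return false;
    if (m->content_encoding != NULL)
        return false;
    return m->vary == NULL;
}
```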
Image Transcoder - All JPEG and GIF images are converted to grayscale using the netpbm and ijpeg packages. Because this task may be time-consuming, it is performed in a separate helper process. The module buffers each image until it has been fully received and then sends the data to the helper for transcoding. The helper returns the transcoded image, or the original data if transcoding fails. The module kills and restarts the helper if the transcoding library fails, and it limits the number of images waiting for transcoding when the helper cannot keep up with the incoming rate of images. The module uses the DR_FDPoll functions to communicate with the helper, the DR_Header functions to modify the response, and the DR_RespBodyInject function to inject content into an active connection.
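One simple way to enforce the limit on images awaiting transcoding is a pending counter checked at admission time. This is our own sketch: the cap of 8, the names, and the policy of serving an image untranscoded when the queue is full are assumptions, since the text does not say what happens to images beyond the limit. It also assumes that, because each image is fully buffered before it is sent, the module can fall back to the buffered originals when the helper is restarted.

```c
#include <stdbool.h>

#define MAX_PENDING 8  /* hypothetical cap on images queued for the helper */

struct transcoder {
    int pending;   /* images sent to the helper, reply not yet received */
    int restarts;  /* times the helper has been killed and restarted */
};

/* Admission check: true = send this image to the helper;
 * false = pass the original image through unmodified (assumed policy). */
bool transcoder_admit(struct transcoder *t)
{
    if (t->pending >= MAX_PENDING)
        return false;
    t->pending++;
    return true;
}

/* Called when the helper replies (with transcoded or original data). */
void transcoder_done(struct transcoder *t)
{
    if (t->pending > 0)
        t->pending--;
}

/* Called when the transcoding library fails: all queued work is dropped,
 * and the return value tells the caller how many buffered originals to
 * send instead. A real module would also kill the old helper process and
 * start a new one here. */
int transcoder_restart(struct transcoder *t)
{
    int lost = t->pending;
    t->pending = 0;
    t->restarts++;
    return lost;
}
```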
Text Injector - The main module scans the response to find the end of the HTML head tag and then calls out to a helper process to determine what text should be inserted into the page. The helper currently responds only with a line of text containing the client IP address, but because it operates asynchronously, it could produce targeted information that takes longer to generate. The module passes data on to the client as it scans the HTML, so very little delay is introduced. As with the Image Transcoder, the module uses the DR_FDPoll, DR_Header, and DR_RespBodyInject functions.
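Because the module forwards data while it scans, the search must survive a tag that is split across two response chunks. A minimal sketch, assuming the target is the closing `</head>` tag matched case-insensitively; the struct and function names are hypothetical, and the DR_FDPoll and DR_RespBodyInject calls are omitted. Since `<` occurs only at the start of the pattern, a mismatch can simply restart the match at zero, or at one if the mismatching byte is itself `<`.

```c
#include <ctype.h>
#include <stddef.h>

static const char HEAD_END[] = "</head>";   /* matched case-insensitively */
#define HEAD_END_LEN (sizeof HEAD_END - 1)

struct head_scanner {
    size_t matched;   /* bytes of HEAD_END matched so far */
    int found;        /* set once the tag has been seen */
};

/* Scan one chunk. Returns the offset just past the '>' of </head> within
 * this chunk, or -1 if the tag does not end in this chunk. State carries
 * over between calls, so a tag split across chunks is still found. */
long head_scan(struct head_scanner *s, const char *buf, size_t len)
{
    size_t i;
    if (s->found)
        return -1;
    for (i = 0; i < len; i++) {
        char c = (char)tolower((unsigned char)buf[i]);
        if (c == HEAD_END[s->matched])
            s->matched++;
        else
            s->matched = (c == '<') ? 1 : 0;   /* '<' only starts the pattern */
        if (s->matched == HEAD_END_LEN) {
            s->found = 1;
            return (long)(i + 1);
        }
    }
    return -1;
}
```

Under the assumption above, the module would forward each chunk as it arrives and use the returned offset as the point at which the helper's text is injected.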
Content Manager - This demonstration module accepts telnet connections from the local machine and presents an interface to the DR_Obj content management functions. The administrator can query URLs, force remote fetches, revalidate objects, and delete objects. Object contents can also be displayed, and dummy object data can be forced into the cache. The module uses the DR_FDPoll family of functions to perform all processing in callback style, even while waiting for data from network connections.
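The administrative interface reduces to parsing one line of input per command and dispatching on its verb. The command names and the enum below are hypothetical; in the module, each command would lead to a call to the corresponding DR_Obj function, whose signatures are not shown here.

```c
#include <string.h>

/* Hypothetical administrative verbs mirroring the operations in the text. */
enum admin_cmd { CMD_QUERY, CMD_FETCH, CMD_REVALIDATE, CMD_DELETE,
                 CMD_SHOW, CMD_INSERT, CMD_UNKNOWN };

static const struct { const char *verb; enum admin_cmd cmd; } verbs[] = {
    {"query", CMD_QUERY},           {"fetch", CMD_FETCH},
    {"revalidate", CMD_REVALIDATE}, {"delete", CMD_DELETE},
    {"show", CMD_SHOW},             {"insert", CMD_INSERT},
};

/* Parse "verb URL" from one telnet line. Copies the URL into url
 * (truncated to urllen - 1 chars) and returns the matching command. */
enum admin_cmd parse_admin_line(const char *line, char *url, size_t urllen)
{
    size_t vlen, ulen, i;
    const char *verb, *u;

    if (urllen == 0)
        return CMD_UNKNOWN;
    url[0] = '\0';
    line += strspn(line, " \t");          /* skip leading blanks */
    verb = line;
    vlen = strcspn(verb, " \t\r\n");      /* verb ends at whitespace */
    u = verb + vlen;
    u += strspn(u, " \t");
    ulen = strcspn(u, " \t\r\n");         /* URL ends at whitespace/CRLF */
    if (ulen >= urllen)
        ulen = urllen - 1;
    memcpy(url, u, ulen);
    url[ulen] = '\0';
    for (i = 0; i < sizeof verbs / sizeof verbs[0]; i++)
        if (strlen(verbs[i].verb) == vlen && strncmp(verbs[i].verb, verb, vlen) == 0)
            return verbs[i].cmd;
    return CMD_UNKNOWN;
}
```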
ICAP Client - This module implements the ICAP 1.0 draft for interaction with external servers that provide value-added services. The module must encapsulate HTTP requests and responses in ICAP requests, send those requests to ICAP servers, receive and parse the responses, and then forward either the original HTTP message or the modified message provided by the ICAP server. All of this processing can be implemented with callbacks, using the polling functions for network event notification. Implementing ICAP as a module rather than as an integrated part of the proxy core is particularly appropriate while the ICAP specifications continue to evolve.
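To make the encapsulation step concrete, the sketch below builds a RESPMOD request following the message layout of RFC 3507, the published ICAP/1.0 specification, which may differ in detail from the draft the module implemented. The Encapsulated header gives the byte offset of each embedded section, measured from the first byte after the ICAP header block, and the HTTP body is carried with chunked transfer coding. The service name and the function are hypothetical; the sketch sends the whole body as a single chunk, uses no preview, and treats the body as text.

```c
#include <stdio.h>
#include <string.h>

/* Build an ICAP RESPMOD request that encapsulates an HTTP request header,
 * response header, and response body. req_hdr and res_hdr must each end
 * with the blank line "\r\n\r\n". The body is copied with %.*s, so this
 * sketch assumes it contains no NUL bytes. Returns the message length,
 * or -1 if it does not fit in out. */
int icap_build_respmod(char *out, size_t outlen,
                       const char *icap_host, const char *service,
                       const char *req_hdr, const char *res_hdr,
                       const char *body, size_t body_len)
{
    size_t res_off = strlen(req_hdr);            /* res-hdr offset */
    size_t body_off = res_off + strlen(res_hdr); /* res-body offset */
    int n;

    if (body_len > 0)
        n = snprintf(out, outlen,
                     "RESPMOD icap://%s/%s ICAP/1.0\r\n"
                     "Host: %s\r\n"
                     "Encapsulated: req-hdr=0, res-hdr=%zu, res-body=%zu\r\n"
                     "\r\n"
                     "%s%s"
                     "%zx\r\n%.*s\r\n"    /* one chunk holding the body */
                     "0\r\n\r\n",          /* last-chunk terminator */
                     icap_host, service, icap_host, res_off, body_off,
                     req_hdr, res_hdr, body_len, (int)body_len, body);
    else                                   /* no body: null-body marker */
        n = snprintf(out, outlen,
                     "RESPMOD icap://%s/%s ICAP/1.0\r\n"
                     "Host: %s\r\n"
                     "Encapsulated: req-hdr=0, res-hdr=%zu, null-body=%zu\r\n"
                     "\r\n"
                     "%s%s",
                     icap_host, service, icap_host, res_off, body_off,
                     req_hdr, res_hdr);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```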