Our goal in this section is to convey the basic flavor of our algorithms (which we call WK algorithms); the actual code is available from our web site and is well-commented for those who wish to explore it or experiment with it.
We note that these algorithms were designed several years ago, when CPU's were much slower than today--they therefore stress simplicity and speed over achieving high compression. We believe that better algorithms can be designed by refining the basic modeling technique, perhaps in combination with more traditional sequence-oriented modeling, and by using more sophisticated encoding strategies. Given their simplicity, however, they are strikingly effective in our experiments.
Our compression algorithms exploit in-memory data regularities by scanning through the input data a 32-bit word at a time, and looking for data that are numerically similar--specifically, repetitions of the high-order 22-bit pattern of a word, even if the low-order 10 bits are different.3 They therefore perform partial matching of whole-word bit patterns.
To detect repetitions, the encoder maintains a dictionary of just 16 recently-seen words. (One of our algorithms manages this dictionary as a direct mapped cache, and another as a 4x4 set-associative cache, with LRU used as the replacement algorithm for each set. These are simple software caching schemes, and could be trivially implemented in very fast hardware. Due to lack of space and because the exact algorithm did not matter for compressed caching performance, we will only discuss the direct-mapped algorithm in this paper.)
For these compression algorithms to work as well as they do, the regularities must be very strong. Where a typical LZ-style compressor uses a dictionary of many kilobytes (e.g., 64 KB), our compressors use only 64 bytes and achieve similar compression ratios for in-memory data.
The compressor scans through a page, reading each word, probing its cache (dictionary) for a matching pattern, and emitting a two-bit code classifying the word. A word may
For the all-zeroes case, only the two-bit tag is written to the compressed output page. For the other three cases, additional information must be emitted as well. In the no-match case, the entire 32-bit pattern that did not match anything is written to the output. For a full (32-bit) match, the dictionary index is written, indicating which dictionary word was repeated. For the partial (22-bit) match case, the dictionary index and the (differing) low 10 bits are written.
The corresponding decompressor reads through the compressed output, examining one two-bit tag at a time and taking the appropriate action. As with more conventional compression schemes, a tag indicating no-match directs it to read an item (one word) from the compressed input, insert it in the dictionary, and echo it to the output. A tag indicating all-zeroes directs it to write a word of zeroes to its output. A tag indicating a full-word match directs it to copy a dictionary item to the output, either whole (in the full match case) or with its low bits replaced by bits consumed from the input (for a partial match).
The encoding can then be performed quickly. Rather than actually writing the result of compressing a word directly to the output, the algorithm writes each kind of information into a different intermediate array as it reads through the input data, and then a separate postprocessing pass ``packs'' that information into the output page, using a fast packing routine. (The output page is segmented, with each segment containing one kind of data: tags, dictionary indices, low bits, and full words.) For example, the two-bit tags are actually written as bytes into a byte array, and a special routine packs four consecutive words (holding 16 tags) into a single word of output by shifting and XORing them together. During decompression, a prepass unpacks these segments before the main pass reconstructs the original data.