3.2 Our implementation for coloring

Next: 3.3 Negative effect of Up: 3 Our solution - Previous: 3.1 Implementation issue for

3.2 Our implementation for coloring

Although the task structure had the above constraints, we noticed that a small modification to the get_current() function can enable the coloring.

**Figure 5:** An example of coloring (four colorings)

Figure 4 (b) shows a new get_current(). We only inserted the single line in bold face to the original. The new get_current() returns a slightly different base address, which is shifted in multiples of cache line size. We achieved low overhead by adding only a few bit operations to the original get_current(). This is very important because this function is frequently called in the system. Note that this implementation does not require any changes to the core routines of the standard kernel scheduler.

Figure 5 illustrates the implementation of coloring for a task structure. To be easy to understand, we assume that the cache size is 32KB (which probably is smaller than a real L2 cache size) maintained by a direct mapping scheme. The kernel stacks (the task structure objects) shall be contiguously arranged on a main memory (in fact, they are not necessarily arranged continuously in this way, but are aligned on the 8KB boundary on a main memory). The current kernel stack is surrounded by the thick line in this figure.

Figure 5 (a) shows the case of the standard kernel (we call this kernel vanilla). The shaded cache blocks are missed frequently. Their memory address is given above each block. As shown in this figure, since each kernel stack is aligned on an 8KB boundary, task structures which are placed at 32KB distance addresses on a main memory are mapped to the same cache lines. If the scheduler continuously accesses these task structures (such variables as has_cpu), cache conflicts occur. For example, grayed data in four kernel stacks of 8KB, 40KB, 72KB, and 104KB in a main memory are transferred to the same line in a cache memory. Therefore, if the scheduler continuously accesses these four task structures, conflicts would occur at the shaded cache lines labeled ``8k+0x30''.

On the other hand, Figure 5 (b) shows the case of a kernel using four-coloring. When the coloring kernel calls the modified get_current(), it returns 72K+0x40. This is the base address at a 32*2 bytes (equals the size of two cache lines) different offset. In the practical implementation, the modified get_current() generates a new current value by hashing it with its upper two bits. In this way, when the scheduler continuously accesses the task structures, cache conflicts never occur.

To apply our implementation to different cache configurations, we have to be careful in our choice of address bits to copy from the masked stack pointer to create the colorized pointer to the current task structure. For colors, we would choose for copying the bits immediately higher than the bit satisfying the following equation.

$\begin{displaymath} b = \log_2 \frac{cache\ size}{cache\ associativity} \end{displaymath}$

(1)

Suppose the cache configuration is 512KB with four-way set associative, the address difference of the two blocks on a same line is always multiple of 128KB. This means the lower 17 bits of the two block addresses are always same and using these bits cannot contribute coloring. Thus, to make the cache coloring effective, bits higher than the 17th bit should be used to colorize the structure.

And, we can control the number of colorings by the number of bits to be copied. For example, in order to implement eight colorings, we should use three bits for hashing task structures (replacing ``0x60'' by ``0xe0'' in Figure 4 (b)).

The coloring kernel needs modifications in kernel source files other than the one containing get_current(). However, the patch file is only 225 lines.

Next: 3.3 Negative effect of Up: 3 Our solution - Previous: 3.1 Implementation issue for

Shuji YAMAMURA 2002-04-16