
5.2.2 Chat

Figure 10 shows the throughput of the standard kernel (vanilla), the 32-coloring kernel (c32), and the multi-queue scheduler kernel (MQ).

Figure 10: Chat Performance (30 rooms, 300 messages)

As expected, c32 shows significantly better throughput than vanilla. The vanilla kernel scales only slightly up to 4 CPUs, and its throughput starts to drop from 5 CPUs upward. In contrast, c32 scales up to 6 CPUs, providing an 89.6% throughput improvement over vanilla. MQ achieves the best throughput of the three kernels.

As with WebBench, we collected the L2 cache miss ratio and lock statistics. The results are shown in Tables 7 and 8, respectively. The coloring scheme substantially reduces the L2 cache miss ratio for every number of CPUs, which speeds up the run queue traversal. As a result, the mean lock hold time on the 8-CPU system is reduced from 40 us to 14 us. Lock contention also decreases, from 85.8% on vanilla to 69.7% on c32. The run queue lock contention of the Chat microbenchmark is higher than that of WebBench.


Table 7: L2 cache miss ratio during run queue traversal (Chat)

              1 CPU   2 CPUs  3 CPUs  4 CPUs  5 CPUs  6 CPUs  7 CPUs  8 CPUs
vanilla       99.7%   99.8%   76.4%   98.8%   64.5%   85.3%   94.4%   84.8%
32 coloring    3.2%    4.6%    2.8%    2.4%    5.8%    3.6%    2.8%    3.3%
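
The figures in Table 7 can be summarized with a few lines of arithmetic. The following is a minimal sketch that only post-processes the published numbers. The variable and function names (vanilla, c32, miss_ratio_drop) are chosen here for illustration and are not part of the measurement tooling used in the paper.

```python
# L2 cache miss ratios (%) during run queue traversal, copied from Table 7.
cpus = [1, 2, 3, 4, 5, 6, 7, 8]
vanilla = [99.7, 99.8, 76.4, 98.8, 64.5, 85.3, 94.4, 84.8]
c32 = [3.2, 4.6, 2.8, 2.4, 5.8, 3.6, 2.8, 3.3]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def miss_ratio_drop(before, after):
    """Drop in miss ratio, in percentage points, for each CPU count."""
    return [round(b - a, 1) for b, a in zip(before, after)]


drops = miss_ratio_drop(vanilla, c32)
for n, d in zip(cpus, drops):
    print(f"{n} CPU(s): miss ratio lower by {d} percentage points")

print(f"mean vanilla: {mean(vanilla):.1f}%, mean c32: {mean(c32):.1f}%")
```

The smallest drop occurs in the 5-CPU column, where vanilla itself has its lowest miss ratio (64.5%).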



Table 8: lock statistics for runqueue_lock (Chat)

Kernel       Metric          2 CPUs  3 CPUs  4 CPUs  5 CPUs  6 CPUs  7 CPUs  8 CPUs
vanilla      Contention      33.0%   48.4%   51.9%   73.0%   75.0%   85.2%   85.8%
             Hold Mean [us]  113     51      32      50      34      54      40
             Hold Max [us]   352     196     125     186     147     270     155
32 coloring  Contention      23.6%   38.7%   48.3%   56.4%   60.6%   61.9%   69.7%
             Hold Mean [us]  26      25      18      19      17      13      14
             Hold Max [us]   235     215     192     200     168     133     166
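
To make the comparison in Table 8 easier to read, the per-column improvement can be computed directly from the published values. This is an illustrative sketch only. The names (relative_reduction, rows) are assumptions introduced here, and the script does not reproduce how the lock statistics were collected.

```python
# Lock statistics for the run queue lock, copied from Table 8.
cpus = [2, 3, 4, 5, 6, 7, 8]
vanilla_hold_mean_us = [113, 51, 32, 50, 34, 54, 40]
c32_hold_mean_us = [26, 25, 18, 19, 17, 13, 14]
vanilla_contention = [33.0, 48.4, 51.9, 73.0, 75.0, 85.2, 85.8]
c32_contention = [23.6, 38.7, 48.3, 56.4, 60.6, 61.9, 69.7]


def relative_reduction(before, after):
    """Reduction of `after` relative to `before`, in percent."""
    return 100.0 * (before - after) / before


rows = []
for i, n in enumerate(cpus):
    # Relative cut in mean hold time, and contention drop in percentage points.
    hold_cut = relative_reduction(vanilla_hold_mean_us[i], c32_hold_mean_us[i])
    contention_drop = vanilla_contention[i] - c32_contention[i]
    rows.append((n, hold_cut, contention_drop))
    print(f"{n} CPUs: mean hold time -{hold_cut:.1f}%, "
          f"contention -{contention_drop:.1f} points")
```

For the 8-CPU column this corresponds to a 65% shorter mean hold time (40 us to 14 us) and a contention drop of 16.1 percentage points (85.8% to 69.7%).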



Shuji YAMAMURA 2002-04-16