5.2.2 Chat

Figure 10 shows the throughput of the Chat benchmark on the standard kernel (vanilla), the 32-coloring kernel (c32), and the multi-queue scheduler kernel (MQ).

**Figure 10:** Chat Performance (30 rooms, 300 messages)

As expected, c32 achieves significantly higher throughput than vanilla. The vanilla kernel scales only slightly up to 4 CPUs, and its throughput starts to drop from 5 CPUs onward. In contrast, c32 scales up to 6 CPUs, providing an 89.6% throughput improvement over vanilla. MQ achieves the highest throughput of the three kernels.
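For reference, the 89.6% figure is a relative improvement. Writing T for throughput (a symbol introduced here for illustration, and assuming both kernels are compared at the same CPU count), it means c32 delivers roughly 1.9 times the vanilla throughput:

```latex
\[
\frac{T_{\mathrm{c32}} - T_{\mathrm{vanilla}}}{T_{\mathrm{vanilla}}} = 0.896
\quad\Longrightarrow\quad
T_{\mathrm{c32}} = 1.896\, T_{\mathrm{vanilla}}
\]
```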

As with WebBench (Section 5.2.1), we collected the L2 cache miss ratio and lock statistics; the results are shown in Tables 7 and 8, respectively. The coloring scheme substantially reduces the L2 cache miss ratio for every number of CPUs (vanilla ranges from 64.5% to 99.8%, c32 from 2.4% to 5.8%), which speeds up the run queue traversal. Because the run queue is traversed while runqueue_lock is held, the faster traversal also shortens the lock hold time: on the 8-CPU system the mean hold time drops from 40 us to 14 us with the coloring scheme. Lock contention likewise decreases from 85.8% on vanilla to 69.7% on c32. Run queue lock contention is higher in the Chat microbenchmark than in WebBench.

Table 7: L2 cache miss ratio during run queue traversal (Chat)

	1 CPU	2 CPUs	3 CPUs	4 CPUs	5 CPUs	6 CPUs	7 CPUs	8 CPUs
vanilla	99.7%	99.8%	76.4%	98.8%	64.5%	85.3%	94.4%	84.8%
32 coloring	3.2%	4.6%	2.8%	2.4%	5.8%	3.6%	2.8%	3.3%
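As a quick cross-check of the claim that coloring lowers the miss ratio at every CPU count, the following minimal Python sketch summarizes the two rows of Table 7. The variable and function names are hypothetical; only the percentages come from the table.

```python
"""Summarize Table 7: L2 cache miss ratio (%) during run queue traversal (Chat)."""

# Miss ratios in percent for 1..8 CPUs, copied from Table 7.
MISS_VANILLA = [99.7, 99.8, 76.4, 98.8, 64.5, 85.3, 94.4, 84.8]
MISS_C32 = [3.2, 4.6, 2.8, 2.4, 5.8, 3.6, 2.8, 3.3]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summarize(name, ratios):
    """One-line min/max/mean summary of one row of miss ratios."""
    return (f"{name}: min {min(ratios):.1f}%, "
            f"max {max(ratios):.1f}%, mean {mean(ratios):.1f}%")


def lower_everywhere(before, after):
    """True if `after` is strictly lower than `before` at every CPU count."""
    return all(a < b for b, a in zip(before, after))


if __name__ == "__main__":
    print(summarize("vanilla", MISS_VANILLA))
    print(summarize("32 coloring", MISS_C32))
    print("lower at every CPU count:", lower_everywhere(MISS_VANILLA, MISS_C32))
```

Running it reports a mean miss ratio of about 88.0% for vanilla versus about 3.6% for 32 coloring, and confirms that the colored kernel is lower in every column.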

Table 8: Lock statistics for runqueue_lock (Chat)

		2 CPUs	3 CPUs	4 CPUs	5 CPUs	6 CPUs	7 CPUs	8 CPUs
vanilla	Contention	33.0%	48.4%	51.9%	73.0%	75.0%	85.2%	85.8%
vanilla	Hold mean [us]	113	51	32	50	34	54	40
vanilla	Hold max [us]	352	196	125	186	147	270	155
32 coloring	Contention	23.6%	38.7%	48.3%	56.4%	60.6%	61.9%	69.7%
32 coloring	Hold mean [us]	26	25	18	19	17	13	14
32 coloring	Hold max [us]	235	215	192	200	168	133	166
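The reductions quoted in the text can be recomputed for every CPU count from Table 8. The following is a minimal Python sketch under that framing; the dictionary keys and helper names are hypothetical, and only the numbers are taken from the table.

```python
"""Per-CPU-count reductions in runqueue_lock statistics, from Table 8 (Chat)."""

CPU_COUNTS = [2, 3, 4, 5, 6, 7, 8]  # columns of Table 8

# Copied from Table 8: contention in percent, mean hold time in microseconds.
LOCK_VANILLA = {
    "contention": [33.0, 48.4, 51.9, 73.0, 75.0, 85.2, 85.8],
    "hold_mean_us": [113, 51, 32, 50, 34, 54, 40],
}
LOCK_C32 = {
    "contention": [23.6, 38.7, 48.3, 56.4, 60.6, 61.9, 69.7],
    "hold_mean_us": [26, 25, 18, 19, 17, 13, 14],
}


def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before` (0.65 means 65% lower)."""
    return (before - after) / before


def report():
    """Return (cpus, hold-time reduction fraction, contention drop in points) per column."""
    rows = []
    for i, cpus in enumerate(CPU_COUNTS):
        hold = relative_reduction(LOCK_VANILLA["hold_mean_us"][i],
                                  LOCK_C32["hold_mean_us"][i])
        drop = LOCK_VANILLA["contention"][i] - LOCK_C32["contention"][i]
        rows.append((cpus, hold, drop))
    return rows


if __name__ == "__main__":
    for cpus, hold, drop in report():
        print(f"{cpus} CPUs: mean hold time {hold:.0%} lower, "
              f"contention {drop:.1f} percentage points lower")
```

For the 8-CPU column this gives a 65% shorter mean hold time (40 us to 14 us) and a contention drop of 16.1 percentage points (85.8% to 69.7%), consistent with the figures quoted in the text.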

Shuji YAMAMURA 2002-04-16