Semaphore Performance

Next: Semaphore Complexity Up: RCU Implementation of System Previous: Semaphore Operation

Semaphore Performance

Use of RCU improves the performance of System V semaphores as measured by both system-level benchmarks and focused microbenchmarks.

The Open Source Development Lab (OSDL) used a DBT1 benchmark to evaluate system-level performance, comparing Andrew Morton's Linux 2.5.42-mm2 both with and without ipc-rcu. These tests were run on an Intel $^{(R)}$ dual-CPU 900MHz PIII with 256MB of memory.

The raw transaction rate for each of the five runs with each kernel are shown in Figure 17. The erratic results for the stock kernel are not unusual for workloads with lock contention. The reason for this is that if the lock contention is not too extreme, relatively deterministic workloads can ``get lucky'' such that multiple CPUs happen to be less likely to be contending for the same lock at the same time. As shown in Table 4, the difference is statistically significant: not only is ipc-rcu's average three standard deviations above that of the stock kernel, but ipc-rcu's smallest value of 90.4 TPS exceeds the stock kernel's median of 87.6 TPS.

**Figure 17:** DBT1 Database Benchmark Raw Results
$\begin{figure}\begin{center} \epsfxsize =3in \epsfbox{dbt1}\end{center}\end{figure}$

Table 4: DBT1 Database Benchmark Results (TPS)

Kernel	Average	Standard
		Deviation
2.5.42-mm2	85.0	7.5
2.5.42-mm2+ipc-rcu	89.8	1.0

Bill Hartner [Hartner02] constructed a System V semaphore microbenchmark named semopbench and ran it on an Intel 8-CPU 700 MHz PIII system. The results in Table 5 clearly show the order-of-magnitude reduction in runtime obtained by applying the reader-writer-locking/RCU analogy with RCU to System V IPC mechanisms.

Table 5: semopbench Microbenchmark Results (seconds)

Kernel	Run 1	Run 2	Avg
2.5.42-mm2	515.1	515.4	515.3
2.5.42-mm2+ipc-rcu	46.7	46.7	46.7

Next: Semaphore Complexity Up: RCU Implementation of System Previous: Semaphore Operation

Paul McKenney 2003-03-28