Microbenchmark: LMBench
We used LMBench [31] for microbenchmarking. LMBench
was developed specifically to measure the performance of core kernel
system calls and facilities, such as file access, context switching, and
memory access. LMBench has been particularly effective at establishing
and maintaining excellent performance in these core facilities in the
Linux kernel.
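As a rough illustration of what LMBench's "null call" test measures, the following sketch (not LMBench itself, which is written in C) times a tight loop of a trivial system call; interpreter overhead inflates the absolute number, so only the method carries over.

```python
# Illustrative sketch only: LMBench's "null call" test measures bare
# kernel entry/exit latency by timing a trivial system call. Python's
# interpreter overhead inflates the absolute figure, but the method
# (time a tight loop, divide by iterations) is the same idea.
import os
import time

def null_call_latency_usec(iters=200_000):
    start = time.perf_counter()
    for _ in range(iters):
        os.getppid()            # cheap syscall: enter and exit the kernel
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e6  # microseconds per call

print(f"null call: {null_call_latency_usec():.2f} usec per call")
```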
Table 2: LMBench Microbenchmarks, 4-processor machine

Process tests, times in microseconds, smaller is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  null call           0.49      0.48       -2.0%
  null I/O            0.89      0.91        2.2%
  stat                5.39      5.49        1.9%
  open/close          6.94      7.13        2.7%
  select TCP         39        41           5.1%
  sig inst            1.18      1.19        0.8%
  sig handl           4.10      4.09       -0.2%
  fork proc         187       187           0%
  exec proc         705       706           0.1%
  sh proc          3608      3611           0.1%

File and VM system latencies in microseconds, smaller is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  0K file create     73        73           0%
  0K file delete      8.545     8.811       3.1%
  10K file create   142       143           0.7%
  10K file delete    25        27           8%
  mmap latency     4874      4853          -0.4%
  prot fault          0.974     0.990       1.6%
  page fault          4         5          25%

Local communication bandwidth in MB/s, larger is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  pipe              537       542          -0.9%
  AF Unix            98       116         -18.4%
  TCP               257       235           8.6%
  file reread       306       306           0%
  mmap reread       368       368           0%
  bcopy (libc)      191       191           0%
  bcopy (hand)      148       151          -2%
  mem read          368       368           0%
  mem write         197       197           0%
Table 3: LMBench Microbenchmarks, 1-processor machine

Process tests, times in microseconds, smaller is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  null call           0.44      0.44        0%
  null I/O            0.67      0.71        6%
  stat               29        29           0%
  open/close         30        30           0.5%
  select TCP         23        23           0%
  sig inst            1.14      1.15        0.9%
  sig handl           5.23      5.24        0.2%
  fork proc         182       182           0%
  exec proc         745       747           0.3%
  sh proc          4334      4333           0%

File and VM system latencies in microseconds, smaller is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  0K file create     96        96           0%
  0K file delete     31        31           0%
  10K file create   157       158           0.6%
  10K file delete    45        46           2.2%
  mmap latency     3246      3158          -2.7%
  prot fault          0.899     1.007      12%
  page fault          3         3           0%

Local communication bandwidth in MB/s, larger is better:

  Test Type         2.5.15   2.5.15-lsm   % Overhead with LSM
  pipe              630       597           5.2%
  AF Unix           125       125           0%
  TCP               222       220           0.9%
  file reread       316       313           0.9%
  mmap reread       378       368           2.6%
  bcopy (libc)      199       191           4%
  bcopy (hand)      168       149          11.3%
  mem read          378       396          -4.8%
  mem write         206       197           4.4%
We compared a standard Linux 2.5.15 kernel against a 2.5.15 kernel with
the LSM patch applied and the default capabilities module loaded, run
on a 4-processor 700 MHz Pentium Xeon computer with 1 GB of RAM and an
ultra-wide SCSI disk; the results are shown in Table 2.
In most cases, the performance penalty is within the range of experimental
noise. In some cases, the LSM kernel's performance actually exceeded that
of the standard kernel, which we attribute to experimental error (typically
cache collision anomalies [24]). The 18.4% performance
improvement for AF Unix bandwidth in Table 2 is anomalous, but
we have not identified the testing problem behind it.
The worst-case overheads were 5.1% for select(), 2.7% for open/close,
and 3.1% for 0K file delete. The open, close, and delete results are to
be expected, because the kernel checks permission for each component of
a filename during pathname resolution, so these LSM hooks are invoked
repeatedly and their overhead is magnified. The performance penalty for
select() stands out as an opportunity for optimization, a finding
confirmed by the macrobenchmark experiments in Section 5.2.3.
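The per-component permission checking described above can be sketched as follows; `check_permission` here is a hypothetical stand-in for the kernel's inode permission hook, used only to show why a longer pathname multiplies the hook cost.

```python
# Hedged sketch: pathname resolution checks permission once per path
# component, so looking up "/a/b/c/file" fires the permission hook
# (here a hypothetical stand-in for the LSM inode hook) several times,
# not once.
def resolve(path, check_permission):
    """Walk each component of an absolute path, invoking the
    permission check at every step; returns the number of checks."""
    checks = 0
    for component in path.strip("/").split("/"):
        check_permission(component)  # hook fires on every component
        checks += 1
    return checks

print(resolve("/usr/local/bin/tool", lambda c: None))  # 4 components -> 4
```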
Similar results for the same machine running a uniprocessor (UP) kernel
are shown in Table 3. One should
also bear in mind that these are microbenchmark figures; for comprehensive
application-level impact, see Sections 5.2.2
and 5.2.3.
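The overhead percentages in Tables 2 and 3 follow a simple convention, sketched below as an assumption inferred from the data: latency tests (smaller is better) report (lsm - base)/base, while bandwidth tests (larger is better) report (base - lsm)/base, so negative values mean the LSM kernel happened to measure faster.

```python
# Sketch of the overhead convention apparently used in Tables 2 and 3.
def overhead_pct(base, lsm, larger_is_better=False):
    """Percentage overhead of the LSM kernel relative to the stock
    kernel; negative means the LSM kernel measured faster."""
    if larger_is_better:
        return (base - lsm) / base * 100.0   # bandwidth: a drop is overhead
    return (lsm - base) / base * 100.0       # latency: a rise is overhead

print(f"{overhead_pct(39, 41):.1f}%")                           # select TCP: 5.1%
print(f"{overhead_pct(537, 542, larger_is_better=True):.1f}%")  # pipe: -0.9%
```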
Chris Wright
2002-05-13