Fast, preemptible reforking and one-way copy-on-write have a negligible effect (less than one percent) on the execution times of our explicit-I/O applications, which use a modest amount of virtual memory. Figure 3 shows the degree to which our swapping applications benefit from these mechanisms. All four swapping applications run significantly faster than under our baseline approach, with the speedup attributable in roughly equal parts to the two mechanisms. Moreover, while MGRID runs 60% slower with baseline speculative prefetching, fast, preemptible reforking and one-way copy-on-write eliminate this overhead.
Detailed information about the benefits of fast, preemptible reforking is presented in Table 3. For the three scientific applications, the improvement over our baseline approach is dramatic: total refork time is reduced by a factor of three or more. Importantly, unlike normal reforks, the average fast refork is much shorter than an average disk access for all of the benchmarks. Faster reforking allows a high proportion of these preemptible refork attempts to complete, and gives speculative execution more time to run before it is preempted by normal execution.
Fast, preemptible reforking can increase the number of synchronization attempts (as with both MGRID and MATVEC) because preempted attempts are quickly retried the next time normal execution stalls. However, this mechanism increases the number of completed synchronizations only for MGRID. In terms of execution time, the cost of these additional attempts is far outweighed by the reduction in refork time.
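As a rough illustration of this retry behavior, the sketch below outlines the control flow when normal execution stalls; the helper names (copy_next_region, normal_becomes_runnable, and so on) are hypothetical placeholders rather than our actual interface, and the incremental copy loop is only one way to make a refork preemptible.

    /* Illustrative sketch only: helper names are hypothetical placeholders. */
    #include <stdbool.h>

    struct spec_state;                             /* speculative process state (opaque)      */

    bool copy_next_region(struct spec_state *s);   /* copy one chunk of normal-execution
                                                      state; returns false once the refork
                                                      is complete                             */
    bool normal_becomes_runnable(void);            /* has the stalled disk access completed?  */
    void discard_refork(struct spec_state *s);     /* abandon the partial refork              */
    void resume_speculation(struct spec_state *s); /* let the speculative process run ahead   */

    /* Invoked each time normal execution stalls on a disk access. */
    void on_normal_stall(struct spec_state *s)
    {
        /* Copy state in small increments so the refork can be preempted
           between increments. */
        while (copy_next_region(s)) {
            if (normal_becomes_runnable()) {
                /* Preempt this attempt; it will simply be retried the
                   next time normal execution stalls. */
                discard_refork(s);
                return;
            }
        }
        /* The refork completed within the disk-access window; speculative
           execution now runs ahead from normal execution's current state. */
        resume_speculation(s);
    }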
Examination of detailed application traces reveals that the improved performance of Sphinx is due to reforks being preemptible. Not only does this prevent normal execution from being needlessly delayed, but it also reduces the time during which the speculative process is runnable. Because Sphinx has a large memory footprint, leaving speculative execution non-runnable can substantially reduce memory contention, which is the main cause of degraded performance in this benchmark.
Table 4 shows that the one-way copy-on-write mechanism delivers dramatic reductions in the number of copy-on-write faults during normal execution for FFTPDE and MGRID. The performance benefit of these reductions can be seen from the results in Figure 3. These improvements are due not only to the direct benefit of fewer copy-on-write faults, but also to the indirect benefit of decreased memory contention, since fewer page allocations are required. MATVEC and Sphinx gain no noticeable benefit from one-way copy-on-write because, even with this mechanism disabled, normal execution experiences very few copy-on-write faults due to speculative execution. (Each benchmark experiences an unavoidable number of copy-on-write faults as a result of write accesses within shared libraries, as can be seen from the fault counts when speculative execution is disabled.)
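To make the distinction from conventional copy-on-write concrete, the sketch below shows one way a one-way policy can be arranged: shared pages remain writable for the normal process and are write-protected only in the speculative process's page table, so only speculative writes fault and copy. The page-table helpers (spec_pte, pte_clear_writable, and so on) are hypothetical placeholders, not an actual kernel interface.

    /* Illustrative sketch only: page-table helpers are hypothetical placeholders. */
    #include <stddef.h>

    struct page;
    struct pte;

    struct pte  *spec_pte(size_t vpn);                  /* speculative process's PTE for a page */
    struct page *shared_page(size_t vpn);               /* frame currently mapped at that page  */
    struct page *alloc_page(void);                      /* allocate a private frame             */
    void copy_page(struct page *dst, struct page *src);
    void pte_clear_writable(struct pte *p);             /* write-protect a mapping              */
    void map_spec_page(size_t vpn, struct page *pg);    /* map a private, writable copy         */

    /* At refork time: share every page, but write-protect it only in the
       speculative process.  Normal execution keeps writable mappings and
       therefore never takes copy-on-write faults. */
    void setup_one_way_cow(size_t num_pages)
    {
        for (size_t vpn = 0; vpn < num_pages; vpn++)
            pte_clear_writable(spec_pte(vpn));
    }

    /* Write fault taken by the speculative process: give it a private copy
       so that its writes never become visible to normal execution. */
    void spec_write_fault(size_t vpn)
    {
        struct page *copy = alloc_page();
        copy_page(copy, shared_page(vpn));
        map_spec_page(vpn, copy);
    }

In such an arrangement, normal execution neither faults nor allocates pages on behalf of speculation, and a page is copied only if the speculative process actually writes it; the speculative process may observe later writes by normal execution to pages it has not yet copied, but its own writes still never reach normal execution.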