Recording the CPU time of every call shows that the fork() system call has the largest per-call times and that its cost grows with the number of invocations, from as little as 0.3 msec early in the run to nearly 130 msec. Figure 5 shows the per-call time as a function of invocation. These calls stem from the SpecWeb99 workload's requirement that 0.15% of the requests be handled by forking new processes.
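The kind of per-invocation measurement described above can be approximated from user space. The following is a minimal sketch, not the instrumentation used here: it assumes a POSIX system, measures wall-clock time rather than kernel CPU time, and grows the process by opening extra file descriptors between measurements. The function names and the use of `os.devnull` are illustrative assumptions. With small descriptor counts the growth may be lost in timing noise.

```python
import os
import time


def time_fork() -> float:
    """Return the wall-clock seconds the parent spends in one fork() call."""
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping Python cleanup handlers
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)  # reap the child so zombies do not accumulate
    return elapsed


def fork_times_while_growing(rounds: int, fds_per_round: int) -> list:
    """Time one fork() after each round of opening extra file descriptors.

    Each round makes the process slightly larger from fork()'s point of
    view, since the kernel must copy the descriptor table for the child.
    """
    held = []   # descriptors kept open so the table keeps growing
    times = []
    try:
        for _ in range(rounds):
            for _ in range(fds_per_round):
                held.append(os.open(os.devnull, os.O_RDONLY))
            times.append(time_fork())
    finally:
        for fd in held:
            os.close(fd)
    return times
```

Calling `fork_times_while_growing(5, 50)` returns five timings, one per round, which can be plotted against the invocation index in the same way as the per-call times in Figure 5.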
A full call trace indicates that fork() spends the bulk of its time copying file descriptors and VM map entries (for mapped regions). Rather than changing the implementation of fork(), we opt to slightly modify the Flash architecture. We introduce a new helper process that is responsible for creating the CGI processes. Since this helper neither maps files nor caches open files, its fork() time is not affected by the size of the main process. This change yields a 10% improvement, to 440 simultaneous connections and a 1.50 GB dataset size.
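The helper-process arrangement can be sketched as follows. This is an illustration of the idea under stated assumptions, not Flash's code: it assumes a POSIX system, and the names `start_cgi_helper` and `spawn_cgi`, the Unix-domain socket pair, and the JSON-line request format are all hypothetical choices made for the example. The helper is forked once, before the server has accumulated mappings and descriptors, and every later CGI fork happens inside the small helper.

```python
import json
import os
import signal
import socket


def _helper_loop(channel: socket.socket) -> None:
    """Helper body: fork and exec one CGI program per request line."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # kernel reaps CGI children
    reader = channel.makefile("r")
    writer = channel.makefile("w")
    while True:
        line = reader.readline()
        if not line:                # server closed its end: shut down
            return
        argv = json.loads(line)
        pid = os.fork()             # cheap: the helper holds few fds and mappings
        if pid == 0:
            try:
                signal.signal(signal.SIGCHLD, signal.SIG_DFL)
                os.execv(argv[0], argv)
            finally:
                os._exit(127)       # reached only if exec failed
        writer.write(f"{pid}\n")    # report the CGI process ID to the server
        writer.flush()


def start_cgi_helper():
    """Fork the helper early; return (helper_pid, server_side_socket)."""
    server_end, helper_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        server_end.close()
        try:
            _helper_loop(helper_end)
        finally:
            os._exit(0)
    helper_end.close()
    return pid, server_end


def spawn_cgi(channel: socket.socket, argv: list) -> int:
    """Ask the helper to start argv; return the new CGI process ID."""
    channel.sendall((json.dumps(argv) + "\n").encode())
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = channel.recv(64)
        if not chunk:
            raise RuntimeError("CGI helper exited")
        reply += chunk
    return int(reply)
```

A server would call `start_cgi_helper()` once at startup and then `spawn_cgi(channel, ["/path/to/cgi"])` for each dynamic request. The sketch omits how CGI output is routed back to the server, which a real design would need to handle, for example by passing a descriptor to the helper along with the request.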