After a fork, the memory image of the parent must be reproduced in the child. Any private mappings for a file cause a snapshot of that file at the time of the fork to be produced in the child. Any shared mappings cause a shared mapping to be created in the child with modifications made by the parent to be replicated in the child, with the possible caveat of files mapped in read-only but modified by the debugger through the use of ptrace.
Traditionally the operation of reproducing these mappings is done in the kernel. However, in K42, before starting the child, the parent reproduces these mappings with the help of a few underlying K42 kernel services. K42 provides the ability to create a new non-executing process, the ability to make memory mappings in it, and the ability for a region and associated objects to be fork copied. In K42, a contiguous piece of memory is represented by a region, which in turn is backed by a File Cache Manager (FCM) that caches the in-core pages, and a File Representative (FR) that maps all the pages associated with a file. Figure 4 illustrates a standard fork copy. Prior to the fork (before), the three kernel structures (FR, FCM, Region) represent a contiguous portion of the memory in the parent. After the fork (after), the upper-left three kernel structures still represent the portion of memory in the parent, and the upper-right three structures represent the same portion of memory in the child. All the frames backing the region are immediately transferred to an internal FCM (the root FCM in this figure) and lazily copied back to the parent and child as faults occur.
As additional forks occur, a binary fork-tree is produced. When nodes on the tree are removed because the process for which they are representing memory finishes or is terminated, it is important to ``collapse'' the tree, otherwise inefficiencies result because page-fault processing needs to traverse the unnecessarily long chain of nodes. The standard way to address this is to acquire a lock and walk the tree performing the collapse. Because K42 is programmed with independent object instances representing each of the regions, and with a programming model not allowing the acquisition of a single lock across multiple object instances, we acquire and release a fine-grain lock for each node in the tree. The FCM objects therefore need to be able to handle in-flight page fault requests occurring during the collapse operation. Although this does make the programming more complicated, fine-grain locking schemes provide better scalability.
The collapse algorithm begins when a node in the tree needs to be removed. That node removes itself and it tells its sibling to combine with their parent. If the sibling is not a leaf node then all frames are transferred to the parent and the sibling node removes itself. If the sibling is a leaf node then no action is taken. To prevent a chain from forming in this case, on the next fork copy, the new child is made a child of the existing parent (no new parent is created). For this algorithm to work, frames are not copied to an internal child node (except in one case for optimization purpose only), instead they are copied to the leaf node requesting the page. As examples, two standard scenarios are illustrated in Figure 5. In the first, node 4 is being deleted. This results in a collapse with node 5 becoming the parent of nodes 1 and 2. In the second, node 2 is being deleted. As above, because it is a leaf node, its removal does not induce a collapse.
In the normal fork/exec case we do not perform any collapse. The algorithm is in place to allow K42 to continue to perform acceptably in scenarios that would generate long fork-tree chains. This algorithm provides efficient collapsing of fork trees and does not require a global lock.