Our earlier work on developing the DeBox tool (17) identified the call sites in Flash where blocking occurred, but did not investigate the mechanisms by which it occurred. Among the problems, we identified that the Flash server would sometimes block in the "find file" step of the HTTP processing pipeline shown in Figure 3. This step involves performing a series of open() and stat() calls to traverse the URL's components in the filesystem. This blocking was unexpected because of the way Flash opens files: it invokes a helper process to perform the steps first, and then the helper notifies the main process, which repeats the same steps. Because the helper had presumably just finished these steps, all of the necessary metadata should have been memory-resident when the main process performed the same actions. This blocking occurs even if the filesystem is mounted asynchronously or read-only, ruling out synchronous metadata writes.
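The component-by-component traversal described above can be sketched as follows. This is a minimal illustration, not Flash's actual code: the function name `walk_url_path`, the fixed 4096-byte buffer, and the choice to call only stat() (rather than a mix of open() and stat()) are assumptions made for the example.

```c
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not taken from Flash): stat() every prefix of
 * url_path beneath docroot, one component at a time.
 * Returns the number of components examined, or -1 if a component is
 * missing or the assembled path does not fit in the buffer. */
int walk_url_path(const char *docroot, const char *url_path)
{
    char path[4096];               /* assumed upper bound on path length */
    struct stat st;
    int components = 0;

    size_t len = strlen(docroot);
    if (len >= sizeof(path))
        return -1;
    memcpy(path, docroot, len + 1);

    const char *p = url_path;
    while (*p != '\0') {
        while (*p == '/')          /* skip one or more separators */
            p++;
        if (*p == '\0')
            break;

        size_t n = strcspn(p, "/");  /* length of the next component */
        if (len + 1 + n >= sizeof(path))
            return -1;

        path[len++] = '/';
        memcpy(path + len, p, n);
        len += n;
        path[len] = '\0';

        /* Each call makes the kernel resolve the name again; this is
         * the point at which a process can block on disk or on a lock. */
        if (stat(path, &st) != 0)
            return -1;

        components++;
        p += n;
    }
    return components;
}
```

Calling `walk_url_path(".", "/a/b/c.html")` would issue one stat() each for `./a`, `./a/b`, and `./a/b/c.html`. When a helper and the main process both run such a loop over the same path, the second pass touches exactly the directory entries the first pass just resolved.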
Further investigation reveals that the metadata blocking is due to lock contention during disk access. In particular, we find that one source of the problem is contention when the main process and the helper access a shared file path. When this happens, the helper is usually performing disk I/O while still holding the vnode name lock to ensure the consistency of the corresponding entry. Making this lock exclusive rather than shared (read-only) appears to be a design decision to simplify the associated code: for most types of code, the probability of lock contention would be low. We further validate this theory by confirming that the blocking occurs even when access time modifications are disabled and even when the filesystem is mounted read-only.
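The difference between an exclusive and a shared lock can be illustrated in user space. The sketch below is only an analogy: it uses flock() advisory locks on a temporary file, not the kernel's vnode name lock, and the function name `second_locker_succeeds` and the `/tmp` template are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical demo (user-space analogy only): a first opener takes a
 * lock of the given mode (LOCK_SH or LOCK_EX), then a second opener
 * tries to take the same kind of lock without blocking.
 * Returns 1 if the second attempt succeeds, 0 if it would have blocked,
 * and -1 on a setup error. */
int second_locker_succeeds(int mode)
{
    char name[] = "/tmp/lockdemoXXXXXX";   /* assumed writable location */
    int fd1 = mkstemp(name);
    if (fd1 < 0)
        return -1;

    /* A separate open() yields an independent open file description,
     * so the two descriptors contend like two different processes. */
    int fd2 = open(name, O_RDONLY);
    unlink(name);                          /* file persists until closed */
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    int result = -1;
    if (flock(fd1, mode) == 0)
        result = (flock(fd2, mode | LOCK_NB) == 0) ? 1 : 0;

    close(fd1);                            /* closing releases the locks */
    close(fd2);
    return result;
}
```

With `LOCK_SH` the second locker proceeds immediately, whereas with `LOCK_EX` it would have to wait until the first holder finishes. If the first holder is in the middle of disk I/O, that wait lasts as long as the disk access, which is the pattern described above for the helper and the main process.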
The problem of metadata handling is not FreeBSD-specific. We observe little lock contention in Linux, but we have observed that metadata cache misses commonly occur when the data set exceeds the physical memory size, causing blocking in otherwise cached requests.
The degree of this problem is significant in FreeBSD due to an interaction between a number of implementation choices. The choices and possible motivation for each are as follows:
These choices are independently reasonable, but their combination leads to the unintended blocking. In particular, if multiple processes are trying to resolve similar paths, and one blocks on an inode access, the others can block waiting on an exclusive lock for the shared parent. If more processes try to resolve the same path, they can block higher in the file tree waiting on other readers to release lower-level exclusive locks. A single inode read can then cause many readers to become unblocked, leading to a burst of activity in the form of ready processes.
The metadata locking problem also explains what occurs in Apache and why it has gone unnoticed for so long. Since Apache does not cache open file descriptors, every request processed must perform this same set of steps. The design relies on the OS's own metadata caching to keep these steps from requiring excessive disk access, but without any information about which accesses should be cached, Apache developers cannot determine when blocking during an open() call is unexpected.