The most significant effect of this blocking behavior is unnecessary delay in serving queued requests. In particular, cached requests that could have been served from memory with low latency are forced to wait on disk-bound requests, similar to the priority inversion problem in scheduling. We term this phenomenon ``service inversion'' since the resulting latencies are inverted relative to the ideal latencies. In this section, we study this phenomenon and propose an approach to quantify it.
Since certain request processing steps operate independently of the server process, any blocking that occurs early in request processing can affect the system's fairness policies. Specifically, the networking code in the kernel is split into two parts. The sockets-related operations occur in the ``top half'', which is invoked by the application. The ``bottom half'' is driven by interrupts and performs the actual sending of data. So, when an application is blocked, any data that has already been handed to the networking code can still be processed by the kernel's ``bottom half.'' Likewise, since the disk helpers in Flash operate as separate processes, they can continue to work on their current request even when the main process is blocked.
In the literature, head-of-line blocking is usually studied in the context of network scheduling. To understand how blocking arises in the OS and how it causes service inversion, consider the scenario in Figure 9, where three requests arrive simultaneously and the middle request causes the process to block. Assume the process blocks in an open() call, which takes place before any data reads occur (if needed) and before any data is sent to the networking code. If the first and third requests are cached, they would normally be served at nearly the same time. However, the first request may be sent to the networking code before the process blocks, and the third request would then have to wait until the process is unblocked. The net effect is that the third request suffers from head-of-line blocking. The system's fairness policies, particularly the scheduling of network packets, are not given a chance to operate, since the three requests do not reach the networking code at the same time.
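The three-request scenario above can be illustrated with a toy model. The following is a minimal sketch, not the server's actual code: it assumes a single process that handles requests strictly in order, and the function name `handoff_times` and the 10 ms blocking time are hypothetical values chosen for illustration.

```python
def handoff_times(requests):
    """Return when each request reaches the networking code.

    requests: list of (name, block_ms) pairs, handled in order by a single
    process. block_ms is how long the process stalls (for example in an
    open() call) before it can hand that request to the networking code.
    This is an illustrative model, not a measurement.
    """
    clock = 0.0
    handoff = {}
    for name, block_ms in requests:
        clock += block_ms       # the whole process stalls, not just this request
        handoff[name] = clock   # request is handed to the networking code
    return handoff


# Middle request blocks for a hypothetical 10 ms; first and third are cached.
blocked = handoff_times([("first", 0.0), ("second", 10.0), ("third", 0.0)])
# Same cached requests with no disk-bound request ahead of the third one.
ideal = handoff_times([("first", 0.0), ("third", 0.0)])

print(blocked["third"] - ideal["third"])  # 10.0: delay caused by head-of-line blocking
```

In this model the first request is unaffected, while the third inherits the full blocking time of the second even though it needs no disk access.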
If the requests before the blocked requests are larger than the ones that follow, we label the resulting phenomenon service inversion. This behavior is relatively simple to detect at the client: the latencies for small requests would be higher than the latencies for larger requests.
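The client-side check described above can be sketched as a pairwise comparison over observed (size, latency) samples. This is only an illustration under our own assumptions and should not be read as the quantification approach proposed in this section; the function name `inverted_pair_fraction` and the sample values are hypothetical.

```python
from itertools import combinations


def inverted_pair_fraction(samples):
    """Fraction of request pairs whose latencies are inverted.

    samples: list of (size_bytes, latency_seconds) tuples observed at the
    client. A pair counts as inverted when the smaller request has the
    strictly higher latency. Pairs of equal size are skipped.
    """
    inverted = 0
    comparable = 0
    for (size_a, lat_a), (size_b, lat_b) in combinations(samples, 2):
        if size_a == size_b:
            continue  # no expected ordering between equally sized requests
        comparable += 1
        if size_a < size_b:
            small_lat, large_lat = lat_a, lat_b
        else:
            small_lat, large_lat = lat_b, lat_a
        if small_lat > large_lat:
            inverted += 1
    return inverted / comparable if comparable else 0.0


# Hypothetical samples: a 1 KB request that waited behind a blocked process.
samples = [(1000, 0.050), (50000, 0.010), (2000, 0.002)]
fraction = inverted_pair_fraction(samples)
```

A value of 0 means latency never decreases as size increases across the sampled pairs; larger values indicate that small requests are more often slower than large ones.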