Linux Device Driver Issues

As an example, the 2.0.x versions of Linux cannot exploit the concurrency offered by multiple disks if the device driver framework is followed strictly. Although concurrency exists at the hardware level, I/O requests for different devices are serviced sequentially whenever the same driver controls all of them. A driver can have only one queue of device requests, so all requests for the devices it controls end up in the same queue. Processing of a new request is initiated only when the previous I/O has finished, and then in interrupt context. This problem has been eliminated in the 2.2.x versions, which allow a driver to register a function that returns a pointer to the head of the queue into which a new request is to be inserted. A driver can now maintain a separate queue for each hardware device.
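The per-device queueing idea can be illustrated with a small user-space sketch. It is not kernel code: `struct request`, `NDEVS`, `queue_for_dev`, `enqueue`, and `queue_length` are hypothetical names chosen for illustration, and the real 2.2.x interface may differ in its types and arguments.

```c
#include <stddef.h>

#define NDEVS 4  /* hypothetical number of disks behind one driver */

/* Simplified stand-in for a block request; not the kernel's struct. */
struct request {
    int minor;              /* which device the request targets */
    long sector;            /* starting sector */
    struct request *next;   /* next request in the same queue */
};

/* One queue head per hardware device instead of one per driver. */
static struct request *dev_queue[NDEVS];

/* Queue-selection hook: returns the address of the head pointer of the
 * queue that a request for `minor` belongs to (NULL if out of range). */
static struct request **queue_for_dev(int minor)
{
    if (minor < 0 || minor >= NDEVS)
        return NULL;
    return &dev_queue[minor];
}

/* Append a request to the tail of the queue chosen by the hook. */
static int enqueue(struct request *req)
{
    struct request **pp = queue_for_dev(req->minor);
    if (pp == NULL)
        return -1;
    while (*pp != NULL)
        pp = &(*pp)->next;
    req->next = NULL;
    *pp = req;
    return 0;
}

/* Count the requests pending for one device (-1 if out of range). */
static int queue_length(int minor)
{
    struct request **pp = queue_for_dev(minor);
    int n = 0;
    if (pp == NULL)
        return -1;
    for (struct request *r = *pp; r != NULL; r = r->next)
        n++;
    return n;
}
```

Because requests for different minors land in different lists, a slow disk no longer holds up requests destined for an idle one; with a single `dev_queue` they would all wait behind the same head.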

We face the following situations in implementing a layered device driver in Linux:

Blocking in Interrupt Context: For interrupt-driven block drivers, the strategy routine (request_fn) can be called from interrupt context, where it cannot block. On Solaris, blocking is possible because interrupts are handled by interrupt threads. In a layered implementation, request_fn needs to call the ll_rw_block routine so that the buffers are placed in the request queue of the underlying device.

But the ll_rw_block routine in Linux can block: it allocates from a global array of request structures, and if all the slots in the array are in use, the function has to block until one is freed. One solution would be to modify the ll_rw_block code so that, if no request structure is available, it returns immediately and queues a task on the scheduler queue to be executed later.
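A minimal user-space sketch of this first approach follows. It only models the control flow (a fixed pool, an immediate return when the pool is full, and a deferred retry); `NR_REQUEST`, `MAX_DEFERRED`, `try_get_request`, `submit_nonblocking`, and `run_deferred` are hypothetical names, not functions from the Linux block layer.

```c
#include <stddef.h>

#define NR_REQUEST   4   /* hypothetical size of the global request array */
#define MAX_DEFERRED 16  /* hypothetical capacity of the deferred list */

struct request {
    int in_use;     /* slot currently holds a pending request */
    int buffer_id;  /* stand-in for the buffer being transferred */
};

static struct request pool[NR_REQUEST];
static int deferred[MAX_DEFERRED];  /* buffers waiting for a free slot */
static int n_deferred;

/* Return a free slot, or NULL right away instead of sleeping. */
static struct request *try_get_request(void)
{
    for (int i = 0; i < NR_REQUEST; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Called when the I/O for a slot has completed. */
static void release_request(struct request *req)
{
    req->in_use = 0;
}

/* Non-blocking submit: 0 = queued now, 1 = deferred for later,
 * -1 = deferred list is full as well. */
static int submit_nonblocking(int buffer_id)
{
    struct request *req = try_get_request();
    if (req != NULL) {
        req->buffer_id = buffer_id;
        return 0;
    }
    if (n_deferred == MAX_DEFERRED)
        return -1;
    deferred[n_deferred++] = buffer_id;
    return 1;
}

/* The deferred task: retry waiting buffers in FIFO order until the
 * pool is full again. Returns how many were submitted. */
static int run_deferred(void)
{
    int done = 0;
    while (done < n_deferred) {
        struct request *req = try_get_request();
        if (req == NULL)
            break;
        req->buffer_id = deferred[done++];
    }
    for (int i = done; i < n_deferred; i++)  /* compact the remainder */
        deferred[i - done] = deferred[i];
    n_deferred -= done;
    return done;
}
```

The cost of this design is the extra bookkeeping for deferred buffers and the need to bound or otherwise handle an overflowing deferred list, which the sketch reports with `-1`.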

A better solution is to ensure that the strategy routine never needs to be called in interrupt context. This can be done by consuming all the requests queued on the device queue in a single invocation of request_fn. This works because the kernel calls request_fn from process context only when the device queue is empty; if every invocation leaves the queue empty, every subsequent invocation comes from process context.

Accordingly, request_fn() is designed to keep executing until all the requests in the device queue are exhausted, so it always executes in process context. One drawback of this scheme is that a process may be delayed or blocked on I/O requested by some other process, but this is acceptable because the situation arises only when all the request structures are exhausted, which is likely to be infrequent. The pseudocode for request_fn() is as follows:

tss_strategy() {
  while (1) {
    if (no request in queue) return
    remove first request from queue
    get tss dev corresponding to minor# in request
    call personality specific strategy
    if (error in delegating I/O)
      call end_request with buffers not uptodate
  }
}
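The pseudocode above can be exercised as a user-space simulation. This is a sketch under stated assumptions rather than the driver's actual code: `TSS_NDEVS`, `personality_strategy`, `end_request_sim`, and the `fail` flag are hypothetical stand-ins, and the queue is a plain singly linked list.

```c
#include <stddef.h>

#define TSS_NDEVS 2  /* hypothetical number of tss devices */

/* Simplified request; `fail` forces a delegation error for testing. */
struct request {
    int minor;      /* selects the tss device */
    int fail;       /* simulate an error when delegating the I/O */
    int uptodate;   /* set by end_request_sim */
    int ended;      /* 1 once end_request_sim has been called */
    struct request *next;
};

static struct request *tss_queue;   /* the single device queue */
static int delegated[TSS_NDEVS];    /* requests passed down per device */

/* Stand-in for the personality specific strategy routine:
 * returns 0 on success, -1 if the I/O could not be delegated. */
static int personality_strategy(int dev, struct request *req)
{
    if (req->fail)
        return -1;
    delegated[dev]++;
    return 0;
}

/* Stand-in for end_request: records the completion status. */
static void end_request_sim(struct request *req, int uptodate)
{
    req->uptodate = uptodate;
    req->ended = 1;
}

/* Drain the whole queue in one invocation, so the queue is always
 * empty on return. Returns the number of requests handled. */
static int tss_strategy(void)
{
    int handled = 0;
    while (tss_queue != NULL) {
        struct request *req = tss_queue;   /* remove first request */
        tss_queue = req->next;
        req->next = NULL;
        int dev = req->minor;              /* tss dev for this minor# */
        if (dev < 0 || dev >= TSS_NDEVS ||
            personality_strategy(dev, req) != 0)
            end_request_sim(req, 0);       /* buffers not uptodate */
        handled++;
    }
    return handled;
}
```

Successfully delegated requests are not ended here; in the layered design they would be completed later, when the underlying device finishes the I/O.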

Fixed Size Buffer: The buffer size for a device is fixed in Linux, unlike Solaris, where buffers can be of variable size. For example, to implement RAID5 efficiently, we need to distinguish between a full-stripe write and a partial-stripe write, as the latter involves a read-modify-write cycle. In Solaris this is easier, as one buffer can span stripes. In Linux, each logical buffer has already been split into smaller fixed-size buffers, so one has to rediscover the logical buffer to distinguish between the two cases and process them accordingly.
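One way to picture the "rediscovery" step is to merge contiguous fixed-size buffers back into extents and then test each extent against the stripe geometry. The sketch below assumes block numbers are non-negative, sorted, and counted in units of one fixed-size buffer, and that `stripe_blocks` is the number of data blocks per stripe; `struct extent`, `coalesce`, `is_full_stripe_write`, and `full_stripes` are hypothetical names.

```c
#include <stddef.h>

/* A run of contiguous fixed-size buffers, in buffer-sized blocks. */
struct extent {
    long start;  /* first block number */
    long len;    /* number of blocks */
};

/* Merge sorted block numbers into contiguous extents.
 * `out` must have room for n entries. Returns the extent count. */
static size_t coalesce(const long *blocks, size_t n, struct extent *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count > 0 &&
            out[count - 1].start + out[count - 1].len == blocks[i]) {
            out[count - 1].len++;          /* extends the current run */
        } else {
            out[count].start = blocks[i];  /* starts a new run */
            out[count].len = 1;
            count++;
        }
    }
    return count;
}

/* 1 if the extent consists only of whole, aligned stripes, so no
 * read-modify-write cycle is needed; 0 for a partial-stripe write. */
static int is_full_stripe_write(struct extent e, long stripe_blocks)
{
    return e.len > 0 &&
           e.start % stripe_blocks == 0 &&
           e.len % stripe_blocks == 0;
}

/* Number of stripes completely covered by an arbitrary extent. */
static long full_stripes(struct extent e, long stripe_blocks)
{
    /* first stripe boundary at or after the start of the extent */
    long first = (e.start + stripe_blocks - 1) / stripe_blocks;
    /* last stripe boundary at or before the end of the extent */
    long last = (e.start + e.len) / stripe_blocks;
    return last > first ? last - first : 0;
}
```

With four data blocks per stripe, blocks 8 to 11 form a full-stripe write, while blocks 13 and 14 form a partial-stripe write that needs the read-modify-write path.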

In addition, errors have to be reported at buffer granularity. We can keep track of errors only for individual buffers and therefore cannot report errors at the stripe level.

end_request: If we need to use multiple queues, the current end_request does not work; a new implementation is needed.


Dr K Gopinath
2000-04-25