Lack of other support in Linux kernel

In addition to the problems above, which arise from the lack of infrastructure in the Linux kernel for layered device drivers, there are further problems caused by the lack of other required support in the kernel. First, there is a cache consistency problem with RAID5 writes. RAID5 writes are of two types. Full stripe writes bypass the buffer cache completely: the I/O is done by creating only the buffer header structures, with their data pointers set appropriately, and releasing them afterwards. Partial stripe writes go through the buffer cache because they involve a read-modify-write cycle. The cache can therefore become inconsistent: a block left in the cache by a partial stripe write goes stale when a later full stripe write updates the disk without touching the cache. To eliminate this problem, the cached buffers for a stripe undergoing a full stripe write need to be invalidated.
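The inconsistency can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: the arrays `disk`, `cache_data` and `cache_valid` and the functions `cached_read` and `full_stripe_write` are hypothetical stand-ins for the disk, the buffer cache and the two I/O paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4                  /* blocks per stripe in this toy model */

static int  disk[NBLOCKS];         /* hypothetical stand-in for on-disk data */
static int  cache_data[NBLOCKS];   /* hypothetical stand-in for cached buffers */
static bool cache_valid[NBLOCKS];  /* true if a cached copy of the block exists */

/* Read through the cache, as the read-modify-write path would. */
static int cached_read(int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (!cache_valid[blk]) {       /* miss: fetch the block from disk */
        cache_data[blk] = disk[blk];
        cache_valid[blk] = true;
    }
    return cache_data[blk];        /* hit: may be stale if the disk changed */
}

/* Full stripe write: goes straight to disk and bypasses the cache.
 * When 'invalidate' is true, cached copies of the stripe are dropped. */
static void full_stripe_write(const int *data, bool invalidate)
{
    memcpy(disk, data, sizeof(disk));
    if (invalidate)
        memset(cache_valid, 0, sizeof(cache_valid));
}
```

In this model, a block that is read once and then overwritten by `full_stripe_write(data, false)` is still returned with its old value by `cached_read`; passing `true` makes the next read fetch the new data.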

Second, Linux does not have an implementation of condition variables, which would allow a lock to be acquired atomically with the checking of a condition. ilock therefore had to be implemented to check for overlap among the I/Os being issued concurrently. When a request comes in, the region that needs to be locked is calculated in terms of its starting and ending sectors. First, a region structure is created with this information. The global lock that protects the region list is then acquired, and interrupts are disabled. This ensures that any addition to or deletion from the list is atomic. Each region structure in the list is then compared with the new structure for overlap. If no overlap is found, a lock structure is allocated and initialized: the mutex in the lock structure is acquired and its reference count is set to 1. The region structure is inserted in the list, the list lock is released, and interrupts are reenabled.

If an overlap is found, the reference count of the lock structure associated with the overlapping region is atomically incremented, and the list lock is dropped. The process then does a down on the mutex of that lock structure. Linux reenables interrupts when the process goes to sleep. By exploiting this behaviour, the overlap check and the wait can be done atomically.
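The overlap test itself can be sketched as follows. This is a minimal illustration, assuming that a region is an inclusive range of sectors; the names `struct region`, `region_make` and `regions_overlap` are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region descriptor: an inclusive range of sectors. */
struct region {
    unsigned long start;   /* first sector covered by the request */
    unsigned long end;     /* last sector covered by the request */
};

/* Build a region from a starting sector and a sector count. */
static struct region region_make(unsigned long start, unsigned long nsectors)
{
    assert(nsectors > 0);
    struct region r = { start, start + nsectors - 1 };
    return r;
}

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
static bool regions_overlap(const struct region *a, const struct region *b)
{
    return a->start <= b->end && b->start <= a->end;
}
```

With this definition, requests for sectors 0 to 7 and 8 to 15 do not overlap, while a request for sectors 4 to 11 overlaps both.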

**Figure 6:** Interlock Structure

When the process wakes up, it atomically decrements the reference count of the lock structure on which it was waiting, and frees the lock structure if the count has reached zero. It then starts over and checks for overlaps again. The pseudo code follows:

     create region struct for request
rep: grab the list lock
     disable the interrupts
     check for overlap
     if (overlap occurs) {
       atomically increment the refcnt
       release the list lock
       wait on mutex & enable interrupts
       atomically decrement the refcnt
       if zero, free lock structure
       goto rep
     } else {
       create a lock struct 
       set its refcnt=1 & lock its mutex
       insert region struct in list
       release the list lock
       enable interrupts
     }

When an I/O completes, the reference count of the associated lock structure is decremented in interrupt context and tested for zero. If it is zero, the lock structure is freed; otherwise the processes sleeping on its mutex are woken up. The region structure associated with the completed I/O is also removed from the list and freed. The pseudo code is given below:

  decr refcnt of associated lock struct
  if (lock refcnt is zero)
    free the lock structure
  else wake up those sleeping on mutex
  remove the region structure from list & free it
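For contrast, the following user-space sketch shows how the same interlock could look where condition variables are available. It is not the kernel implementation described above: it swaps the per-lock mutex, the reference count and the interrupt disabling for a POSIX condition variable, and every name in it (`ilock_region`, `overlaps_any`, `ilock_acquire`, `ilock_release`) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical region node: inclusive sector range of an in-flight I/O. */
struct ilock_region {
    unsigned long start, end;
    struct ilock_region *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  list_changed = PTHREAD_COND_INITIALIZER;
static struct ilock_region *regions;   /* list of locked regions */

/* Caller must hold list_lock when other threads may touch the list. */
static bool overlaps_any(unsigned long start, unsigned long end)
{
    for (struct ilock_region *r = regions; r; r = r->next)
        if (start <= r->end && r->start <= end)
            return true;
    return false;
}

/* Wait until [start, end] overlaps no in-flight region, then record it.
 * pthread_cond_wait releases list_lock and sleeps atomically, which is
 * the property the kernel version obtains from the interrupt trick. */
static struct ilock_region *ilock_acquire(unsigned long start, unsigned long end)
{
    assert(start <= end);
    struct ilock_region *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->start = start;
    r->end = end;
    pthread_mutex_lock(&list_lock);
    while (overlaps_any(start, end))          /* recheck after every wakeup */
        pthread_cond_wait(&list_changed, &list_lock);
    r->next = regions;
    regions = r;
    pthread_mutex_unlock(&list_lock);
    return r;
}

/* Called when the I/O completes: unlink the region and wake all waiters. */
static void ilock_release(struct ilock_region *r)
{
    pthread_mutex_lock(&list_lock);
    for (struct ilock_region **pp = &regions; *pp; pp = &(*pp)->next)
        if (*pp == r) {
            *pp = r->next;
            break;
        }
    pthread_cond_broadcast(&list_changed);
    pthread_mutex_unlock(&list_lock);
    free(r);
}
```

The sketch wakes every waiter on each release and lets each one recheck the list, which is simpler than tracking a lock structure per region but may cause spurious wakeups. Unlike the completion path above, `ilock_release` takes a mutex and so could not be called from interrupt context.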

Dr K Gopinath
2000-04-25