PARDA flow control ensures that each host obtains a fair share of storage array capacity proportional to its IO shares. However, our ultimate goal for storage resource management is to provide control over service rates for the applications running in VMs on each host. To achieve this, we use a fair-queuing mechanism based on SFQ [10] as our host-level scheduler. SFQ implements proportional sharing of the host's issue queue, dividing it among VMs based on their IO shares whenever the queue is contended.
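To make the host-level scheduling concrete, the following is a minimal sketch of start-time fair queuing in Python. The class and tag names are illustrative and follow the standard SFQ formulation rather than any particular implementation; the request object is assumed to be opaque.

    import heapq, itertools

    class SFQScheduler:
        def __init__(self):
            self.vtime = 0.0               # global virtual time (start tag of last dispatch)
            self.finish = {}               # last finish tag per VM
            self.queue = []                # min-heap of pending requests, keyed by start tag
            self._seq = itertools.count()  # FIFO tie-breaker for equal start tags

        def enqueue(self, vm_id, shares, cost, request):
            # Start tag: an idle VM re-enters at the current virtual time,
            # so it cannot bank credit while inactive.
            start = max(self.vtime, self.finish.get(vm_id, 0.0))
            # Finish tag advances inversely with the VM's shares, so VMs
            # with more shares are served proportionally more often.
            self.finish[vm_id] = start + cost / shares
            heapq.heappush(self.queue, (start, next(self._seq), vm_id, request))

        def dispatch(self):
            # Issue the pending request with the smallest start tag and
            # advance virtual time to it.
            start, _, vm_id, request = heapq.heappop(self.queue)
            self.vtime = start
            return vm_id, request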
Two key features of the local scheduler are worth noting. First, the scheduler doesn't strictly partition the host-level queue among VMs based on their shares; a VM may consume additional slots left idle by VMs that didn't use their full allocation. This accommodates short-term fluctuations in VM workloads and provides some statistical multiplexing benefits. Second, the scheduler doesn't switch between VMs after every IO; instead it schedules a group of IOs per VM as long as they exhibit some spatial locality (within a few MB). These techniques have been shown to improve overall IO performance [3, 13].
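As an illustration of these two features, the sketch below issues work-conserving batches of spatially local IOs. The IO structure, the 4 MB locality window, and the batch cap of 8 are assumptions made for the example, not values taken from our implementation.

    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class IO:
        offset: int  # byte offset on the virtual disk
        length: int

    LOCALITY_BYTES = 4 * 2**20  # "a few MB" locality window (assumed value)
    MAX_BATCH = 8               # assumed cap on IOs issued per VM turn

    def next_batch(pending: deque, free_slots: int) -> list:
        # Issue up to free_slots IOs from one VM's pending queue,
        # continuing only while each IO starts within LOCALITY_BYTES of
        # where the previous one ended. Because the batch is bounded by
        # the slots that are actually free, a backlogged VM can borrow
        # slots left idle by other VMs (work conservation) rather than
        # being held to a strict per-VM partition.
        batch, last_end = [], None
        while pending and len(batch) < min(free_slots, MAX_BATCH):
            io = pending[0]
            if last_end is not None and abs(io.offset - last_end) > LOCALITY_BYTES:
                break  # locality broken; yield the device to another VM
            batch.append(pending.popleft())
            last_end = io.offset + io.length
        return batch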
Combining a distributed flow control mechanism with a fair local scheduler allows us to provide end-to-end IO allocations to VMs. However, an interesting alternative is to apply PARDA flow control at the VM level, using per-VM latency measurements to control per-VM window sizes directly, independent of how VMs are mapped to hosts. This approach is appealing, but it also introduces new challenges that we are currently investigating. For example, per-VM allocations may be very small, requiring new techniques to support fractional window sizes, as well as efficient distributed methods to compensate for short-term burstiness.
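As one illustration of the fractional-window challenge, the sketch below time-averages a window smaller than one by accumulating issue credit each control interval. This is purely a hypothetical mechanism for exposition, not a technique we have adopted; all names and the crediting policy are assumptions.

    class FractionalWindow:
        # Hypothetical sketch: a window of 0.4 lets a VM keep one IO in
        # flight roughly 40% of the time by accumulating fractional credit.
        def __init__(self, window: float):
            self.window = window    # target window size; may be < 1
            self.credit = 0.0       # accumulated issue credit
            self.in_flight = 0      # IOs currently outstanding

        def tick(self):
            # Called once per control interval; the cap keeps a long-idle
            # VM from bursting far beyond its window.
            self.credit = min(self.credit + self.window, self.window + 1.0)

        def try_issue(self) -> bool:
            # The integral part of the credit acts as the usable window.
            if self.in_flight < int(self.credit):
                self.in_flight += 1
                self.credit -= 1.0
                return True
            return False

        def on_completion(self):
            self.in_flight -= 1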