Our general approach maps the problem of distributed storage management to flow control in networks. TCP running at a host implements flow control based on two signals from the network: round trip time (RTT) and packet loss probability. RTT is essentially the same as IO request latency observed by the IO scheduler, so this signal can be used without modification.
However, there is no useful analog of network packet loss in storage systems. While networking applications expect dropped packets and handle them using retransmission, typical storage applications do not expect dropped IO requests, which are rare enough to be treated as hard failures.
Thus, we use IO latency as our only indicator of congestion at the
array. To detect congestion, we must be able to distinguish
underloaded and overloaded states. This is accomplished by
introducing a latency threshold parameter, denoted by L. Observed
latencies greater than L may trigger a reduction in
queue length. FAST TCP, a recently proposed variant of TCP, uses
packet latency instead of packet loss probability, because loss
probability is difficult to estimate accurately in networks with high
bandwidth-delay products [15]. This property is also attractive in
high-bandwidth SANs, where packet loss is unlikely and TCP-like AIMD
(additive increase, multiplicative decrease) mechanisms can cause
inefficiencies. We use a similar adaptive approach based on average
latency to detect congestion at the array.
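To make the mechanism concrete, the following is a minimal sketch of a latency-proportional window update in the spirit of FAST TCP, as described above. The constants (`LAT_THRESHOLD`, `GAMMA`, `BETA`, `ALPHA`) and the exact form of the update rule are illustrative assumptions, not the paper's control equation: the window grows while the smoothed average latency stays below the threshold L and shrinks once it exceeds L, avoiding the abrupt multiplicative cuts of loss-driven AIMD.

```python
# Illustrative latency-based congestion control for an IO issue queue.
# All parameter values below are assumptions chosen for the sketch.

LAT_THRESHOLD = 25.0  # latency threshold L, in milliseconds (assumed)
GAMMA = 0.5           # smoothing weight for the window update (assumed)
BETA = 2.0            # additive term giving each host a nonzero share (assumed)
ALPHA = 0.2           # EWMA weight for averaging observed latency (assumed)

class LatencyController:
    """Adjusts the IO issue-queue length from observed request latencies."""

    def __init__(self, window=32.0):
        self.window = window              # current issue-queue length
        self.avg_latency = LAT_THRESHOLD  # start the average at the setpoint

    def update(self, latency_ms):
        # Smooth per-request latency into a running average (EWMA), since
        # individual IO latencies are noisy.
        self.avg_latency = (1 - ALPHA) * self.avg_latency + ALPHA * latency_ms
        # Latency-proportional update: the ratio L / avg_latency is > 1 when
        # the array is underloaded (window grows) and < 1 when observed
        # latency exceeds the threshold (window shrinks smoothly).
        ratio = LAT_THRESHOLD / self.avg_latency
        self.window = (1 - GAMMA) * self.window \
            + GAMMA * (ratio * self.window + BETA)
        return self.window
```

At equilibrium the average latency settles slightly above L (by an amount governed by `BETA`), so the controller continually probes for capacity without relying on dropped requests as a signal.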
Other networking proposals such as RED [9] are based on early detection of congestion using information from routers, before a packet is lost. In networks, this has the added advantage of avoiding retransmissions. However, most proposed networking techniques that require router support have not been adopted widely, due to overhead and complexity concerns; this is analogous to the limited QoS support in current storage arrays.
Ajay Gulati 2009-01-14