The research literature contains a large body of work related to providing quality of service in both networks and storage systems, stretching over several decades. Numerous algorithms for network QoS have been proposed, including many variants of fair queuing [8,2,10]. However, these approaches are suitable only in centralized settings where a single controller manages all requests for resources. Stoica et al. proposed QoS mechanisms based on a stateless core [23], where only edge routers need to maintain per-flow state, but some minimal support is still required from core routers.
In the absence of such mechanisms, TCP has served us quite well
for both flow control and congestion avoidance. Commonly deployed
TCP variants use per-flow
information, such as estimated round-trip time and packet loss measured at each
host, to adapt per-flow window sizes to network conditions. Other
proposed variants [9] require support from
routers to provide congestion signals, which has inhibited their adoption.
FAST-TCP [15] provides a purely latency-based approach to
improving TCP's throughput in high bandwidth-delay product
networks. In this paper we adapt some of the techniques used by TCP
and its variants to perform flow control in distributed storage
systems. In doing so, we address several challenges that
make it non-trivial to employ TCP-like solutions for managing storage IO.
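To make the analogy concrete, the following sketch shows how a latency-based window update in the style of FAST-TCP might be adapted to storage: each host adjusts the number of IOs it keeps outstanding based on observed average latency relative to a target. All names and constants here (LAT_THRESHOLD, GAIN, the window bounds) are illustrative assumptions, not parameters from any of the cited systems.

```python
# Hypothetical sketch of latency-based flow control for storage IO,
# loosely following FAST-TCP's idea of using latency (rather than
# packet loss) as the congestion signal.

LAT_THRESHOLD = 25.0   # assumed target average IO latency (ms)
GAIN = 0.5             # smoothing parameter in (0, 1]
W_MIN, W_MAX = 4, 64   # bounds on outstanding IOs per host

def update_window(w, avg_latency):
    """Move the per-host issue window toward the value that would hold
    the observed average latency at the threshold (FAST-style update)."""
    w_new = (1 - GAIN) * w + GAIN * (LAT_THRESHOLD / avg_latency) * w
    return max(W_MIN, min(W_MAX, w_new))
```

When observed latency exceeds the threshold, the window shrinks toward equilibrium; when latency is below it, the window grows, probing for unused capacity.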
Many storage QoS schemes have also been proposed to provide differentiated service to workloads accessing a single disk or storage array [13,16,30,14,4,25]. Unfortunately, these techniques are centralized, and generally require full control over all IO. Proportionate bandwidth allocation algorithms have also been developed for distributed storage systems [26,12]. However, these mechanisms were designed for brick-based storage, and require each storage device to run an instance of the scheduling algorithm.
Deployments of virtualized systems typically have no control over storage array firmware and do not use a central IO proxy. Most commercial storage arrays offer only limited, proprietary quality-of-service controls, and are treated as black boxes by the virtualization layer. Triage [18] is one control-theoretic approach that has been proposed for managing such systems. Triage periodically observes the utilization of the system and throttles hosts using bandwidth caps to achieve a specified share of available capacity. This technique may underutilize array resources, and it relies on a central controller to gather statistics, compute an online system model, and re-assign bandwidth caps to hosts. Host-level changes must be communicated to the controller to handle bursty workloads. In contrast, PARDA requires only lightweight aggregation and per-host measurement and control to provide fairness with high utilization.
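For contrast with PARDA's decentralized design, the cap-assignment step that a centralized Triage-like controller performs each period can be sketched as follows. This is a simplified illustration under assumed proportional-share semantics, not Triage's actual control law; `reassign_caps` and its arguments are hypothetical names.

```python
# Illustrative central control step (not Triage's actual model): each
# period, divide the most recently measured array capacity among hosts
# in proportion to their configured shares, producing per-host caps.

def reassign_caps(shares, measured_capacity):
    """shares: dict host -> relative share (positive numbers).
    Returns dict host -> bandwidth cap, summing to measured_capacity."""
    total = sum(shares.values())
    return {h: measured_capacity * s / total for h, s in shares.items()}
```

Because the caps are recomputed only periodically and centrally, a burst at one host cannot reclaim capacity until the controller observes it and reassigns caps, which is the utilization concern noted above.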
Friendly VMs [31] propose cooperative fair sharing of CPU and memory in virtualized systems by leveraging feedback-control models. Without relying on a centralized controller, each ``friendly'' VM adapts its own resource consumption based on congestion signals, such as the relative progress of its virtual time compared to elapsed real time, using TCP-like AIMD adaptation. PARDA applies similar ideas to distributed storage resource management.
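The TCP-like AIMD rule mentioned above is simple to state: increase the resource limit additively while uncongested, and cut it multiplicatively on congestion. The sketch below is a generic AIMD step driven by a boolean congestion signal; the constants ALPHA and BETA and the function name are illustrative, not taken from the Friendly VMs work.

```python
# Hedged sketch of TCP-like AIMD adaptation, in the spirit of Friendly
# VMs: each host adapts its own limit locally, with no central controller.

ALPHA = 1        # additive increase step (illustrative)
BETA = 0.5       # multiplicative decrease factor (illustrative)
MIN_LIMIT = 1    # floor so a host is never starved entirely

def aimd_step(limit, congested):
    """One adaptation step: back off multiplicatively on congestion
    (e.g., virtual time lagging real time), else probe additively."""
    if congested:
        return max(MIN_LIMIT, limit * BETA)
    return limit + ALPHA
```

Repeated application of this rule is what drives competing hosts toward a fair allocation, which is the property PARDA exploits for storage.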
Ajay Gulati 2009-01-14