Resource management schemes of time-sharing operating systems, such as Unix [15] and Windows NT [8], often achieve acceptably low response time and high system throughput for time-sharing workloads. However, as explained in the following paragraphs, several trends make those schemes increasingly inappropriate.
First, many workloads now include real-time applications (e.g., multimedia). Unlike time-sharing applications, real-time ones must have their requests processed within certain performance bounds (e.g., minimum throughput). To support real-time applications correctly under arbitrary system load, the operating system must perform admission control and offer quality of service (QoS) guarantees: The operating system admits a request only if the operating system has set aside enough resources to process the request within the specified performance bounds.
Second, even for purely time-sharing workloads, the trend toward distributed client-server architectures increases the importance of fairness, that is, of preventing certain clients from monopolizing system resources. The fairness of time-sharing systems can be quite spotty. For example, time-sharing systems typically cannot isolate the performance of a Web site from that of other Web sites hosted on the same system. If one of the sites becomes very popular, the performance of the other sites may become unacceptably (and unfairly) poor.
Finally, the same trend toward client-server architectures also makes it necessary to manage resources hierarchically, that is, recursively allowing each client to grant to its servers part of the client's resources. For example, new Web and other user-level servers often need mechanisms for processing client requests with specified QoS and/or fairness bounds. However, time-sharing operating systems usually do not provide such mechanisms.
The mentioned shortcomings of time-sharing operating systems have motivated considerable recent work on new algorithms for CPU [14,11,13,17,6], disk [20,21,4], and network [2,3,12,23] scheduling. In particular, we recently proposed MTR-LS, a new CPU scheduling algorithm with demonstrated throughput, delay, and fairness guarantees [6]. MTR-LS is an example of a proportional share scheduler. Proportional share schedulers stand out for their sound theoretical foundations [22,11,6,2,3,23].
This paper considers the practical aspects of how to integrate proportional share schedulers into mainstream operating systems. We contribute a new application programming interface (API) for hierarchical proportional resource sharing: the /reserv file system. We discuss in detail the implementation of /reserv and several proportional share schedulers (MTR-LS, YFQ, H-WF2Q) on FreeBSD. (FreeBSD is a freely available derivative of 4.4 BSD Unix; other Unix variants, Windows NT, and other time-sharing operating systems could be similarly modified.) We call the modified FreeBSD system Eclipse/BSD, as opposed to the Eclipse/Plan 9 system used in our previous MTR-LS work [6] (where the distinction is obvious or unimportant, we will say simply Eclipse). Our experiments demonstrate how Eclipse/BSD's /reserv API and schedulers improve on FreeBSD's, providing QoS guarantees, fairness, and hierarchical resource management.
The rest of this paper is organized as follows. Section 2 describes the Eclipse/BSD resource management model and its retrofitting into FreeBSD. Section 3 discusses the scheduling algorithms used in Eclipse/BSD. Section 4 shows that Eclipse/BSD implementation requires only a modest amount of changes to FreeBSD. Experiments in Section 5 illustrate how Eclipse/BSD scheduling improves the isolation between Web sites hosted on the same system. Section 6 discusses related and future work, and Section 7 concludes.