
Scheduling Anomalies in Current Operating Systems

Traditional schedulers have evolved along a path that emphasizes throughput and fairness. Their goal has been to time-multiplex resources effectively among conventional interactive and batch applications. Today, however, there is a growing class of applications, such as multimedia audio and video, virtual reality, and transaction processing, whose resource requirements are real-time in nature. Unlike conventional requests, real-time resource requests must be completed within application-specific delay bounds, called deadlines, in order to be of maximum value to the application. For conventional and real-time tasks to co-exist, a scheduler has to allocate resources in such a way that real-time processes can meet their deadlines, interactive jobs get good responsiveness, and batch jobs are still able to make progress. This is a very challenging problem.

Unix provides the ``nice'' system call to raise or lower the base priority of processes so that an application mix can make forward progress. But the right nice values are often non-intuitive, and much experimentation is needed to find them. Moreover, these values are highly dependent on the application mix and have to be changed every time the mix changes.

Another problem in current general purpose operating systems is that scheduling is done on a per-process or per-thread basis. A user running more processes or threads gets more CPU time than a user running fewer. A malicious user can hog the CPU simply by creating a large number of processes, preventing other users' processes from making progress. We need a mechanism for preventing this type of ``denial of service'' by guaranteeing each user a fixed percentage of the CPU if required.

Yet another problem in current general purpose operating systems is that network-intensive applications do most of their processing in interrupt context. Processing in interrupt context is either charged to the process that happened to be running when the interrupt occurred or is not charged at all. This leads to inaccurate accounting and hence inaccurate scheduling.

Network servers have become one of the most important applications of large computer systems. Network processing in UNIX is mostly interrupt driven, and interrupt processing has strictly higher priority than user-level code. This leads to receive livelock, or starvation of user-level processing, under high network activity.

Another problem with current general purpose operating systems is the lack of support for real-time processes. Unix SVR4[2] supports a static-priority real-time class, but it has been shown to be of limited use[3]. Each class is assigned a global priority range: the real-time class has higher priority than the system class, which in turn has higher priority than the time-shared class, and scheduling is done on these global priorities. So whenever a process in the real-time class is runnable, no system or time-shared class process is allowed to run. This model is fine for hard real-time tasks, where the cost of missing a deadline is very high and efficiency is not a major concern. But it is not the right model for soft real-time applications like multimedia, where missing a deadline is not catastrophic. Moreover, real-time processes may depend on system processes for some system services, yet be unable to obtain those services precisely because of their own presence.

Nieh et al.[3] studied SVR4 scheduling in the context of an application mix of typing, video, and compute jobs. They found that no assignment of priorities and classes to these applications gives satisfactory performance to all of them. They also found that the existence of a static-priority real-time scheduling class in no way helps a user deal with the problems of scheduling this application mix: when the real-time class is used, not only do application latencies become much worse than desired, but scheduler pathologies can occur such that the system no longer accepts user input.

For example, when the time-shared class was used for all three types of applications, the compute-bound (batch) tasks performed well. This was because the batch application forks many small programs to perform work and then waits for them to finish. Since this job sleeps, the TS scheduler treats it as an I/O-intensive ``interactive'' job and gives it repeated priority boosts for sleeping.


Mansoor Alicherry 2001-05-01