

1 Introduction

Linux has been widely used for enterprise applications, such as web servers and DBMSs. In these environments, Linux is expected to be stable and scalable on SMP systems. However, the current Linux (2.4.x) scheduler design contains a well-known performance problem: a single global run queue protected by a spin lock. The scheduler has to traverse the entire run queue to select an appropriate process to run, so as the number of runnable processes increases, the traversal takes longer. On an SMP system, a long traversal increases both the lock hold time and the lock contention, which limits the scalability of the Linux system. Many efforts have been made by the LSE (Linux Scalability Effort) [1] to improve the scheduler performance, but none of them has focused on the cache and memory architecture.
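To make the traversal cost concrete, the following user-space C sketch models the shape of the 2.4 scheduler's selection loop: every runnable task on a single list is examined while one lock is held, and the task with the highest "goodness" value wins. The structure and identifier names mirror the kernel's, but the goodness() body, the list layout, and the pthread lock are simplified stand-ins for illustration, not the kernel code itself.

/*
 * User-space model of the Linux 2.4 run queue traversal.
 * Names mirror kernel/sched.c, but goodness() is a simplified
 * stand-in and a pthread mutex replaces the kernel spin lock.
 */
#include <pthread.h>
#include <stddef.h>

struct task_struct {
    int counter;                  /* remaining time slice   */
    int nice;                     /* static priority        */
    struct task_struct *next;     /* run queue linkage      */
};

static struct task_struct *runqueue_head;
static pthread_mutex_t runqueue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simplified goodness(): favor tasks with time slice left. */
static int goodness(const struct task_struct *p)
{
    return p->counter ? 20 - p->nice + p->counter : 0;
}

/*
 * Pick the runnable task with the highest goodness value.
 * The whole list is walked while the lock is held, so both the
 * walk time and the lock hold time grow linearly with the
 * number of runnable tasks.
 */
static struct task_struct *pick_next_task(void)
{
    struct task_struct *p, *next = NULL;
    int best = -1000;

    pthread_mutex_lock(&runqueue_lock);
    for (p = runqueue_head; p != NULL; p = p->next) {
        int weight = goodness(p);
        if (weight > best) {
            best = weight;
            next = p;
        }
    }
    pthread_mutex_unlock(&runqueue_lock);
    return next;
}

Because every iteration touches fields spread across a different task structure, the walk is dominated by memory accesses; this is where the cache misses discussed below arise.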

In this study, we observed and analyzed the scheduler behavior from a memory architecture viewpoint using a hardware system bus monitor on the IA32 platform. We found that a large number of cache misses occur during the run queue traversal. To reduce these cache misses and achieve faster traversal, we introduce a new solution into the Linux 2.4.x kernel: cache coloring for the task structure.

In this paper, we propose an experimental implementation of this coloring, which provides large performance gains under heavy workloads. The scheme reduces cache misses during the run queue traversal and speeds up process scheduling. Furthermore, it reduces both the run queue lock hold time and the lock contention on an SMP system.
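The idea behind cache coloring is to stagger the starting addresses of successive task structures by a multiple of the cache line size, so that the frequently accessed fields of different tasks map to different cache sets instead of colliding in the same ones. The sketch below illustrates this with an ordinary user-space allocator; the constants (line size, number of colors, object size) and the helper name are assumptions for illustration only, not our kernel implementation, which Section 3 describes.

/*
 * Illustrative cache coloring allocator (user space).
 * Each new object is shifted by color * CACHE_LINE bytes, so the
 * hot fields at the start of successive objects fall into
 * different cache sets. All constants are assumed values, and the
 * bookkeeping needed to free() the original pointer is omitted.
 */
#include <stdlib.h>

#define CACHE_LINE  32    /* assumed cache line size in bytes    */
#define NUM_COLORS  8     /* assumed number of distinct colors   */
#define OBJ_SIZE    1600  /* assumed size of one task structure  */

static unsigned int next_color;

void *alloc_colored_object(void)
{
    unsigned int color = next_color++ % NUM_COLORS;
    char *base = malloc(OBJ_SIZE + NUM_COLORS * CACHE_LINE);

    if (base == NULL)
        return NULL;
    /* Offset the object start by the color so that identically
     * aligned objects do not all compete for the same cache sets. */
    return base + (size_t)color * CACHE_LINE;
}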

The rest of this paper is organized as follows: Section 2 summarizes the current scheduler issue in terms of cache efficiency by using a hardware memory bus tracer. Section 3 describes our implementation of cache coloring for the task structure. Section 4 evaluates the performance of our implementation and gives a detailed analysis of bus transaction statistics. Section 5 presents the scalability improvement compared with the standard kernel. Section 6 describes related work, and finally we draw conclusions in Section 7.

