We believe our evaluation of dynamic speed- and voltage-setting algorithms to be the first such empirical evaluation; to our knowledge, all previous work has relied on simulators [7,6,5,11,12], and none modeled a complete pocket computer or the workload likely to be run on it.
Weiser et al. [7] proposed three algorithms, OPT, FUTURE, and PAST, and evaluated them using traces gathered from UNIX-based workstations running engineering applications. These algorithms use an interval-based approach that determines the clock frequency for each interval. Of the three, only PAST is feasible, because it does not base decisions on future information that would be unavailable to an actual implementation. Even so, the version of PAST proposed by Weiser et al. is not implementable, because it requires that the scheduler know the amount of work that should have been performed in the preceding intervals. The scheduler uses this information to choose a clock speed that allows the delayed work to be completed in the next interval, if possible. For example, suppose post-processing of a trace revealed that the processor was busy 80% of the cycles of an interval while running at full speed. If, during replay of the trace, the scheduler opted to run the processor at 50% speed for the interval, then 30% of the work could not be completed in that interval. Consequently, in the next interval, the scheduler would adjust the speed in an effort to complete at least the 30% of ``unfinished'' work. Without additional information from the application, a real scheduler can only observe that the application executed until the end of the scheduling quantum; it cannot know how much ``unfinished'' computation remains. Because most pocket computer applications provide no means for the processor to know how much work should be done in a given interval, the PAST algorithm is not tractable for such systems.
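The trace-replay scheme can be sketched as follows. This is our reconstruction for illustration, not Weiser et al.'s code, and the adjustment thresholds and step size are assumed values. The key point is that `trace`, the per-interval busy fraction measured at full speed, is exactly the offline information an online scheduler on a pocket computer would not have.

```python
# Sketch of PAST-style trace replay (our reconstruction; thresholds,
# step size, and minimum speed are assumed, not from Weiser et al.).

def replay(trace, start_speed=1.0, step=0.2, min_speed=0.2):
    """Replay per-interval busy fractions (measured at full speed) and
    return the work left unfinished, in full-speed interval units."""
    speed, excess = start_speed, 0.0
    for work in trace:
        demand = work + excess              # work owed this interval
        done = min(demand, speed)           # capacity at current speed
        excess = demand - done              # backlog carried forward
        # Adjust speed for the next interval based on this one.
        utilization = done / speed
        if excess > 0.0 or utilization > 0.7:
            speed = min(1.0, speed + step)  # falling behind: speed up
        elif utilization < 0.5:
            speed = max(min_speed, speed - step)
    return excess

# The example from the text: an interval that was 80% busy at full
# speed, replayed at 50% speed, leaves 30% of the work unfinished.
leftover = replay([0.8], start_speed=0.5)
```

Run at full speed, the same interval completes on time (`replay([0.8])` returns `0.0`); the unimplementable step is that only the trace reveals the `0.3` backlog.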
The early work of Weiser et al. has been extended by several groups, including [6,12]. Both of these groups employed the same assumptions and the same traces as Weiser. Govil et al. [6] considered a large number of algorithms, while Martin [12] revised Weiser's PAST algorithm to account for the non-ideal properties of batteries and the non-linear relationship between system power and clock frequency. Martin argues that the lower bound on clock frequency should be chosen to maximize the number of computations per battery lifetime. Unlike Govil and Weiser, Martin also correctly assumed a non-zero energy cost for idling the processor and for changing clock speed.
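Martin's lower-bound argument can be illustrated with a toy model; the constants and the cubic power law below are our assumptions, not his measurements. With voltage scaling, CPU dynamic power grows roughly with the cube of frequency, while some system power (memory, display, conversion losses) is frequency-independent, so computations per unit of battery energy peak at an intermediate frequency:

```python
# Toy model of computations per battery lifetime vs. clock frequency
# (constants are ours, for illustration only). Total power is a fixed
# term plus a dynamic term ~ f^3 under combined voltage/frequency
# scaling; ops per unit energy is then proportional to f / P(f).

def ops_per_energy(f, p_fixed=0.2, k=1.0):
    """Relative computations per unit energy at normalized frequency f."""
    return f / (p_fixed + k * f ** 3)

# Analytically the optimum is f* = (p_fixed / (2*k)) ** (1/3) ~= 0.46
# for these constants; a sweep over f = 0.01 .. 1.00 confirms it.
best = max((f / 100 for f in range(1, 101)), key=ops_per_energy)
```

Running below `best` wastes battery on the fixed power draw, which is why Martin argues for a non-trivial lower bound on clock frequency rather than slowing down as far as deadlines allow.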
Both our work and that of Pering et al. [5,11] address some of the limitations of the earlier work noted above. In particular, both evaluate implementable algorithms using workloads representative of those that might be run on pocket computers. We assess the success of our algorithms under the assumption that our applications have inelastic performance constraints and that the user should see no visible changes induced by the scheduling algorithms. By comparison, Pering et al. assume that frames of an MPEG video, for instance, can be dropped, and present results that trade energy savings against frame rates. Our goal was to understand the performance of the different scheduling algorithms without introducing the complexity of comparing multi-dimensional performance metrics such as percentage of dropped frames vs. power savings.
Pering et al. use intervals of 10-50ms for their scheduling calculations. Whereas the earlier approaches [7,6,12] considered work overdue if it was not completed within an interval, both Pering et al. and our study consider an event to have occurred on time if delaying its completion did not adversely affect the user. A number of important differences nonetheless exist between our work and that of Pering et al. First, Pering et al. model only the power consumed by the microprocessor and the memory, ignoring other system components whose power is not reduced by changes in clock frequency. Second, because our work uses an actual implementation, we are able to evaluate longer-running and more complex applications (e.g., Java). By virtue of their size, our applications exhibit more significant memory behavior and thus expose the non-linear relationship between power and clock speed noted by Martin. Lastly, by using an actual system, our scheduling implementations were exposed to the periodic behaviors of a running system; for example, the Java implementation uses a 30ms polling loop to check for I/O events. This periodic polling adds variation to the input of the clock-setting algorithms, inducing the sort of instability we explain in §5.3.
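The effect of such a polling loop on an interval-based scheduler can be sketched numerically; the poll cost and interval length below are assumed values for illustration. Because 30ms does not divide the scheduling interval evenly, a perfectly regular poll produces per-interval busy fractions that differ from interval to interval, and a speed setter reacting to them will oscillate:

```python
# Sketch (parameters assumed, not measured): a 30ms polling loop sampled
# with a fixed scheduling interval yields varying per-interval busy
# fractions even though the workload is perfectly periodic.

POLL_PERIOD_MS = 30.0   # e.g., the Java I/O polling loop
POLL_COST_MS = 2.0      # assumed busy time per poll
INTERVAL_MS = 50.0      # assumed scheduling interval length

def busy_fraction(interval_idx):
    """Fraction of scheduling interval `interval_idx` spent polling."""
    start = interval_idx * INTERVAL_MS
    end = start + INTERVAL_MS
    busy = 0.0
    k = int(start // POLL_PERIOD_MS)    # first poll that may overlap
    while k * POLL_PERIOD_MS < end:
        poll_start = k * POLL_PERIOD_MS
        poll_end = poll_start + POLL_COST_MS
        busy += max(0.0, min(end, poll_end) - max(start, poll_start))
        k += 1
    return busy / INTERVAL_MS

# Some 50ms intervals see two polls, others only one, so the measured
# utilization fluctuates with no change in actual demand.
fracs = [busy_fraction(i) for i in range(6)]
```

An interval-based algorithm treats this fluctuation as a change in load, speeding up and slowing down in response; §5.3 examines the resulting instability on the real system.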