Figure gives the energy distribution for the
software components in both the interpreter and JIT modes.
For example, when jack executes in the interpreter mode, instruction accesses consume 60J and data accesses consume 232J. The corresponding energy numbers for the JIT mode are much lower, at 10J and 20J respectively.
These results are consistent with the better locality of
instruction accesses in the interpreter mode, as discussed earlier.
In the interpreter mode, almost all the energy is spent in interpretation;
GC and class loading were found to account for less than 2%
of the overall energy consumption.
Although execution takes the largest amount of energy in
the JIT mode, the dynamic compilation also consumes a significant amount of
energy. This is due to two main reasons [23].
First, there are abrupt changes in the working
set during dynamic compilation, as the code and data structures used by the compiler are different
from those used by the rest of the JVM. Thus, when execution moves to the code generation phase, we experience
poor locality in the cache (data and instruction) accesses, which in turn causes
more references to memory (both Imemory and Dmemory). Second, when code is installed after
dynamic compilation, it causes references to main memory.
We observe that, in the JIT mode, dynamic compilation consumes on average
24% of the overall energy across the benchmarks.
Figure
(d) breaks down the energy consumption
by the hardware components during dynamic compilation and shows
that Imemory and Dmemory are responsible for the bulk of the overhead.
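The compile-then-install behavior described above can be observed directly on a modern HotSpot JVM: running a program with the `-XX:+PrintCompilation` flag prints a line as each hot method is compiled and its code installed, which corresponds to the working-set shift and code-installation traffic discussed here. A minimal sketch (invocation thresholds and the flag's output format vary across JVM implementations):

```java
// HotLoop.java -- run with: java -XX:+PrintCompilation HotLoop
// Each compilation line printed by the JVM marks a point where the JIT
// generates and installs code, the step that causes the extra
// main-memory references discussed above.
public class HotLoop {
    // A small method invoked many times, so the JIT marks it hot.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Enough iterations to cross typical JIT invocation thresholds.
        for (int i = 0; i < 1_000_000; i++) {
            sum += square(i & 0xFF);
        }
        System.out.println(sum);
    }
}
```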
In the rest of this discussion, we focus only on the JIT mode, since
the energy consumption of the interpreter is dominated entirely by
interpretation. Figure gives the energy
breakdown of javac into the different software components under
different cache configurations.
We observe that, in contrast to class loading and garbage collection,
dynamic compilation and execution can take advantage of larger cache sizes.
Data from other experiments [22] show
that the energy consumption during loading is mainly dominated
by compulsory misses. Hence, the number of total misses during loading
is fairly constant across different cache configurations.
However, there are small variations in energy consumption
with changes in cache configuration, because the total energy depends
not only on the number of accesses but also on the per-access energy cost,
which is determined by the tag-matching hardware and
the capacitive load of the bit lines.
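A simple first-order model makes this point concrete (an illustrative model, not the exact one used in this study; $E_{tag}$, $E_{bit}$, and $E_{mem}$ denote assumed per-event costs of tag matching, bit-line switching, and a memory access):

```latex
E_{cache} = N_{access} \cdot (E_{tag} + E_{bit}) + N_{miss} \cdot E_{mem}
```

Even when $N_{miss}$ is fixed by compulsory misses, $E_{tag}$ grows with associativity and $E_{bit}$ with the number and length of the bit lines, so the total energy still varies across cache configurations.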
As can be observed in the class loading profile in Figure
(a),
most of the energy is consumed by the data memory.
It should be noted that some Java environments may run multiple
applications concurrently, in which case some of the class loading costs can
be amortized across the different applications [10].
We see from Figure that the garbage collector
consumes a very small fraction of the energy.
Its energy consumption due to data accesses is higher than that due to
instruction accesses: the garbage collector
code itself is very small (i.e., good Icache locality),
but the data it traverses has relatively poor locality.
In fact, our detailed analysis shows that
most of the energy expended in the data memory is a result of
cache misses. Further innovation in
improving the data locality of garbage collection would therefore be valuable from an energy perspective.
While the absolute energy consumed by the garbage collector is small compared to
overall execution in these experiments,
we believe that the need for more aggressive
garbage collection for limited memory embedded systems will make this component more important.
It must be noted that the energy consumed in the garbage collection
portion is also influenced by the choice of the algorithm and the size
of the heap. The size of the heap can influence the number of times
the garbage collector is invoked.
For example, when we varied the heap size from 24M to 8M,
the energy consumed by garbage collection increased eightfold when
executing mtrt (s100 dataset, JIT compilation mode).
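The heap-size effect can be reproduced on any standard JVM: the sketch below allocates a fixed volume of short-lived objects and reports how many collections the JVM ran, using the standard `java.lang.management` API. Running it with a smaller `-Xmx` (e.g. `-Xmx8m` versus `-Xmx24m`) typically yields a noticeably higher collection count. This is an illustrative sketch, not the instrumentation used in this study:

```java
// GcCount.java -- run with e.g.: java -Xmx8m GcCount  vs  java -Xmx24m GcCount
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCount {
    public static void main(String[] args) {
        // Allocate a fixed total volume of short-lived garbage:
        // a smaller heap forces the collector to run more often.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[1024];          // becomes garbage immediately
            if (junk.length == 0) System.out.println(); // keep allocation live
        }
        long collections = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            collections += Math.max(0, gc.getCollectionCount()); // -1 if unavailable
        }
        System.out.println("GC invocations: " + collections);
    }
}
```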
The dataset of the application can also influence the energy consumed by
the garbage collector.
As an example, we found that for javac in the JIT mode, the GC is responsible for
nearly 14% of the total data misses with the s100 data set (compared to 7% with s10),
and thus contributes more significantly to the overall
energy. A more detailed analysis of these tradeoffs in
garbage collection energy consumption is
beyond the scope of this work and is an interesting area of research in itself.
The execution of compiled code consumes the largest share of the energy,
and Figure
shows the energy distribution across the different hardware and software
components in the JIT mode.
Overall, observing the trends shown in Figure
, it is
interesting to note that different applications in SPEC JVM98 exhibit different
energy behaviors.
For instance,
while mtrt consumes the maximum energy during the execution phase,
its energy consumption is smaller than that of compress during loading, garbage collection,
and dynamic compilation.
The energy consumption of the different software components is a function of the number of classes loaded, the size of
those classes, the number of methods compiled, the number of times a method is invoked after compilation,
the heap size (which determines the frequency of GC invocation), the data set size and
heap allocation behavior, and the memory access behavior during execution. Since the actual execution
of the compiled code is the dominant component, we need to focus on developing techniques
to reduce the energy consumed by this component.
Optimizations during the JIT compilation phase (e.g., [28,38]) can also potentially improve the
energy efficiency of the execution phase, sometimes at the cost of increasing energy consumption due
to dynamic compilation itself.
Finally, we would like to emphasize that the energy behavior in the different portions of the JVM is
also dependent on the dataset size. We observe from Figure that the shares of class
loading and dynamic compilation are comparatively smaller for the s100 dataset than for the s10 dataset.