Figure presents the
energy spent in caches and memory for different cache
configurations for javac and db.
From these graphs, we can make the following observations.
First, increasing the cache size reduces the overall energy consumption up to a
certain size. Beyond that point, however, the instruction or data working set is
already contained in the cache, and a larger cache no longer improves locality
but does increase the per-access cost
due to the larger address decoder and the larger capacitive load on the cache bit lines.
A similar trend is observed when we change the
associativity for a fixed cache size: increasing the
associativity brings diminishing returns in energy savings,
because more complex tag-matching hardware is required to support
higher associativities.
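The trade-off described above can be illustrated with a toy analytical model. All constants below (per-access energies, miss rates, access counts) are hypothetical placeholders, not values from the paper's simulations; only the qualitative shape mirrors the observations:

```python
# Toy cache energy model (hypothetical constants, for illustration only).
# Per-access cache energy grows with size (larger decoder, longer bit lines)
# and with associativity (more tag comparators).  Larger caches cut misses,
# but once the working set fits, only the per-access cost keeps growing.

def cache_energy(size_kb, assoc, accesses, miss_rate, e_mem=1e-9):
    """Total energy (J) = cache access energy + miss traffic to memory."""
    e_access = 0.1e-9 * (1 + 0.05 * size_kb) * (1 + 0.2 * (assoc - 1))
    return accesses * e_access + accesses * miss_rate * e_mem

# Miss rate bottoms out once the working set is captured (hypothetical curve),
# so the total energy has a minimum at an intermediate cache size.
miss = {4: 0.10, 8: 0.04, 32: 0.01, 128: 0.01}
for size_kb in (4, 8, 32, 128):
    print(size_kb, cache_energy(size_kb, 1, 10**8, miss[size_kb]))
```

With these constants the minimum falls at 8K: going from 4K to 8K saves energy by removing misses, while growing further (or raising associativity) only inflates the per-access cost.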
Second, we observe that instruction accesses benefit from
larger caches more than data accesses do.
For example, for javac in the interpreter mode,
moving from a 4K direct-mapped cache to a 128K
direct-mapped cache
drops the instruction memory energy from 132.8J to 17.0J.
On the other hand, except for the move from
4K to 8K, the data memory energy does not vary
significantly across different cache configurations for the dataset size
that is used in these experiments (s1).
Finally, as far as the general energy trend is concerned, the JIT mode
behaves similarly to the interpreter mode, except that the actual
energy values are much smaller (typically less than half), and in some cases
the cache configuration that minimizes energy consumption
in the JIT mode differs from that of the interpreter
mode.
It should be noted that although the number of memory accesses in the interpreter mode is higher than in the JIT mode, the memory footprint of the former is smaller [23]. The larger memory footprint of the JIT compiler can be attributed to the additional libraries required for JIT optimizations and dynamic code installation. For example, the SPARC and Intel versions of the JIT compiler proposed in [7] themselves require 176Kbytes and 120Kbytes, respectively. The extra space required for compiled code in JIT mode amounts to 24% more memory, on average, than in the interpreter mode for the SPEC JVM98 benchmarks [23]. Consequently, in embedded environments where we have the flexibility of designing a custom memory for a given application code [5,2], we can potentially use a smaller memory for the interpreter mode of operation. To capture the effects of the lower memory requirements that result from the absence of dynamic compilation overheads, we scale the memory size of the interpreter relative to that of the JIT compilation mode. Note that a smaller physical memory incurs less energy per access due to smaller decoders and less capacitive loading on the bit lines. For the purposes of this comparison, we assume that the energy cost of a memory access decreases linearly with memory size.
Figure gives the total energy consumption for
db for different ratios of memory footprint between the interpreter and the
JIT compiler.
For the purposes of this approximation, we have neglected the
increased garbage-collection overhead that would result from reducing the
memory size.
The graph can be interpreted as follows.
In db, the memory footprint of the JIT mode needs to be at least 1.67 (1/0.6) times
that of the interpreter mode (corresponding to a scaling factor of 0.6)
before the interpreter becomes
preferable from the energy viewpoint; until then, JIT is preferable.
The observed expansion of the data segment in the JIT compilation mode is limited to
24% on average [23], and the overhead of current JIT compilers is much smaller
than the heap size (24 megabytes) needed in both modes.
Hence, while one might think that reducing the memory size makes interpretation more attractive,
the above observations show that the size expansion in the JIT compilation mode is not significant enough to
influence the energy-optimal choice, even when the
increased GC overhead is neglected.
However, if this footprint expansion becomes too large due to JIT optimizations
that increase code size (e.g., inlining) [38,28,21], or if the compiler becomes much larger relative to other resources
such as the heap space required in both modes, one may need to re-evaluate this trade-off and
select the suitable execution mode when constructing the memory system for an embedded device where
physical memory space is limited.
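Under the linear energy-versus-size assumption above, the break-even scaling factor can be computed directly. The access counts below are hypothetical placeholders; only the 0.6 scaling factor (1.67x footprint ratio) for db comes from the text:

```python
# Break-even memory scaling under a linear energy-vs-size model:
#   E_interp(s) = N_interp * e_m * s   (interpreter memory scaled to s <= 1
#                                       of the JIT-mode memory size)
#   E_jit       = N_jit    * e_m       (JIT memory is the reference size)
# The interpreter becomes preferable once s < N_jit / N_interp.

def breakeven_scale(n_interp, n_jit):
    """Largest scaling factor s at which interpreter energy matches JIT energy."""
    return n_jit / n_interp

# Hypothetical access counts chosen to reproduce db's break-even of s = 0.6:
s = breakeven_scale(n_interp=10**9, n_jit=6 * 10**8)
print(s)      # 0.6
print(1 / s)  # JIT footprint must exceed the interpreter's by ~1.67x
```

The interpreter performs more memory accesses, so it only wins if its memory can be shrunk below the break-even fraction of the JIT-mode memory.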
Main memory has long been a major performance bottleneck and has attracted
a lot of attention (e.g., [26]).
Changes in process technology have made it possible to embed a DRAM on the same chip as the processor core.
Initial results using
embedded DRAM (eDRAM) show an order of magnitude reduction in energy cost per access [26].
Also, there have been significant changes in the DRAM
interfaces that can potentially reduce the energy cost of external DRAMs.
For example, unlike conventional DRAM memory sub-systems that have multiple memory
modules that are active for servicing data requests, the direct RDRAM memory sub-system delivers
the full bandwidth with only one RDRAM module active. Similarly, new technologies such as magnetic RAMs
consume less than one hundredth the energy of conventional DRAMs [33].
Also, based on the particular low power operating modes that are supported by memory chips and based on
how effectively they are utilized, the average energy cost per access for external DRAMs can be reduced by up to
two orders of magnitude [8].
In order to study the influence of these trends,
we performed another set of experiments using four different Em values:
10^-9J (our default value),
10^-9J,
10^-10J, and
10^-11J. Each of these values might represent the per-access
cost for a given memory technology.
Figure
shows the normalized energy consumptions
(with respect to the interpreter mode with the default Em value).
We observe that the ratio of the total memory energy consumed by the interpreter mode to
that of the JIT mode varies between 1.05 (2.07) and 1.80 (2.85) for db (javac)
depending on the Em value used.
We also observe that the relative difference between the energy consumed in the interpreter mode and in the JIT
mode increases as Em decreases with better technologies. For example,
when executing db, the energy consumed in the JIT compilation mode
is half of that consumed in the interpreter mode for the most energy-efficient memory,
whereas it is around 70% of the interpreter energy for the most energy-consuming
configuration. This indicates that
even when process technology improves significantly
in the future, the JIT will remain the implementors' choice
from an energy perspective.
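This widening gap follows from a simple decomposition of total energy into a fixed cache component and a memory component proportional to Em. The cache energies and memory access counts below are hypothetical, not measured values; only the trend (the ratio grows as Em shrinks) mirrors the reported results:

```python
# Sensitivity of the interpreter/JIT energy ratio to the main-memory
# per-access cost e_m.  As e_m shrinks, the memory term vanishes and the
# ratio approaches the ratio of the (fixed) cache energies, where the JIT
# mode's advantage from fewer accesses is largest.

def total_energy(e_cache, mem_accesses, e_m):
    """Total energy (J): fixed cache energy plus main-memory traffic."""
    return e_cache + mem_accesses * e_m

def ratio(e_m, interp=(2.0, 10**7), jit=(1.0, 8 * 10**6)):
    """Interpreter/JIT energy ratio for hypothetical (cache J, accesses)."""
    e_interp = total_energy(interp[0], interp[1], e_m)
    e_jit = total_energy(jit[0], jit[1], e_m)
    return e_interp / e_jit

for e_m in (1e-9, 1e-10, 1e-11):
    print(e_m, ratio(e_m))
```

With these constants the ratio rises monotonically as e_m drops, which is the qualitative behavior the measurements show: better memory technology makes the JIT mode relatively more attractive, not less.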