Discussion
We found that a tool for manipulating the HPM data is indispensable. With over sixty thousand events in one 30-second execution of pseudojbb, some kind of tool is required to organize the HPM data. Surprisingly, we found that in many cases visualization alone was not sufficient to detect trends. Alternatives, such as selecting subsets of trace records and computing average metric values over each subset, were required.
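The subset-and-average analysis described above can be sketched as follows. This is a minimal illustration, not the Performance Explorer's implementation: the `TraceRecord` fields, the `average_metric` helper, and the sample values are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One trace record (hypothetical layout, for illustration only)."""
    thread: str
    start_ms: float
    cycles: int
    instructions: int


def average_metric(
    records: Iterable[TraceRecord],
    predicate: Callable[[TraceRecord], bool],
    metric: Callable[[TraceRecord], float],
) -> Optional[float]:
    """Average `metric` over the records selected by `predicate`.

    Returns None when the subset is empty, so that an empty selection
    is not mistaken for a metric value of zero.
    """
    values = [metric(r) for r in records if predicate(r)]
    return mean(values) if values else None


def ipc(record: TraceRecord) -> float:
    """Instructions per cycle for a single record."""
    return record.instructions / record.cycles


# Made-up sample data: two records from one thread, one from another.
sample_records = [
    TraceRecord(thread="worker", start_ms=0.0, cycles=100, instructions=50),
    TraceRecord(thread="worker", start_ms=10.0, cycles=100, instructions=150),
    TraceRecord(thread="gc", start_ms=20.0, cycles=200, instructions=50),
]

worker_ipc = average_metric(sample_records, lambda r: r.thread == "worker", ipc)
```

Passing the selection and the metric as functions keeps the helper independent of any particular grouping, so the same call can average over a thread, a time window, or any other subset of records.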
Because the POWER4 has only 8 counters, multiple runs of pseudojbb were required to collect traces for all the HPM events. The Performance Explorer is currently designed to work on one trace file at a time. Using it to understand trends across all the HPM events is therefore rather tedious, as a new window has to be opened for each trace. Extending the Performance Explorer to perform more complex computations and to support better report generation would be helpful.
The performance degradation before GC demonstrates why traces for all HPM events are required: it is difficult to know a priori which subset of HPM events is needed.
It took us some time to determine which computed metrics to use when exploring the performance issues related to memory latency. One difficulty was understanding how HPM events can be combined to compute metrics. In general, the HPM data are only one piece of the puzzle, and getting the complete picture may require additional information or experiments. We found that the HPM data can help to determine what additional information is required. For example, we used JIT configurations to understand the IPC improvement over time for the adaptive configuration of Jikes RVM.
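As an illustration of combining HPM events into computed metrics, the sketch below derives IPC and a per-instruction miss ratio from raw event counts. The event names, the `ratio` and `computed_metrics` helpers, and the counts are assumptions made for this example; they are not the POWER4 event names or the metrics used in our analysis.

```python
from typing import Dict, Optional


def ratio(counts: Dict[str, int], numerator: str, denominator: str) -> Optional[float]:
    """Divide one event count by another.

    Returns None when either event is absent or the denominator is zero.
    An event can be absent because only a limited number of counters are
    available per run, so the two events may come from different runs.
    """
    num = counts.get(numerator)
    den = counts.get(denominator)
    if num is None or den is None or den == 0:
        return None
    return num / den


def computed_metrics(counts: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Derive a few metrics from raw counts (hypothetical event names)."""
    return {
        "ipc": ratio(counts, "instructions_completed", "cycles"),
        "l1_misses_per_instruction": ratio(
            counts, "l1_dcache_load_misses", "instructions_completed"
        ),
    }


# Made-up counts for a single interval of a single run.
sample_counts = {
    "cycles": 2000,
    "instructions_completed": 1000,
    "l1_dcache_load_misses": 250,
}

metrics = computed_metrics(sample_counts)
```

Returning `None` for a metric whose inputs are missing makes it explicit that the metric could not be computed from the available trace, rather than reporting a misleading value.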