We evaluate the impact of the copies that data transfer requires
when the entire shared address space cannot be registered with VIA.
Figure 2 compares the execution-time breakdowns of the no-copy and copy
versions, normalized with respect to the execution time of the version
with copies.
We had to run the applications with problem sizes
smaller than the ones mentioned in Table 1 so that we could
use both versions with the same problem size.
The bars on the left, labeled ``NO COPY'', present the performance
results for the no-copy version, and the bars on the right,
labeled ``COPY'', present the performance results for the version
with copies. Each bar presents a percentage breakdown of the
different components which make up the execution time on a single node.
Computation time is the time spent doing application computation. Page
fetch time is the time spent fetching a page from the home node
on a page miss.
Lock time is the time spent acquiring a lock from its current owner.
Barrier time is the time spent at a barrier waiting for barrier messages
from other nodes. Overhead time is the time spent performing
protocol actions. Handler time is the time spent inside the handler,
servicing remote requests. Since our experiments used only one processor
on each node, the handler competes for the CPU with the
application thread to service the messages received via the receive
completion queue.
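
As a rough illustration of how such normalized bars can be derived, the following minimal sketch computes a percentage breakdown for both versions relative to the total time of the version with copies. The function name, the dictionary keys, and all timing values are hypothetical and chosen only for illustration; they are not measurements from our experiments.

```python
# Component names follow the breakdown described in the text.
COMPONENTS = ("computation", "page_fetch", "lock",
              "barrier", "overhead", "handler")


def normalized_breakdown(times, baseline_total):
    """Express each component as a percentage of baseline_total.

    times: dict mapping component name -> seconds on a single node.
    baseline_total: total execution time used for normalization
                    (here, the total of the version with copies).
    """
    if baseline_total <= 0:
        raise ValueError("baseline_total must be positive")
    return {name: 100.0 * times.get(name, 0.0) / baseline_total
            for name in COMPONENTS}


# Hypothetical timings in seconds (illustrative only, not measured data).
copy_times = {"computation": 6.0, "page_fetch": 2.0, "lock": 0.5,
              "barrier": 1.0, "overhead": 0.3, "handler": 0.2}
# Assume, for illustration, that avoiding copies only shortens page fetches.
no_copy_times = dict(copy_times, page_fetch=1.0)

baseline = sum(copy_times.values())
copy_bar = normalized_breakdown(copy_times, baseline)
no_copy_bar = normalized_breakdown(no_copy_times, baseline)

for name in COMPONENTS:
    print(f"{name:12s} NO COPY {no_copy_bar[name]:5.1f}%   "
          f"COPY {copy_bar[name]:5.1f}%")
```

With this normalization the COPY bar always sums to 100%, and the height of the NO COPY bar directly shows the fraction of the copy version's execution time that remains once copies are avoided.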
Page fetch time is the component that increases as a result of the
additional copies at the home node and the receiving node during
page transfers.
Page fetch time accounts for a significant percentage
of the execution time for Barnes, FFT and Radix, and these three
applications show an improvement in performance with copy avoidance.
Although avoiding copies is beneficial, data transfer with copies does not
degrade performance drastically. The performance
degradation was largest for FFT (15%) and small (less than 5%)
for the other applications.
Murali Rangarajan
2000-08-09