
Applications

This section presents the results of running five benchmark applications in parallel on the MILLIPAGE system. The application suite consists of Water-nsquared (WATER) and LU-contiguous (LU) from SPLASH-2 [25]; Integer Sort (IS) from the NAS Parallel Benchmarks [2]; and Successive Over-Relaxation (SOR) and the Traveling Salesperson Problem (TSP) from the TreadMarks [13] benchmark applications.

Table 2 summarizes, for each application, the input set, the shared memory size, the number of views, the sharing granularity, and the barrier and lock counts. Different applications naturally use minipages of different sizes, and the minipage size in turn dictates the number of views, as explained in Section 2.


 
Table 2: Application Suite.

App.   Input Set                        Shared Mem. Size  Views  Sharing Granularity    Barriers  Locks
SOR    32768x64 matrices                8 MB              16     a row, 256 bytes       21        -
IS     2^23 numbers, 2^9 values         2 KB              8      256 bytes              90        -
WATER  512 molecules                    336 KB            6      a molecule, 672 bytes  29        6720
LU     1024x1024 matrix, 32x32 blocks   8 MB              1      a block, 4 KB          577       -
TSP    19 cities, recursion level 12    785 KB            27     a tour, 148 bytes      3         681
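The link between minipage size and the number of views can be checked against Table 2 with a short sketch. It assumes, as the table's values are consistent with but this section does not state, that minipages never straddle a 4 KB page boundary and that minipages sharing a page must be accessed through distinct views (Section 2). The name `views_needed` and the `TABLE_2` dictionary are hypothetical; the shared size only serves as an upper bound on how many minipages exist.

```python
PAGE_SIZE = 4096  # bytes; the 4 KB page size mentioned for LU


def views_needed(minipage_size: int, shared_size: int, page_size: int = PAGE_SIZE) -> int:
    """Views needed if each minipage on a page needs its own view (assumed model).

    Assumes minipages are packed without straddling page boundaries, so at most
    page_size // minipage_size of them share one page.
    """
    per_page = max(1, page_size // minipage_size)  # minipages that fit in one page
    upper_bound = max(1, shared_size // minipage_size)  # minipages that fit in the region
    return min(per_page, upper_bound)


# Values copied from Table 2: (minipage bytes, shared bytes, reported views).
TABLE_2 = {
    "SOR": (256, 8 * 1024 * 1024, 16),
    "IS": (256, 2 * 1024, 8),
    "WATER": (672, 336 * 1024, 6),
    "LU": (4096, 8 * 1024 * 1024, 1),
    "TSP": (148, 785 * 1024, 27),
}

for app, (size, shared, reported) in TABLE_2.items():
    print(f"{app:6s} computed={views_needed(size, shared):3d} reported={reported}")
```

Under this model IS needs only 8 views even though 16 of its 256-byte minipages would fit in a page, because its entire 2 KB shared array contains just 8 of them.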
 

The memory allocation code of three of the applications was slightly modified so that each allocation corresponds to exactly one sharing unit.

In the original WATER code, all molecules are stored in a single array (VAR) and are referenced via pointers. We altered the main function so that each molecule is allocated separately.
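The effect of this change can be illustrated with a toy model in which every allocation call yields exactly one minipage. The model and the names `ToyAllocator` and `molecules_per_minipage` are hypothetical simplifications, not MILLIPAGE's allocator; the molecule size and count come from Table 2.

```python
MOLECULE_SIZE = 672  # bytes per molecule (Table 2)
NUM_MOLECULES = 512  # WATER input set (Table 2)


class ToyAllocator:
    """Hypothetical model: each allocation call becomes its own minipage."""

    def __init__(self) -> None:
        self.minipages: list[int] = []  # sizes, in bytes, of the minipages created

    def alloc(self, size: int) -> int:
        self.minipages.append(size)
        return len(self.minipages) - 1  # id of the new minipage


def molecules_per_minipage(allocator: ToyAllocator) -> int:
    """Largest number of molecules that end up in a single sharing unit."""
    return max(size // MOLECULE_SIZE for size in allocator.minipages)


# Original code: one array (VAR) holds every molecule.
original = ToyAllocator()
original.alloc(NUM_MOLECULES * MOLECULE_SIZE)

# Modified code: each molecule is allocated separately.
modified = ToyAllocator()
molecule_ids = [modified.alloc(MOLECULE_SIZE) for _ in range(NUM_MOLECULES)]

print(len(original.minipages), molecules_per_minipage(original))
print(len(modified.minipages), molecules_per_minipage(modified))
```

In this model the original layout places all 512 molecules in one sharing unit, while the modified layout gives each molecule a 672-byte unit of its own.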

IS allocates a shared memory region in which the keys reside. This array is relatively small and is divided into regions of equal size, each of which is assigned to a different host. We modified the allocation routine to allocate these regions separately, so that each resides in a different minipage.
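A minimal sketch of the region split follows. It assumes 4-byte integers (2^9 values × 4 bytes = 2 KB, matching Table 2) and 8 regions (2 KB divided by the 256-byte granularity); `region_bounds` is a hypothetical helper, and each `bytearray` merely stands in for one separate allocation.

```python
VALUE_SIZE = 4  # bytes per value; assumes 4-byte integers
NUM_VALUES = 2 ** 9  # Table 2: 2^9 values in the 2 KB shared array
NUM_REGIONS = 8  # 2 KB / 256-byte sharing granularity (Table 2)


def region_bounds(num_values: int, num_regions: int) -> list[tuple[int, int]]:
    """Split [0, num_values) into num_regions contiguous regions of equal size."""
    if num_values % num_regions:
        raise ValueError("the array does not divide into equal regions")
    step = num_values // num_regions
    return [(r * step, (r + 1) * step) for r in range(num_regions)]


# One separate allocation per region instead of one allocation for the whole
# array, so each region can map to its own minipage.
regions = [bytearray((hi - lo) * VALUE_SIZE)
           for lo, hi in region_bounds(NUM_VALUES, NUM_REGIONS)]

print([len(r) for r in regions])
```

Each region occupies 256 bytes, which matches the IS sharing granularity in Table 2.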

TSP allocates a global memory structure that contains an array of tours. Each tour (TourElement) is 148 bytes long and is manipulated exclusively by one task. We moved the array out of the global memory structure, leaving only a pointer in its place, and then allocated each tour independently so that each one resides in a separate minipage.
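The layout change can be mirrored with Python's `ctypes`, which models C structures. Only the 148-byte tour size comes from the text; the placeholder field `raw`, the extra field `other`, the tour count `NUM_TOURS`, and the choice of a pointer to an array of tour pointers are all hypothetical, since the actual TSP declarations are not shown here.

```python
import ctypes

TOUR_SIZE = 148  # bytes per TourElement (Table 2)
NUM_TOURS = 8  # hypothetical count; the real number of tours is not given


class TourElement(ctypes.Structure):
    # Placeholder layout: only the 148-byte size is known.
    _fields_ = [("raw", ctypes.c_char * TOUR_SIZE)]


TourPtr = ctypes.POINTER(TourElement)


class GlobalMemoryOriginal(ctypes.Structure):
    # Before: the tours are embedded in the global structure.
    _fields_ = [("other", ctypes.c_int),
                ("tours", TourElement * NUM_TOURS)]


class GlobalMemoryModified(ctypes.Structure):
    # After: only a pointer remains in the global structure.
    _fields_ = [("other", ctypes.c_int),
                ("tours", ctypes.POINTER(TourPtr))]


# Allocate each tour independently (one object per tour).
tours = [TourElement() for _ in range(NUM_TOURS)]
ptr_array = (TourPtr * NUM_TOURS)(*[ctypes.pointer(t) for t in tours])

gm = GlobalMemoryModified()
gm.tours = ctypes.cast(ptr_array, ctypes.POINTER(TourPtr))

print(ctypes.sizeof(GlobalMemoryOriginal), ctypes.sizeof(GlobalMemoryModified))
```

After the change the global structure no longer contains any tour bytes, so a task writing its own tour touches only that tour's storage.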

There was no need to modify SOR, since it uses a matrix that is allocated row by row. A row is a suitable sharing unit, so the row size can determine the minipage size. Similarly, LU needed no modification, since it builds its matrix by allocating sub-blocks, each of size $32 \times 32 \times |int| = 4\,\mbox{KB}$. Since these sub-blocks are suitable sharing units, the minipage size can be set equal to the 4 KB page size.
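The sizes quoted for SOR and LU follow from simple arithmetic. The 4-byte element size for SOR is an assumption, consistent with the 8 MB shared memory size in Table 2 for 32768 rows of 64 elements; for LU it is the $|int|$ used above.

```latex
$$
\begin{aligned}
\text{SOR row} &= 64 \times 4\,\mathrm{B} = 256\,\mathrm{B}\\
\text{32768 SOR rows} &= 32768 \times 256\,\mathrm{B} = 8\,\mathrm{MB}\\
\text{LU block} &= 32 \times 32 \times 4\,\mathrm{B} = 4096\,\mathrm{B} = 4\,\mathrm{KB}
\end{aligned}
$$
```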



 
Ayal Itzkovitz and Assaf Schuster, The Technion