
Experimental Platform

All experiments were performed on a cluster of eight SMP PCs. Each PC contains two 300 MHz Pentium II processors, but for this study we used only one processor per node. Each processor has a 512 KB L2 cache, and each node has 512 MB of main memory. All nodes run Linux 2.2.10.

Each node has a Giganet cLAN NIC, a 32-bit, 33 MHz PCI card, and the nodes are connected by an 8-port Giganet cLAN switch. Table 2 reports the performance characteristics of this platform. Latency is the time taken to transfer a one-word packet between two nodes using VIA. PostSend is the average time taken to post a send using VIA. The last row gives the cost of the VipRegisterMem operation, which registers the memory used for communication buffers in VIA.


Table 2: Giganet VIA Microbenchmarks

One-way Latency (1 word)    8.2 µs
Bandwidth (32 KB)           101 MB/s
PostSend (4 KB)             2.1 µs
RegisterMem (4 KB)          4.3 µs
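
As a rough aid to reading Table 2, the latency and bandwidth figures can be combined into a first-order estimate of one-way transfer time. The sketch below assumes a simple linear model (fixed latency plus size divided by bandwidth) and takes 1 MB to mean 10^6 bytes. Neither assumption comes from the text, and the function name is illustrative only.

```python
# First-order transfer-time model built from the Table 2 figures.
# Assumptions (not from the paper): time = latency + size / bandwidth,
# and 1 MB = 10**6 bytes, so 1 MB/s equals 1 byte per microsecond.

LATENCY_US = 8.2        # one-way latency for a 1-word packet (Table 2)
BANDWIDTH_MB_S = 101.0  # bandwidth measured with 32 KB messages (Table 2)


def estimate_transfer_us(size_bytes: int) -> float:
    """Estimate the one-way transfer time in microseconds."""
    wire_time_us = size_bytes / BANDWIDTH_MB_S  # bytes / (bytes per us)
    return LATENCY_US + wire_time_us


if __name__ == "__main__":
    for size in (4, 4096, 32 * 1024):
        print(f"{size:6d} bytes: {estimate_transfer_us(size):7.1f} us")
```

Under this model a 4096-byte message takes about 48.8 µs. The bandwidth figure was measured with 32 KB messages, so the estimate is likely optimistic for smaller transfers.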


We also report (Table 3) the cost of other operations and events that occur frequently in a software DSM system: page fault handler invocation, the mprotect system call, and memory copy. The last row of Table 3 gives the time taken to copy one page (4096 bytes on the Pentium II running Linux) from memory to cache.


Table 3: Linux System Microbenchmarks

Operation (per page)    Time (µs)
Page fault              6.2
Mprotect call           2.7
Memory copy             23.2
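
The memory copy time in Table 3 can be converted into an effective copy bandwidth, and the page fault and mprotect costs can be summed into a fixed per-fault cost. The sketch below is illustrative arithmetic only. It assumes 1 MB = 10^6 bytes and that each fault triggers exactly one mprotect call, and the function names are hypothetical.

```python
# Derived figures from Table 3 (per 4096-byte page, times in microseconds).
# Assumptions (not from the paper): 1 MB = 10**6 bytes, and each page
# fault is followed by exactly one mprotect call.

PAGE_SIZE_BYTES = 4096
PAGE_FAULT_US = 6.2
MPROTECT_US = 2.7
MEMORY_COPY_US = 23.2


def copy_bandwidth_mb_s(bytes_copied: int, elapsed_us: float) -> float:
    """Bytes per microsecond, which equals MB/s when 1 MB = 10**6 bytes."""
    return bytes_copied / elapsed_us


def fault_and_protect_us() -> float:
    """Cost of taking one fault plus one protection change."""
    return PAGE_FAULT_US + MPROTECT_US


if __name__ == "__main__":
    bw = copy_bandwidth_mb_s(PAGE_SIZE_BYTES, MEMORY_COPY_US)
    print(f"effective copy bandwidth: {bw:.1f} MB/s")
    print(f"fault + mprotect: {fault_and_protect_us():.1f} us")
```

With these inputs the page copy corresponds to roughly 177 MB/s, and the fault-plus-mprotect cost is about 8.9 µs per page.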


Table 4 presents microbenchmarks for the DSM system itself. To isolate the basic cost of each operation, these microbenchmarks were run on just two nodes. The Acquire microbenchmark gives the time to update data structures and fetch the lock from a remote node. The Release microbenchmark measures the cost of a release when no request for the lock is pending. The page fetch time is the time to fetch a page from its home node without copies. The diff application time includes the time to copy the diff from the diff buffer onto the page and to update the version of the page. The Barrier microbenchmark includes the time to send the barrier message to the other node and to wait for the barrier message from that node.


Table 4: Software DSM Microbenchmarks

Operation               Time (µs)
Acquire (local)         1
Acquire (remote)        34
Release                 1
Page fetch (no copy)    89
Diff Computation        24
Diff Application        22
Barrier (2-node)        17
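
One way to use the figures in Table 4 is to estimate protocol overhead from a count of events. The sketch below assumes costs are purely additive, with no overlap or contention, which the text does not claim. The event counts in the usage example are hypothetical and are not measured data.

```python
# Additive overhead estimate from the Table 4 per-operation costs.
# Assumption (not from the paper): costs simply add, with no overlap.

COST_US = {
    "acquire_local": 1,
    "acquire_remote": 34,
    "release": 1,
    "page_fetch": 89,
    "diff_computation": 24,
    "diff_application": 22,
    "barrier_2node": 17,
}


def estimate_overhead_us(event_counts: dict) -> int:
    """Sum count * cost over all events; unknown names raise KeyError."""
    return sum(COST_US[name] * count for name, count in event_counts.items())


if __name__ == "__main__":
    # Hypothetical event counts, for illustration only.
    counts = {
        "acquire_remote": 10,
        "release": 10,
        "page_fetch": 5,
        "barrier_2node": 2,
    }
    print(estimate_overhead_us(counts), "us")
```

Because the costs were measured on two nodes, an estimate like this is a baseline only. The barrier cost in particular is specific to the two-node configuration.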



Murali Rangarajan 2000-08-09