
contiguous data.

Level 1 BLAS routines use loops that access arrays sequentially. Suppose, for simplicity, that an array t[n] of double is cache-line aligned and that a level 1 cache line holds four doubles: accessing t[0] loads t[0], t[1], t[2] and t[3] into one cache line, so the following accesses do not go to memory until we reach t[4]. In this situation the ratio of useful loads (cells the loop actually uses) to effective loads (cells actually brought in from memory) is 1. With a vector increment of 2, only the even cells are used, so the n useful elements are spread over 2n cells of t and the ratio drops to 0.5. This is a first argument against a cyclic split of vectors in multi-threaded level 1 BLAS: the master and slave processes would need more memory accesses to do the same job.
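The ratio argument can be checked with a small counting model. The sketch below is only an illustration and is not taken from the paper: the function name useful_load_ratio and the constant DOUBLES_PER_LINE are hypothetical. It assumes a cache-line-aligned array, four doubles per line as in the example above, and no reuse of a line once the loop has moved past it.

```c
#include <stddef.h>

/* Assumption for illustration: one level 1 cache line holds 4 doubles. */
#define DOUBLES_PER_LINE 4

/*
 * Ratio of useful loads (cells the loop reads) to effective loads
 * (cells brought in from memory, one full cache line at a time) when
 * reading n cells of a line-aligned array with increment inc.
 * Hypothetical helper; it counts line fetches, it does not measure a cache.
 */
static double useful_load_ratio(size_t n, size_t inc)
{
    size_t lines = 0;                 /* distinct cache lines fetched */
    size_t last_line = (size_t)-1;    /* no line fetched yet */

    for (size_t i = 0; i < n; i++) {
        size_t line = (i * inc) / DOUBLES_PER_LINE;
        if (line != last_line) {      /* first touch of this line */
            lines++;
            last_line = line;
        }
    }
    if (lines == 0)
        return 0.0;                   /* empty vector: nothing loaded */
    return (double)n / (double)(lines * DOUBLES_PER_LINE);
}
```

Under these assumptions, useful_load_ratio(1024, 1) gives 1.0 and useful_load_ratio(1024, 2) gives 0.5, matching the two cases discussed above.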



Thomas Guignon
2000-08-24