Level 1 BLAS routines use loops that access arrays sequentially. Suppose, for
simplicity, that an array t[n] of double is cache-line aligned. Accessing t[0]
then loads t[0], t[1], t[2], t[3] into one level 1 cache line, and the following
array accesses need not go to memory until we access t[4]. In this situation the
ratio of useful loads to effective loads is 1. With a vector increment of 2,
only the even cells are used, so the real size of t becomes 2n and the ratio
drops to 0.5. This is a first argument against a cyclic split of vectors in
multi-threaded level 1 BLAS: the master and slave processes would need more
memory accesses to do the same job.
Thomas Guignon
2000-08-24