
4. Memory bandwidth and cache use

In this section we present some well-known key points of cache use and how they affect the performance of BLAS subroutines in single- and dual-CPU execution. We discuss the effects of non-contiguous data, false sharing, mutual exclusion, data blocking, stack alignment, and thread/processor affinity. All examples assume an Intel P6-class processor, with 32-byte L1 cache lines and a 2-way set-associative L1 cache. The reader may refer to [1] for full information on optimizing code for Pentium processors.
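To make these cache parameters concrete, the following is a minimal C sketch of how an address maps to a cache line and a cache set. The line size (32 bytes) and associativity (2-way) are the values quoted above. The total L1 data cache size of 8 KB is an assumption made for illustration only and is not stated in the text. The names `cache_set`, `same_line` and `set_stride` are hypothetical helpers and are not part of BLASTH.

```c
#include <stddef.h>
#include <stdint.h>

/* LINE_BYTES and WAYS are the values quoted in the text.
   CACHE_BYTES is an ASSUMPTION: the text does not give the L1 size. */
enum {
    LINE_BYTES  = 32,
    WAYS        = 2,
    CACHE_BYTES = 8 * 1024,
    NUM_SETS    = CACHE_BYTES / (LINE_BYTES * WAYS)
};

/* Index of the cache set that an address maps to. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Nonzero if two addresses fall in the same cache line. */
static int same_line(uintptr_t a, uintptr_t b)
{
    return a / LINE_BYTES == b / LINE_BYTES;
}

/* Byte distance after which addresses map to the same set again. */
static size_t set_stride(void)
{
    return (size_t)LINE_BYTES * NUM_SETS;
}
```

Under these assumptions, addresses that are a multiple of `set_stride()` bytes apart compete for the same two ways of one set, so a third such address evicts one of the first two. Likewise, two variables for which `same_line` is nonzero share a line, which is the situation that leads to false sharing when two processors write to them.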





Thomas Guignon
2000-08-24