
Stack alignment and thread/processor affinity

Stack alignment has been an issue because older gcc versions do not align doubles on the stack on an 8-byte boundary. Every access to a misaligned double therefore costs extra cycles. We faced this problem with level 3 BLAS routines that use fixed-size arrays on the stack for blocking, and it resulted in poor performance. Recent gcc versions (2.95.x) solve this problem. They also provide several options for controlling stack alignment, such as -malign-double and -mpreferred-stack-boundary=x, which keeps the stack aligned on a 2^x-byte boundary.
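The following minimal C sketch makes the alignment issue concrete. It reproduces the pattern described above: a blocked kernel copies a block of a matrix into a fixed-size array on the stack and then checks whether that array starts on an 8-byte boundary. The blocking factor `BLK` and all function names are hypothetical. The sketch uses the C11 `_Alignas` specifier to request the alignment explicitly. That specifier did not exist in the gcc 2.95.x compilers discussed here, where the command-line options above are the relevant control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 32  /* hypothetical blocking factor */

/* Returns 1 if p lies on an n-byte boundary (n must be a power of two). */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Copies the leading BLK x BLK block of the column-major matrix a
   (leading dimension lda) into a fixed-size stack buffer, as a blocked
   level 3 kernel might, reports through *aligned whether the buffer is
   8-byte aligned, and returns the sum of the copied elements.
   _Alignas(8) asks the compiler to place the buffer on an 8-byte
   boundary regardless of the default stack alignment. */
static double copy_block_and_sum(const double *a, size_t lda, int *aligned)
{
    _Alignas(8) double block[BLK * BLK];
    double sum = 0.0;

    assert(lda >= BLK);  /* the block must fit inside one column stride */
    *aligned = is_aligned(block, 8);

    for (size_t j = 0; j < BLK; j++)
        for (size_t i = 0; i < BLK; i++)
            block[j * BLK + i] = a[j * lda + i];

    for (size_t k = 0; k < BLK * BLK; k++)
        sum += block[k];
    return sum;
}
```

Copying into a contiguous stack buffer is what makes the buffer's alignment matter. Every inner-loop access then goes through that array, so a misaligned start penalizes all of those accesses.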

Thread/processor affinity is a general issue in SMP systems. Cache efficiency drops if the task scheduler moves a thread from one processor to another, because the thread's working data must then be reloaded into the new processor's cache. At the time of writing, a standard Linux kernel provides no way to force thread/processor affinity. However, as we said in section 2.1, the Linux scheduler normally places the two running processes (master and slave) on two different processors. In addition, a kernel patch available at https://isunix.it.ilstu.edu/~thockin/pset/ adds some control over thread/processor binding.
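Later Linux kernels (2.5.8 onward) added a standard interface for this, the `sched_setaffinity` system call. It postdates this paper and is unrelated to the pset patch mentioned above. Below is a minimal C sketch that binds the calling thread to a single processor, assuming a current Linux system with glibc. The helper names `first_allowed_cpu`, `pin_to_cpu` and `pinned_only_to` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Returns the lowest-numbered CPU the calling thread may currently run
   on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restricts the calling thread (pid 0 means the caller) to one CPU, so
   the scheduler can no longer migrate it away from its cached data.
   Returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Returns 1 if the calling thread's affinity mask is exactly {cpu}. */
static int pinned_only_to(int cpu)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A program would typically call `pin_to_cpu` at the start of each worker thread, for example CPU 0 for the master and CPU 1 for the slave, so that each thread keeps its blocked data in a single processor's cache.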



Thomas Guignon
2000-08-24