Biham recently discovered a notable software optimization of DES which he called bitslicing [4]. By replacing DES's S-Boxes with a logic gate circuit, one can reduced DES to a set of bit operations. One can then treat a 64-bit processor as 64 parallel one-bit processors, each implementing the circuit.
On a 300MHz Alpha 8400 processor, Biham gained about a factor of 5 speedup using bitsliced DES. His implementation encrypted 137 Mb/sec on average, compared to Eric Young's libdes, which achieved only 28 Mb/sec.
For MD5 crypt the situation is similar. In ``John the Ripper'' [5] a considerable speedup was made by simplifying MD5 crypt's central computing loop.
Bitslicing relies on the fact that DES's S-boxes are fixed and well known. In contrast, Bcrypt's S-boxes change constantly over the course of the algorithm, and take on different values for every combination of password and salt. Bitslicing cannot be applied to bcrypt.