Check out the new USENIX Web site. next up previous
Next: The inverse MixColumn transformation Up: The public-key coprocessor based Previous: The public-key coprocessor based

The MixColumn transformation

The multiplication of columns ( MixColumn) is based on the $ {\tt xtime}$ operation as defined within the AES specification. It multiplies a byte of the so called state by 2 modulo the irreducible polynomial $ x^8+x^4+x^3+x+1$. This operation is usually performed on a byte by left shifting the byte (multiplication by 2) and, in case of overflow, xoring (addition modulo 2) with the hexadecimal value $ 0x1b$. The MixColumn transformation requires matrix multiplication in the field $ \mathbb{F}_2^8$. In an 8-bit CPU, this can be implemented in an efficient way for each column as follows:
$\displaystyle y_0$ $\displaystyle =$ $\displaystyle 02 * x_0 \oplus 03 * x_1 \oplus 01 * x_2 \oplus 01 * x_3$  
$\displaystyle y_1$ $\displaystyle =$ $\displaystyle 01 * x_0 \oplus 02 * x_1 \oplus 03 * x_2 \oplus 01 * x_3$  
$\displaystyle y_2$ $\displaystyle =$ $\displaystyle 01 * x_0 \oplus 01 * x_1 \oplus 02 * x_2 \oplus 03 * x_3$  
$\displaystyle y_3$ $\displaystyle =$ $\displaystyle 03 * x_0 \oplus 01 * x_1 \oplus 01 * x_2 \oplus 02 * x_3,$  

where $ *$ represents the $ {\tt xtime}$ operation. After re-ordering the equations we get:
$\displaystyle y0$ $\displaystyle =$ $\displaystyle 02 * x_0 \oplus 03 * x_1 \oplus x_2 \oplus x3$  
$\displaystyle y1$ $\displaystyle =$ $\displaystyle 02 * x_1 \oplus 03 * x_2 \oplus x_3 \oplus x0$  
$\displaystyle y2$ $\displaystyle =$ $\displaystyle 02 * x_2 \oplus 03 * x_3 \oplus x_0 \oplus x1$  
$\displaystyle y3$ $\displaystyle =$ $\displaystyle 02 * x_3 \oplus 03 * x_0 \oplus x_1 \oplus x2$  

The $ {\tt xtime}$ operation can be performed inside the coprocessor on the 16 bytes of the state in parallel via the following formula:
$\displaystyle {\tt xtime}($state$\displaystyle )$ $\displaystyle =$ $\displaystyle (($state$\displaystyle \& m_2) << 1) \oplus$  
    $\displaystyle ((($state$\displaystyle \& m_1) >> 7) * m_3),$  

where $ m_1 = 0x8080...80$ (16 bytes), $ m_2 = 0x7f7f...7f$ (16 bytes) and $ m_3 = 0x1b$. Here, $ *$ denotes the multiplication operation in $ \mathbb{F}_2^8$, $ \oplus $ is the addition modulo 2, $ \&$ the AND operation and $ <<$ and $ >>$ are the bit-left and bit-right shift operations respectively. The $ {\tt xtime}$ operation itself can be implemented inside the coprocessor with only two temporary registers, as shown below:
$\displaystyle t_1$ $\displaystyle =$ state$\displaystyle \& m_1$  
$\displaystyle t_1$ $\displaystyle =$ $\displaystyle t_1 >> 7$  
$\displaystyle t_1$ $\displaystyle =$ $\displaystyle t_1 * m_3$  
$\displaystyle t_2$ $\displaystyle =$ state$\displaystyle \& m_2$  
$\displaystyle t_2$ $\displaystyle =$ $\displaystyle t_2 << 1$  
$\displaystyle t_1$ $\displaystyle =$ $\displaystyle t_1 \oplus t_2$  

If the AND operation is not supported by the coprocessor, it has to be done in the standard CPU before loading the state into the coprocessor's register. Then, one has to load the result of the AND operations in both $ t_1$ and $ t_2$. Based on the previous definition of the $ {\tt xtime}$ operation, the whole MixColumn transformation can be defined to operate on the 16 bytes of the state in parallel. The implementation is based on the previous definition of the $ {\tt xtime}$ operation:
$\displaystyle t_1$ $\displaystyle =$ $\displaystyle {\tt xtime}($state$\displaystyle )$  
$\displaystyle %%; 02 * state
t_2$ $\displaystyle =$ $\displaystyle t_1 \oplus$   state  
$\displaystyle %%; 03 * state
t_2$ $\displaystyle =$ RotWord$\displaystyle (t_2)$  
$\displaystyle %%; 1-byte left rotation of 03 * state
t_1$ $\displaystyle =$ $\displaystyle t_1 \oplus t_2$  
$\displaystyle %%; (02 Å 03) * state
t_2$ $\displaystyle =$ RotWord$\displaystyle ($state$\displaystyle )$  
$\displaystyle %%; 1-byte left rotation of state
t_2$ $\displaystyle =$ RotWord$\displaystyle (t_2)$  
$\displaystyle %%; 2-byte left rotation of state
t_1$ $\displaystyle =$ $\displaystyle t_1 \oplus t_2$  
$\displaystyle %%; (01 Å 02 Å 03) * state
t_2$ $\displaystyle =$ RotWord$\displaystyle (t_2)$  
$\displaystyle <tex2html_comment_mark>10$   state $\displaystyle =$ $\displaystyle t_1 \oplus t_2 <tex2html_comment_mark>11$  

The total number of registers needed for the implementation of the MixColumn transformation in the coprocessor is 3, two temporal registers for the intermediate results and another for the state. The RotWord operation as defined in the AES specification has to be performed on every 4 bytes of the state independently. If it is not supported by the coprocessor, this operation must be done by the standard CPU, accessing the internal coprocessor's registers.
next up previous
Next: The inverse MixColumn transformation Up: The public-key coprocessor based Previous: The public-key coprocessor based
Roger Fischlin 2002-09-25