**Tags:** aes crypto

Rating:

# A2S [355 points] (10 solves)

The challenge is to recover the key of a reduced AES variant (only 2 rounds), given only

3 ciphertext-plaintext pairs. While we are also given two bytes of the key, I only noticed

this after I've already solved the challenge. I don't see how to use these to achieve a

significant speedup with my approach, though.

The following sequence of operations is performed to encrypt a block:

```python

add_round_key(plain_state, self._key_matrices[0])

sub_bytes(plain_state) # nonlinearity only here...

shift_rows(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[1])

sub_bytes(plain_state) # ...and here

shift_rows(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[2])

```

However, `sub_bytes` is a byte-wise transformation, while `shift_rows` simply performs

the bytes. We can therefore write it down in this order:

```python

add_round_key(plain_state, self._key_matrices[0])

shift_rows(plain_state) # now before the sub_bytes

sub_bytes(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[1])

sub_bytes(plain_state)

shift_rows(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[2])

```

The `mix_columns` at the end was explicitly added by the author of the challenge. I don't

see why, as it is an affine operation, meaning:

```

mix_columns(S ^ K) = mix_columns(S) ^ f(K)

```

(what `f` is exactly is not relevant to the solution, but it is the linear part of the

affine transformation)

This allows out to swap the final `mix_columns` with `add_round_key`, if we're willing

to imagine a different key expansion procedure.

Long story short, we can calculate the XOR between internal states at the beginning and

end of this part of the cipher:

```python

sub_bytes(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[1])

sub_bytes(plain_state)

```

Namely, the following code does the trick:

```python

def shift(st):

mat = bytes2matrix(st)

shift_rows(mat)

return matrix2bytes(mat)

def unmix(st):

mat = bytes2matrix(st)

inv_mix_columns(mat)

inv_shift_rows(mat)

return matrix2bytes(mat)

pre_delta1 = shift(xor_bytes(plaintexts[0], plaintexts[1]))

pre_delta2 = shift(xor_bytes(plaintexts[0], plaintexts[2]))

post_delta1 = xor_bytes(unmix(ciphertexts[0]), unmix(ciphertexts[1]))

post_delta2 = xor_bytes(unmix(ciphertexts[0]), unmix(ciphertexts[2]))

```

Notice how each of these four operations are either byte-wise or column-wise:

```python

sub_bytes(plain_state) # byte-wise

mix_columns(plain_state) # column-wise

add_round_key(plain_state, self._key_matrices[1]) # byte-wise

sub_bytes(plain_state) # byte-wise again

```

We will therefore consider this as four independent transformations of each of the 32-bit columns.

Let's look at the state at this moment:

```python

# Call the state here P1, P2, P3 for each of the pt-ct pairs.

sub_bytes(plain_state)

mix_columns(plain_state)

add_round_key(plain_state, self._key_matrices[1])

# Call the state here S1, S2, S3 for each of the pt-ct pairs.

sub_bytes(plain_state)

# Call the state here C1, C2, C3 for each of the pt-ct pairs.

```

When you look at it bytewise, `(S1 ^ S2, S1 ^ S3)` has very little possibilities.

1. There are 256 possible values for `C1[i]`.

2. This determines `C2[i]` and `C3[i]` because of the known `post_delta`.

3. The inverse S-box determines the possible values for `S1[i]`, `S2[i]`, `S3[i]`.

```rust

fn expected_deltas(d1: u8, d2: u8) -> DeltaSet {

let mut deltas = DeltaSet(BitArray::zeroed());

for a in 0..=255 {

let p = INV_SBOX[a as usize] ^ INV_SBOX[(a ^ d1) as usize];

let q = INV_SBOX[a as usize] ^ INV_SBOX[(a ^ d2) as usize];

deltas.add([p, q]);

}

deltas

}

// ...

let expected: [DeltaSet; 4] = array_init(|i| expected_deltas(post_delta1[i], post_delta2[i]));

```

The reason why we're interested in the deltas is that `add_round_key` won't change them. Now we just

bruteforce the input P1, determine P2 and P3, calculate `sub_bytes` and `mix_columns`, and check

whether the differences match what we expect. The chance of a false positive for any specific attempt

is `2**-32`, so we should expect about one spurious result. And indeed, this is the result produced:

```

067c20d3

1f473b29

--------

8a151475

d441a30c

--------

d4eb9f89

34a72235

--------

a0227d5b

1e89501b

f669b656

336f0e52

--------

real 2m29.248s

user 9m52.629s

sys 0m0.020s

```

Internally, I am representing the `DeltaSet` as a bit array of `2**16` bits. After the CTF, I've also

tried storing a sorted array and either binary or linear searching, but as I estimated, the bit array

is fastest, as memory bandwidth is nowhere near saturated.

All that's left to do is try out the 32 candidates, compute the corresponding key for each, and see if

the ciphertexts match. Calculating said key essentially amounts to a single XOR.

Original writeup (https://github.com/p4-team/ctf/tree/master/2021-05-28-pwn2win/a2s).