Tags: aes crypto

Rating:

# A2S [355 points] (10 solves)

The challenge is to recover the key of a reduced AES variant (only 2 rounds), given only
3 ciphertext-plaintext pairs. While we are also given two bytes of the key, I only noticed
this after I've already solved the challenge. I don't see how to use these to achieve a
significant speedup with my approach, though.

The following sequence of operations is performed to encrypt a block:

python
sub_bytes(plain_state) # nonlinearity only here...
shift_rows(plain_state)
mix_columns(plain_state)
sub_bytes(plain_state) # ...and here
shift_rows(plain_state)
mix_columns(plain_state)


However, sub_bytes is a byte-wise transformation, while shift_rows simply performs
the bytes. We can therefore write it down in this order:

python
shift_rows(plain_state) # now before the sub_bytes
sub_bytes(plain_state)
mix_columns(plain_state)
sub_bytes(plain_state)
shift_rows(plain_state)
mix_columns(plain_state)


The mix_columns at the end was explicitly added by the author of the challenge. I don't
see why, as it is an affine operation, meaning:


mix_columns(S ^ K) = mix_columns(S) ^ f(K)


(what f is exactly is not relevant to the solution, but it is the linear part of the
affine transformation)

This allows out to swap the final mix_columns with add_round_key, if we're willing
to imagine a different key expansion procedure.

Long story short, we can calculate the XOR between internal states at the beginning and
end of this part of the cipher:

python
sub_bytes(plain_state)
mix_columns(plain_state)
sub_bytes(plain_state)


Namely, the following code does the trick:

python
def shift(st):
mat = bytes2matrix(st)
shift_rows(mat)
return matrix2bytes(mat)

def unmix(st):
mat = bytes2matrix(st)
inv_mix_columns(mat)
inv_shift_rows(mat)
return matrix2bytes(mat)

pre_delta1 = shift(xor_bytes(plaintexts[0], plaintexts[1]))
pre_delta2 = shift(xor_bytes(plaintexts[0], plaintexts[2]))

post_delta1 = xor_bytes(unmix(ciphertexts[0]), unmix(ciphertexts[1]))
post_delta2 = xor_bytes(unmix(ciphertexts[0]), unmix(ciphertexts[2]))


Notice how each of these four operations are either byte-wise or column-wise:

python
sub_bytes(plain_state) # byte-wise
mix_columns(plain_state) # column-wise
sub_bytes(plain_state) # byte-wise again


We will therefore consider this as four independent transformations of each of the 32-bit columns.
Let's look at the state at this moment:

python
# Call the state here P1, P2, P3 for each of the pt-ct pairs.
sub_bytes(plain_state)
mix_columns(plain_state)
# Call the state here S1, S2, S3 for each of the pt-ct pairs.
sub_bytes(plain_state)
# Call the state here C1, C2, C3 for each of the pt-ct pairs.


When you look at it bytewise, (S1 ^ S2, S1 ^ S3) has very little possibilities.

1. There are 256 possible values for C1[i].
2. This determines C2[i] and C3[i] because of the known post_delta.
3. The inverse S-box determines the possible values for S1[i], S2[i], S3[i].

rust
fn expected_deltas(d1: u8, d2: u8) -> DeltaSet {
let mut deltas = DeltaSet(BitArray::zeroed());
for a in 0..=255 {
let p = INV_SBOX[a as usize] ^ INV_SBOX[(a ^ d1) as usize];
let q = INV_SBOX[a as usize] ^ INV_SBOX[(a ^ d2) as usize];
}
deltas
}

// ...
let expected: [DeltaSet; 4] = array_init(|i| expected_deltas(post_delta1[i], post_delta2[i]));


The reason why we're interested in the deltas is that add_round_key won't change them. Now we just
bruteforce the input P1, determine P2 and P3, calculate sub_bytes and mix_columns, and check
whether the differences match what we expect. The chance of a false positive for any specific attempt
is 2**-32, so we should expect about one spurious result. And indeed, this is the result produced:


067c20d3
1f473b29
--------
8a151475
d441a30c
--------
d4eb9f89
34a72235
--------
a0227d5b
1e89501b
f669b656
336f0e52
--------

real 2m29.248s
user 9m52.629s
sys 0m0.020s


Internally, I am representing the DeltaSet as a bit array of 2**16 bits. After the CTF, I've also
tried storing a sorted array and either binary or linear searching, but as I estimated, the bit array
is fastest, as memory bandwidth is nowhere near saturated.

All that's left to do is try out the 32 candidates, compute the corresponding key for each, and see if
the ciphertexts match. Calculating said key essentially amounts to a single XOR.

Original writeup (https://github.com/p4-team/ctf/tree/master/2021-05-28-pwn2win/a2s).