Tags: aes zip crime 

Rating:

**Description**

> no logos or branding for this bug
>
> Take your pick nc `crypto.chal.csaw.io 8040` `nc crypto.chal.csaw.io 8041` `nc crypto.chal.csaw.io 8042` `nc crypto.chal.csaw.io 8043`
>
> flag is not in flag format. flag is PROBLEM_KEY

**Files provided**

- [`serv-distribute.py`](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-09-14-CSAW-CTF-Quals/files/flatcrypt-serv-distribute.py)

**Solution**

Let's examine the script:

```python
def encrypt(data, ctr):
return AES.new(ENCRYPT_KEY, AES.MODE_CTR, counter = ctr).encrypt(zlib.compress(data))

while True:
f = input("Encrypting service\n")
if len(f) < 20:
continue
enc = encrypt(
bytes(
(PROBLEM_KEY + f).encode('utf-8')
),
Counter.new(64, prefix = os.urandom(8))
)
print("%s%s" % (enc, chr(len(enc))))
```

Our target is `PROBLEM_KEY` and we don't know `ENCRYPT_KEY`.

Whenever we interact with the script, we have to give it at least 20 bytes of data. It then prepends `PROBLEM_KEY` to our data, uses `zlib` to compress the result, and finally encrypts that using AES-CTR with a random counter. We get to see the encrypted result and the length of that data.

So there are a few things to note here. First of all, AES-CTR is not a block cipher, it is a stream cipher - it encrypts each byte separately and hence the size of the cipher text is the same as the size of the plain text. This is different from e.g. AES-CBC.

`zlib` is a standard compression library. By definition, it tries to compress data, i.e. make the output smaller than the input. The way this works is by finding sequences of data which occur multiple times in the input and replacing them with back-references. It works at the byte level, i.e. it can compress `1234 1234` to `1234 <back-ref to 1234>`, but it cannot compress `1234 5678` to `1234 <back-ref to 1234 with +4 to byte values>`.

$ python3
>>> import zlib
>>> # all bytes from 0 to 255:
>>> len(bytes([ i for i in range(256) ]))
256
>>> # zlib compression makes the result larger:
>>> len(zlib.compress(bytes([ i for i in range(256) ])))
267
>>> # the sequence 0, 1, 2, 3, 0, 1, 2, 3, ...:
>>> len(bytes([ i % 4 for i in range(256) ]))
256
>>> # zlib compression identifies the repeating 0, 1, 2, 3:
>>> len(zlib.compress(bytes([ i % 4 for i in range(256) ])))
15

What do we actually do in this challenge? We cannot decrypt the AES, since we don't know the encryption key and the counter is 8 random bytes that would be very hard to predict. The only information leak that we have available is the size of the cipher text, and this depends on how well the `zlib` compression performs.

If you are not familiar with an attack like this, see [CRIME](https://en.wikipedia.org/wiki/CRIME) or [BREACH](https://en.wikipedia.org/wiki/BREACH).

How does `zlib` compression help us? Let's assume that `PROBLEM_KEY` is the string `good_secret`. Now let's append `baad` and `good` to `PROBLEM_KEY`, compress it, and check the length of the result:

>>> len(zlib.compress(b"good_secret" + b"baad"))
23
>>> len(zlib.compress(b"good_secret" + b"good"))
21

The length of the string we appended was the same, and yet the second is shorter - because `good` is a part of `PROBLEM_KEY` and hence even without knowing what `PROBLEM_KEY` is, we can tell that it contains `good`.

So this will be our general approach: send various strings to the server and character-by-character we can find out what `PROBLEM_KEY` is, based on how well our attempts compress.

There is just one complication in this challenge, and it is that our input needs to be at least 20 bytes. This is why the above-mentioned fact that `zlib` operates on the byte level is important to us. We can prepend our test string with 20 bytes that will definitely not occur in `PROBLEM_KEY`. The challenge tells us what the search space is for `PROBLEM_KEY`:

```python
# Determine this key.
# Character set: lowercase letters and underscore
PROBLEM_KEY = 'not_the_flag'
```

We can easily find a simple 20-byte string that doesn't contain lowercase characters or underscores: `1234567890ABCDEFGHIJ`. Even better, we can use it to our advantage. This will be the general outline of our search algorithm:

- `padding` := "1234567890ABCDEFGHIJ"
- start with an empty `known_flag` string
- while `known_flag` is not the full flag:
- for each `candidate` in the key alphabet:
- send `padding` + `candidate` + `known_flag` + `padding` to the server
- prepend the best-compressed `candidate` to `known_flag`

During the CTF my script first started with a three-byte string found by brute-force, then extended the candidate in both directions, but the above method is much simpler and less bandwidth-intensive.

([Full Python script here](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-09-14-CSAW-CTF-Quals/scripts/flatcrypt.py))

$ python solve.py
o
go
ogo
logo
_logo
a_logo
_a_logo
e_a_logo
ve_a_logo
ave_a_logo
have_a_logo
_have_a_logo
t_have_a_logo
nt_have_a_logo
snt_have_a_logo
esnt_have_a_logo
oesnt_have_a_logo
doesnt_have_a_logo
_doesnt_have_a_logo
e_doesnt_have_a_logo
me_doesnt_have_a_logo
ime_doesnt_have_a_logo
rime_doesnt_have_a_logo
crime_doesnt_have_a_logo
done!

`flag{crime_doesnt_have_a_logo}`

Original writeup (https://github.com/Aurel300/empirectf/blob/master/writeups/2018-09-14-CSAW-CTF-Quals/README.md#100-crypto--flatcrypt).