Rating: 5.0

# ddu du ddu du ddu du ddu du ddu du ddu du ddu du ddu du ddu du ddu du
> nc ec2-13-251-81-16.ap-southeast-1.compute.amazonaws.com 3333

TLDR: The server xors each byte with the previous byte and encodes the output
as base16 using uppercase letters A-P.

Full solver code on my
[GitLab](https://gitlab.com/Askaholic/ctf-writeups/blob/master/Mates%202018/base16_solver.py)

--------------------------------------
The description of this challenge gives us a server to connect to and nothing
else, so we have to connect to figure out what it's about.

```
$ nc ec2-13-251-81-16.ap-southeast-1.compute.amazonaws.com 3333
Please select one of flowing options:
1 - You send me any message then I will give you corresponding cipher text
2 - I show you the challenge flag and you have 3 times to guess the plain text
Your choice:
```

So what we need to do is figure out how the server is encoding our messages and
if we can decode arbitrary messages, then the server will just give us the flag.

So we start by giving the server some simple, and repeating input just in case
it will help us find any obvious patterns.

```
Your choice: 1
Your message: AAAAAAAAAA
The cipher text: T0VFQktGQUFPRUVCS0ZBQU9FRUJLRkFBT0VFQktGQUFPRUVCS0ZBQQ==
```

The server responds with some base64 encoded data which after decoding looks
like this: (spaces added for clarity)
```
OEEBKFAAOEEBKFAAOEEBKFAAOEEBKFAAOEEBKFAA
OEEBKFAA OEEBKFAA OEEBKFAA OEEBKFAA OEEBKFAA
```

So we can already see that just like our input, the output follows a very
consistent pattern. We also notice that the output is 4 times as long as our
input. But what could it mean?

Lets take one step back for a second and notice another interesting fact. The
data was given to us in base64 encoded format, but it only consists of uppercase
letters. Usually base64 is used when you want to transfer arbitrary data with
ascii printable characters only. Maybe our try with the A's just happened to
produce only uppercase letters. Lets see what the server gives us when we pick
option 2 for solving the cipher.

```
nc ec2-13-251-81-16.ap-southeast-1.compute.amazonaws.com 3333
Please select one of flowing options:
1 - You send me any message then I will give you corresponding cipher text
2 - I show you the challenge flag and you have 3 times to guess the plain text
Your choice: 2
Nice! Your challenge message: OCEHJDDGNKHPJJDMNHHCLKBPOJEMLABFPNFIJODLNMHJLBBEN
DHGKAAFMOGLLPBKPBFEJFDAMMGJIECBONEIJLDOKPAKPHFCINCIOCEHJHDCMOGLKIANNMHJKEABMDGG
PBFEJDDGMGGDIJCMMLGOKGADPHFCJFDAPLFO
Your guess:
```

Again, only uppercase letters. So why would the server be base64 encoding it?
Maybe it's a subtle hint as to what the letters could mean. The way base64
encoding works is by splitting the input into 6-bit chunks (yes 6
bits is very awkward to work with) and translating the resulting integer by
indexing into a 64 (2^6) character alphabet array (so `000000` becomes `a`, `000001`
becomes `b`, etc).

So it's possible that this challenge is doing something similar, but with a
different alphabet and probably fewer bits. If we look at the challenge message
from option 2, we notice that the "highest" letter which appears is `P`. And if
we check at which position it is located in the English alphabet we find that
it is the 16th letter! So the text could be a sort of base16 encoding using the
alphabet `ABCDEFGHIJKLMNOP` instead of the standard hex digits which we would
expect.

After testing a few of the letters manually and finding that some of them do
decode to the ascii text that we inputed, we write a simple decoder for turning
the letters into bytes.

```python
import string
ALPHABET = string.ascii_uppercase[0:16]

def decode(cipher):
bits = ""
for c in cipher:
num = ALPHABET.find(c)
bits += int_to_bin(num)
return bits
```

Now when we decode the cipher for our original 10 A's and print it out as as
ascii we get some sort of mess of unicode:
```
äA¥äA¥äA¥äA¥äA¥
```

But we notice that in some places our original text is already visible. The
length of this text is also a bit strange because it isn't a multiple of 10. It
turns out that's just because there were some null bytes in the string which
didn't get printed.
```
'äA¥\x00äA¥\x00äA¥\x00äA¥\x00äA¥\x00'
```

The actual length of the string is twice as long as our input, and when we ask
the server to encode only a single A, we notice that it gives us back 2
characters worth of data: `äA`. The first character doesnt seem to make much
sense, but the second character is simply our input. So we decide to ignore the
first characters for now (in the end, these turned out not to matter at all).

Removing the first characters in the string gives us:
```
'A\x00A\x00A\x00A\x00A\x00'
```

So half of our message is preserved, but the other half has turned into null
bytes. We can try some other input strings like `ABCD` and notice that the null
bytes only occur with repeating characters. Since this is a crypto challenge
there is a good chance that this is a result of xoring characters together. If
you have experience with xor you will know that xoring something with it's self
always results in 0. So the null bytes are probably the result of xoring two A's
together. But which two A's?

When we try the string `ABCD` we notice that we now only see the `A` in our
decoded text: `'A\x03@\x04'`. But since we expect that the cipher is using xor,
and xor is it's own inverse (a ^ b ^ a = b), we can xor the second byte with
`B` to get some hints about what's happening.

```python
>>> ord('B')
66
>>> hex(66 ^ 3)
'0x41'
>>> chr(0x41)
'A'
```

And bingo! The second byte is simply xored with the first byte. We can try this
with the other bytes too and notice that they are always xored with the previous
byte. We can write a simple function to undo this xor, and we should have a
working decoder.

```python
def undo_mangle(binstr):
i = 0
resultstr = ""
prev_chunk = None
while i < len(binstr):
chunk = binstr[i+8:i+16]
if prev_chunk:
resultstr += xor(chunk, prev_chunk)
else:
resultstr += chunk

prev_chunk = chunk
i += 16 # Skip two chars ahead because we always ignore the first one
return resultstr
```

With our working decoder we can decode the challenge string we got earlier:
`GqICNmSYMcBmbsnqNdYHiv4XzouYftxg2bUOBmQbn`. We send this back to the server
and get the flag.

```
Your guess: GqICNmSYMcBmbsnqNdYHiv4XzouYftxg2bUOBmQbn
Awesome! Here is the final flag: Good fun with bad crypto
```

Original writeup (https://gitlab.com/Askaholic/ctf-writeups/blob/master/Mates%202018/ddu_du.md).