Rating:

## Alice & Bob (FE-CTF 2023)

(This write-up is also available [as a gist](https://gist.github.com/TethysSvensson/51c7325808619a08c2ea353300588d72))

This challenge consists of three parts. In each part participants are tasked with creating two python functions, `alice()` and `bob()`, which will be sent to the server. These functions are executed multiple times on the server in a sandbox and are used to encode and decode messages generated by the server code.

### Detailed challenge overview

We did not find the sandbox particularly interesting. We assumed it was safe and focused our attention elsewhere. A simplified version of the runner code without the sandbox is this:

```python
def run_single_check():
token = list(os.urandom(N))
msg = alice(token)
assert type(msg) == list and len(msg) <= 64 * 8 and all(x is True or x is False for x in msg)
if msg: msg[random.randrange(0, len(msg))] ^= True
assert bob(msg) == token

for _ in range(1000):
run_single_check()

send_flag()
```

The above code gives some random data to your `alice()` function, which will then encode it as a list of up to `64*8 = 512` bits. After encoding it will flip exactly one random bit in your message and give it to your `bob()` function. Your `bob()` function should then recover the original message despite the single bit flip.

Solving the challenge requires writing `alice()` and `bob()` functions that consistently produce the correct results in all 1000 executions. Moreover, for each part of the challenge will increase by adjusting the value of `N`.

The simplified code above also leaves out the part of how to communicate your functions to the server, but that part is trivial as you just need to hex-encode the functions and send a string for both `alice()` and `bob()`.

### Part 1 (N=21)

The first part requires transmitting `21*8 = 168` data bits using up to `64*8 = 512` encoding bits. This is really easy, since we can just triplicate each bit and take the majority when decoding.

```python
def alice(args):
o = []
for b in args:
for c in f'{b:08b}':
o += [bool(int(c))] * 3
return o

def bob(args):
o = []
for i in range(0, len(args), 8*3):
c = 0
for i in range(i, i+8*3, 3):
cur = (args[i] + args[i+1] + args[i+2]) // 2
c = (c << 1) | cur
o.append(c)
return o
```

After doing the proper hex encoding and connecting to the server we get the flag:

`flag{m4j0rity-vot3-s1mple,-f4ir-&-effect1ve}`

### Part 2 (N=62)

For this part we have to transmit `62*8 = 496` data bits using the same `512` encoding bits. The intended solution for this challenge was probably to use [`Hamming(511, 502)`](https://en.wikipedia.org/wiki/Hamming_code), which can encode up to `502` data bits using `511` encoding bits.

While we considered using this approach, we decided to skip it and instead write a solution that also works for both part 2 and 3 with minimal changes.

### Part 3 (N=63)

For this part we have to transmit `63*8 = 504` data bits using the same `512` data bits. For a while, we were somewhat stuck trying to make this work.

Isn't the hamming code supposed to be optimal?? We only get one more encoding bit than needed for `Hamming(511, 502)`, and yet we are supposed to cram two more data bits?? What gives???

It turns out that we made two flawed assumptions.

#### Flawed assumption 1: One more bit!

The realization that the code doesn't demand lists of precisely 512-bits opens up the possibility of utilizing shorter lists, nearly providing an extra bit by converting between representations of up-to-512-bits and exactly-513-bits.

```python
# This function takes a list in the exactly-513 representation and encodes it as a list in the
# up-to-512 representation
# There is only one 513-bit list that cannot be encoded like this, and it is
# the list that does not have any True values
def to_512(l):
while l[-1] == False:
l.pop()
l.pop()
return l

# This function is the inverse of the function above. It takes a list of up-to-512 bits and encodes it as a list
# with exactly 513 bits
def to_513(l):
l.append(True)
while len(l) < 513:
l.append(False)
return l
```

#### Flawed assumption 2: Hamming is not optimal?!?

It turns out that the hamming code is only optimal if you need to protect against **up to 1** bit flip. In our case we have additional information, since we know that **exactly 1 bit** will be flipped. This means that the hamming bound no longer applies and we are free to try and find smarter ways.

#### Our solution

Our solution was strongly inspired by [an interesting puzzle using a chess board](http://datagenetics.com/blog/december12014/index.html).

We chose to implement a very special hash function, which maps the `513` bits to a single `9`-bit number. This hash function works by associating each position in the array with a 9 bit number and `xor`ing the number of the position that has `True` values.

Since we are free to which number to associate with each position, we choose the last 9 positions of the our 513 bit number with the 9 different powers of two. This means that by manipulating those last 9 junk bits, we can arrive at any hash value we want.

We pick those 9 bits, so that the hash becomes zero. After a single bit gets flipped, this will change the corresponding hash value. By seeing how the hash value was changed, we can figure out which position of the data was flipped and revert the change.

There is only one small gotcha: There are only 513 unique 9-bit values! We got around this problem by having a single repeated occurrence of `0x100`. By putting this repeated value at the last place, we know are guaranteed that this bit will not be flipped, since this place is artificially reconstructed by our `to_513()` code.

#### Code

```python
def alice(args):
powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128, 256]
hash_values = [i for i in range(512) if i == 0 or ((i - 1) & i) != 0 or i == 256] + powers_of_two

def checksum(args):
r = 0
for (h, b) in zip(hash_values, args):
r ^= b * h
return r

def to_512(l):
while l[-1] == False:
l.pop()
l.pop()
return l

def to_bits(args):
o = []
for b in args:
for c in f'{b:08b}':
o.append(bool(int(c)))
return o

args = args + [0] * (63 - len(args)) # fixup for part 2
args = to_bits(args)
c = checksum(args)
for b in powers_of_two:
args.append((c & b) != 0)

# debug check
assert checksum(args) == 0

# trim down to 512 bits
return to_512(args)

def bob(args):
powers_of_two = [1, 2, 4, 8, 16, 32, 64, 128, 256]
hash_values = [i for i in range(512) if i == 0 or ((i - 1) & i) != 0 or i == 256] + powers_of_two

def checksum(args):
r = 0
for (h, b) in zip(hash_values, args):
r ^= b * h
return r

def to_513(l):
l.append(True)
while len(l) < 513:
l.append(False)
return l

def from_bits(args):
o = []
for i in range(0, len(args), 8):
c = 0
for i in range(i, i+8):
c = (c << 1) | args[i]
o.append(c)
return o

args = to_513(args)

c = checksum(args)
args[hash_values.index(c)] ^= True

return from_bits(args[:504])
```

Flag for part 2: `flag{textb00k-h4mming-c0d3s? no more!}`

Flag for part 3: `flag{v3ry-cl0se-to-the-th30retical-limit}`

#### Addendum 1: Alternative solution using `Hamming(511, 502)`

After the CTF ended, I was made by aware of a different solution by some other players from Kalmarunionen (thanks Killerdog and eskildsen!).

To understand this solution, first recall that a decoder for `Hamming(M, N)` messages actually returns two values:

- A corrected message with the following behavior:
- For messages with 0-1 bitflips: Guaranteed to equal the original message
- For messages with 2-āˆž bitflips: Guaranteed to be different from the original message
- A boolean indicating if message was already valid or if it had to be corrected
- For messages with 0 bitflips: Guaranteed to be `True`
- For messages with 1-2 bitflips: Guaranteed to be `False`
- For messages with 3-āˆž bitflips: Both behaviours possible depending on which bits were flipped

We can use this extra boolean to figure out if the bit flip happened outside of the hamming protected area and flip it accordingly.

A sketch of this solution looks like this:

```python
def alice(msg):
msg = to_bits(msg)
msg = hamming_encode(msg[:502]) + msg[502:504]
msg = to_512(msg)

return msg

def bob(msg):
msg = to_513(msg)
hamming_msg, hamming_was_valid = hamming_decode(msg[:511])

if hamming_was_valid:
# The flip must be in bit 511, since bits 0-510 were part of the hamming
# encoding and bit 512 was artificially reconstructed by the to_513 function.
msg[511] ^= True

msg = to_bytes(hamming_msg + msg[511:513])

return msg
```

#### Addendum 2: Suggestion for challenge improvements

It was possible to solve the challenge using the `Hamming(511, 502)` exactly as outlined above, but without realizing that you get an extra boolean out of the decoder. Instead you would simply apply a bit of wishful thinking and return the wrong message if the bitflip hit outside the protected area. Over 1000 rounds, the probability of being lucky every time is `(511/512) ** 1000` or roughly 14%.

In my opinion, this means that the challenge should have run for more than 1000 rounds, or alternatively tested every bitflip position at least once.

Original writeup (https://gist.github.com/TethysSvensson/51c7325808619a08c2ea353300588d72).