Rating:

*For the full experience with images see the original blog post!*

**TL;DR:** The challenge requires us to reverse an encryption with three operations applied in rounds in a structure that reminded me a bit of BCD and Hamming code.
We solved it by deducing the order of the operations (they're ordered randomly) and implementing a decryption from that.

As works best for me, I quickly loaded the binary into ghidra to look at the program and find it's C++.
Many people in our team dislike C++ reversing since it often complicates the process with structs, classes, template methods, optimizations etc.
I agree that it is challenging but I started looking into it since my first own challenge (writeup coming soon!) and enjoying taking my time to look at the types and methods.
If you're mainly interested in the part reversing the encoding, you can [skip to that block](#encryption-logic) directly.

## The program structure

The program was fully stripped, but as is often the case, we could jump to the main function from the entry point.
Here, the program did some parameter parsing with `getopt` to get the options.
The challenge already explained what each option is for and we can confirm that easily.
The number of threads is stored in a global constant and the input and output paths are used for opening file streams.
Here, C++ methods often require us to fix their signature by hand in ghidra but I could produce readable results quickly with the [docs](https://en.cppreference.com/w/).

The core part of the main method reads a size from the input file and initializes `srand` with this size XOR the number of threads.
The it reads `size` numbers into a vector, executes some timed method and finally writes the resulting number vector to the output file.

The `run`-method sets up a lot of structs and starts the encryption threads.
First, it initializes some kind of sync struct, shuffles the operations using `rand` and starts the threads with the vector data, the operations, a global position pointer and the sync struct.
The rest of the method waits for completion and checks for exceptions.

The thread construction method contains the real thread constructor with a reference to a vtable.
Additionally, it copies the arguments mentioned earlier to a new struct.
Creating this struct was really useful to understand the thread `run` method, the last method in the vtable.

The logic of the thread `run` method is hidden six layers deep in a tree of nested calls copying the parameters.
It does however contain the encryption logic so it's worth the search.

## Encryption logic

After analyzing the program structure and fixing signatures and types, I looked at the algorithm itself together with others of my team, [KITCTF](https://kitctf.de/).
The encryption method uses the `thread_args`-struct I created from the constructor and all the other structs and methods I analyzed.
Since I already named them pretty specifically, I will only show you the most important methods.
The encryption contains two phases.

During the first one, it iterates over the vector using the global position pointer to get unique positions for each thread.
The round counter is used as the distance between elements and also determines the operation to be used.
Every time, the upper index is updated with the result of the operation.
We visualized the element accesses of the algorithm like this:

We actually have 51 characters in the flag, but the pattern still applies.
After each round, the threads are synchronized with an interesting barrier implementation of a barrier using the sync struct:

It first counts down using the first semaphore and a mutex and then up again using a second semaphore.

Now, after the first encryption block, the same structure is repeated, this time counting down the counter and adding the counter instead of subtracting it.

This results in this access structure, from the bottom upwards:

Now, as you might have noticed, the first number is never updated.
We could use this to confirm that it is the flag as ASCII codes since `chr(100) == 'd'`.
Knowing that the flag starts with `dice{` we can deduce that the first operation in the list is `XOR` from the second character: `chr(100 ^ 13) == 'i'` (also the possible option, with such a low number).
Next, we can do the same with the fourth character (after reconstructing those before it) to deduce that the next operation is `add`: `19 = 13 + (99 ^ 101)`.

Finally, I implemented the inverse of the encryption knowing the order of the operations:

```python
import math

values = []
with open("flag.out", "r") as file:
line = file.readline()
while line != "":
values.append(int(line))
line = file.readline()

def xor(a, b):
return a ^ b

def add(a, b):
return a - b

def mult(a, b):
return a // b

ops = [xor, add, mult]

SIZE = len(values)

counter = 1

while counter < 32:
glob = 1
while True:
if SIZE <= glob * counter * 2 - 1:
break

lower = glob * counter * 2 - 1
upper = lower + counter

if upper < SIZE:
iter_count = round(math.log2(counter))

values[upper] = ops[iter_count % 3](values[upper], values[lower])

glob += 1

counter <<= 1

while counter > 0:
glob = 1
while True:
if SIZE <= glob * counter * 2 - 1:
break

upper = glob * counter * 2 - 1
lower = upper - counter

if upper < SIZE:
iter_count = round(math.log2(counter))

values[upper] = ops[iter_count % 3](values[upper], values[lower])

glob += 1

counter >>= 1

print("".join([chr(i) for i in values]))
```

Original writeup (https://ik0ri4n.de/dice-ctf-23#not-baby-parallelism).