Rating:

This is a similar challenge than Alkaloid Stream but with a harder to reverse key derivation.

More details are in the full writeup, but this summary is enough to understand the solution.

## Understanding the key derivation

The keystream derivation works as follows:
- let `ln` be the size of the flag. Generate `ln` linearly independent keys of `ln` bits, denoted `key[i]`.
- for every `i <= ln - ln//3`, let `fake[i]` be the XOR of `ln//3` randomly chosen different keys `key[j]` with `i+1 <= j <= ln`.
- generate a random keystream `ks` of `ln` bits. Define `public[i] = (fake[i], key[i]) if ks[i] == 1 else (key[i], fake[i])`
- then choose a permutation `pi`, the keystream used to encrypt the flag is `pi(ks)` and public released data is `pi(public)`.

Only lines 2 changes from the previous Alkaloid cipher.

## Retrieving the keystream

This time we don't have an explicit recurrence relation where we can deduce the `fake` values.

However we know that `fake` values are linear combinations of other keys, while `key` values are not (they are linearly independent).
Thus we still have some kind of recurrence relation: if we know `key[i+1], ..., key[ln - 1]`, then we can find `fake` values that are a linear combination of those elements.
We also know that `fake[i]` is one such element, so we will learn at least one new key on this round.

The question now is to understand how we can state if some element `v` is a linear combination of the known keys.
To do so, I put all keys in a matrix of GF(2). This is a matrix of size `k * ln` where `k` is the number of known keys.
Because keys are linearly independent and `k < ln`, the rank of this matrix is k.

Then I add a new row with the last row being `v`. I now have a matrix of size `(k+1) * ln`.
If `v` is a linear combination of the keys, then my matrix will still have rank `k`.
Otherwise they are linearly independent and so the matrix will have rank `k+1`.

The following code evaluates the rank of a GF(2) matrix (borrowed from [Stackoverflow](https://stackoverflow.com/questions/56856378/fast-computation-of-matrix-rank-over-gf2)):
```python
def gf2_rank(rows):
"""
Find rank of a matrix over GF2.

The rows of the matrix are given as nonnegative integers, thought
of as bit-strings.

This function modifies the input list. Use gf2_rank(rows.copy())
instead of gf2_rank(rows) to avoid modifying rows.
"""
rank = 0
while rows:
pivot_row = rows.pop()
if pivot_row:
rank += 1
lsb = pivot_row & -pivot_row
for index, row in enumerate(rows):
if row & lsb:
rows[index] = row ^ pivot_row
return rank
```

Given this function I can say if some value is a linear combination of the known keys:
```python
def is_linear_combination(keys, test_value):
rows = keys.copy()
rows.append(test_value)
n = len(rows)
return gf2_rank(rows) < n
```

## Final exploit

```python
with open("output.txt", "r") as f:
c = bytes.fromhex(f.readline())
c = bytes_to_bits(c)
public = eval(f.read())

ln = len(public)
key = [0] * ln
keystream = [0] * ln
for i in range(ln-1,-1,-1):
fake = 0
for j in range(ln // 3):
if i + j + 1 >= ln:
break
fake ^= key[i + j + 1]
for k in range(ln):
if fake in public[k]:
if fake == public[k][0]:
keystream[k] = 1
key[i] = public[k][1]
else:
keystream[k] = 0
key[i] = public[k][0]
public[k] = [-1, -1]
break

flag = bits_to_bytes(xor(keystream, c))
print(flag)
```

It takes 5 min or so to run.

Original writeup (https://github.com/apoirrier/CTFs-writeups/blob/master/PBCTF2021/Crypto/SteroidStream.md).