Tags: crypto 

Rating:

category: crypto

points: 300

# Description
We are given many ciphertexts produced by an XOR cipher with the same key.

# Solution
I know that reusing the same key in an XOR cipher is definitely not secure. But how do we exploit that?
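
The reason key reuse is exploitable is that XORing two ciphertexts cancels the key and leaves only the XOR of the two plaintexts. Here is a minimal sketch of that; the key, the messages, and the helper `xor_bytes` are made up for illustration and are not from the challenge:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical key and plaintexts, chosen only for illustration.
key = b"secretkey!"
p1 = b"hello worl"
p2 = b"attack now"

c1 = xor_bytes(p1, key)
c2 = xor_bytes(p2, key)

# The key drops out: c1 ^ c2 equals p1 ^ p2.
key_cancelled = xor_bytes(c1, c2) == xor_bytes(p1, p2)
print(key_cancelled)
```

So anything we can learn about `p1 ^ p2` is learned without knowing the key at all.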

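My first thought was to use frequency analysis, but I decided to try another method.

Consider bits [1:3] (a Python slice of the 8-bit binary string, i.e. the two bits right after the most significant one) of the ASCII characters `a-z`, `A-Z` and `space`: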
```
In [10]: '{:08b}-{:08b}'.format(ord(b'a'), ord(b'z')) # always 11
Out[10]: '01100001-01111010'
```
```
In [6]: '{:08b}'.format(ord(b' ')) # 01
Out[6]: '00100000'
```

```
In [9]: '{:08b}-{:08b}'.format(ord(b'A'), ord(b'Z')) # always 10
Out[9]: '01000001-01011010'
```

So here is the hack. Assume that only `a-zA-Z` and `space` are used in the plaintext, and that almost every character is `a-z`.

If we XOR two ciphertexts and look at bits [1:3]:

- `11` -> we XORed `space` with `A-Z` (certain)
- `00` -> we XORed `a-z` with `a-z` (almost always; `A-Z` with `A-Z` and `space` with `space` also give `00`)
- `10` -> we XORed `space` with `a-z` (certain)
- `01` -> we XORed `a-z` with `A-Z` (certain)
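
The table above can be checked exhaustively. This is a small sketch, not part of my solver; the names `bits_1_3`, `classes` and `xor_patterns` are introduced here only for the check:

```python
import string


def bits_1_3(byte: int) -> str:
    """Return characters [1:3] of the 8-bit binary string of a byte."""
    return '{:08b}'.format(byte)[1:3]


# The three character classes assumed to appear in the plaintext.
classes = {
    'a-z': string.ascii_lowercase.encode(),
    'A-Z': string.ascii_uppercase.encode(),
    'space': b' ',
}


def xor_patterns(left: str, right: str) -> set:
    """All bit patterns seen when XORing any char of `left` with any char of `right`."""
    return {bits_1_3(x ^ y) for x in classes[left] for y in classes[right]}


for left, right in [('space', 'A-Z'), ('a-z', 'a-z'), ('space', 'a-z'), ('a-z', 'A-Z')]:
    print(left, 'xor', right, '->', xor_patterns(left, right))
```

Each pair yields exactly one pattern, so under the assumption the pattern depends only on the character classes involved.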

But is this enough leakage?

Yes. Using the same assumption about the frequency of `a-z`, we can identify space positions.

Now we XOR one ciphertext (`cp0`) with every other one and look at bits [1:3] at each position:
- if `10` occurs more often than `00`, then `cp0` probably has a space at that position
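
A toy version of this vote, as a sketch only: the key and plaintexts below are invented, and `guess_space_positions` is a hypothetical helper that compares the reference against all other ciphertexts (my actual script only compares against the later ones).

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def guess_space_positions(ref: bytes, others: list) -> list:
    """Positions where `10` outvotes `00` when XORing ref with each other ciphertext."""
    positions = []
    for pos in range(len(ref)):
        patterns = ['{:08b}'.format(ref[pos] ^ other[pos])[1:3] for other in others]
        if patterns.count('10') > patterns.count('00'):
            positions.append(pos)
    return positions


# Hypothetical key and equal-length plaintexts, for illustration only.
key = b"flag{dummy}"
plaintexts = [b"the cat sat", b"programming", b"hello world", b"abcdefghijk"]
cts = [xor_bytes(p, key) for p in plaintexts]

# The reference plaintext "the cat sat" has spaces at indices 3 and 7.
print(guess_space_positions(cts[0], cts[1:]))
```

With only a handful of ciphertexts the vote can easily go wrong; the more ciphertexts there are, the more reliable it gets.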

Once we know a space position, we can use a [known-plaintext attack](https://en.wikipedia.org/wiki/Known-plaintext_attack) to recover the key: the key byte at that position is the ciphertext byte XOR `0x20` (the ASCII code of space).
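
As a sketch of this step, with an invented key and plaintext and with the space positions assumed to be already known:

```python
# Hypothetical key and plaintext, for illustration only.
key = b"flag{dummy}"
plaintext = b"the cat sat"
ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))

# Positions where we believe the plaintext holds a space.
space_positions = [3, 7]

# Known plaintext: key byte = ciphertext byte ^ ord(' ').
recovered = {pos: ciphertext[pos] ^ ord(' ') for pos in space_positions}

for pos, k in recovered.items():
    print(pos, chr(k))
```

Every recovered key byte decrypts that position in all the other ciphertexts as well, which is what makes the attack snowball.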

Based on this idea, I wrote a Python script:
```
from Crypto.Util.strxor import strxor  # from the pycryptodome package


def read_ct(path):
    """Read hex-encoded ciphertexts, one per line."""
    res = []
    with open(path, 'r') as f:
        for line in f.readlines():
            res.append(bytes.fromhex(line.strip()))
    return res


if __name__ == '__main__':
    cts = read_ct('emperors_new_crypto_encrypted.txt')
    n = len(cts[0])
    look = range(n)

    # pos_to_line[pos] = index of the ciphertext assumed to hold a space at pos
    pos_to_line: dict[int, int] = {}
    # Manual per-position fixes found by reading the output:
    # XOR of a wrongly decoded character and the intended one.
    correction: dict[int, int] = {i: 0 for i in range(n)}
    correction.update({
        3: ord('b') ^ ord('y'),
        4: ord(',') ^ ord(' '),
        14: ord('|') ^ ord('r'),
        19: ord('c') ^ ord('o'),
        28: ord('i') ^ ord('e'),
        36: ord('v') ^ ord('t'),
        37: ord('d') ^ ord('h'),
        41: ord('v') ^ ord('t'),
        45: ord('~') ^ ord('e'),
        47: ord('~') ^ ord('p'),
        48: ord('`') ^ ord('l'),
        63: ord('j') ^ ord('h')
    })

    for pos in look:
        for i, x in enumerate(cts):
            stat = []

            # Compare ciphertext i with every later ciphertext at this position.
            for y in cts[i+1:]:
                stat.append('{:08b}'.format(x[pos] ^ y[pos]))

            b_00 = len([1 for bins in stat if bins[1:3] == '00'])
            b_10 = len([1 for bins in stat if bins[1:3] == '10'])
            # The first ciphertext where `10` outvotes `00` is taken as the space.
            if b_10 > b_00 and pos not in pos_to_line:
                pos_to_line[pos] = i

    lines = []

    for ct in cts:
        line = []
        for pos, x in enumerate(ct):
            if pos in pos_to_line:
                # ct ^ (ciphertext of the assumed space) ^ ' ' = plaintext char
                char = ord(' ') ^ x ^ cts[pos_to_line[pos]][pos] ^ correction[pos]
            else:
                # No space found at this position: fall back to a placeholder.
                char = ord(' ') ^ correction[pos]
            line.append(char)
        lines.append(bytes(line))
    for line in lines:
        print(line)

    # plaintext ^ ciphertext = key
    print(strxor(lines[0], cts[0]))
```

Note that I needed some manual corrections because my assumption
> only `a-zA-Z` and `space` are used in the plaintext

is not correct.

The key of the XOR cipher was the flag.
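
To illustrate how one such correction can be derived, here is a sketch on a single made-up column. It assumes the "space" picked at that position was really a `,` (which has the same bits [1:3] as a space, `01`), so every decoded character in that column is off by the same constant. The key byte and the column contents are hypothetical:

```python
key_byte = 0x5A        # hypothetical key byte at one position
column = b"ye,t"       # hypothetical true plaintext chars at that position, one per line
ct_column = bytes(c ^ key_byte for c in column)

# The detector wrongly takes line 2 (the ',') as the space.
ref = ct_column[2]
decoded = bytes(ord(' ') ^ c ^ ref for c in ct_column)
print(decoded)  # every char is off by ord(' ') ^ ord(',')

# Reading the output, we guess that the 'u' in line 0 should be 'y'.
delta = decoded[0] ^ ord('y')
fixed = bytes(d ^ delta for d in decoded)
print(fixed)
```

One good guess per position is enough, because the same `delta` repairs that column in every line.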