The problem description indicates that we are dealing with a substitution
cipher. Substitution ciphers are vulnerable to frequency analysis: letters in
any language appear in text at different frequencies. So we can guess that the
letter that appears most often in the ciphertext is the substitution for the
most common letter in the English language (that would be "e"). After guessing
a few of the most common letters, we can work out the other substitutions by
looking at the partially deciphered text.

The name of the task is a hint at this approach (hertz being a unit used to
measure frequency).

So I used the following bit of code to look at character frequency in the
ciphertext:

```python
from collections import Counter
from string import ascii_lowercase

with open('ciphertext', 'r') as f:
    ciphertext = f.read()

# Count only lowercase letters; spaces, digits and punctuation are ignored.
charcount = Counter(c for c in ciphertext if c in ascii_lowercase)
total_chars = sum(charcount.values())
for char, count in charcount.items():
    print(f'{char}: {(count*100)/total_chars}% ({count})')
```

It produced this output:

```
l: 2.174877940523746% (49)
r: 7.23479804704838% (163)
o: 6.56901908566356% (148)
w: 2.3524189968930314% (53)
q: 6.1695517088326675% (139)
s: 6.924101198402131% (156)
z: 9.409675987572125% (212)
g: 7.1016422547714155% (160)
v: 5.947625388371061% (134)
e: 13.049267643142477% (294)
y: 6.924101198402131% (156)
k: 2.2192632046160674% (50)
p: 2.929427430093209% (66)
a: 2.4411895250776743% (55)
i: 5.0599201065246335% (114)
u: 1.3759431868619618% (31)
h: 1.908566355969818% (43)
f: 1.020861074123391% (23)
n: 3.195739014647137% (72)
c: 2.796271637816245% (63)
b: 2.174877940523746% (49)
j: 0.1775410563692854% (4)
d: 0.0887705281846427% (2)
x: 0.5770084332001776% (13)
t: 0.13315579227696406% (3)
m: 0.04438526409232135% (1)
```
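The output above is in the order the letters were first encountered, which makes the ranking hard to read. As a small variation (not part of the original script), the counting can be wrapped in a function that sorts by frequency with `Counter.most_common()`. The function name `letter_frequencies` and the sample string are made up for illustration; the real input would be the contents of the `ciphertext` file.

```python
from collections import Counter
from string import ascii_lowercase


def letter_frequencies(text):
    """Return (letter, percentage, count) tuples, most frequent first."""
    counts = Counter(c for c in text if c in ascii_lowercase)
    total = sum(counts.values())
    return [(char, 100 * count / total, count)
            for char, count in counts.most_common()]


# Hypothetical sample text, used here instead of the real ciphertext file.
sample = 'zge ezz'
for char, percent, count in letter_frequencies(sample):
    print(f'{char}: {percent:.2f}% ({count})')
```

With the letters sorted, the top candidates for "e", "t" and so on are the first few lines of the output.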

We can compare that with the [letter frequency](https://en.wikipedia.org/wiki/Letter_frequency)
of the English language. Looking at the output, "e" clearly dominates with a
frequency of about 13%: according to Wikipedia, the letter e has a frequency of
12.702% in the English language. From that we can deduce that the letter e is
in fact not substituted with anything in this cipher.

Let's look at some of the next most common letters:

- "z" has a frequency of about 9.41%: the closest match in English would be "t"
  (9.056%)
- similarly, the frequency of "y" in the ciphertext (about 6.92%) seems to match
  that of "i" in English (6.966%).
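The matching done by eye above can be sketched in a few lines. This is only an illustration of the reasoning, not the script used for the challenge: the `ENGLISH_FREQ` table holds the six most common letters from the Wikipedia page, and `closest_english_letter` is a made-up helper name.

```python
# Reference percentages from the Wikipedia letter frequency table
# (only the six most common English letters are listed here).
ENGLISH_FREQ = {
    'e': 12.702, 't': 9.056, 'a': 8.167,
    'o': 7.507, 'i': 6.966, 'n': 6.749,
}


def closest_english_letter(percent):
    """Return the English letter whose reference frequency is nearest."""
    return min(ENGLISH_FREQ,
               key=lambda letter: abs(ENGLISH_FREQ[letter] - percent))


# Observed percentages, rounded from the frequency output above.
for cipher_letter, percent in [('e', 13.05), ('z', 9.41), ('y', 6.92)]:
    print(cipher_letter, '->', closest_english_letter(percent))
```

These are guesses rather than proofs: in this ciphertext "s" has exactly the same count as "y", so frequency alone cannot tell the two apart, and the partially decrypted text has to settle it.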

Next, I built the following script :

```python
from colorama import Fore

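with open('ciphertext', 'r') as f:
    ciphertext = f.read()

# Guessed substitutions so far: ciphertext letter -> plaintext letter.
sub = {
    'e': 'e',
    'z': 't',
    'y': 'i',
}
text = ''
for char in ciphertext:
    if char in sub:
        # Highlight letters that have already been decrypted in green.
        char = f'{Fore.GREEN}{sub[char]}{Fore.RESET}'
    text += char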

print(text)
```

It simply prints the text, performing the guessed substitutions and printing the
decrypted letters in green (it uses a third-party library, `colorama`, for that).
By just looking at the partially decrypted text, we can guess more letters.

I simply extended the `sub` dictionary with letters progressively, each newly
guessed substitution helping to guess the next one, until the flag appeared in
the partially decrypted text: `substitution_ciphers_are_solvable_uyhyldalrg`.
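When extending the dictionary by hand, it can help to see which ciphertext letters still have no guess and to view the text without terminal colors. The sketch below is an alternative to the colored output, not what was used for the challenge; both function names are hypothetical, and masking unknown letters with `.` is just one possible convention.

```python
from collections import Counter
from string import ascii_lowercase


def unmapped_letters(ciphertext, sub):
    """List ciphertext letters with no guess yet, most frequent first."""
    counts = Counter(c for c in ciphertext if c in ascii_lowercase)
    return [c for c, _ in counts.most_common() if c not in sub]


def partial_decrypt(ciphertext, sub):
    """Apply the guessed substitutions, masking unknown letters with '.'."""
    return ''.join(
        sub.get(c, '.') if c in ascii_lowercase else c
        for c in ciphertext
    )


# Hypothetical sample text and guesses, for illustration only.
sample = 'zge ezz'
guesses = {'e': 'e', 'z': 't'}
print(partial_decrypt(sample, guesses))
print(unmapped_letters(sample, guesses))
```

Non-letter characters such as spaces and underscores pass through unchanged, so word boundaries stay visible while guessing.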

Original writeup (http://blog.iodbh.net/picoctf2018-crypto-hertz.html).