Tags: huffman zlib bruteforce
Rating: 5.0
Connecting with Netcat, we are welcomed by a prompt asking us a seed and a string. It then returns the name of few compression libraries alongside a number.
We can then provide a new seed and string and the program loops on that.
Please send: seed:string
I'll then show you the compression benchmark results!
Note: Flag has format DrgnS{[A-Z]+}
1:A // This is our input
none 26
zlib 34
bzip2 63
lzma 84
? We learn that the flag only contains uppercase characters inside the brackets. ?
We are provided a server.py file, a copy of it is available at the end of this WU, under Attachments.
The interesting part is under handle_connection
:
seed, string = line.split(b':', 1)
flag = bytearray(FLAG)
random.seed(int(seed))
random.shuffle(flag)
test_string = string + bytes(flag)
response = []
for cfunc in COMPRESSION_FUNCS:
res = cfunc(test_string)
response.append(f"{cfunc.__name__:>8} {res:>4}")
Every time we provide a seed
and a string
, it takes the FLAG
, shuffles it based on our seed
and prepend the shuffled flag by our string
.
For example, considering FLAG = DrgnS{AAAAAAAAAAAAA}
, seed = 1
, string = a
, then test_string = aA{A}ADArAAAAAAAnAgAS
, and the result of the benchmark is:
none 21
zlib 24
bzip2 56
lzma 80
def none(v):
return len(v)
def zlib(v):
return len(codecs.encode(v, "zlib"))
def bzip2(v):
return len(codecs.encode(v, "bz2"))
def lzma(v):
return len(lz.compress(v))
? Those always correspond to the number of bytes after compression of the shuffled and prepended flag. ?
First thing first, the "none" compression is... not compressing. So it returns the size of the flag + the size of the string we prepend.
Remind our discovery, we now know ? the flag is 25 characters long ?, including 7 characters of the flag format (i.e. "DrgnS{}"), so 18 characters to find!
Please send: seed:string
I'll then show you the compression benchmark results!
Note: Flag has format DrgnS{[A-Z]+}
1:A // This is our input
none 26
zlib 34
bzip2 63
lzma 84
Not sure it is the intended way to go forward. We focused on zlib results.
What's great with zlib (and also bzip2) is that they use Huffman Coding and we remembered how it worked overall.
It basically split its input into tokens and counts how many times they appear.
Tokens are then be converted to bits in a way to be as efficient as possible. Those that appear the most will use as few bits as possible.
Our vector of attack will be to have long tokens that take very few bits.
At that point, we tried to validate our idea. For that we updated server.py to add debug info:
print("Initial flag: " + str(FLAG))
flag = bytearray(FLAG)
random.seed(int(seed))
random.shuffle(flag)
test_string = string + bytes(flag)
print("Random + prefixed: " + str(test_string))
We then started it with a custom flag: DrgnS{ABCEFGHIJKLMNOPQRT}
. Our server says that with seed = 1
, the shuffled flag is FP}QT{RILEGMDBrAHOJKnCgNS
.
1:YZ
none 27
zlib 35
bzip2 69
lzma 84
1:FP
none 27
zlib 35
bzip2 66
lzma 84
1:FP}QT{RI
none 33
zlib 35
bzip2 72
lzma 92
1:abcdefhi
none 33
zlib 41
bzip2 76
lzma 92
What we see is that with few letters there is no difference. That's kind of expected, Huffman Coding tokenizes bits, not characters.
However with longer strings, we can see that having a good prefix helps zlib stay at the same value, while unusual prefix will make it grow.
In order for the main vector of attack to work, we actually need to know a substring of the shuffled flag.
This can be done by combining two piece of information.
1/ We actually know the flag format is DrgnS{??????????????????}
, so we know 7 characters: 'D', 'r', 'g', 'n', 'S', '{', '}'
.
2/ Flag shuffling is fully deterministic and only depends on the seed we provide.
We created a script seed_unshuffle.py (cf Attachment) to find seeds for which the shuffled flag starts with DrgnS
.
Our attack is done on the shuffled flag. So in order to find the flag, we actually need to unshuffle it.
This part is a matter of keeping track of how to swap characters. Using a fake flag where all characters are unique (e.g. DrgnS{ABCEFGHIJKLMNOPQRT}
), we can see our letters are swapped.
We did it manually to win time in solving this challenge.
Cf attachment "exploit.py (V1)".
Using seed 2430447
, our local flag is DrgnS{ABCEFGHIJKLMNOPQRT} the shuffled flag is DrgnSHJFCNTLM{BEPQ}OAIGKR
which starts with "DrgnS".
In this first brute force, we take the flag prefix ("DrgnS") and add one character among uppercase characters, {
and }
and check the zlib size. The character providing the lowest zlib size is then kept and we try the next one.
Result? => ❌ DrgnSHJFCNTAAAAAAAAAAAAAA
❌
The beginning is okay... until it only chooses A.
Checking manually what happened, after DrgnSHJFCNT
, whatever the character being added, zlib always returns the same size.
We don't remember exactly how tokenization works in Huffman Coding, and zlib also uses LZ77 which we don't know.
We thought of learning more about those. Then thought that anyway we are in Python here so we can just read the code and went to gzip source code. And then we decided to be lazy and update our current brute force algorithm to not determine one character at a time, but two!
This exploit works!! ❌ Or so we thought ❌
Locally it works great, but remember our flag? There is no character being duplicated. The actual flag on the server contains duplication.
With this seed, there is quickly 3 I
. Checking manually, those seems to be normal. But afterwards the exploit gets completely broken because such consecutive characters impacts the tokenisation of the Huffman Coding algorithm.
At that point we still kept track of what we learnt about the flag. We manually reversed the shuffle manually.
Here is the information we gathered with this test:
# DrgnS{??????????????????} -- Initial step, no brute force done
# DrgnS{??????A???????????} -- First letter brute forced
# DrgnS{??????A?R?????????} -- Second letter brute forced
# DrgnS{????I?A?R?????????} -- ...
# DrgnS{????I?A?R?????????} -- ...
# DrgnS{??I?I?A?R???I?????} -- Last letter brute forced, after this one we can't do anything
#2430447:DrgnSARIII
What do you do when brute-forcing is not working? Try again.
The next seed for which the shuffled flag starts with "DrgnS" is 3723664, and it actually starts with "DrgnS}", so that's even better, we know a longer prefix than expected.
We still have a similar issue, but we only miss 3 characters this time! We are also comforted with our brute force since found letters are the same as with our first try.
# DrgnS{??????????????????}
# DrgnS{????I?????????????}
# DrgnS{????IS????????????}
# DrgnS{????IS?????E??????}
# DrgnS{????IS?????E?????S}
# DrgnS{????IS????ME?????S}
# DrgnS{T???IS????ME?????S}
# DrgnS{T???ISA???ME?????S}
# DrgnS{TH??ISA???ME?????S}
# DrgnS{TH??ISA???MEI????S}
# DrgnS{TH??ISA???MEIG???S}
# DrgnS{TH??ISA???MEIG??SS}
# DrgnS{TH??ISA?R?MEIG??SS}
# DrgnS{TH??ISACR?MEIG??SS}
# DrgnS{TH??ISACR?MEIGU?SS}
# DrgnS{THI?ISACR?MEIGU?SS}
#3723664:DrgnS}ISESMTAHIGSRCUISESM
What do you do when brute-forcing is not working? Stop brute-forcing and guess. It often works when trying to brute force passwords so that seems legit here ?
The flag looks like a sentence and the first one that came to our mind is "This is a crime I guess". => ? DrgnS{THISISACRIMEIGUESS} ?
I found the challenge super fun. It reminded me of a course on compression I took 7 years ago at school. Time flies!
It also gave me vibes of "you can see that in real life" somehow. The most probable one would be in some Blind SQL Injection.
Maybe another would be related to HTTP calls that would store user input data, suffix it with some metadata, and returns a compressed size in headers. This kind of attack would allow an attacker to read the metadata. That seems a bit far fetched though.
Thank you Dragon Sector :)
#!/usr/bin/env python3
import threading
import socket
import random
import codecs
import lzma as lz
with open("flag.txt", "rb") as f:
FLAG = f.read().strip()
def none(v):
return len(v)
def zlib(v):
return len(codecs.encode(v, "zlib"))
def bzip2(v):
return len(codecs.encode(v, "bz2"))
def lzma(v):
return len(lz.compress(v))
COMPRESSION_FUNCS = [
none,
zlib,
bzip2,
lzma
]
def handle_connection(s, addr):
s.sendall(
("Please send: seed:string\\n\n"
"I'll then show you the compression benchmark results!\n"
"Note: Flag has format DrgnS{[A-Z]+}\n").encode())
data = b''
while True:
idx = data.find(b'\n')
if idx == -1:
if len(data) > 128:
s.shutdown(socket.SHUT_RDWR)
s.close()
return
d = s.recv(1024)
if not d:
s.close()
return
data += d
continue
line = data[:idx]
data = data[idx+1:]
seed, string = line.split(b':', 1)
flag = bytearray(FLAG)
random.seed(int(seed))
random.shuffle(flag)
test_string = string + bytes(flag)
response = []
for cfunc in COMPRESSION_FUNCS:
res = cfunc(test_string)
response.append(f"{cfunc.__name__:>8} {res:>4}")
response.append('')
response.append('')
s.sendall('\n'.join(response).encode())
s.shutdown(socket.SHUT_RDWR)
s.close()
def main():
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('0.0.0.0', 1337))
s.listen(256)
while True:
conn, addr = s.accept()
print(f"Connection from: {addr}")
th = threading.Thread(
target=handle_connection,
args=(conn, addr),
daemon=True
)
th.start()
if __name__ == "__main__":
main()
#!/usr/bin/env python3
import random
with open("flag.txt", "rb") as f:
FLAG = f.read().strip()
def main():
print("Initial flag: " + str(FLAG))
found = 0
seed = 2430448
while (found == 0):
flag = bytearray(FLAG)
random.seed(int(seed))
random.shuffle(flag)
test_string = bytes(flag)
if(str(test_string).startswith("b'DrgnS")):
print("Seed found: " + str(seed))
found = 1
if(seed % 100000 == 0):
print(seed)
seed += 1
if __name__ == "__main__":
main()
import re
import requests
import socket
import string
from pwn import *
flag = "DrgnS"
seed = "3723664"
#nc = remote('compresstheflag.hackable.software', 1337)
nc = remote('127.0.0.1', 1337)
nc.recvuntil(b"Note: Flag has format DrgnS{[A-Z]+}\n")
def send_flag(payload):
nc.write(bytes(payload, "utf-8") + b"\n")
nc.recvuntil(b"none")
nc.recvline()
recv = nc.recvline()
nc.recvuntil(b"lzma")
nc.recvline()
numbers = [int(word) for word in recv.split() if word.isdigit()]
return int(numbers[0])
while len(flag) < 25:
min_size = 100000
best_payload = ""
for i in string.ascii_uppercase + "{}":
payload = flag + i
print(payload)
size = send_flag(seed + ":" + payload)
if size < min_size:
min_size = size
best_payload = payload
flag = best_payload