Tags: vm reversing 

Rating:

# vmwhere1 (rev)
Writeup by: [xlr8or](https://ctftime.org/team/235001)

In this challenge we get an ELF binary `chal` and a data blob `program`.

The ELF, as the name name hints, is a virutal machine of sorts and it is going to execute the `program`. It is a stripped binary, however we can get to the `main` function in ghidra, by locating the `entry` function. The `entry` function will call `__libc_start_main` and it will pass the pointer to the `main` function as the first argument.

In the `main` function we see that some command line parameters are read. The first and only parameters that is provided is the path to the blob that contains the instructions to execute. Here is the `main` function (with some renaming):

```c
undefined8 main(int param_1,char **param_2)

{
undefined8 uVar1;
long in_FS_OFFSET;
undefined4 length;
int local_1c;
long prog_bytes;
long local_10;

local_10 = *(long *)(in_FS_OFFSET + 0x28);
if (param_1 < 2) {
printf("Usage: %s <program>\n",*param_2);
uVar1 = 1;
}
else {
prog_bytes = read_file_bytes(param_2[1],&length);
if (prog_bytes == 0) {
printf("Failed to read program %s\n",param_2[1]);
uVar1 = 2;
}
else {
local_1c = exec_prog(prog_bytes,length);
if (local_1c == 0) {
uVar1 = 0;
}
else {
uVar1 = 3;
}
}
}
if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
/* WARNING: Subroutine does not return */
__stack_chk_fail();
}
return uVar1;
}
```

We are mostly interested in the `exec_prog` function, since that is the one that will execute the instructions of `program`.

In this function, first some variables are initialized.
* `pbVar4` allocates 0x1000 bytes, that will be used as a stack for the vm. It will always point to the start of the vm stack.
* `local_20` point to the current byte in the program, this can be thought of as an instruction pointer
* `local_18` this is going to be used as the stack pointer. Important to note here is that the vm handles the stack pointer the opposite way x86 would. Whenever more stack space is needed, the pointer is incremented, and whenever stack space is freed the pointer is decremented. Another difference is that here, the stack pointer doesn't point to the last pushed element, but one cell after that, as opposed to x86, where the stack pointer points to the just pushed element.
* `next_inst` this is set while executing instructions already. My naming is a bit misleading, as it really doesn't point to the next instruction, but whatever the next byte after the current instruction pointer is.

In the main loop a big `switch` statement exists, to decode and execute instructions. The loop additionally ensures that there's no stack over/underflow.
Let's look at some of the instructions to get a better understanding of the VM:
```c
case 1:
local_18[-2] = local_18[-2] + local_18[-1];
local_18 = local_18 + -1;
local_20 = next_inst;
break;
```

This instruction will first add the top 2 elements on the stack, and place the result before the current top element. After this the stack pointer is decremented (remember, here this *frees* stack space). The instruction pointer is incremented by one and we continue.
In essence, this operations pops 2 elements from the stack, adds them, and pushes the result on the stack.

Now let's look at a more complex instruction

```c
case 0xc:
if (local_18[-1] == 0) {
next_inst = next_inst + CONCAT11(*next_inst,local_20[2]);
}
local_20 = next_inst;
local_20 = local_20 + 2;
break;
```

This is a conditional jump instruction, which jumps, if the value on the top of the stack is zero. If the condition holds, `next_inst` is incremented by the offset the jump specifies. We see that this offset is the next 2 bytes after the opcode. For example `0x0c 0x00 0x02` would be a jump with offset 2. This offset is relative to the start of the instruction **after** the jump.

Now that we have a better feeling for the VM, let's write a simple disassembler for the instruction set. In fact only some of the instruction defined by the VM are used, so no need to support the full set of instructions.

```python
bts = open('./program', 'rb').read()

def aschar(num):
try:
return chr(num)
except:
return num

ptr = 0
while ptr < len(bts):
cur = bts[ptr]
if cur == 0xA:
print(f'push {hex(bts[ptr+1])} - "{aschar(bts[ptr+1])}"')
ptr += 2
elif cur == 0x00:
print('exit 0')
ptr += 1
elif cur == 0x05:
print('TOS[-2] = TOS[-2] ^ TOS[-1]; TOS--')
ptr += 1
elif cur == 0x07:
print('TOS[-2] = TOS[-2] >> TOS[-1]; TOS--')
ptr += 1
elif cur == 0x08:
print('TOS++; get(TOS[-1])')
ptr += 1
elif cur == 0x09:
print('put(TOS[-1]); TOS--')
ptr += 1
elif cur == 0xF:
print('dup')
ptr += 1
elif cur == 0xC:
jump_val = ctypes.c_int16(int.from_bytes(bts[ptr+1:ptr+3], 'big'))
print(f'if TOS[-1] == 0: ip += {jump_val.value}')
ptr += 3
elif cur == 0xD:
jump_val = ctypes.c_int16(int.from_bytes(bts[ptr+1:ptr+3], 'big'))
print(f'ip += {jump_val.value}')
ptr += 3
elif cur == 0x0E:
print('TOS--')
ptr += 1
else:
print('Stopping at', hex(ptr))
break
```

There are some instructions which were not covered in detail above, however I think all instructions are understandable from the decompiled C code of the VM, using the examples that are explained above.

In the disassembly `get` will refer to getting a character from the user input, while `put` will refer to priting a character to stdout.

Now we can disassemble the `program` and understand what is happening under the hood.

Let's look at an example first. When the program starts *Welcome to VMWhere1!* is printed. Let's see how that looks in the disassembly:
```
push 0x0 - ""
push 0xa - ""
push 0x21 - "!"
push 0x31 - "1"
push 0x20 - " "
push 0x65 - "e"
push 0x72 - "r"
push 0x65 - "e"
push 0x68 - "h"
push 0x57 - "W"
push 0x4d - "M"
push 0x56 - "V"
push 0x20 - " "
push 0x6f - "o"
push 0x74 - "t"
push 0x20 - " "
push 0x65 - "e"
push 0x6d - "m"
push 0x6f - "o"
push 0x63 - "c"
push 0x6c - "l"
push 0x65 - "e"
push 0x57 - "W"
if TOS[-1] == 0: ip += 4
put(TOS[-1]); TOS--
ip += -7
```

First all the characters are pushed to the stack in reverse. Then we check if the current character is a null byte, and if it is we exit the loop. Otherwise the current character on top of the stack is first printed, and then popped. Finally we jump back to the condition that checks for the null byte again.

Now let's turn our attention to the important bit. `get` is found 57 times in the disassembly, each of these instruction will read a char from the user. But upon testing the application, entering even a single character could cause the flag check to fail. This means that the check happens as soon as the given character arrives. This is confirmed, by the disassembly, since we see 57 similar blocks that execute some computation on the user input.
Let's see the first one:
```
push 0x0 - ""
TOS++; get(TOS[-1])
dup
push 0x4 - ""
TOS[-2] = TOS[-2] >> TOS[-1]; TOS--
TOS[-2] = TOS[-2] ^ TOS[-1]; TOS--
TOS[-2] = TOS[-2] ^ TOS[-1]; TOS--
dup
push 0x72 - "r"
TOS[-2] = TOS[-2] ^ TOS[-1]; TOS--
if TOS[-1] == 0: ip += 3
ip += 1037
TOS--
```
Now let's see what happens on the stack instruction by instruction (left most stack item will be the Top of stack for me)
1. We push 0x00 on the stack; `stack: 0x00`
2. We put the user input on the stack; `stack: <flag_0> 0x00`
3. We duplicate the user input on the stack; `stack: <flag_0> <flag_0> 0x00`
4. We push 0x04 on the stack; `stack: 0x04 <flag_0> <flag_0> 0x00`
5. We perform a right shift; `stack: (<flag_0> >> 0x04) <flag_0> 0x00`
6. We perform an xor; `stack: ((<flag_0> >> 0x04) ^ <flag_0>) 0x00`
7. We perform another xor; `stack: (((<flag_0> >> 0x04) ^ <flag_0>) ^ 0x00)`
8. The current value is duplicated on the stack (before doing this there's only one element right now on the stack let's name that element `<res_0>` for clarity); `stack: <res_0> <res_0>`
9. We push 0x72 on the stack; `stack: 0x72 <res_0> <res_0>`
10. We perform an xor; `stack: (0x72 ^ <res_0>) <res_0>`
11. If the result of the previous xor is 0, we skip the far jump and pop the operation's result; `stack: <res_0>`
12. If the result of the previous xor is non-zero, we take the far jump. The far jump will print the flag check failed message, therefore we should avoid it.

So this is done for a character of the flag. Notice how `<res_0>` is left on the stack. The subsequent blocks will exclude the `push 0x00` from the top, and instead use `<res_0>` that is already on the stack. They will also update this value, just like the first block updates the initial 0x00 that was pushed.

Additionally each block pushes a different character instead of `0x72` in the first block. The xor followed by a zero check, will check if those elements are equal. Therefore we are ready to write a simple bruteforce for the flag. Each character can be in the printable ascii range. We will execute the computation as explained above, and then bruteforce the current user input character, until it matches that value that is pushed by the block. A simple python script was made to do this job:
```python
def brute(val, feedback=0x00):
vals = []
for i in range(0x20, 0x7f):
if feedback ^ i ^ (i >> 4) ^ val == 0:
print('\t', i, chr(i))
vals.append(i)

if len(vals) != 1: raise Exception("Now what")
return vals[0]

values = [0x72, 0x1d, 0x6f, 0xa, 0x79, 0x19, 0x65, 0x2, 0x77, 0x47, 0x1d, 0x63, 0x50, 0x22, 0x78, 0x4f, 0x15, 0x60, 0x50, 0x37, 0x5d, 0x7, 0x76, 0x1d, 0x47, 0x37, 0x59, 0x69, 0x1c, 0x2c, 0x76, 0x5c, 0x3d, 0x4a, 0x39, 0x63, 0x2, 0x32, 0x5a, 0x6a, 0x1f, 0x28, 0x5b, 0x6b, 0x9, 0x53, 0x20, 0x4e, 0x7c, 0x8, 0x52, 0x32, 0x0, 0x37, 0x56, 0x7d, 0x7]

fb = 0
ans = ''
for v in values:
subres = brute(v, fb)
fb = fb ^ (subres ^ (subres >> 4))
ans += chr(subres)

print('flag:', ans)
```

* `brute` is going to give us the correct character at the current position of the flag, based on the target value, and the value that was left on top of the stack (0x00 initially, and then `<res_0>` ...)
* `values` is the array of values that are pushed to the stack by each block, that will be checked against the value of the computation.

The loop will go over all the blocks, and collect the correct characters for the flag. It will also keep track of the value that is left on the top of the stack after each computation, so that it can feed it to the next computation.

Using this script, we can recover the flag:
```
➜ vmwhere1 python solve.py
117 u
105 i
117 u
99 c
116 t
102 f
123 {
97 a
114 r
51 3
95 _
121 y
48 0
117 u
95 _
52 4
95 _
114 r
51 3
97 a
108 l
95 _
118 v
109 m
95 _
119 w
104 h
51 3
114 r
51 3
95 _
40 (
103 g
112 p
116 t
95 _
103 g
51 3
110 n
51 3
114 r
52 4
116 t
51 3
100 d
95 _
116 t
104 h
49 1
115 s
95 _
102 f
49 1
52 4
103 g
41 )
125 }
flag: uiuctf{ar3_y0u_4_r3al_vm_wh3r3_(gpt_g3n3r4t3d_th1s_f14g)}
```