Tags: printf rev 

Rating: 5.0

# Rev: Sprint

## Task Description

Sprint faster than this binary!

## First Look

We are provided one ELF, namely `sprint`. Opening this in Ghidra reveals a `main` function as follows:

```
undefined8 main(void)

{
undefined8 uVar1;
ushort *local_98;
ushort *local_90;
undefined8 local_88;
undefined8 local_80;
undefined8 local_78;
undefined8 local_70;
undefined8 local_68;
undefined8 local_60;
undefined8 local_58;
undefined8 local_50 [2];
ushort *local_40;

local_40 = (ushort *)mmap((void *)0x4000000,0x4000000,3,0x22,-1,0);
memcpy(local_40,M,0xf134);
local_88 = 0;
local_80 = 0;
local_78 = 0;
local_70 = 0;
local_68 = 0;
local_60 = 0;
local_58 = 0;
local_50[0] = 0;
local_98 = local_40;
local_90 = local_40;
puts("Input password:");
uVar1 = 0x101270;
__isoc99_scanf("%255s",local_40 + 0x7000);
while (local_98 != local_40 + 0x7fff) {
sprintf((char *)0x6000000,(char *)local_98,&DAT_0011116a,0,&local_98,0x6000000,(ulong)*local_90,
local_90,&local_90,local_88,&local_88,local_80,&local_80,local_78,&local_78,local_70,
&local_70,local_68,&local_68,local_60,&local_60,local_58,&local_58,local_50[0],local_50,
uVar1);
}
if (local_40[0x7400] != 0) {
printf("Flag: %s\n",local_40 + 0x7400);
}
return 0;
}
```

Upon further investigation, this uses `mmap` on the memory region `0x40000000` to `0x80000000`, as well as copying `0xf134` bytes of data to the start of this section. Looking at the data being copied as characters, we see a whole bunch of strings looking very much like C-style `printf` format strings:

```
%1$00038s%3$hn%1$65498s%1$28672s%9$hn
%1$00074s%3$hn%1$65462s%1$*8$s%7$hn
%1$00108s%3$hn%1$65428s%1$1s%6$hn
%1$00149s%3$hn%1$65387s%1$*8$s%1$2s%7$hn
%1$00183s%3$hn%1$65353s%1$1s%6$hn
%1$00218s%3$hn%1$65318s%1$2s%11$hn
%1$00264s%3$hn%1$65272s%1$*10$s%1$*10$s%17$hn
%1$00310s%3$hn%1$65226s%1$28672s%1$*16$s%7$hn
...
```

This makes sense, as we have an `sprintf` function call inside a loop later on. In fact, the program initializes `local_98` to be equal to the start of the mapped region, and provides this as a format string argument to `sprintf`! We can see that it terminates the program when this variable is equal to a certain value, and then does a check before possibly revealing the flag. This leads us into our next topic.

## Control Flow Bending

![bending](https://user-images.githubusercontent.com/11176520/90996567-6db03480-e5bf-11ea-9fc1-24eba788ee2e.png)

While we only learnt this afterwards, this is a well known technique to write arbitrary programs using `printf` format strings (and as a result, `printf` is Turing-complete!). Upon taking a closer look at the format strings, we are able to figure out what they do:

- `%1`, a pointer to null data, is used as an empty string. This is then padded with `$1234s` in order to print a specific amount of characters to the output memory location, `0x6000000`
- `%3` is a pointer to our "instruction pointer", `local_98`, and controls which format string will be executed next
- `$hn` is used to write a half-word (two bytes), representing the amount of characters written so far, to the `sprintf` argument given
- `$*8` or similar is used to reference a `sprintf` argument, and commonly as a parameter for the amount of padding to a `%s` call. This enables the program to perform addition by setting padding equal to one variable and writing it to another
- `%c` is used to write a single character, often one that can either be a null character or not
- `%4`, a pointer to the location where all this is being printed to, is used to form conditionals. The way this works is that `%c` is used to print a character, then a set amount of padding is printed, followed by a null character. Then, all the data is printed out again as a string using `%4$s`. However, when the first character is a null character, no extra data is printed. This is used to form conditional jumps by writing to `%3` afterwards

Armed with all this knowledge, we proceeded to write a decompiler for the format strings given. The script, written in the deep hours of the night and not at all guaranteed to work for any other program, is as follows:

```
import re
ls = open("code").readlines()
f = [l.strip("\n").split("%")[1:] for l in ls]
ln = [len(l.strip("\n"))+1 for l in ls]
ln = ["0x0"] + [hex(sum(ln[:i+1])) for i in range(len(ln)-1)]

def parse(s):
src = re.search(r"(\d+)\$([^A-Za-z]*)(\S+)", s)
return src.groups()

def phex(i):
return "0x" + hex(i)[2:].zfill(4)

f = [[parse(s) for s in l] for l in f]
names = ["", "nadr", "nchr", "iptr", "optr", "@lv90", "lv90", "lr90", "lv88", "lr88", "lv80", "lr80", "lv78", "lr78", "lv70", "lr70", "lv68", "lr68", "lv60", "lr60", "lv58", "lr58", "@lv50", "lv50", "var1"]

for i in range(len(f)):
l = f[i]
wr = 0
s = ln[i][2:].zfill(4) + " | "
regs = ""
end = ""
if l[0][2] == "c":
no = (int(l[1][1]) + int(l[4][1]) + 2) % 65536;
yes = (no + int(l[1][1]) + 1) % 65536;
s += f"cjump {names[int(l[0][0])]} @{phex(yes)} @{phex(no)}"
else:
for e in l:
if e[2] == "s":
if "*" not in e[1]:
wr = (wr + int(e[1]))%65536
else:
regs = (regs if regs == "" else (regs + " + ")) + f"{names[int(e[1][1:-1])]}"
elif e[2] == "hn":
if e[0] == "3":
if (i == len(f)-1 or wr != int(ln[i+1], 16)):
end = f"jump {phex(wr)}"
else:
arr = []
if regs != "":
arr += [regs]
if len(arr) == 0 or wr != 0:
arr += [phex(wr)]
s += f"write {names[int(e[0])]} {' + '.join(arr)}"
else:
s += f"ERROR {e[0]}${e[1]}{e[2]}"
s += end
print(s)
```

This outputs relatively nice assembly-ish code that we can then use to get a better idea of what the program is doing. The output is as follows:

```
0000 | write lr88 0x7000
0026 | write lr90 lv88
004a | write lv90 0x0001
006c | write lr90 lv88 + 0x0002
0095 | write lv90 0x0001
00b7 | write lr80 0x0002
00da | write lr68 lv80 + lv80
0108 | write lr90 lv68 + 0x7000
0136 | write lr70 @lv90
015b | cjump lv70 @0x0324 @0x0180
0180 | write lr78 lv80 + lv80
01ae | write lr90 0xffef
01d4 | write lv90 lv78
01f9 | write lr90 0xfff0
021f | write lr70 @lv90
0244 | cjump lv70 @0x0324 @0x0269
0269 | write lr68 lv78 + lv78
0297 | write lr90 lv68 + 0x7000
02c5 | write lv90 0x0001
02e7 | write lr78 lv78 + lv80
0315 | jump 0x01ae
0324 | write lr80 lv80 + 0x0001
034f | cjump lv80 @0x00da @0x0374
0374 | write lr88 0xe000
039a | write lr80 0x0000
03bd | write lr90 lv88
03e1 | write lr78 @lv90
0406 | cjump lv78 @0x043a @0x042b
042b | jump 0x04a1
043a | write lr80 lv80 + 0xffff
0469 | write lr88 lv88 + 0x0001
0492 | jump 0x03bd
04a1 | write lr68 lv80 + 0x00fe
04d0 | cjump lv68 @0x0504 @0x04f5
04f5 | jump 0x0536
0504 | write lv50 0x0005
0527 | jump 0x13d9
0536 | write lr88 0x0000
0558 | write lr80 0x0000
057b | write lr90 0xf100
05a1 | write lr78 @lv90
05c6 | write lr70 0x0001
05e9 | write lv50 0x0000
060c | write lr90 lv88 + 0xe000
0639 | write lr68 @lv90
065e | cjump lv68 @0x0692 @0x0683
0683 | jump 0x0d97
0692 | write lr88 lv88 + 0x0001
06bb | write lr60 lv68 + 0xff8b
06ea | cjump lv60 @0x0745 @0x070f
070f | write lr68 0xfff0
0736 | jump 0x0945
0745 | write lr60 lv68 + 0xff8e
0774 | cjump lv60 @0x07cb @0x0799
0799 | write lr68 0x0001
07bc | jump 0x0945
07cb | write lr60 lv68 + 0xff9c
07fa | cjump lv60 @0x0852 @0x081f
081f | write lr68 0x0010
0843 | jump 0x0945
0852 | write lr60 lv68 + 0xff94
0881 | cjump lv60 @0x08dc @0x08a6
08a6 | write lr68 0xffff
08cd | jump 0x0945
08dc | write lr70 0x0000
08ff | write lr68 0x0000
0922 | write lv50 0x0001
0945 | write lr78 lv78 + lv68
0973 | write lr90 0xffef
0999 | write lv90 lv78
09be | write lr90 0xfff0
09e4 | write lr68 @lv90
0a09 | cjump lv68 @0x0d65 @0x0a2e
0a2e | write lr90 lv78 + 0xf000
0a5c | write lr68 @lv90
0a81 | write lr90 0xffef
0aa7 | write lv90 lv68
0acc | write lr90 0xfff0
0af2 | write lv90 0x0000
0b14 | write lr90 0xffef
0b3a | write lr68 @lv90
0b5f | write lr68 lv68 + lv68
0b8d | write lr90 lv68 + 0x7000
0bbb | write lr68 @lv90
0be0 | cjump lv68 @0x0d10 @0x0c05
0c05 | write lr68 lv80 + 0x0001
0c30 | write lr90 lv68 + 0xf102
0c5e | write lr68 @lv90
0c83 | write lr68 lv68 + lv78
0cb1 | cjump lv68 @0x0d01 @0x0cd6
0cd6 | write lr80 lv80 + 0x0001
0d01 | jump 0x060c
0d10 | write lr70 0x0000
0d33 | write lv50 0x0002
0d56 | jump 0x060c
0d65 | write lv50 0x0004
0d88 | jump 0xfffe
0d97 | cjump lv70 @0x0dcb @0x0dbc
0dbc | jump 0x13d9
0dcb | write lr68 lv80 + 0xfff7
0dfa | cjump lv68 @0x0e2e @0x0e1f
0e1f | jump 0x0e60
0e2e | write lv50 0x0003
0e51 | jump 0x13d9
0e60 | write lr88 0x0000
0e82 | write lr80 0x0000
0ea5 | write lr78 lv88 + 0xffd9
0ed3 | cjump lv78 @0x0f07 @0x0ef8
0ef8 | jump 0x137b
0f07 | write lr70 0x0004
0f2a | write lr78 0x0000
0f4d | write lr78 lv78 + lv78
0f7b | write lr78 lv78 + lv78
0fa9 | write lr90 lv80 + 0xe000
0fd7 | write lr68 @lv90
0ffc | write lr60 lv68 + 0xff8b
102b | cjump lv60 @0x105f @0x1050
1050 | jump 0x1218
105f | write lr60 lv68 + 0xff8e
108e | cjump lv60 @0x10ed @0x10b3
10b3 | write lr78 lv78 + 0x0001
10de | jump 0x1218
10ed | write lr60 lv68 + 0xff9c
111c | cjump lv60 @0x117b @0x1141
1141 | write lr78 lv78 + 0x0002
116c | jump 0x1218
117b | write lr60 lv68 + 0xff94
11aa | cjump lv60 @0x1209 @0x11cf
11cf | write lr78 lv78 + 0x0003
11fa | jump 0x1218
1209 | jump 0x13d9
1218 | write lr80 lv80 + 0x0001
1243 | write lr70 lv70 + 0xffff
1272 | cjump lv70 @0x0f4d @0x1297
1297 | write lr90 lv88 + 0xf10c
12c4 | write lr70 @lv90
12e9 | write lr90 lv88 + 0xe800
1316 | write lv90 lv70 + lv78
1343 | write lr88 lv88 + 0x0001
136c | jump 0x0ea5
137b | write lr90 lv88 + 0xe800
13a8 | write lv90 0x0000
13ca | jump 0xfffe
13d9 | write lr90 0xe800
13ff | write lv90 0x0000
1421 | jump 0xfffe
```
## Program Analysis

Me and my teammate then spent a good amount of time on a collaborative editing website slowly working our way through this program, analysing the control flow and figuring out what it does. Our final Python-esque pseudocode is as follows:

```
arr[0x7000] = 1
arr[0x7002] = 1

for inc in range(2, 256):
if arr[0x7000+2*inc] == 0:
count = 2*inc
while True:
arr[0xffef] = count
if arr[0xfff0] != 0:
break
arr[0x7000+2*count] = 0x0001
count += inc

l88 = 0xe000
l80 = 0x0000

while arr[l88] != 0:
l80 -= 1
l88 += 1

if l80 != -254:
l50 = 0x0005
exit3()

l88 = 0
l80 = 0
l78 = arr[0xf100]
l70 = 0x0001
l50 = 0x0000

while arr[l88 + 0xe000] != 0:
l68 = arr[l88 + 0xe000]
l88 += 1

if l68 == 117: # u
l68 = -0x10
elif l68 == 114: # r
l68 = 0x01
elif l68 == 100: # d
l68 = 0x10
elif l68 == 108: # l
l68 = -0x01
else:
l70 = 0
l68 = 0
l50 = 1

l78 += l68
arr[0xffef] = l78
if arr[0xfff0] != 0:
exit1()

arr[0xffef] = arr[l78 + 0xf000]
arr[0xfff0] = 0
l68 = 2*arr[0xffef]
if arr[0x7000 + l68] != 0:
l70 = 0
l50 = 2
elif arr[l80 + 0xf103] + l78 == 0:
l80 += 1

def exit1():
l50 = 4
sys.exit()

if l70 == 0:
exit3()

if l80 + 0xfff7 == 0:
l50 = 3
exit3()

l88 = 0
l80 = 0

while l88 != 39:
l78 = 0

for i in range(4):
l78 *= 4
l68 = arr[l80 + 0xe000]

if l68 == 117: # u = 0
l78 += 0
elif l68 == 114: # r = 1
l78 += 1
elif l68 == 100: # d = 2
l78 += 2
elif l68 == 108: # l = 3
l78 += 3
else:
exit3()
l80 += 1

arr[l88 + 0xe800] = arr[l88 + 0xf10c] + l78
l88 += 1

arr[0xe800+l88] = 0
sys.exit()

def exit3():
arr[0xe800] = 0
sys.exit()
```

In short, this does the following:

- Sets up a prime sieve where `0x0001` refers to composite numbers and `0x0000` as primes, starting at `0x7000` offset from the base address and running up to `0x7200`. This can be thought of as a 16x16 array
- Checks length of input string to be `254`, else terminates with `l50` set to `5`
- Uses input string as directions, where each character has to be `u`, `r`, `l`, or `d`, or the program exits with code `1`
- Starts at coordinates `1, 1` and executes the moves, exiting with code `4` if an index variable overflows
- Looks up the coordinate in a preset "16x16" map at `0xf000` offset, and uses the resulting byte as a coordinate to look up in the abovementioned prime sieve. If this location is not on a prime, the program returns exit code `1`
- If this number is on a prime, a check is performed to see if the coordinate is the next in a series of preset coordinates at `0xf103` offset. If so, a counter is incremented
- This counter is checked to be `9`, else the program terminates with code `3`
- Finally, the input string is read `4` characters at a time and converted to a byte using base-4, and is stored in memory as the flag

## Solution

![maze](https://user-images.githubusercontent.com/11176520/90996624-9df7d300-e5bf-11ea-83a5-17d2ee2694bb.png)

It is clear we need to find paths from the start to the various locations, and this will give us the flag. Writing a simple Depth-First Search suffices, and after a bit of tweaking the code we get out a string of, you guessed it, `254` moves that will visit each location. The program:

```
pmap = "ccb0e77bbcc0ee3afc7381d07a6984e248e3d759116bf1b3860b89c5bf536565f0ef6abf0878c42c99353c6cdce0c899c83bef29970bb38bcc9dfc051b67b5ad15c108d045452643456df4efbb4906ca736bbce9509705e597d3b5472bad258baeaf41e5d814f483e6f0c0980aaca195f5b5d353f097ef9dd43b3b0be717071f6cf11e4492b25707b7368f53c9ea109062df1d07b37153611a2b78bfc1b5c63bea2b4417a084ca8fb73b382fe87384ad44eff8ad8c1fea7fcdc5b349050395a744b59169f8956ce587534e4792be80d0801dadf13de3df3561f1e70d71c5024f205ea28bc461320fa8be7e29d16d2ad955470783ea2b79954f3da311ddc11d89"
junk = "8301af49adc10f8be19effa126143b68606bc734c40a1b6d8cc947766532745fe225723274620ab9816ec617e3c5667d"
prim = "0100010000000000010000000100000001000100010000000100000001000100010000000100000001000100010000000100010001000100010000000100000001000100010001000100000001000100010000000100000001000100010000000100010001000100010000000100010001000100010000000100000001000100010001000100000001000100010000000100000001000100010001000100000001000100010000000100010001000100010000000100010001000100010001000100000001000100010000000100000001000100010000000100000001000100010000000100010001000100010001000100010001000100010001000100000001000100010000000100010001000100010000000100000001000100010001000100010001000100010000000100000001000100010001000100000001000100010001000100000001000100010000000100010001000100010000000100010001000100010000000100000001000100010001000100010001000100010000000100000001000100010000000100000001000100010001000100010001000100010001000100000001000100010001000100010001000100010001000100000001000100010000000100000001000100010000000100010001000100010000000100000001000100010001000100010001000100010000000100010001000100"

pmap = [int(pmap[i:i+2], 16) for i in range(0, len(pmap), 2)]
junk = [int(junk[i:i+2], 16) for i in range(0, len(junk), 2)]
prim = [int(prim[i:i+2], 16) for i in range(0, len(prim), 4)]

real = [[prim[pmap[i*16+j]] for j in range(16)] for i in range(16)]

for i in range(16):
print(" ".join(" " if real[i][j] == 0 else "0" for j in range(16)))

ms = [(-1, 0), (0, 1), (1, 0), (0, -1)]
ls = ["u", "r", "d", "l"]

def dfs(cur, end, pre, st):
for i in range(4):
move = ms[i]
newl = (cur[0]+move[0], cur[1]+move[1])
if newl == end:
return ls[i]
if 0 <= newl[0] <= 15 and 0 <= newl[1] <= 15 and real[newl[0]][newl[1]] == 0 and newl != pre:
trail = dfs(newl, end, cur, st)
if trail != "":
return ls[i]+trail
return ""

s = (1, 1)
out = ""

for i in range(9):
loc = ((256-junk[i])//16, (256-junk[i])%16)
print(loc)
out += dfs(s, loc, (-1, -1), "")
s = loc
print(out)
print(len(out))
```

The moves are:

```
ddrrrrrrddrrrrrrrrddllrruullllllllddddllllllddddrrrrrrrruurrddrrddrrlluulluullddlllllllluuuurrrrrruuuuuulllllldduurrrrrrddddddllllllddddrrrrrruuddlllllluuuuuurruuddllddrrrrrruuuurrrrrruurrllddllllllddddllllllddddrrddllrruulluuuurrrrrruullrruurruuuurrrrrr
```

Before I could even write code to assemble this into base-4 myself, my teammate had already inserted it into the program and gotten the flag, `CTF{n0w_ev3n_pr1n7f_1s_7ur1ng_c0mpl3te}`

Thanks to the organizers for one of the most fun challenges I have attempted so far!

Original writeup (https://gist.github.com/RobotSquid/2d42771ea1243178ee3dcc7d6cc8403e).