Tags: turing_machine make 

Rating:

**Description**

> Solving CTF challenges often involves a lot of work, which is very unfair to lazy competitors. To even things out, we designed a self-solving challenge: Just type `make flag` and make yourself a coffee while the solution is computed.

**Files provided**

- [`Makefile`](https://github.com/EmpireCTF/empirectf/blob/master/writeups/2018-10-05-Hackover-CTF/files/flagmaker)

**Solution**

> Note: this is a post-CTF analysis, I did not quite manage to get to the flag during the competition.

When I first downloaded this, I started `make flag` in the background while looking at other challenges… As it turns out, even if `make` supported arbitrarily big integers (which it doesn't), it would take an insane amount of time to finish. I know my computer isn't the fastest and there were no solves for this challenge for hours, so clearly it wasn't going to be that easy!

My first analysis was just a cursory glance through the functions in the makefile.

```Makefile
r = [-48, 2, -48, -8, -59, -18, 1, -59, 3, -5, -26, -57, 53, 3, -43, -3, -41, -20, 1, -64, -65, -45, -71, -47, -16, -47, -38, -3, 46, -63, -54, 1, -49, 4, -51, -45, -61, -46, -13, -4, -65, -48, -55, -51, -38, -64, -50, -5, -65, 2, -54, -56, -1, -50, -28]

flag:
@echo $(call l, $(o), $(n), $(p), $(q)) | sha256sum | python -c "import sys; print ''.join([chr(ord(c) - d) for c,d in zip(sys.stdin.read(),$(r))])" | tee $@
```

The result of a very complex macro call to `l` was hashed using SHA-256, then the hexadecimal digest was piped into a `python` one-liner, which used the first 55 digits and shifted them using the `r` array.

Here I discovered the first method of solving this challenge. The output of `python` was the flag, which had to be in the `hackover18{...}` format. The input to `python` were lowercase hexadecimal digits, and there are only so many of them. Based on this, some characters of the flag could be exactly pinpointed. I did not actually try this dictionary attack. Maybe I should have since apparently the first team to solve this used this method.

I was quite interested in how the script actually worked. The first step was to create a dependency graph of the functions:

![](https://github.com/EmpireCTF/empirectf/raw/master/writeups/2018-10-05-Hackover-CTF/screens/flagmaker.jpg)

This revealed that the complicated-looking mess of functions actually completely separated into two almost tree-like dependency graphs, and only `l` and `k` had a recursive dependency. With a graph like this it is easier to reason about what each function does and what its inputs might represent.

Before analysing functions properly, I noticed one thing – the `q` variable:

```Makefile
q = ++--+++++ ++-+-+--+ +++---++- ++++-+-++ --+-++-++ --+++++-- -++-+--+- -+++---++ -+--+++-+ -+-++-+++ +-+-++++- +-++++-+- +---++--- +--++-+--
```

`q` is passed to the `l` function as its last, 4th argument in the initial call. When `l` calls `k` to recurse, it gives `q` unmodified to `k` as its last, 5th argument. Likewise, when `k` calls `l` to recurse, it gives back `q` unmodified to `l` as its 4th argument. `q` is indeed a constant, and even though it is used within the functions in other places, the original value is passed back and forth. I renamed `q` to `program`, since at this point I was thinking the script is interpreting `program` as some sort of source code.

Let's first look at the functions referenced in `l`, which is also the initial function called by `make`. Throughout the analysis, we can refer to the GNU Makefile documentation on [text functions](https://www.gnu.org/software/make/manual/html_node/Text-Functions.html#Text-Functions) and [conditional functions](https://www.gnu.org/software/make/manual/html_node/Conditional-Functions.html#Conditional-Functions).

```Makefile
a=$(subst -,- ,$(subst +,+ ,$(1)))
d=$(if $(filter-out -,$(1)),,+)
e=$(call a,$(filter $(subst $(call a) ,,$(strip $(1))$(strip $(2)))%,$(3)))
j=$(word $(words $(1)), $(2))
m=$(word 1,$(1)) $(or $(word 2,$(1)),+) $(or $(word 3,$(1)),-)

# l($1 ?, $2 ?, $3 ?, $4 program)
l=$(if $(call d,$(call m,$(1))),$(words $(filter +,$(3))),$(call k,$(call e, $(call m,$(1)), $(call j,$(2),$(3)),$(4)),$(call m,$(1)),$(2),$(3),$(4)))
```

```Makefile
# a($1 str)
a=$(subst -,- ,$(subst +,+ ,$(1)))
```

`a` takes a string, and separates all `-` and `+` with spaces. E.g. `$(call a, --+-++) == - - + - + +`. Note that in `make`, spaces between the function name and the first argument are ignored, and so are spaces after an argument comma and the argument itself. Spaces at the end of arguments are preserved, however, which is why `a` is not actually a no-op.

Also, at this point it is pretty safe to say that there seems to be no way for `make` to produce strings containing anything other than `+`, `-`, or whitespace.

```Makefile
# e($1 prefix1, $2 prefix2, $3 program)
e=$(call a,$(filter $(subst $(call a) ,,$(strip $(1))$(strip $(2)))%,$(3)))
```

We know that the third argument for `e` is the `program`, since it is only invoked in `l` and it gives `e` its last argument, the `program` constant.

Here it is important to check how the `filter` function of `make` works in specific cases. `$(filter AB%, ABC AABC AB CAB) == ABC AB`. So `e` looks for an exact prefix (created from its first two arguments) in the `program` words, then takes that word and separates its `+` and `-` using `a`.

Note also the `subst` call, where `a` is called without any arguments (resulting in an empty string). But `$(call a) ,` (with a space afterwards) actually results in the string `" "` (space), so the `subst` call here removes all spaces in the prefix, so it can be properly found in `program`.

```Makefile
# d($1 str)
d=$(if $(filter-out -,$(1)),,+)
```

`d` checks if there are any characters other than `-` in its input. If yes, it returns the empty string, otherwise it returns `+`.

We can now extend our assumption about the `+` and `-` characters. Given that there are two kinds of symbols, it is natural to assume some sort of binary encoding. The `l` function itself (as we'll see further down) uses `d` to check if it should recurse or terminate. It will only terminate if `d` returns a non-empty string, i.e. `+`. This will only happen if the input to `d` is all `-`. Let's therefore understand `+` as a binary 1 and `-` as a binary 0.

`d` is then the stop condition checking if its input is equal to 0.

```Makefile
# j($1 pos, $2 data)
j=$(word $(words $(1)), $(2))
```

`j` returns a part of its second argument based on its first argument. `words` in `make` returns the number of words of its argument, `word` returns the n-th (one-indexed) word of its argument.

There are very few situations I can think of where referencing data based on the length of other data makes sense. With our `+` / `-` encoding, the only thing that makes sense is a unary encoding – the first argument is a "number" where `+ + +` represents `3`, `+ + + + + +` represents `6`, etc.

Then the second argument is some sort of array of bits ...?

```Makefile
# m($1 state)
m=$(word 1,$(1)) $(or $(word 2,$(1)),+) $(or $(word 3,$(1)),-)
```

`m` seems to take three bits from its input. If the second bit is not present, `+` is used. And if the third bit is not present `-` is used. The argument for `m` is the first argument given to `l`, which comes from the `o` variable:

```Makefile
o = +
```

Clearly `o` always has at least one bit. When given to `m`, it is changed to `++-`. So it seems that `m` ensures that the default value for its output is `++-`, unless the input has enough data.

```Makefile
# l($1 state, $2 pos, $3 data, $4 program)
l=$(if $(call d,$(call m,$(1))),
$(words $(filter +,$(3))),
$(call k,
$(call e,
$(call m,$(1)),
$(call j,$(2),$(3)),
$(4)
),
$(call m,$(1)),
$(2),
$(3),
$(4)
)
)
```

Finally let's look at `l`. As explained in `d`, it stops recursing when its first argument becomes 0. When it stops, it returns the number of `+` symbols in `data`. Otherwise, it calls `k`.

Note that here we can see that the only possible way for this program to stop is when `d` evaluates to `+` and causes `l` to count `+` symbols in `data`. For a long time I was spending a lot of CPU resources to check a lot of integers, SHA-256 hashing them, and checking if their prefix is `8c3c4df743a` – which would then decode into `hackover18{`. Given how large the actual target number is, this was a very futile effort. But, hindsight's 20/20!

`$(call m,$(1))` appears multiple times, and we already know that it simply produces a default value if needed.

And finally, the `e` subcall resulting in the first argument of `k`. It takes the potentially defaulted `state`, appends a single symbol from `data` based on `pos`, and finds that prefix in the `program`.

Even without knowing `k` it might start to be more and more clear what this program is. It is a [Turing Machine](https://en.wikipedia.org/wiki/Turing_machine) simulator, with 3-bit (up to 8) states, 2 symbols, and `data` is its tape. Given how long it executes, we can also guess that it is a [Busy beaver](https://en.wikipedia.org/wiki/Busy_beaver), i.e. a Turing machine designed to run for as long as possible.

For the sake of completion, however, let's have a look at the functions referenced by `k`.

```Makefile
b=$(or $(if $(filter +,$(strip $(1))),$(2) +),$(if $(filter -,$(strip $(1))),$(wordlist 2,$(words $(2)),$(2))),$(if $(filter N,$(strip $(1))),$(2)))
c=$(if $(word 1,$(1)),$(if $(word $(words $(1)),$(2)),$(2),$(2) -),- $(2))
f=$(wordlist 7,$(words $(1)),$(1))
g=$(call b,$(word 6,$(1)), $(2))
h=$(wordlist 1,$(words $(wordlist 2,$(words $(2)),$(2))),$(3)) $(word 5,$(1)) $(wordlist $(words $(2) -),$(words $(3)),$(3))
i=$(or $(call g,$(1),$(2)),+)

# k($1 state spec, $2 state, $3 pos, $4 data, $5 program)
k=$(call l,$(call f, $(1)),$(call i, $(1), $(3)),$(call c,$(call g, $(1), $(3)),$(call h, $(1), $(3), $(4))),$(5))
```

```Makefile
# b($1 movement, $2 pos)
b=$(or
$(if $(filter +,$(strip $(1))),$(2) +),
$(if $(filter -,$(strip $(1))),$(wordlist 2,$(words $(2)),$(2))),
$(if $(filter N,$(strip $(1))),$(2))
)
```

`b` takes an argument `movement` that will always be either `+` or `-` and the tape position (unary number). If `movement` is `+`, a `+` is appended to the position - increment by 1. If `movement` is `-`, the first word of the position is removed - decrement by 1. There is a check for `movement` `N`, but it seems to be a red herring, since this can never happen.

```Makefile
# c($1 pos, $2 data)
c=$(if $(word 1,$(1)),$(if $(word $(words $(1)),$(2)),$(2),$(2) -),- $(2))
```

`c` is a function to "normalise" the tape. Whenever `pos` becomes 0, it prepends a `-` to `data`, thereby extending it to the left (`pos` itself is then normalised in `i`). And whenever `pos` is larger than the length of `data`, a `+` is appended to `data`. Note that the latter comparison is done with `$(word $(words $(1)),$(2))`, which checks if the `pos`-th word exists in `data`. If neither happens, `data` is left intact and returned as-is.

```Makefile
# f($1 state spec)
f=$(wordlist 7,$(words $(1)),$(1))
```

`f` takes one of the (space-separated) words from `program` and returns its 7th through last words. We'll see soon the significance of this.

```Makefile
# g($1 state spec, $2 pos)
g=$(call b,$(word 6,$(1)), $(2))
```

`g` takes one of the (space-separated) words from `program` and updates the tape position `pos` based on its 6th word.

```Makefile
# h($1 state spec, $2 pos, $3 data)
h=$(wordlist 1,$(words $(wordlist 2,$(words $(2)),$(2))),$(3)) \
$(word 5,$(1)) \
$(wordlist $(words $(2) -),$(words $(3)),$(3))
```

`h` is the function that actually takes care of writing symbols to the tape. It takes `data` and returns it in three chunks - all the `data` words (symbols) from beginning to `pos - 1`, then the 5th word of one of the (space-separated) words from `program`, then all the `data` words (symbols) from `pos + 1` to the end. A Python equivalent would be `data[:pos] ++ [symbol] ++ data[pos + 1:]`.

```Makefile
# i($1 state spec, $2 pos)
i=$(or $(call g,$(1),$(2)),+)
```

`i` just wraps `g`, but if the output of `g` is empty (i.e. `pos == 0`), it returns `+` (i.e. `pos == 1`) instead. This is the other part of the position normalisation. If `pos` becomes zero, the tape is extended to the left (see `c` above), and `pos` is incremented to be one instead.

```Makefile
# k($1 state spec, $2 state, $3 pos, $4 data, $5 program)
k=$(call l,
$(call f, $(1)),
$(call i, $(1), $(3)),
$(call c,
$(call g, $(1), $(3)),
$(call h, $(1), $(3), $(4))
),
$(5)
)
```

And finally `k`. This function always calls `l`, but it updates all of its arguments, except for the `program`, which never changes. Let's finally crack the `program` encoding, based on the functions we have just analysed. `program` contains 14 words, each of which is composed of 9 `+` / `-` symbols.

```
++--+++++
++-+-+--+
+++---++-
++++-+-++
--+-++-++
--+++++--
-++-+--+-
-+++---++
-+--+++-+
-+-++-+++
+-+-++++-
+-++++-+-
+---++---
+--++-+--
```

According to the various ways these symbols are used, we can split these words into columns:

```
++- - + + +++
++- + - + --+
+++ - - - ++-
+++ + - + -++
--+ - + + -++
--+ + + + +--
-++ - + - -+-
-++ + - - -++
-+- - + + +-+
-+- + + - +++
+-+ - + + ++-
+-+ + + + -+-
+-- - + + ---
+-- + + - +--
```

Note that the first column is always repated twice, with `-` then `+` in the second column. The last column contains only strings that can be found in the first column, with the exception of `---` (0, the halting state). The first column is 3 bits, hence it can express 8 different numbers, but only 7 states are present. Again, 0 is the halting state that is not described here.

Based on the analysis, we can describe these columns and use decimal representations for the 3-bit ones:

```
(bits) 123 4 5 6 789
dec cur state cur tape write pos change next state dec
6 ++- - + + +++ 7
6 ++- + - + --+ 1
7 +++ - - - ++- 6
7 +++ + - + -++ 3
1 --+ - + + -++ 3
1 --+ + + + +-- 4
3 -++ - + - -+- 2
3 -++ + - - -++ 3
2 -+- - + + +-+ 5
2 -+- + + - +++ 7
5 +-+ - + + ++- 6
5 +-+ + + + -+- 2
4 +-- - + + --- 0
4 +-- + + - +-- 4
```

Or in a more conventional notation, where the tuples represent (what to write, which way to move, which state to go to):

| State | Current symbol: 0 | Current symbol: 1 |
| --- | --- | --- |
| 6 | (1, →, 7) | (0, →, 1) |
| 7 | (0, ←, 6) | (0, →, 3) |
| 1 | (1, →, 3) | (1, →, 4) |
| 3 | (1, ←, 2) | (0, ←, 3) |
| 2 | (1, →, 5) | (1, ←, 7) |
| 5 | (1, →, 6) | (1, →, 2) |
| 4 | (1, →, 0) | (1, ←, 4) |

The Turing machine works in steps on the tape. At each step, it looks at the symbol at the current position on the tape. Based on the current symbol and its current state, it writes a symbol to the tape, then moves either left or right, and finally switches to another state (may be the same state).

The starting state is `++-` (6), thanks to the `m` function.

We can also see that the only way to get to state `0`, the halting state, is from state `4`. State `4` itself does not modify the tape except for changing a single symbol just before halting, so it is basically superfluous. So in fact, this seems to be a 6-state busy beaver.

Now that we know all about this Turing machine simulator, we have another problem: how to figure out its result. Busy beavers are *designed* to take as long as possible, and the record holder for a 6-state BB produces more than "3.514 x⋅10^{18267}" 1's on the tape. This obviously surpasses the memory capabilities of any computer.

The good news is that we do not need to simulate the machine directly, nor do we need to represent one symbol on the tape as one bit of real memory. Even though the busy beaver machines take unfathomable number of steps before halting, their behaviour is still modeled completely with a simple table, as seen above. So the patterns they produce on the tape are expansive, but not terribly complex. Instead of a billion ones, we can simply use a single symbol that say "repeat 1 a billion times", since the number itself, even if arbitrarily big, can still be represented in far less memory than the data.

This is where I was ~1 hour before the end of the CTF, in the process of writing a simulator for the BB machine. From what I have seen of its behaviour, my more efficient simulator would only need to compress long runs of ones and zeros, and long runs of the alternating `0 1 0 1 ...` pattern. Had I started earlier I might have finished the program, or perhaps had the much more practical realisation that "somebody must have done this".

As it turns out, there was a page on the Internet with an efficient BB simulator, though by chance it went offline shortly before the CTF. After the CTF ended, an admin pointed me to [an archived version of the site](http://web.archive.org/web/20160314012051/http://www.drb.insel.de/~heiner/BB/). This site contains [an awk script to run BB simulations](http://web.archive.org/web/20160305221928/http://www.drb.insel.de/~heiner/BB/bbsimtab.html#How), as well as [some record-holding BB machines](http://web.archive.org/web/20140717002127/http://www.drb.insel.de/~heiner/BB/bb-6list).

As mentioned above, the machine in this challenge has a superfluous state which simply adds 1 to the number of 1's on the tape. Taking the number of ones produced by one of the BB machines in the list (`Name = k`), adding 1 to it, and taking the SHA-256 sum of it:

$ echo "2537699363594175843063 + 1" \
| bc \
| shasum -a 256 \
| python -c "import sys; print ''.join([chr(ord(c) - d) for c,d in zip(sys.stdin.read(),[-48, 2, -48, -8, -59, -18, 1, -59, 3, -5, -26, -57, 53, 3, -43, -3, -41, -20, 1, -64, -65, -45, -71, -47, -16, -47, -38, -3, 46, -63, -54, 1, -49, 4, -51, -45, -61, -46, -13, -4, -65, -48, -55, -51, -38, -64, -50, -5, -65, 2, -54, -56, -1, -50, -28])])"

Very creative challenge, I wish I did not waste some of my hours!

`hackover18{n0_beavers_were_h4rmed_gener4ting_this_fl4g}`

Original writeup (https://github.com/EmpireCTF/empirectf/blob/master/writeups/2018-10-05-Hackover-CTF/README.md#500-reverse--flagmaker).