Tags: reversing compiler 

Rating:

# Writeup:
We are given some documentation and files related to a made up language called "seventeen".
## seventeen.txt:
```
= Overview of seventeen =
** CONFIDENTIAL: Please do not talk about seventeen with non-Evil Robot Corp. employees **

Seventeen is a very simple programming language. It gets its name from the fact
that there are only 17 instructions.

Seventeen is used for internal cryptography applications. The seventeen core runtime
has been proven to be secure (see seventeen.coq). In the future, we plan to expand
this language and build commercial products, so please keep all this confidential.

Here are some of the core concepts:

* Each program starts execution with the first instruction in the file, and keeps
processing instructions sequentially unless an IFZ, IFG, CALL, JMP or EXIT
instruction is encountered.

* There are three storage systems:
- a stack (where data can be pushed)
- a vector (where data can be read or written using an offset)
- variables

The state of all these storage systems is undefined when a program starts.

* Most operations work with the stack. E.g. "1 2 ADD" will:
- push 1 on the stack
- push 2 on the stack
- Replace these two numbers with their sum

The resulting stack will contain 3.

* Variables and labels are in the "a-z_" range ('a' to 'z' with '_').
Make sure to read the formal specification (see seventeen.spec)
= Sample code =
* see primes.17

```
## .bash_history:
```
ls
echo "root" | /usr/bin/seventeen passmgr.17
mysql --user=root --password="8a30347d2a" secure.evil-robot.corp
echo "150" | seventeen primes.17 > primes.out
exit
```
Besides the partial documentation and usage examples, we are given two "executables".

One of them is a program that prints primes, which is used to understand how the language works, and the other one is a program that is hard to reverse, and we need its output on the string "sandflea".

By reading the code and the partial documention, we can infer that most of the operations get arguments from the stack, then push the result into the stack.

The following python code runs files written in the "seventeen" language.

After cleaning the file a bit, it evaluates the custom functions and their line number, adding them to a dictionary (for example: {"loop_start:",5}).

Then, it interprets words in each line by the order they appear. The following words are possible:

* A "seventeen" syscall - meaning, one of the built in functions in the "seventeen" language that we need to emulate. We will execute the syscall.

* A number - self explanatory, just push it.

* A custom function - we will push the function name onto the stack. If needed, a syscall will set our instruction pointer to the line accompanying the function.

* A known variable - a variable which already has a value (via the store syscall).

* An unknown variable - a word that doesn't describe any of the above.

Since some syscalls refer to a variables value, and some to its symbol, we push a tuple ('varname', value) onto the stack, and let the syscalls use the relevant value.

## interpreter.py:
```
import sys,re
import syscalls

debug = 0 #set to 1 to enable. [P?]=Pushing onto stack var of type "?". [S] executing a seventeen syscall.
ip = 0 #instruction pointer (line number)
stack = []
functions = {}
variables = {}
vect = [None] * 4096

def clean(data):
#deleteComments
data = re.sub(r"\/\*(.|\n)*?\*\/", "", data)
#deleteEmptyLines
data = "\n".join([ll.rstrip() for ll in data.splitlines() if ll.strip()])
return data

def parseFunctions(data):
for i,line in enumerate(data.splitlines()):
if re.search(r"[a-z_]*:", line) != None:
functions[re.search(r"[a-z_]*:", line).group(0)[:-1]] = i

def parseFile(f):
data = f.read()
data = clean(data)
parseFunctions(data)
return data

def executeWord(word):
global stack,variables,vect,ip
if word in syscalls.__dict__: #word is a builtin seventeen function
sys.stdout.write(("[S]Syscall: " + word + "\n") * debug)
stack,variables,vect,ip = syscalls.__dict__[word](stack,variables,vect,functions,ip)
return
if word.isdigit(): #word is a number, push it
sys.stdout.write(("[PN]Number: " + word + "\n") * debug)
stack.append(int(word))
return
if word[:-1] in functions: #word is function definition. Skip it.
return
if word in functions: #custom function name
sys.stdout.write(("[PF]Func: " + word + "\n") * debug)
stack.append(word)
return
if word in variables: #var-val tuple
sys.stdout.write(("[PV]var-val: " + str((word,variables[word])) + "\n") * debug)
stack.append((word,variables[word]))
return
else: #unknown variable, probably pushed to store
sys.stdout.write(("[PUV]var: " + word + "\n") * debug)
stack.append(word)
return

def executeLine(line):
global ip
curr_line_num = ip
for word in line.split():
executeWord(word)
if curr_line_num == ip:
ip = curr_line_num+1

def run(data):
global ip
lines = data.splitlines()
line = lines[0]
while True:
executeLine(line)
line = lines[ip]

if len(sys.argv) != 2:
print "Usage: %s [filename]" % sys.argv[0]
sys.exit(-1)

f = open(sys.argv[1],'r')
run(parseFile(f))
```

## syscalls.py:
```
import sys

#pushes a number from stdin onto the stack
def read_num(stack,variables,vect,functions,ip):
stack.append(int(raw_input()))
return (stack,variables,vect,ip)

#pushes a byte (as a number) from stdin onto the stack
def read_byte(stack,variables,vect,functions,ip):
char = sys.stdin.read(1)
if char == "": #ENDOFOUTPUT
stack.append(0)
else:
stack.append(ord(char))
return (stack,variables,vect,ip)

#input:anything
#duplicates the top of the stack
def dup(stack,variables,vect,functions,ip):
stack.append(stack[-1])
return (stack,variables,vect,ip)

#input:(value(int)/(str,int))
#input:(varname(str)/(str,int))
#stores value in varname
def store(stack,variables,vect,functions,ip):
varname = stack.pop()
value = stack.pop()
if (type(varname) == tuple):
varname = varname[0]
if (type(value) == tuple):
value = value[1]
assert(type(varname) == str and type(value) == int)
variables[varname] = value
return (stack,variables,vect,ip)

#input:(value(int)/variable(tuple))
#input:(function)
#input:(function)
#conditional jump (ifzero)
def ifz(stack,variables,vect,functions,ip):
f1 = stack.pop()
f2 = stack.pop()
value = stack.pop()
if (type(value) == tuple): #variable
value = value[1]
assert(f1 in functions and f2 in functions and type(value) == int)
if value != 0:
ip = functions[f1]
else:
ip = functions[f2]
return (stack,variables,vect,ip)

#input:(value(int)/variable(tuple))
#input:(function)
#input:(function)
#conditional jump (ifgreaterthanzero)
def ifg(stack,variables,vect,functions,ip):
f1 = stack.pop()
f2 = stack.pop()
value = stack.pop()
if (type(value) == tuple): #variable
value = value[1]
assert(f1 in functions and f2 in functions and type(value) == int)
if value < 0:
ip = functions[f1]
else:
ip = functions[f2]
return (stack,variables,vect,ip)

#input:(function)
#pushes return address and jumps to the function line number
def call(stack,variables,vect,functions,ip):
f = stack.pop()
assert(f in functions)
stack.append(ip+1)
ip = functions[f]
return (stack,variables,vect,ip)

#input:(int)
#input:(int)
#pushes a1 + a2
def add(stack,variables,vect,functions,ip):
a1 = stack.pop()
a2 = stack.pop()
if (type(a1) == tuple): #variable
a1 = a1[1]
if (type(a2) == tuple): #variable
a2 = a2[1]
assert(type(a1) == int and type(a2) == int)
stack.append(a1+a2)
return (stack,variables,vect,ip)

#input:(int)
#input:(int)
#pushes a2-a1
def sub(stack,variables,vect,functions,ip):
a1 = stack.pop()
a2 = stack.pop()
if (type(a1) == tuple): #variable
a1 = a1[1]
if (type(a2) == tuple): #variable
a2 = a2[1]
assert(type(a1) == int and type(a2) == int)
stack.append(a2-a1)
return (stack,variables,vect,ip)

#input:(int)
#input:(int)
#pushes a2 mod a1
def mod(stack,variables,vect,functions,ip):
a1 = stack.pop()
a2 = stack.pop()
if (type(a1) == tuple): #variable
a1 = a1[1]
if (type(a2) == tuple): #variable
a2 = a2[1]
assert(type(a1) == int and type(a2) == int)
stack.append(a2%a1)
return (stack,variables,vect,ip)

#input:(int)
#input:(int)
#pushes a2 xor a1
def xor(stack,variables,vect,functions,ip):
a1 = stack.pop()
a2 = stack.pop()
if (type(a1) == tuple): #variable
a1 = a1[1]
if (type(a2) == tuple): #variable
a2 = a2[1]
assert(type(a1) == int and type(a2) == int)
stack.append(a2^a1)
return (stack,variables,vect,ip)

#input:(int)
#prints the byte
def print_byte(stack,variables,vect,functions,ip):
b = stack.pop()
assert(type(b) == int)
sys.stdout.write(chr(b))
return (stack,variables,vect,ip)

#input:(int)
#prints the number
def print_num(stack,variables,vect,functions,ip):
n = stack.pop()
if (type(n) == tuple): #variable
n = n[1]
assert(type(n) == int)
sys.stdout.write(str(n))
return (stack,variables,vect,ip)

#input:(int)/(function)
#jumps to line in input
def jump(stack,variables,vect,functions,ip):
addr = stack.pop()
if (type(addr) == tuple): #variable
addr = addr[1]
if (type(addr) == str): #function
addr = functions[addr]
assert(type(addr) == int)
ip = addr
return (stack,variables,vect,ip)

#input:(index(int)/(str,int))
#input:(value(int)/(str,int))
def vstore(stack,variables,vect,functions,ip):
value = stack.pop()
index = stack.pop()
if (type(index) == tuple):
index = index[1]
if (type(value) == tuple):
value = value[1]
assert(type(index) == int and type(value) == int)
vect[index] = value
return (stack,variables,vect,ip)

#input:(index(int)/(str,int))
def vload(stack,variables,vect,functions,ip):
index = stack.pop()
if (type(index) == tuple):
index = index[1]
assert(type(index) == int)
stack.append(vect[index])
return (stack,variables,vect,ip)

def exit(stack,variables,vect,functions,ip):
print "Done!"
sys.exit(0)
```