Tags: interpreter overflow 

Rating:

[Link to original writeup](https://wrecktheline.com/writeups/m0lecon-2021/#m0lang)
# Molang (5 solves, 406 points)
by JaGoTu

```
A friend of mine developed this brand new interpreter, do you want to play with it?

Author: matpro
```

We get a `chall` file:

```
$ file chall
chall: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=15a0d9198a5ce75bb5b39e0197e3c64b71ec137e, not stripped
```

When running it, we get a REPL shell of m0lang:

```
$ ./chall
m0lang 0.1 (May 14 2021, 19:00:00)
Type 'help "<command>"' or 'help <value>' for more information.
m0lang> help
[line 2] Error at end: Expect expression.
m0lang> flag;
8GG8!d8?W88ݫe!88ѫd5ˉ?5G
m0lang>
```

After a while of staring into IDA, I recognized that the interpreter is a modified version of the `clox` interpreter from
the [Crafting interpreters](https://craftinginterpreters.com/) book (source available on [Github](https://github.com/munificent/craftinginterpreters])). I got lucky, as I read the book as part of preparations for my bachelor's thesis, but even if you never heard about it, it is always a good idea to try to google for some error messages in the binary, in case it's a modified open source project. For example here searching for the `Can't read local varaible in its own initializer.` error message would lead you directly to the project, with the `varaible` typo intact, which is a good indicator that it's based on `clox`.

Once I knew that, I started renaming procedures and creating structures in IDA based on the `clox` source code. `clox` works by parsing the raw source, compiling it into bytecode and then executing such bytecode. One of the added features to `molang` were the flag and help features, so these had to be parsed somewhere. A function called `identifierType()` is responsible for parsing reserved words using a trie-like code:

```C
TokenType __fastcall identifierType()
{
TokenType result; // eax
int v1; // eax
int v2; // eax

switch ( *scanner.start )
{
case 'a':
result = checkKeyword(1, 2, &unk_91E4, TOKEN_AND);
break;
case 'c':
result = checkKeyword(1, 4, "lass", TOKEN_CLASS);
break;
case 'e':
result = checkKeyword(1, 3, "lse", TOKEN_ELSE);
break;
case 'f':
if ( scanner.current - scanner.start <= 1 )
goto LABEL_32;
v1 = *((char *)scanner.start + 1);
if ( v1 == 'l' )
{
result = checkKeyword(2, 2, "ag", TOKEN_FLAG);
}
else if ( v1 > 108 )
{
if ( v1 == 'o' )
{
result = checkKeyword(2, 1, "r", TOKEN_FOR);
}
else
{
if ( v1 != 'u' )
goto LABEL_32;
result = checkKeyword(2, 1, "n", TOKEN_FUN);
}
}
else
{
if ( v1 != 'a' )
goto LABEL_32;
result = checkKeyword(2, 3, "lse", TOKEN_FALSE);
}
break;
case 'h':
result = checkKeyword(1, 3, "elp", TOKEN_HELP);
break;
/*
shortened, you get the point
[...]
*/
case 'v':
result = checkKeyword(1, 2, "ar", TOKEN_VAR);
break;
case 'w':
result = checkKeyword(1, 4, "hile", TOKEN_WHILE);
break;
default:
LABEL_32:
result = TOKEN_IDENTIFIER;
break;
}
return result;
}
````

Using the code we can resolve `flag` and `help` to `TOKEN_FLAG = 0x16` and `TOKEN_HELP = 0x27` enemurable values respectively.

Now, where are those tokens compiled? In `statement()`:

```C
void __fastcall statement()
{
if ( (unsigned __int8)match_token(TOKEN_PRINT) )
{
printStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_FOR) )
{
forStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_IF) )
{
ifStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_RETURN) )
{
returnStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_WHILE) )
{
whileStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_HELP) )
{
helpStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_FLAG) )
{
flagStatement();
}
else if ( (unsigned __int8)match_token(TOKEN_LEFT_BRACE) )
{
beginScope();
block();
endScope();
}
else
{
expressionStatement();
}
}
```

Both `flagStatement()` and `helpStatement()` are simple functions:

```C
__int64 flagStatement()
{
consume(TOKEN_SEMICOLON, (__int64)"Expect ';' after command.");
return emitByte(0xEu);
}

__int64 helpStatement()
{
expression();
consume(TOKEN_SEMICOLON, (__int64)"Expect ';' after value.");
return emitByte(0x19u);
}
```

So we are dealing with 2 new instructions, `0xE` for flag and `0x19` for help.
Looking into `run()`, we can locate the handlers for those opcodes:

```C
case 0xEu:
printFlag();
putchar(10);
continue;
/* [...] */
case 0x19u:
v104 = pop();
v0 = *(double *)&v105;
get_help_for_primitive(v104, v105);
putchar(10);
continue;
```

Digging deeper:

```C
void printFlag()
{
do_printflag();
}

void __fastcall do_printflag()
{
decode(qword_20CC40);
}

void __fastcall get_help_for_primitive(int a1, __int64 a2)
{
if ( a1 == 1 )
{
printf("A nil.");
}
else if ( a1 )
{
if ( a1 == 2 )
{
printf("A number.");
}
else if ( a1 == 3 )
{
get_help_complex(3LL, a2);
}
}
else
{
printf("A boolean.");
}
}

void __fastcall get_help_complex(__int64 a1, __int64 a2)
{
char dest[40]; // [rsp+10h] [rbp-30h] BYREF
unsigned __int64 v3; // [rsp+38h] [rbp-8h]

v3 = __readfsqword(0x28u);
switch ( *(_DWORD *)a2 )
{
case 0:
printf("A bound method.");
break;
case 1:
printf("A class.");
break;
case 2:
printf("A closure.");
break;
case 3:
printf("A function.");
break;
case 4:
printf("An instance.");
break;
case 5:
printf("A native function.");
break;
case 6:
strncpy(dest, *(const char **)(a2 + 24), 0x1EuLL);
if ( !strcmp(dest, "and") )
{
decode(qword_20C460);
}
else if ( !strcmp(dest, "class") )
{
decode(qword_20C4D0);
}
else if ( !strcmp(dest, "else") )
{
decode(qword_20C540);
}
/* [...] */
else if ( !strcmp(dest, "while") )
{
decode(qword_20CB60);
}
else if ( !strcmp(dest, "help") )
{
decode(qword_20CBD0);
}
else
{
printf("A string.");
}
break;
case 7:
printf("An upvalue.");
break;
default:
return;
}
}
```

We see that `help` for strings and `flag` both use the `decode()` function to decode a message.
Jumping to XREFs, the used qwords are populated by `refresh()`, which is called at the begginning of `run()` (so for each chunk) - the `refresh()` function is long and uninteresting, just moving a bunch of qword constants. I decided to get the values dynamically with gdb:

```
pwndbg> dump binary memory dump.bin 0x55555560c460 0x55555560cca0
```

Now let's reimplement the `decode()` function in python:
```C
void __fastcall decode(char *a1)
{
unsigned int i; // [rsp+14h] [rbp-7FCh]
int len; // [rsp+18h] [rbp-7F8h]
char mul1; // [rsp+1Ch] [rbp-7F4h]
char add1; // [rsp+20h] [rbp-7F0h]
char add2; // [rsp+24h] [rbp-7ECh]
char add3; // [rsp+28h] [rbp-7E8h]
char add4; // [rsp+2Ch] [rbp-7E4h]
char v8[1008]; // [rsp+30h] [rbp-7E0h] BYREF
char src[1000]; // [rsp+420h] [rbp-3F0h] BYREF
unsigned __int64 v10; // [rsp+808h] [rbp-8h]

v10 = __readfsqword(0x28u);
len = (unsigned __int8)*a1;
mul1 = a1[1];
add1 = a1[2];
add2 = a1[3];
add3 = a1[4];
add4 = a1[5];
for ( i = 0; len > i; ++i )
{
src[i] = a1[i + 6];
a1[i + 6] = src[i] * (src[i] * (add2 + src[i] * (add1 + mul1 * src[i])) + add3) + add4;
v8[i] = a1[i + 6];
}
strncpy(a1 + 6, src, 0x6AuLL);
printf("%s", v8);
}
```

```python
with open('dump.bin', 'rb') as f:
data = f.read()

def decode(offset):
inp = data[offset:]
len = inp[0] & 0xFF
mul1 = inp[1]
add1 = inp[2]
add2 = inp[3]
add3 = inp[4]
add4 = inp[5]
result = []
for i in range(len):
tmp = inp[i+6]
result.append((tmp * (tmp * (add2 + tmp * (add1 + mul1 * tmp)) + add3) + add4) & 0xFF)

return bytes(result)

for i in range(0,len(data),0x70):
print(decode(i))
```

Running this we get suspicious output:

```
$ python3 decode.py
b'Logical AND operation.\x00'
b'A class, have you ever heard of OOP?\x00'
b'What to do if the previous condition is not satisfied.\x00'
b'A boolean value, which is not true.\x00'
b'The command to print the flag.\x00'
b'A for loop, like a whhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh'
b"AHAHAHAHAH\nNo wait, it's a function.\x00"
b'Basic keyword for conditional programming.\x00'
b'Nothing to say.\x00'
b'Logical OR operation.\x00'
b'How to display something.\x00'
b'How to pass something to the previous function.\x00'
b"No, it's not a hero.\x00"
b'Not that.\x00'
b'A boolean value, which is not false.\x00'
b'A variable... Do you know how to code?\x00'
b"It's a while loop.\x00"
b'Really? You need help for help?\x00\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tQ\x80D\xce\xd9\xc1\xa9\x9dh*Uh*U\xc9\x9ddU\xb3A\xa0>\x87U\xa5\xc9\xac\xb2\x1a\x89{6\xd46\x7f\xea\x89AA\xf86\xf0{\xc9\xea\x7f\xa5\x7fd\xc96\xa9\x9dh\xb46\xb3A\xa0>\x16<\x01U{FNU\xf8F\xf4U\x1a\xa0{U*\xf4\xef\xach\xc9Uh\xc9`5\t\t\t\t\t\t'
b'8\x88\xedG\x88\xedG\x888\x88!\x88d\xf85\xa3\x94\x88\x088\x19\xffW8\xa4\xab\xb1\xab\x94\x878\xf8\xf8\xdd\xabe\xa48\x87\x94\x08\x94!8\xab8\x88\xed\xd1\xabd\xf85\xa3\xcb\x89\xf8\x88\xa4\x1b\x13\x88\xdd\x1b\x91\x88W5\xa4\x88G\x91'
```

Notice how the help for help is way too long when going by the internal length? It must overflow into the encoded string for flag! The encoded messages are restored/refreshed only once per code chunk, so after running `help "help";` the encoded message for flag will be corrupted. Maybe it's intended?

Let's try to run the following program:

```
fun test()
{
help "help";
flag;
}

test();
```

```
$ ./donut test.txt
Really? You need help for help?
This is the flag: ptm{c4n_U_r34lly_1nt3rpret_Thi5_flag?}, now you can submit it!
```

BINGO, we got the flag!No captcha required for preview. Please, do not write just a link to original writeup here.

Original writeup (https://wrecktheline.com/writeups/m0lecon-2021/#m0lang).