Tags: commodore64 hardware basic retro 

Rating: 5.0


> You won't find any assembly in this challenge, only C64 BASIC. Once you get the password, the flag is CTF{password}. P.S. The challenge has been tested on the VICE emulator.

**Files provided**

- [a ZIP file](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/files/back-to-the-basics.zip) containing:
- `crackme.prg` - a Commodore 64 ROM / program


We are given a PRG file for Commodore 64. As chance would have it, I had the VICE emulator installed already, so I had a look at what the program actually looks like when executed.


We can try a password:


It says it might take a while but the verdict is instantaneous. Well, let's have a look at the program itself.

$ xxd crackme.prg | head
0000000: 0108 1e08 0100 8f20 b2b2 b2b2 b2b2 b2b2 ....... ........
0000010: b2b2 b2b2 b2b2 b2b2 b2b2 b2b2 b2b2 003a ...............:
0000020: 0802 008f 20b2 b2b2 2042 4143 4b20 a420 .... ... BACK .
0000030: 4241 5349 4353 20b2 b2b2 0057 0803 008f BASICS ....W....
0000040: 20b2 b2b2 b2b2 b2b2 b2b2 b2b2 b2b2 b2b2 ...............
0000050: b2b2 b2b2 b2b2 b200 6b08 0a00 99c7 2831 ........k.....(1
0000060: 3535 293a 99c7 2831 3437 2900 8608 1400 55):..(147).....
0000070: 9720 3533 3238 302c 2036 3a97 2035 3332 . 53280, 6:. 532
0000080: 3831 2c20 363a 0098 0819 0099 224c 4f41 81, 6:......"LOA
0000090: 4449 4e47 2e2e 2e22 0013 091e 0083 2032 DING..."...... 2

Most of it is not really readable, but there are some things that stand out, even in this very short sample. There are numbers, represented in readable ASCII. And the string `LOADING...` is surrounded with double quotes. Neither of these would occur in a compiled program, so indeed, the challenge description is true - we are looking at C64 BASIC, but where are the actual commands? The string `LOADING...` is the first thing printed to the screen, so we should expect a `PRINT` command just before it.

We can search for specifications of the PRG format. Apparently it represents a [Commodore BASIC tokenised file](http://fileformats.archiveteam.org/wiki/Commodore_BASIC_tokenized_file). To save space, BASIC commands could be represented with tokens, single-byte versions of the full strings. Normal text uses bytes with values in the range `20 ... 127`, but these tokens have the high bit set, so their values are in the range `128 ... 255`. These are not ASCII values, but [PETSCII](https://en.wikipedia.org/wiki/PETSCII), which does have significant overlaps with ASCII, e.g. in letters and numbers, which is why these are readable in the ASCII print-out next to the hexdump above.

To confirm our expectations, we can see that the token for `PRINT` is `0x99`. And indeed, this exact byte is right next to the string `LOADING...`.

So what we need is some way to convert all of the tokens in the PRG file into their text versions so we can try to understand the code and eventually the password. This is not really a decompiler, since the PRG file is really just as good as source code. What we need is called a "detokeniser", or a "BASIC lister", such as [this one](https://www.luigidifraia.com/c64/index.htm#BL).

Running the lister on the PRG file we have produces some results:

1 REM ======================
3 REM ======================
10 PRINTCHR$(155):PRINTCHR$(147)
20 POKE 53280, 6:POKE 53281, 6:
30 DATA 2,1,3,11,32,32,81,81,81,32,32,32,32,81,32,32,32,32,81,81,81,81,32,81,81,81,81,81,32,32,81,81,81,81,32,32,87,87,87,87
31 DATA 32,32,32,32,32,32,81,32,32,81,32,32,81,32,81,32,32,81,32,32,32,32,32,32,32,81,32,32,32,81,32,32,32,32,32,87,32,32,32,32
32 DATA 20,15,32,32,32,32,81,81,81,32,32,81,32,32,32,81,32,32,81,81,81,32,32,32,32,81,32,32,32,81,32,32,32,32,32,32,87,87,87,32
33 DATA 32,32,32,32,32,32,81,32,32,81,32,81,81,81,81,81,32,32,32,32,32,81,32,32,32,81,32,32,32,81,32,32,32,32,32,32,32,32,32,87
34 DATA 20,8,5,32,32,32,81,81,81,32,32,81,32,32,32,81,32,81,81,81,81,32,32,81,81,81,81,81,32,32,81,81,81,81,32,87,87,87,87,32
40 FOR I = 0 TO 39: POKE 55296 + I, 1: NEXT I
41 FOR I = 40 TO 79: POKE 55296 + I, 15: NEXT I
42 FOR I = 80 TO 119: POKE 55296 + I, 12: NEXT I
43 FOR I = 120 TO 159: POKE 55296 + I, 11: NEXT I
44 FOR I = 160 TO 199: POKE 55296 + I, 0: NEXT I
50 FOR I = 0 TO 199
51 READ C : POKE 1024 + I, C
90 CHKOFF = 11 * 40 + 1
200 IF LEN(P$) = 30 THEN GOTO 250
210 POKE 1024 + CHKOFF + 0, 86:POKE 55296 + CHKOFF + 0, 10
220 GOTO 31337
250 POKE 1024 + CHKOFF + 0, 83:POKE 55296 + CHKOFF + 0, 5
2001 REM
2010 POKE 03397, 00199 : POKE 03398, 00013 : GOTO 2001
31345 GOTO 31345

We see a lot of what we would expect. `REM` is a comment "command" in BASIC. The fancy header is printed to the screen, then the program asks for the password. It checks whether our password is 30 characters in length. Let's try inputting a 30-character long password:


This passes the first check, represented as a green heart in the progress bar. The program then takes quite a long time indeed to produce all the other red crosses. We can disable the speed limit on the emulator to make it produce the above in a matter of seconds.

But this is quite curious - where is all this checking done? Where does it print the 19 red crosses? There is clearly some direct memory access going on (`POKE ADDRESS, VALUE` writes `VALUE` to `ADDRESS`), but it is not nearly enough to override the program to do any meaningful password checking. Where is the password actually read? In the code we can see the only time the password is read is in `LEN(P$)`.

So clearly the detokenised code is not all there is. And indeed, if we open the program in a hex editor, it spans 32 KiB, with many binary data sections and many parts that are still clearly code (e.g. mentioning `CHKOFF`, a variable initialised in the code we have already seen). How come the detokeniser didn't read these?

Looking at the PRG format page again, parsing a tokenised BASIC file should not be all that complicated:

| Size (bytes) | Info |
| --- | --- |
| 2 | Destination address for program |
| | **For each line:** |
| 2 | Next line address |
| 2 | Line number |
| * | Code |
| 1 | Null terminator |

(all 2-byte values are little-endian)

The last line of the program is empty and has zeroes in both the "next line address" and the "line number" fields.

The "next line address" field might seem a little unnecessary, since clearly the lines are null-terminated. There are two important reasons to store the address anyway:

1. Performance - a `GOTO` command in BASIC (which finds a line with a given number and resumes execution flow from there) needs to only read 2 words (4 bytes) per line before it can look further; otherwise it would have to read entire lines
2. Binary data - while the null terminator terminates valid BASIC lines, programs can embed binary data (including null bytes) as well; referencing lines by their address allows BASIC to skip blocks of binary data without trying to parse them

Apart from this we need the token table for BASIC and we should be able to parse the program:

([simple parser script](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/back-to-the-basics/Simple.hx))

$ haxe --run Simple
0801: 1: REM ======================
081E: 2: REM === BACK TO BASICS ===
083A: 3: REM ======================
0857: 10: PRINTCHR$(155):PRINTCHR$(147)
0D63: 2001: REM
0D69: 2010: POKE 03397, 00199 : POKE 03398, 00013 : GOTO 2001
0DB5: 31345: GOTO 31345

Well, if we follow the proper parsing rules, respecting the last line marker and only looking for lines based on the "next line address" field, we get exact the same result as before with the BASIC Lister. Not surprising, really.

At this point, there are two approaches we can take. We can try a more lenient parsing procedure - for example, the fact that any valid line is terminated with a null byte can help us; we can simply split the data on null bytes and try to detokenise all the "lines" in between. Alternatively, we can try to understand how the C64 (emulator) even knows to find the additional lines of code.

During the CTF, we chose the former path, since it is very quick to implement. We parse as much as we can, but ignore lines longer than 100 characters - these are actually binary data, and BASIC does impose a limit on line length. In the following listing the first address printed for each line is its "next line address" field.

([lenient parser script](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/back-to-the-basics/Lenient.hx))

$ haxe --run Lenient
081E <- 0801: 1: REM ======================
083A <- 081E: 2: REM === BACK TO BASICS ===
0857 <- 083A: 3: REM ======================
086B <- 0857: 10: PRINTCHR$(155):PRINTCHR$(147)
0D69 <- 0D63: 2001: REM
0D96 <- 0D69: 2010: POKE 03397, 00199 : POKE 03398, 00013 : GOTO 2001
0DB5 <- 0D96: 31337: PRINT:PRINT"VERDICT: NOPE":GOTO 31345
0DC1 <- 0DB5: 31345: GOTO 31345
0000 <- 0DC1: 0: REM
0DEB <- 0DC7: 2001: POKE 03397, 00069 : POKE 03398, 00013
0E1F <- 0DEB: 2002: POKE 1024 + CHKOFF + 1, 81:POKE 55296 + CHKOFF + 1, 7
0E46 <- 0E1F: 2004: ES = 03741 : EE = 04981 : EK = 148
0E81 <- 0E46: 2005: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I
0E9D <- 0E81: 2009: POKE 1024 + CHKOFF + 1, 87
---- <- 0E9D: -----: <BINARY>
13B7 <- 137C: 2900: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I

([full listing here](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/back-to-the-basics/listing1.txt))

We successfully parsed many more lines than before. Binary blobs in the file cannot be detected fully accurately without properly parsing BASIC commands, so some garbage data leaks through, but mostly the detection is successful.

Before we move on with the analysis of this additional code, let's also consider how the C64 knows where to look for this code. In the listing, you can see many of the lines show the same line number, e.g. 2001 is repeated twice in the sample above. If the program was input purely via the C64 BASIC interface, this could not happen - specifying the same line number would simply override that line with new code.

It would be impractical (or even impossible) to check that there are no duplicate line numbers in the program when it is loaded. So the BASIC interpreter can simply operate under the assumption that there are no duplicate lines present. The fact that lines store the address of the next line is an important hint to understand how the lines are checked. Storing the address of a following element is a familiar concept in data structures - it is a singly-linked list. Whenever BASIC is looking for a line, it iterates the linked list until it finds the correct number (or perhaps until it reaches its starting point). Whenever the end of the list marker is encountered, it can start looking from the program's loading address again; this way it is possible to `GOTO` a preceding line.

It is important to note that in C64 land, there is no concept of an NX bit, of data vs. code. There is only 64K of address space (and less actual memory still), and all of it is directly addressable with 2-byte addresses. There is nothing preventing the program from manipulating its own memory while it is running, using `POKE` statements. With this in mind, this line in particular starts to make sense:

0D96 <- 0D69: 2010: POKE 03397, 00199 : POKE 03398, 00013 : GOTO 2001

`00397` (in decimal) is `0x0D45` (in hexadecimal), `00398` is `0x0D46`, `00199` is `0xC7`, `00013` is `0x0D`. `POKE` writes a single byte, so the two `POKE` statements together write the value `0x0DC7` (in little-endian) to address `0x0D45`. What is at this address?


It is this seemingly innocent comment line. Keep in mind that its first two bytes store the "next line address". So now, after executing the two `POKE`s, instead of `0x0D63`, the line points to `0x0DC7`. After the `POKE`s, we `GOTO 2001`, which will now be found at address `0x0DC7`!

0DEB <- 0DC7: 2001: POKE 03397, 00069 : POKE 03398, 00013

This line now overrides the pointer back to `0x0D45`. This way the modification in the program is undone after its effect was used (i.e. the current line is one that was previously unreachable). I believe this is done so that dumping the memory after running the program would not be any more helpful than just looking at the original program. The same `POKE` process is repeated multiple times in the remainder of the code.

Once again, this was just an attempt to explain how the program hid its code from the lister (a simple anti-RE technique), but in this challenge just parsing all the code we could find was enough. Perhaps a more complex challenge could interleave lines in interesting ways, executing different code when a line is executed from its middle instead of its beginning?

But back to analysing what we have. In the listing (as well as a hexdump of the program), we can see 19 large blocks of binary data, each surrounded with some code. Remember that when checking our password, one heart (for correct length) and 19 crosses were printed. We can guess each code + binary block corresponds to a single check and depending on its result, a heart or a cross is printed.

Here is the first check block (the blocks are conveniently separated by empty `REM` comments in the listing):

0DEB <- 0DC7: 2001: POKE 03397, 00069 : POKE 03398, 00013
0E1F <- 0DEB: 2002: POKE 1024 + CHKOFF + 1, 81:POKE 55296 + CHKOFF + 1, 7
0E46 <- 0E1F: 2004: ES = 03741 : EE = 04981 : EK = 148
0E81 <- 0E46: 2005: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I
0E9D <- 0E81: 2009: POKE 1024 + CHKOFF + 1, 87
---- <- 0E9D: -----: <BINARY>
13B7 <- 137C: 2900: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I
13EA <- 13B7: 2905: POKE 1024 + CHKOFF + 1, A:POKE 55296 + CHKOFF + 1, B
1417 <- 13EA: 2910: POKE 03397, 00029 : POKE 03398, 00020 : GOTO 2001

We have already seen line `2001`, used to restore the line pointer.

Line `2002` uses `CHKOFF`, and this variable is actually only used to keep track of the position of the "progress bar" displayed when checking our password. Symbol 81 in shifted PETSCII is a circle, and it is displayed [in yellow colour](https://www.c64-wiki.com/wiki/Color) - this is indeed what is shown in the progress bar while our password is being checked. But anyway, this line is not really important to us.

Line `2004` defines some variables. If we look at what `03741` (`ES`) is in hexadecimal, we see it is `0x0E9D` - exactly matching the address of the binary data block! `04981` (`EE`) is `0x1375`, just two bytes shy of the line immediately after.

Line `2005` then finally modifies some memory using `POKE`. Basically, `EK` is added to all the bytes of the memory range `ES ... EE` (modulo 256). We will see what the memory there decodes to soon.

Line `2009` just changes the symbol in the progress bar. What is more interesting is its "next line address" field - it points into the binary data block, as if it were regular code. So what we expect at this point is that line `2005` decoded the binary block into valid BASIC code, which will be executed after line `2009`. Given that we haven't seen any mention of the `P$` variable so far (storing our password), we can expect the decoded BASIC code to actually do some meaningful checking.

Line `2900` re-encodes the data block with the same procedure as before. This means that after the program executes, the memory of the program will be different, but still unreadable, so a memory dump won't be helpful (again).

Line `2905` sets the progress bar symbol for the last time. However, the symbol type and its colour are stored in variables `A` and `B`, respectively. We haven't seen these in the code so far, so we expect them to be set in the decoded binary block, depending on the result of the password check.

Finally, line `2910` repeats the `POKE` procedure to make sure BASIC can find the next line of code, along with the next password check.

In the listing we can see that all the binary blocks are surrounded with the same general code, but the `ES`, `EE`, and `EK` variables are given different values. We can look for all the lines of the form:

ES = ..... : EE = ..... : EK = ...

And indeed, there are 19 of these. After reading their values, we can modify the program memory as needed and do another listing:

$ haxe --run Solve
081E <- 0801: 1: REM ======================
083A <- 081E: 2: REM === BACK TO BASICS ===
0857 <- 083A: 3: REM ======================
0E46: 2005: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I
0E81: 2009: POKE 1024 + CHKOFF + 1, 87
0E9D: 2010: V = 0.6666666666612316235641 - 0.00000000023283064365386962890625 : G = 0
0EEB: 2020: BA = ASC( MID$(P$, 1, 1) )
0F05: 2021: BB = ASC( MID$(P$, 2, 1) )
0F1F: 2025: P0 = 0:P1 = 0:P2 = 0:P3 = 0:P4 = 0:P5 = 0:P6 = 0:P7 = 0:P8 = 0:P9 = 0:PA = 0:PB = 0:PC = 0
0F7E: 2030: IF BA AND 1 THEN P0 = 0.062500000001818989403545856475830078125
0FBC: 2031: IF BA AND 2 THEN P1 = 0.0156250000004547473508864641189575195312
0FFB: 2032: IF BA AND 4 THEN P2 = 0.0039062500001136868377216160297393798828
103A: 2033: IF BA AND 8 THEN P3 = 0.0009765625000284217094304040074348449707
1079: 2034: IF BA AND 16 THEN P4 = 0.0002441406250071054273576010018587112427
10B9: 2035: IF BA AND 32 THEN P5 = 0.0000610351562517763568394002504646778107
10F9: 2036: IF BA AND 64 THEN P6 = 0.0000152587890629440892098500626161694527
1139: 2037: IF BA AND 128 THEN P7 = 0.0000038146972657360223024625156540423632
117A: 2040: IF BB AND 1 THEN P8 = 0.0000009536743164340055756156289135105908
11B9: 2031: IF BB AND 2 THEN P9 = 0.0000002384185791085013939039072283776477
11F8: 2032: IF BB AND 4 THEN PA = 0.0000000596046447771253484759768070944119
1237: 2033: IF BB AND 8 THEN PB = 0.000000014901161194281337118994201773603
1275: 2034: IF BB AND 16 THEN PC = 0.0000000037252902985703342797485504434007
12B5: 2050: K = V + P0 + P1 + P2 + P3 + P4 + P5 + P6 + P7 + P8 + P9 + PA + PB + PC
1300: 2060: G = 0.671565706376017
131A: 2100: T0 = K = G : A = 86 : B = 10
133B: 2200: IF T0 = -1 THEN A = 83 : B = 5
135A: 2210: POKE 1024 + CHKOFF + 1, 90
1376: 2500: REM
137C: 2900: FOR I = ES TO EE : K = ( PEEK(I) + EK ) AND 255 : POKE I, K : NEXT I

([full listing here](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/back-to-the-basics/listing2.txt))

Indeed, the binary blocks decoded to some password-checking code.

Lines `2020` and `2021` store individual characters of the password in `BA` and `BB`.

Lines `2030` through `2040`, then again `2031` through `2034` all check individual bits of the password characters and set the values of `P0 ... P9, PA, PB, PC` based on them.

Finally, all of the `P` values (and `V`) are summed and the result is compared to `G`. If the value matches exactly, `A` is set to the heart symbol (line `2200`), otherwise it remains a cross (line `2100`).

The fact that the condition is exact match and that the lines show decimal values with so many digits made me worry at first - do we need to have an exact C64-like implementation of decimal arithmetics for this to work? Do we need to write our password cracker in BASIC?

But perhaps trying to find the closest solution using regular 64-bit IEEE floats will work just as well. Each of the 19 blocks checks 13 bits, giving a total of 247 bits checked. This is 7 more bits than there are in our 30-byte password. If we check the last block, it checks a dummy `BX` variable, and its value will always be `0` - so the 19th check really only checks 6 bits.

So we need to crack 19 checks with 13 bits of data each - this gives 8K possible combinations, very easily brute-forceable. We write our cracker and print out the characters (we would normally have to convert PETSCII into ASCII, but fortunately the password comprises of ASCII-equivalent characters):

([full solver script](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/back-to-the-basics/Solve.hx))

$ haxe --run Solve

And it works! BASIC lines form linked lists as described above, and luckily we didn't need to get into the specifics of 40-bit C64 floats.