Initial inspection showed that the files contain a Linux software RAID 5:

```
$ file *.img
disk01.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk02.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk03.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk04.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk05.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk06.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk07.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk08.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk09.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk10.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
```

When we tried to mount the RAID, we got different errors depending on the Linux version
each team member was running. None of the errors said outright that the array was corrupted,
and it was possible to recover some data from the files with tools like `binwalk`, but it was
clear that the RAID was corrupted in some way.

To dig a little deeper we used a small Python script to XOR all disks together,
which should give us all zeros on a functioning RAID 5 array:

```py
# read all ten disk images into memory
disks = [None for _ in range(10)]
for i in range(10):
    with open(f'disk{i+1:02d}.img', 'rb') as f:
        disks[i] = f.read()

# XOR the disks byte by byte; data and parity cancel out to zero
with open('xor.img', 'wb') as f:
    out = bytearray()
    for b in range(len(disks[0])):
        byte = 0
        for i in range(len(disks)):
            byte ^= disks[i][b]
        out.append(byte)
    f.write(out)
```

The resulting file can be examined with `hexyl`:

```
$ hexyl --border ascii xor.img
+--------+-------------------------+-------------------------+--------+--------+
|00000000| 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |00000000|00000000|
|* | | | | |
|000010a0| 00 00 00 00 00 00 00 00 | a6 ba ae eb 3e d7 00 92 |00000000|××××>×0×|
|000010b0| 80 4a c9 6f ea e5 ae 89 | 00 00 00 00 00 00 00 00 |×J×o××××|00000000|
|000010c0| 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |00000000|00000000|
|* | | | | |
|00500000| | | | |
+--------+-------------------------+-------------------------+--------+--------+
```

This is the only region that fails the parity check, and since it lies in the
header region of the drives, that is expected. The data area is consistent, so
the drives do contain a valid array and something must be wrong with the headers.

We found a great explanation of the different fields of a Linux RAID superblock
here <https://raid.wiki.kernel.org/index.php/RAID_superblock_formats>, which
explains that in version 1.2 of Linux RAID, the superblock is always 4K from the
start of each device, so 0x1000 is the start of our superblocks.

The differences are at 0xA8 - 0xB7 in the superblock (0x10a8 - 0x10b7 in the
image), which is the `device_uuid` or "UUID of the component device". This makes
sense: every device in a RAID 5 holds different data, so there needs to be a way
to distinguish them.

However, offset 0xA0 in the superblock contains the `dev_number` or
"Permanent identifier of this device",
which should indicate where in the RAID array the device is used. It turns
out this field is the same for all disks (`09 00 00 00`, i.e. 9 in little endian).
So there is actually no way for the kernel to figure out which disk belongs
in which role/position in the array.
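
To compare these two fields across all images without reading hex dumps, something like the following sketch can help. It only relies on the offsets mentioned above (superblock at 0x1000, `dev_number` at 0xA0, `device_uuid` at 0xA8 - 0xB7); the function name `read_sb_fields` is made up for this example and is not part of our original tooling.

```python
import struct

SB_OFFSET = 0x1000     # v1.2 superblock starts 4K into the device
DEV_NUMBER_OFF = 0xA0  # 4-byte little-endian dev_number
DEV_UUID_OFF = 0xA8    # 16-byte device_uuid (0xA8 - 0xB7)


def read_sb_fields(image: bytes):
    """Return (dev_number, device_uuid as hex string) from a raw disk image."""
    sb = image[SB_OFFSET:SB_OFFSET + 0x100]
    (dev_number,) = struct.unpack_from('<I', sb, DEV_NUMBER_OFF)
    device_uuid = sb[DEV_UUID_OFF:DEV_UUID_OFF + 16].hex()
    return dev_number, device_uuid
```

Calling `read_sb_fields(open('disk01.img', 'rb').read())` for each image should then show the same `dev_number` next to a different `device_uuid` for every disk.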

Now our task got a bit clearer: find the correct order of the disks.
We could have used the checksum of the headers, but the challenge authors took
care of them as well and replaced them all with `0xbadc0de`:

```
$ mdadm -E disk05.img
<...>
Checksum : badc0de - expected ff67acc5
```

So we had to get a bit more creative. We were able to determine that disk01 and
disk10 were at the correct position already, because disk01 was the only one that
contained the start of an ext4 file system, and disk10 was the only one that
contained seemingly arbitrary data, making it the only candidate for the parity
slice. For an overview of how the RAID 5 layouts look, see here (our recovery
script walks the chunks in left-symmetric order, which is mdadm's default layout):
<http://www.reclaime-pro.com/posters/raid-layouts.pdf>
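
As a rough illustration of the two "left" layouts from that poster, the sketch below maps a logical data chunk to the disk and stripe it lives on. The helper name `raid5_left_map` and its parameters are made up for this example. The read order used by the recovery script at the end of this writeup corresponds to the symmetric variant.

```python
def raid5_left_map(chunk, n_disks, symmetric=True):
    """Return (disk, stripe) for a logical data chunk in a 'left' RAID 5 layout."""
    data_per_stripe = n_disks - 1
    stripe, j = divmod(chunk, data_per_stripe)
    # parity starts on the last disk and moves one disk to the left per stripe
    parity = (n_disks - 1) - (stripe % n_disks)
    if symmetric:
        # data starts right after the parity disk and wraps around
        disk = (parity + 1 + j) % n_disks
    else:
        # data always starts at disk 0 and simply skips the parity disk
        disk = j if j < parity else j + 1
    return disk, stripe
```

With ten disks, chunk 9 (the first chunk of the second stripe) lands on disk 9 in the symmetric layout, but on disk 0 in the asymmetric one.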

We realized that the disk was filled with
different plays from Shakespeare, apparently saved as text files (it later
turned out to be just one large file containing all plays). With this information
it was easy to figure out which disk followed which, simply by looking
for text fragments that overlapped slice boundaries.
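
We did this matching mostly by hand, but it could be automated roughly as follows. This is only a sketch: `find_neighbours` is a hypothetical helper, `disks` is assumed to be a list with the data area of each disk, and the 4K chunk size is the one our scripts use. For a phrase known to straddle a chunk boundary, it reports every pair of disks where one chunk ends with the first part of the phrase and a chunk in the same or the next stripe starts with the rest.

```python
def find_neighbours(disks, phrase, chunk_size=0x1000, min_part=4):
    """Return sorted (disk_a, disk_b) pairs where phrase continues from a to b."""
    hits = set()
    n_chunks = len(disks[0]) // chunk_size
    # try every split point, keeping both parts long enough to be meaningful
    for k in range(min_part, len(phrase) - min_part + 1):
        head, tail = phrase[:k], phrase[k:]
        for a, da in enumerate(disks):
            for c in range(n_chunks):
                end = (c + 1) * chunk_size
                if da[end - k:end] != head:
                    continue
                # the continuation sits on another disk, same or next stripe
                for b, db in enumerate(disks):
                    if b == a:
                        continue
                    for row in (c, c + 1):
                        start = row * chunk_size
                        if db[start:start + len(tail)] == tail:
                            hits.add((a, b))
    return sorted(hits)
```

A hit `(a, b)` means disk `b` directly follows disk `a` in the array, which is the information needed to chain the disks into one order.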

Within a few minutes we had all the text fragments matched and deduced the
correct order of the disks:

```
0 -> 6 -> 3 -> 5 -> 2 -> 4 -> 1 -> 7 -> 8 -> 9
```

Since at this point we had already implemented a simple RAID recovery tool,
we used it to extract the ext4 partition from the RAID, which we then simply mounted
to extract the flag, along with 484 pirate flags in `.jpg` files.

```
sudo mount unraid.img /mnt/unraid
```

# Flag

![Flag](flag.jpg)

# Files

## `sovle.py`

```python
import os

disks = [None for _ in range(10)]
for i in range(10):
    with open(f'disk{i+1:02d}.img', 'rb') as f:
        disks[i] = f.read()

# text fragments found near slice boundaries, grouped by passage
searches = [
    b"Than these poor compounds that thou mayst not sell.",
    b"I sell thee poison; thou hast sold me none.",

    b"Live, and be prosperous; and farewell, g",
    b"Bal. [aside] For all this same, I'll hide me hereabout.",

    b"Friar. Saint Francis be my speed! how oft to-night",
    b"Have my old feet stumbled at graves! Who's there?",
    b"Bal. Here's one, a friend, and one that knows you well.",

    b"Wife. The people in the street cry 'Romeo,'",
    b"Some 'Juliet,' and some 'Paris'; and all run,",
    b"With open outcry, toward our monument.",

    b"If I departed not and left him there.",
    b"Prince. Give me the letter. I will look on it.",
    b"Where is the County's page that rais'd the watch?",

    b"Why, Belman is as good as he, my",
    b"upon it at the merest loss",
    b"And twice today",

    b"And give them friendly welcome every",
    b"Let them want nothing that my house affords.",

    b"Or wilt thou ride? Thy horses shall be trapp'd,",
    b"Their harness studded all with gold and pearl.",
    b"Dost thou love hawking? Thou hast hawks will soar",

    b"Ay, it stands so that I may hardly tarry so long. But I",
    b"be loath to fall into my dreams again: I will therefore tarry in",
    b"despite of the flesh and the blood.",

    b"If either of you both love Katherina,",
    b"Because I know you well and love you well,",
    b"Leave shall you have to court her at your pleasure.",

    b"shall be so far forth friendly maintained, till by helping",
    b"Baptista's eldest daughter to a husband, we set his youngest free",
    b"for a husband, and then have to't afresh. Sweet Bianca! Happy man",
]

disks_data = [x[0x100000:] for x in disks]

disks_data_slices = []

# cut the data area of every disk into 4K slices
for i in range(10):
    diskslices = []
    for slicenum in range(len(disks_data[0])//0x1000):
        diskslices.append(disks_data[i][0x1000*slicenum:0x1000*(slicenum+1)])
    disks_data_slices.append(diskslices)

# print the index of every disk that contains one of the fragments
for si, search in enumerate(searches):
    for didx, diskslices in enumerate(disks_data_slices):
        for slicenum, pslice in enumerate(diskslices):
            if search in pslice:
                print(f'{didx} ', end='')

# order: 0 6 3 5 2 4 1 7 8 9

disks_data = [
    disks_data[0],
    disks_data[6],
    disks_data[3],
    disks_data[5],
    disks_data[2],
    disks_data[4],
    disks_data[1],
    disks_data[7],
    disks_data[8],
    disks_data[9],
]

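with open('unraid.img', 'wb') as f:
    for slicenum in range(len(disks_data[0])//0x1000):
        print(f'slice {slicenum}: ', end='')
        for i in range(10):
            # the parity slice moves one disk to the left with every stripe
            parity_in_slice = 10 - (slicenum % 10) - 1
            slloc = slicenum
            if i >= parity_in_slice:
                # this disk holds parity or already written data in this
                # stripe, so take its slice from the next stripe instead
                slloc += 1
                if parity_in_slice == 0:
                    # skip: the next pass writes that stripe
                    continue
            print(slloc, ",", end='')
            f.write(disks_data[i][0x1000*slloc:0x1000*(slloc+1)])
        print()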
```

Original writeup (https://hack.more.systems/writeup/2021/07/06/googlectf-raidersofcorruption/).