Tags: lsb-oracle crypto rsa 

Rating:

**Description**

> This crypto experiment will help you decrypt an RSA encrypted message.
>
> `nc perfect-secrecy.ctfcompetition.com 1337`

**Files provided**

- [a ZIP file](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/files/perfect-secrecy.zip) containing:
- `challenge.py` - script running on the server
- `flag.txt` - 1024-bit encrypted message
- `key_pub.pem` - 1024-bit RSA public key

**Solution**

First we can extract data from the public key file `key_pub.pem` using OpenSSL command-line tools:

$ openssl rsa -pubin -in key_pub.pem -text -noout
Public-Key: (1024 bit)
Modulus:
00:da:53:a8:99:d5:57:30:91:af:6c:c9:c9:a9:fc:
31:5f:76:40:2c:89:70:bb:b1:98:6b:fe:8e:29:ce:
d1:2d:0a:df:61:b2:1d:6c:28:1c:cb:f2:ef:ed:79:
aa:7d:d2:3a:27:76:b0:35:03:b1:af:35:4e:35:bf:
58:c9:1d:b7:d7:c6:2f:6b:92:c9:18:c9:0b:68:85:
9c:77:ca:e9:fd:b3:14:f8:24:90:a0:d6:b5:0c:5d:
c8:5f:5c:92:a6:fd:f1:97:16:ac:84:51:ef:e8:bb:
df:48:8a:e0:98:a7:c7:6a:dd:25:99:f2:ca:64:20:
73:af:a2:0d:14:3a:f4:03:d1
Exponent: 65537 (0x10001)

And we can confirm `flag.txt` is indeed encrypted:

$ xxd flag.txt
0000000: a9c5 65cb c2cf 1c7d 4267 fd17 69dc e9f0 ..e....}Bg..i...
0000010: 3481 800b bae8 6bb0 926a e617 a7e6 d09f 4.....k..j......
0000020: 2c61 a9d7 0a85 6783 973c 4c55 bf43 a24c ,a....g..<LU.C.L
0000030: 1d70 f7b0 2ac0 34ff 39c5 37ab 39c7 8d90 .p..*.4.9.7.9...
0000040: 523a 8107 a098 0195 df35 21d6 54d7 2069 R:.......5!.T. i
0000050: f942 8208 431c c763 def3 9bcd 8cd3 ea9d .B..C..c........
0000060: 45e9 9e23 f781 0fa0 3b6c e906 d6f4 1373 E..#....;l.....s
0000070: e0e2 a7c0 2230 1828 d7f8 0ed3 c630 ae56 ...."0.(.....0.V

After studying the Python script provided for a little bit, we can decide to ignore most of it - the parts that take care of setting up the service and doing standard RSA operations are not important.

This is the actual core of the challenge, with some comments:

def Challenge(private_key, reader, writer):
try:
# read two bytes from the client
m0 = reader.read(1)
m1 = reader.read(1)
# read a ciphertext from the client
ciphertext = reader.read(private_key.public_key().key_size // 8)
# decrypt it
dice = RsaDecrypt(private_key, ciphertext)
# repeat a 100 times ...
for rounds in range(100):
# select one of the two bytes given based on the
# least significant bit of the decrypted ciphertext
p = [m0, m1][dice & 1]
# choose a random number (0, 1, or 2!)
k = random.randint(0, 2)
# add the random number to the byte selected,
# then keep its least significant bit only
c = (ord(p) + k) % 2
# zero-expand this single bit to a byte and send it to the client
writer.write(bytes((c,)))
writer.flush()
return 0
except Exception as e:
return 1

In a challenge like this, it is very important to understand where we actually gain any useful information. We cannot break a 1024-bit RSA key (practically), so neither the encoded message nor the public key help us. We are given a remote service so naturally we need to interact with it one way or another to solve the challenge.

In the above script, we can provide the server with any ciphertext and it will decrypt it for us. However, `dice`, the variable containing the decrypted data, is not used anywhere except as `dice & 1`, giving us only a single bit of information. We need to consider how this could be useful.

Apart from that, `random.randint(0, 2)` might be a source of trip ups. I was not sure whether or not the upper bound was exclusive. If it were, then this function call would return `0` roughly 50% of the time and `1` otherwise. If this were the case, the attack we will choose in the end would not have been possible, and we might have considered other side channels - timing? Padding oracle? And so on.

But, `random.randint(0, 2)` indeed returns `0`, `1`, or `2`, all with equal probabilities. This is significant because `0` and `2` are both even numbers, whereas `1` is not. This random distribution is biased towards even numbers, returning an odd number only 33% of the time!

Let's summarise the procedure of a single interaction with the server:

- connect
- provide byte value `M0` with the LSB (least significant bit) equal to 0
- provide byte value `M1` with the LSB equal to 1
- provide a ciphertext `C`
- (the server decrypts `C` into plaintext `P`)
- count the number of `1` bits returned by the server
- majority of `1` bits - the LSB of `P` is probably `1`
- majority of `0` bits - the LSB of `P` is probably `0`

We can understand this interaction as a simple black box that takes an input, the ciphertext `C`, and returns a single bit of information `P0`, the LSB of the decrypted plaintext `P`.

Do note the word "probably" in the last step of the interaction. The server only runs 100 trials for each bit we test it for. The more bits we test, the more likely it is that the majority of bits returned by the server do not reflect the LSB of the plaintext!

So what can we do with our black box? We have been given the ciphertext for the flag (based on the filename), but if we input it into the black box, we can only obtain the LSB of the flag, which is not terribly useful.

Now may be the time to look more closely at the Python script. In particular, the `RsaDecrypt` function used.

def RsaDecrypt(private_key, ciphertext):
assert (len(ciphertext) <=
(private_key.public_key().key_size // 8)), 'Ciphertext too large'
return pow(
int.from_bytes(ciphertext, 'big'),
private_key.private_numbers().d,
private_key.public_key().public_numbers().n)

The important thing here is the fact that `RsaDecrypt` is implemented just as a simple exponentiation operation, using the private exponent. It is not checking the input (besides its size in bits), nor is it checking the output. When implemented properly, RSA uses a padding scheme. If the plaintext is not valid with respect to the padding scheme after decryption, it should be rejected (and the client should not know that this happened). Here we can see no such check, so we are dealing with "plain RSA", which can be attacked in [numerous ways](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Attacks_against_plain_RSA).

What does this mean for us? It means that no matter what ciphertext `C` we give to the server, it will decrypt it and give us (indirectly) the LSB of the resulting plaintext.

**Note**: the rest of this write-up describes how an LSB oracle attack on RSA works, and how it is implemented, as well as some caveats. I am by no means a cryptographer. You might have more luck searching for "RSA LSB oracle" and finding a white paper for a proper explanation and proof. Nevertheless, I tried my best to explain it.

Because RSA is simply exponentiation under some large modulus, there are certain mathematical properties that could be useful to us. In particular, if we multiply two ciphertexts `C1` and `C2`, the resulting plaintext will be the product of the corresponding plaintexts `P1` and `P2`:

decrypt(C1 * C2) = P1 * P1
(C1 * C2)^d = P1 * P2 mod n

We know the public exponent, so we can choose any plaintext and calculate the corresponding ciphertext. We want to gather information about the flag of course, so one of our ciphertexts (and its plaintext) is chosen. What should we choose for the other ciphertext / plaintext?

Let's step back a bit (ha, ha) and think about our goal. We want to know the flag. We can only gather one bit of information from the server at a time, and it is the LSB of the flag multiplied by any (natural) number we choose. Since we are considering the flag in its binary form, let's consider bitwise shifting as mathematical operations:

x = 0001 0110 1101
2 * x = 0010 1101 1010
x << 1 = 0010 1101 1010

A left shift of one bit is the same as multiplying by two.

x = 0001 0110 1101
x // 2 = 0000 1011 0110
x >> 1 = 0000 1011 0110

A right shift of one bit is the same as (integer) division by two.

If we were working in regular integer arithmetic, there would be no way to multiply the flag to reproduce the effect of a right shift. This would mean we multiplied the flag by a number less than 1, but again - we only have natural numbers to work with.

But we are working in modular arithmetic. If we multiply the flag by some number, the result can be smaller than the original flag! But then again, it could be larger. Can we tell which happened by the LSB?

Let's choose `2` as the number we multiply the flag by:

Given
public modulus n (n is odd; product of 2 large primes)
public exponent e
flag p (p < n)
ciphertext c = p^e mod n
factor f = 2
encrypted factor cf = 2^e mod n
Then
x = decrypt(c * cf)
= (c * cf)^d
= (p^e * 2^e)^d
= 2p
If x < n, then
LSB of x mod n is 0! (a left shift)
If x ≥ n, then
LSB of x mod n is 1! (because n is odd)

So with this process, we obtained useful information about the flag, in terms of its relative value to `n`, a value which we know!

In fact, we can repeat this process. If we multiply the flag by `4` (or better, multiply it by `2` twice), what can happen? Since `p < n`, we now have four cases to consider:

x = decrypt(c * cf)
= 4p
If x < n, then
LSB of x mod n is 0! (a left shift of two)
If n ≤ x < 2x, then
LSB of x mod n is 1! (because n is odd)
If 2n ≤ x < 3x, then
LSB of x mod n is 0! (because n is odd and subtracted from `x` twice)
If x ≥ 3x, then
LSB of x mod n is 1! (because n is odd and subtracted from `x` thrice)

If we combine the last two results, we know in which "quadrant" of the modulus space `p` is in. That is, if we were to separate the range of all of its possible values, we know in which quarter of this range it lies in.

But of course, we don't have to stop here. If we keep multiplying our ciphertext by `2^e mod n` (and hence our plaintext by `2`) and gathering the LSB returned by the server, after 1024 interactions, we will know exactly what the flag is. In fact, since the range keeps getting more and more divided, we can spot that our process is that of a binary search, at each point checking a smaller and smaller interval of possible values and deciding on one half of that interval.

With all the pieces now in place, the implementation is quite easy to understand. We use several parallel connections to gather the LSB / parity bits faster. We consider results where the number of `1` bits is very close to the number of `0` bits to be meaningless and re-try until at least a 60:40 majority is found for any given bit. Once we have gathered all of the needed 1024 bits from the server, we use Python's decimals to have enough precision to calculate the exact value of the flag. (Regular 32 or 64-bit IEEE floats cannot express all possible values when the flag has 1024 bits of information!)

([full Python script](https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/scripts/perfect-secrecy.py))

$ python3 perfect-secrecy.py
calculating ciphertext multiples ...
gathering parity bits from server ...
calculating flag value ...
flag:
0x261b4020c33d22e3ae79e4227f2d51c3760e40dcdacd87f025f4c8471ea8cb41d7d8290671730002e6f4f204354467b68336c6c305f5f31375f355f6d335f315f7734355f77306e643372316e365f31665f34663733725f346c6c5f37683335335f79333472355f7930755f645f6c316b335f37305f6d3333377d204f6f2e
b"\x00\x02a\xb4\x02\x0c3\xd2.:\xe7\x9eB'\xf2\xd5\x1c7`\xe4\r\xcd\xac\xd8\x7f\x02_L\x84q\xea\x8c\xb4\x1d}\x82\x90g\x170\x00.oO CTF{h3ll0__17_5_m3_1_w45_w0nd3r1n6_1f_4f73r_4ll_7h353_y34r5_y0u_d_l1k3_70_m337} Oo."

And we got the flag!

We can see that the message is actually padded using PKCS#1 v1.5. This is signaled by the `\x00\x02` prefix to the message, followed by a bunch of random data, then a null byte, and finally the actual message.

During the CTF our team was stuck for a little bit despite having the above script implemented and tested locally. The difference was that we did not enforce a 60:40 majority on the bits returned by the server. We got unlucky and one of the bits returned was wrong, giving us this message after deciphering the parity bits:

$ xxd attempt1.bin
0000000: 0002 61b4 020c 321d 86e9 ea88 7ded 44cc ..a...2.....}.D.
0000010: 0ec8 396b 47dd 586b 0b01 29fb 1831 15e3 ..9kG.Xk..)..1..
0000020: a367 44e4 9197 7525 880c ec4f 3c91 176a .gD...u%...O<..j
0000030: a2c0 3cd3 6bea 55af b865 7f0b a81c 09af ..<.k.U..e......
0000040: 31ae 192f 69c5 f9eb 7d82 ff6b dac0 d1d4 1../i...}..k....
0000050: 5870 49bf f959 c002 ee58 1068 baf1 77ad XpI..Y...X.h..w.
0000060: 51b7 bdc5 fd0f b3b0 f19c 8d97 2ab3 fc29 Q...........*..)
0000070: 9a84 6a00 0a59 afdc d9da 0931 be8b 3a68 ..j..Y.....1..:h
$ xxd attempt2.bin
0000000: 0002 61b4 020c 33d2 2e3a e79e 4227 f2d5 ..a...3..:..B'..
0000010: 1c37 60e4 0dcd acd8 7f02 5f4c 8471 ea8c .7`......._L.q..
0000020: b41d 7d82 9067 1730 002e 6f4f 2043 5446 ..}..g.0..oO CTF
0000030: 7b68 336c 6c30 5f5f 3137 5f35 5f6d 335f {h3ll0__17_5_m3_
0000040: 315f 7734 355f 7730 6e64 3372 316e 365f 1_w45_w0nd3r1n6_
0000050: 3166 5f34 6637 3372 5f34 6c6c 5f37 6833 1f_4f73r_4ll_7h3
0000060: 3533 5f79 3334 7235 5f79 3075 5f64 5f6c 53_y34r5_y0u_d_l
0000070: 316b 335f 3730 5f6d 3333 377d 204f 6f2e 1k3_70_m337} Oo.

As you can see, we only got 12 bytes (and a couple of bits) right. I believed the value must have been correct, since the message was valid with respect to the PKSC#1 v1.5 padding scheme. We finally uncovered the problem when I tried to re-encrypt the supposed flag and the result was different from the original ciphertext. Once again, do note the word "probably" in the attack description!

`CTF{h3ll0__17_5_m3_1_w45_w0nd3r1n6_1f_4f73r_4ll_7h353_y34r5_y0u_d_l1k3_70_m337}`

Original writeup (https://github.com/Aurel300/empirectf/blob/master/writeups/2018-06-23-Google-CTF-Quals/README.md#158-crypto--perfect-secrecy).