Rating:

# BCTF 2015: weak_enc

**Category:** Crypto
**Points:** 200
**Description:**

> nc 146.148.79.13 8888
>
> [http://dl.bctf.cn/weak_enc-40eb1171f07d8ebb06bbf36849d829a1.py.xz](challenge/weak_enc.py)
>
> Decrypt: NxQ1NDMYcDcw53gVHzI7
>
> The flag for this problem does not look like BCTF{xxxxxx}

## Write-up

The challenge consists of a python daemon running an encryption service using a custom encryption algorithm. Upon every connection to the daemon, we are sent a challenge and have to respond with a small ['proof-of-work'](http://en.wikipedia.org/wiki/Proof-of-work_system) consisting of finding the pre-image (the first 12 bytes of which consist of the challenge) for a SHA1 hash where the lower-order 16 bits are all 0:

>```python
> proof = b64.b64encode(os.urandom(12))
>
> req.sendall("Please provide your proof of work, a sha1 sum ending in 16 bit's set to 0, it must be of length %s bytes, starting with %s\n" % (len(proof)+5, proof))
>
> test = req.recv(21)
> ha = hashlib.sha1()
> ha.update(test)
>
> if (test[0:16] != proof or ord(ha.digest()[-1]) != 0 or ord(ha.digest()[-2]) != 0):
>```

After we've send the proof-of-work we can send an arbitrary plaintext message M of up to 40000 characters to the daemon and get the corresponding ciphertext C such that C = E(SALT+M).
This gives us an adaptive (partially) [chosen plaintext scenario](http://en.wikipedia.org/wiki/Chosen-plaintext_attack) (since every message is prefixed with an unknown, but static, salt and we can adapt our choice of plaintext based on previous ciphertext responses). The attack, however, is not as straightforward as simply bruteforcing the plaintext message and comparing it against the target ciphertext since the proof-of-work challenge-response mechanism makes this an infeasible approach.

Let's take a closer look at the encryption scheme:

>```python
>SMALLPRIME = 13
>HASHLENGTH = 16
>N = 17
>
>for i in SALT:
> if not i in string.ascii_lowercase:
> print "OMG, this salt is HOOOOOOOOOOT!!!" # Bad things happened
> sys.exit(0)
>
>def updateDict(s, lzwDict):
> if not s in lzwDict:
> count = len(lzwDict.keys())
> lzwDict[s] = count % 256
>
>def LZW(s, lzwDict): # LZW written by NEWBIE
> for c in s: updateDict(c, lzwDict)
> # print lzwDict # have to make sure it works
> result = []
> i = 0
> while i < len(s):
> if s[i:] in lzwDict:
> result.append(lzwDict[s[i:]])
> break
> for testEnd in range(i+2, len(s)+1):
> if not s[i:testEnd] in lzwDict:
> updateDict(s[i:testEnd], lzwDict)
> result.append(lzwDict[s[i:testEnd-1]])
> i = testEnd - 2
> break
> i += 1
> return result
>
>def STRONGPseudoRandomGenerator(s):
> return s[SMALLPRIME - HASHLENGTH :], hashlib.md5(s).digest()
>
>def encrypt(m):
> lzwDict = dict()
> toEnc = LZW(SALT + m, lzwDict)
> key = hashlib.md5(SALT*2).digest()
> OTPBase = ""
> OPT = ""
> step = HASHLENGTH - SMALLPRIME
> for i in range(0, 3*N+step, step):
> rand, key = STRONGPseudoRandomGenerator(key)
> OTPBase += rand
> enc = []
> otpadded = []
> for i in range(len(toEnc)):
> index = i % N
> iRound = i / N + 1
> OTP = OTPBase[3*int(pow(ord(OTPBase[3*index]),ord(OTPBase[3*index+1])*iRound, N))+2]
> otpadded.append(ord(OTP))
> enc.append(chr(toEnc[i] ^ ord(OTP)))
> return b64.b64encode(''.join(enc))
>```

What happens here is that every plaintext message m is prefixed with the unknown salt and compressed using the [LZW](http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch) lossless compression algorithm. Subsequently an intermediary key is generated by taking the MD5 hash of a double concatenation of the salt and a pseudo-'one time pad' key is generated by iteratively taking the last 3 bytes of the current intermediary key and updating the intermediary key by applying MD5 to it again. This is repeated until we have a 53-byte long OTPBase derived from the salt. Finally, the compressed plaintext message (prefixed with the salt) is XORed against a transposed version (based on the current index and round count, with rounds of 17 iterations) of the OTPBase before being base64 encoded to yield the ciphertext.

There are several problems here, ranging from using static keys (derived solemnly from a salt included in the ciphertext) to key repetition (as opposed to it being a suggested true OTP) after a certain length. But the most exploitable problem is the fact that the use of LZW compression before encryption gives us a [side-channel attack](http://www.iacr.org/cryptodb/archive/2002/FSE/3091/3091.pdf) that leaks information about the plaintext, something which is inherent to the algorithm and its use here and not due to an implementation flaw.

LZW is an adaptive compression algorithm (to encode recurring substrings more efficiently) so its state (represented by the dictionary) changes based on the data it is fed. Since every message is has an unknown static prefix in the form of the salt every message enters the LZW algorithm in the same state initialized by the salt.

Given that the encryption is done in streaming fashion (ie. the key is repeatedly XORed with the plaintext) the ciphertext length is identical to the length of the compressed input message prefixed with the salt. By supplying empty data we obtain a 'null ciphertext' (ie. E(SALT)) and hence learn the length of the compressed salt.

![alt lzw](lzw.gif)

Now we can iteratively feed chosen plaintexts consisting of incremental lowercase alpha n-grams (starting with bi-grams and continuing until we learn nothing new) into the encryption routine and observe the length difference between the resulting ciphertext length (and hence the length of the compressed SALT+n-gram) and the length of the 'null ciphertext'. Due to the way LZW compression works, every n-gram that is already present in the state upon encountering it is encoded with a single unique code, while n-grams that are not present are added as multiple new characters. So when we feed the bigram "ni" to the encryption routine and the resulting length difference is 1 (while "ni" consists of more than 1 character) we know the bigram was in the state dictionary and hence is contained in the salt. This approach can be incrementally repeated for bi-grams, tri-grams, etc. until the first n-gram where we learn no new information. Once we have the set of n-grams comprising the salt we can recover the salt (somewhat naively) by tyring all possible combinations (offline) until it yields the 'null ciphertext'.

Once we have the salt, we can derive the OTPBase, apply it to our target ciphertext and obtain a plaintext version of the LZW compression of the target message. Using the salt, we reconstruct a reverse LZW dictionary to decode the plaintext LZW compressed message and recover the final plaintext (assuming the constituent bytes are a subset of the salt's bytes but that turned out to be the case).

Putting this all together, the [solution](solution/weak_enc_crack.py) is as follows:

>```python
>#!/usr/bin/python
>#
># BCTF 2015
># WEAK_ENC (CRYPTO/200) cracker
>#
># @a: Smoke Leet Everyday
># @u: https://github.com/smokeleeteveryday
>#
>
>import socket
>import re
>import base64 as b64
>import hashlib
>import itertools
>import string
>
>SMALLPRIME = 13
>HASHLENGTH = 16
>N = 17
>
># Get SHA1
>def getSHA(data):
> ha = hashlib.sha1()
> ha.update(data)
> return ha.digest()
>
># Do proof of work
>def findSHA(challenge):
> charset = "".join(chr(i) for i in range(0x00, 0x100))
> for p in itertools.chain.from_iterable((''.join(l) for l in itertools.product(charset, repeat=i)) for i in range(5, 5 + 1)):
> candidate = challenge + p
> proof = getSHA(candidate)
> if((ord(proof[-2]) == 0) and (ord(proof[-1]) == 0)):
> return candidate
> return None
>
># Session with server
>def talk(s, plaintext):
> workrequest = s.recv(1024)
> if not workrequest:
> return None
>
> m = re.match(r"^.*with\s(.*?)$", workrequest, re.MULTILINE)
>
> if not(m):
> return None
>
> challenge = m.group(1)
> response = findSHA(challenge)
>
> s.sendall(response)
> welcome = s.recv(1024)
>
> if not welcome:
> return None
>
> s.sendall(plaintext + "\n")
> encrypted = s.recv(1024)
>
> if not encrypted:
> return None
>
> m = re.match(r"^.*:\s(.*?)$", encrypted, re.MULTILINE)
> if not(m):
> return None
>
> ciphertext = m.group(1)
> return ciphertext
>
># Do request
>def req(HOST, m):
> PORT = 8888
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.connect((HOST, PORT))
> ciphertext = talk(s, m)
> s.close()
> return ciphertext
>
># Update LZW Dict
>def updateDict(s, lzwDict):
> if not s in lzwDict:
> count = len(lzwDict.keys())
> lzwDict[s] = count % 256
>
># LZW
>def LZW(s, lzwDict): # LZW written by NEWBIE
> for c in s: updateDict(c, lzwDict)
> # print lzwDict # have to make sure it works
> result = []
> i = 0
> while i < len(s):
> if s[i:] in lzwDict:
> result.append(lzwDict[s[i:]])
> break
> for testEnd in range(i+2, len(s)+1):
> if not s[i:testEnd] in lzwDict:
> updateDict(s[i:testEnd], lzwDict)
> result.append(lzwDict[s[i:testEnd-1]])
> i = testEnd - 2
> break
> i += 1
> return result
>
># PRNG
>def STRONGPseudoRandomGenerator(s):
> return s[SMALLPRIME - HASHLENGTH :], hashlib.md5(s).digest()
>
># Get OTPBase
>def getOTPBase(SALT):
> key = hashlib.md5(SALT*2).digest()
> OTPBase = ""
> OPT = ""
> step = HASHLENGTH - SMALLPRIME
> for i in range(0, 3*N+step, step):
> rand, key = STRONGPseudoRandomGenerator(key)
> OTPBase += rand
> return OTPBase
>
># Encryption function
>def encrypt(SALT, m):
> lzwDict = dict()
> toEnc = LZW(SALT + m, lzwDict)
> key = hashlib.md5(SALT*2).digest()
> OTPBase = ""
> OPT = ""
> step = HASHLENGTH - SMALLPRIME
> for i in range(0, 3*N+step, step):
> rand, key = STRONGPseudoRandomGenerator(key)
> OTPBase += rand
> enc = []
> otpadded = []
> for i in range(len(toEnc)):
> index = i % N
> iRound = i / N + 1
> OTP = OTPBase[3*int(pow(ord(OTPBase[3*index]),ord(OTPBase[3*index+1])*iRound, N))+2]
> otpadded.append(ord(OTP))
> enc.append(chr(toEnc[i] ^ ord(OTP)))
> return b64.b64encode(''.join(enc))
>
># Obtain LZW compressed message length
>def lzwCompressLen(m):
> lzwDict = dict()
> return len(LZW(m, lzwDict))
>
># Obtain LZW dictionary
>def lzwDict(SALT, m):
> lzwDict = dict()
> LZW(SALT+m, lzwDict)
> return lzwDict
>
># Check if n-gram is in LZW dictionary
>def checkTuple(HOST, nulllen, prefix, charset, tuplelen):
> tuples = []
> for p in itertools.chain.from_iterable((''.join(l) for l in itertools.product(charset, repeat=i)) for i in range(tuplelen, tuplelen + 1)):
> ctext = req(HOST, prefix+p)
> if(len(b64.b64decode(ctext)) - nulllen == 1):
> tuples.append(prefix+p)
> return tuples
>
># Obtain LZW dictionary n-grams
>def getTuples(HOST, nulllen, prefixes, charset, tuplelen):
> tuples = []
> if(len(prefixes) > 0):
> for prefix in prefixes:
> tuples += checkTuple(HOST, nulllen, prefix, charset, tuplelen)
> return tuples
> else:
> return checkTuple(HOST, nulllen, "", charset, tuplelen)
>
># Obtain SALT given n-gram its composition
>def getSALT(charset, nulllen):
> for candidate in itertools.chain.from_iterable((''.join(l) for l in itertools.product(charset, repeat=i)) for i in range(1, nulllen + 1)):
> if(nulllen == lzwCompressLen(candidate)):
> if(nullcipher == encrypt(candidate, "")):
> return candidate
> return None
>
># Recover LZW compressed message given OTPBase
>def recoverLZWCompress(c, OTPBase):
> enc = list(b64.b64decode(c))
> toEnc = []
>
> for i in range(len(enc)):
> index = i % N # index within the round (0..16)
> iRound = i / N + 1 # round index (1 for 0..16, 2 for 17..31, etc.)
> OTP = OTPBase[3*int(pow(ord(OTPBase[3*index]),ord(OTPBase[3*index+1])*iRound, N))+2]
> toEnc.append(ord(enc[i]) ^ ord(OTP))
> return toEnc
>
>HOST = '146.148.79.13'
>ciphertext = "NxQ1NDMYcDcw53gVHzI7"
>cipherlen = len(b64.b64decode(ciphertext))
>
>print "[*]Target ciphertext: [%s] (%d bytes)" % (b64.b64decode(ciphertext).encode('hex'), cipherlen)
>
># Obtain null cipher
>nullcipher = req(HOST, "")
>nulllen = len(b64.b64decode(nullcipher))
>maxplainlen = cipherlen - nulllen
>
>print "[+]Null cipher: [%s]" % nullcipher
>print "[+]Null len: %d" % nulllen
>print "[+]Max LZW plaintext len: %d" % maxplainlen
>
># SALT charset
>charset = string.ascii_lowercase
>
>print "[*]Reconstructing LZW dictionary n-grams..."
>
># Get initial dictionary bi-grams
>tupleset = getTuples(HOST, nulllen, [], charset, 2)
>tupget = tupleset
># Retrieve dictionary n-grams as long as possible
>while (len(tupget) > 0):
> tupget = getTuples(HOST, nulllen, tupget, charset, 1)
> tupleset += tupget
>
>print "[*]Recovering SALT..."
>
># Recover SALT, this could be done more efficiently by determinig order of n-gram occurance but who cares
>SALT = getSALT(tupleset, nulllen)
># Derive OTPBase
>OTPBase = getOTPBase(SALT)
># Recover compressed message
>recComp = recoverLZWCompress(ciphertext, OTPBase)
># Reconstruct SALT LZW dictionary
>saltDict = lzwDict(SALT, "")
>revSaltDict = dict((v, k) for k, v in saltDict.items())
>
>print "[+]Got SALT: [%s]" % SALT
>print "[+]Got OTPBase: [%s]" % OTPBase.encode('hex')
>print "[+]Got LZW compressed message: [%s]" % (",".join(str(i) for i in recComp))
>
># Derive plaintext message
>message = ""
>for i in range(nulllen, len(recComp)):
> if not(recComp[i] in revSaltDict):
> print "[-]Element isn't in reconstructed LZW dictionary..."
> exit()
>
> message += revSaltDict[recComp[i]]
>
>if(encrypt(SALT, message) == ciphertext):
> print "[+]Recovered message plaintext: [%s]" % message
>else:
> print "[-]Something went wrong..."
>```

Which will yield the following result:

>```bash
>$ ./weak_enc_crack.py
>[*]Target ciphertext: [371435343318703730e778151f323b] (15 bytes)
>[+]Null cipher: [NxQ1NDMYcDcw53g=]
>[+]Null len: 11
>[+]Max LZW plaintext len: 4
>[*]Reconstructing LZW dictionary n-grams...
>[*]Recovering SALT...
>[+]Got SALT: [nikonikoninikonikoni]
>[+]Got OTPBase: [99c537a6bd37cf200051b097433c3f162bd7dfcbd3d0ad67d064748dc7dda8b7e28d7b1e83249dcf5c15202211cf3db3525a54846d6b]
>[+]Got LZW compressed message: [0,1,2,3,4,6,4,8,7,5,12,11,10,13,4]
>[+]Recovered message plaintext: [nikoninikoni]
>```

Giving the flag:

>nikoninikoni

Original writeup (https://github.com/smokeleeteveryday/CTF_WRITEUPS/tree/master/2015/BCTF/crypto/weak_enc).