Tags: python z3 pickle reversing
Rating:
Pickled Onions was a reversing challenge where we are given a small Python file that loads a [pickle](https://docs.python.org/3/library/pickle.html) object, and checks a flag that we input. It turns out to be a nested, self-unpacking program written in the Pickle bytecode language.
## Stage 1
The initial challenge file looks like this:
```python
__import__('pickle').loads(b'(I128\nI4\nI99\n...\x85R\x85R.')
```
When we run it:
```
What is the flag? test
Nope!
```
Pickle is a Python standard module allowing serialization of objects. Pickled objects can define a routine used while unpickling, and this code is included in the pickled string. The bytecode language seems to be a subset of Python. It is stack based, with some 'memo' slots for more persistent storage, and can build dicts, tuples and call functions.
While researching, I found the `pickletools` module, which allows us to disassemble a pickle string, and optionally annotate the meaning of each opcode. We will begin by disassembling the bytecode.
```
>>> pickletools.dis(challenge, annotate=30)
0: ( MARK Push markobject onto the stack.
1: I INT 128 Push an integer or bool.
6: I INT 4 Push an integer or bool.
9: I INT 99 Push an integer or bool.
13: I INT 112 Push an integer or bool.
18: I INT 105 Push an integer or bool.
23: I INT 99 Push an integer or bool.
27: I INT 107 Push an integer or bool.
32: I INT 108 Push an integer or bool.
37: I INT 101 Push an integer or bool.
42: I INT 10 Push an integer or bool.
46: I INT 105 Push an integer or bool.
51: I INT 111 Push an integer or bool.
56: I INT 10 Push an integer or bool.
60: I INT 40 Push an integer or bool.
64: I INT 140 Push an integer or bool.
69: I INT 18 Push an integer or bool.
[ ... ]
544273: t TUPLE (MARK at 0) Build a tuple out of the topmost stack slice, after markobject.
544274: p PUT 69420 Store the stack top into the memo. The stack is not popped.
544281: c GLOBAL 'pickle loads' Push a global object (module.attr) on the stack.
544295: c GLOBAL 'builtins bytes' Push a global object (module.attr) on the stack.
544311: g GET 69420 Read an object from the memo and push it on the stack.
544318: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
544319: R REDUCE Push an object built from a callable and an argument tuple.
544320: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
544321: R REDUCE Push an object built from a callable and an argument tuple.
544322: . STOP Stop the unpickling machine.
```
When unpickled, this string pushes thousands of `int`s to the stack, then builds a `tuple` out of them, storing it at memo position `69420`. It then loads the `bytes` function, converting the tuple into a `bytestring`, then calls `pickle.loads` on it. We can deduce that the integers pushed are another pickled object, and extract it.
## Stage 2
The stage2 pickle code is much more complex. it pushes a number of constants, which themselves appear to be more nested pickles, and a number of names such as `pickledgreekoregano`. It then builds these all into a dict, and calls input with the `What is the flag?` prompt. At this point I was stuck for a while, as I wasn't able to follow the sequence of operations. Luckily, my teammate discovered [Fickling](https://github.com/trailofbits/fickling), a module written by TrailOfBits (coincidentally, a sponsor of RedPwnCTF). This module can disassemble a pickle into a Python [AST](https://docs.python.org/3/library/ast.html) - and the built-in `unparse` function can transform the AST into something resembling valid Python source code. Fickling didn't have support for the `INT` and `DICT` opcodes, so I patched these in:
```diff
diff --git a/fickling/pickle.py b/fickling/pickle.py
index ab62566..3eb46c4 100644
--- a/fickling/pickle.py
+++ b/fickling/pickle.py
@@ -162,6 +162,10 @@ class ConstantOpcode(Opcode):
interpreter.stack.append(make_constant(self.arg))
+class Int(ConstantOpcode):
+ name = "INT"
+
+
class StackSliceOpcode(Opcode):
def run(self, interpreter: "Interpreter", stack_slice: List[ast.expr]):
raise NotImplementedError(f"{self.__class__.__name__} must implement run()")
@@ -743,6 +747,18 @@ class Tuple(StackSliceOpcode):
interpreter.stack.append(ast.Tuple(tuple(stack_slice), ast.Load()))
+class Dict(StackSliceOpcode):
+ name = "DICT"
+
+ def run(self, interpreter: Interpreter, stack_slice: List[ast.expr]):
+ k = list()
+ v = list()
+ for i in range(0, len(stack_slice), 2):
+ k.append(stack_slice[i])
+ v.append(stack_slice[i+1])
+ interpreter.stack.append(ast.Dict(k, v))
+
+
class Build(Opcode):
name = "BUILD"
```
Here was the result:
```py
from pickle import io
from builtins import input
_var0 = input('What is the flag? ')
_var1 = io
_var1.__setstate__({'pickledhorseradish': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.', 'pickledcoconut': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.', 'pickledlychee': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.', 'pickledcrabapple': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.', 'pickledportabella': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.', 'pickledquince': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.', 'pickledeasternmayhawthorn': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.', 'pickledmonstera': b'I1\n.', 'pickledcorneliancherry': b'I0\n.', 'pickledalligatorapple': b'\x80\x04I', 'pickledboysenberry': <VERY VERY LONG PICKLED DATA> 'pickledgreekoregano': '\nI', 'pickledpupunha': ('Nope!', 'Correct!'), 'pickledximenia': _var0, 'pickledgarlic': ('pickledcorneliancherry', 'pickledboysenberry')})
from io import pickledximenia.__len__
_var2 = pickledximenia.__len__()
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var1
_var4.__setstate__({'pickledburmesegrape': _var2, 'pickledximenia': _var3})
from builtins import print
from io import pickledpupunha.__getitem__
from pickle import loads
from io import pickledgarlic.__getitem__
from io import pickledburmesegrape.__eq__
_var5 = pickledburmesegrape.__eq__(64)
_var6 = pickledgarlic.__getitem__(_var5)
from io import _var6
_var7 = loads(_var6)
_var8 = pickledpupunha.__getitem__(_var7)
_var9 = print(_var8)
result = _var9
```
At first it may not be immediately obvious what this is doing, but if we follow it through it begins to make sense. We see `pickle.io` is assigned to var1, which then has `__setstate__` called upon it. `setstate` assigns each key/value pair as a property of `pickle.io` - this is so these constant strings are added to the `io` module, and can be accessed later on. We can rewrite this as regular Python:
```python
flag = input("What is the flag? ")
io.pickledhorseradish = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.'
io.pickledcoconut = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.'
io.pickledlychee = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.'
io.pickledcrabapple = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.'
io.pickledportabella = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.'
io.pickledquince = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.'
io.pickledeasternmayhawthorn = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.'
# Each of these pushes io.pickledmacadamia.__add__ to the stack followed by (pickledbarberry,)
# Or another method
io.pickledmonstera = 'I1\n.'
io.pickledcorneliancherry = 'I0\n.'
io.pickledalligatorapple = '\x80\x04I'
io.pickledboysenberry = 'VERY LONG'
io.pickledgreekoregano = '\nI'
io.pickledpupunha = ("Nope!", "Correct!")
io.pickledximenia = flag
io.pickledgarlic = ("pickledcorneliancherry", "pickledboysenberry")
io.pickledburmesgrape = flag.len()
io.pickledximenia = flag.encode()
var5 = int(io.pickledburmesgrape == 64) # 1 if len(input) is 64, 0 otherwise
var6 = pickledgarlic[var5] # pickledcorneliancherry if len(input) is not 64, pickledboysenberry if it is
var7 = loads(var6) # loads(b'I0\n.') if len(input) is not 64, loads(b'VERY LONG') if it is
var8 = pickledpupunha[var7] # 'Nope!' if returned value is 0, 'Correct!' if it's 1
print(var8)
```
The first few variables saved are constants which, when unpickled, will compare `io.pickledmacadamia` to `io.pickledbarberry` in some way. If our flag length is 64, it will run the pickle in `pickledboysenberry` - otherwise, it will skip it and return 0. Our flag is saved to `pickledximenia`.
## Stage 3
Again running Fickling on our new pickle, we receive this output
```python
from pickle import io
from io import pickledgreekoregano.join
from builtins import map
from builtins import str
_var0 = map(str, pickledximenia)
_var1 = b"\nI".join(_var0)
_var2 = io
_var2.__setstate__({'pickledximenia': _var1})
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var2
_var4.__setstate__({'pickledximenia': _var3})
from io import pickledalligatorapple.__add__
from io import pickledximenia
_var5 = pickledalligatorapple.__add__(pickledximenia)
_var6 = _var4
_var6.__setstate__({'pickledalligatorapple': _var5})
from io import pickledalligatorapple.__add__
_var7 = pickledalligatorapple.__add__(b'\n')
_var8 = _var6
_var8.__setstate__({'pickledalligatorapple': _var7})
from io import pickledalligatorapple
_var9 = _var8
_var9.__setstate__({'pickledblackapple': pickledalligatorapple, 'pickledyellowpeartomato': 1})
```
If we clean it up:
```python
flag = "\nI".join([str(x) for x in pickledximenia]).encode()
flag = b'\x80\x04I' + flag + b'\n'
io.pickledblackapple = flag
io.pickledyellowpeartomato = 1
```
Our flag is converted into 64 integers which are pushed onto the stack.
The next part:
```python
from io import pickledblackapple.__add__
_var10 = pickledblackapple.__add__(b'p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\ndb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__gt__\nI32\n\x85R\x85Rdb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__lt__\nI127\n\x85R\x85Rdb0')
_var11 = _var9
_var11.__setstate__({'pickledblackapple': _var10})
```
This part is repeated 64 times, and looks like this when disassembled
```
>>> pickletools.dis(a, annotate=50)
4: p PUT 0 Store the stack top into the memo. The stack is not popped.
7: 0 POP Discard the top stack item, shrinking the stack by one item.
8: c GLOBAL 'pickle io' Push a global object (module.attr) on the stack.
19: ( MARK Push markobject onto the stack.
20: \x8c SHORT_BINUNICODE 'pickledmacadamia' Push a Python Unicode string object.
38: g GET 0 Read an object from the memo and push it on the stack.
41: d DICT (MARK at 19) Build a dict out of the topmost stack slice, after markobject.
42: b BUILD Finish building an object, via __setstate__ or dict update.
43: ( MARK Push markobject onto the stack.
44: \x8c SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
69: c GLOBAL 'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
105: c GLOBAL 'io pickledmacadamia.__gt__' Push a global object (module.attr) on the stack.
133: I INT 32 Push an integer or bool.
137: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
138: R REDUCE Push an object built from a callable and an argument tuple.
139: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
140: R REDUCE Push an object built from a callable and an argument tuple.
141: d DICT (MARK at 43) Build a dict out of the topmost stack slice, after markobject.
142: b BUILD Finish building an object, via __setstate__ or dict update.
143: ( MARK Push markobject onto the stack.
144: \x8c SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
169: c GLOBAL 'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
205: c GLOBAL 'io pickledmacadamia.__lt__' Push a global object (module.attr) on the stack.
233: I INT 127 Push an integer or bool.
238: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
239: R REDUCE Push an object built from a callable and an argument tuple.
240: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
241: R REDUCE Push an object built from a callable and an argument tuple.
242: d DICT (MARK at 143) Build a dict out of the topmost stack slice, after markobject.
243: b BUILD Finish building an object, via __setstate__ or dict update.
244: 0 POP Discard the top stack item, shrinking the stack by one item.
```
This looks complex, but with a quick glance at the constants and functions, we can guess the purpose - checking that each character of the flag is between 32 and 127 - or `0x20 <= c <= 0x7f`, i.e. the printable ASCII range.
We then reach the net part, which is where the flag checking logic actually begins:
```python
from io import pickledalligatorapple.__add__
_var140 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000000p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI102\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x11pickledbluetongue\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var141 = pickledalligatorapple.__add__(b'00000000000000000000000000000000000000000000000000000000000000p0\n00cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI108\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x16pickledroseleafbramble\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var142 = pickledalligatorapple.__add__(b'0000000000000000000000000000000000000000000000000000000000000p0\n000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI97\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0cpickledolive\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var143 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000p0\n0000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI103\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0fpickledvoavanga\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
_var144 = _var139
```
If we disassemble the first, we see:
```
POP * 63
4: p PUT 0
7: 0 POP
8: c GLOBAL 'pickle io'
19: ( MARK
20: \x8c SHORT_BINUNICODE 'pickledmacadamia'
38: g GET 0
41: \x8c SHORT_BINUNICODE 'pickledbarberry'
58: I INT 102
63: \x8c SHORT_BINUNICODE 'pickledgarlic'
78: \x8c SHORT_BINUNICODE 'pickledcorneliancherry'
102: \x8c SHORT_BINUNICODE 'pickledbluetongue'
121: \x86 TUPLE2
122: d DICT (MARK at 19)
123: b BUILD
124: c GLOBAL 'pickle loads'
138: \x8c SHORT_BINUNICODE 'io'
142: c GLOBAL 'io pickledgarlic.__getitem__'
172: c GLOBAL 'pickle loads'
186: c GLOBAL 'io pickledcrabapple'
207: \x85 TUPLE1
208: R REDUCE
209: \x85 TUPLE1
210: R REDUCE
211: \x93 STACK_GLOBAL
212: \x85 TUPLE1
213: R REDUCE
214: . STOP
```
This constructs a dict of `{pickledmacadamia: <first char>, pickledbarberry: 102, pickledgarlic: (pickledbluetongue, pickledcorneliancherry)}`
We then use `pickledcrabapple` (earlier defined as `pickledmacadamia.__eq__(pickledbarberry)`). This is repeated 4 times, with different numbers of pops and constants. If we convert the constants to ASCII, we get `flag` - this pops the first 4 characters, and checks that they are 'flag'.
## Checking Logic
The rest of the checking logic is a little more complex. They each load two characters from the flag, and compare them using `sub, xor, eq, ne, le, ge`. Some checks also `add` two characters together then use `eq` to check if it matches a constant. This is a perfect case for solving using [Z3](https://github.com/Z3Prover/z3).
We create a stub like this:
```python
from z3 import *
s = Solver()
flag = [BitVec('flag_%s' % i, 8) for i in range(0, 64)]
for c in flag:
s.add(c <= 0x7f)
s.add(c >= 0x20)
```
Then all we need to do is parse each check and convert it into a z3-style check.
```python
data = open("checks.txt").read().split("\n")
import pickle
for line in data:
# Convert the data into a bytestring literal
pickletext = eval("b'" + line + "'")
example = pickletext
# Create a fake stack so it will unpickle properly. The value at each index is the index, so we can track the position of the checks
flag = [i for i in range(0, 64)]
flag = b"I" + b'\nI'.join([str(i).encode('latin1') for i in flag]) + b"\n"
end = example.index(b'dbc') + 1
# Slice it up to the parts calling functions - we only want to build the dict
example = example[:end]
example = flag + example + b"."
example = (pickle.loads(example))
# Load the 2
p0 = example["pickledmacadamia"]
p1 = example["pickledbarberry"]
# Check which operation is used, and output the appropriate check
if b"pickledhorseradish" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] + flag[{p1}] == {tmp})")
if b"pickledlychee" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] ^ flag[{p1}] == {tmp})")
if b"pickledcoconut" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] - flag[{p1}] == {tmp})")
```
We are now able to print the z3-generated model, and read the flag:
`flag{n0w_th4t5_4_b1g_p1ckl3_1n_4_p1ckl3_but_1t_n33d3d_s0m3_h3lp}`