Tags: python z3 pickle reversing
Rating:
Pickled Onions was a reversing challenge where we are given a small Python file that loads a pickle object, and checks a flag that we input. It turns out to be a nested, self-unpacking program written in the Pickle bytecode language.
The initial challenge file looks like this:
__import__('pickle').loads(b'(I128\nI4\nI99\n...\x85R\x85R.')
When we run it:
What is the flag? test
Nope!
Pickle is a Python standard module allowing serialization of objects. Pickled objects can define a routine used while unpickling, and this code is included in the pickled string. The bytecode language seems to be a subset of Python. It is stack based, with some 'memo' slots for more persistent storage, and can build dicts, tuples and call functions.
While researching, I found the pickletools
module, which allows us to disassemble a pickle string, and optionally annotate the meaning of each opcode. We will begin by disassembling the bytecode.
>>> pickletools.dis(challenge, annotate=30)
0: ( MARK Push markobject onto the stack.
1: I INT 128 Push an integer or bool.
6: I INT 4 Push an integer or bool.
9: I INT 99 Push an integer or bool.
13: I INT 112 Push an integer or bool.
18: I INT 105 Push an integer or bool.
23: I INT 99 Push an integer or bool.
27: I INT 107 Push an integer or bool.
32: I INT 108 Push an integer or bool.
37: I INT 101 Push an integer or bool.
42: I INT 10 Push an integer or bool.
46: I INT 105 Push an integer or bool.
51: I INT 111 Push an integer or bool.
56: I INT 10 Push an integer or bool.
60: I INT 40 Push an integer or bool.
64: I INT 140 Push an integer or bool.
69: I INT 18 Push an integer or bool.
[ ... ]
544273: t TUPLE (MARK at 0) Build a tuple out of the topmost stack slice, after markobject.
544274: p PUT 69420 Store the stack top into the memo. The stack is not popped.
544281: c GLOBAL 'pickle loads' Push a global object (module.attr) on the stack.
544295: c GLOBAL 'builtins bytes' Push a global object (module.attr) on the stack.
544311: g GET 69420 Read an object from the memo and push it on the stack.
544318: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
544319: R REDUCE Push an object built from a callable and an argument tuple.
544320: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
544321: R REDUCE Push an object built from a callable and an argument tuple.
544322: . STOP Stop the unpickling machine.
When unpickled, this string pushes thousands of int
s to the stack, then builds a tuple
out of them, storing it at memo position 69420
. It then loads the bytes
function, converting the tuple into a bytestring
, then calls pickle.loads
on it. We can deduce that the integers pushed are another pickled object, and extract it.
The stage2 pickle code is much more complex. it pushes a number of constants, which themselves appear to be more nested pickles, and a number of names such as pickledgreekoregano
. It then builds these all into a dict, and calls input with the What is the flag?
prompt. At this point I was stuck for a while, as I wasn't able to follow the sequence of operations. Luckily, my teammate discovered Fickling, a module written by TrailOfBits (coincidentally, a sponsor of RedPwnCTF). This module can disassemble a pickle into a Python AST - and the built-in unparse
function can transform the AST into something resembling valid Python source code. Fickling didn't have support for the INT
and DICT
opcodes, so I patched these in:
diff --git a/fickling/pickle.py b/fickling/pickle.py
index ab62566..3eb46c4 100644
--- a/fickling/pickle.py
+++ b/fickling/pickle.py
@@ -162,6 +162,10 @@ class ConstantOpcode(Opcode):
interpreter.stack.append(make_constant(self.arg))
+class Int(ConstantOpcode):
+ name = "INT"
+
+
class StackSliceOpcode(Opcode):
def run(self, interpreter: "Interpreter", stack_slice: List[ast.expr]):
raise NotImplementedError(f"{self.__class__.__name__} must implement run()")
@@ -743,6 +747,18 @@ class Tuple(StackSliceOpcode):
interpreter.stack.append(ast.Tuple(tuple(stack_slice), ast.Load()))
+class Dict(StackSliceOpcode):
+ name = "DICT"
+
+ def run(self, interpreter: Interpreter, stack_slice: List[ast.expr]):
+ k = list()
+ v = list()
+ for i in range(0, len(stack_slice), 2):
+ k.append(stack_slice[i])
+ v.append(stack_slice[i+1])
+ interpreter.stack.append(ast.Dict(k, v))
+
+
class Build(Opcode):
name = "BUILD"
Here was the result:
from pickle import io
from builtins import input
_var0 = input('What is the flag? ')
_var1 = io
_var1.__setstate__({'pickledhorseradish': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.', 'pickledcoconut': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.', 'pickledlychee': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.', 'pickledcrabapple': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.', 'pickledportabella': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.', 'pickledquince': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.', 'pickledeasternmayhawthorn': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.', 'pickledmonstera': b'I1\n.', 'pickledcorneliancherry': b'I0\n.', 'pickledalligatorapple': b'\x80\x04I', 'pickledboysenberry': <VERY VERY LONG PICKLED DATA> 'pickledgreekoregano': '\nI', 'pickledpupunha': ('Nope!', 'Correct!'), 'pickledximenia': _var0, 'pickledgarlic': ('pickledcorneliancherry', 'pickledboysenberry')})
from io import pickledximenia.__len__
_var2 = pickledximenia.__len__()
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var1
_var4.__setstate__({'pickledburmesegrape': _var2, 'pickledximenia': _var3})
from builtins import print
from io import pickledpupunha.__getitem__
from pickle import loads
from io import pickledgarlic.__getitem__
from io import pickledburmesegrape.__eq__
_var5 = pickledburmesegrape.__eq__(64)
_var6 = pickledgarlic.__getitem__(_var5)
from io import _var6
_var7 = loads(_var6)
_var8 = pickledpupunha.__getitem__(_var7)
_var9 = print(_var8)
result = _var9
At first it may not be immediately obvious what this is doing, but if we follow it through it begins to make sense. We see pickle.io
is assigned to var1, which then has __setstate__
called upon it. setstate
assigns each key/value pair as a property of pickle.io
- this is so these constant strings are added to the io
module, and can be accessed later on. We can rewrite this as regular Python:
flag = input("What is the flag? ")
io.pickledhorseradish = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.'
io.pickledcoconut = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.'
io.pickledlychee = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.'
io.pickledcrabapple = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.'
io.pickledportabella = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.'
io.pickledquince = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.'
io.pickledeasternmayhawthorn = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.'
# Each of these pushes io.pickledmacadamia.__add__ to the stack followed by (pickledbarberry,)
# Or another method
io.pickledmonstera = 'I1\n.'
io.pickledcorneliancherry = 'I0\n.'
io.pickledalligatorapple = '\x80\x04I'
io.pickledboysenberry = 'VERY LONG'
io.pickledgreekoregano = '\nI'
io.pickledpupunha = ("Nope!", "Correct!")
io.pickledximenia = flag
io.pickledgarlic = ("pickledcorneliancherry", "pickledboysenberry")
io.pickledburmesgrape = flag.len()
io.pickledximenia = flag.encode()
var5 = int(io.pickledburmesgrape == 64) # 1 if len(input) is 64, 0 otherwise
var6 = pickledgarlic[var5] # pickledcorneliancherry if len(input) is not 64, pickledboysenberry if it is
var7 = loads(var6) # loads(b'I0\n.') if len(input) is not 64, loads(b'VERY LONG') if it is
var8 = pickledpupunha[var7] # 'Nope!' if returned value is 0, 'Correct!' if it's 1
print(var8)
The first few variables saved are constants which, when unpickled, will compare io.pickledmacadamia
to io.pickledbarberry
in some way. If our flag length is 64, it will run the pickle in pickledboysenberry
- otherwise, it will skip it and return 0. Our flag is saved to pickledximenia
.
Again running Fickling on our new pickle, we receive this output
from pickle import io
from io import pickledgreekoregano.join
from builtins import map
from builtins import str
_var0 = map(str, pickledximenia)
_var1 = b"\nI".join(_var0)
_var2 = io
_var2.__setstate__({'pickledximenia': _var1})
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var2
_var4.__setstate__({'pickledximenia': _var3})
from io import pickledalligatorapple.__add__
from io import pickledximenia
_var5 = pickledalligatorapple.__add__(pickledximenia)
_var6 = _var4
_var6.__setstate__({'pickledalligatorapple': _var5})
from io import pickledalligatorapple.__add__
_var7 = pickledalligatorapple.__add__(b'\n')
_var8 = _var6
_var8.__setstate__({'pickledalligatorapple': _var7})
from io import pickledalligatorapple
_var9 = _var8
_var9.__setstate__({'pickledblackapple': pickledalligatorapple, 'pickledyellowpeartomato': 1})
If we clean it up:
flag = "\nI".join([str(x) for x in pickledximenia]).encode()
flag = b'\x80\x04I' + flag + b'\n'
io.pickledblackapple = flag
io.pickledyellowpeartomato = 1
Our flag is converted into 64 integers which are pushed onto the stack.
The next part:
from io import pickledblackapple.__add__
_var10 = pickledblackapple.__add__(b'p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\ndb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__gt__\nI32\n\x85R\x85Rdb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__lt__\nI127\n\x85R\x85Rdb0')
_var11 = _var9
_var11.__setstate__({'pickledblackapple': _var10})
This part is repeated 64 times, and looks like this when disassembled
>>> pickletools.dis(a, annotate=50)
4: p PUT 0 Store the stack top into the memo. The stack is not popped.
7: 0 POP Discard the top stack item, shrinking the stack by one item.
8: c GLOBAL 'pickle io' Push a global object (module.attr) on the stack.
19: ( MARK Push markobject onto the stack.
20: \x8c SHORT_BINUNICODE 'pickledmacadamia' Push a Python Unicode string object.
38: g GET 0 Read an object from the memo and push it on the stack.
41: d DICT (MARK at 19) Build a dict out of the topmost stack slice, after markobject.
42: b BUILD Finish building an object, via __setstate__ or dict update.
43: ( MARK Push markobject onto the stack.
44: \x8c SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
69: c GLOBAL 'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
105: c GLOBAL 'io pickledmacadamia.__gt__' Push a global object (module.attr) on the stack.
133: I INT 32 Push an integer or bool.
137: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
138: R REDUCE Push an object built from a callable and an argument tuple.
139: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
140: R REDUCE Push an object built from a callable and an argument tuple.
141: d DICT (MARK at 43) Build a dict out of the topmost stack slice, after markobject.
142: b BUILD Finish building an object, via __setstate__ or dict update.
143: ( MARK Push markobject onto the stack.
144: \x8c SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
169: c GLOBAL 'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
205: c GLOBAL 'io pickledmacadamia.__lt__' Push a global object (module.attr) on the stack.
233: I INT 127 Push an integer or bool.
238: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
239: R REDUCE Push an object built from a callable and an argument tuple.
240: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
241: R REDUCE Push an object built from a callable and an argument tuple.
242: d DICT (MARK at 143) Build a dict out of the topmost stack slice, after markobject.
243: b BUILD Finish building an object, via __setstate__ or dict update.
244: 0 POP Discard the top stack item, shrinking the stack by one item.
This looks complex, but with a quick glance at the constants and functions, we can guess the purpose - checking that each character of the flag is between 32 and 127 - or 0x20 <= c <= 0x7f
, i.e. the printable ASCII range.
We then reach the net part, which is where the flag checking logic actually begins:
from io import pickledalligatorapple.__add__
_var140 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000000p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI102\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x11pickledbluetongue\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var141 = pickledalligatorapple.__add__(b'00000000000000000000000000000000000000000000000000000000000000p0\n00cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI108\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x16pickledroseleafbramble\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var142 = pickledalligatorapple.__add__(b'0000000000000000000000000000000000000000000000000000000000000p0\n000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI97\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0cpickledolive\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var143 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000p0\n0000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI103\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0fpickledvoavanga\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
_var144 = _var139
If we disassemble the first, we see:
POP * 63
4: p PUT 0
7: 0 POP
8: c GLOBAL 'pickle io'
19: ( MARK
20: \x8c SHORT_BINUNICODE 'pickledmacadamia'
38: g GET 0
41: \x8c SHORT_BINUNICODE 'pickledbarberry'
58: I INT 102
63: \x8c SHORT_BINUNICODE 'pickledgarlic'
78: \x8c SHORT_BINUNICODE 'pickledcorneliancherry'
102: \x8c SHORT_BINUNICODE 'pickledbluetongue'
121: \x86 TUPLE2
122: d DICT (MARK at 19)
123: b BUILD
124: c GLOBAL 'pickle loads'
138: \x8c SHORT_BINUNICODE 'io'
142: c GLOBAL 'io pickledgarlic.__getitem__'
172: c GLOBAL 'pickle loads'
186: c GLOBAL 'io pickledcrabapple'
207: \x85 TUPLE1
208: R REDUCE
209: \x85 TUPLE1
210: R REDUCE
211: \x93 STACK_GLOBAL
212: \x85 TUPLE1
213: R REDUCE
214: . STOP
This constructs a dict of {pickledmacadamia: <first char>, pickledbarberry: 102, pickledgarlic: (pickledbluetongue, pickledcorneliancherry)}
We then use pickledcrabapple
(earlier defined as pickledmacadamia.__eq__(pickledbarberry)
). This is repeated 4 times, with different numbers of pops and constants. If we convert the constants to ASCII, we get flag
- this pops the first 4 characters, and checks that they are 'flag'.
The rest of the checking logic is a little more complex. They each load two characters from the flag, and compare them using sub, xor, eq, ne, le, ge
. Some checks also add
two characters together then use eq
to check if it matches a constant. This is a perfect case for solving using Z3.
We create a stub like this:
from z3 import *
s = Solver()
flag = [BitVec('flag_%s' % i, 8) for i in range(0, 64)]
for c in flag:
s.add(c <= 0x7f)
s.add(c >= 0x20)
Then all we need to do is parse each check and convert it into a z3-style check.
data = open("checks.txt").read().split("\n")
import pickle
for line in data:
# Convert the data into a bytestring literal
pickletext = eval("b'" + line + "'")
example = pickletext
# Create a fake stack so it will unpickle properly. The value at each index is the index, so we can track the position of the checks
flag = [i for i in range(0, 64)]
flag = b"I" + b'\nI'.join([str(i).encode('latin1') for i in flag]) + b"\n"
end = example.index(b'dbc') + 1
# Slice it up to the parts calling functions - we only want to build the dict
example = example[:end]
example = flag + example + b"."
example = (pickle.loads(example))
# Load the 2
p0 = example["pickledmacadamia"]
p1 = example["pickledbarberry"]
# Check which operation is used, and output the appropriate check
if b"pickledhorseradish" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] + flag[{p1}] == {tmp})")
if b"pickledlychee" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] ^ flag[{p1}] == {tmp})")
if b"pickledcoconut" in pickletext:
tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
print(f"s.add(flag[{p0}] - flag[{p1}] == {tmp})")
We are now able to print the z3-generated model, and read the flag:
flag{n0w_th4t5_4_b1g_p1ckl3_1n_4_p1ckl3_but_1t_n33d3d_s0m3_h3lp}