Reversing Pickles with r2pickledec
01 Jun 2023 - Posted by Dennis Goodlett
R2pickledec is the first pickle decompiler to support all instructions up to protocol 5 (the current protocol). In this post we will go over what Python pickles are, how they work and how to reverse them with Radare2 and r2pickledec. An upcoming blog post will go even deeper into pickles and share some advanced obfuscation techniques.
What are pickles?
Pickles are the built-in serialization format in Python. They can turn almost any Python object into a byte stream so it may be stored on disk or sent over a network. Pickles are notoriously dangerous. You should never unpickle data from an untrusted source, because doing so can easily result in remote code execution. Please refer to the documentation for more details.
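To make the danger concrete, here is a minimal sketch of how a pickle can make the loader call a function chosen by whoever built it. The Demo class is hypothetical, and the callable is the harmless built-in len; an attacker would pick something like os.system instead.

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('abc')".
        # The unpickler performs this call while loading.
        return (len, ("abc",))


payload = pickle.dumps(Demo())

# Loading runs the callable that the pickle's author selected.
result = pickle.loads(payload)
print(result)  # 3
```

Note that the loaded value is whatever the callable returned, not a Demo instance.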
Pickle Basics
Pickles are implemented as a very simple assembly language. There are only 68 instructions and they mostly operate on a stack. The instruction names are pretty easy to understand. For example, the instruction empty_dict will push an empty dictionary onto the stack.
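If you want to see these instructions without leaving Python, the standard library's pickletools module can disassemble a pickle. This is only a quick sketch for experimenting; pickletools prints opcode names in upper case (EMPTY_DICT) where r2 shows them in lower case.

```python
import io
import pickle
import pickletools

# Protocol 4 is chosen explicitly so the output does not depend on the default.
data = pickle.dumps({}, protocol=4)

# pickletools.dis only parses the byte stream; it never executes it.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```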
The stack only allows access to the top item, or items in some cases. If you want to grab something else, you must use the memo. The memo is implemented as a dictionary with non-negative integer indexes. You will often see memoize instructions. Naively, the memoize instruction will copy the item at the top of the stack into the next index in the memo. Then, if that item is needed later, a binget n can be used to get the object at index n.
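The memo is easiest to observe by pickling a container that references the same object twice. The sketch below is an illustration using the standard library only; the variable names are arbitrary.

```python
import io
import pickle
import pickletools

shared = {"k": "v"}
# The same dict object appears twice, so the second occurrence
# is fetched from the memo instead of being serialized again.
data = pickle.dumps([shared, shared], protocol=4)

listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())

restored = pickle.loads(data)  # safe here: we built this pickle ourselves
print(restored[0] is restored[1])  # True
```

In the listing, the dictionary is followed by a MEMOIZE, and its second use shows up as a BINGET.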
To learn more about pickles, I recommend playing with some pickles. Enable descriptions in Radare2 with e asm.describe = true to get short descriptions of each instruction. Decompile simple pickles that you build yourself, and see if you can understand the instructions.
Installing Radare2 and r2pickledec
For reversing pickles, our tool of choice is Radare2 (r2 for short). Package managers tend to ship really old r2 versions. In this case that’s probably fine, since I added the pickle arch to r2 a long time ago. But if you run into any bugs, I suggest installing from source.
In this blog post, we will primarily be using our r2pickledec decompiler plugin. I purposely wrote this plugin to rely only on r2 libraries, so if r2 works on your system, r2pickledec should work too. You should be able to install it with r2pm.
$ r2pm -U # update package db
$ r2pm -ci pickledec # clean install
You can verify everything worked with the following command. You should see the r2pickledec help menu.
$ r2 -a pickle -qqc 'pdP?' -
Usage: pdP[j] Decompile python pickle
| pdP Decompile python pickle until STOP, eof or bad opcode
| pdPj JSON output
| pdPf Decompile and set pick.* flags from decompiled var names
Reversing a Real pickle with Radare2 and r2pickledec
Let’s reverse a real pickle. One never reverses without some context, so let’s imagine you just broke into a webserver. The webserver is intended to allow employees of the company to perform privileged actions on client accounts. While poking around, you find a pickle file that is used by the server to restore state. What interesting things might we find in the pickle?
The pickle appears below base64 encoded. Feel free to grab it and play along at home.
$ base64 -i /tmp/blog2.pickle -b 64
gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR
cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT
cmVxdWVzdHMuc3RydWN0dXJlc5SME0Nhc2VJbnNlbnNpdGl2ZURpY3SUk5QpgZR9
lIwGX3N0b3JllIwLY29sbGVjdGlvbnOUjAtPcmRlcmVkRGljdJSTlClSlCiMCnVz
ZXItYWdlbnSUjApVc2VyLUFnZW50lIwWcHl0aG9uLXJlcXVlc3RzLzIuMjguMpSG
lIwPYWNjZXB0LWVuY29kaW5nlIwPQWNjZXB0LUVuY29kaW5nlIwNZ3ppcCwgZGVm
bGF0ZZSGlIwGYWNjZXB0lIwGQWNjZXB0lIwDKi8qlIaUjApjb25uZWN0aW9ulIwK
Q29ubmVjdGlvbpSMCmtlZXAtYWxpdmWUhpR1c2KMB2Nvb2tpZXOUjBByZXF1ZXN0
cy5jb29raWVzlIwRUmVxdWVzdHNDb29raWVKYXKUk5QpgZR9lCiMB19wb2xpY3mU
jA5odHRwLmNvb2tpZWphcpSME0RlZmF1bHRDb29raWVQb2xpY3mUk5QpgZR9lCiM
CG5ldHNjYXBllIiMB3JmYzI5NjWUiYwTcmZjMjEwOV9hc19uZXRzY2FwZZROjAxo
aWRlX2Nvb2tpZTKUiYwNc3RyaWN0X2RvbWFpbpSJjBtzdHJpY3RfcmZjMjk2NV91
bnZlcmlmaWFibGWUiIwWc3RyaWN0X25zX3VudmVyaWZpYWJsZZSJjBBzdHJpY3Rf
bnNfZG9tYWlulEsAjBxzdHJpY3RfbnNfc2V0X2luaXRpYWxfZG9sbGFylImMEnN0
cmljdF9uc19zZXRfcGF0aJSJjBBzZWN1cmVfcHJvdG9jb2xzlIwFaHR0cHOUjAN3
c3OUhpSMEF9ibG9ja2VkX2RvbWFpbnOUKYwQX2FsbG93ZWRfZG9tYWluc5ROdWKM
CF9jb29raWVzlH2UdWKMBGF1dGiUjAVhZG1pbpSMD1BpY2tsZXMgYXJlIGZ1bpSG
lIwHcHJveGllc5R9lIwFaG9va3OUfZSMCHJlc3BvbnNllF2Uc4wGcGFyYW1zlH2U
jAZ2ZXJpZnmUiIwEY2VydJROjAhhZGFwdGVyc5RoFClSlCiMCGh0dHBzOi8vlIwR
cmVxdWVzdHMuYWRhcHRlcnOUjAtIVFRQQWRhcHRlcpSTlCmBlH2UKIwLbWF4X3Jl
dHJpZXOUjBJ1cmxsaWIzLnV0aWwucmV0cnmUjAVSZXRyeZSTlCmBlH2UKIwFdG90
YWyUSwCMB2Nvbm5lY3SUTowEcmVhZJSJjAZzdGF0dXOUTowFb3RoZXKUTowIcmVk
aXJlY3SUTowQc3RhdHVzX2ZvcmNlbGlzdJSPlIwPYWxsb3dlZF9tZXRob2RzlCiM
BVRSQUNFlIwGREVMRVRFlIwDUFVUlIwDR0VUlIwESEVBRJSMB09QVElPTlOUkZSM
DmJhY2tvZmZfZmFjdG9ylEsAjBFyYWlzZV9vbl9yZWRpcmVjdJSIjA9yYWlzZV9v
bl9zdGF0dXOUiIwHaGlzdG9yeZQpjBpyZXNwZWN0X3JldHJ5X2FmdGVyX2hlYWRl
cpSIjBpyZW1vdmVfaGVhZGVyc19vbl9yZWRpcmVjdJQojA1hdXRob3JpemF0aW9u
lJGUdWKMBmNvbmZpZ5R9lIwRX3Bvb2xfY29ubmVjdGlvbnOUSwqMDV9wb29sX21h
eHNpemWUSwqMC19wb29sX2Jsb2NrlIl1YowHaHR0cDovL5RoVymBlH2UKGhaaF0p
gZR9lChoYEsAaGFOaGKJaGNOaGROaGVOaGaPlGhoaG9ocEsAaHGIaHKIaHMpaHSI
aHUojA1hdXRob3JpemF0aW9ulJGUdWJoeH2UaHpLCmh7SwpofIl1YnWMBnN0cmVh
bZSJjAl0cnVzdF9lbnaUiIwNbWF4X3JlZGlyZWN0c5RLHnVijAdiYXNldXJslIwU
aHR0cHM6Ly9leGFtcGxlLmNvbS+UdWIu
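One way to turn the base64 text above back into a binary file is a few lines of Python. This is a sketch: the function name decode_pickle_blob and the file names are assumptions, and the demo only feeds it the first two lines of the blob to show that line wrapping is tolerated.

```python
import base64
import tempfile
from pathlib import Path


def decode_pickle_blob(src: str, dst: str) -> int:
    """Decode a base64 text file into raw bytes and return the byte count."""
    text = Path(src).read_text()
    # By default b64decode discards characters outside the base64
    # alphabet, so the newlines wrapping the blob are not a problem.
    raw = base64.b64decode(text)
    Path(dst).write_bytes(raw)
    return len(raw)


# Demo on a throwaway directory, using only the first two lines of the blob.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "blob.b64"
    dst = Path(tmp) / "test.pickle"
    src.write_text(
        "gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR\n"
        "cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT\n"
    )
    size = decode_pickle_blob(str(src), str(dst))
    head = dst.read_bytes()[:3]

print(size, head.hex())
```

Do not call pickle.loads on a file like this. Decoding it to disk is harmless, loading it is not.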
We decode the pickle and put it in a file, let’s call it test.pickle. We then open the file with r2. We also run x to see some hex and pd to print disassembly. If you ever want to know what an r2 command does, just run the command but append a ? to the end to get a help menu (e.g., pd?).
$ r2 -a pickle test.pickle
-- .-. .- -.. .- .-. . ..---
[0x00000000]> x
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00000000 8004 95bf 0500 0000 0000 008c 1172 6571 .............req
0x00000010 7565 7374 732e 7365 7373 696f 6e73 948c uests.sessions..
0x00000020 0753 6573 7369 6f6e 9493 9429 8194 7d94 .Session...)..}.
0x00000030 288c 0768 6561 6465 7273 948c 1372 6571 (..headers...req
0x00000040 7565 7374 732e 7374 7275 6374 7572 6573 uests.structures
0x00000050 948c 1343 6173 6549 6e73 656e 7369 7469 ...CaseInsensiti
0x00000060 7665 4469 6374 9493 9429 8194 7d94 8c06 veDict...)..}...
0x00000070 5f73 746f 7265 948c 0b63 6f6c 6c65 6374 _store...collect
0x00000080 696f 6e73 948c 0b4f 7264 6572 6564 4469 ions...OrderedDi
0x00000090 6374 9493 9429 5294 288c 0a75 7365 722d ct...)R.(..user-
0x000000a0 6167 656e 7494 8c0a 5573 6572 2d41 6765 agent...User-Age
0x000000b0 6e74 948c 1670 7974 686f 6e2d 7265 7175 nt...python-requ
0x000000c0 6573 7473 2f32 2e32 382e 3294 8694 8c0f ests/2.28.2.....
0x000000d0 6163 6365 7074 2d65 6e63 6f64 696e 6794 accept-encoding.
0x000000e0 8c0f 4163 6365 7074 2d45 6e63 6f64 696e ..Accept-Encodin
0x000000f0 6794 8c0d 677a 6970 2c20 6465 666c 6174 g...gzip, deflat
[0x00000000]> pd
0x00000000 8004 proto 0x4
0x00000002 95bf05000000. frame 0x5bf
0x0000000b 8c1172657175. short_binunicode "requests.sessions" ; 0xd
0x0000001e 94 memoize
0x0000001f 8c0753657373. short_binunicode "Session" ; 0x21 ; 2'!'
0x00000028 94 memoize
0x00000029 93 stack_global
0x0000002a 94 memoize
0x0000002b 29 empty_tuple
0x0000002c 81 newobj
0x0000002d 94 memoize
0x0000002e 7d empty_dict
0x0000002f 94 memoize
0x00000030 28 mark
0x00000031 8c0768656164. short_binunicode "headers" ; 0x33 ; 2'3'
0x0000003a 94 memoize
0x0000003b 8c1372657175. short_binunicode "requests.structures" ; 0x3d ; 2'='
0x00000050 94 memoize
0x00000051 8c1343617365. short_binunicode "CaseInsensitiveDict" ; 0x53 ; 2'S'
0x00000066 94 memoize
0x00000067 93 stack_global
From the above assembly it appears this file is indeed a pickle. We also see requests.sessions and Session as strings. This pickle likely imports requests and uses sessions. Let’s decompile it. We will run the command pdPf @0 ~.. to do so. This takes some explaining though, since it uses a couple of r2’s features.
- pdPf - R2pickledec uses the pdP command (see pdP?). Adding an f causes the decompiler to set r2 flags for every variable name. This will make renaming variables and jumping to interesting locations easier.
- @0 - This tells r2 to run the command at offset 0 instead of the current seek address. This does not matter now because our current offset defaults to 0. I just make this a habit in general to prevent mistakes when I am seeking around to patch something.
- ~.. - This is the r2 version of |less. It uses r2’s built in pager. If you like the real less better, you can just use |less. R2 commands can be piped to any command line program.
Once we execute the command, we will see a Python-like source representation of the pickle. The code is seen below, but snipped. All comments below were added by the decompiler.
## VM stack start, len 1
## VM[0] TOP
str_xb = "__main__"
str_x16 = "Api"
g_Api_x1c = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
str_x54 = "headers"
str_x5e = "requests.structures"
str_x74 = "CaseInsensitiveDict"
g_CaseInsensitiveDict_x8a = _find_class(str_x5e, str_x74)
str_x91 = "_store"
str_x9a = "collections"
str_xa8 = "OrderedDict"
g_OrderedDict_xb6 = _find_class(str_x9a, str_xa8)
str_xbc = "user-agent"
str_xc9 = "User-Agent"
str_xd6 = "python-requests/2.28.2"
tup_xef = (str_xc9, str_xd6)
str_xf1 = "accept-encoding"
...
str_x5c9 = "stream"
str_x5d3 = "trust_env"
str_x5e0 = "max_redirects"
dict_x51 = {
str_x54: what_x16c,
str_x16d: what_x30d,
str_x30e: tup_x32f,
str_x331: dict_x33b,
str_x33d: dict_x345,
str_x355: dict_x35e,
str_x360: True,
str_x36a: None,
str_x372: what_x5c8,
str_x5c9: False,
str_x5d3: True,
str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616
It’s usually best to start reversing at the end with the return line. That is what is being returned from the pickle. Hit G to go to the end of the file. You will see the following code.
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616
The what_x616 variable is getting returned. The what part of the variable indicates that the decompiler does not know what type of object this is. This is because what_x616 is the result of a g_Api_x1c.__new__ call. On the other hand, g_Api_x1c gets a g_ prefix. The decompiler knows this is a global, since it is from an import. It even adds the Api part in to hint at what the import is. The x1c and x616 indicate the offset in the pickle where the object was created. We will use that later to patch the pickle.
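The __new__ and __setstate__ pair in the decompiled output mirrors how unpickling rebuilds an instance without ever calling __init__. The sketch below approximates that behavior in plain Python. The Api class here is a hypothetical stand-in (the real __main__.Api is not shown in this post), and rebuild is an illustrative helper, not part of pickle or r2pickledec.

```python
class Api:
    """Hypothetical stand-in for __main__.Api."""

    def __init__(self):
        raise RuntimeError("__init__ is never called while unpickling")


def rebuild(cls, state):
    # Allocate the instance without running __init__.
    obj = cls.__new__(cls)
    # Use __setstate__ when the class defines one; otherwise copy
    # the state dictionary straight into the instance __dict__.
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)
    else:
        obj.__dict__.update(state)
    return obj


api = rebuild(Api, {"session": None, "baseurl": "https://example.com/"})
print(api.baseurl)
```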
Since we used flags, we can easily rename variables by renaming the flag. It might be helpful to rename the g_Api_x1c to make it easier to search for. Rename the flag with fr pick.g_Api_x1c pick.api. Notice, the flag will tab complete. List all flags with the f command. See f? for help.
Now run pdP @0 ~.. again. Instead of g_Api_x1c you will see api. If you search for its first use, you will find the below code.
str_xb = "__main__"
str_x16 = "Api"
api = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
Naively, _find_class(module, name) is equivalent to _getattribute(sys.modules[module], name)[0]. We can see the module is __main__ and the name is Api. So the api variable is just __main__.Api.
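As a rough illustration of that lookup, the sketch below imports the module and then walks the (possibly dotted) name. The helper find_class_sketch is a hypothetical approximation written for this post, not the code used by pickle or by the decompiler.

```python
import importlib
import sys


def find_class_sketch(module: str, name: str):
    """Rough approximation of how the unpickler resolves a global."""
    # Importing is the risky part: it may run module-level code.
    importlib.import_module(module)
    obj = sys.modules[module]
    # Protocol 4 allows dotted names such as "Outer.Inner".
    for part in name.split("."):
        obj = getattr(obj, part)
    return obj


ordered_dict = find_class_sketch("collections", "OrderedDict")
print(ordered_dict)
```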
In this snippet of code, we see the requests session being imported. You may have noticed the baseurl field in the previous snippet of code. Looks like this object contains a session for making backend API requests. Can we steal something good from it? Googling for “requests session basic authentication” turns up the auth attribute. Let’s look for “auth” in our pickle.
str_x30e = "auth"
str_x315 = "admin"
str_x31d = "Pickles are fun"
tup_x32f = (str_x315, str_x31d)
str_x331 = "proxies"
dict_x33b = {}
...
dict_x51 = {
str_x54: what_x16c,
str_x16d: what_x30d,
str_x30e: tup_x32f,
str_x331: dict_x33b,
str_x33d: dict_x345,
str_x355: dict_x35e,
str_x360: True,
str_x36a: None,
str_x372: what_x5c8,
str_x5c9: False,
str_x5d3: True,
str_x5e0: 30
}
It might be helpful to rename variables for understanding, or run pdP > /tmp/pickle_source.py to get a .py file to open in your favorite text editor. In short though, the above code sets up the dictionary dict_x51 where the auth element is set to the tuple ("admin", "Pickles are fun").
We just stole the admin credentials!
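String constants like these can also be listed without r2 and without loading the pickle, using pickletools.genops from the standard library. The sketch below is an assumption-laden illustration: find_strings is a hypothetical helper, and the sample pickle is built locally rather than taken from the file above.

```python
import pickle
import pickletools

STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def find_strings(data: bytes, needle: str):
    """Return (offset, text) for every string constant containing needle."""
    hits = []
    # genops parses opcodes one by one and never executes the pickle.
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES and needle in arg:
            hits.append((pos, arg))
    return hits


# Hypothetical stand-in for the real file, built locally for the demo.
sample = pickle.dumps({"auth": ("admin", "Pickles are fun")}, protocol=4)
for offset, text in find_strings(sample, "a"):
    print(hex(offset), text)
```

The offsets returned point at the opcode byte, which is the same position r2pickledec encodes in its variable names.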
Patching
Now I don’t recommend doing this on a real pentest, but let’s take things further. We can patch the pickle to use our own malicious webserver. We first need to find the current URL, so we search for “https” and find the following code.
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = api.__new__(api, *())
So the baseurl of the API is being set to https://example.com/. To patch this, we seek to where the URL string is created. We can use the x5fe in the variable name to know where the variable was created, or we can just seek to the pick.str_x5fe flag. When seeking to a flag in r2 you can tab complete the flag. Notice the prompt changes its location number after the seek command.
[0x00000000]> s pick.str_x5fe
[0x000005fe]> pd 1
;-- pick.str_x5fe:
0x000005fe 8c1468747470. short_binunicode "https://example.com/" ; 0x600
Let’s overwrite this URL with https://doyensec.com/. The below Radare2 commands are commented so you can understand what they are doing.
[0x000005fe]> oo+ # reopen file in read/write mode
[0x000005fe]> pd 3 # double check what next instructions should be
;-- pick.str_x5fe:
0x000005fe 8c1468747470. short_binunicode "https://example.com/" ; 0x600
0x00000614 94 memoize
0x00000615 75 setitems
[0x000005fe]> r+ 1 # add one extra byte to the file, since our new URL is slightly longer
[0x000005fe]> wa short_binunicode "https://doyensec.com/"
INFO: Written 23 byte(s) (short_binunicode "https://doyensec.com/") = wx 8c1568747470733a2f2f646f79656e7365632e636f6d2f @ 0x000005fe
[0x000005fe]> pd 3 # double check we did not clobber an instruction
;-- pick.str_x5fe:
0x000005fe 8c1568747470. short_binunicode "https://doyensec.com/" ; 0x600
0x00000615 94 memoize
;-- pick.what_x616:
0x00000616 75 setitems
[0x000005fe]> pdP @0 |tail # check that the patch worked
str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://doyensec.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x617 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x617.__setstate__(dict_x21)
return what_x617
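The same kind of patch can be sketched in pure Python. The helper patch_short_string below is hypothetical and makes two assumptions: the target is a short_binunicode string (at most 255 bytes), and the pickle has a single frame instruction directly after the two-byte proto header. Unlike the r2 session above, it also adjusts the frame length, which is an extra step added in this sketch.

```python
import pickle
import struct

SHORT_BINUNICODE = 0x8C
FRAME = 0x95


def patch_short_string(data: bytes, offset: int, new_value: str) -> bytes:
    """Replace the short_binunicode string that starts at offset."""
    if data[offset] != SHORT_BINUNICODE:
        raise ValueError("offset does not point at a short_binunicode opcode")
    encoded = new_value.encode("utf-8")
    if len(encoded) > 255:
        raise ValueError("new string does not fit in a short_binunicode")
    old_len = data[offset + 1]
    end = offset + 2 + old_len  # first byte after the old string

    patched = bytearray(data[:offset])
    patched += bytes([SHORT_BINUNICODE, len(encoded)]) + encoded
    patched += data[end:]

    # Assumption: one FRAME opcode sits right after the 2-byte PROTO header.
    if len(data) > 11 and data[2] == FRAME:
        (frame_len,) = struct.unpack_from("<Q", data, 3)
        struct.pack_into("<Q", patched, 3, frame_len + len(encoded) - old_len)
    return bytes(patched)


# Locally built stand-in for the real file.
original = pickle.dumps({"baseurl": "https://example.com/"}, protocol=4)
offset = original.index(b"\x8c\x14https://example.com/")
patched = patch_short_string(original, offset, "https://doyensec.com/")
print(pickle.loads(patched))  # safe here: the pickle was built locally
```

As in the r2 output, everything after the patched string moves by one byte, which is why what_x616 became what_x617.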
JSON and Automation
Imagine this is just the first of 100 files and you want to patch them all. Radare2 is easy to script with r2pipe. Most commands in r2 have a JSON variant by adding a j to the end. In this case, pdPj will produce an AST in JSON. This is complete with offsets. Using this you can write a parser that will automatically find the baseurl element of the returned api object, get the offset and patch it.
JSON can also be helpful without r2pipe. This is because r2 has a bunch of built-in features for dealing with JSON. For example, we can pretty print JSON with ~{}, but for this pickle it would produce 1492 lines of JSON. So better yet, use r2’s internal gron output with ~{=} and grep for what you want.
[0x000005fe]> pdPj @0 ~{=}https
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[1][1].value[1].args[0].value[0][1].value[1].args[0].value[10][1].value[0].value = "https";
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[8][1].value[1].args[0].value = "https://";
json.stack[0].value[1].args[0].value[1][1].value = "https://doyensec.com/";
Now we can use the provided JSON path to find the offset of the doyensec.com URL.
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].value}
https://doyensec.com/
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1]}
{"offset":1534,"type":"PY_STR","value":"https://doyensec.com/"}
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}
1534
[0x00000000]> s `pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}` ## seek to address using subcommand
[0x000005fe]> pd 1
;-- pick.str_x5fe:
0x000005fe 8c1568747470. short_binunicode "https://doyensec.com/" ; 0x600
Don’t forget you can pipe to external commands. For example, pdPj |jq can be used to search the AST for different patterns, such as returning all objects where the type is PY_GLOBAL.
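The same kind of search can be sketched in Python once the JSON is loaded. The walker below is a hypothetical helper that makes no assumptions about the AST layout beyond nodes being dictionaries with type, value and offset keys. The sample tree is hand-made: only the PY_STR node shape is taken from the output above, and the surrounding nesting is simplified.

```python
import json


def find_nodes(node, node_type, predicate=lambda value: True):
    """Yield every dict with the given "type" whose "value" passes predicate."""
    if isinstance(node, dict):
        if node.get("type") == node_type and predicate(node.get("value")):
            yield node
        for child in node.values():
            yield from find_nodes(child, node_type, predicate)
    elif isinstance(node, list):
        for child in node:
            yield from find_nodes(child, node_type, predicate)


# Hand-made sample, not real pdPj output.
ast = json.loads(
    '{"stack": [{"value": ['
    '{"offset": 1524, "type": "PY_STR", "value": "baseurl"},'
    '{"offset": 1534, "type": "PY_STR", "value": "https://doyensec.com/"}'
    "]}]}"
)


def is_url(value):
    return isinstance(value, str) and value.startswith("https://")


offsets = [node["offset"] for node in find_nodes(ast, "PY_STR", is_url)]
print(offsets)  # [1534]
```

Passing "PY_GLOBAL" as node_type would collect the imports instead. With r2pipe, the tree would typically come from the JSON output of pdPj @0, but that wiring is left out here because it depends on your setup.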
Conclusion
The r2pickledec plugin simplifies reversing of pickles. Because it is an r2 plugin, you get all the features of r2. We barely scratched the surface of what r2 can do. If you’d like to learn more, check out the r2 book. Be sure to keep an eye out for my next post where I will go into Python pickle obfuscation techniques.