Reversing Pickles with r2pickledec

R2pickledec is the first pickle decompiler to support all instructions up to protocol 5 (the current). In this post we will go over what Python pickles are, how they work and how to reverse them with Radare2 and r2pickledec. An upcoming blog post will go even deeper into pickles and share some advanced obfuscation techniques.

What are pickles?

Pickles are the built-in serialization algorithm in Python. They can turn any Python object into a byte stream so it may be stored on disk or sent over a network. Pickles are notoriously dangerous. You should never unpickle data from an untrusted source. Doing so will likely result in remote code execution. Please refer to the documentation for more details.

Pickle Basics

Pickles are implemented as a very simple assembly language. There are only 68 instructions and they mostly operate on a stack. The instruction names are pretty easy to understand. For example, the instruction empty_dict will push an empty dictionary onto the stack.

The stack only allows access to the top item, or items in some cases. If you want to grab something else, you must use the memo. The memo is implemented as a dictionary with positive integer indexes. You will often see memoize instructions. Naively, the memoize instruction will copy the item at the top of the stack into the next index in the memo. Then, if that item is needed later, a binget n can be used to get the object at index n.

To learn more about pickles, I recommend playing with some pickles. Enable descriptions in Radare2 with e asm.describe = true to get short descriptions of each instruction. Decompile simple pickles that you build yourself, and see if you can understand the instructions.

Installing Radare2 and r2pickledec

For reversing pickles, our tool of choice is Radare2 (r2 for short). Package managers tend to ship really old r2 versions. In this case it’s probably fine, I added the pickle arch to r2 a long time ago. But if you run into any bugs I suggest installing from source.

In this blog post, we will primarily be using our R2pickledec decompiler plugin. I purposely wrote this plugin to only rely on r2 libraries. So if r2 works on your system, r2pickledec should work too. You should be able to instal with r2pm.

$ r2pm -U             # update package db
$ r2pm -ci pickledec  # clean install

You can verify everything worked with the following command. You should see the r2pickledec help menu.

$ r2 -a pickle -qqc 'pdP?' -
Usage: pdP[j]  Decompile python pickle
| pdP   Decompile python pickle until STOP, eof or bad opcode
| pdPj  JSON output
| pdPf  Decompile and set pick.* flags from decompiled var names

Reversing a Real pickle with Radare2 and r2pickledec

Let’s reverse a real pickle. One never reverses without some context, so let’s imagine you just broke into a webserver. The webserver is intended to allow employees of the company to perform privileged actions on client accounts. While poking around, you find a pickle file that is used by the server to restore state. What interesting things might we find in the pickle?

The pickle appears below base64 encoded. Feel free to grab it and play along at home.

$ base64 -i /tmp/blog2.pickle -b 64
gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR
cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT
cmVxdWVzdHMuc3RydWN0dXJlc5SME0Nhc2VJbnNlbnNpdGl2ZURpY3SUk5QpgZR9
lIwGX3N0b3JllIwLY29sbGVjdGlvbnOUjAtPcmRlcmVkRGljdJSTlClSlCiMCnVz
ZXItYWdlbnSUjApVc2VyLUFnZW50lIwWcHl0aG9uLXJlcXVlc3RzLzIuMjguMpSG
lIwPYWNjZXB0LWVuY29kaW5nlIwPQWNjZXB0LUVuY29kaW5nlIwNZ3ppcCwgZGVm
bGF0ZZSGlIwGYWNjZXB0lIwGQWNjZXB0lIwDKi8qlIaUjApjb25uZWN0aW9ulIwK
Q29ubmVjdGlvbpSMCmtlZXAtYWxpdmWUhpR1c2KMB2Nvb2tpZXOUjBByZXF1ZXN0
cy5jb29raWVzlIwRUmVxdWVzdHNDb29raWVKYXKUk5QpgZR9lCiMB19wb2xpY3mU
jA5odHRwLmNvb2tpZWphcpSME0RlZmF1bHRDb29raWVQb2xpY3mUk5QpgZR9lCiM
CG5ldHNjYXBllIiMB3JmYzI5NjWUiYwTcmZjMjEwOV9hc19uZXRzY2FwZZROjAxo
aWRlX2Nvb2tpZTKUiYwNc3RyaWN0X2RvbWFpbpSJjBtzdHJpY3RfcmZjMjk2NV91
bnZlcmlmaWFibGWUiIwWc3RyaWN0X25zX3VudmVyaWZpYWJsZZSJjBBzdHJpY3Rf
bnNfZG9tYWlulEsAjBxzdHJpY3RfbnNfc2V0X2luaXRpYWxfZG9sbGFylImMEnN0
cmljdF9uc19zZXRfcGF0aJSJjBBzZWN1cmVfcHJvdG9jb2xzlIwFaHR0cHOUjAN3
c3OUhpSMEF9ibG9ja2VkX2RvbWFpbnOUKYwQX2FsbG93ZWRfZG9tYWluc5ROdWKM
CF9jb29raWVzlH2UdWKMBGF1dGiUjAVhZG1pbpSMD1BpY2tsZXMgYXJlIGZ1bpSG
lIwHcHJveGllc5R9lIwFaG9va3OUfZSMCHJlc3BvbnNllF2Uc4wGcGFyYW1zlH2U
jAZ2ZXJpZnmUiIwEY2VydJROjAhhZGFwdGVyc5RoFClSlCiMCGh0dHBzOi8vlIwR
cmVxdWVzdHMuYWRhcHRlcnOUjAtIVFRQQWRhcHRlcpSTlCmBlH2UKIwLbWF4X3Jl
dHJpZXOUjBJ1cmxsaWIzLnV0aWwucmV0cnmUjAVSZXRyeZSTlCmBlH2UKIwFdG90
YWyUSwCMB2Nvbm5lY3SUTowEcmVhZJSJjAZzdGF0dXOUTowFb3RoZXKUTowIcmVk
aXJlY3SUTowQc3RhdHVzX2ZvcmNlbGlzdJSPlIwPYWxsb3dlZF9tZXRob2RzlCiM
BVRSQUNFlIwGREVMRVRFlIwDUFVUlIwDR0VUlIwESEVBRJSMB09QVElPTlOUkZSM
DmJhY2tvZmZfZmFjdG9ylEsAjBFyYWlzZV9vbl9yZWRpcmVjdJSIjA9yYWlzZV9v
bl9zdGF0dXOUiIwHaGlzdG9yeZQpjBpyZXNwZWN0X3JldHJ5X2FmdGVyX2hlYWRl
cpSIjBpyZW1vdmVfaGVhZGVyc19vbl9yZWRpcmVjdJQojA1hdXRob3JpemF0aW9u
lJGUdWKMBmNvbmZpZ5R9lIwRX3Bvb2xfY29ubmVjdGlvbnOUSwqMDV9wb29sX21h
eHNpemWUSwqMC19wb29sX2Jsb2NrlIl1YowHaHR0cDovL5RoVymBlH2UKGhaaF0p
gZR9lChoYEsAaGFOaGKJaGNOaGROaGVOaGaPlGhoaG9ocEsAaHGIaHKIaHMpaHSI
aHUojA1hdXRob3JpemF0aW9ulJGUdWJoeH2UaHpLCmh7SwpofIl1YnWMBnN0cmVh
bZSJjAl0cnVzdF9lbnaUiIwNbWF4X3JlZGlyZWN0c5RLHnVijAdiYXNldXJslIwU
aHR0cHM6Ly9leGFtcGxlLmNvbS+UdWIu

We decode the pickle and put it in a file, lets call it test.pickle. We then open the file with r2. We also run x to see some hex and pd to print dissassembly. If you ever want to know what an r2 command does, just run the command but append a ? to the end to get a help menu (e.g., pd?).

$ r2 -a pickle test.pickle
 -- .-. .- -.. .- .-. . ..---
[0x00000000]> x
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  8004 95bf 0500 0000 0000 008c 1172 6571  .............req
0x00000010  7565 7374 732e 7365 7373 696f 6e73 948c  uests.sessions..
0x00000020  0753 6573 7369 6f6e 9493 9429 8194 7d94  .Session...)..}.
0x00000030  288c 0768 6561 6465 7273 948c 1372 6571  (..headers...req
0x00000040  7565 7374 732e 7374 7275 6374 7572 6573  uests.structures
0x00000050  948c 1343 6173 6549 6e73 656e 7369 7469  ...CaseInsensiti
0x00000060  7665 4469 6374 9493 9429 8194 7d94 8c06  veDict...)..}...
0x00000070  5f73 746f 7265 948c 0b63 6f6c 6c65 6374  _store...collect
0x00000080  696f 6e73 948c 0b4f 7264 6572 6564 4469  ions...OrderedDi
0x00000090  6374 9493 9429 5294 288c 0a75 7365 722d  ct...)R.(..user-
0x000000a0  6167 656e 7494 8c0a 5573 6572 2d41 6765  agent...User-Age
0x000000b0  6e74 948c 1670 7974 686f 6e2d 7265 7175  nt...python-requ
0x000000c0  6573 7473 2f32 2e32 382e 3294 8694 8c0f  ests/2.28.2.....
0x000000d0  6163 6365 7074 2d65 6e63 6f64 696e 6794  accept-encoding.
0x000000e0  8c0f 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
0x000000f0  6794 8c0d 677a 6970 2c20 6465 666c 6174  g...gzip, deflat
[0x00000000]> pd
            0x00000000      8004           proto 0x4
            0x00000002      95bf05000000.  frame 0x5bf
            0x0000000b      8c1172657175.  short_binunicode "requests.sessions" ; 0xd
            0x0000001e      94             memoize
            0x0000001f      8c0753657373.  short_binunicode "Session"  ; 0x21 ; 2'!'
            0x00000028      94             memoize
            0x00000029      93             stack_global
            0x0000002a      94             memoize
            0x0000002b      29             empty_tuple
            0x0000002c      81             newobj
            0x0000002d      94             memoize
            0x0000002e      7d             empty_dict
            0x0000002f      94             memoize
            0x00000030      28             mark
            0x00000031      8c0768656164.  short_binunicode "headers"  ; 0x33 ; 2'3'
            0x0000003a      94             memoize
            0x0000003b      8c1372657175.  short_binunicode "requests.structures" ; 0x3d ; 2'='
            0x00000050      94             memoize
            0x00000051      8c1343617365.  short_binunicode "CaseInsensitiveDict" ; 0x53 ; 2'S'
            0x00000066      94             memoize
            0x00000067      93             stack_global

From the above assembly it appears this file is indeed a pickle. We also see requests.sessions and Session as strings. This pickle likely imports requests and uses sessions. Let’s decompile it. We will run the command pdPf @0 ~.... This takes some explaining though, since it uses a couple of r2’s features.

  • pdPf - R2pickledec uses the pdP command (see pdP?). Adding an f causes the decompiler to set r2 flags for every variable name. This will make renaming variables and jumping to interesting locations easier.

  • @0 - This tells r2 to run the command at offset 0 instead of the current seek address. This does not matter now because our current offset defaults to
    1. I just make this a habit in general to prevent mistakes when I am seeking around to patch something.
  • ~.. - This is the r2 version of |less. It uses r2’s built in pager. If you like the real less better, you can just use |less. R2 commands can be piped to any command line program.

Once we execute the command, we will see a Python-like source representation of the pickle. The code is seen below, but snipped. All comments below were added by the decompiler.

## VM stack start, len 1
## VM[0] TOP
str_xb = "__main__"
str_x16 = "Api"
g_Api_x1c = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
str_x54 = "headers"
str_x5e = "requests.structures"
str_x74 = "CaseInsensitiveDict"
g_CaseInsensitiveDict_x8a = _find_class(str_x5e, str_x74)
str_x91 = "_store"
str_x9a = "collections"
str_xa8 = "OrderedDict"
g_OrderedDict_xb6 = _find_class(str_x9a, str_xa8)
str_xbc = "user-agent"
str_xc9 = "User-Agent"
str_xd6 = "python-requests/2.28.2"
tup_xef = (str_xc9, str_xd6)
str_xf1 = "accept-encoding"
...
str_x5c9 = "stream"
str_x5d3 = "trust_env"
str_x5e0 = "max_redirects"
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

It’s usually best to start reversing at the end with the return line. That is what is being returned from the pickle. Hit G to go to the end of the file. You will see the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

The what_x616 variable is getting returned. The what part of the variable indicates that the decompiler does not know what type of object this is. This is because what_x616 is the result of a g_Api_x1c.__new__ call. On the other hand, g_Api_x1c gets a g_ prefix. The decompiler knows this is a global, since it is from an import. It even adds the Api part in to hint at what the import it. The x1c and x616 indicate the offset in the pickle where the object was created. We will use that later to patch the pickle.

Since we used flags, we can easily rename variables by renaming the flag. It might be helpful to rename the g_Api_x1c to make it easier to search for. Rename the flag with fr pick.g_Api_x1c pick.api. Notice, the flag will tab complete. List all flags with the f command. See f? for help.

Now run pdP @0 ~.. again. Instead of g_Api_x1c you will see api. If we search for its first use, you will find the below code.

str_xb = "__main__"
str_x16 = "Api"
api = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)

Naively, _find_class(module, name) is equivalent to _getattribute(sys.modules[module], name)[0]. We can see the module is __main__ and the name is Api. So the api variable is just __main__.Api.

In this snippet of code, we see the request session being imported. You may have noticed the baseurl field in the previous snippet of code. Looks like this object contains a session for making backend API requests. Can we steal something good from it? Googling for “requests session basic authentication” turns up the auth attribute. Let’s look for “auth” in our pickle.

str_x30e = "auth"
str_x315 = "admin"
str_x31d = "Pickles are fun"
tup_x32f = (str_x315, str_x31d)
str_x331 = "proxies"
dict_x33b = {}
...
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}

It might be helpful to rename variables for understanding, or run pdP > /tmp/pickle_source.py to get a .py file to open in your favorite text editor. In short though, the above code sets up the dictionary dict_x51 where the auth element is set to the tuple ("admin", "Pickles are fun").

We just stole the admin credentials!

Patching

Now I don’t recommend doing this on a real pentest, but let’s take things farther. We can patch the pickle to use our own malicious webserver. We first need to find the current URL, so we search for “https” and find the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = api.__new__(g_Api_x1c, *())

So the baseurl of the API is being set to https://example.com/. To patch this, we seek to where the URL string is created. We can use the x5fe in the variable name to know where the variable was created, or we can just seek to the pick.str_x5e flag. When seeking to a flag in r2 you can tab complete the flag. Notice the prompt changes its location number after the seek command.

[0x00000000]> s pick.str_x5fe
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600

Let’s overwrite this URL with https://doyensec.com/. The below Radare2 commands are commented so you can understand what they are doing.

[0x000005fe]> oo+ # reopen file in read/write mode
[0x000005fe]> pd 3 # double check what next instructions should be
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600
            0x00000614      94             memoize
            0x00000615      75             setitems
[0x000005fe]> r+ 1 # add one extra byte to the file, since our new URL is slightly longer
[0x000005fe]> wa short_binunicode "https://doyensec.com/"
INFO: Written 23 byte(s) (short_binunicode "https://doyensec.com/") = wx 8c1568747470733a2f2f646f79656e7365632e636f6d2f @ 0x000005fe
[0x000005fe]> pd 3     # double check we did not clobber an instruction
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600
            0x00000615      94             memoize
            ;-- pick.what_x616:
            0x00000616      75             setitems
[0x000005fe]> pdP @0 |tail      # check that the patch worked
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://doyensec.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x617 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x617.__setstate__(dict_x21)
return what_x617

JSON and Automation

Imagine this is just the first of 100 files and you want to patch them all. Radare2 is easy to script with r2pipe. Most commands in r2 have a JSON variant by adding a j to the end. In this case, pdPj will produce an AST in JSON. This is complete with offsets. Using this you can write a parser that will automatically find the baseurl element of the returned api object, get the offset and patch it.

JSON can also be helpful without r2pipe. This is because r2 has a bunch of built-in features for dealing with JSON. For example, we can pretty print JSON with ~{}, but for this pickle it would produce 1492 lines of JSON. So better yet, use r2’s internal gron output with ~{=} and grep for what you want.

[0x000005fe]> pdPj @0 ~{=}https
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[1][1].value[1].args[0].value[0][1].value[1].args[0].value[10][1].value[0].value = "https";
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[8][1].value[1].args[0].value = "https://";
json.stack[0].value[1].args[0].value[1][1].value = "https://doyensec.com/";

Now we can go use the provided JSON path to find the offset of the doyensec.com URL.

[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].value}
https://doyensec.com/
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1]}
{"offset":1534,"type":"PY_STR","value":"https://doyensec.com/"}
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}
1534
[0x00000000]> s `pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}` ## seek to address using subcomand
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600

Don’t forget you can pipe to external commands. For example, pdPj |jq can be used to search the AST for different patterns. For example, you could return all objects where the type is PY_GLOBAL.

Conclusion

The r2pickledec plugin simplifies reversing of pickles. Because it is a r2 plugin, you get all the features of r2. We barely scratched the surface of what r2 can do. If you’d like to learn more, check out the r2 book. Be sure to keep an eye out for my next post where I will go into Python pickle obfuscation techniques.