Researching Polymorphic Images for XSS on Google Scholar

A few months ago I came across a curious design pattern on Google Scholar. Multiple screens of the web application were fetched and rendered using a combination of location.hash parameters and XHR to retrieve the supposed templating snippets from a relative URI, rendering them on the page unescaped.

Google Scholar's design pattern

This is not dangerous per se, unless the platform lets users upload arbitrary content and serve it from the same origin, which unfortunately Google Scholar does, given its image upload functionality.
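The pattern can be modelled outside the browser. The sketch below is a Python approximation of the client-side logic, not Scholar's actual code: the `view` fragment parameter, the `scholar.example` host and the upload path are all hypothetical. It shows why a relative URI taken from location.hash becomes a problem once user uploads live on the same origin.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def template_url(page_url: str, param: str = "view"):
    """Resolve the snippet URL the page would fetch via XHR.

    `param` is a hypothetical name for the fragment parameter that
    carries the relative URI of the templating snippet.
    """
    fragment = urlsplit(page_url).fragment
    values = parse_qs(fragment).get(param)
    if not values:
        return None
    # A relative reference is resolved against the current page, so it
    # stays on the page's origin unless it explicitly names another host.
    return urljoin(page_url, values[0])


def same_origin(a: str, b: str) -> bool:
    """Compare scheme, host and port (default ports are not normalised)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


page = "https://scholar.example/citations#view=/uploads/avatar.jpg"
target = template_url(page)
print(target, same_origin(page, target))
```

In this model an attacker-controlled upload at that path is fetched with a same-origin request and, in the vulnerable page, injected as markup regardless of its file extension.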

While any penetration tester worth her salt would deem the exploitation of the issue trivial, Scholar’s image processing backend was applying different transformations to the uploaded images (i.e. stripping metadata and reprocessing the picture). When reporting the vulnerability, Google’s VRP team did not consider the upload of a polymorphic image carrying a valid XSS payload possible, and instead requested a PoC||GTFO.

Given the age of this technique, I first went through all past “well-known” techniques to generate polymorphic pictures, and then developed a test suite to investigate the behavior of some of the most popular libraries for image processing (e.g. ImageMagick, GraphicsMagick, libvips). This effort led to the discovery of some interesting caveats. Some of these methods can also be used to conceal web shells or JavaScript content to bypass “self” CSP directives.

Payload in EXIF

The easiest approach is to embed our payload in the metadata of the image. In the case of JPEG/JFIF, these pieces of metadata are stored in application-specific markers (called APPn), but they are not taken into account by the majority of image libraries. Exiftool is a popular tool to edit those entries, but you may find that in some cases the characters will get entity-escaped, so I resorted to inserting them manually. Hoping that Google Scholar would preserve some whitelisted EXIF tags, I created an image with 1.2k common EXIF tags, including CIPA standard and non-standard tags.

JPG having the plain XSS alert() payload in every common metadata field

PNG having the plain XSS alert() payload in every common metadata field

While that didn’t work in my case, some EXIF entries are to this day preserved by many popular web platforms. In most of the image libraries tested, PNG metadata is always kept when converting from PNG to PNG, while it is always lost when converting from PNG to JPG.
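To make the marker layout concrete, here is a minimal Python sketch that inserts a raw APP1 segment after the start-of-image marker and then walks the segments back. It is an illustration under stated assumptions, not the tooling used for the research: the skeleton stream is a structural stand-in rather than a decodable picture, and the APP1 body is the bare payload rather than a well-formed EXIF/TIFF structure.

```python
import struct

SOI = b"\xff\xd8"
APP0, APP1, SOS = 0xE0, 0xE1, 0xDA


def segment(marker: int, data: bytes) -> bytes:
    """Serialize a marker segment; the length field counts its own 2 bytes."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(data) + 2) + data


def insert_after_soi(jpeg: bytes, marker: int, data: bytes) -> bytes:
    """Place a new segment right after the start-of-image marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    return SOI + segment(marker, data) + jpeg[len(SOI):]


def iter_segments(jpeg: bytes):
    """Yield (marker, data) for each segment that precedes the scan data."""
    offset = len(SOI)
    while offset + 4 <= len(jpeg):
        if jpeg[offset] != 0xFF:
            raise ValueError("lost marker synchronisation")
        marker = jpeg[offset + 1]
        if marker == SOS:
            return  # entropy-coded data starts after the SOS header
        (length,) = struct.unpack(">H", jpeg[offset + 2:offset + 4])
        yield marker, jpeg[offset + 4:offset + 2 + length]
        offset += 2 + length


# Structural skeleton only: SOI, a JFIF APP0 header, an empty SOS, EOI.
# It is NOT a decodable picture; real tests would start from a real file.
jfif_header = b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
skeleton = SOI + segment(APP0, jfif_header) + segment(SOS, b"") + b"\xff\xd9"

payload = b"<script>alert(1)</script>"
tagged = insert_after_soi(skeleton, APP1, payload)
for marker, data in iter_segments(tagged):
    print(hex(marker), data)
```

Running the same walk over the file a platform serves back is a quick way to see which segments, if any, survived its processing.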

Payload concatenated at the end of the image (after 0xFFD9 for JPGs or IEND for PNGs)

This technique will only work if no transformations are performed on the uploaded image, since a library that re-encodes the picture only carries over the image content and drops anything after it.

JPG having the plain XSS alert() payload after the trailing 0xFFD9 chunk

PNG having the plain XSS alert() payload after the trailing IEND chunk

As the name suggests, the trick involves appending the JavaScript payload at the end of the image file.
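The PNG variant can be sketched in a few lines of Python. This is a minimal illustration, not the test suite's code: it builds a tiny 1x1 PNG as a carrier, appends a payload, and walks the chunks to recover whatever follows IEND. The same idea applies to JPG files, with the payload placed after the 0xFFD9 marker.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def minimal_png() -> bytes:
    """Build a 1x1 8-bit grayscale PNG to use as a carrier."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))


def png_trailer(blob: bytes) -> bytes:
    """Return the bytes that follow the IEND chunk (empty if none)."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(blob):
        (length,) = struct.unpack(">I", blob[offset:offset + 4])
        kind = blob[offset + 4:offset + 8]
        offset += 12 + length  # length + type + data + CRC
        if kind == b"IEND":
            return blob[offset:]
    raise ValueError("IEND chunk not found")


payload = b"<script>alert(1)</script>"
polyglot = minimal_png() + payload
print(png_trailer(polyglot))
```

Comparing `png_trailer` on the uploaded file and on the file served back shows immediately whether the backend re-encoded the image.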

Payload in PNG’s IDAT

In PNGs, the IDAT chunk stores the pixel information. Depending on the transformations applied, you may be able to directly insert your raw payload in the IDAT chunks or you may try to bypass the resize and re-sampling operations. Google Scholar only generated JPG pictures, so I could not leverage this technique.

Payload in JPG’s ECS

In the JFIF standard, the entropy-coded data segment (ECS) contains the output of the raw Huffman-compressed bitstream which represents the Minimum Coded Unit (MCU) that comprises the image data. In theory, it is possible to position our payload in this segment, but there are no guarantees that our payload will survive the transformation applied by the image library on the server. Creating a JPG image resistant to the transformations caused by the library was a process of trial and error.

As a starting point I crafted a “base” image with the same quality factors as the images resulting from the conversion. For this I ended up using this image having 0-length-string EXIFs. Even though having the payload positioned at a variable offset from the beginning of the section did not work, I found that when processed by Google Scholar the first bytes of the image’s ECS section were kept if separated by a pattern of 0x00 and 0x14 bytes.

Hexadecimal view of the JFIF structure, with the payload visible in the ECS section

From here it took me a little time to find the right sequence of bytes allowing the payload to survive the transformation, since the majority of user agents did not tolerate low-value bytes in the script tag definition of the page. For anyone interested, we have made available the images embedding the onclick and mouseover events. Our image library test suite is available on GitHub as doyensec/StandardizedImageProcessingTest.

Exploitation result of the XSS PoC on Scholar


  • [2019-09-28] Reported to Google VRP
  • [2019-09-30] Google’s VRP requested a PoC
  • [2019-10-04] Provided PoC #1
  • [2019-10-10] Google’s VRP requested a different payload for PoC
  • [2019-10-11] Provided PoC #2
  • [2019-11-05] Google’s VRP confirmed the issue in 2 endpoints, rewarded $6267.40
  • [2019-11-19] Google’s VRP found another XSS using the same technique, rewarded an additional $3133.70

LibreSSL and OSS-Fuzz

The story of a fuzzing integration reward

In my first month at Doyensec I had the opportunity to combine my work with my spare-time hobbies. I used the 25% research time offered by Doyensec to integrate the LibreSSL library into OSS-Fuzz. LibreSSL is an API-compatible replacement for OpenSSL and, after the Heartbleed attack, it has been adopted as a full-fledged replacement for OpenSSL on OpenBSD, macOS and Void Linux.

OSS-Fuzz Fuzzing Process

In connection with this research, Google awarded us a $10,000 bounty, 100% of which was donated to the Cancer Research Institute. The fuzzer also discovered 14+ new vulnerabilities, four of which were directly related to memory corruption.

In the following paragraphs we will walk through the process of porting a new project over to OSS-Fuzz, from following the community-provided steps all the way to the actual code porting. We will also show a vulnerability fixed in 136e6c997f476cc65e614e514ac3bf6ee54fc4b4.

commit 136e6c997f476cc65e614e514ac3bf6ee54fc4b4
Author: beck <>
Date:   Sat Mar 23 18:48:15 2019 +0000

    Add range checks to varios ASN1_INTEGER functions to ensure the
    sizes used remain a positive integer. Should address issue
    13799 from oss-fuzz
    ok tb@ jsing@

 src/lib/libcrypto/asn1/a_int.c    | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/lib/libcrypto/asn1/tasn_prn.c |  8 ++++++--
 src/lib/libcrypto/bn/bn_lib.c     |  4 +++-
 3 files changed, 62 insertions(+), 6 deletions(-)

The FOSS historician blurry book

As a Void Linux maintainer, I’m a long-time LibreSSL user and proponent. LibreSSL is a version of the TLS/crypto stack forked from OpenSSL in 2014 with the goals of modernizing the codebase, improving security, and applying best practice development procedures. The motivation for this kind of fork arose after the discovery of the Heartbleed vulnerability.

LibreSSL’s efforts are aimed at removing code considered useless for the target platforms, removing code smells and including additional secure defaults at the cost of compatibility. The LibreSSL codebase is now roughly 70% of the size of OpenSSL (237558 cloc vs 335485 cloc), while implementing a similar API on all the major modern operating systems.

Forking is considered a Bad Thing not merely because it implies a lot of wasted effort in the future, but because forks tend to be accompanied by a great deal of strife and acrimony between the successor groups over issues of legitimacy, succession, and design direction. There is serious social pressure against forking. As a result, major forks (such as the Gnu-Emacs/XEmacs split, the fissioning of the 386BSD group into three daughter projects, and the short-lived GCC/EGCS split) are rare enough that they are remembered individually in hacker folklore.

Eric Raymond, Homesteading the Noosphere

The LibreSSL effort was generally well received and it now replaces OpenSSL on OpenBSD, on macOS since 10.11 and on several Linux distributions. In the first few years, 6 critical vulnerabilities were found in OpenSSL and none of them affected LibreSSL.

Historically, these kinds of forks tend to spawn competing projects which cannot later exchange code, splitting the potential pool of developers between them. However, the LibreSSL team has largely demonstrated that it can merge and implement new OpenSSL code and bug fixes, all the while slimming down the original source code and cutting down on rarely used or dangerous features.

OSS-Fuzz Selection

While the development of LibreSSL appears to be a story with a happy ending, the integration of fuzzing and security auditing into the project was much less so. The Heartbleed vulnerability was a wake-up call for the industry to tackle the security of the libraries that make up the core of the internet. In particular, Google launched the OSS-Fuzz project. OSS-Fuzz is an effort to provide, for free, Google infrastructure to perform fuzzing against the most popular open source libraries. One of the first projects to be tested this way was in fact OpenSSL.

OSS-Fuzz Fuzzing Process

Fuzz testing is a well-known technique for uncovering programming errors in software. Many of these detectable errors, like buffer overflows, can have serious security implications. OpenSSL included fuzzers in c38bb72797916f2a0ab9906aad29162ca8d53546 and was integrated into OSS-Fuzz later in 2016.

commit c38bb72797916f2a0ab9906aad29162ca8d53546
Refs: OpenSSL_1_1_0-pre5-217-gc38bb72797
Author:     Ben Laurie <>
AuthorDate: Sat Mar 26 17:19:14 2016 +0000
Commit:     Ben Laurie <>
CommitDate: Sat May 7 18:13:54 2016 +0100
    Add fuzzing!

Since LibreSSL and OpenSSL share most of their codebase, with LibreSSL mainly implementing a secure subset of OpenSSL, we thought porting the OpenSSL fuzzers to LibreSSL would be a fun and useful project. Moreover, this resulted in the discovery of several memory corruption bugs.

Note that the following details won’t replace the official OSS-Fuzz guide, but will instead help in selecting a good target project for OSS-Fuzz integration. Generally speaking, applying for a new OSS-Fuzz integration proceeds in four logical steps:

  • Selection: Select a new project that isn’t yet ported. Check for existing projects in OSS-Fuzz projects directory. For example, check if somebody already tried to perform the same integration in a pull-request.
  • Feasibility: Check the feasibility and the security implications of that project on the Internet. As a general guideline, the more impact the project has on the everyday usage of the web the bigger the bounty will be. At the time of writing, OSS-Fuzz bounties are up to $20,000 with the Google patch-reward program. On the other hand, good coverage is expected to be developed for any integration. For this reason it is easier to integrate projects that already employ fuzzers.
  • Technical integration: Follow the super detailed getting started guide to perform an initial integration.
  • Profit: Apply for the Google patch-reward program. Profit?!

We were awarded a bounty, and we helped to protect the Internet just a little bit more. You should do it too!


When a crash is found, the OSS-Fuzz infrastructure provides a minimized test case which can be inspected by an analyst. The issue was found in the ASN1 parser. ASN1 is a formal notation used for describing data transmitted by telecommunications protocols, regardless of language implementation and physical representation of these data, whether complex or very simple. Among other things, it is employed for X.509 certificates, which represent the technical base for building public-key infrastructure.

Passing our test case 0202 ff25 through dumpasn1 shows it erroring out because the integer of length 2 (bytes) is encoded as a negative value. This is not allowed in ASN1, and it should not be allowed in LibreSSL either. However, as discovered by OSS-Fuzz, this test case crashes the LibreSSL parser.

$ xxd ./test
00000000: 0202 ff25                                ...%
$ dumpasn1 ./test
  0   2: INTEGER 65317
       :   Error: Integer is encoded as a negative value.

0 warnings, 1 error.
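The four bytes can also be decoded by hand: 0x02 is the INTEGER tag, the next 0x02 is the content length, and ff 25 are the content bytes. The following Python sketch is independent of both dumpasn1 and LibreSSL and only handles short-form lengths; it prints the unsigned reading that dumpasn1 displays next to the two's-complement reading that triggers its error.

```python
def parse_der_integer(blob: bytes):
    """Return (unsigned, signed) readings of a short-form DER INTEGER."""
    if len(blob) < 2 or blob[0] != 0x02:
        raise ValueError("not an INTEGER (tag 0x02)")
    length = blob[1]
    if length & 0x80:
        raise ValueError("long-form lengths are out of scope for this sketch")
    content = blob[2:2 + length]
    if length == 0 or len(content) != length:
        raise ValueError("truncated or empty INTEGER")
    unsigned = int.from_bytes(content, "big")
    # DER integers are two's complement: a set top bit means negative.
    signed = int.from_bytes(content, "big", signed=True)
    return unsigned, signed


unsigned, signed = parse_der_integer(bytes.fromhex("0202ff25"))
print(unsigned, signed)  # 65317 -219
```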

Since the LibreSSL implementation was not guarded against negative integers, converting the crafted negative ASN1 integer to the internal BIGNUM representation causes an uncontrolled over-read.

    ==1==ERROR: AddressSanitizer: SEGV on unknown address 0x00009fff8000 (pc 0x00000058a308 bp 0x7ffd3e8b7bb0 sp 0x7ffd3e8b7b40 T0)
    ==1==The signal is caused by a READ memory access.
    SCARINESS: 20 (wild-addr-read)
        #0 0x58a307 in BN_bin2bn libressl/crypto/bn/bn_lib.c:601:19
        #1 0x6cd5ac in ASN1_INTEGER_to_BN libressl/crypto/asn1/a_int.c:456:13
        #2 0x6a39dd in i2s_ASN1_INTEGER libressl/crypto/x509v3/v3_utl.c:175:16
        #3 0x571827 in asn1_print_integer_ctx libressl/crypto/asn1/tasn_prn.c:457:6
        #4 0x571827 in asn1_primitive_print libressl/crypto/asn1/tasn_prn.c:556
        #5 0x571827 in asn1_item_print_ctx libressl/crypto/asn1/tasn_prn.c:239
        #6 0x57069a in ASN1_item_print libressl/crypto/asn1/tasn_prn.c:195:9
        #7 0x4f4db0 in FuzzerTestOneInput libressl.fuzzers/asn1.c:282:13
        #8 0x7fd3f5 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:529:15
        #9 0x7bd746 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:286:6
        #10 0x7c9273 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
        #11 0x7bcdbc in main /src/libfuzzer/FuzzerMain.cpp:19:10
        #12 0x7fa873b8282f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
        #13 0x41db18 in _start

This “wild” address read may be employed by malicious actors to leak memory in security-sensitive contexts. The LibreSSL maintainers not only addressed the vulnerability promptly but also included an additional protection to guard against missing ASN1_PRIMITIVE_FUNCS in 46e7ab1b335b012d6a1ce84e4d3a9eaa3a3355d9.

commit 46e7ab1b335b012d6a1ce84e4d3a9eaa3a3355d9
Author: jsing <>
Date:   Mon Apr 1 15:48:04 2019 +0000

    Require all ASN1_PRIMITIVE_FUNCS functions to be provided.

    If an ASN.1 item provides its own ASN1_PRIMITIVE_FUNCS functions, require
    all functions to be provided (currently excluding prim_clear). This avoids
    situations such as having a custom allocator that returns a specific struct
    but then is then printed using the default primative print functions, which
    interpret the memory as a different struct.

Closing the door to strangers

Fuzzing, despite being seen as one of the easiest ways to discover security vulnerabilities, still works very well. Even if OSS-Fuzz is especially tailored to open source projects, it can also be adapted to closed source projects. In fact, at the cost of implementing the LLVMFuzzerTestOneInput interface, it integrates all the latest and greatest clang/llvm fuzzer technology. Since Dockerfiles have greatly simplified things on the devops side, we strongly believe that the OSS-Fuzz fuzzing interface definition should be employed in every non-trivial closed source project too. If you need help, contact us for your security automation projects!
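The contract behind that interface is small: the engine calls a single function with a byte buffer, over and over, and watches for crashes. The sketch below is a Python stand-in for that idea and not libFuzzer itself: `fragile_parse` is a made-up toy target, `one_input` plays the role of the entry point, and the mutation loop is a naive replacement for a coverage-guided engine.

```python
import random


def fragile_parse(data: bytes) -> int:
    """Toy target: trusts a length byte, like a parser missing a range check."""
    if not data:
        return 0
    length = data[0]
    return data[length]  # IndexError when the length field lies


def one_input(data: bytes) -> None:
    """Stand-in for the single entry point a fuzzing engine calls."""
    fragile_parse(data)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte of the seed with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Return the first mutated input that makes the target raise, if any."""
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            one_input(candidate)
        except Exception:
            return candidate
    return None


crash = fuzz(b"\x02abc", 10_000, random.Random(0))
print("crashing input:", crash)
```

A real integration replaces the loop with the engine and keeps only the entry point, which is why the porting cost is low for projects that already have parsers taking a buffer and a length.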

As always, this research was funded thanks to the 25% research time offered at Doyensec. Tune in again for new episodes!

InQL Scanner

InQL is now public!

As a part of our continuing security research journey, we started developing an internal tool to speed up GraphQL security testing efforts. We’re excited to announce that InQL is available on GitHub.

Doyensec Loves GraphQL

InQL can be used as a stand-alone script, or as a Burp Suite extension (available for both Professional and Community editions). The tool leverages GraphQL’s built-in introspection query to dump queries, mutations, subscriptions, fields and arguments, and to retrieve default and custom objects. This information is collected and then processed to construct API endpoint documentation in the form of HTML and JSON schema. InQL is also able to generate query templates for all the known types. The scanner can identify basic query types and replace them with placeholders, so that the query is ready to be ingested by a remote API endpoint.
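The general idea of turning an introspection result into query templates can be sketched as follows. This is not InQL's code: the trimmed introspection query, the placeholder values and the sample schema are assumptions made for illustration, and arguments of unknown type simply fall back to `null`.

```python
import json

# Trimmed-down introspection query; real schemas nest ofType more deeply.
INTROSPECTION_QUERY = """
query {
  __schema {
    queryType { name }
    types {
      name
      fields {
        name
        args { name type { kind name ofType { kind name } } }
        type { kind name ofType { kind name } }
      }
    }
  }
}
"""

# Hypothetical placeholder values, chosen only for this sketch.
PLACEHOLDERS = {"String": '"text"', "ID": '"0"', "Int": "0",
                "Float": "0.0", "Boolean": "false"}


def request_body() -> str:
    """JSON body to POST to a GraphQL endpoint to trigger introspection."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def unwrap(type_ref: dict) -> dict:
    """Strip NON_NULL and LIST wrappers to reach the named type."""
    while type_ref.get("ofType"):
        type_ref = type_ref["ofType"]
    return type_ref


def query_templates(schema: dict) -> dict:
    """Build one query string per field of the root query type."""
    types = {t["name"]: t for t in schema["types"]}
    root = types[schema["queryType"]["name"]]
    templates = {}
    for field in root["fields"] or []:
        args = ", ".join(
            "{}: {}".format(a["name"],
                            PLACEHOLDERS.get(unwrap(a["type"])["name"], "null"))
            for a in field["args"]
        )
        call = "{}({})".format(field["name"], args) if args else field["name"]
        # Composite return types need a selection set; __typename is valid there.
        composite = unwrap(field["type"])["kind"] in ("OBJECT", "INTERFACE", "UNION")
        selection = " { __typename }" if composite else ""
        templates[field["name"]] = "query { " + call + selection + " }"
    return templates


# Canned "__schema" object standing in for a live endpoint's response.
scalar = lambda name: {"kind": "SCALAR", "name": name, "ofType": None}
SAMPLE_SCHEMA = {
    "queryType": {"name": "Query"},
    "types": [{"name": "Query", "fields": [
        {"name": "user",
         "args": [{"name": "id",
                   "type": {"kind": "NON_NULL", "name": None, "ofType": scalar("ID")}}],
         "type": {"kind": "OBJECT", "name": "User", "ofType": None}},
        {"name": "version", "args": [], "type": scalar("String")},
    ]}],
}

templates = query_templates(SAMPLE_SCHEMA)
for name, query in templates.items():
    print(name, "->", query)
```

Each generated string is a syntactically complete query that can be pasted into a request and then refined by hand, which is the workflow the Repeater integration is meant to speed up.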

We believe this feature, combined with the ability to send query templates to Burp’s Repeater, will decrease the time to exploit vulnerabilities in GraphQL endpoints and drastically lower the bar for security research against GraphQL tech stacks.

InQL Scanner Burp Suite Extension

Using the inql extension for Burp Suite, you can:

  • Search for known GraphQL URL paths; the tool will grep and match known values to detect GraphQL endpoints within the target website
  • Search for exposed GraphQL development consoles (GraphiQL, GraphQL Playground, and other common utilities)
  • Use a custom GraphQL tab displayed on each HTTP request/response containing GraphQL
  • Leverage the template generation by sending those requests to Burp’s Repeater tool
  • Configure the tool by using a custom settings tab

Enabling InQL Scanner Extension in Burp

To use inql in Burp Suite, import the Python extension:

  • Download the latest Jython Jar
  • Download the latest version of InQL scanner
  • Start Burp Suite
  • Extender Tab > Options > Python Environment > Set the location of Jython standalone JAR
  • Extender Tab > Extension > Add > Extension Type > Select Python
  • Extension File > Set the location of > Next
  • The output window should display the following message: InQL Scanner Started!

In the near future, we might consider integrating the extension within Burp’s BApp Store.

InQL Demo

We completely revamped the command line interface in light of InQL’s public release. This interface retains most of the Burp plugin functionalities.

It is now possible to install the tool with pip and run it through your favorite CLI.

pip install inql

For all supported options, check the command line help:

usage: inql [-h] [-t TARGET] [-f SCHEMA_JSON_FILE] [-k KEY] [-p PROXY]
            [--header HEADERS HEADERS] [-d] [--generate-html]
            [--generate-schema] [--generate-queries] [--insecure]
            [-o OUTPUT_DIRECTORY]

InQL Scanner

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET             Remote GraphQL Endpoint (https://<Target_IP>/graphql)
  -f SCHEMA_JSON_FILE   Schema file in JSON format
  -k KEY                API Authentication Key
  -p PROXY              IP of web proxy to go through (
  -d                    Replace known GraphQL arguments types with placeholder
                        values (useful for Burp Suite)
  --generate-html       Generate HTML Documentation
  --generate-schema     Generate JSON Schema Documentation
  --generate-queries    Generate Queries
  --insecure            Accept any SSL/TLS certificate
  -o OUTPUT_DIRECTORY   Output Directory

An example query can be performed on one of the numerous exposed APIs, e.g endpoints:

$ inql -t
[+] Writing Queries Templates
 |  Page
 |  Media
 |  MediaTrend
 |  AiringSchedule
 |  Character
 |  Staff
 |  MediaList
 |  MediaListCollection
 |  GenreCollection
 |  MediaTagCollection
 |  User
 |  Viewer
 |  Notification
 |  Studio
 |  Review
 |  Activity
 |  ActivityReply
 |  Following
 |  Follower
 |  Thread
 |  ThreadComment
 |  Recommendation
 |  Like
 |  Markdown
 |  AniChartUser
 |  SiteStatistics
[+] Writing Queries Templates
 |  UpdateUser
 |  SaveMediaListEntry
 |  UpdateMediaListEntries
 |  DeleteMediaListEntry
 |  DeleteCustomList
 |  SaveTextActivity
 |  SaveMessageActivity
 |  SaveListActivity
 |  DeleteActivity
 |  ToggleActivitySubscription
 |  SaveActivityReply
 |  DeleteActivityReply
 |  ToggleLike
 |  ToggleLikeV2
 |  ToggleFollow
 |  ToggleFavourite
 |  UpdateFavouriteOrder
 |  SaveReview
 |  DeleteReview
 |  RateReview
 |  SaveRecommendation
 |  SaveThread
 |  DeleteThread
 |  ToggleThreadSubscription
 |  SaveThreadComment
 |  DeleteThreadComment
 |  UpdateAniChartSettings
 |  UpdateAniChartHighlights
[+] Writing Queries Templates
[+] Writing Queries Templates

The resulting HTML documentation page will contain details for all available queries, mutations, and subscriptions.

Stay tuned!

Back in May 2018, we published a blog post on GraphQL security where we focused on vulnerabilities and misconfigurations. As part of that research effort, we developed a simple script to query GraphQL endpoints. After the publication, we received a lot of positive feedback that sparked even more interest in further developing the concept. Since then, we have refined our GraphQL testing methodologies and tooling. As part of our standard customer engagements, we often perform testing against GraphQL technologies, hence we expect to continue our research efforts in this space. Going forward, we will keep improving detection and make the tool more stable.

This project was made with love in the Doyensec Research island.

Don't Clone That Repo: Visual Studio Code^2 Execution

This is the story of how I stumbled upon a code execution vulnerability in the Visual Studio Code Python extension. It currently has 16.5M+ installs reported in the extension marketplace.

The bug

Some time ago I was reviewing a client’s Python web application when I noticed a warning

VSCode pylint not installed warning

Fair enough, I thought, I just need to install pylint.

To my surprise, after running pip install --user pylint the warning was still there. Then I noticed venv-test displayed on the lower-left of the editor window. Did VSCode just automatically select the Python environment from the project folder?! To confirm my hypothesis, I installed pylint inside that virtualenv and the warning disappeared.

VSCode pylint not installed warning full window screenshot

This seemed sketchy, so I added os.exec("/Applications/") to one of pylint sources and a calculator spawned. Easiest code execution ever!

VSCode’s behaviour is dangerous since the virtualenv found in a project folder is activated without user interaction. Adding a malicious folder to the workspace and opening a Python file inside the project is sufficient to trigger the vulnerability. Once a virtualenv is found, VSCode saves its path in .vscode/settings.json. If that file is found in the cloned repo, the value is loaded and trusted without asking the user. In practice, it is possible to hide the virtualenv in any repository.
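Until the behaviour changes, a freshly cloned repository can be checked before it is opened in the editor. The sketch below is a minimal Python example of such a check, under stated assumptions: the setting names `python.pythonPath` and `python.defaultInterpreterPath` are assumed to be the keys that store the interpreter path (the file layout is not spelled out above), and editor variables such as `${workspaceFolder}` are not expanded.

```python
import json
from pathlib import Path

# Assumed setting names for the interpreter path; treat them as a guess.
INTERPRETER_KEYS = ("python.pythonPath", "python.defaultInterpreterPath")


def workspace_interpreter(repo_root):
    """Return the interpreter path pinned by the repo's settings, if any."""
    settings_file = Path(repo_root) / ".vscode" / "settings.json"
    if not settings_file.is_file():
        return None
    # settings.json may contain comments; json.loads raises on those,
    # which is a useful signal to inspect the file by hand.
    settings = json.loads(settings_file.read_text(encoding="utf-8"))
    if not isinstance(settings, dict):
        return None
    for key in INTERPRETER_KEYS:
        if key in settings:
            return str(settings[key])
    return None


def points_inside_repo(repo_root, interpreter: str) -> bool:
    """True when the pinned interpreter lives inside the cloned tree."""
    root = Path(repo_root).resolve()
    candidate = Path(interpreter)
    if not candidate.is_absolute():
        candidate = root / candidate
    candidate = candidate.resolve()
    return candidate == root or root in candidate.parents


pinned = workspace_interpreter(".")
if pinned is not None:
    print("interpreter pinned by repo:", pinned, points_inside_repo(".", pinned))
```

An interpreter path that resolves inside the repository itself is the case worth a manual look, since that is where a bundled virtualenv would live.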

The behavior is not in VSCode core, but rather in the Python extension. We contacted Microsoft on 2nd October 2019; however, the vulnerability is still not patched at the time of writing. Given that the industry-standard 90 days have expired and the issue is exposed in a GitHub issue, we have decided to disclose the vulnerability.


You can try for yourself! This innocuous PoC repo opens on macOS:

  • 1) git clone
  • 2) add the cloned repo to the VSCode workspace
  • 3) open in VScode

This repo contains a “malicious” settings.json which selects the virtualenv in totally_innocuous_folder/no_seriously_nothing_to_see_here.

In a bare-bones repo like this one, noticing the virtualenv might be easy, but it’s easy to see how one might miss it in a real-life codebase. Moreover, it is certainly undesirable that VSCode executes code from a folder just because a Python file was opened in the editor.

Disclosure Timeline

  • 2nd Oct 2019: Issue discovered
  • 2nd Oct 2019: Security advisory sent to Microsoft
  • 8th Oct 2019: Response from Microsoft, issue opened on vscode-python bug tracker #7805
  • 7th Jan 2020: Asked Microsoft for a resolution timeframe
  • 8th Jan 2020: Microsoft replies that the issue should be fixed by mid-April 2020
  • 16th Mar 2020: Doyensec advisory and blog post is published


  • 17th Mar 2020: The blogpost stated that the extension is bundled by default with the editor. That is not the case, and we removed that claim. Thanks @justinsteven for pointing this out!