I apologize if this information is already known, but I couldn’t find any references to it. I wanted to understand what was going on and share it with you, because I think there is some value in doing so.

If this wasn’t already known, I apologize to the Go team for not contacting them first and jumping the full-disclosure gun (I don’t think the issue is that severe). I really like Go! Thanks for all your great work.

The problem

Last night, I was exploring the contents of Go’s checksum database, and I noticed a curious result:

sqlite> select path, count(path) from modules group by path order by count(path) desc;
github.com/homebrew/homebrew-core|39438
github.com/Homebrew/homebrew-core|30896
github.com/concourse/concourse|25372
github.com/openshift/release|24065
github.com/cilium/cilium|22138

The homebrew/Homebrew case divergence is explained by Go’s documentation (thanks to Filippo Valsorda!):

To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
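To make the rule concrete, here is a minimal Go sketch of the case-encoding step on its own. The function name escapeCase is hypothetical. The real implementation is EscapePath in golang.org/x/mod/module, which also validates the path and rejects characters such as '!', so treat this as an illustration rather than a replacement.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase replaces every ASCII uppercase letter with '!' followed by its
// lowercase form, as in the quoted rule. It performs no other validation.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, p := range []string{"example.com/M", "github.com/Homebrew/homebrew-core"} {
		fmt.Printf("%s -> %s\n", p, escapeCase(p))
	}
	// example.com/M -> example.com/!m
	// github.com/Homebrew/homebrew-core -> github.com/!homebrew/homebrew-core
}
```

This is what lets github.com/Homebrew/homebrew-core be stored on disk as github.com/!homebrew/homebrew-core, next to the all-lowercase path.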

Anyway, this caught my attention because Homebrew is written in Ruby, so I went to check the repository contents.

GitHub language stats confirm it:

[Figure: GitHub language statistics for Homebrew/homebrew-core]

This result is unexpected: the repository shows no trace of Go, yet it has more than 70,000 entries (39,438 + 30,896 across the two case variants) in Go’s checksum database. To be sure, I cloned the repository and searched for any Go-related files, such as go.mod or Go source files; none exist.

So I posted a tweet (technically a toot on Mastodon), got no reply and moved on.

While continuing to explore the database, I noticed another unusual case: github.com/Edu4rdSHL/rust-headless-chrome. It’s just a fork of rust-headless-chrome, and there is nothing remarkable about either the fork or the original, except that both are Rust repositories with, once again, no connection to Go.

Now my curiosity is piqued and the evil mode kicks in. It feels like arbitrary data can be pushed to the checksum database without a connection to Go. Why is the data being pushed? And how is it being pushed? I go to bed thinking about this, which is the most dangerous moment for security research. As I try to fall asleep, I come up with tons of ideas, but I’m usually too tired or lazy to take notes, and so, quite frequently, I can’t remember them in the morning. But this one wasn’t forgotten!

Research

A new day, and a curious mind demanding answers. Why, how, and what if are the most dangerous questions in this field. If a Git repository has nothing to do with Go code, how does it appear in the Go checksum database?

From previous reading of the documentation, I knew that proxy.golang.org is the default module proxy and sum.golang.org the default checksum database. A couple of ripgrep searches in the Go source code returned nothing interesting, so it was time to read Go’s documentation, which is usually quite good.

Where to start? The Go Modules Reference was a great candidate, and there I finally found the answer to my question:

If the go command consults the checksum database, then the first step is to retrieve the record data through the /lookup endpoint. If the module version is not yet recorded in the log, the checksum database will try to fetch it from the origin server before replying.

Okay, this was easy! If the module doesn’t exist in the checksum database (and proxy), the checksum database and proxy infrastructure will fetch it. One of my questions had been how the checksum database retrieves modules, given that they can be hosted anywhere. I couldn’t find anything in the Go code responsible for that (which, in any case, wouldn’t explain how Ruby and Rust code ended up in the database).

So the next logical step is obvious: can I get the Go checksum server to download arbitrary data?

According to the documentation, the endpoint to try this is $base/lookup/$module@$version:

Returns the log record number for the entry about $module at $version, followed by the data for the record (that is, the go.sum lines for $module at $version) and a signed, encoded tree description that contains the record.

First, let’s test it with a known record to see if and how it works:

$ curl https://sum.golang.org/lookup/github.com/homebrew/homebrew-core@v0.0.0-20240524162643-646fe2715a1c
26235981
github.com/homebrew/homebrew-core v0.0.0-20240524162643-646fe2715a1c h1:U32osaj3vZGypOtq7tsIHhZAYNOmiShiXJysIFGTqyM=
github.com/homebrew/homebrew-core v0.0.0-20240524162643-646fe2715a1c/go.mod h1:TM9a6pxWZJZZWuMzxESXhb6yaBaH9JAKDM4wpIzJsDE=

go.sum database tree
26238433
TQyXJYWJL6Z1OnKk5JXLAb9xfWrtHKjAUXKx5UQCa9Q=

— sum.golang.org Az3grm+I35+HBcG+YvxlX+nzkXah3cWlBac/4EytsG24bEHFLrJNvyz5SphrKAHSS0EeDKJXpnb3cvdUtqVSiaNLVAY=

Since the repository doesn’t seem to have any version tags, a pseudo-version is used instead. The Go documentation explains the logic behind pseudo-versions.

The next step is to verify whether a new Go module repository gets added to the checksum database and proxy when we call the lookup endpoint, as described.

After creating a simple new Go module and uploading it to my GitHub account, I issued the lookup request in two different forms: one that doesn’t quite follow the documentation, and another that tries to follow it but is still incorrect. Both return errors, although different ones.

$ curl https://sum.golang.org/lookup/github.com/gdbinit/fluxmatter@latest
bad request: version "latest" is not canonical (wanted "")

$ curl https://sum.golang.org/lookup/github.com/gdbinit/fluxmatter@v0.0.0
not found: github.com/gdbinit/fluxmatter@v0.0.0: invalid version: unknown revision v0.0.0

The errors are somewhat expected, since I hadn’t versioned the module and wasn’t using the correct pseudo-version. But we can still verify whether the new module was fetched as the documentation describes. The easiest way would be to generate the correct pseudo-version and query the checksum database again: if the module was indeed downloaded, the entry would exist and be returned as in the homebrew-core test.

Another way would be to resync my copy of the checksum database and query it for my module:

sqlite> select * from modules where path = 'github.com/gdbinit/fluxmatter';
github.com/gdbinit/fluxmatter|v0.0.0-20240524163826-a7e64ffd69f2|2024-05-24T16:40:51.203837Z

Finally, we can use the proxy’s @latest query to return the latest known version of the module, and then download the module zip to prove that we just stored our arbitrary data in Go’s infrastructure.

$ curl https://proxy.golang.org/github.com/gdbinit/fluxmatter/@latest
{"Version":"v0.0.0-20240524163826-a7e64ffd69f2","Time":"2024-05-24T16:38:26Z","Origin":{"VCS":"git","URL":"https://github.com/gdbinit/fluxmatter","Hash":"a7e64ffd69f2d0751a52736e832a8d77a21059e7"}}

$ curl -O https://proxy.golang.org/github.com/gdbinit/fluxmatter/@v/v0.0.0-20240524163826-a7e64ffd69f2.zip
$ file v0.0.0-20240524163826-a7e64ffd69f2.zip
v0.0.0-20240524163826-a7e64ffd69f2.zip: Zip archive data, at least v2.0 to extract

$ unzip -t v0.0.0-20240524163826-a7e64ffd69f2.zip
Archive:  v0.0.0-20240524163826-a7e64ffd69f2.zip
    testing: github.com/gdbinit/fluxmatter@v0.0.0-20240524163826-a7e64ffd69f2/LICENSE   OK
    testing: github.com/gdbinit/fluxmatter@v0.0.0-20240524163826-a7e64ffd69f2/fluxmatter.go   OK
    testing: github.com/gdbinit/fluxmatter@v0.0.0-20240524163826-a7e64ffd69f2/go.mod   OK
No errors detected in compressed data of v0.0.0-20240524163826-a7e64ffd69f2.zip.

And voilà, everything works! There’s no need to specify a valid version in the lookup (at least for the initial seeding): a lookup query containing the module path and something that looks like a version is enough.

Next, I tried the same with a repository that has no Go code whatsoever, to show that everything works the same way.

$ curl https://sum.golang.org/lookup/github.com/gdbinit/readmem@v0.0.0
not found: github.com/gdbinit/readmem@v0.0.0: invalid version: unknown revision v0.0.0

sqlite> select * from modules where path = 'github.com/gdbinit/readmem';
github.com/gdbinit/readmem|v0.0.0-20131006075740-407cb0a56933|2024-05-24T16:45:35.88456Z

$ curl https://proxy.golang.org/github.com/gdbinit/readmem/@latest
{"Version":"v0.0.0-20131006075740-407cb0a56933","Time":"2013-10-06T07:57:40Z","Origin":{"VCS":"git","URL":"https://github.com/gdbinit/readmem","Hash":"407cb0a569336f98f3772582a31c17aa080caf66"}}

$ curl -O https://proxy.golang.org/github.com/gdbinit/readmem/@v/v0.0.0-20131006075740-407cb0a56933.zip
$ file v0.0.0-20131006075740-407cb0a56933.zip
v0.0.0-20131006075740-407cb0a56933.zip: Zip archive data, at least v2.0 to extract

$ unzip -t v0.0.0-20131006075740-407cb0a56933.zip
Archive:  v0.0.0-20131006075740-407cb0a56933.zip
    testing: github.com/gdbinit/readmem@v0.0.0-20131006075740-407cb0a56933/Entitlements.plist   OK
    testing: github.com/gdbinit/readmem@v0.0.0-20131006075740-407cb0a56933/README   OK
    testing: github.com/gdbinit/readmem@v0.0.0-20131006075740-407cb0a56933/readmem.xcodeproj/project.pbxproj   OK
    testing: github.com/gdbinit/readmem@v0.0.0-20131006075740-407cb0a56933/readmem/main.c   OK
No errors detected in compressed data of v0.0.0-20131006075740-407cb0a56933.zip.

This demonstrates that it’s possible to load arbitrary data into Go’s public proxy. The experiment used GitHub, but it should work with other hosting sites too.

One curious statistic is the number of Go modules hosted on GitHub:

sqlite> select count(distinct path) from modules;
1591375
sqlite> select count(distinct path) from modules where path like 'github.com%';
1515957

Around 95% of the unique paths in sum.golang.org are hosted on GitHub. This is a raw statistic that doesn’t filter out bogus data such as forks and targets that aren’t really Go code, but it still shows how much the Go ecosystem depends on GitHub.
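For reference, the ratio behind that estimate, using the two counts above:

```latex
\frac{1\,515\,957}{1\,591\,375} \approx 0.9526 \approx 95.3\%
```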

The Go authors don’t seem to be completely unaware of this kind of scenario and implemented some restrictions, described in the “File path and size constraints” section of the Go Modules Reference. The most relevant one is:

A module zip file may be at most 500 MiB in size. The total uncompressed size of its files is also limited to 500 MiB. go.mod files are limited to 16 MiB. LICENSE files are also limited to 16 MiB. These limits exist to mitigate denial of service attacks on users, proxies, and other parts of the module ecosystem. Repositories that contain more than 500 MiB of files in a module directory tree should tag module versions at commits that only include files needed to build the module’s packages; videos, models, and other large assets are usually not needed for builds.

500 MiB is more than enough for considerable abuse, and the other limits aren’t really a problem.

Abuse what?

For example, it can be used to bypass download destination restrictions on developer machines and CI/CD servers (assuming there isn’t a private GOPROXY). Malware can simply store payloads and retrieve them from the proxy when needed. And because there are no restrictions on the source domain (as long as it hosts a working VCS), we can load the payloads from anywhere and then make those sources disappear, leaving only a small trace in the checksum database entry.

A Denial of Service (DoS) attack on proxy.golang.org might be challenging to execute. I have shown that we can make the proxy download any random Git repository (and probably repositories from the other supported VCSs). A possible attack would first gather as many GitHub URLs as possible and then issue as many requests as possible to the lookup API. I have no idea how the server is implemented, but I would assume that something similar to a work queue is in place, which would limit the number of requests processed in parallel. Bandwidth protections could also be triggered on GitHub’s side. There is also the possibility of DoS-ing the storage space. I’m just guessing here :-).

A command and control (C2) channel can also easily be implemented on top of this. It’s very easy to find the latest version of any module with the @latest query, so there’s no need to query the checksum database and enumerate all the available versions of our payload. The payload can be a simple file, or it can be disguised inside the go.mod or any other Go source file for extra stealth. A module DGA (Domain Generation Algorithm) can be used to avoid relying on a single repository for the C2. My original goal was to write a sample C2 to demonstrate this, but there isn’t really much to be done here.

To download commands from the C2, the implant just needs to perform the following steps:

  • Make a request to https://proxy.golang.org/module_path/@latest.

  • Parse the JSON result and extract the pseudo-version (or version if used).

  • Make another request to https://proxy.golang.org/module_path/@v/version.zip to download the zip file.

  • Extract the zip contents and parse the commands.

Quite easy to do in 300 or fewer lines of Go code.
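The proxy side of those steps is just the public GOPROXY protocol, the same requests made with curl earlier. Below is a minimal sketch of the first three steps (resolve @latest, parse the JSON, download the zip), leaving out any payload handling. fetchLatestZip, parseLatest and zipURL are hypothetical names, and module paths and versions are assumed to be lowercase; uppercase letters would first need the !-encoding described earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

const proxyBase = "https://proxy.golang.org"

// info mirrors the @latest JSON fields seen in the curl output earlier.
type info struct {
	Version string
	Time    time.Time
	Origin  struct {
		VCS, URL, Hash string
	}
}

// parseLatest decodes an @latest response and returns its version.
func parseLatest(body []byte) (string, error) {
	var i info
	if err := json.Unmarshal(body, &i); err != nil {
		return "", err
	}
	if i.Version == "" {
		return "", fmt.Errorf("no version in response")
	}
	return i.Version, nil
}

// zipURL builds $base/$module/@v/$version.zip (lowercase paths assumed).
func zipURL(module, version string) string {
	return fmt.Sprintf("%s/%s/@v/%s.zip", proxyBase, module, version)
}

// get performs an HTTP GET with a timeout and fails on non-200 responses.
func get(url string) ([]byte, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fetchLatestZip resolves @latest for module and downloads that version's zip.
func fetchLatestZip(module string) (version string, zipData []byte, err error) {
	body, err := get(fmt.Sprintf("%s/%s/@latest", proxyBase, module))
	if err != nil {
		return "", nil, err
	}
	if version, err = parseLatest(body); err != nil {
		return "", nil, err
	}
	zipData, err = get(zipURL(module, version))
	return version, zipData, err
}

func main() {
	// Offline demo using the fluxmatter @latest response shown earlier.
	sample := []byte(`{"Version":"v0.0.0-20240524163826-a7e64ffd69f2","Time":"2024-05-24T16:38:26Z","Origin":{"VCS":"git","URL":"https://github.com/gdbinit/fluxmatter","Hash":"a7e64ffd69f2d0751a52736e832a8d77a21059e7"}}`)
	v, err := parseLatest(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(zipURL("github.com/gdbinit/fluxmatter", v))
	// https://proxy.golang.org/github.com/gdbinit/fluxmatter/@v/v0.0.0-20240524163826-a7e64ffd69f2.zip
}
```

main only parses a saved response and prints the zip URL; calling fetchLatestZip would perform the actual network requests.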

Conclusions

My questions have been answered, and now I understand how the checksum database process works. So far, it’s not a severe issue in Go’s infrastructure; it’s something that can easily be abused, but also improved. Maybe there are reasons (documented or not) to allow non-Go code to be uploaded to the proxy and checksum database. Or maybe someone is already abusing this, and we can go on a treasure hunt through the almost 1.6 million unique repositories (my up-to-date database copy contains almost 22 million entries).

I still wonder why certain legitimate non-Go projects are in the database. Are they being added on purpose? Why? To use Go’s transparency log as a safety backup? Any hints about this?

I had fun with this, and I have a bunch of ideas to further explore. I have a feeling… ;-).

Have fun,
fG!