> and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect
No, what it means is that the source in crates.io doesn't match 1:1 with any commit sha in their project's repo. This is usually because some gitignored file ended up as part of the distributed package, or poor release practice.
This doesn't mean that the project can't build, or that it is being exploited (but it is a signal to look closer).
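That kind of package-vs-repo check can be done mechanically. A minimal sketch (the directory layout is hypothetical; real tooling would also normalize files that cargo rewrites, like `Cargo.toml`): compare the packaged source tree against a repo checkout file by file, and report anything that differs or exists on only one side.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursively collect relative path -> file contents for a source tree.
fn collect(root: &Path, dir: &Path, out: &mut BTreeMap<PathBuf, Vec<u8>>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect(root, &path, out)?;
        } else {
            let rel = path.strip_prefix(root).unwrap().to_path_buf();
            out.insert(rel, fs::read(&path)?);
        }
    }
    Ok(())
}

/// Report files that differ (or exist on only one side) between a
/// packaged source tree and a repo checkout at the tagged commit.
fn diff_trees(packaged: &Path, repo: &Path) -> io::Result<Vec<String>> {
    let mut pkg = BTreeMap::new();
    let mut src = BTreeMap::new();
    collect(packaged, packaged, &mut pkg)?;
    collect(repo, repo, &mut src)?;
    let mut report = Vec::new();
    for (rel, bytes) in &pkg {
        match src.get(rel) {
            None => report.push(format!("only in package: {}", rel.display())),
            Some(other) if other != bytes => report.push(format!("differs: {}", rel.display())),
            _ => {}
        }
    }
    for rel in src.keys().filter(|r| !pkg.contains_key(*r)) {
        report.push(format!("only in repo: {}", rel.display()));
    }
    Ok(report)
}
```

Most hits from a check like this are exactly the "gitignored file ended up in the package" cases described above, which is why they're a signal to look closer rather than proof of compromise.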
Let me start by saying I love rust, but the supply chain story for official rust compiler binaries is not okay, and I wouldn't trust them even on a dev workstation, let alone in production.
Most people get rust from rustup, an unsigned "curl | sh" style magic script.
Whoever controls the dns for rustup.rs, or the webserver, or the BGP nodes between you and the webserver, can just change that at any time, or change it only when requests come from specific IP addresses and backdoor people all day long.
Next, you end up getting binaries of rust that are not reproducible and have no remote attestation of provenance for the build system. Without at least one of those, there is a CI system or workstation somewhere, controlled by one or more people, building those binaries, and it could tamper with them silently at build time. No one can reproduce them, so unless someone takes the time to do some deep-dive binary diffing, tampering is unlikely to be detected any time soon.
And then, we get to the fact the official rust releases are not full source bootstrapped. To build rust 1.94 you need rust 1.93 and so on. That means if you have ever backdoored -any- release of rust in the past using the above weaknesses, you have backdoored all of them via trusting trust attacks where the backdoor always detects when the next version of the rust compiler is being built and copies itself over to the new build.
The way you prove this is not happening within the rust build chain is to bootstrap from another compiler. The rust team does not do this, but thankfully Mutabah made mrustc, a minimal C++ port of the rust compiler suitable for building the actual rust compiler, so we can anchor our supply chain to a C compiler instead.
But now how do you trust the C compiler? Some random compiler from debian is at least signed, but only signed by one person. Another major risk.
So now you need to build your c compiler from source code all the way up, a technique called full source bootstrapping. A tiny bit of human reviewable machine code is used to build a more complex version of itself, all the way up to tinycc, gcc, llvm, and eventually rust. And then you have it all be deterministic and then have many people build any portions of the build chain that have changed every release and all get the same result, and sign that result. THEN we know you are getting a faithful build of the rust compiler from source that no one had the opportunity to tamper with.
Credit where due that Guix did this first, though they still have a much more relaxed supply chain security policy so threat model accordingly.
But how do you know the actual source of the stagex build process was not tampered by an impersonated maintainer that merged their own malicious PR made by a pseudonym bypassing code review? Well we sign every commit, and we sign every PR. Every change must have at least two cryptographic signatures by well known WoT established private keys held on smartcards by maintainers. We simply do not allow merging PRs from randos until another maintainer has signed the PR and then a -different- maintainer can review and do a signed merge. This also means no one can re-write git history, so we can survive a compromise even of the git server itself.
This is something only stagex does as far as we can tell, as our threat model assumes at least one maintainer is compromised at all times.
But, aside from a few large high risk entities, most people are not using stagex or guix built rust and just yolo using a shell script to grab a random binary and start compiling code with it.
I would strongly urge people to stop doing that if you are working on software meant to run on anything more security sensitive than a game console.
With the giant wave of AI bots doing account takeovers and impersonation all the time, github login as the last line of defense is going to keep ending badly.
Use provably correct binaries no single person can tamper with, or build them yourself.
rust fixed memory safety but left build-time trust wide open. What’s the realistic path to fixing this? sandboxed builds by default, or stricter provenance (sigstore-style) or what?
Eh, the only way to secure your Rust programs is the technique not described in the article.
Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "Standard Library" (i.e. crates developed by the Rust team but not included into std) don't bother to audit them. For the other sources, read the code and decide if it's safe.
I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end-user of course). This was more relevant for USG type work where using code sourced from an American is materially different than code sourced from non-American.
If you host your own internal crates.io mirror, I see two ways to stay on top of security issues that have been fixed upstream. Both involve the use of cargo audit and the RustSec advisory database.
Alternative A) would be to redirect the DNS for crates.io in your company internal DNS server to point at your own mirror, and to have your company servers and laptops/workstations all use your company internal DNS server only. And have the servers and laptops/workstations trust a company controlled CA certificate that issues TLS certificates for “crates.io”. Then cargo and cargo audit would work transparently assuming they use the host CA trust store when validating the TLS certificates when they connect to crates.io. The RustSec DB you use directly from upstream, not even mirroring it and hosting an internal copy. Drawback is if you accidentally leave some servers or laptops/workstations using external DNS, and connections are made to the real crates.io instead. Because then developers end up pulling in versions of deps that have not been audited by the company itself and added to the internal mirror.
Alternative B) that I see is to set up the crates host to use a DNS name under your own control. E.g. crates dot your company internal network DNS name. And then set up cargo audit to use an internally hosted copy of the advisory DB that is always automatically kept up to date but has replaced the cargo registry they are referring to to be your own cargo crates mirror registry. I think that should work. It is already very easy to set up your own crates mirror registry, cargo has excellent support built right into it for using crates registries other than or in addition to crates.io. And then you have a company policy that crates.io is never to be used and you enforce it with automatic scanning of all company repos that checks that no entries in Cargo.toml and Cargo.lock files use crates.io.
It would probably be a good idea even to have separate internal crate registries for crates that are from crates.io and crates that are internal to the company itself. To avoid any name collisions and the likes.
Regardless if going with A) or B), you’d then be able to run cargo audit and see security advisories for all your dependencies, while the dependencies themselves are downloaded from your internal mirror of crates.io crates, and where you audit every package source code before adding it in your internal mirror registry.
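For alternative B), cargo's built-in source replacement covers the redirection part without any DNS tricks. A sketch of a `.cargo/config.toml` (the mirror name and URL are hypothetical):

```toml
# Redirect everything that would otherwise come from crates.io
# to the company-internal mirror registry.
[source.crates-io]
replace-with = "company-mirror"

# Hypothetical internal registry; the sparse protocol avoids
# cloning the full index.
[source.company-mirror]
registry = "sparse+https://crates.internal.example.com/index/"
```

With this in a checked-in workspace config, `cargo build` pulls only from the mirror, and the "no crates.io in Cargo.lock" policy becomes easier to enforce.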
A large number of security issues in the supply chain are found in the weeks or months after library version bumps. Simply waiting six months to update dependency versions can skip these. It allows time to pass and for the dependency changes to receive more eyeballs.
Vendoring buys an additional layer of security.
When everyone has Claude Mythos, we can self-audit our supply chain in an automated fashion.
I really like the idea of implementing the std lib separate from the language. I think that would be a huge blessing for Java, Go and others, ideally allowing faster iteration on most things given that we usually don't need a reinvention of the compiler/runtime just to make a better library.
As long as you can include only the parts that you need.
In Java, the "stdlib" that comes with the JRE, like all the java.* classes, counts 0 towards the size of your particular program but everyone has to have the whole JRE installed to run anything. Whereas if you pull in a (maven) dependency, you get the entirety of the dependency tree in your project (or "uberjar" if you package it that way).
Then we could decide on which of java.util.collections, apache commons-collections, google guava etc. become "standard" ...
> I really like the idea of implementing the std lib separate from the language. I think that would be a huge blessing for [...] Go
Go's stdlib is separate from the language. The language spec doesn't specify a standard library at all. It also doesn't have just one stdlib. tinygo's stdlib isn't the same as gc's, for example.
I will note that gc's standard library also isn't written in Go. It is written in a superset with a 'private' language on top that is tied to the gc compiler to support low-level functions that Go doesn't have constructs for. So separating the standard library from the compiler wouldn't really work. No other Go compiler would be able to make sense of it. go1 promise aside, the higher-level packages that are pure Go could be hoisted completely out of the stdlib, granted.
Coding agents should help us reduce dependencies overall. I agree Go is already best positioned as a language for this. Using random dependencies for some small feature seems archaic now.
Why is vendoring frowned upon, really? I mean, the tooling could still know how to fetch newer version and prepare a changeset to review and commit automatically, so updating doesn't have to be any harder. In the end, your code and the libraries get combined together and executed by a computer. So why have two separate version control systems?
Vendoring doesn't entirely solve the problem with hidden malicious code as described in the article, but it gives your static analyzers (and agents) full context out of the box. Also better audit trail when diagnosing the issue.
Go's module system got a lot right, but it's a complete nightmare to work with. The author actually highlighted one of its very worst traits:
> Go got it right from the beginning and didn't use a centralized package registry to manage dependencies, but instead you have to directly point to the source code of the packages.
Directly coupling the identity of a package to its location means that you can't change one without the other. Need to switch to a fork of a dependency? You'll have to touch every single file that imports it. Need to use an organisational local cache for deps? It had better be a transparent proxy or you can't do that. The only support for this is replace statements in go.mod files, but those tie you even further in knots when you need to pull in a dependency that has a replace statement in it.
It's even worse on the maintainer side. If you want to rename a git repository, rename a GitHub organisation, migrate a repo to a different owner, or even move git hosting providers, then you risk breaking every single downstream. The only solution is to host a proxy for your packages on a custom URL that redirects to the backend hosting provider, and set your Go module's name to be on that custom URL. And you have to do this ahead of time, before the package is widely used.
Cargo and crates.io could be better but Go is the worst place to draw inspiration from as it's full of ideas that seem clean, work at first, and break in hard to fix ways once you do anything complex.
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does (I can't imagine about the long tail which receives less attention).
I think this post has some good information in it, but this is essentially an overstatement: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single-line difference (like a timestamp, hash, or some other small skew between the state of the tree at tag-time and the state at release-time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are often trivial.
Not only this, but we can check what the discrepancy is because crates.io distributes source code, not binaries, so it can always be inspected. In the end, what's in crates.io is the source of truth.
Isn't the point that unless actually audited each time, the code could still be effectively anything?
Yes, but that's already the case. My point was that in practice the current discrepancies observed don't represent a complete disconnect between the ground truth (the source repo) and the package index, they tend to be minor. So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
I think it just depends on whether or not you interpret the phrase "no one knows" neutrally or pessimistically.
Saying "no one knows" means there could be something there, not that there is. But it's still true.
If that's the case, it would be a lot simpler (and equally accurate) to say that "no one knows" what the source repo is doing, either! The median consumer of packages in any packaging ecosystem is absolutely not reading the entire source code of their dependencies, in either the ground truth or index form.
That's certainly true - and would also be true (maybe even moreso) if vendoring dependencies was widespread. Seems just as easy to hide things in a "vendored" directory that's 20x the size of the library.
> So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
Noting that you willfully cut the qualifying "virtually" from that quote, thereby turning it into an overstatement:
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does
That wasn't intentional. But also, I don't think "virtually" actually changes the meaning substantially; it has the same conventional meaning in that position as "effectively" or "might as well be nobody."
Serious consideration: Claude Mythos is going to change the risk envelope of this problem.
We're still thinking in the old mindset, whereas new tools are going to change how all of this is done.
In some years dependencies will undergo various types of automated vetting - bugs (various categories), memory, performance, correctness, etc. We need to think about how to scale this problem instead. We're not ready for it.
I specifically don't update the version in Cargo.toml in the codebase. I patch it in just before cargo publish, otherwise all other PRs now need to change.
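That patch-at-publish-time step can be a tiny script. A naive sketch of rewriting the manifest's version line (it assumes the only bare `version = "…"` line in the file is the `[package]` one; a real tool would use a TOML parser):

```rust
/// Rewrite the `version = "…"` line of a Cargo.toml string, as one might
/// do in a release script just before `cargo publish`, so the checked-in
/// manifest never changes and open PRs don't conflict.
/// NOTE: naive line-based approach; assumes the package version is the
/// only line of the form `version = "…"` at the top level.
fn patch_version(manifest: &str, new_version: &str) -> String {
    manifest
        .lines()
        .map(|line| {
            if line.trim_start().starts_with("version = ") {
                format!("version = \"{}\"", new_version)
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```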
> we do know what the code does
You know if you check. Hardly anyone checks. It's just normalization of deviance and will eventually end up with someone exploiting it.
> The single best defense against supply chain attacks is a comprehensive standard library developed by experts, such as Go's one.
Go programs, and Python programs (Python also has a pretty comprehensive standard library), have a lot of dependencies too. A big standard library helps a little, but I'm doubtful it is the "single best defense".
And there are several practical problems with a big standard library, which this article didn't address at all. I think for rust at least, a much better approach would be to have a collection of "blessed" libraries under the umbrella of the Rust Foundation. But that just reduces the risk for a subset of dependencies; it doesn't solve the fundamental risks.
Talked about this topic here on my blog
https://vincents.dev/blog/rust-dependencies-scare-me/
It sparked some interesting discussion by lots of the rust maintainers
https://news.ycombinator.com/item?id=43935067
A fat std lib will definitely not solve the problem. I am a proponent of the rust foundation taking packages under their wing and having them audited and funded while keeping the original maintainers intact.
> fat std lib will definitely not solve the problem
fully agree, that was tried and failed severely
- in Python there is a saying that the standard library is where packages go to die. It's pretty common to pull in 3rd party libraries because the built-in version sucks on its own, whether for UX, performance, bug-proneness, feature completeness, or even "bad security choices we're stuck with for backward compatibility reasons" -- all of these have happened.
- in Java, "batteries included" has repeatedly been involved in pretty bad security vulnerabilities, often along the lines of "that niche feature people may not even be aware of was callable through some reflection/dynamic resolution, leading to an RCE".
In the end, IMHO it's not super relevant whether the rust foundation takes packages under their wing or not. What matters is to create verifiable supply-chain trust.
That crates.io is only meant to contain source code already helps, and them only allowing uploading new packages and yanking but not overwriting them also helps.
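Those publish rules (append-only with yanking, never overwriting) are simple to model. A toy sketch of that invariant (the API is invented for illustration; it is not crates.io's actual implementation):

```rust
use std::collections::HashMap;

/// Toy model of an append-only package index: a (name, version) can be
/// published once and later yanked, but never overwritten or deleted.
struct Registry {
    // (name, version) -> (source bytes, yanked flag)
    entries: HashMap<(String, String), (Vec<u8>, bool)>,
}

impl Registry {
    fn new() -> Self {
        Registry { entries: HashMap::new() }
    }

    /// Publishing an already-existing version is rejected, so a
    /// compromised account can't silently swap out old source.
    fn publish(&mut self, name: &str, version: &str, src: Vec<u8>) -> Result<(), String> {
        let key = (name.to_string(), version.to_string());
        if self.entries.contains_key(&key) {
            return Err(format!("{} {} already exists", name, version));
        }
        self.entries.insert(key, (src, false));
        Ok(())
    }

    /// Yanking hides a version from new resolution but keeps the bytes
    /// around, so past builds can still be audited.
    fn yank(&mut self, name: &str, version: &str) -> bool {
        match self.entries.get_mut(&(name.to_string(), version.to_string())) {
            Some(e) => {
                e.1 = true;
                true
            }
            None => false,
        }
    }
}
```

The point of the invariant is exactly the audit trail: whatever was published stays inspectable forever, even after a yank.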
Though much more is needed.
Go is another example of a fat std lib causing issues, specifically with its crypto code.
I think in general the things people are worried about are
1. Maintainer quits
2. Bad actor becomes new maintainer
3. Bad PR
4. Account compromise
When I say I want the rust foundation to take them under their wing what I really mean is I want the foundation to provide funding and have packages undergo the same procedure as the main language.
If there’s a CVE, the foundation should orchestrate reporting and standardize it.
If it becomes abandoned the foundation should handle that.
Basically I want it to be an extension of the standard but not in a way that actually requires it to be so. I just want these packages to have the seal of approval of the foundation so I know that they have a minimum amount of quality and are vetted regularly by a trusted entity.
We're going to be launching Chainguard Libraries for Rust in a few weeks; this article perfectly calls out the issues.
crates are somewhat better designed than NPM/PyPI (the dist artifacts are source-based), but still much worse than Go, because with crates there's an intermediate packaging step disconnected from the source of truth.
I'm not really convinced that having a few more libraries in the standard library or decentralizing the library repository is going to change the risks much.
> In a recent analysis, Adam Harvey found that among the 999 most popular crates on crates.io, around 17% contained code that do not match their code repository.
Huh, how is this possible? Is the code not pulled from the repository? Why not?
Publishing doesn't go through GitHub or another forge; it's done from the local machine. Crates can contain generated code as well.
This is where the whole TPM / trusted computing / secure enclave could be useful to secure developer keys; an unencrypted .ssh/id_rsa file is just too much of a tempting target (also get off RSA already!)
You don't need the secure boot machinery for that though, a hardware security token would do and has the advantage that you need to acknowledge actions with a tap
Tangentially, soon all those will be replaced with new hardware supporting PQ signatures.
I've started keeping important signing keys in cloud HSM products. Getting AWS KMS to sign a payload is actually very straightforward once you've got your environment variables & permissions set up properly.
Rust should add a way to sandbox every dependency.
It's basically what we're already doing in our OSes (mobile at least), but now it should happen on the level of submodules.
How would that work? Rust "crates" are just a compilation unit that gets linked into the resulting binary.
An extremely verbose effects system can resolve these dependency permissions at compile time.
However, balancing ergonomics is the big challenge.
I personally would prefer less ergonomics for more security, but that’s likely not a broadly shared opinion.
This is a nice exercise for compiler researchers.
I suppose it can be done on various levels, with various performance trade-offs.
More specifically, one can introduce policies into the runtime, or (since this is Rust) hoist at least some of them into compile time. These could do things like:
(a) enforce syscall filtering per crate, or even per function;
(b) support private memory regions for crates (or finer-grained entities) that are only unlocked upon traversing a declared call gate;
(c) the converse, where crates can only access memory they themselves have allocated, except for a whitelist of parameters;
(d) use even heavier calling conventions that RPC to entirely separate processes.
By using the type system. You define your type constraints at the module interface point and when you try to link the third-party module into that interface the compiler ensures that the constraints are satisfied. Same thing the compiler is already doing in simpler cases. If you specify that a third-party library function must return an integer, the compiler will ensure that function won't unexpectedly return a string. Just like that, except the type system is expanded to enable describing more complex behaviours.
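A minimal sketch of that interface-constraint idea, in today's Rust. Everything here (the `Plugin` trait, the `AuditLog` capability, the `WordCount` dependency) is hypothetical, and note the big caveat: the current type system cannot stop a linked crate from calling `std::fs` on its own, so this only illustrates the contract-at-the-interface direction, not a full sandbox.

```rust
/// The only capability the host grants: appending to an audit log.
struct AuditLog(Vec<String>);

impl AuditLog {
    fn record(&mut self, line: &str) {
        self.0.push(line.to_string());
    }
}

/// Contract every third-party "plugin" crate must satisfy.
trait Plugin {
    /// Must return an integer; a version of this function that
    /// returned a String would simply fail to compile against
    /// this interface.
    fn run(&self, input: &str, log: &mut AuditLog) -> i64;
}

/// Stand-in for a third-party dependency.
struct WordCount;

impl Plugin for WordCount {
    fn run(&self, input: &str, log: &mut AuditLog) -> i64 {
        log.record("word_count invoked");
        input.split_whitespace().count() as i64
    }
}

/// The host decides what capabilities the plugin receives.
fn host(p: &dyn Plugin, input: &str) -> (i64, AuditLog) {
    let mut log = AuditLog(Vec::new());
    let n = p.run(input, &mut log);
    (n, log)
}

fn main() {
    let (n, log) = host(&WordCount, "supply chain security");
    assert_eq!(n, 3);
    assert_eq!(log.0.len(), 1);
    println!("words = {n}");
}
```

An expanded type system in the spirit of the parent comments would push more behaviours (I/O, allocation, syscalls) into signatures like `run`'s, so the compiler could reject a dependency that demanded capabilities you never granted.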
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does (I can't imagine about the long tail which receives less attention).
I dug into the linked article, and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect, and not in a remotely reproducible fashion.
https://lawngno.me/blog/2024/06/10/divine-provenance.html
Pulling things into the standard lib is fine if you think everyone should stop using packages entirely, but that doesn't seem like it really does anything to solve the actual problem. There are a number of things it seems like we might be forced to adopt across the board very soon, and for Rust it seems tractable, but I shudder to think about doing it for messier languages like Ruby, Python, Perl, etc.
* Reproducible builds seems like the first thing.
* This means you can't pull in git submodules or anything from the Internet during your build.
* Specifically for the issues in this post, we're going to need proactive security scanners. One thing I could imagine is if a company funnels all their packages through a proxy, you could have a service that goes and attempts to rebuild the package from source, and flags differences. This requires the builds to be remotely reproducible.
* Maybe the latest LLMs like Claude Mythos are smart enough that you don't need reproducible builds, and you can ask some LLM agent workflow to review the discrepancies between the repo and the actual package version.
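The rebuild-and-flag idea in the proxy bullet above can be sketched as a digest comparison. This is a hypothetical illustration: the `Manifest` maps stand in for file digests computed from the published package tarball and from a reproducible rebuild.

```rust
use std::collections::BTreeMap;

/// Map of file path -> content digest for one build of a package.
type Manifest = BTreeMap<String, String>;

/// Compare the published package against a from-source rebuild and
/// report files that differ or exist on only one side.
fn flag_discrepancies(published: &Manifest, rebuilt: &Manifest) -> Vec<String> {
    let mut flags = Vec::new();
    for (path, digest) in published {
        match rebuilt.get(path) {
            Some(d) if d == digest => {} // identical, nothing to report
            Some(_) => flags.push(format!("MODIFIED {path}")),
            None => flags.push(format!("ONLY-IN-PUBLISHED {path}")),
        }
    }
    for path in rebuilt.keys() {
        if !published.contains_key(path) {
            flags.push(format!("ONLY-IN-REBUILD {path}"));
        }
    }
    flags
}

fn main() {
    let published: Manifest = BTreeMap::from([
        ("src/lib.rs".to_string(), "abc".to_string()),
        ("build.rs".to_string(), "tampered".to_string()),
    ]);
    let rebuilt: Manifest = BTreeMap::from([
        ("src/lib.rs".to_string(), "abc".to_string()),
        ("build.rs".to_string(), "abd".to_string()),
    ]);
    let flags = flag_discrepancies(&published, &rebuilt);
    assert_eq!(flags, vec!["MODIFIED build.rs".to_string()]);
    for f in &flags {
        println!("{f}");
    }
}
```

The hard part, as the bullet says, is not this diff but making the rebuild deterministic enough that a flagged difference actually means something.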
> and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect
No, what it means is that the source in crates.io doesn't match 1:1 with any commit sha in their project's repo. This is usually because some gitignored file ended up as part of the distributed package, or poor release practice.
This doesn't mean that the project can't build, or that it is being exploited (but it is a signal to look closer).
Let me start by saying I love Rust, but the supply chain story for official Rust compiler binaries is not okay, and I would never trust them even on a dev workstation, let alone in production.
Most people get rust from rustup, an unsigned "curl | sh" style magic script.
Whoever controls the dns for rustup.rs, or the webserver, or the BGP nodes between you and the webserver, can just change that at any time, or change it only when requests come from specific IP addresses and backdoor people all day long.
Next, you end up getting Rust binaries that are not reproducible and have no remote attestation of provenance for the build system. Without at least one of those, there is a CI or workstation somewhere, controlled by one or more people, that builds those binaries and could silently tamper with them at build time. No one can reproduce them, so until someone has the time to do some deep-dive binary diffing, tampering is unlikely to be detected any time soon.
And then, we get to the fact the official rust releases are not full source bootstrapped. To build rust 1.94 you need rust 1.93 and so on. That means if you have ever backdoored -any- release of rust in the past using the above weaknesses, you have backdoored all of them via trusting trust attacks where the backdoor always detects when the next version of the rust compiler is being built and copies itself over to the new build.
The way you prove this is not happening within the rust build chain, is you bootstrap from another compiler. The rust team does not do this but thankfully Mutabah made mrustc, which is a minimal c++ port of the rust compiler suitable for building the actual rust compiler, so we can anchor our supply chain to a C compiler instead.
But now how do you trust the C compiler? Some random compiler from debian is at least signed, but only signed by one person. Another major risk.
So now you need to build your c compiler from source code all the way up, a technique called full source bootstrapping. A tiny bit of human reviewable machine code is used to build a more complex version of itself, all the way up to tinycc, gcc, llvm, and eventually rust. And then you have it all be deterministic and then have many people build any portions of the build chain that have changed every release and all get the same result, and sign that result. THEN we know you are getting a faithful build of the rust compiler from source that no one had the opportunity to tamper with.
That is how we build and release rust in stagex: https://stagex.tools/packages/core/rust/
Credit where due that Guix did this first, though they still have a much more relaxed supply chain security policy so threat model accordingly.
But how do you know the actual source of the stagex build process was not tampered with by an impersonated maintainer who merged their own malicious PR under a pseudonym, bypassing code review? Well, we sign every commit, and we sign every PR. Every change must have at least two cryptographic signatures by well-known, WoT-established private keys held on smartcards by maintainers. We simply do not allow merging PRs from randos until another maintainer has signed the PR, and then a -different- maintainer can review and do a signed merge. This also means no one can rewrite git history, so we can survive a compromise even of the git server itself.
This is something only stagex does as far as we can tell, as our threat model assumes at least one maintainer is compromised at all times.
But, aside from a few large high risk entities, most people are not using stagex or guix built rust and just yolo using a shell script to grab a random binary and start compiling code with it.
I would strongly urge people to stop doing that if you are working on software meant to run on anything more security sensitive than a game console.
With the giant wave of AI bots doing account takeovers and impersonation all the time, github login as the last line of defense is going to keep ending badly.
Use provably correct binaries no single person can tamper with, or build them yourself.
rust fixed memory safety but left build-time trust wide open. What’s the realistic path to fixing this? sandboxed builds by default, or stricter provenance (sigstore-style) or what?
Eh, the only way to secure your Rust programs is the technique not described in the article.
Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "Standard Library" (i.e. crates developed by the Rust team but not included into std) don't bother to audit them. For the other sources, read the code and decide if it's safe.
I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end-user of course). This was more relevant for USG type work where using code sourced from an American is materially different than code sourced from non-American.
[1]: https://docs.gitea.com/usage/packages/cargo
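For the Rust side, `cargo vendor` already automates the download step: it copies every dependency's source into a local directory and prints the source-replacement stanza to commit. A sketch of the resulting config, where the directory name is just an example:

```toml
# .cargo/config.toml -- after running `cargo vendor third_party/vendor`.
# From here on, builds read dependency sources only from the
# checked-in directory, never from the network.
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "third_party/vendor"
```

Auditing then happens on the vendored tree in your own version control, which is also the copy your build actually uses.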
The only thing this leads to is that you'll have hundreds of vendored dependencies, with a combined size impossible to audit yourself.
But if you somehow do manage that, then you'll soon have hundreds of outdated vendored dependencies, full of unpatched security issues.
> full of unpatched security issues
If you host your own internal crates.io mirror, I see two ways to stay on top of security issues that have been fixed upstream. Both involve the use of cargo audit, which uses the RustSec advisory DB: https://rustsec.org/

Alternative A) would be to redirect the DNS for crates.io in your company-internal DNS server to point at your own mirror, and to have your company servers and laptops/workstations all use your company-internal DNS server only. And have the servers and laptops/workstations trust a company-controlled CA certificate that issues TLS certificates for “crates.io”. Then cargo and cargo audit would work transparently, assuming they use the host CA trust store when validating the TLS certificates when they connect to crates.io. The RustSec DB you use directly from upstream, not even mirroring it or hosting an internal copy. The drawback is if you accidentally leave some servers or laptops/workstations using external DNS, and connections are made to the real crates.io instead. Because then developers end up pulling in versions of deps that have not been audited by the company itself and added to the internal mirror.
Alternative B) that I see is to set up the crates host on a DNS name under your own control, e.g. crates dot your company internal network DNS name. Then set up cargo audit to use an internally hosted copy of the advisory DB that is automatically kept up to date, but with the cargo registry it refers to replaced by your own crates mirror registry. I think that should work. It is already very easy to set up your own crates mirror registry; cargo has excellent support built right in for using registries other than, or in addition to, crates.io. Then you have a company policy that crates.io is never to be used, and you enforce it with automatic scanning of all company repos that checks that no entries in Cargo.toml and Cargo.lock files use crates.io.
It would probably be a good idea even to have separate internal crate registries for crates that are from crates.io and crates that are internal to the company itself. To avoid any name collisions and the likes.
Regardless if going with A) or B), you’d then be able to run cargo audit and see security advisories for all your dependencies, while the dependencies themselves are downloaded from your internal mirror of crates.io crates, and where you audit every package source code before adding it in your internal mirror registry.
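The mirror setup maps onto cargo's built-in source replacement. A sketch of the client-side config, where the registry name and URL are hypothetical placeholders for your internal mirror:

```toml
# .cargo/config.toml, distributed to developer machines and CI.
# "company-mirror" and the URL are placeholders.
[source.crates-io]
replace-with = "company-mirror"

[source.company-mirror]
registry = "sparse+https://crates.internal.example.com/index/"
```

With this in place, `cargo build` resolves crates.io dependencies against the mirror, so only crates you have added to it (i.e. audited) are resolvable.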
A large number of security issues in the supply chain are found in the weeks or months after library version bumps. Simply waiting six months to update dependency versions can skip these. It allows time to pass and for the dependency changes to receive more eyeballs.
Vendoring buys an additional layer of security.
When everyone has Claude Mythos, we can self-audit our supply chain in an automated fashion.
Random question, does cargo have a way to identify if a package uses unsafe Rust code?
No, but you can use cargo-geiger[1] or siderophile[2] for that.
[1]: https://github.com/geiger-rs/cargo-geiger
[2]: https://github.com/trailofbits/siderophile
I really like the idea of implementing the std lib separate from the language. I think that would be a huge blessing for Java, Go and others, ideally allowing faster iteration on most things given that we usually don't need a reinvention of the compiler/runtime just to make a better library.
As long as you can include only the parts that you need.
In Java, the "stdlib" that comes with the JRE, like all the java.* classes, counts 0 towards the size of your particular program but everyone has to have the whole JRE installed to run anything. Whereas if you pull in a (maven) dependency, you get the entirety of the dependency tree in your project (or "uberjar" if you package it that way).
Then we could decide on which of java.util.collections, apache commons-collections, google guava etc. become "standard" ...
> I really like the idea of implementing the std lib separate from the language. I think that would be a huge blessing for [...] Go
Go's stdlib is separate from the language. The language spec doesn't specify a standard library at all. It also doesn't have just one stdlib. tinygo's stdlib isn't the same as gc's, for example.
I will note that gc's standard library also isn't written in Go. It is written in a superset with a 'private' language on top that is tied to the gc compiler to support low-level functions that Go doesn't have constructs for. So separating the standard library from the compiler wouldn't really work. No other Go compiler would be able to make sense of it. go1 promise aside, the higher-level packages that are pure Go could be hoisted completely out of the stdlib, granted.
Why not pin your packages? And why not have M of N auditors sign off on releases?
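On pinning: cargo already records exact resolved versions in Cargo.lock, and Cargo.toml can additionally force an exact version with an `=` requirement. The crate and version below are purely illustrative:

```toml
# Cargo.toml -- "=" pins an exact version rather than the default
# caret (semver-compatible) range; crate and version are examples.
[dependencies]
serde = "=1.0.210"
```

This stops silent minor/patch bumps, though it does nothing by itself about the M-of-N review problem.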
Also of note: the .rs TLD is Serbia's, whose registry could simply redirect all Rust users to malicious domains in a supply chain attack.
How realistic is it for a TLD “owner” to take over a domain like this?
Doesn't USA do that all the time with .com and such?
How would that get around the SSL certificate?
If you control the domain, LetsEncrypt will happily issue you a fresh certificate.
But it's impossible to have a buffet overflow in rust
> But it's impossible to have a buffet overflow in rust
I dunno, I can only listen to Margaritaville so many times in a row.
That is why you mix in "Something So Feminine About A Mandolin" in once in a while. Or if you really insist on only very well known tunes "Cheese Burger in Paradise" should still count.
Coding agents should help us reduce dependencies overall. I agree Go is already best positioned as a language for this. Using random dependencies for some small feature seems archaic now.
Why is vendoring frowned upon, really? I mean, the tooling could still know how to fetch newer version and prepare a changeset to review and commit automatically, so updating doesn't have to be any harder. In the end, your code and the libraries get combined together and executed by a computer. So why have two separate version control systems?
Vendoring doesn't entirely solve the problem with hidden malicious code as described in the article, but it gives your static analyzers (and agents) full context out of the box. Also better audit trail when diagnosing the issue.
I agree. Also, very different world from Rust, but shadcn has popularized this for UI components and AI skills are done this way frequently.
I'm excited to see more patterns like this for other types of code.