Adding Build Provenance to Homebrew (trailofbits.com)
93 points by ingve on Dec 3, 2023 | 41 comments


The post mentions SLSA level 2. Here are the SLSA levels, for those not familiar with them:

https://slsa.dev/spec/v1.0/levels

Here's the full spec from its beginning:

https://slsa.dev/spec/v1.0/

For those saying we should have verified reproducible builds: yes, that'd be great! This effort does not conflict with reproducible builds. However, some builds are hard to reproduce. Making sure you protect the build process itself is a great first step.


> For those saying we should have verified reproducible builds: yes, that'd be great! This effort does not conflict with reproducible builds.

Well, we can hope, anyway. "Provenance" necessarily involves injecting secret data into the build process so that it can't be reproduced by third parties.

It's mostly a question of how careful you are to keep the keys / signatures completely separate from all of the other build inputs and outputs. Some code signing systems make that pretty difficult.


Provenance is NOT injecting secret data into the build process. Provenance (scoped to supply chain security) is a document that describes the process an artifact goes through to become an artifact, including all steps such as testing, GRC, etc.

in-toto is a great way to describe provenance. I talk about it in the CNCF blog article: https://www.cncf.io/blog/2023/08/17/unleashing-in-toto-the-a...
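
For illustration, here's roughly what a minimal in-toto/SLSA-style provenance statement looks like. The field layout follows the in-toto attestation and SLSA provenance specs; the subject, digest, and parameter values are made up:

    import json

    # Hypothetical values; structure follows the in-toto Statement and
    # SLSA provenance predicate specs.
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": "python-3.12.0.bottle.tar.gz",
             "digest": {"sha256": "c0ffee..."}},
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "externalParameters": {"ref": "refs/heads/master"},
                "resolvedDependencies": [
                    {"uri": "git+https://github.com/Homebrew/homebrew-core"},
                ],
            },
            "runDetails": {"builder": {"id": "https://github.com/actions"}},
        },
    }
    print(json.dumps(statement, indent=2))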

Disclaimer: I am a member of the in-toto steering committee and the CEO of a software supply chain startup, Testifysec. https://github.com/in-toto/witness is our project


The context here is signed build provenance, which does involve injecting a secret (or, more accurately, a publicly verifiable credential that only an a priori trusted party can mint or otherwise issue) into the context that the provenance belongs to.

You're right that provenance itself doesn't require this, but that is principally because it punts on the problem of authenticity. Whether or not authenticity matters probably depends on the value and scope of the provenance's use :-)


What are some of the systems that make that difficult?


Isn’t what we really need reproducible binaries?

The problem of reproducible (and thus verifiable) software is 99% solved practically with languages that build reproducible binaries.

Requiring at least one human to build and sign a production release (which we know will be identical to what is produced by CI) is probably a reasonable way to solve this problem.

While this usually implies static linking, even a dynamic link would work if we had a way to ask for a dynamic lib by content hash, and we aggressively shunted dynamic per-platform content to other runtime files.
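
A rough sketch of the content-hash idea for dynamic libs, just to illustrate the lookup (the store layout and function name are hypothetical):

    import ctypes
    import hashlib
    import pathlib

    def load_by_hash(path: str, expected_sha256: str) -> ctypes.CDLL:
        """Load a shared library only if its contents match a pinned hash."""
        digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
        if digest != expected_sha256:
            raise ValueError(f"refusing to load {path}: digest {digest}")
        return ctypes.CDLL(path)

    # The binary would embed the expected digest rather than a bare soname,
    # so any substituted library fails closed.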


Reproducibility has two operational constraints: it assumes the user is actually reproducing the binary (not the case in Homebrew, where the binary is pre-built and distributed to users), and it assumes that users have some way to ascertain whether the build has actually reproduced (versus comparing it against a digest delivered by the potentially malicious index, which could simply advertise whatever malicious build it intends to serve).

Homebrew packages whatever software its users want; artificially limiting their packages to only languages that make reproducibility easy would significantly diminish its value.

Edit: to be absolutely clear, reproducibility is a great idea, and a great long term goal. It is not a sufficient substitute for digital signatures, especially not in binary distribution schemes like Homebrew.
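
(To make the second constraint concrete, a minimal sketch: the comparison is only meaningful if the reference digest arrives out-of-band, not from the index itself.)

    import hashlib

    def sha256_file(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # "reference" must come from a channel the index operator doesn't
    # control (e.g. a transparency log); comparing against the index's
    # own digest only verifies that the index is self-consistent.
    def reproduced(local_build: str, reference: str) -> bool:
        return sha256_file(local_build) == reference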


I'm not sure I understand this line of argument. A system where every user rebuilds all packages themselves, just to use the resulting file hashes to verify whether they can trust an identical binary they already downloaded from the Internet, is obviously kind of goofy.

Instead, I think the use case for reproducibility is allowing multiple trusted parties to independently build and generate signatures for the same binary. In GP's example, open source contributors could run their own builds independently (maybe on air-gapped machines) to produce multiple trustworthy signatures for the resulting binaries. F-Droid does this kind of thing today; you can get apps verified by both the original author and the F-Droid infrastructure.

I think the main difference between that and your scheme is your scheme relies on whoever runs the CI/CD machines (Microsoft?) not tampering with them. But the underlying signature technology is probably still useful in a reproducible-builds world—assuming you don't break the reproducibility of the bottles by e.g. embedding the signatures into the binaries somehow.


> Instead, I think the use case for reproducibility is allowing multiple trusted parties to independently build and generate signatures for the same binary. In GP's example, open source contributors could run their own builds independently (maybe on air-gapped machines) to produce multiple trustworthy signatures for the resulting binaries. F-Droid does this kind of thing today; you can get apps verified by both the original author and the F-Droid infrastructure.

You can do this, but I don’t think it has much of a value (or security) proposition on its own. The ideal world here would be “reproducible signed builds + occasional independent rebuilds with gossiping,” but that is significantly harder to operationalize than just build signing or reproducibility.


> You can do this, but I don’t think it has much of a value (or security) proposition on its own.

The value prop is pretty straightforward in the open-source space: allowing clients to choose to only trust binaries signed by N independent actors.

F-Droid achieves this today: if an application needs to be built/signed by both the original author and F-Droid to be considered valid, neither party can push a compromised binary on their own. ("Compromised" in the sense of "doesn't match the source code it claims to", which AFAICT is all you're targeting with this Homebrew scheme too.)

GNU Guix is working on a similar scheme where the client can choose to only accept binaries built by K of N "substitute servers". And even today, they ship a one-click "challenge" command that rebuilds a bunch of binaries to report whether a substitute server is shipping something suspicious; users can then gossip about this via their favorite social media platform.

I'd argue that both of these solutions create a lot more value than your scheme, where most of the people who would previously be able to tamper with the single, central instance of the build infrastructure are still able to do that without being detected.


> The value prop is pretty straightforward in the open-source space: allowing clients to choose to only trust binaries signed by N independent actors.

That’s the gossiping part: unless you’re independently contacting each of those independent actors, you’re doing no more than 1-of-N.

As I’ve said in other comments: “can” translates to “won’t” in cryptographic contexts. Gossip does not matter if it doesn’t have controls behind it.

> where most of the people who would previously be able to tamper with the build infrastructure are still able to do that without being detected.

Could you provide an example scenario? Assuming the people you’re referring to are Homebrew’s own maintainers, this is not true: tampering would require them to modify the workflow, which cascades to a change in claims, which in turn is detectable (and is compelled to be detectable, thanks to both a transparency log and how the underlying identity construction works).


> As I’ve said in other comments: “can” translates to “won’t” in cryptographic contexts. Gossip does not matter if it doesn’t have controls behind it.

Yes, so you... code the client to check for the presence of multiple signatures (in the F-Droid case where the F-Droid service knows about the original author signature) or ship the URLs for two of the independent actors in the default configuration (in the GNU Guix case)? This does not seem like a hard problem.
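
Minimal sketch of that client-side check, assuming detached Ed25519 signatures and pinned signer keys (the signer names are hypothetical):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PublicKey,
    )

    def valid_signers(artifact: bytes, sigs: dict[str, bytes],
                      pinned_keys: dict[str, bytes]) -> set[str]:
        """Return which pinned signers produced a valid signature."""
        ok = set()
        for name, raw_key in pinned_keys.items():
            sig = sigs.get(name)
            if sig is None:
                continue
            try:
                Ed25519PublicKey.from_public_bytes(raw_key).verify(sig, artifact)
                ok.add(name)
            except InvalidSignature:
                pass
        return ok

    # e.g. require both the upstream author and the build farm:
    # assert {"author", "fdroid"} <= valid_signers(blob, sigs, keys)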

> Could you provide an example scenario?

Whatever person or company runs the box that builds Homebrew packages tampers with that box.

Maybe Microsoft gets a National Security Letter, or has a tenant isolation bug. Or I dunno, the last time I seriously worked with Homebrew they were running Jenkins on a Mac Mini in some random colo facility, and the maintainers definitely had root on that thing.


> code the client to check for the presence of multiple signatures (in the F-Droid case where the F-Droid service knows about the original author signature)

This trusts the client for both parties, while also treating the client as a potentially malicious party. This puts you back at 1-of-N.

> or ship the URLs for two of the independent actors in the default configuration (in the GNU Guix case)

This gets very unreliable, very fast: independent verifiers go down, stop verifying, &c. Even extremely well capitalized companies struggle to run reliable timestamping and similar services; I don’t think it would be responsible to tie Homebrew’s availability to the accessibility of a small handful of external, bespoke hosted services.

This is all to say: checking multiple points of trust without implicitly collapsing them into a single point of trust is, in fact, a pretty hard problem. Especially when you throw external unreliabilities into the mix.

To the best of my knowledge, all Homebrew bottles are currently built on GitHub Actions; there hasn’t been a Jenkins box in a while. The entire point of this work is to bootstrap signatures on the latent identities that GitHub Actions provides, since doing so avoids nearly all of the normal logistical issues that pop up when doing codesigning.

The point about tenant isolation is a good one.


> This trusts the client for both parties, while also treating the client as a potentially malicious party. This puts you back at 1-of-N.

Is this just the "well how do you get a trustworthy copy of the software you rely on to tell you whether copies of software are trustworthy" question? Since obviously once you have that software, it can download as many different signatures as it wants from the F-Droid service and validate them locally without needing to trust the F-Droid service; that is the point of digital signatures.

That seems like it'd be an issue for your thing as well, since the two available ways to install Homebrew itself are curl|bash and downloading/running an installer from a GitHub Releases page. Then again, if the security model here revolves around axiomatically trusting Microsoft (and also believing Microsoft is the only entity in the world capable of running CI/CD infrastructure), I guess that's not an issue.


I don't think every user has to reproduce the build; it should suffice to have a handful of verifiers building and checking the published versions. Someone needs to do it, but not everyone.

(Edit: yes, you do need a way to make sure everyone is seeing the same published hashes; no idea how to do that)


> I don't think every user has to reproduce the build; it should suffice to have a handful of verifiers building and checking the published versions. Someone needs to do it, but not everyone.

This works, but you still need a gossiping mechanism (and accompanying threshold model, etc.). That gets very complicated very fast!


Yup, I thought it was clear in my comment: the software producers should be building and signing THEIR copy from their machine.

Seeing the same public hashes can work just like people checking shasums now; it's also possible to independently verify at any time.


> Seeing the same public hashes can work just like people checking shasums now; it's also possible to independently verify at any time.

“Verify at any time” devolves to “verify never.” Verification needs to be mandatory to have any security value, which comes back to the original point about operationalization.


See the other comment; the idea is that the person producing the binary signs a version as an exemplar of sorts.

Also, there's no need to limit the publishing of unsigned packages at first; there needs to be a transition period. The problem is that if no one makes the hard decision to make this a priority then it never becomes one, and we keep operating on hope and sheer luck.

Hope and luck have definitely worked great thus far, but maybe we can work differently going forward.


Homebrew Project leader here.

Homebrew tries fairly hard already to make builds reproducible. Unfortunately, no one has dedicated enough build infrastructure to make sure this doesn't regress, so we end up with issues like https://github.com/Homebrew/brew/issues/16012

To be clear, though, some Homebrew packages are already fully reproducible across OSs, some only on the same OS, and we welcome any help in improving this further. Thanks for raising the issue.



Thanks for the often thankless work you do!

I just want to be clear, this wasn’t something that I thought was wrong with homebrew, it was more about the ecosystem embracing reproducibility so it’s easier for you all and similar projects, overall!

I personally think that approaches like nix/guix ask for the forest when they only need the tree — I often only care if the software I built is reproducible, and I don’t think that always requires the entire package store/OS to be.


It's called Nix and it works on macOS. https://nixos.org/

It's honestly what made macOS almost bearable for me prior to convincing my employer to allow me to use Linux.


This is my project; I’m happy to answer any questions about it!


> Once provenance on homebrew-core is fully deployed, a user who runs brew install python will be able to prove each of the following:

> [...]

> The bottle was built in a public, auditable, controlled CI/CD environment against a specific source revision.

Isn't it more precisely like, "the bottle was built on a machine with a signing key"? You could probably sign with a HSM to tie it to a machine, but how do you go from that to proving that the machine only built a bottle using what's in source control (vs an attacker getting code execution in the build server and building+signing anything they want)?


This is a good question, thanks for asking it.

The specific scope here is a GitHub Actions workflow, which presents a set of claims (username, repository, workflow name, git commit, etc.) that are bound into the digital signature. In other words: the signing key’s intrinsic “identity” includes a stable identifier for the workflow and source code that ran it.

This doesn’t stop an attacker with access to the workflow from tampering with the build (nothing can stop that, including an HSM), but it does produce a game-theoretic disincentive: the signing identity will reflect any changes made to the workflow by the attacker, meaning that they cannot simultaneously maintain stealth and craft a malicious signature.

(This assumes that the build workflow itself is relatively hermetic, i.e. is reproducible and doesn’t pull random executables from the Internet. But that’s table stakes!)
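
(For the curious, a sketch of what "bound into the digital signature" means mechanically: the workflow claims land as X.509 extensions on the short-lived signing certificate. The OIDs below are the commonly documented Sigstore/Fulcio ones; treat the exact set and their encodings as assumptions.)

    from cryptography import x509

    # Commonly documented Sigstore/Fulcio extension OIDs (assumption).
    FULCIO_OIDS = {
        "1.3.6.1.4.1.57264.1.1": "oidc issuer",
        "1.3.6.1.4.1.57264.1.4": "workflow name",
        "1.3.6.1.4.1.57264.1.5": "repository",
        "1.3.6.1.4.1.57264.1.6": "ref",
    }

    def signing_identity(cert_pem: bytes) -> dict[str, str]:
        """Pull the workflow claims out of a Fulcio-issued certificate."""
        cert = x509.load_pem_x509_certificate(cert_pem)
        claims = {}
        for ext in cert.extensions:
            label = FULCIO_OIDS.get(ext.oid.dotted_string)
            if label is not None:
                # Extensions the library doesn't recognize expose raw bytes.
                claims[label] = ext.value.value.decode("utf-8", "replace")
        return claims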


> the signing identity will reflect any changes made to the workflow by the attacker,

Why? I'm not super familiar with GitHub Actions specifically, but AIUI there's a build B with properties (like the git hash(es) that went in, user, repo, etc.) P(B), then a signature S(P(B)) that gets either embedded in the final artifact or shipped next to it (I think the former, but it doesn't really matter). So an attacker with code execution on the build host can write a fake P(B) that claims to represent the right repo+workflow and a plausible git commit hash, combine it with their own compiled binaries, and sign that combination. With reproducible builds you could expose this after the fact by showing that that git commit does not build that binary, but I'm not following how end users can tell without doing the compile themselves.

Edit: So reading more docs, I think you're going to meet L2 easily, but the plain language claim sounds like L3. However, it kinda reads like L3 just relies on github promising that the build machine won't get compromised, which... I guess if you accept that precondition it works.


The piece you’re missing is that P(B) is itself digitally signed, in the form of an OIDC token from the GitHub Actions IdP. An attacker can’t contrive it without access to the IdP’s key material.

(This is all compatible with reproducible builds, for the reason you’ve noted.)

And yes: Build L2 is the goal here. L3 assumes trusted hardware, which is a much higher bar to clear.
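
(A sketch of that verification step using PyJWT against GitHub's published JWKS; the URL and claim names are assumptions based on GitHub's OIDC docs.)

    import jwt  # PyJWT

    # GitHub Actions' OIDC issuer publishes its keys here (assumption):
    JWKS_URL = "https://token.actions.githubusercontent.com/.well-known/jwks"

    def verified_build_claims(token: str, audience: str) -> dict:
        """Check the IdP's signature over P(B), then return the claims."""
        key = jwt.PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
        claims = jwt.decode(token, key.key, algorithms=["RS256"],
                            audience=audience)
        # An attacker on the build host can't forge these without the
        # IdP's key material:
        return {k: claims.get(k)
                for k in ("repository", "workflow", "ref", "sha")}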


It's early on the weekend but the question I predict is "What made Nix/Guix unsuitable as a part in this?"

The technology keyword SEO is strong around here! However, if you've already documented pros/cons, such a list could provide constructive direction for those projects.


You seem to be throwing in some thoughts while simultaneously attributing them to an unidentified third party and distancing yourself from them, which is a bit confusing.

My two cents' worth: Just because a technology has reached buzzword status in a given social cluster doesn't impose a duty on everyone in that cluster to explain themselves for not using it.

Personally, I'm a huge fan of brew, while being not such a huge fan of nix/guix, despite the fact that nix/guix is a buzzword around here while brew is not.


> You seem to be throwing in some thoughts while simultaneously attributing them to an unidentified third party and distancing yourself from them, which is a bit confusing.

True. "If the crowd was here they might want to know why their favorite hammer didn't help" is more a best guess to generate helpful documentation should this survive on the front page until tomorrow. I did not intend to create an additional burden on anyone. I appreciated the offer to answer questions, and the mic was open while they were around.

Thank you for sharing your experience with Mac package managers. I used brew but switched to macports the second time I forgot to disable "auto update the world" as Apple+brew left my comfortable old macOS version behind (I use older hardware, I don't track macOS release dates, and brew uninstalled working software before failing to build the updated version). I have been too terrified to try Nix on Mac due to the old "I couldn't uninstall" issues.


I actually use linuxbrew and don't use macs at all. Specifically, the way I like to set up my dev environments is to use Gentoo for the root filesystem, but also throw in linuxbrew to balance compile-from-source purism with binary-package pragmatism for packages and usecases where I just don't care as much about where my packages came from and how they were compiled.

One of the things I like about brew is that I can just tarball-up /home/linuxbrew/.linuxbrew and if I get myself into a pickle, I can just unpack a tarball in that location and be up-and-running. After I make any meaningful set of changes to the environment that I'm developing against, I make a versioned tarball and keep it for all eternity, that way I can have reasonable hope that if I ever want to run code I wrote a long time ago and haven't used for a long time, I will be able to re-create the development environment that I was using around it.


> It's early on the weekend but the question I predict is "What made Nix/Guix unsuitable as a part in this?"

I'm not sure I understand: Homebrew is a package manager in its own right, so it doesn't make a lot of sense for it to retrofit an entire other packaging ecosystem (Nix or Guix) on top of it.

(I don't know those ecosystems particularly well, but my understanding is that neither has code signing/build provenance either -- for Nix at least my understanding is that there are no remote builds in the first place, so build provenance isn't relevant.)


Nix (and by extension Guix) has had remote builds and signed build provenance for nearly 2 decades. It has also had a story for dealing with both content-addressed and input-addressed artifacts ("input-addressed" is our term for the "stable identifier" for a workflow mentioned above).

I can understand not using Nix, but please take a look at how we've solved some of these problems, and our mistakes. There is no need to reinvent the wheel blindly. Feel free to reach out if interested.


> Nix (and by extension Guix) has had remote builds and signed build provenance for nearly 2 decades.

Where is this documented? When I search for "Nix code signing" I mostly get issue reports for people experiencing breakage due to macOS's (separate) code signing requirements.

Edit: To be clear, I like Nix, and I find many of its properties appealing. At the same time, it's not clear to me that Homebrew has too much to glean here: Homebrew isn't trying to be a totalizing package/system state management solution, and has historically leaned heavily on automation and external services (GHA, GHCR, etc.). These are design decisions with tradeoffs; they've also informed the proposed build provenance design substantially.


There is a naming problem here where "Nix" is often easily confused with the state management pieces (home-manager, NixOS, etc). The underlying Store abstraction provides the concepts of various kinds of addressing. This is independent of the Nix language and we are currently trying to expose each of these layers more directly so it can be re-used in other contexts.

The piece that might be of interest to you is the interaction between content-addressed and input-addressed content in light of build systems+software+compilers that are not bit-reproducible. There is also the bookkeeping to ensure auditability by third-parties; anyone else can run the same builds and expect the same outcome, or close enough that the differences become easy to find and report upstream (https://r13y.com/), but the system doesn't require bit-reproducibility from day1 because that would be impractical. Of course substituting from a cache is exact, made possible by the signatures.

Remote builds: https://nixos.org/manual/nix/stable/advanced-topics/distribu...

Signing (I'll amend my claim above to only 1 decade): these are either signatures of a CA path, or, for input-addressed things, a claim by the signer that the producing build recipe (we call it a "derivation") has this particular binary output.

- https://nixos.org/manual/nix/stable/command-ref/nix-store/ge...

- https://nixos.org/manual/nix/stable/advanced-topics/post-bui...

- https://nixos.org/manual/nix/stable/command-ref/conf-file.ht...

- https://nixos.org/manual/nix/stable/command-ref/conf-file.ht...

- example: see the Signatures line for an example showing that this specific provenance produced this specific binary output: https://trusted-friendly-sesame.glitch.me/view.html?cache_ba... or https://cache.nixos.org/7ghhnlwla2mddkg7hgqa5v0sr8g5hga8.nar...
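
(To make the signing story concrete, a sketch of verifying a narinfo signature from cache.nixos.org; the fingerprint layout reflects my reading of the Nix source, so treat it as an assumption.)

    import base64
    from nacl.signing import VerifyKey  # PyNaCl

    # cache.nixos.org's widely published signing key.
    CACHE_KEY = "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="

    def verify_narinfo(store_path: str, nar_hash: str, nar_size: int,
                       references: list[str], sig_field: str) -> None:
        """Raises nacl.exceptions.BadSignatureError on a bad signature."""
        # Assumed fingerprint layout:
        #   1;<store-path>;<nar-hash>;<nar-size>;<comma-separated refs>
        fingerprint = f"1;{store_path};{nar_hash};{nar_size};{','.join(references)}"
        key_name, key_b64 = CACHE_KEY.split(":", 1)
        sig_name, sig_b64 = sig_field.split(":", 1)
        if sig_name != key_name:
            raise ValueError("signature was made by a different key")
        VerifyKey(base64.b64decode(key_b64)).verify(
            fingerprint.encode(), base64.b64decode(sig_b64))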


Thanks for answering my question; I apologize for not explaining it very well.

You are creating CI/CD infrastructure to build binaries for Homebrew. Per my limited understanding, Nix/Guix will someday build reproducible binaries, but it didn't offer value for you here at the end of 2023. The reasons why might help the projects be more useful to those with similar goals in the future.


> You are creating CI/CD infrastructure to build binaries for Homebrew.

Not exactly: Homebrew already has extensive, mature infrastructure for building binary distributions (in Homebrew nomenclature, "bottles"). The point of this work is solely to offer cryptographic attestations for those bottles, namely attesting that they're being produced on Homebrew's (GitHub Actions-based) CI/CD.

Reproducibility would be nice, but it isn't a requirement for this work. Nix and Guix would possibly help with that, but adding them would (1) be a significant engineering effort with no clearly defined borders, and (2) not solve the actual problem at hand.

Edit: I realized the above doesn't offer a "why." The "why" for all of this is improved integrity: it adds another hurdle for an attacker looking to download and install malicious bottles through Homebrew. Additionally, it makes it easier to host a verifiable mirror of homebrew-core's bottles.


We would love for you to talk about this at one of our in-toto community meetings. Let me know if you are interested. Contact info is in the comments, or feel free to stop by #in-toto on CNCF Slack.


Sorry my mind was just blown that this isn't already in place... this is an APT-sized (advanced persistent threat, not the package manager) hole that makes me wonder how many others are still out there.


Code signing is the exception, not the norm, in OSS packaging ecosystems. The ones that do already have it tend to have properties that make them amenable to simple implementations, e.g. being able to pre-bake a set of trusted signers into the OS in the case of Debian.



