openSUSE Factory Achieves Bit-By-Bit Reproducible Builds (phoronix.com) 22
Michael Larabel reports via Phoronix: While Fedora 41 in late 2024 is aiming to have more reproducible package builds, openSUSE Factory has already achieved a significant milestone in bit-by-bit reproducible builds. Since last month openSUSE Factory has been producing bit-by-bit reproducible builds sans the likes of embedded signatures. OpenSUSE Tumbleweed packages for that rolling-release distribution are being verified for bit-by-bit reproducible builds. SUSE/openSUSE is still verifying all packages are yielding reproducible builds but so far it's looking like 95% or more of packages are working out. You can learn more via the openSUSE blog.
Why is this not easy? (Score:2)
What does the build process do that doesn't produce the exact same bits every time?
Re: (Score:2)
What does the build process do that doesn't produce the exact same bits every time?
I'm guessing it has to do with "sans the likes of embedded signatures".
Re: (Score:2)
I saw that, but no idea what it means.
Re: (Score:2)
I saw that, but no idea what it means.
Looks like there are some docs on this. This one [reproducible-builds.org] may be relevant.
Re: (Score:3)
Compile-time timestamps were a problem for me in the 2000s and, apparently, still exist - I recall having to filter out timestamps when computing custom fingerprints. Parallel compilation and data structures that don't guarantee ordering add to the problem, and it wouldn't surprise me if some tools started throwing UIDs into the mix.
Re: (Score:2)
Timestamps as part of the output data in a build system are a pretty bad idea. And I thought the Linux guys were so smart.
Re: (Score:2)
[Embedded unique stamps.] Every place where this was done has to be stamped out.
Or perhaps just omitted.
Hashes, PGP/GPG signatures, Git, etc. always cover the ENTIRE file. How about we add a special mode where something like "version:[0-9.A-Za-z]{1,20}" -- if I've got my RE right -- is ignored during the hash?
This would allow a hash to be carried out on 99.99% of the file to indicate its contents but still allow some uniqueness.
I also doubt that you could fit a malware attack into ~20 bytes of ASCII, but that still needs to be addressed.
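A minimal sketch of that "ignore the stamp while hashing" idea, assuming the quantifier in the pattern is meant as {1,20} and that the stamp is replaced with a fixed placeholder before hashing. The function name `masked_sha256` and the sample byte strings are made up for illustration; no existing tool is claimed to work this way.

```python
import hashlib
import re

# Hypothetical stamp format from the comment above: "version:" followed by
# 1 to 20 characters drawn from [0-9.A-Za-z].
STAMP = re.compile(rb"version:[0-9.A-Za-z]{1,20}")


def masked_sha256(data: bytes) -> str:
    """Hash data after replacing every stamp with a fixed placeholder."""
    normalized = STAMP.sub(b"version:", data)
    return hashlib.sha256(normalized).hexdigest()


# Two builds that differ only in the stamp, and one that differs elsewhere.
build_a = b"\x7fELF code version:1.2.3.build41 more code"
build_b = b"\x7fELF code version:1.2.3.build42 more code"
build_c = b"\x7fELF c0de version:1.2.3.build41 more code"

print(masked_sha256(build_a) == masked_sha256(build_b))  # stamps ignored
print(masked_sha256(build_a) == masked_sha256(build_c))  # real change detected
```

The trade-off is the one raised above: whatever sits inside the masked field is not covered by the hash at all, so the scheme is only as safe as the limits placed on that field.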
Re: Why is this not easy? (Score:5, Informative)
1. Timestamps
2. Full paths to the source code location (which helps a debugger find the source code)
3. Many binary formats have unused space, e.g. bytes 1, 2 and 3 of the header are used and byte 4 is unused. If the space was allocated by malloc() and not zero-initialized, it will contain whatever junk was left there before
4. Sometimes an artifact records whether the other artifacts it uses were built from source or fetched from a build cache because they had been built previously
5. Metadata about the build. I mentioned timestamps. Also host name, env vars etc. They're all helpful for diagnosing/reproducing build issues.
They're all irritating things put in by well-intentioned people for helpful reasons before it was widely understood that determinism is crucial in build systems.
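To make a few of those points concrete, here is a rough sketch of packing build outputs into a tar archive with the usual culprits pinned: fixed mtime, no user or group names, and a sorted member order. The helper name `deterministic_tar` and the file names are hypothetical, and this is only one way to do it, not how any particular distribution packages things.

```python
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Pack {name: content} into a tar whose bytes depend only on the input."""
    buf = io.BytesIO()
    # Uncompressed tar with a pinned format, so no compressor header
    # (which could carry its own timestamp) is involved.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):            # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime                # no wall-clock time
            info.uid = info.gid = 0           # no builder identity
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Same content handed over in a different order gives identical bytes.
first = deterministic_tar({"b.o": b"2", "a.o": b"1"})
second = deterministic_tar({"a.o": b"1", "b.o": b"2"})
print(first == second)
```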
Re: (Score:2)
Think of something like Linux. When you boot it up, it prints a banner, which contains the version, a timestamp and who built the kernel (Linux 6.7.1 built on date by blah). That's a timestamp - it's handy during development because the version number might stay the same, but the timestamp gives you a rough idea of which changes the build might contain. But they're murder on reproducible builds.
Another one is if you're doing parallel builds - the build server may have 20+ cores on it to make bu
Re: (Score:2)
You can feed in the time stamp to use for the current date everywhere.
I'm sure the parallelization issues have a simple solution, too. But I see now there are some challenges. I'm just surprised these haven't been fixed a long time ago.
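Feeding in the timestamp is roughly what the SOURCE_DATE_EPOCH convention from reproducible-builds.org does: if that environment variable is set, build tools use it instead of the current time. A small sketch of a build script honouring it, where the `banner` function and the version string are made up for the example:

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH when set, otherwise fall back to the clock."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def banner(version: str) -> str:
    # Format in UTC so the builder's local timezone does not leak in either.
    stamp = datetime.fromtimestamp(build_timestamp(), tz=timezone.utc)
    return f"{version} built on {stamp:%Y-%m-%d %H:%M:%S} UTC"


# Pinning the variable makes every run print the same banner.
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
print(banner("6.7.1"))
```

The value is normally derived from something in the source itself, such as the date of the last changelog entry, so rebuilding the same source gives the same stamp.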
Re: (Score:1)
Good answer, thanks!
Re: Why is this not easy? (Score:2)
If you build on a multi-core system, it might matter in which order the files finish compiling. And some compilers are just non-deterministic. I once used a compiler where the amount of optimisation depended on the available memory.
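The ordering part at least has a straightforward fix: collect the parallel results and put them in a fixed order before combining them. A toy sketch, with `compile_unit` and `link` as stand-ins for real compile and link steps (the random sleep just simulates jobs finishing in an unpredictable order):

```python
import hashlib
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def compile_unit(name: str):
    """Pretend to compile one source file; returns (name, object bytes)."""
    time.sleep(random.uniform(0, 0.01))  # variable "compile" time
    return name, hashlib.sha256(name.encode()).digest()


def link(sources) -> bytes:
    """Compile in parallel, then combine the objects in sorted name order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(compile_unit, s) for s in sources]
        # as_completed yields in finish order, which varies from run to run.
        objects = [f.result() for f in as_completed(futures)]
    objects.sort(key=lambda item: item[0])  # impose a stable order
    return b"".join(obj for _, obj in objects)


print(link(["c.c", "a.c", "b.c"]) == link(["a.c", "b.c", "c.c"]))
```

This does nothing for a compiler whose output itself depends on available memory; that has to be fixed in the compiler or by pinning the build environment.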
nixos? (Score:1)
So, what NixOS already does?
NixOS!?!? (Score:2)
Funnily enough, from https://www.phoronix.com/forum... [phoronix.com]
"Originally posted by Kjell View Post
Fantastic news for security!
Are there any other distributions which currently offer reproducible builds?
NixOS does. You can look at the reproducible-builds.org site for more informations."
I went looking for 'how', but stumbled on this answer and thought I'd share it.
For anyone else curious 'how'... (Score:2)
https://reproducible-builds.or... [reproducible-builds.org] (pointed at by a comment on the Phoronix comments section)
"First, the build system needs to be made entirely deterministic: transforming a given source must always create the same result. For example, the current date and time must not be recorded and output always has to be written in the same order.
Second, the set of tools used to perform the build and more generally the build environment should either be recorded or pre-defined.
Third, users should be given a way to recre
Why is this useful? (Score:2)
Apart from the y'all-watch-this factor, security-wise it seems it'd be more useful to have a completely different, randomised build each time so attackers can't target a monoculture binary image.
In terms of "you can use it to verify source to binary equivalence", you're already relying entirely on trusting the developers to not do anything malicious, so what advantage is there to a reproducible build vs. downloading a signed binary? And for it to work you need signed source code and a signed attestation th