
 



SuSE Open Source Operating Systems Build Linux

openSUSE Factory Achieves Bit-By-Bit Reproducible Builds (phoronix.com) 22

Michael Larabel reports via Phoronix: While Fedora 41 in late 2024 is aiming to have more reproducible package builds, openSUSE Factory has already achieved a significant milestone in bit-by-bit reproducible builds. Since last month openSUSE Factory has been producing bit-by-bit reproducible builds sans the likes of embedded signatures. OpenSUSE Tumbleweed packages for that rolling-release distribution are being verified for bit-by-bit reproducible builds. SUSE/openSUSE is still verifying all packages are yielding reproducible builds but so far it's looking like 95% or more of packages are working out. You can learn more via the openSUSE blog.
This discussion has been archived. No new comments can be posted.


  • What does the build process do that doesn't produce the exact same bits every time?

    • What does the build process do that doesn't produce the exact same bits every time?

      I'm guessing it has to do with "sans the likes of embedded signatures".

    • Compile-time timestamps were a problem for me in the 00s and, apparently, still exist. I recall having to filter timestamps for custom fingerprints. Parallel compilation and data structures that don't guarantee ordering add to the problem, and it wouldn't surprise me if some started throwing UIDs into the mix.

      • Timestamps as part of the output data in a build system are a pretty bad idea. And I thought the Linux guys were so smart.

    • Many older projects embed things like time stamps or hashes or even a one-up counter to help align binaries with source control for debugging purposes and to give some sort of guaranteed serialization and uniqueness. Many people consider putting the serial build number in the *.*._ slot of the resulting library a best practice. Every place where this was done has to be stamped out.
      • [Embedded unique stamps.] Every place where this was done has to be stamped out.

        Or perhaps just omitted.

        Hashes, PGP/GPG signatures, Git, etc. always cover the ENTIRE file. How about we add a special mode where anything matching something like "version:[0-9.A-Za-z]{1,20}" -- if I've got my RE right -- is ignored during the hash?

        This would allow a hash to be carried out on 99.99% of the file to indicate its contents but still allow some uniqueness.

        I also doubt that you could fit a malware attack into ~20 bytes of ASCII code, but that still needs to be addressed.
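A minimal sketch of the masked-hash idea in the comment above, in Python. The stamp format is the one proposed there; the function name `masked_sha256` and the sample byte strings are made up for illustration, and no existing tool is claimed to work this way. Note the pattern is greedy, so the stamp has to be followed by a byte outside `[0-9.A-Za-z]` (a NUL here) for the match to stop in the right place.

```python
import hashlib
import re

# Hypothetical stamp format from the comment: "version:" followed by
# 1 to 20 characters drawn from [0-9.A-Za-z].
STAMP = re.compile(rb"version:[0-9.A-Za-z]{1,20}")


def masked_sha256(data: bytes) -> str:
    """Hash data after replacing every version stamp with a fixed placeholder."""
    normalized = STAMP.sub(b"version:", data)
    return hashlib.sha256(normalized).hexdigest()


# Two made-up artifacts that differ only in the embedded build counter.
build_a = b"\x7fELF...code...version:1.2.3.build41\x00...more code..."
build_b = b"\x7fELF...code...version:1.2.3.build42\x00...more code..."

plain_equal = hashlib.sha256(build_a).digest() == hashlib.sha256(build_b).digest()
masked_equal = masked_sha256(build_a) == masked_sha256(build_b)
```

Here the plain hashes differ while the masked ones agree. The verifier must know the stamp format, which is the objection raised elsewhere in the thread.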

    • by ljw1004 ( 764174 ) on Saturday April 20, 2024 @11:58AM (#64410442)

      1. Timestamps
      2. Full paths to the source code location (which helps a debugger find the source code)
      3. Many binary formats have unused space, e.g. bytes 1, 2, 3 are used and byte 4 is unused in the header. If the space was allocated by malloc() and not zero-initialized, it will have in it whatever junk was left there before
      4. Sometimes an artifact records whether the other artifacts it uses were built from source or fetched from a build cache because they had been built previously
      5. Metadata about the build. I mentioned timestamps. Also host name, env vars, etc. They're all helpful for diagnosing/reproducing build issues.

      They're all irritating things put in by well-intentioned people for helpful reasons before it was widely understood that determinism is crucial in build systems.

      • Can you explain why determinism is crucial? Not all systems are the same, nor should they be. I can see an argument that timestamps and source paths should remain embedded. From a security point of view, you won't be able to simply hash the binary file directly, you'll have to know what the file format is so that you can mask out the bits. But so what? Security practices should adapt to the developer needs, not the other way around
      • by tlhIngan ( 30335 )

        Think of something like Linux. When you boot it up, it prints a banner, which contains the version and timestamp and who built the kernel. (Linux 6.7.1 built on date by blah). That's a timestamp - it's handy during development because hey, the version number might stay the same, but the timestamp gives you a rough idea of where to look at what changes it might have. But they're murder on reproducible builds.

        Another one is if you're doing parallel builds - the build server may have 20+ cores on it to make bu

        • You can feed in the time stamp to use for the current date everywhere.

          I'm sure the parallelization issues have a simple solution, too. But I see now there are some challenges. I'm just surprised these haven't been fixed a long time ago.
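Feeding in the timestamp is what the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org does: the build reads the time from an environment variable (seconds since the Unix epoch) instead of the clock. A minimal Python sketch, assuming that convention; the `build_timestamp` and `banner` helpers are hypothetical names, not part of any real build system.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> int:
    """Return SOURCE_DATE_EPOCH if set, otherwise fall back to the real clock."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def banner(version: str, environ=os.environ) -> str:
    """Format a boot-banner-style string from the pinned build time (UTC)."""
    stamp = datetime.fromtimestamp(build_timestamp(environ), tz=timezone.utc)
    return f"{version} built on {stamp:%Y-%m-%d %H:%M:%S} UTC"


pinned = banner("Linux 6.7.1", {"SOURCE_DATE_EPOCH": "0"})
```

With the variable pinned, every rebuild produces the same banner text. Formatting in UTC avoids a second source of variation from the builder's time zone.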

      • Good answer, thanks!

    • In C or C++, programs use the __FILE__, __DATE__ and __TIME__ macros. __FILE__ can include the complete path, so something built on my machine is not the same as something built on your machine. And __TIME__ is different if you recompile ten minutes later.

      If you build on a multi core system, it might matter which files finish compiling in which order. And some compilers are just non-deterministic. I used a compiler where the amount of optimisations was dependent on available memory.
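One common way to remove the ordering problem is to sort inputs before consuming them, so that the order in which parallel jobs finish no longer matters. A small Python sketch under that assumption; `link_manifest` and the object file names are hypothetical.

```python
import hashlib


def link_manifest(object_files) -> str:
    """Digest the object file names in sorted order, not completion order."""
    digest = hashlib.sha256()
    for name in sorted(object_files):
        # NUL separator so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(name.encode("utf-8") + b"\0")
    return digest.hexdigest()


# Two runs where the parallel jobs happened to finish in different orders.
run_one = link_manifest(["b.o", "a.o", "c.o"])
run_two = link_manifest(["c.o", "a.o", "b.o"])
```

Both runs yield the same digest. The same idea applies to directory listings and to iterating over hash-based containers.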
    • There are approximately 10 causes of nondeterminism that you can see in https://reproducible-builds.or... [reproducible-builds.org] or https://github.com/bmwiedemann... [github.com] In openSUSE, the hardest was the mtime values stored in rpm headers. Every build had new mtimes - until last month.
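The mtime problem can be illustrated with a tar archive instead of an rpm header: clamp every file's mtime to a fixed upper bound and normalize the other per-build metadata. This Python sketch uses the standard `tarfile` module; `clamped_tar` and its input shape are assumptions for illustration, not how rpm or the openSUSE build service implements it.

```python
import io
import tarfile


def clamped_tar(files: dict, clamp_mtime: int) -> bytes:
    """Build a tar archive from {name: (data, mtime)} with mtimes clamped."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # stable member order
            data, mtime = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = min(mtime, clamp_mtime)  # newer files get the clamp value
            info.mode = 0o644
            info.uid = info.gid = 0        # no builder-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# The same content "built" ten minutes apart: mtimes differ, archives do not.
first = clamped_tar({"a.txt": (b"hi", 1713600000)}, clamp_mtime=1700000000)
second = clamped_tar({"a.txt": (b"hi", 1713600600)}, clamp_mtime=1700000000)
```

Files older than the clamp keep their real mtime, so only timestamps created during the build are rewritten.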
  • So, what Nixos does already?

    • Funnily enough, from https://www.phoronix.com/forum... [phoronix.com]

      "Originally posted by Kjell View Post
      Fantastic news for security!

      Are there any other distributions which currently offer reproducible builds?

      NixOS does. You can look at the reproducible-builds.org site for more information."

      I went looking for 'how', but stumbled on this answer and thought I'd share it.

  • https://reproducible-builds.or... [reproducible-builds.org] (pointed at by a comment on the Phoronix comments section)

    "First, the build system needs to be made entirely deterministic: transforming a given source must always create the same result. For example, the current date and time must not be recorded and output always has to be written in the same order.

    Second, the set of tools used to perform the build and more generally the build environment should either be recorded or pre-defined.

    Third, users should be given a way to recre

  • Apart from the y'all-watch-this factor, security-wise it seems it'd be more useful to have a completely different, randomised build each time so attackers can't target a monoculture binary image.

    In terms of "you can use it to verify source to binary equivalence", you're already relying entirely on trusting the developers to not do anything malicious, so what advantage is there to a reproducible build vs. downloading a signed binary? And for it to work you need signed source code and a signed attestation th

    • by cen1 ( 2915315 )
      For Linux distros, not much: it guards against some kind of supply chain attack where an attacker has compromised the build process and injects malicious code into the final binary. It would not help with an xz-style attack because the source was compromised in that case. For proprietary software, the benefit would be that you could check that the vendor-provided binary matches the actual source code.
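The check described above amounts to rebuilding from source and comparing the result with what was published. A minimal Python sketch of that comparison; `matches_published` and the sample bytes are hypothetical, and a real verification would also need the recorded build environment mentioned elsewhere in the thread.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_published(rebuilt: bytes, published_digest: str) -> bool:
    """True if an independent rebuild has the digest the distributor published."""
    return hmac.compare_digest(sha256_hex(rebuilt), published_digest.lower())


# Stand-ins for a published artifact and two independent rebuilds.
published = sha256_hex(b"binary built from reviewed source")
clean_ok = matches_published(b"binary built from reviewed source", published)
tampered_ok = matches_published(b"binary built from reviewed source + implant", published)
```

A mismatch does not say what changed, only that the published binary is not what the source and recorded environment produce. It also says nothing if the source itself was compromised.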
