Debian Working on Reproducible Builds To Make Binaries Trustable
An anonymous reader writes: Debian's Jérémy Bobbio, also known as Lunar, spoke at the Chaos Communication Camp about the distribution's efforts to reassert the trustworthiness of open source binaries after it was called into question by the activities of various intelligence agencies. Debian is "working to bring reproducible builds to all of its more than 22,000 software packages," and is pushing the rest of the community to do the same. Lunar said, "The idea is to get reasonable confidence that a given binary was indeed produced by the source. We want anyone to be able to produce identical binaries from a given source" (PDF).
Here is Lunar's overview of how this works: "First you need to get the build to output the same bytes for a given version. But others must also be able to set up a close enough build environment with similar enough software to perform the build. And for them to set it up, this environment needs to be specified somehow. Finally, you need to think about how rebuilds are performed and how the results are checked."
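To make the goal concrete, here is a minimal sketch (mine, not from the talk) of what checking "identical binaries from a given source" looks like, assuming gcc and cmp are available:

    /* hello.c -- the reproducibility ideal in miniature:
     *
     *   gcc -o build1 hello.c
     *   gcc -o build2 hello.c
     *   cmp build1 build2   # identical bytes => reproducible for this input
     *
     * Real packages are much harder: timestamps, build paths, locales and
     * toolchain versions all leak into the output, which is what the
     * Debian effort works to eliminate. */
    #include <stdio.h>

    int main(void)
    {
        puts("hello, reproducible world");
        return 0;
    }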
Seems like a little random build size (Score:3)
Would make it harder for them to exploit.
Re: (Score:2)
But that isn't the point of this. It's about verifying that your binary wasn't built from tampered-with source code.
Re: (Score:3)
That's a tricky problem.
Countering "Trusting Trust" [schneier.com]
Re: (Score:3, Insightful)
Yes it is difficult. That's why they are trying to solve the problem.
Re: (Score:3, Interesting)
But that isn't the point of this. It's about verifying that your binary wasn't built from tampered-with source code.
I care about this, too. That's one reason I run a source-based distribution. It's not the only reason. It's not even the main reason. But it's one reason.
Anyone who really needs this kind of assurance was probably also building from source. You can do it once on-site, then make your own binary packages and push those to all of your other machines, so it's really not bad. I think a much more insidious threat comes from malicious yet innocent-looking source, like what you find in the Underhanded C Contest.
Re: (Score:2)
I wonder, though, if an always-connected build machine could have compromised object files pushed onto it mid-build. It's a theoretical risk, but one that could be accomplished by a determined foe. The build process is well characterized for many projects; knowing when to push a rogue object file into the build directory wouldn't be that difficult.
It means the penetrating entity would need to already have access to your system, but 'object pushing' would be a useful technique for escalating security privileges.
This seems like a job for VirtualBox (Score:2)
Wouldn't VirtualBox be able to do this? Then one only has the trouble of validating VirtualBox itself. While that is probably harder to do, you don't have to do it as often, and one could use an open source VirtualBox equivalent and compile that from source.
On the other hand, I don't quite understand why, if one can compile the source, one needs to worry about untrusted binaries. Perhaps the intent here is for some master agency to watch for tinkered binaries or to post its own checksums apart from Debian. Then everyone has two sources for validated checksums.
Re:This seems like a job for VirtualBox (Score:5, Insightful)
On the other hand, I don't quite understand why, if one can compile the source, one needs to worry about untrusted binaries. Perhaps the intent here is for some master agency to watch for tinkered binaries or to post its own checksums apart from Debian. Then everyone has two sources for validated checksums.
Almost right, except without the master agency. This isn't for the incredibly paranoid types who would already be compiling from source. This is for the rest of us, the lazy people who would rather "apt-get install foo" and just assume the distro's doing things right. If the builds are reproducible then eventually someone's going to verify them. If no variations are discovered, the rest of us lazy masses can be a lot more confident that we're not running anything unexpected.
Re: (Score:2, Informative)
From the article, the issue was that the CIA had found a way to own the *compiler* binaries, so that each program it compiled would have a vulnerability added at build time.
Re: (Score:2)
You'd have to look at how the Debian internal politics have changed; there are articles about that, and also about how it relates to systemd. Even Slashdot had one.
Re: (Score:2)
So... what to do then? What other distribution works well, has a large number of packages available, is freely available (not that big a deal, since RH isn't going to be systemd-free), and pretty much Just Works?
Re: (Score:1)
NetBSD has the pkgsrc collection, which is fairly large, and it is never going to be polluted by systemd.
Re: (Score:1)
You do know that you sound like a crazy person, don't you?
Re: (Score:2)
Re: (Score:2)
Build timestamps mess this up (Score:2, Informative)
Unless you freeze system time for the full duration of the build, every piece of code that builds in the __TIME__ or __DATE__ macros will screw with this. The same goes for other environment values injected into the build (like the git revision, etc.).
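A minimal illustration (my example, not from TFA): two builds of this file made at different times already differ, because the preprocessor expands the macros to the build's wall-clock time:

    /* stamp.c -- __DATE__/__TIME__ bake the build's wall-clock time into
     * the binary, so two otherwise identical builds differ:
     *
     *   gcc -o a.out1 stamp.c; sleep 1; gcc -o a.out2 stamp.c
     *   cmp a.out1 a.out2    # differs at the embedded time string
     */
    #include <stdio.h>

    int main(void)
    {
        printf("built on %s at %s\n", __DATE__, __TIME__);
        return 0;
    }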
Re:Build timestamps mess this up (Score:5, Informative)
Pages 6 and 7 of the linked PDF cover time-related issues and basically agree: anything that builds the time/date into the binary is a problem that needs to be fixed.
The git revision, on the other hand, is a recommended alternative, since it points at a specific state of the code and will always be the same if the code is unchanged.
Re: (Score:1)
Why would a lot of code need to be "fixed" just because someone anally retentive wants deterministic builds?
Uh, because some anally retentive person wants deterministic builds? Did you RTFS?
Re: (Score:2)
Why would a lot of code need to be "fixed" just because someone anally retentive wants deterministic builds? If they truly care they can LD_PRELOAD fake date/time libs.
The reason for deterministic builds is to allow those of us who want to use binaries from our distros for convenience's sake to verify that a binary is actually built from the source it claims to be from. It only takes a few people actually doing it to confirm things are good for all of us.
Basically it lets the lazy masses gain the same level of confidence in what they're running as those who compile everything from source.
I thought this problem was solved a long time ago by the Bitcoin developers with Gitian.
Bitcoin solved it as far as they needed for their own purposes; this project aims to solve it for an entire distribution.
Re: Build timestamps mess this up (Score:2)
__FILE__ has a similar effect, as it expands to the path the compiler was given, which is often absolute. Absolute build paths would also have to be created deterministically: no random temp directories or anything like that. And then you have to make sure everything that gets linked in statically, including the toolchain bits, follows the same rules.
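Same idea as a small sketch (mine): the string the compiler was invoked with is baked into the binary, which is why reproducible-build setups pin the build directory. (Newer GCC releases later grew -fmacro-prefix-map/-ffile-prefix-map to rewrite these strings, though that postdates this discussion.)

    /* filepath.c -- __FILE__ expands to whatever path the compiler was
     * invoked with, so building from two different directories yields
     * different bytes:
     *
     *   gcc -o p1 /tmp/copy1/filepath.c
     *   gcc -o p2 /tmp/copy2/filepath.c
     *   cmp p1 p2    # differs in the embedded path string
     */
    #include <stdio.h>

    int main(void)
    {
        printf("compiled from: %s\n", __FILE__);
        return 0;
    }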
Re: (Score:2)
I know this is Slashdot and all, but you really should RTFA. Again that's covered and a variety of solutions are offered, but you are basically right in that doing this right requires that those things all be the same where they're used.
The tricky part here is determining in which cases those sorts of macros are actually required and thus must be worked around versus where they can be replaced with something else to achieve the same goal (replacing time/datestamped builds with git commit IDs for example) v
Re: (Score:2)
Unless you freeze system time for the full duration of the build, every piece of code that builds in the __TIME__ or __DATE__ macros will screw with this.
And I have run into this at my current job, when checking that I had successfully selected the correct version of an archived source and reproduced the binary build. The source apparently had an instance of __DATE__ in it, and would compile differently on different days.
But the datestamps - at least in the tools I was using - were always the same length.
Re: (Score:2)
The other thing is to hack your build of the toolchain so that __TIME__, __DATE__, and __FILE__ can be stubbed and/or overridden from the command line. I haven't looked at the GCC or clang codebases, but I would think it wouldn't be too hard.
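You don't necessarily need to hack the toolchain: GCC-compatible compilers generally let you redefine the builtins from the command line already. A sketch, with arbitrary pinned values:

    /* pinned.c -- overriding the builtin macros without patching GCC:
     *
     *   gcc -Wno-builtin-macro-redefined \
     *       -D__DATE__='"Jan  1 1970"' -D__TIME__='"00:00:00"' \
     *       -o pinned pinned.c
     *
     * Every rebuild now embeds the same strings, so the bytes match. */
    #include <stdio.h>

    int main(void)
    {
        printf("%s %s\n", __DATE__, __TIME__);  /* always the pinned values */
        return 0;
    }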
Re: (Score:2)
When I needed to compare binaries, I wrote a script that would clean the source of __DATE__/__TIME__ and RCS/CVS-style $Id$ stuff.
While the $Id$ keyword was okay in source comments, it was common practice to add something like static char *rcs_id = "$Id$"; to each .c file, in such a way that you'd get a bunch of these strings in the final binary.
This script can be run recursively on a copy of the source tree. Or, it can be done "on the fly" by having the script masquerade as gcc. Then, do the build.
This works a bit better
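A toy C version of such a cleaner (hypothetical, just to show the shape of the approach; a real one would also handle the $Id$ keywords and run over the whole tree):

    /* scrub.c -- filter C source, pinning __DATE__/__TIME__ to fixed
     * literals so scrubbed trees (and the binaries built from them) can
     * be compared byte for byte. Usage: scrub < foo.c > foo.clean.c
     * Sketch only: lines longer than the buffer could split a macro. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        static const struct { const char *from, *to; } subs[] = {
            { "__DATE__", "\"Jan  1 1970\"" },
            { "__TIME__", "\"00:00:00\""    },
        };
        const size_t nsubs = sizeof subs / sizeof subs[0];
        char line[4096];

        while (fgets(line, sizeof line, stdin)) {
            const char *p = line;
            while (*p) {
                size_t i;
                for (i = 0; i < nsubs; i++) {
                    size_t len = strlen(subs[i].from);
                    if (strncmp(p, subs[i].from, len) == 0) {
                        fputs(subs[i].to, stdout);   /* emit replacement */
                        p += len;                    /* skip the macro   */
                        break;
                    }
                }
                if (i == nsubs)
                    putchar(*p++);                   /* ordinary byte    */
            }
        }
        return 0;
    }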
Diverse double compiling (thanks dwheeler) (Score:5, Interesting)
So long as two or more independently developed, self-hosting compilers for a language exist, with at least one of them available as public source code, a Ken Thompson attack on the public-source one is infeasible. David A. Wheeler proved it [dwheeler.com]; here's the gist:
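The gist, as I understand Wheeler's write-up: compile the suspect compiler's source with the independent compiler, then use the resulting binary to compile that same source a second time. If the suspect binary honestly corresponds to its source, the second-generation output must be bit-for-bit identical to what the suspect binary itself produces from that source; a Thompson-style trojan shows up as a difference (unless the independent compiler carries a matching trojan). A toy driver of that procedure (all compiler names hypothetical, and assuming the builds are otherwise deterministic):

    /* ddc.c -- toy driver for Wheeler's diverse double-compiling.
     * Assumes two compilers on PATH: `suspect-cc` (the binary under
     * test, supposedly built from the source in sA/) and `trusted-cc`
     * (independently developed). Names are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    static void run(const char *cmd)
    {
        printf("+ %s\n", cmd);
        if (system(cmd) != 0) {          /* abort on any failing step */
            fprintf(stderr, "step failed: %s\n", cmd);
            exit(1);
        }
    }

    int main(void)
    {
        run("suspect-cc -o self    sA/compiler.c"); /* self-compilation  */
        run("trusted-cc -o stage1  sA/compiler.c"); /* independent build */
        run("./stage1   -o stage2  sA/compiler.c"); /* rebuild via it    */
        /* If suspect-cc honestly corresponds to sA/, then suspect-cc and
         * stage1 are functionally equivalent, so `self` and `stage2`
         * must be bit-identical; cmp (and thus run()) fails otherwise. */
        run("cmp self stage2");
        puts("match: binary corresponds to source");
        return 0;
    }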
Re: (Score:3)
No, he didn't prove it is infeasible. For one, that would require a method to prove that the compilers are indeed wholly independent, which hasn't been provided. Also, note that people in a given sub-field of technology tend to move around: an engineer who has worked on one compiler is *more* likely to work on another compiler at some stage than any random engineer. The DDC technique *assumes* that diverse compilers are independent - it takes it on trust. Wheeler's work, if anything, reinforces the essence
Re: (Score:2)
And the end of that comment still sounds more dismissive than I wanted... Take 2:
I'm not being dismissive of DDC. Distros regularly attempting to get reproducible builds with diverse compilers will raise the bar and make attacks harder if it can be done, and additionally it will help catch bugs. However, DDC does not fully counter Thompson's attack, and it is good to remain aware of the assumptions it operates under.
I.e. could be a very nice step forward, though it is important to note the "fully countering
Re: (Score:2)
The solution to the problem can get messy.
Most packaging systems have a control file for each package that specifies dependencies on other packages with versions, conflicts, etc. They specify deps for the stuff they care about (e.g. gtk3 version X needs cairo version Y), but they don't always specify the version of gcc et al. that they need, because that's not important from their perspective. That is, they're happy to build with gcc 4.0.0 or 4.1.0 or whatever. Sometimes the deps are specified as "I need package X
Awesome - on trusting trust (Score:5, Interesting)
I was thinking about this being a problem a while back - how to deal with building something from source and knowing I was getting the same output that the developers wanted me to have. Coincidentally about the same time, this article [slashdot.org] popped on Slashdot and introduced me to Ken Thompson's article Reflections on Trusting Trust [cmu.edu] - a great read and something that really opened my eyes (in that wide-open-because-of-terror kind of way).
Also from that thread came this email [stanford.edu] from one of the Tor developers talking about their deterministic build process to do the same thing.
I think this is a problem that would be really great to solve as soon as possible. I very much hope that once we start seeing more reproducible builds we don't suddenly find out that certain compilers have been compromised long ago.
Re: (Score:2)
The binaries emitted are more or less guaranteed to change every time the compiler is updated.
Sure.
So you compile it with the (perhaps out-of-date) toolset, getting a match to the distributed binaries, then again with the latest-and-greatest toolset if you want to use its output rather than the standard one (and take your own chances that your new compiler revision was compromised).
The problem being addressed is making sure that your own build environment contains the same source and makes the same binaries.
Easy enough to handle trusting trust (Score:3)
Since you mentioned Reflections on Trusting Trust, that issue is easy enough to avoid. There are some simpler and more clever methods, but consider this:
Use Borland 1.0 to compile llvm.
Use this new llvm binary to compile gcc.
Chain a few more if you want to.
You don't need to trust the first compiler. It could be trojaned so as to trojan new copies of itself. You'd only be concerned if you thought that Borland 1.0 was trojaned in such a way as to add a trojan to the code of a compiler that didn't yet exist.
Re: (Score:2)
Why do you think a new trojan can not infect old binaries?
The Thompson attack is what we would recognise today as a class of virus. Indeed, as Thompson's point was a general one about the unavoidable need to trust others if one did not build every component capable of basic logical manipulation oneself, to fully counter Thompson's attack you would have to be able to counter every possible kind of virus and rootkit - and not just in the software, but also in any other firmware and microcode that might handle it.
Borland CDs are read only (Score:2)
> Why do you think a new trojan can not infect old binaries?
CD, and floppies with the tab set, are read-only. Unless this virus changes the physical properties of aluminum, your old Borland CD isn't going to get infected.
Re: (Score:2)
You can't run a compiler from read-only media though.
Re: (Score:2)
Of course you can, it just needs a writable working directory.
Re: (Score:2)
Perhaps I wasn't being explicit enough.
The CDROM might be read-only, but the software has to be copied into memory by something in order to run. As per Thompson's original point, it isn't sufficient to protect one piece of the system. As he stated, his attack implies that *every* programme that is involved in the handling of software must either be validated to the same level as having written it yourself OR you must invest trust:
Re: (Score:2)
Good thing there are no well-known, stable hooks in programmes to allow code to be run in a generic fashion, as part of, say, binary file formats. Oh wait...
Borland predates Linux, ELF (Score:2)
Are you under the impression that the DOS .exe files produced by Borland 1.0 are approximately compatible with Linux ELF files? Maybe you're thinking that because neither are Windows, Linux must be DOS? No, there's nothing "stable" between the two completely different formats. So the Borland compiler couldn't possibly include a trojan for an operating system that didn't yet exist, using an executable format that didn't yet exist.
Re: (Score:2)
I'm not familiar with DOS exe format. However, there must be some well-defined entry point.
Thompson's attack doesn't mean that any subversion of the Borland 1.0 compiler is limited to when the Borland 1.0 compiler was created. Thompson was making an extremely general point about security in programmable systems: You either build pretty much all of it yourself, or else you must invest trust in others.
not trusting is hard work (Score:2)
Well, since you're not familiar with either format, let me give you an analogy. Go build a mold for making intake manifolds to fit all 2040 model year cars. That's essentially equivalent to what Borland would have had to do in order to include a Linux ELF trojan in the 1980s.
The Thompson paper reminds us that the normal workflow involves trusting the toolchain. It in no way indicates that we can't choose a paranoid workflow instead. One type of paranoid workflow involves validating our modern tools by using old ones that predate the attack.
Re: (Score:2)
Not sure what car manifolds have to do with it - argumentum ad vehiculum.
Again, you're assuming that an old toolchain can only carry old attacks. That's a flawed assumption. A modern attacker can subvert your system so that old toolchains are subverted to apply further subversions.
Are there practical steps we can take to raise the bar and make such attacks much harder to execute? Sure. Can we guarantee our system is free of such subversions, without either trusting others to some degree or building the system entirely ourselves?
explain how you rewrite the laws of physics (Score:2)
> A modern attacker can subvert your system so that old toolchains are subverted to apply further subversions.
Explain, please, how you imagine the silver in a pressed Borland Turbo CD, or the DOS CD it runs on, is going to get new malware added to it 20 years after it was pressed.
The stock Borland Turbo and DOS disks are read-only. That means they can't be changed. I'm not sure what part of read-only you don't understand.
Re: (Score:2)
The system is subverted, e.g. command.com has been modified, so that when Borland Turbo is loaded into memory it too is subverted. Alternatively, DOS 22h is replaced with a version that checks every disk write to see if it is the beginning of a DOS executable, and if so, subverts it. Alternatively, ... etc.
There are surely many ways. Otherwise, you are arguing that DOS is not vulnerable to a broad range of all-powerful subversions, which is patently untrue.
Trolling, or really that dense? (Score:2)
> command.com has been modified
I'm not sure if you're just trolling, or if you really, truly don't know what a CD-ROM is or what read-only means.
Before iPhones - I mean before the very first iPhone, and before Windows 7 or 8 - you couldn't download apps. Instead, apps were made out of aluminum - metal. The metal was inside of some plastic. You had to physically walk into a store to buy your apps, and you'd walk out with these metal and plastic circles. Those circles had the apps. You couldn't change them.
I Don't Get It (Score:2)
I mean, c
Re: (Score:2)
Yes, but can you trust your compiler tool chain? (Score:1)
Re: (Score:1)
This only works if Debian can guarantee the integrity of the development toolchain. See this [c2.com] >30-year-old talk/paper [cmu.edu] by Ken Thompson describing the problem. Once inserted, the malware is persistent and invisible. Re-compiling your compiler and applications from known-good versions doesn't help.
The problem got a lot more complicated for the attacker today... Thompson's attack works well if there are only a few architectures and only a single compiler. But the attack's complexity grows exponentially in the presence of multiple architectures (that can be used to cross-compile each other) and multiple compilers (that can compile each other). Now you need a compiler virus that not only works on all architectures, but also detects all the kinds of compilers that are out there and works on all versions
Compromised hardware (Score:3, Interesting)
Re:Compromised hardware (Score:5, Interesting)
A partial answer to this is to build your own CPU and system in software. Like Bochs. But you could build this virtual system on any number of other completely incompatible platforms for verification. Would be slow. But at least it would be consistent and verifiable. You couldn't use hardware virtualization for this. Would have to be completely implemented in software. And if different people implemented the same reference platform independently (using their own preferred language and programming techniques) that would add an additional layer of verification. Even the deepest NSA compromise would have a hard time completely influencing this.
Re: (Score:2)
If you're worried about compromised CPUs being used to compile executables that are used by others, then reproducible builds are a great countermeasure. Just use reproducible builds on many different CPUs, and compare the results to ensure they are the same (for a given version of source and tools). The more variations, the less likely that there is a subversion. If what you're compiling is itself a compiler, then use diverse double-compiling (DDC) on many CPUs.
If you're worried that an INDIVIDUAL may en
Re: (Score:2)
I have to say - necessary compile environments really put me off coding projects.
Just having a Makefile is not sufficient. Even having all the right versions of prerequisite libraries isn't sufficient. Sometimes you have to patch and tweak and pass parameters and all kinds of stuff to build the damn thing properly.
Lots of software is like this. Especially when you work on a non-standard platform, even ARM, or with certain libraries (ffmpeg! grr!).
Let's not even get into what happens when it compiles against s
Re: (Score:2)
Gradle and some other tools are basically like your suggestion:
specify the repos, names, and version(s)/ranges that you want, and build... in theory, anyway.
Why are they not doing this already? (Score:1)
Our build package included copies of the OS install CD, install media for any tools needed, and the complete code set as text files.
You wipe the disk on the build machine
Re: (Score:2)
But I guess your binaries still had a different checksum - for example, because of timestamps. So you need to analyse, byte by byte, what the differences are and whether they are unimportant. With reproducible builds you get identical binaries and don't need to check anything further.
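A minimal byte-by-byte comparer along those lines (a sketch; Debian's reproducible-builds tooling, e.g. diffoscope, does this far more intelligently):

    /* bindiff.c -- print the first few differing offsets between two
     * builds, to help judge whether a mismatch is just a timestamp. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s file1 file2\n", argv[0]);
            return 2;
        }
        FILE *a = fopen(argv[1], "rb");
        FILE *b = fopen(argv[2], "rb");
        if (!a || !b) {
            perror("fopen");
            return 2;
        }
        long offset = 0, diffs = 0;
        for (;;) {
            int ca = fgetc(a), cb = fgetc(b);
            if (ca == EOF && cb == EOF)
                break;                       /* same length, done */
            if (ca == EOF || cb == EOF) {
                printf("files differ in length after %ld bytes\n", offset);
                diffs++;
                break;
            }
            if (ca != cb && diffs++ < 16)    /* cap the report    */
                printf("offset %ld: %02x != %02x\n", offset, ca, cb);
            offset++;
        }
        printf("%ld differing byte(s)\n", diffs);
        fclose(a);
        fclose(b);
        return diffs ? 1 : 0;
    }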
already being done in gaming machine builds... (Score:2)
The biggest headache in the process was anything that included a
Tor Project Writeup on Deterministic builds (Score:1)
Part 1 [torproject.org]
Part 2 [torproject.org]
It's been necessary since Ken Thompson's BSD hack (Score:1)
I applaud this initiative, and it may make me switch back to Debian as my OS of choice. The trusting trust problem is a serious one, and if you can't rely on being able to build a byte-for-byte identical unit from source, you can't really have any confidence that you're running code that represents what the authors intended.
- I used to be a perfectionist - now I am much better; I know how to compromise.
What's the point? (Score:1)
Alas, Debian once was the standard that other Linux distributions were measured by.
Fast, stable, dependable, reliable, secure, usable.
Not so today, since they decided to pour bucketloads of experimental garbage code into their base so that the distribution can give the GNOME desktop user a more Windows-like experience.
Note: Nobody uses GNOME anymore. Move on.
To be fair, a Linux distribution with training wheels (systemd) could be helpful to those wishing to migrate away from M$ addiction, and have a syste
Re: (Score:1)
At least systemd doesn't have maintainers that just randomly comment out lines of code in security software based on unverified, false positives from a static analyzer.
No but it does have a bunch of assholes determined to shove it down our throats no matter what, using all sorts of politics and other "not based on technical merit" tactics to make this happen. If I ever run systemd (not likely, but possible) it sure as hell won't be because somebody else decided I should. If I wanted somebody else to decide what was on my system, I would run a commercial OS and be done with it.
I find the Wiki page on systemd to be exquisitely ironic. In the first paragraph or two it
Re: (Score:1)
The systemd guys are young and energetic.
and foolish.
Re: Soon to be assimilated (Score:2)
I think you missed the sarcasm