Torvalds Warns the World: Don't Use the Linux 5.12-rc1 Kernel (arstechnica.com) 124
"In a message to the Linux Kernel Mailing List Wednesday, founding developer Linus Torvalds warned the world not to use the 5.12-rc1 kernel in his public git tree..." writes Ars Technica:
As it turns out, when Linus Torvalds flags some code dontuse, he really means it — the problem with this 5.12 release candidate broke swapfile handling in a very unpleasant way. Specifically, the updated code would lose the proper offset pointing to the beginning of the swapfile. Again, in Torvalds' own words, "swapping still happened, but it happened to the wrong part of the filesystem, with the obvious catastrophic end results."
If your imagination is insufficient, this means that when the kernel paged contents of memory out to disk, the data would land on random parts of the same disk and partition the swapfile lived on... not as files, mind you, but as garbage spewed directly to raw sectors on the disk. This means overwriting not only data in existing files, but also rather large chunks of metadata whose corruption would likely render the entire filesystem unmountable and unusable.
Torvalds goes on to point out that if you aren't using swap at all, this problem wouldn't bite you. And if you're using swap partitions, rather than swap files, you'd be similarly unaffected...
Torvalds also advised anyone who'd already pulled his git tree to do a git tag -d v5.12-rc1 "to actually get rid of the original tag name..." — or at least, to not use it for anything.
"I want everybody to be aware..." Torvalds writes, "because _if_ it bites you, it bites you hard, and you can end up with a filesystem that is essentially overwritten by random swap data. This is what we in the industry call 'double ungood'."
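To make the failure mode in the summary concrete, here is a toy sketch, not kernel code. It models a disk as a byte array with a swapfile living at some offset inside it, and shows what happens when that starting offset is lost. The names (`swap_out`, `make_disk`, `SWAP_START`), the sizes, and the choice to let the lost offset fall back to zero are all assumptions for illustration; the summary only says writes went to "the wrong part of the filesystem".

```python
# Toy model only: the "disk" is a bytearray and the swapfile is a byte range
# inside it. Every name and size here is hypothetical, not kernel code.

PAGE = 4          # toy page size in bytes
SWAP_START = 16   # where the toy swapfile begins on the toy disk


def make_disk():
    # 8 bytes of "filesystem metadata", 8 bytes of file data, 8 bytes of swapfile
    return bytearray(b"METADATA" + b"filedata" + b"........")


def swap_out(disk, swap_start, slot, page, offset_lost=False):
    """Write one page into swap slot `slot` and return the byte position used.

    With offset_lost=True the swapfile's starting offset is dropped
    (assumed here to fall back to 0), so the write lands outside the swapfile.
    """
    base = 0 if offset_lost else swap_start
    pos = base + slot * PAGE
    disk[pos:pos + PAGE] = page
    return pos


good = make_disk()
swap_out(good, SWAP_START, 0, b"SWAP")
# good is now b"METADATAfiledataSWAP....": the page landed inside the swapfile

bad = make_disk()
swap_out(bad, SWAP_START, 0, b"SWAP", offset_lost=True)
# bad is now b"SWAPDATAfiledata........": the page overwrote the metadata
```

In the toy, the swap write itself "succeeds" in both cases, which mirrors Torvalds' point that swapping still happened; only the destination differs.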
How (Score:2)
Re: (Score:2)
This was a release candidate, so technically still in the testing phase. But what makes you think an automated test would catch this?
RC1 means the start of 6-10 weeks of testing (Score:5, Informative)
After RC1 is when testing of the whole thing *starts*.
RC1 is the point when they stop adding new features and start testing and fixing bugs.
So when you say "technically still in the testing phase", to be precise RC1 is the last thing that happens BEFORE testing. (Testing of the whole, as opposed to individual contributors testing individual commits).
More info on the process can be found here:
https://www.kernel.org/doc/htm... [kernel.org]
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
It's a release candidate. It's literally there in the name that it's still in testing.
The Linux kernel might well be the most complicated piece of software in the world. No single automated testing regime will catch everything, because no one person would know all the angles to test for.
So they do alpha, beta, and release candidate testing, so that errors can be found. OBVIOUSLY you don't use RC kernels anywhere losing data could cause problems.
I don't get how anyone in the IT industry wouldn't understand tha
Re: (Score:2)
It's a release candidate. It's literally there in the name that it's still in testing.
The trees are blocking your view of the forest. Again, "candidate", candidate means we think this might be done. How can one think it might be done without internal testing, regression testing, etc.? A release candidate is a serious step that follows alpha testing (private) and beta testing (public).
The Linux kernel might well be the most complicated piece of software in the world. No single automated testing regime will catch everything ...
Again, this was a virtual memory failure. Something far more severe than your normal errant code. Something that can corrupt memory, anywhere, anytime, and without interference from any security, sandboxes, etc.
RC1 means "let's start testing" (Score:2)
> Before a public testing there is internal testing, there is regression testing, etc.
All Linux kernel development is public.
The kernel isn't proprietary software that is tested "internally" before it's shown publicly. There IS no "internal"; development is public.
> The "candidate" in "release candidate" means "we tested this and think it is good".
In the kernel process, RC1 means "here's what we need to start testing". The RC1 tag indicates it's time to START testing.
https://www.kernel.org/doc/htm [kernel.org]
Re: (Score:3)
It'd be hard to specifically test for this unless you already knew it was happening. It's a silent bug until it isn't.
Re: How (Score:1)
Hard?
It's a freakin bounds check! The most obvious thing to check for literally in programming history! Not even a part of testing, but of the runtime code.
Now of course, if the bounds are already set wrong, that won't help.
But in that case you do a sanity check on the input for the bounds setter. Another one of the oldest checks ever thought of.
Now of course the sanity checker and the bounds setter could both be broken. But that is already statistically way more unlikely. And newer code on one side could b
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
Because automated testing isn't the only software development strategy, and not even the best strategy. To be more precise, if you think your code is good because it passes all the automated tests, you're wrong.
Re: (Score:2)
Two more strawmen, and you'd have the elusive phantom five arguments you're looking for.
Re: (Score:2)
It'd be hard to specifically test for this unless you already knew it was happening. It's a silent bug until it isn't.
You realize that broken virtual memory creates a general sort of instability, that anything can crash because memory was just massively corrupted?
Re: (Score:2)
Re: (Score:2)
But this isn't a crash. This is about writing things to wrong parts of the disk. You can definitely test for that, because you'd just use some VM disk emulation.
The crash comes when you read those wrong things back into RAM. Potentially anyone's RAM, the kernel, a driver, a utility, an app. Anything can crash when its RAM is corrupted.
Re: (Score:2)
No, the problem is WRITES go to the wrong spot on the disk. You corrupt the disk.
The problem is you're not likely going to detect this unless you have a reasonably full filesystem with integrity checks and a machine under heavy memory pressure. And even so, you probably won't see the problem until you reboot and discover the disk is unmountable and corrupted.
The
Re: (Score:3)
Re:How (Score:5, Informative)
> Only installs using swap files (instead of partitions) are affected. Who's doing that?
People with cheap VPS's with insufficient RAM.
Re: (Score:2)
still, why wouldn't you use a swap PARTITION?
Re: How (Score:5, Informative)
Re: How (Score:2)
Re: (Score:2)
Re: (Score:2)
Well, it didn't get merged into Ubuntu, did it? It's not even released, so...
Re: (Score:2)
You might not use it, but (this may shock you, but it doesn't shock me) you are not me, and I do use it. I know that you think you are just a perfectly average user - how could you think otherwise, unless you were actually studying user variability- but you're not. You've a 38% chance of being more than one standard deviation from the average, on any meas
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Considering it ended up being a file system corruption bug, I'd advise looking at the massive amounts of unit and integration testing ZFS does. This is why I keep saying they're at least 10 years ahead of anyone else in the storage space. They attempt to catch every conceivable scenario where something might go wrong, but beyond that, they have tests that artificially create those scenarios to test their error handling and reporting. It's not just about testing the positive, it's also about testing the negat
Re: How (Score:2, Informative)
It only affects swap files. Most people use swap partitions on Linux. So nobody noticed.
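For anyone unsure which camp they are in, a minimal sketch: on Linux, `/proc/swaps` lists each active swap area with a `Type` column that reads `file` or `partition`. The helper name `swap_types` and the sample text are made up for illustration; only the `/proc/swaps` column layout is taken from the real interface.

```python
def swap_types(proc_swaps_text):
    """Map each active swap area to its type ('file' or 'partition')."""
    result = {}
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            result[fields[0]] = fields[1]  # column 1: name, column 2: type
    return result


# Made-up sample in the shape of /proc/swaps, so the sketch runs anywhere.
SAMPLE_SWAPS = (
    "Filename    Type       Size     Used  Priority\n"
    "/swapfile   file       2097148  0     -2\n"
    "/dev/sda2   partition  4194300  1024  -3\n"
)

areas = swap_types(SAMPLE_SWAPS)
swap_files = [name for name, kind in areas.items() if kind == "file"]
```

On a real machine you would pass in the contents of `/proc/swaps` instead of the sample; any entry of type `file` is the configuration the summary says was affected.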
Re: How (Score:2)
Re: (Score:2)
Re: (Score:3, Insightful)
You're supposed to test all cases,
Well, yes, but it's impossible to do that even for relatively simple software
That's the very definition of engineering.
Yes, it is. But there's no such thing as software "engineering" unless you're using MISRA-C and building safety-of-life systems. Do that, and you'll see some engineering. A bunch of hackers around the world throwing code into the Linux kernel meets no definition of engineering that I've ever seen.
Re: (Score:2)
Re: (Score:2)
I'd argue that it is a rare use case - swap partitions are the default, or no swap at all given the large amount of RAM many have nowadays.
Re: (Score:2)
Re: (Score:2)
[citation needed]
"Rare" is a relative qualifier. A thousand machines out of a million is rare, a thousand out of two thousand is common.
Re: (Score:2)
Yeah, I know, regression test much?
Re: (Score:2)
It's not released, that's the point. It got fixed BEFORE 5.12 is released.
Re: (Score:2)
Re: (Score:2)
you understand that you can't test every possible case with every possible Linux commit, right?
Re: (Score:2)
Re: (Score:2)
Of course. It just isn't realistic to test every possible use case of Linux for every commit. But what matters is this bug got caught and fixed BEFORE the release.
Re: (Score:2)
Re: (Score:2)
Oh I get it fully. I just don't think it is realistic to expect that no bug will ever be committed to Linux. And the whole RC phase exists for a reason, and that is to test and fix bugs before a release.
If you expect such reliability you shouldn't be using Linux, BSD, Windows or Mac OS. All of them are far too complex for no bug to ever get committed.
There's no private branch for the kernel (Score:4, Informative)
> A private branch of the code is made, it is automatically compiled and run through unit tests.
In the case of the Linux kernel, the test branch is public, not private. Individual commits can be tested privately by individual contributors of course. The combined whole is tested out in the open. Testing starts when Linus tags a branch RC1. Which is what this was, an RC1, meaning a branch for which testing could begin because no new features would be merged, only fixes.
More info on the process can be found here:
https://www.kernel.org/doc/htm... [kernel.org]
In other words (Score:2)
They've incorporated gshred into the kernel.
OK, fine. This stuff happens occasionally (Score:1)
Re: (Score:2)
I'm wondering why anybody is messing around with such mature low-level code in the kernel that's been there since the start... 30 years old... why start messing with it now?
Re: (Score:2)
swap files (Score:2)
And if you're using swap partitions, rather than swap files, you'd be similarly unaffected...
Why would anyone use swap files? Every single article written after... 1820 or something, clearly explains that swap files are stupid, don't use them, use swap partitions or you're a big monkey.
Re: swap files (Score:2)
Re: (Score:2)
Is there a way I can check whether my swap partition has ever been used? I'm wondering if it's time to ditch it altogether.
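One partial answer, as a hedged sketch: Linux exposes `pswpin` and `pswpout` counters in `/proc/vmstat`, counting pages swapped in and out. They reset at boot, so this can only tell you about activity since the last boot, not "ever". The helper name `swap_activity` and the sample text are invented for illustration.

```python
def swap_activity(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    # A missing counter is treated as zero.
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


# Made-up sample in the shape of /proc/vmstat, so the sketch runs anywhere.
SAMPLE_VMSTAT = "pgpgin 123456\npswpin 0\npswpout 42\npgfault 987\n"

pages_in, pages_out = swap_activity(SAMPLE_VMSTAT)
swap_was_used = pages_out > 0
```

On a real system, feed it the contents of `/proc/vmstat`. If both counters stay at zero across long uptimes under your normal workload, that is some evidence the swap area is going unused.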
Re: (Score:2)
Re: swap files (Score:2)
Re: (Score:2)
Re: swap files (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: swap files (Score:2)
Re: (Score:2)
I don't know how this could be done with a swap partition.
dd is your friend :-)
I love the smell of "Release Candidate" in the .. (Score:2)
morning. But it smells like Beta-Poo(p) .. .. just mentioning that Linus and I have a very different view on the term "Release Candidate" and I would like to express my "gratitude":
"Stop doing stupid things".
And btw. yes using an RC for productive work .. I call equally stupid.
But hey does anybody remember 2.4.0, 2.4.1, 2.4.2, 2.4.3 - ReiserFS (a.k.a. "LifeSentenceFS"), "JFS" and "XFS" all your base belongs to /dev/zero bugs.
Re:I love the smell of "Release Candidate" in the (Score:4, Informative)
Your different view of the term "Release Candidate" just means that YOU inflated the concept to mean what "Release" means.
A candidate is still for finding bugs. Whether you like that or not.
You should use releases. Not release candidates. Clearly.
Re: (Score:2)
Perhaps the Linux release cycle uses a bullshit and misleading naming scheme .. I get that "inflation"
But Linus saying: "Don't Use the Linux 5.12-rc1 Kernel"
Some people just don't seem to agree with your last sentence and I mean when he needs to mention that he is fully aware of the use of a bullshit naming scheme which makes people think different.
Btw. I for one am very conscious that I don't use release candidates for my daily work.
Re: (Score:2)
Might be, but when Linus is telling people "Don't Use the Linux 5.12-rc1 Kernel"
And you wrote that "nobody" would do that, doesn't that seem contradictory to you? Like in a split personality sense?
But hey if this makes you happy living with a split personality and making a fool of yourself .. it is very entertaining.
What is the problem here? (Score:2)
I fail to see the problem here. There was a bug. It was discovered before release. I assume it has been fixed.
Just as it should be.
Nobody runs (or should run) an RC on a production system or their daily driver. They run it in order to look for problems, or lack thereof, expecting it to possibly break catastrophically.
Am I missing a point here?
Re: (Score:2)
What the fuck is wrong with people that things must be a "surprise" for it to be reported on? Something happened that doesn't normally happen and the details of which would interest people.
Re: (Score:2)
My point is that I just don't see why people get upset about such a non-event. Bugs happen and get discovered before release all the time.
If the bug had made it to a release, *that* would have been something to get upset about.
Re: (Score:2)
My point is that I just don't see why people get upset
No one got upset. People reporting on it is not being "upset".
Bugs happen and get discovered before release all the time.
Not enough to mark an RC dontuse. That's what made this event more notable. It's literally in the summary.
Re: (Score:2)
> Am I missing a point here?
Yeah, EditorDavid just spewing more clickbait. Basically ignore that moron.
You've summarized it perfectly. Nothing to see, move along.
Re: (Score:2)
Or creative (Score:2)
Re: (Score:2)
He enjoys playing with language. Perhaps that term tickled his fancy over carefully phrased cussing. Either that or he wants to reserve the swear jar for NVIDIA.
The important thing is that everyone was nice, and that the community was inclusive and followed the code of conduct. That's the most important thing.
Re: (Score:3)
Re: (Score:2)
Linus swearing at people would have done nothing to make that announcement clearer. Clear communication for a disastrous bug is the most important thing and would have been hindered with a sweary rant. As it stands, it got to the explanation quickly, and then succinctly explained the problem. Then the news of it spread around just as quickly, and the important information was reported as such, rather than it being the case of "Linus has fired off yet another nasty rant" that it would have been.
It's kinda cute that you took me so seriously. I gotta remember that sarcasm tag
Re: (Score:2)
Re: (Score:2)
I've been caught out with sarcasm/irony a few times today. It really didn't help that in recent weeks people have been trying to defend the indefensible by saying things like what you did but with no intention of irony.
No prob - I slip into terrible sarcasm on occasion, and forget the tag.
Perhaps we should be telling Linus... (Score:3, Insightful)
Linus, SHUT THE FUCK UP!
It's a bug alright - in the kernel. How long have you been a maintainer? And you *still* haven't learnt the first rule of kernel maintenance?
If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs. How hard can this be to understand?
Shut up, Linus. And I don't _ever_ want to hear that kind of obvious garbage and idiocy from a kernel maintainer again. Seriously.
WE DO NOT BREAK USERSPACE!
Seriously. How hard is this rule to understa
Re: (Score:2)
Re: (Score:2)
To me it should have been modded funny. Where did people's sense of humor go?
In any case, glad this was modded up
Re: Perhaps we should be telling Linus... (Score:2)
Re: (Score:2)
Re: Perhaps we should be telling Linus... (Score:3)
Re: (Score:2)
Re: (Score:2)
It's not as funny when you have to explain it. I guess there are a lot of people who don't recognize my post.
For those who don't know, my "Linus, SHUT THE FUCK UP" comment was a word-for-word copy (with the name changed) of one of Linus's earlier and more famous rants [slashdot.org] at a poor hapless kernel developer. He had been doing that privately for years, but that was one of the first times he did it on a public mailing list and it made news. It actually got a lot worse since then, but that was one of the most fam
Re: (Score:2)
Only child rapists don't swear.
The only thing worse than pseudo-psychology is pseudo-psychology from nerds who think they're above the need to make logic and evidence-backed statements.
You're a nerd! So everything you say that seems to link A to B must be right! Not like the non-nerd rabble who use the same ridiculous reasoning but don't have the nerd cred to get a free pass on logic and fact!
Re: (Score:1)
Is that the new "kick the dog syndrome?"
Re: (Score:2)
I don’t know about all that, but apparently people who cuss are smarter, motherfucker.
https://thelanguagenerds.com/s... [thelanguagenerds.com]
Re:double ungood (Score:4, Funny)
Still better than double plus ungood.
Re: (Score:3)
No, this is what you get when the development process occurs in the open and it's possible to view the work-in-progress code.
Microsoft only ever release certain insider builds, the daily builds that are used internally never see the light of day and some of those internal builds may have catastrophic bugs, but it doesn't matter because it's all a part of the normal development process. The most catastrophic bugs are detected quickly and fixed, what you have to worry about are subtle bugs that make it throug
Re: (Score:2)
Besides, Windows 10 has had more than 1 bug that would corrupt data at random. I'm pretty sure they were in "final release" (if W10 can even be said to have that) and without any choice to opt out, too.
Re: (Score:1)
Re: (Score:2)
I'm not sure of the technical details. I believe the word used when it was reported here on Slashdot was "deleted", but in any case, your data is gone.
Yes, Windows used to be better. Linux used to be better too. Linux 2021 is still better than Windows 2021.
Re: (Score:2)
Re: (Score:2)