Bug Open Source Linux

Torvalds Warns the World: Don't Use the Linux 5.12-rc1 Kernel (arstechnica.com) 124

"In a message to the Linux Kernel Mailing List Wednesday, founding developer Linus Torvalds warned the world not to use the 5.12-rc1 kernel in his public git tree..." writes Ars Technica: As it turns out, when Linus Torvalds flags some code dontuse, he really means it — the problem with this 5.12 release candidate broke swapfile handling in a very unpleasant way. Specifically, the updated code would lose the proper offset pointing to the beginning of the swapfile. Again, in Torvalds' own words, "swapping still happened, but it happened to the wrong part of the filesystem, with the obvious catastrophic end results."

If your imagination is insufficient, this means that when the kernel paged contents of memory out to disk, the data would land on random parts of the same disk and partition the swapfile lived on... not as files, mind you, but as garbage spewed directly to raw sectors on the disk. This means overwriting not only data in existing files, but also rather large chunks of metadata whose corruption would likely render the entire filesystem unmountable and unusable.
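To make the failure mode concrete, here is a toy model of it, not the kernel code. It assumes a "disk" that is just a byte array and a swapfile that is one contiguous extent inside it. The function and variable names are hypothetical and chosen only for illustration.

```python
# Toy model only: the "disk" is a bytearray and the swapfile is a
# contiguous extent inside it. None of these names come from the kernel.
PAGE = 4096


def write_swap_page(disk, swap_start, slot, data, *, buggy=False):
    """Write one page into swap slot `slot` and return the byte position used.

    The correct position is swap_start + slot * PAGE. The buggy variant
    drops swap_start, so the page lands relative to the start of the disk.
    """
    base = 0 if buggy else swap_start
    pos = base + slot * PAGE
    disk[pos:pos + PAGE] = data
    return pos


disk = bytearray(64 * PAGE)   # hypothetical 64-page partition
swap_start = 32 * PAGE        # hypothetical swapfile occupying pages 32..47
page = b"\xAA" * PAGE

good_pos = write_swap_page(disk, swap_start, 3, page)
bad_pos = write_swap_page(disk, swap_start, 3, page, buggy=True)

print(good_pos // PAGE)  # 35: inside the swapfile
print(bad_pos // PAGE)   # 3: wherever the filesystem keeps its own data
```

In the buggy case the write itself succeeds, which is why nothing looks wrong until the overwritten sectors are read back.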

Torvalds goes on to point out that if you aren't using swap at all, this problem wouldn't bite you. And if you're using swap partitions, rather than swap files, you'd be similarly unaffected...
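One way to tell which case applies to a given machine is to look at `/proc/swaps`, which lists each active swap area with a type of `file` or `partition`. The sketch below parses text in that format. The helper names and the sample text are made up for illustration; on a real system you would pass in the contents of `/proc/swaps`.

```python
def swap_areas(proc_swaps_text):
    """Parse text in the /proc/swaps format into (name, type) pairs."""
    areas = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            areas.append((fields[0], fields[1]))
    return areas


def uses_swapfile(proc_swaps_text):
    """True if any active swap area is a file rather than a partition."""
    return any(kind == "file" for _, kind in swap_areas(proc_swaps_text))


# Made-up sample; on a real system use open("/proc/swaps").read() instead.
sample = (
    "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
    "/swapfile                               file\t\t2097148\t\t0\t\t-2\n"
    "/dev/sda2                               partition\t4194300\t\t0\t\t-3\n"
)

print(swap_areas(sample))
print(uses_swapfile(sample))
```

An empty result means no swap is active at all, which per the summary is also an unaffected configuration.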

Torvalds also advised anyone who'd already pulled his git tree to do a git tag -d v5.12-rc1 "to actually get rid of the original tag name..." — or at least, to not use it for anything.

"I want everybody to be aware..." Torvalds writes, "because _if_ it bites you, it bites you hard, and you can end up with a filesystem that is essentially overwritten by random swap data. This is what we in the industry call 'double ungood'."

  • How do they test each Linux release? It would appear from this that they don't have many automated tests, or are missing some.
    • This was a release candidate, so technically still in the testing phase. But what makes you think an automated test would catch this?

      • by raymorris ( 2726007 ) on Saturday March 06, 2021 @11:30PM (#61131994) Journal

        After RC1 is when testing of the whole thing *starts*.
        RC1 is the point when they stop adding new features and start testing and fixing bugs.

        So when you say "technically still in the testing phase", to be precise RC1 is the last thing that happens BEFORE testing. (Testing of the whole, as opposed to individual contributors testing individual commits).

        More info on the process can be found here:
        https://www.kernel.org/doc/htm... [kernel.org]

    • It'd be hard to specifically test for this unless you already knew it was happening. It's a silent bug until it isn't.

      • Hard?

        It's a freakin bounds check! The most obvious thing to check for literally in programming history! Not even a part of testing, but of the runtime code.

        Now of course, if the bounds are already set wrong, that won't help.

        But in that case you do a sanity check on the input for the bounds setter. Another one of the oldest checks ever thought of.

        Now of course the sanity checker and the bounds setter could both be broken. But that is already statistically way more unlikely. And newer code on one side could b

        • It's not a bounds check and would not be found by any sanity checker in the world. It was a wrongly calculated offset that meant that the swapfile would be overwritten at the wrong place.
      • You don't have to specifically test for this. You just needed to have a regression test to test swapfiles. It's not unreasonable to ask why there weren't swapfile regression tests that verified that the data goes where it's supposed to and not where it's not supposed to.
        • Because automated testing isn't the only software development strategy, and not even the best strategy. To be more precise, if you think your code is good because it passes all the automated tests, you're wrong.

          • No one said it was the only strategy. No one said it was the best strategy. No one said if it passes all the tests that it must be good.

            Two more strawmen, and you'd have the elusive phantom five arguments you're looking for.
      • by drnb ( 2434720 )

        It'd be hard to specifically test for this unless you already knew it was happening. It's a silent bug until it isn't.

        You realize that broken virtual memory creates a general sort of instability. That anything can crash because memory was just massively corrupted?

        • But this isn't a crash. This is about writing things to wrong parts of the disk. You can definitely test for that, because you'd just use some VM disk emulation.
          • by drnb ( 2434720 )

            But this isn't a crash. This is about writing things to wrong parts of the disk. You can definitely test for that, because you'd just use some VM disk emulation.

            The crash comes when you read those wrong things back into RAM. Potentially anyone's RAM, the kernel, a driver, a utility, an app. Anything can crash when its RAM is corrupted.

        • by tlhIngan ( 30335 )

          You realize that broken virtual memory creates a general sort of instability. That anything can crash because memory was just massively corrupted?

          No, the problem is WRITES go to the wrong spot on the disk. You corrupt the disk.

          The problem is you're not likely going to detect this unless you have a reasonably full filesystem with integrity checks and a machine under heavy memory pressure. And even so, you probably won't see the problem until you reboot and discover the disk is unmountable and corrupted.

          The

    • Only installs using swap files (instead of partitions) are affected. Who's doing that? The lesser the use, the harder the testing.
      • Re:How (Score:5, Informative)

        by bill_mcgonigle ( 4333 ) * on Saturday March 06, 2021 @06:34PM (#61131594) Homepage Journal

        > Only installs using swap files (instead of partitions) are affected. Who's doing that?

        People with cheap VPS's with insufficient RAM.

      • Re: How (Score:5, Informative)

        by Steventhompsons ( 7594066 ) on Saturday March 06, 2021 @06:47PM (#61131620)
        It's Ubuntu's default
      • Sorry, but I've still not seen a good reason to eschew building a system with a swap partition (which, per TFS, wouldn't actually have much of a problem).

        You might not use it, but (this may shock you, but it doesn't shock me) you are not me, and I do use it. I know that you think you are just a perfectly average user - how could you think otherwise, unless you were actually studying user variability- but you're not. You've a 38% chance of being more than one standard deviation from the average, on any meas

        • Actually I don't either :) Have plenty of memory, no need to harass the SSD more than necessary!
          • Not harassing the SSD, with its limited write cycles, would be a fair reason for moving the swap partition to a spinning rust device. It doesn't actually address the question of whether to have one. Sometime this year I might actually get an SSD, but I'll have my actual data on spinning rust, out on my network somewhere.
        • by tlhIngan ( 30335 )

          Sorry, but I've still not seen a good reason to eschew building a system with a swap partition (which, per TFS, wouldn't actually have much of a problem).

          You might not use it, but (this may shock you, but it doesn't shock me) you are not me, and I do use it. I know that you think you are just a perfectly average user - how could you think otherwise, unless you were actually studying user variability- but you're not. You've a 38% chance of being more than one standard deviation from the average, on any measu

          • That would be a way of managing things. Indeed, it is the sort of thing that might persuade me to re-evaluate how I partition things on my own system. Systems, if I build or get another. But I spent long enough working out how to do things between a couple of 40MB MFM drives and booting Slackware from floppies, and I think it'll take a lot to persuade me to do things differently.
    • by darkain ( 749283 )

      Considering it ended up being a file system corruption bug, I'd advise looking at the massive amounts of unit and integration testing ZFS does. This is why I keep saying they're at least 10 years ahead of anyone else in the storage space. They attempt to catch every conceivable scenario where something might go wrong, but beyond that, they have tests that artificially create those scenarios to test their error handling and reporting. It's not just about testing the positive, it's also about testing the negat

    • Re: How (Score:2, Informative)

      by fred6666 ( 4718031 )

      It only affects swap files. Most people use swap partitions on Linux. So nobody noticed.

      • Only Ubuntu default out of the box....
      • That's not how testing is supposed to be done. You're supposed to test all cases, not just the happy case. Cars are tested for how they perform in crashes, even though crashes rarely happen compared to successful trips. Same for planes, trains, medicines and so on. That's the very definition of engineering.
        • Re: (Score:3, Insightful)

          You're supposed to test all cases,

          Well, yes, but it's impossible to do that even for relatively simple software.

          That's the very definition of engineering.

          Yes, it is. But there's no such thing as software "engineering" unless you're using MISRA-C and building safety-of-life systems. Do that, and you'll see some engineering. A bunch of hackers around the world throwing code into the Linux kernel meets no definition of engineering that I've ever seen.

          • It's not impossible. Surely a working swap file is not a rare use case; it should be a core feature that's tested.
            • by jwdb ( 526327 )

              I'd argue that it is a rare use case - swap partitions are the default, or no swap at all given the large amount of RAM many have nowadays.

              • Still, in a world of millions of machines, it's not rare; it's still a lot.
                • by jwdb ( 526327 )

                  [citation needed]

                  "Rare" is a relative qualifier. A thousand machines out of a million is rare, a thousand out of two thousand is common.

    • by kriston ( 7886 )

      Yeah, I know, regression test much?

    • It's not released, that's the point. It got fixed BEFORE 5.12 is released.

      • It doesn't matter whether it's released; every commit should always be tested. If you don't, people forget and untested stuff gets into production.
        • You understand that you can't test every possible case with every possible Linux commit, right?

          • This problem isn't about a version of Linux; it's a feature that the kernel supports and as such should be tested.
            • Of course. It just isn't realistic to test every possible use case of Linux for every commit. But what matters is that this bug got caught and fixed BEFORE the release.

              • You just don't get it: it shouldn't happen, because it should be tested before getting merged. It's pure luck it got discovered; that's not good engineering practice.
                • Oh, I get it fully. I just don't think it is realistic to expect that no bug will ever be committed to Linux. And the whole RC phase exists for a reason, and that is to test and fix bugs before a release.

                  If you expect such reliability you shouldn't be using Linux, BSD, Windows or Mac OS. All of them are far too complex for no bug ever to get committed.

  • They've incorporated gshred into the kernel.

  • But still, pretty harsh, especially for rc.
    • I'm wondering why anybody is messing around with such mature low-level code in the kernel that's been there since the start... 30 years old... why start messing with it now?

  • by Tom ( 822 )

    And if you're using swap partitions, rather than swap files, you'd be similarly unaffected...

    Why would anyone use swap files? Every single article written after... 1820 or something, clearly explains that swap files are stupid, don't use them, use swap partitions or you're a big monkey.

    • Reading some of the comments here, apparently Ubuntu does by default. Me, I just have lots of RAM, so even my swap partition typically doesn't get used...
      • by dohzer ( 867770 )

        Is there a way I can check whether my swap partition has ever been used? I'm wondering if it's time to ditch it altogether.

        • Ditch it and see if you run out of RAM (desktop). Or there are tools out there which will alert you at a certain swap percentage. You could set it for >0%! But I'm pretty sure your swap is getting used. If you have RAM pages that are not really used, they sometimes get swapped out so that those pages can be used to cache commonly-accessed files, since that will yield better performance than keeping unused pages in memory.
        • Run the command "free", it will tell you straightaway. If you have no swap usage, this means adding RAM to your system will not have any benefit.
      • Having lots of RAM doesn't mean that your swap won't get used. I'm not super-familiar with the Linux swap strategy but many OSs will write out pages pre-emptively during idle time. If CPU and disk are quiet, it doesn't hurt to pre-write some pages to disk. That way if a large chunk of memory is needed quickly, it can be allocated without a write. Just use the pages that have already been written to disk. In that case, this bug would affect users even if they aren't anywhere near using all of their RAM.
        • True, but "free" often tells me that no swap has been used. Obviously, as soon as you suspend to disk, your swap gets used. I typically suspend to RAM and won't see any swap get used for ages...
          • Yeah but you would be impacted by this bug which was my original point. Even if there is no memory pressure, the swapper will be writing pages to disk and, if you have a kernel with this bug, data will be corrupted!
          • I just reread your post so now I'm replying twice. If all pages written to swap are still available in memory, it's not clear that "free" would report that as swap usage. Maybe it would. But it doesn't mean you are exempt from this bug.
            • Since I'm using a swap partition and not a swap file, I'm exempt in any case. Aside from having run an RC kernel for the last time over 15 years ago...
  • morning. But it smells like Beta-Poo(p) .. .. just mentioning that Linus and I have a very different view on the term "Release Candidate" and I would like to express my "gratitude":
    "Stop doing stupid things".

    And btw. yes using an RC for productive work .. I call equally stupid.

    But hey does anybody remember 2.4.0, 2.4.1, 2.4.2, 2.4.3 - ReiserFS (a.k.a. "LifeSentenceFS"), "JFS" and "XFS" all your base belongs to /dev/zero bugs.

    • by freax ( 80371 ) on Sunday March 07, 2021 @03:56AM (#61132322) Homepage

      Your different view of the term "Release Candidate" just means that YOU inflated the concept to mean what "Release" means.

      A candidate is still for finding bugs. Whether you like that or not.

      You should use releases. Not release candidates. Clearly.

      • by burni2 ( 1643061 )

        Perhaps the Linux release cycle uses a bullshit and misleading naming scheme .. I get that "inflation"

        But Linus saying: "Don't Use the Linux 5.12-rc1 Kernel"
        Some people just don't seem to agree with your last sentence and I mean when he needs to mention that he is fully aware of the use of a bullshit naming scheme which makes people think different.

        Btw. I for one am very conscious that I don't use release candidates for my daily work.

  • I fail to see the problem here. There was a bug. It was discovered before release. I assume it has been fixed.

    Just as it should be.

    Nobody runs (or should run) an RC on a production system or their daily driver. They run it in order to look for problems, or lack thereof, expecting it to possibly break catastrophically.

    Am I missing a point here?

    • What's your point? That people shouldn't talk about it? That they shouldn't report on it?

      What the fuck is wrong with people that things must be a "surprise" for it to be reported on? Something happened that doesn't normally happen and the details of which would interest people.
      • My point is that I just don't see why people get upset about such a non-event. Bugs happen and get discovered before release all the time.

        If the bug had made it to a release, *that* would have been something to get upset about.

        • My point is that I just don't see why people get upset

          No one got upset. People reporting on it is not being "upset".

          Bugs happen and get discovered before release all the time.

          Not enough to mark an RC dontuse. That's what made this event more notable. It's literally in the summary.

    • > Am I missing a point here?

      Yeah, EditorDavid just spewing more clickbait. Basically ignore that moron.

      You've summarized it perfectly. Nothing to see, move along.

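As an illustration of the regression-test idea debated in the comments above (verify that swap data goes where it is supposed to and nowhere else), here is a minimal sketch against a simulated disk. It is not how the kernel is tested. The byte-array disk, the extent layout, and every function name are assumptions made for the example.

```python
import hashlib

PAGE = 4096


def outside_digest(disk, start, length):
    """Hash every byte of `disk` that lies outside [start, start + length)."""
    h = hashlib.sha256()
    h.update(disk[:start])
    h.update(disk[start + length:])
    return h.hexdigest()


def check_swap_writer(writer, disk_pages=64, swap_first=32, swap_pages=16):
    """Return True if `writer` only ever modifies the swapfile extent."""
    disk = bytearray(disk_pages * PAGE)
    start, length = swap_first * PAGE, swap_pages * PAGE
    before = outside_digest(disk, start, length)
    for slot in range(swap_pages):
        # Non-zero payload so that any stray write changes the digest.
        writer(disk, start, slot, bytes([slot + 1]) * PAGE)
    return outside_digest(disk, start, length) == before


def correct_writer(disk, swap_start, slot, data):
    pos = swap_start + slot * PAGE
    disk[pos:pos + PAGE] = data


def offsetless_writer(disk, swap_start, slot, data):
    pos = slot * PAGE  # forgets swap_start, as in the toy bug
    disk[pos:pos + PAGE] = data


print(check_swap_writer(correct_writer))     # True
print(check_swap_writer(offsetless_writer))  # False
```

The check compares a digest of everything outside the swapfile before and after the writes, so it does not need to know in advance where a stray write might land.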
