Why a 'Frozen' Distribution Linux Kernel Isn't the Safest Choice for Security (zdnet.com)

Jeremy Allison — Sam (Slashdot reader #8,157) is a Distinguished Engineer at Rocky Linux creator CIQ. This week he published a blog post responding to promises of Linux distros "carefully selecting only the most polished and pristine open source patches from the raw upstream open source Linux kernel in order to create the secure distribution kernel you depend on in your business."

But do carefully curated software patches (applied to a known "frozen" Linux kernel) really bring greater security? "After a lot of hard work and data analysis by my CIQ kernel engineering colleagues Ronnie Sahlberg and Jonathan Maple, we finally have an answer to this question. It's no." The data shows that "frozen" vendor Linux kernels, created by branching off a release point and then using a team of engineers to select specific patches to back-port to that branch, are buggier than the upstream "stable" Linux kernel created by Greg Kroah-Hartman. How can this be? If you want the full details the link to the white paper is here. But the results of the analysis couldn't be clearer.

- A "frozen" vendor kernel is an insecure kernel. A vendor kernel released later in the release schedule is doubly so.

- The number of known bugs in a "frozen" vendor kernel grows over time. The growth in the number of bugs even accelerates over time.

- There are too many open bugs in these kernels for it to be feasible to analyze or even classify them....

[T]hinking that you're making a more secure choice by using a "frozen" vendor kernel isn't a luxury we can still afford to believe. As Greg Kroah-Hartman explicitly said in his talk "Demystifying the Linux Kernel Security Process": "If you are not using the latest stable / longterm kernel, your system is insecure."

CIQ describes its report as "a count of all the known bugs from an upstream kernel that were introduced, but never fixed in RHEL 8." For the most recent RHEL 8 kernels, at the time of writing, these counts are:

- RHEL 8.6: 5034
- RHEL 8.7: 4767
- RHEL 8.8: 4594

In RHEL 8.8 we have a total of 4594 known bugs with fixes that exist upstream, but for which known fixes have not been back-ported to RHEL 8.8. The situation is worse for RHEL 8.6 and RHEL 8.7 as they cut off back-porting earlier than RHEL 8.8 but of course that did not prevent new bugs from being discovered and fixed upstream....

This whitepaper is not meant as a criticism of the engineers working at any Linux vendors who are dedicated to producing high quality work in their products on behalf of their customers. This problem is extremely difficult to solve. We know this is an open secret amongst many in the industry and would like to put concrete numbers describing the problem to encourage discussion. Our hope is for Linux vendors and the community as a whole to rally behind the kernel.org stable kernels as the best long term supported solution. As engineers, we would prefer this to allow us to spend more time fixing customer specific bugs and submitting feature improvements upstream, rather than the endless grind of backporting upstream changes into vendor kernels, a practice which can introduce more bugs than it fixes.
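
To make the report's methodology concrete, here is a minimal sketch (not CIQ's actual tooling) of the kind of upstream-versus-vendor comparison it describes, assuming local clones of the upstream stable tree and the vendor kernel source. The repository paths, tag names, and match-by-subject heuristic are hypothetical simplifications.

```python
#!/usr/bin/env python3
"""Rough sketch: count upstream stable fixes that never appear in a vendor
kernel's changelog.  Paths, tags, and the subject-matching heuristic are
hypothetical simplifications of what a real analysis would need."""

import subprocess

UPSTREAM_REPO = "/src/linux-stable"   # hypothetical clone of the stable tree
VENDOR_REPO = "/src/vendor-kernel"    # hypothetical vendor kernel source tree
BRANCH_POINT = "v4.18"                # release the vendor kernel branched from
UPSTREAM_TIP = "v4.18.y"              # hypothetical ref for the stable series tip


def subjects(repo, rev_range, grep=None):
    """Return the set of commit subjects in a rev range, optionally filtered."""
    cmd = ["git", "-C", repo, "log", "--format=%s", rev_range]
    if grep:
        cmd += ["--grep", grep]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


# Upstream commits since the branch point that declare themselves as fixes.
upstream_fixes = subjects(UPSTREAM_REPO, f"{BRANCH_POINT}..{UPSTREAM_TIP}",
                          grep="Fixes:")

# Everything that made it into the vendor kernel after the same branch point.
vendor_commits = subjects(VENDOR_REPO, f"{BRANCH_POINT}..HEAD")

missing = upstream_fixes - vendor_commits
print(f"upstream fixes since {BRANCH_POINT}: {len(upstream_fixes)}")
print(f"not found in the vendor changelog:  {len(missing)}")
```

A real analysis has to be far more careful, since backported patches often carry modified subjects and different commit IDs, which is part of why curating such a backlog by hand is so hard.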

ZDNet calls it "an open secret in the Linux community." It's not enough to use a long-term support release. You must use the most up-to-date release to be as secure as possible. Unfortunately, almost no one does that. Nevertheless, as Google Linux kernel engineer Kees Cook explained, "So what is a vendor to do? The answer is simple, if painful: Continuously update to the latest kernel release, either major or stable." Why? As Kroah-Hartman explained, "Any bug has the potential of being a security issue at the kernel level...."

Although [CIQ's] programmers examined RHEL 8.8 specifically, this is a general problem. They would have found the same results if they had examined SUSE, Ubuntu, or Debian Linux. Rolling-release Linux distros such as Arch, Gentoo, and OpenSUSE Tumbleweed constantly release the latest updates, but they're not used in businesses.

Jeremy Allison's post points out that "the Linux kernel used by Android devices is based on the upstream kernel and also has a stable internal kernel ABI, so this isn't an insurmountable problem..."
  • by kopecn ( 1962014 ) on Saturday May 18, 2024 @08:16PM (#64481997)
    Cannot really migrate because all of the newer builds are full of broken functionality that had previously worked. I am looking at you Ubuntu 22 and 24. Still stuck on 18.
    • Re:Yeah (Score:5, Interesting)

      by bill_mcgonigle ( 4333 ) * on Saturday May 18, 2024 @08:38PM (#64482027) Homepage Journal

      You can't be on 18 in ten years, can you?

      Plan B is needed. But also, if Ubuntu is failing you, evaluate that too.

      • Re:Yeah (Score:4, Insightful)

        by Tailhook ( 98486 ) on Saturday May 18, 2024 @08:45PM (#64482033)

        You can't be on 18 in ten years, can you?

        Not with support. With Ubuntu Pro you get to April 2028 with backported security patches.

        • Re:Yeah (Score:5, Interesting)

          by 93 Escort Wagon ( 326346 ) on Saturday May 18, 2024 @09:40PM (#64482079)

          Of course isn't that what this submission is all about - that the "backported security patches" approach is actually a sub-optimal way to do this (for the kernel, at a minimum)?

          Which is what pretty much every enterprise Linux currently does...

          • Oh, and then there are all those frozen Docker images. I am just crying inside.
            • That was always the problem with static linking, back when that was a fetish for some (1990s) - static bugs that can't be fixed even if you update the library. That happens today too, as everything old is new again and now everyone is into static linking again: container images and Go and so on.

              It turns out that the people who invented dynamic linking might have been onto something...

          • Yes! That's exactly the point. Trying to curate and select patches for a "frozen" kernel fails due to the firehose of fixes going in upstream.

            And in the kernel many of these could be security bugs. No one is doing evaluation on that; there are simply too many fixes in such a complex code base to check.

            • But you don't have an alternative solution as just using the latest kernel is simply not viable in many many scenarios. So telling people to do something that would create even more problems is a really dumb move.

              • by micheas ( 231635 )

                But you don't have an alternative solution as just using the latest kernel is simply not viable in many many scenarios. So telling people to do something that would create even more problems is a really dumb move.

                The "not viable in many scenarios" can be translated to "We aren't competent enough to figure out how to do QA in the manner required to do our jobs properly."

                The idea of backporting fixes as anything other than a temporary bandaid, while the proper fix is applied, is grossly incompetent. The fact that there was not a single company pushing QA at RSA last week shows how grossly incompetent the average CISO is and that they have no clue that the primary need is not to know what security risks you have but rat

            • And the firehose of new "features" adding new bugs at an equal rate.
    • by gweihir ( 88907 )

      Have you tried just replacing the kernel?

    • by AmiMoJo ( 196126 )

      Still using Ubuntu 18 at work for some stuff that won't migrate. It's getting harder and harder because stuff like modern VPNs aren't supported. At the moment it's barely working in a VM alongside another Ubuntu 22 VM, with a network tunnel between the two and the VPN running on the newer OS.

      In theory it might be possible to use Docker or something to get it working on a newer OS, but it's more work than it's worth and Docker can break easily enough as well. We have a build machine that runs it in Docker o

      • by micheas ( 231635 )

        Still using Ubuntu 18 at work for some stuff that won't migrate. It's getting harder and harder because stuff like modern VPNs aren't supported. At the moment it's barely working in a VM alongside another Ubuntu 22 VM, with a network tunnel between the two and the VPN running on the newer OS.

        In theory it might be possible to use Docker or something to get it working on a newer OS, but it's more work than it's worth and Docker can break easily enough as well. We have a build machine that runs it in Docker on Ubuntu 20, but moving the container off that machine breaks it and it's not worth investigating.

        Would love to see a good solution for this.

        How about running your app in Docker with SCRATCH as your base image and only copying in what you actually use?

        I know, that would be hard and require something other than an outsourced solution that makes the auditors happy. But, I can assure you that SCRATCH has never had a security vulnerability. Start with a sound foundation and figure out what you are actually doing. It's faster for a POC to just throw everything and the kitchen sink over the wall, but if you care at all about security, cost, and mainta

        • by AmiMoJo ( 196126 )

          I'm sure it would work, but it's work so it won't get any attention until it breaks. Ideally the VM could use the host VPN, but it doesn't work with VirtualBox on Windows.

    • Debian too.

      I have a TV card that works in Debian 11, but cannot decode signals in Debian 12 on the exact same hardware configuration despite acquiring a signal lock.

      I also have a laptop where, if an ethernet (wired or wireless) interface is put into a different network namespace under Debian 12, all ethernet packets seem to get dropped by the kernel until the interfaces are all back in the initial namespace. It worked fine in a previous Debian.

      Another system just has its video output stop working after the
  • by bill_mcgonigle ( 4333 ) * on Saturday May 18, 2024 @08:42PM (#64482029) Homepage Journal

    Currently many people buy "Enterprise" linux so they have someone to blame.

    They could be on bookworm with a -current kernel but then if something goes wrong there's no one paid to apologize.

    Perhaps with ransomware insurance going sky-high the incentives will change but right now in EL space being secure is not a top priority - quarterly bonuses are much more important.

    • The EU Cyber Resilience Act will eventually require that everything deployed from firmware, bios, device drivers, kernel, system service, application programs, web sites, etc. be security audited and have a statement from the vendor that they've audited all of their code and have certified statements from each vendor/FOSS that all of their code is audited.

      This goes recursively and eventually will require every line of code on a system to be audited.

      The Frozen Linux distribution is just to get ahead of that

      • Expect that thousands of widely used FOSS libraries will be in a not-audited state, given that getting them audited and recertifying them every few years is a cost many FOSS projects won't have the time or money for.

        How many years will it take for the following to be fully security audited?
        - Angular
        - React
        - Bootstrap
        - Node and common JS backends
        - PHP back-end
        - .NET back-end
        - ...

        And legacy companies like Oracle to open up and security audit their ancient code bases.

        • by tlhIngan ( 30335 )

          Expect that thousands of widely used FOSS libraries will be in a not-audited state, given that getting them audited and recertifying them every few years is a cost many FOSS projects won't have the time or money for.

          Why do you think every project needs auditing?

          It's a security audit, not a line by line code audit. To pass these you need to show things like how you designed your system to be secure, what steps you take to ensure that bad actors can't insert random code into your product, and other things.

          None of

          • by Bongo ( 13261 )

            How would that kind of audit have prevented MOVEit?

            There's too often in "security" a theme of doing what's easy to do, rather than doing what's hard but which would actually make a difference.

          • by micheas ( 231635 )
            Considering CVEs are borderline useless, that is actually horrifying.

            CVEs are designed for proprietary software with the source code not available that is released on a fixed schedule.

            CVEs are a poor fit for open source software, as they miss the majority of security issues that are incidentally fixed in bug fixes that don't have a security issue assigned to them, and a non-trivial percentage of them are bogus.

            If you are going to build off of a large open source eco system you need to commit to being able to

      • by gweihir ( 88907 )

        Not really. That audit requirement is a red herring. What is actually required is _resilience_. You know, as the title says. For example, there is no issue with using unaudited code or cutting-edge code as long as you have additional safeguards in place (that you need to have anyways) and as long as you have a tested recovery procedure. Yes, you need to be careful. Yes, you need to be able to keep things running or get them running again. No, that does not mean a glacial process where anything needs to have

      • by jmccue ( 834797 )

        Well, OpenBSD it is. That is the only Free UN*X that does regular and repeating audits of their system.

        That begs the question: what is meant by Vendor? Since OpenBSD base does not come with third party applications, if I were to install, say, Apache on OpenBSD from ports, would the Apache people be liable under the EU's Cyber Act? What about something from a one-man project? Can the EU go after the person in charge of that? Is the same true for RHEL packages? Who is at fault for something I install

      • by Bongo ( 13261 )

        That's interesting, but it sounds impossible. Rather, companies will take advantage of self-certification and they'll use extremely simplistic, nearly worthless auditing techniques. More security theatre.

        • The world loves security theater though. Just look at how we do airport security now. It's more important we "look" like we are doing something useful than actually doing something useful.

          No I don't like it but that's the sad state of affairs.

        • by micheas ( 231635 )

          That's interesting, but it sounds impossible. Rather, companies will take advantage of self-certification and they'll use extremely simplistic, nearly worthless auditing techniques. More security theatre.

          There was a reason I ran a $500 third-party pen test on an application. Here's a hint: It wasn't to find security issues.

    • by gweihir ( 88907 )

      Indeed. Broken error culture where getting better and staying flexible is not the aim, but always having somebody to blame is. Such an organization will never be good at anything and it will be too inflexible and incompetent to act fast when needed. And that need to act fast can arise at any time in a modern IT landscape. Essentially people are too afraid to touch anything because that touch may get them fired. It does not get much more dysfunctional than that, on engineering and on management level both. W

    • There is an Ubuntu LTS and Proxmox, both are on bookworm with supported kernels. The "free" versions basically have become the beta channel though. If you use it, you will find the problem is that the bleeding edge kernel has a bunch of bugs with enterprise hardware like Intel and nVIDIA networking gear and then pieces like systemd and networkd aren't helping either.

      The problem is that most of the bleeding edge kernel developers and users are currently on AMD CPU with gaming/consumer

  • My Android phone is stuck on an old kernel. Why? My phone manufacturer is NOT going to spend unlimited time and money on updating this part enthusiastically, doing all the hardware support regression testing and mitigation for no gain in its business profit. Yes, there are security risks. But they are not big enough for consumers to force the company (or companies) to comply perfectly. The "patch" number of my Android phone kernel is thus lagging a lot behind the latest "patch" number of the same lo

    • by HBI ( 10338492 )

      Sounds like a business opportunity to me. I'm sure people would pay some non-outrageous sum a year to maintain long term hardware support. Some apropos legislation mandating 7-10 year support for such devices (no mention of 'free') and some regulation of maximum software fees and voila, you have a business for someone. Maybe not the original vendor but who cares as long as they keep the phone running and patched up for the time frame required.

      • Sounds like a business opportunity to me.

        Two words: Locked bootloader.

        That business opportunity is DOA because of them. Why? Because why would the manufacturer leave a pile of money lying around when they can use the locked bootloader to upsell you? Expect this to become the norm for everything as long as your local regulations don't prohibit it. After all, if the OEM isn't making more money, why should anyone else be able to?

        • by HBI ( 10338492 )

          That's where the lawmakers come in, banning that shit. Literally compelling the vendor to support the phone for 7-10 years. They'll farm it out to someone.

    • Many Android phones have a locked bootloader, and the vendor decided to abandon patching them, which effectively makes the device useless. What would be nice is the ability to move to LineageOS or something supported for the long haul, so even 5-10 years from now, a device would be able to be on a recent kernel. Another thing that keeps this from happening is closed-source, binary Linux blobs used for SoC functionality, which may not work with newer kernels.

      • The real issue is binary device drivers statically linked against specific kernel versions that get abandoned. Even with an unlocked bootloader you are still usually screwed if you don't have support from the device manufacturer.
  • by PPH ( 736903 )

    A "frozen" vendor kernel is an insecure kernel.

    Maybe. But that's the safe way to bet.

    The number of known bugs in a "frozen" vendor kernel grows over time.

    Not really. The bugs were always there. They just haven't been found yet.

    The whole back-porting of only "pristine" patches is bound to break down. Because as time goes by, the newer patches become more dependent on the new features continually being added to the kernel. And the work to back port becomes more difficult and error prone, introducing its own bugs. Eventually, one adds more bugs than they patch.

    Keeping up with the current kernel's new features is a never

    • If there are so many bugs that it's hard to keep up with security patches, then your code is overall insecure. Updating to the latest version won't help that.

      It is a nest of insecure bugs because they don't prioritize security. They don't prioritize security because users don't prioritize security. It's not even the third or fourth priority.
      • The upstream Linux kernel doesn't differentiate between security bugs and "normal" bug fixes. So the new kernel.org CNA just assigns CVEs to all fixes. They don't score them.

        Look at the numbers from the whitepaper:

        "In March 2024 there were 270 new CVEs created for the stable Linux kernel. So far in April 2024 there are 342 new CVEs:"

        • I understand that.

          I'm going based on what you said here, that a "frozen" kernel is an insecure kernel.

          It's not that the current kernel is secure, it's just that the security bugs haven't been found yet. And the implication is that in the current kernel, there are a lot of security bugs (otherwise freezing the kernel would be ok, and backporting the patches feasible). So updating to a current kernel won't fix your security problems, it'll just hide them a bit longer.
          • by micheas ( 231635 )

            Unless you have a mathematically provably secure kernel you are going to have that problem.

            Linux has some issues architecturally in the kernel that make this hard. Unfortunately, the majority of those decisions have serious performance reasons behind them. This means that a Linux kernel replacement would probably need to be written with both security and performance in mind as top priorities. At this point, my experience tells me that the first task would probably be to develop a new programming language fo

            • You don't need to mathematically prove the security of your kernel. OpenBSD is more secure than Linux.

              There are different levels of security. Linux can improve their security without putting in all the effort to mathematically prove it secure (or switching to a different language).
              • by micheas ( 231635 )

                OpenBSD has a reputation for being more secure than Linux. But, most of that "security" is done by having no services run by default.

                OpenBSD is more of a research lab for security features than it is an actually secure operating system.

                • okaaaaay, don't turn your brain off.

                  But, most of that "security" is done by having no services run by default.

                  As opposed to Linux or Windows, that have what, FTP and Telnet running by default? OpenBSD locked down their system before Windows or Linux, but now every OS turns off services by default.

        • Source of this "a bug is a bug is a bug" discussion: (GKH) https://youtu.be/HeeoTE9jLjM?t... [youtu.be], and basically an explanation of why frozen is insecure.

    • by znrt ( 2424692 )

      The whole back-porting of only "pristine" patches is bound to break down. Because as time goes by, the newer patches become more dependent on the new features continually being added to the kernel. And the work to back port becomes more difficult and error prone, introducing its own bugs.

      a crucial missing piece of information in the paper is how many maintainers are actually working on each version.

      i assume that backporting simply has too few developers allocated so isn't really done to completion, because the other points in the "paper" don't really make much sense: if new bugs are discovered in the frozen version at an increasing rate, then even more bugs will almost certainly be on the current version. ok, there you won't have as much dependency problems but it's also a moving target wit

      • You're missing something.

        New bugs are discovered upstream, but the vendor kernel maintainers either aren't tracking, or are being discouraged from putting these back into the "frozen" kernel.

        We even discovered one case where a RHEL maintainer fixed a bug upstream, but then neglected to apply it to the vulnerable vendor kernel. So it isn't like they didn't know about the bug. Maybe they just didn't check whether the vendor kernel was vulnerable.

        I'm guessing management policy discouraged such things. It's easier to j

    • That whole statement ignores the obvious: Why the fuck is the kernel having so much code-churn that backporting bug fixes cannot be reliably done for less than the lifespan of the devices it's expected to run on? Or more accurately, Why the fuck are you using Linux in an environment that requires adherence to better development practices?

      This is something that the kernel devs need to start considering. The various government types around the world are, and if Linux is found to be insufficient guess what w
  • "ZDNET" (Score:5, Informative)

    by MSG ( 12810 ) on Saturday May 18, 2024 @09:45PM (#64482087)

    ZDNet calls...

    It's less ZDNET, and more Steven Vaughan-Nichols. And since CIQ pays him to provide PR [twitter.com], it's really less ZDNET and more CIQ making that statement.

    ... so all of the sources for this story are CIQ employees. And what they're really pushing is not that Rocky Linux users should build and run the freely available LTS kernels available from the kernel developers, they're pushing CIQ's ELP program, which offers a build of those kernels -- that they don't maintain -- strictly to paying customers.

    • Gordon, Gordon, don't you ever get tired of your obsession?

      "Towards thee I roll, thou all-destroying but unconquering whale; to the last I grapple with thee; from hellâ(TM)s heart I stab at thee; for hateâ(TM)s sake I spit my last breath at thee."

      • That doesn't really negate his point, though, which appears to be valid.

        It was never a good idea to assume any tech site was impartial... but that's especially true nowadays, since at this point they're basically just communal blogging platforms with minimal editorial oversight.

    • by znrt ( 2424692 )

      well, this would explain the reason for the whole argument "dealing with fewer bugs is more work than dealing with a lot more bugs because reasons" that didn't really make any sense. thanks.

  • it's not just for Windows anymore.
  • .. uses kernel version 4.4.302 in all current NAS products.
  • by ctilsie242 ( 4841247 ) on Saturday May 18, 2024 @11:00PM (#64482175)

    I have worked in organizations where an absolute distribution is a must, because they have paperwork and audits that require this. This has been the case for decades with AIX, Solaris, and HP-UX, where one can ensure that applications and databases are running on a supported OS version. This is why they use Red Hat, because they know that RHEL 9.1 will be exactly the same and have the same bugs. A rolling release, on the other hand, may pick up bugs that were not there before, introduced by the updates themselves.

    This is especially important in air-gapped environments where patches tend to be released in managed sets, and where a rolling release will have patches delayed by a long time.

    The same issue also applies to snaps, which are a microcosm of the rolling release model. In some environments, allowing for code to be always updated is forbidden, because of concern that the snap pushed hasn't been vetted and perhaps contains malicious code. Yes, this may be red tape, but it is a valid concern, and one reason why a lot of enterprises keep using Red Hat where flatpaks are useful, but still optional.

    If security issues are an issue, perhaps we need to move to a different model. Immutable distributions come to mind as a way to protect root and the core userland like systemd from tampering. Solaris ships with root being a role, and one can use rolemod to allow it to be a user. Some distributions are even having the core be a signed btrfs snapshot, and updated on a block level, similar to how Android is updated on a number of devices. Maybe we need to go with more containerization so the Linux OS files are completely out of reach of userland stuff, similar to Qubes OS.

    However, in many environments, there is no way to get around frozen distributions, so it might be wise to focus on securing the release against zero days with defense in depth rather than forcing rolling releases and constant patching.

    • Very astute comment. The white paper shows that the frozen "vendor" kernel model really doesn't work. And if people can't / won't upgrade, then maybe alternative security precautions around a known insecure kernel are the best we can do.

      • In an ideal world, as soon as a patch is created, tested, and queued up for distribution, while assuming the patch maker is not compromised... the patch should be obtained and the bug it fixed solved for good. No worries about unpatched systems, especially if it's a remote root hole or a major zero day. However, for a lot of companies and even people, Internet access isn't available, and there are rules and regs in a lot of companies and organizations about patch levels, where if one machine has a different re

    • by gweihir ( 88907 )

      I have worked in organizations where an absolute distribution is a must, because they have paperwork and audits that require this.

      That is generally not true. It is one approach to do the paperwork and pass the audits, but not the only one. As an IT and IT security auditor, I have run into this approach when the IT organization did not have careful risk management and just did not want to touch anything because of limited capabilities and insight. As soon as you update _anything_ things can break. Kernels are not special in that regard. That does not mean you should never do it. Only that you need to have a good reason, careful risk manage

      • by micheas ( 231635 )

        Exactly. I would argue that it is a general incompetence at the senior levels of the company that is causing this issue.

        Audit trail paperwork can be completely automated. The problem is they built their processes around a "golden release" instead of understanding that the software was going to be forever in a state of flux and creating security and compliance processes around the assumption that there will be thousands of releases in a year and they all have to be fully documented, trusted, and known, at th

  • Everything looks like a nail. If you ask security experts they will always say: security first, apply all patches. But businesses need to do more than be secure; they need to get work done, and applying all patches prevents that. No point being secure if you are not getting any work done; at the same time, being insecure means innocent third parties can get hurt because you got hacked. So what's the solution? Move to prevention rather than patching. Don't add anything to the kernel till it's proven (we can dream
    • by gweihir ( 88907 )

      Everything looks like a nail. If you ask security experts they will always say: security first, apply all patches.

      Not quite. If you ask _bad_ security experts, that will be their answer. Good ones will ask what your organizational IT risk analysis says first and then try to make an informed decision that can be "patch now", "patch later", "more information needed", "do not patch" and other options. Somebody that just has that proverbial hammer in their tool-chest is not an actual security expert. They are a fake "expert".

      My personal first cut analysis for security experts is:
      1. Do they know security stuff and are they

      • by Bongo ( 13261 )

        And I think risk management is actually a very deep subject, fantastically easy to do superficially and in a worthless way. Unfortunately, you're right into the realm of psychology for one and biases and incentives. So many people end up thinking about the risk to my career or the risk to my being able to report on the outcome of this project rather than the risk to the organization or the risk to customer data. We stick the word management in there, but do people even have enough awareness to understand wh

        • by gweihir ( 88907 )

          And I think risk management is actually a very deep subject, fantastically easy to do superficially and in a worthless way.

          True. It requires a real, critical and unbiased understanding of whatever you are doing risk management for. Most people are not capable of that at all.

          Unfortunately, you're right into the realm of psychology for one and biases and incentives.

          Above statement written before I read this statement. Always glad to run into somebody insightful!

          People need to slow the heck down and start having honest conversations. If you can start there, then you can start to talk about risk and then maybe start to talk about how to manage it. It's a real can of worms and no one really wants to touch it. Yes, we all risk manage blah blah. No, no you don't. Too often risk management is just people with a bit of authority deciding what they feel they can get away with.

          Interestingly, as an IT auditor, I would say I have about 1/3 people that really want to know. 1/3 that sort-of know they have to do it but do not really want to and 1/3 that are just trying to bullshit their way through it. This is in a regulated area (financia

  • by WilCompute ( 1155437 ) on Sunday May 19, 2024 @01:40AM (#64482337) Homepage

    I missed the point where they explain that the newer kernels aren't around enough for the bugs to be encountered, so while it looks like the bugs are with those older kernels, the newer kernels have them and more that are just not found, because they aren't in as much use.

    • by jd ( 1658 )

      But if M distros are hiring an average of N engineers to do the backporting, they could collaborate, pooling M*N engineers to proactively hunt down and fix regressions and new defects.

      Yes, you can't check every possible path, but that kind of surge in bugfixes would massively reduce the risks of newer stable kernels.

      The risk game is quite a simple one. The above strategy will reduce the risk of economic damage. It also greatly increases the number of people who are available to fix newly-reported issues by

    • What's more secure? A new kernel with unknown bugs, or an old kernel where every potential adversary knows about the bugs?

      The idea that new *may* be buggier simply because we haven't had time to look at its security is a fallacy. You're comparing an assumption with actual hard data.

      • So are you. You can't have hard data for either position because the new kernel doesn't have that data yet. So, you have to choose other criteria to make an educated guess. My position does this with probability tilted toward safety.

  • This suboptimal approach is basically what every Android phone on the market uses, right? They ship a 'vendor' kernel with all the drivers built only for a specific version, and the phone stays on that same kernel version until EoL.
    • by jd ( 1658 )

      Yes, which is why Android suffers from all kinds of security and stability problems. The very factors that are currently causing Microsoft's market share to decline sharply.

      Modelling your business after a failed strategy of the competition doesn't sound like a terribly good way to proceed. We need alternatives.

  • The frozen version of stable distributions was *never* about enhanced security; it was always about not breaking things. New versions have a nasty habit of introducing new bugs and changed behaviour that will break things. This requires extensive regression testing to ensure it doesn't happen, hence the frozen version distribution with backported security fixes as the compromise.
    Sure, the latest and greatest might be more secure, but it will also cause breakages that could cost millions of dollars. My production

    • by gweihir ( 88907 )

      New versions have a nasty habit of introducing new bugs and changed behaviour that will break things.

      One of the things where Linux is far superior. Unless you have custom drivers or the like (and they are not well-engineered), Linux kernel updates are exceptionally unlikely to break your system as long as you compile based on the old configuration. I can understand that somebody used to the bumbling and half-assed updates Microsoft forces on people will be paranoid here. No need to be paranoid for Linux. In most cases it just works and where it does not, it usually breaks very fast and obviously so. There g

    • by jd ( 1658 )

      You are correct, which means a calculation is in order. We have economic models that price risk. Those have been around for a while. We can measure the defect density. And we know the curve that defines diminishing returns on investment, which in this case would be bugfixes and regression fixes.

      We can quantify the economic damage by hackers (which is substantial), and we know from the defect density of older stable kernels and the reported downtimes the economic damage of undetected pre-existing conditions.
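
      As a purely illustrative sketch of that kind of calculation (every number below is a made-up placeholder, not data from the whitepaper):

      ```python
      """Toy expected-cost comparison between a 'frozen' kernel with backports and
      tracking the upstream stable kernel.  All probabilities and dollar figures
      are invented placeholders; only the shape of the calculation matters."""

      def expected_annual_cost(p_incident, incident_cost,
                               p_regression, regression_cost,
                               engineering_cost):
          """Expected yearly cost = security-incident risk + update/backport
          breakage risk + the engineering effort the strategy itself costs."""
          return (p_incident * incident_cost
                  + p_regression * regression_cost
                  + engineering_cost)

      # Hypothetical inputs for the two strategies.
      frozen = expected_annual_cost(p_incident=0.10, incident_cost=2_000_000,
                                    p_regression=0.02, regression_cost=250_000,
                                    engineering_cost=300_000)
      stable = expected_annual_cost(p_incident=0.03, incident_cost=2_000_000,
                                    p_regression=0.08, regression_cost=250_000,
                                    engineering_cost=200_000)

      print(f"frozen vendor kernel: ~${frozen:,.0f}/year expected cost")
      print(f"tracking stable:      ~${stable:,.0f}/year expected cost")
      ```

      Whichever strategy wins depends entirely on the inputs, which is the point: the comparison should be settled by measured defect densities and downtime costs rather than by habit.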

      • The problem is lazy distros. Take RHEL 7 as an example. Right now I can stop paying RedHat, instead pay TuxCare, and get a whole slew of security fixes. The problem is not the model; the problem is the implementation, and RedHat, for example, needs to do better. I have just put a RHEL7 machine under CentOS7 ELS from TuxCare because functionality breaks under RHEL8. Actually, lots of testing later, it turns out the SMB driver in the kernel breaks on later versions with multiple levels of DFS redirects.

    • by micheas ( 231635 )
      Having spent 18 months getting a rolling HIPAA and SOC-2 compliant release pipeline set up, I'll say that if security is an actual priority, you need to figure out how to do rolling releases. I spent a couple weeks getting the pipeline working and over a year getting the QA in shape so we could do rolling releases.

      The biggest problem in security is the grossly inadequate QA tools and tests. Finding new issues and fixing them is meaningless if you can't deploy the fix.

  • by gweihir ( 88907 )

    The question implied is stupid. You generally do not want the most secure choice. You want one that fits your risk profile. Security is just one factor. It is an important one, but so are reliability and cost (and others). A system that is perfectly secure but frequently breaks your software or hardware is no good and neither is one that is too expensive for you to maintain. A professional approach evaluates all risks and then strikes a balance. Depending on that risk analysis, it also determines how freque

    • by jd ( 1658 )

      You are correct, but are overlooking a possible solution.

      You have an average of N software engineers hired by M distributions to backport features. This means that the cost of those N*M software engineers is already built in.

      If you hire the same N*M software engineers as a consortium to fix the flaws and regressions in more recent stable kernels, then the software won't break, there won't be the new kernel defects, AND you don't get the security holes.

      Cooperation upstream would mean less kernel differentiat

      • by gweihir ( 88907 )

        Not really. The issue is that _any_ patch can break functionality, and that includes any security patch. You can never be sure to have tested everything. This is not a problem that can be fixed on the technological side. This is a risk management problem. And that risk management needs to be done by people that know the details of the application scenario.

        I do agree that giving more resources to the kernel-team would be a good thing. They already do what you describe with the "longterm" kernels. You may rem

        • by jd ( 1658 )

          Irrelevant, because there will be an unknown number of pre-existing bugs that cause downtime and the back porting of fixes can also introduce new regressions.

          What is relevant is risk. You measure and quantify the risk for each approach. (We have estimates for defect densities and it should be straightforward to get estimates for percentage downtime. We also have estimates of the economic damage from industrial espionage and industrial sabotage.)

          The better approach is the one with the lowest average costs, o

          • by Bongo ( 13261 )

            An issue with risk analysis and championing a good risk analysis is that often it ignores the risk of the risk analysis. In other words, you have to be really clear about expressing the unknowns rather than relying too much on a supposedly professional risk analysis. There are simply far too many things we don't know about systems and so on. One of the biggies is that there are many risk factors other than bugs which will affect your real-world risk. Are you an interesting target, for one, and if so, why? T

            • by gweihir ( 88907 )

              Risk analysis needs to be done competently. It can be done competently. This is not a new field or one with many uncertainties. It is a well-established field and it is known how to do it. Since its results are often inconvenient and may even show some bigshot CEO is doing crappy things, it is a not widely respected discipline though.

              But there really is no "risk of risk analysis". There is only a risk of people faking it. And for that, there is regulation, audit and personal liability. It just needs to be es

              • by Bongo ( 13261 )

                Risk analysis needs to be done competently. It can be done competently. This is not a new field or one with many uncertainties. It is a well-established field and it is known how to do it. Since its results are often inconvenient and may even show some bigshot CEO is doing crappy things, it is a not widely respected discipline though.

                But there really is no "risk of risk analysis". There is only a risk of people faking it. And for that, there is regulation, audit and personal liability. It just needs to be established.

                People faking it is very much a huge issue. I think what I'm getting at is that people exploit the uncertainties as a way to fake it. And a way to avoid that is to express uncertainties as clearly as possible up front. Then that already traps those who would fake it, because their move is already exposed. For example, you know the uncertainty makes it 3 to 8, so they exploit that and claim it is a 3. But by having already stated up front that it is 3 to 8, when they pick 3, they have also accepted that it

    • You may remember that they do retire longterm kernels earlier than they would like to due to limited resources.

      These days they actually transfer things over to a different team run by the Civil Infrastructure Program [linuxfoundation.org] and they supply back-ported fixes for the "Super" LTS kernels. So, sure, K-H only has so much support for LTS (and he would like to support them longer) - but there are also SLTS kernels if you want something to last for 10 years.

  • I contracted to a large Enterprise company a few years back.

    Somehow it got into the exec tree of management that choosing a curated code base distro was the safest way to go. The prevailing thought was the upstream was too vulnerable to supply chain tampering.

    It fell on deaf ears that the choice of curated code could already be vulnerable by design or by simple bugs.

    It became increasingly common that packages needed were now slipping behind. They couldn't be patched. The exec team was even asking for us to

    • by Bongo ( 13261 )

      That sounds truly awful, and a great example of where the cure is worse than the disease.

  • Instead of hiring lots of developers for each distribution to backport essentially the same set of features to each frozen kernel, get together and collectively hire vastly more high-end dual-role engineers to proactively find and fix the bugs in newer stable kernels, so that there are far fewer new bugs.

    This makes the newer kernels safe for enterprise use, whilst eliminating the security risks.

    It costs the same amount, but avoids the reputation-scarring effects of security holes and thus also avoids the ec

    • by laffer1 ( 701823 )

      The real problem with this approach is that it doesn't give RedHat an exclusive. They love to hide patches behind subscriptions. The shakeups with CentOS and Fedora over the years are all about limiting public access to backports as much as possible.

      How do you solve the IBM/RedHat problem? They want to make maximum profit and vendor lock-in is the way they like to do it. (much like Microsoft)

      • by jd ( 1658 )

        It would indeed mean that IBM/Red Hat couldn't restrict the backports, that is true, but it would mean they could focus on any value add (which they could hide). So features they'd exclusively developed, and thus not in the main tree, would stay exclusively theirs and they'd be able to focus more attention on those.

        This would be, as you've noted, a massive divergence from IBM's Linux strategy of late. (Back in the day, when they contributed JFS and the POWER architecture, along with a bunch of HPC profilin

  • We have to remember where this all comes from. The original reason why Red Hat chose a specific kernel release back in the old days was simple: some releases were good, other ones were not that great and prone to crashes and instability. That was 25 years ago... and for 20 years we have had very stable Linux releases, and the backwards compatibility requirement for userspace in Linux development is holy.
    Still, the Red Hat leadership has been stuck with their heads in the past.
    As this paper rightfully
  • >"In RHEL 8.8 we have a total of 4594 known bugs with fixes that exist upstream, but for which known fixes have not been back-ported to RHEL 8.8. "

    A simple metric of "number of known bugs" is not a good metric of "security." I would bet the overwhelming super-majority of those bugs would not be security related, and some that are might be very low danger or not even relevant on most platforms.

    >"Rolling-release Linux distros such as Arch, Gentoo, and OpenSUSE Tumbleweed constantly release the latest u
