Retbleed Fix Slugs Linux VM Performance By Up To 70 Percent (theregister.com) 33

Posted by BeauHD on Monday September 12, 2022 @07:20PM from the nasty-numbers dept.

VMware engineers have tested the Linux kernel's fix for the Retbleed speculative execution bug, and report it can impact compute performance by a whopping 70 percent. The Register reports: In a post to the Linux Kernel Mailing List titled "Performance Regression in Linux Kernel 5.19", VMware performance engineering staffer Manikandan Jagatheesan reports the virtualization giant's internal testing found that running Linux VMs on the ESXi hypervisor using version 5.19 of the Linux kernel saw compute performance dip by up to 70 percent when using single vCPU, networking fall by 30 percent and storage performance dip by up to 13 percent. Jagatheesan said VMware's testers turned off the Retbleed remediation in version 5.19 of the kernel and ESXi performance returned to levels experienced under version 5.18.

Because speculative execution exists to speed processing, it is no surprise that disabling it impacts performance. A 70 percent decrease in computing performance will, however, have a major impact on application performance that could lead to unacceptable delays for some business processes. VMware's tests were run on Intel Skylake CPUs -- silicon released between 2015 and 2017 that will still be present in many server fleets. Subsequent CPUs addressed the underlying issues that allowed Retbleed and other Spectre-like attacks.
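
To put the summary's percentages in perspective, here is a back-of-the-envelope sketch. It assumes that a "performance dip" means a drop in throughput, which the summary does not spell out, and the helper name `slowdown_factor` is made up for illustration.

```python
def slowdown_factor(throughput_drop_percent):
    """Convert a throughput drop (in percent) into a run-time multiplier.

    Assumption: "performance dips by X percent" means the system completes
    X percent less work per unit of time, so a fixed job takes
    1 / (1 - X/100) times as long.
    """
    remaining = 1.0 - throughput_drop_percent / 100.0
    if remaining <= 0:
        raise ValueError("a drop of 100 percent or more means no work completes")
    return 1.0 / remaining


# The three figures quoted in the summary: compute, networking, storage.
for label, drop in (("compute", 70), ("networking", 30), ("storage", 13)):
    print(f"{label}: a fixed job takes about {slowdown_factor(drop):.2f}x as long")
```

Under that reading, a 70 percent drop leaves 30 percent of the original throughput, so a job that used to take 10 seconds would take roughly 33 seconds.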

This discussion has been archived. No new comments can be posted.

- Re: (Score:1)
  
  by scalptalc ( 6477834 ) writes:
  
  That's my question. Is this a Linux kernel 5.19 issue, or an ESXi issue, or not an issue at all if you're not using that particular combination of kernel/VM? Just sit tight and let the wonks figure it out. Maybe they should stop pointing fingers so much; it's software on both sides here. Or they can both sue Intel, how 'bout that? I don't use either of those products and I'm just here in the bleachers but didn't like reading the word Skylake.
  - Re:Is this an anti-slashvertisement (Score:5, Interesting)
    
    by Anonymous Coward writes: on Monday September 12, 2022 @09:49PM (#62876189)
    
    It's not an ESXi/VMware specific issue.
    This problem has been known and discussed for many months now and affects all sorts of subsystems. They are working on mitigations that hopefully have less of a performance impact in the future.
    VMware is late to the game, ignorant, or just shitposting at this point. They were told to stop on the mailing list. Nothing they posted is new information.
    
    - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      Probably late, like they are to support new kernel versions in player/workstation.
      Every time I think about paying for it, I remember that I'm still having to run someone else's host modules because they can't be bothered to keep up.
  - Re:Is this an anti-slashvertisement (Score:5, Informative)
    
    by F.Ultra ( 1673484 ) writes: on Tuesday September 13, 2022 @01:00AM (#62876459)
    
    Reading the LKML posts, it turns out to be both, but it's more complicated, as always. It's the use of IBRS that causes the slowdown (and IBRS is used on Windows as well, so this is not a Linux-only issue; it's a Retbleed issue). Where it gets complicated, however, is that it looks like ESXi exposes the host CPU to the guest as one needing IBRS when it doesn't; in other words, it emulates a Retbleed-vulnerable CPU regardless of what you actually use on the host.
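
For anyone who wants to see what a given guest kernel believes about its CPU, here is a minimal Python sketch. It assumes a Linux guest whose kernel already contains the Retbleed fix, so that it publishes a `retbleed` entry under `/sys/devices/system/cpu/vulnerabilities` and lists known CPU bugs on the `bugs` line of `/proc/cpuinfo`. The helper names `retbleed_status` and `cpu_bugs` are made up for illustration.

```python
from pathlib import Path

# Directory where Linux reports the state of CPU vulnerability mitigations.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def retbleed_status(vuln_dir=VULN_DIR):
    """Return the kernel's Retbleed status line, or None if it is not reported.

    None is returned when the file cannot be read, for example on a kernel
    that predates the fix or on a non-Linux system.
    """
    try:
        return (Path(vuln_dir) / "retbleed").read_text().strip()
    except OSError:
        return None


def cpu_bugs(cpuinfo_text):
    """Return the entries of the first 'bugs' line in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "bugs":
            return set(value.split())
    return set()
```

Inside the guest, `retbleed_status()` should return a short string such as a "Mitigation: ..." or "Not affected" line, and `"retbleed" in cpu_bugs(Path("/proc/cpuinfo").read_text())` shows whether the kernel considers the (virtual) CPU affected. Comparing the guest's answer with the host's is one way to spot the mismatch described above.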
    
- Re: (Score:2)
  
  by thegarbz ( 1787294 ) writes:
  
  What your post means is that you should never make any decision by reading a headline and a summary, or you end up looking stupid.
Well, Intel had to get its speed somewhere.... (Score:2, Insightful)

by gweihir ( 88907 ) writes:

Turns out cheating and unsound engineering can come back to bite you.
That said, does anybody know numbers for AMD and/or Virtualbox?
- Re: Well, Intel had to get its speed somewhere.... (Score:1)
  
  by dowhileor ( 7796472 ) writes:
  
  That being said..Why destroy predictive branching if the issue is illegal reads/writes? Heap operations are not getting secure-r it seems and the root causes are not going away...
  - Re: Well, Intel had to get its speed somewhere.... (Score:5, Informative)
    
    by codebase7 ( 9682010 ) writes: on Tuesday September 13, 2022 @12:34AM (#62876445)
    
    Why destroy predictive branching if the issue is illegal reads/writes?
    Because predictive branching means executing instructions that the CPU guesses it will be asked to execute *before* it's actually told to do so by the code. Those instructions will necessarily read and write data, as that's the entire point: to perform the reads/writes at the most efficient time for the CPU, given its current workload and hardware. If the CPU guesses correctly, then there's no problem and you get a nice speed boost out of it. However, if the CPU's guess was wrong, it must roll back the changes those wrong instructions made. Those rollbacks are what these exploits take advantage of, as it takes time and power for the CPU to roll back, and the exact instructions that must be rolled back are influenced by the attacker.
    Heap operations are not getting secure-r
    The heap has nothing to do with predictive branching beyond being a place for the CPU to read and write data. Securing the heap isn't going to impact predictive branching because the executing code by definition has access to its own heap.
    root causes are not going away
    The root cause is the predictive branching hardware not masking the power/time that the rollbacks require to complete. The only way to actually fix that is either a CPU replacement or a microcode update. Sadly, only the CPU manufacturer can produce a microcode update (yet another place for the manufacturer to hide code in), which means some CPUs will never see an update even if the fix is known and could be applied by others. The only other option is a workaround that disables the predictive branching hardware outright, which is the route that the fix in TFA took. Of course, disabling the hardware outright means no speed optimizations are made by the CPU, hence the massive slowdown. I'd love it if VMware and others would use this opportunity to advocate for getting rid of manufacturer-only hidden code areas in hardware, so that vulnerabilities like this could be fixed by others without such drastic slowdowns, but I'm not holding my breath.
    
    - Re: (Score:1)
      
      by michelcolman ( 1208008 ) writes:
      
      Why don't people just write algorithms that are immune to snooping (extra instructions that do nothing but hit the cache, things like that) and leave the rest of the system alone? Nobody is going to snoop on your video editing. Afaik, the attacks are limited to very small amounts of information like passwords. Just make sure that stuff is written in such a way that its operation is sufficiently obfuscated. Can't be that hard, can it?
      And perhaps new processors should get an instruction that disables caches a
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You should look into what is actually going on. This is primarily about kernel entry and process switching.
    - Re: Well, Intel had to get its speed somewhere... (Score:1)
      
      by dowhileor ( 7796472 ) writes:
      
      My point was... BTW, I agree with your summary on many levels. But looking at predictive branching, speculative branching, execution... the majority of shortfalls in those concepts may have been with the way that user space has a way of creating environments that the register was not meant to handle.
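
As a rough illustration of the "CPU guesses" idea discussed in the comments above, here is a toy model in Python. It is the textbook two-bit saturating counter, not the predictor of any real Intel or AMD part, and it says nothing about how Retbleed itself is exploited. The function name `count_mispredictions` is made up for this sketch.

```python
def count_mispredictions(outcomes, state=2):
    """Count wrong guesses made by a two-bit saturating counter predictor.

    `outcomes` is a sequence of booleans, True meaning the branch was taken.
    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Each wrong guess stands for work a real CPU would have to throw away.
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Nudge the counter toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# A loop branch that is taken nine times and then falls through once.
loop_branch = [True] * 9 + [False]
# A branch that alternates between taken and not taken on every visit.
alternating = [True, False] * 5

print(count_mispredictions(loop_branch))
print(count_mispredictions(alternating))
```

The loop-style pattern is guessed wrong only at the exit, while the alternating pattern defeats this simple predictor on every other visit. That gap is why prediction usually pays off, and why restricting it costs so much.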
- Re:Well, Intel had to get its speed somewhere.... (Score:5, Insightful)
  
  by ToasterMonkey ( 467067 ) writes: on Monday September 12, 2022 @10:09PM (#62876233) Homepage
  
  Turns out cheating and unsound engineering can come back to bite you.
  That said, does anybody know numbers for AMD and/or Virtualbox?
  ... cheating? It's been what, four years since Meltdown/Spectre were disclosed? Not even a second thought as to why new vectors are being found every other week that affect just, all the processors? It's kind of like calling every C/C++ buffer overflow bad engineering and a Microsoft problem. It's pointless partisanship and wishful thinking at best.
  You might want to actually go look up the effects of Retbleed on Intel and AMD processors and not take my word for it. I think one of the mitigations is to disable hardware threads, so good luck making that fast.
  
  - Re: (Score:3)
    
    by dynamo ( 6127 ) writes:
    
    Not all the processors. Apple's ARM chips don't seem to be affected.
    - Re: (Score:2)
      
      by Tough Love ( 215404 ) writes:
      
      Apple will get their turn.
    - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      Don't worry, Apple has their own unacceptable security flaw. [techcrunch.com]
      What I want to know is what the mitigation impact is like across the affected AMD processors (from FX series through Zen 2.) I am still using AMD FX...
      - Re: (Score:2)
        
        by jbmartin6 ( 1232050 ) writes:
        
        Here 'unacceptable' means accepted by almost everyone.
        
        - Re: (Score:2)
          
          by drinkypoo ( 153816 ) writes:
          
          Look around you. Almost everyone accepts the unacceptable every day. That's how we got here.
  - Re: (Score:1)
    
    by michelcolman ( 1208008 ) writes:
    
    Disable hardware threads during the short amount of time when sensitive information is being processed. Let the software indicate when it enters a phase where it could be vulnerable to snooping. Usually this is an extremely short amount of time. The rest of the time, the system can run at full speed.
    - Re: (Score:2)
      
      by jbmartin6 ( 1232050 ) writes:
      
      This relies on the developer remembering to do it, and accurately understanding what information is sensitive. Both of those make it a non-starter.
  - Re: (Score:1)
    
    by dsanfte ( 443781 ) writes:
    
    C/C++ buffer overflows *are* bad engineering: bad engineering of the language. They aren't fit for purpose. Think of how many security issues have been caused by buffer overflows in the past 30 years, let alone crashes. Memory unsafe languages need to get in the bin.
    - Re: (Score:2)
      
      by OrangeTide ( 124937 ) writes:
      
      C11 Annex K has bounds checking. You can turn it on in GCC and Clang. Because of the low-level nature of C pointers you can escape the checking with a little bit of effort. If a large project standardizes on prohibiting certain unsafe language constructs (e.g. MISRA C), you can not only have bounds checking working, you can run some simple analysis tools on every commit to check that you are conforming to the requirements. (I think you have to pay money for proper MISRA C scanning, but there are others to choos
  - Re: (Score:2)
    
    by jbmartin6 ( 1232050 ) writes:
    
    I don't think there is much of a bite here. AFAIK, there are still no real-world attacks being run using this family of vulnerabilities even after all this time. Kudos to the chip makers for implementing improvements in the hardware moving forward, and kudos to the browser makers for putting in mitigations as well. Meanwhile, everyone can disable the mitigations and proceed as normal.
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  Just as bad. Zen2 and older, though.
  Intel, 9th gen Core and older.
  All known Arm cores.
  
  Turns out (almost) everyone cheated and engaged in unsound engineering.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Turns out (almost) everyone cheated and engaged in unsound engineering.
    In the end, yes. Because they somehow had to catch up with the numbers Intel put out there. But Intel was by far the most reckless and the first to do this crap. And this whole disaster was preventable, because years in advance the possibility of this attack got discussed at the microprocessor forum conference. And then Intel did it anyways. And then everybody had to follow or look bad. AMD at least managed to make this a lot harder and in some cases practically infeasible. Intel simply did not care.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      You really do take every post as an opportunity to just bash on Intel. It's bizarre.
      
      Everyone did this because it was the logical thing to do to increase performance.
      Nobody was designing processors trying to anticipate every single possible sidechannel that may come.
      
      When the first branch predictor attacks showed up, and Intel rolled out IBRS, AMD said "no need. our stuff isn't affected."
      Intel and security researchers said, "ehhh, almost certainly almost everyone is affected."
      
      Here we are, years later,
- Re: (Score:2)
  
  by thegarbz ( 1787294 ) writes:
  
  Turns out cheating and unsound engineering can come back to bite you.
  Precisely no one here is cheating. Speculative execution is a speed-related enhancement for all processors. Why would you think it's cheating?
  Oh ... gweihir. I should have known from the completely ignorant content of your post.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    It is cheating because Intel knew of this attack possibility before and did nothing to make it hard. AMD did and sacrificed some performance because they are not total scum. Turned out to not be enough, but at least they tried. But Intel did nothing and they knew. This risk got discussed at the microprocessor forum conference years before Intel made it possible and yes, Intel was there in the very session it got discussed.
    I am aware this level of insight is wayyy outside of what you can do.
s/Slugs/Slows/g (Score:3)

by Fly Swatter ( 30498 ) writes: on Monday September 12, 2022 @09:58PM (#62876225) Homepage

Use that and the Title is actually readable without three double takes.

'editors'

- Re: (Score:3)
  
  by dhammabum ( 190105 ) writes:
  
  No, it's slime contamination during manufacturing.
  - Re: (Score:2)
    
    by Chris Mattern ( 191822 ) writes:
    
    Actually, it was fake coins. Or maybe pickup carpool riders, I'm not sure which.
