Intel Sees a 3888.9% Performance Improvement in the Linux Kernel - From One Line of Code (phoronix.com)
An anonymous reader shared this report from Phoronix:
Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week...
Intel thankfully has the resources to maintain this automated service for per-kernel commit/patch testing and has been maintaining their public kernel test robot for years now to help catch performance changes both positive and negative to the Linux kernel code. The commit in question causing this massive uplift to performance is mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes. The patch message confirms it will fix some prior performance regressions and deliver some major uplift in specialized cases...
That mmap patch merged last week affects just one line of code.
This week the Register also reported that Linus Torvalds revised a previously-submitted security tweak that addressed Spectre and Meltdown security holes, writing in his commit message that "The kernel test robot reports a 2.6 percent improvement in the per_thread_ops benchmark."
Re:riiiiiight (Score:5, Interesting)
"Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week..."
Or, in normal language everyone can immediately understand, a 38-odd times improvement. Why the insane urge to express everything in percentages? It's almost as bad as using microfortnights instead of seconds.
Re: (Score:3)
Why the insane urge to express everything in percentages?
To make the number bigger, obviously. People don't understand basic math, so the framing matters. For example:
"Sure, they say it's 3800% faster, but it's really just a 38x improvement."
People also don't understand how taxes work. The combination can be expensive: "I can give you a 2% raise, but it'll put you just over the line into the next tax bracket. It's up to you."
Re: (Score:3)
People also don't understand how taxes work. The combination can be expensive: "I can give you a 2% raise, but it'll put you just over the line into the next tax bracket. It's up to you."
I'm always amazed by this, because all semi-sane tax systems have marginal tax brackets, i.e. the higher tax rate only applies to the part of the income that falls within the new bracket.
Re:riiiiiight (Score:4, Insightful)
More than a few conservative anti-tax politicians and news organizations like to misrepresent how a graduated income tax works in order to keep the rubes fearful. If all you do is look up your tax owed in a table or have someone else prepare it for you, you may not understand how the formula works.
Re: (Score:2)
If your income goes up in one area but nothing else changes, you will still be making more money after taxes. There are some contrived exceptions, but they won't apply to your general wage/salary worker. Did you go from the 32% to the 35% tax bracket? If yes, then congratulations on your raise; you are keeping more after-tax money. I think where some people get confused is because most taxpayers aren't calculating all these graduated rates by hand, either you look up the end result in a table (if income les
Re:riiiiiight (Score:4, Insightful)
It's also not a 38x increase in overall kernel performance; it only applies in a particular case. It fixes an older patch that caused a 6x slowdown in that particular case. Your average user won't notice any changes, your power users probably won't either, and mostly only the automated performance regression tests were seeing changes.
Re: (Score:2)
38-odd times improvement
Actually a 39.889 times improvement, so "40-odd" is much more accurate than "38-odd".
Re: (Score:2)
38-odd times improvement
Actually a 39.889 times improvement, so "40-odd" is much more accurate than "38-odd".
The number cited was 3888.9% , not 3988.9%.
Re: (Score:2)
Yes, but that was the percentage increase. Notice how a 100% increase is a 2 times improvement.
Re: (Score:2)
Or, in normal language everyone can immediately understand
This is technical news. Not everyone will immediately understand it anyway, and all of the people who are capable of understanding it are capable of understanding a percentage improvement. There is absolutely no value to making sure the layman can understand the performance increase when they cannot understand when it will apply.
Re: riiiiiight (Score:5, Interesting)
Or a really simple optimisation that was non-obvious. Aligning things in memory can have an *enormous* performance impact. Finding a place where things weren't aligned and making sure they are now is very much the kind of thing I'd expect to be a massive win. I used to work at one of the major OS vendors, and this absolutely is the kind of thing we'd on occasion find, completely legitimately.
Re: riiiiiight (Score:2)
Automatic struct optimization isn't a thing in C yet?
Re: (Score:3)
One reason C doesn't reorder struct fields is that it's forbidden by the standard.
In any case, to make it happen optimally you'd need to know the access patterns, but C compilation units are typically compiled separately, so you wouldn't know what the optimal order would be. And how do you automatically determine the best order for fields in the first place, when it's related to the access patterns of such fields? The problem becomes more intractable if the fields are part of the provided API.
Something like profi
Re: riiiiiight (Score:2)
Rust reorders the structs to optimize on alignment boundaries and minimize cache misses. You can override this for e.g. FFI using repr.
Re: (Score:3)
You'd have to turn this off if Rust were used in a systems-level program, or if the data was shared with non-Rust routines or programs, or there were ABI or API requirements, etc. I don't know enough Rust to know if they have "external" structs for this purpose, but I've seen some obscure languages use this.
Re: (Score:2)
You'd use #[repr(C)] or a crate like PacketStruct to deal with that.
Re: (Score:2)
Good to know. I do write Rust but never considered that. From the (safe) programmer's point of view that's invisible, so Rust is permitted to do it.
Alignment-based field reordering is a good feature, but how about dealing with false sharing? It would be very wasteful to align everything to the cache-line size, and you can't know from the struct definition itself how the fields should best be ordered.
Re: riiiiiight (Score:4, Informative)
When you're in kernel space, things often have to be aligned to match hardware requirements. You don't want the compiler re-organizing the fields in your page table or your GPU command list.
You also don't want it re-organizing things when you're writing code that gets called from other languages.
Most of the use cases for C nowadays are cases where you really need things to be exactly how you specified.
Re: riiiiiight (Score:4, Insightful)
No, and it's forbidden! The reason being that C is a low-level language and the layout of a struct is very often vital to the proper working of the code, because the data is shared with other routines possibly built with a different compiler, or the struct is shared with other machines, etc. Languages have an ABI (application binary interface) which states how data is laid out, what registers are used, etc., so that they can all interoperate. If a compiler could just change the rules, it would not be complying with the ABI.
OS (Score:1)
Re: (Score:2)
Or, more specifically to this optimisation, align things with respect to the data the benchmark uses. The summary doesn't go into the 6x slower performance in other scenarios.
It's a trade off between aligned memory and fragmented memory.
Misleading (Score:5, Informative)
This change has been shown to regress some workloads significantly.
One reports regressions in various spec benchmarks, with up to 600% slowdown.
Re:Misleading (Score:5, Insightful)
Also, no mention of AMD. Is anybody savvy enough to tell if this should apply to AMD as well?
Re:Misleading (Score:5, Interesting)
The misalignment caused the 600% slowdown in benchmarks, according to the commit message.
This fixes that by skipping this codepath on unaligned requests.
What's weird is that Linus reverted the problematic change two years ago for the same reason, but the committer put it back a few weeks later.
https://git.kernel.org/pub/scm... [kernel.org]
To me this looks like a partial solution with room for doing it right.
Re: (Score:2)
Re: Misleading (Score:2)
So asking: improvements in things like threading while things like pixel mapping take a hit? No big deal, I think.
"Performance improvement" (Score:3)
Way too much dramatization. No way it's a 38x overall performance improvement, that would defy all logic and something like that would hit mainstream news. So next time you want people to swallow BS, use a much smaller number like say 3.8%.
Re: (Score:1)
That's just your ignorance talking. The 38x improvement was in a very specific case (it's even implied as such in TFS), and yes, a buggy implementation of code can definitely screw up your results like this, and in some cases even worse.
Just because you're afraid of big numbers or don't understand what is going on doesn't mean it isn't actually real.
Sounds like they are trying (Score:2)
Re: (Score:3)
It's not an Intel specific change. This impacts all platforms.
Re: (Score:1, Flamebait)
Not really. And Intel is dead. They will just take a while to die.
Re:Sounds like they are trying (Score:4, Interesting)
Intel is not dead (cue the shovel to the head). They do need to get back to their knitting.
Remember Apple was pronounced dead once, and so was AMD.
On the other hand GE did die due to thinking they were a bank.
So, the question is should Intel concentrate on the super chips that have bragging potential but a market of 0.1% of users, or for the power efficient desktop/laptop market?
I ask because looking at the M4 I realized I can't really put my M1 Air to full load. My Linux box has about the same performance, twice the RAM, six times the storage, and five times the USB ports. Apple's desktops are not a good fit for me.
Re: (Score:2)
Intel is dead. All they ever had was superior manufacturing, which came from their _memory_ business. That is over. Their CPUs always sucked in one way or another, and they have not managed any innovation for a long time now. And recently they try to pull stunts like jeopardizing CPU reliability to boost performance. That has the stink of desperation.
Re: (Score:2)
"Intel is dead" - actually far from it.
There's still a huge amount of talent at Intel, and their single-core performance is still exceptionally good, if not on equal footing with AMD [techspot.com].
Besides, their business isn't limited to CPU production.
They just need better management, ideally someone who isn't guzzling millions for his own sake [reuters.com].
Re: (Score:2)
There's still a huge amount of talent at Intel, and their single-core performance is still exceptionally good, if not on equal footing to AMD.
If you're not #1, you're #2.
AMD now has the fastest processor in every segment and is outselling Intel in the datacenter.
Intel had better get their shit together fast or they really will die.
Re: (Score:2)
Indeed. And the thing is, AMD did this from a position of weakness. That shows how truly fucked Intel is. Intel may survive, but from available history, it is very unlikely they will ever be #1 again.
Add to that that ARM is becoming more and more of a thing. Intel is inactive in that space or late to the game. AMD can deliver things like mixed AMD64/ARM chiplets.
Read the fine print. (Score:4, Interesting)
The patch message confirms it will fix some prior performance regressions and deliver some major uplift in specialized cases...
If you read the THP documentation [kernel.org] you'll learn that "THP only works for anonymous memory mappings and tmpfs/shmem", which means that unless you're using anonymous mappings, tmpfs, or shared memory with a request in excess of PMD_SIZE bytes (2 MiB on my system), this has no impact.
It seems unlikely that many programs will see much difference in performance but it's always nice to see improvements added to the kernel.
Re: (Score:2)
tmpfs used to be default for some systems, I think it was mostly reverted because of the stuff that for some reason thinks it's smart to unpack a massive archive there, like Nvidia driver runfiles. (They also don't respect the TMPDIR variable so you have to add the flag --tmpdir=$TMPDIR to get them to act like everyone else's software. The command line option is literally named after the environment variable used in Unix[likes] for decades! Just support the fucking variable!)
*ahem*
I am using tmpfs because i
Re: (Score:2)
I think anything writing files of substance (more than 100 MiB) should check that the destination has enough storage space (and permissions) before any writing is attempted. It seems logical enough that POSIX should have a function dedicated to this practice. I'm a bit surprised that nobody has tried to make it standard practice.
Re: (Score:2)
Yes, on the one hand I am glad that nvidia provides a runfile driver so I don't have to mess with their repo; on the other hand I wish they were more competent about it; and on the gripping hand, why does their driver install conflict with Multiarch for other things so that I have to use the runfile?
Since I am all-Linux now, my next GPU will probably be from AMD, and it's largely for reasons like this (but also price-motivated, ofc.)
Back on topic, it is pretty surprising that you don't specify how big a file
That's just f'ing great... (Score:5, Funny)
Now all my crappy code will crash much faster. ;)
Re: (Score:2)
Hey, the faster it crashes, the faster a fix may come.*
*excluding Microsoft software which relies on crashes
Impacts all CPUs (Score:4, Interesting)
This isn't an Intel specific change or even a x86_64 specific change, this impacts every Linux platform. The only reason "Intel Sees..." is in there is because they are the ones doing regression testing for the kernel.
Let me guess (Score:2)
It really does not matter how many lines of code (Score:3)
That is maybe a curiosity, but irrelevant. Also, 4000%? Most of that will _not_ map to general performance.
All that is to see here is pretty stupid reporting.
Re: (Score:2)
All that is to see here is pretty stupid reporting.
I see only stupid people failing to notice that TFS doesn't say it's a general performance improvement. In fact, it literally uses the phrase "uplift in specialized cases".
Re: (Score:2)
Yeah, there's a whole lot of skeptics on Slashdot basing their skepticism on, "That's a really big number. Can't be right." I will remind them that there is a big difference between skepticism and doubt. Skepticism demands more than a surface evaluation.
Re: (Score:1)
I throw 35 years of CS experience, a CS engineering PhD and experience with CPUs on all levels and understanding of system design into the pot. That enough for you to make it "more than a surface evaluation"?
Re: (Score:1)
You cannot keep quiet when you have nothing to say, can you? I am pointing something out. I am not making a claim. Of course, the difference is lost on you.
great (Score:5, Funny)
Impact, please. (Score:1)
Is this something that now takes microseconds instead of milliseconds for something that isn't done often, or something that takes milliseconds instead of large fractions of a second for something that's maybe done once or twice per boot on your average system?
If it were something done often enough to be noticeable by Joe Average Linux User, I'd expect it to be bigger news.
Older equipment (Score:2)
The group I volunteer with got half a truckload of older systems. A speed up in the kernel would be welcome for these systems to see new life and boldly go where no cpu has gone before.
The only problem... (Score:2)
The specialized case seeing a 38x increase in performance is the HCF instruction.
But boy does it burn HOT!
int _low_level_write(int chan, char *buf, int len) (Score:2)
{
    return 0;
}
One weird trick (Score:2)
And Linux magically runs 38x faster!
But like all of those "one weird tricks," this one line of code isn't all it's cracked up to be.