Intel Thread Director Is Headed to Linux for a Major Boost in Alder Lake Performance (hothardware.com)
The Linux 5.18 kernel is adding support this spring for the Intel Hardware Feedback Interface, letting the scheduler make better decisions about where to place a given piece of work among the available CPU cores/threads, reports Phoronix.
This is significant because Intel's Alder Lake CPUs "are the first x86-64 processors to embrace a hybrid paradigm with two separate CPU architectures on the same die," explains Hot Hardware: These two separate CPU architectures have different strengths and capabilities. The Golden Cove "performance cores" (or P-cores) feature Intel's latest high-performance desktop CPU architecture, and they are blisteringly fast. Meanwhile, the Gracemont "efficiency cores" (or E-cores) are so small that four of them, along with 2MB of shared L2 cache, can nearly fit in the same space as a single Golden Cove core. They're slower than the Golden Cove cores, but also much more efficient, at least in theory.
The idea is that background tasks and light workloads can be run on the E-cores, saving power, while latency-sensitive and compute-intensive tasks can be run on the faster P-cores. The benefits of this may not have been exactly as clear as Intel would have liked on Windows, but they were even less visible on Linux. That's because Linux isn't aware of the unusual configuration of Alder Lake CPUs.
Well, that's changing in Linux 5.18, slated for release this spring. Linux 5.18 is bringing support for the Intel Enhanced Hardware Feedback Interface, or EHFI...
This is essentially the crux of Intel's "Thread Director," which is an intelligent, low-latency hardware-assisted scheduler.
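For the curious, here is a minimal sketch (not from the article) of how userspace can already tell P-cores and E-cores apart on a hybrid Intel chip. It assumes the CPUID leaf 0x1A semantics described in Intel's SDM; pinning to each logical CPU in turn makes CPUID report on the intended core.

    /* Minimal sketch, assuming CPUID leaf 0x1A semantics from Intel's SDM:
     * on hybrid parts, EAX[31:24] reports the core type of the CPU the
     * instruction runs on (0x40 = Core/P-core, 0x20 = Atom/E-core). */
    #define _GNU_SOURCE
    #include <cpuid.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        for (long cpu = 0; cpu < ncpus; cpu++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            /* Pin to this CPU so CPUID reports on the right core. */
            if (sched_setaffinity(0, sizeof(set), &set) != 0)
                continue;                     /* offline or not permitted */

            unsigned eax, ebx, ecx, edx;
            if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx) || eax == 0) {
                printf("cpu%ld: leaf 0x1A not populated (non-hybrid part?)\n", cpu);
                continue;
            }
            unsigned type = (eax >> 24) & 0xff;
            printf("cpu%ld: %s\n", cpu,
                   type == 0x40 ? "P-core" :
                   type == 0x20 ? "E-core" : "unknown core type");
        }
        return 0;
    }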
blah blah blah (Score:5, Funny)
The effect is very clear on Linux - The P-cores run very hot.
Re: (Score:2)
Is that what they mean by "blisteringly fast"?
Is this a rip off… (Score:2, Troll)
Re: (Score:3)
rip off of what Apple introduced in the M series and A12 CPU?
No, it's a rip off of ARM's BIG | little arch which Apple (presumably) licensed for their M series.
Perhaps they've come up with a better way to share the load among the cores, but otherwise I can't see the diff.
Re: (Score:1)
Don't know about the licensing, but Apple has had both performance and efficiency cores in their A series (iPhone, iPad) processors since the A10 [wikipedia.org] - so the past six generations.
Re:Is this a rip off… (Score:4, Insightful)
The first commercial implementation of it that I'm aware of is Samsung's, which predated Apple's by about 3 years.
Apple's implementation also had a limitation that Samsung chips (and Alder Lake, and current Apple silicon) didn't have: it could only run the P-cores or the E-cores at any one time, not both. This closely matched ARM's solution for fast-tracking OS support (use a hypervisor to handle switching between clusters, transparent to the OS).
Samsung left that decision up to the OS scheduler.
Re: (Score:3, Interesting)
And we'll take your word for it because... you don't even know that its proper name is big.LITTLE?
I couldn't care less if my word is taken for it. Someone will see my claim, and if they want to know the facts, they will go look for them and find that I am correct. I wouldn't have said it otherwise.
Beyond that, mistaking the capitalization isn't evidence of anything, logically speaking. If you judge facts based on grammatical errors, then you probably come to a lot of really fucking wrong conclusions in your life.
For the record, ARM announced big.LITTLE back in 2011 with the Cortex-A7 processors. Samsung started supporting big.LITTLE with Exynos 5420 in 2013. Apple introduced big.LITTLE with the A10 in 2016, but they fucked it up because A10 could run only efficiency cores, or performance cores, but not both at the same time (unlike every other big.LITTLE implementation).
That's literally what I wrote.
2016 − 2011 = 5. 2016 − 2013 = 3.
Your dumbfuckery skills are on
Re: (Score:2)
Apple's implementation also had a limitation that Samsung chips (and Alder Lake, and current Apple silicon) didn't have: it could only run the P-cores or the E-cores at any one time, not both. This closely matched ARM's solution for fast-tracking OS support (use a hypervisor to handle switching between clusters, transparent to the OS).
Yup, the first-generation A10 had that limitation, while the second-generation (the A11) did not.
Re: (Score:2)
I'm not trying to engage in a flame war on "Who's better, Apple or Samsung", because, frankly, Apple's processors are flat out fucking better in every way.
But they were way behind the curve in HMP processors. I interpreted (maybe erroneously) your post saying, "They've been doing this for 6 generations...", as somehow countermanding the person you replied to who said "Apple didn't come up with this shit."
If I was mistaken, I apologize.
Re:Is this a rip off… (Score:4, Funny)
My god Apple fanbois are a tedious bunch. You probably think iTunes was when Apple invented music.
Re:Is this a rip off… (Score:5, Funny)
Apple invented music in 1968 with the release of the White Album.
Re: (Score:2)
Nope- much more fundamental than that. This is a ripoff of what Apple introduced when they invented microprocessors.
Simple Algorithm? (Score:4, Insightful)
Shouldn't this be a simple matter of tracking how often tasks yield, and using the p-cores for tasks likely to need more than a full tick of processor time? It would seem like a bit of tuning would tend to get that right without a lot of complication.
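As a toy sketch of this commenter's heuristic (not the kernel's actual algorithm; every name below is made up for illustration), one could classify a task by how often it consumes a full timeslice versus yielding early:

    #include <stdio.h>

    enum core_pref { PREFER_ECORE, PREFER_PCORE };

    struct task_stats {
        unsigned long full_slices;   /* timeslices fully consumed */
        unsigned long early_yields;  /* slices given up early (I/O, sleeps) */
    };

    static enum core_pref classify(const struct task_stats *t)
    {
        unsigned long total = t->full_slices + t->early_yields;
        if (total < 16)
            return PREFER_ECORE;     /* too little history: default cheap */
        /* More than 75% full slices -> call it compute-bound. */
        return (t->full_slices * 4 > total * 3) ? PREFER_PCORE : PREFER_ECORE;
    }

    int main(void)
    {
        struct task_stats compiler = { .full_slices = 40, .early_yields = 2 };
        struct task_stats editor   = { .full_slices = 1,  .early_yields = 80 };

        printf("compiler -> %s\n",
               classify(&compiler) == PREFER_PCORE ? "P-core" : "E-core");
        printf("editor   -> %s\n",
               classify(&editor) == PREFER_PCORE ? "P-core" : "E-core");
        return 0;
    }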
Re: (Score:2)
Of course they do. What, did you expect these new Intel chips to require recompiled binaries to work?
Sorta. The AVX-512 instructions don't work on the e-cores. But, by default, they don't work on the p-cores either and will only run if the e-cores are disabled.
Ideally, the programmer should be able to specify which thread runs on which type of core.
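(On the AVX-512 point: a program that ships such a code path normally probes at runtime rather than assuming support, since on Alder Lake the instructions may be unavailable even on the p-cores. A minimal sketch using GCC/Clang's builtin check; the two compute functions are hypothetical stand-ins.)

    #include <stdio.h>

    static void compute_avx512(void)  { puts("dispatching AVX-512 path"); }
    static void compute_generic(void) { puts("dispatching generic path"); }

    int main(void)
    {
        if (__builtin_cpu_supports("avx512f"))   /* GCC/Clang builtin probe */
            compute_avx512();
        else
            compute_generic();
        return 0;
    }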
Re: (Score:2)
Ideally, the programmer should be able to specify which thread runs on which type of core.
That puts control of the system policy in the hands of some application.
I'd accept the programmer being able to give the OS hints (just like Thread Director does).
But in no way should the programmer be in control of such a thing.
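For what it's worth, both options already exist on Linux today. A hedged sketch contrasting the two: hint_background() nudges the scheduler, while force_ecores() grabs policy outright; the CPU numbers assumed to be E-cores (16-23) are hypothetical and machine-specific. Build with -pthread.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <sys/resource.h>

    static void hint_background(void)
    {
        /* Hint: lower our priority; placement stays the OS's call. */
        setpriority(PRIO_PROCESS, 0, 10);
    }

    static void force_ecores(void)
    {
        /* Control: hard-pin this thread to the assumed E-core CPUs. */
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 16; cpu <= 23; cpu++)
            CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void)
    {
        hint_background();   /* what the parent comment would accept */
        force_ecores();      /* what it objects to */
        return 0;
    }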
Re: (Score:1)
You mean blocking for I/O on a disk/network/keyboard/USB should be easy to detect and run on an e-core? In fact, shouldn't all processes default to e-cores and only have the high-CPU threads allocated to p-cores? That can't be all that difficult.
Re: (Score:2)
If you're implying that a thread that blocks should only run on an e-core, I find that confusing, since you seem to be suggesting that threads that require I/O should only be able to process it slowly.
Re:Simple Algorithm? (Score:5, Insightful)
I.e., look at a thread of execution and tell the OS what the power/performance/latency tradeoff is for executing it on each type of core.
It's pretty neat, actually. It's a lot better than the "dumb" HMP that exists currently.
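A purely illustrative sketch of the idea: hardware publishes per-core performance and energy-efficiency scores, and the scheduler consults them per thread. The struct layout and numbers below are invented here, not Intel's actual HFI table format.

    #include <stdio.h>

    struct core_caps {
        unsigned char perf_cap;  /* relative performance, 0-255 */
        unsigned char eff_cap;   /* relative energy efficiency, 0-255 */
    };

    /* Snapshot for a hypothetical 1 P-core + 2 E-core system. */
    static const struct core_caps caps[] = {
        { .perf_cap = 255, .eff_cap =  96 },  /* cpu0: P-core */
        { .perf_cap = 128, .eff_cap = 224 },  /* cpu1: E-core */
        { .perf_cap = 128, .eff_cap = 224 },  /* cpu2: E-core */
    };

    /* Hot threads chase performance; everything else chases efficiency. */
    static int pick_cpu(int thread_is_hot)
    {
        int best = 0;
        for (int i = 1; i < (int)(sizeof caps / sizeof caps[0]); i++) {
            unsigned char cur  = thread_is_hot ? caps[i].perf_cap
                                               : caps[i].eff_cap;
            unsigned char prev = thread_is_hot ? caps[best].perf_cap
                                               : caps[best].eff_cap;
            if (cur > prev)
                best = i;
        }
        return best;
    }

    int main(void)
    {
        printf("hot thread  -> cpu%d\n", pick_cpu(1));  /* cpu0, the P-core */
        printf("idle thread -> cpu%d\n", pick_cpu(0));  /* cpu1, an E-core */
        return 0;
    }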
Misread the title (Score:5, Funny)
My first thought was "Who is Intel's Thread Director and why is he going to work for Linux?" Also, is he in any way related to that General Failure guy who has been reading my drive?
Re: (Score:2, Funny)
No relation. Thread Director used to work for General Protection Fault, whose portfolio also includes malware that makes your programs crash.
Re: (Score:1)
Just make sure Major Catastrophe doesn't get involved.
A bigger picture (Score:3)
Re: (Score:1)
To what end do you want this BorgOS? What's the use case?
If all you want to do is run generic computing jobs, like, say, running things like Seti@home on most of your devices, that don't depend on any I/O except network, all the pieces exist to do it today, at least for devices where OS/firmware can be replaced to run some version of Linux, which is many of them, thanks to inevitable security vulnerabilities. Of course, no single hypothetical BorgOS actually exists. But you can load Linux on many devices, some
Re: (Score:2)
This doesn't sound all that "hard to do" (as in, I don't think you have to write any code beyond a few simple shell scripts, not that it would necessarily be trivial to configure) with existing Unix software. To wit, you would construct applications to run on a personal cloud, and run each one against its own network display server, then you'd connect that server to whichever display you were near. If your application can be constructed in such a fashion that it calls out sub-processes to do heavy lifting w
And why would we trust Intel? (Score:2)
I assume that they'll fuck it up and introduce a bunch of security problems that will end up giving a net slowdown once patched (if they bother to patch them).
Why is anyone still buying Intel's shit processors?
Re: (Score:3)
Why is anyone still buying Intel's shit processors?
Because they have more marketing than AMD. Intel inside your brain.
Re: (Score:3)
Of course, that's subject to change. We'll see when Zen4 arrives.
Who cares? (Score:1)
Nobody smart is running that second-rate, overpriced and insecure Intel crap anyway these days.
Re: (Score:2)
For anything other than gaming when you have too much money? Yes. Nobody sane buys Intel for a DC these days.
Re: (Score:2)
For anything other than gaming when you have too much money? Yes. Nobody sane buys Intel for a DC these days.
That's just untrue. :P
I am part of an organization that manages a little under a dozen datacenters.
AMD is puttering along at around 10% of new purchases.
You can't define 90% of purchases as insane.
Also, the 12900K is significantly faster than, and cheaper than, competing AMD desktop parts.
In fact, as of Alder Lake, the only edge AMD has is performance per watt (which they still hold a significant advantage in).
But performance per dollar is now far below Intel, as well as the crown for top performer (whic