Linus Torvalds Rails At Intel For 'Killing' the ECC Industry (theregister.com)
An anonymous reader quotes a report from The Register: Linux creator Linus Torvalds has accused Intel of preventing widespread use of error-correcting memory and being "instrumental in killing the whole ECC industry with its horribly bad market segmentation." ECC stands for error-correcting code. ECC memory uses additional parity bits to verify that the data read from memory is the same as the data that was written. Without this check, memory is vulnerable to occasional corruption where a bit is flipped spontaneously, for example, by background radiation. Memory can also be attacked using a technique called Rowhammer, where rapid repeated reads of the same memory locations can cause adjacent locations to change their state. ECC memory solves these problems and has been available for over 50 years yet most personal computers do not use it. Cost is a factor but what riles Torvalds is that Intel has made ECC support a feature of its Xeon range, aimed at servers and high-end workstations, and does not support it in other ranges such as the Core series.
The topic came up in a discussion about AMD's new Zen 3 Ryzen 9 5000 series processors on the Real World Tech forum site. AMD has semi-official ECC support in most of its processors. "I don't really see AMD's unofficial ECC support being a big deal," said an unwary contributor. "ECC absolutely matters," retorted Torvalds. "Intel has been detrimental to the whole industry and to users because of their bad and misguided policies wrt ECC. Seriously. And if you don't believe me, then just look at multiple generations of rowhammer, where each time Intel and memory manufacturers bleated about how it's going to be fixed next time... And yes, that was -- again -- entirely about the misguided and arse-backwards policy of 'consumers don't need ECC', which made the market for ECC memory go away."
The accusation is significant particularly at a time when security issues are high on the agenda. The suggestion is that Intel's marketing decisions have held back adoption of a technology that makes users more secure -- though rowhammer is only one of many potential attack mechanisms -- as well as making PCs more stable. "The arguments against ECC were always complete and utter garbage. Now even the memory manufacturers are starting to do ECC internally because they finally owned up to the fact that they absolutely have to," said Torvalds. Torvalds said that Xeon prices deterred usage. "I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price. So for my personal workstations, I ended up using Intel consumer CPU's." Prices, he said, dropped last year "because of Ryzen and Threadripper... but it was a 'too little, much too late' situation." By way of mitigation, he added that "apart from their ECC stance I was perfectly happy with [Intel's] consumer offerings."
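For anyone who wants to see what "additional bits that verify the data" means in practice, here is a minimal sketch of a Hamming(7,4) code in Python. It is a toy: real ECC DIMMs protect 64-bit words with wider codes in the memory controller, and the function names here are made up for illustration. The principle is the same, though: the check bits let the reader locate and undo a single flipped bit.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome is the 1-based position of a
    corrected single-bit error, or 0 if the codeword was consistent."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1  # simulate a stray bit flip at position 5
    recovered, position = hamming74_decode(stored)
    print(recovered == word, position)
```

Two simultaneous flips would fool this toy code, which is why memory ECC normally adds one more overall parity bit so that double errors are at least detected.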
ECC isn't an industry - it's a SKU (Score:3)
Re:ECC isn't an industry - it's a SKU (Score:5, Informative)
It is available only on buffered memory BECAUSE of Intel forcing a marketing distinction that ECC is for servers.
While the unbuffered memory spec allows ECC, you can find official implementations only on very obscure AMD systems like e.g. the early ProLiant Microservers. Everywhere else - the CPU (if it is AMD) supports it, the chipset (if it is from AMD) supports it as well. Unfortunately, the motherboard manufacturer has turned it off deliberately in order to perform an Abela Danger impersonation on their Intel account manager. Going back to the ProLiant Microserver Gen 1 - these had ECC only because HP used all of its 800 pound Gorilla weight and only because it called it a "server".
So Linus is actually right. It is yet another case of Intel screwing up the industry by doing things similar to what they did to "compete" with the AMD Athlon and Opteron in the early 2000s.
Re: (Score:2)
No, because nothing you've said makes an argument for ECC being an "industry". The post you objected to is that ECC isn't an industry. It isn't, it's a product feature.
Re: (Score:2)
He is arguing it isn't "still available".
Re:ECC isn't an industry - it's a SKU (Score:5, Interesting)
It's not that bad.
On the motherboard front it varies from manufacturer to manufacturer, but all ASRock AM4 boards support ECC un-buffered DIMMs.
Such DIMMs are readily available, e.g. Kingston make them. Less common with high end overclockable RGB adorned gaming ones, partly because the extra bit adds cost and limits maximum speed.
Where it's really lacking is on laptops. Ryzen mobile parts support ECC but no laptops seem to. I'm hoping that this year Lenovo releases some decent Ryzen 5 machines with ECC support but we shall see.
Re: ECC isn't an industry - it's a SKU (Score:4, Informative)
Sprinklings of ECC (Score:2)
Re: (Score:3)
less expensive product wins despite being clearly inferior.
The most important point in the article for me was that you can do ECC with cheap AMD processors, though you need special motherboards. Given that the AMD processors seem to be cheaper for the same level of performance right now, I think there's a chance that the better product can win here. Main problem seems to be lack of clarity about motherboard support [ycombinator.com].
Re: (Score:2)
There are sprinklings of ECC Intel CPUs in the Core and even Celeron range but it's fleeting and random. I've got a FreeNAS with a Celeron G3930 and ECC. I'm sure if people sought these select parts out Intel would have gotten the message but once again the less expensive product wins despite being clearly inferior.
Exactly. I had a committee that bought laptop computers a few years back for a project. They bought the cheapest things they could find. I would have been okay with that, but the laptops weren't up to the task.
Fast forward to today - New committee, same task. I keep my thumb on it this time. Every 5 minutes I have to remind them that the goal is not to buy the cheapest thing out there, fit for purpose is the goal.
Problem is, in the Windows world, so many aren't capable of breaking out of the cheap par
Re: (Score:3)
They have basically killed ECC on any consumer part that might compete with the xeon line. You only find ECC on celeron/i3/etc level parts that are considered quite low end.
But even then, the current i3/etc parts are similarly priced with the xeons, but won't work with ECC on a "consumer" chipset. So you end up paying the intel ECC tax via the motherboard now.
Business as usual (Score:2)
And by that I don't mean Intel being selfish money-grubbing assholes, I mean Linus being angry at selfish money-grubbing assholes.
I agree with him, ECC should have become standard because the smaller the transistors are, the more susceptible they are to background radiation. Being more secure would only have been the cherry on top. Unless you don't like cherries, in which case pick your own damn topping.
Re: (Score:3)
Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on. Thus, if the unexplained crash was actually due to a bit flip here and there, well, so be it - it'll be lost amongst the noise of crashes due to crappy software. I suspect the expectation of security is similarly low - fixing Rowhammer in a windows server isn't really going to secure it against all that much, since there are "e
Re: (Score:2)
Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on.
What about the occasional bit flip or system crash that, rather than destroying your spreadsheet, merely corrupts it? Or some other piece of vital data?
Re: (Score:2)
Yeah, I get it - but "most" Windows people probably don't, or don't care (at least, not enough to pay a few extra for their laptop, anyway).
Re: (Score:2)
Yeah, I get it - but "most" Windows people probably don't, or don't care (at least, not enough to pay a few extra for their laptop, anyway).
I wonder how many of them *would* pay extra for alloy wheels or a gigantic TV screen...
Re: (Score:3)
Previous employer shifted a ton of processing from North America to India ... including the security database. In the first several months, the number of support calls to regain lost access went through the roof. Turns out the computer there was extremely bad with regard to flipped bits. Rather than actually, ya know, fix the problem by upgrading the hardware, or installing ECC, they wrote a batch program that basically overwrote the whole database, on an hourly basis, using the master (back in North Ame
Re: (Score:3)
Who the fuck runs a database server without ECC? Answer: someone that needs to be fired.
Re: (Score:2)
To put that idiocy in perspective, here's a related tale (same employer)
Support for an existing application was moved from the group (here in North America) to India.
The team here was re-assigned to other roles.
In three years, not a single project was completed on the application by the group in India.
In every single project, the former team were required to drop what they were doing, and actually do 99.9% of the work.
Total billing for the India group, over the three years, was in the millions of dollars.
Re:Business as usual (Score:5, Insightful)
Indeed, but price matters - especially in the windows world.
I don't buy this. RAM prices are so volatile that the extra cost of ECC would get lost in the noise if it were a standard feature instead of something used to create a premium tier for servers.
Re: (Score:3)
Exactly, when I ordered my ECC RAM a few years ago for my AMD CPU, the price difference wasn't that great (IIRC about 10%), but the biggest problem was finding it and the delivery time. It was much harder to find the correct module and delivery times were several weeks, where non-ECC had less than 1 week delivery time.
Also, even for AMD, it is hard to find ECC-enabled motherboards because Intel pushed the idea that end users do not need ECC, so that is an easy way for MB sellers to cut costs. Many times only the high end
Re: (Score:3)
But this is a regression. 10-15 years ago it wasn't hard to find ECC RAM at a modest premium. The market disappeared because Intel killed it to be able to use it as a differentiator on their server-class CPUs. It's an artificial distinction.
Re:Business as usual (Score:4, Insightful)
Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on.
No one is using Win9x any more - welcome to the 21st century. The new annoyance is finding your machine has auto-patched and rebooted in the night.
Car analogy (Score:2, Interesting)
I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price.
I drove a 2006 Toyota that could do 100mph, given some time to get up to speed. IIRC, base price was $15,000. If I wanted something to go 200mph, I'd be paying a lot more than $30,000. I don't see why performance for anything at the upper levels of performance, CPU or car, should necessarily be expected to have a linear relation to price.
Re: (Score:2)
Bad car analogy. Toyota can't merge two of their cars together to enable you to drive at 200mph, but Intel can make dual core CPUs for twice the price or even less.
Re: (Score:2)
Yes but in this case it's more like having both the front and rear brakes on the same hydraulic circuit.
Re: (Score:2)
The best value for money in dollars vs. top speed on four wheels for street legal vehicles is probably a Corvette. If it doesn't have to be street legal then it would be a belly tank landspeed car, and if it doesn't have to be 4 wheels then it would be a Japanese sportbike.
Re: Car analogy (Score:2)
> Corvette
Unless your streets have corners, of course. ;)
Re: (Score:2)
Cornering performance improved greatly from the C5-C7 generations, and the C5's handling wasn't bad.
Re: (Score:2)
Should add that I drove a C4 in an autocross once and the handling on that was decent too, very much like a big NA/NB Miata.
Re: (Score:2)
I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price.
I drove a 2006 Toyota that could do 100mph, given some time to get up to speed. IIRC, base price was $15,000. If I wanted something to go 200mph, I'd be paying a lot more than $30,000. I don't see why performance for anything at the upper levels of performance, CPU or car, should necessarily be expected to have a linear relation to price.
I really think we should stop comparing both hyper-processors and hyper-cars, because both are infected with hyper amounts of Greed. Pricing often makes no sense for either in reality. And when shopping for a desktop Xeon, you're shopping for a nice BMW upgrade, not a bespoke Bugatti with 16 cylinders and 300MPH performance. Sure, there's an expected premium, but not a 5x premium.
Re: (Score:3)
A 2021 BMW 7 Series has a base price of $86,000. It's hardly a bespoke Bugatti, but still carries a ~5x premium over a 2021 Toyota Yaris (closest modern equivalent I could find to the car I drove in '06).
There's a reason I said "BMW" and not "M6". A $40K BMW 3-series will likely give you roughly twice the performance over a Yaris at twice the cost, which more equates to a single CPU Xeon configuration alternative in a desktop PC (the scenario Linus outlined)
Your 7-series is more akin to server-class architecture. Quite premium, and quite limited in audience. Bugatti is a purpose-built high-performance cluster.
What about Apple? (Score:2)
I can't find any information about this so I'm assuming their new M1-powered Macs don't have ECC RAM either. Does anyone have inside information about ECC being looked into by their engineers? Is there a petition asking for ECC in future Macs? What about asking them via email?
Re: (Score:2)
Apple uses expensive high-end components but often puts them together in stupid ways or sticks with expensive in-house-made designs that are bad. They wouldn't use hardware that wouldn't make a noticeable improvement to the user experience though, so there goes any idea of using ECC RAM, before the beancounters even had a chance to tear it out.
Re: What about Apple? (Score:3)
You mean that high-end i7 that is perma-heat-throttling to i5/i3 levels because the Macbook case design put form over function?
Re: (Score:2)
Yes, an excellent example.
Not entirely correct (Score:5, Informative)
My 12/24 core Mac Pro has 64 GB of DDR3 ECC memory. So Apple does design ECC memory in for some cases.
system pic here, with uptime composited in [fyngyrz.com]
Re: (Score:2)
If Apple is "shit" for build quality, who's "great" in your book?
Re: What about Apple? (Score:2)
Apple IS shit in build quality. On the inside. And keyboards. And mice. And some displays. And ...
Apple only SEEMS like high quality to their vanity-luddite target group.
Seriously, go take a look at a Louis Rossmann video about Apple laptops. Just look at those sorry excuses of a hinge, abysmal cooling, glued-in crap, shitty fluid protection, easily breaking connectors, ...
Old Thinkpad series T and X had great build quality. New ones are still way ahead of Apple from what I saw. They don't come in hipster p
Re: (Score:2)
Dell is actually doing pretty well these days, though not without their flaws. Their notebooks have been packing some good hardware in, but it's the little things that suck - using garbage thermal paste on the discrete GPUs, etc.
But at least you can swap out the fucking SSD without a soldering iron.
Re: (Score:3)
M1 has RAM integrated into the SoC package itself. You get what Apple decides to sell you. To the best of my knowledge, none of the memory configs that currently exist for M1 feature ECC.
Demand (Score:5, Informative)
There's so much demand for memory that nobody is taking the consumer ECC market by storm by offering ECC RAM for the proportional increase in manufacturing cost.
Last I looked ECC RAM could cost more than double decent gaming RAM and that's reflective of the fact that business buyers will pay it, not that the parity bit doubles manufacturing cost.
AMD is doing the right thing but until worldwide RAM supply increases ECC will remain expensive. Which is unfortunate.
I'm disappointed, though, that Linus gave Intel a pass on the speculative execution and IME disasters. Rowhammer isn't even needed on many Intel systems. AMD does somewhat better in all these cases, so it's possible.
I've never ran ECC Ram (Score:2)
Re: I've never ran ECC Ram (Score:2)
Yes, people don't notice, because everything is just hoarded and never looked at again. But just try to check your oldest 128GB for errors. If the parser can even detect them... (Excel will not complain if your budget data cell's value suddenly changed from $1500 to $150,000,000.)
Re: (Score:2)
The problem with that thinking is that something like ECC isn't a problem, until all of a sudden it is. Smaller transistors are more susceptible to bit flipping from cosmic radiation, etc. - so as lithography processes shrink, probability of needing ECC goes up.
And there are certain applications and technologies that would be suicidal to run without ECC, most of which are still in the datacenter, but as with many things that start as server technologies, they end up in the consumer space before too long - th
Re: (Score:3)
I prefer MemTest86 and let it run extended tests for a full 24 hours.
And of course that only proves if the RAM is bad. Bits can flip just due to a stray cosmic ray hitting the RAM chip just right -through no fault of the hardware. Unless you want to schedule your computing around solar activity, it's better to just have a parity bit.
Re:Demand (Score:5, Interesting)
not that the parity bit doubles manufacturing cost.
If they have to run two separate chip fabs or fabrication lines then it could easily double the cost.
What they could do is make all RAM have ECC then disable one bit on non-ECC RAM. That could actually drive the cost down by increasing yield of non-ECC parts.
Re: (Score:2)
What they could do is make all RAM have ECC then disable one bit on non-ECC RAM. That could actually drive the cost down by increasing yield of non-ECC parts.
There speaks a business person, I trow! 8-)
Re: (Score:2)
Here's a person with a deep understanding of ECC and Parity memory. ;)
Linus is completely right (Score:5, Interesting)
There is no reason why consumer CPUs should do without ECC support.
Users would still have the choice - go cheap, or go ECC - but it would be up for them to decide.
Pros with high memory requirements would go for ECC just to make sure their software development, AI modelling, video editing etc. box is more stable.
I will in the next upgrade. For X470 boards choice of ECC sticks was extremely limited and support too iffy.
Re: Linus is completely right (Score:3, Funny)
Re: (Score:2)
Hint: it's a lot worse than 3%.
Re: Linus is completely right (Score:3)
Seconded. There is no technical reason for it to be even one clock cycle slower, since it can all be done in hardware.
But parent commenters show the self-fulfilling prophecy that causes this problem well.
Re: (Score:3)
Ahem:
https://www.techspot.com/revie... [techspot.com]
And that's just "tuned" memory vs. XMP. Try running any of those games with typical ECC DIMM speeds of DDR4-2666 CAS 19 (yech!) and the performance difference would be massive.
In application performance, it's a mixed bag. It depends on the application. Stuff like Blender shows no improvement with memory speed, while stuff like y-cruncher does show a difference. My 3900X shows a 12.5% improvement going from DDR4-2666 14-16-14-28 to DDR4-3666 14-16-14-28. Next time,
Re: Linus is completely right (Score:2)
Excuse me, but do you live in the Sahara?
Around here, once a year is still a bit unnecessarily much.
RGB-less roomy case (Score:2)
I just want to buy a computer case with plenty of room and no ridiculous lighting!
I feel your pain!
(For my personal tastes, the "big tower" form factor peaked at Lian Li's PC P80 "Armorsuit" tower)
Another reason to go with AMD (Score:2)
I dunno Linus... (Score:4, Insightful)
I wouldn't personally spend money on ECC RAM. I've worked on many dozens of computers and hundreds of code bases as a programmer and I have never pinpointed RAM corruption as a noteworthy failure point. There are a lot of things to worry about in life, like failing hard drives. RAM corruption at most requires a reboot.
Re:I dunno Linus... (Score:5, Interesting)
I did have a very nasty experience with bad memory. My server's PostgreSQL logs kept logging odd messages like:
ERROR: syntax error at or near "SEHECT"
WTF, "SEHECT"? Turns out bad memory was occasionally flipping bit 2 (asc(L) xor asc(H) = 4) and making the server extremely unstable. So it does/can happen, and if it happens in just the right way, it can corrupt your data without you noticing it.
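The arithmetic in that diagnosis is easy to check. A minimal Python sketch, with helper names invented for illustration, that compares the expected and observed strings and reports which bits differ:

```python
def flipped_bit_positions(a, b):
    """Return the bit positions (0 = least significant) where two characters differ."""
    diff = ord(a) ^ ord(b)
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def diagnose(expected, observed):
    """List (index, expected_char, observed_char, flipped_bits) for each differing character."""
    return [
        (i, e, o, flipped_bit_positions(e, o))
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


if __name__ == "__main__":
    # 'L' is 0x4C and 'H' is 0x48, so they differ only in the bit worth 4.
    print(diagnose("SELECT", "SEHECT"))  # [(2, 'L', 'H', [2])]
```

A corruption that always lands on the same bit position, as here, points at one faulty cell or data line rather than random software bugs.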
symptoms of bad RAM (Score:2)
I did have a very nasty experience with bad memory. My server's PostgreSQL logs kept logging odd messages like:
ERROR: syntax error at or near "SEHECT"
WTF, "SEHECT"? Turns out bad memory was occasionally flipping bit 2 (asc(L) xor asc(H) = 4) and making the server extremely unstable. So it does/can happen, and if it happens in just the right way, it can corrupt your data without you noticing it.
I had a similar experience on a PDP-6 that had parity memory but not ECC. Bit 13 of a certain word would flip, and nobody seemed to know why. Fortunately, that word was in the middle of a scheduler table that only used the high-order bit. The console KSR35 printed a multiline message about the parity error when it occurred, then rewrote the word to correct the parity.
I was afraid to modify the operating system for fear that the bad word would contain something for which bit 13 was important, such as an i
Re: (Score:3)
Though, are you sure someone didn't switch your L and H keys?
Re: (Score:2)
RAM problems are rare but they're terrible when they strike. A computer with bad RAM is an insane computer that will corrupt its own files among other things. My servers have a live system RAM testing script [slashdot.org] to try to catch these errors before they do significant damage. This costs time and energy however.
Re:I dunno Linus... (Score:5, Insightful)
I have never pinpointed RAM corruption as a noteworthy failure point.
Yes. You have never *pinpointed* it - or maybe even suspected it. By the same token, you cannot be quite sure it is not happening. It's even worse, of course, if you don't notice it than if you get a crash or something else spectacular.
There are a lot of things to worry about in life, like failing hard drives. RAM corruption at most requires a reboot.
Unless, as I said before, it corrupts your data. Or corrupts your code in such a way that the code deletes or corrupts your data.
The possibilities are endless.
Re: (Score:3)
RAM corruption is typically a function of area: how much physical area the RAM covers.
ECC memory is mandatory on clusters because of the sheer area of RAM that is exposed to, say, cosmic rays, which corrupt RAM a bit at a time.
It got pretty famous when a cluster of non-ECC machines couldn't stay up for more than about 5 minutes before a machine would crash. I believe the U of VA had created a cluster of Macs for some analysis years ago, and basically that's all they'd stay up for before some machine or ot
Re: (Score:2)
I don't know if I'd buy ECC RAM or not. Yet, I agree with you that there are so many points of failure on a computer that RAM is pretty low on my list.
For me, I first worry about my storage as that's where my data lies. Everything else you can just buy. If the rest of my computer hardware starts to act funny (graphics, cpu, memory), that's generally after years of operating. At that time, it's often not worth the headache of even troubleshooting and I just upgrade as I need to upgrade anyways. Everyone's got
Re: I dunno Linus... (Score:2)
Yeah, and I have never seen my tiger-repellent rock not working!
Re: (Score:3, Interesting)
So you're ignorant of real world, typical code monkey. Why don't you read google studies, such as one that found they were getting one single-bit-error every 14 to 40 hours per Gigabit of DRAM.
https://www.intelligentmemory.... [intelligentmemory.com]
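To put that figure in desktop terms, here is a back-of-the-envelope sketch in Python. It takes the "one error every 14 to 40 hours per gigabit" number quoted above at face value and assumes errors scale linearly with capacity. Both are assumptions for illustration only, and real-world rates vary enormously between machines and DIMMs.

```python
def errors_per_hour(capacity_gigabytes, hours_per_error_per_gigabit):
    """Expected single-bit errors per hour, assuming errors scale linearly with capacity."""
    gigabits = capacity_gigabytes * 8
    return gigabits / hours_per_error_per_gigabit


if __name__ == "__main__":
    capacity_gb = 16  # hypothetical desktop
    low = errors_per_hour(capacity_gb, 40)   # optimistic end of the quoted range
    high = errors_per_hour(capacity_gb, 14)  # pessimistic end of the quoted range
    print(f"{capacity_gb} GB: {low:.1f} to {high:.1f} expected errors per hour")
```

If those rates held uniformly, nobody could miss the problem, which suggests the errors cluster on a minority of bad modules. That is exactly the case where ECC reporting tells you which DIMM to replace.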
Are there AMD desktop socket systems with IPMI? (Score:2)
Are there AMD desktop socket systems with IPMI? And ECC?
Intel does have them and the CPU prices are in line with the desktop ones.
Re: (Score:2)
Saw it today in RSS:
https://www.anandtech.com/show... [anandtech.com]
ECC and non-ECC memory supported.
It's about gamers, idiot. (Score:2, Interesting)
Not using ECC in desktop is about gamers... the ECC checks take time, and ECC also has buffered outputs, which improve signal integrity at the expense of propagation delay.
Gamers will not accept this slowdown, and even though Ryzen 5k supports ECC, nobody is buying it (but maybe that's just because, so far, Ryzen 5k is a paper launch).
Re: It's about gamers, idiot. (Score:2)
Bullshit. ECC checks taking time is an old myth.
Badly missed (Score:5, Insightful)
One of the worst parts of having to move from VAX(or Alpha)/VMS to Windows on PCs cobbled together from the cheapest parts was losing ECC RAM. For a few years it felt like driving with no seat belt.
I wasted God knows how many hours, days, weeks troubleshooting weird PC problems that may well have been due to RAM corruption. Talk about false economy!
It is true that you can get PCs and workstations with ECC RAM, but your choice of other features is severely squeezed. Not to mention having to pay through the nose.
The loss was compounded by Windows' almost complete lack of any error reporting and analysis software. (Not that Linux is any better). On VMS every single hardware error could be logged, down to every bit in every hardware register. And comprehensive diagnostic software tested all the functions, so you could always tell if you had any kind of solid fault.
I can't help feeling that the cheapness of PCs and Windows was traded off, not for anything else in the manufacturer's world, but for the customer's time.
Re: (Score:2)
In fact, I recall that some time in the early 1980s I got a call from a very brilliant colleague who had gone across the pond to work as a reliability consultant in VAX/VMS engineering.
Someone had floated the apparently hare-brained idea that, by omitting ECC, they could make RAM work faster. The suggestion was that, even if you lost a few days a year recovering from failures, you would still come out ahead because you'd get more work done for the same cost.
Perhaps significantly, I never heard anything furt
bad argument for omitting ECC (Score:2)
In fact, I recall that some time in the early 1980s I got a call from a very brilliant colleague who had gone across the pond to work as a reliability consultant in VAX/VMS engineering.
Someone had floated the apparently hare-brained idea that, by omitting ECC, they could make RAM work faster. The suggestion was that, even if you lost a few days a year recovering from failures, you would still come out ahead because you'd get more work done for the same cost.
Perhaps significantly, I never heard anything further about that scheme - and VAXen (and later Alphas) continued to have quite sophisticated ECC in various places.
Another argument at DEC for omitting ECC was that ECC added to the complexity of the system, and therefore increased the chances of a failure, making the computer less reliable. The hardware people making this argument didn't seem to understand the difference (to the customer) of a failure that stopped the computer as opposed to a failure that silently corrupted data.
Re: (Score:2)
And of course the term desktop has shifted slightly in usage now that workstation and desktop is a mostly meaningless distinction.
It's been quite a few years since I first noticed that some of the latest IBM mainframes looked to me like workstations...
I bet they have ECC!
It's not so simple as making it available (Score:5, Informative)
Torvalds makes a decent point about Rowhammer, but otherwise, ECC isn't "that big of a deal" for consumer systems. It's actually a hindrance for a lot of PCs since the performance of ECC RAM can be so poor compared to non-ECC SKUs. Let's take a look at some available ECC, non-registered DIMMs (which is what Torvalds would want everyone using, presumably):
https://pcpartpicker.com/produ... [pcpartpicker.com]
Fastest SKU available is a 16GB stick of Micron MTA18ADF2G72AZ-3G2E1 DDR4-3200. It has a CAS/CL of 22 and a first-word latency of 13.75ns (ouch).
Now let's look at non-ECC RAM!
https://pcpartpicker.com/produ... [pcpartpicker.com]
(to be fair, I filtered for 16GB kits to make a more-direct comparison to the ECC Micron DIMM above)
We have kits going all the way up to DDR4-4400, though in terms of first-word latency, this kit in particular stands out:
https://pcpartpicker.com/produ... [pcpartpicker.com]
DDR4-4266 and CAS/CL 17 with first-word latency of ~8ns? Yes please! And at lower price points, you can still find 16GB DIMMs that run circles around that Micron ECC DIMM in the DDR4-3200 -> DDR4-3600 range at much-more-reasonable prices. That Micron ECC DIMM isn't even available anymore, except maybe on eBay (I didn't look).
Maybe if ECC were the de facto standard for ALL PCs, then higher-performance ECC non-registered DIMMs might exist. As it stands, they don't. Which is interesting, since there is a segment of power-users (notably those that buy Threadripper or HEDT Intel CPUs) that could use high-performance ECC RAM in their workstations. If the market is large enough to accommodate those CPUs and motherboards, it should be large enough for ECC RAM to match. As it stands, HEDT buyers just use gamer memory kits more-often-than-not. Those that need ECC take a massive performance hit.
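For reference, the first-word latency figures in this comment follow from the usual rule of thumb: CAS latency in clock cycles divided by the memory clock, where the clock is half the DDR transfer rate. A small Python sketch (the function name is made up for illustration):

```python
def first_word_latency_ns(transfer_rate_mt_s, cas_latency):
    """First-word latency in ns: CAS cycles times the clock period.
    DDR transfers twice per clock, so the clock in MHz is transfer_rate / 2."""
    clock_mhz = transfer_rate_mt_s / 2
    return cas_latency * 1000 / clock_mhz


if __name__ == "__main__":
    print(first_word_latency_ns(3200, 22))            # the ECC Micron DIMM: 13.75
    print(round(first_word_latency_ns(4266, 17), 2))  # the DDR4-4266 CL17 kit: 7.97
```

Note that the gap comes from the conservative timings those ECC modules are sold at, not from anything in this formula that is specific to ECC.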
Re: (Score:3)
...
Maybe if ECC were the de facto standard for ALL PCs, then higher-performance ECC non-registered DIMMs might exist. As it stands, they don't. Which is interesting, since there is a segment of power users (notably those that buy Threadripper or HEDT Intel CPUs) that could use high-performance ECC RAM in their workstations. If the market is large enough to accommodate those CPUs and motherboards, it should be large enough for ECC RAM to match. As it stands, HEDT buyers just use gamer memory kits more often than not. Those that need ECC take a massive performance hit.
I am building a new home PC and I want lots of RAM. The motherboard has eight DIMM slots, each slot able to handle a DIMM up to 32 GiB. When I was researching memory on the motherboard manufacturer's list, the largest ECC DIMMs I found were 16 GiB, but there was also a non-ECC 32 GiB DIMM for the same price. I went with the non-ECC memory.
I don't understand why adding ECC costs the same as doubling the density of the RAM chips. There is room on the DIMM for a ninth chip.
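To put a rough number on the "ninth chip" point: a standard ECC DIMM is 72 bits wide, 64 data bits plus 8 check bits. The sketch below assumes a textbook Hamming code with one extra overall parity bit (SECDED); the actual code used by any given memory controller is not stated in this thread, and the function name is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a textbook Hamming SECDED code.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1
    to correct one bit; one more overall parity bit lets it also
    detect (but not correct) two-bit errors.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


check = secded_check_bits(64)
overhead = check / 64
print(f"64 data bits need {check} check bits ({overhead:.1%} extra storage)")
```

Under that assumption the extra storage is one eighth of the data width, which is a long way from doubling the density.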
Re: (Score:2)
I think the problem is that you were looking at ECC, unregistered/unbuffered DIMMs, which are sort of the odd man out in the ECC world. Here are all the 32GB DDR4 ECC registered DIMMs I could find on a moment's notice:
https://pcpartpicker.com/produ... [pcpartpicker.com]
Unregistered/unbuffered ECC?
https://pcpartpicker.com/produ... [pcpartpicker.com]
Just one, and it's over $600 for a 32GB DDR4-2133 DIMM. The market for this type of RAM is small enough that there isn't much incentive to manufacture it, making it scarce. Scarcity drives price.
Re: (Score:2)
RAM doesn't drive hardware sales and upgrades. It's always an afterthought after CPU and motherboard.
Re: (Score:3)
I never could make Rowhammer work (Score:5, Interesting)
On any desktop, that is. But I noticed that all Rowhammer papers I looked at used laptops. Many laptops come with reduced RAM refresh because that saves power. It does make the RAM more vulnerable to Rowhammer though.
Re: I never could make Rowhammer work (Score:3)
Then I have some Corsair RAM to sell you...
Re: (Score:3)
So maybe my "problem" is that I do not buy crappy RAM? Well...
'starting to do ECC internally' (Score:2)
All DDR5 comes with ECC (Score:3, Informative)
ECC is not just a Parity bit. (Score:3, Informative)
Re: (Score:3, Informative)
ECC codes do nothing on their own, the OS and specific software (or Interrupt routines) actually has to act on the ECC byte and compute the parity bit, and if you're looking for performance on your home systems, would you give up 1-5% of your performance?
This is utter BS. ECC is done in the memory controller. The underlying software need not even know it is there unless it wants to look at error stats/respond to error events.
Why is there even a need for CPU support? (Score:4, Insightful)
Why do you need CPU support? The summary even mentions that manufacturers are starting to do it internally. This seems like the correct way to do it anyway. Why does the computer need to know whether it is ECC RAM or not? I realize they are different and are not swappable, but why?
Re: (Score:3)
Internal ECC only protects the memory cells. That's the most vulnerable point, but it's still better if you can also protect the bits while they are in transit between the DIMM and the CPU. That requires support from the memory controller, which is part of the CPU these days.
Re: (Score:3)
It can NOT be done in the CPU, or memory controller, without the extra data lines for the ECC code bits to be set and read. My desktop Xeons have the additional data lines, and the DIMM sockets are populated with the corresponding RAM. My laptop i7s do not.
If/when there were bits incorrectly stored, flipped while stored, or incorrectly read, the Xeons will detect this (up to a couple of simultaneous bits per line), and the operating system can generate an appropriate error response, such as no longer usin