Linus Torvalds Rails At Intel For 'Killing' the ECC Industry (theregister.com)

An anonymous reader quotes a report from The Register: Linux creator Linus Torvalds has accused Intel of preventing widespread use of error-correcting memory and being "instrumental in killing the whole ECC industry with its horribly bad market segmentation." ECC stands for error-correcting code. ECC memory uses additional parity bits to verify that the data read from memory is the same as the data that was written. Without this check, memory is vulnerable to occasional corruption where a bit is flipped spontaneously, for example, by background radiation. Memory can also be attacked using a technique called Rowhammer, where rapid repeated reads of the same memory locations can cause adjacent locations to change their state. ECC memory solves these problems and has been available for over 50 years, yet most personal computers do not use it. Cost is a factor, but what riles Torvalds is that Intel has made ECC support a feature of its Xeon range, aimed at servers and high-end workstations, and does not support it in other ranges such as the Core series.
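To make the idea of "additional parity bits" concrete, here is a minimal sketch of the classic Hamming(7,4) code, which adds three check bits to four data bits and can locate and correct any single flipped bit. It is illustrative only: real ECC DIMMs use wider codes (typically 8 check bits per 64 data bits) computed in hardware by the memory controller, and the function names below are hypothetical.

```python
# Hamming(7,4): a minimal single-error-correcting code (illustrative sketch).
# Codeword positions are numbered 1..7; positions 1, 2 and 4 hold check bits.

def encode(data: list[int]) -> list[int]:
    """Encode four data bits (each 0 or 1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, syndrome); a nonzero syndrome is the flipped position."""
    bits = list(code)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all 1 bits
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the bad bit back
    return [bits[2], bits[4], bits[5], bits[6]], syndrome


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single bit flip at position 5
    print(decode(word))  # ([1, 0, 1, 1], 5)
```

With two flipped bits this code would silently "correct" the wrong position, which is why practical memory ECC adds one more overall parity bit (SECDED) so double-bit errors are at least detected.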

The topic came up in a discussion about AMD's new Zen 3 Ryzen 9 5000 series processors on the Real World Tech forum site. AMD has semi-official ECC support in most of its processors. "I don't really see AMD's unofficial ECC support being a big deal," said an unwary contributor. "ECC absolutely matters," retorted Torvalds. "Intel has been detrimental to the whole industry and to users because of their bad and misguided policies wrt ECC. Seriously. And if you don't believe me, then just look at multiple generations of rowhammer, where each time Intel and memory manufacturers bleated about how it's going to be fixed next time... And yes, that was -- again -- entirely about the misguided and arse-backwards policy of 'consumers don't need ECC', which made the market for ECC memory go away."

The accusation is significant, particularly at a time when security issues are high on the agenda. The suggestion is that Intel's marketing decisions have held back adoption of a technology that makes users more secure -- though rowhammer is only one of many potential attack mechanisms -- as well as making PCs more stable. "The arguments against ECC were always complete and utter garbage. Now even the memory manufacturers are starting to do ECC internally because they finally owned up to the fact that they absolutely have to," said Torvalds. Torvalds said that Xeon prices deterred usage. "I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price. So for my personal workstations, I ended up using Intel consumer CPU's." Prices, he said, dropped last year "because of Ryzen and Threadripper... but it was a 'too little, much too late' situation." By way of mitigation, he added that "apart from their ECC stance I was perfectly happy with [Intel's] consumer offerings."

  • by JoeyRox ( 2711699 ) on Tuesday January 05, 2021 @08:16AM (#60898402)
    And one that's still available.
    • by kot-begemot-uk ( 6104030 ) on Tuesday January 05, 2021 @09:29AM (#60898646) Homepage
      Yes and No.

      It is available only on buffered memory BECAUSE of Intel forcing a marketing distinction that ECC is for servers.

      While the unbuffered memory spec allows ECC, you can find official implementations only on very obscure AMD systems, e.g. the early ProLiant MicroServers. Everywhere else - the CPU (if it is AMD) supports it, the chipset (if it is from AMD) supports it as well. Unfortunately, the motherboard manufacturer has turned it off deliberately in order to perform an Abela Danger impersonation on their Intel account manager. Going back to the ProLiant MicroServer Gen 1 - these had ECC only because HP used all of its 800-pound gorilla weight, and only because it called it a "server".

      So Linus is actually right. It is yet another case of Intel screwing up the industry by doing things similar to what they did to "compete" with the AMD Athlon and Opteron in the early 2000s.

      • by dfghjk ( 711126 )

        No, because nothing you've said makes an argument for ECC being an "industry". The point of the post you objected to is that ECC isn't an industry. It isn't; it's a product feature.

      • by AmiMoJo ( 196126 ) on Tuesday January 05, 2021 @11:40AM (#60899414) Homepage Journal

        It's not that bad.

        On the motherboard front it varies from manufacturer to manufacturer, but all ASRock AM4 boards support ECC un-buffered DIMMs.

        Such DIMMs are readily available; Kingston, for example, makes them. They are less common among high-end, overclockable, RGB-adorned gaming modules, partly because the extra bit adds cost and limits maximum speed.

        Where it's really lacking is on laptops. Ryzen mobile parts support ECC but no laptops seem to. I'm hoping that this year Lenovo releases some decent Ryzen 5 machines with ECC support but we shall see.

        • by aRTeeNLCH ( 6256058 ) on Tuesday January 05, 2021 @12:39PM (#60899758)
          I got a board, a Gigabyte AB350N-Gaming-WiFi, with support for ECC RAM, plus a Raven Ridge 2400G, which supports ECC, and forked over the extra cash for that RAM, only to find that "supported" meant "functions": as in, it will boot and run. There is no hint that the ECC functionality is actually used, which has been confirmed by various versions of Memtest86 and the Linux kernel, since then by various places on the Web, and recently even by the Gigabyte support site for that board (ECC functionality is only employed with Pro versions of the CPU, which are nowhere to be found)... :-(
  • There are sprinklings of ECC Intel CPUs in the Core and even Celeron range but it's fleeting and random. I've got a FreeNAS with a Celeron G3930 and ECC. I'm sure if people sought these select parts out Intel would have gotten the message but once again the less expensive product wins despite being clearly inferior.
    • less expensive product wins despite being clearly inferior.

      The most important point in the article for me was that you can do ECC with cheap AMD processors, though you need special motherboards. Given that the AMD processors seem to be cheaper for the same level of performance right now, I think there's a chance that the better product can win here. Main problem seems to be lack of clarity about motherboard support [ycombinator.com].

    • There are sprinklings of ECC Intel CPUs in the Core and even Celeron range but it's fleeting and random. I've got a FreeNAS with a Celeron G3930 and ECC. I'm sure if people sought these select parts out Intel would have gotten the message but once again the less expensive product wins despite being clearly inferior.

      Exactly. I had a committee that bought laptop computers a few years back for a project. They bought the cheapest things they could find. I would have been okay with that, but the laptops weren't up to the task.

      Fast forward to today: new committee, same task. I keep my thumb on it this time. Every 5 minutes I have to remind them that the goal is not to buy the cheapest thing out there; fitness for purpose is the goal.

      Problem is, in the Windows world, so many aren't capable of breaking out of the cheap par

    • by bored ( 40072 )

      They have basically killed ECC on any consumer part that might compete with the Xeon line. You only find ECC on Celeron/i3/etc. level parts that are considered quite low end.

      But even then, the current i3/etc. parts are similarly priced to the Xeons, but won't work with ECC on a "consumer" chipset. So you end up paying the Intel ECC tax via the motherboard now.

  • And by that I don't mean Intel being selfish money-grubbing assholes, I mean Linus being angry at selfish money-grubbing assholes.

    I agree with him, ECC should have become standard because the smaller the transistors are, the more susceptible they are to background radiation. Being more secure would only have been the cherry on top. Unless you don't like cherries, in which case pick your own damn topping.

    • Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on. Thus, if the unexplained crash was actually due to a bit flip here and there, well, so be it - it'll be lost amongst the noise of crashes due to crappy software. I suspect the expectation of security is similarly low - fixing Rowhammer in a windows server isn't really going to secure it against all that much, since there are "e

      • Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on.

        What about the occasional bit flip or system crash that, rather than destroying your spreadsheet, merely corrupts it? Or some other piece of vital data?

        • Yeah, I get it - but "most" Windows people probably don't, or don't care (at least, not enough to pay a few extra for their laptop, anyway).

          • Yeah, I get it - but "most" Windows people probably don't, or don't care (at least, not enough to pay a few extra for their laptop, anyway).

            I wonder how many of them *would* pay extra for alloy wheels or a gigantic TV screen...

        • Previous employer shifted a ton of processing from North America to India ... including the security database. In the first several months, the number of support calls to regain lost access went through the roof. Turns out the computer there was extremely bad with regard to flipped bits. Rather than actually, ya know, fix the problem by upgrading the hardware, or installing ECC, they wrote a batch program that basically overwrote the whole database, on an hourly basis, using the master (back in North Ame

          • Who the fuck runs a database server without ECC? Answer: someone that needs to be fired.

            • To put that idiocy in perspective, here's a related tale (same employer):
              Support for an existing application was moved from the group (here in North America) to India.
              The team here was re-assigned to other roles.
              In three years, not a single project was completed on the application by the group in India.
              In every single project, the former team were required to drop what they were doing, and actually do 99.9% of the work.
              Total billing for the India group, over the three years, was in the millions of dollars.

      • by flink ( 18449 ) on Tuesday January 05, 2021 @09:45AM (#60898726)

        Indeed, but price matters - especially in the windows world.

        I don't buy this. RAM prices are so volatile that the extra cost of ECC would get lost in the noise if it were a standard feature instead of something used to create a premium tier for servers.

        • by higuita ( 129722 )

          Exactly. When I ordered my ECC RAM a few years ago for my AMD CPU, the price difference wasn't that great (IIRC about 10%), but the biggest problem was finding it and the delivery time. It was much harder to find the correct module, and delivery times were several weeks, whereas non-ECC had less than one week delivery time.
          Also, even on AMD, it is hard to find ECC-enabled motherboards, because Intel pushed the idea that end users do not need ECC, so that is an easy way for motherboard sellers to cut costs. Many times only the high end

      • by chuckugly ( 2030942 ) on Tuesday January 05, 2021 @01:46PM (#60900044)

        Indeed, but price matters - especially in the windows world. There, users are very accustomed to the occasional, unexplained system crash, loss of a spreadsheet, reboot and carry on.

        No one is using Win9x any more - welcome to the 21st. The new annoyance is finding your machine has auto-patched and rebooted in the night.

  • Car analogy (Score:2, Interesting)

    by cmseagle ( 1195671 )

    I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price.

    I drove a 2006 Toyota that could do 100mph, given some time to get up to speed. IIRC, base price was $15,000. If I wanted something to go 200mph, I'd be paying a lot more than $30,000. I don't see why performance for anything at the upper levels of performance, CPU or car, should necessarily be expected to have a linear relation to price.

    • Bad car analogy. Toyota can't merge two of their cars together to enable you to drive at 200mph, but Intel can make dual core CPUs for twice the price or even less.

    • by DarkOx ( 621550 )

      Yes, but in this case it's more like having both the front and rear brakes on the same hydraulic circuit.

    • Comment removed based on user account deletion
    • Comment removed based on user account deletion
      • The best value for money in dollars vs. top speed on four wheels for street legal vehicles is probably a Corvette. If it doesn't have to be street legal then it would be a belly tank landspeed car, and if it doesn't have to be 4 wheels then it would be a Japanese sportbike.

    • I used to look at the Xeon CPU's, and I could never really make the math work. The Intel math was basically that you get twice the CPU for five times the price.

      I drove a 2006 Toyota that could do 100mph, given some time to get up to speed. IIRC, base price was $15,000. If I wanted something to go 200mph, I'd be paying a lot more than $30,000. I don't see why performance for anything at the upper levels of performance, CPU or car, should necessarily be expected to have a linear relation to price.

      I really think we should stop comparing both hyper-processors and hyper-cars, because both are infected with hyper amounts of Greed. Pricing often makes no sense for either in reality. And when shopping for a desktop Xeon, you're shopping for a nice BMW upgrade, not a bespoke Bugatti with 16 cylinders and 300MPH performance. Sure, there's an expected premium, but not a 5x premium.

      • A 2021 BMW 7 Series has a base price of $86,000. It's hardly a bespoke Bugatti, but still carries a ~5x premium over a 2021 Toyota Yaris (closest modern equivalent I could find to the car I drove in '06).
        • A 2021 BMW 7 Series has a base price of $86,000. It's hardly a bespoke Bugatti, but still carries a ~5x premium over a 2021 Toyota Yaris (closest modern equivalent I could find to the car I drove in '06).

          There's a reason I said "BMW" and not "M6". A $40K BMW 3 Series will likely give you roughly twice the performance of a Yaris at twice the cost, which more closely equates to a single-CPU Xeon configuration in a desktop PC (the scenario Linus outlined).

          Your 7-series is more akin to server-class architecture. Quite premium, and quite limited in audience. Bugatti is a purpose-built high-performance cluster.

  • I can't find any information about this, so I'm assuming Apple's new M1-powered Macs don't have ECC RAM either. Does anyone have inside information about ECC being looked into by their engineers? Is there a petition asking for ECC in future Macs? What about asking them via email?

    • Comment removed based on user account deletion
      • Apple uses expensive high-end components but often puts them together in stupid ways or sticks with expensive in-house-made designs that are bad. They wouldn't use hardware that wouldn't make a noticeable improvement to the user experience though, so there goes any idea of using ECC RAM, before the beancounters even had a chance to tear it out.

      • If Apple is "shit" for build quality, who's "great" in your book?

        • Comment removed based on user account deletion
        • Apple IS shit in build quality. On the inside. And keyboards. And mice. And some displays. And ...

          Apple only SEEMS like high quality to their vanity-luddite target group.

          Seriously, go take a look at a Louis Rossmann video about Apple laptops. Just look at those sorry excuses for hinges, abysmal cooling, glued-in crap, shitty fluid protection, easily breaking connectors, ...

          Old Thinkpad series T and X had great build quality. New ones are still way ahead of Apple from what I saw. They don't come in hipster p

        • Dell is actually doing pretty well these days, though not without their flaws. Their notebooks have been packing some good hardware in, but it's the little things that suck - using garbage thermal paste on the discrete GPUs, etc.

          But at least you can swap out the fucking SSD without a soldering iron.

    • M1 has RAM integrated into the SoC package itself. You get what Apple decides to sell you. To the best of my knowledge, none of the memory configs that currently exist for M1 feature ECC.

  • Demand (Score:5, Informative)

    by bill_mcgonigle ( 4333 ) * on Tuesday January 05, 2021 @08:25AM (#60898416) Homepage Journal

    There's so much demand for memory that nobody is taking the consumer ECC market by storm by offering ECC RAM for the proportional increase in manufacturing cost.

    Last I looked ECC RAM could cost more than double decent gaming RAM and that's reflective of the fact that business buyers will pay it, not that the parity bit doubles manufacturing cost.

    AMD is doing the right thing but until worldwide RAM supply increases ECC will remain expensive. Which is unfortunate.

    I'm disappointed, though, that Linus gave Intel a pass on the speculative execution and IME disasters. Rowhammer isn't even needed on many Intel systems. AMD does somewhat better in all these cases, so it's possible.

    • by account_deleted ( 4530225 ) on Tuesday January 05, 2021 @08:40AM (#60898452)
      Comment removed based on user account deletion
      • and never really noticed. I think that's the problem. If you've got a limited budget (and in semiconductors you do) you'd be much better off investing in better motherboard components than ECC Ram. Anyone here old enough to remember crap VIA motherboards and the VIA "4-in-1" driver that contained fixes and workarounds for all the stuff that didn't work?
        • Yes, people don't notice, because everything is just hoarded and never looked at again. But just try to check your oldest 128GB for errors, if the parser can even detect them... (Excel will not complain if your budget data cell's value suddenly changed from $1500 to $150,000,000.)

        • The problem with that thinking is that something like ECC isn't a problem, until all of a sudden it is. Smaller transistors are more susceptible to bit flipping from cosmic radiation, etc. - so as lithography processes shrink, probability of needing ECC goes up.

          And there's certain applications and technologies that would be suicidal to run without ECC, most of which are still in the datacenter, but as with many things that start as server technologies, they end up in the consumer space before too long - th

        • Comment removed based on user account deletion
          • I prefer MemTest86 and let it run extended tests for a full 24 hours.

            And of course that only tells you whether the RAM is bad. Bits can flip just due to a stray cosmic ray hitting the RAM chip just right, through no fault of the hardware. Unless you want to schedule your computing around solar activity, it's better to just have a parity bit.
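The comment above suggests a parity bit. A minimal sketch (the helper names are hypothetical) of even parity on a single byte shows what it buys: any single flipped bit is detected, but the bad bit cannot be located, and two flips cancel out and go unnoticed. Closing that gap is what error-correcting codes are for.

```python
def parity(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") & 1


def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity(byte) == stored_parity


if __name__ == "__main__":
    value = ord("L")          # 0b01001100
    p = parity(value)         # stored next to the data when it is written
    print(check(value ^ 0b00000100, p))  # False: single flip detected
    print(check(value ^ 0b00000101, p))  # True: double flip slips through
```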

    • Re:Demand (Score:5, Interesting)

      by Joce640k ( 829181 ) on Tuesday January 05, 2021 @08:43AM (#60898462) Homepage

      not that the parity bit doubles manufacturing cost.

      If they have to run two separate chip fabs or fabrication lines then it could easily double the cost.

      What they could do is make all RAM have ECC then disable one bit on non-ECC RAM. That could actually drive the cost down by increasing yield of non-ECC parts.

      • What they could do is make all RAM have ECC then disable one bit on non-ECC RAM. That could actually drive the cost down by increasing yield of non-ECC parts.

        There speaks a business person, I trow! 8-)

      • by dfghjk ( 711126 )

        Here's a person with a deep understanding of ECC and Parity memory. ;)

  • by jay age ( 757446 ) on Tuesday January 05, 2021 @08:27AM (#60898422)

    There is no reason why consumer CPUs should do without ECC support.
    Users would still have the choice - go cheap, or go ECC - but it would be up to them to decide.

    Pros with high memory requirements would go for ECC just to make sure their software development, AI modelling, video editing etc. box is more stable.
    I will in the next upgrade. For X470 boards choice of ECC sticks was extremely limited and support too iffy.

  • I switched to AMD Ryzen because I was sick of all the vulnerabilities in Intel's CPUs (which could be patched, but with a huge performance penalty) and the infamous Intel Management Engine, a.k.a. the secret snooping processor within my processor. Goodbye, Intel, and good riddance!
  • I dunno Linus... (Score:4, Insightful)

    by whiplashx ( 837931 ) on Tuesday January 05, 2021 @08:41AM (#60898456)

    I wouldn't personally spend money on ECC RAM. I've worked on many dozens of computers and hundreds of code bases as a programmer and I have never pinpointed RAM corruption as a noteworthy failure point. There are a lot of things to worry about in life, like failing hard drives. RAM corruption at most requires a reboot.

    • Re:I dunno Linus... (Score:5, Interesting)

      by dskoll ( 99328 ) on Tuesday January 05, 2021 @08:49AM (#60898478) Homepage

      I did have a very nasty experience with bad memory. My server's PostgreSQL logs kept logging odd messages like:

      ERROR: syntax error at or near "SEHECT"

      WTF, "SEHECT"? Turns out bad memory was occasionally flipping bit 2 (asc(L) xor asc(H) = 4) and making the server extremely unstable. So it does/can happen, and if it happens in just the right way, it can corrupt your data without you noticing it.
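The arithmetic in that comment can be checked directly. A minimal sketch (the helper names are hypothetical) that lists which bit positions differ between two characters, and reproduces the corrupted keyword by flipping bit 2 of the "L" in SELECT:

```python
def differing_bits(a: str, b: str) -> list[int]:
    """Bit positions (0 = least significant) where two characters differ."""
    diff = ord(a) ^ ord(b)
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def flip_bit(text: str, index: int, bit: int) -> str:
    """Return text with one bit flipped in the character at `index`."""
    ch = chr(ord(text[index]) ^ (1 << bit))
    return text[:index] + ch + text[index + 1:]


if __name__ == "__main__":
    print(differing_bits("L", "H"))   # [2], since 0x4C ^ 0x48 == 0x04
    print(flip_bit("SELECT", 2, 2))   # SEHECT
```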

      • I did have a very nasty experience with bad memory. My server's PostgreSQL logs kept logging odd messages like:

        ERROR: syntax error at or near "SEHECT"

        WTF, "SEHECT"? Turns out bad memory was occasionally flipping bit 2 (asc(L) xor asc(H) = 4) and making the server extremely unstable. So it does/can happen, and if it happens in just the right way, it can corrupt your data without you noticing it.

        I had a similar experience on a PDP-6 that had parity memory but not ECC. Bit 13 of a certain word would flip, and nobody seemed to know why. Fortunately, that word was in the middle of a scheduler table that only used the high-order bit. The console KSR35 printed a multiline message about the parity error when it occurred, then rewrote the word to correct the parity.

        I was afraid to modify the operating system for fear that the bad word would contain something for which bit 13 was important, such as an i

      • Wow!

        Though, are you sure someone didn't switch your L and H keys?

    • RAM problems are rare but they're terrible when they strike. A computer with bad RAM is an insane computer that will corrupt its own files among other things. My servers have a live system RAM testing script [slashdot.org] to try to catch these errors before they do significant damage. This costs time and energy however.

    • by Archtech ( 159117 ) on Tuesday January 05, 2021 @09:13AM (#60898550)

      I have never pinpointed RAM corruption as a noteworthy failure point.

      Yes. You have never *pinpointed* it - or maybe even suspected it. By the same token, you cannot be quite sure it is not happening. It's even worse, of course, if you don't notice it than if you get a crash or something else spectacular.

      There are a lot of things to worry about in life, like failing hard drives. RAM corruption at most requires a reboot.

      Unless, as I said before, it corrupts your data. Or corrupts your code in such a way that the code deletes or corrupts your data.

      The possibilities are endless.

    • I don't know if I'd buy ECC RAM or not. Yet, I agree with you that there are so many points of failure on a computer that RAM is pretty low on my list.

      For me, I first worry about my storage as that's where my data lies. Everything else you can just buy. If the rest of my computer hardware starts to act funny (graphics, cpu, memory), that's generally after years of operating. At that time, it's often not worth the headache of even troubleshooting and I just upgrade as I need to upgrade anyways. Everyone's got

    • Yeah, and I have never seen my tiger-repellent rock not working!

    • Re: (Score:3, Interesting)

      by iggymanz ( 596061 )

      So you're ignorant of the real world, typical code monkey. Why don't you read the Google studies, such as one that found they were getting one single-bit error every 14 to 40 hours per gigabit of DRAM.

      https://www.intelligentmemory.... [intelligentmemory.com]
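Taking the quoted rate at face value as a constant average (an assumption: the figure is quoted secondhand via the link above, and real-world rates are unlikely to be uniform across machines), a back-of-the-envelope estimate for a hypothetical 16 GiB desktop, about 128 gigabits of DRAM, looks like this. The helper name is hypothetical.

```python
def errors_per_day(capacity_gbit: float, hours_per_error_per_gbit: float) -> float:
    """Expected single-bit errors per day, assuming a constant per-gigabit error rate."""
    errors_per_hour = capacity_gbit / hours_per_error_per_gbit
    return errors_per_hour * 24


if __name__ == "__main__":
    capacity = 16 * 8  # hypothetical 16 GiB of DRAM, expressed in gigabits
    for hours in (14, 40):  # the quoted range: one error per 14 to 40 hours per gigabit
        print(f"1 error/{hours} h/Gbit -> {errors_per_day(capacity, hours):.1f} errors/day")
```

At the quoted rate, even the optimistic end of the range works out to roughly 77 single-bit errors per day for 16 GiB.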

  • Are there AMD desktop-socket systems with IPMI and ECC?
    Intel does have them, and the CPU prices are in line with the desktop ones.

  • by Anonymous Coward

    Not using ECC in desktop is about gamers... the ECC checks take time, and ECC also has buffered outputs, which improve signal integrity at the expense of propagation delay.

    Gamers will not accept this slowdown, and even though Ryzen 5k supports ECC, nobody is buying it (but maybe that's just because, so far, Ryzen 5k is a paper launch).

  • Badly missed (Score:5, Insightful)

    by Archtech ( 159117 ) on Tuesday January 05, 2021 @09:06AM (#60898514)

    One of the worst parts of having to move from VAX(or Alpha)/VMS to Windows on PCs cobbled together from the cheapest parts was losing ECC RAM. For a few years it felt like driving with no seat belt.

    I wasted God knows how many hours, days, weeks troubleshooting weird PC problems that may well have been due to RAM corruption. Talk about false economy!

    It is true that you can get PCs and workstations with ECC RAM, but your choice of other features is severely squeezed. Not to mention having to pay through the nose.

    The loss was compounded by Windows' almost complete lack of any error reporting and analysis software. (Not that Linux is any better). On VMS every single hardware error could be logged, down to every bit in every hardware register. And comprehensive diagnostic software tested all the functions, so you could always tell if you had any kind of solid fault.

    I can't help feeling that the cheapness of PCs and Windows was traded off, not for anything else in the manufacturer's world, but for the customer's time.

    • In fact, I recall that some time in the early 1980s I got a call from a very brilliant colleague who had gone across the pond to work as a reliability consultant in VAX/VMS engineering.

      Someone had floated the apparently hare-brained idea that, by omitting ECC, they could make RAM work faster. The suggestion was that, even if you lost a few days a year recovering from failures, you would still come out ahead because you'd get more work done for the same cost.

      Perhaps significantly, I never heard anything furt

      • In fact, I recall that some time in the early 1980s I got a call from a very brilliant colleague who had gone across the pond to work as a reliability consultant in VAX/VMS engineering.

        Someone had floated the apparently hare-brained idea that, by omitting ECC, they could make RAM work faster. The suggestion was that, even if you lost a few days a year recovering from failures, you would still come out ahead because you'd get more work done for the same cost.

        Perhaps significantly, I never heard anything further about that scheme - and VAXen (and later Alphas) continued to have quite sophisticated ECC in various places.

        Another argument at DEC for omitting ECC was that ECC added to the complexity of the system, and therefore increased the chances of a failure, making the computer less reliable. The hardware people making this argument didn't seem to understand the difference (to the customer) between a failure that stopped the computer and a failure that silently corrupted data.

  • by DrMrLordX ( 559371 ) on Tuesday January 05, 2021 @09:39AM (#60898696)

    Torvalds makes a decent point about Rowhammer, but otherwise, ECC isn't "that big of a deal" for consumer systems. It's actually a hindrance for a lot of PCs since the performance of ECC RAM can be so poor compared to non-ECC SKUs. Let's take a look at some available ECC, non-registered DIMMs (which is what Torvalds would want everyone using, presumably):

    https://pcpartpicker.com/produ... [pcpartpicker.com]

    Fastest SKU available is a 16GB stick of Micron MTA18ADF2G72AZ-3G2E1 DDR4-3200. It has a CAS/CL of 22 and a first-word latency of 13.75ns (ouch).

    Now let's look at non-ECC RAM!

    https://pcpartpicker.com/produ... [pcpartpicker.com]

    (to be fair, I filtered for 16GB kits to make a more-direct comparison to the ECC Micron DIMM above)

    We have kits going all the way up to DDR4-4400, though in terms of first-word latency, this kit in particular stands out:

    https://pcpartpicker.com/produ... [pcpartpicker.com]

    DDR4-4266 and CAS/CL 17 with first-word latency of ~8ns? Yes please! And at lower price points, you can still find 16GB DIMMs that run circles around that Micron ECC DIMM in the DDR4-3200 -> DDR4-3600 range at much-more-reasonable prices. That Micron ECC DIMM isn't even available anymore, except maybe on eBay (I didn't look).
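The first-word latency figures above follow from the usual rule of thumb: DDR memory performs two transfers per clock, so one clock cycle lasts 2000 / (data rate in MT/s) nanoseconds, and CAS latency is counted in cycles. A minimal sketch (the helper name is hypothetical, and it ignores other timings such as tRCD):

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds for DDR memory at the given data rate (MT/s)."""
    clock_period_ns = 2000 / data_rate_mts  # two transfers per clock cycle
    return cas_latency * clock_period_ns


if __name__ == "__main__":
    print(first_word_latency_ns(22, 3200))            # 13.75 for the Micron ECC DIMM
    print(round(first_word_latency_ns(17, 4266), 2))  # ~7.97 for the DDR4-4266 CL17 kit
```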

    Maybe if ECC were the de facto standard for ALL PCs, then higher-performance ECC non-registered DIMMs might exist. As it stands, they don't. Which is interesting, since there is a segment of power-users (notably those that buy Threadripper or HEDT Intel CPUs) that could use high-performance ECC RAM in their workstations. If the market is large enough to accommodate those CPUs and motherboards, it should be large enough for ECC RAM to match. As it stands, HEDT buyers just use gamer memory kits more-often-than-not. Those that need ECC take a massive performance hit.

    • ...

      Maybe if ECC were the de facto standard for ALL PCs, then higher-performance ECC non-registered DIMMs might exist. As it stands, they don't. Which is interesting, since there is a segment of power-users (notably those that buy Threadripper or HEDT Intel CPUs) that could use high-performance ECC RAM in their workstations. If the market is large enough to accommodate those CPUs and motherboards, it should be large enough for ECC RAM to match. As it stands, HEDT buyers just use gamer memory kits more-often-than-not. Those that need ECC take a massive performance hit.

      I am building a new home PC and I want lots of RAM. The motherboard has eight DIMM slots, each slot able to handle a DIMM up to 32 GiB. When I was researching memory on the motherboard manufacturer's list, the largest ECC DIMMs I found were 16 GiB, but there was also a non-ECC 32 GiB DIMM for the same price. I went with the non-ECC memory.

      I don't understand why adding ECC costs the same as doubling the density of the RAM chips. There is room on the DIMM for a ninth chip.

      • I think the problem is that you were looking at ECC, unregistered/unbuffered DIMMs, which are sort of the odd man out in the ECC world. Here are all the 32GB DDR4 ECC registered DIMMs I could find on a moment's notice:

        https://pcpartpicker.com/produ... [pcpartpicker.com]

        Unregistered/unbuffered ECC?

        https://pcpartpicker.com/produ... [pcpartpicker.com]

        Just one, and it's over $600 for a 32GB DDR4-2133 DIMM. The market for this type of RAM is small enough that there isn't much incentive to manufacture it, making it scarce. Scarcity drives price.
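As a rough check on the "ninth chip" point, the raw hardware overhead of a typical ECC rank is small compared with a 2x density step. A minimal sketch, assuming x8 DRAM chips and the common layout of 64 data bits plus 8 check bits per rank; the function name is purely illustrative:

```python
# Sketch: raw bit/chip overhead of a standard 72-bit ECC DIMM rank.
# Assumptions: 64 data bits + 8 check bits, built from x8 (or x4) DRAM chips.

def ecc_overhead(data_bits: int = 64, check_bits: int = 8, chip_width: int = 8):
    """Return (data chips, total chips, fractional overhead) for one rank."""
    data_chips = data_bits // chip_width
    total_chips = (data_bits + check_bits) // chip_width
    overhead = check_bits / data_bits
    return data_chips, total_chips, overhead

data_chips, total_chips, overhead = ecc_overhead()
print(f"{data_chips} data chips -> {total_chips} chips per rank, "
      f"{overhead:.1%} extra DRAM")
# prints: 8 data chips -> 9 chips per rank, 12.5% extra DRAM
```

About 12.5% more DRAM per rank, which is consistent with the reply's view that the price gap for unbuffered ECC is driven by low volume rather than by the silicon itself.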

  • by gweihir ( 88907 ) on Tuesday January 05, 2021 @10:05AM (#60898840)

    On any desktop, that is. But I noticed that all the Rowhammer papers I looked at used laptops. Many laptops use a reduced RAM refresh rate because that saves power. It does make the RAM more vulnerable to Rowhammer, though.
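The laptop point comes down to timing: a victim row is only at risk until its next refresh, so a longer refresh interval gives an attacker more activations of a neighbouring row per window. A back-of-the-envelope sketch, assuming the usual 64 ms DDR refresh window and a row-cycle time (tRC) of about 50 ns; both values are illustrative assumptions, not measurements of any particular laptop:

```python
# Sketch: upper bound on activations of one aggressor row per refresh window.
# Assumptions: 64 ms is the common DDR refresh window at normal temperature;
# tRC ~= 50 ns stands in for a typical row-cycle time.

def max_activations(refresh_window_ms: float, t_rc_ns: float = 50.0) -> int:
    """Number of row activations that fit in one refresh window."""
    window_ns = refresh_window_ms * 1_000_000  # ms -> ns
    return int(window_ns // t_rc_ns)

for window in (64, 128):
    # 128 ms models a hypothetical "reduced refresh" power-saving setting
    print(f"{window} ms window: ~{max_activations(window):,} activations")
```

This ignores refresh-command overhead and other timing limits, so treat the numbers as a ceiling; the point is that doubling the refresh window doubles the hammering budget.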

  • The only reason memory manufacturers would be doing this is because their designs for speed speed speed are approaching the unreliable and they NEED an internal checksum so that the memory instability can be covered up to the point that it actually fucking works. They would never do this on purpose without marketing the shit out of it unless they are covering for something bad.
  • by ideechaniz ( 7599842 ) on Tuesday January 05, 2021 @10:49AM (#60899144)
    So far no one has mentioned that all DDR5 memory will have ECC support, so Intel won't be able to limit it to Xeons. https://www.overclock3d.net/ne... [overclock3d.net]
  • by klashn ( 1323433 ) on Tuesday January 05, 2021 @11:15AM (#60899276) Journal
    It's easy to see on ECC RAM modules that there's an extra memory chip or two, but that also brings extra pins and the routing of those pins to the memory controller. A typical ECC module these days has 8 extra bit lines for ECC, and the extra chip is used to store the ECC codes. Those extra 8 bit lines also need to be routed to every DRAM slot. It's not such a big deal in a system with a single dual-channel memory controller, but when you start getting into larger system architectures, you have to route those 8 traces from the DRAM modules/channels to their respective memory controllers. So it increases the cost of the memory module, increases the cost and complexity of the board design, increases the cost of the CPU, and also increases the need for software handlers to properly capture error conditions and recover. ECC codes do nothing on their own; the OS and specific software (or interrupt routines) actually have to act on the ECC byte and compute the parity bit. And if you're looking for performance on your home systems, would you give up 1-5% of your performance? (I guess that's akin to using a VPN these days.) I'm just saying, it's not as easy as just allowing ECC memory to be slotted into consumer motherboards.
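Why 8 extra lines: a single-error-correcting Hamming code over 64 data bits needs 7 check bits (2^7 = 128 >= 64 + 7 + 1), and one more overall parity bit adds double-error detection, for 72 bits total. The sketch below models such a SECDED code in Python for illustration only; real memory controllers compute this in hardware on every access and may use related codes (for example Hsiao codes). The function names are hypothetical.

```python
# Sketch: textbook SECDED Hamming code, 64 data bits + 8 check bits = 72 bits.
# Index 0 holds the overall parity bit; indices 1..71 form the Hamming code.

PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]  # non-powers of two
assert len(DATA_POSITIONS) == 64

def encode(data: int) -> list[int]:
    """Spread 64 data bits over a 72-bit codeword and fill in check bits."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Check bit p covers every position whose index has bit p set.
        code[p] = sum(code[q] for q in range(1, 72) if q & p and q != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code

def decode(code: list[int]) -> tuple[int, str]:
    """Return (data, status) where status is ok, corrected or uncorrectable."""
    code = list(code)  # work on a copy
    syndrome = 0
    for pos in range(1, 72):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. two flipped bits
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << i
    return data, status

word = 0xDEADBEEFCAFEF00D
stored = encode(word)
stored[37] ^= 1                       # simulate one flipped bit
assert decode(stored) == (word, "corrected")
stored[5] ^= 1                        # a second flip in the same word
print(decode(stored)[1])              # uncorrectable
```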
    • Re: (Score:3, Informative)

      ECC codes do nothing on their own, the OS and specific software (or Interrupt routines) actually has to act on the ECC byte and compute the parity bit, and if you're looking for performance on your home systems, would you give up 1-5% of your performance?

      This is utter BS. ECC is done in the memory controller. The underlying software need not even know it is there unless it wants to look at error stats/respond to error events.
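On Linux, one place where software can "look at error stats" is the EDAC subsystem, which exposes per-memory-controller counters such as ce_count (corrected errors) and ue_count (uncorrected errors) under /sys/devices/system/edac/mc/. A minimal sketch, assuming an EDAC driver is loaded for the platform; on systems without ECC or without a driver the directory is simply absent:

```python
# Sketch: read corrected/uncorrected ECC error counters from Linux EDAC sysfs.
# Assumes an EDAC driver is loaded; paths can vary by kernel and hardware.
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")

def read_ecc_counts(root: Path = EDAC_ROOT) -> dict[str, dict[str, int]]:
    """Return {controller: {"ce": n, "ue": m}} for each EDAC memory controller."""
    counts: dict[str, dict[str, int]] = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or no ECC-capable controller
    for mc in sorted(root.glob("mc[0-9]*")):
        ce, ue = mc / "ce_count", mc / "ue_count"
        if ce.is_file() and ue.is_file():
            counts[mc.name] = {
                "ce": int(ce.read_text().strip()),
                "ue": int(ue.read_text().strip()),
            }
    return counts

if __name__ == "__main__":
    for name, c in read_ecc_counts().items():
        print(f"{name}: {c['ce']} corrected, {c['ue']} uncorrected")
```

The correction itself has already happened in the controller by the time these counters move; software only observes and reacts.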

  • by Wycliffe ( 116160 ) on Tuesday January 05, 2021 @12:46PM (#60899796) Homepage

    Why do you need CPU support? The summary even mentions that manufacturers are starting to do it internally. This seems like the correct way to do it anyway. Why does the computer need to know whether it is ECC RAM or not? I realize they are different and not interchangeable, but why?

    • Internal ECC only protects the memory cells. That's the most vulnerable point, but it's still better if you can protect the bits while in transfer. That requires support from the memory controller, which is part of the CPU these days.

    • by dltaylor ( 7510 )

      It can NOT be done in the CPU, or memory controller, without the extra data lines for the ECC code bits to be set and read. My desktop Xeons have the additional data lines, and the DIMM sockets are populated with the corresponding RAM. My laptop i7s do not.

      If/when there were bits incorrectly stored, flipped while stored, or incorrectly read, the Xeons will detect this (up to a couple of simultaneous bits per line), and the operating system can generate an appropriate error response, such as no longer usin
