Linux Hardware

Bad DIMM on Linus Torvalds' Desktop System Moves Kernel Merges to His Laptop (theregister.com) 188

When a kernel developer asked Linus Torvalds if he'd missed a Git pull, Torvalds "revealed the request was still in his queue as 'I'm doing merges (very slowly) on my laptop, while waiting for new ECC memory DIMMs to arrive,'" reports The Register: Torvalds needs the DIMMs because over the last few days he experienced what he described as "some instability on my main desktop... with random memory corruption in user space resulting in my allmodconfig builds randomly failing with internal compiler errors etc."

The Linux boss's first thought was that a new kernel bug had caused the problem — which isn't good but sometimes happens. His instinct was wrong. "It was literally a DIMM going bad in my machine randomly after 2.5 years of it being perfectly stable," he wrote. "Go figure. Verified first by booting an old kernel, and then with memtest86+ overnight."

Torvalds appears to have been tracking delivery of the new DIMMs as he reported replacement memory was "out for delivery" and predicted it should arrive later on Sunday evening....

His post also mentions that his main PC was set up for error correction code memory (ECC memory), but it was built "during the early days of COVID when there wasn't any ECC memory available at any sane prices. And then I never got around to fixing it, until I had to detect errors the hard way."

"I absolutely *detest* the crazy industry politics and bad vendors that have made ECC memory so 'special'," he added.

  • Astonishing (Score:4, Interesting)

    by rcb1974 ( 654474 ) on Sunday October 16, 2022 @06:48AM (#62970801) Homepage
    I can't believe he didn't have any spare compatible ECC ram in a drawer, or an older computer with ECC ram in his closet that he could have used instead. I'm also amazed that he chose to get non-ECC ram just because it got expensive during the pandemic. He's the maintainer for the Linux kernel, for heaven's sake! Someone please gift the guy a backup laptop with ECC ram. I do wish that more smaller computers like laptops and micro desktop PCs supported ECC ram.
    • I have a bunch of old RAM sticks, I think some of them are even ECC, but they don't fit into my current computer. I don't know exactly what his workloads are, but I think compiling everything on the old Phenom rig would take longer than shipping a new memory stick.

    • I'm also surprised he doesn't have a high performance computer like the ones they have at Google and Microsoft that he could just remotely log into and use instead of his home desktop.
    • Re: Astonishing (Score:5, Insightful)

      by KermodeBear ( 738243 ) on Sunday October 16, 2022 @08:07AM (#62970919) Homepage

      I am more astonished that this is a story. Man has bad ram chip. Wow.

      • Re: (Score:2, Insightful)

        by jonathantn ( 6373084 )
        This is where /. has descended to.
        • I know that it doesn't really matter that much, but it doesn't get much nerdier.
        • Re: Astonishing (Score:5, Insightful)

          by timeOday ( 582209 ) on Sunday October 16, 2022 @09:01AM (#62971013)
          I disagree. I think the fact that Linus has stayed like this instead of becoming a bigwig is remarkable. It's remarkable that he has keyboard-level involvement with anything at all. I think the way he is accounts for why linux hasn't turned to shit like most things do.
          • Re: (Score:3, Informative)

            by Anonymous Coward

            That's because he's a real leader, not some greedy, selfish and irresponsible poser. Most people are egotistical and unethical; few people are mature enough to realize how important ethics really are. Most people are unethical and self-serving. What evil people do is produce crap, which is why we're drowning in shit.

            Linux, Linus and the like are the exceptions that prove the rule. Society is in decline and civilization is collapsing due to corruption. That's the meta now.

            • Re: (Score:2, Informative)

              Redundancy for critical projects is not unethical!

              Jfc

              • by rcb1974 ( 654474 )
                Very true. I'm rather curious to know what will happen once Linus becomes incapable of maintaining the kernel. Does he have a plan -- a sort of Linux "will" -- for that?
                • Re: Astonishing (Score:4, Informative)

                  by godrik ( 1287354 ) on Sunday October 16, 2022 @11:57AM (#62971327)

                  Yes, Linux has a plan of succession. The top maintainers for the kernel are set up to take over in case Linus can suddenly no longer perform the job.

                  It was tested a few years ago when Linus took off for a couple of months. I think "Greg" took over as final approver for the kernel.

          • Re: (Score:2, Insightful)

            He doesn't have to be a bigwig asshole to have a spare computer.

            Jfc, the world awaits while he orders a new DIMM?

            That's -super- unprofessional. It just is.

            • by godrik ( 1287354 )

              He gets the work done on his laptop. It's just slower while restoring his typical workstation. That seems reasonable to me.

              • by dfghjk ( 711126 )

                Reasonable and unprofessional are not mutually exclusive. It is only reasonable because there are no expectations; it is unprofessional without a doubt.

            • Re: Astonishing (Score:5, Insightful)

              by toutankh ( 1544253 ) on Sunday October 16, 2022 @12:14PM (#62971363)

              If it's "unprofessional" then the people who pay him to work on the kernel are allowed to complain to him about it, the rest of the world can stfu. It's the real world though so I would expect all kinds of entitled idiots to complain.

            • He doesn't have to be a bigwig asshole to have a spare computer.

              Apparently he does -- his laptop. While a spare computer that is similar to the primary might be preferable, having a less capable and, presumably, less expensive system is entirely reasonable -- especially if it's only for those rare cases in which the primary system is unavailable. Catastrophic hardware failures happen fairly infrequently.

        • But where will we mere nerds read about what Roger Federer eats for breakfast?

      • by Latent Heat ( 558884 ) on Sunday October 16, 2022 @08:57AM (#62970999)

        For want of an ECC RAM stick, the computer was lost,

        for want of a computer the build was lost;

        and for want of a build Linux was lost;

        being overtaken and replaced by Windows,

        all for want of care about an ECC RAM stick.

        -Benjamin Franklin

    • by gweihir ( 88907 )

      Linus is not a hardware-guy...

      • Linus is not a hardware-guy...

        Obviously.

      • by rcb1974 ( 654474 )
        But he must have hardware knowledge to some extend because otherwise how could he even write or maintain the kernel or any of its drivers? Also, building a computer is such a basic thing that someone with only software engineering skills should be able to do that, especially Torvalds. Just pick the CPU, pick a compatible motherboard with ECC ram support, lookup compatible RAM in the motherboard's PDF manual, pick a video card, a few M.2 sticks for software RAID1, then a power supply with enough watts to
        • by edis ( 266347 )

          You obviously meant extent.
          I hope he attempted reseating the DIMMs by swapping them into a different slot. First thing to do if it's only about developing poor contact.

    • Re:Astonishing (Score:4, Informative)

      by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Sunday October 16, 2022 @10:35AM (#62971159) Homepage Journal

      I can't believe he didn't have any spare compatible ECC ram in a drawer, or an older computer with ECC ram in his closet that he could have used instead.

      I don't think I've ever had more than two machines that took the same kind of RAM since around the Pentium era, that only because I changed machines so many times, and often I only had one. I had a lot of machines that would take EDO and original SDRAM, two that would take DDR, two that would take DDR2, and two that would take DDR3. And usually I only had two because I bought a slow CPU to start, then upgraded later and put the old CPU into a backup system.

      If you're not a hardware guy, and someone else is building your PCs, then you likely aren't doing any of this — and odds are very good over the last decade and change that if you have changed PCs, both CPU and memory sockets have changed between literally every upgrade. And your storage interfaces have probably changed at least once, too.

      • by imidan ( 559239 )
        Yeah, I used to have a couple of boxes of older hardware and had the ability to swap stuff around if something went bad. But in recent years, I have two desktops, and they're both quite stable, but they're of different generations, and one is AMD and the other is Intel, and I eventually got rid of all those old parts because all they were good for anymore was building junky, obsolete PCs. And I no longer find it to be as much fun to constantly tinker with hardware -- what I've got works fine. Today, if I ha
    • More than likely it was something like either the old computer was DDR4 and the new one was DDR5, or there were timing differences between the old and the new DIMMs. Maybe it's time for Linus to move Linux builds to the cloud??

      • All major cloud companies are in league with the Five Eyes, so if he did move builds to the cloud, shady intelligence agencies could inject malicious code into the chain.
    • I can't believe he didn't have any spare compatible ECC ram in a drawer, or an older computer with ECC ram in his closet that he could have used instead.

      As far as I can tell, systems that support ECC RAM are usually "server class" systems that are more expensive and less abundant than non-ECC systems (as per Linus' comment). Having an unused spare one might be a luxury. That being said ...

      I actually have a refurbished Dell PowerEdge T110 with 32GB of ECC RAM currently sitting unused under my desk. :-) I switched away from it when a friend upgraded his desktop and gave me his old ASRock Z77 Extreme3 motherboard with an Intel i7-3770 and I built a system ar

  • it should arrive later on Sunday evening

    And in what godless far-off land are goods delivered on a Sunday evening?

    • it should arrive later on Sunday evening

      And in what godless far-off land are goods delivered on a Sunday evening?

      The land of Amazon

    • And in what godless far-off land are goods delivered on a Sunday evening?

      I'm in Silicon Valley and we get packages on Sunday all the time.

      For starters, Amazon cut a deal with the USPS to fund them to do parcel post deliveries on Sunday if they'd deliver Amazon packages. And I'm pretty sure that's national.

      Lots of Amazon contractors also deliver on Sunday. So if you're getting something where the fulfillment is via Amazon's operation, even if the seller is independent, Sunday delivery is not anything speci

    • it should arrive later on Sunday evening

      And in what godless far-off land are goods delivered on a Sunday evening?

      Probably a lot of places that aren't predominantly Christian. Or is that what you meant by "godless"?

  • I told him so (Score:5, Interesting)

    by BatteryKing42 ( 6922296 ) on Sunday October 16, 2022 @07:00AM (#62970815)
    If you do some digging, you will find somebody (me) told him so, but he just got angry and said what he was doing was fine; he didn't need that special memory. To clarify a bit, Linus Torvalds had Linus Sebastian on Linus Tech Tips build him a Threadripper (not Pro) system. At the time Linus Torvalds could have used a more workstation oriented EPYC board and used ECC registered memory to get around the issue of poor availability of ECC UDIMMs. I had been trying to prod AMD into making a workstation version of EPYC before this, but not sure if that contributed to them making the Threadripper Pro or not.
    • Re: (Score:2, Insightful)

      by Anonymous Coward

      If you do some digging, you will find somebody (me) told him so, but he just got angry and said what he was doing was fine; he didn't need that special memory.

      Or a dedicated machine to do the kernel merges in. He's got a strong case of NIH (see git) and at the end of the day is still an amateur ("works for me!"). Despite having been at it for quite a few years now.

      Frankly, I think he needs a slower, not a faster, machine to do the merges with. That might give him an incentive to speed up the process for everyone rather than just papering over the problem by throwing more hardware at it.

      • Re: (Score:2, Flamebait)

        He's got a strong case of NIH (see git) and at the end of the day is still an amateur ("works for me!").

        No wonder you're posting as an Anonymous Coward. You're a fucking idiot.

        git was created because Linux was barred from using BitKeeper. He didn't create git for no reason other than because he wanted to. At the time, there was only CVS, SVN (their centralized model was not suitable for the Linux development model at all), and DVCSes like Mercurial were not that great. They still aren't, because most people who were using pre-git DVCSes have switched to git.

        git was an opportunity to create something tha

    • Re:I told him so (Score:5, Interesting)

      by evanh ( 627108 ) on Sunday October 16, 2022 @08:50AM (#62970985)

      Ummm, Linus's point is ECC UDIMMs shouldn't be special. They should be in every laptop and desktop system these days. Even the super cheap deals.

      He certainly has a point. Why isn't that the case?

      • by evanh ( 627108 )

        Hell, even cellphones should be using ECC by now.

      • by Luckyo ( 1726890 )

        Universally added cost for everyone for utility that is irrelevant for almost everyone.

        There are plenty of decisions like this in everything. Why don't you have a turbo on every car? Because it's a universally added cost for everyone for utility that is irrelevant for almost everyone.

        • by evanh ( 627108 )

          That's just it. Mass produced is mass produced. They are the same cost, the premium is in the marketing.

          As for utility, that's the other part of the whole point isn't it. The larger number of RAM cells the greater the chances of defects to exist. Each cell is a potential failure. Simple probability.
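The "simple probability" argument above can be made concrete. This is a minimal sketch, assuming every cell fails independently with the same probability; the per-cell probability below is a made-up illustrative value, not a measured failure rate, and the function name is hypothetical.

```python
import math


def p_any_fault(n_cells: int, p_cell: float) -> float:
    """Probability that at least one of n_cells independent cells is faulty.

    Computes 1 - (1 - p_cell) ** n_cells, using log1p/expm1 so the result
    stays accurate when p_cell is tiny and n_cells is huge.
    """
    return -math.expm1(n_cells * math.log1p(-p_cell))


if __name__ == "__main__":
    P_CELL = 1e-13  # hypothetical per-cell fault probability (assumption)
    for gib in (4, 16, 64):
        cells = gib * 2**30 * 8  # one cell per bit
        print(f"{gib:3d} GiB: P(at least one faulty cell) = {p_any_fault(cells, P_CELL):.4f}")
```

Whatever per-cell value is plugged in, the result only grows as the cell count grows, which is the point being made; how fast it grows in practice depends on real failure rates that this sketch does not model.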

          • by Luckyo ( 1726890 )

            No, they are not. ECC modules require more memory for the same memory capacity, as well as more complex boards and more support components for the same capacity. ECC as a feature is not magically conjured out of thin air. It requires additional costs to manufacture. And when mass manufactured for everyone, it means equally massive additional costs for everyone.

            The winners would be people who need ECC memory today, because they'd be getting their memory significantly cheaper. Losers are everyone else, who's

            • by evanh ( 627108 )

              Yeah, yeah, the usual bullshit excuses when marketers don't want to admit they're creaming it. Mass production wipes those away easy. Cost per unit is always economies of scale.

              • There is no great conspiracy here, just the markets at work. If the PC manufacturers demanded ECC ram en masse, the prices would come down, but they don't, so ECC ram is a low-volume, higher-cost product. It makes no sense for the RAM manufacturing cabal to get together to force ECC ram on the market.
                • by evanh ( 627108 )

                  That's exactly what marketing does though. It has no other purpose than to influence.

            • Re:I told him so (Score:4, Interesting)

              by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Sunday October 16, 2022 @10:30AM (#62971151) Homepage Journal

              The manufacturing cost difference between non-ECC and ECC RAM is real, but minuscule compared to all of the other costs like packaging, shipping, QA, etc.

              As memory cells have shrunk, and memory sizes have increased (including die area), the chance of a cosmic ray causing a bit to flip has also increased to the point that regular users can expect to experience bit flips [hiveeyes.org].

              So no, absolutely not. Even on a cellphone, users are likely to experience bit flips. All users would benefit from ECC RAM, and the cost differential would be minimal if all RAM were ECC RAM, and we weren't paying a premium for it because it's less common.

              • but minuscule compared to all of the other costs like packaging, shipping, QA, etc.

                It's 12-20% difference between raw hardware. Packaging, shipping and QA benefit from economies of scale, buying extra hardware does not. Yes the difference is not what the current markup would suggest, but it is not minuscule either.

                But honestly that's partially beside the point. ...

                but minuscule compared to all of the other costs like packaging, shipping, QA, etc.

                As software ends up using a shitton of RAM for insignificant things the overwhelming majority of bitflips will have zero impact on the user. Those which do are likely to cause instability, or a system crash. The odds of a bit fl

                • but minuscule compared to all of the other costs like packaging, shipping, QA, etc.

                  It's 12-20% difference between raw hardware. Packaging, shipping and QA benefit from economies of scale, buying extra hardware does not.

                  So what percentage of the retail price is the hardware cost? Even if it's 50%, you would have at most a 10% price increase on something that's typically around $100. Maybe that doesn't quite meet your definition of "minuscule", but it's not a particularly significant increase.

                • 3200C22 is the JEDEC spec for 3200. 3200C16 is an XMP profile, and thus an overclock as far as any CPU and mobo warranty cares. Nobody who does real work with their computer is going to overclock it and void the warranty.
              • Re:I told him so (Score:4, Informative)

                by AmiMoJo ( 196126 ) on Monday October 17, 2022 @04:01AM (#62973005) Homepage Journal

                DDR5 makes ECC mandatory. All DDR5 memory modules must have ECC.

                Many of them don't expose the ECC to the host computer, but they do have the extra memory and hardware to correct single bit errors silently on-board. The cost to add the computer interface should be minimal now.

                ECC was deemed necessary, even for consumer stuff, because memory density is getting so high, and operating voltages are very low so less energy is needed to flip a bit.

      • He certainly has a point. Why isn't that the case?

        Extra cost (even if they didn't consider this a premium feature there's extra hardware involved).
        Lower performance (ECC RAM is literally a lower standard speed both in terms of common choice on the market as well as maximum possible performant choice).

        • He certainly has a point. Why isn't that the case?

          Extra cost ...
          Lower performance ...

          IMHO even if they don't supply it, the manufacturers of laptops and motherboards should run the couple extra signals needed to make it possible to install it as an option.

          At current speeds and feature sizes, errors are not all that uncommon even in good RAM and processors. So it's really time to have error-correction available and use it a lot. Mean time to failure or hidden undetected error measured in single-digit days

        • by evanh ( 627108 )

          It's RDIMMs that are slower than UDIMMs, not ECC. ECC doesn't affect the DRAM latency like pin registration does.

          And extra cost definitely is not the reason, there are so many examples of dicky extras that everyone ends up buying because they're just part of the package. If cost was the reason the packages would be stripped down far more than they are.

          Lack of demand can be the only reason in market terms. And that's arguably down to marketing.

      • DDR5 has on-die ECC as a base feature now.
    • by Kokuyo ( 549451 )

      The way Linus Sebastian goes on about the awesomeness of ECC RAM I wonder what led to this build being non-ECC.

      Correct me if I'm wrong but just about any AMD chip supports it, no?

  • by bradley13 ( 1118935 ) on Sunday October 16, 2022 @07:19AM (#62970849) Homepage

    Except for the absolute cheapest devices, ECC really ought to be standard.

    Really, no one knows how many errors are caused by random bit flips in memory, but they do occur.

    Just as an example, I once saw Excel total up a column of numbers incorrectly. The total wasn't wildly off, but I have a pretty good mathematical intuition, and the result bothered me. Had I not noticed, the bad result would have been used. A quick refresh, and the error disappeared.

    • by gweihir ( 88907 )

      Funny thing: I just bought components for a Ryzen 7900x system. The system memory bus is not ECC, but the RAM modules are "ECC internally". Not quite the same thing and I have to find out what this can actually do and whether I can get some ECC stats or alerts from the DIMMs, but it seems some RAM vendors are now bypassing the stupidity of the mainboard manufacturers. Why the mainboard people insist on not putting in those 4 extra signal lines is beyond me.

      • Why didn't you get a motherboard with support for ECC? Even my very cheap Zen+ system (Asrock B450M/Ryzen 3 1600AF) supports ECC RAM. Granted, there is only one 8GB module on their officially supported list, but at least there's one.

    • by thegarbz ( 1787294 ) on Sunday October 16, 2022 @08:56AM (#62970993)

      Except for the absolute cheapest devices, ECC really ought to be standard.

      No. ECC really ought to be an optional choice for end users. Not a standard. While ECC has an upside, it also has a downside, both in terms of (at a minimum) a 12% increase in cost, as well as a very real performance hit.

      Just as an example, I once saw Excel total up a column of numbers incorrectly. The total wasn't wildly off, but I have a pretty good mathematical intuition, and the result bothered me. Had I not noticed, the bad result would have been used. A quick refresh, and the error disappeared.

      Your computer has many places where an incorrect result could have occurred. You're quick to blame non-ECC memory but you've got little chance of proving it. Sure, maybe, but maybe not. Additionally the fact that it resulted in an actual important calculation having a visibly incorrect result is extraordinarily unlucky. In all the GB of RAM it happened to be one calculation visible to you the user. In the overwhelming majority of cases a RAM problem causes instability in some process that results in that process failing. Even on systems with very shitty and obviously problematic RAM the chances of committing an error somewhere important are minimal as calculations do not typically happen in a vacuum but are rather part of a larger process that constantly goes in and out of memory, and the amount of memory in use in your system that is relevant to user data can literally be measured in the low parts per million.

      TL;DR: Unless you're doing financial work or scientific modelling the overwhelming majority of computer users are not materially affected by random RAM errors beyond the incredibly occasional system crash. It's not worth paying extra to resolve and in many cases not worth the performance hit.
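The roughly 12% figure quoted above lines up with the extra storage that a single-error-correct, double-error-detect (SECDED) code needs for a 64-bit word. The following is a sketch of that arithmetic, assuming an extended Hamming code; actual memory controllers may use other codes, and the function name is made up for this example.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by an extended Hamming (SECDED) code.

    A plain Hamming code needs the smallest r with 2**r >= data_bits + r + 1
    so that every single-bit error gets a distinct non-zero syndrome.
    One more overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    for m in (8, 16, 32, 64, 128):
        c = secded_check_bits(m)
        print(f"{m:3d} data bits -> {c} check bits ({c / m:.1%} overhead)")
```

For 64 data bits this gives 8 check bits, i.e. 12.5% extra storage, which is why ECC DIMMs of the DDR3/DDR4 generation are 72 bits wide instead of 64. This covers only the raw memory overhead, not board, controller, or market pricing effects.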

      • A real performance hit ? I do not know how x86/AMD64 handles this, but hamming code EDAC can be implemented directly in hardware with a quite simple encoder/decoder. I would be very surprised if this was not hardwired in the CPUs, with no practical overhead beside the memory space for the required check bits. A quick search [pugetsystems.com] on the internet seems to confirm this.
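To illustrate how simple such an encoder/decoder can be, here is a toy Hamming(7,4) sketch in software. It only demonstrates the technique; real memory controllers use wider codes implemented in hardware, and the function names are invented for this example.

```python
from itertools import product


def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if clean.

    The three syndrome bits, read as a binary number, give the 1-based
    position of a single flipped bit, which is then flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    # Flip every bit of every codeword once and check it is repaired.
    for data in product((0, 1), repeat=4):
        word = hamming74_encode(data)
        for i in range(7):
            damaged = word.copy()
            damaged[i] ^= 1
            fixed, pos = hamming74_decode(damaged)
            assert fixed == list(data) and pos == i + 1
    print("all single-bit errors corrected")
```

The decode step is a handful of XORs plus one conditional bit flip, which is why this kind of logic maps naturally onto hardware. Whether it costs measurable latency on a given memory controller is a separate question that this sketch does not answer.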
        • by dfghjk ( 711126 )

          You sound really informed on this topic. If only memory controller designers would consider doing a "quick search" on the internet like you did.

        • A real performance hit ? I do not know how x86/AMD64 handles this, but hamming code EDAC can be implemented directly in hardware with a quite simple encoder/decoder.

          Indeed it can, and yet a single clock cycle has an impact on RAM performance. Your search results show an article from over 8 years ago, running DDR3 (two generations behind), at snail speed. Yes I'm sure it wasn't relevant back then, despite the crap methodology***.

          But you don't need to look at any fancy studies, you can just settle for checking what's available on the market. Ignoring price, the fastest ECC memory I found from my local suppliers (plural), and Newegg (just to make sure my local market wasn

        • by willy_me ( 212994 ) on Sunday October 16, 2022 @03:14PM (#62971843)

          The performance hit from ECC is probably not actually from ECC - it will be from the fact that the memory is registered.

          ECC is typically used in registered memory so you can safely fit more DIMMs into that server. But it comes at a price - an extra clock cycle of latency. Now you could use ECC on a regular DIMM but manufacturers do not bother because it would not work with Intel desktop CPUs - the market for regular DIMMs. So ECC and registered DIMMs end up being packaged together thereby resulting in ECC memory being a bit slower.

          Fortunately the DDR5 spec requires use of ECC memory so this has become a big non-issue. I do not know how the memory modules handle failures but if anyone wants to elaborate, I would appreciate it. However the fact that DDR5 can use ECC without any sort of speed penalty shows that ECC does not really impart any performance hit.

      • TL;DR: Unless you're doing financial work or scientific modelling the overwhelming majority of computer users are not materially affected by random RAM errors beyond the incredibly occasional system crash. It's not worth paying extra to resolve and in many cases not worth the performance hit.

        Your post SO much reminds me of Intel's initial reaction to the Pentium floating point bug...

      • by serafean ( 4896143 ) on Sunday October 16, 2022 @05:30PM (#62972133)

        ECC should be default, with the ability to disable. Unless you're of the view that Spectre/Meltdown mitigations should also be opt-in...

        Random bitflips might create random security issues (ie one in a gazillion ssh bruteforce attempts succeeds because of RAM corruption), and the more pressing issue is one of the user's calculations being one bit off.
        I can easily imagine a bitflip creating a shadow on an image from (for instance) an electron microscope, which then gets misinterpreted and wastes a few days of work. 90% of usage doesn't require 110% performance.

        I don't have the source at hand, but from memory, the ballpark chance is : on a 4GB system, on average you'll get one bitflip every 3 days. That's a lot of potential garbage.

        Also, Rowhammer...
        (which apparently can today defeat ECC, interesting)

    • by stwrtpj ( 518864 )

      Just as an example, I once saw Excel total up a column of numbers incorrectly. The total wasn't wildly off, but I have a pretty good mathematical intuition, and the result bothered me. Had I not noticed, the bad result would have been used. A quick refresh, and the error disappeared.

      Uh, no, that was just a bug in Excel.

  • Is buying memory from the chip manufacturers' brands (e.g. Samsung, Micron et al.). With an integrator you are rolling the dice. They are known for not even having the proper test equipment, packaging bottom of the bin chips to reduce costs and a history of passing memory they know to have bit errors.

    • by splutty ( 43475 )

      That still doesn't solve any of the other issues. Like cosmic radiation. (Don't laugh, but that IS an actual issue that has had real world impact).

      ECC is pretty much the only way you can solve that, and even ECC isn't 100% fool proof in that particular case.

      If you want to reduce the chance of bits flipping as much as possible, ECC is mandatory.

      • by gweihir ( 88907 )

        That still doesn't solve any of the other issues. Like cosmic radiation. ... ECC is pretty much the only way you can solve that

        Not really. That one can also hit somewhere else. The only way to deal with that reliably is software prepared for it. Very, very rare though and generally not a concern.

        • Very, very rare though and generally not a concern.

          No, it is much, much more common than most people think [hiveeyes.org], and the only reason it doesn't affect more people more of the time is that there's lots of error correction in the world.

          Every CPU I have looked at in ages has ECC cache for a reason, and that reason is cosmic radiation. If all they wanted to do was detect failures they could just do continual testing during idle cycles.

      • by dfghjk ( 711126 )

        "If you want to reduce the chance of bits flipping as much as possible, ECC is mandatory."

        Or use an abacus. No ECC required for that. If you're going to cite "cosmic radiation" we're going to laugh at you whether you complain or not.

    • by gweihir ( 88907 )

      I have had consistently excellent experiences with Kingston. I have had bad experiences with Infineon original modules with Infineon RAM chips (15 years back though). Would not recommend the other integrators though.

    • Is buying memory from the chip manufacturers' brands (e.g. Samsung, Micron et al.). With an integrator you are rolling the dice.

      Horseshit. There's zero evidence that primary chip manufacturers have any additional quality control on an assembled end user product over an integrator. Heck quite often the integrator has better quality control on account of producing components designated for higher performance workloads / components more likely to end up in systems where they will be overclocked, i.e. what's better than parts being binned by the vendor? Parts being binned twice.

  • by ClueHammer ( 6261830 ) on Sunday October 16, 2022 @08:26AM (#62970945)
    It's bad for the chips!
  • I guess he hasn't worked at any decent scale with computer hardware. Yes, memory goes bad all the time. All ECC does is buy you some time before you need to replace the memory.
    • All ECC does is buy you some time before you need to replace the memory.

      False. It also turns data corruption and unrequested immediate reboots into warnings about failing RAM that you can cross-order and replace with minimal (or in some cases, zero) downtime.

  • by tlhIngan ( 30335 ) <{slashdot} {at} {worf.net}> on Sunday October 16, 2022 @10:17AM (#62971119)

    Didn't the Linux kernel in the past support the "badram=" parameter which lets you exclude certain ranges of memory in case you had a broken memory module?

    There was an article on /. about it a LONG ways back, but if you had a bad memory module you could exclude the bad range and continue to operate your Linux machine.

    http://rick.vanrein.org/linux/... [vanrein.org]

    Seems like it was around the kernel 2.2 era where it was supported...

    • by Khyber ( 864651 )

      Exactly this! The man himself doesn't even remember about this feature that would've solved his problem? It's saved my ass twice in the past.

      • The man himself doesn't even remember about this feature that would've solved his problem?

        How do you know it would have solved his problem? Did you talk to him and ask him to see the Memtest86 results? There are problems you can have not linked to the range of the memory.

        Also why fuck around when you have a spare computer?

        • by Khyber ( 864651 )

          "How do you know it would have solved his problem?"

          Because barring a bad clock controller on the stick, you can segment off the bad section of RAM after identifying it in memtest86+ using said parameter and avoid using it for anything while still retaining the remaining RAM capacity.

          "Also why fuck around when you have a spare computer?"

          Real hackers don't ask this question. Fucking around is exactly what we do best.

    • by godrik ( 1287354 )

      it depends why the ram goes bad. sometimes it just all goes to shit and it's not really salvageable in those cases.

      Also, you can never know whether the blocks of badRAM you identified with memtest are the only bad blocks. And you may not want to risk screwing up your work.

    • by Dwedit ( 232252 )

      Windows has support for a badram-like feature in the BCD configuration. EXCEPT Microsoft managed to completely fuck this feature up. Every time Windows 10 upgrades to a major version through Windows Update, it discards that setting.

  • Why isn't Linus developing on the cloud?

  • Comment removed based on user account deletion
    • Doesn't he live in California now?

    • I'm in Canada and have an Amazon order out for delivery right now. FedEx also does some weekend deliveries in select Canadian markets (not mine), but you pay extra for it on top of the already pricey next day service.
  • Probably needed re-seating; remove/reinstall... That low-level contact resistance will get you every time!
