Patch To Allow Linux To Use Defective DIMMs
BtG writes: "BadRAM is a patch to Linux 2.2 which allows it to make use of faulty memory by marking the bad pages as unallocatable at boot time. If there were a source of cheap faulty DIMMs, this would make building Linux boxes with buckets of memory significantly cheaper; it also demonstrates another advantage of having the source code to one's operating system." The BadRAM page has a great explanation of the project's motivation and status. Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
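For the curious: the patch works off a boot-time kernel option listing address/mask patterns for the bad pages. A lilo.conf entry might look roughly like the following; the image name and the address/mask pair here are invented for illustration, and the real values come from a memory-test run (see the BadRAM page for the exact syntax):

    image=/boot/vmlinuz-badram
        label=badram
        # hypothetical pattern -- substitute the values your memory test reports
        append="badram=0x00af5000,0xfffff000"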
Re:Signal 11 no more? (Score:1)
Cool! Put it in the installer! (Score:2)
Now if we can just get some kernel drivers that can bypass other bad hardware... umm... uh... ok... so I don't have any examples, but dammit! Don't you love that Snickers commercial with the guy wanting to go to lunch with his poster of the panda bear?! Pretty pretty panda! Pretty pretty panda!
I INVENTED PANTS!
Re:Bad Ram (Score:1)
Also have to come up with a good euphemistic buzzword for this memory so that it can be sold. "Near compliant" was a good one I heard a while back.
Re:Is this good for Linux's rep? (Score:1)
Re:Bad Ram (Score:1)
Re:A better solution... (Score:1)
Uses of 512MB of RAM (Score:1)
(sorry, it had to be said :D)
Re:If Linux works with crap, that's all IT will gi (Score:1)
One of the major advantages Linux has over M$ products, and even some flavours of UNIX, is its ability to work on spartan hardware.
It makes sense to use cheaper stuff. However, if you plan to use defective DIMMs on mission-critical machines, you probably have some defective ones in your head.
It's up to you (just like everything else with Linux). If you want it, use it.
If you don't, good for you.
This is very good news for people in places like India (where I come from), where the cost of 32MB of EDO RAM is about one-third of the average person's salary.
Hackito Ergo Sum.
Liberte, Egalite, Fraternite, Caffeinate.
Re:Now there's a point to the BIOS memory test? (Score:2)
Write a pattern to RAM.
Read it back and compare it to what it should be.
Write an aliasing pattern to RAM.
Read it back and make sure you got what you expected.
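A minimal sketch of such a test in C, assuming base points at the raw region under test (which in practice means running from the BIOS, a boot disk, or a standalone tester, not from a normal user process):

    /* Walking-pattern memory test sketch. */
    #include <stddef.h>

    int test_region(volatile unsigned long *base, size_t nwords)
    {
        size_t i;

        /* Pass 1: fixed pattern, then its complement. */
        for (i = 0; i < nwords; i++) base[i] = 0xAAAAAAAAUL;
        for (i = 0; i < nwords; i++) if (base[i] != 0xAAAAAAAAUL) return 0;
        for (i = 0; i < nwords; i++) base[i] = 0x55555555UL;
        for (i = 0; i < nwords; i++) if (base[i] != 0x55555555UL) return 0;

        /* Pass 2: address-in-address, which catches aliased address lines
         * (two different addresses selecting the same physical cell). */
        for (i = 0; i < nwords; i++) base[i] = (unsigned long)i;
        for (i = 0; i < nwords; i++) if (base[i] != (unsigned long)i) return 0;

        return 1;  /* region passed */
    }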
This will catch quite a few serious memory problems. The two cases I saw recently were:
1. PC100 memory that wasn't quite up to par (dropping bits randomly).
2. A friend of mine put a PC100 DIMM in a mobo set for PC133 DIMMs. The PC100 RAM worked... almost.
In both cases the results were random lockups and application crashes. Turning on the BIOS RAM test quickly identified the problem, which was resolved by putting quality memory in the box.
These tests are only really useful the first time you boot your box or if you suspect bad RAM. (It's a quick way to test for serious memory problems without having to pull out a RamChecker.)
Why bad ram may not be sold (Score:2)
Note: this has happened to me. I bought two hard-to-find keyboards, only to find upon arrival that both had water damage (packaging still good), and then the retailer disappeared.
I'm not so sure about this (Score:2)
Re:Signal 11 no more? (Score:2)
CmdrTaco, that really does not become you.
Bad Memory doesn't go to waste (Score:3)
There isn't this huge supply of bad memory out there (Radio Shack jokes aside) because memory manufacturers are pretty clever. Bad memory is put into things like:
Audio storage devices, like answering machines and mp3 players, where a bit or two of failure will just end up as a teeny bit more noise.
Cheap digital cameras (once again, a bad pixel here or there....)
Toys. They actually call bad memory "toy memory" sometimes.
SIMMs. You take (for example) four bad chips and one good chip and get the equivalent of four good chips (by replacing the bad I/Os on the bad chips with I/Os on the good chip). There are jillions of ways to do this, and companies have pretty much done them all.
Sell them at CompUSA to people who don't know any better. (Sorry, couldn't resist)
If I were you, I'd download memtest86 [sgi.com] right now.
Re:Finally! (Score:2)
Now what about something to make me burn less coasters?
There's new error-prevention technology available to keep you from burning coasters, though I believe it relies on both hardware and software.
#1) Sanyo's BURN-Proof technology (available on the newest Creative, QPS, Plextor, LaCie etc. writers)
#2) Ricoh's JustLink technology (available on its CD-R/RW/DVD-ROM combination drive among others)
Both technologies automatically prevent buffer under-run errors, which are the leading cause of coasters.
If I were in the market for a new burner, I'd go with the $349 Ricoh combination drive. [ricohdms.com] It does 12x CD-R, 10x CD-RW, 32x CD-ROM, and 8x DVD-ROM all in one device. That's smart.
Re:Bad Ram (Score:2)
I've also seen machines that had bad ram lock up randomly, even when the bad pages are never touched.
Let me just say that I have my doubts.
Re:Why bother? (Score:2)
"It's worth doing because it keeps a working system up, and Linux should have that"
Huh? Isn't ECC functionality handled in the BIOS, not the OS? So... Linux does have that functionality, eh?
Re:Oh, sure, Linux users are this desperate (Score:2)
Sounds to me like you're describing what the true definition of "hacking" is. Let's see, if you can get a certain amount of RAM by doing a little hacking for less than you'd pay in a store, what's wrong with that? People do this in their everyday lives. As I type this, I have a penny in my car, wedged between the stereo head unit and the side where it mounts to hold the thing in place. No, it doesn't look pretty. Yes, it did the job (and the price was right).
Perhaps in corporate, "everything must look nice and neat" environments, this isn't a valid solution for adding RAM. But for the CS student who has an old DIMM sitting around, it's pretty damn cool.
Linux is not the be all and end all (Score:2)
IANAESE, but Linux will never be used in a life-critical medical device, never mind implantable medical devices. First, the FDA requirements are simply too strict to allow Linux's usage. Second, it's both overkill and underkill at once: Linux may be relatively efficient compared to systems like Windows, but it's not anywhere near small enough for traditional embedded systems. Third, Linux simply does more than it would ever need to; why use it? Fourth, it's not set up for DSP-type operations. Fifth, do you really want to unnecessarily trust your life to Linux just so you can make a statement?
Re:Oh, sure, Linux users are this desperate (Score:2)
It's like saying, "This new Mercedes E320 with a dent in the door is as good as the one without." Both run, both are equally safe (assuming it's just a superficial dent), but it just doesn't sell itself.
BTW, a bit of ancient, related trivia (Score:2)
By the way, here's some ancient related trivia. The INTV Productions video game cartridge "Triple Challenge" integrated the previously-released Chess, Checkers and Backgammon on a single game cartridge. In its original form, the Chess cartridge came equipped with a 1K SRAM onboard, as the game required extra memory.
At the time INTV went to produce the Triple Challenge carts, they discovered that since RAM had grown in capacity over the years, 1K SRAMs weren't available in quantity for reasonable prices, and larger SRAMs were too expensive as well. They almost had to cancel the Triple Challenge cart.
That is, until they found someone with a stack of 2K SRAMs, in which half the RAM was good, the other half was bad. Since the game only needed 1K, it ignored the bad half, and off they went.
Cool, eh?
--Joe--
Program Intellivision! [schells.com]
Re:Why bother? (Score:2)
On x86 systems, the memory controller handles the ECC error correction, and you get an interrupt which allows you to log the event. Often this interrupt is handled by the BIOS. But the BIOS typically doesn't do anything but log the event. The OS can do more; it can map the bad block out, probably without a shutdown.
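A sketch of what that OS-side handling could look like; every name below is hypothetical (a real kernel would hang this off its machine-check or NMI path), so treat it as pseudocode with C syntax:

    #define PAGE_SHIFT 12

    extern int  page_is_free(unsigned long pfn);          /* hypothetical */
    extern void remove_from_free_list(unsigned long pfn); /* hypothetical */
    extern void mark_page_reserved(unsigned long pfn);    /* hypothetical */
    extern void log_event(const char *fmt, ...);          /* hypothetical */

    void ecc_event_handler(unsigned long phys_addr, int uncorrectable)
    {
        unsigned long pfn = phys_addr >> PAGE_SHIFT;

        log_event("ECC %s error at 0x%lx\n",
                  uncorrectable ? "uncorrectable" : "corrected", phys_addr);

        /* A free page can be pulled from the allocator immediately; an
         * in-use page would first have to be migrated (or, for an
         * uncorrectable error on dirty data, declared lost). */
        if (page_is_free(pfn)) {
            remove_from_free_list(pfn);
            mark_page_reserved(pfn);  /* mapped out until reboot */
        }
    }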
Just how useful is this, really? (Score:5)
You'll probably get better results simply by cleaning off the contacts with a pencil eraser (remembering to brush away all the eraser dust first) and firmly re-inserting them into the socket.
Re:Hello this was on Kernel Traffic a long time ag (Score:2)
How to find bad ram cheap (Score:2)
Oops, now you can't
Does Slashdot readership know nothing of hardware? (Score:5)
Alright, so we've accepted that some dies are necessarily going to be damaged. Why not make the hardware such that it can resist imperfections? Well, actually, we do. RAM, being as simple and homogeneous as it is, lends itself well to this approach. Here's the idea: you add extra "blocks" of memory to a decode line. Then, if one of the "regular" blocks is destroyed by a process imperfection, the post-fab die can be modified with a laser to reroute data to the extra backup block. You invest some die area in backup structures so that a die with only a few errors can be "corrected" and will still function as intended. This is basically like keeping a spare tire: if you get one blowout, you're still in business, but two and you're in trouble. Of course, you can package as many extras as necessary, but it may not make economic sense. Here you calculate the appropriate trade-off between die size and yield to make the decision.
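As a back-of-the-envelope illustration of that trade-off, here is a toy Poisson defect model in C. The defect density and die area are invented numbers, and the model optimistically assumes any single defect lands somewhere a spare block can repair:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double D = 0.5;          /* defects per cm^2 (illustrative) */
        double A = 1.0;          /* die area in cm^2 (illustrative) */
        double lambda = D * A;   /* expected defects per die        */

        double y_plain = exp(-lambda);                 /* 0 defects         */
        double y_spare = exp(-lambda) * (1 + lambda);  /* 0 or 1 (repaired) */

        printf("yield without spares: %.3f\n", y_plain);
        printf("yield with one spare: %.3f\n", y_spare);
        return 0;
    }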
Anyway, long story short: your DRAM is already "bad". Quite a few RAM chips contain process errors that are rerouted around in hardware so that you, the consumer, need never know. To you, the process is transparent. All you should care about is that you get your *functional* RAM cheaper, because the manufacturer would have had to scrap that die otherwise.
This post discusses software "rerouting" around blocks that had more errors than could be corrected in hardware, but somehow still made it out the door. What's wrong with that?
Will semiconductor manufacturers suddenly think "Gee...let's not worry about yield anymore?" You'd better bet they won't. And even if they did, if the software rerouting is so clean as to not be noticeable (which is the only way it would fly), what do you care? You'd get your RAM cheaper.
--Lenny
Finally! (Score:3)
This IS good for Linux's rep (Score:2)
Re:Hello this was on Kernel Traffic a long time ag (Score:2)
Great, deliberate instability :-/ (Score:2)
reliable ? (Score:2)
Anyway, it's a nice thing.
Oh, sure, Linux users are this desperate (Score:5)
"Hello, Kingston, I'm looking for any old cruddy defective RAM, got any? Uh.. No.. I won't be reselling it to Linux users, I swear that I am with a major US ISP and we want to put it into our servers! Call Rambus, you say? Hello? Hello?"
Similar solution exists in the 2.4 kernel already! (Score:4)
Anything similar? (Score:5)
My bad RAM story (Score:4)
We had just installed an Exchange server and were rolling out the Exchange client to all the desktop PCs. Unfortunately, no one had thought to ask if they could take it--which many of them couldn't. So we were feverishly digging up all the RAM we could find and sticking it into machines as fast as we could. I happened to find a 32MB stick (glory be!) in an unused PC. I said to my boss: "Hey, I found a big one!" He turns around and asks "Is it any good?" while simultaneously reaching for it, and ZAP, audibly discharges static electricity right into the thing. We look at each other for a moment and then I say "Not anymore."
I was wrong, though--it was fine.
--
An abstained vote is a vote for Bush and Gore.
Re:What about intermittent failures? (Score:2)
The theory on why this occurs is that the memory on the freelist isn't being accessed (well, ok, we have some bugs occasionally, but... :) and it degrades because of this. Since you don't care about the data on the page, it kinda sucks to panic during the bzero. So Irix, starting with 6.5.7, knows how to "nofault" this bzero operation, and if it fails, it grabs a new page off the freelist and discards the bad page. This is a feature we are thinking of adding to Linux. Other types of pages for which this same recovery can work are mapped files (i.e., program and library text/read-only data) and clean user data (i.e., just swapped in and not yet modified). This is about the best way to solve the intermittent failure problem.
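A rough sketch of that recovery path (illustrative only, not IRIX source; the nofault guard behaves like setjmp, and every name here is hypothetical):

    #define PAGE_SIZE 4096

    struct page;
    extern struct page *take_from_freelist(void);         /* hypothetical */
    extern void        *page_address(struct page *p);     /* hypothetical */
    extern int          begin_nofault(void);  /* returns 0 if a fault fired */
    extern void         end_nofault(void);
    extern void         discard_bad_page(struct page *p);
    extern void         bzero(void *s, unsigned long n);

    struct page *get_zeroed_page_checked(void)
    {
        for (;;) {
            struct page *p = take_from_freelist();

            if (begin_nofault()) {
                bzero(page_address(p), PAGE_SIZE);
                end_nofault();
                return p;               /* page zeroed cleanly */
            }
            end_nofault();
            discard_bad_page(p);        /* bzero faulted: retire the page */
        }
    }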
One other interesting thing we've noticed with RAM is that failure rates over time stay pretty much constant: manufacturing techniques are getting much better, but the memories are getting larger, and the two effects roughly cancel out.
The Environment (Score:2)
I like to preserve as much of the environment as I can. The production of chips is a very resource-intensive process, and the complexity of a chip means that a lot of the produced chips are defective. I dislike wasting good materials, and even if they are merely `good enough' they should be taken seriously. By allowing the use of such `good enough' memory chips, I hope to help preserve the environment.
Computers are pretty darn unbiodegradable, yet the pace of progress makes them obsolete at an ever-increasing rate. How many 386's are somewhere other than a landfill? A 386 is not actually that old when you compare it to a washing machine or a fridge.
A lot of people are slamming this because it has some practical limitations. So what?
This guy has done a pretty cool hack, but has also done something positive about a side of our industry that most of us don't think about very often.
Testing methods (Score:2)
Not that that would lead to lower quality overall. It might even be better, since people might get sloppy after looking at a few hundred identical chips every day, whereas machines don't get bored. (Well, except for my computer. It insists I play Unreal Tournament now and again.)
-----
D. Fischer
Some things NEVER CHANGE... (Score:2)
But it was an AWESOME machine. And, mapping out memory that was bad was something it did on the fly! It would find a bad memory spot, and do one of several things with it:
1) Stop using it;
2) If the problem was intermittent, it would only store PROGRAM CODE there - which, if the memory was bad, it could re-load from the hard disk!
3) If the memory tested good for a while doing program code, (a few days, I think) it would return that RAM to general use.
An amazing machine - with some features that put even a big, powerful *nix box of today to shame. For example, versioning of just about EVERYTHING... *:1, *:2, *:3, etc. And while there was a "root" user (called admin on this system) there could be more than one! (My login, "dirdisb", was a "root" login too, and you could always tell when you looked at a file whether admin or dirdisb actually did it - much better than the *nix style, IMHO.)
I seem to recall that there was a patch or something you could apply that would make it use ALL hard disk space to create as many versions as possible of documents - or just 10. (we used the latter)
This machine, as slow as it was, would comfortably handle 20 simultaneous users! (granted, no X-windows or GUI at all)
With patches such as this badram patch (which IMHO should be added to the kernel by default) we are getting some of these really cool features back...
Your bad RAM has to... (Score:2)
To fix this problem, you'd have to use 2 "half-working" chips to get the same amount of memory that 1 of the non-damaged ones would have provided.
It seems that buying several damaged chips to make up for one non-damaged chip would not be very cost-effective in the long run.
Re:Now there's a point to the BIOS memory test? (Score:2)
The Atari 7800 had lots more ROM space than it could possibly use for just the 960-bit digital signature lockout code, so they included a full 6502 CPU test in there. Sheesh.
Re:Finally! (Score:2)
Re:Oh, sure, Linux users are this desperate (Score:2)
Still, I can appreciate there is a psychological problem with knowing that your system is not 100% flawless - plus, if RAM has some bad bits, might it not have others you don't know about?
The only way BadRAM would take off, I believe, is if RAM manufacturers started shipping each DIMM with a list of known defects, as used to be done for disks. At present, a single defect means the RAM is not used, so the only defective memory modules are dodgy no-name ones you might not want to trust anyway. If, OTOH, the manufacturer guarantees that there are no flaws other than the handful given in the defect list, there'd be no reason not to use the memory provided you trust the manufacturer.
Re:What about Quality Control rejects? (Score:2)
Bogus software wouldn't bother me at all, at least if it worked when I got it home. I'm talking about defective modems that were taken from a trash heap somewhere, given new boxes, and put out for sale.
I couldn't give a damn if someone is selling bootlegs of Freddy the Fish or something like that.
LK
Re:Just how useful is this, really? (Score:2)
Ah yes, timing and all that other rot. Two stories here.
My first-ever memory upgrade was many years ago, adding 4116 RAM to a TRS-80 Expansion Interface. I put it in and it didn't work. When I looked at the address range with the TRSDOS debugger, it contained random values that changed every time I looked at it! I managed to get it to work right by cranking the power supply voltage down into the low 4.x volt range.
And a few months ago, I got some old IBM 72 pin 8MB SIMMs at a computer show that were probably pulls from old PS/2 machines. Some of them worked, some didn't. I didn't realize until a few days later that this was 80ns RAM, and the motherboard only supported 60/70ns RAM. I was lucky to get any of them to work right.
And then there was all that talk about "cosmic rays" messing up DRAM, until eventually it was discovered that the radiation was coming from within the chip itself!
Re:Oh, sure, Linux users are this desperate (Score:2)
I wouldn't use RAM with intermittent faults. But if it had a handful of known bad bits, with a guarantee that all the others were solid, I wouldn't have a problem with mapping out the bad 0.001% (with say 0.1% wasted space) and using the rest.
That's the problem - the perception that RAM with any defects at all is 'second rate'. In the past this has certainly been true because it wasn't possible to map bits out. If RAM starts being seen more like hard disks (and until a few years ago, floppies and LCDs) - seen as something which may have known defects without being unreliable - then manufacturers will be only too keen to improve their yields by selling the chips that are only almost-perfect.
(I wonder - could you do this with other hardware? If one of the registers on your CPU is broken, could you sell it and tell the customer to use a compiler that won't use that register? That would be totally infeasible today, but in the future I can see it _could_ happen. For example, if the whole system were in bytecode with a small native-code bootstrap that finds out about the CPU's defects and sets up the JIT compiler appropriately. There have been cheaper chips which were rumoured to be defective versions of more expensive ones - e.g. the Intel 486SX may originally have been a use for 486DXes where the FPU turned out defective.)
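The compiler half of this is less far-fetched than it sounds: GCC's -ffixed-reg option already tells the register allocator never to touch a named register. On x86, for example (with a hypothetically broken %ebx):

    gcc -ffixed-ebx -O2 program.c    # compiler never allocates %ebx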
mod this guy up (Score:2)
My Article (Score:2)
From the "Why did this article get posted...and not mine" department:
Ohh..This sounds a lot like the story I submitted a week ago called:
"How to make your good memory make use of bad software."
{Joke Mode Off }
Re:Just how useful is this, really? (Score:2)
I thought it was already relatively common for RAM manufacturers to test for single-bit errors in the factory and route around the affected cell, which would negate the economic value of doing this in software. (They certainly have an incentive to do this; otherwise they'd have terrible yield issues.)
This sounds like the "bad block" detectors that used to be necessary for hard drives, but aren't any longer (hard drives these days remap bad blocks internally)...
A better solution... (Score:2)
A lot of vendors offer service contracts and warranties. But peecee vendors, accustomed to dealing with...shall we say, less than reliable operating systems, will try to make you go through 543 steps and tests before allowing you to send your hardware for replacement, because most problems in that world are either OS bugs or user error. In the real computer market, they don't fuck around. You paid a lot for your system, and you can expect it to work. When you call them up and say you have a bad foowhatzit, they send you a new one (unless they're Sun, in which case they make you sign an NDA first - bad Sun, bad!). They expect, and rightly so in most cases, that you know what you're doing and it isn't a software problem. No runaround, no bullshit, no cost to you. This is one of several reasons I'll never own another peecee. The service just ain't the same.
I understand the concept of trying to get the most you can out of any hardware you might have. But I also think that people stuck with such hardware ought to learn their lesson next time instead of relying on hacks, however clever, to work around their poor buying decisions. Anyone actually seeking out bad memory to use with this is insane. First, there's good reason to believe that if memory is failing, other areas in the same part may fail as well, perhaps with less frequency or at a later time. Second, even if the cost is half that of a good part, is it really worth saving 50 bucks and having to configure this thing, test it, and make sure periodically that no other memory areas fail? I would suggest that technical work of this type is worth at least 50 bucks an hour... so if you value your time fairly, it's unlikely that you'll come out ahead. I'll gladly pay some extra money to know that I won't get a sig11 the next time I go to compile something... and if I do, I can get replacement parts the next morning at no cost, without any hassle. I don't work for any vendors. I'm just a sysadmin who'd rather read slashdot than argue with tech support.
Re:Now there's a point to the BIOS memory test? (Score:2)
It can't. Running every possible combination would take an indefinitely long period of time.
Does it check every single block on the hard drive? No!
This is because the hard drive is not essential to the functioning of the computer. With modern operating systems, usually a hard drive is required, but again, it's not essential.
Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work?
This would be absurd. "Please insert a DVD, CD and floppy to boot".
If the memory test is essential to the functioning of the system, why do they let you skip it?
You then go on to contradict yourself by saying, "Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two."
make sure nobody replaces linux (Score:2)
Re:What about Quality Control rejects? (Score:2)
LK
Re:Finally! (Score:2)
"Yamaha first to market with 16X CD-RW [oaktech.com] drive designed around Oak's controller that reduces CD burn time to under 5 minutes"
16X Write
16X ReWrite
40X Read / Audio Ripping
Yamaha's CRW2100 [yamaha.co.jp]:
16X Write
10X ReWrite
40X Read / Audio Ripping
These drives use an 8MB memory buffer for their high speed and to avoid buffer under-run. I can't find any indication that they use either Sanyo's or Ricoh's error-prevention technology. I don't think they do.
An interesting article [cdrinfo.com] on Plextor's newest drive talks about a newer form of BURN-Proof and JustLink, and hints that 24x write drives may be down the road.
SIMMs too, right? (Score:2)
It was quite fun, running a system (FreeBSD) with a single-bit memory error. Sure, gcc would die on occasion, but then there was the oddness of having a script break because a file http_log was missing (mysteriously renamed to httx_log). The best part was actually figuring out which bit was bad...
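Pinning down the flipped bit from that rename is one XOR away: 'p' is 0x70 and 'x' is 0x78, so the filename betrays a single stuck bit.

    #include <stdio.h>

    int main(void)
    {
        unsigned char good = 'p', bad = 'x';
        printf("flipped bits: 0x%02x\n",
               (unsigned)(good ^ bad));  /* prints 0x08 -> bit 3 */
        return 0;
    }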
Re:Finally! (Score:2)
Amazing!
Now if you're trying to correct a meatspace error you're having, that's a different story. Use rewritables!
The reason you couldn't do this on a normal burner is that the change requires a bit of extra code in the firmware to handle the ready-to-restart logic, and a laser that can switch from read to write very fast. CD-RW drives can (and do) correct errors such as this when they are writing to RW media.
~GoRK
No, this *is* good for production use! (Score:5)
If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key. Fault tolerance now exists for memory (this project), storage (RAID), and communication (redundant NICs). The next target should be the CPU.
How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu? Or to use one cpu to monitor another in multiprocessor systems, and avoid using a processor that starts producing faulty results?
Imperfect knowledge, but ... them's the breaks. (Score:3)
But how many people saw it on kt? For purely selfish reasons, I'd like to see a lot more people know about this project, because I find it very interesting and useful-looking. Plus, I think it's just a neat hack in general, and I'd like to point it out.
If it's too old for you, then YMMV; whaddya do? OK.
timothy
Re:Now there's a point to the BIOS memory test? (Score:2)
put arbitrary number in first register
copy first register to second register
...
copy second-last register to last register
compare last register to first register
if (different) HALT
(I have a feeling it did this twice, with 01010... and then with 10101...)
He thought this was quite clever until he realized that a bad bit in the first register would still pass the test (and also negate the test of that bit on all the other registers)...
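In C terms, the flaw looks like this (an illustrative model, not the actual ROM code): the stuck bit corrupts the reference value itself, so the final compare cannot fail.

    #include <stdio.h>

    int chain_test(unsigned pattern, unsigned stuck_low_mask)
    {
        unsigned r0 = pattern & ~stuck_low_mask; /* reg 0 has a stuck-low bit */
        unsigned r1 = r0, r2 = r1, r3 = r2;      /* copy down the chain       */
        return r3 == r0;                         /* corrupt vs corrupt: equal */
    }

    int main(void)
    {
        /* Bit 5 of the first register stuck low: the test still "passes". */
        printf("faulty CPU %s the test\n",
               chain_test(0xAAAAAAAAu, 1u << 5) ? "passes" : "fails");
        return 0;
    }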
Re:Bad Ram (Score:2)
I think this comment is a result of not knowing how difficult it is to make "good" RAM (or *any* good electronics for that matter).
The complex process behind making RAM means that there will ALWAYS be defective ones in the batches that can't meet the standards set by the manufacturer to cover their ass on warranties, etc. If they are going to end up throwing this hardware out (or recycling the pieces, if that's cheap), then they might be able to make more money selling the defective RAM.
While having "partially defective" RAM on the market may seem bad, if the price point is right it could be useful for some people. Like, if I could get 512MB with 50MB defective for 100 bucks (with maybe a 1-year warranty on the 462MB that work), I'd jump on it in a second. But that's just me.
Still practical... (Score:2)
But what happens when there are more faulty rows than spares? Answer: they sell it to the crackling-audio people, for cheap. Such chips might not have a higher tendency toward progressive RAM-cancer than those with fewer faults (though I will be happy to stand corrected if someone has contrary data).
By marking the bad rows bad, Linux never allocates them. With virtual memory in fixed-size pages and memory-mapped I/O there's no penalty for scattering your data all over the place and hopping over the occasional chuckhole.
Downside would be if there's a flakey cell and the memory test misses it. So a persistent bad-page map might be useful, as would beefing up the startup test if the feature is enabled, and adding a background memory test on the currently unallocated pages, to pick up any really-low-density faults.
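A hypothetical sketch of that background test (all names are illustrative, not real kernel symbols): walk the currently free pages, pattern-test each one, and retire any that fail into the bad-page map.

    #define PAGE_SIZE 4096

    struct page;
    extern struct page *first_free_page(void);           /* hypothetical */
    extern struct page *next_free_page(struct page *p);  /* hypothetical */
    extern void        *page_address(struct page *p);    /* hypothetical */
    extern int          pattern_test(void *addr, unsigned long len);
    extern void         retire_page(struct page *p);  /* add to bad-page map */

    void scrub_free_pages(void)
    {
        struct page *p, *next;

        for (p = first_free_page(); p != 0; p = next) {
            next = next_free_page(p);       /* grab next before retiring p */
            if (!pattern_test(page_address(p), PAGE_SIZE))
                retire_page(p);             /* never hand this page out    */
        }
    }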
If an intermittent cell gives you a hit on a read-only or unmodified page, a hack in the parity-error recovery code could move and refresh it. A read hit on a modified page not yet written back to disk is bad news. (Another background hack could be writing modified pages back part of the time the disk is otherwise idle, to reduce that window.)
Re:Bad Ram (Score:2)
[google]
DRAMs typically improve yield by using spare rows and columns to replace those occupied by defective cells. The repairs are performed using current-blown fuses, laser-blown fuses, or laser-annealed resistor connections. [Coc94] references one case in which memory repair increased the yield from 1% to 51%.
from http://www.cs.berkeley.edu/~rfromm/Courses/SP96/c
oh, bull (Score:2)
And if that doesn't impress management, take your faulty DIMM and throw it in a Win2k box. Sit back and watch the fireworks.
Re:PS/2 (Score:2)
PS/2?! Heck, didn't PDP-11s do this? I can still remember (barely, though) PDP memory boards with socketed discrete memory chips that included two or three spares so you could replace the bad chips yourself after locating them with an XXDP+ diagnostic. I don't recall bad chips bringing down the system, but in those days, when a big PDP had 4MB of memory, every little bit counted and the OS worked around it.
I heard somewhere several years ago that Windows got around flaky memory not by marking pages as `bad' but by forcing additional wait states if questionable memory was detected at boot time. Any truth to this?
Re:Oh, sure, Linux users are this desperate (Score:2)
Lucky you, where I once worked (two places actually) this is how these things played out:
If you have it running on that [cobbled together pile of leftovers] then we'll just leave it the way it is.
It eventually breaks, because the upgrade never happens (the necessities always outnumber the priorities that actually get met).
The [cobbled together pile of leftovers] never should have been done; it makes us look bad when a [cobbled together pile of leftovers] failure deprives users of one of our key services.
In retrospect it's funny, but it wasn't each time innovation was mishandled this way.
This too often seemed to resemble the institutional project model:
1. Plan is proposed
2. Wild enthusiasm
3. Plan is put into action
4. Process fails
5. Feelings of hurt, loss and disillusionment
6. Search for the guilty
7. Punishment of the innocent
8. Promotion of non-participants
Bad RAM for cache (Score:2)
I once had a motherboard that had a problem refreshing RAM above the 1MB boundary. You could write and read just fine, but as time passed you would watch individual bits revert to 1s. It was kind of amusing watching all the graphics in Doom change ;^). I 'fixed' the problem by writing a TSR that tricked all software into thinking I only had 640K of memory. That memory would have been fine for cache if the data was protected by a checksum/CRC.
Re:No, this *is* good for production use! (Score:2)
Fault tolerance now exists for memory ( ECC RAM [goldenram.com]), storage (RAID), and communication (redundant NICs).
Real information... (Score:5)
mod parent up please (Score:2)
Umm.. (Score:2)
With that said, if I can get a 512MB DIMM for $100 because 50MB of it is unusable, I'll buy it and install this hack, even though having anywhere near that much RAM on my Linux box will not help me (it's little more than an mp3 player and web browser; most of my work is on my Mac).
Re:Just how useful is this, really? (Score:2)
I have a Toshiba ultraportable. Nice little box: 3 pounds, magnesium case, didn't cost a whole lot, nice bright screen, 96 megs of RAM (maxed out).
64 of those 96 megs are on an add-on card. For whatever reason, the cost of this proprietary memory card for this particular notebook has skyrocketed.
I've had the notebook about 10 months when, out of the blue, it starts rebooting spontaneously. I fire up memtest86; there are a few chunks of bad RAM.
I take it out and start frantically searching for a source of a replacement. Currently, Kingston is the only manufacturer I can find shipping it, and they want half what the whole notebook goes for on eBay, including the memory card. About $350 for 64 megs of RAM. No Effin Way. Just not going to happen. I'm not poor, but that's just plain stupid. I'd be disgusted with myself if I spent that much on that little memory when I'd be better off selling the whole notebook and buying another.
So I found the badram patch. Patched my kernel. Found that my lilo was too old to allow the whole command line. Downloaded and installed a new lilo.
And, it doesn't work.
Well, it sort of works. Now, with that RAM, it randomly locks up instead of randomly rebooting. Big improvement, right? Wrong.
I don't know what the problem is. A friend with a background in the semiconductor industry says that memtest86 is written from an outsider's point of view regarding memory, but he's under a prior NDA, and Motorola would probably be Quite Upset if he leaked old documents that would tell the author how to improve it. Maybe I just need a better memtest86. Maybe I ought to expand the ranges so that an area around the affected addresses is also blocked out. I don't know.
All I know is that, in my case, it didn't really help, and that a notebook with 32 megs of RAM and a really slow hard drive is useful for little more than an X terminal.
Signal 11 no more? (Score:4)
It made it notorious for working with dodgy memory, failing to boot half of the time. I've seen people blame Linux for bad hardware because it would work with Windows.
It's nice that Linux could now just go
*ARGH, YOU HAVE CRAP MEMORY*
shrug its shoulders, and chug along anyway.
Re:Is this good for Linux's rep? (Score:3)
Re:Oh, sure, Linux users are this desperate (Score:2)
Re:Your bad RAM has to... (Score:2)
It's worse than that. DRAM is addressed by rows and columns, so each address line controls two bits, and not necessarily two adjacent bits.
If an entire address line is bad, it's time to make a keychain holder. At the factory, this type of problem won't even make it out of initial die testing, much less all the way to manufacturing a DIMM.
Chips may still not work (Score:2)
Plus, some of the motivation is a little askew. If you want to push these chips into an old machine, you still have the problem of RAM limitations due to motherboard design. A fat lot of good 512MB of semi-faulty memory is going to do in a board that can only support up to 32MB (or better yet, an older chipset that supports up to 8 or 2).
Better hurry... (Score:5)
handful of busted 256MB DIMMs: $10.71 with tax
6 reboots, a little math, and a partial kernel compile: 21min
The look on my roommate's face when I typed "top": priceless!
Swiss Cheese (Score:3)
Linux forced its way into our IT Department when it could restore a trashed system into something useful. Here at The Salvation Army, we endeavor to be good stewards of what we are given. We have an IBM PC Server 350 (now named "Methusela") that crashed one day for no apparent reason. It refused to run Windows anymore... not even Win98 or Win95!
But it ran Linux flawlessly. Well, actually, Linux did point out one flaw on its own: the internal Ethernet controller was getting an unusually high number of bad packets. The machine would receive DHCP assignments and even do some web work in Linux... but the flaky NIC was enough to shut Windows down completely. Even after we installed a working NIC, Windows could not run due to the faulty internal one, but Linux ran fine!
Likewise, we found an instant way to crash every WinNT system in the building. Someone was re-arranging the hubs and switches, and accidentally created a packet loop by plugging a switch back to itself... in three seconds every WinNT system on the network went straight to the Blue Screen of Death.
It's one thing to handle the rules well, but quite another to deal with the exceptions!
anti-linux? (Score:4)
I know you're just trolling, and I shouldn't respond, but for students, and anybody who has access to memory modules that are experiencing known, predictable faults, this would be great. Not everybody has some fancy $30,000/year job, y'know.
--
"Don't trolls get tired?"
Coming soon to Mac OS-X (Score:2)
Burris
One step further: (Score:2)
Re:What about Quality Control rejects? (Score:2)
get real, this is only useful for high reliability (Score:2)
Bad RAM is just that: bad, and it is likely to have more failures over time.
Hello this was on Kernel Traffic a long time ago (Score:2)
Could have used this last year.... (Score:2)
Re:Is this good for Linux's rep? (Score:2)
A long time ago... (Score:2)
Rich
Re:Bad Ram (Score:2)
Now there's a point to the BIOS memory test? (Score:3)
... (Score:2)
> we are talking about the die that passed the initial wafer probe and then were packaged and then fail somehow at or after the packaging or even shipping stages.
Yes, we definitely are. I only brought up the fact that the industry already routes around process errors in DRAMs to demonstrate a point. He seemed frightened by the possibility that future DRAMs we buy might not be 100% "clean". I wanted to demonstrate that 100% clean isn't necessary, and in fact isn't produced now, by and large. What *is* necessary is 100% functional parts. DRAM manufacturers know this, and use it to improve yields, thus driving down cost. No foul play there.
> So in conclusion, I think there are plenty of us at
I expect there are, but none of them were posting. Instead, most of the posts demonstrated a clear lack of understanding of the process. The consequences of techniques like Linux's remapping seemed to worry the original poster, and I wanted to explain why it wasn't something to worry about since a similar process is already performed quite successfully. Further, I wanted to emphasize that this is a perfectly valid technique for increasing yield, and is transparent to the user. It isn't like the manufacturers are trying to rip people off.
> Why don't you just allow us to think that it's really cool to use software to lock out blocks of RAM?
I'm not saying it isn't cool. Some seemed wary of the idea, though, and I wanted to point out that their present DIMMs use something very much like this. As long as the software can do this transparently (as the hardware does), what does it matter to the user?
> but I seriously doubt that they are applicable to such dense cells as memories
Believe me: they are. Think of it: a massive die area that will be completely destroyed by a single speck. Wouldn't you prefer an ever so slightly (say 5-10%) larger die that can withstand one or two specks? The redundancy is very easy to use in a homogeneous structure like DRAM (millions of identical cells). All that has to be done to "swap in" the replacement RAM block is to modify some address lines. That can be done by electrically blowing fuses on the die, or through laser modification. One of my former employers was *most* fond of the laser approach. It added quite a bit of flexibility to their designs.
--Lenny
Re:Oh, sure, Linux users are this desperate (Score:2)
I'd be more impressed with on-the-fly patching of bad memory to keep a server going rather than having it hang. That would be a selling point.
Re:Oh, sure, Linux users are this desperate (Score:2)
I'm well aware of the ingenuity of radio amateurs (my father has been one for decades) and the innovation which made early repeaters work (retuning salvaged military/commercial radio equipment). This is great as a hobby, great if it helps out in an emergency, but, as with Linux seeking acceptance, try not to overlook those who opt to ditch the throwback for a stable, professional package with reliability. I can't see calling up Icom and asking for technical support after I nibbled a hole in the casing of my UHF handheld and wired in a customization, any more than I can see calling up a tech at 1 AM on a Friday because the cut-price 512MB DIMM just flaked out a little more and took the system down.
When you buy good memory, it's expected to be 100% good, with no intermittency. If it flakes, you replace it, hopefully under warranty or a field service agreement. When you buy "iffy" memory, you accept that it is known to be broken, but you have no guarantee of how it is broken and whether that state of brokenness is stable.
Anecdote: A disreputable computer technician, who often overcharged for simple repairs and patches, is hit by a truck and killed instantly. He appears in hell, where a demon welcomes him and directs him down a passageway to his eternal punishment. The tech inquires as to what it will be. The demon indicates that the punishment fits the crime and opens a door. The tech looks in and sees an enormous cavern filled with PCs, all tagged as broken. The tech says, "Oh, I'll spend eternity fixing broken computers?" The demon says, "Yes, but since this is hell, they all have intermittent problems."
Why bother? (Score:4)
Modern DRAM doesn't have much trouble with bad cells, and the yields are quite good. So there isn't a big supply of DRAM with bad cells that fail solidly. Most DRAM problems today are at the edges: at the buffers, the connectors, or clock synchronization - the things that can be messed up during installation.
Personally, I get ECC RAM even on desktops, just so I know it's working. It eliminates arguments with tech support when the hardware really is broken.
They throw them away now (Score:2)
Predictable faults? (Score:2)
Re:Umm.. (Score:2)
-matthew
Err... (Score:4)
- A.P.
--
* CmdrTaco is an idiot.
Re:Umm.. (Score:2)
Best Buy (Score:4)
If only it made sense.. (Score:2)
Okay, so your car dealer marks down the car 80% because ONE of the pistons doesn't work right. However, the rest work fine, and he installed a thing-a-ma-bob to make the engine ignore that piston... ummm...
What kind of warranty will the end user get for the memory? What kind of performance is eaten by this program? Does the memory run up to spec? Will it still work in two months?
There could be a niche market for "used" memory sticks, but "damaged" or "defective" may not sell too well...
I agree, however, that this does seem like a cool way to resurrect older systems into useful appliances (print servers, routers/gateways, etc.).
Verbatim