TRIM and Linux: Tread Cautiously, and Keep Backups Handy
An anonymous reader writes: Algolia is a buzzword-compliant ("Hosted Search API that delivers instant and relevant results") start-up that uses a lot of open-source software (including various strains of Linux) and a lot of solid-state disk, and as such sometimes runs into problems with each of these. Their blog this week features a fascinating look at troubles that they faced with ext4 filesystems mysteriously flipping to read-only mode: not such a good thing for machines processing a search index, not just dishing it out.
"The NGINX daemon serving all the HTTP(S) communication of our API was up and ready to serve the search queries but the indexing process crashed. Since the indexing process is guarded by supervise, crashing in a loop would have been understandable but a complete crash was not. As it turned out the filesystem was in a read-only mode. All right, let's assume it was a cosmic ray :) The filesystem got fixed, files were restored from another healthy server and everything looked fine again. The next day another server ended with filesystem in read-only, two hours after another one and then next hour another one. Something was going on. After restoring the filesystem and the files, it was time for serious analysis since this was not a one time thing."
The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."
Since SSDs are becoming the norm outside the data center as well as within, some of the problems that their analysis exposed for one company probably would be good to test for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."
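For anyone who wants to check their own files for the symptom described in the quote (runs of 512 zero bytes), here is a minimal Python sketch. The function name `find_zero_blocks` is hypothetical, and it assumes that scanning 512-byte aligned blocks is a good enough test. Sparse or legitimately zero-padded files will show up as false positives, so treat hits as something to investigate, not proof of a bad drive.

```python
BLOCK_SIZE = 512  # size of the zeroed runs described in the quoted report


def find_zero_blocks(path, block_size=BLOCK_SIZE):
    """Return byte offsets of aligned blocks in `path` that contain only zeroes.

    A short final block (file smaller than block_size, or not a multiple
    of it) is reported too if every byte in it is zero.
    """
    offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file
            if not any(chunk):  # every byte in this block is 0
                offsets.append(offset)
            offset += len(chunk)
    return offsets
```

Comparing the result against a known-good copy from another server (as the blog authors did when restoring) is probably the only way to tell real damage from intentional zero padding.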
Maybe we can create a superior alternative to TRIM (Score:4, Funny)
I suggest we call it SNATCH.
Re: (Score:2)
No we need a TRIM standard first.
That's too long (Score:2)
We should use CUT. Although that might conflict with a builtin in the Gnu assisted shell.
Is there a site maintaining a list of "bad" SSDs? (Score:5, Interesting)
I'll Google in a moment, but I was wondering if anyone knew of any good sites that maintain lists of good/bad SSDs for Linux. With the number of vendors out there nowadays, having to scan the source seems like a poor way to track the information.
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Informative)
It takes a couple of links and searching through source code [github.com] to get there. So here's the list of problematic drives, better formatted but still in regular expression format:
Micron_M500*
Crucial_CT*M500*
Micron_M5[15]0*
Crucial_CT*M550*
Crucial_CT*MX100*
Samsung SSD 8*
So, basically, all the ones I thought were the best. The list of whitelisted drives after it only includes those brands, Intel, and ST-something. So other brands may be unknowns.
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Informative)
The Crucial MX100 with the latest MU02 firmware is now whitelisted by the Linux kernel, and has its TRIM ability re-enabled.
Re:Is there a site maintaining a list of "bad" SSD (Score:4, Informative)
There's also an upgrade path for Micron's older SSDs - I just upgraded my Crucial M550 from MU01 to MU02 using a bootable ISO from Micron's support site:
http://www.crucial.com/usa/en/support-ssd-firmware
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Informative)
ObPedant: those aren't regexes, they're globs. Otherwise (for instance), the Samsung entry would match
Samsung SSD<space>
Samsung SSD<space>8
Samsung SSD<space>88
Samsung SSD<space>888
.
.
.
ad nauseam: the "*" regex operator means "zero or more occurrences of the previous pattern", which in this case is the character "8".
At least, I hope they're not supposed to be regexes. Otherwise, the kernel blacklist itself will have some serious issues matching known-bad SSDs because someone never learned how to create a regular expression.
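To make the glob-versus-regex difference concrete, here is a small Python sketch. It uses Python's `fnmatch` module as a stand-in for the kernel's own glob matcher (an assumption; the kernel does not use Python, and its matcher may differ in details), and anchors the regex with `re.fullmatch` so the whole model string has to match. The model strings are made up for illustration.

```python
import fnmatch
import re

PATTERN = "Samsung SSD 8*"  # entry from the list quoted above


def glob_match(model, pattern=PATTERN):
    """Glob semantics: '*' matches any run of characters."""
    return fnmatch.fnmatchcase(model, pattern)


def regex_match(model, pattern=PATTERN):
    """Regex semantics: '8*' means zero or more '8' characters."""
    return re.fullmatch(pattern, model) is not None


# Illustrative model strings, not taken from real drives.
for model in ["Samsung SSD 840 EVO 250GB", "Samsung SSD 888", "Samsung SSD "]:
    print(repr(model), "glob:", glob_match(model), "regex:", regex_match(model))
```

Read as a glob, the pattern catches the 840 EVO string; read as an anchored regex, it only accepts "Samsung SSD " followed by nothing but 8s, which is the problem the parent describes.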
Re: (Score:2)
You will only find SSDs from the very best vendors there... because the crap ones don't claim to support queued TRIM in the first place.
It is interesting that the Micron M500, *which is an enterprise datacenter SSD*, is listed. Rather bad PR for Micron, that: an enterprise datacenter SSD that corrupts data and has not been fixed?!
As usual, good PR for Intel... too bad their SSDs self-destruct based on a timer, instead of trying to soldier on until things actually get really broken (and only *then* self-des
Re: (Score:2)
Wait, what?
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Insightful)
Wait, what?
When Intel SSDs decide they are bad, they just brick themselves instead of going into read-only-good-luck-your-data-may-be-bad-mode. This probably makes sense for Enterprise RAID, and for absolutely no other use case.
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Informative)
The drive's media wear indicator ran out shortly after 700TB, signaling that the NAND's write tolerance had been exceeded. Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power is cycled. Despite suffering just one reallocated sector, our sample dutifully followed the script. Data was accessible until a reboot prompted the drive to swallow its virtual cyanide pill.
Re: Is there a site maintaining a list of "bad" SS (Score:4, Interesting)
If you're booting from the SSD, chances are the machine will crash...
It would be much better to just stay in read-only mode and give you the chance to copy data off (and yes, I'm aware this is no substitute for a backup, but think of the use case of a travelling laptop far away from its backup server, etc.).
Re: (Score:2)
We have Crucial/Micron SSDs in RAID 10 configurations. We of course buy them in batches. There's nothing like watching 16 of them all go "bad" at the same time, and not having a clue WTF is going on. Fixed via firmware, but glorious hell, it made my heart sink watching every drive just die in a 20 second window. Randomly and repeatedly over a period of a couple weeks. They all work like a charm now...
Comment removed (Score:5, Interesting)
Re:Is there a site maintaining a list of "bad" SSD (Score:5, Informative)
Because Windows doesn't do queued TRIM.
TRIM in Windows and Linux before now worked more like this:
-DATA- -DATA- -FLUSH ALL COMMANDS TO DRIVE- -WAIT- -TRIM- -DATA- -DATA-
When a drive was doing the TRIM thing it could not do anything else; there could be no other in-flight commands to the drive.
This is different. -DATA- -DATA- -TRIM- -DATA- -TRIM- -DATA- -DATA- -DATA-
Queued TRIM is part of NCQ and is issued alongside other commands in the SATA queue. Problem is, some disk manufacturers have pissed this up. It seems likely that a firmware update will be able to fix this issue.
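The two orderings above can be modelled with a toy Python sketch. This is only an illustration of the command sequences described in this comment: it does not talk to any hardware, and the function names and the "FLUSH"/"WAIT" labels are invented for the example.

```python
def unqueued_schedule(commands):
    """Old behaviour: drain the queue (FLUSH, WAIT) before every TRIM."""
    schedule = []
    for cmd in commands:
        if cmd == "TRIM":
            # Nothing else may be in flight while the TRIM runs.
            schedule += ["FLUSH", "WAIT"]
        schedule.append(cmd)
    return schedule


def queued_schedule(commands):
    """Queued TRIM: the TRIM sits in the queue like any other command."""
    return list(commands)


requests = ["DATA", "DATA", "TRIM", "DATA", "DATA"]
print(unqueued_schedule(requests))
print(queued_schedule(requests))
```

The extra FLUSH/WAIT pair is the cost that queued TRIM is meant to avoid, and also the barrier that hides firmware bugs in how TRIM interacts with other in-flight commands.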
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
Yep freebsd is fine here with 840 pros.
I find the "it must be the drives" attitude on this site, heavily biased towards Linux, hysterical.
Re: (Score:2)
When it comes to IT, I'll gladly let the hipsters deal with the pain and the data loss for a 0.5% speedup and use whatever they came up with - and is still relevant - after 10 years. Thank you very much.
Re: (Score:2)
I assume from that comment you have never actually used an SSD. 0.5% is a considerable understatement. It is more like 90%; the drives slow to a crawl until you format them if there is no TRIM support.
Re: (Score:2)
This is about queued TRIM vs. just TRIM, not TRIM vs. no TRIM at all.
Re:Is there a site maintaining a list of "bad" SSD (Score:4, Interesting)
Re: (Score:2)
The problem with ReiserFS is you never know when it will kill your drive...
Bad joke...I know...I'll show myself out.
Re: (Score:2)
There are two easy solutions to Ext4 vs. SSD problems. The first is ReiserFS which is still eminently usable on Gentoo. The second is UFS which is available on the BSDs.
If the problem is that the drive doesn't follow the spec for TRIM, I'd rather just disable TRIM than try to keep using it with a different filesystem. That seems a bit like playing Russian Roulette. Are you really that sure that ReiserFS won't have the same problem (unless it just doesn't use TRIM anyway, in which case it is no better than ext4 without TRIM)?
Re: (Score:2)
how is f2fs now-a-days?
No idea in general, but I'd think that a log-based filesystem would be fairly immune to this kind of nonsense since it would only issue TRIMs very rarely, and then only for huge areas of the disk at a time. They don't overwrite random blocks in-place constantly.
Re: (Score:2, Insightful)
I assume that Windows does not submit queued trim commands, thereby avoiding this problem.
Re:Is there a site maintaining a list of "bad" SSD (Score:4, Funny)
Linus is on vacation so you'll have to wait on your next SSD purchase for him to return to merge the patches....
What? "The most influential individual economic force of the past 20 years" gets to just wander off?
Apple (Score:2, Insightful)
This is why Apple doesn't support TRIM in third-party SSDs...
Re: (Score:3)
And it's trivial to get around.
Re: (Score:2)
Not really trivial as of Yosemite. You have to disable kernel extension signing. Luckily, there appears to be a command line tool for force enabling TRIM in 10.11.
Re: (Score:2)
Still trivial even for computer n00bs.
Download and install Trim Enabler from the app store.
Re: (Score:3)
Sure, if you want to roll the dice.
You roll the dice when you buy Apple equipment, just like everything else. They've had their hardware failures, and their design failures too. They simply put out fewer models than other companies do.
Re: (Score:3)
This bug has nothing to do with standard TRIM. Send TRIM as a single command after flushing the drive queue and it works fine. This has to do with the newest SATA specification allowing queued TRIM blowing up. Apple just wants to sell their more expensive drives.
Re: (Score:2)
Re: (Score:2)
Been rolling those dice for 3 years with zero problems. Buy a quality SSD and not the cheap crap.
TRIM -- command of mass destruction (Score:5, Interesting)
The only TRIM use I recommend is running it on an entire partition, e.g. the swap partition, at boot, or before initializing a new filesystem. And that's it. It's an EXTREMELY dangerous command which results in non-deterministic operation. Not only do SSDs have bugs in handling TRIM, but filesystem implementations almost certainly also have ordering and concurrency bugs in handling TRIM. It's the least well-tested part of the firmware and the least well-tested part of the filesystem implementation. And due to cache effects, it's almost impossible to test it in a deterministic manner.
You can get close to the same performance and life out of your SSD without using TRIM by doing two simple things. First, use a filesystem with at least a 4KB block size so the SSD doesn't have to write-combine stuff on 512-byte boundaries. Second, simply leave a part of the SSD unused. 5% is plenty. In fact, if you have swap space configured on your SSD, that's usually enough on its own (since swap is not usually filled up during normal operation), as long as you TRIM it on boot.
-Matt
Re: (Score:2)
Isn't TRIM support disabled by default in Linux? They must have set the "discard" mount option.
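If you want to check whether any of your own filesystems are mounted with that option, a rough Python sketch follows. It parses text in the `/proc/mounts` layout (device, mount point, type, options, ...). The function name is hypothetical, and treating `discard=...` variants as a match is an assumption on my part about how some filesystems spell the option.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mount_point) pairs whose mount options include discard."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        for opt in options.split(","):
            # Assumption: "discard" or "discard=<mode>" both mean online TRIM.
            if opt == "discard" or opt.startswith("discard="):
                found.append((device, mount_point))
                break
    return found
```

On a Linux box you would feed it the contents of `/proc/mounts`, e.g. `mounts_with_discard(open("/proc/mounts").read())`. An empty list means nothing is mounted with online discard, though a periodic `fstrim` job could still be issuing TRIMs.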
Re:TRIM -- command of mass destruction (Score:5, Informative)
Man Linux users are hilarious. TRIM has worked and been safe on every other platform for ages.
LOL.
Do you know who you're replying to? Matt Dillon is the principal developer of DragonFly BSD and the HAMMER filesystem.
While he probably does use Linux from time to time, I think you're more likely to find him at a BSD system.
Re: (Score:2)
Re: (Score:3)
*So* not a kludge. o.O
Historically TRIM was a smart addition to block-storage commands in the enterprise SAN space, to better enable things like thin-provisioning, which at least at the time went with server virtualisation like eggs go with bacon.
It just-so-happened that TRIM also fulfilled a very similar, and similarly smart, use-case on flash-based SSDs of all levels soon after/at the same time.
Given the characteristics of the current flash devices in SSDs, it's a perfectly reasonable thing to add to file
Re: (Score:2)
Man Linux users are hilarious. TRIM has worked and been safe on every other platform for ages.
The Intel SSD toolbox won't enable automatic TRIM support for my disk because it's some weird disk they sold to HP. TRIM claims success on it though.
Re: (Score:2)
Not so. My main Windows server suffered serious problems when first deployed - not so long ago - which we eventually tracked down to the use of TRIM on the iSCSI drives.
Granted the issues were mainly DoS rather than data loss, it was still a serious problem.
Re: (Score:2)
Just like ZFS and similar file systems have more problems with hardware. Or was it that they detect data corruption which happens on Windows without detection or warning?
Re: (Score:3)
No, TRIM support doesn't matter to SSDs.
TRIM is an optimization that when properly used, can make an SSD faster. Emphasis on can, and properly used. It can also be used to reduce write amplification (the ratio of number of writes to flash over the number of writes the host computer actually did. More later).
You can use an SSD with a non-TRIM aware OS and it will work just fine.
In an SSD, you have pages which are mapped to sectors. As you overwrite a logical sector, the wear levelling algorithm marks the old
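The write amplification ratio defined in this comment (writes to flash divided by writes the host actually issued) is simple enough to put in a few lines of Python. The function name and the byte counts are illustrative assumptions, not measurements from any drive.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes physically written to flash over bytes the host wrote."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


# Illustrative numbers: the host writes 4 KiB, but the drive has to
# rewrite 16 KiB of flash to make room for it.
print(write_amplification(16 * 1024, 4 * 1024))
```

A ratio of 1.0 would mean the drive wrote exactly what the host asked for; anything above that is extra wear, which is what TRIM can help reduce when it works properly.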
Re: (Score:2)
Name and shame (Score:2, Informative)
see ata_blacklist_entry [github.com]
(reformatted to get past Slashdot's 'junk' filter)
static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* Devices with DMA related problems under Linux */
WDC AC11000H, NULL, ATA_HORKAGE_NODMA
WDC AC22100H, NULL, ATA_HORKAGE_NODMA
WDC AC32500H, NULL, ATA_HORKAGE_NODMA
WDC AC33100H, NULL, ATA_HORKAGE_NODMA
WDC AC31600H, NULL, ATA_HORKAGE_NOD
Re:Name and shame (Score:5, Interesting)
TLDR (Score:2, Insightful)
Don't buy Samsung SSDs.
Re: (Score:2)
Yeah, don't buy arguably the best SSDs on the market because your OS can't be bothered to work around their foibles.
Algolia? (Score:2)
It sounds like a kind of infection. The kind you get, you know, down there
So don't use cheap consumer SSDs (Score:2)
Or if you're going to use consumer ones, vet the hell out of them.
Re: (Score:2)
I suspect that what we see here is a problem that is very common in the consumer hardware industry: manufacturers don't bother testing under any OS other than Windows, which means bugs that do not manifest under Windows go undetected. It's a problem most often seen in ACPI interfaces, where Windows has a very loose interpretation of the standards. So long as it runs fine on Windows, it's considered good enough to ship.
Apple TRIM Whitelist? (Score:4, Interesting)
I wonder if this issue has anything to do with why Apple only supports TRIM on specific drives they OEM?
Re: (Score:2, Insightful)
Re: (Score:2)
I will have to call BS on this one. Both my MBP and my old school MP run 3rd party drives, including SSDs (Crucial and OCZ - yeah, I know). No problems whatsoever so far, fans are spinning at their normal rpm.
Re:Apple TRIM Whitelist? (Score:5, Insightful)
It's a good bet.
As Apple is probably quite aware, being probably the biggest seller of non-Windows PCs, there is an endemic problem with a whole lot of hardware shipped claiming to be "compliant" with any given standard.
Most vendors' testing methodology pretty much comes down to "Works on Windows? Ship it"
Linux has been dealing with this problem for decades. Power management implementations in laptops (and some desktop motherboards) are often outright broken and don't behave anything close to what the "standard" dictates. (It's so bad in laptops that Microsoft's power management maintains a hardware checklist with custom hacks for laptops with known bad implementations. On many systems it does not even /attempt/ to use standard calls)
Linux developers attempt to access hardware in the manner the documented standards describe and end up tripping all sorts of bugs, from mild to hardware-bricking. Flabbergasted hardware vendors often respond with "It works in Windows!"
(Fortunately this shit doesn't fly in the server space where Linux is now pretty much King.. Well, at least in theory)
So yes, I'd be willing to bet that Apple found that enabling trim in any old SSD led to an unacceptable chance of filesystem corruption and decided to implement a white list. So, you know, they don't catch shit for someone else's broken hardware.
Re: (Score:2)
Re: (Score:2)
If that was the reason they would just make sure to never send queued TRIM commands like Windows does for all drives and Linux does for known bad drives. The performance loss is minimal, especially on Windows where TRIM commands are sent when the drive is idle anyway for performance reasons.
Instead they just disabled it for all but their own drives. Seems more like a way to discourage people from buying non-Apple SSDs (which are rather expensive) by crippling performance for no good reason.
What is Windows doing differently? (Score:2)
I have an 850 Pro at home and an 850 EVO at work, and haven't experienced any corruption. I know that Windows uses TRIM. Why am I not seeing any problems?
I doubt EXT4 or whatever part of Linux is issuing TRIM commands is doing it wrong, but they're clearly doing it differently, and maybe it can be worked around or at the very least reported to the manufacturer to fix broken firmware.
Re: (Score:2)
Re: (Score:2)
I can add an anecdote. 2x 840 EVO Pros at home and 1 840 EVO, running Windows 7, 8.1 and 7 respectively. Never had any kind of corruption issue. Mind you, none of these drives are under any serious load like search indexing.
Re: (Score:2)
Re: (Score:2)
How is ZFS or BTRFS going to help you?
Sounds like the corruption occurs due to the trim command erasing data that has already been successfully written.
Sure, ZFS will tell you if you're reading corrupt data, but by then it's too late, your data is gone.
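A small Python sketch of that point: a checksum recorded at write time will flag a block that later got zeroed, but it cannot bring the data back. SHA-256 via `hashlib` is used here purely as a stand-in for whatever checksum the filesystem keeps, and `zero_first_block` is a hypothetical helper that imitates the damage described in the story.

```python
import hashlib


def checksum(data):
    """Checksum recorded when the data is written."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected):
    """True if the data read back still matches the recorded checksum."""
    return checksum(data) == expected


def zero_first_block(data, block_size=512):
    """Imitate a misdirected TRIM: the first block comes back as zeroes."""
    zeroed = min(block_size, len(data))
    return bytes(zeroed) + data[block_size:]


original = b"search index " * 100
recorded = checksum(original)
damaged = zero_first_block(original)

print(verify(original, recorded))  # intact copy passes
print(verify(damaged, recorded))   # damaged copy is detected, not repaired
```

Detection only turns into recovery when there is a second good copy to read from, such as a mirror or the healthy server the blog authors restored from.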
Re: (Score:2)
Yes, I've checked it on the Windows 7 machines, not on the Windows 8.1 one (just assumed it was on as it was with Windows 7).
Also no I don't continuously verify data. These drives store only program info on them. If I were accumulating errors on the data at any kind of problematic rate then I would have started seeing random crashes / bluescreens sometime over the past few years, which I haven't.
Errors are errors, and there are ways of noticing them other than continuous checksumming.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
These guys are running a very out-of-the-ordinary usage profile and they have also managed to identify the root cause. It's possible it happens rarely with more normal usage and that no one has bothered to find the root cause.
Re: (Score:2)
Re: (Score:2)
TRIM+NCQ (Score:2)
Except that's irrelevant, the guys didn't use queued TRIM either. It says in the article itself that they used non-queued TRIM.
More precisely, they said:
The TRIM on our drives is un-queued
Which is true.
Except that recent firmware fixes from Samsung (you know, the whole "speed decay on aging data" fiasco) suddenly started to falsely report support for TRIM+NCQ.
So it might be possible that, unbeknownst to them, their Linux installation suddenly started to issue queued TRIMs, even though the drive doesn't actually support them, because it trusts what the firmware reports.
The thing Windows does (Score:2)
I have an 850 Pro at home and an 850 EVO at work, and haven't experienced any corruption. I know that Windows uses TRIM. Why am I not seeing any problems?
You're shielded from the problem because of 2 different things:
- Samsung 850s aren't affected by speed decay as much as Samsung 840s. Thus a firmware fixing the speed problem was only shipped for 840s, not for 850s - and it's that firmware which had the problem. Your drive simply didn't get the problematic firmware.
- That newest firmware falsely advertises that the drive supports TRIM together with NCQ. But the drive actually doesn't. Re-ordering should not happen while TRIM is used.
Linux follows the standards: it
Indeed (Score:2)
Windows is doing it wrong {..} When Linux tries to follow the standard and do it right, it gets burned.
Indeed, that's right: Windows doesn't support TRIM+NCQ, whereas Linux does and will enable it if the drive reports it as present.
{...} and the SSD manufacturer has 'modified' their firmware to work around it.
In this case it's purely accidental. Samsung issued a firmware upgrade for some Samsung SSD to fix a problem causing a decay of speed as data ages on the SSD. That new firmware happens to falsely report support for TRIM+NCQ whereas it doesn't actually support it.
It's a bug left in a new firmware fix, not a tweak intentionally designed to work around quirks and bugs in Windows.
The bu
Another Deceptive Slashdot Title (Score:5, Insightful)
Correct title: "TRIM and Any Fucking Operating System: Don't Buy Defective SSDs"
It's not as if Windows or MacOS has any magic that makes queued TRIM work with non-compliant and poorly-coded hardware, right?
Seriously, WTF, over?
Re: (Score:3, Informative)
Windows and MacOS do not issue Queued TRIM in the first place. They only issue the regular TRIM command, which has to stop all data in flight and quiesce the entire submission queue (all tags, etc).
Linux is optimized for ultra-high I/O loads, and queued TRIM is a must when dealing with high-performance storage (not just SSDs). Maybe it should stop trusting, by default, devices that are not attached to a SAS or FC transport when they claim to implement advanced features, though.
Re: (Score:2)
TRIM is not used by anything other than flash storage.
Re: (Score:2)
Wrong, it is used with thin provisioning in enterprise storage products. That is, I can thinly provision a volume on my storage array and it will use the TRIM commands to "reuse" blocks that are no longer needed in exactly the same way a flash drive would.
Re: (Score:2)
https://en.wikipedia.org/wiki/... [wikipedia.org]
TRIM is explicitly a non-queued command. Linux attempting to queue it is out of spec. It works most of the time, but it isn't fair to lay all the blame for failures with the SSD manufacturers. They should reject queued TRIM commands if they don't work, but equally Linux should not be sending them.
Re: (Score:2)
Perhaps you should update your lovely Wikipedia page, because it is outdated.
SATA 3.1 standard note at techreport [techreport.com]
Webopedia info on SATA 3.x [webopedia.com]
Wikipedia's own entry on SATA 3.1 [wikipedia.org]
TechPowerUp article about SATA 3.1 [techpowerup.com]
Here's a press release from sata-io about it: in PDF format [sata-io.org]
Not only does TRIM via NCQ exist, it is in the recent specifications. You see, the thing about computer technology is that it keeps being improved. Outdated information doesn't stop that. It just becomes outdated.
Re: (Score:2)
From your link:
This Trim shortcoming has been overcome in Serial ATA revision 3.1 with the introduction of the Queued Trim Command.
If the drives were not reporti
Re:Another Deceptive Slashdot Title (Score:5, Insightful)
Dear Microsoft,
Thank you for your generous donation to our staff social club. As promised, please find attached drivers that utilise the *real* TRIM commands for our SSDs.
Sincerely yours,
A. Manufacturer
Re: (Score:2)
OS X disables TRIM on third-party SSDs by default. On Mavericks and below, there's an app called TRIM Enabler that enables TRIM on third-party SSDs. On Yosemite, kernel signing prevents this from happening, resulting in a really sketchy method to get TRIM operational.
El Capitan is supposed to come with a way to enable TRIM on third-party SSDs, but it requires special modes and using
Re: (Score:2, Informative)
It's a buggy hardware issue. How each operating system deals with it may vary but the entire dilemma results from shitty hardware.
Just not worth it (Score:2)
Even when implemented correctly, TRIM slows down regular I/O that happens around the time it's done. On top of that, you are risking OS and drive bugs that can vary with every incremental revision. You may not notice corruption until all your backups are overwritten, and just think of the hassle of restoring even once. Is it really worth the potential minor performance benefits that are often realized by the drive itself anyway?
I can think of exceptions like building a supercomputer with monolithic array of drives us
Re: (Score:2)
Use of TRIM fights the deleterious effect of write amplification on lifespan, as well as reducing degradation of performance over time. Why does that "make no sense" for individual users?
There are two strategies for using TRIM.
The first one is "discard" in the mount options, which causes the drive to be informed via the TRIM command at the time a block is freed (file erased). The second strategy runs a utility (fstrim) periodically - for example, once a day - to TRIM all the blocks freed since the last time
SSDs are not HDDs so DO keep backups ALWAYS handy (Score:2)
I only have experience with consumer-grade SSDs and not with enterprise ones. As for consumer SSDs, most of the ones I've used or maintained caused no problems. But I recall one HP-made drive that used to crash after about a year: total data loss after a year of usage. Reformat and the drive was OK; another year passed, then another crash and data loss. As it turned out, the disk had some encryption procedures in firmware which were faulty. A firmware upgrade (hopefully) fixed it but also said firmware updat
For anyone bothered by nonsense constructions... (Score:2)
FTFS:
Is "SSD drive" grammatically anything like "PIN number"?
Re: (Score:3)
"we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel"
???? SERIOUSLY???
While poorly written, I think the author was suggesting that any model of SSD for which the Linux kernel has specific special handling logic should be avoided. In my opinion, it is not an unreasonable statement.
Re: (Score:2)
While poorly written, I think the author was suggesting that any model of SSD for which the Linux kernel has specific special handling logic should be avoided. In my opinion, it is not an unreasonable statement.
It probably is an unreasonable statement. If Linux has special logic to handle the drive, then someone else probably already had the problem and now there's a fix in so it probably won't happen to you.
Re:trim (Score:4, Interesting)
While poorly written, I think the author was suggesting that any model of SSD for which the Linux kernel has specific special handling logic should be avoided. In my opinion, it is not an unreasonable statement.
It probably is an unreasonable statement. If Linux has special logic to handle the drive, then someone else probably already had the problem and now there's a fix in so it probably won't happen to you.
Perhaps. But if the drive was broken and someone had to write special software to fix it, how can you be sure that it was fixed correctly and completely? Can you also be sure that the "fix" works for all versions of firmware on the drive? While you might be confident of these things, I would suggest that it would be better to use a drive that follows the standards and doesn't require special code to make it work right. Granted that as always, your mileage may vary -- and it could vary in either direction.
TRIM vs NCQ (Score:2)
But if the drive was broken and someone had to write special software to fix it, how can you be sure that it was fixed correctly and completely? Can you also be sure that the "fix" works for all versions of firmware on the drive?
Because the fix is relatively simple.
To put in general terms:
- The problem is that the drive advertises a bunch of features. Linux tries to use them. But the firmware is buggy and the features don't work or aren't even implemented.
- The fix is to ignore any advanced feature even if advertised by firmware. Stick to only the small subset of features that are also used in Windows.
e.g:
- the most frequent problem with trim is that the device advertises supporting TRIM with NCQ (= reordering of commands).
(the late
Re: (Score:2)
It will break if some model of drive from the same manufacturer is also buggy but the model number happens not to match the list of regexes. That's why they recommend steering clear. If the regex match correctly keeps up with all the buggy drives, there's no problem.
Re: (Score:2)
It will break if some model of drive from the same manufacturer is also buggy but the model number happens not to match the list of regexes.
Right, so that's another reason not to buy a disk which isn't mentioned in kernel errata. Any new device may have unknown bugs. Thanks for really driving my point home.
Re: (Score:3)
it's a long list, apparently... the whitelist is shorter. I'd go with the Apple one. They seem to have their QA department high and tight.
Re: (Score:3)
Your RAID controller would have to pass the TRIM commands to the SSDs in the array for this bug to show up. The controller simply having TRIM support doesn't mean it actually passes it to drives that are part of an array.
Intel controllers since the Z-77 do pass TRIM along to the drives in the array, but only for RAID 1 and RAID 0.
TRIM+NCQ (Score:3)
it's not trim by itself that is problematic.
it's the combined use with NCQ (= reordering of commands).
The latest firmware (the one that fixes the speed decay) has started to falsely advertise support for this combination, whereas the drive doesn't actually support it.
The drive isn't actually able to re-order TRIM commands, and the wrong bit might end up being erased due to NCQ.
So this bug will only show up if:
- your drive is a Samsung 840 EVO (850 aren't affected by the speed decay and didn't get the faulty up
Re:Btrfs? (Score:4, Informative)
COW doesn't solve the problem that TRIM solves.
Once you write over the entire drive once, then all blocks of flash are dirty and MUST be erased before any new writes can take place. At this point, you can't even write the meta data without a sector erase, then you can write to it ... just to tell it that you've added another ref to an existing block.
With TRIM, blocks are erased when they are no longer used, so they do not need an erase cycle before writing to them.
I don't use BTRFS, but do use ZFS and it most certainly benefits from TRIM on an active drive, which is certainly what all your SSDs are going to be.
Re: (Score:2)
Re: (Score:2)
ZFS is COW and still cannot magically eliminate random writes due to fragmentation as it gets near full. I daresay all filesystems have this problem to a degree. Reserved blocks and over-provisioning in no way can prevent it.
Reserved blocks are solely present to allow bad-block replacement. Over-provisioning adds to the general pool of available blocks, but as soon as they are used they have to be erased before re-use, just like any other block. As your writes cumulatively total multiples of the capacity o
Re: (Score:2)
Re: (Score:2)
Couple of things: first, disk never gets near full because of root reservation. Second - there is implicit trim when you overwrite a block with new data (think about it). So my point still stands.
He never claimed the drive got full. He said that the issue occurs when you've written a drive's worth of data. You can write a drive's worth of data without filling up even 1% of the drive, if you just overwrite one logical block in-place repeatedly.
Of course the drive erases a block when you overwrite it. The whole point of trim is that it improves performance when this is done. If you overwrite a 512-byte block that isn't trimmed the drive has to erase the surrounding 4K worth of blocks, and then rew
Re:Mentioned in a bad way by the Linux kernel? (Score:4, Funny)
I could shove a dead Rat in the CPU slot and Windows would work with errors galore while Linux would actually tell me I have a dead Rat in the CPU slot. Linux exposes bad Hardware.