
Optimizing Linux Systems For Solid State Disks

Posted by Soulskill
from the bit-by-bit dept.
tytso writes "I've recently started exploring ways of configuring Solid State Disks (SSDs) so they work most efficiently in Linux. In particular, Intel's new 80GB X25-M, which has fallen to a street price of around $400 and thus within my toy budget. It turns out that the Linux storage stack isn't set up well to align partitions and filesystems for use with SSDs, RAID systems, and 4k-sector disks. There is also some interesting configuration and tuning we need to do to avoid potential fragmentation problems with the current generation of Intel SSDs. I've figured out ways of addressing some of these issues, but it's clear that more work is needed to make it easy for mere mortals to efficiently use next-generation storage devices with Linux."
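The alignment problem the summary alludes to comes down to simple arithmetic: a partition must start on a multiple of the drive's erase-block (or RAID-stripe) size, and the classic DOS convention of starting the first partition at sector 63 never does. A minimal sketch, assuming a 128KiB erase block purely for illustration (real drives, including the X25-M, may differ and often don't report it):

```python
# Sketch of the partition-alignment arithmetic. The 128KiB erase-block
# size used below is an assumed figure for illustration only.

SECTOR = 512  # bytes per logical sector

def is_aligned(start_sector: int, erase_block: int) -> bool:
    """True if a partition starting at start_sector begins on an erase-block boundary."""
    return (start_sector * SECTOR) % erase_block == 0

def next_aligned(start_sector: int, erase_block: int) -> int:
    """Smallest sector number >= start_sector that is erase-block aligned."""
    spb = erase_block // SECTOR  # sectors per erase block
    return ((start_sector + spb - 1) // spb) * spb

# DOS-style partitioning starts the first partition at sector 63,
# which is misaligned for any power-of-two erase block:
print(is_aligned(63, 128 * 1024))     # False
print(next_aligned(63, 128 * 1024))   # 256
```

A misaligned start means every filesystem block that straddles an erase-block boundary costs two read-modify-write cycles instead of one, which is why the alignment matters for SSDs, RAID stripes, and 4k-sector disks alike.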
  • by Anonymous Coward on Saturday February 21, 2009 @11:29AM (#26940933)
    Yes, we do need progress in that area. However, for those of us who require better-than-average data security, the read/write behaviour of SSDs makes the devices extremely vulnerable to forensic analysis and recovery of data that the owner believes to be inaccessible: 'secure wiping', or the lack thereof, is the issue. As I understand it, 'secure wiping' programs fail to do their job on SSDs. It's been reported among 'criminals' that SSDs are a 'forensic analyst's dream come true', and so it must be for corporate spies and others with a yen for theft of private data.
  • Is it only linux? (Score:4, Interesting)

    by jmors (682994) on Saturday February 21, 2009 @11:37AM (#26940979)
    This article makes me wonder if any OS is really properly optimized for SSDs. Has there been any analysis of whether or not Windows machines properly optimize the use of solid state disks? Perhaps the problem goes beyond just Linux?
  • by Anonymous Coward on Saturday February 21, 2009 @12:31PM (#26941355)

    As other components become less noisy, the "solid state" electronics' acoustic noise becomes audible. It isn't necessarily faulty electronics, just badly designed with no consideration for vibrations due to electromagnetic fields changing at audible frequencies. These fields subtly move components and this movement causes the acoustic noise. Most often it is a power supply or regulation unit which causes high pitched noises. Old tube TV sets often emit noise at the line frequency of the TV signal (ca. 15.6kHz for PAL, ca. 15.8kHz for NTSC).

  • by DragonTHC (208439) <Dragon&gamerslastwill,com> on Saturday February 21, 2009 @12:32PM (#26941367) Homepage Journal

    Don't forget android.

  • by v1 (525388) on Saturday February 21, 2009 @12:34PM (#26941395) Homepage Journal

    I don't think this is going to be a significant problem when compared to normal seek time problems.

    Let's say we have 100k of data to read. 512-byte blocks would require 200 reads; 4k blocks would require 25 reads.

    For rotating discs: If the data is contiguous, we have to hope that all the blocks are on the same track. If they are, then there is 1 (potentially very costly) seek to get to the track with all the blocks on it. The cost of the seek depends on the track it's going to, the track it's on, and whether or not the drive is sleeping or spun down. Otherwise we also get to do another very short seek to the next adjacent track, which adds a bit of time. Worst-case scenario: all 200 blocks are on different tracks, scattered randomly across the platter, requiring 200 seeks. Ouch ouch ouch.

    For SSDs: What matters is the number of cells we have to read. Cells will be 4k in size. All seek times are essentially zero. Best-case scenario: all the data is contiguous and the start block is at the start of a cell, so read time boils down to how fast the flash can read 25 cells. Worst-case scenario: the data is 100% fragmented, such that each of the 200 512-byte blocks resides in a different cell, requiring 200 cell reads (an 8-fold increase in time required). There is also overhead in copying the 512-byte data out of each buffer and assembling things, but that time is negligible for this comparison.

    While the 8x time increase (order N) looks significant, it's important to compare the probabilities involved, and just how bad things get. The most important difference between how these two drives react is the space between fragments. The 'worst case' for the SSD, 100% fragmentation, is highly unlikely. I don't even want to think about what a spinning disc would do if asked to perform a head seek for 100% of the blocks in, say, a 1MB file. The read head would probably sing like a tuning fork at the very least. 2000 cell reads compared to 2000 seeks: the SSD wins handily every single time, even if the tracks on the disc are close.

    If the spacing between fragments is anything near normal, say 30-100k, then there will be some seeking going on with the disc, and some wasted cell reads with the SSD; but comparing an extra cell read against an extra head seek, again the SSD wins hands down. The advantage of the SSD actually shrinks as fragmentation goes down, because most fragments cost the disc a head seek, each of which significantly widens the time gap. Also, a spinning disc will read in the blocks much faster than the cells of an SSD.

    I realize the OP was describing more the possibility of "not as much bang for the buck as you were expecting" due to fragmentation, and I know the above dwells more on comparing the two than on what happens to the SSD. But if you consider the effects of fragmentation on a spinning disc, and then weigh that impact against an SSD's, it's easy to see that the fragmentation that sent you running for the defrag tool yesterday may not even be noticeable with an SSD. So I'd call this a "non-issue".

    What I'm waiting for is them to invest the same dev time in read speeds as write speeds. SSDs don't appear to be doing any interleaved reads - they're doing it for the writes because they're so slow. Though at this point I wonder if read speeds are just plain running into a bus speed limit with the SSDs?
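The parent's back-of-the-envelope comparison can be sketched as a toy model. The 4k cell size and 512-byte block size are the parent's own assumptions, not measured figures for any particular drive:

```python
# Toy model of the parent's arithmetic: cell reads needed for a 100k
# file on an SSD, best case (fully contiguous, cell-aligned) versus
# worst case (every filesystem block lands in a different cell).
# Block and cell sizes follow the parent's assumptions.
import math

BLOCK = 512        # bytes per filesystem block
CELL = 4 * 1024    # bytes per flash cell (assumed)

def best_case_reads(file_bytes: int) -> int:
    """Contiguous, cell-aligned data: one read per cell spanned."""
    return math.ceil(file_bytes / CELL)

def worst_case_reads(file_bytes: int) -> int:
    """100% fragmented: every 512-byte block in its own cell."""
    return file_bytes // BLOCK

size = 100 * 1024
print(best_case_reads(size))                            # 25
print(worst_case_reads(size))                           # 200
print(worst_case_reads(size) / best_case_reads(size))   # 8.0
```

The 8x worst-to-best ratio is the whole penalty for the SSD; the spinning disc's worst case adds a mechanical seek per fragment on top of the same block reads, which is why the comparison favors the SSD so heavily at high fragmentation.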

  • by NekoXP (67564) on Saturday February 21, 2009 @12:36PM (#26941401) Homepage

    Because Intel and the rest want to keep their wear-leveling algorithm and proprietary controller as much of a secret as possible so they can try to keep on top of the SSD market.

    Moving wear-levelling into the filesystem - especially an open source one - also effectively defeats the ability to change the low-level operation of the drive for each flash chip. And of course, maintaining a filesystem and a special MTD driver for *every single SSD manufactured*, whenever they change flash chips or tweak the controller, could get unwieldy.

    Backing them behind SATA is a wonderful idea, but this reliance on CHS values I think is what's killing it. Why is the Linux block subsystem still stuck in the 20MB hard-disk era like this?

  • Re:Is it only linux? (Score:1, Interesting)

    by Anonymous Coward on Saturday February 21, 2009 @01:04PM (#26941641)

    Somebody please mod the parent up to 5.

    Yeah, hard disk manufacturers.

    Since they moved to large disks which require LBA, they've been fudging the CHS values returned by the drive to get the maximum size available to legacy operating systems. Since when did a disk have 255 heads and 63 sectors per track? Never. It doesn't even make sense anymore when most hard disks are single-platter (and therefore have 1 or 2 heads) and SSDs don't have heads at all.

    What they need to do is define a new command structure for accurately reporting the best layout of the disk - on an SSD this would report the erase block size; on a hard disk, how many sectors are in a cylinder - without fucking around with some legacy value designed in the 1980s.

    With the drive electronics as complex as they are nowadays, you'd think the OS wouldn't need to know much. Just give it a couple of stats to allow the filesystem to align properly and stop with all this CHS translation.
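The CHS fudge described above can be made concrete. 255 heads and 63 sectors per track are the conventional fudged values large drives report; the standard CHS-to-LBA translation shows why partitions laid out on that fake geometry can never align with power-of-two boundaries:

```python
# The conventional fudged geometry returned by large drives: 255 heads,
# 63 sectors per track. These values exist only to maximize the capacity
# addressable by legacy CHS software; no real drive is built this way.

HEADS, SECTORS = 255, 63   # sectors are numbered from 1 in CHS

def chs_to_lba(c: int, h: int, s: int) -> int:
    """Standard CHS -> LBA translation for the fudged geometry."""
    return (c * HEADS + h) * SECTORS + (s - 1)

# DOS-era tools start the first partition at CHS (0, 1, 1), i.e. LBA 63,
# a start no power-of-two erase block or RAID stripe can align with.
print(chs_to_lba(0, 1, 1))   # 63

# A full fudged "cylinder" is 255 * 63 = 16065 sectors, also not a power
# of two, so cylinder-aligned partition boundaries stay misaligned too.
print(HEADS * SECTORS)       # 16065
```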

  • by couchslug (175151) on Saturday February 21, 2009 @02:34PM (#26942297)

    If it's an older laptop or the mechanical hard disk died, go for it. Addonics make SATA CF adapters so you are not restricted to IDE CF adapters.

  • Re:ZFS L2ARC (Score:1, Interesting)

    by Anonymous Coward on Saturday February 21, 2009 @06:14PM (#26944147)

    I assume SSD-augmented rotating rust is the way to go, so technologies like L2ARC will be more widespread in the future. As you correctly state, you need a lot of flash storage to substitute for all this magnetic storage, at least as long as there aren't further breakthroughs like octa- or hexa-bit MLC with SLC reliability.

    It would make more sense to use flash just to accelerate the working set, instead of storing seldom-used data on SSD. I don't like this attitude of big storage vendors simply selling flash drives instead of rotating-rust drives without giving us a good way to manage them.

    BTW: I want both in my notebook. A flash drive for my working set (current emails, current documents and so on), and disk drives for long-term storage and mass data like video, images and music ... but I don't want to manage it.

  • by harry666t (1062422) <harry666t@gmail. c o m> on Saturday February 21, 2009 @07:38PM (#26944743)
    That was my idea when I proposed an "object storage system" here on /. a few months ago: associate type and metadata with every file, making them more "object-like" (as in object-oriented programming). The storage system would know the behaviour of each object (whether it is likely to grow, or more likely to be modified in place, or probably not modified at all, etc.), and would choose the most efficient way of storing each particular kind of data. I also proposed separate namespaces for each process, capability-based security, dropping paths in favour of non-hierarchical tags, and a few other "revolutionary" ideas that all had only one downside: nobody's going to break backwards compatibility, especially while the current system still "just works".
  • by Cassini2 (956052) on Saturday February 21, 2009 @10:20PM (#26945683)

    So many choices!

    This could be fun. Here are some more suggestions:

    - Welder - The little chips don't last long against a good arc welder.
    - 600 VAC - Why stop at a wall outlet?
    - Tesla Coil - 200 kV is better than 600 VAC
    - Lightning Rod - Why stop at 200 kV?
    - Oxy-acetylene Torch - higher temperatures
    - Plasma Cutter - even higher temperatures
    - NdYAG Laser - Etch your name into the remains of the flash chip.
    - Chew Toy for Dog - Don't underestimate some of those canines, although USB keys might not be good for them.
    - Log-Splitting Practice. How good are you at aiming that Axe?
    - Place USB in Cement Footings of a building. Do the mob thing.
    - Rock crusher
    - Grinding Machine
    - Wood chipper / pulper
    - Cement kiln
    - Blast Furnace
    - Industrial Press - Terminator Style!

    I'm pretty sure that some of these machines can destroy industrial quantities of USB keys with little difficulty. Cement kilns and rock crushers can destroy just about anything. It would be interesting to see the resulting crushed rock in a piece of cement, though. It would be colorful.

  • by tytso (63275) * on Sunday February 22, 2009 @07:05PM (#26952243) Homepage

    I use 1GB for /boot because I'm a kernel developer and I end up experimenting with a large number of kernels (yes, on my laptop --- I travel way too much, and a lot of my development time happens while I'm on an airplane). In addition, SystemTap requires compiling kernels with debuginfo enabled, which makes the resulting kernels gargantuan --- it's actually not that uncommon for me to fill my /boot partition and need to garbage-collect old kernels. So yes, I really do need 1GB for /boot.

    As far as LVM goes, of course I use more than a single volume; separate LVs get used for test filesystems (I'm a filesystem developer, remember). More importantly, LVM allows you to take snapshots of your live filesystem and then run e2fsck on the snapshot volume --- if the e2fsck is clean you can then drop the snapshot volume and run "tune2fs -C 0 -T now /dev/XXX" on the file system. This eliminates boot-time fscks while still allowing me to make sure the file system is consistent. And because I'm running e2fsck on the snapshot, I can be reading e-mail or browsing the web while the e2fsck runs in the background. LVM is definitely worth the overhead (which isn't much, in any case).
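The snapshot-and-check workflow described above boils down to a short command sequence. A sketch that builds (but deliberately does not execute) the implied commands; the volume group and LV names are placeholders, not anything from the original post:

```python
# The snapshot-and-fsck workflow described above, sketched as the
# command sequence it implies. VG/LV names are placeholders; this
# constructs the commands without running them.

def snapshot_fsck_commands(vg="vg0", lv="root", snap="rootsnap", size="1G"):
    dev = f"/dev/{vg}/{lv}"
    snapdev = f"/dev/{vg}/{snap}"
    return [
        # 1. Snapshot the live filesystem
        ["lvcreate", "--snapshot", "--size", size, "--name", snap, dev],
        # 2. Check the frozen snapshot read-only (-f force, -n no changes)
        ["e2fsck", "-fn", snapdev],
        # 3. If clean, drop the snapshot
        ["lvremove", "-f", snapdev],
        # 4. Reset the mount count and last-check time on the live fs
        ["tune2fs", "-C", "0", "-T", "now", dev],
    ]

for cmd in snapshot_fsck_commands():
    print(" ".join(cmd))
```

The snapshot only needs enough space to absorb writes that land on the origin while the check runs, which is why a small snapshot LV suffices even for a large filesystem.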
