Data Storage Software Intel Linux

Optimizing Linux Systems For Solid State Disks

tytso writes "I've recently started exploring ways of configuring Solid State Disks (SSDs) so they work most efficiently in Linux. In particular, Intel's new 80GB X25-M, which has fallen to a street price of around $400 and thus within my toy budget. It turns out that the Linux storage stack isn't set up well to align partitions and filesystems for use with SSDs, RAID systems, and 4k-sector disks. There is also some interesting configuration and tuning we need to do to avoid potential fragmentation problems with the current generation of Intel SSDs. I've figured out ways of addressing some of these issues, but it's clear that more work is needed to make it easy for mere mortals to efficiently use next-generation storage devices with Linux."
  • by wjh31 ( 1372867 ) on Saturday February 21, 2009 @10:27AM (#26940921) Homepage
    I think the bigger challenge will be getting mere mortals a $400 toy budget to afford the SSD
    • Re: (Score:2, Insightful)

      Well, they will obviously go down in price eventually. The real price issue won't be affordability but rather value. Do most consumers out there really want what would seem to average out to a slightly faster drive, or an order of magnitude or two more storage? There have always been fast-drive solutions in the past; they have never been very popular, and quickly became obsolete. Eventually some sort of SSD will take over the market, but I don't believe this sort of compromised experience business model
      • by Average ( 648 )

        Sure. There are *lots* of considerations beyond speed to want SSDs.

        First is battery life. Batteries suck. Laptops pulling 5 or 6 watts total make that suck more bearable. SSDs are part of that.

        There's also noise. Hard drives have gotten much quieter. But in a dead-silent conference room, I want dead-silence.

        Even form factor is an issue. A 2.5" drive is a notable chunk of a small notebook. 1.8" drives are, generally, quite slow. SSDs can be worked into the design.

        • Re: (Score:3, Informative)

          by piripiri ( 1476949 )

          Sure. There are *lots* of considerations beyond speed to want SSDs

          And SSD drives are also shock-resistant.

        • For dead-silence you might be better off getting an LED backlight. In my laptop I can't hear the hard drive over the whine of the backlight converter.
        • Sure. There are *lots* of considerations beyond speed to want SSDs.

          Another example: I have a tiny NSLU2 network appliance that I use as a music server. In the out-of-the-box configuration, it runs Linux from a ROM, but you can add an external drive via a USB cable and boot Linux off of that. It doesn't have SATA, so that wasn't an option.

          I'm not sure why this guy paid $400 for an 80 GB SSD. I just upgraded my music server to a 64 GB SSD, and it only cost $100. Maybe the one he got is a fancier, faster d

            I'm not sure why this guy paid $400 for an 80 GB SSD. I just upgraded my music server to a 64 GB SSD, and it only cost $100. Maybe the one he got is a fancier, faster drive?

            Price/GB for SSDs seems to be largely proportional to the number of write operations per second the SSD can handle. Once a handful of manufacturers solve that particular puzzle, I expect prices will drop significantly.

      • I've been wrestling this idea around as a sound studio solution, and it seems that an external storage unit makes the most sense, with a DRAM card for the currently working files. Almost affordable, anyway.

    • by Hatta ( 162192 )

      You can buy a 32GB SSD for less than $100 [oempcworld.com] today. Is that within the budget of mere mortals?

      • by gmuslera ( 3436 )
        Not all SSDs are equal. Why should you pay US$400 for the Intel X25-M if you can get another for under US$100? Check this AnandTech [anandtech.com] review, which spent a lot of time bashing JMicron JMF602-based SSDs.
  • Is it only linux? (Score:4, Interesting)

    by jmors ( 682994 ) on Saturday February 21, 2009 @10:37AM (#26940979)
    This article makes me wonder if any OS is really properly optimized for SSDs. Has there been any analysis as to whether or not Windows machines properly optimize the use of solid state disks? Perhaps the problem goes beyond just Linux?
    • Re: (Score:3, Informative)

      by Jurily ( 900488 )

      Unfortunately the default 255 heads and 63 sectors is hard-coded in many places in the kernel, in the SCSI stack, and in various partitioning programs; so fixing this will require changes in many places.

      Looks like someone broke the SPOT rule.

      As for other OSes:

      Vista has already started working around this problem, since it uses a default partitioning geometry of 240 heads and 63 sectors/track. This results in a cylinder boundary which is divisible by 8, and so the partitions (with the exception of the first, which is still misaligned unless you play some additional tricks) are 4k aligned.
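
      To see why those two geometries behave differently, here is a minimal arithmetic sketch (not from the article; it assumes the usual 512-byte logical sectors and 4 KiB flash pages):

```python
# Compare the legacy 255x63 CHS geometry with Vista's 240x63 geometry,
# assuming 512-byte logical sectors and 4 KiB flash pages.
SECTOR_BYTES = 512
PAGE_BYTES = 4096

for heads, sectors in [(255, 63), (240, 63)]:
    cylinder_sectors = heads * sectors               # sectors per cylinder
    cylinder_bytes = cylinder_sectors * SECTOR_BYTES
    aligned = (cylinder_bytes % PAGE_BYTES == 0)
    print(f"{heads}x{sectors}: {cylinder_sectors} sectors/cylinder, "
          f"{'4 KiB aligned' if aligned else 'misaligned'}")

# 255x63 -> 16065 sectors/cylinder = 8,225,280 bytes, not a multiple of 4096
# 240x63 -> 15120 sectors/cylinder = 7,741,440 bytes, a multiple of 4096
```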

      • by NekoXP ( 67564 ) on Saturday February 21, 2009 @11:29AM (#26941339) Homepage

        Yeah, hard disk manufacturers.

        Since they moved to large disks which require LBA, they've been fudging the CHS values returned by the drive to get the maximum size available to legacy operating systems. Since when did a disk have 255 heads? Never. It doesn't even make sense anymore when most hard disks are single-platter (and therefore have 1 or 2 heads) and SSDs don't have heads at all.

        What they need to do is define a new command structure for accurately determining the best layout on the disk - on an SSD this would report the erase block size or so; on a hard disk, how many sectors are in a cylinder - without fucking around with some legacy value designed in the 1980s.

        • by Dr. Ion ( 169741 )

          A bigger problem is our reluctance to move off 512-byte sectors. Who needs that fine a granularity of LBA?

          That's two sectors per kilobyte... dating back to the floppy disk. And we still use this quantum on TB hard disks.

    • Re: (Score:3, Informative)

      by mxs ( 42717 )

      Of course it goes beyond just Linux. Microsoft is aware of the problem and working on improving its SSD performance (they already did some things in Vista as the article states, and Windows 7 has more in store; google around to find a few slides from WinHEC on the topic).

      The problem with Windows w.r.t. optimizing for SSDs is that it LOVES to do lots and lots of tiny writes all the time, even when the system is idle (and more so when it is not). Try moving the "prefetch" folder to a different drive. Try movin

    • Re:Is it only linux? (Score:5, Informative)

      by tonyr60 ( 32153 ) on Saturday February 21, 2009 @03:00PM (#26942973)

      Sun's new 7000 series storage arrays use them, and that series runs OpenSolaris. So I guess Solaris has at least some SSD optimisations... http://www.infostor.com/article_display.content.global.en-us.articles.infostor.top-news.sun_s-ssd_arrays_hit.1.html [infostor.com]

    • There is no major OS that makes anything remotely like an appropriate use of persistent RAM. SSD is one application of persistent RAM, but it's a terrible one, which ignores most of the benefits of persistent RAM. I want to treat flash as hierarchical memory, not as disk. I want the OS to support me not with inconsequential filesystem optimizations, but by implementing cache-on-write with an asynchronous write-back queue for mapped flash memory. I want to map allocated regions of a terabyte flash array

  • If I mount /home on a separate drive, (good to do when upgrading) the rest of the Linux file system fits nicely on a small SSD.

    • If I mount /home on a separate drive, (good to do when upgrading) the rest of the Linux file system fits nicely on a small SSD.

      I would move /tmp to either a RAM disk or a hard drive. There is no point in having tmp files using up the lifespan of your SSD, especially after you just moved /home to extend its life. Also, you could move some of the stuff in /var to a hard drive or ramdisk. Good candidates might be /var/tmp and /var/log. Alternatively, you could just move the entire /var hierarchy to a hard d
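
      For anyone wanting to try this, a minimal /etc/fstab sketch of the layout described above; the device names and mount options are illustrative assumptions, not something taken from this thread:

```
# Hypothetical layout: the SSD holds the root filesystem, a conventional
# SATA disk (/dev/sdb1) takes the write-heavy /var hierarchy, and /tmp
# lives in RAM so temporary files never touch the SSD at all.
/dev/sda1   /      ext3    noatime,errors=remount-ro     0  1
/dev/sdb1   /var   ext3    defaults                      0  2
tmpfs       /tmp   tmpfs   defaults,noatime,mode=1777    0  0
```

      The noatime option on the SSD root is the same access-time tweak mentioned further down in this discussion.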

      • Good point, I will have to think about that...

        Well, I fired up Ubuntu with the new configuration and I wasn't disappointed - WOW!

        Booting is lightning quick - I am still doing a lot of downloads so I haven't had a chance to run some real performance tests, but from what I have seen so far the results are impressive.

  • Most of us can't afford to worry about this, but does the Fusion-io suffer from this issue?

  • by Anonymous Coward on Saturday February 21, 2009 @11:06AM (#26941177)

    > Vista has already started working around this problem, since it uses a default partitioning geometry of 240 heads and 63 sectors/track. This results in a cylinder boundary which is divisible by 8, and so the partitions (with the exception of the first, which is still misaligned unless you play some additional tricks) are 4k aligned. So this is one place where Vista is ahead of Linux...

    Although the technology it is used in is repugnant, NTFS has always been the One True Filesystem. It descended from DIGITAL's ODS2 (On Disk Structure 2) which traces back to the original Five Models (PDP 1, 8, 10, 11 and 12). You see, ODS was written by passionate people with degrees and rich personal lives in Massachusetts who sang and danced before the fall of humanity to the indignant Gates series who assimilated their young wherever possible and worked them into early graves during his epic battle with the Steves before the UNIX enemy re-emerged after a 25 year sleep and nuked the United States, draining all of its technological secrets to the other side of the world. Gates, realizing what he'd done, now travels the universe seeking to rebuild his legacy by purifying humanity while the Steve series attempts to rebuild itself. Some of the original Five are still around, left to logon to Slashdot and witness what's left of the shadow of humanity still in the game as they struggle blindly around in epic circles indulging new and different ways to steal music, art and technology to make up for their lack of creativity long ago bred out of them by the Gates series.

    • I have mod points, but cannot find the "Totally Bonkers" mod...
  • by jensend ( 71114 ) on Saturday February 21, 2009 @11:07AM (#26941187)

    SSDs gradually gain more and more sophisticated controllers which do more and more to try to make the SSD seem like an ordinary hard drive, but at the end of the day the differences are great enough that they can't all be plastered over that way (the fragmentation/long-term-use problems in the linked story are a good example). I know that (at present - this could and should be fixed) making these things run on a regular hard-drive interface and tolerate being used with a regular FS is important for Windows compatibility, but it seems like a lot of cost could be avoided and a lot of performance gained by having a more direct flash interface and using flash-specific filesystems like UBIFS, YAFFS2, or LogFS. I have to wonder why vendors aren't pursuing that path.

    • by NekoXP ( 67564 ) on Saturday February 21, 2009 @11:36AM (#26941401) Homepage

      Because Intel and the rest want to keep their wear-leveling algorithm and proprietary controller as much of a secret as possible so they can try to keep on top of the SSD market.

      Moving wear-levelling into the filesystem - especially an open source one - also effectively defeats the ability to change the low-level operation of the drive for each flash chip. And of course, having a filesystem and a special MTD driver for *every single SSD drive manufactured*, updated whenever they change flash chips or tweak the controller, could get unwieldy.

      Backing them behind SATA is a wonderful idea, but this reliance on CHS values I think is what's killing it. Why is the Linux block subsystem still stuck in the 20MB hard-disk era like this?

      • > and of course, having a filesystem and a special MTD driver for
        > *every single SSD drive manufactured* when they change flash
        > chips or tweak the controller, could get unwieldy.

        Large numbers of flash chips can be supported by the MTD CFI drivers:

        http://en.wikipedia.org/wiki/Common_Flash_Memory_Interface [wikipedia.org]

        Something similar could be done for SSDs too, except they've chosen HDD standards as they are a better fit.

        Mike

      • Same reason it doesn't reasonably support hierarchical persistent RAM: everybody who wants to do it is too busy with other work.

      • Re: (Score:3, Insightful)

        by gillbates ( 106458 )

        Why is the Linux block subsystem still stuck in the 20MB hard-disk era like this?

        As one who had to tune the performance of hard drives at the kernel level, I can say with some authority that the Linux block subsystem is not at all stuck in the 20MB hard-disk era. In fact, everything is logical blocks these days, and it's the filesystem driver and IO schedulers which determine the write sequences. The block layer is largely "dumb" in this regard, and treats every block device as nothing more than a la

  • . . . which runs on the Nokia N800/N810 "Internet Tablets" (www.maemo.org). They might have done some tweaking, since this is Linux running on SSDs.

    • Re: (Score:3, Interesting)

      by DragonTHC ( 208439 )

      Don't forget android.

    • by ADRA ( 37398 )

      Maemo and several other embedded systems have been using flash-based disk storage for years. The problem is that an SSD isn't presented as a flash storage device; it's a hard-drive interface wrapped around a flash device.

      Since Linux can't see the flash devices themselves, it can't properly treat the drive as flash.

  • When I saw the headline, I was thinking not so much of the fragmentation issues as of the repeated re-writing of logs and other small, frequently accessed files that SSDs are susceptible to (maximum # of rated read-write cycles). Have there been any developments in that area?
    • by nedlohs ( 1335013 ) on Saturday February 21, 2009 @11:53AM (#26941529)

      It will outlast a standard hard drive by orders of magnitude so it's completely not an issue.

      With wear leveling and the technology now supporting millions of writes it just doesn't matter. Here's a random data sheet: http://mtron.net/Upload_Data/Spec/ASIC/MOBI/PATA/MSD-PATA3035_rev0.3.pdf [mtron.net]

      "Write endurance: >140 years @ 50GB write/day at 32GB SSD"

      Basically the device will fail before it runs out of write cycles. You can overwrite the entire device twice a day and it will last longer than your lifetime; of course it will fail due to other issues before then anyway. (See the arithmetic sketch below.)

      Can there be a mention of SSDs without this outdated garbage being brought up?
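
      A quick back-of-the-envelope check of the data-sheet figure quoted above, as a minimal Python sketch (only the 50GB/day, 32GB, and 140-year numbers come from the parent comment; the rest is arithmetic):

```python
# Check the quoted endurance figure: ">140 years @ 50GB write/day at 32GB SSD".
capacity_gb = 32
writes_per_day_gb = 50
years = 140

total_written_gb = writes_per_day_gb * 365 * years        # ~2,555,000 GB
full_device_overwrites = total_written_gb / capacity_gb   # ~80,000 overwrites

print(f"total data written:     {total_written_gb:,} GB")
print(f"full-device overwrites: {full_device_overwrites:,.0f}")
# With ideal wear leveling that works out to roughly 80,000 erase cycles
# per cell, in the ballpark of rated flash endurance for drives of that era.
```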

      • by A beautiful mind ( 821714 ) on Saturday February 21, 2009 @01:00PM (#26942047)
        There are a few tricks up the manufacturers' sleeves to make this sound slightly better than it really is:

        1. Large block size (120k-200k?) means that even if you write 20 bytes, the disk physically writes a lot more. For logfiles and databases (quite common on desktops too - think of index DBs and SQLite in Firefox storing the search history...) where tiny amounts of data are modified, this can add up rapidly. Something writes to the disk once every second? That's 16.5GB / day, even if you're only changing a single byte over and over (see the sketch after this list).

        2. Even if the memory cells do not die, the large block size means fragmentation will occur (most of the cells will have only a small amount of space used in them). There have been a few articles reporting that even devices with advanced wear-leveling technology like Intel's exhibit a large performance drop (less than half of the read/write performance of a new drive of the same kind) after a few months of normal usage.

        3. According to Tom's Hardware [tomshardware.com], unnamed OEMs told them that all the SSD drives they tested under simulated server workloads got toasted after a few months of testing. Now, I wouldn't necessarily consider this accurate or true, but I sure as hell would not use SSDs in a serious environment until this is proven false.
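
        The write-amplification arithmetic behind point 1 is easy to reproduce. A minimal sketch, assuming a 192 KiB erase block (an arbitrary value inside the 120k-200k range mentioned above):

```python
# One tiny write per second, all day, where the drive rewrites a full
# erase block each time. 192 KiB is an assumed block size within the
# 120k-200k range mentioned in the parent comment.
erase_block_bytes = 192 * 1024
writes_per_day = 24 * 60 * 60              # one write per second

physical_bytes_per_day = erase_block_bytes * writes_per_day
print(f"{physical_bytes_per_day / 1e9:.1f} GB/day")   # ~17.0 GB/day
# The application changed only one byte per write, yet the drive ends up
# writing on the order of the ~16.5 GB/day figure quoted above.
```
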
      • Re: (Score:3, Informative)

        All nice and dandy, but these figures aren't exactly honest. In a normal scenario your filesystem consists in large part of static data. Those blocks/cells are never rewritten. Therefore the writes (for logfiles etc.) are concentrated on a small part of the disk, wearing it out rather more quickly.

        Having had a few Compact Flash disks wear out in the recent past, I'm not exactly anxious to replace my server disks with SSDs.
        • I'd expect that wear-leveling algorithms look for that kind of discrepancy, move static files to sectors that are getting heavier use, and start putting heavily written files onto the sectors that previously contained static info. That would be pretty easy to do. At least OS X moves files around according to how often they're being used (but the OS X technology was designed for optimizing platters).
        • This is not "informative", it's "crap" and also "wrong". Modern SSDs move data even when it isn't written. Therefore there is no static data from the flash controller's point of view.
  • From what I can scrape together quickly off of the Internet (IANASE - I am not a software engineer), the biggest difference seems to be the lack of a need for error checking, disk defrag, etc. Since a normal spinning HDD does not actually delete a file but just removes the markers, the filesystem treats all areas the same and does the same things to both real and non-real data to keep the disk state sane. On an SSD all of this leads to a lot of unneeded disk usage and premature degradation of the dri

    • by ADRA ( 37398 )

      Flash devices have the inherent weakness that if you write to the same place on the disk, say, 10,000 times, that part of the disk will stop working.

      It's kind of like a corrupt sector (a piece of the disk) on your regular hard drive, but instead of the failure being caused by drive defects or head crashes, it's based on a write count.

      Why is this a big deal? Say I have a file called foose.txt. I decide that my neat program will open the file, increment a number, then close the file again. It sounds pretty simple, b
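
      To make the foose.txt scenario concrete, here is a minimal Python sketch of the pattern being described (the filename and loop count are purely illustrative): every iteration rewrites the same tiny file, which without wear leveling would keep landing on the same physical flash block.

```python
# Repeatedly rewriting one tiny counter file: without wear leveling, every
# update hits the same physical flash block and burns its limited erase cycles.
def bump_counter(path="foose.txt"):
    try:
        with open(path) as f:
            value = int(f.read().strip() or 0)
    except FileNotFoundError:
        value = 0
    with open(path, "w") as f:      # rewrites the whole (tiny) file
        f.write(str(value + 1))

for _ in range(10_000):             # 10,000 rewrites of the same logical spot
    bump_counter()
```

      A wear-leveling controller remaps those rewrites across many physical blocks, which is why the practical endurance figures quoted elsewhere in this discussion are so much higher.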

      • by tytso ( 63275 ) * on Saturday February 21, 2009 @04:03PM (#26943575) Homepage

        Because of this, I imagine that the author would like Linux devs to better support SSDs by getting non-flash filesystems to support SSDs better than they do today.

        Heh. The author is a Linux dev; I'm the ext4 maintainer, and if you read my actual blog posting, you'll see that I gave some practical things that can be done to support SSDs today just by tuning the parameters given to tools like fdisk, pvcreate, mke2fs, etc., and I talked about some of the things I'm thinking about to make ext4 support SSDs better than it does today...

  • I'm just sitting here thinking. Doesn't an SSD have a preset number of writes in it due to its nature?

    Does it really matter if they spread these writes around on the hard drive when the number of writes the drive is capable of doing is still the same in the end?

    To drastically oversimplify, let's say that each block can be written to twice. Does it really matter if they used up the first blocks on the drive and just spread towards the end of the drive partition with general usage rather than jumping a
    • Re: (Score:3, Informative)

      Say you have 100 cells and can write 10 times to each cell.

      Having every cell written to nine times: 100 * 9 = 900 writes and you still have a completely working disk.

      Writing those 900 writes to just the first cells: you now have 90 defective cells. In fact, since you still have to rewrite the data to working cells, you have lost your data, as there aren't enough working cells left.
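
      The parent's example can be turned into a tiny simulation. A minimal sketch (the 100-cell / 10-write numbers are the parent's; the code is only an illustration):

```python
# Toy model: 100 cells, each tolerating 10 writes, 900 writes in total.
CELLS, LIMIT, TOTAL_WRITES = 100, 10, 900

def surviving_cells(leveled: bool) -> int:
    wear = [0] * CELLS
    for n in range(TOTAL_WRITES):
        if leveled:
            target = n % CELLS     # spread writes evenly across all cells
        else:
            # keep hammering the lowest-numbered cell that still has life left
            target = next(i for i in range(CELLS) if wear[i] < LIMIT)
        wear[target] += 1
    return sum(w < LIMIT for w in wear)        # cells still usable afterwards

print("with wear leveling:   ", surviving_cells(True))    # 100 -- all cells fine
print("without wear leveling:", surviving_cells(False))   #  10 -- 90 cells dead
```
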
  • by Britz ( 170620 ) on Saturday February 21, 2009 @02:19PM (#26942647)

    I purchased an X300 ThinkPad for the company this week and took a close look at it. I thought expensive business notebooks came without crapware, and I was sure the X300 would be optimized. But they had defrags scheduled! I always thought defrag is a no-no for SSDs. Now I am not sure anymore. I uninstalled it first. But who knows?

  • I just recently put two 128GB SSDs in a RAID 0 set. I set up a RAM drive for use as /tmp and have /var going to another partition on a standard SATA hard drive. I changed fstab to mount the drives noatime so it doesn't record file access times. I also made some other tweaks pointing any programs or services that write logs or use a temporary cache somewhere to use /tmp. It's a software RAID, so I'm using /dev/mapper/-- as the device and I'm not exactly sure how to use the scheduler, although I hav

  • In certain situations the increased performance of an SSD removes a bottleneck, which results in increased CPU/memory load. On certain platforms this means these components spend less time in their lower power states, i.e. a lowered CPU multiplier or core voltage level.

    Task for task an SSD saves power, possibly more than would be lost by any higher CPU speed steps, but in something like a looping benchmark more work is done in the same time, and therefore more power is drawn.

    This phenomenon had Tom's Hardw
