Running ZFS Natively On Linux Slower Than Btrfs

An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module done by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but in most areas the Solaris file-system is not nearly as fast as EXT4, Btrfs, or XFS."
  • First post! (Score:5, Funny)

    by halfaperson (1885704) on Monday November 22, 2010 @11:58AM (#34306646) Homepage
    Using BTRFS :)
  • by BoRegardless (721219) on Monday November 22, 2010 @12:01PM (#34306684)

    If 3 other file systems are "faster", then is ZFS somehow "better"?

    • Re:They Why ZFS? (Score:5, Insightful)

      by klingens (147173) on Monday November 22, 2010 @12:03PM (#34306714)

      ext2 is faster than ext3, simply because it does less. ZFS has many, many features most other FSes don't have, but they do come at a price.

    • Re:They Why ZFS? (Score:5, Insightful)

      by Rakshasa Taisab (244699) on Monday November 22, 2010 @12:04PM (#34306732) Homepage

      I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

    • Re:They Why ZFS? (Score:4, Insightful)

      by outZider (165286) on Monday November 22, 2010 @12:12PM (#34306838) Homepage

      So, because ext3 implementations on other OSes are slow, that means ext3 is slow? Got it.

      Try running ZFS on FreeBSD, or better yet, on the original OS: Solaris.

      • by hedwards (940851) on Monday November 22, 2010 @12:27PM (#34307012)
        Indeed. The main reason to use ZFS over the other ones, even in cases where the features are the same, is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

        And it's hardly fair to compare a filesystem that's being run in such a convoluted way to one that can be much more tightly integrated, especially considering that it's a licensing issue, not a technical one, that mandates the approach.

        And yes, I've personally used ZFS on both FreeBSD and Solaris, and I haven't had any complaints about speed. Resource utilization, yes, but that's been greatly improved.

        I'm sure that Hammer and Btrfs are both great filesystems, but like EXT4FS, they aren't particularly useful in cross-platform computing at present. Servers mostly won't be doing that, but it's something to consider when you've got massive arrays of disks: if you can't take them over directly, you'll be stuck with some sort of really annoying migration process for the disks as well as the rest of it.
        • Re:They Why ZFS? (Score:5, Interesting)

          by tlhIngan (30335) on Monday November 22, 2010 @12:36PM (#34307118)

          The main reason to use ZFS over the other ones, even in cases where the features are the same, is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

          Actually, I've run into this problem, not with ZFS (haven't used it), but with other filesystems, on Linux only. It seems not all filesystems are truly endian-aware, so taking a USB disk created on a big-endian system and moving it to a little-endian system results in a non-working filesystem. I had to actually go and use the original system to mount the disk.

          Somewhat annoying if you want to pull a RAID array out of a Linux-running big-endian system in the hopes that you can recover the data... only to find out it was using XFS or some other non-endian-friendly FS and basically not be able to get at the data...

      • by icebraining (1313345) on Monday November 22, 2010 @12:35PM (#34307106) Homepage

        Yeah, especially since:

        The SPL packages provide the Solaris Porting Layer modules for emulating some Solaris primitives in the Linux kernel; as such, this ZFS implementation is not ported to purely take advantage of the Linux kernel design.

    • Re:They Why ZFS? (Score:5, Interesting)

      by caseih (160668) on Monday November 22, 2010 @12:21PM (#34306946)

      ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, in my opinion. I currently run ZFS on about 10 TB. I never worry about a corrupt file system, never have to fsck it. And snapshots are cheap and fast. I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems). Then I back up from the snapshot. In other areas of the disk I do hourly snapshotting. Indeed, snapshots are the killer feature of ZFS for me. LVM has snapshots, true, but they are not quick or convenient compared to ZFS. In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space. The integration of volume management and the file system may break a lot of people's ideas of clear separation between layers, but from the admin's point of view it is really nice.

      We'll ditch ZFS and Solaris once BtrFS is ready. BtrFS is close, though; it should already work well for things like home servers, so try it out if you have a large MythTV system.
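
      As a concrete sketch, the nightly snapshot-then-backup cycle described above boils down to a few commands (the pool and dataset names here are hypothetical):

        # Recursively snapshot every dataset in the pool, stamped by date
        zfs snapshot -r tank@nightly-20101122
        # Back up from the frozen snapshot rather than the live filesystem
        zfs send tank/home@nightly-20101122 > /backup/home-20101122.zfs
        # Expire the oldest snapshot once 7 days' worth are kept
        zfs destroy -r tank@nightly-20101115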

      • Re:They Why ZFS? (Score:2, Informative)

        by Anonymous Coward on Monday November 22, 2010 @12:29PM (#34307042)

        ZFS is...the only FS for large disks

        XFS

        I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems)...LVM has snapshots, true, but they are not quick or convenient compared to ZFS.

        30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.
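
        For comparison, a hedged sketch of that LVM2 snapshot (volume names hypothetical); the -L size is the space reserved for changes made while the snapshot exists:

          # Create a 10G copy-on-write snapshot of the data volume
          lvcreate -s -L 10G -n data-snap /dev/vg0/data
          # Remove it once the backup has been taken
          lvremove /dev/vg0/data-snap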

        In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space.

        I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

        • by ranulf (182665) on Monday November 22, 2010 @12:54PM (#34307362)
          He's saying that LVM can only snapshot to unallocated space, whereas ZFS can snapshot to space that is allocated to a partition but isn't currently being used.

          This is simply because LVM works at a layer above the FS, whereas ZFS is the filesystem.

          • by Galactic Dominator (944134) on Monday November 22, 2010 @04:05PM (#34309704)

            ZFS is both a filesystem and volume manager. I can't see how anyone would actually prefer the LVM management style to the All-in-One of ZFS, but whatever cocks their pistol.

            Also, it's absolutely shocking that Phoronix would have a benchmark which resulted in a Linux component clearly outperforming a roughly equivalent component from another OS. That's not their MO or anything. I'm sure they took great pains to ensure equality, as they always do.

            ZFS/RAIDZ is a great thing, but raw performance is not its strength.

        • Re:They Why ZFS? (Score:3, Interesting)

          by caseih (160668) on Monday November 22, 2010 @01:41PM (#34307906)

          XFS

          Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

          30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

          Glad to know LVM is faster though. However, as I stated before it's not convenient. With ZFS I do the following things:
          - snapshot the works every night, and keep 7 days worth of snapshots.
          - some directories are snapshotted every night, but I keep 365 snapshots (one year); for example, the directories that our financial folk use.
          - snapshot important directories every hour, keep 24 hours worth

          You simply cannot do that with LVM. Sorry. How would I know how much free volume space to plan for? If I have a 10 TB disk, do I plan to use 6 TB of it and leave 4 TB for snapshots? Snapshots consume as much space as subsequent changes. For the 365-day snapshots, this could be a lot or very little depending on what has been touched.
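
          For what it's worth, a schedule like the one above is typically just cron entries around zfs snapshot/destroy; a minimal sketch with hypothetical dataset names:

            # Hourly snapshots of important data, cycling through 24 names
            0 * * * *  zfs destroy tank/finance@h$(date +\%H); zfs snapshot tank/finance@h$(date +\%H)
            # Nightly snapshot of everything, cycling through 7 day names
            30 0 * * * zfs destroy -r tank@$(date +\%a); zfs snapshot -r tank@$(date +\%a)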

          I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

          It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume. On ZFS you don't need to do this. Any free space on the file system can be used for either files or snapshots; it's all the same pool. To do snapshots with LVM the way I do with ZFS would require me to set aside a lot of space. Very inefficient and wasteful.

          As far as I can tell, BtrFS will work in a similar way to ZFS, bypassing the need for LVM. Which I'm totally okay with.

          • Re:They Why ZFS? (Score:3, Informative)

            by makomk (752139) on Monday November 22, 2010 @03:41PM (#34309422) Journal

            Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

            On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and even trying to mount the FS will cause a kernel panic, though this was more of a problem in older versions.

          • Re:They Why ZFS? (Score:3, Informative)

            by segedunum (883035) on Monday November 22, 2010 @04:50PM (#34310222)

            Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens.

            What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

            I keep hearing this shit over and over, mostly on idiot infested Linux distribution and Solaris fanboy forums, and it's just getting unbearable to see.

            It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume.

            You make it sound like you need an extra 10 terabytes to back up a 10 terabyte volume with LVM. You don't. It takes a snapshot, and the free space you need is for further changes to the volume. ZFS is the same, except it's more intelligent about how it can use any free space over multiple volumes for snapshots, and with things like deduplication it will get much better, but you still need free space to perform them. You make it sound like ZFS snapshots are completely free, as I see many ZFS proponents saying, and it's crap. The OP is also right about the time that ZFS snapshots can take. It's far too long.

            This is a road Btrfs will have to travel because it also has to be *the* general purpose Linux filesystem and will have to solve problems and be in places where ZFS is not.

        • by DJProtoss (589443) on Monday November 22, 2010 @01:52PM (#34308036)
          ZFS snapshots are much more akin to a block-level version of rsnapshot. LVM snapshots are more like ZFS clones (although not quite, as even those are done copy-on-write (CoW)).
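
          The distinction shows up in the commands themselves; a quick sketch (names hypothetical), where a snapshot is read-only and a clone is a writable filesystem layered on a snapshot:

            zfs snapshot tank/fs@monday           # read-only, point-in-time, rsnapshot-like
            zfs clone tank/fs@monday tank/fs-dev  # writable CoW copy, closer to an LVM snapshot
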
      • Re:They Why ZFS? (Score:3, Interesting)

        by TheLink (130905) on Monday November 22, 2010 @12:37PM (#34307126) Journal
        Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana.

        How do I figure out which physical drives/devices a particular ZFS filesystem depends on?

        And if a physical drive is faulty, how would I know which actual physical drive it is? e.g. get its serial number or physical slot/bay/position or whatever.
        • Re:They Why ZFS? (Score:5, Informative)

          by Maquis196 (535256) on Monday November 22, 2010 @12:42PM (#34307196)

          zpool status

          That's the command you are looking for. zfs-fuse lists disks by ID, which means if you go into /dev/disk/by-id/ and do an ls -al you'll see which devices they are linked to.

          It is done this way to make it easier on Linux; on BSD/Solaris the disks are listed by GPT name (well, they were for me), so this keeps it sane.

          Hope it helps.
          Maq
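
          To illustrate (device and pool names hypothetical), mapping a faulted vdev back to a physical drive goes roughly like this:

            zpool status tank         # shows which device in which vdev is FAULTED
            ls -l /dev/disk/by-id/    # by-id names embed the model and serial number
            # e.g. ata-WDC_WD2002FYPS-02W3B0_WD-WCAVY1234567 -> ../../sdd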

      • by falconwolf (725481) on Monday November 22, 2010 @06:07PM (#34311116)

        ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks

        What about HFS+? It can work with large drives, up to 8EB.

        Is it because it's an Apple format?

        Falcon

    • by LWATCDR (28044) on Monday November 22, 2010 @12:42PM (#34307194) Homepage Journal

      Well, they tested on a single SSD.
      I have not used ZFS or Btrfs, but I have read a lot about ZFS.
      This is not really the use case for ZFS. ZFS has many features for things like using an SSD as a cache for the HDDs, RAID-like functions, data compression and so on.
      The idea that a simpler, less full-featured file system is faster is no big shock.
      I would like to see tests with maybe two SAN servers, each with say 12 HDDs and an SSD for caching. That is more the use case for ZFS than a workstation with a single SSD.

    • by Bengie (1121981) on Monday November 22, 2010 @01:48PM (#34307972)

      People like my cousin who run a data center with 10,000+ hard drives and by requirement must have a File System that has been considered stable for at least 5 years. Any data loss is unacceptable. Unless God targets you with His wrath, you have no excuse for any data loss or corruption.

    • by davecb (6526) <davec-b@rogers.com> on Monday November 22, 2010 @06:57PM (#34311632) Homepage Journal

      If the licenses are incompatible, then why even port it? Academic interest?

      --dave

  • by tysonedwards (969693) on Monday November 22, 2010 @12:02PM (#34306704)
    Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementations for other operating systems, or the stable ones on Linux?
    • by chrb (1083577) on Monday November 22, 2010 @12:56PM (#34307406)

      Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementations for other operating systems, or the stable ones on Linux?

      The full release is supposed to be coming out in the first week of January. Given the short time frame, it would seem like this is probably closer to the final release than the words "first beta" imply.

      Surprises:

      • Native ZFS beat XFS on several of the benchmarks - XFS is usually a good performer in these kind of tests
      • Native ZFS does very well on the Threaded IO Test, where it ties for first place.
      • Btrfs is really bad on the SQLite test, taking 5 times longer than XFS on both 2.6.32 and 2.6.37 (bug?)
      • XFS IOzone write performance increased by 45% going from 2.6.32 to 2.6.37 (!) XFS increased on FS-Mark by 37%. I thought XFS would be pretty much at the point where there would be no such great improvements.
      • "Real" Solaris+ZFS gets absolutely slaughtered on the Threaded IO Test and the PostMark Test, with ext4 pushing almost 10x more transactions per second.
      • Tests were done on a SSD, apparently there was no difference in relative performance of the filesystems on SSD versus HD

      Notes:

      • "Real" Solaris+ZFS results are not shown for most tests
      • Would be nice to know how many replicates they did of each test
      • This is an interesting set of results that will get people talking/arguing :-) Thanks to Phoronix for starting the discussion.
  • by Anonymous Coward on Monday November 22, 2010 @12:04PM (#34306738)

    On similar hardware of course.

    It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.

  • by mattdm (1931) on Monday November 22, 2010 @12:14PM (#34306858) Homepage

    OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which, like the CDDL, is free software/open source but not GPL-compatible.

    ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, to convincing Oracle to relicense.

    Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

    • by wonkavader (605434) on Monday November 22, 2010 @12:22PM (#34306952)

      "But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen."

      Well, in the short term, we know what's not going to happen.

    • by caseih (160668) on Monday November 22, 2010 @12:52PM (#34307346)

      You mean like how the Nvidia GPU driver has failed because of a licensing conflict? I see no reason why the ZFS module can't be distributed in a similar manner to the Nvidia driver. I'm sure that rpmfusion could host binary RPMs without problem. They wouldn't be violating the GPL, because it would be you, the user, who taints the kernel.

      Of course ZFS on Linux probably isn't aimed at normal users anyway. It's far more likely to be used by people with existing ZFS infrastructure (large Fibre Channel arrays, etc). In my opinion, ZFS on Linux gives a smoother migration path away from Oracle Solaris and ZFS.

    • by QuantumRiff (120817) on Monday November 22, 2010 @02:11PM (#34308294)

      ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, to convincing Oracle to relicense.

      Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

      Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs [wikipedia.org] I know it's fashionable to knock Oracle every chance you get... but look at the line:

      Btrfs, when complete, is expected to offer a feature set comparable to ZFS.[16] btrfs was considered to be a competitor to ZFS. However, Oracle acquired ZFS as part of the Sun Microsystem's merger and this did not change their plans for developing btrfs.[17]

      • by mattdm (1931) on Monday November 22, 2010 @02:53PM (#34308846) Homepage

        Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs [wikipedia.org] I know it's fashionable to knock Oracle every chance you get... but look at the line:

        As I understand it, Chris Mason brought his btrfs work with him when he started at Oracle, or at least the ideas for it. A kernel hacker of his caliber probably started the job with an agreement of exactly how that was going to go.

        Oracle is a big organization; it's not surprising they act in apparently contradictory ways. They've done a reasonable amount of good open source work and made community contributions. But I stand by the statement that it's impossible to make a good prediction as to what Oracle is going to do with anything that comes from the Sun acquisition -- and you certainly don't need to take my word for it that most of the behavior so far seems to be aimed at short-term monetization rather than long-term community growth.

    • by jon3k (691256) on Monday November 22, 2010 @05:44PM (#34310808)
      You're assuming they'll have anything to license when the NetApp lawsuit is over.
  • by hoggoth (414195) on Monday November 22, 2010 @12:15PM (#34306870) Journal

    I was confused as to what versions of ZFS were available on which distros, so I made a chart that lists the different distros and which version of ZFS they support:

    http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html [blogspot.com]

    Hope it's helpful...

  • by digitaldc (879047) * on Monday November 22, 2010 @12:17PM (#34306898)
    Couldn't they name the file system something better than butterface?
  • by soupforare (542403) on Monday November 22, 2010 @12:18PM (#34306904)

    I've been through a few filesystem war^Wdramas and stuck with ext?fs the whole time. I liked the addition of journaling, but I'm not sure that I've noticed any of the other "backstage" improvements in day-to-day functioning.
    Is there really a reason to jump ship as a single-workstation user?

    • by Etherized (1038092) on Monday November 22, 2010 @12:34PM (#34307094)

      Snapshotting is probably the most compelling feature of either FS for workstation use. Both BTRFS and ZFS are copy-on-write, and they both feature very low overhead, very straightforward snapshotting. That's a feature that almost anybody can utilize.

      Aside from that, ZFS features a lot of datacenter-centric goodies that might have some utility on a workstation as well. Realtime (low overhead) compression, realtime (high overhead) deduplication, realtime encryption, easy and fast creation/destruction of filesystems and virtual block devices, and a ton of other odds and ends.
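
      Most of those goodies are single-property toggles on a dataset, which is a big part of the appeal; a hedged sketch with hypothetical pool/dataset names:

        zfs set compression=on tank/home    # realtime, low-overhead compression
        zfs set dedup=on tank/vmimages      # realtime, high-overhead deduplication
        zfs create tank/scratch             # new filesystem in an instant, no mkfs
        zfs create -V 10G tank/swapvol      # virtual block device (zvol)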

    • by hedwards (940851) on Monday November 22, 2010 @12:43PM (#34307218)
    The ext?fs work well until they don't. In my, admittedly limited, experience I've lost more files on ext2fs than on all other filesystems I've dabbled in combined. Admittedly, I had backups, but any fs that depends upon you having backups to that extent should not be trusted. And while I'm sure the newer ones are better, I'm not sure that I personally trust them, as ext2fs shouldn't have been that easy to corrupt. IIRC that was only a couple of years ago, and it should've been both robust and well understood by then.
    • by icebraining (1313345) on Monday November 22, 2010 @12:47PM (#34307268) Homepage

      Probably not, especially considering they're still less tested. Ext3 + LVM already provide everything I need for now.

    • by mlts (1038732) * on Monday November 22, 2010 @01:19PM (#34307682)

      For me, journaling was the reason to move from ext2 to ext3. However, for an end user, ZFS has a few cool features that are significant:

      1: Deduplication by blocks. For end users, it should save some disk space, not sure how much.
      2: File CRCs. This means file corruption is at least detected.
      3: RAID-Z. 'Nuff said. No worry about the LVM layer (see the sketch after this list).
      4: Filesystem encryption.
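
      As a sketch of point 3, a RAID-Z pool goes from bare disks to a mounted filesystem in one command, with no separate md/LVM layers (device names hypothetical):

        # Single-parity RAID-Z across three disks; the pool is mounted immediately
        zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd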

    • by Hatta (162192) on Monday November 22, 2010 @01:34PM (#34307804) Journal

      If you need the features provided by an advanced filesystem, you'll know. If you're not hitting your head on the limits of EXT4/LVM/RAID, then you don't really need ZFS or Btrfs.

  • by Seth Kriticos (1227934) on Monday November 22, 2010 @12:22PM (#34306960)

    It's OK and runs fairly stable, but it also locks up once in a while and does some aggressive disk I/O. No idea what exactly (probably housekeeping), but it's somewhat irksome and could use some more fine-tuning.

    The main problem with btrfs right now is that it lacks fsck tools, so in case of havoc there is little chance to recover, which is not good for server-like systems.

    As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing. Unless that gets fixed (it probably won't), it is off limits, and Linux folks will do their own thing, like they always do.

  • Not bad news (Score:5, Interesting)

    by wonkavader (605434) on Monday November 22, 2010 @12:35PM (#34307108)

    It's still under development. But it's already pretty competitive, doing reasonably well in many tests.

    And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."

    Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.

    ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).

    It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."

    and I'm looking forward to that new article.

  • by Pengo (28814) on Monday November 22, 2010 @12:49PM (#34307296) Journal

    The throughput for large data sorts is just faster, period.

    A lot of it has to do with the reading of compressed data, and the huge ram-buffer that ZFS uses on the OS, optional commit on writes, block sizes that match the database pages.

    The system scans 3 megs of index data, but what it's actually reading off disk to get that is, say, 1 meg, which it decompresses on the fly on one of the many cores the database server has. In the end the throughput destroys what I get running non-compressed volumes on EXT4 or XFS on Linux. For "MY" database, it runs nearly 2-3x faster than the same hardware running Linux. (RHEL5 is what I ran the db on for a long time.)
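
    For anyone wanting to reproduce that setup, the usual ZFS knobs are dataset properties matched to the database (names hypothetical; Postgres pages are 8K by default):

      zfs set recordsize=8k tank/pgdata    # match ZFS block size to the DB page size
      zfs set compression=on tank/pgdata   # read fewer bytes off disk, decompress on CPU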

    I have not been able to get Linux/Postgres to run even close to as fast as I have been able to get Solaris/ZFS running Postgres 8.3.

    Btrfs isn't even near production state yet, but I am really hoping that it will give me an option to get off of Solaris.

    On that note, one thing I haven't tried yet with our DB is solid state drives. The sheer throughput might more than make up for the benefits I get from compressed ZFS volumes.

    I for one am VERY VERY hopeful that BTRFS can get stable, and fast. Oracle's fiasco has me and a few other people at our small business very nervous. I'm not planning on replacing our Sol10 (free) distribution, and couldn't care less about the support Oracle offers. I'm playing with Solaris Express 11 now, but I'm not sure I want to pay the $1k a year for production use, though if it offers me the performance gains over Linux that I'm currently seeing, it will probably be worth it for our database system alone.

    Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS? We're not a huge shop: 8 people running large data-warehouse type applications. We run on a shoestring, don't have a bunch of money to throw at the problem, and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.

    • by jimicus (737525) on Monday November 22, 2010 @12:57PM (#34307410)

      Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS? We're not a huge shop: 8 people running large data-warehouse type applications. We run on a shoestring, don't have a bunch of money to throw at the problem, and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.

      What have you done so far, and how are you using Postgres? Mostly reads, mostly writes, or some combination of the two? Postgres as it ships is notorious for its slow default configuration, and many Linux distributions are consistently one major version behind the curve (which is a little annoying, as much of the focus of the Postgres people for some time has been improving performance).
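
      To illustrate the "slow as it ships" point: the stock postgresql.conf assumes a tiny machine, and the usual first pass is a handful of memory settings; a hedged sketch for a hypothetical dedicated 32GB box:

        shared_buffers = 8GB          # default is a few dozen MB
        effective_cache_size = 24GB   # tell the planner about the OS page cache
        work_mem = 64MB               # per-sort/hash memory
        checkpoint_segments = 32      # spread out checkpoint I/O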

      • by Pengo (28814) on Monday November 22, 2010 @04:13PM (#34309800) Journal

        It's a good blend of both reads and writes.

        We have tables that have as many as 100m records; where Solaris/ZFS seemed to help massively was the big reads for reporting. We have indexed it pretty aggressively, even going so far as to index statements, and managed to pull amazing performance considering the concurrency we see from a free database. (Which, for the record, has never given us any problems... Postgres has been rock-solid.)

        For the most part it was running "ok" on Linux, but the bump we got from the testing on Solaris with ZFS, with identical hardware and similar configs, was nothing short of amazing.

        One of the big differences between the two configs: we disabled the RAID controller (a Dell PERC 6/i) to run JBOD instead of RAID 1+0. I've not tried a stripe configuration on Linux with a similar setup, even without compression. To be fair to the Linux performance, I really need to set up and test with a similar config to make sure my results were not hardware-related.

        A friend had told me that where Solaris and ZFS really give the big bump in performance is that they're not having to read each byte from the disk: they're reading a compressed block and decompressing it on the fly, which, if you have the CPU cycles to spare, makes the I/O transfers a lot quicker (at times 2-3x faster than a raw read with uncompressed data).

        I'm guessing that we could probably get similar results with Linux on XFS or ext4 using solid state drives, which are now a little more affordable than they were years ago.

        Again, we're not a large shop with lots of money to throw around at the project, we're a startup just trying to get by in a brutal economy. :)

        You're right though about the default configuration. I've gone through and tuned the work memory and index cache, and tuned the memory settings to match my hardware. (Currently 32 gigs on an array of 8 disks on an 8-core Xeon server.)

  • by pedantic bore (740196) on Monday November 22, 2010 @01:25PM (#34307718)
    Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
    - Scalability to enormous numbers of devices
    - Highly assured data integrity via checksumming
    - Fault tolerance via redundancy
    - Manageability/usability features (e.g., snapshots) that conventional file systems simply don't have
    Oh, and if it's fast, well, that's gravy.
    • Speed can be achieved with more/better hardware. A filesystem shouldn't have 'fast' or 'faster than ye' as its primary focus anyway. If it's very fast but not 100% trustworthy, it's not a good file system (e.g. ReiserFS).

      Some features that make ZFS a bit slower were thought up by people with years of experience in large SAN and other storage solutions. Writing metadata multiple times over different spindles might seem overkill to most, but that's until you lose N+1 spindles (or just get r/w errors on the N+1'th spindle while the others are recovering). In a typical situation that means the whole file system is hosed, but ZFS can sometimes recover a lot of it, and it will be able to tell you which files it could not fix, which is nice when your system has many TBs and takes days to restore from a full backup.

      ZFS solves the speed problem by allowing frequently accessed data to live in memory or on faster disks (like SSDs), and by having small sync writes hit an intent log while optimizing async writes before putting them on disk. Give ZFS more memory than your average desktop and you'll be a lot faster on reads; give it a small SLC SSD and see it hit 10k stable write IOPS.

    • by Ash-Fox (726320) on Monday November 22, 2010 @02:35PM (#34308634)

      Picking on ZFS for being slow when ported to a different OS and running on atypical hardware

      How is he picking? He's just measuring the file system performance compared to others on a specific OS.

      It's focussing on the wrong thing.

      I don't think it is; this person wanted to measure performance on Linux, not compare features, and he got what he was testing. I would imagine there are plenty of people who want to know how well it performs - regardless of features - in comparison to other filesystems.

  • by Guy Smiley (9219) on Monday November 22, 2010 @02:25PM (#34308466)

    Since ZFS is doing metadata replication, running the tests on a single disk is going to punish ZFS performance much more than other filesystems. It would be much more interesting to run a benchmark with an array of 6 or 8 disks with RAID-Z2, with ZFS managing the disks directly, and XFS/btrfs/ext4 running on MD RAID-6 + LVM. Next, run a test that creates a snapshot in the middle of running some long benchmark and see what the performance difference is before/after.
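
    For reference, the two stacks being compared would be assembled roughly like this (device names and sizes hypothetical):

      # ZFS managing six disks directly, double parity
      zpool create tank raidz2 sdb sdc sdd sde sdf sdg
      # Linux equivalent: MD RAID-6, LVM on top, then a filesystem
      mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
      pvcreate /dev/md0 && vgcreate vg0 /dev/md0
      lvcreate -L 1T -n data vg0 && mkfs.xfs /dev/vg0/data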

  • by yoshi_mon (172895) on Monday November 22, 2010 @03:58PM (#34309630)

    I don't want speed from ZFS; I will get that via hardware.

    I want the tech from ZFS to give me everything that it does.

    Why judge a NASCAR car on its performance when it runs on a rally track? (I am a bit of a car geek, so I think that is a pretty good /. car analogy! ;)

  • by KonoWatakushi (910213) on Monday November 22, 2010 @04:27PM (#34309962)

    The consistency guarantees provided by the tested filesystems differ significantly. Most (all?) aside from ZFS only journal metadata by default. All data and metadata written to ZFS is always consistent on disk. You won't notice the difference until you crash, and even then you still might not, but it will certainly show up in the benchmarks.

    ZFS is not a lightweight filesystem; that is a fact. The 128-bit addresses, 256-bit checksums, compression, and two- or three-way replicated metadata don't come for free. Another thing that may be working against ZFS on a flash-based SSD is the page size: by default, ZFS uses a minimum of 512-byte blocks for data, whereas most other filesystems use 4k, which matches the SSD page size. It would be interesting to create the ZFS pool with a 4k asize and see how that affects the results.
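
    For what it's worth, the ZFS-on-Linux port exposes that alignment as the ashift pool property at creation time (assuming the module builds being tested support it; device name hypothetical):

      # ashift=12 forces 2^12 = 4096-byte aligned allocations to match SSD pages
      zpool create -o ashift=12 tank /dev/sdb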

    The benchmarks aside, it is the feature set which really sells it. The performance is good, the administrative interface is excellent, and it does a fine job of returning your data in an error-free state. At the end of the day, that is what really matters.

    Even so, I look forward to more numbers when stable releases can be compared. It would also be nice to include DragonFly BSD's HAMMER filesystem, to round out the modern set.
