Sun Microsystems Linux

Running ZFS Natively On Linux Slower Than Btrfs 235

An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module done by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."
This discussion has been archived. No new comments can be posted.


  • First post! (Score:5, Funny)

    by halfaperson ( 1885704 ) on Monday November 22, 2010 @10:58AM (#34306646) Homepage
    Using BTRFS :)
  • If 3 other file systems are "faster", then is ZFS somehow "better"?

    • Re:They Why ZFS? (Score:5, Insightful)

      by klingens ( 147173 ) on Monday November 22, 2010 @11:03AM (#34306714)

      ext2 is faster than ext3, simply because it does less. ZFS has many, many features most other FS don't have but they do come at a price.

    • Re:They Why ZFS? (Score:5, Insightful)

      by Rakshasa Taisab ( 244699 ) on Monday November 22, 2010 @11:04AM (#34306732) Homepage

      I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

      • Re: (Score:2, Funny)

        by Anonymous Coward

        I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

        You mean "> /dev/null"?

        • Re: (Score:3, Funny)

          Comment removed based on user account deletion
          • Re: (Score:3, Funny)

            by ebuck ( 585470 )

            A homage to Spinal Tap:

            Nigel Tufnel: My RAID arrays are all RAID-11. Look, right across the rack, RAID-11, RAID-11, RAID-11 and...
            Marty DiBergi: Oh, I see. And most arrays go up to RAID-10?
            Nigel Tufnel: Exactly.
            Marty DiBergi: Does that mean it's faster? Is it any faster?
            Nigel Tufnel: Well, it's one faster, isn't it? It's not RAID-10. You see, most blokes, you know, will be serving files at RAID-10. You're on RAID-10 here, all the way up, all the way up, all the way up, you're on RAID-10 on your database b

            • Hmm, a mirrored set of mirrors. I don't think that's going to be fast at all. And it's going to waste a lot of space.

      • 'mount /dev/null /' ?

    • Re:They Why ZFS? (Score:4, Insightful)

      by outZider ( 165286 ) on Monday November 22, 2010 @11:12AM (#34306838) Homepage

      So, because ext3 implementations on other OSes are slow, that means ext3 is slow? Got it.

      Try running ZFS on FreeBSD, or better yet, on the original OS: Solaris.

      • Indeed. The main reason to use ZFS over the other ones, even in cases where the features are the same is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

        And it's hardly fair to compare a filesystem that's being run in such a convoluted way to one that's able to be much more tightly integrated, especially considering th
        • Re:They Why ZFS? (Score:5, Interesting)

          by tlhIngan ( 30335 ) <slashdot@@@worf...net> on Monday November 22, 2010 @11:36AM (#34307118)

          The main reason to use ZFS over the other ones, even in cases where the features are the same is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

          Actually, I've run into this problem, not with ZFS (haven't used it), but with other filesystems, on Linux only. It seems not all filesystems are truly endian-aware, so moving a USB disk created on a big-endian system to a little-endian system results in a non-working filesystem. I had to actually go and use the original system to mount the disk.

          Somewhat annoying if you want to pull a RAID array out of a Linux-running big-endian system in the hopes that you can recover the data... only to find out it was using XFS or another non-endian-friendly FS and basically not be able to get at the data...
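The endianness problem described above can be sketched in a few lines of Python. This is only an illustration: the magic number is made up and does not belong to XFS or any real filesystem, and `read_magic` is a hypothetical helper. It shows why a driver that reads on-disk integers in the host's byte order fails to recognize a disk written on a machine with the opposite order.

```python
import struct

# Hypothetical 4-byte magic number for an imaginary on-disk format
# (an assumption for illustration, not any real filesystem's value).
MAGIC = 0x12345678


def read_magic(raw: bytes, byte_order: str) -> int:
    """Decode a 4-byte unsigned integer using '>' (big) or '<' (little) endian."""
    return struct.unpack(byte_order + "I", raw)[0]


# The disk was formatted on a big-endian machine that wrote in native order.
raw = struct.pack(">I", MAGIC)

as_big = read_magic(raw, ">")      # what the original machine sees
as_little = read_magic(raw, "<")   # what a naive little-endian driver sees

print(hex(as_big))     # 0x12345678
print(hex(as_little))  # 0x78563412
print("recognized on little-endian host:", as_little == MAGIC)
```

An endian-aware format avoids this by fixing one byte order on disk and converting on every read and write, whatever the host CPU uses.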

      • Yeah, especially since:

        The SPL packages provide the Solaris Porting Layer modules for emulating some Solaris primitives in the Linux kernel, as such, this ZFS implementation is not ported to purely take advantage of the Linux kernel design.

    • Re:They Why ZFS? (Score:5, Interesting)

      by caseih ( 160668 ) on Monday November 22, 2010 @11:21AM (#34306946)

      ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, in my opinion. I currently run ZFS on about 10 TB. I never worry about a corrupt file system, never have to fsck it. And snapshots are cheap and fast. I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems). Then I back up from the snapshot. In other areas of the disk I do hourly snapshotting. Indeed snapshots are the killer feature for me for ZFS. LVM has snapshots, true, but they are not quick or convenient compared to ZFS. In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space. The integration of volume management and the file system may break a lot of people's ideas of clear separation between layers, but from the admin's point of view it is really nice.

      We'll ditch ZFS and Solaris once BtrFS is ready. BtrFS is close, though; should work well for things like home servers, so try it out if you have a large MythTV system.

      • Re: (Score:2, Informative)

        by Anonymous Coward

        ZFS is...the only FS for large disks

        XFS

        I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems)...LVM has snapshots, true, but they are not quick or convenient compared to ZFS.

        30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

        In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space.

        I can't even make sense of these two sentences. Wh

        • by ranulf ( 182665 )
          He's saying that LVM can only snapshot to unallocated space, whereas ZFS can snapshot to space that is allocated to a partition but isn't currently being used.

          This is simply because LVM works at a layer above the FS, whereas ZFS is the filesystem.

          • ZFS is both a filesystem and volume manager. I can't see how anyone would actually prefer the LVM management style to the All-in-One of ZFS, but whatever cocks their pistol.

            Also it's absolutely shocking that Phoronix would have a benchmark which resulted in a Linux component clearly outperforming a roughly equivalent component from another OS. That's not their MO or anything. I'm sure they took great pains to ensure equality as they always do.

            ZFS/RAIDZ is a great thing, but raw performance is not its str

        • Re: (Score:3, Interesting)

          by caseih ( 160668 )

          XFS

          Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

          30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

          Glad to know LVM is faster though. However, as I stated before it's not conveni

          • Re: (Score:3, Informative)

            by makomk ( 752139 )

            Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

            On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and ev

          • Re: (Score:3, Informative)

            by segedunum ( 883035 )

            Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens.

            What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

            I keep hearing this shit over and over, mostly on idiot infested Linux distribution and Solaris fanboy forums, and it's ju

        • ZFS snapshots are much more akin to a block-level version of rsnapshot. LVM snapshots are more like ZFS clones (although not quite, as even they are done copy-on-write (CoW)).
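For readers unfamiliar with the copy-on-write idea behind these snapshots, here is a toy model in Python. It is a sketch under loud assumptions: `ToyCowVolume` is a hypothetical class, not how ZFS, Btrfs, or LVM are implemented, and copying the block map stands in for what a real CoW filesystem does by simply keeping the old tree root. The point is that a snapshot shares unchanged blocks with the live volume and only diverges where new data is written.

```python
class ToyCowVolume:
    """Toy copy-on-write volume: blocks are never modified in place."""

    def __init__(self):
        self.live = {}       # block number -> immutable bytes object
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_no, data):
        # A write binds the block number to a new bytes object;
        # the old object survives if any snapshot still references it.
        self.live[block_no] = bytes(data)

    def snapshot(self, name):
        # Copies only references to blocks, never the block contents.
        self.snapshots[name] = dict(self.live)

    def read(self, block_no, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return table[block_no]


vol = ToyCowVolume()
vol.write(0, b"hello")
vol.write(1, b"world")
vol.snapshot("before")
vol.write(0, b"HELLO")  # only block 0 diverges from the snapshot

print(vol.read(0), vol.read(0, "before"))   # b'HELLO' b'hello'
print(vol.read(1) is vol.read(1, "before"))  # True: block 1 is shared
```

In this model a snapshot costs space only for blocks that change afterwards, which is why free space in the pool is all that is needed.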
      • Re: (Score:3, Interesting)

        by TheLink ( 130905 )
        Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana.

        How do I figure out which physical drives/devices a particular ZFS filesystem depends on?

        And if a physical drive is faulty, how would I know which actual physical drive it is? e.g. get its serial number or physical slot/bay/position or whatever.
        • Re:They Why ZFS? (Score:5, Informative)

          by Maquis196 ( 535256 ) on Monday November 22, 2010 @11:42AM (#34307196)

          zpool status

          That's the command you are looking for. The zfs-fuse lists disks by id, which means if you go into /dev/disk/by-id/ and do an ls -al you'll see which devices they are linked to.

          It is done this way to make it easier in Linux, in BSD/Solaris the disks are by gpt name (well they were for me) so this keeps it sane.

          Hope it helps.
          Maq
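The symlink lookup described above can also be scripted. The following is a minimal sketch, assuming the usual Linux udev layout where each entry under /dev/disk/by-id is a symlink to a device node; `map_disk_ids` is a hypothetical helper name, and nothing here talks to ZFS itself.

```python
import os


def map_disk_ids(by_id_dir="/dev/disk/by-id"):
    """Return {stable_id: resolved_device_path} for each symlink in by_id_dir."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows relative links such as ../../sda
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    # Only attempt the listing where the directory actually exists.
    if os.path.isdir("/dev/disk/by-id"):
        for disk_id, device in map_disk_ids().items():
            print(f"{disk_id} -> {device}")
```

Since the by-id names usually embed the drive model and serial number, matching them against the names shown by zpool status is one way to find which physical drive to pull.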

      • ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks

        What about HFS+? It can work with large drives, up to 8EB.

        Is it because it's an Apple format?

        Falcon

    • by LWATCDR ( 28044 )

      Well they tested on a single SSD.
      I have not used ZFS or Btrfs but I have read a lot about ZFS.
      This is not really the use case for ZFS. ZFS has many features for things like using an SSD as a cache for the HDDs, RAID-like functions, data compression and so on.
      The idea that a simpler, less full-featured file system is faster is no big shock.
      I would like to see tests with maybe two wan servers each with say 12 HDDs and an SSD for caching. That is more the use case for ZFS than a workstation with a single SSD.

    • by Bengie ( 1121981 )

      People like my cousin who run a data center with 10,000+ hard drives and by requirement must have a File System that has been considered stable for at least 5 years. Any data loss is unacceptable. Unless God targets you with His wrath, you have no excuse for any data loss or corruption.

    • If the licenses are incompatible, then why even port it? Academic interest?

      --dave

  • by tysonedwards ( 969693 ) on Monday November 22, 2010 @11:02AM (#34306704)
    Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?
    • Re: (Score:3, Informative)

      by chrb ( 1083577 )

      Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?

      The full release is supposed to be coming out in the first week of January. Given the short time frame, it would seem like this is probably closer to the final release than the words "first beta" imply.

      Surprises:

      • Native ZFS beat XFS on several of the benchmarks - XFS is usually a good performer in these kinds of tests
      • Native ZFS does very well on the Threaded IO Test, where it ties for first place.
      • Btrfs is really bad on the SQLite test, taking 5 times longer than XFS on both 2.6.32 and 2.6.37 (bug?)
      • XFS IOzone
      • XFS recently implemented a new journaling subsystem that should speed up metadata-intensive operations. Once they turn it on, it will gain even more performance (and Ext4 is also getting many scalability improvements)

  • by Anonymous Coward

    On similar hardware of course.

    It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.

    • by mcelrath ( 8027 )
      If you had read TFA, you'd know they did benchmark on Solaris (OpenIndiana).
      • But that's not particularly helpful. I don't believe that Btrfs is supported beyond Linux at the moment, and neither FreeBSD nor OpenSolaris supports both. Meaning that you're comparing a filesystem that's been grafted onto Linux via FUSE with one that can ultimately be integrated into the Linux kernel.
  • by mattdm ( 1931 ) on Monday November 22, 2010 @11:14AM (#34306858) Homepage

    OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which like the CDDL is free-software/open source but not GPL compatible.

    ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.

    Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

    • "But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen."

      Well, in the short term, we know what's not going to happen.

    • by caseih ( 160668 )

      You mean like how the Nvidia GPU driver has failed because of licensing conflict? I see no reason why the ZFS module can't be distributed in a similar manner to the nvidia driver. I'm sure that rpmfusion could host binary RPMs without problem. They wouldn't be violating the GPL because it would be you the user who taints the kernel.

      Of course ZFS on Linux probably isn't aimed at normal users anyway. It's far more likely to be used by people with existing ZFS infrastructure (large fiber-channel arrays, et

    • ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.

      Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

      Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs [wikipedia.org] I know it's fashionable to knock Oracle every chance you get... but look at the line:

      Btrfs, when complete, is expected to offer a feature set comparable to ZFS.[16] btrfs was considered to be a competitor to ZFS. However, Oracle acquired ZFS as part of the Sun Microsystem's merger and this did not change their plans for developing btrfs.[17]

      • Re: (Score:3, Interesting)

        by mattdm ( 1931 )

        Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs [wikipedia.org] I know it's fashionable to knock Oracle every chance you get... but look at the line:

        As I understand it, Chris Mason brought his btrfs work with him when he started at Oracle, or at least the ideas for it. A kernel hacker of his caliber probably started the job with an agreement of exactly how that was going to go.

        Oracle is a big organization; it's not surprising they act in apparently contradictory ways. They've done a reasonable amount of good open source work and made community contributions. But I stand by the statement that it's impossible to make a good prediction as to what Oracle is

    • by jon3k ( 691256 )
      You're assuming they'll have anything to license when the NetApp lawsuit is over.
  • by hoggoth ( 414195 ) on Monday November 22, 2010 @11:15AM (#34306870) Journal

    I was confused as to what versions of ZFS were available on which distros so I made a chart that lists the different distros and which version of ZFS they support:

    http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html [blogspot.com]

    Hope it's helpful...

  • by digitaldc ( 879047 ) * on Monday November 22, 2010 @11:17AM (#34306898)
    Couldn't they name the file system something better than butterface?
  • I've been through a few filesystem war^Wdramas and stuck with ext?fs the whole time. I liked the addition of journaling but I'm not sure that I've noticed any of the other "backstage" improvements in day to day functioning.
    Is there really a reason to jump ship as a single-workstation user?

    • Snapshotting is probably the most compelling feature of either FS for workstation use. Both BTRFS and ZFS are copy-on-write, and they both feature very low overhead, very straightforward snapshotting. That's a feature that almost anybody can utilize.

      Aside from that, ZFS features a lot of datacenter-centric goodies that might have some utility on a workstation as well. Realtime (low overhead) compression, realtime (high overhead) deduplication, realtime encryption, easy and fast creation/destruction of files

    • The ext?fs work well unless they don't. In my admittedly limited experience, I've lost more files on ext2fs than on all other filesystems I've dabbled in combined. Admittedly, I had backups, but any fs that depends upon you having backups to that extent should not be trusted. And while I'm sure the newer ones are better, I'm not sure that I personally trust them as ext2fs shouldn't have been that easy to corrupt. IIRC that was only a couple years ago, and it should've been both robust and well understood by
    • Probably not, especially considering they're still less tested. Ext3 + LVM already provide everything I need for now.

    • by mlts ( 1038732 ) *

      For me, journaling was the reason to move from ext2 to ext3. However, for an end user, ZFS has a few cool features that are significant:

      1: Deduplication by blocks. For end users, it should save some disk space, not sure how much.
      2: File CRCs. This means file corruption is at least detected.
      3: RAID-Z. 'Nuff said. No worry about the LVM layer.
      4: Filesystem encryption.
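Item 1 in the list above, block-level deduplication, can be illustrated with a small Python sketch. The names `dedup_blocks` and `rebuild`, the 4096-byte block size, and the use of SHA-256 as the block fingerprint are all assumptions made for the example. It only demonstrates the general technique of storing each distinct block once and keeping an ordered list of references.

```python
import hashlib


def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each distinct block once."""
    store = {}  # fingerprint -> block contents
    refs = []   # ordered fingerprints describing the logical data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # keep only the first copy
        refs.append(digest)
    return store, refs


def rebuild(store, refs):
    """Reassemble the original data from the reference list."""
    return b"".join(store[d] for d in refs)


# Five logical blocks, but only two distinct contents.
data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
store, refs = dedup_blocks(data)

print(len(refs), "logical blocks,", len(store), "unique blocks stored")
print("round trip ok:", rebuild(store, refs) == data)
```

How much space this saves depends entirely on how repetitive the data is, which matches the "not sure how much" caveat in the comment.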

    • by Hatta ( 162192 )

      If you need the features provided by an advanced filesystem, you'll know. If you're not hitting your head on the limits of EXT4/LVM/RAID, then you don't really need ZFS or Btrfs.

  • It's OK, runs fairly stable, but it also locks up once in a while and does some aggressive disk I/O. No idea what exactly, probably housekeeping, but it's somewhat irksome, could use some more fine tuning.

    The main problem with btrfs right now is that it lacks fsck tools, so in case of havoc there is little chance to recover, which is not good for server-like systems.

    As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing. Unless that gets fixed (probably won't happen), it

    • Re: (Score:2, Informative)

      by larkost ( 79011 )

      "As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing."

      Just to be clear: between CDDL (ZFS) and GPL (BTRFS), GPL is clearly the more restrictive license. BTRFS can probably never be shipped with any other major OS other than Linux (at least not in kernel mode), while ZFS has already shipped with a few.

      The license restriction is one of Linux's making, not ZFS's. There are arguments for that restriction, but calling the problem one of CDDL being restrictive is a completely di

  • Not bad news (Score:5, Interesting)

    by wonkavader ( 605434 ) on Monday November 22, 2010 @11:35AM (#34307108)

    It's still under development. But it's already pretty competitive, doing reasonably well in many tests.

    And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."

    Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.

    ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).

    It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."

    and I'm looking forward to that new article.

  • The throughput for large data sorts is just faster, period.

    A lot of it has to do with the reading of compressed data, the huge RAM buffer that ZFS uses on the OS, optional commit on writes, and block sizes that match the database pages.

    The system scans 3 megs of index data, but what it's actually reading off disk is, say, 1 meg, which it decompresses on the fly on one of the many cores the database server has. In the end throughput destroys what I get running non-compressed volumes on EXT4 or XFS on Linu

    • by jimicus ( 737525 )

      Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS ? We're not a huge shop, 8 people running large data-warehouse type applications. We run on a shoestring and don't have a bunch of money to throw at the problem and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.

      What have you done so far and how are you using Postgres? Mostly reads, mostly writes or some combination of the two? Postgres as it ships is notorious for slow configuration, and many Linux distributions are consistently one major version behind the curve (which is a little annoying as much of the focus of the Postgres people for some time has been improving performance).

      • by Pengo ( 28814 )

        It's a good blend of both reads and writes.

        We have tables that have as many as 100m records. Where Solaris/ZFS seemed to help massively was the big reads for reporting. We have indexed it pretty aggressively, even going so far as to index statements, and managed to pull amazing performance, considering the concurrency we see from a free database. (Which, for the record, has never given us any problems... Postgres has been rock-solid.)

        For the most part it was running "ok" on Linux, but the bump we got from th

  • by pedantic bore ( 740196 ) on Monday November 22, 2010 @12:25PM (#34307718)
    Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
    - Scalability to enormous numbers of devices
    - Highly assured data integrity via checksumming
    - Fault tolerance via redundancy
    - Manageability/usability features (i.e., snapshots) that conventional file systems simply don't have
    Oh, and if it's fast, well, that's gravy.
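Two of the goals listed above, integrity via checksumming and fault tolerance via redundancy, work together, and a short Python sketch can show how. Everything here is illustrative: `store_block`, `verify_block`, and `read_with_redundancy` are hypothetical names, SHA-256 is an assumed choice of checksum, and real filesystems keep checksums in metadata rather than passing them around like this.

```python
import hashlib


def store_block(data: bytes):
    """Return the block together with the checksum recorded at write time."""
    return data, hashlib.sha256(data).digest()


def verify_block(data: bytes, expected: bytes) -> bool:
    """True if the block still matches the checksum recorded when it was written."""
    return hashlib.sha256(data).digest() == expected


def read_with_redundancy(copies, expected):
    """Return the first copy that passes verification, skipping corrupted ones."""
    for copy in copies:
        if verify_block(copy, expected):
            return copy
    raise IOError("all copies failed checksum verification")


block, checksum = store_block(b"important data")
corrupted = b"important dat\x00"  # simulated silent corruption on one disk

print(verify_block(corrupted, checksum))                     # False
print(read_with_redundancy([corrupted, block], checksum))    # b'important data'
```

Without the checksum the corrupted copy would be returned as if it were valid; without the second copy the corruption could be detected but not repaired.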
  • Since ZFS is doing metadata replication, running the tests on a single disk is going to punish ZFS performance much more than other filesystems. It would be much more interesting to run a benchmark with an array of 6 or 8 disks with RAID-Z2, with ZFS managing the disks directly, and XFS/btrfs/ext4 running on MD RAID-6 + LVM. Next, run a test that creates a snapshot in the middle of running some long benchmark and see what the performance difference is before/after.

  • I don't want speed from ZFS, I will do that via hardware.

    I want the tech from ZFS to give me everything that it does.

    Why judge a Nascar on its performance when it runs on a Rally car track? (I am a bit of a car geek so I think that is a pretty good /. car analogy! ;)

  • The consistency guarantees provided by the tested filesystems differ significantly. Most (all?) aside from ZFS only journal metadata by default. All data and metadata written to ZFS is always consistent on disk. You won't notice the difference until you crash, and even then you still might not, but it will certainly show up in the benchmarks.

    ZFS is not a lightweight filesystem, that is a fact. The 128-bit addresses, 256-bit checksums, compression, and two or three way replicated metadata don't come for
