Running ZFS Natively On Linux Slower Than Btrfs

Posted by kdawson
from the early-days dept.
An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module developed by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but in most areas the Solaris file-system is not nearly as fast as EXT4, Btrfs, or XFS."

  • by hoggoth (414195) on Monday November 22, 2010 @11:15AM (#34306870) Journal

    I was confused as to which versions of ZFS were available on which distros, so I made a chart that lists the different distros and the version of ZFS each one supports:

    http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html [blogspot.com]

    Hope it's helpful...

  • Re:They Why ZFS? (Score:2, Informative)

    by Anonymous Coward on Monday November 22, 2010 @11:29AM (#34307042)

    "ZFS is...the only FS for large disks"

    XFS

    "I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems)...LVM has snapshots, true, but they are not quick or convenient compared to ZFS."

    30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

    "In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space."

    I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

  • Re:They Why ZFS? (Score:5, Informative)

    by daha (1699052) on Monday November 22, 2010 @11:30AM (#34307054)

    Which of the ZFS features most impact its performance?

    Compression enabled by default can't help (available in btrfs).

    Checksums for all blocks probably don't help either, but they definitely help detect data corruption (available in btrfs).

    Forcing any file that requires more than a single block to use a tree of block pointers probably doesn't help. The dnode has only one block pointer, and a block pointer can only point to a single block (no extents). On the plus side, the block size can vary between 512 bytes and 64 KiB per object, so slack space is kept low. If more than a single block is necessary, ZFS creates a tree of block pointers. Each block pointer is 128 bytes in size, so the tree can get deep fairly quickly.
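
    To get a feel for how quickly that tree grows, here is a rough Python sketch that counts the levels of indirect blocks needed to reach every data block when the dnode holds a single block pointer. The 128-byte pointer size and 64 KiB data blocks come from the comment above; the 16 KiB indirect block size is an assumption chosen for the example, and indirect_levels is a hypothetical helper, not ZFS code.

```python
def indirect_levels(file_size: int, data_block: int,
                    indirect_block: int = 16 * 1024, bp_size: int = 128) -> int:
    """Levels of indirect blocks between a single dnode pointer and the data blocks."""
    nblocks = -(-file_size // data_block)   # ceiling division: data blocks needed
    fanout = indirect_block // bp_size      # pointers that fit in one indirect block
    levels, reachable = 0, 1                # one pointer reaches one block directly
    while reachable < nblocks:
        reachable *= fanout                 # each extra level multiplies the reach
        levels += 1
    return levels


print(indirect_levels(1 << 20, 64 * 1024), indirect_levels(1 << 30, 64 * 1024))  # 1 2
```

    Under these assumptions each indirect block holds 128 pointers, so a 1 MiB file needs one level of indirection and a 1 GiB file needs two; every read of a cold block may have to walk that chain first.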

    Keeping three copies of almost all file system structures (such as inodes, called dnodes in ZFS) by default can't help either, even though those structures are compressed.

  • by yup2000 (182755) on Monday November 22, 2010 @11:34AM (#34307088) Homepage

    Hmmm, well, the most obvious feature that ZFS has and Ext4 does not is checksumming.

    That feature is one reason why ZFS is better (it will tell you if your disk is going bad, and if you have a RAID setup, it will go get the good data for you). However, it is also one reason why ZFS is slower... it spends time making sure your data is safe and that it always gives you the correct bits from your disk.

    That single feature is why I run FreeBSD (looking forward to kFreeBSD/Debian!) on my file server in a mirrored RAID configuration. Yes, it is "slower", but I still pull data off that server at over 50MB/sec on my home gigabit LAN! The specs on that server aren't great either... 2GB RAM and an old 1.6GHz single-core Sempron.
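
    A minimal sketch of that read path, assuming a two-way mirror and a checksum stored apart from the data (ZFS keeps it in the parent block pointer): verify each copy, return the first one that matches, and rewrite any copy that doesn't. SHA-256 from hashlib stands in for whatever checksum the pool actually uses, and MirroredBlock is a hypothetical toy class, not ZFS's implementation.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for the pool's configured block checksum.
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """One logical block stored on every disk of a mirror (toy model)."""

    def __init__(self, ndisks: int) -> None:
        self.disks = [b""] * ndisks
        self.expected = b""  # kept apart from the data, like a checksum in the parent pointer

    def write(self, data: bytes) -> None:
        self.expected = checksum(data)
        self.disks = [data] * len(self.disks)

    def read(self) -> bytes:
        good = next((d for d in self.disks if checksum(d) == self.expected), None)
        if good is None:
            raise IOError("every copy failed its checksum")
        # Self-heal: replace any copy whose checksum does not match.
        self.disks = [d if checksum(d) == self.expected else good for d in self.disks]
        return good
```

    If one disk silently returns flipped bits, read() still hands back the correct data and repairs the bad copy; without a checksum there would be no way to tell which of two differing copies is the right one.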

  • Re:They Why ZFS? (Score:5, Informative)

    by Maquis196 (535256) on Monday November 22, 2010 @11:42AM (#34307196)

    zpool status

    That's the command you are looking for. zfs-fuse lists disks by ID, which means that if you go into /dev/disk/by-id/ and do an ls -al, you'll see which devices they are linked to.

    It is done this way to make things easier on Linux; in BSD/Solaris the disks are listed by GPT name (well, they were for me), so this keeps it sane.
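
    The same lookup can be scripted on Linux. This sketch simply resolves every symlink in /dev/disk/by-id to the device node it points at, which is what the arrows in ls -al show. The function name map_by_id is hypothetical, and the directory is a parameter so it can be pointed elsewhere.

```python
import os


def map_by_id(directory: str = "/dev/disk/by-id") -> dict:
    """Map each by-id symlink name to the device node it resolves to."""
    mapping = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):  # udev creates these entries as symlinks
            mapping[name] = os.path.realpath(path)
    return mapping
```

    On a machine running udev, looping over map_by_id().items() and printing each pair gives one line per disk, pairing the name zfs-fuse shows in zpool status with its /dev/sdX node.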

    Hope it helps.
    Maq

  • Re:They Why ZFS? (Score:1, Informative)

    by Anonymous Coward on Monday November 22, 2010 @11:52AM (#34307336)
    If you evenly split your 1TB disk into 2 LVM volume sets and volume1 is full, you can't make a snapshot of it. volume2 is sitting there empty, but the snapshot can't use it.
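
    A toy model of that situation (hypothetical Group class and can_reserve helper, sizes in GB): with two fixed groups, the full one cannot borrow from its empty sibling, while a single pooled group holding the same data still has room.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A fixed chunk of space in GB (hypothetical toy model)."""
    size_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.size_gb - self.used_gb


def can_reserve(group: Group, need_gb: int) -> bool:
    # Snapshot space can only come from free space inside the same group.
    return group.free_gb >= need_gb


# 1 TB split evenly into two fixed groups; the first one is full.
vg1, vg2 = Group(500, used_gb=500), Group(500)
# The same 1 TB managed as one shared pool holding the same 500 GB of data.
pool = Group(1000, used_gb=500)

print(can_reserve(vg1, 10), vg2.free_gb, can_reserve(pool, 10))  # False 500 True
```
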
  • by chrb (1083577) on Monday November 22, 2010 @11:56AM (#34307406)

    Who would have thought that a first-release beta kernel module would not run as fast or be as reliable as the stable implementation on other operating systems, or the stable filesystems on Linux?

    The full release is supposed to come out in the first week of January. Given the short time frame, this is probably closer to the final release than the words "first beta" imply.

    Surprises:

    • Native ZFS beat XFS on several of the benchmarks - XFS is usually a good performer in these kinds of tests
    • Native ZFS does very well on the Threaded IO Test, where it ties for first place.
    • Btrfs is really bad on the SQLite test, taking 5 times longer than XFS on both 2.6.32 and 2.6.37 (bug?)
    • XFS IOzone write performance increased by 45% going from 2.6.32 to 2.6.37 (!) XFS increased on FS-Mark by 37%. I thought XFS was mature enough that improvements this large were no longer likely.
    • "Real" Solaris+ZFS gets absolutely slaughtered on the Threaded IO Test and the PostMark Test, with ext4 pushing almost 10x more transactions per second.
    • Tests were done on an SSD; apparently there was no difference in relative performance of the filesystems on SSD versus HD

    Notes:

    • "Real" Solaris+ZFS results are not shown for most tests
    • Would be nice to know how many replicates they did of each test
    • This is an interesting set of results that will get people talking/arguing :-) Thanks to Phoronix for starting the discussion.
  • by Anonymous Coward on Monday November 22, 2010 @12:10PM (#34307576)

    If you read TFA (or perhaps even the slashdot submission text) you should know that both fuse and native ports for linux are being discussed.

  • by larkost (79011) on Monday November 22, 2010 @12:14PM (#34307634)

    "As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing."

    Just to be clear: between the CDDL (ZFS) and the GPL (BTRFS), the GPL is clearly the more restrictive license. BTRFS can probably never be shipped with any major OS other than Linux (at least not in kernel mode), while ZFS has already shipped with a few.

    The license restriction is of Linux's making, not ZFS's. There are arguments for that restriction, but calling the problem one of the CDDL being restrictive is a completely distorted view.

  • Re:They Why ZFS? (Score:1, Informative)

    by Anonymous Coward on Monday November 22, 2010 @12:20PM (#34307694)

    "L2ARC is a HUGE performance improvement for many workloads; it essentially allows you to use faster disks to cache the most frequently used data. If they had combined the SSD and the 7200 RPM SATA drive and benchmarked a real-world workload, the ZFS implementation would probably have stomped the others, because it would have used the SSD for the 'hot' data; the best you can do with btrfs is to place the metadata on the SSD."

    L2ARC is just another cache. The ultimate IO limits of the filesystem are still set by limitations of the final backing store.

    So if you're moving lots and lots of data, the L2ARC is pretty useless.
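
    The point about bulk data movement can be illustrated with a plain LRU cache in front of slow storage. This is a deliberate simplification (ZFS's ARC and L2ARC use a more elaborate adaptive policy), and the two access traces are made up: a small hot working set gets almost all hits, while a single sequential pass that reads every block once gets none, whatever the cache size.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size LRU cache in front of slow storage (not ZFS's ARC policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return
        self.misses += 1                     # would be served by the backing disk
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def hit_rate(trace, capacity: int) -> float:
    cache = LRUCache(capacity)
    for block in trace:
        cache.read(block)
    return cache.hits / len(trace)


hot = [i % 50 for i in range(10_000)]  # 50-block working set, reread constantly
stream = list(range(10_000))           # one sequential pass, every block read once

print(hit_rate(hot, 100), hit_rate(stream, 100))  # 0.995 0.0
```
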

    Set yourself up a ZFS file system, then start benchmarking it. If you're running on Solaris, run something like "iostat -sndxz 1" so you can see actual IO to your physical LUNs every second. Under heavy write load, you'll see ZFS go for extended periods without writing anything, then it'll hang your box badly as it flushes to disk. That's bad for two reasons - the relatively long periods of time ZFS isn't writing are IO opportunities lost, and the hanging of the box is horrible.

    ZFS's IO pattern gives away available bandwidth, and then ZFS hammers your system to its knees.

  • Re:They Why ZFS? (Score:5, Informative)

    by cbhacking (979169) <.moc.oohay. .ta. ... isiurc_tuo_neeb.> on Monday November 22, 2010 @01:57PM (#34308900) Homepage Journal

    Um... WTF? Compression is a performance *improvement* and a massive one, at that. The trivial cost in CPU time is offset by the massive reduction in IO time, which is more expensive by far. This has been true since 2000 or even earlier. Modern multi-core CPUs just take the CPU penalty from negligible to nonexistent. Unless your CPU cores are all running at 100%, and possibly even if they are, compression will improve performance.

    Note that this is true on a wide variety of filesystems; it's nothing special to these particular ones. Hell, NTFS has had built-in compression for a decade or more. You can improve performance on a Windows system by right-clicking the C: drive and selecting Properties -> Compress this drive. You can do it from the command line using

    compact.exe /C /S:C:\ /A

    This will compress all files in or under the root of the C drive, including hidden and system files (requires admin, of course), and mark all the directories so that any files written to them will also get compressed.
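
    Whether compression wins depends on the numbers. A back-of-the-envelope model, assuming reads and decompression happen one after the other rather than overlapped: each logical MB costs 1/ratio MB of disk I/O plus the CPU time to decompress it. The throughput figures are illustrative assumptions, not measurements, and zlib merely estimates a ratio for a made-up, highly repetitive sample.

```python
import zlib


def effective_mb_per_s(ratio: float, disk_mb_s: float, cpu_mb_s: float) -> float:
    """Logical MB/s if each MB costs 1/ratio MB of disk I/O plus decompression time.

    Assumes I/O and decompression happen one after the other (no overlap).
    """
    seconds_per_mb = (1 / ratio) / disk_mb_s + 1 / cpu_mb_s
    return 1 / seconds_per_mb


# Illustrative, made-up numbers: a 100 MB/s disk and 500 MB/s decompression.
print(round(effective_mb_per_s(2, 100, 500)))  # 143 (2:1 compression)
print(round(effective_mb_per_s(1, 100, 500)))  # 83 (incompressible data)

# zlib gives a rough ratio for a deliberately repetitive sample.
sample = b"log line: request served in 12 ms\n" * 10_000
ratio = len(sample) / len(zlib.compress(sample))
```

    Against 100 MB/s uncompressed, the 2:1 case comes out ahead in this model, while data that does not compress pays the CPU cost for nothing.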

  • Re:They Why ZFS? (Score:1, Informative)

    by Anonymous Coward on Monday November 22, 2010 @02:35PM (#34309328)

    Don't do this for any files that regularly get random writes (like, say, database files). Compression uses bigger blocks (64K I think), so a write to a single block becomes a decompression of several blocks, an update, and a recompression of those blocks, which will kill performance.
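
    A sketch of that read-modify-write cycle using zlib. The 64 KiB unit is the commenter's estimate, treated here as an assumption, and write_block is a hypothetical helper rather than how NTFS does it internally: changing one 4 KiB block means decompressing and recompressing the whole unit.

```python
import zlib

UNIT = 64 * 1024   # compression unit: the commenter's "64K I think", assumed here
BLOCK = 4 * 1024   # size of the single block being rewritten


def write_block(stored: bytes, index: int, data: bytes):
    """Rewrite one BLOCK inside a compressed unit.

    Returns the recompressed unit and how many uncompressed bytes had to be
    processed to make the change.
    """
    if len(data) != BLOCK:
        raise ValueError("data must be exactly one block")
    unit = bytearray(zlib.decompress(stored))       # read and decompress the whole unit
    unit[index * BLOCK:(index + 1) * BLOCK] = data  # patch one 4 KiB block in memory
    return zlib.compress(bytes(unit)), len(unit)    # recompress and write back the whole unit
```

    In this model the work for a 4 KiB update scales with the 64 KiB unit (16 times the data actually changed), which is why small random writes suffer most.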

  • Re:They Why ZFS? (Score:3, Informative)

    by makomk (752139) on Monday November 22, 2010 @02:41PM (#34309422) Journal

    "Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS."

    On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and even trying to mount the FS will cause a kernel panic, though this was more of a problem in older versions.

  • Re:They Why ZFS? (Score:4, Informative)

    by Dhalka226 (559740) on Monday November 22, 2010 @02:59PM (#34309646)

    Half of those results will be one discussion forum or another where people who are not smug asses took a moment to thoughtfully answer someone's question.

    You had time to post this self-important drivel, surely you have time to answer the question as well -- but you elected for the drivel. And you think that somehow says something about the people asking the question rather than about you?

  • Re:They Why ZFS? (Score:3, Informative)

    by segedunum (883035) on Monday November 22, 2010 @03:50PM (#34310222)

    "Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens."

    What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

    I keep hearing this shit over and over, mostly on idiot infested Linux distribution and Solaris fanboy forums, and it's just getting unbearable to see.

    "It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume."

    You make it sound like you need an extra 10 terabytes to back up a 10 terabyte volume with LVM. You don't. Taking the snapshot is cheap; the free space you need is for changes made to the volume afterwards. ZFS is the same, except that it is more intelligent about using any free space across multiple volumes for snapshots, and with things like deduplication it will get much better, but you still need free space to perform them. You make it sound like ZFS snapshots are completely free, as many ZFS proponents claim, and that's crap. The OP is also right about how long ZFS snapshots can take. It's far too long.
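
    The space accounting argued about here can be shown with a toy copy-on-write volume (hypothetical CowVolume class, one list entry per block): taking the snapshot copies nothing, and the snapshot only consumes space as blocks are overwritten afterwards, once per block no matter how often that block is rewritten. Real LVM and ZFS snapshots differ in mechanics (LVM copies the old data into a separate snapshot area before overwriting, ZFS writes new data elsewhere and keeps the old blocks), but both pay roughly in proportion to changed data.

```python
class CowVolume:
    """Toy copy-on-write volume with a single snapshot (hypothetical model)."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [b"\0"] * nblocks
        self.preserved = None  # block index -> pre-snapshot contents; None = no snapshot

    def snapshot(self) -> None:
        self.preserved = {}  # nothing is copied when the snapshot is taken

    def write(self, index: int, data: bytes) -> None:
        if self.preserved is not None and index not in self.preserved:
            self.preserved[index] = self.blocks[index]  # first overwrite: keep the old block
        self.blocks[index] = data

    def snapshot_blocks_used(self) -> int:
        return 0 if self.preserved is None else len(self.preserved)

    def read_snapshot(self, index: int) -> bytes:
        if self.preserved is None:
            raise ValueError("no snapshot taken")
        return self.preserved.get(index, self.blocks[index])


vol = CowVolume(1_000_000)
vol.snapshot()
print(vol.snapshot_blocks_used())  # 0
for _ in range(3):
    vol.write(42, b"new")          # repeated rewrites of one block
vol.write(7, b"new")
print(vol.snapshot_blocks_used())  # 2
```
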

    This is a road Btrfs will have to travel because it also has to be *the* general purpose Linux filesystem and will have to solve problems and be in places where ZFS is not.

  • Re:They Why ZFS? (Score:3, Informative)

    by CAIMLAS (41445) on Monday November 22, 2010 @06:42PM (#34312042) Homepage

    "What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job."

    Jeez, where to start? They're night and day. EXT4 has more in common with FAT32 or UFS than it does ZFS.

    It's got a handful of core features, all of which are significant on their own:

    * copy-on-write, so you know your data gets committed
    * integral RAID-like functionality, integrated with the filesystem. This reduces overhead and eliminates the need for archaic RAID controllers (almost) entirely (complete with their shitty firmware and quirks, etc.) - just the controller, please.
    * Due to the above two, eliminates the RAID5 write hole
    * instant (like, a second or two) snapshotting of very large amounts of data.
    * You can transparently 'piggyback' any filesystem on top of ZFS to provide said filesystem with ZFS's protection
    * Integral iSCSI provider. Nice to have with the above feature!
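
    For readers wondering what the RAID5 write hole actually is, here is a small XOR-parity sketch of a three-disk stripe (byte strings stand in for disk chunks; nothing here is ZFS or md code). A crash lands between updating a data chunk and updating its parity, and a later rebuild silently produces wrong data. Copy-on-write sidesteps this by writing new data and parity to fresh locations and only then switching pointers, so an interrupted write leaves the old, consistent stripe in place.

```python
from functools import reduce


def xor(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks, as RAID5 parity is computed."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# A three-disk stripe: two data chunks plus their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)
assert xor(d0, parity) == d1  # with consistent parity, a lost chunk is recoverable

# Crash mid-update: the new d0 reached its disk, the matching parity never did.
d0 = b"CCCC"

# Later the disk holding d1 dies and d1 is rebuilt from d0 and the stale parity.
rebuilt_d1 = xor(d0, parity)
print(rebuilt_d1)  # b'@@@@', silently wrong instead of b'BBBB'
```
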

    Shortcomings might include:
    * No fdisk. IMO it's a bit of a serious limitation, but "it's not needed". Still, it can't help you recover from something like...
    * The potential loss of your zpool definition file. Unlike (say) mdraid on Linux, there are no block backups within the filesystem (as far as I know), so the pool definition could conceivably be lost (if you have a backup file somewhere, it's easy enough to recover, but still...)

    As for the original post's "not terribly fast" diss? Sorry, not buying it. They really needed to compare the performance against (say) other ZFS-based systems to show its utility - there are a lot of people 'forced' to use Solaris and/or FreeBSD because they have ZFS. Another significant thing to consider will be its maturity/stability and feature-completeness (e.g. FreeBSD is a good way behind Solaris/OS/Illumos in these departments).

    Finally, this is still pretty beta code. The only 'significant' not-as-good performance failure is the Postmark benchmark, which may or may not be conclusive (I don't know what it does). If you compare it to this [phoronix.com] postmark benchmark for PCBSD, it doesn't look that bad (particularly when you consider the above linked article figures are 500 points or so higher across the board than the 'new' benchmarks) - and the new implementation appears better than XFS, which is still quite a decent filesystem.

    Oh, yeah - consider that it's still 'beta'. Notably, considerably more 'beta' than Butter. Consider me excited. I'm not going to jump until I get fairly certain news that it's at least as stable as the FreeBSD implementation (while requiring less 'tuning' - bah!); I can do without features if it's stable. CoW and the basic RAID-like implementation on their own are enough to jump ship for.
