The State of ZFS On Linux

An anonymous reader writes: Richard Yao, one of the most prolific contributors to the ZFSOnLinux project, has put up a post explaining why he thinks the filesystem is definitely production-ready. He says, "ZFS provides strong guarantees for the integrity of [data] from the moment that fsync() returns on a file, an operation on a synchronous file handle is returned or dirty writeback occurs (by default every 5 seconds). These guarantees are enabled by ZFS' disk format, which places all data into a Merkle tree that stores 256-bit checksums and is changed atomically via a two-stage transaction commit. ... Sharing a common code base with other Open ZFS platforms has given ZFS on Linux the opportunity to rapidly implement features available on other Open ZFS platforms. At present, Illumos is the reference platform in the Open ZFS community and despite its ZFS driver having hundreds of features, ZoL is only behind on about 18 of them."
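
For readers unfamiliar with the term, here is a minimal sketch of the Merkle-tree idea the summary mentions. It is not ZFS's on-disk format: SHA-256 is used only because it produces a 256-bit checksum, and the function name `merkle_root` and the `fanout` parameter are made up for this illustration.

```python
import hashlib
from typing import List


def merkle_root(blocks: List[bytes], fanout: int = 2) -> bytes:
    """Return one 256-bit checksum that covers every block.

    Each parent checksum is computed over its children's checksums,
    so changing any block changes the root.
    """
    if not blocks:
        raise ValueError("need at least one block")
    # Leaf level: one checksum per data block.
    level = [hashlib.sha256(b).digest() for b in blocks]
    # Collapse level by level until a single root checksum remains.
    while len(level) > 1:
        level = [
            hashlib.sha256(b"".join(level[i:i + fanout])).digest()
            for i in range(0, len(level), fanout)
        ]
    return level[0]
```

Because every parent covers its children's checksums, verifying the root is enough to detect a change anywhere below it.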

  • Working well for me (Score:4, Informative)

    by zeigerpuppy ( 607730 ) on Thursday September 11, 2014 @11:04AM (#47880507)
    I've been using ZFSonLinux for a year in production. No problems at all. It's my storage back end for Xen Virtual machines. Just make sure you use ECC RAM and a decent hard disk controller. Instant snapshots and ZFS send/receive functions are awesome, have reduced my backup times by an order of magnitude. I use a Debian Wheezy/Unstable hybrid.
  • I've been using this for a production fileserver for about a year and a half. Prior to that I was using ZFS on FUSE for about a year.

    The only minor negative thing I can say is that when you do have some odd kind of failure, ZFS (and this may be the case on BSD and Solaris too) gives you some pretty scary messages like "Please recover from backup", but usually exporting and importing the FS brings it back, at least in a degraded state. My other caveat might just be my Linux distro but I've often had problem
  • by Anonymous Coward

    It's a killer file system. Once you've used it, you won't be able to leave it.

  • So how much space do the checksums take up? How much does all this behind-the-scenes work slow down data retrieval/writing?
    Is this something that a normal consumer would use for their main storage?
    • by McKing ( 1017 )

      The checksums don't really take up more physical overhead than a more traditional RAID + LVM setup, and performance is equivalent in my experience (albeit on Solaris 10 and not Linux). There is also the ability to turn on compression, which trades a little bit of CPU overhead for increased disk I/O performance. On a lot of workloads the difference can be dramatic.

      If you are already comfortable with RAID + LVM, then I would wholeheartedly recommend ZFS for your main workstation. I would also recommend taking

    • The overhead is barely there at all. I've measured the performance of the default fletcher4 checksum on a modest 2GHz Core 2 CPU and it comes to around 4GB/s/core. Now given that most CPUs now come with 4 or more cores, in order to get the checksum to be 10% of CPU overhead, you'd have to be doing around 1.2GB/s of I/O. Needless to say, you're not ever going to get that even for fairly high-performance boxes.
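
To show why fletcher4 is so cheap, here is a rough sketch of the algorithm as it is commonly described: four running 64-bit sums over the data read as 32-bit words. This is an illustration, not the ZoL code. Little-endian word order is an assumption, and pure Python will run far slower than the throughput quoted above.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound arithmetic


def fletcher4(data: bytes) -> Tuple[int, int, int, int]:
    """Fletcher-4 style checksum: four cascaded 64-bit accumulators."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    # Read the buffer as 32-bit little-endian words (assumed byte order).
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d
```

Each word costs only four additions, with no multiplications or table lookups, which fits the low overhead the comment reports.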
      • by tlhIngan ( 30335 )

        The overhead is barely there at all. I've measured the performance of the default fletcher4 checksum on a modest 2GHz Core 2 CPU and it comes to around 4GB/s/core. Now given that most CPUs now come with 4 or more cores, in order to get the checksum to be 10% of CPU overhead, you'd have to be doing around 1.2GB/s of I/O. Needless to say, you're not ever going to get that even for fairly high-performance boxes.

        Not really. A SATA3 SSD can push 550MB/sec both ways (limited by SATA3 itself) nowadays - just yo

        • Keep perspective. Are you really going to build a box like that with just one 2 GHz quad-core CPU?
          I have pushed 4GB/s through a SAS SSD array on ZFS, but even so I maxed out on other stuff way before the CPU and much less checksumming ever began to be an issue (e.g. had to go through two LSI SAS 9200-8e HBAs, because one maxes out the PCI-e 2.0 x8 lanes; with two HBAs I maxed out on the two 6G SAS links to my JBOD). That's the point of my post. I've yet to see a system which is constrained by the checksummin
  • by N7DR ( 536428 ) on Thursday September 11, 2014 @11:20AM (#47880709) Homepage

    I've been using ZFS on Linux for about a year. I can summarise my position on the experience with two words: it's magic.

    It is still tricky to run one's root system off ZFS (at least on Debian). That, I think, is for those who are brave and have the time to deal with issues that might arise following updates. But for non-root filesystems, ZFS is, as I said, magic. It's fast, reliable, caches intelligently, adaptable to a large variety of mirror/striping/RAID configurations, snapshots with incredible efficiency, and simply works as advertised.

    Someone once (before the port to other OSes) said that ZFS was Solaris' "killer app". Having used it in production for a year, I can understand why they said that.

    • I ran zfs on freebsd for a few years but gave up on it. at one time, I did a cvsup (like an apt-get update, sort of, on bsd) and it updated zfs code, updated a disk format encoding but you could not revert it! if I had to boot an older version of the o/s (like, before the cvsup) the disk was not readable! that was a showstopper for me and a design style that I object to, VERY MUCH. makes support a nightmare.

      I've never seen this in linux with jfs, xfs, ext*fs, even reiser (remember that?) never screwed m

      • it updated zfs code, updated a disk format encoding but you could not revert it

        You can thank your package maintainer for this. ZFS never ever ever upgrades the on-disk format silently. You always have to do a manual "zpool upgrade" to do it. It'll tell you when a pool's format is out of date in "zpool status", but it'll never do the upgrade by itself.

        updating a disk image format and not allowing n-1 version of o/s to read it is a huge design mistake and I'm not sure I understand the reasoning behind it, but until that is changed, I won't run zfs

        Again, this is not ZFS' fault, it's your package maintainer's, for auto-upgrading all your imported zpools. ZFS never does this by itself.

    • by Rich0 ( 548339 )

      adaptable to a large variety of mirror/striping/RAID configurations

      "Adaptable" is a bit of a stretch here. If you set up a RAID on ZFS, you can't change it, you can only replace individual disks within it, or destroy the entire array.

      That isn't a big deal if you're talking about a ZFS filesystem with a very large number of drives, but it is a big limitation for a small ZFS filesystem. That is, if I have 300 disks in 60 arrays of 5 1TB disks each, and I want to move to 3TB disks, then I just need to add 5 3TB disks, turn them into an array, add them to the filesystem, the

      • then there is no easy way to replace those with 5 3TB drives one at a time and actually get use out of the extra space.

        It's not THAT bad. You do this:
        1. Put new disk in usb cradle.
        2. Run 'zpool replace', swapping new disk for old disk.
        3. Take the new disk and physically replace the old disk.
        4. Repeat 1-3 for each new disk until you have the whole array running at the new capacity.
        5. If autoexpand is not enabled, run the 'zpool online' command with the '-e' flag to use the new capacity.
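
The steps above can be sketched as a dry-run planner that only builds the command lines and does not run them. The pool name `tank` and the device names in the usage note are hypothetical, as is the helper `plan_disk_upgrade`. In practice each resilver has to finish before the next swap, which is why a `zpool status` check follows every replace.

```python
from typing import List, Tuple


def plan_disk_upgrade(pool: str, swaps: List[Tuple[str, str]]) -> List[List[str]]:
    """Build (but do not execute) zpool commands for one-at-a-time swaps.

    `swaps` is a list of (old_device, new_device) pairs.
    """
    commands: List[List[str]] = []
    for old_dev, new_dev in swaps:
        # Step 2: resilver the new disk in place of the old one.
        commands.append(["zpool", "replace", pool, old_dev, new_dev])
        # Check resilver progress before touching the next disk.
        commands.append(["zpool", "status", pool])
    for _, new_dev in swaps:
        # Step 5: expand onto the larger disks if autoexpand is off.
        commands.append(["zpool", "online", "-e", pool, new_dev])
    return commands
```

For example, `plan_disk_upgrade("tank", [("sda", "sdf")])` yields a replace, a status check, and an `online -e` for that one swap.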

        I've only used FreeBSD, not Linux - but I presume this would work so long as you are giving ZFS the whole disk. ZFS does not care which interfa

  • How can it be production-ready if it still lacks SELinux support? The ZoL FAQ suggests either running SELinux in permissive mode or disabling it entirely.
    • other security systems exist, many believe that SELinux causes more problems than it solves

    • by devman ( 1163205 )
      The FAQ is outdated. SELinux support was added in the last release. I run a ZFS system on CentOS 6 with SELinux set to Enforcing and it works fine.
  • Hey, I'm the guy who got modded +5 funny for replying to the 8/10TB disk announcement with "of course they did, I ordered 6TB drives 2 hours ago". Well, I switched my home NAS over to ZFS last month. So, yay for me, for once I'm ahead in at least some minimal sense or other!

    Seriously though, I have found ZFS to be a damned good solution so far. (FYI, CentOS, Core i5, 4GB, 6x4TB with 2-disk parity, 2 eSATA -> port multipliers...) I really don't think I will ever deploy hardware RAID again.

  • by Solandri ( 704621 ) on Thursday September 11, 2014 @02:32PM (#47882751)
    And was very impressed. It was a new 4-drive system I'd put together to operate as both a NAS/fileserver and a host for virtual machines. I had originally intended to use RAID 5, but decided to give ZFS a try after reading about it. My initial config had it booting Ubuntu (maybe Mint? I don't recall), with ZFS for Linux installed as the main non-boot filesystem with one-drive redundancy. I had all sorts of problems with drives dropping out of the array, which I eventually tracked down to the motherboard shipping with bad SATA cables. ZFS handled this admirably. At first I didn't notice one of the drives had dropped, and continued using the system for about a day. When I got the drive working again, as I understand it RAID 5 would have had to do a complete array rebuild because of the changed files. ZFS noticed most of my old data was on the "new" drive and simply validated the checksums as still accurate, then noticed I had written new files and automatically created new redundancy data for them on the "new" drive. The entire "rebuild" only took a little over an hour instead of the 20+ hours I was expecting (how long it takes me to back up the data over eSATA).

    If you're wondering why ZFS trusts the checksums on the "new" drive instead of reading the entire file: it will read the entire file and compare it to the checksum every time you access it anyway. Once a month by default, it runs a "scrub" where it reads every file and verifies they haven't suffered bit rot and still match the checksums. Apparently the strategy after a dropped drive is to get the redundant filesystem up and running again ASAP, then do the file integrity scrub afterwards at its leisure. (You can manually force this check at any time with a zpool scrub.)
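
As a toy picture of what a scrub does, the sketch below re-reads every block and compares it with a checksum recorded at write time. It is not how ZFS stores anything: the dictionaries, the block names and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def scrub(blocks: Dict[str, bytes], stored: Dict[str, bytes]) -> List[str]:
    """Return the names of blocks that no longer match their stored checksum."""
    damaged = []
    for name, data in blocks.items():
        # Recompute the checksum from what is on "disk" right now.
        if hashlib.sha256(data).digest() != stored[name]:
            damaged.append(name)
    return damaged
```

In a redundant pool, a block flagged this way would then be rebuilt from a good copy or from parity; the sketch covers only the detection half.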

    The other main advantage I'd say is that it's incredibly flexible when you're putting together redundant arrays. RAID 5 normally requires 3+ drives or partitions of the same size. ZFS lets you mix together drives, partitions, files (yes, one of your ZFS "drives" can be a file on another filesystem), other devices like SAS drives, etc. You can even put the 3+ "drives" needed for redundancy onto a single drive if you just want to play around with it for testing.

    The only problem I ran into was with deduplication. Dedup was part of the reason I decided to try ZFS, and is one of the features frequently mentioned by ZFS advocates. While dedup does work, it is an incredible memory and performance hog. Writes to the ZFS array went from 65+ MB/s (bunch of mixed random files) down to about 8 MB/s with dedup turned on, and memory use climbed to the point where I ordered more RAM to bump the system up to 16 GB. In the end I decided the approx 2% disk space I was saving with dedup wasn't worth it and disabled it.
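
A simplified sketch of block-level dedup helps explain the memory cost: every write has to look up the block's checksum in a table that holds one entry per unique block. The function name `dedup_store` and the table layout are invented for this example and do not reflect ZFS's actual dedup table format.

```python
import hashlib
from typing import Dict, Iterable, List


def dedup_store(blocks: Iterable[bytes]) -> Dict[bytes, List]:
    """Store each unique block once, keyed by checksum, with a refcount."""
    table: Dict[bytes, List] = {}
    for block in blocks:
        key = hashlib.sha256(block).digest()
        if key in table:
            table[key][1] += 1        # duplicate: just bump the refcount
        else:
            table[key] = [block, 1]   # new block: store it and add an entry
    return table
```

The table grows with the number of unique blocks, so when little of the data is duplicated (as with the roughly 2% saving above), the lookup cost is paid on nearly every write for very little return.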

    I eventually switched to FreeNAS (based on FreeBSD, which has a native port of ZFS) because it was annoying having to reinstall ZFS for Linux after an Ubuntu/Mint update, and I couldn't see myself doing that after every new release because I wanted features which were added to the core OS. (And if you're wondering, dedup performance is just as bad under FreeNAS.)
