Oracle Engineer Talks of ZFS File System Possibly Still Being Upstreamed On Linux (phoronix.com) 131

New submitter fstack writes: Senior software architect Mark Maybee, who has worked at Oracle/Sun since '98, says we "could" maybe still see ZFS become a first-class upstream Linux file-system. He spoke at the annual OpenZFS Developer Summit about how Oracle's focus has shifted to the cloud and how it has reduced investment in Solaris. He admits that Linux rules the cloud. Among the Oracle engineer's hopes is that ZFS becomes a "first class citizen in Linux"; to do so, Oracle would port its ZFS code to Oracle Linux and then upstream the file-system to the Linux kernel, which would involve relicensing the ZFS code.
  • by ZorinLynx ( 31751 ) on Wednesday October 25, 2017 @06:03PM (#55432843) Homepage

    One nice thing about ZFS not being in upstream is that it is currently maintained and updated separately from the Linux kernel.

    Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel. But this should wait until the port is a bit more mature. Right now development is very active on ZFS and we have new versions coming out every few weeks; having to coordinate this with kernel releases will complicate things.

    All this said, relicensing ZFS would definitely help Oracle redeem themselves a bit. After mercilessly slaughtering Sun following the acquisition, they have a long way to go to get from the "evil" side back to the forces of good.

    • by davecb ( 6526 )
      It might also cut their maintenance costs, something Oracle often likes.
    • Comment removed based on user account deletion
      • by JBMcB ( 73720 ) on Wednesday October 25, 2017 @06:59PM (#55433187)

        Funny, I thought ZFS was very mature by now.

        It's very mature, on Solaris. Linux has a different ABI to the storage layer, and different requirements on how filesystems are supposed to behave. So it's not so much a port as a re-implementation.

        • by pr0nbot ( 313417 )

          ZFS is mature, but has some curious omissions.

          For example, as far as I know it can't use a disk span within a RAID set. I.e. you can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device. (Which is the kind of thing that would make it really easy for a small home NAS to re-use small disks.) If I'm not wrong, then I can only assume that's too niche a case to be interesting in the enterprise environment.

          • by JBMcB ( 73720 )

            You can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device.

            It doesn't do it natively, but you can hardware RAID the 2x2TB drives and it will treat it like a single 4TB device. It's not best practice, because ZFS uses the SMART counters to warn you of impending drive failure, and hardware RAID masks those, but you can do it.

            • by Aaden42 ( 198257 )

              Do any existing RAID systems allow you to do that in one step?

              You could do it in Linux w/ ZoL by using md to stripe or concatenate the two smaller devices & feed the md block device to ZFS. It should work, but I could see ZFS making some ungood decisions since the underlying hardware is hidden from it. Dunno about performance either.
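
              As a rough sketch of that layering (device names are hypothetical: /dev/sdb and /dev/sdc are the two 2TB drives, /dev/sda the real 4TB disk):

              # concatenate the two 2TB drives into one ~4TB md device
              mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sdb /dev/sdc
              # then mirror the real 4TB disk against the md device
              zpool create tank mirror /dev/sda /dev/md0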

              • by rl117 ( 110595 )
                That would remove some of the data integrity guarantees and healing of corrupt data which ZFS provides. There are good reasons for the layered design of ZFS, and giving it full control of the underlying storage is required if you want the best performance and robustness out of it. Create a mirrored vdev like ZFS wants and you'll have a much happier experience if one of them fails. Like fast resilvering instead of a full array sync.
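
                For example, a minimal sketch of that recommended setup and of replacing a failed member (disk names are hypothetical):

                # give ZFS both whole disks as a single mirrored vdev
                zpool create tank mirror /dev/sda /dev/sdb
                # after a failure, swap in a new disk; only allocated blocks are resilvered
                zpool replace tank /dev/sdb /dev/sdc
                zpool status tank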
            • How does RAID hide those? When I log into the various RAID configs, they have SMART reporting. Do you just mean it doesn't take action on the SMART values?
          • Says who? I've done that. There are a bunch of examples if you google for it.

            • by pr0nbot ( 313417 )

              I googled extensively at the time I was setting up my home NAS (~4 years ago). If it's possible without doing the spanning using something outside ZFS (e.g. hw RAID as others have suggested) I'd be really interested, as from time to time I grow the storage and have to partition the disks in interesting ways.

              • by rl117 ( 110595 )
                This has always been possible. You grow the storage by adding more vdevs, or by upgrading the capacity of an existing vdev. So you can start with a vdev of a single mirror, and then you can add another mirror vdev, and another... and the zpool will be striped across all the vdevs. This increases the number of iops the pool can sustain linearly with the number of vdevs. For each mirror vdev, you can swap out a disc with a larger capacity drive, resilver it and then repeat for the other drive, which will
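
                A sketch of both growth paths described above, with hypothetical disk names:

                # path 1: stripe the pool across an additional mirror vdev
                zpool add tank mirror /dev/sdc /dev/sdd
                # path 2: grow an existing mirror in place by replacing each disk with a larger one
                zpool set autoexpand=on tank
                zpool replace tank /dev/sda /dev/sde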
              • You realize you can use a file as a vdev, right? This means if you want to use two 1TB drives as "one" device, you just create a pool using the 2x 1TB drives, and then in your new setup you just refer to the file that was created.

                # hypothetical sketch: create a 2TB file first, then use it as the backing vdev
                truncate -s 2T /tank/test/zpool
                zpool create test /tank/test/zpool
                # or point the pool at any block device instead
                zpool create test /dev/blah/fake-2TB-device

          • by devman ( 1163205 )
            You can add any block device you want as a vdev. The recommendation is that it be a physical drive, but there is no software limitation.
          • by rl117 ( 110595 )
            You can absolutely do this. You create two mirrored vdevs, so that reads and writes are striped across the pair of mirrors. I have this exact setup in my home NAS; see this example [pastebin.com] for how it's set up. Note the sizes aren't strictly true for the used space since it takes compression into account; the discs are paired 1.8 and 2.7T discs; I've upgraded one pair to larger capacity drives, and I'll likely do the same for the other pair next time I upgrade it (or add a third mirror).
          • You can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device.

            Err, yes you can. You just really shouldn't, given you're doubling the failure rate of one of the vdevs and also hosing the flexibility of ZFS, which would benefit more from using all three drives in a single RAIDZ and upgrading them on failure, at which point your pool automatically grows to match the smallest drive.

            You can do whatever you want. ZFS won't stop you from turning your hardware into an unmaintainable mess. The fact that you CAN do this in ZFS is something I see as a huge downside. The only end result here w

    • I'm gonna disagree with you a bit here. Each portion of the kernel has its own maintainer and I don't see how ZFS being upstreamed would change that at all. Likewise, does not the maintainer of, say, the TTY subsystem (just a random pick...) make active changes *between* release cycles, submitting their LAG to the various RCs? Not saying that you are 100% wrong, but...help me out here.
      • by Kjella ( 173770 )

        Likewise, does not the maintainer of, say, the TTY subsystem (just a random pick...) make active changes *between* release cycles, submitting their LAG to the various RCs?

        Not to RCs. As I understand it the kernel is on a three month cycle, one month merge window and roughly two months of weekly RCs that are only supposed to be bug fixes. Otherwise you might get an undiplomatic response from Mr. Torvalds. Worse yet, many distros ship kernels much older than that and despite having "proper channels" bugs often go directly upstream with a resolution of "we fixed that two years ago, update... sigh, waste of time". So if you're not really ready for production use, being in the ke

    • Re: (Score:3, Insightful)

      by Anonymous Coward

      Oracle is evil ... period. There is no going back.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      I don't believe this is Oracle's better nature or whatever; ZFS has to transition from Solaris to Linux because Solaris is dead.

      It's really that simple. If Oracle can gin up a little excitement and maybe score some kudos then great, why not? But ultimately this has to happen or the official Oracle developed ZFS will die with its only official platform.

    • by DrXym ( 126579 )
      I don't see how the choice of being upstream is lost by going GPL. If the ZFS group said "we're GPL now but we're not ready to land yet, give us some time" then the kernel maintainers won't land it.

      But this is Oracle we're talking about. I doubt they would GPL something because in their minds they'd lose control of it and allow the competition to exploit their code. After all, that's what Oracle has done itself to competitors like Red Hat. Aside from that, assuming they did GPL it, then it would immediately fork b

    • One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.

      And that's actually a huge problem, and a major obstacle to upstream adoption, mainly due to code duplication.

      ZFS (and its competitor BTRFS) is peculiar, because it's not just a filesystem. It's a whole integrated stack that includes a filesystem layer on top, but also a volume management and replication layer underneath (ZFS and BTRFS on their own are the equivalent of a full EXT4 + LVM + MDADM stack).

      That is a necessity, due to some of their features: e.g. the checksumming going on in the f
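
      To make that comparison concrete, here's a rough sketch of the two stacks side by side (device names are hypothetical); the traditional path chains three separate tools, while ZFS collapses them into one:

      # traditional layered stack: mdadm for RAID, LVM for volumes, ext4 on top
      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
      pvcreate /dev/md0
      vgcreate vg0 /dev/md0
      lvcreate -L 100G -n data vg0
      mkfs.ext4 /dev/vg0/data

      # integrated ZFS stack: pooling, redundancy, and the filesystem in one step
      zpool create tank mirror /dev/sda /dev/sdb
      zfs create tank/data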

    • they have a long way to go to get from the "evil" side back to the forces of good.

      What do you mean "back"? I can't ever remember a time when Oracle wasn't obnoxious.

    • No. I've been running ZoL for almost a decade. It's constantly being bitten by kernel API changes; the kernel devs will break ZFS without a second thought, and it happens all the time.

      It's been a while since we last went three months without a working ZFS-head build on Fedora (or other newish kernels), but there's still nothing stopping it from happening.

      Dual-licensing to something GPL-compatible would allow parts of the SPL/ZFS stack to be brought in-kernel, even if most of it stayed outside, at le

  • Some folks don't like the particular set of tradeoffs, but for a filesystem (as opposed to an object store, one of which I'm testing right now), it's a very good offering. I definitely want it on my Fedora dev laptop, along with a write cache on flash.
    • ZFS wants to live in a fairly specific configuration. It wants a bunch of drives, a bunch of memory, and not much competition for system resources. It's really a NAS filesystem, which is why there are no recovery utilities for it. If your filesystem takes a dump, you're SOL, hope you have a backup.

      You can run it on a single drive on a desktop machine, but you are incurring a bunch of overhead and not getting the benefits of a properly set up ZFS configuration.

      • Re:Careful there (Score:5, Insightful)

        by dnaumov ( 453672 ) on Wednesday October 25, 2017 @06:37PM (#55433021)

        ZFS wants to live in a fairly specific configuration. It wants a bunch of drives, a bunch of memory, and not much competition for system resources.

        Except for the part where it works with 2 drives, on a system with 4GB of RAM and under constant heavy load just fine.

        • by HuguesT ( 84078 )

          Precisely, a bunch of drives, or a RAID, starts at two drives.

          • Precisely, a bunch of drives, or a RAID, starts at two drives.

            Being pedantic here, but you are wrong, and there are circumstances where this matters.

            You can make a RAID1 array with one drive plus a failed (non-existent) drive. Hence the minimum is actually one drive, not two.
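
            With Linux md, for instance, a degraded one-disk mirror can be created directly (device names are hypothetical):

            # create a RAID1 array with one real member and one deliberately missing slot
            mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing
            # the second member can be added later and will sync automatically
            mdadm --manage /dev/md0 --add /dev/sdb1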

            • by dfghjk ( 711126 )

              RAID, as defined in the original paper, involves data striping and striping cannot be implemented with less than 2 drives.

              If you desire redundancy, RAID requires a minimum of 3 drives. A mirrored drive pair is not RAID, it is just mirroring.

              • Depends on what you mean by a drive. I have a horrible hard drive which was declared almost in its grave by SMART long ago. I made two partitions, run "software RAID1" across them, and store one final backup on it.

                If it dies, nothing is lost.

          • Precisely, a bunch of drives, or a RAID, starts at two drives.

            Actually it's more than happy to run on one drive as well. There's nothing "precise" about the GP's assertion that ZFS wants a fairly specific configuration.

        • Are you doing Z+1? Or just striping with an L2ARC, which is nearly pointless? What's the areal density of the drives? 'Cause if you are using anything above 2TB, the odds of getting uncorrectable errors on both drives become non-trivial.

          At this point you are better off using XFS with a really good backup strategy.
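
          For reference, a sketch of the two layouts being contrasted (hypothetical devices; only the RAID-Z1 layout survives losing a whole disk):

          # RAID-Z1: single-parity redundancy across three disks
          zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc
          # vs. a bare stripe with an SSD read cache (L2ARC) and no redundancy
          zpool create tank /dev/sda /dev/sdb cache /dev/nvme0n1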

          • by dfghjk ( 711126 )

            So they say. Don't you find it odd that a drive can't possibly correct for errors but a filesystem can?

            I wonder if drive vendors acknowledge that 100% of their high capacity drives are incapable of functioning without uncorrectable errors. Perhaps they should implement ZFS internally and all problems would be solved.

            • So they say. Don't you find it odd that a drive can't possibly correct for errors but a filesystem can?

              That's because the filesystem can just write to a different spot on the device, but if a specific spot on the physical device goes bad, it's bad. In fact, almost all drives automatically error-correct; you can see the stats through utils like "smartctl". A drive generally has 10%-20% more than its advertised capacity, and exports a virtual mapping of the drive. As sectors start to show signs of failing, the address is transparently remapped to some of this "extra" space and things continue as normal. It's only a drive-
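
              The remapping is visible from userspace; a quick way to check it (assuming smartmontools is installed and /dev/sda is the drive in question):

              # dump the SMART attribute table and look for remapped / pending sectors
              smartctl -A /dev/sda | grep -Ei 'reallocated|pending'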

              • the filesystem can just write to a different spot on the device, but if a specific spot on the physical device goes bad it's bad.

                That's not true at all. Modern HDDs can remap sectors, to other tracks if necessary.

            • by JBMcB ( 73720 )

              A drive can correct for errors if a block is bad. Problem is, as areal densities increase, the odds of data changing randomly increase. This is mainly due to cosmic rays or other natural sources of radiation, but there can be other factors. The drive doesn't know anything about the data itself, it only knows if it can read a block or not, and that's really the way you want it. You want the drive to be structure and data agnostic. Otherwise you would need a specific drive for a specific file system, which

          • Why? All the articles you link to describe one failure mode which is not only theoretical but can also be avoided by simply not scrubbing the pool. No one is forcing you to do that, and you can run ZFS just as happily as any other file system with non-ECC RAM and still get some of the benefits, including the filesystem potentially alerting you to failing RAM rather than silently screwing your system as it would with any other filesystem.

          • Correction: I have sat down and read all your links in detail. All of the claims that ZFS scrubbing will destroy your pool on non-ECC RAM are actually garbage that doesn't take into account the actual failure mechanism of the RAM or the response of the scrub, which is to leave data untouched if an unfixable error occurs. So scrub away.

            GP was right, there are no special hardware requirements for ZFS and you should have no problem letting him administer your sensitive data.

      • by Bongo ( 13261 )

        Whilst I run it on bunches of drives, I also use it on single drives when I want to know all data is correct. Backups are great, but silent data corruption, which gets copied to backup, can mess everything up.

      • I use ZFS on a NAS with a bunch of drives, but I also use it on a hosted VM with under 1GB of RAM on a single (virtual) drive and a few local VMs. The benefits that I'm apparently not getting include:

        • It's trivial to add more storage. If I want to expand a VM, I attach a new virtual disk and simply expand my storage onto it.
        • It's trivial to back up - I can snapshot all of my ZFS filesystems and use zfs send / zfs receive to send incremental snapshots of them to another system (where I can reconstruct all
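
        A rough sketch of that snapshot-and-send backup flow, assuming a hypothetical dataset tank/data and a remote host named backuphost:

        # take a timestamped snapshot of the dataset
        zfs snapshot tank/data@2017-10-25
        # first run: send the full snapshot to the other system
        zfs send tank/data@2017-10-25 | ssh backuphost zfs receive backup/data
        # later runs: send only the delta between two snapshots
        zfs send -i tank/data@2017-10-25 tank/data@2017-11-01 | ssh backuphost zfs receive backup/data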
      • ZFS wants to live in a fairly specific configuration.

        ZFS wants nothing, but many of its advanced features require certain configurations. You want to run it with 12 drives and 32GB of RAM on a simple file server, go for it, it really shines. You want to run it on a single drive on a system with 2GB of RAM, go for it, there are no downsides there vs. any other file system.

        It's really a NAS filesystem, which is why there are no recovery utilities for it.

        There are no recovery utilities because they are rarely needed. The single most common configuration involves redundancy. ZFS's own tools (zdb among them) include those required to diagnose errors and recover data on a

  • by Jerry ( 6400 )

    I played with zfs-fuse on KDE Neon a couple years ago after reading from its acolytes that it was "more advanced" and "better" than EXT4 or Btrfs. It wasn't. A lot of it is missing in the fuse rendition.

    I switched to Btrfs. I have three 750GB HDDs in my laptop. I use one as a receiver of @ and @home backup snapshots. I've configured the other two as a two-disk pool, then as a RAID1, and then back to a pool again. In 2 1/2 years of using Btrfs I've never had a single hiccup with it.

    There are some exce

    • by caseih ( 160668 )

      ZFS fuse is not ZFS on Linux. Not sure why you'd pass judgement on ZFS having only used it years ago with the fuse version. If you want a real test, try the latest ZFS on Linux releases. They are kernel modules not fuse drivers.

      I have run BtrFS for about 5 years now, and I must say it works well on my laptop with an SSD. However, on my desktop with a spinning disk, it completely falls over. It started out pretty fast for the first few years, but now it's horrible. The slightest disk I/O can freeze my system for

      • by mvdwege ( 243851 )

        Quick question for you: do you have quotas enabled? Updating qgroups takes an enormous amount of time. I had the same symptoms on my laptop with a 1TB drive, and turning off quotas and removing the qgroups solved it.
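
        If they do turn out to be enabled, the check and the fix are one-liners (a sketch; the mount point is hypothetical):

        # list qgroups; an error here usually means quotas were never enabled
        btrfs qgroup show /mnt/data
        # disable quota tracking and drop the qgroup accounting overhead
        btrfs quota disable /mnt/data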

        • by caseih ( 160668 )

          Oh you had me excited there for a second. But no, alas, quotas and qgroups are not enabled, as near as I can tell.

          • by mvdwege ( 243851 )
            Heh. It's a nice system, and I like it for snapshotting and incremental backups, but yeah, it still has weird hangups here and there.
    • I played with zfs-fuse on KDE Neon a couple years ago after reading from its acolytes that it was "more advanced" and "better" than EXT4 or Btrfs.

      They should have claimed no such thing. ZFS-Fuse was a shitty workaround for a licensing issue that many people are still arguing may not actually be real. It has effectively gone undeveloped for many years, and as a FUSE module it was never capable of implementing the entire ZFS stack as required.

      Switching to btrfs from zfs-fuse has nothing to do with ZFS itself. You just switched from the worst option to the second best. btrfs is still preferable to ext4 in my opinion, but it doesn't hold a candle to ZFS

  • New to ZFS (Score:4, Informative)

    by AlanObject ( 3603453 ) on Wednesday October 25, 2017 @10:45PM (#55434339)

    Just as this article popped up I was assembling a JBOD array (twelve 4TB drives) for a new data center project, my first in quite a while. It's also self-funded, so I don't have to defer to anyone on decisions.

    When I started I did a bit of reading trying to decide what RAID hardware to get. To make a long story short, once I read about the architecture of ZFS and several somewhat-polemic-but-well-reasoned blog entries, I decided that is what I wanted.

    Only two months ago I had an aged Dell RAID array let me down. I have no idea what actually happened, but it appears some error crept into one of the drives and got faithfully spread across the array, and there was just no recovering it. If I didn't have good backups that would have been about 12 years of the company's IP up in smoke. I just thought I'd share.

    So I ended up as a prime candidate (with a new-found distrust for hardware RAID) to be a new ZFS-as-my-main-storage user. I've just recently learned stuff that was well established five years ago [pthree.org] and I can't understand why everybody doesn't do it this way.

    Wow. Snapshots? I can do routine low-cost snapshots? Data compression? Sane volume management? (I consider LVM to be the crazy aunt in the attic. Part of the family, but...) Old Solaris hands are probably rolling their eyes, but this is like manna from heaven to me.
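
    A few of the one-liners behind those features, sketched against a hypothetical pool named tank:

    # enable cheap transparent compression on a dataset
    zfs set compression=lz4 tank/projects
    # take a low-cost snapshot and list what exists
    zfs snapshot tank/projects@before-upgrade
    zfs list -t snapshot
    # "volume management": carve out a new dataset with its own quota
    zfs create -o quota=500G tank/scratch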

    Given the plethora of benefits I am sure the incentive is high enough to keep ZFS on Linux going onward. ZFS root file system would be nice but I am more than willing to work around that now.

    • > Only two months ago I had an aged Dell RAID array let me down. I have no idea what actually happened, but it appears some error crept in one of the drives and it got faithfully spread across the array and there was just no recovering it. If I didn't have good backups that would have been about 12 years of the company's IP up in smoke. I just thought I'd share.

      It may have been the RAID write hole?

      See Page 17 [illumos.org]

      • It may have been the RAID write hole?

        I was wondering if that is what it was, but with the stress of having a major file server down I just couldn't justify the hours it would take to a) learn how to diagnose it and then b) do an analysis. That system had only the one VM left on it so I was just happy enough to take the latest VM image and put it on another hypervisor.

        One drive was making ugly noises, so maybe (probably) a head crash. The confident product theory of hardware RAID is that it shouldn't have mattered; the remaining good drive(s) s

    • You may also want to take a look at btrfs. It sounds like a match for the feature set that interests you, and it is already available on Linux.

      • ZFS is also "already available" on Linux and has been for several years. By comparison btrfs is still in diapers, and currently support has been dropped by all major Linux vendors save for SUSE, and whatever the fuck Oracle is doing in the Linux world right now (trying to appear relevant).

        ZFS is more mature and in far more active development.

        • by pnutjam ( 523990 )
          Only Red Hat has dropped support for btrfs. Mainly because they use a patchwork kernel that is really old.
          • That would be insignificant if anyone else other than SUSE were throwing anything behind btrfs. btrfs seems to be losing favour ever since Ubuntu decided to change their roadmap from potentially including btrfs as a default to declaring it outright experimental, with the current roadmap favouring ZFS as the future default.

            By support being dropped I don't mean technical support or official support; I mean that the major vendors (other than SUSE) are no longer supporting the idea of btrfs becoming the next g

            • by pnutjam ( 523990 )
              btrfs is still the future; Red Hat and Ubuntu are not cutting-edge distros, they are focusing on stability.
              • btrfs is still the future

                For whom?

                Dismissing the two biggest players in the industry as not cutting edge, despite the fact that they aren't abandoning it out of conservatism (Ubuntu isn't, anyway), doesn't paint it as "the future".

                Especially given that ZFS is further along in development, more actively developed, and has a more advanced roadmap, I question whether btrfs is the same kind of "future" as Clean Coal or a slightly more efficient car, etc. If btrfs is the future, you're going to have a hard time convincing people of it.

                • by pnutjam ( 523990 )
                  Show me the distribution that is shipping ZFS as a core component. I can show you the one shipping btrfs: openSUSE.
                  • Ubuntu.

                    But nice attempt at changing the focus of the discussion. Remember the word I used over and over again? "Future." Now please scroll back to the start and read the entire thread over again.

    • Unfortunately, it's not quite there. Very close though.

      https://antergos.com/wiki/miscellaneous/zfs-under-antergos/ [antergos.com]

    • by rl117 ( 110595 )
      You can use ZFS on the root filesystem. I'm writing this on a system which has been using ZFS on root for 18 months now (since Ubuntu 16.04, upgraded through to 17.10 without a hitch).
  • He hopes that... but he has no decision power, I bet. Maybe he's on the next firing list.

    This is Oracle we're talking about; it's more likely they'll let you license ZFS for a couple thousand per month...
