Data Storage Open Source Software Linux

Meet Linux's Newest File-System: Bcachefs

An anonymous reader writes: Bcachefs is a new open-source file-system derived from bcache, the Linux kernel block-layer cache. Bcachefs was announced by Kent Overstreet, the lead bcache author. Bcachefs aims to provide performance like XFS/EXT4 while offering features similar to Btrfs and ZFS. The bcachefs on-disk format hasn't yet been finalized and the code isn't yet ready for the mainline Linux kernel. That said, initial performance results are okay and "It probably won't eat your data — but no promises." Features so far include support for multiple devices, built-in caching/tiering, CRC32C checksumming, and transparent zlib compression. Snapshot support is yet to be worked on.
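The CRC32C mentioned in the summary is the Castagnoli CRC, the same checksum variant Btrfs uses and the one modern x86 CPUs accelerate with the SSE4.2 CRC32 instruction. As a point of reference only, here is a minimal, unoptimized Python sketch of the reflected bitwise form; real filesystems use lookup tables or the hardware instruction:

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) over 'data', reflected form.
    Reference sketch only -- far too slow for real I/O paths."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reversed Castagnoli polynomial
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
```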
  • by TechyImmigrant ( 175943 ) on Friday August 21, 2015 @03:49PM (#50365501) Homepage Journal

    If there's a Mrs Overstreet, she needs to be careful. Linux FS programmers have a bit of a history.

  • Is this the Linux answer to swapcache and the HAMMER filesystem in DragonFly BSD? Of course, a major generalization and oversimplification, but it seems a similar kind of approach to a similar set of problems.
    • by Anonymous Coward on Friday August 21, 2015 @04:26PM (#50365887)

      Definitely not like HAMMER. Every new filesystem in the past 15 years has used b-trees. You have to because traditional block tables don't scale to the huge disks we have today. (At least, that's the conventional wisdom.) Copy-on-write follows naturally from using a b-tree data structure. So that's definitely not new, either.
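To illustrate the copy-on-write point just made: in a tree, an update only copies the nodes on the path from the root down to the change and shares everything else, so the old root stays valid as a consistent snapshot. A toy Python sketch, with a plain binary search tree standing in for a b-tree purely for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, value: str) -> Node:
    # Copy-on-write update: allocate new nodes along the root-to-leaf
    # path, sharing all untouched subtrees with the old version.
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # same key: replace node

v1 = insert(insert(None, 2, "b"), 1, "a")
v2 = insert(v1, 3, "c")  # v1 is untouched: a free snapshot
assert v1.right is None and v2.right.key == 3
```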

      HAMMER has so many other design goals that bcachefs isn't even in the same league. For one thing, HAMMER supports online multi-machine replication. HAMMER2 will support multi-master replication.

And unlike all the other Linux filesystems in development, HAMMER (1) actually exists in final form and (2) is stable. The last usable b-tree based filesystem that emerged from the Linux world and gained any traction was ReiserFS, which was officially released in 2001. XFS is the most widely used b-tree FS on Linux, but it originated at SGI on IRIX in 1993.

bcachefs looks dead in the water to me. It currently doesn't match the performance of ext4. The author claims there's plenty of room for improvement. Well, of course there is for any proof of concept. But there's also a crap-ton of work to do for correctness and recovery, which are the Achilles' heel of b-tree based file systems. Making b-tree filesystems performant while keeping them robust is extremely difficult. What differentiates all the contenders is the way they approach optimization, correctness, and recovery. The strategies invariably evolve to become extremely complex--both the design and the code. bcachefs doesn't appear to have even scratched the surface in that regard.

The author's optimism suggests extreme naivety to me. I wouldn't touch anything he writes with a 10-foot pole.

      • I can't find it now, but somewhere on the NameSys website they had an interesting piece on what made them so successful in writing a fast filesystem.

Essentially, and I am quoting from memory now, classical file system developers start with a grand idea of what *should* be a fast architecture (e.g. B/B+/dancing trees, etc.), then code it all in at once until perfection, and finally benchmark the finished product.
In contrast, they would try an implementation of a feature, then quickly benchmark it on several fil…

  • by UnknownSoldier ( 67820 ) on Friday August 21, 2015 @04:14PM (#50365767)

    > a modern COW filesystem with checksumming, compression, multiple devices, caching, and eventually snapshots and all kinds of other nifty features

Instead of yet another FS flavor of the month, or year (ReiserFS, Btrfs, Bcachefs, etc.), and all the man-hours wasted re-solving the same old problems, how about just doing it right the first time (ZFS)? Because this is what it is turning into. What advantages does bcachefs have over ZFS? There is no way in hell I'm going to trust an unproven, buggy, and incomplete FS when we already have one that works.

Fixing the Btrfs free space shenanigans [kernel.org] would have been a step in the right direction: improving an existing, debugged FS.

    Reminds me of this xkcd #927: Standards [xkcd.com]

    • by Trepidity ( 597 )

      Is ZFS-on-Linux production ready yet?

      • Re: (Score:3, Informative)

        by Anonymous Coward

        Yep, just not included in standard distros due to licensing.

      • Re: (Score:3, Informative)

        by danbob999 ( 2490674 )

BTRFS is more ready than ZFS is. It is already pretty stable, it is in the mainline kernel, and distros are talking about using it as a default FS.
The main obstacle to its adoption is that most people don't need the extra features over ext4 and don't really care.

        • by Anonymous Coward

          "Pretty stable"? That's a standard? I don't know about you, but a FS is either stable or it's not. If it loses your data only sometimes, it is not "pretty stable" the rest of the time.

        • by Anonymous Coward on Friday August 21, 2015 @05:14PM (#50366249)

You know that www.phoronix.com lost data due to BTRFS recently? The author, Michael, wrote an article this month about the data BTRFS corrupted, and he deemed it not to be production ready. And if you read the forum comments on the article, a lot of people wrote that they also got corrupted data due to BTRFS. Not production ready.

          • by Anonymous Coward on Friday August 21, 2015 @05:37PM (#50366427)

Yeah, that clearly was due to BTRFS, totally unrelated to running a git master kernel.
            Not 4.2.x.
            Not 4.2.
            Not 4.2-rc.
            Linus' git master.
            In production.

            • by rl117 ( 110595 )

This may be the case, and the experiences of Phoronix are certainly "higher profile" than the rest of us. However, the reality is that it's been awful in all the stable kernel releases as well. Maybe not this bug, but unbalancing and occasional data loss have been a common experience for many people, including myself.

            • by Bengie ( 1121981 )
A well designed FS should not allow committed data to be lost. A correct design results in a very binary state: either things work or they don't. If they work, you don't lose data; if they don't work, it immediately errors out instead of chugging along and corrupting data.

              Of course this is ideal and not 100% attainable, but you can get really damned close. Fail early and fail with a clear error message with the exact reason. All reasons should be accounted for.
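A small sketch of the fail-early idea in a checksummed read path; everything here (block layout, names, the use of plain CRC32) is made up for illustration, not taken from any real filesystem:

```python
import zlib

class CorruptBlockError(Exception):
    """Raised the moment a checksum mismatch is detected."""

def read_block(dev, offset: int, length: int, expected_crc: int) -> bytes:
    # Verify before returning: never hand corrupt data to the caller,
    # and report the exact location and reason instead of limping on.
    dev.seek(offset)
    data = dev.read(length)
    actual = zlib.crc32(data) & 0xFFFFFFFF
    if actual != expected_crc:
        raise CorruptBlockError(
            f"block @ {offset}: expected CRC {expected_crc:#010x}, "
            f"got {actual:#010x}")
    return data
```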
        • by fnj ( 64210 )

          NO WAY is btrfs even in the same class of reliability and robustness as ZFS. So no, it is NOT production ready. And no, I won't be easily impressed just because some dopey distros take a chance on it.

          • A filesystem does not need to be good enough to trust absolutely, because no filesystem should ever be trusted absolutely.

            You just need to be confident that the chance of a fault is low enough that you can accept the amount of downtime it will take to restore from your backups. Which, I hope, are taken often and stored on independent physical media.

            • That's a terrible argument.

              The fact that we've had crappy storage systems in the past that mandated backups is no reason to state that it should always be like that.

              A good file system(tm) should protect against the weaknesses of the underlying physical media, it should provide snapshots, it should provide (configurable) redundancy, and it should support geographically separated redundancy.

Especially the last one is very much a work in progress for the DIY-self-sufficient user, but we should and will get the…

          • NO WAY is btrfs even in the same class of reliability and robustness as ZFS.

            We are talking about ZFS for Linux. Not ZFS in general.

            • by fnj ( 64210 )

              We are talking about ZFS for Linux. Not ZFS in general.

              So am I. Point?

ZFS for Linux is nowhere near production ready. It isn't even in the kernel (and never will be, because of the license).

        • by rl117 ( 110595 )

Nonsense. Btrfs is still not reliable. I've lost data from it twice when it trashed the filesystem irrecoverably, and more recently I've been suffering from the need to periodically rebalance. Where periodically is "approximately every 36 hours" under high load. I'm afraid that a filesystem which randomly stops working every 1.5 days because it used up all the free space (despite the disk being under 10% full) is most emphatically *not* production ready by any stretch of the imagination. Even FAT was m…

    • Re: (Score:3, Informative)

      by rubycodez ( 864176 )

You are funny. ZFS is a horrible resource pig; many superior alternatives exist.

      • by Anonymous Coward on Friday August 21, 2015 @04:34PM (#50365949)

        Wrong. If you have enough RAM (about 4x what you would normally estimate you need) and a fast enough CPU and keep your storage pool small, then it is not a hog. I wish people here would stop trying to mislead us about ZFS.

        • by Anonymous Coward

          So... it's a horrible resource pig.

        • So, how did I manage to run ZFS on fairly old and underpowered Solaris servers without performance problems almost ten years ago? Did the Linux guys somehow break the whole thing as they ported it to Linux?
          • by Bert64 ( 520050 )

            Because your fairly old and underpowered solaris servers likely had small disks...

        • by tlhIngan ( 30335 )

          Wrong. If you have enough RAM (about 4x what you would normally estimate you need) and a fast enough CPU and keep your storage pool small, then it is not a hog. I wish people here would stop trying to mislead us about ZFS.

          That seems to limit its usefulness. I mean, ZFS typically wants 1GB of RAM per TB of storage. And ZFS makes perfect sense for large storage NAS tasks.

But that also means that since commodity processors have a 32GB RAM limit (slightly lower thanks to peripherals), that limits you to 32TB of…

          • by fnj ( 64210 )

I am running every day 36 TB of raw storage - 24 TB after redundancy is subtracted - in two 6-drive RAIDZ2 ZPOOLs on a CentOS6 box with 8 GB of RAM devoted to ZFS (total installed RAM is 16 GB). Both pools are nearing 90% full. Performance is excellent and problems nil. So you can push well past the 1 GB per TB rule of thumb.

Yeah, if you traverse the entire system listing files, it's a little slow because my RAM ARC cache is so limited, and I have no SSD L2ARC cache. That's an acceptable tradeoff for my purpose…

          • That seems to limit its usefulness. I mean, ZFS typically wants 1GB of RAM per TB of storage.

And more again if you do de-duplication, but there's one key thing that is missing from the discussion: if you keep ZFS happy with loads and loads of RAM to cache, then it far outperforms filesystems with comparable feature sets, and even outperforms simpler file systems like EXT3/4 on some metrics, like doing random I/O on spinning disks.

Throwing 16GB of RAM at EXT3/4 won't make it any faster, but on ZFS it will, and it would seem that not offering enough RAM for the entire cache only has a minor performanc…

          • No, with dedupe enabled, ZFS runs best with 1 GB of ARC space (including L2ARC) for every TB of unique data in a pool.

            With dedupe turned off, all data is unique, but then you need less ARC to manage it.

            We have a couple of 40 TB pools running with only 32 GB of RAM without issues.

            We also have a couple 96 TB pools running with 128 GB of RAM; one even has dedupe enabled and runs without issues.

And I've run it at home on a P4 system with only 2 GB of RAM without issues. Nursing from raidz1 using 160G drives, to…
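To put rough numbers on the sizing rule being argued about in this sub-thread, a trivial helper; the ratio is a tunable guess, not a hard floor, as the 40 TB on 32 GB and 96 TB on 128 GB reports above show:

```python
def suggested_arc_gb(pool_tb: float, gb_per_tb: float = 1.0) -> float:
    """Rule-of-thumb ARC sizing: ~1 GB of cache per TB of (unique) data;
    raise the ratio with dedupe on, lower it with dedupe off."""
    return pool_tb * gb_per_tb

print(suggested_arc_gb(40))  # 40.0 -> a poster above runs 32 GB fine
print(suggested_arc_gb(96))  # 96.0 -> another runs 128 GB with dedupe on
```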

        • by mcrbids ( 148650 )

Disclaimer: I ♥ ZFS.

We had a problem that ext* just couldn't handle. We have a medium sized filesystem with about 250 million data files that we needed to back up. Every day. Rsync completely failed at the job, taking between one and two days per run.

Desperate to find a solution, we tried ZFS and snapshot replication. Our time to replicate to DR dropped from days to a few hours, backup storage requirements dropped through the floor, and server load dropped at the same time! This is on a reasonably priced s…
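The workflow being described reduces to `zfs snapshot` followed by an incremental `zfs send` piped to the DR host. A hedged sketch in Python (dataset, snapshot, and host names are placeholders; error handling is minimal):

```python
import subprocess

def replicate(dataset: str, prev: str, curr: str, remote: str, dest: str) -> None:
    """Incremental ZFS snapshot replication. All names are hypothetical."""
    # 1. Take the new snapshot -- instant, thanks to copy-on-write.
    subprocess.run(["zfs", "snapshot", f"{dataset}@{curr}"], check=True)
    # 2. Stream only the blocks changed since the previous snapshot;
    #    no per-file enumeration happens, unlike rsync.
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{curr}"],
        stdout=subprocess.PIPE)
    subprocess.run(["ssh", remote, "zfs", "receive", "-F", dest],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()

# e.g. replicate("tank/data", "daily-0820", "daily-0821", "dr-host", "backup/data")
```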

          • by swb ( 14022 )

I've only played with it a little via nas4free (which probably limits what I know further), but what would have seemed to make more sense to me would be to just add disks/LUNs to the pool without any specific redundancy assignments and create vdevs as device block-level parity sets across all pool members.

These virtual vdevs could be restriped on demand to change RAID levels, and adding a disk to the pool would cause it to rebalance the stripes across all pool members. Removing a disk would be the opposite, pr…

            • by Bengie ( 1121981 )
The problem with ZFS is that the only way to rebalance or remove vdevs is to do pointer re-writing, and pointer re-writing by definition leaves the FS in an inconsistent state during the transition, and they refuse to do anything that leaves the FS in an inconsistent state. What they need is a way to atomically update all references that point to a data block. There are some ways to do this, but the cure is worse than the disease. It's simpler to just make a duplicate system with your corrected vDevs an…
          • by lewiscr ( 3314 )

            As a long time ZFS admin, I have a few suggestions.

ZFS snapshots and send are much faster than rsync. Nearly all of the time is spent actually transferring data, and very little is spent enumerating it. One day it dawned on me that I could do hourly, or even 5-minute, snapshot && send on machines that could only handle daily rsyncs on ext4. It still depends on your write bandwidth and overwrite percentage, but it removes the number of files from the equation.

Regarding vdev reorganization, it's tru…

You are funny. ZFS is a horrible resource pig; many superior alternatives exist.

        Yup, boy did Sun call that one wrong, because spare processing & memory capacity haven't been steadily rising since ZFS's introduction at all, have they...

Virtualization has become HUGE since Sun introduced ZFS, and in that scenario memory and CPU are very carefully rationed resources. Thus ZFS becomes an unwanted resource pig.

      • by Bengie ( 1121981 )
        You can configure ZFS to only cache metadata in memory. Out of the box, ZFS comes configured for servers with at least 8GiB of memory. You can tweak it to be decent on low end hardware, but it really isn't meant for embedded systems.
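The knob being referred to here is the per-dataset `primarycache` property (with `secondarycache` doing the same for L2ARC). A one-line sketch, with a hypothetical dataset name:

```python
import subprocess

# Cache only metadata in the ARC for this (hypothetical) dataset;
# data blocks will then bypass the in-RAM cache entirely.
subprocess.run(["zfs", "set", "primarycache=metadata", "tank/vms"], check=True)
```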
    • License, maybe? And then performance, as said in TFS.

    • by Anonymous Coward

      COW

      Moooooo?

    • by Anonymous Coward

      Both ZFS and btrfs have buglists a mile long. I wouldn't say either one is "working" yet.

      Personally I only used btrfs and while it hasn't lost me data yet, I found it easy to deadlock and a horrible disk space hog over time. But friends at companies that tried going with ZFS on Linux were similarly unimpressed by it.

I like the idea this filesystem is going for... it can be useful as a cache, so that hardcore random I/O is smoothed out before it goes onto HDD platters: an SSD can hold the OS and also act as a cache in front of a drive array or slow external drives.

My only addition would be encryption. If it is designed to work as a transient, ephemeral filesystem where data is only kept until it is safely copied to the real filesystem, then maybe encryption should be a part of this, with keys for data periodic…

  • Funding Needed (Score:4, Interesting)

    by bezenek ( 958723 ) on Friday August 21, 2015 @04:56PM (#50366119) Journal

    From Mr. Overstreet's announcement:

    PSA: Right now I'm not getting any kind of funding for working on bcachefs; I'm
    working on it full time for now but that's only going to last as long as my
    interest and my savings account hold out. So - this would be a wonderful time
    both for other developers to jump in and get involved, and for potential users
    to pony up some funding. If you think this is interesting and worthwhile and you
    want to see it completed and upstream - especially if you're at a company that
    might make use of it - talk to your $manager or whoever and nag them until they
    send me a check :)

    • Please sponsor my hobby ;-)
Good luck with that. Nobody seems to want to pay for system-level software anymore. They might shell out a few bucks for a game that they will grow tired of after a few weeks, but they expect their OS, tools, and other platform software to be free (as in beer). You might build a system, library, or algorithm that collectively saves the world economy a billion dollars per year in saved time, electricity costs, and/or hardware upgrades; but don't expect to get paid anything for doing it. Sad as that might b…
  • "It probably won't eat your data — but no promises."

    Well, that's a ringing endorsement if I ever heard one. Thanks but no thanks.

  • Hmmmm (Score:5, Insightful)

    by eyegone ( 644831 ) on Friday August 21, 2015 @09:49PM (#50367703)
    I guess writing a new filesystem is easier than fixing the existing bugs in bcache itself.
  • Seriously, no 'file manager' solution I've seen so far works adequately, and in a way that preserves such tags across devices / disks / etc. What do other slashdotters do for tagging purposes?
    • I am working on it now but without funding it is going much slower than I had hoped. About half finished, but works great so far. See the video at http://youtu.be/2uUvGMUyFhY [youtu.be]

"To take a significant step forward, you must make a series of finite improvements." -- Donald J. Atwood, General Motors

Working...