Bug Security Linux IT

Denial-of-Service Attack Found In Btrfs File-System 210

Posted by timothy
from the at-that-range-a-hammer-works-too dept.
An anonymous reader writes "It's been found that the Btrfs file-system is vulnerable to a Hash-DOS attack, a denial-of-service attack caused by hash collisions within the file-system. Two DOS attack vectors were uncovered by Pascal Junod that he described as causing astonishing and unexpected success. It's hoped that the security vulnerability will be fixed for the next Linux kernel release." The article points out that these exploits require local access.

  • by Nimey (114278) on Friday December 14, 2012 @09:27PM (#42297559) Homepage Journal

    and should we give him a medal or lynch him?

  • by Anonymous Coward on Friday December 14, 2012 @09:35PM (#42297625)

    btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.

    I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)

    This doesn't mean to suck Sun's teat for ZFS access... but at least try to come close to what even NTFS or even ReFS offers...

    • by Anonymous Coward on Friday December 14, 2012 @10:02PM (#42297855)

      ZFS on FreeBSD or FreeNAS is great. Easily saturates gigE with a simple mirror of recent 7200rpm disks. It scales up from there, and FreeBSD is pretty rock solid.

    • by Anonymous Coward on Friday December 14, 2012 @10:07PM (#42297893)

      btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.

      I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)

      This doesn't mean to suck Sun's teat for ZFS access... but at least try to come close to what even NTFS or even ReFS offers...

      Hear hear! Backup admin here, just want to add before the unwashed masses of armchair Linux admins show up, one example of an enterprise filesystem feature is the NTFS change journal. It makes the file system scan as part of an incremental backup run in constant time.

      It's sad on other systems with large numbers of files to schedule subdirectories for different times of day to deal with scanning overhead.

      • by Tough Love (215404) on Friday December 14, 2012 @10:22PM (#42297977)

        NTFS doesn't have snapshots. Instead it relies on volume shadow copies, with known severe performance artifacts caused by needing to move snapshotted data out of the way when new writes come in. Btrfs, like ZFS and NetApp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty. The takeaway: I would not go so far as to claim Microsoft has an enterprise-worthy solution either. If you want something with industrial-strength dedup, snapshots and fault tolerance, you won't be getting it from Microsoft.

        • by jamesh (87723) on Saturday December 15, 2012 @12:08AM (#42298631)

          NTFS doesn't have snapshots. Instead it relies on volume shadow copies, with known severe performance artifacts caused by needing to move snapshotted data out of the way when new writes come in. Btrfs, like ZFS and NetApp's WAFL, uses a far more efficient copy-on-write strategy that avoids the write penalty. The takeaway: I would not go so far as to claim Microsoft has an enterprise-worthy solution either. If you want something with industrial-strength dedup, snapshots and fault tolerance, you won't be getting it from Microsoft.

          What nonsense. VSS is the snapshot solution for NTFS, and of course it uses copy-on-write. Microsoft VSS backup architecture is years ahead of Linux... LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless. MS VSS does this, and always has.

          I'm normally a Linux fanboi but when you spout rubbish like this I have no hesitation in correcting you.

          • Re: (Score:3, Informative)

            by Anonymous Coward

            Tried to find some more information on this. First discovery: VSS stands for "Volume Shadow copy Service", not "Visual SourceSafe", as was my first association. :)

            AFAICT he's saying pretty much what Microsoft is saying [microsoft.com]:

            When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a "differences area", which preserves a copy of the data block before it is overwritten with the change. Using the blocks in the differences area and unchanged blocks in the original volume, a shadow copy can be logically constructed that represents the shadow copy at the point in time in which it was created.

            The disadvantage is that in order to fully restore the data, the original data must still be available. Without the original data, the shadow copy is incomplete and cannot be used. Another disadvantage is that the performance of copy-on-write implementations can affect the performance of the original volume.

            Do you have a newer reference?
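The mechanism in the quoted Microsoft text can be modeled in a few lines. This is only a toy sketch, not VSS itself: the class name `ShadowCopyVolume` and its methods are hypothetical, blocks are plain Python values, and only a single snapshot is supported.

```python
class ShadowCopyVolume:
    """Toy model of a volume with one copy-before-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live volume, modified in place
        self.diff_area = None        # block index -> preserved old contents

    def create_snapshot(self):
        # No data is copied at snapshot time; only an empty differences area.
        self.diff_area = {}

    def write(self, index, data):
        # Preserve the old block once, before its first overwrite.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the differences area,
        # unchanged blocks from the original volume.
        if index in self.diff_area:
            return self.diff_area[index]
        return self.blocks[index]


vol = ShadowCopyVolume(["a", "b", "c"])
vol.create_snapshot()
vol.write(1, "B")
vol.write(1, "BB")
snapshot_view = [vol.read_snapshot(i) for i in range(3)]
```

The sketch also shows the disadvantage the quote mentions: `read_snapshot` still needs `self.blocks` for every block that was never overwritten, so the snapshot is useless without the original volume.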

          • by Tough Love (215404) on Saturday December 15, 2012 @03:17AM (#42299481)

            VSS is the snapshot solution for NTFS, and of course it uses copy-on-write

            Well. Maybe you better sit down in a comfortable chair and think about this a bit. From Microsoft's site: When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a “differences area”, which preserves a copy of the data block before it is overwritten with the change. [microsoft.com]

            Think about what this means. It is not a "copy-on-write", it is a "copy-before-write". Gross abuse of terminology if anybody tries to call it a "copy-on-write", which has the very specific meaning [wikipedia.org] of "don't modify the destination data". Instead, copy it, then modify the copy. OK, are we clear? VSS does not do copy-on-write, it does copy-before-write.

            Now let's think about the implications of that. First, the write needs to be blocked until the copy-before-write completes, otherwise the copied data is not sure to be on stable storage. The copy-before-write needs to read the data from its original position, write it to some save area, then update some metadata to remember which data was saved where. How many disk seeks is that, if it's a spinning disk? If the save area is on the same spinning disk? If it's flash, how much write multiplication is that? When all of that is finally done, the original write can be unblocked and allowed to proceed. In total, how much slower is that than a simple, linear write? If you said "on the order of an order of magnitude" you would be in the ballpark. In fact, it can get way worse than that if you are unlucky. In the best imaginable case, your write performance is going to take a hit by a factor of three. Usually, much much worse.

            OK, did we get this straight? As a final exercise, see if you can figure out who was talking nonsense.
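The sequence of operations described above can be tallied in a small sketch. It only counts device operations for a single overwrite of a snapshotted block; it ignores seeks, caching and batching, and the function names are hypothetical. Whether the metadata update costs a separate write is an assumption of this sketch.

```python
def copy_before_write_ops():
    """Device operations for the first overwrite of a snapshotted block."""
    return [
        "read old block from its original location",
        "write old block to the save area",
        "write metadata recording where it was saved",
        "write new data in place",
    ]


def redirect_on_write_ops():
    """Device operations when the new data goes to a fresh location instead."""
    return [
        "write new data to a free location",
        "write updated pointer to the new location",
    ]


plain_write_ops = 1
cbw_op_count = len(copy_before_write_ops())
row_op_count = len(redirect_on_write_ops())
```

In the copy-before-write list the first three operations must complete before the fourth may start, which is the blocking the comment refers to. In the other list the old block is never touched.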

            • by jamesh (87723) on Saturday December 15, 2012 @04:04AM (#42299661)

              VSS is the snapshot solution for NTFS, and of course it uses copy-on-write

              Well. Maybe you better sit down in a comfortable chair and think about this a bit. From Microsoft's site: When a change to the original volume occurs, but before it is written to disk, the block about to be modified is read and then written to a “differences area”, which preserves a copy of the data block before it is overwritten with the change. [microsoft.com]

              Think about what this means. It is not a "copy-on-write", it is a "copy-before-write". Gross abuse of terminology if anybody tries to call it a "copy-on-write", which has the very specific meaning [wikipedia.org] of "don't modify the destination data". Instead, copy it, then modify the copy. OK, are we clear? VSS does not do copy-on-write, it does copy-before-write.

              Now let's think about the implications of that. First, the write needs to be blocked until the copy-before-write completes, otherwise the copied data is not sure to be on stable storage. The copy-before-write needs to read the data from its original position, write it to some save area, then update some metadata to remember which data was saved where. How many disk seeks is that, if it's a spinning disk? If the save area is on the same spinning disk? If it's flash, how much write multiplication is that? When all of that is finally done, the original write can be unblocked and allowed to proceed. In total, how much slower is that than a simple, linear write? If you said "on the order of an order of magnitude" you would be in the ballpark. In fact, it can get way worse than that if you are unlucky. In the best imaginable case, your write performance is going to take a hit by a factor of three. Usually, much much worse.

              OK, did we get this straight? As a final exercise, see if you can figure out who was talking nonsense.

              I concede that the terminology used by the MS article is misused. I don't think you're thinking the performance issues through though. You start with a file nicely laid out linearly on disk, and you take a snapshot so you can make a backup. Now you make a modification to the middle of the file and what happens? Suddenly the middle of the file is elsewhere on disk, and in the case of LVM this is invisible to the filesystem so no amount of defragging is going to fix it. This situation persists long after you have taken your backup and thrown the snapshot away. Of course this doesn't matter for flash but we're not all there yet. If BTRFS does snapshots using copy-on-write (correct definition) then this will be a problem too, although if BTRFS is smart enough it should be able to repair the situation once the snapshot is discarded.

              VSS's way leaves the original data in-order on the storage medium. The difference area is likely on a completely different disk anyway so the copy-on-write (MS definition) could not be performed any other way.

              • Re: (Score:3, Informative)

                by Tough Love (215404)

                Modifications in the middle of files are extremely rare. It's true, running a database on top of a snapshotted spinning disk is probably going to suck. For normal users, keeping regular files mostly linear, and files in the same directory near each other, is what matters, and yes, Btrfs does a credible job of that.

                I know why shadow copy works the way it does. 1) It's simple, therefore likely to work. 2) It's an easy answer to the "how do you control fragmentation" question. But the write performance issue

                • Dear Microsoft spinmods: you don't change the fact that your volume snapshots suck by modding down my post.

                  • by jamesh (87723)

                    Dear Microsoft spinmods: you don't change the fact that your volume snapshots suck by modding down my post.

                    Troll is a little harsh... I disagree with you but I know you're not trolling and the discussion is still an Interesting one.

          • LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless.

            You're also wrong about that. You can concatenate multiple logical volumes as a single logical volume and snapshot that atomically.

            • by jamesh (87723)

              LVM is kind of cool but if you have a single database spread across multiple LV's then you can't snapshot them all as an atomic operation so it becomes useless.

              You're also wrong about that. You can concatenate multiple logical volumes as a single logical volume and snapshot that atomically.

              OK this is news to me. When I last asked about that it couldn't be done but that was a few years ago. Google doesn't tell me how I can concatenate (say) my database lv and my logs lv (separate vg's because separate spindles), snapshot them, then un-concatenate them... a link would be appreciated.

              • lvm lets you concatenate any block devices into a virtual block device

              • Totally aside from your main point, what does the spindle count have to do with your VG naming?

                pvcreate /dev/sda1
                pvcreate /dev/sdb1
                pvcreate /dev/sdc1

                vgcreate LotsOfDrives /dev/sda1 /dev/sdb1 /dev/sdc1

                Now if you want spindle-specific LVs:
                lvcreate -n dbdata -l 100%PVS LotsOfDrives /dev/sdb1
                lvcreate -n logdata -l 100%PVS LotsOfDrives /dev/sdc1

                • by jamesh (87723)

                  I'm still not getting how you can simultaneously snapshot dbdata (optimised for read and write) and logdata (optimised for write) as an atomic operation. "Tough Love (215404)" said "concatenate them together" but I don't get what that means in this context.

                  Last time I checked you would still have to snapshot one, then the other, and the resulting snapshots are almost certainly not going to give you a consistent backup because there would have been writes between the first and the second snapshots.

        • by belrick (31159)

          Btrfs, like ZFS and Netapp's WAFL, use a far more efficient copy-on-write strategy that avoids the write penalty.

          WAFL doesn't do copy-on-write. Copy-on-write means a write to a block in a file requires the original block to be read, written elsewhere for the snapshot, then the new block written in the original location. That's exactly what WAFL doesn't do. WAFL writes all changed blocks for multiple files in big RAID stripes, updating pointers to current copies and leaving snapshot pointers pointing to old copies of the updated files. Very efficient for writes, but changes almost all reads, random or sequential (w

          • Btrfs, like ZFS and Netapp's WAFL, use a far more efficient copy-on-write strategy that avoids the write penalty.

            WAFL doesn't do copy-on-write. Copy-on-write means a write to a block in a file requires the original block to be read, written elsewhere for the snapshot, then the new block written in the original location. That's exactly what WAFL doesn't do. WAFL writes all changed blocks for multiple files in big RAID stripes, updating pointers to current copies and leaving snapshot pointers pointing to old copies of the updated files. Very efficient for writes, but changes almost all reads, random or sequential (within a file) into random reads (within the filesystem) because file blocks get scattered according to write order, not location of the block within the file. That's why they want lots of spindles in an aggregate and they love RAM cache and flash cache.

            But since you say that copy-on-write avoids the write penalty I think you know what it does but simply don't know that it isn't copy-on-write.

            We both know what we're talking about, we just disagree on terminology. Properly, a "copy-on-write" doesn't modify the original destination. [wikipedia.org] Nobody should ever use the term "copy-on-write" to describe the algorithm that is properly "copy-before-write". The strategy that leaves the original destination untouched and updates pointers to point at the modified copy is correctly called "copy-on-write", but because the terminology has been so commonly abused by the likes of Microsoft and their followers, it is be

      • Deduplication typically isn't done by the operating system in production systems, it is a feature of enterprise grade storage, backup and archival systems.

        Snapshots and encryption can be done in GNU/Linux, or done outside the OS.

        What enterprise grade storage/backup/archival systems are you using? The obvious solution will already be evident from that answer in most cases.

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Wouldn't it be cheaper and just as effective to use FreeBSD or FreeNAS for your data? If you're considering either Windows or Solaris then obviously you don't need a specific operating system. I would think FreeBSD (or even ZFS on Linux) would suit your purposes better (and with less expense) than Windows or Solaris.

    • by maz2331 (1104901) on Friday December 14, 2012 @10:51PM (#42298139)

      ZFS on Linux does exist as a kernel module that is pretty stable and works well. http://zfsonlinux.org/ [zfsonlinux.org] -- it was put out by Lawrence Livermore National Lab, but can't be included with the kernel distros due to GPL / CDDL license compatibility issues.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Linux has production level encryption, snapshots, and LVM2. What are you talking about?

      Unless you have very specific uses, deduplication should be done at your storage array really. It's not a high priority to implement in the filesystem. (No, your anecdote does not make it a high priority).

    • Did you guys look at FreeBSD?
    • No (Score:3, Interesting)

      by ArchieBunker (132337)

      Instead of picking a filesystem and moving forward people will moan and cry and eventually split into a few different groups with beta level implementations. Sound on Linux is a great example. Two completely different sound drivers that both work half assed. What's the word with XFS these days?

      • by drinkypoo (153816)

        What's the word with XFS these days?

        I don't know, but my last word is that I dropped it due to data corruption and now I'm using ext4 while I'm waiting for btrfs.

        I was hoping to be using bcache by now too, but alas, no. I have an 80GB SSD and a 320GB HDD, which I will bump up to 2x1TB stripe and backup to 2TB external... just as soon as I can install with bcache without having to do it all manually.

        • by Wolfrider (856)

          --Have you tried JFS? I'm a heavy Vmware user and it works really well, with minimal CPU usage.

      • by diegocg (1680514)

        What's the word with XFS these days?

        http://www.youtube.com/watch?v=FegjLbCnoBw [youtube.com]

    • by guruevi (827432)

      Solaris and its derivatives can be had for free. You don't HAVE to buy it, and its derivatives like OpenIndiana are very stable.

      • by iggymanz (596061)

        OpenSolaris is long dead. OpenIndiana has never put out a stable release and never met their 2011 Q1 stable release target. They put out a development release once in a while, but that is NOT production grade nor maintained at a level suitable for production use.

    • Funny, even my home box uses LVM over dm-crypt over RAID on Linux just fine. And that's with Ext4 file systems.

      LVM lets me create a snapshot for consistent backups any time I want.

    • by LWATCDR (28044)

      I would say that you should look at BSD then. If you are willing to go open source anyway FreeBSD offers ZFS. Too bad that more hardware and software companies do not support BSD as well as Linux.

    • by T-Ranger (10520)
      If you ... your employer ... are prepared to spend money, then why not spend money? I mean, and this is a serious question, why not go with something like a EMC VNX or VNXe? Byte for byte of real physical storage SANs are pretty expensive, I grant, but the features can oft make up for that.
    • [...]I just got out of a meeting at my job [...]and because Linux has no stable filesystem with enterprise features [...]

      Sure, AC has some real complex stuff to handle on an enterprise level. That's why all the big boys like Google, Facebook and Twitter are using Windows to host their data...

      You're either a silly moron, a self deluding enterprisy [a-z]+architect or a very capable troll.

    • "I just got out of a meeting at my job because we are replacing some old large servers... and because Linux has no stable filesystem with enterprise features, looks like things are either going to Windows, or perhaps Solaris x86 (which is expensive.)"

      Somebody notify the millions of Enterprise servers that are Linux based, and serving up a major portion of the internet's content every day! Talk about throwing the baby out with the bathwater. Basically, you don't want to take a chance that established files

    • by Lennie (16154)

      It depends on your needs.

      Take for example the top500, if I'm not mistaken more than 50% of that uses Lustre as the filesystem. Which is obviously Linux based.

      I think both Ceph ("inspired" by Lustre) and btrfs are interesting and I'm sure they'll be more than production ready next year.

      Hopefully with bcache in the mainline kernel too.

    • by Rich0 (548339)

      btrfs is a step in the right direction, but even now, Linux does not have production-level deduplication (which even Windows has, for crying out loud), encryption, snapshots, or something even close to supplanting LVM2.

      Well, that might be why they're working on btrfs, then. :) I'm not sure about encryption, but everything else on your list is something likely to be in the feature list at some point. It obviously isn't stable yet, but that is a matter of time, and if somebody wanted to make a push to get something stable they'd get there a lot faster with btrfs than reinventing something else.

      btrfs already supports reflink copies (think of a copy that behaves like a hard link on initial copy, but each file tracks its own

  • by Anonymous Coward on Friday December 14, 2012 @09:48PM (#42297733)

    no more dangerous than a fork bomb or filling up /tmp or trying to compile open office.

    • by cryptizard (2629853) on Friday December 14, 2012 @10:38PM (#42298081) Homepage
      Sort of, but at least you can recover from those attacks by restarting or booting from an external source to clean up your filesystem. The second attack here leaves you with undeletable files because the file system code responsible for deleting cannot handle the multiple hash collisions. There is no way to recover from that until a patch is pushed out that fixes the problem.
      • by blade8086 (183911)

        Which, without the over sensationalized BS that is this story, will probably be in about a week tops.

        And since BTRFS is not in any 'enterprise' Linux Distributions, means that it will pretty much be available
        immediately since everyone running it in critical production environments will probably be running
        pretty bleeding edge linuxen

      • by drinkypoo (153816)

        The second attack here leaves you with undeletable files because the file system code responsible for deleting cannot handle the multiple hash collisions. There is no way to recover from that until a patch is pushed out that fixes the problem.

        There's no filesystem debugger for btrfs?

        Seems to me like fsck ought to be able to solve this problem, too. Two files with the same hash? Delete the one with the newer timestamp.

        • Two files with the same hash is not a problem, it is allowed. This will happen just by chance many times on your filesystem because the hash is relatively short (64 bits). The problem is when you engineer many files to have the same hash and your data structure (hash table) degrades to an array. There is also some other problem in the code here that makes it so that the hash table can't store or for some reason can't process more than a certain number of collisions.
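The degradation described above can be reproduced with a toy chained hash table. This is a minimal sketch under loud assumptions: `weak_hash` is a deliberately weak, hypothetical function chosen so that collisions are trivial to construct, and `ChainedTable` is not how Btrfs stores directory entries.

```python
from itertools import permutations

NUM_BUCKETS = 1024


def weak_hash(name):
    # Hypothetical weak hash: byte order does not affect the result.
    return sum(name.encode()) % NUM_BUCKETS


class ChainedTable:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, name):
        self.buckets[weak_hash(name)].append(name)

    def lookup_cost(self, name):
        # Number of entries compared before the name is found.
        chain = self.buckets[weak_hash(name)]
        return chain.index(name) + 1


# Every permutation of the same letters collides under weak_hash.
colliding = ["".join(p) for p in permutations("abcdef")]
table = ChainedTable()
for name in colliding:
    table.insert(name)

longest_chain = max(len(bucket) for bucket in table.buckets)
worst_lookup = table.lookup_cost(colliding[-1])
```

With all engineered names in one bucket, a lookup walks a list whose length equals the number of names, so the table behaves like the plain array the comment mentions.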
  • Nice! (Score:4, Interesting)

    by gweihir (88907) on Friday December 14, 2012 @10:21PM (#42297971)

    "Algorithmic Complexity Attacks" like this one have long been known, but rarely been documented publicly. One good example to point out why hash-randomization is a good idea!

  • by Anonymous Coward on Friday December 14, 2012 @10:31PM (#42298029)

    Hopefully more people start fuzzing btrfs so it is that much better when it is declared stable.

    • by Rich0 (548339)

      Lots of people have been doing testing on btrfs. Filesystems aren't so much declared as stable as they become used as stable. Unless the fix changes the on-disk format in some non-backwards-compatible way, it doesn't really matter when the fix gets deployed. Most likely the fixes will be in git in a week or two.

      Oh, and anybody who really wants to run btrfs should probably be running the git version anyway. They're doing so many bugfixes per month that this is one of those rare times where the mainline k

  • Unstable software that is still under heavy development is actually unstable. Who would've guessed?
    I think that based on this ingenious discovery, we should all switch over to it by next week.

  • "Denial-of-Service Attack Found In Btrfs File-System" didn't happen. A vulnerability was found. That's a big deal, no reason to obscure it.

  • by Decameron81 (628548) on Saturday December 15, 2012 @12:18AM (#42298681)

    An attack was found in the filesystem? What's that supposed to mean?

    • by dr2chase (653338)

      Carefully chosen file names (a lot of them) can DOS file system performance. Whether this could be escalated to a network vulnerability, hard to say -- if an attacker over the net can figure out a way to induce particular file names on the server, that would be worse.

      It's a little sad that people are still forgetting about this failure mode of hash tables and hash functions; either there's got to be a randomizing secret swizzled in, or a better (more nearly cryptographically strong) hash function, or both.
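The "randomizing secret" idea can be sketched with a keyed hash from the Python standard library. This is an illustration only, not the fix applied to Btrfs: the function name `keyed_name_hash`, the 16-byte secret and the choice of BLAKE2b are all assumptions made for the example.

```python
import hashlib
import os


def keyed_name_hash(name, secret):
    # 64-bit keyed hash of a file name (BLAKE2b with an 8-byte digest).
    digest = hashlib.blake2b(name.encode(), digest_size=8, key=secret).digest()
    return int.from_bytes(digest, "big")


# Each table (or each boot) would pick its own secret.
secret_a = os.urandom(16)
secret_b = os.urandom(16)

h_a = keyed_name_hash("report.txt", secret_a)
h_b = keyed_name_hash("report.txt", secret_b)
```

Because the bucket a name lands in depends on a secret the attacker does not know, a list of colliding names cannot be precomputed offline, which is the property the comment is asking for.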

    • Indeed, the title makes you think that BTRFS was trojaned or worse is malware.
    • by Noughmad (1044096)

      An attack was found in the filesystem? What's that supposed to mean?

      I'm not sure, but it sure sounds like Mr. Reiser had something to do with it.

  • by Anonymous Coward
    Editors please! I normally expect even a submitter to know the difference between an attack and a vulnerability. However the editor damn well better know the difference. When I read that an ATTACK had been found in btrfs I went to read about how some malicious code had been placed into the code for btrfs. Maybe this code modified data, erases stuff, sends data to China, or just renames files. But no, this was a simple vulnerability. They didn't find an attack in btrfs, they found the potential for an attack
