Data Storage Stats Linux

What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage

An anonymous reader writes "Enterprise Storage Forum's long-awaited Linux file system Fsck testing is finally complete. Find out just how bad the Linux file system scaling problem really is."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday February 03, 2012 @12:47PM (#38917439)

    How fast a full fsck scan runs is the least of my concerns. What about how successful they are at actually recovering the filesystem?

    • by h4rr4r ( 612664 ) on Friday February 03, 2012 @12:52PM (#38917533)

      If you need to fsck you should already be restoring from backups onto another machine.

      • by rickb928 ( 945187 ) on Friday February 03, 2012 @12:59PM (#38917711) Homepage Journal

        More helpful advice from the Linux community. Thank you ever so much, once again right on point, timely, and effective.

        • by h4rr4r ( 612664 )

          No, just the truth from a real live sysadmin.

          If the question had been how effective a chkdsk was, I would have said the same thing.

          Grow up.

        • Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
          • I'm also still wondering why they tested how long it takes to check a filesystem which has no problems. Why didn't they just test how long it takes to replay the journal if that's all they wanted? They wouldn't have had to wait hours for ext's fsck to finish that way. :)

          • by Nutria ( 679911 )

            Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.

            That's a joke, right? 'Cause we *all* know that Really Bad Things *never* ever happen...

            • I do backups on my home array, and I have a mirror for StuffThatMatters (tm). I still run fsck & chkdsk on my Linux and Windows machines. More than once I have had them raise flags about a drive that had recoverable read errors. That means it is time to add a new drive to the mirror or a new JBOD disk and re-sync, or copy everything over to the new disk and unjoin or remove the failing disk. While downtime is not an issue for me at home, it is inconvenient. Having these tools run at night when I d

          • Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.

            And something really bad always happens. So please be careful with that word "should".

        • You are not helpful. In the real world fsck is an important determinant of filesystem robustness. In your career, your proverbial butt will be saved at least once by a good fsck, and you will be left twisting in the breeze at least twice because of a bad or absent fsck. Why twice? Because that is how many times it takes to send the message to someone unwilling to receive it.

      • But then again, you'll want to fsck from time to time to know if you have an issue.
        If you wait for the issue to surface on its own ("hey boss, we apparently lost half the db"), you'll lose more data in the window between the corruption happening and you noticing it than if you had detected it earlier.

        Thus being able to fsck in a decent amount of time matters.

        That's not the only thing, of course. Sometimes you don't have a backup. Sometimes things are fucked up. Sometimes you're just required to get the thing running before

        • by h4rr4r ( 612664 )

          I never said such a thing. If you are fscking, you are doubting your filesystem and therefore should already be restoring your backups. If you get lucky and everything is OK, all you lost was a little time; if not, you are ready to roll out the machine the backups went to.

      • Not really. First, there are problems that a filesystem check can repair without damaging the integrity of your data.

        More importantly, some filesystem/disk problems are transparent until you check for errors. Linux is usually set to do a fsck at regular intervals in order to detect errors that might otherwise go undetected. So, in short, you might not know that you need to restore from backups until you do a filesystem check.
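
        As a hedged illustration (the device name is a placeholder): on ext2/3/4 that periodic check is driven by counters in the superblock, which you can inspect and tune with tune2fs.

        tune2fs -l /dev/sdb1 | grep -Ei 'mount count|check'   # show current counters and check interval
        tune2fs -c 30 -i 180d /dev/sdb1                       # force a check every 30 mounts or 180 days, whichever comes first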

      • Yes, because every time I have an unclean shutdown, I sure want to be recovering from tape.

    • by grumbel ( 592662 )

      Yep, my last experience with fsck was after an HDD had gotten a few bad sectors. fsck on the ext3 file system let me recover the data all right, except of course for the filenames, so I ended up with a whole lot of unsorted and unnamed stuff in /lost+found, which wasn't very helpful. I'd really like to see more focus on how safe the filesystems are and less on how fast they are.

  • by drewstah ( 110889 ) on Friday February 03, 2012 @12:54PM (#38917591) Homepage

    When I had some EBS problems a couple of years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even with disks known to be bad in the md array. XFS is nice and fast, but I haven't seen xfs_check or xfs_repair do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised XFS filesystem seems to be totally unfixable. Is fsck for XFS new?

    • I set up an XFS volume a couple of years back. After copying a few files over NFS, it became corrupted. The XFS fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
      • by Sipper ( 462582 ) on Friday February 03, 2012 @02:37PM (#38919387)

        I set up an XFS volume a couple of years back. After copying a few files over NFS, it became corrupted. The XFS fsck did something -- it told me that it was so corrupted, it couldn't be fixed.

        I think you mean xfs_repair. On XFS, fsck is a no-op.

        I've never yet seen xfs_repair tell me there was an issue it couldn't fix -- that sounds unusual. However, there have been lots of changes to XFS in the Linux kernel in recent years, and occasionally there have been a few nasty bugs, some of which I ran into. Linux 2.6.19 in particular had some nasty XFS filesystem corruption bugs.
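
        A small sketch of that distinction, with a placeholder device name: fsck.xfs deliberately does nothing, so a consistency check goes through xfs_repair instead.

        fsck.xfs /dev/sdc1        # no-op by design; exits successfully without examining anything
        xfs_repair -n /dev/sdc1   # -n: check only, report problems, modify nothing (filesystem must be unmounted)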

    • by Sipper ( 462582 )

      When I had some EBS problems a couple of years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even with disks known to be bad in the md array. XFS is nice and fast, but I haven't seen xfs_check or xfs_repair do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised XFS filesystem seems to be totally unfixable. Is fsck for XFS new?

      It's not you; xfs_repair will only operate on a filesystem that is not mounted at all. In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind. Even with the -d option ("dangerous"), which implies that it will operate on a filesystem mounted read-only, xfs_repair will refuse and simply quit.

      However, once you do boot a LiveCD and run xfs_repair, it does actually repair an XFS filesystem. For obvious reasons this is critical to be able to do, because any i
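
      A minimal sketch of that LiveCD workflow, assuming the array shows up as /dev/md0 in the rescue environment:

      umount /dev/md0 2>/dev/null   # must not be mounted at all, not even read-only
      xfs_repair /dev/md0           # attempt the actual repair
      # If the log is dirty, xfs_repair will refuse and ask you to mount/unmount
      # once to replay it, or to zero the log with -L as a last resort.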

  • They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:

    #!/bin/sh
    exit 1
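
    For context, the arithmetic behind that "close to 100%" claim, using the commonly quoted consumer-drive figure of one unrecoverable read error per 1e14 bits read (enterprise drives are usually rated ten times better):

    awk 'BEGIN {
        bits = 70 * 1e12 * 8                           # one full read of a 70 TB filesystem, in bits
        printf "expected UREs per full pass: %.1f\n", bits / 1e14
    }'
    # prints about 5.6 -- several unrecoverable sectors per full scan if the
    # datasheet rate held exactly; healthy disks in practice usually do better.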

    • Except they were using RAID60.
      • by _LORAX_ ( 4790 )

        Unless every read does a checksum (they don't, or it would kill performance), there is still the possibility of silent read corruption. At 70TB it would be rare, but not as rare as many would think, and would depend on the sector size and checksum on the individual drives.

        • by isorox ( 205688 )

          Unless every read does a checksum (they don't, or it would kill performance), there is still the possibility of silent read corruption. At 70TB it would be rare, but not as rare as many would think, and would depend on the sector size and checksum on the individual drives.

          Ideally you'd have something like ZFS's scrubbing running in the background (a rough sketch follows below). Or keep it at the application level (the application stores metadata about the files, so it may as well throw in a checksum on create, then have a background checker), though a 1-bit error in an MPEG file isn't important.

          And when you're creating and destroying data at multi-gigabit speed, how do you perform backups?
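
          A minimal example of the scrubbing mentioned above, with "tank" as a placeholder pool name:

          zpool scrub tank        # walk every allocated block and verify its checksum in the background
          zpool status -v tank    # shows scrub progress and lists any files with unrecoverable errors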

        • Unless every read does a checksum (they don't, or it would kill performance)

          How does that relate to using the journal checksum option on ext4?

          • by _LORAX_ ( 4790 )

            Journal checksumming only protects the journal itself, not data once it has been written to the main storage area. It was added primarily to ensure that the atomicity of the journal is not violated by a partial write.
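
            For reference, a hedged sketch of how the option is turned on (device and mount point are placeholders); per the above, it covers journal transactions only, not data already on disk:

            mount -o journal_checksum /dev/sdb1 /mnt   # enable ext4 journal checksumming at mount time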

    • by _LORAX_ ( 4790 )

      After evaluating our options in the 50-200TB range with room for further growth, we ended up moving away from Linux and to an object-based storage platform with a pooled, snapshotted, and checksummed design. One of the major reasons for this was the URE problem: we would virtually be guaranteeing silent data corruption at that size with a filesystem that did not have internal checksums. The closest thing in the open-source world would be ZFS, whose openness is in serious doubt. It is scary how much trust the communi

    • by gweihir ( 88907 )

      They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:

      #!/bin/sh
      exit 1

      That is only true if you go by the minimum guarantees in the datasheets. In practice, with healthy disks, read errors are a lot less common.

      • by _LORAX_ ( 4790 )

        That is only true if you go by the minimum guarantees in the datasheets. In practice, with healthy disks, read errors are a lot less common.

        Are you willing to bet 70TB+ on it? Because that's what you are doing.

  • by Anonymous Coward

    This just in:

    Full filesystem scans take longer as the size of the filesystem increases.

    News at 11.

  • Damage? (Score:4, Funny)

    by eggstasy ( 458692 ) on Friday February 03, 2012 @01:02PM (#38917765) Journal

    Honey badger don't give a fsck.

  • by Anonymous Coward

    A single file system that big, without the integrity-checking features that file systems like ZFS or clustered file stores provide, seems insane to me.

  • A much better test of Linux "big data":

    1) write garbage to X blocks
    2) run fsck; if no errors are found, repeat step 1

    How long would it take before either of these filesystems noticed a problem, and how many corrupt files would you have? With a real filesystem you should be able to identify and/or correct the corruption before it takes out any real data.
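
    A rough, destructive sketch of that test against a throwaway loopback image rather than a real device (the filename, size, and block offset are arbitrary):

    dd if=/dev/zero of=test.img bs=1M count=1024       # 1 GiB scratch image
    mkfs.ext4 -q -F test.img
    # step 1: overwrite one 4 KiB block with garbage
    dd if=/dev/urandom of=test.img bs=4096 count=1 seek=12345 conv=notrunc
    # step 2: see whether fsck notices
    fsck.ext4 -fn test.img && echo "no errors reported -- corruption went undetected"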

  • Each pool is a LUN that is 3.6TB in size before formatting or actually 3,347,054,592 bytes as reported by "cat /proc/partitions".

    a file system with about 72TB using "df -h" or 76,982,232,064 bytes from "cat /proc/partitions"

    Yeah, I think there's definitely a scaling problem there.

    Or perhaps a reading comprehension problem, since /proc/partitions reports in blocks, not bytes, but either way it doesn't inspire any kind of confidence in the rest of their testing methodology.
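
    The numbers actually line up once you treat /proc/partitions as 1 KiB blocks; a quick check of the ~72TB figure they quote:

    awk 'BEGIN {
        blocks = 76982232064              # from /proc/partitions, in 1 KiB blocks
        bytes  = blocks * 1024
        printf "%.1f TB (decimal), %.1f TiB\n", bytes / 1e12, bytes / 2^40
    }'
    # prints roughly "78.8 TB (decimal), 71.7 TiB", consistent with the ~72T that df -h reports.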

  • by erice ( 13380 ) on Friday February 03, 2012 @03:55PM (#38920415) Homepage

    When an article about fsck has a tag line of "What's the damage", I expect to see some discussion of how fsck deals with a damaged file system.

    The time required to fsck a file system that doesn't need checking is less interesting and inconsistent with the title. Although if fsck had complained about the known-clean file system, that would have been interesting.

  • 1. Why did they put a label on the RAID devices? They should have just used /dev/sd[b-x] directly, and not confused the situation with a partition table.

    2. Did they align the partitions they used to the RAID block size? They don't indicate this. If they used the default DOS disk label strategy of starting /dev/sdb1 at block 63, then their filesystem blocks were misaligned with their 128 KiB RAID block size, and one in every 32 filesystem blocks will span two disks (assuming 4 KiB filesystem blocks). A quick way to check is sketched after this list.

    3. Why d
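
    The check promised in point 2, assuming a 128 KiB RAID chunk (256 sectors of 512 bytes) and a placeholder device:

    fdisk -l -u /dev/sdb | grep '^/dev/sdb1'   # print the partition's start sector
    # start sector 63   -> 63 % 256 != 0   -> misaligned with the RAID chunk
    # start sector 2048 -> 2048 % 256 == 0 -> aligned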

  • I am not sure it has much impact, but why would you use a five-year-old Linux kernel to perform the test? Maturity is all very nice, but if you are pushing technology, it is not always the best approach.
