What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage

Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage 196

Posted by timothy on Friday February 03, 2012 @01:40PM from the disks-groaning-with-shame dept.

An anonymous reader writes "Enterprise Storage Forum's long-awaited Linux file system Fsck testing is finally complete. Find out just how bad the Linux file system scaling problem really is."

This discussion has been archived. No new comments can be posted.

What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage

Load All Comments

Search 196 Comments Log In/Create an Account

Comments Filter:

fsck speed, want safety (Score:3, Insightful)

by Anonymous Coward writes: on Friday February 03, 2012 @01:47PM (#38917439)

How fast a full fsck scan is is my last concern. What about how successful they are at recovering the filesystem?

Share
twitter facebook
- Re:fsck speed, want safety (Score:5, Insightful)
  
  by h4rr4r ( 612664 ) writes: on Friday February 03, 2012 @01:52PM (#38917533)
  
  If you need to fsck you should already be restoring from backups onto another machine.
  
  Parent Share
  twitter facebook
  - Re:fsck speed, want safety (Score:5, Insightful)
    
    by rickb928 ( 945187 ) writes: on Friday February 03, 2012 @01:59PM (#38917711) Homepage Journal
    
    More helpful advice from the Linux community. Thank you ever so much, once again right on point, timely, and effective.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by h4rr4r ( 612664 ) writes:
      
      No, just the truth from a real live sysadmin.
      If the question had been how effective a chkdsk was I would have said the same thing.
      Grow up.
      - Re:fsck speed, want safety (Score:4, Interesting)
        
        by h4rr4r ( 612664 ) writes: on Friday February 03, 2012 @02:15PM (#38918073)
        
        Because sometimes it does work. Relying on any such software is stupid.
        While the FSCK/CHKDSK runs you restore onto another machine. This way if the check finishes first, you can use it until you can switch over to the restored machine. It also can save your ass if you are not smart enough/fortunate enough to have good backups.
        
        Parent Share
        twitter facebook
        
        Re:fsck speed, want safety (Score:5, Interesting)
        
        by hackstraw ( 262471 ) writes: on Friday February 03, 2012 @02:50PM (#38918699)
        
        The largest filesystem I admin is just shy of 1/2 petabyte. And its one in number. Backing up everything on that filesystem is simply not feasible. To put it in perspective 1 stream @ 200 MiB/s would take almost 28 days to backup the whole thing. I would imagine a restore would take about the same order. Telling hundreds of users their files are unavailable for reading or writing for 30 days is not really an option, so I run fsck.
        Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3)
        
        by h4rr4r ( 612664 ) writes:
        
        You need to be writing that data to two or more, more really, filesystems at the same time. Streaming replication.
        Redundancy can be backups, if they are in different locations and proper versioning is used.
        
        Re:fsck speed, want safety (Score:5, Insightful)
        
        by chuckymonkey ( 1059244 ) writes: <charles DOT d DO ... AT gmail DOT com> on Friday February 03, 2012 @03:05PM (#38918939) Journal
        
        You're fairly wrong there, you can actually back that much data up. You just have to be willing to pay for some seriously large tape libraries and they're not cheap. We're in the process of installing a 700TB array with a 1.5PB tape library backup. You just have to do the backups using filesystem snapshots and run them pretty much constantly.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3)
        
        by ion++ ( 134665 ) writes:
        
        We're in the process of installing a 700TB array with a 1.5PB tape library backup. You just have to do the backups using filesystem snapshots and run them pretty much constantly.
        And XFS is pretty brilliant for taking filesystem snapshots. Using the command xfs_freeze you can make good snapshots of XFS in what appears to have no downtime at all see XFS manpage like http://linux.die.net/man/8/xfs_freeze [die.net]
        And then run these commands:
        xfs_freeze -f /mount/point && block_level_snapshot && xfs_freeze -u /mount/point
        Last time I checked that did not work with EXT4.
        
        Re: (Score:2)
        
        by siDDis ( 961791 ) writes:
        
        I work with IPTV and VOD, we have 4 PB of data running on FreeBSD and ZFS which is being replicated off site with the send && receive features that comes with ZFS. Since we mostly deal with large media files we have even reversed the replication direction. That means that if master storage needs to go down for maintenance, the other offsite storage becomes the master. At the moment we're looking into using HAST which will make it even easier to switch what storage site that should be the master.
        
        Re:fsck speed, want safety (Score:4, Insightful)
        
        by h4rr4r ( 612664 ) writes: on Friday February 03, 2012 @05:12PM (#38920595)
        
        Most people are worried more about cost then reliability.
        Most people is often a category that does not do things the best way or the right way.
        
        Parent Share
        twitter facebook
        
        Re:fsck speed, want safety (Score:4, Insightful)
        
        by chuckymonkey ( 1059244 ) writes: <charles DOT d DO ... AT gmail DOT com> on Friday February 03, 2012 @05:23PM (#38920745) Journal
        
        I know I'm posting to an AC here, but I want to point something out. "Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature." He was claiming that it's not feasible to back up more than 20+ TB of storage when in fact it is. I was pointing out that yes you can, but it's pretty expensive.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by Nutria ( 679911 ) writes:
        
        When did sys admins forget how to install multiple tape dives into a computer?
        
        Re: (Score:3, Insightful)
        
        by phorm ( 591458 ) writes:
        
        If you're in a scenario where "Backups are not really an option", somebody is doing something wrong...
        How long did it take you to get to 0.5PB? If you use a differential backup/sync, then you should generally only need to copy *NEW* data, and the old stuff will already be there.
        
        Re: (Score:2)
        
        by Grishnakh ( 216268 ) writes:
        
        Besides the other responses to this comment, is it not possible to break up that 500TB into smaller, more manageable volumes? Does it all really need to be a single volume? Why not have a bunch of smaller volumes, each mounted to the filesystem? Then, either backing up or fscking any one volume wouldn't take that long (you could have multiple tape drives, each backing up a different volume simultaneously), and if something goes wrong with one volume the other volumes will still be ok.
        
        Re: (Score:3)
        
        by Aighearach ( 97333 ) writes:
        
        Databases.
        
        Re: (Score:2)
        
        by Grishnakh ( 216268 ) writes:
        
        Right, but even so, unless you just have one giant-ass database, if you have multiple data sets, you should be able to organize them into separate databases and put the different databases on separate volumes.
        500TB sounds like one hell of a database if that's just a single one.
        
        Re: (Score:2)
        
        by Aighearach ( 97333 ) writes:
        
        That's what NASA uses XFS for...
        They have 2 of them.
        
        Re: (Score:2)
        
        by jedidiah ( 1196 ) writes:
        
        I was working with 50TB databases 10 years ago. They have to be up to ungodly sizes now.
        
        Re: (Score:2, Interesting)
        
        by Anonymous Coward writes:
        
        So you have 1/2 petabyte storage but 200 MiB/s speed -- are you kidding me ? Is your storage controller broken or really cheap or both ?
        Also, xfsdump (which is used to backup xfs) can do multi-threaded backups.
        Now to comment on the test -- it is completely insane. As mentioned by you and others, if you are running fsck while your whole application is down -- thing broken is not system but the thing inside the skull -- you will obviously need a very fast backup/restore and/or a HA solution, both are not (an
        
        Re:fsck speed, want safety (Score:5, Interesting)
        
        by tlhIngan ( 30335 ) writes: <[ten.frow] [ta] [todhsals]> on Friday February 03, 2012 @04:23PM (#38920029)
        
        The largest filesystem I admin is just shy of 1/2 petabyte. And its one in number. Backing up everything on that filesystem is simply not feasible. To put it in perspective 1 stream @ 200 MiB/s would take almost 28 days to backup the whole thing. I would imagine a restore would take about the same order. Telling hundreds of users their files are unavailable for reading or writing for 30 days is not really an option, so I run fsck.
        Which means You're Doing It Wrong(tm).
        Two words: volume snapshot.
        What it does is give you a view of the filesystem as it exists at that the time the snapshot is taken. The frozen image is mounted in another mountpoint (read-only), while the snapshotted voume is still accessible (read-write). Changes to the volume since the snapshot was taken won't be in the snapshot (obviously).
        Your backup points to that snapshot which won't change and that's copied to tape. Once you're done backing up 30 days later, you delete the snapshot.
        Since your backup takes so long, you'd immediately then make another snapshot and being the backup again.
        If it's a database, the database backup tools work on a database snapshot - it will be correct and consistent as of when the snapshot was taken while the database remains available for reading and writing outside of the snapshot.
        Having to take a system down to back it up is a dead concept on modern OSes as they all tend to have snapshot capability.
        
        Parent Share
        twitter facebook
        
        Re: (Score:2)
        
        by gknoy ( 899301 ) writes:
        
        How do you restore it, in case of failure?
        
        Re:fsck speed, want safety (Score:5, Informative)
        
        by _LORAX_ ( 4790 ) writes: on Friday February 03, 2012 @04:25PM (#38920059) Homepage
        
        Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.
        200TB/130TB usable clustered/distributed system with 4x LTO5 drives and we do a full snapshot to tape every week. With data that size you either pay up-front for proper engineering or you pay for the life of the system for poor performance and eventual cleanup of the mess.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3)
        
        by Nutria ( 679911 ) writes:
        
        you restore onto another machine
        ROTFLMAO,
        We struggle to even get test machines; there's no way that "they" would pay for all that kit to just sit there gathering dust waiting for a disaster. If anything, it would be our DR machine and we'd instantly flip production over to it.
        
        Re: (Score:3)
        
        by pankkake ( 877909 ) writes:
        
        Now that's just plain dishonesty.
        It's not because it's not useful most of the time that it is useless or not to be used.
        Some filesystems are not atomic or can be mounted with non-atomic options.
        Data corruption occurs.
        It's simply useful to test if the filesystem is all right. At least for developers.
        Doesn't change the fact that you can't rely on fsck to *recover* data.
        
        Re: (Score:3)
        
        by h4rr4r ( 612664 ) writes:
        
        No, its primary job is to tell you about integrity of the filesystem. Any attempt at fixing it is secondary.
        
        Re: (Score:3)
        
        by Dishevel ( 1105119 ) writes:
        
        Yup.
        Every week I switch over my systems (master/slave arrangement) and take the old master down and fsck.
        Making sure all is well. Sometimes there is a small issue. It fixes it. All is well.
        So far I have never had catastrophe. Where I loose all data on the Master while my slave is down hard.
        Going to a tape back up even a day old is going to be bad news.
    - Re: (Score:3)
      
      by pankkake ( 877909 ) writes:
      
      Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
      - Re: (Score:2)
        
        by cloudmaster ( 10662 ) writes:
        
        I'm also still wondering why they tested how long it takes to check a filesystem which has no problems. Why didn't they just test how long it takes to replay the journal if that's all they wanted? They wouldn't have had to wait hours for ext's fsck to finish that way. :)
      - Re: (Score:2)
        
        by Nutria ( 679911 ) writes:
        
        Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
        That's a joke, right? 'Cause we *all* know that Really Bad Things *never* ever happen...
        
        Re: (Score:2)
        
        by networkBoy ( 774728 ) writes:
        
        I do backups on my home array, and I have a mirror for StuffThatMatters (tm). I still run fsck & chkdisk on my linux and windows machines. More than once I have had them raise flags about a drive that had recoverable read errors. This means it is time to add a new drive to the mirror or a new JBOD disk and re-sync or copy everything over to the new disk and unjoin ore remove the failing disk. While downtime is not an issue for me at home, it is inconvenient. Having these tools run at night when I d
      - Re: (Score:2)
        
        by Daniel Phillips ( 238627 ) writes:
        
        Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
        And something really bad always happens. So please be careful with that word "should".
    - Re: (Score:3)
      
      by Daniel Phillips ( 238627 ) writes:
      
      You are not helpful. In the real world fsck is an important determinant of filesystem robustness. In your career, your proverbial butt will be saved at least once by a good fsck, and you will be left twisting in the breeze at least twice because of a bad or absent fsck. Why twice? Because that is how many times it takes to send the message to someone unwilling to receive it.
  - Re: (Score:3)
    
    by kangsterizer ( 1698322 ) writes:
    
    But then again, you'll want to fsck from time to time to know if you have an issue.
    If you're waiting for the issue to appear "hey boss we apparently lost half the db" you'll lose more data during the time the corruption happens and you're not aware of it, than if you detected it earlier.
    Thus being able to fsck in a decent amount of time matters.
    Thats not the only thing of course. Sometimes you don't have a backup. Sometimes things are fucked up. Sometimes you're just required to get the thing running before
    - Re: (Score:2)
      
      by h4rr4r ( 612664 ) writes:
      
      I never said such a thing. If you are fscking you are doubting your filesystem and there fore should already be restoring your backups. If you get lucky and everything is ok all you lost was a little time, if not you are ready to roll out the machine the backups went too.
  - Re: (Score:2)
    
    by nine-times ( 778537 ) writes:
    
    Not really. First, there are problems that a filesystem check can repair without damaging the integrity of your data.
    More importantly, some filesystem/disk problems are transparent until you check for errors. Linux is usually set to do a fsck at regular intervals in order to detect errors that might otherwise go undetected. So, in short, you might not know that you need to restore from backups until you do a filesystem check.
  - Re: (Score:2)
    
    by PhunkySchtuff ( 208108 ) writes:
    
    Yes, because every time I have an unclean shutdown, I sure want to be recovering from tape.
  - - Re: (Score:2)
      
      by h4rr4r ( 612664 ) writes:
      
      I am not retarded, that is why I have an UPS and I even know how to use tune2fs.
    - - Re: (Score:3)
        
        by HiThere ( 15173 ) writes:
        
        The last time I checked, the system required that fsck be run after a power loss. Also after the first reboot aften n days had passed. (I think n is somewhere around 200, but I haven't been interested enough to pin it down precisely.) And occasionally a system upgrade will require a reboot.
        OTOH, recovery is definitely a lot faster than it used to be, thanks to journaling.
        OTTH, all of my parftitions together are barely over 1TB, so this is only significant (to me) for future systems, when this will have c
        
        Re: (Score:2)
        
        by fnj ( 64210 ) writes:
        
        You must not have checked recently. Both ext3 and ext4 just recover by replaying the journal after a power loss. No fsck is "required".
        I have approximately 36 TB on my active systems. This stuff is significant to me right now.
  - - Re: (Score:2)
      
      by h4rr4r ( 612664 ) writes:
      
      Seconds?
      When you have that much data and you need high reliability you are doing streaming replication to multiple devices and layering other backup methods as well.
      Any idea what the cost of just trusting that the FSCK fixed the problems on 72TB of data your business needs could be?
  - - Re:fsck speed, want safety (Score:5, Funny)
      
      by darkpixel2k ( 623900 ) writes: on Friday February 03, 2012 @02:36PM (#38918453)
      
      when I need to fsck, I just call my girlfriend
      Why? Do you not know how to use the command line?
      
      Parent Share
      twitter facebook
    - Re:fsck speed, want safety (Score:5, Funny)
      
      by lvxferre ( 2470098 ) writes: on Friday February 03, 2012 @04:42PM (#38920259)
      
      Protip: if 'make love' returns no target, you need to do the job by hand.
      
      Parent Share
      twitter facebook
- Re: (Score:3)
  
  by grumbel ( 592662 ) writes:
  
  Yep, my last experience with fsck was after a HDD has gotten a few bad sectors. fsck on the ext3 file system let me recover the data alright, except of course for the filenames, thus I ended up with a whole lot of unsorted and unnamed stuff in /lost+found, which wasn't very helpful. I'd really like to see more focus on how secure the filesystems are and less on how fast they are.
- - Re: (Score:2)
    
    by h4rr4r ( 612664 ) writes:
    
    Basically that.
    They don't want you to know how little they know. If they used the same name over and over that pattern would be visible.
    Or maybe, yeah probably, ALIENS!
    - - Re: (Score:2)
        
        by Grishnakh ( 216268 ) writes:
        
        There's no way aliens are remotely as stupid as humans, at least not if they've managed to travel to earth. We're still arguing whether we should bother going back to our nearby moon, and mostly saying we'd rather just sit around and play with piles of green paper and play Angry Birds; any culture advanced enough to travel to a different star system would be far more intelligent than us.
fsck xfs does something? (Score:3, Interesting)

by drewstah ( 110889 ) writes: on Friday February 03, 2012 @01:54PM (#38917591) Homepage

When I had some EBS problems a couple years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even if there were disks known to be bad in the md array. xfs is nice and fast, but I haven't seen the xfs_check or xfs_repair to do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised xfs filesystem seems to be totally unfixable. Is fsck for xfs new?

Share
twitter facebook
- Re: (Score:2)
  
  by larry bagina ( 561269 ) writes:
  
  I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. the xfs fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
  - Re:fsck xfs does something? (Score:4, Informative)
    
    by Sipper ( 462582 ) writes: on Friday February 03, 2012 @03:37PM (#38919387)
    
    I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. the xfs fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
    I think you mean xfs_repair. On XFS, fsck is a no-op.
    I've never yet seen xfs_repair tell me there was an issue it couldn't fix -- that sounds unusual. However there have been lots of changes to XFS in the Linux kernel in recent years, and occasionally there has been a few nasty bugs, some of which I ran into. Linux-2.6.19 in particular had some nasty XFS filesystem corruption bugs.
    
    Parent Share
    twitter facebook
- Re: (Score:2)
  
  by Sipper ( 462582 ) writes:
  
  When I had some EBS problems a couple years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even if there were disks known to be bad in the md array. xfs is nice and fast, but I haven't seen the xfs_check or xfs_repair to do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised xfs filesystem seems to be totally unfixable. Is fsck for xfs new?
  It's not you; xfs_repair will only operate on a filesystem that is not mounted at all. In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind. Even when using the -d option for "dangerous" which implies that it will operate on a filesystem mounted read-only, xfs_repair will refuse and simply quit.
  However once you do boot a LiveCD and run xfs_repair, it does actaully repair an XFS filesystem. For obvious reasons this is critical to be able to do, because any i
  - - Re: (Score:2)
      
      by Sipper ( 462582 ) writes:
      
      In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind.
      No you don't. Either force unmount the filesystem, or if you deem that too dangerous, boot into single user mode.
      ? I use XFS for the root filesystem. Tell me how I can completely umount it and then run xfs_repair -- which has to be read from the same filesystem I just umounted.
      Stop trying to oversimply things you don't understand.
      - Re: (Score:2)
        
        by AndroSyn ( 89960 ) writes:
        
        You have your root filesystem mounted read only and then run xfs_repair on it. Sometimes getting your root filesystem remounted read-only can be tricky, however. Sometimes this requires passing init=/bin/sh to the kernel, so you start with no other processes running. However you go about getting your root filesystem mounted read only, after you run xfs_repair(or e2fsck for that matter really) you reboot immediately.
        Stop trying to oversimply things you don't understand.
        Perhaps you don't understand things
        
        Re: (Score:3)
        
        by Sipper ( 462582 ) writes:
        
        You have your root filesystem mounted read only and then run xfs_repair on it. Sometimes getting your root filesystem remounted read-only can be tricky, however. Sometimes this requires passing init=/bin/sh to the kernel, so you start with no other processes running. However you go about getting your root filesystem mounted read only, after you run xfs_repair(or e2fsck for that matter really) you reboot immediately.
        Just tested it [on the box in which I'm using XFS on top of LUKS encyryption], and I didn't like the results.
        grub2 by default on Debian makes a "recovery" boot option to boot into single user mode, but even with this as you mention it's required to modify the boot option and add init=/bin/sh in order to actually be able to mount the root filesystem read-only. However after finally succeeding in diong this, xfs_check reports about a full screen of errors concerning file and directory link counts, which all
      - Re: (Score:2)
        
        by Sipper ( 462582 ) writes:
        
        Check it:
        [root@host ~]# mount /dev/sdb5 /mnt/test
        [root@host ~]# mount -o remount,ro /dev/sdb5
        [root@host ~]# xfs_repair -d /dev/sdb5
        Phase 1 - find and verify superblock...
        Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps... ....
        so boot to single user mode with a readonly root, grab the console, and off you go, same as, say, ext3.
        Try it with LUKS encryption on top. See a further reply from me to another commenter where I detail what happens when I try it. The results are not like the above.
Why bother? (Score:2)

by Waffle Iron ( 339739 ) writes:

They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:
#!/bin/sh exit 1
- Re: (Score:2)
  
  by pankkake ( 877909 ) writes:
  
  Except they were using RAID60.
  - Re: (Score:2)
    
    by _LORAX_ ( 4790 ) writes:
    
    Unless every read does a checksum ( they don't or it would kill performance ) then there is still the possibility of a silent read corruption. At 70TB it would be rare, but not as rare as many would think and would depend on the sector size and checksum on the individual drives.
    - Re: (Score:2)
      
      by isorox ( 205688 ) writes:
      
      Unless every read does a checksum ( they don't or it would kill performance ) then there is still the possibility of a silent read corruption. At 70TB it would be rare, but not as rare as many would think and would depend on the sector size and checksum on the individual drives.
      Ideally you'd have something like zfs's scrubbing in the background. Or keep it in the application level (the application stores metadata about the files, may as wel throw in a checksum on create, then have a background checker), however a 1 bit error in an mpeg file isn't important.
      And when you're creating and destroying data at multi-gigabit speed, how do you perform backups?
    - Re: (Score:2)
      
      by colinrichardday ( 768814 ) writes:
      
      Unless every read does a checksum ( they don't or it would kill performance )
      How does that relate to using the journal checksum option on ext4?
      - Re: (Score:2)
        
        by _LORAX_ ( 4790 ) writes:
        
        jornal checksumming only prevents errors in the journal, not once the data has been written to the main storage area. This was done primarily to ensure the atomic nature of the journal is not violated by a partial write.
- Re: (Score:3)
  
  by _LORAX_ ( 4790 ) writes:
  
  After evaluating our options in the 50-200TB range with room for further growth we ended up moving away from linux and to an object based storage platform with a pooled, snapshotted, and checksummed design. One of the major reasons for this was the URE problem, we would virtually be guaranteeing silent data corruption at that size with a filesystem that did not have internal checksums. The closest thing in the OS world would be ZFS whose openness is in serious doubt. It is scary how much trust the communi
  - - Re:Why bother? (Score:4, Interesting)
      
      by _LORAX_ ( 4790 ) writes: on Friday February 03, 2012 @02:47PM (#38918647) Homepage
      
      Our BTRFS evaluation resulted in rejecting it for some very serious problems ( what they claim are snapshots are actually clones, panic in low memory situations, no fsck, horrible support tools, developers who are hostile to criticism, pre-release software, ... ). ZFS was nice, but limited to non-distributed systems and still had a non-trivial amount of volume and backend management headaches. Personally I use ZFS for my personal servers at home ( incremental snapshots are the bomb ) but out production systems needed more.
      
      Parent Share
      twitter facebook
    - Re:Why bother? (Score:4, Interesting)
      
      by Guspaz ( 556486 ) writes: on Friday February 03, 2012 @03:41PM (#38919467)
      
      ZFS now runs pretty well on Linux too, as a kernel module, thanks to zfsonlinux. If you're running a Debian-based distro, installing it is trivial (one command to add the PPA, one command to install the package).
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by Electricity Likes Me ( 1098643 ) writes:
        
        ZFSonLinux's speed isn't really up there yet on Linux.
        It might only be a home system, but that to my mind makes it more important: I'd like to spend as little time maintaining as possible and that means I want to saturate my gigabit NIC when I need to.
    - Re: (Score:2)
      
      by ratsg ( 544275 ) writes:
      
      and ZFS is available to Mac OS X systems as an add on. Both opensource, and as of this week, a commercial version is available.
      There is very little reason to be running a system with out ZFS, unless you are running AIX, HP-UX or IRIX.
  - - Re: (Score:2)
      
      by Nick Ives ( 317 ) writes:
      
      Unrecoverable read error. It was mentioned in the OP.
      If you have a 200TB hard disk array then it's certain that you will encounter data corruption.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:
  #!/bin/sh exit 1
  That is only when you use the minimal guarantees from the datasheets. In practice, with healthy disks, read errors are a lot less common.
  - Re: (Score:2)
    
    by _LORAX_ ( 4790 ) writes:
    
    That is only when you use the minimal guarantees from the datasheets. In practice, with healthy disks, read errors are a lot less common.
    Are you willing to bet 70TB+ on it, because that's what you are doing.
- - Re: (Score:2)
    
    by _LORAX_ ( 4790 ) writes:
    
    Well since anything over 100TB is not supported by the vendor I would say not really a great idea. The reason it's not supported is there is no reasonable way to maintain ( things like an error would result in days worth of outages to fsck and/or restore from backup ).
Breaking News! (Score:2, Funny)

by Anonymous Coward writes:

This just in:
Full filesystem scans take longer as the size of the filesystem increases.
News at 11.
Damage? (Score:4, Funny)

by eggstasy ( 458692 ) writes: on Friday February 03, 2012 @02:02PM (#38917765) Journal

Honey badger don't give a fsck.

Share
twitter facebook
Who would engineer a storage system like that? (Score:2, Insightful)

by Anonymous Coward writes:

A single file system that big without checking features that file systems like ZFS or clustering file stores provide seems insane to me.
- - Re: (Score:2)
    
    by Mysticalfruit ( 533341 ) writes:
    
    That's the same reason as well. I've got a 4TB san at home and I'm using ZFS on linux (kernel modules, not fuse) to manage it. Certain parts of it are also backed up other places, but I run a zfs scrub on it once a week. One reason I chose ZFS over ext4 was that I wanted to be able to add disks and grow the filesystems as painlessly as possible. Since the disks are hanging off a mediocre onboard controller, the idea of having to fsck 4TB in the case of power outage / crash seemed craptastic. So far I'v
poor test.. bad results (Score:2)

by _LORAX_ ( 4790 ) writes:

A much better test of linux "big data"
1) write garbage to X blocks
2) run fsck if no errors found, repeat step 1
How long would it take before either of these filesystems noticed a problem and how many corrupt files do you have? With a real filesystem you should be able to identify and/or correct the data before it takes out any real data.
print article in one page is here : (Score:2)

by godrik ( 1287354 ) writes:

http://www.enterprisestorageforum.com/print/storage-hardware/linux-file-system-fsck-testing----the-results-are-in.html [enterprise...eforum.com]
going through 3 pages is so annoying...
Just how bad is it? (Score:2)

by Minwee ( 522556 ) writes:

Each pool is a LUN that is 3.6TB in size before formatting or actually 3,347,054,592 bytes as reported by "cat /proc/partitions".
a file system with about 72TB using "df -h" or 76,982,232,064 bytes from "cat /proc/partitions"
Yeah, I think there's definitely a scaling problem there.
Or perhaps a reading comprehension problem, since /proc/partitions reports in blocks, not bytes, but either way it doesn't inspire any kind of confidence in the rest of their testing methodology.
Damage? (Score:3)

by erice ( 13380 ) writes: on Friday February 03, 2012 @04:55PM (#38920415) Homepage

When an article about fsck has a tag line of "What's the damage", I expect to see some discussion of how fsck deals with a damaged file system.
The time required to fsck a file system that doesn't need checking is less interesting and inconsistant with the title. Although, if fsck had complained about the known clean file system that would be interesting.

Share
twitter facebook
On the face of it, this is poorly done (Score:2)

by Antibozo ( 410516 ) writes:

1. Why did they put a label on the RAID devices? They should have just used /dev/sd[b-x] directly, and not confused the situation with a partition table.
2. Did they align the partitions they used to the RAID block size? They don't indicate this. If they used the default DOS disk label strategy of starting /dev/sdb1 at block 63, then their filesystem blocks were misaligned with their 128 kiB RAID block size, and one in every 32 filesystem blocks will span two disks (assuming 4 kiB filesystem blocks).
3. Why d
Obsolete? (Score:2)

by thsths ( 31372 ) writes:

I am not sure it has much impact, but why would you use a 5 year old linux kernel to perform the test? Maturity is all very nice, but if you are pushing technology, it is not always the best approach.
- Re: (Score:2)
  
  by hobarrera ( 2008506 ) writes:
  
  I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
  - Re: (Score:3)
    
    by hawguy ( 1600213 ) writes:
    
    I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
    Few average Joe's have 72TB of disk space, and even for those that do, they're probably ok with 30 - 60 minutes of FSCK time. And more likely, instead of 100's of millions of files, they probably have a few million, so their fsck time will be in the 3 - 15 minute time range.
    I've seen servers that take over 3 minutes for their POST check.
    - Re: (Score:2)
      
      by jedidiah ( 1196 ) writes:
      
      So is this about big filesystems or lots of tiny files?
      'cause they are not the same thing.
      How many files is a lot? 300K? 10M? 100M?
    - - Re: (Score:2)
        
        by cryptographrix ( 572005 ) * writes:
        
        ...until you have a drive die during a scrub, destroy a zfs filesystem in a deduplicating zpool, or any other number of things that makes ZFS **ANGRY**, that is. and despite all that, I still trust it more than any most linux filesystems.
      - Re:linux is fail (Score:4, Funny)
        
        by Saxophonist ( 937341 ) writes: on Friday February 03, 2012 @05:09PM (#38920575)
        
        No, you're thinking of ReiserFS.
        
        Parent Share
        twitter facebook
- Re: (Score:2)
  
  by sunderland56 ( 621843 ) writes:
  
  OK, so I have a large x86/64 server and want to follow your advice. Can you please tell me where you can get AIX, or HP-UX, to run on X86?
  - Re: (Score:2)
    
    by evol262 ( 721773 ) writes:
    
    I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.
    - Re:linux is fail (Score:4, Funny)
      
      by ifrag ( 984323 ) writes: on Friday February 03, 2012 @03:15PM (#38919069)
      
      I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.
      I also like how GP completely ignored Solaris. I just like the fact it is being ignored.
      
      Parent Share
      twitter facebook
- Re:linux is fail (Score:5, Insightful)
  
  by gweihir ( 88907 ) writes: on Friday February 03, 2012 @02:36PM (#38918467)
  
  A cranky coward from the shadows is not s reliable source of information.
  I have used AIX and Solaris, and I can say that a lot of stuff is easier on Linux.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by bolthole ( 122186 ) writes:
    
    What "stuff"?
    Give actual, useful comparisons.
    Otherwise, your comment can be reduced to,
    "I am most familiar with linux. Therefore, using linux is easier for me"
    - Re: (Score:2)
      
      by Aighearach ( 97333 ) writes:
      
      IBM said please don't use AIX, use Linux instead. That was like... 10+ years ago.
  - Re:linux is fail (Score:5, Interesting)
    
    by jd ( 1658 ) writes: <imipak@yahoGINSBERGo.com minus poet> on Friday February 03, 2012 @05:58PM (#38921161) Homepage Journal
    
    A lot of stuff is also faster on Linux, particularly on the x86. Solaris x86 is dog slow. AIX ("aches") is an appropriate name for a mainframe OS that never really got the hang of this new-fangled "interactive user" stuff. It's a good mainframe OS, that is what it is designed for, tuned for and intended for, but traditional mainframe batch transactional work isn't the sort of payload that is typically run these days. The high-end users want hard real-time (i.e.: they know to the microsecond - or nanosecond, in some cases - exactly when each process will start and stop) for data collection, data analysis and simulation. The data centers want massive multithreading for gigantic servers with minimal overhead and service guarantees per thread. The typical user wants extremely low latency interactive. None of these are pre-scripted batch jobs.
    Now, if you wanted to develop a data warehouse for, say, technical writings, journalism, etc, where you're compiling a collection of things that can be typeset overnight, that may be doable as a batch job. However, anyone planning on publishing a journal that needs 72 terabytes of storage had best consider the marketplace a little more closely first. A publishing company, say Nature, might conceivably have use for AIX for batch work. I could see the number of submissions, referee responses and article selections per journal being such that a mainframe would be a perfectly valid way to do things. Even then, it might still be sufficiently small that a live transactional database would be more cost-effective.
    Traditionally, batch processing has been a niche market for electrical and gas companies, etc, where the number of customers is staggering. Even then, it has largely been replaced with live transactional systems because customers want things adjusted NOW and not overnight or at the end of the week.
    Mass mailers still use batch processing, but printing is the bottleneck and there is no point in having an expensive OS process everything in a fraction of a second on an expensive mainframe when it takes N actual real-world seconds before a printer becomes available to take the next block of data. You need run no faster than the slowest component because the end produce won't be delivered any faster. You would have to have a gigantic number of printers before the OS became a significant factor and most shops just don't have that kind of printing power.
    
    Parent Share
    twitter facebook
- Re:linux is fail (Score:4, Informative)
  
  by aix tom ( 902140 ) writes: on Friday February 03, 2012 @04:16PM (#38919935)
  
  You see my nick?
  AIX sucks more than Linux.
  Usual process for "weird"* AIX Problems:
  1) weird problem occurs after install. You report problem to IBM.
  2) IBM asks for your software version, see they are the newest ones available, and say they look into it.
  3) You ask several month later if they did find anything. They ask for your software version, they ask you to upgrade and see if the problem goes away.
  4) You upgrade to newest version.
  5) go to 2)
  *There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by jimicus ( 737525 ) writes:
    
    There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
    I really wouldn't paint Linux support in such rosy terms. Many forums are heading in the direction of the blind leading the blind; application-specific mailing lists and IRC channels, while improving, still have a slight tendency to say "RTFM n00b!". (Or, as happened to me, "Can't be done. It's a stupid demand anyway. Fuck off" - twenty minutes later I figured out how to do it on my own, so it evidently could be done...)
- - - Re:linux is fail (Score:5, Funny)
      
      by Anonymous Coward writes: on Friday February 03, 2012 @02:38PM (#38918513)
      
      sudo kill yourself
      ;-)
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by impaledsunset ( 1337701 ) writes:
        
        kill -9 $$ # does the job pretty well
        
        Re: (Score:3)
        
        by fnj ( 64210 ) writes:
        
        killall Anonymous\ Coward
        
        Re:linux is fail (Score:4, Interesting)
        
        by jd ( 1658 ) writes: <imipak@yahoGINSBERGo.com minus poet> on Friday February 03, 2012 @05:32PM (#38920875) Homepage Journal
        
        Works best if you use the "Doom as Sys Admin" hack [sourceforge.net].
        
        Parent Share
        twitter facebook
    - - Re: (Score:3, Funny)
        
        by lvxferre ( 2470098 ) writes:
        
        Why would you replace a zero-ed string with another? At least use /dev/random, bro.
- Re: (Score:3)
  
  by Gazzonyx ( 982402 ) writes:
  
  They were using 15K RPM SAS drives. Your 7200 RPM drives aren't going to touch the speed of 15K RPM drives on a SAS backplane. Not by a long shot.
- Re: (Score:2)
  
  by Guspaz ( 556486 ) writes:
  
  Isn't that the point of using a filesystem that can do online scrubs, like ZFS? As far as I know, ZFS also checks metadata when scrubbing.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

fsck speed, want safety (Score:3, Insightful)

Re:fsck speed, want safety (Score:5, Insightful)

Re:fsck speed, want safety (Score:5, Insightful)

Re: (Score:2)

Re:fsck speed, want safety (Score:4, Interesting)

Re:fsck speed, want safety (Score:5, Interesting)

Re: (Score:3)

Re:fsck speed, want safety (Score:5, Insightful)

Re: (Score:3)

Re: (Score:2)

Re:fsck speed, want safety (Score:4, Insightful)

Re:fsck speed, want safety (Score:4, Insightful)

Re: (Score:2)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Interesting)

Re:fsck speed, want safety (Score:5, Interesting)

Re: (Score:2)

Re:fsck speed, want safety (Score:5, Informative)

Re: (Score:3)

Re: (Score:3)

Re: (Score:3)

Re: (Score:3)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re:fsck speed, want safety (Score:5, Funny)

Re:fsck speed, want safety (Score:5, Funny)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

fsck xfs does something? (Score:3, Interesting)

Re: (Score:2)

Re:fsck xfs does something? (Score:4, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Why bother? (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re:Why bother? (Score:4, Interesting)

Re:Why bother? (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Breaking News! (Score:2, Funny)

Damage? (Score:4, Funny)

Who would engineer a storage system like that? (Score:2, Insightful)

Re: (Score:2)

poor test.. bad results (Score:2)

print article in one page is here : (Score:2)

Just how bad is it? (Score:2)

Damage? (Score:3)

On the face of it, this is poorly done (Score:2)

Obsolete? (Score:2)

Re: (Score:2)