EXT4 Data Corruption Bug Hits Linux Kernel

Posted by Soulskill on Wednesday October 24, 2012 @03:22PM from the plenty-of-time-to-fix dept.

An anonymous reader writes "An EXT4 file-system data corruption issue has reached the stable Linux kernel. The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug described as an apparent serious progressive ext4 data corruption bug. Kernel developers have found and bisected the kernel issue but are still working on a proper fix for the stable Linux kernel. The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often."

This discussion has been archived. No new comments can be posted.

This is why I stick to Reiser (Score:5, Funny)

by Anonymous Coward writes: on Wednesday October 24, 2012 @03:33PM (#41755929)

I know he'd never do anything to harm me or my data.

- Re: (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  Or your wife?
- Re: (Score:2, Funny)
  
  by localhost8080 ( 819098 ) writes:
  
  yeah, reiser 4 has some killer features
- Re: (Score:2)
  
  by psm321 ( 450181 ) writes:
  
  I know you're making a joke about the person, but I've had many corruption issues with ReiserFS. Granted, this was in its earlier days, but after it had been declared stable for use. I gave up on it after the problems, so no idea if later versions improved.
I don't see the problem then... (Score:5, Funny)

by Zapotek ( 1032314 ) writes: <tasos...laskos@@@gmail...com> on Wednesday October 24, 2012 @03:34PM (#41755939)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
We're talking about Linux users here...move along.

- Re: (Score:2, Troll)
  
  by vistapwns ( 1103935 ) writes:
  
  What is it about Linux users' jokes that reminds me of the Iraqi Information Minister? ;)
- Re: (Score:2)
  
  by starless ( 60879 ) writes:
  
  Even though my linux desktop machine runs for long periods without needing rebooting, there are exceptions:
  My several year old Pioneer television runs linux. It crashes and reboots if I change HD channels more than 5 or 6 times.
  My roku box needs to be rebooted from time to time.
  So does my android phone.
  - Re: (Score:2)
    
    by RR ( 64484 ) writes:
    
    Even though my linux desktop machine runs for long periods without needing rebooting, there are exceptions: My several year old Pioneer television runs linux. It crashes and reboots if I change HD channels more than 5 or 6 times. My roku box needs to be rebooted from time to time. So does my android phone.
    All those are also unlikely to be running EXT4. They store the system on flash and use SquashFS, JFFS2, or YAFFS2. The ones that use eMMC might use EXT4, but Samsung just donated F2FS for that use.
    Also, they tend to use very old kernels.
- - Re: (Score:2)
    
    by Rich0 ( 548339 ) writes:
    
    Nope - Greg does a decent job with the Gentoo stable kernels. Granted, the current Gentoo stable kernel has a different ext4 bug that can cause panics when files are deleted, which is why I'm running unstable at the moment (I was getting nightly crashes when tmpreaper ran). Oh, the irony.
Really clever... (Score:5, Funny)

by K. S. Kyosuke ( 729550 ) writes: on Wednesday October 24, 2012 @03:36PM (#41755963)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
They're trying to boost the average uptime of all installations by making people keep their machines turned on. It's just a continuation of the uptime war waged with the BSD folks!

- Re: (Score:2)
  
  by OzoneLad ( 899155 ) writes:
  
  Actually, it's trying to punish you for having a crappy uptime.
LKML Slashdotted (Score:2)

by o'reor ( 581921 ) writes:

Brilliant. Well, it certainly worries this Linux developer -- although I mostly rely on pre-3.0 kernels. Wasn't there a rule on Slashdot about mirroring articles before posting links to them?
- Re: (Score:2)
  
  by Bill, Shooter of Bul ( 629286 ) writes:
  
  Not that I've ever remembered. It was oft suggested in comments, but most websites are nearly Slashdot-proof these days. Kind of surprised that lkml is so sluggish under the load.
  - - Re: (Score:2)
      
      by Bill, Shooter of Bul ( 629286 ) writes:
      
      I just pray someone hit the turbo button, we need all of that DX and all of the number co-processing it can give us.
      - Re: (Score:3)
        
        by Score Whore ( 32328 ) writes:
        
        It was the 486DX that brought the FPU on chip. The 386DX had a 32-bit wide data bus and the 386SX had a 16-bit wide data bus, as well as only 24 bits of the address bus hooked up externally.
Interesting bug, but don't get excited. (Score:5, Informative)

by dacut ( 243842 ) writes: on Wednesday October 24, 2012 @03:38PM (#41756001)

From Ted Ts'o's commentary, it's an optimization ("jbd2: don't write superblock when if its empty") gone awry:
The reason why the problem happens rarely is that the effect of the buggy commit is that if the journal's starting block is zero, we fail to truncate the journal when we unmount the file system. This can happen if we mount and then unmount the file system fairly quickly, before the log has a chance to wrap.
Basically, this optimization has the side effect of not updating the transaction log in this rare case. You can end up replaying old transactions after new ones, which will scramble metadata blocks. Given the rather unique conditions needed to hit this one, I'm not going to lose any sleep over any servers running without Ted's fix (though I'll certainly apply it once RedHat releases the patch).
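
To make the failure mode concrete, here is a toy Python model of it. It is only a sketch built from the description above: the `ToyDisk` layout, the `sb_first_seq` field, and the function names are invented for illustration and do not mirror jbd2's real structures or recovery code. It assumes that recovery replays log records for as long as their sequence numbers run consecutively from the value stored in the superblock.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Invented for illustration; not jbd2's on-disk layout."""
    blocks: dict = field(default_factory=dict)  # filesystem metadata blocks
    log: list = field(default_factory=list)     # journal slots: (seq, block, value)
    sb_first_seq: int = 1                       # superblock: sequence of the first live transaction


def run_session(disk, writes, clean_unmount, buggy=False):
    """Mount, commit one transaction per write, then unmount cleanly or crash."""
    seq = disk.sb_first_seq                     # mount: read the next sequence number
    for slot, (block, value) in enumerate(writes):
        record = (seq, block, value)
        if slot < len(disk.log):
            disk.log[slot] = record             # the log has not wrapped, so reuse it from the start
        else:
            disk.log.append(record)
        seq += 1
    if clean_unmount:
        for block, value in writes:             # checkpoint journaled data to its final location
            disk.blocks[block] = value
        if not buggy:
            disk.sb_first_seq = seq             # mark the journal empty; old records become invalid
    # On a crash nothing is checkpointed and the superblock is left untouched.


def recover(disk):
    """Replay records from the start of the log while sequence numbers are consecutive."""
    expected = disk.sb_first_seq
    for seq, block, value in disk.log:
        if seq != expected:
            break
        disk.blocks[block] = value
        expected += 1


def demo(buggy):
    disk = ToyDisk()
    # Session 1: two quick transactions, then a clean unmount.
    run_session(disk, [("inode 12", "v1"), ("inode 12", "v2")],
                clean_unmount=True, buggy=buggy)
    # Session 2: one new transaction, then a crash before it is checkpointed.
    run_session(disk, [("inode 12", "v3")], clean_unmount=False)
    recover(disk)
    return disk.blocks["inode 12"]


if __name__ == "__main__":
    print("correct unmount:", demo(buggy=False))
    print("buggy unmount:  ", demo(buggy=True))
```

In this model, skipping the superblock update makes the second session reuse sequence number 1, so recovery runs past the new record into the stale one left over from the first session and the block ends up holding old data.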

- Re:Interesting bug, but don't get excited. (Score:5, Informative)
  
  by Tough Love ( 215404 ) writes: on Wednesday October 24, 2012 @03:58PM (#41756273)
  
  It means you could get an incorrect replay after a crash and end up needing to do a fsck. Good thing Ext2/3/4 fsck is awesome. Of course, having no replay bug will be much better. Note: the bug was introduced this October 8th. You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.
  
  - Re: (Score:2)
    
    by NotBorg ( 829820 ) writes:
    
    You are not running this kernel on your server or workstation unless you are a dev, it hasn't filtered through to distros yet.
    I'm a crazy, bad ass, rebel that uses ArchLinux for my workstation. Living wild and dangerous, I recklessly shut down my heathen ext4 computer every night. I feel like I'm that evil mayhem guy on the Allstate commercials. RECALCULATING!
  - Re: (Score:2)
    
    by Bradmont ( 513167 ) writes:
    
    > it hasn't filtered through to distros yet.
    
    FTA:
    > Linux 3.4, 3.5, 3.6 stable kernels
    
    I'm running Ubuntu 12.10 stock kernel:
    % uname -r
    3.5.0-17-generic
  - Re: (Score:3)
    
    by WuphonsReach ( 684551 ) writes:
    
    Note: the bug was introduced this October 8th.
    
    Probably one of the more informative comments here.
    - Re:Interesting bug, but don't get excited. (Score:5, Informative)
      
      by fatphil ( 181876 ) writes: on Wednesday October 24, 2012 @06:29PM (#41758439) Homepage
      
      $ git show eeecef0af5e
      commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3
      Author: Eric Sandeen <sandeen@redhat.com>
      Date: Sat Aug 18 22:29:40 2012 -0400
      
      jbd2: don't write superblock when if its empty
      
      - Re: (Score:3)
        
        by fatphil ( 181876 ) writes:
        
        That's Linus' tree. This is Greg's:
        
        linux-stable$ git show 14b4ed22a6
        commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
        Author: Eric Sandeen <sandeen@redhat.com>
        Date: Sat Aug 18 22:29:40 2012 -0400
        
        jbd2: don't write superblock when if its empty
        
        commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.
  - Re: (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    The offending commit is present in both Ubuntu's 12.10 and 13.04 generic kernels, though the package versions are in the proposed repositories.
  - - Re:Interesting bug, but don't get excited. (Score:4, Insightful)
      
      by Shimbo ( 100005 ) writes: on Wednesday October 24, 2012 @04:47PM (#41756943)
      
      There are certainly distributions out there using 3.4 and 3.5 kernels.
      Yes, but not many of them will push kernel updates all the way through to end users in a couple of weeks.
      
      - Re:Interesting bug, but don't get excited. (Score:5, Informative)
        
        by Anonymous Coward writes: on Wednesday October 24, 2012 @05:34PM (#41757633)
        
        Ubuntu users are at risk.
        http://www.ubuntuupdates.org/package/core/quantal/main/proposed/linux-image-3.5.0-18-generic
        Look for " jbd2: don't write superblock when if its empty
        - LP: #1066176"
        If any Ubuntu users have proposed repo enabled and they've updated to 3.5.0-18, they're vulnerable.
        
The file system dug too greedily... (Score:3, Funny)

by Bovius ( 1243040 ) writes: on Wednesday October 24, 2012 @03:43PM (#41756077)

...and too deep. It awoke a being of segfaults and kernel panics.

Part of the game (Score:2)

by ntropia ( 939502 ) writes:

At first I had mixed feelings of slight disappointment and concern, especially because it is the default filesystem in several distros, (including Android) [wikipedia.org]. Although, after some second thoughts, I have come to the following conclusions:

1) It is part of the game: continuous development toward improvement (most of the time) and new features implies some pitfalls. So far, benefits [wikipedia.org] are much larger than costs.

2) Despite the fact developers are still working on a fix, I wouldn't be surprised if it
- Re: (Score:2)
  
  by compro01 ( 777531 ) writes:
  
  This bug is only 10 days old. It's rather unlikely this has percolated down to anything important, much less Android, which still runs 3.0.31 from May.
  - Re:Part of the game (Score:4, Informative)
    
    by fatphil ( 181876 ) writes: on Wednesday October 24, 2012 @06:33PM (#41758489) Homepage
    
    It is *not* 10 days old.
    
    linux-stable$ git show 14b4ed22a6
    commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
    Author: Eric Sandeen <sandeen@redhat.com>
    Date: Sat Aug 18 22:29:40 2012 -0400
    
    jbd2: don't write superblock when if its empty
    
    commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.
    
    This sequence:
    
    # truncate --size=1g fsfile
    # mkfs.ext4 -F fsfile
    # mount -o loop,ro fsfile /mnt
    # umount /mnt
    # dmesg | tail
    
    results in an IO error when unmounting the RO filesystem:
    
    [ 318.020828] Buffer I/O error on device loop1, logical block 196608
    [ 318.027024] lost page write due to I/O error on loop1
    [ 318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8.
    
    This was a regression introduced by commit 24bcc89c7e7c: "jbd2: split
    updating of journal superblock and marking journal empty".
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    
    diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
    index e149b99..484b8d1 100644
    --- a/fs/jbd2/journal.c
    +++ b/fs/jbd2/journal.c
    @@ -1354,6 +1354,11 @@ static void jbd2_mark_journal_empty(journal_t *journal)
    
         BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
         read_lock(&journal->j_state_lock);
    +    /* Is it already empty? */
    +    if (sb->s_start == 0) {
    +        read_unlock(&journal->j_state_lock);
    +        return;
    +    }
         jbd_debug(1, "JBD2: Marking journal as empty (seq %d)\n",
                   journal->j_tail_sequence);
    
Reiserfs became 'murderfs'... (Score:2)

by Omnifarious ( 11933 ) * writes:

What term do we get to use for ext4 now? It's unfortunate that Theodore Tso is actually a pretty decent guy instead of being a murderer (and a jerk). So there aren't any obviously negative terms that come to mind.
But clearly, something needs to be done along these lines, as well as a legion of people who forever more claim that ext4 corrupts your data and you should never use it and stick with ext3 instead.
- Re:Reiserfs became 'murderfs'... (Score:5, Funny)
  
  by Anonymous Coward writes: on Wednesday October 24, 2012 @03:59PM (#41756295)
  
  So clearly the answer is General Tso's FS. Delicious, but you'll lose your data an hour later.
  
- Re: (Score:2)
  
  by corychristison ( 951993 ) writes:
  
  What term do we get to use for ext4 now?
  EXTerminator 4. Because it's just awful. (Not really)
  EXTerminator 4. Because its corruptt
  EXTerminator 4. Because it's on a (data) killing spree.
Summary is wrong (Score:5, Informative)

by DrJimbo ( 594231 ) writes: on Wednesday October 24, 2012 @04:05PM (#41756397)

The EXT4 file-system can experience data loss if the file-system is remounted (or the system rebooted) too often.
This is wrong. The problem occurs when the fs is unmounted too *soon*. Twice in a row. The bug only appears if the journal buffer does not wrap. You only get catastrophic results if this happens twice in a row.

- Re:Summary is wrong (Score:5, Interesting)
  
  by Anonymous Coward writes: on Wednesday October 24, 2012 @04:27PM (#41756669)
  
  This appears to be untrue. My latest tests suggest that it happens if a single unclean umount happens while the fs is mounted in 3.6.3. (At least, I saw corruption in /var after a single boot, followed by a rescue boot into 3.6.1 and fsck: every filesystem that had journal replay invoked also had corruption.)
  -- N., original reporter, not much enjoying his fifteen minutes of fame since it comes with happy fun filesystem corruption attached: captcha is 'contrite', how appropriate
  
  - Re: (Score:2)
    
    by DrJimbo ( 594231 ) writes:
    
    I suspect that unclean umounts may trigger the bug too but that does not contradict anything I said. I did not say there was no corruption when you hit the bug once, I said there was catastrophic corruption when you hit it twice in a row. If a bug can be triggered by a clean umount, it is not very surprising if it also gets triggered by an unclean umount.
    Your experience seems to confirm my correction. It is not about how *often* you mount, it is about how you umount. This is a non-trivial distincti
  - - Re: (Score:2)
      
      by Bronster ( 13157 ) writes:
      
      Man - I should wander back to the other place and read your war stories.
      So when is Oracle going to release ZFS to the Linux world rather than pushing btrfs which is still not finished?
How many times (Score:2)

by MetalliQaZ ( 539913 ) writes:

... can we get the words "stable", "linux", and "kernel" into a single summary? I like this game.
Well of course! (Score:3)

by Panaflex ( 13191 ) writes: <convivialdingo@ y a h oo.com> on Wednesday October 24, 2012 @04:47PM (#41756937)

They're mounting it wrong!
When you mount your disks, you need to be sure of proper head alignment. Make sure she's spun up properly as well, otherwise the disks could be surprised and jump away causing a crash. Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.

- Re:Well of course! (Score:4, Funny)
  
  by isorox ( 205688 ) writes: on Thursday October 25, 2012 @11:26AM (#41765259) Homepage Journal
  
  Lastly, my geek friends, mounting too often can cause burning friction which can destroy data and cause irritation and discomfort.
  I never had a problem with frequent mounting, however I have now found a side effect from a mount I performed last year. A child-process was forked into existence shortly after the mount, and now we find we're continuously receiving interrupts from the process, which has affected pretty much every aspect of system administration.
  I find that performing the mount is occasionally possible, but having to umount to give resources to deal with the child process (which often core dumps, and needs a lot of user interaction), before ejecting can lead to frustration and cold showers.
  Most of the time my team is simply trying to run sleep whenever we can.
  
Wait what? (Score:3)

by freman ( 843586 ) writes: on Wednesday October 24, 2012 @06:21PM (#41758321)

People reboot linux?

Most of the early stories on the web are wrong.... (Score:5, Informative)

by tytso ( 63275 ) writes: on Wednesday October 24, 2012 @09:42PM (#41760179) Homepage

I have a Google+ post where I've posted my latest updates to this still-developing story:
https://plus.google.com/117091380454742934025/posts/Wcc5tMiCgq7 [google.com]
Also, I will note that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by all of the major file system authors). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I've run a very large set of automated regression tests on a very regular basis, and certainly before pushing the latest set of patches to Linus. So while it is no guarantee of 100% perfection, I and many other kernel developers *are* willing to eat our own dogfood.

hmmm... Android? (using ext4?) (Score:2)

by neurocutie ( 677249 ) writes:

what's this mean about various versions of Android using ext4? I think I just flashed my tablet to use ext4 (ugh)... really don't want corruption on my tablet...
- Re: (Score:3)
  
  by MtHuurne ( 602934 ) writes:
  
  Android is unaffected: the bug was introduced after Linux 3.6 and no Android kernel is anywhere near that recent.
patch (Score:3)

by anonieuweling ( 536832 ) writes: on Thursday October 25, 2012 @05:02AM (#41762065)

The more recent patch at http://marc.info/?l=linux-kernel&m=135105626207228&w=2 [marc.info] fixes stuff.

- Re:Bisected? (Score:5, Informative)
  
  by Slayne ( 10400 ) writes: on Wednesday October 24, 2012 @03:30PM (#41755891) Homepage
  
  Nope - bisection is a common technique for tracking down the cause of a bug by doing a binary search through the code history.
  https://en.wikipedia.org/wiki/Code_Bisection
  
  - Re: (Score:2)
    
    by Tough Love ( 215404 ) writes:
    
    The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.
    - Re: (Score:2, Funny)
      
      by fireman sam ( 662213 ) writes:
      
      Bisecting is also a way of killing bugs - or perhaps Bisecting is when you act like an insect that goes both ways.
    - Re:Bisected? (Score:5, Informative)
      
      by Just Some Guy ( 3352 ) writes: <kirk+slashdot@strauser.com> on Wednesday October 24, 2012 @06:56PM (#41758765) Homepage Journal
      
      The summary should say "bisected and found" not "found and bisected". Bisecting is a way of finding bugs.
      No. They found the bug, then bisected the commits between "last known working" and HEAD to discover what patch caused it.
      
      - Re:Bisected? (Score:4, Informative)
        
        by Tough Love ( 215404 ) writes: on Wednesday October 24, 2012 @09:08PM (#41759941)
        
        Ah I see, we have ambiguity about what "find a bug" means. From the user's perspective, "finding a bug" means producing the buggy behavior. But from the developer's perspective, "finding a bug" means finding the erroneous code. And we are talking about developers here. From my perspective, until the bug was "found" by bisecting it was only "known to exist", not found. See?
        By the way, I've actually bisected bugs, have you? No? OK.
        
- Re:Bisected? (Score:5, Funny)
  
  by Gothmolly ( 148874 ) writes: on Wednesday October 24, 2012 @03:31PM (#41755899)
  
  No, this means the kernel has bug-like tendencies from time to time, but is not exclusively buggy. For instance when it's in college, or if it's at a bar, and has had a few drinks, well then it might be buggy, but normally at work and at home and to all its friends it acts stable.
  
  - - Re: (Score:3)
      
      by CheshireDragon ( 1183095 ) writes:
      
      I think YOU are the one who didn't get the joke...
- Re:Bisected? (Score:5, Informative)
  
  by petermgreen ( 876956 ) writes: <plugwash.p10link@net> on Wednesday October 24, 2012 @03:51PM (#41756179) Homepage
  
  What they actually split in half is a sequence of changesets (also known as commits).
  The idea is that you have a sequence of changesets that takes you from the last known good revision to the first known bad revision. By splitting that sequence in half and determining whether the revision in the middle is good or bad, you can in principle halve the number of revisions between last known good and first known bad until you find the revision that introduced the bug. Reality is messier because of nonlinear history, because some revisions may be "broken" such that it is not possible to determine if they are "good" or "bad", and because some bugs may be difficult to test for, but bisection is still a useful tool for finding problem revisions among a long history relatively easily.
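
The halving described in the comment above can be sketched in a few lines of Python. This is a minimal illustration, not how `git bisect` is implemented: it assumes a strictly linear history, a known-good first commit, a known-bad last commit, and a test that gives a reliable answer on every revision, which are exactly the simplifications the comment warns about. The commit names and the `is_bad` predicate are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of revisions tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (linear history).
    """
    good, bad = 0, len(commits) - 1   # indices of last known good / first known bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2       # revision in the middle of the remaining range
        tests += 1
        if is_bad(commits[mid]):
            bad = mid                 # bug is already present here: look earlier
        else:
            good = mid                # still fine here: look later
    return commits[bad], tests


# Hypothetical history of 1,000 commits in which commit 613 introduced the bug.
commits = [f"c{i}" for i in range(1000)]
culprit, tests = first_bad(commits, lambda c: int(c[1:]) >= 613)

if __name__ == "__main__":
    print(culprit, "found after", tests, "tests")
```

For a history of 1,000 commits this needs at most 10 test builds, which is why bisection stays practical even across a long range of revisions.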
  
- Re: (Score:3)
  
  by FatdogHaiku ( 978357 ) writes:
  
  They split it in half?
  I know it's wrong but I just got this mental image of someone moving all the 0's to one side of a page and all the 1's to the other side...
  - Re: (Score:3, Funny)
    
    by Nivag064 ( 904744 ) writes:
    
    Nah!
    You're wrong!!
    The 0's go to the top of the page, and the 1's to the bottom!!!
    (As the 0's have air bubbles that make them float...)
    [An irrelevant irrelevancy?]
- - Re: (Score:2)
    
    by i_ate_god ( 899684 ) writes:
    
    presumably from this post, "being technical" only means complete knowledge of all tools.
    I'm guessing you find it very hard to find work with that kind of understanding of what "being technical" implies.
  - - - Your Papers Please (Score:5, Funny)
        
        by Anonymous Coward writes: on Wednesday October 24, 2012 @04:05PM (#41756393)
        
        grammar nazi's
        grammar Nazis
        
      - Re: (Score:2)
        
        by mcgrew ( 92797 ) * writes:
        
        grammar nazi's?
        *facepalm* I hope that was deliberate.
- - Re: (Score:3)
    
    by newcastlejon ( 1483695 ) writes:
    
    Perhaps, if disect is a real word, but dissect means "cut up/apart", not specifically into two parts.
    - Re:Bisected? (Score:4, Funny)
      
      by EMR ( 13768 ) writes: on Wednesday October 24, 2012 @05:06PM (#41757221)
      
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
      Nah, He only needs the latest SHA1 for each roll outcome commit as that'll point up the GIT tree :-D
      
- - Re: (Score:2)
    
    by zonky ( 1153039 ) writes:
    
    I'm a laptop owner who uses dm-crypt, and with a 2 second boot time off SSD, I never bother hibernating. Better check what kernel....
    - Re: (Score:3)
      
      by h4rr4r ( 612664 ) writes:
      
      This one occurred in October, so pretty doubtful, since none of the major distros are that up to date.
      - Re: (Score:2)
        
        by DeathFromSomewhere ( 940915 ) writes:
        
        From a fully updated Ubuntu 12.10 (no patch for this bug yet):
        
        $ uname -r
        3.5.0-17-generic
        
        From the summary:
        The latest Linux 3.4, 3.5, 3.6 stable kernels have an EXT4 file-system bug
      - Re: (Score:2)
        
        by fatphil ( 181876 ) writes:
        
        It's not from October:
        
        linux-stable$ git show 14b4ed22a6
        commit 14b4ed22a6b5fc1549504336131be4f5f6ba1bf4
        Author: Eric Sandeen <sandeen@redhat.com>
        Date: Sat Aug 18 22:29:40 2012 -0400
        
        jbd2: don't write superblock when if its empty
        
        commit eeecef0af5ea4efd763c9554cf2bd80fc4a0efd3 upstream.
- Re:Reinventing the wheel (Score:5, Interesting)
  
  by UnknownSoldier ( 67820 ) writes: on Wednesday October 24, 2012 @04:09PM (#41756449)
  
  I have to agree with you. This is one of the best demos of ZFS around :)
  http://www.youtube.com/watch?v=QGIwg6ye1gE [youtube.com]
  ZFS solves 3 problems by taking a holistic approach:
  * Volume Management
  * File System
  * Data Integrity
  Instead of fragmenting the problem into 3 layers, each with only limited access and knowledge, a unified layer gives you more meta-information to make smarter decisions.
  Some interesting essays:
  https://blogs.oracle.com/bonwick/entry/raid_z [oracle.com]
  https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation [oracle.com]
  
  - Re: (Score:2, Troll)
    
    by h4rr4r ( 612664 ) writes:
    
    Hopefully Btrfs will conquer this.
    Blame SUN, they chose a license for ZFS to ensure it never had proper in kernel linux support. They did that because Linux was eating their lunch and still is.
    - Re:Reinventing the wheel (Score:5, Interesting)
      
      by UnknownSoldier ( 67820 ) writes: on Wednesday October 24, 2012 @04:49PM (#41756969)
      
      > Blame SUN, they chose a license for ZFS to ensure it never had proper in kernel linux support.
      That's a myth / blatant lie.
      Fork Yeah! The Rise and Development of illumos
      http://www.youtube.com/watch?feature=player_detailpage&v=-zRN7XLCRhc#t=1460s [youtube.com]
      Why You Need ZFS
      http://www.youtube.com/watch?v=6F9bscdqRpo [youtube.com]
      @5:40 I just want to clarify your comment "It would be illegal to ship"
      @5:45 I think there is a perception issue that we need to tackle.
      @5:55 One point that I would like to make, because I think I said earlier that we have much more in common than separates us.
      @5:58 One of the most important things we all have in common is we are all open source systems.
      @6:02 And we need to end this self-inflicted madness of open source licensing compatibility.
      @6:12 I think that it is a bogeyman and we are letting it hold us back.
      @6:19 You say it would be illegal to ship. I say no one has standing
      @6:24 The GPL was never ever designed to counteract other open source licenses.
      @6:33 That is a complete rewrite of history to believe the GPL was designed to be at war with BSD or with CDDL.
      @6:39 The GPL was at war with proprietary software. And thank the GPL and Stallman open source won.
      @6:45 That is the whole point. Open source won.
      @6:49 We are pissing on our own victory parade by not allowing these technologies to flow between systems.
      
      - Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        http://en.wikipedia.org/wiki/Common_Development_and_Distribution_License#GPL_incompatibility [wikipedia.org]
        There seems to be quite an argument over the matter.
      - Re: (Score:2)
        
        by amorsen ( 7485 ) writes:
        
        That's a myth / blatant lie.
        You are going to have to come up with better arguments than that. Your quotes do not support that statement.
        Sun was about as Linux-hostile as any company could get, basically from 1995 and forwards. They tried to do as much as they could to make sure that Linux did not benefit in any way from any Solaris or Sun technology.
        Of course it makes sense that they tried to fight against the OS which was destined to make them obsolete. Luckily they did not have a particularly competent legal team.
        
        Re: (Score:2)
        
        by UnknownSoldier ( 67820 ) writes:
        
        > Your quotes do not support that statement.
        I'm not sure how much clearer you can get than "Open source won. We are pissing on our own victory parade by not allowing these technologies to flow between systems."
        You _do_ realize who said them, right?
        Bryan Cantrill (wrote dtrace) worked with Jeff Bonwick (designed/wrote ZFS.) They were both together at Sun for 14 and 20 years respectively. If you watch the "Fork Yeah!" video the impression I get is that it looks like they wanted to open source as much possible b
        
        Re: (Score:2)
        
        by fatphil ( 181876 ) writes:
        
        > they wanted to open source as much possible but was held back by legal.
        
        The legal dept. at Sun?
        
        Re: (Score:2)
        
        by amorsen ( 7485 ) writes:
        
        If you watch the "Fork Yeah!" video the impression I get is that it looks like they wanted to open source as much possible but was held back by legal.
        So what if certain engineers wanted to open source things? They didn't get to make that decision.
        The quotes are implying that the GPL does not work and that you can combine CDDL-licensed code with GPL'd code and distribute the combination. That position is rather weird, but then again Sun did suffer from a reality distortion field when it came to legal issues. The only other person I have heard of with the same view is Jörg Schilling.
      - Re: (Score:2)
        
        by VortexCortex ( 1117377 ) writes:
        
        The GPL was at war with proprietary software. And thank the GPL and Stallman open source won.
        Amen.
    - - Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        ZFS has not already been debugged on linux. Is there even a non-FUSE ZFS implementation for linux?
        I am not sure everything has to be done in one step. Do one thing and do it well. This holistic idea is nice in concept but often leads to the windows outcome. Not much gets done and what gets done is not that great if at any point "just works" just doesn't.
        
        Re: (Score:2)
        
        by icebraining ( 1313345 ) writes:
        
        There's a native kernel port of ZFS for Linux: http://zfsonlinux.org/ [zfsonlinux.org]
- Re: (Score:2)
  
  by dimeglio ( 456244 ) writes:
  
  ...or XFS with a recent kernel.
- - Re:Low impact (Score:5, Insightful)
    
    by jedidiah ( 1196 ) writes: on Wednesday October 24, 2012 @04:14PM (#41756511) Homepage
    
    > Windows has never had anything as serious as a file system corruption bug.
    That you know of...
    Since the Windows development process isn't open, there's no way for you to tell. You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.
    
    - Re: (Score:2)
      
      by bertok ( 226922 ) writes:
      
      You don't get to see Microsoft's development versions and you don't get to see Microsoft's bug database.
      You're looking in the wrong place!
      They're called features, and they're on the technet website for all the world to see.
      Like how in older Windows versions, disks would be auto-mounted, and NTFS didn't have native active/active capability. In other words, if you made the slightest mistake in your FC zoning, then you could kiss your multi-terabyte cluster volume goodbye.
    - Re: (Score:3)
      
      by fatphil ( 181876 ) writes:
      
      >> Windows has never had anything as serious as a file system corruption bug.
      
      >That you know of...
      
      So what were all those chkdsk errors after BSODs?
    - Re: (Score:2)
      
      by ffflala ( 793437 ) writes:
      
      Windows has never had anything as serious as a file system corruption bug.
      I believe they've accomplished this by ensuring that NTFS fails safely to a state of corrupt registry hive errors instead.
  - Re: (Score:2)
    
    by Bengie ( 1121981 ) writes:
    
    I love Chan9 and MS Research and I think a lot of what MS makes is "cool", but we are all human and mistakes WILL be made. Linux has a great track record. This is also why BTRFS will take a while to get traction in the Enterprise. EXT4 and ZFS are still getting bug fixes.
  - Re:Low impact (Score:5, Informative)
    
    by h4rr4r ( 612664 ) writes: on Wednesday October 24, 2012 @04:16PM (#41756543)
    
    http://answers.microsoft.com/en-us/windows/forum/windows_cp-files/bug-report-serious-filesystem-corruption-and-data/17f69e19-92ca-4e1e-b9d5-f78f1ac4e963 [microsoft.com]
    Bugs happen. The difference here is that Linux development is done in the open so people find out about them.
    
    - - Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        Show me their development and I bet I find one.
      - Re: (Score:2, Troll)
        
        by jones_supa ( 887896 ) writes:
        
        Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.
        On the Linux front, I presume FS corruption bugs partly arise from the continuously evolving R&D development style of the kernel. New file systems get invented all the time and previous ones get tweaked. Can't say if it's good or bad, it's just another way of doing things. I myself have not wished for much since the journal support of ext3.
        
        Re: (Score:2)
        
        by 0123456 ( 636235 ) writes:
        
        Seriously, I have to agree here. It is extremely rare for NTFS to get corrupted under Windows. It just wins this battle.
        I've never seen NTFS get corrupted. I have seen it delete multi-gigabyte files because they were open when Windows crashed.
        I've never seen ext3 get corrupted, or delete multi-gigabyte files because they were still open when Linux crashed (or, more likely, went down due to a power failure).
        I've never trusted ext4 after the early 'so what if I delete your data after a power failure?' arguments from the developers.
        
        Re: (Score:3)
        
        by petman ( 619526 ) writes:
        
        I've had whole NTFS partitions get corrupted, twice. In both instances, the partitions were formatted under Linux, specifically Ubuntu.
        
        Lesson learnt is, never format an NTFS partition under Linux. Personally, I think this functionality should be disabled. It's way too dangerous.
      - Re: (Score:3)
        
        by Neil Boekend ( 1854906 ) writes:
        
        Windows 7 should not have automounted the partition once it detected it wasn't forward compatible with the partition formatting. Forced mounting and formatting would be possible user choices. The bug is in the detection (there may not be any) or the action after the detection.
  - Re:Low impact (Score:5, Informative)
    
    by sk999 ( 846068 ) writes: on Wednesday October 24, 2012 @06:17PM (#41758291)
    
    Still, for all of the shit that Linux users talk about Windows, WINDOWS has NEVER had anything as serious as a FILE system CORRUPTION bug.
    Finally, someone talking sense ... oh wait.
    http://www.computerworld.com/s/article/9054178/Microsoft_s_Windows_Home_Server_corrupts_files [computerworld.com]
    "Microsoft's Windows Home Server CORRUPTS FILES"
    "'Don't edit' list includes photos, as well as Quicken and QuickBooks files, warns Microsoft; no word on patch"
    Never mind ...
    
    - - Re:Low impact (Score:5, Informative)
        
        by sk999 ( 846068 ) writes: on Wednesday October 24, 2012 @07:36PM (#41759199)
        
        Nice try, but fail. That wasn't a bug in Windows, it was a bug in applications.
        Really? Not according to Microsoft.
        http://support.microsoft.com/kb/946676 [microsoft.com]
        "A BUG has been discovered in the way that the initial release of Windows Home SERVER manages FILE transfer and balancing across multiple hard drives. In certain cases, depending on application use patterns, timing, and the workload that is placed on the Windows Home Server-based computer, certain FILES could become CORRUPTED."
        "... For distributing data across the different hard drives that are MANAGED by WINDOWS Home Server, the WINDOWS Home Server mini-filter driver REDIRECTS I/O ... A BUG has been discovered in the REDIRECTION mechanism which, in certain cases, depending on application use patterns, timing, and workload, may cause interactions between NTFS, the Memory Manager, and the Cache Manager to get out of sync. This causes CORRUPTED data to be written to FILES."
        
  - Re: (Score:2)
    
    by sjames ( 1099 ) writes:
    
    I doubt that's true. They may not have released a version with such a bug, but they probably did have them at some point. Remember, the vanilla kernel and LKML are the FOSS equivalent of the internal development process and its releases to QA.
    If you want the post QA versions, use a distro kernel.
  - Re: (Score:3)
    
    by smpoole7 ( 1467717 ) writes:
    
    > Windows has never had anything as serious as a file system corruption bug.
    I'm going to assume that either you are joking, or you have only been using Windows for about 5 minutes.
    On the off chance that you are actually serious, Geoff Chappell documented a case some years ago in which Windows would occasionally toggle a byte (might have been a word; can't remember now) on the hard drive. Just one byte in a random sector somewhere on the drive. Happy flower sunshine.
    You should also Google "Windows disk co
    - Re: (Score:3)
      
      by smpoole7 ( 1467717 ) writes:
      
      OK, and now I'm probably off topic, but I'm an older guy and as we get older, we like to reminisce. (Between bellowed exhortations to remove one's feet from the lawn, of course.)
      I remember a million years ago, when I was developing VxDs for Windows 95. I rigged up the debugger to go active early in the boot ... and had to disable it.
      Windows 95 generated SO MANY faults during the boot, it took forever otherwise. I mean, it constantly klonged. Bang, bang, bang, one exception after another. They (mostly) went a
  - Comment removed (Score:4, Interesting)
    
    by account_deleted ( 4530225 ) writes: on Thursday October 25, 2012 @07:00AM (#41762429)
    
    Comment removed based on user account deletion
    
  - - Re: (Score:3)
      
      by negRo_slim ( 636783 ) writes:
      
      Source?
      Cuz I'm looking:
      
      http://en.wikipedia.org/wiki/Ntfs#Microsoft_Windows [wikipedia.org]
      http://www.tomshardware.com/forum/1249-63-ntfs-win7-windows [tomshardware.com]
      http://en.wikipedia.org/wiki/Ntfs#Versions [wikipedia.org]
      
      And just not seeing "XP is incompatible with the newest version of NTFS"
    - - Re:Low impact (Score:5, Insightful)
        
        by the_other_chewey ( 1119125 ) writes: on Wednesday October 24, 2012 @05:35PM (#41757679)
        
        That isn't a file system bug, that is progress. Would you consider it a bug if a Linux system from 1998 caused corruption on an ext4 volume?
        Hell yeah.
        
        If it'd tell me it doesn't know the file system and has no idea what to do with it,
        that would be perfectly fine.
        
        But corrupting a file system just because it is unknown to/unsupported by the
        system trying to read it would be a huge bug.
        
        
        Re: (Score:3)
        
        by dotancohen ( 1015143 ) writes:
        
        If it'd tell me it doesn't know the file system and has no idea what to do with it, that would be perfectly fine.
        But corrupting a file system just because it is unknown to/unsupported by the system trying to read it would be a huge bug.
        Windows did have this behaviour, by the way. In 2007 I had a Dell Inspiron laptop with two power buttons: one for Normal Windows and one for Media Center Windows. I had wiped the hard drive and installed Fedora on it. Powering with the normal button worked fine, but if by accident one were to power it on with the Media Center button then I would get the initial Media Center screen (I have no idea where that code was hiding, possibly in a hidden partition) and it would wipe all my ext3 filesystems.
- Re: (Score:3)
  
  by bluefoxlucid ( 723572 ) writes:
  
  I have used BSD. I found it .... quite striking. There's a hell of a lot of performance enhancement in Linux, and it really shows when you try to boot BSD and find it's ass-slow from the get-go. I even tried slapping down Debian-kfreebsd to compare something roughly the same and ... yeah it's just slow as shit. Solaris (both Sun Solaris and Nexenta = Ubuntu/Solaris) wasn't that slow.
  - - Re: (Score:2)
      
      by marcosdumay ( 620877 ) writes:
      
      And Netcraft confirmed it. I know. Everybody knows. You don't need to keep repeating it.
      But, of course, zombies are known to be slow... You may be up to something.
- Re: (Score:2)
  
  by interval1066 ( 668936 ) writes:
  
  Figures... AC calls out FOSS.
  This is what you get when you use a filesystem that wasn't developed by a real company.
  Sounds like M$ FUD to me, but whatever. Is M$ the only "real" company?
  Because if they had to worry about losing money, they would make damned sure that problem didn't exist. Or at least make it go away.
  I got a list of "real" companies that haven't made good on many high-level flaws.
  I thought this "problem" existed with ext4 for years.
  You did? Would've made a nice /. article. Where are your notes regarding this flaw only you uncovered?
  Yeah, Micro$oft is evil, but their FS works.
  http://serverfault.com/questions/31709/how-to-workaround-the-ntfs-move-copy-design-flaw
- Re: (Score:2)
  
  by Rich0 ( 548339 ) writes:
  
  Uh, a journal helps prevent corruption of filesystem metadata by avoiding having it overwritten in place. You even get some benefit for data by doing ordered data writes.
  Granted, COW is better still, but we're not quite there yet on btrfs.
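
For anyone wondering what "avoiding having it overwritten in place" looks like, here is a rough toy sketch in Python. It is an in-memory model only, not how ext3/ext4 or jbd actually lay things out; `ToyJournaledDisk`, `commit`, `recover` and the `crash_after` knob are all made-up names for illustration. The idea it tries to show: full copies of the new metadata blocks go to the journal first, then get copied to their final locations, and a crash in the middle is repaired by replaying the journal.

```python
class ToyJournaledDisk:
    """Toy in-memory model of metadata journaling (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}   # pretend on-disk metadata blocks: block number -> bytes
        self.journal = []  # pretend on-disk journal: list of committed transactions

    def commit(self, changes, crash_after=None):
        # Step 1: write full copies of the new blocks to the journal first.
        # In this toy model, an entry in the list counts as durably committed.
        self.journal.append(dict(changes))

        # Step 2: copy the blocks to their final locations.
        # crash_after simulates losing power after that many in-place writes.
        for n, (block, data) in enumerate(changes.items()):
            if crash_after is not None and n >= crash_after:
                raise RuntimeError("simulated crash")
            self.blocks[block] = data

        # Step 3: all in-place writes finished, so the journal entry can go.
        self.journal.pop()

    def recover(self):
        # Replay every transaction still in the journal. Replaying is safe to
        # repeat because each entry holds complete block contents.
        for txn in self.journal:
            self.blocks.update(txn)
        self.journal.clear()


disk = ToyJournaledDisk()
disk.commit({1: b"inode v1", 2: b"bitmap v1"})

try:
    # Crash after only one of the two in-place writes has happened.
    disk.commit({1: b"inode v2", 2: b"bitmap v2"}, crash_after=1)
except RuntimeError:
    pass

half_updated = dict(disk.blocks)  # block 1 is new, block 2 is still old
disk.recover()                    # replay brings both blocks to v2
```

Without the journal, the half-updated state after the crash is all you would have. A real filesystem also needs commit records, checksums and write barriers to know a journal entry is complete, which this sketch skips entirely.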
