Data Storage Software Linux IT

Anatomy of Linux Journaling File Systems 59

LinucksGirl writes "Journaling file systems used to be an oddity primarily for research purposes, but today they're the default in Linux. Discover the ideas behind journaling file systems, and learn how they provide better integrity in the face of a power failure or system crash. Learn about the various journaling file systems in use today, and peek into the next generation of journaling file systems."

  • Obligatory. (Score:5, Funny)

    by Tackhead ( 54550 ) on Thursday June 12, 2008 @04:22PM (#23770609)
    "And then there's ReiserFS [wikipedia.org], which had integrity issues when it came to retrieving the location of a bunch of widely-scattered bits under an unbalanced tree."
  • by Animats ( 122034 ) on Thursday June 12, 2008 @04:41PM (#23770871) Homepage

    File systems should know more about file type. Not "file type" in the extension sense, but file type in the sense of what the data written to the file needs for integrity.

    There are only a few standard use cases:

    • The entire file is the commit unit. For most files, you either want the entire file written correctly or you don't want anything written. When nothing is written, the previous version, if any, should remain intact. Applications with "Save" functions need this model. For many binary file types, from images to executables, a partial file is totally useless. So the file should be committed when closed. IBM put this in some of their early UNIX versions, the ones based on UCLA Locus. Done right, if the program crashes before closing or committing the file, the file reverts to the old version. It's not necessary to update the metadata until file close, so this is the fastest mode, and should be the default.
    • Log files. Files are only extended; old data is not overwritten. Ordered journaling is desirable, so that after a crash, the file is intact out to the point of the crash. This is UNIX "open for append" mode.
    • Database files. The file is being used to support a database with read/write data structures. The database system needs to know when the file system has committed a write and may need to know about ordering of queued writes. This is the most complex case, but database implementors pay attention to file system details and are willing to make special calls if necessary to tell the file system when to commit and to wait for commits to complete.

    If those three cases are properly supported, you should never see a garbled file from an unexpected shutdown. Some of the file systems out there have approximately the right feature set for this, but there's no standardized interface and set of expectations that corresponds to these use cases.

    • by klapaucjusz ( 1167407 ) on Thursday June 12, 2008 @05:06PM (#23771145) Homepage

      The entire file is the commit unit. For most files, you either want the entire file written correctly or you don't want anything written. When nothing is written, the previous version, if any, should remain intact.

      You don't need any extra kernel support for that. You just write the new version under a temporary name, and then atomically rename it over the old file. Fsync before renaming for extra credit.

      Good text editors have been doing that for as long as anyone can remember.
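A minimal Python sketch of the write-then-rename approach described above. The function name `atomic_write` and the `.tmp-` prefix are made up for illustration, and the atomicity of the final step depends on the file system, as other replies in this thread point out.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename
    # never has to cross a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # the "extra credit" step: flush to disk first
        os.replace(tmp_path, path)  # rename the new version over the old file
    except BaseException:
        # If anything failed before the rename, do not leave the temp file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace` runs, the old file is untouched, though a stray temp file may be left in the directory.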

      Log files. Files are only extended [...] This is UNIX "open for append" mode.

      Unfortunately, it doesn't work very well, since POSIX doesn't (AFAIK) specify the largest write that is guaranteed to be atomic. Hence, unless you're careful, you may end up with log entries from two processes being interleaved.

      • Re: (Score:2, Insightful)

        by 7 digits ( 986730 )
        > You don't need any extra kernel support for that. You just write the new version under a temporary name, and then atomically rename it over the old file. Fsync before renaming for extra credit.

        And lose all the hard links you may have had on that file...
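The hard-link objection can be checked with a small Python sketch. The file names and the helper `rename_breaks_hard_link` are hypothetical, and it assumes a file system that supports hard links.

```python
import os


def rename_breaks_hard_link(directory):
    """Return True if replacing a file by rename detaches an existing hard link."""
    original = os.path.join(directory, "data.txt")
    alias = os.path.join(directory, "alias.txt")
    replacement = os.path.join(directory, "data.txt.tmp")

    with open(original, "w") as f:
        f.write("old")
    os.link(original, alias)  # second name for the same inode

    with open(replacement, "w") as f:
        f.write("new")
    os.replace(replacement, original)  # "data.txt" now names a different inode

    with open(alias) as f:
        alias_content = f.read()
    different_inode = os.stat(original).st_ino != os.stat(alias).st_ino
    # The alias still points at the old inode and still holds the old content.
    return different_inode and alias_content == "old"
```

After the rename, the alias keeps the old data, so anything reading through the other link never sees the update.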
      • by Animats ( 122034 )

        You don't need any extra kernel support for that. You just write the new version under a temporary name, and then atomically rename it over the old file. Fsync before renaming for extra credit.

        It's not the default. It's hard to do portably. The ritual for doing it on Windows is quite complex and usually implemented wrong. Rename is only atomic on some UNIX/Linux file systems. One can end up with forgotten temp files lying around. File loss is possible on systems lacking an atomic rename function, a

      • You don't need any extra kernel support for that. You just write the new version under a temporary name, and then atomically rename it over the old file.

        This is true if your file system supports atomic replacing. Under some file systems, renaming file A over file B will cause file A to replace file B. Under other file systems, renaming file A over file B will return a failure code or throw an exception; an application is expected to delete file B, warn the user, and then retry moving file A. See the documentation for Python os.rename [python.org].
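A rough Python sketch of the two behaviours described above. The helper name `rename_over` is invented here, and the fallback branch is not atomic: a crash between the delete and the second rename loses the old file.

```python
import os


def rename_over(src, dst):
    """Move src over dst, falling back to delete-then-rename where needed."""
    try:
        # On POSIX this replaces dst in one step; on Windows, os.rename
        # raises FileExistsError when dst already exists.
        os.rename(src, dst)
    except FileExistsError:
        os.remove(dst)       # window in which neither version exists
        os.rename(src, dst)
```

Python 3.3 and later also provide `os.replace`, which overwrites the destination on both platforms and avoids the manual fallback.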

      • by shani ( 1674 )

        Log files. Files are only extended [...] This is UNIX "open for append" mode.

        Unfortunately, it doesn't work very well, since POSIX doesn't (AFAIK) specify the largest write that is guaranteed to be atomic. Hence, unless you're careful, you may end up with log entries from two processes being interleaved.

        You are wrong, any size buffer passed to write() is guaranteed to be written atomically:

        http://www.opengroup.org/onlinepubs/000095399/functions/write.html [opengroup.org]

        Look for O_APPEND.
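For illustration, a small Python sketch of appending one log entry per write() call on a descriptor opened with O_APPEND. The function name `append_log_entry` is hypothetical. The sketch reports short writes rather than assuming the whole buffer always goes out in one piece.

```python
import os


def append_log_entry(path, message):
    """Append one log line using a single write() on an O_APPEND descriptor."""
    line = (message.rstrip("\n") + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One call per entry: with O_APPEND the offset is moved to the
        # end of the file as part of the write itself.
        written = os.write(fd, line)
    finally:
        os.close(fd)
    return written == len(line)
```

Building the complete entry in memory and handing it over in one call is the "careful" part; splitting an entry across several writes gives other processes a chance to interleave.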

        • Unfortunately, it doesn't work very well, since POSIX doesn't (AFAIK) specify the largest write that is guaranteed to be atomic. Hence, unless you're careful, you may end up with log entries from two processes being interleaved.

          You are wrong, any size buffer passed to write() is guaranteed to be written atomically:

          http://www.opengroup.org/onlinepubs/000095399/functions/write.html [opengroup.org]

          Look for O_APPEND.

          Indeed. Thanks for the info.

    • by dargaud ( 518470 )
      There's also the case of changing a few bytes inside an existing file using fseek/fwrite. You don't want the file duplicated, but you have to duplicate it if the operation needs to be atomic, don't you? I guess it's similar to the database case.
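A sketch of the in-place update in question, using Python's equivalents of fseek/fwrite. The name `patch_bytes` is made up for illustration. Nothing here is atomic: a crash partway through can leave a mix of old and new bytes, which is the problem the comment raises.

```python
import os


def patch_bytes(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at offset without copying the file."""
    with open(path, "r+b") as f:   # read/write, no truncation
        f.seek(offset)
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())       # ask the OS to push the change to disk
```

The fsync only says when the write has reached the disk; it does not make the overwrite all-or-nothing.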
  • by Anonymous Coward on Thursday June 12, 2008 @04:43PM (#23770887)
    Dear journal,

    Today I was suddenly restarted. It seems as if the large meat machine which regularly uses me was startled by a file which was being written to my logs, "goatse.jpg". Fortunately, thanks to my reliability, the meat machine will be able to view the image upon his return! I hope it is happy with me!

    Yours truly,
    XFS
  • ehhh (Score:1, Troll)

    More anatomy of LinucksGirl, less anatomy of Linux file systems.
  • by bersl2 ( 689221 ) on Thursday June 12, 2008 @04:53PM (#23771007) Journal
