Linux Software

First Journaling FS for Linux

wendyW writes "LinuxPR has the press release from Namesys, announcing the stable release of the journaling version of ReiserFS. According to the press release, journaling wound up making it even faster than it already was. "
  • I know that ext2 has per-file attribute bits to support compression on a file-by-file basis, and there're patches out there to support that. afaik, there's no "whole filesystem" compression like Stacker/DriveSpace/etc. on DOS/Windows, however, and I don't know what the opinion is on doing it that way. (Probably not too good, unless I miss my guess.)
  • Not that M$ has that much to brag about as far as NT's journalling filesystem... it only journals on the metadata. For them, it's basically just another "me-too" kind of thing, so they can say "Hey! Look! Enterprise people! See, we're reliable! We do journalling just like all those expensive UNIXen!"
  • by Laven ( 102436 ) on Saturday November 06, 1999 @10:49PM (#1555419)
    It looks like there are a lot of questions about other journalling filesystems. I'm no expert on these things, but I have spent quite a bit of time following all three projects and I've read through all available documents on the three filesystems. Here's what I understand of the three.

    XFS
    Originally made by SGI for their IRIX OS, XFS is one awesome filesystem. Read this white paper (http://www.sgi.com/Technology/xfs-whitepaper.html [sgi.com]). This white paper describes all of its cool features. The main features of XFS make it a super scalable, very reliable, ultra fast journalling filesystem utilizing many cool FS technologies like B-trees and other cool stuff.

    Unfortunately, it seems that currently there are many problems with the Linux implementation of XFS. I don't know any details of this, but I guess it is safe to say that XFS will some day become available for Linux. This would be great.

    ext3fs
    I've only read about this on the Linux mailing lists. ext3 appears to be a standard ext2fs implementation plus data journalling, allowing backward compatibility with ext2, although one of the authors hinted that they may not keep it backwards compatible in some later version. It is currently in very early alpha testing and definitely not anywhere close to usable, stable and reliable.

    In my opinion this project is very new, and holds much promise. From their README, they appear to have finished the basic journalling code; what remains is error-handling contingencies, metadata-only journalling, performance tuning and lots of other coding. As a result, it may take some time, but it could give Linux another viable option for a journalling FS. Choices are always good.

    Ext3 Site - ftp://ftp.linux.org.uk/pub/linux/sct/fs/jfs/ [linux.org.uk]

    Reiserfs - http://devlinux.com/namesys/ [devlinux.com]
    I've been following reiserfs for a few months now. It's actually been available for quite some time as a very stable, reliable and quick filesystem for Linux, but journalling was only recently added to the code. Apparently this new addition is supposed to make it faster.

    In "releasing" reiserfs, SuSE doesn't mean that it is the first journalling filesystem for Linux. It is the first journalling FS for Linux to be dubbed reliable and suitable for normal use. This is great as journalling has long been a stumbling block for enterprise adoption of Linux. Alan Cox hinted that he may include reiserfs in the standard kernels soon. Excellent =)

    Warren Togami
    warren@togami.com [mailto]

  • The three really usable linux browsers are netscape/mozilla, lynx, and KFM.
    --
    Man is most nearly himself when he achieves the seriousness of a child at play.
  • why would you like pressing the power button? Your computer turns off when you do that, and then you can't use it :( Also, it can't run rc5des without power. :)
    #define X(x,y) x##y
  • This is not true. I've been using a dual boot FreeBSD (-current) and Linux system for half a year. I have a shared homedirectory on ext2fs which I access regularly from FreeBSD.

    The only annoyance is that in case of a crash in FreeBSD, I have to boot into Linux to fsck the ext2fs filesystem. There is no fsck for ext2fs in FreeBSD yet.
  • While until last year SuSE positioned itself as a rather conservative distribution, only implementing proven technology (no glibc2 for over a year, KDE instead of GNOME), with their X server efforts being the only visible exception, RedHat has always been the "bleeding edge" choice among the standard distributions.

    This seems to have changed dramatically this year: after embracing ALSA (i.e. hiring the top ALSA developer and making ALSA part of their distribution) and publishing their next version on DVD as well, SuSE now also seems to be the first distribution to include a working journaling FS and is actively funding its development. Considering their recent expansion efforts (esp. in the US and Eastern Europe) and their actually positive balance (unlike RedHat, they do make some money now), I can't wait for their IPO!
  • Well I'm ready to either change filesystems or increase the block size on my ext2 partition but there's one problem: I've got 15 gigs of data to migrate, a CD recorder, and no money to invest in 15 gigs of tape storage. What strategies are most often used to migrate filesystems?
  • Journalling is mentioned as one of the reasons you should buy NT over Linux on the FUD page:

    http://www.microsoft.com/ntserver/nts/news/msnw/LinuxMyths.asp

    They'll have to edit it again, now it appears Linux has 2 or 3 to choose from. At the current rate of progress MS will have to edit it almost daily :-).

  • Yes, I certainly agree that ReiserFS intends to be a DBMS of sorts; you can use a filesystem to do so where:
    • A directory can represent a table
    • Each file in that directory is a record in the table
    • An "index directory" can contain symbolic links to the records
    • Things can hierarchicalize as needed

    One goal of ReiserFS is to make this practical even for small records by providing ways of efficiently storing hordes of tiny files.
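
A minimal sketch of the directory-as-table idea listed above, in Python. The table name, record keys and field layout are made up for illustration, and nothing here is ReiserFS-specific: it runs on any filesystem that supports symbolic links. It only shows the mapping (directory = table, file = record, symlink = index entry), not how ReiserFS stores small files.

```python
import os
import tempfile


def insert_record(table_dir, key, value):
    """Store one record as one small file inside the table directory."""
    os.makedirs(table_dir, exist_ok=True)
    path = os.path.join(table_dir, key)
    with open(path, "w") as f:
        f.write(value)
    return path


def add_index_entry(index_dir, index_key, record_path):
    """Point a symbolic link in the index directory at an existing record."""
    os.makedirs(index_dir, exist_ok=True)
    link = os.path.join(index_dir, index_key)
    os.symlink(os.path.abspath(record_path), link)
    return link


def lookup(index_dir, index_key):
    """Follow the index symlink and read the record it points to."""
    with open(os.path.join(index_dir, index_key)) as f:
        return f.read()


root = tempfile.mkdtemp()
customers = os.path.join(root, "customers")          # the "table" (hypothetical name)
by_email = os.path.join(root, "customers_by_email")  # the "index directory"

rec = insert_record(customers, "1001", "name=Alice\nemail=alice@example.org\n")
add_index_entry(by_email, "alice@example.org", rec)

record = lookup(by_email, "alice@example.org")
row_count = len(os.listdir(customers))
print(record, row_count)
```

Every record costs one file, which is why efficient storage of tiny files matters so much for this approach.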

    But that's a separate issue; that requires that someone create something like a ReiserSQL, a database server that maps SQL queries onto file requests on a filesystem.

    The journalling issue discussed in the top level posting implicitly regards journalling as being important for conventional RDBMSes like Oracle, Sybase, DB/2, PostgreSQL, where the model you are suggesting (and which probably is not too unlike what I outline above) does not apply.

  • Finally?/Hopefully?

    I can't wait to try this one out :>

  • I'll second the advice of the previous replier and recommend typing 'man patch'. If it's anything like any of the other kernel patches, you're going to cd to /usr/src and type 'patch -p(0 or 1)
    --
  • Then you're just as ignorant as he is.

    They're just OSes. If one offers a feature you need, use it. If not, stay with what you're using.
  • FWIW, Alan Cox did mention that he was considering merging reiserfs in with the stable kernel -- it was in the fairly short list of big-thing additions in the "maybe" pile. It would be nice to see it get into the stable tree, assuming that it's sufficiently rock-like to permit it.
  • so what about sgi's xfs? i thought that was going to be linux's savior for a journaling file system. Are we to expect this bundled in any distro's any time soon? will it replace ext2fs?
  • >>If the commercial entities were willing to
    >>redistribute the source for any changes they
    >>made, there would be no problem.

    Wrong. Since this FS is linked against the kernel, they would also have to make their kernel source code available under the GPL (unless they obtain the source under different licensing - see above).

    Again, the BSD folks are happy to redistribute the source for even their entire kernel, but because of the GPL restrictions (or features, as RMS would say), they cannot use this source in their product.


    Actually, no. All they need to do is put stubs in the kernel that call an external filesystem module, then rewrite the filesystem code as an external module. It'd be slower, but that's the price they pay for their anti-GPL decision.

    They are controlling who can use the currently-available source code. In particular, BSD folks cannot use it,

    As the other respondent said, BSD *can* use the code. The only thing stopping them is themselves.

    The GPL ensures that the Linux kernel will always be open source if the BSD guys want to look at a new FS to see how it was done. The BSD license does not ensure that BSD and its derivatives will always be open source.
  • So basically it means that if a newbie linux user presses the reset button after X crashes, it won't fsck up their file system and maybe have to reinstall Linux because they don't know how to fix it? (speaking from personal experience : ) If so, then that is a very Good Thing (tm).
  • Well, roughly and generally (and possibly somewhat inaccurately), a journaling filesystem is one that writes its data in such a way that, should a crash or power failure or similar event occur, the disk is left in a runnable state and there should be no data loss beyond a possible loss of the changes that were being written to a file. A consequence of journaling is that in such events, since the filesystem does not need fixing after a crash, a machine using it for a really huge disk can restart far more quickly than would be required if a fsck had to be run across an entire filesystem, as happens with ext2.
  • I see that some updates have required a re-compile and a re-format of existing partitions to use the new source... *ouch*

    Any ideas on how many more times this may happen before the beta-testing is done??
    (Hmm... got first post earlier and didn't even try. :> Nice way to start the evening. *grin*)



  • by Parity ( 12797 )
    That came out of left field... all the hype has been about xfs, and now this.

    I wonder, though, how GPL purists are going to react, since their business model is to be GPL but sell GPL-exceptions to some companies.

    I suspect that the project will quickly fork into Reiser-FS-commercial and Reiser-FS-pureGPL as soon as a contributor refuses to license a GPL-exception.

    I wonder if anyone here has heard of this before? Beta-tested it? Maybe I'll try it tomorrow. (I want to keep my machine running tonight, so I can't very well replace the fs. :))


    --Parity
  • The NameSys FTP Archive, which houses the reiserfs files, is located at:

    http://devlinux.com/pub/namesys [devlinux.com]

    If you grab the sources from the site, the README.FIRST file says to:
    • Apply linux-2.2.11-reiserfs-3.4.gz to pure linux 2.2.11 with `zcat linux-2.2.11-reiserfs-3.5.gz | patch -p0`

    • Do 'cd /usr/src/linux/fs/reiserfs/utils; make dep; make; make install' to make the utilities.
  • Buy or borrow any new HD. $200 USD will get you a 20GB HD... Find a friend, or someone at a local LUG willing to loan you one for a few hours if you can't afford it.
  • Which brings up the question of keeping support for multiple filesystems. MINIX has so little overhead that many still use it on floppies. The Squid caching [nlanr.net] group is working on a new VFS to put on top of the Unix filesystems it is installed on because they are so bad at handling large numbers of small files. It would be great to have an open filesystem standard for a small-file reliable filesystem for such things as caching and user document partitions. Then use ext2/3 for binary/library directories, etc. There shouldn't be a "one size fits all" filesystem we aim toward, should there? Complexity may be a pain some days, but you don't have to expose the average person to this, just those wishing for optimised performance (just like not everyone needs to know how to use RAID).

    - Michael T. Babcock <homepage [linuxsupportline.com]>
  • I didn't say it wouldn't make a good root file system, just that you would need to boot off of another root partition to repair it.


    The above is incorrect. When mounting the filesystem, the journal is replayed. When the filesystem is unmounted cleanly, there is nothing to replay. If the filesystem was not unmounted cleanly, then there is *something* in there to replay. So you do *not* need to boot off of an alternate root to repair it. Just mount it and let it do the journal replay, and you are ready to go. It seems the FAQ must be out of date.
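
To make the replay-on-mount behaviour concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not ReiserFS code: the "disk" is a dictionary, the journal is a list, and the class and method names are invented for illustration. Committed transactions left in the journal are replayed at mount time, incomplete ones are discarded, and a clean journal means there is nothing to do.

```python
class ToyJournaledFS:
    """Toy model: both the 'disk' state and the journal survive a crash."""

    def __init__(self):
        self.disk = {}
        self.journal = []

    def begin(self, changes):
        """Log the intended changes before touching the disk."""
        txn = {"changes": dict(changes), "committed": False}
        self.journal.append(txn)
        return txn

    def commit(self, txn):
        """Mark the transaction as complete in the journal."""
        txn["committed"] = True

    def mount(self):
        """Replay committed transactions, drop incomplete ones, empty the journal."""
        replayed = 0
        for txn in self.journal:
            if txn["committed"]:
                self.disk.update(txn["changes"])
                replayed += 1
        self.journal.clear()
        return replayed


fs = ToyJournaledFS()

t1 = fs.begin({"inode 7": "size=100"})
fs.commit(t1)                          # committed, but the crash hits before it reaches the disk
t2 = fs.begin({"inode 8": "size=5"})   # the crash hits before this one is committed

replayed_after_crash = fs.mount()      # first mount after the crash replays t1 only
replayed_when_clean = fs.mount()       # journal is now empty, nothing to replay
print(replayed_after_crash, replayed_when_clean, fs.disk)
```

The work done at mount time is proportional to the size of the journal, not the size of the filesystem, which is why no separate repair boot is needed.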

  • I'm afraid SGI still has lots of work with making the code endian clean.

    I spoke with the SGI guys at Linux Kongress. Currently XFS works on Linux/ia32, but the disks cannot be moved to a big endian (IRIX/MIPS) box because the version of XFS for Linux is little endian. Since we (the m68k and PPC guys) have quite some experience with bi-endianness in ext2 (originally ext2 was big endian on m68k and PPC), we were able to convince them that XFS has to be big endian on all platforms, just like ext2 is little endian on all platforms.
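
A short Python illustration of why an on-disk format needs one fixed byte order. The field value below is made up and is not a real XFS or ext2 magic number. Code that always reads with the format's declared byte order gets the same answer on any host, while code that reads in the wrong order gets garbage.

```python
import struct

FIELD = 0x12345678  # hypothetical 32-bit superblock field, for illustration only

little = struct.pack("<I", FIELD)  # little-endian layout: least significant byte first
big = struct.pack(">I", FIELD)     # big-endian layout: most significant byte first

# Reading with the format's fixed byte order works regardless of the host CPU.
value_from_disk = struct.unpack(">I", big)[0]

# Reading big-endian bytes as if they were little-endian gives the wrong value.
misread = struct.unpack("<I", big)[0]

print(little.hex(), big.hex(), hex(value_from_disk), hex(misread))
```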

    Best wishes, SGI!

  • Actually, this is not quite true regarding NTFS. This is on their website regarding terminal server:
    "Fault Recovery
    The Windows NT Filesystem (NTFS) is a journaling, or transactional file system. This means that any I/O that alters the file system data or meta-data (directory structure, etc.) is completed atomically so that either all of the changes are completed, or none of the changes is completed. This design means the transaction log can be used to restore the file system to a known good state after a system crash. In addition, NTFS keeps copies of vital file system information in multiple sectors for extra redundancy."

  • Did you get the right patch?

    If you built the journalling version, you should not even have gotten a reiserfsck. In fact, the README in the utils directory specifically says you should *NOT* fsck a jRFS filesystem. It sounds like you built plain reiserfs, and not the journalling version.



    As far as fsck'ing an ext2 partition. I have to disagree. We have a machine at work with 4 17GB SCSI disks in it. When that machine resets, it takes between 10 and 20 minutes to fsck all of the disks, and mount the filesystems. Reiserfs with journalling would cut that down to a minute at most I would think...


    JFS is not just a buzzword IMHO. Other OS's have had this for quite some time (XFS on SGI, VxFS on HP-UX etc..), and Linux has been quite lacking in this area. Just my .02

  • by Anonymous Coward
    Sigh. NT "journalling" is MS's definition of journalling. It differs significantly from what SGI would call journalling. MS basically implement a subset of what a true JFS does, enough that their Marketing Droids could prate about it - not enough that your data is reasonably safe. That's why you have to _buy_ real JFSes separately for NT.

  • Lose only has one "o"
  • Heh, NTFS == journaled? When our NT servers crash they spend over 4 hours to run chkdsk on their 80GB volumes, this is not exactly what I would call journaling... unless you believe everything in Micro$oft's glossy NT product folders :)
  • but you shouldn't be proud by merely catching up with NT. That's my point. The news should be about the technology that makes reiser fs unique.
  • They aren't controlling who can use it. By putting the code under GPL, they're just making sure no one else can take away someone else's right to use and modify it.
    At the risk of starting a GPL flame war...

    They are controlling who can use the currently-available source code. In particular, BSD folks cannot use it, and neither can commercial entities whose kernels are not GPL'd and who do not wish to buy a license.

    Worse, since they are also making it available under alternative licensing schemes, they themselves are taking away others' rights to use and modify the code streams that are released in commercial systems. Now, I suppose it could be licensed to commercial entities in a way that requires all changes made to the code be assigned to the ReiserFS developers under their copyright, but I don't think they would be adamant about such a clause if the commercial entity didn't want it, and the ReiserFS folks stand to make a good bit of money from a deal.

    If the commercial entities were willing to redistribute the source for any changes they made, there would be no problem.
    Wrong. Since this FS is linked against the kernel, they would also have to make their kernel source code available under the GPL (unless they obtain the source under different licensing - see above).

    Again, the BSD folks are happy to redistribute the source for even their entire kernel, but because of the GPL restrictions (or features, as RMS would say), they cannot use this source in their product.

    99 little bugs in the code, 99 bugs in the code,
    fix one bug, compile it again...

  • Does anyone use this code on a production machine? Is it stable? I've got a bunch of Linux boxes just itching for the performance improvement and security of reiserfs. Is it possible to use this as the root file system? Thanks.

  • I hate to take issue with a well-spoken posting, but journalling is not of primary usefulness for helping support High Availability RDBMS systems.
    I'm sorry to disagree, but you're very very wrong here. The fact that Reiserfs is fast (if the article is true) as a filesystem for manipulating ordinary files is not what's important (though it's a big plus in many configurations). Nor is how a database manipulates its data relevant. It's how quickly the filesystem is recovered after a failure that's the primary benefit. I've dealt with many hundreds of customers over the last 4-5 years that have bought systems for the sole purpose of running databases on journalling filesystems.

    On multi system High Availability configurations of the type that run database application services designed to failover from one system to the next, a journaling filesystem is essential. Unless it's controlled by an administrator, when a service moves from one system to another it's because of some kind of system failure and in this situation the filesystem is not unmounted cleanly. With a journaling FS, recovery is quick, measured in seconds. For a non journaling filesystem recovery could take hours. I've even seen one (very stupidly configured) customer system (100+ GB's) where this took days. If it takes more than a few seconds to recover, then it's not High Availability.

    Note that for these, the metadata is very static which means that journalling of metadata is of relatively little importance.
    It doesn't matter if the metadata is static or not. If the database sits on a filesystem that isn't journalled it will have to be fsck'd regardless and will take almost as long to fsck as a filesystem that isn't as static.

    As you rightly pointed out, the only other option is not to have a filesystem at all and drop the database into a RAW partition. In this case all bets are off as the stability and recovery time of a database after a system failure is up to the implementation specific design of the database vendor.

    Macka

  • I haven't heard of those tools...are they open-source? If you can tell me where to find them, maybe I can see if a port might be possible...

    -lx
  • Ok... great.. we have journaling.. where are the ACL's??? ACL's are one of the biggest things setting linux behind commercial *nixes such as solaris and irix. (And don't tell me User Private Groups get around ACL's - because they don't.)
  • BeOS isn't a heavy server platform? No shit. I might point out, though, that the vast majority of computers spend their time being workstations, not servers.

    What I asked for was a comparison between the different journaling filesystems, without saying that any of them was a cure-all.

    -lx
  • If you let the log grow unbounded, then yes.

    That's why the log is recycled in a circular buffer fashion.
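
A rough Python sketch of a log recycled as a circular buffer. The class and method names are invented for illustration and the real on-disk layout of any particular filesystem will differ. The point is only that the log has a fixed size: once an entry has been checkpointed (made safe in its final location), its slot is reused.

```python
class CircularLog:
    """Fixed-size log; slots are reclaimed once their entries are checkpointed."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # next slot to write
        self.tail = 0    # oldest entry not yet checkpointed
        self.count = 0   # number of live entries

    def checkpoint(self, n):
        """Pretend the n oldest entries are safely on disk and free their slots."""
        n = min(n, self.count)
        for _ in range(n):
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % len(self.slots)
        self.count -= n
        return n

    def append(self, entry):
        if self.count == len(self.slots):
            # Log is full: flush the oldest entry to make room for the new one.
            self.checkpoint(1)
        self.slots[self.head] = entry
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1

    def live_entries(self):
        """Return the live entries, oldest first."""
        size = len(self.slots)
        return [self.slots[(self.tail + i) % size] for i in range(self.count)]


log = CircularLog(capacity=4)
for i in range(10):
    log.append(f"update-{i}")

entries = log.live_entries()
print(entries)
```

However many updates pass through, the log never occupies more than its four slots.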

    Cheers
  • BS: Soft Updates is about as "not free" as the entire Linux kernel. RTFL
  • Did you have journaling enabled? If so, please try and report as much information as possible to the reiserfs mailing list.
  • When the time comes, some company or individual could release a distribution targeted specifically for musicians, with all the kernel mods they might need, and the sound utilities included and configured. Not a problem.
  • You have absolutely no idea what you are talking about.

    There is an old, deprecated "Log-Structured FS" in the FreeBSD source tree. Nobody's interested -- log structured FS'es generally have atrocious read performance, because they cannot lay out files for faster read performance like FFS (and I assume ext2fs) can. McKusick has nothing to do with this, and is not very interested in this approach either.

    The related journalling filesystems add an extra disk write for every single update operation, making them somewhat slower than the normal filesystem that the journal augments. The journalling technique is, however, conceptually quite simple. Since the extra data structure (the journal) is only used during FS recovery, at least it only wastes disk bandwidth during normal operation.

    OTOH, soft updates makes a different trade-off: it saves the disk bandwidth, but takes up CPU time and memory. Since CPU's and memory systems are always going to be much faster than magnetic disks (for the foreseeable future, anyway), I think this is a better tradeoff.

    And SU *does* leave the filesystem safe to mount after a crash. The *only* inconsistencies that can occur are:
    1) unused data blocks not marked free.
    2) inodes with too high of a link count.
    These can only result in wasted space, nothing more serious. McKusick is working on a background fsck (using NetApp-style FS snapshots) for FFS, so that fsck can basically be run at anytime during system operation (i.e. the FS doesn't have to be unmounted or r/o mounted).

    Oh, well, not that it matters -- this is slashdot, and I fully expect any reply to be "Linux rulez!"
    The bias I see running through this thread is that "Linux has it, so it must be great" and "BSD doesn't, therefore it must be necessary," so "let's bash BSD on technical grounds -- we can almost never do that ;-)"

    In reality, SU and journalling are radically different approaches to solving the same problem. They both add *extra* complexity to async writes -- that is, they are not performance tweaks! They are techniques that try to retain *part* of the performance of async, while adding crash-resistance.
  • umm BeOS has a 250 microsecond scheduling latency. That's µs, not ms.
  • ./configure;make;make install

    :)
  • Damn, now I have to reformat my partition. *sigh* oh well, the non journalling reiserfs partition sure was fast though.. hopefully the journalling one will be faster =). Maybe I shouldn't have hit the off button to test it ;) Thanks
  • This is a little off-topic, but I will ask anyway

    Call me stupid, but what exactly is endian-ness? What are endians? I mean, I see the endian check any time (it seems) I compile a prog in linux and just kinda ignored it since everything usually works.

    thanks
  • Slower? by what benchmark?

    Some System V advocates had created a "benchmark" that proved that FFS (with 4k blocks and 1k fragments) was slower than their 1k-blocksize filesystem. They would create a file, and grow it by 1k blocks. This caused FFS to grow the file's last fragment (which often involves a copy) on every update. Of course, it was the optimal case for their filesystem. It was this crooked benchmark that led to the tunefs -o (space|time) option. '-o time' will upgrade a fragment to a full 4k block, wasting a bit of space, but saving time -- it is quite useful for constantly-growing smallish files, e.g. logfiles. (and, of course, for running crooked benchmarks ;-)

    Also, logging filesystems and journalling filesystems are *very* different. Logging filesystems pretty much have to leave file blocks scattered throughout the disk, or waste disk bandwidth relocating them (to the head of the log, of course). Journalling filesystems, however, can choose whatever file layout they please, so they can optimise this layout for good read performance. The FreeBSD LFS is not even alpha -- it's past that stage, nobody is working on it, and I think it might even be in the CVS attic... yep, no code there anymore. I think the LFS approach is pretty much dead -- journalling and soft updates give the same sort of reliability and write performance as LFS promised.

    Also, having just read two papers on the soft updates technique, I feel the need to improve on your description of it: the behavior you describe is present on FFS/async, as well.

    What soft updates does is maintain a list of updates for each in-memory metadata block. Before the block is written to disk, this list of updates is scanned, and any unsafe updates (i.e. ones that depend on other uncommitted updates) are rolled back. After the write completes, they are then rolled forward, to bring the block to its current state. In other words, the latest safe version of the metadata block is written to disk.
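
The roll-back / write / roll-forward cycle described above can be sketched in Python as follows. This is a heavily simplified toy, not the FFS implementation: blocks are dictionaries, each field is assumed to have at most one pending update, and the class names and the inode/directory example are invented for illustration.

```python
class Update:
    """One pending change to a field of a metadata block."""

    def __init__(self, field, old, new, depends_on=None):
        self.field = field
        self.old = old
        self.new = new
        self.depends_on = depends_on  # another Update that must reach disk first
        self.on_disk = False

    def is_safe(self):
        return self.depends_on is None or self.depends_on.on_disk


class MetadataBlock:
    def __init__(self, contents):
        self.memory = dict(contents)  # current in-memory state
        self.disk = dict(contents)    # last version written to disk
        self.pending = []

    def modify(self, field, new, depends_on=None):
        upd = Update(field, self.memory[field], new, depends_on)
        self.memory[field] = new
        self.pending.append(upd)
        return upd

    def write(self):
        unsafe = [u for u in self.pending if not u.is_safe()]
        for u in unsafe:                  # roll back unsafe updates
            self.memory[u.field] = u.old
        self.disk = dict(self.memory)     # write the latest safe version
        for u in unsafe:                  # roll forward to the current state
            self.memory[u.field] = u.new
        for u in self.pending:
            if u.is_safe():
                u.on_disk = True
        self.pending = unsafe


inode_block = MetadataBlock({"inode 12": "free"})
dir_block = MetadataBlock({"entry foo": None})

init = inode_block.modify("inode 12", "initialized")
link = dir_block.modify("entry foo", "inode 12", depends_on=init)

dir_block.write()                 # inode not on disk yet: the entry is rolled back
first_dir_write = dict(dir_block.disk)
inode_block.write()               # now the inode is safely on disk
dir_block.write()                 # the directory entry can follow
second_dir_write = dict(dir_block.disk)
print(first_dir_write, second_dir_write)
```

The disk never holds a directory entry that points at an uninitialized inode, even though the directory block was written first.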
  • Me too. I thought it looked familiar. How lame. And to avoid getting into trouble the loser doesn't even have the courage to refer to a real person.
  • Disclaimer: I have never looked at the ReiserFS code, nor am I significantly familiar with it or Ext2 internals. The following is rampant speculation of the worst kind and should be ignored.

    Having said that, I can think of a couple of reasons why, given the stated design goals of rfs, it would not perform well on those tests. Basically, the performance of an algorithm can be described in "big O" notation, which says how its running time grows with the number of items n.

    Now, let's suppose that ext2 uses sequential scans to get directory entries (I'm fairly sure it does). A sequential scan is O(n). That is, the time required to perform the scan for n elements increases linearly.

    The time for a B-tree based filesystem would increase according to O(log2(n)). The curve on this one is /worse/ for small values of n, but much better as n grows larger. Try plotting log(x) in gnuplot to get an idea of what this would look like.

    In other words, you may not have had enough items in a single directory to experience the benefits of RFS. I would be interested in results with say 10,000 items in a single directory, or better yet 10,000 directories in a single directory with 10,000 one byte files.
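
To put rough numbers on the O(n) versus O(log2(n)) argument, here is a small Python comparison that counts comparisons for a lookup among 10,000 names. Binary search over a sorted list is used as a stand-in for a balanced tree lookup; it is not a B-tree and says nothing about constant factors or disk I/O, and the file names are made up.

```python
def linear_lookup(names, target):
    """Scan entries in order; return (found, number of comparisons)."""
    comparisons = 0
    for name in names:
        comparisons += 1
        if name == target:
            return True, comparisons
    return False, comparisons


def sorted_lookup(sorted_names, target):
    """Binary search; return (found, number of halving steps)."""
    lo, hi = 0, len(sorted_names)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == target
    return found, steps


names = [f"file{i:05d}" for i in range(10000)]  # already in sorted order
target = names[-1]                              # worst case for the linear scan

found_linear, linear_cost = linear_lookup(names, target)
found_sorted, sorted_cost = sorted_lookup(names, target)
print(linear_cost, sorted_cost)
```

The scan needs one comparison per entry, while the halving search needs at most about log2(10000), roughly 14 steps.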

    That (as I understand it) is really the kind of grueling stuff that reiserfs is designed for. Nor is this without application. On one of the boxes where I work, we have > 70,000 elm email folders, each stored under "customer_name/email". A simple "ls" takes an hour! Granted this is a boneheaded design (that I didn't do), but the point remains.
  • I wasn't able to reach that site (no response) but it's pretty strange to expect musicians to go out of their way to install an unsupported patch.
  • Jargon file:
    http://www.tuxedo.org/~esr/jargon/jargon.html#big-endian

    "big-endian adj.

    1. Describes a computer architecture in which, within a given multi-byte numeric
    representation, the most significant byte has the lowest address (the word is stored `big-end-first'). Most processors, including the
    IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs are big-endian."
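
The definition can be checked directly in Python. The value below is arbitrary; `int.to_bytes` lays the same number out in both byte orders so the byte at offset 0 (the lowest address) can be compared.

```python
import sys

value = 0x0A0B0C0D

big_endian = value.to_bytes(4, "big")        # most significant byte at offset 0
little_endian = value.to_bytes(4, "little")  # least significant byte at offset 0

for offset in range(4):
    print(offset, hex(big_endian[offset]), hex(little_endian[offset]))

# sys.byteorder reports which of the two the current machine uses natively.
print(sys.byteorder)
```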
  • ... So you can perfectly integrate a GPL part into BSD. The only problem is that the whole thing then has to be distributed under the GPL. (If I understand properly).
    Now people could still develop BSD under the Berkeley license, provide Reiser FS as an add-on, so that commercial types could rip off the BSD code, only without ReiserFS.
  • I understand it differently. It seems to me that they want the filesystem to BE THE DBMS, instead of a "conventional" dbms USING the filesystem.

    Something like every row in the db is a file and you can use the unix io primitives to access them.

  • That's a "log-structured" filesystem, not a journalling filesystem. They're very slightly related (in that both of them are filesystems).

    The difference is a journalling filesystem can maintain decent read performance, because journalling does not confine the layout of data on the disk, while log-structuring does. Log-structured filesystems cannot get past the file fragmentation problem, without wasting tons of disk bandwidth.

  • Why don't you use a 4 KB block size, especially since you're using large files? It's about 10 times faster.

    % ls -l bigfile
    -rw-rw-r-- 1 root root 1843200000 Nov 7 17:04 bigfile
    % time rm bigfile
    0.00s usr, 0.28s sys, 4.94s real, 5% CPU


  • Clarification from the Multi-Disk-HOWTO [linuxberg.com]:
    These take a radically different approach to file updates by logging modifications for files in a log and later at some time checkpointing the logs.

    Reading is roughly as fast as traditional file systems that always update the files directly. Writing is much faster as only updates are appended to a log. All this is transparent to the user. It is in reliability and particularly in checking file system integrity that these file systems really shine. Since the data before last checkpointing is known to be good only the log has to be checked, and this is much faster than for traditional file systems.

    Note that while logging filesystems keep track of changes made to both data and inodes, journaling filesystems keep track only of inode changes.

  • Yeah, that would work if Linux came with the journaling file system _installed_. Last time I checked, Redhat, Caldera, or whatnot were still using ext2. It's not "FUD" if it's true. Linux distributions do not come with a journaling filesystem installed.
  • by bgarcia ( 33222 ) on Saturday November 06, 1999 @03:19PM (#1555523) Homepage Journal
    I noticed that this code is released under the GPL. That means that the *BSD folks can't just take the code and incorporate it into their OS's.

    There is a clause in the license that states that if you contact them, they will let you use it under a different license. But I can't imagine them putting it under the BSD license. It sounds like they want to control who can use it, and they've decided that GNU projects and commercial entities who pay are their target market. If they ever release it under a BSD license, then commercial entities could just grab the BSD-released copy and work from there.

    Will the BSD's simply miss out on this nice new filesystem?

    99 little bugs in the code, 99 bugs in the code,
    fix one bug, compile it again...

  • two dozen CDRs should just about cover it, that's about $25 today isn't it? I helped a guy in uni do this with 9Gb worth of his research data when he moved from NTFS to ext2fs (take a guess why ;) in his case we had about 1.5Gb of space to work in so we used tar and bzip2 from cygnus to pack the data down into chunks just the right size for a CD, then did a comparison of the expanded data before nuking the original and moving on to the next chunk.
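
A possible way to plan the "chunks just the right size for a CD" step, sketched in Python. The 650 MB capacity and the archive names and sizes are assumptions for illustration; the packing is a simple greedy first-fit, so it is not guaranteed to use the minimum number of discs.

```python
def pack_into_chunks(file_sizes, capacity):
    """Greedy first-fit: put each file (largest first) in the first chunk with room."""
    chunks = []  # each chunk is {"files": [...], "used": total size}
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > capacity:
            raise ValueError(f"{name} does not fit on one disc; split it first")
        for chunk in chunks:
            if chunk["used"] + size <= capacity:
                chunk["files"].append(name)
                chunk["used"] += size
                break
        else:
            # No existing chunk has room, so start a new disc.
            chunks.append({"files": [name], "used": size})
    return chunks


CD_CAPACITY_MB = 650  # assumed disc capacity

# Hypothetical compressed archives and their sizes in MB.
sizes_mb = {
    "thesis.tar.bz2": 400,
    "raw-data-1.tar.bz2": 600,
    "raw-data-2.tar.bz2": 300,
    "photos.tar.bz2": 200,
    "mail.tar.bz2": 50,
}

chunks = pack_into_chunks(sizes_mb, CD_CAPACITY_MB)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk["used"], chunk["files"])
```

As in the story above, verify each burned disc against the original before deleting anything.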
  • I guess you're right. I was working under the pretty-valid assumption that the BSD people would rather die than license their kernel under the GPL.
  • by keytoe ( 91531 ) on Saturday November 06, 1999 @03:21PM (#1555529) Homepage

    This would be a huge boon to those of us trying to truly break free of the commercial unices. I've had to put together quotes for enterprise quality database solutions before and there have always been a couple of hurdles to get past when considering an Intel/linux based system.

    PostgreSQL works wonderfully with large data sets, but lacks the ability to do hot restores. I'm eagerly awaiting that one... Now that it does a much better job with concurrent locks, that's my only real hesitation at this point.

    SMP has come a long way in a short time with linux, but is still a bit lacking. This makes it difficult to settle on Intel hardware - sometimes, you just need Raw Horsepower. I'd like to get there without having to buckle down and buy a Sun or HP box. I'm not worried about this one - things are coming along quite nicely...

    Now, my last concern was journaling filesystems - and it looks like it's coming at long last! I was excited when the initial announcement was made, but now that the code is out (and Alan is even considering merging into the stable branch!), I'm all gushy inside! Let's hear it for our team!

    I've watched this whole linux thing start out as a 'hobby OS' and develop through adolescence into what is becoming a damned serious contender with the big boys. Sure, they're baby steps at the moment, but at this pace, they add up right quick. God, I love this industry - never know what's gunna happen next. Who knows - maybe the government will sue Microsoft for anti-trust violations next. Oh... right...

  • flamebait? hey moderator, just because YOU don't understand why running a news spool off BFS would be a catastrophically bad idea, doesn't make this AC any less correct.

    I'm not the moderator responsible, but I do agree it's flamebait. Let's have a look, shall we?

    Modern times? BFS? Yeah, right. That crud is fine for BeOS, but would die under multi-user stress. Want to run INN on BFS? That's what I thought.

    Ooh, look! Flamebait! And now, apply Instant Flamebait-Away(tm):

    BFS is fine for BeOS, but would die under multi-user stress. Want to run INN on BFS?

    It's saying the same thing (that BFS isn't appropriate for servers), but it's not so inflammatory any more.

    To ALL moderators: IF YOU DON'T UNDERSTAND THE SUBJECT, DON'T MODERATE THE COMMENTS!

    "Flamebait" is about tone, not content.

  • by crow ( 16139 ) on Saturday November 06, 1999 @03:23PM (#1555532) Homepage Journal
    This is not the first time software has been released under this model. My understanding is that this is how RT Linux was released.

    [The idea of RT Linux is to put a small real time kernel underneath Linux. This kernel handles the real time tasks, and schedules Linux when a real time task doesn't require it. It also provides a communication mechanism between Linux processes and real time tasks.]

    So the RT linux kernel could, in theory, be used without Linux (perhaps with another OS instead) to provide real time services. The author has carefully retained the copyright to his code, so he can sell it under a non-GPL license if someone wishes to incorporate it into a commercial project.

    I'm not aware of any non-GPL licenses for RT Linux, but the model is there.

    The main thing that helps make this model work is that the copyright holder controls the distribution. That means that in order to get your changes into the official releases, you have to resolve any copyright issues. It only breaks down if there is a significant dispute and someone is willing to go to the effort to start a separate distribution. Of course, if they get the file system into the main Linux distribution, that action will trigger a fork in development.
  • Journaling is something Linux badly needs. This can only be good news.

    I've been dying to get a journaled filesystem for my three servers. I design my servers so they will never have to be touched again. But I've always worried that we might get a power outage and the automatic fsck Linux does might fail. Journaling would be a big help.

    I'm still waiting for SGI's XFS though...

  • Ah, thanks. I thought they were more or less the same. You learn something every day.
  • I have been speculating about filesystems for a while, and have come to the conclusion that performance is nice, but not a top priority for a home user/developer like me. Instead I would like to see a little bit of "intelligence" in the fs, in the way it uses the unused (wasted?) part of the disk. With a little tracking of usage patterns, it should be able to do

    automatic version control of files that I work with.

    back-up copies of important system files

    compress seldom-used files

    reorder files on the disk by access patterns to save seek times

    even delete unwanted files if running low on space (core dumps and editor backups more than a week old...)

    This could be configured with special tools, and/or with a hidden file in each directory to tell what are the important things here. Most of this should happen automatically in the background, out of sight.

    Has this been done already? where? Anyone working on this sort of things? Anyone willing to steal these ideas? Technically feasible?

  • lol, what the hell are you talking about? For one thing, Linux is just an operating system, there is no need to personify it as Hitler. ;)
  • How does this new filesystem compare to ext2fs on deletion times? For starters, here is what a typical deletion on ext2fs takes:

    heroine:/home/mov% l *.mov
    -rw-r--r-- 1 root root 1958135327 Nov 6 17:49 xena1.mov
    heroine:/home/mov% time rm xena1.mov

    real 0m56.536s
    user 0m0.000s
    sys 0m0.920s

    Even a 30 second deletion time would be great.
  • Kinda funny... but not really... a mind like that could have done a much better job... take a little more time before your next post
  • Try ACL-Posix or Trustees. Both implement ACL's for Linux.
  • Though benchmarks aren't everything, they're always nice to look at.

    Here's the linkage:
    http://devlinux.org/namesys/bens.html [devlinux.org]

    --
  • if it's not Redhat, nobody cares. You're not going to get Windows and Mac musicians to come to yet another obscure underground linux distro.
  • linux offers other technologies that Be doesn't have. It's a better server. It's more configurable because it's under the GPL. Fine-BeOS is easier to use because it's _not_ under the GPL. 2 completely different OS's for 2 completely different users. Sometimes they need to share technologies. Linux needs a journalling FS to be a better server. That doesn't mean it's going to turn into BeOS! BeOS needs to integrate VM and disk cache to be better for users-that doesn't mean it's going to turn into freebsd or linux. In the end, it's all about implementation. Most people don't want to deal with 5 zillion patches, GUI's and distros. And some people _do_ and you have to accept that.
  • by Anonymous Coward
    ...didn't they develop this?
  • by tap ( 18562 ) on Saturday November 06, 1999 @03:35PM (#1555568) Homepage
    The ext3 journaling filesystem had its first beta a few months ago. It doesn't require you to reformat your existing ext2 partitions to convert to ext3. And an ext3 filesystem can still be used as an ext2 filesystem; you just need to update the journaling information if you go back to ext3 after using it as ext2. Read more about it at Stephen Tweedie's ext3 site [linux.org.uk].
  • by CelestialScum ( 23249 ) on Saturday November 06, 1999 @03:37PM (#1555571)
    The difference between the two is more of an academic than a user-related issue, as it is basically in the way they are built up. As far as journaling goes, they are both up to the task.
    I do not know if ReiserFS is a true 64 bit one, handling files as big as XFS does, but a quick and dirty look at the two FS's homepages should yield a lot more info on this.
    XFS and ReiserFS are not going to replace ext2. Actually, ext3 is, which will, when released, also be a journaling FS (from what I heard).
    Maybe someone could provide the right urls or more info on this than I can. I believe in time, they will all be included into the kernel, and you can choose your preference based on your needs. In the meantime, make a small partition, insmod the module and mount the drive and play with it I guess :)
  • Well, first off, you're probably right not to switch over immediately for anything mission-critical. Every new program has bugs that need to be discovered and fixed, and this will be no exception.

    I don't agree that journaling FS's are a buzzword, or a fad, though. When they work, they work extremely well -- and invisibly. A good example of an operating system with a solid, robust journalled filesystem is IBM's AIX. AIX uses the journalled filesystem for everything, including the root partition, and based on my many years of experience with these machines, system crashes simply don't break the filesystem.

    However, journaling filesystems aren't the end-all. There's still a significant feature set missing from unix filesystems ... and that's the concept of work units with commit/rollback.

    It works like this ... you want to make a bunch of changes to a bunch of files, all at once. However, if the system were to crash while you were in the middle of making these changes, your data files would be in an indeterminate state.

    If you had a filesystem with work units, you would start by making a system call to open a work unit, then make your changes. When you are finished, you either make a commit system call, or a rollback call. If the commit ends with a success return code, then all of the changes are guaranteed to be made. If an error occurs in the commit, or you make a rollback call, all of the changes in that work unit are backed off. If the system crashes before you make a commit/rollback, all of your changes are backed off when the system reboots. This gives you fine-grain control over how data changes are made to files in your filesystem. Once you've tried it, you'll never want to go back.

    This is a standard database programming technique, but moving the functionality into the operating system gives you a huge programming capability. It lets you write programs with database-grade data integrity as a matter of course, without requiring that you program against a database API.
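    As a rough illustration of the open / commit / rollback pattern described above, here is a minimal user-space sketch in Python. The `WorkUnit` class and its method names are hypothetical, invented for this example; they are not the SFS interface or any existing system call. It stages each change in a temporary file and only renames it into place on commit. Note the limitation: each rename is atomic on its own, but a crash halfway through `commit()` would still leave some files updated and others not, which is exactly the guarantee only the filesystem itself could provide.

```python
import os
import tempfile


class WorkUnit:
    """Hypothetical user-space stand-in for a filesystem work unit."""

    def __init__(self):
        # Maps each final path to the temporary file holding its new contents.
        self._staged = {}

    def write(self, path, data):
        """Stage new contents for `path` without touching the original file."""
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # If this path was already staged, discard the earlier staged copy.
        previous = self._staged.pop(path, None)
        if previous is not None:
            os.unlink(previous)
        self._staged[path] = tmp

    def commit(self):
        """Move every staged file into place (atomic per file, not as a group)."""
        for path, tmp in self._staged.items():
            os.replace(tmp, path)
        self._staged.clear()

    def rollback(self):
        """Throw away all staged changes; the original files stay untouched."""
        for tmp in self._staged.values():
            os.unlink(tmp)
        self._staged.clear()
```

    A caller would create a `WorkUnit`, call `write()` for each file it wants to change, and finish with either `commit()` or `rollback()`.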

    I was skeptical as to the value of commit/rollback for ordinary filesystem programming, until IBM included them in its then-new SFS filesystem on VM. Now I consider it one of those great things that will probably take years for the rest of the world to discover and implement.

    - John
  • by Jeff Mahoney ( 11112 ) on Saturday November 06, 1999 @03:46PM (#1555580)
    There is a semi-recent benchmark vs ext2fs at http://namesys.botik.ru/~yura/benchmarks/journal_227/ext2_vs_jour9.html [botik.ru]

    Chris has the office next to mine and has been showing me these benchmarks just about every day - they improve just about every day.

    -Jeff
  • Aladdin used to do that with Ghostscript. RMS had no problems with this business model, and I haven't heard of anyone who wanted to fork the project for this reason. It would also be silly, as you would have to remerge the enhancements from the trunk back to your branch.

    With Ghostscript the GPL was not restrictive enough. Proprietary software would simply call the gs executable in a separate process. That is why Aladdin eventually switched to a more restrictive license. Namesys should have no such problems; you can't run a filesystem stand-alone.


  • Non-journalling file system (a la ext2, fat, etc. ) must be properly unmounted on shutdown. If it's not unmounted cleanly, it needs to be checked for errors, since it has no idea what happened just before the crash/power failure/whatever.

    Journalling file system keeps track of all the changes as they occur. So, even if it's not unmounted before shutdown, it can easily determine what was modified and deal with it as appropriate. So, for example, if you kick a power cord by accident, you no longer need to wait for 5 minutes while fsck scans the file system.

    High-end data warehouses have file systems measured in terabytes. You *definitely* don't want to wait for fsck there...
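    To make the idea concrete, here is a toy sketch in Python of the log-first, apply-second ordering that lets recovery skip the full scan. It is an in-memory model with made-up names (`ToyJournaledStore`, `crash_before_apply`), not how ReiserFS or any real filesystem lays out its journal. The point it tries to show is that recovery only walks the entries still in the journal, so its cost depends on how much work was in flight, not on the size of the filesystem.

```python
class ToyJournaledStore:
    """Toy model of journaling: record the intent first, then apply it."""

    def __init__(self):
        self.journal = []  # pretend this part survives a crash
        self.data = {}     # stands in for the main on-disk structures

    def update(self, key, value, crash_before_apply=False):
        # Step 1: write the intended change to the journal.
        self.journal.append((key, value))
        if crash_before_apply:
            # Simulated power loss: the change is logged but never applied.
            return
        # Step 2: apply the change to the main structures.
        self.data[key] = value
        # Step 3: retire the journal entry now that the change is in place.
        self.journal.pop()

    def recover(self):
        """Replay whatever is left in the journal; returns the number replayed."""
        replayed = len(self.journal)
        for key, value in self.journal:
            self.data[key] = value
        self.journal.clear()
        return replayed
```

    After a simulated crash, `recover()` replays only the pending entries, which is the toy equivalent of skipping the five-minute fsck.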
  • Look. I'll admit that the "Russian-made" line seems to have come from absolutely nowhere. But just because you don't immediately understand something is no reason to immediately and irrationally assume the author meant to defame anyone, much less an entire nation. This said, I'd like to know what he meant by it myself.
  • They mean ReiserFS is the first "stable" journaling FS for Linux. You are quite correct in saying that ext3 was "first", in that it had journaling before ReiserFS (at least, ext3 was publicly available with journaling before ReiserFS was, that I'm aware of), but it's a fair way from being considered stable just yet.

    Having just looked at ReiserFS's site, it seems either they haven't updated the site yet, or they consider beta == stable, since I could only find the beta release of the code which has journaling.

  • But the free BSDs (FreeBSD, OpenBSD, NetBSD) all AFAIK use their own file system that works a lot better than ext2, although I don't know if it is journaled. The FreeBSD people don't need to worry much about "missing" GPL software; FreeBSD and the like will run most if not all linux binaries, and code is simple to port.
  • by Ami Ganguli ( 921 ) on Saturday November 06, 1999 @05:03PM (#1555612) Homepage

    So Redhat pays for Alan (and Gnome?), Corel supports WINE, and Suse pays for file systems.

    Open Source has always been good at producing excellent, relatively small and self-contained components. We haven't been so great (with a few very notable exceptions, the kernel being one) at producing large projects. If it's a lot of effort with no quick return, the coders get tired of it.

    Now the commercial companies are funding the big stuff in an attempt to gain mindshare ("we must know what we're doing, we've got Alan"). This really complements the existing strengths of Open Source.

  • 1. This is not about porting ext2 to BSD, but a journaled fs. So comparisons to ext2 are meaningless.

    2. Running linux binaries has nothing to do with this either, unless someone wants to make a user space version of RFS, and BSD can support it.

    --
  • 3:56.82 -- call it four

    Divide by 3:07 -- call it three

    Both roundings favor reiserfs, yet the ratio is said to be 1.56. I don't think so.

    Look at the rm -rf * stats -- ratio is claimed to be 10.1, yet it's a lot closer to 7.

    What hope is there for the numbers themselves?
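    For anyone who wants to redo the division without rounding, here is a small Python sketch using only the two times quoted above. The helper name `to_seconds` is made up for this example, and the variable names assume the larger time is the slower run.

```python
def to_seconds(stamp):
    """Convert a time written as 'M:SS' or 'M:SS.ff' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


slower = to_seconds("3:56.82")  # the larger of the two quoted times
faster = to_seconds("3:07")     # the smaller of the two quoted times
ratio = slower / faster

print(f"{slower:.2f}s / {faster:.2f}s = {ratio:.2f}")
```

    The unrounded ratio comes out near 1.27, below even the generous four-over-three estimate, so the published 1.56 does not follow from these two numbers.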

    --
  • If this File System is a good thing and can be integrated with *BSD

    It can't. It specifically says in the readme that you can't use it with a kernel that's not GPL'd without the authors' permission. That would generally be the case with GPL'd code anyway, though (you don't link gcc with the FreeBSD kernel, so it's ok; but you can't take video4linux, which is GPL'd and in the kernel, and include it in the FreeBSD kernel).
  • by QuMa ( 19440 ) on Saturday November 06, 1999 @05:24PM (#1555622)
    This sort of covers it:

    http://collective.cpoint.net/lfs/what_lfs_is.html [cpoint.net]
  • Sorry for the confusion here, I'll ask them to change the README. These instructions will get you the non-journaled version of the ReiserFS. From the ftp site, the patch you want is:

    linux-2.2.11-reiserfs-3.5.5-journaling-beta.gz

    This is the most recent code, even though it is not in beta any longer. The journaling portion of the ReiserFS site has links and more information:

    http://www.devlinux.com/projects/reiserfs/jrnl [devlinux.com]

    -chris

  • You could boot from an initrd RAM Disk, load the XFS module, and then remount your root partition from an XFS partition on your hard-drive. After all, this is how RedHat kernels allow you to have your root partition on a SCSI drive, yet still have all of the SCSI devices built as modules.

    Indeed, just this sort of technique can also be used to handle a ReiserFS root partition that needs to be fsck'd, by having the boot routines in the RAM disk image do the fsck if necessary. Strikes me as a bit more fragile than what I'd care to deploy in a mission critical setting, but....

    --Joe
    --
  • I think you confuse some issues. Aladdin has had _two_ business models. The old was to use GPL and sell exceptions. The new is to use a more restrictive license for new versions, sell exceptions, and release old versions under the GPL.

    That some printer manufacturers didn't want to obey the GPL was not a problem for Aladdin, it was a feature. It meant these manufacturers would want to buy an exception from Aladdin. When they buy an exception, the GPL becomes irrelevant to the customer. The problem was developers of "postscript enabled proprietary applications" who _hadn't_ any problem with the GPL, because they didn't link with gs, they just used it as a standalone program. They would not pay Aladdin; instead they would distribute the source to gs. The new Aladdin license was designed to prevent this.

    Namesys will not need to change their license, because their potential customers will not be able to use a similar loophole.


  • Ok, some independent things:

    I copied /usr/local to new ext2 and reiserfs partitions on a brand new harddrive. First thing of note, from df:

    /dev/hdb1 7823372 442980 7380392 6% /newdrive
    /dev/hdb2 5283091 410343 4599242 8% /ext2

    Newdrive is the reiserfs one. They contain the same data, but the reiserfs one is 30MB bigger.

    Now for some stuff.

    Running find . -exec wc {} \; on an installation of StarOffice on the Reiserfs one gives:

    9.94user 21.02system 0:53.46elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (177778major+32241minor)pagefaults 0swaps

    On ext2:

    9.78user 17.41system 0:50.85elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (205151major+32581minor)pagefaults 0swaps

    Reiserfs loses. Just one data point, however. And I may have it set up wrong. For reference, the reiserfs has lower cylinder numbers.
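    The two comparisons above can be put in relative terms with a few lines of Python. The figures are copied straight from the df and time output; the only assumption is that df was reporting its default 1K blocks. This just restates the single data point and says nothing about why the numbers differ.

```python
# Used 1K-blocks copied from the df output above (assuming df's default units).
reiserfs_used_kb = 442980
ext2_used_kb = 410343

extra_kb = reiserfs_used_kb - ext2_used_kb
extra_mb = extra_kb / 1024
extra_pct = 100 * extra_kb / ext2_used_kb

# Elapsed wall-clock seconds copied from the two timing reports above.
reiserfs_elapsed = 53.46
ext2_elapsed = 50.85
slowdown_pct = 100 * (reiserfs_elapsed / ext2_elapsed - 1)

print(f"extra space on reiserfs: {extra_kb} KB (~{extra_mb:.1f} MB, {extra_pct:.1f}% over ext2)")
print(f"elapsed time: reiserfs took {slowdown_pct:.1f}% longer on this run")
```

    That works out to roughly 32 MB of extra space (about 8%) and about a 5% longer elapsed time for this one run.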
  • by Mr Z ( 6791 ) on Saturday November 06, 1999 @06:16PM (#1555658) Homepage Journal
    RMS had no problems with this business model, . . .

    Actually, I hear that he's not thrilled with it. Indeed, one of the biggest problems that I can see is that there is very little incentive for people to improve the existing GPL version of Ghostscript when they know that Aladdin has (a) already improved Ghostscript in the current commercial version, and (b) will be releasing their changes 'soon' (after one year). This interview with Ghostscript's author Peter Deutsch [linuxcare.com] sheds more light on the situation, including Stallman's thoughts.

    One result is that the GPL community is almost guaranteed to always be one year behind the latest in Ghostscript technology, unless someone gets up enough nerve to fork Ghostscript development and try to get ahead of Aladdin.

    With Ghostscript the GPL was not restrictive enough. Proprietary software would simply call the gs executable in a separate process.

    Part of the problem here is that the Aladdin folks try to license their code to printer manufacturers, etc. The printer folks aren't too keen on having to ship Ghostscript on demand to anyone who buys a printer. Also, if the printer folks make any platform specific changes (which undoubtedly they will, such as specific driver technology for running the print engine), they'd have to distribute those changes, and most aren't willing to do so.

    Also, more importantly, Peter Deutsch doesn't seem too keen on having people ship Postscript-enabled printers by using his work for free (as in gratis).

    The upshot: Aladdin offers their latest and greatest Ghostscript with a commercial license.

    With ReiserFS, I'm sure a similar but not identical set of considerations exists. People building embedded or mission critical systems on an otherwise proprietary base might license ReiserFS for their application without introducing any questions as to the effects of GPL. At the same time, a GPL version is available for everyone.

    The difference here is a bit subtle but important. Namesys appears to be releasing the latest and greatest ReiserFS under GPL, rather than imposing an artificial delay. (Whether or not this changes in the future is unclear, but for now it is an important distinction.) In this case, the commercial license seems to be a means for companies to buy an "unencumbered" version of ReiserFS for their own purposes. (By "unencumbered", I mean free of the implications of GPL.) I see this potentially as a way to keep both camps happy. Maybe. (Except, of course, RMS.)

    --Joe
    --
  • I hate to take issue with a well-spoken posting, but journalling is not of primary usefulness for helping support High Availability RDBMS systems.

    The main effect of journalling, the thing that is really important about it, is that it guarantees that metadata updates are kept consistent. That is, journalling is primarily supportive of making sure that filenames, directory structures, permissions, and such are kept consistent even when moderately catastrophic things happen.

    This is a really good thing when supporting file serving activities, as that indeed tends to involve lots of manipulations of files as users shift them around.

    I've been on the ReiserFS mailing list since '97; have been running a personal news spool on a small ReiserFS partition for probably 6 months. I can't tell for sure if the journalling now available is metadata-only, or if it also journals normal data updates. It looks rather more like metadata-only, which is useful for file-server work, but not so much for RDBMSes.

    Databases behave in quite different ways from file servers in terms of the way they do file access.

    If you look at most RDBMSes, they create a few files, and do lots of manipulations on top of them. Informix SE is a counterexample, basically using Informix C-ISAM underneath, but is unusual in that regard. If you look at the database partitions, you get one of two things:

    • Partitions containing a few very large files.

      Note that for these, the metadata is very static which means that journalling of metadata is of relatively little importance.

    • Partitions containing no filesystem, but rather raw data being managed by the RDBMS.

      Don't just believe me; I am not the ultimate authority on this. Transaction Processing : Concepts and Techniques [amazon.com] is a rather definitive reference; it discusses methods of managing transactions in the context of database management systems, and goes into considerable detail discussing transaction logging, which bears striking (and not merely coincidental) resemblance to journalling.

      The critical point here is that it is the database manager that wants to manage the logging/journalling; Oracle and Sybase and IBM and Informix will be loath to pass on responsibility for this to Hans Reiser, wonderful guy though he is.

    Conclusions

    1. Sorry, I have to disagree with you on ReiserFS being of fundamental importance to those doing serious database work.

      What will be of fundamental importance will be when Stephen Tweedie's Raw Device Support [lwn.net] gets integrated into the "production" kernels. That is what Oracle is looking for (consider: Oracle has pumped some funds into RHAT, and RHAT is paying Stephen Tweedie... Could there be some connection?)

    2. Journalling IS important for sorts of applications that manipulate lots of files, which includes things like dynamic web serving and file serving.

      Even if this isn't such a boon to those doing serious RDBMS work, it can still be a boon to lots of other folks...

  • by Christopher B. Brown ( 1267 ) <cbbrowne@gmail.com> on Saturday November 06, 1999 @08:50PM (#1555673) Homepage
    I've got a filesystem that has been using ReiserFS for probably 6-8 months now, and Hans has been working on it since at least July 1997.

    "Who was first" isn't all that important; it should be noted that there is considerable communication between the development groups, and there are conscious efforts ongoing to make sure they build facilities that will be useful across the board:

    • The ReiserFS folks have been doing BTree "stuff," and intend to provide some code that should be usable by anyone wanting to do B-Trees at the kernel level, whether that be with ReiserFS, ext3, "ext4," or (and this has been explicitly mentioned) SGI's XFS.
    • Considerable discussion has taken place in trying to coordinate needed modifications to kernel code in terms of:
      • VFS
      • Buffer management
      • Cache management
      It often enough turns out that what one group needs another finds that they also need.
  • SGI is still working on it.

    You haven't seen a release; based on the discussions at ALS involving the developers, it would be surprising to see a "beta" before the end of 1999.

    A "beta" is not production code, and doesn't include integration into the "regular" kernel. I would be entirely unsurprised to hear that this hasn't yet occurred by the middle of next year.

    will it replace ext2fs?
    Not likely any time soon...
  • I've got a partition that has been running ReiserFS for quite some time now.

    As for the possibility of forking, that was intended as a way of raising funding to support the free version. Now that SuSE is funding ReiserFS, it is rather less likely that Hans Reiser will be feeling the need to bang on Sun's door looking for money.

    The hype may have been about XFS, but note that no code for XFS has been publicly released. And note that ReiserFS has been under active development since at least July 1997, which means that while silly people that watch fads may have been off hyping XFS, ReiserFS is hardly new and hardly surprising.

    Note, all of these developments in filesystems move us towards having a choice of filesystems, and the ability to tune systems for one kind of behaviour or another. None are likely to supplant ext2 for our root partitions any time soon, in much the same way that commercial UNIXes' "advanced" filesystems have not largely supplanted "traditional UFS" for root partitions.

    Plus ça change, plus c'est la même chose.

  • The critical bottleneck resulting in the 2GB limit is that of the VFS layer that sits in between the kernel and filesystems.

    That bottleneck is not resolved by changes to filesystem functionality.

    This means that ReiserFS does not fix the problem; this means that XFS does not fix the problem.

    At present, your choices for resolving the 2GB file size limit are two:

    • Use the LFS API that SAS has promoted for allowing 32 bit UNIXes to support 64 bit file sizes when applications are recoded to use the LFS API.
    • Run a 64 bit architecture such as Alpha or UltraSPARC.
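    For a sense of scale, the sketch below works out the largest offset a signed integer of a given width can hold. It rests on the assumption that the 2GB ceiling comes from file offsets being carried in a signed 32-bit value on 32-bit systems; the function name `max_file_offset` is made up for this example.

```python
def max_file_offset(bits):
    """Largest byte offset representable in a signed integer of the given width."""
    return 2 ** (bits - 1) - 1


for bits in (32, 64):
    print(f"{bits}-bit signed offset: up to {max_file_offset(bits):,} bytes")
```

    The 32-bit figure is 2,147,483,647 bytes, so the 1,958,135,327-byte xena1.mov mentioned earlier in the thread sits just under the limit.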
