Linux Software

Benchmark Madness

Guillem Cantallops Ramis writes: "In Bulma (Balearic Islands LUG) we've done real benchmarks this time: Mongo benchmarks (designed by Hans Reiser and used to test ReiserFS, slightly modified by me to support XFS and JFS), kernel compilation benchmarks, and a small "database simulation" benchmark. You'll find everything in English this time, with benchmark results and interesting comments by Dr. Ricardo Galli (Universitat de les Illes Balears, UIB). Have fun... and switch to a journaled filesystem now!" The previous article was here.
  • Soft Updates are not acceptable if you need enterprise-class reliability. Soft Updates do not remove the need to fsck; they just allow you to do it in the background.

    Unfortunately, while this is fine for your firewall and your little web-browsing "workstation," a real server cannot operate effectively with the majority of its disk bandwidth being sucked up by the background fsck.
  • by Anonymous Coward
    Benchmarks breed inappropriate competition. Some remember that back in the old days of Unix, some vendors placed code segments of the kernel API into userspace just to get better results, because the benchmark ran in user space. Technically it was deemed inappropriate, but hey, they won!
    Publishing such benchmarks accumulates negative mojo; it creates negative feedback for the authors of code that is free. Publicity stunts like that are no better than Mindcraft's.
  • by Anonymous Coward
    It would be verbose, slow, and constantly be upgrading itself to use the latest buzzwords.

    It would call itself a journaling filesystem, but everyone knows it's just a hack.
  • by jandrese ( 485 ) <kensama@vt.edu> on Wednesday May 23, 2001 @01:03PM (#202656) Homepage Journal
    One advantage that ReiserFS and XFS are supposed to hold over ext2fs and other UFS-based filesystems is the directory lookup time on directories with moderate to moderately large numbers of files (1 million to 10 million or so). Does anybody know of any benchmarks available on the net that can back up this claim? If you want to test it yourself, you can look into Postmark, which is easy to compile and simulates a heavily loaded mail or news server.
    Unfortunately the primary site appears to be down (I just downloaded the file a couple of days ago!), but if it comes back, the primary distribution site is: http://www.netapp.com/ftp/postmark-1_13.c [netapp.com]
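
    In the meantime, a crude stand-in sketch (this is not Postmark, just a quick large-directory test of my own; the file count is arbitrary, tune it to taste):

    mkdir lookup-test && cd lookup-test
    for i in $(seq 1 100000); do touch "f$i"; done   # populate one big directory
    time ls -l f99999                                # time a single name lookup
    cd .. && time rm -rf lookup-test                 # the deletion pass is a bonus benchmark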


    Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
  • Uh, journaling _is_ a critical feature in any modern file system. The reason NTFS (for example) doesn't perform as fast as you'd expect is that it does journaling, which slows it down considerably. Even so, you can tune it to perform almost as well as ext2 for some purposes.

    And yes, you do need a new filesystem for journaling; I'd like to see someone retrofit this into ext2 or FAT32. This is something that requires a new foundation, a complete redesign.

  • As for the "lower cost (complexity)", I doubt that severely. Journalling isn't all that complex, either.

    Journaling may not be that complex, but XFS sure is. The code size the Usenix paper quotes was over half the size of the BSD kernel I was using at the time.

    XFS does give a number of other things: hashed B-trees for file name lookups, multi-volume filesystems (with draining), GRIO (not in the GPLed version though).

    I don't know about the size of ReiserFS, but it too offers more than just journaling; it has some size and speed performance bonuses on small files.

    FFS's soft updates changes were fairly small, but not simple. The second round of them, adding filesystem checkpointing, is also pretty small, but again makes things harder to compare.

    I like FFS + soft updates a whole lot. I think someone ought to port it to Linux. However, if I had to run a Usenet news system, XFS would look very interesting, since it will do name lookups in huge directories much faster than FFS.

  • Um... b-trees are always balanced, it's one of their most basic properties.

    Yes, but because they force balancing work to be done on (some) inserts and deletes. Sometimes a lot of work. And that can require locking and blocking on those operations. They are still good data structures in many cases, but one must remember that they get fast search times at the expense of possibly slow insert/delete times (though not as slow as an insert on an unbalanced binary tree can be, since that degrades to linear time).

  • One caveat to using journaling filesystems on RAID5 disk systems is that RAID5 is particularly slow on writes, while journaling filesystems WRITE TO DISK TWICE; once to the journal and again to the filesystem metadata.

    I'd like to see these benchmarks on RAID5 systems.

    Further, I'd like to see these benchmarks on data sets bigger than memory. I think the reason they had dramatically different results on the first run versus subsequent runs is that a lot of inode data got cached in the dcache. This is also probably why a lot of the benchmarks were so similar.

    Basically, these were poor benchmarks on unrealistic systems. Single-disk systems are mostly used on workstations and home systems. Journaling is mostly advertised as an enterprise-level feature, and enterprise-level machines use SMP and RAID5.
  • I agree that the level of FS knowledge among the Slashdot dweebs (aka people with user IDs over 100,000) is low, but you are not completely correct.

    You can journal metadata and data with journaling filesystems. Most just journal the metadata. However, the Ext3 developers found that, due to their design, they could journal both metadata and data. So that is a counterexample to your assertion.

    Second, B-trees are good in general, but they have serious design conflicts with continuously and simultaneously accessed filesystem-type environments. To make this more concrete, think about keeping B-trees balanced in the middle of constant use. Further, consider the difficulties of shrinking B-trees while multiple processes are using a directory and keeping it consistent with the dcache. These are solvable problems, but they exact a price in complexity and unexpected stalls for processes doing work.
  • The journaling capability isn't just about improving fsck times. Check out Intermezzo [inter-mezzo.org]. The company that's working on Intermezzo (Mountainview Data [mountainviewdata.com]) has some other cool-sounding products that take advantage of some tricks only possible with journals (i.e., taking snapshots of filesystems)... looks like mostly vaporware right now, though. I'm sure there are some other applications, but I'm not imaginative enough to see 'em. :-)
  • by gid ( 5195 ) on Wednesday May 23, 2001 @12:52PM (#202664) Homepage
    Kernel compile is real world. Hey, I compile stuff all the time. Besides, it gives comparisons between the performances of 3 different filesystems.

    It's comparing how well these different filesystems can compile a kernel. Do these tests tell you how well ext2, reiserfs, and xfs will be able to serve up dynamic PHP/MySQL content on an Apache server? Hell no. Use Apache's benchmark stuff for that, on a controlled network, with controlled machines, rebooting between tests to eliminate caching, etc. But that's another can of worms. Please don't try reading more into the results than what's given to you.

    ---

  • Kernel compile is real world. Hey, I compile stuff all the time.

    I'd go further than that - for me, compiling stuff is really the only performance benchmark I'm interested in - everything else happens "fast enough" these days. If reformatting my /home partition as ReiserFS is going to increase performance for kernel compiles significantly, I'll seriously consider converting.

    Go you big red fire engine!

  • The POINT was that transactions do for database consistency what journalling does for filesystem consistency. So sorry if that was so unclear that "my eyes are turning brown".
  • The only advantage the new FSes hold is probably their journaling capability

    Probably?

    Sorry, but "the good old Ext2" has left me crying for lost files more times than I want to count. I can abuse, for example, Solaris systems pretty well, and I have to work hard to get serious filesystem corruption. By comparison, for a long time it seemed every other time I just did an "init 6" on my SuSE box with Ext2 filesystems, I lost another couple of files, until it got to the point where something to do with Qt was gone and the window system wouldn't come up. Greaaat.

    "Only" journalling capability is akin to complaining about Oracle's "only" advantage over MySQL etc. being rollback/atomicity/transaction consistency. Gee, what a "tiny" thing.

  • So now the "real world" is defined as random? Cool... What makes compiling the kernel not a real-world test? If you look at the bandwidth meter on www.kernel.org whenever there is a release, the bandwidth utilization would lead me to believe that quite a few people in the real world compile the kernel. Aside from that, I am sure there are at least a couple of "real world" C/C++ development servers out there running Linux. Since when is compiling code not a good benchmark for that type of box? I am sorry, but I think you might want to reconsider your view of the real world.
  • by Mullen ( 14656 ) on Wednesday May 23, 2001 @01:13PM (#202669)
    The only advantage the new FSes hold is probably their journaling capability, leading to faster fscks, faster bootups and less risk of data loss.

    Yah, those worthless little things; faster recovery, preserving data and faster boot-ups.


    --

  • Uh, I want to have whatever it is you're smoking. Journaling doesn't protect your data. The risk of data loss is about the same as with a non-journaling FS. The only thing that is protected by journaling is metadata, i.e. your FS structure. The only way to protect your data is via mirrored RAID.
  • > what did you think I was talking about?

    You were talking about, and I quote:

    >> Replace [snip]"less risk of data loss" with "no risk of data loss"
  • I have been using ReiserFS for over a year and I have noticed two problems:
    a) NFS (kernel NFS) exported partitions stopped working after I switched to kernel 2.4.2. Everything is OK until I mount the NFS-exported partition and start reading big files from it. To recover I have to reboot the computer (the one with reiserfs exported via kernel-mode NFS),
    b) The 2.4.2 kernel with reiserfs enabled creates partitions that do not support big (>2.1GB) files.

    P.S. Please e-mail me if you know the solution to those problems.

  • I haven't had any problems installing XFS. You probably went about it a different way than I did - I grabbed the XFS ISO image from SGI, then proceeded to do a new installation. I've been running redhat 7.1 w/ xfs since it came out, and haven't had any problems. The ISO images can be grabbed from:

    ftp://oss.sgi.com/projects/xfs/download/Release-1.0/iso [sgi.com]

    This isn't an option for those not running redhat, or those who don't want to bother with reinstalling, but using this method of installing XFS is extremely simple.
  • Looking back at the old benchmark on a slow CPU with JFS, the reading of 1398 100k files took just under a second, 200 times faster than the next "competitor".

    That is a transfer rate of 140 Mbyte/s, some 20 times faster than my current 5200rpm Maxtor and Quantum drives.

    What is that?
    A typo?
    An effect of caching? Why only on JFS?

  • Over 80% of file-manipulation time is taken by reading. Of the rest, 80% is taken by writing.
    Deleting, stats, or symlinks take less than 4% of the time.

    What is very suspicious is that copying, which consists of reading and writing, takes half the reading time. Does that mean that if I copy a loaded binary to a dummy disk area, I'll speed application start-up twofold?

    The annoying delays are caused by reading larger files, like images, a 10 Mbyte mailbox, or a large binary. Why do XFS and ReiserFS have to be 24% and 9% slower than ext2? Reading does not involve any journaling action! B-tree delays? Worse caching? Where is the problem?!

    The 10% on reading makes a way bigger impact than ten-times-slower deleting.

    Can we make XFS and ReiserFS read as fast as ext2, or faster?
  • Well, everyone is entitled to their own opinion and everything, but I tried getting XFS to work, and it was like pulling teeth. Once it was on, it acted... quirky... System performance was the same in some cases, but when I had to pull out deleted package trees on it, oh momma did it ever suck (no flame intended). IMHO, Reiser is the better-balanced, optimal all-around choice.

    That's just my $.02

    JoeLinux
  • OK, you all remember the Mindcraft stuff, right? As in, it wasn't impartial, or all that independent, etc. This is an obvious joke, and supposed to be funny, not interesting!
  • I've had no problems running XFS on my main x86 box and my Alpha, although ACLs don't work on Alpha yet.

    I think reiserfs is still too young; they've already changed the on-disk format between 3.5.X and 3.6.X, and now they plan on redoing the tail packing. I'm all for rapid development, but I'd rather keep my data on something that at least has stabilized blueprints.
    --
  • I believe that's only a reiserfs problem.

    From the XFS FAQ:

    Q: Can I run XFS on top of LVM?

    Yes, XFS should run fine on top of LVM. If you plan to do so, please keep in mind that the current XFS CVS tree (and also the XFS 1.0 previews) already contains the LVM beta6 code and some tweaks for XFS - currently it is recommended to stay with this version instead of the current LVM version (beta7 at the moment) because there are some logistical problems around that version. Martin K. Petersen is keeping an eye on the LVM in the XFS tree, and I think it will be updated as soon as there is a recommended new version. Keep one thing in mind with XFS on LVM: mounting snapshots of LVs containing XFS filesystems does not work, due to the journaling nature of the filesystem (but a fix for this is on the TODO list).

    Q: Does XFS run with the linux software RAID md driver?

    Yes - the current XFS tree and the latest released previews contain fixes which should allow you to run XFS on top of any md RAID level.

    And you can also use XFS safely on NFS exports, something reiserfs was still fighting with, and which required a separate patch to work properly last I checked.
    --

  • Make your filesystem code as fast as you want; you still have to deal with the speed of the physical device. Aside from "cache the whole disk in RAM," isn't there a limit to how fast a FS can get, because of the drive? Aren't we really just trying to find out who can write the best drive caching algorithm?
  • by Spoons ( 26950 ) on Wednesday May 23, 2001 @12:42PM (#202681) Homepage

    Because the availability of the data in the Linux cache may affect the time measured for cp -a, I repeated the command a couple of times before doing the real measurements (there was a huge variance with the first time).

    I don't know that much about benchmarking, but this statement seems a little off. If there is data in the cache, then the disk is not being tested on the reads. It seems like the first run is by far the most accurate, because no data is in memory. However, you have to ensure that no data is in memory for all tests. This seems fairly easy to reproduce by rebooting and performing the same set of commands at startup to run the tests.
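
    For what it's worth, a sketch of how I'd force cold-cache runs without rebooting (this assumes a kernel new enough to have the /proc/sys/vm/drop_caches knob; on anything older, rebooting between runs is the only sure way):

    # run as root
    sync                                        # flush dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches           # drop page cache, dentries and inodes
    time cp -a /usr/src/linux /mnt/test/linux   # this run is now genuinely cold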

    ---
  • There are two kinds of write caching: within the kernel, and within the drive. Only the latter needs to be disabled to ensure proper functioning of a JFS.

    In your example, the kernel will let you believe it wrote the 500mb to the single drive, without actually sending this data to the drive. That is ok. Now when you type 'sync' it will really write the 500mb out and then the kernel will think the 500mb is physically on the magnetic platter. But that is *wrong* when write caching is enabled within the disk!

    Of course a disk will usually not have much more than a meg or so of cache.
  • Sorry, I should have read the article you responded to. It seems that _that_ guy was confused by the two types of write caching (calling the kernel write caching dangerous, which it is not... while the disk write caching can in fact defeat the purpose of a JFS).
  • Yep - but then Oracle is the most expensive RDBMS around and also has the worst licence ("pay per MHz"). I'd take DB2, Sybase or even (gasp) SQL Server (even if that means running on NT) any day over Oracle.
  • Did you read the overview in the link? If an EXT3 fs has been cleanly unmounted (or fscked) it can even be remounted as EXT2 again.
    And is it still journalling then? If not, it is no longer a journalling filesystem when you do that.

    Certainly the parts of EXT3 that EXT2 uses might be compatible.
  • A good "real world" test would be putting it on a database server that churns data with disks at near-full capacity (perhaps 24/7) - but then, how would you measure the performance? With a "real world" script? The problem with benchmarks...

    Well, that's the real-world database test. On the other hand, there's the real-world rgrep test, the real-world cp -a test, the INN nightly expire-and-so-on maintenance test (which I suspect would blow away ext2 due to the number of files/directories processed), et cetera ad nauseam. All of whose results are going to depend on things like the percentage of the disk used, and probably other things not imagined before actual measurements of their effects. I wouldn't expect the effect to stay the same between different versions of the same database engine, either.

    But you should include a few simulated power failures and fscks in there, if you are intent ("24x7") on modelling the real world, shouldn't you? And maybe the time to repair or restore a backup of the database if it proves necessary after those power-downs.
  • by Velox_SwiftFox ( 57902 ) on Wednesday May 23, 2001 @04:34PM (#202687)
    Oops, it's not yet in the standard kernel. Oops, the patches don't apply against the latest, most debugged versions of the kernel. No, it's not time to switch now. I'll stay with stable and standard functions, thanks.

    Let me check menuconfig on what I got from kernel.org... okay, Linux Kernel v2.4.4 Configuration: File systems ... Quota support ... Kernel automounter support ... Kernel automounter version 4 support (also supports v3) ... Reiserfs support.

    It is in the latest stable kernel.
  • Good god, learn how a journaled FS works! Journaled filesystems only guarantee the consistency of metadata, not user data. Thus, a well-designed journaling FS is just as fast as a non-journaling one. Writes to the journal only come into play when metadata is being modified, say when creating or deleting a file. And even then, all it involves is a couple of extra disk writes, which are coalesced into a sequential stream of writes in most systems. The actual performance hit of journaling is pretty negligible when it comes down to it. For example, the non-indexing version of BeOS performs on par with ext2 for metadata updates. Lastly, using B+trees isn't a "trick", it's just plain good design. The same goes for allocation groups, storing extents in trees, etc. For example, while ext2 goes all over the disk to find all of the double-indirect nodes for a large file, XFS simply does a tree lookup in (usually) one block.
  • Yea, things look pretty bleak. BeOS isn't officially dead, but nothing much is going on. eVilla got some attention recently, though.

    As for OSS development, it would be the best possible thing for BeOS. Even if Be has to strip out all the commercial code and release a non-working version of the source, great technology shouldn't be allowed to die. Something à la the BSD core team, applied to the OS in its entirety (including userspace), would be great.
  • A kernel compile is a real World test. But definitely NOT a good choice for testing fs performance. This is why there was only 2% difference between them on that test.

    Real World tests, need to be chosen so that they put a strain on the part of the system that is being scrutinized. Choosing a test that barely uses the part in question is ridiculous.

    Choosing a "real World" test that traditionally strains nothing more than a particular part of the system (CPU) of which is not the part being scrutinized, is actually a very bad choice of test. Then going on to call this "real World" pertaining to the tested part is just silly.

    This is kind of like getting an ordinary car, putting much wider slick tyres on it, and then only driving in a straight line at street legal speeds and then saying they don't appear to be giving much improvement in handling. Then someone coming along and saying, "it is a real World test because they drove within speed limits".

    You want to test grip, test on a skid pad. Engine, then test on a dyno. Drag coefficient, test in a wind tunnel.

    Inevitably, you have some guys saying that these are not good tests, because they are not "real World". So they'll go out to the drag strip, some 200kW 4 cylinder 4WD will blow the doors off some old Ford hotrod that weighs twice as much and that is putting 400kW onto the ground at the rear wheels.

    Sure, the driver of the Ford smoked up the rear wheels too much, trying to push a lot more car towards the finish, but this doesn't mean that the Ford had a less powerful engine. It means the 4WD had better grip, perhaps less weight and more humanly manageable power. But what if all we were interested in at the moment was engine power? Should we base our opinion of the comparative engine power on this drag race? Hell no.

    Real World tests are only good for proving that small localized tweaks (usually done with "artificial" tests) brought about improvements at the end of the day which were practical for the task at hand, thus real World. But trying to get these tweaks done through a single "real World" test introduces too much requirement for human interpretation of the results, and thus error, and limits on performance gains. If you use an artificial test for disk I/O then tweak accordingly, network then tweak, core application then tweak, OS kernel then tweak, etc., you will have fine-grained control over what is lacking where, and then at the end of it all you can prove the performance gain with your real World test compared with the same test done before the changes. Code profilers are great for application optimization; good luck finding those limitations through a generic "real World" test.

    These "real World tests" should never be used to test seperate components of systems unless they are carefully chosen to tax that component to stress levels. make bzImage was not carefully chosen to test file system performance. It is a choice that shows extreme ignorance, it would however be an excellent choice for testing CPU speed, as would Ray Tracing.

    To test file system performance, cp, mv, rm -rf, and maybe a script that creates files from /dev/zero, would be good choices (a minimal sketch follows at the end of this comment).

    To test disk performance and perhaps RAID performance (whether hardware or software), then some tests like hdparm -Tt /dev/md0 or timing a dd if=/dev/md0 of=/dev/null might be good choices.

    But testing a kernel compile and then saying "dah, this here AMI MegaRAID controller with 6x striped 15,000rpm SCSI 160 drives is only 2% quicker than this here 10-year-old 200MB Conner IDE drive" is laughable.

    They don't know what they are doing, plain and simple. Sorry. I would like to see a decent comparison of Ext2, Ext3, ReiserFS, XFS, JFS, FFS+Softupdates, BeOS FS (if possible), QNX4 FS and the FS that Solaris and UnixWare use (FFS?), along with any other notable contenders.
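
    As promised, a minimal sketch along those lines (file counts and sizes are arbitrary; tune them so the working set exceeds RAM, or the cache will hide the filesystem):

    mkdir -p src
    for i in $(seq 1 10000); do
        dd if=/dev/zero of=src/f$i bs=4k count=1 2>/dev/null   # 10,000 small files
    done
    time cp -a src dst      # read + create workload
    time rm -rf dst src     # delete workload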

  • by Baki ( 72515 ) on Wednesday May 23, 2001 @01:14PM (#202691)
    At lower cost (complexity), using a plain old filesystem such as BSD's FFS - and it could be added to ext2fs too. Alas, Linux developers once more thought they had to reinvent the wheel instead.

    Softupdates can guarantee consistency in case of crashes, thus providing safe yet asynchronous-like performance (i.e. optimal performance).

    Details are explained on the website of the author [mckusick.com], Kirk McKusick. Also you can find a link there which leads to an interesting (technical) comparison of logging (aka journalling) versus softupdates.

    I wish someone would port softupdates to Linux (ext2fs). Or better yet, make BSD's FFS+softupdates a native Linux filesystem. It would surely outperform the other currently available filesystems. At least on my computer, when I benchmark FreeBSD+FFS+softupdates against Linux+ext2fs/reiserfs (on the same hardware, disk, and disk location), FFS+softupdates consistently wins hands down. I don't think this is because of FreeBSD's kernel, but rather because of softupdates (and FFS with its large block size combined with smaller fragments to avoid too much slack).

  • by selectspec ( 74651 ) on Wednesday May 23, 2001 @12:24PM (#202692)
    Benchmarks? Compiling the kernel to benchmark a filesystem? Hmmm. Sorry, but no. How about that "real-world" random write program? Give me a break, people. These are not valid benchmarks, which was clearly stated the first time this story was posted. Why /. seems to think it's interesting is beyond me.
  • He repeated it many times so the data would be in cache.

    This won't affect write times, because write caching is not enabled in Linux by default (it's VERY dangerous). So he was only limiting the effect of the hard drive/filesystem of the source disk on the performance measurement, leaving the test partition (reiserfs, xfs, ext2) the only bottleneck in the benchmark.
  • This isn't a troll, but I thought BeOS went kaput recently.

    If they don't release the source code for BeOS under a liberal license, I don't see the point in tying up your coding effort in a proprietary platform that seems to have a pretty bleak future. Maybe you could correct any of my assumptions if they are wrong? I don't know...
  • & just think soon MS will have everyone using NTFS! It at least is better than FAT (at least for most things), maybe it's not quite up their with some linux filesystems, but...
  • Here is a good article that may help explain differences. http://www.linuxgazette.com/issue55/florido.html [linuxgazette.com]
  • "Only" journalling capability is akin to complaining about Oracle's "only" advantage over MySQL etc. being rollback/atomicity/transaction consistency. Gee, what a "tiny" thing.

    Watch out man, your eyes are turning brown. Oracle has transactions, but that's not all. You forget a bunch of stuff (quotas, speed, scalability, fail-over, the dictionary, etc. etc.). Let's not compare apples to oranges. While I am a big MySQL fan, there are places where that type of software just doesn't cut it.

    The initial poster, the guy you were replying to, obviously doesn't have a clue about the value a journaling file system can bring. Make sure you don't lose your point by losing your credibility.

    Just for the record, I installed Linux today on a VA Linux 3500. 3 SCSI cards, 1 RAID controller. You know how long this beast takes to boot? A while! If you add to that the time to fsck 26 gigs of data, I'd kill myself. I agree with you, journaling is the shit!

  • .... Watch out man, your eyes are turning brown....

    No wonder nobody listens to me. My eyes have ALWAYS been brown. :-(



    All Your Base Are Belong To Us!!!
  • http://linux-xfs.sgi.com/projects/xfs/xfsroot.html describes how to make a Linux XFS Root from your existing disk.

  • Replace "faster fscks" with "instant fscks" and "less risk of data loss" with "no risk of data loss".

    I admittedly haven't used Linux's implementation of JFS, but a decent journalled filesystem can consistency-check the disk so fast it's ridiculous.

    Further, short of hardware failure, there is literally *no* way for a bug-free journalled FS to lose successfully written data. Period. Obviously you'll lose whatever didn't get fully journalled yet when the power failed (or whatever), but everything that actually got written to disk will be 100% reliable once the fsck completes. No lost clusters, no files hanging out in limbo. It's literally impossible.
  • You'll note that I mentioned hardware failure was still a problem. Hardware failure is the only thing RAID protects against, so obviously a journalled RAID filesystem is ideal.

    Of course it only protects the filesystem from being corrupted; what did you think I was talking about? That is, after all, the entire point of running an fsck in the first place.
  • *cough* EXT3 [redhat.com] *cough* TUX2 [lwn.net] *cough*

  • Did you read the overview in the link? If an EXT3 fs has been cleanly unmounted (or fscked) it can even be remounted as EXT2 again.

    And yes, I know TUX2 isn't a journaling fs strictly speaking, but it does help illustrate my point that a fs does not have to be totally rewritten to add that kind of functionality.
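
    For the curious, what that remount trick from the EXT3 overview looks like in practice (the device and mount point names here are made up):

    umount /dev/hda2               # a clean unmount leaves the ext3 journal empty
    mount -t ext2 /dev/hda2 /mnt   # the very same partition then mounts as plain ext2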

  • Apart from that, XFS is also written for extreme scalability, in parallelism as well as size of supported filesystems and files.
  • WTF? How is implementing journalling "reinventing the wheel" in comparison to implementing a newer technique that achieves the same thing? As for the "lower cost (complexity)", I doubt that severely. Journalling isn't all that complex, either.
  • think about keeping b-trees balanced in the middle of constant use.

    Um... b-trees are always balanced, it's one of their most basic properties.

  • OK, the worst case would probably be bad if you have some sort of realtime constraints, but other than that, it doesn't matter.

    As for the locking and blocking, that can be made very parallelism-friendly by precautionarily splitting/coalescing while you descend the tree, instead of having the changes propagate upwards.

  • Doesn't Ext2 use what basically amounts to a text file to hold the contents of a directory?

    Just try having 100,000 files in a directory under Ext2 vs. ReiserFS.
  • When XFS was initially released, I ran some very simple, for-my-purposes-only benchmarks of XFS and Reiser, using Postgres and the dbbench utility. I ran dbbench with 100 clients connecting, a few times with a fresh initdb on a Reiser partition, and with a fresh initdb on an XFS partition. I didn't save any numbers, but it was considerably faster on XFS. Run it for yourself and see :) (Note: this was without the use of the notail option for Reiser) - James
    signature smigmature
  • Ok. I have a Linux system. How do I make the switch? Are there any HOW-TOs or FAQs that address this?
  • Actually, I've read papers that make the opposite claim to the one you're making. Reading for the most part happens from disk cache, or from OS buffers of the disk, so you don't need to optimize reading as much.

    The BusLogic SCSI driver for Linux will show you what your disk accesses look like in /proc/scsi/BusLogic/n (where "n" is the number of the controller, 0 through the number of BusLogic controllers). On my machine, with an uptime currently at about one week, used for surfing and reading email, right now it says it has read 853,097,472 bytes and written 1,269,571,584 bytes. So it's definitely doing more writing than reading to the actual disks. At the OS level I can't tell if it's reading or writing more. If it _is_ reading more, then the buffers are doing a great job of hiding it from the physical disks. On my machine, at the FS level, optimizing writes looks like a win.
  • Admittedly they're not the same -- but that's why you do multiple *different* benchmarks :)
  • by da groundhog ( 137234 ) on Wednesday May 23, 2001 @12:37PM (#202713)
    OK, I can kinda see a kernel compile because of all the I/O, but whatever happened to good old find and grep? Certainly they could have come up with some better tests.

    find / | grep blah

    SEE, now that's a lot of I/O
  • A kernel compile is a real world situation, but it is NOT a real world performance test for a filesystem. When compiling, your CPU is under a lot of strain, but I dare say your filesystem isn't. Take a look at the results in the Kernel Compilation table in the article.

    ReiserFS (notail), which got the "green" for the fastest make bzImage, took 291.14 seconds realtime, of which 289.33 seconds was CPU time. So, on average, the CPU had a 99.4% load during this kernel compilation. Since a CPU is many orders of magnitude faster than a disk, one can assume that the disk was sitting idle for much of this time - hardly an intensive test of filesystem speed.

    A good "real world" test would be putting it on a database server that churns data with disks at near-full capacity (perhaps 24/7) - but then, how would you measure the performance? With a "real world" script? The problem with benchmarks...

  • Why is Linux just now getting viable journaling filesystems? Hasn't that been a part of commercially viable OSes for quite some time now? In the all-go, no-stop world of serious data crunching, one would think it absolutely necessary. Perhaps the lack of one of sufficient speed has kept Linux from completely taking over the world. Which, if Linux DID take over the OS market, and dominated with something like 99.9% market penetration (don't forget the fuck-ups running BSD :), who would the government bust for monopolistic non-competitive bullshit? Just wondering.
  • Umm, sorry dude, but Kirk McKusick recently added FFS snapshots to FreeBSD. I believe that -CURRENT is now able to do fsck-less boots, followed by a background fsck, but don't quote me on that 'coz I'm not sure how it could be possible.

    But if anyone can come up with something like that, it has gotta be McKusick =)

    .flip.
  • by InsaneGeek ( 175763 ) <slashdot@insanegeek s . com> on Wednesday May 23, 2001 @05:47PM (#202717) Homepage
    I beg to differ: write caching is normally ENABLED by default. If the drive were in full sync mode, your performance would be complete crap. Every time you did anything, you'd have to wait for the data to get destaged onto the drive (very painful).

    Do a "dd if=/dev/zero of=./file.test", wait a bit and break out of it (you could also do a rm -rf on a larger directory with lots of files). It will spit out how much it has supposedly written out to the drive, then quickly do a sync, you'll have to wait a bit while the data gets out of cache. The more memory you have the more pronounced this becomes, the sync destages the data to the drive from write cache. My box with 2gig of ram "supposedly" wrote ~500mb out to a single drive in just about 5 seconds... until I typed sync.
  • He he, funny joke! I get it.

    I think you're merely joking, trying to pull people's chains, myself, but just in case you're not, or in case somebody out there thinks this "obvious" joke mail makes a good point (and it most certainly does not), I'll still reply, assuming you were actually serious. :-)

    You answered your own question and didn't listen to yourself:
    'The only advantage the new FSes hold is probably their journaling capability, leading to faster fscks, faster bootups and less risk of data loss.'

    Those things you listed ARE stated goals of JFS's in general; performance never was, and most people are right when they don't care if it's *half* or a *third* the speed. In fact, they expect it. Want a faster file system? Get an UltraSCSI-III drive at 10,000 RPM. That'll give you speed -and- Reiser-FS and XFS will pull further ahead of ext2, due to command tag queuing during journal writes!

    Performance has always rightly been considered a trade-off for reliability, which should nearly always be the case in any application, anywhere, anytime. Would you make this argument if we were talking about OSes or apps, instead of file systems? Of course you (and most people) wouldn't, or Linux's (and BSD's/Unix's) reliability factor would never be brought up. (File system reliability is FAR more important than OS and app reliability combined).

    In the cases of XFS and Reiser-FS, they're actually usually slightly faster, instead of being slower, which makes me enthused, not disappointed.

    Perhaps:

    This is a troll or is flamebait, and it probably is.
    -or-
    You haven't read about the future features of Reiser-FS and XFS. (Just go to the web sites and read about the *current* feature improvements of either, compared to ext2 -- it'll blow you away).
    -or-
    You haven't been the sysadmin of large shops that use both journaled and non-journaled file systems and just wouldn't understand.... Try recovering from a 100G non-JFS'ed system at 4 a.m. some time and you'll immediately change your mind. Turn off the power to a box like that -- I dare you. I triple dog dare you to do that with 100,000 small files on the box, with 500 of them opened for write!
    -or-
    You don't realize that getting point #1 at the top of this email usually requires you to have slower file systems, so the fact that they're "just" on par with ext2 (actually a little faster) is *EXTREMELY ADVANTAGEOUS!!!* (OK, I changed my mind, there is a use for the BLINK tag after all!)
    -or-
    You haven't seen the EXTREME space savings on small files with Reiser-FS.
    -or-
    You don't have crash-recovery/data-recovery experience.

    Don't *ever* run a production, mission-critical, or otherwise "important" Linux box without RAID 0+1, 1, or 5; a good backup/recovery strategy; well-conditioned power; redundant networking, assuming you need it; and a journaled file system. For programmers, add hardcopies and CVS....

    "Where's the advantage? Where's the progress? The benchmarks only leave me disappointed." You didn't even listen to your own extended Q+A, so I bet you won't listen to us, either. Now that's truly disappointing. (Of course, since you're joking in the first place, I'm actually amused....)

    OK, once again, if the speed is slightly faster (overall), your files take up FAR less space in Reiser-FS, there are a whole slew of new features coming, it's 64-bit/large file/large directory ready (and ext2 isn't even close to ANY of that), it still works with LVM, AND you get those journaling advantages. Oh, by the way, it's also free in both senses of the word, and you're wondering about the advantages?!?!? He he.

    I used to believe in "fast at all costs" until my first programming job, when an experienced programmer (I knew it all, right out of college, BTW) explained to me that CPU cycles are cheap, but people's time never is. If you waste a week recovering a system, you'll NEVER, EVER get that back, even with a file system *10* times as fast.

    I bet you were just joking, right? :-) I bet this was just a troll spoofing speed vs. reliability, just like the overclocker trolls that run around SlashDot! :-)

    The only reason I even dignified this joke with a reply was that somebody less acutely aware of the joke might actually be fooled into this line of reasoning and question switching from ext2 to Reiser-FS today. Switch right now, before you run another app.

    (Not to start a distribution war, but Mandrake 8.0 with Reiser-FS will actually prove to be faster all around than most {all other?} distros out there. They actually compile everything with Pentium optimizations *and* install on Reiser-FS *and* install XFree86 4.0.3 *and* compile apps with their own optimizations enabled *and* tweak the kernel, Apache, MySQL, PHP, Perl, etc. If you want faster *and* reliable, there's your distro)....

    "(Slightly Faster + Journaled + MANY Other Advantages + More Disk Space + No 4 a.m. Wakeup Calls) == Disappointment"

    He he, now that's a good punchline! :-)

    Anybody who believes in speed at the cost of reliability is an obvious newbie.

  • A database really ought to use fdatasync, not fsync, if fdatasync is implemented. fdatasync won't flush metadata such as the file timestamps, thus requiring only a single write operation instead of two.

    Of course, since all the "benchmark" tests used fsync, it kinda washes out, since the same operation is common to all.
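
    A rough way to see the difference from the shell (this assumes a GNU dd recent enough to support the conv=fsync/fdatasync flags; a real database would call the syscalls directly):

    time dd if=/dev/zero of=test.dat bs=8k count=10000 conv=fsync      # fsync(2) at the end: data plus metadata
    time dd if=/dev/zero of=test.dat bs=8k count=10000 conv=fdatasync  # fdatasync(2) at the end: data only
    rm -f test.dat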
  • Does anyone know how ext3 is supposed to compare with these other fs?
  • High disk utilization is not the same as high I/O. There's no disk "O" in grep or find. That's not going to prove much as far as the journaling overhead goes.
  • by Rosco P. Coltrane ( 209368 ) on Wednesday May 23, 2001 @12:30PM (#202722)
    Since the impartial independent lab test [zdnet.com] from Mindcraft, Inc., everybody knows that Microsoft Windows NT server 4.0 is 2.5 times faster than Linux as a file server and 3.7 times faster as a webserver.

    Who does Mr. Galli think he's fooling? I mean, come on...

    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash

  • By default, reiserfs stores small files and `file tails' directly into the tree. This confuses some utilities like LILO. The notail option is used to disable packing of files into the tree.

    For more information on reiserfs mount options, see http://www.reiserfs.com/mount-options.html [reiserfs.com]
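
    For reference, roughly what using notail looks like (the device name and mount point here are made up):

    mount -t reiserfs -o notail /dev/hda3 /mnt/data   # one-off mount
    # or permanently, via a line in /etc/fstab:
    # /dev/hda3   /mnt/data   reiserfs   notail   0 2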

  • is that there is a Linux users group in the Balearic Islands (Bulma).
    It would appear that Linux truly can be found just about anywhere.

    I'm just waiting for Amnesia in Ibiza to replace its DJs with xmms and a kickin' sound system ;)
  • Pod writes: The reason NTFS (for example) doesn't perform as fast as you'd expect is because it does journaling, which slows it down very much.

    Are you that sure that journaling is the only reason NTFS performs poorly? What about NTFS's extreme tendency towards fragmentation [microsoft.com]? What about MFT fragmentation [microsoft.com]?

    It's surprising that a Microsoft Shill would actually admit that NTFS is slower than we'd all like.

  • Barzok writes: Aside from "cache the whole disk in RAM," isn't there a limit to how fast a FS can get, because of the drive?

    Log structured filesystems [harvard.edu] are supposed to achieve nearly the disk write speed [hhhh.org]. At least naively, one would also think they'd get better speed than "journalled" filesystems, because the filesystem itself constitutes a journal.

    Some folks seem to think that clustered writes [harvard.edu] give you nearly the same performance, though.

  • by wrinkledshirt ( 228541 ) on Wednesday May 23, 2001 @12:44PM (#202727) Homepage
    ...and a small "database simulation" benchmark.

    Wow. Does this mean it can handle SQL Server? That's always been my favourite database simulator.

  • I believe that he's referring to the random.c test that was supposed to mimic a real database situation. In that case, the ReiserFS looks pretty bad, being the slowest in both CPU and Real...
  • Journaling IS important, and reason enough. ReiserFS and XFS were not designed for speed but for far better data integrity. It is very impressive that filesystems that have to do so much more are on par with the performance of ext2.
  • The numbers listed for XFS make it seem terribly slow in almost every category compared with either ReiserFS or Ext2, yet the writeup seems to conclude that XFS and Reiser offer comparable performance. What gives? Did something get lost in translation?
  • I haven't heard that claim for a while *cough*
  • more like BLUBBER32, eugh, checkdisk...checkdisk...checkdisk...checkdisk... *shudder*
  • Does anybody know of any benchmarks available on the net that can backup this claim?

    Try this [namesys.com]... Not very impartial I suppose :-) But benchmarks all the same.
    --

  • ReiserFS [namesys.com] (and I believe JFS and XFS) use b-trees for practically everything. That means, for starters, that finding files in heavily populated directories is dramatically faster (logarithmic rather than linear search time).

    See RFS Features [namesys.com]
    --

  • Does anyone know how Journaling on software RAID is coming around? As I recall, the last time I checked it was a no-go.
  • and most of the world is using FAT.
    Excuse me while I cry.
  • Other "only" advantages of Oracle over MySQL include triggers, stored procedures, sensible handling of validation rules, better handling of multiple data spaces and better performance on large-scale OLAP environments... :-)
  • Excuse me, but I think you're missing a point here: I'm not complaining because these people say something against a post of mine, but because they're all saying the same thing. I don't know about your opinion, but I wouldn't classify that as being rhetorically valuable. Imagine the following discussion:

    A: Question 1
    B: Reply 1
    C: Reply 2
    D: Reply 1
    E: Reply 1
    F: Reply 1

    Not very elegant, is it? Three absolutely redundant replies, just because D, E and F thought that they'd get some Karma out of saying something. Whoa.

    That has nothing to do with me being a luddite; I mean, I wouldn't post a question if I wasn't prepared for the answer.

  • What is your point, in reference to the discussion at hand, which is journalling?

    None. I was referring to the rhetorical quality of the discussion as such. Sort of a meta-remark. Since meta-discussions are not a canonical part of Slashdot discussions [slashdot.org], however, I realize that my post was probably offtopic; however, by specifying "OT" in the subject line, I had thought this would have been sufficiently stated.

  • by absurd_spork ( 454513 ) on Wednesday May 23, 2001 @12:23PM (#202741) Homepage
    It's really amazing that all these new filesystems that people have invested tremendous amounts of work into are not really significantly faster than the good old Ext2; ReiserFS is better in some disciplines, admittedly, but in others Ext2 is best.

    The only advantage the new FSes hold is probably their journaling capability, leading to faster fscks, faster bootups and less risk of data loss. Did we really need a new set of filesystems for that? ( BSD Soft Updates [mckusick.com] show that the whole speed and reliability advantage can be had with old filesystems as well!) Where's the advantage? Where's the progress? The benchmarks only leave me disappointed.
