Ask Slashdot: How Reliable are Enormous Filesystems in Linux? 145

Josh Beck submitted this interesting question: "Hello. We're currently using a Mylex PG card and a pile of disks to run a 120 GB RAID5 under Linux. After some minor firmware issues with the Mylex (which their tech support acknowledged and fixed right away!), we've got a very stable filesystem with a good amount of storage. My question, though, is how far will Linux and e2fs go before something breaks? Is anyone currently using e2fs and Linux to run a 500+ GB filesystem?"
Josh continues... "I have plenty of faith in Linux (over half our servers are Linux, most of the rest are FreeBSD), but am concerned that few people have likely attempted to use such a large FS under Linux...the fact that our 120 GB FS takes something like 3 minutes to mount is a bit curious as well, but hey, how often do you reboot a Linux box?"
  • by Anonymous Coward

    When will we (do we) have support for files larger than 4 gigabytes?
  • by Anonymous Coward
    There are a few hacks that allow support for files over 4G. You won't see files over 4G supported natively until we move to a 64-bit OS (at which time we will then have support for files up to 18 Exabytes - that is, 18,000,000,000,000,000,000 bytes).
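
    The 4G figure falls straight out of the width of the file offset; with a 64-bit offset the ceiling becomes 2^64 bytes. A quick sanity check of the arithmetic (Python used purely as a calculator here):

    ```python
    # Largest file addressable through an unsigned 32-bit offset:
    print(2 ** 32)   # 4294967296 bytes = 4 GB
    # ...and through a 64-bit offset:
    print(2 ** 64)   # 18446744073709551616 bytes, the ~18 exabytes quoted above
    ```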
  • My experience with journalled file systems on commercial UNIX systems has been overall negative. They may require less (or no) fsck'ing, but you pay a heavy price for that in terms of performance.

    The only way journalling can work reasonably well is if you have a battery backed RAM to hold the journal.

  • I get just under 9 MB/s in linux with my Seagate Barracuda (it's a narrow drive, but narrow versus wide doesn't matter when you only have a few devices.)

    At any rate, 3 MB/s seems too slow, by a factor of 3 or so.

    - A.P.

    "One World, One Web, One Program" - Microsoft Promotional Ad

  • For a minute there I thought you were talking about the tape drives. Why anyone would want even one, let alone eighteen, of those drives connected to their computer had me baffled for a moment.

    - A.P. (Yes, I've had several bad experiences with Exabytes.)

    "One World, One Web, One Program" - Microsoft Promotional Ad

  • I can't tell you how well it worked, but I was watching VA set up at least one 100 GB+ FS for a customer who insisted on SW striping several nonsymmetrical disks together to form a 100 GB partition. It's doable, but not recommended (VA was recommending several saner alternatives but the customer wouldn't buy it). This is pretty much the best way to guarantee yourself problems down the road -- HW RAID 5 or RAID 1/0 is a much better alternative from a reliability standpoint. If anyone at VA is interested in commenting on successes/failures with very large filesystems....

    There are a number of well-known websites which utilize Linux, including Deja News [dejanews.com]. Not sure what kind of partition sizes they're using, but it would be fun to know.

    FWIW, you can modify the reserved % parameter using tune2fs rather than mke2fs and save scads of time. You can also force an fsck (man fsck) to time the operation if you want.

  • Ext2fs would take months to fsck that thing. :)

    There is a log-structured filesystem for linux called "dtfs" available at their home page. [tuwien.ac.at] The author tells me he will be shooting for inclusion in 2.3.0 and that the bulk of it is working just fine.
  • since we're in blatant commercialism mode (sigh).

    I prefer the ICP-Vortex GDT line of RAID controllers -- there's even a fibre channel model that works fine with Linux. Leonard Z is a great guy, but I like supporting vendors who support Linux -- ICP-Vortex wrote their Linux driver back in 1.3 days, supports it, and even all of their utilities run native under Linux (none of that bull about booting to DOS to configure the RAID array).

    Interesting thing about ext2: mounting a 120gb partition takes about 3 minutes if you mount it read/write, but it's almost instant if you mount it read-only. Apparently it has to pre-load all that meta-data only if you intend to write to the filesystem.

    e2fsck'ing that beast took over ten minutes (I don't know how much over 'cause I gave up). Formatting it in the first place took about five to eight minutes, so I aborted my e2fsck and reformatted the partition (this is while I was doing system setup and configuration, so there wasn't any data on it).

    We can go up to half a terabyte without going to an external cabinet, using a solid heavy-duty steel California PC Products case and CRU LVD hot-swap backplanes rather than that effete gee-whiz stuff that's flimsy and breaks easily. This is in a dual Xeon 450 configuration. LHS also has a quad Xeon setup that has the horsepower to break the terabyte mark (dual PCI busses, etc.); it's pretty much the same thing VA Research sells (after all, there aren't many providers of quad Xeon motherboards for system integrators: Intel and AMI). With commonly available 18gb drives this would require an external RAID cabinet or two. 36gb drives should be available shortly, and those will solve some of the space and heat problems (you'd better have a big server room for a terabyte of storage using 18gb drives!).

    Blatant commercialism. Yetch.

    -- E
  • if you mount them read-only. This was with a 120gb filesystem. Obviously it's not doing those basic file system checks upon that kind of mount. I guess I'll have to see what option "check=none" does with a big filesystem next time I have one to play with (which should be shortly, with a 180gb one).

    -- Eric
  • The important thing is the filesystem itself. The journal, as I understand it, is just a record of changes to be made. Rollbacks are possible for multiple levels of updates.
    If the glitch occurs prior to journal recording then there is nothing to fix.
    If the power outage or problem occurs after the journal entry has been made but prior to the commencement of writing then the changes can be rolled forward or back - posted or rejected.
    If the problem occurs after journaling but while writing is in progress then the changes can be rolled back and then possibly reposted.
    If the problem occurs after journaling and after writing but prior to reconciling the journal then the changes can be rolled back or the journal updated to match the filesystem.

    Journaling is good for systems that require very very high reliability - such as banking systems. There is obvious overhead involved in journaling.

    An optional, journaling filesystem for Linux would be a nice addition - hey, NTFS for Linux isn't far from being read/write is it?
  • Posted by GregK:

    We are selling 1.1 TB (that's terabytes) machines currently using the Dac960 and external drive enclosures. You can check out our systems at the following URL:


    They are quite reliable, mostly due to the fact that the author of the Dac960 Linux drivers (Leonard Zubkoff) works for us.
  • Posted by Jim @ ImageStream Internet Solutions:

    The company I work for has been selling Linux-based servers running the Linux software RAID. It works great in the 2.1 kernel; the largest RAID we built in software was 150 gig. We did this to back up a client's old RAID system. One of our clients has a server running a 100 gig RAID (Linux software) which is moving huge databases on it daily without fail, and it has been up over 6 months running 24/7 crunching data.
  • It seems that for Linux ext2 partitions it was discussed that there is some loss for big filesystems (a % of space is lost for superblock copies and such)

    You want to turn on "sparse-superblock" support when doing a mke2fs; it reduces the number of duplicate superblocks. The catch is that it is really only supported in the late 2.1.x/2.2.x kernels. 2.0.x will bawk like a dying chicken.
  • Sorry, but I don't buy that. Solaris and AIX have no problem with big files on 32-bit platforms. 32-bit platforms are normally limited to 32-bit address spaces (modulo games such as segments and some of the other truly horrible DOS and Windows hacks), but there's no compelling reason why file size and physical memory should be constrained by processor word length.
  • Wrong answer! I guess your instructor is being paid too much.

    See IBM's web page, which shows how to download an unsupported ADSM client for Linux:

    http://www.storage.ibm.com/software/adsm/adsercli.htm
  • also the reserved blocks percentage (man mke2fs). On large partitions you can waste a lot of space if you keep the default 5% reserved blocks percentage.

    Reducing the amount of reserved space may save you some space, but it can cost you a *lot* of time. Having 5% reserved space will mean that there will almost always be a "free block" within about 20 blocks from the end of the file.

    Unless you want your expensive RAID system to spend lots of time seeking, you should keep the 5% min free value, or even increase it to 10% like the BSD folks use. You certainly don't want to run the filesystem close to full most of the time.

  • 1. Create a large 10MB test file on an IDE partition.

    2. Do lots of checksums using sum or md5sum.

    Q: Do you get the same checksum each time?

    I once had a problem like this where I would get a corrupt byte from the disk about once every 50-100MB of data read. It happened on two different disks, I tried three different IDE controllers, I swapped RAM, I ran RAM tests, I made sure my bus speeds were within spec. One byte every 50-100MB might not be very high, but it was enough to crash my system once or twice a week.

    It turned out that I needed to set my RAM speed in the BIOS to the slowest setting, and everything worked. The RAM test programs didn't detect anything, I think because the errors only occurred if I was doing heavy disk access at the same time I was doing very CPU intensive things.

    Moral of the story: PC hardware is crap and Unix tends to push the hardware much further than other systems.

    Set your BIOS settings down to the very slowest settings and see if the problem goes away. Try swapping components, and try a *good* memory tester (Linux has one called mem86 or something).

    Good luck
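
    The repeated-checksum test described above is easy to script. A minimal sketch (file name and size are arbitrary; note that on a box with plenty of RAM the re-reads will mostly come out of the page cache, so use a file larger than memory if you really want to exercise the disk path):

    ```python
    import hashlib
    import os

    def file_digest(path, bufsize=1024 * 1024):
        """Hash a file in chunks, like md5sum does."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            chunk = f.read(bufsize)
            while chunk:
                h.update(chunk)
                chunk = f.read(bufsize)
        return h.hexdigest()

    def reads_are_stable(path, passes=10):
        """Re-read and re-hash the same file; more than one distinct
        digest points at flaky RAM, cabling, or a marginal controller."""
        digests = {file_digest(path) for _ in range(passes)}
        return len(digests) == 1

    # Create a 10MB scratch file, then hash it repeatedly.
    with open("scratch.bin", "wb") as f:
        f.write(os.urandom(10 * 1024 * 1024))
    print("stable" if reads_are_stable("scratch.bin") else "MISMATCH")
    ```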

  • by jd ( 1658 )
    I believe ext3 is "work in progress", though feel free to contradict me if I'm wrong on that.

    From what I understand, ext3 would be better suited to giant partitions.

  • Dude, it sounds like you have either some bad memory or are pushing your processor a wee bit too hard. Are you using parity memory? Odds are not, which is fine, but you can get strange results like this. Maybe you should borrow some SIMMs from a friend and see if the problem persists.
  • It seems that for Linux ext2 partitions it was discussed that there is some loss for big filesystems (a % of space is lost for superblock copies and such); don't forget to reduce the % of the filesystem reserved for root (5% by default, which would be overkill for 120Gb)...
    If you find 3 mins long to mount a 120Gb filesystem, you should have seen a Netware server with 13Gb, which takes at least 5-15 mins to mount the filesystem...
  • The main problem with Netware (before 5) is the lack of memory protection: an illegal read or write crashes the server (there is an option to ignore this; only use it when developing NLMs).
    A well-configured Netware server with updates is very stable. As for 3rd-party NLMs, if they are well written, they shouldn't crash the server. (I'm also an NLM developer, and a Unix programmer. One of my NLMs hasn't crashed, and the server uptime is about 2 months; it's used fairly often, mainly converting print jobs from Epson FX-80 format to another printer format. NLMs are a pain to develop; developing (and testing) most of the NLM under another OS like Linux, then doing the last few lines under Netware, is the best way, with of course a set of routines for easy porting between Unix, Netware and Win32 with the same code.)
  • There is work in progress to develop a binary tree-based filesystem for Linux, which is currently on second beta. The paper and source files are located at http://idiom.com/~beverly/reiserfs.html [idiom.com] . It is supposedly faster than ext2, and might be better suited for gigantic partitions, although I cannot attest to that, as I have no experience with it. Does anyone here know anything about this?

  • Tried it, and got the same every time. Setup here is an AST Premmia GX P/100 in single-processor mode (has a 82430LX/NX Mercury/Neptune chipset and PCI and EISA buses), 32MB no-parity RAM, kernel 2.0.35, Western Digital WDAC1600 disk, and a *shudder* CMD 646 rev 1 controller. Well, actually, the disk mentioned above is connected to the second channel, which uses a different controller that gives me problems when I configure it to do IRQ unmasking and 32-bit transfers (the CMD646 does that fine here). I recommend tuning your disk setup with hdparm and trying again.
  • Sometimes called a 'log' file system, though they aren't the same. AFAIK the difference is that a log file system just writes the meta-data super-safely (so you don't need to fsck), while a full journaling one writes all data out as a 'log' - it just appends to where it previously wrote. This does mean you need a background garbage collection process.

    Doing writes in this way makes writes go MUCH faster. I read a review by one journalist (no pun intended) who didn't believe Sun's claims that it made long sequential writes go 3x faster or more... It did. Unfortunately, Sun haven't (yet) put full journaled FS support into standard Solaris, though there is an option to put "UFS logging" on - it can also be done on the fly. Still, deleting files and creating lots of small ones goes about 5-10x faster when you put logging on.

  • A journaling (not "journaled") filesystem is one that keeps track of all the writes it's going to make on a special part of the disk. That way, if you lose power with the disk still spinning, the FS can read its record of "pending transactions" and make the needed changes immediately when you boot again. Journaling thus eliminates the need for fsck'ing. Cool, huh?
  • Piece o' cake: Use DLTs. I work at Indiana University (an awesome CS school, by the way; I just finished my bachelor's here ;) ), which just purchased what's called a "tape silo". It's a huge enclosure with shelf upon shelf of DLT-sized cubbyholes and a robot that moves among them in two dimensions (up/down and left/right) inserting and removing tapes. They plan to obtain around 15 TB of storage, with 1 TB of spinning disk to keep the whole thing moving. For a university of 30,000+ students and tons of research on the primary campus, that's not unreasonable!

    The moral of this story: DLTs are a perfectly feasible backup medium. You can get 17GB on one tape.

  • I know this sounds ignorant, but what is a journaled file system?

    A journalled file system writes all of the proposed changes to control structures (superblock, directories, inodes) into a journalling area before making those writes to the actual filesystem, then removes them from the journal after they have been committed to disk. Thus if the system goes down, you can get the disk into a sane state by replaying/executing the intention journal instead of checking every structure; thus an fsck can take seconds instead of minutes (or hours).

    For example, if you're going to unlink the last link to a file (aka delete the file), that involves an update to the directory, inode, and free list. If you're on a non-journalled system and update the directory only, you have a file with no link (see /lost+found); if you update the directory and inode only, you have blocks missing from your free list. Both of these require scanning the whole disk in order to fix; but a journalled system would just update the directory, inode, and free list from the journal and then it would be sane.

    Problems with journalled filesystems include conflicts with caching systems (e.g., DPT controllers, RAID subsystems with cache) where the intention journal is not committed to physical disk before the writes to the filesystem commence.
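
    The unlink example above can be mimicked with a toy intention journal. This is purely illustrative Python, nothing like a real filesystem's on-disk format; the point is only that an update is journalled before it is applied, so recovery replays the journal instead of scanning every structure:

    ```python
    class JournaledStore:
        """Toy key-value store with a write-ahead intention journal.

        Each update is recorded in the journal before it is applied;
        replay() brings the store back to a sane state after a crash."""
        def __init__(self):
            self.data = {}        # stands in for the filesystem structures
            self.journal = []     # the intention journal (would be on disk)

        def write(self, key, value, crash_after_journal=False):
            self.journal.append((key, value))   # 1. journal the intent
            if crash_after_journal:
                return                          # simulate losing power here
            self.data[key] = value              # 2. apply to the "filesystem"
            self.journal.pop()                  # 3. retire the journal entry

        def replay(self):
            """Crash recovery: re-apply any journalled-but-unretired updates."""
            while self.journal:
                key, value = self.journal.pop(0)
                self.data[key] = value

    store = JournaledStore()
    store.write("inode-7", "allocated")
    store.write("inode-8", "allocated", crash_after_journal=True)
    store.replay()                # instead of fsck'ing every structure
    print(store.data)             # both updates present after "reboot"
    ```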

  • Cyrix 6x86 PR233MX with 64MB of RAM, EIDE hard drive 2.5 gbytes (Fujitu I think, don't feel like opening the case), Amptron 8600 motherboard. Linux 2.2.0pre4 kernel. No problems here, checksum was the same each time.
  • My god, how long does it take to fsck such a beast? Unfortunately, I haven't looked into the journalled filesystem that's supposedly available for Linux (I think it's commercial), but a journalled filesystem is exactly what you'll need for this. Even with a UPS, hour-long fsck times are not my cup of tea.

  • A quick amazon.com search would have revealed the title:

    Practical File System Design with the Be File System

    by Dominic Giampaolo

    From this book, you could literally write your own compatible implementation of BFS for Linux. The question is: would BeOS compatibility be worth missing the opportunity to create a new filesystem tuned for what Linux is used for? The nice thing about dbg's book is that he covers the reasoning behind every decision that he made when developing BFS. Clearly, some of these decisions are closely tied to what BeOS is being targeted for (a single-user power desktop for media professionals), rather than what Linux is most often used for (a multi-user Internet server).

  • I've got various linux boxes here with everything from 486's to PII's.

    For giggles I decided to run the test on my 34G IDE stripeset (2 17G drives). I used a 100M file instead of 10M since I have 64M of ram.

    Everything checked out 100% okay after 40 sums.

    I would say change BIOS settings / CPU clock speed to something very conservative and re-run your reliability test. Once everything checks out then you've fixed your problem. Then you can start bringing speeds, etc. up to find out what breaks what.

    I've seen problems like this before a few years ago.. Intermittent failures on a news server I ran.. Started with Linux, went to FreeBSD and finally built a brand new box and the problems went away. I would say that your problems are *definitely* hardware. Overclocking your cpu/memory or failing cpu/memory/drives.

    Also IMHO, using Windows 95 as a test of hardware is like using an 85-year-old lady to test drive an Indy car.. Of course nothing's going to go wrong at that speed. ;)

    -Jerry (jasegler@gerf.org)
  • A journaling filesystem is not the same thing as a log structured filesystem.

    A journaling filesystem is any filesystem that keeps a meta data transaction log so that it can be restored to a consistent state quickly by replaying the log instead of checking every file in the filesystem.

    A log structured filesystem, on the other hand, is a filesystem that places all disk writes on the disk sequentially in a log structure, which drastically improves file I/O performance when you have total rewrites of a large number of small files. The log is written into garbage collected segments that gradually free up the room taken by old file versions.
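
    The append-plus-cleaner behaviour described here can be sketched in a few lines. A toy in-memory model (a real log-structured FS works in disk segments, of course; the names here are made up):

    ```python
    class LogStructuredStore:
        """Toy log-structured store: every write is an append; old
        versions become garbage that a cleaner reclaims later."""
        def __init__(self):
            self.log = []     # (key, value) records, strictly append-only
            self.index = {}   # key -> position of its latest record

        def write(self, key, value):
            self.index[key] = len(self.log)   # sequential write, no seek
            self.log.append((key, value))

        def read(self, key):
            return self.log[self.index[key]][1]

        def garbage_collect(self):
            """Rewrite the log, keeping only each key's live record."""
            live = [(k, self.log[pos][1]) for k, pos in self.index.items()]
            self.log = []
            self.index = {}
            for k, v in live:
                self.write(k, v)

    store = LogStructuredStore()
    for i in range(3):
        store.write("a", f"version-{i}")      # three rewrites, three appends
    store.write("b", "only-version")
    store.garbage_collect()                   # reclaims the two stale "a" records
    print(len(store.log), store.read("a"))    # 2 version-2
    ```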
  • I think if you really need data sets over 4GB, it should be possible to split it over more than one file in your application. The kind of overhead this would give could be smaller than the overhead you'd get from a 64 bit file system, since it would be tailor made for your application. Once you have this flexibility in your system, users that would like to use multiple disks could also benefit from the possibility.
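
    A sketch of the application-level splitting suggested here: write one logical stream as numbered pieces that each stay under a chosen ceiling. The helper names and the 2GB chunk size are just for illustration:

    ```python
    import os

    CHUNK = 2 * 1024 ** 3       # stay well under a 4 GB per-file limit

    def chunked_write(prefix, data, chunk=CHUNK):
        """Write data across prefix.000, prefix.001, ... each at most
        `chunk` bytes long."""
        for i in range(0, max(len(data), 1), chunk):
            with open(f"{prefix}.{i // chunk:03d}", "wb") as f:
                f.write(data[i:i + chunk])

    def chunked_read(prefix):
        """Reassemble the pieces in numeric order."""
        out = bytearray()
        n = 0
        while os.path.exists(f"{prefix}.{n:03d}"):
            with open(f"{prefix}.{n:03d}", "rb") as f:
                out += f.read()
            n += 1
        return bytes(out)
    ```

    The same indexing scheme extends naturally to spreading the pieces over several disks, which is the flexibility mentioned above.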
  • I like ext2fs. It makes every other filesystem I've ever used look shoddy and slow. And I've read that it's benchmarked as the fastest filesystem on Intel. The only conceivable reason to replace it would be if someone came up with a new filesystem for Linux that was even faster and had lower overhead. I won't be holding my breath.

    Of course, people with big-ass disks and high uptime requirements need that journalling crap. And they'll have it. So don't be dissin' ext2!

  • by Bwah ( 3970 )
    ... Please excuse the stupid question here, BUT:
    How does a filesystem of this type allow for a 3x sequential write speed improvement?

    I understand what the journaling part is describing, but don't understand how this would be that much faster. Especially under a really heavily loaded server.

  • I read that it can handle volumes of over 1 terabyte....does support for this exist? I think it is a 64-bit journaling FS......seems pretty nice, but I haven't played with BeOS in quite some time...any help out there?

  • Note that I'm a FreeBSD person; all of this from theory, as I've not run Linux for anything serious for a couple of years.

    If you run this using the default parameters and get an unplanned shutdown (crash, power outage, whatever), you are likely to get minor file corruption. To get correct behaviour, you should mount the filesystem in sync mode, and rely on the underlying RAID setup to handle write caching if you need it (as this removes one failure layer).

    You will also want to modify e2fsck to avoid silent data corruption. e2fsck will (or would, the last time I was in a discussion with the author on these issues) handle a block that is shared between two files by duplicating the block.
    This silently corrupts at least one file. You will probably want to change it to delete both files, possibly making that an interactive question. (Deleting is the default action on the *BSD fsck, BTW).

  • A little background information: Note that the below discussion is mostly academic on *BSD unless you have marginal hardware; the filesystem code does careful ordering of metadata updates unless the user explicitly turns the ordering off. Thus, you will only see this error if you have marginal hardware or are running an FS you have explicitly decided you can gamble with.

    Linux and traditional *BSD have chosen different policies here; Linux somewhat gambles with the user's data and security setup as a tradeoff for faster metadata updates and lower code complexity. This tradeoff is probably OK if you're only going to use it on a normal workstation, without any critical data on it; I guess it can be OK in some server apps too (though I wouldn't do it). *BSD does things "by the book", guaranteeing metadata integrity (and thus avoiding data leaks, and keeping POSIX semantics for e.g. rename). Note that the traditional BSD tradeoff is NOT the same as Linux 'sync'.

    The latest development on the BSD side of this is "soft updates", which is safe without the speed penalty.

    Now, back to the original poster:

    "Deleting is the default action on the *BSD fsck" Oh yeah, I didn't really want those files. They take up too much space anyway. After fsck destroys my data I will have room for more!

    I'll take "maybe corrupt" over "kiss your files goodbye" for sure.

    We're not talking "Maybe corrupt". We're talking of at least one file being corrupt, and we're talking of the possibility of private data crossing the protection domain between user IDs, and of wrong data or code migrating into setuid programs.

    For some applications, it might be an OK tradeoff to silently corrupt one file to potentially make another file OK. However, it is not OK for any of my applications - I need to know that the security policies for files are held; if I can't know this, I want to restore from backup, rather than keep running with corrupt files.


  • by law ( 5166 )
    I have a 38 gig partition on Netware and it takes about 6 minutes.... RAID 5, Mylex 960, 32 megs of RAM.
  • Heh, I know of a Netware server that mounts a 50GB volume in the blink of an eye :-) Seriously, it's less than a second.

  • Dang, the ftp site with the patch gives a 'can't set guest privileges'. Anyone have an alternate site for the >4GB patch? It isn't at linuxmama...
  • It's called NSS - Netware Storage System. I think it's journalled. I know it's similar to Unix filesystems, in that it doesn't matter where the data is physically, it all looks to be on the same volume. And it is REALLY fast. I mounted a 32 GB volume in about 2 seconds (on a Xeon 450 though).

  • A year ago when I was working on an AIX system and investigating their new support for file sizes over 2Gig (not 4Gig), I remember it was a bloody pain to switch over. Not only did you have to rebuild your file system and recompile ALL of your applications with the new libraries to get them to support the greater file sizes (I don't remember if you had to recompile apps that didn't care about large files), but once re-compiled, you couldn't use the same binaries with older file systems. On top of all that, there was a significant performance hit (10% to 20%) on file I/O.

    Again, I don't remember all the details, but in the end, we decided it was far too painful to implement the changes in our application. YMMV.

  • Yea, I've never needed more than that 640k I got in my machine anyway... all the rest just sits there.
    Endless Loop ; see Loop, Endless
    Loop, Endless; see Endless Loop
  • Ah.. that would be nice.. wouldn't it? Goodbye to ext2

  • Not only are disk-writes faster on journaled file-systems, there are also such things as journaled operating systems.

    That is, if you turn the power off and turn it on, the entire OS comes back on to a state within a few minutes of where it was. One example that looks interesting is EROS [upenn.edu].

    I have not seen this one in operation, but there are theoretical arguments for their speed claims, and (as they say) it is theoretically impossible for *any* OS based on access lists (such as Unix) to achieve the same level of security that a capability based system can. (Note, I said "can", not "does".)

    Ben Tilly
  • If your Netware server crashes that much you have a problem. It can easily be hardware, or an errant NLM, or running your backup during file compression.

    I routinely see netware servers that have uptimes of 400-600 days.. record is 900 days so far (took a polaroid of that one).

    If you want some help with your system, I would be happy to help you with your problem for free. You can contact me at dminderh@baynetworks.com if you'd like.

    The new file system in Netware 5 will mount & vrepair 1.1 TB in 15 seconds (that's the largest I have seen.. I'm sure it will do more..)

    And your mount time isn't that bad. Chrysler has a 500 GB volume that takes 22 hours to mount :)
  • In our testing of GFS, we have created ~108 GB filesystems (12 9GB disks software-striped together). The only limit on the filesystem size is 1TB (which all Linux filesystems share).

    GFS is a 64-bit filesystem. It supports files up to 2^64 bytes in size (on the alpha).
    It is much faster than ext2 for moving around big files.

    GFS will support journaling by the fall.


  • Since you were running NTFS on that size of file system, why do a chkdsk? NTFS is a journalled filesystem, so there is really no need - and of course journalling is designed to avoid long fsck's.

    NT may have its faults, but NTFS is not bad in this respect - Linux does not yet have a widely used journalling filesystem that I'm aware of.
  • Out of disk space is the fastest way I've seen to crash a Netware server. At one place I used to work, the default mailbox location was on the system volume. Once it was full, BOOM! the server was dead.
  • We are about to install a central machine that runs NFS, sendmail, DNS, NIS, and httpd for internal use, plus gnats, for around 60 users. Here is the plan. Two identical machines with 512M RAM and 9.0G disks with the OS installed. One machine would run as the NFS server and the other machine would have all the other servers: sendmail, DNS, NIS etc. The NFS server is connected to a disk array with 7 18.0G disks and a backup tape autochanger. I want to leave one of the disks as a hot spare. I would like to write scripts such that if one machine fails, the other can take over by just running a script.

    It is the RAID part that is not clear to me. The last RAID I checked was Veritas on Solaris, which was a major pain in the neck to manage. Don't know if managing RAID on Linux is any simpler. I am inclined to wait till RAID becomes a standard part of Redhat. Until then, I would rather depend on the tape backups than Linux RAID support.

    I am curious to hear any experiences from people managing large file systems 100G+.

    BTW, I still haven't figured out how to use our Exabyte autochanger effectively with GPLed backup software. Exabyte tech support wasn't very

  • While there aren't any effective GPLed solutions for using Exabyte (or any other SCSI/Medium Changer unit, for that matter) libraries and autoloaders under Linux, I've been playing around with 'em quite a bit lately, with some success.

    Drop me a note: johnbar@exabyte.com
  • Thanks for the compliment. I'll pass it on. ;-)

    - jmb. / exabyte corp.
  • If the system fails during a write to the journal, I believe that the whole operation is regarded as failed, and what's known in database circles as a ROLLBACK occurs... That is, since the disk operation (e.g. unlink) couldn't be completed, it's not done at all, and thus not left half-done and messy.
  • IIRC, isn't it NOT the OS, but rather the FS that must be 64-bit? I know NTFS is 64-bit and can handle files over 4gig (I've seen it). And, we all know that NT isn't 64 bit (yet). How it does it I am not sure - need an NTFS reference manual OR the MS source code. Fat chance of that...

    I kinda have to suggest this (shrug), but why couldn't we get the NTFS driver bulletproofed (r&w)?? Other than the anti-MS reason, NTFS isn't a bad FS (and is proven) and there is already substantial work done with it... It'd be great for that "Hey, NT admins, come to Linux?"

    But, then again, if people like Tweedie from RH are working on designing ext3, why bother with NTFS?? ;-) Who knows where they are?

  • No, you don't. Not if you compile the module yourself. Try arla [stacken.kth.se], the free AFS client.
  • And arla [stacken.kth.se] provides an AFS client for free.
  • Big file systems are great when they work, but FSCK is a nightmare. We have 24 9.1 Gig Drives on a three channel DPT controller. We tried it as a single volume, but weird things happened with the DPT firmware.

    We originally used this as a Usenet news server. We tried 24 separate volumes to have the maximum number of spindles, but Linux has a limit of 16 SCSI drives in the 2.0 kernels. We ended up creating 12 2-drive stripe sets. (no redundancy) We then created 6 partitions. 5 that were 2 gigs in length, and one with the remainder. We used a patch to allow the partitions to be handled as 2 gig files. This was very fast, and had no FSCK issues as there were no file systems. If a few articles were mangled because of a crash.....

    We ended up outsourcing our usenet service, and had this server to reuse. We created 3 volumes of 7 drives each, along with 3 hot spares. (One hot spare in each external drive chassis) Each volume is ~50 Gigs in size. One thing we have found is that if we HAVE to fsck the whole thing, (150 Gigs) you need about 4 hours. The PCI bus just doesn't have the bandwidth to check huge volumes in a reasonable time. We end up checking "/" and mounting it rw. We then mount the rest of the volumes "ro". We can then restart basic services (mail, web) and continue the fsck on the read-only volumes.

    It's a balance you have to strike. If you really need that large of a file system, understand the time to restart. For us, just a basic reboot takes 12 minutes. With FSCK, it's ~4-5 hours of time to babysit. If you don't need that much space, look at setting up several individual file servers. It will help spread the load.
  • For robotic changers, one of the best solutions I've seen is Arkeia by Knox Software. www.knox-software.com. It's commercial, and expensive, but worth the money if you need large-scale backup.
  • Wait... I missed something. When did we get files over 2 GB?
  • AFS, the Andrew File System, is the file system used to handle Carnegie Mellon users' home directories, including the home directories of all new accounts, as well as the bulk of system programs run by users.
    Just wanted to let you know. I'm at Carnegie Mellon University, which developed AFS. We are currently using it with 1 terabyte of space just in our university. MIT and U of Mich (Ann Arbor) are among the other colleges that also use it.
  • I know this sounds ignorant, but what is a journaled file system? An explanation or link would be appreciated.

  • Actually, IIRC, journalling filesystems still require fscks; the process is just supposed to take a very short time. Besides, if you weren't supposed to run chkdsk on NTFS, why would they give it to you? And if NTFS is a journalling FS, why did it take 3 days? MS is obviously doing something (else) wrong.
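    For the curious: the core idea of a journalled FS is to log each change before applying it, so crash recovery only has to replay the log rather than scan the whole disk. A toy sketch of the idea in shell (file names and log format are made up for illustration, not any real FS's on-disk layout):

```shell
# Toy write-ahead journal: log the intent, apply the change, then commit.
JOURNAL=/tmp/toy-journal
DATA=/tmp/toy-data
: > "$JOURNAL"                              # start with an empty journal
echo "old contents" > "$DATA"

echo "BEGIN: rewrite $DATA" >> "$JOURNAL"   # 1. record what we are about to do
echo "new contents" > "$DATA"               # 2. actually do it
echo "COMMIT" >> "$JOURNAL"                 # 3. mark the transaction finished

# Recovery after a crash replays (or discards) any BEGIN without a matching
# COMMIT -- a quick log scan instead of a full-disk fsck.
tail -n 1 "$JOURNAL"                        # prints: COMMIT
```

    That's why a journalled FS can come back up in seconds where an e2fsck of the same disk takes hours.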
  • Gee, too many people here act like RAID means totally reliable. It doesn't. RAID controllers go wrong; power supplies go wrong; many RAID controllers don't handle power outages properly (there is no such thing as an "uninterruptible" power supply), as they have no battery support for data in transit; RAID controllers with battery support don't properly detect when the batteries have died, and won't provide any real support when their big day comes; hosts go wrong and scramble their disks; and even the best operating systems can go a bit pear-shaped some days. There is also the good old "some idiot just typed a dumb command and wiped thousands of files" issue.

    I'm not saying RAID is a waste of time. It improves reliability a great deal, and the better designs make things go faster. They aren't perfect, though.

    Backing up a monster partition is a pain in the neck. If you have a monster database you have little choice, but smaller partitions make life easier.
  • I did your test, and my results follow:

    10485760 Jan 18 22:00 bigfile

    After 30 runs of sum, _ALL_ checksums are the same:
    41865 10240
    (My setup is an ASUS TXP4 with a Maxtor 1.2 GB EIDE drive.
    True, it's not Ultra DMA, but as you can see, it
    checks out fine here.)
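    For anyone who wants to run the same stress test, a minimal sketch (the file name, size, and run count are arbitrary):

```shell
# Write a 10 MB file of random data, then checksum it repeatedly.
# Flaky transfers (bad cables, bad DMA) show up as differing checksums.
dd if=/dev/urandom of=/tmp/bigfile bs=1024 count=10240 2>/dev/null
first=$(sum /tmp/bigfile)
for i in $(seq 1 30); do
    this=$(sum /tmp/bigfile)
    [ "$this" = "$first" ] || echo "MISMATCH on run $i: $this"
done
echo "done: all runs compared against '$first'"
```

    Dropping caches between runs (or using a file larger than RAM) makes the test hit the disk rather than the page cache.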
  • No doubt through some kludgy shift-register crap or something similar. x86's memory architecture is the most horrible I've ever heard of.
  • I tried it and got the same checksum every time (tried both checksums 4 times).
    I'm using an AMD K6-2 on an Asus P5A-B motherboard (ALi Aladdin (Ali1xxx) chipset) with a Quantum Fireball ST3.2A (UDMA 2). Don't know the transfer rates.

    Greetz, Takis
  • Actually, at least Solaris 2.5.1 doesn't support files over 2 GB, according to Sybase. Believe it or not, NT does: you can have a 32 GB device (i.e. file) for Sybase under NT.
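    The 2 GB ceiling is just the largest file offset a signed 32-bit off_t can express; quick shell arithmetic, for the record:

```shell
# Largest file offset representable in a signed 32-bit integer:
echo $(( (1 << 31) - 1 ))   # 2147483647, i.e. 2 GB minus one byte
# The 4 GB figure quoted elsewhere in this thread is the unsigned 32-bit limit:
echo $(( (1 << 32) - 1 ))   # 4294967295
```

    Which limit you hit depends on whether the kernel and libc treat the offset as signed or unsigned; a true fix needs 64-bit offsets throughout.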
  • I am interested in this as well. I am currently in the process of setting up a 180 GB fileserver, using dual redundant CMD RAID controllers and ten 18 GB UltraWide SCSI drives. The RAID provides a mechanism to create partitions that show up to the OS (Linux) as individual drives; this is done by giving each partition in the RAID set its own LUN. The biggest RAID partition I have made is 80 GB. I am booting off the RAID set as well (no hard drive in the server box), and have had no problems so far. I also noticed that it takes quite some time to mount the larger partitions. One thing you might want to experiment with is varying the bytes/inode and also the reserved blocks percentage (man mke2fs). On large partitions you can waste a lot of space if you keep the default 5% reserved blocks percentage.
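    To put a number on that last point: the default 5% reserve on an 80 GB partition keeps 4 GB away from ordinary users. A quick sketch of the arithmetic, with the relevant mke2fs flags (-m and -i, per its man page) in the comments; the device name is a placeholder:

```shell
SIZE_MB=$(( 80 * 1024 ))      # an 80 GB partition, in MB
RESERVED_PCT=5                # mke2fs default, equivalent to -m 5
echo $(( SIZE_MB * RESERVED_PCT / 100 ))   # 4096 MB reserved for root

# To reclaim most of it at mkfs time (device name is hypothetical):
#   mke2fs -m 1 -i 16384 /dev/sdX1
# -m 1      reserve only 1% for root
# -i 16384  one inode per 16 KB of space, cutting inode-table overhead
#           on filesystems that hold mostly large files
```

    Fewer inodes also means a faster fsck, since the inode tables are a big part of what it has to walk.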
  • I'm interested in info on using dat changers under linux. Is anyone doing this? If so what changers are supported?
  • I'm interested in info on using dat changers under linux. Is anyone doing this? If so what changers are supported?

    The machine I'm sitting at has an APS Technologies changer attached to it, of unknown model. The tape changer says "DATLoader600", but that is not an APS name.

    Here is a link to the APS website. [apstech.com]

  • All this talk about RAID makes me yearn to run Linux off our cool new Adaptec ARO-1130SA and AAA-130SA RAID controllers. However, to the best of my knowledge there are no drivers available yet. Has anyone else had luck getting one of these critters to run under Linux? If so, how'd you do it?
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Erik Norvelle
  • Linux 2.1.132
    Intel p255MMX (Yeah, it's overclocked, bus is running at 75Mhz.)
    128Mb edo, bios default for 60ns in my award bios.
    (simms are 4x 32Mb, 2x TI and 2x Panasonic)
    Mobo: Asus T2P4 Cache: 512Kb
    1.0 Gb samsung pio-4
    2.5 Gb bigfoot pio-4
    4.3 Gb bigfoot pio-4

    On all the discs the outcome was the same;
    I "summed" each disc 30 times.

    I also tried it on my nameserver,
    Linux 2.0.36 + egcs patch
    AMD 386DX40
    Motherboard = unknown
    8Mb "topless" (8x1Mb)
    420 Mb Seagate
    BIOS MEM Setting: as conservative as you can get
    I tried it 20 times here; again, no difference in the checksums.

    Weird shit happenin' in yer machine...
    Try other IDE cables; I had problems with that in the past. My HDDs used to spin down (a bit), click, and then get back up to normal speed again. Bad connectors caused the HDDs to reset once in a while, which cost me some write and read errors, including some bad blocks! (My system tried to read/write while the heads were resetting, hence the "click" sound.)

    Anyone here who has/had this problem too?

  • .. in older versions of NetWare, i.e. v3.x or v4.x, it really takes ages. But take a look at NetWare 5 and the new file system... it rox!

  • At my work we have been toying with an 18GB RAID0 partition under Linux. We would like to perhaps stick even more disk on it, however I don't know of a good GPL backup package. Does anyone have any pointers here? I don't think that a simple dump or tar will cut it.
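    For what it's worth, plain tar goes further than you'd think once you split the stream into media-sized chunks. A minimal sketch (the paths and the 2 GB chunk size are arbitrary):

```shell
# Back up a tree as a gzipped tar stream, split into 2 GB pieces so no
# single output file hits the 2 GB file-size limit (or a tape's capacity).
mkdir -p /tmp/demo-src /tmp/demo-backup /tmp/demo-restore
echo "payload" > /tmp/demo-src/file.txt

tar czf - -C /tmp demo-src | split -b 2048m - /tmp/demo-backup/dump.tgz.

# Restore by concatenating the pieces back into one stream:
cat /tmp/demo-backup/dump.tgz.* | tar xzf - -C /tmp/demo-restore
```

    This gets you media-sized archives, but not the cataloguing or incremental scheduling a real backup package provides.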




"The number of Unix installations has grown to 10, with more expected." -- The Unix Programmer's Manual, 2nd Edition, June, 1972