Distributed Filesystems for Linux?

zoneball asks: "What would you use for a distributed file system for Linux? I have several GNU/Linix machines running at home, and want to be able to see more or less the same file tree (especially all the ~user directories) regardless of which machine I'm connected to, with traversal into the distributed file system space largely transparent for the end user. Are there any URLs or documents that compare the features, bugs, road maps, and stability of these and other distributed filesystems? Which offers the best stability and protection from future obsolescence?"

Zoneball looked at three distributed filesystems; here are his thoughts:

"OpenAFS was the solution I chose because I have experience with it from college. For performance, AFS was built with an intelligent client-side cache, but it does not handle network disconnects gracefully. But there are other alternatives out there.

Coda appears to be a research fork of an earlier version of AFS. It supports disconnected operation, but the consensus on Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'

Intermezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said that it is still too unstable and crashes their servers. The last 'news' on its site is dated almost a year ago, so I don't even know whether it's still being developed."

So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?

  • NFS (Score:4, Informative)

    by mao che minh ( 611166 ) * on Tuesday May 13, 2003 @06:46PM (#5949787) Journal
    I know that this is going to be the most common answer, but just go with NFS. It's not the most secure option around, but obviously the simplest to implement and the best documented.

    NFS Linux FAQ [sourceforge.net]
    Howto #1 [sourceforge.net]
    Howto #2 [linux.org]

    If you find yourself needing help, try asking people at Just Linux forums [justlinux.com], or trying the NFS mailing list [sourceforge.net].
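
    For anyone starting from zero, the basic recipe is just an export on the server and a mount on each client. A minimal sketch, with "fileserver" and the 192.168.1.0/24 subnet as placeholders:

    <ECODE>
    # on the server: /etc/exports
    /home 192.168.1.0/24(rw,sync,root_squash)

    # re-export after editing
    exportfs -ra

    # on each client: a one-off mount ...
    mount -t nfs fileserver:/home /home

    # ... or a permanent /etc/fstab entry
    fileserver:/home  /home  nfs  rw,hard,intr  0 0
    </ECODE>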

    • Re:NFS (Score:4, Informative)

      by vandan ( 151516 ) on Tuesday May 13, 2003 @06:54PM (#5949854) Homepage
      I have to agree.
      It takes about 5 minutes to get an understanding of what you need. After setting it up it just works.
      NFS is a great ... Network File System. No need to re-invent the wheel here.
    • Does NFS have trouble with user permissions, such as a user having different numeric IDs on different systems? Is it secure enough to share a common passwd file?
      • Re:permissions? (Score:5, Informative)

        by phorm ( 591458 ) on Tuesday May 13, 2003 @07:02PM (#5949926) Journal
        That's what NIS is for. You can schedule regular downloads of group/passwd files, which are updated in a NIS database stored on a master server, and passed down to "slave" servers.
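
        For reference, the moving parts are roughly these; "example-nis" and "nismaster" are made-up names, and the exact paths vary by distribution:

        <ECODE>
        # master (running ypserv): set the NIS domain and build maps from /etc/passwd, /etc/group, ...
        domainname example-nis
        /usr/lib/yp/ypinit -m

        # after editing passwd/group on the master, push updated maps
        cd /var/yp && make

        # slave server: pull the maps from the master
        /usr/lib/yp/ypinit -s nismaster

        # client: run ypbind and tell nsswitch to consult NIS
        # /etc/nsswitch.conf
        #   passwd: files nis
        #   group:  files nis
        </ECODE>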
        • by Kunta Kinte ( 323399 ) on Tuesday May 13, 2003 @08:06PM (#5950427) Journal
          Don't use NIS, unless you have absolutely no other option.

          Other options like LDAPS and Kerberos offer at least some form of security.

          ypcat followed by a brute-force attack on the resulting passwd file is as old as dirt, and sadly still works. I was a bit disappointed when I saw NIS as a required service on the Red Hat cert syllabus.

          This may sound harsh, but I don't think there is much excuse for running NIS in this day and age. Anyone who does this in an environment where security is a concern deserves what they get.

          • I've got other options, but I use NIS. The catch (there is always a catch) is that my NIS does not contain ANY password hashes, because I use Kerberos to contain those. It works well, and it's nice and simple. The future plan is to migrate to LDAP of course, and get rid of all my NFS mounts all over everywhere and implement AFS, but for now, NIS + Krb5 works great.
          • It's not all rosy like `use LDAP'

            NIS is simple and easy to maintain. LDAP is harder. From memory (10 years ago) Kerberos was geared towards a single user on a single machine; is that still the case?

            Lots of big organizations still use NIS because its flaws, while real, are well understood.
            • by cduffy ( 652 )
              Kerberos is not at all geared towards one-user-one-box -- it was created for large multiuser computing environments, initially MIT. Certainly not one-box -- heck, you need at least one dedicated, secured, guarded system to run the thing.

              It *does* have flaws (I'd prefer it did something similar to AFS's PAG-based authentication, such that tokens are per process group rather than for all instances of a given UID on a box -- and a malicious root can trivially steal tickets for all users who have valid ones on
        • Oh my god! People still use that antique technology? Just say no! Even Sun long ago gave up on NIS. Poor scaling, poor security. No way man, no how.
      • Re:permissions? (Score:5, Informative)

        by Dysan2k ( 126022 ) on Tuesday May 13, 2003 @07:09PM (#5949986) Homepage
        To be honest, big time, but a lot of people forget the other side of life with NFS, and that's NIS/NIS+. The yp-tools include pretty good NIS support, but I'm not sure about NIS+. I'd use neither in a production environment personally, but a common auth system which is easy to manage would solve that issue.

        You could also look into LDAP (VERY complex, no good starting point that I've been able to find) and Kerberos auth methods as well.

        That should give you a central point for UIDs/usernames. But NFS does not have transparent merging of mounts that I'm aware of, so that you could mount, say, the /home directories of 5 computers onto a single /home on a central system and have it display all the mounts simultaneously. For example:

        <ECODE>
        CPU1 contains: /home/foo
                       /home/baz

        CPU2 contains: /home/tic
                       /home/tac

        CPU3 contains: /home/toe

        On CPU4, you'd do the following:
        mount CPU1:/home /home
        mount CPU2:/home /home
        mount CPU3:/home /home

        And you'd end up with, on CPU4:
        /home/foo
        /home/baz
        /home/tic
        /home/tac
        /home/toe
        </ECODE>

        If there is a way to do this, please lemme know. I've heard people talk about it in the past, but haven't seen anything come of it yet.
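
        The usual answer is the automounter: an indirect autofs map can pull each home directory from whichever server holds it, so every machine sees one /home without stacking mounts. A sketch, with hypothetical server names:

        <ECODE>
        # /etc/auto.master
        /home /etc/auto.home --timeout=60

        # /etc/auto.home -- key is the directory under /home, value is where it lives
        foo cpu1:/export/home/foo
        baz cpu1:/export/home/baz
        tic cpu2:/export/home/tic
        tac cpu2:/export/home/tac
        toe cpu3:/export/home/toe
        </ECODE>

        The map can also be distributed as a NIS map, so it only has to be maintained in one place.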
    • Re:NFS (Score:5, Insightful)

      by gallir ( 171727 ) on Tuesday May 13, 2003 @07:12PM (#5950011) Homepage
      Naaaaaaaaaa.....

      NFS is not distributed, it's only "networked" or "remote". It doesn't support any of: replication, disconnection, sharing, distribution. It is centralised, and requires the same user name/number space and security.

      In a word, it falls far short of the requirements, at least if you compare it with the filesystems listed in the question.
      • But nothing the guy asked about requires "distributed". His post really sounds like he means "networked".

      • Re:NFS (Score:3, Insightful)

        by g4dget ( 579145 )
        NFS is not distributed,

        Of course it is. It gives you a single, unified view of a file system tree that can span many machines.

        It doesn't support any: replication, disconnection, sharing, distribution.

        Sure it does. Some of that functionality requires more than a vanilla NFS server, but that can be transparent to clients.

        It is centralised, and requires the same user name/number space and security.

        Older versions did, current versions don't.

        Don't get me wrong, NFS has lots of problems in many environme

        • Re:NFS (Score:3, Informative)

          by cduffy ( 652 )
          [NFS] gives you a single, unified view of a file system tree that can span many machines.

          Only if your mount tables are the same everywhere, and they need to be kept in sync on the client side. By contrast, in AFS, change where a volume is mounted anywhere -- and it's changed everywhere. Add a new volume on a different server on one client, and it's there in that same place on all of them. No mucking with the automounter, no distributing config files to all your machines, none of that mess.

          AFS makes admin
          • Re:NFS (Score:4, Informative)

            by g4dget ( 579145 ) on Wednesday May 14, 2003 @12:29AM (#5951986)
            Only if your mount tables are the same everywhere,

            That's what NIS is for. Furthermore, the flexibility of being able to set up machines with different views of the network is crucial in many applications. None of my workstations or servers actually have the same mount tables: they all get some stuff via NIS, and some stuff is modified locally. The restrictions AFS imposes are just unacceptable.

            AFS makes administration tremendously easier after one's scaled the initial learning curve.

            AFS is an administrative nightmare. Apart from the mess that ACLs cause and the problems of trying to fit real-world file sharing semantics into AFS's straightjacket, just the number of wedged machines due to overfull caches and its complete disregard for UNIX file system semantics cause no end of support hassles. Then, there is the poor support for Windows clients. We started out using AFS because it sounded good on paper, but it was a disaster in terms of support, and we got rid of it again after several years of suffering.

            It performs far, far better than NFS on large networks (and merely somewhat better on smaller ones).

            AFS's caching scheme works better than what NFS is doing for small files, but that case is fast and easy anyway. AFS's approach falls apart for just the kind of usage where it would matter most: huge files accessed from many machines.

            Both NFS and AFS have very serious problems. But between the two, NFS is far simpler than AFS, is easier to administer in complex real-world environments, respects UNIX file system semantics better, and works better with large files. I can guardedly recommend NFS or SMB ("there is nothing better around, so you might as well use it"), but I can't imagine any environment for which AFS is a reasonable choice anymore. The only thing AFS had ever going for it as far as I'm concerned is that it was fairly secure at a time when NFS had no security whatsoever, but that is not an issue anymore.

            • Re:NFS (Score:3, Informative)

              by cduffy ( 652 )
              The restrictions AFS imposes are just unacceptable.

              Really, now? Tell me what you're trying to do that AFS won't allow you (not *how* you're trying to do it, but *what* you're trying to do), and I'll tell you how to do it with AFS.

              Apart from the mess that ACLs cause and the problems of trying to fit real-world file sharing semantics into AFS's straightjacket

              WHAT?! I could say the same thing about UNIX's user/group/world semantics, and far more defensibly. ACLs allow all sorts of useful things; I can ha
      • by billstewart ( 78916 ) on Tuesday May 13, 2003 @09:46PM (#5951048) Journal
        Yes, there are applications where you want a real, heavy-duty, full-scale Distributed File System. The last time I looked at AFS it had too much Transarc commerciality in the way of using it, but that was a Decade Ago. If OpenAFS works, it's probably a great choice.

        But for a lot of applications, you simply don't need that much, and you've got some way to contain the security risks, and NFS can be enough. It's easy enough to set up, and if all you're *really* trying to do is make sure that everybody sees their home directory as /home/~user, and sees the operating system in the usual places and the couple of important project directories as /projecta and /projectb, NFS with an automounter and a bunch of symlinks for your home directories is really just fine. They hide the fact that users ~aaron through ~azimuth are on boxa and ~beowulf through ~czucky are on boxbc etc. And yes, there are times you really want more than that, and letting your users go log onto the boxes where their disk drives really are to run their big Makes can be critical help. But for a lot of day-to-day applications, it really doesn't matter so much.

    • Re:NFS (Score:5, Interesting)

      by rmdyer ( 267137 ) on Tuesday May 13, 2003 @07:55PM (#5950348)
      Nope, NFS is -not- a distributed file system. NFS is a point-to-point file system. And, unless you are using kerberized NFS, it is not secure.

      The only file system that is truly distributed, with a global namespace, replication, and fault tolerance, is AFS.

      NFS is pretty much the same as CIFS for Windows. And version 4 still doesn't have a global namespace or volume location independence.

      So, NFS can't be a common answer because it isn't even allowed to be in the game.

      +4 cents.
    • by SuperBanana ( 662181 ) on Tuesday May 13, 2003 @08:04PM (#5950417)
      It's not the most secure option around

      That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.

      NFS is only appropriate on a 100% secured(physical and network-level) network. If anyone/someone can plug in, forget it. If anyone has root on ANY system or there are ANY non-unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CDROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.

      Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.

      NFS is a pretty good performer, especially when tuned right and on a non-broken client (which Linux is VERY far from). However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent (aside from the weak LANMAN encryption). The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarrassing.

      Either that, or AFS/Coda need to get a lot more documentation and (for Coda) implementation fixes. The Unix world desperately needs a good filesharing system...

      • by tzanger ( 1575 ) on Tuesday May 13, 2003 @09:24PM (#5950902) Homepage

        I use a very simple script to help keep NFS secure:

        IPTABLES=/usr/sbin/iptables
        RPCINFO=/usr/sbin/rpcinfo
        GREP=/usr/bin/grep
        AWK=/usr/bin/awk

        # rebuild the "nfs" chain: drop every port the RPC services are currently bound to
        $IPTABLES -F nfs
        $IPTABLES -N nfs &> /dev/null
        $RPCINFO -p localhost | $AWK '/portmap|mount|nfs|lock|stat/ \
        { print "iptables -A nfs -p " $3 " --dport " $4 " -j DROP" }' | \
        /bin/bash

        # hook the chain into INPUT for traffic arriving on eth0 (i.e. not over ipsec), once only
        $IPTABLES -L INPUT -vn | $GREP -q 'nfs all -- !ipsec0+'
        if [ $? -ne 0 ]; then
        $IPTABLES -I INPUT 1 -i eth0 -j nfs
        fi

        Basically it only allows incoming NFS-related connections over ipsec, dropping anything that is not. NFS port allocation is dynamic by default and I know you can force ports, but this seemed far easier to scale.

        One thing I have noticed (and perhaps it's common knowledge to NFS experts) is that in order to get locking to work at all, my NFS clients had to be running statd and lockd. Without 'em everything worked but locking would fail every time.
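
        On the forced-ports point: statd and mountd will take a port on the command line, and lockd's ports can be pinned via module options, after which a handful of static rules is enough. A sketch; the port numbers are arbitrary and the details vary with kernel and nfs-utils versions:

        <ECODE>
        # /etc/modules.conf -- pin the in-kernel lock manager
        options lockd nlm_udpport=4001 nlm_tcpport=4001

        # start the user-space daemons on fixed ports
        rpc.statd -p 4002
        rpc.mountd -p 4003

        # then plain static rules work, e.g. accept only from the ipsec interface
        # (repeat with -p udp)
        iptables -A INPUT -i ipsec0 -p tcp -m multiport --dports 111,2049,4001,4002,4003 -j ACCEPT
        iptables -A INPUT -p tcp -m multiport --dports 111,2049,4001,4002,4003 -j DROP
        </ECODE>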

        • by rneches ( 160120 ) on Tuesday May 13, 2003 @10:34PM (#5951344) Homepage
          And if you're lazy and/or adventurous, you can turn on NFS over TCP in your kernel and tunnel it over ssh or ppp/ssh. I've never tried it, but it ought to work. I understand that NFS over TCP is relatively untested, but is reputed to work rather well. Doing weird things like this would be a pretty good way to test the NFS over TCP code, and I'm sure the developers would be interested to hear how it goes. Particularly if you run a lot of data over it for a long time, and have a good way of verifying that all is well. Or, better still - if all is not well, and you have a good way of articulating what went wrong.

          Of course, that doesn't mean it's a good idea. I think your solution with IPSec is much more elegant. Unfortunately, I happen to need to get through a heavily packet-shaped network that massively favors port 80, and drops random packets everywhere else. Not IPSec friendly at all. I avoid this by running multiple ppp/ssh tunnels through the retarded parts of the network and letting my gateway balance between them. Unfortunately, this requires privileged accounts on many, many boxes in odd places.

          By the way, 10 points to any Northeastern University students who send polite, well considered complaints to Network Services. Not RESNet - they exist only to prevent you from talking to Network Services. Don't bother yelling at them - they exist specifically for that purpose. RESNet has no authority whatsoever to, for instance, allow CVS to work when Network Services decides to drop 90 percent of packets on port 2401. This is for your benefit - I'm perfectly happy with my tunnels.
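
          For anyone tempted by the ssh idea above, the plumbing would look roughly like this, assuming an NFS-over-TCP capable client and server and a mountd pinned to a known port (the names and ports here are invented):

          <ECODE>
          # forward nfsd (2049) and the pinned mountd port (4003) over ssh
          ssh -f -N -L 2049:localhost:2049 -L 4003:localhost:4003 user@fileserver

          # mount through the tunnel; lockd won't make it through, hence nolock
          mount -t nfs -o tcp,port=2049,mountport=4003,nolock localhost:/home /mnt/home
          </ECODE>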

      • by HuguesT ( 84078 ) on Tuesday May 13, 2003 @09:39PM (#5951007)
        > If anyone has root on ANY system or there are ANY non-unix systems, forget it.

        By that you mean that it's easy to read stuff out of people's directories if you can spoof their UID. Sure. I think you'll find the same is true on an SMB network.

        > The administrative functionality in NFS can't
        > compare to the features that have been available
        > to MacOS and Windows administrators for over a
        > decade,

        Given that 10 years ago Windows for Workgroups had hardly been released and didn't even have TCP/IP by default, I think you are exaggerating a little bit. At the same time MacOS version 7 was the norm, and we all know how secure that one was, right?

        Maybe NFS4 [samba.org] is your answer?

      • by Anonymous Coward
        There is plenty more that you can do to secure NFS than you suggest. Kerberos, secure RPC, DES/DH authentication, and IPSEC are all available tools. Unfortunately, Linux NFS has tended to lag in security.

        http://docs.sun.com/db/doc/816-7125/6md5dsnvv?a=view
        http://nscp.upenn.edu/aix4.3html/aixbman/commadmn/nfs_secure.htm
        http://docs.sun.com/db/doc/805-7229/6j6q8sve1?q=des+nfs&a=view

      • I run NFS over IPSec. That solves many of the security issues.

        -- Agthorr

    • Re:NFS (Score:5, Insightful)

      by nosferatu-man ( 13652 ) <spamdot@homonculus.net> on Tuesday May 13, 2003 @08:56PM (#5950699) Homepage
      "For every complex problem, there is an answer that is clear, simple, and wrong." -- HL Mencken

      'jfb
    • Re:NFS (Score:3, Insightful)

      by tuxlove ( 316502 )
      I know that this is going to be the most common answer, but just go with NFS.

      This is what immediately came to mind for me too. Except for one thing. NFS is not a distributed filesystem. It's merely a network filesystem. The data itself actually resides only in one central place, and is not distributed in any way. Storage is not shared across machines, and therefore NFS is limited, in performance and redundancy, to the levels that single storage point represents. If it's an infinitely scalable, fault-tol
  • Yup NFS (Score:2, Informative)

    by laugau ( 144794 )
    NFS + Automounter plus NIS and you get everything you ever wanted. NFS is fast, well known and documented and transparent.

    • Nope, sorry, NFS is -not- a distributed file system. No amount of automounter pain will bring you even close to AFS. NFS is just the traditional de-facto point-to-point client-mounts-server share file system. It is just like CIFS for Windows.

      AFS on the other hand has volume location independence, client-side-cache, token based ACL security, global namespace, yata, yata, yata. If you think NFS is distributed, you are still in your crib compared to real enterprise filesystem administrators.

      Go get another
  • by nescafe ( 12858 ) <<gro.xavodronf> <ta> <efacsen>> on Tuesday May 13, 2003 @06:51PM (#5949825)
    I would use SFS [fs.net], the Self Certifying File System. Assuming all the systems you are using are supported, it offers global, secure access to anything you care to export.
  • Well it depends... (Score:5, Informative)

    by Tsugumi ( 553059 ) on Tuesday May 13, 2003 @06:51PM (#5949828)
    For my money, NFS on a LAN, AFS over a WAN; it really depends on the size of the network you're trying to play with.

    Since openafs [openafs.org] forked from the old transarc/IBM codebase, it looks as if it has a real future. It's used by a load of educational and research institutions (notably CERN), as well as Wall Street firms.

    • It is correct that normally you run NFS over a LAN, but NFS *can* run over a WAN, although you would want to run it over TCP rather than UDP, just to deal with delay and potential losses.

    • It's used by a load of educational and research institutions (notably CERN), as well as Wall Street firms.

      No kidding. I had an account where I could "cd /afs" and get into at least twenty universities. (I was trying to use some software that a student put on the web, and I needed to recompile it but he refused to put up the source. So I cd'ed into his home directory at his school and copied it. For my personal use only, not to distribute.)

      One nasty glitch is that (at least in some installations) AFS

  • NFS/BOOTP (Score:3, Informative)

    by rf0 ( 159958 ) <rghf@fsck.me.uk> on Tuesday May 13, 2003 @06:53PM (#5949850) Homepage
    I'm sure others here will suggest NFS, but why not just go whole hog and set up your clients to boot off a server, then mount the same NFS filesystem? That way you get total transparency without having to make sure that NFS is always mounted.

    Just my $0.02

    Rus
  • Background on DFS (Score:5, Informative)

    by El Pollo Loco ( 562236 ) on Tuesday May 13, 2003 @06:54PM (#5949861)
    Check here [linux-mag.com] for a good background on DFS. It also has a quick table comparison of the popular programs, and a walkthrough to set up Intermezzo.
  • PVFS (Score:5, Informative)

    by Kraken137 ( 15062 ) on Tuesday May 13, 2003 @06:54PM (#5949862) Homepage
    We use PVFS at work to give us a high-performance network filesystem for use with our clusters.

    http://parlweb.parl.clemson.edu/pvfs/ [clemson.edu]
    • Perverted File System? Good for pr0ns I guess!!
      No offense here, it's just the first thing I had in mind. LOL
  • openmosix (Score:5, Informative)

    by joeldg ( 518249 ) on Tuesday May 13, 2003 @06:56PM (#5949874) Homepage
    I run an openmosix cluster with the openmosix filesystem here at work. Three computers.. no problems...
    If you want to take a look..
    http://lucifer.intercosmos.net/index.php [intercosmos.net]
    linkage and I am going to be placing some tutorials up. -joeldg
    • Re:openmosix (Score:3, Insightful)

      by Kz ( 4332 )
      Right!! OpenMosix is the solution.

      Using MFS, you can just have one pool of disks, memory, and CPUs, and processes will migrate to the data instead of copying the data around.

      Great system, once you settle on one version of the kernel (it has to be the same on all machines).
  • Ye olde Samba (Score:4, Informative)

    by Anonymous Coward on Tuesday May 13, 2003 @06:57PM (#5949884)
    Samba works fine. I personally have approximately 5 Samba mounts in my filesystem, totally transparent to anybody who walks up and uses my computer.

    No need to unnecessarily complicate things here, samba is simple to set up and functions great.
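
    For comparison, a minimal smb.conf plus a client-side mount; the server name and account are placeholders:

    <ECODE>
    # /etc/samba/smb.conf
    [global]
       workgroup = HOME
       security = user

    [homes]
       read only = no
       browseable = no

    # on a Linux client
    smbmount //fileserver/jane /mnt/fileserver -o username=jane
    </ECODE>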
  • You might have luck googling for "clustered filesystems" as well. Things like HP's CFS (no idea how good it is, though).

    Rus
  • ...just to keep things simple. If you need redundancy, try changedfiles [bangstate.com], it's a lot less of a hassle than intermezzo (IMO).
  • This actually brings up a question I've been wondering about for a while. Does anyone have any solutions for a mirroring file system? Basically RAID 1 over a network.

    What is a good stable solution for this? Currently I'm just using a tar over ssh once a night to do an incremental backup.
    • by dlakelan ( 43245 ) <{gro.stsitra-teerts} {ta} {nalekald}> on Tuesday May 13, 2003 @07:16PM (#5950042) Homepage
      Whoa, you definitely need Unison [upenn.edu].

      Unison will synchronize any two file trees in The Right Way (TM).

      Get the gtk version for interactive conflict resolution.

      • Whoa, you definitely need Unison [upenn.edu].

        Unison will synchronize any two file trees in The Right Way (TM).


        Well luckily it's not two-way mirroring, it's purely one way. My clients all update their website on the primary server only, and any changes are then backed up nightly to the live backup server. However, a better solution is definitely desirable.
      • Unison (Score:3, Informative)

        by brer_rabbit ( 195413 )
        Anyone with a desktop and a laptop they want to keep in sync definitely needs Unison. This is one of the coolest tools I found after I picked up a laptop.
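
        A typical two-machine run looks something like this; the ssh:// root assumes unison is installed on both ends, and the paths are made up:

        <ECODE>
        # interactive sync between the laptop's home and the desktop's copy, over ssh
        unison /home/me ssh://desktop//home/me

        # or non-interactively from cron, once you trust its conflict handling
        unison -batch /home/me ssh://desktop//home/me
        </ECODE>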
    • I know this isn't quite what you asked for, but check out http://www.mikerubel.org/computers/rsync_snapshots/ [mikerubel.org] for advice on using rsync to create "snapshot" like backups.
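
      The core of that technique is hard-link rotation plus rsync --delete, roughly as follows (directory names are only illustrative):

      <ECODE>
      # rotate: yesterday's snapshot becomes a hard-linked copy ...
      rm -rf /backup/daily.1
      cp -al /backup/daily.0 /backup/daily.1

      # ... then bring daily.0 up to date; unchanged files stay shared as hard links
      rsync -a --delete -e ssh fileserver:/home/ /backup/daily.0/
      </ECODE>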
    • by Arethan ( 223197 ) on Tuesday May 13, 2003 @07:24PM (#5950099) Journal
      I usually use rsync for one way backups, and unison where I need 2 way synchronization.
      Rsync is nice because you can update lots of files very quickly, as it only moves binary diffs between files. Also, if it is a costly network link, you have the option to specify max transfer rates, so you don't kill your pipe when it runs from your cron job.
      Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it does all of its communication via ssh, so it's secure.

      rsync [samba.org]

      unison [upenn.edu]

      The downside to both of these being that neither of them are instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
      • rsync -e ssh -azu --bwlimit=500 --stats --exclude='/proc' --exclude='/dev' / targetsystem:/targetdir/

        -e specifies the remote shell - so -e ssh means use ssh
        -a (archive mode - see docs)
        -z compression - if you have more CPU than pipe, use it, but if you are on a LAN you probably want to leave it off unless you don't mind the CPU hit (fat pipes will use more CPU time for compression)
        -u update only (don't overwrite newer files)
        --stats shows you what it did when it's done
        --exclude leaves out paths/files you want to skip
        --bwlim
        • To do a true backup, you must copy permissions. To copy permissions, the target system needs to have the same UIDs and GIDs as the source system. This is hard to do on Windows and OS X. Typical tools such as rsync, Unison and rdiff-backup make no effort to solve this problem. Suggestions?
          • To do a true backup, you must copy permissions. To copy permissions, the target system needs to have the same UIDs and GIDs as the source system.

            Use rsync. The default is to map user and group names at both ends of the connection, unless you specify --numeric-ids. Of course you have to have at least the names right, otherwise there's nothing to work with. And you need root on the receiving end, but that's also to be expected.

            I've been using rsync for some time now to manage moving research data between

  • by Dr.Zap ( 141528 ) on Tuesday May 13, 2003 @07:04PM (#5949949)
    While there is no new news posted on the site, there are current tarballs on the FTP server, as recent as 5.9.03 (but that file appears to be a re-release; the last update to the code seems to be 3.13.03).

    The sourceforge page for the project (http://sourceforge.net/projects/intermezzo) shows status as production/stable but the info there looks stale too.

  • by Rosco P. Coltrane ( 209368 ) on Tuesday May 13, 2003 @07:06PM (#5949969)
    Which offers the best stability and protection from future obsolescence?

    This guy must have installed too many versions of the same Microsoft products.
    In the GNU/Linux world, the BSD world, and to some extent in the entire Unix world, good designs do not become obsolete. Even not-so-good designs often stick around, for the sake of backward compatibility. In the newest, greatest Linux kernel, you can still have a.out support, NFS, Minix, FAT16 filesystem support ... You can still configure your networking using scripts for 2.0- or 2.2-based distros. You can often use 20 year old programs under Unix, albeit sometimes with some effort.

    Only in the M$ world is obsolescence such a big issue, because that obsolescence is planned. In short, don't worry that much about obsolescence: if Coda is as good as it looks, it'll be there for a long time. If SomeCrappyDistributedFS FileSystem is used by enough users, it'll stay around for compatibility's sake anyway, even if it sucks.

  • NFS & autofs (Score:4, Informative)

    by Greg@RageNet ( 39860 ) on Tuesday May 13, 2003 @07:11PM (#5950003) Homepage
    What you are looking for is 'autofs', which has been used extensively in Solaris and Linux for years (forever). You can set up an NFS share and then have autofs mount/unmount it on demand. The advantage is that if the share is not in use it's unmounted, and the machine will be less vulnerable to hanging if the NFS server goes down. See the AutoFS Howto [linux-consulting.com] for more information on setting it up.

    -- Greg
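
    A quick way to get that on-demand behaviour with no custom maps is the built-in -hosts map; any server's exports then appear under /net/<server> when touched and are unmounted again when idle (the 60-second timeout is just an example):

    <ECODE>
    # /etc/auto.master
    /net -hosts --timeout=60
    </ECODE>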
  • None of the above (Score:4, Interesting)

    by SlightlyMadman ( 161529 ) <slightlymadman AT slightlymad DOT net> on Tuesday May 13, 2003 @07:14PM (#5950022) Homepage
    It seems like a distributed filesystem might be overkill for your needs. If what you really want is the appearance of a single common machine, why not just pick one as a server, and set up your other boxes as X clients. You can even pull out most of their memory and storage, and stick it in the server, thus turning them all into pretty powerful machines.
  • NFS is not a DFS (Score:5, Informative)

    by purplebear ( 229854 ) on Tuesday May 13, 2003 @07:15PM (#5950031)
    Just so you all know: NFS is a network-accessible FS. A DFS can also be network-accessible from clients, but it physically resides on multiple systems.
  • You limited the possible set of answers to "for Linux." I'll ask the larger question: are there any good distributed filesystems? Good meaning: mature, stable, works well on at least one platform, and is as transparent as possible to that platform's software, within reason.

    Truth is, the only thing that resembles a distributed filesystem I have ever used is Domino. It does what I need quietly, efficiently and consistently. You can't open(...) the content you have stored from a C program (other APIs exist
  • Obsolete ? (Score:5, Funny)

    by CmdrTostado ( 653672 ) on Tuesday May 13, 2003 @07:22PM (#5950082) Journal
    Which offers the best stability and protection from future obsolescence?

    The best protection from future obsolescence is to use something that is already obsolete.
  • AFS vs NFS (Score:5, Insightful)

    by runderwo ( 609077 ) <runderwo@mail.wi ... rg minus painter> on Tuesday May 13, 2003 @07:24PM (#5950103)
    It takes more time to set up an AFS cell than an NFS server, but the rewards are pretty tremendous IMO.

    It's become such a part of my day to day life that I can't really describe the things I was missing before. The best things about it are probably the strong, flexible security and ease of administration. It also gives you everything you need from a small shop all the way up to a globally available decentralized data store.

    There seems to be a good comparison here [tu-chemnitz.de]. I would strongly recommend AFS for all of your distributed filesystem needs. (The OpenAFS developers are cool too!)

    • Re:AFS vs NFS (Score:5, Informative)

      by pHDNgell ( 410691 ) on Tuesday May 13, 2003 @07:42PM (#5950253)
      I'm disturbed at the number of people who are recommending NFS as a distributed filesystem solution. While it might be easy to get going initially, I've had more long-term problems with my NFS server and client interactions than my AFS. To get my NFS clients to behave anything like AFS clients, I had to build and install an automounter that could use NIS config.

      You only have to wait for the first day you want to reboot a fileserver without breaking every system on your network or waiting for startup dependencies, etc... One day, I moved all of the volumes off of an active fileserver (i.e. volumes being written) and shut the thing down and moved it to another machine room, brought it back up, and moved the volumes back. The reads and writes continued uninterrupted, no clients had to be restarted, no hung filesystems anywhere, etc...
      • Yes, I too am disturbed. It goes to show how little people really know about enterprise scale file systems.

        Mod this guy up!

  • Tutorial (Score:5, Informative)

    by TheFlu ( 213162 ) on Tuesday May 13, 2003 @07:25PM (#5950106) Homepage
    I just went through this process a few weeks ago and I must say I'm really glad I went through the trouble of setting it up...it's very cool. I actually wrote a tutorial [thelinuxpimp.com] about how to accomplish this by using NIS and NFS. I hope you find it helpful.

    The only trouble you might run into with the setup I used is some file-locking issues with programs wanting to share the same preference files.
    • SECURITY (Score:2, Insightful)

      ... and the gaping wide security hole that is NFS.

      "Hello, I'm user ID 500 and I'd like my home directory ... thanks ... my accounting data now!".

      NFS doesn't actually have security anymore; it hasn't since IP-capable machines became physically portable, but more importantly since the assumption that every box would have a trusted admin became invalid ... about 15 years ago.

      KILL NFS, we need something that doesn't suck.
  • unison, anyone? (Score:2, Informative)

    by gooofy ( 548515 )
    The problem with these distributed file systems seems to be that they're either pretty old and lacking features like disconnected operation (AFS), or unstable or, even worse, unmaintained (Intermezzo, Coda).
    For many simple purposes backups can be done quite nicely using rsync or something like Bacula [bacula.org]. For laptop/notebook support unison [upenn.edu] is definitely worth a look. It syncs directories like rsync does, but in both directions. Works nicely for me.
  • by danpat ( 119101 ) on Tuesday May 13, 2003 @07:50PM (#5950313) Homepage
    I've spent quite some time researching this issue here at work. We have two primary offices, separated by 256k of network topology. Too slow for most users to find acceptable (large files, several tens of seconds to copy). A bit of a culture problem, but oh well.

    I looked into a whole pile of options for having a "live" filesystem, a-la NFS, but the bandwidth killed interactivity (this is for users who've never used 100mbit network filesystems before).

    I found the following:

    1. Windows 2000 Server includes a thing called "File Replication Service". Basically, it's a synchronisation service. You replicate the content to many servers, and the service watches transactions on the filesystem, and replicates them to the rest of the mirrors as soon as it can. You can write to all mirrors, but I never quite worked out how it handled conflict resolution.
    A chapter from the Windows 2000 Resource kit that describes how it works: http://www.microsoft.com/windows2000/techinfo/reskit/samplechapters/dsdh/dsdh_frs_tkae.asp

    2. Some people have done similar work for Unix systems, but they mostly involve kernel tweaks to capture filesystem events. Can't remember any URLs, but some Googling should find it.

    3. Some people are using Unison to support multi-write file replication. So long as you sync regularly, you shouldn't have too many problems.

    4. The multi-write problem is a hard one, so most people tend to say "don't do it, just make the bandwidth enough". This is the way to go if bandwidth isn't an issue.

    A guy by the name of Yasushi Saito has done quite a bit of research into data replication (search for his papers on Google in quotes). He also put together a project called "Pangaea" which tries to do what's described above. It wasn't great last time I looked. Some paper titles:

    - Optimistic Replication for Internet Data Services
    - Consistency Management in Optimistic Replication Algorithms
    - Pangaea: a symbiotic wide-area file system
    - Taming aggressive replication in the Pangaea wide-area file system

    There is also a bunch of other research work:

    - Studying Dynamic Grid Optimisation Algorithms for File Replication
    - Challenges Involved in Multimaster Replication (note: this talks about Oracle database replication)
    - Chapter 18 of the Windows 2000 Server manual describes the File Replication Service in detail
    - How to avoid directory service headaches (talks about not having multi-master-write replication and why)
  • My university [umr.edu] uses AFS as well. From a user standpoint, once everything is set up it works great. They've got it seamlessly integrated into all the Windows, Linux, and Solaris boxen on campus using OpenAFS and Kerberos.

    I had no complaints with it at all, until I tried to get a FreeBSD machine working with AFS. For starters, OpenAFS doesn't have a FreeBSD port. I've heard rumors of one in the works, but I haven't seen anything useful in the last year. I did stumble across a project called arla however,
  • OpenAFS all the way (Score:5, Informative)

    by fsmunoz ( 267297 ) * <fsmunoz@m[ ]er.fsf.org ['emb' in gap]> on Tuesday May 13, 2003 @08:10PM (#5950449) Homepage
    I had more or less the same basic requirements and I opted for AFS.

    My needs were a little more demanding (it had to be implemented on GNU/Linux, Solaris, AIX, HP-UX and, as an extra, Windows 2000) and grokking AFS can be difficult at first, but it was the best choice by far. Stable across all the Unices, very secure (this was another requirement) and integrates perfectly with our Kerberos Domain and LDAP accounting info. It provides a unique namespace that can span multiple servers transparently, does replication, automatic backups and read-only copies, client-side cache with callbacks, has a backup (to tape) system that can be used stand-alone or integrated with existing backup structures (Amanda, Legato, TSM) AND was the basis for the DCE filesystem, DFS (as a side note I find it interesting - and sad - that most things people try to emulate these days are present in DCE [opengroup.org], and Windows 2000 got many of its "new features" from a technology initially made for Unix: DFS, DCOM, Directory Services, SSO, DCE-RPC, etc.)

    AFS is amazing and much more robust than any distributed filesystem I know of; it has shortcomings when servers time out, but apart from that it's really an excellent solution. An example I generally use to give an idea of some of the good features of AFS is the relocation of a home directory to another server. The user doesn't even notice that his home directory was moved to another server *even if he was using it and writing stuff to disk*; at most, all writes to his home dir see a small delay (a couple of seconds), even if the home dir is 5 GB worth.

    Kerberos integration is an added bonus; if you can, use this as an excuse to kerberize your systems and form a Kerberos Domain. If you don't want to, just stick with the standard AFS KA server.

    In my setup I have Windows users accessing their home dirs in AFS using the Kerberos tickets they get from the Windows login and the fact that a cross-realm trust was made between the Unix domain and the AD; they can edit all the files they are entitled to with that ticket, and the system is so secure that Transarc used to put the source code in its public AFS share and added the customers that bought the source to the ACL of the directory that contained it.

    With all these features it would be hard not to heartily recommend OpenAFS [openafs.org] as the best solution for a unified, distributed filesystem. Bandwidth utilization is, in my experience, at least half of what NFS uses, which is an added bonus.

    cheers,

    fsmunoz
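
    The home-directory relocation described above is a single administrative command; the cache manager follows the volume on its own. The volume, server, and partition names below are hypothetical:

    <ECODE>
    # move user.fsmunoz from afs1:/vicepa to afs2:/vicepb, while it's in use
    vos move -id user.fsmunoz -fromserver afs1 -frompartition /vicepa \
             -toserver afs2 -topartition /vicepb
    </ECODE>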
    • by MilliAtAcme ( 232737 ) on Tuesday May 13, 2003 @11:32PM (#5951669)
      I second this "all the way" thought. I've been running OpenAFS for almost 2 years now on Debian GNU/Linux (many Thanks to Sam Hartman, the maintainer) and have never been disappointed. It's been pretty darn solid and, most importantly, has never lost any of my data through various upgrade cycles. It's a bit of a change in thinking, however, for those coming from an NFS background.

      There were three big wins for me...

      (1) Global file namespace managed server-side and accessible from anywhere... LAN, WAN, whatever. All clients see files in the same location.

      Unlike NFS, where you have to "mount" volumes within the file system on each client, the AFS file system is globally the same, living under "/afs", so every client accesses the same information via the same file system path. A notion of "cells" makes this possible... information under a single administrative authority lives in a "cell", e.g., "/afs/athena.mit.edu" is the top-most "mount point" for a well-known cell at MIT. Volumes, in AFS parlance, also aren't tied to any particular server or even location in the name space as far as the clients know. A client doesn't have to know explicitly in its configuration which server a given bit of information lives on, and that data can be moved around behind the scenes as necessary (the volume space increased, redundancy added, the volume taken offline, etc.). All volume mounts are handled server-side. The clients only have to know about the cell database server, and that can be determined via AFSDB records in DNS. (I.e., your AFS "cell" name matches up with your domain name, e.g., /afs/athena.mit.edu matches up with "athena.mit.edu" in DNS.) So almost all management aspects are handled server-side.

      (2) Client side implementations.

      All my Linux and Windows machines can access the same AFS file space. An OS X client is available too, but I've not needed that to date, but might someday. I thus have all home directory information, as well as a lot of binaries, living in the AFS file space, in one place. And behind the scenes, that info is on multiple AFS servers that have RAID-5 disk arrays and weekly tape backups going on.

      (3) The file system "snapshot" feature, for backups.

      You can take a snapshot of volume(s) at a particular point in time and roll them onto tape without needing to take them offline. You don't have to worry about inconsistencies in the files. Folks can continue to update files but the backup snapshot doesn't change. Very much the same as the snapshot feature on Netapps. These snapshots, called backup volumes, can even be mounted in the file space so folks can get access to the old view of the volume, e.g., if they accidentally deleted a critical file and need it back.

      And security via Kerberos is nice, especially if you already have an infrastructure. But it's not too hard to set up a single KDC to get started. In the Debian distribution docs for OpenAFS, there's a setup and configuration transcript that makes things relatively easy and clears up a lot of questions.

      In summary, OpenAFS is a very good solution here.
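
      The snapshot workflow in (3) boils down to two commands; the volume and path names here are made up:

      <ECODE>
      # create/refresh the read-only .backup clone of a volume
      vos backup user.jane

      # optionally expose it inside the user's own tree, e.g. as "OldFiles"
      fs mkmount /afs/example.org/home/jane/OldFiles user.jane.backup
      </ECODE>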
  • The guy wants to be able to do things like disconnected operation and file sharing over a WAN. NFS is totally unsuitable for either of those, as it provides neither distributed file service (if the server you are getting a file from goes down, you lose) nor disconnected operation.

    NFS is also not a distributed/global file system. It is a pretty primitive way to handle global namespace management compared to stuff like AFS. At best what an automounter lets you do is avoid a few of NFS's problems. Ideally, I'd s
  • by dargaud ( 518470 ) * <slashdot2@@@gdargaud...net> on Tuesday May 13, 2003 @08:19PM (#5950495) Homepage
    May I suggest something more than this? If you have static IPs and you are running Linux, why not install OpenMosix [sourceforge.net]? It's a cluster patch to the kernel, very easy to install and use. Not only does it turn your pile of hardware into one giant SMP system, it also comes with a special filesystem on top of ext3, so that you can see all drives from all nodes. I have it running on 24 processors, but I don't know how well it would perform over the Internet.

    It's been featured [slashdot.org] on [slashdot.org] slashdot [slashdot.org] before [slashdot.org].

  • Well (Score:2, Insightful)

    by mindstrm ( 20013 )
    I think you should clarify what you mean by "distributed"... because that word is going to cause a lot of confusion.

    IF you want a few Linux boxes to all basically share a lot of files, so you can log into any one, do whatever, and only install stuff once... NFS is fine, if it's just on a private network just for you.

    NFS is not considered a "distributed" filesystem... but I'm not sure that's what you want anyway.
  • Gawd (Score:3, Funny)

    by The Bungi ( 221687 ) <thebungi@gmail.com> on Tuesday May 13, 2003 @08:49PM (#5950655) Homepage
    I have several GNU/Linix machines

    I'm vaguely sure this is a brand new affront to RMS, but I just can't put my finger on it.

  • by elronxenu ( 117773 ) on Tuesday May 13, 2003 @09:15PM (#5950832) Homepage

    Why not stick with NFS for the time being?

    I went through the "is Coda right for me?" phase, and also "is Intermezzo right for me?", and spent tens of hours researching distributed filesystems and cluster filesystems online ... my conclusion is that the area is still immature; I will let the pot simmer for a few more years (hopefully not many) and use NFS in the meantime.

    My situation: I want a scalable and fault-tolerant distributed filesystem for home use with minimal maintenance or balancing effort. Emphasis on scalable: I want to be able to grow the filesystem essentially without limit. I also don't want to spend much time moving data between partitions. And last but not least, the bigger the filesystem grows, the less able I will be to back it up properly. I want redundancy so that if a disk dies the data is mirrored onto another disk, or if a server dies then the clients can continue to access the filesystem through another server.

    All that seems to be quite a tall order. I checked out Coda, AFS, PVFS, SGI's XFS, Frangipani, Petal, NFS, Intermezzo, Berkeley's xFS, JFS, Sistina's GFS and some project Microsoft is doing to build a serverless filesystem based on a no-trust paradigm (that's quite unusual for Microsoft!).

    Berkeley's xFS (now.cs.berkeley.edu [berkeley.edu]) sounded the most promising but it appears to be a defunct project, as their website has been dead ever since I learned of it, and I expect the team never took it beyond the "research" stage into "let's GPL this and transform it into a robust production environment". Frangipani sounds interesting also, and maybe a little more alive than xFS.

    On the other hand Coda, AFS and Intermezzo are all in active development. AFS IMHO suffers from kerberitis, i.e. once you start using Kerberos it invades everything, and it has lots of problems (which I read about on the OpenAFS list every day). AFS doesn't support live replication (replication is done in a batch sense) either.

    CODA doesn't scale and doesn't have expected filesystem functionality: for 80 gigs of server space I would require 3.2 gigs of virtual memory, and there's a limit to the size of a CODA directory (256k) which isn't seen in ordinary filesystems. There's also the full-file-download "feature". CODA is good for serving small filesystems to frequently disconnected clients but it is not good for serving the gigabyte AVIs which I want to share with my family.

    Intermezzo is a lot more lightweight than CODA and will scale a lot better, but it's still a mirroring system rather than a network filesystem. I might use that to mirror my remote server where I just want to keep the data replicated and have write access on both the server and the client, but it's again not a solution for my situation.

    The best thing about intermezzo is that it sits on top of a regular filesystem, so if you lose intermezzo the data is still safe in the underlying filesystem. CODA creates its own filesystem within files on a regular filesystem, and if you lose CODA then the data is trapped.

    Frangipani is based on sharing data blocks, so like NFS it should be suitable for distributing files of arbitrary size. I need to look at it in a lot more detail; this is probably the right way to build a cluster filesystem for the long haul. For the short term, Intermezzo is probably the right way for a lot of people: it copies files from place to place on top of existing filesystems.

    What I did in the end:

    • new server (Celeron 1.3 GHz, 512 meg RAM)
    • 2 x 80 gig IDE disks
    • Each IDE drive has 2 partitions (one small, one huge)
    • Each partition is RAID-1 mirrored with its partner on the other disk
    • The huge RAID partition is defined to Linux LVM (logical volume manager)
    • Logical volumes are created within that for root, /home, etc...
    • All logical volumes are of type ext3 for recoverability.

    The way it works is tha
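
    For anyone wanting to reproduce that layout, the commands are roughly these; the device names and sizes are illustrative, and raidtools users would describe the md devices in /etc/raidtab instead of using mdadm:

    <ECODE>
    # mirror the two big partitions (the small pair gets a similar md0)
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hda2 /dev/hdb2

    # put LVM on top of the mirror and carve out logical volumes
    pvcreate /dev/md1
    vgcreate vg0 /dev/md1
    lvcreate -L 20G -n home vg0

    # ext3 for recoverability, as above
    mkfs.ext3 /dev/vg0/home
    mount /dev/vg0/home /home
    </ECODE>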

  • Linix? (Score:4, Funny)

    by CaptainSuperBoy ( 17170 ) on Tuesday May 13, 2003 @09:29PM (#5950936) Homepage Journal
    Um.. Linix? Learn the name of your fucking operating system, to start off with. It's spelled L-U-N-I-X.
  • WebDAV (Score:4, Interesting)

    by g4dget ( 579145 ) on Tuesday May 13, 2003 @09:36PM (#5950989)
    Right now, I think the answer is to run NFS: it's by far the easiest to set up and the best in a UNIX environment. AFS, CODA, Intermezzo, and SMB are pretty iffy in comparison.

    In the medium term, however, I think WebDAV will become a better option, because it can be served and accessed with standard web servers and clients, in addition to being mappable onto the file system.

    The Linux kernel already has WebDAV support (CODA hooks plus some user-mode process), although I'm not sure how well it works.

  • by Sri Ramkrishna ( 1856 ) <sriram.ramkrishna@gmail. c o m> on Tuesday May 13, 2003 @10:01PM (#5951144)
    Watch for the new version of NFS, NFSv4. There is already a sample implementation in the Linux 2.5 tree. NFSv4 will address most of the problems that NFSv3 and others have, including pluggable security models, namespace, and revamped ACL handling.

    It's also WAN-friendly, letting several operations be done at the same time with a single request (the COMPOUND directive). It also allows you to migrate one filesystem to another with no stale filehandles. Basically, it's trying to be an AFS killer.

    For more information, take a look at
    http://www.nfsv4.org/

    Lots of good info including the IETF spec. It's an interesting read.

    The spec is not quite complete. Currently, I believe there are discussions about how NFSv4 will work with IPsec.

    Cheers,
    sri
  • Reasons why (Score:3, Informative)

    by photon317 ( 208409 ) on Wednesday May 14, 2003 @01:18AM (#5952184)

    There's some reasoning behind the lack of big interest in distributed filesystems.

    1) Obviously, NFS continues to be a passable solution where you don't really need "distributed" so much as "universally network accessible in a simple way".

    2) For things where you truly want "distributed" access from multiple machines that are local to each other, there's a somewhat less complicated solution, which is to use shared storage. The idea is to attach all the machines to a SAN-style network (fibre channel, or hey, even FireWire these days) and use a sharing-aware filesystem that allows simultaneous access to the storage/filesystem from multiple hosts, with sane locking and whatnot. One of the better places to look for this is otn.oracle.com - they've released a cluster filesystem *and* drivers for FireWire shared storage (which is cheaper than fibre channel) for Linux.

    Of course, that leaves out the case of a distributed filesystem for machines that can't be on a SAN together for distance or economic reasons. In that case you could of course hack something up using a cluster-filesystem type of filesystem and SCSI-over-IP or something like that, I guess, or use one of the experimental distributed filesystems you mention... but the market just isn't big enough for this yet to drive much.
  • by Sloppy ( 14984 ) * on Wednesday May 14, 2003 @11:24AM (#5954980) Homepage Journal
    ..is that apparently it doesn't use unix-style file permissions. "ACLs are better" you might say, but still, it's different. It sounds like using it would not be transparent -- not just for the admin who has to learn how to set it up but also for users and existing software and scripts, which assume the chmod way of doing things.

    I mean, if I use AFS, does that mean from now on, every time I run an install script for some random package that chmods something, I have to realize that the script doesn't really work, and then I have to analyze its intent and then do some ACL thing that accomplishes the same intent? Ugh, I am not interested in things that create more work for humans.

    Another annoying-looking thing is that it's really a filesystem, even from the servers' point of view. Unlike sharing/exporting services such as NFS and Samba, which you can run on top of your choice of filesystem (ext3, Reiserfs, xfs, etc), it appears that AFS/OpenAFS combines both the disk and the network topics. That means you don't get the advantages of all the great work the filesystem geeks have been doing in the last few years.

    It almost strikes me as inelegant design or something, that a single project concerns itself with both the details of how things are laid out on a disk, and also how to do network-related things such as replication. Somebody made their black box too big for my tastes.

    Am I wrong about all this?
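
    For what it's worth, the chmod-equivalents are the fs ACL commands, which operate per directory; the path and user below are invented:

    <ECODE>
    # see who can do what in a directory
    fs listacl ~/private

    # grant a user read/list rights, and take "all other users" away
    fs setacl ~/private alice rl
    fs setacl ~/private system:anyuser none
    </ECODE>

    As I understand it, AFS still honours the owner mode bits on files, so a plain chmod isn't entirely meaningless; it's the group/other bits that give way to the directory ACL.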

  • I use CXFS at work (Score:3, Interesting)

    by leeet ( 543121 ) on Wednesday May 14, 2003 @02:31PM (#5956731) Homepage
    I guess this doesn't really apply to "home usage", but I have to manage a lot of machines over a SAN, and if you don't want people screwing up your SAN, you'd better use something like CXFS.
    CXFS uses a sort of token technique and allows multiple file accesses. That way, we get the same files on all the machines but without the NFS overhead and network congestion. File reads/writes are done over the fibre channel switch and the "metadata" is done over a private network. This is WAY faster than NFS over Gigabit Ethernet. One good thing about CXFS is the redundancy possibility. You can have failover servers and other neat things.

    The only drawback is that you need an SGI server, but then you can use Windows and Solaris clients. Very stable, but probably not for home use :)