Distributed Filesystems for Linux? 375
Zoneball looked at 3 distributed filesystems, here are his thoughts:
" Open AFS was the solution I chose because I have the experience with it from college. For performance, AFS was built with an intelligent client-side cache, but did not support network disconnects nicely. But there are other alternatives out there.
Coda appears to be a research fork of an earlier version of AFS. Coda supports disconnected operation. But the consensus on Usenet (when I looked into filesystems a while ago) was that Coda was still too 'experimental.'
Intermezzo looks like it was started with the lessons learned from Coda, but (again from Usenet) people have said it is still too unstable and crashes their servers. The last 'news' on their site is dated almost a year ago, so I don't even know whether it's still being developed."
So if you were to recommend a distributed filesystem for Linux machines, would you choose one of the three filesystems listed here, or something else entirely?
NFS (Score:4, Informative)
NFS Linux FAQ [sourceforge.net]
Howto #1 [sourceforge.net]
Howto #2 [linux.org]
If you find yourself needing help, try asking people at Just Linux forums [justlinux.com], or trying the NFS mailing list [sourceforge.net].
Re:NFS (Score:4, Informative)
It takes about 5 minutes to get an understanding of what you need. Once it's set up, it just works.
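As a sketch of how quick the setup is (the server name, client network, and paths below are hypothetical, and the commands need root):

```shell
# On the server: export a directory, then reload the export table.
echo '/srv/share 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra

# On each client: mount the export.
mount -t nfs fileserver:/srv/share /mnt/share
```

That's really all there is to it for a small trusted LAN.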
NFS is a great
Re:NFS (Score:3, Insightful)
permissions? (Score:2)
Re:permissions? (Score:5, Informative)
NIS == "Hack me please" (Score:5, Interesting)
Other options like LDAPS and Kerberos offer at least some form of security.
ypcat followed by a brute-force attack on the resulting passwd file is as old as dirt, and sadly it still works. I was a bit disappointed when I saw NIS as a required service on the Red Hat cert syllabus.
This may sound harsh, but I don't think there is much excuse for running NIS in this day and age. Anyone who does this in an environment where security is a concern deserves what they get.
Re:NIS == "Hack me please" (Score:3, Informative)
Re:NIS == "Hack me please" (Score:3, Informative)
NIS is simple and easy to maintain. LDAP is harder. From memory (10 years ago), Kerberos was geared towards a single user on a single machine; is that still the case?
Lots of big organizations still use NIS because its flaws, while real, are well understood.
Re:NIS == "Hack me please" (Score:3, Informative)
It *does* have flaws (I'd prefer it did something similar to AFS's PAG-based authentication, such that tokens are per process group rather than for all instances of a given UID on a box -- and a malicious root can trivially steal tickets for all users who have valid ones on
NIS!???? (Score:2)
Re:permissions? (Score:5, Informative)
Could also look into LDAP (VERY complex, no good starting point that I've been able to find) and Kerberos auth methods as well.
Should give you a central point for uids/usernames. But NFS does not have transparent mounting that I'm aware of, so that you could mount, say, the
<ECODE>
CPU1 contains:
CPU2 contains:
CPU3 contains:
on CPU4, you'd do the following:
mount CPU1:/home
mount CPU2:/home
mount CPU3:/home
And you'd end up with on CPU4:
/home/tic
</ECODE>
If there is a way to do this, please lemme know. I've heard people talk about it in the past, but haven't seen anything come of it yet.
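For what it's worth, the automounter can approximate this: an autofs map mounts each user's home directory from the right server on demand, so every machine sees the same /home. A sketch (the CPU1..CPU3 server names come from the example above; the user names besides "tic" are made up):

```
# /etc/auto.master -- hand /home over to the automounter
/home   /etc/auto.home

# /etc/auto.home -- one entry per user, pointing at that user's server
tic     CPU1:/home/tic
alice   CPU2:/home/alice
bob     CPU3:/home/bob
```

Each entry is mounted lazily, the first time someone looks up that path.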
Re:permissions? Automounter (Score:4, Informative)
Re:NFS (Score:5, Insightful)
NFS is not distributed; it's only "networked" or "remote". It doesn't support replication, disconnection, sharing, or distribution. It is centralised, and it requires the same user name/number space and security everywhere.
In short, it falls far short of the requirements, at least if you compare them with the filesystems listed in the question.
Yeah (Score:2)
Re:NFS (Score:3, Insightful)
Of course it is. It gives you a single, unified view of a file system tree that can span many machines.
It doesn't support any: replication, disconnection, sharing, distribution.
Sure it does. Some of that functionality requires more than a vanilla NFS server, but that can be transparent to clients.
It is centralised, requires the same user names|numberpace and security.
Older versions did, current versions don't.
Don't get me wrong, NFS has lots of problems in many environme
Re:NFS (Score:3, Informative)
Only if your mount tables are the same everywhere, and they need to be kept in sync on the client side. By contrast, in AFS, change where a volume is mounted anywhere -- and it's changed everywhere. Add a new volume on a different server on one client, and it's there in that same place on all of them. No mucking with the automounter, no distributing config files to all your machines, none of that mess.
AFS makes admin
Re:NFS (Score:4, Informative)
That's what NIS is for. Furthermore, the flexibility of being able to set up machines with different views of the network is crucial in many applications. None of my workstations or servers actually have the same mount tables: they all get some stuff via NIS, and some stuff is modified locally. The restrictions AFS imposes are just unacceptable.
AFS makes administration tremendously easier after one's scaled the initial learning curve.
AFS is an administrative nightmare. Apart from the mess that ACLs cause and the problems of trying to fit real-world file sharing semantics into AFS's straightjacket, just the number of wedged machines due to overfull caches and its complete disregard for UNIX file system semantics cause no end of support hassles. Then, there is the poor support for Windows clients. We started out using AFS because it sounded good on paper, but it was a disaster in terms of support, and we got rid of it again after several years of suffering.
It performs far, far better than NFS on large networks (and merely somewhat better on smaller ones).
AFS's caching scheme works better than what NFS is doing for small files, but that case is fast and easy anyway. AFS's approach falls apart for just the kind of usage where it would matter most: huge files accessed from many machines.
Both NFS and AFS have very serious problems. But between the two, NFS is far simpler than AFS, is easier to administer in complex real-world environments, respects UNIX file system semantics better, and works better with large files. I can guardedly recommend NFS or SMB ("there is nothing better around, so you might as well use it"), but I can't imagine any environment for which AFS is a reasonable choice anymore. The only thing AFS had ever going for it as far as I'm concerned is that it was fairly secure at a time when NFS had no security whatsoever, but that is not an issue anymore.
Re:NFS (Score:3, Informative)
Really, now? Tell me what you're trying to do that AFS won't allow you (not *how* you're trying to do it, but *what* you're trying to do), and I'll tell you how to do it with AFS.
Apart from the mess that ACLs cause and the problems of trying to fit real-world file sharing semantics into AFS's straightjacket
WHAT?! I could say the same thing about UNIX's user/group/world semantics, and far more defensibly. ACLs allow all sorts of useful things; I can ha
Re:NFS (Score:4, Funny)
Wouldn't it be simpler and easier to manage if users had to sign up for computer time on a mainframe? Just think: you would only have to support one system! The benefits to security and maintenance would be enormous. Letting users have their own computers seems nice, but since it requires less planning and thinking (as a mainframe timeshare system requires) it will always become unmanageable. After all, there's no way to plan for the use of advanced tools. Why do you think many larger 1970s corporations running large computer implementations have a policy of not allowing any employee to access the mainframe without signing up first?!?
(Note for the humour impaired: I'm parodying the above author's style.)
Symlinks are your friend! (Score:4, Insightful)
But for a lot of applications, you simply don't need that much, and you've got some way to contain the security risks, and NFS can be enough. It's easy enough to set up, and if all you're *really* trying to do is make sure that everybody sees their home directory as /home/~user, and sees the operating system in the usual places and the couple of important project directories as /projecta and /projectb, NFS with an automounter and a bunch of symlinks for your home directories is really just fine. They hide the fact that users ~aaron through ~azimuth are on boxa and ~beowulf through ~czucky are on boxbc etc. And yes, there are times you really want more than that, and letting your users go log onto the boxes where their disk drives really are to run their big Makes can be critical help. But for a lot of day-to-day applications, it really doesn't matter so much.
Re:NFS (Score:4, Informative)
Disconnection in a DFS implies a certain degree of replication: you are still able to work on your files even if you have no access to your repository, or you are offline. Autofs doesn't do that; although you can use some rsync scripts to partially solve the problem, that's not a scalable or viable workaround for several users.
NIS, on the other hand, is not a good solution for WAN connections or different networks. If you need that kind of solution, I'd take a look at OpenLDAP instead.
Re:NFS (Score:2)
He was talking about disconnected operation, not automounting. Disconnected or offline operation means you can still access the data without connecting to the server (or a peer as the case may be). Totally different from automounting. As far as I know you can't do this with NFS.
Re:NFS (Score:5, Interesting)
The only file system that is truly distributed, with a global namespace, replication, and fault tolerance, is AFS.
NFS is pretty much the same as CIFS for Windows. And, version 4 still doesn't have global namespace and volume location.
So, NFS can't be a common answer because it isn't even allowed to be in the game.
+4 cents.
NFS is not even close to secure (Score:5, Interesting)
That's like saying "jumping off a cliff is not the most intelligent thing to do." NFS is easily the LEAST secure option of ANY filesharing system.
NFS is only appropriate on a 100% secured(physical and network-level) network. If anyone/someone can plug in, forget it. If anyone has root on ANY system or there are ANY non-unix systems, forget it. If ANY system is physically accessible and can be booted off, say, a CDROM, forget it. The only major security tool at your disposal is access by IP, which is pathetic. Oh, and you can block root access.
Even though you can block root access for some/all clients, it's still massively insecure, and this remains NFS's greatest problem. You have zero way of authenticating a system. NFS is like a store where you could walk in, pick up any item you wanted, and say "I'm Joe Shmoe, bill me for this!" and they'd say "Right-o!" without even looking at you. All systems with the right IPs are explicitly trusted, and their user/permissions setups are also explicitly trusted.
NFS is a pretty good performer, especially when tuned right and on a non-broken client (which Linux is VERY far from). However, its entire security model is in dire need of a complete overhaul. There needs to be a way to authenticate hosts, for one, more similar to WinNT's domain setup, which is actually incredibly intelligent (aside from the weak LANMAN encryption). The administrative functionality in NFS can't compare to the features that have been available to MacOS and Windows administrators for over a decade, and it's purely embarrassing.
Either that, or AFS/Coda need to get a lot more documentation and (for Coda) implementation fixes. The unix world desperately needs a good filesharing system...
Re:NFS is not even close to secure (Score:5, Informative)
I use a very simple script to help keep NFS secure:
Basically it only allows incoming NFS-related connections over ipsec, dropping anything that is not. NFS port allocation is dynamic by default and I know you can force ports, but this seemed far easier to scale.
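The original script wasn't quoted, but the idea can be sketched with a few iptables rules (the ipsec0 interface name is an assumption from FreeS/WAN-style setups, and only the main nfsd port 2049 is shown for brevity; mountd/statd/lockd ports would need the same treatment):

```shell
# Accept NFS traffic only when it arrives through the IPsec interface,
# and drop NFS aimed at anything else. Run as root.
iptables -A INPUT -i ipsec0 -p tcp --dport 2049 -j ACCEPT
iptables -A INPUT -i ipsec0 -p udp --dport 2049 -j ACCEPT
iptables -A INPUT -p tcp --dport 2049 -j DROP
iptables -A INPUT -p udp --dport 2049 -j DROP
```

The win over port-pinning every NFS daemon is that the filter doesn't care which dynamic ports the other daemons land on, as long as the same accept/drop pairs cover them.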
One thing I have noticed (and perhaps it's common knowledge to NFS experts) is that in order to get locking to work at all, my NFS clients had to be running statd and lockd. Without 'em everything worked but locking would fail every time.
Re:NFS is not even close to secure (Score:4, Interesting)
Of course, that doesn't mean it's a good idea. I think your solution with IPSec is much more elegant. Unfortunately, I happen to need to get through a heavily packet-shaped network that massively favors port 80, and drops random packets everywhere else. Not IPSec friendly at all. I avoid this by running multiple ppp/ssh tunnels through the retarded parts of the network and letting my gateway balance between them. Unfortunately, this requires privileged accounts on many, many boxes in odd places.
By the way, 10 points to any Northeastern University students who send polite, well considered complaints to Network Services. Not RESNet - they exist only to prevent you from talking to Network Services. Don't bother yelling at them - they exist specifically for that purpose. RESNet has no authority whatsoever to, for instance, allow CVS to work when Network Services decides to drop 90 percent of packets on port 2401. This is for your benefit - I'm perfectly happy with my tunnels.
Re:NFS is not even close to secure (Score:4, Informative)
By that you mean that it's easy to read stuff out of people's directories if you can spoof their UID. Sure. I think you'll find the same is true on an SMB network.
> The administrative functionality in NFS can't
> compare to the features that have been available
> to MacOS and Windows administrators for over a
> decade,
Given that 10 years ago Windows for Workgroups had hardly been released and didn't even ship TCP/IP by default, I think you are exaggerating a little bit. At the same time, MacOS 7 was the norm, and we all know how secure that one was, right?
Maybe NFS4 [samba.org] is your answer?
Re:NFS is not even close to secure (Score:5, Interesting)
--Bruce Fields
Re:NFS is not even close to secure (Score:3, Informative)
http://docs.sun.com/db/doc/816-7125/6md5dsnvv?a
http://nscp.upenn.edu/aix4.3html/aixbman/comm
http://docs.sun.com/db/doc/805-72
Re:NFS is not even close to secure (Score:3, Interesting)
-- Agthorr
Re:NFS (Score:5, Insightful)
'jfb
Re:NFS (Score:3, Insightful)
This is what immediately came to mind for me too. Except for one thing. NFS is not a distributed filesystem. It's merely a network filesystem. The data itself actually resides only in one central place, and is not distributed in any way. Storage is not shared across machines, and therefore NFS is limited, in performance and redundancy, to the levels that single storage point represents. If it's an infinitely scalable, fault-tol
Yup NFS (Score:2, Informative)
Re:Yup NFS (Score:2)
AFS on the other hand has volume location independence, client-side-cache, token based ACL security, global namespace, yata, yata, yata. If you think NFS is distributed, you are still in your crib compared to real enterprise filesystem administrators.
Go get another
Nope, not NFS...yes AFS... (Score:4, Informative)
NFS is not secure. At most sites, NFS is exported read-only and limited to the domain, or to a given set of machines. If you export NFS read/write, then the client had better be secured, or you'd better use Kerberos, and you'd for damn sure better be behind a firewall. NFS has no client-side cache, no volume location service, no ACLs, no authentication (unless kerberized), no replication, yata, yata, yata. We've used NFS sparingly for over 15 years because we *know* how it works, and know its limitations.
On the other hand, we set up an AFS cell for enterprise-scale application and data sharing. It currently uses Kerberos V authentication, and has volume replication, a global namespace, a client cache, and fault tolerance. Users can set up their own groups and set their own ACL permissions. Did I mention quota? AFS has per-user/per-volume quotas. Hey, guess what: symbolic links work from any volume to any volume on AFS. And AFS is just a simple daemon. You crank it up, mount the top of your cell, and poof, you are done.
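A sketch of the per-directory ACLs and per-volume quotas mentioned, using the standard AFS `fs` client utility (the cell, volume, and user names are hypothetical):

```shell
# Grant another user read+lookup rights on a directory in your home volume.
fs setacl /afs/example.com/home/alice/shared bob rl
# Show the resulting ACL for that directory.
fs listacl /afs/example.com/home/alice/shared
# Show the per-volume quota and usage.
fs listquota /afs/example.com/home/alice
```

Note that AFS ACLs are per directory, not per file, and the user doing `setacl` doesn't need to be an administrator, just the owner of those rights.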
Another positive is the fact that once you setup an AFS cell you automatically become part of a larger community. Any AFS cell can mount the entire file system of another AFS cell within the same tree. I can for example mount many large university and government cells and share files. AFS allows Internet-wide file sharing with full security. On most versions of the client you can even enable encryption on the connection so your files won't be snooped easily.
All of our Solaris, Windows, Linux, and Mac boxes can use the same AFS tree without blinking an eye. We use AFS for many things. Before LDAP was really worth anything, we used AFS simply for exchanging read-only data. It -is- a replicated and global file system! Just put your config files in the tree and you are done.
If you are one of those people who are blinded by "always doing things one way", then I'd suggest you wake up and smell another technology, I did, and I liked what I got in return. Look into OpenAFS, you'll be glad you did.
+10,000 karma points!
Re:Yup NFS (Score:2, Informative)
Hell, if you're going to go to the trouble, you can just use the rsync method and not NIS! But I digress...
Self Certifying File System (Score:5, Informative)
Re:Self Certifying File System (Score:4, Interesting)
Highly recommend checking it out. Mega convenient.
Well it depends... (Score:5, Informative)
Since openafs [openafs.org] forked from the old transarc/IBM codebase, it looks as if it has a real future. It's used by a load of educational and research institutions (notably CERN), as well as Wall Street firms.
Re:Well it depends... (Score:2)
Re:Well it depends... (Score:2)
No kidding. I had an account where I could "cd /afs" and get into at least twenty universities. (I was trying to use some software that a student put on the web, and I needed to recompile it but he refused to put up the source. So I cd'ed into his home directory at his school and copied it. For my personal use only, not to distribute.)
One nasty glitch is that (at least in some installations) AFS
NFS/BOOTP (Score:3, Informative)
Just my $00.2
Rus
Re:NFS/BOOTP (Score:2)
25 cents - $ (octal) 0.2 (Score:2)
Re:N/T, OT: $00.2 -- That's 20 cents, actually. (Score:2)
Not Today
Background on DFS (Score:5, Informative)
PVFS (Score:5, Informative)
http://parlweb.parl.clemson.edu/pvfs/ [clemson.edu]
Re:PVFS (Score:2)
No offense here, it's just the first thing I had in mind. LOL
openmosix (Score:5, Informative)
If you want to take a look..
http://lucifer.intercosmos.net/index.php [intercosmos.net]
linkage and I am going to be placing some tutorials up. -joeldg
Re:openmosix (Score:3, Insightful)
Using MFS, you can just have one pool of disks, memory, cpu's and the processes will migrate to the data; instead of copying the data around.
Great system, once you settle on one version of the kernel (it has to be the same on all machines).
Ye olde Samba (Score:4, Informative)
No need to unnecessarily complicate things here, samba is simple to set up and functions great.
Non-obvious google (Score:2)
Rus
I'll go with NFS too... (Score:2)
Mirroring file system (Score:2)
What is a good stable solution for this? Currently I'm just using a tar over ssh once a night to do an incremental backup.
Re:Mirroring file system (Score:5, Informative)
Unison will synchronize any two file trees in The Right Way (TM).
Get the gtk version for interactive conflict resolution.
Re:Mirroring file system (Score:2)
Unison will synchronize any two file trees in The Right Way (TM).
Well, luckily it's not two-way mirroring, it's purely one-way. My clients all update their website on the primary server only, and any changes are then backed up nightly to the live backup server. However, a better solution is definitely desirable.
Unison (Score:3, Informative)
Re:Mirroring file system (Score:2)
Re:Mirroring file system (Score:5, Interesting)
Rsync is nice because you can update lots of files very quickly, as it only moves binary diffs between files. Also, if it is a costly network link, you have the option to specify a max transfer rate, so you don't kill your pipe when it runs from your cron job.
Unison is nice because it is pretty smart about determining which files should be moved, and can correctly handle new and deleted files on either end of the link. Plus it supports doing all of its communication via ssh, so it's secure.
rsync [samba.org]
unison [upenn.edu]
The downside to both of these being that neither of them are instantaneous. However, I've had much success running both of these as often as every 5 minutes. Just make sure that you launch them from a script that is smart enough to check for already running instances before it starts trying to move data.
Re:Mirroring file system - example w/ssh (Score:3, Informative)
-e sets the remote shell - so -e ssh means use ssh
-a archive mode (see the docs)
-z compression - if you have more CPU than pipe, use it; but if you are on a LAN you probably want to leave it off unless you don't mind the CPU hog (fat pipes will use more CPU time for compression)
-u update only (don't overwrite newer files)
--stats shows you what it did when it is done
--exclude leaves off paths/files you want to skip
--bwlim
Re:Mirroring file system - example w/ssh (Score:3, Interesting)
Re:Mirroring file system - example w/ssh (Score:3, Interesting)
Use rsync. The default is to map user and group names at both ends of the connection, unless you specify --numeric-ids. Of course you have to have at least the names right, otherwise there's nothing to work with. And you need root on the receiving end, but that's also to be expected.
I've been using rsync for some time now to manage moving research data between
Re:Mirroring file system (Score:2)
Basically I have two servers: one is the active server, one is the failover backup. My nightly tar backup is good enough, but I was wondering if there was a better solution. A way to sync the two
Intermezzo does appear to be a current project (Score:5, Informative)
The sourceforge page for the project (http://sourceforge.net/projects/intermezzo) shows status as production/stable but the info there looks stale too.
Future obsolescence ? (Score:5, Insightful)
This guy must have installed too many versions of the same Microsoft products. ... You can still configure your networking using scripts for 2.0- or 2.2-based distros. You can often use 20-year-old programs under Unix, albeit sometimes with some effort.
In the GNU/Linux world, the BSD world, and to some extent the entire Unix world, good designs do not become obsolete. Even not-so-good designs often stick around for the sake of backward compatibility. In the newest, greatest Linux kernel, you can still have a.out support, NFS, Minix, and FAT16 filesystem support.
Only in the M$ world is obsolescence such a big issue, because there obsolescence is planned. In short, don't worry that much about obsolescence: if Coda is as good as it looks, it'll be there for a long time. If SomeCrappyDistributedFS is used by enough users, it'll stay around for compatibility's sake anyway, even if it sucks.
Re:Future obsolescence ? (Score:3, Informative)
Err, that's forwards compatibility. Backwards compatibility would be running a Red Hat 5.2 package on a Red Hat 9 box (if that runs, then Red Hat 9 is backwards compatible with Red Hat 5.2).
That said, though, you're discussing a different thing. The Linux kernel has a very good track record on backwards compatibility, as he stated. The Linux userland (as provided by Red Hat and such) has a really fsckin'
NFS & autofs (Score:4, Informative)
-- Greg
None of the above (Score:4, Interesting)
NFS is not a DFS (Score:5, Informative)
Distributed Filesystems? (Score:2)
Truth is, the only thing that resembles a distributed filesystem I have ever used is Domino. It does what I need quietly, efficiently and consistently. You can't open(...) the content you have stored from a C program (other APIs exist
Obsolete ? (Score:5, Funny)
The best protection from future obsolescence is to use something that is already obsolete.
AFS vs NFS (Score:5, Insightful)
It's become such a part of my day to day life that I can't really describe the things I was missing before. The best things about it are probably the strong, flexible security and ease of administration. It also gives you everything you need from a small shop all the way up to a globally available decentralized data store.
There seems to be a good comparison here [tu-chemnitz.de]. I would strongly recommend AFS for all of your distributed filesystem needs. (The OpenAFS developers are cool too!)
Re:AFS vs NFS (Score:5, Informative)
You only have to wait for the first day you want to reboot a fileserver without breaking every system on your network or waiting for startup dependencies, etc... One day, I moved all of the volumes off of an active fileserver (i.e. volumes being written) and shut the thing down and moved it to another machine room, brought it back up, and moved the volumes back. The reads and writes continued uninterrupted, no clients had to be restarted, no hung filesystems anywhere, etc...
Re:AFS vs NFS (Score:2)
Mod this guy up!
Re:AFS vs NFS (Score:2)
The end.
Tutorial (Score:5, Informative)
The only trouble you might run into with the setup I used is some file-locking issues with programs wanting to share the same preference files.
SECURITY (Score:2, Insightful)
"Hello, I'm user ID 500 and I'd like my home directoy
NFS doesn't actually have security anymore, never has since IP-capable machines became physically portable but more importantly since the assumption that every box would have a trusted admin became invalid
KILL NFS, we need something that doesn't suck.
unison, anyone? (Score:2, Informative)
For many simple purposes backups can be done quite nicely using rsync or something like bacular [bacula.org]. For laptop/notebook support unison [upenn.edu] is definitely worth a look. It syncs directories like rsync does, but in both directions. Works nicely for me.
Remote Synchronised filesystems (Score:3, Informative)
I looked into a whole pile of options for having a "live" filesystem, a-la NFS, but the bandwidth killed interactivity (this is for users who've never used 100mbit network filesystems before).
I found the following:
1. Windows 2000 Server includes a thing called "File Replication Service". Basically, it's a synchronisation service. You replicate the content to many servers, and the service watches transactions on the filesystem, and replicates them to the rest of the mirrors as soon as it can. You can write to all mirrors, but I never quite worked out how it handled conflict resolution.
A chapter from the Windows 2000 Resource kit that describes how it works: http://www.microsoft.com/windows2000/techinfo/res
2. Some people have done similar work for Unix systems, but they mostly involve kernel tweaks to capture filesystem events. Can't remember any URLS, but some Googling should find it.
3. Some people are using Unison to support multi-write file replication. So long as you sync regularly, you shouldn't have too many problems.
4. The multi-write problem is a hard one, so most people tend to say "don't do it, just make the bandwidth enough". This is the way to go if bandwidth isn't an issue.
A guy by the name of Yasushi Saito has done quite a bit of research into data replication. Some papers (search for them on google in quotes). He also put together a project called "Pangaea" which tries to do as described above. It wasn't great last time I looked. Some paper titles:
- Optimistic Replication for Internet Data Services
- Consistency Management in Optimistic Replication Algorithms
- Pangaea: a symbiotic wide-area file system
- Taming aggressive replication in the Pangaea wide-area file system
There is also a bunch of other research work:
- Studying Dynamic Grid Optimisation Algorithms for File Replication
- Challenges Involved in Multimaster Replication (note: this talks about Oracle database replication)
- Chapter 18 of the Windows 2000 Server manual describes the File Replication Service in detail
- How to avoid directory service headaches (talks about not having multi-master-write replication and why)
AFS good on linux, good luck on FreeBSD (Score:2)
I had no complaints with it at all, until I tried to get a FreeBSD machine working with AFS. For starters, OpenAFS doesn't have a FreeBSD port. I've heard rumors of one in the works, but I haven't seen anything useful in the last year. I did stumble across a project called arla however,
Re:AFS good on linux, good luck on FreeBSD (Score:2)
Anyway, AFS *can* use Kerberos v5. The initial configuration can be a bloody nightmare though...
OpenAFS all the way (Score:5, Informative)
My needs were a little more demanding (it had to be implemented on GNU/Linux, Solaris, AIX, HP-UX, and as an extra Windows 2000), and grokking AFS can be difficult at first, but it was the best choice by far. Stable across all the Unices, very secure (this was another requirement), and it integrates perfectly with our Kerberos Domain and LDAP accounting info. It provides a unique namespace that can span multiple servers transparently, does replication, automatic backups, and read-only copies, has a client-side cache with callbacks, has a backup (to tape) system that can be used stand-alone or integrated with existing backup structures (Amanda, Legato, TSM), AND was the basis for the DCE filesystem, DFS (as a side note I find it interesting - and sad - that most things people try to emulate these days are present in DCE [opengroup.org], and Windows 2000 got many of its "new features" from a technology initially made for Unix
AFS is amazing and much more robust than any distributed filesystem I know of; it has shortcomings when servers time out, but apart from that it's really an excellent solution. An example I generally use to illustrate some of AFS's good features is relocating a home directory to another server. The user doesn't even notice that his home directory was moved to another server *even if he was using it and writing to disk at the time*; at most, writes to his home dir see a small delay (a couple of seconds), even if the home dir was 5 GB worth.
Kerberos integration is an added bonus; if you can, use this as an excuse to kerberize your systems and form a Kerberos Domain. If you don't want to, just stick with the standard AFS KA server.
In my setup I have Windows users accessing their home dirs in AFS using the Kerberos tickets they get from the Windows login, thanks to a cross-realm trust between the Unix Domain and the AD; they can edit all the files they are entitled to with that ticket. The system is so secure that Transarc used to put the source code in its public AFS share and add the customers who bought the source to the ACL of the directory that contained it.
With all these features it would be hard not to warmly recommend OpenAFS [openafs.org] as the best solution for a unified, distributed filesystem. Bandwidth utilization is, in my experience, at least half of what NFS uses, which is an added bonus.
cheers,
fsmunoz
Re:OpenAFS all the way (Score:4, Informative)
There were three big wins for me...
(1) Global file namespace managed server-side and accessible from anywhere... LAN, WAN, whatever. All clients see files in the same location.
Unlike NFS, where you have to "mount" volumes within the file system on each client, the AFS file system is globally the same, living under "/afs", so every client accesses the same information via the same file system path. A notion of "cells" makes this possible: information under a single administrative authority lives in a "cell", e.g., "/afs/athena.mit.edu" is the top-most "mount point" for a well-known cell at MIT. Volumes, in AFS parlance, also aren't tied to any particular server, or even location in the namespace, as far as the clients know. A client doesn't have to know explicitly in its configuration which server a given bit of information lives on, and that data can be moved around behind the scenes as necessary (increase the volume space, increase the redundancy, take it offline, etc.). All volume mounts are handled server-side. The clients only have to know about the cell database server, and that can be determined via AFSDB records in DNS. (I.e., your AFS "cell" name matches up with your domain name, e.g.,
(2) Client side implementations.
All my Linux and Windows machines can access the same AFS file space. An OS X client is available too; I've not needed it to date, but might someday. I thus have all home directory information, as well as a lot of binaries, living in the AFS file space, in one place. And behind the scenes, that info is on multiple AFS servers that have RAID-5 disk arrays and weekly tape backups going on.
(3) The file system "snapshot" feature, for backups.
You can take a snapshot of volume(s) at a particular point in time and roll them onto tape without needing to take them offline. You don't have to worry about inconsistencies in the files. Folks can continue to update files, but the backup snapshot doesn't change. Very much the same as the snapshot feature on NetApp filers. These snapshots, called backup volumes, can even be mounted in the file space so folks can get access to the old view of the volume, e.g., when someone accidentally deletes a critical file and needs it back.
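A minimal sketch of how those backup volumes get made (hypothetical volume names; typically driven from a nightly cron job):

```shell
# Create or refresh the read-only backup clone of one volume, while it stays online
vos backup home.alice

# Or clone every volume whose name starts with a given prefix
vos backupsys -prefix home

# Mount the frozen view inside the live tree so users can rescue files themselves
fs mkmount /afs/example.com/home/alice/OldFiles home.alice.backup
```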
And security via Kerberos is nice, especially if you already have an infrastructure. But it's not too hard to set up a single KDC to get started. In the Debian distribution docs for OpenAFS, there's a setup and configuration transcript that makes things relatively easy and clears up a lot of questions.
In summary, OpenAFS is a very good solution here.
NFS is not a reasonable choice for the problem (Score:2)
NFS is also not a distributed/global file system. It is a pretty primitive way to handle global namespace management compared to stuff like AFS. At best, what an automounter lets you do is avoid a few of NFS's problems. Ideally, I'd s
Something more than this... (Score:4, Interesting)
It's been featured [slashdot.org] on [slashdot.org] slashdot [slashdot.org] before [slashdot.org].
Well (Score:2, Insightful)
If you want a few Linux boxes to all basically share a lot of files, so you can log into any one, do whatever, and only install stuff once, NFS is fine, provided it's just on a private network just for you.
NFS is not considered a "distributed" filesystem... but I'm not sure that's what you want anyway.
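For that simple private-network case, the whole NFS setup really is just a couple of lines (the hostnames, paths, and subnet here are examples, not a recommendation):

```shell
# Server side -- /etc/exports:
#   /home  192.168.1.0/24(rw,sync)
# then re-export after editing:
exportfs -ra

# Client side, one-off mount (or put the equivalent line in /etc/fstab):
mount fileserver:/home /home
```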
Gawd (Score:3, Funny)
I'm vaguely sure this is a brand new affront to RMS, but I just can't put my finger on it.
A potted review of several distributed filesystems (Score:5, Informative)
Why not stick with NFS for the time being?
I went through the "is coda right for me?" phase, and also "is intermezzo right for me?" and also spent tens of hours researching distributed filesystems and cluster filesystems online ... my conclusion
is that the area is still immature, I will let the pot simmer for a
few more years (hopefully not many), and use NFS in the meantime.
My situation: desire for scalable and fault-tolerant distributed filesystem for home use with minimal maintenance or balancing effort. Emphasis on scalable, I want to be able to grow the filesystem essentially without limit. I also don't want to spend much time moving data between partitions. And last but not least, the bigger the filesystem grows, the less able I will be to back it up properly. I want redundancy so that if a disk dies the data is mirrored onto another disk, or if a server dies then the clients can continue to access the filesystem through another server.
All that seems to be quite a tall order. I checked out Coda, AFS, PVFS, SGI's XFS, Frangipani, Petal, NFS, InterMezzo, Berkeley's xFS, JFS, Sistina's GFS, and some project Microsoft is doing to build a serverless filesystem based on a no-trust paradigm (that's quite unusual for Microsoft!).
Berkeley's xFS (now.cs.berkeley.edu [berkeley.edu]) sounded the most promising but it appears to be a defunct project, as their website has been dead ever since I learned of it, and I expect the team never took it beyond the "research" stage into "let's GPL this and transform it into a robust production environment". Frangipani sounds interesting also, and maybe a little more alive than xFS.
On the other hand coda, afs and intermezzo are all in active development. afs IMHO suffered from kerberitis, i.e. once you start using kerberos it invades everything and it has lots of problems (which I read about on the openAFS list every day). AFS doesn't support live replication (replication is done in a batch sense) either.
CODA doesn't scale and doesn't have expected filesystem functionality: for 80 gigs of server space I would require 3.2 gigs of virtual memory, and there's a limit to the size of a CODA directory (256k) which isn't seen in ordinary filesystems. There's also the full-file-download "feature". CODA is good for serving small filesystems to frequently disconnected clients but it is not good for serving the gigabyte AVIs which I want to share with my family.
Intermezzo is a lot more lightweight than CODA and will scale a lot better, but it's still a mirroring system rather than a network filesystem. I might use that to mirror my remote server where I just want to keep the data replicated and have write access on both the server and the client, but it's again not a solution for my situation.
The best thing about intermezzo is that it sits on top of a regular filesystem, so if you lose intermezzo the data is still safe in the underlying filesystem. CODA creates its own filesystem within files on a regular filesystem, and if you lose CODA then the data is trapped.
Frangipani is based on sharing data blocks, so like NFS it should be suitable for distributing files of arbitrary size. I need to look at it in a lot more detail; this is probably the right way to build a cluster filesystem for the long haul. For the short term, Intermezzo is probably the right way for a lot of people: it copies files from place to place on top of existing filesystems.
What I did in the end:
The way it works is tha
Linix? (Score:4, Funny)
WebDAV (Score:4, Interesting)
In the medium term, however, I think WebDAV will become a better option, because it can be served and accessed with standard web servers and clients, in addition to being mappable onto the file system.
The Linux kernel already has WebDAV support (CODA hooks plus some user-mode process), although I'm not sure how well it works.
Watch for NFSv4 in the future! (Score:4, Informative)
It's also WAN-friendly, letting several operations be sent at once in a single request (the COMPOUND operation). It also allows you to migrate one filesystem to another with no stale filehandles. Basically, it's trying to be an AFS killer.
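To make the unified-view part concrete: an NFSv4 server exports a single pseudo-root (marked fsid=0) and clients mount paths relative to it, rather than mounting each export separately. A hypothetical sketch:

```shell
# Server side -- /etc/exports:
#   /export       192.168.1.0/24(rw,sync,fsid=0)
#   /export/home  192.168.1.0/24(rw,sync)

# Client side: the path is relative to the pseudo-root, not the server's real tree
mount -t nfs4 fileserver:/home /home
```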
For more information, take a look at
http://www.nfsv4.org/
Lots of good info, including the IETF spec. It's an interesting read.
The spec is not quite complete. Currently, I believe there are discussions with how NFSv4 will work with IPsec.
Cheers,
sri
Reasons why (Score:3, Informative)
There's some reasoning behind the lack of big interest in distributed filesystems.
1) Obviously, NFS continues to be a passable solution where you don't really need "distributed" so much as "universally network accessible in a simple way".
2) For things where you truly want "distributed" access from multiple machines that are local to each other, there's a somewhat less complicated solution, which is to use shared storage. The idea is to attach all the machines to a SAN-style network (fiber channel, or hey even firewire these days) and use a sharing-aware filesystem that allows simultaneous access to the storage/filesystem from multiple hosts with sane locking and whatnot. One of the better places to look for this is otn.oracle.com - they've released a cluster-filesystem *and* drivers for firewire shared storage (which is cheaper than fiberchannel) for linux.
Of course, that leaves out the case of a distributed filesystem for machines that can't be on a SAN together for distance or economical reasons. In that case you could of course hack something up using cluster-filesystem type of filesystem and SCSI-over-IP or something like that I guess, or use one of the experimental distributed filesystems you mention... but the market just isn't big enough for this yet to drive much.
What troubles me about AFS/OpenAFS... (Score:3, Interesting)
I mean, if I use AFS, does that mean from now on, every time I run an install script for some random package that chmods something, I have to realize that the script doesn't really work, and then I have to analyze its intent and then do some ACL thing that accomplishes the same intent? Ugh, I am not interested in things that create more work for humans.
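For what it's worth, the chmod-to-ACL translation is usually a one-liner, though it does have to be done by hand. A rough sketch (the user and group names are made up):

```shell
# Roughly the AFS equivalent of "chmod 750 somedir" (owner full access, group read-only):
fs setacl -dir somedir -acl alice all staff read

# The seven rights are r(ead) l(ookup) i(nsert) d(elete) w(rite) (loc)k a(dminister);
# "read" and "all" are shorthand for rl and rlidwka respectively
fs listacl somedir
```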
Another annoying-looking thing is that it's really a filesystem, even from the servers' point of view. Unlike sharing/exporting services such as NFS and Samba, which you can run on top of your choice of filesystem (ext3, Reiserfs, xfs, etc), it appears that AFS/OpenAFS combines both the disk and the network topics. That means you don't get the advantages of all the great work the filesystem geeks have been doing in the last few years.
It almost strikes me as inelegant design or something, that a single project concerns itself with both the details of how things are laid out on a disk, and also how to do network-related things such as replication. Somebody made their black box too big for my tastes.
Am I wrong about all this?
I use CXFS at work (Score:3, Interesting)
CXFS uses a sort of token technique and allows multiple simultaneous file accesses. That way, we get the same files on all the machines but without the NFS overhead and network congestion. File reads/writes are done over the fibre channel switch and the "metadata" is done over a private network. This is WAY faster than NFS over Gigabit Ethernet. One good thing about CXFS is the redundancy possibility: you can have failover servers and other neat things.
The only drawback is that you need an SGI server, but then you can use Windows and Solaris clients. Very stable, but probably not for home use.
Re:Format, Install Windows Server 2000 or 2003 (Score:2, Informative)
Re:Format, Install Windows Server 2000 or 2003 (Score:5, Funny)
Format, Install Windows Server 2000 or 2003, Repeat
Re:Oop (Score:2)
Re:rsync? (Score:2)
This may work, but only if a user doesn't login to more than one machine at a time. On login: rsync the user's directory off the server. And on logout: rsync the user's directory up to the server. But again, that's not very distributed.
(Also this seems to be what happens in our Win2k configuration where I work.)
Re:rsync? (Score:3, Funny)
And here I thought you were going for "+5 funny". rsync as a DFS? Man, that's scary. Someone get this guy a job at Microsoft!
Re:Andrew File system (Score:2)
AFS is the only enterprise-ready, cross-platform, secure, global-namespace file system that is worth the money. In fact, OpenAFS is free... that's right... as in "beer".
+10 cents.