Ask Slashdot: Best *nix Distro For a Dynamic File Server? 234
An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing for my workplace a particular sort of 'dynamic' file server for handling scientific data. We have all the hardware in place, but can't figure out which *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: We have sensor units scattered across a couple of square miles of undeveloped land, each of which collects ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out of the box without messing with manual recompiling? Why is documentation for *nix always so bad?"
Re:Wow (Score:2, Interesting)
He still hasn't told us what filesystem is on these drives they're pulling out of the field. That's the most important detail...
You don't need a union file system (Score:1, Interesting)
There's no reason you need a union filesystem. Just mount the data at an appropriate point in a directory tree. Union file systems are designed to solve a different problem.
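A minimal sketch of that plain-mount approach, assuming made-up device names (`/dev/sdb1` etc.) and a made-up parent directory. Nothing is actually mounted; the script only assembles and prints the commands it would run, so it needs no root access:

```shell
#!/bin/sh
# Sketch of the plain-mount layout; device names below are hypothetical.
# The script only builds and prints the commands, so it is safe to run
# anywhere -- on the real server you would execute them instead.
PARENT=/srv/field
PLAN=""
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    sub="$PARENT/$(basename "$dev")"
    PLAN="$PLAN
mkdir -p $sub && mount -o ro $dev $sub"
done
echo "$PLAN"
# Samba then shares $PARENT; clients see one subdirectory per drive,
# with no union filesystem involved.
```

Read-only mounts are deliberate: field drives you intend to re-use are safer never written to by the server.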
What you boot from has nothing to do w/ what you read the data from.
Samba is a really strange choice. Given the data volume, I'd expect you to be using a large Linux cluster to process the data, for which NFS would be more appropriate. It certainly sounds like microseismic data, in which case the processing will benefit from making duplicate copies of the data and mounting them read-only via NFS, so that the first available server provides the data. Multiple Ethernet interfaces are needed to get the full benefit of that, though.
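If the data really does feed an NFS cluster, the read-only exports described above are only a few lines of configuration. A sketch of `/etc/exports`, with made-up paths and a made-up cluster subnet:

```
# /etc/exports -- export each copy of the data read-only to the cluster.
# Paths and the 10.0.0.0/24 subnet are hypothetical placeholders.
/data/copy1   10.0.0.0/24(ro,no_subtree_check)
/data/copy2   10.0.0.0/24(ro,no_subtree_check)
```

After editing, `exportfs -ra` reloads the export table without restarting the NFS server.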
*nix documentation is actually very good. But there is a lot of it, so you tend to have grey hair by the time you've read all of it.
BTW Does the CEO play guitar? I play harmonica.
Re:Wow (Score:3, Interesting)
Op here:
1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).
2) Because it's better? Do I really need to justify not using Windows for a server on Slashdot?
3) The shares need to be easily accessible to mac/win workstations. AFAIK Samba is the most cross-platform here, but if people have a better idea I'm all ears.
> Better yet, tell us what you need to do
- Take a server that is off, and boot it remotely (via ethernet magic packet)
- Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
- Share out the unioned virtual tree in such a way that it's easily accessible to mac/win clients
- Do all this in under 30 seconds
I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get Linux to do this.
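For what it's worth, the sequence the OP lists is entirely scriptable. A hedged sketch, assuming the `wakeonlan` tool, the FUSE-based mhddfs, and made-up MAC address, device names, and paths; the script only records and prints the commands so it runs without root:

```shell
#!/bin/sh
# Sketch of the OP's boot/mount/share sequence. MAC address, devices,
# mount points, and service name are all hypothetical; the commands
# are collected and printed rather than executed.
WORKFLOW=""
plan() { WORKFLOW="$WORKFLOW
$*"; }

plan wakeonlan 00:11:22:33:44:55            # magic packet boots the server
for dev in /dev/sdb1 /dev/sdc1; do          # then, on the server itself:
    plan mount -o ro "$dev" "/mnt/$(basename "$dev")"
done
# mhddfs merges the nearly-identical trees into one virtual tree
plan mhddfs /mnt/sdb1,/mnt/sdc1 /srv/pool -o allow_other
plan systemctl start smb                    # share /srv/pool via Samba
echo "$WORKFLOW"
```

mhddfs is a FUSE filesystem, so it needs no kernel patching; several distros ship it as a package, which answers the "out of the box" part of the question.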
> huge amounts of man pages
quantity != quality
Re:Wow (Score:4, Interesting)
> 1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).
Yes, I will ask why. Why booted, and not, say, hibernated, if part of the reason is that it has to be powered off?
If the server does nothing but serve huge files that are read once, it does not benefit from huge amounts of RAM, and it can hibernate/wake in a short amount of time, depending on which peripherals have to be restarted.
> 2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?
Yes? While Microsoft usually sucks, it can still be the least sucky choice for specific tasks. And there are more alternatives than Linux out there too.
> 3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.
What's the format on the drives? That can be a limiting factor. And what are the specifics of "sharing"? Must files be locked (or lockable) during access, and are there access restrictions on who can access what?
For what it's worth, Windows Vista/7/2008 R2 all ship NFS support (Services for NFS, descended from the Interix-based Services for UNIX). So that's also an alternative.
> - Take a server that is off, and boot it remotely (via ethernet magic packet)
That you want to "wake" it does not imply that the server has to be shut off. It can be in a low-power mode instead; Apple's Bonjour (also available for Linux), for example, has a sleep-proxy mechanism for waking services from low-power states.
> - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
Why? Sharing a single directory under which all the drives are mounted would also give access to all the drives under a single mount point - no need for a union unless you really need to merge directories and for some reason cannot do the equivalent with symlinks ("junctions" in MS jargon).
Unions are much harder, as you will need to decide what to do when inevitably the same file exists on two drives (even inconspicuous files like "desktop.ini" created by people browsing the file systems).
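To illustrate the collision point: with aufs, branch order decides which duplicate wins. A sketch with hypothetical drive names, which only assembles and prints the mount command (aufs and root would be needed to run it for real):

```shell
#!/bin/sh
# aufs resolves duplicate paths by branch order: the leftmost branch wins.
# Drive names and mount points are hypothetical; the command is printed,
# not executed.
BRANCHES="/mnt/sdb1=ro:/mnt/sdc1=ro"
UNION_CMD="mount -t aufs -o br=$BRANCHES none /srv/union"
echo "$UNION_CMD"
# If desktop.ini exists on both drives, clients would see the /mnt/sdb1 copy.
```

The implicit policy (first branch silently shadows the rest) is exactly the kind of decision the parent comment warns has to be made somewhere.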
Even copying the files to a common (and preferably RAIDed) area is generally safer - that way, you also don't kill the whole share if one drive is bad, and can reject a drive that comes in faulty.
But you seem to have made the choices beforehand, so I'm not sure why I answer.
> - Do all this in under 30 seconds
You really should have designed the system with the 30 seconds as a deadline then.
If I were to do this, I would first try to get rid of the sneakernet requirement. 4G modems sending the data, for example. But if sneakernetting drives is impossible to get around, I'd choose a continuously running system with hotplug bays and automount rules.
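The automount half of that is a short udev rule. A sketch, where the rules filename and the helper script path are made up; note that under systemd, processes started from `RUN+=` are short-lived, so the helper should hand the actual mount off to something like `systemd-mount` rather than mounting directly:

```
# /etc/udev/rules.d/99-field-sleds.rules (sketch; helper path is hypothetical)
# Fire for any newly added partition that carries a filesystem.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?[0-9]", \
    ENV{ID_FS_USAGE}=="filesystem", \
    RUN+="/usr/local/bin/mount-sled.sh /dev/%k"
```

With the server always on, a drive pushed into a hot-swap bay appears in the share within seconds, and the 30-second boot requirement evaporates.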
Unless the data has to be there 30 seconds from when the drive arrives (this is not clear - from the above it appears that only the client access to the system has that limit), I'd also copy the data to a RAID before letting users access it.
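The staging copy itself is one command. A runnable miniature, with temp directories standing in for the sled mount and the RAID area so it works anywhere without root:

```shell
#!/bin/sh
# Miniature of the copy-to-RAID staging step: temp dirs stand in for the
# mounted field drive and the RAID staging area.
SLED=$(mktemp -d)     # stands in for the mounted field drive
RAID=$(mktemp -d)     # stands in for the RAID staging area
echo "sensor reading" > "$SLED/run001.dat"
# -a preserves attributes; on the real server, rsync -a --ignore-existing
# would be the usual choice when re-inserted drives may overlap.
cp -a "$SLED/." "$RAID/"
ls "$RAID"
```

Once the copy completes, the drive can be pulled and wiped, and a bad drive never takes the whole share down with it.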
Sure, Linux would do, but there's no particular flavour I'd recommend. Scientific Linux is a good one, but *shrug*.
If you need support, Red Hat, but then you also should buy a system for which RHEL is certified.