Data Storage Linux

Ask Slashdot: Best *nix Distro For a Dynamic File Server?

Posted by timothy
from the when-birdwatching-goes-too-far dept.
An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing for my workplace a particular sort of 'dynamic' file server for handling scientific data. We have all the hardware in place, but can't figure out what *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: We have sensor units scattered across a couple square miles of undeveloped land, which each collect ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server has to boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out-of-the-box without messing with manual recompiling? Why is documentation for *nix always so bad?"

  • by guruevi (827432) <evi AT smokingcube DOT be> on Saturday August 25, 2012 @12:32PM (#41123345) Homepage

    Really, single hard drives are notoriously bad at keeping data around for long. I would make sure you have a copy of everything. So make a file server with RAIDZ2 or RAID6 and script the copying of these hard drives onto a system that has redundancy and is backed up as well.

    How many times have I seen scientists come out with their 500GB portable hard drives only to find them unreadable? Way too many. If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year. Most of our drives (500GB 2.5" portable drives) last a few months; once they have processed about 6TB of data full-time they are pretty much guaranteed to fail.

  • unRaid (Score:1, Informative)

    by Anonymous Coward on Saturday August 25, 2012 @12:39PM (#41123397)

    unRaid FTW, I use this to handle TB's of data and it works fine.

  • by e3m4n (947977) on Saturday August 25, 2012 @12:45PM (#41123437)

    Scientific Linux is also a good option for similar reasons. Given it's a science grant, they might like the idea that it's used at labs like CERN.

  • Questionable (Score:2, Informative)

    by Anonymous Coward on Saturday August 25, 2012 @12:48PM (#41123467)

    Why would you want a file server to boot in 30 secs or less? OK, let's skip the fs check, the controller checks, the driver checks, hell, let's skip everything and boot to a recovery bash shell. Why would you not network these collection devices if they are all within a couple of miles and dump to an always-on server?

    I really fail to see the advantage of a file server booting in under 30 seconds. Shouldn't you be able to hot swap drives?

    This really sounds like a bunch of kids trying to play server admin. My apologies if this is not the case, but given the parameters provided this IS what it sounds like.

  • Re:Wow (Score:5, Informative)

    by mschaffer (97223) on Saturday August 25, 2012 @12:49PM (#41123477)

    [...]

    Wow.. I completely agree with an AC.

    The OP here is in way over his head and the entire project seems to have been planned by idiots.

    This will end badly.

    Like that's the first time. However, we don't know all of the circumstances and I wouldn't be surprised that the OP had this dropped into his/her lap.

  • by Anonymous Coward on Saturday August 25, 2012 @12:56PM (#41123521)

    OP here:

    I left out a lot of information from the summary in order to keep the word count down. Each disk has an almost identical directory structure, and so we want to merge all the drives in such a way that when someone looks at "foo/bar/baz/" they see all the 'baz' files from all the disks in the same place. While the folders will have identical names the files will be globally unique, so there's no concern about namespace collisions at the bottom levels.
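
A rough illustration of the merge described here, for readers who want to see what "one unified tree" means in practice. This is not unionfs, aufs or mhddfs; it is a swapped-in technique (a farm of symbolic links) used only to show the semantics. The function name `build_merged_view` and the example paths are hypothetical, and whether a file-sharing layer follows such links is a separate configuration question.

```python
import os
from pathlib import Path


def build_merged_view(drive_roots, merged_root):
    """Mirror each drive's directory tree under merged_root and symlink every file.

    Directories with the same relative path are merged; files are expected to be
    globally unique. Any file whose relative path already exists in the merged
    view is skipped and reported instead of overwritten.
    Returns a list of (relative_path, skipped_source) tuples.
    """
    merged_root = Path(merged_root)
    collisions = []
    for root in map(Path, drive_roots):
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = Path(dirpath).relative_to(root)
            target_dir = merged_root / rel_dir
            target_dir.mkdir(parents=True, exist_ok=True)
            for name in filenames:
                source = Path(dirpath) / name
                link = target_dir / name
                if link.is_symlink() or link.exists():
                    # Same relative path seen on an earlier drive: keep the first.
                    collisions.append((str(rel_dir / name), str(source)))
                    continue
                link.symlink_to(source.resolve())
    return collisions
```

Called as `build_merged_view(["/mnt/sled01", "/mnt/sled02"], "/srv/merged")` (hypothetical mount points), every `foo/bar/baz/` directory from the drives ends up as a single `foo/bar/baz/` under the merged root. A real union filesystem does the same thing at mount time without creating any links.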

  • Re:Wow (Score:5, Informative)

    by arth1 (260657) on Saturday August 25, 2012 @01:04PM (#41123565) Homepage Journal

    Yeah. Before we can answer this person's questions, we need to know why he has:
    1: Decided to cold-plug drives and reboot
    2: Decided to use Linux
    3: ... to serve to Windows

    Better yet, tell us what you need to do - not how you think you should do it. Someone obviously needs to read data that's collected, but all the steps in between should be based on how it can be collected and how it can be accessed by the end users. Tell us those parameters first, and don't throw around words like Linux, samba, booting, which may or may not be a solution. Don't jump the gun.

    As for documentation, no other OSes are as well-documented as Linux/Unix/BSD.
    Not only are there huge amounts of man pages, but there are so many web sites and books that it's easy to find answers.

    Unless, of course, you have questions like how fast a distro will boot, and don't have enough understanding to see that that depends on your choice of hardware, firmware and software.
    I have a nice Red Hat Enterprise Linux system here. It takes around 15 minutes to boot. And I have another Red Hat Enterprise Linux system here. It boots in less than a minute. The first one is -- by far -- the better system, but enumerating a plaided RAID of 18 drives takes time. That's also irrelevant, because it has an expected shutdown/startup frequency of once per two years.

  • Re:Wow (Score:4, Informative)

    by Anonymous Coward on Saturday August 25, 2012 @01:14PM (#41123653)

    Op here:

    The gear was sourced from a similar prior project that's no longer needed, and we don't have the budget/authorization to buy more stuff. Considering that the requirements are pretty basic, we weren't expecting to have a serious issue picking the right distro.

    >You are looking for information that your average user won’t care about.

    Granted, but I thought one of the strengths of *nix was that it's not confined to computer illiterates. Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

    As for the 30 seconds thing, there's a lot of side info I left out of the summary. This project is quirky for a number of reasons, one of them being that the server itself spends a lot of time off, and needs to be booted (and halted) on demand. (Don't ask, it's a looooooong story).

  • Re:Wow (Score:4, Informative)

    by msobkow (48369) on Saturday August 25, 2012 @01:48PM (#41123861) Homepage Journal

    The "under 30 seconds part" is not as easy as you think.

    You're mounting new drives -- that means Linux will probably want to fsck them, which with such volume, is going to take way more than 30 seconds.

  • by Anonymous Coward on Saturday August 25, 2012 @02:06PM (#41123969)

    FreeNAS is based on FreeBSD, and boot speed (no matter what the OS) is based entirely on the hard drive speed + CPU speed + 'automagic' configuration.

    FreeBSD boots pretty fast, but you need to turn off things like the bootloader menu delay, and set fixed IP addresses. Same on Linux, but Linux tends to be sloppy about starting up services.

    In either case you can usually just turn anything you don't need off, and just turn on what you do need.

    FreeBSD's ZFS is better than anything you can setup on Linux, but unless the box has a lot of RAM you're not going to get the expected performance.

    Most of the NAS devices you see for sale run FreeNAS if they're based on x86-64 CPUs or Linux if they're not (PPC/MIPS/ARM), but they're not particularly great pieces of hardware. You pretty much end up with something silly like:
    OS -> UFS/EXT2/EXT3 -> Samba share
    for Windows clients, but you can also do this on FreeBSD/FreeNAS (ZFS is terrible under Linux-FUSE)
    FREEBSD->ZFS (using all drives, even remote drives) -> iSCSI
    iSCSI is something that you must have GigE/10GB Fiber for, and decent processing power. Most of the systems you see (including DELL) that do iSCSI are woefully underpowered for a small server, or extremely overkill (enterprise)

    Windows however supports iSCSI out of the box. So you can do something theoretically stupid like this:
    FreeBSD -> ZFS ->iSCSI ->Windows box accesses iSCSI and shares it with other Windows machines.

    So it depends what you really want to do. From your description, it sounds like what you really want to do is hotplug a bunch of drives into a system, have that system "union" them via filesystem mounts (nobody says you have to mount everything to root), and then share them out under Samba.

    But another possibility, not clearly indicated is that maybe the drives have overlapping file systems that you want to see as one (eg same directory structure, different file names) this is more complicated to deal with, but I'd probably go with not trying to share off the hotswapped drives and instead RSYNC all the drives to another filesystem and share that instead.
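
The "rsync all the drives to another filesystem" suggestion could be scripted roughly as follows. This is a sketch under assumptions: the mount points and destination are hypothetical, and the flags are just one reasonable choice (`-a` for archive mode, `--ignore-existing` so a file already copied from one drive is never overwritten by another). It defaults to a dry run that only returns the commands it would execute.

```python
import subprocess


def rsync_command(drive_mount, destination):
    """Build an rsync invocation that copies a drive's contents into destination.

    The trailing slash on the source tells rsync to copy the directory's
    contents rather than the directory itself, so identical directory
    structures from different drives land in the same place.
    """
    source = str(drive_mount).rstrip("/") + "/"
    return ["rsync", "-a", "--ignore-existing", source, str(destination)]


def consolidate(drive_mounts, destination, dry_run=True):
    """Return the rsync commands for all drives; run them unless dry_run is set."""
    commands = [rsync_command(mount, destination) for mount in drive_mounts]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if rsync exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

For example, `consolidate(["/mnt/sled01", "/mnt/sled02"], "/srv/pool")` returns two command lists without touching any disk. The trade-off against a union mount is time and space: each day's data has to be copied before it can be shared.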

  • OP here (Score:5, Informative)

    by Anonymous Coward on Saturday August 25, 2012 @02:32PM (#41124155)

    Ok, lots of folks asking similar questions. In order to keep the submission word count down I left out a lot of info. I *thought* most of it would be obvious, but I guess not.

    Notes, in no particular order:

    - The server was sourced from a now-defunct project with similar setup. It's a custom box with non-normal design. We don't have authorization to buy more hardware. That's not a big deal because what we have already *should* be perfectly fine.

    - People keep harping on the 30 seconds thing.
    The system is already configured to spin up all the drives simultaneously (yes the PSU can handle that) and get through the bios all in a few seconds. I *know* you can configure most any distro to be fast, the question is how much fuss it takes to get it that way. Honestly I threw that in there as an aside, not thinking this would blow up into some huge debate. All I'm looking for are pointers along the lines of "yeah distro FOO is bloated by default, but it's not as bad as it looks because you can just use the BAR utility to turn most of that off". We have a handful of systems running winXP and linux already that boot in under 30, this isn't a big deal.

    - The drives in question have a nearly identical directory structure but with globally-unique file names. We want to merge the trees because it's easier for people to deal with than dozens of identical trees. There are plenty of packages that can do this, I'm looking for a distro where I can set it up with minimal fuss (ie: apt-get or equivalent, as opposed to manual code editing and recompiling).

    - The share doesn't have to be samba, it just needs to be easily accessible from windows/macs without installing extra software on them.

    - No, I'm not an idiot or derpy student. I'm a sysadmin with 20 years experience (I'm aware that doesn't necessarily prove anything). I'm leaving out a lot of detail because most of it is stupid office bureaucracy and politics I can't do anything about. I'm not one of those people who intentionally makes things more complicated than they need to be as some form of job security. I believe in doing things the "right" way so those who come after me have a chance at keeping the system running. I'm trying to stick to standards when possible, as opposed to creating a monster involving homegrown shell scripts.

  • Re:Is this a joke? (Score:4, Informative)

    by techno-vampire (666512) on Saturday August 25, 2012 @04:17PM (#41124913) Homepage
    If you want different removable disks to be mounted in different places, it's even easier. Just list each disk (identified by UUID) in /etc/fstab, with the proper mountpoint and include auto in the options. That way, when you plug it in, the system knows exactly where it goes.
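
To make the fstab suggestion concrete, here is a small sketch that generates one entry per sled. The UUIDs, mount points, filesystem type and the `nofail,ro` options are all assumptions for illustration, not details from the thread. Note that `auto` mounts the disk at boot or on `mount -a`; whether plugging a disk in triggers a mount depends on the system's hotplug setup. The final field (`fs_passno`) is set to 0, which means the filesystem is not checked by fsck at boot: faster, but it gives up the integrity check others in this discussion worry about.

```python
def fstab_line(uuid, mountpoint, fstype="ext4", options=("auto", "nofail", "ro")):
    """Format one /etc/fstab entry.

    Fields, in order: device spec, mount point, filesystem type, mount options,
    dump flag (0 = not dumped), fsck pass number (0 = never checked at boot).
    """
    return "\t".join([f"UUID={uuid}", mountpoint, fstype, ",".join(options), "0", "0"])


def fstab_for_sleds(uuids, mount_prefix="/mnt/sled"):
    """Assign each UUID a numbered mount point such as /mnt/sled01."""
    return [
        fstab_line(uuid, f"{mount_prefix}{index:02d}")
        for index, uuid in enumerate(uuids, start=1)
    ]
```

Printing `"\n".join(fstab_for_sleds([...]))` gives lines that can be reviewed before being appended to /etc/fstab. `nofail` keeps the boot from stalling when a sled is empty.
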
  • by oneiros27 (46144) on Saturday August 25, 2012 @04:17PM (#41124915) Homepage

    What you're describing sounds like a fairly typical Sensor Net (or Sensor Web) to me, maybe with a little more data logged than is normal per platform. (I believe they call it a 'mote' in that community).

    Some of the newer sensor nets use a forwarding mesh wireless system, so that you relay the data to a highly reduced number of collection points -- which might keep you from having to deal with the collection of the hard drives each night (maybe swap out a multi-TB RAID at each collection point each night instead).

    I'm not 100% sure of what the correct forum is for discussion of sensor/platform design. I know they have presentations in the ESSI (Earth and Space Science Informatics) focus group of the AGU (American Geophysical Union). Many of the members of ESIPfed (Federation of Earth Science Information Partners) probably have experience in these issues, but it's more about discussing managing the data after it comes out of the field.

    On the off chance that someone's already written software to do 90% of what you're looking for, I'd try contacting the folks from the Software Reuse Working Group [nasa.gov] of the Earth Science Data System community.

    You might also try looking through past projects funded through NASA AISR (Advanced Information Systems Research [nasa.gov]) ... they funded better sensor design & data distribution systems. (unfortunately, they haven't been funded for a few years ... and I'm having problems accessing their website right now). Or I might be confusing it with the similar AIST (Advanced Information Systems Technology [nasa.gov]), which tends more towards hardware vs. software. ... so, my point is -- don't roll your own. Talk to other people who have done similar stuff, and build on their work, otherwise you're liable to make all of the same mistakes, and waste a whole lot of time. And in general (at least ESSI / ESIP-wide), we're a pretty sharing community ... we don't want anyone out there wasting their time doing the stupid little piddly stuff when they could actually be collecting data or doing science.

    (and if you haven't guessed already ... I'm an AGU/ESSI member, and I think I'm an honorary ESIP member (as I'm in the space sciences, not earth science) ... at least they put up with me on their mailing lists)

  • Re:Wow (Score:4, Informative)

    by Fallen Kell (165468) on Saturday August 25, 2012 @09:47PM (#41126681)
    Even though I believe I am being trolled, I will still feed it some.

    1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

    You will never find enterprise-grade hardware which will do this. You will be even harder pressed to do this on mechanical drives (for the OS) and even harder still with random new drives being attached which may need to have integrity scans performed. This requirement alone is asinine and goes against every rule in every data center and system administration handbook for something that is serving data to other machines. If you need something that you halt and shut down so you can load the drives, well, you do that on something other than the box which is servicing the data requests to other computers, and you copy the data from that one system to the real server.

    2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

    No, you don't need to justify it, but you do need to explain it some. For the most part it sounds like most people where you work do not have much experience with *nix systems, because if you did, you would never have had requirement (1) in the first place (as you would know the whole point of *nix is to be able to separate everything so that you don't have to bring down the system just to update/replace/remove one particular service/application/hardware, everything is compartmentalized and isolated, which means the only time you should ever need to bring down the system is due to catastrophic hardware failure or you needed to update the actual kernel, otherwise everything else should be built in such a way that it is hot-swappable, redundant, and/or interchangeable on the fly).

    3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

    Well, SAMBA is the only thing out there that will share to Win/Mac clients from *nix, so that is the right solution.

    - Take a server that is off, and boot it remotely (via ethernet magic packet)
    - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
    - Share out the unioned virtual tree in such a way that it's easily accessible to mac/win clients
    - Do all this in under 30 seconds

    I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get linux to do this.....

    They are focusing on the "under 30 seconds" part because they know that it is an absurd requirement for dealing with multiple hard drives which may or may not have a working filesystem, as they have not only traveled/been shipped, but have also been out in the actual field. The probability of data corruption is so much higher that they know that the "under 30 seconds" is idiotic at best.

    For instance, I can't even get to the BIOS in 30 seconds on anything that I have at my work. Our data storage servers take about 15-20 minutes to boot. Our compute servers take about 5-8 minutes. They spend more than 30 seconds just performing simple memory tests at POST, let alone hard drive identification and filesystem integrity checks or actually booting. This is why people are hung up on the "under 30 seconds".

    If you had a specialty build system, in which you disabled all memory checks (REALLY BAD IDEA on a server though since if your memory is bad you can corrupt your storage because writes to your storage are typically from memory), used SSDs for your OS drives, had no hardware raid controllers on the system, used SAS controllers which do not have firmware integrity checks, you might, just might be able to boot the system in 30 seconds. But I sure as hell would not trust it for any kind of important data because you had to disable all the hardware tests which means you have no idea if there are hardware problems which are corrupting your data.
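
For reference, the "ethernet magic packet" boot step quoted earlier in this comment is simple to construct: a Wake-on-LAN magic packet is six 0xFF bytes followed by the target's MAC address repeated sixteen times. The sketch below sends it as a UDP broadcast; the MAC address in the usage note, the broadcast address and port 9 are illustrative assumptions, and the target's firmware and NIC must have Wake-on-LAN enabled for it to do anything.

```python
import socket


def magic_packet(mac):
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` (a made-up address) from another machine on the same subnet is all the "boot on demand" trigger amounts to; how long the server then takes to come up is the separate question argued above.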
