Which OSS Clustered Filesystem Should I Use?


Posted by Unknown Lamer on Monday October 31, 2011 @10:02PM from the deleting-is-so-90s dept.

Dishwasha writes "For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home. I have suffered twice through complete loss of data, once due to accidentally not re-enabling the notification on my hardware RAID and having an array power supply fail, after which the RAID controller was unable to recover half of the entire array. Now I run RAID-10, manually verifying that each mirrored pair is properly distributed across each enclosure. I would like to upgrade the hardware but am currently severely tied to the current RAID hardware and would like to take a more hardware agnostic approach by utilizing a cluster filesystem. I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss. My research has yielded 3 possible solutions: Lustre, GlusterFS, and Ceph." Read on for the rest of Dishwasha's question.

"Lustre is well accepted and used in 7 of the top 10 supercomputers in the world, but it has been sullied by the buy-off of Sun to Oracle. Fortunately the creator seems to have Lustre back under control via his company Whamcloud, but I am still reticent to pick something once affiliated with Oracle and it also appears that the solution may be a bit more complex than I need. Right now I would like to reduce my hardware requirements to 2 servers total with an equal number of disks to serve as both filesystem cluster servers and KVM hosts."

"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."

"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."

"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replicas at a file level. The cluster filesystem should work well for hosting virtual machines in a highly available fashion, thereby supporting guest migrations. And lastly, it should require as little hardware as possible, with the possibility of upgrading and scaling without taking down data."

"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"


This discussion has been archived. No new comments can be posted.


Repeat after me: (Score:5, Insightful)

by Anonymous Coward writes: on Monday October 31, 2011 @10:06PM (#37902892)

RAID is not a backup solution!

- Re:Repeat after me: (Score:5, Insightful)
  
  by NFN_NLN ( 633283 ) writes: on Monday October 31, 2011 @10:58PM (#37903396)
  
  Parent currently is marked as "0" but is dead on. His opening statement talks about a data loss (x2), is "very paranoid about data loss" and his closing remarks talk about "ease of recovery". Your statements suggest you are primarily concerned about data loss.
  Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.
  You need to research backups and/or remote replication. Or buy an enterprise file server that does everything including call-home when it detects a hardware issue.. not waste time on a CFS.
  
  - Re: (Score:3)
    
    by NFN_NLN ( 633283 ) writes:
    
    And don't forget about RPO. If you want synchronous file replication over any useful distance we're talking $$$. If asynchronous is acceptable then decide what an acceptable RPO is, along with your data change rate. With those you can decide if you can afford offsite replication. Most businesses decide nightly tapes are acceptable at that point.
  - Re: (Score:3)
    
    by afabbro ( 33948 ) writes:
    
    Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.
    Bingo. Spot on perfect answer.
  - - Re:Repeat after me: (Score:5, Insightful)
      
      by NFN_NLN ( 633283 ) writes: on Tuesday November 01, 2011 @12:10AM (#37903816)
      
      Except when they do support redundancy:
      http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Replicated_Volumes [gluster.com] - Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.
      RAID is still NOT A BACKUP!
      I have a 500 node replicated filesystem... and I just overwrote the wrong file, or a virus infected a file, or the file got corrupted...
      The good news is my 500 replicated nodes are all consistent. The bad news is... where's my fucking file!
      
      - Re: (Score:3, Informative)
        
        by rwa2 ( 4391 ) * writes:
        
        Yeah, subby just needs to:
        
        delete some porn. Sure, it's a good feeling to know it's all there, but you really just watch the top 1% over and over. "The redyouwankjizzhutdb Cloud" will do when you just want some random fix.
        compress the rest. There's no reason you need lossless 1080p masters of all your home videos of your kids spitting up. A nice h264 compressed archive can be enjoyed more often, is more portable to all your mobile devices, and you'll barely notice the loss of quality when someday you m
      - Re: (Score:3)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by somersault ( 912633 ) writes:
        
        For those that only need a tiny set of files backed up, you can use stuff like Dropbox, Ubuntu One, etc. as a convenient addition to any other backup system you have. Those automatically synch to other devices, and keep a cloud backup with previous versions.
        To do your own Dropbox-like backup system, I suppose you could use a content versioning system like svn/git. I hadn't really considered doing something like that until now. It would be less user friendly than Dropbox, but you wouldn't have to pay a subsc
      - Re: (Score:3)
        
        by TheRaven64 ( 641858 ) writes:
        
        So you revert to the last snapshot. Or you mount the last snapshot and recover that file (you are making regular snapshots of your volumes, right?). That is not the problem with RAID. The problems that RAID does not address are:
        
        What happens if there is a bug in the filesystem driver that causes the disk to be slowly filled with nonsense? Or the machine is compromised and malware overwrites the existing data.
        
        What happens when thieves steal the server? Or when lightning strikes and fries all of the di
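The RPO point raised earlier in this thread (an acceptable RPO plus your data change rate decide whether offsite replication is affordable) can be made concrete with some rough arithmetic. The following is only a sketch under simplifying assumptions: the function name, the 20 GB daily change figure, and the idea that all changed data must ship within one fixed window are illustrative, not taken from the comments, and compression, deduplication, and protocol overhead are ignored.

```python
def required_mbit_per_s(changed_gb, window_hours):
    """Average link speed needed to ship `changed_gb` within `window_hours`.

    Uses decimal units (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits).
    """
    bits = changed_gb * 8 * 1000**3
    seconds = window_hours * 3600
    return bits / seconds / 1_000_000


# Hypothetical workload: 20 GB of changed data per replication cycle.
for window in (24, 8, 1):
    rate = required_mbit_per_s(20, window)
    print(f"{window:>2} h window: {rate:.1f} Mbit/s sustained")
```

Shrinking the window is what drives the cost: the same amount of changed data needs a proportionally faster link, which is one reason nightly tapes often win for a long RPO.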
Obligatory: RAID is not a backup (Score:5, Insightful)

by Anthony Mouse ( 1927662 ) writes: on Monday October 31, 2011 @10:15PM (#37902982)

Is the only reason you're looking at a clustered filesystem that you don't want to lose data? Because if it is, it's probably not what you want. The purpose of a clustered filesystem is to minimize downtime in the face of a hardware failure. You still need a backup in the case of a software failure or in case you fat finger something, because a mass deletion can replicate to all copies.

- Re: (Score:3, Informative)
  
  by chrb ( 1083577 ) writes:
  
  If you have more than one server then it's pretty easy to set up rsync with rolling backups (rsnapshot or rdiff-backup or whatever) which is more of a proper backup solution. It's also probably a bit easier to administrate than a clusterfs.
  Having said that, Hadoop's HDFS [apache.org] looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bul
  - - Re: (Score:3)
      
      by allenw ( 33234 ) writes:
      
      The fuse support has likely gotten worse since no one on the core dev team really spends any time with it. I'd be surprised if it still compiles.
- Re: (Score:2)
  
  by SuperQ ( 431 ) * writes:
  
  And of course what the post really wants is a DISTRIBUTED filesystem. Not a clustered filesystem.
- Re: (Score:2)
  
  by Enfixed ( 2423494 ) writes:
  
  Totally agree, the clustered approach doesn't seem to solve the problem posed. It's simple, buy a bunch of 2TB drives and set them up with ZFS. Configure a nightly snapshot job to another similar machine and call it a day. You can have a larger storage area with a fully redundant backup for less than 2K in parts.
- Re: (Score:2)
  
  by Demonantis ( 1340557 ) writes:
  
  He needs to get his priorities in order. I would say RAID is probably what he wants for the most part, for hardware failure as you said, plus an online backup service for the stuff he truly needs to back up. I sincerely doubt a single person can amass 8 TB of data that would be critical to have. Having it all is nice, but definitely not realistic.
- - Re: (Score:3)
    
    by Doc Hopper ( 59070 ) writes:
    
    Mass delete.
    ZFS with a snapshot schedule. Sorted, as long as you catch it within the reach of your oldest snapshot.
    Overwrite with bad data.
    ZFS with a snapshot schedule. Sorted.
    Silent filesystem corruption.
    ZFS. Sorted.
    Batches of disks at one end of the bathtub curve.
    ZFS verifies the data, and when your disks poop out the data is rendered read-only long before just about anything else would have realized there's a problem.
    Trees going through your roof.
    
    ZFS scheduled remote replication to a second array at you
PronFS (Score:3)

by igny ( 716218 ) writes: on Monday October 31, 2011 @10:16PM (#37902992) Homepage Journal

Where is PronFS when we desperately need one?

- Re: (Score:3)
  
  by Jeremi ( 14640 ) writes:
  
  Where is PronFS when we desperately need one?
  It's widely available... these days it goes by the name "the Internet".
I know this isn't what you asked but... (Score:5, Interesting)

by KendyForTheState ( 686496 ) writes: on Monday October 31, 2011 @10:20PM (#37903026)

20 disks seems like overkill for your storage needs. Seems like the more disks you use the greater the risk of failure of one or more of them. Also, your electricity bill must be through the roof. I have 4 3TB drives with a 3Ware controller in a RAID5 array, which gives me the same storage capacity with 1/5th the drives. Aren't you making this more complicated than it needs to be? ...Maybe that's the point?

- Re: (Score:3)
  
  by doodleboy ( 263186 ) writes:
  
  I also have a 3ware [lsi.com] card and four 1 TB drives in RAID 5 in my 10.04 desktop PC at home. Some of that space is exported via iSCSI to a couple of Windows boxes. Then I back the RAID array up with a couple of external SATA drives. My wife thinks this is excessive, but I lost a lot of data, once, nothing critical but stuff I cared about, emails and papers from college, pics of friends and family, etc. But when the drive started throwing SMART errors I thought, yup, better go pick up a new drive soon... 3 days l
- 3 disks are just as vulnerable (Score:2)
  
  by dutchwhizzman ( 817898 ) writes:
  
  Because you get the same amount of single sector failures, no matter what the capacity of your discs is. As soon as they can slam more data on the same surface, they will do so, because the commercial threshold for data loss seems to be the chance of single sector failure.
  
  Also, if I had only 10 SATA discs for virtual machine image storage, I'd be really unhappy, let alone three. In the summary it clearly states hosting VMs for HA is a requirement. Judging by the number of disks without looking at the requ
- Re: (Score:2)
  
  by rwa2 ( 4391 ) * writes:
  
  Yeah, if it were me, instead of a RAID10 with exotic hardware, I'd split it across a few cheap servers and run software RAID6 for a more hardware-agnostic approach. Then use something like OpenAFS (which I unfortunately have 0 actual experience with) to make those servers look like one filesystem to clients. That should get you a good bang for the buck, since motherboards and tower chassis that can fit 6 disks and gigabit networking hardware are relatively cheap compared to JBODs and junk.
  Lustre and OCFS2
- - Re: (Score:2)
    
    by Kagetsuki ( 1620613 ) writes:
    
    Please, if you're going to make such good posts don't do it as AC - you deserve the karma. It just so happens we're looking at a scheme almost exactly like what you outline, we've currently got a dual server "humming" configuration with on and off site backups but we need something more serious after getting an influx of customers.
    - Re: (Score:2)
      
      by dbIII ( 701233 ) writes:
      
      I've recently ditched RAID10 and gone for RAID6. With a decent controller and a lot of drives it's not much slower and you can lose ANY two drives. With RAID10 you may be able to lose up to 7 drives if it's one from each pair - but if you lose two from the same pair you could have a big hole in every single large file on the array. The only multiple disk failures I've ever had were adjacent overheating drives anyway.
      Tape or USB storage is good - you can't overwrite something that is in a box in another b
      - I'll add more (Score:2)
        
        by dbIII ( 701233 ) writes:
        
        A mirror of live spinning disks updated at intervals is icing on the cake if you can afford it after you have real backups. Doing it instead of real backups can look extremely stupid when things go wrong.
        A web hosting company near me failed spectacularly due to that mistake: their mirror was mirroring garbage and they lost all of their clients' files. Of course it even made it into the print media, and it made them look very stupid.
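The RAID10 versus RAID6 trade-off described above (RAID6 survives any two failures, while RAID10 survives up to seven only if no mirrored pair loses both members) can be checked by brute force. This is a minimal sketch assuming a 14-drive array, which is what "up to 7 drives, one from each pair" implies; the pair layout and function names are made up for illustration, and every two-drive failure is treated as equally likely, which the comment itself notes is not true for adjacent overheating drives.

```python
from itertools import combinations


def raid10_survives(failed, pairs):
    # RAID10 loses data only if both halves of some mirrored pair fail.
    return not any(a in failed and b in failed for a, b in pairs)


def raid6_survives(failed):
    # Double parity tolerates any two failed drives, but not a third.
    return len(failed) <= 2


drives = range(14)
# Assumed layout: drives (0,1), (2,3), ... form the mirrored pairs.
pairs = [(i, i + 1) for i in range(0, 14, 2)]

two_drive_failures = [set(c) for c in combinations(drives, 2)]
raid10_fatal = sum(not raid10_survives(f, pairs) for f in two_drive_failures)
raid6_fatal = sum(not raid6_survives(f) for f in two_drive_failures)

print(f"{len(two_drive_failures)} possible two-drive failures")
print(f"RAID10 loses data in {raid10_fatal} of them, RAID6 in {raid6_fatal}")
```

Under these assumptions a small fraction of double failures is fatal for RAID10 and none for RAID6, at the cost of RAID6 not surviving a third failure that a lucky RAID10 layout might.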
ZFS (Score:4, Informative)

by Anonymous Coward writes: on Monday October 31, 2011 @10:20PM (#37903034)

LVM, mdadm & Ext4 or ZFS seems like it would be more than adequate for this. A 2U server can hold 36TB of raw data with software RAID and consumer disks. 2.5" would be preferable for home use considering power usage, unless you're a fellow Canadian, in which case servers make great space heaters.

- Re: (Score:2)
  
  by DarkDust ( 239124 ) writes:
  
  I do have a ZFS setup of currently 6 disks and I really recommend buying server-grade HDDs, unless you have set up a monitoring system that tells you whenever a HDD is failing so you can buy a new one.
  Until half a year ago I used normal USB HDDs that you can buy everywhere. My experience was that they simply aren't meant to be always on and fail pretty soon. I usually had a failed HDD once every quarter year. It drove me mad. Almost one year ago I started using these HDD docks where you can put two 2,5" or
You still need to make a decision (Score:5, Insightful)

by 93 Escort Wagon ( 326346 ) writes: on Monday October 31, 2011 @10:24PM (#37903068)

You ask about the technical specifications; but, when commenting regarding the three likely candidates you found, you've put philosophical objections first and foremost. I think you first need to figure out which factor is more important to you - specs, or philosophy. Otherwise you're probably going to waste a lot of time arguing in circles.

- Re: (Score:3)
  
  by SoupIsGood Food ( 1179 ) writes:
  
  Philosophical objections are valid. It's why people decided to go with Open Source solutions in the first place... choose the right philosophy, and you're buying into a system that will have developer and user support for a long time, and pay off in more features implemented in satisfactory ways.
  If a tech has the right vision, it will go a long, long way, where pure technical excellence on its own is no guarantee the tech will grow with the user.
No ZFS? (Score:5, Interesting)

by theskipper ( 461997 ) writes: on Monday October 31, 2011 @10:24PM (#37903074)

How about ZFS with your RAID controllers in single drive mode (or worst case JBOD)? Let ZFS handle the vdevs as mirrors or raidz1/2 as you wish. ZFSforLinux is rapidly maturing and definitely stable enough for a home nas. Or go the OpenIndiana route if that's what you're comfortable with.
My 4TB setup has actually been a joy to maintain since committing to ZFS, with BTRFS waiting in the wings. The only downside is biting the bullet and using modern CPUs and 4-8GB memory. Recommissioning old hardware isn't the ideal way to go, ymmv.
Just a thought.

- Re: (Score:2)
  
  by JoeMerchant ( 803320 ) writes:
  
  ZFSforLinux is rapidly maturing and definitely stable enough for a home nas.
  ZFS has been maturing rapidly for the last 6 years... Didn't it almost make its way into OS-X at one point? I'm not sure I'd put all of my eggs in that particular basket (or any single system, really).
  If it's backup you want, I'd look into a system that copies off of one type of file-system into another. Ever since my QNAP TS-109 took a dump, and with it my data because of their proprietary "Linux" partition formatting, I've stuck to nice simple low performance solutions like 2TB USB drives straight out o
  - Re:No ZFS? (Score:4, Insightful)
    
    by hjf ( 703092 ) writes: on Monday October 31, 2011 @11:07PM (#37903460) Homepage
    
    ZFS isn't free anymore. It's all commercial and proprietary and no bugfixes or anything get released outside a big bad support contract with Oracle.
    If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year). Works great, the only thing you don't get is ZFS crypto (transparent encryption).
    
    - Re: (Score:2)
      
      by bill_mcgonigle ( 4333 ) * writes:
      
      If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year).
      or linux [zfsonlinux.org].
    - Re: (Score:2)
      
      by Marsell ( 16980 ) writes:
      
      Odd. The company I work for uses ZFS on many thousands of disks, we don't pay Oracle a dime, and we shovel code back to illumos.
      Most of the top Solaris talent jumped the Oracle ship long ago. A lot of them are committing code to illumos as part of the jobs.
    - - Re: (Score:3)
        
        by Zemplar ( 764598 ) writes:
        
        Go read that "new" Oracle license and you'll realize Solaris isn't nearly as free as it once was.
        
        Too bad, Solaris was gaining more momentum while it was available for free for any purpose, not just "...only for the purpose of developing, testing, prototyping and demonstrating your applications, and not for any other purpose."
Thoughts on OCFS (Score:4, Interesting)

by trawg ( 308495 ) writes: on Monday October 31, 2011 @10:24PM (#37903076) Homepage

We have been using OCFS [oracle.com] (Oracle Cluster File System) for some time in production between a few different servers.
Now, I am not a sysadmin so can't comment on that aspect. I'm like a product manager type, so I only really see two sides of it: 1) when it is working normally and everything is fine 2) when it stops working and everything is broken.
Overall from my perspective, I would rate it as "satisfactory". The "working normally" aspect is most of the time; everything is relatively seamless - we add new content to our servers using a variety of techniques (HTTP uploads, FTP uploads, etc) and they are all magically distributed to the nodes.
Unfortunately we have had several problems where something happens to the node and it seems to lose contact with the filesystem or something. At that point the node pretty much becomes worthless and needs to be rebooted, which seems to fix the problem (there might be other less drastic measures but this seems to be all we have at the moment).
So far this has JUST been not annoying enough for us to look at alternatives. Downtime hasn't been too bad overall; now we know what to look for we have alarming and stuff set up so we can catch failures a little bit sooner before things spiral out of control.
I have very briefly looked at the alternatives listed in the OP and look forward to reading what other readers' experiences are like with them.

- Also ACFS (next generation of OCFS...) (Score:3)
  
  by Meetch ( 756616 ) writes:
  
  Firstly, no I don't work for Oracle, and never have, and I know how hard it can be to justify using their products, especially the ones you pay for(!) considering some of the things I've seen, but credit where credit's due...
  OCFS was originally designed specifically for storing Oracle datafiles, in a cluster, in a non-POSIX fashion. After that came OCFS2, which is POSIX compliant, but can deadlock when NFS exported due to the way NFS handles locking, in a way that can be worked around with the "nodirplus
- - Re:Thoughts on OCFS (Score:4, Insightful)
    
    by afabbro ( 33948 ) writes: on Tuesday November 01, 2011 @12:20AM (#37903862) Homepage
    
    Thank you, this is one of the few valid answers to my primary question which is of actual experience with clustered file systems. I don't think most of the responders got the clue that I'm looking for a solution that will hopefully scale over a decade's worth of time.
    There is a question of missing clues, but I don't think it's in the responders. You either asked your question poorly or you don't understand your problem. Your question centers on being "paranoid about data loss" and yet you're discussing technologies designed to manage concurrent access to a filesystem. Do you put in gigabit ethernet when you want faster USB performance?
    I'll likely be upgrading to a Super Micro 2U Twin with QDR Infiniband
    Give me a break...
    
  - Re: (Score:3)
    
    by Macka ( 9388 ) writes:
    
    How is this an answer to your question? You identified 3 cluster filesystem types that protect against hardware loss by distributing the data over a cluster of systems - but OCFS2 isn't like that. It's a filesystem that's designed to provide concurrent shared access to a filesystem by a cluster of servers, which in combination with a HA framework can provide a platform that applications can use to protect against node failure, not disk failure. With OCFS2 you still have to make the storage highly availab
    - Re: (Score:2)
      
      by Dishwasha ( 125561 ) writes:
      
      Since you're such a low UID I'll bother answering your question.
      Thank you, this is one of the few valid answers to my primary question which is of actual experience with clustered file systems.
      I had already thrown out OCFS2 and GFS2 as possible candidates, but that was irrelevant to my reply. Also currently I am unaware of any non-proprietary hardware or software RAID (mdadm in particular) that supports active/active or active/passive on a shared backplane at any RAID level other than 1 or 0 (i.e. DRBD) and rather expensive and not yet released Areca external RAID controllers [areca.com.tw]. Also I'm looking for whitebox OSS solutions.
      - Re: (Score:2)
        
        by drsmithy ( 35869 ) writes:
        
        I had already thrown out OCFS2 and GFS2 as possible candidates, but that was irrelevant to my reply. Also currently I am unaware of any non-proprietary hardware or software RAID (mdadm in particular) that supports active/active or active/passive on a shared backplane at any RAID level other than 1 or 0 (i.e. DRBD) and rather expensive and not yet released Areca external RAID controllers [areca.com.tw]. Also I'm looking for whitebox OSS solutions.
        This helps to explain more about what you want to do, but does
  - response to OP, please read parent as well (Score:3)
    
    by dutchwhizzman ( 817898 ) writes:
    
    You don't seem to understand a few basics about storage, so let me explain them briefly:
    
    Backup is a method of storing your data in a safe place, so if you accidentally or purposefully delete it, or if you have a (severe) hardware failure, you still have your data. This automatically means you'll want to store your backup data on a totally, physically separated medium. If someone wants to destroy your data, a distributed filesystem won't do you any good. Taking one way snapshots over a network link to a rem
    - rsync (Score:2)
      
      by tapanitarvainen ( 1155821 ) writes:
      
      Taking one way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution. How are you planning to restore from data corruption that happened 2 weeks ago?
      It's easy enough to keep any desired schedule of incremental backups with rsync - search for rsnapshot for example, or BackupPC if you want a fancy web-based interface.
      Otherwise, 100% agreement: backups should be physically separated from the primary data, preferably by significant geographical distance (think about fire) and duplicated on several locations.
  - Re: (Score:2)
    
    by dbIII ( 701233 ) writes:
    
    I'll likely be upgrading to a Super Micro 2U Twin with QDR Infiniband which none of the mentioned solutions have support for
    I think you are wrong, just about anything will run on those things, and although I don't have Infiniband I've noticed there are drivers for a lot of platforms.
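The rsync-based incremental backups suggested earlier in this thread usually rely on hard-linked snapshot directories. Below is a minimal sketch of that idea: it only builds the command line and runs nothing, and the function name, paths, and dated directory naming are hypothetical. `-a`, `--delete`, and `--link-dest` are standard rsync options; dedicated tools such as rsnapshot handle rotation and retention on top of a similar scheme.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_command(source, backup_root, today, previous=None):
    """Build an rsync command line for one dated snapshot directory."""
    dest = PurePosixPath(backup_root) / today.isoformat()
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot,
        # so each snapshot looks complete but only changed files use new space.
        prev_dir = PurePosixPath(backup_root) / previous.isoformat()
        cmd.append(f"--link-dest={prev_dir}")
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [f"{source.rstrip('/')}/", f"{dest}/"]
    return cmd


# Hypothetical paths, for illustration only.
cmd = snapshot_command(
    "/srv/data", "/mnt/backup", date(2011, 11, 1), previous=date(2011, 10, 31)
)
print(" ".join(cmd))
```

Because every snapshot directory is a full tree, restoring from corruption that happened two weeks ago is a matter of copying from the directory dated before the damage, provided that snapshot is still retained and the backup disk sits somewhere other than next to the primary array.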
AWS EBS (Score:2)

by curmudgeon99 ( 1040054 ) writes:

Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.
- Re: (Score:2)
  
  by Enfixed ( 2423494 ) writes:
  
  Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.
  AWS EBS = $0.10 per allocated GB per month or $102.40 per TB..... I doubt power and hardware is costing him > $819.20 a month.
- Re: (Score:2)
  
  by afabbro ( 33948 ) writes:
  
  CrashPlan or a similar online backup service is what the questioner needs. But it sounds so much cooler to be discussing clustered filesystems.
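The EBS figures quoted in this thread are easy to re-derive. A small sketch, assuming the $0.10 per GB-month rate given in the comment, binary terabytes (1 TB = 1024 GB, which is what the quoted $102.40 implies), and the submitter's stated 8 TB of data; the function name is made up, and request and transfer charges are ignored.

```python
PRICE_PER_GB_MONTH = 0.10  # USD, the rate quoted in the comment
GB_PER_TB = 1024           # the quoted $102.40 per TB implies binary terabytes
DATA_TB = 8                # the submitter's stated data set


def monthly_cost_usd(terabytes, price_per_gb=PRICE_PER_GB_MONTH):
    """Monthly storage cost for `terabytes` of allocated space."""
    return terabytes * GB_PER_TB * price_per_gb


per_tb = monthly_cost_usd(1)
total = monthly_cost_usd(DATA_TB)
print(f"${per_tb:.2f} per TB per month, ${total:.2f} per month for {DATA_TB} TB")
```

The numbers match the comment's $102.40 and $819.20, so the comparison against home power and hardware costs rests on the quoted rate rather than on an arithmetic slip.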
I was going to say Lustre, but... (Score:3, Insightful)

by Anonymous Coward writes: on Monday October 31, 2011 @10:27PM (#37903096)

I was going to say Lustre, but then I saw that you only have 16TB. 15 years ago that would have been impressive, but these days, those supercomputers you mention probably have that much in DRAM, and their file storage is in the multi-petabyte range. Lustre is optimized for large scale clusters, in which you have entire nodes (a node is a computer, here) dedicated to I/O - bringing external data into the in-cluster network fabric, while other nodes are compute nodes - they don't talk to the outside world, except by getting data via the I/O nodes.
That's why you'll see all this talk of OSSs and OSTs, as though they'd be distinct systems - on a large scale cluster they are.
For only 16TB, what you want is a SAN, or maybe even a NAS.
If you want open source, then go with openfiler. It supports pretty much everything. I haven't stress tested it, but it seems to work well for that order of magnitude of data.

Tahoe-LAFS (Score:2)

by the_brobdingnagian ( 917699 ) writes:

Try Tahoe-LAFS [tahoe-lafs.org].
- Re: (Score:2)
  
  by Dishwasha ( 125561 ) writes:
  
  Not a bad suggestion and more helpful than most. Thanks for the input!
LTO4 (Score:2)

by hawguy ( 1600213 ) writes:

I think the best disk-hardware agnostic solution for preventing filesystem data loss is an LTO-4 autoloader and regular tape backups (hopefully taken off site regularly). They are pretty cheap: a SuperLoader 3 with an 8 tape (6TB/12TB) capacity is less than $3000. Or buy a refurb LTO3 autoloader for a third the price and half the capacity.
Bad Dog. Wrong Tree! (Score:4, Insightful)

by SmurfButcher Bob ( 313810 ) writes: on Monday October 31, 2011 @10:38PM (#37903228) Journal

You will spend all this effort to build this solution... and then your house will catch fire.
On the good side, the fire department WILL manage to save the basement by filling it with 80,000 gallons of water at 2,000GPM per fire engine.
Or, you'll be wiped out by a flood. Or a drunk will drive through the side of your house. Or you'll have a gas leak and the house will detonate. Or carpenter ants will eat away the floor joists.
RAID is not a backup solution. Neither is replication... if you whack the data, it'll likely be replicated. If you get a compromised machine somewhere, files they touch will likely be replicated. The only thing you're creating is an overly complex hardware mitigation. If THAT is how you define "data preservation"... you're doing it wrong.
Look more for a solution to move stuff offsite - a cheap pair of N routers running Tomato or OpenWRT, to a neighbor's house, and you reciprocate with each other. Bonus points if you use versions, transaction logs, journals, etc.

- Re: (Score:2)
  
  by Macka ( 9388 ) writes:
  
  Or alternatively you back everything off to tape (rotating sets) and store them in a fireproof safe.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by Skater ( 41976 ) writes:
    
    My wife and I thought through this, and the only thing we felt we HAD to put offsite was our pictures. So we have an account with a backup provider that allows rsync, and I have it set up to update nightly. Works great so far.
    We also discussed building three 'backup boxes' that we could place at some relatives' houses...then everyone with one of the backup boxes could back up to the other two. We decided not to do it for the expense, though, and we didn't think the relatives would be that interested.
- He did not ask for a backup solution, calm down. (Score:3)
  
  by Barryke ( 772876 ) writes:
  
  Stop bashing on the RAID!=Backup thing, we all know and it's irrelevant to the question.
  I believe his main concern is having one giant volume (say 30 TB) to store data, and not about using it as a backup solution. (He did not even use the word.)
  A backup for that volume would simply be duplicating the setup offsite, possibly offline archiving the cloned disks or (what I'd do) the complete hardware setup.
  I once investigated GlusterFS too, was impressed and decided that it's for larger scale projects and for me
- - Re: (Score:2)
    
    by neonsignal ( 890658 ) writes:
    
    I guess you're not a smurf.
Drobo? (Score:2, Informative)

by varmittang ( 849469 ) writes:

Drobo Pro with 3 TB drives set up with dual redundancy will get you 18 TB of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com
- Re: (Score:2)
  
  by speedingant ( 1121329 ) writes:
  
  Slow as molasses though. Way slower than any other solution out there..
- Re: (Score:2)
  
  by Ralphus Maximus ( 594419 ) writes:
  
  Stay away from Drobo. I just bought a new unit, and the only way to see how much "real" free space you have is to use their windows based dashboard program. I have five 1TB drives in mine, the dashboard reports 3.5tb, while the filesystem mounted on both windows and linux report I have 17TB of space on the drives. I contacted their support, and they say it's as designed.
  
  Cheers,
  RM
- Re:Drobo? (Score:4, Interesting)
  
  by LoRdTAW ( 99712 ) writes: on Tuesday November 01, 2011 @09:09AM (#37906260)
  
  STAY AWAY FROM DROBO!
  I had a client ask me to set one up for them. You don't partition it like a standard RAID array; you format it to some predetermined size that may be larger than the physical disk space in the machine (through their Drobo dashboard). If you have three 1 TB disks you will have around 2TB of actual storage but you can format it for 16TB under Win 7. This is achieved via their "beyond raid" technology, which fools the OS into thinking there is more disk space than there actually is. This lets the user make one large volume now and then add disks in the future; even disks of different sizes can be mixed and matched. If you start to go beyond the physical capacity, the array degrades and goes offline until you add another disk and wait hours or days for the disks to reorganize. My client was consolidating her photography library to the Drobo when it just crapped out. Turns out she ran over the physical limit.
  Then, if you're lucky, your computer, be it Apple or Windows, will take upward of 30 to 45 minutes to boot and shut down if the fucking thing is plugged in and powered on during either of those two procedures. Drobo recommends you move your data to another set of disks and re-format your Drobo. As if people have a few spare TB of disk capacity just sitting around; that's the reason they bought your shit box to begin with, assholes. It's a known issue.
  I have personally used hardware RAID 5, software block-level RAID 5 and ZFS. If you ask me, I'd rather have the file system do the RAID work at the file system level, not the block level where the file system is ignorant of what lies beneath. ZFS is the way to go until BTRFS is fully stable and feature competitive with ZFS. Then you do incremental backups offsite, either to a family member's or friend's house or to a commercial off-site backup provider.
  And what does your 16 TB consist of? If it's movies and the like then don't bother spending money backing it up. If it's self-made video and other personal large files then that makes sense. I know of people who spent oodles of cash to back up silly crap like downloaded movies that can easily be replaced or rented from Netflix.
  With the way things are going in the storage world, SSDs will eclipse mechanical disks at the desktop level and mechanical disks will be relegated to backup duty where they far outstrip SSDs in capacity. It reminds me of when tape drives were the king of capacity; often tapes were several orders of magnitude larger than current hard disks and tapes were cheap. They were slow but my god did they have capacity. Now it looks like SSDs will assume the role of desktop storage and to some degree server storage while mechanical disks will be used for large backup systems and file servers. Mechanical hard drives of today will be tomorrow's tape drives and then obsolete when SSDs begin to overtake them in capacity. By then we might have something even higher in capacity like holographic or some other sci-fi sounding storage.
  
Stop with experimental shit (Score:2, Insightful)

by ArchieBunker ( 132337 ) writes:

Seriously, stop with the experimental filesystem projects still in beta. You need one that is mature and time tested. Do a bit of research. I don't even run RAID and have yet to permanently lose anything in probably 20 years.
- Re: (Score:2)
  
  by QuantumRiff ( 120817 ) writes:
  
  People sure seem to think clustering is the key to everything..
  I'm with you, tried and tested.. I like when a client mentions how they have 2 standby database servers in remote locations, with almost live replication, so they can survive everything... I ask them what happens if someone types "drop table user; commit;" in Oracle.. Sure enough, it replicates to the standbys, just like it's designed to... (they really get upset when I point that out too)
  Same thing with files.. I've seen way too often the mi
Lustre (Score:4, Informative)

by JerkBoB ( 7130 ) writes: on Monday October 31, 2011 @11:09PM (#37903468)

Lustre is pretty cool, but it's not magic pixie dust. It won't break the laws of physics and somehow make a single node faster than it would be as an NFS server. It's for situations when a single file server doesn't have the bandwidth to handle lots of simultaneous readers and writers. A "small" Lustre filesystem these days usually has 8-16 object storage servers serving mid-high tens of TB. The high end filesystems have literally hundreds of OSSes and multiple PB served. The largest I know of right now is the 5PB Spider [nccs.gov] filesystem at Oak Ridge National Labs.
One nice thing about Lustre on the low end is that you can grow it... Start out small and add new OSSes and OSTs as you need them. This often makes sense in Life Sciences and digital animation scenarios where the initial fast storage needs are unknown or the initial budget is limited (but expected to grow). But if you're never planning to get beyond the capacity of a single node or two, Lustre is just going to be overhead. I don't know much about the other clustered filesystem options.

- Re: (Score:2)
  
  by Dishwasha ( 125561 ) writes:
  
  Yeah, the complication is why I'm leaning more towards GlusterFS, yet so far Lustre is more proven. Unless I get some useful anecdotal experience here I'll probably model out all three solutions with VMs and do my own comparisons and performance analysis. Maybe I'll even post my experiences and results here afterwards.
Fix the machines first... (Score:2)

by k9mach3 ( 2497704 ) writes:

Lustre - no replication (it's on the roadmap for sometime in the next few years), and it relies on access to shared storage (read: FC/iSCSI disk array, and if that fails you lose your data.). OCFS - no replication, designed for multiple servers accessing one array. Ceph - has replication, but still in active development, and somewhat complex. Good if you don't mind losing your data (it's in alpha... if it breaks, you get to keep both pieces...) GlusterFS - I have no experience with it, but it seems to be
- Re: (Score:2)
  
  by Dishwasha ( 125561 ) writes:
  
  http://wiki.lustre.org/index.php/Lustre_2.0_Features [lustre.org] lists filesystem replication as a benefit of Lustre 2.0 back in November 2009. I won't be running any RAID since my requirement isn't really to reduce the number of disks used by relying on parity. One or more replication partners/mirrors will handle that function. Rsync won't work for the aforementioned clustered virtualization needs.
Performance (Score:3, Informative)

by speedingant ( 1121329 ) writes: on Monday October 31, 2011 @11:17PM (#37903508)

What kind of performance are you after? If you're not after anything over 40MB/s, I'd go for unRAID. I use this at home and it's brilliant. I've replaced many drives over the years, and I've had two hard drives fail with no massive consequences (data isn't striped). Plus, many plugins are now available: SimpleFeatures (replacement GUI), Plex Media Server, SQL, email notifications with APCUPSD support, etc.

Two servers using ZFS (Score:2)

by drsmithy ( 35869 ) writes:

One as the primary, sharing space via NFS for your VMs and whatever else. Throw a couple of SSDs in there for caching.
The second replicating from the first (via ZFS send/receive, or just simple rsync) with snapshotting for backups and regular syncs to some off-site data store for truly irreplaceable data.
This is the setup I use at home, and it sits behind a 3-node VMware cluster, several desktop PCs (one of which boots from the main server over iSCSI), and couple of media PCs.
Other than that, your require
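The primary-to-secondary replication described above (snapshot, then ZFS send/receive) could be sketched roughly as follows. This is only a sketch under stated assumptions: the dataset names, snapshot names, and the host `backup-host` are hypothetical placeholders, and the function only builds the command lines rather than running them.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Build shell commands for one incremental ZFS replication pass.

    All arguments are hypothetical example values supplied by the caller;
    nothing is executed here, the commands are only returned as strings.
    """
    old = shlex.quote(f"{dataset}@{prev_snap}")
    new = shlex.quote(f"{dataset}@{new_snap}")
    # Step 1: take a new snapshot on the primary.
    snapshot_cmd = f"zfs snapshot {new}"
    # Step 2: send only the changes since the previous snapshot and
    # receive them on the secondary over ssh.
    send_cmd = f"zfs send -i {old} {new}"
    recv_cmd = f"ssh {shlex.quote(remote_host)} zfs receive -F {shlex.quote(remote_dataset)}"
    return [snapshot_cmd, f"{send_cmd} | {recv_cmd}"]


if __name__ == "__main__":
    for cmd in replication_commands(
        "tank/vms", "2011-10-31", "2011-11-01", "backup-host", "backup/vms"
    ):
        print(cmd)
```

Keeping the older snapshots on the secondary is what gives the "snapshotting for backups" part: a deletion on the primary replicates, but earlier snapshots still hold the data.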
- Re: (Score:2)
  
  by Dishwasha ( 125561 ) writes:
  
  Hmmm.....I'll have to look more deeply in to ZFS as I keep hearing it thrown out there. I should have probably qualified my statement as "thereby supporting live guest migrations". The non-sequitur was basically a hint thrown in to suggest what I meant by high-availability for those less likely to catch or understand the subtle distinction of what high-availability typically means. Just like most other things in life, key requirements may be more basic than what I've described, but if the car salesman th
  - Re: (Score:3)
    
    by drsmithy ( 35869 ) writes:
    
    I'll have to look more deeply in to ZFS as I keep hearing it thrown out there. I should have probably qualified my statement as "thereby supporting live guest migrations".
    Well, you still don't need a clustered filesystem for that. Or are you using file-backed virtual disks and using your data storage servers as virtualisation hosts as well?
    If you are, my advice would be to split out your data storage to a separate set of machines. So:
    Two data storage servers, software RAID6 (or RAIDZ2 if ZFS), replicatin
cheap NAS (Score:2)

by borgasm ( 547139 ) writes:

go buy 2 or 3 cheap 8-10TB NAS devices
cycle one of them through every few months for a backup, and then store it at another physical location
that will run you less than $3000 total and a lot fewer headaches
- Re: (Score:2)
  
  by carnivore302 ( 708545 ) writes:
  
  This is the best advice I've seen so far.
Wow... (Score:2)

by RecoveredMarketroid ( 569802 ) writes:

You're serious about protecting your porn...
Not a viable solution (Score:4)

by pavera ( 320634 ) writes: on Tuesday November 01, 2011 @01:10AM (#37904118) Homepage Journal

If you're looking to have any kind of decent performance in your VMs this just won't work.
I've worked with VMs on all different kinds of storage (fiber channel SAN, local disk, iSCSI SAN (over 1Gb and 10Gb ethernet), local hardware RAID, NFS file shares, GFS2 (as in the RedHat cluster file system), and MooseFS and GlusterFS). All of these have been either in large test labs or in production cloud deployments. I've never had a cluster file system get close to passing muster as a storage medium for VM usage. IO is the number 1 bottleneck in virtualized environments, and these schemes just add completely unacceptable latency and bandwidth restrictions.
The only way to really run VMs is fiber channel SAN, local disk (or hardware RAID), or iSCSI with 10GbE (on the storage server side). Even iSCSI with 2GbE (2x1GbE bonded) is not speedy enough to support more than 5-10 VMs running concurrently. You'll start to see problems at 5 VMs if the VMs are Windows... For whatever reason Windows really likes to write to the disk. Currently I have 4 servers in my basement: a single storage server (6 2TB drives in a RAID6, giving 8TB of usable disk) and 3 VM servers (2 2TB drives each, in hardware RAID1). I run the VMs locally and back them up to the storage machine over iSCSI nightly. I also have a shared volume on the storage system that all VMs and my household computers can access. I use openfiler for my storage system. If I had the money it would be nice to get a second storage server and replicate it (which openfiler supports), but I don't have that cash just sitting around right now.
Backing up 8TB of data (ok, so I have about 5TB used) offsite is basically impossible, so we have a "special" folder on the shared drive that is backed up using crashplan. It's about 600GB, and the first backup took nearly 3 months over a 5 Mbps upload.
The above setup is the only one I've found that is both a) somewhat affordable, and b) performs well enough to do actual work in the VMs. It provides for some mobility in the event of a hardware failure: if a VM server crashes, I can run the crashed VMs via iSCSI on another server (from the day-old backup). If the storage server crashes, the only "important" data is the 600GB in the special folder, which would take 2 months to download over my home connection but could be downloaded in stages, i.e. get the most important stuff immediately. If both a VM server and the storage server crash, I'm out the VMs that were running on the VM server, but again the important data is off-site, and the VMs can be rebuilt in a day or less.
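For the offsite-backup numbers in this comment, a rough lower bound on transfer time can be estimated from data size and link speed. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s); the `efficiency` parameter is a hypothetical knob for protocol overhead or throttling and does not come from the post.

```python
def transfer_days(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate days needed to move size_gb over a link of link_mbps.

    efficiency is the assumed fraction of the nominal link rate actually
    sustained (1.0 means the full line rate, which is a best case).
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link rate > 0, efficiency in (0, 1]")
    bits = size_gb * 1e9 * 8                       # payload in bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400                         # seconds per day


if __name__ == "__main__":
    print(f"600 GB at 5 Mbps: {transfer_days(600, 5):.1f} days at line rate")
    print(f"8 TB at 5 Mbps:   {transfer_days(8000, 5):.1f} days at line rate")
```

At the full line rate, 600 GB over 5 Mbps works out to about 11 days, so a first backup that took nearly three months suggests the sustained throughput was well below the nominal upload speed. The full 8 TB would need roughly 148 days even in the best case, which is consistent with calling a complete offsite backup impractical on that connection.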

- Re: (Score:2)
  
  by liquidweaver ( 1988660 ) writes:
  
  Have you heard of ATA over Ethernet? It's the bee's knees. For VMs, it's pretty much better than iSCSI in every way I have seen. It scales out in a peered fashion. It's really damn efficient, and last time I set one up the total configuration process took me around 10 min, given support is already in the kernel. Want to go faster? Grab a 10G switch - no Fibre Channel required. Bottlenecks? Only if you design it that way - you could just have a "switch full of HDs" if you like. I may or may not have set up
- Re: (Score:2)
  
  by drsmithy ( 35869 ) writes:
  
  I've worked with VMs on all different kinds of storage (fiber channel SAN, local disk, iSCSI SAN (over 1Gb and 10Gb ethernet), Local hardware raid, NFS file shares, GFS2 (as in the RedHat cluster file system), and MooseFS and GlusterFS) All of these have been either in large test labs or in production cloud deployments. I've never had a cluster file system get close to passing muster as a storage medium for VM usage. IO is the number 1 bottleneck in virtualized environments, and these schemes just add compl
Why bother? (Score:2)

by guruevi ( 827432 ) writes:

Simply use ZFS across your drives. There is no way you can use all your resources (network bandwidth, disk bandwidth) even on a low-end machine unless you get to ~50-200TB and require more than ~100,000 IOPS (which is doable on a single machine loaded with SSD, memory and 10GbE). There are setups that offer 1PB with 1M IOPS running on 2 very beefy (failover) hosts, only after that, distributed becomes necessary (unless of course you need geographical distribution).
Distributed file systems are nice if you kn
Best License (Score:2)

by SpaceLifeForm ( 228190 ) writes:

Quickly reviewing, I would go with GlusterFS. GlusterFS is free software, licensed under GNU GPL v3 license. Lustre Filesystem is GPL, but tainted by Oracle as you noted. Ceph is LGPL. I would go with the license you are most comfortable with. OrangeFS is also LGPL which you may wish to check out.
Offsite backups (Score:2)

by mrbill1234 ( 715607 ) writes:

I'm surprised that removable backup media has not caught up with the speed of change in hard disk sizes. In the olden days we used to back up with a couple of QIC-60s and we were happy. Later it was DAT backups. What inexpensive tape backup technologies are available today? It would seem that the best alternative is to use a drive itself as a backup medium and take it off-site.
xtreemfs (Score:3)

by marcello_dl ( 667940 ) writes: on Tuesday November 01, 2011 @06:35AM (#37905422) Homepage Journal

http://www.xtreemfs.org/ [xtreemfs.org] is a distributed fs with no single point of failure (I guess, depending on the configuration), meant for high-latency networks, if you want to put nodes on a WAN. It's fairly easy to set up and now also replicates mutable files; I dunno about its performance or reliability.

Ill fit... (Score:4, Interesting)

by Junta ( 36770 ) writes: on Tuesday November 01, 2011 @08:07AM (#37905800)

Those filesystems are not designed primarily with your scenario in mind. If you want a hardware-agnostic approach, use software RAID or a non-cluster filesystem like ZFS.
Distributing your storage will probably not enhance your ability to survive a mishap. In fact, the complexity of the situation probably increases your risk of messing up your data (I have heard more than a couple of instances of someone accidentally destroying all the contents of a distributed filesystem, but in those professional contexts they have a real backup strategy). You'll be pissing away money on power to drive multiple computers that you really don't need to power.
If you care about catastrophic recovery, you need a real backup solution. This may mean identifying what's "important" from a practical home situation. If you don't mind downtime so long as your data is accessible in a day or two (e.g. time to get replacement parts) without going to your backup media and without suffering the loss of non-critical data, then also having a software RAID or ZFS is the way to go. If you want to avoid downtime (within reason), get yourself a box with basic redundancy designed into it like a tower server from Dell/HP/IBM. If Intel, you would sadly want to go Xeon to get ECC; on AMD you can get ECC cheaper. In terms of drive count, I'd dial it back to 4 3TB drives in a RAID5 (or 5 in RAID6 if you wanted) to save on power and reduce risk in the system.

Something Else (Score:3)

by Nite_Hawk ( 1304 ) writes: on Tuesday November 01, 2011 @08:36AM (#37906000) Homepage

Hi,
I work for a supercomputing center and am the maintainer of our 1/2 PB Lustre deployment. I also hang out on the GlusterFS and Ceph IRC channels and mailing lists and have spent some time looking at both solutions for some of our other systems.
For what you want, Lustre isn't really the right answer. It's very fast for large transfers (though slow for small ones). On our storage I'm getting about 12GB/s under ideal conditions and that's totally uninteresting as far as Lustre goes. There are very few other options out there that are competitive at the ultra-high-end (i.e. PBs of storage at 100+ GB/s). On the other hand you *really* need to understand the intricacies of how it works to properly maintain it. It doesn't handle hardware failures very gracefully and there are still numerous bugs in production releases. A lot of progress has been made since the Oracle acquisition, but it's going to be a while before I'd consider Lustre mainstream. I wouldn't use it for anything other than scratch (i.e. temporary data) storage space on a top500 cluster.
GlusterFS and Ceph are both interesting. GlusterFS is pretty easy to set up and has a replication mode, but last I heard there were some issues enabling striping and replication at the same time. Now that Red Hat is backing it I imagine it's going to pick up in popularity really fast. Also, having the metadata distributed on the storage servers eliminates a major problem that Lustre still has: a single centralized metadata server. Having said this, it's still pretty young as far as these kinds of filesystems go, and it's not immune from problems either. Read through the mailing list.
Ceph is also very interesting, but you should really run it on btrfs and that's just not there yet. You can also run it on XFS but there have been some bugs (see the mailing list). Ceph is really neat but I wouldn't consider it production ready. Rumors abound though that DreamHost is going to be making some announcements soon. Watch this space.
Ok, if you are still reading, here's what I would do if I were you:
If you are running on straight up gigabit ethernet you basically have no reason to bother with distributed storage from a performance perspective. 10GE is a cheap upgrade path and a single server will easily be able to handle the number of clients you'll have on a home network. From a reliability standpoint I've personally found that something like 70-80% of the hardware problems I have are with hardware RAID controllers. I'd stick with something like ZFS on BSD (or Nexenta if you don't mind staying under 18TB for the free license). Then export via NFS or iSCSI depending on your needs. If you want HA across multiple servers, here's what people are doing on BSD with ZFS:
http://blather.michaelwlucas.com/archives/221 [michaelwlucas.com]

Keep it simple, stupid. (Score:4, Informative)

by jimicus ( 737525 ) writes: on Tuesday November 01, 2011 @08:49AM (#37906104)

Two issues here:
1. You're approaching the problem from the wrong angle. IMV, the angle you take should be "how long can I afford to be without this data and how much money am I prepared to throw at a solution?" rather than "what technology exists that I can use to make the system more reliable?". Taking the former approach allows you to plan exactly how you'd deal with data loss - whether it's through human error, software/hardware failure, fire, theft, flood or what have you. Taking the latter approach tends to result in some whacking great Heath Robinson (or if you're American, Rube Goldberg) of a solution that still has a whacking great hole in it somewhere.
2. 8TB of data is not an enormous amount by any modern standard. You can buy a NAS box off-the-shelf today that will take 12x3TB hard disks for 36TB (18TB if you've got the good sense to run them in a RAID 1+0 configuration) of storage; at this level they typically have replication built right into them so you can buy two and replicate one to the other (though like all replication-type solutions, it's not a form of backup and you mustn't treat it as such). If that doesn't appeal, simply put a couple of SATA controllers in a cheap box and run OpenFiler. Anything you cobble together yourself based on the latest clustered filesystem du jour will suffer from one huge flaw - a system that's designed to be highly-available is frequently less reliable than one that isn't, simply because you're making it that much more complicated that there's a lot more to go wrong.
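The capacity figures quoted here and elsewhere in the thread (12x3TB giving 18TB in RAID 1+0, six 2TB drives in RAID6 giving 8TB) follow from simple arithmetic. A minimal sketch, assuming equal-sized disks and ignoring filesystem overhead, hot spares, and vendor-specific layouts; the function name and level labels are made up for illustration:

```python
def usable_tb(disks: int, size_tb: float, level: str) -> float:
    """Usable capacity in TB for equal-sized disks (simplified model)."""
    if level == "raid10":
        if disks < 2 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks (at least 2)")
        return disks * size_tb / 2        # every disk has one mirror partner
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * size_tb      # one disk's worth of parity
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * size_tb      # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    print(usable_tb(12, 3, "raid10"))  # the 12x3TB NAS in RAID 1+0
    print(usable_tb(6, 2, "raid6"))    # six 2TB drives in RAID6
    print(usable_tb(4, 3, "raid5"))    # four 3TB drives in RAID5
```

Usable capacity says nothing about backups: all of these layouts protect against disk failure only, which is the point made above about replication not being a form of backup.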

Performance, not recovery (Score:3)

by riley ( 36484 ) writes: on Tuesday November 01, 2011 @09:30AM (#37906500)

Clustered filesystems are not designed to make your data safer, or to provide ease of recovery. In fact, they make both of those things a bit more difficult. In the case of Lustre, the point is performance -- I have N servers that I am willing to dedicate to serving the filesystem, I can therefore get N times the throughput for large distributed jobs.
File systems that provide replication help, but unless it is copy on write (COW), it does not take the place of backups.
If you are paranoid about data safety, invest in a backup solution. The only reason to use a distributed file system is for increased performance.
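The distinction between replication and backup can be illustrated with a toy model. This is not how any real filesystem works: the class and method names are invented for the example, and the snapshot here is a plain full copy, whereas a copy-on-write filesystem would share unchanged blocks.

```python
class ReplicatedStore:
    """Toy model: a primary, a synchronous replica, and point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = []

    def write(self, name, data):
        self.primary[name] = data
        self.replica[name] = data          # replication copies every write

    def delete(self, name):
        self.primary.pop(name, None)
        self.replica.pop(name, None)       # ...and every delete, mistakes included

    def snapshot(self):
        # Freeze the current state; later writes and deletes do not touch it.
        self.snapshots.append(dict(self.primary))

    def restore(self, name, index=-1):
        # Return the file contents from a snapshot, or None if absent.
        return self.snapshots[index].get(name)


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("photos.tar", b"irreplaceable")
    store.snapshot()
    store.delete("photos.tar")             # accidental deletion
    print("photos.tar" in store.replica)   # the replica lost it too
    print(store.restore("photos.tar"))     # the snapshot still has it
```

The accidental delete reaches the replica immediately, so only the point-in-time copy can bring the file back.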

- Re: (Score:2, Insightful)
  
  by RobDollar ( 1137885 ) writes:
  
  If ever, this article is the case for your comment. Dishwasha, what the living fuck are you doing with your life. Answer that and then maybe, just maybe, coherent answers will abound.
- Re: (Score:2)
  
  by DarkDust ( 239124 ) writes:
  
  Maybe he has a girlfriend and doesn't want to lose his homemade porn. I like mine and want them to be safe, too ;-)
- - Re: (Score:3)
    
    by slaker ( 53818 ) writes:
    
    As someone with considerably more than 8TB of porn (and a similarly vast quantity of non-porn content, handily digitized and indexed), until recently I used paired servers each holding 12TB of drives in RAID6 with 2 drives as hot spares (64 physical drives on four machines). I used rsync to maintain a second copy of all my data. I've decided that's insane, and I've moved to using a single 36TB FreeBSD server (running zfs for my storage pools) that has enough internal expansion to accommodate another 36TB wi
    - Re: (Score:3)
      
      by JWSmythe ( 446288 ) writes:
      
      I have to ask, what the hell are you going to do with 8TB of porn? What's the total runtime of all of that?
      Consider, the whole Doctor Who series [slashdot.org]. 202GB is almost 11 days, 20 hours of runtime. Assuming roughly the same size, which may allow for higher resolution video with better compression, and rounding 202GB at 11 days 20 hrs down to 11 days (giving you bigger files per hour) you'd be looking at roughly 435 days.
      If you beat your meat for an hour a day, ever
      - Re:You Should... (Score:4, Interesting)
        
        by NotQuiteReal ( 608241 ) writes: on Tuesday November 01, 2011 @12:34AM (#37903920) Journal
        
        I have to ask...
        
        I'll go out on a limb and say it is just hoarding behavior. I wouldn't be surprised if slaker (53818) has a whole bunch of other stuff, besides data, but at least the data hoarding takes up less room than books, and isn't as sick as animal hoarding...
        
        Having observed some hoarders, first hand, I think something goes off in their head that is like a "gotta collect them all" flag. It usually is concentrated on a favorite subject, but it could even be set off with garbage, like tearing open a package and setting down the wrapper... one is trash, but, if it is not discarded, the second one is the "start of a collection", and off they go.
        
        
        Re: (Score:2)
        
        by slaker ( 53818 ) writes:
        
        Digital hoarding. Yes. It's a terrible disease barely kept in check by the constant threat of all the newspapers and empty cereal boxes that you apparently think occupy the remainder of volume in my home.
        That was sarcasm.
        No, I really don't have abnormally large collections of anything else. I have a half-dozen long boxes of comic books and perhaps a dozen full bookshelves. My home is actually quite tidy. I just have an odd hobby, which is far off-topic anyway. I note that no one has commented on the techni
        
        Re: (Score:3)
        
        by Sparx139 ( 1460489 ) writes:
        
        You can't mention as a passing fact that you have 8TB worth of porn and not expect people to respond with "wait, what?"
    - Re: (Score:2)
      
      by Artifex ( 18308 ) writes:
      
      As someone with considerably more than 8TB of porn[..]
      Can't you just afford to lease a girlfriend, boyfriend, or goat at this point?
      The goat even has a pretty good shredder built in.
- - Re: (Score:3, Insightful)
    
    by KendyForTheState ( 686496 ) writes:
    
    Uh... he DID confess to the crime AND lead the cops to his wife's body. I know...sarcasm, right?
- Re: (Score:2)
  
  by imemyself ( 757318 ) writes:
  
  Yep...I do this with Unison so writes on both sides can be replicated. Granted - I'm not replicating significant amounts of data, I've heard Unison may have problems with large volumes of data. But I think the Internet connection would be more of an issue than that.
  - Re: (Score:2)
    
    by hawkinspeter ( 831501 ) writes:
    
    I used to use Unison for that, but it's a bit sensitive to having the versions the same. I switched back to rsync as that allows me to upgrade one side and still be able to replicate.
- Re: (Score:2)
  
  by Marillion ( 33728 ) writes:
  
  I'm working on a multi-institution team doing biomedical research and one of the team members is using Gluster. It's 200TB of high resolution microscopy spread across six bricks (aka nodes). I don't know if the vendor misconfigured it, but it is a complete pig of a system. It's slow. Painfully slow. We ended up copying active data to a small 12TB consumer NAS for analysis and leaving the Gluster as the permanent archive.
- Re: (Score:2)
  
  by allenw ( 33234 ) writes:
  
  but I see no reason that it couldn't serve you well as a large personal file service.
  HDFS is not POSIX or mountable. So actually using the data from something that is expecting POSIX is going to be painful. "But there is a FUSE plug-in!" Yes, there is, but you'll take a 60% perf hit using it, assuming that it still works in newer versions of Hadoop. See, none of the hardcore devs actually use it, so there is a very good chance it is completely busted.
  In any case, there are still problems around losing the fsimage and having no real HA for the NN, needing quite a bit of RAM for any significa
- Re: (Score:2)
  
  by StarHeart ( 27290 ) * writes:
  
  I am doing very much the same thing. I have six 1TB hard drives in my main desktop, and five 1.5TB in an iSCSI server. I then combine them with mhddfs. It is slow, but I only use it for big files that I am not going to be rewriting. I use Linux software RAID5 for the big filesystems, and Linux software RAID10 for my /home.
  I am excited to see 4-5TB drives coming down the pipe. With just four 5TB drives I could replace all my hard drives, and remove the need for the iSCSI server.
  I have seen the same errors
- Re:The Cloud, obviously. (Score:5, Insightful)
  
  by jareth-0205 ( 525594 ) writes: on Tuesday November 01, 2011 @05:26AM (#37905148) Homepage
  
  I would be grateful if this bit of 'humour' could not be posted to *every single vaguely cloud-related post*.
  http://linux.slashdot.org/comments.pl?sid=2356014&cid=36928876 [slashdot.org]
  http://tech.slashdot.org/comments.pl?sid=1683582&cid=32542918 [slashdot.org]
  http://tech.slashdot.org/comments.pl?sid=2499970&cid=37882212 [slashdot.org]
  http://it.slashdot.org/comments.pl?sid=2489600&cid=37805882 [slashdot.org]
  Christ. It was only mildly amusing to begin with, let it go.
