Distributed Storage Systems for Linux? (52 comments)

elambrecht asks: "We've got a _lot_ of data we'd like to archive and make sure it is accessible via the web 24/7. We've been using a NetApp for this, but that solution is just waaaay to expensive to scale. We want to move to using a cluster of Linux boxes that redundantly store and serve up the data. What are the best packages out there for this? GFS? MogileFS?"
This discussion has been archived. No new comments can be posted.

  • by afabbro ( 33948 ) on Thursday May 05, 2005 @12:38PM (#12442500) Homepage
    What kind of idiotic Ask Slashdot is this? All of the important data is missing:
    • What's "a lot"? 1MB is a lot of data if you think about it. When people start talking about "a lot" of data these days, I assume they're meaning hundreds of terabytes. Is that what you mean?
    • What's the budget? What performance do you need? Do you need to back it up? Do you need to replicate it? Your post is sort of like "hi, I have a problem. What is the answer? Thanks!"
    Also, it's "too expensive to scale," my friend. You'd think an "Editor" like Cliffy would fix posts, but he's too lazy.

    If you can afford NetApp, why not keep with NetApp? A bunch of Linux boxes is not a storage solution. Indeed, what does Linux have to do with anything? We're talking storage here. What are you planning to do - put in 200 of them with internal SATA drives? Yeah, that'll be a lot cheaper to maintain...

    I'm not shilling for NetApp, but if you really have "a lot" of data to put "on the web" "24/7" then you need some kind of real storage solution like a NetApp or one of their competitors.

    Now go away and please take Cliff with you.

  • We use OpenAFS (Score:4, Insightful)

    by Bamfarooni ( 147312 ) on Thursday May 05, 2005 @01:49PM (#12443446)
    We have about 27TB of data from Mars (and adding another TB per month) that we need to keep online. We have been using NetApps, but at ~$25K/TB, plus maintenance (3 years of maintenance costs about as much as a whole new system), they're just WAY too expensive for data warehousing.

    We've moved to using Linux-based OpenAFS servers. A high quality 3U box (qsol.com [qsol.com]) loaded with 16x 300GB ATA drives costs about $8.5K and provides us about 3.5TB (2 drives for parity, 2 drives for hot-swap). That works out to $2.5K/TB. (If your risk tolerance is higher than mine, you can bring that up to $8K/5.5TB, for about $1.5K/TB.) We really want 99.999% availability, so just to be safe, we keep a 100% redundant read-only copy on a second machine (AFS supports this beautifully, including automatic fail-over).
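The cost figures in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, 1 TB is treated as 1000 GB, and the doubling for the read-only mirror assumes the second machine costs the same as the first, which the post does not state.

```python
def usable_tb(total_drives, parity_drives, spare_drives, drive_gb):
    """Usable capacity in TB once parity and hot-spare drives are set aside.

    Assumes 1 TB = 1000 GB and ignores filesystem overhead.
    """
    data_drives = total_drives - parity_drives - spare_drives
    return data_drives * drive_gb / 1000


def cost_per_tb(box_cost_usd, capacity_tb):
    """Hardware cost in USD per usable TB."""
    return box_cost_usd / capacity_tb


if __name__ == "__main__":
    # 16 drives, 2 for parity, 2 hot spares, 300 GB each (figures from the post).
    print(f"usable capacity: {usable_tb(16, 2, 2, 300):.1f} TB")

    # The post rounds usable capacity to 3.5 TB for an $8.5K box.
    single = cost_per_tb(8500, 3.5)
    print(f"single copy:   ${single:,.0f}/TB")

    # Assumption: the read-only replica lives on an identical second box.
    print(f"mirrored copy: ${2 * single:,.0f}/TB")

    # The poster's higher-risk configuration, as quoted: $8K for 5.5 TB.
    print(f"higher risk:   ${cost_per_tb(8000, 5.5):,.0f}/TB")
```

Even with the full second copy counted, the per-TB hardware cost stays well under the ~$25K/TB quoted for the NetApp.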

    OpenAFS has a couple of features that make it better than NFS (client-side cache, for instance), but it also has a few drawbacks, like no files >2GB.
  • by Punboy ( 737239 ) on Thursday May 05, 2005 @06:33PM (#12446462) Homepage
    A bunch of Linux boxes is not a storage solution.

    Hey man, don't tell that to Google.
  • Re:Centera (Score:3, Insightful)

    by egarland ( 120202 ) on Thursday May 05, 2005 @11:59PM (#12448513)
    But we call the Centera a "data jail". It's like the roach motel..

    Ug. It's just not true. Most applications that are built to work with Centera include functionality to migrate in/out of the system just like most applications that are built to work with tape can both put data on and get it back. The difference is tape sucks, Centera doesn't.

    It can't scale beyond a 42U rack enclosure.

    Also not true. I have worked extensively with a 3-rack install with about 50 TB of data on it. I believe all versions of Centera since the very first are capable of scaling to 4 racks, and some are capable of going to 8 racks. Lots of customers have 2-rack installs. Raw storage on the currently shipping nodes is over 1 TB per node and you can put 32 nodes in a rack. Do the math: a 4-rack Centera is quite big even after taking mirroring or CPP into account.
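"Do the math" can be spelled out roughly as follows. This is a sketch under stated assumptions: it uses exactly 1 TB per node as a lower bound (the post says "over 1 tb"), it models mirroring as two full copies of the data, and it leaves CPP out because its overhead is not given in the post. The function names are hypothetical.

```python
def raw_capacity_tb(racks, nodes_per_rack=32, tb_per_node=1.0):
    """Raw capacity in TB; 32 nodes per rack and ~1 TB per node come from the post."""
    return racks * nodes_per_rack * tb_per_node


def usable_mirrored_tb(raw_tb, copies=2):
    """Usable capacity when every object is stored `copies` times (assumed: 2)."""
    return raw_tb / copies


if __name__ == "__main__":
    for racks in (3, 4, 8):
        raw = raw_capacity_tb(racks)
        usable = usable_mirrored_tb(raw)
        print(f"{racks} racks: {raw:.0f} TB raw, {usable:.0f} TB mirrored")
```

By this estimate a 3-rack system holds a little under 50 TB mirrored at the lower-bound node size, which is in line with the roughly 50 TB install described above.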

    It's a bunch of little servers striped together to form a big NAS with a metadata controller in the middle.

    No. No No.

    It IS a bunch of little servers but no they are not "striped together", and no they don't form a NAS. There is no "metadata controller" and there certainly isn't one in the middle. It is a storage cluster that has features specifically designed to store fixed content. Centera is not a simple Linux hack to make a bunch of boxes look like a storage cluster. It's a robust, flexible, well thought out piece of clustering software that is built on top of a Linux base.

    Centera hardware is good stuff too. It has redundant externally facing servers (access nodes) so that if one fails, applications can keep working. Both back end switches are linked to every node so everything has redundant data paths. Data is stored in such a way that no data is unavailable if any single node fails or goes offline for any reason.

    It's easy to dismiss Centera because it's so different from the standard storage systems whose basic interfaces really haven't changed in 3+ decades. It's not a block device. It's not a filesystem. It's not a mountable share. It's a storage cluster with functionality specifically designed to manage fixed content. It is accessed only through a client-side API that talks to the cluster over IP. It isn't easy to wrap your head around.
