New Linux Petabyte-Scale Distributed File System
An anonymous reader writes "A recent addition to Linux's impressive selection of file systems is Ceph, a distributed file system that incorporates replication and fault tolerance while maintaining POSIX compatibility. Explore the architecture of Ceph and learn how it provides fault tolerance and simplifies the management of massive amounts of data."
Is data integrity really necessary for large data? (Score:2, Interesting)
Look at Google and Facebook, arguably among the top users of massive databases. They have petabytes upon petabytes of data stored and are constantly growing. But what happens if they lose some data?
Nothing. They can always go back and regenerate that data. It's just a matter of time.
So at this large scale, it doesn't make any sense at all to focus on data integrity beyond making sure that fopen() and fread() don't return garbage. It's the smaller databases that contain critical information that need data integrity. These are typically sub-terabyte, though some may creep over that limit in a few uncommon instances.
And realistically, if you don't want your data to be hacked up, lost, then thrown out with a bad drive, ReiserFS or any other modern journaling filesystem is the right choice.
I wouldn't bet money on distributed filesystems just yet.
"Enterprisey" design? Yet no scrubbing? (Score:2, Interesting)
I see way too many layers upon layers there. Which, to me, always smells like the inner-platform anti-pattern [wikipedia.org] that an "enterprise consultant" would produce.
But maybe I'm just misunderstanding things and that many layers really are needed for large installations. Is there anyone here who actually administers such large storage systems and has read the article? It would be interesting to hear from someone with daily experience of this.
Also, I could not find any mention of ZFS-like scrubbing going on, which, in my experience, means zero reliability with today's unreliable drives. How would that system detect a controller silently corrupting data? Or bit rot? I've had both of those problems, and they killed half my data, despite having RAID, automatic backups with verification, and a git-like history of changes (to protect against accidental overwriting). None of that helped me at all.
Only constantly checking all the data, and repairing errors before they grow too large for ECC to correct, can prevent this.
Did I miss it, or did they really forget that crucial part?
Re:Totally not ripped from a webcomic... (Score:4, Interesting)
Pick one.
What you call a "rat's nest", we call "compatibility", and it works surprisingly well. Writing a game? Use OpenAL -- the distro will configure it to work. Need realtime audio for a DAW? Use JACK. Anything else? Use ALSA.
What if you picked the "wrong one"? Doesn't really matter. If you managed to build a decent DAW on top of ALSA, it'll continue to work on top of ALSA. If you used OSS, that still works today.
Video APIs? Flash has its own codecs, so all you need to know is xvideo.
Seriously, you have even less of an excuse than people who bitch about how Linux has both GNOME and KDE, and oh, the horrors of actually having a choice.
How does this differ from glusterfs? (Score:2, Interesting)
Re:Is data integrity really necessary for large da (Score:2, Interesting)
Yes, but Google's file system makes no attempt to implement either the POSIX standard or the Linux VFS. It's highly specialized to deal only with the kinds of loads that Google sees. As a general solution, its worth is debatable.