Maintaining Large Linux Clusters
pompousjerk writes "A paper landed on arXiv.org on Friday titled Installing, Running and Maintaining Large Linux Clusters at CERN [PDF]. The paper discusses the management of the 1000+ Linux nodes, upgrading from Red Hat 6.1 to 7.3, securely installing over the network, and more. They're doing this in preparation for Large Hadron Collider-class computation."
I am a cluster of one (Score:3, Funny)
Re:I am a cluster of one (Score:2)
Huh? (Score:2)
Re:Huh? (Score:1)
Holy crap, that would have been embarrassing...
Lucky bastards (Score:5, Interesting)
Damn. Back when I was on a high-energy experiment located in the middle of nowhere in Japan (subject of at least two Slashdot articles), our Japanese colleagues used to lease gaggles of Sun workstations at a yearly maintenance cost that exceeded the retail value of the machines themselves!!
A few of us Linux fans used to grumble that we'd be better off buying dozens of cheap Linux boxes, but we weren't making the buying decisions. It seemed to us that the higher-ups didn't think cheap boxes with a free OS could compete on a performance basis with the Suns.
As for me? I just installed CERNlib on my laptop and laughed as it blew the Suns away on a price/performance (+portability) basis.
pdf -- text (Score:1)
Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim Smith
CERN, European Laboratory for Particle Physics, Geneva, Switzerland
Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC-scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes
Autoassimilating Diskless Linux Clusters (Score:3, Interesting)
Re:Autoassimilating Diskless Linux Clusters (Score:1)
Re:Autoassimilating Diskless Linux Clusters (Score:2)
The tftpboot thing was a bit of a mess to set up. Then Red Hat simply did not want to let the normal boot process work, so I had to rewrite rc.sysinit basically from scratch.
Plus, I had to hack in
Re:Autoassimilating Diskless Linux Clusters (Score:2)
Boot hard drives are cheap (~$50) and use the boot mechanism that is most stable and well validated. People building ~1000-CPU clusters may well find that the cost savings justify the additional setup work of getting all the kinks out. People building "normal" clusters
Re:Autoassimilating Diskless Linux Clusters (Score:2)
Re:Autoassimilating Diskless Linux Clusters (Score:2, Interesting)
Too little, too late. (Score:2, Funny)
Re:Too little, too late. (Score:1)
Re:Too little, too late. (Score:2)
Re:Too little, too late. (Score:1)
ClusterKnoppix - OpenMosix (Score:5, Interesting)
I've been looking at ClusterKnoppix, mentioned recently on Slashdot. It has built-in openMosix and also supports thin clients via a terminal service. Just pop it in, and instant cluster. In case you missed the article:
ClusterKnoppix [slashdot.org]
Re:ClusterKnoppix - OpenMosix (Score:1)
I was building my own setup similar to Knoppix until I discovered ClusterKnoppix.... I love it when someone else does my work for me.
Single system image (Score:5, Informative)
Where I work, we are developing a clustering system using single system images, where the entire OS is stored on a server and NFS-mounted by each node. Our current tests show that we can easily run 100 nodes on 100 Mbit Ethernet from a single server... And the coolest thing is that the nodes mount the / of the server, so for "small clusters" (under 100 nodes), we have to do a software upgrade only once and all the nodes and the server are upgraded... Btw, this whole thing can be done using an almost unmodified Gentoo Linux distribution.
I'm hoping to convince my boss to let us publish detailed docs... he thinks that if we do, everyone will be able to use it and he will lose sales (we are in the hardware business). Details at our homepage [adelielinux.com] and about an older version (but with more details) at the place where we used to work [umontreal.ca].
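The NFS-root arrangement described above can be sketched in two small config fragments. Everything here is illustrative: the 10.0.0.x addresses, the paths, and the mount options are my own assumptions, not the poster's actual setup.

```shell
# On the server, a hypothetical /etc/exports line granting the
# cluster subnet read-only access to /:
#
#   /    10.0.0.0/24(ro,no_root_squash,async)
#
# followed by `exportfs -ra` to reload the export table.

# On each node (typically from an initramfs), mount the server's
# root read-only and overlay small writable scratch areas, since
# many nodes share one / and must not scribble on it:
mount -t nfs -o ro,nolock 10.0.0.1:/ /mnt/root
mount -t tmpfs tmpfs /mnt/root/var/run
mount -t tmpfs tmpfs /mnt/root/tmp
```

With read-only root plus tmpfs overlays, upgrading the server's / really does upgrade every node at once, which matches the "upgrade once" claim above.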
Re:Single system image (Score:1)
Another approach... (Score:2, Informative)
Of course, you could use System Installer Suite (http://www.sisuite.org/), which is *similar* to the rsync method mentioned by the other poster, but you get to skip the Red Hat install step in favor of SIS's tools.
why such a huge cluster? (Score:5, Interesting)
imagine if they just used one machine.
Re:why such a huge cluster? (Score:2, Funny)
16 years, 156 days, 3 hours
Athlons would be putting out better graphics on their own that far into the future.
Re:why such a huge cluster? (Score:1)
--
http://oss.netmojo.ca
Related project: Loading disk images for clusters (Score:5, Informative)
Fast, Scalable Disk Imaging with Frisbee [utah.edu]. Fun talk.
Pretty cool tricks: they use multicast and filesystem-specific compression techniques to load disk images in parallel onto a subset of the disks in the cluster. Very, very, very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed.)
Anyway, just a bit of related cool stuff.
Red Hat 7.3 (Score:2, Informative)
Re:Red Hat 7.3 (Score:3, Informative)
Re:Red Hat 7.3 (Score:2)
Re:Red Hat 7.3 (Score:1)
Re:Red Hat 7.3 (Score:2)
Maybe it's so close to Red Hat that they burn the CDs with "Red Hat" written on them.
Anyway, thanks for the link. If I had mod points I'd give you +1 Informative.
Re:Large _Hardon_ Collider? (Score:1)
OK I just laughed so hard at that the people around me gave me weird looks. Rare that you see something funny on slashdot these days as opposed to "rofl tacos spelling sux"
question from a pseudo-geek... (Score:2)
So, to all those who are in the know out there... when they have what they want, how many nodes and individual machines could they maintain? What are the constraints? What about data backups? Is ephemeral data recorded on a few machines in separate nodes to make sure that one getting knocked out doesn't zap something for good?
Re:question from a pseudo-geek... (Score:2)
But there are people looking into parallel, redundant filesystems and the like so that you can keep more on disk. For instance, 1000 x 60 GB = 60 TB is a sizable amount of free space on these clusters, but the output data rate from these experiments is a petabyte/year or so.
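The parent's numbers can be sanity-checked with a little shell arithmetic (the figures are the nominal ones above; integer division, so treat the results as rough):

```shell
# Aggregate node-local disk across the cluster:
nodes=1000
disk_gb=60
total_tb=$(( nodes * disk_gb / 1000 ))        # 1000 x 60 GB = 60 TB

# Experiment output of roughly a petabyte/year = 1000 TB/year:
yearly_tb=1000
fill_days=$(( total_tb * 365 / yearly_tb ))   # days until the disks fill

echo "aggregate disk: ${total_tb} TB"
echo "holds roughly ${fill_days} days of output"
```

So even a 1000-node cluster's local disks buffer only about three weeks of output, which is why the poster points at parallel, redundant filesystems (and tape) for the rest.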
"securely installing over the network" (Score:5, Interesting)
e = mc^31337
Re:"securely installing over the network" (Score:4, Interesting)
Re:"securely installing over the network" (Score:2, Insightful)
--
http://oss.netmojo.ca
But does it... (Score:4, Funny)
Re:But does it... (Score:1)
I can just see the purchase-request now... 1000 copies of Windows at $250 each.
Re:But does it... (Score:2, Funny)
1000+ cluster? (Score:2)
SystemImager-like update mechanism for non-Linux? (Score:5, Interesting)
Now, that being said, I recently had the opportunity to evaluate using a number of OpenBSD boxes, but I couldn't find a utility for maintaining a bunch of them in the same manner as SystemImager (i.e., incrementally updating servers from a golden master via rsync).
So, has anyone found anything that does what SystemImager does, but that is cross-platform? Do any SystemImager developers out there want to comment on the potential difficulty of supporting other-than-Linux operating systems in SystemImager?
SystemImager is one of the most useful tools I've ever seen; however, I believe it would be an enterprise "killer app" if it could do Mac OS X, *BSD, Windows, etc.
-Peter
Re:SystemImager-like update mechanism for non-Linu (Score:2)
Re:SystemImager-like update mechanism for non-Linu (Score:2)
Also, how is partitioning taken care of?
No, I'm still looking for something like SystemImager that handles multiple operating systems. Perhaps extending SystemImager to support others will be the easiest way.
As a side note, Frisbee, which was mentioned in a previous
Re:SystemImager-like update mechanism for non-Linu (Score:2)
how is partitioning taken care of
Depends on the system. For Mac OS X, we pretty much need to use Apple's tools. For Solaris, we use Jumpstart. Kickstart on Linux. Partitioning i
linuxbios, anyone? (Score:2, Informative)
Re:linuxbios, anyone? (Score:1)
Use PXE when you want a diskless boot. May take more than 3 seconds, but is supported on many, many more systems!
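For anyone who hasn't set this up, a PXE diskless boot boils down to two config fragments. The stack shown (ISC dhcpd + a TFTP server + pxelinux) is one common combination, not necessarily what the poster uses, and every path and address below is hypothetical:

```shell
# dhcpd.conf fragment pointing PXE clients at the boot loader:
#
#   next-server 10.0.0.1;        # the TFTP server
#   filename "pxelinux.0";       # the PXE boot loader
#
# /tftpboot/pxelinux.cfg/default, telling pxelinux which kernel
# to fetch and to use an NFS root (ties in with the NFS-root
# approach discussed elsewhere in this thread):
#
#   DEFAULT linux
#   LABEL linux
#     KERNEL vmlinuz
#     APPEND initrd=initrd.img root=/dev/nfs nfsroot=10.0.0.1:/export/root ip=dhcp
```

The win over custom tftpboot hacks is that PXE lives in the NIC's firmware, so any PXE-capable box boots this way with zero per-node configuration.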
ar98sarf s87aeh87aw4h (Score:1)
Re:Obligatory Posts... (Score:1, Offtopic)
Re:"But why?" asked Little Johnny. (Score:2, Insightful)
Look at google?
Re:"But why?" asked Little Johnny. (Score:5, Funny)
Maybe for a Large Hadron Collider-class computation.
Re:"But why?" asked Little Johnny. (Score:5, Interesting)
Just because you don't need it, or can't envision needing it, doesn't mean nobody else needs that kind of power.
Bob
Re:"But why?" asked Little Johnny. (Score:1)
Re:"But why?" asked Little Johnny. (Score:4, Informative)
(Disclaimer: IANAPP (Particle Physicist))
Only on Slashdot... (Score:2)
Gotta love Slashdot... the only place where such a disclaimer isn't taken for granted.
Re:"But why?" asked Little Johnny. (Score:1)
Re:"But why?" asked Little Johnny. (Score:3, Funny)
Either that, or all the pr0n encoding.
Best...Tivo...*ever*!
Host this thing at an Internap location, and you're the Ultimate LPB.
Searching for "First Posters" for the Homeland Security people to "visit."
SETI client!
"Every room has every movie ever made in any language." Who do you think hosts *that*?
ILM, seeing the second LOTR movie, decides an 'upgrade' is in order for the SW:EP3 render farm.
It takes this much computing power to fi
Re:"But why?" asked Little Johnny. (Score:1, Funny)
Re:"But why?" asked Little Johnny. (Score:2, Funny)
Re:"But why?" asked Little Johnny. (Score:1)
Re:"But why?" asked Little Johnny. (Score:1)
The ATLAS project [web.cern.ch] at CERN, when it comes online, is supposed to produce a petabyte of data every year. I doubt one 1000-node cluster would be enough to process that data quickly.
Re:"But why?" asked Little Johnny. (Score:5, Interesting)
First, as another poster pointed out, these detectors produce a LOT of data. I'm on an experiment [fnal.gov] slated to take data at about the same time as the LHC experiments, with similar rate requirements.
We plan to use a 2500 node cluster (of year 2007 CPUs) to filter our data in real time. The input rate into this cluster will be about 10 GB/s, output rate about 200 MB/s.
But, each interaction is analyzed (usually) by just one computer. There are so many interactions, though, that you need massive clusters, but not much communication between nodes of the cluster.
That's just for the data filter. You need even larger amounts of computing to analyze what comes out in that 200 MB/s and to simulate what happens in the experiment. Much larger amounts.
Our experiment will ultimately require clusters this size at the laboratory and at something like a dozen other institutions.
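Spreading those aggregate figures over the cluster gives a feel for the per-node load. The even split across nodes is my own simplifying assumption, not something from the experiment's design:

```shell
# Nominal numbers from the post above: 2500 nodes, 10 GB/s in,
# 200 MB/s out of the real-time filter.
nodes=2500
in_mb_s=10000      # 10 GB/s aggregate input, in MB/s
out_mb_s=200       # 200 MB/s aggregate output

per_node_in_kb=$(( in_mb_s * 1024 / nodes ))   # kB/s into each node
reduction=$(( in_mb_s / out_mb_s ))            # overall rejection factor

echo "each node sees ~${per_node_in_kb} kB/s"
echo "filter keeps about 1 event's worth of data in ${reduction}"
```

A few MB/s per node is modest, which fits the post's point: the hard part is the sheer number of independent interactions, not inter-node communication.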
Re:"But why?" asked Little Johnny. (Score:1)
This is stored to tape, though (~50 StorageTek 9940B drives at 30 MB/s each, in parallel), not in real time.
Re:"But why?" asked Little Johnny. (Score:1)
Next Q: Why would anyone want to make 3D animations in Blender? A: Because I want to!
Re:"But why?" asked Little Johnny. (Score:1, Interesting)
You will need this much computing power if you are trying to filter and analyze on the order of a petabyte of data yearly. Some collisions at the LHC will produce thousands of particles, a large fraction of which will be detected in multiple detectors as they fly away from the collision point (nucleus-on-nucleus collisions). Thousands of these collisions will happen every second. The information in the various detectors then must be collected back so that all the signals a partic
Re:"But why?" asked Little Johnny. (Score:3, Insightful)
Re:Episode 11.1: Blindsided (Score:1)
Re:Episode 11.1: Blindsided (Score:1)
Re:Episode 11.1: Blindsided (Score:1)
Hmmmmm
Re:Why? (Score:1, Flamebait)
Re:Why? (Score:1)
Well, Frenchie La Frencherson, last time I checked (right now, as a matter of fact), Switzerland was located smack in the damn middle of Europe and the EU. How dumb do you think us Americans are?
Re:Why? (Score:2)
you actually checked? hehe...
Re:Why? (Score:1)