Linux Software

Maintaining Large Linux Clusters 134

pompousjerk writes "A paper landed on arXiv.org on Friday titled Installing, Running and Maintaining Large Linux Clusters at CERN [PDF]. The paper discusses the management of the 1000+ Linux nodes, upgrading from Red Hat 6.1 to 7.3, securely installing over the network, and more. They're doing this in preparation for Large Hadron Collider-class computation."
This discussion has been archived. No new comments can be posted.

  • by Java Geeeek ( 624925 ) on Saturday June 14, 2003 @04:06PM (#6200974)
    My book on maintaining a cluster of 0-1 nodes will be out next month.
  • Lucky bastards (Score:5, Interesting)

    by Professor D ( 680160 ) on Saturday June 14, 2003 @04:25PM (#6201063)
    #include "back-in-my-day-rant"

    Damn. Back when I was on a high-energy experiment located in the middle of nowhere in Japan (the subject of at least two Slashdot articles), our Japanese colleagues used to lease gaggles of Sun workstations at a yearly maintenance cost that exceeded the retail value of the machines themselves!!

    A few of us Linux fans used to grumble that we'd be better off buying dozens of cheap Linux boxes, but we weren't making the buying decisions. It seemed to us that the higher-ups didn't think cheap boxes with a free OS could compete on a performance basis with the Suns.

    As for me? I just installed CERNlib on my laptop and laughed as it blew the Suns away on a price/performance (+portability) basis.

    • Installing, Running and Maintaining Large Linux Clusters at CERN

      Vladimir Bahyl, Benjamin Chardi, Jan van Eldik, Ulrich Fuchs, Thorsten Kleinwort, Martin Murth, Tim Smith
      CERN, European Laboratory for Particle Physics, Geneva, Switzerland

      Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes
  • by Anonymous Coward on Saturday June 14, 2003 @04:29PM (#6201082)
    So yeah, I basically designed my own system for a professor in the Political Science Dept at my university, Washington University in St. Louis, that boots completely over the network and is completely diskless on every node. About a year before Knoppix ever started doing that. Did it with openMosix and it's fully LAM/MPI functional. Bruce of the openMosix list was on me for quite a while to get the docs done, but some really not-cool domestic issues came up and I never got them done. If anyone is really interested, send an email to drtdiggers_DONT_SPAM_ME_BASTARDS_@_SUCKYMICROSOFT_ hotmail.com and let me know, I'll finish them up.
    • I built a 60-node cluster about two years ago that originally had boot drives in each node. Since the gubment bought the boxes from the lowest bidder, we got very poor quality hardware to work with and about 75% of the hard drives died within a few months. To get things working again, I had to do the diskless boot thing. It was not at all fun. I can't see any advantage to that approach at all and I certainly wouldn't do it again by choice.
      • I don't claim to know more about your situation than you do, but several distros, including k12ltsp.org, support openMosix straight from the install and work with either PXE (which you couldn't have used) or Etherboot. I'm not trying to change your mind. I'm just pointing out that there are a lot of folks who prefer and even swear by diskless clusters.
  • Just a little too late for the SETI@home project. Kind of a shame, really. If only we had those computers sooner...
    • There are other projects that could use a lot of spare CPU time. If the humanitarian ones don't excite, how about pissing off Bill Gates? (Click on my sig.)
    • I assume by "those computers" you mean the 1000 at CERN (not that they'd actually waste their time processing SETI data with the shitload of AliRoot stuff going on these days...). Well, CERN has been running GNU/Linux clusters for a looong time now, so this is no new thing. In fact, my friend actually had one of the older dual Intel 500MHz machines as his desktop machine, ripped out of the last-generation cluster. They basically led him into the buzzing cluster room and said "grab one and follow me"...
    • If only we'd had lots of these cheap processors before... back when they were expensive...
  • by Anonymous Coward on Saturday June 14, 2003 @04:49PM (#6201172)

    I've been looking at ClusterKnoppix, mentioned recently on Slashdot. It has built-in openMosix and also supports thin clients via a terminal service. Just pop it in, and instant cluster. In case you missed the article:

    ClusterKnoppix [slashdot.org]

    • I've been working on a redistro of ClusterKnoppix designed for video encoding... and it's coming along... Just a few more deps to rebuild. It makes DivX encoding bearable.

      I was building my own setup similar to Knoppix until I discovered ClusterKnoppix... I love it when someone else does my work for me :)
  • Single system image (Score:5, Informative)

    by Tester ( 591 ) <olivier.crete@ocre[ ]ca ['te.' in gap]> on Saturday June 14, 2003 @04:56PM (#6201198) Homepage

    Where I work, we are developing a clustering system using single system images, where the whole OS is stored on a server and NFS-mounted by each node (rough sketch of the boot setup below). Our current tests show that we can easily run 100 nodes on 100 Mbit Ethernet from a single server... And the coolest thing is that the nodes mount the / of the server, so for "small clusters" (under 100 nodes), we have to do a software upgrade only once and both the nodes and the server are upgraded... BTW, this whole thing can be done with an almost unmodified Gentoo Linux distribution.

    I'm hoping to convince my boss to let us publish detailed docs... he thinks that if we do, everyone will be able to use it and he will lose sales (we are in the hardware business...). Details at our homepage [adelielinux.com] and about an older version (but with more details) at the place where we used to work [umontreal.ca].
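
    The sketch, for the curious (the server address and export options here are made up for illustration, not our actual config): the server exports its root read-only, and each node's kernel is pointed at it over NFS.

      # /etc/exports on the image server (illustrative subnet)
      /    192.168.0.0/255.255.255.0(ro,no_root_squash,sync)

      # kernel arguments handed to each diskless node by its boot loader
      root=/dev/nfs nfsroot=192.168.0.1:/ ip=dhcp ro

    The nodes still need a tmpfs or small local partition for the writable bits (/tmp, /var, ...) on top of that, of course.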

    • Another way (which happens to be the way we do it where I work) is to make a master OS image, store it on a central server, and rsync it down to / on every node. Updates are made to the master OS image and then get automatically propagated down to every node. When new or replacement nodes are deployed, we use RedHat's KickStart system [redhat.com] to install a base OS on them, then rsync down the master image. We maintain over 700 nodes this way.
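
      Roughly, the per-node sync boils down to a single rsync invocation along these lines (the host name, rsync module, and exclude list are illustrative, not our exact setup):

        # pull the master image from the central server onto this node's root
        rsync -aH --delete \
              --exclude=/proc --exclude=/sys \
              --exclude=/etc/fstab --exclude=/etc/sysconfig/network \
              imageserver::master-image/ /

      The excludes protect pseudo-filesystems and per-node files; everything else gets forced back to whatever the golden image says.
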
    • Another approach... (Score:2, Informative)

      by Junta ( 36770 )
      If you want to scale more, and your nodes have tons of RAM, you could likely stuff the whole OS into a ramdisk and then use the local disk for scratch space. Once booted, the network impact of NFS goes away.

      Of course, you could use System Installer Suite (http://www.sisuite.org/), which is *similar* to the rsync method mentioned by the other poster, but you get to skip the Red Hat install step in favor of SIS's tools.
  • by Anonymous Coward on Saturday June 14, 2003 @04:57PM (#6201203)
    Well, I recently interviewed at NVIDIA, and they have a 3,000+ node cluster just for emulating the new graphics/IO chips they're working on... They don't manufacture anything themselves; the turnaround time to manufacture a prototype for testing would be too long... so all they do is simulate the actual chips and then send the data off for fabrication once they're done. On a cluster of 3,000 machines, some jobs take all weekend, from what I understand.

    Imagine if they just used one machine.
    • Why imagine? I got a calculator...

      16 years, 156 days, 3 hours

      Athlons would be putting out better graphics on their own that far into the future. :-)
    • Oooh.. all weekend. I recently attended a talk in applied math where a researcher presented results of a simulation that ran on 47 CPUs for two years to reveal many hitherto unknown facts about hydrogen bonding...

      --
      http://oss.netmojo.ca
  • by angio ( 33504 ) on Saturday June 14, 2003 @04:57PM (#6201206) Homepage
    This reminds me of a paper that was just presented at USENIX:
    Fast, Scalable Disk Imaging with Frisbee [utah.edu]. Fun talk.

    Pretty cool tricks - they use multicast and filesystem-specific compression techniques to load disk images in parallel onto a subset of the machines in the cluster. Very, very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed.)

    Anyway, just a bit of related cool stuff.
  • Red Hat 7.3 (Score:2, Informative)

    by Spoticus ( 610022 )
    RH 7.3 reaches its end of life in December of this year. One can only assume (and hope) that they have the in-house people to support it, or it's going to cost them beaucoup $$ for continued RHN support.
    • Re:Red Hat 7.3 (Score:3, Informative)

      by vondo ( 303621 ) *
      I'm sure they are firewalled/NATed off, so why would they need (or even want) to upgrade that often?
    • You've gotta be kidding me... most of the GNU/Linux operating system is written in-house at CERN. The only reason they use Red Hat is so they can tell other institutions which distro to install in order to be binary compatible and sure of sources compiling successfully. I'm actually surprised they haven't made their own distro... I remember hearing the arguments against it once, but the memory has faded.
      • Actually, they do have their own distro... CERN Linux [web.cern.ch]. It's essentially Red Hat with a few modifications.
        • Interesting, I know a few people at CERN and none of them use this.

          Maybe it's so close to Red Hat that they burn the CDs with "Red Hat" written on them.

          Anyway, thanks for the link; if I had mod points I'd give you +1 Informative.


  • So, to all those who are in the know out there... when they have what they want, how many nodes and individual machines could they maintain? What are the constraints? What about data back-ups? Is ephemeral data recorded on a few machines in separate nodes to make sure that one getting knocked out doesn't zap something for good?
    • Well, in particle physics, the typical use is that data isn't stored on these systems longer than it takes to analyse it (and since data is constantly being accumulated, you don't worry about small losses).

      But there are people looking into parallel, redundant filesystems and the like so that you can keep more on disk. For instance, 1000 x 60 GB = 60 TB is a sizable amount of free space on these clusters, but the output data rate from these experiments is a petabyte/year or so.
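
      (Back-of-the-envelope, with my own rounding: a petabyte/year is about 1,000,000 GB / 365 days, roughly 2.7 TB/day, so those 60 TB of pooled node disks would fill up in about three weeks if you tried to keep everything.)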

  • by ameoba ( 173803 ) on Saturday June 14, 2003 @05:25PM (#6201292)
    Who in their right mind would have a cluster this size, for this sort of work, on any network where "securely installing over the network" is an issue? I mean, I'd want this as far off of a public network as possible, unless I really want to explain to whoever authorized my grant why my experimental data indicates that:

    e = mc^31337
    • by samhalliday ( 653858 ) on Saturday June 14, 2003 @05:36PM (#6201360) Homepage Journal
      If you read the paper (which, OK, is not as bad as not reading the article), you would realise that this is not a project which is being performed only at CERN; when the LHC (and others, e.g. ALICE) become active in a few years, the data is going to be piped to literally hundreds of participating institutions (this is the current list for one of the smaller experiments) [web.cern.ch] for data analysis. So, no, this is not enough processing power, and yes, they need it to be publicly available. I also know people who are (or were?) working on the security implementations. Believe me, at CERN they think it through; it's run by lots of really smart people who know what they are at, not politicians. The distributed processing that comes out of these projects will hopefully pave the way for the next generation of the internet (the grid).
    • Perhaps the nodes are not all physically located in the same building, or are otherwise vulnerable to physical man-in-the-middle intrusions. If one adopts secure practices as a matter of principle, it saves having to go back and implement security as an afterthought someday when the situation changes in an unanticipated way.

      --
      http://oss.netmojo.ca
  • by arose ( 644256 ) on Saturday June 14, 2003 @06:05PM (#6201489)
    ...run Windows?
  • Running rpm --rebuilddb must be a real drag.
  • by pschmied ( 5648 ) on Saturday June 14, 2003 @07:28PM (#6201809) Homepage
    I'm surprised that nobody has mentioned SystemImager [systemimager.org]. If you haven't looked at it for maintaining large numbers of Linux boxes, scamper off and take a look now. It is worth your time.

    Now, that being said, I recently had the opportunity to evaluate using a number of OpenBSD boxes, but I couldn't find a utility for maintaining a bunch of the boxes in the same manner as SystemImager (i.e., incrementally updating servers from a golden master via rsync).

    So, has anyone found anything that does what SystemImager does, but is cross-platform? Do any SystemImager developers out there want to comment on the potential difficulty of supporting other-than-Linux operating systems in SystemImager?

    SystemImager is one of the most useful tools I've ever seen, however, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.

    -Peter
    • SystemImager is one of the most useful tools I've ever seen, however, I believe that it would be an enterprise "killer app" if it could do MacOS X, *BSD, Windows etc.
      You should check out radmind [radmind.org]. It does in fact "do" Mac OS X, *BSD, and Linux.

      :w
      • Hmm... Not quite there yet. The collection of command line tools could probably be rolled into something that automates system management the way SystemImager does. But even then, radmind rather unintelligently seems to recopy entire files.

        Also, how is partitioning taken care of?

        No, I'm still looking for something like SystemImager that handles multiple Operating Systems. Perhaps extending SystemImager to support others will be the easiest way.

        As a side note, Frisbee, which was mentioned in a previous
        • Sorry, not a big SystemImager expert. I see that it just uses rsync, hence your comment about recopying entire files. I'd point out that for binary files, rsync tends to copy the entire file anyway, on a version change. radmind's nice in this case because it can tell that a file needs to be updated with no network traffic.

          how is partitioning taken care of

          Depends on the system. For Mac OS X, we pretty much need to use Apple's tools. For Solaris, we use Jumpstart. Kickstart on Linux. Partitioning i
  • LinuxBIOS, anyone? (Score:2, Informative)

    by nafrikhi ( 681595 )
    Has anyone tried LinuxBIOS (http://www.linuxbios.org/) to replace the standard BIOS? It results in a diskless, faster boot. It's used in this cluster architecture: http://www.clustermatic.org/
    • In a network, this seems to be largely redundant.
      Use PXE when you want a diskless boot. May take more than 3 seconds, but is supported on many, many more systems!
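
      For reference, a bare-bones PXE setup is just a DHCP entry plus a pxelinux config, something like this (the addresses, paths, and NFS root shown here are placeholders):

        # /etc/dhcpd.conf on the boot server
        subnet 192.168.0.0 netmask 255.255.255.0 {
            range 192.168.0.100 192.168.0.200;
            next-server 192.168.0.1;       # TFTP server that hands out the boot loader
            filename "pxelinux.0";
        }

        # /tftpboot/pxelinux.cfg/default
        DEFAULT linux
        LABEL linux
            KERNEL vmlinuz
            APPEND initrd=initrd.img root=/dev/nfs nfsroot=192.168.0.1:/export/node-root ip=dhcp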