How to get 1.5 TeraFlops from Linux

Oak Ridge National Lab has purchased an Altix 3000 from SGI (flash movie). The article claims: "SGI Altix 3000 is recognized as the first Linux cluster that scales up to 64 processors within each node and the first cluster ever to allow global shared memory access across nodes." There is more here, here, and here.
This discussion has been archived. No new comments can be posted.

  • Look Out! (Score:4, Funny)

    by TWX ( 665546 ) on Tuesday July 08, 2003 @12:47PM (#6392752)
    "SGI Altix 3000 is recognized as the first Linux cluster that scales up to 64 processors"

    SCO will be all over your ass now!
  • by Kiriwas ( 627289 ) on Tuesday July 08, 2003 @12:51PM (#6392800) Homepage
    After all the beowulf cluster jokes, I am still incredibly curious about them. My goal is to build a small 5-6 node cluster by the end of the summer. The thing is, I still know very little about them. Everyone jokes about them, but no one posts any useful information. Are there specific languages one must program in to take advantage of the multiple processors? Or does the OS take care of that? How much speed can you actually get out of them? Is it pure processing power? Or is there more? I'm very curious and want to know.
    • by TWX ( 665546 ) on Tuesday July 08, 2003 @12:53PM (#6392831)
      You're better off using mosix. It'll allow for more normal (ie, not beowulf specific) applications to thread across computers. I'd imagine that an open-mosix setup (like the ones using the knoppix boot CDs tailored to it) could probably make for a fairly powerful computing cluster very easily.
      • While you are probably right that in most cases mosix will do just fine (I used it on a ~50 PC cluster at night for DSP calcs), these machines are for supercomputer calculations that require a lot of memory. Even if you could run a 2GB process on mosix, it would be slowed down by the network, and these beasts can run 100GB processes over a 2GB/s interconnect!
      • by ERJ ( 600451 ) on Tuesday July 08, 2003 @01:20PM (#6393084)
        Mosix is nice because it treats the cluster like a single, large, multi-CPU box by simply allocating threads to different boxes. The nice thing about this is that any multi-threaded program can take advantage (as stated in the parent post).

        However, this can also cause problems. Most threaded programs are written assuming that all the threads have high speed (i.e. system bus / CPU cache) access to shared information. When we introduce the latency incurred by a network, programs can run a lot slower than they would if they simply had all the threads on a single box. Obviously, it all depends on how the program was written, and what it does.

        If you are writing a program specifically for a cluster, I would suggest instead looking at something like LAM-MPI [lam-mpi.org]. This allows a much more hands-on approach. It is more work (you have to decide how the work will be split), but it gives you much better control over where and what is being done and how to optimize it.
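
        For example, here is a minimal sketch in C of the kind of explicit work-splitting MPI requires. It should build against LAM-MPI or MPICH; the range size and the summing loop are made up purely for illustration.

        /* Minimal MPI work-splitting sketch. Each rank sums its own slice of
         * a range; rank 0 combines the partial results over the network. */
        #include <mpi.h>
        #include <stdio.h>

        #define N 1000000

        int main(int argc, char **argv)
        {
            int rank, size, i, chunk, start, end;
            double local = 0.0, total = 0.0;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* You decide how the work is split: here, a simple block decomposition. */
            chunk = N / size;
            start = rank * chunk;
            end = (rank == size - 1) ? N : start + chunk;

            for (i = start; i < end; i++)
                local += (double)i * 0.5;   /* stand-in for real work */

            /* Combine the partial sums on rank 0. */
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

            if (rank == 0)
                printf("total = %f\n", total);

            MPI_Finalize();
            return 0;
        }

        Build it with mpicc and launch it with something like "mpirun -np 4 ./sum". The point is that nothing is shared automatically: every byte that crosses node boundaries goes through an explicit MPI call, which is exactly the control (and the extra work) described above.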
        • Threads can't be migrated. Only processes can be migrated.

          http://howto.ipng.be/openMosixWiki/index.php/Applications%20using%20pthreads

          You have to write your application as a bunch of processes to take advantage of a mosix cluster.
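
          A rough sketch of that pattern in plain C (nothing openMosix-specific; the worker function and the number of children are arbitrary placeholders):

          /* Fork worker processes instead of spawning pthreads, so that
           * openMosix can migrate each worker to another node independently.
           * do_work() and NUM_WORKERS are placeholders for illustration. */
          #include <stdio.h>
          #include <unistd.h>
          #include <sys/types.h>
          #include <sys/wait.h>

          #define NUM_WORKERS 4

          static void do_work(int id)
          {
              /* CPU-bound placeholder; a migratable process should mostly
               * compute, since its I/O is forwarded back to the home node. */
              double x = 0.0;
              long i;
              for (i = 0; i < 100000000L; i++)
                  x += (double)(i % (id + 2));
              printf("worker %d done (%f)\n", id, x);
          }

          int main(void)
          {
              int i;
              for (i = 0; i < NUM_WORKERS; i++) {
                  pid_t pid = fork();
                  if (pid == 0) {           /* child: a separate, migratable process */
                      do_work(i);
                      _exit(0);
                  } else if (pid < 0) {
                      perror("fork");
                      return 1;
                  }
              }
              while (wait(NULL) > 0)        /* parent: reap all workers */
                  ;
              return 0;
          }

          Results would normally come back through pipes, files, or sockets, which is exactly the I/O that ends up being forwarded across the network once a worker has migrated.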

          Joe
      • Mosix... (Score:3, Informative)

        by wowbagger ( 69688 ) *
        The thing about Mosix is the cost of process migration.

        First, you have to understand process migration. In a mosix cluster, a running process can be moved, lock, stock, and barrel, from one CPU to another. All that is left behind is a "stub" process that forwards all file I/O across the network to the new location. So, if the program was a 3D raytracer that had the source description file and the output file open, after migration all file accesses to those files would be forwarded over the network to the st
    • here [google.com] and here [beowulf.org] are probably good places to look.
    • The thing is, I still know very little about them. Everyone jokes about them, but no one posts any useful information.

      Well maybe you're looking in the wrong place [google.com].

      Good luck.
    • by gladbach ( 527602 ) on Tuesday July 08, 2003 @12:55PM (#6392858)
      just download clusterknoppix and knock yourself out. ; )

      http://bofh.be/clusterknoppix/
    • by The_ForeignEye ( 681271 ) <`ten.eyengierof' `ta' `nailuj'> on Tuesday July 08, 2003 @12:56PM (#6392867) Homepage
      Back in my days of parallel programming (read: 1998) on Beowulf clusters, I used Fortran and C. The trick to making your program "parallel" is to use special programming libraries that spawn instances of your program across the cluster and let them communicate with each other. The libraries I used were PVM and MPI.

      At that time they were working on a Java implementation, but I don't know what happened with that.
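
      For a flavor of what that looks like, here is a bare-bones PVM 3 master in C that spawns copies of a worker binary across the virtual machine and collects one integer from each. The "worker" binary name and the message tag 42 are made up for illustration; each worker is assumed to pack one int and send it to pvm_parent() with that tag.

      /* Bare-bones PVM 3 master: spawn workers, then gather one int from each.
       * The "worker" binary and the message tag 42 are placeholders.
       * Each worker is assumed to pvm_initsend(), pvm_pkint() one result and
       * pvm_send() it to pvm_parent() with tag 42. */
      #include <stdio.h>
      #include <pvm3.h>

      #define NTASKS 4

      int main(void)
      {
          int tids[NTASKS];
          int i, result;
          int spawned = pvm_spawn("worker", NULL, PvmTaskDefault, "", NTASKS, tids);

          for (i = 0; i < spawned; i++) {
              pvm_recv(-1, 42);          /* wait for any message with tag 42 */
              pvm_upkint(&result, 1, 1); /* unpack one int from the buffer */
              printf("got %d\n", result);
          }

          pvm_exit();
          return 0;
      }

      MPI code is similar in spirit; either way the spawning and the message passing are explicit, which is the "trick" mentioned above.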
    • Another poster mentioned MOSIX, but openMosix [sourceforge.net] is probably a better bet. It's released under the GPL, and is a combination of kernel-patch and user-space tools. Once you get these installed on each node, and connected via ethernet (all with networking set up of course... IP addresses etc) you should have yourself a cluster.
    • You might be able to find this book [linuxjournal.com] in the remainder bins. Alas, the publisher has recalled it, for some reason.
    • Take a look at OSCAR [sourceforge.net]. We built a nine node cluster out of IBM e-servers using it. It was really quite straightforward.

      As far as languages go, you'll need an MPI library (like MPICH [anl.gov] or LAM/MPI [lam-mpi.org], which is also a runtime environment), but the actual code is usually written in C, C++, or Fortran. BTW, OSCAR comes with MPICH and LAM/MPI.
    • You don't seem to remember the joke well enough. You're supposed to IMAGINE the beowulf cluster, not actually build it.
    • This SGI isn't a beowulf cluster. Traditionally beowulf clusters refer to clusters that use COTS hardware, don't have global shared memory, etc. Lots of people in the cluster community won't even call clusters of workstations beowulf clusters if they have some high speed network like Myrinet. We just call ours a Linux cluster, a cluster, a distributed memory supercomputer... You can program your beowulf cluster in C or Fortran using a free MPI(message passing interface) implementation called MPICH. I h
    • you will need monkeys.

      LOTS of monkeys.
  • Apple (Score:5, Funny)

    by zzzmarcus ( 183118 ) on Tuesday July 08, 2003 @12:52PM (#6392820)
    Oh great... I can see Jobs wringing his hands already.

    "Now how am I going to make the G5's look faster than THIS?"
    • Re:Apple (Score:2, Funny)

      by Trigun ( 685027 )
      He just has to tell the Apple fanboys that a G5 is faster, and they'll believe it.

      For his next trick, Jobs is going to walk on water.
    • Re:Apple (Score:2, Interesting)

      by Umrick ( 151871 )
      Rendezvous will be used in 10.3 with Xcode to discover resources and distribute software builds across available 10.3 machines. If there's a perceived benefit to Apple, do you honestly think there's anything preventing the next version, 10.4, from having distributed capabilities?

      You can already compile programs with LAM-MPI support, so in reality there is nada stopping you from building a Beowulf cluster of XServes. There may even be a compelling reason to use XServes over x86 boxes after XServers are updat
      • The point is, from the hardware side Apple is light-years away from machines like this. And then the OS must be capable of taking advantage of and adapting to things like NUMA (which is not necessarily implemented in these machines, but is in many of this size), which I don't think OS X (or its BSD personality) is able to do.

      • Rendezvous will be used in 10.3 with Xcode to discover resources and distribute software builds across available 10.3 machines. If there's a perceived benefit to Apple, do you honestly think there's anything preventing the next version, 10.4, from having distributed capabilities?

        Back in the day, on the old black NeXT hardware running NeXTStep 3.3, there was an app called Zilla that could be used to distribute compute intensive jobs around a network of NeXT machines. They said that 100 NeXT Turbos was an even
    • Zilla. For OS X. Make a cluster of G5s. C'mon Steve, time to port Zilla to OS X.

      (more info: http://slashdot.org/comments.pl?sid=45647&cid=4722113)
  • kernel sources? (Score:5, Interesting)

    by gladbach ( 527602 ) on Tuesday July 08, 2003 @12:53PM (#6392830)
    Are they going to release the kernel that allows them to globally share memory? Or is it more of a hardware thing than a software thing?
  • by poptones ( 653660 ) on Tuesday July 08, 2003 @12:54PM (#6392836) Journal
    Now those obsessed geniuses have even more reason to forget to change the oil in their cars...

    (Inside joke for my ol' friends at ORNL...)

  • by mao che minh ( 611166 ) * on Tuesday July 08, 2003 @12:54PM (#6392840) Journal
    I wonder what kind of FUD Microsoft and SCO will cook up to try to thwart this new display of raw power. McNealy seems intent on not only winning the Asshat award, but outright retiring it in his honor.

    It's funny that Microsoft always tries to downplay Linux's enterprise capabilities, when Linux has been scaled to far more power than Microsoft's best offering for years now. Windows 2003 is a clumsy, bloated, closed source chunk of green crap.

    • It's funny that Microsoft always tries to downplay Linux's enterprise capabilities, when Linux has been scaled to far more power than Microsoft's best offering for years now.

      RTFA. They are using this machine for research in the "sciences, clean energy management and production, environmental protection, and homeland security."

      It's not a web server, and it isn't demonstrating "enterprise capabilities." Windows has never been intended for, or used for, scientific computing on a large scale.
  • The best part about it is that you can actually run something on it, unlike SGI's older IRIX-based crap... unless you like re-writing code to blow your nose, be glad this one is running Linux.
    • Um..

      I always liked Irix, and everyone I ever talked to who used Irix liked it. The GUI is about 500x more usable than the horrors of OpenWindows or CDE on Solaris.. bleugh.
      • by platypus ( 18156 ) on Tuesday July 08, 2003 @02:43PM (#6393978) Homepage
        I hated it, if it helps.

      • I always liked Irix, and everyone I ever talked to who used Irix liked it. The GUI is about 500x more usable than the horrors of OpenWindows or CDE on Solaris.. bleugh.

        I vastly prefer 4DWM to GNOME or KDE as well. I'm helping a coworker set up a Dell inspiron 7500 (P3-700) with Linux, and he immediately complained that KDE was far too slow. I switched to WindowMaker, and he immediately noticed the difference. This is a three-year-old machine, with tons of memory and a reasonable processor, and it crawl
        • I agree with this. Anyone who says X is slow should sit in front of an older SGI sometime.

          4Dwm is snappy, even on a 150MHz machine.

          IRIX is a nice place to work. Now that I am getting more into Linux, the little differences annoy me. SGI does do its part though. freeware.sgi.com is a very nice resource. Get any old SGI, point it there and download most of the good OSS built and ready for your machine.

  • ...now you get obscene frame rates on quake III while searching for those pesky pockets of natural gas!
  • by anzha ( 138288 ) on Tuesday July 08, 2003 @12:57PM (#6392872) Homepage Journal

    HPC Wire [tgc.com] had an article [tgc.com] that I referenced in my journal on 6/30.

    It's an interesting machine. I'd love to get one to play with. I'm sure our benchmarkers will have some even more interesting comments once they're done. Expect teething problems, folks. Systems of this size and complexity take time to break in.

  • lites (Score:3, Funny)

    by NetMagi ( 547135 ) on Tuesday July 08, 2003 @12:59PM (#6392885)
    makes me just wanna turn off the lights and look at all those LED's blinkin!
    • Re:lites (Score:5, Interesting)

      by CoolVibe ( 11466 ) on Tuesday July 08, 2003 @01:24PM (#6393118) Journal
      I've experienced it the other way around once. At a previous $workplace, we had this humongous SGI Origin 3800 cluster. Due to a city-wide brownout, and due to the fact that we were just installing the diesel-powered generators, the thing had to survive for a couple of hours on the nobreak (UPS). Sure, all the lights in the building were out, but the behemoth was still churning. We (the venerable sysadmins) were trying to decouple a partition so we could hook up a console to it and bring the thing down gracefully. Of course, that wasn't easy.

      Suddenly the nobreak was all out, and the billion dollar machine went *poof* - down. Damage? A couple of SCSI disks, but of course everything was mirrored and had parity, so even with the damaged disks there was no data loss.

      Then (after a few hours) the power failure ended and the lights went back on in the building, but the lights on the big cluster were still off. The opposite of what you'd like to see. Although, when the building power was out and the nobreak for the machine was active, it sure was a pretty sight. With the impending doom, though, I didn't really have time to appreciate it.

      • Re:lites (Score:5, Informative)

        by Leebert ( 1694 ) on Tuesday July 08, 2003 @02:32PM (#6393827)
        the billion dollar machine

        What the hell kind of Origin 3800 do YOU have? ISTR ours (512-proc) was roughly $10M.
        • I was of course exaggerating. The machine is expensive. It has 1024 procs, big fat arrays of SCSI disks (think multiple terabytes of storage, RAID 5 and mirrored; I can't say how much, it's too long ago, but it was heaps: several room-filling cabinets full of just SCSI disks), and really shitloads of RAM (think terabytes in total). It's probably not a billion dollars, but it's certainly a quarter of the way there if you count support contracts and upkeep.

          I was actually surprised that the thing managed to run

          • Re:lites (Score:3, Informative)

            by Anonymous Coward
            The machine has 1024 procs

            There are two 1,024-processor Origin 3000's in the world. One is in Eagan, Minnesota. The other is at NASA. The NASA machine is called chapman. It has 256 GB of RAM. Not terabytes.

            How do I know this? Because I'm sitting here looking at lomax right now.

            You're a... whaddya call it. Liar.
            • The NASA machine is called chapman. How do I know this? Because I'm sitting here looking at lomax right now.

              I have an account on turing. ;) In fact, I was through your datacenter back in February.
            • Nope, the Origin 3800 at this place [www.sara.nl] has 1024 procs. That's where I worked. You're behind the times, so to speak. Europe has big fast number crunching behemoths too.
              • Re:lites (Score:3, Informative)

                by CoolVibe ( 11466 )
                Oh, I found a little page on the sara website where it is clarified (can't get onto the intranet anymore, else I'd have mirrored some better specs). Anyway, more about TERAS here [www.sara.nl].
      • Re:lites (Score:4, Informative)

        by green pizza ( 159161 ) on Tuesday July 08, 2003 @05:27PM (#6395653) Homepage
        SGI Origin 3800 cluster

        Just to nitpick... most Origins are not clusters but rather one large single machine. It is possible to partition the machine in firmware and have each partition talk to others over the existing (and now unused) numalink interconnects... but it's much faster (even for plain MPI code) to just run the beast as one large single machine.
  • by panda ( 10044 ) on Tuesday July 08, 2003 @01:00PM (#6392893) Homepage Journal
    OK, so the moderators are on crack today. What's with all these obviously "funny" posts getting moderated as "insightful?"

    Guess it's time to meta-moderate!
  • Oops (RTFA) (Score:5, Informative)

    by Anonymous Coward on Tuesday July 08, 2003 @01:02PM (#6392914)
    The machine has 256 processors for 1.5 teraflops, not 64.
  • So... (Score:3, Funny)

    by Mipsalawishus ( 674206 ) on Tuesday July 08, 2003 @01:02PM (#6392915)
    How hard would it be to /. one of these things??
    • Processing power has very little to do with bandwidth.
    • Re:So... (Score:3, Insightful)

      by ocelotbob ( 173602 )
      Depends on the bandwidth into the machine more than anything else. Most /.ings, unless the database explodes into a shower of sparks, are limited by the bandwidth of the machine more than anything else. It'll be quite easy to /. it if it's only got a T1 or so. If it's got a 10Gb connection or two, I'd imagine that the system load wouldn't even be noticed.
  • Great.. so which option in the .config do I enable?
  • by jackDuhRipper ( 67743 ) on Tuesday July 08, 2003 @01:09PM (#6392989) Homepage
    What's that in bogomips?
  • by Anonymous Coward on Tuesday July 08, 2003 @01:14PM (#6393026)
    Perhaps this will finally get SGI's Open Source Software efforts into the spotlight. So far every other major hardware vendor has jumped on the bandwagon, making a lot of noise and trying to get free publicity. SGI, however, has always quietly contributed large amounts of knowledge, but always in a modest or even shy way (sometimes even publicly denying involvement while working in secret :) ).
    In the meantime their contributions have added quite a bit to open and free thinking in software, take OpenGL and Open Inventor, or even to the kernel directly, as with the XFS filesystem.
    I always liked this approach more than the hyping others have done with Linux, but unfortunately it has kept them unsung within the community. With the Altix cluster (as with their GNU/Linux workstations, which unfortunately failed) I think they have shown that they put their money where their mouth isn't.

    I think it's only fair that when we are talking about the large corporate players in the OSS field, SGI at least deserves a footnote for their efforts instead of hammering exclusively on IBM, Sun, etc. as the great backers.

    I know, I know. It's a corporation, so they inherently put money over freedom. It's just something I noticed because of the absence of their name in any high-profile discussions, which I think is unfair.
    • One thing I've considered for some time is a "league table" of companies involved in "Open Source Software".

      The table would record the number of packages released, the number of patches, and the licenses used for each. Originally, I was going to make a four-way split: open-source packages or patches, and packages/patches for open-source OSes.

      From this, you could create some kind of scoring system, and thus compare the "open-sourceness" of companies. (From the above, it should be obvious that I consider

  • without SCO's help?
  • Setting one up now (Score:5, Informative)

    by jimshep ( 30670 ) on Tuesday July 08, 2003 @01:39PM (#6393280)
    We just got ours installed yesterday. I'm still installing software and am starting benchmarks. It's only the deskside version (12 cpus, 24GB RAM, 1TB disk), but still more powerful than the 4-cpu SGI Origins that we have been using.

    It is the first one that the regional SGI reps had actually installed, but since it is almost exactly the same as the MIPS-based Origin 3000 servers (with the exception of the obviously different Itanium 2 cpus and supporting chipsets), they ran into almost no problems getting it online. I have also been surprised at how many commercial codes have already been ported to the platform.

    The main reasons we purchased this machine are the ease of parallelizing code and the floating point performance of the Itanium 2 cpus. We're computational materials engineers, and the less time we have to spend optimizing codes to keep the nodes of a cluster busy and to minimize I/O bottlenecks, the more time we have to concentrate on the theoretical issues.

    It runs RedHat 7.2 with some tweaks by SGI called SGI ProPack. The ProPack modifications come on separate CDs, with the proprietary software kept apart from the open source software. So far, from the command line, everything works just like my PC. It's kind of strange running Linux on a >$100K machine, but it sure beats dealing with the annoying differences between IRIX and Linux. Now to see if it performs as well as we expect...
  • Non-Flash Links (Score:3, Informative)

    by frostman ( 302143 ) on Tuesday July 08, 2003 @01:54PM (#6393448) Homepage Journal
  • by twoslice ( 457793 ) on Tuesday July 08, 2003 @02:26PM (#6393777)
    'Cause it is surviving a /.ing with a Flash intro even!
  • Can it recompile its own kernel?
  • My gut reaction is that this isn't a cluster. A cluster is a network of independent computers collaborating on delivering some service.

    This is a parallel (super)computer. Key difference: all the processors share a single memory space with each other. Programs will run exactly as if this were a single (multitasking) computer (see the sketch below).

    Most clusters I've worked on are just a bunch of computers with a fast network, using various protocols to synchronize their behavior ("Hey, node 19 isn't pinging, he must have died.
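
    To make the contrast concrete: on a single-address-space machine like this you can parallelize with plain shared-memory tools such as OpenMP in C, with no message passing at all. The loop below is a made-up stand-in for real work.

    /* Shared-memory parallelism sketch: every thread sees the same array,
     * and the runtime splits the loop iterations among them.
     * Build with an OpenMP-capable compiler; the work itself is a placeholder. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    static double a[N];

    int main(void)
    {
        double sum = 0.0;
        int i;

        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++) {
            a[i] = (double)i * 0.5;   /* touch shared memory directly */
            sum += a[i];
        }

        printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }

    On a distributed-memory cluster the same loop would need explicit sends and receives; here the NUMA hardware and the OS maintain the single memory image, so ordinary threaded code just works.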

    • Back in my day, we used Connection Machines with 65536 1-bit processors!

      Do you (or anyone else here) know how a 1-bit processor works? I have read a bit about the CM and always come across the "1-bit processor" thing, but I have never found enough information on how a 1-bit processor really works.

      I have a feeling it is something like a single-instruction cpu (subtract then branch or something like that), but even more esoteric in practice. Furthermore, I tend to wonder how you write programs for such a mac

      • No, no, it's the data bus that's one-bit. Basically, you stream data to the processor over a serial line, either from memory or from another processor. There are a bunch of instructions. All the normal boolean operations, operations for streaming data aka NEWS, sleep-if, wake... No arithmetic operations, obviously.

        No links handy, sorry.
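
        A rough illustration (ordinary C, not actual Connection Machine code) of how multi-bit arithmetic can still be built from 1-bit boolean operations: stream the operands through a 1-bit full adder one bit per step, keeping a single carry bit of state.

        /* Bit-serial addition sketch: add two 16-bit numbers one bit at a
         * time using only 1-bit boolean operations plus a 1-bit carry. */
        #include <stdio.h>

        static unsigned serial_add(unsigned a, unsigned b)
        {
            unsigned sum = 0;
            int carry = 0, i;

            for (i = 0; i < 16; i++) {
                int abit = (a >> i) & 1;                 /* next bit streams in */
                int bbit = (b >> i) & 1;
                int s = abit ^ bbit ^ carry;             /* 1-bit full adder: sum bit */
                carry = (abit & bbit) | (carry & (abit ^ bbit)); /* next carry */
                sum |= (unsigned)s << i;                 /* result streams out */
            }
            return sum;
        }

        int main(void)
        {
            printf("%u\n", serial_add(12345, 6789));     /* prints 19134 */
            return 0;
        }

        A wider word just means more clock ticks per operation, which is why such machines made up for it with tens of thousands of processors working in lockstep.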

  • The Johannes Kepler University in Linz, Austria purchased an Altix 3700 (the first one in production use) in April. Here is a link (sorry, German only): http://www.news.jku.at/ARCHIVE/archivnewsroom/2003maerz-april/newsroom/supercomputer.htm [news.jku.at]
  • by hellfire ( 86129 ) <deviladv.gmail@com> on Tuesday July 08, 2003 @03:20PM (#6394373) Homepage
    $500 for the scalp of anyone who says the words "Beowulf" and "cluster" in the same post in response to this article.
  • It seems unlikely that the SGI is the first Linux cluster with global shared memory. There are plenty of distributed shared memory systems in software, some of them open source. You can find a list here [uci.edu]. For most computations and most hardware, you are probably still better off with MPI or PVM rather than shared memory.

    Note also that there are several high speed interconnects for Linux clusters available from many different vendors, including InfiniBand, Gigabit Ethernet, FireWire, and Myrinet.
    • You can find a list here. For most computations and most hardware, you are probably still better off with MPI or PVM rather than shared memory.

      Note also that there are several high speed interconnects for Linux clusters available from many different vendors, including InfiniBand, Gigabit Ethernet, FireWire, and Myrinet.


      SGI systems (Origin and Altix) have massive interconnects that hold together the single-system architecture. They're fast for shmem-type shared memory apps, but also for MPI. In fact, SGI
      • The interconnects in most Origins and Altix systems are 3.2 gigaBYTE per second with extremely low latency.

        I think SGI is playing with the numbers there: while that is the limit for how much data you can push through a link, it seems unlikely that you get that kind of performance for arbitrary processor-to-processor communications or as aggregate bandwidth.

        In any case, very fast interconnects are not cost-efficient when they are not needed. 3.2 GB/s is the memory bandwidth of a 400MHz-bus motherboard. Mos
  • This does not seem to have been mentioned before:
    Niflheim at the Technical University of Denmark [fysik.dtu.dk]
