New Linux Supercomputer Forecasts Rain

buzzcutbuddha writes "Linux PR has a press release about a new weather forecasting supercomputer running Linux built by High Performance Technologies, Inc. that will be unveiled on Wednesday by NOAA. There is even a phone number to call to tour the High Performance Computer Center. " (let's see if the trolls can be clever for a change ;) Anyhoo 276 nodes, but its costed $15M? Them must be some spendy nodes...
  • by FascDot Killed My Pr ( 24021 ) on Tuesday April 25, 2000 @04:50AM (#1111913)
    NOAA forecasts rain? That can't be good....
    --
  • Will this influence the sale of umbrellas? 1LC
  • When I find Slashdot in tons of trouble,
    Friends and colleagues come to me,
    Posting words of wisdom:
    "Let trolls be."

    As the moderating fast approaches,
    And grits are all that I can see,
    Somewhere, someone whispers:
    " Let trolls be."

    Let trolls be, Let trolls be,
    Let trolls be, oh, Let trolls be.
    Karma's dead and buried,
    Let trolls be.

    I used to post a lot to USENET,
    For gradeschool it worked flawlessly.
    Try using it for insight!
    Let trolls be.

    If you've just spent nearly 30 hours,
    Reaching for that +1 bump,
    Remember, we moderate also.
    Let trolls be.

    Let trolls be, Let trolls be,
    Let trolls be, yeah, Let trolls be.
    Flaming's not the answer.
    Let trolls be.

    Let trolls be, Let trolls be
    Let trolls be, oh, Let trolls be.
    Trolls, we keep the balance.
    Let trolls be.

  • Just kidding, I thought I'd beat the trolls to it.

    In any case, that does seem like a high price tag... by my calculations you could build a 1200 node cluster (using 8 node cubix boxes) for that kind of money...
  • What's the accuracy of the current technology? 75-80%?

    Anyways, I'm glad that the FSL is the first government lab to buy Linux systems. I'm wondering if they would have gotten any better results by using another version of Unix or even a proprietary system.

    Is running SETI or RC5 on one of these practical also? They'd need to win in order to start paying back High Performance for the $15 million supercomputer ;)
  • by ElPresidente1972 ( 95949 ) on Tuesday April 25, 2000 @04:57AM (#1111918)
    Somebody please tell me they're using STORM Linux!
  • Actually, that is funny. I was about to post a similar comment.

    --

  • Holy crap. I misread, I thought that said $1.5 Million... I must change my estimate to 10000 nodes (subtracting some just for basic infrastructure)...

    My god, with that kind of power you could crack/render/spindle/mutilate anything in seconds.
  • Not in the news much, but relevant, is this article [linuxworld.com] from LinuxWorld.
    I quote:
    [Incyte Genomics] now has about 20 farms with up to 200 processors each. Each farm behaves like a supercomputer, at about one-hundredth of the price -- or less.

  • This isn't good news, people. Weather forecasters have traditionally been wrong. All this advance means is that now they can be wrong faster. Using the miracles of distributed processing, you can be assured within minutes that despite the fact that there's a big black cloud belching lightning 10 miles away, the weather is still "95 and sunny" today.
  • In preparation for the forecast deluge, NOAA's Atmospheric Research Center (ARC) is fully staffed and stocked for over a month of independent operation.

    -- Bah! Trying too hard.
  • by Ho-Lee-Cow! ( 173978 ) on Tuesday April 25, 2000 @05:10AM (#1111924)

    Just because the machine runs Linux, doesn't mean that there is a free software solution to predict the weather. Let's be a tiny bit realistic about it: they built a BIG box, put a 'free' OS on it, and then had someone write unique, custom software for it. You and I aren't going to get our hands on this weather package anytime soon ;).

    By the time you count up the costs of that contract, I can readily see $15M. In fact, that figure is probably cheaper than if they had used, say, NT. Besides, absolutely nothing with the Government is 'free': defeats the whole idea of pork barrel :)

  • I expect that they probably had to spend just a little money to write the software to run on that machine.

    Can I help them, and next year send in a couple of nodes instead of paying taxes?

  • Forecast calls for rain. In fact, scientists predict a 95% chance of cats and dogs; which, correcting for their poor forecasts in the past, means we'll be seeing frogs and locusts.

    Wha? Whaddya mean I can't skip to the next chapter?
  • $15M for 276 nodes. That works out to $54,347.83 per node (including networking, storage, etc.). Not exactly cheap. However, looking at their site (which has a lot of missing pages), it looks like this thing is composed of Alpha boxes connected with fibre channel and some other goodies (like a big RAID-array data center). Alphas and fibre are not cheap, so the price might not be so far off for their claimed 3-4 teraflops.

    The real question is this: If the same money were spent on, say, Athlon nodes connected with channel bonded fast ethernet (or even myrinet); could you get even more performance? I figure that you could build a cluster of stripped down Athlon-700's on channel bonded ether for around $2k per node including switches, etc. That would allow up to 7500 nodes (though I imagine that network bandwidth/latency would kill your performance at that scale). Hmmm...
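
    A throwaway sketch to check the arithmetic (the $2k commodity figure is my guess, not a quoted price):

        #include <stdio.h>

        int main(void)
        {
            const double contract = 15e6;         /* quoted contract price, USD */
            const int nodes = 276;                /* node count from the press release */
            const double commodity_node = 2000.0; /* guessed cost of a stripped Athlon node */

            printf("cost per node:      $%.2f\n", contract / nodes);
            printf("$2k nodes for $15M: %.0f\n", contract / commodity_node);
            return 0;
        }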

  • by The Dodger ( 10689 ) on Tuesday April 25, 2000 @05:14AM (#1111928) Homepage

    Anyhoo 276 nodes, but its costed $15M?

    $54k/node does appear rather expensive at first glance, but let's bear in mind here that this is an HPC installation. That's "high performance", kids. Also, let's remember what its purpose is: to "help researchers improve forecasts of severe weather such as thunderstorms, tornadoes, and winter storms, and ultimately, to save lives and property".

    Basically, this ain't a couple of 386's Beowulf'd together over 10BaseT in someone's bedroom, and you can bet that this system ain't going to be using EIDE hard drives. In order to achieve the performance, reliability and scalability which the NOAA would have specified for such a mission-critical system, it doesn't surprise me that the cost per node is this high.

    Furthermore, this amount undoubtedly includes the two upgrades and maintenance over the contract period (three years plus), and that good old 24/7 4-hour response don't come cheap!

    All in all, I'd say that it's probably not that expensive after all.

    D.

  • by Genady ( 27988 )
    And I interviewed for a SysAdmin position there, had an offer even *SLAPS FOREHEAD*. Definitely some sah-weet stuff going on at NOAA for Linux folks, send in your apps!
  • Anyways, I'm glad that the FSL is the first government lab to buy Linux systems.

    Hardly the first, since NASA Goddard [nasa.gov] invented Beowulf.

    The press release says that they are the first to buy a "turn-key" Linux supercomputer.

  • Ah, but you... yes, you are very clever. It's nice of you to help out all the poor Linux folk who don't have access to Windows Calculator to perform those types of big calculations.
  • The stereotype is tired. Weather forecasting is one of the most mathematically and scientifically complex undertakings of modern science. Forecasting has improved dramatically over the past 5, 10, and 20 years.

    95 and sunny usually leads to storms anyway.
  • They must have charged them for labor... Like giving birth to each alpha or something.
  • Depends upon what period you want to be accurate for: weather 10 seconds from now, pretty darn accurate; 10 days from now, not so accurate; a month... well, throw a dart at the guessing board. This will allow them to add more variables into the equation, but I don't think it will show the public any noticeable difference.

    You can throw as big a machine as you want at these problems and you will only marginally increase their effectiveness; this is all due to chaos theory. There are so many items that seem insignificant (I seem to remember the phrase "insignificantly significant" from a professor somewhere) that cannot be accounted for, and that makes any long-range forecasting of weather impossible. Extremely small terms in an equation that at first glance would seem to add maybe a .0001% variation can in fact greatly change the results as you increase the period the equation is used over: for small periods they don't add much variation, but for longer periods they add significant variation. There is no possible way for anyone to take in all these subtle complexities: if a raindrop rotates clockwise after it hits the ground and bumps another one on its way down, moving its position, how does it affect the weather 6 months from now?
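
    You don't need a supercomputer to watch that sensitivity happen; a toy chaotic map shows it in a dozen lines (a minimal sketch, not real atmospheric physics):

        #include <stdio.h>

        /* Logistic map x' = 4x(1-x), a textbook chaotic system.  Two runs
           that start a mere 1e-10 apart agree at first, then diverge
           completely -- the same mechanism that caps long-range forecasts. */
        int main(void)
        {
            double a = 0.4, b = 0.4 + 1e-10;
            for (int i = 0; i <= 50; i++) {
                if (i % 10 == 0)
                    printf("step %2d: diff = %.2e\n", i, b - a);
                a = 4.0 * a * (1.0 - a);
                b = 4.0 * b * (1.0 - b);
            }
            return 0;
        }
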
  • That CmdrTaco, he cracks me up. Clever trolls. Whew (wiping moisture from eyes), I'm sure THAT'S gonna happen. "Costed" was pretty funny too... I wondered what he thinked when he writed that.


    Apartment6 [apartment6.org]
  • by slashdotsux ( 34410 ) on Tuesday April 25, 2000 @05:25AM (#1111936)
    You might be able to do RC5 quickly with a bunch of cubix boxes, but to get real work done, you often need a good interconnect. 100baseT just doesn't cut it from a latency or bandwidth perspective. Later in the press release, they mention that they've partnered with Myricom. I presume that a big chunk of the money went to Myricom for a large Myrinet interconnect (>1Gbit/second, programmable NICs, ultra-low latency). Also, they mention a fancy storage system; depending on the size and performance, a good storage system (many drives, all hooked up to the Myrinet) can cost a bunch of money.
  • Um... when will you figure out that you're wasting your time? By making comments like that one, and using your life like you are, you're just showing your immaturity. Why not, for once, spend your time doing something constructive rather than tearing things down?

    I mean, come on! Build something, compliment someone, smile, contribute. Trolls only exist because they, for some reason, get pleasure from annoying people. AC's get moderated up when they have something good and on topic to say. Get over your little childish views and start doing something with your life.

  • Well...
    Let me see now. I would think that most of the calculations would be floating-point stuff. (Kind of like games =) )

    But the Alpha is a real 64-bit computer, and when playing around with a lot of high-precision floating point, that really, really helps. At least my intuition tells me that. Anyway, it would be really fun to play around with 7500 Athlons; too bad they ain't SMP yet.
  • It's a stereotype that's often true! Beyond about 72 hours, most weather forecasting falls apart. Even massively huge events like hurricanes can't be predicted beyond a day or so as to where they'll land - we're reduced to probabilities.

    I stand by my statement that weather forecasting over any length of time is still a shaky science - even if there are a lot of people working on it.

  • The real question is this: If the same money were spent on, say, Athlon nodes connected with channel bonded fast ethernet (or even myrinet); could you get even more performance? I figure that you could build a cluster of stripped down Athlon-700's on channel bonded ether for around $2k per node including switches, etc. That would allow up to 7500 nodes (though I imagine that network bandwidth/latency would kill your performance at that scale). Hmmm...

    But would you get any kind of support for that? You can bet they got a hefty service contract with on-site Field Engineers rolled into that price. And you can't get anywhere close to the throughput these machines are getting in a $2k Athlon.


    --

  • by Greg Lindahl ( 37568 ) on Tuesday April 25, 2000 @05:51AM (#1111941) Homepage

    This contract includes 2 substantial upgrades; this is just the initial installation. The AlphaLinux cluster (yes, connected with Myrinet) is most of the initial equipment. There's also a tape robot from ADIC with 70 terabytes of tape (1400 tapes) and 20 tape drives, and a storage area network (SAN) using CVFS, a SAN filesystem being ported to Linux because of this contract.

    The main software used on the system is actually all free: Linux, the PBS batch queue system, mpich as modified by Myricom for MPI, and the SMS scalable modeling system, developed at FSL. FSL has demonstrated some of their software scaling efficiently up to around 100 nodes. Limits in scalability, the Alpha's superior floating point performance, and Compaq's great AlphaLinux compilers are the reason we used Alphas.
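
    For readers who have never seen MPI, the programming model all those free pieces support looks like this (a minimal sketch; the actual forecast codes are of course enormously larger):

        #include <mpi.h>
        #include <stdio.h>

        /* Minimal MPI program: every node runs the same binary and learns
           its rank.  Compile with mpicc, launch with mpirun -np <nodes>. */
        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            printf("node %d of %d reporting in\n", rank, size);
            MPI_Finalize();
            return 0;
        }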

  • Weather forecasting? All right, I'm sure it's useful, but let's look at the more interesting uses this technology may have.

    As everyone knows, forecas^H^H^H^H^H^H^Hresearch agencies like Gartner Group have been using parallel supercomputers for their predictions. However, without the cheap equipment that Linux makes possible, their predictions have been less than spectacular, because they haven't been able to include all the important factors - like the antitrust coefficient, PocketPC parabolism, the /. effect and Natalie Portman.

    Now that they can be taken into account, let us see what some of the older predictions might have looked like.

    In March 1999, IDC predicted that Linux will grow 25% each year until 2003. The real result, of course, is that Linux will grow without bounds until it hits the /. barrier (this is known as the /. effect). An immediate conclusion is that Linux will run all computers by the end of 2003 - if IDC were to run this forecast again, they would tell you this, except they don't want to create widespread panic.

    There have been a couple of surveys that indicate a lot of companies would like to start using Linux instead of Windows. This is completely false. The more powerful forecasting engine would consider the antitrust coefficient and find that all these people have been paid by Microsoft to speak favorably of Linux, so that the trial would proceed in the right direction.

    Quite recently, some analysts predicted that PocketPC is no 'Palm killer'. What futility! Careful forecasts would show that since PocketPC can run Quake (well, at least it can play mp3s) and Palm cannot, PocketPC would beat Palm in a Quake death match every time. Palm is dead. Case closed.

    Any apparent inconsistencies concerning my predictions are caused by Natalie Portman.

  • Isn't NASA's JPL running a few Beowulfs?



    "THERE ARE BETTER THINGS IN THE WORLD THAN ALCOHOL, ALBERT"-Death
  • Yeah, but even two full days of accurate weather would be nice... heck, 18 hours would be good.

    Like this past weekend:
    No.... it won't rain tonight (3pm forecast). Nearly a torrential downpour that night (about 12 hours later).

    I'm not as worried about a week from now as tonight - can I leave the windows open to air the place out or am I going to have a couple inches of water inside?

    Even *I* can tell you what it's going to be 10 seconds from now... barring an apocalypse...
  • Yes, it can predict the rain, but can it run in William Scott emulation mode? Will there be some daemon that notifies the administrator of the birthday of a 103-year-old lady in Texas? Does it have tupee error correction? We want to know these things, dammit!
  • I wonder how many more nodes it's going to take before we're not just simply forecasting weather and actually get around to controlling it.

    Can't imagine what kind of damage a scr1pt k1dd13 could do with r00t access :)

  • by Corbet ( 5379 )
    The system may seem expensive, but there are a few things to consider here:
    • The cost is for the final system, which includes an eventual replacement of the current nodes and the addition of lots more of them.

    • Don't forget the I/O subsystem as well.

    • Don't forget the onsite engineer
    This system is a true supercomputer, and will carry that sort of price tag.

    I can't resist pointing out that LWN wrote an article about this cluster [lwn.net], complete with pictures....

  • Mmmmmmm... meta-beowulf cluster :)

    --
  • by ian.layton ( 159953 ) on Tuesday April 25, 2000 @06:12AM (#1111949)
    First, let me qualify myself... I was a meteorology major for a few years back in the mid-'90s. With that said:

    There is very little chance in the foreseeable future that weather prediction will be 100% correct, no matter how fast the computers get.

    One of my favorite quotes along this line:

    Why is Forecasting so difficult?

    Consider a rotating spherical envelope of a mixture of gases -- occasionally murky and always somewhat viscous.

    Place it around an astronomical object nearly 8000 miles in diameter.

    Tilt the whole system back and forth with respect to its source of heat and light.

    Freeze it at the poles of its axis of rotation and intensely heat it in the middle.

    Cover most of the surface of the sphere with a liquid that continually feeds moisture into the atmosphere.

    Subject the whole to tidal forces induced by the sun and a captive satellite.

    Then try to predict the conditions of one small portion of that atmosphere for a period of one to several days in advance.

    This quote came from a government manual for the NWS. It doesn't even touch on the lack of quality observations of the atmosphere, along with the unknown physics involved in it all.

    Yes... it has been improving over the years. Going into the '80s, the hits were generally 75% for 24 hours out, 50% for 3 days out, and just above a crapshoot for beyond that. Going into the 21st century, it's generally running about 90% for 24 hours, 75% for 3 days, and 50% for 5 days.

    Even after studying it for years, I'm still amazed that they can get it to nearly 90% for 24 hours out.

    Congrats if you made it this far.

    Ian Layton

  • Yeah, I read Jurassic Park too.
    *grin*
  • I must change my estimate to 10000 nodes (subtracting some just for basic infrastructure)... My god, with that kind of power you could crack/render/spindle/mutilate anything in seconds.

    For weather prediction you need far more than basic infrastructure. It's a problem that requires LOTS of communication, and your basic Ethernet-style connectivity just isn't going to cut it. I would be interested to find out what portion of the cost is attributed to communication, but I wouldn't be surprised if it were most of the money.

    In contrast, cracking requires very little communication.

  • The New Linux Supercomputer Forecasts Rain?

    I'd better get my umbrella!

    --- Speaking only for myself,

  • Sure, forecasting weather is cool, but when you get down to it, it doesn't matter if it is correct 100% of the time. Weather occurs whether you know what it is going to be like or not (theoretical discussions about knowledge of the future changing the future aside). This is therefore a waste of massive CPU cycles.

    If there is a phone hooked up to it, though, I'd like to call it up and ask other, less mundane, questions. E.g.:

    • What's the answer to Life, the Universe, and Everything?
    • Why do I like dried leaves in water? (tea)
    • What's 6 x 7?
    Cap'n Bry
  • Nope that would be the bright orange Chaos Theory book with the fractals on the front, now who wrote & who published I can't remember right now, but it's sitting on the shelf at home. Good high-level read for general public, got a blue book from college (title "Advanced Analysis" maybe) in a box somewhere that goes into the nitty gritty goodness.

    Course, I don't remember that part in Jurassic Park (maybe when he was talking about the frogs???), but not remembering wouldn't surprise me; as you might be able to guess, I need to upgrade my memory unit in my brain or at least my access algorithm. I never can remember titles, authors, names... :)
  • Oh yea? Well I saw the movie
    Top that!
  • by jbarnett ( 127033 ) on Tuesday April 25, 2000 @07:01AM (#1111956) Homepage

    "You and I aren't going to get our hands on this weather package anytime soon ;)"

    Contrary to popular belief, they did release this weather package under the GPL; I have it running on a couple of 386's Beowulf'd together over 10BaseT in my bedroom.

    It is pretty decent software too. Do you realize that it has never predicted rain? And you know what? It has yet to rain in my bedroom. Amazing software.

    Tomorrow there is no chance of rain in my bedroom, and the temperature will be around room temperature throughout the entire day! Great weather I am having here.

    Also, since it was released under the GPL, a couple of hackers have teamed up with Dr. Evil to create a weather control machine that not only predicts the weather, but can alter it on the fly! GPL: you can do amazing things with it, including, but not limited to, Total World Domination by bringing the United Nations down with a hail storm from, uh, hell. When Linus made a joke about gaining Total World Domination through the use of free (as in speech) software, he was serious!

    On a side note, if you check your preferences, Nate has made a slashbox that displays in real time the number of nations that have submitted to Dr. Evil and his weather machine; recently they have gotten Russia and China (who would have thought). The part I found amusing was that Canada was the first to go...
  • What's the accuracy of the current technology? 75-80%?

    Depending on what spot of Earth you live in, simply assuming that it will rain/not rain gives you higher accuracy. Assuming that tomorrow will be the same as today is very accurate in a lot of places.

    If, on the other hand, you have a program that in a very arid region can predict rain with 10% accuracy, I would argue that it is a superior method.

    rmstar

  • The reason they can often be so wrong is that weather patterns are processed in areas of 100 square miles (it might be slightly more or less). Thus:

    a) Weather predictions may not be derived from representative data.
    b) Weather predictions may only be accurate for a portion of the area the prediction covers.

    It's very possible for a weather prediction to say "90% chance of rain" and for areas within that prediction to not get a drop while others get drenched. IBM tested out a weather computer of their own for the Summer Olympics in Atlanta that could predict weather down to something like five miles square. Obviously much more accurate.

    -ryan

    "Any way you look at it, all the information that a person accumulates in a lifetime is just a drop in the bucket."

  • Lots of government labs are running Beowulfs. But the press release says that this is the first "turn key" Linux-based supercomputer. That's quite a difference.


    I would be interested to find out what portion of the cost is attributed to communication, but I wouldn't be surprised if it were most of the money.
    Only a minority of the cost is Myrinet. People tend to think it's far more expensive than it actually is. If we had used Intel or AMD processors, and still had 1 gigabit of bandwidth per processor, then Myrinet would be a larger fraction, but still not a majority.

  • That's what the on-site engineer does, answers questions like yours.

  • What about reliability? If I am building a cluster of hundreds of computers, I don't want bottom-of-the-line boxes that fail all the time. Something like this requires reliability engineering to make sure the system's MTBF is acceptable.
  • bad spellingss.spellingss, it isss, my preciousss... *gollum*

  • Interesting.

    Greg Lindahl, who is obviously someone on the scene for this project, has posted at least twice to this article (that I can see) with some very informative insider information. His comments have been moderated up, but he isn't getting any replies.

    Slashdot readers, here you have a resource who is willing to share relevant information about the topic at hand, ie, he knows sh*t. Yet, you are not taking advantage of that. Instead, you are doing your silly little arguments, speculations, teasings, trollings, whatever, that have nothing to do with the facts presented by Greg.

    Just goes to show, many Slashdot posters are not interested in relevant intelligent discussions.
  • That's "Willard Scott", not "William". Funny post though, gotta admit.
    T
  • Here in Hungary there was some big news about the Computer and Automation Research Institute of the Hungarian Academy of Sciences [sztaki.hu] linking 50-60 PCs into what they called a supercomputer. I looked on their site, but couldn't find out what OS these 'client' PCs actually run. Linux is popular, especially at universities, so they could be running that... They have a 'visual programming environment' [sztaki.hu] that aids development of applications taking advantage of the distributed & parallel system. Quote: "It is aimed at creating a professional graphical programming environment for supporting the development cycle of parallel and distributed programs." I myself am not familiar with distributed systems, so I don't know how advanced it is, but I like seeing things like this in Hungary!

    -
  • Comment removed based on user account deletion
  • CHAOS
    MAKING A NEW SCIENCE
    By James Gleick.
    (A Penguin book nonetheless!)

    I would completely recommend this book to anyone interested in the subject.

    I couldn't put this one down.
  • The weather has non-linear dynamics, and has been shown to be chaotic (to have strange attractors). It is therefore theoretically impossible to forecast the weather very far ahead, even with unlimited computer power, regardless of how accurate the models are.
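
    The canonical demonstration is Lorenz's convection model, the original strange attractor, which came straight out of weather research (a crude Euler-integration sketch; two runs differing by one part in a billion end up nowhere near each other):

        #include <stdio.h>

        /* Lorenz system: dx/dt = s(y-x), dy/dt = x(r-z)-y, dz/dt = xy-bz,
           with the classic chaotic parameters s=10, r=28, b=8/3. */
        int main(void)
        {
            const double s = 10.0, r = 28.0, b = 8.0 / 3.0, dt = 0.001;
            double xa = 1.0, ya = 1.0, za = 1.0;        /* run A */
            double xb = 1.0 + 1e-9, yb = 1.0, zb = 1.0; /* run B, nudged */

            for (int i = 1; i <= 40000; i++) {
                double dxa = s * (ya - xa), dya = xa * (r - za) - ya, dza = xa * ya - b * za;
                double dxb = s * (yb - xb), dyb = xb * (r - zb) - yb, dzb = xb * yb - b * zb;
                xa += dt * dxa; ya += dt * dya; za += dt * dza;
                xb += dt * dxb; yb += dt * dyb; zb += dt * dzb;
                if (i % 10000 == 0)
                    printf("t=%4.0f  x-separation = %.3e\n", i * dt, xb - xa);
            }
            return 0;
        }
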
  • Neat stuff!

    Can you point me to more info about the SAN?

    Is it "just" a bunch of fibre channel hooked up to boxes with myrinet and FC cards?


  • I think not. I seriously doubt they spent over $54,000 per node, even if the nodes have 2GB RAM and Ultra160 RAID5 disk arrays. More likely, they spent a great deal of money on high speed networking equipment (possibly fibre switches). Don't ya think?

    ... Going into the '80s, the hits were generally 75% for 24 hours out, 50% for 3 days out, and just above a crapshoot for beyond that. ...
    Hmm.. You don't mean it was worse than 50% after 3 days? That would be sooo easy.. :) Then simply the opposite has higher probability, and we can predict it easily... :)
  • NOAA is, strangely enough, the first introduction I ever got to the world wide web. It was 1993 I think, and I was on a tour there as a seventh grader, and some guy gave our group a demo of Mosaic, letting us try surfing the web. Man, I thought that was just so cool...

    --
    grappler
  • I don't get it... can someone explain why this is funny?
    --
  • Hmm.. You don't mean it was worse than 50% after 3 days? That would be sooo easy.. :) Then simply the opposite has higher probability, and we can predict it easily... :)

    That would make sense if there were only two possibilities (i.e., rain or shine) and you just had to pick one. But that's not what he meant.

    --
    grappler
  • Think Noah's Ark :-)

    --
    grappler
    There is very little chance in the foreseeable future that weather prediction will be 100% correct, no matter how fast the computers get.

    Change that to no chance and I'd agree with you. It turns out that the fundamental equations of motion for the atmosphere are unsolvable analytically. That means that computer models need to be based on equations that are already approximate. Further, models divide the atmosphere with a three-dimensional grid. The finer the grid, the better the forecast, and the faster the computer needed to run the model in a timely manner. But no matter how fast the computer, models will always have grids and will always be approximate. Then there will be rounding errors in the floating point calculations, approximations to prevent small anomalies from propagating through the model, and bad, incomplete and under-representative input. This stuff makes rocket science look easy.
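
    To make "grids and approximations" concrete, here is the simplest possible finite-difference scheme doing its approximate thing (a toy 1-D advection sketch; real models juggle dozens of coupled 3-D fields). The exact answer just slides the pulse along unchanged; the numerics smear it, and a finer grid merely smears it less:

        #include <stdio.h>

        #define N 100   /* grid points; double it and the error shrinks, never vanishes */

        int main(void)
        {
            double u[N], unew[N];
            const double c = 1.0, dx = 1.0 / N, dt = 0.5 * dx / c; /* CFL-stable step */

            for (int i = 0; i < N; i++)             /* sharp square pulse */
                u[i] = (i >= 20 && i < 25) ? 1.0 : 0.0;

            for (int step = 0; step < 50; step++) { /* first-order upwind march */
                for (int i = 0; i < N; i++) {
                    int im = (i + N - 1) % N;       /* periodic domain */
                    unew[i] = u[i] - c * dt / dx * (u[i] - u[im]);
                }
                for (int i = 0; i < N; i++)
                    u[i] = unew[i];
            }

            double peak = 0.0;
            for (int i = 0; i < N; i++)
                if (u[i] > peak) peak = u[i];
            printf("peak after advection: %.3f (exact: 1.000)\n", peak);
            return 0;
        }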

  • by Rain ( 5189 )
    Okay, why is the NOAA stalking me? I mean, sure, I complain about the NOAA sometimes, but when they start forecasting my actions, I have to get a little worried.. Of course, I do find it a little mysterious that they're announcing the fact they're stalking me... The government works in mysterious ways, I suppose!

    Ben Winslow..........rain@bluecherry.net
    bluecherry internet..http://www.bluecherry.net/
  • The real question is this: If the same money were spent on, say, Athlon nodes connected with channel bonded fast ethernet (or even myrinet); could you get even more performance? I figure that you could build a cluster of stripped down Athlon-700's on channel bonded ether for around $2k per node including switches, etc. That would allow up to 7500 nodes (though I imagine that network bandwidth/latency would kill your performance at that scale). Hmmm...

    You've got it all wrong. What drives such a purchase is: how many CPUs can my app use advantageously? If the app scales to a few hundred CPUs, but not to thousands, there is absolutely no point in building an equivalent (in terms of theoretical teraflops) system from thousands of cheaper CPUs. You absolutely want a few hundred of the fastest possible chips.

    As to interconnects: For many MPI-based codes you don't want to go over ethernet or any other TCP/IP-based interconnect. You want low level protocols over low latency/high bandwidth interconnects. TCP/IP really sucks when it comes to latency, no matter how many channels you bond together.
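
    You can measure the difference yourself with a two-rank ping-pong (a rough sketch in plain MPI; half the round-trip time of an empty message approximates one-way latency -- run it over TCP and over a low-latency interconnect and compare):

        #include <mpi.h>
        #include <stdio.h>

        /* Ping-pong between ranks 0 and 1; run with exactly 2 ranks. */
        int main(int argc, char **argv)
        {
            int rank;
            char byte = 0;
            const int reps = 1000;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Barrier(MPI_COMM_WORLD);

            double t0 = MPI_Wtime();
            for (int i = 0; i < reps; i++) {
                if (rank == 0) {
                    MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            if (rank == 0)
                printf("one-way latency ~ %.1f us\n",
                       (MPI_Wtime() - t0) / (2.0 * reps) * 1e6);

            MPI_Finalize();
            return 0;
        }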

  • I'll use this as a good excuse to answer the "why don't they just use 10-base-T or stripped-down PC mobos" questions people have been asking.

    Once you start trying to solve problems that require large amounts of inter-node communication, interconnect latency and bandwidth become your limiting factor. Traditional supercomputers have massive interconnects. I have heard 45 gigabits/sec per Alpha (or per router board, can't remember which) in a Cray T3E. The new machine that the project I'm on at SGI is working on has 1.6 gigabytes/sec off each node (4 MIPS or Itanium CPUs/node) for inter-machine memory access plus I/O. 10-base-T clearly doesn't cut it. Things like Myrinet have gone a long way towards closing the gap with traditional supercomputers, but they have also dramatically bumped up the cost of your network. GSN is a prime example of this.

    Now, for why you don't want a cheap mobo. Just as important (perhaps more so) is the bandwidth from the processor to memory. The thing about working with huge datasets is that cache doesn't really help you. If you are about to look at 4 gigs of data sequentially, you aren't going to cache a whole lot. Therefore, your memory has to be capable of streaming a very large number of reads and writes to the processor. Remember that it doesn't really matter how fast your processor is if it spends half its time stalled waiting for memory accesses to complete. One of the goals in all the old Cray products was to never have your processor sitting idle. The memory could handle a constant stream of data going both in and out (ie, one memory read and write per cycle). Your standard cheap PC mobo just can't do that. (A toy kernel after this comment makes the point concrete.)

    I'll end with my sales pitch for traditional supercomputers - by the time you buy supercomputer-class nodes and a supercomputer-class interconnect, even if it's built from commodity components, your costs will approach a traditional large system. You also don't get the advantages of having a single system image (I use a single-image 512p Origin 2000 on a regular basis) or even direct memory access from one node to another (the project I'm on at SGI is to break large systems into multiple images, but allow them to share user memory; that way, if one panics, the worst that happens on the other side is the loss of the user app that was sharing the memory). This is why people are still buying big iron. On the other hand, we're starting to sell Linux clusters to the people who don't require the massive bandwidth.
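
    The memory-bandwidth argument above is exactly what the STREAM benchmark quantifies; a stripped-down version of its "triad" kernel looks like this (a rough sketch -- real STREAM is far more careful about timing, alignment, and optimizer games):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define N (4 * 1024 * 1024)   /* 3 arrays x 32 MB: far bigger than any cache */

        int main(void)
        {
            double *a = malloc(N * sizeof *a);
            double *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c);
            if (!a || !b || !c) return 1;

            for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

            const int reps = 10;
            clock_t t0 = clock();
            for (int r = 0; r < reps; r++)
                for (long i = 0; i < N; i++)   /* triad: 2 reads + 1 write */
                    a[i] = b[i] + 3.0 * c[i];
            double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

            if (a[N / 2] != 7.0) return 1;     /* sanity check; defeats dead-store elimination */
            printf("triad bandwidth: %.0f MB/s\n",
                   3.0 * reps * N * sizeof(double) / secs / 1e6);
            free(a); free(b); free(c);
            return 0;
        }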

  • Frost Post!!
    Sorry, couldn't resist.
  • Too bad Apple G4's don't have better memory bandwidth. Because if NOAA needs high precision, it can do 128-bit floating point math. Not to mention its peak of 3.6 GigaFLOPS (for the 500MHz version).

  • The storage area network hardware is the usual DDN fibre-channel RAID combined with Brocade FC switches. That's not that exciting.

    The software is the interesting part. It's the "CVFS" filesystem, which is from ADIC. They ported this filesystem to Linux for the FSL bid.


  • I'll end with my sales pitch for traditional supercomputers

    Please don't. We beat SGI's machines in the bid, and this machine provides both higher bandwidth than any SGI Origin machine (300 gigabits bisection bandwidth), and it also does provide a single system image for this customer, who only runs MPI programs. So numerous parts of your comment are wrong.

  • I'll pick a few nits here.

    First, on what statistics did you beat SGI's machines on? (OK, yeah, I know that's probably at least partly confidential info, but I figured I'd ask anyway :) I guess it would not surprise me if it were price/performance, but it would surprise me if you beat us on things like MPI latency. The last I heard, the O2k was an order of magnitude better than Myrinet on IA32 Linux. I'd expect Alpha Linux to fare better, though.

    The biggest nit I'm going to pick is your assertion of running a single system image. That is true only if you can migrate processes between nodes in the cluster or transparently change your interconnect fabric to keep nodes running the same job physically close. (Assume a 512p machine or cluster.) What if I allocate 128 processors for job A, then start 256p job B followed by 128p job C? Jobs A and C finish, but their processors are on opposite sides of the cluster. Now I start job D, which needs 256 CPUs. Unless you can reorganize your interconnect the way a job can move within a single machine, you will end up with fragmented processors for job D. That'll hurt your latency.

    Also, not everyone runs only MPI. Shared-memory jobs are not going to be too easy on a cluster unless you have distributed shared memory. Further, it's not exactly an SSI if the sysadmin has to install the OS on every node or has a separate console connection to every node. The point is, if you provide a "single system image", you provide it *only* to the users of MPI, not to the sysadmin.

    I should also point out that SGI is not the only company that makes traditional supercomputers. I'd like to see how your cluster stacks up against a T3E or an SP2. Probably not quite so well.

    Finally, the Origin is about 5 years old. The next machine will be an order of magnitude better.

    Now, I'm not trying to say that clusters suck for all applications. They just aren't the solution to *every* problem, as a lot of people claim they are.

  • The cost of $54,000 per node is not so surprising when you consider they went with Compaq boxes.
    Of course, since they still have a stranglehold on quad-CPU-and-up Alphas, it is less surprising.
  • Whose azz do I have to kiss to get a gov contract?

  • First, on what statistics did you beat SGI's machines on
    We beat SGI on performance on the customer's actual codes. If you have 1/10 the MPI latency and your machine costs 3 times as much, and the customer's codes don't get much of a benefit from reduced latency...
    The biggest nit I'm going to pick is your assertion of running a single system image. That is true only if you can migrate processes between nodes in the cluster or transparently change your interconnect fabric to keep nodes running the same job physically close.
    You're pretty confused about what a "single system image" can be to different people. Try reading Greg Pfister's book. By the way, Myrinet's CLOS topology is good enough that it doesn't matter where in the machine a job's processors are. That's an important factor simplifying the software that the FSL machine needs to get high performance. FSL tested for inter-job contention, and I suspect SGI flunked. The machine they bought had near-zero inter-job contention.
    Further, it's not exactly an SSI if the sysadmin has to install the OS on every node or has a separate console connection to every node.
    We provide tools that give the sysadmin a single system image, too. There's nothing new there; people administering large clusters have had that for years.
    Now, I'm not trying to say that clusters suck for all applications. They just aren't the solution to *every* problem, as a lot of people claim they are.
    I never said that clusters were the solution to every problem. But a cluster was a solution to FSL's problem.
  • So if in the middle of the summer, the weather forecaster said it is NOT going to snow this week, you would be happy? Even if he is right?
  • All of this information has been released by the NOAA as a coverup of their true operation. The 15 million dollars went into the purchase of a unique Pentium 90, circa 1995. This Pentium 90 processor has been used on distributed engineering projects continuously for 5 years now, and in that time its health has deteriorated significantly. This has been a benefit for its new job at the NOAA. The arthritis in its 2,000,000 transistors is SO sensitive, it is able to accurately predict weather systems months in advance. This is a great step up over their previous computer, a Cray T90. The Cray ran several RC5/SETI/Mersenne-prime distributed projects, games, and a web site. The forecasting was done by its operator, 95-year-old Joe Blow. His mere 206 bones and patented "trick knee" could only predict weather 3 days in advance.

    Joe will be retiring at the completion of the new computer installation in the facility. Plans are underway for a party at which he will receive a gold barometer.

  • I once worked at the National Center for Atmospheric Research, which is just up the hill from NOAA in Boulder, and knew several people who worked at NOAA. They swore that the first question everybody asked when given a tour was "Where's the ark?" Since that was my first question, I couldn't argue. Anyhow, the "look out for rain" joke first surfaced about five milliseconds after the agency was named, and has ever since popped into the mind of everyone who first hears of the agency.


    Always and inevitably everyone underestimates the number of stupid individuals in circulation
  • Actually, that total cost is for the entire multi-year contract, with expansion up to ~1024 nodes... the first-year rollout cost only a small fraction of the total tab.

    You can go to HPTi's web site [hpti.com] and check out the cluster section for more information on the complete cluster project.
