
Answers About The New NOAA Massive Linux Cluster

Posted by Roblimo
from the scaling-scaling-over-the-bounding-network dept.
On May 23 we requested questions for Greg Lindahl, chief designer of the new NOAA Forecast Systems Laboratories massive Alpha Linux Cluster. Here are his answers. Fascinating stuff for people interested in big-time parallel computing.

Who Else?
(Score:4, Insightful)
by Alarmist

You've built a large cluster of machines on a relatively pea-sized budget.

Are other government agencies going to duplicate your work? Have they already? If so, for what purposes?

Greg:

There are a lot of government agencies building large clusters, such as the Department of Energy's Sandia National Lab, which has the 800+ processor CPlant cluster today, with another 1,400 processors on the way. Like FSL, they use their cluster for scientific computing. The well-known Beowulf clusters started within NASA, another U.S. government agency.

However, the Forecast Systems Lab (FSL) system is a bit different from these other clusters: it's intended to be a production-quality "turn-key" supercomputer, and it contains all the things supercomputer users are used to, such as a huge robotic tape storage unit (70 terabytes of tapes) and a fast disk subsystem (a bandwidth of 200 megabytes/second). The FSL system is also much more reliable than your average cluster -- in its first three months of operation, it was up 99.9% of the time. During that time we had quite a few hardware failures (due to power supply problems), but no work was lost, because of our fault-tolerant software.
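That reliability figure is easy to put in concrete terms. A back-of-the-envelope calculation (mine, not FSL's): 99.9% uptime over roughly three months still permits a couple of hours of total downtime.

```python
# Back-of-the-envelope: how much downtime does 99.9% uptime
# over three months (~90 days) actually permit?
uptime = 0.999
hours_in_three_months = 90 * 24
downtime_hours = hours_in_three_months * (1 - uptime)
print(round(downtime_hours, 2), "hours of downtime")  # -> 2.16 hours of downtime
```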

Beowulf in General
(Score:4, Interesting)
by BgJonson79

How do you think the new wave of Beowulf clusters will affect all of supercomputing, not just forecasting?

Greg:

The kinds of problems that scientists solve have different computational needs. In the mid-1970s, the most cost-effective machine to use for just about any problem was a Cray supercomputer. These days, desktop PCs are far cheaper per operation than the "big iron", which is why this interest in clusters has sprung up. The availability of production-quality commodity clusters like the FSL machine is a new development in the field.

IBM already sells its own idea of a commodity cluster, built from IBM's RS/6000 business servers. I think commodity clusters can deliver far more bang for the buck than an IBM SP supercomputer, but then again I am a cluster evangelist.

In the beginning...
(Score:5, Interesting)
by zpengo

How did you come to be the project's chief designer? I'm curious to know the background of anyone who gets to work on such an interesting project.

Greg:

Well, let's see: I'm a dropout from an Astronomy PhD, and for fun I dress up in funny clothes (I'm the one in yellow) and play the hurdy gurdy. I've only taken one computer science class since I started college. I assure you that you're never going to meet anyone much like me in this field.

Seriously, I've worked in scientific computing for quite a while, and I've had a chance to work with a lot of people and learn from them. I also learned quite a bit about distributed systems while working on IRC and, later, the Legion distributed operating system. The art of designing a system like this is understanding the customer's needs, understanding what solutions are possible, and understanding what can actually be delivered, be made reliable, and hit the budget.

In addition, it's worth pointing out what this sort of project involves. Most of the interesting development parts are done by other people. Compaq designed the Alpha processor, and they and legions of Linux hackers provided Linux on the Alpha. Compaq supplied their extremely good compilers (FSL mostly uses Fortran). Myricom supplied the interconnect and an MPI message-passing library optimized for their interconnect. HPTi provided the software glue that turned all this into a complete, fault-tolerant system. Without all these great building blocks, we would never have been able to produce this system.

The Future of the Control Software
(Score:5, Interesting)
by PacketMaster

I built a Beowulf-style cluster this past semester in college for independent study. One of the biggest hurdles we had was picking out a message passing interface such as MPI or PVM. Configuring across multiple platforms was then even worse (we had a mixture of old Intels, SunSparcs and IBM RS/6000's). What do you see in the future for these interfaces in terms of setup and usage and will cross-platform clusters become easier to install and configure in the future?

Greg:

We provided an easy-to-use set of administrator tools so that the Forecast Systems Lab (FSL) cluster can be administered as if it were a single computer. This is fairly difficult to do if you have a big mix of equipment, but the FSL system will never become that complex. There's already been a lot of development of programs for administering large clusters of machines; they just tend not to get used by other people. I'll admit that I'm part of that problem; I took some nice ideas from other people's tools, added some of my own, and re-invented the wheel slightly differently from everyone else.

Beowulf Alternatives?
(Score:5, Interesting)
by vvulfe

Before deciding on a Beowulf cluster, what different options did you explore (Cray? IBM?), and what motivated you to choose the Beowulf system?

Additionally, to what would you compare the system that you are planning to build, as far as computing power is concerned?

Greg:

The company I work for, HPTi, is actually a systems integrator, so we didn't decide to go out and build our own solution until we had checked out the competition and thought they didn't have the right answer. For the computational core of the system, Alpha and Myrinet were much more cost effective than the Cray SV-1, the IBM SP, and the SGI O2000. A more cost-effective machine gives the customer more bang for their buck.

I'd compare the system that we built to the IBM SP or the Cray T3E, as far as computing power is concerned. Both are mostly programmed using the same MPI programming model that FSL uses, which is the main programming model that we support on our clusters.

Biggest whack in the head?
(Score:5, Insightful)
by technos

Having built a few small ones, I got to know quite a bit about Linux clusters, and about programming for them. Therefore, this question has nothing to do with clusters.

What was the biggest 'WTF was I thinking' on this project? I'd imagine there was a fair amount of lateral space allowed to the designers, and freedom to design also means freedom to screw up.

Greg:

We actually didn't make that many mistakes in the design. We had some wrong guesses about when certain technology was going to be delivered -- the CentraVision filesystem (more about that below) for Linux arrived late, and we had to work with Myricom to shake out some bugs in their new interconnect hardware and software. Our biggest problem was actually getting the Ethernet/ATM switches from Fore Systems to talk to each other!

Imagine ...
(Score:4, Interesting)
by (void*)

... a beowulf of these babies - oh wait! :-)

Seriously, what was the most challenging maintenance task you had to undertake? Do you anticipate a trade-off point where the number of machines makes maintenance impossible? Do you have any pearls of wisdom for those of us just involved in the initial design of such clusters, so that maintaining them in the future is less painful?

Greg:

Hardware maintenance of the FSL machine actually isn't hard at all. If a computational node fails, we have a fault tolerance daemon which removes the failed node from the system and restarts the parallel job that was using that node. The physical maintenance of a few hundred machines actually isn't so bad; these Alphas came with three-year on-site service from Compaq. (Hi, Steve!)

More interesting than hardware maintenance is software maintenance. You can imagine how awful it would be to install and upgrade 276 machines one by one. Instead, we have an automated system that allows the system admin to simultaneously administer all the machines. We suspect that these tools could scale to thousands of nodes; after all, they're just parallel programs, like the weather applications that the machine runs.
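FSL's actual administration tools are HPTi's and aren't public, but the fan-out idea -- push one command to every node in parallel and collect the results -- can be sketched in a few lines. Everything below is illustrative: the node names and the `run_on_node` stub stand in for a real remote-execution call such as rsh or ssh.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node, command):
    """Stand-in for a remote-execution call (e.g. rsh/ssh to the node).
    Here it just echoes success, so the fan-out logic runs anywhere."""
    return (node, f"{command}: ok")

def run_everywhere(nodes, command, max_workers=32):
    """Issue the same command to every node in parallel; collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda n: run_on_node(n, command), nodes))

nodes = [f"node{i:03d}" for i in range(276)]  # FSL had 276 machines
results = run_everywhere(nodes, "uptime")
failed = [n for n, out in results.items() if not out.endswith("ok")]
print(f"{len(results)} nodes answered, {len(failed)} failed")
```

The same loop that installs a package on 276 nodes would scale to thousands; as Greg notes, it's just another parallel program.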

Question about maintenance.
(Score:5, Interesting)
by Legolas-Greenleaf

A major problem with using a beowulf cluster over a single supercomputer is that you now have to administer many computers instead of just one. Additionally, if something is failing/misbehaving/etc., you have to determine which part of the cluster is doing it. I'm interested in a] how much of a problem this is over a traditional single-machine supercomputer, b] why you chose the beowulf over a single machine considering this factor, and c] how you'll keep this problem to a minimum.

Besides that, best of luck, and I can't wait to see the final product. ;^)

Greg:

You haven't described a problem, you've described a feature.

We've provided software that allows administration of the cluster as if it were one machine, not many. This software also allows FSL to test new software on a portion of the machine, instead of taking the whole thing down. The software on the machine can also be upgraded while the machine is running, instead of requiring downtime.

Since the hardware is fairly simple, it's actually quite easy to find a misbehaving piece of hardware. And in this kind of system, a hardware failure only takes out a small portion of the machine.

For example, on an SGI O2000 or similar large shared-memory computer, a single CPU or RAM chip failure takes out the entire machine. The interconnect on an O2000 is not self-healing like the interconnect we used, Myrinet. These features make a cluster more reliable than a "single machine".

Why alpha?
(Score:5, Insightful)
by crow

Why did you choose Alpha processors for the individual nodes? Why not something cheaper with more nodes, or something more expensive with fewer nodes? What other configurations did you consider, and why weren't they as good?

Greg:

We did a lot of benchmarking before settling on Alphas for this particular system -- in general we're processor agnostic, happily using whatever gives the highest performance for each customer. We could have bought more nodes if we had gone with Intel or AMD, but the total performance would have been much lower for this customer.

The Future of Scientific Programming?
(Score:5, Interesting)
by Matt Gleeson

The raw performance of the hardware being used for scientific and parallel programming has improved by leaps and bounds in the past 10-20 years. However, most folks still program these supercomputers much the same way they did in the 80's: Unix, Fortran, explicit message passing, etc.

You have worked in research with Legion and in industry at HPTi. Do you think there is hope for some radical new programming technology that makes clusters easier for scientists to use?

If so, what do you think the cluster programming environment of tomorrow might look like?

Greg:

Actually, at the end of the 1980s, Unix was new on the supercomputing scene, and most sites still used vector machines. It's only in the 1990s that microprocessors and MPI message-passing have become big winners. And that's because of price-performance, not because they're easier to use than automatic vectorizing compilers. Ease of use for supercomputers reached its peak around 1989.

I do think there's hope for new approaches, however. One great example is the SMS software system developed at FSL. This software system is devoted to making it easy to write weather-forecasting-style codes: it involves adding just a few extra lines of source code to parallelize a previously serial program. The result can sometimes scale efficiently to hundreds of processors, can still run on only one processor, and FSL has enough experience with non-parallel-programming users to know that they can change working programs and still end up with working programs. (If you've ever heard of HPF, then this is somewhat like HPF, except it actually works.)
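SMS itself works on Fortran via comment directives, and its internals aren't described here; but the core trick -- split the grid into per-worker chunks, give each chunk "halo" copies of its neighbors' edge points, and get results identical to the serial code -- can be sketched. This is a single-process Python illustration of the decomposition pattern only; a real tool emits message-passing code.

```python
import math

def relax_serial(field, steps):
    """Serial reference: Jacobi smoothing, each interior point becomes
    the average of its two neighbors."""
    f = list(field)
    for _ in range(steps):
        f = [f[0]] + [(f[i - 1] + f[i + 1]) / 2.0
                      for i in range(1, len(f) - 1)] + [f[-1]]
    return f

def relax_decomposed(field, steps, nworkers=4):
    """The same sweep with the interior split into per-worker chunks,
    each reading one halo point on either side of its range."""
    f = list(field)
    n = len(f)
    bounds = [1 + (n - 2) * w // nworkers for w in range(nworkers + 1)]
    for _ in range(steps):
        new = list(f)
        for w in range(nworkers):
            for i in range(bounds[w], bounds[w + 1]):
                new[i] = (f[i - 1] + f[i + 1]) / 2.0  # halo reads at chunk edges
        f = new
    return f

grid = [math.sin(i / 7.0) for i in range(50)]
assert relax_decomposed(grid, 5) == relax_serial(grid, 5)  # identical results
```

The same arithmetic runs in the same order per point, so the decomposed version is bit-identical to the serial one -- which is exactly the property that lets non-parallel programmers keep modifying a working code.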

Today, the best programming environments are ones that hide message-passing, either in specialized library routines or using a preprocessor approach like SMS. By the way, Legion allows you to program distributed objects with minimal source code changes. I expect more of the same thing in the future.

My crystal ball isn't good enough to tell me what the next revolutionary change will be. I'm actually pretty happy with the evolutionary changes I've seen recently.

Job management
(Score:4, Interesting)
by gcoates

One of the weaknesses of Beowulfs seems to me to be a lack of decent job management software. How do you split the cluster's resources? Do you run one large simulation on all the CPUs, or do you run 2 or 3 jobs on 1/2 or 1/3 of the available CPUs?

Is there provision for shifting jobs onto different nodes if one of them dies during a run?

Greg:

We use the PBS batch system to manage jobs; it handles splitting the cluster resources among the jobs. At FSL, there are typically 10+ jobs running at the same time; the average job uses around 16 out of the 264 compute nodes.

If a compute node dies during a run, an HPTi-written reliability daemon marks the dead node as "off-line" and restarts the job. The user never knows there was a failure.
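The real daemon is HPTi's proprietary software, so the sketch below is purely illustrative (the function names, node count, and failure check are all made up). It captures just the policy Greg describes: assign healthy nodes, and if one turns out to be dead, mark it off-line and restart the whole job.

```python
def run_job(width, nodes, node_ok, offline):
    """Restart-on-failure loop: keep retrying the job until it gets a
    full set of healthy nodes, quarantining any node that dies."""
    attempts = 0
    while True:
        attempts += 1
        assigned = [n for n in nodes if n not in offline][:width]
        if len(assigned) < width:
            raise RuntimeError("not enough healthy nodes")
        dead = next((n for n in assigned if not node_ok(n)), None)
        if dead is None:
            return {"nodes": assigned, "attempts": attempts}
        offline.add(dead)  # remove the failed node from the pool

offline = set()
nodes = [f"node{i:03d}" for i in range(264)]  # FSL's 264 compute nodes
result = run_job(16, nodes, lambda n: n != "node007", offline)
print(result["attempts"], "attempt(s); node007 offline:", "node007" in offline)
```

A 16-node job that lands on one dead node simply runs twice; from the user's point of view it just ran.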

Weather forecasting in general.
(Score:5, Interesting)
by Matt2000

Ok, a two parter:

As I understood it weather models are a fairly hard thing to paralleliz (how the hell do you spell that?) because of the interdependence of pieces of the model. This would seem to me to make a Beowulf cluster a tough choice as its inter-CPU bandwidth is pretty low, right? And that's why I thought most weather prediction places chose high-end supercomputers, because of their custom and expensive inter-CPU I/O?

Greg:

Weather models are moderately hard to parallelize; in order to process the weather in a given location, you need to know about the weather to the north, south, east, and west. For large numbers of processors, this does require more bandwidth than fast ethernet provides, and that's why we used the Myrinet interconnect, which provides gigabit bandwidth, and which scales to thousands of nodes with high bisection bandwidth, unlike gigabit ethernet.
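That north/south/east/west dependence is also why the communication burden shrinks as each processor's subdomain grows: the work scales with a subdomain's area while the data exchanged scales with its perimeter. A quick illustration (the grid sizes here are arbitrary, not FSL's):

```python
def halo_fraction(n):
    """For an n-by-n subdomain, each step updates n*n points, but only
    the 4*n perimeter (halo) points must be exchanged with the
    north/south/east/west neighbors."""
    return (4 * n) / (n * n)

for n in (10, 100, 1000):
    print(f"{n:4d}x{n:<4d} halo/interior = {halo_fraction(n):.4f}")
```

Small subdomains (many processors per fixed grid) push that ratio up, which is where interconnect bandwidth -- and Myrinet over Fast Ethernet -- starts to matter.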

As far as disk I/O goes, yes, most clusters are fairly weak at disk I/O compared to traditional supercomputers from Cray. We are using the CentraVision filesystem from ADIC along with fibre channel RAID controllers and disks. This is more expensive than normal SCSI or IDE disks, but provides much, much greater bandwidth for our shared filesystem.

Second part: Is weather prediction getting any better? Everything I've read about dynamic systems says that prediction past a certain level of detail or timeframe is impossible. Is that true?

Greg:

The quality of a weather prediction depends on a lot of things: the quality of the input data, which has gotten a lot better with the new satellites and other data collection systems recently deployed; the speed of the computer used to run the prediction; the quality of the physics algorithms used in the program, which have to get better and better as the resolution gets finer and finer; and the expertise of the human forecaster who interprets what comes out of the machine. All of these areas have limits, and that's why forecasts have limits.

What about a dnet type client?
(Score:5, Interesting)
by x0

I am curious as to whether (no pun intended...:)) or not you have ever done any testing to see if a distributed.net type environment would be useful for your type of work?

It seems to me that there are more than a few people who are willing to donate spare CPU cycles for various projects. At a minimum, you could concentrate on the client-side binaries and not worry as much about hardware issues.

Greg:

Most supercomputers, like the FSL system, are in use 100% of the time doing real work. The biggest providers of cycles to distributed.net are desktop machines, which aren't used most of the time. Running distributed.net-type problems on the FSL cluster is a bit of a waste, since the FSL cluster has a lot more bandwidth than distributed.net needs.

---------------

In closing, I'd like to thank Slashdot for interviewing me, and I'd like to point out that I got first post on my own interview -- perhaps the only time that this will ever happen in the history of the Universe?

  • by Anonymous Coward
    Something like Distributed.net gives you lots of CPU cycles but very poor bandwidth between nodes. That's great only if your problems are highly parallelizable and individual jobs don't need to share data or talk to each other (i.e. cracking encryption keys). The problems that FSL needs to solve may not fall into that category.
  • by Anonymous Coward

    A kernel module falls under the GPL. Yes, I know, binary-only modules are allowed by convention, but it still sucks.

    You're going to be out of luck should you find a later kernel gives better performance but breaks binary compatibility. Think about proper async I/O, which is coming and can give a handy boost. If you have the budget for Fibre Channel fabrics at some point, at least look at the Global File System [globalfilesystem.org].

    BTW, if you're going to compare this cluster with a Cray T3E or IBM SP, actually compare them; don't just say they're comparable. The T3E's network is one-of-a-kind, with large bandwidth and almost no latency. (And I certainly wouldn't compare MPI implementations. Myricom's sucks and is causing no end of problems for some other projects.) No commercially available interconnect can compare on that aspect. And there are much larger SPs around and coming, like San Diego's [sdsc.edu] and the second phase of NERSC's [nersc.gov].

    Don't take this the wrong way. What you've put together is impressive, especially surviving the procurement process, but there's still a lot of work to be done to catch up with the big boys. You know that, but a good many people reading the interview may walk away with a good-we're-at-the-leading-edge-now impression. We aren't. We're at the cost-effective edge, but we can make the leading edge...

  • Apple PowerMac with PowerPC G4 (with AltiVec extensions) would have been the optimum choice.

    Oh really? Care to cite some real world benchmarks that show that?

    The G4 and its vector unit are cute and all, but there are two big problems with using them for HPC/supercomputing applications:

    1. Memory bandwidth: Memory bandwidth is probably the most important thing for single processor performance on scientific applications; a fast processor is useless if you can't keep it fed with data. The G4s use standard PC100 memory, which means that they have a theoretical peak memory bandwidth of 800 MB/s and sustained (measured) memory bandwidth in the 300-350 MB/s range. The 21264-based Alpha systems I've seen have sustained memory bandwidth in excess of 1 GB/s, which is a big part of why they scream for number crunching.
    2. Compiler support: Scientific applications are generally written in Fortran; don't bother whining that Fortran's a crappy language, because for number-crunching apps it isn't (and AFAIK FSL's main application is a big Fortran code anyway). As far as I know nobody makes a Fortran compiler that can take loops and convert them directly into AltiVec instructions -- and without that AltiVec unit, a PowerPC's floating-point performance is comparable to an Intel Pentium III at the same clock (i.e. not that great).

    The G4 could be a great scientific platform... if these two problems get fixed. Till then it's an also-ran.

    --Troy
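Troy's 800 MB/s figure is just bus width times clock. A one-liner to check it, assuming the stock PC100 configuration (64-bit data bus at 100 MHz):

```python
bus_width_bytes = 8            # PC100 SDRAM: 64-bit data bus
clock_hz = 100 * 1000 * 1000   # 100 MHz
peak_bytes_per_sec = bus_width_bytes * clock_hz
print(peak_bytes_per_sec / 1e6, "MB/s")  # -> 800.0 MB/s
```

Sustained bandwidth is always well below this theoretical peak, which is consistent with the 300-350 MB/s measured figure Troy cites.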
  • Greg said:

    We provided an easy-to-use set of administrator tools so that the Forecast Systems Lab (FSL) cluster can be administered as if it were a single computer. This is fairly difficult to do if you have a big mix of equipment, but the FSL system will never become that complex. There's already been a lot of development of programs for administering large clusters of machines; they just tend not to get used by other people. I'll admit that I'm part of that problem; I took some nice ideas from other people's tools, added some of my own, and re-invented the wheel slightly differently from everyone else.

    ---

    Any chance that HPTi will release your cluster management software under a free license?

    I think it would fill a void. There is a lack of good, free cluster management software.

  • ...if you haven't "figured" it out yet, girls are truly chaotic. It doesn't matter if you use 8/16/32/64... whatever number of bits in your calculations.

    Sometimes you can predict their behavior for a short time into the future, but at other times, or for any real distance into the future the resolution of your predictions becomes so poor as to be useless.

    Even if the quality of your input data was ideal down to the quantum level, your GF-model will still have to deal with Heisenberg, and after him, you'll have to deal with your GF-real, who's mad at you for some reason (spending so much time on that silly computer program, perhaps?).
  • We do the same sort of thing at work (analyzing genetic data): we use a regular cluster of dual-CPU P3s and GNQS queueing software to queue the many runs of the same algorithm on different sets of data. It works great, and we didn't need anything fancy like Myrinet, since inter-node communication is unnecessary for this type of application.
    It is understandable to be surly sometimes.
  • Did Greg misunderstand the last question? It looked to me like the asker wanted to know about using spare CPU cycles like d.net, except to crunch weather numbers instead of crypto keys. Still, though, it'd seem that a d.net type of cluster wouldn't be a good application here, because the forecasting models want a level of interconnect between the nodes that's not necessary for what d.net focuses on.
  • Damn! When I started typing my message, there were only 10 comments. By the time I was finished, I was #37! -SGT, unintentionally redundant...
  • Yeah that "registration" bullshit irks me, being that PBS was developed with taxpayers' money. That should be outlawed.
  • Most of the clusters are used for scientific calculations. In my field (molecular biology) there is enormous need for software to search and analyze sequence (i.e. Human genome) databases. What type of clusters and databases are the best for this task?
  • Actually, the SNIA system is a little bit of both. A cluster 'node' will be a shared-memory system with 2 to (IIRC) 32 IA64 CPUs. These nodes can then be hooked together in distributed-memory clusters. IBM, Compaq and Sun are using similar techniques with their upcoming systems (we are getting another system to go with our T3E and J90, so we all got to sit through several days of NDA vendor talks. Was pretty interesting). They all want to get the benefits of large numbers of processors without the hassles of all those CPUs talking to the same memory pool.
  • Nice interview. It looks like clustering technology is, indeed, advancing by leaps and bounds (In fact, if you substituted IA64 processors for the Alphas, you'd have just about described SGI's SNIA architecture ;-) ...
    Anyway, Greg mentioned that when a node goes down, the job running on that node is restarted. What about scheduled maintenance? Does the system software have any checkpoint/restart features? That is, the ability to write out a job's state to disk and then restart it later (on the same or other nodes).
    On our 272-node T3E, checkpoint/restart is really vital, as we have many users who run 128-node (or larger) jobs that take from 8 to 16 hours. They'd scream bloody murder if we had to restart their jobs from the beginning all the time.

    Regards,
    Derek
  • Simple: It depends on the computation/communication ratio. As Greg pointed out, many problems have higher communication needs than a distributed approach can satisfy, or in other words, you don't want to compute 2 seconds and then wait 10 seconds to have your results acknowledged and integrated by the server.

    Parallel computations can be classified into intrinsically (embarrassingly) parallel, highly parallelizable, and hard to parallelize. The first class may lend itself to distributed.net-type approaches (if client size - which is dictated by the problem/algorithm - isn't going to discourage you, that is). The second class is what we are talking about here and typically does not lend itself to distributed.net-type approaches. The third class remains in the realm of vector supercomputers, mostly.
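That computation/communication ratio can be made concrete with a crude efficiency model. The 2-seconds-compute/10-seconds-wait case is the poster's own example; the key-cracking work unit is an invented illustration.

```python
def parallel_efficiency(t_compute, t_communicate):
    """Fraction of each compute-then-exchange cycle spent on useful work."""
    return t_compute / (t_compute + t_communicate)

# Key cracking: hours of local work per tiny result upload.
print(f"{parallel_efficiency(3600.0, 0.1):.4f}")  # -> 1.0000, embarrassingly parallel
# The 'compute 2 s, wait 10 s' case from above.
print(f"{parallel_efficiency(2.0, 10.0):.4f}")    # -> 0.1667, communication-bound
```

When the ratio drops like the second case, adding donated desktop nodes buys almost nothing; that's the class of problem that stays on tightly coupled machines.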
  • In response to the In the beginning... question, Greg points out that "Compaq designed the Alpha processor". Let's give credit where it is due: Compaq bought the Alpha processor, which was originally designed by Digital Equipment Corporation. The Alpha was the successor to the highly popular VAX. I'm pretty sure that Compaq acquired the compilers from DEC as well.
  • Disclaimer: I'm talking outta ma ass.

    That little bit of personal information duly disclosed, I'm wondering how many computations could be expressed as cellular automata. The wonderful thing about them is of course that each iteration only needs to communicate with its north/south/east/west neighbors.

    If we had a client that was able to talk to other clients (and maybe we do; I don't know anything about distributed.net), then we should be able to distribute those computations too.

    The thing about a cellular automaton is that it has a global clock; all nodes count in lockstep.
    So now we'd be bottlenecking the communication with the server -- this is necessary for reporting results and whatnot. However, this could be designed against by sending updates only every n iterations, with different nodes having different offsets.

    The second thing we need to think about is recovery from a node failure. Each node could communicate its state each iteration to two neighbors. Then if it died, the server could just reassign that node, and it could ask its neighbors what its state was.

    So I have no idea if this would work. Since I'm posting it here though, I suspect it would. However, people don't do it this way, so I'm prolly wrong. Tell me why.
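For what it's worth, the recovery scheme can at least be simulated: below is a toy 1-D automaton where every cell is mirrored each tick, one node is killed mid-run, its state is restored from the mirror, and the final result matches a failure-free run. Entirely illustrative -- it says nothing about whether the bookkeeping beats just re-running the work over a real network, which is the poster's open question.

```python
def step(cells):
    """One lockstep tick of a toy 1-D automaton: each new cell is the
    XOR of its two neighbors (wrapping at the edges)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

def run_with_failure(cells, steps, fail_at, fail_node):
    """Each tick, every node's state is mirrored (here, just copied);
    when a node dies, its replacement recovers the value from a mirror."""
    cells = list(cells)
    for t in range(steps):
        mirrors = list(cells)                  # copies held by neighbors
        if t == fail_at:
            cells[fail_node] = None            # node dies; its state is gone
            cells[fail_node] = mirrors[fail_node]  # ...and is recovered
        cells = step(cells)
    return cells

start = [0, 1, 1, 0, 1, 0, 0, 1]
clean = list(start)
for _ in range(10):
    clean = step(clean)
print(run_with_failure(start, 10, fail_at=4, fail_node=3) == clean)  # -> True
```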
  • Imagine the heat these things must produce?
    --
  • I'd really like to know Greg's answer to that last question if he understood it correctly... I think the question was supposed to be "does your application lend itself well to distributed processing (a la distributed.net)," not "would you run distributed.net software on your nice cluster." Hey, Greg... you reading this? Try that one again!!!
  • try reading the other responses before you start posting. he already answered that question twice, in posts 15 and 22.

    palop
  • The other questions and answers give the answer, in that weather forecasting requires high inter-node connectivity and large data bandwidth.

    Both of these would be a problem in a widely distributed approach, which doesn't have that ability.

  • So is the heat generated by the cluster taken into account for the weather calculations? Or maybe you'd need another cluster for that? But then that one generates heat, so you'd have to have another cluster to calculate that, and then, well, I guess you'd need another one to take into account the heat generated from that one and...oh dear.
  • Warning:As opposed to flamebait, the following is really my humble opinion.

    Is it just me, or does this interview paint a little bit too rosy a picture of the project? Everything's fine, we didn't make any mistakes, managing hundreds of machines is even easier than managing a single machine, etc., etc. I have a lot of respect for the project, but I don't think Greg's being totally honest about any problems they (haven't) had. This could be a phenomenal opportunity to learn from experience, and it'd be a shame to miss out on any detail at all.

    But hey, if it really is that easy, drop me a line if you need a supercomputing cluster. Reasonable pricing, subject to availability, yada, yada, yada...

  • we had a cluster of 14 P120's and a single P133 with a Fore ASX 1000 switch generating so much heat that it would actually shut down (crash) in the summertime.

    We've got an 8-machine PII-350 cluster. One time someone came in, the room was really cold, so he started up Mersenne prime search programs on all the machines. Heated the room up to 85 or so within a half hour. Course a PII isn't the coolest chip around, either. :)

  • this morning the traffic channel had an error message up instead of the usual low-rez traffic graphics (SF Bay Area, channel 32). 'Twas an NT desktop showing. :)
  • ...for sticking around after the interview to answer even more questions and clear up misconceptions in the following discussion. Just imagine if Lars had done the same ... *grin*
  • His answer to this question implies that it would not distribute well d.net-style. If a 100 Mbit connection isn't enough bandwidth, then there's no way that a d.net-style approach would work.

  • No, I don't think I missed the point. I agree with everyone that bandwidth is an issue (although that's changing) and that realtime weather forecasting is not a good candidate for distributed processing.
    BUT, my point was that the distributed/shared processing model would be good for A TON of static batch type problem work that's being run on monolithic systems currently, but very few organizations even consider the distributed model.
    The ones who do (SETI, etc.) have more CPU cycles than they know what to do with (literally).
    Lastly, Greg - I wasn't trying to read your mind, I was just questioning your mindset. Regardless, I appreciate the time you took to answer everyone's questions so completely.
  • Elizabethan....you wouldn't catch me dead in it, even if my lady likes a man in slops. Give me Saxon stripy-pants any day! :)

    -- WhiskeyJack, whose preferred mode of funny clothing runs toward 11th century Saxon & Norse.

  • Troll? I'm hurt. Some people just don't appreciate corn-ball humor I guess.
  • Hey, point that stick the other way! Pluderhosen? I thought you said you dressed in funny clothes? Now I'm gonna have to look up 'pluderhosen'. I'm also going to have to look into hurdy gurdy since I have a childhood memory of "The Hurdy Gurdy Man" with a monkey. I don't think he was wearing pluderhosen though.

    carlos

  • "Hurdy gurdys aren't the same as organ grinders. I don't think monkeys were a part of the act in the 16th century."

    I found the following Q&A here [hurdygurdy.com].

    Isn't a hurdy-gurdy something played by a guy with a monkey?

    Yes and no. That instrument is a barrel piano or barrel organ, which only plays preprogrammed tunes. It was played by turning a crank, so it got named after the stringed instrument which preceded it. Though it is technically sophisticated, it could be used by someone with no musical talent, and was frequently used by street musicians who were more interested in catching people's attention than in providing music.

    It would have been nice if he had downloaded the dnet client and run the benchmark on it - just so us RC5 people could "ooh and aah" over the zillions of keys/sec it would do.

    Sonicboom
    TEAM GBH RC5
    http://www.gbhnet.org/rc5/
    Is it true that only people with names ending in -dahl are allowed to design big-iron computers? Sorry, crap gag, Mr. Am....errrrr...Lindahl
  • or for any real distance into the future the resolution of your predictions becomes so poor as to be useless.

    like 5 minutes would be fine... even being able to figure out what happened yesterday would be fantastic. :)

    doesn't really matter too much as I'm going to dump her soon anyway... too much to think about really. Are relationships ever really worth it?
    ------
    www.chowda.net [chowda.net]
    ------
  • that Douglas Adams is getting into weather... he's so smart!
  • Hmm... Think carefully about what you're saying.
    Say in Q3 you got 120 fps on a 1-gig machine; you wouldn't expect much more than that on a 1000-gig machine. The graphics card would be the bottleneck.

    Quake 3 tends to be best at testing one machine's config, and it is quite good at testing a network connection.

    Regards.
  • This may help.

    There are three easy-to-follow (with a little time for practice) rules for dealing with a woman you're involved with.

    1) When they tell you their problems, just listen; don't solve them for them. Women, in general, aren't looking for solutions to their problems. They 'just wanna be heard'.

    2) If they insist on being 'right', just let them. You can't win this one. Better to be 'happy' than 'right' anyway.

    3) Always remember the first 2 rules in the context of this simple fact -- "Women and reality have, at best, a passing acquaintance"

    Good luck!
  • Personally, I think it's pretty cool that NOAA is using a Linux distro instead of WIN xx; hopefully our local weather forecaster will get it a bit more accurate in the future thanks to this ;-)
    -Mr. Macx

    Moof!
  • Hey!

    I'm Allan Cox. I know all about freedom, man.

    Being an advocate of free (as in speech, not beer) software, I know all about rights.

    Let's be cool and talk about this.

    As for the "gutted bodies" you mention, I was as shocked as you to see SlashDot mess up the links I posted!

    Is there some way we can work together, for the community, to alleviate this problem?

    Thank you.

  • by 575 (195442)
    Cluster the penguin
    Mighty machine of machines
    Crunch numbers like god
  • by Anonymous Coward
    Sorry, but we can't bring you the weather today as our MTS application blue screened.

    The forecast for today was the crashing of at least two major clouds in the East. When the data was dumped, one person was heard saying, "the sky is falling!" 50 people were confirmed dead and 146 injured.
  • Greg pointed to the PBS batch system, which has a liberal license but unfortunately requires user registration before downloading.

    I'd just like to point out that there is an excellent GNU-licensed queuing system that I've used in the past called Generic NQS. It is certainly worth a go if you're building clusters. Having said that, I'd like to see a product comparison of the various queuing systems.

    I looked at Generic NQS briefly when we started working on our cluster at OSC [osc.edu]. My understanding of GNQS was that it did not deal well with multiple execution nodes and parallel jobs, whereas PBS does a pretty good job of this. You can also purchase a PBS support contract from MRJ, which is a big plus in a production HPC environment.

    --Troy
  • :-)

    Indeed - but in my case it's a transition from orbital mechanics to MP3 music....

    Not quite as easy
  • in fact they do generate a lot of heat. We had a cluster of 14 P120's and a single P133 with a Fore ASX 1000 switch generating [cwru.edu] enough heat that it would actually shut itself down (crash) in the summertime. We wound up moving it to a very well cooled server room (about 60 degrees F) and things improved there.

    i know that the HIVE also had to cool things very forcibly (air in the bottom and drawn out the top of the racks). heat is definitely a big problem.

  • On the topic of Beowulf clusters, does anybody know what actually happened to Project Übermensch [slashdot.org]? It looked like fraud right away, but I wonder whatever happened to those involved [geocities.com].

    ------------------
  • I think the question was 'whether a distributed.net system would do weather prediction work well', but the answer was that running distributed.net-type problems on the supercomputer would be a waste. It seems that there was a misunderstanding there. I'm still curious as to whether or not the weather prediction problem could benefit from a distributed.net-style 'fix'.

    Bad Mojo [rps.net]
  • I didn't get the impression that he's holding his cards close to his chest on this one. It wouldn't really make any sense to do so on slashdot either, especially after agreeing to the interview in the first place. If he just wanted to get some publicity for the project/company, a site more oriented towards managers instead of tech would have been a smarter move, but I digress...
    It sounds more like, because of his prior experience with implementing clustering technology, he was already well aware of the major pitfalls.
    That said, you're really just hooking up a bunch of machines to a network, setting up a shared data source, and configuring the parallel processing software. The design of an effective solution (given the requisite expertise) really can't be that hard. Now if he had designed a major component of the architecture (which he admits was all done by third parties) then we'd have a real case study to look at.

    Either way though, I'd like to know more about the particulars of the cluster management software. The configuration management piece sounds particularly useful for large groups of identical machines. Does anyone know if this is publicly available software, or if there is a publicly available equivalent?
    -earl

  • Of course, the reason they can do this is that they are using a *Monte Carlo* method, not the finite difference/element/volume techniques that are employed for weather prediction. Monte Carlo methods lend themselves extremely well to parallelization. However, you can't use them (not in any practical manner, anyway) to numerically solve the Navier-Stokes equations at the resolution needed for weather prediction. You have to resort to a more direct method such as finite differences, which WILL require lots of message passing because you can't uncouple the different regions of the problem domain.
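    The contrast above can be sketched in a few lines of Python (a toy illustration only, nothing like a production weather code): the Monte Carlo estimate splits into fully independent chunks that never talk to each other until the final sum, while every finite-difference timestep needs neighboring values from the previous step, so distributed nodes would have to exchange boundary data every single step.

    ```python
    import random

    # Monte Carlo: estimate pi. Each worker's samples are independent,
    # so the work splits across nodes with no communication until the end.
    def mc_pi_chunk(n, seed):
        rng = random.Random(seed)
        return sum(rng.random()**2 + rng.random()**2 <= 1.0 for _ in range(n))

    chunks = [mc_pi_chunk(50_000, seed) for seed in range(4)]  # 4 "nodes"
    pi_est = 4.0 * sum(chunks) / (4 * 50_000)

    # Finite difference: 1D heat equation. Every timestep, each interior
    # point needs its neighbors' values from the *previous* step -- split
    # this across nodes and they must exchange boundary values each step.
    def heat_step(u, alpha=0.1):
        return [u[0]] + [u[i] + alpha * (u[i-1] - 2*u[i] + u[i+1])
                         for i in range(1, len(u) - 1)] + [u[-1]]

    u = [0.0] * 50
    u[25] = 1.0                # a single hot spot
    for _ in range(100):
        u = heat_step(u)       # cannot proceed without neighbor data
    ```

    The Monte Carlo half would be happy on a distributed.net-style system; the stencil half is exactly the coupled computation Greg describes, where no region can run ahead of its neighbors.
    
    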

  • If you have the budget for Fibre Channel fabrics at some point, at least look at the Global File System.
    Our storage is Fibre Channel, and we did evaluate GFS. We found that CentraVision was superior for this customer, mainly because GFS didn't have journaling at the time. GFS may yet become quite superior.
    And there are much larger SPs around and coming, like San Diego's and the second phase of NERSC's.

    Myrinet has superior scaling when compared to the SP switch, or the T3E switch for that matter. The T3E switch did have higher bandwidths and lower latencies, but for many real supercomputing problems, Myrinet does the job for far less money.

    The biggest way that this system doesn't compare well with the T3E is in programming models -- the T3E also supports the SALC model, shared address local consistency. I hope to support that in around 12 months.

    The IBM SP doesn't support the SALC model, and has inferior per-processor bandwidth and latencies.


  • Yes and no. That instrument is a barrel piano or barrel organ, which only plays preprogrammed tunes. It was played by turning a crank, so it got named after the stringed instrument which preceded it.

    I play the original, not the modern kind. In fact, the original stringed instrument survives to the modern era in French folk tradition.

  • He thought that the question was about running Dnet on the cluster (that would be a waste), but they were actually asking why not use a Dnet "style" of parallel computing like SETI is doing....
  • "This system does run regular software without recompiling. It just doesn't use a lot of CPUs for simultaneous compute unless you change the code to use MPI. But they can access the shared storage at high speeds without any change, and they can get farmed out to separate CPUs without any change."

    This goes to my main point, though. I want a clustering solution that will take full advantage of the available processor power in a transparent way, like SMP does on a traditional multi-proc machine. Imagine being able to just plug in a unit for more processor power. Imagine being able to do it while it is running. Imagine that and you get my idea.
    I am not saying that PCI/PCI-X/InfiniBand is the answer, but I think that transparent solutions are often the best solutions.
    For another example, think about how external RAID systems work. They present the OS with virtual disks that may be composed of many physical disks. The OS does not need to know about RAID at all. It just has a disk that is big and works fast. Below the surface, physical disks (like cluster nodes) can die without the virtual disk dying. I think that this type of solution is nearly perfect. You want speed and failure handling for your disks? Use RAID. You want speed and failure handling for your computer? Someone needs to make a clustering solution like what I am talking about. We need a clustering solution that works without the user needing to know anything about clustering. It should just work.
  • Many clusters seem to run into limits from the network and storage. Is there anyone playing with some sort of PCI bridge that would let hardware in one box see hardware in another and communicate through peer-to-peer PCI? This could avoid a lot of the network and storage waits. Nodes could access each other's memory and hardware as if they were part of their own. The address mapping would be tricky, but other big multi-processor systems seem to have solved this.
    I guess I am thinking of some middle ground between traditional multi-processor systems and common clusters. I want something that operates like a multi-proc box but uses common hardware, with perhaps one special PCI card/bridge. I want a cluster that can run regular software without special compiling.
  • You missed the point.

    Weather forecasting has to be done now. You can't wait for John Q. Computer User's packet to get crunched, washed, and shipped back to NOAA. It isn't as simple as checking a number against a key or finding a pattern in a sampling of radio waves. You need a huge amount of information to determine the fate of just one data point, and every different data point needs a different huge amount of information. And they would all have to be done in order.

    You'd have to send packets the size of bowling balls to get even the smallest amount of work done. It would be inefficient for both the clients and the servers. The clients would spend their time downloading and uploading, and the servers would spend their time desperately making connections between packets, preparing the next bowling ball for the next client. If just one of the packets is lost because someone turned their computer off for the night, millions of next-generation data points will have to wait.

    In other words, when you are not talking about a very linear project where it doesn't matter what order you get things processed, distributed computing sucks.
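    A toy model of that stall (the numbers here are made up purely for illustration): suppose each data point at the next step needs points i-1, i, and i+1 from the current step. Lose one client's chunk and the blockage spreads outward by one point per step in each direction, stalling everything downstream.

    ```python
    # Hypothetical stencil dependency: next-step point i needs
    # current-step points i-1, i, i+1.
    def dependencies(i):
        return {i - 1, i, i + 1}

    GRID = range(100)
    missing = {42}        # one client went offline mid-computation

    # Points that cannot advance at the very next step:
    blocked = {i for i in GRID if dependencies(i) & missing}
    first_wave = sorted(blocked)          # [41, 42, 43]

    # The blockage widens by one point per step in each direction.
    for _ in range(9):                    # 9 more timesteps
        blocked = {i for i in GRID if dependencies(i) & blocked}

    print(first_wave)     # [41, 42, 43]
    print(len(blocked))   # 21 points stalled after 10 steps
    ```

    That geometric spread is why one slow or lost packet in a coupled simulation holds up far more than its own share of the work, unlike a keyspace search where a lost block can simply be reissued.
    
    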
  • Well, maybe you could set up Quake 3 on 100 machines running at 60 fps each and offset by 1/6000 of a second. Then, you would have 6000 fps distributed over 100 monitors...
  • [warning: vaguely informed assumption]

    The entire job need not be restarted; the portion of the job that was on the dead node is handed off to another. The job may suffer a long barrier condition until the new node catches up, but on a reasonably fine-grained cluster such as this it would be a mere matter of seconds. If the entire job were to die (controlling node, bad logic, etc.) then the lost work isn't going to be saved by a checkpointing system anyway.
  • Here's me trying to complete my Astronomy PhD while working for a silicon valley internet music company...

    Astronomy -> computers/IT is a fairly normal path from where I'm sitting :-) Having done four years of image handling and catalogue analysis for my thesis, working in computers (databases actually) isn't such a stretch.

    Cheers,

    Toby Haynes

  • Sorry, but we can't bring you the weather today as our MTS application blue screened.
  • Maybe they'd let me have a few nodes to figure out my girlfriend!
    ------
    www.chowda.net [chowda.net]
    ------
  • Do you ever plan to release a detailed set of specifications that explains the set of software installed across all nodes, the source code of any custom software that needed to be written, a network topology diagram, etc?
  • by szyzyg (7313) on Thursday June 01, 2000 @07:50AM (#1033332)
    Here's me trying to complete my Astronomy PhD while working for a silicon valley internet music company...

    Don't ask me how an astronomer ends up coding Icecast and Liveice; it just sort of happened. The best thing about this isn't the money, it's the fact that myplay.winamp.com is based on icecast - this is a week after nullsoft banned icecast servers from shoutcast.com.

    ;-)
  • by Greg Lindahl (37568) on Thursday June 01, 2000 @08:14AM (#1033333) Homepage

    A yellow dress?! Those are pluderhosen, not a dress. Pants. With pockets. Worn by manly Elizabethan men, who carry sharp pointy sticks to poke people who accuse them of wearing dresses.

    Hurdy gurdys aren't the same as organ grinders. I don't think monkeys were a part of the act in the 16th century.

  • by Greg Lindahl (37568) on Thursday June 01, 2000 @07:27AM (#1033334) Homepage
    No. As I pointed out, weather codes require a fair amount of bandwidth, much more than that's available in a distributed.net situation. In addition, most weather codes assume that they're running on a uniform machine, so they'd have load-balancing problems if run on a distributed.net type system.

  • by Greg Lindahl (37568) on Thursday June 01, 2000 @08:21AM (#1033335) Homepage

    A PCI bridge would itself pretty much be a network. Myrinet is a great interconnect and is probably much better than any big PCI bridge that you could come out with. The new InfiniBand specification allows bridging of the successor to PCI, but I suspect Myrinet's successor is going to be a better interconnect by the time InfiniBand machines are available.

    This system does run regular software without recompiling. It just doesn't use a lot of CPUs for simultaneous compute unless you change the code to use MPI. But they can access the shared storage at high speeds without any change, and they can get farmed out to separate CPUs without any change.

  • by Greg Lindahl (37568) on Thursday June 01, 2000 @08:28AM (#1033336) Homepage

    The first answer to your question is that we never have scheduled maintenance. Since the machine isn't monolithic, we can repair most parts while it's live.

    This machine is nothing like the SGI SN-IA architecture. SN-IA is still shared memory, and has a significantly faster network (which is far less scalable and far more expensive). Whenever you share memory, you share failures.

    We did provide a user-level checkpoint feature to FSL, but it requires the user to modify their program. Kernel-level checkpoint is on our list of things to do. It's not that hard for single processes -- Condor does it, for example -- but it's fairly tough for programs that use MPI and run in parallel.

  • by Greg Lindahl (37568) on Thursday June 01, 2000 @08:31AM (#1033337) Homepage

    CentraVision is a traditional proprietary product.

    All they've released for Linux so far is a client, which is a kernel module. I'm not sure if they're going to release the metadata server for Linux.

  • by Greg Lindahl (37568) on Thursday June 01, 2000 @07:42AM (#1033338) Homepage

    They tell me that the first external users (20% of the machine) are going to be ocean modelers. But I think the FSL guys would disagree that it's far more interesting... all of these guys are pretty fanatical about what they do!

  • by Durinia (72612) on Thursday June 01, 2000 @01:23PM (#1033339)
    Clank! You took a good shot, but unfortunately the network latency gods put a lid on the basket.

    The only problems that the distributed.net/SETI@Home model works for have a grand total of two (2) communications per node: transmission of the original data set (sometimes just a range of data) and, after a long computation, a return transmission of the results. You can send a large block to each processor and then let it compute for a long time without any communication.

    Let's do a little math. The ping time from my workstation to the distributed.net server varies around 80 ms (round trip). I'm going to exaggerate (a lot!) and say it has a 500 MHz processor. If it can make a mathematical calculation every 2 cycles or so, that's one every 4 ns.

    A good example here may be a scientific time-dependent model, where calculations are made for a time segment, and then the processors all report their results to the other processors as needed for their calculations in the next segment. So, the 80 ms of LATENCY (not counting the actual transmission of the data!) is 20,000,000 times the time for one ALU op. It would be nice if we could compute during this huge latency time, but we can't without the data produced by the neighboring nodes. Even if the program does 1,000 ops per segment, there is at least a 20,000x overhead of communication over the entire program. Then take into account that data has to be sent back to the node by several others (which aren't synchronized) and the complexity of routing all the results to the right nodes, and you can see that it might just be faster to forget the network and let my workstation run it alone.

    The reason people still buy "big iron" is not because they've never heard of distributed processing. The Cray T3E (my favorite!) is really just a distributed computer itself. It uses commodity 300-600 MHz Alpha processors. People like it because its interconnect latency is over 10,000 times lower, and it does all the routing for you. Not to mention it comes with optimized libraries and great service.
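    Making the back-of-envelope arithmetic above explicit (the 80 ms ping and the 500 MHz, one-op-per-4-ns processor are the commenter's assumptions, not measurements):

    ```python
    # Commenter's assumed figures, not measured values.
    ping_rtt = 80e-3        # 80 ms round trip to a distributed.net-style server
    op_time = 2 / 500e6     # one ALU op every 2 cycles at 500 MHz -> 4 ns

    # How many ALU ops the CPU could have done while waiting on one RTT:
    latency_in_ops = ping_rtt / op_time

    ops_per_segment = 1_000
    overhead = latency_in_ops / ops_per_segment

    print(f"{latency_in_ops:,.0f} ops per round trip")   # 20,000,000
    print(f"{overhead:,.0f}x communication overhead")    # 20,000x
    ```

    Even granting a generous 1,000 ops of useful work per segment, the node spends four orders of magnitude more time waiting than computing, which is the whole argument for a low-latency interconnect like the T3E's or Myrinet.
    
    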

  • by RobL3 (126711) on Thursday June 01, 2000 @07:21AM (#1033340)
    The question:
    I am curious as to whether (no pun intended...:)) or not you have ever done any testing to see if a distributed.net type environment would be useful for your type of work?
    The answer:
    SNIP
    Running distributed.net type problems on the FSL cluster is a bit of a waste...
    SNIP

    I think Greg's answer to this question, i.e. not understanding that the question was about running simulations outside of his cluster, is indicative of the "we've got to run our jobs on something that sits in a big air-conditioned room on our site" mentality. Once people start to realize the potential that gazillions of unused processor cycles represent, we'll start to see incentives and advertising designed to lure people into donating their cycles to one project or another.
    Of course then, only the cool, "pretty" projects will get attention.
  • I think Greg's answer to this question, i.e. not understanding that the question was about running simulations outside of his cluster, is indicative of the "we've got to run our jobs on somthing that sits in a big air-conditioned room on our site" mentality.
    You must be a great mind-reader.

    No, I don't have a "big air-conditioned room" mentality. In fact, Legion is capable of harvesting unused processor cycles in a much more sophisticated fashion than distributed.net. However, weather forecasting needs too much bandwidth. You have to consider problems on a case-by-case basis for such a low-bandwidth system; most traditional supercomputer problems aren't appropriate.

    This doesn't mean I think distributed.net isn't cool -- it's very cool, light-weight, and it gets its job done. It shouldn't be a surprise that it can't solve every problem.
