Linux Software

"Cplant" Parallel Computing Tool 77

SEWilco writes "Sandia National Laboratories has released its "Cplant" massively parallel processing software. This is related to the software used in their ASCI Red supercomputer, and eliminates several scalability problems to allow hundreds of nodes for algorithms which can't be parallelized for Beowulf-type clusters. This is now number 2 on the TOP500 supercomputer list. The press release refers to "licensing terms", but the license is the GPL. We discussed this in a Linux clusters discussion and several earlier reports as ASCI Red grew."
  • by Anonymous Coward on Friday June 08, 2001 @08:01AM (#165893)
    A standalone non-clustered one of these.
  • by Kostya ( 1146 ) on Friday June 08, 2001 @08:07AM (#165894) Homepage Journal
    ... have you heard of Nuclear Weapons Testing?

    In order to simulate new weapons configurations, it takes an awful lot of computing power. Just try to imagine all the factors that have to be tracked and taken into account in order to produce an accurate and thorough simulation. Simulated tests have a lot of advantages, obvious (no radiation) and non-obvious (costs).

    You've been reading YRO too much. Trust me. The government has a lot better uses to put its supercomputers to than breaking our SSH and PGP keys--like big guns and bombs for laying waste to the known world!

  • Well, there's the concept of Beowulf clusters (free/open source software plus dedicated commodity hardware), and then there's the Beowulf software from Scyld. Just because someone's not using Bproc doesn't mean it's not a Beowulf cluster, and just because it works for your 10 node cluster doesn't mean it scales to the 1400 nodes in Cplant. I wouldn't go so far as to call Bproc a standard, either. MPI is certainly the standard for message passing (and my understanding is that the Cplant stuff supports MPI), but Bproc? Bproc isn't that much different from arrayd on SGI systems or RMS on the Compaq Alphaserver SC... or yod on Cplant, for that matter. It's just a way to start and control processes.

    The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.

    You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.

    (Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)

    --Troy
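
    For anyone who hasn't seen MPI, here's roughly what portable message passing looks like at the application level. This is just a generic illustrative sketch (not Cplant, Scyld, or Sandia code), and the "mpicc" compile command is an assumption about your local MPI install:

        /* Minimal MPI sketch: rank 0 sends one integer to rank 1.
           Build with something like "mpicc hello.c" (assumed wrapper). */
        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            int rank, size, value = 42;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which node am I?      */
            MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many nodes total? */

            if (rank == 0 && size > 1)
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                printf("rank 1 received %d\n", value);
            }

            MPI_Finalize();
            return 0;
        }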
  • First, "bash" is not Linux. It's a GNU utility. A Linux machine need not have "bash" installed. Linux is only the kernel, and, presumably, could be used as the basis for a "smart bomb", although a real-time/low-latency OS would probably be more appropriate.

    Both Linux and GNU bash are licensed under the GPL as "free software." Stallman has stated [gnu.org] that free software stems from "Freedom Zero", namely "the freedom to run the program for any purpose, any way you like."

    To my knowledge, use restrictions would violate both the GPL [gnu.org] and Open Source Initiative's Open Source Definition [opensource.org].

  • by roystgnr ( 4015 ) <roy&stogners,org> on Friday June 08, 2001 @11:28AM (#165897) Homepage
    I've never worked in one of the supercomputer-happy departments at Sandia, but here's a few applications I've talked with others about:

    Nuclear simulation: This is the big one. With popular opinion and world politics the way they are, it's likely we won't set off another thermonuclear detonation for a very long time. Unfortunately, we have a few thousand warheads that are aging and decaying, and we want to be sure (and make everyone else sure) that our final deterrent isn't turning into duds under our noses. This is pretty much the sole official justification for the national labs' supercomputing programs.

    More nuclear simulation: After New Mexico's devastating summer fires last year, they stepped up research on the effects of fire on stored warheads (no, they won't go nuclear, but cleanup could still be awful). Simulating something that turbulent isn't easy, but it'll be nice to know if there are any further precautions Los Alamos needs to take.

    Computational Fluid Dynamics - refining supercomputer code to cut down on the need for even more expensive wind tunnel time. Military and civilian uses: the two I saw were hypersonic parachute unfolding for bombers and drag-reducing plastic attachments for big rig trucks.

    Impact testing - this is one of the big commercial apps of supercomputers; I don't know how much of it they're doing at Sandia right now. You can make vehicles a lot more crash safe cheaply if you can virtually destroy them (and refine their frame designs) hundreds of times before actually mangling hardware.

    As for crypto breaking... no. For example, the Teraflops has 9 or 10 thousand processors (just upgraded to 3xx MHz Xeons, I'm told, since those are the fastest things that could be massaged into the old PPro sockets). That's on the order of how many distributed.net computers brute-forced 64-bit encryption, so for 128-bit encryption you'd just need 16 quintillion more Teraflops supercomputers (see the arithmetic sketched below). Your PGP key is infinitely more likely to be snagged by some hacker's trojan and keylogger than it is by a government supercomputer.
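
    (That back-of-the-envelope arithmetic is easy to check yourself; this toy program just computes the 2^(128-64) scaling factor and is not tied to any particular machine:)

        /* Going from a 64-bit to a 128-bit keyspace multiplies the
           brute-force work by 2^64 -- on the order of 10^19, i.e.
           quintillions of extra machines of the same class. */
        #include <stdio.h>
        #include <math.h>

        int main(void)
        {
            double factor = pow(2.0, 128 - 64);   /* 2^64 */
            printf("extra work factor: %.3g\n", factor);
            return 0;
        }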
  • eCos is probably the best bet. And I think it's an open-source kernel, too (could be wrong, though).

    However, you are also missing the benefits of COTS (commodity-off-the-shelf). Although I doubt bombs use too many COTS parts, it is cheaper to use a known-working thing than it is to build something custom. Even if a bomb doesn't need megs of RAM (Linux does _not_ require an HD, just a flash card), it might be worth it to go ahead and use it to save the expense of writing a custom solution.
  • [...] "self-made"? Numbers 215, 396, and 413 [...]

    Heh, I've seen and worked with 215. Pretty non-awe-inducing group of 96 dual Pentium III boxes using a Myrinet interconnect. You can find some pictures at the Kepler homepage [uni-tuebingen.de]. It's amazing what good lighting and a good photographer can do :-) That aside, the system itself is pretty cool (and damn fast, even if there's some contention among its users for computing time).

  • They do nuclear weapons detonation simulations and modelling. They use the computing model as a substitute for REAL testing.
  • It depends on what you mean by "simulating nuclear explosions". Sandia is using ASCI Red to model the effects of a nuclear blast (like how far away from a blast of X kilotons do the buildings get knocked down), but they are legally prohibited from modeling the actual reactions that occur within a nuclear device, the part we call the "physics package." Los Alamos and Livermore are using their ASCI machines to do these calculations and we need all the horsepower we can get to model the phenomena accurately enough to certify our aging stockpile without nuclear testing.
  • Something you might find interesting: at one time the world's largest repository of free and open-source software was at the (then) Army Ballistic Research Laboratory, open to anyone who could FTP there. It was an important resource during the 1980s when the free software community, a community that included the late Mike Muuss [matrix.net] of the BRL, was taking shape.

    This is only one example of many from that era. (I hope it's not too trivial to point out that the Internet itself originated with the "War Pigs.") Had the GPL included an anti-military clause, there is a good chance that much of GNU would not exist -- if the movement had happened at all. Don't forget, the "War Pigs" paid for Stallman's ARPANET connection (via MIT, which was on the ARPANET by virtue of being a major military contractor).

    I'm not attempting to justify the military, here, just pointing out that blindly excluding them may not be the best of ideas...

    -Ed
  • Nuclear testing's been mentioned. Other massive computing efforts include theoretical protein chemistry, astronomy, particle accelerator analysis, weather simulation, materials research (more theoretical chemistry). Getting good excited-state properties of molecular systems scales as the 12th power of your basis set size. Doubling the size of the system you're looking at increases computational effort by a factor of about 4000 (2^12 = 4096). It's like, real easy to chew up unlimited amounts of computer power in computational science.
  • An important task for these supercomputers is stockpile stewardship, which means ensuring that nuclear weapons are safe and reliable as their internal components age and degrade.
  • by DLG ( 14172 ) on Friday June 08, 2001 @08:10AM (#165905)
    This sounds good for Linux, now in the number 2 most powerful computer in the world: another sign that Linux is on the rise and not "dead".

    I don't see a reference to Linux in the description of this supercomputer. I see the following link to the specs [sandia.gov] which describes the OS as:

    The operating system used for the Service, I/O, and System Partitions is Intel's distributed version of UNIX (POSIX 1003.1 and XPG3, AT&T System V.3 and 4.3 BSD Reno VFS) developed for the Paragon XP/S Supercomputer. The Paragon OS presents a single system image to the user. This means that users see the system as a single UNIX machine despite the fact that the operating system is running on a distributed collection of nodes.
    As much as I like to push Linux (I use it as my desktop), it just isn't correct to say it is the #2 machine on the Top 500 list.
  • Yes, lots of machines do modeling and analysis well. The problem is that the questions the analysts are most interested in require more and more computing resources. When someone is testing a nuclear weapons simulation, with over 100 million degrees of freedom, it can take the entire resources of a Cplant or ASCI Red machine a couple of days just for the first seconds of the event. Speaking from my experience, I've never heard of any of the processing power of either the Cplant or ASCI Red machine being used for something as mundane and non-engineering-related as cracking codes to see if Aunt Bessie thinks that Communist men are cuter than Democratic men. These people have much more fun things to do.
  • This is related to what is running on ASCI Red. The pages suggest that this is not exactly what is running on ASCI Red.
  • The press release refers to "licensing terms", but the license is the GPL.
    What in the world does this mean? The GPL was a license and had several terms last time I checked.

    The press release actually mentions that it is necessary to agree to some licensing terms before downloading. This unnamed license turns out to be the GPL, which many of us know.

  • pretty well, the only problem is the bouncing cards at the end fly by WAY too fast.
  • Remember that the person I replied to said this:
    If you think the Internet doesn't have scaling problems, where have you been the past 10 years? Under a rock?!
    I'm just defending the assertions I have made.
    --
  • Sure, TCP/IP will be slower than Myrinet, but that's a speed problem, not a scalability problem, right? You can have an ethernet switch which does the same function as a Myrinet switch, just slower. Again, it's speed, not scalability.
    --
  • TCP/IP has big problems when you have thousands of nodes, each filling the pipe as fast as possible.
    And the relevance to clusters is...?

    If clusters were limited to "thousands of nodes", I promise you nobody would notice.

    The main bottleneck with MP machines is the message passing. It's rarely the memory or CPU speed, it's how fast one node can talk to another.
    You're talking about clusters of uniprocessors. I don't research this area myself, but I have been told that clusters of 4-way SMP machines saturate the memory bus before the network. That's because you have four CPUs and a Myrinet card all competing for the same bus, not to mention other DMA devices.

    Sure, if you use uniprocessors in your cluster, then the network becomes the bottleneck. No big insight there. And, it has nothing to do with TCP/IP.

    If you think the Internet doesn't have scaling problems, where have you been the past 10 years? Under a rock?!
    You mean to say the TCP/IP scaling problems in clusters are the same as in the Internet? I think not.

    Look, I just think a claim like "TCP/IP has fundamental scaling problems" could have been phrased better, because it seems ridiculous at first glance. I'm sure their claim is valid, whatever it is, but clearly there is ample evidence that TCP/IP itself scales to far larger systems than any cluster.
    --

  • by p3d0 ( 42270 ) on Friday June 08, 2001 @10:30AM (#165913)
    I like where they say TCP/IP has inherent scalability limitations. Have they heard of the Internet?
    --
  • The original poster wasn't just talking about using the code "in bombs." The primary thing the Sandia people are doing with this code is simulating nuclear explosions. The code doesn't have to actually be in the bomb to have helped kill the people. It could be used on the supercomputer that ran the simulations that allowed the designing of the bomb.

    Also, I worked in defense contracting for Lockheed for the Aegis defense system. Of course the guided missiles (like Tomahawks) don't run anything even remotely similar to what we would call an "OS". There aren't different levels of apps running on top of each other, there's just dedicated circuitry and hardware designed to do what the missile needs to do. Running something like Linux or even QNX would be utterly asinine.


  • With about 30 machines dedicated to "research", surely one of those already performs the job well. FYI, the US government has 19 classified machines as well, most likely NSA and military machines. It can't all be for so-called nuclear research, at least in my eyes.
  • by joq ( 63625 ) on Friday June 08, 2001 @08:01AM (#165916) Homepage Journal

    I always wondered what Big Brother does with these supercomputers; the FAQ says little about what tasks they perform, and I doubt you would need that much supercomputing for research.

    So the question is, just what is Sandia doing with this? Making super comps to crack codes perhaps for the NSA? Aside from that, maybe some sole company should look into recovering the hundreds of obsolete PCs that are being tossed and create a super comp to test with and perhaps create the ultimate crypto algorithm. (Yes, I know, slightly off topic.)

    Does anyone have any idea as to what these machines are truly doing?
  • is it now THE GPL, as in THE Ohio State University?
  • I would guess that many if not most of the applications for these machines are benign modelling problems (like astrodynamics and meteorology). The reason that so much computing has to be thrown at these problems is the accuracy that is required to answer some of these questions and the techniques involved.

    For example, suppose that you are doing remote sensing with a satellite and need to calculate your sensor's position to within a 2-inch box at all points along the data track. This requires you to have an accurate track for your spacecraft along all of these points. To do this, you have to have ridiculously accurate gravity models for the earth, moon, and sun, as well as models for all the other perturbative effects (atmospheric effects, solar radiation pressure, etc.). Most of these things are modelled as giant matrices. The solutions then involve manipulating all of these matrices with respect to your known positions umpty-squat-gagillion times (technical term). This is what a big vector-processing machine buys you. Doing all of these inversions and other operations takes just a few machine instructions vs. giant loops.

    I'm sure that the reason for so many of these machines is that similar techniques show up in so many problem domains (fluid dynamics, etc.).
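
    To make "manipulating giant matrices" concrete, here is the sort of building-block call a tuned BLAS library provides. This is a generic CBLAS sketch with invented sizes, not the poster's actual code:

        /* Dense matrix-matrix multiply C = A*B via CBLAS dgemm -- the kind
           of kernel a vector machine or tuned BLAS runs far faster than a
           hand-written triple loop.  The size n is arbitrary. */
        #include <stdlib.h>
        #include <cblas.h>

        int main(void)
        {
            int n = 1000;
            double *A = calloc((size_t)n * n, sizeof(double));
            double *B = calloc((size_t)n * n, sizeof(double));
            double *C = calloc((size_t)n * n, sizeof(double));

            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        n, n, n,        /* M, N, K       */
                        1.0, A, n,      /* alpha, A, lda */
                        B, n,           /* B, ldb        */
                        0.0, C, n);     /* beta, C, ldc  */

            free(A); free(B); free(C);
            return 0;
        }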

  • It's a totally different environment. This runs on commodity hardware with network interfaces (in this case, Myrinet), whereas Unicos/mk runs on the mostly-custom (except for processor and memory) Cray T3E. The inter-node interface for the T3E is totally different - it uses E-registers, which essentially allow a machine to do all of its communication through loads and stores to memory instead of having to context switch and send network packets. So, the two are like apples and oranges - not very comparable because of their environments. To the first order, unless you've got some pretty dull software people, the scalability of a system is 99% hardware dependent.

    That being said, as far as this hardware - a commodity cluster system - goes, it seems that this is a pretty decent set of tools and optimizations.

  • by Durinia ( 72612 ) on Friday June 08, 2001 @08:52AM (#165920)
    Even while under SGI, the hard-core CRAY lines kept the name (like the T3E, T90, and SV1).

    Now that they've been "un-bought", Cray gets to put its name back on the list as an independent company.

    As a side note, SGI sold the Cray division because it was "unprofitable" and a fiscal liability. Yet, Cray Inc. made a profit last quarter, and SGI has lost about $2/share for the last several quarters in a row, and just laid off another 1/3 of their workforce. Oh, and Cray's stock price is higher. Go fig. :)

  • by Durinia ( 72612 ) on Friday June 08, 2001 @08:26AM (#165921)
    And, I quote: "This is related to the software used in their ASCI Red supercomputer, and eliminates several scalability problems to allow hundreds of nodes for algorithms which can't be parallelized for Beowulf-type clusters." This is a pretty big overstatement. From exploring their site, it seems pretty clear that, while they made a few scalability enhancements (like cutting out the TCP/IP stuff, etc.), their main goal was to make large commodity cluster systems (Beowulf or not) more usable. They made a lot of good progress in this area by porting over several tools from their learning experience with ASCI Red. I also found it funny that their "commodity machine" had a custom-made Myrinet switch. I think it must be hard to resist the "if we don't have it, we'll build it" mentality of a National Lab. Very cool. Oh, and I'm not sure when the source was put up, but from what I can tell, the site hasn't been updated in almost a year.
  • I'm afraid that I'm not really following you.

    If you are saying that the word "the" in front of GPL is unnecessary, I would disagree.

    Read both of the following, aloud:

    "This software is licensed under GNU Public License."

    "This software is licensed under the GNU Public License."

    -Peter

  • The press release refers to "licensing terms", but the license is the GPL.

    What in the world does this mean? The GPL was a license and had several terms last time I checked.

    -Peter

  • Why bother?

    Use ATLAS (http://www.netlib.org, platform self-tuning BLAS and LAPACK) and FFTW (run-time algorithm-optimized Fourier transforms).

    Both are portable, and both approach or beat the performance of proprietary hand-tuned assembly libraries.

    But don't take my word for it. MATLAB (http://www.mathworks.com) now uses the ATLAS implementation of LAPACK / BLAS and MIT's FFTW in their computational core.

    I've used the ASCI Red BLAS and FFT stuff. I think the reason that it is not freely distributed is that it was developed in collaboration with Intel employees. However, the ASCI Red libraries always had a disclaimer to the effect that if you had a compelling reason to have the source, something could be worked out.

    Check out how FFTW works. It is one of the few things I've seen that I would actually consider clever. Basically, FFTW designs an algorithm at run-time which is optimal for your cache size, register file depth, memory bandwidth and transform type; power-of-two sizes are not required. What really impressed me is that FFTW's codelet generator stumbled across a couple of hitherto unknown algorithms with reduced flops for computing strange-sized FFTs.

    ATLAS is pretty clever too. For kicks, run the installation and watch it tune the kernels. The routines for portably diagnosing FPU register size, FPU MAC performance and cache sizes are useful to have around.

    Kevin
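
    If you want to see what "design an algorithm at run-time" looks like from the caller's side, here is a minimal sketch. It uses the newer FFTW 3-style API for readability (the FFTW 2 interface current at the time used fftw_create_plan/fftw_one instead), and the transform size is arbitrary:

        /* Minimal FFTW sketch: the planner (FFTW_MEASURE) times candidate
           codelets and picks the fastest plan for this machine before any
           transform is executed. */
        #include <fftw3.h>

        int main(void)
        {
            int N = 1024;                                 /* arbitrary size */
            fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
            fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);

            fftw_plan p = fftw_plan_dft_1d(N, in, out,
                                           FFTW_FORWARD, FFTW_MEASURE);
            fftw_execute(p);                              /* run the FFT    */

            fftw_destroy_plan(p);
            fftw_free(in);
            fftw_free(out);
            return 0;
        }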
  • Now, I usually don't nitpick, BUT I am just too proud for this: you missed number 126.

    528 Pentium III-800 computers, running Red Hat Linux (well, usually somewhat fewer, since a couple are out of order at any time). Just normal mini-towers in long rows of shelves. And a huge network switch. Normally, the info can be found here [tu-chemnitz.de], but since Murphy lives, it seems like our webserver is down right now; here's another picture [chemnitz.ihk.de].

    Well, back to abusing this machine.

  • I think you guys are mixing terms with the word "trivial". There's algorithm-trivial and implementation-trivial.

    Cracking PGP *is* trivial, in the sense that the algorithm to do it is published, understood, and widely believed to be the best we're going to have for a long time. Effectively, it's at a standstill as far as evolution of real implementations goes. And to do it takes more computing power than we've got. Algorithm-trivial, but not implementation-trivial.

    Now, the number theory and algorithm research going into crypto is *very* non-trivial, but this has yet to trickle down to the implementations in a meaningful way.

    Massive simulations such as a nuclear detonation are still an open-ended problem at the algorithm level. We've got some pretty good ideas about ways to do it that might reflect reality, but different angles are published regularly. Which degrees of freedom to play with, shrinking residual versus orthogonal residual convergence, blah blah blah... *And*, as an added bonus, these things can be investigated and the results evaluated *before* our sun becomes a neutron star (unlike, say, cracking RSA).

    This is one we'd call implementation-trivial, but not algorithm-trivial.

    We now return you to your regularly scheduled slashdot fodder... :)

  • I didn't realize the Myrinet switches were custom made... maybe the lengths of the cables were custom. There are other gigabit switches that didn't exist when they were first building this machine, which might be better suited these days.

    The experience for this was related to SunMOS, an OS for the Paragon and maybe the nCUBE.

    I'm not sure when the source was put up, but from what I can tell, the site hasn't been updated in almost a year
    I sent in this: 2001-04-20 21:15:46 Sandia Labs Cplant software under GPL (articles,linux) (rejected) a while back. So, that web page was updated within 6 weeks. This was sent in about 3 days after the announcement of GPL went up.

  • The main bottleneck with MP machines is the message passing. It's rarely the memory or CPU speed, it's how fast one node can talk to another. TCP/IP has big problems when you have thousands of nodes, each filling the pipe as fast as possible. It's not that the messages won't get there, it's that the latency skyrockets once the pipes are near saturation. This behavior will prevent your code from reaching its theoretical max.

    So, my question to you is, "Have YOU heard of the internet?" Because, if you think the Internet doesn't have scaling problems, where have you been the past 10 years? Under a rock?!
  • Some of the particular issues surrounding Sandia's Cplant project were the subject of a previous story on Slashdot [slashdot.org].

    AFAICT, the upshot is all the tweaking that must be done to coax higher performance on numerically intensive codes with that many processors.

    As many in the numerical simulation community already know, message-passing codes abuse a network in a way that web browsers do not, demanding lower latency and higher bandwidth than can be provided by plain ole 10/100 Mb Ethernet (at least for large numbers of high-SPECfp processors with any reasonable memory speed).

    The existence of Linux open source code facilitates the creation of their Portals layer that sits underneath MPI and above the Myrinet hardware on these Alpha machines.
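
    The latency sensitivity described above is usually measured with a ping-pong test. A generic MPI sketch of the idea (not Cplant's or anyone's actual benchmark; the message size and iteration count are made up):

        /* MPI ping-pong: ranks 0 and 1 bounce a small message back and
           forth and report the average one-way latency. */
        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            int rank, i, iters = 1000;
            char buf[8];
            MPI_Status st;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            double t0 = MPI_Wtime();
            for (i = 0; i < iters; i++) {
                if (rank == 0) {
                    MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
                } else if (rank == 1) {
                    MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                    MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t1 = MPI_Wtime();

            if (rank == 0)
                printf("avg one-way latency: %g microseconds\n",
                       (t1 - t0) / (2.0 * iters) * 1e6);

            MPI_Finalize();
            return 0;
        }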

  • Most consumer-grade crypto is pretty trivial compared to these problems
    If you're talking about PGP, as you seem to be, you're wrong. A machine capable of brute-forcing a good 128-bit block cipher in reasonable time does not exist. Have a look at http://axion.physics.ubc.ca/pgp-attack.html [physics.ubc.ca].
  • by acacia ( 101223 ) on Friday June 08, 2001 @09:35AM (#165931)
    This is great stuff. This is similar to what the commercial applications Ab Initio and Torrent Orchestrate do. What this software does is provide a standardized, consistent worldview of all the resources in your parallel system. It should allow you to partition data, pass out processes to nodes, and handle internode communication between them transparently.

    This is an important software release, because it is a step away from hand-rolled, low-level message passing, toward a standardized means of communication between nodes at a much higher level of abstraction. Think of it this way: you don't want to have to write all of the control logic for processes that are divvied out to the nodes when you are writing an application. Instead, you provide base classes of behaviour, distribute them to all of the nodes, and then inherit and instantiate specialized behaviors for _EACH JOB_ from a control partition.

    This provides a nice level of abstraction for the programmer. It also puts Linux MPP systems in the same class as your IBM SP/2, NCR/Teradata, and Clustered Solaris systems, among others. I think that I will be doing some work on enhancing this software!

    Oh, and yes, I do professional parallel programming for a living. :-)

    Cheers and kudos to Sandia for releasing this as GPL!!!
  • With about 30 machines dedicated to "research" surely one of those already perform the job well.

    With all due respect, I don't think you really know what you're talking about. The amount of data involved in these simulations is simply mind-boggling. And no matter how many points you simulate, and how many time-steps it runs over, it's still only a simulation. The way to get a more accurate simulation is to increase the number of data elements being simulated and decrease the amount of time between 'steps'.

    Basically, you might be running a sim for weeks (or longer) to be able to accurately simulate the first half-second of a nuclear explosion. Trust me when I tell you that they want the biggest, fastest computer your money can buy :) It's never enough.

    Most consumer-grade crypto is pretty trivial compared to these problems, and the (perceived) need to simulate nuclear explosions is probably greater than the need to break your PGP code and see where the last place you boffed the boss's wife was.

    Incidentally, contrary to what some other people were saying, these big beasts are very useful to certain types of research outside of cryptography. Astronomy, for instance.
  • Not Found
    The requested URL /cplant/doc/man/yod.html was not found on this server.


    Treatment, not tyranny. End the drug war and free our American POWs.
  • Will he post any changes? Will the Russkies? Will the Chinese Communists? Enquiring minds want to know. :)
  • Unfortunately, the trend in software development has been to use the capabilities of the hardware to mask the lousy performance of the software. It's often cheaper to upgrade your hardware than it is to pay a team of programmers for 6 months to make the software more efficient. The attitude in business is all too often that of "slap something together and release it, so you can get on to the next project". Most internally-developed software (and a large percentage of commercial apps) is in perpetual beta. There's never time to do it right, but there's always time to do it over.
  • Linux is great, but let's get our facts straight...

    Just because an application uses a cluster does not automatically mean that it's running on a stack of commodity PCs running Linux in a Beowulf-style cluster interconnected via GigE or Myrinet. It also doesn't automatically mean that the application isn't capable of running well on a single large machine.

    For example, NOAA recently put together a cluster for computing weather models for the upcoming hurricane season. Their cluster is actually 8 machines, each a 128 CPU SGI Origin 3800 running IRIX 6.5. The 8 machines are interconnected through a thick mesh of GSN (gigabyte system network, a modern version of HiPPI that can transfer 800 megabytes/sec per link). The messaging protocols used are a mixture of shmem, OpenMP, and MPI.

    Linux is great and all, but ASCI Red uses Intel's Paragon OS, a derivative of Unix.
  • Linux isn't running on the second-fastest supercomputer either, Paragon OS is.
  • Heh, I can just see a good trivia question...

    Who is the "Cray woman" on the upper right-hand side of most cray.com pages?

    http://www.cray.com/products/index.html [cray.com]
  • by green pizza ( 159161 ) on Friday June 08, 2001 @09:45AM (#165941) Homepage
    Cray, Inc. [cray.com] is much more alive than their former owner, SGI...

    Lots of new products and they're even making a profit.

    http://www.cray.com/products/systems [cray.com].

    Nice variety of systems, from their own SV1/SV1ex/SV2 machines, to Linux clusters, to massively parallel Alphas, to NEC vector-based machines, and more.
  • This, the 10 Tflop, 8-10 (a majority) of the top 15 of the Top500 list, the Cray T3Es (and some other Cray stuff)?

    Let's see: Alphas. 'Nuff said.
    Alphas aren't anywhere near dead, as many people have said they are, and neither is Cray.

  • High Performance Message Passing: In order to support application-level communication, such as MPI, as well as system-level communication, such as that which occurs between the compute node daemons and the launcher, a flexible, high-performance data movement layer is needed. Much of the work on the Intel MPP machines focused on providing a communication layer that could deliver the highest possible percentage of network resources to these applications. The result of this work are Portals, which are the data movement layer supported on the Intel TFLOPS machine.

    don't know why... but does anyone remember in Pulp Fiction when the black guy (Jules) is threatening that guy with the gun and says "English, motherf*cker, do you speak it?" and then almost blows his brains out? Sometimes that's how i feel.


  • With all of the talk about Beowulf clusters of this and that, I'm surprised that Intel has only one appearance on the Top500 Supercomputers [top500.org] list.

    MayorQ

  • How well does this scale in contrast to Unicos/mk?
    --
  • Can't we develop a license that would prohibit free software and free ideas from being used in any application related to warfare?

    "You have been charged by the War Crimes Tribunal with genocide, crimes against humanity, unprovoked aggression against peaceful neighbors ... and violating the terms of a license for free software!"

    "I was just following orders! But I was ordered to recompile my kernel!"

  • by edgrale ( 216858 ) on Friday June 08, 2001 @07:50AM (#165947)
    here [top500.org]! Enjoy!
  • Apparently you don't know too much about cluster communications. In order for a node in a cluster to communicate using TCP/IP, the data has to be encapsulated into TCP and then IP, all the way down to the Ethernet frames. However, on a cluster speed is a critical factor, and if each node had to do this, no matter how fast the network, the network cards would be the bottleneck. So if you scale TCP/IP to a couple of thousand processors, all of which need almost real-time communication, then yes, it's got a few large limitations, mainly the speed at which packets can be created, sent, and unencapsulated. The Internet works because you can live with that 30ms ping time, but with a cluster they want ping times of under 1ms (see the rough estimate sketched below).
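
    Using those two ping times, a toy estimate of how the difference compounds for a code that does lots of small synchronizing exchanges (the message count is invented purely for illustration):

        /* Toy estimate: total time spent waiting on the network for a
           fixed number of small blocking exchanges, at a WAN-ish 30 ms
           round trip vs. a cluster-grade 1 ms round trip. */
        #include <stdio.h>

        int main(void)
        {
            double rtt_wan     = 30e-3;   /* ~30 ms ping, fine on the net  */
            double rtt_cluster = 1e-3;    /* <1 ms target inside a cluster */
            long   messages    = 100000;  /* invented per-node count       */

            printf("30 ms RTT: %.0f s spent waiting\n", rtt_wan * messages);
            printf(" 1 ms RTT: %.0f s spent waiting\n", rtt_cluster * messages);
            return 0;
        }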
  • I wonder how well Solitaire runs on one of these.
  • as it allows the creation of super computers based upon a clustered set of smaller computers?
  • by OblongPlatypus ( 233746 ) on Friday June 08, 2001 @09:15AM (#165951)
    I'm pretty sure he's referring to the CPlant machine, which is the cluster of commodity boxes running the CPlant software which has just been released. All the boxes making up CPlant are running Linux. From the FAQ:
    What version of Linux is being used?
    We are using Red Hat 5.1 and Linux-AXP v2.0.34.
  • by Wills ( 242929 ) on Friday June 08, 2001 @08:32AM (#165952)

    Will the x86-optimised ASCI Red BLAS, FFT and Extended Precision libraries [utk.edu] also be open-sourced and licensed under the GPL instead of the binary-only releases to-date?

  • Microsoft is going to have a hell of a time convincing Congress that the GPL is bad news with all the recent GPL software releases from federal agencies.

    NSA, NASA, Sandia, etc.

  • Users of Cplant wanted to checkpoint their computation every 10 minutes or so to graphically observe the progress of the program. This is a very common technique in large-scale computation. Unfortunately, it took more than an hour to do that on the incarnation of the hardware that had six OC3s for external communication. And that was before the machine grew bigger. Commodity machines tend to have commodity I/O. Get what you pay for (sometimes).

    I thought Cplant did a nice job of applying a two-level structure to an otherwise flat sea of cluster nodes. Think of it as worker bees and team-leader bees: a rack of workers would have a team leader responsible for that rack.

    There's a 1024p SGI machine at NASA Ames. You can run your program through the compiler, get an a.out, run it, and get all of those puppies honking at the same time while file I/O can be instantly visible to any or all of them. File I/O could easily be in the GBytes/sec range to/from a single file descriptor and single file. Wake me up when a cluster can approximate that functionality (compile+go, unified I/O), if not at similar performance levels.
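
    A quick sanity check on those checkpoint numbers, assuming a half-terabyte snapshot (the checkpoint size is my assumption; the ~155 Mbit/s figure is just the standard OC3 rate):

        /* Back-of-envelope: six OC3 links (~155 Mbit/s each) draining an
           assumed 500 GB checkpoint -- roughly an hour and a bit, which
           matches the "more than an hour" complaint above. */
        #include <stdio.h>

        int main(void)
        {
            double link_MBps  = 155.0 / 8.0;       /* one OC3 in MB/s      */
            double total_MBps = 6.0 * link_MBps;   /* six links aggregated */
            double ckpt_MB    = 500000.0;          /* assumed 500 GB       */

            printf("checkpoint drain time: %.0f minutes\n",
                   ckpt_MB / total_MBps / 60.0);
            return 0;
        }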
  • Now watch all the posts on Beowulf clusters come in!
    This sounds good for Linux, now in the number 2 most powerful computer in the world: another sign that Linux is on the rise and not "dead".
  • I didn't notice SETI [berkeley.edu] on the list . . . .
  • With the PS2 Linux Kit [cnet.com] this could result in some interesting games.
  • Cool! The T3E is kind of dull-looking, compared to the 'classic' Crays, but the SV1 looks great, and clusters!
    (Go on, someone say it)

    "What are we going to do tonight, Bill?"
  • by tb3 ( 313150 )
    Was anyone else surprised at the number of Crays on the list? I thought they had been obsoleted years ago, when SGI bought Cray. Obviously not.

    "What are we going to do tonight, Bill?"
  • Welll,

    If you use fiber, with low-low-low latency switches,
    you may get a very nice 1ms-ping cluster.

    You also lose all your savings ($$$) on a nice collection of highly costly (and breakable) and speedy fiber...

    Well, if you REALLY need it...
  • When I was an employee of a DOE lab, working in the supercomputing center, the word was that Sandia is trying to perform real-time simulation of the explosion of a thermonuclear device.
  • Linux is running on the second-fastest supercomputer (via clusters of parallel computers) in the world. Call Guinness.
    ----
  • They're foreign governments with their own ideas about IP. They can change it if they please. It's not likely that we're going to start WWIII over a stupid (in the weakest sense) GPL license agreement (unless it is used as a scapegoat for something else). There is a high probability that they will change the agreement if their countries' needs change, or for whatever reason. Now enquiring minds know. What will enquiring minds ask next?
    ----
  • If they're going to drop a guided bomb, they're certainly not going to use Linux. What does a bomb need with a Bash prompt anyway? All it needs is some guidance sensors, some basic logic, and *maybe* a way to communicate to the mothership about its current situation. This is customized, hardware-specific code for the device in question. For cost reasons, they probably want to keep this on a small integrated chip, not an internal HD with megs of memory and an open-source kernel. It really doesn't make sense to use Linux.
    ----
  • You might be right. Unfortunately, this is also the government. They have the money and the resources to build customized chips. They certainly wouldn't want your precious money to go to waste by not using it, would they? The economics of non-profit governments: spend as much money as possible so that you can claim you have insufficient funding (so you can get more). Ironic, isn't it?
    ----
  • Sorry, Linux is not running the fastest Supercomputers. :(
    ----
  • First rule of life: don't state the obvious. Of course the Bourne Again Shell is not Linux. The kernel is Linux, nothing else. Like duh. Next topic: it could probably be used for a smart bomb. I'm sure you could put MS-DOS on an integrated chip, add some additional digital logic, and design a bomb that would have an accuracy of 95%+. The question is, why? The government would have to write most of the logic itself, and whatever is included in the Linux kernel is probably just unneeded extras. It could save money in the long run, but the government is not about saving money; it is about keeping the people employed and educated so that it can squeeze more money out of them in the long run. There is a balance of interests here: bigger government with more people employed by it, or smaller government with a smaller budget and fewer people employed. It all depends on the current status of the economy and who's in power as to which one they choose.

    Secondly, there is a problem with the GPL license. Since no one "owns" the license, there is no one party to sue violators. Sure, if some big organization breaks the license, many people in the community could get together enough funds to hire a good lawyer and sue their ass, but that is based on the power of the people to cooperate. There is no central organization to control the power, and that could result in chaos when people break the license, money is short, or no one wants to sue. You're not going to spend a million dollars to sue the little guy because he broke the license agreement; it is uneconomical. This is a type of federalism, and it can, when times are tough, plain suck.

    Third of all, GPL only works as an End User License agreement if everybody cooperates. If only a few are willing to cooperate while others do their thing, it breaks up. Fortunately, so far, groups have cooperated and it has worked. We may not be so lucky forever.
    ----
  • Uhh? GSN's speed (aka HIPPI-2) is 6.4 gigabytes/s, not 800 mb/s; that's the speed of the old HIPPI.
  • by kiwimate ( 458274 ) on Friday June 08, 2001 @08:09AM (#165969) Journal
    Does anyone else find it interesting that, in the midst of all the usual IBMs and SGIs, three entries on last year's list are described as "self-made"? Numbers 215, 396, and 413 -- the last of which is termed an "NT super-cluster"; if you check out the link, it's a group of 38 dual-processor HP Pentium III Xeon 550 Kayaks running NT clustered together.

    Another shameless plug -- as an ex-pat kiwi, I was pleased to see that number 191 is at NIWA (National Institute for Water and Atmospheric Research) in Wellington.
