

"Cplant" Parallel Computing Tool 77
SEWilco writes "Sandia National Laboratories has released its "Cplant" massively parallel processing software. This is related to the software used in their ASCI Red supercomputer, and eliminates several scalability problems to allow hundreds of nodes for algorithms which can't be parallelized for Beowulf-type clusters. This is now number 2 on the TOP500 supercomputer list. The press release refers to "licensing terms", but the license is the GPL.
We discussed this in a Linux clusters discussion and several earlier reports as ASCI Red grew."
Imagine (Score:4)
Uh ... (Score:4)
In order to simulate new weapons configurations, it takes an awful lot of computing power. Just try to imagine all the factors that have to be tracked and taken into account in order to produce an accurate and thorough simulation. Simulated tests have a lot of advantages, obvious (no radiation) and non-obvious (costs).
You've been reading YRO too much. Trust me. The government has a lot better uses to put its supercomputers to than breaking our SSH and PGP keys--like big guns and bombs for laying waste to the known world!
Re:Cplant vs Bproc (Score:1)
The Scyld Beowulf software is very nice for quickly setting up small to medium size clusters where users use the whole cluster more-or-less serially. IMHO, it doesn't fare quite so well for production oriented shops like Sandia, where things like accounting and scheduling become important. The Scyld software also has very limited support for Myrinet, which is a very nice (and very fast) interconnect for clusters.
You also need to remember that the Cplant stuff was specifically designed to emulate the user environment of the ASCI Red machine, which inherited its environment from Sandia's Paragon. That was done presumably to keep the retraining of Sandia's user base to a minimum. The Scyld software has no such requirements.
(Disclaimer: One of my coworkers used to work on Cplant, and we've borrowed some of Cplant's ideas [though not any of the software] for the clusters we have at OSC.)
Re:War pigs: like anybody would use your code (Score:2)
Both Linux and GNU bash are licensed under the GPL as "free software." Stallman has stated [gnu.org] that free software stems from "Freedom Zero", namely "the freedom to run the program for any purpose, any way you like."
To my knowledge, use restrictions would violate both the GPL [gnu.org] and Open Source Initiative's Open Source Definition [opensource.org].
A few things (Score:4)
Nuclear simulation: This is the big one. With popular opinion and world politics being what they are, it's likely we won't set off another thermonuclear detonation for a very long time. Unfortunately, we have a few thousand warheads that are aging and decaying, and we want to be sure (and make everyone else sure) that our final deterrent isn't turning into duds under our noses. This is pretty much the sole official justification for the national labs' supercomputing programs.
More nuclear simulation: After New Mexico's devastating summer fires last year, they stepped up research on the effects of fire on stored warheads (no, they won't go nuclear, but cleanup could still be awful). Simulating something that turbulent isn't easy, but it'll be nice to know if there are any further precautions Los Alamos needs to take.
Computational Fluid Dynamics - refining supercomputer code to cut down on the need for even more expensive wind tunnel time. Military and civilian uses: the two I saw were hypersonic parachute unfolding for bombers and drag-reducing plastic attachments for big rig trucks.
Impact testing - this is one of the big commercial apps of supercomputers; I don't know how much of it they're doing at Sandia right now. You can make vehicles a lot more crash safe cheaply if you can virtually destroy them (and refine their frame designs) hundreds of times before actually mangling hardware.
As for crypto breaking... no. For example, the Teraflops machine has 9,000 or 10,000 processors (just upgraded to 3xx MHz Xeons, I'm told, since those are the fastest things that could be massaged into the old PPro sockets). That's on the order of how many distributed.net computers brute-forced 64-bit encryption... so for 128-bit encryption you'd just need 16 quintillion more Teraflops supercomputers. Your PGP key is infinitely more likely to be snagged by some hacker's trojan and keylogger than it is by a government supercomputer.
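For the curious, the back-of-the-envelope arithmetic behind that claim (treating brute force as simply trying every key, and ignoring smarter attacks entirely) is just a ratio of keyspaces:

\[
\frac{2^{128}}{2^{64}} = 2^{64} \approx 1.8 \times 10^{19}
\]

i.e. roughly eighteen quintillion times the machinery that handled the 64-bit job, which is the spirit of the "16 quintillion" figure above.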
Re:War pigs: like anybody would use your code (Score:2)
However, you are also missing the benefits of COTS (commodity off-the-shelf). Although I doubt bombs use too many COTS parts, it is cheaper to use a known-working thing than it is to build something custom. Even if a bomb doesn't need megs of RAM (Linux does _not_ require an HD, just a flash card), it might be worth it to go ahead and use it to save the expense of writing a custom solution.
Re:Top 500 Supercomputers can be found... (Score:1)
Heh, I've seen and worked with number 215. A pretty non-awe-inducing group of 96 dual Pentium III boxes using a Myrinet interconnect. You can find some pictures at the Kepler homepage [uni-tuebingen.de]. It's amazing what good lighting and a good photographer can do :-) That aside, the system itself is pretty cool (and damn fast, even if there's some contention among its users for computing time).
Re:wonderful || wondering (Score:2)
Re:War pigs: like anybody would use your code (Score:1)
Re:War pigs (Score:2)
Something you might find interesting: at one time the world's largest repository of free and open-source software was at the (then) Army Ballistic Research Laboratory, open to anyone who could FTP there. It was an important resource during the 1980s, when the free software community, a community that included the late Mike Muuss [matrix.net] of the BRL, was taking shape.
This is only one example of many from that era. (I hope it's not too trivial to point out that the Internet itself originated with the "War Pigs.") Had the GPL included an anti-military clause, there is a good chance that much of GNU would not exist -- if the movement had happened at all. Don't forget, the "War Pigs" paid for Stallman's ARPANET connection (via MIT, which was on the ARPANET by virtue of being a major military contractor).
I'm not attempting to justify the military, here, just pointing out that blindly excluding them may not be the best of ideas...
Re:wonderful || wondering (Score:1)
Re:Uh ... (Score:2)
ASCI != Linux (Score:3)
I don't see a reference to Linux in the description of this supercomputer. I see the following link to the specs [sandia.gov] which describes the OS as:
The operating system used for the Service, I/O, and System Partitions is Intel's distributed version of UNIX (POSIX 1003.1 and XPG3, AT&T System V.3 and 4.3 BSD Reno VFS) developed for the Paragon XP/S Supercomputer. The Paragon OS presents a single system image to the user. This means that users see the system as a single UNIX machine despite the fact that the operating system is running on a distributed collection of nodes.
As much as I like to push Linux (I use it as my desktop), it just isn't correct to say it is the #2 machine on the Top 500 list.
Re:not so fast (Score:2)
Re:Cool (Score:1)
Re:Watchutalkinbout Hemos? (Score:1)
The press release actually mentions that it is necessary to agree to some licensing terms before downloading. This unnamed license turns out to be the GPL, which many of us know well.
Re:I Wonder (Score:1)
Re:TCP/IP isn't scalable? (Score:1)
Re:TCP/IP isn't scalable? (Score:2)
Re:TCP/IP isn't scalable? (Score:2)
If clusters were limited to "thousands of nodes", I promise you nobody would notice.
You're talking about clusters of uniprocessors. I don't research this area myself, but I have been told that clusters of 4-way SMP machines saturate the memory bus before the network. That's because you have four CPUs and a Myrinet card all competing for the same bus, not to mention other DMA devices. Sure, if you use uniprocessors in your cluster, then the network becomes the bottleneck. No big insight there. And it has nothing to do with TCP/IP.
You mean to say the TCP/IP scaling problems in clusters are the same as in the Internet? I think not. Look, I just think a claim like "TCP/IP has fundamental scaling problems" could have been phrased better, because it seems ridiculous at first glance. I'm sure their claim is valid, whatever it is, but clearly there is ample evidence that TCP/IP itself scales to far larger systems than any cluster.
TCP/IP isn't scalable? (Score:3)
Re:War pigs: like anybody would use your code (Score:2)
Also, I worked in defense contracting for Lockheed on the Aegis defense system. Of course the guided missiles (like Tomahawks) don't run anything even remotely similar to what we would call an "OS". There aren't different levels of apps running on top of each other; there's just dedicated circuitry designed to do what the missile needs to do. Running something like Linux or even QNX would be utterly asinine.
not so fast (Score:2)
With about 30 machines dedicated to "research", surely one of those already performs the job well. FYI, the government of the USA has 19 classified machines as well, which are most likely NSA and military machines. It can't all be for so-called nuclear research. At least in my eyes.
wonderful || wondering (Score:3)
I always wondered what Big Brother does with these supercomputers. The FAQ says little about what tasks they perform, and I doubt you would need that much supercomputing for research.
So the question is, just what is Sandia doing with this? Making supercomputers to crack codes for the NSA, perhaps? Aside from that, maybe some company should look into recovering the hundreds of obsolete PCs that are being tossed, build a supercomputer to test with, and perhaps create the ultimate crypto algorithm. (Yes, I know, slightly off topic.)
Does anyone have any idea as to what these machines are truly doing?
Re:Watchutalkinbout Hemos? (Score:1)
Re:not so fast (Score:1)
For example, suppose that you are doing remote sensing with a satellite and need to calculate your sensor's position to within a 2-inch box at all points along the data track. This requires you to have an accurate track for your spacecraft along all of these points. To do this, you have to have ridiculously accurate gravity models for the earth, moon, and sun, as well as models for all the other perturbative effects (atmospheric effects, solar radiation pressure, etc.). Most of these things are modelled as giant matrices. The solutions then involve manipulating all of these matrices with respect to your known positions umpty-squat-gagillion times (technical term). This is what a big vector processing machine buys you. Doing all of these inversions and other operations takes just a few machine instructions vs. giant loops.
I'm sure that the reason there are so many of these machines is that similar techniques show up in so many problem domains (fluid dynamics, etc.).
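To make the "few machine instructions vs. giant loops" point concrete, here is a minimal sketch (my own toy code, not anything from Sandia) contrasting a naive triple loop with a single call into a tuned BLAS; on a vector machine or with a library like ATLAS, the dgemm call is where all the blocking and pipelining happens:

// Toy comparison: naive matrix multiply vs. one call into a tuned BLAS.
// Assumes a CBLAS implementation (e.g. ATLAS) is installed; link with -lcblas.
#include <vector>
#include <cblas.h>

// Naive O(n^3) triple loop: no blocking for cache, no use of vector units
// beyond whatever the compiler manages on its own.
void matmul_naive(int n, const double* A, const double* B, double* C) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// The same operation as a single library call: C = 1.0 * A * B + 0.0 * C.
// The tuned library (or a vector processor) handles blocking and pipelining.
void matmul_blas(int n, const double* A, const double* B, double* C) {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
}

int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    matmul_naive(n, A.data(), B.data(), C.data());
    matmul_blas(n, A.data(), B.data(), C.data());
    return 0;
}

Time the two on a few hundred rows and the gap speaks for itself.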
Re:Unicos (Score:1)
That being said, as far as this hardware - a commodity cluster system - goes, it seems that this is a pretty decent set of tools and optimizations.
Re:Crays? (Score:3)
Now that they've been "un-bought", Cray gets to put its name back on the list as an independent company.
As a side note, SGI sold the Cray division because it was "unprofitable" and a fiscal liability. Yet Cray Inc. made a profit last quarter, while SGI has lost about $2/share for several quarters in a row and just laid off another 1/3 of its workforce. Oh, and Cray's stock price is higher. Go fig. :)
Various... (Score:4)
Re:Watchutalkinbout Hemos? (Score:1)
If you are saying that the word "the" in front of GPL is unnecessary, I would disagree.
Read both of the following, aloud:
"This software is licensed under GNU Public License."
"This software is licensed under the GNU Public License."
-Peter
Watchutalkinbout Hemos? (Score:2)
What in the world does this mean? The GPL was a license and had several terms last time I checked.
-Peter
Re:How about ASCI Red BLAS,FFT+Extended Precision? (Score:1)
Use ATLAS (http://www.netlib.org, a platform self-tuning BLAS and LAPACK) and FFTW (run-time algorithm-optimized Fourier transforms).
Both are portable, and both approach or beat the performance of proprietary hand-tuned assembly libraries.
But don't take my word for it. MATLAB (http://www.mathworks.com) now uses the ATLAS implementation of LAPACK / BLAS and MIT's FFTW in their computational core.
I've used the ASCI Red BLAS and FFT stuff. I think the reason that it is not freely distributed is that it was developed in collaboration with Intel employees. However, the ASCI Red libraries always had a disclaimer to the effect that if you had a compelling reason to have the source, something could be worked out.
Check out how FFTW works. It is one of the few things I've seen that I would actually consider clever. Basically, FFTW designs an algorithm at run time which is optimal for your cache size, register file depth, memory bandwidth, and transform type; power-of-two sizes are not required. What really impressed me is that FFTW's codelet generator stumbled across a couple of hitherto unknown algorithms with reduced flops for computing strange-sized FFTs.
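If you want to watch that happen, a minimal sketch of the plan/execute pattern (shown here with the current FFTW 3 interface; the 2.x API of the day has different names but the same idea) looks like this:

// Minimal FFTW plan/execute sketch (FFTW 3 API); build with -lfftw3.
#include <fftw3.h>
#include <cmath>

int main() {
    const int n = 1000;   // deliberately not a power of two
    const double pi = 3.141592653589793;

    fftw_complex* in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex* out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * n);

    // FFTW_MEASURE makes the planner time candidate codelet compositions on
    // this machine and keep the fastest -- the run-time "algorithm design"
    // step described above.  It may scribble on the arrays while doing so,
    // which is why the input is filled only after planning.
    fftw_plan plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);

    for (int i = 0; i < n; ++i) {
        in[i][0] = std::cos(2.0 * pi * i / n);   // real part
        in[i][1] = 0.0;                          // imaginary part
    }

    fftw_execute(plan);   // the plan can be re-executed on new data cheaply

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}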
ATLAS is pretty clever too. For kicks, run the installation and watch it tune the kernels. The routines for portably diagnosing FPU register size, FPU MAC performance and cache sizes are useful to have around.
Kevin
Re:Top 500 Supercomputers can be found... (Score:1)
528 Pentium III-800 computers running Red Hat Linux (well, usually somewhat fewer, since a couple are out of order at any time). Just normal mini-towers in long rows of shelves. And a huge network switch. Normally, the info can be found here [tu-chemnitz.de], but since Murphy lives, it seems like our webserver is down right now; here's another picture [chemnitz.ihk.de].
Well, back to abusing this machine.
Re:not so fast (Score:1)
I think you guys are mixing terms with the word "trivial". There's algorithm trivial and implementation trivial.
Cracking PGP *is* trivial, in the sense that the algorithm to do it is published, understood, and widely believed to be the best we're going to have for a long time. Effectively, it's at a standstill as far as evolution of real implementations goes. And to do it takes more computing power than we've got. Algorithm trivial, but not implementation trivial.
Now, the number theory and algorithm research going into crypto is *very* non-trivial, but this has yet to trickle down to the implementations in a meaningful way.
Massive simulations such as a nuclear detonation are still an open-ended problem at the algorithm level. We've got some pretty good ideas about ways to do it that might reflect reality, but different angles are published regularly. Which degrees of freedom to play with, shrinking residual versus orthogonal residual convergence, blah blah blah... *And*, as an added bonus, these things can be investigated and the results evaluated *before* our sun becomes a neutron star (unlike, say, cracking RSA).
This is one we'd call implementation trivial, but not algorithm trivial.
We now return you to your regularly scheduled slashdot fodder... :)
Re: Various... (Score:1)
The experience for this was related to SUNMOS, an OS for the Paragon and maybe the nCUBE.
I'm not sure when the source was put up, but from what I can tell, the site hasn't been updated in almost a year.
I sent in this: 2001-04-20 21:15:46 Sandia Labs Cplant software under GPL (articles,linux) (rejected) a while back. So that web page was updated within 6 weeks. This was sent in about 3 days after the announcement of the GPL release went up.
Re:TCP/IP isn't scalable? (Score:1)
So, my question to you is, "Have YOU heard of the internet?" Because, if you think the Internet doesn't have scaling problems, where have you been the past 10 years? Under a rock?!
Historical Link (Score:2)
Some of the particular issues surrounding Sandia's Cplant project were the subject of a previous story on Slashdot [slashdot.org].
AFAICT, the upshot is all the tweaking that must be done to coax higher performance out of numerically intensive codes on that many processors.
As many in the numerical simulation community already know, message-passing codes abuse a network in a way that web browsers do not, demanding lower latency and higher bandwidth than plain old 10/100 Mb Ethernet can provide (at least for large numbers of high-SPECfp processors with any reasonable memory speed).
The availability of Linux source code facilitated the creation of their Portals layer, which sits underneath MPI and above the Myrinet hardware on these Alpha machines.
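For anyone who hasn't run one, the standard way to see this is a bare-bones MPI ping-pong between two nodes (ordinary MPI-1 calls, nothing Cplant-specific); the one-way latency it reports is exactly the number that layers like Portals over Myrinet exist to shrink:

// Bare-bones MPI ping-pong latency probe.  Build with mpicxx and run as:
//   mpirun -np 2 ./pingpong
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    char byte = 0;
    MPI_Status status;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {                 // rank 0 sends, then waits for the echo
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {          // rank 1 echoes everything back
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)   // one-way latency is half the average round trip
        std::printf("approx one-way latency: %g us\n",
                    (t1 - t0) / (2.0 * iters) * 1.0e6);

    MPI_Finalize();
    return 0;
}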
Re:not so fast (Score:2)
Abstraction Software (Score:3)
This is an important software release, because it is a step away from hand-rolled, low-level message passing, toward a standardized means of communication between nodes at a much higher level of abstraction. Think of it this way: you don't want to have to write all of the control logic for processes that are divvied out to the nodes when you are writing an application. Instead, you provide base classes of behaviour, distribute them to all of the nodes, and then inherit and instantiate specialized behaviors for _EACH JOB_ from a control partition.
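A hypothetical sketch of that idea in C++ (the names and classes here are mine, purely for illustration, and not Cplant's actual interfaces): the application supplies one specialized subclass per job, and the control partition owns the dispatch logic.

// Hypothetical illustration of the base-class-per-job idea, not Cplant code.
// Requires C++11.
#include <cstdio>
#include <memory>
#include <vector>

// Behavior common to every node job: the framework, not the application,
// owns the control logic around it.
class NodeJob {
public:
    virtual ~NodeJob() {}
    void setup(int node_id, int num_nodes) { id_ = node_id; n_ = num_nodes; }
    virtual void run() = 0;              // each job specializes only this
protected:
    int id_ = 0, n_ = 1;
};

// One specialized behavior, instantiated per job by the control partition.
class StencilSweep : public NodeJob {
public:
    void run() override {
        // ...compute this node's slice, exchange halos via the framework...
        std::printf("node %d of %d running stencil sweep\n", id_, n_);
    }
};

// Stand-in for the control partition.  In a real system the jobs would be
// shipped out to the compute nodes; a local loop is enough to show the shape.
void dispatch(int num_nodes) {
    std::vector<std::unique_ptr<NodeJob>> jobs;
    for (int i = 0; i < num_nodes; ++i) {
        jobs.push_back(std::unique_ptr<NodeJob>(new StencilSweep()));
        jobs.back()->setup(i, num_nodes);
    }
    for (auto& job : jobs) job->run();
}

int main() {
    dispatch(4);
    return 0;
}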
This provides a nice level of abstraction for the programmer. It also puts Linux MPP systems in the same class as your IBM SP/2, NCR/Teradata, and Clustered Solaris systems, among others. I think that I will be doing some work on enhancing this software!
Oh, and yes, I do professional parallel programming for a living.
Cheers and kudos to Sandia for releasing this as GPL!!!
Re:not so fast (Score:2)
With all due respect, I don't think you really know what you're talking about. The amount of data involved in these simulations is simply mind-boggling. And no matter how many points you simulate, and how many time steps it runs over, it's still only a simulation. The way to get a more accurate simulation is to increase the number of data elements being simulated and decrease the amount of time between 'steps'.
Basically, you might be running a sim for weeks (or longer) to be able to accurately simulate the first half-second of a nuclear explosion. Trust me when I tell you that they want the biggest, fastest computer money can buy.
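To put a rough number on why "bigger and faster" never stops being true (a textbook rule of thumb for an explicit 3-D grid code, nothing specific to the classified codes): refining the grid means refining the time step too, so halving the spacing costs roughly

\[
\frac{\text{work}_{\text{fine}}}{\text{work}_{\text{coarse}}}
\approx \underbrace{2 \times 2 \times 2}_{\text{grid points}} \times \underbrace{2}_{\text{time steps}} = 16
\]

times the work, and you have to do several refinements in a row before the answer stops changing.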
Most consumer-grade crypto is pretty trivial compared to these problems, and the (perceived) need to simulate nuclear explosions is probably greater than the need to break your PGP code and see where the last place you boffed the boss's wife was.
Incidentally, contrary to what some other people were saying, these big beasts are very useful to certain types of research outside of cryptography. Astronomy, for instance.
i luv gubmint softwer (Score:2)
The requested URL
Treatment, not tyranny. End the drug war and free our American POWs.
Will Saddam Hussein respect the GPL? (Score:1)
Re:Unicos (Score:2)
ASCI Red runs Paragon OS, not Linux (Score:1)
Linux is great, but let's get our facts straight (Score:1)
Just because an application uses a cluster does not automatically mean that it's running on a stack of commodity PCs running Linux in a Beowulf-style cluster interconnected via GigE or Myrinet. It also doesn't automatically mean that the application isn't capable of running well on a single large machine.
For example, NOAA recently put together a cluster for computing weather models for the upcoming hurricane season. Their cluster is actually 8 machines, each a 128-CPU SGI Origin 3800 running IRIX 6.5. The 8 machines are interconnected through a thick mesh of GSN (Gigabyte System Network, a modern version of HiPPI that can transfer 800 megabytes/sec per link). The messaging protocols used are a mixture of shmem, OpenMP, and MPI.
Linux is great and all, but ASCI Red uses Intel's Paragon OS, a derivative of Unix.
Re:Cool (Score:1)
Re:Linux is great, but let's get our facts straight (Score:2)
Who is the Cray Woman? (Score:2)
Who is the "Cray woman on the upper right-hand side of most cray.com pages?"
http://www.cray.com/products/index.html [cray.com]
Lotsa new Crays (Score:3)
Lots of new products and they're even making a profit.
http://www.cray.com/products/systems [cray.com].
Nice variety of systems, from their own SV1/SV1ex/SV2 machines, to Linux clusters, to massively parallel Alphas, to NEC vector-based machines, and more.
Anyone else noticing what proc they all run? (Score:1)
Let's see: Alphas. 'Nuff said.
Alphas aren't anywhere near dead, as many people have said they are, and neither is Cray.
english, baby. (Score:1)
Don't know why... but does anyone remember in Pulp Fiction when the black guy (Jules) is threatening that guy with the gun and says "English, motherf*cker, do you speak it?" and then almost blows his brains out? Sometimes that's how I feel.
And the Intel stands alone... (Score:1)
With all of the talk about Beowulf clusters of this and that, I'm surprised that Intel has only one appearance on the Top500 Supercomputers [top500.org] list.
MayorQ
Unicos (Score:1)
A new charge at Nuremberg (Score:1)
"You have been charged by the War Crimes Tribunal with genocide, crimes against humanity, unprovoked aggression against peaceful neighbors ... and violating the terms of a license for free software!"
"I was just following orders! But I was ordered to recompile my kernel!"
Top 500 Supercomputers can be found... (Score:4)
Re:TCP/IP isn't scalable? (Score:1)
I Wonder (Score:1)
GPLed Code is then subject to export restrictions (Score:1)
CPlant != ASCI (Score:3)
How about ASCI Red BLAS,FFT+Extended Precision? (Score:3)
Will the x86-optimised ASCI Red BLAS, FFT and Extended Precision libraries [utk.edu] also be open-sourced and licensed under the GPL instead of the binary-only releases to-date?
Glad the government likes the GPL (Score:2)
NSA, NASA, Sandia, etc.
super computing needs super IO (Score:2)
Beowulf? (Score:1)
This sounds good for Linux, now in the number 2 most powerful computer in the world. Another sign that Linux is on the rise and not "dead".
Re:Top 500 Supercomputers can be found... (Score:1)
Fun for Sony (Score:1)
Re:Crays? (Score:1)
(Go on, someone say it)
"What are we going to do tonight, Bill?"
Crays? (Score:2)
"What are we going to do tonight, Bill?"
Re:TCP/IP isn't scalable? (Score:1)
If you use fiber with low-low-low-latency switches, you may get a very nice 1 ms ping cluster.
You'll also lose all your savings ($$$s) on a nice collection of highly costly (and breakable) speedy fiber...
Well, if you REALLY need it...
Re:wonderful || wondering (Score:1)
Cool (Score:1)
Re:Will Saddam Hussein respect the GPL? (Score:1)
Re:War pigs: like anybody would use your code (Score:1)
Re:War pigs: like anybody would use your code (Score:1)
Re:Cool (Score:1)
Re:War pigs: like anybody would use your code (Score:1)
Secondly, there is a problem with the GPL license. Since no one "owns" the license, there is no one obvious to sue violators. Sure, if some big organization breaks the license, many people in the community could get together enough funds to hire a good lawyer and sue their ass, but that is based on the power of the people to cooperate. There is no central organization to wield that power, and this could result in chaos when people break the license, money is short, or no one wants to sue. You're not going to spend a million dollars to sue the little guy because he broke the license agreement; it's uneconomical. This is a type of federalism, and it can, when times are tough, plain suck.
Third, the GPL only works as an end-user license agreement if everybody cooperates. If only a few are willing to cooperate while others do their own thing, it breaks down. Fortunately, so far, groups have cooperated and it has worked. We may not be so lucky forever.
Re:Linux is great, but let's get our facts straight (Score:1)
Re:Top 500 Supercomputers can be found... (Score:4)
Another shameless plug -- as an ex-pat kiwi, I was pleased to see that number 191 is at NIWA (National Institute for Water and Atmospheric Research) in Wellington.