North America's Fastest Linux Cluster Constructed
SeanAhern writes "LinuxWorld reports that 'A Linux cluster deployed at Lawrence Livermore National Laboratory and codenamed 'Thunder' yesterday delivered 19.94 teraflops of sustained performance, making it the most powerful computer in North America - and the second fastest on Earth.'" Thunder sports 4,096 Itanium 2 processors in 1,024 nodes, some big iron by any standard.
Imagine a ... (Score:5, Funny)
And you thought I was going to say something else...
Re:Imagine a ... (Score:2)
Re:Imagine a ... (Score:5, Funny)
Re:Imagine a ... (Score:2)
Re:Imagine a ... (Score:2)
Re: (Score:2, Redundant)
Re:Imagine a ... (Score:4, Funny)
Very great and all... (Score:5, Interesting)
Re:Very great and all... (Score:5, Insightful)
Depending on budget, price (I wouldn't be surprised if Intel cut them a sweet deal to get this cluster publicized to help out their product's sales), and other factors, the Itanium could have been a good choice.
Especially if they were using software that had been designed for the Itanium (like they were replacing an older cluster) then they wouldn't have to port the software which would have saved real money.
I'm not a fan of Intel lately, but the Itanium isn't overpriced garbage no matter what. That smacks of fanboyism. Interesting you didn't add G5s to your list, BTW.
ALSO: Don't forget that the Itanium 2 was DESIGNED FOR big iron, while the Opteron was designed for servers and small iron. They can be used in other ways (you could run a web site off an Itanium 2), but the Itanium was designed for this kind of application.
Re:Very great and all... (Score:2)
Re:Very great and all... (Score:5, Informative)
Compared to a Xeon or AthlonMP cluster, the Itanium fared poorly in price/performance. The only reason to use Itaniums was if you needed 64 bits for more than 4GB of memory, or needed high single-CPU performance for a poorly parallelized application. (Of course if your application parallelizes poorly, a cluster is probably a bad choice to begin with.) Then Opteron came out and changed all that. It's 64 bits, it's fast, and it's a fraction of the price of the Itanium2.
I just purchased a new Beowulf cluster. The decision was between Xeons and Opterons. The Opterons had better price/performance, but the Xeons would fit in better with our existing Pentium3 Beowulf, other ia32 servers, and existing software. In the end, we went with Opterons. Itanium2 was never even in contention. Just one look at the price and performance of an Itanium2 system was all it took to cross it off the list.
Re:Very great and all... (Score:4, Informative)
Re:Very great and all... (Score:3, Interesting)
Re:Very great and all... (Score:5, Informative)
The Opteron 248 is $670 on pricewatch, while the 1.5 GHz It2 is $5200! The motherboards are like $1400 vs $400.
You have to keep in mind that this isn't a single machine, it's a cluster. You could take the money spent on an Itanium2 cluster, and buy an Opteron cluster with five times as many processors. I am well aware that one does not get perfect scaling. But if you are running something on a cluster in the first place, I have a hard time imagining something that is faster with one fifth as many 27% faster processors. Yes, there are codes that would be faster on 1000 Itanium2 vs 5000 Opterons, but you would never run these on a cluster, because they would be faster still on a shared memory system.
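The "one fifth as many 27% faster processors" comparison can be put into numbers with a rough sketch. It assumes Amdahl's law as the scaling model, which the post does not name, and the parallel fractions tried below are hypothetical values chosen only for illustration. The 1000 vs 5000 processor counts and the 27% figure come from the post.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Ideal Amdahl's-law speedup over one processor (an assumed model)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def cluster_throughput(per_cpu_speed: float, parallel_fraction: float,
                       n_procs: int) -> float:
    """Throughput relative to a single Opteron (per_cpu_speed == 1.0)."""
    return per_cpu_speed * amdahl_speedup(parallel_fraction, n_procs)


ITANIUM2_SPEED = 1.27  # "27% faster" per processor, as stated in the post
OPTERON_SPEED = 1.00   # baseline

# Hypothetical parallel fractions; real codes would have to be measured.
for p in (0.99, 0.999, 0.9999, 1.0):
    it2 = cluster_throughput(ITANIUM2_SPEED, p, 1000)
    opt = cluster_throughput(OPTERON_SPEED, p, 5000)
    winner = "Itanium2" if it2 > opt else "Opteron"
    print(f"p={p}: 1000 It2 -> {it2:.1f}, 5000 Opteron -> {opt:.1f} ({winner})")
```

Under this model the smaller, faster cluster only wins when the serial fraction is fairly large, which is the case the post argues belongs on a shared memory system anyway. Interconnect cost and communication overhead are ignored here.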
Re:Very great and all... (Score:5, Insightful)
A system like this will use a high-speed interconnect, not GigE. The popular choice right now is InfiniBand, and that stuff isn't cheap, and also has limits to the number of ports per IB switch. The system at LLNL has 4 procs per node, which reduces the number of IB switches involved. 5000 procs in dual-proc machines (you suggest the Opteron 248) would require 2500 IB ports, instead of 1024.
Now if you considered the Opteron 848 ($1300), in 8-proc nodes, that would be something to think about: cut the number of IB ports in half, and be able to double the processors.
The other consideration is processor scale. The 27% per CPU is significant, because even with dual-proc SMP, you lose some % of the CPU time. There was a posting on an article about how processors scale this way. I forget how the principle works.
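The port counts quoted above follow from one interconnect port per node. A minimal sketch of that arithmetic, assuming exactly one port per node (the post implies this but does not state it); the function name is made up for illustration:

```python
import math


def interconnect_ports(total_procs: int, procs_per_node: int) -> int:
    """Ports needed if every node gets exactly one port (an assumption)."""
    return math.ceil(total_procs / procs_per_node)


print(interconnect_ports(4096, 4))  # Thunder: 4 procs per node
print(interconnect_ports(5000, 2))  # 5000 procs in dual-proc boxes
print(interconnect_ports(5000, 8))  # same procs in 8-proc nodes
```

Fatter nodes trade interconnect ports for more contention inside each node, which is the SMP scaling loss the post mentions.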
Re:Very great and all... (Score:3, Insightful)
But look at NUMALink4: it's got 6.4 GB/sec per link bandwidth and 240ns latency.
QsNetII is just under 1 GB/sec bandwidth, the limit of PCI-X, with a latency of 3us.
So, NUMALink4 has 6.4 times the bandwidth and 12.5 times lower latency than QsNetII. That's a much larger performance differen
Re:Very great and all... (Score:3, Interesting)
The It2 probably cost around 5 times as much as the Opterons, so a real comparison would be 32 It2 processors vs 160 Opterons. With the scaling shown for that model, the Opterons of equivalent
Re:Very great and all... (Score:3, Funny)
Wow! What a great argument strategy! Let me try...
I like slashdot as much as the next guy. But the fact is that CmdrTaco is an evil blood-sucking cyborg who kills a puppy for each and every slashdot subscriber. I don't remember where I found this irrefutable proof, so you'll have to look it up yourself (or someone will reply with it).
Itanium vs Opteron (Score:5, Insightful)
"Big Iron" is a very vague term - server benchmarks behave very differently than scientific computation as far as performance is concerned; if you don't believe me I can easily point you to a couple of research papers analyzing them.
The humongous on-die caches make the Itanium perform well on servers, and definitely not the instruction-set architecture. So "WAS DESIGNED FOR" is only 50% true.
Re:Very great and all... (Score:4, Insightful)
That sentence doesn't even parse, but anyhow: single-thread performance still matters to clusters. There is a limit to how much you can effectively parallelize many problems. If that limit is 1, then you need a Cray or something. If the limit is extremely high, you can use distributed.net, or a cluster of recycled C64s.
In the middle, you might be able to parallelize the task to a limited extent. If you can only split your work into 500 parallel tasks, then you want 500 of the fastest processors you can get. For many applications, that means 500 Itaniums. Even if you could buy 800 Opterons for the money, they might not be as fast.
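The 500-task case can be sketched with a toy model, assuming equal-sized independent tasks and no communication cost. Both are simplifications that the post does not make. The 1.27 speed ratio is borrowed from the "27% faster" figure quoted elsewhere in the thread and is treated here as an assumption.

```python
import math


def wall_time(n_tasks: int, n_procs: int, per_cpu_speed: float,
              task_work: float = 1.0) -> float:
    """Time to finish n_tasks equal tasks, one task per processor at a time."""
    rounds = math.ceil(n_tasks / n_procs)  # extra processors sit idle
    return rounds * task_work / per_cpu_speed


itanium = wall_time(n_tasks=500, n_procs=500, per_cpu_speed=1.27)
opteron = wall_time(n_tasks=500, n_procs=800, per_cpu_speed=1.00)
print(f"500 Itaniums: {itanium:.3f}  800 Opterons: {opteron:.3f}")
```

In this model the 300 extra Opterons do nothing, so the job finishes sooner on the faster processors. That conclusion holds only as long as the work really cannot be split further.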
only other option would be they thought intel would hold up better/be more stable.
Itanium has slightly better manageability; you can find out when a memory module or CPU is likely to fail for example. There is a heap of error detection/correction in the CPU, far beyond Xeon or Opteron afaik. If you have hundreds of machines being able to easily detect failures is worth something.
(Or you can just take the google route and let it fail and replace the whole box. But that really requires your whole application to be written to accommodate it.)
Re:Very great and all... (Score:4, Informative)
There is a limit to how much you can effectively parallelize many problems. If that limit is 1, then you need a Cray or something.
Well, Crays are also parallel computers, so they won't help you much in this situation. Some Crays do have vector processors, but that is also a sort of parallelism. It's just that you use that parallelism through tuned BLAS libraries or with a vectorizing compiler (e.g. Fortran 95, HPF and such things), instead of doing it manually with MPI or threads or something like that. So if your problem is totally serial, a vector processor won't help you either.
(Or you can just take the google route and let it fail and replace the whole box. But that really requires your whole application to be written to accommodate it.)
Not necessarily. Most supercomputers are not used to run a single job taking months, but rather they run lots of smaller and shorter jobs. On the p690 cluster where I do my stuff, I (and apparently most users) mostly run jobs using about 8-16 CPUs, with a runtime of a few hours to a day. If one node fails, the jobs that are executing on that node also fail. It's no big deal, just resubmit the job to the queue when you get around to it.
Of course, if you're programming one of the very few and far between applications that have a runtime of months, you certainly want to save intermediate results once in a while. Not only to guard against hardware failure, but also so that the user can check the intermediate result and see if the app is still on the right track. It would be quite a bummer to use months of CPU time only to realize the entire thing is wasted because you specified the initial values wrong.
Linux support (Score:3, Insightful)
See: http://www.llnl.gov/linux/linux_basics.html#compi
Intel can afford to provide little niceties like this. Can AMD? I doubt it.
Re:Very great and all... (Score:3, Informative)
Re:Very great and all... (Score:3, Informative)
Some reps from SGI came to my LUG [golum.org] the other day, and talked about their clusters and supercomputers. The guy doing the Q&A said that he per
Re:Very great and all... (Score:2, Informative)
I'd hate to be the guy... (Score:3, Funny)
I cringe when I leave the A/C on for too long..
Re:I'd hate to be the guy... (Score:2, Insightful)
That said, I think our national labs are pretty great when they aren't designing nukes.
Re:I'd hate to be the guy... (Score:2, Interesting)
Re:I'd hate to be the guy... (Score:2)
$2,863,104 total.
"Most" powerful (Score:5, Interesting)
Re:"Most" powerful (Score:5, Insightful)
Re:"Most" powerful (Score:3, Interesting)
There are many purpose-built supercomputers coming up (like Sandia's Red Storm) that use custom yet pricy interconnects that end up smoking anything Quadrics can put together. Anytime your interconnect relies on a PCI-type bus, you take a latency penalty on each end. Real supercomputers access memory on other nodes directly, not through
Re:"Most" powerful (Score:5, Informative)
According to Quadrics' latest price list, the cards are $1200 each, $913 per port for a 64 node switch, and $185-$265 for a cable. That's $2300/node.
Myrinet cards are $595, the switch is $400 per port for 64 nodes, and the cables are ~$50. That's $1050/node.
Quadrics' price for a 1024 node interconnect is $4,176,094. That's hardly chump change. The bandwidth is about 10x higher than gigabit ethernet, and the latency about 100x lower.
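The per-node figures above can be checked by summing card, switch port and cable prices. A small sketch using only the prices quoted in the post; the function name is made up for illustration:

```python
def per_node_cost(card: float, switch_port: float, cable: float) -> float:
    """Interconnect cost per node: one card, one switch port, one cable."""
    return card + switch_port + cable


quadrics_low = per_node_cost(1200, 913, 185)   # cheapest cable quoted
quadrics_high = per_node_cost(1200, 913, 265)  # priciest cable quoted
myrinet = per_node_cost(595, 400, 50)          # cable quoted as "~$50"

print(f"Quadrics, 64-node switch: ${quadrics_low}-${quadrics_high} per node")
print(f"Myrinet, 64-node switch:  ${myrinet} per node")

# The quoted 1024-node Quadrics total, spread evenly across the nodes.
print(f"Quadrics, 1024 nodes:     ${4_176_094 / 1024:.0f} per node")
```

The 64-node sums land close to the rounded $2300 and $1050 in the post. Spread over 1024 nodes, the quoted total comes out noticeably higher per node than the 64-node figure.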
Re:"Most" powerful (Score:5, Interesting)
Re:"Most" powerful (Score:2)
Re:"Most" powerful (Score:2)
Re:"Most" powerful (Score:5, Insightful)
Re:"Most" powerful (Score:2)
Mod parent up! Now let's get out the amp meters.. let's compare these computers kilowatt to kilowatt.
Then we can be quite literal about which computer has "more horsepower," as one kilowatt is about 1.34 horsepower. (-:
When we all know... (Score:2)
Nothing is as whiney as a Vauxhall Chevette doing 125km/h down the motorway, knowing that it ain't gonna get any faster, except maybe on a slope.
Re:"Most" powerful (Score:2)
Nah, but who knows what the NSA has cooking. (Score:2, Insightful)
The NSA, on the other hand... I would guess that they have the most powerful cluster of machines in the world for breaking encryption. Though perhaps not as powerful as the article's supercomputer for other tasks.
Plus there are undoubtedly several other highly classified supercomputers designed to chew on other problems.
how fast is it? (Score:4, Funny)
Awesome! (Score:2, Funny)
Now we can... uhh... what are we supposed to do with that much power again?
Re:Awesome! (Score:5, Funny)
Re:Awesome! (Score:2)
Nah, seriously though, we'll use it to support the largest growing industry. That industry which powers and drives the human imagination and recognizes our g
Re:Awesome! (Score:2)
but but but (Score:4, Funny)
Did hell freeze over? (Score:5, Funny)
LLNL built a supercomputer, and it's going to do things besides simulate nuclear weapons [llnl.gov]?
Quick, someone ring Satan and ask how the sno-cones are.
Re:Did hell freeze over? (Score:5, Insightful)
The lab is a GOOD thing damnit. Do you even know what nukes are? What nuclear research has done for us? Grow up man.
LLNL's usefulness (Score:2, Troll)
I think that speaks volumes as to the usefulness of LLNL's research. After all, it's been 10 years, and there are still no hydrogen-powered cars available for purchase by consumers. Furthermore, there is extremely little research needed in the area; hydrogen conversion kits were developed by numerous companies and individuals decades ago.
Why no hydrogen cars? Well, it could have something to do with hydrogen being a
Re:LLNL's usefulness (Score:3, Informative)
That's thermodynamics. It's true for any fuel. It's even true for oil and nuclear energy - the difference being only that the energy wasn't put in during our lifetime. (And in the case of nuclear, that the pre-existing energy is all but inexhaustible.)
Re:Did hell freeze over? (Score:3, Interesting)
-B
Re:Did hell freeze over? (Score:2)
Google Cache (Score:2, Informative)
I don't care what anyone says (Score:5, Funny)
vs google (Score:2, Interesting)
Re:vs google (Score:5, Informative)
The GFS article that appeared a while back said they used standard 100MBit ethernet, this is not going to get you a good score in any supercomputer benchmark.
Nope. But they can't do what google does either. (Score:3, Informative)
You have several types of clusters, each designed to do a specific task, although you can easily mix-n-match for different purposes.
1. Server clusters. Bunches of machines running together, providing services that complement each other.
For example you have a file server that is mirrored to another that is hooked up to a different part of a LAN/WAN backbone in order to improve service. Lots of databases are clusters like this.
2. High availability clusters.
Finally... (Score:2, Funny)
The way I see it... (Score:2, Redundant)
Re:The way I see it... (Score:3, Funny)
hear
of
Paragraph
tags?
Re:The way I see it... (Score:2, Informative)
That's a wildly inaccurate summary of the landscape of RDBMS clustering technology.
Problem is, that's not what we are talking about here.
So the answer to your question at this end is almost certainly "none of the above" or probably more correctly "some bits of all of the above". Functionally most of the kind of stuff you do here doesn't need shared concurrent access to the same data files however for simplicity of implementation they probably
Re:The way I see it... (Score:2, Informative)
Re: A Better Way To See It (Score:3, Insightful)
Ed Note: Unless the author wishes to narrow his/her audience to a small subset of Slashdot users, standard formatting and non-cutesy sentence case is always appropriate.
There are basically three types of clusters:
Shared Nothing: In this, the computers are connected to each other only via a simple IP network; no disks are shared, and each machine serves part of the data. These clusters don't work reliably when you have to do aggregations. For example, if one of the machines fails and you try to do "avg()" and i
Another Article (Score:4, Interesting)
Clarification (Score:2, Funny)
Gimme something I can grasp; what's this in BogoMips?
2nd fastest supercomputer (Score:5, Funny)
OK, here goes... (Score:4, Funny)
OK, I'm done. Sorry. Mod away!
apple's response will be interesting (Score:5, Insightful)
The Virginia Tech cluster for Apple had an Rmax of 10.28 teraflops with 2200 processors.
So, the Itaninum 2 delivered 4.8 gigaflops per processor, the G5 delivered 4.6 gigaflops per processor.
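The per-processor numbers are the sustained Rmax figures divided by the processor counts. A quick sketch using the figures quoted in this thread (19.94 teraflops on 4,096 processors for Thunder, 10.28 teraflops on 2200 processors for Virginia Tech); the function name is made up for illustration:

```python
def gflops_per_processor(rmax_teraflops: float, n_procs: int) -> float:
    """Sustained gigaflops per processor from a cluster-wide Rmax."""
    return rmax_teraflops * 1000.0 / n_procs


thunder = gflops_per_processor(19.94, 4096)
virginia_tech = gflops_per_processor(10.28, 2200)

print(f"Thunder (Itanium 2): {thunder:.2f} GFLOPS per processor")
print(f"Virginia Tech (G5):  {virginia_tech:.2f} GFLOPS per processor")
```

The post's 4.8 and 4.6 are these values truncated to one decimal place. Per-processor Rmax also folds in interconnect and scaling effects, so it is not a pure chip-to-chip comparison.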
This seems like a pretty poor showing for Itanium 2, overall. It's a much hotter chip than the Opteron or the G5, so cooling and power costs are likely much higher than a comparable apple cluster. The Xserve G5 is also likely cheaper than a similarly equipped Itanium 2 server, given that the Itanium 2 is $1398 per chip on Pricewatch, and a dual processor Xserve G5 cluster node is $2,999 list. Even with 4 cpus in a single box, I think the Itanium 2 server would easily top $6,000.
But anyway, good game to Lawrence Livermore. I'll be curious to see if Apple has another volley to fire before the top500 list closes for this round.
Re:apple's response will be interesting (Score:3, Interesting)
Re:apple's response will be interesting (Score:2)
As for "can't say much"... I think that having the world's third fastest supercomputer for a fraction of the cost of its peers says plenty.
Re:apple's response will be interesting (Score:3, Interesting)
Let's see what the VTech system does with ECC RAM installed when some nodes aren't double-checking other nodes' results.
Re:apple's response will be interesting (Score:3, Interesting)
Re:apple's response will be interesting (Score:2)
Like I said, I'm surprised the Itanium 2's performance was so low, given that it's a newer architecture than the PowerPC 970.
Re:apple's response will be interesting (Score:2, Interesting)
At the time - this was a study done in July/Aug 2003, remember - the speed of the G5 and the Itanium2 were similar for the same clock speed (for scientifi
Re:apple's response will be interesting (Score:5, Insightful)
It does? You know that clustered computing doesn't scale linearly. If Virginia Tech were to double the number of processors used, they wouldn't double their performance.
Re:apple's response will be interesting (Score:4, Insightful)
Thunder is an absolutely remarkable machine.
Rejoicing at Intel (Score:5, Funny)
Sadly... (Score:5, Funny)
"The Itaniums, however, remain unsold."
*hopes that was not an actual mistake but rather a poorly conceived pun on "inane"...*
Second fastest on earth? (Score:3, Interesting)
Re:Second fastest on earth? (Score:2)
Treasury could use one to figure out the tax code, nothing else has worked.
Heat (Score:3, Funny)
So THAT'S what's causing our heat wave!
Wow (Score:2, Informative)
Take that Apple! (Score:3, Funny)
Re:But... (Score:2)
I really hope (Score:4, Funny)
LK
before everyone starts shouting at once... (Score:5, Insightful)
Some of the coolest features of the Itanium are also some of the reasons why a lot of people don't want to use it. The EPIC ISA, for example. It was designed ( along w/ the physical hardware ) to expose a lot of the internal workings of the processor to the user. But rather than recompile and re-optimize their code, people would rather bitch about migration. That's fine for workstations and servers, but in an HPC environment, you want the nifty features, you want to occasionally hand-tune code segments in assembler, etc.
Anyways, I'm not a fanboy ( well, maybe an AMD and MIPS fanboy ), just wanted to get in a few honest points before everyone started shooting holes in the Itanic.
Re:before everyone starts shouting at once... (Score:5, Informative)
I just coded some IA-64 assembly and from what I've seen, this comment is dead-on. They've got a lot of interesting features:
If you just have a simple sequence of operations, each dependent on the one before, you can't really take advantage of these capabilities. (My code was like this. Even though performance wasn't my reason for writing assembly, it was a little disappointing that I couldn't play with the new toys.) If you're expecting these features to make Word start faster, you'll probably be disappointed.
But if you're doing intensive computations in a tight loop, you can do amazing things. If you can get all the execution units working simultaneously, it will fly. And the features like rotating registers are designed to make that possible. You need a very good compiler or a very smart person to hand-tune it. You may need to recompile to tune if your memory latency changes (affecting how many iterations to run at once) or they come out with a new chip with more sets of execution units. But in a situation like this, none of that is a problem. They'll have applications designed to run as fast as possible on this machine. They may never be run anywhere else.
What about SCO? (Score:4, Funny)
do they have the nerve to go after this cluster?
after all they are trying extortion by lawyer against other large Linux users
Big Iron? (Score:5, Funny)
If the government gets a hold of that, we're going to need some big tinfoil...
Probably OT, but... (Score:4, Interesting)
can't believe nobody mentioned favourite vendor! (Score:3, Funny)
I can see the investors now rubbing their 2 cents together....
Top500 list not updated yet (Score:3, Insightful)
Re:Whoa. (Score:4, Informative)
Re:Whoa. (Score:2)
Japan: PWN3D!
USA: Doh!
Yes but... (Score:3, Funny)
Re:It's all about sticking it to the mac. (Score:2)
Re:It's all about sticking it to the mac. (Score:3, Informative)
But... (Score:3, Funny)
Of course... (Score:3, Interesting)
300 processors. That's 150 dual-processor boxes. I can't be bothered working it out now, but how far that goes to eliminating the power & heat advantage the G5 has would be interesting to find out...