North America's Fastest Linux Cluster Constructed
SeanAhern writes "LinuxWorld reports that 'A Linux cluster deployed at Lawrence Livermore National Laboratory and codenamed 'Thunder' yesterday delivered 19.94 teraflops of sustained performance, making it the most powerful computer in North America - and the second fastest on Earth.'" Thunder sports 4,096 Itanium 2 processors in 1,024 nodes, some big iron by any standard.
Imagine a beowulf cluster of beowulf jokes (Score:-1, Insightful)
Re:I'd hate to be the guy... (Score:2, Insightful)
That said, I think our national labs are pretty great when they aren't designing nukes.
Re:"Most" powerful (Score:5, Insightful)
Re:Very great and all... (Score:5, Insightful)
Depending on budget, price (I wouldn't be surprised if Intel cut them a sweet deal to get this cluster publicized to help their product's sales), and other factors, the Itanium could have been a good choice.
Especially if they were already using software designed for the Itanium (say, if they were replacing an older Itanium cluster), they wouldn't have had to port it, which would have saved real money.
I'm not a fan of Intel lately, but the Itanium isn't overpriced garbage no matter what. That smacks of fanboyism. Interesting you didn't add G5s to your list, BTW.
ALSO: Don't forget that the Itanium 2 was DESIGNED FOR big iron, while the Opteron was designed for servers and small iron. They can be used in other ways (you could run a web site off an Itanium 2), but the Itanium was designed for this kind of application.
apple's response will be interesting (Score:5, Insightful)
The Virginia Tech cluster for Apple had an Rmax of 10.28 teraflops with 2200 processors.
So, the Itanium 2 delivered 4.8 gigaflops per processor, and the G5 delivered 4.6 gigaflops per processor.
This seems like a pretty poor showing for Itanium 2, overall. It's a much hotter chip than the Opteron or the G5, so cooling and power costs are likely much higher than for a comparable Apple cluster. The Xserve G5 is also likely cheaper than a similarly equipped Itanium 2 server, given that the Itanium 2 is $1,398 per chip on Pricewatch, and a dual-processor Xserve G5 cluster node is $2,999 list. Even with 4 CPUs in a single box, I think the Itanium 2 server would easily top $6,000.
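For what it's worth, the per-processor figures above are just Rmax divided by processor count; a quick sanity check using the numbers quoted in this thread (not independently verified):

```python
# Sustained performance per processor (Rmax / CPU count),
# using the figures quoted in this thread.
systems = {
    "LLNL Thunder (Itanium 2)": (19_940, 4096),  # Rmax in gigaflops, CPU count
    "Virginia Tech (G5)":       (10_280, 2200),
}

for name, (rmax_gflops, cpus) in systems.items():
    print(f"{name}: {rmax_gflops / cpus:.2f} GFLOPS per processor")
```

That works out to roughly 4.87 vs 4.67 GFLOPS per processor, consistent with the rounded numbers above.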
But anyway, good game to Lawrence Livermore. I'll be curious to see if Apple has another volley to fire before the top500 list closes for this round.
Re:"Most" powerful (Score:5, Insightful)
Re:"Most" powerful (Score:1, Insightful)
If I was going to move my stuff to the other side of the country, how many trips would that take me in a Corvette?
Re:Did hell freeze over? (Score:5, Insightful)
The lab is a GOOD thing damnit. Do you even know what nukes are? What nuclear research has done for us? Grow up man.
before everyone starts shouting at once... (Score:5, Insightful)
Some of the coolest features of the Itanium are also some of the reasons a lot of people don't want to use it. The EPIC ISA, for example: it was designed (along with the physical hardware) to expose a lot of the internal workings of the processor to the user. But rather than recompile and re-optimize their code, people would rather bitch about migration. That's fine for workstations and servers, but in an HPC environment you want the nifty features, you want to occasionally hand-tune code segments in assembler, etc.
Anyways, I'm not a fanboy ( well, maybe an AMD and MIPS fanboy ), just wanted to get in a few honest points before everyone started shooting holes in the Itanic.
Nah, but who knows what the NSA has cooking. (Score:2, Insightful)
The NSA, on the other hand... I would guess that they have the most powerful cluster of machines in the world for breaking encryption. Though perhaps not as powerful as the article's supercomputer for other tasks.
Plus there are undoubtedly several other highly classified supercomputers designed to chew on other problems.
So it would seem that you'd have to caveat any claim regarding the "fastest computer" by saying it's the fastest known, non-secret computer. But then the headline loses some of its appeal.
Re:apple's response will be interesting (Score:5, Insightful)
It does? You know that clustered computing doesn't scale linearly. If Virginia Tech were to double the number of processors used, they wouldn't double their performance.
Itanium vs Opteron (Score:5, Insightful)
"Big Iron" is a very vague term - server benchmarks behave very differently than scientific computation as far as performance is concerned; if you don't believe me I can easily point you to a couple of research papers analyzing them.
It's the humongous on-die caches that make the Itanium perform well on servers, and definitely not the instruction-set architecture. So "WAS DESIGNED FOR" is only 50% true.
Linux support (Score:3, Insightful)
See: http://www.llnl.gov/linux/linux_basics.html#compi
Intel can afford to provide little niceties like this. Can AMD? I doubt it.
Re: A Better Way To See It (Score:3, Insightful)
Ed Note: Unless the author wishes to narrow his/her audience to a small subset of Slashdot users, standard formatting and non-cutesy sentence case is always appropriate.
There are basically three types of clusters:

Shared Nothing: Here, the computers are connected only via a simple IP network; no disks are shared, and each machine serves part of the data. These clusters don't work reliably when you have to do aggregations. For example, if one of the machines fails and you try an "avg()" over data that is spread across machines, the query fails, since one of the machines is unavailable. Most enterprise apps can't work in this config without degradation; for example, an IBM study showed that a 2-node cluster is slower and less reliable than a 1-node system when running SAP. IBM (on Windows and Unix) and MS use this type of clustering (also called the federated database or shared-nothing approach).

Shared Disk Between Two Computers: Here there are multiple machines and multiple disks, and each disk is connected to at least two computers. If one of the computers fails, the other takes over. No mainstream database uses this mode, but HP NonStop does. Still, each machine serves up part of the data, so standard enterprise apps like SAP can't take advantage of the clustering without a lot of modification.

Shared Everything: Here, every disk is connected to all the machines in the cluster. Any number of machines can fail and the system keeps running as long as at least one machine is up. This is what Oracle uses. All the machines see all the data, so standard apps like SAP can run in this kind of config with minor modification or no modification at all. IBM also uses this method in its mainframe database (which outsells its Windows and Unix databases by a huge margin).

Most enterprise apps are deployed in this last type of cluster configuration. Approach one is the simplest from a hardware point of view, and for database kernel writers it's the easiest to implement. However, the user has to break up the data judiciously and spread it across machines, and adding or removing a node requires re-partitioning the data. Mostly only custom apps that are fully aware of your partitioning can take advantage of it.

It's also easy to make approach one scale for a simple custom app, which is why most TPC-C benchmarks are published in this configuration. Approach three requires a special shared-disk system, and the database implementation is very complex: the kernel writers have to worry about two computers simultaneously accessing disks, overwriting each other's data, etc. This is the thing Oracle is pushing across all platforms and IBM is pushing for its mainframes. Approach two is similar to approach one except that it adds redundancy, and hence is more reliable.
So what type are we talking about here?
Re:apple's response will be interesting (Score:4, Insightful)
Thunder is an absolutely remarkable machine.
Re:apple's response will be interesting (Score:1, Insightful)
In the blue corner...
I know they were selling their old G5 PM's as a special refurb sort of deal.
and in the red corner..
I think that having the world's third fastest supercomputer
*dingdingding!*
They don't have anything right now - you are quite right, they are selling their old G5s, but the new machine is nowhere to be seen. Does this sound like value for money to you?
The sooner you realise that VT have been basically used by Apple as an advertising exercise, the better.
Re:Whoa. (Score:1, Insightful)
Even so, the US preaches to the rest of the world how they should do and think. I think you have reached the state of "being so dumb you don't even KNOW that you are dumb".
Re:Very great and all... (Score:4, Insightful)
That sentence doesn't even parse, but anyhow: single-thread performance still matters to clusters. There is a limit to how much you can effectively parallelize many problems. If that limit is 1, then you need a Cray or something. If the limit is extremely high, you can use distributed.net, or a cluster of recycled C64s.
In the middle, you might be able to parallelize the task to a limited extent. If you can only split your work into 500 parallel tasks, then you want 500 of the fastest processors you can get. For many applications, that means 500 Itaniums. Even if you could buy 800 Opterons for the money, they might not be as fast.
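The limit described above is essentially Amdahl's law. A minimal sketch: the 500-vs-800 processor counts come from the post above, while the relative per-CPU speed is an illustrative assumption, not a benchmark:

```python
def speedup(parallel_fraction, n_cpus):
    """Amdahl's law: overall speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

# Suppose 99.8% of the workload parallelizes and each Itanium is
# 1.27x as fast as an Opteron (an assumed figure, for illustration only).
fast = speedup(0.998, 500) * 1.27  # 500 faster CPUs
slow = speedup(0.998, 800) * 1.00  # 800 slower CPUs
print(fast > slow)  # the smaller cluster of faster CPUs can still win
```

Once the serial fraction dominates, adding more (slower) processors stops helping, which is the point being made about picking the fastest CPUs you can get.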
The only other option would be that they thought Intel would hold up better / be more stable.
Itanium has slightly better manageability; you can find out when a memory module or CPU is likely to fail, for example. There is a heap of error detection/correction in the CPU, far beyond Xeon or Opteron AFAIK. If you have hundreds of machines, being able to easily detect failures is worth something.
(Or you can just take the Google route: let it fail and replace the whole box. But that really requires your whole application to be written to accommodate it.)
Re:Very great and all... (Score:3, Insightful)
But look at NUMALink4: it's got 6.4 GB/sec per-link bandwidth and 240ns latency.
QsNetII is just under 1 GB/sec bandwidth, the limit of PCI-X, with a latency of 3us.
So, NUMALink4 has 6.4 times the bandwidth and 12.5 times lower latency than QsNetII. That's a much larger performance difference than Opteron vs. Itanium!
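Those ratios are just division of the figures quoted in the two posts above (I haven't checked them against vendor specs):

```python
# Interconnect comparison using the numbers from this thread.
numalink_bw, numalink_lat = 6.4e9, 240e-9  # bytes/sec, seconds
qsnet_bw, qsnet_lat = 1.0e9, 3.0e-6

print(f"bandwidth: {numalink_bw / qsnet_bw:.1f}x")        # 6.4x
print(f"latency:   {qsnet_lat / numalink_lat:.1f}x lower")  # 12.5x
```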
Top500 list not updated yet (Score:3, Insightful)
Re:Very great and all... (Score:5, Insightful)
A system like this will use a high-speed interconnect, not GigE. The popular choice right now is InfiniBand, and that stuff isn't cheap; there are also limits to the number of ports per IB switch. The system at LLNL has 4 procs per node, which reduces the number of IB switches involved. 5,000 processors in dual-proc machines (you suggest the Opteron 248) would require 2,500 IB ports, instead of 1,024.
Now, if you considered the Opteron 848 ($1,300) in 8-proc nodes, that would be something to think about: cut the number of IB ports in half and double the processor count.
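The port counts fall out of one IB port per node; a quick sketch of that arithmetic (CPU totals are the ones discussed above, and the 8-way layout is hypothetical):

```python
# One InfiniBand port per node, so ports needed = total CPUs / CPUs per node.
def ib_ports(total_cpus, cpus_per_node):
    return total_cpus // cpus_per_node

print(ib_ports(4096, 4))  # Thunder's actual layout: 1024 ports
print(ib_ports(5000, 2))  # dual-proc alternative: 2500 ports
print(ib_ports(4096, 8))  # hypothetical 8-way Opteron 848 nodes: 512 ports
```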
The other consideration is processor scaling. The 27% per CPU is significant, because even with dual-proc SMP you lose some percentage of the CPU time. There was a posting on an article about how processors scale this way; I forget how the principle works.