Linux Software Technology

Cray CTO: Linux clusters don't play in HPC 435

jagger writes "Linux clustering was touted as the next big thing by many vendors last week at ClusterWorld Conference & Expo 2004. But supercomputer vendor Cray Inc. scoffed at the notion of putting Linux clusters in the high-performance computing (HPC) category. "Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer," said Dr. Paul Terry, CTO of Cray Canada."
This discussion has been archived. No new comments can be posted.

  • Business or science? (Score:2, Interesting)

    by grub ( 11606 ) <slashdot@grub.net> on Tuesday April 13, 2004 @11:44AM (#8848756) Homepage Journal

    Dr. Terry's assertions remind me of a Seymour Cray quote I had as my /. sig a while back:
    "If you were plowing a field, which
    would you rather use? Two strong oxen or 1024 chickens?"

    I'm not picking a side, it just seems interesting that the Cray CTO would echo Seymour's thoughts. I guess it's for business and marketing reasons though, sadly.
  • CTO of Cray? (Score:2, Interesting)

    by shachart ( 471014 ) <shachar-slashdot ... ac.il minus city> on Tuesday April 13, 2004 @11:45AM (#8848775)
    You did notice he is the CTO of Cray... Canada??
  • Are too (Score:5, Interesting)

    by Anonymous Coward on Tuesday April 13, 2004 @11:45AM (#8848780)
    "Most cluster [experts] know now that users are fortunate to get more than 8% of the peak performance in sustained performance."
    Tell that to PIXAR. I don't believe it either.

    I guess that the simple problem is just that the algorithm applied is usually not suitable for massively parallel computing.

  • by huhmz ( 216967 ) on Tuesday April 13, 2004 @11:47AM (#8848814)
    Reading the article, it's fairly obvious that Cray's CTO has an agenda. However, assuming he's right, what does play in HPC? Cray Proprietary Cluster OS (TM) or what?
  • by Ridgelift ( 228977 ) on Tuesday April 13, 2004 @11:49AM (#8848865)
    Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer, said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."

    I guess they're not happy about being only #19 on the Top 500 Supercomputer List [top500.org]. Linux is considered faster than they are according to the list.

    The ol' ad hominem attack of "if you can't beat them legitimately, attack them personally" just doesn't cut it, Paul. Build a better computer.
  • by yppiz ( 574466 ) on Tuesday April 13, 2004 @11:49AM (#8848866) Homepage
    The Cray CTO makes the point that Linux clusters get, at best, just under 10% peak as sustained performance and uses this as a justification that Linux clusters are not HPCs. This is a reasonable criticism. Let's take the percentage he cites as real for a moment. Now what is the cost difference between a Linux cluster and a Cray (not some future offering, but today) and how much more of a Linux cluster could you afford? Would that offset the quoted inefficiency? Would the flexibility of being able to use commodity components further offset any advantage Cray might have? What about 24hr or same-day parts replacement without a hyper-expensive service contract? At the end of the day, I suspect the Linux cluster wins out even given the sub-10% efficiency figure Cray cites. --Pat / zippy@cs.brandeis.edu
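    The trade-off raised here can be put into rough numbers. The sketch below is an illustration only: the function name and every figure in it (peak rate, efficiency, price) are hypothetical placeholders, not data from the article or from any vendor. It compares sustained GFLOPS per dollar for two machines with the same peak rate.

```python
def sustained_per_dollar(peak_gflops, efficiency, cost_dollars):
    """Return sustained GFLOPS delivered per dollar spent.

    efficiency is the fraction of peak that is sustained (0.08 means 8%).
    """
    if cost_dollars <= 0:
        raise ValueError("cost_dollars must be positive")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_gflops * efficiency / cost_dollars


# Hypothetical inputs, chosen only to show the shape of the comparison.
cluster = sustained_per_dollar(peak_gflops=1000.0, efficiency=0.08,
                               cost_dollars=500_000)
big_iron = sustained_per_dollar(peak_gflops=1000.0, efficiency=0.40,
                                cost_dollars=5_000_000)

print(f"cluster : {cluster:.6f} sustained GFLOPS per dollar")
print(f"big iron: {big_iron:.6f} sustained GFLOPS per dollar")
```

    With these made-up inputs the cluster delivers twice the sustained work per dollar despite its lower efficiency. More generally, the less efficient machine comes out ahead whenever its price advantage is larger than its efficiency disadvantage.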
  • Funny... (Score:3, Interesting)

    by Greyfox ( 87712 ) on Tuesday April 13, 2004 @11:51AM (#8848887) Homepage Journal
    Works for Google...

    Wish I'd been there so I could have slapped him after about 3 seconds of stunned silence.

  • Problem (Score:5, Interesting)

    by rawgod0122 ( 574065 ) on Tuesday April 13, 2004 @11:51AM (#8848896)
    It all depends on the problem you are trying to solve. I have been doing some work of late that would not complete in my lifetime on the 108-node cluster that we have. But when programmed for and run on two Cray X1s, it should complete inside of a week.

    Granted there are many codes (and more every day) that will run on clusters, the big iron will never die.
  • by Anonymous Coward on Tuesday April 13, 2004 @11:54AM (#8848937)
    MySQL and PostGre ARE toy databases compared to Oracle. They don't have the functionality nor the power of Oracle. They are taking steps to start working in some of the better features of Oracle, but it'll be awhile before that comes to fruition.

    Show me a link where MS Says "Apache cannot be used for REAL web serving", you won't find it because that's not what they've said.

    As to Sun announcing that Intel and Linux cannot be used for enterprise computing, well, they're partially right. I've seen Sun boxes trudge through loads that would hardlock a Linux box, on a daily basis. That's due to the hardware and the OS being built to work with each other. No one ever accused Slowlaris of having the "snappiness" of Linux, but if I had my choice of either a Sunfire with Solaris or a Dell with Redhat on it to take the brunt of my business, guess what, Sun is getting my money.
  • by nacks1 ( 60717 ) on Tuesday April 13, 2004 @12:05PM (#8849065) Homepage Journal
    I happen to work in a facility that has had both large supercomputers (Cray T3E, J90, SGI) and Linux and *nix based clusters (Beowulf/Linux, Compaq/Tru64). The Cray CTO is correct that you can't just call every Linux cluster out there HPC. Just about anyone with networking and Linux knowledge can build a Linux cluster.

    What really makes a difference between an HPC cluster and your normal everyday cluster is the hardware interconnects used. There is a comment in the article that refers to not using I/O for memory and message passing. I am not quite sure what he means by that, but I am guessing that he is saying that the network is not used for shared memory/message passing (MPI/OpenMP/SHMEM).

    If a cluster can limit the impact of latency between nodes, either through smarter software or faster interconnects, then I can't see any reason not to consider a Linux cluster as HPC.

    Clusters without smarter software tend to be really difficult coding platforms. Some developments with things like globally shared memory might make the difference, but there will still be the problem of latency between nodes.
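    The point about latency between nodes can be illustrated with the common "latency plus size over bandwidth" cost model for one message. This is a minimal sketch, not a measurement: the function name and both sets of interconnect figures are assumptions picked for illustration and do not describe any specific product.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to deliver one message: fixed latency plus transfer time."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth_bytes_per_s must be positive")
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical interconnects (assumed numbers, for illustration only).
COMMODITY = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}
LOW_LATENCY = {"latency_s": 5e-6, "bandwidth_bytes_per_s": 250e6}

small = 64            # bytes, e.g. a few boundary values exchanged each step
large = 10_000_000    # bytes, e.g. a bulk data transfer

for name, link in (("commodity", COMMODITY), ("low-latency", LOW_LATENCY)):
    t_small = message_time(small, **link)
    t_large = message_time(large, **link)
    print(f"{name:12s} small: {t_small * 1e6:9.3f} us   large: {t_large * 1e3:8.3f} ms")
```

    Under these assumed figures the small message is roughly ten times slower on the commodity link, because its cost is almost entirely latency, while the large transfer is only about twice as slow. Codes that exchange many small messages are the ones that suffer most from a slow interconnect.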
  • by The Placid Casual ( 661461 ) on Tuesday April 13, 2004 @12:10PM (#8849138)
    The VT cluster is no more. It is being sold as individual G5s by Macmall, each with a certificate proving that it was part of the cluster!

    There were issues with unbuffered RAM, so they have decided to make a new cluster with the new 2 GHz Xserves, which use ECC RAM (and new IBM 970FX chips).

    This has resulted in massive shipping delays for the dual X-Serves, but should mean that a very, very good machine is created at VT...

  • by AngstAndGuitar ( 732149 ) on Tuesday April 13, 2004 @12:14PM (#8849189)
    My friends and I at the university created a Beowulf cluster as a project for our Linux class (just installing software, nothing impressive really, basic system administration) and benchmarked it (24 P200s, 32 MB RAM each, 10baseT network) against a reasonably fast computer (an HP C station) running the non-parallel version of the same code, and ours finished faster, even with the overhead of networking! As to being "a loose collection of unmanaged, individual" computers: we managed it pretty tightly, using NFS to provide the binaries that would be used, though we did not use NFS for the root partition, as this would have put too much stress on our poor fileserving node. If I remember properly, we used dist for password files, and aside from pushing the power switches, we could manage everything in the room without leaving our seats (24 computers running 'xlock -mode matrix', hehe).

    Basically, I disagree with Dr. Terry.
    Our project writeup is here. [csuchico.edu]
    (Please forgive any mistakes or stupidity therein; we were 15, 15, and a 30something non-geek at the time.)
  • by Anonymous Coward on Tuesday April 13, 2004 @12:15PM (#8849203)
    If I type a bunch of stuff and put these quote thingies around it while citing a source, usually that means something. But I guess since the original post was dumb, they mean something else?

    He's spouting the same ole FUD common of all Linux zealots, it just so happens he tried to present his argument using BS quotes.

    Anyone that's actually used those software packages will realize that he's an idiot.
  • Re:Help me here... (Score:5, Interesting)

    by krlynch ( 158571 ) on Tuesday April 13, 2004 @12:17PM (#8849230) Homepage

    So depending on the task at hand, the cluster might perform very well, or perhaps a little less well.

    Surely what you meant to say is that, depending on the task at hand, a cluster might perform very well, or perhaps perform atrociously. :-)

    Clusters tend to work well when the various nodes don't need to communicate very often but you need lots of cycles for the subtasks, while dedicated supercomputers tend to perform very well in tasks requiring vast amounts of internode communications bandwidth along with large numbers of cycles. If you need vast bandwidth and relatively low numbers of cycles, your pricepoint is likely a mainframe. And if you don't need either, you get a cheap desktop machine.

    Certain problems parallelize well on a cluster ... others don't. Some don't parallelize at all, and a cluster won't do you a darn bit of good. The different machines are designed for different uses ... and one should be careful not to push a "one size fits all" solution. The Cray guy clearly got it wrong on that point, and likely knows it, but he was marketing, not teaching a course in choosing hardware for the task at hand.
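    One standard way to quantify "some problems parallelize well, others don't" is Amdahl's law, the textbook bound on speedup when only part of a job can run in parallel. The sketch below is not from the comment above; the function name and the example fractions are illustrative assumptions, and the model ignores communication cost entirely, so real clusters would do worse.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    n_workers: number of processors working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative fractions: fully serial, half parallel, nearly all parallel.
for fraction in (0.0, 0.5, 0.99):
    speedup = amdahl_speedup(fraction, n_workers=1024)
    print(f"parallel fraction {fraction:4.2f} -> speedup on 1024 workers: {speedup:7.2f}")
```

    A job with no parallel part gains nothing from 1024 nodes, and even a job that is 99% parallel is capped below a 100x speedup, which is the arithmetic behind matching the machine to the problem.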

  • I'm confused... (Score:1, Interesting)

    by Anonymous Coward on Tuesday April 13, 2004 @12:25PM (#8849308)
    Are they, or are they not, owned by SGI right now?

    SGI has bought and sold the company so many times I lost track.

    The funny thing about that is, now the same problems Cray is having, SGI is having as well: (trying to sell single supercomputer machines in a market that is heading to clusters because of price.)
  • ...the same ole FUD? (Score:3, Interesting)

    by heironymouscoward ( 683461 ) <heironymouscowar ... .com minus punct> on Tuesday April 13, 2004 @12:42PM (#8849523) Journal
    You know Godwin's Law? Well, here is Heironymous' Corollary to Godwin's Law:

    "Anyone using the terms 'zealot' or 'FUD' in a Slashdot discussion is immediately declared the loser of the thread and discussion stops at that point".

    Of course I'm forced to break my own corollary to make this point.

    But to call me a "Linux zealot spouting FUD" (and excuse me for paraphrasing your lucid comment) because I mock a commercial vendor who says that the free alternative is no competition... WTF?

    As it happens: I have 20+ years of experience in IT and I've used every one of those packages (except the Cray). Oracle, MySQL, IIS, Apache, Sun, Solaris, Linux. And hundreds of other platforms, as well.

    My opinions are not those of a zealot, but pretty impartial and generally very accurate. There is a good reason, for instance, why the most critical servers in my business all run Debian Linux, why the desktops use Xandros, why our applications use MySQL, and why we're phasing out our Microsoft/COM+/IIS/SQLServer platforms. Zealotry has little to do with it, but good sense does.

    The facts are these: open source, free, commodity IT has become good and cheap enough to exceed the capabilities (at any price) of many commercial systems. Most specifically, Cray, Oracle, Microsoft, and Sun find themselves spot center of the area that has been commoditized.
  • Re:Marketing (Score:2, Interesting)

    by tomhudson ( 43916 ) <barbara,hudson&barbara-hudson,com> on Tuesday April 13, 2004 @12:45PM (#8849569) Journal
    They're just pissed because clusters are 7 out of the 10 top supercomputers, as noted here [top500.org]
  • Whatever (Score:3, Interesting)

    by hackstraw ( 262471 ) * on Tuesday April 13, 2004 @12:49PM (#8849638)
    I'd like to see Paul Terry say this in front of everybody at the Super Computing conference where they announce the Top 500 Computers [top500.org]. It's worth noting that he is not bashing Linux per se, but "Linux Clusters", which is pretty arbitrary, because he should be saying "all clusters", since the OS really doesn't have too much to do with it. Supercomputing apps run in userspace, not kernel space, and the hardware, including interconnects or some kind of interprocessor communication, drives the performance.

    The Cray XD1 looks like a nice system, but there are only theoretical performance values given, and no one can go out and buy one of these things yet. I also don't know how much these guys cost.

    I love this statement:

    Linux clusters do have a place. "For applications that require low performance, they are a cheaper solution," said Terry.

    Yeah, when we spend a million+ dollars on a supercomputer, we are thinking of low performance, because our applications require it. Thanks.

    I'm guessing this guy is a wannabe marketer who got stuck in a CTO position. There are plenty of HPC vendors out there, and trust me, if this XD1 has a good price/performance and they work (this is key), then people will buy them with few questions asked. Otherwise, this whole article is just an advertisement that makes many statements without any evidence that the XD1 is any better than 4 Xboxes connected together over a serial connection. Next....
  • by borwells ( 566148 ) on Tuesday April 13, 2004 @01:01PM (#8849801) Homepage
    I have a rack right here housing a Linux cluster: 36 1U dual-Xeon servers. The Cray XD1 [cray.com] site details the "Exceptional Performance" of the XD1 system. It gives the performance of a system with 12 AMD Opteron processors, and the performance of a rack of systems with 12 AMD Opteron processors. I understand that the underlying architecture of those servers may be vastly different from the servers in the rack next to me, but fundamentally aren't they both multi-processor PC servers operating in a cluster? If so, why does their rack full of multi-processor systems qualify as an HPC but mine does not?
  • by pragma_x ( 644215 ) on Tuesday April 13, 2004 @01:07PM (#8849900) Journal
    Despite assertions made by Linux vendors, a Linux cluster is not a high performance computer, said Dr. Paul Terry, CTO of Cray Canada. "At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers."

    Although this statement reeks of FUD, he's right about one thing: a cluster is not an HPC... that's why it's called a cluster. But to say that a cluster is 'unmanaged' is one hell of a stretch IMO. All in all, he's just arguing semantics: nothing to see here, put down your flamethrowers, move along folks.

    Since this is slashdot, I'll add that the rest of the article is full of choice quotes all of which point squarely at basic FUD + marketing spin for their new cluster-cost-like product.

    It seems to me that Cray is just plain bitter that Linux (through all the cluster solution providers) has managed to steal Cray's thunder at a mere fraction of the cost. Cray's probably even more bitter that folks are willing to sacrifice performance (at least from Cray's perspective) just to save a buck.

    Okay, this is Cray we're talking about here: people are saving millions of bucks all over the place by using clusters instead of big expensive machines.

    And guess who wants 'their' slice of the pie back.
  • by heironymouscoward ( 683461 ) <heironymouscowar ... .com minus punct> on Tuesday April 13, 2004 @01:11PM (#8849962) Journal
    Hey, will the real Anonymous Coward please stand up!?

    True. But I spout what are called "stealth opinions", being understated (or even unstated) makes them harder to criticize, and I have the advantage of being able to change opinion in mid-spout to dodge the zealots.
  • Re:Are too (Score:4, Interesting)

    by bugnuts ( 94678 ) on Tuesday April 13, 2004 @01:27PM (#8850146) Journal
    All tests for the top 500 supercomputers [top500.org] are done solving a problem using Linpack [top500.org], not some trivially parallel code such as raytracing 100,000 frames of a movie.

    Message passing is the biggest issue with such solvers, and in a way, Cray was absolutely right about Linux, although misleading. There are some tests going on now with a modified Linux kernel for doing true HPC, and it's been done in the past (I know, I've used it). Things like disk swapping pretty much immediately disqualify you for high performance computing. It has its place of course; trivially parallelizable codes are one example (Pixar).

    Myrinet was out before Gbit ethernet was really available, and also has some nifty routing capabilities. And since the bottleneck for HPC is usually message passing, high performance computing will better realize its theoretical performance as the communication speed catches up to the processor speed.

    But, to Cray's discredit, making a blanket statement that Linux can't do HPC is like saying Macintoshes can't do HPC. [top500.org]
  • Re:Help me here... (Score:3, Interesting)

    by starm_ ( 573321 ) on Tuesday April 13, 2004 @01:32PM (#8850211)
    I'm not that familiar with HPC either, but I'll try to explain what I know in layman's terms. A cluster is nothing more than a bunch of computers networked together intelligently, with an OS that is capable of separating tasks between these computers. A Cray, on the other hand, acts more like one big computer. Like the cluster, it also has hundreds of CPUs, but they are all on the same "motherboard" (if you can call it that). Some of them share memory. The memory is very high speed (sometimes in a configuration equivalent to gigabytes of L1 cache). And it often comes with a huge liquid cooling system so that it can be run at high speed. I've seen Crays with very cool cooling systems, with a running liquid fall on the front. It uses an inert liquid called something like Fluorinert. This liquid flows over the whole "motherboard", and since it is inert it doesn't conduct electricity or react with anything. Clusters are much cheaper to build, which is why people tend to use clusters, but they can't do everything Crays can.
  • Re:Marketing (Score:1, Interesting)

    by flaming-opus ( 8186 ) on Tuesday April 13, 2004 @01:47PM (#8850400)
    Which is most curious, as the XD1 IS a linux cluster. It's a very well designed linux cluster, with very high bandwidth DMA between nodes. However, it is programmed, and behaves very much like more traditional clusters of microprocessors.

    The XD1 is NOT the same as the big vector-processor X1s.

  • Re:Help me here... (Score:3, Interesting)

    by CatOne ( 655161 ) on Tuesday April 13, 2004 @01:57PM (#8850528)
    It slows things down a little, yes, but it's not a huge difference. Infiniband can do DMA across machines -- so the memory on machine 2 *can* be directly accessed by the CPU on machine 1 (i.e. the CPU on machine 2 doesn't need to be consulted).

    Sure, this reduces peak efficiency. I think on the VT cluster it was in the 50-60% range (I could Google search but I'm lazy... shoot me)... that is, the total sustained performance is about 0.5 or 0.6 times the combined peak of all 2200 CPUs. This is pretty good, overall, compared to other systems.

    But the Cray guy is full of hot air. Of course you're going to sing the praises of massive SMP when that's what you have to sell. The fact is, if 1100 dual CPU machines clustered together can significantly outperform the Cray, for less money, and they're easy to manage (they are...), then why not go that route?

    So Cray sells FUD, because it's their last option.
  • Valid Question, then (Score:3, Interesting)

    by Allen Zadr ( 767458 ) * <Allen.Zadr@nOspaM.gmail.com> on Tuesday April 13, 2004 @02:35PM (#8851043) Journal

    At what price point does the Cray XD1 come in? While huge clusters are (supposedly) cheap individual computers -- I would argue that G5s are not inherently cheap -- how many G5s that make up the Virginia Tech cluster would you have to get to before you've paid for a Cray XD1?

    I mention this because the article implies that Cray is planning on selling the XD1s at a price point cheaper than equivalent clusters. If they succeed at making the XD1 cheap enough, then it may be more cost effective to [[ effectively, cluster ]] a couple of these Crays, with less power consumption, heat dissipation and plain old real-estate.

    It seems to me that TCO would be cheaper for the Cray, especially considering that the best clusters expect 5% of the member computers to be broken at any given time.

    So, does anybody have Cray XD1 pricing? That seems to me to be the only way to rationally decide on the 'better' solution.

  • by mosel-saar-ruwer ( 732341 ) on Wednesday April 14, 2004 @11:08AM (#8859960)

    I asked this earlier in the thread: Provably non-Parallelizable? [slashdot.org]

    Allow me to ask it again: What's the state of the art of proofs of parallelizability [and non-parallelizability]?

    Is there a standard list of problems that have been proven to be non-parallelizable? Are there any problems that have been proved to be parallelizable, but for which no parallelizing algorithm has yet been discovered? Is there anything analogous to the NP-completeness conjecture in this field?
