Ask Slashdot: Best Use For a New Supercomputing Cluster? 387

Posted by Unknown Lamer on Tuesday September 13, 2011 @05:32PM from the reclaim-heat-for-silicon-diner dept.

Supp0rtLinux writes "In about 2 weeks time I will be receiving everything necessary to build the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us). It's spec'ed to start with 1200 dual-socket six-core servers. We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs. So, what's the best Linux distro for something of this size and scale? Any that include a chargeback option/module? Additionally, due to cost contracts, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose? Either way, all nodes will have four 1Gbps Ethernet ports. Finally, all nodes include only a basic onboard GPU. We intend to put powerful GPUs into the PCI-e slot and open up the new HPC for GPU related crunching. Any suggestions on the most powerful Linux friendly PCI-e GPU available?"

This discussion has been archived. No new comments can be posted.

Lost some funding? (Score:5, Funny)

by turkeyfeathers ( 843622 ) writes: on Tuesday September 13, 2011 @05:35PM (#37392050)

Start with the cheapest backend that'll get the system up and running, then use your supercomputer to mine Bitcoins for a few days, then use all the money you'll make to buy the InfiniBand backend (you'll probably have enough money left over to buy Monster cables to hook everything up).

- - Re: (Score:2)
    
    by leenks ( 906881 ) writes:
    
    Let me guess... you bought the green pens (or stick on rims) for the edges of your CDs too?
  - - Re: (Score:2)
      
      by KZigurs ( 638781 ) writes:
      
      To be fair, some of Britney's recordings are (actually indeed) exceptionally well mastered. You might not like her music, but if you are a proper audiophile you will still enjoy it.
      - Re: (Score:2)
        
        by webmistressrachel ( 903577 ) writes:
        
        To be fair, some people make very good recordings of power tools! And if I'm a "proper" audiophile I'll still enjoy it, will I?
  - - Re: (Score:3, Informative)
      
      by Anonymous Coward writes:
      
      Maybe the mods are a little more aware than you of the engineering and scientific FACTS about Monster Cable. Some things that you said:
      Monster cables are only worth the investment for speakers and line-level / mic stuff (i.e. analogue signals). [...] But 44.1KHz 16-bit sound, converted to analogue in the transport and sent to the amp via line leads WILL benefit from Monster / premium cables, as will speaker cables of any kind.
      are, I'm afraid, complete nonsense. Counterfactual, in fact. And yes, there's real science to support that. Let me gloss over it...
      A 44.1 kHz sample rate before the DAC means the maximum frequency component the cables need to handle is 22 kHz. (This is due to the Nyquist limit, as in the Nyquist-Shannon Sampling Theorem.) 22 kHz is low. Really low. Practically any ol
    - Re: (Score:2)
      
      by chargersfan420 ( 1487195 ) writes:
      
      It appears with posts like this, that perhaps the opposite of your signature is also true.
      
      Off-topic I know, but sorry, I couldn't resist.
    - Re: (Score:2)
      
      by Toonol ( 1057698 ) writes:
      
      Protestations notwithstanding, I still think you were trolling. That's a kindness, by the way; I'm generously assuming you don't actually believe what you're claiming.
    - Re: (Score:2)
      
      by Known Nutter ( 988758 ) writes:
      
      You're right - should've been Offtopic.
      Nobody wants another monster cable tennis-match.
    - - Re: (Score:3)
        
        by ls671 ( 1122017 ) writes:
        
        Maybe he is, I have always assumed that on Slashdot, nicknames like "webmistressrachel" could very well be owned by males ;-)
- - Re: (Score:3)
    
    by Arrepiadd ( 688829 ) writes:
    
    I work in computational chemistry and there are currently two or three codes out there using the GPU. Granted that number will only increase, but at this point having GPUs is almost useless (these codes don't do 10% of what other codes, or a combination of them, can do).
    Your mileage may vary, but assuming someone is a moron just because he isn't doing what fits you perfectly is moronic itself.
Best Use For a New Supercomputing Cluster? (Score:2, Funny)

by Jodka ( 520060 ) writes:

Generating Bitcoins
- Re: (Score:2)
  
  by Monkey-Man2000 ( 603495 ) writes:
  
  LOL, where's my mod points when I need them. The bitcoins will help offset the energy consumption I'm almost sure.
I call Shenanigans!!! (Score:5, Insightful)

by sconeu ( 64226 ) writes: on Tuesday September 13, 2011 @05:36PM (#37392058) Homepage Journal

No way in hell a project that big gets approved without a rationale.
And no way in hell the administrator of such a project would ask Slashdot what to do with it.

- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Truth!
  Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?
  I get that the submitter already has a primary use... but I imagine if I was ever given that kind of budget I’d probably have to account for every CPU cycle every hour of the day (especially since I’m a programmer and should have no business with something like this ;p). I
  - Re: (Score:2, Funny)
    
    by Anonymous Coward writes:
    
    Yes, it probably is a tax scam. It is now the US Federal Year End. Someone wrote a really good funding proposal and got it approved to get money for an HPC cluster to do *something*. Doesn't really matter. The grant application will have focused on broad ideas like # of cores and whatnot and not the details. A bit surprising that the network wasn't spec'd because that is such a major cost item, but whatever, maybe the grant application's workloads are not network bound.
    So, now that the money is approved th
  - Re: (Score:3)
    
    by DrgnDancer ( 137700 ) writes:
    
    Also who the Hell buys hardware like this without vendor support? OS and backend choices should have been part of integration from the vendor. No one buys 3000 rack mount servers, a bunch of switches, some racks and some storage and builds "the largest x86_64-based supercomputer on the east coast of the U.S."
    OP, if you are in any way serious about this stop now. You don't want the largest supercomputer on the East Coast, you want a computer that works. Call SGI, IBM, Cray, or even (ewww) Oracle/Sun and g
- Totally believable. (Score:4, Interesting)
  
  by khasim ( 1285 ) writes: <brandioch.conner@gmail.com> on Tuesday September 13, 2011 @05:50PM (#37392180)
  
  I totally believe the submitter's question.
  Next up on Ask Slashdot:
  I just got permission to buy the biggest fleet of trucks on the east coast ... and I was wondering if anyone on Slashdot had any ideas what I should do with them.
  Followed by,
  The company I work for just purchased 10,000 acres of land on the east coast and I was wondering if anyone on Slashdot had any idea what we should do with it.
  Happens all the time!
  
  - Re: (Score:2)
    
    by ColdWetDog ( 752185 ) writes:
    
    You're just trying to get us arrested. Better luck next time, Mr. DEA agent....
  - Re:Totally believable. (Score:4, Interesting)
    
    by blair1q ( 305137 ) writes: on Tuesday September 13, 2011 @06:47PM (#37392638) Journal
    
    Actually, it does.
    I remember taking possession of a spanking-new Thinking Machines cluster some <mumble> years ago.
    The principal investigator got it to do one particular calculation, and promised the excess would be put to good use.
    We spent our time trying to figure out what "good use" meant in that context.
    It hasn't got much easier.
    I say if you run out of numbers to crunch of your own, these days, just hook it up to some lucky grid-computing project and let it swamp the stats.
    
    - - Re: (Score:3)
        
        by blair1q ( 305137 ) writes:
        
        Things like that generally cost more to shut down and power back up than the power you use letting them run the screensaver.
        
        Re: (Score:3)
        
        by itamblyn ( 867415 ) writes:
        
        Right, and it's bad to turn off a car even for a second, and you're better off running the AC with the window open.
        
        Re: (Score:3)
        
        by ls671 ( 1122017 ) writes:
        
        Well, shutting your car down and powering it up excessively will cause a car gas engine to wear faster since it is generally accepted that an important part of a car engine's wear occurs when you power it up. For a short period, oil isn't evenly distributed and this causes excessive wear and stress compared to while it is running smoothly.
        For the rest things like:
        -"not shutting your water heater when you leave for 3 months will save you money because it will cost more in the end to eat the water when you get ba
        
        Re: (Score:3)
        
        by TheRaven64 ( 641858 ) writes:
        
        The boot time for an SGI Altix is about 6 hours (I was at a fun talk by the guy at SGI doing the Xen port - he'd boot half a dozen machines so that he had one to work on when he'd crashed the last one). If you power a machine like this down when it's idle, then you're basically making it unavailable for a large category of jobs. If you can do the work in 6 hours on your computer or 10 minutes on the supercomputer, it's faster to do it on your computer because the supercomputer will still be booting up when
- Re: (Score:2)
  
  by Amouth ( 879122 ) writes:
  
  agreed - was just about to ask who was stupid enough to let someone buy that much hardware without an existing project/plan in place. and how can i get them to fund me and my start-up (don't have one now but you bring the cash i'll figure out something to do with it)
  - Re: (Score:2)
    
    by corbettw ( 214229 ) writes:
    
    Well, it's on the east coast, so there are a few [dod.gov] possible [house.gov] culprits [whitehouse.gov] who come to mind who might do just that.
- While I find this highly doubtful.... (Score:4, Interesting)
  
  by xzvf ( 924443 ) writes: on Tuesday September 13, 2011 @06:01PM (#37392286)
  
  I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".
  
  - Re: (Score:3)
    
    by geekmux ( 1040042 ) writes:
    
    I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".
    Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
    Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.
    - Re: (Score:3)
      
      by kcitren ( 72383 ) writes:
      
      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose?
      Nope, I never wonder because the answer is obvious. If they don't spend it this year, they won't get it next year.
    - Re: (Score:3)
      
      by Zancarius ( 414244 ) writes:
      
      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
      In some parts of the DoD it's so bad that, due to the way the finances work, if there are unallocated parts of the budget they'll be removed for the following fiscal year, sending everyone into a scramble to spend whatever's left of their budget before the axe drops. It's no secret then that most divisio
    - Re:While I find this highly doubtful.... (Score:4, Insightful)
      
      by robotkid ( 681905 ) writes: <alanc2052 AT yahoo DOT com> on Wednesday September 14, 2011 @02:00AM (#37395170)
      
      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
      Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.
      Yeah, I used to wonder that too. Then my wife got a job in state government. And the answer became painfully obvious judging by the maximum pace at which stuff gets done even when you have people willing to work hard and important problems sitting right in front of you. If you allowed unspent money to roll over indefinitely, that would create an irresistible incentive to do the cheapest job that won't get you in trouble and then hoard, hoard that money. Heck, you could stretch that 3-year project into a 5-year one by doing it very slowly. You could build up a war chest and use it on pet projects that no one approved. Or you could wait till no one even remembers the project existed anymore and then embezzle it.
      So as inefficient as it is, the blanket rule that all money must be spent the year in which it is allocated is a simple way to increase transparency and accountability across the board. It may even be one of the driving forces anything gets done remotely on schedule in an environment where purchasing a USB cable requires 2 requisition forms, 3 vendor quotes, the signature of your boss (who is in an all-day meeting), your boss's boss (who is talking with legislators today and can't be disturbed), and pre-approval from someone in accounting (who just went on vacation yesterday).
      Of course, it would be great if getting the job done on time and under-cost were somehow rewarded. But that's incentivizing success, that's the profit maximizing, the corporate bottom line, whereas the Gub'ment bottom line is minimizing "embarrassment" (be it from the media, the voting public, and especially legislators on the appropriations committee). You use a Gub'ment bureaucracy for things you can't trust the for-profit world to do on their own, so the service provided has to be somewhat divorced from the revenue stream if you want to ensure more reliable results than just contracting out to a private company. (I'm sure Ron Paul would beg to differ, but then again he also probably enjoys being able to drink water out of the tap without getting sick). You wouldn't pay a health inspector, for example, just based on the number of sites inspected per day because that encourages as cursory a job as possible on as many sites as possible. Instead, you set a minimum quota they have to fulfill, and then make it known you'll have their head on a platter if a restaurant shows up in the news for salmonella poisoning the week after you've signed off on it. That's the Gub'ment way.
      
- Re: (Score:3)
  
  by AdamHaun ( 43173 ) writes:
  
  It did have one. Right there in the submission:
  We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.
  - Re: (Score:2)
    
    by geekmux ( 1040042 ) writes:
    
    It did have one. Right there in the submission:
    We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.
    Ah, stating an existing purpose on "fairly small" hardware vs. the justification to spend for the "largest x86_64-based supercomputer on the east coast of the U.S." are several orders of magnitude away from each other (and common sense for that matter). Sorry, but I'm calling Shenanigans on this too.
    And if this turns out to be true, then I don't give a shit what they do with the HPC. I want to meet the person who managed to get this expense approved with basically little or no justification behind it, for
    - Re: (Score:3)
      
      by Doc Ruby ( 173196 ) writes:
      
      They are buying a supercomputer because their lucrative medical research is too big for the smaller HPC, but not (yet) big enough for the biggest supercomputer of its type in the region. So they're also looking for some other apps to use the extra capacity instead of it going to waste.
      That might not be true - this is just a Slashdot assertion. But there's nothing inconsistent in there to suggest it's false. It's perfectly plausible.
      You are just one of the modern type of people who make up your mind on your
      - Re: (Score:3)
        
        by Nimey ( 114278 ) writes:
        
        Modern? Your faith in your elders is cute.
  - - Re: (Score:3)
      
      by mapsjanhere ( 1130359 ) writes:
      
      Everyone assumes this is a government funded project. I see an administrator at a start-up, running a bunch of promising biochemical/medical simulations stuff on a 20 machine cluster using some linux-based code. Now they got some serious venture capital investments, and venture capital wants fast scale-up for fast flipping. If the researchers say they can do their work in 3 years on the 20 machines or in 1 year with a couple million in new hardware, the couple million will not even cause a blink to a maj
- Re: (Score:2)
  
  by kcitren ( 72383 ) writes:
  
  Government year end money; use it or lose it. I've seen this happen before: there's a few hundred thousand lying around allocated to hardware acquisition. They need to spend it fast, so they find something related to what they do and get something newer, bigger, and better...
  - Re: (Score:3)
    
    by Doc Ruby ( 173196 ) writes:
    
    Whether or not this is a true story, or whether or not it's a government project, there is as much budget-reserving in private industry like what you described as there is in government. Probably more, since government is more transparent than private business, and so more people have access to exposing that little game, which tends to inhibit it some.
- Re: (Score:2)
  
  by Razed By TV ( 730353 ) writes:
  
  No way in hell, indeed. Everything about this is stupid. Take the cost, spending 1200 servers * 2 cpus each * at least $200 and you're singing $480,000, not including the 1200 servers themselves which I'm going to lowball at $200 each because I don't feel like newegging it, and you get $720,000. If you really had a lot of money to spend in a short period of time, is this the first thing you would think of to squander it on? Do you already have 8-core desktops with dual 50" HDTV's as displays? How's all yo
Ummm two things (Score:4, Insightful)

by Sycraft-fu ( 314770 ) writes: on Tuesday September 13, 2011 @05:36PM (#37392060)

1) Something with 10gb really isn't a "supercomputer" it is a cluster. Fine, but call it what it is. I really wouldn't call a cluster with Infiniband a supercomputer either.
2) You really should maybe get someone who knows more about your project and someone who knows more about clusters/supercomputers. The questions you are asking are not ones I would want to see from the guy making the choices on a multimillion dollar project.

- Re: (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  You clearly have no idea what you're talking about. I was just part of a million-euro EU project consisting of a large partnership of universities and companies. Given the fact that none of them ever did anything, my professor gave up and defined the project on his own.
  I coded the entire project on little more than minimum wage while I was also attending classes. I managed a couple of helpers who did web design and documentation, and dealt with the rest of the partners on my own, even interacting with fancy
- Re: (Score:3)
  
  by Anubis350 ( 772791 ) writes:
  
  1) You haven't been to any computer conference (like, say, SC) have you? or worked on a supercomputer? Most supercomputers these days are clusters, and hell, one of the most common interconnects is still gigE, not even 10gigE, though that's slowly changing (check the top500 stats if you don't believe me, but I've been at SC's top500 announcement every year for the past 4, and it's been mentioned each time. For that matter I run jobs on a gig based cluster everyday, and for many types of work it's not necessa
  - Re: (Score:3)
    
    by Sycraft-fu ( 314770 ) writes:
    
    They may call them "supercomputers" but in my mind that is mislabeling things. They work for cluster operations, where there's not a ton of inter-node communication and no need for access to memory outside your node. Well, that is what supercomputers were made for. So in a real supercomputer, you have the ability to do that. That is also why real supercomputers cost more.
    I think it is an important distinction for that reason. While a supercomputer can do all a cluster can, the reverse is not true. Same with
    - Re: (Score:2)
      
      by wagnerrp ( 1305589 ) writes:
      
      The only real difference between clusters and shared memory "supercomputers" is that shared memory systems get a hardware assist to access remote data, while clusters have to do it all in software in the network stack and communications framework. When your infiniband backbone is running 5GB/s and latencies in the hundreds of nanoseconds between each node, where is the real cut off? It seems more like a gradual sliding scale to me.
  - Re: (Score:2)
    
    by blair1q ( 305137 ) writes:
    
    I think 2) is not seeing the whole story there.
    They do have a continual use for mass quantities of computation. But it looks like it's not a 24/7 workload. And with $/core dropping like a rock, this iteration of the "biggest" may be cheaper than the last, and therefore not the sort of budgetary lightning rod that building-sized supercomputers used to be.
Uh oh.. (Score:5, Insightful)

by joib ( 70841 ) writes: on Tuesday September 13, 2011 @05:36PM (#37392066)

Shouldn't you have figured out answers to all these (simple) questions before ordering several million $$$ worth of hardware? Sheesh.. As for your specific questions: - IB vs. 10GbE: IB hands down. Much better latency and more mature RDMA software stacks (e.g. for MPI and Lustre). Cheaper and higher BW as well. - GPU: NVidia Fermi 2090 cards. CUDA is far ahead of everything else at the moment.

- Re: (Score:3)
  
  by Savantissimo ( 893682 ) writes:
  
  I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?
  Anyway, to get low latency those GigE links to the nodes need to be optimized. I thought this was interesting:
  High performance network technologies such as InfiniBand use a kernel by-pass method to improve performance. This capability is also available for Ethernet, but is not widely used outside of the HPC community. One such m
  - - - Re: (Score:3)
        
        by Savantissimo ( 893682 ) writes:
        
        That IBM whitepaper link was supposed to be: Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet [chelsio.com]
- Re: (Score:2)
  
  by LWATCDR ( 28044 ) writes:
  
  This has got to be a troll. I mean really setting up a cluster and you have no idea about the interconnects or GPUs? Not to mention cooling or power. I picture this being put together in a spare back room and walls of plastic shelving and APC UPSs from Best Buy.
  Who would fund such a thing.
  Here is the best of all suggestions if this is not a troll. FIND A VENDOR. http://www.linuxclusters.com/vendors.html [linuxclusters.com]
  - Re: (Score:2)
    
    by LWATCDR ( 28044 ) writes:
    
    I really want to believe that you are correct but I have dealt with government IT people before. This could be on the up and up, good lord help us all.
Crysis 2 (Score:2)

by Arnos ( 91951 ) writes:

Perhaps this can actually run (gasp) Crysis?
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  Perhaps it can program Crysis...
- Re: (Score:2)
  
  by poity ( 465672 ) writes:
  
  With 7000 cores, it can probably ray-trace Crysis 0.o
Riiiiight (Score:2)

by GrumpySteen ( 1250194 ) writes:

We're supposed to believe that you've purchased 1200 servers, 2400 six core CPUs and all the associated hardware without deciding basic things like how you're going to connect it all or what distribution you're going to use?
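
For reference, the totals implied by the submission's figures (1200 dual-socket, six-core servers) can be checked with a quick sketch. The function name `cluster_totals` is made up for illustration, and the calculation assumes every node is identical and ignores head nodes and hyperthreading:

```python
# Figures taken from the submission: 1200 dual-socket, six-core servers.
SERVERS = 1200
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 6


def cluster_totals(servers, sockets_per_server, cores_per_socket):
    """Return (total CPU packages, total cores) for a homogeneous cluster.

    Hypothetical helper for illustration only; assumes all nodes are identical.
    """
    cpus = servers * sockets_per_server
    cores = cpus * cores_per_socket
    return cpus, cores


cpus, cores = cluster_totals(SERVERS, SOCKETS_PER_SERVER, CORES_PER_SOCKET)
print(f"{cpus} CPUs, {cores} cores")
```

That matches the 2400 CPUs quoted above and works out to 14400 cores before any GPUs are added.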
- Re:Riiiiight (Score:4, Funny)
  
  by PPH ( 736903 ) writes: on Tuesday September 13, 2011 @05:43PM (#37392136)
  
  Happens to me when I visit Costco all the time.
  
- Re: (Score:2)
  
  by Jah-Wren Ryel ( 80510 ) writes:
  
  We're supposed to believe that you've purchased 1200 servers, 2400 six core CPUs and all the associated hardware without deciding basic things like how you're going to connect it all or what distribution you're going to use?
  Sounds like they got some of that 75 billion dollars per year of anti-terrorism money. [slate.com]
  Even though he's dead, Osama still knows how to make it rain!!
Pong (Score:2)

by Vandilzer ( 122962 ) writes:

One really smooth and acuter game of pong! or asteroids if that suits your fancy... though it will require a bit more computing power :)
EPIC TROLLING (Score:5, Insightful)

by jpedlow ( 1154099 ) writes: on Tuesday September 13, 2011 @05:44PM (#37392142)

Wow, he just TROLLED THE CRAP out of slashdot. We mad, bros!

- excuse me... (Score:2)
  
  by Thud457 ( 234763 ) writes:
  
  but destroying the market for bitcoins has a quantifiable societal benefit. Burn down bitcoin's house while you burn in your hardware!
Use it to run Skein Hash... In Bash (Score:2)

by Alain Williams ( 2972 ) writes:

You are going to need something like that to get Skein Hash In Bash [slashdot.org] done in an acceptable time.
Mom & dad's new basement data center (Score:2)

by macraig ( 621737 ) writes:

It would appear somebody got enough of a life to move out of mom and dad's basement and now wants to convert it into a Bitcoin mining hub....
What we do ... (Score:4, Informative)

by Anonymous Coward writes: on Tuesday September 13, 2011 @05:47PM (#37392158)

Similar size setup in bio-informatics in Europe. We run redhat 6.1, was centos 5 and LSF. single 1gbit to each server (blades). No need for 10gb or IB unless huge mpi which no one uses. 32GB to 2TB per node - some people like enormous R datasets. All works well for our ~500 users.

- Re: (Score:2)
  
  by gknoy ( 899301 ) writes:
  
  Thank you for posting the first informative post I saw, rather than mocking or trolling ones. :)
- Ditto On Redhat, w/PBS (Score:3)
  
  by cmholm ( 69081 ) writes:
  
  This is what the biggest USAF compute cluster uses (RH, PBS), the main difference being that it does include IB because MPI support was a requirement (and is used). Otherwise, you'd better hope your users' jobs are almost exclusively embarrassingly parallel. The cluster is based on Dell PowerEdge blades, which provided good mflop/$.
  They're playing with full size Tesla GPU cards in one of the blades. I'm not sure what will give you the best bang for the buck: Tesla/Fermi/FirePro cards in-blade, or the Nvidia
Did someone say Bitcoin!? BUY! BUY! (Score:3)

by recrudescence ( 1383489 ) writes: on Tuesday September 13, 2011 @05:50PM (#37392172)

Holy crap! Someone mentioned the word "Bitcoins" on slashdot again! It's only a matter of time before its value hits the roof again! Quick! BUY! BUY!

- Re: (Score:3)
  
  by blair1q ( 305137 ) writes:
  
  Fuck that. What's the ticker symbol for "Beowulf Cluster"?
Monkeys! (Score:2)

by eljefe6a ( 2289776 ) writes:

How about helping me out with some computing power for my monkeys project? http://www.jesse-anderson.com/2011/08/a-few-more-million-amazonian-monkeys/ [jesse-anderson.com]
hardly the biggest (Score:2)

by zeldor ( 180716 ) writes:

Amazon's HPC cluster there in Virginia I suspect is way bigger than your little toy..
plus all the agencies.
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
Multiple super-computers instead of a single one? (Score:2)

by sisukapalli1 ( 471175 ) writes:

You need to specify additional information:
1) What about the data and storage? Many complex applications require vast amounts of data (e.g. climate change models, CFD models, GIS data sets that can complement or take advantage of modeling). Many end users may not be very adept at accessing these data.
2) What about the software? For example, CFD modeling software is very expensive. In some cases, open source software may not make the cut.
3) Does it have to be a single supercomputer? Why not split into multip
break blu-ray encryption (Score:2)

by roc97007 ( 608802 ) writes:

Assuming it hasn't already been done.
As a cluster admin myself.... infiniband!!! (Score:3)

by Fallen Kell ( 165468 ) writes: on Tuesday September 13, 2011 @06:06PM (#37392328)

I can not stress this enough. As good as 10gb ethernet is, the latency is still horrible compared to infiniband.

As for distributions, really, that depends on what you are doing and how your current applications are built/designed. Rocks cluster is fairly nice. Unfortunately we have not been able to deploy that due to our FOSS policies, which have really been hurting this project. So we have a mixed Red Hat and Solaris cluster using Grid Engine.

Total BS (Score:2)

by friedmud ( 512466 ) writes:

I work with some of the largest supercomputers in the world... and I can tell you that this is BS. There is no way this guy got someone to give him enough cash to put this together without:
1. A Plan of what to buy / build
2. A sound reasoning behind what would be done with the machine.
Beyond that... that isn't even that large of a cluster. There are numerous computers on the east coast larger than that... at universities and government research labs (i.e. http://www.nccs.gov/computing-resources/jaguar/ [nccs.gov] alt
two weeks away, and you still haven't spec'ed... (Score:2)

by capsteve ( 4595 ) * writes:

two weeks away, and you still haven't spec'ed all your hardware?
c'mon, this is a put on!
if you're getting this monster installation, you would have spec'ed all aspects of the hardware, including 10gb and gpu's and OS months ago.
Imagine Beowulf of those! (Score:2)

by porky_pig_jr ( 129948 ) writes:

Come on, folks. Is that Slashdot or what?
- Re: (Score:3)
  
  by blair1q ( 305137 ) writes:
  
  I was imagining partitioning it into an enormous brigade of heterogenous virtual machines, then hooking those up as a Beowulf cluster.
Closing the barn door? (Score:2)

by Have Brain Will Rent ( 1031664 ) writes:

Not quite the perfect analogy but close enough. Seems to me that these questions should all have been answered before a single piece of hardware was ordered.
Cluster software & GPU experience (Score:5, Informative)

by PAPPP ( 546666 ) writes: on Tuesday September 13, 2011 @07:11PM (#37392828) Homepage

I assume this is an epic troll, but am going to give an honest answer anyway, because there are some legitimate questions buried in there.

I work with aggregate.org [aggregate.org], a university research group which has a decent claim [aggregate.org] to having built the very first Linux PC cluster, set some records [wikipedia.org] with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of cluster technology. We also maintain TLDP's widely circulated Parallel Processing HOWTO [tldp.org], which was the go-to resource for this kind of question for some time; it is *way* overdue for an update (and one is in progress, we swear!).

In a cluster of any size, you do _not_ want to be handling nodes individually. There are several popular provisioning and administration systems for avoiding doing so, because every organization with a large number of machines needs such a tool. The clusters I deal with are mostly provisioned with Perceus [infiscale.com] with a few ROCKS [rocksclusters.org] holdovers, and I'm aware of a number of other solutions (xCat [sourceforge.net] is the most popular that I've never tinkered with). Perceus can pass out pretty much any correctly configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Red Hat-like) or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes its money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (ex. Warewulf [wikipedia.org] or Ganglia [sourceforge.net]) and job management systems (see next paragraph).
Accounting and billing users is largely about your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems, like Slurm [llnl.gov] and GridEngine [wikipedia.org] (to name two of many), have accounting systems built in.
The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.
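
As a rough illustration of what chargeback boils down to once the job manager has recorded who ran what, here is a minimal sketch that prices core-hours per account. The record layout, account names, and rate are all hypothetical; a real setup would read these fields from the scheduler's own accounting records rather than a hand-written list.

```python
from collections import defaultdict

# Hypothetical price per core-hour, purely for illustration.
RATE_PER_CORE_HOUR = 0.05


def core_hours(cores, elapsed_seconds):
    """Core-hours consumed by one job: allocated cores times wall-clock hours."""
    return cores * elapsed_seconds / 3600.0


def chargeback(job_records, rate=RATE_PER_CORE_HOUR):
    """Sum core-hours per account and convert them to a charge.

    job_records is an iterable of (account, cores, elapsed_seconds) tuples,
    an assumed layout standing in for whatever the scheduler exports.
    """
    usage = defaultdict(float)
    for account, cores, elapsed_seconds in job_records:
        usage[account] += core_hours(cores, elapsed_seconds)
    return {account: round(hours * rate, 2) for account, hours in usage.items()}


if __name__ == "__main__":
    jobs = [
        ("genomics_lab", 144, 7200),   # 144 cores for 2 hours
        ("genomics_lab", 12, 3600),    # 12 cores for 1 hour
        ("energy_group", 1200, 1800),  # 1200 cores for 30 minutes
    ]
    print(chargeback(jobs))
```

Billing by allocated cores rather than CPU time actually used is a design choice here: it charges for what was reserved, which is usually what leasing access is about.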

As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD) and there isn't really any concept of IPC baked into the tools to allow for distributed operations. Furthermore, GPUs are also generally extraordinarily memory- and memory-bandwidth-starved (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging. GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard - they recently deprecated all 4000 series cards from being supported by GPGPU tools, and have abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their C
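
The memory-bandwidth point above can be sketched with a simple roofline-style bound: attainable throughput is the lower of the card's peak compute rate and its memory bandwidth times the kernel's arithmetic intensity. The card figures below are hypothetical placeholders, not the specs of any real GPU, and the function names are invented for the example.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style bound on throughput in GFLOP/s.

    A kernel is limited either by peak compute or by how fast memory can
    feed it (bandwidth times floating-point operations per byte moved).
    """
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical card, for illustration only.
PEAK_GFLOPS = 1000.0
MEM_BANDWIDTH_GBS = 150.0


def fraction_of_peak(flops_per_byte):
    """Share of the advertised peak that a kernel of this intensity can reach."""
    return attainable_gflops(PEAK_GFLOPS, MEM_BANDWIDTH_GBS, flops_per_byte) / PEAK_GFLOPS


if __name__ == "__main__":
    # Single-precision vector add: 1 flop per 12 bytes (two loads, one store).
    print(fraction_of_peak(1.0 / 12.0))
    # A compute-heavy kernel with lots of data reuse can hit the compute roof.
    print(fraction_of_peak(100.0))
```

With these assumed numbers, a streaming kernel like vector addition lands at a small fraction of the advertised peak, which is the "keeping them fed" problem in miniature.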
Yes, this is legit and no, we're not idiots (Score:5, Informative)

by Supp0rtLinux ( 594509 ) writes: <Supp0rtLinux@yahoo.com> on Tuesday September 13, 2011 @07:28PM (#37392924)

For everyone that thinks I trolled Slashdot... here's the quick backstory behind my question(s): Our organization received a grant to pay for this from a private philanthropist who has a medical issue that is currently being researched by one of our labs (this happens to us not too infrequently). We have an existing HPC of roughly 300 nodes and 1200 cores that's all 1Gbps connected and running Rocks 5.1. The grant money came in two different payments. We used the first payment to buy the nodes (which are en route to arrive in 2 weeks or so). The second payment was going to pay for the GPUs and the extra infrastructure (storage is one thing we currently have plenty of... both SAN and NAS). Unfortunately, we hit two issues: 1) one of our more seasoned enterprise admins took a new job at Apple's new NC datacenter and 2) our cluster admin passed away from a heart attack about a week after the purchase was made. This put us into a bit of a holding pattern. We're in the process of replacing both of them, but in the meantime we A) have the equipment arriving soon and B) have the second round of the grant money in hand now. We're smart enough to know that we lost two very valuable resources and we decided to step back, pause, and re-evaluate. The servers are already bought. The infrastructure, interconnects, and GPUs are not. The old admin knew which GPUs he wanted; unfortunately we haven't found his research anywhere to know what and why. He had also planned to go with the latest release of Rocks, but only because he was very familiar with it. We know there are other options out there and we've no idea how well Rocks can scale. Additionally, I don't see an option for chargeback with Rocks (at least not from a Google search), plus we've heard they recently lost a core developer. Thus, we went to the Slashdot community for advice. I've already seen some good info on the IB versus 10GbE question and it's much appreciated.
We're still looking for info on which Linux distro and which GPU to go for. We want to make the best decision we can and use the money as wisely as possible. But we also realize that we know what we don't know and thought the Slashdot community could provide some experience to help us make the right decisions.

- Re: (Score:2)
  
  by rish87 ( 2460742 ) writes:
  
  Okay, apparently you aren't trolling, but you have to understand people's suspicions. I understand you've lost key people, but still, these sorts of decisions are important for initial phases of the design that everyone should be aware of. A few suggestions: If you are running a lot of smaller parallel jobs that do most of the computation within the same node (more SMP-style parallelism rather than MPI) then you may get away without using 10GbE unless you are also moving a lot of data through the network for storage
- MOD PARENT UP (Score:2)
  
  by jamesh ( 87723 ) writes:
  
  I wish you'd mentioned that in your original post, because it read like "in two weeks we are making an attempt to land on the moon. We are considering dusting off one of the old Saturn series rockets or maybe going with something newer... what does Slashdot think?"
  Sorry to hear about your loss of staff. Hope it all works out for you.
- Re: (Score:3)
  
  by hackstraw ( 262471 ) writes:
  
  If you want to hire me send a mail to hpc.hackstraw@spamgourmet.com. Expert in the field.
- Re:Yes, this is legit and no, we're not idiots (Score:5, Funny)
  
  by Anonymous Coward writes: on Tuesday September 13, 2011 @08:31PM (#37393402)
  
  "I've got 1200 servers shipping to me and my two best engineers are gone and we're not sure what to do with them when they get here."
  Best. IT horror story. Ever.
  
- Re: (Score:3)
  
  by byteherder ( 722785 ) writes:
  
  If you are serious, go to the SuperComputing 2011 conference. Pretty much all the supercomputing geeks hang out there and you can get all your questions answered by experts.
  
  As for whether to go with IB or 10GbE, go with IB if you can afford it. IB has a bunch of advantages (higher bandwidth, lower latency), but you pay for it in price.
  
  Good Luck.
  
  byteherder
- Re:Yes, this is legit and no, we're not idiots (Score:4, Interesting)
  
  by Sgs-Cruz ( 526085 ) writes: on Wednesday September 14, 2011 @12:08AM (#37394638) Homepage Journal
  
  Are you at MIT and is your benefactor David Koch? Because in that case, we have some researchers up at the Plasma Science and Fusion Center that do simulation work that could definitely use access to a bigger cluster. As long as you can compile FORTRAN on it, the TRANSP runs and GYRO simulations that we do are already run on a (smaller) cluster. This falls under "energy research" and is way cool to boot.
  I'm not joking, if you are at MIT, please get in touch with Martin Greenwald (contact info on the PSFC staff page [mit.edu]).
  
- - Re: (Score:3)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    Steve Jobs gave you how much funding?!
    And then hired their sysadmin out from under them? No.
- - Re: (Score:3)
    
    by DarwinSurvivor ( 1752106 ) writes:
    
    Did you just ask for a job while posting as Anonymous Coward and THEN ask them to post their email as a public reply to it?!?
  - - Re: (Score:3)
      
      by afidel ( 530433 ) writes:
      
      The 2050 is what HP uses in the SL390 cluster configuration because they can actually cool and power 8 of them in a 4U enclosure. Since the M2070 has the same power draw, it should be capable of the same density.
The best use ... switch it off while it's not used (Score:2)

by Lazy Jones ( 8403 ) writes:

Save some energy, switch it off until you find something useful to do with it. It's the Right Thing to do. ;-)
OS, duh! (Score:4, Funny)

by ThurstonMoore ( 605470 ) writes: on Tuesday September 13, 2011 @10:17PM (#37394048)

The obvious answer is Windows Server 2008 HPC.

BS on "largest cluster" (Score:3)

by sl3xd ( 111641 ) writes: on Wednesday September 14, 2011 @01:23AM (#37394974) Journal

I have to wonder what you're on the east coast of. East coast of Madagascar? I work in HPC; a thousand nodes just isn't that much. We sold larger clusters than that four years ago.

- Re: (Score:2, Funny)
  
  by oobayly ( 1056050 ) writes:
  
  Indeed, it's a bit like somebody writing in to Dear Deirdre and saying "I've a 13 inch cock, how can I make girls aware of this, and what's the best way to make use of it?"
  - Re: (Score:3, Interesting)
    
    by webmistressrachel ( 903577 ) writes:
    
    No it's not, some really ugly, nerdy guy out there has a big cock and nobody is interested in him - he can't just flop it out in public, so that might be a very real problem for him! Or maybe he does, and girls only want him for that?
    Back on topic, it's not like that at all because the computer is probably real, and if not, it's just another hypothetical "Ask Slashdot" for us to fantasize over. "What would you do if you had...". What's wrong with that? Just my 2 pence!
- Re: (Score:2)
  
  by 93 Escort Wagon ( 326346 ) writes:
  
  Wait a moment here. You're this close to receiving your hardware and you don't even know what O/S you're planning to use, what interconnect to choose, or what problems you intend to solve with it? Where do you get funding like this?
  Yeah, I think we need more specific info here. I can't see any way a group would attract funding without spelling out all these items... however the submitter doesn't actually refer to funding, he states "I will be receiving everything necessary to build ...". What does that mean, exactly? Did he just buy hundreds of 386-based machines off the scrap heap? And, more importantly, does this person's supervisor know he apparently seems to think this is his own personal playground rather than a professionally ru
- Re: (Score:2)
  
  by ArsonSmith ( 13997 ) writes:
  
  Yea, I almost got caught by that super computer in the impulse buy section at the drugstore checkout too.
- Re: (Score:2)
  
  by stox ( 131684 ) writes:
  
  That is exactly what we used to burn in the first SGI Origin 2000, and later the first few thousand nodes of a Linux cluster at Fermilab.
- Better than SETI (Score:2, Offtopic)
  
  by tomhudson ( 43916 ) writes:
  
  Help everyone here on earth
  Generate every single possible combination of software or business method patent, and break the patent office once and for all.
- Re:SETI ! (Score:4, Insightful)
  
  by Jarik C-Bol ( 894741 ) writes: on Tuesday September 13, 2011 @09:49PM (#37393920)
  
  screw SETI, run folding@home and find the cure for cancer. We need that a little more than we need to stare at the sky, wishing someone would call from alpha centauri or some such place.
  
- What? (Score:3, Insightful)
  
  by sycodon ( 149926 ) writes:
  
  Isn't this shit you should have had all figured out before you even applied to whatever company, agency, government, etc, you got the money from?
  WTF is this? I can only hope you didn't get money from the feds.
  "Hey, look! The feds gave me a shit load of money to get this cool super computer...what should I do with it?"
  Seriously... if you got any government money for this then you are a first-class tool for not having all of this known before you even applied.
- Re: (Score:2)
  
  by GameboyRMH ( 1153867 ) writes:
  
  Makes me feel like an idiot for not applying to a job working on a small cluster used for climate research. I didn't apply because I didn't have any HPC knowledge.
  If I knew I could get hired without knowing shit I would've given it a shot!
- Re: (Score:2)
  
  by drwho ( 4190 ) writes:
  
  OK, LFTR (Liquid Fluoride Thorium Reactor) development would be useful. Can you explain what modeling needs to be done? Is this merely a provisioning problem (you haven't got the computational resources), or is it also a programming problem, and perhaps even an algorithm problem (do you know what you want to compute)?
  Another question is, who would own the results?
