Networking Supercomputing Linux

Ask Slashdot: Best Use For a New Supercomputing Cluster? 387

Posted by Unknown Lamer
from the reclaim-heat-for-silicon-diner dept.
Supp0rtLinux writes "In about two weeks' time I will be receiving everything necessary to build the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us). It's spec'ed to start with 1200 dual-socket six-core servers. We primarily do life-science/health/biology-related tasks on our existing (fairly small) HPC. We intend to continue this usage, but also to open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs. So, what's the best Linux distro for something of this size and scale? Any that include a chargeback option/module? Additionally, due to cost constraints, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose? Either way, all nodes will have four 1Gbps Ethernet ports. Finally, all nodes include only a basic onboard GPU. We intend to put powerful GPUs into the PCIe slots and open up the new HPC for GPU-related crunching. Any suggestions on the most powerful Linux-friendly PCIe GPU available?"
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by turkeyfeathers (843622) on Tuesday September 13, 2011 @05:35PM (#37392050)
    Start with the cheapest backend that'll get the system up and running, then use your supercomputer to mine Bitcoins for a few days, then use all the money you'll make to buy the InfiniBand backend (you'll probably have enough money left over to buy Monster cables to hook everything up).
  • Generating Bitcoins

    • LOL, where are my mod points when I need them? The Bitcoins will help offset the energy consumption, I'm almost sure.
  • by sconeu (64226) on Tuesday September 13, 2011 @05:36PM (#37392058) Homepage Journal

    No way in hell a project that big gets approved without a rationale.

    And no way in hell the administrator of such a project would ask Slashdot what to do with it.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Truth!

      Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?

      I get that the submitter already has a primary use... but I imagine if I were ever given that kind of budget I’d probably have to account for every CPU cycle every hour of the day (especially since I’m a programmer and should have no business with something like this ;p). I

      • Re: (Score:2, Funny)

        by Anonymous Coward

        Yes, it probably is a tax scam. It is now the US federal fiscal year end. Someone wrote a really good funding proposal and got it approved to get money for an HPC cluster to do *something*. Doesn't really matter. The grant application will have focused on broad ideas like # of cores and whatnot, not the details. A bit surprising that the network wasn't spec'd, because that is such a major cost item, but whatever; maybe the grant application's workloads are not network-bound.

        So, now that the money is approved th

      • Also, who the hell buys hardware like this without vendor support? OS and backend choices should have been part of integration from the vendor. No one buys 3000 rack-mount servers, a bunch of switches, some racks, and some storage and builds "the largest x86_64-based supercomputer on the east coast of the U.S."

        OP, if you are in any way serious about this, stop now. You don't want the largest supercomputer on the East Coast; you want a computer that works. Call SGI, IBM, Cray, or even (ewww) Oracle/Sun and g

    • Totally believable. (Score:4, Interesting)

      by khasim (1285) <brandioch.conner@gmail.com> on Tuesday September 13, 2011 @05:50PM (#37392180)

      I totally believe the submitter's question.

      Next up on Ask Slashdot:
      I just got permission to buy the biggest fleet of trucks on the east coast ... and I was wondering if anyone on Slashdot had any ideas what I should do with them.

      Followed by,
      The company I work for just purchased 10,000 acres of land on the east coast and I was wondering if anyone on Slashdot had any idea what we should do with it.

      Happens all the time!

      • You're just trying to get us arrested. Better luck next time, Mr. DEA agent....

      • by blair1q (305137) on Tuesday September 13, 2011 @06:47PM (#37392638) Journal

        Actually, it does.

        I remember taking possession of a spanking-new Thinking Machines cluster some <mumble> years ago.

        The principal investigator got it to do one particular calculation, and promised the excess would be put to good use.

        We spent our time trying to figure out what "good use" meant in that context.

        It hasn't gotten much easier.

        I say if you run out of numbers of your own to crunch these days, just hook it up to some lucky grid-computing project and let it swamp the stats.

    • by Amouth (879122)

      Agreed - I was just about to ask who was stupid enough to let someone buy that much hardware without an existing project/plan in place. And how can I get them to fund me and my start-up? (Don't have one now, but you bring the cash and I'll figure out something to do with it.)

    • by xzvf (924443) on Tuesday September 13, 2011 @06:01PM (#37392286)
      I've seen government institutions with unallocated money at the end of a budget cycle that was so micro-managed it could only be spent on a certain type of widget. I can see a university getting a late grant that had to be spent in 30 days, could only be spent on technology, and could only come out of a pre-approved catalog; some administrative type who just saw a Top 500 supercomputer list with competing universities' names on it brings up in a meeting that we should build a supercomputer, and some grad assistant says how easy it would be. They found a room with a window in it, ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest supercomputer on the east coast".
      • by geekmux (1040042)

        I've seen government institutions with unallocated money at the end of a budget cycle that was so micro-managed it could only be spent on a certain type of widget. I can see a university getting a late grant that had to be spent in 30 days, could only be spent on technology, and could only come out of a pre-approved catalog; some administrative type who just saw a Top 500 supercomputer list with competing universities' names on it brings up in a meeting that we should build a supercomputer, and some grad assistant says how easy it would be. They found a room with a window in it, ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest supercomputer on the east coast".

        Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

        Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.

        • by kcitren (72383)

          Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose?

          Nope, I never wonder because the answer is obvious. If they don't spend it this year, they won't get it next year.

        • by Zancarius (414244)

          Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

          In some parts of the DoD it's so bad that, due to the way the finances work, any unallocated part of the budget is removed for the following fiscal year, sending everyone into a scramble to spend whatever's left before the axe drops. It's no secret then that most divisio

        • by robotkid (681905) <alanc2052&yahoo,com> on Wednesday September 14, 2011 @02:00AM (#37395170)

          Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

          Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.

          Yeah, I used to wonder that too. Then my wife got a job in state government, and the answer became painfully obvious, judging by the maximum pace at which stuff gets done even when you have people willing to work hard and important problems sitting right in front of you. If you allowed unspent money to roll over indefinitely, it would create an irresistible incentive to do the cheapest job that won't get you in trouble and then hoard, hoard that money. Heck, you could stretch that 3-year project into a 5-year one by doing it very slowly. You could build up a war chest and use it on pet projects that no one approved. Or you could wait till no one even remembers the project existed anymore and then embezzle it.

          So, as inefficient as it is, the blanket rule that all money must be spent in the year in which it is allocated is a simple way to increase transparency and accountability across the board. It may even be one of the reasons anything gets done remotely on schedule in an environment where purchasing a USB cable requires 2 requisition forms, 3 vendor quotes, the signature of your boss (who is in an all-day meeting), your boss's boss (who is talking with legislators today and can't be disturbed), and pre-approval from someone in accounting (who just went on vacation yesterday).

          Of course, it would be great if getting the job done on time and under cost were somehow rewarded. But that's incentivizing success; that's profit maximizing, the corporate bottom line, whereas the Gub'ment bottom line is minimizing "embarrassment" (be it from the media, the voting public, and especially legislators on the appropriations committee). You use a Gub'ment bureaucracy for things you can't trust the for-profit world to do on its own, so the service provided has to be somewhat divorced from the revenue stream if you want more reliable results than just contracting out to a private company. (I'm sure Ron Paul would beg to differ, but then again he also probably enjoys being able to drink water out of the tap without getting sick.) You wouldn't pay a health inspector, for example, just based on the number of sites inspected per day, because that encourages as cursory a job as possible on as many sites as possible. Instead, you set a minimum quota they have to fulfill, and then make it known you'll have their head on a platter if a restaurant shows up in the news for salmonella poisoning the week after you've signed off on it. That's the Gub'ment way.

    • by AdamHaun (43173)

      It did have one. Right there in the submission:

      We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.

      • by geekmux (1040042)

        It did have one. Right there in the submission:

        We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.

        Ah, but stating an existing purpose on "fairly small" hardware and justifying the spend on the "largest x86_64-based supercomputer on the east coast of the U.S." are several orders of magnitude apart (and from common sense, for that matter). Sorry, but I'm calling shenanigans on this too.

        And if this turns out to be true, then I don't give a shit what they do with the HPC. I want to meet the person who managed to get this expense approved with basically little or no justification behind it, for

        • by Doc Ruby (173196)

          They are buying a supercomputer because their lucrative medical research is too big for the smaller HPC, but not (yet) big enough for the biggest supercomputer of its type in the region. So they're also looking for some other apps to use the extra capacity instead of it going to waste.

          That might not be true - this is just a Slashdot assertion. But there's nothing inconsistent in there to suggest it's false. It's perfectly plausible.

          You are just one of the modern type of people who make up your mind on your

    • by kcitren (72383)
      Government year-end money; use it or lose it. I've seen this happen before: there's a few hundred thousand lying around allocated to hardware acquisition. They need to spend it fast, so they find something related to what they do and get something newer, bigger, and better...
      • by Doc Ruby (173196)

        Whether or not this is a true story, and whether or not it's a government project, there is as much budget-reserving in private industry like you described as there is in government. Probably more, since government is more transparent than private business, so more people are in a position to expose that little game, which tends to inhibit it some.

    • No way in hell, indeed. Everything about this is stupid. Take the cost: 1200 servers * 2 CPUs each * at least $200 per CPU and you're seeing $480,000, not including the 1200 servers themselves, which I'm going to lowball at $200 each because I don't feel like Newegging it, and you get $720,000. If you really had a lot of money to spend in a short period of time, is this the first thing you would think of to squander it on? Do you already have 8-core desktops with dual 50" HDTVs as displays? How's all yo
  • Ummm two things (Score:4, Insightful)

    by Sycraft-fu (314770) on Tuesday September 13, 2011 @05:36PM (#37392060)

    1) Something with 10Gb really isn't a "supercomputer"; it is a cluster. Fine, but call it what it is. I really wouldn't call a cluster with InfiniBand a supercomputer either.

    2) You really should get someone involved who knows more about your project and someone who knows more about clusters/supercomputers. The questions you are asking are not ones I would want to see from the guy making the choices on a multimillion-dollar project.

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      You clearly have no idea what you're talking about. I was just part of a million-euro EU project consisting of a large partnership of universities and companies. Given the fact that none of them ever did anything, my professor gave up and defined the project on his own.
      I coded the entire project on little more than minimum wage while I was also attending classes. I managed a couple of helpers who did web design and documentation, and dealt with the rest of the partners on my own, even interacting with fancy

    • by Anubis350 (772791)
      1) You haven't been to any computer conference (like, say, SC), have you? Or worked on a supercomputer? Most supercomputers these days are clusters, and hell, one of the most common interconnects is still GigE, not even 10GigE, though that's slowly changing (check the Top500 stats if you don't believe me; I've been at SC's Top500 announcement every year for the past 4, and it's been mentioned each time). For that matter, I run jobs on a gig-based cluster every day, and for many types of work it's not necessa
      • They may call them "supercomputers", but in my mind that is mislabeling things. They work for cluster operations, where there's not a ton of inter-node communication and no need for access to memory outside your node. Well, that kind of communication is what supercomputers were made for. So in a real supercomputer, you have the ability to do that. That is also why real supercomputers cost more.

        I think it is an important distinction for that reason. While a supercomputer can do all a cluster can, the reverse is not true. Same with

        • by wagnerrp (1305589)
          The only real difference between clusters and shared-memory "supercomputers" is that shared-memory systems get a hardware assist to access remote data, while clusters have to do it all in software in the network stack and communications framework. When your InfiniBand backbone is running at 5GB/s with latencies in the hundreds of nanoseconds between nodes, where is the real cutoff? It seems more like a gradual sliding scale to me.
      • by blair1q (305137)

        I think 2) is not seeing the whole story there.

        They do have a continual use for mass quantities of computation. But it looks like it's not a 24/7 workload. And with $/core dropping like a rock, this iteration of the "biggest" may be cheaper than the last, and therefore not the sort of budgetary lightning rod that building-sized supercomputers used to be.

  • Uh oh.. (Score:5, Insightful)

    by joib (70841) on Tuesday September 13, 2011 @05:36PM (#37392066)
    Shouldn't you have figured out answers to all these (simple) questions before ordering several million dollars' worth of hardware? Sheesh. As for your specific questions:
    - IB vs. 10GbE: IB, hands down. Much better latency and more mature RDMA software stacks (e.g. for MPI and Lustre). Cheaper and higher bandwidth as well.
    - GPU: Nvidia Fermi 2090 cards. CUDA is far ahead of everything else at the moment.
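    The latency argument can be made concrete with a simple alpha-beta sketch. The latency and bandwidth figures below are assumptions (rough, era-typical numbers), not measurements:

```python
# Back-of-the-envelope comparison of per-message transfer time for the two
# interconnects under discussion, using the classic alpha-beta model:
# time = latency + size / bandwidth. All figures are assumed, not measured.

def transfer_time_us(msg_bytes, latency_us, bandwidth_gbps):
    """Alpha-beta model; bandwidth_gbps * 1e3 converts Gbit/s to bits per microsecond."""
    return latency_us + (msg_bytes * 8) / (bandwidth_gbps * 1e3)

INTERCONNECTS = {
    "QDR InfiniBand": {"latency_us": 1.5, "bandwidth_gbps": 32.0},
    "10GbE (TCP)":    {"latency_us": 50.0, "bandwidth_gbps": 10.0},
}

for name, p in INTERCONNECTS.items():
    small = transfer_time_us(1024, **p)         # a small 1 KiB MPI message
    large = transfer_time_us(1024 * 1024, **p)  # a 1 MiB bulk transfer
    print(f"{name}: 1 KiB = {small:.1f} us, 1 MiB = {large:.1f} us")
```

    Under these assumed numbers, a 1 KiB message over 10GbE is dominated almost entirely by latency, which is the usual argument for IB in tightly coupled MPI jobs.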
    • I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?

      Anyway, to get low latency those GigE links to the nodes need to be optimized. I thought this was interesting:

      High performance network technologies such as InfiniBand use a kernel by-pass method to improve performance. This capability is also available for Ethernet, but is not widely used outside of the HPC community. One such m

    • by LWATCDR (28044)

      This has got to be a troll. I mean, really: you're setting up a cluster and you have no idea about the interconnects or GPUs? Not to mention cooling or power. I picture this being put together in a spare back room on walls of plastic shelving, with APC UPSes from Best Buy.
      Who would fund such a thing?
      Here is the best of all suggestions, if this is not a troll: FIND A VENDOR. http://www.linuxclusters.com/vendors.html [linuxclusters.com]

      • by LWATCDR (28044)

        I really want to believe that you are correct, but I have dealt with government IT people before. This could be on the up and up; good lord help us all.

  • by Arnos (91951)

    Perhaps this can actually run (gasp) Crysis?

  • We're supposed to believe that you've purchased 1200 servers, 2400 six core CPUs and all the associated hardware without deciding basic things like how you're going to connect it all or what distribution you're going to use?

  • One really smooth and accurate game of Pong! Or Asteroids, if that suits your fancy... though it will require a bit more computing power :)

  • EPIC TROLLING (Score:5, Insightful)

    by jpedlow (1154099) on Tuesday September 13, 2011 @05:44PM (#37392142)
    Wow, he just TROLLED THE CRAP out of slashdot. We mad, bros!
    • but destroying the market for bitcoins has a quantifiable societal benefit. Burn down bitcoin's house while you burn in your hardware!
  • You are going to need something like that to get Skein Hash In Bash [slashdot.org] done in an acceptable time.
  • It would appear somebody got enough of a life to move out of mom and dad's basement and now wants to convert it into a Bitcoin mining hub....

  • What we do ... (Score:4, Informative)

    by Anonymous Coward on Tuesday September 13, 2011 @05:47PM (#37392158)

    Similar-size setup in bioinformatics in Europe. We run Red Hat 6.1 (was CentOS 5) and LSF. Single 1Gbit to each server (blades). No need for 10Gb or IB unless you're running huge MPI jobs, which no one here does. 32GB to 2TB per node - some people like enormous R datasets. All works well for our ~500 users.

    • by gknoy (899301)

      Thank you for posting the first informative post I saw, rather than mocking or trolling ones. :)

    • This is what the biggest USAF compute cluster uses (RH, PBS), the main difference being that it does include IB because MPI support was a requirement (and is used). Otherwise, you'd better hope your users' jobs are almost exclusively embarrassingly parallel. The cluster is based on Dell PowerEdge blades, which provided good mflop/$.

      They're playing with full size Tesla GPU cards in one of the blades. I'm not sure what will give you the best bang for the buck: Tesla/Fermi/FirePro cards in-blade, or the Nvidia

  • by recrudescence (1383489) on Tuesday September 13, 2011 @05:50PM (#37392172)
    Holy crap! Someone mentioned the word "Bitcoins" on slashdot again! It's only a matter of time before its value hits the roof again! Quick! BUY! BUY!
  • How about helping me out with some computing power for my monkeys project? http://www.jesse-anderson.com/2011/08/a-few-more-million-amazonian-monkeys/ [jesse-anderson.com]
  • Amazon's HPC cluster there in Virginia, I suspect, is way bigger than your little toy...
    plus all the agencies'.

  • I dunno... someone's still working on cancer, I think... and I know a guy who's still trying to find the Higgs boson.
    Having solved all other problems, maybe dick around on Jeopardy?
  • You need to specify additional information:

    1) What about the data and storage? Many complex applications require vast amounts of data (e.g. climate change models, CFD models, GIS data sets that can complement or take advantage of modeling). Many end users may not be very adept at accessing these data.
    2) What about the software? For example, CFD modeling software is very expensive. In some cases, open source software may not make the cut.
    3) Does it have to be a single supercomputer? Why not split into multip

  • Assuming it hasn't already been done.

  • by Fallen Kell (165468) on Tuesday September 13, 2011 @06:06PM (#37392328)
    I cannot stress this enough. As good as 10Gb Ethernet is, the latency is still horrible compared to InfiniBand.

    As for distributions: really, that depends on what you are doing and how your current applications are built/designed. Rocks Cluster is fairly nice. Unfortunately, we have not been able to deploy it due to our FOSS policies, which have really been hurting this project. So we have a mixed Red Hat and Solaris cluster using Grid Engine.
  • I work with some of the largest supercomputers in the world... and I can tell you that this is BS. There is no way this guy got someone to give him enough cash to put this together without:

    1. A Plan of what to buy / build
    2. A sound reasoning behind what would be done with the machine.

    Beyond that... that isn't even that large of a cluster. There are numerous computers on the east coast larger than that... at universities and government research labs (i.e. http://www.nccs.gov/computing-resources/jaguar/ [nccs.gov] alt

  • Two weeks away, and you still haven't spec'ed all your hardware?
    C'mon, this is a put-on!
    If you were really getting this monster installation, you would have spec'ed all aspects of the hardware, including 10GbE and GPUs and the OS, months ago.

  • Come on, folks. Is that Slashdot or what?

    • by blair1q (305137)

      I was imagining partitioning it into an enormous brigade of heterogeneous virtual machines, then hooking those up as a Beowulf cluster.

  • Not quite the perfect analogy but close enough. Seems to me that these questions should all have been answered before a single piece of hardware was ordered.
  • by PAPPP (546666) on Tuesday September 13, 2011 @07:11PM (#37392828) Homepage
    I assume this is an epic troll, but am going to give an honest answer anyway, because there are some legitimate questions buried in there.

    I work with aggregate.org [aggregate.org], a university research group which has a decent claim [aggregate.org] to having built the very first Linux PC cluster, set some records [wikipedia.org] with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of cluster technology. It is *way* overdue for an update (and one is in progress, we swear!), but we also maintain TLDP's widely circulated Parallel Processing HOWTO [tldp.org], which was the go-to resource for this kind of question for some time.

    In a cluster of any size, you do _not_ want to be handling nodes individually. There are several popular provisioning and administration systems for avoiding doing so, because every organization with a large number of machines needs such a tool. The clusters I deal with are mostly provisioned with Perceus [infiscale.com], with a few ROCKS [rocksclusters.org] holdovers, and I'm aware of a number of other solutions (xCat [sourceforge.net] is the most popular that I've never tinkered with). Perceus can pass out pretty much any correctly-configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Red Hat-like) or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes its money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (ex. Warewulf [wikipedia.org] or Ganglia [sourceforge.net]) and job management systems (see next paragraph).
    Accounting and billing users is largely about your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems, like Slurm [llnl.gov] and GridEngine [wikipedia.org] (to name two of many), have accounting systems built in.
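    The chargeback question from the submission mostly reduces to post-processing those accounting records. A minimal sketch of the idea, where the record fields and the per-core-hour rate are hypothetical (most schedulers, Slurm and Grid Engine included, can export per-job usage in some form):

```python
# Minimal chargeback sketch: given per-job accounting records, bill each
# account/group for the core-hours its jobs consumed. The record layout and
# the rate below are assumptions for illustration, not any scheduler's format.

from collections import defaultdict

RATE_PER_CORE_HOUR = 0.05  # hypothetical internal rate, in dollars

def chargeback(jobs):
    """Sum core-hours per accounting group and convert to a dollar charge."""
    totals = defaultdict(float)
    for job in jobs:
        core_hours = job["cores"] * job["wall_seconds"] / 3600.0
        totals[job["account"]] += core_hours
    return {acct: round(hours * RATE_PER_CORE_HOUR, 2)
            for acct, hours in totals.items()}

# Example: two lab accounts sharing the cluster.
jobs = [
    {"account": "biolab",    "cores": 12, "wall_seconds": 7200},  # 24 core-hours
    {"account": "biolab",    "cores": 48, "wall_seconds": 3600},  # 48 core-hours
    {"account": "energylab", "cores": 96, "wall_seconds": 1800},  # 48 core-hours
]
print(chargeback(jobs))
```

    The real work is getting clean usage data out of the scheduler; once you have that, the billing itself is trivial arithmetic like the above.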
    The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.

    As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD), and there isn't really any concept of IPC baked into the tools to allow for distributed operations. Furthermore, GPUs are generally extraordinarily memory and memory-bandwidth starved (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging. GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard - they recently deprecated all 4000-series cards from being supported by GPGPU tools, and abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their C
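    The "keeping them fed with data" point can be made concrete with a back-of-the-envelope roofline check; the peak figures below are assumed round numbers for a Fermi-class card, not datasheet values:

```python
# Rough roofline check: a kernel is memory-bound when its arithmetic intensity
# (flops per byte moved) falls below the card's compute/bandwidth ratio.
# Peak figures are assumptions for a Fermi-class card, not datasheet values.

PEAK_GFLOPS = 660.0  # assumed single-precision peak
PEAK_BW_GBS = 170.0  # assumed memory bandwidth in GB/s

def attainable_gflops(flops_per_byte):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(PEAK_GFLOPS, PEAK_BW_GBS * flops_per_byte)

# SAXPY-like kernel: 2 flops per 12 bytes moved (read x, read y, write y).
saxpy_intensity = 2 / 12
print(f"SAXPY-like kernel: {attainable_gflops(saxpy_intensity):.0f} GFLOPS "
      f"attainable out of a {PEAK_GFLOPS:.0f} GFLOPS peak")  # badly memory-bound
```

    Under these assumptions a streaming kernel sees only a few percent of the advertised peak, which is why the claimed GPU numbers are so hard to reach in practice.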
  • by Supp0rtLinux (594509) <Supp0rtLinux@yahoo.com> on Tuesday September 13, 2011 @07:28PM (#37392924)
    For everyone who thinks I trolled Slashdot... here's the quick backstory behind my question(s): Our organization received a grant to pay for this from a private philanthropist who has a medical issue that is currently being researched by one of our labs (this happens to us not infrequently). We have an existing HPC of roughly 300 nodes and 1200 cores that's all 1Gbps-connected and running Rocks 5.1. The grant money came in two payments. We used the first payment to buy the nodes (which are en route and should arrive in 2 weeks or so). The second payment was going to pay for the GPUs and the extra infrastructure (storage is one thing we currently have plenty of... both SAN and NAS).

    Unfortunately, we hit two issues: 1) one of our more seasoned enterprise admins took a new job at Apple's new NC datacenter, and 2) our cluster admin passed away from a heart attack about a week after the purchase was made. This put us into a bit of a holding pattern. We're in the process of replacing both of them, but in the meantime we A) have the equipment arriving soon and B) have the second round of grant money in hand now. We're smart enough to know that we lost two very valuable resources, and we decided to step back, pause, and re-evaluate. The servers are already bought. The infrastructure, interconnects, and GPUs are not. The old admin knew which GPUs he wanted; unfortunately we haven't found his research anywhere to know what and why. He had also planned to go with the latest release of Rocks, but only because he was very familiar with it. We know there are other options out there, and we've no idea how well Rocks can scale. Additionally, I don't see an option for chargeback with Rocks (at least not from a Google search), plus we've heard they recently lost a core developer. Thus, we went to the Slashdot community for advice.

    So I've already seen some good info on the IB versus 10GbE question, and it's much appreciated. We're still looking for info on which Linux distro and which GPU to go for. We want to make the best decision we can and use the money as wisely as possible. But we also realize that we know what we don't know, and thought the Slashdot community could provide some experience to help us make the right decisions.
    • by rish87 (2460742)
      Okay, apparently you aren't trolling, but you have to understand people's suspicions. I understand you've lost key people, but still, these sorts of decisions belong in the initial phases of the design, and everyone should be aware of them. A few suggestions: if you are running a lot of smaller parallel jobs that do most of the computation within the same node (more SMP-parallel than MPI), then you may get away without 10GbE, unless you are also moving a lot of data through the network for storage
    • I wish you'd mentioned that in your original post, because it read like "in two weeks we are making an attempt to land on the moon. We are considering dusting off one of the old Saturn series rockets or maybe going with something newer... what does Slashdot think?"

      Sorry to hear about your loss of staff. Hope it all works out for you.

    • by hackstraw (262471)

      If you want to hire me send a mail to hpc.hackstraw@spamgourmet.com. Expert in the field.

    • by Anonymous Coward on Tuesday September 13, 2011 @08:31PM (#37393402)

      "I've got 1200 servers shipping to me and my two best engineers are gone and we're not sure what to do with them when they get here."

      Best. IT horror story. Ever.

      If you are serious, go to the Supercomputing 2011 conference. Pretty much all the supercomputing geeks hang out there, and you can get all your questions answered by experts.

      As for whether to go with IB or 10GbE: go with IB if you can afford it. IB has a bunch of advantages (higher bandwidth, lower latency), but you pay for them in price.

      Good Luck.

      byteherder
    • by Sgs-Cruz (526085) on Wednesday September 14, 2011 @12:08AM (#37394638) Homepage Journal

      Are you at MIT, and is your benefactor David Koch? Because in that case, we have some researchers up at the Plasma Science and Fusion Center who do simulation work that could definitely use access to a bigger cluster. As long as you can compile Fortran on it, it should work: the TRANSP runs and GYRO simulations that we do already run on a (smaller) cluster. This falls under "energy research" and is way cool to boot.

      I'm not joking, if you are at MIT, please get in touch with Martin Greenwald (contact info on the PSFC staff page [mit.edu]).

  • Save some energy, switch it off until you find something useful to do with it. It's the Right Thing to do. ;-)
  • OS, duh! (Score:4, Funny)

    by ThurstonMoore (605470) on Tuesday September 13, 2011 @10:17PM (#37394048)

    The obvious answer is Windows HPC Server 2008.

  • by sl3xd (111641) on Wednesday September 14, 2011 @01:23AM (#37394974) Journal

    I have to wonder what you're on the east coast of. East coast of Madagascar? I work in HPC; a thousand nodes just isn't that much. We sold larger clusters than that four years ago.
