Ask Slashdot: Best Use For a New Supercomputing Cluster? 387
Supp0rtLinux writes "In about 2 weeks time I will be receiving everything necessary to build the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us). It's spec'ed to start with 1200 dual-socket six-core servers. We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs. So, what's the best Linux distro for something of this size and scale? Any that include a chargeback option/module? Additionally, due to cost contracts, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose? Either way, all nodes will have four 1Gbps Ethernet ports. Finally, all nodes include only a basic onboard GPU. We intend to put powerful GPUs into the PCI-e slot and open up the new HPC for GPU related crunching. Any suggestions on the most powerful Linux friendly PCI-e GPU available?"
Re:I call Shenanigans!!! (Score:2, Informative)
Truth!
Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?
I get that the submitter already has a primary use... but I imagine if I were ever given that kind of budget I’d probably have to account for every CPU cycle every hour of the day (especially since I’m a programmer and should have no business with something like this ;p). I can’t imagine a budget for something like this being built on “and hopefully we’ll be able to recoup the millions of dollars by leasing it out to some TBD people”.
Also, the first person to mention bitcoin as an option gets to have their teeth rotated. I’m not joking.. we will find you..
What we do ... (Score:4, Informative)
Similar-size setup in bio-informatics in Europe. We run Red Hat 6.1 (previously CentOS 5) and LSF, with a single 1 Gbit link to each server (blades). No need for 10 GbE or InfiniBand unless you're running huge MPI jobs, which no one here does. 32 GB to 2 TB of RAM per node - some people like enormous R datasets. All works well for our ~500 users.
Re:Lost some funding? (Score:3, Informative)
Maybe the mods are a little more aware than you of the engineering and scientific FACTS about Monster Cable. Some things that you said:
Monster cables are only worth the investment for speakers and line-level / mic stuff (i.e. analogue signals). [...] But 44.1KHz 16-bit sound, converted to analogue in the transport and sent to the amp via line leads WILL benefit from Monster / premium cables, as will speaker cables of any kind.
are, I'm afraid, complete nonsense. Counterfactual, in fact. And yes, there's real science to support that. Let me go over it briefly...
A 44.1 kHz sample rate before the DAC means the maximum frequency component the cables need to handle is 22 kHz. (This is due to the Nyquist limit, as in the Nyquist-Shannon Sampling Theorem.) 22 kHz is low. Really low. Practically any old piece of wire can carry audio frequencies with perceptually flat response across the audible range and nearly no loss as long as the cable lengths are as short as they are in a typical home stereo system. The only thing you need is large diameter wire for your speaker cables to ensure they're very low resistance so that the higher currents involved in powering a speaker don't cause resistive loss in the cable.
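To put a number on how small that resistive loss actually is, here is a back-of-the-envelope sketch. The specific figures (12 AWG wire, a 3 m run, an 8-ohm speaker) are illustrative assumptions of my own, not from the post:

```python
# Resistive loss in a short speaker run, assuming plain copper wire.
RHO_COPPER = 1.68e-8  # resistivity of copper at room temperature, ohm-metres

def loss_fraction(run_m, area_mm2, load_ohms):
    """Fraction of amplifier power dissipated in the cable instead of the speaker."""
    r_cable = RHO_COPPER * (2 * run_m) / (area_mm2 * 1e-6)  # out-and-back path
    return r_cable / (r_cable + load_ohms)

# 3 m of 12 AWG (~3.31 mm^2 cross-section) driving an 8-ohm speaker:
frac = loss_fraction(3.0, 3.31, 8.0)
# comes out well under 1% of the power lost in the cable
```

Under these assumptions the loss is a fraction of a percent, far below audibility, which is exactly why thick generic wire is all a home system needs.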
As for low-power line level signals (such as CD player to amp), the most likely source of problems is actually ground loops, where the source equipment has a different ground reference than the destination. (A lesser concern is interference.) The pros don't solve this with stupid Monster Cable, they solve it by using pro equipment with balanced (differential) signaling, which both eliminates the need for the source and destination to have a common ground and provides some noise immunity.
For home stereo systems, however, making sure that everything is grounded to the same point (3 prong plugs all plugged into a single grounded power strip) is generally good enough, and noise is rarely (if ever) a significant problem.
Cluster software & GPU experence (Score:5, Informative)
I work with aggregate.org [aggregate.org], a university research group which has a decent claim [aggregate.org] to having built the very first Linux PC cluster, set some records [wikipedia.org] with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of the land in cluster technology. It is *way* overdue for an update (and one is in progress, we swear!), but we also maintain TLDP's widely circulated Parallel Processing HOWTO [tldp.org], which was the go-to resource for this kind of question for some time.
In a cluster of any size, you do _not_ want to be handling nodes individually. There are several popular provisioning and administration systems for avoiding doing so, because every organization with a large number of machines needs such a tool. The clusters I deal with are mostly provisioned with Perceus [infiscale.com], with a few ROCKS [rocksclusters.org] holdovers, and I'm aware of a number of other solutions (xCat [sourceforge.net] is the most popular that I've never tinkered with). Perceus can deploy pretty much any correctly-configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Redhat-like) or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes its money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (e.g. Warewulf [wikipedia.org] or Ganglia [sourceforge.net]) and job management systems (see next paragraph).
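A small illustration of the "never touch nodes individually" point: cluster tools in this space (pdsh, Slurm, Warewulf, and friends) address machines with compact range patterns like `node[001-100]` rather than by name. A minimal sketch of expanding such a pattern, written from scratch for illustration (the function name and the single-range grammar are my own simplifications, not any tool's actual parser):

```python
import re

def expand_hostlist(pattern):
    """Expand a simple range pattern like 'node[001-004]' into hostnames.
    Plain names without a bracketed range are returned as a one-element list."""
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", pattern)
    if not m:
        return [pattern]
    prefix, lo, hi = m.group(1), m.group(2), m.group(3)
    width = len(lo)  # preserve zero-padding from the pattern
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]

# expand_hostlist("node[001-003]") yields node001, node002, node003
```

Real tools handle comma-separated lists and multiple ranges, but the idea is the same: one pattern drives an operation across 1200 nodes at once.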
Accounting and billing users is largely a function of your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems like Slurm [llnl.gov] and GridEngine [wikipedia.org] (to name two of many) have accounting systems built in.
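For a concrete sense of what "built in" means, enabling Slurm's job accounting is typically a matter of a few lines in `slurm.conf`; the hostnames and frequency below are placeholder values, not a recommended configuration:

```
# Gather per-job CPU/memory usage on the compute nodes
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30

# Ship accounting records to the slurmdbd database daemon
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbhost
```

Once records are flowing, the `sacct` and `sreport` tools can report usage per user or per account, which is the raw material for any chargeback scheme.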
The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.
As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD) and there isn't really any concept of IPC baked into the tools to allow for distributed operations. Furthermore, GPUs are also generally extraordinarily memory- and memory-bandwidth-starved (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging. GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard - they recently dropped GPGPU support for all 4000-series cards, and had abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their CUDA environment and OpenCL.
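The "memory bandwidth starved" point can be made quantitative with the simple roofline rule of thumb: a kernel must perform at least peak-FLOPS / memory-bandwidth floating-point operations per byte moved, or it is bandwidth-bound rather than compute-bound. The card figures below are round illustrative assumptions in the ballpark of contemporary (Fermi-era) hardware, not a specific product's spec sheet:

```python
def min_flops_per_byte(peak_gflops, mem_bw_gbs):
    """Arithmetic intensity (FLOPs per byte) below which a kernel is
    memory-bandwidth-bound, per the simple roofline model."""
    return peak_gflops / mem_bw_gbs

# Assumed round numbers: ~1000 GFLOP/s single precision, ~150 GB/s bandwidth.
threshold = min_flops_per_byte(1000.0, 150.0)  # roughly 6-7 FLOPs per byte

# SAXPY (y = a*x + y) does 2 FLOPs per 12 bytes of float traffic:
saxpy_intensity = 2.0 / 12.0
# far below the threshold, so SAXPY-like kernels run at memory speed,
# nowhere near the advertised peak FLOPS
```

This is why streaming-style codes see only a small fraction of a GPU's headline number: keeping hundreds of processing elements fed through one shared memory interface is the real bottleneck.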
Yes, this is legit and no, we're not idiots (Score:5, Informative)
Re:Yes, this is legit and no, we're not idiots (Score:2, Informative)
Frequently in academic settings this is not an option. Grant money for equipment is not transferable to personnel.