LinuxBIOS, BProc-Based Supercomputer For LANL

An anonymous reader writes "LANL will be receiving a 1024-node (2048-processor) LinuxBIOS/BProc-based supercomputer late this year. The story is at this location. The system is unique among Linux clusters in that its compute nodes have no disks: it uses LinuxBIOS and Beoboot for booting, and BProc for job startup and management. It is officially known as the Science Appliance, but is affectionately known as Pink to the team that is building much of it."
  • Re:Uses (Score:2, Informative)

    by saveth ( 416302 ) <cww&denterprises,org> on Tuesday October 08, 2002 @12:32AM (#4408103)
    Let's just hope they do something good with this. I'm tired of reading about how supercomputers are used for military war simulations.

    LANL tends to do projects that are focused much more on science and engineering than military applications. It's very likely that Pink will end up analysing spectral emissions of bombarded protons or something like this.

    The military simulations you mention probably don't happen at LANL.
  • by Anonymous Coward on Tuesday October 08, 2002 @01:13AM (#4408219)
    Not really; a new revision can be flashed with a single utility that can be run on all the nodes in parallel.
  • by goombah99 ( 560566 ) on Tuesday October 08, 2002 @01:27AM (#4408247)
    I've been a beta tester on the prototype for this system. It works great. I've seen diskless systems before; they were all NFS nightmares, could not scale, and had horrible tendencies to cause rippling crashes as one computer after the next timed out on some critical disk-based kernel operation it could not complete across a wedged network.

    This one, BProc, is different: it is completely stable. You never get NFS wedges. Jobs launch in a flash. Plus, if you do reboot, the whole thing is back up in seconds (literally).

    BProc is an incredibly lightweight job submission system. It is so lightweight and fast that it changes how you think about submitting jobs. Rather than designing long-duration jobs and tossing them on a queue, you can just run tiny short jobs if you want with no loss to overhead. It makes you re-think the whole idea of batch processing.

    When the jobs run, they appear in the process list of the master node. That is, if you run "top" or "ps", the jobs are listed right there. In fact, from the user's point of view, the whole system looks like just one big computer.
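
    For the curious, here is roughly what that single-system-image launch model looks like from C. This is only a sketch under assumptions: it presumes the BProc library's bproc_numnodes(), bproc_currnode(), and bproc_rfork() calls from sys/bproc.h, where bproc_rfork() behaves like fork() except that the child lands on the given compute node while staying visible in the master's process table.

    /* Hedged sketch of a BProc-style job launch. The bproc_* calls are
     * assumed from the BProc C library (sys/bproc.h); bproc_rfork()
     * works like fork(), but the child runs on the named compute node
     * and still shows up in the master node's "top" and "ps". */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <sys/bproc.h>

    int main(void)
    {
        int nodes = bproc_numnodes();   /* compute nodes in the cluster */

        for (int node = 0; node < nodes; node++) {
            pid_t pid = bproc_rfork(node);
            if (pid < 0) {
                perror("bproc_rfork");
            } else if (pid == 0) {
                /* Child: now running on the remote node. */
                printf("hello from node %d\n", bproc_currnode());
                _exit(0);
            }
            /* Parent: the child's PID sits in the master's process list. */
        }

        while (wait(NULL) > 0)
            ;                           /* reap all remote children */
        return 0;
    }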

  • Re:Uses (Score:3, Informative)

    by foobar104 ( 206452 ) on Tuesday October 08, 2002 @03:07AM (#4408478) Journal
    Wrong. Render farms are neither clusters nor supercomputers. At best, a render farm might be considered an array.

    A supercomputer is a single system image. Some people call large clusters "supercomputers," but technically they're wrong.

    A cluster is an interconnected group of computers that can communicate with each other. Usually a cluster depends on some kind of software layer to allow programs to run across multiple systems, something like MPI. Clusters are tightly interconnected many-to-many systems.

    An array has a single job control system and a number of job execution systems. Batch jobs are submitted by users to the job control system, which doles them out to the various execution systems and then collects the results. The execution nodes don't talk to each other, and one job runs on one execution node at a time. Render farms are basically arrays; each execution node works on rendering a single frame of a multiframe animation. Because each frame can be rendered independently, without any dependencies on the previous and subsequent frames, rendering is particularly well suited to array computing.
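
    To make the distinction concrete, here's a minimal MPI sketch (standard C bindings) of the many-to-many communication that defines a cluster: every rank's result depends on every other rank's, which is exactly what the independent jobs in an array never do. The node counts and data are made up for illustration.

    /* Minimal cluster-style (many-to-many) communication in MPI.
     * Every rank contributes a partial result and depends on the
     * contributions of all the others -- unlike an array, where each
     * execution node works alone. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank * rank;   /* this node's made-up partial result */
        int total = 0;

        /* All-to-all reduction: every node talks to every other node. */
        MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d of %d sees total %d\n", rank, size, total);
        MPI_Finalize();
        return 0;
    }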
  • Re:AMD Opteron (Score:3, Informative)

    by Hoser McMoose ( 202552 ) on Tuesday October 08, 2002 @08:31AM (#4409067)
    Ugh... I do WISH that people would stop reading "Tom's Hardware", or at least that they would get a clue first and realize that Tom doesn't know dick-all about what he's talking about most of the time.

    His comments about heat rising more than 1 C/second make NO SENSE AT ALL! They're flat-out wrong! I don't know what orifice he pulled that claim from, but it certainly had no technical backing. The chip uses a thermal diode. It will tell you the temperature whenever you poll it. It doesn't matter how fast or slow you poll it; it will give you the temp (a polling sketch at the end of this comment shows the idea). You would really have to go out of your way to break this sort of interface so that it could only handle a 1 C/s temperature increase.

    As for the heat "problem". AMD's AthlonXP chips have a maximum power consumption of roughly 50-70W. Intel's P4's have a maximum power consumption of roughly 50-70W (yes, they consume almost the exact same amount of power, check the data sheets).

    For comparison, Intel's Itanium has a maximum power consumption of around 100-130W, and IBM's Power4 is also on the high-side of 100W.
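
    Here's the polling illustration, hedged: the sensor path below is an assumption (real paths vary by kernel and lm_sensors version), but the point stands either way. The diode reports the current temperature at whatever rate you choose to poll it.

    /* Poll a CPU thermal diode under Linux. SENSOR_PATH is hypothetical;
     * substitute whatever your kernel/lm_sensors setup exposes. The diode
     * simply answers every read, so there is no "1 C/s" limit anywhere. */
    #include <stdio.h>
    #include <unistd.h>

    #define SENSOR_PATH "/sys/class/thermal/thermal_zone0/temp"  /* assumed */

    int main(void)
    {
        for (;;) {
            FILE *f = fopen(SENSOR_PATH, "r");
            if (!f) { perror(SENSOR_PATH); return 1; }

            long millideg;
            if (fscanf(f, "%ld", &millideg) == 1)
                printf("CPU temp: %.1f C\n", millideg / 1000.0);
            fclose(f);

            usleep(100 * 1000);   /* ten reads a second; the diode doesn't care */
        }
    }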
  • Re:Don't be so sure (Score:3, Informative)

    by foobar104 ( 206452 ) on Tuesday October 08, 2002 @11:56AM (#4410163) Journal
    The important thing to notice about the word "supercomputer" is that it's singular. A supercomputer is a single system image; this is implicit in the definition. This is not to say that supercomputing clusters aren't worthy; it's just that they're different in important ways from single-system-image supercomputers.

    Some classes of problems aren't suited for cluster computation. I won't pretend to be educated enough to tell you exactly which problems can and can't be adapted for cluster computation, but consider the nature of clusters to see my point. Clusters are highly scalable, but the inter-node latency is huge. An interconnect like Myrinet can get your remote messaging latencies down to the microsecond range, but the far more common MPI/PVM-over-Ethernet solution is a thousand times slower than that (a ping-pong timing sketch at the end of this comment shows how such latencies are measured). This makes it somewhat inefficient for node N to try to access a bank of memory on node M. In order for a cluster to be efficient, each node should have sufficient physical memory to hold its entire data set, and each node should be able to operate more-or-less autonomously, without having to contact other nodes.

    Supercomputers are fundamentally different from clusters. In some cases, you can do the same job with either a supercomputer or a cluster. Some jobs are better suited to clusters, while some are better suited to supercomputers. Some jobs, as I mentioned above, are better suited to arrays than to either clusters or supercomputers. It just depends on the job.
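
    As promised above, here's the kind of minimal MPI ping-pong sketch (standard C bindings; run with at least two ranks) used to measure that inter-node latency: rank 0 bounces a single byte off rank 1 and averages the round trips.

    /* MPI ping-pong latency probe. On MPI/PVM-over-Ethernet expect
     * something near a millisecond one-way; on Myrinet-class hardware,
     * closer to microseconds. Numbers will vary with the interconnect. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        char byte = 0;
        const int iters = 10000;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);

        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency: ~%.1f microseconds\n",
                   (t1 - t0) / (2.0 * iters) * 1e6);

        MPI_Finalize();
        return 0;
    }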
