Linux Software

Linux Clusters Explained 53

tramm writes: "As someone who works on massively parallel Linux clusters every day, I get tired of explaining why it is not 'Just another Beowulf'. Linux World has a good article on the four major types of Linux clusters. Our work is in supporting scientific codes that have a high degree of communication. This requires a very different system from the standard Beowulf-class machines that excel at the 'embarrassingly parallel' codes that do not require as much communication. The cost of the network interconnect for a high-performance cluster is vastly more than that of a generic 100base-T system."
This discussion has been archived. No new comments can be posted.

  • by -brazil- ( 111867 ) on Tuesday April 11, 2000 @08:00AM (#1139329) Homepage
    Preach on. The need for really fast node interconnection is what still makes people buy honest-to-god supercomputers. Look at this baby [lrz-muenchen.de] that was just installed next door to me.

    No Linux clustering project will ever reach the performance of such systems (though some of them might eventually run Linux), but the low-end high performance computing market (yes, I know it sounds oxymoronish) is bound to be taken by Linux.

  • by Signal 11 ( 7608 ) on Tuesday April 11, 2000 @08:02AM (#1139330)
    I suppose you don't consider putting a dozen webservers into a round-robin a cluster, either...

    Well, anyway.. "clustering" can mean just about anything to anyone, given the context. A system administrator probably thinks clustering is taking the same network resource and distributing it across several machines (which typically appear on the network as one logical object). A physics major might think clustering is a huge room full of SGIs all doing rendering and particle analysis... and an anonymous coward might think that clustering allows them to download pictures of natalie portman pouring hot grits down her pants. The point is that it means different things to different people.

    Boiled down to the basics, a cluster is simply a group of machines working towards a common goal (whether it be filesharing, parallel computations, or whatnot). There may only be 4 types now, but next week, there'll be 6.

  • by Bob(TM) ( 104510 ) on Tuesday April 11, 2000 @08:08AM (#1139331)
    The beauty of the Beowulf type solution is that you can tailor the configuration to the problem.

    In the past, you had to shell out beaucoup bucks just to get out of the chute. With a Beowulf strategy, you can make hardware improvements in concert with different software approaches to achieve an optimal price/performance ratio.
  • As IBM is now our friend, I'd like to see them port the switch adapter and PSSP to Linux. Is the switch adapter PCI? I know the nodes are PCI but I can't remember whether the Switch Adapter goes in a special slot or not.
  • by rho ( 6063 ) on Tuesday April 11, 2000 @08:10AM (#1139333) Journal

    Nerd Clusters, which are more widespread than Beowulf, and more scalable.

    For example, a Nerd Cluster, using ChineseTakeOut messaging, is often used in last-minute, panic-stricken Intranet roll-outs, yet each node of a Nerd Cluster can answer simple management questions such as, "Hey, my PC at home crashes all the time. How can I fix it?"

    Nerd clusters are, however, more dangerous to operate. If, for example, you say "Let's migrate our core applications from Solaris to NT", you run the risk of massive memory leakage as individual Nerd-nodes begin to prioritize jobs such as "update_resume" over your request queue.

    Nerd clusters need a "master" node as well. These can generally be identified by their bushy beards, or a long string of nodes queueing up to beg for static IPs.

  • by Arctic Fox ( 105204 ) on Tuesday April 11, 2000 @08:11AM (#1139334) Homepage Journal
    Come up with a cool name.

    If i was on one of these teams,
    the beast would be called...

    COCK
    Centralized
    Organizational
    Computational
    Kluster.

    A leading female computer scientist was overheard saying, "My, what a big COCK you have!"

    "University of ________ researches have built the world's largest COCK."

    "Some bugs were found in the COCK today."

    It could go on for days.


    Be thankful you are not my student. You would not get a high grade for such a design :-)

  • I agree with you entirely. But the price-to-flop ratio of a Beowulf cluster is well below that of a traditional supercomputer (on embarrassingly parallel problems, that is). This is enough for some researchers to implement cheaper clusters instead. Unfortunately, this may drive supercomputers into more of a niche and might make Cray (whoever owns them now, anyway) and IBM rethink their strategy.

  • There are very different types of clusters out there (MOSIX being one of the more interesting ones) but...

    Can you run Beowulf on them? `8r)

    --
    Gonzo Granzeau

  • by epaulson ( 7983 ) on Tuesday April 11, 2000 @08:16AM (#1139337) Homepage
    Condor [wisc.edu], from the University of Wisconsin, should have been listed on page two. Condor is a high-throughput computing system that runs on UNIX (virtually all flavors) and NT. We support MPI and PVM. We can run regular jobs, or you can relink with our libraries and get transparent checkpointing and remote I/O. You can use sockets in your job.

    You don't need to have a dedicated cluster - Condor started life as a scavenger of idle workstations. We run Condor on every workstation here at CS, and routinely recover several thousand CPU-hours a day that otherwise would have been wasted. You can configure Condor to run with any policy you want on a per-workstation level - only run jobs at night, only run jobs from this group, only run jobs if the wind is blowing from the west - whatever makes sense to the workstation's owner.


    Best of all, we're free-as-in-beer.


    If you have any questions, send us mail at condor-admin@cs.wisc.edu [mailto]
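
    Roughly, submitting a job to a pool looks like this - a minimal sketch only, with made-up job and file names, and policy attribute names from memory, so check the manual for the exact syntax:

    $ cat my_sim.submit
    universe   = vanilla
    executable = my_sim
    output     = my_sim.out
    error      = my_sim.err
    log        = my_sim.log
    queue
    $ condor_submit my_sim.submit

    The per-workstation policy is just an expression in that machine's config, e.g. something like
    START = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
    to only run jobs when the owner has been away from the keyboard and the machine is otherwise idle.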

  • "clustering" can mean just about anything to anyone, given the context.

    Well, to me, it means any team activity (usually as part of employment) that results in completely botched results.

    Example: "My ISP really clustered on that DNS switchover."

    :)

  • Pretty soon the fridge will have enough power to browse the web - but where are you going to encode your home video DVDs, etc.? You wouldn't want a big cube under your desk; it would either be in the basement or distributed around the house.

    Maybe a thin Beowulf client will get into the kernel (I'm not into the technique) and Linux will be the answer, or Micro$oft's research on distributed computing will make it.

    Whatever the future holds, take care of the good things in UN*X, like X-windows on the server in the cellar, and don't let fancy desktops bring in anything that binds you to your machine instead of leaving you free to grab any spare resource on the network.

    BTW, any expert out there know how I can start a shell session on my server, log out of my X terminal without quitting it, and check on the progress next time I log on?

  • by Zurk ( 37028 ) <zurktech AT gmail DOT com> on Tuesday April 11, 2000 @08:21AM (#1139340) Journal
    It depends.. fiber-optic interconnects are already dropping in price, and if you can get a fast enough bus on your cluster nodes and a fast enough interconnect, it might catch up to traditional supercomputers. E.g., if you have 256 SGI Octane machines with internal gigabyte-per-second-class data buses connected via improved fiber interconnects (say 16 Gbps cards & terabit switch connects), it would equal your supercomputer's internal bus. Granted, this thing is a bit far off, but not as far off as some people think.
  • Sure. Off-the-shelf systems will catch up with supercomputers. With today's supercomputers. But supercomputers are being improved just as quickly as desktop systems. Today's Big Iron is tomorrow's heap of useless junk; that's how it is with computers.
  • Yes, the current switch adapter is a PCI card. There is also a Micro Channel version since there are still a lot of MCA-based nodes out there.

    However, IBM has said that the PCI bus is too limiting and the next generation switch will have a special interface (probably right on the CPU bus).
  • fascinating thing about these types of clustering solutions is that when they are compared to large pricey mainframe systems and their relative size is taken into consideration along with the administrative and licensing costs is that you have wasted 15 seconds reading this post.
  • Yeah, but free beer is still good beer ;-)
  • Beowulf is also a character in "Grendel" by John Gardner, a modern retelling of the medieval epic poem "Beowulf".
  • by zorgon ( 66258 ) on Tuesday April 11, 2000 @08:41AM (#1139346) Homepage Journal
    ...that SOMEONE out there at least is using Linux and clustering technology(ies!) to do some real work (pause to don asbestos gumby suit), as opposed to merely sucking down Internet bandwidth. Which is what I'm doing right now. Oh no!
    SIGHYPOCRISY received: Dumping core
    panic: Hypocrisy error in SIMM 0x0B
    panic: Hypocrisy error in SIMM 0x0B
    panic: Hypocrisy error in SIMM 0x0B
    panic: Hypocrisy error in SIMM 0x0B
    Syncing filesystems... [11][8][6][3.14159][0][0][0][0][0][0][0][0][-1]
    System Halted
    Press any key to reboot
  • A not-so-standard use of ttysnoop may work. With ttysnoop you can 'spy' on any telnet/console session. I keep a snoopified shell session opened from the console whenever I need to keep tabs on a lengthy process from a variety of locations at random times. Just telnet/open a term to the server and interactively spy on the persistent session. If you're dealing with X apps that need to be run on the server and seen at random times/locations, VNC will fill the bill.
  • Any tips as where to find this? I'm a big Beowulf fan.
  • 'Clustered' in this case is, of course, just a shortened derivative of the real term - cluster-fscked (edited for TV)...
  • by sjames ( 1099 ) on Tuesday April 11, 2000 @09:00AM (#1139350) Homepage Journal

    No Linux clustering project will ever reach the performance of such systems (though some of them might eventually run Linux),

    With standard Wintel hardware, agreed. I predict that the hardware will migrate to commodity components as the off-the-shelf cluster becomes more popular. In many supercomputers, the CPUs are nothing special; it's all in the interconnect hardware. The real bottleneck at the moment is the PCI bus. However, 64-bit 66 MHz buses are becoming a bit more common now. The rollout of 64-bit CPUs will speed that along. AGP shows some promise. Clearly, the industry is realizing that 33 MHz 32-bit PCI isn't fast enough. Bridge chips are getting smarter. The next step is to replace the bus with a crossbar switch driven by the bridge chips and providing backwards compatibility for existing PCI devices. I also wouldn't be surprised to see gigabit serial devices connected to the FSB soon.

    At that point, the gap between custom supercomputing hardware and commodity clusters will be much smaller. Every time the interconnect gets faster and lower latency, the problem set that can be handled by commodity clusters expands.
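
    To put rough (peak, theoretical) numbers on that: 32-bit/33 MHz PCI tops out at 4 bytes x 33 MHz = ~133 MB/s, shared by every device on the bus, while 64-bit/66 MHz PCI gives 8 bytes x 66 MHz = ~533 MB/s. A single gigabit-class interconnect already wants on the order of 125 MB/s each way, so the difference matters.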

  • by Anonymous Coward
    When someone brings up clustering at Slashdot, everyone always jumps and says "Beowulf".
    But Beowulf is not really clustering technology. It is not really a technology at all - it is simply a buzzword. Beowulf (for example the 'Extreme Linux' distribution from NASA) is simply plain Linux with some user-space programming libraries like PVM and MPI, which are developed elsewhere. Networking a couple of computers and running PVM on them is nothing new. Networks of Sun SPARCs were common long before anyone had heard of Linux.

    True clustering technology should be integrated at the OS level. The OS must be aware that it is running in a cluster environment, and deal with this in appropriate ways, such as balancing processes over the entire cluster.

    For example, using "clustering" technology like Beowulf (PVM), processes get allocated statically on each node in a round-robin fashion. If more than one user is running on the cluster, or the cluster is heterogeneous - different speeds and memory sizes - PVM will give far from optimal performance. If some node begins thrashing, i.e. it fills up all its physical memory, PVM will not move any processes off that node so that the thrashing stops.

    The only technology currently available that does all of this in Linux transparently to the user is MOSIX (http://www.mosix.org). It integrates into the Linux kernel and provides transparent process migration for load and memory balancing.
    This means that you can run both PVM and legacy applications on a MOSIX/Linux cluster and achieve optimal performance.
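
    For what it's worth, a minimal sketch of what that static setup looks like from the PVM console (hostnames and the task name are made up; exact console syntax may differ a little between PVM versions):

    $ cat hostfile
    node01
    node02
    node03
    node04
    $ pvm hostfile          # start pvmd on each listed host and open the console
    pvm> conf               # show the (fixed) virtual machine
    pvm> spawn -8 my_task   # 8 tasks placed round-robin across the 4 hosts
    pvm> halt

    Nothing here reacts to load or memory pressure once the tasks are placed - which is exactly the gap MOSIX fills.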

  • Oh great, just what we needed /.

    Now we're going to have 12 other types of clusters posted by trolls!

    Can you imagine... Jessica 2 cluster?

    sheesh :-)

  • Maybe this should be an Ask Slashdot but...

    I want a system that allows every CPU on my network to be available to every process on the network as if it were another CPU in the same machine. Sort of Distributed Multiprocessing.

    I want a system where I can add a HD to any machine on my LAN and have that added into a single pool of diskspace much like multiple drives are attached to the root filesystem in Unix.

    Then in this model every computer consists of a "CPU server" a "Disk server" and a terminal with attached peripherals, like keyboard, mouse, scanner, joystick etc. The whole network is literally a single high availability computer.

    If I set fire to a particular box then the computer/network just doesn't use those resources anymore. When I replace that box the whole network/computer is faster/has more space seamlessly.

    Every application sees a "simple" multitasking environment. It tries to execute on the local node (i.e., the one the terminal is connected to) and draws resources as needed from any other nodes.

    The entire thing should be asymmetric, so if I try to run Quake on a 386, it just runs out to the network right away and uses additional CPUs, possibly even assigning the whole process to another, more capable CPU like the Athlon in the next room.

    BTW, any expert out there know how I can start a shell session on my server, log out of my X terminal without quitting it, and check on the progress next time I log on?
    If I understand your question, nohup is your friend.

    I routinely run non-X, non-interactive apps thus:
    nohup prog >output 2>&1 &

    Later when I log in, I
    tail output

    YMMV on C shell.

    Do note that some versions of AIX have a bug in ksh which prevents a proper nohup and you should exec ksh first.

    Regards,
    Dean

  • The problem you will run into with a system like that is not CPU power, but I/O and latency. I believe even the best switches only allow 64 nodes; that is why IBM's Beowulf clusters stop at 64 machines. I believe, however, that QNX allows you to use CPUs like you are describing, and a distributed filesystem like CODA will allow the storage to be distributed.
  • Speaking of trolls, I just got the deluxe edition Star Wars I, which comes with 5 frames of film from the movie. The 5 frames that I got in my copy all have Natalie Portman looking, well.. erm... petrified.

    I immediately thought of Slashdot!
  • Just wanted to toss this out for the experts to chew on - I'd be interested in hearing your thoughts about clustering technology applied in a business setting.

    Where I work, there has been a nightmarish endeavor to implement a data warehouse (oodles and oodles of sales history data stored along a variety of dimensions, available for quick access and reporting). I haven't been involved in this project, but I understand that the technical challenges have been daunting, given a reasonable financial limit. One package simply failed to accommodate the amount of data once heavy loads were run, while another failed to process nightly updates within a specified timeframe.

    My question is, would one of these clustering technologies be applicable to a data warehouse? I think of the 100 or so PCs that sit around the office doing nothing overnight, and wonder if an investment in some extra disk and some elbow grease might give us at least some functionality while handling that massive amount of data. Just a thought...

  • I'm fairly sure that there's a Linux driver for it too.

    God I love that network hardware. Another nice technology is SCI from Dolphin (http://www.dolphinics.com/)

  • Condor was a bitch. Damn thing would always chew up precious CPU cycles on the poor DECstation 5100s I'd be using.

    Anyway, they also forgot about Eddie [eddieware.org] which is mainly designed for redundancy and load balancing for web servers.

  • WHY do you want to keep backwards compatibility, pray tell?

    Because it's a gimme. The hardware for PCI already exists, and there are bridge chips right now that can just about fit the bill. The cost of going that way is a deciding factor (it's a relatively cheap solution). It helps to avoid the chicken-and-egg problem:

    Cluster users represent a small minority of PC purchases. Servers are a much bigger market, but the price/performance would have to be comparable to the current server PCs to sell there. A manufacturer has a much better chance of a decent-sized market by incrementally improving on existing standards while maintaining backward compatibility. That's the whole point of commodity clustering.

  • It really depends what you're doing with your data. Thousands of inserts, updates, and deletes aren't so hot on a distributed system. Queries can be really nice, though.

  • by tolldog ( 1571 ) on Tuesday April 11, 2000 @11:42AM (#1139362) Homepage Journal
    Having just purchased a Linux render farm, I can really appreciate this article. We went through the process of determining the best solution for our system, and for the software that we use for rendering (A|W's Maya [sgi.com]) and load balancing (Platform's LSF [platform.com]), the answer was to expand to Linux boxes and keep using the same software.

    We explored Beowulf, but after talking to those that are in the know, Maya's tile renderer is not well suited to a Beowulf system.
    I looked at other solutions as well, but due to shared memory and the network bottleneck, nothing could take what we saw as a distributed system and turn it into a parallel system.

    By using a load-balancing cluster, we are given the opportunity to render multiple frames at the same time, giving us a speed advantage. This uses more overall memory than a massively parallel Beowulf cluster, but it keeps the speed gain of a parallel system the same. The overhead exists for scene file loading because that is done on every machine, but it takes minutes when rendering takes hours. A fair trade.
    The distributed system needs horsepower and memory more than network speed or filesystem speed. It is true that an increase in those will speed up the process, but the money is better spent on CPU and memory. Our systems are all dual 600 MHz with a gig of RAM per box. It may seem extreme, but from our SGI render benchmarking, the scenes that we render can take over 500-600 MB of system memory.

    Is it worth the cost?

    We are taking our current render system of SGI boxes, which are currently used as desktops during the day and render boxes at night, and adding full-time render boxes as well. The cost comparison of a Linux render box can be seen in the hardware price alone. We are using these Linux boxes to keep par with boxes that cost at least 3x as much.
    The only disadvantage is that the Linux boxes cannot be rolled out as desktop systems when new hires arrive, whereas the SGI boxes can. This is due to Maya's modeler being SGI/NT only and our support of Maya on the SGI only.

    All in all, in our situation, a Linux cluster is a godsend, allowing me to have more horsepower and the company to save money.

  • The article says that there are 3 kinds of clusters, not 4. They are scientific clusters (like Beowulf -- think of them as CPU clusters), load-balancing clusters, and high-availability clusters.
  • Being a clustering product manager at a Linux company, I've been cornered into coming up with a definition of clustering that seems to fit most technologies that people consider clustering. It may have been abstracted to the point of being overly vague, but the following definition seems to fit most forms of clustering...


    "A cluster is a group of systems, bound together into a common resource pool. A given task, whether that task is a web server, mathematical calculation, or robotic cooking widget is able to be properly and arbitrarily executed on any of the member nodes within the cluster.
    (This does not imply that it can be run concurrently or in parallel on multiple nodes)

    To the 'outside world', or the entity using the cluster, the cluster appears as a single object. That is to say, the cluster has the image of a single system. (single system image)"


    I sometimes think an accurate and useful definition of 'cluster' is one of the holy grails of this industry. It's about as overloaded a term as the word 'stuff'. :) If there are comments that can make the above definition better, please feel free to comment.

    Aaron McKee
    Clustering Products Manager
    TurboLinux, Inc.
  • There was a cool project at Berkeley called Global Layer Unix that was the backbone of the NOW (Network of Workstations) project.

    There were two design goals: operating system independence and program transparency. You could install and run the GLU daemons and programs on any system, and programs would be able to run on any node in the cluster without knowing about it (of course, for special GLU features there was a library of routines to use).

    The GLU project purposely didn't make any kernel modifications to help in portability. A few of the research papers gave ways to modify the kernel to help performance or feature set, but that wasn't the goal.

    GLU also supports another Berkeley project called Split-C, which is a parallel extension of the C language. Use this and you do not need to use the GLU library, gcc will generate appropriate code (I think that is how it worked?).

    http://now.cs.berkeley.edu/Glunix/glunix.html
    http://now.cs.berkeley.edu
  • > "The limits on today's Internet are no longer determined by raw bandwidth, but rather by how
    > well the different network components work together," said Brian Valentine, senior vice
    > president of the Windows Division at Microsoft.

    It is clear to me that Microsoft and I do not share the same uplink provider...
  • LOL,

    Ya know, I was talking to one of the guys who worked on ASP at Microsoft; originally it was going to be called "Active Server Scripts", but the acronym "ASS" led to a slight change.

    -Jon
  • by Anonymous Coward
    Have to agree with your point. Anybody who thinks Beowulf is the end-all doesn't understand HPC and hasn't taken any distributed programming courses. The idea is simple: if you need a lot of communication between your nodes, then you need a fast network. That's why SGI is making a killing with their Origin2000 servers- they got the network to work at high speed (800 megaBYTES per second!), and provided a good NUMA that still has a single image. The Linux answer will probably be distributed NUMA, though it is doubtful that we will ever see a single system image without additional hardware to help with the cache coherency (e.g. ccNUMA). And even that's probably far down the road -- Linux 2.7 at least.
  • For instance, Deja uses a large cluster of Linux machines to index their News feed.

    Their software is custom-written, though (including the databases).

    It's the same situation at most search engines - Google uses multiple machines for indexing and searching.

    It's going to be a problem finding a commercial database that will allow you to distribute it over multiple Linux machines, though.

  • You just confirmed that you are indeed a nerd cluster node.

  • by Anonymous Coward
    What's a cucumber called "King George" doing in the bath tub together with a rubber duck? Your company is either very funny and uneducated in Freudianisms, or slightly naughty.
  • Not necessarily...
    I'm not an expert, but what we did was give multiple NICs to each machine. Using multiple switches and multiple NICs, you can get as many machines as you want; we currently have 66, and they act as one machine...
  • Yes, Condor is great - we used it for balancing batch jobs on an 8-node cluster (16 CPUs), to make use of the extra cycles when it isn't busy running parallel jobs (it's an SCI cluster, 2x4 mesh, with software from Scali [scali.com]).

    However, we had to give up Condor, and now we use PBS instead (which sucks in comparison), mainly because of two things. Firstly, the whole libc6 issue: we needed the 2.2 kernel and glibc for the SMP support, but it took a long time before Condor supported glibc.

    And then, when the libc6 version finally came out, we found that the client daemons couldn't make contact with the master node - the master bound to the wrong Ethernet interface (130.x.x.x), and I just couldn't make it talk to the other interface instead (a private subnet, at 192.168.1.x). I even tried IPChaining the UDP ports, which didn't work (the packets got in, but they never got out). The 2.0 kernel seems to have been more forgiving - traffic to one interface was let through to a daemon bound to the other. That seems not to be the case anymore.

    Now, since I have you on the line, do you know if there is a solution for this? We would like to switch back to Condor, if possible. But if the clients can't talk to the master, it isn't very useful...

  • screen. screen is great - you can run one or many shells in one console / xterm / telnet session / whatever, and switch between them when logged in.

    Before you log out, detach the screen session. All the shells will continue to run, and when you log back in, you can reconnect to screen and continue as if nothing had happened.

    screen rocks :)
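
    A quick sketch of the workflow (from memory - check the man page for details):

    $ screen                # start a screen session on the server
    $ ./long_running_job    # kick off the job inside it
    (hit Ctrl-a d to detach, then log out of the X terminal as usual)
    ... later, from any login:
    $ screen -r             # reattach and see how the job is doing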

    Regards,
    Tim.
  • For that kind of work, you probably don't want a cluster, but a Sun E6500 [sun.com] or a similar machine instead. They are built specifically for business applications, like huge databases and that sort of thing. Clusters are generally useful for scientific computing, which is another thing.

  • Argh. It would have been nice if they had mentioned Dolphin Interconnect Solutions as a (the?) vendor of SCI-based interconnects, or that Dolphin supported VIA before and better than GigaNet. Dolphin had all sorts of problems, including production problems, but at a basic technology level they were always way ahead of anyone else. 3.2Gbps, 2.5us latency - that's the real deal. If they'd had the resources that Myricom's hype machine generates, they'd have cascadable mega-switches and much better economies of scale (lower prices) and nobody would ever consider buying GigaNet or MyriNet pieces of crap.

    *sigh* But it was not to be. Dolphin ran up against compatibility and performance problems with PCI chipsets before anyone else did, and the drivers weren't stabilized soon enough, and they never really figured out who their market was, and they made the major mistake of being honest with customers while competitors were bullshitting about stuff as though they actually had it ready to ship when in fact it was barely even on the drawing boards. In the end the liars and cheats got the mind and market share, and Dolphin is barely eking out an existence nowadays.

    I also take issue with the following from the article:

    >HA clusters may perform load-balancing, but systems typically just keep the secondary servers idle while the primary server runs the jobs.

    Bull. Idle standby is just _so_ early-90s. I worked on eight-node mutual-standby (i.e. load sharing with potential for full failover) clusters in '94. We were before most people, but not first. Nowadays almost nobody would buy an HA solution without this capability.
  • So what you want is a mainframe that can play quake...
  • You sound like you need Mosix. I haven't played with it myself, but it sounds really cool.
    Check out http://www.mosix.cs.huji.ac.il
  • O'Reilly [oreilly.com] is coming out soon (August, supposedly) with a book called Building Linux Clusters [oreilly.com].

    I think some of you are going to find it quite interesting.

    Note that I am working on my PhD thesis in this field of research (specialized in MPI), and we have software available at http://www.itl.nist.gov/div895/savg/auto/ [nist.gov] designed to help users work with data types in MPI.

    Please drop me a note at martial.michel@nist.gov [mailto] if you desire more information on our project (we hope it will be added on the CD-Rom of the O'Reilly book).
  • As for the multiple interfaces, that is now supported as well. Please look at section 3.11.8 in the V6.1 manual

    Ah! Thank you. I'll test it at once! :-)

  • FWIW, InterSystems' Caché (a version for Linux is available here [edbms.com]) lets you distribute a database over a number of computers, and they don't need any OS modifications.

    Disclaimer: I'm using just a single server for development, and it's on the NT Workstation partition. Caché is supported on RH 6.1; I have SuSE, which is rumored to work with a few tweaks, but I haven't gotten around to installing Caché on it yet - probably will, though, because the Linux version is also rumored to be very fast :-)

"Sometimes insanity is the only alternative" -- button at a Science Fiction convention.

Working...