Linux Software

NSF awards $500,000 grant for Beowulf Cluster

ragnar! writes "National Science Foundation (NSF) awarded $500,000 to support a new parallel computing facility for Bartol. The "major research infrastructure" (MRI) grant will support a parallel system based on 100 linked processors, each of which will run at speeds up to 600 megahertz, connected by Fast Ethernet hardware - very similar to the Avalon-Beowulf Cluster, developed by the Los Alamos Center for Nonlinear Studies and Goddard Space Flight Center."
This discussion has been archived. No new comments can be posted.



  • MRI grants can't include personnel money or electricity.

  • I suppose I knew it would be risky making a comment like this on a thread that actually _was_ about beowulf clustering. Just to clarify things, I intended it to be humorous - hence the part about gaining back my lost respect via first post :)
  • Probably slashdot has a script that automatically marks the first post -1 offtopic. ;)
  • You have to take into account the normal government bureaucracy bullshit, so it will come out to about 2 to 3x of whatever is a fair and reasonable price.
  • I imagine a BSD variant would be best - still open source, but the TCP/IP stack is faster, so you'd probably lose less in inter-processor communication.

    If you're running a private gigabit-class network (GigE, Myrinet, Giganet, etc.) and have a separate control network (typically Fast Ethernet), there's no reason to run TCP/IP over the high-speed network. In that case, you could bypass the TCP/IP stack entirely and have the message passing system (typically an MPI implementation) talk directly to the hardware -- the "user space"/"OS bypass" approach. This is what Myricom's GM and the various VIA implementations let you do. Most of the larger Beowulf cluster installations are going with something like this.

    I must admit that I find it very surprising that they're going to the trouble of buying fast DEC Alphas and then connecting them with something as pokey as Fast Ethernet. I hope their RMHD and other calculations are pretty close to embarrassingly parallel (i.e. almost no IPC), or the network will definitely end up being a performance bottleneck.

    --Troy
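
To put rough numbers on the bottleneck concern above, here is a minimal sketch of the usual latency-plus-bandwidth model for a single message. The 100 Mbit/s and 1 Gbit/s line rates are the nominal figures for Fast Ethernet and gigabit-class links; the latency values and the function name are assumptions picked for illustration, not measurements from any cluster mentioned in this thread.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple model: fixed per-message latency plus time on the wire."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# Nominal line rates converted to bytes per second.
FAST_ETHERNET_BW = 100e6 / 8   # 100 Mbit/s -> 12.5 MB/s
GIGABIT_BW = 1e9 / 8           # 1 Gbit/s  -> 125 MB/s

# Assumed latencies (illustrative only): TCP/IP over Fast Ethernet
# versus an OS-bypass path on a gigabit-class network.
FAST_ETHERNET_LATENCY = 100e-6
OS_BYPASS_LATENCY = 10e-6

for size in (1_000, 1_000_000):
    slow = transfer_time(size, FAST_ETHERNET_LATENCY, FAST_ETHERNET_BW)
    fast = transfer_time(size, OS_BYPASS_LATENCY, GIGABIT_BW)
    print(f"{size:>9} bytes: {slow * 1e3:.3f} ms vs {fast * 1e3:.3f} ms")
```

Under these assumptions small messages are dominated by latency and large ones by bandwidth, which is why codes with almost no IPC are the ones that can live with Fast Ethernet.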
  • Here at Michigan Tech one of our professors went and worked on the original "Beowulf" and is currently working on a similar project that received $780,000 from NASA of all people!

    the story is here [nasa.gov]

  • That's true, but if you want to be a Karma whore, like people here seem to be accusing each other of, it won't do you much good, since you can't get Karma by having an AC post moderated up.

    But who needs Karma whores anyhow? The only difference it makes is when you get over the +2 threshold, and then it doesn't matter.

    --
    grappler
  • We have our own research group at UDel developing just such a system. It's called EARTH, Efficient Architecture for Running THreads. It's still being developed but is functional as far as I remember, just not necessarily optimized. I don't know whether or not it will be used on the new system, but I'd be willing to bet that one way or another, EARTH will find its way on to the cluster. The professor running the group has a way of getting things done :)

    If you want to check out more, check out http://www.capsl.udel.edu [udel.edu]. The information on the web page isn't too organized, and all the info on the EARTH page is bound to be pretty old, but you might be able to get a decent idea.

  • I'm currently a grad student at UD (in Mechanical Engineering, not Physics) and am really stoked that I might be able to run code on this system if I can talk to the right people.

    I also work at one of the top composites research centers in the world (also at UD). It takes in $5 million a year, but in the end only between 8 and 10% gets spent on actual material and equipment. My boss has over $1 million in grants himself, but I still get a kick out of how much of our equipment is still held together with duct tape. A grant that large will be nibbled away to fix problems the physics department or research group has, and any fix will cost at least a thousand bucks because it probably won't be something that can come off the shelf. I know my school, it will happen that way.

    When all is said and done $500,000 over 3 years ain't that much really, especially since what defines necessary hardware can be kind of shaky. Also you have to consider that about 10% of the money research groups spend on equipment gets siphoned off the back end by the UD purchasing department.

    Incidentally, personnel means grad students who don't get benefits and get paid a pittance. That's probably what the other 20% of matching funds will pay for.


  • You can get a sufficient quality PC for a scientific cluster for far less than $20k. I'd bet that they don't want 100 raid controllers.

  • ah..now why do women go pantie less? Isnt it cold? Do they think they're going to get laid pretty soon, say in the janitorial closet? Or do they just go naked under a single piece of clothing for their own pleasure? Or maybe they're too hot and need a way to stay cool..
    dont get me wrong, I LIKE the fact that women go undiless..I just wish I knew when.
  • by Anonymous Coward
    Which Linux operating system tends to get used for clusterware? Is it a custom version, or off the shelf stuff, like SuSE or Slackware? Does it require a special kernel? Does it require special utilities and admin stuff?
  • Well, I sort of admin a Beowulf at work... it consists of 8 dual PII-450s with 512MB RAM apiece...

    It came pre-made, with a slightly-modified version of RH 5.2 installed - basically just an SMP kernel and some utilities and libraries. You don't really need any special software, except for PVM or MPI (we use MPI). The MPI distribution we use is LAM 6.2 - I'm sure you can hunt it down if you look around a bit (try google.com/linux).

    I'm going to eventually set up another small cluster for testing and development purposes, once we get the first piece of software into operation (or maybe before, if I don't have a whole lot of stuff to do) - I'm planning on setting it up with Debian and a much more sensible layout (share /usr, instead of each machine with its own /usr - or maybe not, depending on what seems to be the best way to do it).

    I wouldn't recommend RedHat, tho - adminning on it is not much fun. (no apt!)
  • by a.out ( 31606 ) on Sunday December 05, 1999 @09:51AM (#1478198)
    We are a cash-strapped beowulf group doing legit research (we actually made a little bit of knot (ap math) history [baldric.uwo.ca]). Everyone seems to like us and support what we are doing but no one is putting their money where their mouths are: BALDRIC [baldric.uwo.ca]. We are doing this in the true spirit of beowulf.. Taking old surplus hardware from all around a university and putting it to good use .. All of the research findings must be public in order for anyone to use it, we have a very open source attitude to the cluster.. We currently have 8 nodes up and running with 7 more waiting to get 'on the action'. But our problem is that we have *no* funding. The biggest support we've gotten is a tiny room (I'm talking 15' x 15' at the most) from our Computer Science department.

    My question is: How can we get this kind of support??
  • Very bad luck to mock the Trolls. Look what happened to Britain when the Vikings visited. Still, a little rape and pillage would break the boredom.
  • ... But I will ask the question anyway :-) Assuming that someone had more money than brains,
    could a Beowulf cluster run an interactive application? Say, wine for instance?
    I just had this vision of Windows apps actually running fast... On second thought, nahhh.
    Some things are just not meant for mankind.
  • The trouble with supercomputers today is that price/performance peaks with high-end desktop machines. It didn't used to be this way. Machines used to obey Grosch's Law (computing power increases as the square of the price). But that's ancient history. It ended when the fastest CPUs became single chips. If you're going to build a fast single-chip CPU, you don't want to waste it on a supercomputer that sells maybe ten units. You make Pentium IIIs and sell millions.

    The main result of this is that only the Government buys supercomputers, and nowadays they're mostly a boondoggle. SGI is currently trying to sell Cray, with limited success. Even Deep Blue is a cluster, made of stock CPUs on custom boards with additional custom hardware. The era of the classic supercomputer, with its huge mat of hand-wired connections, is over.
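
As a rough illustration of the Grosch's Law point above, here is a minimal sketch of what "power increases as the square of the price" implies for price/performance. The function names and the constant `k` are hypothetical, chosen only for this example.

```python
def grosch_power(price, k=1.0):
    """Grosch's Law as stated above: power grows as the square of price."""
    return k * price ** 2

def power_per_dollar(price, k=1.0):
    """Price/performance under that law; it rises linearly with price."""
    return grosch_power(price, k) / price

# Doubling the price quadruples the power, so power per dollar doubles.
print(grosch_power(2.0) / grosch_power(1.0))          # 4.0
print(power_per_dollar(2.0) / power_per_dollar(1.0))  # 2.0
```

Under that law one big machine always wins per dollar. Once power scales at best linearly with price, as it roughly does when you add commodity boxes, that advantage goes away, which is the shift the comment describes.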

  • *points at the moderator points he just got today, which he hasn't spent*

    Personally I thought this was kinda funny. Not the funniest, but certainly has merit.

    Anyway, off to find a good post to moderate up (I wish more people would log in, or not post AC... doubtful you'll get moderated down for a minority opinion, only a stupid one... That and people seem to make more sense when they have a name)

  • LAM can be found at http://www.mpi.nd.edu/lam/ [nd.edu]. It was originally written at the Ohio Supercomputing Center. It is currently being maintained by the Laboratory for Scientific Computing at the University of Notre Dame. By the way, we just released version 6.3 of LAM. If you're looking for a good way to see how LAM is communicating, check out XMPI, a graphical interface to LAM (as well as SGI's MPI implementation). LAM is available as a tarball, i386 and SRC RPMS, and should be available in the Debian Potato archives. BTW - While you're visiting the LSC's pages, don't forget to see the world famous domecam [nd.edu].
  • by Dr. Sp0ng ( 24354 ) <mspong.gmail@com> on Sunday December 05, 1999 @08:05AM (#1478216) Homepage
    I want a Beowulf cluster of THESE THINGS!!

    Sorry, just couldn't resist... bye-bye karma :-)

    "Software is like sex- the best is for free"
    -Linus Torvalds

  • This source of funding isn't that unusual -- the University of Virginia Centurion cluster [virginia.edu] was funded by two $450,000 MRI grants.

  • Almost no one uses Linda -- what would you think UDel does?

    Most people with systems like this use a batch queue system like PBS and message passing libraries like MPI.

  • by abach ( 103405 ) on Sunday December 05, 1999 @08:20AM (#1478219)
    Beowulf-like clusters are becoming popular, and Linux is
    often used, but it has to compete with the large
    and good old Unix suppliers. Take a look at:

    http://www.fysik.dtu.dk/CAMP/valhal.html

    Here you find a similar project, and even an
    explanation why they didn't choose linux.

    Seems like the commercial unices are running
    out of time.
  • After reading the article, I couldn't help but wonder what type of software they would use to keep the processing happening smoothly. Parallel processing in the large such as this is a whole area of study on its own. I would assume they would implement some sort of process control software that would model the virtual OS Linda, but I don't see any reference in the article as to how they are handling this.
  • by Signal 11 ( 7608 ) on Sunday December 05, 1999 @08:23AM (#1478221)
    Great! Let's just hope none of them have been listening to the ACs here on slashdot or they'll try to build it out of iMacs running linux or palm pilots....
  • I'm sorry folks, but I'm just not creative enough to come up with a way to somehow make a beowulf cluster of these. I apologize for not being able to contribute to the obligatory beowulf cluster thread, and hope that I can earn back all of your respect by getting a first post somehow.
  • by Greg Lindahl ( 37568 ) on Sunday December 05, 1999 @08:28AM (#1478224) Homepage

    Linux does have to compete with other Unixes, but people often decide in Linux's favor. For example, this cluster [hpti.com] is 277 nodes with better networking, and we chose Linux over Tru64, due to Linux's super system administration capabilities.

    BTW, you can get Compaq's great Alpha compilers for Linux.

  • I've always wondered what happens to these huge computers and labs people build for one project after the project is over. Do they auction the parts off? Do they do more research? How do they decide when it's obsolete?

    Dan
  • So will we soon see the demise of massively parallel supercomputers as clusters of cheap machines become cheaper? Though they're technically not as efficient, because, CMIIW, the big SGI type supercomputers have their components connected via very big pipes, allowing for more data throughput...but I think the relative low cost of cluster type solutions may outweigh that.


  • ... these machines ARE massively parallel supercomputers, if you build them big enough and you use the best commodity networking (like myrinet [myri.com]).



  • The obvious answer for getting this kind of support is to apply for a large grant of your own. To do this you are going to need applications for the cluster mapped out, doing something specific and preferably 'hot'. Talk to your grant office people and try and get plugged in. The truth is though, that not everyone can pull off this kind of funding. Sounds like you are doing the best that you can with the resources you currently have available (none), so I guess the other suggestion I would make is try to raise awareness of what you ARE doing. A little PR can go a long way.
  • Just 32 Pii-450's, a great 3com switch, and a master node. Beowulf is the way to go. BTW, with that $500000, I could make a zillion-nodes cluster ;) I just think our new cluster at CS dept. needs a name. Thinkin'
  • by Admiral Mouse ( 3430 ) on Sunday December 05, 1999 @10:32AM (#1478235) Homepage

    People making comments about the amount of hardware/support that can be had for $500,000 should remember the realities of grant funding at a University in this country:

    • Universities/Departments typically keep 40-50% of the grant amount awarded to a lab for "indirect cost recovery" (ICR). This is the fee they assess for providing buildings, plumbing, offices, etc (infrastructure costs).

    • People tend to cost 2 x salary once benefits and whatnot are considered. So each $40k person costs the grant about $80k.

    • Labs usually have other costs they need to cover and small bits of large grants are usually used to cover these "extra" needs.


    So, a $500k grant is about $250k after ICR. Then say you fund 2 people at $35k/year to help build and run it (about $140k once benefits are counted). Now you're down to just $110k for hardware. Even with a "best case" run of the numbers and cheap people, you're still not going to have more than $150k for hardware in this grant.

    Also keep in mind that this grant's funding is spread over 3 years.

    100 600MHz PCs is going to run about $100k even before you start buying networking equipment, backup equipment and power supply/protection equipment.

    In all likelihood, Bartol is going to need additional funding (possibly x% matching money from the state or other similar grants) to make this a reality.

    Just thought people should know that when you get a $500,000 grant, you don't just get a check for $500,000 to blow on hardware. :-)

    ----
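
The arithmetic in the comment above can be sketched as a small function. The function and parameter names are hypothetical; the 50% ICR rate, the 2x salary multiplier and the $35k salaries are the comment's own figures, and only one year of staff cost is counted, as in the comment.

```python
def hardware_budget(grant, icr_rate, staff_count, salary, cost_multiplier=2.0):
    """Money left for hardware after overhead and one year of staff.

    icr_rate        -- fraction kept as indirect cost recovery
    cost_multiplier -- loaded cost per person as a multiple of salary
    """
    after_icr = grant * (1.0 - icr_rate)
    staff_cost = staff_count * salary * cost_multiplier
    return after_icr - staff_cost

# The comment's own numbers: $500k grant, 50% ICR, 2 people at $35k.
print(hardware_budget(500_000, 0.50, 2, 35_000))  # 110000.0
```

Setting `icr_rate` to 0 and `staff_count` to 0 gives the other reading found in this thread, where MRI money carries no overhead or personnel costs and the full $500,000 is hardware money.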

  • The article wasn't specific as to hardware, but since they said it was "much like the Avalon cluster" they might well be using Alphas, not Pentia. $5k/box would be a good price if they are using the newer Alpha boxes based on the 21264 chip (which is better than twice as fast, on average, than the 21164's used in Avalon, even at the same MHz).

    -Ed
  • Remember that supercomputers are expensive. They will also crush any PC-based system for throughput. What is the backplane speed of a PC? Around 600 MB/s max! Large multiprocessor computers like the Sun E10000 have 64 processors with a backplane speed of over 9.0 GB/s (this is processor to processor and memory to processor, via a switched backplane). This machine can crush any of these Beowulf clusters, but the cost is in the millions. How many university research groups can afford a million-dollar supercomputer and the staff to run it? Do not call the supercomputers dead just because of your lack of knowledge.
  • With Beowulfs getting ever bigger, the issues faced by administrators of these beasts are very much related to management.
    Fortunately, some projects address this issue, such as BLD [nersc.gov] (free, Berkeley Lab) or ALINKA [alinka.com] RAISIN [alinka.com] (commercial) and ALINKA LCM [alinka.com] (GPL), but there are still things to be done. Moreover, once you have overcome the software management, you still have to deal with the hardware (of these 1000 fans, one _has_ to fail...).
    Still, the hardest job is not for the administrators: users have to actually write good parallel code... and this is no piece of cake.
  • You're exactly right, and this is why a good Beowulf cluster costs more than N times the price of a single box. You want those extra sensors to detect failing hardware and you want them to be hooked up to some management port which in turn is wired to a management console.

    You also want streamlined software configs, automatic integrity checks of OS and software installs and a host of other stuff to keep the software side of the house healthy and under control.

    This all adds up and makes a middle to large-sized Beowulf a bit pricier than expected. You will find vast differences between the various Beowulf integrators when it comes to management issues. VA is one that impressed me.
  • Not true! While clusters and in particular Linux clusters are coming on strong, there are things that they just can't do.

    The bandwidth of the interconnects on a Beowulf and worse still their latency are just not there yet to go head to head with a traditional supercomputer. They are getting very close, I admit, but haven't consistently beaten a Cray T3E-type interconnect yet.

    Also, some codes/algorithms just don't lend themselves well to massively parallel implementations. They might be much happier on SMP-type machines or perhaps on vector machines.

    Finally, some of the management issues for very large Linux clusters aren't fully resolved yet, but they have been in place on traditional supercomputers for quite some time.

    As a result many institutions, including government, but also research sites and large financial institutions continue to buy Cray, SGI and SUN supercomputers, all of which aren't clusters. Just check the latest Top500 [top500.org] list, in particular the slides and statistics.

    Remember, we're talking large systems here, and I would define this as more than 16 nodes and more than 32 CPUs minimum.

  • Troll... I mean it! We wouldn't have so many people who do write d00d and stuff if we didn't have articles like this... but now since /. is part of a publicly traded company, it's anything to get more readers.

    I dunno why I'm writing this... no one's still lookin' at this discussion.
  • If you look at the top 50 on the Top500 list of Big Expensive Supercomputers [top500.org], the only one that isn't either government-funded or an in-house machine of a supercomputer manufacturer is Charles Schwab. The top ones are the usual suspects; Sandia, LLNL, Los Alamos, and the inevitable (Government/Classified), which is probably NSA.

    I wonder what a discount broker is doing with the twelfth biggest number-cruncher in the world.

  • by say-tan ( 54534 ) on Sunday December 05, 1999 @08:50AM (#1478250) Homepage
    i'm sorry, i'm going to have to agree with the ac on this one. not only was this post on topic, but it was funny. you (mr. moderator) obviously have no sense of humor. we all knew that the beowulf cluster post was going to show up, but this guy beat everyone to it with a first post. if i had moderator points, i would have tried to help correct this by moderating him up, but, alas, i don't.
  • What? Surely this must be a flame war attempt...

    I enjoy linux, but saying that a quad xeon outperforms a cray is ridiculous. Also, saying that linux outscales Solaris is a bit far-fetched...
  • Is it just me? Or does spending half a million dollars on a 100-node 600MHz cluster seem high? I could have sworn that recent posts of this kind put this type of cluster in at under $200k?
  • by Greg Lindahl ( 37568 ) on Sunday December 05, 1999 @10:37AM (#1478255) Homepage

    MRI grants do not allow universities to charge overhead, and are 100% hardware money. You also have to get at least 20% matching funds.

    In general, equipment over $500 isn't assessed overhead by any university.

  • slashdot.org puts the B O in beowulf hahaha
  • you can't moderate in any discussion you post in. I suppose you could do that with two accounts. Just use each account to moderate up the other one. The implications are rather interesting actually...

    --
    grappler
  • It would be interesting to know how many of these things the US government has stashed away. (I mean, since NASA originally developed the thing (Is that correct?) I'm sure other agencies with one less letter in their names were pretty interested.)

    In any case, the IRS might think about using AI Beowulf clusters to check tax returns. Ha. That would be the day. (I want my refund!)

    What Crypto applications might there be for a Beowulf cluster out there? Genetic algorithms for new ciphers?

    Jon
  • Good point. I assumed they would be using 600 MHz Athlons.

    A 100-box Alpha setup would of course be much more powerful, but also much more costly; I wonder if the added expense would be better used by just buying more, but cheaper x86 machines?

    Now, there are so many possibilities for people looking for power setups. We've got Athlons, G4s, Pentium IIIs, Alphas, Transmeta mystery chips, and those are just the CPU choices. I can't wait to see what will win the price/performance war once multiprocessor, multicore and cluster technology really go mainstream.
  • I imagine a BSD variant would be best - still open source, but the TCP/IP stack is faster, so you'd probably lose less in inter-processor communication.
  • I read one comment where Linux Beowulf clusters are cool, but can't really compete with a well built supercomputer. You can't do fluid modeling on one of these things; you don't have the memory speed. This is 100 Pentiums running on a 100 MHz FSB....
  • by seaportcasino ( 121045 ) on Sunday December 05, 1999 @09:02AM (#1478267) Homepage
    Because any post associated with beowulf clusters is normally a troll, the moderators are having a hard time moderating this particular topic...

    Their first instinct is, "Oh God, it's a beowulf post - moderate down, moderate down." It must be a hard itch for them not to scratch in this case :)

  • by Tiro_Dianoga ( 68651 ) on Sunday December 05, 1999 @09:08AM (#1478268)
    From the article it is hard to tell exactly what this money was for. Was it a $500,000 payment for a Beowulf cluster, for Bartol to run the cluster, or for Bartol to build and run the cluster?

    If they are purchasing hardware for that amount, they're getting ripped, because I'm thinking all the needed hardware, including the boxes and the networking equipment, can be had for under $150,000 (they could get a nice bulk order discount).

    My figure wouldn't include costs like assembly/setup labour and the OS (heh) but half the work is opening the boxes...

    Seriously, once the system is going and the scientists have their apps setup, all you need to do is make sure it doesn't overheat. (We are talking about a massive number of x86 systems, here).

    Disclaimer: I really don't know what the hell I'm talking about in this post. If someone could inform us what it costs to maintain a project like this, please post.

