HA-OSCAR 1.0 Beta release - unleashing HA Beowulf

ImmO writes "The eXtreme Computing Research (XCR) group at Louisiana Tech University is pleased to announce the first public release of HA-OSCAR 1.0 beta. High Availability Open Source Cluster Application Resource (HA-OSCAR) is an open source project that aims to provide non-stop services in the HPC environment through the combined power of High Availability and High Performance Computing solutions. Our goal is to enhance a Beowulf cluster system for mission-critical applications and downtime-sensitive HPC infrastructures. To achieve high availability, component redundancy is adopted in an HA-OSCAR cluster to eliminate single points of failure, especially at the head node. HA-OSCAR also incorporates a self-healing mechanism: failure detection and recovery, automatic failover, and fail-back. The 1.0 beta release supports new high-availability capabilities for Linux Beowulf clusters based on OSCAR 3.0. It provides an installation wizard GUI and a web-based administration tool that allow a user to create and configure a multi-head Beowulf cluster. A default set of monitoring services is included to ensure that critical services, hardware components and important resources are always available at the control node."
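To make the failover idea in the announcement concrete, here is a minimal, purely illustrative sketch of the kind of monitor-and-failover loop a redundant head-node setup runs. It is not HA-OSCAR's actual code; the host name, the ping-based health check, and the promote/demote hooks are all assumptions for illustration.

    # Illustrative head-node monitor (not HA-OSCAR's implementation): ping the
    # primary head node, promote the standby after repeated failures, and fail
    # back when the primary returns. Host name and hooks are hypothetical.
    import subprocess
    import time

    PRIMARY = "head-primary"        # hypothetical primary head node
    CHECK_INTERVAL = 5              # seconds between health checks
    FAIL_THRESHOLD = 3              # consecutive failures before failover

    def alive(host):
        """One ICMP ping; returns True if the host answered."""
        return subprocess.call(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ) == 0

    def promote_standby():
        print("primary down: promoting standby head node (failover)")

    def demote_standby():
        print("primary back: returning service to primary head node (fail-back)")

    def monitor():
        failures, failed_over = 0, False
        while True:
            if alive(PRIMARY):
                failures = 0
                if failed_over:
                    demote_standby()
                    failed_over = False
            else:
                failures += 1
                if failures >= FAIL_THRESHOLD and not failed_over:
                    promote_standby()
                    failed_over = True
            time.sleep(CHECK_INTERVAL)

    if __name__ == "__main__":
        monitor()

A real multi-head setup would also replicate state (job queues, user accounts, node images) to the standby so that the promotion is transparent to running jobs.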
This discussion has been archived. No new comments can be posted.

  • ...written by Tong Liu (the lead developer) in last month's LinuxWorld [linuxworld.com].

    You have to be a subscriber to view the HTML, but it seems that you can download the PDF version for free...
  • Linuxworld (Score:5, Informative)

    by ViceClown ( 39698 ) * on Tuesday March 23, 2004 @10:16AM (#8644873) Homepage Journal
    Also worth noting: LinuxWorld magazine has a pretty good article [linuxworld.com] on HA-OSCAR this month!
  • CPU RAID (Score:3, Interesting)

    by manganese4 ( 726568 ) on Tuesday March 23, 2004 @10:24AM (#8644977)
    So on a multi-CPU server, if you started the same process simultaneously on multiple CPUs, how close in time would they finish, assuming there is sufficient memory and disk-controller bandwidth to prevent severe contention?
    • Re:CPU RAID (Score:3, Informative)

      by straponego ( 521991 )
      The only simple, honest answer to this is: it depends. If your jobs stay completely inside the CPU cache, and nothing else is happening in the system, and the scheduler is smart enough not to swap the tasks between CPUs without good reason, you should see very nearly 100% scalability. The larger the cache, the more likely this is, so at this point smaller jobs favor Xeon CPUs over Athlon/Opterons. Most jobs do need to access memory and disk, though. In these cases, the Opteron architecture does well,
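A crude way to check the scalability point above on a particular machine is to time the same CPU-bound job running alone and then with one copy per CPU; near-identical wall-clock times mean near-100% scalability. A minimal sketch (the workload size and worker counts are arbitrary placeholders):

    # Hypothetical scaling check: run the same CPU-bound job on 1 and then N
    # worker processes and compare wall-clock times. Similar times indicate the
    # jobs are not contending for memory or disk, as the comment describes.
    import time
    from multiprocessing import Pool

    def burn(n=2_000_000):
        # Small, cache-friendly integer loop standing in for a real job.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed_run(workers):
        with Pool(workers) as pool:
            start = time.time()
            pool.map(burn, [2_000_000] * workers)  # one identical job per worker
            return time.time() - start

    if __name__ == "__main__":
        t1 = timed_run(1)
        t4 = timed_run(4)
        print(f"1 worker: {t1:.2f}s   4 workers: {t4:.2f}s   ratio: {t4 / t1:.2f}")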
  • The ratio of 'imagine a...'-jokes to 'now there will be a lot of 'imagine a...''-jokes
  • by PonyHome ( 625218 ) on Tuesday March 23, 2004 @10:30AM (#8645037)
    Darl McBride files suit against Louisiana Tech, saying "This is one more example of how SCO innovation has been misappropriated."
  • by DanoTime ( 677061 ) on Tuesday March 23, 2004 @10:31AM (#8645050)
    Boy, I could make my manager's head spin just by reading the summary of that article!
    • Haha. Yeah, I was kinda thinking there was a bit of buzzword overload there...

      That said, I think they missed the bit about it using "XML compliant Strategic Webservice Failover Product placements + Redundant steak knives!!"

      Ain't it scary, though, that when you read articles like that, despite having years of deep-fried IT knowledge, you'd probably have to pass them to marketing to decode?
  • More about beowulf? (Score:5, Informative)

    by Krik Johnson ( 764568 ) on Tuesday March 23, 2004 @10:39AM (#8645127) Homepage
    If you have seen all the jokes but still don't know what a Beowulf cluster is, then this site [beowulf.org] is for you. It has all you need to know.
  • by Electrawn ( 321224 ) <[electrawn] [at] [yahoo.com]> on Tuesday March 23, 2004 @10:42AM (#8645161) Homepage
    High amount of corporate buzzwords detected: self-healing, mission-critical, GUI, beowulf...

    Oh, this project actually does those things? Quaint!

    Just running the vaporware bullshit o-meter here...
  • I can hear the terrorist governments of the world licking their chops for this one! I'm just joking. Or am I?
  • I thought this was about Beowulf clusters in NASCAR. :o
  • by brechin ( 309008 ) on Tuesday March 23, 2004 @10:51AM (#8645242)
    I've been writing some articles about OSCAR and some of the related projects being developed at NCSA and other places. You can find the latest version of this newsletter at the Linux Developer Newsletter [uiuc.edu] site.
  • by brechin ( 309008 ) on Tuesday March 23, 2004 @10:56AM (#8645306)
    The link in the story to OSCAR 3.0 should be to http://oscar.sourceforge.net [sourceforge.net]. The other site is just the parent organization's info page.
  • So how come none of the linked sites have been slashdotted?

    Is it because they have un-killable servers, or is this just not a hot enough topic here?

  • "Just imagine one of these!" doesn't have the same ring ...
  • It isn't surprising that Beowulf clusters would want to incorporate mechanisms to deal with node failure, but I am curious whether those who have worked on actual clusters could expand on the most common causes of failure. I was surprised to read in a previous Slashdot post (sorry, no URL) that even clusters of mini-ITX boards without hard drives (the most failure-prone component, I would have thought) have frequent failures.
    • I've heard (no sources, google it) that Grendel is hard for Beowulf clusters to deal with...

      Maybe? I dunno. :)
    • but I am curious if those who have worked on actual clusters could expand on the most common causes of failure...

      As a research assistant who helps maintain a cluster, I can say the most frequent problems in our Commercial Off The Shelf (COTS) clusters are power supplies. We have at least one die each week. Hard drives are a close second.

    • The sources are essentially no different from those for your desktop, but if you do the math you'll see that failure is much more common when you have a bunch of them.

      What's the probability that your desktop will crash if you run it fully loaded for a week? Pick a number, say 1%. So it has a 99% chance of completing the job.

      Now suppose you have a job that runs in parallel on 100 such nodes flat out for a week. The probability that the job finishes successfully is (0.99)^100, or about 36%.

      So the job is about two
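For reference, the arithmetic in the comment above works out like this (the 1% per-node weekly failure rate is the comment's own figure, not measured data):

    # Probability that a week-long, 100-node parallel job completes, assuming each
    # node independently survives the week with probability 0.99.
    per_node_success = 0.99
    nodes = 100
    job_success = per_node_success ** nodes
    print(f"P(job completes) = {job_success:.3f}")  # ~0.366, i.e. about 36%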
  • Dr Box (Score:2, Interesting)

    by ChaserPnk ( 183094 )
    I actually go to Louisiana Tech. Chokchai Leangsuksun (Dr. Box), the director of the HA-OSCAR program, also teaches my Operating Systems class. He came into class today looking tired... he said he'd been working very hard on it.

    I think it's about time LaTech got some recognition.
    • Yes, it is time that LaTech got a little recognition. Dr. Box also deserves a great deal of credit; he's a very talented and gifted man. Box brought a small cluster of his IBM tablets into class the other day and we actually saw a few of HA-OSCAR's capabilities. Hopefully HA-OSCAR will pan out as expected.
  • High availability and beta don't seem to go together to me. I don't think an OS should be classified as such until it is STABLE.



    <mumble> I doubt anyone will read this, drowning as it is in stupid Beowulf jokes</mumble>

    This story is burning up enough mod points to give us all karma nirvana.

    Please, stop wasting points modding off-topic.
    • Good point. I was wondering why a beta was released at 1.0, which implies, to me, a production release. If it were up to me, I'd release a beta at 0.9 or something.

      If it's stable then they should probably drop the beta suffix.
  • when there was no mention of the satellites or amateur radio here.
  • I've got about 55 Compaqs that are bored to death and looking for something to do...

    Now, if the circuit breakers will only hold up long eno
  • I hear a certain terrorist group's Open Source Application Management Administrator (OSAMA) is already working hard to find some loopholes in the code.
  • Shameless Plug:

    There are now a magazine [clusterworld.com] and a news website [clusterworld.com] dedicated to HPC/Beowulf cluster computing. You may recognize the webpage format.

    We are still running our free three month trial issue offer as well.

  • Mosix does this, but does this project? Or do you have to recompile and optimize for clustering?

    In a 'regular' environment, auto-propagation would be more useful.
  • by jelle ( 14827 ) on Tuesday March 23, 2004 @02:53PM (#8648239) Homepage
    How does this compare to OpenSSI [openssi.org]? OpenSSI is nice because of its single system image approach, which makes administration very simple. AFAIK, an OpenSSI cluster also supports PVM and MPI in addition to exec and run-time load balancing (à la Mosix [openmosix.org]).

    OpenSSI has a lot of "HA-" support, including support for various clustered filesystems, failover of network interfaces across nodes, and failover of the first node (hopefully soon without needing shared SCSI storage but using something like drbd [drbd.org]).

"You'll pay to know what you really think." -- J.R. "Bob" Dobbs

Working...