SGI and SuSE Team Up on FailSafe for Linux

Syn Ack writes, "SGI and SuSE announced at CeBIT that they are going to team up to bring IRIS FailSafe to Linux. Linus is quoted as saying that this is a 'piece of the puzzle' that Linux is missing. Here is SGI's press release." The press release says FailSafe for Linux will be open source, but doesn't say under what license.
  • This kind of redundancy and task distribution could help push Linux/Unix-type systems further into the upper-level corporate server market, where Solaris currently seems to be the trend due to its robustness.
  • Yes, but we have to remember that Sun's Community License calls itself "open source" too...

  • True, but that doesn't exactly seem to be the case.
  • Could this sort of thing be used as protection against Denial of Service attacks?
  • Really, I don't think this would help that much. It's just like having two or three times as much computing power, so either it takes that much more traffic to take down the server, or a normal amount overflows into the second server and the attack just takes longer.
  • This is what all those nay-sayers of Linux have been waiting for. "But who needs FailSafe for Linux? I thought it was FAILPROOF! Isn't that why we should switch to it? I don't see Windows needing FailSafe..."

    BTW, for those of you who couldn't tell, that was a joke :P
  • Not being a networking person, I'm really rather ignorant in this area. I always assumed that all multiple-server systems must be running something similar to this. Since the majority of the servers out there are running Unix/Linux, how have they been doing this sort of distributed overflow handling?
  • by Zurk ( 37028 )
    Red Hat's Piranha tools in 6.1 already allow clustering. SGI's FailSafe needs to work with a RAID drive array and does roughly the same thing as the Linux Virtual Server project, http://www.linuxvirtualserver.org/ [linuxvirtualserver.org]
    SGI's stuff is available for download, if anyone is interested, at http://oss.sgi.com/projects/sgilinux11/download/1.2-latest/ISO/ [sgi.com]
  • failsafe, failsafe... What's a failsafe?
  • by pb ( 1020 )
    SGI has just gotten cooler and cooler. They *really* donate code to the cause, as opposed to Sun, which just claims to do so... (but with strings attached)

    And failover is really handy. You never know when you're going to have hardware die on you. Of course, it's much more important when you're working with a bunch of NT boxes, but... ;)

    Established vendors with real operating systems switching to Linux instead for production-level systems, and improving it, and giving back to the community. Does it get any better?
    ---
    pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
  • I guess AC doesn't know about "Loki". :)

    meisenst
  • OK, that is a dumb question. :) To be brief:
    High availability for the virtual server can be provided by a tool that monitors network service availability and the server nodes. The "heartbeat" code currently exchanges heartbeats between two nodes over a serial line and over UDP. IP takeover is done with ARP spoofing. In other words:
    [a] a daemon sits monitoring heartbeat packets coming from the servers [master and slaves];
    [b] if the master goes down, its heartbeats stop;
    [c] one of the slaves takes over by ARP-spoofing the master's IP address.
    More info:
    http://www.linuxvirtualserver.org/HighAvailability.html
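The master/slave heartbeat scheme in steps [a] to [c] can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual heartbeat code: the node names, the 3-second timeout and the rule that the alphabetically first live slave takes over are all made up for the example, and the IP takeover step (ARP spoofing) is only marked with a comment.

```python
"""Minimal sketch of heartbeat-based failover (illustrative assumptions:
node names, the timeout value, and the "first live slave by name takes
over" rule). The real IP takeover via ARP spoofing is not performed."""

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    master: str
    slaves: list[str]
    timeout: float = 3.0  # assumed: seconds of silence before a node is presumed dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat packet received from `node` at time `now`."""
        self.last_seen[node] = now

    def alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout

    def check(self, now: float) -> str:
        """Return the node that should own the service IP at time `now`."""
        if self.alive(self.master, now):
            return self.master
        live = sorted(s for s in self.slaves if self.alive(s, now))
        if not live:
            raise RuntimeError("no live node available to take over")
        new_master = live[0]
        # A real setup would now claim the old master's IP address
        # (ARP spoofing in the description above); we only update state.
        self.slaves.remove(new_master)
        self.master = new_master
        return new_master


if __name__ == "__main__":
    mon = HeartbeatMonitor("web1", ["web2", "web3"])
    for node in ("web1", "web2", "web3"):
        mon.heartbeat(node, now=0.0)
    print(mon.check(now=1.0))  # web1: master still within the timeout
    mon.heartbeat("web2", now=4.0)
    mon.heartbeat("web3", now=4.0)
    print(mon.check(now=5.0))  # web2: master silent for 5 s, first live slave takes over
```

In this sketch the failed master is simply dropped from the cluster; bringing it back as a slave would be a separate, explicit step.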
  • "Could this sort of thing be used as protection against Denial of Service attacks?"

    Not really. The classic DDoS attack (AKA what took down Yahoo) simply has no defense, apart from perhaps having separate sites under different names with the same data on board.

    What most people miss about those DDoS attacks is that they didn't actually overload the servers. On the contrary, they loaded down the pipes so much that the servers sat idle for hours.

    See Cringely's latest rant [pbs.org] for more details (not from him, but from a letter writer). The only practical protection is to secure machines to prevent them from becoming zombies in someone else's DDoS army.

  • I know that FailSafe won't require a NUMA architecture to run under Linux, but the link mentions it running on ccNUMA-based Origin servers. I haven't heard anything about NUMA support for Linux, and it would be a very cool thing to have. It would definitely help Linux grow out of the strictly peecee image it has. I have no idea what it would take to implement, though: whether it could be some sort of subsystem, or whether it would have to be a separate branch from the SMP-capable kernel. Does anyone have any resources for NUMA on Linux?
  • This is not a clustering solution but a failover solution. In case of a failure, the second machine will assume the identity of the main one and start the same services (maybe with reduced performance).

    Unless DoS means that somebody is ripping out, e.g., the network cable, this won't help.

    Michael
  • by Forge ( 2456 ) <kevinforge AT gmail DOT com> on Saturday February 26, 2000 @09:07AM (#1244465) Homepage Journal
    This will do wonders for Linux availability and scalability.

    That however is just touching on the obvious part. Less obvious is that this will let stuff written for Linux scale to the upper limits of business computing in very short order.

    How is that, you ask?

    Linux has been ported to the IBM mainframe in such a way that a single S/390 box can run thousands of copies of Linux, each doing dedicated tasks. The resources available to each can be adjusted up to whatever the kernel supports (i.e. 64 GB of RAM, 2 TB of storage, etc.) or to what's needed for that particular operation (1 MIPS on the image server, 3 on the web servers and 20 on the database server).

    Add this in and you start to see a really terrifying scenario where Linux is able to scale to tomorrow's web service tasks and very little else can. By that I mean when computing takes off in the Third World the way TV has, and when bandwidth becomes cheaper and more abundant. You are talking about 20 million 800x600 two-way videophone conversations at the same time.

    Nobody has the horsepower to play traffic cop in that situation now. But a Linux mainframe scaled to beyond today's limit will. Being Linux will simplify the development process since the developers can all have it on the desktop too.

    As for the licensing: SGI isn't completely clueless. They put XFS under the GPL to get it into the kernel and avoid a fuss. This ccNUMA stuff will be at least partially in kernel space, so you can once again expect it to be GPLed.

  • SGI is so great that they're ditching IRIX in favor of Linux.

    Look at the facts [sgi.com] first please (providing links is nice too).

    SGI has really changed their direction lately.
    ---
    pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
  • by Anonymous Coward
    SGI wants you to think they actually care about open source, but really they don't. That's why they try to charge people $1,200 for the package needed to run even gcc. It's called the IDO. They package up needed headers and other essentials along with their compilers in the IDO to lock you into using only their stuff.

    Now, rather than being kind and giving out the IDO package, they refuse. IRIX 5.3 and below are basically limited to hobbyists today; not many people are still running 4D equipment in any serious commercial application. By doing this they are really only hurting the 'community' of hobbyists who like to create free software. SGI is incredibly two-faced. Don't help them in any way or promote anything they make until they quit being such hypocrites.
  • Moderator points expire, and they are only granted every now and then, so moderators feel obligated to use them on posts like this.
    --
    Whether you think that you can, or that you can't, you are usually right.
  • by LanMan ( 16456 ) on Saturday February 26, 2000 @09:32AM (#1244475) Homepage
    Perhaps I've missed something, but the High-Availability Linux Project (http://www.linux-ha.org [linux-ha.org]) already has similar goals for clustering and failover.

    Wouldn't it be better to put more community effort into a "real" OpenSource (GPL'ed) solution instead of trying to port Irix's existing product and possibly getting a half-baked license?

  • by DevTopics ( 150455 ) on Saturday February 26, 2000 @09:37AM (#1244477) Homepage
    A staff member of the SuSE team told me that the source for IRIS FailSafe will be GPL'ed. And if you take a look at http://www.heise.de/newsticker/data/odi-26.02.00-001/ you will notice that c't magazine writes the same, so this info is very likely correct...
  • by Anonymous Coward
    Irix 6.5 ships with the dev foundation CDs which let you run GCC without buying the IDO. Needing the IDO for gcc was only true for 5.3 and older.
  • Linux may be the only enterprise UNIX, but that hardly makes it an enterprise operating system. I'm not even sure the assertion that it's the only enterprise UNIX is supportable, but it'd be cool if it were true...

    Linux NEEDS clustering to become a true enterprise operating system. It also needs to actually improve its stability... a true enterprise operating system must NEVER, EVER go down. Even for upgrades! Clustering is one solution to this problem, and Linux needs it to gain mass enterprise acceptance. Until then, things like OpenVMS (which isn't open in the open source sense) will still be more feasible in the enterprise market.

    I'm not saying that SGI is a savior or something, but this is a weak area in Linux which needs to improve. Until it does, closed operating systems which already implement clustering will be used in Linux's place in enterprise situations. (This is also a lesson Microsoft needs to learn but I'd rather they didn't and keep on trying to sell Win2000 as an "enterprise solution." :)

  • Well, since this is a porting project and not a reinvention, SGI brings its experience with FailSafe.

    Of course, since Linux-HA and FailSafe are both open source, the HA guys can grab (and look at) the source and make their HA better...

  • The 3D file manager as seen in Jurassic Park. [sgi.com] THAT'S what I want to see open-sourced =)
  • I too think the notion of an "enterprise OS" is a little far-fetched for now.
    But, and this is a big but, as you say the MS-esque "enterprise" thingy (which mustn't be confused with the _real_ enterprise) is an area where Linux is a real danger for Microsoft.
    Nobody in their right mind would use Win2000 (or Linux) for, let's say, bank transactions, but there's a big area between that kind of use and small office appliances where good but not maximal reliability is needed. MS wants to go there, but combine the public perception of Linux being more robust than anything MS has ever produced with these additional industrial HA features, and I predict Microsoft will face more and more problems on its way.
  • by Syn Ack ( 3105 ) <slashdot@[ ]me.ca ['not' in gap]> on Saturday February 26, 2000 @10:26AM (#1244487) Homepage
    Piranha, aka LVS, is NOT the same thing as FailSafe. LVS is more like a Cisco LocalDirector. FailSafe or MC/ServiceGuard (HP-UX) is for protecting applications like Oracle, whereas LVS is more for network services like web and SMTP/POP servers.

    I specialize in High Availability for a consulting firm here in Toronto so I am as close to an expert on these topics as you can get. I use MC/ServiceGuard when protecting databases, backup programs or anything that isn't network based. I use hardware load balancers like ArrowPoint, Big/IP, or Cisco LocalDirector when I have to cluster and load balance Web servers or mail servers.

    If you had read the information about failsafe you would have figured this out.

    It pays to inform yourself before opening your mouth.

    :)

    Paul
    ---
    Syn Ack.
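To make the distinction concrete, here is a rough Python sketch of the LVS side: a director that hands each incoming connection to the next healthy real server in round-robin order (one of the scheduling methods LVS offers). The server names are hypothetical and health status is set by hand; FailSafe-style application failover is a different job and is not modeled here.

```python
"""Sketch of LVS-style round-robin dispatch across real servers.
Server names are hypothetical; health status is set by hand here,
whereas a real director would get it from a monitoring tool."""

from __future__ import annotations

from itertools import count


class RoundRobinDirector:
    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.down: set[str] = set()
        self._turn = count()  # ever-increasing ticket number

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation, skipping any marked down."""
        for _ in range(len(self.servers)):
            server = self.servers[next(self._turn) % len(self.servers)]
            if server not in self.down:
                return server
        raise RuntimeError("all real servers are down")
```

Note what is missing compared with FailSafe: nothing here restarts a failed application elsewhere; a dead web server is just skipped, which only works because each request is independent.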
  • There is such a thing. It's called FSV. Check out http://fox.mit.edu/skunk/soft/fsv/ [mit.edu].

  • There are a lot of other fault-tolerant systems [networkmagazine.com]. Most are Unix on redundant hardware, or Unix-like, such as VOS [stratus.com] (but that's being replaced by a fault-tolerant HP-UX [stratus.com]). It's nice to see Linux acquiring a few more automated capabilities.

    • VMS [digital.com] has been around a little while and has quite an assortment of abilities.
    • Bridges' OS list [arizona.edu]
  • SGI is currently working on a port. Check out the information [sgi.com] at SGI's OSS site.
  • Most SGI engineers are willing to "lend" you an SGI 5.3 IDO CD if you ask them in the right way and make the point that it's for a non-commercial hobby system. If you spent less time bashing SGI and approached them in the right way, you might get somewhere!

    Although to some degree you have a point: since IRIX 5.3 is now obsolete and only useful on older (MIPS R3K) hardware, they should just release it for free download.
  • by Anonymous Coward on Saturday February 26, 2000 @10:49AM (#1244494)
    I hope Red Hat, Caldera and TurboLinux leap on this bandwagon. It's consistent with the goals of the Linux-HA project, and gives it a tremendous kick-start.

    The capabilities of SGI's stuff aren't in any of the current Linux offerings. It has complete N-node cluster quorum, application monitoring and failover-restart capabilities. It also has the nice GUI that is necessary to make it look real. This is completely comparable to the Win NT/2k Microsoft Cluster Services (MSCS).

    The people who have made whiny comments here really don't get it. It would have taken a year or 18 months for the community to come up with something flaky that would approach the capabilities that have just been dropped in our laps by the grace of [deity of your choice]. Adopting and exploiting Linux/FailSafe, and getting it all going on IA-64 this year, is critical to smacking Redmond around while they fumble with Win2k.

    I would hope more and more companies with locked-in proprietary software would release it like SGI did, making it useful and acceptable to people who won't go down the proprietary road. We could still use some better storage solutions.
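For readers wondering what "N-node cluster quorum" means in practice, the usual rule is a strict majority: a partition of the cluster may run services only if it can see more than half of the configured members. The Python below is a generic sketch of that rule, not SGI's implementation; member names are illustrative, and the partitions passed in are assumed to be disjoint. Note that an even 2-node split leaves neither side with quorum, which is why tie-breakers exist.

```python
"""Generic strict-majority quorum sketch (not FailSafe's actual algorithm).
Member names are illustrative; partitions are assumed to be disjoint."""

from __future__ import annotations

from typing import Optional


def has_quorum(reachable: set[str], members: set[str]) -> bool:
    """True if `reachable` holds a strict majority of the configured `members`."""
    return len(reachable & members) * 2 > len(members)


def partition_owner(partitions: list[set[str]], members: set[str]) -> Optional[set[str]]:
    """Return the partition allowed to run services, or None if none has quorum."""
    winners = [p for p in partitions if has_quorum(p, members)]
    # Disjoint partitions: at most one can hold a strict majority.
    return winners[0] if winners else None
```

For example, if a 5-node cluster splits 3/2, only the 3-node side keeps running services; the 2-node side must stop them so the shared data is never written by both halves.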

  • The Jurassic Park file manager is already available for Linux. (Or at least, a 3D file viewer that's similar to it.) http://fox.mit.edu/skunk/soft/fsv
  • You are probably the exact same COWARD who whines every time there is an SGI article. They CAN'T release it for free; they licensed the tech from AT&T and others. If you have a copy of the IDO somewhere and you look at the header files, they have copyrights all over them, and they aren't from SGI; hence SGI CAN NOT LEGALLY GIVE THEM TO YOU!!!!

    I guess you want SGI to be sued to death for giving out other companies' copyrighted material!!! They are bound by law... oh yeah, in your mind I guess companies should be able to take GPL'd software and make it closed source. I guess you are just a friggin idiot!

    I have posted multiple times about getting header files from a Linux box, then using the compiled gcc (from SGI, nonetheless) and getting it working just fine. If you aren't able to get it going, I guess that would be because you just aren't smart enough.

    Spell checker & grammar check off because I don't care.
  • by Pengo ( 28814 ) on Saturday February 26, 2000 @11:38AM (#1244499) Journal

    I was just visiting the SGI booth here at CeBIT and I must say that I am very impressed with those guys. I was talking to one of the engineers who have worked on the XFS port to Linux, and it was interesting to hear the engineers' point of view on the entire release scenario of XFS into the GPL/Linux world. Apparently SGI is working very hard right now to get all of the copyrighted code out of the XFS source. To me it sounded like it started as a great marketing decision and the engineers had to kinda clean up after them a bit. :) (Sound familiar!?) :)

    They previewed for me XFS actually working on one of their Linux boxes running at the show. (I must say, the new rack-mount cases they have are SOOO sexy!!) :)

    But most importantly, I spent a bit of time talking to the engineers and I was very impressed with how they want to help the community. I felt like they were members of the community themselves, just getting paid for it. :) I must say that any mixed feelings I had about SGI before now have been turned around. (Who knows, maybe that's just the power of a 15 million dollar booth!) :)

    Has anyone had a chance to see the new Octane product they have under NDA? (I am going to sign it just to get behind the "closed doors" and play with it...)
  • Most SGI engineers are willing to "lend" you an SGI 5.3 IDO CD if you ask them in the right way and make the point that it's for a non commercial and hobby system.

    It's easy to find someone willing to lend you a Windows 2000 and Visual C++ CD, so where's your point?

  • For any company bigger than say 10 people, the guys you meet at the Cebit booth are not the actual product engineers. This sounds a lot like they hired a bunch of Linux people to pose as the real engineers and rave about the "community". The sad thing is it seems to work.
  • They may well be actual engineers. I work for an IT company with 800+ employees, and if we were to attend CeBIT, you can rest assured that some of the actual product engineers would be there.

    See you in March at DATE-2000. And yes, I am an actual product engineer for the stuff that I will be exhibiting there.

    --

  • *sigh* I don't even know why I bother to respond to crap like this. But I will anyway...

    a) SGI can do nothing right: so I guess switching to Linux is wrong? Making a very high percentage of the machines on the Top500 list is wrong? Um, ok...

    b) They have crappy, unscalable hardware: So I guess Onyx2 InfiniteReality graphics are crappy? Hate to break it to you, but, while they may be a bit pricey, there ain't nothin' much faster. As for unscalable, 512 processors isn't scalable? Please. I run IRIX on a machine with 512 processors and 196 gigs of RAM. Can Linux do that? Other than Cray, and Intel's one-off for ASCI, does anyone make anything bigger? Granted, we (I work for SGI, in case you couldn't tell) are selling Cray, but the T3E has been sold in configurations of 1800 processors and the architecture scales even further. I think that qualifies as "scalable".

    c) Inferior OS with no features Linux doesn't have: Pass me whatever *you're* smoking, please. How about a journaling file system that is production-ready? Scalability to 512 processors? ccNUMA support? Running the Alias|Wavefront applications that produce probably at least half the special effects you see on TV and in movies? I'm sure there are more, but I don't feel like coming up with them. Now don't get me wrong: Linux is a great OS, but that doesn't mean it has every feature found in IRIX.

    d) Public commercial company: Red Hat. VA Research. Need I go on?

    e) Secret motives to steal the genius from Linux: And just how would we do that even if we wanted to??? All we would succeed in doing is getting everyone upset with us and ending up with a proprietary version of Linux. Where, exactly, would that get us besides bankruptcy court? If it were up to us, we'd probably insert massive scalability features into Linux like, say, support for 512-processor SSIs. But the Linux community would never accept those changes, so we simply won't make them until the community will. Trust me, SGI is far more interested in playing by the rules than, I'll bet, most "Linux companies" out there.

    If you want a company that keeps mumbling about contributions to Linux/Open Source and doesn't deliver, think of Sun, not SGI.

  • Who needs a Microsoft engineer? Everyone else has those CDs.
  • IRIX 5.3 will run fine on MIPS R4K hardware; specifically on Indigo, Indy, and Challenge hardware, as well as R3K platforms. Some later CPUs may require specific patches to operate properly.
  • "IRIS FailSafe runs in a cluster environment"

    OK, this I can appreciate. But the next sentence makes me wonder how useful it will be :

    "In the event of a failure IRIS FailSafe automatically fails over applications from one system in the cluster to the other."

    So if I understand correctly, if an application fails, IRIS makes sure that the failure is spread out over the whole cluster. Distributed failing? Interesting approach.....

    CJM
  • Wouldn't it be better to put more community effort into a [...] GPL'ed solution instead of trying to port Irix's existing product and possibly getting a half-baked license?

    I think SGI have been quite good about licenses generally. Their journaling file system, XFS [sgi.com], is released under the GPL, as is NFS 3 and probably more of their stuff [sgi.com]. So let's wait to see what license they use here before assuming it will be the sort that Sun try to fob us off with.
  • SGI wants you to think that they care about open source, but really they don't.
    What you say about SGI does sound disturbing. However, I would urge people to "support policies, not companies". SGI are releasing some good GPLed software which will help the free software community. We should praise this and use their GPLed software. From what you've said, it sounds like they are also ripping off some of their customers. If it's true, then this behaviour should be condemned. It's not hypocritical to give both praise and condemnation to a single company for different actions. What is hypocritical is to support the bad actions of a company just because you like something else which they are doing.
    Remember, public companies have a legal obligation to make money. This means that they will act with (enlightened?) self-interest. This means most companies will at different times act in ways which are good or bad from our point of view. It's not like with humans, where personality comes into it. All companies have the same selfish personality, just reacting differently because they are in different situations.


  • I understand that Linux has multiple desktop environments (aka window-like GUIs), and I also understand that Linux has multiple distros. I can understand the possible need for multiple journaling file systems for Linux, but try as I may, I just can't find enough reason for Linux to have multiple HA/fault-tolerance options.

    There is the original High Availability scheme that has been in development for quite some time, then Red Hat threw in its own version (sorry, I can't remember its name), and now SGI and SuSE are coming in with their own version.

    While I personally subscribe to the "CHOICE IS GOOD, MORE CHOICES ARE BETTER" concept, multiple HA/FT options will ultimately confuse the consumers/users of Linux, and that will undercut Linux's ability to gain confidence in the corporate world.

    Please allow me to propose that all the available HA/FT options for Linux be merged; as a result, Linux will become the world's #1 OS with the most robust HA/FT.



  • > This kind of redundancy and task distribution
    > could help break linux/unix type systems more
    > into the upper level corporate server market
    > where Solaris currently seems to be the trend
    > do to it's robustness.

    I beg to differ, albeit for just a little.

    Yes, it is true, the availability of HA (High Availability) and FT (Fault Tolerance) will be good for Linux in general, but we must be careful not to ignore the adage "Too many cooks spoil the broth", for there are currently (counting the SGI/SuSE announcement) at least THREE different implementations of HA/FT for Linux!!

    Linux should acquire HA/FT, no doubt, but Linux should have ONE VERY ROBUST HA/FT and not three or four or five not-very-much-useful HA/FT.

    Please allow me to propose that all these HA/FT efforts be merged, so as to benefit the whole Linux community and to pave the way for Linux to penetrate the corporate world and be used as the OS for the entire enterprise.



  • While it may be understandable for Linux to have multiple GUI efforts... while it may even be okay for Linux to have more than one journaling FS, I question the wisdom of Linux having more than one HA/FT option, because, IMVHO, that will only confuse the users, and the confusion will be turned into a powerful weapon for the M$ FUD machine.

    They may say something like this - "See? Linux is so fragmented that even in the HA/FT area it has to have many different implementations of HA/FT !"

    My question again, thus : Does Linux really need more than one HA/FT implementation?

    Is it possible to persuade the people behind all those different implementations to merge their efforts and produce one stable and robust HA/FT thingy for Linux?

    Does anyone think that is possible?

  • SGI wants you to think they actually care about opensource but really they don't.

    Hmm. That'll explain why they contributed a journaling filesystem to the Linux kernel under the GPL, then.

  • <donning asbestos underwear>

    Gosh, that sounds like Microsoft Windows. It seems that the Navy had a distributed failure a while back...
  • Any companies looking at SERIOUS HA/failover clusters are looking for one major thing: is it supported? They probably couldn't care less about the source code so long as they know that at 3am on Christmas morning they can call tech support and get someone on the phone who might have a clue as to what's going on. As long as companies can fork out $$ for "premier platinum gold ultra unlimited enterprise emergency super-dooper-tech-guy-living-at-your-data-center" levels of support, they'll be happy. If it's supported well, companies will jump on it. Otherwise, they'll be less inclined (though I'm sure plenty of techies will push for it regardless).

    Slightly off topic... I recall Veritas announcing they were porting software to Linux and I'm hoping their HA software goes too. It was pretty good stuff. I'd be happy running VxVM, VxFS and FirstWatch (or whatever they're calling it now) on my Linux hardware.
  • #53 is irony. Happy? :) Perhaps it's the lack of sleep and the general frustration with seeing my company, and indirectly my coworkers, flamed every time we release something as Open Source, but I sure read that as flamebait. If it wasn't intended that way, I apologize for the tone of my reply.

    This is kinda unrelated but I would like to say this - perhaps this post was meant as a joke, but others of this nature often aren't. I've noticed that any time a company releases a product open source, the first thing that happens is they get flamed on Slashdot. This does *not* encourage companies to release more software. I've been in meetings where first and second line managers (I'm a peon, not a manager, though) have questioned SGI's strategy of going open source because they have seen comments here and wonder if we will ever be accepted. Remember that when you post, people! Your posts *do* get read, sometimes even by important people! EOR (End Of Rant)

  • I personally think that the Linux community is aiming too low here. High-availability failover services are just about to become yesterday's technology. Take a look at where Compaq are taking their Tru64 UNIX clustering.

    ... A cluster "system" disk, containing a common /usr for all systems (each cluster member has its own root and swap, also on shared storage).

    ... Cluster Common Filesystem. All filesystems mounted on any cluster member appear in the mount tables on all systems, even filesystems on private buses (e.g. CD-ROMs).

    ... Context-Dependent Symbolic Links, e.g. /etc/{memb}/blah/... where {memb} is mapped to the cluster member ID. From a member's perspective the filesystem structure adheres to tradition, when in reality the system-specific parts of the filesystem are held independently.

    ... Install the OS once and the Cluster software once. Adding new cluster members (out of the box, with no installed OS) takes only 10 minutes.

    ... Install an application only once and all members can run it.

    ... Cluster member numbers factored into PID numbers (init is no longer PID 1), creating unique cluster-wide PIDs. This helps in cluster process management, but more importantly, paves the way for future advances in "process" failover between cluster members. IMHO this is the holy grail for future cluster technology.

    ... DLM (distributed lock manager) out of the box. Applications like Oracle Parallel Service should be a lot easier to build, run and maintain in future.
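Two of the features listed above, context-dependent symbolic links and cluster-wide PIDs, can be sketched in Python to show the idea. This is illustrative only: the expansion of {memb} to member<N> and the PID_RANGE encoding are assumptions made for the example, not Compaq's documented formats.

```python
"""Sketch of two TruCluster-style ideas. Both formats are assumptions:
{memb} expanding to "member<N>" and the PID_RANGE encoding are
illustrative, not Compaq's documented on-disk or kernel formats."""

from __future__ import annotations

PID_RANGE = 100_000  # hypothetical: number of local PIDs reserved per member


def resolve_cdsl(path: str, member_id: int) -> str:
    """Expand a context-dependent symlink target for one cluster member."""
    return path.replace("{memb}", f"member{member_id}")


def cluster_pid(member_id: int, local_pid: int) -> int:
    """Fold the member number into a cluster-wide unique PID."""
    if not 0 < local_pid < PID_RANGE:
        raise ValueError("local PID out of range")
    return member_id * PID_RANGE + local_pid


def split_pid(pid: int) -> tuple[int, int]:
    """Recover (member_id, local_pid) from a cluster-wide PID."""
    return divmod(pid, PID_RANGE)
```

Under this made-up encoding, init on member 1 gets cluster PID 100001 rather than 1, which matches the "init is no longer PID 1" point above, and any PID can be traced back to the member that owns it.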

    There are a good number of other features, but this is enough to get the point across. There is a big difference between what is "called" clustering in the UNIX world right now (which is not much more than fast hot standby failover) and what clustering was meant to be. VMS has had it for years. Compaq's Tru64 UNIX is on the cusp of getting it (first production quality release is TruCluster v5.0a, due I believe within a month or two).

    THIS is what Linux Clustering needs to be aiming for. Not playing catch up with existing failover technology, because that will soon go the way of the dinosaurs.

    Macka
  • > Applications like Oracle Parallel Service

    Whoops, typo. That should be Oracle Parallel Server.

    Macka
  • I'm not that familiar with FailSafe, but I know that with MC/ServiceGuard (HP-UX), RAID is not required, though obviously recommended (that or disk mirroring). In fact, a cluster lock disk is only required in 2-node and 4-node clusters: the disk is connected to each machine in the cluster and becomes the "tie breaker" in the event that the heartbeat is lost; the first machine to get the disk becomes the new parent node in the cluster.

    I've set up MC/SG for Oracle, MySQL, mSQL, and a few other things. It works really well, and I am excited to see FailSafe coming to Linux.

    Paul (aka Syn Ack)
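The tie-breaker behaviour described above can be modeled as a race for an atomic "claim" on a shared disk. In this Python sketch the disk is an in-memory object whose atomicity comes from a threading.Lock (standing in for a real disk reservation), and the node names are hypothetical; ServiceGuard's actual lock-disk protocol is not reproduced.

```python
"""Sketch of a cluster lock disk used as a tie-breaker. The shared disk is
modeled in memory, with threading.Lock standing in for an atomic disk
reservation; node names are hypothetical."""

from __future__ import annotations

import threading
from typing import Optional


class LockDisk:
    def __init__(self) -> None:
        self._guard = threading.Lock()  # makes the claim atomic
        self.owner: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        """Claim the disk; only the first caller succeeds (re-claims by the owner are OK)."""
        with self._guard:
            if self.owner is None:
                self.owner = node
            return self.owner == node


def on_heartbeat_lost(node: str, disk: LockDisk, results: dict[str, str]) -> None:
    """Each node races for the disk; the winner becomes parent, the loser halts."""
    results[node] = "parent" if disk.try_acquire(node) else "halt"


def split_brain(nodes: list[str]) -> dict[str, str]:
    """Simulate all nodes losing the heartbeat at once and racing for the disk."""
    disk = LockDisk()
    results: dict[str, str] = {}
    threads = [threading.Thread(target=on_heartbeat_lost, args=(n, disk, results)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Which node wins depends on thread scheduling, but exactly one ends up as parent, which is the whole point of the tie-breaker.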
  • No.

    Mosix is a clustering technology which is more similar to (yes, your favorite) Beowulf. Except that Mosix is basically, as my friend puts it, "SMP Writ Large" :) The people who maintain Mosix call it a "fork and forget" cluster, because basically what it does is distribute processes between nodes. It's not as special-purpose as Beowulf, and doesn't need to have things specially coded/compiled for it to work (of course, Beowulf will likely get better performance, IF you take the time to tailor your app to it, and if your app was "embarrassingly parallel" to begin with).

    This is more of a failover technology, e.g. it's not really a "cluster" in the sense you're thinking. It's more than 1 machine, yes, but they're there to provide high availability. Basically, if one machine goes down, another will take over for it.

    You can get something similar by going here:

    http://linuxvirtualserver.org

    They have patches and instructions for setting up a nifty webserver HA cluster, which makes use of apps like mon, heartbeat, and fake (at least 2 of which are Debian packages, which makes my life easier :)

    I'm now building a cluster out of low-end machines, and I'm going to try to run both Mosix AND VirtualServer :) Maybe I'll try this SGI thing when it comes out, too; can't look bad on a resume...
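The "fork and forget" idea can be approximated by a greedy placement rule: each new process goes to whichever node is currently least loaded. The Python below is a deliberately simplified stand-in for Mosix's adaptive load balancing (which also migrates running processes); job names, node names and CPU costs are hypothetical.

```python
"""Greedy least-loaded placement, a simplified stand-in for MOSIX-style
process distribution (no migration). All names and costs are hypothetical."""

from __future__ import annotations

import heapq


def place(jobs: list[tuple[str, float]], nodes: list[str]) -> dict[str, str]:
    """Map each job name to a node, given (job_name, cpu_cost) pairs in arrival order."""
    heap = [(0.0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    placement: dict[str, str] = {}
    for job, cost in jobs:
        load, node = heapq.heappop(heap)  # least-loaded node right now (ties: by name)
        placement[job] = node
        heapq.heappush(heap, (load + cost, node))
    return placement
```

With two idle nodes and one heavy job followed by several light ones, the heavy job lands on one node and the light jobs pile onto the other until the loads even out, which is roughly the behaviour you want from a cluster that spreads processes without the application knowing.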
  • At least, I hope that was the original intent. Just in case...

    It's not: ...'failing' over...
    It's: ...'failing over'

    As in, "failover". This means the software does what you would expect: in the event of a failure, a working machine takes over for the failed one.
  • > a cluster "system" disk...common /usr

    You can do this with Coda.

    > all filesystems mounted on any node appear to all others...

    Now that's cool :) Got me there.

    > Context Dependent Symbolic Links

    OK, I don't think we have that now, but it doesn't sound incredibly hard to do...
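
The core of a context dependent symbolic link is a substitution at lookup time: one link stored in the shared filesystem, expanded differently on each member. The `{memb}` token and the `/cluster/members/...` layout below are hypothetical, loosely in the style of Tru64 TruCluster; a real implementation would do this inside the kernel's path lookup, which this sketch does not attempt.

```python
MEMBER_TOKEN = "{memb}"  # hypothetical placeholder for "this member"


def resolve_cdsl(link_target: str, member_id: int) -> str:
    """Expand a context-dependent link target for one cluster member.

    The same link (e.g. /etc/rc.config) is stored once in the shared
    filesystem, but each member sees its own private copy because the
    token is replaced with that member's directory name at lookup time.
    """
    return link_target.replace(MEMBER_TOKEN, f"member{member_id}")


if __name__ == "__main__":
    target = "/cluster/members/{memb}/etc/rc.config"  # hypothetical layout
    for member in (1, 2):
        print(member, resolve_cdsl(target, member))
```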

    > Install the OS once and the cluster software once...

    Put an NFS server on one of the nodes, serving "/". When you get a new client, fire it up with Tom's rootboot, fdisk the new disk, mount the local drive and the NFS share, and cp -afr. Adjust /etc/init.d/network, chroot, LILO, reboot.

    > Install any application once, all members can run it

    As long as you have a shared /usr or /opt or whatever, that's pretty much implied (so long as all nodes are running the same kernel, C libraries, etc., which they really would be).

    > Cluster member numbers...

    It sounds like Mosix may be doing something along these lines, but I admit that I'm not entirely certain (yet). I'm also not certain about the last thing you mentioned (DLM); I just wanted to point out that some of these are doable today with Linux (some, like Coda, are not "finished"... but whatever is? :)
  • This is exactly true -

    I work as a consultant for a firm that sells HA software on a variety of platforms: Linux, FreeBSD, HP-UX, IRIX, Solaris (x86 and SPARC), and NT. We have orders for our RSF-1 product (Linux HA), and our customers' #1 question is "Can you support it?" ... And we can (24 x 7 x 365)!

    I definitely agree with Linus: this is one of the things Linux is missing to become completely "Enterprise-Ready", and we provide a scalable solution for Linux NOW, with support for up to 32 nodes and 256 applications, that is not hardware- or application-specific. We also support cross-platform clustering, so you can add a Linux box to your existing cluster with almost no impact. Just a cheap shameless plug :)

    http://www.starfiretechnology.com
    bob@starfiretechnology.com

  • I'm one of the key contributors to the Linux-HA project, and the owner of the domain name linux-ha.org, etc. I've recently joined SuSE to help them bring this to market. FailSafe is way ahead of where we could be on our own.

    I think we'll do well, and expect to get this out much faster than we possibly could starting from scratch. We expect to use something like the Apache development model.
  • With the SGI announcement, our aim went up considerably :-) We expect it to keep rising and rising. Several of the things you're talking about are under development now.

    Check out the Global Filesystem project at www.globalfilesystem.org.

    One thing to keep in mind, for balance: 90% of all cluster systems will be two- or three-node clusters. The more powerful machines get, the higher the percentage that will need only a few nodes.
  • There are several ways to do this:

    Hire those who are responsible for other alternatives (that's what SuSE did with me and a few others) :-)

    Produce a superior product sooner, and put it out under the right license terms. Work on including the important industry players. We're working on this strategy now...

    Now, this is not to say that alternatives are bad, because the Next Great Breakthrough couldn't happen without alternatives.
  • Now, here's a man with a good grasp of the plan :-)
  • My personal speculation:
    Caldera will likely jump on the bandwagon. Red Hat might, and TurboLinux almost certainly won't. Of course, all would be more than welcome to. We've got a really big bandwagon :-)
  • > Linux should acquire HA/FT, no doubt, but Linux should have ONE VERY ROBUST HA/FT and not three or four or five not-very-much-useful HA/FT.

    Kinda silly suggestion, really. No offense intended, but this SGI solution is not a "not-very-much-useful" solution, it's a tried and proven solution.

    There are many routes to take to a HA system, and merging them all into one is going to a) stifle individual development (since a lot of open-source projects are for the developers to develop themselves as well as the code), b) limit our choices and c) I don't really have time to come up with a "c)", but just an "A)" and a "B)" would look silly.

    "THREE" different implementations is a) not an outrageous number, b) not even beginning to reveal the real number of options when you call into play the hardware and other software solutions for a HA system and c) I've got that "c)" problem again.

    Linux is no longer an infant, but it's still too early to start cutting off its options as it works its way into adolescence. Give it time to experiment. There's room for a lot of projects.

  • Hi, I'm from SuSE and have been/am invoolved in this stuff.

    The owner of linux-ha.org and leader of that project is Alan Robertson. He is a SuSE employee now. End ;-)
    --
    Michael Hasenstein
    http://www.suse.de/~mha/

  • First, sorry for the typo ("invoolved")...

    Ah, read first, then post. I didn't see he had already spoken for himself... sorry.

  • If you look under the surface, what exists today is either a load-balancing solution for web-serving environments or pieces that go into an enterprise high-availability solution, but not a complete, scalable HA solution. Red Hat has only a software RAID implementation, which is but one of the components needed for resilience against disk failures; Turbocluster is a load-balancing solution, like Webdirector. The other packages that support HA for Linux are nowhere near the capabilities IRIS FailSafe would bring to Linux in a very short amount of time, giving it a giant leap forward in the enterprise domain. The goal, however, would be to strive for a common HA solution and, yes, to combine efforts to the extent possible, which is one of the reasons why we are working in partnership with the Linux community HA leader!
  • Yes, MOSIX can do some of the above:

    The latest development of the MFS file-system of MOSIX, now available for testing, allows all files and directories on all mounted file-systems of all the nodes in the cluster to be viewed as a single file-system (subject to certain documented restrictions):

    If you mount MFS on /mfs, then the root of node #1 will be accessible via "/mfs/1", the root of node #2 on "/mfs/2", etc. The equivalent of Context Dependent Symbolic Links (though implemented as directories, rather than symbolic links) is also implemented: for example, "/mfs/home" refers to the root of the calling process' home-node; "/mfs/here" refers to the root of the calling process' current node; and "/mfs/selected" refers to the root of any node previously selected by the process or by its parent(s).
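
The MFS naming scheme described above boils down to a small mapping from a path to a (node, path-on-that-node) pair. This sketch implements only that mapping as the comment describes it; the function and its parameters are hypothetical, and it ignores the documented restrictions and everything MFS actually does to reach remote files.

```python
import posixpath


def resolve_mfs(path, home_node, current_node, selected_node=None, mount="/mfs"):
    """Map an MFS-style path to (node number, path under that node's root).

    /mfs/<n>/... names node n's root; /mfs/home, /mfs/here and
    /mfs/selected depend on the calling process. Returns None for paths
    that do not name a node entry under the mount point.
    """
    rel = posixpath.relpath(path, mount)
    if rel in (".", "..") or rel.startswith("../"):
        return None
    first, _, rest = rel.partition("/")
    if first == "home":          # the process's home node
        node = home_node
    elif first == "here":        # the node the process is running on now
        node = current_node
    elif first == "selected":    # a node chosen earlier by the process or its parents
        if selected_node is None:
            raise LookupError("no node selected by this process")
        node = selected_node
    elif first.isdigit():        # an explicit node number
        node = int(first)
    else:
        raise LookupError(f"unknown MFS entry: {first}")
    return node, "/" + rest


if __name__ == "__main__":
    # A process whose home node is 1, currently migrated to node 3.
    for p in ("/mfs/2/etc/passwd", "/mfs/home/tmp", "/mfs/here/tmp"):
        print(p, "->", resolve_mfs(p, home_node=1, current_node=3))
```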
