Linux Software

Linux Clustering Cabal project

RayChuang turned us on to this ZDnet story about the Linux Clustering Cabal project, which, Ray says, is "...the one that will allow Linux server clustering of many server machines. Sounds like just the thing to finally get eBay working reliably and also make John C. Dvorak eat his words about the deficiencies of Linux."
Comments Filter:
  • In a nutshell: Ninja is dealing with several problems which Jini is not addressing. We care a great deal about security, scalability (millions of simultaneous users), fault-tolerance, and deployment of wide-area services -- Jini is more focused on the local area and "workgroup" issues. We hope that there will be a Ninja-Jini bridge so that the two can talk to each other. I will be at the Jini Community Meeting in Annapolis next month to discuss these issues with the Jini folks and get a better handle on them.
  • Interesting... I was kind of disappointed by Jini once I fully understood the architecture. Infospheres from Caltech (infospheres.com) seems more scalable. Generally speaking, I think this stuff is the next big thing(TM), but there are just so many damn versions floating around... Jini, e-speak, Ninja, etc...
  • Clustering is very different from what you describe. In most Unix environments it is failover (sometimes referred to as High Availability): you have one app running on a server, and a backup server that checks to see if the app server is running the app. If the app server dies, the backup starts the app and assumes the IP of the app server. This is just an oversimplified example. VMS uses a similar scheme that is more dynamic and enforces load sharing. Systems like seti@home are more of a distributed application for heavy computation; that approach does not do well for a database-driven application. Clusters in general make many hosts into one host.
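
    A minimal sketch of the failover check described above -- the hostname, port, interface and takeover commands are made up for illustration, and real HA packages add fencing, quorum and so on:

        # Toy failover monitor: the backup host polls the app server's service
        # port and, if it stops answering, assumes the shared service IP and
        # starts the app locally. All names/commands below are illustrative.
        import socket, subprocess, time

        APP_HOST = "appserver.example.com"   # primary node (hypothetical)
        SERVICE_PORT = 80                    # port the monitored app listens on
        SHARED_IP = "192.0.2.10/24"          # service address the backup assumes

        def app_is_alive(host, port, timeout=3):
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False

        while True:
            if not app_is_alive(APP_HOST, SERVICE_PORT):
                subprocess.run(["ip", "addr", "add", SHARED_IP, "dev", "eth0"])
                subprocess.run(["/etc/init.d/myapp", "start"])
                break
            time.sleep(5)
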
  • The best places I found were the talk and products pages off of www.bitmover.com

    There isn't a whole lot there right now.
  • I think having a project where one of the aims is 'make John C. Dvorak eat his words' is a really good idea.

    Here are some more projects that might be worthwhile:

    - The Bob Metcalfe Word-Eating Project
    - The SCO Project
    - The Mindcraft Project
  • First off, the product isn't licensed as free software or OSI Certified -- because there's not yet either product or a license (which is to say, WRT product, there's a program, but it's not yet product).

    From what I've pieced together of comments of Larry's (SVLUG), web blurbs (his, others), and the license sketch currently on the download site, terms will be liberal but not quite free. Larry likes the idea of free software, but isn't convinced he can make a commercial go of it all in and of itself. Specifically, my impression was that the source is available and hackable (a specific requirement of Alan Cox, per Larry).

    Given that his business model right now is sort of half-software house, half-consulting services (SW: BitKeeper and others, services, Hunkin' Big Clusters), I'd like to hope he eventually discovers he doesn't have to worry quite so much about this. Along the lines of Cygnus.

    For insight on licensing, you might want to read:

    • The current COPYING [bitkeeper.com] file of the BitKeeper FTP archive.
    • A bit about BitKeeper and Free Software [bitkeeper.com], at the BitKeeper site.
    • An LWN feature [lwn.net] on BitKeeper and its licensing|business model.

    Note that the most commonly cited alternatives to Larry's solution all have pretty heavy consequences:

    • Sun Community Source License. As the wags say, it's the (Sun Community) + (Source License), not the (Sun) + (Community Source License). Sun gets some hefty rights reserved to itself, largely so that it can continue to control the direction of the software to benefit itself.
    • The Aladdin Ghostscript licensing method -- what's been called the "Delayed Public License" -- software is licensed under the GPL (largely because Peter Deutsch promised RMS that he always would), but only after an initial period in which the software is covered by a proprietary license. This means that the OSS version of the software is always slightly behind the proprietary version.
    • Shareware|Freeware -- there's lots of software out there that's cost-free to use, but the source isn't available. Convenient, yes. Full benefits of free software, no.

    The BitKeeper license is most like the SCSL, though the intent seems to be to build a code escrow term into it which reverts to GPL should BitKeeper fold or fail to maintain the source.

    Addressing specific points of your post, certain libs of BitKeeper will be GPLd or LGPLd, allowing them to be redistributed or incorporated into projects under terms of the GNU [L]GPL.

    WRT your bugfix and feature comments -- the BitKeeper license is oriented around limiting potential for fragmentation. It's got some elements of the common view of the xBSD development model (centrally controlled cabal), and I'll share your view that this is, if not a Bad Thing, at least a Thing of Questionable Worth (TM).

    I can't see how your last point (source is still closed) stands with your other arguments. The source is available; it can be reviewed, modified, and mucked with. It's not compliant with the OSD, but it's certainly not proprietary either.

    Larry's blazing a new path here; it'll be interesting to see how it plays out.

  • the shift into Intel consumerism where they did not have any competitive advantage showed some very woolly thinking

    They thought they did have some competitive advantage on the Intel platform, and technically speaking, they did, with their UMA architecture and strong texturing and video capabilities. However, SGI's woolly thinking since 1996 has been overlooking the fact that a differentiated product is not enough, you have to be differentiated in an area that adds significant value to a significantly large market.

    (for the cognoscenti, there is nothing technically inferior about the MIPS architecture)
    I agree, but this is pretty irrelevant. At the end of the day, it's all economics, and MIPS and other RISCs have steadily lost substantial price/performance ground to Intel, with no business model that ever made sense to regain it (i.e., one amortizing both fab and multi-team design investments over relatively small volumes; their embedded strategy overlooked the fact that embedded volumes don't cut your high-end processor design costs).

    The engineers there (grossly generalizing here) got really excited about texture mapping but the bulk (40+%) of the workstation markets didn't need it very much -- CAD engineers wanted more polygons, not texture fillrate. Sun and HP paid more attention to CAD and stopped SGI's growth in its tracks. The texture-intensive "entertainment/digital content creation market" was growing from 10s of millions and never bulked up enough to help save SGI.

    If you work for a company, you'd realise that the first law is survival, which is dependent on their market relevance.
    I agree totally, and this is exactly what has been giving SGI such trouble. They've focused more on where they could do interesting cutting-edge differentiated things than on where they could be most relevant to the largest segment of customers.

    In hindsight, their timing was just bad- they should have either gone NT a couple years earlier or held off till later (a la Sun). And they haven't to this day figured out how to reduce engineering cycle times for their products down to PC standards of 6-12 months for each new product, not 3-4 years.

    I still wish em the best, but find it hard to put much hope in them at this point.

    --LP

  • I too would like to know how to do real-time replication like that -- having one computer handling a database still presents a single point of failure... Heck, Slashdot probably needs this too! :P
  • by LL ( 20038 ) on Friday September 24, 1999 @07:16PM (#1660571)
    belswick [slashdot.org] wrote
    Note that SGI is showing all the signs of entering the death throes stage. Another 30% of the workforce laid off, abandoning major initiatives, CEO bailing (to MS!!), loss of faith by major customers.

    Unless you've got inside information (which the SEC would be very interested in hearing about), I think the Slashdot audience would appreciate more evidence than mindless parroting of the popular press. For your information, they are spinning off several portions of their divisions into separate business entities. Now, while some people may consider this akin to kicking fledglings out of the nest, the rate of turnover in Silicon Valley is such that the difference between working for one company vs. another is just which branded T-shirt you wear. Think of it as a beehive, with clumps forming and dispersing to form interesting new combinations. Abandoning major initiatives? How many announcements have you heard from major companies that have died the silent death of being irrelevant to real needs?

    As for the CEO, well, I'm sure there will be some interesting books a few years down the track, but for many hard-core SGI purchasers, the shift into Intel consumerism, where they did not have any competitive advantage, showed some very woolly thinking (for the cognoscenti, there is nothing technically inferior about the MIPS architecture). The loss of customers is not surprising considering that many applications that used to be top-end in the 70s can now run on a single modern processor with a big cache (the refuge of the lazy microarchitect). Getting a free ride from Moore's Law is not the same as coming up with innovative new software applications that can really take advantage of increased CPU capacity (apart from molecular simulations, which will chew up any CPU cycle you throw at them).

    Customers will buy SGI equipment if SGI can show they offer a value proposition that is worth the premium over mainstream machines; whether it is memory latency, quality engineering, coolness factor or whatever, people will buy (oh, and getting their manufacturing/distribution process to be more efficient would help a lot). Computers are becoming so prevalent that the only distinguishing feature nowadays for PCs is image and lifestyle (does the color clash with the decor? :-) ).

    Reasonable people must expect that SGI goes Chapter 11 RSN (barring a government bailout) and then what happens to people who need supercomputers?
    Would you say Apple devotees are unreasonable? Don't you understand that, given a planet of 5 billion odd people, not everyone is interested in the toys you are? Cries of doom and gloom have always been around in any industry in one form or another, as they give paper pushers a reason to justify their existence instead of getting their hands dirty coding or designing. You have to realise that SGI serves a fairly specialised market (data intensive, high-end graphics, scientific back-end grunt machines) in the 50K-50M range. Much like Porsche and BMW cater to a clientele that wants absolute performance and not cheap consumer junk (admittedly the Japanese have given the US auto industry a shot in the arm since the 80s), there will always be people who appreciate the qualities that SGI offers. Provided SGI can continue to support those companies at an affordable price, and not go around trying to push Porsches on people wanting bicycles (amazing how hype can convince people they need a Pentium III to browse the web), they will survive.

    If you work for a company, you'd realise that the first law is survival, which is dependent on their market relevance. SGI will continue so long as there is a demand for their expertise as priced against other market alternatives.

    LL
  • by KMSelf ( 361 ) <karsten@linuxmafia.com> on Friday September 24, 1999 @07:20PM (#1660573) Homepage

    Greg Pfister's book is good -- the details are somewhat dated, though the conceptual portion appears to be aging well.

    Distributed.net has a page with references for other texts [distributed.net] on clustering. 'Course, you can always check out the related book purchase links at Amazon.

  • by mattdm ( 1931 ) on Friday September 24, 1999 @07:21PM (#1660574) Homepage
    You're right in that clustering is not anything new -- one of the best implementations is in Digital's OpenVMS, which is pretty old-school if you're counting in internet years.

    But clustering is very different from the examples you give. It's not running different services on different machines. It is taking a bunch of machines and making them act as one.

    Beowulf-style clusters are one way of doing this, but there's a limit to how many nodes you can connect that way and still get performance increases. It scales up, but probably not to thousands of nodes. Now, the LCC people obviously haven't built anything to prove that they can do better, but it sounds like they may have a theoretical improvement.

    And it's only hinted at in the article ("satisfies both commercial data processing and HPC requirements"), but it's possible that this technology is not only fast but, unlike Beowulf, also provides improved robustness.

    This is all vapor now of course. But we'll see. The people working on this have some important projects to their credit.

    --

  • How does Ninja differ from Jini?
  • is about the "unreliability" of Linux because, according to Mr. Dvorak, Linux can't run IRC servers !!

    I will find you the url to John C. Dvorak's article if you want it.
  • It's obvious that you never worked with VMS, The Operating System Slashdot Would Be Running On If Unix Weren't Around [TM]. You can put a VMS cluster behind a single IP address and then just throw machines at the cluster at will. On another cluster, you have a single logical Oracle or Rdb database instance and do the same -- scale by throwing machines at that cluster. IMVHO, it's way superior to what the Unix guys provide at the moment (said the guy who had a VMS cluster running in his attic for years :-)).


    I do remember, though, that a number of features of VMS clusters were implemented with special hardware: multi-hosted hard drives (DSSI) that could participate as voting members of the cluster, boxes with cluster-wide shared memory, etcetera. I'm interested to see how they work around that (I assume they restrict themselves to software).
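
    For what it's worth, the quorum arithmetic behind that kind of voting scheme is simple to sketch (a toy version, not VMS's actual algorithm): the cluster keeps running only while a strict majority of the expected votes is reachable, which is what stops a partitioned minority from carrying on by itself.

        # Toy quorum check in the spirit of voting clusters; illustrative only.
        EXPECTED_VOTES = 5                 # e.g. 4 nodes + a quorum disk, 1 vote each
        QUORUM = EXPECTED_VOTES // 2 + 1   # strict majority of expected votes

        def has_quorum(votes_present):
            """The cluster may continue only if a majority of votes is reachable."""
            return votes_present >= QUORUM

        print(has_quorum(3))   # True  -- majority side of a partition keeps running
        print(has_quorum(2))   # False -- minority side suspends to avoid split-brain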

  • There's a bit of info at his homepage [bitmover.com] and resume [bitmover.com]. I think you might have found your sophisticated know-how.

  • In a thread about the clustering cabal, someone apparently confused the BitKeeper licensing model with the clustering stuff. The two are separate: BitKeeper has its own license, and as far as I know 100% of the work we are doing for clustering is GPLed -- not even LGPLed, straight GPL.

    That said, I'll respond to some of the inaccurate summaries of the BK license:

    a) You can't take bits of the product and use them.

    That's basically true, you have to ask first. But for chunks that don't compete with BK directly, we'll happily free them. The most obvious one is the mmapped / anonymous DBM lib we wrote, which will be released under the GPL. A somewhat different version of the same code is in the process of being released under the GPL by SGI, or so I've been told.

    b) You can't redistribute the product.

    That's just plain false. You absolutely can redistribute it, without fee. However, if you modified it, it has to pass our extensive regression test.

    Etc. Since the BK license isn't complete yet, I'd thank people like Anonymous Coward to wait and see. We're trying to be good guys and make a living. Since we did all the work, we get to choose how we make that living. But we are definitely committed to letting people who are working on public projects use this for free, and we will try to be accommodating to people at research institutions (hey, Nat) who want to use it but can't afford it.
  • Interesting idea.... It's a MySQL database that I'm most concerned with; everything else can be rsync'd once a night or something. I wonder how much of a network load this would generate.
  • Sounds like a possible platform for Slashdot in a few years...

    (actually, although I know something like this could have many far-reaching useful applications, I'd be happy with web sites that aren't susceptible to the slashdot effect. :)
  • by Signal 11 ( 7608 ) on Friday September 24, 1999 @04:57PM (#1660593)
    Umm, not to burst anybody's bubble.. but decentralized computing has been the paradigm for IT for a long time - put your web server on one box, your DNS on another, your mail server on a third (Multiply the number by 4 if you are running NT...), etc.

    Clustering isn't ground-breaking technology.. it's been around for a long time. Now, the concept of parallel processing has been around for a long time too... and it doesn't seem like many manufacturers are rushing to get their products working on beowulf clusters.

    This isn't to say it isn't a great idea -- it's just that there isn't any support for it. There are plenty of alternatives, too. For example:

    Webservers: Set up several servers and an SQL backend (or an NFS-mounted partition) to hold the content. For added speed, throw Squid over that setup. You can even tell remote caches to access your servers round-robin style by putting in multiple 'A' records.
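
    To illustrate the multiple-'A'-record trick: a name with several address records hands every client the full list, and a client (or a resolver that rotates the list) can spread requests across the servers. A rough client-side sketch, using a hypothetical hostname:

        # Resolve a name that (hypothetically) has several A records and rotate
        # through the returned addresses -- crude client-side round robin.
        import itertools, socket

        infos = socket.getaddrinfo("www.example.com", 80, proto=socket.IPPROTO_TCP)
        addrs = sorted({info[4][0] for info in infos})   # unique IPs from the A records
        rotation = itertools.cycle(addrs)

        for _ in range(4):
            print("next request goes to", next(rotation))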

    DNS/mail: Heh. Even the IETF got this one right by suggesting primary and secondary DNS.

    Filesharing: There is some work being done to use a 'real' Beowulf cluster as something of a decentralized logical file server. For now, use AFS or Coda, which have all kinds of cool performance benefits. As an aside -- both are a helluva lot more stable than the Nightmare File System (NFS).

    Printing: There are affordable net appliances to do this (HP print server, anyone?), and some printers even support direct access. Failing that, setting up multiple servers for multiple printers works pretty well -- this is decentralized by design anyway...

    So there you have it... all the staples of the corporate network - "clusterized". New technology? I don't think so. All the examples I gave you are in wide use (and have been for some time!).

    --

  • by Matt Welsh ( 11289 ) on Friday September 24, 1999 @04:59PM (#1660594) Homepage
    I would love to see a whitepaper on this. I have spoken a couple of times with Stephen Tweedie about his ideas, and he certainly has a lot of experience (he worked on VMS clusters for a while). However there are many smart people all over the world working on this same set of problems -- Microsoft, IBM, Oracle, Compaq, etc. all spring to mind. A large number of university research projects are working on things that most commercial vendors aren't even thinking of yet -- my own research project [berkeley.edu] at Berkeley being one of them.

    For those who want some background on the important issues, I highly recommend Gregory Pfister's book In Search of Clusters [fatbrain.com] . Clustering is a lot harder than most people realize, and people should not ignore the work that's been done before in this area. The important question for LCC is what is fundamentally new in their design. I doubt that the lack of kernel locks is really it.

    The thing that remains to be seen is what set of applications they target, and what tradeoffs they make to support those applications. The fundamental issues in clustering have been addressed by a large number of research projects and products, and I'd like to know what's new about LCC.

    That being said, I'm happy that some smart people are going after this problem!

  • Isn't this the same as the Linux HA project http://apps.freshmeat.net/homepage/911156316/ ?
    Or Eddie http://apps.freshmeat.net/download/924568847/ ?
  • by Anonymous Coward
    Overall some nice points, but read the whole article. It might be possible, but there is more to it; see the bottom of the article. This is not going to be Linux anymore but some specially bred beast in the end.

    One thing, though: given the amount of raw CPU power and throughput required now and in the future, it is great to read something like this. It is something one company alone cannot keep up with.

  • by LL ( 20038 ) on Friday September 24, 1999 @05:38PM (#1660597)
    As Matt Welsh noted, it is not exactly a trivial problem. If you look very closely at the article, the LCC wants to occupy a happy middle ground between the share-nothing crowd (Microsoft, Tandem) and the share-everything crowd (Oracle). The share-nothing paradigm is rather simplistic in its approach and reflects the fact that throwing together a bunch of machines with a cheap interconnect is a comparatively straightforward re-engineering approach. The share-everything approach comes from the extension of shared-bus architectures (e.g. the Sun Starfire), which enforces a multiple-lock strategy. Companies like SGI have thrown millions of R&D dollars into the middle ground, which is why their cc-NUMA architecture and cellular IRIX are quite popular. I wish the LCC luck, but there is a reason why a successful working solution is expensive: it requires a savvy combination of hardware + software + smart routing (the SGI solution uses a cache directory). You are effectively paying for some very sophisticated know-how as part of every SGI machine.

    Given the direction that SGI is heading (Linux for entry-level and apps + IRIX kernel extensions for the high end), I wonder whether the LCC will produce anything practical in a realistic time-frame. This is not to decry their laudable efforts, and I would hope businesses are patient enough to wait for robust and cheap solutions. If nothing else, it will hopefully offer a standardised set of software extensions (a la OpenMP [openmp.org]) and coding practices so that a single source tree can support 1 to n processors.

    Who knows, they might be able to come up with a few tricks that the pros have missed.

    LL
  • This is what could get large corporations (read: trend setters) interested in Linux servers... since Linux occupies mostly the small-business server market, it needs to expand to the large-business server market and workstation market... Many companies have been getting Linux workstation projects up and running, but large-business server farms/clusters are definitely one of Linux's weak points right now.
  • Clustering usually combines disk + CPU, which is why most clustering systems have a DFS or similar type of filesystem. Note that Beowulf clusters combine CPUs (and maybe disks via a DFS) but look like separate machines (i.e. you have to use PVM/MPI); MOSIX clusters use a single system image (i.e. it looks like one machine) and share CPU only. Web clusters combine several CPUs and usually one disk image (not a DFS). DFS systems such as AFS use cells without combining CPU. These are generalizations... YMMV.
  • I'm stunned... That was absolutely the most vague article I've read in quite some time, and given the usual vagueness of the popular press, that's something! Do they pay tax on publishing facts? Or are they heading for a Guinness record?

    So some well known people are somewhat involved in some project that has a three-letter-acronym for Linux [buzzword] Cluster [buzzword] Cabal [you need three words to make a TLA].

    What are they aiming for? Is development going on at all? Do they have _any_ goals yet, except to make this cluster stuff and put the rest of the cluster stuff projects to sleep?

    If anyone knows more than was put in that [cough] ``article'', I'd be delighted to know about it.

    Or, perhaps it will get posted once they get their record...
  • Could someone explain the key differences between clustering and distributed file systems?

    As far as I know, clustering combines cpu, while dfs combines disk space. I may be wrong, so please correct that assumption if I am.

    Are there any other differences?
  • Can someone explain to me why Beowulf clusters wouldn't do what businesses are wanting? I see Dvorak's comments, this article, etc., but isn't Beowulf clustering for Linux?
  • This is all so deja vu. Does anyone know if they are talking about a single-system-image style cluster (something like the old Locus TCF/TNC) where the cluster just looks like a big system as far as users and apps go? In a past life I worked on such a system, which supported hundreds of nodes. Instead of Linux, we used the Mach 3.0 microkernel from CMU and a user-mode Unix server. Scalability and availability will be big challenges.
  • Hmm. I'm sort of surprised to see that Peter Braam's mentioned as the head of the Coda project. I bet Satya's even more surprised, though. See, he's actually the head of the Coda group. It wouldn't have been hard for ZDnet to figure this out; it says so right on the Coda group's web page.

    I've been seeing mentions of Braam as "head of the Coda project" and "the man who created Coda" a lot recently, and it's starting to get annoying. Does nobody do any fact checking anymore?
  • Does anybody know much about the "Business Public License" that he talks about on his homepage?
  • Beowulf is mainly used for combining several CPUs with message passing for number crunching. Different clusters have different uses (e.g. MOSIX clusters, disk farms, web clusters (Eddie/Linux HA), etc.).
  • by Anonymous Shepherd ( 17338 ) on Friday September 24, 1999 @08:39PM (#1660610) Homepage
    Of course look at Apple three years ago:

    Licensing of clones, the Newton, the eMate, etc. They were losing major money/resources and got rid of people. Their CEO left, and everyone thought they were going to die.

    They rehired Steve Jobs, trimmed their products down to their core strengths, and are now worth more than they ever have been before.

    So SGI, by spinning off and properly marketing their strengths (without tying them down to SGI), such as MIPS, Cray, and their Visual PC stations, while focusing on IRIX for high-end supercomputing and Linux on their low-end desktop workstations, has a reasonable future. If they can focus on their core strengths and not waver or get distracted...

    It's a perfect chance to buy their stock at 11 and (hopefully) see it go to 40!


    -AS
  • Given the apparent endorsement and possibly connections with the DOE (given the quote in the ZDNet article), this is another sign of the end of the supercomputing business. Look at all the roadkill:

    - Thinking Machines
    - KSR
    - nCube (still exists -- does video servers, I guess)
    - Cray Computer
    - Intel Supercomputer Systems Division (now defunct)
    - Convex (bought by HP)
    - Cray Research (bought by SGI)

    Seems IBM is the only really viable player anymore. When I was with Intel SSD, it was obvious that the government was making it really hard to make a profit in that business.
  • by Anonymous Coward
    People seem to forget that TurboLinux (formerly Pacific HiTech) has been developing a cluster product, which is still in beta, see the page here [turbolinux.com]. Unfortunately, if you read the FAQ, only the kernel patch is GPL'd, the monitor application is going to be released under something called a "TurboLinux Software Licence" without source. Oh, well.
  • Note that SGI is showing all the signs of entering the death throes stage. Another 30% of the workforce laid off, abandoning major initiatives, CEO bailing (to MS!!), loss of faith by major customers. Reasonable people must expect that SGI goes Chapter 11 RSN (barring a government bailout) and then what happens to people who need supercomputers? Buy them from the Japanese? I don't think so.
  • Sorry, I didn't realize that it would compact my list down to one line.
  • by Anonymous Coward
    I know TurboLinux is working on real clustering and looks like it's doing a good job - but this is a different ball game.

    It's a pity that the article doesn't have more detail -- my reading is that it's a statement of intent for now.

    Leave them alone and let's see what they can come up with.

  • by Bruce Perens ( 3872 ) <bruce@perens.com> on Friday September 24, 1999 @07:01PM (#1660616) Homepage Journal
  • I need a replicator for the Zope database. I think Digital Creations is working on a closed-source one for their support-option customers, but of course a Free Software one would be nice to have for the rest of us. Essentially, I'd like to put Zope servers in colocation sites that are distant from each other, have all of them keep a local copy of a common Zope database, propagate updates to the database to each other, and resynchronize with each other after a network partition event.
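
    (Not the replicator being asked for, obviously, but the resynchronization step can be sketched as a last-writer-wins merge: each node timestamps its updates, and when a partition heals the nodes exchange their tables and keep the newer entry for each key. Purely illustrative -- a real object database needs conflict handling well beyond this.)

        # Toy last-writer-wins merge for resynchronizing two replicas after a
        # network partition. Keys map to (timestamp, value) pairs; illustrative only.
        def merge(local, remote):
            merged = dict(local)
            for key, (ts, value) in remote.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
            return merged

        site_a = {"/front-page": (100, "rev 7"), "/about": (90, "rev 2")}
        site_b = {"/front-page": (120, "rev 8"), "/contact": (95, "rev 1")}

        print(merge(site_a, site_b))
        # {'/front-page': (120, 'rev 8'), '/about': (90, 'rev 2'), '/contact': (95, 'rev 1')}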

    I'd also be interested in hearing about any Free Software databases that can do this sort of synchronization. Thanks

    Bruce

  • It [clustering] is taking a bunch of machines and making them act as one... Beowulf-style clusters are one way of doing this...

    AFAIK, Beowulf is not really a general clustering solution. Beowulf is more concerned with parallel processing than general clustering. PP takes a problem and breaks it down into many small pieces, distributes those pieces to a bunch of nodes, sets them working, and then collects the results. Your application has to be written specifically for Beowulf, and each node is distinct.
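
    That split-the-work-and-collect pattern looks roughly like the following with a message-passing library (mpi4py is used here only for brevity; Beowulf-era code would have used PVM or C MPI, and the sum-of-squares workload is made up):

        # Scatter chunks of work to the nodes, compute locally, gather the results.
        # Requires mpi4py and an MPI launcher, e.g.: mpiexec -n 4 python sum_parts.py
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        if rank == 0:
            data = list(range(1000))
            chunks = [data[i::size] for i in range(size)]   # split the problem
        else:
            chunks = None

        chunk = comm.scatter(chunks, root=0)                # distribute the pieces
        partial = sum(x * x for x in chunk)                 # each node works on its piece
        results = comm.gather(partial, root=0)              # collect the results

        if rank == 0:
            print("sum of squares:", sum(results))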

    General clustering is, as you say, making many machines appear as one, not only to the outside world, but to the processes running in the cluster. Ideally, a cluster is no different from a single machine. In practice, it gets a little more complex (your applications typically need to be cluster-aware), but from a user POV, it should appear, roughly, as "one big machine".
