Linux Software

Mosix 1.0 Released

Mosix is a scalable clustering system for Linux, released under the GPL. Version 1.0 for the 2.4 kernels is now available.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    The way I understand it, the porch tool only does half of the work: porch only prepares a given software source to be used in a heterogeneous cluster.
    You still need the heterogeneous cluster management software (in this case, the part that would handle the portable checkpoints and transfer them from one node to another).
    Anyway, the porch tool (or something equivalent) seems necessary.

  • by Anonymous Coward
    Some possibilities:
    • That node has a faster processor than the others.
    • The other nodes have clustered together to pick on that node.
    • The other nodes are just too damn lazy.
  • by Anonymous Coward
    Mosix monitors the rate of I/O and if necessary migrates the process to the node that has the disk.
  • by Anonymous Coward
    While we are at it, I would really like to be able to customize my account so that I could ignore all the score points that were given because the post was funny. On the other hand, I would like to give all the flames a positive score (not that I flame a lot myself, I just like reading them). These changes should be easy to make. They may require a little more computer power, but if they code it the right way, I think they will manage.
  • by Anonymous Coward
    Yeah, but if they had called it MUSIX, the RIAA would probably be after them.
  • by Anonymous Coward
    Do faster machines produce insights faster? If you can run several simulations in parallel you can map out a multi dimensional parameter space much faster, so the answer is yes. Computational Physicists can never have enough computing power.

    Even with the machinery we've got now you can't do, for instance, first principles simulations of the growth of semiconductor structures.

  • In order to migrate processes across platforms, you'd need something like porch. It makes portable checkpoints.
  • I'd like to try it out on my 16 node Alpha cluster. I'm surprised that they called it 1.0 with only x86.
  • I find it hard to believe that there is that much arch-specific code. If it's not already, the arch-specific part (which should be small) should be easy to write for each architecture. I worry that being too x86-centric at the beginning could lead to a bad design. Anyway, I have not actually looked at the code, so these are idle comments...
  • Convolo addresses some of these concerns.
  • Another alternative to having just an add/subtract moderation system where it sorts by total is to do it by average.

    Actually, you could do it in two dimensions: up/down being good/bad and left/right being ontopic/offtopic, and then slash could render a little map showing where the post falls!
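
To make the total-versus-average idea concrete, here is a rough sketch. The vote data and the function names are made up for illustration and have nothing to do with how Slash actually stores moderation; each vote is assumed to be a (good/bad, ontopic/offtopic) pair of +1 or -1.

```python
from statistics import mean

# Hypothetical votes per post: each vote is (quality, topicality),
# both either +1 or -1. This is NOT Slashdot's real data model.
votes = {
    "post_a": [(1, 1), (1, 1), (1, -1), (-1, 1)],
    "post_b": [(1, 1)],
}

def total_score(post_votes):
    """Add/subtract moderation: the sum of the good/bad votes."""
    return sum(q for q, _ in post_votes)

def average_position(post_votes):
    """Average on both axes: (good/bad, ontopic/offtopic)."""
    quality = mean(q for q, _ in post_votes)
    topicality = mean(t for _, t in post_votes)
    return (quality, topicality)

for name, post_votes in votes.items():
    print(name, total_score(post_votes), average_position(post_votes))
```

Sorting by total favours post_a (more votes overall), while sorting by average favours post_b (every vote it got was positive), which is exactly the trade-off between the two schemes.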

  • Then again, how many apps never check the return value of malloc and just expect the OS to go down if the system runs out of memory anyway?

    As an anal-retentive programmer, I do always check my mallocs for NULL and my news for std::bad_alloc. Whether this is actually valuable is debatable, however. On Linux, malloc() generally does NOT fail under memory pressure, because by default Linux overcommits memory and does not commit the malloc'd page until it is actually faulted in. So your malloc() will return a non-NULL pointer as a weak promise to find an available page later. When you actually touch the page and Linux can't find a free page, it will simply kill a process, quite possibly yours. :-( There has been some debate on the linux-kernel mailing list about the value of such behaviour.

    This is a similar problem to Linux's fsync(), which can be tricked by some hard disk caches claiming to have written the data safely to disk... and then crashing with the data still in cache. :-(
  • by cpeterso ( 19082 )

    btw, this is the same problem that early RPC systems faced. Programmers don't typically expect a procedure call to fail mysteriously, but this is exactly what can happen when your blocking RPC call can't reach the other server.
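
A minimal sketch of what "expecting the call to fail" looks like in practice. The names `call_with_retry` and `RemoteCallError` are hypothetical helpers invented for this example, not part of any real RPC library; the only assumption is that an unreachable server surfaces as Python's built-in `ConnectionError` or `TimeoutError`.

```python
import time

class RemoteCallError(Exception):
    """Raised when the (hypothetical) remote server stays unreachable."""

def call_with_retry(remote_call, attempts=3, delay=0.0):
    """Run remote_call(), retrying when the network lets us down.

    Unlike a local procedure call, the caller has to be ready for a
    failure that has nothing to do with the arguments it passed.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return remote_call()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            time.sleep(delay)  # back off before the next attempt
    raise RemoteCallError(f"gave up after {attempts} attempts") from last_error
```

The point is only that the failure path has to exist somewhere; whether to retry, give up, or suspend the caller is a separate design decision.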
  • it's the nearly-new (few weeks old) and little-heralded Developers section, just a little different color scheme, the same way the YRO section has a different color scheme. Expect to see things like programming / conferences / infrastructure in this section :) Cheers, timothy
  • So the best I can make out of your comment (and the website, for that matter) is that it is a piece of software that takes the fuss out of load balancing over a number of Linux machines?

    Then what about the performance of this stuff?

    And is the design of this software interesting enough to support it?

    Is anyone really using it?
  • In your opinion, is a High Availability cluster a parallel system?

    By HA cluster I mean a set of computers bundled together to build up redundancy to protect the application(s) running on it against hardware failure.

    I think this protection gets more important the bigger you build your parallel systems. What would the value of a parallel system be if one of its million parts brings it down completely?

    To take this drift even further, I think that for large-scale parallel systems there is no way of avoiding the NUMA approach. Performance is just one of the problems you'll face when building a parallel system, even though it may be the most important reason for starting to build them.

    In the end it all comes down to scalability and the cost of expanding.
  • From the information on the site I'm not clear what the intention of this software is. Is it:

    a. a way to create a huge compute node (as in Beowulf cluster)

    or is it:

    b. a way to create a scalable and robust service node (as in a google cluster)?

    I think it could be quite interesting to use this software, but to get a large userbase (and thus robust and well debugged software) it seems to me that there should be a well defined goal.

    I'm not trying to post a troll, but I'm trying to find out if:
    - would this software make hardware failures have less impact on running programs like database servers or large simulation jobs?

    - does this software have a greater impact on performance than MPI-like solutions?

    - does this software have a security impact of any sort?

    Just my 2 eurocents ;)
  • What happens if a process that needs to read from the hard disk migrates to another node?

    You could of course let the kernel forward all requests, but that might cause insane amounts of network traffic.

    So my question is: can this cluster solution reasonably be used with web servers/SQL servers, or is it only meant to be used for programs that don't read much from the hard disk?

  • That's true, but that means the whole process (all threads) must migrate. Unless you are running lots of processes (each having lots of threads), you don't gain much from this model.
  • What's with the Microsoft blue in the header?
  • Threshold -1, Highest scores first, start at the bottom.

  • And what's most interesting about this clever strategy is that it formed naturally, in an almost anarchic fashion. Eventually we will see whether the best of man's business planning is any match for the forces of nature.

    I just wish that I could see it all in fast motion, whereas the reality is that I probably won't even notice. "Microsoft? What's that?"

  • Why does the Linux community constantly play 'catch up' with the closed-source community?

    That's a brilliant strategy for world domination. Linux does not need to be innovative or profitable. It just needs to stick around and stay in the game. Sooner or later another niche will erupt -- like the internet -- and Linux will be there to explode into it and overwhelm it -- as long as it stays in the game.

    The bar for Linux's survival is much lower than is the bar for Microsoft, for example. Microsoft needs to perpetually remain profitable. Also they need to keep growing to be a more attractive investment to stock-holders than a savings account.

    Linux just needs to stay on the prowl until the time comes for it to dominate.

    Milk, it does a body good.

  • Neither. It does not support any kind of distributed memory (supposedly they are working on this), and a process can only be on one node at a time (no magic parallelization). So it doesn't quite fall into the Beowulf class. Also, it does not have any kind of fault tolerance at the moment. What it is good for (from what I have read; I've never used it) is making efficient use of a remote cluster, i.e. let's say you have something like Google (except with higher CPU utilization) with a dynamic load: this would be perfect.
  • I could find an application that would play nice with it. MySQL, PostgreSQL, dnet, Q3A map compiles ...

    ** sigh **
    until (succeed) try { again(); }
  • master controller with all software and hd space
    boot from floppy for all mosix child nodes
    child nodes are diskLESS machines btw
    Let primary do the disk storage
    let mosix cluster do the work :)

    i know it has some flaws, but that's my solution... for my home "system"
  • was EVEN distribution. Did anyone else notice that this one machine seems to be working harder than the others?

  • Hey, anyone know if any single-disk distros support Mosix, or is it small enough to just recompile and add a startup script to the disk? At school, all of the stupid teachers are Microsoft trained (EVIL), but after the sixth period bell rings most of the machines boot Linux. We run around to all the machines and set them to receive the latest multicasted HD image, among other things. If the machines boot Linux from the hard drive all the "Microsoft monkeys" run around screaming, but we regularly boot it from floppies when we actually need to use a computer or mess with the network. A single disk to run around with after school so we could get large jobs done would be great!
  • Beowulf (distributed parallel computing environment), while an interesting and useful technology, has nothing whatsoever to do with clustering!

    Agreed. Beowulf is really more of an array computing model than stuff that is traditionally referred to as cluster-computing.

    Back in olden times-- circa 1995, I think-- SGI had a set of software and an API for their Challenge servers that allowed customers to configure them as arrays. I can't remember the marketing-name of the product, but I do remember that parallel HIPPI was the preferred interconnect (still damn fast, six years later) and some customers had arrays consisting of a great many 36-processor nodes. Pretty cool product. Very similar in concept to a Beowulf cluster using gigabit-e or Myrinet as the interconnect.

  • I don't know how to moderate this - it's interesting and offtopic at the same time. Please take time to "Ask slashdot" about moderation - then your post would be (+5, Interesting).

    BTW, Einstein would use "No Score +1 Bonus" while posting about bad weather. And so do I.

  • Your link is broken because of the space /. puts in there. Here [] is the correct one.
  • The past month on Slashdot, a disproportionate number of posts have been marked +5 Interesting. In the past, +5 Interesting has been reserved for especially well written and clued in posts. Slashdot needs to change either the number of people receiving moderation points or increase the maximum a post can be rated to 10. If this were to take effect, I could simply read the 5-6 best written or insightful comments instead of the posts people feel they have to waste their mod points on.

    (just my drunken rambling)
    Well, I don't know about you, but I've been getting mod points like every two or three days lately. I think The Gods (TM) decided that moderation would work better if everyone always got to moderate. Think of it this way: moderation USED to work pretty well on slashdot, and that was when you only had mod points occasionally! Why, if you give EVERYONE mod points ALWAYS, it's bound to work even BETTER, no? The more moderation, the more thoroughly moderated the discussion. And thoroughly moderated discussions are Good Things.
    The other possibility I can think of is that within the past month a shipload of fake accounts came into fruition (you know how you can't moderate for a long time while your account is "new"?) I've always thought it would be cool to set up some scripts to generate a few hundred accounts and to actually make them intelligently enough spoofed so that slashdot can't tell they're not people. Then, as soon as they can start to moderate, five to ten are guaranteed to be moderators at any given time and then I can find some old threads of mine no one will look at and moderate them up to hell. Of course, usually this thirty-second daydream ends with the thought: "and then what? get fifty karma? Dude, you're such a loser!" :) (Especially because I already have 22 and that's just because I'm lazy. It really doesn't take a lot to get mod points. One easy way is to go to the less-visited parts, like ask slashdot, and just spend fifteen minutes with google and in formulating a very obvious opinion that you can link to death. You're guaranteed to get +5. At least the three times I did it. Gets boring after a while.)
    Anyway, even though the Karma isn't worth it for me, nor the power especially of being able to bitch-slap whoever, I bet some other people did that a few months back and we're starting to see the fruit of their loins, or something.

    Off-topic: the quote on the bottom of my page says now, "There is one way to find out if a man is honest -- ask him. If he says "Yes" you know he is crooked. -- Groucho Marx". A more mathematical way to arrive at a (more guaranteed-correct) answer is to ask the man whether he WOULD say he's honest IF you asked him. Then his answer is in fact the truth. :)
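
The nested-question trick can be checked by brute force. This little sketch assumes the simplest possible model, which is mine and not Groucho's: an honest man always reports the truth and a crook always reports the opposite.

```python
def answer(is_honest, truth):
    """An honest man reports the truth; a crook reports its opposite."""
    return truth if is_honest else not truth

for is_honest in (True, False):
    # Direct question: "Are you honest?" The true answer is is_honest.
    direct = answer(is_honest, is_honest)
    # Nested question: "Would you say yes if I asked you?"
    # The true answer to that is whatever he says to the direct question.
    nested = answer(is_honest, direct)
    print(is_honest, direct, nested)
```

Both men say yes to the direct question, so it tells you nothing, but the answer to the nested question always matches what the man really is.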

  • You could switch back to OpenVMS. Good distributed file system, really nice distributed lock manager - not as fast, proprietary as hell but it works well enough for serious applications.

    Note this is also my peeve about WinClusters (NT or 2K), the job was half done.

  • The idea first struck me as quite odd -- you like reading flames. I personally do not, but, the more I thought about it, the more I loved your idea. Why not enable people to search for posts with a specific karma description? You know, for the days when you need a little humor; sometimes I just read scanning for "funny" posts. Of course, the system could allow for people to search for anything, so one could conceivably look for "flamebait" posts, or even "offtopic" posts.
  • Read about it. There is a bunch-o-hardware-dependent stuff that goes on when you are doing task switching. It isn't as simple as a recompile. This is a MAJOR undertaking to get it to a different platform. Bus widths are different, cache sizes are different, register sizes are different, number of registers, etc. are all different.

    I just like to see a 1.0 for anything Linux. It means that it is a mature product with a really stable API. This is the opposite of most commercial software manufacturers that I have dealings with. These guys can't get a stable release after 5 or 6 releases. I hate that.
  • >That's a brilliant strategy for world domination. >Linux does not need to be innovative or profitable.

    You're absolutely right! That's why innovative companies like Microsoft will never achieve world domination, and ...

    micje eyes himself warily...

  • ..had the yarbles [] to try Mosix out yet?
  • ..ah, never mind.

  • on the topic of clustering, i was noticing that no one seems to be making programs that run on different platforms at the same time and share processes. is this just impossible, or does no one care? or do i have to become a master admin before i can even think about something like this?
  • actually, according to the FAQ on their website:

    Most JAVA VMs use shared memory, and thus can not migrate. Try using a "green threads" VM.

    but i don't know what a "green threads VM" is. in theory java apps would be great for clustering, no?
  • will this compile on FreeBSD or OSX? Judging from the docs, no... but is there a mosix for other oses?
  • Well, the reality is that most of the sort of major technological developments in the computer world were pioneered, not in BSD or Linux or Windows or SunOS or anything else, but in academia or major research labs. I mean, the researchers might have been working at HP, but they weren't working on the OS. A lot of this stuff had the original research done twenty years ago, as is true with most kinds of technology.

    The folks writing operating systems, or software intended for the public, are usually trying to take the stuff coming out of academia and optimize it, get it relatively bug-free, package it up in a reasonably friendly fashion, things like that. Linux developers are certainly not always the first to do this, but then again, given that there are eight or ten OSes people are developing for, we wouldn't expect them to be.

    Nor is Linux necessarily the best environment to pioneer for. If I'm the first guy trying to write a clustering system fit for distributing to the world, I may well want to do it on a simpler sort of system- something written with the idea of being very stable and well-organized, say, like NetBSD, as opposed to something written with the idea of being very hardware-supporting and practical, like Linux. That doesn't mean that 'NetBSD is doing the innovation' and 'Linux is copying it later'. The 'Linux community' isn't some sort of absolute, to which you have either given your soul or have no part in.

    To the extent that the 'Linux community' does exist, it has given a great many things to the larger IT community. That it is often focused on writing open-source or portable things that have existed in more closed form before, does not particularly seem to me like a mark against the importance of its efforts.

  • Thanks for the link. I had not seen this, and found it very interesting.

    This is a very primitive form of clustering (share-nothing due to the lack of distributed lock manager, simple failover only, two node limit) similar to Microsoft's so-called clustering solution. A step up, but still very limited...

    I guess I'm just spoiled by OpenVMS clustering. When I set up our Exchange 2000 clustered back-end servers, I constantly found myself astonished by the primitive capabilities of Microsoft's clustering technologies. "What do you mean more than one node can't control this disk at the same time? That's a breeze with MSCP disk sharing, and has been for a decade and a half. Can't have hundreds of nodes separated by hundreds of miles for disaster-tolerance? Give me a break..."

  • Not only that, this guy is so damn arrogant. He's implying that you cannot possibly have a job where you have what he mentioned.
  • Can i please have the other one?
  • by Anonymous Coward
    A beowulf cluster of mosix clusters? :)
  • There was a volume of the Springer Lecture Notes on Computer Science about MOSIX that came out in the early 90s; I remember reading it as an undergrad and finding it quite interesting. At the time MOSIX was built on top of BSD. It basically added support to the kernel for automatically migrating processes from one machine to another. It's more complicated than just copying the code + data segments, mostly because of open files. Inside the kernel a machine ID was added to the data structures for file descriptors, and I/O requests could be forwarded to the appropriate machine, so a process that gets migrated still has access to any files it had open.

    That's the basic idea, suitable for spreading out CPU-heavy tasks that share nothing over a bunch of machines. It looks like they're working on extensions such as migratable sockets which would make it suitable for applications that require sharing or communication.
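
Roughly what "a machine ID in the file descriptor" buys you, as a toy model. This is only a sketch of the idea described above, not MOSIX's real kernel structures; `FileDescriptor`, `Node`, and the dict standing in for the network are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class FileDescriptor:
    home_node: str  # machine that actually holds the open file
    path: str

class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared dict: node name -> Node
        self.files = {}         # path -> contents stored on this machine
        cluster[name] = self

    def read(self, fd):
        """Serve the read locally, or forward it to the file's home node."""
        if fd.home_node == self.name:
            return self.files[fd.path]
        return self.cluster[fd.home_node].read(fd)

cluster = {}
a, b = Node("a", cluster), Node("b", cluster)
a.files["/tmp/data"] = b"hello"

# The process opened the file while it was running on node a ...
fd = FileDescriptor(home_node="a", path="/tmp/data")
# ... then migrated to node b, where the same descriptor still works.
print(b.read(fd))
```

Every forwarded read costs a network round trip in the real thing, which is why this suits CPU-heavy jobs far better than disk-heavy ones.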

  • and it installed and ran very cleanly (once I learned what I was doing ;). Used a pentium 133, pentium 200 and a 486/66. Just made a simple empty loop C program to test and it worked as expected. Start a process on the 133 and it was migrated to the 200. Start 2 processes and one would get migrated to the 200, showing up in 'mon' as a load of '2', and the other would run on the 133 as a load of '3'. I had to launch 6 processes before the 486 kicked in with a load of '12' running one process, with the other 5 distributed with 3 on the 200 (6) and 2 on the 133 (also 6). Next I got and tried the parallel make MPMake. A kernel compile using MPMake w/ switch -j3 on the cluster took 18 minutes, while on the 200 alone it took 12 :)) Obviously I need more/faster boxes to make the cost of load balancing worthwhile but it basically works as advertised.
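
The load figures in that test are consistent with a simple model where each box shows a fixed load per running process (2 on the 200, 3 on the 133, 12 on the 486). That linear model is my assumption, read off the numbers quoted above; it is not a statement about how MOSIX computes load internally.

```python
# Load shown by 'mon' for ONE process on each box, as quoted in the comment.
LOAD_PER_PROCESS = {"pentium-200": 2, "pentium-133": 3, "486-66": 12}

def displayed_load(placement):
    """placement maps node name -> number of processes running there."""
    return {node: count * LOAD_PER_PROCESS[node]
            for node, count in placement.items()}

# The six-process run: 3 on the 200, 2 on the 133, 1 on the 486.
six_processes = {"pentium-200": 3, "pentium-133": 2, "486-66": 1}
print(displayed_load(six_processes))

# Kernel compile: 18 minutes on the cluster versus 12 on the 200 alone.
slowdown = 18 / 12
print(slowdown)
```

Under this model the two Pentiums end up evenly loaded at 6 each, and the compile on the cluster was 1.5 times slower than on the fastest box alone.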
  • The first generation of most new technologies sucketh badly. (Reliability, scalability, flexibility, extensibility...)


  • Sun and HP can do transparent migration of the user context of a process between machines? Can you point to any more information on it? I've never heard of that, other than with Mosix.

    This is not 'beowulf' clustering... this is not parallel tasking... this is having portions of processes automatically migrate to other machines in a cluster based on memory/cycle availability.

    This is not rubbish. Mosix has been around for a while, but it's great to see version 1.0

  • What's with the Microsoft blue in the header?


    I thought it was the light-blue C from the cover of the original Kernighan and Ritchie language manual.
  • I thought it was the light-blue C from the cover of the original Kernighan and Ritchie language manual.

    On second thought it IS a bit dark for that, at least on my current monitor.
  • ...the posts people feel they have to waste their mod points on.

    Do people actually feel compelled to moderate? I frequently don't. I can take it or leave it. Maybe something in the FAQ could address this (tho then you have to get people to RTFF).

  • One of my roommates describes MOSIX as "SMP writ large", and that's essentially true. The people behind MOSIX describe it as a "fork and forget" server. Basically, it divides processes amongst nodes the same way SMP under linux divides processes amongst CPUs. Except that with MOSIX, you can make provisions for some nodes being faster than others (x86 mobos would barf with multiple different CPUs in them). So, MOSIX is great for CPU-intensive stuff that can be forked (LAME, gcc come to mind).

    For a "web cluster", you want something like this:

    This is a combination of load balancing and high availability. Machine A load-balances web traffic between machines C,D,E, and F. Machine B monitors machine A, and takes over for it if it goes down for more than 4 seconds. They've got various algorithms for load balancing.

    Sotto la panca, la capra crepa
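
The web-cluster setup described above boils down to two small pieces: a monitor that takes over after 4 seconds of silence, and some rule for picking a backend. Here is a sketch with made-up names (`Monitor`, `pick_backend`); round-robin is just the simplest balancing algorithm I could pick, not necessarily one the product in question uses.

```python
FAILOVER_AFTER = 4.0  # seconds of silence before B takes over (from the comment)

class Monitor:
    """Machine B's view of machine A, driven by heartbeat timestamps."""

    def __init__(self, now):
        self.last_heartbeat = now
        self.active = False  # becomes True once B has taken over for A

    def heartbeat(self, now):
        self.last_heartbeat = now

    def check(self, now):
        if now - self.last_heartbeat > FAILOVER_AFTER:
            self.active = True
        return self.active

def pick_backend(backends, request_number):
    """Round-robin over the web servers behind the balancer."""
    return backends[request_number % len(backends)]

b = Monitor(now=0.0)
b.heartbeat(2.0)
print(b.check(5.0))   # 3 seconds of silence: A is still considered alive
print(b.check(6.5))   # 4.5 seconds of silence: B takes over
print(pick_backend(["C", "D", "E", "F"], 5))
```

Timestamps are passed in explicitly so the logic can be tried without waiting on a real clock.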
  • Imagine in the not too distant future (say 10 years at most) when gigabit ether is considered slow, that computers will automatically cluster, sending threads to other computers anonymously, and running off idle time. The internet will have a collective mind, and of course programmers will be forced to improve their own minds by having to prioritize threads so as to keep the important crap running locally. Now if only I could easily cluster with FreeBSD...

    Roy Miller
    :wq! DOH!
  • Interesting, this morning I was just thinking up a letter about the same issue to write to the slashdot admins. Heh.

    I do think that the cap on moderation of a post should be upped, maybe not all the way to 10 though.
  • Of course, by the time we have that much bandwidth, we will probably have computers running at 20 GigaHertz (if that is physically possible, and then again, probably if it is impossible, too). So there is a good chance we won't need to export processes to Brazil across 12 foot wide optical cables.

  • It has an O in it. Now I have to update my regular expression for unix-related stuff.
  • No, Beowulf is not a clustering technology. Abuse of the word "clustering" in the context Beowulf is one of my biggest pet peeves. Beowulf is distributed parallelized computing. Mosix at least vaguely resembles true clustering, but still no distributed lock manager, and no true hardware-level device sharing (as opposed to file sharing via NFS)...
  • by volsung ( 378 ) on Saturday May 05, 2001 @02:37PM (#243992)
    A "green threads VM" is a Java Virtual Machine that does not use the native operating system threading mechanisms to implement the Java threads. Instead, the JVM acts like one single threaded process to the operating system and threads the Java app internally. Green threads VMs are generally slower (cuz the OS can manage task switch more efficiently), but since they are just a single process, they can migrate to a new box without any problems.
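
For anyone who wants to see the idea rather than read about it, here is a toy version of green threads. It uses Python generators instead of a JVM, so it is only an analogy: the "threads" are scheduled by the program itself, and the operating system sees one ordinary single-threaded process.

```python
from collections import deque

def worker(name, steps, log):
    """A 'green thread': each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def run(threads):
    """Round-robin scheduler living entirely inside one OS process."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run the thread until its next yield
            ready.append(thread)  # still alive: back of the queue
        except StopIteration:
            pass                  # thread finished: drop it

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Since nothing here involves kernel threads or shared memory between processes, the whole thing is one process that a system like Mosix could pick up and move.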
  • Why do you feel compelled to post on a subject correcting other people's ignorance and yet being so profoundly ignorant yourself?

    Repeat after me "THERE ARE MANY KINDS OF CLUSTERS". Again...again...again....

    Now we will play a game: match the clustering technology description to a popular name. Match the letter to the number.

    A. Message passing clusters used primarily for low bandwidth parallel computation.
    B. Load balanced single protocol network clustering.
    C. Hardware takeover / hardware redundancy for Hi-Availability clustering.
    D. Load balanced, homogeneous platform, with process migration clustering.

    1. Veritas Cluster Server with Sun Multipath IO devices.
    2. Arrowpoint-type web load balancer.
    3. Beowulf.
    4. Mosix.

    For extra credit:
    Is the above listing of clustering technologies comprehensive? [Y]es [N]o

    Answers available from those with a clue after class.

  • by landley ( 9786 ) on Saturday May 05, 2001 @12:53PM (#243994) Homepage
    The moderation range is saturated. No doubt about that. The obvious solution is increasing the moderation range (possibly all the way to ten, for future growth), but first we've got to get Slash HQ to acknowledge that there is a problem.

    There used to be a moderation category that was "just the best, most pithy synopses of the discussion". Now that can easily be 30 posts, and reading them doesn't fit in a 3 minute "while this compiles" break anymore.

    Part of it is that there's more posters these days, and more moderators, and the top 5% of 50 posts is a lot smaller than the top 5% of 500 posts.

    Part of it is the automatic +1 of posters with a history of good karma. This is a good thing, but it reduces by 25% the range that can only be reached by active moderation. (The original moderation range of 2-5 has been reduced to 3-5. You used to be able to read at 2 and filter out the stuff that hadn't been voluntarily moderated up at least once. That's no longer the case, and even Einstein wasn't ALWAYS worth listening to. Sometimes he was just ordering breakfast, or complaining about the weather.)

    Zero used to be a penalty for posting as an anonymous coward (since the troll ratio there was higher). 1 was standard, 2 being an experienced poster who generally has something meaningful to say. This is a good heuristic for a starting position, but there's not enough room to go up from there; the system is swamped.

    Slashdot has outgrown that range, even WITHOUT raising the floor. More marginal opinions less universally approved of (and less central to the topic) now reach the top category, because they have more opportunities to be moderated up. 5% of the viewership can easily spend 5 moderation points now.

    Perhaps we can go to a moderation percentage system? "Show me just the top 5% of posts"? Or sort them by popularity and give me the top fifteen...

    It's an interesting problem.
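
Both suggestions ("top 5%" and "top fifteen") are easy to sketch. The post records and function names below are invented for the example and say nothing about how Slash stores comments; the point is that a relative cutoff keeps working no matter how crowded the top score gets.

```python
import math

def top_percent(posts, percent):
    """Return the highest-scored `percent` of posts (at least one post)."""
    keep = max(1, math.ceil(len(posts) * percent / 100))
    return sorted(posts, key=lambda p: p["score"], reverse=True)[:keep]

def top_n(posts, n):
    """Return the n highest-scored posts."""
    return sorted(posts, key=lambda p: p["score"], reverse=True)[:n]

# 40 hypothetical posts with scores cycling through 0..5.
posts = [{"id": i, "score": i % 6} for i in range(40)]

print([p["id"] for p in top_percent(posts, 5)])
print(len(top_n(posts, 15)))
```

Ties are broken by original posting order here, because Python's sort is stable; a real system would need to pick its own tie-break rule.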


  • >Repeat after me "THERE ARE MANY KINDS OF
    >CLUSTERS". Again...again...again....

    >Now we will play a game: match the clustering
    >technology description to a popular name. Match
    >the letter to the number.

    Berries come in clusters. Stars come in clusters. Military rank insignia come in clusters...

    Californians... No wait, this is a family oriented area.


    (Austinite. They move here and can't drive, so we get to make fun of them.)

  • by landley ( 9786 ) on Saturday May 05, 2001 @10:39PM (#243996) Homepage
    >SMP and NUMA are different problems because they
    >have different failure characteristics.

    It's a question of what problems you want to address. It's entirely possible to have multitasking multiuser operating systems without virtual memory. (Just about every 1970's era unix before the Vax, actually.)

    Doesn't make the problem fundamentally different, just that there's more cases to cover. Do you always check for a non-null return from your mallocs, or do you just say "the system should just never run out of memory"?

    >As far as I know, an SMP operating system
    >assumes that, if CPU #2 was there just a moment
    >ago, it will still be there.

    Three words: Hot pluggable hardware.

    And yes, they're talking about adding that capability to the Linux kernel in 2.5. (Although the current patch has a /proc entry to switch the appropriate processors off and on before just yanking them. Then again, PCMCIA proves you can do it without manual notification since you get several milliseconds of warning, which is ages to the computer...)

    >What happens when your operating system needs to
    >fault in a page, but your distributed VM manager
    >lost network contact with your other server(s)?

    Well, when did this (no local hard drive, it swapped through the network to the server in the back room), its response was to die spectacularly (sunOS didn't blue screen, it white screened). This is not a new problem.

    Then again, how many apps never check the return value of malloc and just expect the OS to go down if the system runs out of memory anyway?

    If you were really swapping through the network (despite hard drives being cheap they ARE failure-prone moving parts), I'd say use distributed redundant swap devices and treat them like RAID 5 so you can lose one and recover the data? Also avoids network bottlenecks. But then you're eating network bandwidth needlessly, which is usually your limiting factor. (Then again, you page fault all sorts of other stuff through the network anyway in a shared memory config, it wouldn't so much be swapping as a larger distributed memory management system.)
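
The "lose one and recover the data" part rests on XOR parity, which fits in a few lines. This is only the parity idea, with made-up page contents; real RAID 5 also stripes data and rotates the parity block across devices, which is left out here.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Hypothetical swap pages held on three different nodes, plus one parity page.
pages = [b"page-one", b"page-two", b"page-3!!"]
parity = xor_blocks(pages)

# The node holding pages[1] drops off the network: rebuild its page
# from the surviving pages and the parity page.
survivors = [pages[0], pages[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == pages[1])
```

The catch, as noted above, is that every rebuild (and every parity update) is more traffic on the network that was the bottleneck to begin with.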

    It's an open question on the best way to go. Performance vs reliability is often a tradeoff. But there are PLENTY of different options.

    >How can the operating system handle this error
    >gracefully? Or politely warn the userspace
    >application? :-(

    How does RAID 5 do it today? (Let's see, SMART disks, battery-backed power supplies notifying of failure, hot pluggable hardware... It'll probably all get molded together someday into a pseudo-coherent infrastructure of dynamic system status.)

    The most graceful thing for the OS to do may just be to suspend the app and save off its state until it can continue. It depends. As I said, there are a lot of options.


  • by cpeterso ( 19082 ) on Saturday May 05, 2001 @07:39PM (#243997) Homepage

    SMP and NUMA are different problems because they have different failure characteristics. In distributed programming, you often must expect network failure to be a common occurrence and handle those errors gracefully. As far as I know, an SMP operating system assumes that, if CPU #2 was there just a moment ago, it will still be there.

    What happens when your operating system needs to fault in a page, but your distributed VM manager lost network contact with your other server(s)? How can the operating system handle this error gracefully? Or politely warn the userspace application? :-(
  • by loony ( 37622 ) on Saturday May 05, 2001 @08:42AM (#243998)
    that it would go in the official linux kernel...


    It's a great chance for Linux to do more than play catch-up with Windows or other flavors of Unix - it can take the leadership and give you the ability to create clusters using the tools in the standard distribution!
  • by De ( 39631 ) on Saturday May 05, 2001 @08:56AM (#243999)
    I've been trying to run this for the last few days, and I've gotten so many kernel oopses that I've had to revert back to a standard kernel. YMMV
  • by BierGuzzl ( 92635 ) on Saturday May 05, 2001 @11:22AM (#244000)
    Unlike other parallel processing environments, mosix is a solution that can in many cases be put to work with just a few changes in the init scripts and a kernel recompile -- no applications or libs need be changed, and things like web servers can take advantage of it right out of the box.
  • by gavrie ( 201912 ) on Saturday May 05, 2001 @12:03PM (#244001)
    On the issue of true device sharing:

    There is a possibility of using MOSIX together with GFS (which gives true device sharing) so that you don't need to use something like NFS. This way, a migrated process will be able to access the device directly, without needing to go through its home node.

    AFAIK, this option is still not production-level, though.

  • by Anonymous Coward on Saturday May 05, 2001 @08:57AM (#244002)
    [OT] deluge of overrated posts (Score:5, Interesting)

    the irony is thick.

  • by Anonymous Coward on Saturday May 05, 2001 @08:38AM (#244003)
    I'm in an academic department that does a veritable sh*tload of computations, and we've been using it for nearly a year to load-balance a bunch of P2-350s. It's great, makes those machines feel "loved", and keeps people's research progressing. It's great when you can break up your problem/program into a bunch of smaller ones. But it's not perfect, and as with all clustering solutions, doesn't do the hard work (algorithm parallelizing) by itself. However, it is going to form the backbone of our dept system, replacing a pair of big-iron Sun servers...
  • by MSG ( 12810 ) on Saturday May 05, 2001 @12:19PM (#244004)
    It should be noted that Linux takes longer to switch tasks on the PIII if it was compiled to support SSE2 instructions. Five dollars says that FreeBSD doesn't support them. Perhaps those benchmarks would have showed different results if the kernel had been built without SSE2 support? 05 -03-007-20-NW-KN
  • by AiX2 ( 90563 ) on Saturday May 05, 2001 @08:49AM (#244005) Homepage
    The past month on Slashdot, a disproportionate number of posts have been marked +5 Interesting. In the past, +5 Interesting has been reserved for especially well written and clued in posts. Slashdot needs to change either the number of people receiving moderation points or increase the maximum a post can be rated to 10. If this were to take effect, I could simply read the 5-6 best written or insightful comments instead of the posts people feel they have to waste their mod points on.

    (just my drunken rambling)

  • by janpod66 ( 323734 ) on Saturday May 05, 2001 @09:18AM (#244006)
    The Mosix research group has been working on clustering for many years:

    So far MOSIX was developed 7 times, for different versions of UNIX and architectures. It has been used as a production system for many years. The first PC version was developed for BSD/OS. The latest version is for Linux on X86/Pentium/AMD platforms.

    Yes, they did start out basing their system on proprietary kernels, then they moved to BSD, then to Linux. The current work is not about the basic idea anymore, moving processes around somehow, but about things like distributed virtual memory, distributed file systems, and migration strategies.

    This isn't "playing catch-up", it is cutting edge research by the people who did the original work moving to the BSD and Linux platforms because they are more widely available, are better supported, are easier to license and share, and have more software available for them.

  • My comments were not rooted in ignorance, but rather an intimate familiarity with the technology developed by the DIGITAL engineers who INVENTED clustering, back in the days before the term was diluted by use in situations that have no relationship to the original application of the term.

    • Veritas Cluster Server is actually a clustering technology (only marginally, since, like Microsoft clustering, it employs a shared-nothing model). I have no problem with the application of the term here.
    • Arrowpoint? No. Load balancing is useful in a cluster (and even more useful when applied to all networking protocols, not just HTTP), but load balancing alone does not a cluster make.
    • Mosix (homogeneous process environment) is borderline. It just needs the addition of a few key technologies to qualify. (DLM, device sharing)
    • Beowulf (distributed parallel computing environment), while an interesting and useful technology, has nothing whatsoever to do with clustering! The term is misused here, and that is what annoys me. If you were to classify Beowulf as a clustering technology, you would also have to call every computer participating in SETI@home a single cluster, which it clearly is not...

    And no, that list is far from complete. No mention of HP's clustering technology, Compaq's OpenVMS Clusters, Tru64 UNIX TruClusters, or Tandem NonStop Fault-Tolerant clusters, or Microsoft Clustering (although, again, that is a VERY weak form of clustering, and lacking in several respects).

    Essentially, this is an argument over semantics, over the definition of the term "cluster". I merely oppose dilution of the meaning by applying it to lesser technologies which have no relationship with the original meaning of the term.

  • by landley ( 9786 ) on Saturday May 05, 2001 @01:20PM (#244008) Homepage
    There are traditionally three different types of parallel processing systems: SMP, NUMA, and networked clusters like Beowulf. In reality, these form a continuous range, with SMP at one end, Beowulf at the other, and NUMA in the middle.

    SMP is Symmetrical Multi-Processing, or one computer with multiple processors just like multiple hard drives, multiple serial ports, or multiple banks of RAM. In an SMP setup, each processor has equal access to the other system resources, and although they may need locking to avoid stomping on each other's activities, it's no more expensive for processor #2 to access a certain resource (such as an area of main memory) than it is for processor #5 to do so. Thus there's no real reason to shuffle processes around to be "closer" to some other resource.

    The other end of the spectrum is message passing networked clustering, like beowulf, where isolated systems (each with its associated set of resources) accept complete tasklets, work on them more or less alone, and output the results. Accessing resources from the rest of the cluster is very expensive, and you try not to do it more than absolutely necessary (once per transaction). A message comes in with all the info a node needs to do its work, and the node sends a message back out with the result and to announce it's ready for the next mouthful.
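
    That message-in, message-out pattern looks roughly like this. It is only a toy: threads and in-process queues stand in for separate machines and a network, and the `node` function and the task format are invented for the example.

```python
import queue
import threading

def node(inbox, outbox):
    """One 'node': take a complete task, work on it alone, send back the result."""
    while True:
        task = inbox.get()
        if task is None:          # sentinel: no more work for this node
            break
        task_id, numbers = task   # the message carries everything the node needs
        outbox.put((task_id, sum(numbers)))

inbox, outbox = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=node, args=(inbox, outbox)) for _ in range(3)]
for w in workers:
    w.start()

# Six self-contained tasklets; no node ever looks at another node's data.
tasks = {i: list(range(i * 10, i * 10 + 10)) for i in range(6)}
for item in tasks.items():
    inbox.put(item)
for _ in workers:
    inbox.put(None)
for w in workers:
    w.join()

results = dict(outbox.get() for _ in tasks)
```

    Each node touches the rest of the "cluster" exactly twice per task, once to receive and once to reply, which is the whole point of the model.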

    NUMA is in between, and it stands for Non-Uniform Memory Architecture. You have a bunch of similar processors, like in SMP, but some resources are "close" to each processor and some are far away.

    Remember, clusters own resources outright, this is my node's memory. On SMP all processors access a pool of shared resources (like main memory) at the same speed (hence symmetrically). On NUMA, processor #53 -CAN- access memory over by processor #1736, but it'll take much longer than if it accesses memory near itself. It'll block, it'll have wait states. (Just like accessing a page swapped to the hard drive vs accessing one in memory.)
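
    A back-of-the-envelope model of why that matters, with made-up latencies: the average cost of an access is just a weighted mix of the local and remote costs. On SMP the two costs are equal, so placement doesn't matter; on NUMA (or a cluster faulting pages over the network) even a small remote fraction dominates.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected cost of one memory access, given how often it stays local."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical latencies, chosen only to show the shape of the problem.
LOCAL_NS = 100          # memory next to this processor
REMOTE_NS = 100_000     # page faulted in over the network

for fraction in (1.0, 0.99, 0.9):
    cost = average_access_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local -> about {cost:,.0f} ns per access")
```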

    The thing is, as systems on either end become more complex they move towards NUMA. Think mondo SMP systems with dozens of processors, each of which has megabytes of L1 cache. You want to keep stuff "in cache" rather than accessing main memory, and sometimes you want to access something that's currently in some other processor's cache. Cache line pollution and such. That's a NUMA type of problem.

    From the other end, once you start connecting beowulf clusters together with really high speed interconnects (like gigabit ethernet or myrinet, and often speed here is more a question of latency than bandwidth) and start teaching them how to pretend to be one big shared memory image by page faulting through the network, you're approaching NUMA from the other end. Stuff's in my machine's memory locally right now, and swapping it in from some other guy's memory (and swapping out some of my stuff to make room for it) is something I only want to do when absolutely necessary, because it slows me down.

    MOSIX is taking beowulf clusters in the direction of NUMA. This is a good thing, it makes them more flexible and capable, but it opens up a whole can of worms to optimize it properly. (Not a new can of course, the kernel hackers are already dealing with a rather significant portion of NUMA's issues just trying to get 32 processor alphas to work smoothly.) If the interconnects between clusters were perfect, we could just treat it as one big SMP machine. Then again if our hard drives were as fast as our ram we wouldn't try so hard to minimize swapping, would we? You could still just treat MOSIX as SMP instead of NUMA if you don't want to optimize your performance. And for many things that's a fine solution, just distributing it across the cluster gives you all the performance you need, and adding nodes is more cost effective than rewriting your app for greater speed in the new environment.

    But performance hits of thrashing all your pages through the network can be just as bad as thrashing them in and out of the swap partition. And performance is the only reason we're using clusters in the first place, isn't it?

    And NUMA optimization just makes maintaining locality of reference, streamlined locking, and minimizing contention for commonly accessed resources even MORE important. It's the same kind of thing you'd do on a normal SMP machine anyway, it just has more of an impact, because there's more inefficiency to optimize away.


Money is truthful. If a man speaks of his honor, make him pay cash. -- Lazarus Long