Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

Create Account  |  Retrieve Password

Red Hat & AMD Demo Live VM Migration Across CPU Vendors

Posted by kdawson on Fri Nov 07, 2008 11:14 AM
from the dude-where's-my-virtualization-business dept.
An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management — a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sound like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."
+ -
story

Related Stories

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More
Loading... please wait.
  • Bravo! (Score:4, Funny)

    by Cornwallis (1188489) on Friday November 07 2008, @11:16AM (#25676515)
    I love to see things like this that give me a greater freedom to migrate off the major players.
  • The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.

    • Well, that kinda *is* the purpose of live VM migration... it's already being done, just not between systems with different processor types.

      • by TheRaven64 (641858) on Friday November 07 2008, @12:57PM (#25677599) Homepage Journal

        They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).

        I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it in to trunk), which migrated a VM from real hardware in to QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.

        • Re: (Score:3, Interesting)

          Between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendors CPU means all this saved state may not match up and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to deal with making sure

    • by Comatose51 (687974) on Friday November 07 2008, @12:11PM (#25677133) Homepage
      You mean like VMware's VMotion, HA, and DRS functionalities?
      • Re: (Score:3, Insightful)

        Yes, except without requiring identical hardware.
        • by nabsltd (1313397) on Friday November 07 2008, @02:11PM (#25678987)

          VMware doesn't require "identical" hardware to do live migration, either.

          It does have to be similar enough, which at this point pretty much means just the same processor manufacturer. As long as the processor supports the hardware virtualization, then VMware will allow you to set up a cluster that will allow live migration with no issues.

  • Umm... (Score:3, Interesting)

    by frodo from middle ea (602941) on Friday November 07 2008, @11:28AM (#25676647) Homepage
    All the interesting controversy aside, cross vendor migration is [obviously] a good thing for customers because it avoids platform lock-in Well almost all VM products barring VirtualPC do indeed supoort running the same VM image on across various vendor platforms, in fact that is the whole point of a VM , isn't it ?

    The fact to highlight is that the migration was done of a live VM without disrupting the VM's operations.

    • It's not a matter of it RUNNING on multiple platforms. The issue here is live migration. Moving a running VM from one machine to another without skipping a beat. On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly.

      • Re: (Score:3, Informative)

        On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly

        Do you? I first saw Xen demo live migration in 2005, and I don't think it was new then. Their demo had a Quake server being thrown around a cluster without clients noticing. Downtime was well under 100ms. You can read the paper [cam.ac.uk] for more information.

        They were claiming that you can move between processor types, but they didn't specify how much different they could be. If it's just a matter of SSE or 3DNow! support disappearing then that's not a hard problem - just trap-and-emulate any of the old instruc

          • Re: (Score:3, Informative)

            And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times.

            This is because you have no way of knowing which processor type the VM was first started on. When this happened, it's likely the OS did some hardware checking and figured out which instructions it could (and could not) use. Moving the VM isn't going to change what the OS believes is the processor, and that's the problem.

            Overall, VMware's Enhanced VMot

  • by stabe (1133453) on Friday November 07 2008, @11:29AM (#25676659)
    Xen supports this feature since Xen 3.3, it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...
  • Still x86 only (Score:4, Insightful)

    by boner (27505) on Friday November 07 2008, @11:33AM (#25676715)

    Real magic would have been demonstrating a move between ANY processor architecture - Power, SPARC, x86_64 etc..

    Between x86 processors is nice, but not unexpected.

    • No problem! Just run x86 linux under qemu on all physical platforms, then run your applications under x86 linux inside a kvm inside qemu with migration between the qemu instances on each physical system!

    • That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?

      Seems like this would work between processors with a very similar ISA [wikipedia.org].

      If they could run stuff compiled for one processor on another processor with a different ISA at near full speed,... that would change more than just virtualization. Run Wine on a PowerPC, emulate old consoles easily on a Pandora [openpandora.org], etc..

      • That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?

        Most definitely. At that point, you're emulating, not virtualizing.
      • Re: (Score:3, Interesting)

        Depends. Modern emulators can run at around 50% of the host platform speed. If your guest is paravirtualised then all of the privileged instructions will be run in the hypervisor. If you're running a JIT in the guest then you can poke it to flush its code caches and start emitting native code for the new architecture, but even if you aren't then migrating the VM from the 200MHz ARM chip in your cell phone to the quad-core 4GHz x86 chip connected to your TV might be interesting.
  • by Anonymous Coward on Friday November 07 2008, @11:39AM (#25676773)

    Open source is for morons.

    Only Apple has the engineering know-how and skills to pull of something like this. The fact that they have not done so to date is a clear indication that it is impossible.

  • check the graphs... (Score:5, Interesting)

    by alta (1263) on Friday November 07 2008, @11:41AM (#25676801) Homepage Journal
    Go to 4:05 in the video. On the far left, you can see from the blue intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistantly higher than the intel line was. I'm no intel fanboy... or AMD. I have both intel and amd servers in my racks. I just thought it was interesting, and I'm surprised they let the video go out like that.
    • Hehe, I checked the same thing. :)

      To be fair, the performance of playing a HD video is pretty much determined by your graphics card. It's not really the best CPU benchmark you could imagine. :)

      • GPUs have nothing to do with video decoding, it's handled 100% by the CPU. At least until we get a software that can reliably take advantage of the relatively recent introduction of h264 decoding on some high-end GPUs.
    • I can't watch the video right now, so I'm assuming the graph is processor utilization?

      Could it possibly be because the AMD processor is running some kind of instruction translation, communication layer, or something like that?

    • Re: (Score:3, Informative)

      It didn't seem that interesting to me. If you watch the video, the Intel and Barcelona machines showed no VM's running (0% load). When the Shanghai server took over the load, *of course* it's load line will rise -- it's the only server running a VM at that point!

      There are no shenanigans going on here, and I don't think this says anything about the chips as you imply, either.

    • Well, duh, thet can run their Core 2 @ 4.5GHz on stock air cooling, silly!

      Shanghai can still be faster clock for clock as they promised ;)

      Seriously now, a CPU % utilization of a VM running WMP is no indication of anything.

  • by mnmn (145599) on Friday November 07 2008, @11:48AM (#25676901) Homepage
    Declaration: VMware support engineering here, but speaking strictly on my own behalf.

    The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

    We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993

    We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, not all of Linux and Microsoft code has been tested against that. You can happily try it and a lot of our customers succesfully do. Some get BSODs and kernel oops. This is not our fault.

    If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.

    Lastly, why would we want to keep customers stuck to one CPU vendor? We've software vendors.
    • by Anthony Liguori (820979) on Friday November 07 2008, @11:59AM (#25677029) Homepage

      Declaration: VMware support engineering here, but speaking strictly on my own behalf.

      The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

      KVM goes to great lengths to by default, mask out CPUID features that aren't supported across common platforms. You have to opt-in to those features since they limit a machine's migrate-ability.

      However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).

      x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).

      If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.

      • Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?

      • by kscguru (551278) on Friday November 07 2008, @01:09PM (#25677777)
        Yet Another VMware engineer here.

        The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.

        So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.

        The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.

        I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.

        (As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)

        • by Chirs (87576) on Friday November 07 2008, @01:39PM (#25678391)

          The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.

          The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.

        • Re: (Score:3, Interesting)

          Yes you're not supposed to use TSC.

          BUT there is no good alternative that's:
          1) Cheap
          2) Fast
          3) Available on most platforms

          I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.
        • The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code

    • VMWare have more stability worries than this on their plate. I've just upgraded Fusion on the Mac to version 2 and it's still very unstable. First use the guest OS locked up, forcing me to reboot the host so I could try again, only to find that, like with Fusion 1.1, the Mac hangs on shutdown. *sigh*

  • The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed). Nor how it cna be impossible - while the x86 has many extensions, it's still a well-specified architecture with specific behaviors.

    The real trick is if an application is using features not present on the other architecture - e.g., an AMD virtual machine migra

    • The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).

      Erm... actually, if you watch the video, you will see that the "live" migration is actually live - the VM is not suspended, it is kept running and active through the migration.

      • Re: (Score:3, Informative)

        Actually, it is suspended, but only for a fraction of a second. First you copy the entire contents of memory to the new machine and mark it as read-only. Each page fault caused by this is used to mark pages that are still dirty. Then you copy these. You keep repeating this process until the set of dirty pages is very small. Then you suspend the VM, copy the dirty pages, and start the VM on the new machine. Userspace programs will just notice that they went an unusually long time without their scheduli
    • I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).

      You just completely missed the point. The VM was not suspended, moved, and resumed. It was moved live. The VM never stopped doing its thing. It was up, running, and servicing requests the whole time.

      ...which isn't terribly amazing. I know VMWare can do that now. The big deal is apparently that it moved from one CPU vendor to another. I didn't realize this was so tricky... I kind of figured that x86 was x86 regardless of vendor. Obviously, I was wrong.

  • After all, all x86 are the same. MMX extensions get emulated on AMD, Linux distro's run on both processors without recompiling, the kernel handles calls and most likely an Apache server is not going to call the special media extensions. It would be interesting to see this happen in an environment that has been optimized and is using certain incompatible extensions (like 3DNow!) eg. a computing cluster.

    If you abstract enough and emulate a processor you should even be able to move between architectures but th

  • by Anthony Liguori (820979) on Friday November 07 2008, @12:01PM (#25677055) Homepage

    FWIW, KVM live migration has been capable of this for a long time now.

    KVM actually supported live migration of Windows guest long before Xen did. If you haven't given KVM a try, you should!

  • It's worth noting that VMware have been a huge contribution to the Linux-society, giving corps a very good reason (â$£) to migrate, thus including important pawns in the future of Linux. I for one believe that VMware was wrong, but that it's an honest mistake. There's no use in poking on VMware for this one, hopefully they'll help lift the technology even higher along with their competitors.

    You've lost this round VMware, but the match isn't over yet!
    • Re: (Score:3, Insightful)

      by Anonymous Coward
      so easy that you did it yourself three years ago, right?
    • This was done between different vendors, not altogether different architectures. That would demand emulation beneath the virtualization, on at least one machine -- not likely to happen any time soon.