Slashdot
Red Hat & AMD Demo Live VM Migration Across CPU Vendors
Posted by kdawson on Fri Nov 07, 2008 11:14 AM
from the dude-where's-my-virtualization-business dept.
An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management — a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sound like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."
Related Stories
Submission: Redhat and AMD demo cross CPU vendor VM migration by Anonymous Coward
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Bravo! (Score:4, Funny)
Re:Bravo! (Score:4, Informative)
Um (Score:2, Insightful)
The VM software vendor becomes "the major player".
As The Who's so insightfully titled song said "Meet the new boss. Same as the old boss."
Re:Um (Score:5, Insightful)
Re:Um (Score:4, Insightful)
This is still unreleased test demo's (Score:5, Insightful)
The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.
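The consolidation idea above can be sketched as a packing problem. This is only an illustration, assuming each VM's load and each host's capacity can be expressed as a single number; the function name `consolidate`, the VM names and the load figures are all hypothetical, and first-fit decreasing is just one simple heuristic, not what any particular product does.

```python
def consolidate(vm_loads, host_capacity, host_count):
    """Pack VMs onto as few hosts as possible (first-fit decreasing).

    Returns (placement, idle_hosts), where idle_hosts are the indexes of
    hosts that ended up empty and could be powered down.
    """
    placement = [[] for _ in range(host_count)]
    free = [host_capacity] * host_count
    # Place the heaviest VMs first; each goes on the first host with room.
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        for i in range(host_count):
            if load <= free[i]:
                placement[i].append(name)
                free[i] -= load
                break
        else:
            raise ValueError(f"no host has room for {name}")
    idle_hosts = [i for i, vms in enumerate(placement) if not vms]
    return placement, idle_hosts


# Hypothetical off-peak loads, as a percentage of one host's capacity.
vm_loads = {"web": 30, "db": 50, "mail": 10, "build": 20, "cache": 15}
placement, idle_hosts = consolidate(vm_loads, host_capacity=100, host_count=4)
```

With these made-up numbers the five VMs fit on two of the four hosts, so the other two are candidates for powering down; live migration is what would let the VMs be moved there without stopping them.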
Re:This is still unreleased test demo's (Score:4, Insightful)
Well, that kinda *is* the purpose of live VM migration... it's already being done, just not between systems with different processor types.
Re:This is still unreleased test demo's (Score:5, Insightful)
They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).
I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it into trunk), which migrated a VM from real hardware into QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.
Re: (Score:3, Interesting)
Between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendor's CPU means all this saved state may not match up and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to deal with making sure
Re:This is still unreleased test demo's (Score:5, Interesting)
Re: (Score:3, Insightful)
Re:This is still unreleased test demo's (Score:4, Informative)
VMware doesn't require "identical" hardware to do live migration, either.
It does have to be similar enough, which at this point pretty much means just the same processor manufacturer. As long as the processor supports the hardware virtualization, then VMware will allow you to set up a cluster that will allow live migration with no issues.
Re:This is still unreleased test demo's (Score:4, Interesting)
This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it.
While still driving down the highway at 60 mph.
Umm... (Score:3, Interesting)
The fact to highlight is that the migration was done of a live VM without disrupting the VM's operations.
Re: (Score:2)
It's not a matter of it RUNNING on multiple platforms. The issue here is live migration. Moving a running VM from one machine to another without skipping a beat. On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly.
Re: (Score:3, Informative)
On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly
Do you? I first saw Xen demo live migration in 2005, and I don't think it was new then. Their demo had a Quake server being thrown around a cluster without clients noticing. Downtime was well under 100ms. You can read the paper [cam.ac.uk] for more information.
They were claiming that you can move between processor types, but they didn't specify how different they could be. If it's just a matter of SSE or 3DNow! support disappearing then that's not a hard problem - just trap-and-emulate any of the old instruc
Re: (Score:3, Informative)
And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times.
This is because you have no way of knowing which processor type the VM was first started on. When this happened, it's likely the OS did some hardware checking and figured out which instructions it could (and could not) use. Moving the VM isn't going to change what the OS believes is the processor, and that's the problem.
Overall, VMware's Enhanced VMot
Xen 3.3 supports this already (Score:3, Informative)
Re: (Score:2)
Xen has supported this feature since Xen 3.3, it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...
Looks to me like Xen supports migration between different CPU models, not entirely different CPU manufacturers. So yes, there is a breakthrough here.
Re: (Score:2, Informative)
Xen does migration, but not Live... (Score:5, Informative)
This is a demo of a Live migration, no shutdown or reboot involved. Xen does not support the live migration of a running VM between an AMD and Intel server. Watch the video, they are running a video in the VM that keeps playing during the migration. Very impressive stuff.
Still x86 only (Score:4, Insightful)
Real magic would have been demonstrating a move between ANY processor architecture - Power, SPARC, x86_64 etc..
Between x86 processors is nice, but not unexpected.
Re: (Score:2)
No problem! Just run x86 linux under qemu on all physical platforms, then run your applications under x86 linux inside a kvm inside qemu with migration between the qemu instances on each physical system!
Re: (Score:2)
That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?
Seems like this would work between processors with a very similar ISA [wikipedia.org].
If they could run stuff compiled for one processor on another processor with a different ISA at near full speed, that would change more than just virtualization. Run Wine on a PowerPC, emulate old consoles easily on a Pandora [openpandora.org], etc.
Re: (Score:2)
Most definitely. At that point, you're emulating, not virtualizing.
Re: (Score:3, Interesting)
This was in all likelihood faked. (Score:5, Funny)
Open source is for morons.
Only Apple has the engineering know-how and skills to pull off something like this. The fact that they have not done so to date is a clear indication that it is impossible.
check the graphs... (Score:5, Interesting)
Re: (Score:2)
Hehe, I checked the same thing. :)
To be fair, the performance of playing a HD video is pretty much determined by your graphics card. It's not really the best CPU benchmark you could imagine. :)
Re: (Score:3, Interesting)
Re: (Score:2)
I can't watch the video right now, so I'm assuming the graph is processor utilization?
Could it possibly be because the AMD processor is running some kind of instruction translation, communication layer, or something like that?
Re: (Score:3, Informative)
It didn't seem that interesting to me. If you watch the video, the Intel and Barcelona machines showed no VMs running (0% load). When the Shanghai server took over the load, *of course* its load line will rise -- it's the only server running a VM at that point!
There are no shenanigans going on here, and I don't think this says anything about the chips as you imply, either.
Re: (Score:2)
Well, duh, they can run their Core 2 @ 4.5GHz on stock air cooling, silly!
Shanghai can still be faster clock for clock as they promised ;)
Seriously now, a CPU % utilization of a VM running WMP is no indication of anything.
Stability issues are justified (Score:5, Interesting)
The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993
We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, and not all of Linux and Microsoft code has been tested against that. You can happily try it and a lot of our customers successfully do. Some get BSODs and kernel oopses. This is not our fault.
If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.
Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.
Re:Stability issues are justified (Score:5, Interesting)
Declaration: VMware support engineering here, but speaking strictly on my own behalf.
The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
KVM goes to great lengths to mask out, by default, CPUID features that aren't supported across common platforms. You have to opt in to those features since they limit a machine's migrate-ability.
However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).
x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).
If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.
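To put rough numbers on the TSC problem described above, here is a minimal sketch. It assumes a guest that calibrated its TSC rate once at boot and keeps converting ticks with that stale figure after migration; the frequencies and counter values are hypothetical, and the function names are made up for illustration.

```python
def guest_elapsed_seconds(real_seconds, host_tsc_hz, calibrated_tsc_hz):
    """Time the guest believes has passed on a host whose TSC rate differs
    from the rate the guest calibrated at boot."""
    ticks = real_seconds * host_tsc_hz   # ticks the counter really advances
    return ticks / calibrated_tsc_hz     # guest divides by its stale calibration


def tsc_jump_seconds(src_tsc_value, dst_tsc_value, calibrated_tsc_hz):
    """Apparent time step when the raw counter value changes across hosts.
    A negative result means the guest sees time go backwards."""
    return (dst_tsc_value - src_tsc_value) / calibrated_tsc_hz


# Hypothetical case: calibrated on a 2.4 GHz part, migrated to a 2.0 GHz part.
believed = guest_elapsed_seconds(3600, 2_000_000_000, 2_400_000_000)
drift_per_hour = 3600 - believed

# Hypothetical raw counter values read just before and just after the move.
jump = tsc_jump_seconds(5_000_000_000, 2_600_000_000, 2_400_000_000)
```

Under these assumptions the guest clock loses ten minutes per hour, and the lower counter value on the destination looks like a one-second step backwards. Real steppings of the same processor would differ far less, but the mechanism is the same.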
Re: (Score:2)
Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?
Re:Stability issues are justified (Score:5, Informative)
The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.
So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.
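The two strategies can be sketched roughly as follows. This is a toy model, not KVM's or VMware's code: feature bits are represented as sets of flag names rather than real CPUID register values, the host names and their feature lists are hypothetical, and all function names are invented for the example.

```python
def common_baseline(host_features):
    """Features present on every host in the migration pool."""
    sets = [set(flags) for flags in host_features.values()]
    return set.intersection(*sets) if sets else set()


def passthrough_cpuid(host_flags, mask=None):
    """Strategy 1: report the host's real features, optionally limited by a mask."""
    return set(host_flags) if mask is None else set(host_flags) & set(mask)


def baseline_cpuid(baseline, host_flags):
    """Strategy 2: always report the remembered baseline, whatever the host is."""
    missing = set(baseline) - set(host_flags)
    if missing:
        raise ValueError(f"host lacks baseline features: {sorted(missing)}")
    return set(baseline)


# Hypothetical pool: one Intel host and one AMD host.
hosts = {
    "intel-host": {"mmx", "sse", "sse2", "sse3", "ssse3", "vmx"},
    "amd-host": {"mmx", "sse", "sse2", "sse3", "3dnow", "sse4a", "svm"},
}
baseline = common_baseline(hosts)

# With strategy 2 the guest sees the same answer on either host.
seen_on_intel = baseline_cpuid(baseline, hosts["intel-host"])
seen_on_amd = baseline_cpuid(baseline, hosts["amd-host"])

# With unmasked strategy 1 the answer changes when the VM moves.
raw_differs = passthrough_cpuid(hosts["intel-host"]) != passthrough_cpuid(hosts["amd-host"])
```

In this toy pool the baseline comes out as MMX plus SSE through SSE3, with the vendor-specific bits and the VT/SVM bits dropped, which matches the "common subset" described above.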
The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.
I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.
(As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)
Re:Stability issues are justified (Score:4, Interesting)
The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.
The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.
Re: (Score:3, Interesting)
BUT there is no good alternative that's:
1) Cheap
2) Fast
3) Available on most platforms
I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.
Re: (Score:3, Informative)
The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code
Re: (Score:2)
VMWare have more stability worries than this on their plate. I've just upgraded Fusion on the Mac to version 2 and it's still very unstable. First use the guest OS locked up, forcing me to reboot the host so I could try again, only to find that, like with Fusion 1.1, the Mac hangs on shutdown. *sigh*
Wasn't this always possible? (Score:2)
The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed). Nor how it can be impossible - while the x86 has many extensions, it's still a well-specified architecture with specific behaviors.
The real trick is if an application is using features not present on the other architecture - e.g., an AMD virtual machine migra
Re: (Score:2)
The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
Erm... actually, if you watch the video, you will see that the "live" migration is actually live - the VM is not suspended, it is kept running and active through the migration.
Re: (Score:3, Informative)
Re: (Score:2)
I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
You just completely missed the point. The VM was not suspended, moved, and resumed. It was moved live. The VM never stopped doing its thing. It was up, running, and servicing requests the whole time.
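For anyone wondering how a VM can keep running while its memory is being moved: the usual approach, and the one described in the Xen live migration paper linked elsewhere in this thread, is iterative pre-copy. Below is a rough simulation of the idea, not any hypervisor's actual code. The page counts, link speed, dirty rate and stop threshold are hypothetical, and the model assumes a constant dirty rate, which real workloads don't have.

```python
def precopy_rounds(total_pages, link_pages_per_sec, dirty_pages_per_sec,
                   stop_threshold=50, max_rounds=30):
    """Simulate iterative pre-copy.

    Each round sends the pages dirtied during the previous round while the
    guest keeps running. Once the remaining set is small enough, the guest
    is paused briefly and the last pages are sent (the only downtime).
    """
    to_send = total_pages
    rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break
        rounds.append(to_send)
        send_time = to_send / link_pages_per_sec
        # Pages the still-running guest dirties while this round is in flight.
        to_send = min(total_pages, int(dirty_pages_per_sec * send_time))
    downtime = to_send / link_pages_per_sec
    return rounds, to_send, downtime


# Hypothetical guest: 1 GiB of 4 KiB pages, ~100 MB/s link, modest dirty rate.
rounds, final_pages, downtime = precopy_rounds(262144, 25000, 2500)
```

With these numbers each round is about a tenth the size of the previous one, and the final pause comes out at roughly a millisecond. If the guest dirties memory faster than the link can carry it, the rounds never shrink, which is why the sketch caps the number of rounds.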
Doesn't surprise me (Score:2)
After all, all x86 are the same. MMX extensions get emulated on AMD, Linux distros run on both processors without recompiling, the kernel handles calls and most likely an Apache server is not going to call the special media extensions. It would be interesting to see this happen in an environment that has been optimized and is using certain incompatible extensions (like 3DNow!), e.g. a computing cluster.
If you abstract enough and emulate a processor you should even be able to move between architectures but th
Not quite a breakthrough (Score:3, Insightful)
FWIW, KVM live migration has been capable of this for a long time now.
KVM actually supported live migration of Windows guests long before Xen did. If you haven't given KVM a try, you should!
Creds anyway (Score:2, Insightful)
You've lost this round VMware, but the match isn't over yet!
Re: (Score:3, Insightful)
Re: (Score:2)