Red Hat & AMD Demo Live VM Migration Across CPU Vendors
An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management: a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so-called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sounds like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."
Umm... (Score:3, Interesting)
The fact to highlight is that a live VM was migrated without disrupting the VM's operations.
check the graphs... (Score:5, Interesting)
Stability issues are justified (Score:5, Interesting)
The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993
We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, and not all Linux and Microsoft code has been tested against that. You can happily try it, and a lot of our customers successfully do. Some get BSODs and kernel oopses. This is not our fault.
If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.
Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.
Re:Stability issues are justified (Score:5, Interesting)
Declaration: VMware support engineering here, but speaking strictly on my own behalf.
The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
By default, KVM goes to great lengths to mask out CPUID features that aren't supported across common platforms. You have to opt in to those features since they limit a machine's migrate-ability.
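To make the masking idea concrete, here is a minimal sketch: the guest only sees the features that every candidate host supports. The two host feature sets below are illustrative assumptions, not the actual list KVM masks, and `migratable_features` is a hypothetical helper name.

```python
def migratable_features(*hosts):
    """Return the CPU features that every host in the pool supports."""
    common = set(hosts[0])
    for features in hosts[1:]:
        common &= set(features)  # keep only what this host also has
    return common


# Hypothetical feature sets for two hosts from different vendors.
intel_host = {"fpu", "mmx", "sse", "sse2", "ssse3"}
amd_host = {"fpu", "mmx", "sse", "sse2", "3dnow"}

# Vendor-specific extras (ssse3, 3dnow) are hidden from the guest.
guest_visible = migratable_features(intel_host, amd_host)
print(sorted(guest_visible))
```

Opting in to a feature outside this common set is what shrinks the pool of hosts the VM can move to.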
However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).
x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).
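A back-of-the-envelope sketch of the drift: the guest keeps dividing tick counts by the frequency it calibrated on the old host, while the new host's counter ticks at a different rate. The frequencies are made-up numbers, and the model deliberately ignores any TSC offsetting or scaling a hypervisor might apply.

```python
def guest_elapsed_seconds(real_seconds, calibrated_hz, actual_hz):
    """Time the guest believes has passed, given a stale TSC calibration."""
    ticks = real_seconds * actual_hz   # ticks the new host really produced
    return ticks / calibrated_hz       # guest still divides by the old rate


def counter_went_backwards(last_source_tsc, first_destination_tsc):
    """True if the first value read after migration is below the last one seen."""
    return first_destination_tsc < last_source_tsc


# Hypothetical: calibrated on a 2.0 GHz part, migrated to a 2.2 GHz part.
believed = guest_elapsed_seconds(3600, 2_000_000_000, 2_200_000_000)
print(believed - 3600)  # seconds gained per real hour
```

With these assumed numbers the guest clock runs 10% fast, which is the "really bad drift" case; the backwards case is the one that can hang code waiting for time to advance.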
If you're doing a one-time migration, it probably won't matter, but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.
Re:This is still unreleased test demo's (Score:4, Interesting)
This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it.
While still driving down the highway at 60 mph.
Re:This is still unreleased test demo's (Score:5, Interesting)
Re:check the graphs... (Score:1, Interesting)
(1) It didn't seem clear to me how many VMs each box was running. Could very well be that the Shanghai box was already doing quite a bit before the migration.
(2) There's a reason Shanghai isn't available yet.
(3) There's a reason this live migration stuff isn't available yet. Could very well be that the migration (at the moment) causes additional overhead.
I'm not trying to justify AMD here per se. It's just that there's nowhere near enough information to draw any real conclusions whatsoever. The graphs may not show anything bad about AMD that AMD would have wanted to cover up.
Re:Still x86 only (Score:3, Interesting)
Re:Stability issues are justified (Score:4, Interesting)
The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.
The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.
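For anyone who wants to see which clock source their own Linux box picked, here is a small sketch that reads it from sysfs. It assumes a Linux system exposing the usual `clocksource0` directory; `read_clocksources` is a hypothetical helper and returns `(None, [])` anywhere the files are missing.

```python
from pathlib import Path

# Standard Linux sysfs location for the system clock source.
SYSFS_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clock sources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted at the expected place.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("available:", available)
```

On typical x86 hardware the available list includes entries such as tsc, hpet and acpi_pm, which are the options being compared here.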
Re:check the graphs... (Score:3, Interesting)
Re:This is still unreleased test demo's (Score:3, Interesting)
Migrating between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendor's CPU means all this saved state may not match up, and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to make sure that what was available to the VM on one CPU is identical to what's available on the new CPU without destroying performance (which is what will happen if you start emulating).
That said, VMWare are very, very VERY careful with the tech they introduce. To give you an example, round-robin network teaming is still "experimental". I'm fairly sure they have played with this internally already and not done it, either because it would make support harder or because, with the integrated virtualization features changing on each new CPU, they would need to release a new version for each new CPU release for this to keep working.
Make no mistake, this is big news for KVM and well done to them, but if they can make it work reliably so can anyone else, and that includes VMWare.
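The "identical on both CPUs" requirement can be pictured as a pre-migration check: refuse the move if the guest was shown anything the destination lacks. This is only a sketch of the idea with made-up feature names; `missing_on_destination` is a hypothetical helper, not how any particular hypervisor does it.

```python
def missing_on_destination(guest_features, destination_features):
    """Features the guest was shown that the destination CPU cannot provide."""
    return set(guest_features) - set(destination_features)


# Hypothetical guest that was started with ssse3 exposed.
guest = {"fpu", "sse", "sse2", "ssse3"}
destination = {"fpu", "sse", "sse2", "3dnow"}

missing = missing_on_destination(guest, destination)
if missing:
    print("refuse migration, destination lacks:", sorted(missing))
else:
    print("destination can host this guest")
```

Anything in the missing set would have to be emulated after the move, which is the performance cliff described above.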
Re:Stability issues are justified (Score:3, Interesting)
BUT there is no good alternative that's:
1) Cheap
2) Fast
3) Available on most platforms
I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.