Red Hat & AMD Demo Live VM Migration Across CPU Vendors

Posted by kdawson on Friday November 07, 2008 @11:14AM from the dude-where's-my-virtualization-business dept.

An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management: a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so-called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sounds like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."

Comment removed (Score:4, Funny)

by account_deleted ( 4530225 ) writes: on Friday November 07, 2008 @11:16AM (#25676515)

Comment removed based on user account deletion

- - Re: (Score:1)
    
    by harry666t ( 1062422 ) writes:
    
    OK, sure
    
    <aims a gun at greenhuey's head>
    
    You know, it certainly doesn't bother me that I don't have the source code for this gun.
  - Re: (Score:1, Informative)
    
    by Anonymous Coward writes:
    
    With Obama at the helm, you may not have guns to protect your liberty, so death is more likely. ;)
- Re:Bravo! (Score:4, Informative)
  
  by 2names ( 531755 ) writes: on Friday November 07, 2008 @11:45AM (#25676849)
  
  We have certainly come a long way when a Cornwallis supports freedom of the people. :)
  
- Um (Score:2, Insightful)
  
  by Colin Smith ( 2679 ) writes:
  
  The VM software vendor becomes "the major player".
  As The Who's so insightfully titled song said "Meet the new boss. Same as the old boss."
  - Re:Um (Score:5, Insightful)
    
    by Korin43 ( 881732 ) writes: on Friday November 07, 2008 @02:04PM (#25678859) Homepage
    
    Except they're doing it with KVM [wikipedia.org], which is open source..
    
    - Re: (Score:2)
      
      by Bearhouse ( 1034238 ) writes:
      
      Damn. Please someone re-write that Wiki entry to make it more friendly for our non-tech friends. It starts...
      "Kernel-based Virtual Machine (KVM) is a Linux kernel virtualization infrastructure. KVM currently supports native virtualization using Intel VT or AMD-V. Limited support for paravirtualization is also available for Linux guests and Windows in the form of a paravirtual network driver[1], a balloon driver to affect operation of the guest virtual memory manager[2], and CPU optimization for Linux gues
      - Re:Um (Score:4, Insightful)
        
        by abdulla ( 523920 ) writes: on Friday November 07, 2008 @07:07PM (#25683547)
        
        Why should it be dumbed down? I don't go reading Biology articles and expect to know everything. That's why there are links to other articles explaining each bit in more detail.
        
This is still unreleased test demo's (Score:5, Insightful)

by Beached ( 52204 ) writes: on Friday November 07, 2008 @11:21AM (#25676569) Homepage

The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.

- Re:This is still unreleased test demo's (Score:4, Insightful)
  
  by Hercynium ( 237328 ) writes: <Hercynium AT gmail DOT com> on Friday November 07, 2008 @11:31AM (#25676685) Homepage Journal
  
  Well, that kinda *is* the purpose of live VM migration... it's already being done, just not between systems with different processor types.
  
  - Re:This is still unreleased test demo's (Score:5, Insightful)
    
    by TheRaven64 ( 641858 ) writes: on Friday November 07, 2008 @12:57PM (#25677599) Journal
    
    They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).
    I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it into trunk), which migrated a VM from real hardware into QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.
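
The trap-and-emulate idea described in the comment above (let the CPU fault on an instruction it lacks, then do the work in software) can be sketched as a toy model. This is only an illustration in Python: `ToyCPU`, `packed_add`, `EMULATED_OPS` and the `"addps"` opcode handling are hypothetical names made up here, and no real hypervisor or CPU interface is involved.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it lacks."""


def packed_add(a, b):
    # Stand-in for an SSE-style packed add: element-wise sum of two vectors.
    return [x + y for x, y in zip(a, b)]


class ToyCPU:
    """Hypothetical CPU model: executes only the opcodes it was built with."""

    def __init__(self, native_ops):
        self.native_ops = dict(native_ops)

    def execute(self, op, a, b):
        if op not in self.native_ops:
            raise IllegalInstruction(op)
        return self.native_ops[op](a, b)


# Software fallbacks a (hypothetical) hypervisor keeps for missing opcodes.
EMULATED_OPS = {"addps": packed_add}


def run_guest_instruction(cpu, op, a, b):
    """Try the instruction natively; on a trap, emulate it in software."""
    try:
        return cpu.execute(op, a, b), "native"
    except IllegalInstruction:
        return EMULATED_OPS[op](a, b), "emulated"


with_sse = ToyCPU({"addps": packed_add})
without_sse = ToyCPU({})
print(run_guest_instruction(with_sse, "addps", [1.0, 2.0], [3.0, 4.0]))
print(run_guest_instruction(without_sse, "addps", [1.0, 2.0], [3.0, 4.0]))
```

The guest gets the same answer on both hosts; only the path differs. In a real system the emulated path is far slower than the native one, which is the cost other commenters in this thread point to.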
    
    - Re: (Score:3, Interesting)
      
      by sirsnork ( 530512 ) writes:
      
      Between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendor's CPU means all this saved state may not match up, and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to deal with making sure
      - Re: (Score:1)
        
        by LarsG ( 31008 ) writes:
        
        This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V etc). Each new series has a few more features to make virtualization simpler, and you have to deal with making sure what was available to the VM on one CPU is identical to what's available on the new
        I must admit that I'm not quite up to date on the details, but aren't the VT/AMD-V changes only visible to the hypervisor (ring -1)? It might make moving VM state harder (the hypervisor has to handle migrating AMD-V state to the equivalent VT state), but it should be invisible to the guest OS running inside the VM.
        I would suspect that Guest OS visible changes (ring 0-3) would be harder to handle (like migrating from a CPU that has SSE2 to one that doesn't). The hypervisor would either have to trap and emulat
    - Re: (Score:1)
      
      by ampman ( 91479 ) writes:
      
      As posted before, it seems to have been done a long time ago; see: http://www.byte.com/art/9407/sec6/art1.htm
    - Re: (Score:2)
      
      by Chris Snook ( 872473 ) writes:
      
      The really cool thing is the hardware support for masking CPUID calls from guests, so you don't have to emulate them in the hypervisor, which the VMware people in this thread have pointed out adds measurable overhead on some workloads. This lets you present a generic x86_64 CPU to the guest, which will run most non-HPC enterprise apps just fine. SSE2 is a mandatory part of the x86_64 instruction set, so all x86_64 processors will be able to get decently optimized math that can be live-migrated between dif
- Re:This is still unreleased test demo's (Score:5, Interesting)
  
  by Comatose51 ( 687974 ) writes: on Friday November 07, 2008 @12:11PM (#25677133) Homepage
  
  You mean like VMware's VMotion, HA, and DRS functionalities?
  
  - Re: (Score:3, Insightful)
    
    by JEB_eWEEK ( 549975 ) * writes:
    
    Yes, except without requiring identical hardware.
    - Re:This is still unreleased test demo's (Score:4, Informative)
      
      by nabsltd ( 1313397 ) writes: on Friday November 07, 2008 @02:11PM (#25678987)
      
      VMware doesn't require "identical" hardware to do live migration, either.
      It does have to be similar enough, which at this point pretty much means just the same processor manufacturer. As long as the processor supports the hardware virtualization, then VMware will allow you to set up a cluster that will allow live migration with no issues.
      
      - Re: (Score:2)
        
        by cheater512 ( 783349 ) writes:
        
        Not AMD to Intel and vice versa; that's the real breakthrough.
      - Re: (Score:2)
        
        by JamesTRexx ( 675890 ) writes:
        
        just the same processor manufacturer
        
        And even that's not true. We had to buy an older server type because even if we bought a newer server with -in this case- AMD cpus it wouldn't mix with the others.
        The CPU has to be functionally the same, otherwise you'll have to resort to cpu masking [google.nl].
        
        Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        Google for "Enhanced VMotion Compatibility" and you'll see that you don't have to do any manual work to allow every processor that supports hardware virtualization to participate in your cluster.
        The only caveat is that you can't have any VMs running when you enable this feature. This is not a big deal if you enable it when you first create the cluster, and only a problem if you bring a new host with running VMs into the cluster. But a small amount of planning deals with that, too.
  - Re: (Score:2)
    
    by drachenstern ( 160456 ) writes:
    
    Er yeah, but by a proprietary vendoooorrrr, eh... I see what you did there ;)
    I think the goal is to eventually open-source the concepts, and sell the wrappers. And the support, always sell the support...
    I have to say tho, that I thought the whole point of CPU ISA was to be able to do just this sort of thing. If you're not writing code that absolutely depends on the underlying CPU hardware (why would you, isn't that the point of the kernel) then you should be able to move to any other platform... Okay oka
    - Re: (Score:2)
      
      by AJWM ( 19027 ) writes:
      
      Since Itanium2 will run x86 code sort of natively, going Xeon->Itanium shouldn't be that hard. Migrating a VM that's running IA-64 code to a Xeon could be a little tougher.
  - Re: (Score:1)
    
    by virtualboy ( 1402343 ) writes:
    
    Save yourself some money and check out Virtual Iron. It does not require identical hardware.
- - Re:This is still unreleased test demo's (Score:4, Interesting)
    
    by voidptr ( 609 ) writes: on Friday November 07, 2008 @12:00PM (#25677047) Homepage Journal
    
    This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it.
    While still driving down the highway at 60 mph.
    
Umm... (Score:3, Interesting)

by frodo from middle ea ( 602941 ) writes: on Friday November 07, 2008 @11:28AM (#25676647) Homepage

All the interesting controversy aside, cross vendor migration is [obviously] a good thing for customers because it avoids platform lock-in

Well, almost all VM products barring VirtualPC do indeed support running the same VM image across various vendor platforms; in fact, that is the whole point of a VM, isn't it?
The fact to highlight is that the migration was done of a live VM without disrupting the VM's operations.

- Re: (Score:2)
  
  by MBGMorden ( 803437 ) writes:
  
  It's not a matter of it RUNNING on multiple platforms. The issue here is live migration. Moving a running VM from one machine to another without skipping a beat. On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly.
  - Re: (Score:3, Informative)
    
    by TheRaven64 ( 641858 ) writes:
    
    On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly
    
    Do you? I first saw Xen demo live migration in 2005, and I don't think it was new then. Their demo had a Quake server being thrown around a cluster without clients noticing. Downtime was well under 100ms. You can read the paper [cam.ac.uk] for more information.
    They were claiming that you can move between processor types, but they didn't specify how much different they could be. If it's just a matter of SSE or 3DNow! support disappearing then that's not a hard problem - just trap-and-emulate any of the old instruc
    - Re: (Score:2)
      
      by MBGMorden ( 803437 ) writes:
      
      VMotion has been around for quite a while. The specialty here is between different processor types, and it's apparently not as trivial as you state. For one, there are different extensions and such between various processor types. Sure everything can be compiled for i386 and run on anything, but we're talking about arbitrary code that can be running on these VM's. There's a whole lot that can be different beyond their commonality, and if you resort to trapping and emulating all those instructions then y
      - Re: (Score:3, Informative)
        
        by nabsltd ( 1313397 ) writes:
        
        And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times.
        This is because you have no way of knowing which processor type the VM was first started on. When this happened, it's likely the OS did some hardware checking and figured out which instructions it could (and could not) use. Moving the VM isn't going to change what the OS believes is the processor, and that's the problem.
        Overall, VMware's Enhanced VMot
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        This is because you have no way of knowing which processor type the VM was first started on.
        
        Why ? Is there any particular reason you can't send this information along with the VM itself ?
        
        Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        Not usefully, and even so, you're still likely to take needless performance hits.
        If the VM started running on a Xeon Harpertown and moved to an AMD Santa Rosa, then to a Core 2 Quad Yorkfield, which features should be enabled/disabled in each move?
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        If the VM started running on a Xeon Harpertown and moved to an AMD Santa Rosa, then to a Core 2 Quad Yorkfield, which features should be enabled/disabled in each move?
        
        After each move, compare the current host's feature set with the original host's feature set, and emulate all features that are missing.
        Your original claim was: "This is because you have no way of knowing which processor type the VM was first started on." That is clearly untrue, since you can pass the CPUID information of the original processor right alon
        
        Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        Passing the CPUID information will give you the identity of the original processor, but not the actual capabilities. You could also pass the whole set of extended CPUID information to know which flavors of SSE are (or aren't) supported, etc., but that still isn't the true capabilities and details of the CPU.
        The reason you can't emulate efficiently is because the matrix of "VM CPU" to "physical CPU" would be large...very large. And, it would include things that would require the hypervisor to trap an instr
      - Re: (Score:1)
        
        by online-shopper ( 159186 ) writes:
        
        Isn't this like what transmeta did, except in software?
    - Re: (Score:1)
      
      by LarsG ( 31008 ) writes:
      
      Relaunching programs that use these will cause the new values of CPUID to be picked up.
      I suspect one could end up with devil-in-the-details problems if the guest OS suddenly saw different CPUID values. While it might work fine, the expectation has always been that CPUID won't change after boot-up, so you could end up with all sorts of snafus.
      What you could do is have the hypervisor trap CPUID and report a least common denominator set of capabilities for the CPUs in the cluster. Or have CPUID report more capabilities than the weakest/oldest CPUs but have the hypervisor trap and emulate those instruct
Xen 3.3 supports this already (Score:3, Informative)

by stabe ( 1133453 ) writes: on Friday November 07, 2008 @11:29AM (#25676659)

Xen has supported this feature since Xen 3.3; it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...

- Re: (Score:2)
  
  by Vendetta ( 85883 ) writes:
  
  Xen has supported this feature since Xen 3.3; it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...
  Looks to me like Xen supports migration between different CPU models, not entirely different CPU manufacturers. So yes, there is a breakthrough here.
  - Re: (Score:2, Informative)
    
    by stabe ( 1133453 ) writes:
    
    Yes, it does: http://lists.xensource.com/archives/html/xen-devel/2008-06/msg00430.html [xensource.com]
- Xen does migration, but not Live... (Score:5, Informative)
  
  by LinuxGeek ( 6139 ) * writes: <`moc.liamg' `ta' `cn.dnajd'> on Friday November 07, 2008 @12:04PM (#25677085)
  
  This is a demo of a Live migration, no shutdown or reboot involved. Xen does not support the live migration of a running VM between an AMD and Intel server. Watch the video, they are running a video in the VM that keeps playing during the migration. Very impressive stuff.
  
Still x86 only (Score:4, Insightful)

by boner ( 27505 ) writes: on Friday November 07, 2008 @11:33AM (#25676715)

Real magic would have been demonstrating a move between ANY processor architecture - Power, SPARC, x86_64 etc..
Between x86 processors is nice, but not unexpected.

- Re: (Score:2)
  
  by Hercynium ( 237328 ) writes:
  
  No problem! Just run x86 linux under qemu on all physical platforms, then run your applications under x86 linux inside a kvm inside qemu with migration between the qemu instances on each physical system!
  - Re: (Score:2)
    
    by Atti K. ( 1169503 ) writes:
    
    Putting aside the huge performance penalty, I wonder if qemu can emulate the cpu virtualization support needed for kvm...
    Ok, yes, I know, whooosh ;)
  - Re: (Score:2)
    
    by cduffy ( 652 ) writes:
    
    Actually, qemu recently merged in live migration support -- which kvm then adopted in place of its homegrown solution -- so not nearly as much nesting as you suggest is actually needed.
    - Re: (Score:2)
      
      by Hercynium ( 237328 ) writes:
      
      but then the joke isn't funny! :)
- Re: (Score:2)
  
  by corsec67 ( 627446 ) writes:
  
  That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?
  Seems like this would work between processors with a very similar ISA [wikipedia.org].
  If they could run stuff compiled for one processor on another processor with a different ISA at near full speed,... that would change more than just virtualization. Run Wine on a PowerPC, emulate old consoles easily on a Pandora [openpandora.org], etc..
  - Re: (Score:2)
    
    by NormalVisual ( 565491 ) writes:
    
    That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?
    
    Most definitely. At that point, you're emulating, not virtualizing.
  - Re: (Score:3, Interesting)
    
    by TheRaven64 ( 641858 ) writes:
    
    Depends. Modern emulators can run at around 50% of the host platform speed. If your guest is paravirtualised then all of the privileged instructions will be run in the hypervisor. If you're running a JIT in the guest then you can poke it to flush its code caches and start emitting native code for the new architecture, but even if you aren't then migrating the VM from the 200MHz ARM chip in your cell phone to the quad-core 4GHz x86 chip connected to your TV might be interesting.
- Re: (Score:2)
  
  by TheLink ( 130905 ) writes:
  
  That's doable with emulation but you will take a performance hit. I don't think there's a good way to do it without a lot of emulation.
  
  I don't see a practical reason for cross platform "live" moves.
  
  Switching within a platform class is likely to be far far more useful.
  
  With cross architecture switching, it's going to be a lot harder to use the strengths of the CPUs.
  
  Say you're on x86 and using SSE, then you switch to SPARC, what are you going to do then?
  
  Or you're on UltraSPARC T2 and using the eight encryption
This was in all likelihood faked. (Score:5, Funny)

by Anonymous Coward writes: on Friday November 07, 2008 @11:39AM (#25676773)

Open source is for morons.
Only Apple has the engineering know-how and skills to pull off something like this. The fact that they have not done so to date is a clear indication that it is impossible.

check the graphs... (Score:5, Interesting)

by alta ( 1263 ) writes: on Friday November 07, 2008 @11:41AM (#25676801) Homepage Journal

Go to 4:05 in the video. On the far left, you can see from the blue Intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistently higher than the Intel line was. I'm no Intel fanboy... or AMD. I have both Intel and AMD servers in my racks. I just thought it was interesting, and I'm surprised they let the video go out like that.

- Re: (Score:2)
  
  by Loibisch ( 964797 ) writes:
  
  Hehe, I checked the same thing. :)
  To be fair, the performance of playing a HD video is pretty much determined by your graphics card. It's not really the best CPU benchmark you could imagine. :)
  - Re: (Score:3, Interesting)
    
    by wanderingknight ( 1103573 ) writes:
    
    GPUs have nothing to do with video decoding, it's handled 100% by the CPU. At least until we get software that can reliably take advantage of the relatively recent introduction of h264 decoding on some high-end GPUs.
    - Re: (Score:2)
      
      by Loibisch ( 964797 ) writes:
      
      Sure they don't.
      Hardware H264 encoding is available and working.
      Also go mess around with different video displaying options (overlay, x11...or for windows the various VMR revisions) and watch the CPU load go up and down.
      It's not _all_ the CPU, so it's bullshit as a CPU benchmark, especially on guaranteed-to-be-different systems.
      - Re: (Score:2)
        
        by wanderingknight ( 1103573 ) writes:
        
        So... if you do the benchmarks on PCs that don't have a GPU that can decode H264, is it still a bullshit benchmark?
- Re: (Score:2)
  
  by nschubach ( 922175 ) writes:
  
  I can't watch the video right now, so I'm assuming the graph is processor utilization?
  Could it possibly be because the AMD processor is running some kind of instruction translation, communication layer, or something like that?
- Re: (Score:1, Interesting)
  
  by Anonymous Coward writes:
  
  (1) It didn't seem clear to me how many VMs each box was running. Could very well be that the Shanghai box was already doing quite a bit before the migration.
  (2) There's a reason Shanghai isn't available yet.
  (3) There's a reason this live migration stuff isn't available yet. Could very well be that the migration (at the moment) causes additional overhead.
  I'm not trying to justify AMD here per se. It's just that there's nowhere near enough information to draw any real conclusions whatsoever. This may not s
- Re: (Score:3, Informative)
  
  by michrech ( 468134 ) writes:
  
  It didn't seem that interesting to me. If you watch the video, the Intel and Barcelona machines showed no VMs running (0% load). When the Shanghai server took over the load, *of course* its load line will rise -- it's the only server running a VM at that point!
  There are no shenanigans going on here, and I don't think this says anything about the chips as you imply, either.
  - Re: (Score:2)
    
    by alta ( 1263 ) writes:
    
    What I'm saying is that the load is higher on the Shanghai machine with a VM than it was on the Intel with a VM.
- Re: (Score:1)
  
  by Luke_22 ( 1296823 ) writes:
  
  Go to 4:05 in the video. On the far left, you can see from the blue Intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistently higher than the Intel line was.
  Look more closely: when the switch happens, one load replaces the other, and they're equal.
  Then the AMD load keeps increasing a little bit, even after the switch is complete.
  I guess it could just be the OS doing something else. It was Windows, after all ;)
- Re: (Score:2)
  
  by Ecuador ( 740021 ) writes:
  
  Well, duh, they can run their Core 2 @ 4.5GHz on stock air cooling, silly!
  Shanghai can still be faster clock for clock as they promised ;)
  Seriously now, a CPU % utilization of a VM running WMP is no indication of anything.
- Re: (Score:1, Informative)
  
  by Anonymous Coward writes:
  
  True, it is higher but the guy mentions each server is running several VMs (each of which could be doing stuff), not just the one. Also the scale of time isn't visible from the start of migration until finish. Not sure it shows anything really but well spotted.
- Re: (Score:2)
  
  by xouumalperxe ( 815707 ) writes:
  
  This was done between different vendors, not altogether different architectures. That would demand emulation beneath the virtualization, on at least one machine -- not likely to happen any time soon.
- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  so easy that you did it yourself three years ago, right?
Stability issues are justified (Score:5, Interesting)

by mnmn ( 145599 ) writes: on Friday November 07, 2008 @11:48AM (#25676901) Homepage

Declaration: VMware support engineering here, but speaking strictly on my own behalf.

The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993

We've also known that mismatched CPU stepping makes the VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, and not all Linux and Microsoft code has been tested against that. You can happily try it, and a lot of our customers successfully do. Some get BSODs and kernel oopses. This is not our fault.

If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.

Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.
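
The CPU masking mentioned in this comment amounts to advertising to the guest only those features that every host in the cluster supports. A minimal Python sketch of that set logic follows; the host names and feature lists are hypothetical placeholders chosen for illustration, not real CPUID data, and real products apply masks to CPUID bit fields rather than to string sets.

```python
from functools import reduce

# Hypothetical per-host feature sets; real hosts report these as CPUID bits.
HOST_FEATURES = {
    "intel-host": {"fpu", "mmx", "sse", "sse2", "sse3", "ssse3"},
    "amd-host": {"fpu", "mmx", "sse", "sse2", "sse3", "3dnow"},
}


def common_features(hosts):
    """Return the features present on every host in the cluster."""
    return reduce(set.intersection, hosts.values())


def can_migrate(guest_features, target_features):
    """A guest can move only to a host offering everything it was told it has."""
    return guest_features <= target_features


baseline = common_features(HOST_FEATURES)
print(sorted(baseline))
print(can_migrate(baseline, HOST_FEATURES["amd-host"]))
print(can_migrate(HOST_FEATURES["intel-host"], HOST_FEATURES["amd-host"]))
```

A guest booted with the masked baseline can land on either host, while a guest that saw the full feature set of one host cannot safely move to the other. The trade-off is that the baseline hides features the guest could otherwise have used.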

- Re:Stability issues are justified (Score:5, Interesting)
  
  by Anthony Liguori ( 820979 ) writes: on Friday November 07, 2008 @11:59AM (#25677029) Homepage
  
  Declaration: VMware support engineering here, but speaking strictly on my own behalf.
  The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).
  KVM goes to great lengths to mask out, by default, CPUID features that aren't supported across common platforms. You have to opt in to those features since they limit a machine's migrate-ability.
  However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).
  x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).
  If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.
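
The TSC concern above can be made concrete with a little arithmetic. The sketch below assumes a guest that calibrated its clock against the TSC once at boot and a hypervisor that applies no TSC offsetting or scaling after migration; the frequencies and counter values are invented for illustration.

```python
def guest_elapsed_seconds(tsc_start, tsc_end, calibrated_hz):
    """Time the guest believes has passed, using its boot-time calibration."""
    return (tsc_end - tsc_start) / calibrated_hz


# Assumed numbers: guest calibrated on a 2.0 GHz TSC, then moved to 2.2 GHz.
calibrated_hz = 2_000_000_000
actual_hz = 2_200_000_000

real_seconds = 3600
ticks = actual_hz * real_seconds  # ticks the new host produces in one hour
perceived = guest_elapsed_seconds(0, ticks, calibrated_hz)
drift = perceived - real_seconds
print(perceived, drift)  # guest clock runs ahead of wall-clock time

# Time going backwards: the new host's counter may simply be lower than
# the last value the guest read on the old host.
last_read_on_old_host = 5_000_000_000_000
first_read_on_new_host = 1_000_000_000_000
delta = guest_elapsed_seconds(last_read_on_old_host, first_read_on_new_host, calibrated_hz)
print(delta)  # negative elapsed time
```

With these assumed numbers the guest gains six minutes per hour, and the second case yields a negative interval, which is the kind of value that can hang code waiting for a timestamp to advance.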
  
  - Re: (Score:2)
    
    by NonSequor ( 230139 ) writes:
    
    Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?
    - Re: (Score:2)
      
      by Anthony Liguori ( 820979 ) writes:
      
      Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?
      The halting problem?
      - Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?
        
        The halting problem?
        The parent is not talking about a general algorithm which could determine whether any other algorithm halts, he's talking about translating the state of an algorithm from one specific approximation of a Turing machine to another. These problems have nothing to do with one another.
        Besides, the halting problem only applies to an actual
        
        Re: (Score:2)
        
        by dkf ( 304284 ) writes:
        
        Solving the halting problem for a finite-storage Turing machine approximation is trivial: simply trace the algorithm and compare each state it reaches with each previous state. If the states are identical, then the algorithm will never terminate, for it has entered an infinite loop.
        This is the sort of "trivial" that is only used by pure mathematicians. In practice, it's much harder than that, especially when modeling any program with non-determinism in it (almost all of them these days).
        The problem is two-fold. Firstly, working out when two states are the same is a much harder challenge than it first appears: e.g. does it matter that the system clock has advanced in the meantime? Secondly, the state space grows massively fast and storing all the states of even a small program tends to
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        This is the sort of "trivial" that is only used by pure mathematicians. In practice, it's much harder than that, especially when modeling any program with non-determinism in it (almost all of them these days).
        
        A Turing machine is completely deterministic - that is, for the given input and algorithm, it will always return the same result, or never return. A non-deterministic program is not a Turing machine, and as such the halting problem doesn't apply.
        Of course it might still be impossible to determine whet
  - Re:Stability issues are justified (Score:5, Informative)
    
    by kscguru ( 551278 ) writes: on Friday November 07, 2008 @01:09PM (#25677777)
    
    Yet Another VMware engineer here.
    The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.
    So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.
    The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.
    I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.
    (As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)
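Both strategies above depend on showing the guest a feature set that every host in the migration pool supports. Here is a minimal sketch of computing that common subset. The bit positions and the two host feature lists are made up for illustration and do not match real CPUID layouts or real processors.

```python
from functools import reduce
from operator import and_

# Hypothetical bit assignments, for illustration only. Real CPUID feature
# bits live in specific registers of specific leaves and differ from these.
FEATURE_BITS = {
    "mmx": 1 << 0,
    "sse": 1 << 1,
    "sse2": 1 << 2,
    "sse3": 1 << 3,
    "ssse3": 1 << 4,
    "3dnow": 1 << 5,
}

def encode(names):
    """Pack a list of feature names into one integer bitmask."""
    return reduce(lambda bits, name: bits | FEATURE_BITS[name], names, 0)

def decode(bits):
    """Unpack a bitmask into a sorted list of feature names."""
    return sorted(name for name, bit in FEATURE_BITS.items() if bits & bit)

def common_baseline(host_masks):
    """AND the masks together: a feature survives only if every host has it."""
    return reduce(and_, host_masks)

def guest_visible(host_mask, baseline):
    """What a guest should see on a given host: host features under the mask."""
    return host_mask & baseline

# Made-up host feature lists, not real processor models.
host_a = encode(["mmx", "sse", "sse2", "sse3", "ssse3"])
host_b = encode(["mmx", "sse", "sse2", "sse3", "3dnow"])
baseline = common_baseline([host_a, host_b])
print(decode(baseline))
```

Because the baseline is a subset of every host's features, `guest_visible` returns the same value on either host, which is the property a migrating guest needs. The sketch says nothing about the vendor string or vendor-specific leaves, which is where the race described above comes from.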
    
    - Re:Stability issues are justified (Score:4, Interesting)
      
      by Chirs ( 87576 ) writes: on Friday November 07, 2008 @01:39PM (#25678391)
      
      The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.
      The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.
      
    - Re: (Score:3, Interesting)
      
      by TheLink ( 130905 ) writes:
      
      Yes you're not supposed to use TSC.
      
      BUT there is no good alternative that's:
      1) Cheap
      2) Fast
      3) Available on most platforms
      
      I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.
    - Re: (Score:1)
      
      by virtualboy ( 1402343 ) writes:
      
      Virtual Iron engineer here. You also have to worry about programs that check the CPU type and then use functions specific to that CPU. When you move to another CPU that doesn't have those functions, the OS may stay up and running but the application may crash. We have done this in house, but we don't recommend that customers LiveMigrate between Intel and AMD.
    - Re: (Score:3, Informative)
      
      by Anthony Liguori ( 820979 ) writes:
      
      The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code
    - Re: (Score:2)
      
      by Chris Snook ( 872473 ) writes:
      
      Rebooting isn't always an option. If you've got 10 guests running on a host, and you have the luxury of rebooting 9 of them, you still need to migrate one of them. Sure, you can keep separate pools of hosts with different processor revisions and migrate between them most of the time, but what happens when it's time to retire your rack full of netburst-era Xeon boxes, running several hundred guests? You're correct that CPUID trapping introduces overhead on older CPUs, but this demo was run on new CPUs, in
    - - Re: (Score:2)
        
        by kscguru ( 551278 ) writes:
        
        VMware however does not intercept all cpuids. It can't because binary translation only applies to privileged code (the kernel). VMware doesn't translate user programs and therefore cannot intercept all cpuids. This leads to the inconsistencies in applications you describe. Both Intel and AMD introduced a capability to mask some of the cpuid values to support VMware's enhanced migration but this is a far cry from completely spoofing cpuid's like kvm does.
        And here I thought VMware employees were experts on how VMware software works!
        You've actually run afoul of an extremely common misconception. VMware has been using VT (the same thing KVM uses) since 2005; the VMware hypervisors can run in either a binary translation mode, a VT/SVM mode, or a paravirtualized mode for Linux kernels 2.6.23 and above (or Ubuntu, who accepted the patches earlier), and do in fact switch modes depending on which guest OS, vMotion options, and other settings are configured. Con
- Re: (Score:2)
  
  by Malc ( 1751 ) writes:
  
  VMWare have more stability worries than this on their plate. I've just upgraded Fusion on the Mac to version 2 and it's still very unstable. First use the guest OS locked up, forcing me to reboot the host so I could try again, only to find that, like with Fusion 1.1, the Mac hangs on shutdown. *sigh*
- I see a much harder problem... (Score:1)
  
  by Osvaldo Doederlein ( 34220 ) writes:
  
  This migration won't work for systems that employ advanced JIT code generation, such as Java. Modern production JVMs, like Sun's and IBM's, will create native code on the fly - and they will produce code that's ultra tuned for the specific processor that is running. This means using the best instructions available (like SSEx), and also fine-tune various behaviors, e.g. GC can be tuned for the L1/L2 cache sizes, and locking can be tuned to factors like number of CPUs/cores/hardware threads - so for example,
  - Re: (Score:2)
    
    by BitZtream ( 692029 ) writes:
    
    The apps already are migration aware, as are the OSes; that's why you reboot them.
Wasn't this always possible? (Score:2)

by tlhIngan ( 30335 ) writes:

The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed). Nor how it can be impossible - while the x86 has many extensions, it's still a well-specified architecture with specific behaviors.
The real trick is if an application is using features not present on the other architecture - e.g., an AMD virtual machine migra
- Re: (Score:2)
  
  by thePowerOfGrayskull ( 905905 ) writes:
  
  The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
  Erm... actually, if you watch the video, you will see that the "live" migration is actually live - the VM is not suspended, it is kept running and active through the migration.
  - Re: (Score:3, Informative)
    
    by TheRaven64 ( 641858 ) writes:
    
    Actually, it is suspended, but only for a fraction of a second. First you copy the entire contents of memory to the new machine and mark it as read-only. Each page fault caused by this is used to mark pages that are still dirty. Then you copy these. You keep repeating this process until the set of dirty pages is very small. Then you suspend the VM, copy the dirty pages, and start the VM on the new machine. Userspace programs will just notice that they went an unusually long time without their scheduli
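The pre-copy loop described above can be sketched as a small simulation. It assumes pages are entries in a dict and that the guest's writes during each copy round are known in advance. The function name, the `threshold` and `max_rounds` parameters, and the sample data are all hypothetical; real hypervisors track dirty pages through page-protection faults or hardware dirty logging rather than a precomputed list.

```python
def live_migrate(source, writes_per_round, threshold=2, max_rounds=10):
    """Simulate pre-copy migration.

    source: dict mapping page number -> contents (the running guest's memory).
    writes_per_round: list of dicts, the guest's writes during each copy round.
    Returns (destination memory, copy rounds done, pages copied while paused).
    """
    dest = {}
    dirty = set(source)              # first round: every page must be copied
    rounds = 0
    writes = iter(writes_per_round)
    while len(dirty) > threshold and rounds < max_rounds:
        for page in dirty:           # copy while the guest keeps running
            dest[page] = source[page]
        guest_writes = next(writes, {})
        source.update(guest_writes)  # the guest modified these pages meanwhile
        dirty = set(guest_writes)    # so they must be sent again
        rounds += 1
    # Stop-and-copy: the guest is paused here, so nothing can get dirty again.
    for page in dirty:
        dest[page] = source[page]
    return dest, rounds, len(dirty)

# Hypothetical guest with 8 pages that writes less in each successive round.
memory = {page: f"v0-{page}" for page in range(8)}
guest_activity = [{1: "a", 2: "b", 3: "c"}, {2: "d"}]
copied, rounds, paused_pages = live_migrate(memory, guest_activity)
print(rounds, paused_pages, copied == memory)
```

The pause length is set by how many pages are left for the final step, which is why a guest that dirties memory faster than the link can copy it never converges; `max_rounds` stands in for that bail-out here.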
- Re: (Score:2)
  
  by Ephemeriis ( 315124 ) writes:
  
  I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).
  You just completely missed the point. The VM was not suspended, moved, and resumed. It was moved live. The VM never stopped doing its thing. It was up, running, and servicing requests the whole time.
  ...which isn't terribly amazing. I know VMWare can do that now. The big deal is apparently that it moved from one CPU vendor to another. I didn't realize this was so tricky... I kind of figured that x86 was x86 regardless of vendor. Obviously, I was wrong.
  - Re: (Score:2)
    
    by BitZtream ( 692029 ) writes:
    
    The virtual machine was paused, just not for very long. At some point you have to transfer the contents of the VM's RAM between the servers running it and swap which hardware owns the virtual disk. When that moment occurs, the virtual machine is paused for a brief period of time while the final bits of memory and ownership of the disks are transferred to the new host.
    This pause is mitigated by transferring as much of the running VM's RAM to the new host as possible, then when the move actually occurs, copying those l
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re: (Score:2)
  
  by jamesh ( 87723 ) writes:
  
  MMX extensions get emulated on AMD
  Yes, but the kernel has to detect this, and once it does, it assumes that it doesn't have to keep re-detecting it (after all, why would it change?). And what if you migrated right in the middle of that detection?
  With PV kernels that understand virtualisation it's probably not a big deal because the kernel can just say "don't migrate while I'm doing this", but for full virtualisation, where the kernel doesn't know it's virtualised, it's a bit harder.
Not quite a break through (Score:3, Insightful)

by Anthony Liguori ( 820979 ) writes: on Friday November 07, 2008 @12:01PM (#25677055) Homepage

FWIW, KVM live migration has been capable of this for a long time now.
KVM actually supported live migration of Windows guests long before Xen did. If you haven't given KVM a try, you should!

Creds anyway (Score:2, Insightful)

by noundi ( 1044080 ) writes:

It's worth noting that VMware have been a huge contribution to the Linux-society, giving corps a very good reason (€$£) to migrate, thus including important pawns in the future of Linux. I for one believe that VMware was wrong, but that it's an honest mistake. There's no use in poking on VMware for this one, hopefully they'll help lift the technology even higher along with their competitors.

You've lost this round VMware, but the match isn't over yet!
AMD (Score:1)

by wzinc ( 612701 ) writes:

I don't know if this will help AMD sell more procs. I like AMD, but Intel's stuff is by far faster these days. Still, Intel's procs are nightmarishly expensive compared to AMD, and the difference in price/performance seems disproportionate to me.
OpenVZ has been able to do it for like 2 years now (Score:1)

by dowdle ( 199162 ) writes:

Let me clarify before people jump down my throat... OpenVZ (www.openvz.org) is OS Virtualization (aka containers) and NOT machine / hardware virtualization... so it can only run Linux on Linux... but it has been able to do live migrations from one processor family to another since they initially added checkpointing. OpenVZ is fairly CPU agnostic and it has been ported to a number of CPU families. In fact the project leader recently ported it to ARM (Gumstix Overo). See: http://community.livejournal.com/o [livejournal.com]
Once again... (Score:1)

by emptycorp ( 908368 ) writes:

AMD is the first to technological breakthrough and all Intel can do is copy the technology and overclock it to do better on benchmarks.

AMD - First to create lower clock speeds with same or better performance to Intel's higher speeds.

AMD - First (and only) to TRUE dual and quad core technology (Intel does not use logical cores).

AMD - First to 64-bit.

Of course other smaller chip makers have done these sorts of things first, but they don't compare to the Intel/AMD dominance and consumer marketplace
What about apps, not servers? (Score:2)

by JamesTRexx ( 675890 ) writes:

With all the talk about virtualization in the last couple of years, I'm a bit surprised that I haven't seen major talks about live migration capabilities at application level.
I'm not talking about cluster capable apps, but being able to run an app on one server, and then migrate it.
Even the capability of FreeBSD jails, Solaris containers, OpenVZ, etc. to migrate live would come closer to live apps migration.

There's always a good reason to virtualize at OS level, but ultimately it only comes down to bein
Whats New Here? (Score:1)

by keean ( 824435 ) writes:

My company have been successfully migrating VMs from 32bit Intel to 64bit AMD to 64bit Intel for years. We use Linux VServers and OpenVZ. This shared kernel approach to virtualisation is much lower overhead than VMWare, Xen or KVM. We can even run different distros inside the VMs, the only limitation is that all the VMs see the same Linux kernel version. So whilst we haven't done this with a hypervisor style VM, for what we want (migrating server images between physical hosts, backing up server images) L
RAID-VM (Score:2)

by Janek Kozicki ( 722688 ) writes:

Redundant Array of I...Intercommunicating D....Devices of Virtual Machines.
It is obvious that the next step is to set up several servers with Virtual Machines on them. Run the same VM in parallel on one or more of them. And if one of the servers goes down - the end user will not notice this, because his virtual machine is mirrored on all those other servers. Just like hotswap in RAID HDDs we will have this capability with Virtual Machines. It's just a matter of time.
And if someone is stupid enough to tr
