On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
On the qemu-kvm platform, when I reset a VM through "virsh reset" while the VM coincidentally happens to be rebooting internally, the VM can no longer be reset successfully because of reset reentrancy. I found that: (1) SeaBIOS tries to shut down the VM with apm_shutdown() after resetting it fails, but apm_shutdown() does not work on the qemu-kvm platform; (2) if I add a 1s sleep in qemu_prep_reset() and then reset the VM twice in quick succession, the case described above always happens.
So, the problem occurs when issuing a second reset before the first reset completes?
Yes. Specifically, the second reset is issued after "HaveAttemptedReboot = 1" and before the memcpy in qemu_prep_reset() completes.
This patch fixes the issue by having the VM always execute the reboot routine when reentrancy happens, instead of attempting apm_shutdown() on the qemu-kvm platform.
The reason for the HaveAttemptedReboot check is to work around old versions of KVM that unexpectedly map the same memory to both 0xf0000 and 0xffff0000. So, it does not make sense to wrap the check in a !runningOnKVM() block as that disables the only reason for the check.
I'm surprised you would see the above on a recent qemu/kvm, though, as on a newer KVM I think the second reset would have to happen after HaveAttemptedReboot is set and before the memcpy in qemu_prep_reset() completes. Can you verify your KVM version?
-Kevin
I've tested on KVM-3.6 and KVM-4.1.3, and I can see this problem on both versions. I reproduce it like this: run an HA and a watchdog mechanism in a VM, then deliberately let the VM lose its heartbeat and stop feeding the watchdog. After 2 minutes (a self-defined timeout), the HA mechanism issues an internal reboot command to the VM and the watchdog mechanism issues a "virsh reset" from the host. The problem described above then occurs with high probability.
-Leixu