[SeaBIOS] [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

Mon Nov 9 14:32:53 CET 2015

On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
> 
> >On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
> >> On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
> >> > On the qemu-kvm platform, when I reset a VM through "virsh reset" while the VM
> >> > happens to be rebooting internally at the same time, the VM can no longer be
> >> > reset successfully because of the reset reentrancy. I found:
> >> > (1) SeaBIOS tries to shut down the VM via apm_shutdown() after the reset fails.
> >> > However, apm_shutdown() does not work on the qemu-kvm platform;
> >> > (2) If I add a 1s sleep in qemu_prep_reset() and then reset the VM twice in
> >> > quick succession, the aforementioned case always happens.
> >
> >So, the problem occurs when issuing a second reset before the first
> >reset completes?
> 
> Yes. In detail, the 2nd reset is issued after "HaveAttemptedReboot = 1"
> and before the memcpy in qemu_prep_reset() completes.
> 
> >> > This patch fixes this issue by letting the VM always execute the reboot
> >> > routine when reentrancy happens, instead of attempting apm_shutdown() on the
> >> > qemu-kvm platform.
> >
> >The reason for the HaveAttemptedReboot check is to work around old
> >versions of KVM that unexpectedly map the same memory to both 0xf0000
> >and 0xffff0000.  So, it does not make sense to wrap the check in a
> >!runningOnKVM() block as that disables the only reason for the check.
> >
> >I'm surprised you would see the above on a recent qemu/kvm though - as
> >on a newer KVM I think the second reset would have to happen after
> >HaveAttemptedReboot is set and prior to the memcpy in
> >qemu_prep_reset() completing.  Can you verify your KVM version?
> >
> >-Kevin
> 
> I've tested on KVM-3.6 and KVM-4.1.3. I can see this problem on both
> versions.
> My setup is this: I put an HA mechanism and a watchdog mechanism in a VM, then
> deliberately let the VM lose its heartbeat and stop feeding the watchdog. After
> a self-defined timeout of 2 minutes, the HA mechanism issues an internal reboot
> command to the VM and the watchdog mechanism issues a "virsh reset" from the
> host. The aforementioned problem then occurs with high probability.

Ah, okay.  I'm not sure what the best solution to this problem is.  We
don't want to exclude KVM because the check is meant to prevent an
infinite loop on older versions of KVM (which looks like a mysterious
hang to users).  We also don't want to be in a situation where we
reboot and the memcpy hasn't fully completed, as that's likely to lead
to mysterious crashes on the next boot.

-Kevin
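
For readers following the thread, the race discussed above can be sketched as
a toy model in C. This is not the SeaBIOS source: the struct, the enum, and
try_reboot() are hypothetical names, and the model assumes only what the
thread states, namely that a flag like HaveAttemptedReboot is set before the
memcpy in qemu_prep_reset(), and that a reset arriving before the copy
completes finds the flag still set and falls into the shutdown path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible results of one pass through the modelled reboot path. */
enum outcome { REBOOTED, SHUTDOWN_ATTEMPTED, INTERRUPTED };

/* Toy firmware state; field names are illustrative, not SeaBIOS symbols. */
struct fw_state {
    bool have_attempted_reboot; /* stands in for HaveAttemptedReboot */
    bool image_restored;        /* stands in for the memcpy having completed */
};

/*
 * Model one entry into the reboot path.
 * reset_during_copy simulates an external reset (e.g. "virsh reset")
 * arriving after the flag is set but before the copy finishes.
 */
static enum outcome try_reboot(struct fw_state *s, bool reset_during_copy)
{
    assert(s != NULL);

    /* Flag already set: the earlier attempt is assumed to have failed,
     * so the model takes the shutdown path instead of rebooting. */
    if (s->have_attempted_reboot)
        return SHUTDOWN_ATTEMPTED;

    s->have_attempted_reboot = true;

    /* External reset lands here: flag is set, copy is unfinished. */
    if (reset_during_copy)
        return INTERRUPTED;

    /* Copy completed; assume a clean boot reinitialises the flag. */
    s->image_restored = true;
    s->have_attempted_reboot = false;
    return REBOOTED;
}
```

Calling try_reboot() with reset_during_copy set, and then calling it again on
the same state, takes the shutdown path on the second call. Two uninterrupted
calls both reboot normally. The model does not capture the old-KVM aliasing
of 0xf0000 and 0xffff0000 that the check was written to detect.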