From: Kevin O'Connor [mailto:email@example.com]
Sent: Sunday, December 20, 2015 10:33 PM
To: Gonglei (Arei)
Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seabios(a)seabios.org;
Huangweidong (C); kvm(a)vger.kernel.org; Radim Krcmar
Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
problem on qemu-kvm platform
On Sun, Dec 20, 2015 at 09:49:54AM +0000, Gonglei (Arei) wrote:
> From: Kevin O'Connor
> Sent: Saturday, December 19, 2015 11:12 PM
> On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
> > Maybe the root cause is not NMI but INTR, so yield() can enable hardware
> > interrupts and then execute an interrupt handler, but the interrupt handler
> > corrupts the stack, so that the BSP can't execute the instruction and causes
> > a VM_EXIT to Kmod, which is an infinite loop. But I don't have any proof
> > other than the surface phenomenon.
I can't see any reason why allowing interrupts at this location would
be a problem.
Does it have any relationship with *extra stack* of SeaBIOS?
None that I can see. Also, the kvm trace seems to show the code
trying to execute at rip=0x03 - that will crash long before the extra
stack is used.
When the OS's GRUB is booting, the softirq and the C function send_disk_op()
may use the extra stack of SeaBIOS. If we inject an NMI, irqentry_extrastack in
romlayout.S is invoked, and the extra stack is used again. The stack of the first
call is then corrupted, so SeaBIOS gets stuck.
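
To make the suspected failure mode concrete, here is a minimal user-space
sketch in C. It is not SeaBIOS code: extra_stack, enter_on_extra_stack() and
simulate_reentry() are hypothetical names, and the model assumes that every
entry starts pushing at the same top-of-stack slot, which is the behaviour
being hypothesised above for a second entry while the first is still active.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a single shared "extra stack". */
#define EXTRA_STACK_WORDS 8

static uint32_t extra_stack[EXTRA_STACK_WORDS];

/* Assumption: every entry starts pushing at the same top-of-stack slot,
 * regardless of whether the stack is already in use. */
static void enter_on_extra_stack(uint32_t return_addr, uint32_t arg)
{
    extra_stack[EXTRA_STACK_WORDS - 1] = return_addr;
    extra_stack[EXTRA_STACK_WORDS - 2] = arg;
}

/* Returns the return address the first user would find on the stack
 * after a nested entry has reused the same stack. */
uint32_t simulate_reentry(uint32_t first_ret, uint32_t nested_ret)
{
    memset(extra_stack, 0, sizeof(extra_stack));
    enter_on_extra_stack(first_ret, 0x13);  /* first user, e.g. a disk op */
    enter_on_extra_stack(nested_ret, 0x02); /* nested entry, e.g. an NMI */
    return extra_stack[EXTRA_STACK_WORDS - 1];
}
```

In this model the first user's saved return address is overwritten by the
nested entry, so the first user would resume at whatever address the nested
entry left behind.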
You can easily reproduce the problem.
1. Start a guest.
2. Reset the guest.
3. Inject an NMI when the guest shows the GRUB screen.
4. The guest then gets stuck.
If we disable the extra stack by setting
then the problem is gone.
Besides, I have another thought:
Is it possible that, while one CPU is using the extra stack, other CPUs (APs)
are still woken up by a hardware interrupt after yield() or br->flags = F_IF
and use the extra stack again?
Kevin, can we drop yield() in smp_setup()?
It's possible to eliminate this instance of yield, but I think it
would just push the crash to the next time interrupts are enabled.
Perhaps. I'm not sure.
> > Is it really useful and allowable for SeaBIOS? Maybe for other
> > I'm not sure. Because we found that when SeaBIOS is booting, if we inject
> > an NMI by QMP, the guest will get *stuck*. And the kvm tracing log is the
> > same as the current problem.
If you apply the patches you had to prevent that NMI crash problem,
does it also prevent the above crash?
Yes, but we cannot prevent the NMI injection (though I'll submit some
patches to forbid users' NMI injection after NMI_EN is
disabled by RTC bit7 of port 0x70).
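
As a rough sketch of the check mentioned above, assuming the device model
remembers the last byte the guest wrote to port 0x70: bit 7 of that byte set
means NMI is masked. The struct and function names below are hypothetical and
are not taken from QEMU or KVM.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_INDEX_PORT  0x70 /* CMOS/RTC index port */
#define NMI_DISABLE_BIT 0x80 /* bit 7 of port 0x70: 1 = NMI masked */

/* Hypothetical device-model state: last byte written to port 0x70. */
struct rtc_index_state {
    uint8_t last_index_write;
};

/* Record a guest write to the RTC index port. */
void rtc_index_write(struct rtc_index_state *s, uint8_t val)
{
    s->last_index_write = val;
}

/* True when the guest has masked NMI via bit 7 of port 0x70. */
bool nmi_is_masked(const struct rtc_index_state *s)
{
    return (s->last_index_write & NMI_DISABLE_BIT) != 0;
}

/* Hypothetical policy: deliver a user-requested NMI only when unmasked. */
bool should_deliver_user_nmi(const struct rtc_index_state *s)
{
    return !nmi_is_masked(s);
}
```

With such a gate, an NMI requested through the monitor while the firmware has
NMI masked would be refused (or could be deferred) instead of being delivered.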