On Sun, Dec 20, 2015 at 09:49:54AM +0000, Gonglei (Arei) wrote:
From: Kevin O'Connor [mailto:kevin@koconnor.net]
Sent: Saturday, December 19, 2015 11:12 PM
On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
Maybe the root cause is not the NMI but INTR: yield() enables hardware interrupts, an interrupt handler then runs and corrupts the SeaBIOS stack, so the BSP cannot execute the next instruction and takes an exception, causing a VM_EXIT to Kmod in an infinite loop. But I don't have any proof beyond the observed symptoms.
I can't see any reason why allowing interrupts at this location would be a problem.
Does it have any relationship with *extra stack* of SeaBIOS?
None that I can see. Also, the kvm trace seems to show the code trying to execute at rip=0x03 - that will crash long before the extra stack is used.
Kevin, can we drop yield() in smp_setup() ?
It's possible to eliminate this instance of yield, but I think it would just push the crash to the next time interrupts are enabled.
Perhaps. I'm not sure.
Is it really useful and allowable for SeaBIOS? Maybe for other components? I'm not sure. I ask because we found that if we inject an NMI via QMP while SeaBIOS is booting, the guest gets *stuck*, and the kvm trace log is the same as for the current problem.
If you apply the patches you had to prevent that NMI crash problem, does it also prevent the above crash?
Yes, but we cannot prevent the NMI injection (though I'll submit some patches to forbid user NMI injection after NMI_EN has been disabled via bit 7 of RTC port 0x70).
-Kevin
Dear Kevin,
-----Original Message-----
From: Kevin O'Connor [mailto:kevin@koconnor.net]
Sent: Sunday, December 20, 2015 10:33 PM
To: Gonglei (Arei)
Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seabios@seabios.org; Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
On Sun, Dec 20, 2015 at 09:49:54AM +0000, Gonglei (Arei) wrote:
From: Kevin O'Connor [mailto:kevin@koconnor.net]
Sent: Saturday, December 19, 2015 11:12 PM
On Sat, Dec 19, 2015 at 12:03:15PM +0000, Gonglei (Arei) wrote:
Maybe the root cause is not the NMI but INTR: yield() enables hardware interrupts, an interrupt handler then runs and corrupts the SeaBIOS stack, so the BSP cannot execute the next instruction and takes an exception, causing a VM_EXIT to Kmod in an infinite loop. But I don't have any proof beyond the observed symptoms.
I can't see any reason why allowing interrupts at this location would be a problem.
Does it have any relationship with *extra stack* of SeaBIOS?
None that I can see. Also, the kvm trace seems to show the code trying to execute at rip=0x03 - that will crash long before the extra stack is used.
While the OS's GRUB is booting, the softirq and the C function send_disk_op() may use the extra stack of SeaBIOS. If we inject an NMI at that point, irqentry_extrastack in romlayout.S is invoked and the extra stack is used again. The stack of the first caller is then corrupted, so SeaBIOS gets stuck.
You can easily reproduce the problem:

1. Start a guest.
2. Reset the guest.
3. Inject an NMI when the guest shows the GRUB screen.
4. The guest gets stuck.
If we disable the extra stack by setting

CONFIG_ENTRY_EXTRASTACK=n

then the problem is gone.
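
To make the suspected reentrancy concrete, here is a minimal user-space sketch in C. It is not SeaBIOS code: extra_stack, enter_extra_stack(), push() and simulate() are hypothetical names, and the model only assumes that every entry switches unconditionally to the same fixed stack top. With the extra stack disabled, the nested entry is modelled as simply continuing on the current stack pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXTRA_STACK_WORDS 8

/* Hypothetical model: one statically allocated stack shared by all entries. */
static uint32_t extra_stack[EXTRA_STACK_WORDS];

/* Every entry starts from the same fixed top (unconditional stack switch). */
static uint32_t *enter_extra_stack(void)
{
    return &extra_stack[EXTRA_STACK_WORDS];
}

/* Push one word, growing downwards like an x86 stack. */
static void push(uint32_t **sp, uint32_t value)
{
    assert(*sp > extra_stack); /* guard against overflowing the model stack */
    *--(*sp) = value;
}

/*
 * Returns the "return address" the first handler finds on its stack.
 * nested_nmi:   a second entry arrives while the first is still running.
 * switch_stack: the nested entry switches to the top of the extra stack
 *               again (1), or keeps using the current stack pointer (0).
 */
uint32_t simulate(int nested_nmi, int switch_stack)
{
    memset(extra_stack, 0, sizeof(extra_stack));

    uint32_t *sp = enter_extra_stack();
    push(&sp, 0x1234); /* first handler saves its return address */

    if (nested_nmi) {
        uint32_t *nmi_sp = switch_stack ? enter_extra_stack() : sp;
        push(&nmi_sp, 0xdead); /* nested entry saves its own state */
    }

    return *sp; /* what the first handler would return to */
}
```

In this model simulate(1, 1) returns the nested entry's value instead of 0x1234, because the second entry overwrote the first caller's slot, while simulate(1, 0) leaves the first caller's return address intact. That matches the observation that the hang disappears with CONFIG_ENTRY_EXTRASTACK=n, although it does not prove that this is the actual mechanism.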
Besides, I have another thought:
Is it possible that while one CPU is using the extra stack, other CPUs (APs) are still woken up by a hardware interrupt after yield() or br->flags = F_IF and use the extra stack again?
Regards,
-Gonglei
Kevin, can we drop yield() in smp_setup() ?
It's possible to eliminate this instance of yield, but I think it would just push the crash to the next time interrupts are enabled.
Perhaps. I'm not sure.
Is it really useful and allowable for SeaBIOS? Maybe for other components? I'm not sure. I ask because we found that if we inject an NMI via QMP while SeaBIOS is booting, the guest gets *stuck*, and the kvm trace log is the same as for the current problem.
If you apply the patches you had to prevent that NMI crash problem, does it also prevent the above crash?
Yes, but we cannot prevent the NMI injection (though I'll submit some patches to forbid user NMI injection after NMI_EN has been disabled via bit 7 of RTC port 0x70).
-Kevin
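
For reference, the check behind the proposed "forbid user NMI injection while NMI is disabled" patches could look roughly like the sketch below. It is an assumption-laden illustration, not QEMU or SeaBIOS code: struct rtc_state, rtc_index_write(), nmi_masked() and should_inject_user_nmi() are hypothetical names. The only hardware fact relied on is that bit 7 of the byte written to port 0x70 masks NMI when set.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORT_CMOS_INDEX 0x70 /* RTC/CMOS index port */
#define NMI_DISABLE_BIT 0x80 /* bit 7 of the index byte: 1 = NMI masked */

/* Hypothetical emulator-side state: last byte the guest wrote to port 0x70. */
struct rtc_state {
    uint8_t index_reg;
};

/* Record a guest write to PORT_CMOS_INDEX. */
void rtc_index_write(struct rtc_state *s, uint8_t val)
{
    s->index_reg = val;
}

/* True if the guest has masked NMI through bit 7 of port 0x70. */
bool nmi_masked(const struct rtc_state *s)
{
    return (s->index_reg & NMI_DISABLE_BIT) != 0;
}

/* A user-requested NMI is delivered only while NMI is not masked. */
bool should_inject_user_nmi(const struct rtc_state *s)
{
    return !nmi_masked(s);
}
```

Whether a masked NMI should be dropped or held pending until the guest clears the bit is a separate design question that this sketch does not address.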