* Kevin O'Connor (kevin@koconnor.net) wrote:
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
Hi Kevin,
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
Thanks for confirming my diagnosis so far.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory.
If you need a qemu fix then lets just figure out the right fix.
It does this in the call to qemu_prep_reset() in tryReboot(). That call copies the pristine copy of the bios at 0xffff0000 to 0xf0000. The code then goes on to signal a hard reboot so that all the hardware is reset and so that the code starts just like it would on first boot.
Unfortunately, the hard reboot logic makes calls to udelay() - specifically in i8042_reboot() and pci_reboot(). These calls to udelay() are not valid here as the timer isn't even necessarily configured at this point. Alas - it also has the nasty possibility of corrupting the pristine copy of the bios that was just made in qemu_prep_reset(). Random crashes can then occur on the next boot.
It's easy to catch this with the following code change:
--- a/src/hw/timer.c +++ b/src/hw/timer.c @@ -146,6 +146,8 @@ timer_adjust_bits(u32 value, u32 validbits) static u32 timer_read(void) {
- if (!GET_GLOBAL(HaveRunPost))
u16 port = GET_GLOBAL(TimerPort); if (CONFIG_TSC_TIMER && !port) // Read from CPU TSCpanic("timer read too early\n");
With that, I get a panic on every reboot. Commenting out the udelay() calls then gets reboots working again.
The simplest fix is to remove the udelay() calls from the reboot logic. However, it's not clear to me if it's valid to omit the delays when running on real hardware.
Yes, it would be scary to modify that because then you'd be stuck with not quite knowing how many hardware implementations were fussy about it - although as you say it's dependent on that read before the timer is configured, so it's not safe anyway.
Wasn't one of the old tricks for a delay to repeatedly read an ISA location on the assumption if the hardware was old then it would be slow and if the hardware wasn't as old it probably wouldn't be as fussy about the delay?
Another possibility would be to try and harden timer_read() so that it can be run very early (prior to code relocation), but that's tricky to implement correctly..
I guess I could hack around it for my use by just removing the delay; Would preserving TimerLast in i8042_reboot/pci_reboot around the udelays work?
Good find Dave - I know that wasn't easy to track down.
Thanks; as heisenbugs go it's a pretty nasty one; still learnt a bit about the innards of seabios :-)
Dave
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK