Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
from an objdump disassemble of the rom.o that was produced:
ef79d: f3 ab rep stos %eax,%es:(%edi) ef79f: 8d 43 08 lea 0x8(%ebx),%eax ef7a2: b1 10 mov $0x10,%cl ef7a4: 8d 54 24 74 lea 0x74(%esp),%edx ef7a8: e8 50 46 ff ff call e3dfd <memcmp> ef7ad: 85 c0 test %eax,%eax ef7af: 0f 84 33 01 00 00 je ef8e8 <ExtraStack+0x110> ef7b5: 8b 15 0c 23 0f 00 mov 0xf230c,%edx ef7bb: 80 7a 06 02 cmpb $0x2,0x6(%edx) ef7bf: 0f b6 43 17 movzbl 0x17(%ebx),%eax ef7c3: 77 0c ja ef7d1 <StackPos+0x1> ef7c5: 0f 85 80 00 00 00 jne ef84b <ExtraStack+0x73> ef7cb: 80 7a 07 05 cmpb $0x5,0x7(%edx) ef7cf: 76 7a jbe ef84b <ExtraStack+0x73>
Note the 'ExtraStack+...' where as a few lines before it's maininit for other jumps, then looking at a sorted output of the rom.o.objdump:
000eddf2 l F .text 000000e0 virtio_scsi_add_lun.constprop.113 000eded2 l F .text 0000080d device_hardware_setup 000ee6df l F .text 00001a17 maininit <------------- 000ef790 g *ABS* 00000000 final_varlow_start 000ef790 g O *ABS* 00000004 BootSequence 000ef794 g O *ABS* 00000001 FloppyDOR 000ef798 g O *ABS* 00000008 LastUSBkey 000ef7a0 g O *ABS* 00000001 Ps2ctr 000ef7a4 g O *ABS* 00000004 RTCusers 000ef7a8 g O *ABS* 00000004 TimerLast 000ef7ac g O *ABS* 00000001 HaveAttemptedReboot 000ef7ad g O *ABS* 00000001 Century 000ef7b0 g O *ABS* 00000010 CDRom_locks 000ef7c0 g O *ABS* 00000010 DefaultDPTE 000ef7d0 g O *ABS* 00000004 StackPos 000ef7d8 g O *ABS* 00000801 ExtraStack <------------- 000effdc g O *ABS* 00000018 Call16Data 000f0000 g *ABS* 00000000 final_readonly_start 000f0000 g *ABS* 00000000 zonefseg_start 000f00f6 g F .text 0000038b dopost 000f0481 g F .text 000001c2 handle_pmm 000f0644 l O .text 00000014 CSWTCH.1353 000f0658 l O .text 00000014 __func__.14607 000f066c l O .text 00000011 __func__.14624 000f0680 l O .text 00000010 __func__.14549 000f0690 l O .text 0000000b __func__.14497
What's supposed to stop that happening?
I'd chosen a debug level of 4 since that was the largest it would go without the build complaining it wouldn't fit, so I thought I was safe since something did complain if it got way too big.
(This is based off 1.9.1)
Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
from an objdump disassemble of the rom.o that was produced:
ef79d: f3 ab rep stos %eax,%es:(%edi) ef79f: 8d 43 08 lea 0x8(%ebx),%eax ef7a2: b1 10 mov $0x10,%cl ef7a4: 8d 54 24 74 lea 0x74(%esp),%edx ef7a8: e8 50 46 ff ff call e3dfd <memcmp> ef7ad: 85 c0 test %eax,%eax ef7af: 0f 84 33 01 00 00 je ef8e8 <ExtraStack+0x110> ef7b5: 8b 15 0c 23 0f 00 mov 0xf230c,%edx ef7bb: 80 7a 06 02 cmpb $0x2,0x6(%edx) ef7bf: 0f b6 43 17 movzbl 0x17(%ebx),%eax ef7c3: 77 0c ja ef7d1 <StackPos+0x1> ef7c5: 0f 85 80 00 00 00 jne ef84b <ExtraStack+0x73> ef7cb: 80 7a 07 05 cmpb $0x5,0x7(%edx) ef7cf: 76 7a jbe ef84b <ExtraStack+0x73>
Note the 'ExtraStack+...' where as a few lines before it's maininit for other jumps, then looking at a sorted output of the rom.o.objdump:
000eddf2 l F .text 000000e0 virtio_scsi_add_lun.constprop.113 000eded2 l F .text 0000080d device_hardware_setup 000ee6df l F .text 00001a17 maininit <------------- 000ef790 g *ABS* 00000000 final_varlow_start 000ef790 g O *ABS* 00000004 BootSequence 000ef794 g O *ABS* 00000001 FloppyDOR 000ef798 g O *ABS* 00000008 LastUSBkey 000ef7a0 g O *ABS* 00000001 Ps2ctr 000ef7a4 g O *ABS* 00000004 RTCusers 000ef7a8 g O *ABS* 00000004 TimerLast 000ef7ac g O *ABS* 00000001 HaveAttemptedReboot 000ef7ad g O *ABS* 00000001 Century 000ef7b0 g O *ABS* 00000010 CDRom_locks 000ef7c0 g O *ABS* 00000010 DefaultDPTE 000ef7d0 g O *ABS* 00000004 StackPos 000ef7d8 g O *ABS* 00000801 ExtraStack <------------- 000effdc g O *ABS* 00000018 Call16Data 000f0000 g *ABS* 00000000 final_readonly_start 000f0000 g *ABS* 00000000 zonefseg_start 000f00f6 g F .text 0000038b dopost 000f0481 g F .text 000001c2 handle_pmm 000f0644 l O .text 00000014 CSWTCH.1353 000f0658 l O .text 00000014 __func__.14607 000f066c l O .text 00000011 __func__.14624 000f0680 l O .text 00000010 __func__.14549 000f0690 l O .text 0000000b __func__.14497
What I think you're seeing here is an artifact of seabios' code self-relocation. The objdump stores the final location of "varlow" variables, and not the location of their pre-relocation initial values. After the code is self-relocated (in post.c:reloc_preinit() ) it's malloc.c:malloc_init() (see memmove call) that copies over that area of memory.
What's supposed to stop that happening?
The code in scripts/layoutrom.py is supposed to layout the rom without conflicts. It's not clear to me if that's malfunctioning or if the underlying issue is something else - what is the "very random behaviour" you are seeing?
I'd chosen a debug level of 4 since that was the largest it would go without the build complaining it wouldn't fit, so I thought I was safe since something did complain if it got way too big.
It should have been safe - something must not be right.
-Kevin
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
from an objdump disassemble of the rom.o that was produced:
ef79d: f3 ab rep stos %eax,%es:(%edi) ef79f: 8d 43 08 lea 0x8(%ebx),%eax ef7a2: b1 10 mov $0x10,%cl ef7a4: 8d 54 24 74 lea 0x74(%esp),%edx ef7a8: e8 50 46 ff ff call e3dfd <memcmp> ef7ad: 85 c0 test %eax,%eax ef7af: 0f 84 33 01 00 00 je ef8e8 <ExtraStack+0x110> ef7b5: 8b 15 0c 23 0f 00 mov 0xf230c,%edx ef7bb: 80 7a 06 02 cmpb $0x2,0x6(%edx) ef7bf: 0f b6 43 17 movzbl 0x17(%ebx),%eax ef7c3: 77 0c ja ef7d1 <StackPos+0x1> ef7c5: 0f 85 80 00 00 00 jne ef84b <ExtraStack+0x73> ef7cb: 80 7a 07 05 cmpb $0x5,0x7(%edx) ef7cf: 76 7a jbe ef84b <ExtraStack+0x73>
Note the 'ExtraStack+...' where as a few lines before it's maininit for other jumps, then looking at a sorted output of the rom.o.objdump:
000eddf2 l F .text 000000e0 virtio_scsi_add_lun.constprop.113 000eded2 l F .text 0000080d device_hardware_setup 000ee6df l F .text 00001a17 maininit <------------- 000ef790 g *ABS* 00000000 final_varlow_start 000ef790 g O *ABS* 00000004 BootSequence 000ef794 g O *ABS* 00000001 FloppyDOR 000ef798 g O *ABS* 00000008 LastUSBkey 000ef7a0 g O *ABS* 00000001 Ps2ctr 000ef7a4 g O *ABS* 00000004 RTCusers 000ef7a8 g O *ABS* 00000004 TimerLast 000ef7ac g O *ABS* 00000001 HaveAttemptedReboot 000ef7ad g O *ABS* 00000001 Century 000ef7b0 g O *ABS* 00000010 CDRom_locks 000ef7c0 g O *ABS* 00000010 DefaultDPTE 000ef7d0 g O *ABS* 00000004 StackPos 000ef7d8 g O *ABS* 00000801 ExtraStack <------------- 000effdc g O *ABS* 00000018 Call16Data 000f0000 g *ABS* 00000000 final_readonly_start 000f0000 g *ABS* 00000000 zonefseg_start 000f00f6 g F .text 0000038b dopost 000f0481 g F .text 000001c2 handle_pmm 000f0644 l O .text 00000014 CSWTCH.1353 000f0658 l O .text 00000014 __func__.14607 000f066c l O .text 00000011 __func__.14624 000f0680 l O .text 00000010 __func__.14549 000f0690 l O .text 0000000b __func__.14497
What I think you're seeing here is an artifact of seabios' code self-relocation. The objdump stores the final location of "varlow" variables, and not the location of their pre-relocation initial values. After the code is self-relocated (in post.c:reloc_preinit() ) it's malloc.c:malloc_init() (see memmove call) that copies over that area of memory.
OK, I'll try and trace that.
What's supposed to stop that happening?
The code in scripts/layoutrom.py is supposed to layout the rom without conflicts. It's not clear to me if that's malfunctioning or if the underlying issue is something else - what is the "very random behaviour" you are seeing?
Hangs, typically after/in display_uuid or kvm entry exceptions where the EIP is totally bogus; they only happen sometimes on reboot, and adding some debug can make them totally disappear. So the thought of the code beign scribbled over by a stack sounded like a reasonable explanation.
I'd chosen a debug level of 4 since that was the largest it would go without the build complaining it wouldn't fit, so I thought I was safe since something did complain if it got way too big.
It should have been safe - something must not be right.
Hmm OK.
Dve
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
from an objdump disassemble of the rom.o that was produced:
ef79d: f3 ab rep stos %eax,%es:(%edi) ef79f: 8d 43 08 lea 0x8(%ebx),%eax ef7a2: b1 10 mov $0x10,%cl ef7a4: 8d 54 24 74 lea 0x74(%esp),%edx ef7a8: e8 50 46 ff ff call e3dfd <memcmp> ef7ad: 85 c0 test %eax,%eax ef7af: 0f 84 33 01 00 00 je ef8e8 <ExtraStack+0x110> ef7b5: 8b 15 0c 23 0f 00 mov 0xf230c,%edx ef7bb: 80 7a 06 02 cmpb $0x2,0x6(%edx) ef7bf: 0f b6 43 17 movzbl 0x17(%ebx),%eax ef7c3: 77 0c ja ef7d1 <StackPos+0x1> ef7c5: 0f 85 80 00 00 00 jne ef84b <ExtraStack+0x73> ef7cb: 80 7a 07 05 cmpb $0x5,0x7(%edx) ef7cf: 76 7a jbe ef84b <ExtraStack+0x73>
Note the 'ExtraStack+...' where as a few lines before it's maininit for other jumps, then looking at a sorted output of the rom.o.objdump:
000eddf2 l F .text 000000e0 virtio_scsi_add_lun.constprop.113 000eded2 l F .text 0000080d device_hardware_setup 000ee6df l F .text 00001a17 maininit <------------- 000ef790 g *ABS* 00000000 final_varlow_start 000ef790 g O *ABS* 00000004 BootSequence 000ef794 g O *ABS* 00000001 FloppyDOR 000ef798 g O *ABS* 00000008 LastUSBkey 000ef7a0 g O *ABS* 00000001 Ps2ctr 000ef7a4 g O *ABS* 00000004 RTCusers 000ef7a8 g O *ABS* 00000004 TimerLast 000ef7ac g O *ABS* 00000001 HaveAttemptedReboot 000ef7ad g O *ABS* 00000001 Century 000ef7b0 g O *ABS* 00000010 CDRom_locks 000ef7c0 g O *ABS* 00000010 DefaultDPTE 000ef7d0 g O *ABS* 00000004 StackPos 000ef7d8 g O *ABS* 00000801 ExtraStack <------------- 000effdc g O *ABS* 00000018 Call16Data 000f0000 g *ABS* 00000000 final_readonly_start 000f0000 g *ABS* 00000000 zonefseg_start 000f00f6 g F .text 0000038b dopost 000f0481 g F .text 000001c2 handle_pmm 000f0644 l O .text 00000014 CSWTCH.1353 000f0658 l O .text 00000014 __func__.14607 000f066c l O .text 00000011 __func__.14624 000f0680 l O .text 00000010 __func__.14549 000f0690 l O .text 0000000b __func__.14497
What I think you're seeing here is an artifact of seabios' code self-relocation. The objdump stores the final location of "varlow" variables, and not the location of their pre-relocation initial values. After the code is self-relocated (in post.c:reloc_preinit() ) it's malloc.c:malloc_init() (see memmove call) that copies over that area of memory.
OK, I'll try and trace that.
What's supposed to stop that happening?
The code in scripts/layoutrom.py is supposed to layout the rom without conflicts. It's not clear to me if that's malfunctioning or if the underlying issue is something else - what is the "very random behaviour" you are seeing?
Hangs, typically after/in display_uuid or kvm entry exceptions where the EIP is totally bogus; they only happen sometimes on reboot, and adding some debug can make them totally disappear. So the thought of the code beign scribbled over by a stack sounded like a reasonable explanation.
I'd chosen a debug level of 4 since that was the largest it would go without the build complaining it wouldn't fit, so I thought I was safe since something did complain if it got way too big.
It should have been safe - something must not be right.
Hmm OK.
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
Thanks Laszlo
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
-Kevin
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
CC'ing DavidH too, for RHBZ#1382906.
Thank you! Laszlo
* Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote:
Hi, I turned the debug level up to 4 on our smaller (128k) ROM downstream build and seem to have hit a case where it's been layed out so that the 'ExtraStack' is at the same location as some code (display_uuid) which was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived for 50+ reboots and turned my debug on and it's going for 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it's taken the shutdown case after failing to reboot; so it's not that path.
My reboots in this case are always guest triggered, so they're not very early reboots.
One comment in there is: + // Some old versions of KVM don't store a pristine copy of the + // BIOS in high memory. Try to shutdown the machine instead.
do you have a definition of 'old'; in this case it's a new-ish qemu on our downstream (older) kernel but it's got fairly new kvm bits in, but the qemu is configured in our rhel6 compatibility mode - so hmm.
Dave
(With any luck this post will act as a jinx to wake up any sleeping bugs that make me think it's OK....)
Thank you! Laszlo
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
- Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote: > Hi, > I turned the debug level up to 4 on our smaller (128k) ROM downstream > build and seem to have hit a case where it's been layed out so that the > 'ExtraStack' is at the same location as some code (display_uuid) which > was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived for 50+ reboots and turned my debug on and it's going for 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it's taken the shutdown case after failing to reboot; so it's not that path.
My reboots in this case are always guest triggered, so they're not very early reboots.
One comment in there is:
// Some old versions of KVM don't store a pristine copy of the
// BIOS in high memory. Try to shutdown the machine instead.
do you have a definition of 'old'; in this case it's a new-ish qemu on our downstream (older) kernel but it's got fairly new kvm bits in, but the qemu is configured in our rhel6 compatibility mode - so hmm.
Dave
(With any luck this post will act as a jinx to wake up any sleeping bugs that make me think it's OK....)
Ah excellent, that worked; I seem to have reawaken a kvm entry failure in the debug mode probably around the display_uuid but I can't tell yet.
Dave
Thank you! Laszlo
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
SeaBIOS mailing list SeaBIOS@seabios.org https://www.coreboot.org/mailman/listinfo/seabios
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Jan 23, 2017 at 06:49:07PM +0000, Dr. David Alan Gilbert wrote:
- Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote: > Hi, > I turned the debug level up to 4 on our smaller (128k) ROM downstream > build and seem to have hit a case where it's been layed out so that the > 'ExtraStack' is at the same location as some code (display_uuid) which > was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
That first report mentions migration to a different QEMU version. When migrating, is the BIOS software migrated as well (the copy at 0xffff0000), or does the new instance get a potentially different instance of the BIOS?
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived for 50+ reboots and turned my debug on and it's going for 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it's taken the shutdown case after failing to reboot; so it's not that path.
My reboots in this case are always guest triggered, so they're not very early reboots.
Both of those seabios fixes are for reboots that occur while processing a reboot. Any chance the guest tries multiple reboot signals and one of them gets delayed?
One comment in there is:
// Some old versions of KVM don't store a pristine copy of the
// BIOS in high memory. Try to shutdown the machine instead.
do you have a definition of 'old'; in this case it's a new-ish qemu on our downstream (older) kernel but it's got fairly new kvm bits in, but the qemu is configured in our rhel6 compatibility mode - so hmm.
I don't have the kvm version handy, but it's really old. You're definitely not on that version, or every reboot would result in a shutdown instead.
-Kevin
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Mon, Jan 23, 2017 at 06:49:07PM +0000, Dr. David Alan Gilbert wrote:
- Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
> On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote: >> Hi, >> I turned the debug level up to 4 on our smaller (128k) ROM downstream >> build and seem to have hit a case where it's been layed out so that the >> 'ExtraStack' is at the same location as some code (display_uuid) which >> was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
That first report mentions migration to a different QEMU version. When migrating, is the BIOS software migrated as well (the copy at 0xffff0000), or does the new instance get a potentially different instance of the BIOS?
This case isn't with migration; as soon as I started debugging that I found I could hit it with just a simple reboot loop in the guest. (and migration should give you exactly the same ROM image as the source because the read-only blocks get migrated and used rather than the local copy on the destination - until you power the VM down and restart the QEMU when it rereads the local file).
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived for 50+ reboots and turned my debug on and it's going for 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it's taken the shutdown case after failing to reboot; so it's not that path.
My reboots in this case are always guest triggered, so they're not very early reboots.
Both of those seabios fixes are for reboots that occur while processing a reboot. Any chance the guest tries multiple reboot signals and one of them gets delayed?
It's possible; I'll have to add some qemu debug to watch for that; the guest I'm using is a rhel 6.9-ish image - so old kernel with lots of patches; but I'll try and see what it's doing. I've just got a loop that waits for the VM to boot, sees the login/password prompt on serial console, and as soon as it gets a shell issues 'reboot'.
One comment in there is:
// Some old versions of KVM don't store a pristine copy of the
// BIOS in high memory. Try to shutdown the machine instead.
do you have a definition of 'old'; in this case it's a new-ish qemu on our downstream (older) kernel but it's got fairly new kvm bits in, but the qemu is configured in our rhel6 compatibility mode - so hmm.
I don't have the kvm version handy, but it's really old. You're definitely not on that version, or every reboot would result in a shutdown instead.
OK, thanks.
Dave
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
- Kevin O'Connor (kevin@koconnor.net) wrote:
On Mon, Jan 23, 2017 at 06:49:07PM +0000, Dr. David Alan Gilbert wrote:
- Laszlo Ersek (lersek@redhat.com) wrote:
On 01/23/17 16:49, Kevin O'Connor wrote:
On Mon, Jan 23, 2017 at 11:11:02AM +0100, Laszlo Ersek wrote:
On 01/20/17 20:39, Dr. David Alan Gilbert wrote: > * Kevin O'Connor (kevin@koconnor.net) wrote: >> On Fri, Jan 20, 2017 at 06:40:44PM +0000, Dr. David Alan Gilbert wrote: >>> Hi, >>> I turned the debug level up to 4 on our smaller (128k) ROM downstream >>> build and seem to have hit a case where it's been layed out so that the >>> 'ExtraStack' is at the same location as some code (display_uuid) which >>> was causing some very random behaviour;
[...]
Would this be consistent with a stack overflow?
See commit 46b82624c95b951e8825fab117d9352faeae0ec8. Perhaps BUILD_EXTRA_STACK_SIZE (2KB) is too small now?
The ExtraStack isn't used at the point Dave reports the problem - display_uuid() is part of the init phase and that happens on the main "post" stack.
[...]
(This is based off 1.9.1)
I missed that earlier - there were some important fixes post 1.9.1 wrt reboots. Commits b837e68d / a48f602c2 could explain the issue. I'd make sure the issue is still present on the latest version.
That's a very promising hunch -- b837e68d explicitly mentions "reboot loop" in the subject. It seems that Dave didn't mention any RHBZ numbers in his email, but we have two somewhat similar bug reports (which I hope share a root cause) and the second report triggers the issue with a reboot loop specifically.
https://bugzilla.redhat.com/show_bug.cgi?id=1411275 https://bugzilla.redhat.com/show_bug.cgi?id=1382906
(Apologies that the 2nd RHBZ is not public; it's currently filed for the RH kernel, and those BZs default to private. :/)
That first report mentions migration to a different QEMU version. When migrating, is the BIOS software migrated as well (the copy at 0xffff0000), or does the new instance get a potentially different instance of the BIOS?
This case isn't with migration; as soon as I started debugging that I found I could hit it with just a simple reboot loop in the guest. (and migration should give you exactly the same ROM image as the source because the read-only blocks get migrated and used rather than the local copy on the destination - until you power the VM down and restart the QEMU when it rereads the local file).
CC'ing DavidH too, for RHBZ#1382906.
Yeh, it's looking promising; I've done a build with low debug that survived for 50+ reboots and turned my debug on and it's going for 20 so far, so that's pretty good.
However, reading the commits I'm a little confused.
I don't seem to have hit any cases where it's taken the shutdown case after failing to reboot; so it's not that path.
My reboots in this case are always guest triggered, so they're not very early reboots.
Both of those seabios fixes are for reboots that occur while processing a reboot. Any chance the guest tries multiple reboot signals and one of them gets delayed?
It's possible; I'll have to add some qemu debug to watch for that; the guest I'm using is a rhel 6.9-ish image - so old kernel with lots of patches; but I'll try and see what it's doing. I've just got a loop that waits for the VM to boot, sees the login/password prompt on serial console, and as soon as it gets a shell issues 'reboot'.
One comment in there is:
// Some old versions of KVM don't store a pristine copy of the
// BIOS in high memory. Try to shutdown the machine instead.
do you have a definition of 'old'; in this case it's a new-ish qemu on our downstream (older) kernel but it's got fairly new kvm bits in, but the qemu is configured in our rhel6 compatibility mode - so hmm.
I don't have the kvm version handy, but it's really old. You're definitely not on that version, or every reboot would result in a shutdown instead.
OK, thanks.
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast: 000eebdf l F .text 00001954 maininit 000ef780 g *ABS* 00000000 final_varlow_start 000ef780 g O *ABS* 00000004 BootSequence 000ef784 g O *ABS* 00000001 FloppyDOR 000ef788 g O *ABS* 00000008 LastUSBkey 000ef790 g O *ABS* 00000001 Ps2ctr 000ef794 g O *ABS* 00000004 RTCusers
000ef798 g O *ABS* 00000004 TimerLast
000ef79c g O *ABS* 00000001 Century
|> ef78d: 75 07 jne ef796 <RTCusers+0x2> |> ef78f: e8 fc 3c 00 00 call f3490 <yield> |> ef794: eb e4 jmp ef77a <maininit+0xb9b> bffbcfd6 |> ef796: b0 80 mov $0x80,%al * |> ef798: e6 70 out %al,$0x70 * |> ef79a: e4 71 in $0x71,%al |> ef79c: 88 c2 mov %al,%dl |> ef79e: b0 82 mov $0x82,%al |> ef7a0: e6 70 out %al,$0x70 |> ef7a2: e4 71 in $0x71,%al
And so I started dumping out blocks of RAM around there, the only bytes I've seen change are the TimerLast bytes (and only 3 of those); the contents in the ROM area at fffef7... always stay the same:
from fffef790: + : 0090: fc 3c 00 00 eb e4 b0 80 e6 70 e4 71 88 c2 b0 82 .<.......p.q....
Here are dumps from the high area it was running from when it crashed: bffbcf.. vvvvvvvv : : 00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ : : 00d0: bc 64 13 40 eb e4 b0 80 e6 70 e4 71 88 c2 b0 82 .d.@.....p.q.... : : 00d0: bc 64 13 40 eb e4 b0 80 e6 70 e4 71 88 c2 b0 82 .d.@.....p.q.... : : 00d0: bc 64 13 40 eb e4 b0 80 04 00 e6 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 04 00 e6 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 00 fa e5 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 00 fa e5 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 da d4 e5 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 da d4 e5 71 88 c2 b0 82 .d.@.......q.... : : 00d0: bc 64 13 40 eb e4 b0 80 54 50 e6 71 88 c2 b0 82 .d.@....TP.q.... : : 00d0: bc 64 13 40 eb e4 b0 80 54 50 e6 71 88 c2 b0 82 .d.@....TP.q....
and those 3 bytes correspond to TimerLast and they're changing all over;
I also dumped memory around ef790, and it's also seeing the change: e00000..
: f790: fc 3c 00 00 eb e4 b0 80 e6 70 e4 71 88 c2 b0 82 .<.......p.q.... : f790: 61 66 90 90 00 00 00 00 3f 3d f9 00 20 66 90 90 af......?=.. f.. : f790: fc 3c 00 00 eb e4 b0 80 04 00 e6 71 88 c2 b0 82 .<.........q.... : f790: 61 66 90 90 00 00 00 00 54 ee f0 00 20 66 90 90 af......T... f.. : f790: fc 3c 00 00 eb e4 b0 80 00 fa e5 71 88 c2 b0 82 .<.........q.... : f790: 61 66 90 90 00 00 00 00 3f ca af 01 20 66 90 90 af......?... f.. : f790: fc 3c 00 00 eb e4 b0 80 da d4 e5 71 88 c2 b0 82 .<.........q.... : f790: 61 66 90 90 00 00 00 00 c4 d8 a8 01 20 66 90 90 af.......... f.. : f790: fc 3c 00 00 eb e4 b0 80 54 50 e6 71 88 c2 b0 82 .<......TP.q.... : f790: 61 66 90 90 00 00 00 00 d5 3f 67 01 20 66 90 90 af.......?g. f.. : f790: fc 3c 00 00 eb e4 b0 80 b8 b2 e5 71 88 c2 b0 82 .<.........q.... : f790: 30 66 90 90 00 00 00 00 00 00 00 00 00 66 90 90 0f...........f.. : f790: fc 3c 00 00 eb e4 b0 80 be b8 e5 71 88 c2 b0 82 .<.........q....
but I'm assuming that's actually the real variable data some of the time and sometimes the ROM copy (this was dumped with cpu_memory_rw_debug in qemu, which says it's physical addresses).
I've not quite got my head around the ROM/writeable swith for the copy around e00000. So is this one of the timer routines being called before the code is copied to bffb.... ?
Still, this does explain why it's such a heisenbug; from a build point of view it requires TimerLast to overlay some critical bit of code you run each time (which probably only happens on 128k builds), then it's pot luck what happens based on timer value; a lot of the time it's nothing bad and you don't notice; sometimes you triple fault and the reset just takes a bit longer since it does an extra reset or two, and if you get really unlucky you hang or get a KVM error.
Dave
Dave
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
SeaBIOS mailing list SeaBIOS@seabios.org https://www.coreboot.org/mailman/listinfo/seabios
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory. It does this in the call to qemu_prep_reset() in tryReboot(). That call copies the pristine copy of the bios at 0xffff0000 to 0xf0000. The code then goes on to signal a hard reboot so that all the hardware is reset and so that the code starts just like it would on first boot.
Unfortunately, the hard reboot logic makes calls to udelay() - specifically in i8042_reboot() and pci_reboot(). These calls to udelay() are not valid here as the timer isn't even necessarily configured at this point. Alas - it also has the nasty possibility of corrupting the pristine copy of the bios that was just made in qemu_prep_reset(). Random crashes can then occur on the next boot.
It's easy to catch this with the following code change:
--- a/src/hw/timer.c +++ b/src/hw/timer.c @@ -146,6 +146,8 @@ timer_adjust_bits(u32 value, u32 validbits) static u32 timer_read(void) { + if (!GET_GLOBAL(HaveRunPost)) + panic("timer read too early\n"); u16 port = GET_GLOBAL(TimerPort); if (CONFIG_TSC_TIMER && !port) // Read from CPU TSC
With that, I get a panic on every reboot. Commenting out the udelay() calls then gets reboots working again.
The simplest fix is to remove the udelay() calls from the reboot logic. However, it's not clear to me if it's valid to omit the delays when running on real hardware. Another possibility would be to try and harden timer_read() so that it can be run very early (prior to code relocation), but that's tricky to implement correctly..
Good find Dave - I know that wasn't easy to track down. -Kevin
On 02/14/17 17:22, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal
Are you sure this applies to recent QEMU too, not just ancient QEMU?
and so SeaBIOS must manually reset that memory.
Do you mean a workaround for the (historically lacking) emulation of the PAM chipset registers? PAM emulation has been correct for quite a while now in QEMU, as far as I can tell.
(Sorry if I'm completely off.)
Thanks, Laszlo
It does this in the call to qemu_prep_reset() in tryReboot(). That call copies the pristine copy of the bios at 0xffff0000 to 0xf0000. The code then goes on to signal a hard reboot so that all the hardware is reset and so that the code starts just like it would on first boot.
Unfortunately, the hard reboot logic makes calls to udelay() - specifically in i8042_reboot() and pci_reboot(). These calls to udelay() are not valid here as the timer isn't even necessarily configured at this point. Alas - it also has the nasty possibility of corrupting the pristine copy of the bios that was just made in qemu_prep_reset(). Random crashes can then occur on the next boot.
It's easy to catch this with the following code change:
--- a/src/hw/timer.c +++ b/src/hw/timer.c @@ -146,6 +146,8 @@ timer_adjust_bits(u32 value, u32 validbits) static u32 timer_read(void) {
- if (!GET_GLOBAL(HaveRunPost))
u16 port = GET_GLOBAL(TimerPort); if (CONFIG_TSC_TIMER && !port) // Read from CPU TSCpanic("timer read too early\n");
With that, I get a panic on every reboot. Commenting out the udelay() calls then gets reboots working again.
The simplest fix is to remove the udelay() calls from the reboot logic. However, it's not clear to me if it's valid to omit the delays when running on real hardware. Another possibility would be to try and harden timer_read() so that it can be run very early (prior to code relocation), but that's tricky to implement correctly..
Good find Dave - I know that wasn't easy to track down. -Kevin
On Tue, Feb 14, 2017 at 05:36:34PM +0100, Laszlo Ersek wrote:
On 02/14/17 17:22, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal
Are you sure this applies to recent QEMU too, not just ancient QEMU?
and so SeaBIOS must manually reset that memory.
Do you mean a workaround for the (historically lacking) emulation of the PAM chipset registers? PAM emulation has been correct for quite a while now in QEMU, as far as I can tell.
My understanding was that a write to the PCI reboot register (0x0cf9) would cause a hard reboot - which would reset the PAM registers to their default (a RO view of the pristine bios at 0xf0000). I just did some tests with qemu v2.8 and it does not appear to do that.
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
-Kevin
On 02/14/17 18:16, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 05:36:34PM +0100, Laszlo Ersek wrote:
On 02/14/17 17:22, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal
Are you sure this applies to recent QEMU too, not just ancient QEMU?
and so SeaBIOS must manually reset that memory.
Do you mean a workaround for the (historically lacking) emulation of the PAM chipset registers? PAM emulation has been correct for quite a while now in QEMU, as far as I can tell.
My understanding was that a write to the PCI reboot register (0x0cf9) would cause a hard reboot - which would reset the PAM registers to their default (a RO view of the pristine bios at 0xf0000). I just did some tests with qemu v2.8 and it does not appear to do that.
Sigh, you are right, and I'm wrong again. :/
In "hw/pci-host/q35.c", we have
mch_reset() mch_update() mch_update_pam() pam_update() [hw/pci-host/pam.c]
but the config space registers that underlie the PAM configuration (MCH_HOST_BRIDGE_PAM0 etc) are not re-set.
Same for i440fx ("hw/pci-host/piix.c"):
i440fx_update_memory_mappings() pam_update() [hw/pci-host/pam.c]
Although i440fx_update_memory_mappings() is called from a good number of places (i440fx_write_config(), i440fx_load_old(), i440fx_post_load(), i440fx_init()), it doesn't seem to be called from any system reset handler.
Knowing this, it likely doesn't even matter that the PAM registers (I440FX_PAM etc) are not re-set in piix3_reset().
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
Thanks Laszlo
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
-Kevin
On 02/14/17 19:43, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
There are two questions here, AIUI: (1) resetting the PAM config on qemu system reset, (2) implementing all four PAM modes in QEMU, faithfully to phys hw.
I agree that item (2) is not important.
However, in order to fix SeaBIOS, item (1) should be fixed in QEMU, shouldn't it? You wrote,
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory. [...]
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
Thanks Laszlo
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
On 02/14/17 19:43, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
There are two questions here, AIUI: (1) resetting the PAM config on qemu system reset, (2) implementing all four PAM modes in QEMU, faithfully to phys hw.
I agree that item (2) is not important.
However, in order to fix SeaBIOS, item (1) should be fixed in QEMU, shouldn't it? You wrote,
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory. [...]
Sorry, I didn't word that paragraph well. I was trying to give context for why the seabios code was memcpy'ing over itself.
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
Thinking about this further, I think the real problem is that after seabios copies over itself it goes back into its generic reboot handling code. The udelay() is actually fine on real hardware - it's just not okay after seabios has memcpy'd itself. So, I think if seabios immediately did a reboot after the memcpy this problem would go away. Something like the patch below.
What's the best way to force a reboot on QEMU? The mechanism really should reset all cpus in a multi-cpu machine, ideally it would work on different machine types (piix4, q35, isapc, etc.), and ideally it would reset the hardware as well.
-Kevin
--- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -187,4 +187,8 @@ qemu_prep_reset(void) memcpy(hrp + 4, hrp + 4 + BIOS_SRC_OFFSET, cend - (hrp + 4)); barrier(); HaveRunPost = 0; + barrier(); + + // Force QEMU reboot + asm volatile("int3"); }
On 02/14/17 20:14, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
On 02/14/17 19:43, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
There are two questions here, AIUI: (1) resetting the PAM config on qemu system reset, (2) implementing all four PAM modes in QEMU, faithfully to phys hw.
I agree that item (2) is not important.
However, in order to fix SeaBIOS, item (1) should be fixed in QEMU, shouldn't it? You wrote,
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory. [...]
Sorry, I didn't word that paragraph well. I was trying to give context for why the seabios code was memcpy'ing over itself.
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
To my knowledge, QEMU implements only one kind of system reset, with qemu_system_reset_request(). It is supposed to - reset all VCPUs (it puts all APs back into "wait for INIT-SIPI-SIPI"), - reset all chipset registers, - reset all devices, - not touch guest RAM.
(Device models are responsible for registering their reset handlers with qemu_register_reset().)
See e.g. commit 1ec4ba741630 ("PIIX3: reset the VM when the Reset Control Register's RCPU bit gets set", 2013-01-24) and commit 0e98b436eceb ("ICH9 LPC: Reset Control Register, basic implementation", 2013-02-20).
(I'm sure there are other good or even better examples, I just recall these because I happen to have written them.)
Thinking about this further, I think the real problem is that after seabios copies over itself it goes back into its generic reboot handling code. The udelay() is actually fine on real hardware - it's just not okay after seabios has memcpy'd itself. So, I think if seabios immediately did a reboot after the memcpy this problem would go away. Something like the patch below.
I'll let Dave explore the various SeaBIOS and/or QEMU changes! :)
What's the best way to force a reboot on QEMU? The mechanism really should reset all cpus in a multi-cpu machine, ideally it would work on different machine types (piix4, q35, isapc, etc.), and ideally it would reset the hardware as well.
In SeaBIOS's tryReboot() [src/resume.c], we have:
(1) acpi_reboot(): it uses the register identified by the FADT.RESET_REG field, if the FADT is large / recent enough.
Currently QEMU does not generate a FADT that is large / recent enough, so acpi_reboot() should be a no-op, at the moment. (See find_acpi_features() [src/fw/biostables.c].)
There's an ongoing development to implement this for QEMU (msgid 1484739954-86833-1-git-send-email-phil@philjordan.eu, https://www.mail-archive.com/qemu-devel@nongnu.org/msg423451.html), which would mean advertising ICH9_RST_CNT_IOPORT (0xCF9) in FADT.RESET_REG.
That in turn will lead to the above two commits I listed (i.e., on both i440fx and q35), hence qemu_system_reset_request().
(2) i8042_reboot(): ultimately this writes value 0xFE to IO port 0x64. In QEMU, this is handled by kbd_write_command() [hw/input/pckbd.c], case label KBD_CCMD_RESET; the result is a call to qemu_system_reset_request().
(3) pci_reboot() is just a more direct way (without ACPI) to do (1), hence qemu_system_reset_request().
(4) Triple fault: I have some trouble to track this down (I think I might have to look into KVM too for this), but: - check_exception() [target/i386/excp_helper.c] and - kvm_arch_process_async_events() [target/i386/kvm.c] both seem to imply that a triple fault -- i.e., one after EXCP08_DBLE? -- is handled again with qemu_system_reset_request().
Thus, I think from the four reboot methods seen in tryReboot(), the last three do the same on QEMU, and once implemented in QEMU, the first one will do the same as well. And, I think qemu_system_reset_request() does what you suggest it should.
--- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -187,4 +187,8 @@ qemu_prep_reset(void) memcpy(hrp + 4, hrp + 4 + BIOS_SRC_OFFSET, cend - (hrp + 4)); barrier(); HaveRunPost = 0;
- barrier();
- // Force QEMU reboot
- asm volatile("int3");
}
Dave, can you try this perhaps?
(Nonetheless, I think QEMU should reset the PAM registers on system reset -- maybe tie this to machine type versions / compat properties?)
Thanks Laszlo
On Wed, Feb 15, 2017 at 11:01:03AM +0100, Laszlo Ersek wrote:
On 02/14/17 20:14, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
To my knowledge, QEMU implements only one kind of system reset, with qemu_system_reset_request(). It is supposed to
- reset all VCPUs (it puts all APs back into "wait for INIT-SIPI-SIPI"),
- reset all chipset registers,
- reset all devices,
- not touch guest RAM.
Thanks Laszlo - I appreciate your very detailed response.
Way back in the day, the 286 had no way to return to real mode from protected mode. So, the BIOS had this funky method of detecting fake reboots to use as a mode switch. If there is only one kind of QEMU reset then I think resetting the pam register in it would break this type of resume in SeaBIOS. :-/
So, I'm not sure what the right approach is wrt the PAM registers.
-Kevin
On 02/15/17 16:52, Kevin O'Connor wrote:
On Wed, Feb 15, 2017 at 11:01:03AM +0100, Laszlo Ersek wrote:
On 02/14/17 20:14, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
To my knowledge, QEMU implements only one kind of system reset, with qemu_system_reset_request(). It is supposed to
- reset all VCPUs (it puts all APs back into "wait for INIT-SIPI-SIPI"),
- reset all chipset registers,
- reset all devices,
- not touch guest RAM.
Thanks Laszlo - I appreciate your very detailed response.
Way back in the day, the 286 had no way to return to real mode from protected mode.
That's... unexpected :)
So, the BIOS had this funky method of detecting fake reboots to use as a mode switch. If there is only one kind of QEMU reset then I think resetting the pam register in it would break this type of resume in SeaBIOS. :-/
So, I'm not sure what the right approach is wrt the PAM registers.
Can we perhaps provide a hint via fw_cfg, to convince SeaBIOS that reboots are not fake (hence no need to detect them)?
If not (or if it would be ugly -- it would have to pass QEMU review too!), I'm fine going with a SeaBIOS-side fix, of course. (Today's been crazy again so I don't immediately recall the details of the suggested SeaBIOS-side fix, but I seem to remember that there was one.)
Thanks! Laszlo
On 15/02/2017 16:52, Kevin O'Connor wrote:
On Wed, Feb 15, 2017 at 11:01:03AM +0100, Laszlo Ersek wrote:
On 02/14/17 20:14, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
To my knowledge, QEMU implements only one kind of system reset, with qemu_system_reset_request(). It is supposed to
- reset all VCPUs (it puts all APs back into "wait for INIT-SIPI-SIPI"),
- reset all chipset registers,
- reset all devices,
- not touch guest RAM.
Thanks Laszlo - I appreciate your very detailed response.
Way back in the day, the 286 had no way to return to real mode from protected mode. So, the BIOS had this funky method of detecting fake reboots to use as a mode switch. If there is only one kind of QEMU reset then I think resetting the pam register in it would break this type of resume in SeaBIOS. :-/
So, I'm not sure what the right approach is wrt the PAM registers.
My memories are fuzzy, but I remember discussing whether the PAM registers should be preserved or not by S3, and I think the answer was that on real hardware they are preserved by S3.
Now, S3 is mostly the same as a reset from the firmware's point of view. The conclusion then would be that a reset should _not_ be touching the PAM registers.
Paolo
On 02/17/17 14:10, Paolo Bonzini wrote:
On 15/02/2017 16:52, Kevin O'Connor wrote:
On Wed, Feb 15, 2017 at 11:01:03AM +0100, Laszlo Ersek wrote:
On 02/14/17 20:14, Kevin O'Connor wrote:
On Tue, Feb 14, 2017 at 07:52:01PM +0100, Laszlo Ersek wrote:
If item (1) is fixed in QEMU, then the above "root cause" goes away, and the workaround in SeaBIOS can be conditionalized. Am I wrong?
I'm not sure. If I recall correctly, there are different resets on the x86 - some only reset the cpu and some do a "full machine reset". SeaBIOS attempts a variety of different reset mechanisms to reboot and I'm not sure which are supposed to do the full reset. If seabios does a "reset cpu" mechanism before a "reset machine" mechanism, then qemu resetting the pam may not help.
To my knowledge, QEMU implements only one kind of system reset, with qemu_system_reset_request(). It is supposed to
- reset all VCPUs (it puts all APs back into "wait for INIT-SIPI-SIPI"),
- reset all chipset registers,
- reset all devices,
- not touch guest RAM.
Thanks Laszlo - I appreciate your very detailed response.
Way back in the day, the 286 had no way to return to real mode from protected mode. So, the BIOS had this funky method of detecting fake reboots to use as a mode switch. If there is only one kind of QEMU reset then I think resetting the pam register in it would break this type of resume in SeaBIOS. :-/
So, I'm not sure what the right approach is wrt the PAM registers.
My memories are fuzzy, but I remember discussing whether the PAM registers should be preserved or not by S3, and I think the answer was that on real hardware they are preserved by S3.
Now, S3 is mostly the same as a reset from the firmware's point of view. The conclusion then would be that a reset should _not_ be touching the PAM registers.
Aha! Indeed, you may have stated sth like the PAM regs were wired to the same "power rail" (or whatever the heck) as that main memory.
Let me dig... Yes (it was a cross-posted thread between SeaBIOS and qemu, in 2013):
msgid: 5136469F.7080904@redhat.com https://www.mail-archive.com/seabios@seabios.org/msg04521.html https://www.mail-archive.com/qemu-devel@nongnu.org/msg159003.html
(Incredible that these issues just keep popping up!)
Thanks! Laszlo
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
Still, we probably should fix QEMU, especially if it's pretty easy, and resetting those registers sounds like it is, I'll try it.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
Anyway, the diagnosis seems right; in that the following hack seems to have survived 51 reboots:
diff -urN seabios-1.9.1-ref-kevinschanges/src/hw/pci.c seabios-1.9.1/src/hw/pci.c --- seabios-1.9.1-ref-kevinschanges/src/hw/pci.c 2016-01-18 05:13:20.000000000 -0500 +++ seabios-1.9.1/src/hw/pci.c 2017-02-14 13:09:39.760330263 -0500 @@ -276,7 +276,5 @@ { u8 v = inb(PORT_PCI_REBOOT) & ~6; outb(v|2, PORT_PCI_REBOOT); /* Request hard reset */ - udelay(50); outb(v|6, PORT_PCI_REBOOT); /* Actually do the reset */ - udelay(50); } diff -urN seabios-1.9.1-ref-kevinschanges/src/resume.c seabios-1.9.1/src/resume.c --- seabios-1.9.1-ref-kevinschanges/src/resume.c 2017-02-14 13:03:19.281943262 -0500 +++ seabios-1.9.1/src/resume.c 2017-02-14 13:09:27.456511775 -0500 @@ -123,15 +123,15 @@ // Setup for reset on qemu. qemu_prep_reset();
+ // Try PCI 0xcf9 reboot + pci_reboot(); + // Reboot using ACPI RESET_REG acpi_reboot();
// Try keyboard controller reboot. i8042_reboot();
- // Try PCI 0xcf9 reboot - pci_reboot(); - // Try triple fault asm volatile("int3");
I went for pci_reboot since it was easier to hack the delay out of. It's survived, still rebooting and hasn't thrown any KVM-shutdowns.
Dave
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi,
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical).
A while back kvm got support for memory slots which allow read access and vmexit on write access. Flash emulation needs this, and I think it is possible to also implement the "read from rom and write to memory" mode using that.
cheers, Gerd
On 14/02/2017 22:50, Gerd Hoffmann wrote:
Hi,
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical).
A while back kvm got support for memory slots which allow read access and vmexit on write access. Flash emulation needs this, and I think it is possible to also implement the "read from rom and write to memory" mode using that.
It would be very slow of course, but if the only writes are to TimerLast, perhaps that would work?
Paolo
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Tue, Feb 14, 2017 at 07:04:05PM +0100, Laszlo Ersek wrote:
On 02/14/17 18:16, Kevin O'Connor wrote:
Also, the PAM registers on real hardware support a mode where reads to 0xf0000 return the pristine copy of the bios while writes update memory. I didn't think there was any interest in implementing that on QEMU (nor do I think it would be particularly helpful to have).
Hmmm, I thought this was implemented with the four modes visible in init_pam() and switched by pam_update(), in "hw/pci-host/pam.c".
Based on the remaining "XXX" comments though, and the wording of commit 175f099b30d47 ("pam: partly fix write-only mode"), it seems that the emulation is not complete just yet?...
Perhaps this helps Dave identify what should be fixed in QEMU...
I don't think anything in QEMU needs to be "fixed" - the bug is definitely in SeaBIOS. The QEMU pam stuff is definitely quirky, but even if we updated qemu we'd still have to fix seabios for old versions of qemu.
I'd have sympathy if you just told the QEMU users to get a particular fix once we fix it.
Just for historical perspective - the reason I think qemu didn't implement the pam "read from rom and write to memory" mode is that I don't think there's a good way to emulate that with page tables (and the range needs to be executable so just making it all device memory isn't practical). Even if it were implemented, though, I doubt it would help much.
In the principal of removing our quirks, the following seems to work for me, Kevin, do you agree it's the right behaviour?
Dave
From fffd898e1ef87d6e404179d860db26304308268b Mon Sep 17 00:00:00 2001 From: "Dr. David Alan Gilbert" dgilbert@redhat.com Date: Wed, 15 Feb 2017 10:42:28 +0000 Subject: [PATCH] x86: Reset PAM registers on reset
On a reset the bridge code doesn't reset the PAM registers, the effect is to leave the RAM copy of the BIOS mapped rather than the flash copy. The RAM copy might be in an inconsistent state. SeaBIOS has some workarounds for this, but they're not always 100% succesful.
Signed-off-by: Dr. David Alan Gilbert dgilbert@redhat.com --- hw/pci-host/piix.c | 17 +++++++++++++++++ hw/pci-host/q35.c | 7 +++++++ 2 files changed, 24 insertions(+)
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index f9218aa..8c7d741 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -603,6 +603,22 @@ static bool piix3_rcr_needed(void *opaque) return (piix3->rcr != 0); }
+static void i440fx_reset(DeviceState *qdev) +{ + PCII440FXState *d = I440FX_PCI_DEVICE(qdev); + PCIDevice *pd = PCI_DEVICE(d); + + pd->config[I440FX_PAM + 0] = 0; /* 0x59 */ + pd->config[I440FX_PAM + 1] = 0; + pd->config[I440FX_PAM + 2] = 0; + pd->config[I440FX_PAM + 3] = 0; + pd->config[I440FX_PAM + 4] = 0; + pd->config[I440FX_PAM + 5] = 0; + pd->config[I440FX_PAM + 6] = 0; /* 0x5F */ + + i440fx_update_memory_mappings(d); +} + static const VMStateDescription vmstate_piix3_rcr = { .name = "PIIX3/rcr", .version_id = 1, @@ -741,6 +757,7 @@ static void i440fx_class_init(ObjectClass *klass, void *data) k->class_id = PCI_CLASS_BRIDGE_HOST; dc->desc = "Host bridge"; dc->vmsd = &vmstate_i440fx; + dc->reset = i440fx_reset; /* * PCI-facing part of the host bridge, not usable without the * host-facing part, which can't be device_add'ed, yet. diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 344f77b..9bc05dc 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -456,6 +456,13 @@ static void mch_reset(DeviceState *qdev) d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT; d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK; d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK; + d->config[MCH_HOST_BRIDGE_PAM0] = 0; /* 0x90 */ + d->config[MCH_HOST_BRIDGE_PAM1] = 0; + d->config[MCH_HOST_BRIDGE_PAM2] = 0; + d->config[MCH_HOST_BRIDGE_PAM3] = 0; + d->config[MCH_HOST_BRIDGE_PAM4] = 0; + d->config[MCH_HOST_BRIDGE_PAM5] = 0; + d->config[MCH_HOST_BRIDGE_PAM6] = 0; /* 0x96 */
mch_update(mch); }
On Wed, Feb 15, 2017 at 11:07:05AM +0000, Dr. David Alan Gilbert wrote:
In the principal of removing our quirks, the following seems to work for me, Kevin, do you agree it's the right behaviour?
I ran some quick tests with your patch and I can confirm it fixes the first problem. However, looking at the wikipedia article on old 286 mode switches: https://en.wikipedia.org/wiki/Protected_mode#Entering_and_exiting_protected_... it appears a triple fault was a common way of mode switching. I confirmed your patch would break that in SeaBIOS.
So your patch may be the "right" thing to do, but it would be a lot more work and would have higher risk. (If it is the right thing to do, seabios would then have to detect the ancient mode switch condition and then return the pam registers back to their prior state all without touching any ram (ie, no stack).)
Were you able to confirm the SeaBIOS patch I sent out earlier (see below) passes your tests?
-Kevin
--- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -187,4 +187,8 @@ qemu_prep_reset(void) memcpy(hrp + 4, hrp + 4 + BIOS_SRC_OFFSET, cend - (hrp + 4)); barrier(); HaveRunPost = 0; + barrier(); + + // Force QEMU reboot + asm volatile("int3"); }
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Wed, Feb 15, 2017 at 11:07:05AM +0000, Dr. David Alan Gilbert wrote:
In the principal of removing our quirks, the following seems to work for me, Kevin, do you agree it's the right behaviour?
I ran some quick tests with your patch and I can confirm it fixes the first problem. However, looking at the wikipedia article on old 286 mode switches: https://en.wikipedia.org/wiki/Protected_mode#Entering_and_exiting_protected_... it appears a triple fault was a common way of mode switching. I confirmed your patch would break that in SeaBIOS.
So your patch may be the "right" thing to do, but it would be a lot more work and would have higher risk. (If it is the right thing to do, seabios would then have to detect the ancient mode switch condition and then return the pam registers back to their prior state all without touching any ram (ie, no stack).)
Oh I see, hmm - yes that's a pain - the actual PAM register reset was simple enough in my patch and actually left us with a nice known state after reset.
Were you able to confirm the SeaBIOS patch I sent out earlier (see below) passes your tests?
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Dave
-Kevin
--- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -187,4 +187,8 @@ qemu_prep_reset(void) memcpy(hrp + 4, hrp + 4 + BIOS_SRC_OFFSET, cend - (hrp + 4)); barrier(); HaveRunPost = 0;
- barrier();
- // Force QEMU reboot
- asm volatile("int3");
}
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Paolo
On Fri, Feb 17, 2017 at 02:12:57PM +0100, Paolo Bonzini wrote:
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Could we rely on the pci reset mechanism (port 0xcf9)?
-Kevin
On 18/02/2017 04:45, Kevin O'Connor wrote:
On Fri, Feb 17, 2017 at 02:12:57PM +0100, Paolo Bonzini wrote:
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Could we rely on the pci reset mechanism (port 0xcf9)?
That's only 3-4 years old, but 0xcf9 followed by keyboard reset should always work.
Paolo
On Sat, Feb 18, 2017 at 04:11:28PM +0100, Paolo Bonzini wrote:
On 18/02/2017 04:45, Kevin O'Connor wrote:
On Fri, Feb 17, 2017 at 02:12:57PM +0100, Paolo Bonzini wrote:
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Could we rely on the pci reset mechanism (port 0xcf9)?
That's only 3-4 years old, but 0xcf9 followed by keyboard reset should always work.
Since this would be qemu specific, I think it would be okay if reset only worked on qemu versions from the last few years. Or, are you saying only some machine types would support it? I checked, and reboots on isapc don't currently work at all (looks like the bios isn't mapped to 0xfff00000 on isapc) so I don't think that's a stopper.
-Kevin
On 18/02/2017 17:55, Kevin O'Connor wrote:
On Sat, Feb 18, 2017 at 04:11:28PM +0100, Paolo Bonzini wrote:
On 18/02/2017 04:45, Kevin O'Connor wrote:
On Fri, Feb 17, 2017 at 02:12:57PM +0100, Paolo Bonzini wrote:
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Could we rely on the pci reset mechanism (port 0xcf9)?
That's only 3-4 years old, but 0xcf9 followed by keyboard reset should always work.
Since this would be qemu specific, I think it would be okay if reset only worked on qemu versions from the last few years. Or, are you saying only some machine types would support it? I checked, and reboots on isapc don't currently work at all (looks like the bios isn't mapped to 0xfff00000 on isapc) so I don't think that's a stopper.
isapc should never be SMP, so the keyboard reset would be enough even if it were fixed to be an INIT. Hence my suggestion of using 0xcf9 for recent QEMU with PCI machine types, followed by keyboard reset for old QEMU + isapc.
Paolo
* Paolo Bonzini (pbonzini@redhat.com) wrote:
On 18/02/2017 17:55, Kevin O'Connor wrote:
On Sat, Feb 18, 2017 at 04:11:28PM +0100, Paolo Bonzini wrote:
On 18/02/2017 04:45, Kevin O'Connor wrote:
On Fri, Feb 17, 2017 at 02:12:57PM +0100, Paolo Bonzini wrote:
On 15/02/2017 18:35, Dr. David Alan Gilbert wrote:
Yes it seems to. One worry is that if we ever fix the qemu triple-fault so it really does what you're describing and only resets the CPU, then I'm not sure your int3 is the right choice.
The other question is whether that protected-mode exit switch works in practice on qemu; it's going to come back with a lot of it's devices reset.
Right, keyboard reset (and port 92h reset) should be an INIT and not a full system reset. But that's also something that should be fixed, not a quirk to rely on...
Could we rely on the pci reset mechanism (port 0xcf9)?
That's only 3-4 years old, but 0xcf9 followed by keyboard reset should always work.
Since this would be qemu specific, I think it would be okay if reset only worked on qemu versions from the last few years. Or, are you saying only some machine types would support it? I checked, and reboots on isapc don't currently work at all (looks like the bios isn't mapped to 0xfff00000 on isapc) so I don't think that's a stopper.
isapc should never be SMP, so the keyboard reset would be enough even if it were fixed to be an INIT. Hence my suggestion of using 0xcf9 for recent QEMU with PCI machine types, followed by keyboard reset for old QEMU + isapc.
Any chance of getting this fix into the about to be tagged stable?
Dave
Paolo
SeaBIOS mailing list SeaBIOS@seabios.org https://www.coreboot.org/mailman/listinfo/seabios
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi,
isapc should never be SMP, so the keyboard reset would be enough even if it were fixed to be an INIT. Hence my suggestion of using 0xcf9 for recent QEMU with PCI machine types, followed by keyboard reset for old QEMU + isapc.
Any chance of getting this fix into the about to be tagged stable?
Can you post a patch (preferably tested with your testcase, but should land on the list today so we can make the deadline).
thanks, Gerd
* Gerd Hoffmann (kraxel@redhat.com) wrote:
Hi,
isapc should never be SMP, so the keyboard reset would be enough even if it were fixed to be an INIT. Hence my suggestion of using 0xcf9 for recent QEMU with PCI machine types, followed by keyboard reset for old QEMU + isapc.
Any chance of getting this fix into the about to be tagged stable?
Can you post a patch (preferably tested with your testcase, but should land on the list today so we can make the deadline).
Yeh that didn't happen; OK lets try and get it for the next one, it's not been a recent bug as far as I can tell.
Dave
thanks, Gerd
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi,
Do you mean a workaround for the (historically lacking) emulation of the PAM chipset registers? PAM emulation has been correct for quite a while now in QEMU, as far as I can tell.
(Sorry if I'm completely off.)
That was the first thing coming to my mind too.
So, yes, recent qemu versions should not suffer from that issue. But seabios wants to be able to run on old qemu versions too, so they can't be completely ignored ...
Maybe fixing things on modern qemu is easier if we add a compile time option for backward compatibility with old qemu versions?
At least for qemu this would work fine. We have two different seabios binaries anyway, because at some point seabios didn't fit into 128k any more. So we have one 128k bios, for backward compatibility, and with support for features added only recently to qemu turned off to make it fit. And one full-featured 256k bios version. Only the 128k version needs support for old qemu versions without proper PAM emulation.
cheers, Gerd
* Gerd Hoffmann (kraxel@redhat.com) wrote:
Hi,
Do you mean a workaround for the (historically lacking) emulation of the PAM chipset registers? PAM emulation has been correct for quite a while now in QEMU, as far as I can tell.
(Sorry if I'm completely off.)
That was the first thing coming to my mind too.
So, yes, recent qemu versions should not suffer from that issue. But seabios wants to be able to run on old qemu versions too, so they can't be completely ignored ...
Maybe fixing things on modern qemu is easier if we add a compile time option for backward compatibility with old qemu versions?
At least for qemu this would work fine. We have two different seabios binaries anyway, because at some point seabios didn't fit into 128k any more. So we have one 128k bios, for backward compatibility, and with support for features added only recently to qemu turned off to make it fit. And one full-featured 256k bios version. Only the 128k version needs support for old qemu versions without proper PAM emulation.
But of course that's the one that's actually broken in this case!
Dave
cheers, Gerd
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Kevin O'Connor (kevin@koconnor.net) wrote:
On Tue, Feb 14, 2017 at 02:50:23PM +0000, Dr. David Alan Gilbert wrote:
I've been sporadically carrying on debugging this and I think I have a bit more understanding, but not quite all the way.
I'm pretty sure we are trying to run code that's been over written by variable data - which I believe to be TimerLast, and I'm also fairly sure the problem is not a delayed reset from multiple reboot signals.
Some detail: a) I added some debug to qemu to print what triggered the reboot by always passing qemu_system_reset_request a string; the kernel is always rebooting via KBD_CCMD_RESET, I see that enter the BIOS and then see SeaBIOS then does KBD_CCMD_RESET again - which I think is the one coming from handle_resume32. But then I noticed that every so often we got a reboot from a KVM_SHUTDOWN which is apparently a triple-fault or the like.
b) With a KVM trace I got the addresses the fault was happening at, and it was always an address in the copy fo the bios around bffbcf.. although the actual failure jumped about a bit, taking the: Relocating init from 0x000e4810 to 0xbffb2050 (size 57120) I worked the fault address back to 'ef799' and from the objdump found that was an address in maininit but also corresponded to TimerLast:
Hi Kevin,
I can confirm there is a defect with TimerLast and reboot handling. I'm not sure what the best fix is - suggestions welcome.
Thanks for confirming my diagnosis so far.
The root of the problem is that QEMU does not reset the f-segment on a reboot signal and so SeaBIOS must manually reset that memory.
If you need a qemu fix then lets just figure out the right fix.
It does this in the call to qemu_prep_reset() in tryReboot(). That call copies the pristine copy of the bios at 0xffff0000 to 0xf0000. The code then goes on to signal a hard reboot so that all the hardware is reset and so that the code starts just like it would on first boot.
Unfortunately, the hard reboot logic makes calls to udelay() - specifically in i8042_reboot() and pci_reboot(). These calls to udelay() are not valid here as the timer isn't even necessarily configured at this point. Alas - it also has the nasty possibility of corrupting the pristine copy of the bios that was just made in qemu_prep_reset(). Random crashes can then occur on the next boot.
It's easy to catch this with the following code change:
--- a/src/hw/timer.c +++ b/src/hw/timer.c @@ -146,6 +146,8 @@ timer_adjust_bits(u32 value, u32 validbits) static u32 timer_read(void) {
- if (!GET_GLOBAL(HaveRunPost))
u16 port = GET_GLOBAL(TimerPort); if (CONFIG_TSC_TIMER && !port) // Read from CPU TSCpanic("timer read too early\n");
With that, I get a panic on every reboot. Commenting out the udelay() calls then gets reboots working again.
The simplest fix is to remove the udelay() calls from the reboot logic. However, it's not clear to me if it's valid to omit the delays when running on real hardware.
Yes, it would be scary to modify that because then you'd be stuck with not quite knowing how many hardware implementations were fussy about it - although as you say it's dependent on that read before the timer is configured, so it's not safe anyway.
Wasn't one of the old tricks for a delay to repeatedly read an ISA location on the assumption if the hardware was old then it would be slow and if the hardware wasn't as old it probably wouldn't be as fussy about the delay?
Another possibility would be to try and harden timer_read() so that it can be run very early (prior to code relocation), but that's tricky to implement correctly..
I guess I could hack around it for my use by just removing the delay; Would preserving TimerLast in i8042_reboot/pci_reboot around the udelays work?
Good find Dave - I know that wasn't easy to track down.
Thanks; as heisenbugs go it's a pretty nasty one; still learnt a bit about the innards of seabios :-)
Dave
-Kevin
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK