[SeaBIOS] [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

Xulei (Stone) stone.xulei at huawei.com
Fri Nov 20 03:05:55 CET 2015



>On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
>> Kevin,
>> 
>> After deeply analyzing, i think there may be 3 possible reasons:
>> 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
>> lock to protect.  So, sometimes, 2 or more vcpu may get the same
>> current value of CountCPUs. Then we'll get a single incrementation
>> instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
>> loop forever;
>
>The handle_smp() code is called from romlayout.S:entry_smp() which
>does take a lock.  So, all of handle_smp() should run synchronous.
>

Ok, this possibility is ruled out!

>> 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?
>
>Not sure - the last time there were problems in this area of the code
>others used kvmtrace to try and track this down.  Since you are
>getting dprintf statements, you could also try outputting
>cmos_smp_count prior to the loop (see patch below).
>
I'll test again with this patch, and observe the output.
But frankly speaking, i don't think SeaBIOS may read an incorrect number.
Because, QEMU set smp_cpus value in pc_cmos_init (cmos ox5f) at the 1st
time (QEMU does not execute pc_cmos_init again during repetitive reboot).
SeaBIOS works well for many times which means there is no reason that
a time SeaBIOS suddenly reads an incorrect number.

>> 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?
>> I've tested, when yield() is running, SeaBIOS seems has not created
>> some other threads except the main thread. So I don't know what's
>> the function of yield() here.?
>
>The yield() allows hardware interrupts to occur.  But note that
>yield() isn't called in the loop - is is only called after the loop
>completes.
>
>If you are only getting this on massive repetitive reboot requests,
>there are some other possible explanations:
>
>- perhaps the SIPI is getting lost because one of the CPUs is still
>  resetting or still processing a SIPI from the last reboot?
>
Seems impossible. Seen from the qemu log and SeaBIOS log, the VM has 
booted successfully and can execute our "regular rebooting" daemon
process before the last reboot.

>- the seabios code itself may have been corrupted if the memcpy() in
>  qemu_prep_reset() got far enough along to clear HaveRunPost, but did
>  not get far enough along to fully complete the memcpy().
>
BTW, my VM reboots every 220 second (reboot time interval = 220s). I think
SeaBIOS has enough time to process all kinds of affairs, like SIPI and
memcpy().

Kevin, I want to know whether it is possible that if my VM is stuck at QEMU
(a point of pci device reset procedure? or whatever) SeaBIOS will hold at
hanle_smp() and could not printf "Found %d cpu(s) max supported %d cpu(s)"?
Is this possible?

================== bad QEMU log=======
[2015-11-13 18:45:57] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:57] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650381}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 796285}, "event": "RESET"}
[2015-11-13 18:45:58] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:59] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411560, "microseconds": 212196}, "event": "RESET"}
[2015-11-13 18:46:00] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650558}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}

================ good QEMU log=========
[2015-11-13 18:42:12] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:12] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718617}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 848236}, "event": "RESET"}
[2015-11-13 18:42:14] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:14] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411335, "microseconds": 280198}, "event": "RESET"}
[2015-11-13 18:42:15] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718794}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 3 that means DRIVER
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '-1' hanle control message event = 0, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 4 send control message event = 1, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '1' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '2' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '3' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '4' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '2' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 214826}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel1"}}
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '1' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '3' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '4' hanle control message event = 6, value = 1
[2015-11-13 18:42:31] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 220665}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel3"}}

>If the failure is reproducible, the patch below could help narrow the
>possibilities.
>
>-Kevin
>
>
>--- a/src/fw/smp.c
>+++ b/src/fw/smp.c
>@@ -125,6 +125,7 @@ smp_setup(void)
> 
>     // Wait for other CPUs to process the SIPI.
>     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
>     while (cmos_smp_count != CountCPUs)
>         asm volatile(
>             // Release lock and allow other processors to use the stack.
>@@ -136,6 +137,7 @@ smp_setup(void)
>             "  jc 1b\n"
>             : "+m" (SMPLock), "+m" (SMPStack)
>             : : "cc", "memory");
>+    dprintf(1, "finish smp\n");
>     yield();
> 
>     // Restore memory.


More information about the SeaBIOS mailing list