[SeaBIOS] [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Xulei (Stone)
stone.xulei at huawei.com
Fri Nov 20 03:05:55 CET 2015
>On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
>> Kevin,
>>
>> After deeply analyzing, i think there may be 3 possible reasons:
>> 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
>> lock to protect. So, sometimes, 2 or more vcpu may get the same
>> current value of CountCPUs. Then we'll get a single incrementation
>> instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
>> loop forever;
>
>The handle_smp() code is called from romlayout.S:entry_smp() which
>does take a lock. So, all of handle_smp() should run synchronous.
>
Ok, this possibility is ruled out!
>> 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?
>
>Not sure - the last time there were problems in this area of the code
>others used kvmtrace to try and track this down. Since you are
>getting dprintf statements, you could also try outputting
>cmos_smp_count prior to the loop (see patch below).
>
I'll test again with this patch, and observe the output.
But frankly speaking, i don't think SeaBIOS may read an incorrect number.
Because, QEMU set smp_cpus value in pc_cmos_init (cmos ox5f) at the 1st
time (QEMU does not execute pc_cmos_init again during repetitive reboot).
SeaBIOS works well for many times which means there is no reason that
a time SeaBIOS suddenly reads an incorrect number.
>> 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?
>> I've tested, when yield() is running, SeaBIOS seems has not created
>> some other threads except the main thread. So I don't know what's
>> the function of yield() here.?
>
>The yield() allows hardware interrupts to occur. But note that
>yield() isn't called in the loop - is is only called after the loop
>completes.
>
>If you are only getting this on massive repetitive reboot requests,
>there are some other possible explanations:
>
>- perhaps the SIPI is getting lost because one of the CPUs is still
> resetting or still processing a SIPI from the last reboot?
>
Seems impossible. Seen from the qemu log and SeaBIOS log, the VM has
booted successfully and can execute our "regular rebooting" daemon
process before the last reboot.
>- the seabios code itself may have been corrupted if the memcpy() in
> qemu_prep_reset() got far enough along to clear HaveRunPost, but did
> not get far enough along to fully complete the memcpy().
>
BTW, my VM reboots every 220 second (reboot time interval = 220s). I think
SeaBIOS has enough time to process all kinds of affairs, like SIPI and
memcpy().
Kevin, I want to know whether it is possible that if my VM is stuck at QEMU
(a point of pci device reset procedure? or whatever) SeaBIOS will hold at
hanle_smp() and could not printf "Found %d cpu(s) max supported %d cpu(s)"?
Is this possible?
================== bad QEMU log=======
[2015-11-13 18:45:57] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:57] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650381}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 796285}, "event": "RESET"}
[2015-11-13 18:45:58] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:59] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411560, "microseconds": 212196}, "event": "RESET"}
[2015-11-13 18:46:00] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650558}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}
================ good QEMU log=========
[2015-11-13 18:42:12] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:12] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718617}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 848236}, "event": "RESET"}
[2015-11-13 18:42:14] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:14] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411335, "microseconds": 280198}, "event": "RESET"}
[2015-11-13 18:42:15] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718794}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 3 that means DRIVER
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '-1' hanle control message event = 0, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 4 send control message event = 1, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '1' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '2' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '3' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '4' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '2' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 214826}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel1"}}
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '1' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '3' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '4' hanle control message event = 6, value = 1
[2015-11-13 18:42:31] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 220665}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel3"}}
>If the failure is reproducible, the patch below could help narrow the
>possibilities.
>
>-Kevin
>
>
>--- a/src/fw/smp.c
>+++ b/src/fw/smp.c
>@@ -125,6 +125,7 @@ smp_setup(void)
>
> // Wait for other CPUs to process the SIPI.
> u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>+ dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
> while (cmos_smp_count != CountCPUs)
> asm volatile(
> // Release lock and allow other processors to use the stack.
>@@ -136,6 +137,7 @@ smp_setup(void)
> " jc 1b\n"
> : "+m" (SMPLock), "+m" (SMPStack)
> : : "cc", "memory");
>+ dprintf(1, "finish smp\n");
> yield();
>
> // Restore memory.
More information about the SeaBIOS
mailing list