[SeaBIOS] [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

Xulei (Stone) stone.xulei at huawei.com
Thu Nov 19 02:04:19 CET 2015


Dear Kevin,

Sorry for the delayed reply. This patch works well for me. Thanks a lot!

Recently, I found another odd thing. A qemu-kvm VM gets stuck in SeaBIOS
after rebooting itself many times. Having analyzed the SeaBIOS log attached
below, I think there may be something wrong in this block of code:

/src/fw/smp.c

    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (cmos_smp_count != CountCPUs)
        asm volatile(
            // Release lock and allow other processors to use the stack.
            "  movl %%esp, %1\n"
            "  movl $0, %0\n"
            // Reacquire lock and take back ownership of stack.
            "1:rep ; nop\n"
            "  lock btsl $0, %0\n"
            "  jc 1b\n"
            : "+m" (SMPLock), "+m" (SMPStack)
            : : "cc", "memory");
    yield();

It seems that if SeaBIOS sometimes reads an incorrect CPU count from QEMU
through CMOS 0x5f, cmos_smp_count never equals CountCPUs and SeaBIOS spins
in this loop forever. So I wonder what could cause an incorrect value to be
read after a VM has rebooted itself many times?
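
To make the failure mode concrete, here is a minimal sketch of the same
wait with a bound on the number of polls. It is not SeaBIOS code:
wait_for_cpus(), expected_from_cmos() and the simulated checked_in[] array
are hypothetical names of mine, and the real loop reads the global
CountCPUs rather than an array. In the log below there are seven
handle_smp lines, so assuming the boot CPU is counted as one, CountCPUs
should end at 8; any other expected value would leave the loop spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for the wait in smp.c: poll until the number of
// CPUs that have checked in matches the expected count, but give up after
// max_polls iterations instead of spinning forever.
// checked_in[i] is the CPU count observed on poll i (simulated here).
static bool wait_for_cpus(uint8_t expected, const uint8_t *checked_in,
                          int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (checked_in[i] == expected)
            return true;    // all expected CPUs have arrived
    }
    return false;           // counts never matched: report, do not hang
}

// As in the quoted code, the expected count is the raw CMOS byte plus
// one, truncated to 8 bits.
static uint8_t expected_from_cmos(uint8_t raw)
{
    return (uint8_t)(raw + 1);
}
```

With a raw value of 7 the expected count is 8 and the wait completes once
all seven APs have checked in. With a raw value of 8 the expected count is
9, which is never reached, so the unbounded loop in smp.c would never exit.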

================bad SeaBIOS log===========
[2015-11-13 18:45:58] In resume (status=0)
[2015-11-13 18:45:58] In 32bit resume
[2015-11-13 18:45:58] Attempting a hard reboot
[2015-11-13 18:46:00] SeaBIOS (version rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org)
[2015-11-13 18:46:00] No Xen hypervisor found.
[2015-11-13 18:46:00] Running on QEMU (i440fx)
[2015-11-13 18:46:00] Running on KVM
[2015-11-13 18:46:00] RamSize: 0xc0000000 [cmos]
[2015-11-13 18:46:00] Relocating init from 0x000de8f0 to 0xbffaec00 (size 70464)
[2015-11-13 18:46:00] Found QEMU fw_cfg
[2015-11-13 18:46:00] RamBlock: addr 0x0000000000000000 len 0x00000000c0000000 [e820]
[2015-11-13 18:46:00] RamBlock: addr 0x0000000100000000 len 0x0000000340000000 [e820]
[2015-11-13 18:46:00] Moving pm_base to 0x600
[2015-11-13 18:46:00] boot order:
[2015-11-13 18:46:00] 1: /pci@i0cf8/scsi@e/disk@0,0
[2015-11-13 18:46:00] 2: HALT
[2015-11-13 18:46:00] CPU Mhz=2402
[2015-11-13 18:46:00] === PCI bus & bridge init ===
[2015-11-13 18:46:00] PCI: pci_bios_init_bus_rec bus = 0x0
[2015-11-13 18:46:00] === PCI device probing ===
[2015-11-13 18:46:00] Found 21 PCI devices (max PCI bus is 00)
[2015-11-13 18:46:00] === PCI new allocation pass #1 ===
[2015-11-13 18:46:00] PCI: check devices
[2015-11-13 18:46:00] === PCI new allocation pass #2 ===
[2015-11-13 18:46:00] PCI: IO: c000 - c1cf
[2015-11-13 18:46:00] PCI: 32: 00000000c0000000 - 00000000fec00000
[2015-11-13 18:46:00] PCI: map device bdf=00:1f.0  bar 0, addr 0000c000, size 00000100 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0  bar 0, addr 0000c100, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0  bar 0, addr 0000c140, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.2  bar 4, addr 0000c180, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0  bar 0, addr 0000c1a0, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.1  bar 4, addr 0000c1c0, size 00000010 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0  bar 6, addr febe0000, size 00010000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0  bar 1, addr febf0000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0  bar 1, addr febf1000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0  bar 1, addr febf2000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0  bar 1, addr febf3000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0  bar 0, addr febf4000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0  bar 0, addr f6000000, size 02000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0  bar 2, addr f8000000, size 01000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:03.0  bar 2, addr f9000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:04.0  bar 2, addr f9800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:05.0  bar 2, addr fa000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:06.0  bar 2, addr fa800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:07.0  bar 2, addr fb000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:08.0  bar 2, addr fb800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:09.0  bar 2, addr fc000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0a.0  bar 2, addr fc800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0b.0  bar 2, addr fd000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0c.0  bar 2, addr fd800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: init bdf=00:00.0 id=8086:1237
[2015-11-13 18:46:00] PCI: init bdf=00:01.0 id=8086:7000
[2015-11-13 18:46:00] PIIX3/PIIX4 init: elcr=00 0c
[2015-11-13 18:46:00] PCI: init bdf=00:01.1 id=8086:7010
[2015-11-13 18:46:00] PCI: init bdf=00:01.2 id=8086:7020
[2015-11-13 18:46:00] PCI: init bdf=00:01.3 id=8086:7113
[2015-11-13 18:46:00] Using pmtimer, ioport 0x608
[2015-11-13 18:46:00] PCI: init bdf=00:02.0 id=1013:00b8
[2015-11-13 18:46:00] PCI: init bdf=00:03.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:04.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:05.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:06.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:07.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:08.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:09.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0a.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0b.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0c.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0d.0 id=1af4:1003
[2015-11-13 18:46:00] PCI: init bdf=00:0e.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:0f.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:10.0 id=1af4:1110
[2015-11-13 18:46:00] PCI: init bdf=00:1f.0 id=1af4:8888
[2015-11-13 18:46:00] PCI: Using 00:02.0 for primary VGA
[2015-11-13 18:46:00] handle_smp: apic_id=1
[2015-11-13 18:46:00] handle_smp: apic_id=6
[2015-11-13 18:46:00] handle_smp: apic_id=7
[2015-11-13 18:46:00] handle_smp: apic_id=3
[2015-11-13 18:46:00] handle_smp: apic_id=2
[2015-11-13 18:46:00] handle_smp: apic_id=5
[2015-11-13 18:46:00] handle_smp: apic_id=4
========The End, nothing more======
>On Mon, Nov 09, 2015 at 03:06:18PM -0500, Kevin O'Connor wrote:
>> On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
>> > On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
>> > > >On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
>> > > >I'm surprised you would see the above on a recent qemu/kvm though - as
>> > > >on a newer KVM I think the second reset would have to happen after
>> > > >HaveAttemptedReboot is set and prior to the memcpy in
>> > > >qemu_prep_reset() completing.  Can you verify your KVM version?
>> > > 
>> > > I've tested on KVM-3.6 and KVM-4.1.3. On both of these versions, I can 
>> > > see this problem. 
>> > > I do it like this: put an HA and a watchdog mechanism in a VM. Deliberately 
>> > > let this VM lose its heartbeat and don't feed the dog. Then, after 2 minutes 
>> > > (a self-defined timeout), the HA mechanism will issue an internal reboot command to
>> > > the VM and the watchdog mechanism will issue a "virsh reset" from the host. Then 
>> > > the aforementioned problem occurs with high probability. 
>> > 
>> > Ah, okay.  I'm not sure what the best solution to this problem is.
>> 
>> After thinking about this further, I think we can move the
>> HaveAttemptedReboot assignment after the memcpy.
>
>The previous patch could cause corruption if the memcpy() failed.  I
>think the new SeaBIOS patch below should be okay though.
>
>-Kevin
>
>
>commit 8a6e44ad5c953266d2339b3299f5fb4ff32c8cbb
>Author: Kevin O'Connor <kevin at koconnor.net>
>Date:   Mon Nov 9 15:00:19 2015 -0500
>
>    resume: Make KVM soft reboot loop detection more flexible
>    
>    Move the check for soft reboot loops from resume.c to shadow.c and
>    directly check for the case where the memcpy fails.  This prevents a
>    hang if an external reboot request occurs during the BIOS memcpy.
>    
>    Signed-off-by: Kevin O'Connor <kevin at koconnor.net>
>
>diff --git a/src/fw/shadow.c b/src/fw/shadow.c
>index ee87d36..b2f2dd8 100644
>--- a/src/fw/shadow.c
>+++ b/src/fw/shadow.c
>@@ -156,6 +156,8 @@ make_bios_readonly(void)
>         make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0);
> }
> 
>+static u8 AttemptingReboot;
>+
> void
> qemu_prep_reset(void)
> {
>@@ -164,6 +166,19 @@ qemu_prep_reset(void)
>     // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a
>     // reset, so do that manually before invoking a hard reset.
>     make_bios_writable();
>+    AttemptingReboot = 1;
>+    barrier();
>+    if (!AttemptingReboot)
>+        goto fail;
>+    barrier();
>     memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET
>            , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
>+    barrier();
>+    if (AttemptingReboot)
>+        goto fail;
>+    return;
>+fail:
>+    // Attempt to restore code has failed - try to shutdown machine.
>+    dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>+    apm_shutdown();
> }
>diff --git a/src/resume.c b/src/resume.c
>index a5465d8..afeadcf 100644
>--- a/src/resume.c
>+++ b/src/resume.c
>@@ -114,19 +114,10 @@ s3_resume(void)
>     farcall16big(&br);
> }
> 
>-u8 HaveAttemptedReboot VARLOW;
>-
> // Attempt to invoke a hard-reboot.
> static void
> tryReboot(void)
> {
>-    if (HaveAttemptedReboot) {
>-        // Hard reboot has failed - try to shutdown machine.
>-        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>-        apm_shutdown();
>-    }
>-    HaveAttemptedReboot = 1;
>-
>     dprintf(1, "Attempting a hard reboot\n");
> 
>     // Setup for reset on qemu.
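
For reference, my reading of the patch above is that AttemptingReboot
lives inside the region that the memcpy restores, so a successful copy
puts the flag back to its pristine value of 0, and a flag that is still 1
afterwards means the copy did not take effect. The sketch below models
only that idea. struct bios_image, prep_reset() and the do_copy switch are
hypothetical names of mine, not SeaBIOS code, and the barrier() calls from
the patch are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Hypothetical model of the shadowed BIOS region: the flag lives inside
// the region that the memcpy overwrites.
struct bios_image {
    uint8_t attempting_reboot;
    uint8_t code[15];
};

// Returns true if the restore appears to have worked, false if the
// caller should fall back to shutdown (the "fail" label in the patch).
// do_copy = false simulates a memcpy that has no effect.
static bool prep_reset(struct bios_image *live,
                       const struct bios_image *pristine, bool do_copy)
{
    live->attempting_reboot = 1;
    if (!live->attempting_reboot)
        return false;   // the write did not stick: region is not writable
    if (do_copy)
        memcpy(live, pristine, sizeof(*live));
    // A successful copy replaces the flag with the pristine value (0).
    return live->attempting_reboot == 0;
}
```

In this model a call with do_copy set returns true and leaves the live
image identical to the pristine one, while a call without the copy returns
false, which corresponds to the shutdown path in the patch.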

