Dear Kevin,
Sorry for delayed replying. This patch works for me well. Thanks a lot!
Recently, I found another odd thing. A qemu-kvm VM is stuck at the SeaBIOS after self-rebooting many times. Analyzing the SeaBIOS log attached below, I think there maybe someting wrong from this block of code:
/src/fw/smp.c
u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1; while (cmos_smp_count != CountCPUs) asm volatile( // Release lock and allow other processors to use the stack. " movl %%esp, %1\n" " movl $0, %0\n" // Reacquire lock and take back ownership of stack. "1:rep ; nop\n" " lock btsl $0, %0\n" " jc 1b\n" : "+m" (SMPLock), "+m" (SMPStack) : : "cc", "memory"); yield();
It seems if SeaBIOS read an incorrect number sometimes from QEMU through cmos 0x5f,the SeaBIOS really may be stucked. So, i wonder what may cause this problem after a VM self-rebooting many times?
================bad SeaBIOS log=========== [2015-11-13 18:45:58] In resume (status=0) [2015-11-13 18:45:58] In 32bit resume [2015-11-13 18:45:58] Attempting a hard reboot [2015-11-13 18:46:00] SeaBIOS (version rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org) [2015-11-13 18:46:00] No Xen hypervisor found. [2015-11-13 18:46:00] Running on QEMU (i440fx) [2015-11-13 18:46:00] Running on KVM [2015-11-13 18:46:00] RamSize: 0xc0000000 [cmos] [2015-11-13 18:46:00] Relocating init from 0x000de8f0 to 0xbffaec00 (size 70464) [2015-11-13 18:46:00] Found QEMU fw_cfg [2015-11-13 18:46:00] RamBlock: addr 0x0000000000000000 len 0x00000000c0000000 [e820] [2015-11-13 18:46:00] RamBlock: addr 0x0000000100000000 len 0x0000000340000000 [e820] [2015-11-13 18:46:00] Moving pm_base to 0x600 [2015-11-13 18:46:00] boot order: [2015-11-13 18:46:00] 1: /pci@i0cf8/scsi@e/disk@0,0 [2015-11-13 18:46:00] 2: HALT [2015-11-13 18:46:00] CPU Mhz=2402 [2015-11-13 18:46:00] === PCI bus & bridge init === [2015-11-13 18:46:00] PCI: pci_bios_init_bus_rec bus = 0x0 [2015-11-13 18:46:00] === PCI device probing === [2015-11-13 18:46:00] Found 21 PCI devices (max PCI bus is 00) [2015-11-13 18:46:00] === PCI new allocation pass #1 === [2015-11-13 18:46:00] PCI: check devices [2015-11-13 18:46:00] === PCI new allocation pass #2 === [2015-11-13 18:46:00] PCI: IO: c000 - c1cf [2015-11-13 18:46:00] PCI: 32: 00000000c0000000 - 00000000fec00000 [2015-11-13 18:46:00] PCI: map device bdf=00:1f.0 bar 0, addr 0000c000, size 00000100 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 0, addr 0000c100, size 00000040 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 0, addr 0000c140, size 00000040 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:01.2 bar 4, addr 0000c180, size 00000020 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 0, addr 0000c1a0, size 00000020 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:01.1 bar 4, addr 0000c1c0, size 00000010 [io] [2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 6, addr febe0000, size 00010000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 1, addr febf0000, size 00001000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 1, addr febf1000, size 00001000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 1, addr febf2000, size 00001000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 1, addr febf3000, size 00001000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 0, addr febf4000, size 00001000 [mem] [2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 0, addr f6000000, size 02000000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 2, addr f8000000, size 01000000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:03.0 bar 2, addr f9000000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:04.0 bar 2, addr f9800000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:05.0 bar 2, addr fa000000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:06.0 bar 2, addr fa800000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:07.0 bar 2, addr fb000000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:08.0 bar 2, addr fb800000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:09.0 bar 2, addr fc000000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:0a.0 bar 2, addr fc800000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:0b.0 bar 2, addr fd000000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: map device bdf=00:0c.0 bar 2, addr fd800000, size 00800000 [prefmem] [2015-11-13 18:46:00] PCI: init bdf=00:00.0 id=8086:1237 [2015-11-13 18:46:00] PCI: init bdf=00:01.0 id=8086:7000 [2015-11-13 18:46:00] PIIX3/PIIX4 init: elcr=00 0c [2015-11-13 18:46:00] PCI: init bdf=00:01.1 id=8086:7010 [2015-11-13 18:46:00] PCI: init bdf=00:01.2 id=8086:7020 [2015-11-13 18:46:00] PCI: init bdf=00:01.3 id=8086:7113 [2015-11-13 18:46:00] Using pmtimer, ioport 0x608 [2015-11-13 18:46:00] PCI: init bdf=00:02.0 id=1013:00b8 [2015-11-13 18:46:00] PCI: init bdf=00:03.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:04.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:05.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:06.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:07.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:08.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:09.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:0a.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:0b.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:0c.0 id=15b3:1004 [2015-11-13 18:46:00] PCI: init bdf=00:0d.0 id=1af4:1003 [2015-11-13 18:46:00] PCI: init bdf=00:0e.0 id=1af4:1001 [2015-11-13 18:46:00] PCI: init bdf=00:0f.0 id=1af4:1001 [2015-11-13 18:46:00] PCI: init bdf=00:10.0 id=1af4:1110 [2015-11-13 18:46:00] PCI: init bdf=00:1f.0 id=1af4:8888 [2015-11-13 18:46:00] PCI: Using 00:02.0 for primary VGA [2015-11-13 18:46:00] handle_smp: apic_id=1 [2015-11-13 18:46:00] handle_smp: apic_id=6 [2015-11-13 18:46:00] handle_smp: apic_id=7 [2015-11-13 18:46:00] handle_smp: apic_id=3 [2015-11-13 18:46:00] handle_smp: apic_id=2 [2015-11-13 18:46:00] handle_smp: apic_id=5 [2015-11-13 18:46:00] handle_smp: apic_id=4 ========The End, nothing more======
On Mon, Nov 09, 2015 at 03:06:18PM -0500, Kevin O'Connor wrote:
On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote: I'm surprised you would see the above on a recent qemu/kvm though - as on a newer KVM I think the second reset would have to happen after HaveAttemptedReboot is set and prior to the memcpy in qemu_prep_reset() completing. Can you verify your KVM version?
I've tested on KVM-3.6 and KVM-4.1.3. On both of these versions, i can see this problem. I do like this: put a HA and a watchdog mechanism in a VM. Deliberately, let this VM lose heartbeat and don't feed dog. Then, after 2 minutes, a self-defined timeout, HA mechnism will issue a internal reboot command to the VM and watchdog mechanism will issue a "virsh reset" from the host. Then, aforementioned problem will occurs in high probability.
Ah, okay. I'm not sure what the best solution to this problem is.
After thinking about this further, I think we can move the HaveAttemptedReboot assignment after the memcpy.
The previous patch could cause corruption if the memcpy() failed. I think the new SeaBIOS patch below should be okay though.
-Kevin
commit 8a6e44ad5c953266d2339b3299f5fb4ff32c8cbb Author: Kevin O'Connor kevin@koconnor.net Date: Mon Nov 9 15:00:19 2015 -0500
resume: Make KVM soft reboot loop detection more flexible
Move the check for soft reboot loops from resume.c to shadow.c and directly check for the case where the memcpy fails. This prevents a hang if an external reboot request occurs during the BIOS memcpy.
Signed-off-by: Kevin O'Connor kevin@koconnor.net
diff --git a/src/fw/shadow.c b/src/fw/shadow.c index ee87d36..b2f2dd8 100644 --- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -156,6 +156,8 @@ make_bios_readonly(void) make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0); }
+static u8 AttemptingReboot;
void qemu_prep_reset(void) { @@ -164,6 +166,19 @@ qemu_prep_reset(void) // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a // reset, so do that manually before invoking a hard reset. make_bios_writable();
- AttemptingReboot = 1;
- barrier();
- if (!AttemptingReboot)
goto fail;
- barrier(); memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
- barrier();
- if (AttemptingReboot)
goto fail;
- return;
+fail:
- // Attempt to restore code has failed - try to shutdown machine.
- dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
- apm_shutdown();
} diff --git a/src/resume.c b/src/resume.c index a5465d8..afeadcf 100644 --- a/src/resume.c +++ b/src/resume.c @@ -114,19 +114,10 @@ s3_resume(void) farcall16big(&br); }
-u8 HaveAttemptedReboot VARLOW;
// Attempt to invoke a hard-reboot. static void tryReboot(void) {
if (HaveAttemptedReboot) {
// Hard reboot has failed - try to shutdown machine.
dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
apm_shutdown();
}
HaveAttemptedReboot = 1;
dprintf(1, "Attempting a hard reboot\n");
// Setup for reset on qemu.