On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
On 2015/11/3 14:58, Xulei (Stone, Euler) wrote:
On the qemu-kvm platform, if I reset a VM through "virsh reset" while the VM happens to be in the middle of an internal reboot, the VM can no longer be reset successfully because of reset reentrancy. I found: (1) SeaBIOS tries to shut down the VM via apm_shutdown() after the reset fails. However, apm_shutdown() does not work on the qemu-kvm platform; (2) if I add a 1s sleep in qemu_prep_reset() and then reset the VM twice in a row, the aforementioned case always happens.
So, the problem occurs when issuing a second reset before the first reset completes?
Yes. Specifically, the 2nd reset is issued after "HaveAttemptedReboot = 1" and before the memcpy in qemu_prep_reset() completes.
This patch fixes the issue by letting the VM always execute the reboot routine when reentrancy happens, instead of attempting apm_shutdown on the qemu-kvm platform.
The reason for the HaveAttemptedReboot check is to work around old versions of KVM that unexpectedly map the same memory to both 0xf0000 and 0xffff0000. So, it does not make sense to wrap the check in a !runningOnKVM() block as that disables the only reason for the check.
I'm surprised you would see the above on a recent qemu/kvm though - as on a newer KVM I think the second reset would have to happen after HaveAttemptedReboot is set and prior to the memcpy in qemu_prep_reset() completing. Can you verify your KVM version?
-Kevin
I've tested on KVM-3.6 and KVM-4.1.3, and I can see this problem on both versions. To reproduce it, I put an HA mechanism and a watchdog mechanism in a VM, then deliberately let the VM lose its heartbeat and stop feeding the watchdog. After 2 minutes (a self-defined timeout), the HA mechanism issues an internal reboot command to the VM and the watchdog mechanism issues a "virsh reset" from the host. The aforementioned problem then occurs with high probability.
Ah, okay. I'm not sure what the best solution to this problem is. We don't want to exclude KVM because the check is meant to prevent an infinite loop on older versions of KVM (which looks like a mysterious hang to users). We also don't want to be in a situation where we reboot and the memcpy hasn't fully completed, as that's likely to lead to mysterious crashes on the next boot.
-Kevin
On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
Ah, okay. I'm not sure what the best solution to this problem is.
After thinking about this further, I think we can move the HaveAttemptedReboot assignment after the memcpy. Does the SeaBIOS patch below fix things for you?
-Kevin
commit d0e9e2cca9fa6dacd2ad07081ef09c59be1ae945
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Mon Nov 9 15:00:19 2015 -0500

    resume: Don't set HaveAttemptedReboot until after internal bios memcpy

    Move the check for soft reboot loops from resume.c to shadow.c and
    only set the HaveAttemptedReboot flag after restoring the BIOS image.
    This prevents a hang if an external reboot request occurs during the
    BIOS memcpy.

    Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

diff --git a/src/fw/shadow.c b/src/fw/shadow.c
index ee87d36..f2d0d65 100644
--- a/src/fw/shadow.c
+++ b/src/fw/shadow.c
@@ -156,6 +156,8 @@ make_bios_readonly(void)
         make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0);
 }
 
+u8 HaveAttemptedReboot VARLOW;
+
 void
 qemu_prep_reset(void)
 {
@@ -163,7 +165,13 @@ qemu_prep_reset(void)
         return;
     // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a
     // reset, so do that manually before invoking a hard reset.
+    if (HaveAttemptedReboot) {
+        // Hard reboot has failed - try to shutdown machine.
+        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
+        apm_shutdown();
+    }
     make_bios_writable();
     memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET
            , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
+    HaveAttemptedReboot = 1;
 }
diff --git a/src/resume.c b/src/resume.c
index a5465d8..afeadcf 100644
--- a/src/resume.c
+++ b/src/resume.c
@@ -114,19 +114,10 @@ s3_resume(void)
     farcall16big(&br);
 }
 
-u8 HaveAttemptedReboot VARLOW;
-
 // Attempt to invoke a hard-reboot.
 static void
 tryReboot(void)
 {
-    if (HaveAttemptedReboot) {
-        // Hard reboot has failed - try to shutdown machine.
-        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
-        apm_shutdown();
-    }
-    HaveAttemptedReboot = 1;
-
     dprintf(1, "Attempting a hard reboot\n");
 
     // Setup for reset on qemu.
On Mon, Nov 09, 2015 at 03:06:18PM -0500, Kevin O'Connor wrote:
On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
Ah, okay. I'm not sure what the best solution to this problem is.
After thinking about this further, I think we can move the HaveAttemptedReboot assignment after the memcpy.
The previous patch could cause corruption if the memcpy() failed. I think the new SeaBIOS patch below should be okay though.
-Kevin
commit 8a6e44ad5c953266d2339b3299f5fb4ff32c8cbb
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Mon Nov 9 15:00:19 2015 -0500

    resume: Make KVM soft reboot loop detection more flexible

    Move the check for soft reboot loops from resume.c to shadow.c and
    directly check for the case where the memcpy fails. This prevents a
    hang if an external reboot request occurs during the BIOS memcpy.

    Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

diff --git a/src/fw/shadow.c b/src/fw/shadow.c
index ee87d36..b2f2dd8 100644
--- a/src/fw/shadow.c
+++ b/src/fw/shadow.c
@@ -156,6 +156,8 @@ make_bios_readonly(void)
         make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0);
 }
 
+static u8 AttemptingReboot;
+
 void
 qemu_prep_reset(void)
 {
@@ -164,6 +166,19 @@ qemu_prep_reset(void)
     // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a
     // reset, so do that manually before invoking a hard reset.
     make_bios_writable();
+    AttemptingReboot = 1;
+    barrier();
+    if (!AttemptingReboot)
+        goto fail;
+    barrier();
     memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET
            , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
+    barrier();
+    if (AttemptingReboot)
+        goto fail;
+    return;
+fail:
+    // Attempt to restore code has failed - try to shutdown machine.
+    dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
+    apm_shutdown();
 }
diff --git a/src/resume.c b/src/resume.c
index a5465d8..afeadcf 100644
--- a/src/resume.c
+++ b/src/resume.c
@@ -114,19 +114,10 @@ s3_resume(void)
     farcall16big(&br);
 }
 
-u8 HaveAttemptedReboot VARLOW;
-
 // Attempt to invoke a hard-reboot.
 static void
 tryReboot(void)
 {
-    if (HaveAttemptedReboot) {
-        // Hard reboot has failed - try to shutdown machine.
-        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
-        apm_shutdown();
-    }
-    HaveAttemptedReboot = 1;
-
     dprintf(1, "Attempting a hard reboot\n");
 
     // Setup for reset on qemu.
Dear Kevin,
Sorry for the delayed reply. This patch works well for me. Thanks a lot!
Recently, I found another odd thing. A qemu-kvm VM gets stuck in SeaBIOS after self-rebooting many times. From the SeaBIOS log attached below, I think there may be something wrong in this block of code:
/src/fw/smp.c
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (cmos_smp_count != CountCPUs)
        asm volatile(
            // Release lock and allow other processors to use the stack.
            "  movl %%esp, %1\n"
            "  movl $0, %0\n"
            // Reacquire lock and take back ownership of stack.
            "1:rep ; nop\n"
            "  lock btsl $0, %0\n"
            "  jc 1b\n"
            : "+m" (SMPLock), "+m" (SMPStack)
            : : "cc", "memory");
    yield();
It seems that if SeaBIOS sometimes reads an incorrect number from QEMU through CMOS register 0x5f, SeaBIOS can really get stuck in this loop. So I wonder what could cause this problem after a VM self-reboots many times?
================bad SeaBIOS log===========
[2015-11-13 18:45:58] In resume (status=0)
[2015-11-13 18:45:58] In 32bit resume
[2015-11-13 18:45:58] Attempting a hard reboot
[2015-11-13 18:46:00] SeaBIOS (version rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org)
[2015-11-13 18:46:00] No Xen hypervisor found.
[2015-11-13 18:46:00] Running on QEMU (i440fx)
[2015-11-13 18:46:00] Running on KVM
[2015-11-13 18:46:00] RamSize: 0xc0000000 [cmos]
[2015-11-13 18:46:00] Relocating init from 0x000de8f0 to 0xbffaec00 (size 70464)
[2015-11-13 18:46:00] Found QEMU fw_cfg
[2015-11-13 18:46:00] RamBlock: addr 0x0000000000000000 len 0x00000000c0000000 [e820]
[2015-11-13 18:46:00] RamBlock: addr 0x0000000100000000 len 0x0000000340000000 [e820]
[2015-11-13 18:46:00] Moving pm_base to 0x600
[2015-11-13 18:46:00] boot order:
[2015-11-13 18:46:00] 1: /pci@i0cf8/scsi@e/disk@0,0
[2015-11-13 18:46:00] 2: HALT
[2015-11-13 18:46:00] CPU Mhz=2402
[2015-11-13 18:46:00] === PCI bus & bridge init ===
[2015-11-13 18:46:00] PCI: pci_bios_init_bus_rec bus = 0x0
[2015-11-13 18:46:00] === PCI device probing ===
[2015-11-13 18:46:00] Found 21 PCI devices (max PCI bus is 00)
[2015-11-13 18:46:00] === PCI new allocation pass #1 ===
[2015-11-13 18:46:00] PCI: check devices
[2015-11-13 18:46:00] === PCI new allocation pass #2 ===
[2015-11-13 18:46:00] PCI: IO: c000 - c1cf
[2015-11-13 18:46:00] PCI: 32: 00000000c0000000 - 00000000fec00000
[2015-11-13 18:46:00] PCI: map device bdf=00:1f.0 bar 0, addr 0000c000, size 00000100 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 0, addr 0000c100, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 0, addr 0000c140, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.2 bar 4, addr 0000c180, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 0, addr 0000c1a0, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.1 bar 4, addr 0000c1c0, size 00000010 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 6, addr febe0000, size 00010000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 1, addr febf0000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 1, addr febf1000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 1, addr febf2000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 1, addr febf3000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 0, addr febf4000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 0, addr f6000000, size 02000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 2, addr f8000000, size 01000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:03.0 bar 2, addr f9000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:04.0 bar 2, addr f9800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:05.0 bar 2, addr fa000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:06.0 bar 2, addr fa800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:07.0 bar 2, addr fb000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:08.0 bar 2, addr fb800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:09.0 bar 2, addr fc000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0a.0 bar 2, addr fc800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0b.0 bar 2, addr fd000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0c.0 bar 2, addr fd800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: init bdf=00:00.0 id=8086:1237
[2015-11-13 18:46:00] PCI: init bdf=00:01.0 id=8086:7000
[2015-11-13 18:46:00] PIIX3/PIIX4 init: elcr=00 0c
[2015-11-13 18:46:00] PCI: init bdf=00:01.1 id=8086:7010
[2015-11-13 18:46:00] PCI: init bdf=00:01.2 id=8086:7020
[2015-11-13 18:46:00] PCI: init bdf=00:01.3 id=8086:7113
[2015-11-13 18:46:00] Using pmtimer, ioport 0x608
[2015-11-13 18:46:00] PCI: init bdf=00:02.0 id=1013:00b8
[2015-11-13 18:46:00] PCI: init bdf=00:03.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:04.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:05.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:06.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:07.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:08.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:09.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0a.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0b.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0c.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0d.0 id=1af4:1003
[2015-11-13 18:46:00] PCI: init bdf=00:0e.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:0f.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:10.0 id=1af4:1110
[2015-11-13 18:46:00] PCI: init bdf=00:1f.0 id=1af4:8888
[2015-11-13 18:46:00] PCI: Using 00:02.0 for primary VGA
[2015-11-13 18:46:00] handle_smp: apic_id=1
[2015-11-13 18:46:00] handle_smp: apic_id=6
[2015-11-13 18:46:00] handle_smp: apic_id=7
[2015-11-13 18:46:00] handle_smp: apic_id=3
[2015-11-13 18:46:00] handle_smp: apic_id=2
[2015-11-13 18:46:00] handle_smp: apic_id=5
[2015-11-13 18:46:00] handle_smp: apic_id=4
========The End, nothing more======