Hi Kevin,
When I use an old SeaBIOS in some stable Linux releases, some bootable devices (2 IDE disks) are lost when I restart the guest with Ctrl+Alt+Delete during the boot stage.
Related Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
I found an upstream commit [1] that fixes this bug, but when I backport the patch to the old SeaBIOS, the guest shuts down when I restart it with Ctrl+Alt+Delete during the boot stage.
Kevin, can you help explain this statement: "Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000"? Is it a kvm (userspace, QEMU) bug?
If it's a qemu-kvm bug, I should also fix this BZ in the old stable release.
[1] ===========================================
commit 244caf86f11f5f65d166d91704f64cb673167abc
Author: Kevin O'Connor kevin@koconnor.net
Date: Wed Sep 15 21:48:16 2010 -0400

    Try to hard-reboot on rerun of post even on emulators.

    Extend the hard-reboot logic to qemu and kvm. On qemu, a reboot will not reset the memory settings for 0xc0000-0xfffff, so copy that memory area manually before rebooting. Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000, so detect that case and shutdown the machine.

Two backport dependencies:
[PATCH] Try to hard-reboot processor on rerun of post under coreboot.
[PATCH] Don't do shadow copying of optionroms when CONFIG_OPTIONROMS_DEPLOYED.
On Tue, Apr 21, 2015 at 07:31:36AM +0800, Amos Kong wrote:
> Hi Kevin,
> When I use an old SeaBIOS in some stable Linux releases, some bootable devices (2 IDE disks) are lost when I restart the guest with Ctrl+Alt+Delete during the boot stage.
> Related Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
> I found an upstream commit [1] that fixes this bug, but when I backport the patch to the old SeaBIOS, the guest shuts down when I restart it with Ctrl+Alt+Delete during the boot stage.
> Kevin, can you help explain this statement: "Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000"? Is it a kvm (userspace, QEMU) bug?
> If it's a qemu-kvm bug, I should also fix this BZ in the old stable release.
Yes, my recollection was that it was a kvm bug. It was fixed in kvm after the above was committed to seabios - I don't know what the commit id was.
On a typical x86 machine, the BIOS image is located in read-only memory at 0xffff0000. The chipsets typically also support shadowing that image to ram (or as a read-only copy) at 0xf0000. However, neither qemu nor kvm fully support all the shadowing capabilities of a typical x86 chipset. So, seabios will copy itself from the image at 0xffff0000 to ram at 0xf0000. Unfortunately, kvm had a bug where the resulting ram image at 0xf0000 was actually mapped to the same ram at 0xffff0000 and changes to the memory copy at 0xf0000 would also change the copy at 0xffff0000. This made it impossible for reboots to redeploy the original pristine copy of seabios.
-Kevin
On 21/04/2015 02:29, Kevin O'Connor wrote:
> On a typical x86 machine, the BIOS image is located in read-only memory at 0xffff0000. The chipsets typically also support shadowing that image to ram (or as a read-only copy) at 0xf0000. However, neither qemu nor kvm fully support all the shadowing capabilities of a typical x86 chipset. So, seabios will copy itself from the image at 0xffff0000 to ram at 0xf0000. Unfortunately, kvm had a bug where the resulting ram image at 0xf0000 was actually mapped to the same ram at 0xffff0000 and changes to the memory copy at 0xf0000 would also change the copy at 0xffff0000. This made it impossible for reboots to redeploy the original pristine copy of seabios.
Nowadays QEMU and KVM can emulate this correctly, but any version of QEMU before the introduction of the memory API (before 1.1 roughly) was not able to support this.
Paolo
Hi Amos,
On 04/21/15 01:31, Amos Kong wrote:
> Hi Kevin,
> When I use an old SeaBIOS in some stable Linux releases, some bootable devices (2 IDE disks) are lost when I restart the guest with Ctrl+Alt+Delete during the boot stage.
> Related Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
> I found an upstream commit [1] that fixes this bug, but when I backport the patch to the old SeaBIOS, the guest shuts down when I restart it with Ctrl+Alt+Delete during the boot stage.
> Kevin, can you help explain this statement: "Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000"? Is it a kvm (userspace, QEMU) bug?
> If it's a qemu-kvm bug, I should also fix this BZ in the old stable release.
> [1] ===========================================
> commit 244caf86f11f5f65d166d91704f64cb673167abc
> Author: Kevin O'Connor kevin@koconnor.net
> Date: Wed Sep 15 21:48:16 2010 -0400
>
>     Try to hard-reboot on rerun of post even on emulators.
>
>     Extend the hard-reboot logic to qemu and kvm. On qemu, a reboot will not reset the memory settings for 0xc0000-0xfffff, so copy that memory area manually before rebooting. Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000, so detect that case and shutdown the machine.
>
> Two backport dependencies:
> [PATCH] Try to hard-reboot processor on rerun of post under coreboot.
> [PATCH] Don't do shadow copying of optionroms when CONFIG_OPTIONROMS_DEPLOYED.
- Please open https://bugzilla.redhat.com/show_bug.cgi?id=1027565 (it's a public bug)
- Please locate the "Unwrap comments" link, and click it
- Then go to comment #20 in the bug -- it's simplest to click this here: https://bugzilla.redhat.com/show_bug.cgi?id=1027565#c20
The diagrams in that comment explain the difference between the RHEL-6 and RHEL-7 memory maps that the corresponding QEMU versions provide. Importantly, as Kevin explained too:
(a) in RHEL-6 qemu, the PAM registers that control *where* reads and writes to the region [0xc0000, 0xfffff] end up are not implemented
(b) in RHEL-6 qemu, the "pc.ram" RAMBlock that provides the guest's "main RAM" is *hidden* by the "pc.bios" RAMBlock in the [0xe0000, 0xfffff] region.
(c) in addition, the exact same "pc.bios" RAMBlock is visible at [0xfffe0000, 0xffffffff]
RHEL-7 is different. The PAM registers *are* implemented (well, mostly), and whether [0xe0000, 0xfffff] shows a window into RAM or PCI address space, that is controlled by some of these registers.
If the registers are set so that reads go to the PCI address space instead of RAM (= the "pc.ram" RAMBlock; grep the source for "pam-pci"), then the [0xe0000, 0xfffff] range ends up showing the "isa-bios" range. You can check this with the "info mtree" monitor command. And "isa-bios" is again an alias, a window into the "pc.bios" RAMBlock, which is visible in PCI address space at 0xfffe0000.
In summary,
- on RHEL-6 you have no working PAM registers, and [0xc0000, 0xfffff] and [0xfffe0000, 0xffffffff] always show the same.
- on RHEL-7, you have (mostly) working PAM registers, and those can change what [0xc0000, 0xfffff] shows. Dependent on the PAM settings, this range can be a window into RAM, or it can be a window into "pc.bios", which resides at 0xfffe0000.
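To make the summary concrete, here is a toy model in C of how a read through the low window could be routed in the two environments. This is only a sketch under simplifying assumptions, not QEMU code: the struct, the field names, the tiny buffer size, and the single read-enable flag are all made up for illustration (the real PAM registers have separate read and write controls per sub-range).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, NOT QEMU code: all names and sizes here are hypothetical. */
enum { WINDOW_SIZE = 16 };

struct toy_machine {
    uint8_t pc_bios[WINDOW_SIZE]; /* stands in for the "pc.bios" RAMBlock */
    uint8_t pc_ram[WINDOW_SIZE];  /* stands in for "pc.ram" under the low window */
    bool has_pam;                 /* false: RHEL-6-like, true: RHEL-7-like */
    bool pam_reads_ram;           /* only consulted when has_pam is true */
};

/* What a read at offset `off` of the low window returns in this model. */
static uint8_t low_window_read(const struct toy_machine *m, size_t off)
{
    if (m->has_pam && m->pam_reads_ram)
        return m->pc_ram[off];    /* window into RAM */
    return m->pc_bios[off];       /* window into "pc.bios" */
}

/* The high window always shows "pc.bios" in this model. */
static uint8_t high_window_read(const struct toy_machine *m, size_t off)
{
    return m->pc_bios[off];
}
```

With has_pam false, the low and high reads can never differ; with has_pam true, they differ exactly when the (hypothetical) pam_reads_ram flag selects RAM and the two buffers hold different bytes.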
Now, different versions of SeaBIOS handle these environments differently. The topic of bug 1027565 was the following situation:
suppose that you boot a virtual machine on a RHEL-6 host (which implies RHEL-6, ie. PAM-less qemu, *and* RHEL-6 SeaBIOS), then migrate it to RHEL-7 (which implies RHEL-7, PAM-capable, qemu, *but* SeaBIOS stays the same, as it comes with migration), and *then* you reboot the VM on the target (RHEL-7) host -- this means that RHEL-6 SeaBIOS will reboot in a RHEL-7 (PAM-capable) VM.
This was a problem because:
- when RHEL-6 SeaBIOS originally started on the RHEL-6 host, it modified the "pc.bios" RAMBlock, and that modification was visible in both [0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff].
- When this guest was migrated to a RHEL-7 host, then rebooted, then (due to the RHEL-7-only PAM settings showing "pc.ram" at [0xe0000, 0xfffff]) those original modifications were *only* visible at [0xfffe0000, 0xffffffff]. This tripped up RHEL-6 SeaBIOS at reboot, because it expected to see (on reboot) its original modifications at [0xe0000, 0xfffff]. (The PAM registers are not re-set on reset.)
The kludge we implemented for this was to manually re-shadow the BIOS from "pc.bios" to "pc.ram" on the RHEL-7 host, when the machine type implied RHEL-6.
Okay, so how does this relate to your question? The SeaBIOS commit you reference, 244caf86, makes several statements:
(1) "On qemu, a reboot will not reset the memory settings for 0xc0000-0xfffff"
(2) "kvm does not keep a pristine copy of the BIOS at 0xffff0000"
Claim (1) remains true on RHEL-7 qemu as well.
Wrt. claim (2), you can see that this SeaBIOS commit dates back to the "RHEL-6 era", because it is *no longer true* for RHEL-7 qemu. On RHEL-7 qemu, the PAM registers work (mostly), and the copy of the BIOS at [0xfffe0000, 0xffffffff] *is* pristine. The function old_pc_system_rom_init() in "hw/i386/pc_sysfw.c" makes it read-only (both directly and also when seen via the isa-bios alias, dependent on the PAM registers):
    if (!isapc_ram_fw) {
        memory_region_set_readonly(bios, true);
    }
    ...
    if (!isapc_ram_fw) {
        memory_region_set_readonly(isa_bios, true);
    }
This makes a verbatim backport of this upstream SeaBIOS commit inappropriate for a RHEL-7 qemu (and that situation emerges when a VM is migrated from a RHEL-6 to a RHEL-7 host).
As to how you can fix https://bugzilla.redhat.com/show_bug.cgi?id=1129549 (which is the motivation for this entire discussion):
- You need to identify *what variable exactly* stores the list of bootable devices. Is that a SeaBIOS variable? Is it some field in the BDA (BIOS Data Area)? Something else? Because the bug is that this variable, wherever it lives, is not re-set on reboot.
(Kevin, can you perhaps help with this question? Thank you.)
- Once you identified the variable or field, you should figure out its lifecycle -- how it is affected by all of the above.
- The fix you come up with may be a backport from upstream SeaBIOS, but it also might have to be downstream only (ie. divergent). In particular whatever fix you find for RHEL-6 SeaBIOS, it must also work when such a VM is migrated to RHEL-7, and rebooted there.
As far as I understand SeaBIOS commit 244caf86, it tries to implement the following:
- it knows that qemu does not reset stuff "hard enough" on reset, so it tries to make up for it, manually
- it knows that "making up for it manually" is not possible on KVM at all, so it shuts down instead.
Implemented as:
the variable called HaveRunPost lives somewhere in the low [0xe0000, 0xfffff] range. When the guest is booted first, the initial value of HaveRunPost is 0, so _start() does not invoke tryReboot(), and HaveRunPost gets set to 1.
After a reboot, HaveRunPost is still 1, and this happens:
  _start()
    tryReboot()
      qemu_prep_reset()
        make_bios_writable()
        copy from [0xfffe0000, 0xffffffff] to [0xe0000, 0xfffff]
      if (HaveRunPost) --> apm_shutdown()
      i8042_reboot()
The "copy" operation in the above seeks to *restore* (part of) the BIOS image at [0xe0000, 0xfffff], including its variable HaveRunPost, from the "pristine" copy at [0xfffe0000, 0xffffffff]. If that is successful, then HaveRunPost gets implicitly cleared (due to the copy operation), and then we proceed to a hard reboot. And this hard reboot is what restores the boot device list, I guess.
If HaveRunPost does not go from 1 to 0 due to the copy operation, then SeaBIOS has no way to restore itself to the "pristine" image -- there is no pristine image. (And that's the case on RHEL-6, because when you modify HaveRunPost in the [0xe0000, 0xfffff] range, that is immediately reflected in [0xfffe0000, 0xffffffff] too, so the copy is actually a no-op on RHEL-6. That's why you get the shutdown on RHEL-6.)
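The decision described above can be sketched in C as follows. This is a model under stated assumptions, not the SeaBIOS source: the function name, the enum, the 8-byte image size, and placing the flag at offset 0 are all hypothetical, and two plain buffers stand in for the low and high BIOS ranges.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a tiny "image" with the HaveRunPost flag at offset 0. */
enum { IMAGE_SIZE = 8, HAVE_RUN_POST_OFF = 0 };

enum reboot_action { ACTION_HARD_REBOOT, ACTION_SHUTDOWN };

/*
 * low:  stands in for the image seen at [0xe0000, 0xfffff]
 * high: stands in for the image seen at [0xfffe0000, 0xffffffff]
 */
static enum reboot_action try_reboot_model(uint8_t *low, const uint8_t *high)
{
    /* memmove, because low and high may be the very same memory (aliased). */
    memmove(low, high, IMAGE_SIZE);

    /* If the flag survived the copy, there was no pristine image to restore. */
    return low[HAVE_RUN_POST_OFF] ? ACTION_SHUTDOWN : ACTION_HARD_REBOOT;
}
```

Calling it with two distinct buffers (a pristine high image whose flag is 0) clears the flag and yields ACTION_HARD_REBOOT. Calling it with the same buffer for both arguments, which is the RHEL-6 situation, makes the copy a no-op and yields ACTION_SHUTDOWN.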
... I think that you cannot backport 244caf86 to RHEL-6 SeaBIOS. Namely, RHEL-6 has exactly those circumstances where SeaBIOS has no chance at a "real" hard reset; there is no "pristine" copy to restore stuff from. The solution for that would be fixing RHEL-6 qemu (see your own https://bugzilla.redhat.com/show_bug.cgi?id=1129549#c15), but fixing the PAM registers and all the memory stuff in RHEL-6 is *completely* out of scope.
So here's what you can do in RHEL-6 SeaBIOS (as I said above):
- identify the exact variables / BDA fields that control the boot device list, and clear them manually on *each* SeaBIOS startup. That will cover both cold boots (when it will amount to a no-op) and warm boots (when it will fix your BZ).
(Upstream 244caf86 would do that clearing with the copy operation, but that copy operation will *never* work on RHEL-6.)
- Then, verify that this manual, downstream-only fix works also after the VM is migrated to a RHEL-7 host.
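As a rough illustration of the "clear manually on each startup" idea, consider the C sketch below. The thread does not identify the real variables, so struct boot_state, BootState, and both functions are hypothetical stand-ins; the only point is that an unconditional reset at the top of POST makes a warm boot start from the same state as a cold boot.

```c
#include <string.h>

/* Hypothetical stand-ins: the real SeaBIOS variables are not identified here. */
enum { MAX_DRIVES = 4 };

struct boot_state {
    int drive_count;
    int boot_order[MAX_DRIVES];
};

static struct boot_state BootState;

/* Clear the state unconditionally, before any device detection runs. */
static void reset_boot_state(void)
{
    memset(&BootState, 0, sizeof(BootState));
}

/* Model of one POST pass: reset first, then register `detected` drives. */
static int run_post_model(int detected)
{
    reset_boot_state();
    for (int i = 0; i < detected && BootState.drive_count < MAX_DRIVES; i++)
        BootState.boot_order[BootState.drive_count++] = i;
    return BootState.drive_count;
}
```

On the first call the reset is a no-op (static storage already starts zeroed), which matches the cold-boot case; on later calls it discards whatever the previous pass left behind.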
... Apologies for the very long and messy email, but I hope it helps us all understand the issue.
Thanks
Laszlo
On Tue, Apr 21, 2015 at 10:37:49AM +0200, Laszlo Ersek wrote:
> As to how you can fix https://bugzilla.redhat.com/show_bug.cgi?id=1129549 (which is the motivation for this entire discussion):
> You need to identify *what variable exactly* stores the list of bootable devices. Is that a SeaBIOS variable? Is it some field in the BDA (BIOS Data Area)? Something else? Because the bug is that this variable, wherever it lives, is not re-set on reboot.
> (Kevin, can you perhaps help with this question? Thank you.)
Prior to 244caf86 SeaBIOS attempted to manually reset all its internal global variables on each boot. This got too messy and error-prone, so 244caf86 was implemented.
As Laszlo states, my recollection is that one should be able to continue to manually reset variables on each boot prior to 244caf86. It should not (according to my recollection) be necessary to do anything with the BDA or EBDA because SeaBIOS used to reset that on every boot anyway. Unfortunately, I don't know which global variable not being reset would lead to the bugzilla entry above.
-Kevin
On 04/21/15 17:06, Kevin O'Connor wrote:
> On Tue, Apr 21, 2015 at 10:37:49AM +0200, Laszlo Ersek wrote:
> > As to how you can fix https://bugzilla.redhat.com/show_bug.cgi?id=1129549 (which is the motivation for this entire discussion):
> > You need to identify *what variable exactly* stores the list of bootable devices. Is that a SeaBIOS variable? Is it some field in the BDA (BIOS Data Area)? Something else? Because the bug is that this variable, wherever it lives, is not re-set on reboot.
> > (Kevin, can you perhaps help with this question? Thank you.)
> Prior to 244caf86 SeaBIOS attempted to manually reset all its internal global variables on each boot. This got too messy and error-prone, so 244caf86 was implemented.
> As Laszlo states, my recollection is that one should be able to continue to manually reset variables on each boot prior to 244caf86. It should not (according to my recollection) be necessary to do anything with the BDA or EBDA because SeaBIOS used to reset that on every boot anyway. Unfortunately, I don't know which global variable not being reset would lead to the bugzilla entry above.
Thank you, Kevin, for reading through all this, and confirming that manually resetting some variables is technically viable (maybe not really nice, but with RHEL-6 "nice" is not the primary objective...)
I think Amos should be able to locate the exact variables by generously sprinkling the code with debug messages (or just enabling the present ones), and comparing the debug logs of successful vs. failed runs. ("Differential diagnosis", hat tip to House MD. :))
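For the comparison step, a small helper along these lines might be enough. It is a sketch only: first_divergence() is a hypothetical name, and it assumes both debug logs have already been split into arrays of lines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first line at which the two logs differ.
 * If one log is a prefix of the other (or they are identical), return
 * the length of the shorter one.
 */
static size_t first_divergence(const char *const *good, size_t ngood,
                               const char *const *bad, size_t nbad)
{
    size_t n = ngood < nbad ? ngood : nbad;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(good[i], bad[i]) != 0)
            return i;
    }
    return n;
}
```

The line at the returned index in the failing log is then the place to start reading the code; anything before it behaved the same in both runs.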
Thanks!
Laszlo