Hi Amos,
On 04/21/15 01:31, Amos Kong wrote:
Hi Kevin,
When I use old SeaBIOS in a stable Linux release, some bootable devices (two IDE disks) are lost when I try to restart the guest with Ctrl+Alt+Delete during the boot stage.
Related Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
I found an upstream commit [1] that fixes this bug, but when I backport it to old SeaBIOS, the guest shuts down when I try to restart it with Ctrl+Alt+Delete during the boot stage.
Kevin, can you help explain this statement: "Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000"? Is it a kvm (userspace, QEMU) bug?
If it's a qemu-kvm bug, I should also fix this BZ in the old stable release.
[1] ===========================================
commit 244caf86f11f5f65d166d91704f64cb673167abc
Author: Kevin O'Connor <kevin@koconnor.net>
Date:   Wed Sep 15 21:48:16 2010 -0400
    Try to hard-reboot on rerun of post even on emulators.

    Extend the hard-reboot logic to qemu and kvm. On qemu, a reboot will not reset the memory settings for 0xc0000-0xfffff, so copy that memory area manually before rebooting. Unfortunately, kvm does not keep a pristine copy of the BIOS at 0xffff0000, so detect that case and shutdown the machine.
Two backport dependencies:
  [PATCH] Try to hard-reboot processor on rerun of post under coreboot.
  [PATCH] Don't do shadow copying of optionroms when CONFIG_OPTIONROMS_DEPLOYED.
- Please open https://bugzilla.redhat.com/show_bug.cgi?id=1027565 (it's a public bug).
- Please locate the "Unwrap comments" link, and click it.
- Then go to comment #20 in the bug; it's simplest to click this link: https://bugzilla.redhat.com/show_bug.cgi?id=1027565#c20
The diagrams in that comment explain the difference between the RHEL-6 and RHEL-7 memory maps that the corresponding QEMU versions provide. Importantly, as Kevin explained too:
(a) in RHEL-6 qemu, the PAM registers that control *where* reads and writes to the region [0xc0000, 0xfffff] end up are not implemented;
(b) in RHEL-6 qemu, the "pc.ram" RAMBlock that provides the guest's "main RAM" is *hidden* by the "pc.bios" RAMBlock in the [0xe0000, 0xfffff] region;
(c) in addition, the exact same "pc.bios" RAMBlock is visible at [0xfffe0000, 0xffffffff].
RHEL-7 is different. The PAM registers *are* implemented (well, mostly), and some of these registers control whether [0xe0000, 0xfffff] shows a window into RAM or into PCI address space.
If the registers are set so that reads go to PCI address space instead of RAM (= the "pc.ram" RAMBlock; grep the source for "pam-pci"), then the [0xe0000, 0xfffff] range ends up showing the "isa-bios" region. (Use the "info mtree" monitor command to see this.) And "isa-bios" is again an alias, a window into the "pc.bios" RAMBlock, which is visible in PCI address space at 0xfffe0000.
In summary,
- on RHEL-6 you have no working PAM registers, and [0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff] always show the same contents;
- on RHEL-7 you have (mostly) working PAM registers, and those can change what [0xc0000, 0xfffff] shows. Depending on the PAM settings, [0xe0000, 0xfffff] can be a window into RAM, or a window into "pc.bios", which resides at 0xfffe0000.
Now, different versions of SeaBIOS handle these environments differently. The topic of bug 1027565 was the following situation:
Suppose that you boot a virtual machine on a RHEL-6 host (which implies RHEL-6, i.e. PAM-less, qemu *and* RHEL-6 SeaBIOS), then migrate it to a RHEL-7 host (which implies RHEL-7, PAM-capable, qemu, *but* the same SeaBIOS, because the BIOS comes along with the migrated VM), and *then* reboot the VM on the target (RHEL-7) host. This means that RHEL-6 SeaBIOS reboots in a RHEL-7 (PAM-capable) VM.
This was a problem because:
- when RHEL-6 SeaBIOS originally started on the RHEL-6 host, it modified the "pc.bios" RAMBlock, and that modification was visible in both [0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff];
- when this guest was migrated to a RHEL-7 host and then rebooted, those original modifications were *only* visible at [0xfffe0000, 0xffffffff], because the RHEL-7-only PAM settings showed "pc.ram" at [0xe0000, 0xfffff]. This tripped up RHEL-6 SeaBIOS at reboot, because it expected to see its original modifications at [0xe0000, 0xfffff]. (The PAM registers are not reset on reset.)
The kludge we implemented for this was to manually re-shadow the BIOS from "pc.bios" to "pc.ram" on the RHEL-7 host when the machine type implied RHEL-6.
Okay, so how does this relate to your question? The SeaBIOS commit you reference, 244caf86, makes several statements:
(1) "On qemu, a reboot will not reset the memory settings for 0xc0000-0xfffff"
(2) "kvm does not keep a pristine copy of the BIOS at 0xffff0000"
Claim (1) remains true on RHEL-7 qemu as well.
Regarding claim (2), you can see that this SeaBIOS commit dates back to the "RHEL-6 era", because the claim is *no longer true* for RHEL-7 qemu. On RHEL-7 qemu, the PAM registers work (mostly), and the copy of the BIOS at [0xfffe0000, 0xffffffff] *is* pristine. The function old_pc_system_rom_init() in "hw/i386/pc_sysfw.c" makes that copy read-only (both directly and when seen via the isa-bios alias, depending on the PAM registers):
    if (!isapc_ram_fw) {
        memory_region_set_readonly(bios, true);
    }
    ...
    if (!isapc_ram_fw) {
        memory_region_set_readonly(isa_bios, true);
    }
This makes a verbatim backport of this upstream SeaBIOS commit inappropriate for a RHEL-7 qemu (and that situation emerges when a VM is migrated from a RHEL-6 to a RHEL-7 host).
As to how you can fix https://bugzilla.redhat.com/show_bug.cgi?id=1129549 (which is the motivation for this entire discussion):
- You need to identify *exactly which variable* stores the list of bootable devices. Is it a SeaBIOS variable? A field in the BDA (BIOS Data Area)? Something else? The bug is that this variable, wherever it lives, is not reset on reboot.
(Kevin, can you perhaps help with this question? Thank you.)
- Once you have identified the variable or field, you should figure out its lifecycle: how it is affected by all of the above.
- The fix you come up with may be a backport from upstream SeaBIOS, but it might also have to be downstream-only (i.e. divergent). In particular, whatever fix you find for RHEL-6 SeaBIOS must also work when such a VM is migrated to RHEL-7 and rebooted there.
As far as I understand SeaBIOS commit 244caf86, it tries to implement the following:
- it knows that qemu does not reset stuff "hard enough" on reset, so it tries to make up for it manually;
- it knows that "making up for it manually" is not possible on KVM at all, so it shuts down instead.
Implemented as:
The variable called HaveRunPost lives somewhere in the low [0xe0000, 0xfffff] range. When the guest boots for the first time, the initial value of HaveRunPost is 0, so _start() does not invoke tryReboot(), and HaveRunPost gets set to 1.
After a reboot, HaveRunPost is still 1, and this happens:
    _start()
      tryReboot()
        qemu_prep_reset()
          make_bios_writable()
          copy from [0xfffe0000, 0xffffffff] to [0xe0000, 0xfffff]
        if (HaveRunPost) --> apm_shutdown()
        i8042_reboot()
The "copy" operation above seeks to *restore* (part of) the BIOS image at [0xe0000, 0xfffff], including its variable HaveRunPost, from the "pristine" copy at [0xfffe0000, 0xffffffff]. If that succeeds, HaveRunPost is implicitly cleared by the copy, and then we proceed to a hard reboot. And this hard reboot is what restores the boot device list, I guess.
If HaveRunPost does not go from 1 to 0 due to the copy operation, then SeaBIOS has no way to restore itself to the "pristine" image -- there is no pristine image. (And that's the case on RHEL-6, because when you modify HaveRunPost in the [0xe0000, 0xfffff] range, that is immediately reflected in [0xfffe0000, 0xffffffff] too, so the copy is actually a no-op on RHEL-6. That's why you get the shutdown on RHEL-6.)
... I think that you cannot backport 244caf86 to RHEL-6 SeaBIOS. RHEL-6 has exactly the circumstances in which SeaBIOS has no chance at a "real" hard reset: there is no "pristine" copy to restore stuff from. The solution for that would be fixing RHEL-6 qemu (see your own https://bugzilla.redhat.com/show_bug.cgi?id=1129549#c15), but fixing the PAM registers and all the memory stuff in RHEL-6 is *completely* out of scope.
So here's what you can do in RHEL-6 SeaBIOS (as I said above):
- identify the exact variables / BDA fields that control the boot device list, and clear them manually on *each* SeaBIOS startup. That will cover both cold boots (where it amounts to a no-op) and warm boots (where it fixes your BZ).
(Upstream 244caf86 would do that clearing with the copy operation, but that copy operation will *never* work on RHEL-6.)
- Then, verify that this manual, downstream-only fix works also after the VM is migrated to a RHEL-7 host.
... Apologies for the very long and messy email, but I hope it helps us all understand the issue.
Thanks
Laszlo