[SeaBIOS] about [SeaBIOS PATCH] Try to hard-reboot on rerun of post even on emulators.

Tue Apr 21 10:37:49 CEST 2015

Hi Amos,

On 04/21/15 01:31, Amos Kong wrote:
> Hi Kevin,
> 
> When I use old seabios in some stable linux release, some bootable
> devices (2 ide disks) would be lost when I try to restart guest by
> Ctrl+Alt+Delete during boot stage.
> 
> Related Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1129549
> 
> I found an upstream commit [1] fixed this bug, but when I backport
> this patch to old seabios, guest will shutdown when I try to restart
> by Ctrl+Alt+Delete during boot stage.
> 
> Kevin, can you help to describe that:
> "Unfortunately, kvm does not keep a  pristine copy of the BIOS at 0xffff0000"
> It's a kvm (userspace, QEMU) bug?
> 
> If it's a qemu-kvm bug, I should also fix this bz in old stable release.
> 
> 
> [1] ===========================================
> commit 244caf86f11f5f65d166d91704f64cb673167abc
> Author: Kevin O'Connor <kevin at koconnor.net>
> Date:   Wed Sep 15 21:48:16 2010 -0400
> 
>     Try to hard-reboot on rerun of post even on emulators.
>     
>     Extend the hard-reboot logic to qemu and kvm.  On qemu, a reboot will
>     not reset the memory settings for 0xc0000-0xfffff, so copy that memory
>     area manually before rebooting.  Unfortunately, kvm does not keep a
>     pristine copy of the BIOS at 0xffff0000, so detect that case and
>     shutdown the machine.
> 
> Two backport dependencies:
>   [PATCH] Try to hard-reboot processor on rerun of post under coreboot.
>   [PATCH] Don't do shadow copying of optionroms when CONFIG_OPTIONROMS_DEPLOYED.
> 

- Please open <https://bugzilla.redhat.com/show_bug.cgi?id=1027565>
  (it's a public bug)
- Please locate the "Unwrap comments" link, and click it
- Then go to comment #20 in the bug -- it's simplest to click this here:
  <https://bugzilla.redhat.com/show_bug.cgi?id=1027565#c20>

The diagrams in that comment explain the difference between the RHEL-6
and RHEL-7 memory maps that the corresponding QEMU versions provide.
Importantly, as Kevin explained too:

(a) in RHEL-6 qemu, the PAM registers that control *where* reads and
    writes to the region [0xc0000, 0xfffff] end up are not implemented
(b) in RHEL-6 qemu, the "pc.ram" RAMBlock that provides the guest's
    "main RAM" is *hidden* by the "pc.bios" RAMBlock in the
    [0xe0000, 0xfffff] region.
(c) in addition, the exact same "pc.bios" RAMBlock is visible at
    [0xfffe0000, 0xffffffff]

RHEL-7 is different. The PAM registers *are* implemented (well, mostly),
and some of them control whether [0xe0000, 0xfffff] shows a window into
RAM or into PCI address space.

If the registers are set so that reads go to the PCI address space
instead of RAM (= the "pc.ram" RAMBlock; grep the source for
"pam-pci"), then the [0xe0000, 0xfffff] range ends up showing the
"isa-bios" range. Use the "info mtree" command to see this. And
"isa-bios" is again an alias, a window into the "pc.bios" RAMBlock, which
is visible in PCI address space at 0xfffe0000.

In summary,
- on RHEL-6 you have no working PAM registers, and [0xc0000, 0xfffff]
  and [0xfffe0000, 0xffffffff] always show the same contents.
- on RHEL-7, you have (mostly) working PAM registers, and those can
  change what [0xc0000, 0xfffff] shows. Depending on the PAM settings,
  this range can be a window into RAM, or it can be a window into
  "pc.bios", which resides at 0xfffe0000.
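To make the two memory maps concrete, here is a toy model in C. It is
not QEMU code; every name in it (toy_machine, toy_read, toy_write, and
so on) is made up for illustration, and it only covers the
[0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff] windows. It assumes the
RHEL-6-like "pc.bios" is guest-writable and the RHEL-7-like one is
read-only, as described in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIOS_SIZE 0x20000u     /* 128 KiB, the size of [0xe0000, 0xfffff] */
#define LOW_BASE  0xe0000u
#define HIGH_BASE 0xfffe0000u

/* Toy machine; all names are hypothetical, this is not QEMU's data model. */
struct toy_machine {
    bool    pam_implemented;        /* false: RHEL-6-like, true: RHEL-7-like */
    bool    pam_low_is_ram;         /* only honoured if pam_implemented */
    uint8_t pc_bios[BIOS_SIZE];     /* stands in for the "pc.bios" RAMBlock */
    uint8_t pc_ram_low[BIOS_SIZE];  /* slice of "pc.ram" under the low window */
};

/* Does this guest-physical address currently resolve to "pc.bios"? */
static bool toy_is_bios(const struct toy_machine *m, uint32_t addr)
{
    if (addr >= HIGH_BASE) {
        return true;                /* high window always shows pc.bios */
    }
    return !(m->pam_implemented && m->pam_low_is_ram);
}

/* Return the backing byte for an address in either window. */
static uint8_t *toy_backing(struct toy_machine *m, uint32_t addr)
{
    assert(addr >= HIGH_BASE ||
           (addr >= LOW_BASE && addr < LOW_BASE + BIOS_SIZE));
    if (addr >= HIGH_BASE) {
        return &m->pc_bios[addr - HIGH_BASE];
    }
    if (toy_is_bios(m, addr)) {
        return &m->pc_bios[addr - LOW_BASE];
    }
    return &m->pc_ram_low[addr - LOW_BASE];
}

static uint8_t toy_read(struct toy_machine *m, uint32_t addr)
{
    return *toy_backing(m, addr);
}

static void toy_write(struct toy_machine *m, uint32_t addr, uint8_t val)
{
    /* RHEL-7-like: pc.bios is read-only, so the write is dropped. */
    if (m->pam_implemented && toy_is_bios(m, addr)) {
        return;
    }
    *toy_backing(m, addr) = val;
}
```

With pam_implemented == false, a write through the low window shows up
in the high window as well; with pam_implemented and pam_low_is_ram both
true, it lands in RAM and the high window keeps its original contents.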

Now, different versions of SeaBIOS handle these environments
differently. The topic of bug 1027565 was the following situation:

Suppose that you boot a virtual machine on a RHEL-6 host (which implies
RHEL-6, i.e. PAM-less, qemu *and* RHEL-6 SeaBIOS), then migrate it to
RHEL-7 (which implies RHEL-7, i.e. PAM-capable, qemu, *but* SeaBIOS stays
the same, as it travels with the migrated guest), and *then* you reboot
the VM on the target (RHEL-7) host. This means that RHEL-6 SeaBIOS will
reboot in a RHEL-7 (PAM-capable) VM.

This was a problem because:
- when RHEL-6 SeaBIOS originally started on the RHEL-6 host, it modified
  the "pc.bios" RAMBlock, and that modification was visible in both
  [0xe0000, 0xfffff] and [0xfffe0000, 0xffffffff].

- When this guest was migrated to a RHEL-7 host, then rebooted, then
  (due to the RHEL-7-only PAM settings showing "pc.ram" at [0xe0000,
  0xfffff]) those original modifications were *only* visible at
  [0xfffe0000, 0xffffffff]. This tripped up RHEL-6 SeaBIOS at reboot,
  because it expected to see (on reboot) its original modifications at
  [0xe0000, 0xfffff]. (The PAM registers are not re-set on reset.)

The kludge we implemented for this was to manually re-shadow the BIOS
from "pc.bios" to "pc.ram" on the RHEL-7 host, when the machine type
implied RHEL-6.
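Roughly, the idea of that kludge can be sketched as below. This is only
an illustration, not the attached patch: the function name, the
"rhel6" machine-type prefix test, and the buffer arguments are all
assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: on an incoming migration, copy the BIOS image
 * from the "pc.bios" stand-in into the slice of "pc.ram" that the low
 * window shows, but only if the machine type looks like a RHEL-6 one.
 * Returns true if the re-shadowing was done.
 */
static bool toy_reshadow_on_incoming(const char *machine_type,
                                     const uint8_t *pc_bios,
                                     uint8_t *pc_ram_low,
                                     size_t len)
{
    assert(machine_type != NULL && pc_bios != NULL && pc_ram_low != NULL);

    /* Assumed naming convention: RHEL-6 machine types start with "rhel6". */
    if (strncmp(machine_type, "rhel6", 5) != 0) {
        return false;
    }
    memcpy(pc_ram_low, pc_bios, len);
    return true;
}
```

After such a copy, the modifications that RHEL-6 SeaBIOS made to
"pc.bios" on the source host are visible at [0xe0000, 0xfffff] again,
even though the PAM settings on the target show "pc.ram" there.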

Okay, so how does this relate to your question? The SeaBIOS commit you
reference, 244caf86, makes several statements:

(1) "On qemu, a reboot will not reset the memory settings for
    0xc0000-0xfffff"
(2) "kvm does not keep a pristine copy of the BIOS at 0xffff0000"

Claim (1) remains true on RHEL-7 qemu as well.

Wrt. claim (2), you can see that this SeaBIOS commit dates back to the
"RHEL-6 era", because it is *no longer true* for RHEL-7 qemu. On RHEL-7
qemu, the PAM registers work (mostly), and the copy of the BIOS at
[0xfffe0000, 0xffffffff] *is* pristine. The function
old_pc_system_rom_init() in "hw/i386/pc_sysfw.c" makes it read-only
(both directly and also when seen via the isa-bios alias, depending on
the PAM registers):

    if (!isapc_ram_fw) {
        memory_region_set_readonly(bios, true);
    }

...

    if (!isapc_ram_fw) {
        memory_region_set_readonly(isa_bios, true);
    }

This makes a verbatim backport of this upstream SeaBIOS commit
inappropriate for a RHEL-7 qemu (and that situation emerges when a VM is
migrated from a RHEL-6 to a RHEL-7 host).

As to how you can fix
<https://bugzilla.redhat.com/show_bug.cgi?id=1129549> (which is the
motivation for this entire discussion):

- You need to identify *what variable exactly* stores the list of
  bootable devices. Is that a SeaBIOS variable? Is it some field in the
  BDA (BIOS Data Area)? Something else? Because the bug is that this
  variable, wherever it lives, is not re-set on reboot.

  (Kevin, can you perhaps help with this question? Thank you.)

- Once you have identified the variable or field, you should figure out
  its lifecycle -- how it is affected by all of the above.

- The fix you come up with may be a backport from upstream SeaBIOS, but
  it also might have to be downstream only (i.e. divergent). In
  particular, whatever fix you find for RHEL-6 SeaBIOS, it must also
  work when such a VM is migrated to RHEL-7, and rebooted there.

As far as I understand SeaBIOS commit 244caf86, it tries to implement
the following:
- it knows that qemu does not reset stuff "hard enough" on reset, so it
  tries to make up for it, manually
- it knows that "making up for it manually" is not possible on KVM at
  all, so it shuts down instead.

Implemented as:

The variable called HaveRunPost lives somewhere in the low [0xe0000,
0xfffff] range. When the guest is first booted, the initial value of
HaveRunPost is 0, so _start() does not invoke tryReboot(), and
HaveRunPost gets set to 1.

After a reboot, HaveRunPost is still 1, and this happens:

_start()
  tryReboot()
    qemu_prep_reset()
      make_bios_writable()
      copy from [0xfffe0000, 0xffffffff] to [0xe0000, 0xfffff]
    if (HaveRunPost) --> apm_shutdown()
    i8042_reboot()

The "copy" operation in the above seeks to *restore* (part of) the BIOS
image at [0xe0000, 0xfffff], including its variable HaveRunPost, from
the "pristine" copy at [0xfffe0000, 0xffffffff]. If that is successful,
then HaveRunPost gets implicitly cleared (due to the copy operation),
and then we proceed to a hard reboot. And this hard reboot is what
restores the boot device list, I guess.

If HaveRunPost does not go from 1 to 0 due to the copy operation, then
SeaBIOS has no way to restore itself to the "pristine" image -- there is
no pristine image. (And that's the case on RHEL-6, because when you
modify HaveRunPost in the [0xe0000, 0xfffff] range, that is immediately
reflected in [0xfffe0000, 0xffffffff] too, so the copy is actually a
no-op on RHEL-6. That's why you get the shutdown on RHEL-6.)
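The decision described above can be modelled in a few lines of C. This
is a toy, not SeaBIOS code: toy_try_reboot(), the outcome enum, and the
flag offset parameter are invented for the sketch, and the two windows
are passed in as plain buffers. Passing the same buffer for both
windows models the RHEL-6 aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_outcome { TOY_HARD_REBOOT, TOY_SHUTDOWN };

/*
 * low:      what the guest sees at [0xe0000, 0xfffff]
 * high:     what the guest sees at [0xfffe0000, 0xffffffff]
 * flag_off: offset of the HaveRunPost stand-in within the image
 *
 * Copy high over low, then look at the flag. If the copy cleared it,
 * a pristine image existed and a hard reboot can follow. If it is
 * still set, there was nothing pristine to restore from.
 */
static enum toy_outcome toy_try_reboot(uint8_t *low, const uint8_t *high,
                                       size_t len, size_t flag_off)
{
    assert(flag_off < len);

    /* memmove tolerates low == high, where the copy is a no-op. */
    memmove(low, high, len);

    return low[flag_off] ? TOY_SHUTDOWN : TOY_HARD_REBOOT;
}
```

When low and high are the same buffer and the flag is 1, the result is
TOY_SHUTDOWN, which matches the behaviour you see with the backport on
RHEL-6. With a separate, untouched high buffer, the flag is cleared by
the copy and the result is TOY_HARD_REBOOT.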

... I think that you cannot backport 244caf86 to RHEL-6 SeaBIOS. Namely,
RHEL-6 has exactly those circumstances where SeaBIOS has no chance at a
"real" hard reset; there is no "pristine" copy to restore stuff from.
The solution for that would be fixing RHEL-6 qemu (see your own
<https://bugzilla.redhat.com/show_bug.cgi?id=1129549#c15>), but fixing
the PAM registers and all the memory stuff in RHEL-6 is *completely* out
of scope.

So here's what you can do in RHEL-6 SeaBIOS (as I said above):
- Identify the exact variables / BDA fields that control the boot device
  list, and clear them manually on *each* SeaBIOS startup. That will
  cover cold boots (when it will amount to a no-op) and warm boots (when
  it will fix your BZ).

  (Upstream 244caf86 would do that clearing with the copy operation, but
  that copy operation will *never* work on RHEL-6.)

- Then, verify that this manual, downstream-only fix also works after
  the VM is migrated to a RHEL-7 host.
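As a shape for such a downstream fix, something like the following
might do. Note that I have not identified the real variables, so
struct toy_boot_state, its fields, and toy_post() are pure
placeholders; the only point is that the clearing happens
unconditionally, before any device is registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_BOOT_DEVS 8

/* Placeholder for whatever really stores the boot device list. */
struct toy_boot_state {
    size_t   count;
    uint32_t dev_id[TOY_MAX_BOOT_DEVS];
};

/* Forget everything a previous run of POST may have left behind. */
static void toy_boot_state_clear(struct toy_boot_state *s)
{
    memset(s, 0, sizeof *s);
}

/*
 * Stand-in for the POST path: clear first, then register the devices
 * that were detected during this run. On a cold boot the clear is a
 * no-op; on a warm boot it drops the stale state.
 */
static void toy_post(struct toy_boot_state *s,
                     const uint32_t *detected, size_t n)
{
    assert(n <= TOY_MAX_BOOT_DEVS);

    toy_boot_state_clear(s);
    for (size_t i = 0; i < n; i++) {
        s->dev_id[s->count++] = detected[i];
    }
}
```

Running toy_post() twice in a row with the same two disks leaves
exactly two entries both times, regardless of what the state looked
like before the second run.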

... Apologies for the very long and messy email, but I hope it helps us
all understand the issue.

Thanks
Laszlo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-qemu_loadvm_state-shadow-SeaBIOS-for-VM-incoming-fro.patch
Type: text/x-patch
Size: 5847 bytes
Desc: not available
URL: <http://www.seabios.org/pipermail/seabios/attachments/20150421/438c3329/attachment.patch>