On Fri, Nov 10, 2023 at 09:36:02AM +0100, Claudio Fontana wrote:
On 11/8/23 19:35, Kevin O'Connor wrote:
On Tue, Nov 07, 2023 at 02:03:09PM +0100, Gerd Hoffmann wrote:
For better compatibility with old linux kernels, see source code comment.
Related (same problem in ovmf): https://github.com/tianocore/edk2/commit/c1e853769046
Thanks. I'll defer to your judgement on this. It does seem a little odd to alter the CPUPhysBits variable itself instead of adding additional checking to the pciinit.c code that uses CPUPhysBits. (That is, if there are old Linux kernels that can't handle a very high PCI space, then a workaround in the PCI code might make it more clear what is occurring.)
Cheers, -Kevin
Cc: Claudio Fontana cfontana@suse.de Signed-off-by: Gerd Hoffmann kraxel@redhat.com
Hi,
I thought about this, and I am not sure it's the right call though.
The change breaking old guests (CentOS7, Ubuntu 18.04 are the known ones, but presumably others as well) in QEMU is:
commit 784155cdcb02ffaae44afecab93861070e7d652d Author: Gerd Hoffmann kraxel@redhat.com Date: Mon Sep 11 17:20:26 2023 +0200
seabios: update submodule to git snapshot git shortlog ------------ Gerd Hoffmann (7): disable array bounds warning better kvm detection detect physical address space size move 64bit pci window to end of address space be less conservative with the 64bit pci io window qemu: log reservations in fw_cfg e820 table check for e820 conflict
The reasoning for the change is according to:
https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/message/BHK67U...
"It makes seabios follow a common pattern. real mode address space has io resources mapped high (below 1M). 32-bit address space has io resources mapped high too (below 4G). This does the same for 64-bit resources."
So it seems to be an almost aesthetic choice, the way I read the "common pattern", one that ends up actually breaking existing workloads though.
It's not about aesthetics. It's about handing out more resources to the 64bit mmio window and also to the pci bridge windows if we have enough address space for that. This is important because not only main memory sizes grow, but also device memory grows. Especially GPUs and AI accelerators can have rather big PCI memory bars these days. If you want support hotplugging them you need pcie root ports with big enough bridge windows.
The "common pattern" refers to OVMF doing the same for the same reason.
Now the correction to that that you propose in SeaBIOS is:
- if (valid && physbits > 46) {
// Old linux kernels have trouble dealing with more than 46
// phys-bits, so avoid that for now. Seems to be a bug in the
// virtio-pci driver. Reported: centos-7, ubuntu-18.04
dprintf(1, "%s: using phys-bits=46 (old linux kernel compatibility)\n", __func__);
physbits = 46;
- }
but to me this is potentially breaking the situation for another set of users, those that are passing through 48 physical bits from their host cpu to the guest, and expect it to actually happen.
This doesn't change the address space size the guest does see. seabios does not have the power to change the information returned by the cpuid instruction.
This only changes the placement of the PCI bars. The pci setup code is the only consumer of that variable, guess it makes sense to move the quirk to the pci code (as suggested by Kevin) to clarify this.
Would it not be a better solution to instead either revert the original change,
No (see above).
or to patch it to find a new range that better satisfies code consistency/aesthetic requirements, but also does not break any running workload?
Upstream OVMF does the same thing for roughly one year, the old linux kernel issue is the only one which showed up so far. OVMF wouldn't see *really* old guests (without UEFI support) though.
The pci setup code already has a bunch of conditions for activating the 64bit mmio window, to avoid breaking really old guests. It wouldn't happen on CPUs without long mode. It also wouldn't happen if guests don't have memory above 4G.
I'm open to suggestions to refine this.
Or we could add some extra workarounds in the stack elsewhere in the management tools to try to detect older guests (not ideal either), but this seems like adding breakage on top of breakage to me.
We already have libosinfo which records guest capabilities, which exists to solve exactly that problem: Allow new guests use new features by default, without breaking old guests. Extending libosinfo looks like a perfectly valid approach to me, especially considering that we probably want make the 46 phys-bits limit conditional some day in the future, to allow even larger guests.
take care, Gerd