Under SeaBIOS, I'm noticing that not all of the PCIe-related area is marked uncachable in the MTRR settings, at least in the Q35 platform (QEMU).
I feel like this is a bug, but I'm not familar with the lore behind Q35 and MTRRs, feedback would be appreciated.
The MCFG table contains an entry starting at Q35_HOST_BRIDGE_PCIEXBAR_ADDR (0xb0000000) corresponding to a 256MB area. Later, the mch_mem_addr_setup() routine defines the PCIe window (pcimem_start) starting at 0xc0000000 (Q35_HOST_BRIDGE_PCIEXBAR_ADDR + Q35_HOST_BRIDGE_PCIEXBAR_SIZE).
I see in mtrr_setup(), that only a single variable MTRR is configured based solely on pcimem_start and extends to the end of the 4GB boundary. It looks to me that this is probably fine for 440fx, but seems insufficient for Q35.
Q35 - original SeaBIOS 1.15.0: # cat /proc/mtrr reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
Q35 - original EFI (ovmf 2202.02) # cat /proc/mtrr reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable reg01: base=0x0b0000000 ( 2816MB), size= 256MB, count=1: uncachable reg02: base=0x800000000 (32768MB), size=32768MB, count=1: uncachable
Thus, for Q35, there is no explict attempt to configure the Q35_HOST_BRIDGE_PCIEXBAR_ADDR area nor any potential the >4GB area used for PCIe I/O. The inherent size restrictions on MTRR mask definitions make a completely generic solution a bit tricky.
Attached is a simple patch that enables use of the other variable MTRR registers. I believe solves the issue, but I'm not sure how to test it with a 64-bit PCIe window. In a simple setup, it results in the following:
# cat /proc/mtrr reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable reg01: base=0x0b0000000 ( 2816MB), size= 256MB, count=1: uncachable
Is this type of change appropriate? Would it make sense to remove direct use of pcimem_start from the mtrr.c code and just like the PCI code directly register the related range using the new mechanism?
Thanks
-Alex
On Wed, Sep 28, 2022 at 10:56:34AM -0500, Alex Olson wrote:
Under SeaBIOS, I'm noticing that not all of the PCIe-related area is marked uncachable in the MTRR settings, at least in the Q35 platform (QEMU).
I feel like this is a bug, but I'm not familar with the lore behind Q35 and MTRRs, feedback would be appreciated.
The MCFG table contains an entry starting at Q35_HOST_BRIDGE_PCIEXBAR_ADDR (0xb0000000) corresponding to a 256MB area. Later, the mch_mem_addr_setup() routine defines the PCIe window (pcimem_start) starting at 0xc0000000 (Q35_HOST_BRIDGE_PCIEXBAR_ADDR + Q35_HOST_BRIDGE_PCIEXBAR_SIZE).
I see in mtrr_setup(), that only a single variable MTRR is configured based solely on pcimem_start and extends to the end of the 4GB boundary. It looks to me that this is probably fine for 440fx, but seems insufficient for Q35.
Q35 - original SeaBIOS 1.15.0: # cat /proc/mtrr reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
Q35 - original EFI (ovmf 2202.02) # cat /proc/mtrr reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable reg01: base=0x0b0000000 ( 2816MB), size= 256MB, count=1: uncachable reg02: base=0x800000000 (32768MB), size=32768MB, count=1: uncachable
See also https://github.com/tianocore/edk2/commit/2a0bd3bffc80d1982cf85cdad066f79a2f6...
Thus, for Q35, there is no explict attempt to configure the Q35_HOST_BRIDGE_PCIEXBAR_ADDR area nor any potential the >4GB area used for PCIe I/O. The inherent size restrictions on MTRR mask definitions make a completely generic solution a bit tricky.
qemu using gigabyte pages these days should simplify things. If there is 2G (or less) of low memory mark 2G -> 4G uncachable. If there is 3G (or less) of low memory mark 3G -> 4G uncachable. Doable with a single mtrr register, should cover 99% of the use cases and is a improvement over the current situation.
I believe solves the issue, but I'm not sure how to test it with a 64-bit PCIe window.
Patch the code to force it.
seabios places everything below 4G if it fits, for backward compatibility reasons. Could be we are booting a 32bit guest OS which might not be able to deal with PCI bars mapped above 4G ...
Maybe it makes sense to relax that a bit. In case we find memory above 4G it should be safe to assume that PCI bars above 4G are fine too.
take care, Gerd
commit c7074250e6faaeffc5e8966a8a0ecdfe8a5c2af3 Author: Gerd Hoffmann kraxel@redhat.com Date: Fri Sep 9 10:17:15 2022 +0200
[testing] force 64bit pci window
diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c index ad6def93633b..08339da2905c 100644 --- a/src/fw/pciinit.c +++ b/src/fw/pciinit.c @@ -1108,7 +1108,7 @@ static void pci_bios_map_devices(struct pci_bus *busses) panic("PCI: out of I/O address space\n");
dprintf(1, "PCI: 32: %016llx - %016llx\n", pcimem_start, pcimem_end); - if (pci_bios_init_root_regions_mem(busses)) { + if (1 || pci_bios_init_root_regions_mem(busses)) { struct pci_region r64_mem, r64_pref; r64_mem.list.first = NULL; r64_pref.list.first = NULL;