Hi Folks
The commits between be848676..a6ed6b701 appear to break QEMU guests using a 32bit kernel when booting from virtio_scsi devices. I end up seeing the following in dmesg:
[ 2.016642] virtio_scsi: probe of virtio0 failed with error -2
Notably, I've tried modifying the amount of RAM the guest boots with; based on the comments in the commit messages, PAE *is* also enabled in the guest kernel (kernel v6.1.76).
qemu-system-x86_64 .... -m 2048M - Guest boots fine
qemu-system-x86_64 .... -m 4095M - Guest does not boot
In the latter case I still see the 64-bit address space being used in the BIOS debug log:
00.006: === PCI bus & bridge init ===
00.006: PCI: pci_bios_init_bus_rec bus = 0x0
00.007: === PCI device probing ===
00.007: Found 8 PCI devices (max PCI bus is 00)
00.008: PCIe: using q35 mmconfig at 0xb0000000
00.008: === PCI new allocation pass #1 ===
00.008: PCI: check devices
00.009: === PCI new allocation pass #2 ===
00.009: PCI: IO: c000 - c11f
00.009: PCI: 32: 00000000c0000000 - 00000000fec00000
00.009: PCI: 64: 0000380000000000 - 0000380040000000
00.009: PCI: map device bdf=00:02.0 bar 4, addr 380000000000, size 00004000 [prefmem]
00.010: PCI: map device bdf=00:03.0 bar 4, addr 380000004000, size 00004000 [prefmem]
00.010: PCI: map device bdf=00:04.0 bar 4, addr 380000008000, size 00004000 [prefmem]
00.010: PCI: map device bdf=00:02.0 bar 0, addr 0000c000, size 00000040 [io]
00.011: PCI: map device bdf=00:03.0 bar 0, addr 0000c040, size 00000040 [io]
00.011: PCI: map device bdf=00:04.0 bar 0, addr 0000c080, size 00000040 [io]
00.011: PCI: map device bdf=00:1f.3 bar 4, addr 0000c0c0, size 00000040 [io]
00.011: PCI: map device bdf=00:1f.2 bar 4, addr 0000c100, size 00000020 [io]
00.012: PCI: map device bdf=00:04.0 bar 6, addr feb80000, size 00040000 [mem]
00.012: PCI: map device bdf=00:01.0 bar 6, addr febc0000, size 00010000 [mem]
00.012: PCI: map device bdf=00:01.0 bar 2, addr febd0000, size 00001000 [mem]
00.012: PCI: map device bdf=00:02.0 bar 1, addr febd1000, size 00001000 [mem]
00.013: PCI: map device bdf=00:03.0 bar 1, addr febd2000, size 00001000 [mem]
00.013: PCI: map device bdf=00:04.0 bar 1, addr febd3000, size 00001000 [mem]
00.013: PCI: map device bdf=00:1f.2 bar 5, addr febd4000, size 00001000 [mem]
00.013: PCI: map device bdf=00:01.0 bar 0, addr fd000000, size 01000000 [prefmem]
00.014: PCI: init bdf=00:00.0 id=8086:29c0
00.014: PCI: init bdf=00:01.0 id=1234:1111
00.014: PCI: init bdf=00:02.0 id=1af4:1004
00.015: PCI: init bdf=00:03.0 id=1af4:1004
00.016: PCI: init bdf=00:04.0 id=1af4:1000
I'm happy to help test/provide more logs if they are useful.
On Fri, Feb 23, 2024 at 08:16:53PM +0000, Max Tottenham wrote:
Hi Folks
The commits between be848676..a6ed6b701 appear to break QEMU guests using a 32bit kernel when booting from virtio_scsi devices. I end up seeing the following in dmesg:
[ 2.016642] virtio_scsi: probe of virtio0 failed with error -2
Recommended action: turn off 64-bit support (long mode) in the cpu:
qemu -cpu host,lm=off
take care, Gerd
On 02/26, Gerd Hoffmann wrote:
On Fri, Feb 23, 2024 at 08:16:53PM +0000, Max Tottenham wrote:
Hi Folks
The commits between be848676..a6ed6b701 appear to break QEMU guests using a 32bit kernel when booting from virtio_scsi devices. I end up seeing the following in dmesg:
[ 2.016642] virtio_scsi: probe of virtio0 failed with error -2
Recommended action: turn off 64-bit support (long mode) in the cpu:
qemu -cpu host,lm=off
take care, Gerd
Hi Gerd
Thanks for the response,
That gets the VM booting. Unfortunately, we have many customers who may be running 32-bit distro kernels, and we won't know before launching the VM whether they need this compatibility flag or not, so I don't think we can use this as a suitable workaround.
Regards
Max
On Mon, Feb 26, 2024 at 10:56:05AM +0000, Max Tottenham wrote:
On 02/26, Gerd Hoffmann wrote:
Recommended action: turn off 64-bit support (long mode) in the cpu:
qemu -cpu host,lm=off
Hi Gerd
Thanks for the response,
That gets the VM booting. Unfortunately, we have many customers who may be running 32-bit distro kernels, and we won't know before launching the VM whether they need this compatibility flag or not, so I don't think we can use this as a suitable workaround.
You can turn this off completely this way:
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -1195,8 +1195,10 @@ pci_setup(void)
         }
     }
 
+#if 0
     if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
         pci_pad_mem64 = 1;
+#endif
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
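If someone wants to try that locally, the usual workflow (the paths and the rest of the QEMU command line here are assumptions, not taken from this thread) is to rebuild SeaBIOS with the change and point QEMU at the resulting binary:

  make -C seabios
  qemu-system-x86_64 -bios seabios/out/bios.bin ...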
Another option would be to tweak the condition which turns on pci_pad_mem64. The obvious candidate would be to raise the memory limit, i.e. turn this on only in case memory is present above 64G (outside the PAE-addressable physical address space), or to choose some value between 4G and 64G.
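As a minimal sketch of what such an adjusted condition could look like in pci_setup() (the threshold value and the macro name are illustrative assumptions, not something agreed on in this thread; RamSizeOver4G only counts the RAM mapped above the 4G boundary):

  /* Hypothetical variant of the existing check in src/fw/pciinit.c:
   * only enable 64-bit BAR padding once RAM reaches beyond what a
   * PAE (36-bit) guest kernel can address.  Threshold is an example. */
  #define PAD_MEM64_RAM_OVER_4G_LIMIT (60ULL << 30)

  if (CPUPhysBits >= 36 && CPULongMode
      && RamSizeOver4G > PAD_MEM64_RAM_OVER_4G_LIMIT)
      pci_pad_mem64 = 1;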
I'm wondering how widespread it is in 2024 to run 32-bit kernels with a lot of memory?
The 32-bit kernel has 1G of kernel address space and can therefore permanently map less than 1G of RAM. Memory above that limit ('highmem') must be mapped and unmapped whenever the kernel wants to access it. That is a significant performance hit (compared to a 64-bit kernel), and the more memory you add, the worse it gets ...
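To illustrate what that map/unmap cycle looks like (a generic Linux example, not code from this thread; zero_highmem_page() is a made-up helper name):

  #include <linux/highmem.h>
  #include <linux/string.h>

  /* On a 32-bit kernel a HIGHMEM page has no permanent kernel mapping,
   * so every access pays for a temporary map/unmap round trip. */
  static void zero_highmem_page(struct page *page)
  {
      void *vaddr = kmap_local_page(page);  /* set up a temporary mapping */
      memset(vaddr, 0, PAGE_SIZE);          /* access the page contents   */
      kunmap_local(vaddr);                  /* tear the mapping down      */
  }

On a 64-bit kernel the same page stays reachable through the permanent direct map, so none of that per-access mapping work is needed.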
Also, finding Linux distros which provide full i386 support (including timely security updates) becomes increasingly difficult.
take care, Gerd
On Mon, Feb 26, 2024 at 04:00:34PM +0100, Gerd Hoffmann wrote:
On Mon, Feb 26, 2024 at 10:56:05AM +0000, Max Tottenham wrote:
On 02/26, Gerd Hoffmann wrote:
Recommended action: turn off 64-bit support (long mode) in the cpu:
qemu -cpu host,lm=off
Hi Gerd
Thanks for the response,
That gets the VM booting. Unfortunately, we have many customers who may be running 32-bit distro kernels, and we won't know before launching the VM whether they need this compatibility flag or not, so I don't think we can use this as a suitable workaround.
You can turn this off completely this way:
Apologies for resurrecting this old thread. We have just hit this case.
+#if 0
     if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
         pci_pad_mem64 = 1;
+#endif
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
Another option would be to tweak the condition which turns on pci_pad_mem64. The obvious candidate would be to raise the memory limit, i.e. turn this on only in case memory is present above 64G (outside the PAE-addressable physical address space), or to choose some value between 4G and 64G.
I'm wondering how widespread it is in 2024 to run 32-bit kernels with a lot of memory?
I see that no change has been made to seabios for this regression. Is it the position of the maintainers that such guest VMs are no longer supported by seabios, and anyone doing so is responsible for patching as necessary? Or would there still be interest in fixing this up in master?
thanks john
On Tue, Jun 11, 2024 at 04:15:06PM GMT, John Levon wrote:
On Mon, Feb 26, 2024 at 04:00:34PM +0100, Gerd Hoffmann wrote:
On Mon, Feb 26, 2024 at 10:56:05AM +0000, Max Tottenham wrote:
On 02/26, Gerd Hoffmann wrote:
Recommended action: turn off 64-bit support (long mode) in the cpu:
qemu -cpu host,lm=off
Hi Gerd
Thanks for the response,
That gets the VM booting. Unfortunately, we have many customers who may be running 32-bit distro kernels, and we won't know before launching the VM whether they need this compatibility flag or not, so I don't think we can use this as a suitable workaround.
You can turn this off completely this way:
Apologies for resurrecting this old thread. We have just hit this case.
+#if 0
     if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
         pci_pad_mem64 = 1;
+#endif
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
Another option would be to tweak the condition which turns on pci_pad_mem64. The obvious candidate would be to raise the memory limit, i.e. turn this on only in case memory is present above 64G (outside the PAE-addressable physical address space), or to choose some value between 4G and 64G.
I'm wondering how widespread it is in 2024 to run 32-bit kernels with a lot of memory?
I see that no change has been made to seabios for this regression. Is it the position of the maintainers that such guest VMs are no longer supported by seabios, and anyone doing so is responsible for patching as necessary? Or would there still be interest in fixing this up in master?
Well, the discussion simply died, while I was hoping for some feedback to figure out how the heuristics can best be adjusted ...
Managing memory larger than the virtual address space comes with a performance penalty because it is not possible to map all memory permanently. So a 32-bit kernel has to do more page table updates than a 64-bit kernel because any access to HIGHMEM requires mapping changes.
The more memory a 32-bit kernel has to manage, the higher the performance penalty: not only due to more HIGHMEM mapping operations, but also because the amount of LOWMEM (permanently mapped) memory needed to manage that memory (i.e. the 'struct page' array in the case of Linux) goes up, increasing the memory pressure in LOWMEM.
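As a rough illustration of that LOWMEM pressure (the numbers are assumptions, not from this thread): with 64G of RAM and 4K pages there are about 16 million pages to track; at roughly 32 bytes per 'struct page' on a 32-bit build, the page array alone needs around 512M, i.e. well over half of the ~896M of LOWMEM a 32-bit Linux kernel typically has available.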
So my naive expectation would be that 32-bit guests have relatively small amounts of memory assigned, where the performance hit isn't too much of a problem. I have no idea whether this is actually the case though.
So, in short, can you (or anyone else running into this) share some information on what the typical / maximal amount of memory is for 32-bit guests in real-world deployments?
thanks & take care, Gerd
On Wed, Jun 12, 2024 at 03:30:18PM +0200, Gerd Hoffmann wrote:
I see that no change has been made to seabios for this regression. Is it the position of the maintainers that such guest VMs are no longer supported by seabios, and anyone doing so is responsible for patching as necessary? Or would there still be interest in fixing this up in master?
So, in short, can you (or anyone else running into this) share some information on what the typical / maximal amount of memory is for 32-bit guests in real-world deployments?
I asked internally for some production details - it's partial information, but still indicative. The sizable majority of deployments are indeed at 4G or below. However there is in fact a non-trivial minority with 4G-64G configured, and even a few over 64G! Presumably nobody ever noticed the latter cases weren't actually using that memory.
So from a "will this change stop my production VM from booting" perspective, there's no magic number here, though 64G certainly seems like a very reasonable cutoff point - beyond having to fix up the VM config, there's no possible way that could affect any actual workload in practice.
Although this actually works at 63G, right, due to the memory layout?
I have a patch series that implements the above if you're interested...
regards john
Hi,
So from a "will this change stop my production VM from booting" perspective, there's no magic number here, though 64G certainly seems like a very reasonable cutoff point - beyond having to fix up the VM config, there's no possible way that could affect any actual workload in practice.
Although this actually works at 63G, right, due to the memory layout?
I think it makes sense to check for "RamSizeOver4G > 60GB" then.
On the q35 machine type this will actually be 62GB RAM + a 2GB mmio window (below 4G); on the pc machine type this will indeed be 63GB RAM + a 1GB mmio window.
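As a sketch (not an actual patch from this thread), the suggested check in src/fw/pciinit.c could then look something like this:

  /* Only pad the 64-bit pci window when RAM extends beyond what a
   * PAE (36-bit) guest kernel could address anyway; 60G above 4G
   * corresponds to the 62G/63G totals mentioned above. */
  if (CPUPhysBits >= 36 && CPULongMode
      && RamSizeOver4G > (60ULL << 30))
      pci_pad_mem64 = 1;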
I have a patch series that implements the above if you're interested...
Series? I'd expect a single one-line patch updating the checks, but sure, send it along.
take care, Gerd