Hi!
I came across an old qemu issue today, https://gitlab.com/qemu-project/qemu/-/issues/1115 . It seemed interesting, and since I have several test win10 guests handy, I tested it with qemu built from tag v7.1.0, and observed the described behavior.
After digging deeper, it turns out that the same bios source, compiled with different gcc versions, produces either a good or a broken binary. It is not seabios itself; the difference is the gcc version.
In particular, on debian bookworm, gcc-12 produces a seabios which breaks win10 boot, while on debian sid, gcc-13 produces a working seabios from the same source.
Everything else can be anything: any qemu version, any seabios version, any combination. Given any combination, compile the bios with gcc from bookworm and win10 does not boot; compile the same bios with gcc from sid and it works.
The binary in question is vgabios-stdvga.bin.
I tried compiling vgabios-stdvga.bin with gcc-11; this one produces a broken binary too.
This goes on up to qemu version 8.1.0, from which point the version of the compiler used to build seabios does not matter anymore.
This is in the past already, yet it's quite an interesting (to me anyway) observation. Something was (or maybe still is?) quite fragile here.
Maybe it's a good idea to do some bisections, at least to find out when qemu started working again with "broken" bios, to understand the issue better.
FWIW.
Thanks,
/mjt
07.02.2024 02:17, Michael Tokarev wrote:
Maybe it's a good idea to do some bisections, at least to find out when qemu started working again with "broken" bios, to understand the issue better.
Since current debian stable (bookworm) has qemu-7.2, which is unable to boot a windows 10 guest in bios mode, I went on and bisected this one. The bisection leads to v8.0.0-2024-gbf376f30:
commit bf376f3020dfd7bcb2c4158b4ffa85c04d44f56d (HEAD)
Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Date: Wed Jun 7 15:57:16 2023 -0500
hw/i386/pc: Default to use SMBIOS 3.0 for newer machine models
Currently, pc-q35 and pc-i44fx machine models are default to use SMBIOS 2.8 (32-bit entry point). Since SMBIOS 3.0 (64-bit entry point) is now fully supported since QEMU 7.0, default to use SMBIOS 3.0 for newer machine models. This is necessary to avoid the following message when launching a VM with large number of vcpus.
"SMBIOS 2.1 table length 66822 exceeds 65535"
Which won't help with 7.2 machine types (it changes the defaults for 8.1+).
And yes, running current qemu with -M pc-q35-7.2 shows the same issue again.
So it might not really be a gcc issue, but just a bios that is too large, with gcc-13 able to produce more compact code which actually fits.
/mjt
On Wed, Feb 07, 2024 at 03:21:15AM +0300, Michael Tokarev wrote:
So it might not be a gcc issue really, but just a too large bios and gcc-13 is able to produce more compact code which actually fits.
Ah, that makes sense. In the future, if you enable the seabios logs it should help track these things down. (Take the log from the working version and compare it to the log from the non-working version.) I suspect the log would contain messages from seabios/seavgabios hinting at the issue.
Cheers, -Kevin
10.02.2024 22:09, Kevin O'Connor wrote:
Ah, that makes sense. In the future, if you enable the seabios logs it should help track these things down. (Take the log from the working version and compare it to the log from the non-working version.) I suspect in the log would be messages from seabios/seavgabios hinting to the issue.
Ok, that's a good idea indeed. Somehow I forgot about this.
So, I rebuilt seabios with DEBUG=8. I did NOT rebuild vgabios though, since after rebuilding it with debug added, the whole thing seems to work. However, the "bad" vgabios is just a bit larger than the "good" one:
-rw-r--r-- 1 mjt mjt 39936 Jan 24 09:35 bios-bad/vgabios-stdvga.bin
-rw-r--r-- 1 mjt mjt 39424 Nov 30 20:20 bios-good/vgabios-stdvga.bin
This is from debian seabios package 1.16.3-2 (good, compiled with gcc-13) and 1.16.3-2~bpo12+1 (bad, compiled with gcc-12). There's no difference between them besides the version string and the gcc used to compile; all the rest is exactly the same.
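As a quick sanity check on the two sizes, here is a small sketch in Python. Legacy option ROMs declare their length in 512-byte blocks (byte 2 of the ROM header), so the two files differ by exactly one block. The helper name rom_blocks is made up for this example.

```python
BLOCK = 512  # legacy option ROM sizes are counted in 512-byte blocks


def rom_blocks(size_bytes):
    """Number of 512-byte blocks needed to hold a ROM image of this size."""
    return -(-size_bytes // BLOCK)  # ceiling division


bad = rom_blocks(39936)   # bios-bad/vgabios-stdvga.bin
good = rom_blocks(39424)  # bios-good/vgabios-stdvga.bin
print(bad, good, (bad - good) * BLOCK)  # prints: 78 77 512
```

Whether that one extra block is what matters to Windows is not established here; the numbers only show how small the difference is.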
This is debug level 1 difference (qemu machine pc-q35-7.2) -- bpo12+1 is the bad one:
diff -U1 debugcon-good debugcon-bad
--- debugcon-good 2024-02-10 22:36:01.432450772 +0300
+++ debugcon-bad 2024-02-10 22:36:23.477127564 +0300
@@ -1,3 +1,3 @@
-SeaBIOS (version 1.16.3-debian-1.16.3-2)
-BUILD: gcc: (Debian 13.2.0-7) 13.2.0 binutils: (GNU Binutils for Debian) 2.41
+SeaBIOS (version 1.16.3-debian-1.16.3-2~bpo12+1)
+BUILD: gcc: (Debian 12.2.0-14) 12.2.0 binutils: (GNU Binutils for Debian) 2.40
 No Xen hypervisor found.
@@ -14,3 +14,3 @@
 qemu/e820: addr 0x0000000100000000 len 0x0000000040000000 [RAM]
-Relocating init from 0x000d4020 to 0x7efeb120 (size 85568)
+Relocating init from 0x000d3b00 to 0x7efeae00 (size 86368)
 Moving pm_base to 0x600
@@ -18,3 +18,3 @@
 1: /pci@i0cf8/pci8086,2922@3/drive@0/disk@0
-kvmclock: at 0xe8580 (msr 0x4b564d01)
+kvmclock: at 0xe8380 (msr 0x4b564d01)
 kvmclock: stable tsc, 3792 MHz
@@ -62,5 +62,5 @@
 Found 2 cpu(s) max supported 2 cpu(s)
-Copying PIR from 0x7efffbe0 to 0x000f5560
-Copying MPTABLE from 0x00006cfc/7efe1bf0 to 0x000f5450
-Copying SMBIOS from 0x00006cfc to 0x000f5290
+Copying PIR from 0x7efffbe0 to 0x000f5500
+Copying MPTABLE from 0x00006cfc/7efe18d0 to 0x000f53f0
+Copying SMBIOS from 0x00006cfc to 0x000f5220
 table(50434146)=0x7ffe2134 (via rsdt)
@@ -69,7 +69,7 @@
 Running option rom at c000:0003
-Start SeaVGABIOS (version 1.16.3-debian-1.16.3-2)
-VGABUILD: gcc: (Debian 13.2.0-7) 13.2.0 binutils: (GNU Binutils for Debian) 2.41
+Start SeaVGABIOS (version 1.16.3-debian-1.16.3-2~bpo12+1)
+VGABUILD: gcc: (Debian 12.2.0-14) 12.2.0 binutils: (GNU Binutils for Debian) 2.40
 enter vga_post:
   a=00000008 b=0000ffff c=00000000 d=0000ffff ds=0000 es=f000 ss=0000
-  si=00000000 di=000060c0 bp=00000000 sp=00006d16 cs=f000 ip=d00f f=0000
+  si=00000000 di=00006060 bp=00000000 sp=00006d1a cs=f000 ip=d04a f=0000
 VBE DISPI: bdf 00:01.0, bar 0
@@ -81,8 +81,8 @@
 Removing mode 19e
-Attempting to allocate 512 bytes lowmem via pmm call to f000:d0c0
+Attempting to allocate 512 bytes lowmem via pmm call to f000:d0e6
 pmm call arg1=0
-VGA stack allocated at e8380
+VGA stack allocated at e8180
 Turning on vga text mode console
 set VGA mode 3
-SeaBIOS (version 1.16.3-debian-1.16.3-2)
+SeaBIOS (version 1.16.3-debian-1.16.3-2~bpo12+1)
 EHCI init on dev 00:1d.7 (regs=0xfebf2020)
@@ -111,5 +111,5 @@
 Searching bootorder for: HALT
-drive 0x000f5200: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=67108864
+drive 0x000f5190: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=67108864
 Running option rom at cb00:0003
-Space available for UMB: cd800-e7800, f4de0-f51e0
+Space available for UMB: cd800-e7800, f4d80-f5170
 Returned 16637952 bytes of ZoneHigh
@@ -180,4 +180 @@
 VBE mode info request: 118
-VBE mode set: 4144
-set VGA mode 144
-VBE current mode=4144
So it looks like it is stuck at setting VBE mode.
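Most of the hunks in the diff above differ only in addresses that moved between the two builds. A rough Python sketch of how such noise could be masked before diffing follows. The regular expression and function names are assumptions of this example, not anything SeaBIOS ships, and the sample lines are taken from the logs above.

```python
import difflib
import re

# Hex numbers (0x-prefixed, or bare runs of 4+ hex digits) shift between
# builds and would otherwise drown the diff in noise.
HEX = re.compile(r"0x[0-9a-fA-F]+|\b[0-9a-fA-F]{4,}\b")


def normalize(line):
    """Replace address-like tokens with a placeholder."""
    return HEX.sub("<hex>", line.rstrip("\n"))


def compare_logs(good_lines, bad_lines):
    """Return unified-diff lines between two logs after masking addresses."""
    good = [normalize(line) for line in good_lines]
    bad = [normalize(line) for line in bad_lines]
    return list(difflib.unified_diff(good, bad, "good", "bad", n=1, lineterm=""))


# Sample lines from the logs above; real use would read both debugcon files.
good_log = [
    "kvmclock: at 0xe8580 (msr 0x4b564d01)",
    "VBE mode info request: 118",
    "VBE mode set: 4144",
]
bad_log = [
    "kvmclock: at 0xe8380 (msr 0x4b564d01)",
    "VBE mode info request: 118",
]
for line in compare_logs(good_log, bad_log):
    print(line)
```

With the addresses masked, the kvmclock line no longer shows up as a difference and only the missing "VBE mode set" line remains. The mask is deliberately crude (it also hides plain numbers of four or more digits), so it is only useful for a first pass.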
Attached are 2 level-8 debug logs from the good and bad runs. Unfortunately I see about the same picture as above: lots of details before the last 3 lines, and the same difference in the last 3 lines. Trying to find a working combination for vgabios debugging..
Does it make any sense so far?
Thanks,
/mjt
So.. the difference is vgabios only, not seabios (vgabios-stdvga in this case).
And I can't get it to work with debugging vgabios, it always fails even with DEBUG_LEVEL=2 (and level-1 logging isn't useful).
I was able to capture logs just for the non-working version, so there's nothing to compare it against. So I tried a different machine type in qemu, the one which works, which uses SMBIOS 3.0 (q35-8.2).
I also tried to compile the same vgabios with different compilers (gcc-13 vs gcc-12), in the hope of getting a working one with gcc-13 (the same way it works without debug). This one produces extra output at the end.
This should make a bit more sense, but I'm not sure, never tried to debug this stuff..
Thanks!
/mjt
On 10.02.24 at 21:17, Michael Tokarev wrote:
I was able to capture logs just for the non-working version, so there's nothing to compare it against. So I tried a different machine type in qemu, the one which works, which uses SMBIOS 3.0 (q35-8.2).
Hi,
Windows apparently doesn't read SMBIOS 3.0 tables: https://gitlab.com/qemu-project/qemu/-/issues/2008#note_1712210029
Not sure if this is relevant for your issue and it only "works" because of that with newer machine models. But it seems worth mentioning.
Best Regards, Fiona
On Mon, Feb 12, 2024 at 01:06:34PM +0100, Fiona Ebner wrote:
Windows apparently doesn't read SMBIOS 3.0 tables: https://gitlab.com/qemu-project/qemu/-/issues/2008#note_1712210029
Not sure if this is relevant for your issue and it only "works" because of that with newer machine models. But it seems worth mentioning.
Note that there is an option to explicitly set the smbios version:
qemu -machine q35,smbios-entry-point-type={32,64}
That will override the machine type default.
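Putting the pieces of this thread together, a test matrix could be generated roughly like this. It is only a sketch in Python: the binary name qemu-system-x86_64, the use of -L to point at a firmware directory, and the isa-debugcon log setup are assumptions about the test environment. bios-good and bios-bad are the directories named earlier in the thread and are assumed to hold a complete set of firmware files. The log file names are hypothetical.

```python
import itertools
import shlex

QEMU = "qemu-system-x86_64"  # assumed binary name


def qemu_argv(bios_dir, entry_point):
    """Build one QEMU command line for a firmware dir and SMBIOS entry point (32 or 64)."""
    log = f"debugcon-{bios_dir}-ep{entry_point}.log"  # hypothetical log file name
    return [
        QEMU,
        "-L", bios_dir,
        "-machine", f"q35,smbios-entry-point-type={entry_point}",
        "-vga", "std",
        # Under QEMU, SeaBIOS writes its debug output to I/O port 0x402.
        "-chardev", f"file,id=seabios,path={log}",
        "-device", "isa-debugcon,iobase=0x402,chardev=seabios",
    ]


for bios_dir, ep in itertools.product(["bios-good", "bios-bad"], [32, 64]):
    print(shlex.join(qemu_argv(bios_dir, ep)))
```

The guest disk and memory options are left out on purpose; they would be appended to each command line as usual.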
take care, Gerd
On Sat, Feb 10, 2024 at 11:17:54PM +0300, Michael Tokarev wrote:
I was able to capture logs just for the non-working version, so there's nothing to compare it against. So I tried a different machine type in qemu, the one which works, which uses SMBIOS 3.0 (q35-8.2).
Thanks for testing. So, if I understand the issue correctly:
1. If smbios v3 is used then the problem does not occur.
2. If gcc v13 is used to compile vgabios then the problem does not occur.
3. If smbios v2 is used and gcc v12 is used then win10 can not boot.
Is that correct?
A strange issue. Issues like this tend to be very difficult to track down.
As a random guess, one possibility is that it could be related to vgabios stack size usage. You could try always enabling the "extra vga stack" with a change like:
--- a/vgasrc/vgabios.c
+++ b/vgasrc/vgabios.c
@@ -285,8 +285,7 @@ vga_set_mode(int mode, int flags)
     // Disable extra stack if it appears a modern OS is in use.
     // This works around bugs in some versions of Windows (Vista
     // and possibly later) when the stack is in the e-segment.
-    MASK_BDA_EXT(flags, BF_EXTRA_STACK
-                 , (flags & MF_LEGACY) ? BF_EXTRA_STACK : 0);
+    MASK_BDA_EXT(flags, BF_EXTRA_STACK, BF_EXTRA_STACK);
     if (memmodel == MM_TEXT) {
         SET_BDA(video_cols, width);
         SET_BDA(video_rows, height-1);
Separately, if you can provide the failing and succeeding builds, I can try to take a look at it locally. To do this, make sure you're on commit 82faf1d5, run make, run "tar cfz fullbuild.tgz out/", and provide the resulting tgz file.
-Kevin
On Tue, 20 Feb 2024 14:41:50 -0500 Kevin O'Connor kevin@koconnor.net wrote:
Thanks for testing. So, if I understand the issue correctly:
- If smbios v3 is used then the problem does not occur.
One thing to note is that Windows isn't able to find SMBIOS v3 tables due to a bug in the anchor lookup within winload.exe. So essentially, v3 tables mean that Windows doesn't access SMBIOS at all, while with v2 it actually uses the SMBIOS tables.
Hi,
On 10. 02. 24 at 20:09, Kevin O'Connor wrote:
So it might not be a gcc issue really, but just a too large bios and gcc-13 is able to produce more compact code which actually fits.
Ah, that makes sense. In the future, if you enable the seabios logs it should help track these things down. (Take the log from the working version and compare it to the log from the non-working version.) I suspect in the log would be messages from seabios/seavgabios hinting to the issue.
Just a random idea: maybe the problem is Windows failing to emulate something? From vgafixup.py:
# It is also known that the Windows vgabios emulator has issues with
# addressing negative offsets to the %esp register. That has been
# worked around by not using the gcc parameter "-fomit-frame-pointer"
# when compiling.
I would try booting something else with the non-working vgabios, like grub2, and there I would do:
insmod vbe
videotest
And then one can probably use videotest to try to set up the mode. Or boot some payload with set gfxpayload.
Thanks, Rudolf
11.02.2024 00:10, Rudolf Marek wrote:
Hi,
Hello! Thank you for the reply!
Just a random idea, maybe there is something wrong with windows failing to emulate stuff? (the vgafixup.py)
Well.
Windows might fail to emulate something. The problem with that is that we can't fix Windows; we have to find where it is failing and work around it. There are lots of systems all around the world which work, so even if they all rely on some quirk and seabios is the only one behaving correctly, that knowledge doesn't help: we have to implement the same quirk to be compatible with the world.
Also, it seems strange at best that this depends on the compiler used to build seabios (vgabios). I can't say for sure that it is the size, but so far it looks like it.
I would try booting something else with the non-working vgabios, like grub2, and there I would do:
insmod vbe
videotest
This one seems to be working (I had to add `loadfont unicode' before; I never used grub before so don't know the details here).
I guess if the prob existed with grub, it'd be reported more widely, since many linux distros use grub in graphics mode.
And the original bug report talks about windows, especially windows 10 - just tried windows 7, and I don't see this behavior with it.
Probably should try with win11 - though this one is a bit more difficult to install in bios mode.
Thanks,
/mjt
11.02.2024 09:49, Michael Tokarev wrote: ..
And the original bug report talks about windows, especially windows 10 - just tried windows 7, and I don't see this behavior with it.
Probably should try with win11 - though this one is a bit more difficult to install in bios mode.
So, win11 (in bios mode) fails in exactly the same way as win10, with exactly the same debugging produced by seabios/vgabios.
/mjt
On Sun, Feb 11, 2024 at 09:49:21AM +0300, Michael Tokarev wrote:
Hello! Thank you for the reply!
Just a random idea, maybe there is something wrong with windows failing to emulate stuff? (the vgafixup.py)
Well.
Windows might fail to emulate something. The prob with that is that we can't fix this,
Well, vgafixup.py exists exactly to work around emulator bugs (by avoiding problematic instructions). The Xserver with the vesa driver emulates the vgabios too, and it has bugs, especially when it comes to 32-bit instructions ...
So there is a fair chance that we actually can do something about it once we know the root cause. Analyzing this kind of issue is a PITA though, especially with closed source software being involved.
I would try booting something else with the non-working vgabios, like grub2, and there I would do:
insmod vbe
videotest
This one seems to be working (I had to add `loadfont unicode' before; I never used grub before so don't know the details here).
Booting linux kernels with vga=ask can also serve as simple vgabios test btw.
But with both linux and grub running there is no emulator involved, so it is unlikely to help much ...
And the original bug report talks about windows, especially windows 10 - just tried windows 7, and I don't see this behavior with it.
Could very well be that microsoft switched to an emulator in newer versions to sandbox the vgabios. Possibly it's also different in 32-bit vs. 64-bit windows versions.
take care, Gerd