On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
iPXE v1.0.0-591-g7aee315
iPXE (http://ipxe.org) 00:03.0 C900 PCI2.10 PnP PMM+00000000+00000000 C900

Booting from Floppy...
qemu: fatal: Trying to execute code outside RAM or ROM at 0x00000001000effff
EAX=ffffffff EBX=ffffffff ECX=0000c934 EDX=00000068
ESI=00006801 EDI=00000000 EBP=0000002b ESP=0000fff5
EIP=ffffffff EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0040 00000400 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =9ec4 0009ec40 0000ffff 00009300
DS =9ec4 0009ec40 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000fcd78 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=000000d0 CCD=00000068 CCO=SARL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted
Git bisect blames this
commit 41bd360325168b3c1db78eb7311420a1607d521f
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Sun Jan 15 17:48:25 2012 +0100

    seabios: Update to release 1.6.3.1

    User visible changes in seabios:
     - Probe HPET existence (fix for -no-hpet)
     - Probe PCI existence (fix for -machine isapc)
     - usb: fix boot paths

    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
I tried to bisect Seabios, but every revision in Seabios upstream works fine.
Then I noticed that if I rebuild the BIOS from the exact same 1.6.3.1 revision that is committed in the 'seabios' submodule in QEMU, then it works fine. So AFAICT it is not the SeaBIOS source code at fault, but rather the binary build we have committed to GIT. Should/can we rebuild the bios.bin in GIT?
Probably not without understanding what causes this strange inconsistency. If SeaBIOS builds without errors but the resulting binary then fails at runtime, that is a bug as well.
Kevin, what information do you need to assess my tool chain?
Jan
Jan Kiszka wrote:
Then I noticed that if I rebuild the BIOS from the exact same 1.6.3.1 revision that is committed in the 'seabios' submodule in QEMU, then it works fine. So AFAICT it is not the SeaBIOS source code at fault, but rather the binary build we have committed to GIT. Should/can we rebuild the bios.bin in GIT?
Probably not without understanding what causes this strange inconsistency. If SeaBIOS builds without errors but the resulting binary then fails at runtime, that is a bug as well.
Kevin, what information do you need to assess my tool chain?
In the coreboot project we have more than 10 years of experience with distribution toolchains consistently being too broken to build a working coreboot image. The same problems apply to SeaBIOS.
As you know, distribution toolchains are heavily patched, presumably to add some value to the distribution. The patches work fine when the toolchain is producing userland binaries or the odd kernel. They fail frequently, and in countless ways, when used to produce bare-metal binaries.
Within coreboot it is much less effort to build an i386-elf cross toolchain than to mess with the hundreds if not thousands of issues in the distribution toolchains. The same applies to SeaBIOS of course. The script we use in coreboot is here:
http://review.coreboot.org/gitweb?p=coreboot.git;a=tree;f=util/crossgcc
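For reference, a rough sketch of how that script is typically used and then pointed at a SeaBIOS build; the exact buildgcc behavior and the CROSS_PREFIX variable are assumptions from memory and may not match the trees of that era:

$ cd util/crossgcc
$ ./buildgcc                     # builds an i386-elf cross toolchain (roughly 20 minutes)
$ cd /path/to/seabios
$ make CROSS_PREFIX=i386-elf-    # assumes the SeaBIOS makefile honors CROSS_PREFIX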
If you want to investigate and spend time on motivating distributions to unbreak their toolchains that's awesome, but be prepared to spend many weeks disassembling binaries and reverse engineering the toolchain.
//Peter
On 2012-02-27 17:00, Peter Stuge wrote:
Jan Kiszka wrote:
Then I noticed that if I rebuild the BIOS from the exact same 1.6.3.1 revision that is committed in the 'seabios' submodule in QEMU, then it works fine. So AFAICT it is not the SeaBIOS source code at fault, but rather the binary build we have committed to GIT. Should/can we rebuild the bios.bin in GIT?
Probably not without understanding what causes this strange inconsistency. If SeaBIOS builds without errors but the resulting binary then fails at runtime, that is a bug as well.
Kevin, what information do you need to assess my tool chain?
In the coreboot project we have more than 10 years of experience with distribution toolchains consistently being too broken to build a working coreboot image. The same problems apply to SeaBIOS.
As you know, distribution toolchains are heavily patched, presumably to add some value to the distribution. The patches work fine when the toolchain is producing userland binaries or the odd kernel. They fail frequently, and in countless ways, when used to produce bare-metal binaries.
Within coreboot it is much less effort to build an i386-elf cross toolchain than to mess with the hundreds if not thousands of issues in the distribution toolchains. The same applies to SeaBIOS of course. The script we use in coreboot is here:
http://review.coreboot.org/gitweb?p=coreboot.git;a=tree;f=util/crossgcc
If you want to investigate and spend time on motivating distributions to unbreak their toolchains that's awesome, but be prepared to spend many weeks disassembling binaries and reverse engineering the toolchain.
Well, the Linux kernel can also be built with practically any distro out there. Having a need for a separate toolchain for building x86 on x86 is a bit overkill IMHO, at least for someone hacking on Seabios only infrequently like /me.
Jan
PS: Please avoid "mail-followup-to" in your replies, it messes up To/CC.
Jan Kiszka wrote:
Well, the Linux kernel can also be built with practically any distro out there.
Yeah. Maybe that gets tested more than building coreboot and SeaBIOS, and so problems are discovered by those who introduce them.
Having a need for a separate toolchain for building x86 on x86 is a bit overkill IMHO, at least for someone hacking on Seabios only infrequently like /me.
OTOH it takes about 20 minutes to build a toolchain on a reasonably fast machine, and the script makes it quite effortless.
PS: Please avoid "mail-followup-to" in your replies, it messes up To/CC.
Moved this discussion to private email.
//Peter
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
Does the error persist when run with "-m 2"? If more memory fixes the issue, then it is likely already fixed upstream (commit 890d9851). The bugs fixed in that commit are NULL pointer dereference errors - in SeaBIOS, a write through "NULL" actually alters the memory at address 0, which can corrupt the interrupt vector table. These can lead to unpredictable errors, as the timing between when an irq fires and when the corruption occurs can vary. DOS might overwrite the irq entries with its own settings, and thus depending on timing may cover up the error. In short, I wouldn't assume the problem is the toolchain.
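To see why a stray NULL write is so destructive in this environment, here is a minimal freestanding sketch (illustrative only, not SeaBIOS code): in real mode the interrupt vector table starts at physical address 0, so a store through a NULL pointer lands directly on the first vectors.

/* Illustrative only - freestanding, real-mode-style code, not SeaBIOS itself. */
typedef unsigned short u16;
typedef unsigned int   u32;

struct ivt_entry {                /* one real-mode interrupt vector */
    u16 offset;
    u16 segment;
};

/* The interrupt vector table occupies physical addresses 0x000-0x3ff;
 * entry N is the handler for "int N". */
#define IVT ((volatile struct ivt_entry *)0)

static void set_status(u32 *status)
{
    *status = 0;                  /* if 'status' is NULL, these 4 bytes land on */
}                                 /* IVT[0] and silently clobber that vector    */

Which vectors get hit depends on the offset of the bad write, which is one reason the symptoms vary with timing and with whatever DOS later installs over the same entries.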
-Kevin
On Wed, Feb 29, 2012 at 03:45:13AM -0500, Kevin O'Connor wrote:
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
Does the error persist when run with "-m 2"? If more memory fixes the issue, then it is likely already fixed upstream (commit 890d9851). The bugs fixed in that commit are NULL pointer dereference errors - in SeaBIOS, a write through "NULL" actually alters the memory at address 0, which can corrupt the interrupt vector table. These can lead to unpredictable errors, as the timing between when an irq fires and when the corruption occurs can vary. DOS might overwrite the irq entries with its own settings, and thus depending on timing may cover up the error. In short, I wouldn't assume the problem is the toolchain.
The error occurs no matter what '-m XX' setting I give it. I did a git bisect across SeaBIOS git from master down to rel-1.6.3.1 and could not reproduce it with any BIOS I built myself. Hence the only conclusion I could come to is that the QEMU binary was broken in some way.
Regards, Daniel
Modern operating systems do not use the BIOS much, but DOS uses it a lot. SeaBIOS is a BIOS, so it should be able to run operating systems that rely on the BIOS. So I think DOS is something that ought to run and work on SeaBIOS, if SeaBIOS implements the BIOS interfaces correctly.
Shouldn't DOS be a milestone for SeaBIOS? Perhaps there ought to be a BIOS test suite to check BIOS compliance.
On Wed, Feb 29, 2012 at 10:19 AM, Daniel P. Berrange berrange@redhat.com wrote:
On Wed, Feb 29, 2012 at 03:45:13AM -0500, Kevin O'Connor wrote:
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
Does the error persist when run with "-m 2"? If more memory fixes the issue, then it is likely already fixed upstream (commit 890d9851). The bugs fixed in that commit are NULL pointer dereference errors - in SeaBIOS, a write through "NULL" actually alters the memory at address 0, which can corrupt the interrupt vector table. These can lead to unpredictable errors, as the timing between when an irq fires and when the corruption occurs can vary. DOS might overwrite the irq entries with its own settings, and thus depending on timing may cover up the error. In short, I wouldn't assume the problem is the toolchain.
The error occurs no matter what '-m XX' setting I give it. I did a git bisect across SeaBIOS git from master down to rel-1.6.3.1 and could not reproduce it with any BIOS I built myself. Hence the only conclusion I could come to is that the QEMU binary was broken in some way.
Regards, Daniel
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
iPXE v1.0.0-591-g7aee315
iPXE (http://ipxe.org) 00:03.0 C900 PCI2.10 PnP PMM+00000000+00000000 C900

Booting from Floppy...
qemu: fatal: Trying to execute code outside RAM or ROM at 0x00000001000effff
EAX=ffffffff EBX=ffffffff ECX=0000c934 EDX=00000068
ESI=00006801 EDI=00000000 EBP=0000002b ESP=0000fff5
I traced this down, and it appears to be a stack size issue. It looks like MSDOS calls "int 0x13" with 229 bytes of stack space during its boot. On my build gcc generates the handle_13() function with a maximum of 140 bytes of stack space utilized (according to tools/checkstack.py). On your build, gcc created it with a maximum of 216 bytes. The entry functions use 42 bytes of stack space. Add it up and you can see that the additional stack space that gcc used caused %esp to wrap and the stack was corrupted.
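To make the arithmetic explicit (using the figures above):

  working build:  42 + 140 = 182 bytes needed, within the 229 bytes available
  failing build:  42 + 216 = 258 bytes needed, 29 bytes more than the 229 DOS left

which is why %esp wraps past the space DOS reserved and the stack gets corrupted.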
I'm not sure how to best work around this. One way is to sprinkle "noinline" keywords through disk.c. (It seems like gcc got in trouble on your build by inlining many functions into disk_13().) Another way would be to jump into the extra stack (the disk code already uses its own stack) earlier in the handle_13 code.
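For readers who haven't seen the keyword: 'noinline' in SeaBIOS code is shorthand for the GCC attribute that forbids inlining, so a marked helper keeps its locals in its own frame instead of having them folded into the dispatcher's frame. Assuming the usual definition, it amounts to:

/* Sketch of the idea; SeaBIOS defines a macro along these lines (assumed). */
#define noinline __attribute__((noinline))

/* Marked like this, the helper's locals no longer inflate disk_13()'s frame. */
static void noinline
disk_1305(struct bregs *regs, struct drive_s *drive_g);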
Also, can you see what happens if you change "--param large-stack-frame=4" to "--param large-stack-frame=0" in the build?
-Kevin
On 2012-03-19 01:29, Kevin O'Connor wrote:
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
On 2012-02-27 10:51, Daniel P. Berrange wrote:
I'm seeing current QEMU GIT fail to boot MS-Dos 6.22 with the following crash:
# qemu-system-x86_64 -fda ~/MS-DOS\ 6.22.img -m 1 -curses
iPXE v1.0.0-591-g7aee315
iPXE (http://ipxe.org) 00:03.0 C900 PCI2.10 PnP PMM+00000000+00000000 C900

Booting from Floppy...
qemu: fatal: Trying to execute code outside RAM or ROM at 0x00000001000effff
EAX=ffffffff EBX=ffffffff ECX=0000c934 EDX=00000068
ESI=00006801 EDI=00000000 EBP=0000002b ESP=0000fff5
I traced this down, and it appears to be a stack size issue. It looks like MSDOS calls "int 0x13" with 229 bytes of stack space during its boot. On my build gcc generates the handle_13() function with a maximum of 140 bytes of stack space utilized (according to tools/checkstack.py). On your build, gcc created it with a maximum of 216 bytes. The entry functions use 42 bytes of stack space. Add it up and you can see that the additional stack space that gcc used caused %esp to wrap and the stack was corrupted.
I'm not sure how to best work around this. One way is to sprinkle "noinline" keywords through disk.c. (It seems like gcc got in trouble on your build by inlining many functions into disk_13().) Another way would be to jump into the extra stack (the disk code already uses its own stack) earlier in the handle_13 code.
Also, can you see what happens if you change "--param large-stack-frame=4" to "--param large-stack-frame=0" in the build?
This makes no difference here, still 216 bytes.
Jan
On Mon, Mar 19, 2012 at 09:03:09PM +0100, Jan Kiszka wrote:
On 2012-03-19 01:29, Kevin O'Connor wrote:
On Mon, Feb 27, 2012 at 04:25:09PM +0100, Jan Kiszka wrote:
EAX=ffffffff EBX=ffffffff ECX=0000c934 EDX=00000068 ESI=00006801 EDI=00000000 EBP=0000002b ESP=0000fff5
I traced this down, and it appears to be a stack size issue. It looks like MSDOS calls "int 0x13" with 229 bytes of stack space during its boot. On my build gcc generates the handle_13() function with a maximum of 140 bytes of stack space utilized (according to tools/checkstack.py). On your build, gcc created it with a maximum of 216 bytes. The entry functions use 42 bytes of stack space. Add it up and you can see that the additional stack space that gcc used caused %esp to wrap and the stack was corrupted.
I'm not sure how to best work around this. One way is to sprinkle "noinline" keywords through disk.c. (It seems like gcc got in trouble on your build by inlining many functions into disk_13().) Another way would be to jump into the extra stack (the disk code already uses its own stack) earlier in the handle_13 code.
Also, can you see what happens if you change "--param large-stack-frame=4" to "--param large-stack-frame=0" in the build?
This makes no difference here, still 216 bytes.
I've noticed that if one takes a pointer to a variable on the stack, gcc seems to do a poor job managing stack space. I'm guessing the patch below would fix the issue. It's a bit ugly though.
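A minimal sketch of the pattern being described (loosely modeled on the disk code, with simplified stand-in declarations rather than the real ones): each helper builds a disk_op_s on its own stack and passes its address down, and once several such helpers are inlined into one dispatcher, gcc may keep a separate address-taken slot for every inlined copy instead of reusing the space.

/* Illustrative only - simplified stand-ins, not the real SeaBIOS declarations. */
struct disk_op_s { unsigned int lba, count, command; };
int process_op(struct disk_op_s *op);  /* assumed worker that consumes the op */

static void
sketch_read(unsigned int lba, unsigned int count)
{
    struct disk_op_s dop;              /* local whose address escapes below, so */
    dop.lba = lba;                     /* gcc must give it a real stack slot in */
    dop.count = count;                 /* whichever function it ends up being   */
    dop.command = 1;                   /* inlined into                          */
    process_op(&dop);
}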
-Kevin
diff --git a/src/disk.c b/src/disk.c
index 7a58af4..706b9f4 100644
--- a/src/disk.c
+++ b/src/disk.c
@@ -76,7 +76,7 @@ fillLCHS(struct drive_s *drive_g, u16 *nlc, u16 *nlh, u16 *nlspt)
 }
 
 // Perform read/write/verify using old-style chs accesses
-static void
+static void noinline
 basic_access(struct bregs *regs, struct drive_s *drive_g, u16 command)
 {
     struct disk_op_s dop;
@@ -119,7 +119,7 @@ basic_access(struct bregs *regs, struct drive_s *drive_g, u16 command)
 }
 
 // Perform read/write/verify using new-style "int13ext" accesses.
-static void
+static void noinline
 extended_access(struct bregs *regs, struct drive_s *drive_g, u16 command)
 {
     struct disk_op_s dop;
@@ -201,7 +201,7 @@ disk_1304(struct bregs *regs, struct drive_s *drive_g)
 }
 
 // format disk track
-static void
+static void noinline
 disk_1305(struct bregs *regs, struct drive_s *drive_g)
 {
     debug_stub(regs);
@@ -228,7 +228,7 @@ disk_1305(struct bregs *regs, struct drive_s *drive_g)
 }
 
 // read disk drive parameters
-static void
+static void noinline
 disk_1308(struct bregs *regs, struct drive_s *drive_g)
 {
     u16 ebda_seg = get_ebda_seg();
@@ -329,7 +329,7 @@ disk_1314(struct bregs *regs, struct drive_s *drive_g)
 }
 
 // read disk drive size
-static void
+static void noinline
 disk_1315(struct bregs *regs, struct drive_s *drive_g)
 {
     disk_ret(regs, DISK_RET_SUCCESS);
@@ -463,7 +463,7 @@ disk_1345(struct bregs *regs, struct drive_s *drive_g)
 }
 
 // IBM/MS eject media
-static void
+static void noinline
 disk_1346(struct bregs *regs, struct drive_s *drive_g)
 {
     if (regs->dl < EXTSTART_CD) {
Hi,
Maybe STACKS=9,512 would give it more stack space.
Rudolfa
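(For context: STACKS= here presumably refers to the MS-DOS CONFIG.SYS directive that reserves a pool of stacks for hardware interrupt handling, e.g.:

REM CONFIG.SYS (illustrative) - reserve 9 interrupt stacks of 512 bytes each
STACKS=9,512

Whether it would help here is doubtful, since the failing int 0x13 call happens while DOS is still being loaded from the floppy, before CONFIG.SYS has been processed.)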