Hi! I'm a major contributor to the gem5 open source computer architecture simulator, and I'm trying to get SeaBIOS and FreeDOS to run on it. We've had at least some level of x86 support in our simulator for a number of years now, but we've primarily focused on 64-bit mode. I've found a lot of bugs in our simulator as I've been going along, but despite my best efforts I haven't been able to find a way to blame my code for the bug I'm currently stuck on.
I'm using a very stripped down configuration of SeaBIOS, and am using the serial console to interact with it since I haven't written a simulated VGA interface yet. It reads FreeDOS from their published QEMU disk image, and it starts up and prints a menu where it wants me to select from 4 different boot modes. Roughly when I send a character over the serial connection, the simulator crashes because the software running on it tried to access an address that has nothing behind it.
I've been tracing this problem down, and I see an int 0x16 with code 0x1 happening, which is trying to check the keyboard status, I believe. That goes along and eventually calls check_irqs, which calls need_hop_back, which returns 1, and then calls "stack_hop_back" to call back into itself but on the "original callers stack". I'm about 75% sure I know what that's talking about, but not 100%.
Anyway, once we're on the other stack, we call into clock_poll_irq, which calls clock_update, which calls the (inlined, I think) sercon_check_event, and when that tries to SET_LOW(rx_buf[rx_bytes], byte) the bad access happens. At least I'm pretty confident that's where it happens; it could also be in one of the other lines right around there.
The problem seems to be that the variable it's trying to access is supposed to be in the "e" segment, i.e. with selector 0xe000 and base address 0xe0000. The code that does this is here:
fbd43: 8e c3 mov %bx,%es
fbd45: 26 8a 16 9d f7 mov %es:-0x863,%dl
fbd4a: 80 fa 0f cmp $0xf,%dl
fbd4d: 77 e6 ja fbd35 <clock_update+0x6b>
fbd4f: 66 0f b6 0e 9d f7 movzbl -0x863,%ecx <==== where it asplodes
fbd55: 26 67 88 81 a0 f7 00 mov %al,%es:0xf7a0(%ecx)
Note the comparison against 0xf, which I think is where it checks against the size of rx_buf.
You can see here that this access is (I think) using the %ds register by default: unlike the instructions around it, which carry the 0x26 %es override prefix, it has only an operand size prefix (0x66) and a 2 byte displacement of 0xf79d. Adding this to 0xe0000 gives 0xef79d, which from what I've seen is a pretty valid looking address, not far below where I have the BIOS ROM mapped in.
Unfortunately, when this goes wrong, %ds is actually 0x9d80, which gives a base of 0x9d800 and a linear address of 0xacf9d. This is in the middle of the (not yet implemented) VGA framebuffer, which is why it dies.
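As a sanity check on the arithmetic above, here is a minimal sketch of the address calculation. The helper names are made up for illustration, and it assumes plain real-mode addressing (base = selector * 16, 16-bit offset), which matches the bases quoted above.

```python
def disp16_unsigned(disp: int) -> int:
    """Reinterpret a signed 16-bit displacement as its unsigned encoding."""
    return disp & 0xFFFF


def linear_address(selector: int, offset: int) -> int:
    """Assumed real-mode translation: (selector << 4) + 16-bit offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


# objdump prints the displacement as -0x863; the encoded bytes are 9d f7.
offset = disp16_unsigned(-0x863)

# Address I'd expect if %ds held the "e" segment selector.
expected = linear_address(0xE000, offset)

# Address actually generated with the %ds value I observe after the hop back.
actual = linear_address(0x9D80, offset)

print(hex(offset), hex(expected), hex(actual))
```

With %ds = 0xe000 the access would land at 0xef79d, and with %ds = 0x9d80 it lands at 0xacf9d, so the offset itself looks right and only the segment is off.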
I then traced down why %ds has this value, and it's from the "hop back" step, specifically here:
asm volatile(
// Backup stack_pos and current %ss/%esp
"movl %6, %4\n"
"movw %%ss, %w3\n"
"movl %%esp, %6\n"
// Restore original callers' %ss/%esp
"movl -4(%4), %5\n"
"movl %5, %%ss\n" <======== Where %ss is set
"movw %%ds:-8(%4), %%sp\n"
"movl %5, %%ds\n" <======== Where %ds is set
// Call func
Note that in this code, *both* %ss and %ds are being set, and being set to the same thing. This value *was* successfully pulled off the saved data from when the int was originally called as far as I can tell, but this value of %ds does *not* seem to be correct, since the first time it's used it causes the bad access.
Could you please help me figure out what's going wrong here? Is this supposed to work out somehow, and my simulator is just wrong (my bet, but what's it doing wrong?), or is this a bug in SeaBIOS? Am I using SeaBIOS in some way it's known not to work?
Please let me know if you need any other info. I'll be more than happy to help get this sorted out!
Gabe