On 03/08/13 04:35, Kevin O'Connor wrote:
On Thu, Mar 07, 2013 at 09:43:04AM +0100, Aurelien Jarno wrote:
On Wed, Mar 06, 2013 at 07:53:51PM -0500, Kevin O'Connor wrote:
That change is definitely just build-related - I don't see how it could impact the final SeaBIOS binary. How did you conclude that this commit is what fixes the issue?
I ran a git bisect to find the commit that fixes the issue. Then, since I did not believe the result, I repeated the following sequence a dozen times (for some unknown reason the FreeBSD install CD doesn't exhibit the issue, so I used the Debian GNU/kFreeBSD installer):
[...]
Thanks for the detailed bug report. Here's what I see going on:
The SeaBIOS 4219149a commit does change the resulting binary ever so slightly - the src/virtio_ring.c code has a reference to __FILE__ (the only code in SeaBIOS that does that), and because the build rules differ slightly in this commit, __FILE__ evaluates to a different string.
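To illustrate the __FILE__ point, here is a minimal sketch. It is not the SeaBIOS source; SKETCH_WHERE, sketch_where() and sketch_print_where() are hypothetical names made up for this example. It shows how a __FILE__ reference places the source path, as the compiler was given it, into the binary as a string literal:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: __FILE__ expands to a string literal holding this
 * file's name as the compiler was given it, so the concatenated literal
 * changes whenever the build rule spells the path differently. */
#define SKETCH_WHERE "built from " __FILE__

/* Returns the embedded string. Its bytes end up in the object file's
 * read-only data, so a different path spelling yields a different binary. */
static const char *sketch_where(void)
{
    return SKETCH_WHERE;
}

/* Prints the embedded string and its length, for comparing the result
 * of two different build invocations by hand. */
static void sketch_print_where(void)
{
    printf("%s (%zu bytes)\n", sketch_where(), strlen(sketch_where()));
}
```

With GCC, for example, compiling a file as "src/foo.c" from the top directory and as "foo.c" from inside src/ would typically embed two different strings (foo.c is a placeholder name here), which is enough to shift everything laid out after the string.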
The FreeBSD crash has nothing to do with 4219149a or src/virtio_ring.c - instead, random changes in the SeaBIOS binary layout can cause (or avoid) the crash. You can see this in action by modifying SeaBIOS to have higher debug levels, commenting out code, adding dprintf statements, etc.
The crash happens when FreeBSD attempts to emulate the BIOS code (!) in order to determine the keyboard typematic rate (!). (See sys/dev/atkbdc/atkbd.c.) Since SeaBIOS doesn't support the typematic callback rate (int 0x16 ax=0x0306), the call would not achieve anything in practice even if it did not crash. However, a crash does (sometimes) result.
The FreeBSD x86bios_get_pages() code is buggy (see sys/compat/x86bios/x86bios.c). It attempts to check that its x86 emulator (!) doesn't access a page it hasn't mapped. However, it does not check for the case where a two-byte access spans two pages. If the first page is mapped, but the second is not - splat. The crash I've seen in QEMU had a two-byte access to 0xffffff8000015fff with the fault at 0xffffff8000016000.
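The page-spanning case can be sketched as follows. This is not the FreeBSD code: the sketch_* names, the 4 KiB page size, and the tiny per-page table are all assumptions made for illustration, and the offsets 0x15fff and 0x16000 used in the note below merely mirror the low bits of the addresses above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define SKETCH_NPAGES    32u    /* assumption: tiny emulated address space */

/* Hypothetical per-page "is mapped" table for the emulated address space. */
struct sketch_map {
    bool mapped[SKETCH_NPAGES];
};

/* True if the page containing addr is inside the table and mapped. */
static bool sketch_page_mapped(const struct sketch_map *m, uint32_t addr)
{
    uint32_t page = addr / SKETCH_PAGE_SIZE;
    return page < SKETCH_NPAGES && m->mapped[page];
}

/* Incomplete check: validates only the page holding the first byte,
 * so an access that spills into the next page slips through. */
static bool sketch_check_first_byte(const struct sketch_map *m,
                                    uint32_t addr, uint32_t size)
{
    (void)size; /* the access size is ignored, which is the problem */
    return sketch_page_mapped(m, addr);
}

/* Fuller check: validates the pages holding the first and the last byte.
 * That is sufficient for accesses no larger than one page. */
static bool sketch_check_whole_access(const struct sketch_map *m,
                                      uint32_t addr, uint32_t size)
{
    if (size == 0 || size > SKETCH_PAGE_SIZE)
        return false;
    if (addr > UINT32_MAX - (size - 1))
        return false; /* access would wrap around the address space */
    return sketch_page_mapped(m, addr) &&
           sketch_page_mapped(m, addr + size - 1);
}
```

With only page 0x15 mapped, sketch_check_first_byte() accepts a two-byte access at 0x15fff even though its second byte lies at 0x16000, while sketch_check_whole_access() rejects it.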
I have not been able to determine why an attempt was made to access a non-mapped page. My best guess is that the x86emu code (!) goes off the deep end in all cases - some cases just lead it to the bug above and other cases lead it to a more friendly termination. (Recall that SeaBIOS doesn't support the typematic call anyway.) It should be possible to track this down by adding debug statements to the FreeBSD code if anyone is familiar with the FreeBSD kernel compile-deploy-run cycle.
Great analysis!
Laszlo (sorry for the noise)