On Fri, Jul 08, 2011 at 09:46:32AM +0200, Bjørn Mork wrote:
> "Kevin O'Connor" <kevin@koconnor.net> writes:
>> It's possible that the OS has an error in handling the SMBIOS when it is in high-memory (located above 1meg). (For example, older versions of Linux crash when the mptable is in high memory.)
> [...]
>> However, it would be really odd for the OS to work some times with the SMBIOS in high memory and sometimes fail.
> Yes. Just to be perfectly clear: The crash with SMBIOS in high memory happens every time with "recent" (anything from 2009 or later) SeaBIOS versions.
> I must admit that I right now am wondering whether I somehow screwed up the previous testing of older versions. I am not at all sure under what circumstances older SeaBIOS would work with SMBIOS enabled.
I investigated this a bit and I believe the Juniper OS has two SMBIOS bugs: it crashes when the table is in high memory, and when searching for the table it only checks the first 16-byte-aligned "_SM_" signature it finds.
The first bug looks likely because when one runs QEMU with "-d in_asm,int,exec" there is a "check_exception" entry near the end of the log with an address close to the address of the high-memory SMBIOS table.
The second bug causes the OS to ignore the SMBIOS table if the SeaBIOS code layout happens to put an "_SM_" signature on a 16-byte boundary before the real SMBIOS table. This is what led to the "Bad SMBIOS data checksum" messages. When this happened, the OS would go on to boot okay, because it then didn't try to read the real SMBIOS table from high memory. Since SeaBIOS has the signature in its code (it needs it to build the table), there was roughly a 1 in 16 chance that a random build would happen to lay out the code in a way that would confuse the OS and cause it to ignore the real SMBIOS table.
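To make the second bug concrete, here is a minimal sketch of the two scanning strategies. It is not taken from the Juniper OS or from SeaBIOS: the function names and the fake 64K segment are made up for illustration, and the only facts relied on are the SMBIOS 2.x entry point layout (the "_SM_" anchor at offset 0, a checksum byte at offset 4, a length byte at offset 5, and all bytes summing to zero modulo 256).

```python
ANCHOR = b"_SM_"
EPS_LEN = 0x1F  # length of the SMBIOS 2.x entry point structure


def checksum_ok(buf, off):
    """True if the entry point at `off` sums to 0 modulo 256."""
    if off + 6 > len(buf):
        return False
    length = buf[off + 5]  # byte 5 of the entry point holds its length
    if length < 6 or off + length > len(buf):
        return False
    return sum(buf[off:off + length]) % 256 == 0


def find_first_only(buf):
    """Assumed guest behaviour: trust the first aligned anchor, give up if it is bad."""
    for off in range(0, len(buf) - 3, 16):
        if buf[off:off + 4] == ANCHOR:
            return off if checksum_ok(buf, off) else None
    return None


def find_validating(buf):
    """More robust scan: skip anchors that fail the checksum and keep looking."""
    for off in range(0, len(buf) - 3, 16):
        if buf[off:off + 4] == ANCHOR and checksum_ok(buf, off):
            return off
    return None


def make_entry_point():
    """Build a minimal entry point whose only meaningful property is a valid checksum."""
    eps = bytearray(EPS_LEN)
    eps[0:4] = ANCHOR
    eps[5] = EPS_LEN
    eps[4] = (-sum(eps)) % 256  # choose the checksum byte so the total is 0 mod 256
    return bytes(eps)


def build_segment():
    """Hypothetical f-segment image: a stray signature in code, then the real entry point."""
    seg = bytearray(0x10000)
    seg[0x100:0x104] = ANCHOR  # stray "_SM_" that happens to be 16-byte aligned
    seg[0x200:0x200 + EPS_LEN] = make_entry_point()
    return bytes(seg)


if __name__ == "__main__":
    seg = build_segment()
    print("first-only scan:", find_first_only(seg))
    print("validating scan:", find_validating(seg))
```

With the stray signature placed ahead of the real one, the first-only scan gives up while the validating scan still finds the entry point at 0x200. The 1 in 16 figure follows from alignment alone: a signature at an arbitrary byte offset falls on a 16-byte boundary for one offset out of every sixteen.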
Long story short, the Juniper OS probably started failing at commit 2929c352b but bisect didn't find it because random other builds would pass due to the second bug.
> I tried malloc_low() too, and that works as well. But malloc_fseg() seems appropriate, unless I've misunderstood something here. Which very well can be. I am not going to claim any understanding at all.
So, moving the SMBIOS back to the f-segment would fix this. But there's an issue with that.
The SMBIOS table is normally pretty small (e.g., 263 bytes), but it grows with the amount of RAM and the number of CPUs. If one starts QEMU with 255 CPUs, the SMBIOS size is over 11K. Fitting that into the f-segment (64K shared with all the other 16-bit code and data) is going to be a real problem. This is also an issue for the mptable (which is over 5K at 255 CPUs), but newer OSes don't care about the mptable. The same is not true of the SMBIOS table.
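As a rough back-of-the-envelope check of those numbers, the sketch below computes what share of the 64K f-segment each table would take. The sizes are the figures quoted above; "over 11K" and "over 5K" are treated as exactly 11K and 5K, so the results are lower bounds, and the names are made up for illustration.

```python
FSEG_BYTES = 64 * 1024     # f-segment size, shared with all 16-bit code and data
SMBIOS_SMALL = 263         # typical small SMBIOS table, from the text
SMBIOS_255 = 11 * 1024     # "over 11K" at 255 CPUs, taken as a lower bound
MPTABLE_255 = 5 * 1024     # "over 5K" at 255 CPUs, taken as a lower bound


def share(nbytes):
    """Fraction of the f-segment that `nbytes` would occupy."""
    return nbytes / FSEG_BYTES


if __name__ == "__main__":
    for label, size in [
        ("small SMBIOS", SMBIOS_SMALL),
        ("SMBIOS at 255 CPUs", SMBIOS_255),
        ("mptable at 255 CPUs", MPTABLE_255),
        ("both at 255 CPUs", SMBIOS_255 + MPTABLE_255),
    ]:
        print(f"{label}: {size} bytes, {share(size):.1%} of the f-segment")
```

Under those assumptions the two tables together need at least a quarter of the segment at 255 CPUs, before counting any of the code and data that already live there.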
I'm not sure what to do.

-Kevin