On 02/01/11 11:14, Mark Cave-Ayland wrote:
The kernel stack has overflowed. Perhaps some recursive loop (iterating over the device tree, since this doesn't happen on real hardware?) never exits, or maybe OpenBIOS consumes much more kernel stack than OBP.
Yeah, that's the conclusion I came to, although I'm not familiar enough with the overall Solaris boot process to figure out what should happen as opposed to what does happen.
After some fiddling with the Solaris ISO, I extracted the SS-5 kernel, loaded its symbols into gdb and tried stepping through various bits to see what is happening.
Stepping through the code manually, it looks like we're going through the following functions:
startup()
hat_kern_setup()
vac_flush()
fix_prom_pages()
Moving further, it's a little difficult to tell, but it looks as if we're dying in some kind of MMU setup here:
#0 0xf007057c in page_numtopp_nolock ()
#1 0xf005517c in load_l3 ()
#2 0xf0054e7c in load_l2 ()
#3 0xf024429c in rootfs ()
#4 0xf024429c in rootfs ()
I added a breakpoint at 0xf02442a0 and that wasn't reached before the fatal trap fired. Taking a look at these routines in the OpenSolaris source, it looks like fix_prom_pages() does some interesting things with memory lists to work out which parts of memory are already mapped, and so my current suspicion is that the memory lists are somehow wrong.
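One way to test that suspicion might be a quick sanity check over each memory list before it is handed to the kernel. The following is only a rough sketch: the struct layout and every name in it (struct memlist, memlist_check) are hypothetical, modelled on the general shape of a linked list of (start, size) ranges rather than taken from the Solaris or OpenBIOS sources. It flags zero-sized entries, ranges that wrap past the 32-bit address space, and overlapping ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one node of a linked memory list. */
struct memlist {
    struct memlist *next;   /* NULL terminates the list */
    uint32_t start;         /* start address of the range */
    uint32_t size;          /* length of the range in bytes */
};

/*
 * Return 0 if the list looks sane, otherwise a negative code:
 *   -1  an entry has zero size
 *   -2  an entry extends past the 32-bit address space
 *   -3  two entries overlap
 */
int memlist_check(const struct memlist *head)
{
    for (const struct memlist *a = head; a != NULL; a = a->next) {
        if (a->size == 0)
            return -1;

        /* Compute the end in 64 bits so that wrap-around is visible. */
        uint64_t a_end = (uint64_t)a->start + a->size;
        if (a_end > 0x100000000ULL)
            return -2;

        /* Compare against every later entry; the list may be unsorted. */
        for (const struct memlist *b = a->next; b != NULL; b = b->next) {
            uint64_t b_end = (uint64_t)b->start + b->size;
            if (a->start < b_end && b->start < a_end)
                return -3;
        }
    }
    return 0;
}
```

Note that this does not catch a list that never terminates, and it says nothing about whether the ranges match what the MMU actually has mapped, which would need a separate comparison.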
Does anyone know whether Solaris 8 uses the romvec v0 memlist arrays or whether it uses the relevant properties read directly from the /virtual-memory and /memory nodes?
ATB,
Mark.