[OpenBIOS] Analysis of current Solaris 8 boot failure on SPARC32

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Mon Jan 3 14:51:27 CET 2011


On 02/01/11 11:14, Mark Cave-Ayland wrote:

>> The kernel stack is overflown. Perhaps some recursive loop (iterating
>> device tree, since this doesn't happen on real hardware?) never exits,
>> or maybe OpenBIOS consumes kernel stack much more than OBP.
>
> Yeah, that's the conclusion I came to although I'm not really familiar
> with the overall Solaris boot process to figure out what should happen
> as opposed to what does happen.

After some fiddling with the Solaris ISO, I extracted the SS-5 kernel, 
loaded its symbols into gdb and stepped through various parts of the 
boot to see what is happening.

Stepping through the code manually, it looks like we pass through the 
following functions:

startup()
hat_kern_setup()
vac_flush()
fix_prom_pages()

Moving further, it's a little difficult to tell, but it looks as if 
we're dying in some kind of MMU setup here:

#0  0xf007057c in page_numtopp_nolock ()
#1  0xf005517c in load_l3 ()
#2  0xf0054e7c in load_l2 ()
#3  0xf024429c in rootfs ()
#4  0xf024429c in rootfs ()

I added a breakpoint at 0xf02442a0, and it was not reached before the 
fatal trap fired. Looking at these routines in the OpenSolaris source, 
fix_prom_pages() does some interesting things with memory lists to work 
out which parts of memory are already mapped, so my current suspicion 
is that the memory lists are somehow wrong.

Does anyone know whether Solaris 8 uses the romvec v0 memlist arrays or 
whether it uses the relevant properties read directly from the 
/virtual-memory and /memory nodes?
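
To make the "memory lists are somehow wrong" suspicion concrete, here is 
a minimal sketch of the kind of sanity check that could be run over a 
v0-style memlist before handing control to the kernel. Everything in it 
is an assumption for illustration: the node layout (a singly linked list 
of start/size ranges), the names memlist_node and memlist_check, and the 
choice of checks. It is not taken from the Solaris or OpenBIOS sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: a singly linked list of (start, size)
 * ranges. Field and type names are illustrative only. */
struct memlist_node {
    struct memlist_node *next;
    uint32_t start;   /* first address covered by the range */
    uint32_t size;    /* length of the range in bytes */
};

enum memlist_status {
    MEMLIST_OK = 0,
    MEMLIST_TOO_LONG,     /* more than max_nodes entries, possibly a cycle */
    MEMLIST_EMPTY_RANGE,  /* an entry with size == 0 */
    MEMLIST_WRAPS,        /* start + size runs past the 32-bit address space */
    MEMLIST_OVERLAP       /* two entries cover the same address */
};

/* Check one list. max_nodes bounds the walk so that a corrupted,
 * cyclic list is reported instead of being followed forever. */
enum memlist_status memlist_check(const struct memlist_node *head,
                                  size_t max_nodes)
{
    const struct memlist_node *p, *q;
    size_t n = 0;

    /* Pass 1: bounded walk with per-entry checks. */
    for (p = head; p != NULL; p = p->next) {
        if (++n > max_nodes)
            return MEMLIST_TOO_LONG;
        if (p->size == 0)
            return MEMLIST_EMPTY_RANGE;
        if ((uint64_t)p->start + p->size > 0x100000000ULL)
            return MEMLIST_WRAPS;
    }

    /* Pass 2: pairwise overlap test. The list is now known to be
     * finite, and no particular ordering of entries is assumed. */
    for (p = head; p != NULL; p = p->next) {
        uint64_t p_end = (uint64_t)p->start + p->size;
        for (q = p->next; q != NULL; q = q->next) {
            uint64_t q_end = (uint64_t)q->start + q->size;
            if (p->start < q_end && q->start < p_end)
                return MEMLIST_OVERLAP;
        }
    }
    return MEMLIST_OK;
}
```

The walk is bounded because a list that never terminates is one way a 
consumer could end up looping instead of finishing. Whether that is what 
happens here is unverified.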


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


