[OpenBIOS] More work on Solaris 8 SPARC32 crash

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Mon Feb 14 11:29:36 CET 2011


On 13/02/11 22:17, Tarl Neustaedter wrote:

Hi Tarl,

>> Incidentally if I also enable romvec debugging in OpenBIOS this is
>> what I get on the console just before the crash:
>>
>> vac: enabled in write through mode
>> mem = 131072K (0x8000000)
>> avail mem = 110419968
>> obp_nextnode(0x0) = 0xffd4527c
>> obp_proplen(0xffd4527c, reg) (not found)
>> obp_proplen(0xffd4527c, ranges) (not found)
>> obp_proplen(0xffd4527c, intr) (not found)
>> obp_proplen(0xffd4527c, interrupts) (not found)
>
> That's not good. obp_nextnode() should be giving you a pointer to a
> valid node (I believe root), where it looks at properties.

Yes, that is actually what is happening in the trace above:
obp_nextnode(0x0) means that 0 is being passed in, and 0xffd4527c is
being returned as the handle.
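
To make the trace easier to read, here is a minimal C sketch of the convention involved: passing 0 to the nextnode call yields the first node, and a property-length query reports failure for properties the node does not have. Everything in it is a hypothetical toy model rather than OpenBIOS or Solaris code: the demo_ names, the one-entry node table, the "name" property with its length, and the use of -1 for "not found" are all assumptions made for illustration. Only the handle value 0xffd4527c comes from the trace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a romvec-style node interface. */
struct demo_prop {
    const char *name;
    int len;                      /* property length in bytes */
};

struct demo_node {
    unsigned int handle;          /* opaque node handle */
    unsigned int sibling;         /* next sibling handle, 0 if none */
    const struct demo_prop *props;
    size_t nprops;
};

/* Assumed property set: this node has a "name" but no "reg" etc. */
static const struct demo_prop demo_root_props[] = {
    { "name", 5 },
};

/* One node only; the handle value is the one seen in the trace. */
static const struct demo_node demo_tree[] = {
    { 0xffd4527cu, 0u, demo_root_props, 1 },
};

static const struct demo_node *demo_find(unsigned int handle)
{
    size_t i;

    for (i = 0; i < sizeof(demo_tree) / sizeof(demo_tree[0]); i++) {
        if (demo_tree[i].handle == handle)
            return &demo_tree[i];
    }
    return NULL;
}

/* A node argument of 0 means "give me the first node". */
unsigned int demo_nextnode(unsigned int node)
{
    const struct demo_node *n;

    if (node == 0)
        return demo_tree[0].handle;
    n = demo_find(node);
    return n ? n->sibling : 0;
}

/* Returns the property length, or -1 (assumed convention) if absent. */
int demo_proplen(unsigned int node, const char *name)
{
    const struct demo_node *n = demo_find(node);
    size_t i;

    if (n == NULL)
        return -1;
    for (i = 0; i < n->nprops; i++) {
        if (strcmp(n->props[i].name, name) == 0)
            return n->props[i].len;
    }
    return -1;
}
```

In this model demo_nextnode(0) hands back a valid handle, and the "not found" results for reg, ranges, intr and interrupts simply mean that this particular node lacks those properties.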

> The divide by zero is probably Solaris signalling an error; if things
> are bad enough that it can't talk with the PROM (or doesn't trust it),
> it does a divide by zero to blow up. In
> usr/src/psm/promif/ieee1275/sun4/prom_init.c :
>
>
> /*
>  * Fatal promif internal error, not an external interface
>  */
>
> /*ARGSUSED*/
> void
> prom_fatal_error(const char *errormsg)
> {
>
>         volatile int zero = 0;
>         volatile int i = 1;
>
>         /*
>          * No prom interface, try to cause a trap by
>          * dividing by zero, leaving the message in %i0.
>          */
>
>         i = i / zero;
>         /*NOTREACHED*/
> }
>
>
> I don't think this has anything to do with the PIL 14 or 10 issues you
> discuss later on.

Oh, that's interesting. However, I don't think that this is the case
here, for two reasons:

1) The backtraces definitely point to an issue with clock initialisation,
based upon the symbol names, and enabling the L14 timer allows the
division that previously trapped to succeed, with a value between 0 and
the counter limit.

2) The address where the trap is invoked is definitely outside the main
kernel space by some margin, which suggests that it comes from an
external kernel module that is being dynamically loaded. If the trap
were being caused by the above, I would expect the trap address to be
within the main kernel image.
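
The reasoning in point 2 boils down to a range check on the trap address. A minimal sketch follows; the struct, the function name and whatever bounds are passed in are hypothetical, since the real bounds of the kernel image are not given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct demo_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Returns true if trap_pc lies inside the given image range.
 * A trap raised by code linked into the main kernel would be expected
 * to pass this check; one raised from a dynamically loaded module
 * would be expected to fail it.
 */
bool demo_trap_in_image(const struct demo_range *image, uint32_t trap_pc)
{
    return trap_pc >= image->start && trap_pc < image->end;
}
```

With made-up bounds such as { 0x1000, 0x2000 }, an address of 0x1800 falls inside the image and 0x3000 falls outside it.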


ATB,

Mark.

-- 
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


