[OpenBIOS] More work on Solaris 8 SPARC32 crash
Mark Cave-Ayland
mark.cave-ayland at siriusit.co.uk
Mon Feb 14 11:29:36 CET 2011
On 13/02/11 22:17, Tarl Neustaedter wrote:
Hi Tarl,
>> Incidentally if I also enable romvec debugging in OpenBIOS this is
>> what I get on the console just before the crash:
>>
>> vac: enabled in write through mode
>> mem = 131072K (0x8000000)
>> avail mem = 110419968
>> obp_nextnode(0x0) = 0xffd4527c
>> obp_proplen(0xffd4527c, reg) (not found)
>> obp_proplen(0xffd4527c, ranges) (not found)
>> obp_proplen(0xffd4527c, intr) (not found)
>> obp_proplen(0xffd4527c, interrupts) (not found)
>
> That's not good. obp_nextnode() should be giving you a pointer to a
> valid node (I believe root), where it looks at properties.
Yes, that is actually what is happening in the trace above -
obp_nextnode(0x0) means 0 is being passed in, and then 0xffd4527c is
being returned as the handle.
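To make the trace easier to read, here is a minimal sketch of that calling convention using a mock tree. It is not the OpenBIOS romvec code: the mock_* names, the structure layout and the property lengths are made up for illustration, and the -1 "not found" return value is an assumption about what the debug output reflects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock; the real romvec node ops and tree layout differ. */
struct mock_prop {
    const char *name;
    int len;               /* length of the property value in bytes (made up) */
};

struct mock_node {
    unsigned int handle;   /* opaque handle returned to the client */
    unsigned int next;     /* next sibling handle, 0 when there is none */
    struct mock_prop props[3];
};

/* A root node with no reg/ranges/intr/interrupts, as in the trace. */
static const struct mock_node mock_tree[] = {
    { 0xffd4527cu, 0, { { "name", 16 }, { "device_type", 8 }, { NULL, 0 } } },
};

static const struct mock_node *mock_find(unsigned int handle)
{
    size_t i;

    for (i = 0; i < sizeof(mock_tree) / sizeof(mock_tree[0]); i++) {
        if (mock_tree[i].handle == handle)
            return &mock_tree[i];
    }
    return NULL;
}

/* A handle of 0 asks for the first node, i.e. the root. */
unsigned int mock_nextnode(unsigned int handle)
{
    const struct mock_node *n;

    if (handle == 0)
        return mock_tree[0].handle;
    n = mock_find(handle);
    return n ? n->next : 0;
}

/* Returns the value length, or -1 when the property is absent (assumed). */
int mock_proplen(unsigned int handle, const char *name)
{
    const struct mock_node *n = mock_find(handle);
    size_t i;

    assert(name != NULL);
    if (n == NULL)
        return -1;
    for (i = 0; n->props[i].name != NULL; i++) {
        if (strcmp(n->props[i].name, name) == 0)
            return n->props[i].len;
    }
    return -1;
}
```

With this mock, mock_nextnode(0) hands back the root handle, and asking that node for "reg" or "ranges" reports "not found" even though the handle itself is valid.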
> The divide by zero is probably Solaris signalling an error; if things
> are bad enough that it can't talk with the PROM (or doesn't trust it),
> it does a divide by zero to blow up. In
> usr/src/psm/promif/ieee1275/sun4/prom_init.c :
>
>
> /*
>  * Fatal promif internal error, not an external interface
>  */
>
> /*ARGSUSED*/
> void
> prom_fatal_error(const char *errormsg)
> {
>
>     volatile int zero = 0;
>     volatile int i = 1;
>
>     /*
>      * No prom interface, try to cause a trap by
>      * dividing by zero, leaving the message in %i0.
>      */
>
>     i = i / zero;
>     /*NOTREACHED*/
>
>
> I don't think this has anything to do with the PIL 14 or 10 issues you
> discuss later on.
Oh, that's interesting. However, I don't think that is the case here,
for two reasons:
1) The backtraces definitely point to an issue with clock initialisation
based upon the symbol names, and enabling the L14 timer allows the
previously faulting division to succeed, with a value between 0 and the
counter limit.
2) The address where the trap is invoked is definitely outside the main
kernel space by some margin, which makes me think that this is because
it is coming from an external kernel module which is being dynamically
loaded - otherwise if it were being caused by the above, I would expect
the trap address to be within the main kernel image.
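The check in point 2 amounts to a simple range test on the trap PC. A rough sketch follows, assuming made-up bounds for the main kernel image; the constants and function names are hypothetical, and real values would have to come from the loaded kernel's symbol table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds for illustration only; not taken from a real kernel. */
#define KERNEL_TEXT_START 0xf0040000u
#define KERNEL_TEXT_END   0xf0200000u   /* exclusive */

/* Returns 1 if pc lies in [start, end), 0 otherwise. */
int pc_in_range(uint32_t pc, uint32_t start, uint32_t end)
{
    assert(start <= end);
    return pc >= start && pc < end;
}

/* Returns 1 if the trap PC falls inside the (assumed) main kernel image. */
int trap_in_main_kernel(uint32_t pc)
{
    return pc_in_range(pc, KERNEL_TEXT_START, KERNEL_TEXT_END);
}
```

A trap PC for which trap_in_main_kernel() returns 0 would be consistent with the fault coming from a dynamically loaded module rather than from prom_fatal_error() in the main image.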
ATB,
Mark.
--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs