Hi all,
Currently attempting to boot a Solaris 8 install CD results in the following output:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Jan 2 2011 00:28
Type 'help' for detailed information
Trying cdrom:d...
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
qemu: fatal: Trap 0x29 while interrupts disabled, Error state
pc: f004127c  npc: f0041280
General Registers:
%g0-7: 00000000 00000808 00000001 f0041b74 00000000 f0243b88 00000000 f0244020
Current Register Window:
%o0-7: f025831c f5a0f00c f0240374 f0240370 f024036c 00000004 f0240300 f005bd84
%l0-7: 04400cc2 f005bf94 f005bf98 00000004 00000209 00000004 00000000 f023fe60
%i0-7: 00000001 f02403f4 f5a0f00c f025831c 00000001 00000009 f023ff08 f005c6b8
Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
psr: 04000cc2 (icc: ---- SPE: SP-) wim: 00000004
fsr: 00080000 y: 00000000
Aborted
With the SPARC32 OFMEM migration complete, we can now get lots of debugging information about the memory mappings being made at run time. Setting a breakpoint at the crash address shows that it is part of a loop that is called several times during boot, so we can compare the successful iterations of the loop with the failing one to determine where the crash happens.
Here is the gdb output from the last successful iteration of the loop:
Breakpoint 1, 0xf004127c in ?? ()
(gdb) disas 0xf0041270 0xf00412a0
Dump of assembler code from 0xf0041270 to 0xf00412a0:
0xf0041270:     rett  %l2 + 4
0xf0041274:     b  0xf004127c
0xf0041278:     nop
0xf004127c:     mov  1, %l5     ! 0x1
0xf0041280:     sll  %l5, %l0, %l5
0xf0041284:     rd  %wim, %l3
0xf0041288:     btst  0x40, %l0
0xf004128c:     be  0xf0041318
0xf0041290:     btst  %l3, %l5
0xf0041294:     sub  %fp, 0xa8, %l7
0xf0041298:     st  %g1, [ %l7 + 0x6c ]
0xf004129c:     std  %g2, [ %l7 + 0x70 ]
End of assembler dump.
(gdb) info regi
g0             0x0        0
g1             0x808      2056
g2             0xf5a0f000 -174002176
g3             0x19       25
g4             0x0        0
g5             0xf0243b88 -266060920
g6             0x0        0
g7             0xf0244020 -266059744
o0             0x0        0
o1             0xf02406b4 -266074444
o2             0xf5a0f00c -174002164
o3             0xf0258398 -265976936
o4             0xf0252b10 -265999600
o5             0x0        0
sp             0xf0240658 0xf0240658
o7             0xf0041b74 -268166284
l0             0x4400cc0  71306432
l1             0xf004b1f8 -268127752
l2             0xf004b1fc -268127748
l3             0xf0041000 -268169216
l4             0x209      521
l5             0x1        1
l6             0x7        7
l7             0xf0240658 -266074536
i0             0xf024d870 -266020752
i1             0x0        0
i2             0xff812201 -8314367
i3             0x0        0
i4             0x0        0
i5             0xf01582dc -267025700
fp             0xf0240290 0xf0240290
i7             0xf004ef98 -268111976
y              0x0        0
psr            0x4400cc0  [ PS S #10 #11 #22 #26 ]
wim            0x1        1
tbr            0xf0040090 -268173168
pc             0xf004127c 0xf004127c
npc            0xf0041280 0xf0041280
fsr            0x80000    [ #19 ]
csr            0x0        0
(gdb) stepi
0xf0041280 in ?? ()
(gdb)
0xf0041284 in ?? ()
(gdb)
0xf0041288 in ?? ()
(gdb)
0xf004128c in ?? ()
(gdb)
0xf0041290 in ?? ()
(gdb)
0xf0041294 in ?? ()
(gdb)
0xf0041298 in ?? ()
(gdb)
0xf004129c in ?? ()
(gdb) info regi
g0             0x0        0
g1             0x808      2056
g2             0xf5a0f000 -174002176
g3             0x19       25
g4             0x0        0
g5             0xf0243b88 -266060920
g6             0x0        0
g7             0xf0244020 -266059744
o0             0x0        0
o1             0xf02406b4 -266074444
o2             0xf5a0f00c -174002164
o3             0xf0258398 -265976936
o4             0xf0252b10 -265999600
o5             0x0        0
sp             0xf0240658 0xf0240658
o7             0xf0041b74 -268166284
l0             0x4400cc0  71306432
l1             0xf004b1f8 -268127752
l2             0xf004b1fc -268127748
l3             0x1        1
l4             0x209      521
l5             0x1        1
l6             0x7        7
l7             0xf02401e8 -266075672
i0             0xf024d870 -266020752
i1             0x0        0
i2             0xff812201 -8314367
i3             0x0        0
i4             0x0        0
i5             0xf01582dc -267025700
fp             0xf0240290 0xf0240290
i7             0xf004ef98 -268111976
y              0x0        0
psr            0x4000cc0  [ PS S #10 #11 #26 ]
wim            0x1        1
tbr            0xf0040090 -268173168
pc             0xf004129c 0xf004129c
npc            0xf00412a0 0xf00412a0
fsr            0x80000    [ #19 ]
csr            0x0        0
And here is the failing version:
(gdb) info regi
g0             0x0        0
g1             0x80       128
g2             0xf5a0f000 -174002176
g3             0x1a       26
g4             0x0        0
g5             0xf0243b88 -266060920
g6             0x0        0
g7             0xf0244020 -266059744
o0             0x0        0
o1             0xf024047c -266075012
o2             0xf5a0f00c -174002164
o3             0xf0258398 -265976936
o4             0xf0252b10 -265999600
o5             0x0        0
sp             0xf0240420 0xf0240420
o7             0xf0041b74 -268166284
l0             0x4400cc4  71306436
l1             0xf004b1f8 -268127752
l2             0xf004b1fc -268127748
l3             0x10       16
l4             0x209      521
l5             0x10       16
l6             0x7        7
l7             0xf023ffb0 -266076240
i0             0xf5a0f01c -174002148
i1             0x100      256
i2             0xf0000000 -268435456
i3             0xff000000 -16777216
i4             0x4100cc5  68160709
i5             0x4100ce5  68160741
fp             0xf0240058 0xf0240058
i7             0xf0054be8 -268088344
y              0x0        0
psr            0x4000cc4  [ #2 PS S #10 #11 #26 ]
wim            0x10       16
tbr            0xf0040090 -268173168
pc             0xf00412a4 0xf00412a4
npc            0xf00412a8 0xf00412a8
fsr            0x80000    [ #19 ]
csr            0x0        0
(gdb) cont
Continuing.
Breakpoint 1, 0xf004127c in ?? ()
(gdb) stepi
0xf0041280 in ?? ()
(gdb)
0xf0041284 in ?? ()
(gdb)
0xf0041288 in ?? ()
(gdb)
0xf004128c in ?? ()
(gdb)
0xf0041290 in ?? ()
(gdb)
0xf0041294 in ?? ()
(gdb)
0xf0041298 in ?? ()
(gdb)
Remote connection closed
(gdb)
So the failure appears to be happening on this instruction:
0xf0041298: st %g1, [ %l7 + 0x6c ]
For the successful iteration:
l7 0xf02401e8 -266075672
For the failing iteration:
l7 0xf023ffb0 -266076240
With OFMEM debugging enabled, it's fairly easy to see the following in the console output:
Jumping to entry point 00004000 for type 00000005...
switching to new context:
OFMEM: ofmem_claim phys=ffffffffffffffff size=00040000 align=00000008
OFMEM: ofmem_claim_virt virt=f0040000 size=00040000 align=00000000
OFMEM: ofmem_map_page_range f0040000 -> 006fc0000 00040000 mode 000000bc
OFMEM: ofmem_claim phys=ffffffffffffffff size=00019000 align=00000008
OFMEM: ofmem_claim_virt virt=f0240000 size=00019000 align=00000000
OFMEM: ofmem_map_page_range f0240000 -> 006fa7000 00019000 mode 000000bc
So what is happening is that %l7 is being set below 0xf0240000, and the trap is triggered because the kernel is attempting to write to unmapped virtual memory.
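To make the arithmetic concrete, here is a quick sketch using only the values from the logs above (the 0xa8 per-frame figure comes from the "sub %fp, 0xa8, %l7" in the disassembly):

#include <stdio.h>

int main(void)
{
    /* Values taken from the OFMEM debug output and gdb dumps above. */
    unsigned int map_base = 0xf0240000;  /* base of the claimed region */
    unsigned int map_size = 0x00019000;  /* size of the claimed region */
    unsigned int l7_good  = 0xf02401e8;  /* %l7 on the last good pass  */
    unsigned int l7_bad   = 0xf023ffb0;  /* %l7 on the failing pass    */

    printf("mapped: 0x%x .. 0x%x\n", map_base, map_base + map_size);
    printf("good %%l7: 0x%x above base\n", l7_good - map_base);  /* 0x1e8 */
    printf("bad  %%l7: 0x%x below base\n", map_base - l7_bad);   /* 0x50  */
    return 0;
}

So the last good trap frame sits only 0x1e8 bytes above the bottom of the mapping, and the next one lands 0x50 bytes below it.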
Using Artyom's blog, I was able to fire up kadb to try and figure out which part of the kernel is raising the exception:
kadb[0]: 0xf004127c?
sys_trap:
sys_trap:       aa102001 =      mov 0x1, %l5
kadb[0]:
Based upon this, it would seem that the Solaris kernel allocates a stack for saving state when a trap is taken, with its lower bound at 0xf0240000, but for some reason we push frames below that bound, beyond the memory region allocated for it. I suspect that this is a side effect of a property or device not being set up correctly, but I'm not yet sure which. Anyhow, I thought I'd post the results of my investigations so far in case anyone else has any ideas as to what could cause this.
ATB,
Mark.
On Sun, Jan 2, 2011 at 1:17 AM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
[...]

0xf0041270:     rett  %l2 + 4
0xf0041274:     b  0xf004127c
0xf0041278:     nop
0xf004127c:     mov  1, %l5     ! 0x1
0xf0041280:     sll  %l5, %l0, %l5
0xf0041284:     rd  %wim, %l3
0xf0041288:     btst  0x40, %l0
0xf004128c:     be  0xf0041318
0xf0041290:     btst  %l3, %l5
0xf0041294:     sub  %fp, 0xa8, %l7
0xf0041298:     st  %g1, [ %l7 + 0x6c ]
0xf004129c:     std  %g2, [ %l7 + 0x70 ]
Seems to be some kind of window trap handler.
[...]
Based upon this, it would seem that the Solaris kernel allocates a stack for saving state when a trap is taken, with its lower bound at 0xf0240000, but for some reason we push frames below that bound, beyond the memory region allocated for it. I suspect that this is a side effect of a property or device not being set up correctly, but I'm not yet sure which. Anyhow, I thought I'd post the results of my investigations so far in case anyone else has any ideas as to what could cause this.
The kernel stack has overflowed. Perhaps some recursive loop (iterating the device tree, since this doesn't happen on real hardware?) never exits, or maybe OpenBIOS consumes much more kernel stack than OBP does.
On 02/01/11 09:53, Blue Swirl wrote:
The kernel stack has overflowed. Perhaps some recursive loop (iterating the device tree, since this doesn't happen on real hardware?) never exits, or maybe OpenBIOS consumes much more kernel stack than OBP does.
Yeah, that's the conclusion I came to, although I'm not familiar enough with the overall Solaris boot process to figure out what should happen as opposed to what does happen.
After idly adding a few breakpoints in obp_fortheval_v2(), a couple of interesting things stand out:
Breakpoint 1, obp_fortheval_v2 (str=0x11a268 "h# f800.0000 rmap@ swap ! ", arg0=1247220, arg1=0, arg2=0, arg3=0, arg4=0) at ../arch/sparc32/romvec.c:428
Firstly, we don't do full region/segment mapping in OpenBIOS (we only have one-level PTEs), but I'm not sure this is relevant here.
Secondly, the majority of the calls to obp_devopen() pass an argument string like this: "/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0:d". However, towards the end it changes to this: "/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0:b". Maybe the kernel can't find something in the d slice and falls back to the b slice, at which point it runs out of stack space because this isn't supposed to happen?
ATB,
Mark.
On 02/01/11 11:14, Mark Cave-Ayland wrote:
Yeah, that's the conclusion I came to, although I'm not familiar enough with the overall Solaris boot process to figure out what should happen as opposed to what does happen.
After some fiddling with the Solaris ISO, I extracted the SS-5 kernel, loaded the symbols from it into gdb, and tried to step through various bits to see what is happening.
Stepping through the code manually, it looks like we're going through the following function names:
startup()
hat_kern_setup()
vac_flush()
fix_prom_pages()
Moving further, it's a little difficult to tell, but it looks as if we're dying in some kind of MMU setup here:
#0  0xf007057c in page_numtopp_nolock ()
#1  0xf005517c in load_l3 ()
#2  0xf0054e7c in load_l2 ()
#3  0xf024429c in rootfs ()
#4  0xf024429c in rootfs ()
I added a breakpoint at 0xf02442a0 and that wasn't reached before the fatal trap fired. Taking a look at these routines in the OpenSolaris source, it looks like fix_prom_pages() does some interesting things with memory lists to work out which parts of memory are already mapped, and so my current suspicion is that the memory lists are somehow wrong.
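For anyone following along, the kind of check fix_prom_pages() has to make is: walk a list of (address, size) chunks and decide whether a given range is covered. A rough sketch of that shape (the struct layout and names are illustrative, not the actual Solaris definitions):

#include <stdint.h>

/* Illustrative only: a v0-style memory list is a linked chain of
 * (start, size) chunks. Field names are made up for this sketch. */
struct memlist {
    struct memlist *next;
    uint32_t        start;
    uint32_t        size;
};

/* Is [addr, addr + len) fully contained in one of the chunks?
 * If the lists handed over by the PROM are wrong, a walk like
 * this misjudges which pages are already mapped. */
static int memlist_covers(const struct memlist *ml,
                          uint32_t addr, uint32_t len)
{
    for (; ml != NULL; ml = ml->next)
        if (addr >= ml->start && addr + len <= ml->start + ml->size)
            return 1;
    return 0;
}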
Does anyone know whether Solaris 8 uses the romvec v0 memlist arrays or whether it uses the relevant properties read directly from the /virtual-memory and /memory nodes?
ATB,
Mark.
On 03/01/11 13:51, Mark Cave-Ayland wrote:
Does anyone know whether Solaris 8 uses the romvec v0 memlist arrays or whether it uses the relevant properties read directly from the /virtual-memory and /memory nodes?
Ah, I think there is a distinct possibility that the memory properties are being generated incorrectly on SPARC32 :/
Artyom: any chance you could send the output of the following in OBP?
cd / .properties
cd /memory .properties
cd /virtual-memory .properties
My suspicion is that the translations property is wrong (definitely the virtual address part of the translation struct, because it doesn't respect #address-cells), and possibly the mode too.
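To spell the suspicion out, here are the two interpretations side by side (both layouts are my illustration, not definitions lifted from any header; the field set is a guess). With #address-cells = 2 on /virtual-memory, a client should find a two-cell virtual address at the start of each translations entry:

#include <stdint.h>

/* What I suspect OFMEM currently emits: a one-cell virtual address. */
struct xlat_1cell {
    uint32_t virt;      /* single-cell vaddr                */
    uint32_t size;
    uint32_t mode;      /* MMU mode bits, possibly also off */
};

/* What a #address-cells = 2 client would expect instead. */
struct xlat_2cell {
    uint32_t virt_hi;   /* upper cell of vaddr (zero on sparc32) */
    uint32_t virt_lo;   /* lower cell of vaddr                   */
    uint32_t size;
    uint32_t mode;
};

A client expecting the second layout but handed the first would misread every field after the address, so garbage translations could follow.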
ATB,
Mark.
On Mon, Jan 3, 2011 at 4:09 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Artyom: any chance you could send the output of the following in OBP?
cd / .properties
cd /memory .properties
cd /virtual-memory .properties
Do you mean these,
http://lists.openbios.org/pipermail/openbios/2010-October/005611.html
or some other ones?
On 03/01/11 21:40, Artyom Tarasenko wrote:
Do you mean these,
http://lists.openbios.org/pipermail/openbios/2010-October/005611.html
or some other ones?
Hmmm, those are the ones, but I thought they were incomplete, as there was no translations property in /virtual-memory?
If this property is only available on newer versions of OBP, we may need to tweak OFMEM to update the ROMVEC arrays too.
ATB,
Mark.
On 03.01.2011 at 16:09, Mark Cave-Ayland wrote:
Artyom: any chance you could send the output of the following in OBP?
Some of these he already posted. :)
My suspicion is that the translations property is wrong (definitely the virtual address part of the translation struct, because it doesn't respect #address-cells), and possibly the mode too.
According to Tarl, the virtual address is not supposed to respect #address-cells but to use as many (integer) cells as needed for - hardcoded - one (stack) cell. I would thus expect the virtual address to be 4 bytes on sparc32.
Maybe the physical address is wider than expected?
Andreas
On 03/01/11 21:59, Andreas Färber wrote:
According to Tarl, the virtual address is not supposed to respect #address-cells but to use as many (integer) cells as needed for - hardcoded - one (stack) cell. I would thus expect the virtual address to be 4 bytes on sparc32.
Maybe the physical address is wider than expected?
Reviewing this again now, the obvious thing to spot is that the virtual address should be 2 cells wide in /virtual-memory:
ok cd /virtual-memory
ok .properties
.properties ?
ok .attributes
available        00000000 fff00000 00100000
i.e. the virtual address does seem to respect #address-cells here. I did a quick hack on OFMEM to see what happens if I do the same, and now the Solaris kernel gets stuck in a panic loop rather than invoking the fatal trap in Qemu - which I guess is progress ;)
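For clarity, this is how I'm decoding the .attributes output quoted above (the struct is just my reading of the three cells, not a real header):

#include <stdint.h>

/* "available    00000000 fff00000 00100000" from the real SS-5 OBP:
 * two address cells (#address-cells = 2), then one size cell. */
struct vmem_avail {
    uint32_t virt_hi;   /* 0x00000000                      */
    uint32_t virt_lo;   /* 0xfff00000                      */
    uint32_t size;      /* 0x00100000, i.e. 1MB at the top */
};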
It does however mean that the translation information must be passed via the romvec memlist arrays, rather than being read from the device tree properties, unless there is another MMU device node somewhere that we haven't found?
ATB,
Mark.
On 2011-1-3 1:59 PM, Andreas Färber wrote:
[...] According to Tarl, the virtual address is not supposed to respect #address-cells but to use as many (integer) cells as needed for - hardcoded - one (stack) cell. I would thus expect the virtual address to be 4 bytes on sparc32.
I don't recall phrasing it that way, but indeed #address-cells is specific to physical addresses. I might guess that back in 32-bit forth the virtual address would be 4 bytes, but that predates me (by 1994 we were doing 64-bit forth). I'd expect any recent Solaris to believe OBP is 64-bit. Matter of fact, by Solaris 2.4 we were already doing 64-bit; why are you running into 32-bit with Solaris 8?
By the way, have you seen P1275.1/D14a, Supplement for IEEE 1754 ISA (SPARC)? It talks a lot about issues you've been dealing with recently (lessee - that's http://playground.sun.com/1275/bindings/sparc/d14a/12751d1a.ps ).
On 03/01/11 22:45, Tarl Neustaedter wrote:
I don't recall phrasing it that way, but indeed #address-cells is specific to physical addresses. I might guess that back in 32-bit forth the virtual address would be 4 bytes, but that predates me (by 1994 we were doing 64-bit forth). I'd expect any recent Solaris to believe OBP is 64-bit. Matter of fact, by Solaris 2.4 we were already doing 64-bit; why are you running into 32-bit with Solaris 8?
Currently I have a 32-bit Solaris 8 install CD at work, and Qemu SPARC32 by default emulates an SS-5 sun4m machine (which Artyom has managed to get working with a PROM image from a real SS-5). Hence I've been comparing outputs to try and figure out what the Solaris kernel does with OBP that we aren't yet doing correctly in OpenBIOS.
By the way, have you seen P1275.1/D14a, Supplement for IEEE 1754 ISA (SPARC)? It talks a lot about issues you've been dealing with recently (lessee - that's http://playground.sun.com/1275/bindings/sparc/d14a/12751d1a.ps ).
Yes thanks - I already have a copy of this in my PDF library :)
ATB,
Mark.
On 2011-1-3 1:59 PM, Andreas Färber wrote:
[...] According to Tarl, the virtual address is not supposed to respect #address-cells but to use as many (integer) cells as needed for - hardcoded - one (stack) cell. I would thus expect the virtual address to be 4 bytes on sparc32.
What I find in Solaris (current Solaris, I don't currently have easy access to back versions) is below. Look in usr/src/psm/stand/boot/sparc/sun4/sys/prom_plat.h
/*
 * The 'format' of the "translations" property in the 'mmu' node ...
 */

struct translation {
        uint32_t virt_hi;       /* upper 32 bits of vaddr */
        uint32_t virt_lo;       /* lower 32 bits of vaddr */
        uint32_t size_hi;       /* upper 32 bits of size in bytes */
        uint32_t size_lo;       /* lower 32 bits of size in bytes */
        uint32_t tte_hi;        /* upper 32 bits of tte */
        uint32_t tte_lo;        /* lower 32 bits of tte */
};
On 03/01/11 22:52, Tarl Neustaedter wrote:
What I find in Solaris (current Solaris, I don't currently have easy access to back versions) is below. Look in usr/src/psm/stand/boot/sparc/sun4/sys/prom_plat.h
[...]
Hmmm, that looks right for 64-bit, but I'm not sure about 32-bit (note the use of tte rather than pte). Given that I don't think there is an MMU node in the SS-5 OBP boot tree, I'm not sure how the kernel is picking up the MMU translations.
I can only think that they are coming from the ROMVEC v0_prommap memlist which is mentioned here: http://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch/.... However, that structure doesn't match the memlist definition in http://src.opensolaris.org/source/xref/systemz/betelgeuse/usr/src/uts/common..., plus a list item can only contain one address, which would still make it unsuitable for holding translation information.
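To spell out that last point (both shapes below are illustrative, written from memory rather than copied from the headers): a memlist item carries one address plus a size, while a translation needs two addresses and a mode, so the fields simply aren't there:

/* Roughly the shape of a v0 memlist item: one address, one size. */
struct mlist_item {
    struct mlist_item *next;
    char              *addr;   /* the only address in the item */
    unsigned int       size;
};

/* A translation entry needs strictly more per item. */
struct xlat_item {
    unsigned int virt;   /* virtual address  */
    unsigned int phys;   /* physical address */
    unsigned int size;
    unsigned int mode;   /* MMU mode bits    */
};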
ATB,
Mark.