On 25.08.2012, at 03:05, Blue Swirl blauwirbel@gmail.com wrote:
On Sat, Aug 25, 2012 at 9:47 AM, Andreas Tobler andreast@fgznet.ch wrote:
On 08/25/12 11:00, Blue Swirl wrote:
On Fri, Aug 24, 2012 at 9:45 PM, Andreas Tobler andreast@fgznet.ch wrote:
Hello,
I'm trying to get FreeBSD powerpc running with qemu. So far it loads the fbsd loader and the loader loads the kernel. The kernel starts booting but it hangs in an endless loop. It tries to print out a fatal_trap but it looks like the 'of' doesn't work properly (anymore?) at this stage.
I have a remote debugger attached to the kernel and I can see where it hangs. But I can not figure out what caused the fatal trap here.
An 'info registers' in qemu shows the srr0=fff025a4, this, as I understand, points to of_client_callback from OpenBIOS. (objdump -dS openbios-qemu.nostrip gives me this.)
qemu is on 1.1.90, iow, a git snapshot from yesterday with OpenBIOS from 19th of aug.
Is there a possibility to 'debug' the OpenBIOS somehow?
CCing OpenBIOS list too.
We have a built-in debugger in OpenBIOS (maybe not well documented). Then there's DEBUG_CIF in libopenbios/client.c and it should be possible to add debugging print statements to forth/system/ciface.fs too.
Oh, thank you. I'll take a look when I'm back on the machine.
I'm not sure whether it is a kernel issue or an OpenBIOS issue.
The problem could be that there's a MMU fault when the kernel calls OpenBIOS, maybe because OpenBIOS is no longer mapped (MMU disabled?) and then the above debugging would not help.
Hm, from the FreeBSD kernel code view, I passed several 'of' calls to read the memory regions etc. to map memory. That said, OB is working fine for the start. I'm not sure where it happens, the fault above. It might be when I start to call 'of' again after I have started the MMU on the FreeBSD kernel side.
So, to my understanding, you say it might be an MMU fault, that OB is no longer mapped? Who would be the responsible for this mapping, the FreeBSD kernel or OB?
Assuming that SRR0 is the fault address (I'm not so familiar with PPC),
SRR0 is the fault IP. So if the fault at hand is an instruction fetch fault, yes, that would be the address at fault. If it's a data fault you would have to check DAR for the address it faults in.
It might also help to boot the guest with -d in_asm,cpu,int and check out /tmp/qemu.log afterwards. Search for the IP that faulted and see why exactly it did.
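Because a full -d in_asm,cpu,int log can grow very large, it may help to filter it for exception lines before reading the surrounding register dumps. The following is a minimal sketch in Python, assuming the log contains lines of the form "Raise exception at <nip> => <number> (<error code>)" as in the excerpt posted later in this thread. The function name find_exceptions and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Pattern for the exception line as it appears in the log excerpt in this thread.
EXC_RE = re.compile(r"Raise exception at ([0-9a-f]+) => ([0-9a-f]+) \(([0-9a-f]+)\)")

def find_exceptions(lines: Iterable[str]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (line number, NIP, exception number, error code) for each match."""
    for lineno, line in enumerate(lines, start=1):
        for m in EXC_RE.finditer(line):
            nip, excp, err = (int(g, 16) for g in m.groups())
            yield lineno, nip, excp, err

# Hypothetical sample input; a real run would iterate over the open log file,
# e.g. `with open("/tmp/qemu.log") as f: hits = list(find_exceptions(f))`.
sample = [
    "no BAT match for fff025a4:",
    "Raise exception at fff025a4 => 00000003 (40000000)",
    "NIP 00000400 LR 0094e3f0 CTR fff025a4 XER 00000000",
]
hits = list(find_exceptions(sample))
```

Iterating over the file object keeps memory use flat even for a multi-gigabyte log, since only one line is held at a time.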
Alex
since it's pointing to the low level entry point this could make sense.
At least on Sparc, the kernel should save the original MMU mappings and restore them on point of entry to OpenBIOS. Alternatively, you could try to arrange the code so that after you take control of MMU, OpenBIOS is not used anymore. For example, Linux builds an internal model of the Open Firmware tree and does not use the OF calls for OF tree lookup after the initial probe. I'd suppose NetBSD and OpenBSD kernel code could be used for reference since they work on PPC (real machines at least, I haven't tried on QEMU).
Thanks a lot! Andreas
On 25.08.12 16:36, Alexander Graf wrote:
SRR0 is the fault IP. So if the fault at hand is an instruction fetch fault, yes, that would be the address at fault. If it's a data fault you would have to check DAR for the address it faults in.
It might also help to boot the guest with -d in_asm,cpu,int and check out /tmp/qemu.log afterwards. Search for the IP that faulted and see why exactly it did.
Whoa!!! The first try I ended after the log grew over 5GB :)
The next step was enabling the logging at a position where I knew it is going to happen soon.
Below the excerpt from the qemu.log.
Now the big question for me, what does this exactly say?
Thanks for your hints, really appreciated!
Andreas
From here I branch to 'of':
NIP fff025a4 LR 0094e3f0 CTR fff025a4 XER 00000000
MSR 00003030 HID0 00000000 HF 00002000 idx 1
TB 00000000 554554156 DECR 3740413139
GPR00 00000000009461c4 0000000000c7c090 0000000000c9c830 00000000d0004acc
GPR04 00000000fff025a4 0000000000000000 0000000000001032 00000000d0004a80
GPR08 000000000fd00000 0000000000c80000 0000000001c35f60 00000000d00049a0
GPR12 000000000de9bba0 0000000000000000 00000000fff30714 00000000fff30ec8
GPR16 00000000fff2f256 0000000004000000 00000000fffb36cc 00000000fffb3ecc
GPR20 0000000000f68000 0000000000000004 00000000fff2f03f 00000000fff2efbf
GPR24 00000000fff2f047 00000000fffb3630 0000000001c2f3b0 0000000001c325a8
GPR28 0000000001c2ee6c 0000000001c35fbc 0000000000100100 00000000d00049a0
CR 24000022 [ E G - - - - E E ] RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 0092e658 SRR1 00083002 PVR 00080301 VRSAVE 00000000
SPRG0 0fd00000 SPRG1 01c35f60 SPRG2 44000022 SPRG3 00000000
SPRG4 00000000 SPRG5 00000000 SPRG6 00000000 SPRG7 00000000
SDR1 0100001f
get_bat: IBAT v fff025a4
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
no BAT match for fff025a4:
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
    00000000 00000000 0ffe0000
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
Raise exception at fff025a4 => 00000003 (40000000)
NIP 00000400 LR 0094e3f0 CTR fff025a4 XER 00000000
MSR 00001000 HID0 00000000 HF 00000000 idx 1
TB 00000000 554556831 DECR 3740410465
GPR00 00000000009461c4 0000000000c7c090 0000000000c9c830 00000000d0004acc
GPR04 00000000fff025a4 0000000000000000 0000000000001032 00000000d0004a80
GPR08 000000000fd00000 0000000000c80000 0000000001c35f60 00000000d00049a0
GPR12 000000000de9bba0 0000000000000000 00000000fff30714 00000000fff30ec8
GPR16 00000000fff2f256 0000000004000000 00000000fffb36cc 00000000fffb3ecc
GPR20 0000000000f68000 0000000000000004 00000000fff2f03f 00000000fff2efbf
GPR24 00000000fff2f047 00000000fffb3630 0000000001c2f3b0 0000000001c325a8
GPR28 0000000001c2ee6c 0000000001c35fbc 0000000000100100 00000000d00049a0
CR 24000022 [ E G - - - - E E ] RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 fff025a4 SRR1 40003030 PVR 00080301 VRSAVE 00000000
SPRG0 0fd00000 SPRG1 01c35f60 SPRG2 44000022 SPRG3 00000000
SPRG4 00000000 SPRG5 00000000 SPRG6 00000000 SPRG7 00000000
SDR1 0100001f
IN:
0x00000400: mtsprg 1,r1
0x00000404: mflr r1
0x00000408: mtsprg 2,r1
0x0000040c: li r1,32
0x00000410: bla 0x10074c
On 25/08/12 18:01, Andreas Tobler wrote:
SRR0 is the fault IP. So if the fault at hand is an instruction fetch fault, yes, that would be the address at fault. If it's a data fault you would have to check DAR for the address it faults in.
It might also help to boot the guest with -d in_asm,cpu,int and check out /tmp/qemu.log afterwards. Search for the IP that faulted and see why exactly it did.
Whoa!!! The first try I ended after the log grew over 5GB :)
The next step was enabling the logging at a position where I knew it is going to happen soon.
Below the excerpt from the qemu.log.
Now the big question for me, what does this exactly say?
Thanks for your hints, really appreciated!
Andreas
Hi Andreas,
Do you get any output with just OpenBIOS built with DEBUG_CIF enabled in libopenbios/client.c? According to my email here, one of the things I found a while back was that the dma-alloc method wasn't defined in OpenBIOS for PPC when trying to boot (see the OpenBIOS archives for more information).
HTH,
Mark.
On 25.08.12 19:37, Mark Cave-Ayland wrote:
Hi Mark!
Do you get any output with just OpenBIOS built with DEBUG_CIF enabled in libopenbios/client.c? According to my email here, one of the things I found a while back was that the dma-alloc method wasn't defined in OpenBIOS for PPC when trying to boot (see the OpenBIOS archives for more information).
Yeah, with this DEBUG option enabled I see something. The last entry is about mmu and translations. Then it hangs. Well, it is in an endless loop inside the FreeBSD kernel.
I may repeat myself, but the kernel starts booting from the loader. And then, maybe after the mmu is on in the kernel, the exception occurs.
The dma-alloc issues I found in the archives appear much earlier in the boot process, right? And do I get it right, that if needed, the dma-alloc, I'd get an exception?
(Me new in this business ;)
Thanks, Andreas
....
finddevice("/chosen") = 0xfff45644
getprop(0xfff45644, "mmu", 0x00c21efc, 4) = 4
0x00c21efc 0f b5 a8 70 __ __ __ __ __ __ __ __ __ __ __ __ .??p
instance-to-package(0x0fb5a870) = 0xfff521c8
getproplen(0xfff521c8, "translations") = 0x00000250
getprop(0xfff521c8, "translations", 0x00024000, 592) = 592
0x00024000 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 ......@.........
..... ..... .....
On 25.08.2012, at 10:01, Andreas Tobler wrote:
Whoa!!! The first try I ended after the log grew over 5GB :)
The next step was enabling the logging at a position where I knew it is going to happen soon.
Below the excerpt from the qemu.log.
Now the big question for me, what does this exactly say?
Thanks for your hints, really appreciated!
Andreas
From here I branch to 'of':
NIP fff025a4 LR 0094e3f0 CTR fff025a4 XER 00000000
MSR 00003030 HID0 00000000 HF 00002000 idx 1
TB 00000000 554554156 DECR 3740413139
GPR00 00000000009461c4 0000000000c7c090 0000000000c9c830 00000000d0004acc
GPR04 00000000fff025a4 0000000000000000 0000000000001032 00000000d0004a80
GPR08 000000000fd00000 0000000000c80000 0000000001c35f60 00000000d00049a0
GPR12 000000000de9bba0 0000000000000000 00000000fff30714 00000000fff30ec8
GPR16 00000000fff2f256 0000000004000000 00000000fffb36cc 00000000fffb3ecc
GPR20 0000000000f68000 0000000000000004 00000000fff2f03f 00000000fff2efbf
GPR24 00000000fff2f047 00000000fffb3630 0000000001c2f3b0 0000000001c325a8
GPR28 0000000001c2ee6c 0000000001c35fbc 0000000000100100 00000000d00049a0
CR 24000022 [ E G - - - - E E ] RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 0092e658 SRR1 00083002 PVR 00080301 VRSAVE 00000000
SPRG0 0fd00000 SPRG1 01c35f60 SPRG2 44000022 SPRG3 00000000
SPRG4 00000000 SPRG5 00000000 SPRG6 00000000 SPRG7 00000000
SDR1 0100001f
get_bat: IBAT v fff025a4
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
no BAT match for fff025a4:
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
    00000000 00000000 0ffe0000
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
Raise exception at fff025a4 => 00000003 (40000000)
This means that at address 0xfff025a4 we hit a
POWERPC_EXCP_ISI = 3, /* Instruction storage exception */
with flags
33 Set to 1 if MSR.IR=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
So something was trying to jump to code that is not mapped in the HTAB. The address is definitely an OpenBIOS address, so for some reason FreeBSD messed with the HTAB, didn't make sure that OpenBIOS is still mapped in and then jumped to it. I wonder why the code works on real hardware?
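The reading of the "Raise exception" line above can be written down as a small decoder. This is a sketch in Python, not QEMU code. The exception number 3 (ISI) and the meaning of 0x40000000 come from this mail. The other two flag values and the MSR.IR/MSR.DR bit positions are taken from the 32-bit PowerPC architecture as I understand it and should be checked against the manual. All function names are made up for illustration.

```python
# Values from the mail: exception 3 is the ISI, flag 0x40000000 means
# "translation not found in the page table".
POWERPC_EXCP_ISI = 3

# Assumption: remaining ISI flag bits as described for 32-bit PowerPC.
ISI_FLAGS = {
    0x40000000: "translation not found in page table",
    0x10000000: "fetch from no-execute or guarded storage",
    0x08000000: "access not permitted by page protection",
}

# Assumption: MSR bit positions for instruction/data address translation.
MSR_IR = 0x20
MSR_DR = 0x10

def describe_isi(error_code: int) -> list:
    """Return the descriptions of all ISI flag bits set in error_code."""
    return [text for bit, text in ISI_FLAGS.items() if error_code & bit]

def translation_enabled(msr: int) -> tuple:
    """Return (instruction translation on, data translation on) for an MSR value."""
    return bool(msr & MSR_IR), bool(msr & MSR_DR)

# Values from the log: MSR 00003030 before the fault, MSR 00001000 in the handler.
reasons = describe_isi(0x40000000)
before = translation_enabled(0x00003030)
after = translation_enabled(0x00001000)
```

With the values from the log, the MSR before the fault has both translation bits set, which fits the "MSR.IR=1" condition in the flag description, and the handler at 0x400 runs with translation off.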
NIP 00000400 LR 0094e3f0 CTR fff025a4 XER 00000000
MSR 00001000 HID0 00000000 HF 00000000 idx 1
TB 00000000 554556831 DECR 3740410465
GPR00 00000000009461c4 0000000000c7c090 0000000000c9c830 00000000d0004acc
GPR04 00000000fff025a4 0000000000000000 0000000000001032 00000000d0004a80
GPR08 000000000fd00000 0000000000c80000 0000000001c35f60 00000000d00049a0
GPR12 000000000de9bba0 0000000000000000 00000000fff30714 00000000fff30ec8
GPR16 00000000fff2f256 0000000004000000 00000000fffb36cc 00000000fffb3ecc
GPR20 0000000000f68000 0000000000000004 00000000fff2f03f 00000000fff2efbf
GPR24 00000000fff2f047 00000000fffb3630 0000000001c2f3b0 0000000001c325a8
GPR28 0000000001c2ee6c 0000000001c35fbc 0000000000100100 00000000d00049a0
CR 24000022 [ E G - - - - E E ] RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 fff025a4 SRR1 40003030 PVR 00080301 VRSAVE 00000000
SPRG0 0fd00000 SPRG1 01c35f60 SPRG2 44000022 SPRG3 00000000
SPRG4 00000000 SPRG5 00000000 SPRG6 00000000 SPRG7 00000000
SDR1 0100001f
IN:
0x00000400: mtsprg 1,r1
0x00000404: mflr r1
0x00000408: mtsprg 2,r1
0x0000040c: li r1,32
0x00000410: bla 0x10074c
This is the ISI interrupt handler :).
Alex
Hi Alex!
On 26.08.12 06:24, Alexander Graf wrote:
get_bat: IBAT v fff025a4
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
no BAT match for fff025a4:
And about this one I do not have to worry?
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
    00000000 00000000 0ffe0000
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
    00000000 00000000 00000000
Raise exception at fff025a4 => 00000003 (40000000)
This means that at address 0xfff025a4 we hit a
POWERPC_EXCP_ISI = 3, /* Instruction storage exception */
with flags
33 Set to 1 if MSR.IR=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
So something was trying to jump to code that is not mapped in the HTAB. The address is definitely an OpenBIOS address, so for some reason FreeBSD messed with the HTAB, didn't make sure that OpenBIOS is still mapped in and then jumped to it. I wonder why the code works on real hardware?
Aha, this gives me a challenge to find out what is happening. It works on real hw, but from my observation it sometimes 'hangs' and to find out about this issue I started to play with qemu ;)
Thank you very much.
Now I have to find out why the OpenBIOS area is not mapped. Is it the case that the 'of' code on Apple HW is located elsewhere, not at such a high address as 0xfff00000? Or should that not matter?
Andreas
On 25.08.2012, at 22:47, Andreas Tobler wrote:
Hi Alex!
On 26.08.12 06:24, Alexander Graf wrote:
get_bat: IBAT v fff025a4
get_bat: IBAT0 v fff025a4 BATu 00001ffe BATl 00000012
get_bat: IBAT1 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT2 v fff025a4 BATu 00000000 BATl 00000000
get_bat: IBAT3 v fff025a4 BATu 00000000 BATl 00000000
no BAT match for fff025a4:
And about this one I do not have to worry?
book3s_32 (which the G3s and G4s are) have 2 ways of resolving an effective address (virtual in x86 speech) to a real address (physical address in x86 speech).
BAT
There are a number of BAT registers. They basically contain mappings that contain
- effective offset of a region
- length of the region
- real offset of the region
thus allow you to map an address range
EA & ~mask -> RA | (EA & mask)
So Linux for example uses them to map
EA 0xc0000000 - 0xcfffffff -> RA 0x00000000 -- 0x0fffffff
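To make the BAT lookup concrete, here is a minimal sketch in Python of how a single upper/lower BAT pair could be matched against an effective address. The field layout (BEPI, BL, Vs/Vp, BRPN) follows the 32-bit PowerPC architecture as I understand it and is an assumption, not something taken from QEMU or from this thread. The IBAT0 values come from the log. The BAT value used for the Linux mapping is a made-up encoding of the range given above.

```python
from typing import Optional

def bat_translate(ea: int, batu: int, batl: int, supervisor: bool = True) -> Optional[int]:
    """Return the real address if this BAT pair maps ea, otherwise None."""
    valid_bit = 0x2 if supervisor else 0x1   # Vs / Vp bits in the upper BAT
    if not batu & valid_bit:
        return None
    bepi = batu & 0xFFFE0000                 # effective block base
    bl = (batu & 0x00001FFC) << 15           # block-length mask over address bits
    brpn = batl & 0xFFFE0000                 # real block base
    if (ea & 0xFFFE0000 & ~bl) != (bepi & ~bl):
        return None                          # ea lies outside this block
    # EA & ~mask selects the block, RA | (EA & mask) keeps the offset.
    return (brpn & ~bl) | (ea & (bl | 0x0001FFFF))

# IBAT0 from the log: BATu 00001ffe, BATl 00000012 (256 MB block at EA 0).
of_entry = bat_translate(0xFFF025A4, 0x00001FFE, 0x00000012)

# Hypothetical BAT pair encoding the Linux mapping EA 0xc0000000 -> RA 0x00000000.
linux_example = bat_translate(0xC0123456, 0xC0001FFE, 0x00000012)
```

Under these assumptions the OpenBIOS entry point does not fall into the single valid IBAT, which is consistent with the "no BAT match for fff025a4" line in the log, so the lookup has to fall through to the page table.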
HTAB
This is the "normal" way of dealing with the MMU on non-embedded PPC. You have a big hash table in memory that contains the mappings for all processes at the same time, each tagged with the address space it belongs to. Which address space is currently active is selected through the segment registers (SR).
OpenBIOS uses an HTAB to map itself into your effective memory window. It also sets up its own SRs.
I would expect OpenBSD to set up its own maps in the HTAB and use different SRs. At that time, the OpenBIOS mappings are obviously gone.
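As a rough illustration of the HTAB path, the sketch below computes which segment register an effective address selects and where the primary PTE group would sit in the hash table. It is written in Python and assumes the classic 32-bit PowerPC hashing scheme (19-bit hash, 64-byte PTE groups, SDR1 holding HTABORG and HTABMASK). The SDR1 value is the one from the log. The VSID is hypothetical, since the log does not show the segment registers.

```python
def segment_index(ea: int) -> int:
    """The top four bits of the effective address select one of 16 SRs."""
    return (ea >> 28) & 0xF

def page_index(ea: int) -> int:
    """16-bit page index within the 256 MB segment (4 KB pages)."""
    return (ea >> 12) & 0xFFFF

def primary_pteg_address(ea: int, vsid: int, sdr1: int) -> int:
    """Real address of the primary PTE group for (vsid, ea)."""
    htaborg = sdr1 & 0xFFFF0000
    htabmask = sdr1 & 0x000001FF
    hash_ = (vsid & 0x7FFFF) ^ page_index(ea)   # 19-bit primary hash
    return htaborg | (((hash_ >> 10) & htabmask) << 16) | ((hash_ & 0x3FF) << 6)

SDR1 = 0x0100001F          # from the log
HYPOTHETICAL_VSID = 0      # assumption: the real value lives in SR 15

sr = segment_index(0xFFF025A4)
pteg = primary_pteg_address(0xFFF025A4, HYPOTHETICAL_VSID, SDR1)
```

The faulting address falls into segment 0xf, so whether a translation is found depends on what the kernel has loaded into that segment register and on whether a matching PTE exists in the group the hash points at.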
Now I have to find out why the OpenBIOS area is not mapped. Is it the case that the 'of' code on Apple HW is located elsewhere, not at such a high address as 0xfff00000? Or should that not matter?
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Alex
On 08/26/12 16:09, Alexander Graf wrote:
I would expect OpenBSD to set up its own maps in the HTAB and use different SRs. At that time, the OpenBIOS mappings are obviously gone.
Ok, nit, I have to insist, I'm on FreeBSD and I have not much clue how OpenBSD does it. We on FreeBSD have both, 32- and 64-bit PowerPC running. On the 64-bit side we even have a POWER7 emulation working inside qemu :)
So, do I get it right, either BAT _or_ HTAB? If so, I will study the HTAB code and find out what is happening; I need to read up in the docs then.
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Hm, afaiu, we have three different 'of' modes: one real-mode variant for cases where the firmware runs in real mode (meant for both 32- and 64-bit), and two virtual-mode variants, one for 32-bit and one for 64-bit. As I remember, I end up in the 32-bit virtual-mode one. The issue might also be here.
Another issue which came up in this thread: Linux uses its own 'of' tree once the kernel is up, while we still rely on the one from the firmware.
Will experiment a bit.
Thanks again for your patience!
Andreas
Now I have to find out why the OpenBIOS area is not mapped. Is it the case that the 'of' code on Apple HW is located elsewhere, not at such a high address as 0xfff00000?
Apple OF normally sits at the top of the address space, too.
Or should that not matter?
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Perhaps the OpenBSD code did not flush the TLB yet? Or does QEMU emulate the TLB properly? I seriously doubt that :-)
The logs do not show whether segment x'f is still mapped in the SRs (or I missed it). Is it?
Segher
On 26.08.12 19:02, Segher Boessenkool wrote:
Now I have to find out why the OpenBIOS area is not mapped. Is it the case that the 'of' code on Apple HW is located elsewhere, not at such a high address as 0xfff00000?
Apple OF normally sits at the top of the address space, too.
Thanks for clarification!
Or should that not matter?
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Perhaps the OpenBSD code did not flush the TLB yet? Or does QEMU emulate the TLB properly? I seriously doubt that :-)
Hm, s/OpenBSD/FreeBSD. :)
How do I flush the TLB? Then I can look up in the code and see where it is done. I guess we do that but the place/time might be the question. (Otherwise it wouldn't work on real HW, right?)
The logs do not show whether segment x'f is still mapped in the SRs (or I missed it). Is it?
How can I enable this log or make it visible?
And sorry for not being subscribed to the OpenBIOS list. Now I am and everybody should see all my posts to this list too.
Thanks! Andreas
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Perhaps the OpenBSD code did not flush the TLB yet? Or does QEMU emulate the TLB properly? I seriously doubt that :-)
Hm, s/OpenBSD/FreeBSD. :)
I have no idea :-)
How do I flush the TLB?
tlbie, and perhaps tlbsync.
Then I can look up in the code and see where it is done. I guess we do that but the place/time might be the question. (Otherwise it wouldn't work on real HW, right?)
That is my theory, yes.
The logs do not show whether segment x'f is still mapped in the SRs (or I missed it). Is it?
How can I enable this log or make it visible?
I have no idea.
Segher
On 26.08.2012, at 12:09, Segher Boessenkool segher@kernel.crashing.org wrote:
Phew. I honestly have no idea how this would work at all even on Apple HW. Linux simply handles everything from real mode (disable paging) when going into anything firmware related. I have no idea what OpenBSD does. If you could try to find out and summarize it, I might be able to reconstruct how it could work :).
Perhaps the OpenBSD code did not flush the TLB yet? Or does QEMU emulate the TLB properly? I seriously doubt that :-)
Hm, s/OpenBSD/FreeBSD. :)
I have no idea :-)
How do I flush the TLB?
tlbie, and perhaps tlbsync.
The QEMU TLB only caches existing translations, never misses.
Alex
How do I flush the TLB?
tlbie, and perhaps tlbsync.
The QEMU TLB only caches existing translations, never misses.
I'm not sure what you mean here? No PowerPC hardware that I know of stores a "this address doesn't go anywhere" tag in the TLB, either (I don't think the architecture allows that even).
I also don't see what it has to do with the problem. The scenario we think is happening: the CPU has translations for the OF code space in its TLB, because it has run it before. The kernel removes the translations but doesn't do TLBIE on those. On real hardware, the TLB entries are still used. What does QEMU do?
Segher
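The scenario Segher describes can be sketched as a toy model. This is only an illustration under stated assumptions: the types and helpers (mmu_t, mmu_map, mmu_unmap_no_invalidate, mmu_flush_all, mmu_translate, firmware_call_survives) are hypothetical names invented for this sketch, the "TLB" is a tiny direct-mapped cache, and nothing here is QEMU, OpenBIOS or FreeBSD code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     16   /* toy address space: 16 pages */
#define TLB_SLOTS  4    /* toy direct-mapped TLB */

typedef struct { bool valid; uint32_t vpn, ppn; } entry_t;

typedef struct {
    entry_t page_table[NPAGES]; /* indexed by virtual page number */
    entry_t tlb[TLB_SLOTS];     /* indexed by vpn % TLB_SLOTS */
} mmu_t;

/* Install a translation in the page table (what OF/the kernel set up). */
static void mmu_map(mmu_t *m, uint32_t vpn, uint32_t ppn) {
    m->page_table[vpn] = (entry_t){ true, vpn, ppn };
}

/* Remove the page table entry but leave any cached TLB entry alone,
 * i.e. the "kernel forgot tlbie" case from the discussion. */
static void mmu_unmap_no_invalidate(mmu_t *m, uint32_t vpn) {
    m->page_table[vpn].valid = false;
}

/* Drop every cached translation, as an emulator may do at any time. */
static void mmu_flush_all(mmu_t *m) {
    for (size_t i = 0; i < TLB_SLOTS; i++)
        m->tlb[i].valid = false;
}

/* Translate va; returns false on a fault. Misses are never cached. */
static bool mmu_translate(mmu_t *m, uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    if (vpn >= NPAGES)
        return false;
    entry_t *e = &m->tlb[vpn % TLB_SLOTS];
    if (!(e->valid && e->vpn == vpn)) {
        if (!m->page_table[vpn].valid)
            return false;          /* TLB miss and no mapping: fault */
        *e = m->page_table[vpn];   /* TLB miss: refill from page table */
    }
    *pa = (e->ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
    return true;
}

/* Run the scenario: firmware code is touched once (filling the TLB),
 * the mapping is removed without invalidation, then firmware is called
 * again. Returns true if the second call still translates. */
static bool firmware_call_survives(bool emulator_flushes) {
    mmu_t m = {0};
    uint32_t pa;
    mmu_map(&m, 0xf, 0xf);
    if (!mmu_translate(&m, 0xf123, &pa))
        return false;
    mmu_unmap_no_invalidate(&m, 0xf);
    if (emulator_flushes)
        mmu_flush_all(&m);
    return mmu_translate(&m, 0xf123, &pa);
}
```

In this model firmware_call_survives(false) succeeds because the stale entry is still cached, while firmware_call_survives(true) faults. That is the difference between "works on real hardware by accident" and "breaks under an emulator that flushes whenever it likes".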
On 27.08.2012, at 13:43, Segher Boessenkool segher@kernel.crashing.org wrote:
How do I flush the TLB?
tlbie, and perhaps tlbsync.
The QEMU TLB only caches existing translations, never misses.
I'm not sure what you mean here? No PowerPC hardware that I know of stores a "this address doesn't go anywhere" tag in the TLB, either (I don't think the architecture allows that even).
I also don't see what it has to do with the problem. The scenario we think is happening: the CPU has translations for the OF code space in its TLB, because it has run it before. The kernel removes the translations but doesn't do TLBIE on those. On real hardware, the TLB entries are still used. What does QEMU do?
Ah, I see. It depends. Basically, QEMU doesn't provide any guarantee that the TLB survives. We don't flush it often for book3s, but it can still happen. Maybe try putting a printf into the TLB flush handler function?
Alex
On 27.08.12 23:51, Alexander Graf wrote:
On 27.08.2012, at 13:43, Segher Boessenkool segher@kernel.crashing.org wrote:
How do I flush the TLB?
tlbie, and perhaps tlbsync.
The QEMU TLB only caches existing translations, never misses.
I'm not sure what you mean here? No PowerPC hardware that I know of stores a "this address doesn't go anywhere" tag in the TLB, either (I don't think the architecture allows that even).
I also don't see what it has to do with the problem. The scenario we think is happening: the CPU has translations for the OF code space in its TLB, because it has run it before. The kernel removes the translations but doesn't do TLBIE on those. On real hardware, the TLB entries are still used. What does QEMU do?
Ah, I see. It depends. Basically, QEMU doesn't provide any guarantee that the TLB survives. We don't flush it often for book3s, but it can still happen. Maybe try putting a printf into the TLB flush handler function?
Sorry for the delay, was sick for the past days :(
You suggest adding some printfs; am I right to do that in tlb_flush() in cputlb.c? If not, where did you mean to do that?
And on a side note, are/were there successful boot results from OSes other than Linux with QEMU and OpenBIOS on PowerPC?
I didn't find a successful report.
Thanks, Andreas
On 31.08.2012, at 22:40, Andreas Tobler wrote:
On 27.08.12 23:51, Alexander Graf wrote:
On 27.08.2012, at 13:43, Segher Boessenkool segher@kernel.crashing.org wrote:
How do I flush the TLB?
tlbie, and perhaps tlbsync.
The QEMU TLB only caches existing translations, never misses.
I'm not sure what you mean here? No PowerPC hardware that I know of stores a "this address doesn't go anywhere" tag in the TLB, either (I don't think the architecture allows that even).
I also don't see what it has to do with the problem. The scenario we think is happening: the CPU has translations for the OF code space in its TLB, because it has run it before. The kernel removes the translations but doesn't do TLBIE on those. On real hardware, the TLB entries are still used. What does QEMU do?
Ah, I see. It depends. Basically, QEMU doesn't provide any guarantee that the TLB survives. We don't flush it often for book3s, but it can still happen. Maybe try putting a printf into the TLB flush handler function?
Sorry for the delay, was sick for the past days :(
You suggest adding some printfs; am I right to do that in tlb_flush() in cputlb.c? If not, where did you mean to do that?
Yup. tlb_flush_page and/or tlb_flush.
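A minimal sketch of the kind of tracing meant here, written as standalone helpers rather than as a patch. The names trace_flush_all and trace_flush_page and the two counters are hypothetical, and the real signatures of tlb_flush and tlb_flush_page in QEMU differ between versions, so the calls would have to be added by hand at the top of those functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Counters so the log shows how often each kind of flush happens. */
static unsigned long flush_all_count;
static unsigned long flush_page_count;

/* Call at the top of the full-flush handler; caller is typically __func__. */
static void trace_flush_all(const char *caller) {
    flush_all_count++;
    fprintf(stderr, "[tlb] full flush #%lu from %s\n",
            flush_all_count, caller);
}

/* Call at the top of the per-page flush handler with the guest address. */
static void trace_flush_page(const char *caller, uint64_t addr) {
    flush_page_count++;
    fprintf(stderr, "[tlb] page flush #%lu addr=0x%016llx from %s\n",
            flush_page_count, (unsigned long long)addr, caller);
}
```

Writing to stderr keeps the trace separate from the guest's serial output. If a full flush shows up between the kernel dropping its OF mappings and the next client interface call, that would support the stale-TLB theory.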
And on a side note, are/were there successful boot results from OSes other than Linux with QEMU and OpenBIOS on PowerPC?
Phew. With a bunch of hacks I've had Mac OS X booted into its kernel, where it was happily churning along until it realized that I was trying to run it on a machine that was too old (beige G3 support got dropped as early as 10.3, IIRC).
Apart from that, I'm not aware of any successful boots. Maybe Haiku?
Alex