[OpenBIOS] Running client with MMU off

BALATON Zoltan balaton at eik.bme.hu
Mon Jun 9 02:27:44 CEST 2014


On Sun, 8 Jun 2014, Mark Cave-Ayland wrote:
> It might be that MorphOS just got lucky that this works on real hardware 
> without faulting (similar to the bug I had for SPARC64) but some further 
> analysis would help. Does QEMU ppc have a DEBUG_TLB or similar option so you 
> can see when the eviction is triggered? Sadly I think the options here are 
> quite limited :/

There are some debug options in qemu/target-ppc/mmu_helper.c, but they 
don't give much of a clue, and the DUMP_PAGE_TABLES option cannot be used 
because the file does not compile with it enabled (something seems to have 
changed without the debug code being updated; since I don't know what the 
change was, it's not trivial to fix).

On Sun, 8 Jun 2014, Mark Cave-Ayland wrote:
> Actually just thinking about this, I've just remembered that in OpenBIOS we use
> a round-robin hash table slot eviction scheme that forces a tlbie (i.e.
> invalidate the TLB entry) at the end so perhaps we are invalidating the entry
> early? See arch/ppc/qemu/ofmem.c in hash_page32().

Or maybe it was never in the TLB, because the copying of the vectors, which 
writes to this area, happened before the MMU was enabled. I don't completely 
understand what will cause an exception and what won't, though. The vectors 
have been used, but that may only put them in the instruction cache, not in 
the data one.

> Quick and dirty hack alert: you may be able to tweak the code so that any hash
> entry with a virtual address which references physical page 0 is never chosen
> for eviction (and hence never forcibly invalidated with tlbie) and see if that
> helps at all? Have a play and see what happens.

I don't see how this could be done and this seems to be a very dirty hack 
that I'd like to avoid.
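
As an illustration only, the hack suggested above could be sketched in C 
roughly as follows. This is not the actual hash_page32() code from 
arch/ppc/qemu/ofmem.c; struct pte32, maps_phys_page0() and pick_victim() are 
hypothetical names, and the sketch assumes the classic 32-bit PowerPC PTE 
layout (valid bit at the top of the first word, physical page number in the 
top 20 bits of the second word).

```c
#include <stdint.h>

#define PTEG_SLOTS   8            /* a 32-bit PowerPC PTEG holds 8 PTEs */
#define PTE_VALID    0x80000000u  /* V bit in the first PTE word */
#define PTE_RPN_MASK 0xfffff000u  /* physical page number, second word */

/* Hypothetical stand-in for one hashed page table entry. */
struct pte32 {
    uint32_t w0;  /* V | VSID | H | API */
    uint32_t w1;  /* RPN | R | C | WIMG | PP */
};

/* Nonzero if the entry is valid and maps physical page 0. */
static int maps_phys_page0(const struct pte32 *p)
{
    return (p->w0 & PTE_VALID) && (p->w1 & PTE_RPN_MASK) == 0;
}

/*
 * Pick a victim slot round-robin, starting at *next, but never one
 * that maps physical page 0. Returns -1 if every slot is pinned.
 */
static int pick_victim(const struct pte32 pteg[PTEG_SLOTS], unsigned *next)
{
    for (unsigned tries = 0; tries < PTEG_SLOTS; tries++) {
        unsigned slot = (*next)++ % PTEG_SLOTS;
        if (!maps_phys_page0(&pteg[slot]))
            return (int)slot;
    }
    return -1;
}
```

The caller would then issue the tlbie only for the slot that was returned, 
so an entry that maps page 0 is never forcibly invalidated. Whether that is 
enough to avoid the fault is exactly what would have to be tested.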

> Understood. I appreciate the time you have spent working on this, we just need
> to make sure that we don't just scatter MorphOSisms around the code in a way
> that makes maintenance harder for those of us who won't run it everyday, and
> isn't detrimental to all the other supported architectures.

I agree with this completely, and I hope we can come up with a solution 
that works and can be accepted. I did some more experiments but could not 
make it work. So far the only working solution seems to be running this code 
with the MMU off. I tried this patch:

--- a/openbios-devel/arch/ppc/qemu/start.S
+++ b/openbios-devel/arch/ppc/qemu/start.S
@@ -517,6 +517,10 @@ _GLOBAL(call_elf):
         li      r7,0                    // r7 = length of client program arguments
         li      r0,MSR_FP | MSR_ME | MSR_DR | MSR_IR
         MTMSRD(r0)
+       LOAD_REG_IMMEDIATE(r8, __vectors)
+       li      r9,0
+       lwz     r10,4096(r8)
+       stw     r10,4096(r9)
         blrl

  #ifdef CONFIG_PPC64

which writes to 0x1000 after enabling the MMU and before calling the 
client code, to make sure the translation is in the table. With this it 
gets past the write that caused an exception before, but it hits another one 
later and crashes. It looks like the code where this happens sets up 
MMU-related registers and TLB entries, so the handlers are probably not 
working yet and it cannot handle the exception even if the right routine is 
called. This is where it crashes with the above patch:

IN:
0x00400acc:  mfdbsr  r0
0x00400ad0:  mfl2cr  r11
0x00400ad4:  addis   r9,r10,-1
0x00400ad8:  addi    r9,r9,32767
0x00400adc:  cmplwi  r9,1
0x00400ae0:  bgt-    0x400af4

IN:
0x00400af4:  mficcr  r0
0x00400af8:  mr      r4,r16
0x00400afc:  mr      r5,r15
0x00400b00:  addi    r3,r1,16
0x00400b04:  li      r6,0
0x00400b08:  bl      0x41cce8

IN:
0x0041cce8:  stwu    r1,-96(r1)
0x0041ccec:  stmw    r13,20(r1)
0x0041ccf0:  lwz     r0,0(r3)
0x0041ccf4:  sync
0x0041ccf8:  mtsr    0,r0
0x0041ccfc:  isync

helper_store_sr: reg=0 00000000 20000400
Raise exception at 0041cd00 => 00000003 (40000000)
IN:
0x00000400:  mtsprg  2,r2
0x00000404:  li      r2,4
0x00000408:  b       0x41f0d4

IN:
0x0041f0d4:  mtsprg  1,r1
0x0041f0d8:  mfmsr   r1
0x0041f0dc:  ori     r1,r1,12336
0x0041f0e0:  sync
0x0041f0e4:  mtmsr   r1

Raise exception at 0041f0e8 => 00000003 (40000000)
Raise exception at 0041f0e8 => 00000003 (40000000)
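
For reference, a small C sketch that decodes the immediate in 
"ori r1,r1,12336" from the trace above. The bit values are the classic 
32-bit PowerPC MSR definitions, with the same names the start.S patch uses; 
enables_translation() is a hypothetical helper added only for illustration.

```c
#include <stdint.h>

/* Classic 32-bit PowerPC MSR bits (names as used in start.S). */
#define MSR_FP 0x2000u  /* floating point available */
#define MSR_ME 0x1000u  /* machine check enable */
#define MSR_IR 0x0020u  /* instruction address translation */
#define MSR_DR 0x0010u  /* data address translation */

/* Nonzero if OR-ing imm into the MSR turns address translation on. */
static int enables_translation(uint32_t imm)
{
    return (imm & (MSR_IR | MSR_DR)) != 0;
}
```

12336 is 0x3030, that is MSR_FP | MSR_ME | MSR_IR | MSR_DR, so the handler 
entered at 0x400 switches translation back on with its mtmsr. That would be 
consistent with the same exception being raised again at 0041f0e8, the 
instruction right after the mtmsr.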

Actually, after writing the sr0-sr15 and sdr1 registers it clears the MMU 
bits, then writes to the ibat and dbat registers, seems to set up the TLB, 
and then enables the bits in the MSR again. So maybe Apple relies on 
translations in these registers and that's why it works there. 
Unfortunately these are only implemented in 32bit processors, so it would 
not work everywhere, but this still seems to be the only other solution I 
can think of, short of somehow disabling the MMU during this code. Any 
ideas?
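
To make the BAT idea a bit more concrete, here is a minimal C sketch of how 
the upper and lower words of a BAT pair could be computed for a block 
mapping. It assumes the classic 32-bit (non-601) BAT layout; bat_upper() and 
bat_lower() are hypothetical helpers, not existing OpenBIOS functions, and 
the sketch only computes values without touching any register.

```c
#include <stdint.h>

#define BAT_VS    0x2u  /* upper word: valid in supervisor mode */
#define BAT_VP    0x1u  /* upper word: valid in user mode */
#define BAT_PP_RW 0x2u  /* lower word: read/write access */

/*
 * Upper BAT word: BEPI | BL | Vs | Vp.
 * len must be a power of two between 128 KiB and 256 MiB,
 * and ea must be aligned to len.
 */
static uint32_t bat_upper(uint32_t ea, uint32_t len)
{
    uint32_t bl = (len >> 17) - 1;  /* block length mask, 128 KiB units */
    return (ea & 0xfffe0000u) | (bl << 2) | BAT_VS | BAT_VP;
}

/* Lower BAT word: BRPN | WIMG | PP. */
static uint32_t bat_lower(uint32_t pa, uint32_t wimg, uint32_t pp)
{
    return (pa & 0xfffe0000u) | ((wimg & 0xfu) << 3) | (pp & 0x3u);
}
```

For a 256 MiB identity mapping starting at address 0, bat_upper(0, 
0x10000000) gives 0x00001fff and bat_lower(0, 0, BAT_PP_RW) gives 
0x00000002. These would be the values to load into an ibat/dbat pair with 
mtspr before translation is switched on.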

Regards,
BALATON Zoltan


