On 05/01/13 20:21, Blue Swirl wrote:
(removed Ben from the CC as this is clearly an OpenBIOS issue)
Actually here's a revised version that moves the __next_grab_slot back into C which is simpler still and continues to work for me. I guess in this form the same change could be applied as-is to hash_page_64()?
The address arithmetic looks very similar to va2pa() or ea_to_phys(). Could one of those be used instead?
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
Also since this is pretty hot code path, could you initialize the pointer and perform the calculations just once during init?
Unfortunately not :( The whole issue is that the exception handlers are invoked in real mode with MMU disabled, and hence to locate your global/static pointer you'd still have to convert its address from virtual to real before you could do anything with it. So for me it made sense to just do the increment on the variable directly.
ATB,
Mark.
Am 05.01.2013 um 21:54 schrieb Mark Cave-Ayland mark.cave-ayland@ilande.co.uk:
On 05/01/13 20:21, Blue Swirl wrote:
(removed Ben from the CC as this is clearly an OpenBIOS issue)
Actually here's a revised version that moves the __next_grab_slot back into C which is simpler still and continues to work for me. I guess in this form the same change could be applied as-is to hash_page_64()?
The address arithmetic looks very similar to va2pa() or ea_to_phys(). Could one of those be used instead?
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
Also since this is pretty hot code path, could you initialize the pointer and perform the calculations just once during init?
Unfortunately not :( The whole issue is that the exception handlers are invoked in real mode with MMU disabled, and hence to locate your global/static pointer you'd still have to convert its address from virtual to real before you could do anything with it. So for me it made sense to just do the increment on the variable directly.
ATB,
Mark.
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
ATB,
Mark.
Am 05.01.2013 um 22:47 schrieb Mark Cave-Ayland mark.cave-ayland@ilande.co.uk:
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
I don't have the source in front of me atm. But basically I'd say you should ignore any function calling overhead for now and instead reuse as much infrastructure as you can.
Alex
ATB,
Mark.
On 05.01.2013, at 22:47, Mark Cave-Ayland wrote:
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
Does the patch below work?
Alex
diff --git a/arch/ppc/qemu/ofmem.c b/arch/ppc/qemu/ofmem.c index 25555a0..cd2b700 100644 --- a/arch/ppc/qemu/ofmem.c +++ b/arch/ppc/qemu/ofmem.c @@ -311,10 +311,24 @@ ea_to_phys(unsigned long ea, ucell *mode) return phys; }
+/* Return the next slot to evict, in the range of [0..7] */ +static int +next_evicted_slot(void) +{ + static int next_grab_slot; + int *next_grab_slot_va; + int r; + + next_grab_slot_va = (int*)(void*)va2pa((phys_addr_t)&next_grab_slot); + r = *next_grab_slot_va; + *next_grab_slot_va = (r + 1) % 8; + + return r; +} + static void hash_page_64(unsigned long ea, phys_addr_t phys, ucell mode) { - static int next_grab_slot = 0; uint64_t vsid_mask, page_mask, pgidx, hash; uint64_t htab_mask, mask, avpn; unsigned long pgaddr; @@ -349,10 +363,8 @@ hash_page_64(unsigned long ea, phys_addr_t phys, ucell mode) found = 1;
/* out of slots, just evict one */ - if (!found) { - i = next_grab_slot + 1; - next_grab_slot = (next_grab_slot + 1) % 8; - } + if (!found) + i = next_evicted_slot() + 1; i--; { mPTE_64_t p = { @@ -381,7 +393,6 @@ static void hash_page_32(unsigned long ea, phys_addr_t phys, ucell mode) { #ifndef __powerpc64__ - static int next_grab_slot = 0; unsigned long *upte, cmp, hash1; int i, vsid, found; mPTE_t *pp; @@ -407,10 +418,8 @@ hash_page_32(unsigned long ea, phys_addr_t phys, ucell mode) found = 1;
/* out of slots, just evict one */ - if (!found) { - i = next_grab_slot + 1; - next_grab_slot = (next_grab_slot + 1) % 8; - } + if (!found) + i = next_evicted_slot() + 1; i--; upte[i * 2] = cmp; upte[i * 2 + 1] = (phys & ~0xfff) | mode;
On 05.01.2013, at 23:13, Alexander Graf wrote:
On 05.01.2013, at 22:47, Mark Cave-Ayland wrote:
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
Does the patch below work?
It looks pretty broken still:
int *next_grab_slot_va; int r;
next_grab_slot_va = (int*)(void*)va2pa((phys_addr_t)&next_grab_slot); r = *next_grab_slot_va; *next_grab_slot_va = (r + 1) % 8; 340: 3d 60 00 00 lis r11,0 342: R_PPC_ADDR16_HA .bss 344: 81 2b 00 00 lwz r9,0(r11) 346: R_PPC_ADDR16_LO .bss
Alex
On 05.01.2013, at 23:26, Alexander Graf wrote:
On 05.01.2013, at 23:13, Alexander Graf wrote:
On 05.01.2013, at 22:47, Mark Cave-Ayland wrote:
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
Does the patch below work?
It looks pretty broken still:
The problem is that .bss is beyond OF_CODE_START + OF_CODE_SIZE. Too bad. This version seems to compile into something sane. It's essentially the same as Mark's latest one though, just slightly more readable IMHO :).
Alex
diff --git a/arch/ppc/qemu/ofmem.c b/arch/ppc/qemu/ofmem.c index 25555a0..bd4e112 100644 --- a/arch/ppc/qemu/ofmem.c +++ b/arch/ppc/qemu/ofmem.c @@ -311,10 +311,32 @@ ea_to_phys(unsigned long ea, ucell *mode) return phys; }
+/* Converts a global variable (from .data or .bss) into a pointer that + can be accessed from real mode */ +static void * +global_ptr_real(void *p) +{ + return (void*)((uintptr_t)p - OF_CODE_START + get_rom_base()); +} + +/* Return the next slot to evict, in the range of [0..7] */ +static int +next_evicted_slot(void) +{ + static int next_grab_slot; + int *next_grab_slot_va; + int r; + + next_grab_slot_va = global_ptr_real(&next_grab_slot); + r = *next_grab_slot_va; + *next_grab_slot_va = (r + 1) % 8; + + return r; +} + static void hash_page_64(unsigned long ea, phys_addr_t phys, ucell mode) { - static int next_grab_slot = 0; uint64_t vsid_mask, page_mask, pgidx, hash; uint64_t htab_mask, mask, avpn; unsigned long pgaddr; @@ -349,10 +371,8 @@ hash_page_64(unsigned long ea, phys_addr_t phys, ucell mode) found = 1;
/* out of slots, just evict one */ - if (!found) { - i = next_grab_slot + 1; - next_grab_slot = (next_grab_slot + 1) % 8; - } + if (!found) + i = next_evicted_slot() + 1; i--; { mPTE_64_t p = { @@ -381,7 +401,6 @@ static void hash_page_32(unsigned long ea, phys_addr_t phys, ucell mode) { #ifndef __powerpc64__ - static int next_grab_slot = 0; unsigned long *upte, cmp, hash1; int i, vsid, found; mPTE_t *pp; @@ -407,10 +426,8 @@ hash_page_32(unsigned long ea, phys_addr_t phys, ucell mode) found = 1;
/* out of slots, just evict one */ - if (!found) { - i = next_grab_slot + 1; - next_grab_slot = (next_grab_slot + 1) % 8; - } + if (!found) + i = next_evicted_slot() + 1; i--; upte[i * 2] = cmp; upte[i * 2 + 1] = (phys & ~0xfff) | mode;
On 05.01.2013, at 23:33, Alexander Graf wrote:
On 05.01.2013, at 23:26, Alexander Graf wrote:
On 05.01.2013, at 23:13, Alexander Graf wrote:
On 05.01.2013, at 22:47, Mark Cave-Ayland wrote:
On 05/01/13 21:23, Alexander Graf wrote:
I thought about that, but figured that because it was a hot-path I'd do it inline to save some cycles invoking another function. Since it uses the same OF_* constants as those functions then it should just work.
If the calculation function was provided through a static inline function in a header, this'd be a no-issue.
Alex
va2pa() looks like a good candidate for this, however it isn't marked as static inline because it's called from ofmem_common.c in libopenbios.
Then again I'm not sure this is important as I've heard in several places that most optimising compilers ignore the inline keyword, and will just inline whatever they think will run fastest based upon the current optimisation level.
Anyway let me know which way you'd prefer.
Does the patch below work?
It looks pretty broken still:
The problem is that .bss is beyond OF_CODE_START + OF_CODE_SIZE. Too bad. This version seems to compile into something sane. It's essentially the same as Mark's latest one though, just slightly more readable IMHO :).
I applied this patch to svn.
For those of you wary of touching a "hot path" (even though eviction shouldn't occur too often in really hot paths), all of the functions below get inlined by gcc and statically compiled out, because gcc knows the destination address of the pointer already.
Alex
On 06/01/13 13:31, Alexander Graf wrote:
I applied this patch to svn.
For those of you wary of touching a "hot path" (even though eviction shouldn't occur too often in really hot paths), all of the functions below get inlined by gcc and statically compiled out, because gcc knows the destination address of the pointer already.
Thanks for this - I've just done some testing against latest trunk and your patch works for me. In particular the Darwin kernel now starts up, and NetBSD now gets to a framebuffer console :)
ATB,
Mark.