Hi,

Like unrv2b before it, ulzma exhibits very poor instruction fetch
behavior on my Opteron CPUs (Tyan S2912 board). I'm not entirely sure
what causes this, and despite spending significant time digging into
it, I have not been able to fix it by adjusting MTRRs or other cache
settings. So I've taken the easy route and patched ulzma to copy
LzmaDecode to the stack before executing it. This cuts the running time
of the fallback stage decompression from nearly two minutes to 50 ms
here.

I'm also seeing erratic performance from memset. It is not consistent,
and it only started appearing at some point during my development here.
I assume this has something to do with alignment, but I was frankly not
in the mood to start debugging this as well. I've included a patch that
reduces memset to "rep stosb" on x86, which eliminates the worst-case
behavior (on the order of minutes spent in cbfs_load_stage) I was
seeing here.
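
For illustration, a memset reduced to "rep stosb" could look roughly
like the sketch below. This is not the patch itself: the name
memset_stosb is hypothetical, GCC-style inline assembly is assumed, and
a plain byte loop stands in on non-x86 targets.

```c
#include <stddef.h>

/*
 * Sketch only: memset_stosb is a hypothetical name, not the function
 * touched by the patch. Assumes a GCC-compatible compiler and that the
 * direction flag is clear on entry, as the x86 C calling conventions
 * require.
 */
static void *memset_stosb(void *dst, int c, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
	void *d = dst;

	/* Store AL to [(E/R)DI], (E/R)CX times, advancing the pointer. */
	__asm__ volatile("rep stosb"
			 : "+D"(d), "+c"(n)
			 : "a"(c)
			 : "memory");
#else
	/* Portable fallback so the sketch builds on other targets. */
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
#endif
	return dst;
}
```

A single string instruction has no separate head, body, and tail paths,
so it behaves the same way for any destination alignment, at the cost
of not being the fastest option for large, well-aligned buffers.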
Signed-off-by: Arne Georg Gleditsch <arne.gleditsch(a)numascale.com>
--
Arne.