Dear Kyösti,
On Saturday, 16.04.2016, at 14:16 +0300, Kyösti Mälkki wrote:
On Thu, Apr 14, 2016 at 6:17 PM, Kyösti Mälkki wrote:
On Mon, Apr 4, 2016 at 5:58 PM, Aaron Durbin wrote:
From your "before" build we have the following where the addresses start diverging.
-02004072 W arch_segment_loaded
-02004073 W platform_prog_run
-02004074 T prog_run
+02004072 W platform_segment_loaded
+02004073 W arch_segment_loaded
+02004074 T prog_segment_loaded
+020040a3 W platform_prog_run
+020040a4 T prog_run
That's 0x30 bytes off, starting from platform_prog_run(). You could try adding nops to re-align things to where they were. It'll be tedious, but that's the only thing I can think of at the moment.
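As a rough illustration of the arithmetic above, the sketch below computes how far a symbol moved between the two builds and how many padding bytes would put shifted code back at its old offset within a cacheline. The helper names are hypothetical, and the 64-byte line size used in the usage note is an assumption; neither comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Signed distance a symbol moved between the "before" and "after" builds. */
int64_t symbol_shift(uint32_t before, uint32_t after)
{
	return (int64_t)after - (int64_t)before;
}

/*
 * Padding bytes (e.g. nops) needed so that code shifted by `shift` bytes
 * lands at its old offset within a cacheline of `line` bytes again.
 * `line` is a caller-supplied assumption and must be a power of two.
 */
uint32_t pad_to_restore_phase(uint32_t shift, uint32_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return (line - (shift % line)) % line;
}
```

With the addresses from the listing, symbol_shift(0x02004073, 0x020040a3) gives 0x30, and under the assumed 64-byte line, pad_to_restore_phase(0x30, 64) gives 0x10.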
Hi
On pcengines/apu1 I got a regression of 300 ms in romstage too. The delay was gone once I enabled serial console.
Attached is a patch that places the newly introduced prog_segment_loaded() and platform_segment_loaded() functions after any AGESA executable in the binary. This alone fixes the delay too. Looks a lot like an alignment issue now.
If we inject this 0x30 offset at different locations within the libagesa build, maybe we can narrow this down to a few functions?
I narrowed it down to Lib/amdlib.c, where all the tiny low-level IO access functions live. Injecting an extra 0x30 before and after this object in libagesa.fam14.o resulted in a 500-600 ms difference in romstage performance. Splitting something critical across cachelines perhaps?
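To make the cacheline-split suspicion concrete, here is a minimal sketch of a check for whether a block of code at a given address crosses a cacheline boundary. The function name is hypothetical, and the line size is passed in by the caller because the thread does not state it; the 64 used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the byte range [addr, addr + size) touches more than
 * one cacheline of `line` bytes. `line` must be a power of two.
 */
bool straddles_cacheline(uint32_t addr, uint32_t size, uint32_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	if (size == 0)
		return false;

	/* Use 64-bit arithmetic so addr + size cannot wrap around. */
	uint64_t first = (uint64_t)addr / line;
	uint64_t last = ((uint64_t)addr + size - 1) / line;
	return first != last;
}
```

For example, an 8-byte function at 0x0200403c would straddle two 64-byte lines, while the same function at 0x02004040 would not. Run over the symbol sizes from the linker map, a check like this could show which functions a 0x30 shift pushes across a boundary.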
Forcing function alignment alone seems to fix this, with not much difference compared to a build of Lib/amdlib.c with -O2.
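For readers unfamiliar with forcing function alignment, a per-function attribute in GCC could look roughly like the sketch below. This is not the actual patch: the function demo_io_read8() and the FUNC_ALIGN value of 64 are made up for illustration. GCC's -falign-functions=N option applies the same idea to a whole compilation unit.

```c
#include <stdint.h>

/* Assumed alignment for illustration; the thread does not give a value. */
#define FUNC_ALIGN 64

/* Hypothetical tiny IO helper, forced to start on a FUNC_ALIGN boundary. */
__attribute__((aligned(FUNC_ALIGN)))
uint8_t demo_io_read8(const volatile uint8_t *addr)
{
	return *addr;
}

/* Returns 1 if the helper's entry point really is FUNC_ALIGN-aligned. */
int demo_is_func_aligned(void)
{
	return ((uintptr_t)demo_io_read8 % FUNC_ALIGN) == 0;
}
```

The attribute only sets a minimum alignment for the function's entry point, so a small function placed this way cannot start in the middle of a line and spill into the next one.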
Applying your three patches does indeed fix the boot-time regression introduced by commit 096f4579 (lib/prog_loading: introduce prog_segment_loaded()) [1]. With your patches applied, it’s even faster than before.
It’s now down from around 1,379 ms to 380 ms, which is even faster than the 520 ms before the regression (and the 495 ms before that).
Thanks a lot!
Are the patches just proof of concept, or ready to be reviewed on Gerrit?
Thanks,
Paul
On Sun, Apr 17, 2016 at 4:29 PM, Paul Menzel <paulepanter@users.sourceforge.net> wrote:
Are the patches just proof of concept, or ready to be reviewed on Gerrit?
Proof of concept, plus some explanation of what might be going on there. As noted, I think we can go ahead and compile Lib/amdlib.c with the -O2 flag, so there is no need to use the third patch with per-function align-attributes.
Kyösti