On Sun, Apr 17, 2016 at 4:29 PM, Paul Menzel <paulepanter@users.sourceforge.net> wrote:
Dear Kyösti,


It’s now down from around 1,379 ms to 380 ms, which is even faster than
the 520 ms before the regression (and the 495 ms before that).

Thanks a lot!

Are the patches just proof of concept, or ready to be reviewed on
Gerrit?

Proof of concept, plus some explanation of what might be going on there.  As noted, I think we can simply compile Lib/amdlib.c with the -O2 flag, so there is no need for the third patch with its per-function align attributes.

Kyösti