[coreboot] use gcc 4.6.0 link time optimization to reduce coreboot execution time

Fri Apr 29 07:45:54 CEST 2011

Stefan Reinauer wrote:

] Did you do some size tests with non-AGESA targets?

The improvement for non-agesa-v5 projects is smaller.
Here are a couple of examples:
AMD Mahogany F10       standard  -flto
fallback/romstage      74803     73004
fallback/coreboot_ram  55665     49928

Intel D945GCLF         standard  -flto
fallback/romstage      33144     29841
fallback/coreboot_ram  69435     65774
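
To put the table in relative terms, here is a small shell sketch that
computes the percentage saved for each pair of sizes above. The helper
name pct_saved is made up for illustration and is not part of the
coreboot tree.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which the -flto size is smaller
# than the standard size.
# usage: pct_saved STANDARD_SIZE LTO_SIZE
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_saved 74803 73004   # AMD Mahogany F10, fallback/romstage
pct_saved 55665 49928   # AMD Mahogany F10, fallback/coreboot_ram
pct_saved 33144 29841   # Intel D945GCLF, fallback/romstage
pct_saved 69435 65774   # Intel D945GCLF, fallback/coreboot_ram
```

With the sizes above this works out to roughly 2.4% and 10.3% for the
Mahogany F10, and 10.0% and 5.3% for the D945GCLF.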

By the way, the attached patch needs one more change in order to
build some of the non-agesa projects with -flto enabled.
The change comes from an 04/20/2011 email:
- sed -e 's/\.rodata/.rom.data/g' -e 's/\.text/.section .rom.text/g' $^ > $@.tmp
+ sed -e 's/\.rodata/.rom.data/g' -e 's/^[ \t]*\.text/.section .rom.text/g' $^ > $@.tmp
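
The effect of anchoring the pattern can be seen with a quick shell
experiment. This is only a sketch: the function names and the sample
input are made up, and the .text.unlikely line is my guess at the kind
of directive that trips the old expression.

```shell
#!/bin/sh
# The two sed command lines from the patch, wrapped in hypothetical
# helper functions so they can be compared on the same input.
old_rewrite() {
    sed -e 's/\.rodata/.rom.data/g' -e 's/\.text/.section .rom.text/g'
}
new_rewrite() {
    sed -e 's/\.rodata/.rom.data/g' -e 's/^[ \t]*\.text/.section .rom.text/g'
}

# Made-up sample input, not taken from a real coreboot build.
sample() {
    printf '%s\n' \
        '  .text' \
        '  .section .text.unlikely' \
        '  .section .rodata'
}

echo "old:"; sample | old_rewrite
echo "new:"; sample | new_rewrite
```

On this input the old expression turns the second line into
".section .section .rom.text.unlikely", while the new one leaves that
line alone and still rewrites the bare .text directive.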

] Does lto work with our "driver"s? I hoped that once we have LTO 
] available we could get rid of the distinction between drivers and 
] objects and handle everything the way we handle drivers now, letting gcc 
] remove the functions we don't need.

I have not encountered coreboot "drivers" directly yet. But I
think the answer is yes. Link time optimization completely
eliminates code that is never called and data that is never
referenced.

] Should we instead probe for availability of -flto in 
] util/xcompile/xcompile and use it if it is there?

One problem is that while gcc 4.5.2 accepts -flto, its LTO support
is experimental and the compiler crashes during the build. So any
probing logic would have to exclude gcc 4.5.2. Also, there might
be some objections to the long build time (agesa projects
only), and to the lack of debug support.
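
A probe along those lines might look like the shell sketch below. The
function names are hypothetical and nothing here is taken from the
actual xcompile script. I only tested 4.5.2 and 4.6.0, so treating
everything older than 4.6 as unsuitable is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for a GCC version string of 4.6
# or newer (assumption: anything older is treated as unsuitable).
gcc_version_allows_lto() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 6 ]; }
}

# Hypothetical probe: check the version first, then try to compile a
# trivial program with -flto.
probe_lto() {
    cc=$1
    ver=$("$cc" -dumpversion 2>/dev/null) || return 1
    gcc_version_allows_lto "$ver" || return 1
    echo 'int main(void) { return 0; }' |
        "$cc" -flto -x c -c - -o /dev/null 2>/dev/null
}

if probe_lto "${CC:-gcc}"; then
    echo "-flto looks usable"
else
    echo "-flto not usable, building without it"
fi
```

The version check comes first because a compile-only test would pass
on gcc 4.5.2, which accepts the flag but crashes later in the build.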

] What's the problem with dwarf2? GCC 4.6 uses mostly dwarf4 unless you 
] manually force it to dwarf2. Will this still be a problem?

Testing with both dwarf2 and dwarf4 gives the same result: no line
number information for files compiled with -flto. The docs warn:
   "Link time optimization does not play well with generating
    debugging information. Combining -flto with -g is currently
    experimental and expected to produce wrong results".

> Index: Makefile
> ===================================================================
> --- Makefile	(revision 6549)
> +++ Makefile	(working copy)
> @@ -211,7 +211,7 @@
>   de$(EMPTY)fine $(1)-objs_$(2)_template
>   $(obj)/$$(1).$(1).o: src/$$(1).$(2) $(obj)/config.h $(4)
>   	@printf "    CC         $$$$(subst $$$$(obj)/,,$$$$(@))\n"
> -	$(CC) $(3) -MMD $$$$(CFLAGS) -c -o $$$$@ $$$$<
> +	$(CC) $(3) -MMD $$$$(CFLAGS) $$$$(LTO_OPTIMIZE) -c -o $$$$@ $$$$<

] Hm.. I think LTO_OPTIMIZE should be added to CFLAGS instead, that would 
] make the patch a whole lot less intrusive.

Here is why it is more complicated than seems necessary. The idea of
-flto is that you just add it to the compiler flags and make sure to
pass the same flags during the link step. When I did this, the build
failed with: "cannot find entry symbol protected_start". The cause is
that the entire crt0.s is considered dead code and omitted. With C code,
this problem can be overcome using __attribute__((externally_visible)).
But gas has no equivalent, and I could not find a solution other than
compiling part of the code without -flto. In the patch, $(LTO_OPTIMIZE)
holds the normal optimization flags, while $(OPTIMIZE) holds the
optimization flags excluding -flto. There might be a nicer way to
organize this, but somehow -flto needs to be skipped in one case.
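
As a rough illustration of that split, here it is in shell form rather
than as the Makefile change itself. The -Os base level, the function
name, the example paths and the idea of picking the non-LTO case by
file name are all assumptions made for the sketch.

```shell
#!/bin/sh
# Assumed values for illustration; the real flags live in the Makefile.
OPTIMIZE="-Os"                    # optimization without -flto
LTO_OPTIMIZE="$OPTIMIZE -flto"    # normal optimization, LTO included

# Hypothetical rule: anything that ends up in crt0 skips -flto so the
# entry code is not discarded; everything else gets LTO.
opt_flags_for() {
    case "$1" in
        *crt0*) printf '%s\n' "$OPTIMIZE" ;;
        *)      printf '%s\n' "$LTO_OPTIMIZE" ;;
    esac
}

opt_flags_for src/lib/example.c   # made-up path
opt_flags_for build/crt0.s        # made-up path
```

The point is only that two flag sets exist and exactly one case picks
the one without -flto.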

> Index: src/arch/x86/init/bootblock.ld
> ===================================================================
> --- src/arch/x86/init/bootblock.ld	(revision 6549)
> +++ src/arch/x86/init/bootblock.ld	(working copy)
> @@ -22,7 +22,6 @@
>   OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
>   OUTPUT_ARCH(i386)
>
> -TARGET(binary)
>   SECTIONS
>   {
>   	. = CONFIG_ROMBASE;
] interesting... does this hurt LTO?

Yes, and I do not currently understand exactly why. TARGET(binary)
does not seem to work as I interpret the documentation. The difference
in -flto behavior may be due to our linker script handling of section
names such as 'text.unlikely' that appear with -flto. Just a guess. I
found omitting TARGET(binary) solved a -flto build problem and stuck
with it.

Thanks,
Scott