Dear coreboot folks,
On Thursday, 24.04.2014, at 23:39 +0200, Paul Menzel wrote:
> in #coreboot on <irc.freenode.net> Stefan mentioned that link time
> optimization (LTO) [1] might yield some speed improvements for coreboot
> as the resulting firmware image might be smaller and therefore it takes
> less time to read it from flash.
>
> As LTO has been greatly improved in GCC 4.9.0 and is all over the news,
> has somebody already experimented with LTO or created patches or tools
> for testing?
>
> Is somebody able to share conclusions already? That’d be very
> interesting.
Looking further back: in April and May 2011, Scott Duplichan even sent a patch
to add an option to enable LTO [2][3], but unfortunately it was never
committed.
Back then on AMD Persimmon with SeaBIOS as payload the boot time was
reduced from 690 ms to 640 ms. (Fun fact: UEFI firmware took 10 s on
that board.)
Thanks,
Paul
> [1] http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> (search for `lto`)
[2] http://www.coreboot.org/pipermail/coreboot/2011-April/064859.html
[3] http://www.coreboot.org/pipermail/coreboot/2011-May/064874.html
On Tue, Apr 28, 2020 at 04:16:59PM +0200, Paul Menzel wrote:
> Dear coreboot folks,
>
>
> Despite ever increasing flash ROM chip sizes, small images are still desired
> for faster boot times, faster flash times, and more space for
> payloads, which is sometimes needed for adding several payloads
> (including GRUB/TianoCore) or Linux payloads.
>
> Jacob Garber did great work to achieve this goal by enabling Link Time
> Optimization (LTO) for coreboot [1] and libpayload [2]. While doing
> this, he also found and fixed several bugs in the code base.
>
> Currently, it fails for AMD AGESA boards due to the handling of illegal globals.
>
> If somebody has a solution for that, that’d be great.
>
> It’d be great if more people could test this on their boards and report
> back.
>
> I propose to submit the change-sets before the next release, to enable
> LTO for libpayload by default, and to disable it for coreboot by default.
>
> Big thanks again to Jacob for doing this. (My attempt doing this for GRUB
> failed. ;-))
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://review.coreboot.org/c/coreboot/+/38989
> [2]: https://review.coreboot.org/c/coreboot/+/38291
Hey Paul, thanks for the encouraging words. :) I tidied up the patches today
and think they are ready for review. Not all targets compile yet, but all of
the LTO framework is there so people can test it out with their boards. For
example, the Thinkpad T500 boots successfully, and there is about a 10%
reduction in stage size and compilation time.
Right now I think it's safest to leave LTO disabled for libpayload and
coreboot. GCC 9 is the first version where LTO is considered "production
ready", and until we have that merged I think LTO should be considered
experimental. Once people have had some time to try it out and work out any
bugs we can move to making it the default in some cases.
Cheers,
Jacob
Dear coreboot folks,
in #coreboot on <irc.freenode.net> Stefan mentioned that link time
optimization (LTO) [1] might yield some speed improvements for coreboot
as the resulting firmware image might be smaller and therefore it takes
less time to read it from flash.
As LTO has been greatly improved in GCC 4.9.0 and is all over the news,
has somebody already experimented with LTO or created patches or tools
for testing?
Is somebody able to share conclusions already? That’d be very
interesting.
Thanks,
Paul
[1] http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
(search for `lto`)
Hi community
I picked up the good work done by Jacob Garber to implement link time
optimization. The size gains are very substantial, with sometimes up to 15%
reduction in binary size! Besides the size benefit it can make us a bit
more lazy programmers: for instance we often put static inline functions in
headers guarded by preprocessing depending on Kconfig options to not
generate code we won't use. With LTO that can be relaxed a bit and the
condition to do nothing can come from a separate linked object (at least I
presume, I haven't tested).
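A minimal sketch of that idea, not taken from the coreboot tree: the names
`FEATURE_X_ENABLED`, `feature_x_enabled()`, `feature_x_init()` and
`board_init()` are hypothetical, and both "files" are shown in one translation
unit so the sketch builds standalone. The assumption is that, when the two
halves are compiled separately with -flto, the link-time optimizer can see
that the condition is constant and may drop the unused call, which a guard in
a header would otherwise have to do by hand.

```c
/*
 * Sketch only. In a real tree the two halves would be separate files,
 * built roughly like this (the compiler drives the link):
 *
 *   gcc -Os -flto -c feature.c -o feature.o
 *   gcc -Os -flto -c caller.c  -o caller.o
 *   gcc -Os -flto feature.o caller.o -o demo
 *
 * FEATURE_X_ENABLED is a hypothetical stand-in for a Kconfig option.
 */
#include <assert.h>
#include <stdbool.h>

#define FEATURE_X_ENABLED 0

/* --- would live in feature.c --- */
bool feature_x_enabled(void)
{
	return FEATURE_X_ENABLED;
}

int feature_x_init(void)
{
	/* Must only be reached when the feature is configured in. */
	assert(feature_x_enabled());
	return 42;
}

/* --- would live in caller.c --- */
int board_init(void)
{
	/* No preprocessor guard here; the condition comes from another object. */
	if (feature_x_enabled())
		return feature_x_init();
	return 0;
}
```

Whether the dead branch is really removed depends on the compiler and flags, so comparing stage sizes with and without -flto is the way to confirm it.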
So to do LTO, both with clang and GCC, you use the compiler as a linker
frontend rather than invoking the linker directly. Object files are
optionally compiled with the -flto flag to generate LTO-optimized
binaries. See https://review.coreboot.org/c/coreboot/+/40811
Jacob's patches dealt with GCC. I toyed around with clang. The clang linker
frontend only works with LTO if using the GOLD or LLD linker. GCC can use
the BFD linker we currently use, so that's an easy transition. Both those
linkers, GOLD and LLD, are less capable at parsing linker scripts than the
BFD linker. The x86 bootblock script in particular, in which we do a lot of
stuff (ID, FIT pointer, ECFW pointer, early 16-bit code, ...), is
complicated, and those linkers fail on the arithmetic that implements all
that optimized linking.
So my question would be, whether we want to support non BFD linkers and
therefore not support all the magic in linker scripts we currently have. At
most we are a few bytes less optimal than possible by for instance setting
a good hardcoded size and offset of the early 16bit code rather than have
the linker script optimize that. As a side note, LTO size gains are bigger
than what would be lost, by a large margin.
https://review.coreboot.org/c/coreboot/+/80724/1/src/arch/x86/bootblock.ld
is a crude example of how I got lld to be happy with the script.
If non BFD linkers are acceptable, how do you propose we deal with it? When
not using LTO you could in principle use a different linker of your
choosing. Should we just move to lld to make sure CI is happy about linker
scripts? Should it be an option, but then we won't have CI being able to
test it except on a few boards put in configs/ ? Maybe have both, but then
change the default once LLD is known to work well enough? Sidenote: LLD is
faster than BFD but linking is not an expensive step in coreboot anyway.
Other more typical linker scripts like the ones you find on ARM and ARM64
platforms work just fine. There are some subtle differences e.g.
https://review.coreboot.org/c/coreboot/+/80735 where the heap needs to be
declared to not be loaded.
There are a few issues though. LLD does not like the RISC-V relocation
symbols of libgcc.a... I read upstream bug reports about this issue, so
maybe that will resolve itself.
Same questions with LTO and CI. Should it be an option (IMO yes, as that's
what both Linux and U-Boot do), and which default settings do we want? For
CI purposes LTO is a bit stricter and therefore useful, as the compiler
frontend throws more warnings than the plain linker when linking.
What are your thoughts? Are there some caveats about linkers and LTO that
are worth knowing, before moving forward?
Kind regards
Arthur
Dear coreboot folks,
Despite ever increasing flash ROM chip sizes, small images are still
desired for faster boot times, faster flash times, and more space for
payloads, which is sometimes needed for adding several payloads
(including GRUB/TianoCore) or Linux payloads.
Jacob Garber did great work to achieve this goal by enabling Link Time
Optimization (LTO) for coreboot [1] and libpayload [2]. While doing
this, he also found and fixed several bugs in the code base.
Currently, it fails for AMD AGESA boards due to the handling of illegal globals.
> Yes, this is a current limitation of LTO right now. Because the
> object files are all lumped together into a single unit, all
> information about where the symbols came from is lost, so
> EXCLUDE_FILE is unable to exclude the AGESA objects from the
> illegal_globals check. Tracing where a symbol came from has been
> implemented in LLVM [0], but I'm not sure if it's on the roadmap for
> GCC. For now it's probably best to disable LTO when compiling AGESA.
>
> [0]: https://llvm.org/devmtg/2017-10/slides/LTOLinkerScriptsEdlerVonKoch.pdf
If somebody has a solution for that, that’d be great.
It’d be great if more people could test this on their boards and
report back.
I propose to submit the change-sets before the next release, to
enable LTO for libpayload by default, and to disable it for coreboot by
default.
Big thanks again to Jacob for doing this. (My attempt doing this for
GRUB failed. ;-))
Kind regards,
Paul
[1]: https://review.coreboot.org/c/coreboot/+/38989
[2]: https://review.coreboot.org/c/coreboot/+/38291
Hi Julius and Nico,
Thanks for the feedback!
> Did you find out any particular (magic) construct we are currently using
> that fails? Or is it the overall complexity of the script?
>
It trips on some arithmetic, but I don't fully understand it yet, so my
attempt was trial and error to get something linking and booting.
> Btw. can clang+lld with LTO still link against GCC objects, e.g.
> libgfxinit?
Linking is possible, but it cannot optimize that, of course. Btw, sometimes
it's necessary to skip LTO on some specific C code like the Arm eabi_compat.c.
As a sidenote maybe https://github.com/AdaCore/gnat-llvm is an interesting
route to have LTO in the clang with libgfxinit combo?
> Do you have a rough list of the types of things that the LLD linker
> cannot deal with (e.g. there seems to be something about not using a
> symbol before it was defined, like with BOOTBLOCK_TOP, but then it
> doesn't seem to apply everywhere, e.g. for ID_SECTION it still seems
> to work?), so we can get an idea what kind of limitations we'd be
> accepting here for both current and future linker scripting?
>
I'm not sure yet why LLD isn't happy about some arithmetic in the linker
script. I'll investigate to get a clearer picture.
> I wouldn't mind some rewrites to the x86 bootblock script in general,
> since some of it honestly seems unnecessarily convoluted anyway, but
> it's more concerning if you need to drop features (like commenting out
> all of those asserts at the end) when there's no way to make something
> similar work with LLD.
The asserts don't work because of the LTO and LLD combination. The
_bootblock / _ebootblock symbols get optimized away and are somehow set to
0. Referring to them inside the code would fix that.
https://review.coreboot.org/c/coreboot/+/71871 is also a way to deal with
it.
> Also, are you sure that all the Arm boards are fine? Did you do a full
> abuild and then also compare the images (with BUILD_TIMELESS) to make
> sure the layouts didn't actually shift? We do a bunch of complicated
> things in our linker scripts, I'm actually surprised that LLD would be
> fine with everything besides the x86 bootblock (it didn't use to be at
> all a couple of years ago, but I guess they may have improved it).
>
I only played with qemu on arm and arm64 and those still worked. Some more
in depth comparison of the elf output is indeed needed. For instance with
x86 stages it tripped the cbfstool assertion that loadable sections need to
be consecutive.
> I think the other big question here is: why do we care about clang at
> all? If GCC can do LTO with BFD, why don't we just stick with that? My
> understanding was that people just added clang support to coreboot
> "for fun" to see if it was possible, and we said okay as long as you
> can do it without having to break any code. But now if we do need to
> make meaningful code changes to support it, should we reexamine why we
> want it at all? Is anyone actually using it in a production use case
> (and if so, why)?
I personally have clang set as the default compiler on my system using
site-local/Kconfig. I prefer its error messages. It also generates somewhat
different errors/warnings than GCC, so that's always nice in CI. Clang trades
blows with GCC on code size. Especially with LTO, clang can sometimes produce
binaries 10% smaller than GCC's LTO binaries. Last time I checked, Linux
only supported LTO with clang (that might not be true anymore), although I'm
not sure why. If newer language support like Rust or Zig is desirable in
the future, then LTO with clang will work more easily, as the same LLVM IR
is used. One cool feature of clang is that it can do reflection on C
structs with compiler builtins:
https://review.coreboot.org/c/coreboot/+/72460 . So in my opinion it's a
tooling option worth exploring. "For fun" often precedes production use :-)
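As a rough illustration of the struct reflection mentioned above, here is a
small sketch. It assumes the builtin in question is clang's
`__builtin_dump_struct`; the linked change may use it differently. The struct
`board_config_sketch` and the function `dump_config()` are made up for this
example, and other compilers take a hand-written fallback path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical struct, for illustration only. */
struct board_config_sketch {
	int uart_index;
	unsigned int baudrate;
};

/* Only use the builtin when compiling with a clang that provides it. */
#if defined(__clang__) && defined(__has_builtin)
# if __has_builtin(__builtin_dump_struct)
#  define HAVE_DUMP_STRUCT 1
# endif
#endif

/* Returns 1 if the fields were printed via the builtin, 0 for the fallback. */
int dump_config(struct board_config_sketch *cfg)
{
	assert(cfg != NULL);
#ifdef HAVE_DUMP_STRUCT
	/* clang walks the struct layout and prints every field for us. */
	__builtin_dump_struct(cfg, &printf);
	return 1;
#else
	/* Without the builtin, every field has to be listed by hand. */
	printf("uart_index=%d baudrate=%u\n", cfg->uart_index, cfg->baudrate);
	return 0;
#endif
}
```

The fallback branch shows what the builtin saves: the field list has to be kept in sync with the struct manually.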
Arthur
On Sat, Feb 24, 2024 at 10:55 PM Nico Huber <nico.h(a)gmx.de> wrote:
> Hi Arthur,
>
> this sounds very interesting.
>
> On 23.02.24 17:47, Arthur Heymans wrote:
> > So my question would be, whether we want to support non BFD linkers and
> > therefore not support all the magic in linker scripts we currently have.
>
> Did you find out any particular (magic) construct we are currently using
> that fails? Or is it the overall complexity of the script?
>
> I was already wondering lately if we shouldn't split the complex x86
> linker script for different use cases (e.g. native, FSP, etc.). If we
> had three scripts instead of one, we would sometimes have to make the
> same change in multiple files. But, OTOH, I don't think we touch the
> linker scripts that much. And every time I look into the x86 one, it
> seems hard to find top and bottom.
>
> > If non BFD linkers are acceptable, how do you propose we deal with it? When
> > not using LTO you could in principle use a different linker of your
> > choosing. Should we just move to lld to make sure CI is happy about linker
> > scripts? Should it be an option, but then we won't have CI being able to
> > test it except on a few boards put in configs/ ? Maybe have both, but then
> > change the default once LLD is known to work well enough? Sidenote: LLD is
> > faster than BFD but linking is not an expensive step in coreboot anyway.
>
> I wouldn't mind switching to LLD. Would prefer that we focus on a
> single linker, though. There is some soothing feeling when one knows
> that everybody is using the same toolchain, and chances that bugs are
> discovered early are higher.
>
> Btw. can clang+lld with LTO still link against GCC objects, e.g.
> libgfxinit?
>
> Nico
>
>
Hi Arthur,
this sounds very interesting.
On 23.02.24 17:47, Arthur Heymans wrote:
> So my question would be, whether we want to support non BFD linkers and
> therefore not support all the magic in linker scripts we currently have.
Did you find out any particular (magic) construct we are currently using
that fails? Or is it the overall complexity of the script?
I was already wondering lately if we shouldn't split the complex x86
linker script for different use cases (e.g. native, FSP, etc.). If we
had three scripts instead of one, we would sometimes have to make the
same change in multiple files. But, OTOH, I don't think we touch the
linker scripts that much. And every time I look into the x86 one, it
seems hard to find top and bottom.
> If non BFD linkers are acceptable, how do you propose we deal with it? When
> not using LTO you could in principle use a different linker of your
> choosing. Should we just move to lld to make sure CI is happy about linker
> scripts? Should it be an option, but then we won't have CI being able to
> test it except on a few boards put in configs/ ? Maybe have both, but then
> change the default once LLD is known to work well enough? Sidenote: LLD is
> faster than BFD but linking is not an expensive step in coreboot anyway.
I wouldn't mind switching to LLD. Would prefer that we focus on a
single linker, though. There is some soothing feeling when one knows
that everybody is using the same toolchain, and chances that bugs are
discovered early are higher.
Btw. can clang+lld with LTO still link against GCC objects, e.g.
libgfxinit?
Nico
Hi Arthur,
First of all, thanks a lot for putting all this work into getting LTO
working. The benefits really seem promising!
Do you have a rough list of the types of things that the LLD linker
cannot deal with (e.g. there seems to be something about not using a
symbol before it was defined, like with BOOTBLOCK_TOP, but then it
doesn't seem to apply everywhere, e.g. for ID_SECTION it still seems
to work?), so we can get an idea what kind of limitations we'd be
accepting here for both current and future linker scripting? I
wouldn't mind some rewrites to the x86 bootblock script in general,
since some of it honestly seems unnecessarily convoluted anyway, but
it's more concerning if you need to drop features (like commenting out
all of those asserts at the end) when there's no way to make something
similar work with LLD.
Also, are you sure that all the Arm boards are fine? Did you do a full
abuild and then also compare the images (with BUILD_TIMELESS) to make
sure the layouts didn't actually shift? We do a bunch of complicated
things in our linker scripts, I'm actually surprised that LLD would be
fine with everything besides the x86 bootblock (it didn't use to be at
all a couple of years ago, but I guess they may have improved it).
I think the other big question here is: why do we care about clang at
all? If GCC can do LTO with BFD, why don't we just stick with that? My
understanding was that people just added clang support to coreboot
"for fun" to see if it was possible, and we said okay as long as you
can do it without having to break any code. But now if we do need to
make meaningful code changes to support it, should we reexamine why we
want it at all? Is anyone actually using it in a production use case
(and if so, why)?
On 4/28/11 8:01 PM, Scott Duplichan wrote:
> Adds a kconfig option to enable gcc link time optimization.
> Link time optimization reduces both rom stage and ram stage
> image size by removing unused functions and data. Reducing the
> image size saves boot time by minimizing the flash memory read
> and decompress time for ram stage.
>
> The option is off by default because of side effects such as
> long build time and unusable dwarf2 debug output. This
> option cuts persimmon+seabios DOS boot from SSD time from
> 690 ms to 640 ms.
Did you do some size tests with non-AGESA targets?
Does lto work with our "driver"s? I hoped that once we have LTO
available we could get rid of the distinction between drivers and
objects and handle everything the way we handle drivers now, letting gcc
remove the functions we don't need.
> Signed-off-by: Scott Duplichan<scott(a)notabs.org>
>
Should we instead probe for availability of -flto in
util/xcompile/xcompile and use it if it is there?
What's the problem with dwarf2? GCC 4.6 uses mostly dwarf4 unless you
manually force it to dwarf2. Will this still be a problem?
> Index: Makefile
> ===================================================================
> --- Makefile (revision 6549)
> +++ Makefile (working copy)
> @@ -211,7 +211,7 @@
> de$(EMPTY)fine $(1)-objs_$(2)_template
> $(obj)/$$(1).$(1).o: src/$$(1).$(2) $(obj)/config.h $(4)
> @printf " CC $$$$(subst $$$$(obj)/,,$$$$(@))\n"
> - $(CC) $(3) -MMD $$$$(CFLAGS) -c -o $$$$@ $$$$<
> + $(CC) $(3) -MMD $$$$(CFLAGS) $$$$(LTO_OPTIMIZE) -c -o $$$$@ $$$$<
Hm.. I think LTO_OPTIMIZE should be added to CFLAGS instead, that would
make the patch a whole lot less intrusive.
> Index: src/arch/x86/init/bootblock.ld
> ===================================================================
> --- src/arch/x86/init/bootblock.ld (revision 6549)
> +++ src/arch/x86/init/bootblock.ld (working copy)
> @@ -22,7 +22,6 @@
> OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
> OUTPUT_ARCH(i386)
>
> -TARGET(binary)
> SECTIONS
> {
> . = CONFIG_ROMBASE;
Hm interesting... does this hurt LTO?
On 3/25/10 3:47 PM, Myles Watson wrote:
>>> On 3/24/10 11:02 PM, repository service wrote:
>>>
>>>> -extern unsigned char AmlCode[];
>>>> +extern const acpi_header_t AmlCode;
>>>>
>>> And we're positive, this always does the right thing with gcc?
>>>
>> I am told that AmlCode is defined as array of (unsigned) char in
>> some other file. Declaring it as some other type here is not
>> valid C, and *will* break with GCC, with some options (-combine
>> or LTO at least) -- it will not compile.
>>
> The biggest worry for me is incorrect execution. If it doesn't compile when
> it breaks, then that's a good thing.
>
Well, if it breaks with LTO, we should fix it right away... I'll try and
prepare something.
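One way to keep the declaration consistent with the definition, as a sketch
only: keep AmlCode declared as a byte array everywhere and decode the header
fields from the bytes. The helper `aml_length()` is hypothetical, and the
table below is truncated to the first 8 bytes (signature plus the
little-endian length field of the ACPI table header) purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for the generated table: the definition is a byte array, so
 * every declaration elsewhere should read
 *   extern const unsigned char AmlCode[];
 * Truncated to signature + length for this sketch.
 */
const unsigned char AmlCode[] = {
	'D', 'S', 'D', 'T',     /* signature */
	0x24, 0x00, 0x00, 0x00, /* length = 36, little endian */
};

/* Hypothetical helper: read the length field without changing AmlCode's type. */
uint32_t aml_length(const unsigned char *aml)
{
	assert(memcmp(aml, "DSDT", 4) == 0);

	/* Decode byte by byte so the result does not depend on host endianness. */
	return (uint32_t)aml[4] |
	       ((uint32_t)aml[5] << 8) |
	       ((uint32_t)aml[6] << 16) |
	       ((uint32_t)aml[7] << 24);
}
```

With matching types on both sides, -combine or LTO has no mismatch to complain about.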
--
coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br.
Tel.: +49 761 7668825 • Fax: +49 761 7664613
Email: info(a)coresystems.de • http://www.coresystems.de/
Court of registration: Amtsgericht Freiburg • HRB 7656
Managing director: Stefan Reinauer • VAT ID: DE245674866