Hi community
I picked up the good work done by Jacob Garber to implement link time optimization (LTO). The size gains are substantial, sometimes up to a 15% reduction in binary size! Besides the size benefit, it lets us be slightly lazier programmers: for instance, we often put static inline functions in headers, guarded by preprocessor conditionals on Kconfig options, so that we don't generate code we won't use. With LTO that can be relaxed a bit, and the condition to do nothing can come from a separately linked object (at least I presume so, I haven't tested it).
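To make the header pattern concrete, here is a minimal sketch under stated assumptions: CONFIG_EXAMPLE_FEATURE, example_feature_init(), example_feature_init_lto() and board_init() are made-up names, not existing coreboot symbols, and everything is squeezed into one file so it compiles on its own. Whether LTO really folds away the call in the second variant is untested, as said above.

```c
/* Hypothetical stand-in for a Kconfig option; a real build would get it from the generated config. */
#ifndef CONFIG_EXAMPLE_FEATURE
#define CONFIG_EXAMPLE_FEATURE 0
#endif

/* Variant 1 (today): the header decides at preprocessing time. */
#if CONFIG_EXAMPLE_FEATURE
int example_feature_init(void); /* implemented in another object */
#else
static inline int example_feature_init(void) { return 0; } /* no code generated */
#endif

/* Variant 2 (possible with LTO): one unconditional prototype in the header. */
int example_feature_init_lto(void);

/*
 * In a real tree this body would sit in its own .c file. Without LTO the
 * caller pays for a function call even when the feature is disabled; with
 * LTO the link step can see the body and may inline the empty path.
 */
int example_feature_init_lto(void)
{
	if (!CONFIG_EXAMPLE_FEATURE)
		return 0; /* feature disabled: nothing to do */
	return 1; /* placeholder for real initialization work */
}

/* Hypothetical caller that uses both variants. */
int board_init(void)
{
	return example_feature_init() + example_feature_init_lto();
}
```

With the option left at 0, both variants do nothing and board_init() returns 0. The difference is only where the "do nothing" decision lives: in the header for variant 1, in a separate object for variant 2.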
To do LTO with either clang or GCC, you use the compiler as a linker frontend rather than invoking the linker directly. Object files are optionally compiled with the -flto flag so that LTO-optimized binaries can be generated. See https://review.coreboot.org/c/coreboot/+/40811
Jacob's patches dealt with GCC. I toyed around with clang. The clang linker frontend only works with LTO when using the GOLD or LLD linker. GCC can use the BFD linker we currently use, so that's an easy transition. Both GOLD and LLD are less capable at parsing linker scripts than the BFD linker. The x86 bootblock script in particular is complicated, because we do a lot of stuff in it (ID, FIT pointer, ECFW pointer, early 16-bit code, ...), and those linkers fail on the arithmetic that does all that optimized placement.
So my question is whether we want to support non-BFD linkers and therefore not support all the magic in linker scripts we currently have. At most we would be a few bytes less optimal than possible, for instance by setting a good hardcoded size and offset for the early 16-bit code rather than having the linker script optimize that. As a side note, the LTO size gains are bigger than what would be lost, by a large margin. https://review.coreboot.org/c/coreboot/+/80724/1/src/arch/x86/bootblock.ld is a crude example of how I got lld to be happy with the script.
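A small worked example of the "few bytes less optimal" trade-off, with made-up numbers: the code size 0x6c, the reservation 0x80 and the 16-byte alignment are assumptions for illustration, and the helper names are hypothetical. It also assumes the region has to end at a fixed address, taken here to be the x86 reset vector at 0xfffffff0; the real script may lay things out differently.

```c
#include <stdint.h>

#define REGION_END    0xfffffff0u /* x86 reset vector; the region is assumed to end here */
#define ACTUAL_SIZE   0x6cu       /* assumed real size of the early 16-bit code */
#define RESERVED_SIZE 0x80u       /* assumed hardcoded reservation */
#define REGION_ALIGN  16u         /* assumed alignment requirement */

/* What a linker script computes: back off by the real size, then align down. */
uint32_t start_computed(uint32_t end, uint32_t size, uint32_t align)
{
	return (end - size) & ~(align - 1u);
}

/* What a hardcoded layout does: back off by a fixed, already aligned size. */
uint32_t start_hardcoded(uint32_t end, uint32_t reserved)
{
	return end - reserved;
}

/* Bytes given up by hardcoding instead of computing. */
uint32_t bytes_lost(void)
{
	return start_computed(REGION_END, ACTUAL_SIZE, REGION_ALIGN)
	     - start_hardcoded(REGION_END, RESERVED_SIZE);
}
```

With these numbers the computed start is 0xffffff80 and the hardcoded start is 0xffffff70, so the hardcoded layout gives up 16 bytes.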
If non-BFD linkers are acceptable, how do you propose we deal with it? When not using LTO, you could in principle use a different linker of your choosing. Should we just move to LLD to make sure CI is happy about linker scripts? Should it be an option, but then we won't have CI being able to test it except on a few boards put in configs/? Maybe have both, but then change the default once LLD is known to work well enough? Sidenote: LLD is faster than BFD, but linking is not an expensive step in coreboot anyway.
Other more typical linker scripts like the ones you find on ARM and ARM64 platforms work just fine. There are some subtle differences e.g. https://review.coreboot.org/c/coreboot/+/80735 where the heap needs to be declared to not be loaded.
There are a few issues though. LLD does not like the RISC-V relocation symbols of libgcc.a... I read upstream bug reports about this issue, so maybe that will resolve itself.
The same questions apply to LTO and CI. Should it be an option (IMO yes, as that's what both Linux and U-Boot do), and which default settings do we want? For CI, LTO is a bit stricter and therefore useful, as the compiler frontend throws more warnings than the plain linker when linking.
What are your thoughts? Are there some caveats about linkers and LTO that are worth knowing, before moving forward?
Kind regards
Arthur
Hi Arthur,
this sounds very interesting.
On 23.02.24 17:47, Arthur Heymans wrote:
So my question would be, whether we want to support non BFD linkers and therefore not support all the magic in linker scripts we currently have.
Did you find out any particular (magic) construct we are currently using that fails? Or is it the overall complexity of the script?
I was already wondering lately if we shouldn't split the complex x86 linker script for different use cases (e.g. native, FSP, etc.). If we had three scripts instead of one, we would sometimes have to make the same change in multiple files. But, OTOH, I don't think we touch the linker scripts that much. And every time I look into the x86 one, it seems hard to find top and bottom.
If non BFD linkers are acceptable, how do you propose we deal with it? When not using LTO you could in principle use a different linker of your choosing. Should we just move to lld to make sure CI is happy about linker scripts? Should it be an option, but then we won't have CI being able to test it except on a few boards put in configs/? Maybe have both, but then change the default once LLD is known to work well enough? Sidenote: LLD is faster than BFD but linking is not an expensive step in coreboot anyway.
I wouldn't mind switching to LLD. Would prefer that we focus on a single linker, though. There is some soothing feeling when one knows that everybody is using the same toolchain, and chances that bugs are discovered early are higher.
Btw. can clang+lld with LTO still link against GCC objects, e.g. libgfxinit?
Nico
Hi Julius and Nico, Thanks for the feedback!
Did you find out any particular (magic) construct we are currently using that fails? Or is it the overall complexity of the script?
It trips on some arithmetic, but I don't fully understand it yet, so my attempt was trial and error to get something linking and booting.
Btw. can clang+lld with LTO still link against GCC objects, e.g. libgfxinit?
Linking is possible, but it cannot optimize those objects, of course. Btw, sometimes it's necessary to skip LTO on some specific C code, like the ARM eabi_compat.c. As a sidenote, maybe https://github.com/AdaCore/gnat-llvm is an interesting route to get LTO in the clang with libgfxinit combo?
Do you have a rough list of the types of things that the LLD linker cannot deal with (e.g. there seems to be something about not using a symbol before it was defined, like with BOOTBLOCK_TOP, but then it doesn't seem to apply everywhere, e.g. for ID_SECTION it still seems to work?), so we can get an idea what kind of limitations we'd be accepting here for both current and future linker scripting?
I'm not sure yet why LLD isn't happy about some of the arithmetic in the linker script. I'll investigate to get a clearer picture.
I wouldn't mind some rewrites to the x86 bootblock script in general, since some of it honestly seems unnecessarily convoluted anyway, but it's more concerning if you need to drop features (like commenting out all of those asserts at the end) when there's no way to make something similar work with LLD.
The asserts don't work because of the LTO and LLD combination. The _bootblock / _ebootblock symbols get optimized away and are somehow set to 0. Referring to them inside the code would fix that. https://review.coreboot.org/c/coreboot/+/71871 is also a way to deal with it.
Also, are you sure that all the Arm boards are fine? Did you do a full abuild and then also compare the images (with BUILD_TIMELESS) to make sure the layouts didn't actually shift? We do a bunch of complicated things in our linker scripts, I'm actually surprised that LLD would be fine with everything besides the x86 bootblock (it didn't use to be at all a couple of years ago, but I guess they may have improved it).
I only played with QEMU on arm and arm64, and those still worked. A more in-depth comparison of the ELF output is indeed needed. For instance, with x86 stages it tripped the cbfstool assertion that loadable sections need to be consecutive.
I think the other big question here is: why do we care about clang at all? If GCC can do LTO with BFD, why don't we just stick with that? My understanding was that people just added clang support to coreboot "for fun" to see if it was possible, and we said okay as long as you can do it without having to break any code. But now if we do need to make meaningful code changes to support it, should we reexamine why we want it at all? Is anyone actually using it in a production use case (and if so, why)?
I personally have clang set as the default compiler on my system using site-local/Kconfig. I prefer its error messages. It also generates slightly different errors and warnings than GCC, which is always nice in CI. Clang trades blows with GCC on code size. Especially with LTO, clang can sometimes produce binaries that are 10% smaller than GCC LTO binaries. Last time I checked, Linux only supported LTO with clang (that might not be true anymore), although I'm not sure why. If support for newer languages like Rust or Zig is desirable in the future, then LTO with clang will work more easily, as the same LLVM IR is used. One cool feature of clang is that it can do reflection on C structs with compiler builtins: https://review.coreboot.org/c/coreboot/+/72460 . So in my opinion it's a tooling option worth exploring. "For fun" often precedes production use :-) .
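As an illustration of that kind of reflection, clang has __builtin_dump_struct, which prints the fields of a struct through a printf-like function. This is only a sketch: struct region and dump_region() are made up for the example, the linked change may well use different builtins, and the fallback branch exists so the file also builds with compilers that lack the builtin.

```c
#include <stdio.h>

/* Hypothetical struct, only here to have something to dump. */
struct region {
	unsigned int base;
	unsigned int size;
};

/* Detect the builtin without breaking compilers that lack __has_builtin. */
#ifdef __has_builtin
#if __has_builtin(__builtin_dump_struct)
#define HAVE_DUMP_STRUCT 1
#endif
#endif

/* Returns 1 if the compiler generated the dump, 0 if the manual fallback ran. */
int dump_region(struct region *r)
{
#ifdef HAVE_DUMP_STRUCT
	/* The compiler walks the fields itself; no per-field code is written here. */
	__builtin_dump_struct(r, &printf);
	return 1;
#else
	/* Fallback: every field has to be listed by hand. */
	printf("struct region { base = 0x%x, size = 0x%x }\n", r->base, r->size);
	return 0;
#endif
}
```

The point of the builtin branch is that adding a field to struct region needs no change to dump_region(), whereas the fallback has to be kept in sync manually.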
Arthur
Hi Arthur,
First of all, thanks a lot for putting all this work into getting LTO working. The benefits really seem promising!
Do you have a rough list of the types of things that the LLD linker cannot deal with (e.g. there seems to be something about not using a symbol before it was defined, like with BOOTBLOCK_TOP, but then it doesn't seem to apply everywhere, e.g. for ID_SECTION it still seems to work?), so we can get an idea what kind of limitations we'd be accepting here for both current and future linker scripting? I wouldn't mind some rewrites to the x86 bootblock script in general, since some of it honestly seems unnecessarily convoluted anyway, but it's more concerning if you need to drop features (like commenting out all of those asserts at the end) when there's no way to make something similar work with LLD.
Also, are you sure that all the Arm boards are fine? Did you do a full abuild and then also compare the images (with BUILD_TIMELESS) to make sure the layouts didn't actually shift? We do a bunch of complicated things in our linker scripts, I'm actually surprised that LLD would be fine with everything besides the x86 bootblock (it didn't use to be at all a couple of years ago, but I guess they may have improved it).
I think the other big question here is: why do we care about clang at all? If GCC can do LTO with BFD, why don't we just stick with that? My understanding was that people just added clang support to coreboot "for fun" to see if it was possible, and we said okay as long as you can do it without having to break any code. But now if we do need to make meaningful code changes to support it, should we reexamine why we want it at all? Is anyone actually using it in a production use case (and if so, why)?