Hi Julius and Nico,
Thanks for the feedback!

> Did you find out any particular (magic) construct we are currently using
> that fails? Or is it the overall complexity of the script?
It trips on some arithmetic, but I don't fully understand it yet, so my attempt was trial and error to get something that links and boots.

> Btw. can clang+lld with LTO still link against GCC objects, e.g.
Linking is possible, but of course it cannot optimize across those objects. Btw., sometimes it's necessary to skip LTO on specific C files, like the arm eabi_compat.c. As a side note, maybe https://github.com/AdaCore/gnat-llvm is an interesting route to get LTO in the clang plus libgfxinit combo?

> Do you have a rough list of the types of things that the LLD linker
> cannot deal with (e.g. there seems to be something about not using a
> symbol before it was defined, like with BOOTBLOCK_TOP, but then it
> doesn't seem to apply everywhere, e.g. for ID_SECTION it still seems
> to work?), so we can get an idea what kind of limitations we'd be
> accepting here for both current and future linker scripting?

I'm not sure yet why LLD isn't happy about some of the arithmetic in the linker script. I'll investigate to get a clearer picture.

> wouldn't mind some rewrites to the x86 bootblock script in general,
> since some of it honestly seems unnecessarily convoluted anyway, but
> it's more concerning if you need to drop features (like commenting out
> all of those asserts at the end) when there's no way to make something
> similar work with LLD.
The asserts don't work because of the LTO and LLD combination. The _bootblock / _ebootblock symbols get optimized away and are somehow set to 0. Referring to them inside the code would fix that. https://review.coreboot.org/c/coreboot/+/71871 is also a way to deal with it.
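
To illustrate what "referring to them inside the code" could look like, here is a minimal host-side sketch. Everything named demo_* is hypothetical and not coreboot code: in a real build the two boundary symbols would only be declared extern and defined by the linker script, while here an ordinary array stands in for the region so the example links on its own. I have not verified that this exact pattern is sufficient for the LTO + LLD case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a region whose start/end would normally be
 * provided by the linker script (only 'extern' declarations would live here).
 */
uint8_t demo_image[64];

/*
 * Taking the addresses in a table marked 'used' gives the compiler and the
 * linker an explicit reference to the boundary symbols, so they cannot be
 * treated as unreferenced.
 */
static const uint8_t *const demo_keep_syms[] __attribute__((used)) = {
	demo_image,                      /* region start */
	demo_image + sizeof(demo_image), /* region end   */
};

/* Compute the region size from the two boundary addresses. */
size_t demo_region_size(void)
{
	const uint8_t *start = demo_keep_syms[0];
	const uint8_t *end = demo_keep_syms[1];

	assert(end >= start);
	return (size_t)(end - start);
}
```

Calling demo_region_size() here returns the size of the stand-in array. The attribute syntax is the GNU one, which both GCC and clang accept.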

> Also, are you sure that all the Arm boards are fine? Did you do a full
> abuild and then also compare the images (with BUILD_TIMELESS) to make
> sure the layouts didn't actually shift? We do a bunch of complicated
> things in our linker scripts, I'm actually surprised that LLD would be
> fine with everything besides the x86 bootblock (it didn't use to be at
> all a couple of years ago, but I guess they may have improved it).

I only tried QEMU on arm and arm64, and those still worked. A more in-depth comparison of the ELF output is indeed needed. For instance, with x86 stages it tripped the cbfstool assertion that loadable sections need to be consecutive.

> I think the other big question here is: why do we care about clang at
> all? If GCC can do LTO with BFD, why don't we just stick with that? My
> understanding was that people just added clang support to coreboot
> "for fun" to see if it was possible, and we said okay as long as you
> can do it without having to break any code. But now if we do need to
> make meaningful code changes to support it, should we reexamine why we
> want it at all? Is anyone actually using it in a production use case
> (and if so, why)?
I personally have clang set as the default compiler on my system using site-local/Kconfig. I prefer its error messages. It also produces somewhat different errors and warnings than GCC, which is always nice in CI. Clang trades blows with GCC on code size; with LTO in particular, clang can sometimes produce binaries 10% smaller than GCC LTO binaries. Last time I checked, Linux only supported LTO with clang (that might not be true anymore), although I'm not sure why. If support for newer languages like Rust or Zig becomes desirable in the future, LTO with clang will work more easily, since the same LLVM IR is used. One cool feature of clang is that it can do reflection on C structs with compiler builtins: https://review.coreboot.org/c/coreboot/+/72460 . So in my opinion it's a tooling option worth exploring. "For fun" often precedes production use :-) .

On Sat, Feb 24, 2024 at 10:55 PM Nico Huber <nico.h@gmx.de> wrote:
Hi Arthur,

this sounds very interesting.

On 23.02.24 17:47, Arthur Heymans wrote:
> So my question would be, whether we want to support non BFD linkers and
> therefore not support all the magic in linker scripts we currently have.

Did you find out any particular (magic) construct we are currently using
that fails? Or is it the overall complexity of the script?

I was already wondering lately if we shouldn't split the complex x86
linker script for different use cases (e.g. native, FSP, etc.). If we
had three scripts instead of one, we would sometimes have to make the
same change in multiple files. But, OTOH, I don't think we touch the
linker scripts that much. And every time I look into the x86 one, it
seems hard to find top and bottom.

> If non BFD linkers are acceptable, how do you propose we deal with it? When
> not using LTO you could in principle use a different linker of your
> choosing. Should we just move to lld to make sure CI is happy about linker
> scripts? Should it be an option, but then we won't have CI being able to
> test it except on a few boards put in configs/ ? Maybe have both, but then
> change the default once LLD is known to work well enough? Sidenote: LLD is
> faster than BFD but linking is not an expensive step in coreboot anyway.

I wouldn't mind switching to LLD. Would prefer that we focus on a
single linker, though. There is some soothing feeling when one knows
that everybody is using the same toolchain, and chances that bugs are
discovered early are higher.

Btw. can clang+lld with LTO still link against GCC objects, e.g.