[coreboot] GCC update broke AMD Fam10h boot

Mon Mar 16 22:49:18 CET 2015

On 03/16/2015 04:44 PM, Aaron Durbin wrote:
> On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
> <tpearson at raptorengineeringinc.com>  wrote:
>> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>>
>>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
>>> <tpearson at raptorengineeringinc.com>   wrote:
>>>>
>>>> All,
>>>>
>>>> Just a heads up, as there is no bugtracker for this project.  Git commit
>>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>>> POST code, but then goes into an infinite loop).  Downgrading the GCC
>>>> version repairs the boot failure.
>>>>
>>>> Not sure if you want to revert that commit until someone can figure out
>>>> what changed to cause the problem.
>>>
>>>
>>> Could you post ramstage.elf from the two different builds somewhere? I'd
>>> like to take a peek at what is in there.
>>
>>
>> Sure:
>> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>>
>> Other oddities:
>> GCC 4.8.3:
>> normal/romstage                0x7ff80    stage        97345
>> normal/ramstage                0x97c40    stage        154869
>>
>> GCC 4.9.2:
>> normal/romstage                0x7ff80    stage        94773
>> normal/ramstage                0x97240    stage        173942
>>
>> Note in particular, judging from the file sizes, that something seems to
>> have been relocated from romstage to ramstage by the new gcc version.
>>
>
> I noticed you had CONFIG_COVERAGE selected in both builds. Could
> you try not having that selected? I wonder if something changed in the
> compiler on that front. But... I think I found a bigger issue.

That shouldn't be a problem.  For reference, should CONFIG_COVERAGE be 
on or off for board status report builds?

> $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
> 00146fc4 r pch_intel_wifi
> 00146fd0 R cpu_drivers
> 00146fd0 R epci_drivers
> 00146fd0 r model_10xxx
> 00146fdc R _bs_init_begin
> 00146fdc r cbmem_bscb
> 00146fdc R ecpu_drivers
> 00146ff0 r gcov_bscb
> 0014702c R _bs_init_end
> 0014702c R pnp_conf_mode_870155_aa
> 00147034 R pnp_conf_mode_a0a0_aa
> 0014703c R pnp_conf_mode_8787_aa
> 00147044 R pnp_conf_mode_7777_aa
>
> $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
> 001465c4 r pch_intel_wifi
> 001465d0 R cpu_drivers
> 001465d0 R epci_drivers
> 001465d0 r model_10xxx
> 001465dc R _bs_init_begin
> 001465dc R ecpu_drivers
> 001465e0 r cbmem_bscb
> 00146600 r gcov_bscb
> 0014663c R _bs_init_end
> 00146640 R pnp_conf_mode_870155_aa
> 00146648 R pnp_conf_mode_a0a0_aa
> 00146650 R pnp_conf_mode_8787_aa
> 00146658 R pnp_conf_mode_7777_a
>
> The boot state callbacks place the whole structure for each entry
> between _bs_init_begin and _bs_init_end. For both binaries the size of
> each structure is 0x14 bytes.

> For the 4.8.3 compiled ramstage I see:
> (gdb) p/x 0x0014702c - 0x00146fdc
> $12 = 0x50
> For the 4.9.2 compiled ramstage I see:
> (gdb) p/x 0x014663c - 0x01465dc
> $14 = 0x60
>
> 0x60 is not a multiple of 0x14 -- which means things aren't cool.

This makes perfect sense--whenever coreboot didn't hang outright it 
started infinitely spewing some message regarding a boot state callback 
already being complete.
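
To make the arithmetic concrete, here is a minimal C sketch of the check
above.  It assumes, as stated in the quoted analysis, that each entry is
0x14 bytes, and it further assumes that the list is walked in entry-sized
steps from _bs_init_begin to _bs_init_end.  struct demo_entry and both
helper functions are hypothetical names for illustration, not coreboot
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one callback entry: five 32-bit fields = 0x14
 * bytes, matching the entry size quoted above. Not coreboot's real layout. */
struct demo_entry {
    uint32_t fields[5];
};

/* Number of whole entries that fit between begin and end. */
static uint32_t whole_entries(uint32_t begin, uint32_t end, uint32_t entry_size)
{
    assert(entry_size != 0 && end >= begin);
    return (end - begin) / entry_size;
}

/* Bytes left over after the last whole entry. A non-zero result means a
 * walker that steps by entry_size never lands exactly on end. */
static uint32_t leftover_bytes(uint32_t begin, uint32_t end, uint32_t entry_size)
{
    assert(entry_size != 0 && end >= begin);
    return (end - begin) % entry_size;
}
```

With the addresses from the nm output, the 4.8.3 region (0x00146fdc to
0x0014702c) holds 4 whole entries with 0 bytes left over, while the 4.9.2
region (0x001465dc to 0x0014663c) holds 4 whole entries with 0x10 bytes
left over.  The nm output also shows cbmem_bscb and gcov_bscb 0x14 bytes
apart in the 4.8.3 build but 0x20 bytes apart in the 4.9.2 build.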

> Looking at the symbols it appears the compiler is aligning those
> structures to 32 bytes for some reason...
>
> A quick hack is to add ALIGN(32) to the linker script before
> _bs_init_begin: src/arch/x86/ramstage.ld

So I wonder if this is unique to AMD Fam10h or if a whole lot of other 
boards broke with the gcc update.  I wouldn't have even caught this if I 
hadn't checked out a new coreboot tree instead of copying over the 
existing tree with the prebuilt crossgcc, so we might be looking at a 
ticking timebomb that will go off as people start upgrading their 
crossgcc versions...

> But I think we'll need to store pointers to the structures in order to
> properly handle the situation where the compiler is effectively making
> alignment/size decisions for some reason.

I am not at all familiar with the code in question, so all I can do is 
offer to test.  Thanks for analysing the problem!

> -Aaron

-- 
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com