On Mon, Mar 16, 2015 at 4:49 PM, Timothy Pearson tpearson@raptorengineeringinc.com wrote:
On 03/16/2015 04:44 PM, Aaron Durbin wrote:
On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson tpearson@raptorengineeringinc.com wrote:
On 03/16/2015 09:23 AM, Aaron Durbin wrote:
On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpearson@raptorengineeringinc.com wrote:
All,
Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure.
Not sure if you want to revert that commit until someone can figure out what changed to cause the problem.
Could you post ramstage.elf from the two different builds somewhere? I'd like to take a peek at what is in there.
Sure: https://raptorengineeringinc.com/coreboot/built.tar.bz2
Other oddities:
GCC 4.8.3:
normal/romstage 0x7ff80 stage 97345
normal/ramstage 0x97c40 stage 154869
GCC 4.9.2:
normal/romstage 0x7ff80 stage 94773
normal/ramstage 0x97240 stage 173942
Note in particular, judging from the file sizes, that something seems to have been relocated from romstage to ramstage by the new gcc version.
I noticed you had CONFIG_COVERAGE selected in both the builds. Could you try not having that selected? I wonder if something changed in the compiler on that front. But... I think I found a bigger issue.
That shouldn't be a problem. For reference, should CONFIG_COVERAGE be on or off for board status report builds?
$ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
00146fc4 r pch_intel_wifi
00146fd0 R cpu_drivers
00146fd0 R epci_drivers
00146fd0 r model_10xxx
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146fdc R ecpu_drivers
00146ff0 r gcov_bscb
0014702c R _bs_init_end
0014702c R pnp_conf_mode_870155_aa
00147034 R pnp_conf_mode_a0a0_aa
0014703c R pnp_conf_mode_8787_aa
00147044 R pnp_conf_mode_7777_aa

$ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
001465c4 r pch_intel_wifi
001465d0 R cpu_drivers
001465d0 R epci_drivers
001465d0 r model_10xxx
001465dc R _bs_init_begin
001465dc R ecpu_drivers
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
00146640 R pnp_conf_mode_870155_aa
00146648 R pnp_conf_mode_a0a0_aa
00146650 R pnp_conf_mode_8787_aa
00146658 R pnp_conf_mode_7777_a
The boot state callbacks place the whole structure for each entry between _bs_init_begin and _bs_init_end. In both binaries the size of each structure is 0x14 bytes.
For the 4.8.3 compiled ramstage I see:
(gdb) p/x 0x0014702c - 0x00146fdc
$12 = 0x50
For the 4.9.2 compiled ramstage I see:
(gdb) p/x 0x014663c - 0x01465dc
$14 = 0x60
0x60 is not a multiple of 0x14 -- which means things aren't cool.
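The arithmetic above can be checked with a small sketch. The helper names below are hypothetical and are not part of coreboot; the addresses and the 0x14 entry size are the ones quoted from the nm and gdb output in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a coreboot function): true when the region
 * [begin, end) holds a whole number of entries of entry_size bytes. */
static bool region_is_whole_array(uint32_t begin, uint32_t end,
				  uint32_t entry_size)
{
	if (entry_size == 0 || end < begin)
		return false;
	return (end - begin) % entry_size == 0;
}

/* Hypothetical helper: number of complete entries that fit in the region. */
static uint32_t region_entry_count(uint32_t begin, uint32_t end,
				   uint32_t entry_size)
{
	assert(entry_size != 0 && end >= begin);
	return (end - begin) / entry_size;
}
```

With the 4.8.3 addresses the region is 0x50 bytes, which is exactly four 0x14-byte entries. With the 4.9.2 addresses the region is 0x60 bytes, which leaves a remainder of 0x10.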
This makes perfect sense--whenever coreboot didn't hang outright it started infinitely spewing some message regarding a boot state callback already being complete.
Looking at the symbols it appears the compiler is aligning those structures to 32 bytes for some reason...
A quick hack is to add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld
So I wonder if this is unique to AMD Fam10h or if a whole lot of other boards broke with the gcc update. I wouldn't have even caught this if I hadn't checked out a new coreboot tree instead of copying over the existing tree with the prebuilt crossgcc, so we might be looking at a ticking timebomb that will go off as people start upgrading their crossgcc versions...
It all sorta depends. But the fact that _bs_init_begin does not equal the address of the first bscb structure is bad news all around.
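To illustrate why the mismatch matters, here is a minimal sketch that assumes the consumer steps through the region in fixed 0x14-byte strides starting at _bs_init_begin. That loop shape is an assumption, not a quote of the coreboot source. The entry addresses come from the nm output; gcov_bscb is assumed to be an array of three entries (0x3c bytes in the 4.8.3 build), so its second and third addresses are inferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE 0x14u /* size of one entry, as stated in the thread */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Entry start addresses: cbmem_bscb, then three assumed gcov_bscb elements. */
static const uint32_t entries_483[] = {
	0x00146fdc, 0x00146ff0, 0x00147004, 0x00147018
};
static const uint32_t entries_492[] = {
	0x001465e0, 0x00146600, 0x00146614, 0x00146628
};

/* Count how many addresses visited by a fixed-stride walk over
 * [begin, end) coincide with the start of a real entry. */
static size_t stride_walk_hits(uint32_t begin, uint32_t end,
			       const uint32_t *entries, size_t num_entries)
{
	size_t hits = 0;

	assert(entries != NULL || num_entries == 0);
	for (uint32_t addr = begin; addr < end; addr += ENTRY_SIZE) {
		for (size_t i = 0; i < num_entries; i++) {
			if (entries[i] == addr)
				hits++;
		}
	}
	return hits;
}
```

With the 4.8.3 addresses every step of the walk lands on an entry. With the 4.9.2 addresses no step does, so each read would interpret padding and partial entries as a callback structure. A loop that compared against _bs_init_end with `!=` rather than `<` would also step past the end marker, since 0x60 is not a multiple of 0x14.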
But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason.
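A rough sketch of the pointer idea, in portable C rather than with a real linker section: the table holds pointers, which all have the same size and alignment, so walking it does not depend on where or how the compiler places the structures themselves. Every name here (demo_entry, demo_table, run_all) is hypothetical, and the 32-byte alignment is forced with _Alignas only to mimic the placement seen in the 4.9.2 build. This is not the actual coreboot patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a boot state callback entry. */
struct demo_entry {
	int id;
	void (*callback)(void *arg);
	void *arg;
};

static int calls; /* number of callbacks invoked so far */

static void count_call(void *arg)
{
	(void)arg;
	calls++;
}

/* Over-aligned on purpose, so the structures are not packed back to back. */
static _Alignas(32) struct demo_entry entry_a = { 1, count_call, NULL };
static _Alignas(32) struct demo_entry entry_b = { 2, count_call, NULL };
static _Alignas(32) struct demo_entry entry_c = { 3, count_call, NULL };

/* The region being walked now contains only pointers. */
static struct demo_entry *const demo_table[] = {
	&entry_a, &entry_b, &entry_c
};
#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/* Walk [begin, end) one pointer at a time and invoke each callback.
 * Returns the number of entries visited. */
static int run_all(struct demo_entry *const *begin,
		   struct demo_entry *const *end)
{
	int visited = 0;

	assert(begin != NULL && end != NULL);
	for (struct demo_entry *const *cur = begin; cur != end; cur++) {
		(*cur)->callback((*cur)->arg);
		visited++;
	}
	return visited;
}
```

Calling run_all(demo_table, demo_table + DEMO_TABLE_LEN) visits all three entries regardless of the padding between them. In a real build the begin and end pointers would come from linker-provided symbols instead of a C array.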
I am not at all familiar with the code in question, so all I can do is offer to test. Thanks for analysing the problem!
I might be able to whip up a patch, but it's harder than I first thought because we were relying on arrays to be swept into those regions. I'll have to think on this one or we'll just have to change the API entirely for all the users.
-Aaron
-- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com