On Tue, 21 Mar 2017 10:32:46 -0500 Timothy Pearson firstname.lastname@example.org wrote:
On 03/12/2017 07:58 PM, Daniel Kulesz via coreboot wrote:
As reported, the KGPE-D16 was mostly unusable for me in my 2x Opteron 6276 + 128 GB RAM configuration, as it simply did not boot reliably, even with serial console debugging disabled completely. After experimenting with various config options and comparing them against my best known "half-working" config from earlier attempts, I finally found out that the hangs were related to the configuration and not to a specific coreboot version.
I attached the configs showing my current "reliable" setup (which survived 10 cold and 10 warm reboots without a single hang!) and one of the previous "unreliable" setups, which often needed several cold boots to boot up successfully once. There are several options which might be responsible for these hangs. Personally, I believe what helps is to completely disable the serial console and not just disable debugging to the serial console.
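One way to narrow down which options differ between the two attached configs is to diff them option by option. The following is only a minimal sketch: it assumes the usual Kconfig `.config` syntax (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`), the helper names `parse_kconfig` and `diff_configs` are hypothetical, and the two sample strings are invented for illustration. They are not the contents of the attached files.

```python
import re

# Kconfig writes disabled options as comments of this exact shape.
_NOT_SET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option names to values; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _NOT_SET.match(line)
        if match:
            options[match.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one file is reported with None on that side.
    """
    a, b = parse_kconfig(a_text), parse_kconfig(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Invented sample contents (assumption), NOT the attached configs.
RELIABLE = """\
# CONFIG_CONSOLE_SERIAL is not set
CONFIG_SQUELCH_EARLY_SMP=y
"""
UNRELIABLE = """\
CONFIG_CONSOLE_SERIAL=y
CONFIG_SQUELCH_EARLY_SMP=y
"""

if __name__ == "__main__":
    for name, (good, bad) in diff_configs(RELIABLE, UNRELIABLE).items():
        print(f"{name}: reliable={good} unreliable={bad}")
```

To run it against real files, read each config with `open(path).read()` and pass the two strings to `diff_configs`. Options that are identical in both files are omitted, which leaves only the candidates worth toggling one at a time.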
As asked for previously, I also took some boot time measurements, from pressing the power button to the "grub beep", in my 128 GB RAM configuration. Here they are:
vendor bios, unoptimized with iPXE setup: 59s
coreboot, current with the "reliable" config: 73s
coreboot, Jan 17 2017, with the "reliable" config: 91s
coreboot, current with the "unreliable" config: 131s
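For comparison, the measurements above can be expressed relative to the vendor BIOS time. This is just a small sketch of the arithmetic; the function name `compare_to_baseline` and the dictionary keys are made up here, and only the four timings come from the measurements.

```python
# Boot times in seconds, copied from the measurements above.
BOOT_TIMES = {
    "vendor bios": 59,
    "coreboot current, reliable config": 73,
    "coreboot Jan 17 2017, reliable config": 91,
    "coreboot current, unreliable config": 131,
}


def compare_to_baseline(times, baseline):
    """Return {name: (extra_seconds, ratio)} relative to times[baseline]."""
    base = times[baseline]
    return {
        name: (secs - base, secs / base)
        for name, secs in times.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (extra, ratio) in compare_to_baseline(BOOT_TIMES, "vendor bios").items():
        print(f"{name}: +{extra}s ({ratio:.2f}x vendor bios)")
```

With these numbers, the current "reliable" config is 14 seconds slower than the vendor BIOS, while the "unreliable" config takes more than twice as long.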
I assume that further investigation of the root cause could help to locate the real bug (for example, in the setup of the serial console). Still, I hope that having a "working-good" config will be useful for people suffering from the same issue as I did. For me, this setup is still far from what I expected (memory is clocked too low and idle power consumption is 170W instead of 90W), but at least the machine boots up reliably every time now.
Could you verify something for me? In internal tests it looks like setting CONFIG_SQUELCH_EARLY_SMP resolves the hang with the serial console enabled, but I need secondary verification of this due to the intermittent nature of the problem. You seem to be hardest hit by the bug so your system should make a good test case.
Unfortunately, I had the option already enabled when using the "config-unreliable" in my initial posting. So it looks like this setting is not effective in stopping the hangs.