[coreboot] GM45 S3 resume issues

Thu Nov 12 19:55:32 CET 2015

Hi,

I had a look at your logs:

On 11.11.2015 00:49, Patrick 'P. J.' McDermott wrote:
> These systems fail to resume in one of the following ways:
> 
>   * S3 resume (indicated by the SLP_TYP bit) is detected, SLP_TYP is
>     cleared, DRAM receive-enable calibration fails with a timing
>     under/overflow, the system resets, and coreboot boots normally into
>     the payload (with the sleep LED still on) because SLP_TYP is now
>     unset.  See x200-resume-fail-receive-enable-calibration.log and
>     t400-resume-fail-receive-enable-calibration.log.
>   * S3 resume is detected, SLP_TYP is cleared, raminit and the rest of
>     romstage completes without error, but then something between the
>     southbridge's smm_init() and cpu_initialize() hangs (maybe the
>     system is stuck in SMM).  See x200-resume-fail-smm-hang.log and
>     t400-resume-fail-smm-hang.log.
I have no idea about the SMM hang yet.

>   * S3 resume is detected, SLP_TYP is cleared, romstage completes, but
>     something within smm_init() hangs before dumping (possibly while
>     clearing [1]) TCO1_STS bits.  See t400-resume-fail-tco-hang.log
The logs are all a little garbled. It looks to me like this is exactly
the same hang as in *-resume-fail-smm-hang.log.

> There are a couple of other ways in which I've seen S3 resume fail, but
> these are the most common.
> 
> I thought of working around the first issue (clearing SLP_TYP, resetting
> due to a raminit error, then booting into the payload) by clearing
> SLP_TYP near the end of the romstage main() (after raminit).  So I tried
> the following patch:
> 
> ---
> diff --git a/src/mainboard/lenovo/x200/romstage.c b/src/mainboard/lenovo/x200/romstage.c
> index 86a973f..915baf2 100644
> --- a/src/mainboard/lenovo/x200/romstage.c
> +++ b/src/mainboard/lenovo/x200/romstage.c
> @@ -103,10 +103,6 @@ void main(unsigned long bist)
>  #if CONFIG_HAVE_ACPI_RESUME
>  		printk(BIOS_DEBUG, "Resume from S3 detected.\n");
>  		s3resume = 1;
> -		/* Clear SLP_TYPE. This will break stage2 but
> -		 * we care for that when we get there.
> -		 */
> -		outl(pm1_cnt & ~(7 << 10), DEFAULT_PMBASE + 0x04);
>  #else
>  		printk(BIOS_DEBUG, "Resume from S3 detected, but disabled.\n");
>  #endif
> @@ -190,6 +186,11 @@ void main(unsigned long bist)
>  
>  		/* Magic for S3 resume */
>  		pci_write_config32(PCI_DEV(0, 0, 0), D0F0_SKPD, SKPAD_ACPI_S3_MAGIC);
> +
> +		/* Clear SLP_TYPE. This will break stage2 but
> +		 * we care for that when we get there.
> +		 */
> +		outl(pm1_cnt & ~(7 << 10), DEFAULT_PMBASE + 0x04);
>  	} else {
>  		/* Magic for S3 resume */
>  		pci_write_config32(PCI_DEV(0, 0, 0), D0F0_SKPD, SKPAD_NORMAL_BOOT_MAGIC);
> ---
> 
> But that just made these errors even more frequent.  Trying to resume
> from S3 put the system into a reset loop with receive-enable calibration
> errors (see x200-patched-resume-fail-receive-enable-loop.log).  So
> instead of rebooting into the payload or hanging, the system just resets
> forever.
This reset loop is very interesting. Did it ever end? It could mean
the worst, i.e. that the RAM lost its configuration (self-refresh
failed). I suspect that's the case, as there is not much difference
between the normal and the resume path until receive-enable calibration.
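
For reference, the SLP_TYP handling that your patch moves around can be
sketched roughly as below. This is only an illustration in plain C on a
register value, not the actual romstage code: the helper names are made
up, and the S3 encoding (5) is my assumption for this southbridge. The
3-bit field at bit 10 is taken from the mask in your diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* SLP_TYP field in PM1_CNT: 3 bits starting at bit 10, matching the
 * ~(7 << 10) mask used in the diff. */
#define SLP_TYP_SHIFT 10
#define SLP_TYP_MASK  (7u << SLP_TYP_SHIFT)
/* Assumption: 5 encodes S3 (suspend to RAM) on this southbridge. */
#define SLP_TYP_S3    5u

/* Hypothetical helper: extract the SLP_TYP field from a PM1_CNT value. */
static unsigned int slp_typ(uint32_t pm1_cnt)
{
	return (pm1_cnt & SLP_TYP_MASK) >> SLP_TYP_SHIFT;
}

/* Hypothetical helper: would this PM1_CNT value select the resume path? */
static bool is_s3_resume(uint32_t pm1_cnt)
{
	return slp_typ(pm1_cnt) == SLP_TYP_S3;
}

/* Hypothetical helper: the value the outl() in the diff writes back,
 * i.e. PM1_CNT with SLP_TYP zeroed and all other bits untouched. */
static uint32_t clear_slp_typ(uint32_t pm1_cnt)
{
	return pm1_cnt & ~SLP_TYP_MASK;
}
```

With the clear done before raminit, a reset triggered by a failed
calibration finds SLP_TYP == 0 and takes the normal boot path. With the
clear moved after raminit, SLP_TYP is presumably still set on the next
attempt, so the resume path runs again, which would match the loop you
see.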

Nico