Hi,
I had a look at your logs:
On 11.11.2015 00:49, Patrick 'P. J.' McDermott wrote:
These systems fail to resume in one of the following ways:
- S3 resume (indicated by the SLP_TYP bit) is detected, SLP_TYP is cleared, DRAM receive-enable calibration fails with a timing under/overflow, the system resets, and coreboot boots normally into the payload (with the sleep LED still on) because SLP_TYP is now unset. See x200-resume-fail-receive-enable-calibration.log and t400-resume-fail-receive-enable-calibration.log.
- S3 resume is detected, SLP_TYP is cleared, raminit and the rest of romstage completes without error, but then something between the southbridge's smm_init() and cpu_initialize() hangs (maybe the system is stuck in SMM). See x200-resume-fail-smm-hang.log and t400-resume-fail-smm-hang.log.
I have no idea yet about the SMM hang.
- S3 resume is detected, SLP_TYP is cleared, romstage completes, but something within smm_init() hangs before dumping the TCO1_STS bits (possibly while clearing them [1]). See t400-resume-fail-tco-hang.log.
The logs are all a little garbled. It looks to me like this is exactly the same hang as in *-resume-fail-smm-hang.log.
There are a couple of other ways in which I've seen S3 resume fail, but these are the most common.
I thought of working around the first issue (clearing SLP_TYP, resetting due to a raminit error, then booting into the payload) by clearing SLP_TYP near the end of the romstage main() (after raminit). So I tried the following patch:
diff --git a/src/mainboard/lenovo/x200/romstage.c b/src/mainboard/lenovo/x200/romstage.c
index 86a973f..915baf2 100644
--- a/src/mainboard/lenovo/x200/romstage.c
+++ b/src/mainboard/lenovo/x200/romstage.c
@@ -103,10 +103,6 @@ void main(unsigned long bist)
 #if CONFIG_HAVE_ACPI_RESUME
 		printk(BIOS_DEBUG, "Resume from S3 detected.\n");
 		s3resume = 1;
-		/* Clear SLP_TYPE. This will break stage2 but
-		 * we care for that when we get there.
-		 */
-		outl(pm1_cnt & ~(7 << 10), DEFAULT_PMBASE + 0x04);
 #else
 		printk(BIOS_DEBUG, "Resume from S3 detected, but disabled.\n");
 #endif
@@ -190,6 +186,11 @@ void main(unsigned long bist)
 		/* Magic for S3 resume */
 		pci_write_config32(PCI_DEV(0, 0, 0), D0F0_SKPD, SKPAD_ACPI_S3_MAGIC);
+		/* Clear SLP_TYPE. This will break stage2 but
+		 * we care for that when we get there.
+		 */
+		outl(pm1_cnt & ~(7 << 10), DEFAULT_PMBASE + 0x04);
 	} else {
 		/* Magic for S3 resume */
 		pci_write_config32(PCI_DEV(0, 0, 0), D0F0_SKPD, SKPAD_NORMAL_BOOT_MAGIC);
But that just made these errors even more frequent. Trying to resume from S3 put the system into a reset loop with receive-enable calibration errors (see x200-patched-resume-fail-receive-enable-loop.log). So instead of rebooting into the payload or hanging, the system just resets forever.
This reset loop is very interesting. Did it ever end? It could mean the worst, i.e. that the RAM lost its configuration (self-refresh failed). I suspect that's the case, as the normal and resume paths don't differ much until receive-enable calibration.
Nico