Hello Michał,
No need to apologize for the discouragement. This is valuable information that I wish I had had before I invested so much time into trying to solve the issue (I should've asked earlier, ofc.).
At least the hope is that the production units should work. Now I guess it's time for us to solve the situation internally and with Intel.
Thank you again.
Regards,
Jan Samek
Siemens, s.r.o.
ADV D EU CZ AE AC 7
jan.samek@siemens.com
________________________________________
From: Michał Żygowski michal.zygowski@3mdeb.com
Sent: 23 August 2021 18:16
To: coreboot@coreboot.org
Subject: [coreboot] Re: TigerLake RVP TCSS init failure
Hi Jan,
Unfortunately I haven't been able to resolve the issues I had and I can already see you are bumping into the same ones I have experienced. Also (unfortunately again) I don't have good news for you:
1. You can't make it work (I have been trying to get help from Intel on how to move on with this platform, without success).
2. Why can't you make it work? Most likely because you have an engineering sample (ES) CPU/SoC, which is shipped by default along with this platform. And this ES CPU will simply not work, as I have been told. I have been fighting with this platform for weeks, both with coreboot and MinPlatform, without success.
Switching to a production SoC should do the magic according to Intel employees who tested on their side both coreboot and EDK2 MinPlatform. At this point I gave up fighting this unfair battle.
Sorry to be discouraging, I just want to give a sincere opinion of what I have experienced. I would consider buying a production SoC/CPU to save the time and frustration during the bringup (which should be very easy and fast with an RVP).
Best regards,
--
Michał Żygowski
Firmware Engineer
GPG: 6B5BA214D21FCEB2
https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F3mdeb.com%... | @3mdeb_com
On 8/20/21 3:22 PM, Samek, Jan wrote:
Hello Michał, Dear coreboot community,
Is there any update on this issue since the last message?
My situation is exactly the same with our custom TGL-UP3/LP4x board as well as with the Intel TGL-UP3-LP4x RVP. I can provide more details on this issue that came out of my effort to resolve it.
The only change to the public code is that I removed the hard dependencies on chromeec (commented out EC calls from ec.c and forced board_id to TGL_UP3_LP4_MICRON) in order to use the original Intel EC binary that comes inside the reference UEFI image (I have no interest in EC development on my board and had problems with building chromeec for the RVP).
Observations:
- No matter whether I set or unset CONFIG_USE_INTEL_FSP_MP_INIT or CONFIG_USE_INTEL_FSP_TO_CALL_COREBOOT_PUBLISH_MP_PPI, I still get the same behavior ("Clearing pending MCEs... [reset]").
- Watch out for DCI: when enabled (e.g. partially - in FSP-M and not in FSP-S), it can make some assertions fail in debug FSP or cause resets with release FSP even before encountering the core issue.
- The gdb stub is currently broken for platforms that set IDT_IN_EVERY_STAGE=y - see my previous thread "GDB stub & bootblock dependencies (CONFIG_IDT_IN_EVERY_STAGE=y)" for a possible solution (sorry, can't upstream code from Siemens yet).
The output is similar to what Michał Żygowski already wrote before (see the attachment for a full version):
....
Clearing SMI status registers
SMI_STS: PM1
PM1_STS: TMROF
TCO_STS: INTRD_DET
GPE0 STD STS: BATLOW
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7b000000, cpu = 0
In relocation handler: CPU 0
New SMBASE=0x7b000000 IEDBASE=0x7b400000
Writing SMRR. base = 0x7b000006, mask=0xff800c00
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff400, cpu = 3
In relocation handler: CPU 3
New SMBASE=0x7afff400 IEDBASE=0x7b400000
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff800, cpu = 2
In relocation handler: CPU 2
New SMBASE=0x7afff800 IEDBASE=0x7b400000
Writing SMRR. base = 0x7b000006, mask=0xff800c00
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afffc00, cpu = 1
In relocation handler: CPU 1
New SMBASE=0x7afffc00 IEDBASE=0x7b400000
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affec00, cpu = 5
In relocation handler: CPU 5
New SMBASE=0x7affec00 IEDBASE=0x7b400000
Writing SMRR. base = 0x7b000006, mask=0xff800c00
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff000, cpu = 4
In relocation handler: CPU 4
New SMBASE=0x7afff000 IEDBASE=0x7b400000
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affe400, cpu = 7
In relocation handler: CPU 7
New SMBASE=0x7affe400 IEDBASE=0x7b400000
Writing SMRR. base = 0x7b000006, mask=0xff800c00
Relocation complete.
smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affe800, cpu = 6
In relocation handler: CPU 6
New SMBASE=0x7affe800 IEDBASE=0x7b400000
Relocation complete.
Initializing CPU #0
CPU: vendor Intel device 806c1
CPU: family 06, model 8c, stepping 01
Clearing out pending MCEs
[ here comes the reset ]
The defconfig is also attached. The coreboot version noted here is rather old because there has been a problem with the SPD data availability for the RVP, and this is the only time it passed FSP meminit. Nevertheless, the failure can still be reproduced with the current version on the custom TGL-UP3/LP4X board (for which I unfortunately cannot provide sources).
On the custom board with the same CPU/DRAM configuration, where the failure also occurs, I tried to skip the mca_configure() call in src/soc/intel/tigerlake/cpu.c, but the failure just moves to the LAPIC setup that follows it. Adding waiting loops before the mca_configure() call prevented the resets and has suggested that the cause might not be timing-dependent. Adding more debug output to the mca_configure() function in src/soc/intel/common/block/cpu/cpulib.c showed that the reset occurs exactly when wrmsr is called with the values {0xffffffff, 0xffffffff} on one of the MCE banks in order to clear it (the bank number tends to be 4, but not every time). GBLRST_CAUSE is always 00000000 00000000 after the reset.
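The kind of per-bank tracing described above can be sketched roughly as follows. This is not the coreboot code: it is a user-space mock-up, assuming the clear-STATUS-then-enable-CTL sequence from the SDM's machine-check initialization description. The MSR numbers (IA32_MCG_CAP = 0x179, IA32_MC0_CTL = 0x400, IA32_MC0_STATUS = 0x401, four MSRs per bank) are architectural; mock_rdmsr(), mock_wrmsr(), traced_mca_init(), MAX_BANKS and the bank count of 10 are hypothetical names and values invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural MSR numbers (Intel SDM). */
#define IA32_MCG_CAP        0x179u
#define IA32_MC0_CTL        0x400u
#define IA32_MC0_STATUS     0x401u
#define MCG_CAP_COUNT_MASK  0xffu   /* bits 7:0 = number of banks */
#define MAX_BANKS           32u     /* hypothetical bound for this sketch */

/* Hypothetical stand-ins for the real MSR space (user-space mock). */
static uint64_t mock_mcg_cap = 10;  /* assumed bank count, illustration only */
static uint64_t mock_bank_regs[MAX_BANKS * 4];

/* Each bank owns four consecutive MSRs: CTL, STATUS, ADDR, MISC. */
static uint32_t mc_ctl_msr(unsigned int bank)
{
    assert(bank < MAX_BANKS);
    return IA32_MC0_CTL + 4u * bank;
}

static uint32_t mc_status_msr(unsigned int bank)
{
    assert(bank < MAX_BANKS);
    return IA32_MC0_STATUS + 4u * bank;
}

static uint64_t mock_rdmsr(uint32_t msr)
{
    if (msr == IA32_MCG_CAP)
        return mock_mcg_cap;
    assert(msr >= IA32_MC0_CTL && msr < IA32_MC0_CTL + MAX_BANKS * 4u);
    return mock_bank_regs[msr - IA32_MC0_CTL];
}

static void mock_wrmsr(uint32_t msr, uint64_t value)
{
    assert(msr >= IA32_MC0_CTL && msr < IA32_MC0_CTL + MAX_BANKS * 4u);
    mock_bank_regs[msr - IA32_MC0_CTL] = value;
}

/*
 * Log every access *before* performing it, so that on real hardware the
 * last line on the console names the write that preceded the reset.
 * Returns the number of banks that were processed.
 */
static unsigned int traced_mca_init(void)
{
    unsigned int count =
        (unsigned int)(mock_rdmsr(IA32_MCG_CAP) & MCG_CAP_COUNT_MASK);

    if (count > MAX_BANKS)
        count = MAX_BANKS;

    for (unsigned int i = 0; i < count; i++) {
        printf("bank %u: STATUS (MSR 0x%x) = 0x%016llx, clearing\n", i,
               (unsigned int)mc_status_msr(i),
               (unsigned long long)mock_rdmsr(mc_status_msr(i)));
        mock_wrmsr(mc_status_msr(i), 0);

        printf("bank %u: writing all-ones to CTL (MSR 0x%x)\n", i,
               (unsigned int)mc_ctl_msr(i));
        mock_wrmsr(mc_ctl_msr(i), UINT64_MAX);
    }
    return count;
}
```

In firmware, the mock accessors would be replaced by the platform's real MSR accessors and printf by the firmware's console output. The point of the sketch is only the ordering: print first, write second.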
According to the public Intel SDM (#325462), volume 2D, page 6-14, section "Operation in a Uni-Processor Platform", there's an algorithm described in pseudo-Fortran code which corresponds to the actual implementation of mca_configure() in coreboot:
FOR I = 0 to IA32_MCG_CAP.COUNT-1 DO
    IF (IA32_MC[I]_STATUS = uncorrectable error)
        THEN #GP(0);
I don't know how to verify whether the cause of the reset is the general-protection exception (#GP) that wrmsr can raise. As mentioned, GBLRST_CAUSE is always 0 after the reset occurs on the custom board.
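One possible way to narrow this down, offered only as a sketch, is to read each IA32_MCi_STATUS before touching the bank and decode whether it already holds a valid uncorrected error. The bit positions below (VAL = 63, OVER = 62, UC = 61, EN = 60, PCC = 57, MCA error code in bits 15:0, model-specific code in bits 31:16) are the architectural layout from the SDM. The struct and function names are hypothetical, and the raw value in the self-check is made up for illustration, not taken from the board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_MCi_STATUS flag bits (Intel SDM). */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61)  /* error was not corrected */
#define MCI_STATUS_EN    (1ULL << 60)  /* reporting was enabled in MCi_CTL */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Hypothetical container for the decoded fields. */
struct mc_status {
    bool valid;
    bool overflow;
    bool uncorrected;
    bool enabled;
    bool context_corrupt;
    uint16_t mca_code;    /* bits 15:0  */
    uint16_t model_code;  /* bits 31:16 */
};

static struct mc_status decode_mc_status(uint64_t raw)
{
    struct mc_status s = {
        .valid           = (raw & MCI_STATUS_VAL) != 0,
        .overflow        = (raw & MCI_STATUS_OVER) != 0,
        .uncorrected     = (raw & MCI_STATUS_UC) != 0,
        .enabled         = (raw & MCI_STATUS_EN) != 0,
        .context_corrupt = (raw & MCI_STATUS_PCC) != 0,
        .mca_code        = (uint16_t)(raw & 0xffffu),
        .model_code      = (uint16_t)((raw >> 16) & 0xffffu),
    };
    return s;
}

/* True if the bank holds a logged error that the hardware did not correct. */
static bool has_pending_uncorrected(uint64_t raw)
{
    return (raw & MCI_STATUS_VAL) && (raw & MCI_STATUS_UC);
}

/* Usage example with a made-up raw value (VAL | UC | EN, codes 0x0012/0x0005). */
static void mc_status_selfcheck(void)
{
    struct mc_status s = decode_mc_status(0xB000000000120005ULL);

    assert(s.valid && s.uncorrected && s.enabled);
    assert(!s.overflow && !s.context_corrupt);
    assert(s.mca_code == 0x0005 && s.model_code == 0x0012);
    assert(has_pending_uncorrected(0xB000000000120005ULL));
}
```

Printing the decoded status of every bank right before the failing write would at least show whether a bank already reports an uncorrected error at that point. It would not by itself prove that a #GP is what triggers the reset.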
Thanks for any ideas. Have a nice weekend.
Regards, Jan
Jan Samek Siemens, s.r.o. ADV D EU CZ AE AC 7 jan.samek@siemens.com
coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-leave@coreboot.org