[coreboot] Re: TigerLake RVP TCSS init failure

20 Aug 2021


Hello Michał,
Dear coreboot community,
Is there any update on this issue since the last message?
I see exactly the same behavior on our custom TGL-UP3/LP4x board as well as on the Intel
TGL-UP3-LP4x RVP. I can provide more details on this issue from my attempts to
resolve it.
The only change to the public code is that I removed the hard dependencies on
chromeec (commented out EC calls from ec.c and forced board_id to
TGL_UP3_LP4_MICRON) in order to use the original Intel EC binary that comes
inside the reference UEFI image (I have no interest in EC development on my
board and had problems with building chromeec for the RVP).
Observations:
- No matter whether I set or unset CONFIG_USE_INTEL_FSP_MP_INIT or
  CONFIG_USE_INTEL_FSP_TO_CALL_COREBOOT_PUBLISH_MP_PPI, I still get
  the same behavior ("Clearing pending MCEs... [reset]"). 
- Watch out for DCI: when it is enabled inconsistently (e.g. in FSP-M but not in
  FSP-S), it can make some assertions fail in debug FSP or cause resets with
  release FSP even before the core issue is reached.
- The gdb stub is currently broken for platforms that set IDT_IN_EVERY_STAGE=y -
  see my previous thread "GDB stub & bootblock dependencies
  (CONFIG_IDT_IN_EVERY_STAGE=y)" for a possible solution (sorry, I can't
  upstream code from Siemens yet).
The output is similar to what Michał Żygowski already wrote before (see
the attachment for a full version):
....
    Clearing SMI status registers
    SMI_STS: PM1
    PM1_STS: TMROF
    TCO_STS: INTRD_DET
    GPE0 STD STS: BATLOW
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7b000000, cpu = 0
    In relocation handler: CPU 0
    New SMBASE=0x7b000000 IEDBASE=0x7b400000
    Writing SMRR. base = 0x7b000006, mask=0xff800c00
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff400, cpu = 3
    In relocation handler: CPU 3
    New SMBASE=0x7afff400 IEDBASE=0x7b400000
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff800, cpu = 2
    In relocation handler: CPU 2
    New SMBASE=0x7afff800 IEDBASE=0x7b400000
    Writing SMRR. base = 0x7b000006, mask=0xff800c00
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afffc00, cpu = 1
    In relocation handler: CPU 1
    New SMBASE=0x7afffc00 IEDBASE=0x7b400000
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affec00, cpu = 5
    In relocation handler: CPU 5
    New SMBASE=0x7affec00 IEDBASE=0x7b400000
    Writing SMRR. base = 0x7b000006, mask=0xff800c00
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7afff000, cpu = 4
    In relocation handler: CPU 4
    New SMBASE=0x7afff000 IEDBASE=0x7b400000
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affe400, cpu = 7
    In relocation handler: CPU 7
    New SMBASE=0x7affe400 IEDBASE=0x7b400000
    Writing SMRR. base = 0x7b000006, mask=0xff800c00
    Relocation complete.
    smm_do_relocation : curr_smbase 0x30000 perm_smbase 0x7affe800, cpu = 6
    In relocation handler: CPU 6
    New SMBASE=0x7affe800 IEDBASE=0x7b400000
    Relocation complete.
    Initializing CPU #0
    CPU: vendor Intel device 806c1
    CPU: family 06, model 8c, stepping 01
    Clearing out pending MCEs
    [ here comes the reset ]
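As a quick cross-check of the two "CPU:" lines in the log, here is a minimal sketch that decodes a CPUID.01H:EAX signature into family, model and stepping using the field layout from the Intel SDM. The helper name decode_cpu_sig() and the struct are hypothetical and are not taken from coreboot.

```c
#include <stdint.h>

/* Decoded fields of a CPUID.01H:EAX processor signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Hypothetical helper, not a coreboot function. */
static struct cpu_sig decode_cpu_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_model = (eax >> 16) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;

	s.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	s.family = (base_family == 0xf) ? base_family + ext_family : base_family;

	/* The extended model only applies to base families 0x6 and 0xf. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model = (ext_model << 4) | base_model;
	else
		s.model = base_model;

	return s;
}
```

For the signature 0x806c1 from the log, this yields family 0x06, model 0x8c and stepping 0x01, which matches what coreboot printed.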
The defconfig is also attached. The coreboot version noted here is rather old
because there has been a problem with the SPD data availability for
the RVP and this is the only time it passed FSP meminit. Nevertheless,
the issue can still be reproduced on the current version on the custom
TGL-UP3/LP4X board (for which I unfortunately cannot provide sources).
On the custom board with the same CPU/DRAM configuration where the failure
also occurs, I tried to skip the mca_configure() call in src/soc/intel/tigerlake/cpu.c
but the failure just moves to the LAPIC setup that follows it. Adding waiting loops before the
mca_configure() call did not prevent the resets, which suggests that the cause
is probably not timing-dependent. Adding more debug output to the mca_configure()
function in src/soc/intel/common/block/cpu/cpulib.c showed that the reset occurs
exactly at the wrmsr call that writes {0xffffffff, 0xffffffff} to one of the MCE
banks in order to clear it (the bank number tends to be 4, but not every time).
GBLRST_CAUSE is always 00000000 00000000 after the reset.
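The kind of per-bank tracing described above can be sketched as follows. This is a host-side model under stated assumptions, not the coreboot code: it assumes the all-ones write goes to IA32_MCi_CTL (MSR 0x400 + 4 * i) and that the bank count is the low byte of IA32_MCG_CAP (MSR 0x179). The msr_ops struct, the fake_* accessors and mca_enable_banks_traced() are hypothetical names that exist only so the loop can run outside firmware.

```c
#include <stdint.h>
#include <stdio.h>

#define IA32_MCG_CAP        0x179
#define IA32_MC0_CTL        0x400
#define MCG_CAP_COUNT_MASK  0xff
#define MC_BANK_STRIDE      4	/* MSRs per bank: CTL, STATUS, ADDR, MISC */

/* Hypothetical MSR accessors; real firmware would use rdmsr/wrmsr. */
struct msr_ops {
	uint64_t (*read)(uint32_t msr);
	void (*write)(uint32_t msr, uint64_t value);
};

/* Stand-in accessors: report 10 banks and remember the last MSR written. */
static uint32_t fake_last_msr;

static uint64_t fake_rdmsr(uint32_t msr)
{
	return msr == IA32_MCG_CAP ? 10 : 0;
}

static void fake_wrmsr(uint32_t msr, uint64_t value)
{
	(void)value;
	fake_last_msr = msr;
}

static const struct msr_ops fake_ops = { fake_rdmsr, fake_wrmsr };

/* Writes all ones to each bank's CTL MSR and returns the bank count. */
static unsigned int mca_enable_banks_traced(const struct msr_ops *ops)
{
	unsigned int count =
		(unsigned int)(ops->read(IA32_MCG_CAP) & MCG_CAP_COUNT_MASK);
	unsigned int i;

	for (i = 0; i < count; i++) {
		uint32_t msr = IA32_MC0_CTL + i * MC_BANK_STRIDE;

		/* Print before the write so the last line seen names the bank. */
		printf("MCA: bank %u: writing all ones to MSR 0x%x\n",
		       i, (unsigned int)msr);
		fflush(stdout);
		ops->write(msr, UINT64_MAX);
	}
	return count;
}
```

Logging before each write rather than after it is the point of the sketch: if the platform resets during the write, the last line on the console identifies the bank being touched.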
According to the public Intel SDM (#325462), volume 2D, page 6-14, section "Operation in
a Uni-Processor Platform", there is an algorithm described in Fortran-like pseudocode which
corresponds to the actual implementation of mca_configure() in coreboot:
FOR I = 0 to IA32_MCG_CAP.COUNT-1 DO
        IF (IA32_MC[I]_STATUS = uncorrectable error)
            THEN #GP(0);
I don't know how to verify whether the reset is caused by the #GP that wrmsr can
raise. As mentioned, GBLRST_CAUSE is always 0 after the reset occurs on the custom
board.
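The condition in the quoted pseudocode can be sketched as a check on IA32_MCi_STATUS values read before the banks are cleared. The sketch reads "uncorrectable error" as the VAL bit (63) and the UC bit (61) both being set; that interpretation is my assumption, and both function names are hypothetical. The function takes status values that were already read, so it only shows the test itself and not the MSR access.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)	/* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61)	/* error was not corrected */

/* True if the status value reports a valid, uncorrected error. */
static bool mc_status_is_uncorrectable(uint64_t status)
{
	return (status & MCI_STATUS_VAL) && (status & MCI_STATUS_UC);
}

/*
 * Returns the index of the first bank whose status reports an
 * uncorrectable error, or -1 if no bank does.
 */
static int first_uncorrectable_bank(const uint64_t *status, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (mc_status_is_uncorrectable(status[i]))
			return (int)i;
	}
	return -1;
}
```

Printing the result of such a check before the first wrmsr would show whether any bank, bank 4 in particular, already holds an uncorrectable error at the point where the reset occurs.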
Thanks for any ideas.
Have a nice weekend.
Regards,
Jan
Jan Samek
Siemens, s.r.o.
ADV D EU CZ AE AC 7
jan.samek@siemens.com
