As you pointed out, the PCIe link might not have completed training yet. Polling the Link Training bit in the Link Status register until it clears, with a timeout, might be a better solution than a fixed delay.
Try the diff below:

diff --git a/src/device/pci_device.c b/src/device/pci_device.c
index 4b5e73b806..c96e9f1e9d 100644
--- a/src/device/pci_device.c
+++ b/src/device/pci_device.c
@@ -1213,6 +1213,7 @@ static void pci_scan_hidden_device(struct device *dev)
  * @param min_devfn Minimum devfn to look at in the scan, usually 0x00.
  * @param max_devfn Maximum devfn to look at in the scan, usually 0xff.
  */
+#define PCIE_TRAIN_RETRY 10000
 void pci_scan_bus(struct bus *bus, unsigned int min_devfn,
 		  unsigned int max_devfn)
 {
@@ -1254,6 +1255,14 @@ void pci_scan_bus(struct bus *bus, unsigned int min_devfn,
 			continue;
 		}
+		/* Wait for training to complete */
+		u16 lnk, try, cap = pci_find_capability(dev, PCI_CAP_ID_PCIE);
+		for (try = PCIE_TRAIN_RETRY; try > 0; try--) {
+			lnk = pci_read_config16(dev, cap + PCI_EXP_LNKSTA);
+			if (!(lnk & PCI_EXP_LNKSTA_LT))
+				break;
+			udelay(100);
+		}
 		/* See if a device is present and setup the device structure. */
 		dev = pci_probe_dev(dev, bus, devfn);
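The polling idea in the diff can also be sketched outside the coreboot tree, so the timeout logic can be checked on its own. This is only a minimal sketch under assumptions: the cfg_read16_fn and delay_us_fn hooks, wait_link_trained() and the fake_port simulation are hypothetical names for illustration, not coreboot APIs. The register offset (0x12) and the Link Training bit (0x0800) follow the PCI Express capability layout. As far as I understand the spec, the Link Training bit is reported by the downstream port (the root port or bridge above the bus), so in a real patch it is probably the bridge that should be polled rather than the endpoint that has not been probed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the PCI Express capability layout. */
#define LNKSTA_OFFSET	0x12	/* Link Status register, relative to the capability */
#define LNKSTA_LT	0x0800	/* bit 11: link training in progress */

/* Hypothetical hook standing in for a 16-bit config-space read. */
typedef uint16_t (*cfg_read16_fn)(void *ctx, unsigned int reg);
/* Hypothetical hook standing in for udelay(). */
typedef void (*delay_us_fn)(unsigned int usecs);

/*
 * Poll the Link Training bit until it clears or the retry budget runs out.
 * Returns true if the link is not (or no longer) training, false on timeout.
 */
static bool wait_link_trained(cfg_read16_fn read16, void *ctx,
			      unsigned int pcie_cap, unsigned int retries,
			      delay_us_fn delay, unsigned int step_us)
{
	assert(read16 != NULL);

	/* No PCIe capability found: nothing to wait for. */
	if (pcie_cap == 0)
		return true;

	while (retries-- > 0) {
		uint16_t lnksta = read16(ctx, pcie_cap + LNKSTA_OFFSET);

		if (!(lnksta & LNKSTA_LT))
			return true;
		if (delay != NULL)
			delay(step_us);
	}
	return false;
}

/* Simulated port: reports "training" for a fixed number of reads. */
struct fake_port {
	unsigned int reads_until_trained;
	unsigned int reads;
};

static uint16_t fake_read16(void *ctx, unsigned int reg)
{
	struct fake_port *p = ctx;

	(void)reg;
	p->reads++;
	return (p->reads <= p->reads_until_trained) ? LNKSTA_LT : 0;
}
```

Unlike the loop in the diff, the sketch skips the poll when no PCIe capability is found and reports a timeout to the caller, so the scan code could log it instead of silently continuing.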
Regards, Naresh
On Tue, Sep 21, 2021 at 4:48 PM Sumo kingsumos@gmail.com wrote:
Hi,
Is there anything else that can be tried instead of simple delays? Perhaps a delay really is required for some devices during enumeration; in the NVMe case the vendor/device ID pair isn't detected at all. I'm not sure if the PCIe link is still training (I don't have an analyzer) or if the device is still booting.
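For the "device is still booting" case, one alternative to a fixed delay is to retry the vendor ID read until something answers. The following is only a sketch under assumptions: vid_read16_fn, vid_delay_fn, wait_vendor_id() and the fake_endpoint simulation are hypothetical names for illustration, not coreboot APIs. It relies on two facts from the PCI/PCIe specifications: a config read that nobody answers returns all ones (0xffff), and a vendor ID of 0x0001 can be returned while a device asks for a retry when CRS Software Visibility is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_REG	0x00
#define VENDOR_ABSENT	0xffff	/* all ones: nobody answered the read */
#define VENDOR_CRS	0x0001	/* retry request with CRS Software Visibility */

/* Hypothetical hook standing in for a 16-bit config-space read. */
typedef uint16_t (*vid_read16_fn)(void *ctx, unsigned int reg);
/* Hypothetical hook standing in for udelay(). */
typedef void (*vid_delay_fn)(unsigned int usecs);

/*
 * Re-read the vendor ID until a device answers or the retry budget runs out.
 * On success the vendor ID is stored in *vendor_out and true is returned.
 */
static bool wait_vendor_id(vid_read16_fn read16, void *ctx,
			   unsigned int retries, vid_delay_fn delay,
			   unsigned int step_us, uint16_t *vendor_out)
{
	assert(read16 != NULL && vendor_out != NULL);

	while (retries-- > 0) {
		uint16_t vendor = read16(ctx, VENDOR_ID_REG);

		if (vendor != VENDOR_ABSENT && vendor != VENDOR_CRS) {
			*vendor_out = vendor;
			return true;
		}
		if (delay != NULL)
			delay(step_us);
	}
	return false;
}

/* Simulated endpoint: does not answer for the first few reads. */
struct fake_endpoint {
	unsigned int silent_reads;
	unsigned int reads;
	uint16_t vendor;
};

static uint16_t fake_vendor_read16(void *ctx, unsigned int reg)
{
	struct fake_endpoint *ep = ctx;

	(void)reg;
	ep->reads++;
	return (ep->reads <= ep->silent_reads) ? VENDOR_ABSENT : ep->vendor;
}
```

The retry budget would need to be bounded so that empty slots do not slow down every boot.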
Kind regards, Sumo
On Tue, Aug 17, 2021 at 8:34 AM Sumo kingsumos@gmail.com wrote:
Hi,
I have managed to disable the UART console and collect the logs via the cbmem tool. Debug logging therefore no longer adds any delay of its own, so we can compare the logs with and without the delay.
Here are my findings:
- without the delay in dev_enumerate() the NVMe isn't detected at all (i.e. the PCIe device isn't shown on the bus):

PCI: 00:0b.0 scanning...
do_pci_scan_bridge for PCI: 00:0b.0
PCI: 00:0b.0: Enabled LTR
PCI: pci_scan_bus for bus 04
POST: 0x24
*PCI: Static device PCI: 04:00.0 not found, disabling it.*
POST: 0x25
PCI: Leftover static devices:
PCI: 04:00.0
PCI: Check your devicetree.cb.
POST: 0x55
scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
- by adding the delay the device is detected and initialized:
PCI: 00:0b.0 scanning...
do_pci_scan_bridge for PCI: 00:0b.0
PCI: 00:0b.0: Enabled LTR
PCI: pci_scan_bus for bus 04
POST: 0x24
*PCI: 04:00.0 [1987/5012] enabled*
POST: 0x25
POST: 0x55
Enabling Common Clock Configuration
PCIE CLK PM is not supported by endpoint
L1 Sub-State supported from root port 11
L1 Sub-State Support = 0xf
CommonModeRestoreTime = 0x28
Power On Value = 0x6, Power On Scale = 0x1
ASPM: Enabled L1
PCIe: Max_Payload_Size adjusted to 256
PCI: 04:00.0: Enabled LTR
PCI: 04:00.0: Programmed LTR max latencies
scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
Also, another device fails if I remove the delay. It's an I211 gigabit ethernet controller:

PCI: 00:0f.0 scanning...
do_pci_scan_bridge for PCI: 00:0f.0
PCI: 00:0f.0: Enabled LTR
PCI: pci_scan_bus for bus 05
POST: 0x24
*PCI: Static device PCI: 05:00.0 not found, disabling it.*
POST: 0x25
PCI: Leftover static devices:
PCI: 05:00.0
PCI: Check your devicetree.cb.
POST: 0x55
scan_bus: bus PCI: 00:0f.0 finished in 0 msecs

With the delay the I211 is detected:

PCI: 00:0f.0 scanning...
do_pci_scan_bridge for PCI: 00:0f.0
PCI: 00:0f.0: Enabled LTR
PCI: pci_scan_bus for bus 05
POST: 0x24
*PCI: 05:00.0 [8086/1539] enabled*
POST: 0x25
POST: 0x55
Enabling Common Clock Configuration
PCIE CLK PM is not supported by endpoint
ASPM: Enabled L1
PCIe: Max_Payload_Size adjusted to 256
PCI: 05:00.0: No LTR support
scan_bus: bus PCI: 00:0f.0 finished in 0 msecs
Full logs are attached.
Kind regards, Sumo
On Mon, Aug 16, 2021 at 8:01 PM Sumo kingsumos@gmail.com wrote:
Hi Paul,
When logs are (almost) disabled the error isn't shown, so if I add the delay with logs disabled, the log output will show almost no difference at all.
Following are the logs, including a log with Coreboot debug enabled + no delay. For all logs FSP loglevel is set to NoDebug:
- nvme-err.log : no delay; coreboot debug_level=Error; NVMe error: the error from the UEFI FW is shown at the end of the log: ERROR: C40000002:V02010007 I0 93B80004-9FB3-11D4-9A3A-0090273FC14D 7E90A998;
- nvme-ok-delay.log : 20ms delay; coreboot debug_level=Error; NVMe ok;
- nvme-ok.log : no delay; coreboot debug_level=Spew; NVMe ok: the time spent on the coreboot log output is enough to make the NVMe work properly;

The NVMe is behind root port 00:0b.0 and shows up as 04:00.0.
Thanks, Sumo
On Mon, Aug 16, 2021 at 2:57 PM Paul Menzel pmenzel@molgen.mpg.de wrote:
Dear Sumo,
On 16.08.21 at 18:38, Sumo wrote:
The NVMe is not detected when serial console logs are disabled, I mean by setting both Coreboot log_level=Error (or less) and FSP PcdFspDebugPrintErrorLevel=NoDebug. It looks like the enumeration fails, and further on the device is not listed in the UEFI FW (same issue shown in either CorebootPayloadPkg or UefiPayloadPkg). When Linux boots, the device appears normally.

The problem is fixed by adding a small delay inside dev_enumerate(): a 20ms delay at the very beginning of the function is enough. I'm wondering if there is a better solution for this; the device is already defined in the devicetree.cb (set as on). Maybe coreboot is too fast and the NVMe is still booting up, or the PCIe link is still training, not sure. Doesn't coreboot retry if the device is not detected right away?
Please share the logs without and with the delay.
Kind regards,
Paul
coreboot mailing list -- coreboot@coreboot.org
To unsubscribe send an email to coreboot-leave@coreboot.org