Hi,

Is there anything else that can be tried instead of a simple delay?
Some devices may really need a delay during enumeration, but in the NVMe case the vendor/device ID pair isn't detected at all. I can't tell whether the PCIe link is still training (I don't have an analyzer) or the device is still booting.
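
One alternative to a fixed delay would be a bounded poll of the vendor ID, giving up after a timeout. The following is only a sketch under stated assumptions: read_vendor_id_fn, delay_ms_fn, wait_for_vendor_id() and the fake_dev helpers are hypothetical names made up for illustration, not coreboot APIs, and the timeout and step values are arbitrary. On real firmware the two callbacks would wrap the platform's config-space read and millisecond delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks, NOT coreboot APIs. */
typedef uint16_t (*read_vendor_id_fn)(void *ctx);
typedef void (*delay_ms_fn)(void *ctx, unsigned int ms);

#define VENDOR_ID_NONE 0xffffu /* all ones: nothing answered the read */
#define VENDOR_ID_CRS  0x0001u /* reserved ID seen while a device requests a retry */

/*
 * Poll the vendor ID every step_ms until it looks valid or timeout_ms
 * has elapsed. Returns true and stores the ID in *vendor_out on success.
 */
static bool wait_for_vendor_id(read_vendor_id_fn read_id, delay_ms_fn delay,
			       void *ctx, unsigned int timeout_ms,
			       unsigned int step_ms, uint16_t *vendor_out)
{
	assert(read_id && delay && vendor_out && step_ms > 0);

	for (unsigned int waited = 0;; waited += step_ms) {
		uint16_t id = read_id(ctx);

		if (id != VENDOR_ID_NONE && id != VENDOR_ID_CRS && id != 0x0000u) {
			*vendor_out = id;
			return true;
		}
		if (waited >= timeout_ms)
			return false; /* still absent after the full timeout */
		delay(ctx, step_ms);
	}
}

/* Simulated endpoint that starts answering after ready_after_ms. */
struct fake_dev {
	unsigned int ready_after_ms;
	unsigned int elapsed_ms;
	uint16_t vendor;
};

static uint16_t fake_read(void *ctx)
{
	struct fake_dev *d = ctx;
	return d->elapsed_ms >= d->ready_after_ms ? d->vendor : VENDOR_ID_NONE;
}

static void fake_delay(void *ctx, unsigned int ms)
{
	struct fake_dev *d = ctx;
	d->elapsed_ms += ms; /* advance simulated time instead of sleeping */
}
```

Compared with an unconditional delay, a poll like this only costs time on ports where the device is slow to answer, and an empty slot is bounded by the timeout.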

Kind regards,
Sumo

On Tue, Aug 17, 2021 at 8:34 AM Sumo <kingsumos@gmail.com> wrote:
Hi,

I have managed to disable the UART console and collect the logs with the cbmem tool. Logging therefore adds no extra delay, even with debug logs enabled, so we can compare the logs with and without the delay.

Here are my findings:
- without the delay in dev_enumerate() the NVMe isn't detected at all (i.e. the PCIe device doesn't show up on the bus):
PCI: 00:0b.0 scanning...
do_pci_scan_bridge for PCI: 00:0b.0
PCI: 00:0b.0: Enabled LTR
PCI: pci_scan_bus for bus 04
POST: 0x24
PCI: Static device PCI: 04:00.0 not found, disabling it.
POST: 0x25
PCI: Leftover static devices:
PCI: 04:00.0
PCI: Check your devicetree.cb.
POST: 0x55
scan_bus: bus PCI: 00:0b.0 finished in 0 msecs

- with the delay added, the device is detected and initialized:
PCI: 00:0b.0 scanning...
do_pci_scan_bridge for PCI: 00:0b.0
PCI: 00:0b.0: Enabled LTR
PCI: pci_scan_bus for bus 04
POST: 0x24
PCI: 04:00.0 [1987/5012] enabled
POST: 0x25
POST: 0x55
Enabling Common Clock Configuration
PCIE CLK PM is not supported by endpoint
L1 Sub-State supported from root port 11
L1 Sub-State Support = 0xf
CommonModeRestoreTime = 0x28
Power On Value = 0x6, Power On Scale = 0x1
ASPM: Enabled L1
PCIe: Max_Payload_Size adjusted to 256
PCI: 04:00.0: Enabled LTR
PCI: 04:00.0: Programmed LTR max latencies
scan_bus: bus PCI: 00:0b.0 finished in 0 msecs

Another device also fails if I remove the delay. It's an I211 gigabit ethernet controller:
PCI: 00:0f.0 scanning...
do_pci_scan_bridge for PCI: 00:0f.0
PCI: 00:0f.0: Enabled LTR
PCI: pci_scan_bus for bus 05
POST: 0x24
PCI: Static device PCI: 05:00.0 not found, disabling it.
POST: 0x25
PCI: Leftover static devices:
PCI: 05:00.0
PCI: Check your devicetree.cb.
POST: 0x55
scan_bus: bus PCI: 00:0f.0 finished in 0 msecs

With the delay the I211 is detected:
PCI: 00:0f.0 scanning...
do_pci_scan_bridge for PCI: 00:0f.0
PCI: 00:0f.0: Enabled LTR
PCI: pci_scan_bus for bus 05
POST: 0x24
PCI: 05:00.0 [8086/1539] enabled
POST: 0x25
POST: 0x55
Enabling Common Clock Configuration
PCIE CLK PM is not supported by endpoint
ASPM: Enabled L1
PCIe: Max_Payload_Size adjusted to 256
PCI: 05:00.0: No LTR support
scan_bus: bus PCI: 00:0f.0 finished in 0 msecs

Full logs are attached.

Kind regards,
Sumo

On Mon, Aug 16, 2021 at 8:01 PM Sumo <kingsumos@gmail.com> wrote:
Hi Paul,

With logging (almost) disabled the error itself isn't printed, so a log taken with the delay and logging disabled would look almost identical to one taken without the delay.

Here are the logs, including one with coreboot debug enabled and no delay. For all logs the FSP log level is set to NoDebug:
- nvme-err.log : no delay; coreboot debug_level=Error; NVMe error: the error reported by the UEFI FW is shown at the end of the log:
  ERROR: C40000002:V02010007 I0 93B80004-9FB3-11D4-9A3A-0090273FC14D 7E90A998;
- nvme-ok-delay.log : 20ms delay; coreboot debug_level=Error; NVMe ok;
- nvme-ok.log : no delay; coreboot debug_level=Spew; NVMe ok: the coreboot log output alone is enough to make the NVMe work properly;

The NVMe is behind root port 00:0b.0 and shows up as 04:00.0.
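
To tell a link that is still training apart from an endpoint that is still booting, the root port's PCIe Link Status register could be read just before bus 04 is scanned. The sketch below only decodes a 16-bit value that has already been read. The bit positions follow the PCIe Base Specification, but the macro names, the enum and classify_link() are local to this example, and the Data Link Layer Link Active bit is only meaningful on ports that report that capability.

```c
#include <stdint.h>

/* Link Status register bits (names are local to this sketch). */
#define LNKSTA_LINK_TRAINING (1u << 11) /* link training in progress */
#define LNKSTA_DLL_ACTIVE    (1u << 13) /* data link layer link active */

enum link_state { LINK_DOWN, LINK_TRAINING, LINK_UP };

/* Classify a raw Link Status value read from a root port. */
static enum link_state classify_link(uint16_t lnksta)
{
	if (lnksta & LNKSTA_LINK_TRAINING)
		return LINK_TRAINING;
	if (lnksta & LNKSTA_DLL_ACTIVE)
		return LINK_UP;
	return LINK_DOWN;
}
```

If the port reported LINK_UP while the vendor ID still read as all ones, that would point at the device rather than at link training.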

Thanks,
Sumo

On Mon, Aug 16, 2021 at 2:57 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
Dear Sumo,


On 16.08.21 at 18:38, Sumo wrote:

> The NVMe is not detected when serial console logs are disabled, I mean by
> setting both Coreboot log_level=Error (or less) and FSP
> PcdFspDebugPrintErrorLevel=NoDebug. Looks like the enumeration fails then
> further on the device is not listed in the UEFI FW (same issue shown in
> either CorebootPayloadPkg or UefiPayloadPkg). When Linux boots the device
> appears normally.
>
> The problem is fixed by adding a small delay inside dev_enumerate() - a
> 20ms delay at the very beginning of the function is enough. I'm wondering
> if there is a better solution for this, the device is already defined in
> the devicetree.cb (set as on). Maybe coreboot is too fast and the NVMe is
> still booting up - or the PCIe link is still training, not sure. Coreboot
> doesn't retry if the device is not detected right away?

Please share the logs without and with the delay.


Kind regards,

Paul