Hi,
Anything else can be tried instead of using simple delays? Perhaps during the enumeration for some devices a delay is really required, for the NVMe case the vendor/device ID pair isn't detected at all... I'm not sure if the PCIe link is still training (I don't have an analyzer) or if the device is still booting...
Kind regards, Sumo
On Tue, Aug 17, 2021 at 8:34 AM Sumo kingsumos@gmail.com wrote:
Hi,
I have managed to disable the UART console and collect the logs via cbmem tool. Therefore it will not add any additional delay even with debug logs enabled so we can compare the logs with and without the delay.
Here are my findings:
- without the delay in dev_enumerate() the NVMe because isn't detected at
all (i.e. the PCIe device isn't shown in the bus): PCI: 00:0b.0 scanning... do_pci_scan_bridge for PCI: 00:0b.0 PCI: 00:0b.0: Enabled LTR PCI: pci_scan_bus for bus 04 POST: 0x24
*PCI: Static device PCI: 04:00.0 not found, disabling it.*POST: 0x25 PCI: Leftover static devices: PCI: 04:00.0 PCI: Check your devicetree.cb. POST: 0x55 scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
- by adding the delay the device is detected and initialized:
PCI: 00:0b.0 scanning... do_pci_scan_bridge for PCI: 00:0b.0 PCI: 00:0b.0: Enabled LTR PCI: pci_scan_bus for bus 04 POST: 0x24
*PCI: 04:00.0 [1987/5012] enabled*POST: 0x25 POST: 0x55 Enabling Common Clock Configuration PCIE CLK PM is not supported by endpoint L1 Sub-State supported from root port 11 L1 Sub-State Support = 0xf CommonModeRestoreTime = 0x28 Power On Value = 0x6, Power On Scale = 0x1 ASPM: Enabled L1 PCIe: Max_Payload_Size adjusted to 256 PCI: 04:00.0: Enabled LTR PCI: 04:00.0: Programmed LTR max latencies scan_bus: bus PCI: 00:0b.0 finished in 0 msecs
Also, another device is failing if I remove the delay - it's a I211 gigabit ethernet controller: PCI: 00:0f.0 scanning... do_pci_scan_bridge for PCI: 00:0f.0 PCI: 00:0f.0: Enabled LTR PCI: pci_scan_bus for bus 05 POST: 0x24
*PCI: Static device PCI: 05:00.0 not found, disabling it.*POST: 0x25 PCI: Leftover static devices: PCI: 05:00.0 PCI: Check your devicetree.cb. POST: 0x55 scan_bus: bus PCI: 00:0f.0 finished in 0 msecs
With the delay the I211 is detected: PCI: 00:0f.0 scanning... do_pci_scan_bridge for PCI: 00:0f.0 PCI: 00:0f.0: Enabled LTR PCI: pci_scan_bus for bus 05 POST: 0x24
*PCI: 05:00.0 [8086/1539] enabled*POST: 0x25 POST: 0x55 Enabling Common Clock Configuration PCIE CLK PM is not supported by endpoint ASPM: Enabled L1 PCIe: Max_Payload_Size adjusted to 256 PCI: 05:00.0: No LTR support scan_bus: bus PCI: 00:0f.0 finished in 0 msecs
Full logs are attached.
Kind regards, Sumo
On Mon, Aug 16, 2021 at 8:01 PM Sumo kingsumos@gmail.com wrote:
Hi Paul,
When logs are (almost) disabled the error isn't shown, so if I add the delay with logs disabled the log output will have almost no difference at all.
Following are the logs, including a log with Coreboot debug enabled + no delay. For all logs FSP loglevel is set to NoDebug:
- nvme-err.log : no delay; coreboot debug_level=Error; NVMe error: at the
end of the log is shown the error in the UEFI FW: ERROR: C40000002:V02010007 I0 93B80004-9FB3-11D4-9A3A-0090273FC14D 7E90A998;
- nvme-ok-delay.log : 20ms delay; coreboot debug_level=Error; NVMe ok;
- nvme-ok.log : no delay; coreboot debug_level=Spew; NVMe ok: the
coreboot log output is enough to make NVMe work properly;
The NVMe is in the root port 00:0b.0, it is shown as 04:00.0
Thanks, Sumo
On Mon, Aug 16, 2021 at 2:57 PM Paul Menzel pmenzel@molgen.mpg.de wrote:
Dear Sumo,
Am 16.08.21 um 18:38 schrieb Sumo:
The NVMe is not detected when serial console logs are disabled, I mean
by
setting both Coreboot log_level=Error (or less) and FSP PcdFspDebugPrintErrorLevel=NoDebug. Looks like the enumeration fails
then
further on the device is not listed in the UEFI FW (same issue shown in either CorebootPayloadPkg or UefiPayloadPkg). When Linux boots the
device
appears normally.
The problem is fixed by adding a small delay inside dev_enumerate() - a 20ms delay at the very beginning of the function is enough. I'm
wondering
if there is a better solution for this, the device is already defined
in
the devicetree.cb (set as on). Maybe coreboot is too fast and the NVMe
is
still booting up - or the PCIe link is still training, not sure.
Coreboot
doesn't retry if the device is not detected right away?
Please share the logs without and with the delay.
Kind regards,
Paul