Hi Matt,
thanks for bringing the topic up. Please also contact your Intel reps about this via commercial support channel as well. I believe Patrick G. once stated that he could act as a relay when it comes to disputes between commercial and community development 'strategies'.
On Wed, Sep 25, 2019 at 8:48 PM Matt DeVillier matt.devillier@gmail.com wrote:
Commit 903b40a [soc/intel: Replace uses of dev_find_slot()] replaced calls to dev_find_slot() with calls to pcidev_path_on_root() for all Intel common SoCs, but it seems unclear exactly what problem this was intended to fix, and has created problems with locating hidden PCI devices.
I believe [1] commit explains dev_find_slot() with enough details.
Commit f2ac0137 [soc/intel: Fix regression with hidden PCI devices] fixed this partially by creating a debug version of pcidev_path_on_root() which falls back on dev_find_slot(), but did not apply it to all devices which need it. On SKL/KBL alone, there are at least 5 different function calls accessing a dozen PCI devices which require falling back on dev_find_slot().
Kyosti has expressed a desire to eliminate the use of dev_find_slot() since it can potentially return false positives prior to device enumeration in ramstage, but as currently implemented the cure seems worse than the disease.
Short term, it seems like having pcidev_path_on_root() fall back on dev_find_slot() with a loud warning (like pcidev_path_on_root_debug() does now) makes the most sense, vs having two nearly identical function calls used inconsistently. Long term, we need a better strategy for dealing with PCI devices which get hidden by FSP / are in violation of PCI spec.
But since discussion on Gerrit seems to have died, reviving it here for a larger audience.
I will quote below what I already wrote in gerrit [2], [3]:
The trouble is the entire PCI subsystem in ramstage is based on matching the vendor/device ID register with a PCI driver and to the source we want to control that device with. To allow this hiding of PCI devices will ultimately force us to write the control somewhere in the scope of SOC instead. Oh, but wait, perhaps the intention is for us to __not__ write that control anymore but let the FSP blob do all that too!
AFAICS, hardware that does not respond to vendor/device ID register reads does not meet PCI compliance. I am willing to hit +2 on removing support for platforms that do not meet PCI compliance, specially when in the cases here, it is a matter of broken FSP blobs and not broken silicon per-se.
Also, I should not be accepting new callers for dev_find_slot() due the ill semantics it has. Prior to device enumeration, it can return false positives because all devices appear on bus 0. So please look for alternative solution if you want to support Intel's initiative of more blob less FOSS.
I suggest you post on the mailing list. That active PCI devices no longer respond to Vendor ID / Device ID queries does not meet PCI compliance, as I understand the specs. Unfortunately, someone with enough authority inside the org will likely decide it's just fine that ramstage will no longer be designed using PCI drivers and allow use of shim calling massive FSP blob.
As for solutions:
Preferably fix FSP and not allow this PCI hide-and-seek.
or
Rewrite PCI device enumeration such that devices that do not respond to ID queries are not permanently removed from topology links. You need platform hooks to decide whether to remove a particular device (based on B:D.F) or not. This would avoid ill dev_find_slot() then.
or
Revert to __SIMPLE_DEVICE__ for PCI access. But __SIMPLE_DEVICE__ is also on my kill list due to uglyness and high maintenance whenever we attempt to move sources to earlier stages. Unfortunately things found in soc/intel has slowed down the process to standstill and work is not funded.
[1] https://review.coreboot.org/c/coreboot/+/26447 [2] https://review.coreboot.org/c/coreboot/+/35087 [3] https://review.coreboot.org/c/coreboot/+/35088
Regards, Kyösti Mälkki