On Thu, Sep 26, 2019 at 8:51 AM Kyösti Mälkki <kyosti.malkki(a)gmail.com>
thanks for bringing the topic up. Please also contact your Intel reps
about this via commercial support channel as well. I believe Patrick
G. once stated that he could act as a relay when it comes to disputes
between commercial and community development 'strategies'.
On Wed, Sep 25, 2019 at 8:48 PM Matt DeVillier <matt.devillier(a)gmail.com>
Commit 903b40a [soc/intel: Replace uses of dev_find_slot()] replaced
dev_find_slot() with calls to pcidev_path_on_root() for all Intel
common SoCs, but it seems unclear exactly what problem this was intended to
fix, and has created problems with locating hidden PCI devices.
I believe  commit explains dev_find_slot() with enough details.
Commit f2ac0137 [soc/intel: Fix regression with
hidden PCI devices]
fixed this partially by creating a debug version of
which falls back on dev_find_slot(), but did not apply it to all devices
which need it. On SKL/KBL alone, there are at least 5 different function
calls accessing a dozen PCI devices which require falling back on
Kyosti has expressed a desire to eliminate the use of dev_find_slot()
since it can
potentially return false positives prior to device enumeration
in ramstage, but as currently implemented the cure seems worse than the
Short term, it seems like having pcidev_path_on_root() fall back on
dev_find_slot() with a loud warning (like pcidev_path_on_root_debug() does
now) makes the most sense, vs having two nearly identical function calls
used inconsistently. Long term, we need a better strategy for dealing with
PCI devices which get hidden by FSP / are in violation of PCI spec.
But since discussion on Gerrit seems to have died, reviving it here for
I just noticed these commits and have commented there. :)
I will quote below what I already wrote in gerrit , :
The trouble is the entire PCI subsystem in ramstage is based on
matching the vendor/device ID register with a PCI driver and to the
source we want to control that device with. To allow this hiding of
PCI devices will ultimately force us to write the control somewhere in
the scope of SOC instead. Oh, but wait, perhaps the intention is for
us to __not__ write that control anymore but let the FSP blob do all
I think we can provide a semantic of "possibly hidden" devices and provide
operations which do the correct thing provided the platform constraints. I
do want to understand the complete problem, though.
My current understanding is that we're forcing a device to be unhidden
after a sequence we know that hides a particular device. And we're
attempting to put it back to hidden permanently because of policy enforced
by a silicon vendor. However, if an access comes after the hide sequence we
trip over the NULL check if the device was removed from the topology. What
I couldn't figure out was what caused the need for f2ac0137? p2sb was
working still? The commits don't have more information as to the
circumstances of the failure.
AFAICS, hardware that does not respond to vendor/device ID register
reads does not meet PCI compliance. I am willing to hit +2 on removing
support for platforms that do not meet PCI compliance, specially when
in the cases here, it is a matter of broken FSP blobs and not broken
I'm not in agreement on the policies/thinking from all the hardware
vendors. Many issues have arisen in not thinking through pci dev/fun
allocation. e.g. function 0 can be hidden but functions >1 should be
around. Definitely a violation of PCI spec and many things have been worked
That said, computer architecture/design is messy, and I think it's naive to
think everything is spec compliant. There are complex fabrics and fabric
policies in all these devices that are utilized to provide a "pci". And as
firmware people we need to deal with the realities of the hardware. That's
pretty much the point of firmware: prepare the abstractions for the OS to
make those assumptions of aligning with specs. Even then, some things just
can't be fixed (check out all the quirks and layering violations in the
Also, I should not be accepting new callers for dev_find_slot() due
the ill semantics it has. Prior to device enumeration, it can return
false positives because all devices appear on bus 0. So please look
for alternative solution if you want to support Intel's initiative of
more blob less FOSS.
dev_find_slot() failures have nothing to do with blobs or not. Certainly,
some of the circumstances have tripped up some of the assumptions made in
that code. And I agree we should move away from using that API and/or
I suggest you post on the mailing list. That active PCI devices no
longer respond to Vendor ID / Device ID queries does not meet PCI
compliance, as I understand the specs. Unfortunately, someone with
enough authority inside the org will likely decide it's just fine that
ramstage will no longer be designed using PCI drivers and allow use of
shim calling massive FSP blob.
As for solutions:
Preferably fix FSP and not allow this PCI hide-and-seek.
Rewrite PCI device enumeration such that devices that do not respond
to ID queries are not permanently removed from topology links. You
I thought that the devices being removed was part of the original problem?
need platform hooks to decide whether to remove a
(based on B:D.F) or not. This would avoid ill dev_find_slot() then.
I have some ideas on some solutions, but I want to understand the failure
that occurred after 903b40a.
Revert to __SIMPLE_DEVICE__ for PCI access. But __SIMPLE_DEVICE__ is
also on my kill list due to uglyness and high maintenance whenever we
attempt to move sources to earlier stages. Unfortunately things found
in soc/intel has slowed down the process to standstill and work is not
coreboot mailing list -- coreboot(a)coreboot.org
To unsubscribe send an email to coreboot-leave(a)coreboot.org