Now PCI bridges (and PCIe root ports too) get a bus number range at system init, based on the currently plugged devices. That's why when one wants to hotplug another bridge, it needs a bus number for its secondary bus, which the parent is unable to provide. The suggested workaround is to have a vendor-specific capability in the RedHat generic pcie-root-port that contains the number of additional buses to reserve during BIOS PCI init.
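For orientation, the capability this series assumes could be sketched in C as follows; only the bus-reserve byte at offset 16 comes from the patches below, while the struct and field names are illustrative assumptions, not a QEMU definition:

/* Sketch of the vendor-specific capability assumed by this series.
 * Only the offset of bus_res (16) comes from the patches; the names
 * are made up for illustration. */
struct redhat_root_port_cap {
    u8 cap_id;        /* PCI_CAP_ID_VNDR (0x09), vendor-specific */
    u8 cap_next;      /* offset of the next capability, 0 if last */
    u8 cap_len;       /* capability length in bytes */
    u8 reserved[13];  /* bytes 3..15 */
    u8 bus_res;       /* byte 16: number of extra buses to reserve */
};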
Aleksandr Bezzubikov (2):
  pci: add support for direct usage of bdf for capability lookup
  pci: enable RedHat pci bridges to reserve more buses
 src/fw/pciinit.c   | 12 ++++++++++--
 src/hw/pcidevice.c | 24 ++++++++++++++++++++++++
 src/hw/pcidevice.h |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)
Add a capability lookup function that takes a bdf instead of a pci_device as its first argument. It may be useful when we have a bdf but not the whole pci_device structure.
Signed-off-by: Aleksandr Bezzubikov zuban32s@gmail.com
---
 src/hw/pcidevice.c | 24 ++++++++++++++++++++++++
 src/hw/pcidevice.h |  1 +
 2 files changed, 25 insertions(+)
diff --git a/src/hw/pcidevice.c b/src/hw/pcidevice.c
index cfebf66..3fa240e 100644
--- a/src/hw/pcidevice.c
+++ b/src/hw/pcidevice.c
@@ -158,6 +158,30 @@ u8 pci_find_capability(struct pci_device *pci, u8 cap_id, u8 cap)
     return 0;
 }
 
+u8 pci_find_capability_bdf(int bdf, u8 cap_id, u8 cap)
+{
+    int i;
+    u16 status = pci_config_readw(bdf, PCI_STATUS);
+
+    if (!(status & PCI_STATUS_CAP_LIST))
+        return 0;
+
+    if (cap == 0) {
+        /* find first */
+        cap = pci_config_readb(bdf, PCI_CAPABILITY_LIST);
+    } else {
+        /* find next */
+        cap = pci_config_readb(bdf, cap + PCI_CAP_LIST_NEXT);
+    }
+    for (i = 0; cap && i <= 0xff; i++) {
+        if (pci_config_readb(bdf, cap + PCI_CAP_LIST_ID) == cap_id)
+            return cap;
+        cap = pci_config_readb(bdf, cap + PCI_CAP_LIST_NEXT);
+    }
+
+    return 0;
+}
+
 // Enable PCI bus-mastering (ie, DMA) support on a pci device
 void pci_enable_busmaster(struct pci_device *pci)

diff --git a/src/hw/pcidevice.h b/src/hw/pcidevice.h
index 354b549..e4ed5cf 100644
--- a/src/hw/pcidevice.h
+++ b/src/hw/pcidevice.h
@@ -70,6 +70,7 @@ int pci_init_device(const struct pci_device_id *ids
 struct pci_device *pci_find_init_device(const struct pci_device_id *ids
                                         , void *arg);
 u8 pci_find_capability(struct pci_device *pci, u8 cap_id, u8 cap);
+u8 pci_find_capability_bdf(int bdf, u8 cap_id, u8 cap);
 void pci_enable_busmaster(struct pci_device *pci);
 u16 pci_enable_iobar(struct pci_device *pci, u32 addr);
 void *pci_enable_membar(struct pci_device *pci, u32 addr);
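As a usage illustration (not part of the patch), a caller that only has a bdf could walk all vendor-specific capabilities with the new helper like this:

/* Sketch: list every vendor-specific capability of a device known
 * only by its bdf. Passing cap == 0 finds the first instance;
 * passing a previous offset finds the next one. */
static void dump_vendor_caps(int bdf)
{
    u8 cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, 0);
    while (cap) {
        dprintf(1, "PCI: bdf=0x%x vendor cap at offset 0x%x\n", bdf, cap);
        cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, cap);
    }
}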
On 19/07/2017 16:20, Aleksandr Bezzubikov wrote:
Add a capability lookup function that takes a bdf instead of a pci_device as its first argument. It may be useful when we have a bdf but not the whole pci_device structure.
Signed-off-by: Aleksandr Bezzubikov zuban32s@gmail.com
 src/hw/pcidevice.c | 24 ++++++++++++++++++++++++
 src/hw/pcidevice.h |  1 +
 2 files changed, 25 insertions(+)
diff --git a/src/hw/pcidevice.c b/src/hw/pcidevice.c
index cfebf66..3fa240e 100644
--- a/src/hw/pcidevice.c
+++ b/src/hw/pcidevice.c
@@ -158,6 +158,30 @@ u8 pci_find_capability(struct pci_device *pci, u8 cap_id, u8 cap)
     return 0;
 }
Hi Aleksandr,
+u8 pci_find_capability_bdf(int bdf, u8 cap_id, u8 cap)
+{
Please do not duplicate the code of 'pci_find_capability'. In this case you can reuse the code by making 'pci_find_capability' call 'pci_find_capability_bdf', or better, instead of adding a new function, change the signature as you proposed and make the calling code use the bdf instead of the pci device, since that is the only info needed. (A sketch of this refactor follows the quoted patch below.)
Thanks, Marcel
+    int i;
+    u16 status = pci_config_readw(bdf, PCI_STATUS);
+
+    if (!(status & PCI_STATUS_CAP_LIST))
+        return 0;
+
+    if (cap == 0) {
+        /* find first */
+        cap = pci_config_readb(bdf, PCI_CAPABILITY_LIST);
+    } else {
+        /* find next */
+        cap = pci_config_readb(bdf, cap + PCI_CAP_LIST_NEXT);
+    }
+    for (i = 0; cap && i <= 0xff; i++) {
+        if (pci_config_readb(bdf, cap + PCI_CAP_LIST_ID) == cap_id)
+            return cap;
+        cap = pci_config_readb(bdf, cap + PCI_CAP_LIST_NEXT);
+    }
+
+    return 0;
+}
+
 // Enable PCI bus-mastering (ie, DMA) support on a pci device
 void pci_enable_busmaster(struct pci_device *pci)

diff --git a/src/hw/pcidevice.h b/src/hw/pcidevice.h
index 354b549..e4ed5cf 100644
--- a/src/hw/pcidevice.h
+++ b/src/hw/pcidevice.h
@@ -70,6 +70,7 @@ int pci_init_device(const struct pci_device_id *ids
 struct pci_device *pci_find_init_device(const struct pci_device_id *ids
                                         , void *arg);
 u8 pci_find_capability(struct pci_device *pci, u8 cap_id, u8 cap);
+u8 pci_find_capability_bdf(int bdf, u8 cap_id, u8 cap);
 void pci_enable_busmaster(struct pci_device *pci);
 u16 pci_enable_iobar(struct pci_device *pci, u32 addr);
 void *pci_enable_membar(struct pci_device *pci, u32 addr);
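A sketch of the refactoring suggested in the review comment above, keeping a single implementation keyed on the bdf and turning 'pci_find_capability' into a thin wrapper (illustrative only, not code from the series):

u8 pci_find_capability_bdf(int bdf, u8 cap_id, u8 cap)
{
    /* ... body exactly as in the patch above ... */
}

u8 pci_find_capability(struct pci_device *pci, u8 cap_id, u8 cap)
{
    /* Reuse the bdf-based implementation instead of duplicating it. */
    return pci_find_capability_bdf(pci->bdf, cap_id, cap);
}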
In case of RedHat PCI bridges, reserve additional buses, whose number is provided in a vendor-specific capability.
Signed-off-by: Aleksandr Bezzubikov zuban32s@gmail.com
---
 src/fw/pciinit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index 08221e6..b6f3a01 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -578,9 +578,17 @@ pci_bios_init_bus_rec(int bus, u8 *pci_bus)
         pci_bios_init_bus_rec(secbus, pci_bus);
 
         if (subbus != *pci_bus) {
+            u16 vendor = pci_config_readw(bdf, PCI_VENDOR_ID);
+            u8 res_bus = 0;
+            if (vendor == 0x1b36) {
+                u8 cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, 0);
+                if (cap) {
+                    res_bus = pci_config_readb(bdf, cap + 16);
+                }
+            }
             dprintf(1, "PCI: subordinate bus = 0x%x -> 0x%x\n",
-                    subbus, *pci_bus);
-            subbus = *pci_bus;
+                    subbus, *pci_bus + res_bus);
+            subbus = *pci_bus + res_bus;
         } else {
             dprintf(1, "PCI: subordinate bus = 0x%x\n", subbus);
         }
On Wed, Jul 19, 2017 at 04:20:14PM +0300, Aleksandr Bezzubikov wrote:
In case of RedHat PCI bridges, reserve additional buses, whose number is provided
It is "Red Hat"
in a vendor-specific capability.
And perhaps also a #define ?
Signed-off-by: Aleksandr Bezzubikov zuban32s@gmail.com
 src/fw/pciinit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index 08221e6..b6f3a01 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -578,9 +578,17 @@ pci_bios_init_bus_rec(int bus, u8 *pci_bus)
         pci_bios_init_bus_rec(secbus, pci_bus);
 
         if (subbus != *pci_bus) {
+            u16 vendor = pci_config_readw(bdf, PCI_VENDOR_ID);
+            u8 res_bus = 0;
+            if (vendor == 0x1b36) {
+                u8 cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, 0);
+                if (cap) {
+                    res_bus = pci_config_readb(bdf, cap + 16);
+                }
+            }
             dprintf(1, "PCI: subordinate bus = 0x%x -> 0x%x\n",
-                    subbus, *pci_bus);
-            subbus = *pci_bus;
+                    subbus, *pci_bus + res_bus);
+            subbus = *pci_bus + res_bus;
         } else {
             dprintf(1, "PCI: subordinate bus = 0x%x\n", subbus);
         }
--
2.7.4
On 19/07/2017 16:56, Konrad Rzeszutek Wilk wrote:
On Wed, Jul 19, 2017 at 04:20:14PM +0300, Aleksandr Bezzubikov wrote:
In case of RedHat PCI bridges, reserve additional buses, whose number is provided
It is "Red Hat"
in a vendor-specific capability.
And perhaps also a #define ?
Right, please add it to src/hw/pci_ids.h (see the sketch after the quoted patch below).
Thanks, Marcel
Signed-off-by: Aleksandr Bezzubikov zuban32s@gmail.com
 src/fw/pciinit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index 08221e6..b6f3a01 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -578,9 +578,17 @@ pci_bios_init_bus_rec(int bus, u8 *pci_bus)
         pci_bios_init_bus_rec(secbus, pci_bus);
 
         if (subbus != *pci_bus) {
+            u16 vendor = pci_config_readw(bdf, PCI_VENDOR_ID);
+            u8 res_bus = 0;
+            if (vendor == 0x1b36) {
+                u8 cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, 0);
+                if (cap) {
+                    res_bus = pci_config_readb(bdf, cap + 16);
+                }
+            }
             dprintf(1, "PCI: subordinate bus = 0x%x -> 0x%x\n",
-                    subbus, *pci_bus);
-            subbus = *pci_bus;
+                    subbus, *pci_bus + res_bus);
+            subbus = *pci_bus + res_bus;
         } else {
             dprintf(1, "PCI: subordinate bus = 0x%x\n", subbus);
         }
--
2.7.4
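To illustrate the two review comments above, the constants could be spelled out along these lines; the macro names here are made up for the sketch, not taken from the series or from pci_ids.h:

/* In src/hw/pci_ids.h (hypothetical names): */
#define PCI_VENDOR_ID_REDHAT           0x1b36
#define PCI_CAP_REDHAT_RES_BUS_OFFSET  16

/* The pciinit.c hunk would then become: */
if (vendor == PCI_VENDOR_ID_REDHAT) {
    u8 cap = pci_find_capability_bdf(bdf, PCI_CAP_ID_VNDR, 0);
    if (cap)
        res_bus = pci_config_readb(bdf, cap + PCI_CAP_REDHAT_RES_BUS_OFFSET);
}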
On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:
Now PCI bridges (and PCIe root ports too) get a bus number range at system init, based on the currently plugged devices. That's why when one wants to hotplug another bridge, it needs a bus number for its secondary bus, which the parent is unable to provide.
Could you explain how you trigger this?
The suggested workaround is to have a vendor-specific capability in the RedHat generic pcie-root-port that contains the number of additional buses to reserve during BIOS PCI init.
But wouldn't the proper fix be for the PCI bridge to have the subordinate value be extended to fit more bus ranges?
Aleksandr Bezzubikov (2):
  pci: add support for direct usage of bdf for capability lookup
  pci: enable RedHat pci bridges to reserve more buses
 src/fw/pciinit.c   | 12 ++++++++++--
 src/hw/pcidevice.c | 24 ++++++++++++++++++++++++
 src/hw/pcidevice.h |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)
--
2.7.4
On Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk konrad.wilk@oracle.com wrote:
On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:
Now PCI bridges (and PCIe root ports too) get a bus number range at
system init, based on the currently plugged devices.
That's why when one wants to hotplug another bridge,
it needs a bus number for its secondary bus, which the parent is unable to provide.
Could you explain how you trigger this?
I'm trying to hotplug a pcie-pci bridge into a pcie root port, and Linux says 'cannot allocate bus number for device bla-bla'. This obviously does not allow me to use the bridge at all.
The suggested workaround is to have a vendor-specific capability in the RedHat
generic pcie-root-port
that contains the number of additional buses to reserve during BIOS PCI init.
But wouldn't the proper fix be for the PCI bridge to have the subordinate value be extended to fit more bus ranges?
What do you mean? This is what I'm trying to do. Do you propose to get rid of the vendor-specific cap and use the original register value instead?
Aleksandr Bezzubikov (2):
  pci: add support for direct usage of bdf for capability lookup
  pci: enable RedHat pci bridges to reserve more buses
 src/fw/pciinit.c   | 12 ++++++++++--
 src/hw/pcidevice.c | 24 ++++++++++++++++++++++++
 src/hw/pcidevice.h |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)
--
2.7.4
On Wed, 2017-07-19 at 16:20 +0300, Aleksandr Bezzubikov wrote:
Now PCI bridges (and PCIe root ports too) get a bus number range at system init, based on the currently plugged devices. That's why when one wants to hotplug another bridge, it needs a bus number for its secondary bus, which the parent is unable to provide. The suggested workaround is to have a vendor-specific capability in the RedHat generic pcie-root-port that contains the number of additional buses to reserve during BIOS PCI init.
Where is the qemu patch for this?
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
cheers, Gerd
On 20/07/2017 9:52, Gerd Hoffmann wrote:
On Wed, 2017-07-19 at 16:20 +0300, Aleksandr Bezzubikov wrote:
Now PCI bridges (and PCIe root ports too) get a bus number range at system init, based on the currently plugged devices. That's why when one wants to hotplug another bridge, it needs a bus number for its secondary bus, which the parent is unable to provide. The suggested workaround is to have a vendor-specific capability in the RedHat generic pcie-root-port that contains the number of additional buses to reserve during BIOS PCI init.
Hi Gerd, Thanks for looking into this.
Where is the qemu patch for this?
On the way, Aleksandr is working on it, should be ready soon.
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
Yes, it is on my TODO list, however not as part of Aleksandr's series, which aims to provide PCIe-PCI bridge hotplug support. I should mention that for this case both Windows and Linux guests (modern OSes) correctly trigger the IO/MEM resource re-balancing process and succeed in completing the hotplug operation.
Thanks, Marcel
cheers, Gerd
Hi,
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
Yes, it is on my TODO list, however not as part of Aleksandr's series, which aims to provide PCIe-PCI bridge hotplug support.
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
cheers, Gerd
On 21/07/2017 13:04, Gerd Hoffmann wrote:
Hi,
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
Yes, it is on my TODO list, however not as part of Aleksandr's series, which aims to provide PCIe-PCI bridge hotplug support.
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
[Adding Laszlo]
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]

if reserve-buses is set -> use the 'extra-buses' value, and so on

Would that be OK? (A C sketch of this layout follows below.)
Thanks, Marcel
cheers, Gerd
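A C sketch of the layout Marcel describes; the struct name, field names, and widths are assumptions for illustration, not an agreed format:

/* Sketch of the proposed resource-hint capability (names and
 * widths are assumptions, not a settled format). */
struct redhat_res_hint_cap {
    u8  cap_id;       /* PCI_CAP_ID_VNDR */
    u8  cap_next;
    u8  cap_len;
    u8  flags;        /* reserve-buses | reserve-IO | reserve-MEM | ... */
    u8  extra_buses;  /* used if the reserve-buses flag is set */
    u32 io_size;      /* used if the reserve-IO flag is set */
    u32 mem_size;     /* used if the reserve-MEM flag is set */
};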
On Fri, 2017-07-21 at 15:15 +0300, Marcel Apfelbaum wrote:
On 21/07/2017 13:04, Gerd Hoffmann wrote:
Hi,
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
Yes, it is on my TODO list, however not as part of Aleksandr's series, which aims to provide PCIe-PCI bridge hotplug support.
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
[Adding Laszlo]
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]

if reserve-buses is set -> use the 'extra-buses' value, and so on
Do we need the flags? We can use "value == 0 -> no hint for you". Also what about prefetchable vs. non-prefetchable memory? I guess we want a size hint for both memory windows?
cheers, Gerd
On 21/07/2017 15:42, Gerd Hoffmann wrote:
On Fri, 2017-07-21 at 15:15 +0300, Marcel Apfelbaum wrote:
On 21/07/2017 13:04, Gerd Hoffmann wrote:
Hi,
What about window sizes? IIRC there was a plan to provide allocation hints for them too ...
Yes, it is on my TODO list, however not as part of Aleksandr's series, which aims to provide PCIe-PCI bridge hotplug support.
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
[Adding Laszlo]
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]
if reserve-buses is set -> use the 'extra-buses' value, and so on
Do we need the flags? We can use "value == 0 -> no hint for you".
Maybe we don't want to allocate IO at all, so value 0 would mean 'do not allocate'.
But we can disable IO by other means.
Also what about prefetchable vs. non-prefetchable memory? I guess we want a size hint for both memory windows?
Good point, thanks! (A revised sketch incorporating both comments follows below.)

Marcel
cheers, Gerd
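Folding in both of Gerd's points — dropping the flags byte in favor of 'zero means no hint', and splitting the memory hint into non-prefetchable and prefetchable windows — the layout could simplify to this sketch (names and widths are again assumptions):

struct redhat_res_hint_cap {
    u8  cap_id;         /* PCI_CAP_ID_VNDR */
    u8  cap_next;
    u8  cap_len;
    u8  extra_buses;    /* 0 = no hint */
    u32 io_size;        /* 0 = no hint */
    u32 mem_size;       /* non-prefetchable window, 0 = no hint */
    u32 pref_mem_size;  /* prefetchable window, 0 = no hint */
};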
Hi,
Maybe we don't want to allocate IO at all, so value 0 would mean 'do not allocate'.
But we can disable IO by other means.
Indeed. As IO is optional for pcie we don't need a qemu-specific extension for that.
cheers, Gerd
On Fri, Jul 21, 2017 at 03:15:46PM +0300, Marcel Apfelbaum wrote:
On 21/07/2017 13:04, Gerd Hoffmann wrote:
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]

if reserve-buses is set -> use the 'extra-buses' value, and so on
I don't have any objection to using a PCI capability, but I do wonder if fw_cfg would be a better fit. This information is purely qemu -> firmware, right?
-Kevin
On 21/07/2017 20:28, Kevin O'Connor wrote:
On Fri, Jul 21, 2017 at 03:15:46PM +0300, Marcel Apfelbaum wrote:
On 21/07/2017 13:04, Gerd Hoffmann wrote:
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]
if reserve-buses is set -> use the 'extra-buses' value, and so on
I don't have any objection to using a PCI capability, but I do wonder if fw_cfg would be a better fit. This information is purely qemu -> firmware, right?
Hi Kevin,
Right, but theoretically speaking a guest OS driver could also use the hint on hotplug, or when the OS chooses to re-assign resources on its own.
A while ago we discussed the fw_cfg option, but Gerd preferred the vendor capability, and since the capability looked cleaner Aleksandr opted for it.
He will send V2 soon together with the QEMU counterpart feature.
Thanks, Marcel
-Kevin
On Fri, 2017-07-21 at 13:28 -0400, Kevin O'Connor wrote:
On Fri, Jul 21, 2017 at 03:15:46PM +0300, Marcel Apfelbaum wrote:
On 21/07/2017 13:04, Gerd Hoffmann wrote:
I'd prefer to have a single vendor capability for all resource allocation hints provided by qemu.
Sure, the capability could look something like:

[flags: reserve-buses|reserve-IO|reserve-MEM|...] [extra-buses][IO-size][MEM-size]

if reserve-buses is set -> use the 'extra-buses' value, and so on
I don't have any objection to using a PCI capability, but I do wonder if fw_cfg would be a better fit. This information is purely qemu -> firmware, right?
fw_cfg quickly becomes messy if you want to allow different hints per bridge.
cheers, Gerd
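To illustrate the point: per-bridge hints over fw_cfg would need one file per port, keyed by some device path or address — for example something like the sketch below, where the file naming scheme is entirely invented, which is precisely the messiness in question:

/* Hypothetical fw_cfg lookup of a per-bridge bus-reserve hint.
 * The "etc/pci-res-hints/..." naming scheme is invented for this
 * sketch and would have to be specified per bridge. */
char name[32];
snprintf(name, sizeof(name), "etc/pci-res-hints/%02x:%02x.%x",
         pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
u8 res_bus = romfile_loadint(name, 0);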