[SeaBIOS] [RFC v3] pciinit: setup mcfg for pxb-pcie to support multiple pci domains

Laszlo Ersek lersek at redhat.com
Tue Sep 25 19:17:36 CEST 2018


On 09/25/18 17:38, Kevin O'Connor wrote:
> On Mon, Sep 17, 2018 at 11:02:59PM +0800, Zihan Yang wrote:
>> To support multiple PCI domains for the pxb-pcie device in QEMU, we need to
>> set up an MCFG range in SeaBIOS. For now we use [0x80000000, 0xb0000000) to
>> hold the MCFG tables of the new domains, and we retrieve the desired MCFG
>> size of each pxb-pcie from a hidden BAR, because a domain may not need the
>> whole 256 buses; this also lets us support more domains within the limited
>> range (768MB).
> 
> At a highlevel, this looks okay to me.  I'd like to see additional
> reviews from others more familiar with the QEMU PCI code, though.
> 
> Is the plan to do the same thing for OVMF?

I remain entirely unconvinced that this feature is useful. (I've stated
so before.)

I believe the latest QEMU RFC posting (v5) is here:

[Qemu-devel] [RFC v5 0/6] pci_expander_brdige: support separate pci
domain for pxb-pcie

http://mid.mail-archive.com/1537196258-12581-1-git-send-email-whois.zihan.yang@gmail.com

First, I fail to see the use case where ~256 PCI bus numbers aren't
enough. If I strain myself, perhaps I can imagine using ~200 PCIe root
ports on Q35 (each of which requires a separate bus number), so that we
can then independently hot-plug 200 devices. And that's supposedly not
enough, because we want... 300? 400? A thousand? Doesn't sound realistic
to me. (This is not meant to be a straw man argument; I really have no
idea what the feature would be useful for.)

Second, the v5 RFC doesn't actually address the alleged bus number
shortage. IIUC, it supports a low number of ECAM ranges under 4GB, but
those are (individually) limited in the bus number ranges they can
accommodate (due to the 32-bit address space shortage). So, more or
less, the current approach just fragments the bus number space we
already have across multiple domains.

Third, should a subsequent iteration of the QEMU series put those extra
ECAMs above 4GB, with the intent to leave the enumeration of those
hierarchies to the "guest OS", it would present an incredible
implementation mess for OVMF. If people gained the ability to attach
storage or network to those domains, on the QEMU command line, they
would expect to boot off of them, using UEFI. Then OVMF would have to
make sure the controllers could be bound by their respective UEFI
drivers. That in turn would require functional config space access
(ECAM) at semi-random 64-bit addresses.
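
For reference, an ECAM config space access is just a memory-mapped read
or write at an address computed per the PCIe spec: the base plus bus /
device / function / register offset packed into the low 28 bits. The
point is that the base would be a semi-random 64-bit value here, and
every such access faults unless the firmware's 1:1 page tables cover
it. A minimal sketch (ecam_addr() is a made-up helper name):

    #include <stdint.h>

    /* Standard PCIe ECAM layout, relative to the segment's MCFG base:
     * bus[27:20], device[19:15], function[14:12], offset[11:0]. */
    static inline uint64_t ecam_addr(uint64_t mcfg_base, uint8_t bus,
                                     uint8_t dev, uint8_t fn,
                                     uint16_t off)
    {
        return mcfg_base
             + ((uint64_t)bus << 20)
             + ((uint64_t)(dev & 0x1f) << 15)
             + ((uint64_t)(fn & 0x7) << 12)
             + (off & 0xfff);
    }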

The layout of the 64-bit address space is already pretty darn
complicated in OVMF; it depends on guest RAM size, DIMM hotplug area
size, and it already dictates the base of the 64-bit MMIO aperture for
BAR allocations. It also dictates how high up the DXE Core should build
the page tables (you can only access 64-bit addresses if the 1:1 page
tables built by the DXE Core cover them).
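
To illustrate the dependency chain, here is a deliberately simplified
model -- not OVMF's actual code, and all the names are made up: the
64-bit MMIO aperture must start above whatever guest RAM and the DIMM
hotplug area occupy, and the DXE Core's page tables must reach at least
to its end. ECAM windows at semi-random 64-bit addresses would bolt yet
another input onto this calculation.

    #include <stdint.h>

    #define GiB (1ULL << 30)

    /* Simplified: place the 64-bit MMIO aperture above guest RAM and
     * the DIMM hotplug area, aligned to its own (power-of-two) size. */
    static uint64_t mmio64_base(uint64_t top_of_ram,
                                uint64_t hotplug_size,
                                uint64_t aperture_size)
    {
        uint64_t base = (top_of_ram > 4 * GiB) ? top_of_ram : 4 * GiB;
        base += hotplug_size;
        return (base + aperture_size - 1) & ~(aperture_size - 1);
    }

    /* The 1:1 page tables built by the DXE Core must then cover at
     * least mmio64_base(...) + aperture_size. */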

Obviously, UEFI on physical machines does support multiple PCI domains.
There are a number of differences however:

- Both the ECAM ranges and the MMIO apertures (for BAR allocation) of
the disparate host bridges are distinct. On QEMU / OVMF, the latter is
not true (the MMIO apertures are not distinct), and this has already
required us to write some nasty quirks for supposedly
platform-independent core code in edk2.

- The same address ranges mentioned in the previous bullet are known in
advance (they are "static"). That's a *huge* simplification opportunity
for physical platform code (which is most of the time not even open
source, let alone upstreamed to edk2), because the engineers can lay out
the 64-bit address range, and deduce all the related artifacts from that
layout, on paper, at the office.

It's a proper mess with a lot of opportunity for regressions, and I just
don't see the bang for the buck.

(I didn't mean to re-hash my opinion yet again -- in the QEMU RFC v5
thread, I saw references to OVMF, stating that OVMF would not support
this. I was 100% fine with those mentions, but here you asked
explicitly... Some ideas just make me rant, my apologies.)

Thanks
Laszlo

> -Kevin
> 
>>
>> Signed-off-by: Zihan Yang <whois.zihan.yang at gmail.com>
>> ---
>>  src/fw/dev-q35.h |  7 +++++++
>>  src/fw/pciinit.c | 32 ++++++++++++++++++++++++++++++++
>>  src/hw/pci_ids.h |  1 +
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/src/fw/dev-q35.h b/src/fw/dev-q35.h
>> index 201825d..229cd81 100644
>> --- a/src/fw/dev-q35.h
>> +++ b/src/fw/dev-q35.h
>> @@ -49,4 +49,11 @@
>>  #define ICH9_APM_ACPI_ENABLE           0x2
>>  #define ICH9_APM_ACPI_DISABLE          0x3
>>  
>> +#define PXB_PCIE_HOST_BRIDGE_MCFG_BAR              0x50    /* 64bit register */
>> +#define PXB_PCIE_HOST_BRIDGE_MCFG_SIZE             0x58    /* 32bit register */
>> +#define PXB_PCIE_HOST_BRIDGE_ENABLE                Q35_HOST_BRIDGE_PCIEXBAREN
>> +/* pxb-pcie can use [0x80000000, 0xb0000000), be careful not to overflow */
>> +#define PXB_PCIE_HOST_BRIDGE_MCFG_SIZE_ADDR        0x80000000
>> +#define PXB_PCIE_HOST_BRIDGE_MCFG_SIZE_ADDR_UPPER Q35_HOST_BRIDGE_PCIEXBAR_ADDR
>> +
>>  #endif // dev-q35.h
>> diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
>> index c0634bc..e0ac22c 100644
>> --- a/src/fw/pciinit.c
>> +++ b/src/fw/pciinit.c
>> @@ -51,6 +51,7 @@ u64 pcimem_end     = BUILD_PCIMEM_END;
>>  u64 pcimem64_start = BUILD_PCIMEM64_START;
>>  u64 pcimem64_end   = BUILD_PCIMEM64_END;
>>  u64 pci_io_low_end = 0xa000;
>> +u64 pxb_pcie_mcfg_base = PXB_PCIE_HOST_BRIDGE_MCFG_SIZE_ADDR;
>>  
>>  struct pci_region_entry {
>>      struct pci_device *dev;
>> @@ -507,11 +508,42 @@ static void mch_mem_addr_setup(struct pci_device *dev, void *arg)
>>          pci_io_low_end = acpi_pm_base;
>>  }
>>  
>> +static void pxb_pcie_mem_addr_setup(struct pci_device *dev, void *arg)
>> +{
>> +    u64 mcfg_base;
>> +    u32 mcfg_size = pci_config_readl(dev->bdf, PXB_PCIE_HOST_BRIDGE_MCFG_SIZE);
>> +
>> +    /* 0 means this pxb-pcie still resides in pci domain 0 */
>> +    if (mcfg_size == 0)
>> +        return;
>> +
>> +    if (pxb_pcie_mcfg_base + mcfg_size >
>> +                             PXB_PCIE_HOST_BRIDGE_MCFG_SIZE_ADDR_UPPER) {
>> +        dprintf(1, "PCI: Not enough space to hold new pci domains\n");
>> +        return;
>> +    }
>> +
>> +    mcfg_base = pxb_pcie_mcfg_base;
>> +    pxb_pcie_mcfg_base += mcfg_size;
>> +
>> +    /* Clear the old mapping first; QEMU tears down the old MMIO region */
>> +    pci_config_writel(dev->bdf, PXB_PCIE_HOST_BRIDGE_MCFG_BAR, 0);
>> +    /* Update MCFG base */
>> +    pci_config_writel(dev->bdf, PXB_PCIE_HOST_BRIDGE_MCFG_BAR + 4,
>> +                      mcfg_base >> 32);
>> +    pci_config_writel(dev->bdf, PXB_PCIE_HOST_BRIDGE_MCFG_BAR,
>> +                      (mcfg_base & 0xffffffff) | PXB_PCIE_HOST_BRIDGE_ENABLE);
>> +
>> +    e820_add(mcfg_base, mcfg_size, E820_RESERVED);
>> +}
>> +
>>  static const struct pci_device_id pci_platform_tbl[] = {
>>      PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82441,
>>                 i440fx_mem_addr_setup),
>>      PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_Q35_MCH,
>>                 mch_mem_addr_setup),
>> +    PCI_DEVICE(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_PXB_HOST,
>> +               pxb_pcie_mem_addr_setup),
>>      PCI_DEVICE_END
>>  };
>>  
>> diff --git a/src/hw/pci_ids.h b/src/hw/pci_ids.h
>> index 1096461..b495920 100644
>> --- a/src/hw/pci_ids.h
>> +++ b/src/hw/pci_ids.h
>> @@ -2266,6 +2266,7 @@
>>  #define PCI_VENDOR_ID_REDHAT		0x1b36
>>  #define PCI_DEVICE_ID_REDHAT_ROOT_PORT	0x000C
>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE	0x0001
>> +#define PCI_DEVICE_ID_REDHAT_PXB_HOST   0x000B
>>  
>>  #define PCI_VENDOR_ID_TEKRAM		0x1de1
>>  #define PCI_DEVICE_ID_TEKRAM_DC290	0xdc29
>> -- 
>> 2.7.4