[SeaBIOS] [RFC v3] pciinit: setup mcfg for pxb-pcie to support multiple pci domains

Dr. David Alan Gilbert dgilbert at redhat.com
Thu Sep 27 19:53:26 CEST 2018


* Zihan Yang (whois.zihan.yang at gmail.com) wrote:
> Hi Laszlo
> Laszlo Ersek <lersek at redhat.com> wrote on Wed, Sep 26, 2018 at 1:17 AM:
> >
> > On 09/25/18 17:38, Kevin O'Connor wrote:
> > > On Mon, Sep 17, 2018 at 11:02:59PM +0800, Zihan Yang wrote:
> > >> To support multiple PCI domains for the pxb-pcie device in QEMU, we need to
> > >> set up the MCFG range in SeaBIOS. For now we use [0x80000000, 0xb0000000) to
> > >> hold the MCFG tables of the new domains, and we retrieve the desired MCFG size
> > >> of each pxb-pcie from a hidden BAR, because a domain may not need all 256
> > >> buses; this also lets us support more domains within the limited range (768MB).
> > >
> > > At a high level, this looks okay to me.  I'd like to see additional
> > > reviews from others more familiar with the QEMU PCI code, though.
> > >
> > > Is the plan to do the same thing for OVMF?
> >
> > I remain entirely unconvinced that this feature is useful. (I've stated
> > so before.)
> >
> > I believe the latest QEMU RFC posting (v5) is here:
> >
> > [Qemu-devel] [RFC v5 0/6] pci_expander_brdige: support separate pci
> > domain for pxb-pcie
> >
> > http://mid.mail-archive.com/1537196258-12581-1-git-send-email-whois.zihan.yang@gmail.com
> >
> > First, I fail to see the use case where ~256 PCI bus numbers aren't
> > enough. If I strain myself, perhaps I can imagine using ~200 PCIe root
> > ports on Q35 (each of which requires a separate bus number), so that we
> > can independently hot-plug 200 devices then. And that's supposedly not
> > enough, because we want... 300? 400? A thousand? Doesn't sound realistic
> > to me. (This is not meant to be a strawman argument, I really have no
> > idea what the feature would be useful for.)
> 
> It might not be very intuitive, but such use cases do exist. The very first
> discussion, about 4 months ago, mentioned some possible use cases, which I
> paste here:
> 
> - We have Ray from Intel trying to use 1000 virtio-net devices

why that many?

> - We may have a VM managing some backups (tapes); we may have a lot of these.

I'm curious; what does tape backup have to do with the number of PCI
slots/busses?

Dave

> - We may indeed want to create a nested solution, as Michael mentioned.
> 
> The thread can be found in
> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04667.html
> 
> Also, in a later post on the list, a person from Dell stated that they would
> need this feature for Intel VMD in Dell EMC. I have no idea about the details,
> but since they came here for help, I guess they can indeed benefit from it
> somehow.
> 
> > Second, the v5 RFC doesn't actually address the alleged bus number
> > shortage. IIUC, it supports a low number of ECAM ranges under 4GB, but
> > those are (individually) limited in the bus number ranges they can
> > accommodate (due to 32-bit address space shortage). So, more or less, the
> > current approach just fragments the bus number space we already have into
> > multiple domains.
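> >
> > (For concreteness: ECAM costs 1MB of address space per bus, so a 768MB
> > window below 4GB covers at most 768 bus numbers in total, however that
> > window is split across domains.)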
> >
> > Third, should a subsequent iteration of the QEMU series put those extra
> > ECAMs above 4GB, with the intent to leave the enumeration of those
> > hierarchies to the "guest OS", it would present an incredible
> > implementation mess for OVMF. If people gained the ability to attach
> > storage or network to those domains, on the QEMU command line, they
> > would expect to boot off of them, using UEFI. Then OVMF would have to
> > make sure the controllers could be bound by their respective UEFI
> > drivers. That in turn would require functional config space access
> > (ECAM) at semi-random 64-bit addresses.
> 
> I'm not familiar with OVMF, so I'm afraid I don't know how to make it easier
> for OVMF; the division of the 64-bit space in OVMF is beyond my scope. There
> is no plan to implement this in OVMF for now; we just want the SeaBIOS/QEMU
> patches to serve as a proof of concept.
> 
> As for SeaBIOS, it accesses devices through ports 0xcf8/0xcfc, which are bound
> to the q35 host bridge in QEMU. If we want to change the MMCONFIG size of a
> pxb-pcie (instead of using the whole 256MB), we must know its desired size,
> which is passed as a hidden BAR. Unfortunately, the configuration space of a
> pxb-pcie device cannot be accessed through 0xcf8/0xcfc, because it sits behind
> a different host bridge, and at this point the ECAM is not configured yet, so
> MMIO cannot be used either. In a previous version I tried to bind the pxb host
> to other ports in QEMU, so that we could use port I/O to access the config
> space of pxb-pcie, but that seemed a little dirty.
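> 
> For context, a minimal sketch of that legacy mechanism (generic illustration,
> not SeaBIOS's actual helpers): the 0xcf8 address word only encodes
> bus/device/function, so there is no field in which to name a second PCI
> domain.
> 
>     #include <stdint.h>
> 
>     /* Illustrative port I/O wrappers; SeaBIOS has its own outl()/inl(). */
>     static inline void outl(uint16_t port, uint32_t val)
>     { asm volatile("outl %0, %1" : : "a"(val), "Nd"(port)); }
>     static inline uint32_t inl(uint16_t port)
>     { uint32_t v; asm volatile("inl %1, %0" : "=a"(v) : "Nd"(port)); return v; }
> 
>     /* Legacy config read: enable bit | bus | dev | fn | dword-aligned offset.
>      * There is no segment/domain field, so only domain 0 is reachable. */
>     static uint32_t pci_cf8_readl(int bus, int dev, int fn, int off)
>     {
>         uint32_t addr = 0x80000000u | (bus << 16) | (dev << 11)
>                         | (fn << 8) | (off & 0xfc);
>         outl(0xcf8, addr);
>         return inl(0xcfc);
>     }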
> 
> Another issue is how SeaBIOS initializes things. It runs pci_setup first, when
> things like the RSDP are not yet loaded. At that point it is inconvenient to
> retrieve the MCFG table and other information, so we cannot infer the MMCONFIG
> address and size from the MCFG table in SeaBIOS.
> 
> Therefore we fall back to an alternative: as a first step we support 4x as
> many devices and let the guest OS do the initialization. The inability to boot
> from devices in another domain is indeed an issue, and we don't have a very
> good solution for it yet.
> 
> Things might change in the future if we can figure out a better solution, and
> I hope we can have an easier and more elegant solution in OVMF. For now we are
> just trying to offer a possible approach as a proof of concept.
> 
> Thanks
> Zihan
> 
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK


