Hi Gerd
On 08/28/2018 07:12 AM, Zihan Yang wrote:
Gerd Hoffmann kraxel@redhat.com wrote on Mon, Aug 27, 2018 at 7:04 AM:
Hi,
However, QEMU only binds ports 0xcf8 and 0xcfc to bus pcie.0. To avoid bus conflicts, we should use other port pairs for buses under new domains.
I would skip support for IO based configuration and use only MMCONFIG for extra root buses.
The question remains: how do we assign MMCONFIG space for each PCI domain.
Thanks for your comments!
Allocation-wise it would be easiest to place them above 4G. Right after memory, or after etc/reserved-memory-end (if that fw_cfg file is present), where the 64bit pci bars would have been placed. Move the pci bars up in address space to make room.
Only problem is that seabios wouldn't be able to access mmconfig then.
Placing them below 4G would work at least for a few pci domains. q35 mmconfig bar is placed at 0xb0000000 -> 0xbfffffff, basically for historical reasons. Old qemu versions had 2.75G low memory on q35 (up to 0xafffffff), and I think old machine types still have that for live migration compatibility reasons. Modern qemu uses 2G only, to make gigabyte alignment work.
32bit pci bars are placed above 0xc0000000. The address space from 2G to 2.75G (0x80000000 -> 0xafffffff) is unused on new machine types. Enough room for three additional mmconfig bars (full size), so four pci domains total if you add the q35 one.
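The arithmetic behind "three additional mmconfig bars" can be checked with a minimal sketch. It assumes each full-size window covers 256 buses at 1 MiB per bus and is aligned to its own size (as the q35 window at 0xb0000000 is); the helper name `mmconfig_windows` is made up for illustration.

```python
# Sketch: which full-size MMCONFIG windows fit in the unused 2G..2.75G hole.
MIB = 1 << 20
HOLE_START = 0x80000000    # 2G: end of low RAM on modern q35 machine types
HOLE_END = 0xB0000000      # exclusive: the q35 MMCONFIG bar starts here
FULL_WINDOW = 256 * MIB    # 256 buses * 1 MiB of config space per bus


def mmconfig_windows(start, end, size):
    """Return base addresses of size-aligned windows that fit in [start, end)."""
    first = (start + size - 1) & ~(size - 1)   # round up to the alignment
    return list(range(first, end - size + 1, size))


extra = mmconfig_windows(HOLE_START, HOLE_END, FULL_WINDOW)
for domain, base in enumerate(extra, start=1):
    print(f"domain {domain}: {base:#010x} -> {base + FULL_WINDOW - 1:#010x}")
```

Together with the existing q35 window this gives the four domains mentioned above.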
Maybe we can support 4 domains first before we come up with a better solution. But I'm not sure whether four domains are enough for those who want a very large number of devices.
(Adding Michael)
Since we will not use all 256 buses of an extra PCI domain, I think this space will allow us to support more PCI domains.
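A rough sketch of why smaller domains stretch the space further: MMCONFIG needs 4 KiB per function, so 1 MiB per bus. The function name `extra_domains` is hypothetical, and alignment requirements are ignored here, so the result is an upper bound.

```python
# Sketch: upper bound on extra PCI domains that fit in the 2G..2.75G hole
# when each domain exposes fewer than 256 buses.
BYTES_PER_BUS = 32 * 8 * 4096          # 32 devices * 8 functions * 4 KiB = 1 MiB
HOLE_SIZE = 0xB0000000 - 0x80000000    # 768 MiB


def extra_domains(buses_per_domain, hole_size=HOLE_SIZE):
    """How many MMCONFIG windows of the given bus count fit (alignment ignored)."""
    if not 1 <= buses_per_domain <= 256:
        raise ValueError("a PCI domain has between 1 and 256 buses")
    return hole_size // (buses_per_domain * BYTES_PER_BUS)


for buses in (256, 64, 16):
    print(f"{buses:3d} buses per domain -> up to {extra_domains(buses)} extra domains")
```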
What will the flow look like?
1. QEMU passes to SeaBIOS information on how many extra PCI domains it needs, and how many buses per domain. How will it pass this info? A vendor specific capability, some PCI registers, or modifying the extra-pci-roots fw_cfg file?
2. SeaBIOS assigns the address for each PCI domain and returns the information to QEMU. How will it do that? Some pxb-pcie registers? Or do we model the MMCFG like a PCI BAR?
3. Once QEMU gets the MMCFG addresses, it can respond to MMIO configuration cycles.
4. SeaBIOS queries the devices in all PCI domains, then computes and assigns IO/MEM resources (for PCI domains > 0 it will use MMCFG to configure the PCI devices).
5. QEMU uses the IO/MEM information to create the CRS for each extra PCI host bridge.
6. SeaBIOS gets the ACPI tables from QEMU and passes them to the guest OS.
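Steps 1, 2 and 5 of this flow can be mocked up as a small sketch. Everything here is an assumption made for illustration: the `ExtraDomain` record, the power-of-two bus counts, the natural alignment, and the idea that QEMU would advertise each window through an ACPI MCFG allocation entry (the thread only says "ACPI tables" and CRS). It is not how QEMU or SeaBIOS implement it.

```python
import struct
from dataclasses import dataclass

MIB = 1 << 20


@dataclass
class ExtraDomain:
    segment: int          # PCI domain (segment) number, > 0
    buses: int            # step 1: bus count requested by QEMU
    mmcfg_base: int = 0   # step 2: written back by the firmware


def firmware_assign(domains, start, end):
    """Step 2: give each domain a naturally aligned window of 1 MiB per bus."""
    cursor = start
    for d in domains:
        size = d.buses * MIB
        if not 1 <= d.buses <= 256 or size & (size - 1):
            raise ValueError("this sketch needs a power-of-two bus count <= 256")
        cursor = (cursor + size - 1) & ~(size - 1)   # round up to window size
        if cursor + size > end:
            raise MemoryError(f"no room for domain {d.segment}")
        d.mmcfg_base = cursor
        cursor += size
    return domains


def mcfg_entry(d):
    """16-byte MCFG allocation entry: base, segment, start bus, end bus, reserved."""
    return struct.pack("<QHBB4x", d.mmcfg_base, d.segment, 0, d.buses - 1)


domains = firmware_assign([ExtraDomain(1, 64), ExtraDomain(2, 128)],
                          0x80000000, 0xB0000000)
for d in domains:
    print(f"segment {d.segment}: MMCFG at {d.mmcfg_base:#010x}, {d.buses} buses")
```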
Thanks, Marcel
cheers, Gerd
Hi,
Since we will not use all 256 buses of an extra PCI domain, I think this space will allow us to support more PCI domains.
Depends on the use case, I guess. If you just need many pcie devices this probably doesn't help. If you want them for numa support then yes, more domains with fewer devices each can be useful.
What will the flow look like?
- QEMU passes to SeaBIOS information on how many extra PCI domains it needs, and how many buses per domain. How will it pass this info? A vendor specific capability, some PCI registers, or modifying the extra-pci-roots fw_cfg file?
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
- SeaBIOS assigns the address for each PCI domain and returns the information to QEMU. How will it do that? Some pxb-pcie registers? Or do we model the MMCFG like a PCI BAR?
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
cheers, Gerd
Hi Gerd,
On 08/28/2018 09:07 AM, Gerd Hoffmann wrote:
Hi,
Since we will not use all 256 buses of an extra PCI domain, I think this space will allow us to support more PCI domains.
Depends on the use case, I guess. If you just need many pcie devices this probably doesn't help. If you want them for numa support then yes, more domains with fewer devices each can be useful.
We already support multiple NUMA nodes. We want more devices. Still, having 4x the number of devices we previously supported is a good step forward.
What will the flow look like?
- QEMU passes to SeaBIOS information on how many extra PCI domains it needs, and how many buses per domain. How will it pass this info? A vendor specific capability, some PCI registers, or modifying the extra-pci-roots fw_cfg file?
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
- SeaBIOS assigns the address for each PCI domain and returns the information to QEMU. How will it do that? Some pxb-pcie registers? Or do we model the MMCFG like a PCI BAR?
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Thanks Gerd! Marcel
cheers, Gerd
Hi,
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
Cool, so we don't have a chicken-and-egg issue.
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Ok, so we can configure mmcfg as a hidden pci bar, similar to the q35 mmcfg. Any configuration hints can be passed as a pci vendor capability (similar to the bridge window size hints), if needed.
cheers, Gerd
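For reference, locating a vendor-specific capability in a device's config space follows the standard PCI capability list walk. The sketch below operates on a plain 256-byte buffer rather than real hardware; the function name `find_vendor_caps` and the sample contents are made up, and the payload layout of any pxb-pcie hint capability is not defined in this thread.

```python
PCI_STATUS = 0x06             # status register offset
PCI_STATUS_CAP_LIST = 0x10    # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34    # pointer to the first capability
PCI_CAP_ID_VNDR = 0x09        # vendor-specific capability ID


def find_vendor_caps(cfg):
    """Yield (offset, length) of each vendor-specific capability in cfg."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()                       # guard against malformed, looping lists
    while pos and pos not in seen:
        seen.add(pos)
        cap_id, next_ptr = cfg[pos], cfg[pos + 1]
        if cap_id == PCI_CAP_ID_VNDR:
            yield pos, cfg[pos + 2]    # byte 2 of a vendor cap holds its length
        pos = next_ptr & 0xFC


# Hypothetical config space: a power-management cap at 0x40 that chains
# to a 16-byte vendor-specific cap at 0x50.
cfg = bytearray(256)
cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
cfg[PCI_CAPABILITY_LIST] = 0x40
cfg[0x40:0x42] = bytes([0x01, 0x50])
cfg[0x50:0x53] = bytes([PCI_CAP_ID_VNDR, 0x00, 0x10])
caps = list(find_vendor_caps(bytes(cfg)))
print([(hex(off), length) for off, length in caps])
```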
Gerd Hoffmann kraxel@redhat.com wrote on Tue, Aug 28, 2018 at 10:15 AM:
Hi,
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
Cool, so we don't have a chicken-and-egg issue.
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Ok, so we can configure mmcfg as a hidden pci bar, similar to the q35 mmcfg. Any configuration hints can be passed as a pci vendor capability (similar to the bridge window size hints), if needed.
That sounds workable, I will modify the implementation.
cheers, Gerd
Thanks. Zihan
On Tue, Aug 28, 2018 at 12:14:58PM +0200, Gerd Hoffmann wrote:
Hi,
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
Cool, so we don't have a chicken-and-egg issue.
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Ok, so we can configure mmcfg as a hidden pci bar, similar to the q35 mmcfg. Any configuration hints can be passed as a pci vendor capability (similar to the bridge window size hints), if needed.
Just so I understand, the proposal is to have SeaBIOS search for pxb-pcie devices on the main PCI bus and allocate address space for each. (These devices would not be considered pci buses in the traditional sense.) Then SeaBIOS will traverse that address space (MMCFG) and allocate BARs (both address space and io space) for the PCI devices found in that address space. Finally, QEMU will take all those allocations and use them when generating the ACPI tables.
Did I get that right?
-Kevin
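Traversing an MMCFG window as described above comes down to the standard PCIe ECAM address computation, sketched here. The base address in the example is arbitrary, and the helper names `ecam_address` and `slot_addresses` are hypothetical; real firmware would read the vendor ID at each address to see whether a device is present.

```python
def ecam_address(base, bus, dev, fn, reg=0):
    """Address of a config register inside an MMCFG (ECAM) window."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 4096):
        raise ValueError("bus/dev/fn/reg out of range")
    # 4 KiB per function, 8 functions per device, 32 devices per bus
    return base + ((bus << 20) | (dev << 15) | (fn << 12) | reg)


def slot_addresses(base, buses):
    """Yield (bus, dev, address of function 0) for every slot in the window."""
    for bus in range(buses):
        for dev in range(32):
            yield bus, dev, ecam_address(base, bus, dev, 0)


# Example: BAR0 (register 0x10) of device 01:02.3 in a window at 0x90000000.
print(hex(ecam_address(0x90000000, 1, 2, 3, 0x10)))
```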
Hi Kevin,
On 08/28/2018 08:02 PM, Kevin O'Connor wrote:
On Tue, Aug 28, 2018 at 12:14:58PM +0200, Gerd Hoffmann wrote:
Hi,
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
Cool, so we don't have a chicken-and-egg issue.
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Ok, so we can configure mmcfg as a hidden pci bar, similar to the q35 mmcfg. Any configuration hints can be passed as a pci vendor capability (similar to the bridge window size hints), if needed.
Just so I understand, the proposal is to have SeaBIOS search for pxb-pcie devices on the main PCI bus and allocate address space for each. (These devices would not be considered pci buses in the traditional sense.) Then SeaBIOS will traverse that address space (MMCFG) and allocate BARs (both address space and io space) for the PCI devices found in that address space. Finally, QEMU will take all those allocations and use them when generating the ACPI tables.
Did I get that right?
Yes, the pxb-pcie exposes a new PCI root bus, but we want it in a different PCI domain. This is done in order to remove the limit of 256 PCI Express devices on a PCI Express machine.
Does the plan sound sane?
Thanks, Marcel
-Kevin
On Tue, Aug 28, 2018 at 08:17:19PM +0300, Marcel Apfelbaum wrote:
On 08/28/2018 08:02 PM, Kevin O'Connor wrote:
On Tue, Aug 28, 2018 at 12:14:58PM +0200, Gerd Hoffmann wrote:
Where is the pxb-pcie device? 0000:$somewhere? Or $domain:00:00.0?
0000:$somewhere (On PCI domain 0)
Cool, so we don't have a chicken-and-egg issue.
If we can access pxb-pcie registers before configuring MMCFG then yes, we should use pxb-pcie registers for that.
Yes, we can.
Ok, so we can configure mmcfg as a hidden pci bar, similar to the q35 mmcfg. Any configuration hints can be passed as a pci vendor capability (similar to the bridge window size hints), if needed.
Just so I understand, the proposal is to have SeaBIOS search for pxb-pcie devices on the main PCI bus and allocate address space for each. (These devices would not be considered pci buses in the traditional sense.) Then SeaBIOS will traverse that address space (MMCFG) and allocate BARs (both address space and io space) for the PCI devices found in that address space. Finally, QEMU will take all those allocations and use them when generating the ACPI tables.
Did I get that right?
Yes, the pxb-pcie exposes a new PCI root bus, but we want it in a different PCI domain. This is done in order to remove the limit of 256 PCI Express devices on a PCI Express machine.
Does the plan sound sane?
It sounds okay to me.
Separately, we could "dust off" the SeaBIOS PAE patches if we want to place the address space allocations above 4G.
-Kevin