On Mon, 2014-11-24 at 15:18 -0500, Kevin O'Connor wrote:
On Mon, Nov 24, 2014 at 09:38:38PM +0200, Marcel Apfelbaum wrote:
On Mon, 2014-11-24 at 13:01 -0500, Kevin O'Connor wrote:
On Mon, Nov 24, 2014 at 12:21:31PM -0500, Kevin O'Connor wrote:
On Mon, Nov 24, 2014 at 03:28:52PM +0100, Gerd Hoffmann wrote:
I think I would try to reuse the existing code which does the same for bridges. Reuse "struct pci_bus" to add one more level (and maybe rename the struct), then have the resource propagation code in pci_bios_check_devices() do the job.
That will need some reorganization, because the simple "struct pci_bus *busses" array indexed by bus number will not work any more.
Why not just pretend that the extra root PCI buses are children of bus zero (for resource sizing and mapping purposes)?
Thinking about this further, an even easier route may be to just place all devices on the extra root PCI buses in busses. If I understand it correctly, when a system has more than one PCI root bus, all the root buses snoop all mem and io accesses, so there isn't a requirement for the io/mem regions to be contiguous for that bus.
While you are right from the host bridge's perspective, from the OS point of view we have a problem, at least for ACPI-based ones. They want disjoint IO/mem ranges for each primary root bus. (See: http://www.acpi.info/acpi_faq.htm)
I have a POC on a private branch with a working version, and Windows in particular needs the disjoint regions.
However, we could do what you suggest and break the _CRS into small pieces, one for each device. This would work, but it is not elegant, and we may end up with a lot of problems on the QEMU side. What do you think?
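To make the disjointness requirement concrete, here is a minimal sketch in C. It is not taken from SeaBIOS or from the POC; `struct window`, `windows_overlap()` and `roots_are_disjoint()` are hypothetical names, and it assumes each root bus is described by a single [base, base + size) window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one memory (or IO) window claimed by a root bus. */
struct window {
    uint64_t base;  /* first address of the window */
    uint64_t size;  /* length in bytes */
};

/* Two half-open ranges overlap if each one starts before the other ends. */
static int windows_overlap(const struct window *a, const struct window *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Return 1 if no two root-bus windows overlap, 0 otherwise. */
static int roots_are_disjoint(const struct window *w, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (windows_overlap(&w[i], &w[j]))
                return 0;
    return 1;
}
```

If the devices of different root buses were interleaved in one allocation, a single window per root bus would generally fail this check, which is why the ranges would have to be split per device instead.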
If it's better to group the allocations for a given root bus together, then I'd just go with my first email - pretend each extra root bus is a child of bus zero for allocation purposes, and then audit the code so that it works with bus->bus_dev == NULL.
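A rough sketch of that idea in C, for illustration only. The structures are simplified stand-ins rather than the real SeaBIOS definitions: `parent_bus`, `mem_size`, `parent_for_sizing()` and `propagate_sizes()` are hypothetical, alignment is ignored, and it assumes every child bus has a higher number than its parent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bridge device. */
struct pci_device {
    int parent_bus;             /* bus number the bridge sits on */
};

/* Hypothetical, simplified bus record (one per bus number). */
struct pci_bus {
    uint64_t mem_size;          /* bytes of memory space needed on this bus */
    struct pci_device *bus_dev; /* bridge leading to this bus; NULL for a root bus */
};

/* Pick the bus whose window must contain this bus's resources. */
static int parent_for_sizing(const struct pci_bus *busses, int busnum)
{
    const struct pci_device *bridge = busses[busnum].bus_dev;
    if (bridge)
        return bridge->parent_bus;
    /* Root bus: bus 0 has no parent; an extra root bus pretends
     * to be a child of bus 0 for allocation purposes. */
    return busnum == 0 ? -1 : 0;
}

/* Fold each bus's size into its parent, highest bus number first. */
static void propagate_sizes(struct pci_bus *busses, int count)
{
    for (int i = count - 1; i > 0; i--) {
        int parent = parent_for_sizing(busses, i);
        if (parent >= 0)
            busses[parent].mem_size += busses[i].mem_size;
    }
}
```

The point of the sketch is the NULL check: any code that dereferences bus->bus_dev has to cope with an extra root bus that has no bridge device in front of it.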
I'll try it, thanks!
I do wonder why it's better to have extra PCI root buses instead of using PCI-to-PCI buses, but I guess I'll find out when I read your RFC.
In short, guests with multiple NUMA nodes (mapped from hosts with multiple NUMA nodes) can group together only their CPU and memory, but not the PCI devices. The reason is that the operating systems can associate only one NUMA node per PCI primary bus.
Extra PCI root buses will give this flexibility and will be really handy, especially with pass-through devices.
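As an illustration of the one-node-per-root-bus constraint, a small sketch in C. The `struct root_bus` layout and `numa_node_for_bus()` are hypothetical and only model the relationship described above; they are not part of SeaBIOS or QEMU.

```c
#include <stddef.h>

/* Hypothetical model: each root bus carries exactly one NUMA node id,
 * and every bus number below it inherits that node. */
struct root_bus {
    int first_bus;   /* lowest bus number under this root */
    int last_bus;    /* highest bus number under this root */
    int numa_node;   /* the single node the OS associates with this root */
};

/* Return the NUMA node for a device's bus number, or -1 if no root covers it. */
static int numa_node_for_bus(const struct root_bus *roots, size_t count,
                             int busnum)
{
    for (size_t i = 0; i < count; i++)
        if (busnum >= roots[i].first_bus && busnum <= roots[i].last_bus)
            return roots[i].numa_node;
    return -1;
}
```

With a single root bus every device maps to the same node, whatever bridge it sits behind; a second root bus is what allows a second node.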
The RFC is coming soon.
Thanks again for your interest :)
Marcel