On 07/26/17 18:22, Marcel Apfelbaum wrote:
On 26/07/2017 18:20, Laszlo Ersek wrote:
[snip]
However, what does the hot-pluggability of the PCIe-PCI bridge buy us? In other words, what does it buy us when we do not add the PCIe-PCI bridge immediately at guest startup, as an integrated device?
Why is it a problem to "commit" in advance? I understand that we might
not like the DMI-PCI bridge (due to it being legacy), but what speaks against cold-plugging the PCIe-PCI bridge either as an integrated device in pcie.0 (assuming that is permitted), or cold-plugging the PCIe-PCI bridge in a similarly cold-plugged PCIe root port?
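For concreteness, I mean something like the following; this is only a sketch, assuming the generic pcie-pci-bridge device model proposed in this series (the e1000 is just an arbitrary legacy PCI endpoint for illustration):

  qemu-system-x86_64 -M q35 [...] \
    -device pcie-root-port,id=rp0,chassis=1 \
    -device pcie-pci-bridge,id=br0,bus=rp0 \
    -device e1000,bus=br0,addr=1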
We want to keep Q35 clean; in most cases we don't want any legacy PCI stuff unless it is specifically required.
I mean, in the cold-plugged case, you use up two bus numbers at the most, one for the root port, and another for the PCIe-PCI bridge. In the hot-plugged case, you have to start with the cold-plugged root port just the same (so that you can communicate the bus number reservation *at all*), and then reserve (= use up in advance) the bus number, the IO space, and the MMIO space(s). I don't see the difference; hot-plugging the PCIe-PCI bridge (= not committing in advance) doesn't seem to save any resources.
It's not about resources; it's more about the usage model.
I guess I would see a difference if we reserved more than one bus number in the hotplug case, namely in order to support recursive hotplug under the PCIe-PCI bridge. But, you confirmed that we intend to keep the flat hierarchy (i.e., the exercise is only for enabling legacy PCI endpoints, not for recursive hotplug). The PCIe-PCI bridge isn't a device that does anything at all on its own, so why not just cold-plug it? Its resources have to be reserved in advance anyway.
Even if we prefer flat hierarchies, we should allow a sane nested-bridge configuration, so we will sometimes reserve more than one.
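For example, with the kind of reservation hints we have been discussing (the property names below are purely illustrative, nothing is committed yet), a root port could set aside room for a nested bridge:

  -device pcie-root-port,id=rp0,chassis=1,bus-reserve=2,io-reserve=8K,mem-reserve=4M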
So, thus far I would say "just cold-plug the PCIe-PCI bridge at startup, possibly even make it an integrated device, and then you don't need to reserve bus numbers (and other apertures)".
Where am I wrong?
Nothing is wrong; I am just looking for feature parity between Q35 and PC. Users may want to continue using [nested] PCI bridges, and we want the Q35 machine to be used by more users so that it becomes reliable sooner, while staying clean by default.
We had a discussion on this matter at last year's KVM Forum, and the hot-pluggable PCIe-PCI bridge was the general consensus.
OK. I don't want to question or go back on that consensus now; I'd just like to point out that all that you describe (nested bridges, and enabling legacy PCI with PCIe-PCI bridges, *on demand*) is still possible with cold-plugging.
I.e., the default setup of Q35 does not need to include legacy PCI bridges. It's just that the pre-launch configuration effort for a Q35 user to *reserve* resources for legacy PCI is the exact same as the pre-launch configuration effort to *actually cold-plug* the bridge.
[snip]
The PI spec says,
[...] For all the root HPCs and the nonroot HPCs, call EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding() to obtain the amount of overallocation and add that amount to the requests from the physical devices. Reprogram the bus numbers by taking into account the bus resource padding information. [...]
However, according to my interpretation of the source code, PciBusDxe does not consider bus number padding for non-root HPCs (which are "all" HPCs on QEMU).
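For reference, here is roughly how a platform's GetResourcePadding() member could request bus number padding. This is a minimal sketch, not code from an actual platform driver; it follows the structure and constant names in MdePkg's <Protocol/PciHotPlugInit.h> and <IndustryStandard/Acpi10.h>, and it omits error handling and the IO/MMIO descriptors:

  #include <Uefi.h>
  #include <IndustryStandard/Acpi10.h>
  #include <Protocol/PciHotPlugInit.h>

  #pragma pack (1)
  typedef struct {
    EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR BusPadding; // extra bus numbers
    EFI_ACPI_END_TAG_DESCRIPTOR       EndTag;     // terminates the list
  } RESOURCE_PADDING;
  #pragma pack ()

  STATIC RESOURCE_PADDING mPadding;

  STATIC
  EFI_STATUS
  EFIAPI
  GetResourcePadding (
    IN  EFI_PCI_HOT_PLUG_INIT_PROTOCOL *This,
    IN  UINT64                         HpcPciAddress,
    OUT EFI_HPC_STATE                  *HpcState,
    OUT VOID                           **Padding,
    OUT EFI_HPC_PADDING_ATTRIBUTES     *Attributes
    )
  {
    //
    // Request one extra bus number behind this hotplug controller, with an
    // ACPI address space descriptor of type "bus".
    //
    mPadding.BusPadding.Desc    = ACPI_ADDRESS_SPACE_DESCRIPTOR;
    mPadding.BusPadding.Len     = (UINT16)(
                                    sizeof mPadding.BusPadding -
                                    OFFSET_OF (
                                      EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR,
                                      ResType
                                      )
                                    );
    mPadding.BusPadding.ResType = ACPI_ADDRESS_SPACE_TYPE_BUS;
    mPadding.BusPadding.AddrLen = 1;

    mPadding.EndTag.Desc     = ACPI_END_TAG_DESCRIPTOR;
    mPadding.EndTag.Checksum = 0;

    //
    // Report the controller as ready, and the padding as applying to the
    // bus(es) behind it rather than to the root bus.
    //
    *HpcState   = EFI_HPC_STATE_INITIALIZED | EFI_HPC_STATE_ENABLED;
    *Padding    = &mPadding;
    *Attributes = EfiPaddingPciBus;
    return EFI_SUCCESS;
  }

Whether PciBusDxe actually honors the bus number descriptor for a non-root HPC is exactly the open question.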
Theoretically speaking, it is possible to change the behavior, right?
Not just theoretically; in the past I have changed PciBusDxe -- it wouldn't identify QEMU's hotplug controllers (root port, downstream port, etc.) appropriately, and I managed to get some patches in. It's just that the less we understand the current code and the more intrusive/extensive the change is, the harder it is to sell the *idea*. PciBusDxe is platform-independent and is shipped on many a physical system too.
Understood, but from your explanation it sounds like the existing callback sites (hooks) are enough.
That's the problem: they don't appear to be, if you consider bus number reservations. The existing callback sites seem fine regarding IO and MMIO, but the only callback site that honors bus number reservation is limited to "root" hotplug controllers (in the previously defined sense).
So this is something that will need investigation, and my most recent queries into the "hotplug preparation" parts of PciBusDxe indicate that those parts are quite... "forgotten". :) I guess this might be because on physical systems the level of PCI(e) hotpluggery that we plan to do is likely unheard of :)
Thanks!
Laszlo