[SeaBIOS] [RFC PATCH v2 0/4] Allow RedHat PCI bridges reserve more buses than necessary during init

Marcel Apfelbaum marcel at redhat.com
Thu Jul 27 20:18:34 CEST 2017


On 26/07/2017 21:31, Laszlo Ersek wrote:
> On 07/26/17 18:22, Marcel Apfelbaum wrote:
>> On 26/07/2017 18:20, Laszlo Ersek wrote:
> 
> [snip]
> 
>>> However, what does the hot-pluggability of the PCIe-PCI bridge buy us?
>>> In other words, what does it buy us when we do not add the PCIe-PCI
>>> bridge immediately at guest startup, as an integrated device?
>>> Why is it a problem to "commit" in advance? I understand that we might
>>> not like the DMI-PCI bridge (due to it being legacy), but what speaks
>>> against cold-plugging the PCIe-PCI bridge either as an integrated device
>>> in pcie.0 (assuming that is permitted), or cold-plugging the PCIe-PCI
>>> bridge in a similarly cold-plugged PCIe root port?
>>>
>>
>> We want to keep Q35 clean, and for most cases we don't want any
>> legacy PCI stuff if not especially required.
>>
>>> I mean, in the cold-plugged case, you use up two bus numbers at the
>>> most, one for the root port, and another for the PCIe-PCI bridge. In the
>>> hot-plugged case, you have to start with the cold-plugged root port just
>>> the same (so that you can communicate the bus number reservation *at
>>> all*), and then reserve (= use up in advance) the bus number, the IO
>>> space, and the MMIO space(s). I don't see the difference; hot-plugging
>>> the PCIe-PCI bridge (= not committing in advance) doesn't seem to save
>>> any resources.
>>>
>>
>> It's not about resources, it's more about the usage model.
>>
>>> I guess I would see a difference if we reserved more than one bus number
>>> in the hotplug case, namely in order to support recursive hotplug under
>>> the PCIe-PCI bridge. But, you confirmed that we intend to keep the flat
>>> hierarchy (i.e. the exercise is only for enabling legacy PCI endpoints,
>>> not for recursive hotplug).  The PCIe-PCI bridge isn't a device that
>>> does anything at all on its own, so why not just coldplug it? Its
>>> resources have to be reserved in advance anyway.
>>>
>>
>> Even if we prefer flat hierarchies, we should allow a sane nested-
>> bridge configuration, so we will sometimes reserve more than one bus.
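
To make this concrete: with the capability proposed in this series, a
bridge could carry a bus reservation hint that the firmware turns into
padding during enumeration. As a sketch (the "bus-reserve" property
name below is purely illustrative, not a committed interface):

    -device pcie-root-port,id=rp2,bus=pcie.0,chassis=2,bus-reserve=4

A PCIe-PCI bridge hot-plugged into rp2 could then itself accept nested
bridges later, because four extra bus numbers were set aside at boot.
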
>>
>>> So, thus far I would say "just cold-plug the PCIe-PCI bridge at startup,
>>> possibly even make it an integrated device, and then you don't need to
>>> reserve bus numbers (and other apertures)".
>>>
>>> Where am I wrong?
>>>
>>
>> Nothing wrong, I am just looking for feature parity between Q35 and PC.
>> Users may want to continue using [nested] PCI bridges, and
>> we want the Q35 machine to be used by more users, so that
>> it becomes reliable faster, while keeping it clean by default.
>>
>> We had a discussion on this matter at last year's KVM Forum,
>> and the hot-pluggable PCIe-PCI bridge was the general consensus.
> 
> OK. I don't want to question or go back on that consensus now; I'd just
> like to point out that all that you describe (nested bridges, and
> enabling legacy PCI with PCIe-PCI bridges, *on demand*) is still
> possible with cold-plugging.
> 
> I.e., the default setup of Q35 does not need to include legacy PCI
> bridges. It's just that the pre-launch configuration effort for a Q35
> user to *reserve* resources for legacy PCI is the exact same as the
> pre-launch configuration effort to *actually cold-plug* the bridge.
> 
> [snip]
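
As an aside, the cold-plugged setup described above could look like
this on the QEMU command line (a sketch that assumes the pcie-pci-bridge
device we are discussing; the ids are made up):

    -M q35 \
    -device pcie-root-port,id=rp1,bus=pcie.0,chassis=1 \
    -device pcie-pci-bridge,id=pcibr1,bus=rp1

Exactly two bus numbers are consumed, one for the root port's secondary
bus and one for the bridge, and legacy PCI endpoints can then be
hot-plugged into pcibr1 without any extra bus number reservation.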
> 
>>>>> The PI spec says,
>>>>>
>>>>>> [...] For all the root HPCs and the nonroot HPCs, call
>>>>>> EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding() to obtain the
>>>>>> amount of overallocation and add that amount to the requests from the
>>>>>> physical devices. Reprogram the bus numbers by taking into account the
>>>>>> bus resource padding information. [...]
>>>>>
>>>>> However, according to my interpretation of the source code, PciBusDxe
>>>>> does not consider bus number padding for non-root HPCs (which are "all"
>>>>> HPCs on QEMU).
>>>>>
>>>>
>>>> Theoretically speaking, it is possible to change the behavior, right?
>>>
>>> Not just theoretically; in the past I have changed PciBusDxe -- it
>>> wouldn't identify QEMU's hotplug controllers (root port, downstream port
>>> etc) appropriately, and I managed to get some patches in. It's just that
>>> the less we understand the current code and the more intrusive/extensive
>>> the change is, the harder it is to sell the *idea*. PciBusDxe is
>>> platform-independent and shipped on many a physical system too.
>>>
>>
>> Understood, but from your explanation it sounds like the existing
>> callback sites (hooks) are enough.
> 
> That's the problem: they don't appear to be, if you consider bus number
> reservations. The existing callback sites seem fine regarding IO and
> MMIO, but the only callback site that honors bus number reservation is
> limited to "root" (in the previously defined sense) hotplug controllers.
> 
> So this is something that will need investigation, and my most recent
> queries into the "hotplug preparation" parts of PciBusDxe indicate that
> those parts are quite... "forgotten". :) I guess this might be because
> on physical systems the level of PCI(e) hotpluggery that we plan to do
> is likely unheard of :)
> 

I admit it is possible that it looks a little "crazy" on bare metal,
but as long as we "color inside the lines" we are allowed to push
it a little :)
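
For reference, the PI hook discussed above has roughly the shape below.
This is only a sketch of a GetResourcePadding() implementation that
pads a single extra bus number, based on my reading of the PI spec and
the EDK2 headers; the amount padded is illustrative, and (as you noted)
PciBusDxe today would honor the bus padding only for root HPCs:

#include <Uefi.h>
#include <IndustryStandard/Acpi10.h>
#include <Library/MemoryAllocationLib.h>
#include <Protocol/DevicePath.h>
#include <Protocol/PciHotPlugInit.h>

#pragma pack (1)
typedef struct {
  EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR BusPadding;
  EFI_ACPI_END_TAG_DESCRIPTOR       EndTag;
} RESOURCE_PADDING;
#pragma pack ()

STATIC
EFI_STATUS
EFIAPI
GetResourcePadding (
  IN  EFI_PCI_HOT_PLUG_INIT_PROTOCOL *This,
  IN  EFI_DEVICE_PATH_PROTOCOL       *HpcDevicePath,
  IN  UINT64                         HpcPciAddress,
  OUT EFI_HPC_STATE                  *HpcState,
  OUT VOID                           **Padding,
  OUT EFI_HPC_PADDING_ATTRIBUTES     *Attributes
  )
{
  RESOURCE_PADDING *ResourcePadding;

  ResourcePadding = AllocateZeroPool (sizeof *ResourcePadding);
  if (ResourcePadding == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }

  //
  // One QWORD address space descriptor: reserve one extra bus number
  // behind this hotplug controller, for a bridge plugged in later.
  //
  ResourcePadding->BusPadding.Desc    = ACPI_ADDRESS_SPACE_DESCRIPTOR;
  ResourcePadding->BusPadding.Len     =
    (UINT16)(sizeof (EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR) - 3);
  ResourcePadding->BusPadding.ResType = ACPI_ADDRESS_SPACE_TYPE_BUS;
  ResourcePadding->BusPadding.AddrLen = 1;

  //
  // Terminate the descriptor list.
  //
  ResourcePadding->EndTag.Desc = ACPI_END_TAG_DESCRIPTOR;

  *HpcState   = EFI_HPC_STATE_INITIALIZED | EFI_HPC_STATE_ENABLED;
  *Padding    = ResourcePadding;
  *Attributes = EfiPaddingPciBus;
  return EFI_SUCCESS;
}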

Thanks,
Marcel

> Thanks!
> Laszlo
> 
