Hi
I've recently tried to improve the soc/intel/xeon_sp codebase. I want to make it use more native coreboot structures and code flows instead of parsing the FSP HOB again and again to do things. Ideally the HOB is parsed only once in ramstage, into adequate native coreboot structures (struct device, struct bus, chip_info, ...) that are used later on.
The lowest-hanging fruit in that effort is resource allocation. Currently the coreboot allocator is sort of hijacked by the SoC code and done over again. The reason for this is that xeon_sp platforms operate a bit differently than most AMD and Intel client hardware: there are multiple root busses. This means that there are PCI busses that are in use but are not downstream from PCI bus 0. In hardware terminology those are the IIO and other types of Stacks.
Each Stack has its own range of usable PCI bus numbers and decoded IO and MEM spaces below and above 4G. I tried to map these hardware concepts to the existing coreboot 'domain' structure. Each domain has resource windows that are used to allocate child devices on, which would be the PCI devices on the stacks. The allocator needs some tweaks to allow for multiple resources of a type (MEM or IO), but nothing major. See https://review.coreboot.org/c/coreboot/+/62353/ and https://review.coreboot.org/c/coreboot/+/62865 (an allocator rewrite/improvement based on Nico's excellent unmerged v4.5 work). This seems to work really well, and arguably even better than how it is now, with more elegant handling of above- and below-4G resources.
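To make the mapping concrete, here is a minimal sketch of how a stack-backed domain could report its decoded ranges as allocation windows in its read_resources callback. The struct stack_info and stack_info_for_domain() are hypothetical stand-ins for data parsed once from the HOB; new_resource() and the struct resource fields are the regular coreboot device API, but the exact flag handling depends on the allocator changes under review, so treat this as an illustration rather than the code in the patches above.

  /* Hypothetical per-stack data, filled once from the FSP HOB in ramstage. */
  struct stack_info {
      uint16_t bus_base, bus_limit;
      uint16_t io_base, io_limit;
      uint32_t mem32_base, mem32_limit;
      uint64_t mem64_base, mem64_limit;
  };

  static void stack_domain_read_resources(struct device *dev)
  {
      /* Hypothetical lookup of the stack that backs this domain device. */
      const struct stack_info *stack = stack_info_for_domain(dev);
      struct resource *res;
      int idx = 0;

      /* One IO window for the allocator to place child resources in. */
      res = new_resource(dev, idx++);
      res->base = stack->io_base;
      res->limit = stack->io_limit;
      res->flags = IORESOURCE_IO;

      /* Separate MEM windows below and above 4G; with allocator support for
         multiple windows of one type these simply coexist on the domain. */
      res = new_resource(dev, idx++);
      res->base = stack->mem32_base;
      res->limit = stack->mem32_limit;
      res->flags = IORESOURCE_MEM;

      res = new_resource(dev, idx++);
      res->base = stack->mem64_base;
      res->limit = stack->mem64_limit;
      res->flags = IORESOURCE_MEM;
  }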
Now my question is the following: On some Stacks there are multiple root busses, but the resources need to be allocated on the same window. My initial idea was to add those root busses as separate struct bus in the domain->link_list. However currently the allocator assumes only one bus on domains (and bridges). In the code you'll see a lot of things like
for (child = domain->link_list->children; child; child = child->sibling) ....
This is fine if there is only one bus on the domain. Looping over link_list->next, struct bus'ses is certainly an option here, but I was told that having only one bus here was a design decision on the allocator v4 rewrite. I'm not sure how common that assumption is in the tree, so things could be broken in awkward ways.
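For reference, a minimal sketch of what such a loop becomes if the allocator walks all buses hanging off the domain instead of only the first link_list entry (an illustration of the idea, not an excerpt from any of the linked changes):

  struct bus *link;
  struct device *child;

  /* 'domain' is the domain's struct device; walk every downstream bus,
     not just the first entry of domain->link_list. */
  for (link = domain->link_list; link; link = link->next)
      for (child = link->children; child; child = child->sibling)
          /* ... scan/assign resources for child ... */;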
Do you have any suggestions to move forward?
Kind regards
Arthur Heymans
How about this option? Instead of one (coreboot) PCIe domain per (Xeon-SP) PCIe stack, we do one (coreboot) PCIe domain per root bus assignment. Regarding resource windows, we could adjust the remaining windows after assignment for a PCIe domain has completed.
Jonathan
Hi Arthur,
On 17.03.22 19:03, Arthur Heymans wrote:
Now my question is the following: On some Stacks there are multiple root busses, but the resources need to be allocated on the same window. My initial idea was to add those root busses as separate struct bus in the domain->link_list. However currently the allocator assumes only one bus on domains (and bridges). In the code you'll see a lot of things like
for (child = domain->link_list->children; child; child = child->sibling) ....
this is correct, we often (if not always by now) ignore that `link_list` is a list itself and only walk the children of the first entry.
This is fine if there is only one bus on the domain. Looping over link_list->next, struct bus'ses is certainly an option here, but I was told that having only one bus here was a design decision on the allocator v4 rewrite. I'm not sure how common that assumption is in the tree, so things could be broken in awkward ways.
I wouldn't say it was a design choice, probably rather a convenience choice. The old concepts around multiple buses directly downstream of a single device seemed inconsistent, AFAICT. And at the time the allocator v4 was written it seemed unnecessary to keep compatibility around.
That doesn't mean we can't bring it back, of course. There is at least one alternative, though.
The currently common case looks like this:
        PCI bus 0
             |
             v
  domain 0 --.
             |-- PCI 00:00.0
             |
             |-- PCI 00:02.0
             |
             :
Now we could have multiple PCI buses directly below the domain. But instead of modelling this with the `link_list`, we could also model it with an abstract "host" bus below the domain device and another layer of "host bridge" devices in between:
         host bus
             |
             v
  domain 0 --.
             |-- PCI host bridge 0 --.
             |                       |-- PCI 00:00.0
             |                       |
             |                       `-- PCI 00:02.0
             |
             |-- PCI host bridge 1 --.
             |                       |-- PCI 16:00.0
             |                       |
             :                       :
I guess this would reduce complexity in generic code at the expense of more data structures (devices) to manage. OTOH, if we'd make a final decision for such a model, we could also get rid of the `link_list`. Basically, setting in stone that we only allow one bus downstream of any device node.
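To make the difference concrete, a sketch of how generic code would walk that hierarchy under the one-bus-per-device rule: the domain keeps a single downstream "host bus" whose children are the host-bridge devices, and each host bridge again has exactly one downstream bus carrying the actual PCI devices (illustrative only; such a host-bridge device type does not exist yet):

  struct device *bridge, *child;

  /* domain -> single "host bus" -> host bridges -> per-bridge PCI root bus */
  for (bridge = domain->link_list->children; bridge; bridge = bridge->sibling)
      for (child = bridge->link_list->children; child; child = child->sibling)
          /* ... allocate child against the domain's windows ... */;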
I'm not fully familiar with the hierarchy on Xeon-SP systems. Would this be an adequate solution? Also, does the term `stack` map to our `domain` 1:1 or are there differences?
Nico
Stack idea is from https://www.intel.com/content/www/us/en/developer/articles/technical/utilizi... .
In Linux, "domain" is sometimes the same as "segment". I am not sure whether current coreboot on xeon_sp already covers the case of multiple segments.
Hi Lance,
On 18.03.22 05:06, Lance Zhao wrote:
Stack idea is from https://www.intel.com/content/www/us/en/developer/articles/technical/utilizi...
thank you very much! The diagrams are enlightening. I always assumed Intel calls these "stacks" because there are multiple components involved that matter for software/firmware development. Turns out these stacks are rather black boxes to us and we don't need to know what components compose a stack, is that right?
Looking at these diagrams, I'd say the IIO stacks are PCI host bridges from our point of view.
In Linux, "domain" is sometimes the same as "segment". I am not sure whether current coreboot on xeon_sp already covers the case of multiple segments.
These terms are highly ambiguous. We always need to be careful to not confuse them, e.g. "domain" in one project can mean something very different than our "domain device".
Not sure if you are referring to "PCI bus segments". These are very different from our "domain" term. I assume coreboot supports multiple PCI bus segments. At least it looks like one just needs to initialize `.secondary` and `.subordinate` of the downstream link of a PCI host bridge accordingly.
There is also the term "PCI segment group". This refers to PCI bus segments that share a space of 256 buses, e.g. one PCI bus segment could occupy buses 0..15 and another 16..31 in the same group. Multiple PCI segment groups are currently not explicitly supported. Might work, though, if the platform has a single, consecutive ECAM/MMCONF region to access more than the first group.
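For context on that last point: standard ECAM simply maps bus/device/function to an offset inside one flat MMIO region (4 KiB per function, 1 MiB per bus, 256 MiB per group of 256 buses), so a single consecutive region large enough for several groups makes buses beyond the first 256 reachable with the same arithmetic. A small illustrative helper, not coreboot's actual MMCONF accessor:

  #include <stdint.h>

  /* Offset of a function's config space inside a flat ECAM region. */
  static inline uint64_t ecam_offset(unsigned int bus, unsigned int dev,
                                     unsigned int fn, unsigned int reg)
  {
      return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg;
  }

  /* e.g. with one region spanning two 256-bus groups, "bus 256" (group 1,
     bus 0) sits at base + 256 MiB, right after group 0. */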
Nico
Hi all
Thanks a lot for the input.
I looked a bit further into this and it looks like only the resource allocation part assumes one downstream bus under link_list. The rest of coreboot seems to properly account for sibling busses, so maybe making the allocator loop over ->next in busses is not so bad after all. https://review.coreboot.org/c/coreboot/+/62967 implements this.
OTOH I'm under the impression that the sconfig tool currently does not easily allow statically defining multibus domains.
Kind regards
Arthur
https://review.coreboot.org/c/coreboot/+/51180 not exactly but similar one ;-)
Mariusz
If I may chime in here (I worked at Intel back when this issue was first encountered while porting coreboot to Jacobsville)… The way I ended up making it work was to introduce another object in the devicetree (which I called a root bridge) to model the concept of the PCI stack. In the host-bridge diagram earlier in the thread these would be the PCI host bridges [0/1]. I called them root bridges because the PCI spec describes a host bridge as the path that the CPU takes to get to the PCI 'domain' in our use of it.
I had worked on a much earlier project (no coreboot) called Jasper Forest where we had 2 separate CPU packages with a bus between them, and each had a separate set of PCI busses below it. The CPUs were in the same coherency domain and each was able to access the busses below the other. I considered this situation to be 2 host bridges, because there were 2 separate ways the CPUs could get to the PCI domain: one via their own direct connection and one crossing the CPU-CPU bus, with the sibling CPU directing the accesses and responses back appropriately. The PCI domain in this case was 'pre-split', allocating busses 0-0x7F to the first CPU (the one with the DMI to PCH connection) and busses 0x80-0xFD to the other. (Each CPU also had a bus number allocated to its own C-bus, 0xFF for the first and 0xFE for the second, which was used early in the boot process to configure the bus split between them.)
After the first boot cycle was completed and all resources were gathered for each host bridge, it was determined whether each had enough resources to map in everything required under it and whether a rebalance needed to happen. If a rebalance had to occur (one side needed more memory or IO space), NVRAM variables were set and a reset occurred so the BIOS could set up the split according to the variables; this way only the first boot (or a boot after new devices were added to open slots) would be the long one.
Maybe the above description helps with making some choices about how to do things, maybe not; as Mariusz said, there are multiple (and probably always better) ways of doing things. Using my experience on Jasper Forest, introducing a root bridge object to the devicetree gave a nice way to describe the system logically, and gave the implementor a mechanism to do this pre-allocation of resources so we wouldn't have to go through the possible reboot to rebalance resources as above. The stacks (root bridges) in Xeon may be able to handle changing the decoding at runtime (with certain limitations like bus quiescence), unlike the Jasper Forest example above where the initial decoding of the resources between CPUs required a reset to change the values. Using the devicetree to describe the resources was my solution to making the enumeration faster and simpler, at the expense of flexibility. But since these were mostly networking-type platforms they were more static in nature, so it wasn't really thought to be an issue at the time. (These are the same stacks as are used in Xeon-SP today, and they go back to e.g. Skylake-D.) I left Intel a few years ago, before that work was completed.
Fast forward: when I was doing some work porting coreboot to Skylake-D early last year, I recalled some of the difficulties in communicating the PCI domain enumerated in coreboot to Tianocore via ACPI. I remembered that it may well have been that the stacks were treated as host bridges, because we could describe in ASL that each stack had a separate and invariable (as far as Tianocore was concerned) set of resources. I think I had actually done it that way, extending the ACPI code in coreboot to generate a PCI host bridge device when the new root bridge object was encountered in the devicetree. For the Skylake-D work (which was eventually dropped) I had run into a problem where, if not all the memory space in the system was allocated or reserved (meaning that holes were left in the memory space) and a device under a stack wasn't allocated resources because the stack didn't have a large enough window, Linux would assume these holes were subtractively decoded to the PCI domain and try to stick the device in there. Another thing that added some complexity was that each stack had its own IOAPIC, and in non-APIC mode all virtual legacy wire interrupts had to be forwarded down to the stack that had the PCH before interrupts got back to the CPU.
Not sure if any of this helps or if it just sounds like rambling, but I thought maybe some of these thoughts could be helpful for design decisions made in the future. Personally I liked the idea of having the stacks understood in the devicetree, but there were some drawbacks as well. One possible drawback is whether or not the stack implementation in the hardware can be flexible enough for what you might like to do in the devicetree as far as assigning bus ranges, etc. A stack's maximum bus number is determined by the starting bus number of the next stack in line.
Some further info regarding the stacks which may influence any future designs… Intel regards a stack as being 'implemented' when it decodes a PCIe root bridge below it. Stack-based SoC designs may not implement all stacks, as they may have different PCIe requirements. The thing to understand (at least on the current generation of stack-based designs) is that the devices/functions that used to be part of what was called the Uncore (memory controllers, CSI/UPI bus configuration, etc.) are now spread across devices 8-31 of the stacks' root bus numbers. The exception to this is stack 0, which only has uncore device 8 on it, because it is also the DMI decode to the PCH complex, which has (and can only have) devices 9-31 on it. So, while a stack may be 'unimplemented', it still needs to have a bus number if the uncore devices on it need to be accessible (or at least it needs to not collide with other bus assignments if not needed). For example, the uncore integrated memory controller device(s) are now on stack 2, devices 8-13 (SKX/CSX platforms). Stack 2 needs a bus number assigned to it (via a register in stack 0 dev 8) in order to access the IMC registers. By default this bus number is 2, and the stack bus decoder takes precedence, so stack bus numbering needs to be in increasing order per stack. The kind of thing that can't happen at early boot in this case is trying to 'fake' a bus number decoding, say for some device under a PCIe root bridge on the PCH; you wouldn't be able to set up a PCIe root bridge subordinate/secondary bus decoding to get to it until you've changed the stack bus numbering from the power-on default.
One of the upshots of this new scheme (and probably the reason it was done this way in the first place) is that now none of the uncore devices use any MMIO resources for internal registers. When more register space is needed, the registers will be in MMCFG space.
Hi Nico,
The concept of multiple host bridges was partially implemented in a PoC project for an upcoming SoC when I joined that project some time ago.
There were many issues, mainly in resource allocation (maybe because of the early, prototype implementation and its incompatibility with the previous v3 resource allocator).
It also added an additional layer (host bridges) and complexity.
It was finally abandoned in favor of the multidomain concept (https://review.coreboot.org/c/coreboot/+/56263), which reuses the existing coreboot domain, but uses multiple of them: a separate one for each stack.
The question was why introduce new structures when we could reuse the domain to split the available memory/IO ranges between stacks/domains. The domain-level read_resources function gets the ranges preallocated to the stack instead of the full range as in the one-domain case.
It was working with only minimal changes to the coreboot core/resource allocator.
Just my experience ;-) It can always be done better.
Mariusz
Hi Arthur,
In our multidomain-based PoC, in this situation (multiple root busses on one stack) we "virtually" split the stack and its resource window into two or more virtual stacks and later handled them as separate stacks.
Mariusz
Hi Mariusz
I was inspired by the multi domain approach doc and got quite far already. I decided to allocate and attach domains at runtime for the moment instead of statically via the devicetree. In the future I think having devicetree structures makes a lot of sense, e.g. to provide stack specific configuration or derive the IIO bifurcation configuration from it.
I see that FSP sometimes does this virtual splitting, but not always, and then just reports multiple PCI roots on one stack (via a HOB). I don't think splitting up a reported stack into multiple domains in coreboot is a good idea. It means you need to be aware of the downstream resources when constraining the domain resources to the ones reported to be allocated to the stack. This again means bypassing or redoing a lot of the coreboot resource allocation, so I doubt this approach makes sense. I'm currently thinking that the multiple-PCI-root-bus-per-domain approach is the easiest. It already works with some minor allocator changes.
Kind regards
Arthur
Hi Arthur,
Multiple PCI root buses per domain give us more control over resource allocation for downstream devices from the pool preallocated by FSP to the stack, but it adds the link_list->next looping complexity (maybe not a big deal to handle, as you stated). Additional work will be needed for statically defining them in the devicetree.
For us it was easiest to split them "virtually"
E.g., if we got from the HOB the info that physical stack x has preallocated PCI buses 0x20..0x2f, IO from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff and mem 0x10000000000..0x1ffffffffff, and there are 2 root buses, 0x20 and 0x28, then instead of adding one domain with the "physical" stack we added two domains with "virtual" stacks:
stack x1 with virtually preallocated PCI buses 0x20..0x27, IO from 0x2000..0x27ff, mem 0xd0000000..0xd7ffffff, mem 0x10000000000..0x17fffffffff
stack x2 with virtually preallocated PCI buses 0x28..0x2f, IO from 0x2800..0x2fff, mem 0xd8000000..0xdfffffff, mem 0x18000000000..0x1ffffffffff
Each one has only one root bus, without this link_list->next "complexity".
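For illustration, a minimal sketch of that kind of even split (not the PoC code; it simply halves each decoded window per virtual stack and assumes the downstream devices actually fit):

  #include <stdint.h>

  /* Split one decoded window [base, limit] evenly into 'count' virtual
     windows and return the i-th one (0-based). Purely illustrative. */
  static void split_window(uint64_t base, uint64_t limit, unsigned int count,
                           unsigned int i, uint64_t *vbase, uint64_t *vlimit)
  {
      const uint64_t size = (limit - base + 1) / count;

      *vbase = base + (uint64_t)i * size;
      *vlimit = *vbase + size - 1;
  }

  /* Example: split_window(0xd0000000, 0xdfffffff, 2, 1, &b, &l) yields
     b = 0xd8000000, l = 0xdfffffff, i.e. the mem32 window of "stack x2". */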
It was just a "shortcut" for now.
At some point I was thinking about a "subdomains" concept to cover this multiple-root-buses-in-one-domain case, to make something like:
domain 0 //domain
    domain 1 //subdomain
        first root bus from stack x and its downstream devices
    end
    domain 2 //subdomain
        second root bus from stack x and its downstream devices
    end
end
domain ...
    ...
end
...
But in the end I didn't try to implement it.
An additional dirty "overflow" trick that I used for some time was something like:
domain 0
    //first root bus
    pci 0:0.1 .... end
    ....
    //second root bus
    pci 0x20:0 .... end  //overflow at 0x20 pci bus number boundary
    ....
end
And then dynamically update 0x20 -> the real bus number at runtime.
All of the above is what we have tried, but none of it is a final solution. Maybe it will give someone else a hint to find a better/easier way to handle this hardware in coreboot.
Mariusz
Hi
E.g., if we got from the HOB the info that physical stack x has preallocated PCI buses 0x20..0x2f, IO from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff and mem 0x10000000000..0x1ffffffffff, and there are 2 root buses, 0x20 and 0x28, then instead of adding one domain with the "physical" stack we added two domains with "virtual" stacks:
stack x1 with virtually preallocated PCI buses 0x20..0x27, IO from 0x2000..0x27ff, mem 0xd0000000..0xd7ffffff, mem 0x10000000000..0x17fffffffff
stack x2 with virtually preallocated PCI buses 0x28..0x2f, IO from 0x2800..0x2fff, mem 0xd8000000..0xdfffffff, mem 0x18000000000..0x1ffffffffff
Each one has only one root bus, without this link_list->next "complexity".
This only works if the downstream resources fit in this split virtual allocation, which you can't know before reading all downstream resources. Especially for mem32 resources the resource allocation is already tight so I think this can get ugly.
At some point of time I was thinking about something called "subdomains"
concept to cover this multiple root buses in one domain case so to make something like: domain 0 //domain domain 1 //subdomain first root bus from stack x and its downstream devices end domain 2 //subdomain second root bus from stack x and its downstream devices end end domain ... ... end ...
The way I understood it, domains are a set of resource windows to be constrained and then distributed over children and in this case children over multiple PCI root busses. I have some doubts that subdomains map the situation correctly/efficiently, because it has essentially the same problem as knowing how to split the resources between domains correctly.
OTOH, does it even make sense to map this in the devicetree? The way FSP reports stacks is generated at runtime and differs depending on the hardware configuration. So having a static structure mapping that may not be interesting?
Arthur
On 22.03.22 10:30, Arthur Heymans wrote:
OTOH, does it even make sense to map this in the devicetree? The way FSP reports stacks is generated at runtime and differs depending on the hardware configuration. So having a static structure mapping that may not be interesting?
IMO, a static devicetree structure only makes sense if you can tell FSP to configure the hardware like this or it is somehow predefined, e.g. descriptor straps. If FSP is broken and makes decisions on its own, better generate the tree at runtime.
Nico
Hi
On 22.03.2022 at 10:30, Arthur Heymans wrote:
Hi
E.g., if we got from the HOB the info that physical stack x has preallocated PCI buses 0x20..0x2f, IO from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff and mem 0x10000000000..0x1ffffffffff, and there are 2 root buses, 0x20 and 0x28, then instead of adding one domain with the "physical" stack we added two domains with "virtual" stacks:
stack x1 with virtually preallocated PCI buses 0x20..0x27, IO from 0x2000..0x27ff, mem 0xd0000000..0xd7ffffff, mem 0x10000000000..0x17fffffffff
stack x2 with virtually preallocated PCI buses 0x28..0x2f, IO from 0x2800..0x2fff, mem 0xd8000000..0xdfffffff, mem 0x18000000000..0x1ffffffffff
Each one has only one root bus, without this link_list->next "complexity".
This only works if the downstream resources fit in this split virtual allocation, which you can't know before reading all downstream resources. Especially for mem32 resources the resource allocation is already tight so I think this can get ugly.
Today most resources are mem64 "ready" and the above-4G window is big enough, so using a "prefer 64bit" strategy practically eliminates this tightness in the mem32 range.
At some point I was thinking about a "subdomains" concept to cover this multiple-root-buses-in-one-domain case, so as to make something like:
domain 0 // domain
  domain 1 // subdomain: first root bus from stack x and its downstream devices
  end
  domain 2 // subdomain: second root bus from stack x and its downstream devices
  end
end
domain ...
  ...
end
...
The way I understood it, domains are a set of resource windows to be constrained and then distributed over children, and in this case the children sit on multiple PCI root busses. I have some doubts that subdomains map the situation correctly/efficiently, because they have essentially the same problem of knowing how to split the resources between domains correctly.
As I stated, it was never implemented, as a subdomain-level device is more a "bridge device" than a "domain device".
OTOH, does it even make sense to map this in the devicetree? The way FSP reports stacks is generated at runtime and differs depending on the hardware configuration. So having a static structure mapping of that may not be that interesting?
Arthur
It depends on whether a few ms of boot time matter.
It also depends on the SoC family. I didn't check Xeon, but on the SoCs I worked on, most of the stack configuration was common to the whole family, with only minor differences on 1 or 2 stacks.
Mariusz
On 22.03.22 12:01, Mariusz Szafrański via coreboot wrote:
At some point I was thinking about a "subdomains" concept to cover this multiple-root-buses-in-one-domain case, so as to make something like:
domain 0 // domain
  domain 1 // subdomain: first root bus from stack x and its downstream devices
  end
  domain 2 // subdomain: second root bus from stack x and its downstream devices
  end
end
domain ...
  ...
end
...
The way I understood it, domains are a set of resource windows to be constrained and then distributed over children, and in this case the children sit on multiple PCI root busses. I have some doubts that subdomains map the situation correctly/efficiently, because they have essentially the same problem of knowing how to split the resources between domains correctly.
As I stated, it was never implemented, as a subdomain-level device is more a "bridge device" than a "domain device".
This sounds much like my idea to have device nodes for the PCI host bridges :) I believe it's the right way to model the hierarchy. Having these nodes might also make it easier to map things to ACPI.
OTOH, does it even make sense to map this in the devicetree? The way FSP reports stacks is generated at runtime and differs depending on the hardware configuration. So having a static structure mapping of that may not be that interesting?
Arthur
It depends on whether a few ms of boot time matter.
Do you mean µs? Or are the HOBs that huge?
Nico
On 22.03.22 09:57, Mariusz Szafrański via coreboot wrote:
e.g. if we get from the HOB info that physical stack x has preallocated PCI buses 0x20..0x2f, io from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff, mem 0x10000000000..0x1ffffffffff and there are 2 root buses 0x20 and 0x28, then instead of adding one domain with the "physical" stack we add two domains with "virtual" stacks:
I'm still trying to learn what a "stack" comprises. I'm pretty sure most of the problems solve themselves if we map the Intel terms to standard and coreboot terms.
Would the following be a correct statement about stacks? A "stack" always has dedicated I/O port and memory ranges (that don't overlap with anything else, especially not with the ranges of other stacks) and has one or more PCI root buses.
If so, are the PCI bus numbers separate from those of other stacks? Or do all stacks share a single range of 0..255 PCI buses? In standard terms, do they share a single PCI segment group?
Nico
Hi
Hi
e.g. if we get from the HOB info that physical stack x has preallocated PCI buses 0x20..0x2f, io from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff, mem 0x10000000000..0x1ffffffffff and there are 2 root buses 0x20 and 0x28, then instead of adding one domain with the "physical" stack we add two domains with "virtual" stacks:
stack x1 with virtually preallocated PCI buses 0x20..0x27, io from 0x2000..0x27ff, mem 0xd0000000..0xd7ffffff, mem 0x10000000000..0x17fffffffff
stack x2 with virtually preallocated PCI buses 0x28..0x2f, io from 0x2800..0x2fff, mem 0xd8000000..0xdfffffff, mem 0x18000000000..0x1ffffffffff
Each one with only one root bus, without this link_list->next "complexity".
This only works if the downstream resources fit in this split virtual allocation, which you can't know before reading all downstream resources. Especially for mem32 resources the resource allocation is already tight so I think this can get ugly.
Today most resources are mem64 "ready" and the above-4G window is big enough, so using a "prefer 64bit" strategy practically eliminates this tightness in the mem32 range.
I've seen FSP allocating just enough mem32 space for built-in endpoint PCI 32bit-only resources spread over multiple root busses on the same stack (on a DINO stack? No idea what that really means). Just splitting the mem32 resources in 'half' like you suggest would break allocation, so no. The "prefer 64bit" strategy is certainly needed but not sufficient.
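To put made-up numbers on that: if a stack's mem32 window is 0xd0000000..0xdfffffff (256 MiB) and FSP happened to place 200 MiB of 32-bit-only BARs behind the first root bus and 40 MiB behind the second, a fixed 128 MiB / 128 MiB split leaves the first 'virtual' stack 72 MiB short, even though the physical window as a whole would have fit everything.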
On 22.03.22 09:57, Mariusz Szafrański via coreboot wrote:
e.g. if we get from the HOB info that physical stack x has preallocated PCI buses 0x20..0x2f, io from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff, mem 0x10000000000..0x1ffffffffff and there are 2 root buses 0x20 and 0x28, then instead of adding one domain with the "physical" stack we add two domains with "virtual" stacks:
I'm still trying to learn what a "stack" comprises. I'm pretty sure most of the problems solve themselves if we map the Intel terms to standard and coreboot terms.
Would the following be a correct statement about stacks? A "stack" always has dedicated I/O port and memory ranges (that don't overlap with anything else, especially not with the ranges of other stacks) and has one or more PCI root buses.
If so, are the PCI bus numbers separate from those of other stacks? Or do all stacks share a single range of 0..255 PCI buses? In standard terms, do they share a single PCI segment group?
So that is actually configurable in the hardware, but currently all stacks consume a set of PCI busses on a single 0..255 PCI segment. The PCI busses allocated to a stack are then consumed by endpoint devices directly on the stack or by 'regular' PCI bridges behind them.
Sidenote: it also looks like the hardware really does not like to have PCI bridges on an IIO stack set a subordinate value larger than the IIO stack 'MaxBus' (basically a stack-level subordinate bus?). So scanning PCI busses needs some care. See https://review.coreboot.org/c/coreboot/+/59395
On 22.03.2022 at 12:38, Arthur Heymans wrote:
Sidenote: it also looks like the hardware really does not like to have PCI bridges on an IIO stack set a subordinate value larger than the IIO stack 'MaxBus' (basically a stack-level subordinate bus?). So scanning PCI busses needs some care. See https://review.coreboot.org/c/coreboot/+/59395
Each stack can have a preassigned PCI bus range: a window from busbase (the PCI bus number of the first root bus on the stack) to the IIO stack 'MaxBus', inclusive. If MaxBus < busbase, no range is assigned.
So you can logically (and with big simplification) imagine this as a preconfigured 'virtual bridge' between the CPU and the stack's PCI root buses, with secondary set to busbase and subordinate set to 'MaxBus' (same for the IO window, the mem-below-4G window and the mem-above-4G window - one of each type per stack).
There can also exist stacks marked as disabled or reserved, with or without defined PCI bus ranges. PCI bus numbers defined in disabled or reserved stacks should not be used/accessed; access can cause a hang/lockup or very long delays. So only bus ranges defined in "enabled" stacks should be used.
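To make that description a bit more tangible, a minimal sketch of how those per-stack windows could be captured on the coreboot side (all field and function names here are invented for illustration, not taken from the FSP HOB or the tree):

/* Illustration only: one 'virtual bridge' worth of decode windows per stack,
 * as described above. A stack is usable when enabled and MaxBus >= busbase. */
#include <stdint.h>

struct stack_windows {
	int      enabled;       /* skip disabled/reserved stacks entirely */
	uint8_t  bus_base;      /* PCI bus number of the first root bus */
	uint8_t  bus_limit;     /* IIO 'MaxBus', inclusive */
	uint16_t io_base, io_limit;
	uint32_t mem32_base, mem32_limit;
	uint64_t mem64_base, mem64_limit;
};

static int stack_has_bus_range(const struct stack_windows *s)
{
	/* "If MaxBus < busbase, no range is assigned." */
	return s->enabled && s->bus_limit >= s->bus_base;
}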
So it can be handled as you proposed in CB:59395, or we can define a weak function, e.g. get_max_subordinate(int current), which returns 0xff by default and can be overridden in SoC code to return the real allowed max subordinate number.
int __weak get_max_subordinate(int current) { return 0xff; }
and in src/device/pci_device.c
subordinate = get_max_subordinate(primary); // instead of subordinate = 0xff; /* MAX PCI_BUS number here */
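Building on that proposal, a hypothetical SoC-side override could clamp the scan to the stack window the current bus falls into (this reuses the invented stack_windows sketch from earlier in the thread; soc_get_stack_windows() is equally invented):

#include <stddef.h>

/* invented accessor returning the per-stack windows parsed from the HOB */
const struct stack_windows *soc_get_stack_windows(size_t *count);

/* sketch of an SoC override for the proposed hook: clamp the subordinate
 * bus number to the window of the stack that 'current' falls into */
int get_max_subordinate(int current)
{
	size_t count;
	const struct stack_windows *stacks = soc_get_stack_windows(&count);

	for (size_t i = 0; i < count; i++) {
		if (!stack_has_bus_range(&stacks[i]))
			continue;
		if (current >= stacks[i].bus_base && current <= stacks[i].bus_limit)
			return stacks[i].bus_limit;
	}
	return 0xff;	/* not inside any stack window: keep the old default */
}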
Mariusz
So it can be handled as you proposed in CB:59395, or we can define a weak function, e.g. get_max_subordinate(int current), which returns 0xff by default and can be overridden in SoC code to return the real allowed max subordinate number.
int __weak get_max_subordinate(int current) { return 0xff; }
and in src/device/pci_device.c
subordinate = get_max_subordinate(primary); // instead of subordinate = 0xff; /* MAX PCI_BUS number here */
I chose to have it directly in the devicetree over weak functions, as the SoC-specific override function would essentially be a loop over the devicetree struct, which seems more fragile when things are being appended to it (scan_bus).
-----Original Message----- From: Nico Huber nico.h@gmx.de Sent: Tuesday, March 22, 2022 7:21 AM To: coreboot@akumat.pl; coreboot@coreboot.org Subject: [coreboot] Re: Multi domain PCI resource allocation: How to deal with multiple root busses on one domain
On 22.03.22 09:57, Mariusz Szafrański via coreboot wrote:
e.g. if we get from the HOB info that physical stack x has preallocated PCI buses 0x20..0x2f, io from 0x2000..0x2fff, mem 0xd0000000..0xdfffffff, mem 0x10000000000..0x1ffffffffff and there are 2 root buses 0x20 and 0x28, then instead of adding one domain with the "physical" stack we add two domains with "virtual" stacks:
I'm still trying to learn what a "stack" comprises. I'm pretty sure most of the problems solve themselves if we map the Intel terms to standard and coreboot terms.
Would the following be a correct statement about stacks? A "stack" always has dedicated I/O port and memory ranges (that don't overlap with anything else, especially not with the ranges of other stacks) and has one or more PCI root buses.
If so, are the PCI bus numbers separate from those of other stacks? Or do all stacks share a single range of 0..255 PCI buses? In standard terms, do they share a single PCI segment group?
Nico
Think of a stack as a virtual root port which has the same decoding ability as a PCIe root port. It has a start bus number and an end bus number, and it has an IO start and end address as well as an MMIO start and end address. The stacks in a system are peer root ports, so if your stack bus (PCI bus) numbering is 0, 32, 64, 128, 255, then MMCFG cycles to bus 0-31 go to stack 0, while cycles to bus 32-63 go to stack 1 and so on. The last stack only sees MMCFG cycles for bus 255. So basically all the stacks can be thought of as separate host bridges, each with a distinct range of decoding resources.
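A toy model of that peer-host-bridge decode, with the bus bases from the example above (purely illustrative, not coreboot code):

/* Toy model of the decode described above: each stack owns the bus range
 * from its base up to the next stack's base, the last stack only bus 255. */
#include <stdio.h>

static const int stack_base[] = { 0, 32, 64, 128, 255 };
#define NUM_STACKS (sizeof(stack_base) / sizeof(stack_base[0]))

static int stack_for_bus(int bus)
{
	for (int i = NUM_STACKS - 1; i >= 0; i--) {
		if (bus >= stack_base[i])
			return i;	/* MMCFG cycle is claimed by this stack */
	}
	return -1;			/* unreachable for buses 0..255 */
}

int main(void)
{
	printf("bus 0x21 -> stack %d\n", stack_for_bus(0x21));	/* stack 1 */
	printf("bus 0xff -> stack %d\n", stack_for_bus(0xff));	/* stack 4 */
	return 0;
}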
The stacks themselves are made up of PCIe (and/or DMI) root ports in the device range 0-7. For stack 0 as an example, which should start at bus 0, devices 0-7 will be 'internal' devices only decoded by the stack itself, and devices 8-31 (for bus 0) will be forwarded down the internal DMI link to the PCH cluster. Devices 0-3 (for SKL) on each stack (if implemented) are PCIe root ports (or the DMI to the PCH cluster on stack 0) and work like you would expect. Cycles directed at their b/d/f are decoded properly, and type 1 cycles are checked to see if their secondary/subordinate bus ranges match, in parallel with each other. Any config cycles to a stack's base bus number with device numbers 8-31 go to internally decoded 'uncore' devices like memory controllers, power control, address decoding, etc. Devices 4-7 on each stack are not (at this time) PCIe root ports and implement things like an IOAPIC for devices on that stack, memory mapping and virtualization for that stack, RAS, etc.
There aren't any PCI-PCI bridges on the stacks other than those implemented by devices 0-3, nor are there any IO/MMIO resources decoded by any device on a stack's first bus number beyond device 7.