Hi,
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
Given a standard setup with three HT links from the CPU, where do I find which device? Is PCI device 18.0 sort of a PCI bridge which has multiple PCI buses (HT links) behind itself?
+--18.0----(first HT link)---+--0.0
|       |                    +--0.1
|       |
|       +--(second HT link)---(may be empty)
|       |
|       +--(third HT link)---+--0.0
|                            +--1.0
|                            +--1.1
+--18.1
+--18.2
+--18.3
Will a MP setup look like this:

+--18.0----(first HT link)---+--19.0---(2nd HT link)
|       |                    |        \-(3rd HT link)
|       |                    +--19.1
|       |                    +--19.2
|       |                    +--19.3
|       |
|       +--(second HT link)---(may be empty)
|       |
|       +--(third HT link)---+--0.0
|                            +--1.0
|                            +--1.1
+--18.1
+--18.2
+--18.3
Or is the hardware organized in a completely different way? I'm especially curious about the MP scenario as depicted above. Where do the PCI functions of 18.[0123] reside?
Regards, Carl-Daniel
+--18.0----(Link n where n in [0,1,2])--+--19.0--(CPU)-(2nd HT link)
|       |                               |             \-(3rd HT link)
|       |                               +--19.1
|       |                               +--19.2
|       |                               +--19.3
|       |
|       +--18.0 (second HT link)---(may be empty)
|       |
|       +--18.0 (third HT link)---+--0.0
|                                 +--1.0
|                                 +--1.1
+--18.1
+--18.2
+--18.3
This looks almost right to me. But the routing functions are in F1. And the link that is used need not be first, it can be link 2 or 3. You might want to put the CPU in that picture (I just did)
ron
On Fri, Oct 24, 2008 at 6:10 AM, ron minnich rminnich@gmail.com wrote:
+--18.0----(Link n where n in [0,1,2])--+--19.0--(CPU)-(2nd HT link)
|       |                               |             \-(3rd HT link)
|       |                               +--19.1
|       |                               +--19.2
|       |                               +--19.3
|       |
|       +--18.0 (second HT link)---(may be empty)
|       |
|       +--18.0 (third HT link)---+--0.0
|                                 +--1.0
|                                 +--1.1
+--18.1
+--18.2
+--18.3
You should only specify the SB chain, the one whose LPC bus leads to the ROM.
The others should be probed automatically.
v2 already did that, and you only need to specify the first node.
YH
On 24.10.2008 20:51, yhlu wrote:
On Fri, Oct 24, 2008 at 6:10 AM, ron minnich rminnich@gmail.com wrote:
+--18.0----(Link n where n in [0,1,2])--+--19.0--(CPU)-(2nd HT link)
|       |                               |             \-(3rd HT link)
|       |                               +--19.1
|       |                               +--19.2
|       |                               +--19.3
|       |
|       +--18.0 (second HT link)---(may be empty)
|       |
|       +--18.0 (third HT link)---+--0.0
|                                 +--1.0
|                                 +--1.1
+--18.1
+--18.2
+--18.3
You should only specify the SB chain, the one whose LPC bus leads to the ROM.
The others should be probed automatically.
We need automatic probing, but we also need a way to store per-device configuration. That means we have to model all SB chains well enough.
v2 already did that, and you only need to specify the first node.
What happens if the first nodes are identical like in some dual chipset boards?
Regards, Carl-Daniel
On 24.10.2008 15:10, ron minnich wrote:
+--18.0----(Link n [0,1,2])--+--19.0--(CPU)-(2nd HT link)
|       |                    |             \-(3rd HT link)
|       |                    +--19.1
|       |                    +--19.2
|       |                    +--19.3
|       |
|       +--18.0 (second HT link)---(may be empty)
|       |
|       +--18.0 (third HT link)---+--0.0
|                                 +--1.0
|                                 +--1.1
+--18.1
+--18.2
+--18.3
This looks almost right to me. But the routing functions are in F1.
So the HT links are attached to 18.0, but routing control is in 18.1?
And the link that is used need not be first, it can be link 2 or 3. You might want to put the CPU in that picture (I just did)
Thanks.
With Marc's mail, this is getting more complicated. It may be the best thing to stick with the logical PCI structure of the system; however, that is not clear at all and seems to depend a great deal on the firmware used.
Regards, Carl-Daniel
On Fri, Oct 24, 2008 at 5:05 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
With Marc's mail, this is getting more complicated. It may be the best thing to stick with the logical PCI structure of the system; however, that is not clear at all and seems to depend a great deal on the firmware used.
AMD put huge brainpower into making the HT bits fit into pci config space ... I see no reason to change that decision.
"Getting more complicated" is a bad sign :-)
ron
The HT topology is not really directly reflected in PCI config space. They are obviously linked, but there is not really a way to "map" an HT topology of Opteron nodes to a graphical view of config space; such a mapping just doesn't exist. The Opterons just are where they are.
All of the processor PCI devices are on Bus 0, and will always be on Bus 0. Devices 18-1f are the only ones that will ever exist for Opterons. The device number that each Opteron responds to is based on NodeID (0-7), which is set on each Opteron during discovery. There won't ever be "holes"; the NodeIDs must always be contiguous. Node IDs are not set in stone based on topology: node 0 is always the BSP, but 1-7 can basically be distributed out any of the 3 links to any other CPUs in any manner. AGESA has a default "discovery method" (I think breadth first, lowest link number first), but it has options to override the discovery mechanism to change the order of nodes in a system. All that matters is that the routing tables are correct and consistent, so that traffic gets where it needs to go without deadlock. Once that is complete, the processors just show up in PCI as devices 18-1f (or fewer).
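To make the "contiguous NodeIDs at devices 18-1f" rule concrete, here is a minimal sketch of how firmware could count the populated nodes by probing bus 0. Treat the details as assumptions rather than code from any real tree: pci_read_config32() and PCI_DEV() are assumed coreboot-style helpers, and the 1022:1100 ID check is for K8 function 0 (Family 10h answers with a different device ID).

#include <stdint.h>

/* Assumed coreboot-style config space accessor and device locator. */
extern uint32_t pci_read_config32(uint32_t dev, unsigned int reg);
#define PCI_DEV(bus, dev, fn) \
        ((uint32_t)(((bus) << 20) | ((dev) << 15) | ((fn) << 12)))

/* Vendor/device dword of K8 northbridge function 0 (1022:1100). */
#define K8_HT_CONFIG_ID 0x11001022

/*
 * Opteron node i always answers as bus 0, device 0x18 + i, function 0,
 * and NodeIDs are contiguous, so counting nodes is just probing until
 * the first device that does not answer with the expected ID.
 */
static int count_nodes(void)
{
        int node;

        for (node = 0; node < 8; node++) {
                uint32_t id = pci_read_config32(PCI_DEV(0, 0x18 + node, 0), 0x00);
                if (id != K8_HT_CONFIG_ID)
                        break;
        }
        return node;
}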
Here is an lspci from one of our systems with 5 nodes (AMI BIOS with AGESA):
00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge
00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge
00:02.1 IDE interface: Broadcom BCM5785 [HT1000] IDE
00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC
00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:04.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Volari Z7
00:06.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1c.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1c.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1c.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1c.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0)
01:0e.0 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)
01:0e.1 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)
03:00.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:01.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:02.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:03.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:04.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
0c:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
If we add or remove processors, nothing besides the 18-1f devices will change (SB Bus numbers, device numbers, etc. do not change). When we add another *non* coherent HT device attached to one of the Opterons, it gets a new bus number (we start at 20 with ours, but it is arbitrary). All of the routing associated with HT for both coherent and non-coherent is contained in the mapping registers and routing table registers in all of the Opterons. The mapping registers map mem/io/cfg regions to nodes, and the routing table says how to get to that node. The ncHT devices can have BARs, and take up memory mapped IO just the same as another PCI device.
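As an illustration of those mapping registers: routing config cycles for an ncHT chain's bus range to a node/link is, as far as I can tell from the K8 BKDG, a write to one of the F1 configuration map registers at 0xE0-0xEC on every node. A rough sketch with the same kind of assumed coreboot-style helpers as before; the exact bit layout is my reading of the spec, not verified code.

#include <stdint.h>

/* Assumed coreboot-style config space accessor and device locator. */
extern void pci_write_config32(uint32_t dev, unsigned int reg, uint32_t value);
#define PCI_DEV(bus, dev, fn) \
        ((uint32_t)(((bus) << 20) | ((dev) << 15) | ((fn) << 12)))

/*
 * Route config cycles for buses [bus_base, bus_limit] to the ncHT chain on
 * dst_link of dst_node.  The same value goes into F1xE0 + 4*index on every
 * node (K8 function 1 = device 0x18+node, function 1) so they all agree.
 * Assumed field layout: [31:24] BusNumLimit, [23:16] BusNumBase,
 * [9:8] DstLink, [6:4] DstNode, [1] WE, [0] RE.
 */
static void set_config_map(int nodes, int index, uint8_t bus_base,
                           uint8_t bus_limit, int dst_node, int dst_link)
{
        uint32_t val = ((uint32_t)bus_limit << 24) | ((uint32_t)bus_base << 16) |
                       ((dst_link & 0x3) << 8) | ((dst_node & 0x7) << 4) | 0x3;
        int node;

        for (node = 0; node < nodes; node++)
                pci_write_config32(PCI_DEV(0, 0x18 + node, 1), 0xE0 + 4 * index, val);
}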
On Thu, Oct 23, 2008 at 10:40 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
I am a little bit confused by this. What are the exact differences you see between coreboot and factory? The number of Opterons should be the same. The position in config space of a particular socket may change, based on node discovery differences between the BIOSes. There is no reason for other devices to move because of the HT changes, but they may move just by other differences in coreboot.
Tom
On 24.10.2008 19:50, Tom Sylla wrote:
The HT topology is not really directly reflected in PCI config space. They are obviously linked, but there is not really a way to "map" an HT topology of Opteron nodes to a graphical view of config space; such a mapping just doesn't exist. The Opterons just are where they are.
Thanks for your detailed explanation. That clears up quite a bit of confusion. I had hoped that we could easily represent the 8-processor ladder vs. crossbar cHT topologies in a device tree/graph together with the ncHT links, but I'll leave the cHT stuff out for now.
All of the processor PCI devices are on Bus 0, and will always be on Bus 0. Devices 18-1f are the only ones that will ever exist for Opterons. The device number that each Opteron responds to is based on NodeID (0-7), which is set on each Opteron during discovery. There won't ever be "holes"; the NodeIDs must always be contiguous. Node IDs are not set in stone based on topology: node 0 is always the BSP, but 1-7 can basically be distributed out any of the 3 links to any other CPUs in any manner.
Ah.
AGESA has a default "discovery method" (I think breadth first, lowest link number first), but it has options to override the discovery mechanism to change the order of nodes in a system. All that matters is that the routing tables are correct and consistent, so that traffic gets where it needs to go without deadlock.
Getting the routing tables right is non-trivial for MP setups, especially if we don't know how the hardware is wired. My hope was to be able to express the cHT topologies in a way which allows us to derive correct routing tables. I'm postponing that goal for now.
Once that is complete, the processors just show up in PCI as devices 18-1f (or fewer)
They show up on bus 0 as you wrote. Will/can any devices attached via ncHT also show up on bus 0? If we have multiple ncHT links, what decides the bus numbers for each of them?
Here is an lspci from one of our systems with 5 nodes (AMI BIOS with AGESA):
00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge
00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge
00:02.1 IDE interface: Broadcom BCM5785 [HT1000] IDE
00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC
00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:04.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Volari Z7
00:06.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:1c.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:1c.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:1c.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:1c.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0)
01:0e.0 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)
01:0e.1 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)
03:00.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:01.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:02.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:03.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
04:04.0 PCI bridge: PLX Technology, Inc. Unknown device 8518 (rev aa)
0c:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
If we add or remove processors, nothing besides the 18-1f devices will change (SB Bus numbers, device numbers, etc. do not change). When we add another *non* coherent HT device attached to one of the Opterons, it gets a new bus number (we start at 20 with ours, but it is arbitrary). All of the routing associated with HT for both coherent and non-coherent is contained in the mapping registers and routing table registers in all of the Opterons. The mapping registers map mem/io/cfg regions to nodes, and the routing table says how to get to that node. The ncHT devices can have BARs, and take up memory mapped IO just the same as another PCI device.
If I understand you correctly, it would be easy to have 00:01.0-00:0a.0 appear as 01:01.0-01:0a.0 (bus 1) while still keeping the 18-1f devices on the hardcoded bus 0.
On Thu, Oct 23, 2008 at 10:40 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
I am a little bit confused by this. What are the exact differences you see between coreboot and factory? The number of Opterons should be the same. The position in config space of a particular socket may change, based on node discovery differences between the BIOSes. There is no reason for other devices to move because of the HT changes, but they may move just by other differences in coreboot.
IIRC I saw a board which only had 18-1f on bus 0 and everything else on other buses. AFAICS having devices on the same bus as the processor devices or not is a topology difference.
Would you mind posting lspci -tvnn for that 5-processor board as well? It would help me a lot to understand this issue better.
Regards, Carl-Daniel
On Fri, Oct 24, 2008 at 8:24 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
AGESA has a default "discovery method" (I think breadth first, lowest link number first), but it has options to override the discovery mechanism to change the order of nodes in a system. All that matters is that the routing tables are correct and consistent, so that traffic gets where it needs to go without deadlock.
Getting the routing tables right is non-trivial for MP setups, especially if we don't know how the hardware is wired. My hope was to be able to express the cHT topologies in a way which allows us to derive correct routing tables. I'm postponing that goal for now.
Yeah, that is a very complex thing to do. Just spewing values into the routing table registers is a reasonable way to go, especially at first.
Once that is complete, the processors just show up in PCI as devices 18-1f (or fewer)
They show up on bus 0 as you wrote. Will/can any devices attached via ncHT also show up on bus 0? If we have multiple ncHT links, what decides the bus numbers for each of them?
Yes, they can sometimes, and it is sort of a special case. If you look in my lspci dump, you will see lots of southbridge devices on bus 0. If you added another ncHT device, e.g. another HT1000, that southbridge would have to have its bus number shifted so devices would not conflict. You could put it at 1, 6, 20, etc. Other nc devices are the same, we have the ability to add up to 3 ncHT FPGAs to our system, and when we do so, they appear on busses 20, 21, and 22 (we picked those and set them in our BIOS). I think I have seen coreboot code using 40, 80, c0, etc. The NC devices I have seen all have registers to program their PCI bus number. You might want to look at the HT spec's information about bus numbering. It describes the reasoning about SB stuff living on bus 0.
If we add or remove processors, nothing besides the 18-1f devices will change (SB Bus numbers, device numbers, etc. do not change). When we add another *non* coherent HT device attached to one of the Opterons, it gets a new bus number (we start at 20 with ours, but it is arbitrary). All of the routing associated with HT for both coherent and non-coherent is contained in the mapping registers and routing table registers in all of the Opterons. The mapping registers map mem/io/cfg regions to nodes, and the routing table says how to get to that node. The ncHT devices can have BARs, and take up memory mapped IO just the same as another PCI device.
If I understand you correctly, it would be easy to have 00:01.0-00:0a.0 appear as 01:01.0-01:0a.0 (bus 1) while still keeping the 18-1f devices on the hardcoded bus 0.
As long as the nc device lets you change the bus number that it sits on (and it should, though I have only looked at a couple). You might want to see how these ever-confusing options are used in v2: HT_CHAIN_UNITID_BASE, HT_CHAIN_END_UNITID_BASE, SB_HT_CHAIN_ON_BUS0, SB_HT_CHAIN_UNITID_OFFSET_ONLY.
I am a little bit confused by this. What are the exact differences you see between coreboot and factory? The number of Opterons should be the same. The position in config space of a particular socket may change, based on node discovery differences between the BIOSes. There is no reason for other devices to move because of the HT changes, but they may move just by other differences in coreboot.
IIRC I saw a board which only had 18-1f on bus 0 and everything else on other buses. AFAICS having devices on the same bus as the processor devices or not is a topology difference.
Hopefully it is clear now how things can move like that. The Opterons won't move. It is possible with HT that other devices may exist on higher bus numbers without a bridge (real or fake) from bus 0. It is weird, and non-legacy compatible, so it should not happen with NB and SB devices. There are exceptions, though: when we connect our nc FPGAs and put them at bus 20, we have no bridge in config space connecting them to bus 0. By default, the linux kernel will not find them (it does a normal PCI scan, looking for bridges, subordinates, etc). We must advertise the non-contiguous PCI busses in an ACPI table for Linux and Windows to "see" the higher busses that are not bridged to bus 0. (There are some other ways to force the linux kernel to find the devices, but the ACPI method works for all the current OSes we've tried.) The point is that making the PCI busses discontiguous is "weird", and makes you jump through other hoops to play well with OSes.
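Not one of the kernel-side tricks alluded to above, but for completeness, here is a small userspace sketch that checks whether anything answers on such an unbridged bus by using direct type-1 config access instead of the kernel's view of the bus tree. It uses libpci from pciutils (pci_alloc/pci_init/pci_get_dev/pci_read_word), needs root and -lpci, and bus 0x20 is just the example number from above.

#include <stdio.h>
#include <pci/pci.h>

int main(void)
{
        struct pci_access *pacc = pci_alloc();
        struct pci_dev *dev;
        unsigned int vendor;

        /* Direct 0xcf8/0xcfc access, not limited to what the OS enumerated. */
        pacc->method = PCI_ACCESS_I386_TYPE1;
        pci_init(pacc);

        /* Domain 0, bus 0x20, device 0, function 0: the hypothetical ncHT device. */
        dev = pci_get_dev(pacc, 0, 0x20, 0, 0);
        vendor = pci_read_word(dev, PCI_VENDOR_ID);
        if (vendor != 0xffff)
                printf("something answers on bus 0x20: vendor %04x\n", vendor);
        else
                printf("nothing answers on bus 0x20\n");

        pci_free_dev(dev);
        pci_cleanup(pacc);
        return 0;
}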
Would you mind posting lspci -tvnn for that 5-processor board as well? It would help me a lot to understand this issue better.
Yep, when I am at the machine again, I'll send it.
On Sat, Oct 25, 2008 at 9:25 AM, Tom Sylla tsylla@gmail.com wrote:
Hopefully it is clear now how things can move like that. The Opterons won't move. It is possible with HT that other devices may exist on higher bus numbers without a bridge (real or fake) from bus 0. It is weird, and non-legacy compatible, so it should not happen with NB and SB devices. There are exceptions, though: when we connect our nc FPGAs and put them at bus 20, we have no bridge in config space connecting them to bus 0. By default, the linux kernel will not find them (it does a normal PCI scan, looking for bridges, subordinates, etc). We must advertise the non-contiguous PCI busses in an ACPI table for Linux and Windows to "see" the higher busses that are not bridged to bus 0. (There are some other ways to force the linux kernel to find the devices, but the ACPI method works for all the current OSes we've tried.) The point is that making the PCI busses discontiguous is "weird", and makes you jump through other hoops to play well with OSes.
When ACPI is disabled, the Linux kernel will also do some legacy checks from the IRQ routing tables.
YH
On Fri, Oct 24, 2008 at 8:24 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Would you mind posting lspci -tvnn for that 5-processor board as well? It would help me a lot to understand this issue better.
Here they are, if you are still interested. There is a 5-node, an 8-node, and a 2-node with an nc device attached. This system is Barcelona, so you will see more devices for each Opteron.
On 28.10.2008 00:49, Tom Sylla wrote:
On Fri, Oct 24, 2008 at 8:24 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Would you mind posting lspci -tvnn for that 5-processor board as well? It would help me a lot to understand this issue better.
Here they are, if you are still interested. There is a 5-node, an 8-node, and a 2-node with an nc device attached. This system is Barcelona, so you will see more devices for each Opteron.
Thanks a lot! These dumps are extremely helpful in understanding the desired dts structure better. My mail to Marc from a few minutes ago already partially incorporates the new knowledge.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
Hi,
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
Given a standard setup with three HT links from the CPU, where do I find which device? Is PCI device 18.0 sort of a PCI bridge which has multiple PCI buses (HT links) behind itself?
Or is the hardware organized in a completely different way? I'm especially curious about the MP scenario as depicted above. Where do the PCI functions of 18.[0123] reside?
This is why you don't want to mix the ht links and the pci bus. The physical layout and the logical layout on pci are different. the lspci -t view is the correct view. That is how the devices are addressed. Several different physical layouts will get you the same logical layout because of how pci buses are scanned.
this

cpu --- pci
 |
cpu --- pci
would be equivalent to this (although unlikely topology)

cpu --- pci
 |  \
 |   --- pci
cpu
I think it would be a mistake to over complicate the dts with the physical connections. The dts should be easy for people to fill in to make a platform work. They don't know (or need to know) which HT links are connecting the CPU and which ones go to the PCI bus.
Marc
On 24.10.2008 20:14, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Hi,
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
Given a standard setup with three HT links from the CPU, where do I find which device? Is PCI device 18.0 sort of a PCI bridge which has multiple PCI buses (HT links) behind itself?
Or is the hardware organized in a completely different way? I'm especially curious about the MP scenario as depicted above. Where do the PCI functions of 18.[0123] reside?
This is why you don't want to mix the ht links and the pci bus. The physical layout and the logical layout on pci are different. the lspci -t view is the correct view.
Except that it sometimes differs (at least in numbering) between coreboot and factory BIOS, so I'm not sure which aspects of lspci -t are immutable and which are arbitrary.
That is how the devices are addressed. Several different physical layouts will get you the same logical layout because of how pci buses are scanned.
Interesting.
this

cpu --- pci
 |
cpu --- pci
would be equivalent to this (although unlikely topology)

cpu --- pci
 |  \
 |   --- pci
cpu
I think it would be a mistake to over complicate the dts with the physical connections.
Is the "pci" in your graphics above a bus or a device?
From your answer it seems that "pci" represents a bus. Do 18.[0123], 19.[0123] and other processor devices all live on PCI bus 0? Is that bus somewhat related to the buses attached via ncHT? IIRC I once saw a machine where 18.[0123] was alone on bus 0 and all other PCI devices were on separate buses.
Taking the "everything is PCI" model, how would I specify the virtual PCI buses attached via ncHT? Are they children (secondary buses) of 18.0 which would act like a PCI-to-PCI bridge?
The dts should be easy for people to fill in to make a platform work.
Agreed.
They don't know (or need to know) which HT links are connecting the CPU and which ones go to the PCI bus.
They need a way to specify settings for any given PCI device. Since most modern machines have multiple PCI devices with the same vendor/device ID, we have to be able to identify devices based on their logical path. For that, we have to model the logical PCI bus/device/tree reasonably well. I'm trying to do that, but no model seems to fit.
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
On 24.10.2008 20:14, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Hi,
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
Given a standard setup with three HT links from the CPU, where do I find which device? Is PCI device 18.0 sort of a PCI bridge which has multiple PCI buses (HT links) behind itself? Or is the hardware organized in a completely different way? I'm especially curious about the MP scenario as depicted above. Where do the PCI functions of 18.[0123] reside?
This is why you don't want to mix the ht links and the pci bus. The physical layout and the logical layout on pci are different. the lspci -t view is the correct view.
Except that it sometimes differs (at least in numbering) between coreboot and factory BIOS, so I'm not sure which aspects of lspci -t are immutable and which are arbitrary.
Bus numbering is up to the BIOS.
That is how the devices are addressed. Several different physical layouts will get you the same logical layout because of how pci buses are scanned.
Interesting.
this

cpu --- pci
 |
cpu --- pci
would be equivalent to this (although unlikely topology)

cpu --- pci
 |  \
 |   --- pci
cpu
I think it would be a mistake to over complicate the dts with the physical connections.
Is the "pci" in your graphics above a bus or a device? From your answer it seems that "pci" represents a bus. Do 18.[0123], 19.[0123] and other processor devices all live on PCI bus 0?
yes
Is that bus somewhat related to the buses attached via ncHT?
in that ncHT looks like a PCI bus
IIRC I once saw a machine where 18.[0123] was alone on bus 0 and all other PCI devices were on separate buses.
There has to be some other device on bus 0. CPUs are not PCI bridges.
Taking the "everything is PCI" model, how would I specify the virtual PCI buses attached via ncHT? Are they children (secondary buses) of 18.0 which would act like a PCI-to-PCI bridge?
It is another PCI bus off of the root, just like bus 0 and just like it looks in lspci -t. Tom's email describes this in more detail.
They don't know (or need to know) which HT links are connecting the CPU and which ones go to the PCI bus.
They need a way to specify settings for any given PCI device. Since most modern machines have multiple PCI devices with the same vendor/device ID, we have to be able to identify devices based on their logical path. For that, we have to model the logical PCI bus/device/tree reasonably well. I'm trying to do that, but no model seems to fit.
I disagree. How would two devices of the same type require different initialization? I think that defining where the devices are at build time is wrong. You only need the ID and the functions that device requires.
Marc
On 25.10.2008 02:33, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
On 24.10.2008 20:14, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Hi,
I'm trying to understand how HT is modeled into PCI space so that I can propose the "right" way to handle it in the dts. Depending on whether I run lspci -t under coreboot or factory BIOS, different topologies will be displayed. That means looking at lspci is not going to tell me how the hardware really works.
Given a standard setup with three HT links from the CPU, where do I find which device? Is PCI device 18.0 sort of a PCI bridge which has multiple PCI buses (HT links) behind itself? Or is the hardware organized in a completely different way? I'm especially curious about the MP scenario as depicted above. Where do the PCI functions of 18.[0123] reside?
This is why you don't want to mix the ht links and the pci bus. The physical layout and the logical layout on pci are different. the lspci -t view is the correct view.
Except that it sometimes differs (at least in numbering) between coreboot and factory BIOS, so I'm not sure which aspects of lspci -t are immutable and which are arbitrary.
Bus numbering is up to the BIOS.
Ah thanks.
That is how the devices are addressed. Several different physical layouts will get you the same logical layout because of how pci buses are scanned.
Interesting.
this

cpu --- pci
 |
cpu --- pci
would be equivalent to this (although unlikely topology)

cpu --- pci
 |  \
 |   --- pci
cpu
I think it would be a mistake to over complicate the dts with the physical connections.
Is the "pci" in your graphics above a bus or a device? From your answer it seems that "pci" represents a bus. Do 18.[0123], 19.[0123] and other processor devices all live on PCI bus 0?
yes
Is that bus somewhat related to the buses attached via ncHT?
in that ncHT looks like a PCI bus
Good, then we can treat it as a PCI bus in our device tree.
IIRC I once saw a machine where 18.[0123] was alone on bus 0 and all other PCI devices were on separate buses.
There has to be some other device on bus 0. CPUs are not PCI bridges.
That's strange. I do believe you, but that would imply at least one PCI device from each ncHT link has to be visible on bus 0.
Taking the "everything is PCI" model, how would I specify the virtual PCI buses attached via ncHT? Are they children (secondary buses) of 18.0 which would act like a PCI-to-PCI bridge?
It is another PCI bus off of the root, just like bus 0 and just like it looks in lspci -t. Tom's email describes this in more detail.
OK, then we should model it that way in v3.
They don't know (or need to know) which HT links are connecting the CPU and which ones go to the PCI bus.
They need a way to specify settings for any given PCI device. Since most modern machines have multiple PCI devices with the same vendor/device ID, we have to be able to identify devices based on their logical path. For that, we have to model the logical PCI bus/device/tree reasonably well. I'm trying to do that, but no model seems to fit.
I disagree. How would two devices of the same type require different initialization? I think that defining where the devices are at build time is wrong. You only need the ID and the functions that device requires.
The best example I saw was multiple PCI express bridges with the same ID, although only some of them were used by physical PCIe slots or onboard PCIe devices. Factory BIOSes usually hide the unconnected bridges and I'd be happy if we could do the same. AFAICS that requires different initialization of devices with the same type.
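For what it's worth, one runtime way to tell such identical bridges apart (instead of hardcoding them by path) would be to check whether anything actually trained on the downstream link, via the Link Status register in the PCI Express capability. A hedged sketch: the capability walk is standard PCI, but the pci_read_config*() helpers are assumed coreboot-style accessors, and the "Data Link Layer Link Active" bit is optional, so some ports would need a different check (e.g. scanning the secondary bus).

#include <stdint.h>
#include <stdbool.h>

/* Assumed coreboot-style config space accessors; dev encodes bus/device/function. */
extern uint8_t  pci_read_config8(uint32_t dev, unsigned int reg);
extern uint16_t pci_read_config16(uint32_t dev, unsigned int reg);

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAP_LIST_PTR    0x34
#define PCI_CAP_ID_PCIE     0x10
#define PCIE_LINK_STATUS    0x12          /* offset within the PCIe capability */
#define PCIE_LNKSTA_DLLLA   (1 << 13)     /* Data Link Layer Link Active */

/* Walk the capability list; return the offset of the PCIe capability, 0 if none. */
static uint8_t find_pcie_cap(uint32_t dev)
{
        uint8_t pos;

        if (!(pci_read_config16(dev, PCI_STATUS) & PCI_STATUS_CAP_LIST))
                return 0;
        pos = pci_read_config8(dev, PCI_CAP_LIST_PTR) & ~3;
        while (pos) {
                if (pci_read_config8(dev, pos) == PCI_CAP_ID_PCIE)
                        return pos;
                pos = pci_read_config8(dev, pos + 1) & ~3;
        }
        return 0;
}

/*
 * True if the port reports an active downstream link.  A bridge for which
 * this is false could be hidden/disabled instead of getting a secondary bus.
 */
static bool pcie_link_active(uint32_t dev)
{
        uint8_t cap = find_pcie_cap(dev);

        if (!cap)
                return false;
        return pci_read_config16(dev, cap + PCIE_LINK_STATUS) & PCIE_LNKSTA_DLLLA;
}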
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
They don't know (or need to know) which HT links are connecting the CPU and which ones go to the PCI bus.
They need a way to specify settings for any given PCI device. Since most modern machines have multiple PCI devices with the same vendor/device ID, we have to be able to identify devices based on their logical path. For that, we have to model the logical PCI bus/device/tree reasonably well. I'm trying to do that, but no model seems to fit.
I disagree. How would two devices of the same type require different initialization? I think that defining where the devices are at build time is wrong. You only need the ID and the functions that device requires.
The best example I saw was multiple PCI express bridges with the same ID, although only some of them were used by physical PCIe slots or onboard PCIe devices. Factory BIOSes usually hide the unconnected bridges and I'd be happy if we could do the same. AFAICS that requires different initialization of devices with the same type.
This is an interesting problem to solve. If the customization is something very specific to the platform, it is difficult to do in the device tree with variable settings. You end up calling back to a custom function anyway, which could know how to set each device any number of ways (with or without a dts).
I don't think that the dts is required at build time, but both cases are handled (variables and custom functions), so I think we should continue on this path.
Marc