This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
The IEEE Std 1275-1994:
IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices

3.2.1.1 Node names

Each node in the device tree is identified by a node name using the following notation:

    driver-name@unit-address:device-arguments
The driver name field is a sequence of between one and 31 letters [...]. By convention, this name includes the name of the device’s manufacturer and the device’s model name separated by a “,”.
The unit address field is the text representation of the physical address of the device within the address space defined by its parent node. The form of the text representation is bus-dependent.

3.2.1.2 Path names

A particular node is uniquely identified by describing its position in the device tree by completely specifying the path from the root node through all intermediate nodes to the node in question. The textual representation of such a path is called a device path. Device paths are composed as follows:

    /node-name0/node-name1/ ... /node-nameN

When Open Firmware is searching for a particular node, and either the driver name or @unit-address portion of the node name is not given, Open Firmware shall arbitrarily choose a node matching the portion that is present.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
---
v1->v2:
 - fixed a bug preventing the boot of devices behind the main host bridge;
   this approach leaves previous code paths intact so it will work the same
   as before if no PXB is present
 - added spec for fw path naming conventions (Michael S. Tsirkin)
 src/boot.c   | 3 ++-
 src/hw/pci.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/boot.c b/src/boot.c
index ec59c37..e241d1c 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
     } else {
         if (pci->rootbus)
             p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
-        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
+        else
+            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
     }

     int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);

diff --git a/src/hw/pci.c b/src/hw/pci.c
index 0379b55..9e77af4 100644
--- a/src/hw/pci.c
+++ b/src/hw/pci.c
@@ -133,7 +133,7 @@ pci_probe_devices(void)
             if (bus != lastbus)
                 rootbuses++;
             lastbus = bus;
-            rootbus = rootbuses;
+            rootbus = bus;
             if (bus > MaxPCIBus)
                 MaxPCIBus = bus;
         } else {
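To illustrate the effect of the boot.c hunk, here is a minimal stand-alone sketch of the patched prefix logic (a simplified sketch, not the SeaBIOS function itself; FW_PCI_DOMAIN is assumed to be "/pci@i0cf8" as on x86). With the patch, a device on extra root bus 4 gets the prefix /pci-root@4, while a device behind the main host bridge keeps /pci@i0cf8:

    /* Illustrative sketch only, simplified from build_pci_path. */
    #include <stdio.h>

    #define FW_PCI_DOMAIN "/pci@i0cf8"

    static void build_root_prefix(char *buf, int max, int rootbus)
    {
        if (rootbus)
            snprintf(buf, max, "/pci-root@%x", rootbus); /* extra root bus */
        else
            snprintf(buf, max, "%s", FW_PCI_DOMAIN);     /* main host bridge */
    }

    int main(void)
    {
        char buf[64];
        build_root_prefix(buf, sizeof buf, 0);
        printf("%s\n", buf);   /* prints: /pci@i0cf8 */
        build_root_prefix(buf, sizeof buf, 4);
        printf("%s\n", buf);   /* prints: /pci-root@4 */
        return 0;
    }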
On 06/11/15 15:37, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
The IEEE Std 1275-1994:
IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices

3.2.1.1 Node names

Each node in the device tree is identified by a node name using the following notation:

    driver-name@unit-address:device-arguments
The driver name field is a sequence of between one and 31 letters [...]. By convention, this name includes the name of the device’s manufacturer and the device’s model name separated by a “,”.

The unit address field is the text representation of the physical address of the device within the address space defined by its parent node. The form of the text representation is bus-dependent.

3.2.1.2 Path names

A particular node is uniquely identified by describing its position in the device tree by completely specifying the path from the root node through all intermediate nodes to the node in question. The textual representation of such a path is called a device path. Device paths are composed as follows:

    /node-name0/node-name1/ ... /node-nameN

When Open Firmware is searching for a particular node, and either the driver name or @unit-address portion of the node name is not given, Open Firmware shall arbitrarily choose a node matching the portion that is present.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
v1->v2:
- fixed a bug preventing the boot of devices behind the main host bridge. This approach leaves previous code paths intact so it will work the same as before if no PXB is present.
- Added spec for fw path naming conventions. (Michael S. Tsirkin)
 src/boot.c   | 3 ++-
 src/hw/pci.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/boot.c b/src/boot.c
index ec59c37..e241d1c 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
     } else {
         if (pci->rootbus)
             p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
-        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
+        else
+            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
     }

     int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);

diff --git a/src/hw/pci.c b/src/hw/pci.c
index 0379b55..9e77af4 100644
--- a/src/hw/pci.c
+++ b/src/hw/pci.c
@@ -133,7 +133,7 @@ pci_probe_devices(void)
             if (bus != lastbus)
                 rootbuses++;
             lastbus = bus;
-            rootbus = rootbuses;
+            rootbus = bus;
             if (bus > MaxPCIBus)
                 MaxPCIBus = bus;
         } else {
I think the commit message is somewhat overkill, but I'll leave that to Michael. :)
Regarding the rootbus question: as far as I can see, the last hunk changes the dev->rootbus assignment for parentless devices, so they pick up the last bus rather than the number of buses found.
Then, the only difference this makes is in build_pci_path() -- I grepped the tree for whole-word "rootbus".
(It's quite a serendipity that my v3 qemu patchset produces exactly this pattern in OFW device paths, without my then-knowledge of the SeaBIOS code. Although, admittedly, the fw_name := "pci-root" change there was suggested by Marcel.)
So, unless I'm missing something:
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The IEEE Std 1275-1994:
IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices

3.2.1.1 Node names

Each node in the device tree is identified by a node name using the following notation:

    driver-name@unit-address:device-arguments
The driver name field is a sequence of between one and 31 letters [...]. By convention, this name includes the name of the device’s manufacturer and the device’s model name separated by a “,”. The unit address field is the text representation of the physical address of the device within the address space defined by its parent node. The form of the text representation is bus-dependent.
Note the "physical address" part in the above. Your patch changes the "pci-root@" syntax to use a logical address instead of a physical address. That is, unless I've missed something, SeaBIOS today uses a physical address (the n'th root bus) and the patch would change it to use a logical address.
One of the goals of using an "openfirmware" like address was so that they would be stable across boots (the same mechanism is also used with coreboot). Using a physical address is key for this, because simply adding or removing a PCI device could cause the logical PCI bridge enumeration to change - and that would mess up the bootorder list if it was based on logical addresses.
-Kevin
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Thanks, Marcel
The IEEE Std 1275-1994:
IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices

3.2.1.1 Node names

Each node in the device tree is identified by a node name using the following notation:

    driver-name@unit-address:device-arguments
The driver name field is a sequence of between one and 31 letters [...]. By convention, this name includes the name of the device’s manufacturer and the device’s model name separated by a “,”. The unit address field is the text representation of the physical address of the device within the address space defined by its parent node. The form of the text representation is bus-dependent.
Note the "physical address" part in the above. Your patch changes the "pci-root@" syntax to use a logical address instead of a physical address. That is, unless I've missed something, SeaBIOS today uses a physical address (the n'th root bus) and the patch would change it to use a logical address.
One of the goals of using an "openfirmware" like address was so that they would be stable across boots (the same mechanism is also used with coreboot). Using a physical address is key for this, because simply adding or removing a PCI device could cause the logical PCI bridge enumeration to change - and that would mess up the bootorder list if it was based on logical addresses.
-Kevin
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Does open-firmware have any examples for PCI paths and in particular PCI paths when there are multiple root-buses?
It's possible to replace the "pci@i0cf8" with "pci-root@1" but that seems odd as the extra root bus is accessible via io accesses to 0x0cf8.
Another option would be to place the pci-root@1 behind the pci@i0cf8 as in "/pci@i0cf8/pci-root@1/...". Or, the root bus could be appended to the host bridge as in "/pci@i0cf8,1/...".
-Kevin
On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Why? @1 should be the unit address which is the text representation of the physical address, in our case the slot. Since the bus number in our case is 4, I think /pci-root@4/ is the 'correct' address.
Does open-firmware have any examples for PCI paths and in particular PCI paths when there are multiple root-buses?
Maybe Laszlo can say more, but we both agreed that this would be the best representation of extra root buses on both OVMF and Seabios.
It's possible to replace the "pci@i0cf8" with "pci-root@1" but that seems odd as the extra root bus is accessible via io accesses to 0x0cf8.
While this is true, /pci-root@[...]/ may also represent other kinds of host bridges, not only PXBs. But we can change this, of course, as long as OVMF can also work with it.
Another option would be to place the pci-root@1 behind the pci@i0cf8 as in "/pci@i0cf8/pci-root@1/...". Or, the root bus could be appended to the host bridge as in "/pci@i0cf8,1/...".
The latter representation makes sense to me, but as "/pci@i0cf8,4/...", with the bus number after the comma.
Laszlo, will this work for OVMF?
Thanks, Marcel
-Kevin
On 06/11/15 16:36, Marcel Apfelbaum wrote:
On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Why? @1 should be the unit address which is the text representation of the physical address, in our case the slot. Since the bus number in our case is 4, I think /pci-root@4/ is the 'correct' address.
Does open-firmware have any examples for PCI paths and in particular PCI paths when there are multiple root-buses?
Maybe Laszlo can say more, but we both agreed that this would be the best representation of extra root buses on both OVMF and Seabios.
The
PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot (Initialization Configuration) Firmware
document (binding) does speak about this, as far as I can see, in
2.2.1. Physical Address Formats
It first gives a "Numerical Representation" in device tree format (same thing as in DTB / FDT), and then a "Text Representation" with references to "Numerical Representation". It is *completely* Greek to me. It took me minutes of staring just to vaguely understand how the current
i0cf8
unit address comes together.
I've always treated the OFW devpaths that QEMU generates only *syntactically* conformant to the (base) OFW spec, and never considered the particular bindings 100% binding. That said, if someone finds where the PCI binding defines unit addresses for *root* buses, please let me know, just for reference.
It's possible to replace the "pci@i0cf8" with "pci-root@1" but that seems odd as the extra root bus is accessible via io accesses to 0x0cf8.
While this is true, /pci-root@[...]/ may also represent other kinds of host bridges, not only PXBs. But we can change this, of course, as long as OVMF can also work with it.
Another option would be to place the pci-root@1 behind the pci@i0cf8 as in "/pci@i0cf8/pci-root@1/...". Or, the root bus could be appended to the host bridge as in "/pci@i0cf8,1/...".
The latter representation makes sense to me, but as "/pci@i0cf8,4/...", with the bus number after the comma.
Laszlo, will this work for OVMF?
With the v3 patchset for QEMU, we could probably easily generate the "i0cf8,4" unit address inside the PXB device model itself. (Of course exactly what number should stand after the comma remains a question.)
Parsing it in OVMF is doable, albeit somewhat ugly.
In any case, I'm not convinced at all why this is a better idea than the proposal in this patch.
Laszlo
On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Why? @1 should be the unit address which is the text representation of the physical address, in our case the slot. Since the bus number in our case is 4, I think /pci-root@4/ is the 'correct' address.
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
-Kevin
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
This fix solves the following issue: the PXB device exposes a new pci root bridge with the fw path /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Why? @1 should be the unit address which is the text representation of the physical address, in our case the slot. Since the bus number in our case is 4, I think /pci-root@4/ is the 'correct' address.
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses. Is my understanding incorrect?
Thanks, Marcel
-Kevin
On 06/11/15 19:46, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> This fix solves the following issue:
> The PXB device exposes a new pci root bridge with the
> fw path: /pci-root@4/..., in which 4 is the root bus number.
> Before this patch the fw path was wrongly computed:
> /pci-root@1/pci@i0cf8/...
> Fix the above issues: correct the bus number and remove the
> extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge:

    /pci-root@1/pci@i0cf8/ ...
     ^ new      ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one, it is not correct to add it to the path.

The architecture is:
    /<host bridge>/devices...
    /extra root bridge/devices...
    /extra root bridge/devices...
and not:
    /extra root bridge/<host bridge>/devices...
Your patch changed both the "/extra root bridge/devices..." part and the "@1" part. The change of the "@1" in "/pci-root@1/" is not correct IMO.
Why? @1 should be the unit address which is the text representation of the physical address, in our case the slot. Since the bus number in our case is 4, I think /pci-root@4/ is the 'correct' address.
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses.
This is exactly how I understood it.
We're not interested in placing such bus numbers in device paths that are assigned during PCI enumeration. (Like subordinate bus numbers.) We're talking about the root bus numbers.
OVMF implements the same kind of probing that SeaBIOS does (based on natural language description from Michael and Marcel, not on the actual code). Devices on the root buses respond without any prior bus number assignments. Therefore it makes sense to place those root bus numbers into device paths.
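A rough sketch of that probing idea, under the assumption just stated (devices on root buses answer config cycles without prior bus number assignment); the two helpers are hypothetical stand-ins for config-space reads, not SeaBIOS or OVMF code:

    #include <stdio.h>

    /* Hypothetical stand-ins for PCI config-space probing. */
    static int bus_reached_via_bridge(int bus) { return 0; }
    static int bus_has_devices(int bus) { return bus == 0 || bus == 4; }

    int main(void)
    {
        int bus;
        for (bus = 0; bus <= 0xff; bus++) {
            if (bus_reached_via_bridge(bus))
                continue;                      /* secondary bus, not a root */
            if (bus_has_devices(bus))
                printf("root bus @%x\n", bus); /* number fixed by hw/QEMU */
        }
        return 0;
    }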
The bus numbers assignable by the firmware come from the intervals *between* the (fixed-in-hardware) root bus numbers. As I understand it, for two adjacent root bus numbers R1 and R2, the both-sides-exclusive interval (R1, R2) is available for secondary bus number assignment, to non-root buses that are (recursively) behind PCI bridges that hang off the R1 root bus (ie. the LHS of the interval).
We don't care about such firmware-assigned bus numbers at all, but R1, R2 etc. must be communicated to the firmware *somehow* in order to identify devices for booting.
Since R1, R2 etc are not *assigned* by the firmware, only detected (the assignment happens in QEMU, and also by the hw vendor in case of physical hardware), R1, R2 etc are permanent as long as the physical configuration does not change. Hence they qualify for the physical addressing nature of OFW device paths. We've just been looking for a *syntax* to express them.
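As a concrete (hypothetical) example of those intervals: with fixed root bus numbers 0, 4 and 8, the firmware may hand out secondary bus numbers only from the gaps between adjacent roots:

    #include <stdio.h>

    int main(void)
    {
        int roots[] = {0, 4, 8};  /* fixed-in-hardware root bus numbers */
        int n = sizeof roots / sizeof roots[0], i;
        for (i = 0; i + 1 < n; i++)
            printf("bridges under root %x may use buses %x..%x\n",
                   roots[i], roots[i] + 1, roots[i + 1] - 1);
        return 0;
    }

So bridges behind root bus 0 may be assigned buses 1..3, and bridges behind root bus 4 may be assigned buses 5..7.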
Is my understanding incorrect?
FWIW I'm relieved that at least the two of us have been understanding each other ;)
Laszlo
On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
On 06/11/15 19:46, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses.
This is exactly how I understood it.
We're not interested in placing such bus numbers in device paths that are assigned during PCI enumeration. (Like subordinate bus numbers.) We're talking about the root bus numbers.
OVMF implements the same kind of probing that SeaBIOS does (based on natural language description from Michael and Marcel, not on the actual code). Devices on the root buses respond without any prior bus number assignments.
Alas, that is not correct. Coreboot supports several AMD boards that have multiple southbridge chips which provide independent PCI root buses. These chips have to be configured and assigned a bus number prior to use (which coreboot does).
-Kevin
On 06/11/15 21:24, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
On 06/11/15 19:46, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses.
This is exactly how I understood it.
We're not interested in placing such bus numbers in device paths that are assigned during PCI enumeration. (Like subordinate bus numbers.) We're talking about the root bus numbers.
OVMF implements the same kind of probing that SeaBIOS does (based on natural language description from Michael and Marcel, not on the actual code). Devices on the root buses respond without any prior bus number assignments.
Alas, that is not correct. Coreboot supports several AMD boards that have multiple southbridge chips which provide independent PCI root buses. These chips have to be configured and assigned a bus number prior to use (which coreboot does).
Thanks.
Assuming such a physical hardware configuration, and that Coreboot configures the root buses before the SeaBIOS payload is launched: how does Coreboot identify a device, on a nonzero root bus, for SeaBIOS to boot from? Is that possible at all, or is the user expected to configure / select that in SeaBIOS exclusively?
Our use case does not include Coreboot (as far as I can tell), but I'm trying to find some parallels here.
* In the "QEMU without Coreboot" case, QEMU is the component that sets up the root buses for the firmware. Therefore it has all knowledge about the root buses. OVMF is meant solely for QEMU "hardware", therefore it has a full understanding with QEMU. QEMU can refer to root buses in the "bootorder" fw_cfg file because it owns both the root buses and the "bootorder" fw_cfg file, and OVMF can trust them to match.
* In the "physical hardware with Coreboot" case, Coreboot is the component that sets up the root buses for the firmware (SeaBIOS). Coreboot *could* refer to the root buses in some boot order file (a cbfs file I guess?) -- if such a feature existed between Coreboot and SeaBIOS -- because Coreboot would own both the root buses and the (theoretical) cbfs boot order file. Hence SeaBIOS could trust them to match.
Assuming there is no such feature between Coreboot and SeaBIOS (ie. one that would parallel our QEMU use case on physical hardware), what solution would you find acceptable for the case when QEMU basically promises "I know where you'll find those root buses, and the bootorder fw_cfg file will match them"?
Could we simply make this patch conditional on runningOnQEMU()?
Thanks Laszlo
On Fri, Jun 12, 2015 at 11:25:50AM +0200, Laszlo Ersek wrote:
On 06/11/15 21:24, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
On 06/11/15 19:46, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses.
This is exactly how I understood it.
We're not interested in placing such bus numbers in device paths that are assigned during PCI enumeration. (Like subordinate bus numbers.) We're talking about the root bus numbers.
OVMF implements the same kind of probing that SeaBIOS does (based on natural language description from Michael and Marcel, not on the actual code). Devices on the root buses respond without any prior bus number assignments.
Alas, that is not correct. Coreboot supports several AMD boards that have multiple southbridge chips which provide independent PCI root buses. These chips have to be configured and assigned a bus number prior to use (which coreboot does).
Thanks.
Assuming such a physical hardware configuration, and that Coreboot configures the root buses before the SeaBIOS payload is launched: how does Coreboot identify a device, on a nonzero root bus, for SeaBIOS to boot from? Is that possible at all, or is the user expected to configure / select that in SeaBIOS exclusively?
Coreboot does not provide information on what to boot. Its task is low-level hardware initialization. It's the job of SeaBIOS to boot the OS (and determine which media, etc., to boot from). SeaBIOS gets boot preference information from a static configuration file (bootorder) stored in flash (cbfs).
Assuming there is no such feature between Coreboot and SeaBIOS (ie. one that would parallel our QEMU use case on physical hardware), what solution would you find acceptable for the case when QEMU basically promises "I know where you'll find those root buses, and the bootorder fw_cfg file will match them"?
We currently go to great lengths to avoid logical identifiers in bootorder and I'm confused why we would wish to add them now. Bus number is not currently used anywhere in bootorder because (in the general case) it's an arbitrary identifier that's not stable between boots and (in the general case) may not be stable even within a boot.
I understand that in this specific case (extra root buses on QEMU) it is stable within a boot, but it seems strange that we'd want to define the interface knowing it's a poor choice in the general case.
As for what I would suggest - well, SeaBIOS has already supported multiple root buses for years and already has a mechanism for deterministically specifying a device on an extra root bus. (By specifying the N'th extra root bus instead of specifying the logical id given to that bus). This is by no means a perfect solution and it's certainly open to change - but the current proposed patches appear to be regressions to me.
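To make the contrast concrete, a sketch of the ordinal ("N'th extra root bus") naming versus naming by bus id; root_ordinal and roots[] are hypothetical illustrations, not SeaBIOS code:

    #include <stdio.h>

    /* Map a detected root bus number to its ordinal index. The ordinal
     * survives renumbering as long as the set and order of roots is
     * unchanged. */
    static int root_ordinal(const int *roots, int nroots, int bus)
    {
        int i;
        for (i = 0; i < nroots; i++)
            if (roots[i] == bus)
                return i;   /* 0 = main root, 1 = 1st extra root, ... */
        return -1;
    }

    int main(void)
    {
        int roots[] = {0, 4, 8};   /* hypothetical detected root buses */
        /* bus id 4 is the 1st extra root: "@1" under the ordinal scheme,
         * "@4" under the bus-id scheme proposed in the patch. */
        printf("ordinal of bus 4: %d\n", root_ordinal(roots, 3, 4));
        return 0;
    }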
Could we simply make this patch conditional on runningOnQEMU()?
It's possible. I'd certainly prefer to avoid adding special cases if possible.
-Kevin
On 06/12/15 15:03, Kevin O'Connor wrote
On Fri, Jun 12, 2015 at 11:25:50AM +0200, Laszlo Ersek wrote:
On 06/11/15 21:24, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
On 06/11/15 19:46, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly try to understand it.
I looked up a real hardware machine and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor, i440fx is the machine, right?
I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*. They are doing it for the pci-2-pci bridges of course. I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses.
This is exactly how I understood it.
We're not interested in placing such bus numbers in device paths that are assigned during PCI enumeration. (Like subordinate bus numbers.) We're talking about the root bus numbers.
OVMF implements the same kind of probing that SeaBIOS does (based on natural language description from Michael and Marcel, not on the actual code). Devices on the root buses respond without any prior bus number assignments.
Alas, that is not correct. Coreboot supports several AMD boards that have multiple southbridge chips which provide independent PCI root buses. These chips have to be configured and assigned a bus number prior to use (which coreboot does).
Thanks.
Assuming such a physical hardware configuration, and that Coreboot configures the root buses before the SeaBIOS payload is launched: how does Coreboot identify a device, on a nonzero root bus, for SeaBIOS to boot from? Is that possible at all, or is the user expected to configure / select that in SeaBIOS exclusively?
Coreboot does not provide information on what to boot. Its task is low-level hardware initialization. It's the job of SeaBIOS to boot the OS (and determine which media, etc., to boot from). SeaBIOS gets boot preference information from a static configuration file (bootorder) stored in flash (cbfs).
Assuming there is no such feature between Coreboot and SeaBIOS (ie. one that would parallel our QEMU use case on physical hardware), what solution would you find acceptable for the case when QEMU basically promises "I know where you'll find those root buses, and the bootorder fw_cfg file will match them"?
We currently go to great lengths to avoid logical identifiers in bootorder and I'm confused why we would wish to add them now. Bus number is not currently used anywhere in bootorder because (in the general case) it's an arbitrary identifier that's not stable between boots and (in the general case) may not be stable even within a boot.
I understand that in this specific case (extra root buses on QEMU) it is stable within a boot, but it seems strange that we'd want to define the interface knowing it's a poor choice in the general case.
As for what I would suggest - well, SeaBIOS has already supported multiple root buses for years and already has a mechanism for deterministically specifying a device on an extra root bus. (By specifying the N'th extra root bus instead of specifying the logical id given to that bus). This is by no means a perfect solution and it's certainly open to change - but the current proposed patches appear to be regressions to me.
Could we simply make this patch conditional on runningOnQEMU()?
It's possible. I'd certainly prefer to avoid adding special cases if possible.
Okay. Let's compare the two options we appear to have:
(1) A patch like this for SeaBIOS:
diff --git a/src/boot.c b/src/boot.c
index ec59c37..c7fd091 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
     } else {
         if (pci->rootbus)
             p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
-        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
+        if (!runningOnQEMU() || !pci->rootbus)
+            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
     }

     int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);

diff --git a/src/hw/pci.c b/src/hw/pci.c
index 0379b55..169a040 100644
--- a/src/hw/pci.c
+++ b/src/hw/pci.c
@@ -13,6 +13,7 @@
 #include "string.h" // memset
 #include "util.h" // udelay
 #include "x86.h" // outl
+#include "fw/paravirt.h" // runningOnQEMU

 void pci_config_writel(u16 bdf, u32 addr, u32 val)
 {
@@ -133,7 +134,7 @@ pci_probe_devices(void)
             if (bus != lastbus)
                 rootbuses++;
             lastbus = bus;
-            rootbus = rootbuses;
+            rootbus = runningOnQEMU() ? bus : rootbuses;
             if (bus > MaxPCIBus)
                 MaxPCIBus = bus;
         } else {
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
- the _UID assignments in build_ssdt() need to reflect the exact same values
- OVMF's root bridge driver needs to generate the same _UID values in the PciRoot() device path nodes
- OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/... format, where the root bus is the N'th extra root bus (in hex notation).
Basically, we need to keep the bus_nr=N user interface, and the effects it has on the virtual hardware, intact, but translate the numbers that are exposed via fw_cfg *and* ACPI (because those must match!) from "identifier" to "serial number after sorting by identifier"; in practice replicating the detection traversal that SeaBIOS does.
Doesn't seem impossible (unless Marcel raises a design-level issue with it), but I'll have to withdraw for a while and research it.
Thanks Laszlo
On Fri, Jun 12, 2015 at 05:45:04PM +0200, Laszlo Ersek wrote:
On 06/12/15 15:03, Kevin O'Connor wrote
As for what I would suggest - well, SeaBIOS has already supported multiple root buses for years and already has a mechanism for deterministically specifying a device on an extra root bus. (By specifying the N'th extra root bus instead of specifying the logical id given to that bus). This is by no means a perfect solution and it's certainly open to change - but the current proposed patches appear to be regressions to me.
Could we simply make this patch conditional on runningOnQEMU()?
It's possible. I'd certainly prefer to avoid adding special cases if possible.
Okay. Let's compare the two options we appear to have:
(1) A patch like this for SeaBIOS:
diff --git a/src/boot.c b/src/boot.c
index ec59c37..c7fd091 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
     } else {
         if (pci->rootbus)
             p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
-        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
+        if (!runningOnQEMU() || !pci->rootbus)
+            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
     }

     int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);

diff --git a/src/hw/pci.c b/src/hw/pci.c
index 0379b55..169a040 100644
--- a/src/hw/pci.c
+++ b/src/hw/pci.c
@@ -13,6 +13,7 @@
 #include "string.h" // memset
 #include "util.h" // udelay
 #include "x86.h" // outl
+#include "fw/paravirt.h" // runningOnQEMU

 void pci_config_writel(u16 bdf, u32 addr, u32 val)
 {
@@ -133,7 +134,7 @@ pci_probe_devices(void)
             if (bus != lastbus)
                 rootbuses++;
             lastbus = bus;
-            rootbus = rootbuses;
+            rootbus = runningOnQEMU() ? bus : rootbuses;
             if (bus > MaxPCIBus)
                 MaxPCIBus = bus;
         } else {
If we went down this path, I hope we could agree on the same prefix and thus limit the runningOnQEMU() to just the second part. Was there a concern with "/pci@i0cf8,%x/"?
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
I admit the sorting of pxb objects just to reverse engineer what SeaBIOS expects would not be fun. Doesn't QEMU have to sort the buses anyway to know which secondary bus ranges are associated with each root bus?
- the _UID assignments in build_ssdt() need to reflect the exact same values
- OVMF's root bridge driver needs to generate the same _UID values in the PciRoot() device path nodes
- OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/... format, where the root bus is the N'th extra root bus (in hex notation)
Basically, we need to keep the bus_nr=N user interface, and the effects it has on the virtual hardware, intact, but translate the numbers that are exposed via fw_cfg *and* ACPI (because those must match!) from "identifier" to "serial number after sorting by identifier"; in practice replicating the detection traversal that SeaBIOS does.
Why does fw_cfg and ACPI have to match?
-Kevin
On 06/12/15 20:40, Kevin O'Connor wrote:
On Fri, Jun 12, 2015 at 05:45:04PM +0200, Laszlo Ersek wrote:
On 06/12/15 15:03, Kevin O'Connor wrote
As for what I would suggest - well, SeaBIOS has already supported multiple root buses for years and already has a mechanism for deterministically specifying a device on an extra root bus. (By specifying the N'th extra root bus instead of specifying the logical id given to that bus). This is by no means a perfect solution and it's certainly open to change - but the current proposed patches appear to be regressions to me.
Could we simply make this patch conditional on runningOnQEMU()?
It's possible. I'd certainly prefer to avoid adding special cases if possible.
Okay. Let's compare the two options we appear to have:
(1) A patch like this for SeaBIOS:
diff --git a/src/boot.c b/src/boot.c
index ec59c37..c7fd091 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
     } else {
         if (pci->rootbus)
             p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
-        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
+        if (!runningOnQEMU() || !pci->rootbus)
+            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
     }

     int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);

diff --git a/src/hw/pci.c b/src/hw/pci.c
index 0379b55..169a040 100644
--- a/src/hw/pci.c
+++ b/src/hw/pci.c
@@ -13,6 +13,7 @@
 #include "string.h" // memset
 #include "util.h" // udelay
 #include "x86.h" // outl
+#include "fw/paravirt.h" // runningOnQEMU

 void pci_config_writel(u16 bdf, u32 addr, u32 val)
 {
@@ -133,7 +134,7 @@ pci_probe_devices(void)
             if (bus != lastbus)
                 rootbuses++;
             lastbus = bus;
-            rootbus = rootbuses;
+            rootbus = runningOnQEMU() ? bus : rootbuses;
             if (bus > MaxPCIBus)
                 MaxPCIBus = bus;
         } else {
If we went down this path, I hope we could agree on the same prefix and thus limit the runningOnQEMU() to just the second part. Was there a concern with "/pci@i0cf8,%x/"?
I don't recall any specific concern, but if we want to present either /pci@i0cf8,%x/, or the pattern SeaBIOS currently expects, then in QEMU the same stuff has to be poked at anyway.
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
I admit the sorting of pxb objects just to reverse engineer what SeaBIOS expects would not be fun.
Actually it's kinda fun. :)
Doesn't QEMU have to sort the buses anyway to know which secondary bus ranges are associated with each root bus?
I don't think so.
OVMF does the same probing (in the same order) as SeaBIOS for the root buses, and the intervals between each pair are handed to edk2's PCI bus driver (which is independent of the OVMF platform code). This latter driver performs the assignments / allocations from the allowed interval.
- the _UID assignments in build_ssdt() need to reflect the exact same values
- OVMF's root bridge driver needs to generate the same _UID values in the PciRoot() device path nodes
- OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/... format, where the root bus is the N'th extra root bus (in hex notation)
Basically, we need to keep the bus_nr=N user interface, and the effects it has on the virtual hardware, intact, but translate the numbers that are exposed via fw_cfg *and* ACPI (because those must match!) from "identifier" to "serial number after sorting by identifier"; in practice replicating the detection traversal that SeaBIOS does.
Why does fw_cfg and ACPI have to match?
You will like the explanation for that. I'm about to test a new QEMU series, and I'll CC you on the relevant patches. The last patch in the series answers this question.
(In a nutshell: OVMF translates OFW devpath fragments to UEFI devpath fragments, for boot option matching. The UEFI devpath fragments in question start with a PciRoot() ACPI node, and the _UID value in that initial node must match what QEMU hands down and OVMF uses in translation. Finally, the UEFI spec requires that such a _UID value, in the UEFI devpath, actually identify the valid ACPI object -- and that ACPI object comes from QEMU's generator.)
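For illustration, a minimal sketch of that root-node translation (the function and values are hypothetical, not OVMF code): the OFW fragment "/pci-root@4/..." must become a UEFI node "PciRoot(0x4)/...", and the 4 must equal the _UID that QEMU's ACPI generator emitted for that root, or boot option matching fails:

    #include <stdio.h>

    static void ofw_root_to_uefi(unsigned rootbus, char *out, int max)
    {
        /* "/pci-root@N" -> "PciRoot(0xN)"; N must match the ACPI _UID */
        snprintf(out, max, "PciRoot(0x%X)", rootbus);
    }

    int main(void)
    {
        char node[32];
        ofw_root_to_uefi(4, node, sizeof node);
        printf("%s\n", node);   /* prints: PciRoot(0x4) */
        return 0;
    }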
If everything goes well, we might not need to change SeaBIOS at all. (... Hope dies last :))
Thanks Laszlo
On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
I admit the sorting of pxb objects just to reverse engineer what SeaBIOS expects would not be fun. Doesn't QEMU have to sort the buses anyway to know which secondary bus ranges are associated with each root bus?
I guess it can be done - though this means it will break if we ever support hot-plugging of these roots. But more importantly, if the sort is by the bus number, then how is it better than just using the bus number directly?
On Sun, Jun 14, 2015 at 02:05:52PM +0200, Michael S. Tsirkin wrote:
On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
I admit the sorting of pxb objects just to reverse engineer what SeaBIOS expects would not be fun. Doesn't QEMU have to sort the buses anyway to know which secondary bus ranges are associated with each root bus?
I guess it can be done - though this means it will break if we ever support hot-plugging of these roots.
As I understand it, the use case for multiple PCI roots is large servers that process a lot of IO. For example, if you have a server with two 8-core cpu chips and two 40 gig ethernet cards, it might be a significant performance boost if the ethernet traffic from one card is on a different bus from the other card. These systems (at least in the AMD case) have multiple southbridge chips on the motherboard that each implement a PCI bus and each southbridge chip is associated with a particular CPU. This allows (in theory) a particular CPU to process the IO from its associated PCI bus without IO contention. I'm not an expert on this, but that's my general understanding.
I'm not aware of real world hardware with hot-pluggable root buses. Should it come about then some kind of OS-visible spec would be needed for the OS to identify and enumerate newly added buses, and I suppose we could figure out how to handle it once that type of thing happens.
But more importantly, if the sort is by the bus number, then how is it better than just using the bus number directly?
Coreboot supports real machines with multiple southbridge chips. The number of southbridge chips on the motherboard is static, but the bus id associated with them is not. So, SeaBIOS attempts to give a unique id to the southbridge chip that doesn't rely on the bus id (it's the N'th chip instead of the chip's current bus id).
I asked the coreboot developers about this again and they reiterated that bus id is dynamic and it could change between boots.
The SeaBIOS scheme is not perfect of course - for example nothing technically stops coreboot from assigning the bus ids in a different order from one boot to the next and if a bus doesn't show any devices on it at all then it would skew the ordering. Neither happens in practice today.
All of the above aside, I'm confused why one would want to add the logical bus-id to the open firmware style physical topology. We know the bus-id is a logical value in the general case - I find it confusing to use it in a description of physical topology. To wit, what happens if OSes learn how to alter the bus id of extra root buses (eg, to support hot plugging them) - then we'd have built an interface that's not stable even within a single boot.
-Kevin
On Sun, Jun 14, 2015 at 10:50:22AM -0400, Kevin O'Connor wrote:
On Sun, Jun 14, 2015 at 02:05:52PM +0200, Michael S. Tsirkin wrote:
On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
(2) The QEMU command line and the effects the command line has on the virtual hardware should not change. However, all of the following have to be updated:
- the "explicit_ofw_unit_address" property assignments in pxb_dev_initfn() have to be done separately, after all PXBs have been seen, and sorted. This probably requires another "machine init done" notifier.
I admit the sorting of pxb objects just to reverse engineer what SeaBIOS expects would not be fun. Doesn't QEMU have to sort the buses anyway to know which secondary bus ranges are associated with each root bus?
I guess it can be done - though this means it will break if we ever support hot-plugging of these roots.
As I understand it, the use case for multiple PCI roots is large servers that process a lot of IO.
For better or worse, guest OS-es assume that numa locality can only be specified for PCI roots.
So the use case is to specify numa locality for virtual devices.
For example, if you have a server with two 8-core cpu chips and two 40 gig ethernet cards, it might be a significant performance boost if the ethernet traffic from one card is on a different bus from the other card. These systems (at least in the AMD case) have multiple southbridge chips on the motherboard that each implement a PCI bus and each southbridge chip is associated with a particular CPU. This allows (in theory) a particular CPU to process the IO from its associated PCI bus without IO contention. I'm not an expert on this, but that's my general understanding.
I'm not aware of real world hardware with hot-pluggable root buses. Should it come about then some kind of OS-visible spec would be needed for the OS to identify and enumerate newly added buses, and I suppose we could figure out how to handle it once that type of thing happens.
But more importantly, if the sort is by the bus number, then how is it better than just using the bus number directly?
Coreboot supports real machines with multiple southbridge chips. The number of southbridge chips on the motherboard is static, but the bus id associated with them is not. So, SeaBIOS attempts to give a unique id to the southbridge chip that doesn't rely on the bus id (it's the N'th chip instead of the chip's current bus id).
I asked the coreboot developers about this again and they reiterated that bus id is dynamic and it could change between boots.
The SeaBIOS scheme is not perfect of course - for example nothing technically stops coreboot from assigning the bus ids in a different order from one boot to the next and if a bus doesn't show any devices on it at all then it would skew the ordering. Neither happens in practice today.
All of the above aside, I'm confused why one would want to add the logical bus-id to the open firmware style physical topology. We know the bus-id is a logical value in the general case - I find it confusing to use it in a description of physical topology. To wit, what happens if OSes learn how to alter the bus id of extra root buses (eg, to support hot plugging them) - then we'd have built an interface that's not stable even within a single boot.
-Kevin
To summarise, you feel that modifying bus id without reordering bus ids between roots is likely, modifications that would cause reordering are unlikely, thus counting bus ids in order gives a stable index. Is that right?
To be on the safe side, it would be nice to have bios skip some fields/properties when parsing paths, so that if we want to use another id in the future, we can supply both id types. I haven't looked at the parsing code - maybe it does this already?
On Sun, Jun 14, 2015 at 08:06:22PM +0200, Michael S. Tsirkin wrote:
To summarise, you feel that modifying bus id without reordering bus ids between roots is likely, modifications that would cause reordering are unlikely, thus counting bus ids in order gives a stable index. Is that right?
Yes.
To be on the safe side, it would be nice to have bios skip some fields/properties when parsing paths, so that if we want to use another id in the future, we can supply both id types. I haven't looked at the parsing code - maybe it does this already?
SeaBIOS already does that. (SeaBIOS doesn't parse the bootorder file - it generates a "glob" like pattern for each device and then sees if that pattern matches a line in the bootorder file.)
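A minimal sketch of such glob-style matching (this is not the actual SeaBIOS matcher, just the idea: '*' in a generated pattern swallows any run of characters within one path component, which is what lets extra fields be ignored):

    #include <stdio.h>

    /* Return 1 if str matches glob; '*' matches any run of characters
     * up to the next '/'. Illustrative only. */
    static int glob_match(const char *glob, const char *str)
    {
        for (;;) {
            if (*glob == '\0')
                return *str == '\0';
            if (*glob == '*') {
                const char *s;
                glob++;
                for (s = str;; s++) {
                    if (glob_match(glob, s))
                        return 1;
                    if (*s == '\0' || *s == '/')
                        return 0;
                }
            }
            if (*glob != *str)
                return 0;
            glob++;
            str++;
        }
    }

    int main(void)
    {
        /* hypothetical generated pattern vs. a bootorder line */
        printf("%d\n", glob_match("/pci-root@*/pci@i0cf8/ethernet@3",
                                  "/pci-root@4/pci@i0cf8/ethernet@3"));
        return 0;
    }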
Also, I don't have a strong objection to Laszlo's SeaBIOS patch (the one that does the runningOnQEMU() check). (I still think it's quirky to use bus-id in the file, but it's not a show stopper if it's just for QEMU.)
-Kevin
On Sun, 2015-06-14 at 20:06 +0200, Michael S. Tsirkin wrote:
As I understand it, the use case for multiple PCI roots is large servers that process a lot of IO.
For better or worse, guest OSes assume that numa locality can only be specified for PCI roots.
So the use case is to specify numa locality for virtual devices.
In addition, I'd add that on ppc "pseries", the standard way of hotplugging devices is to hotplug PCI roots (i.e., a virtual host bridge with the new device(s) below it).
I'm not aware of real-world hardware with hot-pluggable root buses. Should it come about, some kind of OS-visible spec would be needed for the OS to identify and enumerate newly added buses, and I suppose we could figure out how to handle it once that type of thing happens.
On IBM ppc systems, both exist. I.e., real HW hot-pluggable roots (on older systems mostly GX-based drawers with PCI host bridges in them; nowadays we tend to use PCIe cable-card based drawers), and in virtualized systems, hot-plugging root bridges is the standard way PowerVM (aka pHyp) does hotplug, which we need to support in qemu/pseries at some point.
But more importantly, if the sort is by the bus number, then how is it better than just using the bus number directly?
PCI bus number makes no sense. Any root can have the whole range of bus numbers 0...255, and the bus number assignment is under control of the guest anyway. Or are you talking about a different bus number? (I somewhat picked up the conversation halfway...)
Ben.
On Mon, Jun 15, 2015 at 07:39:18AM +1000, Benjamin Herrenschmidt wrote:
I'm not aware of real-world hardware with hot-pluggable root buses. Should it come about, some kind of OS-visible spec would be needed for the OS to identify and enumerate newly added buses, and I suppose we could figure out how to handle it once that type of thing happens.
On IBM ppc systems, both exist. I.e., real HW hot-pluggable roots (on older systems mostly GX-based drawers with PCI host bridges in them; nowadays we tend to use PCIe cable-card based drawers), and in virtualized systems, hot-plugging root bridges is the standard way PowerVM (aka pHyp) does hotplug, which we need to support in qemu/pseries at some point.
But more importantly, if the sort is by the bus number, then how is it better than just using the bus number directly?
PCI bus number makes no sense. Any root can have the whole range of bus numbers 0...255, and the bus number assignment is under control of the guest anyway. Or are you talking about a different bus number? (I somewhat picked up the conversation halfway...)
There are x86 systems with multiple separate PCI root buses where one can access the pci config space of all the buses using the same 0x0cf8 IO space. During system setup, the multiple PCI root buses are each configured to respond only to PCI config accesses within their range of bus numbers. So if "root1" is configured for bus ids between 64 and 128, it will only forward a request if the bus id in the request is between 64 and 128.
I suspect in your PPC example that the separate root buses all had separate io/memory space as well and thus were completely separate. (That is, they don't share the equivalent of IO 0x0cf8.) If so, that's different from how the x86 qemu code and the x86 systems I was discussing above work.
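[Editor's note: as a sketch of the routing rule just described - the register layout is invented, no particular chipset is modeled - every root bridge decodes the 0x0cf8-style config address, but only the one whose assigned range contains the requested bus number claims the cycle.]

#include <stdint.h>

struct root_bridge {
    uint8_t bus_base;    /* first bus id this root responds to */
    uint8_t bus_limit;   /* last bus id this root responds to */
};

int root_claims(const struct root_bridge *rb, uint32_t cf8_addr)
{
    uint8_t bus = (cf8_addr >> 16) & 0xff;   /* bits 23:16 = bus number */
    return bus >= rb->bus_base && bus <= rb->bus_limit;
}

So a config access to bus 70 would be claimed only by the root configured for, say, 64-128, even though all roots see the access.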
-Kevin
On Sun, 2015-06-14 at 17:59 -0400, Kevin O'Connor wrote:
There are x86 systems with multiple separate PCI root buses where one can access the pci config space of all the buses using the same 0x0cf8 IO space. During system setup, the multiple PCI root buses are each configured to respond only to PCI config accesses within their range of bus numbers. So if "root1" is configured for bus ids between 64 and 128, it will only forward a request if the bus id in the request is between 64 and 128.
I suspect in your PPC example that the separate root buses all had separate io/memory space as well and thus were completely separate. (That is, they don't share the equivalent of IO 0x0cf8.) If so, that's different from how the x86 qemu code and the x86 systems I was discussing above work.
Correct; my point is that qemu shouldn't be made to rely on stable bus numbers. Why not use the mmconfig address instead? That way you can factor the bus number in via an offset if it's relevant, or provide a completely different address if the buses are separate.
Ben.
On Mon, Jun 15, 2015 at 12:50:08PM +1000, Benjamin Herrenschmidt wrote:
Correct; my point is that qemu shouldn't be made to rely on stable bus numbers. Why not use the mmconfig address instead?
These are traditional pci buses, so they don't have an mmconfig.
That way you can factor the bus number in via an offset if it's relevant, or provide a completely different address if the buses are separate.
Ben.
On Thu, Jun 11, 2015 at 08:46:01PM +0300, Marcel Apfelbaum wrote:
On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
On real machines, the firmware assigns the 4 - it's not a physical address; it's a logical address (like all bus numbers in PCI). The firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw; I am merely trying to understand it.
I looked at a real hardware machine, and it seemed to me that the extra pci root numbers are provided in the ACPI tables, meaning by the vendor, not the fw. In this case QEMU is the vendor and i440fx is the machine, right?
I am not aware that SeaBIOS/OVMF decide the bus numbers for the *PCI roots*.
So, I'm also not an expert on this. It seems to be a fairly esoteric area of PC initialization.
My understanding is that extra PCI roots are configured by coreboot outside of the normal PCI bridge mechanism. They are configured by assigning a base bus number and range (similar to the way PCI bridges are configured). All the PCI roots see all the PCI traffic, but they only forward those requests that fall within their assigned bus range.
On each boot, coreboot might decide to assign a different bus id to the extra roots (for example, if a device with a PCI bridge is inserted and its bus allocation causes bus ids to shift). Technically, coreboot could even change the order in which extra buses are assigned bus ids, but it doesn't today.
This was seen on several AMD systems - I'm told at least some Intel systems have multiple root buses, but the bus numbers are just hard wired.
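[Editor's note: a speculative sketch of that setup step, assuming a hypothetical program_root() chipset hook; note how a bridge inserted later grows one root's requirement and shifts every subsequent root's bus ids.]

#include <stdint.h>

struct root_cfg {
    uint8_t base;          /* first bus id assigned to this root */
    uint8_t subordinate;   /* last bus id assigned to this root */
};

void program_root(int chip, const struct root_cfg *cfg);   /* hypothetical */

void assign_root_bus_ranges(int nroots, const uint8_t buses_needed[])
{
    uint8_t next = 0;
    int i;
    for (i = 0; i < nroots; i++) {
        struct root_cfg cfg;
        cfg.base = next;
        cfg.subordinate = (uint8_t)(next + buses_needed[i] - 1);
        program_root(i, &cfg);
        next = cfg.subordinate + 1;   /* any growth above shifts all later roots */
    }
}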
They are doing it for the pci-2-pci bridges, of course. I saw that SeaBIOS tries to "guess" the root buses by going over the whole 0-0xff range and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding PCI root buses. Is my understanding incorrect?
SeaBIOS doesn't assign the extra PCI bus numbers on real hardware (nor even regular PCI bridge numbers) - that's all handled by coreboot.
Under coreboot, SeaBIOS scans the PCI buses to figure out what coreboot assigned - it doesn't mean the assignments are hard wired.
-Kevin
Hi,
On each boot, coreboot might decide to assign a different bus id to the extra roots (for example, if a device with a PCI bridge is inserted and its bus allocation causes bus ids to shift). Technically, coreboot could even change the order in which extra buses are assigned bus ids, but it doesn't today.
This was seen on several AMD systems - I'm told at least some Intel systems have multiple root buses, but the bus numbers are just hard wired.
This is how the qemu pxb works: root bus numbers are a config option for the root bridge device, i.e. from the guest point of view they are hard-wired.
cheers, Gerd
On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
This is how the qemu pxb works: root bus numbers are a config option for the root bridge device, i.e. from the guest point of view they are hard-wired.
Exactly. In our case, the HW assigns the PXB bus number, and again, I saw this also on real HW with multiple buses: the bus nr comes from ACPI, meaning the vendor.
Let's focus on the problem at hand: we need a way for QEMU to write a fw path into the bootorder fw_cfg file, and both SeaBIOS and OVMF need to know how to correctly map it to the actual device.
If the boot device is behind a PXB extra root bus, there is a need not only to differentiate the root bus but also to know *which one*. So we need the bus number; what other way is there? As Gerd mentioned, the PXB bus number is provided on the QEMU command line, meaning it is hard-wired.
We can of course, as Laszlo suggested, condition the use of this path (/pci-root@bus-nr/) on running in QEMU, in order not to interfere with other HW. Less pretty but more robust.
Thanks, Marcel
On Fri, Jun 12, 2015 at 03:17:27PM +0300, Marcel Apfelbaum wrote:
On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
This is how the qemu pxb works: root bus numbers are a config option for the root bridge device, i.e. from the guest point of view they are hard-wired.
Exactly. In our case, the HW assigns the PXB bus number, and again, I saw this also on real HW with multiple buses: the bus nr comes from ACPI, meaning the vendor.
I'm confused where ACPI comes into this. In all cases I know of, the firmware generates the ACPI tables to match the hardware. I've never heard of hardware configuring itself from the ACPI tables.
Let's focus on the problem at hand: we need a way for QEMU to write a fw path into the bootorder fw_cfg file, and both SeaBIOS and OVMF need to know how to correctly map it to the actual device.
If the boot device is behind a PXB extra root bus, there is a need not only to differentiate the root bus but also to know *which one*. So we need the bus number; what other way is there?
The submitted patch changed the mechanism already in SeaBIOS. I'm not claiming the existing mechanism was perfect, but let's not claim that it's not possible either.
As Gerd mentioned, the PXB bus number is provided on the QEMU command line, meaning it is hard-wired.
We can of course, as Laszlo suggested, condition the use of this path (/pci-root@bus-nr/) on running in QEMU, in order not to interfere with other HW. Less pretty but more robust.
-Kevin
On Fr, 2015-06-12 at 09:23 -0400, Kevin O'Connor wrote:
I'm confused where ACPI comes into this. In all cases I know of, the firmware generates the ACPI tables to match the hardware. I've never heard of hardware configuring itself from the ACPI tables.
We have basically the same model in qemu, except that it isn't the firmware but qemu generating the tables (and qemu looks at the registers programmed by the firmware to make sure things match).
The pxb has no registers to program; the hardware just shows up on a bus number (qemu cfg, hard-wired for the guest). ACPI must specify it so the guest OS finds it. When passing bus numbers via fw_cfg, they must match ACPI of course.
I'm wondering whether things would become easier if we added config registers to the pxb, so the firmware could program the bus number range, and we could use the config register base as a way to specify which pxb we are referring to?
cheers, Gerd
Hi,
I'm wondering whether things would become easier if we added config registers to the pxb, so the firmware could program the bus number range, and we could use the config register base as a way to specify which pxb we are referring to?
... and, while thinking about Ben's reply elsewhere in this thread, maybe even decouple the whole thing from the primary root bus? You can't program the devices via 0x0cf8 then, but we could add an mmconfig BAR to the host bridge device ...
cheers, Gerd
On 06/15/2015 09:50 AM, Gerd Hoffmann wrote:
Hi,
I'm wondering whether things would become easier if we added config registers to the pxb, so the firmware could program the bus number range, and we could use the config register base as a way to specify which pxb we are referring to?
... and, while thinking about Ben's reply elsewhere in this thread, maybe even decouple the whole thing from the primary root bus? You can't program the devices via 0x0cf8 then, but we could add an mmconfig BAR to the host bridge device ...
Hi Gerd,
I thought about it, of course, but it seemed to me to be overkill with no real reason to do it. I would have done it if:
- there were a "generic" spec for such a host bridge specifying at least the registers for the bus number. I didn't find anything => SeaBIOS would have to be aware of a special QEMU device and look for it... ugly.
- I had seen how SeaBIOS/coreboot program the above bus number. Again, I didn't find the host-bridge programming code.
In the end, I followed a 'real' PXB device, some old Intel snooping host bridge.
As always, I am open to ideas, but the latest patches from Laszlo follow the SeaBIOS way, and if OVMF can also handle it, maybe we can stop the "logical bus number instead of bus index" discussion and start arguing about something else. :)
I just want to say thank you to everybody involved; a lot of information and good guidelines surfaced during this talk.
Until the next patch... Thanks, Marcel
On Mon, Jun 15, 2015 at 08:01:08AM +0200, Gerd Hoffmann wrote:
I'm wondering whether things would become easier if we added config registers to the pxb, so the firmware could program the bus number range, and we could use the config register base as a way to specify which pxb we are referring to?
cheers, Gerd
But then we'll need a bunch of fw_cfg entries to let the guest discover the extra roots and their bus ranges.
Hi,
I'm wondering whether things would become easier if we added config registers to the pxb, so the firmware could program the bus number range, and we could use the config register base as a way to specify which pxb we are referring to?
But then we'll need a bunch of fw_cfg entries to let the guest discover the extra roots and their bus ranges.
We could add them to the pxb host bridge device (1b36:0009). Then we don't need any fw_cfg stuff; SeaBIOS could simply look up / set up things in the 1b36:0009 pci config space ...
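[Editor's note: a hedged sketch of what the SeaBIOS side could then look like; the helper prototypes mimic SeaBIOS's config-space accessors, and the 0x40/0x41 register offsets are invented for illustration - no such registers exist on 1b36:0009 today.]

#include <stdint.h>

uint16_t pci_config_readw(int bdf, int addr);   /* assumed accessors in the */
uint8_t  pci_config_readb(int bdf, int addr);   /* style of SeaBIOS's own   */

#define PXB_REG_BUS_BASE   0x40   /* invented offset */
#define PXB_REG_BUS_LIMIT  0x41   /* invented offset */

void find_pxb_roots(void)
{
    int bdf;
    for (bdf = 0; bdf < 0x10000; bdf++) {
        if (pci_config_readw(bdf, 0x00) != 0x1b36)   /* vendor id */
            continue;
        if (pci_config_readw(bdf, 0x02) != 0x0009)   /* device id */
            continue;
        uint8_t base  = pci_config_readb(bdf, PXB_REG_BUS_BASE);
        uint8_t limit = pci_config_readb(bdf, PXB_REG_BUS_LIMIT);
        /* base..limit would describe the extra root's bus range, and the
         * device's own config address (bdf) could name the pxb without
         * referring to guest-assigned bus numbers */
        (void)base; (void)limit;
    }
}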
cheers, Gerd
On Mon, Jun 15, 2015 at 12:18:16PM +0200, Gerd Hoffmann wrote:
We could add them to the pxb host bridge device (1b36:0009). Then we don't need any fw_cfg stuff; SeaBIOS could simply look up / set up things in the 1b36:0009 pci config space ...
cheers, Gerd
Sure, but then it's not a fixed address, so you can't use these to name boot devices.
On 06/11/15 15:58, Kevin O'Connor wrote:
On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
The fixes solves the following issue: The PXB device exposes a new pci root bridge with the fw path: /pci-root@4/..., in which 4 is the root bus number. Before this patch the fw path was wrongly computed: /pci-root@1/pci@i0cf8/... Fix the above issues: Correct the bus number and remove the extra host bridge description.
Why is that wrong? The previous path looks correct to me.
The IEEE Std 1275-1994:
IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices 3.2.1.1 Node names Each node in the device tree is identified by a node name using the following notation: driver-name@unit-address:device-arguments
The driver name field is a sequence of between one and 31 letters [...]. By convention, this name includes the name of the device’s manufacturer and the device’s model name separated by a “,”. The unit address field is the text representation of the physical address of the device within the address space defined by its parent node. The form of the text representation is bus-dependent.
Note the "physical address" part in the above. Your patch changes the "pci-root@" syntax to use a logical address instead of a physical address. That is, unless I've missed something, SeaBIOS today uses a physical address (the n'th root bus) and the patch would change it to use a logical address.
One of the goals of using an "openfirmware" like address was so that they would be stable across boots (the same mechanism is also used with coreboot). Using a physical address is key for this, because simply adding or removing a PCI device could cause the logical PCI bridge enumeration to change - and that would mess up the bootorder list if it was based on logical addresses.
There are two questions here. The first is the inclusion of the "pci@i0cf8" node even if a "pci-root@x" node is present in front of it. The hunk that changes that is not your main concern, right? (And Marcel just described that hunk in more detail.)
The other question is how "x" is selected in "pci-root@x".
On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property. If you change that property from (say) 3 to 4, then the device paths exported by QEMU will change. However, the location (in the PCI hierarchy) of all the affected devices will *also* change at once, and their auto-enumerated, firmware-side device paths will reflect that. Therefore the new "bootorder" fw_cfg entries will match the freshly generated firmware-side device paths.
So why is this not stable? If you change the hardware without automatically updating any stashed firmware-side device paths, then things will fall apart without "bootorder" entries in the picture anyway.
Also, assuming you key "x" off of the running counter that counts root buses as they are found during enumeration, that's a possibility too, but I don't see how it gives more stability. If you insert a new root bus (with a device on it) between two preexisting ones, that will offset all the "x" values for the root buses that come after it by one.
In UEFI at least (I'm not speaking about OVMF in particular, but the UEFI spec), there is a "short-form device path" concept for hard drive and USB boot options. For hard disks, it is practically a relative device path that lacks the path fragment from the root node until just before the GPT partition identifier. The idea being, if you plug your SCSI controller in another PCI slot, the change in the full device path will be local to the path fragment that is not captured in the (persistent) boot option. The GPT GUID can identify the partition uniquely in the system wherever it exists, so it can be booted even without fully enumerating all devices and reproducing all the default boot options.
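[Editor's note: for reference, the node such a short-form hard-disk option starts with looks roughly like this in C; the field layout follows the UEFI spec's hard drive media device path, with names paraphrased rather than copied from any implementation.]

#include <stdint.h>

struct hard_drive_device_path {
    uint8_t  type;               /* 4 = media device path */
    uint8_t  subtype;            /* 1 = hard drive */
    uint16_t length;             /* 42 bytes for this node */
    uint32_t partition_number;
    uint64_t partition_start;    /* in blocks */
    uint64_t partition_size;     /* in blocks */
    uint8_t  signature[16];      /* for GPT: the partition's unique GUID */
    uint8_t  mbr_type;           /* 1 = MBR, 2 = GPT */
    uint8_t  signature_type;     /* 2 = GUID */
} __attribute__((packed));

Because signature[] identifies the partition wherever it lives, firmware can match it against every partition it enumerates, and the boot option survives the controller moving to a different slot.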
Short of such a "uniquely identifying relative devpath" trick, I don't think stability in firmware-stashed (i.e. not regenerated) device paths exists in general, if the underlying hardware configuration is changed.
In summary: I think we could modify both QEMU and OVMF to use the "serial numbers" of the extra PCI root buses, in increasing bus number order, instead of their actual bus numbers, for identifying them. That's just a convention. Then the second hunk of this patch would not be necessary for SeaBIOS. But I think this convention would be only less logical, and not more stable.
Can you please elaborate? I'm confused.
Thanks Laszlo
On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
There are two questions here. The first is the inclusion of the "pci@i0cf8" node even if a "pci-root@x" node is present in front of it. The hunk that changes that is not your main concern, right? (And Marcel just described that hunk in more detail.)
The other question is how "x" is selected in "pci-root@x".
On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property. If you change that property from (say) 3 to 4, then the device paths exported by QEMU will change. However, the location (in the PCI hierarchy) of all the affected devices will *also* change at once, and their auto-enumerated, firmware-side device paths will reflect that. Therefore the new "bootorder" fw_cfg entries will match the freshly generated firmware-side device paths.
So why is this not stable? If you change the hardware without automatically updating any stashed firmware-side device paths, then things will fall apart without "bootorder" entries in the picture anyway.
Also, assuming you key "x" off of the running counter that counts root buses as they are found during enumeration, that's a possibility too, but I don't see how it gives more stability. If you insert a new root bus (with a device on it) between two preexisting ones, that will offset all the "x" values for the root buses that come after it by one.
The SeaBIOS code is used on both virtual machines and real machines. The bus number is something that is generated by software and it is not assured to be stable between boots. (For example, if someone adds a PCI device to their machine between boots, then every bus number in the system might be different on the next boot.) The open firmware paths go to great lengths to avoid arbitrary bus numbers today - for example:
/pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
Given the complexity of avoiding arbitrary bus numbers, I'm confused why one would want to add them.
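[Editor's note: a simplified sketch of how such a path avoids bus numbers - each hop is named by its slot and function, which follow the physical topology, and only the host bridge is named absolutely. The types are illustrative, not SeaBIOS's real code, and buffer-overflow handling is elided.]

#include <stdio.h>

struct pci_dev {
    struct pci_dev *parent;   /* the bridge we sit behind, if any */
    int dev, fn;              /* slot and function - physical, stable */
};

char *emit_path(char *p, char *end, const struct pci_dev *d, const char *name)
{
    if (d->parent)
        p = emit_path(p, end, d->parent, "pci-bridge");
    else
        p += snprintf(p, end - p, "/pci@i0cf8");   /* the host bridge */
    p += snprintf(p, end - p, "/%s@%x", name, d->dev);
    if (d->fn)
        p += snprintf(p, end - p, ",%x", d->fn);   /* e.g. "usb@1,2" */
    return p;
}

No software-assigned number appears anywhere in the result, which is why adding or removing unrelated devices cannot perturb it.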
In UEFI at least (I'm not speaking about OVMF in particular, but the UEFI spec), there is a "short-form device path" concept for hard drive and USB boot options. For hard disks, it is practically a relative device path that lacks the path fragment from the root node until just before the GPT partition identifier. The idea being, if you plug your SCSI controller in another PCI slot, the change in the full device path will be local to the path fragment that is not captured in the (persistent) boot option. The GPT GUID can identify the partition uniquely in the system wherever it exists, so it can be booted even without fully enumerating all devices and reproducing all the default boot options.
Short of such a "uniquely identifying relative devpath" trick, I don't think stability in firmware-stashed (i.e. not regenerated) device paths exists in general, if the underlying hardware configuration is changed.
I'm not sure why you say that - it works just fine. The open firmware device paths relate a physical path to the given hardware and as long as one doesn't alter that physical path it will be the same path on every boot. (Specifically, one can add or remove unrelated PCI devices, USB devices, etc. without impacting the open firmware paths to devices not modified.)
-Kevin
On 06/11/15 18:48, Kevin O'Connor wrote:
The SeaBIOS code is used on both virtual machines and real machines. The bus number is something that is generated by software
Not the root bus numbers, as far as I understand.
(Please see the rest of my reply in the other sub-thread.)
Thanks Laszlo
On Thu, Jun 11, 2015 at 12:48:22PM -0400, Kevin O'Connor wrote:
The SeaBIOS code is used on both virtual machines and real machines. The bus number is something that is generated by software and it is not assured to be stable between boots. (For example, if someone adds a PCI device to their machine between boots, then every bus number in the system might be different on the next boot.) The open firmware paths go to great lengths to avoid arbitrary bus numbers today - for example:
/pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
Given the complexity of avoiding arbitrary bus numbers, I'm confused why one would want to add them.
Could you give an example real-hardware path when there are multiple roots though? I'd like to make sure what qemu generates matches that.
On Sun, Jun 14, 2015 at 02:10:22PM +0200, Michael S. Tsirkin wrote:
Could you give an example real-hardware path when there are multiple roots though? I'd like to make sure what qemu generates matches that.
I don't have the hardware, but I've asked a user that does to send in a log.
Here's a real world example of a search path that is generated today for bus 0:
01.249: Searching bootorder for: /pci@i0cf8/*@11/drive@0/disk@0
Here's what SeaBIOS is coded to produce for a similar device on the first extra pci root bus instead:
01.249: Searching bootorder for: /pci-root@1/pci@i0cf8/*@11/drive@0/disk@0
Placing "pci@i0cf8" after "pci-root@1" is admittedly hokey, and I don't have any issue with changing it.
-Kevin