On Mon, 2013-11-04 at 11:51 +0100, Vasilis Liaskovitis wrote:
Any comment on this?
On Fri, Oct 25, 2013 at 11:32:10AM +0200, Vasilis Liaskovitis wrote:
This patch adds a _PXM object to the SeaBIOS CPU objects. The _PXM value is derived from the CPU SRAT entries, so build_ssdt needs to be called after build_srat for this to work.
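For illustration only, here is a minimal sketch of what a _PXM object looks like once it is encoded as AML, i.e. the bytes for `Name(_PXM, node)`. The helper name `encode_pxm_name` is hypothetical and is not a SeaBIOS function; the actual patch may emit or patch the object differently. Only the AML opcodes (NameOp 0x08, BytePrefix 0x0A) come from the ACPI specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* AML opcodes as defined by the ACPI specification. */
#define AML_NAME_OP     0x08
#define AML_BYTE_PREFIX 0x0A

/* Size of Name(_PXM, <byte>): NameOp + 4-char name + BytePrefix + value. */
#define PXM_NAME_LEN    7

/*
 * Hypothetical helper (not part of SeaBIOS): encode Name(_PXM, node)
 * into buf. Returns the number of bytes written, or 0 if buf is too
 * small. The node value would come from the CPU's SRAT entry, which is
 * why the SRAT has to be built before the SSDT.
 */
static size_t encode_pxm_name(uint8_t *buf, size_t len, uint8_t node)
{
    if (len < PXM_NAME_LEN)
        return 0;
    buf[0] = AML_NAME_OP;
    memcpy(&buf[1], "_PXM", 4);
    buf[5] = AML_BYTE_PREFIX;
    buf[6] = node;
    return PXM_NAME_LEN;
}
```

For node 1 this produces the byte sequence 08 5F 50 58 4D 0A 01. A byte constant is used for every value to keep the encoding fixed-size, which makes in-place patching of a template simpler.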
The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when using SeaBIOS and a 3.11 Linux guest kernel on a multi-NUMA-node QEMU/KVM VM. The Linux guest kernel parses the SRAT CPU entries at boot time and stores them in the array __apicid_to_node. When a CPU is hot-removed, the kernel resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel commit c4c60524). When the removed CPU is hot-added again, the kernel looks up the hot-added CPU object's _PXM value instead of somehow re-using the SRAT entry info (acpi_map_cpu2node calls acpi_get_node, which calls acpi_get_pxm). If the _PXM value is not found, the CPU is assumed to be on node 0, and it is hot-plugged into the wrong NUMA node.
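The sequence described above can be modelled with a small stand-alone sketch. This is not kernel code: `apicid_to_node` stands in for `__apicid_to_node`, the struct and function names are hypothetical, and the proximity domain is assumed to map one-to-one to a node number for simplicity.

```c
#include <stdbool.h>

#define MAX_APICS    8      /* illustrative table size */
#define NO_NUMA_NODE (-1)

/* Stand-in for the kernel's __apicid_to_node array. */
static int apicid_to_node[MAX_APICS];

/* Hypothetical model of a CPU object in the ACPI namespace. */
struct cpu_obj {
    int  apic_id;
    bool has_pxm;   /* does the BIOS provide _PXM for this CPU? */
    int  pxm;       /* value _PXM would return, if present */
};

/* Boot: fill the table from the SRAT CPU entries. */
static void boot_parse_srat(const int *srat_nodes, int n)
{
    for (int i = 0; i < MAX_APICS; i++)
        apicid_to_node[i] = (i < n) ? srat_nodes[i] : NO_NUMA_NODE;
}

/* Hot-remove: the boot-time SRAT information for this CPU is dropped. */
static void hot_remove(int apic_id)
{
    apicid_to_node[apic_id] = NO_NUMA_NODE;
}

/* Hot-add: only _PXM is consulted; without it the CPU lands on node 0. */
static int hot_add(const struct cpu_obj *cpu)
{
    int node = cpu->has_pxm ? cpu->pxm : 0;

    apicid_to_node[cpu->apic_id] = node;
    return node;
}
```

With SRAT nodes {0, 0, 1, 1}, removing APIC ID 3 and adding it back without _PXM moves it from node 1 to node 0, which is the reported bug. With `has_pxm` set and `pxm` equal to 1 it returns to node 1.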
Which is the preferred OSPM way of looking up a CPU's proximity info at hotplug time? Is it the CPU object's _PXM value, or the already-parsed CPU SRAT entry? Or maybe both ways are valid?
SRAT describes proximity values at boot time. During hotplug, the kernel is supposed to obtain the current proximity value from the _PXM method.
This issue may require a kernel fix instead of, or in addition to, the SeaBIOS fix: the kernel can save the originally parsed SRAT entry info somewhere before it resets it at hot-remove time, and use that info at hot-plug time if the _PXM value is missing for the hot-plugged CPU BIOS object. This way CPU hot-plug works well against a BIOS with no CPU _PXM info.
To support CPU hotplug, SeaBIOS needs to implement _PXM for the CPU or its parent device object when the system has multiple nodes.
Thanks, -Toshi
Hi Toshi,
On Mon, Nov 04, 2013 at 01:26:14PM -0700, Toshi Kani wrote:
[...]
thanks for the clarification.
To support CPU hotplug, SeaBIOS needs to implement _PXM for the CPU or its parent device object when the system has multiple nodes.
OK, so no Linux kernel changes are needed. Adding _PXM to the SeaBIOS CPU objects should be enough, which is what this RFC patch does.
thanks,
- Vasilis
On Tue, 5 Nov 2013 12:20:29 +0200 Vasilis Liaskovitis vasilis.liaskovitis@profitbricks.com wrote:
[...]
Quoting ACPI spec 5.0 (5.2.16 System Resource Affinity Table (SRAT)): "If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically added processor is not present in the SRAT, a _PXM object must exist for the processor’s device or one of its ancestors in the ACPI Namespace."
So _PXM is not a must-have if there is an entry for the device in the SRAT, and it seems SeaBIOS builds the table with all possible CPUs included, so the kernel just doesn't use the info that is already present. Perhaps the kernel should be fixed (i.e. take the affinity from the SRAT first and override the value with _PXM if present).
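The lookup order suggested here (SRAT first, _PXM overriding it when present) could be sketched as follows. This only illustrates the proposal; it is not how the 3.11 kernel behaves, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

#define NO_NUMA_NODE (-1)

/* Hypothetical view of the affinity sources known for one CPU. */
struct cpu_affinity {
    bool in_srat;    /* APIC ID had an entry in the boot-time SRAT */
    int  srat_node;  /* node saved from that SRAT entry */
    bool has_pxm;    /* CPU object (or an ancestor) provides _PXM */
    int  pxm_node;   /* node derived from the _PXM value */
};

/*
 * Proposed policy: start from the saved SRAT entry, let _PXM override
 * it when present, and fall back to node 0 only if neither exists.
 */
static int resolve_node(const struct cpu_affinity *a)
{
    int node = NO_NUMA_NODE;

    if (a->in_srat)
        node = a->srat_node;
    if (a->has_pxm)
        node = a->pxm_node;
    if (node == NO_NUMA_NODE)
        node = 0;
    return node;
}
```

Under this policy a CPU listed in the SRAT on node 1 stays on node 1 after hot-add even when the BIOS provides no _PXM, while a BIOS that does provide _PXM still has the final say.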
SeaBIOS mailing list SeaBIOS@seabios.org http://www.seabios.org/mailman/listinfo/seabios
On Wed, 2013-11-06 at 13:44 +0100, Igor Mammedov wrote:
[...]
I think the above statement is meant to emphasize the need for _PXM for hotplug, but I will check the intention of the statement.
Here is a quote from ACPI 5.0, which restates what I said before.
===
17.2.1 System Resource Affinity Table Definition

This optional System Resource Affinity Table (SRAT) provides the boot time description of the processor and memory ranges belonging to a system locality. OSPM will consume the SRAT only at boot time. OSPM should use _PXM for any devices that are hot-added into the system after boot up.
===
Thanks, -Toshi
On Tue, 5 Nov 2013 12:20:29 +0200 Vasilis Liaskovitis vasilis.liaskovitis@profitbricks.com wrote:
[...]
To support CPU hotplug, SeaBIOS needs to implement _PXM for the CPU or its parent device object when the system has multiple nodes.
BTW: maybe we should compare Linux's behavior with Windows's. Usually MS implements the ACPI spec more strictly.