Now this is an actual proposal I would like to get included on Seabios.
It's much simpler than the previous RFCs. It simply changes the ACPI and mptable code to not require APIC IDs to be contiguous, eliminates references to the CountCPUs variable, using a 256-bit APIC ID bitmap to find out which CPUs are present on boot.
Eduardo Habkost (2): acpi: set I/O APIC ID to 0 by default allow CPUs to have non-contiguous Local APIC IDs
src/acpi-dsdt.dsl | 4 +++- src/acpi.c | 11 ++++++----- src/config.h | 2 ++ src/mptable.c | 2 +- src/smp.c | 17 +++++++++++++++++ src/util.h | 1 + 6 files changed, 30 insertions(+), 7 deletions(-)
When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too.
Signed-off-by: Eduardo Habkost ehabkost@redhat.com --- src/acpi.c | 2 +- src/config.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic->type = APIC_IO; io_apic->length = sizeof(*io_apic); - io_apic->io_apic_id = CountCPUs; + io_apic->io_apic_id = BUILD_IOAPIC_ID; io_apic->address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic->interrupt = cpu_to_le32(0);
diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS 0xfed00000 #define BUILD_APIC_ADDR 0xfee00000
+ // Important real-mode segments #define SEG_IVT 0x0000 #define SEG_BDA 0x0040
On Thu, Jul 19, 2012 at 05:52:41PM -0300, Eduardo Habkost wrote:
When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too.
Actually BIOS needs to configure ioapic id to a uniqe value. This does not really matter for KVM though.
Signed-off-by: Eduardo Habkost ehabkost@redhat.com
src/acpi.c | 2 +- src/config.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic->type = APIC_IO; io_apic->length = sizeof(*io_apic);
- io_apic->io_apic_id = CountCPUs;
- io_apic->io_apic_id = BUILD_IOAPIC_ID; io_apic->address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic->interrupt = cpu_to_le32(0);
diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS 0xfed00000 #define BUILD_APIC_ADDR 0xfee00000
// Important real-mode segments #define SEG_IVT 0x0000
#define SEG_BDA 0x0040
1.7.10.4
-- Gleb.
On Fri, Jul 20, 2012 at 12:18:59AM +0300, Gleb Natapov wrote:
On Thu, Jul 19, 2012 at 05:52:41PM -0300, Eduardo Habkost wrote:
When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too.
Actually BIOS needs to configure ioapic id to a uniqe value. This does not really matter for KVM though.
Where does this requirement comes from? I am guessing it matters only when the I/O APIC is directly connected to the APIC bus (according to Intel SDM, that's the case only for old Pentium and P6 CPUs)[1].
Anyway, even if some hardware has this unique-ID requirement, today Seabios does not fulfill it, leaving the I/O APIC ID as 0. The patch at least makes the MADT table match reality.
[1] I have checked 3 different machines, and all machines I have looked have an I/O APIC ID that conflicts with an existing Local APIC ID, on the ACPI MADT table.
Some iasl dumps may be found online by googling for: "Subtable Type : 01" "I/O Apic ID"
I looked at 5 or 6 matches, and almost every one have an I/O APIC ID conflicting with a Local APIC ID.
Signed-off-by: Eduardo Habkost ehabkost@redhat.com
src/acpi.c | 2 +- src/config.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic->type = APIC_IO; io_apic->length = sizeof(*io_apic);
- io_apic->io_apic_id = CountCPUs;
- io_apic->io_apic_id = BUILD_IOAPIC_ID; io_apic->address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic->interrupt = cpu_to_le32(0);
diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS 0xfed00000 #define BUILD_APIC_ADDR 0xfee00000
// Important real-mode segments #define SEG_IVT 0x0000
#define SEG_BDA 0x0040
1.7.10.4
-- Gleb.
On Fri, Jul 20, 2012 at 01:22:43PM -0300, Eduardo Habkost wrote:
On Fri, Jul 20, 2012 at 12:18:59AM +0300, Gleb Natapov wrote:
On Thu, Jul 19, 2012 at 05:52:41PM -0300, Eduardo Habkost wrote:
When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too.
Actually BIOS needs to configure ioapic id to a uniqe value. This does not really matter for KVM though.
Where does this requirement comes from? I am guessing it matters only when the I/O APIC is directly connected to the APIC bus (according to Intel SDM, that's the case only for old Pentium and P6 CPUs)[1].
http://www.intel.com/design/chipsets/datashts/29056601.pdf says that it should be programmed to unique value. I checked 4 machines on 3 of them this was the case on one (AMD) there are 3 IOAPICs and they all overlap with APICs.
Anyway, even if some hardware has this unique-ID requirement, today Seabios does not fulfill it, leaving the I/O APIC ID as 0. The patch at least makes the MADT table match reality.
The currant code was written when we could only dream supporting 16 cpus and back than it did correct thing, but now it just broken. QEMU do not care about IOAPIC ID for sure so I am OK with the patch.
[1] I have checked 3 different machines, and all machines I have looked have an I/O APIC ID that conflicts with an existing Local APIC ID, on the ACPI MADT table.
Some iasl dumps may be found online by googling for: "Subtable Type : 01" "I/O Apic ID"
I looked at 5 or 6 matches, and almost every one have an I/O APIC ID conflicting with a Local APIC ID.
1 from 4 for me :)
Signed-off-by: Eduardo Habkost ehabkost@redhat.com
src/acpi.c | 2 +- src/config.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic->type = APIC_IO; io_apic->length = sizeof(*io_apic);
- io_apic->io_apic_id = CountCPUs;
- io_apic->io_apic_id = BUILD_IOAPIC_ID; io_apic->address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic->interrupt = cpu_to_le32(0);
diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS 0xfed00000 #define BUILD_APIC_ADDR 0xfee00000
// Important real-mode segments #define SEG_IVT 0x0000
#define SEG_BDA 0x0040
1.7.10.4
-- Gleb.
-- Eduardo
-- Gleb.
On Mon, Jul 23, 2012 at 02:42:27PM +0300, Gleb Natapov wrote:
On Fri, Jul 20, 2012 at 01:22:43PM -0300, Eduardo Habkost wrote:
On Fri, Jul 20, 2012 at 12:18:59AM +0300, Gleb Natapov wrote:
On Thu, Jul 19, 2012 at 05:52:41PM -0300, Eduardo Habkost wrote:
When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too.
Actually BIOS needs to configure ioapic id to a uniqe value. This does not really matter for KVM though.
Where does this requirement comes from? I am guessing it matters only when the I/O APIC is directly connected to the APIC bus (according to Intel SDM, that's the case only for old Pentium and P6 CPUs)[1].
http://www.intel.com/design/chipsets/datashts/29056601.pdf says that it should be programmed to unique value. I checked 4 machines on 3 of them this was the case on one (AMD) there are 3 IOAPICs and they all overlap with APICs.
True, but it says "All APIC devices using the APIC bus". Figure 10-1 on Intel SDM Volume 3 makes me believe the I/O APIC and Local APICs are not connected to the same APIC bus since Pentium 4.
Anyway, if anybody wants to patch SeaBIOS to guarantee the I/O APIC ID not conflict with the Local APICs, it would be welcome. While this is not done, the ACPI tables need to match reality.
(The MP-Table has to be changed too, I will send a new patch soon. Thanks for pointing that out).
Anyway, even if some hardware has this unique-ID requirement, today Seabios does not fulfill it, leaving the I/O APIC ID as 0. The patch at least makes the MADT table match reality.
The currant code was written when we could only dream supporting 16 cpus and back than it did correct thing, but now it just broken. QEMU do not care about IOAPIC ID for sure so I am OK with the patch.
[1] I have checked 3 different machines, and all machines I have looked have an I/O APIC ID that conflicts with an existing Local APIC ID, on the ACPI MADT table.
Some iasl dumps may be found online by googling for: "Subtable Type : 01" "I/O Apic ID"
I looked at 5 or 6 matches, and almost every one have an I/O APIC ID conflicting with a Local APIC ID.
1 from 4 for me :)
Signed-off-by: Eduardo Habkost ehabkost@redhat.com
src/acpi.c | 2 +- src/config.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic->type = APIC_IO; io_apic->length = sizeof(*io_apic);
- io_apic->io_apic_id = CountCPUs;
- io_apic->io_apic_id = BUILD_IOAPIC_ID; io_apic->address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic->interrupt = cpu_to_le32(0);
diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END 0x10000000000ULL
#define BUILD_IOAPIC_ADDR 0xfec00000 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS 0xfed00000 #define BUILD_APIC_ADDR 0xfee00000
// Important real-mode segments #define SEG_IVT 0x0000
#define SEG_BDA 0x0040
1.7.10.4
-- Gleb.
-- Eduardo
-- Gleb.
Extract Local APIC IDs directly from the CPUs, and instead of check for "i < CountCPUs", check if the APIC ID was present on boot, when building ACPI tables and the MP-Table.
This keeps ACPI Processor ID == APIC ID, but allows the hardware<->Seabios interface be completely APIC-ID based, and Seabios may change the way ACPI Processor IDs are chosen in the future.
As we currently Seabios support only xAPIC and not x2APIC, the list of present-on-boot APIC IDs is a 256-bit bitmap. If one day Seabios starts to support x2APIC, the data structure used to enumerate the APIC IDs will have to be changed.
Signed-off-by: Eduardo Habkost ehabkost@redhat.com --- src/acpi-dsdt.dsl | 4 +++- src/acpi.c | 9 +++++---- src/mptable.c | 2 +- src/smp.c | 17 +++++++++++++++++ src/util.h | 1 + 5 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 2060686..72dc7d8 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -676,6 +676,7 @@ DefinitionBlock ( /* Methods called by run-time generated SSDT Processor objects */ Method (CPMA, 1, NotSerialized) { // _MAT method - create an madt apic buffer + // Arg0 = Processor ID = Local APIC ID // Local0 = CPON flag for this cpu Store(DerefOf(Index(CPON, Arg0)), Local0) // Local1 = Buffer (in madt apic form) to return @@ -688,6 +689,7 @@ DefinitionBlock ( } Method (CPST, 1, NotSerialized) { // _STA method - return ON status of cpu + // Arg0 = Processor ID = Local APIC ID // Local0 = CPON flag for this cpu Store(DerefOf(Index(CPON, Arg0)), Local0) If (Local0) { Return(0xF) } Else { Return(0x0) } @@ -708,7 +710,7 @@ DefinitionBlock ( Store (PRS, Local5) // Local2 = last read byte from bitmap Store (Zero, Local2) - // Local0 = cpuid iterator + // Local0 = Processor ID / APIC ID iterator Store (Zero, Local0) While (LLess(Local0, SizeOf(CPON))) { // Local1 = CPON flag for this cpu diff --git a/src/acpi.c b/src/acpi.c index 3f55de9..d79a1e9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -326,7 +326,7 @@ build_madt(void) apic->length = sizeof(*apic); apic->processor_id = i; apic->local_apic_id = i; - if (i < CountCPUs) + if (apic_id_is_present(apic->local_apic_id)) apic->flags = cpu_to_le32(1); else apic->flags = cpu_to_le32(0); @@ -444,6 +444,7 @@ build_ssdt(void) }
// build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" + // Arg0 = Processor ID = APIC ID *(ssdt_ptr++) = 0x14; // MethodOp ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*acpi_cpus), 2); *(ssdt_ptr++) = 'N'; @@ -476,7 +477,7 @@ build_ssdt(void) ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*acpi_cpus), 2); *(ssdt_ptr++) = acpi_cpus; for (i=0; i<acpi_cpus; i++) - *(ssdt_ptr++) = (i < CountCPUs) ? 0x01 : 0x00; + *(ssdt_ptr++) = (apic_id_is_present(i)) ? 0x01 : 0x00;
// store pci io windows: start, end, length // this way we don't have to do the math in the dsdt @@ -655,10 +656,10 @@ build_srat(void) core->proximity_lo = curnode; memset(core->proximity_hi, 0, 3); core->local_sapic_eid = 0; - if (i < CountCPUs) + if (apic_id_is_present(i)) core->flags = cpu_to_le32(1); else - core->flags = 0; + core->flags = cpu_to_le32(0); core++; }
diff --git a/src/mptable.c b/src/mptable.c index 103f462..9406f98 100644 --- a/src/mptable.c +++ b/src/mptable.c @@ -59,7 +59,7 @@ mptable_init(void) cpu->apicid = i; cpu->apicver = apic_version; /* cpu flags: enabled, bootstrap cpu */ - cpu->cpuflag = ((i<CountCPUs) ? 0x01 : 0x00) | ((i==0) ? 0x02 : 0x00); + cpu->cpuflag = (apic_id_is_present(i) ? 0x01 : 0x00) | ((i==0) ? 0x02 : 0x00); cpu->cpusignature = cpuid_signature; cpu->featureflag = cpuid_features; cpu++; diff --git a/src/smp.c b/src/smp.c index 8c077a1..af92e4a 100644 --- a/src/smp.c +++ b/src/smp.c @@ -36,6 +36,8 @@ wrmsr_smp(u32 index, u64 val)
u32 CountCPUs VAR16VISIBLE; u32 MaxCountCPUs VAR16VISIBLE; +// 256 bits for the found APIC IDs +u32 FoundAPICIDs[256/32] VAR16VISIBLE; extern void smp_ap_boot_code(void); ASM16( " .global smp_ap_boot_code\n" @@ -59,6 +61,12 @@ ASM16( " jmp 1b\n" "2:\n"
+ // get apic ID on EBX, set bit on FoundAPICIDs + " mov $1, %eax\n" + " cpuid\n" + " shrl $24, %ebx\n" + " lock bts %ebx, FoundAPICIDs\n" + // Increment the cpu counter " lock incl CountCPUs\n"
@@ -67,6 +75,11 @@ ASM16( " jmp 1b\n" );
+int apic_id_is_present(u8 apic_id) +{ + return FoundAPICIDs[apic_id/32] & (1 << (apic_id % 32)); +} + // find and initialize the CPUs by launching a SIPI to them void smp_probe(void) @@ -82,6 +95,10 @@ smp_probe(void) return; }
+ // mark the BSP initial APIC ID as found, too: + u8 apic_id = ebx>>24; + FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32)); + // Init the counter. writel(&CountCPUs, 1);
diff --git a/src/util.h b/src/util.h index ef8ec7c..89e928c 100644 --- a/src/util.h +++ b/src/util.h @@ -355,6 +355,7 @@ extern u32 CountCPUs; extern u32 MaxCountCPUs; void wrmsr_smp(u32 index, u64 val); void smp_probe(void); +int apic_id_is_present(u8 apic_id);
// coreboot.c extern const char *CBvendor, *CBpart;
On Thu, Jul 19, 2012 at 05:52:42PM -0300, Eduardo Habkost wrote:
Extract Local APIC IDs directly from the CPUs, and instead of check for "i < CountCPUs", check if the APIC ID was present on boot, when building ACPI tables and the MP-Table.
This keeps ACPI Processor ID == APIC ID, but allows the hardware<->Seabios interface be completely APIC-ID based, and Seabios may change the way ACPI Processor IDs are chosen in the future.
As we currently Seabios support only xAPIC and not x2APIC, the list of present-on-boot APIC IDs is a 256-bit bitmap. If one day Seabios starts to support x2APIC, the data structure used to enumerate the APIC IDs will have to be changed.
Is this SeaBIOS change dependent on a particular version of QEMU?
[...]
--- a/src/smp.c +++ b/src/smp.c @@ -36,6 +36,8 @@ wrmsr_smp(u32 index, u64 val)
u32 CountCPUs VAR16VISIBLE; u32 MaxCountCPUs VAR16VISIBLE; +// 256 bits for the found APIC IDs +u32 FoundAPICIDs[256/32] VAR16VISIBLE; extern void smp_ap_boot_code(void); ASM16( " .global smp_ap_boot_code\n" @@ -59,6 +61,12 @@ ASM16( " jmp 1b\n" "2:\n"
- // get apic ID on EBX, set bit on FoundAPICIDs
- " mov $1, %eax\n"
- " cpuid\n"
- " shrl $24, %ebx\n"
- " lock bts %ebx, FoundAPICIDs\n"
Because of the insanity that is 16bit mode, please use explicit size suffixes on assembler instructions (ie, "movl" and "btsl").
-Kevin
On Thu, Jul 19, 2012 at 09:12:54PM -0400, Kevin O'Connor wrote:
On Thu, Jul 19, 2012 at 05:52:42PM -0300, Eduardo Habkost wrote:
Extract Local APIC IDs directly from the CPUs, and instead of check for "i < CountCPUs", check if the APIC ID was present on boot, when building ACPI tables and the MP-Table.
This keeps ACPI Processor ID == APIC ID, but allows the hardware<->Seabios interface be completely APIC-ID based, and Seabios may change the way ACPI Processor IDs are chosen in the future.
As we currently Seabios support only xAPIC and not x2APIC, the list of present-on-boot APIC IDs is a 256-bit bitmap. If one day Seabios starts to support x2APIC, the data structure used to enumerate the APIC IDs will have to be changed.
Is this SeaBIOS change dependent on a particular version of QEMU?
No, it's the opposite: QEMU needs this change first, to be able to fix some CPU socket/core/thread topology bugs caused by incorrect APIC IDs.
For hardware that has contiguous APIC IDs (that's the case for existing QEMU versions), the OS-visible behavior and resulting tables should be exactly the same.
[...]
--- a/src/smp.c +++ b/src/smp.c @@ -36,6 +36,8 @@ wrmsr_smp(u32 index, u64 val)
u32 CountCPUs VAR16VISIBLE; u32 MaxCountCPUs VAR16VISIBLE; +// 256 bits for the found APIC IDs +u32 FoundAPICIDs[256/32] VAR16VISIBLE; extern void smp_ap_boot_code(void); ASM16( " .global smp_ap_boot_code\n" @@ -59,6 +61,12 @@ ASM16( " jmp 1b\n" "2:\n"
- // get apic ID on EBX, set bit on FoundAPICIDs
- " mov $1, %eax\n"
- " cpuid\n"
- " shrl $24, %ebx\n"
- " lock bts %ebx, FoundAPICIDs\n"
Because of the insanity that is 16bit mode, please use explicit size suffixes on assembler instructions (ie, "movl" and "btsl").
I'll do. Thanks for the feedback.