Reduce amount of consumed cpu time (i.e. don't spin forever) if number of started cpus are more then expected. And print a bug message into debug port.
And it is not safe to continue to boot if CountCPUs is more than cmos_smp_count and might be still racing, due to mp/acpi could be build with still changing CountCPUs. It is safer to stop and allow user to notice and restart system rather than debug possible inxplicable bugs later.
v2: extendend commit message with explanation why not continue to boot
Signed-off-by: Igor Mammedov imammedo@redhat.com --- src/smp.c | 8 +++++++- 1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/src/smp.c b/src/smp.c index 8c077a1..9933ac6 100644 --- a/src/smp.c +++ b/src/smp.c @@ -115,8 +115,14 @@ smp_probe(void) msleep(10); } else { u8 cmos_smp_count = inb_cmos(CMOS_BIOS_SMP_COUNT); - while (cmos_smp_count + 1 != readl(&CountCPUs)) + while (cmos_smp_count + 1 != readl(&CountCPUs)) { + if (cmos_smp_count + 1 < readl(&CountCPUs)) { + dprintf(1, "BUG: Expected %d cpu(s) but %d cpus started\n", + cmos_smp_count + 1, readl(&CountCPUs)); + hlt(); + } yield(); + } }
// Restore memory.
On Fri, Mar 16, 2012 at 10:33:15AM +0100, Igor Mammedov wrote:
Reduce amount of consumed cpu time (i.e. don't spin forever) if number of started cpus are more then expected. And print a bug message into debug port.
And it is not safe to continue to boot if CountCPUs is more than cmos_smp_count and might be still racing, due to mp/acpi could be build with still changing CountCPUs. It is safer to stop and allow user to notice and restart system rather than debug possible inxplicable bugs later.
I don't understand the gain in this. The CPU is spinning on the count, so (unless I missed something) there's a very good chance that this check wont trigger even if the emulator sets the counts wrong. It seems like we're just adding a random halt. Why bother?
[...]
--- a/src/smp.c +++ b/src/smp.c @@ -115,8 +115,14 @@ smp_probe(void) msleep(10); } else { u8 cmos_smp_count = inb_cmos(CMOS_BIOS_SMP_COUNT);
while (cmos_smp_count + 1 != readl(&CountCPUs))
while (cmos_smp_count + 1 != readl(&CountCPUs)) {
if (cmos_smp_count + 1 < readl(&CountCPUs)) {
dprintf(1, "BUG: Expected %d cpu(s) but %d cpus started\n",
cmos_smp_count + 1, readl(&CountCPUs));
hlt();
The panic() call should be used instead of dprintf()/hlt().
-Kevin
On 03/22/2012 12:39 AM, Kevin O'Connor wrote:
On Fri, Mar 16, 2012 at 10:33:15AM +0100, Igor Mammedov wrote:
Reduce amount of consumed cpu time (i.e. don't spin forever) if number of started cpus are more then expected. And print a bug message into debug port.
And it is not safe to continue to boot if CountCPUs is more than cmos_smp_count and might be still racing, due to mp/acpi could be build with still changing CountCPUs. It is safer to stop and allow user to notice and restart system rather than debug possible inxplicable bugs later.
I don't understand the gain in this. The CPU is spinning on the count, so (unless I missed something) there's a very good chance that this check wont trigger even if the emulator sets the counts wrong. It seems like we're just adding a random halt. Why bother?
In tests it is not just random condition. Every time cmos_smp_count < CountCPUs it hits this halt to prevent qemu process consume 98% cpu time. It happens for hot-plug case if emulator doesn't update cmos_smp_count on vcpu hotplug (relatively easy to fix in emulator) and for hot-unplug case when emulator updates cmos_smp_count correctly but unplugged vcpus aren't destroyed (it will remain so for some time since it need to be fixed in qemu and kvm code). So reason for this patch is to take in account a buggy emulator and not to ruin host if something went wrong.
[...]
--- a/src/smp.c +++ b/src/smp.c @@ -115,8 +115,14 @@ smp_probe(void) msleep(10); } else { u8 cmos_smp_count = inb_cmos(CMOS_BIOS_SMP_COUNT);
while (cmos_smp_count + 1 != readl(&CountCPUs))
while (cmos_smp_count + 1 != readl(&CountCPUs)) {
if (cmos_smp_count + 1< readl(&CountCPUs)) {
dprintf(1, "BUG: Expected %d cpu(s) but %d cpus started\n",
cmos_smp_count + 1, readl(&CountCPUs));
hlt();
The panic() call should be used instead of dprintf()/hlt().
Thanks, panic does the same but is more cleaner.
However Gleb've raised a valid question, whether we could move smp_probe to the point where console is available and print error message to it.
Would it be safe to do something like this from the POV of option roms?
diff --git a/src/post.c b/src/post.c index 8383b79..69b812e 100644 --- a/src/post.c +++ b/src/post.c @@ -240,20 +240,21 @@ maininit(void) if (CONFIG_THREADS && CONFIG_THREAD_OPTIONROMS) init_hw();
- // Find and initialize other cpus - smp_probe(); - // Setup interfaces that option roms may need bios32_setup(); pmm_setup(); pnp_setup(); kbd_setup(); mouse_setup(); - init_bios_tables();
// Run vga option rom vga_setup();
+ // Find and initialize other cpus + smp_probe(); //print an error to console if something went wrong + + init_bios_tables(); + // Do hardware initialization (if running synchronously) if (!CONFIG_THREADS || !CONFIG_THREAD_OPTIONROMS) { init_hw();
If it is possible, then lets do it instead of 'silent' halt, and if not then do our best and stop with panic printing only to debug port/console.
-Kevin