On Fri, 2007-12-21 at 14:15 -0700, Marc Jones wrote:
Try setting CONFIG_LOGICAL_CPUS to 0 and see if that makes the delay go away. I am going to assume that is where the problem is, since the *01*00000001 message is probably coming from wait_ap_started().
I'll give it a try. This definitely had an effect. I'm seeing these with only one delay now.
02 nodes initialized. *02*00000001
I will attempt to give you an overview of how I think this works. First, each AP's APIC is used to hold flags about the AP (core) state. This is the only good way to communicate between cores (sharing memory can get very messy with CAR). The other flags are the BIOS/INIT/COLD reset flags, which are set when each node's core0 executes the ROM; see distinguish_cpu_resets(). The core0 execution is done near the end of setup_coherent_ht_domain(). Then the BSP, node0core0, waits in wait_all_core0_started() for node1core0 to execute and loop in init_cpus(). This seems to be working, based on the message "core0 started: 01".
Next is start_other_cores() and wait_all_other_cores_started(bsp_apicid). This is where the long delay is probably happening, as the cores are off running and the BSP is waiting. I don't have any idea why it is stuck waiting, but you should get some insight by instrumenting init_cpus(), wait_cpu_state() and wait_ap_started().
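As a rough illustration of the kind of instrumentation meant above, here is a minimal host-side sketch in C. It is not the actual code from the tree: the array fake_apic_state stands in for the per-core APIC register that holds the state flags, and set_core_state(), wait_core_state(), wait_all_other_cores(), STATE_STARTED and WAIT_TIMEOUT are all hypothetical names and values chosen for the example. The idea is only to show a bounded poll that reports how long each core took, or what state it was stuck in, instead of waiting silently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CORES     8
#define STATE_STARTED 0x01u  /* hypothetical flag value, not the real encoding */
#define WAIT_TIMEOUT  1000ul /* hypothetical poll limit */

/* Stand-in for the per-core APIC register that holds the state flags. */
static volatile uint32_t fake_apic_state[MAX_CORES];

/* Hypothetical helper: a core publishes its state. */
static void set_core_state(unsigned apicid, uint32_t state)
{
	assert(apicid < MAX_CORES);
	fake_apic_state[apicid] = state;
}

/*
 * Poll one core's state until all bits in 'state' are set or 'limit'
 * polls have been made. Returns the number of polls that found the
 * state missing before it appeared, or -1 on timeout.
 */
static long wait_core_state(unsigned apicid, uint32_t state, unsigned long limit)
{
	unsigned long polls;

	assert(apicid < MAX_CORES);
	for (polls = 0; polls < limit; polls++) {
		if ((fake_apic_state[apicid] & state) == state)
			return (long)polls;
	}
	return -1;
}

/*
 * Wait for every core except the BSP and log the result for each one.
 * Returns the number of cores that never reported STATE_STARTED.
 */
static unsigned wait_all_other_cores(unsigned bsp_apicid, unsigned ncores)
{
	unsigned apicid, stuck = 0;

	assert(ncores <= MAX_CORES);
	for (apicid = 0; apicid < ncores; apicid++) {
		long polls;

		if (apicid == bsp_apicid)
			continue;

		polls = wait_core_state(apicid, STATE_STARTED, WAIT_TIMEOUT);
		if (polls < 0) {
			printf("core %02x: timed out, state=%08x\n",
			       apicid, (unsigned)fake_apic_state[apicid]);
			stuck++;
		} else {
			printf("core %02x: started after %ld polls\n",
			       apicid, polls);
		}
	}
	return stuck;
}
```

In the real firmware the read would go to the remote core's APIC rather than an array, and the output would go through the console print routines rather than printf, but logging the poll count and the last observed state per APIC ID should be enough to tell which core the BSP is waiting on.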
Thanks for this insight. I didn't realize the importance of the APIC. I'll follow this logic through and see what I can learn.
Steve