Steve Isaacs wrote:
On Thu, 2007-12-20 at 16:40 -0700, Myles Watson wrote:
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
Not for me on Tyan s2895 (Dual Opteron NVidia CK804.)
Myles
Thanks,
hhhmmm.... Looks like I've got a problem.
Steve
Steve,
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
This looks like a problem with starting the cores other than core 0 on each node.
Try setting CONFIG_LOGICAL_CPUS to 0 and see if that makes the delay go away. I am assuming that is where the problem is, because the *01*00000001 message is probably coming from wait_ap_started().
I will attempt to give you an overview of how I think this works. First, each AP's APIC is used to hold flags about the AP (core) state. This is the only good way to communicate between cores (sharing memory can get very messy with CAR). The other flags are the BIOS/INIT/COLD reset flags, which are set when each node's core0 executes the ROM; see distinguish_cpu_resets(). The core0 execution is done near the end of setup_coherent_ht_domain(). Then the BSP, node0 core0, waits in wait_all_core0_started() for node1 core0 to execute and loop in init_cpus(). This seems to be working, based on the message "core0 started: 01".
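To make the flag idea concrete, here is a rough sketch of that kind of handshake. It is only a model, not the actual LinuxBIOS code: the array fake_apic_reg stands in for the per-core APIC register, and the names ap_publish_state() and bsp_wait_state(), the bit layout and the state values are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 32-bit state word per core, standing in for
 * the per-core APIC register. Layout and values are assumptions. */
#define MAX_CORES         8
#define STATE_MASK        0xffu
#define STATE_NOT_STARTED 0x00u
#define STATE_STARTED     0x01u

static volatile uint32_t fake_apic_reg[MAX_CORES];

/* AP side: publish the core's state in the low byte of its own word,
 * leaving the upper bits untouched. */
static void ap_publish_state(unsigned apicid, uint32_t state)
{
    assert(apicid < MAX_CORES);
    fake_apic_reg[apicid] =
        (fake_apic_reg[apicid] & ~STATE_MASK) | (state & STATE_MASK);
}

/* BSP side: poll the AP's word until it shows the expected state.
 * Returns 0 on success, 1 if max_polls reads went by without a match. */
static int bsp_wait_state(unsigned apicid, uint32_t state, unsigned max_polls)
{
    assert(apicid < MAX_CORES);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((fake_apic_reg[apicid] & STATE_MASK) == state)
            return 0;
    }
    return 1;
}
```

In this model, a BSP that calls bsp_wait_state() before the AP has called ap_publish_state() burns through all of its polls and reports a timeout, which is the sort of behaviour that would show up as a long delay per core.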
Next come start_other_cores() and wait_all_other_cores_started(bsp_apicid). This is probably where the long delay is happening: the cores are off running and the BSP is waiting. I don't have any idea why it is stuck waiting, but you should get some insight by instrumenting init_cpus(), wait_cpu_state() and wait_ap_started().
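As an idea of what the instrumentation could look like, here is a minimal standalone sketch of a traced wait loop. It does not use the real wait_cpu_state() signature: the read hook, the wait_result struct and both function names are hypothetical, and the output format only loosely imitates the *01*00000001 style of message. The point is to record how many polls were spent and what value was last read back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical hook standing in for whatever reads the AP's state word. */
typedef uint32_t (*read_state_fn)(unsigned apicid);

struct wait_result {
    int timed_out;      /* 1 if the expected state never appeared */
    unsigned polls;     /* number of reads performed */
    uint32_t readback;  /* last value read from the AP */
};

/* Poll up to max_polls times, keeping the poll count and last readback. */
static struct wait_result wait_state_traced(unsigned apicid, uint32_t expected,
                                            unsigned max_polls, read_state_fn rd)
{
    struct wait_result r = { 1, 0, 0 };

    assert(rd != NULL);
    while (r.polls < max_polls) {
        r.readback = rd(apicid);
        r.polls++;
        if (r.readback == expected) {
            r.timed_out = 0;
            break;
        }
    }
    return r;
}

/* Format one trace entry: "*id*readback" on timeout, " id" on success,
 * followed by the poll count. Returns what snprintf() returns. */
static int format_wait_result(char *buf, size_t len, unsigned apicid,
                              const struct wait_result *r)
{
    if (r->timed_out)
        return snprintf(buf, len, "*%02x*%08x polls=%u",
                        apicid, (unsigned)r->readback, r->polls);
    return snprintf(buf, len, " %02x polls=%u", apicid, r->polls);
}

/* Test stub: an AP whose state word never changes from 1. */
static uint32_t stuck_reader(unsigned apicid)
{
    (void)apicid;
    return 0x00000001u;
}
```

If every AP uses up the full poll budget and the readback never changes, the AP is probably not getting as far as updating its state at all. If the readback changes but never reaches the expected value, the AP is running but stalling somewhere in init_cpus().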
Marc