I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
This is a dual Opteron system.
Thanks,
Steve
On Dec 20, 2007 3:30 PM, Steve Isaacs yasteve@gmail.com wrote:
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
Weird. That should be near-instantaneous. Do you have both CPUs installed, and do they both have memory attached?
ron
On Thu, 2007-12-20 at 15:38 -0800, ron minnich wrote:
On Dec 20, 2007 3:30 PM, Steve Isaacs yasteve@gmail.com wrote:
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
Weird. That should be near-instantaneous. Do you have both CPUs installed, and do they both have memory attached?
ron
Yes, and later, RAM initializes and execution continues with CAR turned off.
Steve
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
Not for me on Tyan s2895 (Dual Opteron NVidia CK804.)
Myles
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
This is a dual Opteron system.
Thanks,
Steve
On Thu, 2007-12-20 at 16:40 -0700, Myles Watson wrote:
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
Not for me on Tyan s2895 (Dual Opteron NVidia CK804.)
Myles
Thanks,
hhhmmm.... Looks like I've got a problem.
Steve
Steve Isaacs wrote:
On Thu, 2007-12-20 at 16:40 -0700, Myles Watson wrote:
I'm seeing delays between each of the following messages -- approximately 8 seconds. Is this normal?
Not for me on Tyan s2895 (Dual Opteron NVidia CK804.)
Myles
Thanks,
hhhmmm.... Looks like I've got a problem.
Steve
Steve,
core0 started: 01 *01*00000001 *02*00000001 *03*00000001
This looks like there is a problem starting the cores other than core 0 on each node.
Try setting CONFIG_LOGICAL_CPUS to 0 and see if that makes the delay go away. I am assuming that is where the problem is, since the *01*00000001 message is probably coming from wait_ap_started().
I will attempt to give you an overview of how I think this works. First, each AP's APIC is used to hold flags about the AP (core) state. This is the only good way to communicate between cores (sharing memory can get very messy with CAR). The other flags are the BIOS/INIT/COLD reset flags, which are set when each node's core0 executes the ROM; see distinguish_cpu_resets(). The core0 execution is done near the end of setup_coherent_ht_domain(). Then the BSP (node0 core0) waits in wait_all_core0_started() for node1 core0 to execute and loop in init_cpus(). This seems to be working, based on the message "core0 started: 01".
Next come start_other_cores() and wait_all_other_cores_started(bsp_apicid). This is where the long delay is probably happening, as the cores are off running and the BSP is waiting. I don't have any idea why it is stuck waiting, but you should get some insight by instrumenting init_cpus(), wait_cpu_state(), and wait_ap_started().
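To make the waiting step concrete, here is a minimal sketch of a bounded polling loop of the kind described above. It is not the LinuxBIOS code: fake_apic_state, read_ap_state(), wait_for_state(), STATE_STARTED, and the poll limit are all hypothetical names and values chosen for illustration, and a plain array stands in for the per-core APIC register. The point is only that a core which never posts its "started" flag costs the BSP the full timeout, which is one way a multi-second gap between messages could arise.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES     4u     /* hypothetical number of cores in this sketch */
#define STATE_STARTED 0x33u  /* hypothetical "AP started" flag value */

/* Stand-in for the per-core APIC register that holds the state flags.
 * Real firmware would read the remote core's APIC instead of an array. */
static uint32_t fake_apic_state[MAX_CORES];

/* Read the low byte of the (simulated) state register for one core. */
static uint32_t read_ap_state(unsigned apicid)
{
	assert(apicid < MAX_CORES);
	return fake_apic_state[apicid] & 0xffu;
}

/* Poll until the core reports `state` or `limit` polls have been spent.
 * Returns 0 on success, or `limit` if the core never reported the state.
 * Printing the return value at the call site is a cheap way to see which
 * core the BSP is stuck waiting on. */
static unsigned wait_for_state(unsigned apicid, uint32_t state, unsigned limit)
{
	for (unsigned polls = 0; polls < limit; polls++) {
		if (read_ap_state(apicid) == state)
			return 0;
	}
	return limit;
}
```

With this shape, a core that has set its flag returns immediately, while a core that never starts burns every poll before the caller moves on; instrumenting the real wait functions in the same way should show which case applies.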
Marc
On Fri, 2007-12-21 at 14:15 -0700, Marc Jones wrote:
Try setting CONFIG_LOGICAL_CPUS to 0 and see if that makes the delay go away. I am assuming that is where the problem is, since the *01*00000001 message is probably coming from wait_ap_started().
I'll give it a try. This definitely had an effect. I'm seeing these messages with only one delay now:
02 nodes initialized. *02*00000001
I will attempt to give you an overview of how I think this works. First, each AP's APIC is used to hold flags about the AP (core) state. This is the only good way to communicate between cores (sharing memory can get very messy with CAR). The other flags are the BIOS/INIT/COLD reset flags, which are set when each node's core0 executes the ROM; see distinguish_cpu_resets(). The core0 execution is done near the end of setup_coherent_ht_domain(). Then the BSP (node0 core0) waits in wait_all_core0_started() for node1 core0 to execute and loop in init_cpus(). This seems to be working, based on the message "core0 started: 01".
Next come start_other_cores() and wait_all_other_cores_started(bsp_apicid). This is where the long delay is probably happening, as the cores are off running and the BSP is waiting. I don't have any idea why it is stuck waiting, but you should get some insight by instrumenting init_cpus(), wait_cpu_state(), and wait_ap_started().
Thanks for this insight. I didn't realize the importance of the APIC. I'll follow this logic through and see what I can learn.
Steve