Hi all,
I've reached the point where LB is attempting to start the secondary CPU on my custom board. As near as I can tell, LB thinks the secondary CPU isn't responding. This could be a hardware problem, or an LB issue of some sort.
  CPU 6 would not start!
  CPU 6 did not initialize!
First thing I wanted to check was that LB is using the right APIC ID in the messages it's sending. But I haven't been able to find a good explanation anywhere of how local APIC IDs are assigned in dual-Xeon systems. All Intel says with any certainty is that the power-up sequence includes setting a unique ID for each LAPIC. There is a format to the IDs (i.e. some bits for logical processor, some for socket, some for APIC cluster), but nothing that says (for instance) "One Xeon chip is assigned LAPIC IDs 0 & 1 and the second is assigned LAPIC IDs 6 & 7".
Looking over the s2735 config, and anything I found on the Internet, it seems that this is how dual-Xeon boards work. Does anyone know why? Do the addresses come out like this any time you have a dual-Xeon board, and every time you power it up? Or are there some pins that have to be strapped a certain way to end up with these IDs? (Side note, some Xeons support reset-time config of APIC cluster ID - mine don't). The way I'm interpreting the Intel documentation, it looks like the only way to find out a secondary CPU's LAPIC ID is to start it (via an IPI broadcast?) and have it write it somewhere.
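To make the ID format mentioned above concrete, here is a minimal sketch in C that splits an 8-bit LAPIC ID into logical-processor, package, and cluster fields. The field widths (1 bit, 2 bits, 5 bits) and the names `apic_id_fields` and `decode_apic_id` are assumptions chosen for illustration; they are not taken from LB or from the Intel documentation, and the real widths depend on the processor.

```c
#include <stdint.h>

/* Hypothetical field layout, assumed for illustration only:
 *   bit 0      logical processor within a package (Hyper-Threading sibling)
 *   bits 2:1   package (socket) within a cluster
 *   bits 7:3   cluster
 * Check the processor documentation for the real field widths. */
struct apic_id_fields {
    unsigned logical;
    unsigned package;
    unsigned cluster;
};

/* Split an 8-bit LAPIC ID into the assumed fields. */
static struct apic_id_fields decode_apic_id(uint8_t id)
{
    struct apic_id_fields f;
    f.logical = id & 0x1u;          /* bit 0 */
    f.package = (id >> 1) & 0x3u;   /* bits 2:1 */
    f.cluster = (id >> 3) & 0x1fu;  /* bits 7:3 */
    return f;
}
```

Under this assumed layout, IDs 0 and 1 decode to package 0 and IDs 6 and 7 decode to package 3, each with logical processors 0 and 1. That would be consistent with the IDs seen on dual-Xeon boards, but it does not explain how the package field gets its value at power-up.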
Thanks,
Steve
www.digidescorp.com
On Wed, 29 Jun 2005, Steve Magnani wrote:
> I've reached the point where LB is attempting to start the secondary CPU on my custom board. As near as I can tell, LB thinks the secondary CPU isn't responding. This could be a hardware problem, or an LB issue of some sort.
hmm. I think it should be 6, IIRC.
There was a problem once, on another product, where the two wires for LAPIC got reversed and caused all kinds of fun. I just can't remember all the details. But you might want to double-, triple-, and friple-check your connections.
> cluster), but nothing that says (for instance) "One Xeon chip is assigned LAPIC IDs 0 & 1 and the second is assigned LAPIC IDs 6 & 7".
it always seems to work out that way ...
ron
I haven't completely unraveled this yet, but I'm pretty sure that one reason my second Xeon chip won't start is a nested spin_lock within the 2nd chip's execution path.
Since my board is similar to the Tyan s2735 I use the same setting, CONFIG_MAX_CPUS = 4. If I am tracing the code correctly, I believe that results in the following sequence of calls to bring up the CPUs. The "CPU #x" header shows which logical CPU executes each set of calls.
CPU #0 (APIC ID 0, Xeon Chip 0, Logical processor 0):
  northbridge.c: cpu_bus_init
    lapic_cpu_init.c: initialize_cpus
      cpu.c: cpu_initialize
        model_f2x_init.c: model_f2x_init
          intel_sibling.c: intel_sibling_init
            lapic_cpu_init.c: start_cpu (APIC ID 1)
              spin_lock(&start_cpu_lock)
              lapic_cpu_init.c: lapic_start_cpu
              spin_unlock(&start_cpu_lock)
      lapic_cpu_init.c: initialize_other_cpus
        lapic_cpu_init.c: start_cpu (APIC ID 6)
          spin_lock(&start_cpu_lock)
          lapic_cpu_init.c: lapic_start_cpu
          spin_unlock(&start_cpu_lock)

CPU #1 (APIC ID 1, Xeon Chip 0, Logical processor 1):
  lapic_cpu_init.c: secondary_cpu_init
    spin_lock(&start_cpu_lock)
    cpu.c: cpu_initialize
      model_f2x_init.c: model_f2x_init
        intel_sibling.c: intel_sibling_init (no-op)
    spin_unlock(&start_cpu_lock)
    lapic.h: stop_this_cpu

CPU #2 (APIC ID 6, Xeon Chip 1, Logical processor 0):
  lapic_cpu_init.c: secondary_cpu_init
    spin_lock(&start_cpu_lock)
    cpu.c: cpu_initialize
      model_f2x_init.c: model_f2x_init
        intel_sibling.c: intel_sibling_init
          lapic_cpu_init.c: start_cpu (APIC ID 7)
            spin_lock(&start_cpu_lock)
            lapic_cpu_init.c: lapic_start_cpu
            spin_unlock(&start_cpu_lock)
    spin_unlock(&start_cpu_lock)
    lapic.h: stop_this_cpu

CPU #3 (APIC ID 7, Xeon Chip 1, Logical processor 1):
  lapic_cpu_init.c: secondary_cpu_init
    spin_lock(&start_cpu_lock)
    cpu.c: cpu_initialize
      model_f2x_init.c: model_f2x_init
        intel_sibling.c: intel_sibling_init (no-op)
    spin_unlock(&start_cpu_lock)
    lapic.h: stop_this_cpu
It sure looks to me like the CPU #2 call sequence tries to grab the start_cpu_lock twice.
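The suspected double acquisition can be modeled with a small C sketch. This is not the LB code: the `sim_*` names are hypothetical, the lock is a plain non-recursive flag built on C11 `atomic_flag`, and a try-lock stands in for `spin_lock` so that the self-deadlock shows up as a `false` return instead of an endless spin.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal non-recursive lock model (hypothetical, not the LB spinlock). */
typedef struct { atomic_flag held; } sim_lock_t;

#define SIM_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once instead of spinning, so a self-deadlock is reported as false. */
static bool sim_trylock(sim_lock_t *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void sim_unlock(sim_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Models start_cpu(): takes the lock around the (omitted) startup IPI. */
static bool sim_start_cpu(sim_lock_t *l)
{
    if (!sim_trylock(l))
        return false;   /* a real spin_lock() would spin here forever */
    /* lapic_start_cpu() would run here */
    sim_unlock(l);
    return true;
}

/* Models secondary_cpu_init() on a CPU whose init path may start a sibling. */
static bool sim_secondary_cpu_init(sim_lock_t *l, bool starts_sibling)
{
    bool ok = true;

    if (!sim_trylock(l))
        return false;
    if (starts_sibling)
        ok = sim_start_cpu(l);  /* nested acquisition of the same lock */
    sim_unlock(l);
    return ok;
}
```

With a lock declared as `sim_lock_t l = SIM_LOCK_INIT;`, calling `sim_secondary_cpu_init(&l, false)` models the CPU #1 and CPU #3 paths and succeeds, while `sim_secondary_cpu_init(&l, true)` models the CPU #2 path and fails at the inner acquisition. If the real lock is non-recursive, that inner acquisition is the point where CPU #2 would hang.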
Steve