Hi all,
I'm trying to port the h8dme to fam10. My test hardware has two quad-core CPUs and 32 GB of RAM. I'm seeing a hang very early during boot:
---------------------------------------
coreboot-2.0.0-r4978:4979M_h8dme_fam10_Fallback Wed Dec 16 17:29:28 EST 2009 starting...
BSP Family_Model: 00100f42
*sysinfo range: [000cc000,000cdfa0]
bsp_apicid = 00
cpu_init_detectedx = 00000000
microcode: equivalent rev id = 0x1041, current patch id = 0x00000000
microcode: rev id (1062) does not match this patch.
microcode: Not updated! Fix microcode_updates[]
cpuSetAMDMSR done
Enter amd_ht_init()
AMD_CB_EventNotify()
  event class: 05
  event: 1004
  data: 04 00 00 01
AMD_CB_ManualBUIDSwapList()
AMD_CB_EventNotify()
  event class: 05
  event: 2006
  data: 04 00 02 ff
Exit amd_ht_init()
cpuSetAMDPCI 00 done
cpuSetAMDPCI 01 done
Prep FID/VID Node:00
  F3x80: e600a681
  F3x84: a0e641e6
  F3xD4: c3310f24
  F3xD8: 03001615
  F3xDC: 00005322
Prep FID/VID Node:01
  F3x80: e600a681
  F3x84: a0e641e6
  F3xD4: c3310f24
  F3xD8: 03001615
  F3xDC: 00005322
setup_remote_node: 01 done
Start node 01 done.
bsp_apicid=00
pre wait_all_co
---------------------------------------
It hangs in the middle of that output. This is against svn head. I'm building with xgcc 4.4.1 (same problem with stock Ubuntu gcc 3.4). I'm not sure what I'm doing wrong; any pointers to things that I could try?
I've uploaded the code I have so far here, in case you're interested:
http://ward.vandewege.net/coreboot/h8dme/fam10/src-mainboard-supermicro-h8dm...
http://ward.vandewege.net/coreboot/h8dme/fam10/targets-supermicro-h8dme_fam1...
Thanks, Ward.
On Fri, Dec 18, 2009 at 10:54 AM, Ward Vandewege ward@gnu.org wrote:
Start node 01 done. bsp_apicid=00 pre wait_all_co
I would check the early startup code for the remote nodes. It looks like you hang right where it gets garbled for Knut.
Thanks, Myles
On Fri, Dec 18, 2009 at 11:38:46AM -0700, Myles Watson wrote:
I would check the early startup code for the remote nodes. It looks like you hang right where it gets garbled for Knut.
Hmm, yeah. So I tried enabling just one core by setting
default CONFIG_MAX_PHYSICAL_CPUS=1
default CONFIG_MAX_CPUS=1 * CONFIG_MAX_PHYSICAL_CPUS
default CONFIG_LOGICAL_CPUS=1
and by adding a return at the very beginning of start_other_cores in
cpu/amd/quadcore/quadcore.c
which gets me a bit further, but not much. It hangs in the mcp55_early_pcie_setup function in
southbridge/nvidia/mcp55/mcp55_early_setup_car.c
Log attached. Anything else I should try?
Thanks, Ward.
On Fri, Dec 18, 2009 at 11:38:46AM -0700, Myles Watson wrote:
I would check the early startup code for the remote nodes. It looks like you hang right where it gets garbled for Knut.
Hmm, yeah. So I tried enabling just one core by setting
default CONFIG_MAX_PHYSICAL_CPUS=1
default CONFIG_MAX_CPUS=1 * CONFIG_MAX_PHYSICAL_CPUS
default CONFIG_LOGICAL_CPUS=1
I think you need to set CONFIG_LOGICAL_CPUS=0 to disable the siblings. I got it to compile by moving the #endif that sits above the nb_ function it can't find, so that the #if block also covers that function.
* AP 02 didn't start timeout:00000001
* AP 03 didn't start timeout:00000001

Begin FIDVID MSR 0xc0010071 0x30ae00a3 0x40034c40
FIDVID on BSP, APIC_id: 00
BSP fid = 10600
Wait for AP stage 1: ap_apicid = 1
fidvid_bsp_stage1: time out while reading from ap 01
Wait for AP stage 1: ap_apicid = 2
fidvid_bsp_stage1: time out while reading from ap 02
Wait for AP stage 1: ap_apicid = 3
It's still trying to start the APs.
and by adding a return at the very beginning of start_other_cores in
cpu/amd/quadcore/quadcore.c
which gets me a bit further, but not much. It hangs in the mcp55_early_pcie_setup function in
southbridge/nvidia/mcp55/mcp55_early_setup_car.c
Log attached. Anything else I should try?
If inl and outl are hanging, I would dump the routing registers and read the device's IDs to see what's going wrong. I'm not very familiar with how the fam10 code works, but dumping the routing registers should be mostly cut and paste from the k8/util.c code.
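A small sketch of the "read the device's IDs" step, as one way to interpret what comes back: config offset 0x00 of any PCI function holds the vendor ID in bits 15:0 and the device ID in bits 31:16, and a read that reaches nothing returns all ones. The struct and function names are hypothetical; the raw dword would come from whatever config-read helper the early code already uses (such as the pci_read_config32() call used later in this thread).

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor and device ID as stored in the first config dword of a PCI function. */
struct pci_id {
	uint16_t vendor; /* bits 15:0 */
	uint16_t device; /* bits 31:16 */
};

/* Split the dword read from config offset 0x00 into vendor and device IDs. */
static struct pci_id decode_pci_id(uint32_t dword)
{
	struct pci_id id;
	id.vendor = (uint16_t)(dword & 0xffffu);
	id.device = (uint16_t)(dword >> 16);
	return id;
}

/* A read from an absent or unreachable function returns all ones, so the
 * vendor field comes back as 0xffff. This sketch also rejects vendor
 * 0x0000, which is not a valid vendor ID. */
static bool pci_id_present(struct pci_id id)
{
	return id.vendor != 0xffffu && id.vendor != 0x0000u;
}
```

On an AMD node, function 0 at device 0x18 should report vendor 0x1022; if the same read returns 0xffffffff, the request is not reaching the device.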
Thanks, Myles
On Tue, Dec 22, 2009 at 02:30:21PM -0700, Myles Watson wrote:
It's still trying to start the APs.
Right. Setting CONFIG_LOGICAL_CPUS to zero, and making sure that the CONFIG_LOGICAL_CPUS conditional at the top of northbridge.c does not apply, fixed that.
Should this go into the tree?
--- northbridge/amd/amdfam10/northbridge.c (revision 4978)
+++ northbridge/amd/amdfam10/northbridge.c (working copy)
@@ -31,10 +31,10 @@
 #include <cpu/x86/lapic.h>
-#if CONFIG_LOGICAL_CPUS==1
 #include <cpu/amd/quadcore.h>
 #include <pc80/mc146818rtc.h>
-#endif
 #include "chip.h"
 #include "root_complex/chip.h"
If inl and outl are hanging, I would dump the routing registers and read the device's IDs to see what's going wrong. I'm not very familiar with how the fam10 code works, but dumping the routing registers should be mostly cut and paste from the k8/util.c code.
Right. I've done that - log attached. I'm dumping with
showallroutes(BIOS_DEBUG, PCI_DEV(0, 0x18, 1));
I'm not sure what to make of the dump though (attached).
Thanks, Ward.
Right. Setting CONFIG_LOGICAL_CPUS to zero, and making sure that the CONFIG_LOGICAL_CPUS conditional at the top of northbridge.c does not apply, fixed that.
Should this go into the tree?
I like just moving the #endif to protect nb_cfg_54, if that works. It compiles for me.
--- northbridge/amd/amdfam10/northbridge.c (revision 4978)
+++ northbridge/amd/amdfam10/northbridge.c (working copy)
@@ -1235,7 +1235,6 @@
 disable_siblings = !CONFIG_LOGICAL_CPUS;
 #if CONFIG_LOGICAL_CPUS == 1
 get_option(&disable_siblings, "quad_core");
-#endif
 // for pre_e0, nb_cfg_54 can not be set, ( even set, when you read it
 // still be 0)
@@ -1243,6 +1242,7 @@
 // and differ d0 and e0 single core
 nb_cfg_54 = read_nb_cfg_54();
+#endif
 #if CONFIG_CBB
 dev_mc = dev_find_slot(0, PCI_DEVFN(CONFIG_CDB, 0)); //0x00
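The idea behind moving the #endif can be shown in isolation: any call to a helper that only exists when CONFIG_LOGICAL_CPUS == 1 has to sit under the same #if as that helper's declaration. This is a standalone mock-up, not the northbridge.c code; the *_stub helpers are hypothetical stand-ins for get_option() and read_nb_cfg_54(), and the zero default for nb_cfg_54 is this sketch's own assumption.

```c
/* Default to the single-core setting used in this thread; override with
 * -DCONFIG_LOGICAL_CPUS=1 to compile the sibling code back in. */
#ifndef CONFIG_LOGICAL_CPUS
#define CONFIG_LOGICAL_CPUS 0
#endif

#if CONFIG_LOGICAL_CPUS == 1
/* Hypothetical stand-ins for helpers that are only declared when the
 * quad-core headers are included. */
static int get_option_stub(int *value) { *value = 0; return 0; }
static unsigned read_nb_cfg_54_stub(void) { return 1; }
#endif

struct sibling_cfg {
	int disable_siblings;
	unsigned nb_cfg_54;
};

static struct sibling_cfg probe_siblings(void)
{
	struct sibling_cfg cfg;
	cfg.disable_siblings = !CONFIG_LOGICAL_CPUS;
	cfg.nb_cfg_54 = 0; /* assumed default when the helper is compiled out */
#if CONFIG_LOGICAL_CPUS == 1
	get_option_stub(&cfg.disable_siblings);
	/* Guarded by the same condition as the helper's definition, so this
	 * never references a symbol that was compiled out. */
	cfg.nb_cfg_54 = read_nb_cfg_54_stub();
#endif
	return cfg;
}
```

With the default of 0, probe_siblings() returns disable_siblings = 1 and nb_cfg_54 = 0 without ever referencing the stubs.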
Right. I've done that - log attached. I'm dumping with
showallroutes(BIOS_DEBUG, PCI_DEV(0, 0x18, 1));
I'm not sure what to make of the dump though (attached).
MMIO(b8)0000000000-31a4f2ffff, ->(0,1), , , CPU disable 0, Lock 0, Non posted 0
This is broken, but I'm not sure if it's the dump or the register value. It shouldn't affect the IO, though; that register looked fine. Since you can't start the other processors or complete the mcp55 init, it seems like IO is broken for you.
You could print out PCI_DEV(0,0x18,0) @ 0x6C to make sure that the lower bits are what you expect. The ones I'd look at are the default link (bits 11, 3 and 2) and the routing table disable bit (bit 0).
The default link should be 2. The routing table disable bit tells you whether it matters that the routing registers are messed up.
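When reading that register by hand, it can help to see what the access looks like on the wire. Under PCI configuration mechanism #1, a config read becomes an outl of an address to I/O port 0xCF8 followed by an inl from port 0xCFC. The helper name below is hypothetical; it only builds the 0xCF8 address and covers the standard 256-byte config space, not AMD's extended config registers.

```c
#include <stdint.h>

/* Build the dword written to I/O port 0xCF8 (configuration mechanism #1):
 * bit 31 enables the access, then bus in bits 23:16, device in 15:11,
 * function in 10:8 and the dword-aligned register offset in 7:2.
 * The data itself is then read from port 0xCFC. */
static uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
	return 0x80000000u
	     | ((uint32_t)bus << 16)
	     | ((uint32_t)(dev & 0x1fu) << 11)
	     | ((uint32_t)(fn & 0x07u) << 8)
	     | (uint32_t)(reg & 0xfcu);
}
```

For example, pci_cf8_address(0, 0x18, 0, 0x6c) evaluates to 0x8000C06C: bus 0, device 0x18 (node 0), function 0, register 0x6C.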
Thanks, Myles
On Tue, Dec 22, 2009 at 04:11:06PM -0700, Myles Watson wrote:
I like just moving the #endif to protect nb_cfg_54, if that works. It compiles for me.
OK - with that patch it builds and boots, and the output looks similar but not identical. See
http://ward.vandewege.net/coreboot/h8dme/fam10/minicom-20091222af-ram-on-bot...
The only difference is this:
-MMIO(b8)0000000000-31a4f2ffff, ->(0,1), , , CPU disable 0, Lock 0, Non posted 0
+MMIO(b8)0000000000-31a6b2ffff, ->(0,1), , , CPU disable 0, Lock 0, Non posted 1
which may be entirely unrelated?
I'll look at that other register tomorrow.
Thanks! Ward.
On Tue, Dec 22, 2009 at 04:11:06PM -0700, Myles Watson wrote:
You could print out PCI_DEV(0,0x18,0) @ 0x6C to make sure that the lower bits are what you expect. The ones I'd look at are the default link (bits 11, 3 and 2) and the routing table disable bit (bit 0).
The default link should be 2. The routing table disable bit tells you whether it matters that the routing registers are messed up.
Hrm. If I'm reading that right with this code
u32 xxx = pci_read_config32(PCI_DEV(0, 0x18, 0), 0x6c);
printk_debug("0x%04x\n",xxx);
then what comes out does not look very good:
0xf870
which is
1111100001110000
So the default link is 0, and the Routing Table Disable bit is set to zero.
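The same decoding can be done mechanically. This sketch extracts only the fields mentioned in this thread, using the bit positions given in the messages (bit 0 routing table disable, bits 3:2 default link, bit 11, bits 15:12). The struct and function names are hypothetical, and the field meanings should be checked against the BKDG revision at hand.

```c
#include <stdint.h>

/* Fields of F0x6C as discussed in this thread. */
struct f0x6c_fields {
	unsigned route_tbl_dis; /* bit 0: routing table disable */
	unsigned def_lnk_3_2;   /* bits 3:2: default link */
	unsigned bit11;         /* bit 11: listed with the default link above; BKDG status varies */
	unsigned bits_15_12;    /* bits 15:12: expected to be reserved */
};

static struct f0x6c_fields decode_f0x6c(uint32_t v)
{
	struct f0x6c_fields f;
	f.route_tbl_dis = v & 0x1u;
	f.def_lnk_3_2 = (v >> 2) & 0x3u;
	f.bit11 = (v >> 11) & 0x1u;
	f.bits_15_12 = (v >> 12) & 0xfu;
	return f;
}
```

decode_f0x6c(0xf870) yields route_tbl_dis = 0, def_lnk_3_2 = 0, bit11 = 1 and bits_15_12 = 0xf, matching the bit string above.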
You mentioned bit 11 - that seems to be marked as 'reserved' in the BKDG for fam10?
Thanks, Ward.
Hrm. If I'm reading that right with this code
u32 xxx = pci_read_config32(PCI_DEV(0, 0x18, 0), 0x6c);
printk_debug("0x%04x\n",xxx);
That looks right.
then what comes out does not look very good:
0xf870
which is
1111100001110000
That looks pretty broken. Bits 12-15 are reserved according to the BKDG I have. Maybe you should try printing those values earlier? I don't know what would produce such strange values.
So the default link is 0, and the Routing Table Disable bit is set to zero.
You mentioned bit 11 - that seems to be marked as 'reserved' in the BKDG for fam10?
In the version I have, it's marked as read-only.
Thanks, Myles