I'm able to boot Linux just fine on my MCP55-based board with non-ECC memory, but when I install ECC DIMMs I get a Machine Check Exception from Linux about half the time:
CPU 0: Machine Check Exception: 0000000000000004
Bank 4: f6302001d8080813 at 00000000021fbd60
Kernel panic - not syncing: CPU context corrupt
I haven't yet tried different ECC DIMMs to eliminate the possibility of a bad part.
I suspect it's a software problem though. When I run memtest86+, it claims the memory is non-ECC, and the machine reboots itself after about 10 minutes of testing.
I'm wondering whether LinuxBIOS is setting up memory properly. The board has a single two-core AMD Athlon 64 X2, identified by LinuxBIOS as:
CPU: vendor AMD device 40fb2
CPU: family 0f, model 4b, stepping 02 brandId=0849
I'm setting MEM_TRAIN_SEQ to 2 in Options.lb, and I'm about to try 1 and 0 instead to see if that helps.
Can someone explain what exactly MEM_TRAIN_SEQ does? Is the proper setting dependent on the board, the CPU type, or both? The description in src/config/Options.lb is, um, a tad cryptic:
0: three for in bsp, 1: on every core0, 2: one for on bsp
Any other pointers would be appreciated.
--Ed
Only Opteron supports ECC.
About mem training:
0: the BSP trains ReceiverEn on all nodes, then trains DQSPos on all nodes.
1: every core0 trains the memory on its own node.
2: the BSP trains one node completely (ReceiverEn and DQSPos), then moves on to the next node.
For big systems (four-way or quad-way), you can save time with option 1.
YH
On 3/20/07, Lu, Yinghai <yinghai.lu@amd.com> wrote:
> Only Opteron supports ECC.
I hope that's not true; otherwise, AMD needs to change its datasheet for the Athlon 64 X2:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/3342...
> About mem training:
> 0: the BSP trains ReceiverEn on all nodes, then trains DQSPos on all nodes.
> 1: every core0 trains the memory on its own node.
> 2: the BSP trains one node completely (ReceiverEn and DQSPos), then moves on to the next node.
> For big systems (four-way or quad-way), you can save time with option 1.
So for a single-socket dual-core system, which training option should I use?
--Ed
Oh. I didn't notice that, and I didn't test that case.
For a single-socket system, you can use any option.
YH