I'm able to boot Linux just fine on my MCP55-based board with non-ECC memory, but when I install ECC DIMMs I get a Machine Check Exception from Linux about half the time:
CPU 0: Machine Check Exception: 0000000000000004
Bank 4: f6302001d8080813 at 00000000021fbd60
Kernel panic - not syncing: CPU context corrupt
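For anyone decoding the panic: the `Bank 4` value is that bank's MCi_STATUS register (bank 4 is the northbridge on K8), and the number after `at` is its MCi_ADDR. The Python sketch below decodes only the bits defined by the x86 machine-check architecture, plus the bus/interconnect error-code format (0000 1PPT RRRR IILL). It deliberately leaves the model-specific bits 31:16 raw, because their meaning for AMD's bank 4 depends on the CPU family. The function and field names are made up for illustration.

```python
"""Decode the architectural fields of an x86 MCi_STATUS value.

Linux prints a bank as "Bank N: <MCi_STATUS> at <MCi_ADDR>". Only bits
defined by the x86 machine-check architecture are decoded here; the
model-specific field (bits 31:16) is returned undecoded.
"""
from typing import Optional

# Architectural flag bits in the upper part of MCi_STATUS.
STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected
    (60, "EN"),     # reporting was enabled for this error type
    (59, "MISCV"),  # MCi_MISC holds extra information
    (58, "ADDRV"),  # MCi_ADDR holds the error address
    (57, "PCC"),    # processor context corrupt
]

# Sub-fields of the bus/interconnect compound error code 0000 1PPT RRRR IILL.
PP = ["SRC", "RES", "OBS", "generic"]          # how the local CPU took part
RRRR = ["generic", "read", "write", "data read", "data write",
        "instruction fetch", "prefetch", "eviction", "snoop"]
II = ["memory", "reserved", "I/O", "other"]     # memory or I/O transaction
LL = ["level 0", "level 1", "level 2", "generic"]


def decode_bus_error(code: int) -> Optional[dict]:
    """Split a 16-bit MCA error code if it uses the bus/interconnect format."""
    if (code & 0xF800) != 0x0800:
        return None  # some other compound-code format, not decoded here
    rrrr = (code >> 4) & 0xF
    return {
        "participation": PP[(code >> 9) & 0x3],
        "timeout": bool((code >> 8) & 0x1),
        "request": RRRR[rrrr] if rrrr < len(RRRR) else f"reserved({rrrr:#x})",
        "memory_or_io": II[(code >> 2) & 0x3],
        "level": LL[code & 0x3],
    }


def decode_mci_status(status: int) -> dict:
    """Return the set flags, the MCA error code and the raw model-specific field."""
    mca_code = status & 0xFFFF
    return {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "mca_error_code": mca_code,
        "model_specific": (status >> 16) & 0xFFFF,
        "bus_error": decode_bus_error(mca_code),
    }


if __name__ == "__main__":
    # Status word taken from the panic message above.
    decoded = decode_mci_status(0xF6302001D8080813)
    print("flags:", " ".join(decoded["flags"]))
    print(f"MCA error code: {decoded['mca_error_code']:#06x}")
    print(f"model-specific: {decoded['model_specific']:#06x}")
    print("bus error:", decoded["bus_error"])
```

Run on the status from the panic, this reports VAL, OVER, UC, EN, ADDRV and PCC set, and error code 0x0813 splits into a memory read (RRRR=read, II=memory, no timeout) originated by the local CPU (PP=SRC). PCC being set is consistent with Linux giving up with "CPU context corrupt", and ADDRV means the 00000000021fbd60 address should be meaningful.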
I haven't yet tried different ECC DIMMs to eliminate the possibility of a bad part.
I suspect it's a software problem though. When I run memtest86+, it claims the memory is non-ECC, and the machine reboots itself after about 10 minutes of testing.
I'm wondering whether LinuxBIOS is setting up memory properly. The board has a single two-core AMD Athlon 64 X2, identified by LinuxBIOS as:
CPU: vendor AMD device 40fb2
CPU: family 0f, model 4b, stepping 02 brandId=0849
and I'm setting MEM_TRAIN_SEQ to 2 in Options.lb. I'm about to try 1 and 0 instead, to see if that helps.
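If the right setting does depend on CPU type, the exact part matters. The `device 40fb2` value appears to be the CPUID function 1 EAX signature, and the family/model/stepping line follows from it. The sketch below applies AMD's rule (extended family and extended model are only used when the base family is 0xF); the function name is hypothetical.

```python
"""Split a CPUID function 1 EAX signature into family, model and stepping.

Uses AMD's convention: the extended family is added and the extended
model is prepended only when the base family is 0xF.
"""
from typing import Tuple


def decode_cpuid_signature(eax: int) -> Tuple[int, int, int]:
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_signature(0x40FB2)
    # Same formatting as the LinuxBIOS line quoted above.
    print(f"family {family:02x}, model {model:02x}, stepping {stepping:02x}")
```

For 0x40fb2 this prints `family 0f, model 4b, stepping 02`, matching what LinuxBIOS reports.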
Can someone explain what exactly MEM_TRAIN_SEQ does? Is the proper setting dependent on the board, the CPU type, or both? The description in src/config/Options.lb is, um, a tad cryptic:
0: three for in bsp, 1: on every core0, 2: one for on bsp
Any other pointers would be appreciated.
--Ed