[LinuxBIOS] Machine Check Exception with ECC DIMMs on MCP55 board

Ed Swierk eswierk at arastra.com
Tue Mar 20 17:28:35 CET 2007

I'm able to boot Linux just fine on my MCP55-based board with non-ECC
memory, but when I install ECC DIMMs I get a Machine Check Exception
from Linux about half the time:

  CPU 0: Machine Check Exception: 0000000000000004
  Bank 4: f6302001d8080813 at 00000000021fbd60
  Kernel panic - not syncing: CPU context corrupt
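
For reference, the top bits of the Bank 4 status word can be decoded by hand. The sketch below assumes the architectural AMD64 MCi_STATUS layout (flag bits 63 to 57, error code in bits 15:0, model-specific code in bits 31:16). The names decode_mci_status and STATUS_FLAGS are made up for illustration and do not come from LinuxBIOS or the kernel.

```python
# Flag bit positions, assuming the architectural AMD64 MCi_STATUS layout.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and code fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,              # bits 15:0
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Status value taken from the "Bank 4" line of the log above.
decoded = decode_mci_status(0xF6302001D8080813)
print(decoded["flags"])
print(hex(decoded["error_code"]), hex(decoded["model_specific"]))
```

Under that assumption the value decodes to VAL, OVER, UC, EN, ADDRV and PCC. That is consistent with the "CPU context corrupt" panic and with an uncorrected error at the reported address. Interpreting the two code fields (0x813 and 0xd808) needs the processor-specific documentation.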

I haven't yet tried different ECC DIMMs to eliminate the possibility
of a bad part.

I suspect it's a software problem, though. When I run memtest86+, it
claims the memory is non-ECC, and the machine reboots itself after
about 10 minutes of testing.

I'm wondering whether LinuxBIOS is setting up memory properly. The
board has a single dual-core AMD Athlon 64 X2, identified by LinuxBIOS as

  CPU: vendor AMD device 40fb2
  CPU: family 0f, model 4b, stepping 02

and I'm setting MEM_TRAIN_SEQ to 2 in Options.lb. I'm about to try 1
and 0 instead, to see if that helps.

Can someone explain what exactly MEM_TRAIN_SEQ does? Is the proper
setting dependent on the board, the CPU type, or both? The description
in src/config/Options.lb is, um, a tad cryptic:

  0: three for in bsp, 1: on every core0, 2: one for on bsp

Any other pointers would be appreciated.


