[help] A K8/RS780/SB710 board MCE error - coreboot

13 Sep 2010


Hello everyone,
I'm porting coreboot v4 to a K8/RS780/SB710 based mainboard, using the
amd/mahogany and amd/tilapia_fam10 code as a reference. coreboot now boots
the board and FILO loads Linux, but the kernel panics with an MCE (Machine
Check Exception) during boot. I don't know much about MCEs, so any
suggestions would be appreciated. Thanks very much.
The mainboard architecture:
CPU: Socket F Opteron 2210 EE, get_cpu_rev EAX=0x40f13 (1 CPU, dual core)
DIMM: DDR2 333M (x1 / x2)
HT Link0: off
HT Link1: RS780->SB710
HT Link2: off
VGA off
GFX off
PCIE off
coreboot code revision: r5692, with local modifications
The MCE/panic message:
HARDWARE ERROR
CPU 0: Machine Check Exception:                4 Bank 0: f658a00000000833
TSC 572507f34 ADDR 6000
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Machine check
------------[ cut here ]------------
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x32/0x1ec()
Modules linked in:
Supported: Yes
Pid: 1, comm: swapper Tainted: G   M      2.6.27.19-5-default #1
Call Trace:
 [<ffffffff8020d9f9>] show_trace_log_lvl+0x41/0x58
 [<ffffffff80496a74>] dump_stack+0x69/0x6f
 [<ffffffff8023bfba>] warn_on_slowpath+0x51/0x77
 [<ffffffff8025b1c5>] smp_call_function_mask+0x32/0x1ec
 [<ffffffff8025b3a8>] smp_call_function+0x29/0x2e
 [<ffffffff8021a04a>] native_smp_send_stop+0x1a/0x26
 [<ffffffff80496b36>] panic+0xbc/0x169
 [<ffffffff80216366>] mce_log+0x0/0x7e
 [<ffffffff80216740>] do_machine_check+0x31e/0x3cd
 [<ffffffff8020d27f>] machine_check+0x7f/0x90
 [<ffffffff802126c8>] setup_trampoline+0x20/0x30
 [<ffffffff804919a5>] native_cpu_up+0x31e/0xc64
 [<ffffffff80493d17>] _cpu_up+0x9a/0x11c
 [<ffffffff80493df4>] cpu_up+0x5b/0x6f
 [<ffffffff8095b708>] kernel_init+0xe1/0x1eb
 [<ffffffff8020cf49>] child_rip+0xa/0x11
---[ end trace 4eaa2a86a8e2da22 ]---
Output of mcelog --k8 --ascii:
HARDWARE ERROR
CPU 0: Machine Check Exception:                4 Bank 0: f658a00000000833
TSC 572507f34 ADDR 6000
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
HARDWARE ERROR
CPU 0 0 data cache TSC 572507f34
  Data cache ECC error (syndrome b1)
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS f658a00000000833 MCGSTATUS 4
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
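For reference, the STATUS value above can also be decoded by hand. What follows is a minimal Python sketch, not mcelog's own code. It assumes the generic x86 MCi_STATUS flag layout, the K8 ECC bits (45, 46) and syndrome field (bits 54:47) as implied by the mcelog output, and the bus-error code format 0000 1PPT RRRR IILL. The names decode_status and decode_bus_error are made up for illustration; the bit positions should be checked against the AMD BKDG for the CPU in question.

```python
# Sketch: decode the Bank 0 MCi_STATUS value from the log above.
# Bit positions are assumptions taken from the generic x86 MCA layout
# and the K8-specific bits that mcelog --k8 reports.

STATUS = 0xF658A00000000833

FLAG_BITS = {
    63: "valid",
    62: "error overflow (multiple errors)",
    61: "error uncorrected",
    60: "error reporting enabled",
    59: "misc register valid",
    58: "address register valid",
    57: "processor context corrupt",
    46: "corrected ecc error",
    45: "uncorrected ecc error",
}

PARTICIPATION = {0: "local node origin", 1: "local node response",
                 2: "observed as third party", 3: "generic"}
REQUEST = {0: "generic", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "evict", 8: "snoop"}
MEM_IO = {0: "memory access", 1: "reserved", 2: "i/o access", 3: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "level generic"}


def decode_status(status):
    """Return (set flag names, ECC syndrome, 16-bit error code)."""
    flags = [name for bit, name in sorted(FLAG_BITS.items(), reverse=True)
             if (status >> bit) & 1]
    syndrome = (status >> 47) & 0xFF   # assumed K8 syndrome field, bits 54:47
    code = status & 0xFFFF             # error code, bits 15:0
    return flags, syndrome, code


def decode_bus_error(code):
    """Decode a bus-error code (0000 1PPT RRRR IILL), or return None."""
    if (code >> 11) != 1:
        return None                    # not in bus-error format
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],
        "timeout": bool((code >> 8) & 0x1),
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
        "mem_io": MEM_IO[(code >> 2) & 0x3],
        "level": LEVEL[code & 0x3],
    }


flags, syndrome, code = decode_status(STATUS)
print("flags:", ", ".join(flags))
print("syndrome: %#x, error code: %#06x" % (syndrome, code))
print("bus error:", decode_bus_error(code))
```

Under these assumptions the sketch gives syndrome 0xb1 and a bus error with local node origin, no timeout, data read, memory access, level generic, which agrees with what mcelog printed.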
The detailed boot log is attached.
-- 
Regards,
Liu Tao