Hi Ward,
On Wed, Sep 2, 2009 at 8:03 PM, Ward Vandewege <ward@gnu.org> wrote:
Hi Marc,
On Wed, Sep 02, 2009 at 04:03:45PM -0600, Marc Jones wrote:
With the proprietary BIOS, this setup works perfectly. That said, the manual for the board does say it's not recommended to mix memory types. Our system integrator mentions that when 16 banks are used, the memory runs at a maximum of 533MHz.
As we talked about, it looks like the memory is sized correctly, and the next thing to try is forcing the memory speed slower. The speed limitation with 8 dual-rank DIMMs (16 banks) might be spec'd or an erratum. A check might need to go into the main K8 memory init code.
I've forced the speed down to 533MHz with this patch:
--- src/northbridge/amd/amdk8/raminit_f.code (revision 4625)
+++ src/northbridge/amd/amdk8/raminit_f.code (working copy)
@@ -1811,6 +1811,10 @@
 	}
 	min_latency = 3;
+	// Force minimum cycle time to 3.75ns (i.e. 266MHz)
+	min_cycle_time = 0x375;
+	printk_raminit("1 bios_cycle_time: %08x\n", bios_cycle_time);
 	printk_raminit("1 min_cycle_time: %08x\n", min_cycle_time);
 	/* Compute the least latency with the fastest clock supported
Which appears to do the right thing, but the behavior is unchanged. Here's a boot log prior to the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
and here's after the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
This is with all Samsung RAM on CPU1 and all Kingston RAM on CPU2, as you suggested on IRC.
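For reference, here is how the 0x375 value in the patch above maps to a clock speed. This is only a minimal sketch: it assumes the cycle time is stored as hex digits read as decimal (0x375 = 3.75ns, which is what the patch comment implies), and the helper names are hypothetical, not functions from the K8 raminit code.

```c
#include <stdint.h>

/*
 * Assumption: the cycle time is encoded as hex digits read as decimal,
 * so 0x375 means 3.75 ns and 0x500 means 5.00 ns.
 * These helpers are illustrative only and are not part of coreboot.
 */

/* Convert the encoded cycle time to picoseconds, e.g. 0x375 -> 3750. */
static uint32_t cycle_time_to_ps(uint32_t encoded)
{
	uint32_t ns         = (encoded >> 8) & 0xf; /* whole nanoseconds */
	uint32_t tenths     = (encoded >> 4) & 0xf; /* 0.1 ns digit */
	uint32_t hundredths = encoded & 0xf;        /* 0.01 ns digit */

	return ns * 1000 + tenths * 100 + hundredths * 10;
}

/* Memory clock in MHz, rounded down: 1e6 / period in ps. */
static uint32_t cycle_time_to_mhz(uint32_t encoded)
{
	uint32_t ps = cycle_time_to_ps(encoded);

	return ps ? 1000000u / ps : 0;
}
```

With this reading, cycle_time_to_mhz(0x375) gives a 266MHz memory clock, and since DDR2 transfers data on both clock edges, that corresponds to the 533MHz figure mentioned earlier in the thread.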
I don't think that this is a RAM matching problem. Each CPU/MC has completely matched RAM in this setup. The memory is being sized correctly, and I had hoped that slowing down would help. The next step is to do a memory test or instrument the memory clear to see how it fails. If it fails on a boundary, it would indicate an addressing problem. Something more random would indicate timing. But it is a little bit of guesswork from here. Comparing the MC PCI registers (functions 1 and 2) against the legacy BIOS might reveal something as well.
The actual reset is probably a triple fault, which probably started with an opcode exception. We can instrument the exception handler if you get really stuck.
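To make the boundary-versus-random distinction concrete, something along these lines could work as a first-pass test. It is a rough sketch, not existing coreboot code: all names are hypothetical, the "one contiguous run starting on an aligned index" rule is only a heuristic I am assuming here, and the real test would have to run over the physical ranges that the memory clear touches.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of a read-back pass (hypothetical helper type). */
struct mem_test_result {
	size_t failures;   /* number of words that read back wrong */
	size_t first_fail; /* index of first bad word, valid if failures > 0 */
	size_t last_fail;  /* index of last bad word, valid if failures > 0 */
};

/* Write each word's own index, so an aliased address reads back
 * a value that visibly belongs to some other location. */
static void mem_fill_index(volatile uint32_t *base, size_t words)
{
	for (size_t i = 0; i < words; i++)
		base[i] = (uint32_t)i;
}

/* Read everything back and record where the mismatches are. */
static struct mem_test_result mem_check_index(const volatile uint32_t *base,
					      size_t words)
{
	struct mem_test_result r = { 0, 0, 0 };

	for (size_t i = 0; i < words; i++) {
		if (base[i] != (uint32_t)i) {
			if (r.failures == 0)
				r.first_fail = i;
			r.last_fail = i;
			r.failures++;
		}
	}
	return r;
}

/* Heuristic (an assumption, not a rule): one contiguous run of bad words
 * starting on a boundary_words-aligned index looks like an addressing
 * problem; scattered failures look more like timing. */
static int looks_like_addressing(struct mem_test_result r,
				 size_t boundary_words)
{
	if (r.failures == 0 || boundary_words == 0)
		return 0;

	int contiguous = (r.last_fail - r.first_fail + 1) == r.failures;

	return contiguous && (r.first_fail % boundary_words) == 0;
}
```

Fill first, then check, then pass the result to looks_like_addressing() with the boundary you suspect (for example a DIMM or chip-select size, in words). Printing first_fail and last_fail on the serial console should already tell most of the story.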
Marc