[coreboot] K8 machine with (a lot of) ram from 2 vendors keeps resetting
marcj303 at gmail.com
Thu Sep 3 23:06:03 CEST 2009
On Wed, Sep 2, 2009 at 8:03 PM, Ward Vandewege<ward at gnu.org> wrote:
> Hi Marc,
> On Wed, Sep 02, 2009 at 04:03:45PM -0600, Marc Jones wrote:
>> > With the proprietary BIOS, this setup works perfectly. That said, the manual
>> > for the board does say it's not recommended to mix memory types. Our system
>> > integrator mentions that when 16 banks are used, the memory runs at maximum
>> > 533MHz.
>> As we talked about, it looks like the memory is sized correctly and
>> the next thing to try is forcing the memory speed slower. The speed
>> limitation with 8 dual rank dimms (16 banks) might be spec'd or errata. A check
>> might need to go into the main k8 mem init code.
> I've forced the speed down to 533MHz with this patch:
> --- src/northbridge/amd/amdk8/raminit_f.c (revision 4625)
> +++ src/northbridge/amd/amdk8/raminit_f.c (working copy)
> @@ -1811,6 +1811,10 @@
> min_latency = 3;
> + // Force minimum cycle time to 3.75ns (i.e. 266MHz)
> + min_cycle_time = 0x375;
> + printk_raminit("1 bios_cycle_time: %08x\n", bios_cycle_time);
> printk_raminit("1 min_cycle_time: %08x\n", min_cycle_time);
> /* Compute the least latency with the fastest clock supported
> Which appears to do the right thing, but the behavior is unchanged. Here's a
> boot log prior to the patch:
> and here's after the patch:
> This is with all samsung ram on CPU1 and all kingston ram on CPU2, as you
> suggested on irc.
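As a sanity check on the value in the patch: reading the hex digits of 0x375 as decimal gives 3.75 ns, which is a 266 MHz clock and a DDR2-533 data rate, so it matches the 533MHz target. A minimal standalone sketch of that arithmetic follows. The helper names are hypothetical (they are not coreboot functions), and the digit encoding is assumed from the patch comment.

```c
/*
 * Assumption: each hex digit of the cycle time is read as a decimal digit,
 * in units of 0.01 ns, so 0x375 means 3.75 ns (as the patch comment says).
 * Returns the cycle time in picoseconds.
 */
static unsigned cycle_time_to_ps(unsigned cycle_time)
{
    unsigned ns         = (cycle_time >> 8) & 0xf;
    unsigned tenths     = (cycle_time >> 4) & 0xf;
    unsigned hundredths = cycle_time & 0xf;

    return ns * 1000 + tenths * 100 + hundredths * 10;
}

/* Memory clock in MHz, rounded down (3.75 ns is 266.67 MHz, reported as 266). */
static unsigned clock_mhz(unsigned cycle_time)
{
    unsigned ps = cycle_time_to_ps(cycle_time);

    return ps ? 1000000 / ps : 0;
}

/* DDR moves data on both clock edges, so the data rate is twice the clock. */
static unsigned ddr_rate_mts(unsigned cycle_time)
{
    unsigned ps = cycle_time_to_ps(cycle_time);

    return ps ? 2000000 / ps : 0;
}
```

Calling clock_mhz(0x375) and ddr_rate_mts(0x375) gives 266 and 533, which is what the patch is aiming for.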
I don't think that this is a RAM matching problem. Each CPU/MC has
completely matched RAM in this setup. The memory is
being sized correctly and I had hoped that slowing down would help.
The next step is to do a memory test or instrument the memory clear to
see how it fails. If it is on a boundary it would indicate an
addressing problem. Something more random would indicate timing. But
it is a little bit of guesswork from here. Comparing the MC PCI
registers (functions 1 and 2, the address map and DRAM controller)
against the legacy BIOS might reveal something as well.
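The boundary-versus-random triage can be sketched roughly as below. This is a standalone illustration, not coreboot code: the function names, the index-as-data pattern, and the min_boundary threshold are all hypothetical, and a real test would run over physical memory from the init code rather than over a plain array.

```c
#include <stddef.h>
#include <stdint.h>

enum fail_kind { FAIL_NONE, FAIL_BOUNDARY, FAIL_SCATTERED };

/* Write each word's own index into it, so aliased addresses read back wrong. */
static void fill_with_index(uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;
}

/*
 * Read the pattern back and store the failing word offsets, lowest first.
 * Returns how many offsets were stored (at most max_bad).
 */
static size_t collect_failures(const uint32_t *buf, size_t words,
                               uint64_t *bad, size_t max_bad)
{
    size_t stored = 0;

    for (size_t i = 0; i < words && stored < max_bad; i++) {
        if (buf[i] != (uint32_t)i)
            bad[stored++] = i;
    }
    return stored;
}

/* Largest power of two that divides x (returns 0 for x == 0). */
static uint64_t alignment_of(uint64_t x)
{
    return x & (~x + 1);
}

/*
 * Heuristic (an assumption, not a rule): if the first failure sits on a
 * large power-of-two boundary, suspect addressing; otherwise suspect timing.
 */
static enum fail_kind classify_failures(const uint64_t *bad, size_t n,
                                        uint64_t min_boundary)
{
    if (n == 0)
        return FAIL_NONE;
    if (bad[0] == 0 || alignment_of(bad[0]) >= min_boundary)
        return FAIL_BOUNDARY;
    return FAIL_SCATTERED;
}
```

If the first bad offset lands exactly on something like a DIMM or chip-select size, that points at the address map. A handful of failures at odd offsets that move between runs points at timing.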
The actual reset is probably a triple fault which probably started
with an opcode exception. We can instrument the exception handler if
you get really stuck.