Hi all,
I'm having some problems with a supermicro h8dme board with 2 K8 processors and 64 GB of ram.
Sadly, the ram is of two types (that was not my decision :/). We have 8x 4GB Samsung M393T5160QZA-CE6 and 8x 4GB Kingston KVR667D2D4P5/4G. Both of these types of ram are dual rank DDR2, 667 MHz, CL5, 1.8 V, registered, ECC.
The memory is installed like this:
BANK  CPU1  CPU2
1B    KING  KING
1A    KING  KING
2B    KING  KING
2A    KING  KING
3B    SAMS  SAMS
3A    SAMS  SAMS
4B    SAMS  SAMS
4A    SAMS  SAMS
With the proprietary BIOS, this setup works perfectly. That said, the manual for the board does say it's not recommended to mix memory types. Our system integrator mentions that when 16 banks are used, the memory runs at maximum 533MHz.
I should mention that this particular coreboot port has one oddity - the memory on each CPU must be matched because we don't know how to do the SPD address switch to detect ram on the second CPU.
As you can see in the diagram above, that condition is met.
There are two problems. First, with the above (64GB) configuration coreboot just keeps resetting itself after printing "Clearing initial memory region:". I've attached a log with RAM_TIMING_DEBUG enabled, see minicom-20090902-with-64G.cap.
Perhaps this is related to coreboot not dropping down to 533MHz with 16 DIMMs installed?
Dropping the installed ram down to 48GB like this:
BANK  CPU1  CPU2
1B    KING  KING
1A    KING  KING
2B    KING  KING
2A    KING  KING
3B    SAMS  SAMS
3A    SAMS  SAMS
4B    --    --
4A    --    --
Makes the boot get all the way to seabios, but the board still resets itself just before booting the linux kernel. See attached minicom-20090902-with-48G.cap.
Any thoughts on what might be happening here, or how I could debug further?
Thanks, Ward.
Hi Ward,
On Wed, Sep 2, 2009 at 3:14 PM, Ward Vandewege <ward@gnu.org> wrote:
Hi all,
I'm having some problems with a supermicro h8dme board with 2 K8 processors and 64 GB of ram.
Sadly, the ram is of two types (that was not my decision :/). We have 8x 4GB Samsung M393T5160QZA-CE6 and 8x 4GB Kingston KVR667D2D4P5/4G. Both of these types of ram are dual rank DDR2, 667 MHz, CL5, 1.8 V, registered, ECC.
The memory is installed like this:
BANK  CPU1  CPU2
1B    KING  KING
1A    KING  KING
2B    KING  KING
2A    KING  KING
3B    SAMS  SAMS
3A    SAMS  SAMS
4B    SAMS  SAMS
4A    SAMS  SAMS
With the proprietary BIOS, this setup works perfectly. That said, the manual for the board does say it's not recommended to mix memory types. Our system integrator mentions that when 16 banks are used, the memory runs at maximum 533MHz.
As we talked about, it looks like the memory is sized correctly, and the next thing to try is forcing the memory speed slower. The speed limitation with 8 dual rank dimms (16 banks) might be spec'd or an erratum. A check might need to go into the main k8 mem init code.
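A check of that kind could look roughly like the sketch below. It is only an illustration: `cap_cycle_time`, `MAX_RANKS_FULL_SPEED` and the rank threshold are hypothetical names and values, not existing coreboot code, and it assumes cycle times are encoded as hex digits that read as decimal hundredths of a nanosecond (so 0x375 would mean 3.75 ns).

```c
#include <stdint.h>

/* Assumed encoding: hex digits read as decimal, in units of 0.01 ns,
 * so 0x375 stands for 3.75 ns (a 266MHz clock, i.e. DDR2-533). */
#define CYCLE_TIME_266MHZ 0x375u

/* Hypothetical threshold: more than this many ranks on one memory
 * controller triggers the cap (8 dual rank DIMMs give 16 ranks). */
#define MAX_RANKS_FULL_SPEED 15u

/* Return the cycle time to use for a controller with total_ranks ranks.
 * A larger cycle time means a slower clock, so capping the speed means
 * raising the cycle time to at least CYCLE_TIME_266MHZ. */
static uint32_t cap_cycle_time(uint32_t min_cycle_time, unsigned total_ranks)
{
	if (total_ranks > MAX_RANKS_FULL_SPEED &&
	    min_cycle_time < CYCLE_TIME_266MHZ)
		return CYCLE_TIME_266MHZ;
	return min_cycle_time;
}
```

With these assumptions, cap_cycle_time(0x300, 16) returns 0x375, while a controller with 8 ranks keeps 0x300. Whether the real limit is per controller or per system, and where the threshold sits, would have to come from the spec or errata.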
Marc
Hi Marc,
On Wed, Sep 02, 2009 at 04:03:45PM -0600, Marc Jones wrote:
With the proprietary BIOS, this setup works perfectly. That said, the manual for the board does say it's not recommended to mix memory types. Our system integrator mentions that when 16 banks are used, the memory runs at maximum 533MHz.
As we talked about, it looks like the memory is sized correctly, and the next thing to try is forcing the memory speed slower. The speed limitation with 8 dual rank dimms (16 banks) might be spec'd or an erratum. A check might need to go into the main k8 mem init code.
I've forced the speed down to 533MHz with this patch:
--- src/northbridge/amd/amdk8/raminit_f.c	(revision 4625)
+++ src/northbridge/amd/amdk8/raminit_f.c	(working copy)
@@ -1811,6 +1811,10 @@
 	}
 	min_latency = 3;
 
+	// Force minimum cycle time to 3.75ns (i.e. 266MHz)
+	min_cycle_time = 0x375;
+
+	printk_raminit("1 bios_cycle_time: %08x\n", bios_cycle_time);
 	printk_raminit("1 min_cycle_time: %08x\n", min_cycle_time);
 
 	/* Compute the least latency with the fastest clock supported
Which appears to do the right thing, but the behavior is unchanged. Here's a boot log prior to the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
and here's after the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
This is with all samsung ram on CPU1 and all kingston ram on CPU2, as you suggested on irc.
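For reference, the relation between the 0x375 value, the 266MHz clock and the "533MHz" DDR2 rate can be sketched as below. The decoding is an assumption inferred from the patch comment (0x375 standing for 3.75ns, one hex digit per decimal digit); the function names are made up for illustration and are not coreboot functions.

```c
#include <stdint.h>

/* Decode a cycle time such as 0x375 into hundredths of a nanosecond,
 * assuming each hex digit stands for one decimal digit (0x375 -> 375). */
static unsigned cycle_time_to_ns_x100(uint32_t cycle_time)
{
	unsigned ns         = (cycle_time >> 8) & 0xf;
	unsigned tenths     = (cycle_time >> 4) & 0xf;
	unsigned hundredths = cycle_time & 0xf;

	return ns * 100 + tenths * 10 + hundredths;
}

/* Memory clock in MHz, truncated: 1000 / t[ns] == 100000 / (t[ns] * 100). */
static unsigned cycle_time_to_clock_mhz(uint32_t cycle_time)
{
	unsigned t = cycle_time_to_ns_x100(cycle_time);

	return t ? 100000u / t : 0;
}

/* DDR transfers data on both clock edges, so the data rate is twice
 * the clock: a 3.75ns cycle gives the "533MHz" figure from the manual. */
static unsigned cycle_time_to_ddr_rate(uint32_t cycle_time)
{
	unsigned t = cycle_time_to_ns_x100(cycle_time);

	return t ? 200000u / t : 0;
}
```

Under that assumption, 0x375 decodes to a 266MHz clock and a data rate of 533, and 0x300 to 333 and 666.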
Anything else I should try?
Thanks, Ward.
Hi Ward,
On Wed, Sep 2, 2009 at 8:03 PM, Ward Vandewege <ward@gnu.org> wrote:
Hi Marc,
On Wed, Sep 02, 2009 at 04:03:45PM -0600, Marc Jones wrote:
With the proprietary BIOS, this setup works perfectly. That said, the manual for the board does say it's not recommended to mix memory types. Our system integrator mentions that when 16 banks are used, the memory runs at maximum 533MHz.
As we talked about, it looks like the memory is sized correctly, and the next thing to try is forcing the memory speed slower. The speed limitation with 8 dual rank dimms (16 banks) might be spec'd or an erratum. A check might need to go into the main k8 mem init code.
I've forced the speed down to 533MHz with this patch:
--- src/northbridge/amd/amdk8/raminit_f.c	(revision 4625)
+++ src/northbridge/amd/amdk8/raminit_f.c	(working copy)
@@ -1811,6 +1811,10 @@
 	}
 	min_latency = 3;
 
+	// Force minimum cycle time to 3.75ns (i.e. 266MHz)
+	min_cycle_time = 0x375;
+
+	printk_raminit("1 bios_cycle_time: %08x\n", bios_cycle_time);
 	printk_raminit("1 min_cycle_time: %08x\n", min_cycle_time);
 
 	/* Compute the least latency with the fastest clock supported
Which appears to do the right thing, but the behavior is unchanged. Here's a boot log prior to the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
and here's after the patch:
http://ward.vandewege.net/coreboot/h8dme/minicom-20090902-with-64G-samsung-o...
This is with all samsung ram on CPU1 and all kingston ram on CPU2, as you suggested on irc.
I don't think that this is a RAM matching problem. Each CPU/MC has completely matched RAM in this setup. The memory is being sized correctly, and I had hoped that slowing it down would help. The next step is to do a memory test or instrument the memory clear to see how it fails. If it fails on a boundary, that would indicate an addressing problem. Something more random would indicate timing. But it is a little bit of guesswork from here. Comparing the MC PCI registers (functions 1 and 2) against the legacy BIOS might reveal something as well.
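As a rough illustration of such a memory test, the sketch below fills a region with an address-derived pattern, reads it back, and records where mismatches fall. The names (`memtest_fill`, `memtest_check`, `struct memtest_result`) are hypothetical and this is plain C rather than coreboot code; an early-boot version would have to run on the real memory ranges and print through the console.

```c
#include <stddef.h>
#include <stdint.h>

struct memtest_result {
	size_t failures;   /* number of mismatching words */
	size_t first_bad;  /* index of first mismatch (valid if failures > 0) */
	size_t last_bad;   /* index of last mismatch (valid if failures > 0) */
	int contiguous;    /* 1 if every word in first_bad..last_bad failed */
};

/* Write a pattern derived from the word index, so aliased addresses
 * read back a value that belongs to a different location. */
static void memtest_fill(volatile uint32_t *buf, size_t words)
{
	for (size_t i = 0; i < words; i++)
		buf[i] = (uint32_t)i ^ 0xa5a5a5a5u;
}

/* Read the pattern back and summarize where the mismatches are. */
static struct memtest_result memtest_check(const volatile uint32_t *buf,
					   size_t words)
{
	struct memtest_result r = { 0, 0, 0, 0 };

	for (size_t i = 0; i < words; i++) {
		if (buf[i] != ((uint32_t)i ^ 0xa5a5a5a5u)) {
			if (r.failures == 0)
				r.first_bad = i;
			r.last_bad = i;
			r.failures++;
		}
	}
	if (r.failures)
		r.contiguous = (r.last_bad - r.first_bad + 1 == r.failures);
	return r;
}
```

A single contiguous failing run that starts at a large power-of-two offset would point toward addressing, while scattered single-word failures would point toward timing.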
The actual reset is probably a triple fault, which probably started with an opcode exception. We can instrument the exception handler if you get really stuck.
Marc
On 02.09.2009 23:14, Ward Vandewege wrote:
I'm having some problems with a supermicro h8dme board with 2 K8 processors and 64 GB of ram.
Sadly, the ram is of two types (that was not my decision :/). We have 8x 4GB Samsung M393T5160QZA-CE6 and 8x 4GB Kingston KVR667D2D4P5/4G. Both of these types of ram are dual rank DDR2, 667 MHz, CL5, 1.8 V, registered, ECC.
The K8 memory code has some oddities which cause it to break down even if the SPD parameters you use are identical. Not only must all SPD timing parameters be identical, they also have to be stored at the same byte addresses in each SPD. I sent a patch for this to the list sometime last year, but it was never completely finished. I can try to dig it up if you want to take a look.
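To check two DIMMs against each other, a byte-wise comparison of their SPD dumps at the timing-related offsets might look like the sketch below. The offsets are the standard DDR2 SPD byte positions; the function name and the choice of which bytes to compare are assumptions for illustration, not the logic of the patch mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* DDR2 SPD byte offsets of timing-related fields. */
static const uint8_t spd_timing_offsets[] = {
	9,  /* cycle time at highest supported CAS latency */
	18, /* supported CAS latencies */
	23, /* cycle time at CL X-1 */
	25, /* cycle time at CL X-2 */
	27, /* tRP */
	28, /* tRRD */
	29, /* tRCD */
	30, /* tRAS */
	41, /* tRC */
	42, /* tRFC */
};

/* Compare two SPD dumps (each at least 43 bytes long) at the offsets
 * above. Returns -1 if they all match, otherwise the offset of the
 * first byte that differs. */
static int spd_first_timing_mismatch(const uint8_t *a, const uint8_t *b)
{
	size_t n = sizeof(spd_timing_offsets) / sizeof(spd_timing_offsets[0]);

	for (size_t i = 0; i < n; i++) {
		uint8_t off = spd_timing_offsets[i];

		if (a[off] != b[off])
			return off;
	}
	return -1;
}
```

Running this on the SPD contents of one Samsung and one Kingston DIMM would show whether they differ in any of these bytes even though the datasheet parameters look identical.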
Regards, Carl-Daniel