Hi Paul,
On Saturday, 11.02.2017, at 16:14 +0100, Daniel Kulesz wrote:
To answer my question myself: It works partially.
1.) Samsung M393B1K70DH0-YK0
Type: DDR3 DIMM 240-Pin, reg ECC | Ranks/Banks: dual rank, x4 | Modules: 1x 8GB | JEDEC: PC3L-12800R | Voltage: 1.35V
I got 8 of these working with one CPU package on the KGPE-D16 with one of the latest Coreboot master versions:
(Please note, that coreboot is officially spelled all lowercase.)
How much did you pay for these modules?
One of them costs around 12-25 Euros (used) or ~65 Euros (new).
Just to be sure. You got 64 GB of RAM working. If you plug in more modules, it fails, right?
No. Using 64 GB worked more or less fine, but using *less* than 64 GB caused issues, depending on which slots I had put the modules in.
Version: 4.5-963-gf57a768
Please upload your logs to the board status repository. (This commit has not been uploaded by REACTS, so it wouldn't overwrite anything.)
I uploaded one of the tested configurations here:
https://review.coreboot.org/cgit/board-status.git/commit/asus/kgpe-d16/4.5-9...
I was not able to upload more configurations since the build_status output folder is named after the build, and I didn't want to rename too much manually.
Without the logs, it's hard to debug anything. Verbose logs are one of the biggest advantages of coreboot. So please upload them, or attach them.
I attached the serial logs of one of the failing configurations (all orange slots populated).
However, I was unable to reproduce the exact failure because in the meantime I had populated the second CPU socket and didn't find a way to deactivate that CPU (after pulling the second power connector, the system did not boot at all).
I also noticed the following messages in dmesg:
[ 1561.833618] [Hardware Error]: Corrected error, no action required.
[ 1561.840026] [Hardware Error]: CPU:16 (15:1:2) MC4_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc644000ea080a13
[ 1561.851687] [Hardware Error]: Error Addr: 0x0000000ffe8a4c70
[ 1561.860313] [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
[ 1561.870995] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
[ 1873.121721] mce: [Hardware Error]: Machine check events logged
[ 1873.121777] [Hardware Error]: Corrected error, no action required.
[ 1873.128184] [Hardware Error]: CPU:16 (15:1:2) MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c644000ea080a13
[ 1873.142264] [Hardware Error]: Error Addr: 0x0000000ff95c9230
[ 1873.150857] [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
[ 1873.161573] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
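For anyone who wants to cross-check the flag strings against the raw MC4_STATUS values, here is a minimal Python sketch. The bit positions are an assumption based on my reading of the Linux kernel's AMD MCE decoder for family 15h (Over=62, UC=61, MiscV=59, AddrV=58, PCC=57, Deferred=44, Poison=43, ECC type in bits 46:45); the function name is made up for illustration and this is not the kernel's code.

```python
def decode_mc4_status(status: int) -> str:
    """Render a raw MC4_STATUS value as a flag string like the one in dmesg.

    Assumed bit layout (family 15h, as understood from the Linux decoder):
    62 Over, 61 UC, 59 MiscV, 58 AddrV, 57 PCC, 44 Deferred, 43 Poison,
    46:45 ECC type (2 = correctable, anything else non-zero = uncorrectable).
    """
    def flag(bit: int, name: str) -> str:
        return name if (status >> bit) & 1 else "-"

    parts = [
        flag(62, "Over"),
        "UE" if (status >> 61) & 1 else "CE",
        flag(59, "MiscV"),
        flag(57, "PCC"),
        flag(58, "AddrV"),
        flag(44, "Deferred"),
        flag(43, "Poison"),
    ]
    ecc = (status >> 45) & 0x3
    if ecc:
        parts.append("CECC" if ecc == 2 else "UECC")
    return "|".join(parts)


if __name__ == "__main__":
    for raw in (0xDC644000EA080A13, 0x9C644000EA080A13):
        print(hex(raw), decode_mc4_status(raw))
```

Under these assumptions, the only difference between the two logged values is the Over bit: the first status register was overwritten before it was read, the second was not. Both are corrected (CE/CECC) DRAM ECC errors.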
I didn't notice any of these before the installation of the second CPU module. Don't know if this is related or not. But I'll run a memtest next to check if maybe one of the modules could be faulty.
Cheers, Daniel
Hi folks,
here's a short update regarding the aforementioned DDR3L Samsung memory sticks on the KGPE-D16 running coreboot.
Thanks to help and hints from Timothy, I experimented with different CPU/memory configurations and did some stress and memory testing. So far, I have a few interesting findings to report for a 1-CPU-package configuration (using an Opteron 6276):
(1) When running coreboot master with 4 DIMMs (in the orange slots), dmidecode reports the DIMMs as running at 800 MHz and no ECC errors show up in dmesg during memtester runs. The same goes for the vendor BIOS.
(2) When running coreboot master with 8 DIMMs, dmidecode reports the DIMMs as running at 667 MHz and MC4 errors show up in dmesg during memtester runs.
(3) When running the same as in (2) but with the *vendor BIOS*, dmidecode reports 800 MHz and memtester runs without MC4 errors.
(4) When running the same as in (2) but with *coreboot 4.3*, dmidecode reports 667 MHz but memtester runs *without* MC4 errors as well.
It is still on my list to validate (using benchmarks) whether dmidecode just reports nonsense or whether the DIMMs really run at only 667 MHz. However, the issue with the MC4 errors is much more severe and is likely a regression between coreboot 4.3 and the current master version. It will require further bisecting to identify the cause.
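As a rough yardstick for such a benchmark, the theoretical peak bandwidth per channel can be estimated from the clock. This is only a sketch and assumes that the figure dmidecode reports is the DDR3 I/O clock (two transfers per clock) on a standard 64-bit data bus; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(io_clock_mhz: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one DDR3 channel in GB/s.

    Assumes double data rate (2 transfers per I/O clock) and a 64-bit
    (8-byte) data bus, ECC bits not counted.
    """
    transfers_per_second = io_clock_mhz * 1e6 * 2
    return transfers_per_second * bus_bytes / 1e9


if __name__ == "__main__":
    fast = peak_bandwidth_gb_s(800)   # DDR3-1600, i.e. PC3-12800
    slow = peak_bandwidth_gb_s(667)   # roughly DDR3-1333
    print(f"800 MHz: {fast:.2f} GB/s, 667 MHz: {slow:.2f} GB/s, "
          f"ratio {slow / fast:.2f}")
```

If the DIMMs really run at 667 MHz, a bandwidth-bound benchmark should come out noticeably lower than with the vendor BIOS, on the order of the clock ratio. Real measurements will be well below the theoretical peak in both cases, so only the relative difference is meaningful.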
@Other KGPE-D16 users: Does dmidecode under coreboot report your memory sticks as running at 800 MHz (provided you have PC12800 ones)? And are you seeing MC4 errors when running memtester? (For me, it happens at stage 2 with the "random testing", usually less than 7 minutes after starting the test, and even if I just use 10G of memory for testing.)
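To make reports easier to compare, a small sketch like the following could tally the MC4 DRAM ECC errors per node from saved dmesg output. The regular expression is based only on the message format seen in my own log, so it may need adjusting for other kernel versions; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 1561.860313] [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
NODE_RE = re.compile(r"MC4 Error \(node (\d+)\): DRAM ECC error")


def count_ecc_errors_by_node(dmesg_text: str) -> Counter:
    """Return a Counter mapping node number -> number of DRAM ECC errors."""
    return Counter(int(m.group(1)) for m in NODE_RE.finditer(dmesg_text))


if __name__ == "__main__":
    sample = (
        "[ 1561.860313] [Hardware Error]: MC4 Error (node 2): "
        "DRAM ECC error detected on the NB.\n"
        "[ 1873.150857] [Hardware Error]: MC4 Error (node 2): "
        "DRAM ECC error detected on the NB.\n"
    )
    for node, count in sorted(count_ecc_errors_by_node(sample).items()):
        print(f"node {node}: {count} corrected DRAM ECC error(s)")
```

Feeding it the output of dmesg captured after a memtester run gives one number per node, which should be enough to tell whether the errors cluster on a single node or DIMM group.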
I also tested using two CPU packages and other memory populations, but as Timothy has concerns about whether my PSU is adequately sized, it makes more sense for me to test in a 1-CPU configuration first to find the cause of these MC4 errors in coreboot before proceeding further.
Cheers, Daniel
Btw.: I had some "false negative" kernel oopses when running stress-ng with all tests enabled on both the vendor BIOS and coreboot. This seems to be a known software issue (Ubuntu bug 1654073).