On Fri, 11 Nov 2016 13:53:16 +0100, Nico Huber nico.h@gmx.de wrote:
Hi Charlotte,
On 11.11.2016 08:14, Charlotte Plusplus wrote:
So I did many more tests today (more than 6h, and flashing around 30 times), with SPD settings hardcoded into raminit, and without the mrc cache interfering.
thanks for the analysis and summing this up.
TLDR: coreboot tries to increase the frequency without increasing the voltage, and that doesn't work for all memory.
Basically, with the problematic ram sticks, I can boot perfectly fine at DDR3-1866 speed, but even the slower setting 11-11-11-31 gives errors on the memtest. This is inconsistent with information I found about my memory from its SPD information, and from other people who overclock this exact same memory.
Even at 10-10-10-27, I still get errors at DDR3-1600 speeds. Far fewer than before, but still some errors.
After reading more about XMP and SPD, it is my understanding that:
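To compare the two timing sets on a common scale, the CAS cycle counts can be converted to nanoseconds. A minimal sketch, assuming the usual DDR convention that the I/O clock is half the data rate (so DDR3-1600 runs an 800 MHz clock); the function name is made up for illustration and is not coreboot code:

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds.

    data_rate_mts: DDR data rate in MT/s (e.g. 1600 for DDR3-1600).
    The I/O clock is half the data rate because DDR transfers
    data twice per clock cycle.
    """
    clock_mhz = data_rate_mts / 2.0
    tck_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return cas_cycles * tck_ns


# The two settings mentioned in the message above.
for rate, cl in ((1866, 11), (1600, 10)):
    print(f"DDR3-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

Under these assumptions both settings land near 12 ns, with CL10 at DDR3-1600 being slightly more relaxed in absolute time than CL11 at DDR3-1866.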
- JEDEC specs stop at 1600, and after that XMP is required
- even before 1600, XMP also offers profiles, and they are not
optional: some memory is otherwise unable to work at its advertised speed
This would mean the memory is just broken. But that's what I suspect of any memory that's supposed to run out of spec.
Yes, XMP is required to run "out of spec". That's why Intel's homepage only advertises DDR3-1600 as the maximum frequency for all Intel processors. XMP is optional. I don't think that XMP is the problem. My guess is that raminit doesn't set all the registers required to fine-tune the memory controller and get it stable.
- XMP profiles are some kind of overclocking: they usually require
adjusting the voltage, to deal with this increased speed
Not kind of overclocking, simply overclocking. There's only one voltage specified for DDR3, IIRC.
- in XMP profile bytes, voltage increase information is given
precisely
- nowhere in the code I saw anything increasing the voltage, while
XMP requires that
I conclude that while there may be errors in selecting the SPD settings, some errors remain even if the SPD is manually corrected with known-good settings or overshot with very generous latencies, because the RAM is being asked to operate outside its voltage specification for the given frequency.
1.5V is a JEDEC spec, but RAM is advertised based on the information contained in the XMP profiles, which at the moment do not seem fully supported.
JEDEC is the standard. If the XMP support is half-baked it should be disabled by default. Maybe we should even put a warning in the log if we encounter an XMP profile with anything other than 1.5V (if it's common that those DIMMs are broken ex factory).
Yes, we could add a warning that an XMP profile will be used. I personally never had problems with it.
I do not know how to adjust the voltage (it should require talking to the IMC of the CPU) but I think that as soon as this is done, stability should improve.
If someone can propose a patch doing that (either using the voltage read from SPD, or by manually entering voltage information), I will be happy to test it.
Depending on the board the voltage might not be configurable at all. Why should it be if there is only one voltage defined in the standard?
The W520 does only have 1.5V DDR voltage. If it's stable with vendor bios, it's not a DDR voltage problem at all.
For now, I urge caution when operating even at DDR-1866 frequencies. Most boards do set up 933 as their max_mem_clock_mhz. It is not very prudent to do that until the voltage situation can be solved.
If the board can work at that frequency, that's just fine. If the voltage is a problem, it's due to the memory module. IMHO, the rule should be to ignore SPD frequency settings that include an out of spec voltage.
That's what sandybridge raminit does. Only XMP profiles with DDR voltage of 1.5V are used. Profiles that do have other voltage setting are ignored.
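The filtering rule described here can be sketched as follows. This is an illustration in Python, not the actual coreboot C code: the profile dictionaries, their field names, the function name and the example values are all hypothetical, and only the rule itself (keep 1.5V profiles, skip the rest) comes from this thread. The log warning follows the suggestion made earlier in this message.

```python
import logging

JEDEC_DDR3_MILLIVOLTS = 1500  # the single DDR3 voltage discussed in the thread


def usable_xmp_profiles(profiles):
    """Return only the profiles that request the standard 1.5V.

    `profiles` is a list of dicts with the hypothetical keys
    'name' and 'voltage_mv'; this is not the coreboot data layout.
    """
    usable = []
    for profile in profiles:
        if profile["voltage_mv"] == JEDEC_DDR3_MILLIVOLTS:
            usable.append(profile)
        else:
            # Skip the profile, but leave a trace in the log.
            logging.warning("Ignoring XMP profile %s: requests %d mV",
                            profile["name"], profile["voltage_mv"])
    return usable


# Made-up example data: one 1.65V profile and one 1.5V profile.
example = [
    {"name": "XMP-1", "voltage_mv": 1650},
    {"name": "XMP-2", "voltage_mv": 1500},
]
print([p["name"] for p in usable_xmp_profiles(example)])
```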
Regards, Patrick
If you want to do further testing, you can try to find out which combinations of processor and DIMMs work with the vendor BIOS or the MRC blob (I wouldn't expect that it supports non-JEDEC stuff, but it would be nice to know if something can be fixed in coreboot easily).
Nico
On 11.11.2016 17:12, Patrick Rudolph wrote:
On Fri, 11 Nov 2016 13:53:16 +0100, Nico Huber nico.h@gmx.de wrote:
On 11.11.2016 08:14, Charlotte Plusplus wrote:
I do not know how to adjust the voltage (it should require talking to the IMC of the CPU) but I think that as soon as this is done, stability should improve.
If someone can propose a patch doing that (either using the voltage read from SPD, or by manually entering voltage information), I will be happy to test it.
Depending on the board the voltage might not be configurable at all. Why should it be if there is only one voltage defined in the standard?
The W520 does only have 1.5V DDR voltage. If it's stable with vendor bios, it's not a DDR voltage problem at all.
That's what I suspected, too.
For now, I urge caution when operating even at DDR-1866 frequencies. Most boards do set up 933 as their max_mem_clock_mhz. It is not very prudent to do that until the voltage situation can be solved.
If the board can work at that frequency, that's just fine. If the voltage is a problem, it's due to the memory module. IMHO, the rule should be to ignore SPD frequency settings that include an out of spec voltage.
That's what sandybridge raminit does. Only XMP profiles with DDR voltage of 1.5V are used. Profiles that do have other voltage setting are ignored.
Good to know, I already started worrying about your code just by reading emails. Should have looked in the code instead ;) my apologies.
Nico
Hello
On Fri, Nov 11, 2016 at 5:37 PM, Nico Huber nico.h@gmx.de wrote:
The W520 does only have 1.5V DDR voltage. If it's stable with vendor bios, it's not a DDR voltage problem at all.
Based on my reading of the block diagram, cross-checked against the CPU pinout and the CPU specs, I disagree. The W520 indeed only supports 1.5V, if you mean 1.5V vs 1.35V "low voltage" DDR3L.
But SA_DIMM_VREFDQ is in direct control of the DDR3 voltage: "The step size is 7.7 mV". So it supports 1.5V +- k*0.0077V, with k being given by the XMP profile.
In case this is not clear, on http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/3rd-g... : Page 30 : "The processor memory controller has the capability of generating the DDR3 Reference Voltage (VREF) internally for both read (RDVREF) and write (VREFDQ) operations. The generated VREF can be changed in small steps, and an optimum VREF value is determined for both during a cold boot through advanced DDR3 training procedures in order to provide the best voltage and signal margins."
That seems to be a lot of evidence that the voltage is not an absolutely fixed 1.500V. It is something more flexible!
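The step arithmetic in this message can be worked through with a small sketch. It takes the message's reading at face value, namely that the 7.7 mV step applies around a 1.5V base; note that the datasheet passage quoted above describes the reference voltage VREF, and nothing in this thread confirms that the same steps apply to the supply voltage. The function name and the 1.65V target are illustrative assumptions.

```python
STEP_MV = 7.7  # step size quoted from the datasheet


def steps_for_target(base_mv, target_mv):
    """Return the nearest whole number of 7.7 mV steps between two
    voltages, and the voltage that this number of steps yields."""
    k = round((target_mv - base_mv) / STEP_MV)
    return k, base_mv + k * STEP_MV


# Hypothetical target: a 1.65V profile on top of a 1.5V base.
k, reached_mv = steps_for_target(1500, 1650)
print(f"k = {k}, reached {reached_mv:.1f} mV")
```

With a 7.7 mV granularity a 150 mV offset is not hit exactly; the nearest step count lands a few millivolts short of the target.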
That's what sandybridge raminit does. Only XMP profiles with DDR voltage of 1.5V are used. Profiles that do have other voltage setting are ignored.
Good to know, I already started worrying about your code just by reading emails. Should have looked in the code instead ;) my apologies.
Yes, in spd_xmp_decode_ddr3, profiles not using 1.5V are discarded. I believe this is the problem. When I did a Google image search for "cpuz ddr3", the first few hits showed me 1.6V and 1.65V XMP profiles. So there are quite a few such profiles out there. I'm not alone.
At the moment, I do not have any better explanation for why my RAM is not stable than the XMP profiles not being followed.
Patrick said above: "I don't think that XMP is the problem. My guess is that raminit doesn't set all required registers to fine tune the memory controller to get it stable."
Maybe it is the explanation, and XMP profiles are indeed not needed at all. Maybe I am very wrong in my analysis.
At the moment, I would just like to have the RAM on my W520 stable when it operates within specifications (and I mean within an XMP profile), as I was planning to use the W520 as my main laptop, and I can't :-(
I thought porting coreboot to the W520 would help me do that. These ram issues are really bothering me. I can't have unstable RAM on my main laptop. This is why I am extremely motivated to make it work.
I will be running more tests tonight. I included the patch #17389 you posted today: nb/intel/sandybridge/raminit: Fix CAS Write Latency
I disabled all my SPD hardcoding, and only disabled the MRC cache, so that I can alternate between normal and fallback to run more tests without reflashing.
I have absolutely no experience with coreboot and I'm learning on the go. Your help in fixing the RAM issues would be greatly appreciated.
Thanks, Charlotte