Hi Charlotte,
On 11.11.2016 08:14, Charlotte Plusplus wrote:
So I did many more tests today (more than 6h, and flashing around 30 times), with SPD settings hardcoded into raminit, and without the mrc cache interfering.
Thanks for the analysis and summing this up.
TLDR: coreboot tries to increase the frequency without increasing the voltage, and that doesn't work for all memory.
Basically, with the problematic ram sticks, I can boot perfectly fine at DDR3-1866 speed, but even the slower setting 11-11-11-31 gives errors on the memtest. This is inconsistent with information I found about my memory from its SPD information, and from other people who overclock this exact same memory.
Even at 10-10-10-27, I still get errors at DDR3-1600 speeds. Far fewer than before, but some errors still.
After reading more about XMP and SPD, it is my understanding that:
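To put the two timing sets above on the same scale, here is a small sketch of the usual cycles-to-nanoseconds arithmetic. It is not from the thread; the function name is made up for illustration and it uses the nominal data rates (1600 and 1866 MT/s).

```python
def cas_latency_ns(data_rate_mts: float, cl_cycles: int) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds.

    data_rate_mts is the nominal DDR3 data rate (e.g. 1600 for DDR3-1600).
    DDR memory transfers data twice per clock, so the clock is half of it.
    """
    clock_mhz = data_rate_mts / 2
    tck_ns = 1000.0 / clock_mhz  # length of one clock cycle in ns
    return cl_cycles * tck_ns


# The two settings mentioned above (first number of each timing set is CL).
print(f"DDR3-1600 CL10: {cas_latency_ns(1600, 10):.2f} ns")
print(f"DDR3-1866 CL11: {cas_latency_ns(1866, 11):.2f} ns")
```

Under these assumptions, 10 cycles at DDR3-1600 is a longer absolute latency than 11 cycles at DDR3-1866, so the 1600 setting is the more relaxed of the two.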
- JEDEC specs stop at 1600, and after that XMP is required
- even before 1600, XMP also offers profiles, and they are not optional:
some memory is otherwise unable to work at its advertised speed
This would mean the memory is just broken. But that's what I suspect of any memory that's supposed to run out of spec.
- XMP profiles are some kind of overclocking: they usually require
adjusting the voltage, to deal with this increased speed
Not kind of overclocking, simply overclocking. There's only one voltage specified for DDR3, IIRC.
- in XMP profile bytes, voltage increase information is given precisely
- nowhere in the code I saw anything increasing the voltage, while XMP
requires that
I conclude that while there may be errors in selecting the SPD settings, even if the SPD is manually corrected with known-good settings, or if overshooting with very generous latencies, some errors do remain, as the RAM is being asked to operate outside its voltage specifications (given the frequency).
1.5V is a JEDEC spec, but RAM is advertised based on the information contained in the XMP profiles, which at the moment do not seem fully supported.
JEDEC is the standard. If the XMP support is half-baked it should be disabled by default. Maybe we should even put a warning in the log if we encounter an XMP profile with anything other than 1.5V (if it's common that those DIMMs are broken ex factory).
I do not know how to adjust the voltage (it should require talking to the IMC of the CPU) but I think that as soon as this is done, stability should improve.
If someone can propose a patch doing that (either using the voltage read from SPD, or by manually entering voltage information), I will be happy to test it.
Depending on the board the voltage might not be configurable at all. Why should it be if there is only one voltage defined in the standard?
For now, I urge caution when operating even at DDR3-1866 frequencies. Most boards do set up 933 as their max_mem_clock_mhz. It is not very prudent to do that until the voltage situation can be solved.
If the board can work at that frequency, that's just fine. If the voltage is a problem, it's due to the memory module. IMHO, the rule should be to ignore SPD frequency settings that include an out of spec voltage.
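A rough sketch of that rule, combined with the log warning suggested earlier, could look like the following. Everything here is hypothetical: `Profile`, `usable_profiles` and the `board_can_set_vdd` flag are illustration names, not coreboot structures, and 1.5 V is taken as the only standard voltage, as assumed in this thread.

```python
from dataclasses import dataclass

# Assumption from the thread: 1.5 V is the standard DDR3 supply voltage.
JEDEC_DDR3_VDD_MV = 1500


@dataclass
class Profile:
    """Hypothetical container for one SPD/XMP timing profile."""
    name: str
    clock_mhz: int
    vdd_mv: int


def usable_profiles(profiles, board_can_set_vdd=False):
    """Split profiles into usable ones and warnings for the rest.

    A profile asking for a non-standard voltage is only kept when the
    board is able to supply that voltage.
    """
    usable, warnings = [], []
    for p in profiles:
        if p.vdd_mv == JEDEC_DDR3_VDD_MV or board_can_set_vdd:
            usable.append(p)
        else:
            warnings.append(
                f"ignoring {p.name}: wants {p.vdd_mv} mV, "
                f"board supplies {JEDEC_DDR3_VDD_MV} mV"
            )
    return usable, warnings


profiles = [Profile("JEDEC", 800, 1500), Profile("XMP1", 933, 1650)]
usable, warnings = usable_profiles(profiles)
for line in warnings:
    print(line)
```

With the example values, only the JEDEC profile is kept and one warning is printed for the 1.65 V profile.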
If you want to do further testing, you can try to find out which combinations of processor and DIMMs work with the Vendor BIOS or the MRC blob (I wouldn't expect that it supports non-JEDEC stuff, but it would be nice to know if something can be fixed in coreboot easily).
Nico
Hello
On Fri, Nov 11, 2016 at 7:53 AM, Nico Huber nico.h@gmx.de wrote:
After reading more about XMP and SPD, it is my understanding that:
- JEDEC specs stop at 1600, and after that XMP is required
- even before 1600, XMP also offers profiles, and they are not optional:
some memory is otherwise unable to work at its advertised speed
This would mean the memory is just broken. But that's what I suspect of any memory that's supposed to run out of spec.
I think that is being too extreme. All of the RAM sold, and most of what is in use, contains XMP profiles. It is hard to say how much of the installed base may have problems when using XMP profiles without adapting the voltage, since not many people use coreboot, and not many of those who use coreboot will be running memtest for hours.
- XMP profiles are some kind of overclocking: they usually require
adjusting the voltage, to deal with this increased speed
Not kind of overclocking, simply overclocking. There's only one voltage specified for DDR3, IIRC.
It is indeed Intel's fault for rating the IMC only to 1600, and putting a hack on top of that to make RAM go faster. But they made this hack a standard, adopted by manufacturers. So it is not just overclocking in the usual sense. It is overclocking validated by Intel and the RAM manufacturers, where the XMP profiles contain speed settings + voltage, and negotiate with the system to get this voltage.
It is stable, or it wouldn't be used in production by so many BIOSes.
JEDEC is the standard. If the XMP support is half-baked it should be disabled by default. Maybe we should even put a warning in the log if we encounter an XMP profile with anything other than 1.5V (if it's common that those DIMMs are broken ex factory).
So many standards to choose from, lol. I wouldn't call the XMP support half-baked. It is a very nice addition, as based on my understanding, some combinations of chipsets + RAM may not even be able to boot without XMP profiles. The XMP implementation just needs to be completed to also do the voltage part.
Likewise, we can't say that 99% of the DIMMs are broken. The XMP profiles have been tested and validated.
In my opinion, XMP should be a compile time option, defaulting to y, but with a warning of possible ram errors.
Most of the boards have a max_mem_clock_mhz at 933, which concerns me just as much in terms of ram errors.
I would suggest adding RAM settings that override that, as well as the selected SPD settings. This way, unstable settings detected in memtest86 can be adjusted in nvramgui without having to recompile.
I will do more research to see if I can do that in userland (like MSR can be used for CPU overclocking, there must be a way to specify RAM voltage).
End result until XMP voltage can be adjusted in some way or another:
- most systems will be unaffected
- unstable systems can set the max_mem_clock_mhz override to 666, or see if they can get something better with manual SPD
- userland programs may automate that last part
- when XMP voltage is supported (and 100 MHz for Ivy Bridge instead of 133 as Patrick noted, etc), coreboot will have gained much more flexibility in RAM initialization to deal with similar situations that may arise with new specs
It will also be very nice to have all that work without blobs.
Depending on the board the voltage might not be configurable at all. Why
should it be if there is only one voltage defined in the standard?
No, XMP calls for precise voltage. From wikipedia:
bit 0      Module Vdd voltage twentieths (0.00 or 0.05)
bits 4:1   Module Vdd voltage tenths (0.0–0.9)
bits 6:5   Module Vdd voltage units (0–2)
The standard calls for the DIMM to ask the system for a specific voltage. It is a negotiation of speed, latency and voltage, cf. https://en.wikipedia.org/wiki/Serial_presence_detect#Extreme_Memory_Profile_...
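For illustration, the bit layout quoted above can be decoded like this. It is only a sketch based on the Wikipedia description; the function name is made up, and it returns millivolts to avoid floating-point rounding.

```python
def xmp_vdd_millivolts(vdd_byte: int) -> int:
    """Decode an XMP module Vdd byte into millivolts.

    Layout as quoted from Wikipedia:
      bit 0     twentieths of a volt (0.00 or 0.05)
      bits 4:1  tenths of a volt
      bits 6:5  whole volts
    """
    twentieths = vdd_byte & 0x01
    tenths = (vdd_byte >> 1) & 0x0F
    units = (vdd_byte >> 5) & 0x03
    return units * 1000 + tenths * 100 + twentieths * 50


# 0x2A = 0b0101010: units=1, tenths=5, twentieths=0
print(xmp_vdd_millivolts(0x2A))  # 1500
# 0x2D = 0b0101101: units=1, tenths=6, twentieths=1
print(xmp_vdd_millivolts(0x2D))  # 1650
```

Where this byte sits inside the XMP block of the SPD, and whether a given board can act on it, is not covered here.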
If the board can work at that frequency, that's just fine. If the voltage is a problem, it's due to the memory module. IMHO, the rule should be to ignore SPD frequency settings that include an out of spec voltage
XMP is a specification. It should be supported. I think the only mistake is that the voltage part is not being applied. It's not out of spec, it's within another spec.
If you want to do further testing, you can try to find out which combinations of processor and DIMMs work with the Vendor BIOS or the MRC blob (I wouldn't expect that it supports non-JEDEC stuff, but it would be nice to know if something can be fixed in coreboot easily).
I tested the Lenovo BIOS in depth with memtest86. It works just fine.
Could you give me some suggestions for using the MRC blob? I don't see anything like that in the coreboot source. (And actually, if it's an Intel blob, I would expect that it will support XMP.)
I will try to find a way to adjust the voltage from userland. I will do more tests when I have either the blob thing or the voltage working.
Thanks Charlotte