Hello,
Years ago, I developed a BadRAM patch that enabled Linux to run in spite of broken memory chips. I am now contemplating making it go into coreboot.
http://rick.vanrein.org/linux/badram/
BadRAM uses a very terse list of address/mask pairs to describe faulty locations. I am talking to JEDEC, trying to get them to standardise a format for storing this list in the SPD EEPROMs on DIMMs. Reading these EEPROMs is a task best left to the BIOS, and CoreBoot could become a suitable implementation of BIOS-based (OS-independent) BadRAM.
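As a minimal sketch of how one such address/mask pair could be interpreted: it assumes that a set bit in the mask means "this address bit must match" and that a cleared bit means "don't care". The names badram_pattern and badram_is_faulty are hypothetical and are not taken from the BadRAM patch or from coreboot.

```c
#include <stdbool.h>
#include <stdint.h>

/* One BadRAM pattern (hypothetical struct name).
 * Assumed semantics: where a bit in `mask` is set, a physical address
 * must have the same bit value as `addr` to count as faulty; where the
 * mask bit is clear, the address bit is ignored. */
struct badram_pattern {
    uint64_t addr;
    uint64_t mask;
};

/* Returns true when `phys` matches the pattern on every compared bit. */
static bool badram_is_faulty(const struct badram_pattern *p, uint64_t phys)
{
    return ((phys ^ p->addr) & p->mask) == 0;
}
```

With addr 0x12345000 and mask 0xfffff000, every address in the 4 kB block starting at 0x12345000 matches, while 0x12346000 does not. One pair can therefore stand for many scattered locations, which is what keeps the list terse.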
After browsing through the coreboot code, it seems that the best approach to follow would be to call lb_remove_memory_range() on ranges that are faulty, after having included all the memory on each DIMM. CoreBoot would then deliver a memory map with more regions than in a usual setup. For example, if one row is marked bad, and it consists of 4096 columns, there may be as many as 4096 ranges marked bad. This would make the memory map expand -- but in a payload-compatible manner.
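The expansion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not coreboot code: the memory is assumed to span 2^addr_bits bytes starting at 0, the mask semantics are "set bit means compare", and badram_expand, badram_range_fn, badram_stats and badram_collect are hypothetical names. The callback stands in for a call such as lb_remove_memory_range(), whose real signature is deliberately not assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type; one call per contiguous faulty range. */
typedef void (*badram_range_fn)(uint64_t start, uint64_t size, void *ctx);

/* Expand one address/mask pattern into contiguous faulty ranges inside a
 * memory of 2^addr_bits bytes (addr_bits < 64). Returns the range count. */
static uint64_t badram_expand(uint64_t addr, uint64_t mask, unsigned addr_bits,
                              badram_range_fn emit, void *ctx)
{
    uint64_t width = (UINT64_C(1) << addr_bits) - 1;
    uint64_t m = mask & width;

    if (m == 0) {                 /* nothing is compared: all memory matches */
        if (emit)
            emit(0, width + 1, ctx);
        return 1;
    }

    uint64_t run = m & (~m + 1);  /* lowest compared bit = length of each run */
    uint64_t free_bits = ~m & width & ~(run - 1); /* don't-care bits above it */
    uint64_t base = addr & m;
    uint64_t sub = 0, count = 0;

    do {
        if (emit)
            emit(base | sub, run, ctx);
        count++;
        sub = (sub - free_bits) & free_bits;  /* next subset, ascending order */
    } while (sub != 0);

    return count;
}

/* Example callback: record first and last range start and total bytes. */
struct badram_stats { uint64_t first, last, bytes; };

static void badram_collect(uint64_t start, uint64_t size, void *ctx)
{
    struct badram_stats *s = ctx;
    if (s->bytes == 0)
        s->first = start;
    s->last = start;
    s->bytes += size;
}
```

As a usage example with an invented bit layout: for a 256 MB DIMM (addr_bits = 28), the mask 0x0ffe0018 leaves 12 "column" bits (bits 5 to 16) uncompared above the lowest compared bit, so badram_expand(0x01220008, 0x0ffe0018, 28, badram_collect, &stats) yields 4096 separate ranges of 8 bytes each. The number of ranges doubles with every don't-care bit that sits above the lowest compared bit, which is where the growth of the memory map comes from.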
I am not sure how the memory map is used after booting...
- does the memory map feed the e820 bios call (through openbios, say)?
- will all payloads use CoreBoot's memory map, one way or another?
- is the memory map reclaimed after booting, and if so when?
- would a memory map of, say, 4097 entries (so, 81 kB) ever be problematic?
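For reference, the 81 kB estimate can be reproduced with a small sketch. The entry layout below is an assumption made for illustration (a 64-bit start and a 64-bit size, each stored as two 32-bit halves, plus a 32-bit type); it is not copied from the coreboot headers, and example_mem_range and example_map_bytes are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed memory-map entry layout (illustration only). No member needs
 * more than 4-byte alignment, so the struct occupies 20 bytes. */
struct example_mem_range {
    uint32_t start_lo, start_hi;  /* 64-bit start address, split in halves */
    uint32_t size_lo, size_hi;    /* 64-bit length, split in halves */
    uint32_t type;                /* e.g. usable RAM or reserved */
};

/* Bytes taken by the range entries alone, excluding any table header. */
static size_t example_map_bytes(size_t entries)
{
    return entries * sizeof(struct example_mem_range);
}
```

Under that assumption, example_map_bytes(4097) gives 81940 bytes, which matches the figure of roughly 81 kB.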
If the memory for an expanded memory map is wasted, it's probably not ideal to do the expansion in CoreBoot. In that case, would it be a reasonable solution to read the SPD-stored information about BadRAM patterns and move it to either the lb_memory_range structure or a separate struct that lists pairs of address/mask? CoreBoot is the best place to make such a translation.
The reason for putting the BadRAM address/mask pairs in SPD is that it carries the fault knowledge onto the DIMM itself, in a portable way. If this weren't so ideal, I'd propose putting the whole thing in NVRAM; that however would make the BadRAM patterns dedicated to a machine, and it would disable usage patterns where broken memory would be plugged in as if it were the most normal case in the world. I'd love to make the use of broken memory chips as commonplace as possible, so as to avoid the environmental damage caused by making (memory) chips.
Your responses are kindly welcomed.
Thanks,
Rick van Rein
GroenGemak
http://groengemak.nl/en/
On Sun, Sep 28, 2008 at 10:51:43PM +0000, Rick van Rein wrote:
Hello,
Years ago, I developed a BadRAM patch that enabled Linux to run in spite of broken memory chips. I am now contemplating making it go into coreboot.
Hi,
This is a very nice idea. I hope the coreboot maintainers will find it suitable too :-)
After browsing through the coreboot code, it seems that the best approach to follow would be to call lb_remove_memory_range() on ranges that are faulty, after having included all the memory on each DIMM. CoreBoot would then deliver a memory map with more regions than in a usual setup. For example, if one row is marked bad, and it consists of 4096 columns, there may be as many as 4096 ranges marked bad. This would make the memory map expand -- but in a payload-compatible manner.
Note that there are two interfaces for exporting a memory map: the native coreboot table interface (write_coreboot_table()) and Multiboot (write_multiboot_info()). Please make sure your code supports both.
(That's for v3. In v2, Multiboot is not merged yet, but I expect it will be soon.)
I am not sure how the memory map is used after booting...
- does the memory map feed the e820 bios call (through openbios, say)?
Yes (with SeaBIOS or ADLO, that is, but openbios has an equivalent interface). Though you don't have to worry about this; all these stages get the info from coreboot through the interfaces mentioned above.
- would a memory map of, say, 4097 entries (so, 81 kB) ever be problematic?
I think it could, but if it is, the problems are likely to happen elsewhere, and they're bugs that need to be fixed IMHO.
I'd love to make the use of broken memory chips as commonplace as possible, so as to avoid the environmental damage caused by making (memory) chips.
A laudable goal. My hat off to you.