Dear coreboot community,
I'm trying to use coreboot on my second PC and I think I need your help. I have an ASUS A8V-E Deluxe with coreboot which works with a single 1GB ECC module, but not with a second identical one. It hangs when jumping from romstage to coreboot_ram (I hope that's correct). Its last words are "Jumping to image", then nothing.
I'm attaching two logs from the serial console, one with 1x1GB RAM installed which works, one with 2x1GB which hangs, and also the diff. Looking at the diff I noticed that in the 2x1GB log some lines are missing between "Copying data from cache to RAM -- switching to use RAM as stack..." and "Clearing initial memory region: Done". What should happen in between seems to be omitted, which doesn't make sense to me, looking at src/cpu/amd/car/post_cache_as_ram.c. With my little experience I can't figure out what's wrong.
Anything I can do to debug? Do you have an idea?
Some notes:
- 2x1GB starts up with the vendor BIOS but randomly hangs at booting linux (kernel panic or so)
- 1x1GB works with coreboot and vendor BIOS
- Replacing those 1GB ECC modules (Corsair/Samsung chips) with 512MB ECC modules (HP/Samsung or MDT/???) leads to the same results (2 modules fail with coreboot; vendor BIOS unknown)
- 2x512MB non-ECC works!
Why are only 2 modules with ECC failing to boot, but not 2 non-ECC? I tried commenting out the contents of hw_enable_ecc() in src/northbridge/amd/amdk8/raminit.c but that didn't change anything. memtest86+ even still detected ECC (with only one module).
Apart from that I got a little success story: the VIA K8T890 chipset on this mainboard had a bug in its first revision, making it incompatible with dual-core CPUs. I was curious if coreboot also had this limitation, so I bought a cheap dual-core Opteron 180 and tested it. Vendor BIOS: one core detected in linux. coreboot: two cores!! I'm not sure if it's stable because I had two hang-ups last weekend, but that was probably because I forgot to put in the GPU fan plug.. :) If anyone knows the details of this chipset bug I'd be very interested. By the way, the RAM issue is the same with a single-core Athlon 64.
Thanks, Michael
Anything I can do to debug? Do you have an idea?
Yes, it looks like memory is set up the wrong way. I did the port for the ASUS A8V-E SE board. Possible reasons:
0) something is wrong with placement of dimms
I remember that memory sometimes did not work well in the second channel, and sometimes the board even had the slots labeled the other way around. Try putting the DIMM in a different slot. That could fix single-DIMM issues.
1) something is wrong with dualchannel setup
This is usually a sign that the following table is wrong:
You can try to modify it like this, as it is on the A8V-E SE, because the Deluxe version seems wrong to me (you have only 4 DIMM slots):
static const uint16_t spd_addr[] = {
	// Node 0
	DIMM0, DIMM2, 0, 0, 0, 0, 0, 0,
	// Node 1
	DIMM1, DIMM3, 0, 0, 0, 0, 0, 0,
};
It says that DIMM0, with i2c address 0x50, is the first DIMM of channel A, and the second DIMM of channel A is 0x52. Channel B is 0x51 and 0x53. To see how this number changes when you plug a single DIMM into different slots, you could use:
modprobe i2c-viapro
modprobe i2c-dev
i2cdetect -l
(now select right bus)
i2cdetect 0
If the above does not work, it could be:
	// Node 0
	DIMM0, DIMM1, 0, 0, 0, 0, 0, 0,
	// Node 1
	DIMM2, DIMM3, 0, 0, 0, 0, 0, 0,
But that is not so likely. You will need to use the i2cdetect trick above to see how the slots map to i2c addresses.
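To interpret what i2cdetect shows, it may help to write the address layout described above down as code. This is only a minimal standalone sketch, not coreboot code: it assumes the SPD EEPROMs answer at 0x50 to 0x53, and that 0x50/0x52 belong to channel A and 0x51/0x53 to channel B, as stated above. The names spd_slot and decode_spd_addr are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of coreboot. */
struct spd_slot {
	char channel;  /* 'A' or 'B' */
	int position;  /* 0 = first DIMM of the channel, 1 = second */
};

/*
 * Decode an SPD address seen in i2cdetect, assuming the layout
 * described in this thread: 0x50/0x52 are channel A, 0x51/0x53 are
 * channel B. Returns false for any address outside 0x50..0x53.
 */
bool decode_spd_addr(uint8_t addr, struct spd_slot *slot)
{
	assert(slot != NULL);

	if (addr < 0x50 || addr > 0x53)
		return false;

	unsigned int n = addr - 0x50;            /* 0..3 */
	slot->channel = (n & 1) ? 'B' : 'A';     /* odd addresses: channel B */
	slot->position = (int)(n >> 1);          /* 0x50/0x51 first, 0x52/0x53 second */
	return true;
}
```

Calling decode_spd_addr() with the single address that shows up for each physical slot gives the channel and position that slot would have under this assumed layout, which can then be compared with the order of entries in spd_addr[].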
2) something is wrong with memory init
Let's wait and see if the above helps. I suspect it should.
Some notes:
- 2x1GB starts up with the vendor BIOS but randomly hangs at booting linux (kernel panic or so)
That looks like a wrong memory setup too.
- 1x1GB works with coreboot and vendor BIOS
- Replacing those 1GB ECC modules (Corsair/Samsung chips) with 512MB ECC modules (HP/Samsung or MDT/???) leads to the same results (2 modules fail with coreboot; vendor BIOS unknown)
- 2x512MB non-ECC works!
Why are only 2 modules with ECC failing to boot, but not 2 non-ECC? I tried commenting out the contents of hw_enable_ecc() in src/northbridge/amd/amdk8/raminit.c but that didn't change anything. memtest86+ even still detected ECC (with only one module).
Apart from that I got a little success story: the VIA K8T890 chipset on this mainboard had a bug in its first revision, making it incompatible with dual-core CPUs. I was curious if coreboot also had this limitation, so I bought a cheap dual-core Opteron 180 and tested it. Vendor BIOS: one core detected in linux. coreboot: two cores!!
Yes, because I was not aware of this problem while implementing this chipset support 6? years ago. Do you have any details?
I'm not sure if it's stable because I had two hang-ups last weekend, but that was probably because I forgot to put in the GPU fan plug.. :) If anyone knows the details of this chipset bug I'd be very interested. By the way, the RAM issue is the same with a single-core Athlon 64.
I see. I have no clue either.
Thanks Rudolf
Thanks, Michael
On Sun, 2013-10-27 at 00:14 +0200, Rudolf Marek wrote:
Anything I can do to debug? Do you have an idea?
Yes, it looks like memory is set up the wrong way. I did the port for the ASUS A8V-E SE board. Possible reasons:
- something is wrong with placement of dimms
I remember that memory sometimes did not work well in the second channel, and sometimes the board even had the slots labeled the other way around. Try putting the DIMM in a different slot. That could fix single-DIMM issues.
You're right with the single dimm issues, I have to use slot B1 or B2 for a single module, vendor BIOS and coreboot. But I also found a more or less working configuration I had somehow missed before: putting both 1GB ECC modules in B1 and B2 makes coreboot detect full 2GB and boot, but with reduced speed: "Memory speed reduced due to signal loading conditions", and not DualChannel of course; see log. Anything else hangs at "Jumping to image" or detects only 1GB.
- something is wrong with dualchannel setup
This is usually a sign that the following table is wrong:
You can try to modify it like this, as it is on the A8V-E SE, because the Deluxe version seems wrong to me (you have only 4 DIMM slots):
static const uint16_t spd_addr[] = {
	// Node 0
	DIMM0, DIMM2, 0, 0, 0, 0, 0, 0,
	// Node 1
	DIMM1, DIMM3, 0, 0, 0, 0, 0, 0,
};
I used i2cdetect like you suggested (which was interesting!) and I found out that this should be the correct table. The labels on the board seem to be vice versa indeed:
Board label: A1 | A2 | B1 | B2   <-- physical layout
-------------------------------
i2c address: 51 | 53 | 50 | 52
With both modules in A1/B1 or A2/B2 the system would boot now, but only 1GB was detected, or 512MB with 512MB modules, also with non-ECC! With both modules in B1/B2 it was the same as before (2GB detected, reduced speed, but works). Both log files are with this corrected table.
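For reference, the i2cdetect findings above can be written down as a small lookup table. This is a hedged sketch only: the struct, the table name and spd_addr_for_label() are hypothetical and not part of coreboot, and the addresses are simply the ones measured on this one board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: board label -> SPD address seen with i2cdetect. */
struct slot_map {
	const char *label;
	uint8_t spd_addr;
};

/* Entries are listed in the physical order of the slots on the board. */
static const struct slot_map a8v_e_deluxe_slots[] = {
	{ "A1", 0x51 },
	{ "A2", 0x53 },
	{ "B1", 0x50 },
	{ "B2", 0x52 },
};

#define NUM_SLOTS (sizeof(a8v_e_deluxe_slots) / sizeof(a8v_e_deluxe_slots[0]))

/* Return the SPD address for a board label, or 0 if the label is unknown. */
uint8_t spd_addr_for_label(const char *label)
{
	assert(label != NULL);

	for (size_t i = 0; i < NUM_SLOTS; i++) {
		if (strcmp(a8v_e_deluxe_slots[i].label, label) == 0)
			return a8v_e_deluxe_slots[i].spd_addr;
	}
	return 0;
}
```

Read this way, the slots labeled B1 and B2 carry the addresses 0x50 and 0x52, and A1 and A2 carry 0x51 and 0x53, which is the swap relative to the board labels noted above.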
I also tried to swap DIMM0, DIMM2 with DIMM1, DIMM3 and the results from i2cdetect were funny:
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2f
30: -- -- -- -- -- 35 36 37 38 39 3a 3b 3c 3d 3e 3f
40: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50: 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70: 70 71 72 73 74 75 76 77
Also, only the module in B1 was detected, not in B2, if I remember correctly..
- something is wrong with memory init
Let's wait and see if the above helps. I suspect it should.
I'm not sure right now if it did.. the table is probably correct but now even non-ECC is not fully detected. Assuming it IS correct, is there anything I can try except making myself familiar with RAM init and searching for a bug, which could take a while, like, forever?
Apart from that I got a little success story: the VIA K8T890 chipset on this mainboard had a bug in its first revision, making it incompatible with dual-core CPUs. I was curious if coreboot also had this limitation, so I bought a cheap dual-core Opteron 180 and tested it. Vendor BIOS: one core detected in linux. coreboot: two cores!!
Yes, because I was not aware of this problem while implementing this chipset support 6? years ago. Do you have any details?
I'm glad you were not! I don't know any more than what I found using google though [1]. The A8V-E SE had a newer chipset revision which was fixed. Probably they shipped some A8V-E Deluxe boards with the new revision and mine already has it as well.
Thank you for your help Rudolf!
Michael
Board label: A1 | A2 | B1 | B2 <-- physical layout
i2c address: 51 | 53 | 50 | 52
OK, then we need to update the wiki page and also produce a patch to fix it.
I also tried to swap DIMM0, DIMM2 with DIMM1, DIMM3 and the results from i2cdetect were funny:
That looks like a hung I2C bus.
I'm not sure right now if it did.. the table is probably correct but now even non-ECC is not fully detected. Assuming it IS correct, is there anything I can try except making myself familiar with RAM init and searching for a bug, which could take a while, like, forever?
No. If you got this far, don't underestimate yourself. I remember that back in those days memory init looked like huge magic, but nowadays with DDR2/DDR3 it is MAGIC. The last 5 years added a lot of complexity to raminit, which allows me to say that K8 DDR is sufficiently simple. Not easy, but doable.
I think you need to look for the message from spd_enable_2channels():
printk(BIOS_SPEW, "Enabling dual channel memory\n");
(you need to set the debug level to SPEW)
If you don't see this message, the function above failed for some reason and only one channel is used. Try to look into this function first.
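Checking a captured serial log for that message can be automated. The following is a minimal sketch, assuming the log has already been read into a NUL-terminated string; log_has_dual_channel() is a hypothetical helper name, and only the message text itself comes from the printk() quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Message printed at BIOS_SPEW level, as quoted in this thread. */
#define DUAL_CHANNEL_MSG "Enabling dual channel memory"

/*
 * Hypothetical helper: return true if the captured log text contains
 * the dual channel message, false otherwise.
 */
bool log_has_dual_channel(const char *log_text)
{
	assert(log_text != NULL);
	return strstr(log_text, DUAL_CHANNEL_MSG) != NULL;
}
```

If this returns false for a log captured at SPEW level, that would match the case described above, where the function failed and only one channel is in use.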
Thank you for your help Rudolf!
No problem, I'm glad that my work is still used.
Thanks Rudolf