Marc Jones wrote:
On Fri, Nov 27, 2009 at 2:05 AM, Nathan Williams nathan@traverse.com.au wrote:
Nathan Williams wrote:
Marc Jones wrote:
On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams nathan@traverse.com.au wrote:
Marc Jones wrote:
On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams nathan@traverse.com.au wrote:
> I managed to get the commercial BIOS to boot on my board and diffed it with coreboot:
>
> http://coreboot.pastebin.com/m39b22c21
>
> The only differences I can see are related to interrupts, which shouldn't matter in relation to
> my RAM problems.
>
> I have also run a memtest86 with the commercial BIOS (from bootable CDROM) and as a payload in coreboot.
> The commercial BIOS didn't have any errors, but my coreboot did. So the hardware can't be too bad.

That looks like just the southbridge cs5536 target. The memory differences would be in the processor geodelx target. Can you send those results?
Marc
I did some new MSR dumps.
Diff:
  ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
  http://coreboot.pastebin.com/m5e487f87

AMD NAS reference BIOS:
  ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
  http://coreboot.pastebin.com/madc04ac

My Coreboot:
  ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
  http://coreboot.pastebin.com/m7f35d855
The diffs I did today show some differences with GLCP_DELAY_CONTROLS. Last time I added some code to force it to match the commercial BIOS GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
I also tested all the SODIMMs I have here (about 10) with the commercial BIOS. Each time I ran an msrtool diff against a dump I had saved on disk.
Most are 333MHz, but 2 are 400MHz. There weren't any changes to the MSRs.
Could there be an issue with the initialisation sequence that reading MSRs after booting won't show? Also, quite a few MSRs aren't defined in geodelx.c yet. Are there any obvious ones that should be added in?
--- AMD NAS reference BIOS
+++ Nathan's coreboot v3
#
# GLCP_DELAY_CONTROLS
#
-0x4c00000f 0x83f1_00aa_5696_0404
+0x4c00000f 0x8271_005a_5696_0404
It looks like coreboot and the ref BIOS detect a different DIMM configuration. This timing setup could be part of the instability (I don't think it explains the reset problem). Look at the code here: SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets set to see what might be happening. Make sure that MTest is disabled in the ref BIOS setup. This setting is based on the number of devices (the load) on the DIMM.
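To see exactly which bits of GLCP_DELAY_CONTROLS differ between the two dumps, the values from the diff above can be XORed. This is only a minimal sketch: the helper name `differing_bits` is made up for illustration, and mapping the resulting bit positions to register fields still requires the GLCP_DELAY_CONTROLS description in the data book.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(64) if (x >> i) & 1]

# Values copied from the msrtool diff for MSR 0x4c00000f.
ref_bios = 0x83F1_00AA_5696_0404   # AMD NAS reference BIOS
coreboot = 0x8271_005A_5696_0404   # Nathan's coreboot v3

print(hex(ref_bios ^ coreboot))
print(differing_bits(ref_bios, coreboot))
```

The lower 32 bits are identical in both dumps, so every difference sits in the upper half of the register.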
I didn't realize that so few registers were in the msr tool for geodelx. You should add these:

Address    Type  Register                                              Reset value         Page
20000018h  R/W   Refresh and SDRAM Program (MC_CF07_DATA)              10071007_00000040h  227
20000019h  R/W   Timing and Mode Program (MC_CF8F_DATA)                18000008_287337A3h  229
2000001Ah  R/W   Feature Enables (MC_CF1017_DATA)                      00000000_11080001h  231
2000001Bh  RO    Performance Counters (MC_CFPERF_CNT1)                 00000000_00000000h  232
2000001Ch  R/W   Counter and CAS Control (MC_PERCNT2)                  00000000_00FF00FFh  233
2000001Dh  R/W   Clocking and Debug (MC_CFCLK_DBUG)                    00000000_00001300h  233
4C00000Fh  R/W   GLCP I/O Delay Controls (GLCP_DELAY_CONTROLS)         00000000_00000000h  549
4C000014h  R/W   GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL)   Bootstrap specific  554
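For quick cross-checking of a dump against the documented reset values, the list above can be kept as a small lookup table. This is a sketch only: the tuple layout and the names `GEODELX_MSRS` and `lookup` are illustrative and are not msrtool's own definition format in geodelx.c. The bootstrap-specific reset value is stored as `None`.

```python
# (address, name, description, reset value), copied from the list above.
GEODELX_MSRS = [
    (0x20000018, "MC_CF07_DATA", "Refresh and SDRAM Program", 0x10071007_00000040),
    (0x20000019, "MC_CF8F_DATA", "Timing and Mode Program", 0x18000008_287337A3),
    (0x2000001A, "MC_CF1017_DATA", "Feature Enables", 0x00000000_11080001),
    (0x2000001B, "MC_CFPERF_CNT1", "Performance Counters", 0x00000000_00000000),
    (0x2000001C, "MC_PERCNT2", "Counter and CAS Control", 0x00000000_00FF00FF),
    (0x2000001D, "MC_CFCLK_DBUG", "Clocking and Debug", 0x00000000_00001300),
    (0x4C00000F, "GLCP_DELAY_CONTROLS", "GLCP I/O Delay Controls", 0x00000000_00000000),
    (0x4C000014, "GLCP_SYS_RSTPLL", "GLCP System Reset and PLL Control", None),
]

def lookup(addr):
    """Return the table entry for an MSR address, or None if it is not listed."""
    for entry in GEODELX_MSRS:
        if entry[0] == addr:
            return entry
    return None

addr, name, desc, reset = lookup(0x2000001D)
print(f"{addr:#010x} {name}: reset value {reset:#018x}")
```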
Marc
I've now added the MSRs and uploaded to pastebin:
AMD NAS: http://coreboot.pastebin.com/m53aed60b
My coreboot: http://coreboot.pastebin.com/md23bc6a
./msrtool -d AMD_NAS: http://coreboot.pastebin.com/m77663de5
Tomorrow I'll try the tests on the NAS hardware instead of our own motherboards, just in case there are some hidden hardware issues.
Regards, Nathan
On the NAS reference board I got the following diff between coreboot and the commercial BIOS:
http://coreboot.pastebin.com/m1353db1a
As you can see, there are a lot of latency differences. Unfortunately it was only later that I realised that the differences are because the bootstraps are set to bypass, which means coreboot uses 266 as the speed, whereas the commercial BIOS uses 333. So when I repeat the same on our boards, the only difference in the geodelx MSRs is:
# MC_CFCLK_DBUG
-0x2000001d 0x0000000000000000
+0x2000001d 0x0000000000001000
# 12 TRISTATE_DIS TRI-STATE Disable
-0: Tri-stating enabled
+1: Tri-stating disabled
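The single-bit difference can be confirmed by testing bit 12 in each raw value. This is a minimal sketch: `bit_is_set`, `minus_side` and `plus_side` are illustrative names, and the bit position is taken from the msrtool output above rather than from the data book.

```python
TRISTATE_DIS_BIT = 12  # bit position as reported in the diff above

def bit_is_set(value, bit):
    """Return True if the given bit (0 = least significant) is set in value."""
    return (value >> bit) & 1 == 1

# Raw MC_CFCLK_DBUG (0x2000001d) values from the two sides of the diff.
minus_side = 0x0000000000000000
plus_side = 0x0000000000001000

print(bit_is_set(minus_side, TRISTATE_DIS_BIT))
print(bit_is_set(plus_side, TRISTATE_DIS_BIT))
print(hex(minus_side ^ plus_side))
```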
Nathan,
I don't think the tri-state disable bit explains the problems you have seen. Since the memory has the same settings, the problem must be somewhere else. You will need to go back to the reboot path to investigate. It seems like something in the reset isn't doing a complete reset, which causes a problem with the cache disable.
Marc
I am suspicious that the reset problem only occurs when I'm using a laptop hard drive off the 44pin IDE connector on our board. I have tried booting with a 3.5" drive and external 12V, but I can't replicate the problem. With the 3.5" drive, a reboot from fsck works fine. Hopefully the next PCB revision should perform better because we've moved the 5V plane further away from the DDR tracks.
I don't know if I mentioned another problem that has similar symptoms. Some RAM causes the same cache disable problem, even if there are no IDE devices connected. This happens from power-up, so it's not a reset issue.
Nathan