On Mon, Sep 6, 2010 at 6:40 PM, Scott Duplichan scott@notabs.org wrote:
Stefan Reinauer stefan.reinauer@coresystems.de writes:
Can you see if the patches posted in http://article.gmane.org/gmane.linux.bios/57707 make any difference for you?
Did we ever figure out what is causing this?
The last time I really dug into this, it was fairly obvious that it was caused by instruction fetch thrashing towards the ROM. I tried to fix this with MTRR settings, but I was unable to make that work correctly. For some reason it seemed like the HT requests were subtly changed when the MTRR was applied, and didn't hit the legacy southbridge properly.
The patch would require 4KB more stack on all supported systems, so if we can, we should do things differently.
It doesn't have to be stack, but it is nice to have the memory managed in some way. The unrv2b patch I posted addressing the same problem was even more kludgy.
Also, it's not really guaranteed that the code works from the new location since we don't compile coreboot with -fPIC (and as far as I understand the GCC guys, even that would not help), so I am a bit hesitant to check this in.
Agreed, it is a bit icky. Not sure what the best way to handle that is, though. On the pro side, I assume breakage here is going to be obvious, and (supposing these patches actually help Nick) this is an issue people are running into with some regularity.
-- Arne.
One necessary condition for caching MMIO such as the flash chip on AMD family 10h processors is not well known:
If the processor has an L3 cache, then bit 15 of MSR C001_102A (ClLinesToNbDis) must be set. This bit eventually needs to be cleared so that the OS can use the L3 cache, but the BIOS must not clear it until cacheable accesses to the flash chip are no longer needed. This situation applies only to family 10h processors that have an L3 cache. Often the BIOS clears this bit too early, and slow execution results. As an experiment, you could add code to set this bit before the slow function and see what happens.
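For anyone who wants to try that experiment, here is a minimal sketch of the bit manipulation in C. The struct and helper names are made up for illustration and are not coreboot's API. Real code would read the MSR with the platform's rdmsr(), apply the helper, and write the result back with wrmsr(). The only facts taken from this thread are the MSR number and the bit position.

```c
#include <stdint.h>

/* Hypothetical names for illustration only, not coreboot identifiers. */
#define MSR_C001_102A      0xC001102Au  /* MSR that holds ClLinesToNbDis */
#define CL_LINES_TO_NB_DIS (1u << 15)   /* bit 15, in the low dword */

/* A 64-bit MSR value split into two 32-bit halves. */
struct msr_val {
	uint32_t lo;
	uint32_t hi;
};

/* Set bit 15: needed while the flash chip is accessed cacheably. */
static inline struct msr_val cl_lines_to_nb_dis_set(struct msr_val v)
{
	v.lo |= CL_LINES_TO_NB_DIS;
	return v;
}

/* Clear bit 15: only once cacheable flash accesses are finished. */
static inline struct msr_val cl_lines_to_nb_dis_clear(struct msr_val v)
{
	v.lo &= ~CL_LINES_TO_NB_DIS;
	return v;
}

/* Report whether bit 15 is currently set in a value read from the MSR. */
static inline int cl_lines_to_nb_dis_is_set(struct msr_val v)
{
	return (v.lo & CL_LINES_TO_NB_DIS) != 0;
}
```

The helpers work on plain values so they can be checked without touching hardware; for example, a low dword of 010080C8 has bit 15 set, and clearing it gives 010000C8.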
Last night I tried to debug this code on simnow. An HT modeling problem kept me from getting past HT init. I may try it again today.
The recommended cacheability setting for MMIO is WP. At the point where the simnow model hangs in HT init, the setting is WB. While this should be OK for family 10h, it will be important to use WP for families 14h and 15h. ClLinesToNbDis is properly set for MMIO caching at this point (HT init):
------------Effective memory type and destination by address------------

                   NORMAL                       SMM
                   READ   WRITE  EXECUTE        READ   WRITE  EXECUTE
00000-C3FFF        UC MMIO....................  UC MMIO....................
C4000-CFFFF        WB DRAM....................  WB DRAM....................
D0000-FFFFF        UC MMIO....................  UC MMIO....................

00100000-00FFFFFF  UC DRAM
01000000-FFEFFFFF  UC MMIO
FFF00000-FFF7FFFF  WB MMIO   <=== really should be WP
FFF80000-FFFFFFFF  UC MMIO

-msr c001102a
00000040_010080C8   <=== good at this point
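To make the WP suggestion concrete, here is a rough sketch in C of how a variable MTRR base/mask pair for the FFF00000-FFF7FFFF range could be computed with type WP instead of WB. The function and struct names are hypothetical. The type encoding (WP = 5) and the valid bit (bit 11 of the mask register) follow the standard x86 MTRR layout. The physical address width is passed as a parameter because it is an assumption here; 48 bits is used in the example below.

```c
#include <stdint.h>

#define MTRR_TYPE_WP    5u            /* x86 memory type encoding: write-protect */
#define MTRR_MASK_VALID (1ull << 11)  /* valid bit in the MTRR mask register */

/* Hypothetical container for one variable MTRR base/mask pair. */
struct mtrr_pair {
	uint64_t base;
	uint64_t mask;
};

/*
 * Build a WP variable MTRR pair for [base, base + size).
 * Assumes size is a power of two (at least 4KB), base is aligned
 * to size, and phys_bits is less than 64.
 */
static inline struct mtrr_pair mtrr_wp_range(uint64_t base, uint64_t size,
					     unsigned int phys_bits)
{
	uint64_t addr_mask = (1ull << phys_bits) - 1;
	struct mtrr_pair p;

	p.base = (base & addr_mask) | MTRR_TYPE_WP;
	p.mask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
	return p;
}
```

With base FFF00000, size 80000 (512KB) and 48 physical address bits, this gives a base value of 00000000_FFF00005 and a mask value of 0000FFFF_FFF80800.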
Thanks, Scott
Simnow testing with Tilapia confirms that the coreboot AMD family 10h code _does_ have the problem of clearing ClLinesToNbDis too early. To confirm this problem, someone testing on real AMD family 10h hardware should remove the MSR C001_102A write from STOP_CAR_AND_CPU() and from mct_ClrClToNB_D(). An AMD F10h system running an optimized legacy BIOS can boot to a DOS prompt in less than one second. There is no reason coreboot should be any slower.
Thanks, Scott
Ah, this makes sense now. Is C001_102A a shared MSR? Is it OK for the bit to be cleared on the APs and left set on the BSP until the ramstage is decompressed? I should have a patch ready this afternoon.
Marc