On Wed, Apr 28, 2010 at 6:26 AM, Myles Watson <mylesgw@gmail.com> wrote:
Have you tried changing it to pci_locate_device_on_bus()? That will constrain the search to a single bus.
That doesn't help; config space accesses to the PCIe bridge devices themselves are hanging.
I worked around the problem by replacing pci_locate_device() with hardcoded PCI_DEV values, which should be okay for this chipset as long as the northbridge and southbridge are always at the normal addresses.
I managed to set up SerialICE on my board and get a few thousand lines of tracing from the factory BIOS. I notice that it doesn't touch those PCIe bridge devices at all early on. Is it possible that some HyperTransport magic needs to happen to get them to behave?
I don't know. It's surprising that it hangs.
I think there must be some MTRR setup problem. Maybe you could print out the MTRRs just before the slow parts?
Here's a dump of various MSRs right after the call to raminit_amdmct() in romstage.c:
/* variable MTRRs */
msr 00000200=0000000000000000 msr 00000201=0000000000000000
msr 00000202=00000000fff00006 msr 00000203=0000fffffff80800
This looks wrong to me. I'm not an expert, but since 202 is the base and 203 is the mask, it looks like only the area from 0xfff00000 to 0xfff7ffff (512KB) is cached. I would think the correct setting would be:
msr 00000202=00000000fff00006 msr 00000203=0000fffffff00800
That would cache the last MB below 4GB (0xfff00000 - 0xffffffff).
msr 00000204=0000000000000006 msr 00000205=0000ffff80000800
Then this one caches 0 - 2GB
msr 00000206=0000000080000006 msr 00000207=0000ffffc0000800
This one caches 2GB-3GB
msr 00000208=00000000c0000006 msr 00000209=0000ffffe0000800
This one caches 3GB-3.5GB
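The base/mask reading above can be double-checked with a short host-side script. This is only a sketch in Python, not coreboot code: the name decode_var_mtrr is made up for illustration, the 48 physical address bits are an assumption suggested by the 0000ffff... masks in the dump, and the mask is assumed to be contiguous.

```python
# Decode x86 variable MTRR base/mask MSR pairs into address ranges.
# Hypothetical helper for reading a dump on a host machine.

MASK_VALID = 1 << 11  # bit 11 of the mask MSR: this pair is enabled
TYPE_NAMES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}


def decode_var_mtrr(base_msr, mask_msr, addr_bits=48):
    """Return (start, end_inclusive, type_name), or None if the pair is disabled.

    Assumes addr_bits physical address bits and a contiguous mask.
    """
    if not mask_msr & MASK_VALID:
        return None
    addr_mask = ((1 << addr_bits) - 1) & ~0xFFF  # address bits 12 and up
    start = base_msr & addr_mask
    mask = mask_msr & addr_mask
    size = mask & -mask  # lowest set mask bit equals the range size
    mem_type = base_msr & 0xFF
    name = TYPE_NAMES.get(mem_type, "type %d" % mem_type)
    return (start, start + size - 1, name)


if __name__ == "__main__":
    # Pairs copied from the dump in this thread.
    dump = [
        (0x00000000FFF00006, 0x0000FFFFFFF80800),
        (0x0000000000000006, 0x0000FFFF80000800),
        (0x0000000080000006, 0x0000FFFFC0000800),
        (0x00000000C0000006, 0x0000FFFFE0000800),
    ]
    for base, mask in dump:
        start, end, name = decode_var_mtrr(base, mask)
        print(f"{start:#010x}-{end:#010x} {name}")
```

For the 202/203 pair as dumped, this reports 0xfff00000-0xfff7ffff; with the mask changed to 0000fffffff00800 it reports the whole last MB.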
Something to keep in mind is that caching should be disabled before changing the variable MTRRs and re-enabled afterwards. Since you don't want to disable caches while you're using cache-as-RAM, I think it's best to make sure that the MTRRs are set correctly from the beginning and not touch them again until you've copied the RAM stage into RAM and moved your stack.
msr 0000020a=0000000000000000 msr 0000020b=0000000000000000
msr 0000020c=0000000000000000 msr 0000020d=0000000000000000
msr 0000020e=0000000000000000 msr 0000020f=0000000000000000
/* fixed MTRRs */
msr 00000250=1e1e1e1e1e1e1e1e
msr 00000258=1e1e1e1e1e1e1e1e
msr 00000259=0000000000000000
msr 00000268=1e1e1e1e00000000
msr 00000269=1e1e1e1e1e1e1e1e
msr 0000026a=0000000000000000
msr 0000026b=0000000000000000
msr 0000026c=0404040404040404
msr 0000026d=0404040404040404
msr 0000026e=0404040404040404
msr 0000026f=0404040404040404
/* variable & fixed MTRRs enabled */
msr 000002ff=0000000000000c00
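For reference, the 0xc00 value can be broken down as below. This is a minimal Python sketch with a made-up function name; the bit positions (bit 11 = MTRRs enabled, bit 10 = fixed-range MTRRs enabled, low byte = default memory type) are the architectural layout of the MTRR default type MSR as I understand it, so check them against the processor manual.

```python
# Decode the MTRR default type MSR (0x2ff). Hypothetical helper name.

def decode_mtrr_def_type(value):
    """Split the MSR value into its default type and enable bits."""
    return {
        "default_type": value & 0xFF,            # 0 = UC, 6 = WB
        "fixed_enabled": bool(value & (1 << 10)),
        "mtrr_enabled": bool(value & (1 << 11)),
    }


if __name__ == "__main__":
    print(decode_mtrr_def_type(0xC00))
```

With a default type of 0 (UC), any address not matched by a fixed or variable MTRR is uncacheable, which is why a gap or a too-small range shows up as very slow code.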
/* IORRs */
msr c0010016=0000000080210000 msr c0010017=0000000000000000
msr c0010018=0000000000000000 msr c0010019=0000000000000000
/* top of memory registers */
msr c001001a=00000000e0000000
msr c001001d=0000000120000000
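One way to sanity-check a dump like this is to confirm that the write-back variable MTRRs cover all RAM below the first top-of-memory value (c001001a, here 0xe0000000). The following Python sketch does that under several assumptions of mine: the function name is hypothetical, masks are contiguous, there are 48 physical address bits, and the fixed MTRRs (which apply to the first MB when enabled) are ignored. It says nothing about how the memory above 4GB is typed, since that is not visible in the dump.

```python
# Find ranges below top_mem that no enabled WB variable MTRR covers.
# Hypothetical helper; ranges are half-open (start, end).

def wb_gaps_below(top_mem, pairs, addr_bits=48):
    addr_mask = ((1 << addr_bits) - 1) & ~0xFFF
    ranges = []
    for base, mask in pairs:
        enabled = mask & (1 << 11)
        if not enabled or (base & 0xFF) != 6:  # 6 = write-back
            continue
        m = mask & addr_mask
        start = base & addr_mask
        ranges.append((start, start + (m & -m)))

    gaps, cursor = [], 0
    for start, end in sorted(ranges):
        if start >= top_mem:
            break
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < top_mem:
        gaps.append((cursor, top_mem))
    return gaps


if __name__ == "__main__":
    # Variable MTRR pairs and top-of-memory value copied from the dump.
    pairs = [
        (0x00000000FFF00006, 0x0000FFFFFFF80800),
        (0x0000000000000006, 0x0000FFFF80000800),
        (0x0000000080000006, 0x0000FFFFC0000800),
        (0x00000000C0000006, 0x0000FFFFE0000800),
    ]
    for start, end in wb_gaps_below(0xE0000000, pairs):
        print(f"uncovered: {start:#x}-{end - 1:#x}")
```

For the values in this dump the function returns an empty list, so RAM below 0xe0000000 is covered by the variable MTRRs.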
Another experiment I tried was to replace memset() with the original assembler version of clear_memory(). With this change, the "Clearing initial memory region:" step takes a fraction of a second vs. minutes with memset(). Then things grind to a halt after "Stage: loading fallback/coreboot_ram @ ...".
My theory is that since clear_memory was a single rep instruction, the fact that it wasn't being cached wasn't a big deal. With the caches set correctly, memset was faster on my board.
Good luck, Myles