Hello,
I find with my RS780/SB700 board (ECS A780GM-M3), the initial family 10h AP launch is not reliable. I often see a hang when all the cores are running in parallel. The parallel execution is apparent if serial logging is enabled, because the characters logged by each core are interleaved. I noticed the AP cores do a few CF8/CFC PCI config writes. Could simultaneous access to this shared index/data pair be responsible for the hang? Concerns about such conflicts are why the AMD AGESA code does not run cores on the same die in parallel.
Thanks, Scott
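To make the suspected race concrete, here is a toy model in C. It is an illustration only, not real hardware access: a single variable stands in for the shared index register at port 0xCF8, and a small array of made-up values stands in for the registers read back through the data window at 0xCFC. The function names are hypothetical. Because a config access takes two separate steps (select the register, then touch the data port), another core can overwrite the shared index in between.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not real port I/O: one index register shared by every
 * core (stands in for 0xCF8) and four fake dword registers that the
 * data window (stands in for 0xCFC) reads back. Values are made up. */
static uint32_t shared_cf8;
static const uint32_t fake_regs[4] = {0x11111111u, 0x22222222u,
                                      0x33333333u, 0x44444444u};

/* Step 1 of a config access: select a dword-aligned register offset. */
static void write_cf8(uint32_t reg) {
    assert(reg < 16 && (reg & 3) == 0);
    shared_cf8 = 0x80000000u | reg;   /* enable bit plus register offset */
}

/* Step 2: read whatever register the shared index currently selects. */
static uint32_t read_cfc(void) {
    return fake_regs[(shared_cf8 & 0xFCu) >> 2];
}

/* Core A runs both steps with nothing in between. */
static uint32_t core_a_read_alone(uint32_t a_reg) {
    write_cf8(a_reg);
    return read_cfc();
}

/* Core B writes its own index between core A's two steps, so core A
 * ends up reading core B's register instead of its own. */
static uint32_t core_a_read_with_interleave(uint32_t a_reg, uint32_t b_reg) {
    write_cf8(a_reg);   /* core A selects its register */
    write_cf8(b_reg);   /* core B overwrites the shared index */
    return read_cfc();  /* core A reads through the wrong index */
}
```

A config write is affected the same way: core A's data would land in the register core B selected. Serializing the cores (as AGESA does) or avoiding the shared index altogether both remove this window.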
Scott Duplichan wrote:
Could simultaneous access to this shared index/data pair be responsible for the hang?
Sure thing.
Concerns about such conflicts are why the AMD AGESA code does not run cores on the same die in parallel.
There has been talk about changing the parallelized part of coreboot.
//Peter
On 12.10.2010 06:08, Scott Duplichan wrote:
This probably happens in the code that determines the CPUID, right? I worked around this by re-reading until the CPUID is != 0xffffffff (an invalid value, but the one a failed read returns). It's a hack, but it worked for me.
Patrick
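The retry workaround described above can be sketched as follows. This is a hypothetical illustration, not Patrick's actual code: the read is passed in as a function pointer so it can be tested without hardware, `read_until_valid` and the `flaky` test double are invented names, and the `max_tries` bound is an addition (the original description just reads until the value is valid).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook standing in for a CF8/CFC config read that may
 * return all ones when another core disturbs the shared index. */
typedef uint32_t (*config_read_fn)(void *ctx);

/* Re-read until the value is not 0xffffffff, giving up after
 * max_tries attempts. Returns 1 and stores the value in *out on
 * success, or 0 if every attempt returned all ones. */
static int read_until_valid(config_read_fn read, void *ctx,
                            unsigned max_tries, uint32_t *out) {
    assert(max_tries > 0);
    for (unsigned i = 0; i < max_tries; i++) {
        uint32_t v = read(ctx);
        if (v != 0xFFFFFFFFu) {
            *out = v;
            return 1;
        }
    }
    return 0;
}

/* Test double: returns all ones a fixed number of times, then a value. */
struct flaky {
    unsigned failures_left;
    uint32_t value;
};

static uint32_t flaky_read(void *ctx) {
    struct flaky *f = ctx;
    if (f->failures_left > 0) {
        f->failures_left--;
        return 0xFFFFFFFFu;
    }
    return f->value;
}
```

This only catches reads that come back as all ones. If another core redirects the index to a different, valid register, the read returns a plausible but wrong value and the loop accepts it, which is one reason it remains a hack rather than a fix for the race itself.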
"Scott Duplichan" scott@notabs.org writes:
This is actually the main reason I fixed MMCONFIG for AMD boards, since that avoids the cf8/cfc race. I would suggest enabling that, but you might have to change the default address range. I believe there's a BAR that's assigned a temporary address by the RS780 code that would clash.
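Why MMCONFIG avoids the race can be seen from how each mechanism forms an address. The sketch below uses the standard CF8 index layout and the PCIe memory-mapped (ECAM) layout. The base address 0xE0000000 used in the example is an assumption, not necessarily the coreboot default, and bus 0, device 0x18, function 3 is simply the family 10h node 0 northbridge used as a sample target. With MMCONFIG every register has its own memory address, so each access is a single load or store with no shared index for another core to overwrite.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy mechanism: the dword written to port 0xCF8 before touching
 * 0xCFC. Bit 31 enable, bits 23:16 bus, 15:11 device, 10:8 function,
 * 7:2 register. Only the first 256 bytes of each function are reachable. */
static uint32_t cf8_address(uint32_t bus, uint32_t dev, uint32_t fn, uint32_t reg) {
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}

/* MMCONFIG (ECAM): each function gets its own 4 KiB window at
 * base + bus * 1 MiB + dev * 32 KiB + fn * 4 KiB. */
static uint64_t mmconfig_address(uint64_t base, uint32_t bus, uint32_t dev,
                                 uint32_t fn, uint32_t reg) {
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + reg;
}
```

For example, register 0xE8 of bus 0, device 0x18, function 3 is selected with index 0x8000C3E8 through CF8, but lives at 0xE00C30E8 with the assumed MMCONFIG base. A window covering all 256 buses spans 256 MiB, which is why it has to be placed where no BAR (such as the temporary one the RS780 code assigns) overlaps it.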
On Tue, Oct 12, 2010 at 1:54 AM, Arne Georg Gleditsch arne.gleditsch@numascale.com wrote:
Scott,
As Arne said, make sure you have MMCONFIG enabled. I think it should be on by default for all fam10 now?
Marc
]From: coreboot-bounces@coreboot.org [mailto:coreboot-bounces@coreboot.org] On Behalf Of Arne Georg Gleditsch
Hello Arne,
Thanks. Enabling MMIO config space access makes a huge improvement. I already had everything else ready; all I had to do was enable CONFIG_MMCONF_SUPPORT_DEFAULT. Now the problem appears to be solved. Great work on this feature.
Thanks, Scott