Hi Marc,
sorry for the new load of questions. I'm trying to fully understand our AMD64 init sequence in the hope of adapting v3 to it. Multicore/multiprocessor support in particular seems to be lacking a bit in v3.
While reading and comparing the various BKDGs, I found an inconsistency. All BKDGs say the following: MSR0000_001B APIC Base Address Register (APIC_BAR) Bit 8 BSP/BSC is read-write. The AMD64 Architecture Programmer's Manual Volume 2 says the following: Bit 8 BSC is read-only. Which one is correct and does changing that bit even make sense?
Family 10h BKDG, chapter 2.5 Processor State Transition Sequences is empty.
Family 10h and 11h BKDGs suggest in chapter 2.3.3 "Using L2 Cache as General Storage During Boot" that it is possible to set CAR to WB-DRAM, fill it with code, then change it to WP-IO and execute code from it. Is that right? If so, is there a similar ability in older AMD processors?
In the same chapter, the BKDGs also say it is possible to flush CAR contents to RAM during CAR disabling. "If DRAM is initialized and there is data in the cache that needs to get moved to main memory, CLFLUSH or WBINVD may be used instead of INVD". Older BKDGs are a bit fuzzy on this. Will this work on K8 and later?
For additional bringup verification, we could follow the suggestion of "Performance monitor event EventSelect 07Fh [L2 Fill/Writeback], sub-event bit 1, titled 'L2 Writebacks to system', can be used to indicate whether L2 dirty data was victimized and sent to the disabled memory controller." I don't see any code in the tree performing this check.
General initialization: I couldn't find the expected BSP/AP CAR/DRAM sequence. My current understanding is this:
1. Power-on
2. BSP (BSC) running, all APs halted
3. CAR enabled on BSP
4. HT link setup from BSP
5. DRAM on all cores enabled from BSP (did that change with Fam 10h?)
6. CAR disabled on BSP
7. BSP signals all APs to start
8. APs start execution at the reset vector
9. APs enable CAR
10. ???
Depending on the real sequence, my question is how to pass around data from the APs to the BSP, preferably by letting the APs write to BSP memory directly.
Locking between cores/processors. Is there a way to atomically access a common memory location/register from all active cores, even if some of those cores are still in CAR? Will this work in MP systems?
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
Hi Marc,
sorry for the new load of questions. I'm trying to fully understand our AMD64 init sequence in the hope of adapting v3 to it. Multicore/multiprocessor support in particular seems to be lacking a bit in v3.
While reading and comparing the various BKDGs, I found an inconsistency. All BKDGs say the following: MSR0000_001B APIC Base Address Register (APIC_BAR) Bit 8 BSP/BSC is read-write. The AMD64 Architecture Programmer's Manual Volume 2 says the following: Bit 8 BSC is read-only. Which one is correct and does changing that bit even make sense?
It should be read-only. The BSP should set it once it figures out it is the BSP.
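For reference, a minimal sketch of how code could test that bit once the MSR value has been read. The helper name and macro names are made up for illustration and are not taken from the tree; in firmware the value would come from an rdmsr of MSR0000_001B, which is left out here so the snippet stays plain C.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BAR_MSR     0x0000001Bu  /* MSR0000_001B, APIC Base Address Register */
#define APIC_BAR_BSC     (1ull << 8)  /* boot strap core flag discussed above */
#define APIC_BAR_ENABLE  (1ull << 11) /* APIC enable bit in the same register */

/* Hypothetical helper: true if the raw APIC_BAR value marks this core as the BSC. */
static bool apic_bar_is_bsc(uint64_t apic_bar)
{
    return (apic_bar & APIC_BAR_BSC) != 0;
}
```

With a raw value of 0xFEE00900 (base 0xFEE00000, APIC enabled, BSC set) the helper returns true; with 0xFEE00800 it returns false.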
Family 10h BKDG, chapter 2.5 Processor State Transition Sequences is empty.
Family 10h and 11h BKDGs suggest in chapter 2.3.3 "Using L2 Cache as General Storage During Boot" that it is possible to set CAR to WB-DRAM, fill it with code, then change it to WP-IO and execute code from it. Is that right? If so, is there a similar ability in older AMD processors?
You don't need to change the state. The Fam10 code works this way. The memory eye finding requires it (or it would never finish).
In the same chapter, the BKDGs also say it is possible to flush CAR contents to RAM during CAR disabling. "If DRAM is initialized and there is data in the cache that needs to get moved to main memory, CLFLUSH or WBINVD may be used instead of INVD". Older BKDGs are a bit fuzzy on this. Will this work on K8 and later?
I think that a WBINVD should always work. This is one reason I am not sure why memory is copied in the CAR code.
For additional bringup verification, we could follow the suggestion of "Performance monitor event EventSelect 07Fh [L2 Fill/Writeback], sub-event bit 1, titled 'L2 Writebacks to system', can be used to indicate whether L2 dirty data was victimized and sent to the disabled memory controller." I don't see any code in the tree performing this check.
I think you are correct.
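If someone wants to add the check, here is a rough sketch of the event-select value for the event quoted above (EventSelect 07Fh, sub-event bit 1). It assumes the usual PerfEvtSel layout (event select in bits 7:0, unit mask in bits 15:8, USR/OS in bits 16/17, enable in bit 22) and the PerfEvtSel0/PerfCtr0 MSR numbers; please verify both against the BKDG for the family in question. The function names are hypothetical.

```c
#include <stdint.h>

#define PERF_EVT_SEL0_MSR  0xC0010000u  /* assumed: PerfEvtSel0 */
#define PERF_CTR0_MSR      0xC0010004u  /* assumed: PerfCtr0 */

#define PERF_SEL_USR  (1ull << 16)  /* count in user mode */
#define PERF_SEL_OS   (1ull << 17)  /* count in OS/firmware mode */
#define PERF_SEL_EN   (1ull << 22)  /* enable the counter */

/* Hypothetical helper: build an event-select value from event number and unit mask. */
static uint64_t perf_evt_sel(uint8_t event_select, uint8_t unit_mask)
{
    uint64_t v = event_select;            /* bits 7:0 */
    v |= (uint64_t)unit_mask << 8;        /* bits 15:8 */
    v |= PERF_SEL_USR | PERF_SEL_OS | PERF_SEL_EN;
    return v;
}

/* EventSelect 07Fh [L2 Fill/Writeback], sub-event bit 1: "L2 Writebacks to system". */
static uint64_t l2_writebacks_to_system_sel(void)
{
    return perf_evt_sel(0x7F, 1u << 1);
}
```

The idea would be to write this value to the event-select MSR before enabling CAR and read the counter back before CAR is torn down. Going by the BKDG text quoted above, a nonzero count would mean dirty L2 data was victimized.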
General initialization: I couldn't find the expected BSP/AP CAR/DRAM sequence. My current understanding is this:
1. Power-on
2. BSP (BSC) running, all APs halted
3. CAR enabled on BSP
4. HT link setup from BSP
5. DRAM on all cores enabled from BSP (did that change with Fam 10h?)
6. CAR disabled on BSP
7. BSP signals all APs to start
8. APs start execution at the reset vector
9. APs enable CAR
10. ???
Depending on the real sequence, my question is how to pass around data from the APs to the BSP, preferably by letting the APs write to BSP memory directly.
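The actual order is 1, 2, 3, 4, 7, 8, 9, 5, 6. That is, the APs are started and enable CAR before DRAM is enabled and CAR is disabled on the BSP.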
See the serengeti-cheetah-fam10 cache_as_ram_auto.c. There are comments about the AP calls there.
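To make the reordering easier to read, here is the same sequence spelled out as a small table in C. This is only a restatement of the step numbers above, not code from cache_as_ram_auto.c; the array and function names are invented for illustration.

```c
#include <stddef.h>

/* Steps as numbered in the question (index 0 holds step 1). */
static const char *const boot_steps[] = {
    "Power-on",
    "BSP (BSC) running, all APs halted",
    "CAR enabled on BSP",
    "HT link setup from BSP",
    "DRAM on all cores enabled from BSP",
    "CAR disabled on BSP",
    "BSP signals all APs to start",
    "APs start execution at the reset vector",
    "APs enable CAR",
};

/* Order given in the reply: 1,2,3,4,7,8,9,5,6. */
static const int actual_order[] = { 1, 2, 3, 4, 7, 8, 9, 5, 6 };

/* Hypothetical helper: description of the step executed at a 0-based position. */
static const char *actual_step(size_t position)
{
    return boot_steps[actual_order[position] - 1];
}
```

For example, actual_step(4) is "BSP signals all APs to start" and actual_step(7) is "DRAM on all cores enabled from BSP".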
Locking between cores/processors. Is there a way to atomically access a common memory location/register from all active cores, even if some of those cores are still in CAR? Will this work in MP systems?
This is already done. See the sysinfo structure sharing in CAR. Note that it is only read from the APs.
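A generic sketch of that single-writer pattern, written with C11 atomics: the BSP fills in the data and then sets a flag, and the APs only read. This is not the sysinfo code from the tree. The struct, its fields, and the function names are all hypothetical, and whether plain loads and stores like these are visible across cores while they are still in CAR depends on hardware behavior that this sketch does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared block; NOT the real sysinfo layout. */
struct shared_info {
    uint32_t node_count;
    uint32_t link_flags;
    atomic_bool ready;   /* set by the BSP once the fields above are valid */
};

/* BSP side: write the payload first, then publish it with a release store. */
static void bsp_publish(struct shared_info *s, uint32_t nodes, uint32_t flags)
{
    s->node_count = nodes;
    s->link_flags = flags;
    atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* AP side: read-only access; returns false until the BSP has published. */
static bool ap_try_read(struct shared_info *s, uint32_t *nodes, uint32_t *flags)
{
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return false;
    *nodes = s->node_count;
    *flags = s->link_flags;
    return true;
}
```

Because only one core ever writes, no lock is needed. Letting the APs write back as well would need a real atomic read-modify-write or a per-AP slot.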
Marc