Hi!
Looking at some of the changes proposed for the new Intel Sandy Bridge and Ivy Bridge support, together with design choices I made earlier while working on Intel Hyper-Threading support for NetBurst architectures and on SMP in general, I want to share my thoughts on the Coreboot stage layout.
Currently there are four stages: bootblock, romstage, ramstage and payload, in that order. I have identified a few issues that need work.
1. Built-in self-test (BIST) failures
On (Intel) SMP systems, only a BIST failure of the BSP CPU is detected and possibly reported. I think the architecture allows the BSP CPU to be a different physical core across power cycles. We should consistently either be redundant or die on any single CPU failure.
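To make the two consistent policies concrete, here is a minimal sketch in C. It assumes each CPU has stored the BIST result it found in EAX after reset (zero means the self-test passed) into an array. The names bist_policy and bist_usable_cpus are made up for illustration and are not existing Coreboot code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy names, not existing Coreboot options. */
enum bist_policy {
    BIST_HALT_ON_ANY_FAILURE, /* die on any single CPU failure */
    BIST_SKIP_FAILED_CPUS     /* be redundant: go on with the healthy CPUs */
};

/*
 * bist[i] is the value CPU i found in EAX after reset; zero means the
 * self-test passed. Returns the number of usable CPUs, or -1 if the
 * boot should stop.
 */
int bist_usable_cpus(const uint32_t *bist, size_t ncpus,
                     enum bist_policy policy)
{
    int usable = 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (bist[i] == 0)
            usable++;
        else if (policy == BIST_HALT_ON_ANY_FAILURE)
            return -1;
    }

    /* Even the redundant policy cannot continue with no healthy CPU. */
    return usable > 0 ? usable : -1;
}
```

The point is only that the same rule is applied to every CPU, whichever one happens to come up as BSP.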
2. Serial console
The serial console is initialized in romstage and requires a working cache. If the cache fails to work, due to a BIST failure or bad cache-as-ram init code, there is no console.
3. Microcode updates
The "tiny" bootblock doesn't seem like the correct place for microcode updates.
4. Cache coherency
MTRR setup should be consistent across all CPUs. If all CPUs are started for microcode updates before ramstage, they should fix their MTRRs too. Even then, pre-ram spinlocks may be impossible to implement, so pre-ram SMP operation is very, very restricted.
5. XIP alignment
If 4 variable MTRRs were used in the pre-ram execution environment for XIP, there would be no alignment requirement on the placement of the XIP romstage in Flash ROM. Such runtime MTRR setup code is around 512 bytes, and the cache footprint would extend at most 30% over the actual romstage size. By comparison, a single-MTRR setup may reserve almost twice the actual size of the romstage in both flash and cache memory.
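The arithmetic behind this can be sketched in C. A variable MTRR covers a power-of-two sized range that is aligned to its own size, so a single MTRR must span the next power of two above the romstage size. The second function counts how many such blocks an exact greedy cover of an arbitrarily placed region needs. Both function names are made up for illustration, and the exact cover is my simplification: it can need more than 4 registers, and it does not model the small overshoot that a 4-MTRR scheme would presumably accept.

```c
#include <stdint.h>

#define MTRR_MIN_SIZE 0x1000ULL /* variable MTRRs have 4 KiB granularity */

/* Smallest power of two >= size: what one naturally aligned MTRR spans. */
uint64_t single_mtrr_span(uint64_t size)
{
    uint64_t span = MTRR_MIN_SIZE;

    while (span < size)
        span <<= 1;
    return span;
}

/*
 * Count the naturally aligned power-of-two blocks needed to cover
 * [base, base + size) exactly. base and size are assumed to be
 * multiples of 4 KiB.
 */
unsigned int mtrr_blocks_needed(uint64_t base, uint64_t size)
{
    unsigned int count = 0;

    while (size > 0) {
        /* Largest block the alignment of base allows (lowest set bit). */
        uint64_t block = base ? (base & (~base + 1))
                              : single_mtrr_span(size);

        /* Shrink it until it also fits into what is left to cover. */
        while (block > size)
            block >>= 1;

        base += block;
        size -= block;
        count++;
    }
    return count;
}
```

For a 36 KiB romstage (0x9000), single_mtrr_span() gives 64 KiB, about 1.8 times the real size. The same 36 KiB placed at 0xFFFD7000 is covered exactly by two blocks, 4 KiB plus 32 KiB.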
6. Bypassing raminit
One may want to start a Coreboot conversion job with something less complex than raminit, such as setting up the PCI device tree. With the amount of cache on modern CPUs, one could probably run libpayload apps from cache. One useful app of this kind would be a zmodem download of raminit.
7. CPU max physical address
The MTRR physical mask should be set correctly during romstage too, in case memory above 4GB is tested. The code should first auto-detect the physical address width and then provide workarounds for CPU errata.
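As a sketch of what "set correctly" means here: the physical address width is reported by CPUID leaf 0x80000008 in EAX bits 7:0, and the mask written to a variable MTRR mask register has to cover exactly that many address bits. The helper below takes the width as a parameter so that an errata workaround could override the detected value. The function name is made up for illustration.

```c
#include <stdint.h>

/* Bit 11 of a variable MTRR mask register marks the range as valid. */
#define MTRR_PHYS_MASK_VALID (1ULL << 11)

/*
 * Build the mask value for a range of 'size' bytes (a power of two,
 * at least 4 KiB) on a CPU with 'phys_bits' physical address bits.
 */
uint64_t mtrr_phys_mask(uint64_t size, unsigned int phys_bits)
{
    uint64_t addr_mask = (phys_bits >= 64) ? ~0ULL
                                           : ((1ULL << phys_bits) - 1);

    return (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
}
```

For a 64 KiB range this gives 0xFFFFF0800 with 36 address bits and 0xFFFFFF0800 with 40, so a hard-coded 36-bit mask would be wrong on the wider CPU.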
Taking all of the above together, I would like to start a discussion on whether the current 4-stage model is the best design choice. As a fix, I am thinking about the following changes to the layout:
1. Bootblock
No real change. Must guarantee access to all of Flash ROM and operational PCI configuration cycles for the following stages. Contains the boot vector for any AP CPUs. Exits in protected mode to Stageloader.
2. Stageloader
A new stage. This has a pre-CAR environment (built with ROMCC) that enables the early serial console and controls the MTRR setup needed to enable cache-as-ram. The pre-CAR environment can execute stages from Flash ROM with XIP.
It also has a CAR and RAM environment (built with GCC) that can execute XIP stages from Flash ROM or decompress stages from Flash ROM to CAR/RAM.
3. CPU init
A new stage built with ROMCC. Checks BIST of AP CPUs, executes microcode updates and handles the issue of the shared Cache-Disable bit on Intel CPUs with Hyper-Threading.
4. RAM init
Old romstage built with GCC. Returns to Stageloader after DRAM is functional, but before any DRAM is written.
5. DEV init
Old ramstage built with GCC. The only change is that microcode updates and SMP setup are already taken care of.
6. Payload
No changes required.
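The proposed order can be summarized as a small table in C. This is only a restatement of the list above, not existing Coreboot code; the struct, enum and function names are made up, and the toolchain is left unspecified where the list does not name one.

```c
#include <stdbool.h>
#include <stddef.h>

enum toolchain { TC_UNSPECIFIED, TC_ROMCC, TC_GCC, TC_ROMCC_AND_GCC };

struct stage {
    const char *name;
    enum toolchain tc;
    bool is_new; /* true if the stage does not exist in the current layout */
};

/* Stages in proposed execution order. */
static const struct stage proposed_stages[] = {
    { "Bootblock",   TC_UNSPECIFIED,   false },
    { "Stageloader", TC_ROMCC_AND_GCC, true  },
    { "CPU init",    TC_ROMCC,         true  },
    { "RAM init",    TC_GCC,           false }, /* old romstage */
    { "DEV init",    TC_GCC,           false }, /* old ramstage */
    { "Payload",     TC_UNSPECIFIED,   false },
};

/* Number of stages the proposal adds to the current layout. */
size_t count_new_stages(void)
{
    size_t n = 0;

    for (size_t i = 0;
         i < sizeof(proposed_stages) / sizeof(proposed_stages[0]); i++) {
        if (proposed_stages[i].is_new)
            n++;
    }
    return n;
}
```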
I would be interested in working on some of these topics and I think I can also test most of the suggested changes on older SMP hardware.
Thanks,
Kyösti Mälkki kyosti.malkki@gmail.com