Hi!
Looking at some of the changes proposed for the new Intel Sandy Bridge
and Ivy Bridge support, together with the design choices I made earlier
for Intel Hyper-Threading on NetBurst architectures and for SMP in
general, prompted me to share my thoughts on the Coreboot stage layout.
Currently there are four stages: bootblock, romstage, ramstage and
payload, executed in that order. I have identified a few issues that
would need to be worked on.
1. Built-in self-test (BIST) failures
On (Intel) SMP systems, only a BIST failure of the BSP CPU is detected
and possibly reported. I think the architecture allows the BSP CPU to be
a different physical core across power cycles. The behaviour should be
consistent: either be redundant or die on any single CPU failure.
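
The two consistent policies could be sketched roughly as below. This is
only an illustration: apply_bist_policy and enum bist_policy are
hypothetical names, not existing Coreboot code, and the sketch assumes
each CPU has saved the BIST result it found in EAX at reset, where zero
means the self-test passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy selector, not an existing Coreboot option. */
enum bist_policy {
	BIST_HALT_ON_ANY_FAILURE,	/* die on a single CPU failure */
	BIST_SKIP_FAILED_CPUS		/* be redundant: boot with the rest */
};

/*
 * bist[i] is the value CPU i found in EAX at reset; zero means pass.
 * Returns the number of CPUs allowed to continue, or -1 if the boot
 * must halt (a failure under the strict policy, or no healthy CPU).
 */
static int apply_bist_policy(const uint32_t *bist, size_t ncpus,
			     enum bist_policy policy)
{
	int usable = 0;

	assert(bist != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		if (bist[i] == 0) {
			usable++;
			continue;
		}
		if (policy == BIST_HALT_ON_ANY_FAILURE)
			return -1;
	}
	return usable > 0 ? usable : -1;
}
```

Either policy treats BSP and AP failures the same way, which is the
point: the result does not depend on which core happened to become BSP.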
2. Serial console
This is initialized in romstage and depends on a working cache. If the
cache fails to work, due to a BIST failure or bad cache-as-ram init
code, there is no console.
3. Microcode updates
The "tiny" bootblock doesn't seem like the correct place for microcode
updates.
4. Cache coherency
MTRR setup should be consistent across all CPUs. If all CPUs are started
for microcode updates before ramstage, they should fix their MTRRs too.
Even then, pre-ram spinlocks may be impossible to implement, so pre-ram
SMP operation is very, very restricted.
5. XIP alignment
If 4 variable MTRRs were used for XIP in the pre-ram execution
environment, there would be no alignment requirement on the placement of
the XIP romstage in Flash ROM. Such runtime MTRR setup code is around
512 bytes, and the cache footprint would extend at most 30% over the
actual romstage size.
By contrast, a single MTRR setup may reserve almost twice the actual
size of a romstage in both flash and cache memory.
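
To illustrate the single MTRR case, here is a small sketch that computes
the size of the smallest power-of-two, naturally aligned region covering
a given romstage. The function name single_mtrr_cover and the example
addresses are made up for illustration; the 4 KiB minimum reflects the
granularity of variable MTRRs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size of the smallest naturally aligned power-of-two region that
 * contains [base, base + size). This is what one variable MTRR would
 * have to map. Hypothetical helper, not existing Coreboot code.
 */
static uint64_t single_mtrr_cover(uint64_t base, uint64_t size)
{
	uint64_t span = 4096;	/* variable MTRR granularity */
	uint64_t end;

	assert(size > 0);
	end = base + size - 1;

	/* Grow the span until first and last byte share one aligned block. */
	while ((base & ~(span - 1)) != (end & ~(span - 1)))
		span <<= 1;
	return span;
}
```

For example, a 68 KiB romstage placed at 0xFFFE0000 needs a 128 KiB
region, and an 8 KiB romstage that happens to straddle 0xFFFF0000 also
needs 128 KiB, which is why the single MTRR approach forces alignment.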
6. Bypassing raminit
One may want to start a Coreboot conversion job from something less
complex than raminit, like setting up the PCI device tree. With the
amount of cache on modern CPUs, one could probably run libpayload apps
from cache. One such nice app would be a zmodem download of raminit.
7. CPU max physical address
The MTRR physical mask should be set correctly during romstage too, just
in case memory over 4GB is tested. The code should first auto-detect the
physical address width and then provide work-arounds for CPU errata.
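
As a rough sketch of what "set correctly" means here: the mask written
to an IA32_MTRR_PHYSMASKn register depends on the number of physical
address bits the CPU implements. The helper name mtrr_phys_mask is
hypothetical. In real code phys_bits would be auto-detected (CPUID leaf
0x80000008, EAX bits 7:0, where that leaf exists) and then overridden
for errata; here it is simply passed in.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_PHYS_MASK_VALID	(1ULL << 11)	/* valid bit of PHYSMASKn */

/*
 * Build the PHYSMASK value for a power-of-two sized region.
 * phys_bits is assumed to come from auto-detection plus any errata
 * work-arounds. Hypothetical helper, not existing Coreboot code.
 */
static uint64_t mtrr_phys_mask(uint64_t size, unsigned int phys_bits)
{
	uint64_t addr_mask;

	/* Region size must be a power of two and at least 4 KiB. */
	assert(size >= 4096 && (size & (size - 1)) == 0);
	assert(phys_bits >= 32 && phys_bits <= 52);

	addr_mask = (1ULL << phys_bits) - 1;
	return (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
}
```

With a mask built for 32 bits on a CPU that implements 36 or more, the
upper mask bits are left clear, so addresses above 4GB can alias the
region, which is the case that matters if memory over 4GB is tested.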
Taking all of the above together, I would like to start a discussion on
whether the current 4 stage model is the best design choice. I am
thinking about the following changes in the layout as a fix:
1. Bootblock
No real change. Must guarantee access to all of Flash ROM and
operational PCI configuration cycles for following stages.
Contains boot vector for any AP CPUs.
Exits in protected mode to Stageloader.
2. Stageloader
A new stage. This has a pre-CAR environment (built with ROMCC) to enable
the early serial console and to control the MTRR setup that enables
cache-as-ram.
The pre-CAR environment can execute stages from Flash ROM with XIP.
It also has a CAR and RAM environment (built with GCC) that can execute
XIP stages from Flash ROM or decompress stages from Flash ROM to CAR/RAM.
3. CPU init
A new stage built with ROMCC. Checks BIST of AP CPUs, executes microcode
updates and handles the issue of shared Cache-Disable bit on
hyper-threading Intel CPUs.
4. RAM init
Old romstage built with GCC. Returns to Stageloader after DRAM is
functional, but before any DRAM is written.
5. DEV init
Old ramstage built with GCC. The only change is that microcode updates
and SMP setup are already taken care of.
6. Payload
No changes required.
I would be interested in working on some of these topics and I think I
can also test most of the suggested changes on older SMP hardware.
Thanks,
Kyösti Mälkki
<kyosti.malkki(a)gmail.com>