Arthur Heymans has uploaded this change for review. ( https://review.coreboot.org/c/coreboot/+/36126 )
Change subject: Documentation: Add cache as ram documentation
Documentation: Add cache as ram documentation
This adds documentation around coreboot's history of CAR usage. The history is reconstructed from articles and git/svn history, but should provide a good overview of the challenges and solutions coreboot came up with to deal with early code execution on X86 platforms.
The technical details about CAR will come in a follow-up patch and are marked as TODO for now.
Change-Id: I964d59a6bd210f3d4d06220d2ec7166110ab2cf3
Signed-off-by: Arthur Heymans <arthur@aheymans.xyz>
---
A Documentation/cpu/index.md
A Documentation/cpu/x86/car.md
M Documentation/index.md
3 files changed, 171 insertions(+), 0 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/26/36126/1
diff --git a/Documentation/cpu/index.md b/Documentation/cpu/index.md
new file mode 100644
index 0000000..d9f063f
--- /dev/null
+++ b/Documentation/cpu/index.md
@@ -0,0 +1,5 @@
+# CPU-specific documentation
+
+This section contains documentation about coreboot on specific CPUs.
+
+* [A gentle introduction to Cache-as-Ram on X86](x86/car.md)
diff --git a/Documentation/cpu/x86/car.md b/Documentation/cpu/x86/car.md
new file mode 100644
index 0000000..3d2983e
--- /dev/null
+++ b/Documentation/cpu/x86/car.md
@@ -0,0 +1,165 @@
+# A gentle introduction to Cache-as-Ram on X86
+
+This explains a bit of history on CAR in coreboot, how it works
+and then finally gives a more in-depth overview of the steps of
+how it is set up on Intel CPUs.
+
+## Glossary
+
+- Cache-as-RAM/CAR: Using the not memory mapped CPU cache as an
+  execution environment.
+- XIP: Execute in place on a memory mapped medium.
+- SRAM: Static Random Access Memory, here: memory mapped memory
+  that needs no initialization.
+- ROMCC: A C compiler using only the CPU registers.
+
+## Cache-as-Ram, the basics
+
+When an X86 platform boots, it starts in a pretty bare environment.
+The first instruction is fetched from the top of the boot medium,
+which on most x86 platforms is memory mapped below 4G (in 32bit
+protected mode). Not only do they start in 16bit 'real' mode, but
+most of the hardware also starts uninitialized. This includes the dram
+controller. Another thing about X86 platforms is that they generally
+don't feature SRAM, static RAM. SRAM for the purpose of this text is
+a small region of RAM that is available without any initialization.
+Without RAM this leaves the system with just CPU registers and a
+memory mapped boot medium from which code can be executed in place (XIP)
+to initialize the hardware.
+
+For simple dram controllers that environment is just fine and
+initializing the dram controller in assembly code without using a
+stack is very doable. Modern systems however require complex
+initialization procedures and writing that in assembly would be either
+too hard or simply impossible. To overcome this limitation a technique
+called Cache-as-Ram exists. It allows using the CPU's cache as RAM.
+This in turn makes it possible to write the early initialization code
+in a higher level programming language compiled by a regular compiler,
+e.g. GCC. Typically only the stack is placed in Cache-as-Ram and the
+code is still executed in place, but copying code into Cache-as-Ram
+and executing from there is also possible.
+
+For coreboot this means that with only a very small amount of assembly
+code to set up CAR, all the other code can be written in C, which
+compared to writing assembly tremendously increases development speed
+and is much less error prone.
+
+## A bit of history on Cache-as-Ram in coreboot
+
+### Linux as a BIOS? The invention of romstage
+
+When LinuxBIOS, later coreboot, started out, the first naive attempt
+to run the Linux kernel instead of the legacy BIOS interface was to
+jump straight to loading the kernel. This did not work at all, since
+dram was not working at that point. LinuxBIOS v1 therefore
+initialized the memory in assembly before it was able to load the
+kernel. This executed in place and would later become romstage. It
+turned out that Linux also needed devices to be enumerated and to have
+their resources allocated, so yet another stage, ramstage, was called
+into life, but that's another story.
+
+### coreboot v2: romcc
+
+In 2003 AMD's K8 CPUs came out. Those CPUs featured an on die memory
+controller and also a new point to point BUS for CPU <-> CPU and
+CPU <-> IO controller (northbridge) devices. Writing support for
+these in assembly became difficult and a better solution was needed.
+This is where ROMCC came to the rescue in 2002. ROMCC is a 25K lines
+of code C compiler, written by Eric Biederman, that does not use a
+stack but uses CPU registers instead. Besides the 8 general purpose
+registers on x86 the MMX and SSE registers, respectively _%mm0_-_%mm7_
+and _%xmm0_-_%xmm7_, can be used to store data. ROMCC also converts
+all function calls to static inline functions. This has the
+disadvantage that a ROMCC compiled stage must be compiled in one go;
+linking is not possible. In practice you will see a lot of '#include
+_some_file.c_' which is quite unusual in C, because it makes the
+inclusion of files fragile. Given the limited amount of
+'register-stack', it also imposes restrictions on how many levels of
+nested function calls can be achieved. One last issue with ROMCC is that
+no one besides its original author really dared to touch that code,
+due to its size and complexity.
+
+coreboot used ROMCC in the following way: a bootblock written in
+assembly to enter 32bit protected mode would use ROMCC compiled code
+to jump to either 'normal/romstage' or 'fallback/romstage', which in
+turn was also compiled with ROMCC.
+
+### Cache-as-Ram
+
+While ROMCC was eventually used for early initialization, prior to
+that development, attempts were made to use the CPU's cache as RAM.
+This would allow regular GCC compiled and linked code to
+be used. The idea is to set up the CPU's cache to Write Back to a
+region, while disabling cacheline filling or simply hoping that no cache
+eviction or invalidation happens. Eric Biederman initially got CAR
+working, but with the advent of Intel Hyperthreading on the Pentium 4,
+CAR setup proved to be a little more difficult and he
+developed ROMCC instead.
+
+Around 2006 a lot had been learned from coreboot v2 with its ROMCC usage
+and development on coreboot v3 was started. The decision was made to
+not use ROMCC anymore and use Cache Memory for early stages. The
+results were good as init object code size was reduced by a factor of
+four. So the code was smaller and faster at the same time.
+
+coreboot v4 sort of merged v2 and v3, so it was a mix of ROMCC compiled
+romstage and romstage with CAR.
+
+In 2014 support for ROMCC compiled romstage was dropped.
+
+### Intel ApolloLake, a new era: POSTCAR_STAGE and C_ENVIRONMENT_BOOTBLOCK
+
+In 2016 Intel released the ApolloLake architecture. This architecture
+is a weird duck in the X86 pool. It is the first Intel CPU to feature
+MMC as a boot medium and does not memory map it. It also features SRAM,
+read only to the main CPU, that is mapped right below 4G. The
+bootblock is copied to SRAM and executed. Given that the boot medium
+is not always memory mapped, XIP is not an option. The solution is to
+set up CAR in the bootblock and copy the romstage into CAR. We call this
+C_ENVIRONMENT_BOOTBLOCK, because it runs GCC compiled code. Granted,
+XIP is still possible on ApolloLake, but coreboot has to use Intel's
+blob, FSP-M, and it is linked to run in CAR, so a blob actually forced
+a nice feature in coreboot!
+
+Another issue arises with this setup. With romstage running from the
+read only boot medium, you can continue executing code (albeit without
+a stack) to tear down the CAR and start executing code from 'real' RAM.
+On ApolloLake with romstage executing from CAR, tearing down CAR in
+there would be shooting ourselves in the foot, as our code would be gone
+in doing so. The solution was to tear down CAR in a separate stage,
+named the 'postcar' stage. This provides a clean separation of programs
+and results in needing fewer linker script hacks in romstage. This
+solution was therefore adopted by many other platforms that still did
+XIP.
+
+### AMD Zen, 2019
+
+On AMD Zen the first CPU to come out of reset is the PSP and it
+initializes the dram before pulling the main CPU out of reset.
+CAR won't be needed, nor used, on this platform.
+
+### The future?
+
+For coreboot release 4.11, scheduled for October 2019, support for
+ROMCC in the bootblock will be dropped as will support for the messy
+tearing down of CAR in romstage as opposed to doing that in a separate
+stage.
+
+## In depth analysis
+
+### CAR setup
+
+TODO: MTRR, IPI/hyperthreaded, ROM Caching, Non-evict
+
+### Linker symbols
+
+TODO: sharing things over multiple CAR stages: car.ld
+
+### CAR teardown
+
+TODO: postcarstage, NO_CAR_GLOBAL_MIGRATION
+
+## References
+- [2007, The Open Source BIOS is Ten An interview with the coreboot developers](http://www.h-online.com/open/features/The-Open-Source-BIOS-is-Ten-An-intervi...)
+- [A Framework for Using Processor Cache-as-Ram (CAR)](https://www.coreboot.org/images/6/6c/LBCar.pdf)
+- [CAR: Using Cache-as-Ram in LinuxBIOS](https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006.pdf)
diff --git a/Documentation/index.md b/Documentation/index.md
index 1c04ad3..85666cb 100644
--- a/Documentation/index.md
+++ b/Documentation/index.md
@@ -177,6 +177,7 @@
 * [Display panel-specific documentation](gfx/display-panel.md)
 * [Architecture-specific documentation](arch/index.md)
 * [Platform independend drivers documentation](drivers/index.md)
+* [CPU-specific documentation](cpu/index.md)
 * [Northbridge-specific documentation](northbridge/index.md)
 * [System on Chip-specific documentation](soc/index.md)
 * [Mainboard-specific documentation](mainboard/index.md)