Attention is currently required from: Eran Mitrani, Jakub Czapiga, Kapil Porwal, Nico Huber, Paul Menzel, Rishika Raj, Subrata Banik, Tarun.
Julius Werner has posted comments on this change by Rishika Raj. ( https://review.coreboot.org/c/coreboot/+/83540?usp=email )
Change subject: soc/intel/mtl: Increase CAR_STACK_SIZE by 31KB for coreboot compatibility ......................................................................
Patch Set 8:
(1 comment)
Commit Message:
https://review.coreboot.org/c/coreboot/+/83540/comment/637703b9_ed577ab1?usp... : PS5, Line 7: soc/intel/meteorlake: Increase CAR STACK_SIZE by 31KB to meet coreboot requirements
Thanks, that's much clearer. I'm much more used to know how much stack we need. […]
I'm not sure how much static analysis can do there because coreboot does have a bunch of code paths that are affected by runtime values (and a few recursive functions), so I'm not sure if it's really possible to reliably determine stack use at build time. If you know some solution that can do that it would be great if you could integrate it into the CI, of course. We do use `-Wstack-usage=1536` on non-x86 boards and maybe we should do something like that on x86 as well, but I believe that only checks each function on its own.
There is code in `romstage_main()` to place and check stack canaries and I assume that it should fire if we have an overflow like we're suspecting, but it just writes a BIOS_DEBUG to the console and nothing else. My suspicion is that the FSP may only get into the path that really makes it use its whole 512K in very rare and unreproducible circumstances (e.g. related to memory training), and we've never managed to capture the boot that actually fails in our case on a UART. All we are seeing is corrupted vboot data structures persisted in the TPM that look like they would have been created by coreboot trying to write back vboot state from memory that has been overwritten with a bunch of 0xff.
I am not against moving the UPD and other big things off the stack either if someone wants to do that. (I've also been thinking about doing some other things to harden against stack overflows, e.g. move the stack to the end of the CAR space and let it grow downwards into unallocated space, but I likely won't have time to finish that work any time soon unfortunately.) For now, we're just trying to do the most simple thing that should reasonably ensure we're not going to have a problem on the Meteorlake platform.