Attention is currently required from: Jason Glenesk, Anjaneya "Reddy" Chagam, Raul Rangel, Marshall Dawson, Jonathan Zhang, Johnny Lin, Morgan Jang, Kyösti Mälkki, Aaron Durbin, Patrick Rudolph, Felix Held. Arthur Heymans has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/54301 )
Change subject: [WIP]arch/x86: Tear down CAR in ramstage
Patch Set 2:
(1 comment)
Commit Message:
https://review.coreboot.org/c/coreboot/+/54301/comment/c57a9ca7_57d2fa58
PS2, Line 15: likely to be slow.
Yes, among many other things. Cache teardown in general will have the same problems as all the old implementations; this time they just sit in ramstage.
https://review.coreboot.org/q/topic:%22WIP_wb_cache_postcar%22+(status:open%...) This topic sets up caching for cbmem and uses clflush before jumping to postcar. It seems to work well and improves boot speed slightly, mostly due to compressing postcar.
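For readers less familiar with how "setting up caching for cbmem" is usually done on x86: a DRAM range is typically made write-back cacheable with a variable MTRR pair. The sketch below only computes the IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn values for a range; it does not write any MSRs (that is privileged), it is not taken from the linked topic, and the function and struct names are hypothetical. coreboot has its own MTRR helpers, which this does not reproduce.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory type encodings for the low byte of IA32_MTRR_PHYSBASEn. */
#define MTRR_TYPE_UC 0x00u
#define MTRR_TYPE_WB 0x06u
/* Valid bit (bit 11) in IA32_MTRR_PHYSMASKn. */
#define MTRR_PHYS_MASK_VALID (1ull << 11)

/* Hypothetical container for one variable-MTRR register pair. */
struct var_mtrr {
	uint64_t base; /* value for IA32_MTRR_PHYSBASEn */
	uint64_t mask; /* value for IA32_MTRR_PHYSMASKn */
};

/*
 * Compute a variable-MTRR pair covering [base, base + size).
 * A single variable MTRR requires size to be a power of two of at least
 * 4 KiB and base to be aligned to size. phys_bits is the CPU's MAXPHYADDR
 * (CPUID leaf 0x80000008, EAX[7:0]); here it is simply a parameter.
 * Returns false if the range cannot be described by one MTRR.
 */
static bool var_mtrr_for_range(uint64_t base, uint64_t size,
			       unsigned int phys_bits, uint8_t type,
			       struct var_mtrr *out)
{
	if (phys_bits == 0 || phys_bits > 52)
		return false;
	if (size < 4096 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
		return false;

	uint64_t addr_mask = (1ull << phys_bits) - 1;

	/* Range must fit below 2^phys_bits (written to avoid overflow). */
	if (size - 1 > addr_mask || base > addr_mask - (size - 1))
		return false;

	out->base = base | type;
	out->mask = (~(size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
	return true;
}
```

For example, a 16 MiB cbmem region at 0x7f000000 on a CPU with 39 physical address bits yields base 0x7f000006 (write-back) and mask 0x7fff000800. Ranges that are not power-of-two sized or aligned would need several MTRRs, which this sketch simply rejects.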
Just so the background is captured: we used to tear down cache-as-RAM in romstage and then return to C code to load ramstage. Variable juggling is hard there because much of the code expects its state to be maintained. That's why we added the relocatable CAR variable machinery; it was complicated, but necessary to reuse code written for environments that had SRAM whose contents were maintained, or code targeting the ramstage environment.
So we added postcar to provide a clean boundary between romstage and ramstage, and to contain the semantics of the cache-as-RAM backing store disappearing. postcar provides a smallish environment that can be loaded into uncached DRAM, clean up cache-as-RAM, enable caching for DRAM, and then start running all the fancy C code we have. (It seems ROM space is so tight that one can't afford even that; it would be good to see the numbers. postcar can be compressed as well.)
I don't think the assumption that the code tearing down CAR has to be in uncached DRAM is necessary. The code certainly has to reach DRAM, and uncached DRAM is one way of achieving that. I experimented with enabling caching for DRAM before loading code (the postcar stage) into it, and then using CLFLUSH to make sure it reaches DRAM. It seems to work well and allows the stage to be decompressed efficiently. I don't have numbers yet, but I would expect this to also work well with a bigger LZMA-compressed ramstage.
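The "load into cached DRAM, then CLFLUSH" step described above boils down to flushing every cache line that covers the loaded stage before jumping to it. A minimal sketch, assuming a 64-byte cache line (real code could read the CLFLUSH line size from CPUID leaf 1, EBX[15:8], in 8-byte units) and using the SSE2 intrinsics `_mm_clflush` and `_mm_mfence`; the function names are hypothetical and this is not the code from the WIP topic. On non-SSE2 builds the flush compiles to a no-op so the arithmetic can still be exercised.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence */
#endif

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Number of cache lines that overlap [base, base + size). */
static size_t lines_to_flush(uintptr_t base, size_t size)
{
	if (size == 0)
		return 0;
	uintptr_t start = base & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	uintptr_t end = base + size;
	return (end - start + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}

/*
 * Write back and invalidate every cache line covering a freshly loaded
 * (e.g. decompressed) stage, so its bytes are in DRAM before CAR goes away.
 */
static void flush_range(const void *base, size_t size)
{
	uintptr_t addr = (uintptr_t)base & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	size_t n = lines_to_flush((uintptr_t)base, size);

#if defined(__SSE2__)
	for (size_t i = 0; i < n; i++)
		_mm_clflush((const void *)(addr + i * CACHE_LINE_SIZE));
	/* Order the flushes before whatever comes next (e.g. the jump). */
	_mm_mfence();
#else
	/* Not an SSE2 x86 build: nothing to flush in this sketch. */
	(void)addr;
	(void)n;
#endif
}
```

Note that the range is rounded down to a line boundary at the start, so a stage that does not begin on a line boundary still has its first partial line flushed.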
Moving that stuff to ramstage inherently means you are loading a bigger footprint into uncacheable space. And one needs to ensure that cache-as-RAM teardown is done from assembly (which this CL does) and that nothing fancy runs until caching is set up.
Long story short, I think re-mixing things isn't the best direction, because this area is pretty darn complicated to get right. I think postcar gives us a pretty good abstraction that makes many things straightforward. I would be curious to see numbers on the pressure people are seeing with respect to ROM footprint, and whether there's anything else we can do to help reduce it.