Aaron Durbin has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34991 )
Change subject: arch/x86: Include stack_top region into postcar_frame ......................................................................
Patch Set 4:
Patch Set 4:
Patch Set 4:
Patch Set 4:
Patch Set 4:
This commit is in my opinion somehow tied to POSTCAR_STAGE=n developments and does not have to be merged until we have decisions about them. Let me know if I misunderstood something. If I see relevant +2 I will change my vote to -1.
No its not, as i have explained above about why this stack region also need to inform, irrespective of its bene postcar_stage=y or n. i guess we are lucky that APL didn't complain but looks like we are ignoring a range where actual data been stored.
@Aaron, do you think this is more specific to APL as you have implemented platform_segment_loaded() in apl/car.c ?
prog_segment_loaded() was introduced for that as well as some other archs. However, the semantics aren't super clear (my fault) if it's for coherency purposes or not. postcar loader (or even ramstage) are implicitly assuming the target address regions are coherent. This actually plays out in practice, currently, because the target DRAM regions are UC on x86. By definition there is no coherency issue since everything hits DRAM. If one wants to add WB caching to the target regions (all of cbmem in this case + tseg for stage cache) then now cache management routines are required. It gets more complicated when one is doing this while utilizing CAR (either stack/data or code) as it depends on CAR implementation and uarch. prog_segment_loaded() can be used for that purpose. It does get hairy pretty quickly because all the memory touched during load isn't explicitly tracked since there's an implicit assumption about the loading inherently being coherent.
Thanks Aaron for clearing our the assumption and implicit requirement of why prog_segment_loaded() been introduced and what we like to achieve.
Question: do you like to see this CL going in or you think its more of an exploratory work?
I don't think this hurts anything. However, I think we should verify the stage cache loading works as well. I think it's important we have solutions for all the cases -- not just partial ones.
Yes, stage cache loading is working fine right, we don't see any issue. I have run this entire patchsets on KBL, CML, ICL. Can you please let me know which one you like to check ?
stage cache can utilize cbmem or tseg. If those regions are marked as WB cacheable, it's the same problem as what this patch is trying to correct -- ensuring dirty lines make it to dram. However, loading from stage cache is also an issue.
For initial loading: stage_cache_add() is used to cache the contents of the loaded program. If this is done when WB caching attributes are covering the stage cache region (cbmem or tseg) then there are dirty lines there. Both ext_stage_cache and cbmem_stage_cache both just memcpy() into the region. There's nothing that ensures that region, if marked WB cacheable, hits dram.
The cbfs_prog_stage_load() and rmodule_stage_load() loading code have the requisite prog_segment_loaded() calls. That is why those are covered for the text, data, and bss segments.
For loading during resume: stage_cache_load_stage() is utilized which is a glorified memcpy() but in reverse from the initial loading. Same issues apply w.r.t. the target region being marked WB cacheable. However, instead of cbfs_prog_stage_load() and rmodule_stage_load() which have the necessary prog_segment_loaded() calls for the text, data, and bss there's nothing there. finalize_load() in postcar_loader handles the postcar stack frame but nothing else. Thus, we're exposed on the resume path as well.
Those are the cases I was referring to w.r.t. having partial solutions. None of these scenarios exists right now because we aren't adding WB cacheable regions to the MTRR solution utilized while operating with CAR. Writes are hitting DRAM directly which means everything is coherent. This CL doesn't fix existing use-cases, but it does make things more correct. However, as I noted above there are still issues lurking if one attempted to pursue making some of these target regions cacheable. postcar stage itself remains relatively small, and that's why we don't see as big of a penalty for loading it as opposed to trying to load ramstage (way bigger) from romstage directly. Once we get into postcar stage we're running out of ram with a fuller MTRR solutions and can speed up loading and running. The only little bit of execution from UC ram is when we tear down and initialize a new set of MTRRs.