Aaron Durbin has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/35187 )
Change subject: [NOTFORMERGE] soc/intel exit_car.S
Patch Set 2:
(1 comment)
https://review.coreboot.org/c/coreboot/+/35187/2/src/soc/intel/common/block/... File src/soc/intel/common/block/cpu/car/exit_car.S:
https://review.coreboot.org/c/coreboot/+/35187/2/src/soc/intel/common/block/... PS2, Line 48: /* Disable cache ??? */
Patrick Rudolph CB:34791 Aug 08 20:26
My tests on Sandy Bridge showed that MTRRs are only updated on CR0.CD=1 to CR0.CD=0 transitions. All AMD and early Intel CPUs follow this scheme and update MTRRs with the cache disabled. Is that transition no longer required starting with APL?
The above observation does not match my understanding of recent processors.
Not related to CD=1, but to DEF_MTRR.E=0 (the E flag of the MTRR default-type MSR):
E (MTRRs enabled) flag, bit 11 — MTRRs are enabled when set; all MTRRs are disabled when clear, and the UC memory type is applied to all of physical memory.
So once MTRRs are disabled, there won't be snooping in the caches.
There's also this line alluding to behavior that aligns with my understanding as I previously described it:
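As a minimal sketch of the E flag quoted above, the following C helpers work on a plain 64-bit copy of the MTRR default-type MSR value. The macro and function names are hypothetical (they are not coreboot's), and the actual rdmsr/wrmsr access is deliberately left out because it requires ring 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names, not taken from coreboot.
 * Bit positions per the SDM: bit 10 = FE (fixed MTRRs), bit 11 = E. */
#define MTRR_DEF_TYPE_FIX_EN (1ULL << 10)
#define MTRR_DEF_TYPE_EN     (1ULL << 11)

/* True when the E flag is set, i.e. MTRRs are enabled. */
static bool mtrrs_enabled(uint64_t def_type)
{
	return (def_type & MTRR_DEF_TYPE_EN) != 0;
}

/* Value to write back so that all MTRRs are disabled (E=0).
 * All other bits of the MSR value are preserved. */
static uint64_t mtrr_def_type_disable(uint64_t def_type)
{
	return def_type & ~MTRR_DEF_TYPE_EN;
}
```

With E=0 the quoted text says UC is applied to all of physical memory, so a caller would only clear the bit at a point where running uncached is acceptable.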
Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
But this one is a gem and aligns with my point about microarchitectural, implementation-defined behavior on older processors:
Some older IA-32 processors used the UC memory type when loading the PDPTEs. Some processors may use the UC memory type if CR0.CD = 1 or if the MTRRs are disabled. These behaviors are model-specific and not architectural.
There's a fuller description in 11.5.1 Cache Control Registers and Bits, but this is a snippet:
CD flag, bit 30 of control register CR0 — Controls caching of system memory locations (see Section 2.5, “Control Registers”). If the CD flag is clear, caching is enabled for the whole of system memory, but may be restricted for individual pages or regions of memory by other cache-control mechanisms. When the CD flag is set, caching is restricted in the processor’s caches (cache hierarchy) for the P6 and more recent processor families and prevented for the Pentium processor (see note below). With the CD flag set, however, the caches will still respond to snoop traffic. Caches should be explicitly flushed to insure memory coherency. For highest processor performance, both the CD and the NW flags in control register CR0 should be cleared. Table 11-5 shows the interaction of the CD and NW flags. The effect of setting the CD flag is somewhat different for processor families starting with P6 family than the Pentium processor (see Table 11-5). To insure memory coherency after the CD flag is set, the caches should be explicitly flushed (see Section 11.5.3, “Preventing Caching”). Setting the CD flag for the P6 and more recent processor families modify cache line fill and update behaviour. Also, setting the CD flag on these processors do not force strict ordering of memory accesses unless the MTRRs are disabled and/or all memory is referenced as uncached (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”).
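To make the CD/NW interaction concrete, here is a small sketch in C that classifies a CR0 value by its CD (bit 30) and NW (bit 29) flags and computes the value for no-fill cache mode (CD=1, NW=0). The enum, macro, and function names are assumptions for illustration only; the helpers do not touch CR0 itself, since that needs ring 0, and the mode descriptions are my summary of the SDM table referenced above.

```c
#include <stdint.h>

/* Illustrative names, not taken from coreboot. */
#define CR0_NW (1ULL << 29) /* Not write-through */
#define CR0_CD (1ULL << 30) /* Cache disable */

enum cr0_cache_mode {
	CR0_CACHE_NORMAL,       /* CD=0, NW=0: normal cache mode */
	CR0_CACHE_INVALID,      /* CD=0, NW=1: invalid setting, raises #GP */
	CR0_CACHE_NO_FILL,      /* CD=1, NW=0: no-fill cache mode */
	CR0_CACHE_NOT_COHERENT, /* CD=1, NW=1: memory coherency not maintained */
};

/* Classify a CR0 value by its CD and NW flags. */
static enum cr0_cache_mode cr0_cache_mode(uint64_t cr0)
{
	int cd = (cr0 & CR0_CD) != 0;
	int nw = (cr0 & CR0_NW) != 0;

	if (!cd)
		return nw ? CR0_CACHE_INVALID : CR0_CACHE_NORMAL;
	return nw ? CR0_CACHE_NOT_COHERENT : CR0_CACHE_NO_FILL;
}

/* CR0 value for no-fill cache mode: set CD, clear NW. */
static uint64_t cr0_enter_no_fill(uint64_t cr0)
{
	return (cr0 | CR0_CD) & ~CR0_NW;
}

/* CR0 value for normal caching: clear both CD and NW. */
static uint64_t cr0_enable_cache(uint64_t cr0)
{
	return cr0 & ~(CR0_CD | CR0_NW);
}
```

Per the quoted paragraph, setting CD does not flush anything by itself, so real code would follow the CR0 write with an explicit cache flush.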
From those commit comments, I did not find a reference to Intel docs describing NEM at a satisfactory level of detail. Instead I read about random lockups in ramstage (MP init) and a consistent delay being triggered inside the vboot payload after we had only added WB MTRRs in the postcar frame. With access to the BWG you can probably find the exact paragraphs showing that it is no longer required to toggle CR0.CD to correctly exit non-evict mode?
Perhaps. However, many, many paragraphs are just copied forward without necessarily relating to reality. That said, if it makes you feel comfortable, we can actually disable the cache and invalidate it. It's just not necessary as far as my recent memory and understanding of the implementation go.
The sequence for CONFIG_INTEL_CAR_NEM below is the same as for sandy/ivy, except that the implementation Google provided does set CR0.CD here first. Having peeked into some random FSP-M disassembly, CR0.CD was definitely getting set at several locations, but with a five-minute effort it's hard to see which code is on the TempRamExit() execution path. That's the closed-source CAR teardown implementation.