Subrata Banik has submitted this change. ( https://review.coreboot.org/c/coreboot/+/75511?usp=email )
Change subject: include/cpu/x86: Skip `wbinvd` on CPUs with cache self-snooping (SS)
include/cpu/x86: Skip `wbinvd` on CPUs with cache self-snooping (SS)
This patch backports earlier work from the Linux kernel (https://lore.kernel.org/all/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com/T/#u) that speeds up MTRR register programming in multi-processor systems by checking the CPUID self-snoop feature flag.
Refer to the details below:
Programming MTRR registers in multi-processor systems is a rather lengthy process as it involves flushing caches. As a result, the process may take a considerable amount of time. Furthermore, all processors must program these registers serially.
The `wbinvd` instruction writes all modified cache lines back to memory and then invalidates the caches. All logical processors are stopped from executing until the write-back and invalidate operation has completed.
The amount of time or cycles for WBINVD to complete will vary due to the size of different cache hierarchies and other factors. As a consequence, the use of the WBINVD instruction can have an impact on response time.
According to measurements, around 98% of the time needed to program MTRRs in multi-processor systems is spent flushing caches with wbinvd(). According to Section 11.11.8 of the Intel 64 and IA-32 Architectures Software Developer's Manual, it is not necessary to flush caches if the CPU supports cache self-snooping (ss):
"Flush all caches using the WBINVD instructions. Note on a processor that supports self-snooping, CPUID feature flag bit 27, this step is unnecessary."
Thus, skipping the cache flushes can reduce the time needed to program the MTRR registers by several tens of milliseconds:
Platform                          Before   After
12-core (14 Threads) MeteorLake   35ms     1ms
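The feature check quoted from the SDM can also be tried outside of firmware. The following is a minimal sketch, not the coreboot implementation: it assumes a GCC or Clang toolchain that provides `<cpuid.h>` and `__get_cpuid()`, and the names `SELF_SNOOP_BIT`, `edx_has_self_snoop()` and `host_has_self_snoop()` are hypothetical helpers introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper header, not a coreboot header */
#endif

/* Same bit position the patch adds as CPUID_FEATURE_SELF_SNOOP_BIT. */
#define SELF_SNOOP_BIT 27

/* Hypothetical helper: decode the SS flag from a CPUID leaf 1 EDX value. */
static bool edx_has_self_snoop(uint32_t edx)
{
	return (edx >> SELF_SNOOP_BIT) & 1;
}

/*
 * Hypothetical helper: query the running CPU. Returns false when CPUID
 * leaf 1 is unavailable or when built for a non-x86 target.
 */
static bool host_has_self_snoop(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return edx_has_self_snoop(edx);
#else
	return false;
#endif
}
```

Keeping the bit decoding separate from the CPUID query lets the decoding be checked with fixed EDX values on any host, while `host_has_self_snoop()` reports what the build machine actually supports.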
BUG=b:260455826
TEST=Able to build and boot google/rex.
Change-Id: I83cac2b1e1707bbb1bc1bba82cf3073984e9768f
Signed-off-by: Subrata Banik <subratabanik@google.com>
Reviewed-on: https://review.coreboot.org/c/coreboot/+/75511
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
Reviewed-by: Jérémy Compostella <jeremy.compostella@intel.com>
Reviewed-by: Lean Sheng Tan <sheng.tan@9elements.com>
Reviewed-by: Himanshu Sahdev <himanshu.sahdev@intel.com>
Reviewed-by: Tarun Tuli <taruntuli@google.com>
---
M src/include/cpu/x86/cache.h
1 file changed, 14 insertions(+), 1 deletion(-)
Approvals:
  Lean Sheng Tan: Looks good to me, approved
  build bot (Jenkins): Verified
  Jérémy Compostella: Looks good to me, but someone else must approve
  Himanshu Sahdev: Looks good to me, but someone else must approve
  Tarun Tuli: Looks good to me, approved
diff --git a/src/include/cpu/x86/cache.h b/src/include/cpu/x86/cache.h
index 4143d97..d4d9160 100644
--- a/src/include/cpu/x86/cache.h
+++ b/src/include/cpu/x86/cache.h
@@ -9,9 +9,11 @@
 #define CR0_NoWriteThrough (CR0_NW)

 #define CPUID_FEATURE_CLFLUSH_BIT 19
+#define CPUID_FEATURE_SELF_SNOOP_BIT 27

 #if !defined(__ASSEMBLER__)

+#include <arch/cpuid.h>
 #include <stdbool.h>
 #include <stddef.h>

@@ -51,6 +53,16 @@
 	write_cr0(cr0);
 }

+/*
+ * Cache flushing is the most time-consuming step when programming the MTRRs.
+ * However, if the processor supports cache self-snooping (ss), we can skip
+ * this step and save time.
+ */
+static __always_inline bool self_snooping_supported(void)
+{
+	return (cpuid_edx(1) >> CPUID_FEATURE_SELF_SNOOP_BIT) & 1;
+}
+
 static __always_inline void disable_cache(void)
 {
 	/* Disable and write back the cache */
@@ -58,7 +70,8 @@
 	cr0 = read_cr0();
 	cr0 |= CR0_CD;
 	write_cr0(cr0);
-	wbinvd();
+	if (!self_snooping_supported())
+		wbinvd();
 }

 #endif /* !__ASSEMBLER__ */