Subrata Banik has uploaded this change for review. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and car tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~2-4ms time savings in warm reboot case with this CL.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 17 insertions(+), 0 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/1
diff --git a/src/soc/intel/cannonlake/romstage/romstage.c b/src/soc/intel/cannonlake/romstage/romstage.c index 94b9899..04a9d53 100644 --- a/src/soc/intel/cannonlake/romstage/romstage.c +++ b/src/soc/intel/cannonlake/romstage/romstage.c @@ -127,6 +127,21 @@ printk(BIOS_DEBUG, "%d DIMMs found\n", mem_info->dimm_cnt); }
+/* + * Make sure we are enabling intermediate caching to speed up next stage + * (postcar/romstage) loading and decompression as we are still in romstage + * and car tear down will be handled by next stage at its entry. + */ +static void enable_ramstage_caching(uintptr_t base, size_t size) +{ + int mtrr = get_free_var_mtrr(); + + if (mtrr == -1) + return; + + set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT); +} + asmlinkage void car_stage_entry(void) { bool s3wake; @@ -160,6 +175,8 @@ printk(BIOS_DEBUG, "top_of_ram = 0x%lx\n", top_of_ram); top_of_ram -= 16*MiB; postcar_frame_add_mtrr(&pcf, top_of_ram, 16*MiB, MTRR_TYPE_WRBACK); + /* enabling intermediate caching */ + enable_ramstage_caching(top_of_ram, 16*MiB);
/* Cache the ROM as WP just below 4GiB. */ postcar_frame_add_romcache(&pcf, MTRR_TYPE_WRPROT);
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#2).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and car tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 27 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/2
Paul Menzel has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 2: Code-Review+1
(4 comments)
Nice.
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG Commit Message:
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG@11 PS2, Line 11: car CAR
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG@14 PS2, Line 14: CL. On what board?
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... PS2, Line 133: car CAR
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... PS2, Line 177: enabling enable
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#3).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and car tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 27 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/3
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#4).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 27 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/4
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 4:
(4 comments)
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG Commit Message:
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG@11 PS2, Line 11: car
CAR
Ack
https://review.coreboot.org/c/coreboot/+/34791/2//COMMIT_MSG@14 PS2, Line 14: CL.
On what board?
Ack
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... PS2, Line 133: car
CAR
Ack
https://review.coreboot.org/c/coreboot/+/34791/2/src/soc/intel/cannonlake/ro... PS2, Line 177: enabling
enable
Ack
Aaron Durbin has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 4:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... PS4, Line 150: size_t top_of_ram_size; You can just do:
const size_t top_of_ram_size = 16*MiB;
Arthur Heymans has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 4:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... PS4, Line 142: set_var_mtrr Don't you have to disable caching before configuring MTRR's?
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#5).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 26 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/5
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 4:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... PS4, Line 150: size_t top_of_ram_size;
You can just do: […]
Done
Aaron Durbin has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size); Do we want to add this as an option to postcar api? Basically provide a runtime mtrr option to be applied prior to loading?
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... PS4, Line 142: set_var_mtrr
Don't you have to disable caching before configuring MTRR's?
disable_cache() and enable_cache() around will cause sluggishness. i have same exp in fast_spi_cache_bios_region()
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Do we want to add this as an option to postcar api?
that should be ideal but how do we know which MTRR is runtime MTRR and need to program before loading postcar stage? Because when control reaches postcar, inside postcar_frame we have multiple MTRR base and size, there is no such order to tell that index[0] is the one what i have to set before loading posrcar.
Basically provide a runtime mtrr option to be applied prior to loading?
Also changing postcar API might require to modify all SOC romstage.c code.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Do we want to add this as an option to postcar api?
that should be ideal but how do we know which MTRR is runtime MTRR and need to program before loading postcar stage? Because when control reaches postcar, inside postcar_frame we have multiple MTRR base and size, there is no such order to tell that index[0] is the one what i have to set before loading posrcar.
Aaron, do you mean call set_var_mtrr(base, size) from postcar_frame_add_mtrr() function itself so all soc can leverage this benefit without dedicated soc code change ?
Arthur Heymans has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Do we want to add this as an option to postcar api? […]
Maybe add a function like postcar_frame_add_romcache() but for both setting up the postcar frame MTRR as setting up the MTRR to speed up operation on the area below cbmem?
Furquan Shaikh has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 135: enable_ramstage_caching Just thinking out loud: Would enabling caching for the CBMEM region here when running from a CAR stage result in coherency issues?
Arthur Heymans has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 135: enable_ramstage_caching
Just thinking out loud: Would enabling caching for the CBMEM region here when running from a CAR sta […]
I'd not expect this to happen if the CPU is in non-evict mode
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(2 comments)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 135: enable_ramstage_caching
I'd not expect this to happen if the CPU is in non-evict mode
yes, i don't think there would be any cache cohencency issue here
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Maybe add a function like postcar_frame_add_romcache() but for both setting up the postcar frame MTR […]
got it, we can do that
Patrick Rudolph has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT); I wonder how this works. My tests on Sandy Bridge showed that MTRRs are only updated on CR0.CD=1 to CR0.CD=0 transitions. All of AMD and early Intel CPUs follow this scheme and update MTRRs with cache disabled. Isn't that transition required any more starting with APL? "Intel® 64 and IA-32 Architectures Software Developer’s Manual" doesn't mention that.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
I wonder how this works. […]
if you could see since SPT (SKL) we are using similar approach. I don't recall about older PCH platform mechanism.
Please refer to https://github.com/coreboot/coreboot/blob/ca38fbcdbfcb5024496d2577f71de06745...
grep -rsn "fast_spi_cache_bios_region" src/
src/soc/intel/skylake/bootblock/cpu.c:23: fast_spi_cache_bios_region(); src/soc/intel/icelake/bootblock/cpu.c:25: fast_spi_cache_bios_region(); src/soc/intel/apollolake/bootblock/bootblock.c:107: fast_spi_cache_bios_region(); src/soc/intel/apollolake/cpu.c:287: fast_spi_cache_bios_region(); src/soc/intel/cannonlake/bootblock/cpu.c:26: fast_spi_cache_bios_region();
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 5:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
got it, we can do that
*sigh*
You are doing multiple things here (again) making review harder than necessary.
Anyways: Making any RAM region WB cacheable here conflicts with 'invd' in exit_car.S. The good news is, there are conditions when you can switch that to 'wbinvd' and I think all Intel where CAR does not overlay RAM (it's high in MMIO window) could do that.
Also I would expect the real perfomance come from enabling WB for TSEG here. That's were stage cache would live. So again... I am claiming the postcar frame has the MTRR setup you would want.
Hello Patrick Rudolph, Aaron Durbin, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#6).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 8 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/6
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
*sigh* […]
Maybe add a function like postcar_frame_add_romcache() but for both setting up the postcar frame MTRR as setting up the MTRR to speed up operation on the area below cbmem?
@Arthur: CL is here https://review.coreboot.org/c/coreboot/+/34805
Also I would expect the real perfomance come from enabling WB for TSEG here.
here is the snapshot of MTRR at romstage end (before loading postcar)
0x00000000fef00006: PHYBASE0: Address = 0x00000000fef00000, WB 0x0000007ffffc0800: PHYMASK0: Length = 0x0000000000040000, Valid 0x00000000ff000005: PHYBASE1: Address = 0x00000000ff000000, WP 0x0000000fff000800: PHYMASK1: Length = 0x0000007001000000, Valid
and entire cbmem region here whatever we have marked from top_of_ram-16MB to 16MB is out of MTRR snapshot and majority of activities lies in this region.
And due to below reason, we can't make top_of_ram region as WB. So intention here is to make required region as WP to make use of caching for "smaller" windows for now till we loading postcar and postcar MTRR stack frame again sets things up as per expectation.
Anyways: Making any RAM region WB cacheable here conflicts with 'invd' in exit_car.S.
with this CL, MTRR snapshot at romstage before loading postcar
0x00000000fef00006: PHYBASE0: Address = 0x00000000fef00000, WB 0x0000007ffffc0800: PHYMASK0: Length = 0x0000000000040000, Valid 0x00000000ff000005: PHYBASE1: Address = 0x00000000ff000000, WP 0x0000007fff000800: PHYMASK1: Length = 0x0000000001000000, Valid 0x0000000099000005: PHYBASE2: Address = 0x0000000099000000, WP 0x0000007fff000800: PHYMASK2: Length = 0x0000000001000000, Valid
and after loading postcar
0x00000000ff000005: PHYBASE0: Address = 0x00000000ff000000, WP 0x0000007fff000800: PHYMASK0: Length = 0x0000000001000000, Valid 0x0000000099000006: PHYBASE1: Address = 0x0000000099000000, WB 0x0000007fff000800: PHYMASK1: Length = 0x0000000001000000, Valid
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 135: enable_ramstage_caching
yes, i don't think there would be any cache cohencency issue here
Ack
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/4/src/soc/intel/cannonlake/ro... PS4, Line 142: set_var_mtrr
disable_cache() and enable_cache() around will cause sluggishness. […]
Ack
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Maybe add a function like postcar_frame_add_romcache() but for both setting up the postcar frame […]
Also refer to this https://github.com/coreboot/coreboot/blob/ca38fbcdbfcb5024496d2577f71de06745...
TSEG and top_of_ram might not pointed to same MMIO address based on other memory reservation (DMAPR, PRM, ME stolen, PTT etc)
Patrick Rudolph has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
if you could see since SPT (SKL) we are using similar approach. […]
Yes, I can see that newer Intel code uses a similar approach. Trying to explain why code is correct by just referencing it doesn't work.
The question is WHY does it use to work, not which code also does it.
The following code does use CR0.CD=1: Linux kernel: arch/x86/kernel/cpu/mtrr/mtrr.c coreboot ramstage MTRR code: src/cpu/x86/mtrr/mtrr.c coreboot intel car code: soc/intel/common/block/cpu/car/cache_as_ram.S
If it would be that easy to set up MTRRs from within CAR, why wasn't it done in 7f8afe063139f6fc7076a3e4edf6093a953792dc ?
For my understanding the postcar frame needs to be filled with MTRRs, because you can only updated them after CAR has been disabled.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
Yes, I can see that newer Intel code uses a similar approach. Trying to explain why code is correct by just referencing it doesn't work.
The question is WHY does it use to work, not which code also does it.
We are already in NEM mode hence those mechanism is different than the one you have mentioned above
1. NEM Init 2. Configure DataStack region as WB memory type using a single variable MTRR 3. Configure BIOS Code region as WP memory type using a variable MTRR
Notes for Use of Cache for Stack and Code 1. Use of INVD will result in loss of DataStack data 2. Use of WBINVD or CLFLUSH is not recommended.
The following code does use CR0.CD=1: Linux kernel: arch/x86/kernel/cpu/mtrr/mtrr.c coreboot ramstage MTRR code: src/cpu/x86/mtrr/mtrr.c coreboot intel car code: soc/intel/common/block/cpu/car/cache_as_ram.S
If it would be that easy to set up MTRRs from within CAR, why wasn't it done in 7f8afe063139f6fc7076a3e4edf6093a953792dc ?
For my understanding the postcar frame needs to be filled with MTRRs, because you can only updated them after CAR has been disabled.
yes, we are disabling CAR and updating all required MTRR filled by postcar frame inside postcar and enabling MTRR back & enabling cache.
Patrick Rudolph has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
Yes, I can see that newer Intel code uses a similar approach. […]
Where's that different mechanism for NEM documented?
If it would be that easy to set up MTRRs from within NEM why does the ROMCC code and the postcar stage code update MTRRs only with NEM disabled?
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
Where's that different mechanism for NEM documented?
In general NEM mode and MTRR details its part of SDM and product BWG for anything more specific, you can find at public intel domain.
If it would be that easy to set up MTRRs from within NEM why does the ROMCC code and the postcar stage code update MTRRs only with NEM disabled?
postcar stage setting up MTRR with DRAM resources after we have initialized DIMM and main memory is available. Its recommended to tear down the car and move into main memory as soon as its available hence NEM mode existed and postcar stage sets new MTRR snapshots with 2 MTRR types
1. ROM cache 2. top of ram cache
why do we want to live into NEM mode when main memory is available.
Patrick Rudolph has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
Where's that different mechanism for NEM documented? […]
Thinking about it, setting MTRRs might have only an effect if CQOS NEM is used. Using regular NEM the whole cache would be locked and you wont see any speed up on RAM access as there are no free cache lines.
As there's no documentation for CQOS, it's just a guess.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
Thinking about it, setting MTRRs might have only an effect if CQOS NEM is used. […]
i don't think its anything specific to NEM enhance or CQOS. Its in general NEM mode thing as i understood.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
Also refer to this https://github. […]
NEM enhance or CQOS is special because it treated data and code cache differently than whats been done in NEM mode, but that doesn't mean that NEM enhance and CQOS provides more cache size than actually its available. CPU cache sizes are fixed in both cases
I could see the same benefit with very old SKL code when we don't have NEM enhanced implemented
Aaron Durbin has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 6:
(2 comments)
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 142: set_var_mtrr(mtrr, base, size, MTRR_TYPE_WRPROT);
i don't think its anything specific to NEM enhance or CQOS. […]
One can dynamically add to the MTRR solution. However, I'm not sure that it works on all x86 systems when CD=1. CD=1 turns on the use of caches -- I don't think it says anything about making the registers take effect (though that could be an implementation), and such behavior would be surprising to me.
Also, the assumption here is that the address range being requested is not overlapping with CAR address space. If it was then, yes, everything is busted.
https://review.coreboot.org/c/coreboot/+/34791/5/src/soc/intel/cannonlake/ro... PS5, Line 177: enable_ramstage_caching(top_of_ram, top_of_ram_size);
NEM enhance or CQOS is special because it treated data and code cache differently than whats been do […]
w.r.t. invd instruction in exit_car.S doesn't matter. Things are already loaded and flushed out by way of prog_segment_loaded(). The expectation is that by the time invd is ran dirty lines in the cache are not needed and therefore lost.
The name of the function is confusing. It's marking the top of ram WB as noted.
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#7).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 8 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/7
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#8).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~3-5ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 8 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/8
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 9:
(1 comment)
RFC CB:34897
I did not even try to make it pass builds yet.
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */ You do not setup TSEG cache here in the original code? That's where your ramstage will be copied before the jump, see run_ramstage(). Unless you have NO_STAGE_CACHE=y in your config.
Maybe I missed something? That would totally have messed up any performance numbers you have gathered so far.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 9:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
You do not setup TSEG cache here in the original code? That's where your ramstage will be copied before the jump, see run_ramstage(). Unless you have NO_STAGE_CACHE=y in your config.
You are correct, we should have also cache TSEG region for that matter. This will give some more savings for sure
Maybe I missed something? That would totally have messed up any performance numbers you have gathered so far.
performance numbers are coming "not from stage_cache" (for now what I'm measuring) rather between (top_of_ram - 16MB) till top_of_ram, in this area we have entire operation going on.
For an example: I'm working on CML hatch.
postcar gets loaded into 0x99c20000 size 0x4598 ramstage gets loaded into 0x99bcb000 size 0x5ddd8\
[all above ranges are between between (top_of_ram - 16MB) till top_of_ram, top_of_ram = 0x9a000000 and TSEG is between 0x9a000000 till 0x9ac00000]
in case of S3 resume, yes, it won't load from cbfs rather it will pick from stage_cache, there we should get savings and we should also try to make TSEG area also into WB cache if possible.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
You do not setup TSEG cache here in the original code? That's where your ramstage will be copied b […]
run_ramstage() has this:
stage_cache_add(STAGE_RAMSTAGE, &ramstage);
That is effectively memcpy(@TSEG, @CBMEM, sizeof(RAMSTAGE))
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
run_ramstage() has this: […]
yes, ramstage execution will help for sure. Today postcar MTRR frame has only 2 MTRR range (atleast on IA side i can tell, latest platform)
1.ROM mapped 2. DRAM top 16MB
both these ranges doesn't cover your TSEG range which will be higher address space than Bootloader TOLUM memory, hence you can enable caching for that range as well if you wish
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
yes, ramstage execution will help for sure. […]
You ** NEED ** to make the perfomance comparison with TSEG marked WB. Please provide the actual output of cbmem -t (or -T) somewhere. Otherwise the 4 milliseconds you advertise in your commit message is just bogus number.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
You ** NEED ** to make the perfomance comparison with TSEG marked WB.
can you please submit that CL to enable TSEG range marked WB and share the savings because its your idea.
Please provide the actual output of cbmem -t (or -T) somewhere. Otherwise the 4 milliseconds you advertise in your commit message is just bogus number.
Move inevitable saving would be if romstage loading ramstage as ramstage is much bigger than postcar without CL when we are running from UC vs running with cached (WP)
Without CL:
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,942 5:start of verified boot 39,867 (26,924) 503:starting to initialize TPM 40,503 (636) 504:finished TPM initialization 78,397 (37,893) 505:starting to verify keyblock/preamble (RSA) 79,753 (1,355) 506:finished verifying keyblock/preamble (RSA) 94,591 (14,838) 507:starting to verify body (load+SHA2+RSA) 94,593 (2) 508:finished loading body (ignore for x86) 218,853 (124,260) 509:finished calculating body hash (SHA2) 237,757 (18,903) 510:finished verifying body signature (RSA) 240,336 (2,578) 511:starting TPM PCR extend 241,700 (1,364) 512:finished TPM PCR extend 257,240 (15,540) 513:starting locking TPM 257,240 (0) 514:finished locking TPM 265,500 (8,259) 6:end of verified boot 273,796 (8,295) 13:starting to load romstage 273,814 (18) 14:finished loading romstage 273,814 (0) 1:start of romstage 273,819 (5) 950:calling FspMemoryInit 276,698 (2,878) 951:returning from FspMemoryInit 301,959 (25,260) 4:end of romstage 308,139 (6,180) 100:start of postcar 309,030 (890) 101:end of postcar 309,030 (0) 8:starting to load ramstage 309,207 (177) 15:starting LZMA decompress (ignore for x86) 309,215 (7) 16:finished LZMA decompress (ignore for x86) 334,927 (25,712) 9:finished loading ramstage 342,141 (7,213) 550:starting to load Chrome OS VPD 342,219 (78) 10:start of ramstage 342,553 (333) 30:device enumeration 393,554 (51,001) 954:calling FspSiliconInit 402,074 (8,520) 955:returning from FspSiliconInit 489,795 (87,720) 40:device configuration 505,392 (15,597) 956:calling FspNotify(AfterPciEnumeration) 539,808 (34,415) 957:returning from FspNotify(AfterPciEnumeration) 540,151 (342) 50:device enable 540,339 (188) 60:device initialization 560,092 (19,752) 70:device setup done 592,849 (32,757) 75:cbmem post 593,397 (548) 80:write tables 593,516 (119) 15:starting LZMA decompress (ignore for x86) 596,352 (2,835) 16:finished LZMA decompress (ignore for x86) 596,611 (259) 85:finalize chips 597,292 (680) 90:load payload 611,144 (13,852) 15:starting LZMA decompress (ignore for x86) 611,431 (286) 16:finished LZMA decompress (ignore for x86) 659,263 (47,831) 958:calling FspNotify(ReadyToBoot) 659,773 (509) 959:returning from FspNotify(ReadyToBoot) 662,956 (3,183) 960:calling FspNotify(EndOfFirmware) 663,051 (95) 961:returning from FspNotify(EndOfFirmware) 663,457 (406) 99:selfboot jump 663,930 (472) 1000:depthcharge start 663,948 (18) 1002:RO vboot init 664,061 (112) 1020:vboot select&load kernel 664,064 (3) 1030:finished EC verification 684,564 (20,499) 1040:finished storage device initialization 685,653 (1,089) 1050:finished reading kernel from disk 692,897 (7,243) 1100:finished vboot kernel verification 831,888 (138,990) 1101:jumping to kernel 834,390 (2,502)
Total Time: 821,421
With CL:
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,006 5:start of verified boot 38,361 (26,355) 503:starting to initialize TPM 38,995 (634) 504:finished TPM initialization 76,816 (37,821) 505:starting to verify keyblock/preamble (RSA) 78,224 (1,407) 506:finished verifying keyblock/preamble (RSA) 92,933 (14,709) 507:starting to verify body (load+SHA2+RSA) 92,935 (2) 508:finished loading body (ignore for x86) 217,221 (124,285) 509:finished calculating body hash (SHA2) 236,124 (18,902) 510:finished verifying body signature (RSA) 238,813 (2,688) 511:starting TPM PCR extend 240,175 (1,361) 512:finished TPM PCR extend 255,745 (15,570) 513:starting locking TPM 255,745 (0) 514:finished locking TPM 264,098 (8,352) 6:end of verified boot 272,439 (8,341) 13:starting to load romstage 272,457 (18) 14:finished loading romstage 272,457 (0) 1:start of romstage 272,463 (5) 950:calling FspMemoryInit 275,352 (2,889) 951:returning from FspMemoryInit 300,751 (25,399) 4:end of romstage 305,902 (5,150) 100:start of postcar 306,792 (889) 101:end of postcar 306,792 (0) 8:starting to load ramstage 306,986 (193) 15:starting LZMA decompress (ignore for x86) 306,988 (2) 16:finished LZMA decompress (ignore for x86) 332,705 (25,717) 9:finished loading ramstage 339,918 (7,213) 550:starting to load Chrome OS VPD 339,996 (78) 10:start of ramstage 340,347 (351) 30:device enumeration 391,132 (50,784) 954:calling FspSiliconInit 399,673 (8,541) 955:returning from FspSiliconInit 488,172 (88,498) 40:device configuration 503,753 (15,580) 956:calling FspNotify(AfterPciEnumeration) 538,170 (34,417) 957:returning from FspNotify(AfterPciEnumeration) 538,478 (307) 50:device enable 538,666 (188) 60:device initialization 558,375 (19,708) 70:device setup done 591,293 (32,918) 75:cbmem post 591,850 (556) 80:write tables 591,969 (119) 15:starting LZMA decompress (ignore for x86) 594,841 (2,871) 16:finished LZMA decompress (ignore for x86) 595,103 (262) 85:finalize chips 595,739 (635) 90:load payload 608,651 (12,911) 15:starting LZMA decompress (ignore for x86) 608,938 (287) 16:finished LZMA decompress (ignore for x86) 656,813 (47,874) 958:calling FspNotify(ReadyToBoot) 657,323 (509) 959:returning from FspNotify(ReadyToBoot) 661,169 (3,846) 960:calling FspNotify(EndOfFirmware) 661,264 (95) 961:returning from FspNotify(EndOfFirmware) 661,671 (406) 99:selfboot jump 662,138 (467) 1000:depthcharge start 662,156 (18) 1002:RO vboot init 662,268 (111) 1020:vboot select&load kernel 662,272 (3) 1030:finished EC verification 682,521 (20,249) 1040:finished storage device initialization 683,607 (1,085) 1050:finished reading kernel from disk 690,722 (7,115) 1100:finished vboot kernel verification 828,339 (137,616) 1101:jumping to kernel 830,838 (2,498)
Total Time: 818,805
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
You ** NEED ** to make the perfomance comparison with TSEG marked WB.
can you please submit that CL to enable TSEG range marked WB and share the savings because its your idea.
No hardware to test here. Just copy the three lines from somewhere. As far as I am concerned that is a bug in original cannonlake work that was submitted. No clue why it was removed. And why some platforms guarded it with CONFIG(HAVE_SMI_HANDLER).
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
You ** NEED ** to make the perfomance comparison with TSEG marked WB. […]
i hope you like to add this right ?
smm_region(&smm_base, &smm_size); postcar_frame_add_mtrr(&pcf, smm_base, smm_size, MTRR_TYPE_WRBACK);
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
i hope you like to add this right ? […]
From provided sample:
0:1st timestamp 12,942 99:selfboot jump 663,930 (472)
663,930 - 12,942 = 650,988
0:1st timestamp 12,006 99:selfboot jump 662,138 (467)
662,138 - 12,006 = 650,132
to payload: 650,988 - 650,132 = 0,856 ms improvement to OS kernel: 821,421 - 818,805 = 2,616 ms improvement
I think you are only measuring noise and normal fluctuation of udelay(). I would remove any arguments about 4 ms being saved you have put in the commit message.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 10:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
i hope you like to add this right ?
smm_region(&smm_base, &smm_size); postcar_frame_add_mtrr(&pcf, smm_base, smm_size, MTRR_TYPE_WRBACK);
Yes. And cbmem -t with POSTCAR_STAGE=y for comparison.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 11:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
i hope you like to add this right ? […]
CL: https://review.coreboot.org/c/coreboot/+/34995/1 i'm about to leave from office now. i will share cbmem -t log here tomorrow.
btw, my measurement data is based cbmem -t without vs with CL.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 11:
Well my quick analysis is that biggest time savings to achieve are the certain paths of vboot that are somewhat proportional to the size of the FMAP region verified. Reduce FSP blob size to achieve improvements there. As for the biggest single timestamped section, it appears to be FspSiliconInit... do less in FSP and you get to further reduce the FSP blob size. Replacement native coreboot code is likely to execute faster and take less space.
Yes, this was written with tongue in the cheek. Give us native raminit and source, and we'll likely throw out any UEFI and instantly take away 50+ ms from cold boots. Unless the figures against a solution with POSTCAR_STAGE=y show a much more significant improvement (4 ms of 830ms is <0.5 %) I see no value in the approach of skipping the separate POSTCAR_STAGE.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 11:
Yes, this was written with tongue in the cheek. Give us native raminit and source, and we'll likely throw out any UEFI and instantly take away 50+ ms from cold boots. Unless the figures against a solution with POSTCAR_STAGE=y show a much more significant improvement (4 ms of 830ms is <0.5 %) I see no value in the approach of skipping the separate POSTCAR_STAGE.
Its not only size reduction as i have pointed out in past gerrit update. Coreboot should have flexible stage requirement, where one should be able to pick or drop a stage. if could see the note from Aaron, why he has introduced postcar and we should be able to achieve the say in ramstage as well without introducing a new stage if we wish.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 11:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
CL: https://review.coreboot.org/c/coreboot/+/34995/1 […]
Yes. And cbmem -t with POSTCAR_STAGE=y for comparison.
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,239 5:start of verified boot 39,018 (26,779) 503:starting to initialize TPM 39,655 (637) 504:finished TPM initialization 77,398 (37,742) 505:starting to verify keyblock/preamble (RSA) 78,750 (1,352) 506:finished verifying keyblock/preamble (RSA) 93,515 (14,765) 507:starting to verify body (load+SHA2+RSA) 93,517 (2) 508:finished loading body (ignore for x86) 217,794 (124,277) 509:finished calculating body hash (SHA2) 236,697 (18,902) 510:finished verifying body signature (RSA) 239,416 (2,718) 511:starting TPM PCR extend 240,048 (632) 512:finished TPM PCR extend 255,510 (15,462) 513:starting locking TPM 255,510 (0) 514:finished locking TPM 263,825 (8,314) 6:end of verified boot 272,109 (8,284) 13:starting to load romstage 272,127 (18) 14:finished loading romstage 272,127 (0) 1:start of romstage 272,132 (5) 950:calling FspMemoryInit 275,015 (2,882) 951:returning from FspMemoryInit 300,914 (25,899) 4:end of romstage 306,208 (5,293) 100:start of postcar 307,098 (890) 101:end of postcar 307,099 (0) 8:starting to load ramstage 307,243 (144) 15:starting LZMA decompress (ignore for x86) 307,245 (2) 16:finished LZMA decompress (ignore for x86) 333,064 (25,818) 9:finished loading ramstage 333,170 (106) 550:starting to load Chrome OS VPD 333,242 (71) 10:start of ramstage 333,615 (373) 30:device enumeration 379,313 (45,697) 954:calling FspSiliconInit 387,842 (8,529) 955:returning from FspSiliconInit 473,917 (86,074) 40:device configuration 489,653 (15,736) 956:calling FspNotify(AfterPciEnumeration) 524,022 (34,368) 957:returning from FspNotify(AfterPciEnumeration) 524,332 (309) 50:device enable 524,520 (188) 60:device initialization 544,339 (19,818) 70:device setup done 577,113 (32,774) 75:cbmem post 577,617 (503) 80:write tables 577,736 (119) 15:starting LZMA decompress (ignore for x86) 580,608 (2,871) 16:finished LZMA decompress (ignore for x86) 580,870 (261) 85:finalize chips 581,500 (630) 90:load payload 595,361 (13,860) 15:starting LZMA decompress (ignore for x86) 595,649 (287) 16:finished LZMA decompress (ignore for x86) 643,568 (47,919) 958:calling FspNotify(ReadyToBoot) 644,078 (509) 959:returning from FspNotify(ReadyToBoot) 646,522 (2,443) 960:calling FspNotify(EndOfFirmware) 646,617 (95) 961:returning from FspNotify(EndOfFirmware) 647,024 (406) 99:selfboot jump 647,489 (465) 1000:depthcharge start 647,508 (19) 1002:RO vboot init 647,608 (99) 1020:vboot select&load kernel 647,611 (3) 1030:finished EC verification 667,816 (20,204) 1040:finished storage device initialization 668,909 (1,092) 1050:finished reading kernel from disk 676,022 (7,112) 1100:finished vboot kernel verification 823,294 (147,272) 1101:jumping to kernel 825,814 (2,520)
Total Time: 813,549
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
Yes. And cbmem -t with POSTCAR_STAGE=y for comparison.
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,239 100:start of postcar 307,098 (890) 101:end of postcar 307,099 (0) 99:selfboot jump 647,489 (465)
I am pretty sure this was not with POSTCAR_STAGE=y. Nevertheless, it's now 635ms to enter payload, while it was 650ms before. So I assume this was POSTCAR_STAGE=n with TSEG marked WB cacheable?
I am still waiting for cbmem -t from POSTCAR_STAGE=y and TSEG marked WB cacheable. That should be your reference point for the commit message.
10:start of ramstage 340,347 (351) 30:device enumeration 391,132 (50,784)
10:start of ramstage 333,615 (373) 30:device enumeration 379,313 (45,697)
I think none of the changes around POSTCAR MTRRs would have effect after either timestamp 10 or 30 above. So I would use 99:selfboot jmp as the reference.
1100:finished vboot kernel verification 828,339 (137,616) 1100:finished vboot kernel verification 831,888 (138,990) 1100:finished vboot kernel verification 823,294 (147,272)
Don't stare at the "Total Time" line. Notice how your fastest entry to kernel had the slowest 1100:finished vboot kernel verification.
IMO this continues be much ado about nothing. You are optimizing at the wrong spot for high price to pay on code complexity (and stability against changes). Until x86 reset is even de-asserted, the hardware probably already has spent over 150ms for the voltage ramp-ups after you toggled the power button.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
Yes. And cbmem -t with POSTCAR_STAGE=y for comparison.
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,239 100:start of postcar 307,098 (890) 101:end of postcar 307,099 (0) 99:selfboot jump 647,489 (465)
I am pretty sure this was not with POSTCAR_STAGE=y. Nevertheless, it's now 635ms to enter payload, while it was 650ms before. So I assume this was POSTCAR_STAGE=n with TSEG marked WB cacheable?
if you have trust issue, then i would recommend you to find your HW and identify the data.
Just on trust factor, kindly note if this would have been "POSTCAR_STAGE=n" then cbmem -t entries (which is 58 in all these 3 cases) would have reduced for natural reason. (dropping post car might drop 100 and 101 isn't it ? 100:start of postcar 307,098 (890) 101:end of postcar 307,099 (0))
localhost ~ # cbmem -t 58 entries total:
I am still waiting for cbmem -t from POSTCAR_STAGE=y and TSEG marked WB cacheable. That should be your reference point for the commit message.
As i have told, my last shared data was with POSTCAR_STAGE=y and https://review.coreboot.org/c/coreboot/+/34995 CL.
if you wish to understand the savings then watch below entries in cbmem -t time between last few logs to calculate the direct impact of caching those ranges.
9:finished loading ramstage
Until x86 reset is even de-asserted, the hardware probably already has spent over 150ms for the voltage ramp-ups after you toggled the power button.
These are the pre-reset time, we have means to measure that and optimization those as well.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
Yes. And cbmem -t with POSTCAR_STAGE=y for comparison.
localhost ~ # cbmem -t 58 entries total:
0:1st timestamp 12,239 100:start of postcar 307,098 (890) 101:end of postcar 307,099 (0) 99:selfboot jump 647,489 (465)
I am pretty sure this was not with POSTCAR_STAGE=y. Nevertheless, it's now 635ms to enter payload, while it was 650ms before. So I assume this was POSTCAR_STAGE=n with TSEG marked WB cacheable?
if you have trust issue, then i would recommend you to find your HW and identify the data.
Nah.. I just got confused because there were no "starting to load postcar" or "finished loading postcar" timestamps.
I am still waiting for cbmem -t from POSTCAR_STAGE=y and TSEG marked WB cacheable. That should be your reference point for the commit message.
As i have told, my last shared data was with POSTCAR_STAGE=y and https://review.coreboot.org/c/coreboot/+/34995 CL.
OK, good. It was CB:34995 + CB:34791 and I got confused about the entries.
For comparison, I would want to see cbmem -t posted in CB:34995 and CB:34752 comments as well, so we get the correct numbers presented in the commit messages.
Those would be just TSEG marked WB (CB:34995) and the POSTCAR_STAGE=n case (CB:34995 + CB:34791 + CB:34752).
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
Those would be just TSEG marked WB (CB:34995) and the POSTCAR_STAGE=n case (CB:34995 + CB:34791 + CB:34752).
yes, i can roll another image to get the data as you said. But I will be able to give you data by tomorrow morning time as in middle of other task. Thanks.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
[…] Those would be just TSEG marked WB (CB:34995) and the POSTCAR_STAGE=n case (CB:34995 + CB:34791 + CB:34752).
localhost ~ # cbmem -t 56 entries total:
0:1st timestamp 12,896 5:start of verified boot 39,730 (26,834) 503:starting to initialize TPM 40,330 (600) 504:finished TPM initialization 78,131 (37,801) 505:starting to verify keyblock/preamble (RSA) 79,583 (1,451) 506:finished verifying keyblock/preamble (RSA) 94,367 (14,784) 507:starting to verify body (load+SHA2+RSA) 94,369 (2) 508:finished loading body (ignore for x86) 217,020 (122,651) 509:finished calculating body hash (SHA2) 235,692 (18,671) 510:finished verifying body signature (RSA) 238,292 (2,600) 511:starting TPM PCR extend 238,920 (627) 512:finished TPM PCR extend 254,488 (15,568) 513:starting locking TPM 254,488 (0) 514:finished locking TPM 262,797 (8,308) 6:end of verified boot 271,039 (8,242) 13:starting to load romstage 271,057 (18) 14:finished loading romstage 271,058 (0) 1:start of romstage 271,063 (5) 2:before ram initialization 271,104 (41) 950:calling FspMemoryInit 273,942 (2,838) 951:returning from FspMemoryInit 300,002 (26,059) 3:after ram initialization 303,331 (3,329) 15:starting LZMA decompress (ignore for x86) 303,827 (495) 16:finished LZMA decompress (ignore for x86) 338,874 (35,047) 4:end of romstage 340,873 (1,998) 550:starting to load Chrome OS VPD 341,841 (968) 10:start of ramstage 342,241 (400) 30:device enumeration 387,612 (45,370) 954:calling FspSiliconInit 396,130 (8,517) 955:returning from FspSiliconInit 482,587 (86,456) 40:device configuration 498,273 (15,686) 956:calling FspNotify(AfterPciEnumeration) 532,691 (34,418) 957:returning from FspNotify(AfterPciEnumeration) 533,074 (383) 50:device enable 533,263 (188) 60:device initialization 553,018 (19,755) 70:device setup done 585,872 (32,853) 75:cbmem post 586,387 (514) 80:write tables 586,505 (118) 15:starting LZMA decompress (ignore for x86) 589,157 (2,651) 16:finished LZMA decompress (ignore for x86) 589,417 (260) 85:finalize chips 590,043 (626) 90:load payload 603,864 (13,820) 15:starting LZMA decompress (ignore for x86) 604,127 (263) 16:finished LZMA decompress (ignore for x86) 652,009 (47,881) 958:calling FspNotify(ReadyToBoot) 652,518 (509) 959:returning from FspNotify(ReadyToBoot) 655,539 (3,020) 960:calling FspNotify(EndOfFirmware) 655,634 (95) 961:returning from FspNotify(EndOfFirmware) 656,040 (406) 99:selfboot jump 656,509 (469) 1000:depthcharge start 656,528 (18) 1002:RO vboot init 656,640 (112) 1020:vboot select&load kernel 656,644 (3) 1030:finished EC verification 676,932 (20,288) 1040:finished storage device initialization 677,959 (1,026) 1050:finished reading kernel from disk 685,071 (7,112) 1100:finished vboot kernel verification 822,967 (137,895) 1101:jumping to kernel 825,460 (2,493)
Total Time: 812,542
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/9/src/soc/intel/cannonlake/ro... PS9, Line 167: /* Cache the ROM as WP just below 4GiB. */
Those would be just TSEG marked WB (CB:34995) and the POSTCAR_STAGE=n case (CB:34995 + CB:34791 + […]
best number might be this one.
localhost ~ # cbmem -t 56 entries total:
0:1st timestamp 12,337 5:start of verified boot 39,349 (27,011) 503:starting to initialize TPM 39,951 (601) 504:finished TPM initialization 76,640 (36,688) 505:starting to verify keyblock/preamble (RSA) 78,060 (1,420) 506:finished verifying keyblock/preamble (RSA) 92,786 (14,725) 507:starting to verify body (load+SHA2+RSA) 92,788 (2) 508:finished loading body (ignore for x86) 215,488 (122,700) 509:finished calculating body hash (SHA2) 234,161 (18,672) 510:finished verifying body signature (RSA) 236,868 (2,706) 511:starting TPM PCR extend 237,495 (627) 512:finished TPM PCR extend 253,037 (15,541) 513:starting locking TPM 253,037 (0) 514:finished locking TPM 261,325 (8,288) 6:end of verified boot 269,633 (8,307) 13:starting to load romstage 269,651 (18) 14:finished loading romstage 269,651 (0) 1:start of romstage 269,656 (5) 2:before ram initialization 269,699 (42) 950:calling FspMemoryInit 272,541 (2,841) 951:returning from FspMemoryInit 297,977 (25,435) 3:after ram initialization 301,304 (3,327) 15:starting LZMA decompress (ignore for x86) 301,800 (495) 16:finished LZMA decompress (ignore for x86) 336,745 (34,944) 4:end of romstage 338,748 (2,003) 550:starting to load Chrome OS VPD 339,715 (966) 10:start of ramstage 340,075 (360) 30:device enumeration 385,569 (45,494) 954:calling FspSiliconInit 394,083 (8,514) 955:returning from FspSiliconInit 482,500 (88,416) 40:device configuration 498,082 (15,582) 956:calling FspNotify(AfterPciEnumeration) 532,478 (34,395) 957:returning from FspNotify(AfterPciEnumeration) 532,789 (311) 50:device enable 532,977 (188) 60:device initialization 552,688 (19,710) 70:device setup done 585,699 (33,011) 75:cbmem post 586,209 (509) 80:write tables 586,328 (119) 15:starting LZMA decompress (ignore for x86) 589,021 (2,693) 16:finished LZMA decompress (ignore for x86) 589,279 (258) 85:finalize chips 589,900 (620) 90:load payload 603,089 (13,189) 15:starting LZMA decompress (ignore for x86) 603,354 (265) 16:finished LZMA decompress (ignore for x86) 651,314 (47,960) 958:calling FspNotify(ReadyToBoot) 651,824 (510) 959:returning from FspNotify(ReadyToBoot) 654,020 (2,195) 960:calling FspNotify(EndOfFirmware) 654,115 (95) 961:returning from FspNotify(EndOfFirmware) 654,522 (406) 99:selfboot jump 654,992 (470) 1000:depthcharge start 655,011 (18) 1002:RO vboot init 655,123 (112) 1020:vboot select&load kernel 655,126 (3) 1030:finished EC verification 675,365 (20,238) 1040:finished storage device initialization 676,391 (1,026) 1050:finished reading kernel from disk 683,493 (7,102) 1100:finished vboot kernel verification 820,191 (136,697) 1101:jumping to kernel 822,694 (2,502)
Total Time: 810,332
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
Patch Set 12:
(1 comment)
Sorry.. it's very hard to determine which cbmem -t is from which commit. I still do not see the baseline CB:34995 numbers anywhere, as the last two numbers you posted are both with POSTCAR_STAGE=n. Nor can I say what's the change between the last two was.
It got way too convoluted to try derive this correcly from gerrit reviews comments but it still looks like boots with POSTCAR_STAGE=y reach '99: selfboot jmp' the quickest.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
Patch Set 12:
Patch Set 12:
(1 comment)
Sorry.. it's very hard to determine which cbmem -t is from which commit. I still do not see the baseline CB:34995 numbers anywhere, as the last two numbers you posted are both with POSTCAR_STAGE=n. Nor can I say what's the change between the last two was.
It got way too convoluted to try derive this correcly from gerrit reviews comments but it still looks like boots with POSTCAR_STAGE=y reach '99: selfboot jmp' the quickest.
Agree, such big cbmem -t data are confusing. Let me summarize it here with data.
1. Baseline number without any code changes:
Total Time till picking kernel: 823,964 Total Time till picking payload: 651,510
2. With POSTCAR_STAGE=y and (CB:34805 + 34995) [romstage -> postcar -> ramstage]
Total Time till picking kernel: 813,549 Total Time till picking payload: 635,250
3. With POSTCAR_STAGE=n and (CB:34995 + CB:34791 + CB:34752) [romstage -> ramstage]
Total Time till picking kernel: 810,332 Total Time till picking payload: 642,655
Now if you compare the end to end time between #2 and #3, we will seeing savings ~3ms+ (as you have pointed rightly "Notice how your fastest entry to kernel had the slowest 1100:finished vboot kernel verification.") savings in #3 approach consistently. All the platform performance scripts we do run to meet product compliance are relying on "Total Time till picking kernel". So if we compare as below
Compare between #1 vs #2: Savings ~10ms+ of total boot time Compare between #1 vs #3: Savings ~13ms+ of total boot time
But if you compare "Total Time till picking payload" between #2 and #3 then #2 is quickest because of postcar smaller size and enabling MTRR type as WB while loading ramstage from postcar. In case of #3, romstage loading ramstage (which is bigger in size) + MTRR type set to WP (to avoid getting stuck with WB during CAR tear down).
I hope these data will help to make a logical closure on this effort.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
Agree, such big cbmem -t data are confusing. Let me summarize it here with data.
- Baseline number without any code changes:
Total Time till picking kernel: 823,964 Total Time till picking payload: 651,510
- With POSTCAR_STAGE=y and (CB:34805 + 34995) [romstage -> postcar -> ramstage]
Total Time till picking kernel: 813,549 Total Time till picking payload: 635,250
CB:34995, adding TSEG WB is the least controversial and something that would get easily merged. It is also roughly half (~7 ms) of #1 vs #2 improvement (reaching payload) and needs to appear alone as the first commit of your patch train.
Aug 19 13:27 1100:finished vboot kernel verification 828,339 (137,616) Total Time: 818,805
Aug 20 09:55 1100:finished vboot kernel verification 823,294 (147,272) Total Time: 813,549
The reference 813,549 came from a boot with 10 ms slower 1100: step. I cannot see how 1100 would be affected of any MTRR changes made in late romstage, but maybe I am mistaken about it. Could be something about SSD and how you power-cycle/reboot that triggers a seemingly constant 10ms difference for 1100. I just don't know.
I am speculating some boots of #2 could have been 10ms better for 1100, giving us: 'Total Time till picking kernel: ~803,000'.
- With POSTCAR_STAGE=n and (CB:34995 + CB:34791 + CB:34752) [romstage -> ramstage]
Total Time till picking kernel: 810,332 Total Time till picking payload: 642,655
Now if you compare the end to end time between #2 and #3, we will seeing savings ~3ms+ (as you have pointed rightly "Notice how your fastest entry to kernel had the slowest 1100:finished vboot kernel verification.") savings in #3 approach consistently. All the platform performance scripts we do run to meet product compliance are relying on "Total Time till picking kernel". So if we compare as below
If I am correct in my comment about 1100: vboot kernel verification step, #2 POSTCAR_STAGE=y beats #3 POSTCAR_STAGE=n by 7 ms in both "Total Time till picking kernel" and "Total Time till picking payload".
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 12:
Patch Set 12:
Agree, such big cbmem -t data are confusing. Let me summarize it here with data.
- Baseline number without any code changes:
Total Time till picking kernel: 823,964 Total Time till picking payload: 651,510
- With POSTCAR_STAGE=y and (CB:34805 + 34995) [romstage -> postcar -> ramstage]
Total Time till picking kernel: 813,549 Total Time till picking payload: 635,250
CB:34995, adding TSEG WB is the least controversial and something that would get easily merged. It is also roughly half (~7 ms) of #1 vs #2 improvement (reaching payload) and needs to appear alone as the first commit of your patch train.
Aug 19 13:27 1100:finished vboot kernel verification 828,339 (137,616) Total Time: 818,805
Aug 20 09:55 1100:finished vboot kernel verification 823,294 (147,272) Total Time: 813,549
The reference 813,549 came from a boot with 10 ms slower 1100: step. I cannot see how 1100 would be affected of any MTRR changes made in late romstage, but maybe I am mistaken about it.
I'm guessing caching might have some role to play here as we see those numbers (1100:finished vboot kernel verification) are consistently remain different in 3 approach (base line, postcar + those CB CL, ramstage + those CB CL).
Could be something about SSD and how you power-cycle/reboot that triggers a seemingly constant 10ms difference for 1100. I just don't know.
I am speculating some boots of #2 could have been 10ms better for 1100, giving us: 'Total Time till picking kernel: ~803,000'.
I would also expect the same but i do see consistent number in long cycling test as well with #2 and #3 approach.
- With POSTCAR_STAGE=n and (CB:34995 + CB:34791 + CB:34752) [romstage -> ramstage]
Total Time till picking kernel: 810,332 Total Time till picking payload: 642,655
Now if you compare the end to end time between #2 and #3, we will seeing savings ~3ms+ (as you have pointed rightly "Notice how your fastest entry to kernel had the slowest 1100:finished vboot kernel verification.") savings in #3 approach consistently. All the platform performance scripts we do run to meet product compliance are relying on "Total Time till picking kernel". So if we compare as below
If I am correct in my comment about 1100: vboot kernel verification step, #2 POSTCAR_STAGE=y beats #3 POSTCAR_STAGE=n by 7 ms in both "Total Time till picking kernel" and "Total Time till picking payload".
yes, as per current data and analysis been done so far, #2 POSTCAR_STAGE=y beats #3 POSTCAR_STAGE=n by 7 ms in "Total Time till picking payload" for sure
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#13).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~5-7ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 8 insertions(+), 5 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/13
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 13:
CB:34995, adding TSEG WB is the least controversial and something that would get easily merged. It is also roughly half (~7 ms) of #1 vs #2 improvement (reaching payload) and needs to appear alone as the first commit of your patch train.
Done.
Kyösti Mälkki has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 13:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... PS13, Line 159: top_of_ram -= top_of_ram_size; If cbmem_top() is 16 MiB aligned, it is very well hidden from the reader.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 13:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... PS13, Line 159: top_of_ram -= top_of_ram_size;
If cbmem_top() is 16 MiB aligned, it is very well hidden from the reader.
didn't get this part. top_of_ram_size been added to avoid multiple time multiplication effort (* MiB)
Hello Aaron Durbin, Patrick Rudolph, Aamir Bohra, Paul Menzel, Duncan Laurie, V Sowmya, build bot (Jenkins), Furquan Shaikh,
I'd like you to reexamine a change. Please visit
https://review.coreboot.org/c/coreboot/+/34791
to look at the new patch set (#14).
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
soc/intel/cannonlake: Speed up postcar loading using intermediate caching
This patch ensures intermediate caching is enabled to speed up loading and decompression of next stage as we are still in romstage and CAR tear down will be handled by next stage at its entry.
TEST=cbmem -t shows ~2-4ms time savings in warm reboot case with this CL on CML-Hatch.
Change-Id: I3ba63887acb5c4bdeaf3e21c24fb0e631362962c Signed-off-by: Subrata Banik subrata.banik@intel.com --- M src/soc/intel/cannonlake/romstage/romstage.c 1 file changed, 2 insertions(+), 2 deletions(-)
git pull ssh://review.coreboot.org:29418/coreboot refs/changes/91/34791/14
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34791 )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Patch Set 14:
(1 comment)
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... File src/soc/intel/cannonlake/romstage/romstage.c:
https://review.coreboot.org/c/coreboot/+/34791/13/src/soc/intel/cannonlake/r... PS13, Line 159: top_of_ram -= top_of_ram_size;
didn't get this part. […]
CB:35029 will hopefully address this concern
Stefan Reinauer has abandoned this change. ( https://review.coreboot.org/c/coreboot/+/34791?usp=email )
Change subject: soc/intel/cannonlake: Speed up postcar loading using intermediate caching ......................................................................
Abandoned