Hey Paul, I'll respond based on my experience trying to get the AMD platforms to the kernel in under a second. This has become very difficult for us, which is why you see us pushing commits to shave off every bit of time that we can. If we can reliably shave 5ms off of the boot time with a patch, that's worth doing, because those changes add up.
On AMD, it takes between 50 and 100ms from the time you press the power button just to start running the code from the SPI rom. The on-chip bootloader, baked into the silicon, needs that to start running, find the EFS section in the SPI ROM, load the off-chip PSP bootloader and jump to it. That's up to 10% of the entire time we're allowed for booting, and we haven't even started running from SPI yet.
AMD needs to run the verstage on the PSP's ARM core so that it can be done before memory initialization. Doing that takes longer than running the same verification on a higher-speed x86 core. This is why we're looking forward to CBFS verification, but we're just starting to be able to use that now.
Next, the firmware that comes from AMD is not optimized for speed nearly as much as we need, and I'd assume that it's similar for Intel. The ChromeOS team does as much as we can to overcome that, but anything we change from what we're given is something that we're now responsible for, and have to maintain. This, as you might guess, is somewhat less than optimal, especially considering that we're not the experts for these pieces of firmware. This is not faulting those teams - it's just not made the same priority for other platforms.
In the FSP, there's lots of overhead because of everything needed for UEFI, and because commands sent to peripherals are typically written as "send a command to do something, then pause until it's done." For AMD Chromebooks, this is a significant problem every time we need to communicate with the SMU, because every command happens synchronously. We can't just send a command, go on to something else, and get back to the SMU later. That just isn't how the interface is written. We'd *love* to have something where we could send command1, then command2, come back later, get a response from command 2, and later still get a response from command 1, but that just isn't how it works currently.
Then there's loading and decompressing from the SPI ROM. This is a huge trade-off between space, SPI bus speed, and compression/decompression ratios and time. As the ROMs have gotten larger, associated firmware has gotten larger, so takes longer to load. We run the SPI buses as fast as we can for a given design, but chromebooks are limited to dual-SPI because of the need for the hardware write-protect pin used for security, which is muxed with one of the data pins needed for quad-SPI. This effectively cuts the bandwidth in half from what other systems that don't need the WP pin on the SPI ROM can use.
That's where that "finished loading body" 200ms comes from - that's loading the entire coreboot image from SPI into memory so that it can be verified. This is another place where CBFS verification will make a huge difference. Currently, we read it into memory, verify the entire section, then discard it and read it from SPI again when CBFS loads it. With CBFS verification, we'll only load it a single time for verification AND use.
We want to upgrade our compression algorithms at coreboot to something newer that compresses better, and decompresses faster, but that's still work in progress at this point.
Another piece which we don't control is that storage device initialization at the end of the boot sequence. Different storage mediums & brands take different amounts of time to initialize, and that's just something we tend to have to live with. That's why we wanted to start that initialization earlier as a part of coreboot, so we didn't just need to wait until the payload to start the setup.
Here's how things broke down on a recent measurement I did:
PSP & ABL - 325ms Verstage - 130ms Bootblock - 22ms Romstage - 114ms FSP-M - 197ms Ramstage - 122ms FSP-S - 146ms Depthcharge - 102ms Total - 1,161ms
Of that time spent in coreboot bootblock, romstage, and ramstage (discounting the FSP calls) of 259ms, almost 170ms is *just* loading other firmware from SPI and decompressing it.
I know this doesn't address your question about the Intel CSE, but hopefully this helps some in understanding where the time is spent. Martin
Sep 28, 2023, 11:35 by pmenzel@molgen.mpg.de:
Dear Google Chromebook folks,
I got a Dell Latitude 5340 Chromebook with
coreboot-v1.9308_26_0.0.22-26491-g38c6637296
With hot MRC cache, ChromeOS login screen is reached after around six seconds, which is great. According to the firmware time stamps in `chrome://system`, coreboot takes 1.2 until starting depthcharge:
82 entries total:
990:CSME ROM started execution 0 944:CSE sent 'Boot Stall Done' to PMC 119,000 945:CSE started to handle ICC configuration 119,000 (0) 946:CSE sent 'Host BIOS Prep Done' to PMC 123,000 (4,000) 947:CSE received 'CPU Reset Done Ack sent' from PMC 307,000 (184,000) 0:1st timestamp 342,717 (35,717)
So the Converged Security Engine alone takes 342 ms – as long as coreboot takes to execute on older systems.
11:start of bootblock 352,538 (9,821) 12:end of bootblock 358,690 (6,151) 501:starting to load verstage 358,693 (3) 502:finished loading verstage 359,785 (1,092) 5:start of verified boot 361,037 (1,252) 503:starting to initialize TPM 361,739 (701) 504:finished TPM initialization 450,095 (88,356)
88 ms is high, but I think common for TPM initialization.
505:starting to verify keyblock/preamble (RSA) 471,441 (21,345) 506:finished verifying keyblock/preamble (RSA) 476,234 (4,793) 507:starting to verify body (load+SHA2+RSA) 476,256 (22) 508:finished loading body 679,079 (202,822)
Another 200 ms is spent here.
509:finished calculating body hash (SHA2) 686,553 (7,473) 510:finished verifying body signature (RSA) 688,054 (1,501) 511:starting TPM PCR extend 690,511 (2,456) 512:finished TPM PCR extend 719,107 (28,596) 513:starting locking TPM 719,107 (0) 514:finished locking TPM 733,290 (14,183) 6:end of verified boot 747,454 (14,163) 13:starting to load romstage 747,563 (108) 14:finished loading romstage 747,575 (12) 1:start of romstage 747,587 (12) 515:starting EC software sync 747,596 (8) 516:EC vboot hash ready 749,955 (2,359) 517:waiting for EC to allow higher power draw 753,883 (3,928) 518:finished EC software sync 755,385 (1,501) 948:starting CSE firmware sync 756,428 (1,043) 949:finished CSE firmware sync 759,141 (2,712) 970:loading FSP-M 760,597 (1,456) 2:before RAM initialization 760,603 (5) 950:calling FspMemoryInit 785,801 (25,198) 951:returning from FspMemoryInit 825,967 (40,165) 550:starting to load ChromeOS VPD 826,013 (45) 551:finished loading ChromeOS VPD (RO) 826,099 (86) 552:finished loading ChromeOS VPD (RW) 826,132 (32) 3:after RAM initialization 826,196 (64) 4:end of romstage 827,576 (1,379) 100:start of postcar 836,651 (9,074) 101:end of postcar 836,651 (0) 8:starting to load ramstage 836,655 (3) 15:starting LZMA decompress (ignore for x86) 836,701 (46) 16:finished LZMA decompress (ignore for x86) 866,740 (30,039) 9:finished loading ramstage 866,880 (139) 10:start of ramstage 866,929 (48) 17:starting LZ4 decompress (ignore for x86) 867,936 (1,006) 18:finished LZ4 decompress (ignore for x86) 907,401 (39,465) 30:device enumeration 913,217 (5,816) 971:loading FSP-S 923,807 (10,589) 954:calling FspSiliconInit 934,237 (10,430) 955:returning from FspSiliconInit 968,817 (34,579) 962:calling FspMultiPhaseSiInit 968,822 (5) 963:returning from FspMultiPhaseSiInit 1,146,574 (177,751)
177 ms are spent here in the FSP blob?
942:before sending EOP to ME 1,146,823 (248) 40:device configuration 1,149,840 (3,016) 50:device enable 1,162,440 (12,600) 60:device initialization 1,167,018 (4,578) 114:started elog init 1,168,482 (1,464) 115:finished elog init 1,170,862 (2,379) 15:starting LZMA decompress (ignore for x86) 1,174,928 (4,065) 16:finished LZMA decompress (ignore for x86) 1,175,208 (280) 70:device setup done 1,186,831 (11,622) 75:cbmem post 1,187,518 (686) 80:write tables 1,187,593 (75) 85:finalize chips 1,195,483 (7,889) 553:started TPM enable update 1,204,169 (8,685) 554:finished TPM enable update 1,223,291 (19,122) 90:starting to load payload 1,223,317 (26) 15:starting LZMA decompress (ignore for x86) 1,223,459 (141) 16:finished LZMA decompress (ignore for x86) 1,236,494 (13,035) 943:after sending EOP to ME 1,240,485 (3,990) 99:selfboot jump 1,240,607 (122) 1000:depthcharge start 1,248,752 (8,144) 1002:RO vboot init 1,249,204 (452) 1020:vboot select&load kernel 1,249,228 (23) 1040:finished storage device initialization 1,282,580 (33,352) 1050:finished reading kernel from disk 1,288,042 (5,461) 1100:finished vboot kernel verification 1,295,930 (7,887) 1101:jumping to kernel 1,314,111 (18,180)
Total Time: 1,314,072
I thought ChromeOS requirement is to reach `selfboot jump` below (still quite high) one second. Or is the actually met by not counting from power-on but from 1st timestamp?
Does somebody know, why the CSE takes so long? Is Intel aware of that and going to improve that? I think there was something similar in the past, where Duncan Laurie(?), experienced, that coreboot was faster than the ME initialization, and had to wait for the Intel ME to send the ready signal. I think Intel improved on that later on?
Kind regards,
Paul