Hello coreboot community,
One of coreboot's goals (according to the home page) is to be lightning fast! I'm currently working on improving the boot time for the AMD Cezanne platform. One place where we spend a decent amount of time is reading from SPI flash. We have the SPI speed/modes set to the optimal settings for our platforms, but there is still room for improvement.
One feature the AMD platforms provide is a SPI DMA controller. It can perform reads of up to 64 KiB from SPI flash into RAM without involving the CPU. This opens the door to parallelizing SPI reads with coreboot execution, effectively reducing SPI read time to zero as far as the BSP is concerned (assuming the data is preloaded early enough).
Another thing that sets the latest AMD SoCs (Picasso, Cezanne) apart is that RAM is already available in bootblock. This means we can preload compressed stages into RAM before they are needed, at the cost of a larger RAM footprint.
This raises a few questions: how do we model these asynchronous operations, how is data ownership handled, and how does the BSP know when an operation has completed?
I have proposed a Futures API [here](https://review.coreboot.org/c/coreboot/+/56047) that I believe addresses all of these questions: it is simple to understand, hard for consumers to misuse, and pretty straightforward to implement.
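To give a rough feel for the idea, here is a minimal sketch of the shape I have in mind. The names and details below are illustrative placeholders, not the exact identifiers from the patch: a future wraps a poll callback that reports whether the underlying hardware operation (e.g., a SPI DMA transfer) has finished, and the consumer only blocks when it actually needs the data.

```c
#include <stdbool.h>

/* Illustrative sketch only -- see the patch for the real API. */
enum future_state {
	FUTURE_PENDING,
	FUTURE_DONE,
	FUTURE_ERROR,
};

struct future {
	/* Called repeatedly until it stops returning FUTURE_PENDING. */
	enum future_state (*poll)(struct future *f);
	void *context; /* driver-private state, e.g. DMA channel info */
};

/* Busy-wait until the operation resolves; returns false on error. */
static bool future_wait(struct future *f)
{
	enum future_state state;

	do {
		state = f->poll(f);
	} while (state == FUTURE_PENDING);

	return state == FUTURE_DONE;
}

/*
 * A driver hands out a future from its async entry point, e.g. something
 * along the lines of:
 *
 *   struct future *spi_dma_readat_async(void *buf, size_t offset, size_t size);
 *
 * and the BSP calls future_wait() only at the point the data is consumed.
 */
```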
The patch train also adds a `readat_async` operation to `rdev`, a `cbfs_load_async` helper, APOB (i.e., MRC) preloading, and compressed payload preloading. So far this has saved over 30 ms. I still have the goal of preloading the VBIOS, uCode, and ramstage, and I want to do all of this while adding minimal complexity to the synchronous path.
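For the CBFS side, the intended usage pattern looks roughly like the following. The `cbfs_load_async` signature, the buffer handling, and the file name shown here are simplifications for illustration; the real interface lives in the patch train:

```c
#include <stdbool.h>
#include <stddef.h>

struct future;                       /* from the futures API sketch above */
bool future_wait(struct future *f);  /* blocks until the operation resolves */

/* Placeholder signature -- the real one is in the patch train. */
struct future *cbfs_load_async(const char *name, void *buf, size_t buf_size);

static char ramstage_buf[256 * 1024];

void stage_main(void)
{
	/* Start streaming the compressed stage into RAM right away. */
	struct future *f = cbfs_load_async("fallback/ramstage", ramstage_buf,
					   sizeof(ramstage_buf));

	/* ... DRAM training and other init run while the DMA engine works ... */

	/* Block only at the last moment, right before decompression/entry. */
	future_wait(f);
}
```

The synchronous path stays unchanged: callers that don't care about overlap keep using the existing blocking CBFS/rdev functions.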
I'm curious to see what the community thinks, and welcome any feedback.
Thanks, Raul