Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34476 )
Change subject: Rampayload: Attempt to boot coreboot without ramstage
Patch Set 5:
<- snip ->
I think that analysis should be documented to live on as well as serve as a basis for the approach as a whole. I'm still not convinced that the approach we're taking is the right way.
Sure, I will add a link to a living document to capture this.
I just meant it should be in some documentation in the tree as well as the commit description.
<- snip ->
If not, what other low hanging fruit is there?
As I said, the low-hanging fruit would be limiting the time we spend on PCI enumeration and resource setup. I would also advocate limiting the functionality in ramstage itself (without introducing the RAMPAYLOAD concept), if I could skip enumerating the entire PCI tree and instead handle it in a fixed manner: the user provides, via some Kconfig options, the list of PCI devices the BIOS needs in order to boot to the payload/kernel. Ultimately, the BIOS only needs to apply some chipset workarounds (WA)/recommended programming and set up the BARs needed to communicate with the devices it boots from.
Today, full PCI enumeration takes ~160ms+, which we could cut down while still being able to boot to the kernel.
  30:device enumeration        466,492 (50,800)
  40:device configuration      589,305 (21,695)
  50:device enable             638,470 (258)
  60:device initialization     665,478 (27,007)
  70:device setup done         755,515 (90,037)
  75:cbmem post                756,226 (711)
  80:write tables              756,437 (210)
Loading any additional stage also takes time; just loading fallback/ramstage takes ~30ms:
  8:starting to load ramstage                    383,195 (138)
  15:starting LZMA decompress (ignore for x86)   383,210 (14)
  16:finished LZMA decompress (ignore for x86)   408,031 (24,820)
  9:finished loading ramstage                    415,242 (7,211)
  550:starting to load Chrome OS VPD             415,321 (78)
  10:start of ramstage                           415,692 (370)
<- snip ->
It seems you've identified a feature you want to add that reduces PCI enumeration and init costs (or speeds up the current approach). That's a very different direction from trying to recreate ramstage piecemeal. Given what you are indicating, that seems like the better approach.
I believe you are referring to a reduced-feature ramstage (without introducing the RAMPAYLOAD concept)?
FWIW, putting CAR teardown on the front of ramstage would get rid of the extra stage load. That was my point.
Yes, we would certainly save the postcar loading time. Since postcar is small by default, the saving won't be as large as for ramstage, but I totally agree here.
At what cost? Putting in lots of device-specific code to program BARs as needed? Similarly, can't this be achieved by tweaking the existing boot flow? In the end it has to be an option, because one cannot rely on the eventual kernel/payload performing the actions you are trying to defer here.
I guess the concern is how we limit our FW footprint and make the FW do less than it does today. You are right that skipping full PCI enumeration in the BIOS means it will happen inside the kernel (if the kernel has that provision), and it might add up to the same time if we care about end-to-end boot time. But the counter-argument is: why bloat the BIOS with a feature the kernel already has? Users don't want to see more time spent in FW; they want to see the OS UI as soon as they power on the device. If we could save ~200ms+ with this approach, we would be closer to a ~600ms boot time (on par with the ABL standard, even with coreboot and without any HW BOM change).
This is a new feature as well: providing BARs to statically allocate and slam in for initialization in the boot flow. I definitely don't think we should be solving such a thing manually, though, i.e. by writing specific code for initializing BARs. It should be done at build time, and the instructions for which BAR/register to initialize should simply be executed. That is a more scalable approach if one wanted to go down this path.
Yes, Aaron, I'm looking for something like that, and it would be great if you could help along these lines.
Agreed that there might be some price to pay for that, and it could have been done the existing ramstage way as well (Furquan had requested the same, and I have provided him the time estimate). Based on the above data, dropping fallback/ramstage might save ~40ms of boot time and ~150kB * 3 copies = ~450kB of SPI footprint compared to doing everything on top of ramstage (without RAMPAYLOAD).
Again, this is another goal: reduce the size of ramstage. Great. Where's the analysis with the breakdown of code size? And why aren't we targeting ways to reduce that directly in ramstage?
We are hopefully planning to present that data at OSFC :). Originally I did this POC to gather those numbers on boot time savings, SPI flash savings, etc.; without running a POC, such numbers would not be very meaningful. Also, if you look at the RAMPAYLOAD patch series, I haven't introduced any new pieces exclusively for RAMPAYLOAD; I was just reorganizing the existing code to reduce the POC code size.