Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/34476 )
Change subject: Rampayload: Attempt to boot coreboot without ramstage
......................................................................
Patch Set 5:
What are the limitations of romstage/ramstage pair that this is trying to solve?
We can remove romstage from the scope of the coreboot-lite/rampayload discussion, as romstage is going to stay for sure.
So are we packing in memory training and other things into this new stage. i.e. combine romstage and ramstage?
We might not want to disturb romstage if postcar exists; then the flow is basically romstage -> postcar -> kernel.
From a high level, that's how I read what you are attempting, but there's lots of weird details there so I don't think that's happening. I still think we're doing romstage -> newstage -> kernel
You are right that postcar might be extended to pull in the functions required from ramstage to boot to the OS, such as MP init and ASL generation.
What we are trying to achieve is to avoid loading one dedicated stage (i.e. ramstage here) by pulling the required functionality into the previous stage to boot the platform.
So far we have seen ~240ms of time saving with this approach, with some known WA.
Can you quantify where the savings are coming from? We should have an idea what the source of the savings is, because we could focus on improving that aspect in ramstage.
Yes, I have that breakdown. I will share the timing details with you over email, as I can't attach the .log files (with RAMPAYLOAD enabled) here.
I think that analysis should be documented to live on as well as serve as a basis for the approach as a whole. I'm still not convinced that the approach we're taking is the right way.
Sure, I will add a link to a live document to capture this.
Migrating to rampayload might be a long pole for Chrome platforms, but we might think about use cases where coreboot competes with ABL, where the bootloader has to be thinner and meet certain boot time restrictions.
As I noted before, you'll be pulling in almost all of the functionality from ramstage in order to get a system properly configured.
I'm also afraid of the same situation, but so far, looking at my POC patch for reference: from the POC CL to here, I have ported most of the required functionality, like MP init and AML generation, into this CL.
The next steps would be to think about possible solutions for dynamic BAR assignment for a limited set of PCI resources (input, output and boot device) and to see how we create runtime AML for peripheral devices (touch, TPM etc). In this process we might need to compile the FSP call into the previous stage (postcar here).
I think it'd be instructive to do a side by side comparison of our "traditional" boot flow along w/ the actions/details each stage is performing against your target boot flow. There are high level semantics that need to be employed, e.g. CAR tear down for a clean program environment, etc. Those building blocks would ideally be able to be moved around to form different boot flows. However, I think we should understand the current limitations and see if we can improve those before going down a more complicated solution. e.g. do you save time by tearing down CAR in ramstage itself? i.e. the prologue of tearing down CAR is in the first part of ramstage.
If I'm not wrong, we are tearing down the CAR in postcar for IA platforms, and I don't see much savings if we don't tear down the CAR in postcar.
That's not what I'm saying. My mental model is that you are slowly adding in features/properties of ramstage into postcar as you realize you need them. As such this comes back to where the savings in boot time is coming from. If we just bolted on CAR teardown in ramstage entry does that reap the gains that you are seeing?
no, CAR teardown in ramstage won't help much
If not, what other low hanging fruit is there?
As I said, the low hanging fruit would be limiting the time we spend on PCI enumeration and setting up the resources. I would advocate limiting the functionality in ramstage as well (without introducing the RAMPAYLOAD concept), if I could skip the entire PCI tree enumeration and do it in a fixed manner, i.e. the user provides a list of PCI devices (the ones the BIOS needs to boot to payload/kernel) based on some Kconfig options. Ultimately the BIOS only needs to bother about applying chipset WA/recommended programming and setting up BARs to communicate with the devices to boot from.
Today the entire PCI enumeration takes ~160ms+, which we can limit and still be able to boot to kernel.
  30:device enumeration        466,492 (50,800)
  40:device configuration      589,305 (21,695)
  50:device enable             638,470 (258)
  60:device initialization     665,478 (27,007)
  70:device setup done         755,515 (90,037)
  75:cbmem post                756,226 (711)
  80:write tables              756,437 (210)
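To make the breakdown above easier to reason about, here is a minimal Python sketch that ranks the listed entries by their delta. It assumes the value in parentheses is the time in microseconds since the preceding entry in the full log, as in cbmem timestamp output. The listing appears to omit some intermediate entries, so the deltas do not add up to the difference between the absolute values. The names `PHASES` and `rank_by_delta` are made up for this illustration.

```python
# (id, name, delta_us) copied from the timestamp listing above.
# Assumption: delta_us is the time since the previous entry in the full log.
PHASES = [
    (30, "device enumeration", 50_800),
    (40, "device configuration", 21_695),
    (50, "device enable", 258),
    (60, "device initialization", 27_007),
    (70, "device setup done", 90_037),
    (75, "cbmem post", 711),
    (80, "write tables", 210),
]


def rank_by_delta(phases):
    """Return the entries sorted by delta, largest first."""
    return sorted(phases, key=lambda entry: entry[2], reverse=True)


def total_delta_us(phases):
    """Sum of the listed deltas (only the entries shown, not the full log)."""
    return sum(delta for _, _, delta in phases)


for ts_id, name, delta in rank_by_delta(PHASES):
    print(f"{ts_id:>3}:{name:<24} {delta:>7,} us")
print(f"sum of listed deltas: {total_delta_us(PHASES):,} us")
```

Sorting this way only shows which listed entries carry the largest deltas; which of them count toward "PCI enumeration" is a judgment call the listing itself does not settle.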
Also, loading any additional stage takes some time. Just loading fallback/ramstage takes ~30ms.
    8:starting to load ramstage                    383,195 (138)
   15:starting LZMA decompress (ignore for x86)    383,210 (14)
   16:finished LZMA decompress (ignore for x86)    408,031 (24,820)
    9:finished loading ramstage                    415,242 (7,211)
  550:starting to load Chrome OS VPD               415,321 (78)
   10:start of ramstage                            415,692 (370)
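As a cross-check of the ramstage load time, the following Python sketch parses the entries above and computes the elapsed time between two timestamp IDs from their absolute values. It assumes each entry has the form `id:name absolute_us (delta_us)`; `parse` and `span_us` are hypothetical helper names, not part of any coreboot tool.

```python
import re

# Entries copied from the listing above, one per line.
LISTING = """\
8:starting to load ramstage 383,195 (138)
15:starting LZMA decompress (ignore for x86) 383,210 (14)
16:finished LZMA decompress (ignore for x86) 408,031 (24,820)
9:finished loading ramstage 415,242 (7,211)
550:starting to load Chrome OS VPD 415,321 (78)
10:start of ramstage 415,692 (370)
"""

# id, name, absolute time, then the delta in parentheses at end of line.
ENTRY = re.compile(r"^(\d+):(.+?)\s+([\d,]+)\s+\(([\d,]+)\)$")


def parse(listing):
    """Return {timestamp_id: absolute_time_us} for every entry found."""
    stamps = {}
    for line in listing.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            stamps[int(match.group(1))] = int(match.group(3).replace(",", ""))
    return stamps


def span_us(stamps, start_id, end_id):
    """Elapsed microseconds between two timestamp IDs."""
    return stamps[end_id] - stamps[start_id]


stamps = parse(LISTING)
print("load ramstage (8 -> 9):   ", span_us(stamps, 8, 9), "us")
print("LZMA decompress (15 -> 16):", span_us(stamps, 15, 16), "us")
```

With these numbers the 8 -> 9 span comes to 32,047 us, in line with the ~30ms figure, and most of it falls between the two LZMA entries.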
At a high level, I think the two places where we could definitely improve are:
- Avoid entire PCI enumeration and setup
At what cost? Putting in lots of device specific code to program BARs as needed? Similarly, can't this be achieved by tweaking the existing boot flow? In the end it has to be an option because one cannot rely on the eventual kernel/payload performing the actions you are trying to defer here.
I guess the concern is how we limit our FW footprint and make FW do less than what it is doing today. You are right that skipping the entire PCI enumeration in BIOS means it will happen inside the kernel (if the kernel has the provision), and it might add up to the same time (if we care about end-to-end boot time). But the counter argument is: why bloat the BIOS with a feature the kernel already has? Users won't love to see more time spent in FW and would rather see the OS UI as soon as possible after powering on the device. If we could save ~200ms+ using this approach, we would be closer to ~600ms of boot time (on par with the ABL standard, even with CB, without making any HW BOM change).
Agreed that there might be some price to pay for that, and it could have been done the existing ramstage way as well (Furquan had requested the same and I have provided him the time estimation). Based on the above data, dropping fallback/ramstage might save ~40ms of boot time and ~150kB * 3 copies = ~450kB of SPI footprint, compared with doing everything on top of ramstage (without rampayload).
- Executing any stage has its own cost of locating it in cbfs, decompressing it and loading it into memory.
But this is a solved problem. All stages know how to load next program.
Sorry for the confusion, I meant the loading time, not the locating mechanism :) I have provided the time saving data above.
This approach actually tries to broadly avoid these 2 things in ramstage and achieve the same via postcar itself