Julius Werner has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/38423 )
Change subject: cbfs: Add metadata cache ......................................................................
Patch Set 17:
(1 comment)
https://review.coreboot.org/c/coreboot/+/38423/10/src/lib/cbfs.c File src/lib/cbfs.c:
https://review.coreboot.org/c/coreboot/+/38423/10/src/lib/cbfs.c@28 PS10, Line 28: if (err == CB_CBFS_CACHE_FULL)
There are other ways such that RW_SECTION_A/B only has the next stage in its cbfs with another cbfs with the rest of the files.
RO -> RW_SECTION_A -> RW_SECTION_FULL
Yeah, but... do we want that? I feel like this would add a grotesque overcomplication to our otherwise neat and clean RO/RW CBFS concept to solve what I really think is a non-issue. I mean part of the whole reason I'm doing all of this is to maintain backwards-compatibility to old tooling, old payloads, etc. If we suddenly start changing CBFS section names and splitting things up into multiple pieces I might as well throw all of this away and design a new file system from scratch that's more suited for verification.
I find the argument about wasting time hard to swallow since you are explicitly allowing CACHE_FULL to be a 1st class citizen on chrome os which inherently requires a re-walk of the entire metadata of the cbfs to re-validate the metadata for anything that falls out of the cache.
But you're comparing a common case time waste to an extremely rare case here. I mean in how many RW updates in Chrome OS history did we actually add a new file that didn't exist in the old version? And even then it would most likely fit because there's most likely a bunch of slack in the cache space anyway.
I think it's important to have as little brittleness from the RO binding to RW, and we're now enforcing a tight coupling that is forcing some not so optimal decisions.
I'm confused... isn't the point of having the fallback exactly that there's no tight coupling (the RW will still always work even if it exceeds the bounds that were presumed in the RO)?
Also, I don't think there's anything "brittle" about the fallback path -- it is not unsafe or insecure or in any way worse than using the mcache other than a tiny speed hit, and I would continue to sleep perfectly sound if we actually did end up hitting that very unlikely set of circumstances to trigger it on a board. That's the reason I implemented this fallback because I *don't* want people to have to worry about what happens if their update overflows the cache. And the speed hit is really just the cost of a normal CBFS lookup (without CONFIG_TOCTOU_SAFETY, which we're not planning to enable, the metadata hash doesn't get checked again in that path), the very thing we have always been doing all the time before this patch anyway, and only for the few files that actually exceed the cache (which on Chrome OS we could for example order to be files like ecrw that aren't needed on every boot).
So basically, the way I see this is that I uploaded an optimization patch here that can make booting maybe ~20ms faster, and what we're discussing right now is "but what if the stars align and something happens that probably never actually happened in practice in the past 5 years and cause this to only make the boot 19ms faster on one board?". From that point of view, I'm kinda surprised how concerned you are about this case.
especially fixing the size of the metadata cache when adding to cbmem when we have memory to play with to handle the full cache size.
I mean, if this would help allay your concerns I'm happy to add code to the CBMEM hook that would rebuild the full cache in DRAM if the pre-RAM cache had overflowed (although I think that the cumulated extra boot time from loading all the mcache building code into romstage on every board would far, far exceed the time saved in these very rare cases).
Also, FWIW we can really enforce requirements as high as we want on the Chrome OS side. The amount of real cache used is printed in the log so if you want us to always have at least XX% free space in our RW mcache or room for at least Y files, I can write you a FAFT test to guarantee that.