For 1, this is attempting to protect against physical attacks. Obviously this particular problem can't be solved in isolation, but it's something to think about.
But isn't this something that per-file hashing would probably make easier to protect against, not harder? Right now we just hash the whole region once and then assume it stays good; there is no protection at all. I doubt this would really become easier to solve if you split CBFS into chunks... you'd have to somehow build a system where a whole chunk is loaded into memory at once and verified there, and then every file we may want to access from it is accessed in that same stage from that cached copy in memory.
The file is the natural unit which is loaded at a time, so I'd think scoping the verification to that would make it easiest to verify on load. I mean, on Arm we already always load whole files at a time anyway, so it's really just a matter of inserting the verification step between load and decompression on the buffer we already have. (On x86 it may be a bit different, but it should still be easier to find space to load a file at a time than to load a larger CBFS chunk at a time.)
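As a rough illustration of the load/verify/decompress ordering described above, here is a minimal C sketch. Everything in it is hypothetical: `struct file_meta`, `load_verify_decompress()` and the placeholder `decompress()` are made up for this example and are not coreboot APIs, and `toy_hash()` is FNV-1a, used only as a stand-in for a real cryptographic hash such as SHA-256 (it provides no security). The point is that the hash is computed over the same in-memory buffer that is later decompressed, so the bytes that were checked are the bytes that get used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cryptographic hash (e.g. SHA-256).
 * FNV-1a is NOT secure; it only keeps this sketch self-contained. */
static uint64_t toy_hash(const uint8_t *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Hypothetical per-file metadata: where the file lives and its expected hash.
 * expected_hash is assumed to come from metadata already verified against
 * a trust anchor. */
struct file_meta {
	const uint8_t *data; /* file contents on the (untrusted) boot medium */
	size_t size;
	uint64_t expected_hash;
};

/* Hypothetical decompressor placeholder: a plain copy. Returns 0 on failure. */
static size_t decompress(const uint8_t *in, size_t in_len,
			 uint8_t *out, size_t out_len)
{
	if (in_len > out_len)
		return 0;
	memcpy(out, in, in_len);
	return in_len;
}

/* Load the whole file into a buffer we control, verify that copy, and only
 * then decompress from the same buffer. */
static int load_verify_decompress(const struct file_meta *f,
				  uint8_t *load_buf, size_t load_len,
				  uint8_t *out, size_t out_len, size_t *out_size)
{
	assert(f != NULL && load_buf != NULL && out != NULL && out_size != NULL);
	if (f->size > load_len)
		return -1;
	memcpy(load_buf, f->data, f->size);          /* 1. load */
	if (toy_hash(load_buf, f->size) != f->expected_hash)
		return -2;                            /* 2. verify the in-memory copy */
	*out_size = decompress(load_buf, f->size, out, out_len); /* 3. decompress */
	return *out_size ? 0 : -3;
}
```

Because verification only touches a buffer the stage already has, the extra cost per file is one hash pass before decompression.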
When discussing 2 from a practical standpoint, we need to pass the metadata information across stages to help mitigate 1 and to ensure the integrity of the hashes.
This is true -- but isn't this the same for all solutions? No matter how you scope verification, if you want to verify things at load time then every stage needs to be able to run verification, and it needs some kind of trust anchor passed in from a previous stage for that. I also don't think this should be a huge hurdle... we're already passing vboot workbuffer metadata, this just means passing something more in there. (For per-file hashing, I'd assume you'd just pass the single hash covering all the metadata, and then all the metadata is verified again on every CBFS walk.)
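A minimal C sketch of that last idea, under the same caveats: `struct cbfs_entry`, `struct verification_anchor` and `cbfs_lookup()` are hypothetical names, not the real CBFS structures, and `toy_hash()` (FNV-1a) again only stands in for a cryptographic hash. The previous stage hands over one hash covering all metadata; every lookup re-hashes the metadata and compares it with that anchor before trusting any entry, and the per-file hash in the matching entry can then be checked when the file itself is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cryptographic hash; NOT secure. */
static uint64_t toy_hash(const uint8_t *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Hypothetical metadata entry: file name plus that file's own hash. */
struct cbfs_entry {
	char name[32];
	uint64_t file_hash;
};

/* Hypothetical handoff from the previous stage (e.g. stored next to the
 * vboot workbuffer): a single hash covering all CBFS metadata. */
struct verification_anchor {
	uint64_t metadata_hash;
};

/* Walk the metadata looking for `name`. The whole metadata region is
 * re-hashed on every walk and compared with the anchor before any entry
 * is trusted; returns NULL on mismatch or if the file is not found. */
static const struct cbfs_entry *cbfs_lookup(const struct verification_anchor *anchor,
					    const struct cbfs_entry *entries,
					    size_t count, const char *name)
{
	assert(anchor != NULL && name != NULL);
	if (toy_hash((const uint8_t *)entries, count * sizeof(*entries)) !=
	    anchor->metadata_hash)
		return NULL; /* metadata changed since the anchor was computed */
	for (size_t i = 0; i < count; i++)
		if (strncmp(entries[i].name, name, sizeof(entries[i].name)) == 0)
			return &entries[i];
	return NULL;
}
```

A real implementation would also have to make sure the walk reads from the same copy of the metadata that was hashed, for the reasons discussed under point 1.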
Similarly, limiting complexity is also important. If we can group together the assets that are tied to the boot flow, then it's conceptually easier to limit access to regions that haven't been checked yet or shouldn't be accessed. I do think per-file metadata hashing brings a lot of implementation complications. In resource-limited environments, chunking CBFS into multiple regions lends itself well to this grouping, and it also restricts access to data/regions that aren't needed yet, which helps when thinking about limiting #1.
Fair enough. I agree limiting complexity is always important; I'm just not convinced off-hand that a solution like the one you describe would really be less complex than per-file hashing. Per-file hashing makes the CBFS access code itself a bit more complex, but I think it saves complexity in other areas (e.g. if you have arbitrary chunks, then something must decide which chunk to use at what time and which file to pack into which chunk, all of which is extra code). Assuming we can "get it right once", I think per-file hashing should be a solution that will "just work" for whatever future platform ports and general features want to put in CBFS. With a chunked solution, everyone who wants to put another file into CBFS must understand the verification scheme well enough to make an informed decision about which chunk to place it in, which may end up pushing more complexity onto more people.
Anyway, I didn't want to derail this thread into discussing CBFS verification, I just wanted to mention that I still think the per-file hashing is a good idea and worth discussing. We should have a larger discussion about the pros and cons of possible approaches before we decide what we're planning to do (and then someone still needs to find time to do it, of course ;) ).