On Wed, Jan 23, 2019 at 4:00 PM Julius Werner jwerner@chromium.org wrote:
For 1, this is attempting to protect against physical attack. Obviously this
particular problem can't be solved in isolation, but it's something to think about.
But isn't this something that per-file hashing would probably make easier to protect against, not harder? I mean, right now we just hash the whole region once and then assume it stays good -- there is no protection. I doubt this would really become easier to solve if you split the CBFS up into chunks... you'd have to somehow build a system where a whole chunk is loaded into memory at once, verified there, and then every file we may want to access from it is accessed in that same stage from that cached version in memory.
The file is the natural unit which is loaded at a time, so I'd think scoping the verification to that would make it easiest to verify on load. I mean, on Arm we already always load whole files at a time anyway, so it's really just a matter of inserting the verification step between load and decompression on the buffer we already have. (On x86 it may be a bit different, but it should still be easier to find space to load a file at a time than to load a larger CBFS chunk at a time.)
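To make the ordering above concrete (load the whole file, verify the buffer, then decompress that same buffer), here is a minimal Python sketch. The function name `load_verified_file`, the choice of SHA-256, and the use of zlib are assumptions for illustration only; none of them are CBFS or vboot interfaces.

```python
import hashlib
import zlib


def load_verified_file(raw: bytes, expected_sha256: bytes) -> bytes:
    """Verify the still-compressed buffer, then decompress that same buffer."""
    # Assumption: `raw` is the whole file, already loaded from the boot media.
    if hashlib.sha256(raw).digest() != expected_sha256:
        raise ValueError("file hash mismatch")
    # Decompress the verified in-memory copy; the boot media is not re-read.
    return zlib.decompress(raw)


payload = b"stage contents" * 4
stored = zlib.compress(payload)            # what would sit on the boot media
trusted = hashlib.sha256(stored).digest()  # hash taken from verified metadata
assert load_verified_file(stored, trusted) == payload
```

The point of the sketch is only that the check and the use operate on one buffer, so there is no second read between them.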
I don't believe the per-file approach necessarily makes things easier to protect. In fact the re-walk with validation might make it easier to exploit (depending on complexity of implementation). For time-of-check-time-of-use scenarios the easier thing is to load the data once and then use the in-core copy, i.e. not going back to the boot media. Platform specifics with resource constraints would inherently leave these attacks open. Your suggestion of loading a file, verifying it, and then using it is valid, but my concern is all the re-walking of cbfs (comment below).
When discussing 2 from a practical standpoint, we need to pass the
metadata information across stages to help mitigate 1 and ensure the integrity of the hashes.
This is true -- but isn't this the same for all solutions? No matter how you scope verification, if you want to verify things at load time then every stage needs to be able to run verification, and it needs some kind of trust anchor passed in from a previous stage for that. I also don't think this should be a huge hurdle... we're already passing vboot workbuffer metadata, this just means passing something more in there. (For per-file hashing, I'd assume you'd just pass the single hash covering all the metadata, and then all the metadata is verified again on every CBFS walk.)
What does that practically look like? Every time we have to re-walk we have to reverify the integrity of the metadata. Designing on the fly, to me that suggests we need to carry a copy of the metadata, including offset & size, after verifying it, and not read it again from the boot media. That way it stays in-core. Otherwise, one has to walk all of the cbfs only to rewind and find the file again. There are variants of how big a span is covered by the metadata hash (i.e. how many entries), but one shouldn't rely upon the existing entries on the boot media every time we walk. Lookups should rely upon the previously verified and cached metadata. Assets and access patterns are very much a part of the puzzle. Some platforms have very few assets while others have quite a few.
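A rough Python sketch of the "verify once, then keep the metadata in-core" idea follows. The entry layout (`ENTRY`), the `VerifiedMetadataCache` class, and the use of SHA-256 are all hypothetical; the real CBFS metadata format is different, and this only illustrates the access pattern.

```python
import hashlib
import struct

# Hypothetical entry layout: 16-byte name, 32-bit offset, 32-bit size.
ENTRY = struct.Struct("<16sII")


def build_metadata(entries):
    """Pack (name, offset, size) tuples into one metadata blob."""
    return b"".join(ENTRY.pack(n.encode(), off, size) for n, off, size in entries)


class VerifiedMetadataCache:
    """Verify the metadata once, then serve lookups from the in-core copy."""

    def __init__(self, metadata: bytes, trusted_hash: bytes):
        if hashlib.sha256(metadata).digest() != trusted_hash:
            raise ValueError("metadata hash mismatch")
        self._files = {}
        for name, offset, size in ENTRY.iter_unpack(metadata):
            self._files[name.rstrip(b"\0").decode()] = (offset, size)

    def locate(self, name: str):
        # No boot-media access here: only the verified cache is consulted.
        return self._files[name]


blob = build_metadata([("romstage", 0x100, 0x2000), ("ramstage", 0x3000, 0x8000)])
cache = VerifiedMetadataCache(blob, hashlib.sha256(blob).digest())
assert cache.locate("ramstage") == (0x3000, 0x8000)
```

The cost is the memory for the cached entries, which is exactly where the number of assets per platform matters.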
Similarly, limiting complexity is also important. If we can group the
assets together that are tied to the boot flow then it's conceptually easier to limit access to regions that haven't been checked yet or shouldn't be accessed. I do think per-file metadata hashing brings a lot of complications in implementation. In resource-limited environments, chunking cbfs into multiple regions lends itself well to accomplishing that while also restricting access to data/regions that aren't needed yet, which helps when thinking about limiting #1.
Fair enough. I agree limiting complexity is always important, I'm just not ad-hoc convinced that a solution like you describe would really be less complex than per-file hashing. I think that while it makes the CBFS access code itself a bit more complex, it would save complexity in other areas (e.g. if you have arbitrary chunks then something must decide which chunk to use at what time and which file to pack into which chunk, all of which is extra code). Assuming we can "get it right once", I think per-file hashing should be a solution that will "just work" for whatever future platform ports and general features want to put in CBFS (whereas a solution where everyone who wants to put another file into CBFS must understand the verification solution well enough to make an informed decision on which chunk to place something into may end up pushing more complexity onto more people).
The decision on where files go is a static one at build time. When the cbfs chunks follow boot flow then the decision to switch to a new one follows those same boundaries. I agree, though, that one needs to understand their boot flow to make informed decisions on the asset location.
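As a sketch of what a static, build-time assignment of files to chunks could look like, here is a small Python example. The chunk names, file names, and the `may_access` helper are purely illustrative assumptions; nothing like this table exists in the tree today.

```python
# Hypothetical build-time data: chunks listed in boot-flow order.
BOOT_ORDER = ["early", "ram_init", "late"]

# Hypothetical build-time table: which files were packed into which chunk.
CHUNK_FILES = {
    "early": {"verstage", "romstage"},
    "ram_init": {"memory_training_data", "ramstage"},
    "late": {"payload", "splash_image"},
}


def chunk_of(name: str) -> str:
    """Return the chunk a file was assigned to at build time."""
    for chunk, files in CHUNK_FILES.items():
        if name in files:
            return chunk
    raise KeyError(name)


def may_access(current_chunk: str, name: str) -> bool:
    """Allow a file only if its chunk has already been reached in the boot flow."""
    return BOOT_ORDER.index(chunk_of(name)) <= BOOT_ORDER.index(current_chunk)


assert may_access("early", "romstage")
assert not may_access("early", "payload")
```

Switching to a new chunk then coincides with a boot-flow boundary, and anything in a later chunk is simply off-limits until that boundary is crossed.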
Anyway, I didn't want to derail this thread into discussing CBFS verification, I just wanted to mention that I still think the per-file hashing is a good idea and worth discussing. We should have a larger discussion about the pros and cons of possible approaches before we decide what we're planning to do (and then someone still needs to find time to do it, of course ;) ).
Agreed.