Hi Sven, this is great, I approved it.
There is something related I am about to look at that I wanted to mention here. On SMP machines, I want hardware main to send the SIPI right after the console is set up, rather than waiting as long as we do now. The reason is that I need the other cores to get through their startup and be ready to run a function in parallel DURING device enumeration. The function in this case is loading the payload while we enumerate the devices. These two jobs are quite independent, and there is no reason to serialise them. This is a big change from current practice, but we need it for startup performance. We have those cores, and it is good to use them for something other than starting themselves up.
I think the structure of your code lends itself to this use, hence my enthusiasm for it.
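To make the ordering concrete, here is a rough sketch of what I have in mind. It is an ordinary user-space C program, not firmware code: pthread_create stands in for sending the SIPI early, and the names ap_load_payload, bsp_enumerate_devices and boot_flow are made up for illustration, as is the device count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; none of these names come from the real tree. */
static atomic_int payload_ready; /* set by the secondary core when done */
static int devices_found;        /* written only by the boot core */

/* Work handed to the secondary core: load the payload. */
static void *ap_load_payload(void *arg)
{
    (void)arg;
    /* ... reading and decompressing the payload would happen here ... */
    atomic_store(&payload_ready, 1);
    return NULL;
}

/* Work kept on the boot core: walk the devices (4 is an arbitrary count). */
static void bsp_enumerate_devices(void)
{
    for (int i = 0; i < 4; i++)
        devices_found++;
}

/* Returns 0 on success, -1 if the secondary core could not be started. */
int boot_flow(void)
{
    pthread_t ap;

    devices_found = 0;
    atomic_store(&payload_ready, 0);

    /* The console would already be set up at this point. */

    /* Stand-in for the early SIPI: start the secondary core now. */
    if (pthread_create(&ap, NULL, ap_load_payload, NULL) != 0)
        return -1;

    /* Device enumeration runs while the payload is being loaded. */
    bsp_enumerate_devices();

    /* Rendezvous: both jobs must finish before jumping to the payload. */
    pthread_join(ap, NULL);
    assert(atomic_load(&payload_ready) == 1);
    return 0;
}
```

The only synchronisation point is the join just before the payload would be entered; until then the two jobs share no data.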
So, if this causes you to have thoughts, I'd like to hear them :)
ron