So what we can see is that everything is serial and there is a great deal of waiting. For that specific SDHCI case you can see "Storage device initialization" happening in depthcharge. That is CMD1, which you need to keep sending to the controller. As you can see, it completes in 130ms. Unfortunately you can't just send CMD1 once and go about your business: you need to poll the readiness status and keep sending CMD1 again and again. Also, it is not always 130ms. It varies, and the worst case we have seen was over 300ms.
Do you actually have an eMMC part that requires repeating CMD1 within a certain bounded time interval? What happens if you violate that? Does it just not progress initialization or does it actually fail in some way?
I can't find any official documentation suggesting that this is really required. JESD84-B51 just says (6.4.3): "The busy bit in the CMD1 response can be used by a device to tell the host that it is still working on its power-up/reset procedure (e.g., downloading the register information from memory field) and is not ready yet for communication. In this case the host must repeat CMD1 until the busy bit is cleared." This suggests that the only point of the command is polling for readiness.
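To make my reading of that spec text concrete, here is a minimal sketch of CMD1 as a pure readiness poll. The device is simulated: `struct fake_emmc`, `fake_send_cmd1()` and `emmc_wait_ready()` are hypothetical names made up for illustration, not depthcharge or coreboot functions, and the OCR value is a placeholder. The only thing taken from the standard is that bit 31 of the OCR response stays low while the device is still busy powering up.

```c
#include <stdint.h>

/* OCR bit 31: low while the device is still busy with power-up. */
#define OCR_POWER_UP_DONE (1u << 31)

/* Hypothetical stand-in for a real eMMC part, for illustration only. */
struct fake_emmc {
	unsigned polls_until_ready;	/* CMD1s answered "busy" before ready */
	unsigned cmd1_count;		/* CMD1s received so far */
};

/* Simulated CMD1: returns an OCR with the power-up bit set once ready. */
static uint32_t fake_send_cmd1(struct fake_emmc *dev)
{
	uint32_t ocr = 0x00ff8080;	/* placeholder voltage window bits */

	dev->cmd1_count++;
	if (dev->cmd1_count > dev->polls_until_ready)
		ocr |= OCR_POWER_UP_DONE;
	return ocr;
}

/*
 * Repeat CMD1 until the device reports ready.
 * Returns the number of CMD1s issued, or -1 if max_polls was exhausted.
 */
static int emmc_wait_ready(struct fake_emmc *dev, unsigned max_polls)
{
	for (unsigned i = 1; i <= max_polls; i++) {
		if (fake_send_cmd1(dev) & OCR_POWER_UP_DONE)
			return (int)i;
		/* Real firmware would delay here, or go do other work. */
	}
	return -1;
}
```

If the command really is only a poll, the gap between iterations should not matter to the device, and the host would be free to do unrelated work between polls. Whether real parts behave that way is exactly what I am asking about.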
Another one is "kernel read", which is pure IO and takes 132ms. If you invest some 300ms in training the link to HS400 (which has to happen on every boot on every board), you can read it in just 10ms. Naturally you can't see HS400 in the picture because enabling it late in the boot flow would be counterproductive.
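To put the numbers quoted above side by side, here is a small arithmetic sketch. The three constants come from the message. The function names are made up for illustration, and the "overlapped" case assumes the training time can be hidden entirely behind other boot work, which is an idealization.

```c
/* Figures quoted in the message above, in milliseconds. */
enum {
	READ_LEGACY_MS = 132,	/* kernel read without HS400 */
	READ_HS400_MS  = 10,	/* kernel read at HS400 */
	TRAINING_MS    = 300,	/* HS400 link training */
};

/* Training done serially, right before the read. */
static int serial_cost_ms(void)
{
	return TRAINING_MS + READ_HS400_MS;
}

/* Assumption: training is completely hidden behind other boot work. */
static int overlapped_cost_ms(void)
{
	return READ_HS400_MS;
}
```

Done serially, training plus the fast read costs 310ms, which is 178ms worse than the plain 132ms read. Only if the training overlaps with other work does HS400 pay off, saving up to 122ms in this idealized case.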
Have you considered implementing HS400-ES (enhanced strobe) support in your host controllers? That feature allows you to run at HS400 speeds immediately without any tuning (by essentially turning the clock master around and having the device pulse its own clock when it's sending data IIRC). We've had great success improving boot speed with that on a different Chrome OS platform. This won't help you for your current generation of SoCs yet, but at least it should resolve the tuning issue in the long run as this feature becomes more standard (so this issue shouldn't actually get worse and worse in the future... it should go away again).