I've talked to Marc Jones about this several times over the years. He can confirm my memory. There is almost no win to parallelizing any of the memory or PCI bus setup. Yes, it's supported in the code, kind of, for some platforms, and maybe it works on some of them, but it's not worth it and it really complicates things.
What is worth it, and we've measured this, is ECC scrubbing. We should focus on that.
So the boot path: the BSP does all the device tree and DRAM setup, then sets up stacks and boot code for the APs.
The APs are woken up and do what they are told, which in many cases is to set themselves up and do ECC scrubbing.
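
To make that split concrete, here is a rough sketch in C. It is not the actual code from any platform: the names (ap_work, bsp_prepare_work, ap_scrub, boot_sketch), the AP count, and the fake DRAM array are all made up for illustration. It assumes scrubbing amounts to writing every word once so the ECC check bits get initialized. The APs are run one after another here so the sketch stays runnable on a host; on real hardware they would run their slices concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_APS    3     /* hypothetical number of APs */
#define DRAM_WORDS 1024  /* stand-in for the real DRAM size */

/* Work descriptor the BSP fills in for each AP (hypothetical layout). */
struct ap_work {
    uint64_t *base;  /* start of this AP's slice */
    size_t words;    /* slice length in 64-bit words */
    int done;        /* set by the AP when its slice is scrubbed */
};

static uint64_t fake_dram[DRAM_WORDS];  /* simulated DRAM */
static struct ap_work work[NUM_APS];

/* BSP side: DRAM is already set up; carve it into one slice per AP. */
static void bsp_prepare_work(uint64_t *dram, size_t words,
                             struct ap_work *w, size_t naps)
{
    size_t per = words / naps;

    for (size_t i = 0; i < naps; i++) {
        w[i].base = dram + i * per;
        /* The last AP also takes the remainder. */
        w[i].words = (i == naps - 1) ? words - i * per : per;
        w[i].done = 0;
    }
}

/* AP side: do what we are told, i.e. write every word in our slice. */
static void ap_scrub(struct ap_work *w)
{
    volatile uint64_t *p = w->base;

    assert(w->base != NULL);
    for (size_t i = 0; i < w->words; i++)
        p[i] = 0;
    w->done = 1;
}

/* Whole path; returns how many words were left unscrubbed. */
static size_t boot_sketch(void)
{
    size_t dirty = 0;

    /* Pretend DRAM comes up holding garbage. */
    memset(fake_dram, 0xff, sizeof(fake_dram));

    bsp_prepare_work(fake_dram, DRAM_WORDS, work, NUM_APS);

    /* "Wake" each AP; sequential here, concurrent on real hardware. */
    for (size_t i = 0; i < NUM_APS; i++)
        ap_scrub(&work[i]);

    for (size_t i = 0; i < DRAM_WORDS; i++)
        if (fake_dram[i] != 0)
            dirty++;
    return dirty;
}
```

The point of the layout is that the BSP stays the only thing touching device and DRAM setup, and the only parallel step is the one that is embarrassingly parallel: each AP owns a disjoint slice and needs no locking.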
In other words, Stefan is right (again :-)
ron