Thanks for the great suggestions Nico!
I'll check the UPDs between cold boot / cold boot resume / reboot / reboot resume, try some delays, etc., to see if I can identify anything relevant on the coreboot side that could impact this. It'd be great to find a solution there rather than having to kludge around it.
If I can't find a solution there, I'll try padding the FSP allocation, and we can see what that implementation looks like. I believe the IMD structs already have magic numbers, so I could probably look for those on 4K-aligned positions in the bootloader reserved area during resume without costing too much (and all of this could be Kconfig-controlled if needed).
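To make the scan idea concrete, here is a rough, standalone sketch, not actual coreboot code. EXAMPLE_ROOT_MAGIC, struct example_root_pointer and find_root_top() are hypothetical names for illustration; the real magic values and layout are whatever coreboot's IMD code defines. It also assumes the reserved area starts on a 4K boundary and that the pointer structure sits immediately below the top that was in use when it was written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder, NOT the real IMD magic value. */
#define EXAMPLE_ROOT_MAGIC 0x1badcafeu
#define ALIGN_4K ((size_t)0x1000)

/* Illustrative stand-in for the structure stored just below the top. */
struct example_root_pointer {
	uint32_t magic;
	int32_t root_offset;
};

/*
 * Check every 4K-aligned candidate top in the region, highest first, for a
 * pointer structure with the expected magic immediately below it.
 * Returns the offset of the matching top from the region start, or -1.
 */
long find_root_top(const uint8_t *region, size_t size)
{
	for (size_t top = size & ~(ALIGN_4K - 1); top >= ALIGN_4K; top -= ALIGN_4K) {
		struct example_root_pointer rp;

		/* memcpy avoids alignment and aliasing problems when reading the candidate. */
		memcpy(&rp, region + top - sizeof(rp), sizeof(rp));
		if (rp.magic == EXAMPLE_ROOT_MAGIC)
			return (long)top;
	}
	return -1;
}
```

With a 4K step the number of probes stays small even for a padded reservation, which is why this should be cheap on the resume path.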
On 11/3/23 18:35, Nico Huber wrote:
Hi Jonathon,
On 03.11.23 22:46, Jonathon Hall wrote:
On Librem Mini v2, rebooting, then suspending and resuming fails to resume.
I've tracked this down to a change in the TOLUM returned by FSP, which causes failures to find important cbmem regions during S3 resume. (I've run into problems relating to the TOLUM change before: https://puri.sm/posts/how-we-fixed-reboot-loops-on-the-librem-mini/.) This doesn't seem to happen on all CML boards but has always happened on Mini v2 for whatever reason. I have some ideas to address it, but I'm not sure which is best.
For example:
- Cold boot: cbmem_top() = 0x99fff000
- Reboot: cbmem_top() = 0x9a000000 (4K later, FSP seems to reserve 4K less memory for itself on reboot)
- Resume after reboot: cbmem_top() = 0x99fff000 (will not be able to find cbmem from reboot, not sure if the upper 4 KB have been overwritten by FSP)
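The arithmetic behind these numbers, as a small standalone sketch. It assumes the lookup on resume is anchored at cbmem_top(), so it can only succeed when the top matches the one in use when the structures were written; top_shift() and resume_finds_cbmem() are hypothetical helper names, not coreboot functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* cbmem_top() values quoted from the list above. */
#define TOP_COLD_BOOT     0x99fff000u
#define TOP_REBOOT        0x9a000000u
#define TOP_RESUME_REBOOT 0x99fff000u

/* How far the top used at boot sits above the top reported on resume. */
uint32_t top_shift(uint32_t boot_top, uint32_t resume_top)
{
	return boot_top - resume_top;
}

/* Assumption: a top-anchored lookup only works if both tops are identical. */
bool resume_finds_cbmem(uint32_t boot_top, uint32_t resume_top)
{
	return top_shift(boot_top, resume_top) == 0;
}
```

For the reboot case the shift is 0x9a000000 - 0x99fff000 = 0x1000, i.e. exactly one 4K page, so the resume path looks one page below where the structures were placed.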
I would love to blame FSP for this, but first we should make sure that it's not coreboot's fault. I assume FSP is free to change the allocation depending on its inputs. So it would be coreboot's job to ensure that these inputs don't change when resuming. Obviously UPDs handed to FSP-M shouldn't change. Have you confirmed that? (Maybe dump them, or a checksum of them.) Otherwise, the hardware state could be different. I don't know of any example for FSP-M, but FSP checking for the presence of a PCIe device, for instance, is imaginable. Then it could be bad timing. Maybe as a desperate last test, try a 200ms delay before jumping into FSP-M.
If it's not that simple, I think we should bug Intel to provide a complete list of all inputs that affect TOLUM.
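The UPD comparison suggested above could be sketched roughly as follows. This is a standalone illustration that assumes the FSP-M UPD block is available as a flat byte buffer; upd_checksum() is a hypothetical name, and FNV-1a is used only because it is short and dependency-free, not because coreboot or FSP prescribe it.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over an arbitrary byte buffer. */
uint32_t upd_checksum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t hash = 0x811c9dc5u;	/* FNV-1a 32-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= 0x01000193u;	/* FNV 32-bit prime */
	}
	return hash;
}
```

Logging this value right before the call into FSP-M on each path (cold boot, reboot, and both resume cases) would show quickly whether the inputs differ. A full dump is only needed for the paths where the values disagree.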
- Put the imd structures below the FSP reserved memory with some buffer space?
This would probably require additional hacks for coreboot to find things in the FSP reserved memory later. I'm not sure how invasive this would be. I can't remember right now what the reasons were for keeping the IMD structures on top. But IIRC FSP was changed for this, so I bet there are good reasons. (Ironically, I believe not having to move them when the amount of data FSP-M spews changes was among them.)
- Put the imd structures somewhere else entirely, like toward the beginning of the available low memory instead of the end?
This could conflict with payloads, and (legacy) bootloaders and OSs.
- Ask FSP to reserve more than 8 KB for some buffer in case TOLUM changes on resume, so the imd structures are still there?
This was also one of my first thoughts. We may still have to jump through some hoops if the location of the FSP reserved things moves, though.
Nico