On Fri, 21 Jun 2013 09:26:11 -0700 ron minnich <rminnich@gmail.com> wrote:
> The question about fallback is 'when do I tell the machine that the normal boot succeeded'? At LANL, we learned the best place: as LATE in the boot process as possible, long after Linux is up. You want to be sure, if you set 'booted ok', that it is LINUX that booted ok, not just coreboot. That's a key piece.
> Because if coreboot sets 'booted ok', and then the node doesn't boot, that's not doing you much good, is it? We learned that the hard way.
> So: only Linux boot scripts get to set 'normal booted ok', and that should be the last thing you do; many things get to clear 'normal booted ok', including Linux, the payload, and coreboot itself.
The issue is that the coreboot implementation doesn't currently work like that.
An implementation of what you described would work like this: very early in boot, coreboot would record the value of a booted_ok nvram setting in a variable, and then reset the booted_ok nvram parameter to false. If the recorded value is true, it would then boot according to the boot_option nvram setting (Normal, for instance). Then the OS would boot, and the last boot script to run would do something like: nvramtool -w booted_ok=true
When something goes wrong, coreboot would record the value of booted_ok, find that the previous boot succeeded, and then set the flag to false. Since something goes wrong, the boot would not complete, so the flag would never be set back to true... Then the user would shut down the computer and power it on again. Coreboot would look at the booted_ok value, find that there was a problem the last time the user booted the computer, and because of that would run the images that have the fallback prefix in CBFS.
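The mechanism described in the two paragraphs above can be sketched in C. This is a minimal, hypothetical model, not coreboot code: the nvram is a plain struct, and `struct nvram`, `enum boot_option`, `select_image()` and `os_mark_boot_ok()` are names invented for illustration. Only the booted_ok and boot_option settings and the `nvramtool -w booted_ok=true` step come from the proposal.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the nvram (CMOS) options discussed above.
 * A real board would declare them in its cmos.layout; these names and
 * types are assumptions for illustration only. */
enum boot_option { BOOT_FALLBACK, BOOT_NORMAL };

struct nvram {
	bool booted_ok;               /* set to true only by the OS's last boot script */
	enum boot_option boot_option; /* image the user asked for, e.g. Normal */
};

/* Runs very early in coreboot: latch the flag, clear it in nvram, and
 * choose which set of CBFS images ("normal" or "fallback") to run. */
enum boot_option select_image(struct nvram *nv)
{
	bool last_boot_ok = nv->booted_ok;

	/* Cleared before anything else can fail, so a hang anywhere later
	 * (coreboot, payload, kernel, init) leaves the flag false. */
	nv->booted_ok = false;

	return last_boot_ok ? nv->boot_option : BOOT_FALLBACK;
}

/* Models the last boot script running `nvramtool -w booted_ok=true`. */
void os_mark_boot_ok(struct nvram *nv)
{
	nv->booted_ok = true;
}
```

Starting from booted_ok=true and boot_option=Normal, the first call to select_image() returns BOOT_NORMAL and leaves booted_ok false; if that boot hangs before os_mark_boot_ok() runs, the next call returns BOOT_FALLBACK. As sketched, a fallback boot that reaches the last init script sets the flag again, so the following power-on retries the Normal image.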
Here would be the advantages and disadvantages of that approach:
----------------------------------------------------------------
-> It would require cooperation from the OS/distribution: for instance, the user would be expected to put a systemd unit in /etc/systemd/system, and we would probably also have to create a sysvinit script. For the cases where adding an init script is not an option, we could probably add support for clearing the booted_ok flag in SeaBIOS, and add a Kconfig option in coreboot to select that SeaBIOS option. I guess that adding an nvram option to tell SeaBIOS to clear that value is probably a bad idea, because the change would affect too many boards (see below).
-> The good thing is that it's much more reliable than the current approach, which seems to mark the boot as successful in ramstage. From IRC:
     <kmalkki> if I remember correctly normal boot is marked good before entering payload
   The current approach doesn't guarantee that the user can boot into a working system: the payload could fail, or the OS could fail to boot because of a wrong memory layout, for instance.
-> The user would be able to test new changes really easily:
   * For instance, if it's a laptop, the user wouldn't even need to disassemble it if the reflashing goes wrong.
   * Even for people used to reflashing their laptop, it would be a huge benefit:
     * Assuming that a working coreboot image already exists for the laptop, they wouldn't need an external reflashing tool, which means faster testing.
     * That also means they could develop for that laptop in more situations (for instance on a train or a plane, where it would be complicated to reflash the laptop externally).
   If someone ports CONFIG_ELOG and/or CONFIG_CHROMEOS_RAMOOPS to all laptops, the developer would also get the logs when the system didn't boot.
Implementation:
---------------
So what would be the best approach for adding support for that?
-> Many boards can use the nvram, and some use it by default... Should a new CMOS layout option be added, or should we reuse the last_boot option? Would renaming options or values be a problem if the layout doesn't change? What about changes to value types (Fallback/Normal -> false/true)?
   Basically, what happens if you have a board with a CMOS layout, flash an image with a new and different CMOS layout, and reboot? There are two cases: the board has a cmos.default (very few boards have one), or it lacks one (the case for the majority of boards).
-> Should a new Kconfig option and fallback/normal mechanism be added? Since we already have two implementations, that could be a good idea, and it would be safer that way.
Denis.