On Thu, May 27, 2021 at 11:44 AM Kevin O'Connor kevin@koconnor.net wrote:
The purpose of this code is to restore the NMI_DISABLE_BIT to what it was prior to call32_prep(). If something calls the bios without the NMI_DISABLE_BIT set, it's this code that makes sure SeaBIOS returns to that calling code with NMI_DISABLE_BIT also not set.
If you've run into some bug, I think it would help if you could further describe that bug.
Cheers, -Kevin
Hi Kevin,
Thank you for the explanation! Sorry, it seems I misunderstood this part of the code as I thought every access to PORT_CMOS_DATA should be performed with NMIs disabled. Maybe giving some of the background on the issue will help me understand this a bit better, indeed!
This was originally reported by some Ubuntu users running specific VMs on older versions of seabios, where they would occasionally see KVM emulation failures and VMs going into "PAUSED" state (and being unable to resume without a full VM reboot afterwards). Inspecting the ASM dumps [0] on those VMs revealed that the last actions performed were accesses to PORT_CMOS_DATA, and those seemed to be caused by rtc_mask(). Since these were on old versions of seabios, they looked like a result of our builds missing patch 3156b71a535e (rtc: Disable NMI in rtc_mask()) [1], which we tried to address initially.
After we provided new seabios packages with the rtc_mask() patch, some users noticed that a few VMs still showed similar symptoms, but with a different ASM dump this time. This was also seen on "newer" versions of our seabios packages based on upstream 1.10.2, which should already include the rtc_mask() patch by default (git describe --contains reports this patch being introduced with rel-1.9.0~47). These new failures led us to believe that call32_post() was the culprit, since the trapping instruction was still the same access to PORT_CMOS_DATA, which was "unguarded" by an NMI_DISABLE_BIT. We then provided another package for testing, implementing the patch I originally proposed in this thread, and our users reported no further KVM emulation failures.
Unfortunately, I'm not entirely sure what originally causes the KVM emulation failures, as I've been unable to reproduce these issues in test VMs. Our users reported that the commands below can trigger the emulation failures, but I have no details on the exact platform those are running on or any details of the specific file system in use:
root@vsfo-2[]:/root> date; fsfreeze --freeze /flash
root@vsfo-2[]:/root> date; dd if=/dev/zero of=/flash/test bs=1 count=0 seek=1G
In any case, I'm somewhat puzzled by CMOS port accesses causing KVM emulation failures. Could it be that an NMI comes in between outb/inb and we end up trying to read from a nonsense CMOS index?
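To make that hypothesis concrete, here is a minimal user-space sketch of the race I have in mind. It is a simulation, not SeaBIOS code: sim_cmos, sim_write_index(), sim_read_data(), sim_nmi_handler(), sim_read() and sim_demo() are all hypothetical names, and the idea that an NMI handler would itself touch the index port is purely my assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NMI_DISABLE_BIT 0x80 /* bit 7 of the CMOS index port */

/* Simulated device: 128 CMOS registers plus the latched index. */
static uint8_t sim_cmos[128];
static uint8_t sim_index;

void sim_write_index(uint8_t v) { sim_index = v; }
uint8_t sim_read_data(void) { return sim_cmos[sim_index & 0x7f]; }

/* ASSUMPTION: a handler that uses the index/data pair itself and
 * leaves the index pointing at a different register (0x0f here). */
void sim_nmi_handler(void)
{
    sim_write_index(0x0f);
    (void)sim_read_data();
}

/* Index write followed by data read. If an NMI is pending and the
 * index was written without NMI_DISABLE_BIT, the handler runs in
 * between the two accesses. */
uint8_t sim_read(uint8_t index, int nmi_pending)
{
    sim_write_index(index);
    if (nmi_pending && !(sim_index & NMI_DISABLE_BIT))
        sim_nmi_handler();
    return sim_read_data();
}

void sim_demo(void)
{
    sim_cmos[0x0a] = 0x26;
    sim_cmos[0x0f] = 0x55;
    assert(sim_read(0x0a, 0) == 0x26);                   /* no NMI: expected register */
    assert(sim_read(0x0a, 1) == 0x55);                   /* NMI in between: wrong register */
    assert(sim_read(0x0a | NMI_DISABLE_BIT, 1) == 0x26); /* NMI masked: expected register */
}
```

In this model the unguarded read returns the contents of whatever register the handler selected last, which is the "nonsense index" case I am asking about. Whether anything like this actually happens under KVM is exactly what I cannot confirm.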
If I understand it correctly, my proposed patch effectively turns off NMIs unconditionally, which sounds like it should cause horrible breakage. Could you help me understand why that doesn't happen with the rtc_read/write/mask functions in src/hw/rtc.c as well?
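For reference, this is how I currently picture the difference between restoring the caller's bit and forcing it on, again as a user-space sketch rather than the real code. demo_index_port, demo_saved_index, demo_prep(), demo_post_restore(), demo_post_forced() and demo_check() are hypothetical stand-ins; the real call32_prep()/call32_post() do port I/O and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define NMI_DISABLE_BIT 0x80 /* bit 7 of the CMOS index port */

static uint8_t demo_index_port;  /* simulated index port latch */
static uint8_t demo_saved_index; /* value saved across the call */

/* Save the caller's index value, then make sure NMIs are masked. */
void demo_prep(void)
{
    demo_saved_index = demo_index_port;
    demo_index_port = demo_saved_index | NMI_DISABLE_BIT;
}

/* Restore exactly what the caller had, including a clear NMI bit. */
void demo_post_restore(void)
{
    demo_index_port = demo_saved_index;
}

/* ASSUMPTION: my reading of what my patch does, namely leaving
 * NMI_DISABLE_BIT set no matter what the caller had. */
void demo_post_forced(void)
{
    demo_index_port = demo_saved_index | NMI_DISABLE_BIT;
}

void demo_check(void)
{
    demo_index_port = 0x0a; /* caller had NMIs enabled */
    demo_prep();
    assert(demo_index_port == 0x8a);
    demo_post_restore();
    assert(demo_index_port == 0x0a); /* caller's state is back */

    demo_index_port = 0x0a;
    demo_prep();
    demo_post_forced();
    assert(demo_index_port == 0x8a); /* caller now has NMIs masked */
}
```

If this picture is right, the forced variant returns to a caller that had NMIs enabled with NMIs masked, which is the behaviour change I am worried about.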
Hopefully the above helps contextualize the issue a bit better, Kevin. Apologies for asking so many questions, but would you have any suggestions on how we could try to get more information on the seabios side of this?
Many thanks for the help! Heitor
[0] https://pastebin.ubuntu.com/p/4dYFCqPpxb/
[1] https://review.coreboot.org/plugins/gitiles/seabios/+/3156b71a535e661%5E%21/...