On Mon, 24 Jan 2022 08:24:52 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 18/01/2022 19:26, Glenn Washburn wrote:
Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates.
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
The development discussions are all done here on the mailing list for now. Unfortunately there doesn't seem to be an easy way to gate emails between github issues and the Openbios mailing list, so people have to manually subscribe to the project for notifications which is why they currently get little attention :/
Ok, perhaps in the future if I have a large attachment I should make a github issue with the attachment there and send an email to the list about the issue with a reference to the github issue.
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
I suspect that the milliseconds/get-ms words aren't doing the right thing in openbios-ppc right now. I haven't looked at this for a while, however the best solution is probably similar to the one used for SPARC which is to increment a counter on the timer (decrementer) interrupt calculating the interval from the timebase. Is that something that you would be interested to implement?
I'm mildly interested, but its not something I can realistically commit to at the moment. And as someone not familiar with the PPC architecture, it would take more effort for me than someone else more familiar.
The warning "milliseconds isn't unique" is because there is a fallback implementation of these words in OpenBIOS which simply increments a counter by 1 every time it is called, and appears because the current PPC implementation overrides it at a Forth level. Again this would be solved by moving an interrupt-based solution.
Since I'm building for QEMU, wouldn't the code that's getting called be ciface_milliseconds in arch/ppc/qemu/methods.c? It appears as though that is actually trying to do the right thing. This also seems to be similar to how the kernel is doing delay[1]. I don't understand all that assembly though, so maybe there's more that should be done. Does this change your above analysis and potentially make the fix something quicker/easier?
Glenn
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch...