Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates. ---
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
On 18/01/2022 19:26, Glenn Washburn wrote:
Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates.
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
The development discussions are all done here on the mailing list for now. Unfortunately there doesn't seem to be an easy way to gate emails between github issues and the Openbios mailing list, so people have to manually subscribe to the project for notifications which is why they currently get little attention :/
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
I suspect that the milliseconds/get-ms words aren't doing the right thing in openbios-ppc right now. I haven't looked at this for a while, however the best solution is probably similar to the one used for SPARC which is to increment a counter on the timer (decrementer) interrupt calculating the interval from the timebase. Is that something that you would be interested to implement?
The warning "milliseconds isn't unique" is because there is a fallback implementation of these words in OpenBIOS which simply increments a counter by 1 every time it is called, and appears because the current PPC implementation overrides it at a Forth level. Again this would be solved by moving an interrupt-based solution.
ATB,
Mark.
On Mon, 24 Jan 2022 08:24:52 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 18/01/2022 19:26, Glenn Washburn wrote:
Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates.
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
The development discussions are all done here on the mailing list for now. Unfortunately there doesn't seem to be an easy way to gate emails between github issues and the Openbios mailing list, so people have to manually subscribe to the project for notifications which is why they currently get little attention :/
Ok, perhaps in the future if I have a large attachment I should make a github issue with the attachment there and send an email to the list about the issue with a reference to the github issue.
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
I suspect that the milliseconds/get-ms words aren't doing the right thing in openbios-ppc right now. I haven't looked at this for a while, however the best solution is probably similar to the one used for SPARC which is to increment a counter on the timer (decrementer) interrupt calculating the interval from the timebase. Is that something that you would be interested to implement?
I'm mildly interested, but its not something I can realistically commit to at the moment. And as someone not familiar with the PPC architecture, it would take more effort for me than someone else more familiar.
The warning "milliseconds isn't unique" is because there is a fallback implementation of these words in OpenBIOS which simply increments a counter by 1 every time it is called, and appears because the current PPC implementation overrides it at a Forth level. Again this would be solved by moving an interrupt-based solution.
Since I'm building for QEMU, wouldn't the code that's getting called be ciface_milliseconds in arch/ppc/qemu/methods.c? It appears as though that is actually trying to do the right thing. This also seems to be similar to how the kernel is doing delay[1]. I don't understand all that assembly though, so maybe there's more that should be done. Does this change your above analysis and potentially make the fix something quicker/easier?
Glenn
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch...
On 25/01/2022 01:02, Glenn Washburn wrote:
On Mon, 24 Jan 2022 08:24:52 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 18/01/2022 19:26, Glenn Washburn wrote:
Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates.
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
The development discussions are all done here on the mailing list for now. Unfortunately there doesn't seem to be an easy way to gate emails between github issues and the Openbios mailing list, so people have to manually subscribe to the project for notifications which is why they currently get little attention :/
Ok, perhaps in the future if I have a large attachment I should make a github issue with the attachment there and send an email to the list about the issue with a reference to the github issue.
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
I suspect that the milliseconds/get-ms words aren't doing the right thing in openbios-ppc right now. I haven't looked at this for a while, however the best solution is probably similar to the one used for SPARC which is to increment a counter on the timer (decrementer) interrupt calculating the interval from the timebase. Is that something that you would be interested to implement?
I'm mildly interested, but its not something I can realistically commit to at the moment. And as someone not familiar with the PPC architecture, it would take more effort for me than someone else more familiar.
Well I thought I would ask, otherwise it's something else that has to go on my TODO list ;)
The warning "milliseconds isn't unique" is because there is a fallback implementation of these words in OpenBIOS which simply increments a counter by 1 every time it is called, and appears because the current PPC implementation overrides it at a Forth level. Again this would be solved by moving an interrupt-based solution.
Since I'm building for QEMU, wouldn't the code that's getting called be ciface_milliseconds in arch/ppc/qemu/methods.c? It appears as though that is actually trying to do the right thing. This also seems to be similar to how the kernel is doing delay[1]. I don't understand all that assembly though, so maybe there's more that should be done. Does this change your above analysis and potentially make the fix something quicker/easier?
Looking at ciface_milliseconds I did spot one obvious issue: TIMER_FREQUENCY is hardcoded to 16600000ULL (166MHz) which is correct for g3beige, but not mac99 which uses a 100MHz timebase frequency.
As a quick hack if you alter TIMER_FREQUENCY to 100MHz does that get you close to the required 10s interval on a mac99 machine?
ATB,
Mark.
On Thu, 27 Jan 2022, Mark Cave-Ayland wrote:
On 25/01/2022 01:02, Glenn Washburn wrote:
On Mon, 24 Jan 2022 08:24:52 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 18/01/2022 19:26, Glenn Washburn wrote:
Resending this because I didn't realize I needed to be subscribed to the list to send emails to it. Sorry for any duplicates.
Hi list,
I've noticed what I think is a bug in the PPC OpenBIOS, but it could be in QEMU. First off, are github issues[1] an acceptable way of interacting with OpenBIOS developers? I have another potential bug to report with an attachment and github seems like a better place for the attachment than the mailing list. Also, I'm not subscribed, so a friendly reminder to do reply-all when responding.
The development discussions are all done here on the mailing list for now. Unfortunately there doesn't seem to be an easy way to gate emails between github issues and the Openbios mailing list, so people have to manually subscribe to the project for notifications which is why they currently get little attention :/
Ok, perhaps in the future if I have a large attachment I should make a github issue with the attachment there and send an email to the list about the issue with a reference to the github issue.
I'm running the GRUB functional test suite for PPC on QEMU using the machine type mac99,via=pmu and noticed the sleep test fails. The sleep test, one of the first several tests that gets run, uses the RTC to get the current time, waits 10 seconds and gets the time again. The output from GRUB shows that the "get-time" firmware command is returning times before and after the 10 second sleep that differ by 2 seconds. The sleep is implemented in GRUB using the firmware "milliseconds" method. It calls the method in a loop until the last call minus the first call is greater or equal to the number of milliseconds given (10000 in this case). It does seem that by my watch 10 seconds do not go by, so I suspect this is an issue with the clock speed being faster than OpenBIOS thinks it is. Could this be an issue with using two different time sources one from QEMU's RTC and the other using emulated CPU frequency?
I've testing with QEMU 5.2 and 6.1 on debian stable with the same results. This can be reproduced by downloading the attached XZ compressed iso, uncompressing and invoking QEMU like this:
$ qemu-system-ppc -m 2047M -nographic -monitor file:/dev/null -fw_cfg name=etc/sercon-port,string=0 -serial null -serial stdio -hda grub.iso -boot c
I also noticed that the firmware outputs a message "milliseconds isn't unique." Is it just a coincidence that this mentions milliseconds and that is also the name of the firmware method that seems to be at issue here?
$ qemu-system-ppc -M mac99,via=pmu -display none -serial stdio s>> et_property: NULL phandle
============================================================= OpenBIOS 1.1 [Apr 16 2021 09:43] Configuration device id QEMU version 1 machine id 1 CPUs: 1 Memory: 128M UUID: 00000000-0000-0000-0000-000000000000 CPU type PowerPC,G4
milliseconds isn't unique.
Glenn
I suspect that the milliseconds/get-ms words aren't doing the right thing in openbios-ppc right now. I haven't looked at this for a while, however the best solution is probably similar to the one used for SPARC which is to increment a counter on the timer (decrementer) interrupt calculating the interval from the timebase. Is that something that you would be interested to implement?
I'm mildly interested, but its not something I can realistically commit to at the moment. And as someone not familiar with the PPC architecture, it would take more effort for me than someone else more familiar.
Well I thought I would ask, otherwise it's something else that has to go on my TODO list ;)
I think this issue came up a few years before and I remember asking about the milliseconds isn't uinque error even before that but not sure what came out of that at the end. This is the last mention I've found:
https://mail.coreboot.org/pipermail/openbios/2017-December/010085.html
There may be some more info in that thread about what you've found back then to help refresh memories.
The warning "milliseconds isn't unique" is because there is a fallback implementation of these words in OpenBIOS which simply increments a counter by 1 every time it is called, and appears because the current PPC implementation overrides it at a Forth level. Again this would be solved by moving an interrupt-based solution.
Since I'm building for QEMU, wouldn't the code that's getting called be ciface_milliseconds in arch/ppc/qemu/methods.c? It appears as though that is actually trying to do the right thing. This also seems to be similar to how the kernel is doing delay[1]. I don't understand all that assembly though, so maybe there's more that should be done. Does this change your above analysis and potentially make the fix something quicker/easier?
Looking at ciface_milliseconds I did spot one obvious issue: TIMER_FREQUENCY is hardcoded to 16600000ULL (166MHz) which is correct for g3beige, but not mac99 which uses a 100MHz timebase frequency.
As a quick hack if you alter TIMER_FREQUENCY to 100MHz does that get you close to the required 10s interval on a mac99 machine?
Isn't there an FW_CFG variable for this already that could be used to get the correct freq? (Other than that my idea for a more complete solution for all similar issues was to drop hard coded device-tree hacks from OpenBIOS and replace FW_CFG with generating the device-tree in QEMU and pass that as SLOF does which I've ported to OpenBIOS as an experiment but this was too big a change so I did not carry on with it and it was abandoned. I've done the same for OpenFirmware too but then did not finish that as I've ended up with VOF for pegasos2 which was simpler to do and still has device tree in QEMU so nothing is needed in the firmware. This is just a side note not really related to the milliseconds issue so I digress.)
Regards, BALATON Zoltan
On 27/01/2022 16:13, BALATON Zoltan wrote:
Looking at ciface_milliseconds I did spot one obvious issue: TIMER_FREQUENCY is hardcoded to 16600000ULL (166MHz) which is correct for g3beige, but not mac99 which uses a 100MHz timebase frequency.
As a quick hack if you alter TIMER_FREQUENCY to 100MHz does that get you close to the required 10s interval on a mac99 machine?
Isn't there an FW_CFG variable for this already that could be used to get the correct freq? (Other than that my idea for a more complete solution for all similar issues was to drop hard coded device-tree hacks from OpenBIOS and replace FW_CFG with generating the device-tree in QEMU and pass that as SLOF does which I've ported to OpenBIOS as an experiment but this was too big a change so I did not carry on with it and it was abandoned. I've done the same for OpenFirmware too but then did not finish that as I've ended up with VOF for pegasos2 which was simpler to do and still has device tree in QEMU so nothing is needed in the firmware. This is just a side note not really related to the milliseconds issue so I digress.)
Yes indeed, that's correct - rather than using the fixed constant, it should be using the timer_freq variable which is populated in arch/ppc/qemu/init.c.
ATB,
Mark.
On Thu, 27 Jan 2022 17:12:29 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 27/01/2022 16:13, BALATON Zoltan wrote:
Looking at ciface_milliseconds I did spot one obvious issue: TIMER_FREQUENCY is hardcoded to 16600000ULL (166MHz) which is correct for g3beige, but not mac99 which uses a 100MHz timebase frequency.
As a quick hack if you alter TIMER_FREQUENCY to 100MHz does that get you close to the required 10s interval on a mac99 machine?
Isn't there an FW_CFG variable for this already that could be used to get the correct freq? (Other than that my idea for a more complete solution for all similar issues was to drop hard coded device-tree hacks from OpenBIOS and replace FW_CFG with generating the device-tree in QEMU and pass that as SLOF does which I've ported to OpenBIOS as an experiment but this was too big a change so I did not carry on with it and it was abandoned. I've done the same for OpenFirmware too but then did not finish that as I've ended up with VOF for pegasos2 which was simpler to do and still has device tree in QEMU so nothing is needed in the firmware. This is just a side note not really related to the milliseconds issue so I digress.)
Yes indeed, that's correct - rather than using the fixed constant, it should be using the timer_freq variable which is populated in arch/ppc/qemu/init.c.
Yes, using the timer_freq variable does the trick. Do you want to create the patch or should I?
Glenn
On 28/01/2022 06:52, Glenn Washburn wrote:
On Thu, 27 Jan 2022 17:12:29 +0000 Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 27/01/2022 16:13, BALATON Zoltan wrote:
Looking at ciface_milliseconds I did spot one obvious issue: TIMER_FREQUENCY is hardcoded to 16600000ULL (166MHz) which is correct for g3beige, but not mac99 which uses a 100MHz timebase frequency.
As a quick hack if you alter TIMER_FREQUENCY to 100MHz does that get you close to the required 10s interval on a mac99 machine?
Isn't there an FW_CFG variable for this already that could be used to get the correct freq? (Other than that my idea for a more complete solution for all similar issues was to drop hard coded device-tree hacks from OpenBIOS and replace FW_CFG with generating the device-tree in QEMU and pass that as SLOF does which I've ported to OpenBIOS as an experiment but this was too big a change so I did not carry on with it and it was abandoned. I've done the same for OpenFirmware too but then did not finish that as I've ended up with VOF for pegasos2 which was simpler to do and still has device tree in QEMU so nothing is needed in the firmware. This is just a side note not really related to the milliseconds issue so I digress.)
Yes indeed, that's correct - rather than using the fixed constant, it should be using the timer_freq variable which is populated in arch/ppc/qemu/init.c.
Yes, using the timer_freq variable does the trick. Do you want to create the patch or should I?
Go for it - if you've already got the patch ready to go, please go ahead and post it.
ATB,
Mark.