On 06/11/2021 17:26, BALATON Zoltan wrote:
On Sat, 6 Nov 2021, Mark Cave-Ayland wrote:
On 02/11/2021 15:39, BALATON Zoltan wrote:
On Tue, 26 Oct 2021, BALATON Zoltan wrote:
On Tue, 26 Oct 2021, Mark Cave-Ayland wrote:
What was the response from the NetBSD port-macppc mailing list? The NetBSD guys use QEMU a lot as part of their CI/release engineering process and are generally very responsive and knowledgeable: at the very minimum they will be able to point you directly towards the code in question.
I don't know. Are you subscribed to that list or know their IRC channel you could ask them faster? Please cc me the answer. I've tried my best to find this out but I don't have more time to spend with this now.
What will be done about this so that QEMU 6.2 won't be released in a state that MorphOS can't boot on mac99 any more? Looks like NetBSD list can't help finding the problem and I'm not able to debug it. (I think this may be an existing bug only exposed by this patch not caused by it based on the response we got so far. It looks like there's some overlap in memory usage for some reason which causes parts to be overwritten and adding another /pci node probably causes more stuff to be overwritten so it breaks but otherwise it probably just overwrites less critical stuff and still boots by chance. This is likely since most versions don't boot either before or after this patch.)
I think we have two options for the next QEMU release given that we only have one week left for bug fixes:
- Take the original v8 patch adding the pci node with actual pci bus info and
accept this breaks NetBSD 8.0 and 9,0 (but 9.2 still works).
or
- Take the dummy-pci patch that does not break anything but fixes the MorphOS
boot. (This may be a hack but it could be cleaned up later and does the job for now.)
What's your decision on this?
Firstly option 1 is not a viable choice because it would break booting NetBSD 8.0/9.0 for people that have existing disk images. But then option 2 is not
I think this breakage is not caused by this patch but only exposed by it. Therefore it's also not possible to fix it in this patch so don't blame this patch but look for the problem elsewhere and fix it there. This should be no reason to withhold this patch but if it is I've provided an alternative with the dummy-pci that did not break this case but still fixed the problem I'd like to fix.
suitable in that it moves the DT further away from that of a real Mac: the aim should be to move the emulation closer to real life rather than further from it.
OK but then is there an option that is viable and possible to do for QEMU 6.2 release?
I also would not agree with your statement that the patch doesn't break anything, since the fact regressions caused by earlier versions of this patch have only been caught by my pre-merge tests highlight the problem of testing coverage.
It's very unlikely to break anything but I can't be sure for at least two reasons: you did not publish what images you test with so it's impossible for me to make sure your tests will pass but secondly even if it was public I don't have the time to do all this testing. You've volunteered to do that as a maintainer so please do your job and test the dummy-pci patch and only say it will break anything if you can show what. Don't discard it based on belief it might break something. That argument is true for any and all patches.
Introducing a bogus DT node is a highly visible change to introduce to the guest, and propagating such an entry into OSs and device trees is at the minimum going to cause confusion and at worst cause more regressions.
Confusion for whom and why should it cause regressions? I don't believe this.
I had a look at the port-macppc archives and I saw the discussion around possible corruption but there didn't seem to be any further analysis or conclusive outcome. I spent some time this morning burning the NetBSD 8.0 ISO to a CD and booting it on my G4 Mac Mini and it booted into userspace fine. On this basis we know that a DT with multiple pci nodes boots fine with that QEMU version, so any bugs must be either in the extra /pci node contents or how the PCI bus is mapped for the mac99 machine.
This test is not suffucient to prove this patch causes the problem. Do another test: take NetBSD 8.1 or 9.1 that currently does not boot on QEMU master and verify that boots on your Mac mini. If it does then I think that proves there's a bug elsewhere which just happen to be not serious enough without this patch for some NetBSD versions to fail but it's nevertheless there and this patch only exposes that. I see no way this patch could cause overwriting memory in itself unless there's some other problem existing without this patch already. That also means no ammount of changes to this patch will fix that problem.
For me the obvious answer is to use QEMU to determine why boot fails and add a standard /pci@f0000000 node as per your earlier patch that will then be guaranteed to work across all OSs, even ones that do not have existing test coverage, as we know this is what works on real hardware.
Attached is the dmesg from booting NetBSD 8.0 on my Mac Mini, along with a copy of the DTs for an iMac 99 and G4 AGP which are the closest DTs to the QEMU mac99 machine which I hope will be of help. I do understand how getting patches merged is always frustrating - it is something that happens to all of us - but once the issue with the /pci node has been fixed so that it is the same as a real Mac, compatibility becomes something we can simply forget about.
OK I've sent a v9 which completely recreates the /pci node matching the real hardware adding the so far missing properties but it makes no difference for NetBSD so it only proves your theory is wrong. Now you have 10 versions of this patch plus the dummy-pci one to chose from. All of these fixes the MorphOS boot problem and you can decide how many other guests you want to break or keep working but please take one of these and don't let users who use MorphOS with mac99 with broken machine in QEMU 6.2.
I do understand your frustration as I have been in similar situations myself where a patch I have written has exposed a bug somewhere else, and indeed this tends to happen quite a bit in QEMU development (or at least on the architectures I've worked with). It's not ideal, but unfortunately it happens and a single patch can turn into several patches. However both adding a bogus DT node which is visible to the guest or breaking existing guests are not the right options here for the reasons I've already explained.
My role as maintainer is to help guide the development of OpenBIOS and manage testing/merging as required. It is not my responsibility to perform exhaustive testing of every patch submitted because the submitter either doesn't want to, or doesn't has time to do it. Did you know that a complete set of boot tests for Mac takes over an hour? I've already tested several versions of your patch, and even performed boot tests on real hardware for comparison. As you can see I have already invested a huge amount of time into this including pointing you towards the NetBSD people and testing on real hardware, and I am also willing to get involved in further discussion on port-macppc where I can help.
Finally whilst you may have a different opinion as to the role of a maintainer, the tone of language used is not appropriate: please reduce the accusatory nature of your comments when posting to the mailing list.
As a bonus I've also fixed compilation with gcc 10 but only because it was in the code I've contributed before. I'm at a point where it feels like wasting my time trying to contibute so I'm very close to give up on it.
Thanks, I'll have a look - although since we are in freeze it's likely this will be deferred until after QEMU 6.2 is released.
ATB,
Mark.