On Sun, 7 Nov 2021, Mark Cave-Ayland wrote:
On 06/11/2021 17:26, BALATON Zoltan wrote:
On Sat, 6 Nov 2021, Mark Cave-Ayland wrote:
On 02/11/2021 15:39, BALATON Zoltan wrote:
On Tue, 26 Oct 2021, BALATON Zoltan wrote:
On Tue, 26 Oct 2021, Mark Cave-Ayland wrote:
What was the response from the NetBSD port-macppc mailing list? The NetBSD guys use QEMU a lot as part of their CI/release engineering process and are generally very responsive and knowledgeable: at the very minimum they will be able to point you directly towards the code in question.
I don't know. Are you subscribed to that list or know their IRC channel you could ask them faster? Please cc me the answer. I've tried my best to find this out but I don't have more time to spend with this now.
What will be done about this so that QEMU 6.2 won't be released in a state that MorphOS can't boot on mac99 any more? Looks like NetBSD list can't help finding the problem and I'm not able to debug it. (I think this may be an existing bug only exposed by this patch not caused by it based on the response we got so far. It looks like there's some overlap in memory usage for some reason which causes parts to be overwritten and adding another /pci node probably causes more stuff to be overwritten so it breaks but otherwise it probably just overwrites less critical stuff and still boots by chance. This is likely since most versions don't boot either before or after this patch.)
I think we have two options for the next QEMU release given that we only have one week left for bug fixes:
- Take the original v8 patch adding the pci node with actual pci bus
info and accept this breaks NetBSD 8.0 and 9,0 (but 9.2 still works).
or
- Take the dummy-pci patch that does not break anything but fixes the
MorphOS boot. (This may be a hack but it could be cleaned up later and does the job for now.)
What's your decision on this?
Firstly option 1 is not a viable choice because it would break booting NetBSD 8.0/9.0 for people that have existing disk images. But then option 2 is not
I think this breakage is not caused by this patch but only exposed by it. Therefore it's also not possible to fix it in this patch so don't blame this patch but look for the problem elsewhere and fix it there. This should be no reason to withhold this patch but if it is I've provided an alternative with the dummy-pci that did not break this case but still fixed the problem I'd like to fix.
suitable in that it moves the DT further away from that of a real Mac: the aim should be to move the emulation closer to real life rather than further from it.
OK but then is there an option that is viable and possible to do for QEMU 6.2 release?
I also would not agree with your statement that the patch doesn't break anything, since the fact regressions caused by earlier versions of this patch have only been caught by my pre-merge tests highlight the problem of testing coverage.
It's very unlikely to break anything but I can't be sure for at least two reasons: you did not publish what images you test with so it's impossible for me to make sure your tests will pass but secondly even if it was public I don't have the time to do all this testing. You've volunteered to do that as a maintainer so please do your job and test the dummy-pci patch and only say it will break anything if you can show what. Don't discard it based on belief it might break something. That argument is true for any and all patches.
Introducing a bogus DT node is a highly visible change to introduce to the guest, and propagating such an entry into OSs and device trees is at the minimum going to cause confusion and at worst cause more regressions.
Confusion for whom and why should it cause regressions? I don't believe this.
I had a look at the port-macppc archives and I saw the discussion around possible corruption but there didn't seem to be any further analysis or conclusive outcome. I spent some time this morning burning the NetBSD 8.0 ISO to a CD and booting it on my G4 Mac Mini and it booted into userspace fine. On this basis we know that a DT with multiple pci nodes boots fine with that QEMU version, so any bugs must be either in the extra /pci node contents or how the PCI bus is mapped for the mac99 machine.
This test is not suffucient to prove this patch causes the problem. Do another test: take NetBSD 8.1 or 9.1 that currently does not boot on QEMU master and verify that boots on your Mac mini. If it does then I think that proves there's a bug elsewhere which just happen to be not serious enough without this patch for some NetBSD versions to fail but it's nevertheless there and this patch only exposes that. I see no way this patch could cause overwriting memory in itself unless there's some other problem existing without this patch already. That also means no ammount of changes to this patch will fix that problem.
For me the obvious answer is to use QEMU to determine why boot fails and add a standard /pci@f0000000 node as per your earlier patch that will then be guaranteed to work across all OSs, even ones that do not have existing test coverage, as we know this is what works on real hardware.
Attached is the dmesg from booting NetBSD 8.0 on my Mac Mini, along with a copy of the DTs for an iMac 99 and G4 AGP which are the closest DTs to the QEMU mac99 machine which I hope will be of help. I do understand how getting patches merged is always frustrating - it is something that happens to all of us - but once the issue with the /pci node has been fixed so that it is the same as a real Mac, compatibility becomes something we can simply forget about.
OK I've sent a v9 which completely recreates the /pci node matching the real hardware adding the so far missing properties but it makes no difference for NetBSD so it only proves your theory is wrong. Now you have 10 versions of this patch plus the dummy-pci one to chose from. All of these fixes the MorphOS boot problem and you can decide how many other guests you want to break or keep working but please take one of these and don't let users who use MorphOS with mac99 with broken machine in QEMU 6.2.
I do understand your frustration as I have been in similar situations myself where a patch I have written has exposed a bug somewhere else, and indeed this tends to happen quite a bit in QEMU development (or at least on the architectures I've worked with). It's not ideal, but unfortunately it happens and a single patch can turn into several patches. However both adding a bogus DT node which is visible to the guest or breaking existing guests are not the right options here for the reasons I've already explained.
This is beyond the case where getting in a patch also needs fixing some more surrounding things like code style or little improvements to clean up the code I touch. That's frustrating but I'm OK with that to a point where the additional work is within limits and did that in QEMU several times. E.g. I had pegasos2 emulation working a year earlier but did the additional work to clean up VT82c686 emulation so it could be added cleanly which delayed it by a year due to the limited time I had for it. But in this case we are talking about just a few lines patch adding a dummy pci node that would fix MorphOS boot and you haven't shown yet it would break anything but you can't accept this and demand instead that I should implement support for multiple PCI buses in OpenBIOS and also fix QEMU to precisely emulate the hardware or fix bugs totally unrelated to the issue I'm trying to fix with this patch. This is not reasonable.
My role as maintainer is to help guide the development of OpenBIOS and manage testing/merging as required. It is not my responsibility to perform exhaustive testing of every patch submitted because the submitter either doesn't want to, or doesn't has time to do it. Did you know that a complete set of boot tests for Mac takes over an hour? I've already tested several versions of your patch, and even performed boot tests on real hardware for comparison. As you can see I have already invested a huge amount of time into this including pointing you towards the NetBSD people and testing on real hardware, and I am also willing to get involved in further discussion on port-macppc where I can help.
So did I, I've made several versions of this patch (also testing it with the images I have or the ones you've shown had problems) and trying to follow your advice while keeping within the limits I'm willing to invest in this but none of the attempts were good enough for you. And this is going on for 3 years. As a maintainer you should manage the project so those who want to use it get a working version and allow to make progress for those who want to contribute. There are a few people I know are using MorphOS on QEMU (maybe 5, not a big number but probably more than those who want to use NetBSD 8.0 and 9.0) and they will be left with their setup broken with the next QEMU release due to you're trying to preserve some old NetBSD versions booting which maybe nobody uses and only booted by chance so far. But there's also a simple fix to keep both these NetBSD versions work and fix MorphOS too but that's not acceptable for you either. This makes it impossible to fix the problem and stops all progress whereas if you took the simple fix with dummy pci that would solve the problem and not make the situation much worse than it is now and allow to make a small progress also allowing MorphOS to work. This is not such a big step backwards that's unacceptable and does not prevent cleaning it up later so I don't know what you can't accept this as a solution. Unless you can show this dummy pci breaks some other guest in your tests, but I doubt it would. This has escalated from a simple patch to fix a problem to a more complex patch adding the complete PCI node but that exposed bugs already present and caused problems for other guests. So what would the next step be? I've spent about a month debugging the original problem with MorphOS including following in a debugger why it could not talk to USB devices becuase its developers did not help at all. Then tried to find a solution to patch OpenBIOS to work around this problem, first from Forth then from C which took another few weeks. Then made countless versions on your requests trying to meet your demands which also took time, I've even fixed an issue with OpenSUSE that also took hours to get the image and test it and also tried to find out what causes the problem with NetBSD but at the end we're back where we were uears ago that the dummy pci version is the one which causes the least problems and fixes the issue. So the solution is simple and available for years yet it's impossible for users to get a working version without compiling their own patched version. Is that a good service as a maintainer?
Finally whilst you may have a different opinion as to the role of a maintainer, the tone of language used is not appropriate: please reduce the accusatory nature of your comments when posting to the mailing list.
I'm sorry but I'm really upset about this not being able to get over this problem for years and still don't see you can propose an alternative solution you'd accept just keep saying it's not acceptable for any patches I send trying to fix this. We have a problem booting MorphOS that I've debugged and identified that it could be fixed by adding a node to OpenBIOS one way or other and there are users out there who would be helped by this working in QEMU but I can't convince you that this problem is worth fixing. Why you can't accept a solution that fixes the problem and does not break other guests? Is it better for users to have clean code that does not work or a bit less clean but one that works? (And the OpenBIOS code is not that clean to begin with there are a lot of differences from real OpenFirmware so a bit more should not hurt as long as it works and the result is more guests booting.)
As a bonus I've also fixed compilation with gcc 10 but only because it was in the code I've contributed before. I'm at a point where it feels like wasting my time trying to contibute so I'm very close to give up on it.
Thanks, I'll have a look - although since we are in freeze it's likely this will be deferred until after QEMU 6.2 is released.
That's OK, however the MorphOS fix is something that can get in during the freeze and should be resolved. Please get one of the patches you're most happy with (or the least unhappy, I don't care) and get this solved at last.
Regards, BALATON Zoltan