Hi all,
Continuing more with my work on migrating SPARC32 to OFMEM, I've hit an issue with the ESP driver which is causing me a bit of a problem.
At the moment, I have a hybrid old-mem/OFMEM SPARC32 setup where I am migrating the various memory calls over to OFMEM one at a time. Currently my implementation just uses OFMEM for allocating MMU page tables, and with the default compile option of -Os the boot output looks like this:
    Configuration device id QEMU version 1 machine id 32
    Unhandled Exception 0x0000001f
    PC = 0xffd12f08 NPC = 0xffd12f0c
    Stopping execution
The interesting part is that this problem goes away if I compile with -O3 or -O0, but shows up when I compile with -O2, -O1 or -Os. So I wonder if I've hit some kind of logic bug in OpenBIOS?
Looking at the SPARCv8 specification, exception 0x1f is equivalent to IRQ15 and the offending code where the error occurs can be found in drivers/esp.c:do_command():
    esp->ll->regs[ESP_BUSID] = sd->id & 7;
    // Set DMA address
    esp->espdma.regs->st_addr = esp->buffer_dvma;
    // Set DMA length
    esp->ll->regs[ESP_TCLOW] = cmdlen & 0xff;
    esp->ll->regs[ESP_TCMED] = (cmdlen >> 8) & 0xff;
    // Set DMA direction and enable DMA
    esp->espdma.regs->cond_reg = DMA_ENABLE;

    /* Crash occurs somewhere in this section... */

    // Set ATN, issue command
    esp->ll->regs[ESP_CMD] = ESP_CMD_SELA | ESP_CMD_DMA;
    // Wait for DMA to complete. Can this fail?
    while ((esp->espdma.regs->cond_reg & DMA_HNDL_INTR) == 0)
        ;

    /* End of crash section */

    // Check status
    status = esp->ll->regs[ESP_STATUS];
    // Clear interrupts to avoid guests seeing spurious interrupts
    (void)esp->ll->regs[ESP_INTRPT];
I notice from the code above there is an explicit comment that mentions clearing interrupts to prevent the guest from seeing them so I would have thought that this wouldn't be an issue? I've checked the espdma structures to ensure that they are marked volatile (or _volatile_) and this appears to be the case - so I'm a little bit stumped. Can anyone point me in the right direction or spot the mistake?
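For reference, the step from exception 0x1f to IRQ15 follows from the SPARC V8 trap table, where trap types 0x11 to 0x1f are interrupt levels 1 to 15. A minimal sketch of that decode is below; the helper name is made up for illustration and is not an OpenBIOS function.

```c
/* SPARC V8: trap types 0x11..0x1f are interrupt_level_1..interrupt_level_15. */
#define TT_INTERRUPT_BASE 0x10u

/*
 * Hypothetical helper: return the interrupt level (1..15) encoded in a
 * trap type, or 0 if the trap type is not an interrupt trap.
 */
static unsigned int tt_to_irq_level(unsigned int tt)
{
    if (tt > TT_INTERRUPT_BASE && tt <= TT_INTERRUPT_BASE + 15u)
        return tt - TT_INTERRUPT_BASE;
    return 0;
}
```

Calling tt_to_irq_level(0x1f) with the value from the "Unhandled Exception" line gives level 15, which on SPARC is the non-maskable level.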
ATB,
Mark.
On Fri, Dec 17, 2010 at 6:14 PM, Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> wrote:
> I notice from the code above there is an explicit comment that mentions clearing interrupts to prevent the guest from seeing them so I would have thought that this wouldn't be an issue? I've checked the espdma structures to ensure that they are marked volatile (or _volatile_) and this appears to be the case - so I'm a little bit stumped. Can anyone point me in the right direction or spot the mistake?
One tricky case was that when allocating memory for IOMMU, the alignment restrictions concern physical memory, not virtual.
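The point about physical versus virtual alignment can be sketched roughly as follows. This is only an illustration under stated assumptions: the bump allocator, the struct names and alloc_dma_table() are hypothetical and are not the OFMEM API, and address 0 is assumed never to be a valid result. The idea is to claim the virtual and the physical range separately, each with the alignment the hardware requires, rather than aligning only the virtual address and taking whatever physical address backs it.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uint32_t addr, uint32_t align)
{
    return (addr & (align - 1u)) == 0;
}

/* Round addr up to the next multiple of align (power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Hypothetical bump allocator over a single address range. */
struct range_alloc {
    uint32_t next;   /* first free address */
    uint32_t limit;  /* one past the last usable address */
};

/* Claim size bytes aligned to align; returns 0 if the range is exhausted. */
static uint32_t claim(struct range_alloc *ra, uint32_t size, uint32_t align)
{
    uint32_t start = align_up(ra->next, align);

    /* start < ra->next catches wrap-around in align_up(). */
    if (start < ra->next || start > ra->limit || size > ra->limit - start)
        return 0;
    ra->next = start + size;
    return start;
}

/* Hypothetical record of a virtual-to-physical mapping to be installed. */
struct mapping {
    uint32_t virt;
    uint32_t phys;
    uint32_t size;
};

/*
 * Claim a virtual and a physical range independently, both aligned,
 * so that the address handed to the device (physical) meets the
 * alignment requirement as well as the address used by the CPU.
 */
static bool alloc_dma_table(struct range_alloc *virt, struct range_alloc *phys,
                            uint32_t size, uint32_t align, struct mapping *out)
{
    uint32_t va = claim(virt, size, align);
    uint32_t pa = claim(phys, size, align);

    if (va == 0 || pa == 0)
        return false;
    out->virt = va;
    out->phys = pa;
    out->size = size;
    return true;
}
```

With a physical allocator whose next free address is not itself aligned, claim() skips forward to the next aligned physical address, which is exactly the case that aligning only the virtual side would miss.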
Blue Swirl wrote:
>> I notice from the code above there is an explicit comment that mentions clearing interrupts to prevent the guest from seeing them so I would have thought that this wouldn't be an issue? I've checked the espdma structures to ensure that they are marked volatile (or _volatile_) and this appears to be the case - so I'm a little bit stumped. Can anyone point me in the right direction or spot the mistake?
> One tricky case was that when allocating memory for IOMMU, the alignment restrictions concern physical memory, not virtual.
Wow that was it - forcing physical alignment (as well as virtual alignment) seems to resolve the issue - thanks a lot! :)
After a fairly heavy hacking session I now have a version of SPARC32 OpenBIOS that runs under OFMEM! Initial tests show that Solaris 8 boot doesn't get any further, but it's such a tremendous step forward having everything using the same ofmem code (and the debugging that comes with it), and means that when we fix up the various /memory and /virtual-memory properties then they get fixed on ALL platforms.
I think the best thing to do will be to post 2 different patch sets - the first containing the OFMEM changes, and the second for the actual conversion just to make sure it doesn't cause any regressions.
Note that for this patchset I've taken Andreas' advice and am using git for the first time so please go easy on any mistakes in this area :)
ATB,
Mark.
On Tue, Dec 21, 2010 at 10:08 AM, Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> wrote:
> Blue Swirl wrote:
>>> I notice from the code above there is an explicit comment that mentions clearing interrupts to prevent the guest from seeing them so I would have thought that this wouldn't be an issue? I've checked the espdma structures to ensure that they are marked volatile (or _volatile_) and this appears to be the case - so I'm a little bit stumped. Can anyone point me in the right direction or spot the mistake?
>> One tricky case was that when allocating memory for IOMMU, the alignment restrictions concern physical memory, not virtual.
> Wow that was it - forcing physical alignment (as well as virtual alignment) seems to resolve the issue - thanks a lot! :)
IIRC I was also very, very puzzled by this for a long time.
> After a fairly heavy hacking session I now have a version of SPARC32 OpenBIOS that runs under OFMEM! Initial tests show that Solaris 8 boot doesn't get any further, but it's such a tremendous step forward having everything using the same ofmem code (and the debugging that comes with it), and means that when we fix up the various /memory and /virtual-memory properties then they get fixed on ALL platforms.
Great!
> I think the best thing to do will be to post 2 different patch sets - the first containing the OFMEM changes, and the second for the actual conversion just to make sure it doesn't cause any regressions.
> Note that for this patchset I've taken Andreas' advice and am using git for the first time so please go easy on any mistakes in this area :)
The line length of patch descriptions should be ~65 chars for nice git logs, but since the official repo is still SVN it doesn't matter. Otherwise I didn't spot any problems.
Tip: for juggling with patch sets, I use StGit with QGit. Of course other tools exist and 'git rebase --interactive' is cool too.
Blue Swirl wrote:
>> Wow that was it - forcing physical alignment (as well as virtual alignment) seems to resolve the issue - thanks a lot! :)
> IIRC I was also very, very puzzled by this for a long time.
;)
>> After a fairly heavy hacking session I now have a version of SPARC32 OpenBIOS that runs under OFMEM! Initial tests show that Solaris 8 boot doesn't get any further, but it's such a tremendous step forward having everything using the same ofmem code (and the debugging that comes with it), and means that when we fix up the various /memory and /virtual-memory properties then they get fixed on ALL platforms.
> Great!
Yeah. This has been one of those patches that has taken a long time to get working, and of course MMU bugs are painful to debug when they go wrong :/ At least now all the parts are in place and it's just a case of merging the patches.
>> I think the best thing to do will be to post 2 different patch sets - the first containing the OFMEM changes, and the second for the actual conversion just to make sure it doesn't cause any regressions.
>> Note that for this patchset I've taken Andreas' advice and am using git for the first time so please go easy on any mistakes in this area :)
> The line length of patch descriptions should be ~65 chars for nice git logs, but since the official repo is still SVN it doesn't matter. Otherwise I didn't spot any problems.
> Tip: for juggling with patch sets, I use StGit with QGit. Of course other tools exist and 'git rebase --interactive' is cool too.
Okay thanks. I haven't managed to define my own personal git workflow yet, but I will be sure to have a play with various tools to see what works/doesn't work for me.
ATB,
Mark.