On Sat, Feb 26, 2011 at 3:56 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
On 25/02/11 15:54, Blue Swirl wrote:
So it looks like the IOMMU doesn't have an entry mapped at 0xfc000000 which is causing it to raise IRQ15. Now I see that OpenBIOS doesn't map that range in drivers/iommu.c - is that something we should be doing?
IOMMU virtual address space is completely separate from MMU virtual address space. 0xfc000000 does not need to be mapped by MMU. This area is determined by the mapping range bits in IOMMU control register.
The other interesting part is why address 0xfc000000, since OpenBIOS must surely have pre-mapped ESP somewhere above 0xff000000 to be able to bootstrap the kernel from disk using DMA?
The address 0xfc000000 comes from the ESP DMA controller's address register; the ESP itself can only supply the lowest 24 bits of the address.
A very similar problem prevented NetBSD 1.6.x from booting, because NetBSD programmed the DMA controller and the ESP in a different order than QEMU expected. This is now fixed, so it shouldn't be a problem anymore, and since OBP works, the problem at hand is not on the QEMU side.
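To illustrate the point about the 24-bit limit, here is a rough sketch of how the effective DMA address could be composed. The function and parameter names are hypothetical, and the exact split (top 8 bits from the DMA controller's address register, low 24 bits from the ESP side) is an assumption based on the description above, not taken from the QEMU source:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU function.
 * Assumption: the DMA controller's address register provides the top
 * 8 bits, and the ESP side can only contribute the low 24 bits.
 */
static uint32_t espdma_effective_addr(uint32_t dma_addr_reg, uint32_t esp_low24)
{
    /* The ESP-side value should never exceed 24 bits. */
    assert((esp_low24 >> 24) == 0);

    /* Mask anyway so the result stays well-formed if assert() is compiled out. */
    return (dma_addr_reg & 0xff000000u) | (esp_low24 & 0x00ffffffu);
}
```

With an address register of 0xfc000000 and no low bits set, this gives 0xfc000000, which would match the address seen in the fault.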
Maybe Solaris also programs DMA and ESP in some different way, which causes DMA to fire before the address is set up. To fix it, maybe OpenBIOS should reset the DMA controller after using it or at least disable DMA.
Okay. Well I added some debugging to hw/sun4m_iommu.c to see what was happening and managed to get the following trace:
Welcome to OpenBIOS v1.0 built on Feb 19 2011 17:00
Type 'help' for detailed information

0 > boot cdrom:d -v
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
Size: 259040+54154+47486 Bytes
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Ethernet address = 52:54:0:12:34:56
Using default device instance data
vac: enabled in write through mode
mem = 131072K (0x8000000)
avail mem = 110419968
### Writing to iommu addr: 1
### Setting IOMMU base addr: 6bc000
### Writing to iommu addr: 0
### Writing to iommu addr: 5
### IOMMU TLB flush 0
root nexus = SUNW,SPARCstation-5
iommu0 at root: obio 0x10000000
sbus0 at iommu0: obio 0x10001000
dma0 at sbus0: SBus slot 5 0x8400000
dma0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000
### Writing to iommu addr: 6
### IOMMU page flush fc000000
/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0):
esp-options=0x46
esp0 at dma0: SBus slot 5 0x8800000 sparc ipl 4
esp0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000
### Writing to iommu addr: 6
### IOMMU page flush fc001000
(crash cut)
So it looks like Solaris has already taken over the IOMMU page table by setting a new base address, and appears to be flushing 2 entries for 0xfc000000 and 0xfc001000 which looks like it should be doing the right thing. If Solaris has taken over the IOMMU page table, then how could OpenBIOS affect this?
If the DMA is launched by accident before the page tables are set up, that could cause the crash.
Please enable DMA debugging as well; that would tell us what the programming sequence is.
Also, what do the IOMMU_RNGE_*MB constants do? Do they control the size of the IOMMU page table? Sorry to keep asking these questions but it seems that Oracle have removed all of the documents referenced in the various IOMMU source files :(
Yes. For example, with a 64MB range, virtual DMA addresses from 0xfc000000 to 0xffffffff are available.
One possible cause of the crash is that OpenBIOS uses a 64MB range, but Solaris "knows" that the range is something else, as used by OBP.
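As a small worked example of the range arithmetic, the sketch below computes the start of the DVMA window from the range setting. The helper names are made up for illustration, and it assumes the range field encodes the size as 16MB << n (n = 0 for 16MB, n = 2 for 64MB, up to n = 7 for 2GB), which is how I read the IOMMU_RNGE_*MB constants. It also assumes the window always ends at 0xffffffff:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from QEMU or OpenBIOS.
 * Assumption: range_code is the value of the range field,
 * where 0 = 16MB, 1 = 32MB, 2 = 64MB, ..., 7 = 2GB.
 */
static uint32_t dvma_range_size(unsigned range_code)
{
    assert(range_code <= 7);
    return ((uint32_t)16 * 1024 * 1024) << range_code;
}

static uint32_t dvma_range_base(unsigned range_code)
{
    /*
     * The window ends at 0xffffffff, so the base is 2^32 - size.
     * Unsigned 32-bit wraparound computes exactly that.
     */
    return (uint32_t)0 - dvma_range_size(range_code);
}
```

For the 64MB case (range_code 2) the size is 0x04000000 and the base comes out as 0xfc000000, which matches the addresses being flushed in the trace.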