I've been trying to get old versions of SunOS to load under qemu. In doing so, I've encountered a number of bugs in OBP. I'm not always certain of the best fix, but I can at least provide a quick hack that will get people farther along.
1) Error message: "kmem_alloc failed, nbytes 680"
Bug: obp_dumb_memalloc is a bit too dumb. It needs to pick an address if passed a null address. (According to the comment in the allocator in OpenSolaris prom_alloc.c (see http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/promif/ieee1275/sun4/prom_alloc.c), "If virthint is zero, a suitable virt is chosen.")
Quick fix: If passed a null address, start doling out addresses at 10MB and increment by size.
Shortcomings: The quick fix ignores the issue of free() and doesn't remove memory from the virtual-memory/available node.
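For illustration, the quick fix amounts to a bump allocator. The sketch below assumes a hypothetical helper, `choose_virt()`, that `obp_dumb_memalloc` would call to pick the virtual address. The real OpenBIOS code and signatures may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper illustrating the quick fix: when the caller passes a
 * null virtual address, hand out addresses starting at 10 MB and advance by
 * the requested size. Nothing is ever freed, and the virtual-memory
 * "available" property is not updated (the shortcomings noted above). */
#define QUICKFIX_VA_BASE (10u * 1024u * 1024u) /* 10 MB, per the quick fix */

static uint32_t next_free_va = QUICKFIX_VA_BASE;

/* Returns the virtual address to use: the caller's hint if non-null,
 * otherwise the next address from the bump pointer. */
uint32_t choose_virt(uint32_t virthint, uint32_t size)
{
    if (virthint != 0)
        return virthint;          /* caller chose an address itself */
    uint32_t va = next_free_va;   /* "a suitable virt is chosen" */
    next_free_va += size;         /* bump past this allocation */
    return va;
}
```

With this scheme, the 680-byte request from the error message would get 0xA00000, and the next null-hint request would start 680 bytes later.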
After the quick fix, the boot gets farther, leading us to:
2) Error message: "Unhandled Exception 0x00000080"
Bug: Trap 0 (entry 0x80 in the table, i.e. syscall_trap_4x) is undefined. This is because the SunOS bootloader installs the trap by writing code in the trap table, but the trap table is in the .text section of OpenBIOS. Thus the trap 0 handler simply jumps to "bug".
Quick fix: Move the trap table to the .data section. Insert a "b entry; nop; nop; nop;" before "bug:".
Shortcomings: Requires the extra "b entry" code. Allows the only VM copy of the trap table to be permanently changed. OpenBIOS should copy the read-only trap table to read-write memory (and update %tbr) upon reset/entry.
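For reference, "entry 0x80" follows from the SPARC V8 trap table layout: each entry is 16 bytes (four instructions), and `ta 0` raises trap type 0x80, so its handler starts 0x80 * 16 = 0x800 bytes into the 4 KB table. The sketch below is hosted C with hypothetical helper names. It shows that arithmetic and the suggested copy into writable, 4 KB-aligned memory. Actually pointing %tbr at the copy requires a `wr` instruction in assembly and is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TRAP_ENTRY_BYTES   16u   /* 4 instructions per trap table entry */
#define TRAP_TABLE_ENTRIES 256u
#define TRAP_TABLE_BYTES   (TRAP_ENTRY_BYTES * TRAP_TABLE_ENTRIES) /* 4 KB */

/* Byte offset of trap type tt in the table (tt fills %tbr bits 11:4). */
uint32_t trap_entry_offset(uint32_t tt)
{
    return (tt & 0xffu) * TRAP_ENTRY_BYTES;
}

/* Copy a read-only trap table into writable memory. The copy is aligned
 * to 4 KB because the TBA field of %tbr holds only address bits 31:12.
 * Returns NULL if the allocation fails. */
void *copy_trap_table(const void *rom_table)
{
    void *ram = aligned_alloc(4096, TRAP_TABLE_BYTES);
    if (ram != NULL)
        memcpy(ram, rom_table, TRAP_TABLE_BYTES);
    return ram;
}
```

Once %tbr points at the copy, a bootloader writing into the table (as the SunOS loader does) modifies only the RAM copy, and the ROM image stays intact.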
3) #2 above actually exposes another bug. The write to the read-only trap table does not cause an access violation -- instead, it silently fails. The "std" instruction at 0x403e6c in the bootloader has no effect.
Bug: Uncertain. It could be a systemic bug in qemu, but it appears that the VM's MMU believes that the page is writable. That means the access protection flags are not being set in the VM's MMU for pages mapped to ROM: it treats everything as rwx.
Fix?: The VM's MMU should have the access protection flags properly set for each ROM section. This should probably be done within OpenBIOS. E.g., .text should be r-x, .data should probably be rwx, etc.
This is the one fix I'm really not sure how to implement. Any suggestions? This may be a problem that only affects this bootloader, so fixing #2 above may be all that's strictly necessary. But I'm not positive that this bug doesn't have other ill effects I haven't found yet.
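As a starting point for per-section protections, the SPARC V8 reference MMU encodes permissions in the 3-bit ACC field of each page table entry. The sketch below assumes the standard SRMMU PTE layout: PPN in bits 31:8, C (cacheable) in bit 7, ACC in bits 4:2, and entry type 2 in bits 1:0. `make_pte()` is a hypothetical helper, not the code in arch/sparc32/entry.S. Under this encoding, supervisor-only .text could use ACC 6 (r-x) and .data ACC 7 (rwx).

```c
#include <assert.h>
#include <stdint.h>

/* SPARC V8 reference MMU access-permission codes (PTE bits 4:2). */
enum srmmu_acc {
    ACC_RO_RO   = 0, /* user R,    supervisor R   */
    ACC_RW_RW   = 1, /* user RW,   supervisor RW  */
    ACC_RX_RX   = 2, /* user RX,   supervisor RX  */
    ACC_RWX_RWX = 3, /* user RWX,  supervisor RWX */
    ACC_X_X     = 4, /* user X,    supervisor X   */
    ACC_RO_RW   = 5, /* user R,    supervisor RW  */
    ACC_NA_RX   = 6, /* user none, supervisor RX  */
    ACC_NA_RWX  = 7, /* user none, supervisor RWX */
};

#define SRMMU_ET_PTE 0x2u      /* entry type 2 = page table entry */
#define SRMMU_CACHE  (1u << 7) /* C bit: cacheable */

/* Build a PTE for a 4 KB-aligned physical address (up to 36 bits). */
uint32_t make_pte(uint64_t pa, enum srmmu_acc acc, int cacheable)
{
    uint32_t ppn = (uint32_t)(pa >> 12); /* physical page number */
    return (ppn << 8)
         | (cacheable ? SRMMU_CACHE : 0u)
         | ((uint32_t)acc << 2)
         | SRMMU_ET_PTE;
}
```

Note that ACC 6 still allows supervisor reads of .text, which matters if the compiler places jump tables there.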
At any rate, fixing #2 gets us still further, to:
4) Error messages: "obp_devopen(sd(0,0,0):d) = 0xffd8e270 obp_inst2pkg(fd 0xffd8e270) = 0xffd57f44 obp_getprop(0xffd57f44, device_type) (not found)"
Bug: The OpenBIOS "interpose" implementation is not transparent to non-interposition-aware code (in violation of the interposition spec). The inst2pkg call in this sequence returns the phandle for /packages/misc-files, instead of the proper phandle.
Quick fix: Comment out the "interpose disk-label" lines in ob_sd_open.
Shortcomings: It disables disk-label. The correct fix is to fix the underlying problem with interposition, but I'm not sure exactly what it is. Could someone help?
Fixing #4 gets us quite a bit further, until:
5) Error message: "Unhandled Exception 0x00000009 PC = 0xf0138b20 NPC = 0xf0138b24 Stopping execution"
Bug: The instruction is trying to read from 0xfd020000+4, which is an invalid address. OBP does not map this address by default on Sun hardware, so the bootloader must either (a) be trying to map this address and failing silently or (b) be skipping the mapping for some reason. The instruction is hard-coded to look at this absolute address.
Fix: Unknown. This may be another instance of writes silently failing, hence my interest in #3 above. It could also be a side-effect of the quick fix for #4.
I'm happy to work further on these fixes and put them into patch form. Could someone point me to how I'd do that?
First, thanks for the detailed report!
- Error message: "kmem_alloc failed, nbytes 680"
Shortcomings: The quick fix ignores the issue of free() and doesn't remove memory from the virtual-memory/available node.
Yes, a real memory allocator/deallocator would be nice. Your code could be a starting point, though.
- Error message: "Unhandled Exception 0x00000080"
Shortcomings: Requires the extra "b entry" code. Allows the only VM copy of the trap table to be permanently changed. OpenBIOS should copy the read-only trap table to read-write memory (and update %tbr) upon reset/entry.
I think an easier solution is to copy the whole ROM to RAM on boot. I'll make a patch for that.
- #2 above actually exposes another bug. The write to the read-only trap table does not cause an access violation -- instead, it silently fails. The "std" instruction at 0x403e6c in the bootloader has no effect.
Bug: Uncertain. It could be a systemic bug in qemu, but it appears that the VM's MMU believes that the page is writable. That means the access protection flags are not being set in the VM's MMU for pages mapped to ROM: it treats everything as rwx.
Fix?: The VM's MMU should have the access protection flags properly set for each ROM section. This should probably be done within OpenBIOS. E.g., .text should be r-x, .data should probably be rwx, etc.
This is the one fix I'm really not sure how to implement. Any suggestions? This may be a problem that only affects this bootloader, so fixing #2 above may be all that's strictly necessary. But I'm not positive that this bug doesn't have other ill effects I haven't found yet.
The protections are currently RWX for everything. At first I tried much stricter permissions, but I had to loosen them because, for example, Linux wants to write to the romvec structure. Newer GCC versions also assume that code is always readable, as on x86, and place some jump tables in .text.
The MMU setup is done in arch/sparc32/entry.S. The comments are misleading (reflecting the earlier stricter permissions), sorry for that.
But after the ROM-RAM copy change, the current permissions should be OK.
- Error messages:
"obp_devopen(sd(0,0,0):d) = 0xffd8e270 obp_inst2pkg(fd 0xffd8e270) = 0xffd57f44 obp_getprop(0xffd57f44, device_type) (not found)"
Shortcomings: It disables disk-label. The correct fix is to fix the underlying problem with interposition, but I'm not sure exactly what it is. Could someone help?
Sorry, I'm not so familiar with Forth internals. Stepan?
- Error message:
"Unhandled Exception 0x00000009 PC = 0xf0138b20 NPC = 0xf0138b24 Stopping execution"
Bug: The instruction is trying to read from 0xfd020000+4, which is an invalid address. OBP does not map this address by default on Sun hardware, so the bootloader must either (a) be trying to map this address and failing silently or (b) be skipping the mapping for some reason. The instruction is hard-coded to look at this absolute address.
Maybe there are hardware registers at that location? They could belong to some known device (for example Slavio or the IOMMU) that is currently placed somewhere else, or to something previously unknown, such as system control registers.
The dirty fix would be to map some RAM at that location in OpenBIOS and hope that the accesses aren't important. The real fix would be to find out why the access happens and determine the correct remedy (new devices, IO address changes in Qemu, etc.).
I'm happy to work further on these fixes and put them into patch form. Could someone point me to how I'd do that?
Just use "svn diff". Or "diff -rupN" between the clean source tree and your fixed one.
* Blue Swirl blueswir1@hotmail.com [070217 09:57]:
- Error messages:
"obp_devopen(sd(0,0,0):d) = 0xffd8e270 obp_inst2pkg(fd 0xffd8e270) = 0xffd57f44 obp_getprop(0xffd57f44, device_type) (not found)"
Sorry, I'm not so familiar with Forth internals. Stepan?
So what's the correct behavior? Is inst2pkg supposed to return the phandle for sd even though the device was interposed?
A quick look reveals we might need a special case for ihandle>phandle in the case of an interposed device.
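One way to express the suggested special case is sketched below. It assumes a hypothetical instance structure in which an interposed instance links to the instance it was interposed on; the real OpenBIOS instance layout, and the direction of that link, may differ. `inst2pkg_transparent()` is likewise a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an open instance chain. An interposed package
 * (e.g. disk-label on top of sd) gets its own instance, linked to the
 * instance it was interposed on. */
struct instance {
    unsigned phandle;          /* package this instance belongs to */
    int interposed;            /* nonzero if created by "interpose" */
    struct instance *parent;   /* instance it was interposed on */
};

/* Interposition-transparent ihandle>phandle: skip interposed instances so
 * that code unaware of interposition sees the device it actually opened.
 * Returns 0 for a NULL or fully interposed chain. */
unsigned inst2pkg_transparent(const struct instance *ih)
{
    while (ih != NULL && ih->interposed)
        ih = ih->parent;
    return ih != NULL ? ih->phandle : 0;
}
```

In the failing sequence above, this would make obp_inst2pkg return the sd package instead of the interposed package, so the device_type lookup would find the disk's property.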
stepan@coresystems.de wrote:
So what's the correct behavior? Is inst2pkg supposed to return the phandle for sd even though the device was interposed?
A quick look reveals we might need a special case for ihandle>phandle in the case of an interposed device.
This may be related to a bug I found a long time ago, but never had the mandate to fix.
It had to do with the way that the search for a device is implemented; it is a recursive procedure and very tricky to understand or to debug.
In essence, the bug was that under certain circumstances the search would reach the end of the list of devices and, instead of returning a failure indication, would return the phandle of the last device in the list. (I seem to recall that one such circumstance was a match on the name of the sought-after device but not on its address, e.g. searching for "memory@bf000000" when there were device nodes called "memory" but none with that address.)
If the device whose phandle was erroneously returned matches the last one in the output of the show-devs command, then this might be an after-effect of that bug.
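For comparison, a search that cannot fall through to the last node looks like the sketch below. It uses a hypothetical node structure and a plain depth-first walk with textual unit-address comparison, which is simpler than real Open Firmware path resolution (one path component at a time, with numeric address matching). The point is only that every failure path returns NULL explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device node: siblings in a linked list, children below. */
struct node {
    const char *name;
    const char *unit;             /* unit address as text, or NULL */
    struct node *child, *sibling;
};

/* Depth-first search for name@unit (unit == NULL matches any address).
 * A node whose name matches but whose address does not is skipped, and
 * the function never returns "the last node visited" on failure. */
struct node *find_node(struct node *n, const char *name, const char *unit)
{
    for (; n != NULL; n = n->sibling) {
        int name_ok = strcmp(n->name, name) == 0;
        int unit_ok = unit == NULL ||
                      (n->unit != NULL && strcmp(n->unit, unit) == 0);
        if (name_ok && unit_ok)
            return n;
        struct node *hit = find_node(n->child, name, unit);
        if (hit != NULL)
            return hit;
    }
    return NULL; /* explicit failure indication */
}
```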
I apologize in advance if this advice is "off the mark". I am no longer "in the game", and have only memory that is getting increasingly remote to rely on...
On 2/15/07, Peter pjcreath+openbios@gmail.com wrote:
- Error message:
"Unhandled Exception 0x00000009 PC = 0xf0138b20 NPC = 0xf0138b24 Stopping execution"
Bug: This is a combination of a bug in the SunOS kernel (that never occurs in real hardware) and in qemu. SunOS is trying to printf() an error message before mapping a required device (_utimers). This error message doesn't occur on real hardware, so this access exception doesn't occur on a real SparcStation. The error message is due to an invalid machine ID specified by qemu in the nvram. Qemu is setting the machine ID to 0x80, which is not recognized by SunOS. SunOS recognizes only 0x71 and 0x72. According to idprom.h (quoted in part at http://www.sunmanagers.org/archives/1993/0050.html), these correspond to:
#define IDM_SUN4M_690 0x71 /* SPARCsystem 600 series */
#define IDM_SUN4M_50 0x72 /* Campus 2 */
Fix: Change qemu-0.9.0/hw/sun4m.c:154 to set the machine ID to 0x72 (the value of a SparcStation 10) instead of 0x80. It may be preferable to make qemu more configurable in this regard, but this will do for now. I've forwarded this information to the qemu-devel mailing list.
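The failure mode can be modeled in a few lines. This is a sketch of the behavior described above, not SunOS source: only the two IDs from idprom.h come from the message, and `sunos_knows_machine()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Machine IDs from idprom.h, as quoted above. */
#define IDM_SUN4M_690 0x71 /* SPARCsystem 600 series */
#define IDM_SUN4M_50  0x72 /* Campus 2 */

/* Model of the check: this SunOS release recognizes only these two sun4m
 * IDs, so any other value (such as qemu's 0x80) takes the error path that
 * calls printf() before _utimers has been mapped. */
int sunos_knows_machine(uint8_t id)
{
    switch (id) {
    case IDM_SUN4M_690:
    case IDM_SUN4M_50:
        return 1;
    default:
        return 0;
    }
}
```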
There's now another subsequent access exception that I still need to chase down.
6) Error message: "BAD TRAP: cpu=0 type=9 rp=fd008f0c addr=feff8008 mmu_fsr=3a6 rw=2 MMU sfsr=3a6: Invalid Address on supv data store at level 3 regs at fd008f0c: psr=4400fc7 pc=f00053f4 npc=f00053f8 ..."
Bug: Real sun4m hardware registers 4 CPU-specific interrupts followed by a system-wide interrupt, regardless of the number of CPUs installed. The same is true of counters. SunOS looks at the 5th interrupt for the system-wide interrupt. OBP, since there's only one CPU, just sets up one CPU-specific interrupt followed by the system-wide interrupt, so there is no 5th interrupt. See the comment on "NCPU" at http://stuff.mit.edu/afs/athena/astaff/project/opssrc/sys.sunos/sun4m/devaddr.h.
Fix: In obp_interrupt_init() and obp_counter_init(), register 4 CPU-specific interrupts (or counters) before allocating the system-wide one. The kernel will then map the 5th interrupt to the system-wide interrupt.
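A sketch of the resulting "reg" layout follows. It assumes the per-CPU register blocks are spaced one 4 KB page apart (an assumption, not stated in the message) and uses the 0xe00000/0xe10000 bases and 0x10 size that appear in the SS-5 dump later in this thread. `build_sun4m_regs()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One "reg" entry as printed in an OBP dump: space.address.size */
struct reg_entry { uint32_t space, addr, size; };

#define SUN4M_NCPU     4        /* CPU slots SunOS expects (NCPU) */
#define PERCPU_STRIDE  0x1000u  /* assumption: one page per CPU */
#define SYSWIDE_OFFSET 0x10000u /* 0xe10000 - 0xe00000 in the SS-5 dump */

/* Fill four CPU-specific entries followed by the system-wide one, so the
 * system-wide registers are always the 5th entry regardless of how many
 * CPUs are present. Returns the number of entries written. */
int build_sun4m_regs(uint32_t base, uint32_t size,
                     struct reg_entry out[SUN4M_NCPU + 1])
{
    for (int cpu = 0; cpu < SUN4M_NCPU; cpu++)
        out[cpu] = (struct reg_entry){ 0, base + cpu * PERCPU_STRIDE, size };
    out[SUN4M_NCPU] = (struct reg_entry){ 0, base + SYSWIDE_OFFSET, size };
    return SUN4M_NCPU + 1;
}
```

With base 0xe00000, the entries land at 0xe00000, 0xe01000, 0xe02000, 0xe03000 and 0xe10000.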
7) Error message: "BAD TRAP: cpu=0 type=9 rp=fd008d8c addr=7ff000 mmu_fsr=126 rw=1 MMU sfsr=126: Invalid Address on supv data fetch at level 1 regs at fd008d8c: psr=4000cc4 pc=f01339a4 npc=f01339a8 ..."
Bug: The command-line arguments passed to the kernel are fixed at address 0x7FF000 (CMDLINE_ADDR, passed from qemu via nv_info.cmdline), which is no longer mapped by the time the kernel looks at the boot arguments. A regular Sun boot ROM will copy this into mapped memory.
Fix: Copy the string in nv_info.cmdline to an OpenBIOS global (since OpenBIOS remains mapped) in ob_nvram_init().
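A minimal sketch of that copy is shown below. It assumes the command-line length is known, and it uses a hypothetical fixed-size global, `ob_cmdline`, and helper, `ob_save_cmdline()`. The buffer size is arbitrary.

```c
#include <assert.h>
#include <string.h>

#define OB_CMDLINE_MAX 256 /* hypothetical size limit */

/* Hypothetical OpenBIOS global: it lives in OpenBIOS's own data, which
 * stays mapped after the kernel takes over, unlike CMDLINE_ADDR. */
static char ob_cmdline[OB_CMDLINE_MAX];

/* Copy len bytes of the qemu-provided command line (not necessarily
 * NUL-terminated) into ob_cmdline, truncating if needed. */
const char *ob_save_cmdline(const char *src, size_t len)
{
    if (len >= sizeof ob_cmdline)
        len = sizeof ob_cmdline - 1;
    memcpy(ob_cmdline, src, len);
    ob_cmdline[len] = '\0';
    return ob_cmdline;
}
```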
8) Error message: "BAD TRAP: cpu=0 type=9 rp=fd008dec addr=1019000 mmu_fsr=126 rw=1 MMU sfsr=126: Invalid Address on supv data fetch at level 1 regs at fd008dec: psr=4400cc5 pc=f0131680 npc=f0131684 ..."
Bug: The dumb memory allocator from bug #1 was allocating a range that the SunOS 4 kernel doesn't like.
Fix: Mimic the Sun boot ROM allocator: the top of the heap should be at 0xFFEDA000, and allocations should return descending addresses. For example, for a request of 0x1000 bytes, the first returned pointer should be 0xFFED9000.
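A sketch of such a descending allocator follows. The 0xFFEDA000 top comes from the message above. Rounding each request up to a 4 KB page is an assumption, and the sketch never frees memory or checks for running into other mappings.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_TOP  0xFFEDA000u /* top of the heap, per the Sun boot ROM */
#define HEAP_PAGE 0x1000u     /* assumption: requests are page-rounded */

static uint32_t heap_cur = HEAP_TOP;

/* Return the base of a size-byte block, growing the heap downward. */
uint32_t alloc_descending(uint32_t size)
{
    uint32_t rounded = (size + HEAP_PAGE - 1) & ~(HEAP_PAGE - 1);
    heap_cur -= rounded;
    return heap_cur;
}
```

Under these assumptions, a first request for 0x1000 bytes returns 0xFFED9000, and a following request for 680 bytes returns 0xFFED8000.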
9) Error message: "BAD TRAP: cpu=0 type=9 rp=fd008d2c addr=b1b91000 mmu_fsr=126 rw=1 MMU sfsr=126: Invalid Address on supv data fetch at level 1 regs at fd008d2c: psr=4900cc3 pc=f0142c04 npc=f0142c08 ..."
Bug: The precise underlying cause isn't clear. The bug appears to stem from a difference between OBP's behavior and stock Sun behavior.
Fix: Add the "cache-physical?" property to the CPU node in ob_nvram_init() and bump the "mmu-nctx" property up to 4096 (from 256).
10) Error message: "BAD TRAP: cpu=0 type=9 rp=fd008d8c addr=fff8910f mmu_fsr=326 rw=1 MMU sfsr=326: Invalid Address on supv data fetch at level 3 regs at fd008d8c: psr=4900cc4 pc=f0137f58 npc=f0137fc ..."
Bug: Memory is not mapped by OBP the way SunOS expects it. In particular, it expects that 0xFF843000 (Syslimit) will be mapped down to the L3 level, where it points to 0 (not actually mapped to physical memory). hat_findpte() is returning an invalid address on the emulator, whereas the address of an empty PTE is returned by real hardware. The invalid address comes from the translation table in use by the MMU when load_tmpptes() sets up the PTE table used by hat_findpte().
Fix: Unknown. The memory mapping used by OBP appears to be scattered all over, and cleaning it up looks like a substantial task. It's possible that actually implementing obp_dumb_memunmap() and obp_dumb_memfree() (and updating the available "memory" and "virtual-memory" properties) would make this problem go away.
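To make "mapped down to the L3 level" concrete: the SPARC reference MMU resolves a virtual address through three table levels with 4 KB pages. The helper below (a hypothetical name) splits an address into its table indices. For Syslimit, 0xFF843000, it yields level-1 index 0xFF, level-2 index 0x21 and level-3 index 0x03, so both the level-1 and level-2 entries on that path must be page table descriptors for an (empty) level-3 PTE to exist for hat_findpte() to return.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC reference MMU three-level walk with 4 KB pages:
 * level 1 indexes VA[31:24] (256 entries), level 2 VA[23:18] (64 entries),
 * level 3 VA[17:12] (64 entries); VA[11:0] is the offset in the page. */
struct srmmu_index { unsigned l1, l2, l3, offset; };

struct srmmu_index srmmu_split(uint32_t va)
{
    struct srmmu_index ix = {
        .l1 = va >> 24,
        .l2 = (va >> 18) & 0x3f,
        .l3 = (va >> 12) & 0x3f,
        .offset = va & 0xfff,
    };
    return ix;
}
```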
A patch file resolving bugs 1-9 is forthcoming.
Here's a patch file based on the latest snapshot that addresses everything up through #9, excluding #5 (which requires a tweak to qemu). It includes the patch for #2 written by Blue Swirl, which copies the entire contents of the ROM into RAM.
* Peter pjcreath+openbios@gmail.com [070308 23:07]:
Here's a patch file based on the latest snapshot that addresses everything up through #9, excluding #5 (which requires a tweak to qemu). It includes the patch for #2 written by Blue Swirl, which copies the entire contents of the ROM into RAM.
This is amazing work. Thank you very much. It's applied in the latest svn revision.
Stefan
Here's a patch file based on the latest snapshot that addresses everything up through #9, excluding #5 (which requires a tweak to qemu). It includes the patch for #2 written by Blue Swirl, which copies the entire contents of the ROM into RAM.
Thank you for your work, it would be great to get more OSes running on Qemu/OpenBIOS.
Just one comment: On my Sun4m machine (SS-5) the OBP properties for interrupt and counter reg nodes are as follows:
$ cat obio/interrupt@0,e00000/reg
00000000.00e00000.00000010.00000000.00e10000.00000010
$ cat obio/counter@0,d00000/reg
00000000.00d00000.00000010.00000000.00d10000.00000010
Your patch deviates from this. If you want to emulate an SS-10, it may be better to add it as a new sub-machine type to Qemu and OpenBIOS. I don't know how much the SS-5 and SS-10 differ. For example, which MMU does the SS-10 have? Is it also Swift (SRMMU)?
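For reading dumps like the ones above, each "reg" entry is three 32-bit cells: address space, physical address, size. A small parser (hypothetical, assuming the dotted 8-digit hex format shown) makes the difference from the patch visible: the SS-5 nodes have two entries, one CPU-specific and one system-wide, whereas the patched layout has five.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct reg_triple { uint32_t space, addr, size; };

/* Parse a dump such as "00000000.00e00000.00000010.00000000.00e10000.00000010"
 * into (space, address, size) triples. Returns the number of complete
 * triples stored, at most max. Stops at the first malformed cell. */
int parse_reg_dump(const char *s, struct reg_triple *out, int max)
{
    int n = 0;
    while (n < max && *s != '\0') {
        uint32_t cell[3];
        for (int i = 0; i < 3; i++) {
            char *end;
            cell[i] = (uint32_t)strtoul(s, &end, 16);
            if (end == s)
                return n;                  /* no hex digits: stop */
            s = (*end == '.') ? end + 1 : end;
        }
        out[n++] = (struct reg_triple){ cell[0], cell[1], cell[2] };
    }
    return n;
}
```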
This deviation is the documented fix for one of the SunOS crashes. Sun4m requires 4 CPU-specific interrupts and counters (regardless of the number of CPUs) before the system-wide interrupt and counter. The OBP registers you point out were broken in this regard: they mapped only a single CPU-specific interrupt and counter. That allowed OBP to boot, say, Linux, but not SunOS.
On 3/10/07, Blue Swirl blueswir1@hotmail.com wrote:
-- OpenBIOS http://openbios.org/ Mailinglist: http://lists.openbios.org/mailman/listinfo Free your System - May the Forth be with you
This deviation is the documented fix for one of the SunOS crashes. Sun4m requires 4 CPU-specific interrupts and counters (regardless of the number of CPUs) before the system-wide interrupt and counter. The OBP registers you point out were broken in this regard: they mapped only a single CPU-specific interrupt and counter. That allowed OBP to boot, say, Linux, but not SunOS.
But the register dump with only a single interrupt/counter comes from a real SparcStation 5 with the real OpenBootProm (OBP) made by Sun; it was not produced with OpenBIOS. It can't be broken or in violation of the Sun4m spec, and I would be surprised if my machine had not originally run SunOS. My guess is that the version of SunOS you are trying to run predates even the SS-5 if it requires things to be like the SS-10. Or it could be targeted at an SMP system with 4 CPUs. Or maybe in those days SunOS was specific to each machine type; I can't remember anymore.
So to help you make these changes, I'd like to add a new machine for the SS-10. That way these changes can be tied to the machine model, if necessary. I want to keep Qemu's model of the SS-5 as close to a real SS-5 as possible, and this register change is a true deviation from that.
About the terminology: Sun4m was a machine class, or super-architecture. There were several implementations, including the SparcStation 5 (which Qemu currently emulates), the older SparcStation 10, and many others. OBP (OpenBootProm) was Sun's predecessor to, or early implementation of, Open Firmware. Every machine had its own OBP version, some older, some newer. OpenBIOS is a free re-implementation, emulating version 3 of the interface.