First, thanks for the detailed report!
- Error message: "kmem_alloc failed, nbytes 680"
Bug: obp_dumb_memalloc is a bit too dumb. It needs to pick an address if passed a null address. (According to the comment in the allocator in OpenSolaris prom_alloc.c (see http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/promif/ieee1275/sun4/prom_alloc.c), "If virthint is zero, a suitable virt is chosen.")
Quick fix: If passed a null address, start doling out addresses at 10MB and increment by size.
Shortcomings: The quick fix ignores the issue of free() and doesn't remove memory from the virtual-memory/available node.
Yes, a real memory allocator/deallocator would be nice. Your code could be a starting point, though.
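For reference, the quick fix described above could look roughly like the sketch below. It is only an illustration under stated assumptions: pick_virt, VIRT_POOL_START and ALLOC_ALIGN are hypothetical names (not from the OpenBIOS tree), and the 8-byte rounding is my own addition. Like the quick fix, it never frees anything and does not touch the virtual-memory/available node.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; they are not taken from the OpenBIOS tree. */
#define VIRT_POOL_START 0x00a00000u /* 10MB, as in the quick fix */
#define ALLOC_ALIGN     8u          /* assumed alignment, not from the report */

/* Next unused virtual address in the pool. */
static uint32_t next_virt = VIRT_POOL_START;

/* Pick the virtual address for an allocation of `size` bytes.
 * A non-zero hint is returned unchanged; a zero hint takes the next
 * address from the pool and advances the pool by the rounded-up size. */
static uint32_t pick_virt(uint32_t virthint, uint32_t size)
{
    uint32_t rounded, va;

    if (virthint != 0)
        return virthint;

    /* Guard against wrap-around when rounding and when advancing. */
    assert(size <= UINT32_MAX - (ALLOC_ALIGN - 1));
    rounded = (size + ALLOC_ALIGN - 1) & ~(ALLOC_ALIGN - 1);
    assert(rounded <= UINT32_MAX - next_virt);

    va = next_virt;
    next_virt += rounded;
    return va;
}
```

With this, a null-hint request for 680 bytes would get 10MB, and the next null-hint request would get 10MB + 680. A caller-supplied hint passes through without moving the pool pointer.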
- Error message: "Unhandled Exception 0x00000080"
Bug: Trap 0 (entry 0x80 in the table, i.e. syscall_trap_4x) is undefined. This is because the SunOS bootloader installs the trap by writing code in the trap table, but the trap table is in the .text section of OpenBIOS. Thus the trap 0 handler simply jumps to "bug".
Quick fix: Move the trap table to the .data section. Insert a "b entry; nop; nop; nop;" before "bug:".
Shortcomings: Requires the extra "b entry" code. Allows the only VM copy of the trap table to be permanently changed. OpenBIOS should copy the read-only trap table to read-write memory (and update %tbr) upon reset/entry.
I think an easier solution is to copy the whole ROM to RAM on boot. I'll make a patch for that.
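To make the idea concrete, here is a host-side sketch of the narrower variant suggested in the report: keep a read-only master copy of the trap table, copy it to writable memory on reset, and let the bootloader patch only the copy. Every name here (trap_entry, rom_trap_table, ram_trap_table, relocate_trap_table, install_trap) is hypothetical, the ROM table is a zero-filled stand-in, and the real code would also have to point %tbr at the copy, which plain C cannot show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRAP_ENTRIES    256 /* SPARC V8 trap table: 256 entries */
#define INSNS_PER_ENTRY 4   /* each entry holds 4 instructions (16 bytes) */

typedef struct {
    uint32_t insn[INSNS_PER_ENTRY];
} trap_entry;

/* Stand-in for the read-only table in .text (zero-filled here). */
static const trap_entry rom_trap_table[TRAP_ENTRIES] = { { { 0 } } };

/* Writable copy that a bootloader would be allowed to patch. */
static trap_entry ram_trap_table[TRAP_ENTRIES];

/* On reset/entry: refresh the RAM copy from the pristine ROM table.
 * The real firmware would then load the copy's address into %tbr. */
static trap_entry *relocate_trap_table(void)
{
    memcpy(ram_trap_table, rom_trap_table, sizeof ram_trap_table);
    return ram_trap_table;
}

/* What the bootloader does when it installs a handler: overwrite the
 * four instruction words of entry `tt` in the writable table. */
static void install_trap(trap_entry *table, unsigned tt,
                         const uint32_t handler[INSNS_PER_ENTRY])
{
    assert(tt < TRAP_ENTRIES);
    memcpy(table[tt].insn, handler, sizeof table[tt].insn);
}
```

Because the copy is refreshed from the ROM table on every reset, a patched entry 0x80 does not survive into the next boot, which addresses the "permanently changed" shortcoming. Copying the whole ROM to RAM gives the same effect without special-casing the trap table.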
- #2 above actually exposes another bug. The write to the read-only trap table does not cause an access violation -- instead, it silently fails. The "std" instruction at 0x403e6c in the bootloader has no effect.
Bug: Uncertain. It could be a systemic bug in qemu, but it appears that the VM's MMU believes that the page is writable. That means that the VM's MMU is not having the access protection flags set for pages mapped to ROM. It thinks everything is rwx.
Fix?: The VM's MMU should have the access protection flags properly set for each ROM section. This should probably be done within OpenBIOS. E.g., .text should be r-x, .data should probably be rwx, etc.
This is the one fix I'm really not sure how to implement. Any suggestions? This may be a problem that only affects this bootloader, so fixing #2 above may be all that's strictly necessary. But I'm not positive that this bug doesn't have other ill effects I haven't found yet.
The protections are currently RWX for everything. At first I tried much stricter permissions, but I had to loosen them because, for example, Linux wants to write to the romvec structure. Newer GCCs also assume that code can always be read, as on x86, and place some jump tables in .text.
The MMU setup is done in arch/sparc32/entry.S. The comments are misleading (reflecting the earlier stricter permissions), sorry for that.
But after the ROM-RAM copy change, the current permissions should be OK.
- Error messages:
"obp_devopen(sd(0,0,0):d) = 0xffd8e270
obp_inst2pkg(fd 0xffd8e270) = 0xffd57f44
obp_getprop(0xffd57f44, device_type) (not found)"
Bug: The OpenBIOS "interpose" implementation is not transparent to non-interposition-aware code (in violation of the interposition spec). The inst2pkg call in this sequence returns the phandle for /packages/misc-files, instead of the proper phandle.
Quick fix: Comment out the "interpose disk-label" lines in ob_sd_open.
Shortcomings: It disables disk-label. The correct fix is to fix the underlying problem with interposition, but I'm not sure exactly what it is. Could someone help?
Sorry, I'm not so familiar with Forth internals. Stepan?
- Error message:
"Unhandled Exception 0x00000009 PC = 0xf0138b20 NPC = 0xf0138b24 Stopping execution"
Bug: The instruction is trying to read from 0xfd020000+4, which is an invalid address. This address isn't mapped by OBP by default on Sun hardware, so the bootloader must be trying to (a) map this address and failing silently or (b) skipping the mapping for some reason. The instruction is hard-coded to look at this absolute address.
Fix: Unknown. This may be another instance of writes silently failing, hence my interest in #3 above. It could also be a side-effect of the quick fix for #4.
Maybe there are hardware registers at that location? They could belong to known hardware (for example Slavio or the IOMMU) that is currently mapped somewhere else, or to something previously unknown, for example system control registers.
The dirty fix would be to map some RAM to the location in OpenBIOS and hope that the accesses aren't important. The real fix would be to find out why the access happens and determine the correct remedy (new devices, IO address changes in Qemu, etc.).
I'm happy to work further on these fixes and put them into patch form. Could someone point me to how I'd do that?
Just use "svn diff". Or "diff -rupN" between the clean source tree and your fixed one.