[OpenBIOS] Sun OBP bugs in 1.0RC1

Blue Swirl blueswir1 at hotmail.com
Sat Feb 17 09:57:35 CET 2007


First, thanks for the detailed report!

>1) Error message: "kmem_alloc failed, nbytes 680"
>
>Bug: obp_dumb_memalloc is a bit too dumb.  It needs to pick an address
>if passed a null address.  (According to the comment in the allocator
>in OpenSolaris prom_alloc.c (see
><http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/psm/promif/ieee1275/sun4/prom_alloc.c>),
>"If virthint is zero, a suitable virt is chosen.")
>
>Quick fix: If passed a null address, start doling out addresses at
>10MB and increment by size.
>
>Shortcomings: The quick fix ignores the issue of free() and doesn't
>remove memory from the virtual-memory/available node.

Yes, a real memory allocator/deallocator would be nice. Your code could be a 
starting point, though.
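
For reference, the quick fix described above amounts to a bump allocator. A minimal sketch in C, assuming hypothetical names (dumb_memalloc_sketch, BUMP_BASE, BUMP_ALIGN) and an 8-byte alignment that the report does not specify; it is not the actual OpenBIOS code and, like the quick fix, it never frees anything and does not update the available-memory node:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the OpenBIOS source. */
#define BUMP_BASE  (10u * 1024u * 1024u) /* first address handed out: 10MB, as in the quick fix */
#define BUMP_ALIGN 8u                    /* assumed alignment, not stated in the report */

static uint32_t next_free = BUMP_BASE;   /* cursor: next address to hand out */

/*
 * Return the caller's hint when it is non-null. Otherwise pick the next
 * unused address and advance the cursor by the size, rounded up to
 * BUMP_ALIGN. Returns 0 if the request would wrap the 32-bit address space.
 */
uint32_t dumb_memalloc_sketch(uint32_t virt_hint, uint32_t size)
{
    uint32_t rounded, addr;

    /* The rounding below only works for power-of-two alignments. */
    assert((BUMP_ALIGN & (BUMP_ALIGN - 1u)) == 0);

    if (virt_hint != 0)
        return virt_hint;                /* caller chose the address */

    if (size > UINT32_MAX - (BUMP_ALIGN - 1u))
        return 0;                        /* rounding up would overflow */
    rounded = (size + BUMP_ALIGN - 1u) & ~(BUMP_ALIGN - 1u);

    if (rounded > UINT32_MAX - next_free)
        return 0;                        /* out of address space */

    addr = next_free;
    next_free += rounded;
    return addr;
}
```

With this sketch, the failing 680-byte request with a null hint would get the address at 10MB, and the next null-hint request would land 680 bytes above it. A real allocator would also have to track freed ranges and remove the memory from the available node.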

>2) Error message: "Unhandled Exception 0x00000080"
>
>Bug: Trap 0 (entry 0x80 in the table, i.e. syscall_trap_4x) is
>undefined.  This is because the SunOS bootloader installs the trap by
>writing code in the trap table, but the trap table is in the .text
>section of OpenBIOS.  Thus the trap 0 handler simply jumps to "bug".
>
>Quick fix: Move the trap table  to the .data section.  Insert a "b
>entry; nop; nop; nop;" before "bug:".
>
>Shortcomings: Requires the extra "b entry" code.  Allows the only VM
>copy of the trap table to be permanently changed.  OpenBIOS should
>copy the read-only trap table to read-write memory (and update %tbr)
>upon reset/entry.

I think an easier solution is to copy the whole ROM to RAM on boot. I'll make a 
patch for that.
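
The idea can be modelled in portable C. This is only a sketch under assumptions: the function names are hypothetical, the "ROM" and "RAM" are ordinary arrays supplied by the caller, and the real patch would also need to remap or jump to the copy, which is not shown. The only hardware fact used is that a SPARC V8 trap table entry holds four instructions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAP_ENTRY_WORDS 4u /* a SPARC V8 trap table entry holds four instructions */

/* Hypothetical helper: copy a read-only image into writable memory.
 * Returns the destination, or NULL if the image does not fit. */
uint32_t *copy_image_to_ram(uint32_t *ram, size_t ram_words,
                            const uint32_t *rom, size_t rom_words)
{
    if (ram == NULL || rom == NULL || rom_words > ram_words)
        return NULL;
    memcpy(ram, rom, rom_words * sizeof *rom);
    return ram;
}

/* Hypothetical helper: overwrite one trap-table entry in the writable copy,
 * which is what the bootloader's store to the table is meant to achieve.
 * Returns 0 on success, -1 if the entry lies outside the table. */
int patch_trap_entry(uint32_t *table, size_t table_words, unsigned trap,
                     const uint32_t insns[TRAP_ENTRY_WORDS])
{
    size_t first = (size_t)trap * TRAP_ENTRY_WORDS;

    assert(insns != NULL);
    if (table == NULL || first + TRAP_ENTRY_WORDS > table_words)
        return -1;
    memcpy(&table[first], insns, TRAP_ENTRY_WORDS * sizeof *insns);
    return 0;
}
```

Once the code runs from the RAM copy, a store to entry 0x80 changes only that copy and the ROM image stays intact, so no extra "b entry" code is needed and a reset starts again from a clean table.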

>3) #2 above actually exposes another bug.  The write to the read-only
>trap table does not cause an access violation -- instead, it silently
>fails.  The "std" instruction at 0x403e6c in the bootloader has no
>effect.
>
>Bug: Uncertain.  It could be a systemic bug in qemu, but it appears
>that the VM's MMU believes that the page is writable.  That means that
>the VM's MMU is not having the access protection flags set for pages
>mapped to ROM.  It thinks everything is rwx.
>
>Fix?: The VM's MMU should have the access protection flags properly
>set for each ROM section.  This should probably be done within
>OpenBIOS.  E.g., .text should be r-x, .data should probably be rwx,
>etc.
>
>
>This is the one fix I'm really not sure how to implement.  Any
>suggestions?  This may be a problem that only affects this bootloader,
>so fixing #2 above may be all that's strictly necessary.  But I'm not
>positive that this bug doesn't have other ill effects I haven't found
>yet.

The protections are currently RWX for everything. At first I tried much 
stricter permissions, but because, for example, Linux wants to write to the 
romvec structure, I had to loosen them. Newer GCCs assume that code can 
always be read, as on x86, and place some jump tables in .text.

The MMU setup is done in arch/sparc32/entry.S. The comments are misleading 
(reflecting the earlier stricter permissions), sorry for that.

But after the ROM-RAM copy change, the current permissions should be OK.

>4) Error messages:
>"obp_devopen(sd(0,0,0):d) = 0xffd8e270
>obp_inst2pkg(fd 0xffd8e270) = 0xffd57f44
>obp_getprop(0xffd57f44, device_type) (not found)"
>
>Bug: The OpenBIOS "interpose" implementation is not transparent to
>non-interposition-aware code (in violation of the interposition spec).
>  The inst2pkg call in this sequence returns the phandle for
>/packages/misc-files, instead of the proper phandle.
>
>Quick fix: Comment out the "interpose disk-label" lines in ob_sd_open.
>
>Shortcomings: It disables disk-label.  The correct fix is to fix the
>underlying problem with interposition, but I'm not sure exactly what
>it is.  Could someone help?

Sorry, I'm not so familiar with Forth internals. Stepan?

>5) Error message:
>"Unhandled Exception 0x00000009
>PC = 0xf0138b20 NPC = 0xf0138b24
>Stopping execution"
>
>Bug: The instruction is trying to read from 0xfd020000+4, which is an
>invalid address.  This address isn't mapped by OBP by default on Sun
>hardware, so the bootloader must be trying to (a) map this address and
>failing silently or (b) skipping the mapping for some reason.  The
>instruction is hard-coded to look at this absolute address.
>
>Fix: Unknown.  This may be another instance of writes silently
>failing, hence my interest in #3 above.  It could also be a
>side-effect of the quick fix for #4.

Maybe there are hardware registers at that location? They could belong to 
some known hardware (for example Slavio or IOMMU) that is currently present 
at a different address, or to something previously unknown, for example 
system control registers.

The dirty fix would be to map some RAM to that location in OpenBIOS and hope 
that the accesses aren't important. The real fix would be to get more 
information about the reason for the access and determine the correct remedy 
(new devices or IO address changes in Qemu etc.).

>I'm happy to work further on these fixes and put them into patch form.
>  Could someone point me to how I'd do that?

Just use "svn diff". Or "diff -rupN" between the clean source tree and your 
fixed one.
