Hi folks,
So I thought it might be interesting over the weekend to see how Solaris on SPARC32 is looking at the moment, and found that I was getting about as far as Artyom was, i.e. ufsboot dies after emitting this cryptic Forth statement with missing parameters:
['] find-device catch if 2drop true else current-device device-end then swap l!
With gdb primed with a breakpoint set at obp_fortheval_v2() I was able to poke around and see what was happening at the time the call into OpenBIOS was being made. Nothing interesting there. But jumping back a frame showed some extra information being set in the o registers:
(gdb) bt
#0  obp_fortheval_v2 (str=0x1190d0 " ['] find-device catch if 2drop true else current-device device-end then swap l!") at ../arch/sparc32/romvec.c:424
#1  0x00113dd4 in ?? ()
#2  0x00113dd4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1  0x00113dd4 in ?? ()
(gdb) info regi
...
o0             0x1190d0 1151184
o1             0x49     73
o2             0x1335c0 1258944
o3             0x130794 1247124
o4             0x0      0
o5             0x0      0
...
Okay; so let's take a look at what's in these o registers:
(gdb) x/1s 0x1190d0
0x1190d0: " ['] find-device catch if 2drop true else current-device device-end then swap l!"
(gdb) x/1s 0x1335c0
0x1335c0: "/iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0:d"
(gdb) x/4xb 0x130794
0x130794: 0xff 0xff 0xff 0xff
(gdb)
Bingo! The missing parameters! Stepping back to just before where the code is invoked, I see this:
0x00113dac: sethi %hi(0x118c00), %g2
0x00113db0: mov %i0, %o0
0x00113db4: ld [ %g2 + 0xa0 ], %g2
0x00113db8: mov %l2, %o1
0x00113dbc: mov %l1, %o2
0x00113dc0: mov %i2, %o3
0x00113dc4: mov %i4, %o4
0x00113dc8: ld [ %g2 + 0x7c ], %l0
0x00113dcc: call %l0
0x00113dd0: mov %i5, %o5
So it appears that %o0 to %o5 are all set up in preparation for the romvec call. Setting aside the actual Forth string (actually a cstr), if the other values were pushed onto the stack in reverse order then it appears that this code would do what it is supposed to.
There is one Forth call in the traces that only takes 1 parameter, but there doesn't seem to be a way of passing the number of arguments into the romvec call. However, based on looking at the traces, starting with the first non-zero value in the highest o register and then pushing all the lower o register values onto the Forth stack before execution should give the correct result.
On the basis of this, I'd like to suggest the following proposal:
1) Change the signature of obp_fortheval_v2() from:
static void obp_fortheval_v2(char *str)
to:
static void obp_fortheval_v2(char *str, int arg0, int arg1, int arg2, int arg3, int arg4)
2) Add code to obp_fortheval_v2 that starting from arg4 and working down to arg0, finds the first non-zero value and then pushes all remaining values argN down to arg0 onto the Forth stack before executing the Forth string.
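As a rough illustration of step 2, here is a minimal C sketch of the argument-pushing rule. It does not touch the real OpenBIOS Forth stack: the forth_stack array, the push() helper and push_romvec_args() are hypothetical stand-ins, and the five-argument limit simply mirrors the proposed signature.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS    5   /* arg0..arg4, i.e. %o1..%o5 in the proposed signature */
#define STACK_DEPTH 16  /* arbitrary size for this stand-in stack */

/* Hypothetical stand-in for the Forth data stack (not the OpenBIOS one). */
static int forth_stack[STACK_DEPTH];
static size_t forth_depth;

static void push(int value)
{
    assert(forth_depth < STACK_DEPTH);
    forth_stack[forth_depth++] = value;
}

/*
 * Find the highest-numbered non-zero argument, then push it and every
 * lower-numbered argument, so that arg0 ends up on top of the stack.
 * Returns the number of values pushed.
 */
static int push_romvec_args(const int args[MAX_ARGS])
{
    int top = MAX_ARGS - 1;

    /* Skip trailing zero arguments (the unused high o registers). */
    while (top >= 0 && args[top] == 0)
        top--;

    /* Push in reverse order: argN first, arg0 last. */
    for (int i = top; i >= 0; i--)
        push(args[i]);

    return top + 1;
}
```

With the values from the trace above (0x49, 0x1335c0, 0x130794, 0, 0), three values are pushed: 0x130794 first and 0x49 last, which leaves the path string and its length on top for find-device with the result address underneath. One limitation of this heuristic is that a call whose highest-numbered argument is genuinely zero would have that argument dropped.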
Does this sound reasonable? I'm surprised that no one else has realised that the obp_fortheval_v2 function signature was wrong, but I guess it probably hardly gets used for anything these days.
As a separate hack, the equivalent of the OpenBOOT current-device word in OpenBIOS is active-package. So we should probably create an OpenBIOS variable called current-device too, and set its value to -1 when the function is entered. Then after all Forth has been evaluated, if its value has changed from before the Forth call was made then set active-package to the new value before exit.
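The current-device idea can be sketched in plain C as follows. This is only a model of the control flow under stated assumptions: active_package and current_device are ordinary C variables standing in for the Forth-visible values, the evaluator is passed in as a function pointer, and the two fake_eval_* functions exist purely to exercise the wrapper. None of these names come from the OpenBIOS tree.

```c
#include <assert.h>

#define NO_DEVICE (-1)  /* sentinel written before the Forth string runs */

/* Hypothetical stand-ins for the two Forth-visible values. */
static int active_package;
static int current_device;

typedef void (*forth_eval_fn)(const char *str);

/*
 * Reset current_device to the sentinel, evaluate the string, and copy
 * current_device into active_package only if the evaluated code changed it.
 */
static void eval_tracking_current_device(const char *str, forth_eval_fn eval)
{
    current_device = NO_DEVICE;
    eval(str);
    if (current_device != NO_DEVICE)
        active_package = current_device;
}

/* Fake evaluator: pretends the Forth code selected device 0x1234. */
static void fake_eval_selects_device(const char *str)
{
    (void)str;
    current_device = 0x1234;
}

/* Fake evaluator: pretends the Forth code never touched current-device. */
static void fake_eval_leaves_device(const char *str)
{
    (void)str;
}
```

The sentinel approach assumes -1 is never a valid device value, which matches the 0xff 0xff 0xff 0xff result cell seen in the trace but is otherwise an assumption.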
ATB,
Mark.
- Change the signature of obp_fortheval_v2() from:
static void obp_fortheval_v2(char *str)
to:
static void obp_fortheval_v2(char *str, int arg0, int arg1, int arg2, int arg3, int arg4)
- Add code to obp_fortheval_v2 that starting from arg4 and working down to
arg0, finds the first non-zero value and then pushes all remaining values argN down to arg0 onto the Forth stack before executing the Forth string.
Does this sound reasonable? I'm surprised that no one else has realised that the obp_fortheval_v2 function signature was wrong, but I guess it probably hardly gets used for anything these days.
Oops. Forgot to submit this patch back then. :(. Sorry. The reason was not getting too far with the patch, because of the broken memory management.
As a separate hack, the equivalent of the OpenBOOT current-device word in OpenBIOS is active-package. So we should probably create an OpenBIOS variable called current-device too, and set its value to -1 when the function is entered. Then after all Forth has been evaluated, if its value has changed from before the Forth call was made then set active-package to the new value before exit.
ATB,
Mark.
-- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs
Artyom Tarasenko wrote:
Oops. Forgot to submit this patch back then. :(. Sorry.
Heh :)
The reason was not getting too far with the patch, because of the broken memory management.
Well one of the Forth words that it tries to execute is rmap@. According to the OpenBOOT command reference, it looks like this:
rmap@ ( virt -- rmentry ) - Returns the region map entry for the virtual address.
I have no idea what a region map is though; there is mention of MMU region map sizes (based upon page table level) in the v8 architecture manual, but nothing concrete enough to tell me what the above Forth word should actually do...
ATB,
Mark.
Artyom Tarasenko wrote:
Does this sound reasonable? I'm surprised that no one else has realised that the obp_fortheval_v2 function signature was wrong, but I guess it probably hardly gets used for anything these days.
Oops. Forgot to submit this patch back then. :(. Sorry. The reason was not getting too far with the patch, because of the broken memory management.
Okay; I've committed something which should do the trick to SVN as r915.
Hmmmm these symptoms look exactly like something on SPARC64 where various bits of memory were getting clobbered because of a lack of stack space... (goes and has a look)...
Right - it's definitely something related to the stack when calling from the client into OpenBIOS (or vice-versa). If you take a look at arch/sparc32/context.c and start increasing IMAGE_STACK_SIZE, say to 16K or 32K you'll see that things suddenly start to work much better.
I think it's related to the flushing of register windows onto the stack when switching between the two. My thoughts are that this is probably related to Igor's SPARC64 patches here:
http://www.openfirmware.info/pipermail/openbios/2009-July/003762.html
I suspect that you'll need to come up with something along similar lines for SPARC32. Blue, any thoughts?
ATB,
Mark.
On Mon, Oct 18, 2010 at 7:43 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Artyom Tarasenko wrote:
Does this sound reasonable? I'm surprised that no one else has realised that the obp_fortheval_v2 function signature was wrong, but I guess it probably hardly gets used for anything these days.
Oops. Forgot to submit this patch back then. :(. Sorry. The reason was not getting too far with the patch, because of the broken memory management.
Okay; I've committed something which should do the trick to SVN as r915.
Hmmmm these symptoms look exactly like something on SPARC64 where various bits of memory were getting clobbered because of a lack of stack space... (goes and has a look)...
Right - it's definitely something related to the stack when calling from the client into OpenBIOS (or vice-versa). If you take a look at arch/sparc32/context.c and start increasing IMAGE_STACK_SIZE, say to 16K or 32K you'll see that things suddenly start to work much better.
I think it's related to the flushing of register windows onto the stack when switching between the two. My thoughts are that this is probably related to Igor's SPARC64 patches here:
http://www.openfirmware.info/pipermail/openbios/2009-July/003762.html
I suspect that you'll need to come up with something along similar lines for SPARC32. Blue, any thoughts?
On Sparc32 the client interface is not used and there is no 'flushw' instruction, but otherwise saving and restoring %g registers and flushing windows may be helpful.
Blue Swirl wrote:
I suspect that you'll need to come up with something along similar lines for SPARC32. Blue, any thoughts?
On Sparc32 the client interface is not used and there is no 'flushw' instruction, but otherwise saving and restoring %g registers and flushing windows may be helpful.
Hmmm okay. Please find attached the following very very alpha quality patch (read: it doesn't work) that attempts to flush the windows using the FLUSH_ALL_KERNEL_WINDOWS macro and save the globals in an ASM wrapper before calling the actual obp_fortheval_v2() C function, then attempting to restore the globals and return.
Sadly I think that I've misunderstood something in my copy-and-paste quality patch as SPARC32 Solaris boot seems to get *less* far than it did without this patch, so my effort must have a fundamental bug somewhere...
ATB,
Mark.
On Wed, Oct 20, 2010 at 6:40 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
I suspect that you'll need to come up with something along similar lines for SPARC32. Blue, any thoughts?
On Sparc32 the client interface is not used and there is no 'flushw' instruction, but otherwise saving and restoring %g registers and flushing windows may be helpful.
Hmmm okay. Please find attached the following very very alpha quality patch (read: it doesn't work) that attempts to flush the windows using the FLUSH_ALL_KERNEL_WINDOWS macro and save the globals in an ASM wrapper before calling the actual obp_fortheval_v2() C function, then attempting to restore the globals and return.
Sadly I think that I've misunderstood something in my copy-and-paste quality patch as SPARC32 Solaris boot seems to get *less* far than it did without this patch, so my effort must have a fundamental bug somewhere...
At least %fp is not loaded and return with 'retl' is not OK if the function is not a leaf.
Blue Swirl wrote:
Sadly I think that I've misunderstood something in my copy-and-paste quality patch as SPARC32 Solaris boot seems to get *less* far than it did without this patch, so my effort must have a fundamental bug somewhere...
At least %fp is not loaded and return with 'retl' is not OK if the function is not a leaf.
Ah okay - so swap retl with plain ret? And can I just use %sp like the SPARC64 version?
ATB,
Mark.
On Wed, Oct 20, 2010 at 7:18 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
Sadly I think that I've misunderstood something in my copy-and-paste quality patch as SPARC32 Solaris boot seems to get *less* far than it did without this patch, so my effort must have a fundamental bug somewhere...
At least %fp is not loaded and return with 'retl' is not OK if the function is not a leaf.
Ah okay - so swap retl with plain ret? And can I just use %sp like the SPARC64 version?
Yes. If you add 'save' and 'restore', then %sp can be used.
Blue Swirl wrote:
Ah okay - so swap retl with plain ret? And can I just use %sp like the SPARC64 version?
Yes. If you add 'save' and 'restore', then %sp can be used.
Okay - perhaps I'm missing something more fundamental here. The attached patch creates a simple handler that does nothing except flush windows to the stack and then call the C function - but it still seems to corrupt the stack somehow as subsequent calls into OBP have the wrong parameters.
The only thing I can think of is that this simple example fails because of something related to the return address, but I'm not 100% sure. Also what are the rules about how much information you can push onto the stack of the previous frame, i.e. the frame pointer? My current thoughts are that I can either i) push arguments onto the %fp and not save into a new window or ii) push arguments onto the %sp after a save (in which case I need additional code to copy the i registers into the o registers before calling the C function). Does this sound correct?
ATB,
Mark.
On Fri, Oct 22, 2010 at 6:45 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
Ah okay - so swap retl with plain ret? And can I just use %sp like the SPARC64 version?
Yes. If you add 'save' and 'restore', then %sp can be used.
Okay - perhaps I'm missing something more fundamental here. The attached patch creates a simple handler that does nothing except flush windows to the stack and then call the C function - but it still seems to corrupt the stack somehow as subsequent calls into OBP have the wrong parameters.
The only thing I can think of is that this simple example fails because of something related to the return address, but I'm not 100% sure.
'call' instruction writes the return address to %o7, clobbering previous %o7 from the OS. That's why 'save' and 'restore' are needed.
In this case, alternatively changing just 'call' to 'jmp' and deleting the rest should also work.
Also what are the rules about how much information you can push onto the stack of the previous frame, i.e. the frame pointer? My current thoughts are that I can either i) push arguments onto the %fp and not save into a new window or ii) push arguments onto the %sp after a save (in which case I need additional code to copy the i registers into the o registers before calling the C function). Does this sound correct?
I'd go for ii) or 'jmp'.
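To make the %o7 point concrete, here is a toy C model of overlapping register windows. It is a sketch only: the eight-window count, the struct cpu layout and every function name are assumptions made for illustration, and it ignores delay slots and window overflow/underflow traps entirely. It shows that a 'call' issued in the caller's window overwrites the OS return address in %o7, whereas after a 'save' the old %o7 is visible as %i7 and survives.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8  /* assumed window count for this toy model */

/* 16 registers per window (8 outs + 8 locals); ins alias the next window's outs. */
struct cpu {
    uint32_t win[NWINDOWS * 16];
    unsigned cwp;  /* current window pointer */
};

static uint32_t *out_reg(struct cpu *c, unsigned n)
{
    return &c->win[c->cwp * 16 + n];
}

static uint32_t *in_reg(struct cpu *c, unsigned n)
{
    return &c->win[((c->cwp + 1) % NWINDOWS) * 16 + n];
}

/* 'save' moves to a new window whose ins are the previous window's outs. */
static void do_save(struct cpu *c)    { c->cwp = (c->cwp + NWINDOWS - 1) % NWINDOWS; }
static void do_restore(struct cpu *c) { c->cwp = (c->cwp + 1) % NWINDOWS; }

/* 'call' writes the address of the call instruction into %o7. */
static void do_call(struct cpu *c, uint32_t call_pc) { *out_reg(c, 7) = call_pc; }

/* Wrapper that calls C code directly: returns the address it would return through. */
static uint32_t wrapper_without_save(struct cpu *c, uint32_t call_pc)
{
    do_call(c, call_pc);
    return *out_reg(c, 7);   /* OS return address has been overwritten */
}

/* Wrapper that does save / call / restore: the OS return address sits in %i7. */
static uint32_t wrapper_with_save(struct cpu *c, uint32_t call_pc)
{
    uint32_t return_to;

    do_save(c);
    do_call(c, call_pc);
    return_to = *in_reg(c, 7);
    do_restore(c);
    return return_to;
}
```

Seeding %o7 with 0x113dcc (the address of the 'call %l0' in the earlier disassembly) and using a made-up wrapper call address of 0x2000, the first wrapper would return through 0x2000 while the second still returns through 0x113dcc.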
Blue Swirl wrote:
The only thing I can think of is that this simple example fails because of something related to the return address, but I'm not 100% sure.
'call' instruction writes the return address to %o7, clobbering previous %o7 from the OS. That's why 'save' and 'restore' are needed.
In this case, alternatively changing just 'call' to 'jmp' and deleting the rest should also work.
Also what are the rules about how much information you can push onto the stack of the previous frame, i.e. the frame pointer? My current thoughts are that I can either i) push arguments onto the %fp and not save into a new window or ii) push arguments onto the %sp after a save (in which case I need additional code to copy the i registers into the o registers before calling the C function). Does this sound correct?
I'd go for ii) or 'jmp'.
Okay. I tried using jmp but I got an "out of range" error on compilation so I started working on something based on disassembling the output of a GCC wrapper function. The attached patch works well with my Solaris 8 ISO, although now after some twirly batons we appear to be sat in an infinite loop:
build@zeno:~/src/openbios/openbios-devel$ sparc64-linux-gdb
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=sparc64-linux".
(gdb) file obj-sparc32/openbios-builtin.elf.nostrip
Reading symbols from /home/build/src/openbios/openbios-devel/obj-sparc32/openbios-builtin.elf.nostrip...done.
(gdb) target remote :1234
Remote debugging using :1234
[New Thread 1]
0x00000000 in ?? ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00106190 in ?? ()
(gdb) stepi
0x00106194 in ?? ()
(gdb)
0x00106190 in ?? ()
(gdb) disas 0x106190 0x106194
Dump of assembler code from 0x106190 to 0x106194:
0x00106190: call 0x106190
End of assembler dump.
(gdb)
From the addresses and the twirly baton beforehand, I'd say we're actually now in the Solaris kernel. The interesting part of this is the fact that we get stuck in an infinite loop as shown above - I wonder if this is just the Solaris way of handling errors? Or perhaps maybe OpenBIOS doesn't have a romvec entry to enable diagnostic messages to be output to the console?
While there is no notable regression when building with -Os, my NetBSD and Debian test ISO images crash out when compiled with -O0 (as set in this patch) - anyone have any ideas on this one?
ATB,
Mark.
On Sat, Oct 23, 2010 at 8:51 AM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
The only thing I can think of is that this simple example fails because of something related to the return address, but I'm not 100% sure.
'call' instruction writes the return address to %o7, clobbering previous %o7 from the OS. That's why 'save' and 'restore' are needed.
In this case, alternatively changing just 'call' to 'jmp' and deleting the rest should also work.
Also what are the rules about how much information you can push onto the stack of the previous frame, i.e. the frame pointer? My current thoughts are that I can either i) push arguments onto the %fp and not save into a new window or ii) push arguments onto the %sp after a save (in which case I need additional code to copy the i registers into the o registers before calling the C function). Does this sound correct?
I'd go for ii) or 'jmp'.
Okay. I tried using jmp but I got an "out of range" error on compilation so I started working on something based on disassembling the output of a GCC wrapper function. The attached patch works well with my Solaris 8 ISO, although now after some twirly batons we appear to be sat in an infinite loop:
The address of the jmp target needs to be in a register. Alternatively 'b' should work.
The offsets to %sp are too low, the area you are using is used by window trap handlers. They should be %sp+0x60 etc, but usually the addressing is %fp-0x20 etc which should yield the same address. Maybe this was the problem?
This document describes the usual stack frame layout: http://www.sparc.com/standards/psABI3rd.pdf
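The %sp+0x60 figure follows from the minimum 32-bit SPARC stack frame: 64 bytes where the window trap handlers spill %l0-%l7 and %i0-%i7, 4 bytes for the hidden struct-return pointer, and 24 bytes where a callee may dump its six register arguments, rounded up to doubleword alignment. A small C sketch of that arithmetic is below; the macro and function names are made up for illustration, and the 28-byte example assumes seven 32-bit globals (%g1-%g7) are being saved.

```c
#include <assert.h>

#define WINDOW_SAVE_BYTES 64  /* 16 words: %l0-%l7 and %i0-%i7 spill area */
#define STRUCT_RET_BYTES   4  /* hidden struct-return pointer */
#define ARG_DUMP_BYTES    24  /* 6 words for the callee to store %i0-%i5 */
#define FRAME_ALIGN        8  /* %sp must stay doubleword aligned */

/* Round value up to the next multiple of align (align is a power of two). */
static unsigned align_up(unsigned value, unsigned align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Lowest %sp offset that is safe for a function's own temporaries. */
static unsigned first_local_offset(void)
{
    return align_up(WINDOW_SAVE_BYTES + STRUCT_RET_BYTES + ARG_DUMP_BYTES,
                    FRAME_ALIGN);
}

/* Total frame size needed to hold local_bytes of temporaries. */
static unsigned frame_size(unsigned local_bytes)
{
    return align_up(first_local_offset() + local_bytes, FRAME_ALIGN);
}
```

For the 28-byte case this gives a 128-byte frame, and since %fp equals the caller's %sp (the new %sp plus the frame size), %fp-0x20 and %sp+0x60 name the same address in such a frame.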
Other nitpicks: The usual practice is to use 'ret' followed by 'restore' in the delay slot (which can also perform 'mov %o0, %i0'). 'extern' is not needed with function declarations. obp_fortheval_v2_handler2() is not used AFAICT. init_openprom() declaration should be moved to romvec.h so all romvec stuff is in one place.
From the addresses and the twirly baton beforehand, I'd say we're actually now in the Solaris kernel. The interesting part of this is the fact that we get stuck in an infinite loop as shown above - I wonder if this is just the Solaris way of handling errors? Or perhaps maybe OpenBIOS doesn't have a romvec entry to enable diagnostic messages to be output to the console?
We don't implement pv_putstr (not used by Linux) and pv_printf calls printk directly without your wrappers.
While there is no notable regression when building with -Os, my NetBSD and Debian test ISO images crash out when compiled with -O0 (as set in this patch) - anyone have any ideas on this one?
Probably related to low %sp offset: unoptimized functions always use save/restore and they are not inlined, so the call chain is longer and window spills more likely to happen.
Blue Swirl wrote:
The address of the jmp target needs to be in a register. Alternatively 'b' should work.
Ah okay. Note for next time I guess :)
The offsets to %sp are too low, the area you are using is used by window trap handlers. They should be %sp+0x60 etc, but usually the addressing is %fp-0x20 etc which should yield the same address. Maybe this was the problem?
This document describes the usual stack frame layout: http://www.sparc.com/standards/psABI3rd.pdf
Yes indeed! Changing the offsets so that the globals are stored starting from %sp + 0x60 solves the problem compiling with -O0 :)
Other nitpicks: The usual practice is to use 'ret' followed by 'restore' in the delay slot (which can also perform 'mov %o0, %i0'). 'extern' is not needed with function declarations. obp_fortheval_v2_handler2() is not used AFAICT. init_openprom() declaration should be moved to romvec.h so all romvec stuff is in one place.
Done.
From the addresses and the twirly baton beforehand, I'd say we're actually now in the Solaris kernel. The interesting part of this is the fact that we get stuck in an infinite loop as shown above - I wonder if this is just the Solaris way of handling errors? Or perhaps maybe OpenBIOS doesn't have a romvec entry to enable diagnostic messages to be output to the console?
We don't implement pv_putstr (not used by Linux) and pv_printf calls printk directly without your wrappers.
Hmmmm. I've also added in an implementation of pv_putstr but still don't see anything useful output on the console (and in fact, if I set a breakpoint on obp_putstr it never breaks which means it isn't being called here either). I guess there will have to be some long nights tracing through disassembled code :(
Based upon all your comments above, would you say that the attached patch is ready for commit? It passes my tests here against my NetBSD, Debian and Solaris 8 ISO images.
ATB,
Mark.
On Sun, Oct 24, 2010 at 1:24 AM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
The address of the jmp target needs to be in a register. Alternatively 'b' should work.
Ah okay. Note for next time I guess :)
The offsets to %sp are too low, the area you are using is used by window trap handlers. They should be %sp+0x60 etc, but usually the addressing is %fp-0x20 etc which should yield the same address. Maybe this was the problem?
This document describes the usual stack frame layout: http://www.sparc.com/standards/psABI3rd.pdf
Yes indeed! Changing the offsets so that the globals are stored starting from %sp + 0x60 solves the problem compiling with -O0 :)
Other nitpicks: The usual practice is to use 'ret' followed by 'restore' in the delay slot (which can also perform 'mov %o0, %i0'). 'extern' is not needed with function declarations. obp_fortheval_v2_handler2() is not used AFAICT. init_openprom() declaration should be moved to romvec.h so all romvec stuff is in one place.
Done.
From the addresses and the twirly baton beforehand, I'd say we're actually now in the Solaris kernel. The interesting part of this is the fact that we get stuck in an infinite loop as shown above - I wonder if this is just the Solaris way of handling errors? Or perhaps maybe OpenBIOS doesn't have a romvec entry to enable diagnostic messages to be output to the console?
We don't implement pv_putstr (not used by Linux) and pv_printf calls printk directly without your wrappers.
Hmmmm. I've also added in an implementation of pv_putstr but still don't see anything useful output on the console (and in fact, if I set a breakpoint on obp_putstr it never breaks which means it isn't being called here either). I guess there will have to be some long nights tracing through disassembled code :(
What about pv_printf?
Based upon all your comments above, would you say that the attached patch is ready for commit? It passes my tests here against my NetBSD, Debian and Solaris 8 ISO images.
Works for me too.
Blue Swirl wrote:
Hmmmm. I've also added in an implementation of pv_putstr but still don't see anything useful output on the console (and in fact, if I set a breakpoint on obp_putstr it never breaks which means it isn't being called here either). I guess there will have to be some long nights tracing through disassembled code :(
What about pv_printf?
Hmmmm well I think this should have been calling printk already? However, I added the handler wrapper function just before commit and I now get the following message on boot:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 24 2010 19:39
Type 'help' for detailed information
Trying cdrom:d...
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
kobj_read: seek 0x16a000 failed
kobj_load_module: error reading section headers
krtld: error during initial load/link phase
Unhandled Exception 0x00000002
PC = 0x00000000 NPC = 0x00000004
Stopping execution
Strangely enough, with pv_printf() included the console output seems to change in that my screen is limited to a much smaller vertical terminal space, so perhaps it was never working in the first place anyway?
At least with error messages we can figure out what's going on :) A random crash on a function that has worked fine before tends to suggest some kind of memory allocation/memory clobber though.
Based upon all your comments above, would you say that the attached patch is ready for commit? It passes my tests here against my NetBSD, Debian and Solaris 8 ISO images.
Works for me too.
Great - thanks for testing!
ATB,
Mark.
Hi all,
So I spent some time stepping through OpenBIOS SPARC32 with various debug options enabled trying to figure out why it was overwriting the wrong part of memory, and in r923 I believe I fixed a fairly obvious bug in the SPARC32 memory allocation routines.
With this fixed in SVN trunk, I now get much further booting my Solaris 8 installation ISO:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 28 2010 20:58
Type 'help' for detailed information
Trying cdrom:d...
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
device auxio size -1
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
qemu: fatal: Trap 0x29 while interrupts disabled, Error state
pc: f004127c npc: f0041280
General Registers:
%g0-7: 00000000 00000808 00000001 f0041b74 00000000 f0243b88 00000000 f0244020

Current Register Window:
%o0-7: f025831c f5a2f00c f0240374 f0240370 f024036c 00000004 f0240300 f005bd84
%l0-7: 04400cc2 f005bf94 f005bf98 00000004 00000209 00000004 00000000 f023fe60
%i0-7: 00000001 f02403f4 f5a2f00c f025831c 00000001 00000009 f023ff08 f005c6b8

Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
psr: 04000cc2 (icc: ---- SPE: SP-) wim: 00000004
fsr: 00080000 y: 00000000
Aborted
build@zeno:~/rel-qemu-git/bin$
Artyom, do you see a similar improvement with your test Solaris images too?
ATB,
Mark.
On Thu, 2010-10-28 at 22:01 +0100, Mark Cave-Ayland wrote:
Mark, I get the exact same error trying to boot Solaris 9 in 32-bit SPARC Qemu.
-Nick
On Thu, Oct 28, 2010 at 11:01 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Hi all,
So I spent some time stepping through OpenBIOS SPARC32 with various debug options enabled trying to figure out why it was overwriting the wrong part of memory, and in r923 I believe I fixed a fairly obvious bug in the SPARC32 memory allocation routines.
Great job!
A while back I was asking how totavail and totmap are supposed to work and why OpenBIOS decreased totmap. Now you've explained it. :-)
With this fixed in SVN trunk, I now get much further booting my Solaris 8 installation ISO:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 28 2010 20:58
Type 'help' for detailed information
Trying cdrom:d...
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
device auxio size -1
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
qemu: fatal: Trap 0x29 while interrupts disabled, Error state
Looks pretty much the same as with OBP.
Artyom, do you see a similar improvement with your test Solaris images too?
Won't have the time for it for the next few days, sorry. Meanwhile can you try booting with "-b" option? I guess it should do the trick.
Artyom Tarasenko wrote:
Great job!
A while back I was asking how totavail and totmap are supposed to work and why OpenBIOS decreased totmap. Now you've explained it. :-)
Really? I've explained it, but I haven't yet quite understood it ;)
Artyom, do you see a similar improvement with your test Solaris images too?
Won't have the time for it for the next few days, sorry. Meanwhile can you try booting with "-b" option? I guess it should do the trick.
Hmmm that still crashes out in exactly the same way, although I see references to it on your blog which suggests it may be a qemu emulation issue as it occurs with OBP too.
Let's see how it goes with your testing :)
M-D.
Mark Cave-Ayland wrote:
A while back I was asking how totavail and totmap are supposed to work and why OpenBIOS decreased totmap. Now you've explained it. :-)
Really? I've explained it, but I haven't yet quite understood it ;)
Just to add to this, in particular the questions I am trying to solve now are:
1) Are the memory properties physical or virtual? (totphys and totavail appear to be physical, whereas totmap appears to be virtual?)
2) Should the relevant properties in the /memory and /virtual-memory nodes in the device tree be updated at the same time? (I think yes, as removing the properties causes boot to fail even on SPARC32).
ATB,
Mark.
On Fri, Oct 29, 2010 at 12:16 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Mark Cave-Ayland wrote:
A while back I was asking how totavail and totmap are supposed to work and why OpenBIOS decreased totmap. Now you've explained it. :-)
Really? I've explained it, but I haven't yet quite understood it ;)
Just to add to this, in particular the questions I am trying to solve now are:
- Are the memory properties physical or virtual? (totphys and totavail appear to be physical, whereas totmap appears to be virtual?)
I think that's right. We don't update totavail, which is a linked list of all virtual memory zones available. So map_pages should add an entry to the list.
- Should the relevant properties in the /memory and /virtual-memory nodes in the device tree be updated at the same time? (I think yes, as removing the properties causes boot to fail even on SPARC32).
Probably, and the /virtual-memory nodes should have the same information as the totavail list.
Great job, anyway!
Blue Swirl wrote:
- Are the memory properties physical or virtual? (totphys and totavail appear to be physical, whereas totmap appears to be virtual?)
I think that's right. We don't update totavail, which is a linked list of all virtual memory zones available. So map_pages should add an entry to the list.
Yeah. The SPARC32 memory routines are quite simple in that they only have one item in each linked list; so while the memory regions represented in the device tree may not be exactly accurate, all memory within each region still meets the required criteria.
- Should the relevant properties in the /memory and /virtual-memory nodes in the device tree be updated at the same time? (I think yes, as removing the properties causes boot to fail even on SPARC32).
Probably, and the /virtual-memory nodes should have the same information as the totavail list.
Yes, that sounds about right.
Great job, anyway!
Thanks :) Do you have access to any 32-bit Solaris images at the moment for testing purposes? Following up on the trap 0x29 error, it seems that Artyom sees this on the more modern versions of Solaris which suggests it may possibly be a qemu emulation bug (see the comments especially):
http://tyom.blogspot.com/2009/12/solaris-under-qemu-how-to.html
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
ATB,
Mark.
On Fri, Oct 29, 2010 at 7:51 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
- Are the memory properties physical or virtual? (totphys and totavail appear to be physical, whereas totmap appears to be virtual?)
I think that's right. We don't update totavail, which is a linked list of all virtual memory zones available. So map_pages should add an entry to the list.
Yeah. The SPARC32 memory routines are quite simple in that they only have one item in each linked list; so while the memory regions represented in the device tree may not be exactly accurate, all memory within each region still meets the required criteria.
- Should the relevant properties in the /memory and /virtual-memory nodes in the device tree be updated at the same time? (I think yes, as removing the properties causes boot to fail even on SPARC32).
Probably, and the /virtual-memory nodes should have the same information as the totavail list.
Yes, that sounds about right.
Great job, anyway!
Thanks :) Do you have access to any 32-bit Solaris images at the moment for testing purposes? Following up on the trap 0x29 error, it seems that Artyom sees this on the more modern versions of Solaris which suggests it may possibly be a qemu emulation bug (see the comments especially):
http://tyom.blogspot.com/2009/12/solaris-under-qemu-how-to.html
There are a lot of comments. One of them says that the trap also happens on a real SS-5.
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
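For reference, the trap numbers QEMU prints in its "Trap 0x.. while interrupts disabled" messages are SPARC V8 trap types. Below is a minimal C sketch of a lookup table for the values relevant here. The function name sparc_trap_name and the table layout are made up for illustration, and the names come from the V8 manual's trap table rather than from QEMU's TT_* macros (QEMU's TT_DATA_ACCESS corresponds to 0x29).

```c
#include <stddef.h>

/* A few SPARC V8 trap type (tt) values. Names follow the V8 manual;
 * this table is illustrative and far from complete. */
struct trap_name {
    unsigned tt;
    const char *name;
};

static const struct trap_name trap_names[] = {
    { 0x01, "instruction_access_exception" },
    { 0x05, "window_overflow" },
    { 0x06, "window_underflow" },
    { 0x09, "data_access_exception" },
    { 0x21, "instruction_access_error" },
    { 0x29, "data_access_error" },
};

/* Return the V8 name for a trap type, or "unknown" if it is not listed. */
const char *sparc_trap_name(unsigned tt)
{
    size_t i;

    for (i = 0; i < sizeof(trap_names) / sizeof(trap_names[0]); i++) {
        if (trap_names[i].tt == tt)
            return trap_names[i].name;
    }
    return "unknown";
}
```

Note that 0x09 and 0x29 are distinct traps: the first is raised by the MMU for a translation or protection fault, the second reports an error on the access itself.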
Blue Swirl wrote:
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffffffffffff0ecc from f004127c
Is there some kind of device map for SPARC32 somewhere so we can lookup what kind of device this is?
ATB,
Mark.
On 2010-10-29 4:28 PM, Mark Cave-Ayland wrote:
[...] Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffff.ffff.ffff.0ecc from f004127c
Is there some kind of device map for SPARC32 somewhere so we can lookup what kind of device this is?
That's ROM space. If we're writing that address on a real SS5, I'd expect the hardware to ignore it. I expect that address is in the middle of the ROM interrupt vector.
On Fri, Oct 29, 2010 at 8:28 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffffffffffff0ecc from f004127c
That seems to be a 'neverland' mapping, accessed using MMU no-fault mode. This is very poorly documented. In QEMU the code for the mode is in target-sparc/helper.c, functions get_physical_address() and cpu_sparc_handle_mmu_fault().
Is there some kind of device map for SPARC32 somewhere so we can lookup what kind of device this is?
For QEMU, this kind of information is in hw/sun4m.c, otherwise there's the Sun4m System Architecture manual.
On Fri, Oct 29, 2010 at 10:28 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffffffffffff0ecc from f004127c
No, it's not the same as with OBP. Btw I have a fix for the OBP fault, but it waits till someone commits the promised refactoring.
Is there some kind of device map for SPARC32 somewhere so we can lookup what kind of device this is?
None. It probably tries to access something which is not mapped.
On Fri, Oct 29, 2010 at 8:42 PM, Artyom Tarasenko atar4qemu@gmail.com wrote:
On Fri, Oct 29, 2010 at 10:28 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffffffffffff0ecc from f004127c
No, it's not the same as with OBP. Btw I have a fix for the OBP fault, but it waits till someone commits the promised refactoring.
Which refactoring?
Is there some kind of device map for SPARC32 somewhere so we can lookup what kind of device this is?
None. It probably tries to access something which is not mapped.
This could be verified by changing a line in helper.c: *physical = 0xffffffffffff0000ULL; to for example *physical = 0xfef1f0fef1ff0000ULL; and see if the address changes.
Blue Swirl wrote:
None. It probably tries to access something which is not mapped.
This could be verified by changing a line in helper.c: *physical = 0xffffffffffff0000ULL; to for example *physical = 0xfef1f0fef1ff0000ULL; and see if the address changes.
It changes for me here:
Unassigned mem write access of 4 bytes to fef1f0fef1ff0ecc from f004127c
But then again I'm not sure whether this comment was aimed at me or Artyom? Does Tarl's suggestion of just ignoring write accesses to the ROM area sound feasible?
ATB,
Mark.
On 2010-10-29 5:06 PM, Mark Cave-Ayland wrote:
Blue Swirl wrote:
None. It probably tries to access something which is not mapped.
This could be verified by changing a line in helper.c: *physical = 0xffffffffffff0000ULL; to for example *physical = 0xfef1f0fef1ff0000ULL; and see if the address changes.
It changes for me here:
Unassigned mem write access of 4 bytes to fef1f0fef1ff0ecc from f004127c
But then again I'm not sure whether this comment was aimed at me or Artyom? Does Tarl's suggestion of just ignoring write accesses to the ROM area sound feasible?
I don't think you should ignore them - that's what the hardware would do, but if you're trying to write them something is busted.
On Fri, Oct 29, 2010 at 9:06 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
None. It probably tries to access something which is not mapped.
This could be verified by changing a line in helper.c: *physical = 0xffffffffffff0000ULL; to for example *physical = 0xfef1f0fef1ff0000ULL; and see if the address changes.
It changes for me here:
Unassigned mem write access of 4 bytes to fef1f0fef1ff0ecc from f004127c
Then it must be no-fault access. But I wonder why there is a fault, in no-fault mode there should be no faults. And when MMU is switched back to normal mode, the fake TLB mappings should be flushed.
But then again I'm not sure whether this comment was aimed at me or Artyom? Does Tarl's suggestion of just ignoring write accesses to the ROM area sound feasible?
I don't think ROM area is in play, but no-fault mode.
Blue Swirl wrote:
None. It probably tries to access something which is not mapped.
This could be verified by changing a line in helper.c: *physical = 0xffffffffffff0000ULL; to for example *physical = 0xfef1f0fef1ff0000ULL; and see if the address changes.
It changes for me here:
Unassigned mem write access of 4 bytes to fef1f0fef1ff0ecc from f004127c
Then it must be no-fault access. But I wonder why there is a fault, in no-fault mode there should be no faults. And when MMU is switched back to normal mode, the fake TLB mappings should be flushed.
But then again I'm not sure whether this comment was aimed at me or Artyom? Does Tarl's suggestion of just ignoring write accesses to the ROM area sound feasible?
I don't think ROM area is in play, but no-fault mode.
Is this the relevant section from the SPARC v8 manual here?
NF
NF is the “No Fault” bit. When NF = 0, any fault detected by the MMU causes FSR and FAR to be updated and causes a fault to be generated to the processor. When NF = 1, a fault on an access to ASI 9 is handled as when NF = 0; a fault on an access to any other ASI causes FSR and FAR to be updated but no fault is generated to the processor.
If a fault on access to an ASI other than 9 occurs while NF = 1, subsequently resetting NF from 1 to 0 does not cause a fault to the processor (even though FSR.FT ≠ 0 at that time). A change in value of the NF bit takes effect as soon as the bit is written; a subsequent access to ASI 9 will be evaluated according to the new value of the NF bit.
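Read literally, the quoted text boils down to a two-input rule. Here is a minimal C sketch of it; the function name mmu_fault_reaches_cpu is hypothetical and the code only restates the manual excerpt, it is not taken from QEMU. In both cases FSR and FAR are updated, so only the delivery of the fault is modelled.

```c
#include <stdbool.h>

/* ASI 9 is the one ASI that the quoted V8 text exempts from no-fault mode. */
#define ASI_ALWAYS_FAULTS 9u

/*
 * Decide whether an MMU-detected fault is delivered to the processor.
 * nf  : value of the NF bit in the MMU control register
 * asi : address space identifier of the faulting access
 */
bool mmu_fault_reaches_cpu(bool nf, unsigned asi)
{
    if (!nf)
        return true;              /* NF = 0: every fault is delivered */
    return asi == ASI_ALWAYS_FAULTS; /* NF = 1: only ASI 9 still faults */
}
```

So a data store that faults while NF = 1 would be silently recorded, whereas with NF = 0 it always reaches the processor.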
ATB,
Mark.
Mark Cave-Ayland wrote:
I don't think ROM area is in play, but no-fault mode.
Here's an excerpt from a gdb session stepping through the problem area in the qemu code:
Breakpoint 1, cpu_sparc_handle_mmu_fault (env=0x10579f0, address=4028890828, rw=1, mmu_idx=1, is_softmmu=1) at /home/build/src/qemu/git/qemu/target-sparc/helper.c:261
261 vaddr = address & TARGET_PAGE_MASK;
(gdb) bt
#0 cpu_sparc_handle_mmu_fault (env=0x10579f0, address=4028890828, rw=1, mmu_idx=1, is_softmmu=1) at /home/build/src/qemu/git/qemu/target-sparc/helper.c:261
#1 0x0000000000521563 in tlb_fill (addr=4028890828, is_write=1, mmu_idx=1, retaddr=0x408ef5ad) at /home/build/src/qemu/git/qemu/target-sparc/op_helper.c:4204
#2 0x00000000005208f9 in __stl_mmu (addr=4028890828, val=2056, mmu_idx=1) at /home/build/src/qemu/git/qemu/softmmu_template.h:272
#3 0x00000000408ef5ae in ?? ()
#4 0x00000000408f1b15 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0xff0a000000000000 in ?? ()
#7 0x000000000084a4e0 in ss5_machine ()
#8 0x0000000001057cf8 in ?? ()
#9 0x0000000001057af8 in ?? ()
#10 0x00007ffff8068434 in ?? ()
#11 0x00007ffff806843c in ?? ()
#12 0x00007ffff8068438 in ?? ()
#13 0xf004127c010579f0 in ?? ()
#14 0xf3c4023500000475 in ?? ()
#15 0xff0a000000000000 in ?? ()
#16 0x00007ffff8068450 in ?? ()
#17 0x00000000004f6e72 in tb_find_fast () at /home/build/src/qemu/git/qemu/cpu-exec.c:185
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) next
262 prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
(gdb)
263 tlb_set_page(env, vaddr, paddr, prot, mmu_idx, TARGET_PAGE_SIZE);
(gdb) p/x vaddr
$1 = 0xf023f000
(gdb) p/x paddr
$2 = 0xfef1f0fef1ff0000
(gdb) next
tlb_set_page: vaddr=f023f000 paddr=0xfef1f0fef1ff0000 prot=7 idx=1 pd=0x00000010
[Thread 0x42939950 (LWP 29874) exited]
264 return 0;
(gdb)
272 }
(gdb)
tlb_fill (addr=4028890828, is_write=1, mmu_idx=1, retaddr=0x403d94cd) at /home/build/src/qemu/git/qemu/target-sparc/op_helper.c:4205
4205 if (ret) {
(gdb) next
4209 env = saved_env;
(gdb)
4210 }
(gdb)
__stl_mmu (addr=4028890828, val=2056, mmu_idx=1) at /home/build/src/qemu/git/qemu/softmmu_template.h:237
237 tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
(gdb)
238 if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
(gdb)
239 if (tlb_addr & ~TARGET_PAGE_MASK) {
(gdb)
241 if ((addr & (DATA_SIZE - 1)) != 0)
(gdb)
243 retaddr = GETPC();
(gdb)
244 ioaddr = env->iotlb[mmu_idx][index];
(gdb)
245 glue(io_write, SUFFIX)(ioaddr, val, addr, retaddr);
(gdb) step
io_writel (physaddr=18370729328764456976, val=2056, addr=4028890828, retaddr=0x403d94cd) at /home/build/src/qemu/git/qemu/softmmu_template.h:201
201 {
(gdb) step
203 index = (physaddr >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
(gdb)
204 physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
(gdb)
205 if (index > (IO_MEM_NOTDIRTY >> IO_MEM_SHIFT)
(gdb)
210 env->mem_io_vaddr = addr;
(gdb)
211 env->mem_io_pc = (unsigned long)retaddr;
(gdb)
213 io_mem_write[index][SHIFT](io_mem_opaque[index], physaddr, val);
(gdb)
unassigned_mem_writel (opaque=0x0, addr=18370729332793347788, val=2056) at /home/build/src/qemu/git/qemu/exec.c:3014
3014 {
(gdb)
3019 do_unassigned_access(addr, 1, 0, 0, 4);
(gdb)
do_unassigned_access (addr=18370729332793347788, is_write=1, is_exec=0, is_asi=0, size=4) at /home/build/src/qemu/git/qemu/target-sparc/op_helper.c:4218
4218 {
(gdb)
4224 saved_env = env;
(gdb)
4225 env = cpu_single_env;
(gdb)
4227 if (is_asi)
(gdb)
4233 printf("Unassigned mem %s access of %d byte%s to " TARGET_FMT_plx
(gdb)
Unassigned mem write access of 4 bytes to fef1f0fef1ff0ecc from f004127c
4239 fault_type = (env->mmuregs[3] & 0x1c) >> 2;
(gdb)
4240 if ((fault_type > 4) || (fault_type == 0)) {
(gdb)
4257 if (fault_type == ((env->mmuregs[3] & 0x1c)) >> 2) {
(gdb)
4258 env->mmuregs[3] |= 1;
(gdb)
4261 if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF)) {
(gdb)
4262 if (is_exec)
(gdb)
4265 raise_exception(TT_DATA_ACCESS);
(gdb)
raise_exception (tt=41) at /home/build/src/qemu/git/qemu/target-sparc/op_helper.c:287
287 {
(gdb)
288 env->exception_index = tt;
(gdb)
289 cpu_loop_exit();
(gdb)
cpu_loop_exit () at /home/build/src/qemu/git/qemu/cpu-exec.c:59
59 {
(gdb)
60 env->current_tb = NULL;
(gdb)
61 longjmp(env->jmp_env, 1);
(gdb)
qemu: fatal: Trap 0x29 while interrupts disabled, Error state
pc: f004127c npc: f0041280
General Registers:
%g0-7: 00000000 00000808 00000001 f0041b74 00000000 f0243b88 00000000 f0244020

Current Register Window:
%o0-7: f025831c f5a2f00c f0240374 f0240370 f024036c 00000004 f0240300 f005bd84
%l0-7: 04400cc2 f005bf94 f005bf98 00000004 00000209 00000004 00000000 f023fe60
%i0-7: 00000001 f02403f4 f5a2f00c f025831c 00000001 00000009 f023ff08 f005c6b8

Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
psr: 04000cc2 (icc: ---- SPE: SP-) wim: 00000004
fsr: 00080000 y: 00000000

Program received signal SIGABRT, Aborted.
0x00007ff3fde68ed5 in raise () from /lib/libc.so.6
(gdb)
I'm not sure exactly what's happening, although it seems like some kind of I/O memory access is triggering the error before the neverland mapping is removed?
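The addresses in this session fit together as follows, assuming 4 KiB pages for this target: the faulting virtual address is rounded down to its page, that page is mapped onto the placeholder physical base, and the byte offset within the page carries over into the address reported by do_unassigned_access(). Here is a small C sketch of that arithmetic. The helper names and the PAGE_BITS constant are illustrative, not QEMU's.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. 12 offset bits. */
#define PAGE_BITS        12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_BITS) - 1)

/* Round a virtual address down to the start of its page. */
uint64_t page_base(uint64_t vaddr)
{
    return vaddr & ~PAGE_OFFSET_MASK;
}

/* Address reported for an access through a page mapped at placeholder_base. */
uint64_t reported_address(uint64_t placeholder_base, uint64_t vaddr)
{
    return placeholder_base + (vaddr & PAGE_OFFSET_MASK);
}
```

With address=4028890828 (0xf023fecc) this gives a page base of 0xf023f000 and an offset of 0xecc, so the reported address ends in 0ecc whichever placeholder base is compiled into helper.c.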
ATB,
Mark.
Mark Cave-Ayland wrote:
I'm not sure exactly what's happening, although it seems like some kind of I/O memory access is triggering the error before the neverland mapping is removed?
Even simpler than that: the reason the neverland code is being invoked is because env->psret == 0 (i.e. traps are disabled), not because the MMU is in no fault mode:
Breakpoint 1, cpu_sparc_handle_mmu_fault (env=0x10579f0, address=4028890828, rw=1, mmu_idx=1, is_softmmu=1) at /home/build/src/qemu/git/qemu/target-sparc/helper.c:261
261 vaddr = address & TARGET_PAGE_MASK;
(gdb) p/x env->mmuregs[0] & MMU_NF
No symbol "MMU_NF" in current context.
(gdb) p/x env->mmuregs[0] & 2
$5 = 0x0
(gdb) p/x env->psret
$6 = 0x0
(gdb) quit
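As far as I can tell env->psret mirrors the ET (enable traps) bit of the PSR, so the same information can be read from the psr value in QEMU's register dump. This minimal C sketch decodes the SPARC V8 PSR layout; struct psr_fields and decode_psr are hypothetical names, not QEMU code.

```c
#include <stdint.h>

/* Fields of the SPARC V8 processor state register. */
struct psr_fields {
    unsigned impl; /* bits 31:28, implementation */
    unsigned ver;  /* bits 27:24, version */
    unsigned icc;  /* bits 23:20, condition codes N Z V C */
    unsigned ef;   /* bit 12, FPU enable */
    unsigned pil;  /* bits 11:8, processor interrupt level */
    unsigned s;    /* bit 7, supervisor */
    unsigned ps;   /* bit 6, previous supervisor */
    unsigned et;   /* bit 5, enable traps */
    unsigned cwp;  /* bits 4:0, current window pointer */
};

struct psr_fields decode_psr(uint32_t psr)
{
    struct psr_fields f;

    f.impl = (psr >> 28) & 0xf;
    f.ver  = (psr >> 24) & 0xf;
    f.icc  = (psr >> 20) & 0xf;
    f.ef   = (psr >> 12) & 0x1;
    f.pil  = (psr >> 8) & 0xf;
    f.s    = (psr >> 7) & 0x1;
    f.ps   = (psr >> 6) & 0x1;
    f.et   = (psr >> 5) & 0x1;
    f.cwp  = psr & 0x1f;
    return f;
}
```

For the psr value 04000cc2 from the Trap 0x29 dump, this yields ET = 0, S = 1, PS = 1, PIL = 0xc and CWP = 2, which agrees with the "SPE: SP-" summary QEMU prints next to it.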
Based upon this, it would seem that we shouldn't be invoking the data access exception if traps have been globally disabled. Blue, what do you make of the following patch?
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index be3c1e0..d3a9f28 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -4258,7 +4258,7 @@ void do_unassigned_access(target_phys_addr_t addr, int is_write, int is_exec,
         env->mmuregs[3] |= 1;
     }

-    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF)) {
+    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF) && (env->psret)) {
         if (is_exec)
             raise_exception(TT_CODE_ACCESS);
         else
This allows the Solaris 8 boot to proceed a couple of seconds longer, however it still falls over with a similar error but for trap 0x6 (window underflow) this time:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 30 2010 16:27
Type 'help' for detailed information

0 > boot cdrom:d -vb
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
Size: 259040+54154+47486 Bytes
device auxio size -1
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Ethernet address = 52:54:0:12:34:56
Using default device instance data
qemu: fatal: Trap 0x06 while interrupts disabled, Error state
pc: f00414a4 npc: f00413e0
General Registers:
%g0-7: 00000000 00000003 00000000 f0041b74 000000ab f0243b88 00000000 f0244020

Current Register Window:
%o0-7: f0000000 f0158f08 f0158f08 000000b7 f0243b88 00000000 f00423c8 f005bf58
%l0-7: 04400cc0 f005bf90 f005bf94 00000001 00000000 f0041b74 00000000 00000101
%i0-7: 00000009 f00424cc f1ff0514 000000b7 00000002 00000004 f0042470 f0041b74

Floating Point Registers:
%f00: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f04: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f08: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f12: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f16: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f20: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f24: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
%f28: 000000000.000000 000000000.000000 000000000.000000 000000000.000000
psr: 04400cc0 (icc: -Z-- SPE: SP-) wim: 00000003
fsr: 00080000 y: 00000000
Aborted
ATB,
Mark.
On Sun, Oct 31, 2010 at 1:07 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Mark Cave-Ayland wrote:
I'm not sure exactly what's happening, although it seems like some kind of I/O memory access is triggering the error before the neverland mapping is removed?
Even simpler than that: the reason the neverland code is being invoked is because env->psret == 0 (i.e. traps are disabled), not because the MMU is in no fault mode:
Breakpoint 1, cpu_sparc_handle_mmu_fault (env=0x10579f0, address=4028890828, rw=1, mmu_idx=1, is_softmmu=1) at /home/build/src/qemu/git/qemu/target-sparc/helper.c:261
261 vaddr = address & TARGET_PAGE_MASK;
(gdb) p/x env->mmuregs[0] & MMU_NF
No symbol "MMU_NF" in current context.
(gdb) p/x env->mmuregs[0] & 2
$5 = 0x0
(gdb) p/x env->psret
$6 = 0x0
(gdb) quit
Based upon this, it would seem that we shouldn't be invoking the data access exception if traps have been globally disabled. Blue, what do you make of the following patch?
This is not in line with the V8 spec. "If ET=0 and a precise trap occurs, the processor enters the error_mode state and halts execution."
Maybe unassigned accesses shouldn't cause any faults at all. Or perhaps the unassigned access is being triggered where it shouldn't be: do_unassigned_access() is called from several places, not only from the normal load/store path.
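To make the objection concrete, here is a minimal C sketch that models what the proposed patch changes. It is a toy model, not QEMU code: the enum, the function name and the "patched" flag are invented for illustration, and the MMU_E and MMU_NF bit positions (bits 0 and 1 of the MMU control register) are assumed from the "& 2" test used in the gdb session.

```c
#include <stdbool.h>

/* Assumed bit positions in the MMU control register (mmuregs[0]). */
#define MMU_E  (1u << 0)  /* MMU enabled */
#define MMU_NF (1u << 1)  /* no-fault mode */

enum access_outcome {
    ACCESS_NO_TRAP,     /* fault recorded, execution continues */
    ACCESS_TRAPS,       /* trap taken normally */
    ACCESS_ERROR_MODE   /* trap raised with ET=0: error_mode, CPU halts */
};

/*
 * Model the outcome of an unassigned access.
 * mmu_ctrl : MMU control register value
 * et       : PSR.ET (env->psret in QEMU)
 * patched  : true to apply the extra "&& env->psret" condition
 */
enum access_outcome unassigned_access_outcome(unsigned mmu_ctrl, bool et,
                                              bool patched)
{
    bool wants_trap = (mmu_ctrl & MMU_E) && !(mmu_ctrl & MMU_NF);

    if (patched)
        wants_trap = wants_trap && et;
    if (!wants_trap)
        return ACCESS_NO_TRAP;
    /* Per the V8 text quoted above, a precise trap with ET=0 is fatal. */
    return et ? ACCESS_TRAPS : ACCESS_ERROR_MODE;
}
```

With the MMU enabled, NF clear and ET = 0, the unpatched path ends in error mode (the "Trap 0x29 while interrupts disabled, Error state" message), while the patched path silently continues, which is exactly the behaviour the V8 quote rules out.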
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index be3c1e0..d3a9f28 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -4258,7 +4258,7 @@ void do_unassigned_access(target_phys_addr_t addr, int is_write, int is_exec,
         env->mmuregs[3] |= 1;
     }

-    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF)) {
+    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF) && (env->psret)) {
         if (is_exec)
             raise_exception(TT_CODE_ACCESS);
         else
This allows the Solaris 8 boot to proceed a couple of seconds longer, however it still falls over with a similar error but for trap 0x6 (window underflow) this time:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 30 2010 16:27
Type 'help' for detailed information

0 > boot cdrom:d -vb
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
Size: 259040+54154+47486 Bytes
device auxio size -1
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Ethernet address = 52:54:0:12:34:56
Using default device instance data
qemu: fatal: Trap 0x06 while interrupts disabled, Error state
0x06 = window underflow. This shouldn't happen.
On Sun, Oct 31, 2010 at 2:07 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Mark Cave-Ayland wrote:
I'm not sure exactly what's happening, although it seems like some kind of I/O memory access is triggering the error before the neverland mapping is removed?
I still think it's just an access to the unmapped memory region.
Even simpler than that: the reason the neverland code is being invoked is because env->psret == 0 (i.e. traps are disabled), not because the MMU is in no fault mode:
This means that it is a fault inside a fault handler. The cause could be the stack or some other resource being exhausted while processing a trap, or the trap handler trying to report the error via a non-existent device.
Are you running with -nographic?
Also Solaris boot option "-v" makes the boot more verbose.
Breakpoint 1, cpu_sparc_handle_mmu_fault (env=0x10579f0, address=4028890828, rw=1, mmu_idx=1, is_softmmu=1) at /home/build/src/qemu/git/qemu/target-sparc/helper.c:261
261 vaddr = address & TARGET_PAGE_MASK;
(gdb) p/x env->mmuregs[0] & MMU_NF
No symbol "MMU_NF" in current context.
(gdb) p/x env->mmuregs[0] & 2
$5 = 0x0
(gdb) p/x env->psret
$6 = 0x0
(gdb) quit
Based upon this, it would seem that we shouldn't be invoking the data access exception if traps have been globally disabled. Blue, what do you make of the following patch?
diff --git a/target-sparc/op_helper.c b/target-sparc/op_helper.c
index be3c1e0..d3a9f28 100644
--- a/target-sparc/op_helper.c
+++ b/target-sparc/op_helper.c
@@ -4258,7 +4258,7 @@ void do_unassigned_access(target_phys_addr_t addr, int is_write, int is_exec,
         env->mmuregs[3] |= 1;
     }

-    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF)) {
+    if ((env->mmuregs[0] & MMU_E) && !(env->mmuregs[0] & MMU_NF) && (env->psret)) {
         if (is_exec)
             raise_exception(TT_CODE_ACCESS);
         else
OBP works without this hack.
Artyom Tarasenko wrote:
I still think it's just an access to the unmapped memory region.
Even simpler than that: the reason the neverland code is being invoked is because env->psret == 0 (i.e. traps are disabled), not because the MMU is in no fault mode:
This means that it is a fault inside a fault handler. The cause could be the stack or some other resource being exhausted while processing a trap, or the trap handler trying to report the error via a non-existent device.
Can you take a look at the output of show-devs with OBP to try and figure out which device it is? Also would it be possible for you to enable DEBUG_UNALIGNED and DEBUG_UNASSIGNED in target-sparc/op_helper.c in qemu, boot as far as you can, and then send me the output?
Ah and another thing while I think about it: could you send me the output of the following too:
cd /virtual-memory .properties
cd /memory .properties
Are you running with -nographic?
Yes.
Also Solaris boot option "-v" makes the boot more verbose.
Yes, I found this on your blog. The output with -vb looks like this:
Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 259040+54154+47486 Bytes device auxio size -1 SunOS Release 5.8 Version Generic_108528-09 32-bit Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved. Ethernet address = 52:54:0:12:34:56 Using default device instance data
OBP works without this hack.
Meh. Is there any improvement with the older versions of SunOS?
ATB,
Mark.
On Sun, Oct 31, 2010 at 7:30 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Artyom Tarasenko wrote:
I still think it's just an access to the unmapped memory region.
Even simpler than that: the reason the neverland code is being invoked is because env->psret == 0 (i.e. traps are disabled), not because the MMU is in no fault mode:
This means that it is a fault in a fault handler. The cause can be the stack, or something else, getting exhausted while trying to process some trap. Or the trap handler tries to report the error over some non-existent device.
Can you take a look at the output of show-devs with OBP to try and figure out which device it is?
Just a wild guess - it can be idprom. Solaris doesn't manage to tell the Ethernet address. Also there seems to be a problem with the root node:
Welcome to OpenBIOS v1.0 built on Oct 31 2010 19:48 Type 'help' for detailed information Trying disk... No valid state has been set by load or init-program
0 > cd / ok 0 > .properties name "SUNW,SPARCstation-5" #address-cells 2 #size-cells 1 compatible "sun4m" clock-frequency a21fe80 idprom -- 20 : 01 80 52 54 00 12 34 56 00 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 banner-name "SPARCstation 5" model "SUNW,501-3059" stdin-path "/obio/zs@0,100000:a" stdout-path "/obio/zs@0,100000:a" uuid -- 10 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ok 0 > device-end ok 0 > boot cdrom:d No valid state has been set by load or init-program ok 2 >
Also would it be possible for you to enable DEBUG_UNALIGNED and DEBUG_UNASSIGNED in target-sparc/op_helper.c in qemu, boot as far as you can, and then send me the output?
Unlike Linux/Debian, Solaris rarely does unaligned accesses (in fact I think I haven't seen one which I didn't trigger myself).
ok boot disk2:d -vb Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@2,0:d File and args: -vb Size: 264120+54502+47926 Bytes SunOS Release 5.8 Version Generic_108528-29 32-bit Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved. Ethernet address = 52:54:0:12:34:56 Using default device instance data vac: enabled in write through mode mem = 262144K (0x10000000) avail mem = 258269184 root nexus = SUNW,SPARCstation-5 iommu0 at root: obio 0x10000000 sbus0 at iommu0: obio 0x10001000 dma0 at sbus0: SBus slot 5 0x8400000 dma0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000 /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0): esp-options=0x46 esp0 at dma0: SBus slot 5 0x8800000 sparc ipl 4 esp0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 sd2 at esp0: target 2 lun 0 sd2 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0 root on /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0:b fstype ufs obio0 at root obio0 at obio0: obio 0x100000, sparc ipl 12 zs0 is /obio/zs@0,100000 obio1 at obio0: obio 0x0, sparc ipl 12 zs1 is /obio/zs@0,0 cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1071 MHz) Unassigned mem write access of 4 bytes to ffffffffffff0ee0 from f00602cc Unassigned mem write access of 4 bytes to ffffffffffff0ee4 from f00602cc Unassigned mem write access of 4 bytes to ffffffffffff0ee8 from f00602d0 [^^^ note that these aren't real faults. The NF flag is on. I usually modify debug output not to show them.] #
Ah and another thing while I think about it: could you send me the output of the following too:
cd /virtual-memory .properties
cd /memory .properties
ok cd /virtual-memory
ok .properties
.properties ?
ok .attributes
available 00000000 fff00000 00100000
          00000000 fef00000 00e00000
          00000000 00000000 fe400000
          00000000 ffe15000 000cd000
          00000000 ffd00000 00008000
          00000000 fe400000 00b00000
reg       00000000 00000000 80000000
          00000000 80000000 80000000
name      virtual-memory
ok cd /memory
ok .attributes
reg       00000000 00000000 02000000
          00000000 02000000 02000000
          00000000 04000000 02000000
          00000000 06000000 02000000
          00000000 08000000 02000000
          00000000 0a000000 02000000
          00000000 0c000000 02000000
          00000000 0e000000 02000000
available 00000000 00000000 0ffa5000
name      memory
ok
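As a cross-check on the /memory reg value above, the following sketch reads it as a list of three-cell entries and adds up the sizes. The struct and function names are made up for illustration, and the cell meanings (two address cells followed by one size cell) are an assumption based on the root node's #address-cells 2 and #size-cells 1.

```c
#include <stddef.h>
#include <stdint.h>

/* One "reg" entry, assuming #address-cells = 2 and #size-cells = 1. */
struct reg_entry {
    uint32_t space;   /* first address cell */
    uint32_t offset;  /* second address cell */
    uint32_t size;    /* size in bytes */
};

/* Sum the sizes of all entries; 64-bit accumulator to avoid overflow. */
uint64_t reg_total_size(const struct reg_entry *regs, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += regs[i].size;
    return total;
}

/* The /memory "reg" value printed by OBP above: eight banks. */
const struct reg_entry obp_memory_reg[8] = {
    { 0x00000000, 0x00000000, 0x02000000 },
    { 0x00000000, 0x02000000, 0x02000000 },
    { 0x00000000, 0x04000000, 0x02000000 },
    { 0x00000000, 0x06000000, 0x02000000 },
    { 0x00000000, 0x08000000, 0x02000000 },
    { 0x00000000, 0x0a000000, 0x02000000 },
    { 0x00000000, 0x0c000000, 0x02000000 },
    { 0x00000000, 0x0e000000, 0x02000000 },
};
```

Summing the eight entries gives 0x10000000 bytes, which agrees with the "mem = 262144K (0x10000000)" line in the Solaris boot log.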
Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 259040+54154+47486 Bytes device auxio size -1 SunOS Release 5.8 Version Generic_108528-09 32-bit Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved. Ethernet address = 52:54:0:12:34:56 Using default device instance data
OBP works without this hack.
Meh. Is there any improvement with the older versions of SunOS?
Compared to the previous state - yes. Compared to Solaris 8 - no:
0 > boot cdrom:d -vsb Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 243536+176918+41926 Bytes device auxio size -1 SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0] Copyright (c) 1983-1997, Sun Microsystems, Inc. Using default device instance data Unassigned mem write access of 4 bytes to ffffffffffff0ff4 from f0041354 qemu: fatal: Trap 0x29 while interrupts disabled, Error state pc: f0041354 npc: f0041358 General Registers: %g0-7: 00000000 f026de48 00000001 f0041bb4 00000326 f0243b88 00000001 f0244020
Current Register Window: %o0-7: 00000000 f0240494 f59cb00c 00000000 00000000 f0274e38 f0240438 f0041bb4 %l0-7: 04400cc0 f004cb10 f004cb14 00000001 00000209 00000001 f026de48 f023ff98 %i0-7: f59cb000 04400cc1 ff812201 000003d5 f025e800 00000000 f0240040 f0057110
0 > boot cdrom:d -vbs Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 260620+167370+38134 Bytes device auxio size -1 SunOS Release 5.5.1 Version Generic [UNIX(R) System V Release 4.0] Copyright (c) 1983-1996, Sun Microsystems, Inc. Using default device instance data Unassigned mem write access of 4 bytes to ffffffffffff0ee4 from f00412a0 qemu: fatal: Trap 0x29 while interrupts disabled, Error state pc: f00412a0 npc: f00412a4 General Registers: %g0-7: 00000000 ffdf1000 ffdd7740 00000001 001f5c30 00121798 00000001 f0242020
Current Register Window: %o0-7: ffd8bb30 00000800 ffffe000 00000400 00000000 00000000 f0240240 ffd0a918 %l0-7: 04000cc6 ffd2d378 ffd2d37c 00000040 00000209 00000040 00000007 f023fe78 %i0-7: ffd885a4 000a6898 00000000 00000200 f0240300 00000000 f023ff20 ffd0e860
Btw, where does the message "device auxio size -1" come from?
Artyom Tarasenko wrote:
Just a wild guess - it can be idprom. Solaris doesn't manage to tell the Ethernet address. Also there seems to be a problem with the root node:
Welcome to OpenBIOS v1.0 built on Oct 31 2010 19:48 Type 'help' for detailed information Trying disk... No valid state has been set by load or init-program
0 > cd / ok 0 > .properties name "SUNW,SPARCstation-5" #address-cells 2 #size-cells 1 compatible "sun4m" clock-frequency a21fe80 idprom -- 20 : 01 80 52 54 00 12 34 56 00 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 banner-name "SPARCstation 5" model "SUNW,501-3059" stdin-path "/obio/zs@0,100000:a" stdout-path "/obio/zs@0,100000:a" uuid -- 10 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ok 0 > device-end ok 0 > boot cdrom:d No valid state has been set by load or init-program ok 2 >
Ah yes. At the moment there does seem to be an issue executing the boot word when you're not in /chosen. You should be able to do:
cd /chosen
boot cdrom:d -vb
Unlike Linux/Debian, Solaris rarely does unaligned accesses (in fact I think I haven't seen one which I didn't trigger myself).
Yeah, I see a lot of those on Debian too :(
ok boot disk2:d -vb Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@2,0:d File and args: -vb Size: 264120+54502+47926 Bytes SunOS Release 5.8 Version Generic_108528-29 32-bit Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved. Ethernet address = 52:54:0:12:34:56 Using default device instance data vac: enabled in write through mode mem = 262144K (0x10000000) avail mem = 258269184 root nexus = SUNW,SPARCstation-5 iommu0 at root: obio 0x10000000 sbus0 at iommu0: obio 0x10001000 dma0 at sbus0: SBus slot 5 0x8400000 dma0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000 /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0): esp-options=0x46 esp0 at dma0: SBus slot 5 0x8800000 sparc ipl 4 esp0 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 sd2 at esp0: target 2 lun 0 sd2 is /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0 root on /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000/sd@2,0:b fstype ufs obio0 at root obio0 at obio0: obio 0x100000, sparc ipl 12 zs0 is /obio/zs@0,100000 obio1 at obio0: obio 0x0, sparc ipl 12 zs1 is /obio/zs@0,0 cpu0: FMI,MB86907 (mid 0 impl 0x0 ver 0x4 clock 1071 MHz) Unassigned mem write access of 4 bytes to ffffffffffff0ee0 from f00602cc Unassigned mem write access of 4 bytes to ffffffffffff0ee4 from f00602cc Unassigned mem write access of 4 bytes to ffffffffffff0ee8 from f00602d0 [^^^ note that these aren't real faults. The NF flag is on. I usually modify debug output not to show them.] #
Ah and another thing while I think about it: could you send me the output of the following too:
cd /virtual-memory .properties
cd /memory .properties
ok cd /virtual-memory ok .properties .properties ? ok .attributes available 00000000 fff00000 00100000 00000000 fef00000 00e00000 00000000 00000000 fe400000 00000000 ffe15000 000cd000 00000000 ffd00000 00008000 00000000 fe400000 00b00000 reg 00000000 00000000 80000000 00000000 80000000 80000000 name virtual-memory ok cd /memory ok .attributes reg 00000000 00000000 02000000 00000000 02000000 02000000 00000000 04000000 02000000 00000000 06000000 02000000 00000000 08000000 02000000 00000000 0a000000 02000000 00000000 0c000000 02000000 00000000 0e000000 02000000 available 00000000 00000000 0ffa5000 name memory ok
I think one of the next things I'd like to do is try and figure out if it's possible to switch SPARC32 to ofmem, since we already have code that correctly generates all these properties so it would be a shame not to use it (I see accesses to some of the /memory nodes just before it crashes on OpenBIOS, so I'd like to eliminate that as a source of confusion first).
Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 259040+54154+47486 Bytes device auxio size -1 SunOS Release 5.8 Version Generic_108528-09 32-bit Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved. Ethernet address = 52:54:0:12:34:56 Using default device instance data
OBP works without this hack.
Meh. Is there any improvement with the older versions of SunOS?
Compared to the previous state - yes. Compared to Solaris 8 - no:
0 > boot cdrom:d -vsb Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 243536+176918+41926 Bytes device auxio size -1 SunOS Release 5.6 Version Generic [UNIX(R) System V Release 4.0] Copyright (c) 1983-1997, Sun Microsystems, Inc. Using default device instance data Unassigned mem write access of 4 bytes to ffffffffffff0ff4 from f0041354 qemu: fatal: Trap 0x29 while interrupts disabled, Error state pc: f0041354 npc: f0041358 General Registers: %g0-7: 00000000 f026de48 00000001 f0041bb4 00000326 f0243b88 00000001 f0244020
Current Register Window: %o0-7: 00000000 f0240494 f59cb00c 00000000 00000000 f0274e38 f0240438 f0041bb4 %l0-7: 04400cc0 f004cb10 f004cb14 00000001 00000209 00000001 f026de48 f023ff98 %i0-7: f59cb000 04400cc1 ff812201 000003d5 f025e800 00000000 f0240040 f0057110
0 > boot cdrom:d -vbs Not a bootable ELF image Loading a.out image... Loaded 7680 bytes entry point is 0x4000 bootpath: /iommu/sbus/espdma/esp/sd@2,0:d
Jumping to entry point 00004000 for type 00000005... switching to new context: Size: 260620+167370+38134 Bytes device auxio size -1 SunOS Release 5.5.1 Version Generic [UNIX(R) System V Release 4.0] Copyright (c) 1983-1996, Sun Microsystems, Inc. Using default device instance data Unassigned mem write access of 4 bytes to ffffffffffff0ee4 from f00412a0 qemu: fatal: Trap 0x29 while interrupts disabled, Error state pc: f00412a0 npc: f00412a4 General Registers: %g0-7: 00000000 ffdf1000 ffdd7740 00000001 001f5c30 00121798 00000001 f0242020
Current Register Window: %o0-7: ffd8bb30 00000800 ffffe000 00000400 00000000 00000000 f0240240 ffd0a918 %l0-7: 04000cc6 ffd2d378 ffd2d37c 00000040 00000209 00000040 00000007 f023fe78 %i0-7: ffd885a4 000a6898 00000000 00000200 f0240300 00000000 f023ff20 ffd0e860
Cool - thanks for the output :)
Btw, where does the message "device auxio size -1" come from?
It seems as if there should be an auxio device somewhere:
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/sparc/kernel...
Perhaps this is the device that Solaris is trying to access and failing?
ATB,
Mark.
On Mon, Nov 1, 2010 at 11:59 AM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Artyom Tarasenko wrote:
Unlike Linux/Debian, Solaris rarely does unaligned accesses (in fact I think I haven't seen one which I didn't trigger myself).
Yeah, I see a lot of those on Debian too :(
This is not bad. They are supposed to be there. Probably they are used for flushing something. The annoying thing is that they make it impractical to keep DEBUG_UNALIGNED turned on all the time.
I think one of the next things I'd like to do is try and figure out if it's possible to switch SPARC32 to ofmem, since we already have code that correctly generates all these properties so it would be a shame not to use it (I see accesses to some of the /memory nodes just before it crashes on OpenBIOS, so I'd like to eliminate that as a source of confusion first).
Would be cool. Then the sparc32 port can be used to improve the sparc64 one.
Btw, where does the message "device auxio size -1" come from?
It seems as if there should be an auxio device somewhere:
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/sparc/kernel...
Perhaps this is the device that Solaris is trying to access and failing?
Yes, that's what I mean. But what can be wrong with the OpenBIOS implementation? OpenBIOS:
0 > cd /obio/auxio ok
0 > .properties
name "auxio"
reg -- c : 00 00 00 00 00 90 00 00 00 00 00 01
ok

OBP:
ok cd /obio/auxio
ok .attributes
address ffee6000
reg 00000000 00900000 00000001
name auxio
ok
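For comparing the two dumps above, here is a small sketch that reads the byte-wise OpenBIOS reg value as big-endian 32-bit cells, which is how OBP prints it. prop_cell is a hypothetical helper name, not an OpenBIOS function.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the index-th big-endian 32-bit cell from a property byte array. */
uint32_t prop_cell(const uint8_t *prop, size_t index)
{
    const uint8_t *p = prop + index * 4;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The 0xc-byte "reg" value printed by OpenBIOS for /obio/auxio. */
const uint8_t auxio_reg[12] = {
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x90, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x01,
};
```

Decoded this way the three cells are 00000000 00900000 00000001, the same as the OBP reg value, so the visible difference between the two nodes is the address property that only OBP provides.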
On Mon, Nov 1, 2010 at 3:58 PM, Artyom Tarasenko atar4qemu@gmail.com wrote:
Yes, that's what I mean. But what can be wrong with the OpenBIOS implementation? OpenBIOS:
0 > cd /obio/auxio ok 0 > .properties name "auxio" reg -- c : 00 00 00 00 00 90 00 00 00 00 00 01 ok
OBP: ok cd /obio/auxio ok .attributes address ffee6000
This means that the device is not mapped. Maybe this can help:
diff --git a/drivers/obio.c b/drivers/obio.c
index 38c5f8d..d22abe3 100644
--- a/drivers/obio.c
+++ b/drivers/obio.c
@@ -228,7 +228,7 @@ ob_auxio_init(uint64_t base, uint64_t offset)
 {
     ob_new_obio_device("auxio", NULL);

-    ob_reg(base, offset, AUXIO_REGS, 0);
+    ob_reg(base, offset, AUXIO_REGS, 1);

     fword("finish-device");
 }
Blue Swirl wrote:
This means that the device is not mapped. Maybe this can help:
diff --git a/drivers/obio.c b/drivers/obio.c
index 38c5f8d..d22abe3 100644
--- a/drivers/obio.c
+++ b/drivers/obio.c
@@ -228,7 +228,7 @@ ob_auxio_init(uint64_t base, uint64_t offset)
 {
     ob_new_obio_device("auxio", NULL);

-    ob_reg(base, offset, AUXIO_REGS, 0);
+    ob_reg(base, offset, AUXIO_REGS, 1);

     fword("finish-device");
 }
Better, but still not quite right:
Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Oct 30 2010 16:27
Type 'help' for detailed information
Trying cdrom:d...
Not a bootable ELF image
Loading a.out image...
Loaded 7680 bytes
entry point is 0x4000
bootpath: /iommu/sbus/espdma/esp/sd@2,0:d

Jumping to entry point 00004000 for type 00000005...
switching to new context:
device auxio address prop too big
SunOS Release 5.8 Version Generic_108528-09 32-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Looking closer:
Configuration device id QEMU version 1 machine id 32 CPUs: 1 x FMI,MB86904 UUID: 00000000-0000-0000-0000-000000000000 Welcome to OpenBIOS v1.0 built on Oct 30 2010 16:27 Type 'help' for detailed information
0 > cd /obio/auxio ok
0 > .properties
name "auxio"
reg -- c : 00 00 00 00 00 90 00 00 00 00 00 01
address -- 8 : ff eb 20 00 00 00 00 04
ok
0 >
Hmmm. It looks as if this part of map_reg in drivers/obio.c is totally wrong, at least for SPARC32:

    if (map) {
        unsigned long addr;

        addr = (unsigned long)map_io(base + offset, size);

        PUSH(addr);
        fword("encode-int");
        PUSH(4);
        fword("encode-int");
        fword("encode+");
        push_str("address");
        fword("property");
        return addr;
    }
    return 0;
Based upon the output above, I'd say we should probably remove the second PUSH() and encode-int/encode+ completely, but it must have been added for a reason. Blue, any ideas?
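To illustrate why the address property comes out as 8 bytes, here is a plain C model of the encoding sequence quoted above. It is only a sketch, not OpenBIOS code: the function names are hypothetical, encode-int is modelled as appending one big-endian 32-bit cell, and encode+ as simple concatenation.

```c
#include <stddef.h>
#include <stdint.h>

/* Append one 32-bit value as a big-endian cell; returns the new length. */
size_t encode_int(uint8_t *buf, size_t len, uint32_t value)
{
    buf[len + 0] = (uint8_t)(value >> 24);
    buf[len + 1] = (uint8_t)(value >> 16);
    buf[len + 2] = (uint8_t)(value >> 8);
    buf[len + 3] = (uint8_t)value;
    return len + 4;
}

/* Model of the current sequence: the address cell followed by a 4. */
size_t encode_address_current(uint8_t *buf, uint32_t addr)
{
    size_t len = encode_int(buf, 0, addr);
    return encode_int(buf, len, 4);   /* the extra cell */
}

/* Model of the suggested change: a single cell holding the address. */
size_t encode_address_single(uint8_t *buf, uint32_t addr)
{
    return encode_int(buf, 0, addr);
}
```

For addr = 0xffeb2000 the first variant produces the 8 bytes ff eb 20 00 00 00 00 04 seen in the .properties output, while the second produces a 4-byte value, which is the shape of the single-cell address property that OBP reports.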
ATB,
Mark.
Am 02.11.2010 um 11:04 schrieb Mark Cave-Ayland:
Hmmm. It looks as if this part of map_reg in drivers/obio.c is totally wrong, at least for SPARC32:

    if (map) {
        unsigned long addr;

        addr = (unsigned long)map_io(base + offset, size);

        PUSH(addr);
        fword("encode-int");
        PUSH(4);
        fword("encode-int");
        fword("encode+");
        push_str("address");
        fword("property");
        return addr;
    }
    return 0;

Based upon the output above, I'd say we should probably remove the second PUSH() and encode-int/encode+ completely, but it must have been added for a reason.
Fwiw I initially thought the same about the ppc code. I guess these are symptoms of copy and paste between unrelated parts of code. The reg property of the /memory node on ppc is similarly bogus; I still need to clean up my patch for review.
In some places the IEEE 1275 spec (or was it the platform binding?) has some nice examples of how they would typically be set up.
Andreas
On Fri, Oct 29, 2010 at 10:58 PM, Blue Swirl blauwirbel@gmail.com wrote:
On Fri, Oct 29, 2010 at 8:42 PM, Artyom Tarasenko atar4qemu@gmail.com wrote:
On Fri, Oct 29, 2010 at 10:28 PM, Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
Blue Swirl wrote:
So it would be useful to have someone who understands both SPARC32 and qemu to take a look (hint, hint!) ;)
Trap 0x29 is TT_DATA_ACCESS, invoked on access to unassigned memory by QEMU. Perhaps enabling DEBUG_UNASSIGNED in target-sparc/op_helper.c may reveal something.
It has been suspected that QEMU may be a bit too trigger happy with unassigned memory accesses. There may also be an undocumented device, or Solaris just tries to access a device which does not exist on SS-5.
Okay. With that DEBUG_UNASSIGNED enabled I get the following message as the trap is invoked:
Unassigned mem write access of 4 bytes to ffffffffffff0ecc from f004127c
No, it's not the same as with OBP. Btw I have a fix for the OBP fault, but it waits till someone commits the promised refactoring.
Which refactoring?
le/esp dma split.
Am 24.10.2010 um 03:24 schrieb Mark Cave-Ayland:
Index: arch/sparc32/romvec.c
--- arch/sparc32/romvec.c (revision 915)
+++ arch/sparc32/romvec.c (working copy)
@@ -219,36 +211,43 @@
 }

 static const struct linux_nodeops nodeops0 = {
-    obp_nextnode, /* int (*no_nextnode)(int node); */
-    obp_child, /* int (*no_child)(int node); */
-    obp_proplen, /* int (*no_proplen)(int node, char *name); */
-    obp_getprop, /* int (*no_getprop)(int node,char *name,char *val); */
-    obp_setprop, /* int (*no_setprop)(int node, char *name, char *val, int len); */
-    obp_nextprop /* char * (*no_nextprop)(int node, char *name); */
+    obp_nextnode_handler, /* int (*no_nextnode)(int node); */
+    obp_child_handler, /* int (*no_child)(int node); */
+    obp_proplen_handler, /* int (*no_proplen)(int node, char *name); */
+    obp_getprop_handler, /* int (*no_getprop)(int node,char *name,char *val); */
+    obp_setprop_handler, /* int (*no_setprop)(int node, char *name, char *val, int len); */
+    obp_nextprop_handler /* char * (*no_nextprop)(int node, char *name); */
 };

-static int obp_nbgetchar(void)
+int obp_nbgetchar(void)
 {
     return getchar();
 }

-static int obp_nbputchar(int ch)
+int obp_nbputchar(int ch)
 {
     putchar(ch);

     return 0;
 }

-static void obp_reboot(char *str)
+void obp_putstr(char *str, int len)
 {
-    PUSH((ucell)str);
+    PUSH(pointer2cell(str));

@@ -449,11 +448,16 @@
     DPRINTF("obp_fortheval_v2(%s)\n", str);
     push_str(str);
     fword("eval");

 }
Whitespace intentional?
Andreas
Andreas Färber wrote:
-static void obp_reboot(char *str)
+void obp_putstr(char *str, int len)
 {
-    PUSH((ucell)str);
+    PUSH(pointer2cell(str));

@@ -449,11 +448,16 @@
     DPRINTF("obp_fortheval_v2(%s)\n", str);
     push_str(str);
     fword("eval");

 }
Whitespace intentional?
Hi Andreas,
Have corrected both of these issues and committed - thanks for the review :)
ATB,
Mark.