Hi all,
Whilst spending some time working on debugging SPARC64 support with Qemu/OpenBIOS, it became readily apparent that progress was being hampered by the lack of debugging facilities in OpenBIOS (see http://lists.openbios.org/pipermail/openbios/2009-August/003949.html). Hence I've been working on adding a source debugger to OpenBIOS which should enable developers to step/trace through Forth words in order to locate bugs in the lower level Forth OpenBIOS code.
The attached patch implements a Forth Source Debugger based upon the IEEE-1275 specification; it is not a comprehensive implementation but has already proved to be very useful in my tests here. A sample session using the debugger goes something like this:
Welcome to OpenBIOS v1.0 built on Oct 31 2009 10:09
Type 'help' for detailed information
[unix] Booting default not supported.
0 > : bar ." test " ; ok
0 > debug bar
Stepper keys: <space>/<enter> Up Down Trace Rstack Forth ok
0 > bar
: bar ( Empty )
0xf7e11f0c: (")
: (") ( Empty )
0xf7dfd928: r> ( f7e11f0c )
0xf7dfd92c: dup ( f7e11f0c f7e11f0c )
0xf7dfd930: 2 ( f7e11f0c f7e11f0c 2 )
0xf7dfd934: cells ( f7e11f0c f7e11f0c 8 )
0xf7dfd938: + ( f7e11f0c f7e11f14 )
0xf7dfd93c: over ( f7e11f0c f7e11f14 f7e11f0c )
0xf7dfd940: cell+ ( f7e11f0c f7e11f14 f7e11f10 )
0xf7dfd944: @ ( f7e11f0c f7e11f14 5 )
0xf7dfd948: rot ( f7e11f14 5 f7e11f0c )
0xf7dfd94c: over ( f7e11f14 5 f7e11f0c 5 )
0xf7dfd950: + ( f7e11f14 5 f7e11f11 )
0xf7dfd954: aligned ( f7e11f14 5 f7e11f14 )
0xf7dfd958: cell+ ( f7e11f14 5 f7e11f18 )
0xf7dfd95c: >r ( f7e11f14 5 )
0xf7dfd960: (semis) [ Finished (") ] ( f7e11f14 5 )
0xf7e11f1c: type test ( Empty )
0xf7e11f20: (semis) [ Finished bar ] ok
0 > bar
: bar ( Empty )
0xf7e11f0c: (")
: (") ( Empty )
0xf7dfd928: r> [ Up to bar ] ( f7e11f14 5 )
0xf7e11f1c: type test ( Empty )
0xf7e11f20: (semis) [ Finished bar ] ok
0 >
As alluded to in earlier posts to the list, my initial attempts at adding debug support focused on storing additional information on the rstack. Unfortunately, this created extra problems when debugging some of the more interesting Forth words, since they manipulate the return stack themselves and confuse the debugger.
My final implementation works in a much simpler way: when the debug word is invoked with the name of the word to debug, the start and end addresses of that word are added to a debug linked list. Then, in the next() function, we iterate through the linked list to check whether the current PC lies within any of the words in the list. If it does, we enter the source debugger in step or trace mode as appropriate.
Having given the patch a reasonably good test here, I'm quite pleased with the additional functionality it provides. The only downside I can see is that the patch adds extra work in docol(), semis() and next() in order to keep the debug linked list up to date. I've tried to wrap most of the complexity in conditional while() statements so that it is only invoked while the debugger is active, which should keep the impact on normal runtime performance minimal (and that seems to be the case here).
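A minimal C sketch of the range-list check described above may make the mechanism more concrete. All names here (struct debug_range, debug_add_range(), debug_pc_is_traced(), debug_clear()) are hypothetical and are not taken from the patch or from OpenBIOS; the sketch assumes that a word's compiled body occupies one contiguous [start, end) address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the debug list: one node per word passed to "debug". */
typedef uintptr_t cell_addr;

struct debug_range {
    cell_addr start;            /* first cell of the word's body */
    cell_addr end;              /* one past the last cell */
    struct debug_range *next;
};

static struct debug_range *debug_list = NULL;

/* What a hypothetical "debug <word>" handler would call. Returns false on OOM. */
bool debug_add_range(cell_addr start, cell_addr end)
{
    assert(start < end);
    struct debug_range *r = malloc(sizeof *r);
    if (r == NULL)
        return false;
    r->start = start;
    r->end = end;
    r->next = debug_list;
    debug_list = r;
    return true;
}

/* The check next() would make before dispatching each word. With an empty
 * list the loop body never runs, but the call and the list-head test still
 * happen on every dispatch. */
bool debug_pc_is_traced(cell_addr pc)
{
    for (const struct debug_range *r = debug_list; r != NULL; r = r->next) {
        if (pc >= r->start && pc < r->end)
            return true;
    }
    return false;
}

/* Drop all ranges, e.g. when the debugger is switched off. */
void debug_clear(void)
{
    while (debug_list != NULL) {
        struct debug_range *r = debug_list;
        debug_list = r->next;
        free(r);
    }
}
```

Even when nothing is being debugged, next() still pays for this test on every word it dispatches; that per-NEXT cost is what the performance discussion in the rest of this thread is about.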
Please test the patch and let me know if it needs any further work before it can be considered ready for committing to the OpenBIOS SVN repository.
ATB,
Mark.
On Saturday 31 October 2009 at 11:02 +0000, Mark Cave-Ayland wrote:
Welcome to OpenBIOS v1.0 built on Oct 31 2009 10:09
Type 'help' for detailed information
[unix] Booting default not supported.
I tested this with obj-ppc/openbios-qemu.elf and ./ppc-softmmu/qemu-system-ppc.
0 > : bar ." test " ; ok
0 > debug bar
Stepper keys: <space>/<enter> Up Down Trace Rstack Forth ok
I don't get this output.
0 > bar
It hangs here for me.
If I test this with "-nographic", on "debug bar" I get:
>> Stepper keys: <space>/<enter> Up Down Trace Rstack Forth ok
But then on "bar" I get an infinite loop displaying:
>> Stepper keys: <space>/<enter> Up Down Trace Rstack Forth
Regards,
Laurent
Laurent Vivier wrote:
Hi Laurent,
Thanks for testing.
I tested this with obj-ppc/openbios-qemu.elf and ./ppc-softmmu/qemu-system-ppc.
0 > : bar ." test " ; ok
0 > debug bar
Stepper keys: <space>/<enter> Up Down Trace Rstack Forth ok
I don't get this output.
0 > bar
It hangs here for me.
Hmmm, it looks as if the output from printk() isn't making it to the screen when running with a graphical display. You can confirm this by pressing 't' (trace) when you run "bar": it will then run to completion and return to the Forth prompt. I'm not sure why this is the case at the moment.
If I test this with "-nographic", on "debug bar" I get:
>> Stepper keys: <space>/<enter> Up Down Trace Rstack Forth ok
But then on "bar" I get an infinite loop displaying:
>> Stepper keys: <space>/<enter> Up Down Trace Rstack Forth
Yeah, I've already found and fixed this one while testing with -nographic on sparc64. Please try the attached v2 patch instead, which should solve the issue for you.
ATB,
Mark.
As alluded to in earlier posts to the list, my initial attempts at adding debug support focused on storing additional information on the rstack. Unfortunately, this created extra problems when debugging some of the more interesting Forth words, since they manipulate the return stack themselves and confuse the debugger.
My final implementation works in a much simpler way: when the debug word is invoked with the name of the word to debug, the start and end addresses of that word are added to a debug linked list. Then, in the next() function, we iterate through the linked list to check whether the current PC lies within any of the words in the list. If it does, we enter the source debugger in step or trace mode as appropriate.
Having given the patch a reasonably good test here, I'm quite pleased with the additional functionality it provides. The only downside I can see is that the patch adds extra work in docol(), semis() and next() in order to keep the debug linked list up to date. I've tried to wrap most of the complexity in conditional while () statements so that it is only invoked while the debugger is active, which should keep the impact on normal runtime performance minimal (and that seems to be the case here).
Unfortunately, SEMIS is _the_ hotspot in profiles of any ITC/DTC (indirect- or direct-threaded code) Forth, followed by DOCOL. I never profiled NEXT separately; it is best inlined into most words. Your approach will slow down the engine by anywhere from 30% to 200% when not debugging.
Did you try to change the actual compiled Forth code at runtime? That's how all other Forth debuggers do it.
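For comparison, here is a rough C sketch of the alternative raised here: patching the compiled threaded code itself. It assumes a colon definition's body is a plain array of execution tokens (xts); XT_BREAKPOINT, struct breakpoint, bp_set() and bp_clear() are hypothetical, branches and inline literals are ignored, and this is one way such a debugger could work rather than a description of how Open Firmware does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: compiled code is an array of execution tokens. */
typedef uintptr_t xt_t;

/* Hypothetical xt of a "breakpoint" word that would enter the stepper. */
#define XT_BREAKPOINT ((xt_t)0xB7EA)

struct breakpoint {
    xt_t *cell;   /* address of the patched cell, NULL if unused */
    xt_t saved;   /* original xt that was stored there */
};

/* Overwrite one cell of compiled code; NEXT/DOCOL/SEMIS are left untouched. */
void bp_set(struct breakpoint *bp, xt_t *cell)
{
    assert(bp->cell == NULL);
    bp->cell = cell;
    bp->saved = *cell;
    *cell = XT_BREAKPOINT;
}

/* Put the original xt back and return it so the stepper can execute it.
 * A single-stepper could then arm a new breakpoint on the following cell. */
xt_t bp_clear(struct breakpoint *bp)
{
    assert(bp->cell != NULL);
    xt_t original = bp->saved;
    *bp->cell = original;
    bp->cell = NULL;
    return original;
}
```

Because only the patched cell changes, the inner interpreter needs no extra test at all when nothing is being debugged.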
You can probably lift most code you need directly from Mitch Bradley's Open Firmware implementation.
Segher
Segher Boessenkool wrote:
Hi Segher,
Unfortunately, SEMIS is _the_ hotspot in profiles of any ITC/DTC (indirect- or direct-threaded code) Forth, followed by DOCOL. I never profiled NEXT separately; it is best inlined into most words. Your approach will slow down the engine by anywhere from 30% to 200% when not debugging.
Interesting, thanks for your comments on this. I'd like to get an idea of what kind of impact the patch has as it stands. Is there a standard benchmark for Forth implementations? Or, alternatively, some Forth words that will stretch the kernel (for example Fibonacci or Mandelbrot) that I can use for comparison?
Given that OpenBIOS is not performance critical, I think a slight degradation in speed is acceptable in exchange for easier debugging. Alternatively, perhaps the debugger could be made an optional compile-time feature for those who need the best performance?
Did you try to change the actual compiled Forth code at runtime? That's how all other Forth debuggers do it.
Possibly; there are probably several ways in which you can implement something like this. One of the reasons for going the C route was that I wanted to analyse the rstack in detail; OpenBIOS unfortunately is not standards-compliant Forth in that some words pop information from the rstack which they did not put there themselves.
In its current form, the patch allows me to trace through words that do this since it doesn't rely on iterating through the rstack. I'm not sure whether the Open Firmware code works in this way though.
You can probably lift most code you need directly from Mitch Bradley's Open Firmware implementation.
I'm not sure that we can; a brief look at the documentation shows that there appears to be a mixture of licenses involved :( Without a clear statement from the core team as to what you can and cannot do, I am quite hesitant in going down this particular route.
ATB,
Mark.
Interesting, thanks for your comments on this. I'd like to get an idea of what kind of impact the patch has as it stands. Is there a standard benchmark for Forth implementations?
No standard, no. It wouldn't matter anyway, you can measure this slowdown on _any_ code.
Or, alternatively, some Forth words that will stretch the kernel (for example Fibonacci or Mandelbrot) that I can use for comparison?
Sure, fib or mandel will do.
Given that OpenBIOS is not performance critical, I think a slight degradation in speed is acceptable in exchange for easier debugging.
Yes, but my point is your patch does not cause a "slight" slowdown (I haven't measured it though!), while standard techniques cause no slowdown at all.
Did you try to change the actual compiled Forth code at runtime? That's how all other Forth debuggers do it.
Possibly; there are probably several ways in which you can implement something like this.
My point is that your strategy has some severe disadvantages. Just run the benchmarks and then someone can decide if it's worthwhile. My opinion would be that it is not worth it, since it inserts code into the "hot path" while that's not necessary at all.
One of the reasons for going the C route was that I wanted to analyse the rstack in detail; OpenBIOS unfortunately is not standards-compliant Forth in that some words pop information from the rstack which they did not put there themselves.
That is perfectly valid, standard-compliant Forth. A portable program cannot do such things since it doesn't know what a nesting-sys looks like; but OpenBIOS includes the Forth system as well.
Anyway, run benchmarks and report them here please.
Segher
On Mon, Nov 2, 2009 at 8:15 PM, Segher Boessenkool segher@kernel.crashing.org wrote:
My point is that your strategy has some severe disadvantages. Just run the benchmarks and then someone can decide if it's worthwhile. My opinion would be that it is not worth it, since it inserts code into the "hot path" while that's not necessary at all.
The debug stuff could be made a compile-time option. We could even enable debugging only for non-release versions, so release versions would not suffer.
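A minimal sketch of such a compile-time option, assuming a hypothetical CONFIG_DEBUGGER build flag (not an existing OpenBIOS option) and a toy next_step() standing in for the real inner interpreter; debugger_step() only counts calls here instead of entering a stepper.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef CONFIG_DEBUGGER
static bool debugger_active;     /* would be set by the "debug" word */
static unsigned debugger_calls;  /* stand-in for entering the stepper */

static void debugger_step(void)
{
    debugger_calls++;
}

#define DEBUG_HOOK() \
    do { if (debugger_active) debugger_step(); } while (0)
#else
/* Non-debug builds: the hook expands to an empty statement, so NEXT,
 * DOCOL and SEMIS carry no extra test at all. */
#define DEBUG_HOOK() do { } while (0)
#endif

static unsigned dispatched;  /* toy stand-in for executing a word */

/* Toy model of one inner-interpreter step with the hook in place. */
static void next_step(void)
{
    DEBUG_HOOK();
    dispatched++;
}

/* Lets callers report which variant was built. */
bool debugger_compiled_in(void)
{
#ifdef CONFIG_DEBUGGER
    return true;
#else
    return false;
#endif
}
```

Building a release image without -DCONFIG_DEBUGGER and a development image with it would give the normal and debug variants suggested in the thread.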
Blue Swirl wrote:
The debug stuff could be made a compile-time option. We could even enable debugging only for non-release versions, so release versions would not suffer.
Maybe it would make sense to have a normal variant and a debug variant in the qemu tree in case people actually want to debug their boot scenario?
Stefan
On Mon, Nov 2, 2009 at 11:57 PM, Stefan Reinauer stepan@coresystems.de wrote:
Maybe it would make sense to have a normal variant and a debug variant in the qemu tree in case people actually want to debug their boot scenario?
That's possible.
But I think the performance hit could be avoided almost entirely: make non-debug and debug versions of "semis" and "docol". When constructing the "words" table at startup, select the non-debug versions unless some magic diagnostic switch is on. "next" or "enterforth" may still need a test.
Maybe the switch to debug table could even happen during execution by user command.
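One possible shape for this suggestion, sketched in C: both variants of docol and semis exist, and each table slot is pointed at one or the other at startup or on a user command. The table, slot names and counters are illustrative assumptions; in a real engine the threaded code would have to reach these primitives through the table for a switch to take effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*prim_fn)(void);

/* Counters stand in for the real work so the behaviour is observable. */
static unsigned semis_calls, docol_calls, debug_checks;

static void semis_plain(void) { semis_calls++; }
static void docol_plain(void) { docol_calls++; }

/* Debug variants do the extra bookkeeping, then the normal work. */
static void semis_debug(void) { debug_checks++; semis_plain(); }
static void docol_debug(void) { debug_checks++; docol_plain(); }

enum { PRIM_DOCOL, PRIM_SEMIS, PRIM_COUNT };

static prim_fn words[PRIM_COUNT];

/* Called at startup with the diagnostic switch, or later by a user command.
 * Only the table entries change; the dispatch path itself has no test. */
void select_primitives(bool debug)
{
    words[PRIM_DOCOL] = debug ? docol_debug : docol_plain;
    words[PRIM_SEMIS] = debug ? semis_debug : semis_plain;
}

/* Toy stand-in for running one colon definition: enter, then exit. */
void run_colon_def(void)
{
    words[PRIM_DOCOL]();
    words[PRIM_SEMIS]();
}
```

With the plain variants selected, the only cost left is the indirect call that a table-driven engine makes anyway.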
Blue Swirl wrote:
Indeed. I had a similar inspiration earlier this evening; the only problem was how to switch code into next() without causing a performance hit. However, I think I've just worked out a cute little hack that would solve this. I'm at a conference over the next few days, so it might not be until next week that I get a chance to re-work the patch and resubmit.
In the meantime, I've looked again at the v2 patch I posted and I still can't work out why the debugging output from printk() doesn't appear on a VNC display for Qemu SPARC64, while it appears fine when Qemu SPARC64 is invoked in -nographic mode. Can anyone shed any light on this?
ATB,
Mark.
Mark Cave-Ayland wrote:
Indeed. I had a similar inspiration earlier this evening; the only problem was how to switch code into next() without causing a performance hit. However, I think I've just worked out a cute little hack that would solve this. I'm at a conference over the next few days, so it might not be until next week that I get a chance to re-work the patch and resubmit.
Okay, so I've come across an interesting problem with regard to reworking this. Attached is a patch that demonstrates some very alarming behaviour, at least here on x86_64 with gcc 4.3.2. Just by adding a single function pointer to kernel/internal.c, the runtime for my Fibonacci benchmark goes up by 40%!
With the attached patch for OpenBIOS SVN, I get the following runtime for the Fibonacci benchmark:
build@zeno:~/src/openbios/openbios-devel$ time echo "28 fib-rec u. bye" | ./obj-x86/openbios-unix ./obj-x86/openbios-x86.dict
Welcome to OpenBIOS v1.0 built on Nov 9 2009 17:17
Type 'help' for detailed information
[unix] Booting default not supported.
0 > 28 fib-rec u. bye 6197ecb Farewell!
ok
real 0m52.564s
user 0m52.027s
sys 0m0.012s
If the line "void (*debughook) (void);" is then commented out from kernel/internal.c then the runtime looks like below (which is roughly the same as running the fib-rec benchmark on plain OpenBIOS SVN):
build@zeno:~/src/openbios/openbios-devel$ time echo "28 fib-rec u. bye" | ./obj-x86/openbios-unix ./obj-x86/openbios-x86.dict
Welcome to OpenBIOS v1.0 built on Nov 9 2009 17:12
Type 'help' for detailed information
[unix] Booting default not supported.
0 > 28 fib-rec u. bye 6197ecb Farewell!
ok
real 0m37.946s
user 0m37.178s
sys 0m0.020s
So in other words, simply defining a pointer variable (which hasn't even been used anywhere in the code yet) is increasing the runtime by 40%?! Can anyone shed any light on this behaviour? It just doesn't seem to make any sense to me.
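A small C cross-check of the numbers above. It assumes, as the hexadecimal output 6197ecb suggests, that the OpenBIOS prompt is in hex, so "28 fib-rec" computes fib(0x28) = fib(40), and that fib-rec uses fib(0) = 0, fib(1) = 1. fib_iter() and slowdown() are illustrative helpers, not part of the benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Iterative Fibonacci, used only to confirm the printed result quickly. */
uint64_t fib_iter(unsigned n)
{
    uint64_t a = 0, b = 1;   /* fib(0), fib(1) */
    for (unsigned i = 0; i < n; i++) {
        uint64_t t = a + b;
        a = b;
        b = t;
    }
    return a;                /* fib(n) */
}

/* Relative slowdown of t_new against t_base, as a fraction. */
double slowdown(double t_base, double t_new)
{
    return (t_new - t_base) / t_base;
}
```

With the two real times above, slowdown(37.946, 52.564) comes out at about 0.385, i.e. roughly a 38.5% increase, in line with the 40% quoted.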
ATB,
Mark.
Segher Boessenkool wrote:
Anyway, run benchmarks and report them here please.
Okay. I managed to find a recursive Fibonacci Forth implementation, which gives the following results for fib(28):
Latest SVN without patch: 37s
Latest SVN with patch: 59s
So that works out as roughly a 60% increase in execution time just for adding the extra checks :( This seems beyond even my limit of acceptability, although I have some ideas to work around this...
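The 60% figure follows directly from the two rounded timings reported above:

```latex
\frac{t_{\text{patched}} - t_{\text{SVN}}}{t_{\text{SVN}}}
  = \frac{59\,\mathrm{s} - 37\,\mathrm{s}}{37\,\mathrm{s}}
  = \frac{22}{37} \approx 0.59
```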
ATB,
Mark.