As alluded to in earlier posts to the list, my initial attempts at adding debug support were focused on storing additional information in the rstack. Unfortunately this created extra problems when debugging some of the more interesting Forth words, since they manipulate the return stack themselves and cause the debugger to get confused.
My final implementation works in a much simpler way: when the debug word is invoked with the name of the word to debug, the start and end addresses of that word are added to a debug linked list. Then in the next() function, we iterate through the linked list to see if the current PC lies within one of the words registered there. If this is the case, we enter the source debugger in step/trace mode as appropriate.
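To make the range check concrete, here is a minimal sketch of the idea in C. It is not the code from the patch: the names ucell, debug_range, debug_add() and debug_pc_in_range() are hypothetical, and treating the end address as inclusive is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cell type; the real kernel's type may differ. */
typedef unsigned long ucell;

/* One entry per word registered for debugging. */
struct debug_range {
    ucell start;                /* first address of the word */
    ucell end;                  /* last address of the word (assumed inclusive) */
    struct debug_range *next;
};

static struct debug_range *debug_list = NULL;

/* Register a word's address range.
 * Returns 0 on success, -1 if the allocation fails. */
int debug_add(ucell start, ucell end)
{
    struct debug_range *r;

    assert(start <= end);
    r = malloc(sizeof(*r));
    if (r == NULL)
        return -1;
    r->start = start;
    r->end = end;
    r->next = debug_list;       /* push onto the head of the list */
    debug_list = r;
    return 0;
}

/* Return 1 if pc lies inside any registered word, 0 otherwise.
 * With an empty list the loop body never runs, so the cost when
 * nothing is being debugged is a single pointer test. */
int debug_pc_in_range(ucell pc)
{
    const struct debug_range *r;

    for (r = debug_list; r != NULL; r = r->next) {
        if (pc >= r->start && pc <= r->end)
            return 1;
    }
    return 0;
}
```

In this sketch next() would call debug_pc_in_range(PC) and drop into the debugger when it returns 1.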
Having given the patch a reasonably good test here, I'm quite pleased with the additional functionality it provides. The only minor downside I can see is that the patch adds extra work in docol(), semis() and next() in order to update the debug linked list. I've tried to wrap most of the complexity in conditional while () statements so that it is only invoked while the debugger is active, so it should have a minimal impact on normal runtime performance (which seems to be the case here).
Unfortunately, SEMIS is _the_ hotspot in profiles of any ITC/DTC engine, followed by DOCOL. I never profiled NEXT separately, since it is best inlined into most words. Your approach will slow down the engine by anywhere from 30% to 200% when not debugging.
Did you try to change the actual compiled Forth code at runtime? That's how all other Forth debuggers do it.
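A rough sketch of what patching the compiled code could look like, in C. This only illustrates the general idea and is not taken from Open Firmware or any other implementation. The names ucell, breakpoint, bp_set() and bp_clear() are hypothetical, as is the assumption that a compiled word is an array of cells holding execution tokens. A single breakpoint slot is used for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell type; the real kernel's type may differ. */
typedef unsigned long ucell;

/* A single breakpoint: remembers which cell was patched and
 * which execution token it originally held. */
struct breakpoint {
    ucell *addr;    /* patched cell inside the compiled word */
    ucell saved;    /* original execution token stored there */
    int active;
};

static struct breakpoint bp;

/* Overwrite the cell at addr with trap_xt, the execution token of
 * a (hypothetical) word that enters the debugger.
 * Returns 0 on success, -1 if a breakpoint is already set. */
int bp_set(ucell *addr, ucell trap_xt)
{
    if (bp.active)
        return -1;
    bp.addr = addr;
    bp.saved = *addr;
    bp.active = 1;
    *addr = trap_xt;
    return 0;
}

/* Restore the original cell contents.
 * Returns 0 on success, -1 if no breakpoint is set. */
int bp_clear(void)
{
    if (!bp.active)
        return -1;
    *bp.addr = bp.saved;
    bp.active = 0;
    return 0;
}
```

The point of this approach is that docol(), semis() and next() stay untouched, so there is no cost at all when not debugging; the trap word only runs when execution actually reaches the patched cell.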
You can probably lift most code you need directly from Mitch Bradley's Open Firmware implementation.
Segher