Re: [OpenBIOS] Sparc32 "see" crashes on "" literals

25 Jan 2009


Mark Cave-Ayland wrote:
...
Thanks for the patch! With the patch applied, then "see" no longer
crashes on those particular routines. Interestingly enough, digging
further into the BIOS I still see some discrepancies between the
source code and the detokenized version:
0 > see find-device
: find-device
    2dup " .." strcmp 0= if
    2drop active-package dup if
      >dn.parent @ dup 0= if
        (lit) throw active-package! exit 0 -rot path-resolution 0= if
          false exit active-package swap true path-res-cleanup active-package!
In this case, (lit) should be "..".
And also: -22, even. And all the "then"s are missing. That's a bit more
complicated to implement, though.
...
0 > see (find-dev)
: (find-dev)
            active-package -rot (lit) catch if
            3drop false exit active-package swap active-package! true
  ;
 ok
And here (lit) should be "[']".
Hm..
...
Am I right in thinking that it should be possible to reconstruct any
source exactly (minus formatting) from a tokenized input?
Not exactly.
Forgive me for some nit-picking: the Forth dictionary is not "tokenized"
Forth code like the stuff toke produces. A tokenizer just produces a
binary representation of the source code ("FCode"), similar to what some
BASIC dialects did in ancient times to reduce file size. It's still
"source code", and in order to execute it, it still needs to be
"compiled", just like Forth source code.
Now, if source code (or FCode for that matter) is compiled, the Forth
engine can keep things simple.
Example:
: some-new-word ( -- xt-of-find-device )
  ['] find-device
;
['] and ' will put the execution token (xt) of a word on the stack. That
execution token could then be executed with "execute". It's like a
function pointer in C.
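To make the function-pointer analogy concrete, here is a minimal sketch in Python rather than Forth. It is only a toy model: the names tick and execute and the dictionary layout are made up for illustration and are not OpenBIOS code.

```python
# Toy model (not OpenBIOS code): an xt is a handle that can be executed later.

def find_device(stack):
    # Stand-in body for a word; it only records that it ran.
    stack.append("find-device ran")

# Hypothetical dictionary: word name -> xt (here simply a Python callable).
dictionary = {"find-device": find_device}

def tick(name):
    """Models ' and [']: look the word up and return its xt."""
    return dictionary[name]

def execute(stack):
    """Models execute: pop an xt off the stack and run it."""
    xt = stack.pop()
    xt(stack)

stack = []
stack.append(tick("find-device"))   # the xt is now an ordinary stack item
execute(stack)                      # ... and can be run whenever we like
print(stack)
```

The point is only that the xt travels around as plain data until something executes it.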
['] is executed as an immediate word, which means it will not start
looking for "find-device" when some-new-word is executed, but rather
when it is "compiled into the dictionary" (aka when it is defined).
So at the time some-new-word is executed, all it really does is put a
cell sized integer on the stack. Just as if you had typed -22.
The primitive word to achieve this is (lit). When a (lit) is executed,
it reads the cell after the execution token of (lit) and pushes it onto
the stack. It no longer has any knowledge of what that number once was.
So when a number is compiled into the dictionary, it looks like this:
| xt-of-(lit) | number | next-word's-xt | ...
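The following Python sketch models that layout under simplifying assumptions: cells are list entries, the xt value 1001 is invented, and compile_body and run are hypothetical helpers, not the real engine. It shows that ['] is resolved while compiling and that a literal number and an xt end up looking identical in the compiled body.

```python
LIT = "(lit)"   # stand-in for the xt of (lit)

def compile_body(tokens, dictionary):
    """Compile source tokens into a flat list of cells."""
    cells = []
    it = iter(tokens)
    for tok in it:
        if tok == "[']":
            # Immediate word: the lookup happens now, at compile time.
            cells += [LIT, dictionary[next(it)]]
        elif tok in dictionary:
            cells.append(dictionary[tok])
        else:
            cells += [LIT, int(tok)]      # a plain number literal
    return cells

def run(cells):
    """Minimal inner interpreter that only understands (lit)."""
    stack, ip = [], 0
    while ip < len(cells):
        xt = cells[ip]
        ip += 1
        if xt == LIT:
            stack.append(cells[ip])       # read the cell after (lit)
            ip += 1                       # and step over it
        else:
            raise NotImplementedError(xt)
    return stack

dictionary = {"find-device": 1001}        # made-up xt value
body = compile_body(["-22", "[']", "find-device"], dictionary)
print(body)        # ['(lit)', -22, '(lit)', 1001]
print(run(body))   # [-22, 1001]
```

Nothing in the compiled body says which operand came from -22 and which from ['], which is why a decompiler can only print (lit).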
Formerly, when a string was put on the stack with " it looked like this:
| xt-of-(lit) | pointer-to-string | xt-of-(lit) | length-of-string |
xt-of-dobranch | offset-behind-string | cell-aligned-string | ...
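As a rough illustration of that old layout, here is a Python sketch. It assumes one character per cell, list indices as addresses, and a branch offset counted from the cell after the offset; the real cell size, alignment and offset convention in OpenBIOS may differ, and compile_old_string is a hypothetical helper.

```python
def compile_old_string(text, base=0):
    """Lay out a string literal the old way, starting at cell index base."""
    header = 6                            # (lit) ptr (lit) len dobranch offset
    str_addr = base + header              # the string sits right behind the header
    cells = ["(lit)", str_addr, "(lit)", len(text), "dobranch", len(text)]
    cells += list(text)                   # one character per cell in this model
    return cells

def run(cells):
    """Inner interpreter that understands only (lit) and dobranch."""
    stack, ip = [], 0
    while ip < len(cells):
        xt = cells[ip]
        ip += 1
        if xt == "(lit)":
            stack.append(cells[ip])
            ip += 1
        elif xt == "dobranch":
            ip += 1 + cells[ip]           # skip the offset cell and the string
        else:
            raise NotImplementedError(xt)
    return stack

cells = compile_old_string("..")
stack = run(cells)
print(cells)   # ['(lit)', 6, '(lit)', 2, 'dobranch', 2, '.', '.']
print(stack)   # [6, 2]  -> address and length of the string
```

Executing the sequence leaves the address and length on the stack and jumps over the inline characters, but a decompiler walking the cells sees just two (lit) and a branch.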
So it's not easy to recognize from two (lit) and a dobranch that the
above is a string. That is why at some point we started hiding that
magic behind another word called ("), which basically puts two numbers
on the stack but is only used for string handling, so we can recognize
strings in see. For some odd reason s" was using (") but " was not.
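A toy decompiler shows why a dedicated word helps. This Python sketch assumes a made-up layout for ("), namely the word followed by a length cell and the inline characters, and made-up xt values; the actual OpenBIOS layout is not described in this mail and may well differ.

```python
def see(cells, names):
    """Toy decompiler: turn a list of cells back into source-like text."""
    out, ip = [], 0
    while ip < len(cells):
        xt = cells[ip]
        ip += 1
        if xt == '(")':
            # Dedicated string word: the operands can be interpreted safely.
            length = cells[ip]
            text = "".join(cells[ip + 1:ip + 1 + length])
            out.append('" ' + text + '"')
            ip += 1 + length
        elif xt == "(lit)":
            # Operand could be a number, an xt or an address: cannot tell.
            out.append("(lit)")
            ip += 1
        else:
            out.append(names.get(xt, "?"))
    return " ".join(out)

# Made-up xt values for a few words.
names = {10: "2dup", 11: "strcmp", 12: "0=", 13: "throw"}

new_style = [10, '(")', 2, ".", ".", 11, 12]
print(see(new_style, names))            # 2dup " .." strcmp 0=
print(see(["(lit)", -22, 13], names))   # (lit) throw
```

The same idea would apply to other literal kinds: give each one its own runtime word and see can print it back in its source form.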
We can do this kind of thing for other words, too, in order to improve
the reversibility of Forth words. Suggestions and patches are most welcome!
Stefan
-- 
coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br.
      Tel.: +49 761 7668825 • Fax: +49 761 7664613
Email: info@coresystems.de  • http://www.coresystems.de/
Registergericht: Amtsgericht Freiburg • HRB 7656
Geschäftsführer: Stefan Reinauer • Ust-IdNr.: DE245674866