On Aug 27, 2012, at 4:44 PM, openbios-request@openbios.org wrote:
Message: 4 Date: Mon, 27 Aug 2012 22:35:40 +0200 From: Segher Boessenkool segher@kernel.crashing.org To: The OpenBIOS Mailinglist openbios@openbios.org Subject: Re: [OpenBIOS] [PATCH] Adds local variable support to OpenBIOS. Message-ID: F74BFA60-0824-49A9-B592-D6862E0ACFB7@kernel.crashing.org Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Let me comment on the code, independently of if I think using local var names is a good idea at all.
- DEPTH 0= IF
CR ." Please specify an array size." CR
EXIT
- THEN
You shouldn't blindly continue when you detect an error; use ABORT instead, or ABORT" since you want to print something.
Thanks for catching this. It has been corrected.
Don't check stack depth, it is useless. It doesn't work when you forgot to pass a parameter but there is other stuff on the stack already (the common case). Your check also doesn't work if there is a negative number of parameters on the stack (the next most common case...)
Just kill this part.
I don't think ignoring all possible error situations is better than handing a few of them.
- CREATE CELL * ALLOT \ creates and initializes the instance
CELL * is CELLS (which is a standard Forth word, CELL is not).
Thanks for letting me know about CELL. I have changed them all to CELLS
- DEPTH 2 < IF
CR ." Please specify an index number." CR
drop \ removes the address of the array instance
-1 throw \ stop normal execution after error
- THEN
-1 THROW is ABORT Same thing here, just kill this code.
The error handling makes the word more user friendly. Having it fail and leaving the user clueless why doesn't sound right.
+; immediate
Defining words should not be immediate in most cases, including here.
+\ Declare the local-base-address VALUE +0 VALUE local-base-address
+\ returns the base address used for the local words +: getBaseAddress ( - addr )
- local-base-address
+;
: really-get-base-address getBaseAddress ; : really-really-get-base-address really-get-base-address ;
And don't use CamelCase.
Is this some official naming convention, or your own taste?
+\ Sets the first local variable's value +: Local0! ( x - )
- 0 CELL * getBaseAddress + !
+;
You can do something smarter than this...
+\ Sets the twelfth local variable's value +: Local11! ( x - )
- 11 CELL * getBaseAddress + !
+;
...why stop at 11?
As far as I know, Apple's code only uses two local variables at most. I figured 12 local variables per word should be enough.
+: calculateNeededMemory ( "char" - n )
- 0 TO variableCount
in @ \ keep track of where the pointer was- begin
parse-word
0= if \ if there is no more text to parse
drop
true
else
dup " ;" comp
0= if \ if the semicolon is encountered
drop \ drop the duplicated address
false
else
" }" comp
0= if \ if '}' character is encountered
true \ end loop because '}' marks end of local variables
else
variableCount 1 + TO variableCount
false
then
then
then
- until
in ! \ reset the pointer- variableCount CELL *
+;
Factor this? You shouldn't ever use a variable that is used as a temp inside a word.
I'm not sure why you suggest this. It is just such a pain having to deal with the stack.
+: allocateMemory ( n - addr )
- alloc-mem dup 0= if
drop
cr cr 10 spaces abort" Failed to allocate memory for local
variables!" cr cr
- then
+;
alloc-mem is way too slow to be used on every function call. Statically allocate a locals stack, or put it on an existing stack, or something like that.
I agree. That is why it is only used at compile time.
+\ Declares the size of the local variable table +48 CONSTANT localTableSize
+\ Declare the local variable table +localTableSize ARRAY localVariableTable
You have 48 but can only access 12?
That is four fields per local variable: 48 / 4 = 12.
+: getLocalRecordCount
- arrayCount 4 /
+;
Oh. Use a struct?
The little documentation on struct I found did not indicate it was a better replacement for Array.
I'll stop here. There are two good ways to implement locals (or anything else, really): 1) Make it _simple_: you can do locals in about 20 or 30 lines of code. 60 is fine, too. 2) Make it efficient: don't allocate from a heap, don't run much code on the hot paths. This will be much smaller code as well, but perhaps trickier to understand.
What is a hot path?
Segher
[...]
I'm not Segher, but I'll comment:
Don't check stack depth, it is useless. It doesn't work when you forgot to pass a parameter but there is other stuff on the stack already (the common case). Your check also doesn't work if there is a negative number of parameters on the stack (the next most common case...)
Just kill this part.
I don't think ignoring all possible error situations is better than handing a few of them.
Not really. If you can only catch occasional errors, you produce more confusion than illumination. In general, in any given forth code, you are going to get called with other stuff on the stack so your depth is meaningless. You almost never are at a stack depth of zero.
+DEPTH 2 < IF +CR ." Please specify an index number." CR +drop\ removes the address of the array instance +-1 throw\ stop normal execution after error +THEN
-1 THROW is ABORT Same thing here, just kill this code.
The error handling makes the word more user friendly. Having it fail and leaving the user clueless why doesn't sound right.
Not really. If sometimes you get an error message and sometimes you don't, and the difference is something entirely unrelated (like, you are inside a loop), it produces more confusion than elucidation. In general, forth code, because it's always called deep on the stack, doesn't assume any significant meaning from depth on the stack. Other than "underflow" when you've already screwed the pooch.
And don't use CamelCase.
Is this some official naming convention, or your own taste?
In general (at least in the 1275 world), Forth is case-insensitive. The rare cases where it is made case sensitive (there are a couple in Sun's Openboot - see dropins.src) make use of mixed case very painful. The general standard is that methods and variables are lower case, defined constants and structure offsets are upper case. In some code, even they are lower case.
Mixed-case is generally confined to comments.
Factor this? You shouldn't ever use a variable that is used as a temp inside a word.
I'm not sure why you suggest this. It is just such a pain having to deal with the stack.
Because using a global variable gets you in trouble in recursion or when the same code is called at alarm level. As for stack being a pain, yes, that's forth. Stack manipulation as a way of life. Personally, I prefer "2over nip" rather than "2 pick" :-)
+48 CONSTANT localTableSize
+\ Declare the local variable table +localTableSize ARRAY localVariableTable
You have 48 but can only access 12?
That is four fields per local variable: 48 / 4 = 12.
At a minimum, any derived constants should be derived explicitly. Like:
12 constant #local-variables \ Feels right. If you need more, add more code after Locals11. 4 constant /local-variable \ Datastructures require four cells #local-variables /local-variable * constant locals-table-size \ Size in cells, not bytes.
Ah. That points out another convention. When mashing together multiple words for a method or variable name, we tend to use dashes as separators (and please, don't use underscores. There is someone I know who *mixes* underscores and dashes in variable names!). Another is that sizes of things are usually /object (such as /n, /l, ...) and counts of objects are usually #object.
+: getLocalRecordCount +arrayCount 4 / +;
Oh. Use a struct?
The little documentation on struct I found did not indicate it was a better replacement for Array.
Struct will allow you define what the four separate words used in each local variable entry are used for. If you use structs, I'd expect to see something like:
struct /n field >LOCAL-INIT /n field >LOCAL-ORDER /n field >LOCAL-LEN /n field >LOCAL-ADDR constant /local-variable 12 constant #local-variables /local-variable #local-variables constant locals-table-size
Note that the above provides the table size in bytes, rather than cells.
To use the above (assuming you are still using alloc-mem), you'd see something like:
locals-table-size alloc-mem ( table ) #local-variables 0 do ( table ) i /local-variable * ( table entry ) 0 over >LOCAL-INIT ! ( table entry ) 0 over >LOCAL-ORDER ! ( table entry ) 0 over >LOCAL-LEN ! ( table entry ) 0 swap >LOCAL-ADDR ! ( table ) loop ( table )
What is a hot path?
Anything that gets called in the run path of code. If it gets called only once at initialization, it's not in the hot path. If it gets called on every method invocation, it's a hot path.
And don't use CamelCase.
Is this some official naming convention, or your own taste?
In general (at least in the 1275 world), Forth is case-insensitive.
Almost all Forth systems, and all the modern ones, yes.
The rare cases where it is made case sensitive (there are a couple in Sun's Openboot - see dropins.src) make use of mixed case very painful. The general standard is that methods and variables are lower case, defined constants and structure offsets are upper case. In some code, even they are lower case.
Mixed-case is generally confined to comments.
Some people write e.g. "Constant". There is nothing wrong with that. But you should write "a-long-name" and not "ALongName" or "alongname". It's much more readable.
FWIW, I write CONSTANT and REPEAT etc. so that the defining words and structure words really stand out when you read the code, making the structure more obvious. Not everyone likes that, and that is of course fine :-)
Factor this? You shouldn't ever use a variable that is used as a temp inside a word.
I'm not sure why you suggest this. It is just such a pain having to deal with the stack.
Because using a global variable gets you in trouble in recursion or when the same code is called at alarm level.
It's also a red flag warning for not properly structured code. Global vars can be useful for passing state between words, but from a word to itself? It must be doing way too much.
[SNIP lots of good stuff]
Segher
+\ Sets the twelfth local variable's value +: Local11! ( x - )
- 11 CELL * getBaseAddress + !
+;
...why stop at 11?
As far as I know, Apple's code only uses two local variables at most. I figured 12 local variables per word should be enough.
Apple's code uses a maximum of eight locals (and they actually use (close to) that -- really badly factored code).
Neither 8 nor 12 is anywhere near close enough to allow compiled C code to run. Any implementation that limits the number of allowed locals is only good for letting people who are not yet comfortable with programming Forth code get something done. Whether it is better to let them do that, or force them to learn to use Forth properly first, is not something I want to get into here ;-)
[SNIP a word of 30 lines, indented to hell and back, impossible to read]
Factor this? You shouldn't ever use a variable that is used as a temp inside a word.
I'm not sure why you suggest this. It is just such a pain having to deal with the stack.
That's why you should factor it! A normal word should deal with maybe two or three data items only, and do only one thing. The main problem is coming up with good names for your words (thankfully you do not have to do this for data items, because you do not use locals); but then again, if it is hard to think of a good name for a word, it probably does more than one thing. So factor some more :-)
alloc-mem is way too slow to be used on every function call. Statically allocate a locals stack, or put it on an existing stack, or something like that.
I agree. That is why it is only used at compile time.
That wasn't clear to me. Maybe you should put in some more comments? A good thing to comment are your data structures and the global data, often the words using that are clear then as well.
Segher