* Segher Boessenkool <segher@chello.nl> [020623 18:15]:
>> - use C-style comments so as not to break non-gcc compilers (compile with -ansi -pedantic)
> Try compiling it on a non-gcc compiler. It won't work.
But it's a step closer.
>> - implement unaligned-w@, unaligned-w!, unaligned-l@, unaligned-l!
> I'd rather have these implemented in Forth; we have too many primitives already.
I don't agree on this. Unaligned accesses are slower than the aligned versions anyway, plus we would need to bloat the Forth code with endianness checks, whereas in C we can solve this in the preprocessor. I made Forth versions of these words as well, but while thinking about when you need unaligned accesses, I came to the conclusion that you probably don't want further slowdown. It's ugly that you have to break up the atomicity of the access anyway.
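A minimal C sketch of what such primitives could look like, with the byte order selected by the preprocessor. The function names and the `HOST_BIG_ENDIAN` macro are hypothetical, not paflof's actual primitives; the macro is assumed to come from a configure step (such as a generated types.h), with a fallback to the GCC/Clang `__BYTE_ORDER__` macros. Values are fetched and stored in host byte order, as with `w@` and `l@`.

```c
#include <stdint.h>

/* HOST_BIG_ENDIAN is a hypothetical macro a configure step would define.
   Fallback: derive it from the GCC/Clang predefined macros; on other
   compilers this assumes a little-endian host. */
#ifndef HOST_BIG_ENDIAN
#  if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
      __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define HOST_BIG_ENDIAN 1
#  else
#    define HOST_BIG_ENDIAN 0
#  endif
#endif

/* unaligned-w@ ( addr -- w ): 16-bit fetch in host byte order */
uint16_t unaligned_w_fetch(const unsigned char *p)
{
#if HOST_BIG_ENDIAN
    return (uint16_t)((p[0] << 8) | p[1]);
#else
    return (uint16_t)((p[1] << 8) | p[0]);
#endif
}

/* unaligned-w! ( w addr -- ): 16-bit store in host byte order */
void unaligned_w_store(unsigned char *p, uint16_t v)
{
#if HOST_BIG_ENDIAN
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
#else
    p[1] = (unsigned char)(v >> 8);
    p[0] = (unsigned char)v;
#endif
}

/* unaligned-l@ ( addr -- quad ): 32-bit fetch in host byte order */
uint32_t unaligned_l_fetch(const unsigned char *p)
{
#if HOST_BIG_ENDIAN
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
#else
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
#endif
}

/* unaligned-l! ( quad addr -- ): 32-bit store in host byte order */
void unaligned_l_store(unsigned char *p, uint32_t v)
{
#if HOST_BIG_ENDIAN
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
#else
    p[3] = (unsigned char)(v >> 24);
    p[2] = (unsigned char)(v >> 16);
    p[1] = (unsigned char)(v >> 8);
    p[0] = (unsigned char)v;
#endif
}
```

Each access is split into single-byte loads or stores, which is exactly the loss of atomicity mentioned above.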
>> - use conf.pl to create types.h according to compiler capabilities (cross compiling possible) (cleaner version than last patch)
> I don't like it yet...
What's wrong?
>> - move unix host code from prim.code to unix.code
> Please leave it where it is, for now. We'll move it when we compile stuff from source (as opposed to the current situation: from a precompiled dictionary).
Which reminds me that we also need support for multiple linked dictionaries when doing packages. Though maybe it would be enough to have multiple FCode lookup tables in the FCode evaluator? Is there any trivial way of doing this?
Stefan
>>> - implement unaligned-w@, unaligned-w!, unaligned-l@, unaligned-l!
>> I'd rather have these implemented in Forth; we have too many primitives already.
> I don't agree on this. Unaligned accesses are slower than the aligned versions anyway, plus we would need to bloat the Forth code with endianness checks, whereas in C we can solve this in the preprocessor. I made Forth
you actually don't need to know endianness to implement a fast unaligned-*** in pure Forth :)
: unaligned-l@ here /l move> here l@ ;
etc.
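For comparison, the same trick expressed in C (a sketch; the function names are hypothetical): copy the bytes into a naturally aligned scratch variable, then do an ordinary aligned load. No byte-order test is needed, because the bytes are only moved, never reinterpreted.

```c
#include <stdint.h>
#include <string.h>

/* Fetch: copy the bytes into an aligned scratch variable (the role HERE
   plays in the Forth one-liner), then read it with a normal aligned load. */
uint32_t unaligned_l_fetch_copy(const void *addr)
{
    uint32_t scratch;                        /* naturally aligned, like HERE */
    memcpy(&scratch, addr, sizeof scratch);  /* ~ here /l move> */
    return scratch;                          /* ~ here l@ */
}

/* Store: the same idea in the other direction. */
void unaligned_l_store_copy(void *addr, uint32_t value)
{
    memcpy(addr, &value, sizeof value);
}
```

GCC and Clang typically compile a fixed-size memcpy() like this into a single load or store on hardware that permits unaligned access, and into byte accesses elsewhere.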
> versions of these words as well, but while thinking about when you need unaligned accesses, I came to the conclusion that you probably don't want further slowdown. It's ugly that you have to break up the atomicity of the access anyway.
most hardware won't allow general unaligned accesses to be atomic anyway... i doubt we'll ever see unaligned-*** used on anything that's not just ram, so there's no problem here.
>>> - use conf.pl to create types.h according to compiler capabilities (cross compiling possible) (cleaner version than last patch)
>> I don't like it yet...
> What's wrong?
it just "feels" ugly to me.
>>> - move unix host code from prim.code to unix.code
>> Please leave it where it is, for now. We'll move it when we compile stuff from source (as opposed to the current situation: from a precompiled dictionary).
> Which reminds me that we also need support for multiple linked dictionaries when doing packages. Though maybe it would be enough to have multiple FCode lookup tables in the FCode evaluator? Is there any trivial way of doing this?
for the dictionaries, a little playing around with HERE and LAST and LATEST will do. fcode lookup tables are only valid for the package that defines them, and only during package loading, so i see no problem there?
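One way to read this, as a hypothetical C sketch: if each dictionary is a chain of word headers searched from its most recent entry (LATEST) and grown at HERE, then switching dictionaries amounts to saving and restoring those two pointers. The structures and function names are illustrative assumptions, not paflof's actual layout.

```c
#include <stddef.h>
#include <string.h>

struct header {                   /* hypothetical word header */
    const struct header *link;    /* previous word in the same dictionary */
    const char *name;
};

struct dict_state {               /* what a dictionary switch saves/restores */
    unsigned char *here;          /* next free byte (HERE) */
    const struct header *latest;  /* most recent header (LATEST) */
};

static struct dict_state current; /* the dictionary HERE/LATEST refer to now */

/* Make `next` the active dictionary and return the previously active one. */
struct dict_state dict_switch(struct dict_state next)
{
    struct dict_state prev = current;
    current = next;
    return prev;
}

/* FIND-like lookup that only walks the active dictionary's chain. */
const struct header *dict_find(const char *name)
{
    for (const struct header *h = current.latest; h != NULL; h = h->link)
        if (strcmp(h->name, name) == 0)
            return h;
    return NULL;
}
```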
Segher
- To unsubscribe: send mail to majordomo@freiburg.linux.de with 'unsubscribe openbios' in the body of the message http://www.freiburg.linux.de/OpenBIOS/ - free your system..
* Segher Boessenkool <segher@chello.nl> [020626 07:58]:
> you actually don't need to know endianness to implement a fast unaligned-*** in pure Forth :)
> : unaligned-l@ here /l move> here l@ ;
Ah... nice idea. Which makes me think it might be worth writing accelerated versions of move that copy only the unaligned start and end of a block byte by byte and use cell-size copying for the rest. I don't know whether it's worth the overhead for the typical amount of data copied with move in an OF implementation.
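A sketch of such an accelerated move in C, assuming a pointer-sized cell. When source and destination share the same misalignment, it copies bytes until the destination is cell-aligned, then whole cells, then the remaining tail bytes; otherwise it falls back to bytes. `forth_move` is a hypothetical name; like Forth's MOVE ( addr1 addr2 u -- ) it takes the source first and handles overlapping regions (the case where the destination overlaps above the source is simply copied backwards byte by byte here).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t cell;   /* assumption: a cell is pointer-sized */

/* Forward copy; safe unless dst lies inside (src, src + n). */
static void copy_forward(unsigned char *d, const unsigned char *s, size_t n)
{
    /* Cell copies only pay off if both pointers become aligned together,
       i.e. they have the same misalignment (sizeof(cell) is a power of 2). */
    if ((((uintptr_t)d ^ (uintptr_t)s) & (sizeof(cell) - 1)) == 0) {
        while (n > 0 && ((uintptr_t)d & (sizeof(cell) - 1)) != 0) {
            *d++ = *s++;                    /* unaligned head, byte by byte */
            n--;
        }
        while (n >= sizeof(cell)) {
            memcpy(d, s, sizeof(cell));     /* aligned body, one cell at a time */
            d += sizeof(cell);
            s += sizeof(cell);
            n -= sizeof(cell);
        }
    }
    while (n > 0) {                         /* tail, or everything if misaligned */
        *d++ = *s++;
        n--;
    }
}

/* MOVE ( addr1 addr2 u -- ): copy u bytes from src to dst; overlap allowed. */
void forth_move(const void *src, void *dst, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Unsigned distance is >= n whenever dst is below src or past its end. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        copy_forward(d, s, n);
    } else {
        while (n > 0) {                     /* dst overlaps above src: go backwards */
            n--;
            d[n] = s[n];
        }
    }
}
```

Whether the head/tail bookkeeping pays off for the short copies typical in an OF implementation is exactly the open question above.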
> most hardware won't allow general unaligned accesses to be atomic anyway... i doubt we'll ever see unaligned-*** used on anything that's not just ram, so there's no problem here.
Graphics hardware might need it, but that can be considered RAM.
>> What's wrong?
> it just "feels" ugly to me.
What about using it until someone comes up with a nicer solution? We're modular enough
> for the dictionaries, a little playing around with HERE and LAST and LATEST will do. fcode lookup tables are only valid for the package that defines them, and only during package loading, so i see no problem there?
Is it legal for an FCode program to overload words defined within the lower space (i.e. 0x000-0x5ff)? If not, we can keep this list static and split the array into two parts: one for the static data, and one for the dynamic table that can be changed by user packages (i.e. containing FCode numbers 0x800 and above). This would really speed up initialization when creating a new package, or when executing an FCode program using byte-load in general.
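A sketch of the proposed split, assuming the answer to the question is "no" (programs never redefine FCodes below 0x800; this sketch also treats the vendor range 0x600-0x7ff as static, which is an assumption). The low part is filled once and shared, and only the program-defined range 0x800-0xfff is reset for each byte-load. All names, and representing an execution token as a plain `uintptr_t` with 0 meaning "undefined", are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t xt_t;            /* hypothetical execution token; 0 = undefined */

#define FCODE_DYNAMIC_BASE 0x800u  /* first program-defined FCode number */
#define FCODE_LIMIT        0x1000u /* FCode numbers are 12 bits wide */

/* Filled once at startup and shared by every byte-load
   (assumption: programs never redefine FCodes below 0x800). */
static xt_t static_table[FCODE_DYNAMIC_BASE];

/* Per byte-load state: only the program-defined range lives here. */
struct fcode_session {
    xt_t dynamic_table[FCODE_LIMIT - FCODE_DYNAMIC_BASE];
};

void fcode_set_static(unsigned fcode, xt_t xt)
{
    if (fcode < FCODE_DYNAMIC_BASE)
        static_table[fcode] = xt;
}

/* Starting a new byte-load clears 2048 entries instead of rebuilding 4096. */
void fcode_session_init(struct fcode_session *s)
{
    for (size_t i = 0; i < FCODE_LIMIT - FCODE_DYNAMIC_BASE; i++)
        s->dynamic_table[i] = 0;
}

/* Define a program-defined FCode; the static range is read-only. */
int fcode_define(struct fcode_session *s, unsigned fcode, xt_t xt)
{
    if (fcode < FCODE_DYNAMIC_BASE || fcode >= FCODE_LIMIT)
        return -1;
    s->dynamic_table[fcode - FCODE_DYNAMIC_BASE] = xt;
    return 0;
}

xt_t fcode_lookup(const struct fcode_session *s, unsigned fcode)
{
    if (fcode < FCODE_DYNAMIC_BASE)
        return static_table[fcode];
    if (fcode < FCODE_LIMIT)
        return s->dynamic_table[fcode - FCODE_DYNAMIC_BASE];
    return 0;
}
```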
Stefan
Stefan Reinauer wrote:
> * Segher Boessenkool <segher@chello.nl> [020626 07:58]:
>> you actually don't need to know endianness to implement a fast unaligned-*** in pure Forth :)
>> : unaligned-l@ here /l move> here l@ ;
> Ah... nice idea. Which makes me think it might be worth writing accelerated versions of move that copy only the unaligned start and end of a block byte by byte and use cell-size copying for the rest. I don't know whether it's worth the overhead for the typical amount of data copied with move in an OF implementation.
Most probably it's worth the effort. Not too much effort, either:
MOVE  --> memmove()
MOVE> --> memcpy()
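Sketched for MOVE only: a hypothetical C primitive that pops Forth's MOVE arguments ( addr1 addr2 u -- ) from a toy data stack and hands them to memmove(), which, like MOVE, copes with overlapping regions. The stack and the names are illustrative, not paflof's, and the sketch makes no claim about MOVE>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t cell;          /* hypothetical: one data-stack cell */

static cell data_stack[64];     /* toy data stack, no overflow checks */
static int depth = 0;

void forth_push(cell x) { data_stack[depth++] = x; }
cell forth_pop(void)    { return data_stack[--depth]; }

/* MOVE ( addr1 addr2 u -- ): copy u bytes from addr1 to addr2.
   memmove() handles overlapping regions, which MOVE must also do. */
void prim_move(void)
{
    size_t u = (size_t)forth_pop();
    void *dst = (void *)forth_pop();
    const void *src = (const void *)forth_pop();
    memmove(dst, src, u);
}
```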
>> most hardware won't allow general unaligned accesses to be atomic anyway... i doubt we'll ever see unaligned-*** used on anything that's not just ram, so there's no problem here.
> Graphics hardware might need it, but that can be considered RAM.
Graphics drivers need to be (partly) written in C or asm anyway (esp. the "blitter" parts), so as not to make the system feel sluggish.
> Is it legal for an FCode program to overload words defined within the lower space (i.e. 0x000-0x5ff)? If not, we can keep this list static and split the array into two parts: one for the static data, and one for the dynamic table that can be changed by user packages (i.e. containing FCode numbers 0x800 and above). This would really speed up initialization when creating a new package, or when executing an FCode program using byte-load in general.
The FCode table is just (number of bits per cell) x 0.5kB big, and is only needed during package load, so I don't see the problem here?
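The arithmetic behind that figure, assuming 12-bit FCode numbers (0x000-0xfff, i.e. 4096 entries) and one cell per table entry, with b the number of bits per cell:

```latex
4096 \times \frac{b}{8}\ \text{bytes} = 512\,b\ \text{bytes} = b \times 0.5\ \text{kB}
```

That is 16 kB with 32-bit cells and 32 kB with 64-bit cells.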
Segher
* Segher Boessenkool <segher@chello.nl> [020628 15:39]:
>> block byte by byte and use cell-size copying for the rest. I don't know whether it's worth the overhead for the typical amount of data copied with move in an OF implementation.
> Most probably it's worth the effort. Not too much effort, either:
> MOVE  --> memmove()
> MOVE> --> memcpy()
which just moves the implementation down a layer and speeds things up for host execution. If we want to get this thing into flash, we have to do it ourselves anyway, no matter whether we write it in Forth or use a C/asm-optimized memmove/memcpy.
> Graphics drivers need to be (partly) written in C or asm anyway (esp. the "blitter" parts), so as not to make the system feel sluggish.
Which was the reason why I proposed C-written unaligned words.
>> Is it legal for an FCode program to overload words defined within the lower space (i.e. 0x000-0x5ff)? If not, we can keep this list static and split the array into two parts: one for the static data, and one for the dynamic table that can be changed by user packages (i.e. containing FCode numbers 0x800 and above). This would really speed up initialization when creating a new package, or when executing an FCode program using byte-load in general.
> The FCode table is just (number of bits per cell) x 0.5kB big, and is only needed during package load, so I don't see the problem here?
The "problem" is that it takes paflof about 1.2 secs per package on a 667MHz 21264 to initialize this table. Not really fatal, but I still want to see that compressed dictionary dumps are smaller than FCode drivers before I consider them the universal solution.
Stefan
>> Most probably it's worth the effort. Not too much effort, either:
>> MOVE  --> memmove()
>> MOVE> --> memcpy()
> which just moves the implementation down a layer and speeds things up for host execution. If we want to get this thing into flash, we have to do it ourselves anyway, no matter whether we write it in Forth or use a C/asm-optimized memmove/memcpy.
Ah, c'mon. You can link against some libraries and get memmove() et al. for free, even when doing embedded work. This problem has been solved so many times already that there are lots of good, free implementations available.
>> Graphics drivers need to be (partly) written in C or asm anyway (esp. the "blitter" parts), so as not to make the system feel sluggish.
> Which was the reason why I proposed C-written unaligned words.
Optimized blitters won't use unaligned accesses anyway (except for some boundary cases, maybe).
> The "problem" is that it takes paflof about 1.2 secs per package on a 667MHz 21264 to initialize this table. Not really fatal, but I still
Huh?
> want to see that compressed dictionary dumps are smaller than FCode drivers before I consider them the universal solution.
Just wait and see, I guess...
Segher