Stefan Reinauer stepan@suse.de writes:
- ron minnich rminnich@lanl.gov [031202 17:13]:
On 1 Dec 2003, Eric W. Biederman wrote:
Ron, one thing you did note was the changing of word accesses to byte accesses. With romcc that does not help in the case of register pressure.
I would think it would hurt since x86 lets you use those little sub-registers (puddle arithmetic), so using bigger registers reduces the number of registers available.
Yes, being able to use this from romcc would severely lower register pressure I assume. Neither romcc nor the code compiled with it takes care of this at the moment though.
I tried this at one point. The problem is that there is no instruction sequence to move to/from the byte registers from a normal 32bit register, which negates most of the benefit of the extra registers. 64bit mode on the Opteron gets byte registers right, but it no longer has more than one byte register per general purpose register.
Getting in support for mmx and sse registers was much more beneficial. 16 more instead of just 4.
A more general purpose technique is to use bit-fields. I am close to having bit-fields implemented in my backburner version of romcc. I have some really oddball ideas about bit-fields in 128 bit sse registers :) But who knows when I will get that done.
Bit-fields still share with the x86 byte registers the property of increasing the register pressure when you modify their values or read/write them (because the field needs a register of its own to be modified). But when they are just passed around they can nicely reduce the register pressure. In addition, they are under programmer control, so you know it is a trade-off between register pressure when using the value and register pressure when passing the values.
You can roll bit-fields by hand at the moment if you want though.
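For illustration, here is a minimal sketch of what rolling bit-fields by hand could look like in plain C: several small values share one 32-bit word, so only one register is needed while the word is merely passed around. The field layout and every name below (pack_state, get_field, set_field, the DEV/FN/RETRY macros) are hypothetical, made up for this example and not taken from romcc or LinuxBIOS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: three small values sharing one 32-bit word.
 *   bits  0..7   device   (8 bits)
 *   bits  8..10  function (3 bits)
 *   bits 11..15  retries  (5 bits)
 */
#define DEV_SHIFT   0
#define DEV_MASK    0xffu
#define FN_SHIFT    8
#define FN_MASK     0x7u
#define RETRY_SHIFT 11
#define RETRY_MASK  0x1fu

/* Pack the three values into one word; each value must fit its field. */
static uint32_t pack_state(uint32_t dev, uint32_t fn, uint32_t retries)
{
    assert(dev <= DEV_MASK && fn <= FN_MASK && retries <= RETRY_MASK);
    return (dev << DEV_SHIFT) | (fn << FN_SHIFT) | (retries << RETRY_SHIFT);
}

/* Extract one field. The result needs a register of its own. */
static uint32_t get_field(uint32_t word, unsigned shift, uint32_t mask)
{
    return (word >> shift) & mask;
}

/* Return a copy of word with one field replaced by value (masked to fit). */
static uint32_t set_field(uint32_t word, unsigned shift, uint32_t mask,
                          uint32_t value)
{
    return (word & ~(mask << shift)) | ((value & mask) << shift);
}
```

Passing the packed word between functions costs one register instead of three; the extra register only shows up at the point where get_field or set_field is actually used, which is the trade-off described above.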
What I find most disturbing is that, last I looked, size crt0.o lists it at about 33K (after lowering spurious debugging messages from debug to spew), and linuxbios_payload.nrvb at about 24K. crt0.o from the p4dpr is at about 10K. So romcc is giving me a 3X code bloat... I am pretty certain it is code bloat caused by inlining everything.
Ron, you complained earlier about compile speed, and I think romcc is the big culprit there. Its register allocator is currently using an O(N^2) data structure, so the more code it compiles the slower it gets... I think I saw another version of basically the same algorithm that uses a different data structure, which would make it much faster.
Right now the speed is tolerable when I remember to set #define DEBUG_CONSISTENCY 1 instead of 2, which I committed accidentally the other day. DEBUG_CONSISTENCY 2 is only really useful when debugging the register allocator. With a perfect compiler DEBUG_CONSISTENCY would not be needed at all, but romcc is still teething, so as long as there is no performance hit it is useful.
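As a rough sketch of how a leveled consistency macro of this kind can trade checking for speed, the example below gates a cheap check at level 1 and a full walk at level 2. This is an assumption about the general pattern, not a copy of romcc's code: struct node and check_list are hypothetical names invented for the example.

```c
#include <stddef.h>

#ifndef DEBUG_CONSISTENCY
#define DEBUG_CONSISTENCY 1   /* 0 = off, 1 = cheap checks, 2 = exhaustive */
#endif

/* Hypothetical doubly linked list node, standing in for compiler data. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Return 1 if the list looks consistent at the current level, else 0. */
static int check_list(const struct node *head)
{
    const struct node *n;
    (void)n;     /* n is only used at level 2 */
    (void)head;  /* head is unused at level 0 */
#if DEBUG_CONSISTENCY >= 1
    /* Cheap O(1) check: the head must have no predecessor. */
    if (head != NULL && head->prev != NULL)
        return 0;
#endif
#if DEBUG_CONSISTENCY >= 2
    /* Expensive O(N) check: every next/prev pair must agree. */
    for (n = head; n != NULL && n->next != NULL; n = n->next) {
        if (n->next->prev != n)
            return 0;
    }
#endif
    return 1;
}
```

If a walk like the level 2 one runs after every transformation, total checking time grows much faster than the code being compiled, which fits the observation that level 2 is only worth enabling while debugging.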
Eric