These discussions on romcc running out of registers, optimizations, inline and all are surprising. Isn't the point of romcc to just get the system ram up right at the beginning? Then once that is done any compiler (gcc) can be used to write code. Because dram configuration is so complicated and interpreting the SPD data from the dram, people would rather code that in 'C' than in x86 asm, so that's the purpose of romcc?
-Dave
Dave Ashley linuxbios@xdr.com writes:
> These discussions on romcc running out of registers, optimizations, inline and all are surprising. Isn't the point of romcc to just get the system ram up right at the beginning?
Yes.
> Then once that is done any compiler (gcc) can be used to write code. Because dram configuration is so complicated and interpreting the SPD data from the dram, people would rather code that in 'C' than in x86 asm, so that's the purpose of romcc?
Yes.
The most common problem is that people write a piece of code that romcc cannot fit into its limited set of registers. That is fairly rare at the moment, but it does happen.
To reduce register pressure, romcc inlines all functions, so there is no need to store a return address anywhere.
Inlining everything means that the code generated by romcc is 3x larger than hand-coded assembly for a similar problem. Last I checked, when using both the SSE and MMX registers on the Opteron port, I had about 8 free registers most of the time, and my average call depth is less than 8. So it looks reasonable to actually store a return address and cut down on register pressure.
The reason register allocation is hard is that it is an NP-complete problem: a perfect algorithm would take time exponential in the size of the program. Given that an O(N^2) algorithm already takes a full minute, using the perfect algorithm is unreasonable, especially since my current heuristic works almost every time. Taking any longer would mess up a programmer's productivity. There is one known corner case where the current heuristic fails, and I would very much like to improve it.
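To make the heuristic-versus-perfect trade-off concrete, here is a minimal sketch of register allocation as greedy graph coloring. It is not romcc's actual algorithm; the interference table, NUM_VALUES, NUM_REGS and greedy_allocate() are all made up for illustration. It only shows how a cheap O(N^2) pass can assign registers without searching every possible assignment, and why such a pass can fail on inputs where a perfect algorithm would still succeed.

```c
#include <stdbool.h>

#define NUM_VALUES 5   /* hypothetical number of values needing a register */
#define NUM_REGS   3   /* hypothetical number of available registers */

/* interferes[i][j] is true when values i and j are live at the same time
 * and therefore cannot share a register (made-up example data). */
static const bool interferes[NUM_VALUES][NUM_VALUES] = {
    /*        v0     v1     v2     v3     v4   */
    /* v0 */ {false, true,  true,  false, false},
    /* v1 */ {true,  false, true,  true,  false},
    /* v2 */ {true,  true,  false, true,  false},
    /* v3 */ {false, true,  true,  false, true },
    /* v4 */ {false, false, false, true,  false},
};

/* Greedy coloring: visit the values in index order and give each one the
 * lowest-numbered register not already taken by an interfering value.
 * Returns true if every value got a register, false as soon as one does
 * not fit. A different visiting order might still succeed, which is the
 * kind of corner case a heuristic can miss. */
bool greedy_allocate(int reg_of[NUM_VALUES])
{
    for (int v = 0; v < NUM_VALUES; v++) {
        bool used[NUM_REGS] = { false };

        /* Mark the registers held by already-assigned neighbours. */
        for (int u = 0; u < v; u++) {
            if (interferes[v][u])
                used[reg_of[u]] = true;
        }

        /* Pick the first free register, if any. */
        reg_of[v] = -1;
        for (int r = 0; r < NUM_REGS; r++) {
            if (!used[r]) {
                reg_of[v] = r;
                break;
            }
        }
        if (reg_of[v] < 0)
            return false;   /* out of registers for this ordering */
    }
    return true;
}
```

With the example table above, the greedy pass fits all five values into three registers. An exact allocator would instead have to consider, in the worst case, every assignment of registers to values.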
But these are the only problems. And once I have finished exploring them romcc will be pretty much done. I would not work quite as much on them except I find solving the problems quite fun :)
Eric
* Dave Ashley linuxbios@xdr.com [031102 17:43]:
> These discussions on romcc running out of registers, optimizations, inline and all are surprising. Isn't the point of romcc to just get the system ram up right at the beginning? Then once that is done any compiler (gcc) can be used to write code. Because dram configuration is so complicated and interpreting the SPD data from the dram, people would rather code that in 'C' than in x86 asm, so that's the purpose of romcc?
Yes. In my case, the problem that makes romcc run out of registers can be circumvented by restructuring the code slightly.
Not all SPD ROMs of all modules are visible at the same time; they are switched via an SMBus hub. So before trying to access SPD ROM data with smbus_read_byte(), I have to make sure that the SMBus hub is switched to the correct ROM.
The current approach is to do this in spd_read_byte(), which is used as a wrapper function around smbus_read_byte(). It always sends a "switch" command to the SMBus hub before actually reading from the ROM. Since the function for switching parses the information from the RAM controller struct and does an smbus_write_byte(), it eats quite a few registers on every access, in addition to the ones already in use.
Since the DRAM controllers are initialized one after the other, we don't need to switch the SMBus hub every time we do an access, but only before starting DRAM initialization on a given controller. This lowers register usage noticeably and, more importantly, uses registers only at a point where we have plenty of them.
Therefore I suggest adding a function activate_spd_rom(const struct mem_controller *ctrl) that is called as the first command in sdram_set_spd_registers(). It can be implemented by the motherboard-specific code if needed, similar to memreset() now.
As far as I can see, all spd accesses done by sdram_set_spd_registers() end up on the same spd rom, making it safe to switch only one time per call to this function.
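A rough sketch of the two variants, to make the difference in hub traffic visible. Everything here is a simplified stand-in: the struct fields, SMBUS_HUB_ADDR, the mock SMBus functions, the counter and the name spd_read_byte_switching() are assumptions for illustration, not the real LinuxBIOS code. It is also written for a hosted compiler, since it uses globals that romcc-compiled code would not have.

```c
/* Hypothetical, simplified stand-in for the real mem_controller struct. */
struct mem_controller {
    unsigned hub_channel;  /* assumed: hub channel for this controller's SPD ROMs */
    unsigned spd_addr[2];  /* assumed: SMBus addresses of the SPD ROMs */
};

#define SMBUS_HUB_ADDR 0x18u      /* hypothetical SMBus address of the hub */

static unsigned hub_switch_count; /* illustration only: counts switch commands */
static unsigned current_channel;  /* illustration only: channel the hub selects */

/* Mock SMBus primitives; the real ones talk to the hardware. */
int smbus_write_byte(unsigned device, unsigned address, unsigned char value)
{
    (void)address;
    if (device == SMBUS_HUB_ADDR) {
        current_channel = value;
        hub_switch_count++;
    }
    return 0;
}

int smbus_read_byte(unsigned device, unsigned address)
{
    (void)device;
    /* Fake SPD data that depends on the currently selected hub channel. */
    return (int)((current_channel << 4) | (address & 0x0fu));
}

/* Current approach as described: switch the hub before every single read. */
int spd_read_byte_switching(const struct mem_controller *ctrl,
                            unsigned device, unsigned address)
{
    smbus_write_byte(SMBUS_HUB_ADDR, 0, (unsigned char)ctrl->hub_channel);
    return smbus_read_byte(device, address);
}

/* Proposed board-specific hook: select the SPD ROMs of one controller. */
void activate_spd_rom(const struct mem_controller *ctrl)
{
    smbus_write_byte(SMBUS_HUB_ADDR, 0, (unsigned char)ctrl->hub_channel);
}

/* With the hook in place the wrapper is a plain read. */
int spd_read_byte(unsigned device, unsigned address)
{
    return smbus_read_byte(device, address);
}

/* Skeleton only: the real function programs the DRAM controller from SPD data. */
void sdram_set_spd_registers(const struct mem_controller *ctrl)
{
    activate_spd_rom(ctrl);  /* first command: switch the hub exactly once */
    for (unsigned addr = 0; addr < 4; addr++)
        (void)spd_read_byte(ctrl->spd_addr[0], addr);
}
```

In this sketch, four reads through spd_read_byte_switching() issue four hub switch commands, while one call to sdram_set_spd_registers() issues a single one, and the struct parsing happens once at the start rather than inside every read.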
I'll change the code to reflect this and see if it works out.
Stefan