I have just implemented inline assembly and a better register allocator in romcc.
It will now use SSE and MMX registers when you give it the appropriate -mcpu= option. I am not happy with the command line options, but fixing them is easy.
In addition, the quality of the register allocation has improved to the point that the output looks quite a bit like it was written by hand. There are limits, but fewer stupid things are happening.
The inline assembly is syntactically the same as gcc's, but it is not 100% compatible with gcc. The goal is to be as compatible with gcc as gcc is with itself between ports. In practice this means that romcc supports a subset of the constraints that gcc accepts on x86.
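To make the syntax concrete, here is a minimal sketch of a gcc-style inline assembly statement with parameters. It is written for gcc on x86 and is only an illustration: the function names asm_add and asm_add_selftest are made up for this example, and which of these constraints romcc itself accepts is an assumption, not something stated above. On other architectures the sketch falls back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: add two 32-bit values with gcc-style inline assembly.
 *
 *   "=r" (sum)  output operand %0, any general register
 *   "0"  (a)    input placed in the same register as operand %0
 *   "r"  (b)    input operand %2, any general register
 *   "cc"        the instruction modifies the flags
 */
static uint32_t asm_add(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t sum;
    __asm__ ("addl %2, %0"
             : "=r" (sum)
             : "0" (a), "r" (b)
             : "cc");
    return sum;
#else
    return a + b;   /* portable fallback when not building for x86 */
#endif
}

/* Small self-check; call it from main() if you want to try the sketch. */
static void asm_add_selftest(void)
{
    assert(asm_add(2u, 3u) == 5u);
    assert(asm_add(0xffffffffu, 1u) == 0u);   /* wraps around like addl */
}
```

The register constraints leave the choice of registers to the compiler, which is the point where an inline assembly statement and the register allocator have to cooperate.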
On inline assembly versus builtin operations: a compiler has a lot more potential to optimize code in a builtin, so I will continue to implement builtins. Support-wise, inline assembly with parameters is a pain to implement initially, but there really is not a support burden after that. And there are some operations where there is no gain in making them a builtin.
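The trade-off can be sketched with gcc, used here only as a stand-in since the romcc builtins are not listed in this message. The function names below are hypothetical. The compiler knows what __builtin_ctz computes, so it is free to fold a call with a constant argument at compile time. The inline assembly version is an opaque block to the optimizer, which can only allocate registers around it.

```c
#include <assert.h>

/* Builtin: the compiler understands the operation and can fold or
 * combine it with surrounding code. Undefined for x == 0. */
static unsigned lowest_bit_builtin(unsigned x)
{
    return (unsigned)__builtin_ctz(x);
}

/* Inline assembly: the same operation on x86 via bsf. The compiler
 * only sees the operands and the clobbered flags, not the meaning.
 * The result is likewise undefined for x == 0. */
static unsigned lowest_bit_asm(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned r;
    __asm__ ("bsfl %1, %0" : "=r" (r) : "r" (x) : "cc");
    return r;
#else
    return (unsigned)__builtin_ctz(x);   /* fallback when not on x86 */
#endif
}

/* Small self-check; call it from main() if you want to try the sketch. */
static void lowest_bit_selftest(void)
{
    assert(lowest_bit_builtin(8u) == 3u);
    assert(lowest_bit_asm(8u) == 3u);
}
```

For an instruction with no inputs or outputs worth optimizing around, the two forms come out about the same, which is the case where a builtin gains nothing.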
The code is now checked into the freebios2 tree, and I can get back to exercising romcc by writing code with it.
Eric