Li-Ta Lo ollie@lanl.gov writes:
Hello,
I have successfully used the cache in the K8 processor as RAM on the AMD Serenade mainboard. The cache-as-RAM (CAR) region is used as a tiny stack for the code generated by GCC, which replaces the need for a register-only C compiler like ROMCC. Now the whole LinuxBIOS C code can be compiled by GCC.
Note that this certainly will not work for older CPUs. But there is less complexity there, so hopefully romcc is sufficient.
There are a few problems remaining. The first is that I can only use 7 cache lines (448 bytes) reliably in the K8. Access to the 8th cache line is unstable, and access to the 9th cache line hangs the processor. The other problem is that the optimize_connection() function for multi-processor configuration runs unstably under CAR. It does not overflow the stack; it's just plain unstable for some reason. So I can only configure the mainboard as uniprocessor.
Most likely it is the cross-CPU probes causing cache invalidates. You may be able to "improperly" set up caching of memory (no cross-CPU probes) while you are initializing the memory controllers.
I wonder if some part of those cache line access problems is the swapping between L1 and L2, although that sounds unlikely.
Does anyone have any idea about these problems? If we can solve these two problems, Cache As Ram can be used routinely for K8, and we can probably try to extend it to some other processors.
Ollie, in theory the cache-as-RAM idea works. But whenever I have implemented it, it has been a case of fixing it with every CPU rev, whereas romcc, while it is harder, only needs to be stabilized once. And you don't need to load a microcode update just so your code can run.
Before we do this routinely I would really like some buy-in from AMD that they would support this. But anyway...
On the fun side, it would be extremely interesting if you could get enough memory working to start paging, so we could go into 64-bit mode :) That is likely tempting fate too much...
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes. Looking at the hdama configuration, my max inline depth is 14 procedures, so that likely adds another 14 * 4 = 56 bytes in return addresses, for about 152 bytes in total. So 448 bytes would be a small improvement.
Note that generally I have noticed romcc-compiled code does not even use all of the registers...
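
For anyone who wants to redo the comparison with a different inline depth, here is a rough sketch of the arithmetic above. The function and constant names are made up for illustration. The 64-byte line size is an assumption inferred from "7 cache lines (448 bytes)", and the romcc figure is only an upper bound, since not all registers are necessarily in use.

```python
# Back-of-the-envelope comparison of usable CAR stack vs. romcc's
# register-based "stack". All names here are illustrative only.

CACHE_LINE_BYTES = 64    # assumption: 448 bytes / 7 lines
USABLE_CACHE_LINES = 7   # lines reported as reliable in the thread

GPR, MMX, SSE = 8, 8, 8  # register counts quoted in the thread
REG_BYTES = 4            # thread counts each register as 4 bytes
RETURN_ADDR_BYTES = 4    # one 32-bit return address per inline level
MAX_INLINE_DEPTH = 14    # hdama configuration, per the thread


def car_stack_bytes(lines=USABLE_CACHE_LINES, line_bytes=CACHE_LINE_BYTES):
    """Bytes of stack available from the reliably usable cache lines."""
    return lines * line_bytes


def romcc_equivalent_bytes(depth=MAX_INLINE_DEPTH):
    """Upper bound on romcc's effective storage: registers plus return addresses."""
    register_bytes = (GPR + MMX + SSE) * REG_BYTES
    return_bytes = depth * RETURN_ADDR_BYTES
    return register_bytes + return_bytes


if __name__ == "__main__":
    print(car_stack_bytes(), romcc_equivalent_bytes())
```

Passing a different depth to romcc_equivalent_bytes() shows how the estimate moves for another board configuration.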
Eric