Li-Ta Lo <ollie(a)lanl.gov> writes:
> I have successfully used the cache in the K8 processor as RAM on
> the AMD Serenade mainboard. The cache as RAM is used as a tiny
> stack space for the code generated by GCC, which replaces the need
> for a register-only C compiler like ROMCC. Now the whole LinuxBIOS
> C code can be compiled by GCC.

Note this certainly will not work for older CPUs. But there is less
complexity there, so hopefully romcc is sufficient.

> There are a few problems remaining. The first is that I can only
> use 7 cache lines (448 bytes) reliably in the K8. Access to the
> 8th cache line is unstable and access to the 9th cache line hangs
> the processor. The other problem is that the optimize_connection()
> function for multi-processor configuration runs unstably under CAR.
> It does not overflow the stack, it's just plain unstable for some
> reason. So I can only configure the mainboard

Most likely it is the cross-CPU probes causing cache invalidates.
You may be able to ``improperly'' set up caching of memory (no
cross-CPU probes) while you are initializing the memory controllers.

I wonder if some part of that cache line access problem is
the swapping between L1 and L2. Although that sounds unlikely.

> Does anyone have any idea about these problems? If we can solve
> these two problems, Cache As RAM can be used routinely for K8 and
> we can probably try to extend it to some other processors.

Ollie, in theory the cache as RAM idea works. But whenever I have
implemented it, it has been a case of fixing it with every CPU rev.
Whereas romcc, while it is harder, only needs to be stabilized once.
And you don't need to load a microcode update just so your code can run.

Before we do this routinely I would really like some buy-off from AMD
that they would support this. But anyway...

On the fun side, it would be extremely interesting if you could get
enough memory working to start paging and we could go into 64-bit mode :)
That is likely tempting fate too much.....

> What is the "effective" or "equivalent" stack size of ROMCC?
> Is 448 bytes of stack adequate for ROMCC "linted" code in general?

8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes.
Looking at the hdama configuration, my max inline depth is 14
procedures, so that likely adds another 14 * 4 = 56 bytes in
return addresses, for roughly 152 bytes in total. So 448 bytes would
be a small improvement.
Note generally I have noticed romcc compiled does not even use all of