Hello,
I have successfully used the cache in the K8 processor as RAM on the AMD Serenade mainboard. The cache as RAM is used as a tiny stack space for the code generated by GCC, which replaces the need for a register-only C compiler like ROMCC. Now the whole LinuxBIOS C code can be compiled by GCC.
There are a few problems remaining. The first is that I can only use 7 cache lines (448 bytes) reliably in the K8. Access to the 8th cache line is unstable, and access to the 9th cache line hangs the processor. The other problem is that the optimize_connection() function for multi-processor configuration runs unstably under CAR. It does not overflow the stack; it is just plain unstable for some reason. So I can only configure the mainboard as a uniprocessor.
Does anyone have any idea about these problems? If we can solve these two problems, Cache As Ram can be used routinely for K8, and we can probably try to extend it to some other processors.
YHLu, do you have any dual K8 EVB with an HDT pinout that works "perfectly" under normal (ROMCC) LinuxBIOS? The ROMCC LinuxBIOS does not work 100% on the AMD Serenade board (I still have the phantom devices problem), and the s2885 board we have does not have the pinout.
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
Ollie
It's great to hear that AMD CPUs allow software to access the cache; that is absolutely not allowed on the Intel CPU I just finished working on.
I guess you must have turned off the cache completely, so that the CPUs are not prefetching, snooping, etc., before you use it as memory, right? I'm not that knowledgeable about the AMD Opteron at this point, but this is pretty intriguing. I will try it someday. I hope AMD supports this type of use. In other words, I hope AMD will not screw up, in the background, the cache data that software has saved, and that software saving data to the cache does not affect CPU function.
What I'm really trying to ask you is another question. You mentioned the AMD Serenade mainboard. Do you have the LinuxBIOS source code for it? I couldn't find it in the SourceForge CVS. (Only AMD Solo and Quartet are available there.)
I have a Tyan board, and YH gave me some recent code which works great. But the Tyan board doesn't have an HDT header. The AMD Serenade would be a good choice for us to try some of our code and watch it using an American Arium.
Thanks
Tony
----- Original Message ----- From: "Li-Ta Lo" ollie@lanl.gov To: "LinuxBIOS" linuxbios@clustermatic.org; "Eric Biederman" ebiederman@lnxi.com; "YhLu" YhLu@tyan.com Sent: Thursday, June 24, 2004 10:27 AM Subject: Using Cache As Ram for K8
Linuxbios mailing list Linuxbios@clustermatic.org http://www.clustermatic.org/mailman/listinfo/linuxbios
On Thu, 2004-06-24 at 12:33, Tony Cheng wrote:
It's great to hear that AMD CPUs allow software to access the cache; that is absolutely not allowed on the Intel CPU I just finished working on.
I guess you must have turned off the cache completely, so that the CPUs are not prefetching, snooping, etc., before you use it as memory, right? I'm not that knowledgeable about the AMD Opteron at this point, but this is pretty intriguing. I will try it someday. I hope AMD supports this type of use. In other words, I hope AMD will not screw up, in the background, the cache data that software has saved, and that software saving data to the cache does not affect CPU function.
What I'm really trying to ask you is another question. You mentioned the AMD Serenade mainboard. Do you have the LinuxBIOS source code for it? I couldn't find it in the SourceForge CVS. (Only AMD Solo and Quartet are available there.)
The AMD Serenade board support for ROMCC LinuxBIOS is in the CVS. Please do a "cvs update -dP". The cache as ram version is not committed yet.
Ollie
Is the CVS server down?
I get a "timed out" message like the one below. I could access the CVS about 10 days ago.
$ cvs -d:pserver:anonymous@cvs.freebios.sourceforge.net:/cvsroot/freebios login
Logging in to :pserver:anonymous@cvs.freebios.sourceforge.net:2401/cvsroot/freebios
CVS password:
cvs [login aborted]: connect to cvs.freebios.sourceforge.net(66.35.250.209):2401 failed: Connection timed out
Tony
Guess it's that time of year again when Sourceforge starts to buckle.
Those of us blessed with developer access tend to have a higher rate of success when syncing to the CVS tree. Here's a snapshot from about 3 minutes ago: http://www.flagen.com/~sc/linux/lb/cvs_snapshots/freebios2-06252004.tar.bz2
Hello from Gregg C Levine
Tony, are you accessing the CVS storage via the instructions on the regular LinuxBIOS website, or via the SourceForge instructions page? Of the two, the SourceForge instructions are correct. In fact I just checked: the pages at http://sourceforge.net/cvs/?group_id=3206 are correct. The ones on the regular website are not; they are overdue for updating.
-------------------
Gregg C Levine hansolofalcon@worldnet.att.net
------------------------------------------------------------
"The Force will be with you...Always." Obi-Wan Kenobi
"Use the Force, Luke." Obi-Wan Kenobi
-----Original Message----- From: linuxbios-admin@clustermatic.org [mailto:linuxbios-admin@clustermatic.org] On Behalf Of Hendricks David W. Sent: Friday, June 25, 2004 3:16 PM To: Tony Cheng Cc: Li-Ta Lo; LinuxBIOS Subject: Re: CVS Server down?
On Fri, 25 Jun 2004, Tony Cheng wrote:
$ cvs -d:pserver:anonymous@cvs.freebios.sourceforge.net:/cvsroot/freebios login
Logging in to :pserver:anonymous@cvs.freebios.sourceforge.net:2401/cvsroot/freebios
CVS password:
Sorry, change the host to cvs.sourceforge.net.
Take the "freebios." out of the hostname.
ron
Li-Ta Lo ollie@lanl.gov writes:
Hello,
I have successfully used the cache in the K8 processor as RAM on the AMD Serenade mainboard. The cache as RAM is used as a tiny stack space for the code generated by GCC, which replaces the need for a register-only C compiler like ROMCC. Now the whole LinuxBIOS C code can be compiled by GCC.
Note this certainly will not work for older CPUs. But there is less complexity there, so hopefully romcc is sufficient.
There are a few problems remaining. The first is that I can only use 7 cache lines (448 bytes) reliably in the K8. Access to the 8th cache line is unstable, and access to the 9th cache line hangs the processor. The other problem is that the optimize_connection() function for multi-processor configuration runs unstably under CAR. It does not overflow the stack; it is just plain unstable for some reason. So I can only configure the mainboard as a uniprocessor.
Most likely it is the cross-CPU probes causing cache invalidates. You may be able to ``improperly'' set up caching of memory (no cross-CPU probes) while you are initializing the memory controllers.
I wonder if some part of those cache line access problems is the swapping between L1 and L2, although that sounds unlikely.
Does anyone have any idea about these problems? If we can solve these two problems, Cache As Ram can be used routinely for K8, and we can probably try to extend it to some other processors.
Ollie, while in theory the cache as RAM idea works, whenever I have implemented it, it has been a case of fixing it with every CPU rev. Romcc, while it is harder, only needs to be stabilized once. And you don't need to load a microcode update just so your code can run.
Before we do this routinely I would really like some buy-in from AMD that they would support this. But anyway...
On the fun side, it would be extremely interesting if you could get enough memory working to start paging so we could go into 64bit mode :) That is likely tempting fate too much.....
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes. Looking at the hdama configuration, my max inline depth is 14 procedures, so that likely totals another 14 * 4 = 56 bytes in return addresses. So 448 bytes would be a small improvement.
Note that generally I have noticed romcc-compiled code does not even use all of the registers...
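The estimate above can be written out as a quick calculation. This is only a sketch of the arithmetic: the 64-byte cache line size is inferred from the figure of 7 lines = 448 bytes, and the constant and function names are made up for illustration.

```python
# Rough comparison of romcc's "effective stack" with the CAR stack.
# All names are illustrative; the numbers are taken from the thread.

REGISTER_BYTES = 4          # each register is counted as a 4-byte slot
GPR, MMX, SSE = 8, 8, 8     # register counts quoted in the estimate
MAX_INLINE_DEPTH = 14       # deepest inline chain seen on hdama
RETURN_ADDRESS_BYTES = 4    # one 32-bit return address per level

CACHE_LINE_BYTES = 64       # assumed: 448 bytes / 7 lines
USABLE_CACHE_LINES = 7      # lines reported as reliable under CAR


def romcc_effective_stack():
    """Bytes of register storage plus the equivalent return addresses."""
    registers = (GPR + MMX + SSE) * REGISTER_BYTES
    return_addresses = MAX_INLINE_DEPTH * RETURN_ADDRESS_BYTES
    return registers + return_addresses


def car_stack():
    """Bytes of stack available from the reliable cache lines."""
    return USABLE_CACHE_LINES * CACHE_LINE_BYTES


if __name__ == "__main__":
    print("romcc equivalent:", romcc_effective_stack(), "bytes")
    print("CAR stack:       ", car_stack(), "bytes")
```

With these inputs the romcc figure comes to 96 + 56 = 152 bytes, against 448 bytes for CAR.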
Eric
Eric W. Biederman wrote:
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes. Looking at the hdama configuration, my max inline depth is 14 procedures, so that likely totals another 14 * 4 = 56 bytes in return addresses. So 448 bytes would be a small improvement.
Note that generally I have noticed romcc-compiled code does not even use all of the registers...
For comparison when using U-Boot for PPC:
"Note: on PPC, we could use a static initializer (since the address of the global data structure is known at compile time), but it turned out that reserving a register results in somewhat smaller code - although the code savings are not that big (on average for all boards 752 bytes for the whole U-Boot image, 624 text + 127 data)."
-Bari
* Eric W. Biederman ebiederman@lnxi.com [040625 00:27]:
On the fun side, it would be extremely interesting if you could get enough memory working to start paging so we could go into 64bit mode :) That is likely tempting fate too much.....
Ack! With all C code compiled by gcc this sounds like a reasonable goal. But will any payloads work with this?
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes. Looking at the hdama configuration, my max inline depth is 14 procedures, so that likely totals another 14 * 4 = 56 bytes in return addresses. So 448 bytes would be a small improvement.
With current CVS the code shrinks from about 90k object size to 10k. This is actually not bad for a small improvement. I have not tried major hand tuning with romcc's anti-inline tags yet.
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Eric W. Biederman ebiederman@lnxi.com [040625 00:27]:
On the fun side, it would be extremely interesting if you could get enough memory working to start paging so we could go into 64bit mode :) That is likely tempting fate too much.....
Ack! With all C code compiled by gcc this sounds like a reasonable goal. But will any payloads work with this?
Besides the kernel? For the payloads it does not matter. Having an ELF loader that does both 32bit and 64bit is not hard, and the code already exists in etherboot. One of the advantages of LinuxBIOS is that we don't have to run in the same processor mode as our payloads. We can switch processor word size or endianness and things should continue to work.
Even with romcc an x86-64 port would not be hard. There are just some interesting complications in using a minimal 4K page table that I have been avoiding. Thinking about it, though, the extra programmer-visible registers might just make it worth it, especially combined with non-inlining.
One of these days I must take the time to generate a 64bit ELF executable from an Opteron kernel.
Eric, what is the "effective" or "equivalent" stack size of ROMCC? Is 448 bytes of stack adequate for ROMCC "linted" code in general?
8 (gpr) + 8 (mmx) + 8 (sse) registers, each 4 bytes long = 96 bytes. Looking at the hdama configuration, my max inline depth is 14 procedures, so that likely totals another 14 * 4 = 56 bytes in return addresses. So 448 bytes would be a small improvement.
With current CVS the code shrinks from about 90k object size to 10k.
??? I don't know where the 90k comes from. But the code compiled with romcc should likely fall into the 10k realm without inlining. Currently it is about 30k-40k.
My basic objection is that I have been down the cache as RAM path once, and decided there was less maintenance in doing romcc. After writing romcc I still feel that way. Not having to worry about a BIOS breaking because of a new processor is a relief.
This is actually not bad for a small improvement. I have not tried major hand tuning with romcc's anti-inline tags yet.
In theory it should be similar in code size. In practice things aren't quite usable yet.
Eric
On Thu, 2004-06-24 at 16:27, Eric W. Biederman wrote:
There are a few problems remaining. The first is that I can only use 7 cache lines (448 bytes) reliably in the K8. Access to the 8th cache line is unstable, and access to the 9th cache line hangs the processor. The other problem is that the optimize_connection() function for multi-processor configuration runs unstably under CAR. It does not overflow the stack; it is just plain unstable for some reason. So I can only configure the mainboard as a uniprocessor.
Most likely it is the cross-CPU probes causing cache invalidates. You may be able to ``improperly'' set up caching of memory (no cross-CPU probes) while you are initializing the memory controllers.
That is exactly the point I don't understand. In theory, the AP is halted and HT routing is disabled on power-up before the first instruction is even executed. But to what extent is the processor "halted"? Does it still try to maintain cache coherence? And how about the northbridge in the processor? Obviously, the northbridge is in a working state when the processor is halted; what is the impact of the northbridge on the cache of the processor?
On the fun side, it would be extremely interesting if you could get enough memory working to start paging so we could go into 64bit mode :) That is likely tempting fate too much.....
Why does CAR have anything to do with entering 64bit mode ?
Ollie
Li-Ta Lo ollie@lanl.gov writes:
On Thu, 2004-06-24 at 16:27, Eric W. Biederman wrote:
There are a few problems remaining. The first is that I can only use 7 cache lines (448 bytes) reliably in the K8. Access to the 8th cache line is unstable, and access to the 9th cache line hangs the processor. The other problem is that the optimize_connection() function for multi-processor configuration runs unstably under CAR. It does not overflow the stack; it is just plain unstable for some reason. So I can only configure the mainboard as a uniprocessor.
Most likely it is the cross-CPU probes causing cache invalidates. You may be able to ``improperly'' set up caching of memory (no cross-CPU probes) while you are initializing the memory controllers.
That is exactly the point I don't understand. In theory, the AP is halted and HT routing is disabled on power-up before the first instruction is even executed. But to what extent is the processor "halted"? Does it still try to maintain cache coherence?
Yes. Only the instruction dispatch unit is halted. The other functionality continues to work. Intel CPUs do this to some extent as well, although I think a halted processor there is quiet.
Look at what disable_probes does in the hypertransport initialization. Those are the control bits that determine whether another CPU is asked about cache coherency.
And how about the northbridge in the processor? Obviously, the northbridge is in a working state when the processor is halted; what is the impact of the northbridge on the cache of the processor?
The northbridge should not generate any requests on its own.
On the fun side, it would be extremely interesting if you could get enough memory working to start paging so we could go into 64bit mode :) That is likely tempting fate too much.....
Why does CAR have anything to do with entering 64bit mode ?
64bit mode can only be entered with paging enabled. With 64K of L1 cache there is enough space to hold the page tables. Roughly (4K + 4K + 4K + 4K + 4K + 4K) = 24K is needed to map 4G of memory without using a hack of a page table. That is not something we want to use before some kind of memory is initialized, because it would waste ROM space much worse than romcc.
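One way to arrive at the 24K figure, as a sketch: assume long-mode paging with 2 MiB pages, so that one PML4 table, one page-directory-pointer table, and one page directory per GiB are enough. That breakdown is a reading of the six 4K terms, not something stated in the message, and the names below are illustrative.

```python
TABLE_BYTES = 4096                   # each paging structure is one 4K page
ENTRIES_PER_TABLE = 512              # 4096 bytes / 8-byte entries
LARGE_PAGE_BYTES = 2 * 1024 * 1024   # assumed 2 MiB pages (no 4K page tables)
GIB = 1024 ** 3


def page_table_bytes(mapped_bytes):
    """Bytes of paging structures needed to map `mapped_bytes` of memory."""
    bytes_per_directory = ENTRIES_PER_TABLE * LARGE_PAGE_BYTES  # 1 GiB each
    directories = -(-mapped_bytes // bytes_per_directory)       # ceiling
    pml4_tables = 1
    pdp_tables = 1   # one PDP table covers 512 GiB, plenty for 4 GiB
    return (pml4_tables + pdp_tables + directories) * TABLE_BYTES


if __name__ == "__main__":
    needed = page_table_bytes(4 * GIB)
    l1_cache = 64 * 1024   # L1 size quoted in the message
    print(needed // 1024, "K needed,", l1_cache // 1024, "K of L1")
```

Under that assumption, mapping 4G takes 1 + 1 + 4 = 6 tables, which is 24K and fits comfortably in 64K.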
.....
Now that Intel in ``Intel Pentium 4 Processors on 90 nm Process Specification Update'' R28 has let the cat out of the bag, I can vent a little. Intel is formally supporting cache as RAM in their newest CPUs, and there have been enough issues that having romcc and not having to worry about them has been a relief. This may just be a temporary case of teething pains but...
Eric
Hi Ollie,
The Serenade BIOS I got from the CVS server does not build. Do you see this also? The payload size of my binary is about 55k.
The error messages are as follows:
objcopy -O binary linuxbios_c linuxbios_payload.bin
./nrv2b e linuxbios_payload.bin linuxbios_payload.nrv2b
input/output = 67188/26767 = 2.510
cp linuxbios_payload.nrv2b linuxbios_payload
gcc -nostdlib -nostartfiles -static -o linuxbios -T ldscript.ld crt0.o
/usr/bin/ld: section .id [ffffffd8 -> ffffffef] overlaps section .payload [ffff9750 -> ffffffe2]
collect2: ld returned 1 exit status
make[1]: *** [linuxbios] Error 1
make[1]: Leaving directory `/root/dl/freebios2/targets/amd/serenade/serenade/fallback'
make: *** [fallback-rom] Error 1
Thanks
Tony
On Fri, 2004-06-25 at 17:17, Tony Cheng wrote:
Hi Ollie,
The Serenade BIOS I got from the CVS server does not build. Do you see this also? The payload size of my binary is about 55k.
The error messages are as follows:
objcopy -O binary linuxbios_c linuxbios_payload.bin
./nrv2b e linuxbios_payload.bin linuxbios_payload.nrv2b
input/output = 67188/26767 = 2.510
cp linuxbios_payload.nrv2b linuxbios_payload
gcc -nostdlib -nostartfiles -static -o linuxbios -T ldscript.ld crt0.o
/usr/bin/ld: section .id [ffffffd8 -> ffffffef] overlaps section .payload [ffff9750 -> ffffffe2]
collect2: ld returned 1 exit status
make[1]: *** [linuxbios] Error 1
make[1]: Leaving directory `/root/dl/freebios2/targets/amd/serenade/serenade/fallback'
make: *** [fallback-rom] Error 1
It is the 64kB size limit problem. Please set the
option MAXIMUM_CONSOLE_LOGLEVEL=8
option DEFAULT_CONSOLE_LOGLEVEL=8
in Config.lb to some lower value.
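To see how far over the limit the build is, the two address ranges from the ld error can be compared directly. A small sketch, assuming ld prints inclusive [start -> end] ranges; the helper name is made up for illustration.

```python
def overlap_bytes(a, b):
    """Number of bytes shared by two inclusive (start, end) address ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Ranges copied from the ld error message in the build log.
ID_SECTION = (0xFFFFFFD8, 0xFFFFFFEF)
PAYLOAD_SECTION = (0xFFFF9750, 0xFFFFFFE2)

if __name__ == "__main__":
    print("overlap:", overlap_bytes(ID_SECTION, PAYLOAD_SECTION), "bytes")
```

Under that assumption the payload runs 11 bytes into .id, so even a small reduction in payload size would clear the error.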
Ollie
Thanks
Tony