Neil has a very interesting point, and it does bring up the "cache as ram" issue again.
Neil, my only question is, did you test this MTRR approach on lots of CPUs? My impression is that it is not guaranteed to work.
thanks
ron
---------- Forwarded message ----------
Date: Wed, 9 Oct 2002 17:50:22 -0600
From: Neil Crossley etanza@lycos.co.uk
To: rminnich@lanl.gov
Subject: Making IPL coding easier?
Hi Ron,
Dunno if this will be any use to you but you never know .....
I was thinking how much easier it would be to write DRAM setup code for IPLs if it was possible to get even a small amount of memory for temp variables and a stack. This would get rid of the annoying CALL_SP/BP and RET_SP macros.
Obviously this 'memory' couldn't be DRAM because we haven't initialised it yet, and using the CMOS RTC locations would get us a few bytes of memory, but no stack. (I'm surprised you don't use the RTC memory though, as at least you could store a few temp vars if you needed to.)
Having some work memory should make the IPL code a bit more readable and thus easier to work with. You could also put more functionality there, leaving less work for the C code to do later.
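A minimal sketch of the CMOS RTC idea mentioned above, assuming the usual PC index/data port pair at 0x70/0x71. The port_ops hooks, the fake device and the scratch offset used in the usage note are hypothetical; they are only there so the sketch can run outside firmware.

```c
#include <assert.h>
#include <stdint.h>

#define CMOS_INDEX 0x70   /* index port (bit 7 doubles as NMI disable on PCs) */
#define CMOS_DATA  0x71   /* data port */

/* Hypothetical port-I/O hooks; real IPL code would use out/in directly. */
typedef struct {
    void    (*outb)(uint16_t port, uint8_t val);
    uint8_t (*inb)(uint16_t port);
} port_ops;

/* Read one CMOS byte: select the register, then read the data port. */
uint8_t cmos_read(const port_ops *io, uint8_t reg)
{
    assert(reg < 128);
    io->outb(CMOS_INDEX, reg);
    return io->inb(CMOS_DATA);
}

/* Write one CMOS byte: select the register, then write the data port. */
void cmos_write(const port_ops *io, uint8_t reg, uint8_t val)
{
    assert(reg < 128);
    io->outb(CMOS_INDEX, reg);
    io->outb(CMOS_DATA, val);
}

/* Fake 128-byte CMOS so the helpers can be exercised on a normal host. */
static uint8_t fake_cmos[128];
static uint8_t fake_index;

static void fake_outb(uint16_t port, uint8_t val)
{
    if (port == CMOS_INDEX)
        fake_index = val & 0x7f;        /* ignore the NMI bit */
    else if (port == CMOS_DATA)
        fake_cmos[fake_index] = val;
}

static uint8_t fake_inb(uint16_t port)
{
    return port == CMOS_DATA ? fake_cmos[fake_index] : 0xff;
}

static const port_ops fake_io = { fake_outb, fake_inb };
```

For example, cmos_write(&fake_io, 0x40, 0xa5) followed by cmos_read(&fake_io, 0x40) returns 0xa5. Which offsets are actually free for temporaries depends on the board, so 0x40 here is only a placeholder.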
I found 2 methods of getting some work memory. One relies on the motherboard having a SiS7018 soundchip (as found on the SiS540/630/730 series) and the other requires a CPU with MTRR support.
Using method 2 may allow you to code the DRAM initialisation code in 'C' as it will provide you with 64K of work memory, but it may take some tweaking to get C code running.
Apologies for the following code fragments using Intel syntax, but they should be easy enough to follow.
Have fun and mail me if you have any questions ...
bye for now .... Neil :)
METHOD 1 - Use the 7018 Soundchip to get 1KB of memory
------------------------------------------------------
The first way is at least relatively sane, but relies on the mainboard having the 7018 soundchip enabled (some motherboard manufacturers choose to disable this and fit another chip on the mainboard).
The 1KB of RAM starts at offset 0x800 in the soundchip's memory-mapped I/O space.
        mov     ebx,0x80000c00  ; Select 7018 Soundchip
        mov     eax,ebx         ; Set MMIO Base address
        mov     al,0x14
        mov     dx,0xcf8
        out     dx,eax
        add     dl,4
        mov     eax,0xf8100000  ; Map MMIO at this address
        out     dx,eax
        mov     eax,ebx         ; Enable device
        mov     al,0x04
        sub     dl,4
        out     dx,eax
        add     dl,4
        mov     eax,0x6         ; Just set Bus master and MMIO, not I/O
        out     dx,eax

        mov     esp,0xf8100bfc  ; Set stack to top of ram
        mov     ebp,0xf8100800  ; Set ebp to bottom

        mov     eax,0xf00dface  ; Check the memory is really
        mov     [ebp],eax       ; working ...
        mov     ebx,[ebp]
        cmp     eax,ebx
        je      .got_mem
        jmp     ERROR_1         ; issue some sort of BEEP code or something
.got_mem:
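The constant 0x80000c00 in the fragment above follows the layout of PCI configuration mechanism #1 (the dword written to port 0xcf8), which decodes it as bus 0, device 1, function 4. The sketch below shows how such a value is assembled; the helper name pci_cfg_addr is hypothetical and not part of Neil's code.

```c
#include <assert.h>
#include <stdint.h>

/* Build the dword written to port 0xcf8 (PCI configuration mechanism #1). */
uint32_t pci_cfg_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u              /* bit 31: enable configuration cycle */
         | ((uint32_t)bus << 16)    /* bits 23..16: bus number */
         | ((uint32_t)dev << 11)    /* bits 15..11: device number */
         | ((uint32_t)fn  << 8)     /* bits 10..8:  function number */
         | ((uint32_t)reg & 0xfcu); /* bits 7..2:   dword-aligned register */
}
```

With this layout, register 0x14 (the base address register the fragment programs) and register 0x04 (the command register) of that function come out as 0x80000c14 and 0x80000c04, which is what the "mov al,0x14" and "mov al,0x04" steps produce.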
As far as I know, VIA and ALI both use variations of this soundchip on their integrated boards so this may work on those chipsets too :)
METHOD 2 - Abuse the CPU Cache to get 64K of memory
---------------------------------------------------
CPUs like the Hitachi SH series allow the cache to be memory mapped and used as RAM, so I wondered if there was any way to do this on the Intel chips. I actually discovered this method by accident (don't ask - long story!!). To use it requires a CPU with MTRR support.
Note - When you switch to protected mode, make sure the NW and CD bits in CR0 are 0 (caching enabled). Doing "mov eax,1" and "mov cr0,eax" will do this nicely :)
; Enable Variable MTRR's only (not fixed) and set the default memory type to
; uncached

        xor     edx,edx
        mov     eax,0x800
        mov     ecx,0x2ff
        wrmsr
; Allocate ONE MTRR to map 64KB at location 0xc0000000

        mov     ecx,0x200       ; Base
        mov     edx,0
        mov     eax,0xc0000006
        wrmsr
        mov     eax,0xffff0800  ; Mask (64K)
        mov     edx,0x0000000f
        mov     ecx,0x201
        wrmsr
        cld                     ; Wipe our 'memory' forcing all the 64K
        xor     eax,eax         ; to be cached
        mov     ecx,65536/4
        mov     edi,0xc0000000  ; Start of the 64K window (ebp is not set yet)
        rep stosd
        mov     esp,0xc000fffc  ; Set stack to top of ram
        mov     ebp,0xc0000000  ; Set ebp to bottom

        mov     eax,0xf00dface  ; Check the memory is really
        mov     [ebp],eax       ; working ...
        mov     ebx,[ebp]
        cmp     eax,ebx
        je      .got_mem
        jmp     ERROR_1         ; issue some sort of BEEP code or something
.got_mem:
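The base and mask values written to MSRs 0x200 and 0x201 above can be derived rather than hard-coded. The sketch below assumes a 36-bit physical address width (which is what the 0x0000000f high dword of the mask implies) and a power-of-two region size; the msr_val type and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } msr_val;   /* EAX and EDX for wrmsr */

#define MTRR_TYPE_WB    0x06u    /* write-back memory type */
#define MTRR_MASK_VALID 0x800u   /* bit 11 of the mask MSR: entry valid */

/* Value for a variable-range base MSR (e.g. 0x200): base address | type. */
msr_val mtrr_phys_base(uint64_t base, uint8_t type)
{
    assert((base & 0xfffu) == 0);              /* base must be 4K aligned */
    msr_val v = { (uint32_t)(base & 0xfffff000u) | type,
                  (uint32_t)(base >> 32) };
    return v;
}

/* Value for the matching mask MSR (e.g. 0x201) covering `size` bytes. */
msr_val mtrr_phys_mask(uint64_t size, unsigned phys_bits)
{
    assert(size >= 4096 && (size & (size - 1)) == 0);  /* power of two */
    assert(phys_bits >= 32 && phys_bits < 64);
    uint64_t addr_mask = (1ull << phys_bits) - 1;      /* implemented bits */
    uint64_t mask = ~(size - 1) & addr_mask;
    msr_val v = { (uint32_t)mask | MTRR_MASK_VALID,
                  (uint32_t)(mask >> 32) };
    return v;
}
```

For a 64K write-back region at 0xc0000000 with 36 address bits, these give base = 0x00000000:0xc0000006 and mask = 0x0000000f:0xffff0800, matching the constants in the fragment. A CPU with a different physical address width would need a different high dword in the mask.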
Ronald G Minnich rminnich@lanl.gov writes:
Neil has a very interesting point, and it does bring up the "cache as ram" issue again.
Neil, my only question is, did you test this MTRR approach on lots of CPUs? My impression is that it is not guaranteed to work.
Let me respond. My apologies for the delay but I misplaced this email.
There are several sides to this problem.
1) I have tried setting up the MTRRs to fake an area of RAM. In fact, if you look at the p4dc6, I actually implemented it and did a port that way.
Unfortunately there is no architecturally guaranteed way to make this work, and CPU designers are continuously tweaking how their caches work, so it breaks with new CPUs at the drop of a hat, especially the new hyperthreaded monsters.
There is no real savings in having code that breaks when new cpus are introduced.
2) Using memory that is elsewhere in the system. I do use CMOS memory, but not currently for temporaries. I like the idea, but with only 128 bytes to play with you cannot do much with it.
3) Using external RAM for a stack is dangerous in a couple of ways. Cache coherency with PCI devices isn't guaranteed, so setting up MTRRs over the stack is not guaranteed to work.
The common case in processors is to work against cacheable memory. I have seen P4s incorrectly execute code when caching was disabled. Stacks may soon follow suit. So long term this does not look like a solution.
Then there is the added danger that the device, or something on the path to the device, will be corrupted or broken. And then you get very strange, hard-to-debug failure cases.
The only real solution to this problem is to build a compiler that generates code that does not use RAM. Then the compiler can have all of the smarts about how to call subroutines etc. The coding would still be tricky, as you could not have too many active variables at once, but it should be much easier than the current situation.
Eric