I keep trying to get Cache-As-Ram working with a dual Xeon P4/HT board,
and have a few simple(?) questions about cache behaviour. I may have
misunderstood some aspects of the cache logic here.
As a test setup, my coreboot.rom image now includes the cache_as_ram.inc
from intel/model_6ex, and I link it with a ROMCC-compiled romstage for
easier debugging. Except for the cache-based stack, I got this to work
nicely.
The Xeon P4/HT CPUs installed on the mainboard have 8 kB L1, 512 kB L2
and 1024 kB L3. All levels share code and data; the cache-line is 64
bytes. Below I have ignored the existence of L3.
Problem 1: L2 does not store cache-lines.
It may be that the L2 cache is currently not enabled at all. The
reference code uses MSR 0x11e to explicitly enable L2, but this MSR does
not exist on the Xeon P4 and accessing it actually halts the CPU. I did
not find any controls besides CR0 and the L3 disable MSR that affect the
cache.
As a test procedure, I have defined a 16 kB cache region over
non-existing MMIO on the system bus, just below the FWH decode range.
DCACHE_RAM_SIZE=0x4000 and DCACHE_RAM_BASE=0xffafc000. All reads there
return 0xFF's and any writes are ignored.
The MTRR is set up for write-back. The state of CR0.CD (cache disable
bit) seems to have no effect on this test.
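For reference, here is a minimal sketch of how the variable-range MTRR
values for this region could be computed. It only calculates the two
64-bit values and does not touch any MSR. The helper names and the
36-bit physical address width are my assumptions, not taken from the
coreboot sources; the write-back type encoding (6) and the valid bit
(bit 11 of the mask register) are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_RAM_BASE      0xffafc000u
#define DCACHE_RAM_SIZE      0x4000u

#define MTRR_TYPE_WRBACK     6u            /* write-back memory type */
#define MTRR_PHYS_MASK_VALID (1ull << 11)  /* valid bit in the mask register */
#define PHYS_ADDR_BITS       36            /* assumption: 36-bit physical addresses */

/* Hypothetical helper: value for the PHYSBASE register (base plus type). */
static uint64_t mtrr_physbase(uint32_t base, unsigned int type)
{
	return (uint64_t)base | type;
}

/* Hypothetical helper: value for the PHYSMASK register.
 * The size must be a power of two and the base aligned to it. */
static uint64_t mtrr_physmask(uint32_t size)
{
	uint64_t addr_mask = (1ull << PHYS_ADDR_BITS) - 1;

	assert(size != 0 && (size & (size - 1)) == 0);
	return (~((uint64_t)size - 1) & addr_mask) | MTRR_PHYS_MASK_VALID;
}
```

With the values above, mtrr_physbase(DCACHE_RAM_BASE, MTRR_TYPE_WRBACK)
gives 0xffafc006 and mtrr_physmask(DCACHE_RAM_SIZE) gives 0xfffffc800.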
I first read every dword in the range from the system bus, starting
from BASE, hoping to get valid cache-lines into L2 to hit later on. Then
I write each dword in the range with its own address. When reading back,
again starting from BASE, only the last 8 kB (except for one cache-line)
return the contents I wrote.
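The test procedure can be sketched roughly as follows. This is a C
rendering for illustration only, not the actual test code; the function
name and its parameters are hypothetical. The pattern base is passed
separately from the pointer so that the same routine can be tried on an
ordinary buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: run the read / write-address / read-back test.
 * 'region' is where the range is mapped, 'phys_base' is the address
 * used for the pattern (DCACHE_RAM_BASE on the real board), 'size' is
 * in bytes. Returns the number of dwords that read back as written.
 */
static size_t car_pattern_test(volatile uint32_t *region,
			       uint32_t phys_base, size_t size)
{
	size_t dwords = size / sizeof(uint32_t);
	size_t i, good = 0;

	assert(size % sizeof(uint32_t) == 0);

	/* Pass 1: read every dword, hoping to allocate the cache-lines. */
	for (i = 0; i < dwords; i++)
		(void)region[i];

	/* Pass 2: write each dword with its own address. */
	for (i = 0; i < dwords; i++)
		region[i] = phys_base + (uint32_t)(i * sizeof(uint32_t));

	/* Pass 3: read back from the start and count what survived. */
	for (i = 0; i < dwords; i++)
		if (region[i] == phys_base + (uint32_t)(i * sizeof(uint32_t)))
			good++;

	return good;
}
```

On ordinary RAM every dword survives, so a 16 kB buffer returns 4096.
On the board, the result described above would correspond to roughly
the last 8 kB minus one cache-line.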
My conclusion from this is that when a modified cache-line is
de-allocated from L1, there is a write miss on L2 and the write is lost
on the system bus. Is this allowed or typical behaviour, given that
under normal operation the cache-line would be stored in DRAM?
Is the minimum amount of RAM required for romstage 32 kB (STACK_SIZE =
0x8000)? If so, L1 alone cannot handle it?
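The size arithmetic behind this question, as a small sketch. It is an
idealized capacity check only: it ignores associativity and the XIP ROM
lines competing for the same cache, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u
#define L1_SIZE         (8u * 1024u)
#define L2_SIZE         (512u * 1024u)
#define STACK_SIZE      0x8000u

/* Number of cache-lines needed to hold 'bytes' bytes. */
static unsigned int lines_needed(uint32_t bytes)
{
	return (bytes + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}

/* Idealized check: does 'bytes' fit in a cache of 'cache_bytes'? */
static int fits_in_cache(uint32_t bytes, uint32_t cache_bytes)
{
	/* A cache holds a whole number of lines. */
	assert(cache_bytes % CACHE_LINE_SIZE == 0);
	return lines_needed(bytes) <= cache_bytes / CACHE_LINE_SIZE;
}
```

A 32 kB stack needs 512 lines, while an 8 kB L1 holds only 128, so
fits_in_cache(STACK_SIZE, L1_SIZE) is false and
fits_in_cache(STACK_SIZE, L2_SIZE) is true.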
Problem 2: Can I skip cache-fill from L2?
This question is only relevant if I cannot enable L2 and the 8 kB L1
would be enough for Cache-As-Ram (which I doubt).
I have dirty (exclusive?) cache-lines stored in L1, and the code reaches
an instruction that requires writing the dirty lines to the system bus.
Examples of such opcodes: inb, outb, mov -> cr0, wrmsr -> MTRR.
I assume that with one of these opcodes, any modified cache-lines are
written to the system bus. The cache-line data remains valid in L1, but
the state is probably changed (exclusive -> shared?). The first write
access to such a line will cause a fill from the next level cache (L2)
or the system bus. In my case L2 might be disabled, so the cache-line
then contains all 0xFF's except for the data from our store instruction.
Further writes to the same line do not cause a new fill.
Is there a way to avoid the fill from L2 on the first write, and re-use
the valid data on the (shared?) line? I thought the no-fill mode
(CR0.NW) would do this for me.
Problem 3: Cache re-allocation policy?
Cache-lines for the stack must remain in L1 while the XIP ROM lines can
be thrown out whenever necessary. Generally, do dirty cache-lines remain
in L1 as long as there are non-dirty cache-lines that require less
effort to re-allocate?
Kyösti