I keep trying to get Cache-As-Ram working with a dual Xeon P4/HT board, and have a few simple(?) questions about cache behaviour. I may have misunderstood some aspects of the cache logic here.
As a test setup, my coreboot.rom image now includes the cache_as_ram.inc from intel/model_6ex, and I link it with a ROMCC-compiled romstage for easier debugging. Except for the cache-based stack, I got this to work nicely.
The Xeon P4/HT CPUs installed on the mainboard have 8 kB L1, 512 kB L2 and 1024 kB L3. All levels share code and data, and the cache-line size is 64 bytes. Below I have ignored the existence of L3.
Problem 1: L2 does not store cache-line.
It may be that the L2 cache is currently not enabled at all. The reference code uses MSR 0x11e to explicitly enable L2, but this MSR does not exist on the Xeon P4, and accessing it actually halts the CPU. I did not find any controls that affect the cache besides CR0 and the L3 disable MSR.
As a test procedure, I have defined a 16 kB cache region over non-existent MMIO on the system bus, just below the FWH decode range: DCACHE_RAM_SIZE=0x4000 and DCACHE_RAM_BASE=0xffafc000. All reads there return 0xFF bytes and any writes are ignored.
The MTRR is set up for write-back. The state of CR0.CD (cache disable bit) seems to have no effect on this test.
For every dword in the range, starting from BASE, I first read it from the system bus, hoping to get valid cache-lines in L2 to hit later on. Then I write each dword in the range with its own address. When reading back, again starting from BASE, only the last 8 kB (except for one cache-line) returns the contents I wrote.
My conclusion from this is that when a modified cache-line is de-allocated from L1, there is a write miss on L2 and the write is lost on the system bus. Is this allowed or typical behaviour, given that under normal operation the cache-line would be stored in DRAM?
Is the minimum amount of RAM required for romstage 32 kB (STACK_SIZE = 0x8000)? If so, L1 alone cannot handle it?
Problem 2: Can I skip cache-fill from L2?
This question is only relevant if I cannot enable L2 and the 8 kB L1 would be enough for Cache-As-Ram (which I doubt).
I have dirty (exclusive?) cache-lines stored in L1, and execution reaches an instruction that requires writing dirty lines to the system bus. Examples of such opcodes: inb, outb, mov ->cr0, wrmsr->MTRR.
I assume that with one of these opcodes, any modified cache-lines are written to the system bus. The cache-line data remains valid in L1, but the state is probably changed (exclusive -> shared ?). The first write access to such a line will cause a fill from the next-level cache (L2) or the system bus. In my case L2 might be disabled, so the cache-line then contains all 0xFF bytes except for the data from our store instruction. Further writes to the same line do not cause a new fill.
Is there a way to avoid the fill from L2 on the first write, and re-use the valid data on the (shared?) line? I thought the no-fill mode (CR0.NW) would do this for me.
Problem 3: Cache re-allocation policy?
Cache-lines for the stack must remain in L1, while the XIP ROM lines can be thrown out whenever necessary. Generally, do dirty cache-lines remain in L1 as long as there are non-dirty cache-lines that require less effort to re-allocate?
Kyösti