On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Current setup:
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size=  32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size=  16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size=   8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size=   4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
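For illustration, the additive layout can be reproduced by walking the set bits of the top address from the highest bit down. This is only a sketch: `additive_ranges` and `TOP` are names made up for this example and are not taken from coreboot's mtrr.c.

```python
def additive_ranges(top):
    """Split [0, top) into naturally aligned power-of-two blocks, largest first."""
    ranges = []
    base = 0
    bit = 1 << (top.bit_length() - 1) if top else 0
    while bit:
        if top & bit:
            # base is the sum of larger powers of two, so it is aligned to bit
            ranges.append((base, bit))
            base += bit
        bit >>= 1
    return ranges

# 2G - 64M - 64k, the example from this thread
TOP = 2 * 1024**3 - 64 * 1024**2 - 64 * 1024
regs = additive_ranges(TOP)
for i, (base, size) in enumerate(regs):
    print(f"reg{i:02d}: base=0x{base:08x}, size={size // 1024}kB: write-back")
print(f"{len(regs)} variable MTRRs needed")
```

The first eight entries agree with the listing, and the total is one register per set bit in the top address, far more than the eight that are available.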
And actually we should have run out of MTRRs two steps earlier. The BIOS is only allowed to grab 6 of the 8 possible MTRRs. The other two have to be left free for the operating system. There is code in coreboot to enforce this, but it's #if 0'ed.
Ouch.
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=  64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=  64kB: uncached, count=1
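A rough sketch of how such a subtractive layout could be derived: cover everything up to the next power of two as write-back, then punch uncached holes for the gap between the top of the cached area and the end of the cover. `next_pow2` and `subtractive_ranges` are hypothetical helper names, and the sketch assumes a single cached area starting at 0.

```python
def next_pow2(n):
    """Smallest power of two >= n, for n > 0."""
    return 1 << (n - 1).bit_length()

def subtractive_ranges(top):
    """One write-back cover for [0, next_pow2(top)) plus uncached holes for [top, cover)."""
    cover = next_pow2(top)
    regs = [(0, cover, "write-back")]
    hole = cover - top
    base = top
    bit = 1
    while hole:
        if hole & bit:
            # the lowest set bit of the remaining hole always matches base's alignment
            regs.append((base, bit, "uncached"))
            base += bit
            hole &= ~bit
        bit <<= 1
    return regs

TOP = 0x7bff0000  # 2G - 64M - 64k
for i, (base, size, kind) in enumerate(subtractive_ranges(TOP)):
    print(f"reg{i:02d}: base=0x{base:08x}, size=0x{size:08x}: {kind}")
```

For this example the sketch yields the same three registers as the listing, with the two holes emitted lowest address first.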
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
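Not a proof, but a quick count makes these cases easy to check. The sketch below assumes a single cached area starting at address 0 and counts the write-back cover of the subtractive setup as one register. The function names are made up for this example.

```python
def additive_count(size):
    """Registers needed to build [0, size) from power-of-two blocks."""
    return bin(size).count("1")

def subtractive_count(size):
    """One covering register plus one per power-of-two hole below the cover."""
    cover = 1 << (size - 1).bit_length()
    return 1 + bin(cover - size).count("1")

n = 30
for k in range(1, 6):
    plus = (1 << n) + (1 << (n - k))
    minus = (1 << n) - (1 << (n - k))
    print(f"k={k}: 2^n+2^(n-k): add={additive_count(plus)} sub={subtractive_count(plus)}   "
          f"2^n-2^(n-k): add={additive_count(minus)} sub={subtractive_count(minus)}")
```

Under this counting, 2^n+2^(n-1) is a tie, additive wins for 2^n+2^(n-k) with k>1, and subtractive wins for 2^n-2^(n-k) once k>2. The k=2 case comes out as a tie, because 2^n-2^(n-2) equals 2^(n-1)+2^(n-2).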
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
Ouch.
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
1b. Take the largest contiguous power-of-2 sized naturally aligned chunk. Use additive setup for that chunk. Look at the remaining area. Does it still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
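The "take the largest chunk" part of 1b could be sketched as follows. This covers only the chunk search inside one contiguous cached area [start, end); the loop over the remaining areas is left out. `largest_aligned_chunk` is a hypothetical name, not an existing coreboot function.

```python
def largest_aligned_chunk(start, end):
    """Return (base, size) of the largest naturally aligned power-of-two
    block that fits inside [start, end), or None if the range is empty."""
    if end <= start:
        return None
    size = 1 << ((end - start).bit_length() - 1)
    while size:
        base = (start + size - 1) & ~(size - 1)  # round start up to a multiple of size
        if base + size <= end:
            return base, size
        size >>= 1
    return None

# Hypothetical cached area from 768MB up to 2G - 64M
base, size = largest_aligned_chunk(0x30000000, 0x7c000000)
print(f"base=0x{base:08x}, size=0x{size:08x}")
```

In this made-up example a 1GB block does not fit with natural alignment, so the search falls back to the 512MB block at 0x40000000.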
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
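The counting and selection steps translate fairly directly into code. In this sketch `rounduptonextpowerof2` is read as "the smallest power of two strictly greater than top_cached_addr", which is an assumption about the intended meaning. The counts are taken as written in the steps above, so the one extra register for the write-back cover of the subtractive setup is not included.

```python
def bitcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")

def round_up_to_next_pow2(x):
    """Smallest power of two strictly greater than x (assumed reading)."""
    return 1 << x.bit_length()

def choose_method(top_cached_addr):
    """Return (method, additive_count, subtractive_count) for one cached area [0, top_cached_addr]."""
    additive_count = bitcount(top_cached_addr + 1)
    subtractive_count = bitcount(
        round_up_to_next_pow2(top_cached_addr) - (top_cached_addr + 1))
    method = "subtractive" if additive_count > subtractive_count else "additive"
    return method, additive_count, subtractive_count

# Last cached byte for 0MB - (2G-64M-64k)
print(choose_method(0x7bfeffff))
# Last cached byte for 0MB - (1G+128M)
print(choose_method(0x47ffffff))
```

For the 2G-64M-64k example this picks the subtractive method (14 set bits against 2), while a cached area of 1G+128M stays additive (2 against 3).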
Yes, sounds good.
Glad to hear that. I hope the rest of the algorithm is OK for you as well.
Regards, Carl-Daniel