On Sat, Jan 24, 2009 at 4:27 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Hi,
the problems Jason is seeing with slow boot can be explained with our incomplete and less than efficient MTRR setup.
We never use subtractive MTRRs, so if a range is not power-of-two sized, we try to combine it from subranges which are power-of-two sized. For large ranges which are a little smaller than a power of two, we waste MTRRs and in some cases even run out of MTRRs. That's not something to be proud of.
Example: We want to cache 0MB - (2G-64M-64k).
Current setup:
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size=  32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size=  16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size=   8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size=   4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
reg08: base=0x7bc00000 (1980MB), size=2048kB: write-back, count=1
reg09: base=0x7be00000 (1982MB), size=1024kB: write-back, count=1
reg10: base=0x7bf00000 (1983MB), size= 512kB: write-back, count=1
reg11: base=0x7bf80000 (1983MB), size= 256kB: write-back, count=1
reg12: base=0x7bfc0000 (1983MB), size= 128kB: write-back, count=1
reg13: base=0x7bfe0000 (1983MB), size=  64kB: write-back, count=1
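The additive listing above can be reproduced with a few lines of arithmetic. This is only a sketch in Python, not coreboot code: the function name `additive_mtrrs` is made up for illustration, and it assumes the cached range starts at address 0, so every set bit of the end address yields one naturally aligned power-of-two block.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30

def additive_mtrrs(top):
    """Split [0, top) into power-of-two sized blocks, largest first.

    Hypothetical helper: assumes the range starts at 0, so each set bit
    of `top` becomes one (base, size) pair.
    """
    regs = []
    base = 0
    bit = 1 << (top.bit_length() - 1) if top else 0
    while bit:
        if top & bit:
            regs.append((base, bit))
            base += bit
        bit >>= 1
    return regs

top = 2 * GB - 64 * MB - 64 * KB  # end of the cached range in the example
regs = additive_mtrrs(top)
for i, (base, size) in enumerate(regs):
    print(f"reg{i:02d}: base=0x{base:08x}, size={size // KB}kB: write-back")
```

For the example range this yields 14 entries, the same bases and sizes as reg00 through reg13.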
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=  64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=  64kB: uncached, count=1
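A subtractive setup works because, where variable-range MTRRs overlap, the uncached type takes precedence over write-back. The sketch below checks the three registers above under that rule. The helper `effective_type` and the choice of "uncached" as the default type outside all ranges are assumptions made for illustration.

```python
KB, MB = 1 << 10, 1 << 20

# Subtractive setup copied from the message: (base, size, type).
subtractive = [
    (0x00000000, 2048 * MB, "write-back"),
    (0x7c000000, 64 * MB, "uncached"),
    (0x7bff0000, 64 * KB, "uncached"),
]

def effective_type(addr, regs, default="uncached"):
    """Resolve the memory type of addr when ranges overlap.

    Uncached wins over write-back. `default` (assumed here) applies to
    addresses not covered by any register.
    """
    types = {t for base, size, t in regs if base <= addr < base + size}
    if not types:
        return default
    return "uncached" if "uncached" in types else "write-back"

step = 64 * KB  # smallest granule used in the example
cached = [a for a in range(0, 2048 * MB, step)
          if effective_type(a, subtractive) == "write-back"]
top_of_cached = cached[-1] + step
contiguous = len(cached) * step == top_of_cached
print(hex(top_of_cached), contiguous)
```

The cached region comes out as one contiguous block ending at 2G-64M-64k, the same range the additive setup describes.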
It's a good point, but I wonder: does VIA hardware have the ability to "hoist" this uncached memory to a reasonable alignment? That is what the SiS chipsets used to do. That would make this problem easier.
However, a subtractive setup is not always more efficient. That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
Remember that there is no harm in having an MTRR cover an area of memory which has holes in it. We may not need too many tricks.
1a. If no, go to step 2
1b. If yes, stop here. Need advanced setup not described here.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
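Steps 2 to 4 can be sketched as follows. This is an illustration in Python, not the proposed coreboot implementation. The name `choose_method` is hypothetical, and "round up to next power of two" is read here as the smallest power of two strictly greater than top_cached_addr, which keeps the subtraction non-negative. Step 1 (detecting disjoint cached areas) is not covered.

```python
def bitcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def choose_method(top_cached_addr):
    """Return (method, additive_count, subtractive_count).

    top_cached_addr is the last cached byte address; the cached range is
    assumed to start at 0 and to be contiguous (step 1 answered "no").
    """
    end = top_cached_addr + 1
    # Assumption: smallest power of two strictly above top_cached_addr.
    cover = 1 << top_cached_addr.bit_length()
    additive_count = bitcount(end)
    # Counts only the uncached holes; the example setup also spends one
    # write-back register on the covering range.
    subtractive_count = bitcount(cover - end)
    if additive_count > subtractive_count:
        method = "subtractive"
    else:
        method = "additive"
    return method, additive_count, subtractive_count

result = choose_method(0x7bfeffff)  # example range: 0 .. 2G-64M-64k
print(result)
```

For the example range the counts are 14 and 2, so the subtractive method is selected. The register listing for that setup uses three MTRRs, because the covering write-back range needs one as well.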
This sounds neat, and we're going to need it. An interesting test for now would be to artificially limit memory size to that which can be described with 4 MTRRs. We would lose 200M or so that way, but it would be helpful to see if that resolves the speed problem.
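One way to pick such a limit is to keep only the highest set bits of the memory top, one per available MTRR. The sketch below does that; `limit_to_mtrrs` is a hypothetical helper, and the input is the 2G-64M-64k figure from the example earlier in the thread, not the actual memory size of Jason's board.

```python
MB = 1 << 20

def limit_to_mtrrs(top, max_regs):
    """Largest size <= top that an additive setup covers with max_regs MTRRs.

    Assumes the cached range starts at 0, so each kept bit is one register.
    """
    kept = 0
    bit = 1 << max(top.bit_length() - 1, 0)
    while bit and max_regs:
        if top & bit:
            kept |= bit
            max_regs -= 1
        bit >>= 1
    return kept

top = 0x7bff0000                  # example range end, 2G-64M-64k
limited = limit_to_mtrrs(top, 4)
print(f"limit to {limited // MB}MB, giving up {(top - limited) // 1024}kB")
```

With the example numbers the limit lands at 1920MB. The amount given up depends on the real memory map, so the figure for a specific board will differ.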
MTRRs are one of my least favorite pieces of x86 architecture ...
ron