On 25.01.2009 00:43, Stefan Reinauer wrote:
On 24.01.2009 23:21, Carl-Daniel Hailfinger wrote:
On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Any reason why that shouldn't be cacheable?
From a memory perspective, it's just normal memory, not graphics memory
or some such.
This might even be caused by my recent high tables patch, but it looks like a bug to me.
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>2 (for k=2 the size equals 2^(n-1)+2^(n-2), which is the tie case again).
I hope that you accept this without a detailed mathematical proof. ;-)
I should have pointed out that the bit counting algorithm at the end of my mail is the definitive answer. The explanation above only covers some very common special cases. Please note that the explanation above explicitly does not cover the "2 equally sized DIMMs or only one DIMM and no UMA" scenario because 2^n cannot be zero. (If you ever encounter DIMMs with non-power-of-2 sizes, ignore my last sentence.)
So in a setup with 2 equally sized DIMMs or only one DIMM,
With only one DIMM or two equally sized DIMMs, one MTRR is enough (provided you don't want a hole in there because they are >= 4 GB in total). Feel free to call this either additive or subtractive.
and possibly UMA the subtractive method will always be the way to go.
If UMA is the top part of RAM and TOPMEM is 2^n, you're right for most scenarios. There are cases where that assumption is slightly off, though. Consider total RAM 1024 MB, normal memory 640 MB and UMA 384 MB (possible on AMD 690G). You either need two MTRRs for an additive setup or three MTRRs for a subtractive setup.
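To make the 640 MB / 384 MB numbers concrete, here is a minimal C sketch that lists the ranges each setup would need. The struct and function names are made up for illustration and are not coreboot code. It assumes the cacheable area starts at address 0, and that an uncacheable range overlapping a write-back range wins, which is what the subtractive setup relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for one variable MTRR, not a coreboot structure. */
struct range {
	uint64_t base;
	uint64_t size;
	int cacheable;	/* 1 = write-back, 0 = uncacheable hole */
};

/* Additive setup: one cacheable range per set bit of `cached`
 * (bytes to cache starting at address 0), largest chunk first. */
int additive_ranges(uint64_t cached, struct range *out, int max)
{
	int n = 0;
	uint64_t base = 0;

	for (int bit = 63; bit >= 0; bit--) {
		uint64_t size = 1ULL << bit;
		if (!(cached & size))
			continue;
		assert(n < max);
		out[n++] = (struct range){ base, size, 1 };
		base += size;	/* stays a multiple of every smaller size */
	}
	return n;
}

/* Subtractive setup: one cacheable range up to the next power of two,
 * then one uncacheable range per set bit of the hole, smallest first
 * so that every hole chunk is naturally aligned. */
int subtractive_ranges(uint64_t cached, struct range *out, int max)
{
	uint64_t total = 1;
	uint64_t base = cached;
	int n = 0;

	assert(cached > 0 && cached <= (1ULL << 62));
	while (total < cached)
		total <<= 1;

	assert(n < max);
	out[n++] = (struct range){ 0, total, 1 };
	for (int bit = 0; bit < 64; bit++) {
		uint64_t size = 1ULL << bit;
		if (!((total - cached) & size))
			continue;
		assert(n < max);
		out[n++] = (struct range){ base, size, 0 };
		base += size;
	}
	return n;
}
```

For 640 MB this gives 512 MB + 128 MB in the additive case, and a 1024 MB cacheable range plus holes of 128 MB at 640 MB and 256 MB at 768 MB in the subtractive case.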
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
1b. Take the largest contiguous power-of-2 sized naturally aligned chunk. Use additive setup for that chunk. Look at the remaining area. Does it still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
Glad to hear that. I hope the rest of the algorithm is OK for you as well.
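For reference, the bit counting steps quoted above could be written in C roughly as follows. This is only a sketch: the enum and function names are invented here, and it rounds up top_cached_addr+1 rather than top_cached_addr, which gives the same result unless top_cached_addr is itself a power of two. As in the quoted steps, subtractive_count only counts the uncached holes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
enum mtrr_strategy { MTRR_ADDITIVE, MTRR_SUBTRACTIVE };

/* Number of set bits in v. */
unsigned bitcount(uint64_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Smallest power of two that is >= v (returns 1 for v == 0). */
uint64_t round_up_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Steps 2 to 4: top_cached_addr is the last cached byte address,
 * with the cached area assumed to start at 0. */
enum mtrr_strategy choose_strategy(uint64_t top_cached_addr)
{
	assert(top_cached_addr < (1ULL << 62));

	uint64_t cached = top_cached_addr + 1;
	unsigned additive_count = bitcount(cached);
	unsigned subtractive_count = bitcount(round_up_pow2(cached) - cached);

	return additive_count > subtractive_count ?
		MTRR_SUBTRACTIVE : MTRR_ADDITIVE;
}
```

With 2G-64M cached, additive_count is 5 and subtractive_count is 1, so the subtractive method is chosen. With 640 MB cached both counts are 2 and the additive method is chosen.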
Thinking about it again I'm not sure we're ever going to see multiple disjoint cached areas in a given 2^x sized area. That would be... UMA in the middle of the memory or something? Uh.. or more than 4G of memory?
It happens if you want to cache the ROM (slightly below 4 GB), but not the IOMEM areas before the ROM. My reply to Corey should have an example. Fortunately, we try to have no RAM between 3 GB and 4 GB, so for effective RAM sizes >=3 GB (and no UMA) we need 2 MTRRs below 4 GB for RAM (and 1 MTRR for ROM) and 1 MTRR above 4 GB if you're willing to waste address space (not RAM) to save on MTRRs.
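For the disjoint case, the "take the largest naturally aligned power-of-2 chunk" part of step 1b might look like the sketch below. It is one possible reading of that step, not tested on hardware. The function name is hypothetical and a cached area is given as a half-open range [start, end).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: find the largest naturally aligned power-of-2
 * chunk that lies completely inside the cached area [start, end).
 * Returns the chunk size and stores its base address in *base. */
uint64_t largest_aligned_chunk(uint64_t start, uint64_t end, uint64_t *base)
{
	assert(start < end && end <= (1ULL << 62));

	for (int bit = 62; bit >= 0; bit--) {
		uint64_t size = 1ULL << bit;
		/* first address >= start that is a multiple of size */
		uint64_t b = (start + size - 1) & ~(size - 1);

		if (b + size <= end) {
			*base = b;
			return size;
		}
	}
	return 0;	/* not reached: a 1-byte chunk always fits */
}
```

For RAM from 0 to 3 GB this returns a 2 GB chunk at base 0, and for a 1 MB ROM ending at 4 GB it returns the whole ROM. The caller would then cover that chunk with one MTRR and look at what is left of the area.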
The bit counting algorithm will solve these problems in an optimal way. If you want the "waste address space to save MTRRs" optimization, the bit count algorithm can handle it. Just change the input to round up top_cached_addr+1 to a multiple of the biggest possible power of 2 which is in a "don't care" area.
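The rounding step could be sketched like this. It assumes the "don't care" area starts directly above the cached area and ends at dont_care_end, and it takes "in a don't care area" to mean that the rounded-up end must not go past dont_care_end. Both the function name and that parameter are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round `cached` (top_cached_addr+1) up to a
 * multiple of the largest power of two for which the result still
 * lies at or below dont_care_end. Returns the new cached size to
 * feed into the bit counting algorithm. */
uint64_t round_up_into_dont_care(uint64_t cached, uint64_t dont_care_end)
{
	assert(cached > 0 && cached <= dont_care_end);
	assert(dont_care_end <= (1ULL << 62));

	for (int bit = 62; bit >= 0; bit--) {
		uint64_t granule = 1ULL << bit;
		uint64_t rounded = (cached + granule - 1) & ~(granule - 1);

		if (rounded <= dont_care_end)
			return rounded;
	}
	return cached;	/* not reached: granule 1 leaves cached unchanged */
}
```

As an example, 2G-1M cached with a "don't care" area reaching up to 2G is rounded up to 2G, which turns 11 set bits into a single one. If there is no "don't care" area at all, the input comes back unchanged.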
Of course, if you are really really trying to clamp down on MTRR usage, you can do MTRR setup in two steps: cached ROM during coreboot execution and a change to uncached ROM (and thus avoidance of disjoint cached areas) directly before passing execution to the payload.
(And anyone implementing this should probably add all mails explaining the algorithm as code comments. Nobody is going to understand such code in a few months if it was written without enough comments. ;-)
Regards, Carl-Daniel