Hi,
the problems Jason is seeing with slow boot can be explained with our incomplete and less than efficient MTRR setup.
We never use subtractive MTRRs, so if a range is not power-of-two sized, we try to combine it from subranges which are power-of-two sized. For large ranges which are a little smaller than a power of two, we waste MTRRs and in some cases even run out of MTRRs. That's not something to be proud of.
Example: We want to cache 0MB - (2G-64M-64k). Current setup:
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size= 32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size= 16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size= 8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size= 4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
reg08: base=0x7bc00000 (1980MB), size= 2048kB: write-back, count=1
reg09: base=0x7be00000 (1982MB), size= 1024kB: write-back, count=1
reg10: base=0x7bf00000 (1983MB), size= 512kB: write-back, count=1
reg11: base=0x7bf80000 (1983MB), size= 256kB: write-back, count=1
reg12: base=0x7bfc0000 (1983MB), size= 128kB: write-back, count=1
reg13: base=0x7bfe0000 (1983MB), size= 64kB: write-back, count=1
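For illustration, a rough C sketch of the greedy power-of-two decomposition that produces a register list like the one above (this is not the actual mtrr.c code; the function name and the output format are made up):

#include <stdint.h>
#include <stdio.h>

/* Cover [0, top) with naturally aligned power-of-two write-back ranges,
 * largest first.  For top = 2G - 64M - 64k this emits 14 ranges, matching
 * the register dump above. */
static int additive_decompose(uint64_t top)
{
        uint64_t base = 0;
        int count = 0;

        while (base < top) {
                /* largest power of two that still fits and keeps 'base'
                 * naturally aligned */
                uint64_t size = 1ULL << 62;
                while (size > top - base || (base & (size - 1)))
                        size >>= 1;
                printf("reg%02d: base=0x%08llx, size=%lluKB: write-back\n",
                       count, (unsigned long long)base,
                       (unsigned long long)(size >> 10));
                base += size;
                count++;
        }
        return count;
}

int main(void)
{
        uint64_t top = (2ULL << 30) - (64ULL << 20) - (64ULL << 10);

        printf("variable MTRRs needed: %d\n", additive_decompose(top));
        return 0;
}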
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
However, a subtractive setup is not always more efficient. That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
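A rough C sketch of steps 2 to 4, assuming a single contiguous cached area starting at 0. It follows the formulas above literally, so the one covering write-back MTRR of the subtractive setup is not counted; all names are illustrative:

#include <stdint.h>

/* Round up to the next power of two (x is returned unchanged if it
 * already is one). */
static uint64_t roundup_pow2(uint64_t x)
{
        uint64_t p = 1;

        while (p < x)
                p <<= 1;
        return p;
}

static int bitcount(uint64_t x)
{
        int n = 0;

        for (; x; x &= x - 1)   /* clear the lowest set bit */
                n++;
        return n;
}

/* Steps 2-4 for a single cached range [0, top_cached_addr].
 * Returns nonzero if the subtractive setup needs fewer MTRRs.
 * For top_cached_addr = 2G - 64M - 64k - 1 this compares 14 (additive)
 * against 2 (subtractive), so the subtractive method wins. */
static int use_subtractive(uint64_t top_cached_addr)
{
        int additive_count = bitcount(top_cached_addr + 1);
        int subtractive_count = bitcount(roundup_pow2(top_cached_addr)
                                         - (top_cached_addr + 1));

        return additive_count > subtractive_count;
}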
Regards, Carl-Daniel
On Sat, Jan 24, 2009 at 4:27 AM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Hi,
the problems Jason is seeing with slow boot can be explained with our incomplete and less than efficient MTRR setup.
We never use subtractive MTRRs, so if a range is not power-of-two sized, we try to combine it from subranges which are power-of-two sized. For large ranges which are a little smaller than a power of two, we waste MTRRs and in some cases even run out of MTRRs. That's not something to be proud of.
Example: We want to cache 0MB - (2G-64M-64k). Current setup:
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size= 32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size= 16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size= 8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size= 4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
reg08: base=0x7bc00000 (1980MB), size= 2048kB: write-back, count=1
reg09: base=0x7be00000 (1982MB), size= 1024kB: write-back, count=1
reg10: base=0x7bf00000 (1983MB), size= 512kB: write-back, count=1
reg11: base=0x7bf80000 (1983MB), size= 256kB: write-back, count=1
reg12: base=0x7bfc0000 (1983MB), size= 128kB: write-back, count=1
reg13: base=0x7bfe0000 (1983MB), size= 64kB: write-back, count=1
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
It's a good point, but I wonder: does VIA hardware have the ability to "hoist" this uncached memory to a reasonable alignment? That is what the SiS chipsets used to do. That would make this problem easier.
However, a subtractive setup is not always more efficient. That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
remember that there is no harm in having an mtrr cover an area of memory which has holes in it. We may not need too many tricks.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
This sounds neat, we're going to need it. An interesting test for now would be to artificially limit memory size to that which can be described with 4 MTRRs. We would lose 200M or so that way but it would be helpful to see if that resolves the speed problem.
MTRRs are one of my least favorite pieces of x86 architecture ...
ron
ron minnich wrote:
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
It's a good point, but I wonder: does VIA hardware have the ability to "hoist" this uncached memory to a reasonable alignment? That is what the SiS chipsets used to do. That would make this problem easier.
Hoisting will not be generally available, so even if one chipset can do it, we need to solve the problem without it. Overlapping MTRRs work fine for that purpose. No hoisting needed.
The same problem as above shows up on all UMA platforms I've seen so far (like the i945 or the cx700). I'd be surprised if it's different on the dbm690.
This sounds neat, we're going to need it. An interesting test for now would be to artificially limit memory size to that which can be described with 4 MTRRs. We would lose 200M or so that way but it would be helpful to see if that resolves the speed problem.
How would we lose 200M? The above example only requires 3 MTRRs?
Stefan
On Sat, Jan 24, 2009 at 11:46 AM, Stefan Reinauer stepan@coresystems.de wrote:
This sounds neat, we're going to need it. An interesting test for now would be to artificially limit memory size to that which can be described with 4 MTRRs. We would lose 200M or so that way but it would be helpful to see if that resolves the speed problem.
How would we lose 200M? The above example only requires 3 MTRRs?
I was not clear. What I meant was that we could test the 'it's an mtrr problem' assertion by using 3 MTRRs for positive decoding of most of memory, but not the last little bit, and then reporting that we only have that memory covered by MTRRs.
Put another way, a simple test: hardwire TOM to 1 GB and see if booting gets better, since one MTRR will cover that.
What do MTRRs look like on this board on factory BIOS?
ron
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
Current setup:
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size= 32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size= 16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size= 8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size= 4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
And actually we should have run out of MTRRs 2 steps earlier. The BIOS is only allowed to grab 6 of the 8 possible MTRRs. The other two have to be left free to use by the operating system. There is code in coreboot to enforce this, but it's #if 0'ed.
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
Stefan
On Sat, Jan 24, 2009 at 2:58 PM, Stefan Reinauer stepan@coresystems.de wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
Current setup:
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size= 32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size= 16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size= 8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size= 4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
And actually we should have run out of MTRRs 2 steps earlier. The BIOS is only allowed to grab 6 of the 8 possible MTRRs. The other two have to be left free to use by the operating system. There is code in coreboot to enforce this, but it's #if 0'ed.
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
I've already been working on it for v3, along with handling the cached flash area. Quick question though, what would happen if, for a few instructions, both MTRR0 and MTRR1 covered the flash area? Also, I've been extremely busy lately and haven't had much time for coreboot, so I won't be offended if someone beats me to fixing this.
-Corey
On 24.01.2009 22:47, Corey Osgood wrote:
I've already been working on it for v3, along with handling the cached flash area.
Cool.
Quick question though, what would happen if, for a few instructions, both MTRR0 and MTRR1 covered the flash area?
The AMD64 documentation says you can have as many overlapping MTRRs as you want (subject to the number of available MTRRs). However, the MTRR with the "worst" (uncached) memory type will always win. That means a purely subtractive setup with 3 GB RAM makes it impossible to cache the flash area.
So what do you do if you want 3 GB RAM and 1 MB ROM cached?
MTRR0: 0-2048 MB, size 2048 MB (writeback)
MTRR1: 2048-3072 MB, size 1024 MB (writeback)
MTRR2: 4095-4096 MB, size 1 MB (writeback)
MTRR default type: (uncached)
What does not work?
MTRR0: 0-4096 MB, size 4096 MB (writeback)
MTRR1: 3072-4096 MB, size 1024 MB (uncached)
MTRR2: 4095-4096 MB, size 1 MB (writeback)
MTRR default type: (uncached)
Here MTRR2 has no effect because it is overridden by the "worse" type of MTRR1.
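To illustrate that precedence rule, a small simplified sketch; only the memory type encodings are the architectural values, the structure and the function are made up:

#include <stdint.h>

/* x86 MTRR memory type encodings (architectural values). */
#define MTRR_TYPE_UNCACHEABLE   0
#define MTRR_TYPE_WRTHROUGH     4
#define MTRR_TYPE_WRBACK        6

struct var_mtrr {       /* illustrative, not coreboot's representation */
        uint64_t base;
        uint64_t size;
        int type;
};

/* Simplified effective-type lookup for overlapping variable MTRRs:
 * uncacheable in any matching MTRR always wins, write-through beats
 * write-back, and the default type applies where nothing matches.
 * This is why MTRR2 in the "does not work" layout above has no effect:
 * its range is also covered by the uncached MTRR1. */
static int effective_type(const struct var_mtrr *m, int n,
                          uint64_t addr, int default_type)
{
        int type = -1;
        int i;

        for (i = 0; i < n; i++) {
                if (addr < m[i].base || addr >= m[i].base + m[i].size)
                        continue;
                if (m[i].type == MTRR_TYPE_UNCACHEABLE)
                        return MTRR_TYPE_UNCACHEABLE;
                if (type == -1 || m[i].type == MTRR_TYPE_WRTHROUGH)
                        type = m[i].type;
        }
        return (type == -1) ? default_type : type;
}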
Also, I've been extremely busy lately and haven't had much time for coreboot, so I won't be offended if someone beats me to fixing this.
I'm glad you were working on this. Do you already have any code we can use as a basis?
Regards, Carl-Daniel
On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Current setup:
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0x40000000 (1024MB), size= 512MB: write-back, count=1
reg02: base=0x60000000 (1536MB), size= 256MB: write-back, count=1
reg03: base=0x70000000 (1792MB), size= 128MB: write-back, count=1
reg04: base=0x78000000 (1920MB), size= 32MB: write-back, count=1
reg05: base=0x7a000000 (1952MB), size= 16MB: write-back, count=1
reg06: base=0x7b000000 (1968MB), size= 8MB: write-back, count=1
reg07: base=0x7b800000 (1976MB), size= 4MB: write-back, count=1
--Here we run out of MTRRs, additionally needed MTRRs follow--
And actually we should have run out of MTRRs 2 steps earlier. The BIOS is only allowed to grab 6 of the 8 possible MTRRs. The other two have to be left free to use by the operating system. There is code in coreboot to enforce this, but it's #if 0'ed.
Ouch.
We could achieve the same effect with a subtractive setup:
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x7c000000 (1984MB), size=64MB: uncached, count=1
reg02: base=0x7bff0000 (1983MB), size=64kB: uncached, count=1
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
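For concreteness, a few worked instances (counting the single covering write-back MTRR of the subtractive setup as well):
1536 MB = 2^30+2^29: additive 1024+512 -> 2 MTRRs, subtractive 2048 WB + 512 UC -> 2 MTRRs (equal).
1280 MB = 2^30+2^28 (k=2): additive 1024+256 -> 2 MTRRs, subtractive 2048 WB + 512 UC + 256 UC -> 3 MTRRs (additive wins).
1792 MB = 2^31-2^28 (k=3): additive 1024+512+256 -> 3 MTRRs, subtractive 2048 WB + 256 UC -> 2 MTRRs (subtractive wins).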
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
Ouch.
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
1b. Take the largest contiguous power-of-2 sized naturally aligned chunk. Use additive setup for that chunk. Look at the remaining area. Does it still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
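A rough C sketch of one way to implement step 1b (illustration only: the range list representation, the carving order and the missing bounds checking are all simplifications):

#include <stdint.h>
#include <stdio.h>

struct range { uint64_t base, size; };  /* one cached area, illustrative */

/* Largest naturally aligned power-of-two chunk inside [base, base+size). */
static struct range biggest_chunk(uint64_t base, uint64_t size)
{
        struct range best = { 0, 0 };
        uint64_t align;

        for (align = 1; align && align <= size; align <<= 1) {
                uint64_t start = (base + align - 1) & ~(align - 1);
                if (start + align <= base + size) {
                        best.base = start;
                        best.size = align;
                }
        }
        return best;
}

/* Step 1b: while more than one disjoint cached area is left, carve out the
 * largest naturally aligned power-of-two chunk, give it an additive
 * write-back MTRR, and put the (at most two) leftover pieces back on the
 * list.  The single remaining area is then handled by the bit counting of
 * steps 2-4.  The caller must provide spare room in a[]. */
static void setup_disjoint(struct range *a, int n)
{
        while (n > 1) {
                struct range best = { 0, 0 };
                int i, best_i = 0;

                for (i = 0; i < n; i++) {
                        struct range c = biggest_chunk(a[i].base, a[i].size);
                        if (c.size > best.size) {
                                best = c;
                                best_i = i;
                        }
                }
                if (!best.size)
                        break;  /* nothing cacheable left */
                printf("additive WB MTRR: base 0x%llx, size 0x%llx\n",
                       (unsigned long long)best.base,
                       (unsigned long long)best.size);

                /* replace the carved area by its leftovers */
                {
                        struct range lo = { a[best_i].base,
                                            best.base - a[best_i].base };
                        struct range hi = { best.base + best.size,
                                            a[best_i].base + a[best_i].size
                                            - (best.base + best.size) };

                        a[best_i] = a[--n];
                        if (lo.size)
                                a[n++] = lo;
                        if (hi.size)
                                a[n++] = hi;
                }
        }
        /* a[0] (if n == 1) is a single contiguous area: apply steps 2-4. */
}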
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
Glad to hear that. I hope the rest of the algorithm is OK for you as well.
Regards, Carl-Daniel
On 24.01.2009 23:21 Uhr, Carl-Daniel Hailfinger wrote:
On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Any reason why that shouldn't be cachable?
From a memory perspective, it's just normal memory, not graphics memory or some such.
This might even be caused by my high tables patch from recently, but it looks like a bug to me.
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
So in a setup with 2 equally sized DIMMs or only one DIMM, and possibly UMA, the subtractive method will always be the way to go.
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
1b. Take the largest contiguous power-of-2 sized naturally aligned chunk. Use additive setup for that chunk. Look at the remaining area. Does it still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
Glad to hear that. I hope the rest of the algorithm is OK for you as well.
Thinking about it again I'm not sure we're ever going to see multiple disjoint cached areas in a given 2^x sized area. That would be... UMA in the middle of the memory or something? Uh.. or more than 4G of memory?
On 25.01.2009 00:43, Stefan Reinauer wrote:
On 24.01.2009 23:21 Uhr, Carl-Daniel Hailfinger wrote:
On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Any reason why that shouldn't be cachable?
From a memory perspective, it's just normal memory, not graphics memory or some such.
This might even be caused by my high tables patch from recently, but it looks like a bug to me.
However, a subtractive setup is not always more efficient.
Is it not? It sounds like at least if we have 2^x bytes of memory and subtract a small chunk or two, we would be quite well off with it.
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
I should have pointed out that the bit counting algorithm at the end of my mail is the definitive answer. The explanation above only covers some very common special cases. Please note that the explanation above explicitly does not cover the "2 equally sized DIMMs or only one DIMM and no UMA" scenario because 2^(n-k) cannot be zero. (If you ever encounter DIMMs with non-power-of-2 sizes, ignore my last sentence.)
So in a setup with 2 equally sized DIMMs or only one DIMM,
With only one DIMM or two equally sized DIMMs, one MTRR is enough (provided you don't need a hole in there because they are >= 4 GB in total). Feel free to call this either additive or subtractive.
and possibly UMA the subtractive method will always be the way to go.
If UMA is the top part of RAM and TOPMEM is 2^n, you're right for most scenarios. There are cases where that assumption is slightly off, though. Consider total RAM 1024 MB, normal memory 640 MB and UMA 384 MB (possible on AMD 690G). You either need two MTRRs for additive setup or three MTRRs for subtractive setup.
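For concreteness: additive covers the 640 MB as 512 MB + 128 MB, i.e. 2 MTRRs; subtractive needs 1024 MB write-back plus 1024 - 640 = 384 MB = 256 MB + 128 MB uncached, i.e. 3 MTRRs.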
That means we have to select the best setup type. I devised a slightly tricky algorithm to do that:
1. Check if there are multiple disjoint cached areas in a given power-of-two sized area.
1a. If no, go to step 2.
1b. If yes, stop here. Need advanced setup not described here.
please describe ;)
1b. Take the largest contiguous power-of-2 sized naturally aligned chunk. Use additive setup for that chunk. Look at the remaining area. Does it still contain two disjoint chunks? If yes, go to 1b. If no, go to 2.
2. additive_count=bitcount(top_cached_addr+1)
3. subtractive_count=bitcount(rounduptonextpowerof2(top_cached_addr)-(top_cached_addr+1))
4. if (additive_count>subtractive_count) go to subtractive_method else go to additive_method
Yes, sounds good.
Glad to hear that. I hope the rest of the algorithm is OK for you as well.
Thinking about it again I'm not sure we're ever going to see multiple disjoint cached areas in a given 2^x sized area. That would be... UMA in the middle of the memory or something? Uh.. or more than 4G of memory?
It happens if you want to cache the ROM (slightly below 4 GB), but not the IOMEM areas before the ROM. My reply to Corey should have an example. Fortunately, we try to have no RAM between 3 GB and 4 GB, so for effective RAM sizes >=3 GB (and no UMA) we need 2 MTRRs below 4 GB for RAM (and 1 MTRR for ROM) and 1 MTRR above 4 GB if you're willing to waste address space (not RAM) to save on MTRRs.
The bit counting algorithm will solve these problems in an optimal way. If you want the "waste address space to save MTRRs" optimization, the bit count algorithm can handle it. Just change the input to round up top_cached_addr+1 to a multiple of the biggest possible power of 2 which is in a "don't care" area.
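A possible sketch of that input tweak (the helper name and the explicit end of the "don't care" area are assumptions; the result is then fed into the bit counting above as the new top_cached_addr+1):

#include <stdint.h>

/* Round top_cached_addr + 1 up to a multiple of the biggest possible power
 * of two such that the rounded-up end still lies inside the "don't care"
 * area [top_cached_addr + 1, dont_care_end).  The extra address space gets
 * cached, but nothing we care about lives there, and the smaller bit counts
 * save MTRRs. */
static uint64_t waste_address_space(uint64_t top_cached_addr,
                                    uint64_t dont_care_end)
{
        uint64_t size = top_cached_addr + 1;
        uint64_t align;

        for (align = 1ULL << 62; align; align >>= 1) {
                uint64_t rounded = (size + align - 1) & ~(align - 1);
                if (rounded <= dont_care_end)
                        return rounded;
        }
        return size;
}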
Of course, if you are really really trying to clamp down on MTRR usage, you can do MTRR setup in two steps: cached ROM during coreboot execution and a change to uncached ROM (and thus avoidance of disjoint cached areas) directly before passing execution to the payload.
(And anyone implementing this should probably add all mails explaining the algorithm as code comments. Nobody is going to understand such code in a few months if it was written without enough comments. ;-)
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
(And anyone implementing this should probably add all mails explaining the algorithm as code comments. Nobody is going to understand such code in a few months if it was written without enough comments. ;-)
This is just another allocation algorithm.
The PCI resource allocation taught us that KISS must be key.
6 entries is a sad restriction to deal with. These are times when >=4GB is not so uncommon.
//Peter
On 25.01.2009 02:39, Peter Stuge wrote:
Carl-Daniel Hailfinger wrote:
(And anyone implementing this should probably add all mails explaining the algorithm as code comments. Nobody is going to understand such code in a few months if it was written without enough comments. ;-)
This is just another allocation algorithm.
The PCI resource allocation taught us that KISS must be key.
We have a KISS algorithm for MTRRs right now. It causes >10 minute boots. The RAM init code taught us that good comments are essential and documenting the shortfalls is key.
6 entries is a sad restriction to deal with. These are times when >=4GB is not so uncommon.
There are enough common corner cases and restrictions not handled by coreboot. I hope we can get an optimal solution at least for MTRRs.
I have a dream. One recent mainstream desktop mainboard that works at least as well as with the alternative.
Regards, Carl-Daniel
On 25.01.2009 00:43, Stefan Reinauer wrote:
On 24.01.2009 23:21 Uhr, Carl-Daniel Hailfinger wrote:
On 24.01.2009 20:58, Stefan Reinauer wrote:
Carl-Daniel Hailfinger wrote:
Example: We want to cache 0MB - (2G-64M-64k).
Where do the 64k come from?
That was specific to Jason's setup. IIRC the 64k were ACPI memory or somesuch.
Any reason why that shouldn't be cachable?
No idea. Maybe nonserialized concurrent accesses by multicore ACPI interpreters? (Is that even possible?)
From a memory perspective, it's just normal memory, not graphics memory or some such.
That would certainly reduce the number of MTRRs needed quite a lot.
This might even be caused by my high tables patch from recently, but it looks like a bug to me.
The tables should be cacheable as well, right?
We need a really LOUD SCREAMING (probably CRIT/EMERG) warning if we run out of MTRRs.
Regards, Carl-Daniel
Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net writes:
Assuming you don't have anything you want to cache near 4 GB (like flash): Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1). The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1. The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
Ouch.
When did it break?
I remember the subtractive mtrr setup working correctly. At least most of the time.
Eric
On Tue, Jan 27, 2009 at 9:30 PM, Eric W. Biederman ebiederm@xmission.com wrote:
Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net writes:
Assuming you don't have anything you want to cache near 4 GB (like flash):
Both strategies are equally efficient if the contiguous cacheable area has a size of 2^n+2^(n-1).
The additive strategy is more efficient if the size is 2^n+2^(n-k) and k>1.
The subtractive strategy is more efficient if the size is 2^n-2^(n-k) and k>1.
I hope that you accept this without a detailed mathematical proof. ;-)
I wonder... the subtractive strategy you describe is mentioned in mtrr.c too and the comment claims it is implemented. But it very much seems it is not.
Ouch.
When did it break?
I remember the subtractive mtrr setup working correctly. At least most of the time.
??? I'm not seeing any code in v2 capable of calculating subtractive mtrrs, is it possibly in v1? Am I missing something? I'm only looking in mtrr.c.
Thanks, Corey
Corey Osgood corey.osgood@gmail.com writes:
??? I'm not seeing any code in v2 capable of calculating subtractive mtrrs, is it possibly in v1? Am I missing something? I'm only looking in mtrr.c.
I haven't looked recently so it may be that someone took it for some bizarre reason. But the code was there and it worked. Do we have a good history or has transitioning between a dozen version control systems messed that up?
Eric
On Tue, Jan 27, 2009 at 8:53 PM, Eric W. Biederman ebiederm@xmission.com wrote:
I haven't looked recently so it may be that someone took it for some bizarre reason. But the code was there and it worked. Do we have a good history or has transitioning between a dozen version control systems messed that up?
Look at rev 700 or so, the first v2 rev in svn. If it ain't there then I don't know.
There have been lots and lots of fingers in the MTRR pie over the years. It's hard to say when this might have changed.
ron
On 28.01.2009 5:53 Uhr, Eric W. Biederman wrote:
Corey Osgood corey.osgood@gmail.com writes:
??? I'm not seeing any code in v2 capable of calculating subtractive mtrrs, is it possibly in v1? Am I missing something? I'm only looking in mtrr.c.
I haven't looked recently so it may be that someone took it for some bizarre reason. But the code was there and it worked. Do we have a good history or has transitioning between a dozen version control systems messed that up?
Eric
http://tracker.coreboot.org/trac/coreboot/browser works pretty nicely. I took a lot of care to preserve the v2 history from the very first v2 checkin. The one thing you have to take care of is that r3051 renamed LinuxBIOSv2 to coreboot-v2, but that's it.
Stefan
On 28.01.2009 05:53, Eric W. Biederman wrote:
Corey Osgood corey.osgood@gmail.com writes:
??? I'm not seeing any code in v2 capable of calculating subtractive mtrrs, is it possibly in v1? Am I missing something? I'm only looking in mtrr.c.
I haven't looked recently so it may be that someone took it for some bizarre reason. But the code was there and it worked. Do we have a good history or has transitioning between a dozen version control systems messed that up?
I looked at all changes since r2006 in src/cpu/x86/mtrr/ and src/cpu/amd/mtrr/. r3014 introduced CONFIG_VAR_MTRR_HOLE, which needs to be enabled to use the subtractive MTRR code for x86. Before that revision, that code was always enabled.
However, I get the following boot log, so I assume subtractive setup works at least in some cases:
Initializing CPU #0
CPU: vendor AMD device 50ff2
CPU: family 0f, model 5f, stepping 02
Enabling cache
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB, RdMEM, WrMEM
Setting fixed MTRRs(24-88) Type: WB, RdMEM, WrMEM
DONE fixed MTRRs
Setting variable MTRR 0, base: 0MB, range: 4096MB, type WB
ADDRESS_MASK_HIGH=0xff
Setting variable MTRR 1, base: 4096MB, range: 1024MB, type WB
ADDRESS_MASK_HIGH=0xff
Setting variable MTRR 2, base: 3072MB, range: 1024MB, type UC
ADDRESS_MASK_HIGH=0xff
DONE variable MTRRs
Clear out the extra MTRR's
call enable_var_mtrr()
Leave x86_setup_var_mtrrs
MTRR check
Fixed MTRRs : Enabled
Variable MTRRs: Enabled
CPU model AMD Athlon(tm) 64 Processor 3000+
Regards, Carl-Daniel
Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net writes:
I looked at all changes since r2006 in src/cpu/x86/mtrr/ and src/cpu/amd/mtrr/ r3014 introduced CONFIG_VAR_MTRR_HOLE which needs to be enabled to use subtractive MTRR code for x86. Before that revision, that code was always enabled.
However, I get the following boot log, so I assume subtractive setup works at least in some cases:
Good to see. For the rest of the cases I guess someone needs to take a look and see why it doesn't work.
A few comments on general MTRR policy.
An MTRR covering the ROM chip while we are executing out of the ROM is necessary for performance. After that, even when copying data, I could not measure any performance impact, so it is unnecessary.
Given that both Linux and Windows use PAT on modern systems, we want to use whatever MTRRs we can to mark as much of the RAM as possible (preferably all) as cacheable. Reserving MTRRs for the OS was a nice idea when it was specced, but with overlapping MTRRs it doesn't really help, and with PAT support in the OSes it really isn't necessary.
If the RAM is not cacheable we really should not report it to the OS because it will be painfully and horribly slow, and significantly degrade performance.
Eric