I've decided to take on the resource allocator. I think it can be written to work in three passes that will be clearly defined.
Here's a summary of how it works right now (as best I can tell):
There's a function called compute_allocate_resource in device/device.c which does the main part of the work.
It takes a bus, a resource, a resource-type mask, and a type.
1. It calls read bus resources on the bus.
2. In a loop, it goes through all the resources (largest to smallest) and assigns them values. It marks them assigned, checks that they don't conflict with PCI, etc.
3. As it assigns values, it increments the base address it uses.
4. After it assigns the values, it sets the resource's size.
Part of what makes this function confusing is how often it gets called.
It gets called in the device code during phase4, but it also gets called every time a PCI bus resource is "recorded." So it gets called lots of times, and thus there's a need for the have_resources flag which means "have read resources."
My proposal is to split it into distinct pieces. I've implemented most of it but it's not quite there.
1. read_resources - Just read resources. No setting PCI Config values, etc.
2. compute_resource_needs - Go through using the same algorithm as compute_allocate_resources, but don't assign the values. Just sum them up. This is also the time to propagate minimum resource alignments, etc. It is a depth-first dts traversal.
3. assign_resource_values - Go through and assign actual values to the resources, and make sure everything is reachable. This is also mostly the same as compute_allocate_resources, but it goes through breadth first instead of depth first.
4. set_resources
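To make the intended ordering concrete, here is a minimal sketch of the proposed flow. The pass names are the ones from the proposal; the stub bodies only record the call order for illustration, where a real implementation would walk the device tree in each pass.

```c
#include <string.h>

/* Records which pass ran when; illustration only. */
static char order[64];

static void read_resources(void)         { strcat(order, "read "); }
static void compute_resource_needs(void) { strcat(order, "sum "); }
static void assign_resource_values(void) { strcat(order, "assign "); }
static void set_resources(void)          { strcat(order, "set "); }

void phase4(void)
{
        read_resources();          /* 1: read once, no PCI config writes */
        compute_resource_needs();  /* 2: depth-first summing of needs */
        assign_resource_values();  /* 3: breadth-first value assignment */
        set_resources();           /* 4: write the values to hardware */
}
```

The point is simply that each pass runs exactly once over the whole tree, which is what makes have_resources unnecessary.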
I'm hoping to move the resource-specific code out of device/pci_device.c, and much of the PCI-specific code out of device/device.c. It doesn't seem like the allocation code should know or care how to disable any specific type of resource when it has size 0. Same thing for the resource code: it shouldn't fiddle with start and end values. The have_resources flag is completely gone, since resources only get read once at the beginning.
Right now I'm trying to understand the legacy PCI decoding problem. I understand that legacy PCI only decoded 10 bits, so there are aliasing problems. I just don't know how much of the address space is affected. The check in the code right now just checks to see if bits 8 and 9 (0x300) are set.
if (resource->flags & IORESOURCE_IO) {
        /* Don't allow potential aliases over the legacy PCI
         * expansion card addresses. The legacy PCI decodes
         * only 10 bits, uses 0x100 - 0x3ff. Therefore, only
         * 0x00 - 0xff can be used out of each 0x400 block of
         * I/O space.
         */
        if ((base & 0x300) != 0) {
                base = (base & ~0x3ff) + 0x400;
        }
        /* Don't allow allocations in the VGA I/O range.
         * PCI has special cases for that.
         */
        else if ((base >= 0x3b0) && (base <= 0x3df)) {
                base = 0x3e0;
        }
}
That makes it so that an allocation at 0x1100 triggers this check. Is it really the whole I/O address space that has this problem?
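The aliasing itself is easy to demonstrate: with only 10 decoded bits, every I/O port is indistinguishable from its low 10 bits, so 0x1100 lands on 0x100. A throwaway helper (illustration only, not code from the tree):

```c
/* With 10-bit legacy decoding, only the low 10 bits of an I/O
 * address are compared, so ports 0x400 apart alias each other.
 * Illustration only, not code from the tree. */
unsigned legacy_alias(unsigned port)
{
        return port & 0x3ff;    /* keep the 10 decoded bits */
}
```

So 0x1100 aliases 0x100, 0x1450 aliases 0x050, and only the 0x00-0xff part of each 0x400 block stays clear of the legacy 0x100-0x3ff window.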
The other thing I'm not sure about is prefetchable memory. I see that it has its own resource in k8, but it doesn't seem clear how it was meant to be used. For right now it's unimportant, but I really need it to work correctly later.
I'll post some logs when it's closer to being ready. I just needed to write this down before the weekend.
Thanks for listening,
Myles
Hi Myles,
Myles Watson wrote:
I've decided to take on the resource allocator. I think it can be written to work in three passes that will be clearly defined.
You beat me to it. :)
Me, Ron and Stefan had a close look at it.
Here's a summary of how it works right now (as best I can tell)
There's a function called compute_allocate_resource in device/device.c which is the main part of the function.
It's, well, let's say special.
- It calls read bus resources on the bus
- In a loop, it goes through all the resources (largest to smallest) and assigns them values. It marks them assigned, checks that they don't conflict with PCI, etc.
- As it assigns values, it increments the base address it uses.
- After it assigns the values, it sets the resource's size.
Part of what makes this function confusing is how often it gets called.
Yes.
It gets called in the device code during phase4, but it also gets called every time a PCI bus resource is "recorded." So it gets called lots of times, and thus there's a need for the have_resources flag which means "have read resources."
Yes. The code is very clever: it can resolve any kind of conflict in resources, but our conclusion was that this should not really be necessary if allocation is made a separate step. We couldn't think of a case where this behavior was really required; we believe the function was written to be completely general.
My proposal is to split it into distinct pieces. I've implemented most of it but it's not quite there.
Awesome. This was also our idea.
- read_resources - Just read resources. No setting PCI Config values, etc.
- compute_resource_needs - Go through using the same algorithm as compute_allocate_resources, but don't assign the values. Just sum them up. This is also the time to propagate minimum resource alignments, etc. It is a depth-first dts traversal.
I started thinking about how to rewrite the allocation and my idea was to recurse down to the bottom and sum on the way up. It was quite straightforward in my mind, I hope I was right. :)
Suggest name sum_resources instead.
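The recurse-and-sum idea might look something like this on a toy tree. The struct layout here is invented for illustration and is not v3's actual struct device; alignment propagation is left out to keep the sketch short.

```c
#include <stddef.h>

/* Toy tree node; layout invented for illustration, not v3's
 * actual struct device. */
struct node {
        unsigned long size;     /* resource need of this device itself */
        struct node *children;  /* first child */
        struct node *sibling;   /* next sibling */
};

/* Recurse to the leaves and sum needs on the way back up: a
 * bridge's need is its own plus the sum of its children's. */
unsigned long sum_resources(struct node *dev)
{
        unsigned long total = dev->size;
        struct node *child;

        for (child = dev->children; child; child = child->sibling)
                total += sum_resources(child);
        return total;
}
```

A bridge with two leaf children needing 0x100 and 0x200, plus 0x10 of its own, would report 0x310 total.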
- assign_resource_values - Go through and assign actual values to the resources, and make sure everything is reachable. This is also mostly the same as compute_allocate_resources, but it goes through breadth first instead of depth first.
I disagree with rolling constraint checking into this step. Either put that into 2 above (rather not) or insert 2.5 validate_resources.
- set_resources
I'm hoping to move the resource-specific code out of device/pci_device.c, and much of the PCI-specific code out of device/device.c. It doesn't seem like the allocation code should know or care how to disable any specific type of resource when it has size 0. Same thing for the resource code: it shouldn't fiddle with start and end values. The have_resources flag is completely gone, since resources only get read once at the beginning.
Lovely.
Right now I'm trying to understand the legacy PCI decoding problem. I understand that legacy PCI only decoded 10 bits, so there are aliasing problems.
What's "legacy PCI" ?
The other thing I'm not sure about is prefetchable memory. I see that it has its own resource in k8, but it doesn't seem clear how it was meant to be used. For right now it's unimportant, but I really need it to work correctly later.
I think Ron has some input on this.
I'll post some logs when it's closer to being ready. I just needed to write this down before the weekend.
Great work so far!
//Peter
Peter Stuge wrote:
- set_resources
I thought about simply making these more phases in stage 2. How do you feel about that?
I would also like us to clear up the stage 0 vs. stage 1 confusion. We have more distinct parts than we have defined parts. We should fix that now. There are already too many files in the tree.
//Peter
-----Original Message-----
From: coreboot-bounces@coreboot.org [mailto:coreboot-bounces@coreboot.org] On Behalf Of Peter Stuge
Sent: Friday, November 21, 2008 10:33 PM
To: 'Coreboot'
Subject: Re: [coreboot] Resource Allocator for v3
Peter Stuge wrote:
- set_resources
I thought about simply making these more phases in stage 2. How do you feel about that?
I guess you'll have to bring me up to speed here. Right now we have:

Phase3 - device discovery
Phase4 - resource allocation
Phase5 - resource setting
Phase6 - device enable
Right?
Are you saying to add:

Phase4a - resource reading
Phase4b - resource summing
Phase4c - resource value assigning
I'm not sure that there's enough work in each of those to justify a separate phase, especially because I'd like to make the resource summing generic enough that it never needs to be overridden. Did I get your intent right?
I would also like us to clear up the stage 0 vs. stage 1 confusion. We have more quite distinct parts than we actually have defined parts. We should fix that now. There are already too many files in the tree.
I'm not up to speed on this discussion at all.
Thanks, Myles
-----Original Message-----
From: coreboot-bounces@coreboot.org [mailto:coreboot-bounces@coreboot.org] On Behalf Of Peter Stuge
Sent: Friday, November 21, 2008 10:14 PM
To: 'Coreboot'
Subject: Re: [coreboot] Resource Allocator for v3
Hi Myles,
Myles Watson wrote:
I've decided to take on the resource allocator. I think it can be written to work in three passes that will be clearly defined.
You beat me to it. :)
Me, Ron and Stefan had a close look at it.
Here's a summary of how it works right now (as best I can tell)
There's a function called compute_allocate_resource in device/device.c which is the main part of the function.
It's, well, let's say special.
- It calls read bus resources on the bus
- In a loop, it goes through all the resources (largest to smallest) and assigns them values. It marks them assigned, checks that they don't conflict with PCI, etc.
- As it assigns values, it increments the base address it uses.
- After it assigns the values, it sets the resource's size.
Part of what makes this function confusing is how often it gets called.
Yes.
It gets called in the device code during phase4, but it also gets called every time a PCI bus resource is "recorded." So it gets called lots of times, and thus there's a need for the have_resources flag which means "have read resources."
Yes. The code is very clever: it can resolve any kind of conflict in resources, but our conclusion was that this should not really be necessary if allocation is made a separate step. We couldn't think of a case where this behavior was really required; we believe the function was written to be completely general.
My proposal is to split it into distinct pieces. I've implemented most of it but it's not quite there.
Awesome. This was also our idea.
- read_resources - Just read resources. No setting PCI Config values, etc.
- compute_resource_needs - Go through using the same algorithm as compute_allocate_resources, but don't assign the values. Just sum them up. This is also the time to propagate minimum resource alignments, etc. It is a depth-first dts traversal.
I started thinking about how to rewrite the allocation and my idea was to recurse down to the bottom and sum on the way up. It was quite straightforward in my mind, I hope I was right. :)
Yes, I think it would work this way. I think it could all get done at once, but I was hoping that it would be simpler and easier to understand in multiple passes. Later I wondered about the duplicated code...
Suggest name sum_resources instead.
That's fine with me.
- assign_resource_values - Go through and assign actual values to the resources, and make sure everything is reachable. This is also mostly the same as compute_allocate_resources, but it goes through breadth first instead of depth first.
I disagree with rolling constraint checking into this step. Either put that into 2 above (rather not) or insert 2.5 validate_resources.
If part two does its job correctly, there should be no need for checking. I guess that's a big if at first :)
- set_resources
I'm hoping to move the resource-specific code out of device/pci_device.c, and much of the PCI-specific code out of device/device.c. It doesn't seem like the allocation code should know or care how to disable any specific type of resource when it has size 0. Same thing for the resource code: it shouldn't fiddle with start and end values. The have_resources flag is completely gone, since resources only get read once at the beginning.
Lovely.
Right now I'm trying to understand the legacy PCI decoding problem. I understand that legacy PCI only decoded 10 bits, so there are aliasing problems.
What's "legacy PCI" ?
Here's the part I was referring to:
if (resource->flags & IORESOURCE_IO) {
        /* Don't allow potential aliases over the legacy PCI
         * expansion card addresses. The legacy PCI decodes
         * only 10 bits, uses 0x100 - 0x3ff. Therefore, only
         * 0x00 - 0xff can be used out of each 0x400 block of
         * I/O space.
         */
        if ((base & 0x300) != 0) {
                base = (base & ~0x3ff) + 0x400;
        }
        /* Don't allow allocations in the VGA I/O range.
         * PCI has special cases for that.
         */
        else if ((base >= 0x3b0) && (base <= 0x3df)) {
                base = 0x3e0;
        }
}
It seems like I'm misreading this and/or the check isn't broad enough. If there are really devices claiming all IO where the last 10 bits decode to 0x300, then many more ranges should be affected. For example, a size 0x200 allocation from 0x200-0x3ff doesn't trigger this check. Neither does a size 0x200 allocation from 0x1200 - 0x13ff, but a size 0x20 allocation from 0x1320-0x133f does.
My theory is that this only matters for devices that are subtractively decoding the address space. At least in the k8 case, the 0x1200-0x13ff range transactions will never conflict with the "legacy PCI" unless the device is on the southbridge. If it's anywhere else, the HT routing means that the southbridge will never see the transaction.
The other thing I'm not sure about is prefetchable memory. I see that it has its own resource in k8, but it doesn't seem clear how it was meant to be used. For right now it's unimportant, but I really need it to work correctly later.
I think Ron has some input on this.
I'll post some logs when it's closer to being ready. I just needed to write this down before the weekend.
Great work so far!
Thanks.
Regards, Myles
On Mon, Nov 24, 2008 at 7:01 AM, Myles Watson mylesgw@gmail.com wrote:
if (resource->flags & IORESOURCE_IO) {
        /* Don't allow potential aliases over the legacy PCI
         * expansion card addresses. The legacy PCI decodes
         * only 10 bits, uses 0x100 - 0x3ff. Therefore, only
         * 0x00 - 0xff can be used out of each 0x400 block of
         * I/O space.
         */
        if ((base & 0x300) != 0) {
                base = (base & ~0x3ff) + 0x400;
        }
        /* Don't allow allocations in the VGA I/O range.
         * PCI has special cases for that.
         */
        else if ((base >= 0x3b0) && (base <= 0x3df)) {
                base = 0x3e0;
        }
}
It seems like I'm misreading this and/or the check isn't broad enough. If there are really devices claiming all IO where the last 10 bits decode to 0x300, then many more ranges should be affected. For example, a size 0x200 allocation from 0x200-0x3ff doesn't trigger this check. Neither does a size 0x200 allocation from 0x1200 - 0x13ff, but a size 0x20 allocation from 0x1320-0x133f does.
I don't understand. 0x200 & 0x300 is non-zero. 0x200 will trigger it.
My theory is that this only matters for devices that are subtractively decoding the address space. At least in the k8 case, the 0x1200-0x13ff range transactions will never conflict with the "legacy PCI" unless the device is on the southbridge. If it's anywhere else, the HT routing means that the southbridge will never see the transaction.
good point.
ron
On Mon, Nov 24, 2008 at 9:15 AM, ron minnich rminnich@gmail.com wrote:
On Mon, Nov 24, 2008 at 7:01 AM, Myles Watson mylesgw@gmail.com wrote:
if (resource->flags & IORESOURCE_IO) {
        /* Don't allow potential aliases over the legacy PCI
         * expansion card addresses. The legacy PCI decodes
         * only 10 bits, uses 0x100 - 0x3ff. Therefore, only
         * 0x00 - 0xff can be used out of each 0x400 block of
         * I/O space.
         */
        if ((base & 0x300) != 0) {
                base = (base & ~0x3ff) + 0x400;
        }
        /* Don't allow allocations in the VGA I/O range.
         * PCI has special cases for that.
         */
        else if ((base >= 0x3b0) && (base <= 0x3df)) {
                base = 0x3e0;
        }
}
It seems like I'm misreading this and/or the check isn't broad enough. If there are really devices claiming all IO where the last 10 bits decode to 0x300, then many more ranges should be affected. For example, a size 0x200 allocation from 0x200-0x3ff doesn't trigger this check. Neither does a size 0x200 allocation from 0x1200-0x13ff, but a size 0x20 allocation from 0x1320-0x133f does.
I don't understand. 0x200 & 0x300 is non-zero. 0x200 will trigger it.
You're right. I didn't put that example together well. What I was trying to say was a resource allocation that covers that range but doesn't have its base address in the range. Maybe I won't mess up a simpler example:
An allocation of size 0x400 from 0x0 - 0x3ff. The allocation has several addresses within it that have problems with the legacy decode, but doesn't get caught because the base doesn't trigger the check. Same with 0x1000 - 0x13ff.
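One hedged sketch of a broader check, testing every address in the allocation rather than just the base. The name and shape are invented here; a real fix would probably use closed-form arithmetic instead of a loop.

```c
/* Return 1 if any address in [base, base + size) falls in the
 * aliased part of a 0x400 block (bits 8-9 set), i.e. would
 * collide with the legacy 0x100-0x3ff decode. Name and shape
 * invented for illustration. */
int range_has_legacy_alias(unsigned base, unsigned size)
{
        unsigned addr;

        for (addr = base; addr < base + size; addr++) {
                if ((addr & 0x300) != 0)
                        return 1;
        }
        return 0;
}
```

This flags the 0x0-0x3ff and 0x1000-0x13ff examples above, while still passing a clean 0x1000-0x10ff allocation.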
Thanks,
Myles
On Mon, Nov 24, 2008 at 8:22 AM, Myles Watson mylesgw@gmail.com wrote:
An allocation of size 0x400 from 0x0 - 0x3ff. The allocation has several addresses within it that have problems with the legacy decode, but doesn't get caught because the base doesn't trigger the check. Same with 0x1000 - 0x13ff.
yep, looks like a bug to me.
ron
On Fri, Nov 21, 2008 at 8:24 PM, Myles Watson mylesgw@gmail.com wrote:
I've decided to take on the resource allocator. I think it can be written to work in three passes that will be clearly defined.
Here's a summary of how it works right now (as best I can tell)
There's a function called compute_allocate_resource in device/device.c which is the main part of the function.
It takes a bus, a resource, a resource-type mask, and a type.
- It calls read bus resources on the bus
- In a loop, it goes through all the resources (largest to smallest) and assigns them values. It marks them assigned, checks that they don't conflict with PCI, etc.
- As it assigns values, it increments the base address it uses.
- After it assigns the values, it sets the resource's size.
Part of what makes this function confusing is how often it gets called.
It gets called in the device code during phase4, but it also gets called every time a PCI bus resource is "recorded." So it gets called lots of times, and thus there's a need for the have_resources flag which means "have read resources."
My proposal is to split it into distinct pieces. I've implemented most of it but it's not quite there.
- read_resources - Just read resources. No setting PCI Config values, etc.
- compute_resource_needs - Go through using the same algorithm as compute_allocate_resources, but don't assign the values. Just sum them up. This is also the time to propagate minimum resource alignments, etc. It is a depth-first dts traversal.
compute_resources?
- assign_resource_values - Go through and assign actual values to the resources, and make sure everything is reachable. This is also mostly the same as compute_allocate_resources, but it goes through breadth first instead of depth first.
allocate_resources?
I.e. split up compute_allocate.
compute_allocate was one pass in v1 because the chips were so simple. In v2 it was still one thing but the "mutual recursion" with read_resources got added, and you can see what happened. You are right that splitting up into two passes is a reasonable fix.
- set_resources
I'm hoping to move the resource-specific code out of device/pci_device.c, and much of the PCI-specific code out of device/device.c. It doesn't seem like the allocation code should know or care how to disable any specific type of resource when it has size 0. Same thing for the resource code: it shouldn't fiddle with start and end values. The have_resources flag is completely gone, since resources only get read once at the beginning.
Good. I never liked have_resources, but the recursion of compute_allocate_resources required it.
Right now I'm trying to understand the legacy PCI decoding problem. I understand that legacy PCI only decoded 10 bits, so there are aliasing problems. I just don't know how much of the address space is affected. The check in the code right now just checks to see if bits 8 and 9 (0x300) are set.
if (resource->flags & IORESOURCE_IO) {
        /* Don't allow potential aliases over the legacy PCI
         * expansion card addresses. The legacy PCI decodes
         * only 10 bits, uses 0x100 - 0x3ff. Therefore, only
         * 0x00 - 0xff can be used out of each 0x400 block of
         * I/O space.
         */
        if ((base & 0x300) != 0) {
                base = (base & ~0x3ff) + 0x400;
        }
        /* Don't allow allocations in the VGA I/O range.
         * PCI has special cases for that.
         */
        else if ((base >= 0x3b0) && (base <= 0x3df)) {
                base = 0x3e0;
        }
}
That makes it so that an allocation at 0x1100 triggers this check. Is it really the whole I/O address space that has this problem?
If the decoding is really only 10 bits then yes, we're stuck. I think assuming badness is always safe in this case. So leave the test in :-)
The other thing I'm not sure about is prefetchable memory. I see that it has its own resource in k8, but it doesn't seem clear how it was meant to be used. For right now it's unimportant, but I really need it to work correctly later.
I'm assuming you know what prefetchable is; in a sense it really means 'cacheable'. You have to get this right from the start or things will be slow.
ron