attached.
ron
On 16.09.2008 01:31, ron minnich wrote:
attached.
Thanks.
That patch still has the NULL pointer dereference bug I pointed out earlier. Use the attached serengeti dts to expose the bug.
struct device dev_domain_0_pci0_18_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x18, 0x0)}}}, .device_configuration = &domain_0_pci0_18_0, .ops = &k8_ops, .sibling = &dev_domain_0_pci1_18_0, .link = { [0] = { .dev = &dev_domain_0_pci0_18_0, .link = 0, .children = &dev_domain_0_pci0_18_0_pci_0_0 }, [1] = { .dev = &dev_domain_0_pci1_18_0, .link = 0,
.link=1 would be correct.
}, [2] = { .dev = &dev_domain_0_pci2_18_0, .link = 0,
.link=2 would be correct.
.children = &dev_domain_0_pci2_18_0_pci_2_0 },
}, .links = 3, .bus = &dev_domain_0.link[0], .next = &dev_domain_0_pci1_18_0, .dtsname = "domain_0_pci0_18_0", .enabled = 1 }; struct device dev_domain_0_pci2_18_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x18, 0x0)}}}, .device_configuration = &domain_0_pci2_18_0, .ops = &k8_ops, .sibling = &dev_domain_0_ioport_2e, .links = 0, .bus = &dev_domain_0.link[0], .next = &dev_domain_0_ioport_2e, .dtsname = "domain_0_pci2_18_0", .enabled = 1 }; struct device dev_domain_0_pci2_18_0_pci_2_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x2, 0x0)}}}, .links = 0, .bus = &dev_domain_0_pci2_18_0.link[0],
.bus is invalid here because dev_domain_0_pci2_18_0 has zero link members and we use the first one here. If anyone ever accesses dev_domain_0_pci2_18_0_pci_2_0.bus->dev he will get a less than nice NULL pointer.
.next = &dev_domain_0_pci2_18_0, .dtsname = "domain_0_pci2_18_0_pci_2_0", .enabled = 1 };
Sorry, no Ack for now. I kind of dislike NULL pointer dereferences during runtime.
There are various ways to code around that problem, one of them is to make links explicit. Another one would be to kill the "helper (pci1,pci2)" structs for the domain.
Regards, Carl-Daniel
On Mon, Sep 15, 2008 at 5:04 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
On 16.09.2008 01:31, ron minnich wrote:
attached.
Thanks.
That patch still has the NULL pointer dereference bug I pointed out earlier. Use the attached serengeti dts to expose the bug.
struct device dev_domain_0_pci0_18_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x18, 0x0)}}}, .device_configuration = &domain_0_pci0_18_0, .ops = &k8_ops, .sibling = &dev_domain_0_pci1_18_0, .link = { [0] = { .dev = &dev_domain_0_pci0_18_0, .link = 0, .children = &dev_domain_0_pci0_18_0_pci_0_0 }, [1] = { .dev = &dev_domain_0_pci1_18_0, .link = 0,
.link=1 would be correct.
}, [2] = { .dev = &dev_domain_0_pci2_18_0, .link = 0,
.link=2 would be correct.
good catch, thanks.
.children = &dev_domain_0_pci2_18_0_pci_2_0 }, }, .links = 3, .bus = &dev_domain_0.link[0], .next = &dev_domain_0_pci1_18_0, .dtsname = "domain_0_pci0_18_0", .enabled = 1
}; struct device dev_domain_0_pci2_18_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x18, 0x0)}}}, .device_configuration = &domain_0_pci2_18_0, .ops = &k8_ops, .sibling = &dev_domain_0_ioport_2e, .links = 0, .bus = &dev_domain_0.link[0], .next = &dev_domain_0_ioport_2e, .dtsname = "domain_0_pci2_18_0", .enabled = 1 }; struct device dev_domain_0_pci2_18_0_pci_2_0 = { .path = {.type=DEVICE_PATH_PCI,{.pci={ .devfn = PCI_DEVFN(0x2, 0x0)}}}, .links = 0, .bus = &dev_domain_0_pci2_18_0.link[0],
.bus is invalid here because dev_domain_0_pci2_18_0 has zero link members and we use the first one here. If anyone ever accesses dev_domain_0_pci2_18_0_pci_2_0.bus->dev he will get a less than nice NULL pointer.
good point, I think there is an easy fix, coming up in a bit.
ron
try 2. This version resolves the errors Carl-Daniel found. Tested and working on simnow and dbe62.
ron
On 16.09.2008 05:01, ron minnich wrote:
try 2. This version resolves the errors Carl-Daniel found. Tested and working on simnow and dbe62.
Sorry, the NULL pointer bug exposed by my torture dts is still there, but in a slightly changed shape. And there's a new bug as well: dev_domain_0_pci2_18_0_pci_2_0 is not a child of any other device/bus anymore.
May I suggest we use my torture dts as default dts for the serengeti cheetah until multilink support is working?
Regards, Carl-Daniel
good idea, send me that dts.
thanks
ron
On 16.09.2008 05:32, ron minnich wrote:
good idea, send me that dts.
It was attached to my earlier reply. I'm attaching it again in case it was stripped by gmail.
Basically, each of my reviews of your patchset was done by running that dts through dtc and inspecting statictree.c for possible bugs. Yay for torture tests! ;-)
Oh, and we should check whether the new dts architecture supports representing multiprocessor environments with HT links between them. AFAICS that breaks for current svn HEAD and any of our proposed patches. We'd need cross-referencing abilities in the dts. And I fear that nobody will understand a dts for 8 processors in crosssed ladder configuration with 2 selected HT links out. We need graphical tools for that.
Regards, Carl-Daniel
Here we go again.
I've made Carl-Daniel's dts the default serengeti dts for now and changed dt compiler to eliminate the bug pointed out yesterday.
The dt compiler now creates the correct structures as near as I can tell. Maybe we're done with bugs for now :-)
If this passes the mailing list test, I will test it on real hardware, and commit if it all works. It is interesting to note that so far the mailing list has been more rigorous than the hardware!
thanks again
ron
On 16.09.2008 18:29, ron minnich wrote:
Here we go again.
I've made Carl-Daniel's dts the default serengeti dts for now and changed dt compiler to eliminate the bug pointed out yesterday.
Cool, thanks.
The dt compiler now creates the correct structures as near as I can tell.
I verified that the NULL pointer bugs are gone. Good work!
Maybe we're done with bugs for now :-)
I feel bad for pointing out more problems. IIRC dev->bus.dev used to point to the parent device. With the current patch, lots of devices are their own parents. That means you can't determine the device path anymore by walking upwards in in the tree and that could lead to problems matching a real device against the tree (false positives).
If this passes the mailing list test, I will test it on real hardware, and commit if it all works. It is interesting to note that so far the mailing list has been more rigorous than the hardware!
To be honest, I think the design of the patch has potential for improvement (my personal opinion only). Three points stick out: - The dummy struct device with full config (including primary link config) for each non-primary link seems like a fifth wheel. - The dual-purpose struct device for a device and its primary link will make more complex HT scenarios a real nightmare, potentially triggering a complete rewrite. - No explicit designation of link numbers for HT in the dts and usage of clever magic instead.
And no, I'm not going to force you to rewrite the patch again. At the beginning of october I'll send a patch to illustrate what I mean.
If you fix the bug with device parents and it works on real hardware, I think your patch is a good solution (maybe not to my design liking, but it should work).
By the way, a before/after bootlog for dbe62 would be appreciated. I'll read through the logs at the beginning of october to check for potential problems.
Regards, Carl-Daniel
I'll send another patch but you have to promise to check things again :-)
thanks for the reviews.
ron
On Tue, Sep 16, 2008 at 6:12 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I feel bad for pointing out more problems. IIRC dev->bus.dev used to point to the parent device. With the current patch, lots of devices are their own parents.
OK, my bad, I just fixed it and it removed 3 lines:
struct device dev_root = { .bus = &dev_root.link[0], struct device dev_cpus = { .bus = &dev_root.link[0], struct device dev_apic_0 = { .bus = &dev_root.link[0], struct device dev_domain_0 = { .bus = &dev_root.link[0], struct device dev_domain_0_pci_1_0 = { .bus = &dev_domain_0.link[0], struct device dev_domain_0_pci0_18_0 = { .bus = &dev_domain_0.link[0], struct device dev_domain_0_pci0_18_0_pci_0_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci0_18_0_pci_4_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci0_18_0_pci_5_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci1_18_0 = { .bus = &dev_domain_0_pci0_18_0.link[1], struct device dev_domain_0_pci2_18_0 = { .bus = &dev_domain_0_pci0_18_0.link[2], struct device dev_domain_0_pci2_18_0_pci_2_0 = { .bus = &dev_domain_0_pci0_18_0.link[2], struct device dev_domain_0_ioport_2e = { .bus = &dev_domain_0.link[0], struct device_operations *all_device_operations[] = { .bus = &dev_root.link[0],
better?
To be honest, I think the design of the patch has potential for improvement (my personal opinion only). Three points stick out:
- The dummy struct device with full config (including primary link
config) for each non-primary link seems like a fifth wheel.
no, it is needed for the dynamic device tree code. What's going to happen is that the bus struct for that device, although in the primary link, points to that fifth wheel. It looks odd but IIRC it does work. Let's leave it in.
- The dual-purpose struct device for a device and its primary link will
make more complex HT scenarios a real nightmare, potentially triggering a complete rewrite.
Maybe. It has indeed worked for some years, and some weird configurations.
- No explicit designation of link numbers for HT in the dts and usage of
clever magic instead.
agreed. This is going to work for now. Once k8 is up and working we may need another spin on the device code.
Testing now on both platforms ... I assume if I post an "it works" and you see this fixed parent link, we are good to go?
ron
OK, the new code properly sets parent links and also boots on both simnow and dbe62.
I am attaching my statictree.c for perusal.
thanks
ron
On 17.09.2008 05:53, ron minnich wrote:
On Tue, Sep 16, 2008 at 6:12 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I feel bad for pointing out more problems. IIRC dev->bus.dev used to point to the parent device. With the current patch, lots of devices are their own parents.
OK, my bad, I just fixed it and it removed 3 lines:
struct device dev_root = { .bus = &dev_root.link[0], struct device dev_cpus = { .bus = &dev_root.link[0], struct device dev_apic_0 = { .bus = &dev_root.link[0], struct device dev_domain_0 = { .bus = &dev_root.link[0], struct device dev_domain_0_pci_1_0 = { .bus = &dev_domain_0.link[0], struct device dev_domain_0_pci0_18_0 = { .bus = &dev_domain_0.link[0], struct device dev_domain_0_pci0_18_0_pci_0_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci0_18_0_pci_4_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci0_18_0_pci_5_0 = { .bus = &dev_domain_0_pci0_18_0.link[0], struct device dev_domain_0_pci1_18_0 = { .bus = &dev_domain_0_pci0_18_0.link[1], struct device dev_domain_0_pci2_18_0 = { .bus = &dev_domain_0_pci0_18_0.link[2], struct device dev_domain_0_pci2_18_0_pci_2_0 = { .bus = &dev_domain_0_pci0_18_0.link[2], struct device dev_domain_0_ioport_2e = { .bus = &dev_domain_0.link[0], struct device_operations *all_device_operations[] = { .bus = &dev_root.link[0],
better?
Yes. AFAICS it now matches the original code. Good.
To be honest, I think the design of the patch has potential for improvement (my personal opinion only). Three points stick out:
- The dummy struct device with full config (including primary link
config) for each non-primary link seems like a fifth wheel.
no, it is needed for the dynamic device tree code. What's going to happen is that the bus struct for that device, although in the primary link, points to that fifth wheel. It looks odd but IIRC it does work. Let's leave it in.
Hm. I won't stop you now, but I'll work on a slightly different design RFC.
- The dual-purpose struct device for a device and its primary link will
make more complex HT scenarios a real nightmare, potentially triggering a complete rewrite.
Maybe. It has indeed worked for some years, and some weird configurations.
I can't argue with that. Right now the priority is getting K8 working reasonably well. Design cleanliness can come after stuff works.
- No explicit designation of link numbers for HT in the dts and usage of
clever magic instead.
agreed. This is going to work for now. Once k8 is up and working we may need another spin on the device code.
Go ahead with your patch. We need it now.
Testing now on both platforms ... I assume if I post an "it works" and you see this fixed parent link, we are good to go?
Yes. Then it is Acked-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Regards, Carl-Daniel
Committed revision 865.
And I accidentally commited the amp support too, oh well.
I'm having a slow morning.
ron