Andi Kleen wrote:
Please note there is a high barrier of entry for any kind of BIOS workarounds - in particular for LinuxBIOS i'm not very motivated because you guys can just fix the BIOS.
Hi Andi, just wanted to let you know that I do agree that this is a good policy in general. In terms of LinuxBIOS, now that we're starting to approach 2M nodes out in the field, fixing it is getting a wee bit harder. Again, I'm not disagreeing with the point above, just mentioning that "just fix the BIOS" is not as easy as it was when we had all the LinuxBIOS nodes in the world -- all 13 of them -- in my lab :-)
This APIC lifting thing has been a real mess, and IIRC what really pushed it originally was the Island Aruma, with its 32 PCI busses. It's amazing how PC architectures always seem to involve over-running bit-fields -- 4 bits, 6 bits, 8 bits, 10 bits, whatever.
Getting it all to work has involved lots of backtracking, as we found that fixing this problem HERE broke that legacy system THERE -- where legacy seems to mean "more than 3 weeks old". The mail traffic on the linuxbios list on this issue has been interesting, and in some cases, more than I can keep up with. Part of the issue is that we all have mutually exclusive hardware, and we keep running into hardware limitations that don't seem to be known to even the guys who make the chips. So we think we have the permanent fix, and somebody pops up to report we just broke their mainboard -- and they're the only ones with that mainboard, so testing is hard.
At the same time, we seem to be treading in territory where the fuctory BIOSes have not yet been. We're in the weird position, at times, of finding things out before the proprietary BIOSes get there.
Sometimes the ease of updating the BIOS can cause troubles you don't expect. Fuctory BIOSes seem to count on infrequent updates, forked code bases, and so on, so that you have to update each mainboard source base individually -- they have the disadvantage of a forked code base, but the one advantage is that a mod to fix one platform won't ever break another.
At some point I had understood that Linux was going to be able to function without resorting to SRAT tables -- has that changed? Is this patch really intrusive enough that it is not acceptable? The issue is that we get LinuxBIOS right on a platform, and then some new rev of the CPU comes along, and LinuxBIOS gets updated in a way that is not obviously going to cause trouble for the older stuff -- but then it does, for some other reason. I am hoping this APIC lifting will settle down in the next while, but it's been hard.
thanks
ron