I found this problem on the AMD Geode DB800 platform, but it probably affects other platforms and chipsets as well. LinuxBIOS sets the PERR# enable and SERR# enable bits in the PCI Command Register of every PCI device. This is probably not the right thing to do, for a number of reasons.
The PCI spec is not clear on what should be done with a PERR#. It is a device, chipset, and platform design decision, and we don't know what the platform or device will do. For example, PERR# can be connected to SERR#, which can be connected to reset. Instead of just getting a parity error, you have now reset the system. Oops.
Also, PERR# is required to be reported to system software. This has to be done through the driver via an interrupt (or through polling, ewwww). As you can see, a lot of pieces need to be in place: the device has to generate the interrupt, the driver should do something with the PERR#, and then what? I don't know if the kernel does anything with the PERR# message. LinuxBIOS is no longer running at that point, so it can't do anything with the errors.
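For illustration, the polling path mentioned above might look roughly like the sketch below. It assumes the standard PCI Status register layout (config offset 0x06; Detected Parity Error is bit 15, Signaled System Error is bit 14, Master Data Parity Error is bit 8, all write-1-to-clear). Per the PCI spec, Detected Parity Error is set even when Parity Error Response is disabled in the Command register, so a driver can notice parity errors without PERR# ever being enabled. The fake config space and the cfg_read16(), cfg_write16() and poll_pci_errors() helpers are hypothetical stand-ins, not LinuxBIOS or Linux APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI Status register offset and error bits
 * (same values as Linux's pci_regs.h). */
#define PCI_STATUS                  0x06
#define PCI_STATUS_PARITY           0x0100 /* Master Data Parity Error */
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000 /* device asserted SERR# */
#define PCI_STATUS_DETECTED_PARITY  0x8000 /* device detected a parity error */

#define PCI_STATUS_ERROR_BITS \
	(PCI_STATUS_PARITY | PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

/* Hypothetical stand-in for one device's 256-byte config space so the
 * sketch runs without hardware; real code uses the platform's accessors. */
static uint8_t fake_cfg[256];

static uint16_t cfg_read16(unsigned int off)
{
	assert(off + 1 < sizeof(fake_cfg));
	return (uint16_t)(fake_cfg[off] | (fake_cfg[off + 1] << 8));
}

/* Status error bits are "write 1 to clear"; emulate that for PCI_STATUS. */
static void cfg_write16(unsigned int off, uint16_t val)
{
	assert(off + 1 < sizeof(fake_cfg));
	if (off == PCI_STATUS)
		val = cfg_read16(off) & (uint16_t)~val;
	fake_cfg[off] = (uint8_t)(val & 0xff);
	fake_cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Hypothetical driver poll: return the error bits that were set and
 * acknowledge (clear) exactly those bits. */
static uint16_t poll_pci_errors(void)
{
	uint16_t errs = cfg_read16(PCI_STATUS) & PCI_STATUS_ERROR_BITS;

	if (errs)
		cfg_write16(PCI_STATUS, errs);
	return errs;
}
```

A driver would call poll_pci_errors() periodically and decide what a non-zero result means; that decision is the policy this thread argues belongs to system software.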
Because of the uncertainty around PERR# and SERR#, I don't think many manufacturers use them, especially in the consumer space. They may still detect and fix parity errors at the hardware level or in their driver, just without using the PERR# and SERR# signals. Note that if a device does have a parity error, it will still report it in
As I noted above, SERR# can cause a system reset. I think it would be better to do nothing, even if that means the device may stop/hang/etc., rather than have the system mysteriously reset. Again, let the system software decide policy.
All this seems like a good reason to let the driver and/or system level software enable PERR# and SERR# and for LinuxBIOS to leave them alone.
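A minimal sketch of that policy on the firmware side, assuming the standard Command register bits (I/O Space bit 0, Memory Space bit 1, Bus Master bit 2, Parity Error Response bit 6, SERR# Enable bit 8): enable only what the device's resources need and pass the PERR#/SERR# enables through unchanged. enable_command() is a hypothetical helper for illustration, not the actual LinuxBIOS function.

```c
#include <assert.h>
#include <stdint.h>

/* PCI Command register bits (same values as Linux's pci_regs.h). */
#define PCI_COMMAND_IO     0x0001 /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x0002 /* respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x0004 /* allow bus mastering */
#define PCI_COMMAND_PARITY 0x0040 /* parity error response (PERR#) enable */
#define PCI_COMMAND_SERR   0x0100 /* SERR# enable */

#define COMMAND_ERROR_BITS (PCI_COMMAND_PARITY | PCI_COMMAND_SERR)

/* Hypothetical helper: Command register value to write when enabling a
 * device, starting from whatever is currently in the register. */
static uint16_t enable_command(uint16_t current, int has_io, int has_mem,
			       int bus_master)
{
	uint16_t cmd = current;

	if (has_io)
		cmd |= PCI_COMMAND_IO;
	if (has_mem)
		cmd |= PCI_COMMAND_MEMORY;
	if (bus_master)
		cmd |= PCI_COMMAND_MASTER;

	/* Deliberately no "cmd |= PCI_COMMAND_PARITY | PCI_COMMAND_SERR":
	 * error-reporting policy is left to the OS and its drivers. */
	assert((cmd & COMMAND_ERROR_BITS) == (current & COMMAND_ERROR_BITS));
	return cmd;
}
```

If a driver later wants PERR#/SERR# reporting, it sets those bits itself once it has an interrupt handler (or poller) ready to deal with the errors.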
Dissenting opinions welcome.
Marc
On 9/25/07, Marc Jones marc.jones@amd.com wrote:
All this seems like a good reason to let the driver and/or system level software enable PERR# and SERR# and for LinuxBIOS to leave them alone.
you're right. I don't even remember when those started getting set, and had not noticed it, but it's a mistake.
I'm almost inclined to say "leave that line in there commented out, with a warning: NEVER DO THIS!". That's up to you.
Acked-by: Ronald G. Minnich rminnich@gmail.com
Quoting ron minnich rminnich@gmail.com:
Marc, you're the man! This is the exact problem I am having with the Intel 82801DB. When the PCI bridge gets to "Enabling resources...", it just freezes. I have traced it back to the "command |= (PCI_COMMAND_PARITY + PCI_COMMAND_SERR); /* error check */" line. Yeah, I have done a lot of reading about parity errors, and it seems to be something OS-level software drivers use, not the BIOS. Parity error checking also seems to be a very old method. So I was starting to question why this line was even there. These bits are not set by the factory BIOS. Also, what about the pci_bus_enable_resources() function in pci_device.c? Do we want to comment out the "ctrl |= (PCI_BRIDGE_CTL_PARITY + PCI_BRIDGE_CTL_SERR); /* error check */" line as well? Anyway, nice work, Marc :-)
Acked-by: Joseph Smith joe@smittys.pointclark.net
Thanks - Joe
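The bridge line Joe asks about has the same shape. As a sketch, assuming the standard Bridge Control register layout of a PCI-to-PCI bridge (config offset 0x3E; bit 0 Parity Error Response Enable, bit 1 SERR# Enable, bit 2 ISA Enable, bit 3 VGA Enable), firmware would set only the forwarding bits the platform needs and leave the error bits as it found them. bridge_control() and its parameters are hypothetical, not the LinuxBIOS code.

```c
#include <assert.h>
#include <stdint.h>

/* PCI-to-PCI bridge Bridge Control register bits (config offset 0x3E),
 * same values as Linux's pci_regs.h. */
#define PCI_BRIDGE_CTL_PARITY 0x01 /* parity error response, secondary bus */
#define PCI_BRIDGE_CTL_SERR   0x02 /* forward secondary SERR# to primary */
#define PCI_BRIDGE_CTL_ISA    0x04 /* ISA enable */
#define PCI_BRIDGE_CTL_VGA    0x08 /* VGA enable */

#define BRIDGE_ERROR_BITS (PCI_BRIDGE_CTL_PARITY | PCI_BRIDGE_CTL_SERR)

/* Hypothetical helper: Bridge Control value to write when enabling a
 * bridge, setting only the forwarding bits the platform asked for. */
static uint16_t bridge_control(uint16_t current, int isa_enable, int vga_enable)
{
	uint16_t ctrl = current;

	if (isa_enable)
		ctrl |= PCI_BRIDGE_CTL_ISA;
	if (vga_enable)
		ctrl |= PCI_BRIDGE_CTL_VGA;

	/* No "ctrl |= PCI_BRIDGE_CTL_PARITY | PCI_BRIDGE_CTL_SERR" here:
	 * same policy as for the Command register. */
	assert((ctrl & BRIDGE_ERROR_BITS) == (current & BRIDGE_ERROR_BITS));
	return ctrl;
}
```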
ron minnich wrote:
I don't think the warning is necessary. I prefer to just remove the code. Committed as r2810.
thanks, Marc
These lines in i82801xx_pci.c need to be removed. They cause the i82801DB to reset. See this thread for more info:
http://article.gmane.org/gmane.linux.bios/26791
Signed-off-by: Joseph Smith joe@smittys.pointclark.net
Thanks - Joe
joe@smittys.pointclark.net wrote:
Acked-by: Corey Osgood corey.osgood@gmail.com
Committed revision 2816.
ron