Hey all,
It appears that on certain hardware, or under certain conditions, HT can come up with the LinkFail and CrcError bits set on some devices, even though the bus isn't *currently* in an error state. This causes 'hypertransport_scan_chain()' to stop traversing the chain. The following patch clears the error bits and re-reads the register to determine whether the error is transient. It also reports the error rather than silently aborting the chain scan, which cost me about 6 hours of hunting:
*****BEGIN CUT*****
Index: hypertransport.c
===================================================================
--- hypertransport.c    (revision 2064)
+++ hypertransport.c    (working copy)
@@ -345,12 +345,25 @@
                 /* Wait until the link initialization is complete */
                 do {
                         ctrl = pci_read_config16(prev.dev, prev.pos + prev.ctrl_off);
-                        /* Is this the end of the hypertransport chain?
-                         * Has the link failed?
-                         * If so further scanning is pointless.
-                         */
-                        if (ctrl & ((1 << 6) | (1 << 4))) {
-                                goto end_of_chain;
+
+                        if (ctrl & (1 << 6))
+                                goto end_of_chain; // End of chain
+
+                        if (ctrl & ((1 << 4) | (1 << 8))) {
+                                /*
+                                 * Either the link has failed, or we have
+                                 * a CRC error.
+                                 * Sometimes this can happen due to link
+                                 * retrain, so lets knock it down and see
+                                 * if its transient
+                                 */
+                                ctrl |= ((1 << 6) | (1 <<8)); // Link fail + Crc
+                                pci_write_config16(prev.dev, prev.pos + prev.ctrl_off, ctrl);
+                                ctrl = pci_read_config16(prev.dev, prev.pos + prev.ctrl_off);
+                                if (ctrl & ((1 << 4) | (1 << 8))) {
+                                        printk_alert("Detected error on Hypertransport Link\n");
+                                        goto end_of_chain;
+                                }
                         }
                 } while((ctrl & (1 << 5)) == 0);
****END CUT*****
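
The clear-and-recheck idea in the patch can be tried outside firmware with a small simulation. The sketch below is only an illustration: `struct fake_link`, `fake_read()`, `fake_write()` and `check_link()` are hypothetical names, not LinuxBIOS functions, and the register model assumes the error bits are write-1-to-clear. Note also that the patch as posted ORs in `(1 << 6)`, the bit its earlier test treats as end-of-chain, while its comment says "Link fail + Crc"; the sketch follows the comment and uses bit 4, which is an assumption about the intent.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions as used in the patch (Link Control register). */
#define HT_LINK_FAIL  (1u << 4)
#define HT_INIT_DONE  (1u << 5)
#define HT_END_CHAIN  (1u << 6)
#define HT_CRC_ERROR  (1u << 8)
#define HT_ERR_MASK   (HT_LINK_FAIL | HT_CRC_ERROR)

/* Hypothetical stand-in for one device's Link Control register. */
struct fake_link {
    uint16_t ctrl;   /* current register value */
    uint16_t stuck;  /* error bits the "hardware" re-asserts after a clear */
};

enum link_status { LINK_OK, LINK_PENDING, LINK_END_OF_CHAIN, LINK_ERROR };

static uint16_t fake_read(const struct fake_link *l)
{
    return l->ctrl;
}

/* ASSUMPTION: error bits are write-1-to-clear, all other bits plain R/W. */
static void fake_write(struct fake_link *l, uint16_t val)
{
    uint16_t cleared = (uint16_t)(val & HT_ERR_MASK);
    uint16_t rw = (uint16_t)(val & ~HT_ERR_MASK);
    uint16_t kept = (uint16_t)((l->ctrl & HT_ERR_MASK) & ~cleared);

    l->ctrl = (uint16_t)(rw | kept | l->stuck);
}

/* One pass of the check that the patch performs inside its do/while loop. */
static enum link_status check_link(struct fake_link *l)
{
    uint16_t ctrl = fake_read(l);

    if (ctrl & HT_END_CHAIN)
        return LINK_END_OF_CHAIN;

    if (ctrl & HT_ERR_MASK) {
        /* Write the error bits back as 1 to clear them, then re-read. */
        fake_write(l, (uint16_t)(ctrl | HT_ERR_MASK));
        ctrl = fake_read(l);
        if (ctrl & HT_ERR_MASK) {
            /* Still set after the clear: treat as a real error. */
            fprintf(stderr, "Detected error on Hypertransport Link\n");
            return LINK_ERROR;
        }
    }

    /* The caller would keep polling until initialization completes. */
    return (ctrl & HT_INIT_DONE) ? LINK_OK : LINK_PENDING;
}
```

A link whose error bits are merely left over from a retrain comes back as `LINK_OK` after the clear, while one whose bits reappear immediately is reported as `LINK_ERROR`, so a stale flag no longer ends the scan silently.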
Hi San,
this is good stuff, it should go in the tree. If nobody objects, I'll put it in by the end of this weekend. Same for the other patch (gotta check if the last lnxi mega patch doesn't do that one already though)
Stefan
-- LinuxBIOS mailing list LinuxBIOS@openbios.org http://www.openbios.org/mailman/listinfo/linuxbios
Hey Stefan,
Thanks man.
Incidentally, does anyone have an example of how to represent 2 southbridges on an HT chain?.. ie:
CPU0 <---HT---> SB1 <---HT---> SB2
having issues figuring it out...
thanks ;)
-san
* San Mehat san@google.com [051021 21:40]:
Hey Stefan,
Incidentally, does anyone have an example of how to represent 2 southbridges on an HT chain?.. ie:
CPU0 <---HT---> SB1 <---HT---> SB2
having issues figuring it out...
On the island aruma I just enumerated them as follows. In the example below, SB1 and SB2 would each be an amd8131, listed one after the other under the same link (LDT2):
chip northbridge/amd/amdk8
        device pci 19.0 on end # LDT0
        device pci 19.0 on end # LDT1
        device pci 19.0 on     # LDT2
                chip southbridge/amd/amd8131
                        # the on/off keyword is
                        # mandatory
                        device pci 0.0 on end
                        device pci 0.1 on end
                        device pci 1.0 on end
                        device pci 1.1 on end
                end
                chip southbridge/amd/amd8131
                        # the on/off keyword is
                        # mandatory
                        device pci 0.0 on end
                        device pci 0.1 on end
                        device pci 1.0 on end
                        device pci 1.1 on end
                end
        end # LDT2
        device pci 19.1 on end
        device pci 19.2 on end
        device pci 19.3 on end
end
Stefan
Stefan Reinauer wrote:
Hi San,
this is good stuff, it should go in the tree. If nobody objects, I'll put it in by the end of this weekend. Same for the other patch (gotta check if the last lnxi mega patch doesn't do that one already though)
it looks good to me too.
We're still waiting on the 'how to commit' doc, but I think in this case approval by stefan and san constitutes approval.
ron
Great... thanks guys.