Hey all,

On some hardware, HT can come up with the LinkFail and CrcError bits set on certain devices, even though the bus isn't *currently* in an error state. This causes hypertransport_scan_chain() to stop traversing the chain. The patch below clears the error bits and re-reads the register to determine whether the error is transient. It also reports the error instead of silently aborting the chain scan; that silent abort cost me about 6 hours of hunting:

*****BEGIN CUT*****
Index: hypertransport.c
===================================================================
--- hypertransport.c    (revision 2064)
+++ hypertransport.c    (working copy)
@@ -345,12 +345,25 @@
                /* Wait until the link initialization is complete */
                do {
                        ctrl = pci_read_config16(prev.dev, prev.pos + prev.ctrl_off);
-                       /* Is this the end of the hypertransport chain?
-                        * Has the link failed?
-                        * If so further scanning is pointless.
-                        */
-                       if (ctrl & ((1 << 6) | (1 << 4))) {
-                               goto end_of_chain;
+
+                       if (ctrl & (1 << 6))
+                               goto end_of_chain;      // End of chain
+
+                       if (ctrl & ((1 << 4) | (1 << 8))) {
+                               /*
+                                * Either the link has failed, or we have
+                                * a CRC error.
+                                * Sometimes this can happen due to link
+                                * retrain, so let's knock it down and see
+                                * if it's transient
+                                */
+                               ctrl |= ((1 << 4) | (1 << 8)); // Link fail + Crc
+                               pci_write_config16(prev.dev, prev.pos + prev.ctrl_off, ctrl);
+                               ctrl = pci_read_config16(prev.dev, prev.pos + prev.ctrl_off);
+                               if (ctrl & ((1 << 4) | (1 << 8))) {
+                                       printk_alert("Detected error on Hypertransport Link\n");
+                                       goto end_of_chain;
+                               }
                        }
                } while((ctrl & (1 << 5)) == 0);

****END CUT*****
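
For anyone who wants to poke at the clear-and-recheck logic outside the tree, here is a minimal standalone sketch. It is not the coreboot code: the register is simulated, the names (fake_link, fake_read16, fake_write16, ht_clear_and_recheck, the HT_* macros) are made up for illustration, and it assumes the LinkFail and CrcError bits are write-1-to-clear and that a real fault is reasserted by the hardware right after the clear.

```c
#include <stdint.h>

/* Link Control bits, numbered as in the patch above. */
#define HT_LINK_FAIL     (1u << 4)
#define HT_INIT_COMPLETE (1u << 5)
#define HT_END_OF_CHAIN  (1u << 6)
#define HT_CRC_ERROR     (1u << 8)
#define HT_ERR_MASK      (HT_LINK_FAIL | HT_CRC_ERROR)

/*
 * Hypothetical stand-in for the device register.
 * 'value' is what a read returns; 'sticky' holds error bits that the
 * simulated hardware sets again after every write (a persistent fault).
 */
struct fake_link {
	uint16_t value;
	uint16_t sticky;
};

static uint16_t fake_read16(const struct fake_link *l)
{
	return l->value;
}

/*
 * Assumption: error bits are write-1-to-clear. For simplicity this
 * simulation ignores writes to every other bit.
 */
static void fake_write16(struct fake_link *l, uint16_t v)
{
	l->value &= (uint16_t)~(v & HT_ERR_MASK);
	l->value |= l->sticky;
}

/*
 * Returns 0 if the link is clean or the error was transient,
 * -1 if the error is still present after one clear attempt.
 */
int ht_clear_and_recheck(struct fake_link *l)
{
	uint16_t ctrl = fake_read16(l);

	if (!(ctrl & HT_ERR_MASK))
		return 0;		/* nothing to clear */

	/* Write ones to the error bits to knock them down. */
	fake_write16(l, (uint16_t)(ctrl | HT_ERR_MASK));

	/* Re-read: bits that come straight back are a real fault. */
	ctrl = fake_read16(l);
	return (ctrl & HT_ERR_MASK) ? -1 : 0;
}
```

To try it, set up a fake_link with value = HT_INIT_COMPLETE | HT_LINK_FAIL and sticky = 0 for the transient case, or sticky = HT_LINK_FAIL for a stuck link, and check the return value of ht_clear_and_recheck(). Note that only the error bits get written back as ones; writing a one to bit 6 would mark the link as end of chain rather than clear anything.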