While we're on the 'error handling' track, I also had an issue where 'ht_optimize_link()' continually indicated that a reset was required, causing an endless reset loop. The culprit was the code that performs an 8-bit read to get the *current* frequency of the link but doesn't mask off the error bits that share the register with the frequency bits (bits 0-3 are frequency, and bits 4-7 are error status). If any error bit is set, the value read never equals the calculated frequency, so 'needs_reset' is always set.

Once again, this was triggered by a transient error on the link (likely due to link retraining).

Here's the patch:

****START CUT****
Index: incoherent_ht.c
===================================================================
--- incoherent_ht.c     (revision 2064)
+++ incoherent_ht.c     (working copy)

@@ -198,12 +206,12 @@
        freq = log2(freq_cap1 & freq_cap2);
 
        /* See if I am changing the link freqency */
-       old_freq = pci_read_config8(dev1, pos1 + LINK_FREQ(offs1));
+       old_freq = (pci_read_config8(dev1, pos1 + LINK_FREQ(offs1)) & 0x0f); // Mask off error bits
        needs_reset |= old_freq != freq;
-       old_freq = pci_read_config8(dev2, pos2 + LINK_FREQ(offs2));
+       old_freq = (pci_read_config8(dev2, pos2 + LINK_FREQ(offs2)) & 0x0f); // Mask off error bits
        needs_reset |= old_freq != freq;
 
-       /* Set the Calulcated link frequency */
+       /* Set the Calculated link frequency */
        pci_write_config8(dev1, pos1 + LINK_FREQ(offs1), freq);
        pci_write_config8(dev2, pos2 + LINK_FREQ(offs2), freq);
***END CUT****
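
For illustration only (this is not part of the patch), here is a small standalone sketch of the comparison before and after the change. The macro and helper names are hypothetical and do not exist in incoherent_ht.c; the bit layout is assumed to match the 0x0f mask used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 8-bit link frequency/error register,
 * matching the 0x0f mask used in the patch. */
#define LINK_FREQ_MASK 0x0fu /* bits 0-3: current link frequency */
#define LINK_ERR_MASK  0xf0u /* bits 4-7: link error status */

/* Hypothetical helper: keep only the frequency field of a raw read. */
static inline uint8_t link_freq_field(uint8_t raw)
{
	return raw & LINK_FREQ_MASK;
}

/* Old behaviour: the raw register value is compared against the
 * calculated frequency, so any set error bit forces a mismatch. */
static inline bool needs_reset_unmasked(uint8_t raw, uint8_t freq)
{
	return raw != freq;
}

/* Patched behaviour: only the frequency field takes part in the
 * comparison, so error bits no longer request a reset. */
static inline bool needs_reset_masked(uint8_t raw, uint8_t freq)
{
	return link_freq_field(raw) != freq;
}
```

With a raw read of 0x15 (frequency 5 plus one error bit) and a calculated frequency of 5, the unmasked compare asks for a reset and the masked one does not. A genuine frequency change, such as a raw read of 0x14, still requests a reset in both versions.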