Stefan,
Thanks for your great work. I just tested it on AMD Olive Hill and found a few issues.
1. Your patch defaults the SPI clock to 16.5 MHz, which didn't work. None of 22, 33, or 100 MHz worked either. Only passing -p internal:spispeed="66 MHz" worked.
2. Reading is much faster than with my/Carl's patch, but some bits were read incorrectly. The speedup must come from your use of the now longer FIFO, but why is it reading bad values?