Thanks for the response, Kevin.

I've spent another day trying to get to the bottom of this one, but still no luck.

Attempt to 16-byte align the setup packet buffer: No difference.

Errata: One point is mentioned with regard to UHCI, but its effect would be a complete disabling of the controller, not the _sort of_ working issue I see, and coreboot already has code which applies the workaround.

Registers: Nothing much interesting here either

Before:
Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 2ab USBSOF: 40 USBFLBASEAD: eaac USBPORTSC1: 1a7 USBPORTSC2: 80

After:
Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 4b3 USBSOF: 40 USBFLBASEAD: e2cc USBPORTSC1: 1a7 USBPORTSC2: 80
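For anyone wanting to check the dumps above by hand, here is a minimal sketch that decodes the interesting bits. The bit positions are the ones I read out of the UHCI spec; the function names are made up for this mail and are not SeaBIOS code.

```c
#include <stdint.h>

/* Bit positions as I read them in the UHCI spec (assumption: not
 * checked against any US15W-specific documentation). */
#define USBCMD_RS    (1u << 0)  /* Run/Stop */
#define USBCMD_CF    (1u << 6)  /* Configure flag */
#define USBCMD_MAXP  (1u << 7)  /* Max packet select, 1 = 64 bytes */
#define USBSTS_HSE   (1u << 3)  /* Host system error */
#define USBSTS_HCPE  (1u << 4)  /* Host controller process error */
#define USBSTS_HCH   (1u << 5)  /* Host controller halted */
#define PORTSC_CCS   (1u << 0)  /* Current connect status */
#define PORTSC_PE    (1u << 2)  /* Port enabled */
#define PORTSC_LSDA  (1u << 8)  /* Low-speed device attached */

/* Hypothetical helper: 1 if the schedule is running and the controller
 * reports neither a halt nor a fatal error, 0 otherwise. */
int uhci_is_running(uint16_t usbcmd, uint16_t usbsts)
{
    return (usbcmd & USBCMD_RS) &&
           !(usbsts & (USBSTS_HSE | USBSTS_HCPE | USBSTS_HCH));
}

/* Hypothetical helper: 1 if a low-speed device is connected on an
 * enabled port, 0 otherwise. */
int uhci_port_has_lowspeed_dev(uint16_t portsc)
{
    uint16_t want = PORTSC_CCS | PORTSC_PE | PORTSC_LSDA;
    return (portsc & want) == want;
}
```

With USBCMD = c1 and USBSTS = 0 the first helper returns 1 (running, not halted), and with USBPORTSC1 = 1a7 the second returns 1, which fits a low-speed keyboard on an enabled port 1. USBPORTSC2 = 80 gives 0, i.e. nothing attached there.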

The more I look, the more I find out how _OK_ everything is. I did try another experiment which had even more interesting results: I tacked some more data onto the end of the setup packet (a5 a5 a5 a5 33 33 33 33 66 66 66 66). What did it do? It made an exact copy of the packet plus the extra data immediately after the first one. It always does this regardless of how much data is being sent, and it always writes a 'mystery' 12 bytes after that.

before:

1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
1fbc44c0: 66 66 66 66 ff ff ff ff ff ff ff ff ff ff ff ff
1fbc44d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc44e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

after:

1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
1fbc44c0: 66 66 66 66 00 05 01 00 00 00 00 00 a5 a5 a5 a5
1fbc44d0: 33 33 33 33 66 66 66 66 fd 03 00 00 00 00 00 00
1fbc44e0: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff
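A rough sketch of how the before/after dumps can be compared mechanically rather than by eye. This is only an illustration; the function names are hypothetical and none of this is in SeaBIOS.

```c
#include <stddef.h>
#include <string.h>

/* Compare two equal-length dumps. Returns the length of the span from
 * the first differing byte to the last differing byte (0 if the dumps
 * are identical) and stores the offset of the first difference in
 * *first. */
size_t changed_span(const unsigned char *before, const unsigned char *after,
                    size_t len, size_t *first)
{
    size_t lo = 0, hi = len;

    while (lo < len && before[lo] == after[lo])
        lo++;
    if (lo == len)
        return 0;
    while (hi > lo && before[hi - 1] == after[hi - 1])
        hi--;
    *first = lo;
    return hi - lo;
}

/* Returns 1 if the pkt_len bytes at offset `at` in the dump are a copy
 * of the pkt_len bytes at the start of the dump, 0 otherwise. */
int is_copy_of_packet(const unsigned char *dump, size_t len,
                      size_t at, size_t pkt_len)
{
    if (at + pkt_len > len)
        return 0;
    return memcmp(dump, dump + at, pkt_len) == 0;
}
```

Run over the 64 bytes shown above (starting at 1fbc44b0), the changed span starts at offset 0x14 and is 32 bytes long: the 20-byte copy followed by the 12 'mystery' bytes.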

I think the setting of ActLen to 7ff is just the controller's way of zeroing the count when the transaction begins. Perhaps it never gets incremented because the process which is meant to fetch the memory is broken.
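To make the ActLen point concrete, here is a small sketch of how I am decoding the TD words from the dump quoted further down. The field layout is my reading of the UHCI spec (lengths are stored as n-1, so 0x7ff wraps round to zero bytes); the helper names are invented for this mail.

```c
#include <stdint.h>

/* TD field layout as I read it in the UHCI spec (assumption). */
#define TD_LEN_MASK        0x7ffu  /* ActLen / MaxLen are 11 bits wide */
#define TD_CTRL_ACTIVE_BIT 23      /* "Active" status bit in the control word */
#define TD_TOKEN_PID_MASK  0xffu   /* PID lives in token bits 7:0 */
#define TD_TOKEN_LEN_SHIFT 21      /* MaxLen lives in token bits 31:21 */

/* Lengths are encoded as n-1, with 0x7ff meaning a null (0 byte) transfer. */
unsigned td_decode_len(uint32_t field)
{
    return (field + 1) & TD_LEN_MASK;
}

/* Actual length reported in the TD control/status word. */
unsigned td_actlen(uint32_t ctrl)
{
    return td_decode_len(ctrl & TD_LEN_MASK);
}

/* Maximum length requested in the TD token word. */
unsigned td_maxlen(uint32_t token)
{
    return td_decode_len(token >> TD_TOKEN_LEN_SHIFT);
}

/* 1 if the controller still owns the TD, 0 once it has been retired. */
int td_is_active(uint32_t ctrl)
{
    return (ctrl >> TD_CTRL_ACTIVE_BIT) & 1;
}

/* Packet ID from the token word (0x2d = SETUP, 0x69 = IN). */
unsigned td_pid(uint32_t token)
{
    return token & TD_TOKEN_PID_MASK;
}
```

For the first TD after the transfer, the control word is 0x1c8007ff: td_actlen() gives 0 and td_is_active() gives 1, so the TD is still active with nothing counted. Its token 0x00e0002d decodes to a SETUP PID with a max length of 8, which is what SeaBIOS asked for.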

Current theory: Something is broken in the DMA process. I can't really see how a UHCI controller could ever do something this crazy.

Next step, assuming no one has any ideas: start examining why Linux works OK. I had a quick look for quirks but couldn't spot anything. Realistically, that is going to be a significant effort to see through.

Cheers
Matt

On Sun, Aug 5, 2012 at 4:44 PM, Kevin O'Connor <kevin@koconnor.net> wrote:
On Sun, Aug 05, 2012 at 12:21:13PM +0100, Matthew Millman wrote:
> Hi
>
> I'm seeing a rather interesting problem with UHCI on Intel US15W and
> wondered if anyone else had seen anything like this before. I noticed it
> when I plugged in a USB keyboard, which caused a crash due to something
> corrupting the stack? it turns out that the stack has been trashed by the
> UHCI controller via DMA?!
>
> When trying to transmit the 8 byte address setup packet, the hardware
> doesn't quite seem to be doing as it's told. SeaBIOS sets up the UHCI TDs
> exactly as per the spec - no problems there,
>
> Once the QH element is set, instead of transmitting the 8 bytes as
> described in the TD, it transmits a full 1023 bytes? (according to the
> returned TD) UHCI then goes ahead and overwrites another 35 bytes beyond
> the end of the buffer pointed to by the TD.
>
> Here's the 8 bytes of the setup packet (I've set everything after it to
> 0xFF):
>
> 1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
> 1fbc1fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 1fbc1fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 1fbc1fd0: ff ff ff ff ff
>
> Here it is after the UHCI controller has been at it. The only code to
> execute between these two dumps is this:
>
> pipe->qh.element = (u32)&tds[0]; (in uhci_control())
>
> 1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
> 1fbc1fa0: bf 00 05 01 00 00 00 00 00 ff ff ff fd 03 00 00
> 1fbc1fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 1fbc1fd0: ff ff ff ff ff
>
> TD Chain before:
> 1fbc4870: 84 48 bc 1f 00 00 80 1c 2d 00 e0 00 95 1f bc 1f
> 1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00
>
> TD Chain after:
> 1fbc4870: 84 48 bc 1f ff 07 80 1c 2d 00 e0 00 95 1f bc 1f
> 1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00

My read of the spec says an actlen=0x07ff means a null transfer (not
1023 bytes).  However, given that the status is still active I don't
think it really matters what's in the td.

> I'm wondering if I'm not the first person to have seen this. The problem
> (without detailed debugging) manifests itself exactly as described in
> this message:

I haven't seen this type of report before.  A couple of things you
could try: dump the USB controller registers as well (the controller
may have shutdown for a different reason), check to see if any other
transfer attempted to use 0x1fbc1fa0 in the past (perhaps the
controller has something stale cached), look for an errata for the
chipset, look through the linux code for the chipset to see if it is
working around something, try aligning the setup packet buffer to 16
bytes.

-Kevin