Hi Kevin,
I'm running into an intermittent (roughly 50% of the time) USB keyboard freeze, using the following command line:
bin/qemu-system-x86_64 -enable-kvm -m 2048 -cpu core2duo -M q35 \ -usb -device usb-kbd -device usb-mouse \ -device ide-drive,bus=ide.2,drive=HDD \ -drive id=HDD,if=none,snapshot=on,file=./Fedora-Live-Desktop-x86_64-20-1.iso \ -bios bios-test.bin
where both qemu and bios-test.bin are built from their latest currently available git masters.
The symptom is that, roughly half the time I can't navigate the grub splash screen to select an entry via the up/down arrow keys. After a few keystrokes, I start getting console errors like this:
usb-kbd: warning: key event queue full usb-kbd: warning: key event queue full usb-kbd: warning: key event queue full ...
It's not specific to grub/Fedora either, I get the same behavior with Chameleon when trying to boot OS X.
The other 50% of the time, things work just fine.
After a bisect on Seabios, I *think* the culprit is commit 4ea2fa9620ec4c0c6da53d534d6a69c8a0db27cd
I couldn't easily reverse that patch due to other churn in the code, so I wasn't able to build a seabios with everything but that commit to test for sure.
Also, one has to fire up qemu several (at least 10) times to adequately make sure the problem isn't present, so take my bisect with a grain of salt in case the content of the patch doesn't make sense with respect to the error behavior :)
Thanks, --Gabriel
On Mon, Feb 03, 2014 at 11:50:21AM -0500, Gabriel L. Somlo wrote:
Hi Kevin,
I'm running into an intermittent (roughly 50% of the time) USB keyboard freeze, using the following command line:
bin/qemu-system-x86_64 -enable-kvm -m 2048 -cpu core2duo -M q35 \ -usb -device usb-kbd -device usb-mouse \ -device ide-drive,bus=ide.2,drive=HDD \ -drive id=HDD,if=none,snapshot=on,file=./Fedora-Live-Desktop-x86_64-20-1.iso \ -bios bios-test.bin
Hi Gerd,
You're familiar with the QEMU USB code. Any idea what could be causing these intermittent failures? I can reproduce the above issue as well. It's not related to xhci at all - it seems related to this change:
--- a/src/hw/usb.c +++ b/src/hw/usb.c @@ -263,6 +263,8 @@ usb_set_address(struct usbdevice_s *usbdev) if (cntl->maxaddr >= USB_MAXADDR) return -1;
+ msleep(USB_TIME_RSTRCY); + // Create a pipe for the default address. struct usb_endpoint_descriptor epdesc = { .wMaxPacketSize = 8, @@ -272,8 +274,6 @@ usb_set_address(struct usbdevice_s *usbdev) if (!usbdev->defpipe) return -1;
- msleep(USB_TIME_RSTRCY); - // Send set_address command. struct usb_ctrlrequest req; req.bRequestType = USB_DIR_OUT | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
It seems like the QEMU UHCI code wants the RSTRCY sleep to be after the pipe is created. But, I can't see why that should be required or why QEMU would even care about a RSTRCY sleep at all.
The failure doesn't depend on kvm, the failure goes away when the above change is reverted, and the failure does not occur if there is only a usb keyboard (and not a usb mouse). When comparing the seabios debug output (level 8) of a successful run to a failed run, the output is basically identical (all the way down to thread timings and the memory positions returned by malloc). With debug level 8, I see the failure more often then not.
Any thoughts?
-Kevin
On Mon, Feb 03, 2014 at 12:46:50PM -0500, Kevin O'Connor wrote:
On Mon, Feb 03, 2014 at 11:50:21AM -0500, Gabriel L. Somlo wrote:
Hi Kevin,
I'm running into an intermittent (roughly 50% of the time) USB keyboard freeze, using the following command line:
bin/qemu-system-x86_64 -enable-kvm -m 2048 -cpu core2duo -M q35 \ -usb -device usb-kbd -device usb-mouse \ -device ide-drive,bus=ide.2,drive=HDD \ -drive id=HDD,if=none,snapshot=on,file=./Fedora-Live-Desktop-x86_64-20-1.iso \ -bios bios-test.bin
Hi Gerd,
You're familiar with the QEMU USB code. Any idea what could be causing these intermittent failures? I can reproduce the above issue as well. It's not related to xhci at all - it seems related to this change:
Interestingly, regardless of the keyboard failure, it looks like the usb mouse is never detected correctly (nor the usb hub it's attached to). The failure is more pronounced when using:
-usb -device usb-kbd -device usb-net
as in this case, one can see in the seabios logs:
|1ffaa000| device rev=0100 cls=00 sub=00 proto=00 size=8 [...] |1ffaa000| usb_hid_setup 0x1ffad2cc [...] |1ffaa000| USB keyboard initialized [...] \1ffaa000/ End thread
[...] |1ffac000| device rev=0100 cls=00 sub=00 proto=00 size=8 [...] |1ffac000| usb_hid_setup 0x1ffad2cc [...] \1ffac000/ End thread
Which indicates two USB HID devices are found when there is only one actual device. So, it seems like when SeaBIOS is trying to initialize the USB hub, it is somehow talking to the USB keyboard again.
Does QEMU keep some kind of cache of UHCI transfer descriptors that may be getting out of sync?
-Kevin
On Mon, Feb 03, 2014 at 01:59:38PM -0500, Kevin O'Connor wrote:
Which indicates two USB HID devices are found when there is only one actual device. So, it seems like when SeaBIOS is trying to initialize the USB hub, it is somehow talking to the USB keyboard again.
Does QEMU keep some kind of cache of UHCI transfer descriptors that may be getting out of sync?
I looked through the QEMU hcd-uhci.c code, and I think QEMU is buggy here. QEMU keeps a mapping of queues that are indexed by the usb device address and endpoint (see uhci_queue_new() ). When the usb device has address 0, it creates an entry in this mapping and the entry remains even after the device is given a new address. Later, when the next device also has address 0, QEMU attempts to use that mapping even though the 0 address now corresponds with a different device.
Before the move of the seabios sleep call, the seabios allocators just happened to give a different address for the queue head, and this was enough to tip QEMU off and it invalidated the mapping. Now, though, the seabios allocators just happen to give the same address for the queue head, and QEMU is happily sending the commands to the wrong device.
It seems to me that QEMU should invalidate its mappings on a set_address command, and it shouldn't have to worry about invalidating on a qh_addr change. But, I'm not really sure how to fix this.
-Kevin
Hi,
I looked through the QEMU hcd-uhci.c code, and I think QEMU is buggy here. QEMU keeps a mapping of queues that are indexed by the usb device address and endpoint (see uhci_queue_new() ). When the usb device has address 0, it creates an entry in this mapping and the entry remains even after the device is given a new address. Later, when the next device also has address 0, QEMU attempts to use that mapping even though the 0 address now corresponds with a different device.
Nice spotting. Does the attached patch help?
cheers, Gerd
On Tue, Feb 04, 2014 at 10:01:32AM +0100, Gerd Hoffmann wrote:
I looked through the QEMU hcd-uhci.c code, and I think QEMU is buggy here. QEMU keeps a mapping of queues that are indexed by the usb device address and endpoint (see uhci_queue_new() ). When the usb device has address 0, it creates an entry in this mapping and the entry remains even after the device is given a new address. Later, when the next device also has address 0, QEMU attempts to use that mapping even though the 0 address now corresponds with a different device.
Nice spotting. Does the attached patch help?
I still get the same behavior (usb keyboard hangs 50% of the time) after applying it...
Thanks, --G
From 42568e8e4812df944abcac27adefdf518ae1361e Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann kraxel@redhat.com Date: Tue, 4 Feb 2014 09:57:36 +0100 Subject: [PATCH] uhci: don't cache queues for addr 0 control transfers.
Signed-off-by: Gerd Hoffmann kraxel@redhat.com
hw/usb/hcd-uhci.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c index 238d1d2..11ebb9f 100644 --- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -253,6 +253,10 @@ static bool uhci_queue_verify(UHCIQueue *queue, uint32_t qh_addr, UHCI_TD *td, { UHCIAsync *first = QTAILQ_FIRST(&queue->asyncs);
- if (queue->ep->nr == 0 && queue->ep->dev->addr == 0 &&
queue->ep->dev->setup_state == 0 /* SETUP_STATE_IDLE */)
return false;
- return queue->qh_addr == qh_addr && queue->token == uhci_queue_token(td) && (queuing || !(td->ctrl & TD_CTRL_ACTIVE) || first == NULL ||
-- 1.8.3.1
On Tue, Feb 04, 2014 at 10:01:32AM +0100, Gerd Hoffmann wrote:
Hi,
I looked through the QEMU hcd-uhci.c code, and I think QEMU is buggy here. QEMU keeps a mapping of queues that are indexed by the usb device address and endpoint (see uhci_queue_new() ). When the usb device has address 0, it creates an entry in this mapping and the entry remains even after the device is given a new address. Later, when the next device also has address 0, QEMU attempts to use that mapping even though the 0 address now corresponds with a different device.
Nice spotting. Does the attached patch help?
Doesn't help for me either. It's not hard to reproduce for me - I just run qemu with "-usb -device usb-kbd -device usb-net" and don't run with seabios debugging directed to stdout (as that seems to add some delays which can make things work) - it locks up the keyboard most times for me.
The qemu patch below does fix it for me. I think clearing the cache on reset is probably simpler then trying to verify the cache later on. It would be even better to clear the cache on a set_address command (to cover the admittedly obscure case of an OS changing a device's address later on). But, I'm not familiar enough with the QEMU code to attempt that.
-Kevin
--- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -550,6 +552,13 @@ static void uhci_port_write(void *opaque, hwaddr addr, !(port->ctrl & UHCI_PORT_RESET) ) { usb_device_reset(dev); } + UHCIQueue *queue, *nq; + + QTAILQ_FOREACH_SAFE(queue, &s->queues, next, nq) { + if (queue->token == 0) { + uhci_queue_free(queue, "reset-port"); + } + } } port->ctrl &= UHCI_PORT_READ_ONLY; /* enabled may only be set if a device is connected */
On Tue, Feb 04, 2014 at 12:27:47PM -0500, Kevin O'Connor wrote:
The qemu patch below does fix it for me.
This fixes it for me as well.
Thanks, --G
I think clearing the cache on reset is probably simpler then trying to verify the cache later on. It would be even better to clear the cache on a set_address command (to cover the admittedly obscure case of an OS changing a device's address later on). But, I'm not familiar enough with the QEMU code to attempt that.
-Kevin
--- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -550,6 +552,13 @@ static void uhci_port_write(void *opaque, hwaddr addr, !(port->ctrl & UHCI_PORT_RESET) ) { usb_device_reset(dev); }
UHCIQueue *queue, *nq;
QTAILQ_FOREACH_SAFE(queue, &s->queues, next, nq) {
if (queue->token == 0) {
uhci_queue_free(queue, "reset-port");
}
} } port->ctrl &= UHCI_PORT_READ_ONLY; /* enabled may only be set if a device is connected */
Hi,
It would be even better to clear the cache on a set_address command (to cover the admittedly obscure case of an OS changing a device's address later on).
Additional verification check catches any address change (token address not matching current device address). That fixes the keyboard issue and should also catch the obscure address change case. And also the more likely case of the address flipping back to zero due to a port reset.
cheers, Gerd
On Wed, Feb 05, 2014 at 03:01:41PM +0100, Gerd Hoffmann wrote:
Hi,
It would be even better to clear the cache on a set_address command (to cover the admittedly obscure case of an OS changing a device's address later on).
Additional verification check catches any address change (token address not matching current device address). That fixes the keyboard issue and should also catch the obscure address change case. And also the more likely case of the address flipping back to zero due to a port reset.
Yes. That makes sense and is a better approach.
The patch fixes the issue for me.
-Kevin
On Wed, Feb 05, 2014 at 09:22:37AM -0500, Kevin O'Connor wrote:
On Wed, Feb 05, 2014 at 03:01:41PM +0100, Gerd Hoffmann wrote:
Additional verification check catches any address change (token address not matching current device address). That fixes the keyboard issue and should also catch the obscure address change case. And also the more likely case of the address flipping back to zero due to a port reset.
Yes. That makes sense and is a better approach. The patch fixes the issue for me.
Works here as well -- Thank you !
--Gabriel