Hi Paolo & Co,
So I've been running into another bug with SeaBIOS w/ virtio-scsi +
vhost-scsi-pci code..
During the last weeks debugging on the QEMU vhost memory re-mapping bug,
my seabios builds have been using a smaller target ID (16) for scanning
than the default hard coded value of 256 in init_virtio_scsi().
After bumping this value back to 256 recently, a bogus out_num value
from vhost_get_vq_desc() start appearing on the tcm_vhost side:
[17348.044031] Invalid buffer layout out: 0 in: 2
After adding more debug output to SeaBIOS, the vring->num value appears
to be reset from the expected MAX_QUEUE_NUM=128, then incremented each
time after vring_more_used() is called to poll for completion..
init virtio-scsi
found virtio-scsi at 0:4
vring_init Setting vr->num: 128
Calling virtio_scsi_scan_target for target: 0
Searching bootorder for: /pci@i0cf8/*@4/*@0/*@0,0
virtio_scsi_cmd, using vring_add_buf for out_num: 1 in_num: 2
virtio_scsi_cmd, using vring_add_buf vr: 0x000ed404 vr->num: 128
vring_add_buf, vr: 0x000ed404 vr->num: 128
vring_add_buf, head: 0, index: 0
vring_add_buf, setting desc[prev=2].flags
vring_add_buf, setting free_head: 3
vring_add_buf, setting q->vdata[head=0]
vring_add_buf, setting q->ring[av=0]
virtio_scsi_cmd, after vring_add_buf vr: 0x000ed404 vr->num: 128
virtio_scsi_cmd, after vring_kick vr: 0x000ed404 vr->num: 128
virtio_scsi_cmd, after vring_more_used #4 vr: 0x000ed404 vr->num: 1
^^^^^^^^^^^^^^^^^^^^^^^^^
Reset from 128 to 1 immediate after first vring_more_used()..?
vring_get_buf last_used_idx: 0 vr: 0x000ed404 vr->num: 1
vring_get_buf checking &used->ring[0]
vring_get_buf, using new last_unsed: 0 >>>>>>>>>>>>>>>>>>
vring_get_buf using id: 0 from elem->id: 0x000ed004
vring_detach: vq->free_head: 3
vring_detach: Setting new vq->free_head: 0
virtio_scsi_cmd, after vring_get_buf #5 vr: 0x000ed404 vr->num: 1
Calling virtio_scsi_scan_target for target: 1
Searching bootorder for: /pci@i0cf8/*@4/*@0/*@1,0
virtio_scsi_cmd, using vring_add_buf for out_num: 1 in_num: 2
virtio_scsi_cmd, using vring_add_buf vr: 0x000ed404 vr->num: 1
vring_add_buf, vr: 0x000ed404 vr->num: 1
vring_add_buf, head: 0, index: 0
vring_add_buf, setting desc[prev=2].flags
vring_add_buf, setting free_head: 3
vring_add_buf, setting q->vdata[head=0]
vring_add_buf, setting q->ring[av=0]
virtio_scsi_cmd, after vring_add_buf vr: 0x000ed404 vr->num: 1
virtio_scsi_cmd, after vring_kick vr: 0x000ed404 vr->num: 1
virtio_scsi_cmd, after vring_more_used #4 vr: 0x000ed404 vr->num: 2
^^^^^^^^^^^^^^^^^^^^^^^^^
Incremented from 1 to 2 after second vring_more_used()..?
vring_get_buf last_used_idx: 1 vr: 0x000ed404 vr->num: 2
vring_get_buf checking &used->ring[1]
vring_get_buf, using new last_unsed: 1 >>>>>>>>>>>>>>>>>>
vring_get_buf using id: 0 from elem->id: 0x000ed00c
vring_detach: vq->free_head: 3
vring_detach: Setting new vq->free_head: 0
virtio_scsi_cmd, after vring_get_buf #5 vr: 0x000ed404 vr->num: 2
virtio-scsi vendor='LIO-ORG' product='RAMDISK-MCP' rev='4.0' type=0 removable=0
virtio_scsi_cmd, using vring_add_buf for out_num: 1 in_num: 1
virtio_scsi_cmd, using vring_add_buf vr: 0x000ed404 vr->num: 2
vring_add_buf, vr: 0x000ed404 vr->num: 2
vring_add_buf, head: 0, index: 0
vring_add_buf, setting desc[prev=1].flags
vring_add_buf, setting free_head: 2
vring_add_buf, setting q->vdata[head=0]
vring_add_buf, setting q->ring[av=0]
virtio_scsi_cmd, after vring_add_buf vr: 0x000ed404 vr->num: 2
virtio_scsi_cmd, after vring_kick vr: 0x000ed404 vr->num: 2
virtio_scsi_cmd, after vring_more_used #4 vr: 0x000ed404 vr->num: 3
^^^^^^^^^^^^^^^^^^^^^^^^^
Incremented from 2 to 3, and so on..
and this keeps happening until vr->num begins to exceed
MAX_QUEUE_NUM around scanning of target ID 127:
virtio_scsi_cmd, after vring_add_buf vr: 0x000ed404 vr->num: 128
virtio_scsi_cmd, after vring_kick vr: 0x000ed404 vr->num: 128
virtio_scsi_cmd, after vring_more_used #4 vr: 0x000ed404 vr->num: 129
vring_get_buf last_used_idx: 128 vr: 0x000ed404 vr->num: 129
vring_get_buf checking &used->ring[128]
vring_get_buf, using new last_unsed: 128 >>>>>>>>>>>>>>>>>>
vring_get_buf using id: 129 from elem->id: 0x000ed404
vring_detach: vq->free_head: 3
vring_detach: Setting new vq->free_head: 129
virtio_scsi_cmd, after vring_get_buf #5 vr: 0x000ed404 vr->num: 129
Calling virtio_scsi_scan_target for target: 127
Searching bootorder for: /pci@i0cf8/*@4/*@0/*@127,0
virtio_scsi_cmd, using vring_add_buf for out_num: 1 in_num: 2
virtio_scsi_cmd, using vring_add_buf vr: 0x000ed404 vr->num: 129
vring_add_buf, vr: 0x000ed404 vr->num: 129
vring_add_buf, head: 129, index: 0
vring_add_buf, setting desc[prev=4].flags
vring_add_buf, setting free_head: 5
vring_add_buf, setting q->vdata[head=0]
vring_add_buf, setting q->ring[av=129]
virtio_scsi_cmd, after vring_add_buf vr: 0x000ed404 vr->num: 129
virtio_scsi_cmd, after vring_kick vr: 0x000ed404 vr->num: 129
virtio_scsi_cmd, after vring_more_used #4 vr: 0x000ed404 vr->num: 130
vring_get_buf last_used_idx: 129 vr: 0x000ed404 vr->num: 130
vring_get_buf checking &used->ring[129]
vring_get_buf, using new last_unsed: 129 >>>>>>>>>>>>>>>>>>
vring_get_buf using id: 968704 from elem->id: 0x000ed40c
vring_detach: vq->free_head: 5
vring_detach: Setting new vq->free_head: 968704
virtio_scsi_cmd, after vring_get_buf #5 vr: 0x000ed404 vr->num: 130
Calling virtio_scsi_scan_target for target: 128
Searching bootorder for: /pci@i0cf8/*@4/*@0/*@128,0
virtio_scsi_cmd, using vring_add_buf for out_num: 1 in_num: 2
virtio_scsi_cmd, using vring_add_buf vr: 0x000ed404 vr->num: 130
vring_add_buf, vr: 0x000ed404 vr->num: 130
vring_add_buf, head: 51200, index: 0
vring_add_buf, setting desc[prev=1].flags
vring_add_buf, setting free_head: 2
vring_add_buf, setting q->vdata[head=0]
vring_add_buf, setting q->ring[av=51200]
until '[17348.044031] Invalid buffer layout out: 0 in: 2' appears on the
vhost.
So it's clear this definitely is not supposed to be happening, but I've
not yet been able to ascertain why it's suddenly being triggered now..?
I don't recall Asias running into a similar issue during his earlier
testing, but this bug is happening after every I/O in SeaBIOS with the
most recent vhost-scsi-pci code.
I'm using the following to work-around this issue for the moment, and
would like some input from virtio / SeaBIOS folks..
WDYT..?
--nab
diff --git a/src/virtio-ring.c b/src/virtio-ring.c
index bddb090..e046141 100644
--- a/src/virtio-ring.c
+++ b/src/virtio-ring.c
@@ -35,10 +35,16 @@
int vring_more_used(struct vring_virtqueue *vq)
{
+ struct vring *vr = &vq->vring;
struct vring_used *used = GET_LOWFLAT(vq->vring.used);
int more = GET_LOWFLAT(vq->last_used_idx) != GET_LOWFLAT(used->idx);
/* Make sure ring reads are done after idx read above. */
smp_rmb();
+
+ /* Huh..? vr->num is somehow changing from below us..? */
+ if (GET_LOWFLAT(vr->num) != MAX_QUEUE_NUM) {
+ SET_LOWFLAT(vr->num, MAX_QUEUE_NUM);
+ }
return more;
}