On 10/16/19 1:32 PM, Gerd Hoffmann wrote:
On Tue, Oct 01, 2019 at 10:25:00AM +0300, Denis Plotnikov wrote:
Some Linux kernels have a performance flaw in virtio block device access. On some frequent disk access patterns, e.g. a 1 MB read, the kernel produces more block requests than needed. This happens because the virtio seg_max parameter is set to 126 (virtqueue_size - 2), which limits the maximum block request to 516096 bytes (126 * 4096-byte PAGE_SIZE).
Setting seg_max > 126 fixes the issue; however, not all Linux kernels allow that. Older ones have a hardcoded restriction virtqueue_size >= seg_max, and the kernel crashes if it is violated. The restriction is relaxed in recent kernels. Windows kernels don't have such a restriction.
To increase seg_max without breaking that restriction, one can increase the virtqueue size to 256 and seg_max to 254. To do that, SeaBIOS support for a virtqueue size of 256 is needed.
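To make the arithmetic explicit, here is a minimal sketch (the names below are purely illustrative; they are not taken from QEMU, SeaBIOS or the kernel):

    #include <stdio.h>

    #define PAGE_SIZE 4096

    /* seg_max is conventionally advertised as virtqueue_size - 2 */
    static unsigned seg_max(unsigned virtqueue_size)
    {
        return virtqueue_size - 2;
    }

    int main(void)
    {
        /* each segment covers at most one page, so seg_max caps the
         * largest block request the guest will build */
        printf("queue 128 -> seg_max %u -> max request %u bytes\n",
               seg_max(128), seg_max(128) * PAGE_SIZE);   /* 126, 516096  */
        printf("queue 256 -> seg_max %u -> max request %u bytes\n",
               seg_max(256), seg_max(256) * PAGE_SIZE);   /* 254, 1040384 */
        return 0;
    }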
--verbose please. Do you talk about guest kernels? Or host kernels?
We are talking about guest kernels. Host ones are not a problem :) We could easily query them. The problem in the guest is fixed with
commit 44ed8089e991a60d614abe0ee4b9057a28b364e4
Author: Richard W.M. Jones <address@hidden>
Date:   Thu Aug 10 17:56:51 2017 +0100

    scsi: virtio: Reduce BUG if total_sg > virtqueue size to WARN.

    If using indirect descriptors, you can make the total_sg as large as
    you want. If not, BUG is too serious because the function later
    returns -ENOSPC.
Without that patch the guest crashes if the number of segments in a request exceeds the virtqueue length, even when indirect descriptors are used.
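Roughly, the guest-side check looks like this (a toy paraphrase from memory, not the actual code of drivers/virtio/virtio_ring.c):

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of virtqueue_add(): total_sg is the number of segments in
     * the request, vring_num is the virtqueue length.  Old kernels did a
     * hard BUG_ON(total_sg > vring_num) up front, even when indirect
     * descriptors would have made the request legal -> guest crash.
     * After commit 44ed8089e991 the check only warns and the request can
     * fail gracefully instead. */
    int add_request(unsigned total_sg, unsigned vring_num, bool indirect)
    {
        if (indirect)
            return 0;                  /* any total_sg fits via indirect */
        if (total_sg > vring_num) {
            fprintf(stderr, "WARN: %u segments > queue size %u\n",
                    total_sg, vring_num);
            return -1;                 /* roughly the -ENOSPC path */
        }
        return 0;
    }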
Why does changing seabios fix the kernel bug?
Changing SeaBIOS does not fix the guest kernel bug :) but it allows us to dodge it.
Linux guests submit IO requests no longer than PAGE_SIZE * max_seg (the field reported by the SCSI controller). Thus a typical 1 MB sequential read results in the following IO pattern from the guest:

    8,16   1    15754     2.766095122  2071  D   R 2095104 + 1008 [dd]
    8,16   1    15755     2.766108785  2071  D   R 2096112 + 1008 [dd]
    8,16   1    15756     2.766113486  2071  D   R 2097120 + 32 [dd]
    8,16   1    15757     2.767668961     0  C   R 2095104 + 1008 [0]
    8,16   1    15758     2.768534315     0  C   R 2096112 + 1008 [0]
    8,16   1    15759     2.768539782     0  C   R 2097120 + 32 [0]

The IO was generated by

    dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
So, in order to fix this and observe the normal pattern

    8,16   1     9921     2.662721340  2063  D   R 2095104 + 1024 [dd]
    8,16   1     9922     2.662737585  2063  D   R 2096128 + 1024 [dd]
    8,16   1     9923     2.665188167     0  C   R 2095104 + 1024 [0]
    8,16   1     9924     2.665198777     0  C   R 2096128 + 1024 [0]

we need max_segments to be not less than 128.
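The request sizes in both traces match the segment math exactly (a quick sanity check; nothing here is QEMU-specific):

    #include <stdio.h>

    int main(void)
    {
        /* blktrace reports request length in 512-byte sectors */
        printf("126 segments: %d bytes = %d sectors\n",
               126 * 4096, 126 * 4096 / 512);   /* 516096 = 1008 sectors */
        printf("128 segments: %d bytes = %d sectors\n",
               128 * 4096, 128 * 4096 / 512);   /* 524288 = 1024 sectors */
        return 0;
    }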
Though raising max_segments like this could be achieved only when the virtqueue size is > 128, due to the guest bug listed above. We gave up trying to invent any usable "guest detect" code; support of VIRTIO 1.0 is not an indicator. The bug is triggered in a lot of popular guest distros, such as RHEL 7.
We have solved the puzzle by setting the virtqueue size to 256 locally in our machine types. And here we come to the SeaBIOS assert! It is triggered once the queue size is set above 128.
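If memory serves, the limit comes from a build-time constant (MAX_QUEUE_NUM in SeaBIOS' virtio-ring header) that the virtio setup code checks the device-advertised queue size against; schematically it amounts to something like this (an illustration, not a quote of the SeaBIOS sources):

    /* illustration only -- see the real SeaBIOS virtio code */
    #define MAX_QUEUE_NUM 128    /* the vring structures are sized for this */

    int queue_size_ok(unsigned num)
    {
        /* a device advertising a queue size of 256 fails here at boot,
         * which is the "assert" mentioned above; the proposed patch
         * raises the limit to 256 so such a queue is accepted */
        return num != 0 && num <= MAX_QUEUE_NUM;
    }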
Is this just a performance issue (first paragraph sounds like it is, and that would not be that much of a problem IMO given that virtio-blk isn't tweaked for performance anyway)? Or something more serious?
The patch sent by Denis does not change any logic inside SeaBIOS; it just allows SeaBIOS to accept a longer queue, as specified in QEMU. That is all.
Hope this is verbose enough :)
Den