On Wed, 2020-11-04 at 23:27 +0000, Graf (AWS), Alexander wrote:
> > Am 31.10.2020 um 00:49 schrieb David Woodhouse <dwmw2(a)infradead.org>:
> >
> > From: David Woodhouse <dwmw(a)amazon.co.uk>
> >
> > This ended up with an odd mix of recursion (albeit *mostly* tail-recursion)
> > and iteration that could have been prettier. Make it prettier by making
> > nvme_build_prpl() return the number of blocks it consumes, and just
> > looping.
> >
> > If nvme_build_prpl() doesn't want to play, just fall through to the
> > original loop.
> >
> > Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
>
> Please don't merge just yet. I think we found a bug with this patch,
> but need to find out where exactly.
Found both of them :)
> > @@ -725,34 +726,28 @@ nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
> > {
> > int res = DISK_RET_SUCCESS;
> > u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
> > - u16 i;
> > + u16 i, blocks;
> >
> > - /* Split up requests that are larger than the device can handle */
> > - if (op->count > ns->max_req_size) {
> > - u16 count = op->count;
> > + while (op->count && ((blocks = nvme_build_prpl(ns, op)) > 0)) {
1. If nvme_build_prpl() returns -1, that gets truncated to (u16)0xFFFF, which
   satisfies the `> 0` check, so the error path is taken as a huge transfer.
> > + res = nvme_io_readwrite(ns, op->lba, ns->prp1, blocks, write);
> > + dprintf(5, "ns %u %s lba %llu+%u: %d\n", ns->ns_id, write ? "write"
> > + : "read",
> > + op->lba, blocks, res);
> >
> > - /* Handle the first max_req_size elements */
> > - op->count = ns->max_req_size;
> > - if (nvme_cmd_readwrite(ns, op, write))
> > + if (res != DISK_RET_SUCCESS)
> > return res;
> >
> > - /* Handle the remainder of the request */
> > - op->count = count - ns->max_req_size;
> > - op->lba += ns->max_req_size;
> > - op->buf_fl += (ns->max_req_size * ns->block_size);
2. We can't do this. Callers expect op->count to contain the number of
blocks of I/O actually performed...
> > - return nvme_cmd_readwrite(ns, op, write);
> > - }
> > -
> > - if (!nvme_build_prpl(ns, op)) {
> > - /* Request goes via PRP List logic */
> > - return nvme_io_readwrite(ns, op->lba, ns->prp1, op->count, write);
> > + op->count -= blocks;
> > + op->lba += blocks;
> > + op->buf_fl += (blocks * ns->block_size);
... which means that when my cleanup made the adjustment happen
unconditionally, instead of only when the max size had been exceeded
and the function actually had to recurse, the bug started showing up
in the test runs. Commit 94f0510dc already has the problem, though.
Running another full test set now, will post the fixed patch when it
completes.