From: Thanos Makatos thanos@nutanix.com
The current implementation keeps probing NVMe namespaces even if an active one has been found. This is unnecessary and most importantly results in memory allocation failure if the are more than a few dozens of nmespaces. Also, not probing the remaining namespaces after having found an active one reduces boot time.
I've tested with 1024 namespaces where 512 where active and it worked fined.
Signed-off-by: Thanos Makatos thanos@nutanix.com --- src/hw/nvme.c | 51 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 12 deletions(-)
diff --git a/src/hw/nvme.c b/src/hw/nvme.c index 746e468..73c1a52 100644 --- a/src/hw/nvme.c +++ b/src/hw/nvme.c @@ -233,9 +233,22 @@ nvme_admin_identify_ns(struct nvme_ctrl *ctrl, u32 ns_id) ns_id)->ns; }
-static void +static char* +nvme_ns_desc(const struct nvme_namespace *ns, u32 ns_id) +{ + return znprintf(MAXDESCSIZE, "NVMe NS %u: %llu MiB (%llu %u-byte " + "blocks + %u-byte metadata)\n", + ns_id, (ns->lba_count * ns->block_size) >> 20, + ns->lba_count, ns->block_size, + ns->metadata_size); +} + +/* Returns a pointer to the namespace if it's usable, NULL otherwise. */ +struct nvme_namespace* nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts) { + struct nvme_namespace *ns = NULL; + u32 ns_id = ns_idx + 1;
struct nvme_identify_ns *id = nvme_admin_identify_ns(ctrl, ns_id); @@ -257,7 +270,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts) goto free_buffer; }
- struct nvme_namespace *ns = malloc_fseg(sizeof(*ns)); + ns = malloc_fseg(sizeof(*ns)); if (!ns) { warn_noalloc(); goto free_buffer; @@ -277,6 +290,7 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts) buffer size. */ warn_internalerror(); free(ns); + ns = NULL; goto free_buffer; }
@@ -296,16 +310,13 @@ nvme_probe_ns(struct nvme_ctrl *ctrl, u32 ns_idx, u8 mdts)
ns->dma_buffer = zalloc_page_aligned(&ZoneHigh, NVME_PAGE_SIZE);
- char *desc = znprintf(MAXDESCSIZE, "NVMe NS %u: %llu MiB (%llu %u-byte " - "blocks + %u-byte metadata)", - ns_id, (ns->lba_count * ns->block_size) >> 20, - ns->lba_count, ns->block_size, ns->metadata_size); - - dprintf(3, "%s\n", desc); - boot_add_hd(&ns->drive, desc, bootprio_find_pci_device(ctrl->pci)); + char *desc = nvme_ns_desc(ns, ns_id); + dprintf(3, "%s", desc); + free(desc);
free_buffer: free (id); + return ns; }
@@ -560,6 +571,13 @@ nvme_wait_csts_rdy(struct nvme_ctrl *ctrl, unsigned rdy) return 0; }
+static void +nvme_destroy_io_queues(struct nvme_ctrl *ctrl) +{ + nvme_destroy_sq(&ctrl->io_sq); + nvme_destroy_cq(&ctrl->io_cq); +} + /* Returns 0 on success. */ static int nvme_controller_enable(struct nvme_ctrl *ctrl) @@ -628,15 +646,24 @@ nvme_controller_enable(struct nvme_ctrl *ctrl) goto err_destroy_admin_sq; }
- /* Populate namespace IDs */ + /* Find first active namespace. */ int ns_idx; - for (ns_idx = 0; ns_idx < ctrl->ns_count; ns_idx++) { - nvme_probe_ns(ctrl, ns_idx, identify->mdts); + struct nvme_namespace *ns = NULL; + for (ns_idx = 0; ns_idx < ctrl->ns_count && !ns; ns_idx++) { + ns = nvme_probe_ns(ctrl, ns_idx, identify->mdts); + } + if (!ns) { + dprintf(3, "no active NVMe namespace\n"); + goto err_destroy_ioq; } + boot_add_hd(&ns->drive, nvme_ns_desc(ns, ns_idx + 1), + bootprio_find_pci_device(ctrl->pci));
dprintf(3, "NVMe initialization complete!\n"); return 0;
+ err_destroy_ioq: + nvme_destroy_io_queues(ctrl); err_destroy_admin_sq: nvme_destroy_sq(&ctrl->admin_sq); err_destroy_admin_cq:
On Thu, Jul 01, 2021 at 02:20:47PM +0000, Thanos Makatos wrote:
From: Thanos Makatos thanos@nutanix.com
The current implementation keeps probing NVMe namespaces even if an active one has been found. This is unnecessary and most importantly results in memory allocation failure if the are more than a few dozens of nmespaces. Also, not probing the remaining namespaces after having found an active one reduces boot time.
[ just back from vacation ]
Well, assuming the first namespace is the one you want actually boot from. Which I expect is the common case, but who knows how people use their machines ...
IIRC there is some work in progress to add bootorder support for namespaces, where do we stand with that? Once we have this we should be able to extend the skip_nonbootable logic to handle not only controllers but also namespaces. With that we should be able to initialize the correct boot namespaces instead of picking the first namespace.
take care, Gerd
-----Original Message----- From: Gerd Hoffmann kraxel@redhat.com Sent: 21 July 2021 15:57 To: Thanos Makatos thanos.makatos@nutanix.com Cc: seabios@seabios.org; Thanos Makatos thanos.makatos@nutanix.com Subject: Re: [PATCH] nvme: don't keep probing namespaces after an active one has been found
On Thu, Jul 01, 2021 at 02:20:47PM +0000, Thanos Makatos wrote:
From: Thanos Makatos thanos@nutanix.com
The current implementation keeps probing NVMe namespaces even if an active one has been found. This is unnecessary and most importantly results in memory allocation failure if the are more than a few dozens of nmespaces. Also, not probing the remaining namespaces after having found an active one reduces boot time.
[ just back from vacation ]
Well, assuming the first namespace is the one you want actually boot from. Which I expect is the common case, but who knows how people use their machines ...
IIRC there is some work in progress to add bootorder support for namespaces, where do we stand with that?
IIUC I got the go-ahead to implement this in QEMU, so let's see how that goes. Then we can look at the SeaBIOS patch I suppose.
Once we have this we should be able to extend the skip_nonbootable logic to handle not only controllers but also namespaces. With that we should be able to initialize the correct boot namespaces instead of picking the first namespace.
I would expect that selecting a particular namespace (which will be enabled by the QEMU patch) will be optional. Therefore, we might still have to figure out from which namespace to boot from, so we'll be in the same situation as we are now (an NVMe controller with lots of namespaces). So, I think the patch I propose is still useful?
[ just back from vacation ]
Well, assuming the first namespace is the one you want actually boot from. Which I expect is the common case, but who knows how people use their machines ...
IIRC there is some work in progress to add bootorder support for namespaces, where do we stand with that?
IIUC I got the go-ahead to implement this in QEMU, so let's see how that goes. Then we can look at the SeaBIOS patch I suppose.
Once we have this we should be able to extend the skip_nonbootable logic to handle not only controllers but also namespaces. With that we should be able to initialize the correct boot namespaces instead of picking the first namespace.
I would expect that selecting a particular namespace (which will be enabled by the QEMU patch) will be optional. Therefore, we might still have to figure out from which namespace to boot from, so we'll be in the same situation as we are now (an NVMe controller with lots of namespaces). So, I think the patch I propose is still useful?
Yes, when bootorder doesn't specify one (or more) specific namespace(s) picking the first is fine.
The patch seems to do some unneeded / unrelated changes though. Why do you move the boot_add() call? Why the desc changes? Why return the namespace? Just returning a int or bool telling whenever the namespace is active or not should be enough, no?
take care, Gerd
The patch seems to do some unneeded / unrelated changes though. Why do you move the boot_add() call? Why the desc changes? Why return the namespace? Just returning a int or bool telling whenever the namespace is active or not should be enough, no?
You're right, these changes are unnecessary, let me fix that.