-----Original Message-----
From: Kevin O'Connor <kevin@koconnor.net>
Sent: 14 April 2021 16:09
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: Thanos Makatos <thanos.makatos@nutanix.com>; seabios@seabios.org; John Levon <john.levon@nutanix.com>; Swapnil Ingle <swapnil.ingle@nutanix.com>; Liu, Changpeng <changpeng.liu@intel.com>
Subject: Re: [SeaBIOS] SeaBIOS fails to boot from NVMe controller with lots of namespaces
On Thu, Apr 08, 2021 at 01:32:47PM +0200, Gerd Hoffmann wrote:
I changed the number of namespaces my controller reports to 1 and it worked fine. Is there an easy way to get around this, or do I have to fix the code? I haven't looked at the code in detail, but I think we don't have to allocate the array of namespaces in nvme_controller_enable(); instead, we could probe a namespace right before we attempt to boot from it (I'm not sure where exactly this is done).
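A minimal sketch of that idea, assuming the driver currently allocates all namespace structs up front with a single malloc_fseg() call; the loop shape and the nvme_probe_ns() signature are approximations of src/hw/nvme.c from memory, not the actual upstream code:

    /* Allocate each nvme_namespace lazily inside the probe loop so
     * that inactive namespaces cost nothing in the f-segment. */
    u32 ns_idx;
    for (ns_idx = 0; ns_idx < ctrl->ns_count; ns_idx++) {
        struct nvme_namespace *ns = malloc_fseg(sizeof(*ns));
        if (!ns) {
            warn_noalloc();
            break;
        }
        if (nvme_probe_ns(ctrl, ns, ns_idx) < 0)
            free(ns);  /* namespace inactive or unusable: return the memory */
    }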
Well, you can try skipping non-bootable namespaces and using "qemu -boot strict=on". That skipping already happens at the nvme controller level (see nvme_controller_setup()).
AFAIK this applies to the entire controller, not individual namespaces.
Current code, yes, but you can change the driver to do the same at the namespace level (similar to how virtio-scsi skips non-bootable disks) ...
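Something along these lines, as a sketch only: bootprio_find_pci_device() and is_bootprio_strict() live in src/boot.c (the latter is what the virtio-scsi skip uses, if I'm reading the code right), but there is currently no bootorder syntax for individual NVMe namespaces, so the controller's PCI priority is the finest granularity available; surrounding names approximate src/hw/nvme.c:

    /* Sketch: skip probing the namespaces of a controller that has no
     * bootindex when "-boot strict=on" is in effect, mirroring the
     * virtio-scsi per-disk check. */
    int prio = bootprio_find_pci_device(ctrl->pci);
    if (prio < 0 && is_bootprio_strict())
        return;  /* controller not in boot order: probe no namespaces */
    u32 ns_idx;
    for (ns_idx = 0; ns_idx < ctrl->ns_count; ns_idx++)
        nvme_probe_ns(ctrl, ns_idx, prio);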
An easy way out without actual code changes would be to use two nvme controllers: one for the boot disk, one for all the others. Set bootindex for the boot disk only (and use strict=on, of course); seabios should then completely ignore the second nvme controller.
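For example, a hypothetical invocation along those lines (image names and serials invented for illustration):

    qemu-system-x86_64 -boot strict=on \
        -drive if=none,id=osdisk,file=os.img,format=raw \
        -device nvme,drive=osdisk,serial=boot0,bootindex=1 \
        -drive if=none,id=datadisk,file=data.img,format=raw \
        -device nvme,drive=datadisk,serial=data0

With strict=on, only devices named in the boot order are considered, so the second controller's namespaces never become boot candidates.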
That's not an option for me, as this will be a customer VM, so we don't know on which NS the OS will be installed.
... except that it doesn't help much if you don't know which NS the OS is installed on.
In another email I said that increasing BUILD_MIN_BIOSTABLE by 8x solves the problem; is there a problem with this solution?
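The change itself is a one-liner; a sketch assuming the upstream default of 2048 bytes in src/config.h (worth verifying against your tree):

    /* src/config.h -- illustrative only, not an actual upstream diff */
    #define BUILD_MIN_BIOSTABLE (8*2048)  /* 8x the assumed 2048-byte default = 16K */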
redhat increases BUILD_MIN_BIOSTABLE too (4x only though).
I think this was discussed before, but google doesn't find me anything, so I'm not sure why BUILD_MIN_BIOSTABLE hasn't been increased upstream. Maybe it doesn't work with some configurations due to running out of address space. Should that be the case, a config option could be a way out.
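seabios already uses Kconfig for size-related knobs (CONFIG_ROM_SIZE is one), so the escape hatch could look roughly like this; the option name and help text are invented, and src/config.h would then use CONFIG_MIN_BIOSTABLE instead of the hard-coded constant:

    # src/Kconfig -- hypothetical option
    config MIN_BIOSTABLE
        int "Space to reserve in f-segment for BIOS tables (bytes)"
        default 2048
        help
            Increase this when booting from devices with many
            namespaces/LUNs so that all drive structures still fit.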
I don't recall discussing BUILD_MIN_BIOSTABLE, and there's no history of it on the mailing list.
I get the following on the current build:
    Fixed space: 0xe05b-0x10000  total: 8101  slack: 10  Percent slack: 0.1%
    16bit size:           38048
    32bit segmented size: 2292
    32bit flat size:      46540
    32bit flat init size: 84576
    Lowmem size:          2240
    f-segment var size:   1248
    ...
    Total size: 181248  Fixed: 88128  Free: 80896 (used 69.1% of 256KiB rom)
The constraints are:

1. The main constraint: the "16bit", "32bit segmented", "f-segment var", and biostable sections must fit in the f-segment (~42K today).

2. The secondary constraint is on option rom space: the above plus "32bit flat", "lowmem", any dynamic low memory allocations, and option roms must all fit in 256KiB (starting at ~90K today).

3. The third constraint: the image as a whole (including biostable space) must fit in 256KiB (~181K today).

4. Finally, there is the "corner case" where, if someone does not select RELOCATE_INIT, then everything (including dynamic memory and option roms) must fit in 256KiB.
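As a quick sanity check on the first constraint, the ~42K figure follows from the build report above:

    16bit (38048) + 32bit segmented (2292) + f-segment var (1248) = 41588 bytes (~42K)

so the 64KiB f-segment leaves roughly 23K of headroom to split between the biostable reservation and future code growth.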
So, it does seem like we have space available to increase BUILD_MIN_BIOSTABLE. The challenge is mostly in managing the risks with respect to the other failure cases.
Regarding the failure cases: will things break at build time (with BUILD_MIN_BIOSTABLE=16K), e.g.:
    [seabios] Error! ROM doesn't fit (135584 > 131072)
    [seabios] You have to either increase the size (CONFIG_ROM_SIZE)
    [seabios] or turn off some features (such as hardware support not
    [seabios] needed) to make it fit. Trying a more recent gcc version
    [seabios] might work too.
    [seabios] make: *** [out/bios.bin.prep] Error 1
Or do we expect undefined behavior at run time?
Also, according to the NVMe spec there can be 2^32 namespaces, which is a lot. I did the following tests:
    BUILD_MIN_BIOSTABLE  namespaces  works?
    16K                  128         yes
    16K                  256         no
    32K                  256         yes
    32K                  512         no
    64K                  512         guest goes into reset loop
256 namespaces is not an insanely huge number.