On Mon, Jun 19, 2017 at 06:21:09PM +0200, Paolo Bonzini wrote:
The introduction of REPORT LUNS caused a potential regression when many disks are present on the same SCSI target. If LUN0 is processed too late, SeaBIOS might run out of memory and fail to register it.
There are two kinds of problems:
QEMU presents the results of REPORT LUNS in the opposite order to the "-device" command-line options. It's likely that LUN 0 will be first in the command-line, and thus will be processed last by SeaBIOS.
QEMU registers all LUNs in each target first. It's thus possible that LUN 0 of target 1 is processed after many LUNs of target 0 and not be able to allocate memory.
Am I correct that this is not two problems but one: the recently added REPORT LUNS enumeration made more disks visible to SeaBIOS, so that their control structures may not all fit in the limited available memory, and their unfortunate ordering may result in the most relevant drives falling out?
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
--- a/src/hw/virtio-scsi.c +++ b/src/hw/virtio-scsi.c @@ -114,6 +114,12 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) { struct virtio_lun_s *tmpl_vlun = container_of(tmpl_drv, struct virtio_lun_s, drive); + int prio = bootprio_find_scsi_device(tmpl_vlun->pci, tmpl_vlun->target, + lun); + + if (prio < 0 && lun >= SCSI_MAX_NONBOOT_LUNS) + return -1; + struct virtio_lun_s *vlun = malloc_fseg(sizeof(*vlun)); if (!vlun) { warn_noalloc(); @@ -122,7 +128,6 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) virtio_scsi_init_lun(vlun, tmpl_vlun->pci, tmpl_vlun->vp, tmpl_vlun->vq, tmpl_vlun->target, lun);
- int prio = bootprio_find_scsi_device(vlun->pci, vlun->target, vlun->lun); int ret = scsi_drive_setup(&vlun->drive, "virtio-scsi", prio); if (ret) goto fail;
Alternatively, one can pass the lun number to scsi_drive_setup and move this logic there.
This way we keep the possibilty to boot off any available lun, but will ignore higher luns otherwise for DOS and friends.
Would it work for your case?
Roman.