The first patch fixes a regression where many disks on a single SCSI target could prevent LUN0 from booting on another target.
The second patch then exploits the new scanning algorithm to speed up the bus scan on virtio-scsi, which has a higher limit for the target than the other adaptors and therefore was slowed down the most.
Paolo Bonzini (2): scsi: ensure LUN0 is added first virtio-scsi: speed up SCSI bus scan
src/hw/blockcmd.c | 6 ++++-- src/hw/esp-scsi.c | 10 ++++++++-- src/hw/lsi-scsi.c | 10 ++++++++-- src/hw/mpt-scsi.c | 10 ++++++++-- src/hw/usb-uas.c | 1 + src/hw/virtio-scsi.c | 46 ++++++++++++++++++++++++++++++++++++++-------- 6 files changed, 67 insertions(+), 16 deletions(-)
The introduction of REPORT LUNS caused a potential regression when many disks are present on the same SCSI target. If LUN0 is processed too late, SeaBIOS might run out of memory and fail to register it.
There are two kinds of problems:
- QEMU presents the results of REPORT LUNS in the opposite order to the "-device" command-line options. It's likely that LUN 0 will be first in the command-line, and thus will be processed last by SeaBIOS.
- QEMU registers all LUNs in each target first. It's thus possible that LUN 0 of target 1 is processed after many LUNs of target 0 and not be able to allocate memory.
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com --- src/hw/blockcmd.c | 4 ++-- src/hw/esp-scsi.c | 10 ++++++++-- src/hw/lsi-scsi.c | 10 ++++++++-- src/hw/mpt-scsi.c | 10 ++++++++-- src/hw/usb-uas.c | 1 + src/hw/virtio-scsi.c | 13 +++++++++---- 6 files changed, 36 insertions(+), 12 deletions(-)
diff --git a/src/hw/blockcmd.c b/src/hw/blockcmd.c index 324188d..af9cd4d 100644 --- a/src/hw/blockcmd.c +++ b/src/hw/blockcmd.c @@ -254,7 +254,7 @@ int scsi_rep_luns_scan(struct drive_s *tmp_drive, scsi_add_lun add_lun)
for (i = 0, ret = 0; i < nluns; i++) { u64 lun = scsilun2u64(&resp->luns[i]); - if (lun >> 32) + if (lun >> 32 || !lun) continue; ret += !add_lun((u32)lun, tmp_drive); } @@ -270,7 +270,7 @@ int scsi_sequential_scan(struct drive_s *tmp_drive, u32 maxluns, int ret; u32 lun;
- for (lun = 0, ret = 0; lun < maxluns; lun++) + for (lun = 1, ret = 0; lun < maxluns; lun++) ret += !add_lun(lun, tmp_drive); return ret; } diff --git a/src/hw/esp-scsi.c b/src/hw/esp-scsi.c index 57d3832..a2f6cbc 100644 --- a/src/hw/esp-scsi.c +++ b/src/hw/esp-scsi.c @@ -194,11 +194,15 @@ fail: }
static void -esp_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target) +esp_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target, u8 report_luns) { struct esp_lun_s llun0;
esp_scsi_init_lun(&llun0, pci, iobase, target, 0); + if (!report_luns) { + esp_scsi_add_lun(0, &llun0.drive); + return; + }
scsi_rep_luns_scan(&llun0.drive, esp_scsi_add_lun); } @@ -219,7 +223,9 @@ init_esp_scsi(void *data)
int i; for (i = 0; i <= 7; i++) - esp_scsi_scan_target(pci, iobase, i); + esp_scsi_scan_target(pci, iobase, i, 0); + for (i = 0; i <= 7; i++) + esp_scsi_scan_target(pci, iobase, i, 1); }
void diff --git a/src/hw/lsi-scsi.c b/src/hw/lsi-scsi.c index 846bb0b..bf2ff76 100644 --- a/src/hw/lsi-scsi.c +++ b/src/hw/lsi-scsi.c @@ -175,11 +175,15 @@ fail: }
static void -lsi_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target) +lsi_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target, u8 report_luns) { struct lsi_lun_s llun0;
lsi_scsi_init_lun(&llun0, pci, iobase, target, 0); + if (!report_luns) { + lsi_scsi_add_lun(0, &llun0.drive); + return; + }
if (scsi_rep_luns_scan(&llun0.drive, lsi_scsi_add_lun) < 0) scsi_sequential_scan(&llun0.drive, 8, lsi_scsi_add_lun); @@ -201,7 +205,9 @@ init_lsi_scsi(void *data)
int i; for (i = 0; i < 7; i++) - lsi_scsi_scan_target(pci, iobase, i); + lsi_scsi_scan_target(pci, iobase, i, 0); + for (i = 0; i < 7; i++) + lsi_scsi_scan_target(pci, iobase, i, 1); }
void diff --git a/src/hw/mpt-scsi.c b/src/hw/mpt-scsi.c index 80c6d6b..5a412aa 100644 --- a/src/hw/mpt-scsi.c +++ b/src/hw/mpt-scsi.c @@ -237,11 +237,15 @@ fail: }
static void -mpt_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target) +mpt_scsi_scan_target(struct pci_device *pci, u32 iobase, u8 target, u8 report_luns) { struct mpt_lun_s llun0;
mpt_scsi_init_lun(&llun0, pci, iobase, target, 0); + if (!report_luns) { + mpt_scsi_add_lun(0, &llun0.drive); + return; + }
if (scsi_rep_luns_scan(&llun0.drive, mpt_scsi_add_lun) < 0) scsi_sequential_scan(&llun0.drive, 8, mpt_scsi_add_lun); @@ -295,7 +299,9 @@ init_mpt_scsi(void *data)
int i; for (i = 0; i < 7; i++) - mpt_scsi_scan_target(pci, iobase, i); + mpt_scsi_scan_target(pci, iobase, i, 0); + for (i = 0; i < 7; i++) + mpt_scsi_scan_target(pci, iobase, i, 1); }
void diff --git a/src/hw/usb-uas.c b/src/hw/usb-uas.c index f00221a..f2eb049 100644 --- a/src/hw/usb-uas.c +++ b/src/hw/usb-uas.c @@ -272,6 +272,7 @@ usb_uas_setup(struct usbdevice_s *usbdev)
struct uasdrive_s lun0; uas_init_lun(&lun0, usbdev, command, status, data_in, data_out, 0); + uas_add_lun(0, &lun0.drive); int ret = scsi_rep_luns_scan(&lun0.drive, uas_add_lun); if (ret <= 0) { dprintf(1, "Unable to configure UAS drive.\n"); diff --git a/src/hw/virtio-scsi.c b/src/hw/virtio-scsi.c index 7490ec0..3e8f13e 100644 --- a/src/hw/virtio-scsi.c +++ b/src/hw/virtio-scsi.c @@ -135,12 +135,14 @@ fail:
static int virtio_scsi_scan_target(struct pci_device *pci, struct vp_device *vp, - struct vring_virtqueue *vq, u16 target) + struct vring_virtqueue *vq, u16 target, u8 report_luns) {
struct virtio_lun_s vlun0;
virtio_scsi_init_lun(&vlun0, pci, vp, vq, target, 0); + if (!report_luns) + return !virtio_scsi_add_lun(0, &vlun0.drive);
int ret = scsi_rep_luns_scan(&vlun0.drive, virtio_scsi_add_lun); return ret < 0 ? 0 : ret; @@ -185,9 +187,12 @@ init_virtio_scsi(void *data) status |= VIRTIO_CONFIG_S_DRIVER_OK; vp_set_status(vp, status);
- int i, tot; - for (tot = 0, i = 0; i < 256; i++) - tot += virtio_scsi_scan_target(pci, vp, vq, i); + int tot = 0; + int i; + for (i = 0; i < 256; i++) + tot += virtio_scsi_scan_target(pci, vp, vq, i, 0); + for (i = 0; i < 256; i++) + tot += virtio_scsi_scan_target(pci, vp, vq, i, 1);
if (!tot) goto fail;
The introduction of REPORT LUNS caused virtio-scsi to send two SCSI commands (INQUIRY and REPORT LUNS) to each target rather than just one.
However, INQUIRY will only fail if _no_ LUN is present on that target. If all devices have LUN>0 is present, INQUIRY will report a dummy device. Therefore, the LUN0 scan can be used to prepare a compact bitmap of targets that we have to scan, and omit sending REPORT LUNS to the others.
Signed-off-by: Paolo Bonzini pbonzini@redhat.com --- src/hw/blockcmd.c | 2 ++ src/hw/virtio-scsi.c | 45 +++++++++++++++++++++++++++++++++++---------- 2 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/src/hw/blockcmd.c b/src/hw/blockcmd.c index af9cd4d..32d8d68 100644 --- a/src/hw/blockcmd.c +++ b/src/hw/blockcmd.c @@ -276,6 +276,8 @@ int scsi_sequential_scan(struct drive_s *tmp_drive, u32 maxluns, }
// Validate drive, find block size / sector count, and register drive. +// Return positive value if disk error, 0 if success, negative if +// not a disk or the disk parameters are invalid. int scsi_drive_setup(struct drive_s *drive, const char *s, int prio) { diff --git a/src/hw/virtio-scsi.c b/src/hw/virtio-scsi.c index 3e8f13e..ad09701 100644 --- a/src/hw/virtio-scsi.c +++ b/src/hw/virtio-scsi.c @@ -130,22 +130,40 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv)
fail: free(vlun); - return -1; + return ret; }
static int virtio_scsi_scan_target(struct pci_device *pci, struct vp_device *vp, - struct vring_virtqueue *vq, u16 target, u8 report_luns) + struct vring_virtqueue *vq, u16 target, u32 *targets, + u8 report_luns) {
struct virtio_lun_s vlun0;
virtio_scsi_init_lun(&vlun0, pci, vp, vq, target, 0); - if (!report_luns) - return !virtio_scsi_add_lun(0, &vlun0.drive); - - int ret = scsi_rep_luns_scan(&vlun0.drive, virtio_scsi_add_lun); - return ret < 0 ? 0 : ret; + dprintf(3, "scanning target %d, pass %d\n", target, report_luns); + if (!report_luns) { + int ret = virtio_scsi_add_lun(0, &vlun0.drive); + if (ret > 0) { + /* Target error */ + return 0; + } + /* Found a SCSI device (maybe not a disk if ret < 0). Search higher + * LUNs in the second pass. + */ + dprintf(1, "will scan target %d\n", target); + targets[target >> 5] |= 1 << (target & 31); + return !ret; + } else { + int ret = 0; + if (targets[target >> 5] & (1 << (target & 31))) { + ret = scsi_rep_luns_scan(&vlun0.drive, virtio_scsi_add_lun); + if (ret < 0) + ret = 0; + } + return ret; + } }
static void @@ -188,11 +206,18 @@ init_virtio_scsi(void *data) vp_set_status(vp, status);
int tot = 0; + u32 targets[256 / 32]; int i; + memset(targets, 0, sizeof(targets)); for (i = 0; i < 256; i++) - tot += virtio_scsi_scan_target(pci, vp, vq, i, 0); - for (i = 0; i < 256; i++) - tot += virtio_scsi_scan_target(pci, vp, vq, i, 1); + tot += virtio_scsi_scan_target(pci, vp, vq, i, targets, 0); + for (i = 0; i < 256; i++) { + if (!targets[i >> 5]) { + i |= 31; + continue; + } + tot += virtio_scsi_scan_target(pci, vp, vq, i, targets, 1); + }
if (!tot) goto fail;
On Mon, Jun 19, 2017 at 06:21:09PM +0200, Paolo Bonzini wrote:
The introduction of REPORT LUNS caused a potential regression when many disks are present on the same SCSI target. If LUN0 is processed too late, SeaBIOS might run out of memory and fail to register it.
There are two kinds of problems:
QEMU presents the results of REPORT LUNS in the opposite order to the "-device" command-line options. It's likely that LUN 0 will be first in the command-line, and thus will be processed last by SeaBIOS.
QEMU registers all LUNs in each target first. It's thus possible that LUN 0 of target 1 is processed after many LUNs of target 0 and not be able to allocate memory.
Am I correct that this is not two problems but one: the recently added REPORT LUNS enumeration made more disks visible to SeaBIOS, so that their control structures may not all fit in the limited available memory, and their unfortunate ordering may result in the most relevant drives falling out?
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
--- a/src/hw/virtio-scsi.c +++ b/src/hw/virtio-scsi.c @@ -114,6 +114,12 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) { struct virtio_lun_s *tmpl_vlun = container_of(tmpl_drv, struct virtio_lun_s, drive); + int prio = bootprio_find_scsi_device(tmpl_vlun->pci, tmpl_vlun->target, + lun); + + if (prio < 0 && lun >= SCSI_MAX_NONBOOT_LUNS) + return -1; + struct virtio_lun_s *vlun = malloc_fseg(sizeof(*vlun)); if (!vlun) { warn_noalloc(); @@ -122,7 +128,6 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) virtio_scsi_init_lun(vlun, tmpl_vlun->pci, tmpl_vlun->vp, tmpl_vlun->vq, tmpl_vlun->target, lun);
- int prio = bootprio_find_scsi_device(vlun->pci, vlun->target, vlun->lun); int ret = scsi_drive_setup(&vlun->drive, "virtio-scsi", prio); if (ret) goto fail;
Alternatively, one can pass the lun number to scsi_drive_setup and move this logic there.
This way we keep the possibilty to boot off any available lun, but will ignore higher luns otherwise for DOS and friends.
Would it work for your case?
Roman.
On 19/06/2017 23:55, Roman Kagan wrote:
On Mon, Jun 19, 2017 at 06:21:09PM +0200, Paolo Bonzini wrote:
The introduction of REPORT LUNS caused a potential regression when many disks are present on the same SCSI target. If LUN0 is processed too late, SeaBIOS might run out of memory and fail to register it.
There are two kinds of problems:
QEMU presents the results of REPORT LUNS in the opposite order to the "-device" command-line options. It's likely that LUN 0 will be first in the command-line, and thus will be processed last by SeaBIOS.
QEMU registers all LUNs in each target first. It's thus possible that LUN 0 of target 1 is processed after many LUNs of target 0 and not be able to allocate memory.
Am I correct that this is not two problems but one: the recently added REPORT LUNS enumeration made more disks visible to SeaBIOS, so that their control structures may not all fit in the limited available memory, and their unfortunate ordering may result in the most relevant drives falling out?
Yes, I listed them separately because you could fix the first by e.g. sorting the REPORT LUNS output.
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
I'm not sure about the reliability; in practice this is a corner case and there is room for many LUNs in the upper memory blocks, and ignoring some LUNs (but not all of them) may produce unexpected results on some configurations.
I also kind of liked the idea of reusing the result of the LUN0 scan to speed up virtio-scsi boot. :)
Paolo
--- a/src/hw/virtio-scsi.c +++ b/src/hw/virtio-scsi.c @@ -114,6 +114,12 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) { struct virtio_lun_s *tmpl_vlun = container_of(tmpl_drv, struct virtio_lun_s, drive);
- int prio = bootprio_find_scsi_device(tmpl_vlun->pci, tmpl_vlun->target,
lun);
- if (prio < 0 && lun >= SCSI_MAX_NONBOOT_LUNS)
return -1;
- struct virtio_lun_s *vlun = malloc_fseg(sizeof(*vlun)); if (!vlun) { warn_noalloc();
@@ -122,7 +128,6 @@ virtio_scsi_add_lun(u32 lun, struct drive_s *tmpl_drv) virtio_scsi_init_lun(vlun, tmpl_vlun->pci, tmpl_vlun->vp, tmpl_vlun->vq, tmpl_vlun->target, lun);
- int prio = bootprio_find_scsi_device(vlun->pci, vlun->target, vlun->lun); int ret = scsi_drive_setup(&vlun->drive, "virtio-scsi", prio); if (ret) goto fail;
Alternatively, one can pass the lun number to scsi_drive_setup and move this logic there.
This way we keep the possibilty to boot off any available lun, but will ignore higher luns otherwise for DOS and friends.
Would it work for your case?
Roman.
On Tue, Jun 20, 2017 at 02:42:15PM +0200, Paolo Bonzini wrote:
On 19/06/2017 23:55, Roman Kagan wrote:
On Mon, Jun 19, 2017 at 06:21:09PM +0200, Paolo Bonzini wrote:
The introduction of REPORT LUNS caused a potential regression when many disks are present on the same SCSI target. If LUN0 is processed too late, SeaBIOS might run out of memory and fail to register it.
There are two kinds of problems:
QEMU presents the results of REPORT LUNS in the opposite order to the "-device" command-line options. It's likely that LUN 0 will be first in the command-line, and thus will be processed last by SeaBIOS.
QEMU registers all LUNs in each target first. It's thus possible that LUN 0 of target 1 is processed after many LUNs of target 0 and not be able to allocate memory.
Am I correct that this is not two problems but one: the recently added REPORT LUNS enumeration made more disks visible to SeaBIOS, so that their control structures may not all fit in the limited available memory, and their unfortunate ordering may result in the most relevant drives falling out?
Yes, I listed them separately because you could fix the first by e.g. sorting the REPORT LUNS output.
But does the LUN processing order matter per se? Doesn't fixing it only change which LUNs are more likely to fall out?
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
I'm not sure about the reliability; in practice this is a corner case and there is room for many LUNs in the upper memory blocks, and ignoring some LUNs (but not all of them) may produce unexpected results on some configurations.
I'm not sure I get the point. Basically it all turns around which LUNs to ignore: originally it was "higher LUNs in lower targets win", with your patch it's "LUN 0 in all targets win", with what I propose it's "LUNs up to N and bootable ones in all targets win". Am I missing anything?
[BTW what are those unexpected results that may arise from ignoring some LUNs, other than being unable to use them in guest DOS?]
I also kind of liked the idea of reusing the result of the LUN0 scan to speed up virtio-scsi boot. :)
Yes that's a cool trick but it wouldn't be needed without two-phase enumeration, would it? ;)
Roman.
On 20/06/2017 15:59, Roman Kagan wrote:
On Tue, Jun 20, 2017 at 02:42:15PM +0200, Paolo Bonzini wrote: But does the LUN processing order matter per se? Doesn't fixing it only change which LUNs are more likely to fall out?
For a long time, only LUN0 was booted from (or in fact available from DOS). I think it's important to keep this case running as reliably as possible.
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
I'm not sure about the reliability; in practice this is a corner case and there is room for many LUNs in the upper memory blocks, and ignoring some LUNs (but not all of them) may produce unexpected results on some configurations.
I'm not sure I get the point. Basically it all turns around which LUNs to ignore: originally it was "higher LUNs in lower targets win", with your patch it's "LUN 0 in all targets win", with what I propose it's "LUNs up to N and bootable ones in all targets win". Am I missing anything?
We have four policies:
1) LUN0 only
2) higher LUNs in lower targets win
3) LUN0 in all targets win
4) LUNs up to N and bootable ones in all targets win
(2) causes regressions so it's worse. (3) and (4) don't.
Because the common case is "not enough disks to trigger the issue", I think it's appealing to have no limit on LUNs (3); the disadvantage is the parallel multi-controller case.
The "and bootable ones" clause in (4) is nifty, and even "LUN0 and bootable nonzero LUNs" could be a good compromise. However, I'm not sure which higher-level management tools (OpenStack, oVirt, Virtuozzo) let you customize the boot order and how easily.
[BTW what are those unexpected results that may arise from ignoring some LUNs, other than being unable to use them in guest DOS?]
Just that if you have specified no boot order, changing the LUN number may make SeaBIOS ignore the disk. I guess it would be acceptable for "LUN0 and nonzero LUNs in the boot order" since LUN0 has long been the only bootable one.
I also kind of liked the idea of reusing the result of the LUN0 scan to speed up virtio-scsi boot. :)
Yes that's a cool trick but it wouldn't be needed without two-phase enumeration, would it? ;)
Yes, these are the commands invoked for each target:
- before your patches: INQUIRY only;
- current master: REPORT LUNS, then optionally INQUIRY;
- after patch 1/2: INQUIRY + REPORT LUNS;
- after patch 2/2: INQUIRY, then optionally REPORT LUNS.
Paolo
On Tue, Jun 20, 2017 at 04:14:12PM +0200, Paolo Bonzini wrote:
We have four policies:
LUN0 only
higher LUNs in lower targets win
LUN0 in all targets win
LUNs up to N and bootable ones in all targets win
(2) causes regressions so it's worse. (3) and (4) don't.
Because the common case is "not enough disks to trigger the issue", I think it's appealing to have no limit on LUNs (3); the disadvantage is the parallel multi-controller case.
The "and bootable ones" clause in (4) is nifty, and even "LUN0 and bootable nonzero LUNs" could be a good compromise. However, I'm not sure which higher-level management tools (OpenStack, oVirt, Virtuozzo) let you customize the boot order and how easily.
Alas, none of these options seem appealing to me.
If virtio-scsi didn't need to allocate any space in the f-segment, does this problem go away in practice?
-Kevin
Alas, none of these options seem appealing to me.
If virtio-scsi didn't need to allocate any space in the f-segment, does this problem go away in practice?
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
Paolo
On Tue, Jun 20, 2017 at 04:05:32PM -0400, Paolo Bonzini wrote:
If virtio-scsi didn't need to allocate any space in the f-segment, does this problem go away in practice?
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool. This would have two gains - there's a cap of 16 hard drives that can be mapped so we'd be less likely to exceed the f-segment, and even if the f-segment did run out of space it would almost certainly be on a non-bootable drive.
-Kevin
On 21/06/2017 00:44, Kevin O'Connor wrote:
On Tue, Jun 20, 2017 at 04:05:32PM -0400, Paolo Bonzini wrote:
If virtio-scsi didn't need to allocate any space in the f-segment, does this problem go away in practice?
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool. This would have two gains - there's a cap of 16 hard drives that can be mapped so we'd be less likely to exceed the f-segment, and even if the f-segment did run out of space it would almost certainly be on a non-bootable drive.
Or maybe, after the first 16 hard drives, we can stop allocating from the fseg.
Paolo
On Fri, Jul 07, 2017 at 04:14:01PM +0200, Paolo Bonzini wrote:
On 21/06/2017 00:44, Kevin O'Connor wrote:
On Tue, Jun 20, 2017 at 04:05:32PM -0400, Paolo Bonzini wrote:
If virtio-scsi didn't need to allocate any space in the f-segment, does this problem go away in practice?
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool. This would have two gains - there's a cap of 16 hard drives that can be mapped so we'd be less likely to exceed the f-segment, and even if the f-segment did run out of space it would almost certainly be on a non-bootable drive.
I was hoping to be able to prototype the above, but haven't gotten a chance to yet. I think I should be able to get to it next week.
Or maybe, after the first 16 hard drives, we can stop allocating from the fseg.
The user could select any drive to boot from, so it would be hard to know which drives could be allocated in f-seg and which couldn't (at least until after the user has made the boot selection).
-Kevin
On 21/06/2017 00:44, Kevin O'Connor wrote:
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool.
I don't think this is the issue. virtqueues actually have been allocated with memalign_high since 2015 (commit 6cfebb4e).
virtio-blk is allocating 68 fseg bytes for the controller-specific data, virtio-scsi only 16. struct drive_s is about 40 bytes long.
Paolo
On Wed, Sep 27, 2017 at 09:36:01AM +0200, Paolo Bonzini wrote:
On 21/06/2017 00:44, Kevin O'Connor wrote:
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool.
I don't think this is the issue. virtqueues actually have been allocated with memalign_high since 2015 (commit 6cfebb4e).
virtio-blk is allocating 68 fseg bytes for the controller-specific data, virtio-scsi only 16. struct drive_s is about 40 bytes long.
What is the test case for reproducing the problem?
I can test with the patchset I posted earlier:
https://mail.coreboot.org/pipermail/seabios/2017-July/011452.html
which is also now at:
https://github.com/KevinOConnor/seabios/tree/work-drive-lowmem-20170711
-Kevin
On 27/09/2017 15:58, Kevin O'Connor wrote:
On Wed, Sep 27, 2017 at 09:36:01AM +0200, Paolo Bonzini wrote:
On 21/06/2017 00:44, Kevin O'Connor wrote:
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool.
I don't think this is the issue. virtqueues actually have been allocated with memalign_high since 2015 (commit 6cfebb4e).
virtio-blk is allocating 68 fseg bytes for the controller-specific data, virtio-scsi only 16. struct drive_s is about 40 bytes long.
What is the test case for reproducing the problem?
Do
for i in $(seq 1 500); do echo -drive if=none,file=null-co://,id=null$i echo -device scsi-hd,lun=$i,drive=null$i; done > manydisks for i in $(seq 1 10); do echo -drive if=none,file=null-co://,id=null$i echo -device scsi-hd,lun=$i,drive=null$i; done > fewdisks
then
qemu-kvm -device virtio-scsi-pci \ -drive if=none,file=someos.qcow2,id=os \ -device scsi-hd,lun=0,bootindex=0,drive=os $(cat manydisks)
does not respect bootindex=0 while
qemu-kvm -device virtio-scsi-pci \ -drive if=none,file=someos.qcow2,id=os \ -device scsi-hd,lun=0,bootindex=0,drive=os $(cat fewdisks)
respects it. The reason is that with "manydisks" you get many FSEG allocation failures in virtio_scsi_init_lun.
Paolo
I can test with the patchset I posted earlier:
https://mail.coreboot.org/pipermail/seabios/2017-July/011452.html
which is also now at:
https://github.com/KevinOConnor/seabios/tree/work-drive-lowmem-20170711
-Kevin
On Wed, Sep 27, 2017 at 04:15:49PM +0200, Paolo Bonzini wrote:
On 27/09/2017 15:58, Kevin O'Connor wrote:
On Wed, Sep 27, 2017 at 09:36:01AM +0200, Paolo Bonzini wrote:
On 21/06/2017 00:44, Kevin O'Connor wrote:
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool.
I don't think this is the issue. virtqueues actually have been allocated with memalign_high since 2015 (commit 6cfebb4e).
virtio-blk is allocating 68 fseg bytes for the controller-specific data, virtio-scsi only 16. struct drive_s is about 40 bytes long.
What is the test case for reproducing the problem?
[...]
respects it. The reason is that with "manydisks" you get many FSEG allocation failures in virtio_scsi_init_lun.
I can confirm it fails on SeaBIOS master, but works on the "work-drive-lowmem-20170711" branch.
I didn't want to commit that branch without confirming it fixed the underlying problem. Is this sufficient confirmation?
-Kevin
On 27/09/2017 16:35, Kevin O'Connor wrote:
On Wed, Sep 27, 2017 at 04:15:49PM +0200, Paolo Bonzini wrote:
On 27/09/2017 15:58, Kevin O'Connor wrote:
On Wed, Sep 27, 2017 at 09:36:01AM +0200, Paolo Bonzini wrote:
On 21/06/2017 00:44, Kevin O'Connor wrote:
Yes, I think so. I'm not sure why virtqueues are allocated in low memory. Either cargo culting, or a remain of when virtio was a 16-bit driver, if it ever was.
The 'struct drive_s' storage currently must be allocated in the f-segment so that the disk.c code can access some critical details of mapped drives when in 16bit mode. However, we could change the code to allocate that data separately from the controller specific data and then move the controller specific data to a larger memory pool.
I don't think this is the issue. virtqueues actually have been allocated with memalign_high since 2015 (commit 6cfebb4e).
virtio-blk is allocating 68 fseg bytes for the controller-specific data, virtio-scsi only 16. struct drive_s is about 40 bytes long.
What is the test case for reproducing the problem?
[...]
respects it. The reason is that with "manydisks" you get many FSEG allocation failures in virtio_scsi_init_lun.
I can confirm it fails on SeaBIOS master, but works on the "work-drive-lowmem-20170711" branch.
I didn't want to commit that branch without confirming it fixed the underlying problem. Is this sufficient confirmation?
Yes, I tried with more disks and it works with about 900. I guess that's fine.
Paolo