On 20/06/2017 15:59, Roman Kagan wrote:
On Tue, Jun 20, 2017 at 02:42:15PM +0200, Paolo Bonzini wrote: But does the LUN processing order matter per se? Doesn't fixing it only change which LUNs are more likely to fall out?
For a long time, only LUN0 was booted from (or in fact available from DOS). I think it's important to keep this case running as reliably as possible.
To fix both, the SCSI scans are modified to operate in two phases; LUN 0 is scanned before the REPORT LUNS command is sent. A possible future enhancement is to remember the results of the scan in a 32-byte bitmap, and skip the REPORT LUNS command during the second phase.
When multiple SCSI controllers are found, it's still possible to have a regression because they are scanned in parallel.
I think there's a simpler but more reliable solution. One can safely assume that the operating systems that use BIOS to access hard drives aren't interested in that many LUNs being visible. So we can just skip calling scsi_drive_setup() and return non-zero from the respective add_lun functions if the lun number is above some predefined number (say, 2 or 8) and it doesn't appear on the fw_cfg bootorder list, e.g.
I'm not sure about the reliability; in practice this is a corner case and there is room for many LUNs in the upper memory blocks, and ignoring some LUNs (but not all of them) may produce unexpected results on some configurations.
I'm not sure I get the point. Basically it all turns around which LUNs to ignore: originally it was "higher LUNs in lower targets win", with your patch it's "LUN 0 in all targets win", with what I propose it's "LUNs up to N and bootable ones in all targets win". Am I missing anything?
We have four policies:
1) LUN0 only
2) higher LUNs in lower targets win
3) LUN0 in all targets win
4) LUNs up to N and bootable ones in all targets win
(2) causes regressions so it's worse. (3) and (4) don't.
Because the common case is "not enough disks to trigger the issue", I think it's appealing to have no limit on LUNs (3); the disadvantage is the parallel multi-controller case.
The "and bootable ones" clause in (4) is nifty, and even "LUN0 and bootable nonzero LUNs" could be a good compromise. However, I'm not sure which higher-level management tools (OpenStack, oVirt, Virtuozzo) let you customize the boot order and how easily.
[BTW what are those unexpected results that may arise from ignoring some LUNs, other than being unable to use them in guest DOS?]
Just that if you have specified no boot order, changing the LUN number may make SeaBIOS ignore the disk. I guess it would be acceptable for "LUN0 and nonzero LUNs in the boot order" since LUN0 has long been the only bootable one.
I also kind of liked the idea of reusing the result of the LUN0 scan to speed up virtio-scsi boot. :)
Yes that's a cool trick but it wouldn't be needed without two-phase enumeration, would it? ;)
Yes, these are the commands invoked for each target:
- before your patches: INQUIRY only;
- current master: REPORT LUNS, then optionally INQUIRY;
- after patch 1/2: INQUIRY + REPORT LUNS;
- after patch 2/2: INQUIRY, then optionally REPORT LUNS.
Paolo