v1:
Non-standard logical geometries break under QEMU.
A virtual disk containing an operating system that depends on logical geometries (consistent values reported by BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has a non-standard logical geometry, for example 56 SPT (sectors per track). No matter what QEMU guesses, SeaBIOS will, for large enough disks, use LBA translation, which reports 63 SPT instead.
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads; this is an artifact of virtualization.
By supplying the logical geometries directly we are able to support such "exotic" disks.
We will use fw_cfg to do just that.
v2:
Fix missing parenthesis check in "hd-geo-test: Add tests for lchs override"
Sam Eiderman (8):
  block: Refactor macros - fix tabbing
  block: Support providing LCHS from user
  bootdevice: Add interface to gather LCHS
  scsi: Propagate unrealize() callback to scsi-hd
  bootdevice: Gather LCHS from all relevant devices
  bootdevice: Refactor get_boot_devices_list
  bootdevice: FW_CFG interface for LCHS values
  hd-geo-test: Add tests for lchs override
 bootdevice.c             | 158 ++++++++++---
 hw/block/virtio-blk.c    |   6 +
 hw/ide/qdev.c            |   7 +-
 hw/nvram/fw_cfg.c        |  14 +-
 hw/scsi/scsi-bus.c       |  15 ++
 hw/scsi/scsi-disk.c      |  14 ++
 include/hw/block/block.h |  22 +-
 include/hw/scsi/scsi.h   |   1 +
 include/sysemu/sysemu.h  |   4 +
 tests/Makefile.include   |   2 +-
 tests/hd-geo-test.c      | 565 +++++++++++++++++++++++++++++++++++++++++++++++
 11 files changed, 767 insertions(+), 41 deletions(-)
Fixing tabbing in block related macros.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 hw/ide/qdev.c            |  2 +-
 include/hw/block/block.h | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index 360cd20bd8..9cae3205df 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -285,7 +285,7 @@ static void ide_drive_realize(IDEDevice *dev, Error **errp) DEFINE_BLOCK_PROPERTIES(IDEDrive, dev.conf), \ DEFINE_BLOCK_ERROR_PROPERTIES(IDEDrive, dev.conf), \ DEFINE_PROP_STRING("ver", IDEDrive, dev.version), \ - DEFINE_PROP_UINT64("wwn", IDEDrive, dev.wwn, 0), \ + DEFINE_PROP_UINT64("wwn", IDEDrive, dev.wwn, 0), \ DEFINE_PROP_STRING("serial", IDEDrive, dev.serial),\ DEFINE_PROP_STRING("model", IDEDrive, dev.model)
diff --git a/include/hw/block/block.h b/include/hw/block/block.h index 607539057a..fd55a30bca 100644 --- a/include/hw/block/block.h +++ b/include/hw/block/block.h @@ -50,21 +50,21 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) _conf.logical_block_size), \ DEFINE_PROP_BLOCKSIZE("physical_block_size", _state, \ _conf.physical_block_size), \ - DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0), \ + DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0), \ DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0), \ - DEFINE_PROP_UINT32("discard_granularity", _state, \ - _conf.discard_granularity, -1), \ - DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce, \ - ON_OFF_AUTO_AUTO), \ + DEFINE_PROP_UINT32("discard_granularity", _state, \ + _conf.discard_granularity, -1), \ + DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce, \ + ON_OFF_AUTO_AUTO), \ DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false)
#define DEFINE_BLOCK_PROPERTIES(_state, _conf) \ DEFINE_PROP_DRIVE("drive", _state, _conf.blk), \ DEFINE_BLOCK_PROPERTIES_BASE(_state, _conf)
-#define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf) \ - DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0), \ - DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0), \ +#define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf) \ + DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0), \ + DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0), \ DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0)
#define DEFINE_BLOCK_ERROR_PROPERTIES(_state, _conf) \
Add logical geometry variables to BlockConf.
A user can now supply "lcyls", "lheads" & "lsecs" for any HD device that supports CHS ("cyls", "heads", "secs").
These devices include:
* ide-hd
* scsi-hd
* virtio-blk-pci
In future commits we will use the provided LCHS and pass it to the BIOS through fw_cfg to be supplied using INT13 routines.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 include/hw/block/block.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/hw/block/block.h b/include/hw/block/block.h index fd55a30bca..d7246f3862 100644 --- a/include/hw/block/block.h +++ b/include/hw/block/block.h @@ -26,6 +26,7 @@ typedef struct BlockConf { uint32_t discard_granularity; /* geometry, not all devices use this */ uint32_t cyls, heads, secs; + uint32_t lcyls, lheads, lsecs; OnOffAuto wce; bool share_rw; BlockdevOnError rerror; @@ -65,7 +66,10 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) #define DEFINE_BLOCK_CHS_PROPERTIES(_state, _conf) \ DEFINE_PROP_UINT32("cyls", _state, _conf.cyls, 0), \ DEFINE_PROP_UINT32("heads", _state, _conf.heads, 0), \ - DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0) + DEFINE_PROP_UINT32("secs", _state, _conf.secs, 0), \ + DEFINE_PROP_UINT32("lcyls", _state, _conf.lcyls, 0), \ + DEFINE_PROP_UINT32("lheads", _state, _conf.lheads, 0), \ + DEFINE_PROP_UINT32("lsecs", _state, _conf.lsecs, 0)
#define DEFINE_BLOCK_ERROR_PROPERTIES(_state, _conf) \ DEFINE_PROP_BLOCKDEV_ON_ERROR("rerror", _state, _conf.rerror, \
Add an interface to provide direct logical CHS values for boot devices. We will use this interface in the next commits.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 bootdevice.c            | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/sysemu.h |  3 +++
 2 files changed, 58 insertions(+)
diff --git a/bootdevice.c b/bootdevice.c index 1d225202f9..bc5e1c2de4 100644 --- a/bootdevice.c +++ b/bootdevice.c @@ -343,3 +343,58 @@ void device_add_bootindex_property(Object *obj, int32_t *bootindex, /* initialize devices' bootindex property to -1 */ object_property_set_int(obj, -1, name, NULL); } + +typedef struct FWLCHSEntry FWLCHSEntry; + +struct FWLCHSEntry { + QTAILQ_ENTRY(FWLCHSEntry) link; + DeviceState *dev; + char *suffix; + uint32_t lcyls; + uint32_t lheads; + uint32_t lsecs; +}; + +static QTAILQ_HEAD(, FWLCHSEntry) fw_lchs = + QTAILQ_HEAD_INITIALIZER(fw_lchs); + +void add_boot_device_lchs(DeviceState *dev, const char *suffix, + uint32_t lcyls, uint32_t lheads, uint32_t lsecs) +{ + FWLCHSEntry *node; + + if (!lcyls && !lheads && !lsecs) { + return; + } + + assert(dev != NULL || suffix != NULL); + + node = g_malloc0(sizeof(FWLCHSEntry)); + node->suffix = g_strdup(suffix); + node->dev = dev; + node->lcyls = lcyls; + node->lheads = lheads; + node->lsecs = lsecs; + + QTAILQ_INSERT_TAIL(&fw_lchs, node, link); +} + +void del_boot_device_lchs(DeviceState *dev, const char *suffix) +{ + FWLCHSEntry *i; + + if (dev == NULL) { + return; + } + + QTAILQ_FOREACH(i, &fw_lchs, link) { + if ((!suffix || !g_strcmp0(i->suffix, suffix)) && + i->dev == dev) { + QTAILQ_REMOVE(&fw_lchs, i, link); + g_free(i->suffix); + g_free(i); + + break; + } + } +} diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 61579ae71e..173dfbb539 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -171,6 +171,9 @@ void device_add_bootindex_property(Object *obj, int32_t *bootindex, DeviceState *dev, Error **errp); void restore_boot_order(void *opaque); void validate_bootdevices(const char *devices, Error **errp); +void add_boot_device_lchs(DeviceState *dev, const char *suffix, + uint32_t lcyls, uint32_t lheads, uint32_t lsecs); +void del_boot_device_lchs(DeviceState *dev, const char *suffix);
/* handler to set the boot_device order for a specific type of MachineClass */ typedef void QEMUBootSetHandler(void *opaque, const char *boot_order,
We will need to add LCHS removal logic to scsi-hd's unrealize() in the next commit.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 hw/scsi/scsi-bus.c     | 15 +++++++++++++++
 include/hw/scsi/scsi.h |  1 +
 2 files changed, 16 insertions(+)
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index c480553083..f6fe497a1a 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -55,6 +55,14 @@ static void scsi_device_realize(SCSIDevice *s, Error **errp) } }
+static void scsi_device_unrealize(SCSIDevice *s, Error **errp) +{ + SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s); + if (sc->unrealize) { + sc->unrealize(s, errp); + } +} + int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, void *hba_private) { @@ -213,11 +221,18 @@ static void scsi_qdev_realize(DeviceState *qdev, Error **errp) static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp) { SCSIDevice *dev = SCSI_DEVICE(qdev); + Error *local_err = NULL;
if (dev->vmsentry) { qemu_del_vm_change_state_handler(dev->vmsentry); }
+ scsi_device_unrealize(dev, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE)); blockdev_mark_auto_del(dev->conf.blk); } diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 426566a5c6..8cf71f910d 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -59,6 +59,7 @@ struct SCSIRequest { typedef struct SCSIDeviceClass { DeviceClass parent_class; void (*realize)(SCSIDevice *dev, Error **errp); + void (*unrealize)(SCSIDevice *dev, Error **errp); int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, void *hba_private); SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun,
Relevant devices are:
* ide-hd (and ide-cd, ide-drive)
* scsi-hd (and scsi-cd, scsi-disk, scsi-block)
* virtio-blk-pci
We do not call del_boot_device_lchs() for ide-* since we don't need to - IDE block devices do not support unplugging.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 hw/block/virtio-blk.c |  6 ++++++
 hw/ide/qdev.c         |  5 +++++
 hw/scsi/scsi-disk.c   | 14 ++++++++++++++
 3 files changed, 25 insertions(+)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 06e57a4d39..787bbd768a 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -1182,6 +1182,11 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size);
blk_iostatus_enable(s->blk); + + add_boot_device_lchs(dev, "/disk@0,0", + (&conf->conf)->lcyls, + (&conf->conf)->lheads, + (&conf->conf)->lsecs); }
static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp) @@ -1189,6 +1194,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIOBlock *s = VIRTIO_BLK(dev);
+ del_boot_device_lchs(dev, "/disk@0,0"); virtio_blk_data_plane_destroy(s->dataplane); s->dataplane = NULL; qemu_del_vm_change_state_handler(s->change); diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index 9cae3205df..07f429d5e3 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -215,6 +215,11 @@ static void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp)
add_boot_device_path(dev->conf.bootindex, &dev->qdev, dev->unit ? "/disk@1" : "/disk@0"); + + add_boot_device_lchs(&dev->qdev, dev->unit ? "/disk@1" : "/disk@0", + (&dev->conf)->lcyls, + (&dev->conf)->lheads, + (&dev->conf)->lsecs); }
static void ide_dev_get_bootindex(Object *obj, Visitor *v, const char *name, diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 7b89ac798b..3451aefdea 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2390,6 +2390,16 @@ static void scsi_realize(SCSIDevice *dev, Error **errp) blk_set_guest_block_size(s->qdev.conf.blk, s->qdev.blocksize);
blk_iostatus_enable(s->qdev.conf.blk); + + add_boot_device_lchs(&dev->qdev, NULL, + (&dev->conf)->lcyls, + (&dev->conf)->lheads, + (&dev->conf)->lsecs); +} + +static void scsi_unrealize(SCSIDevice *dev, Error **errp) +{ + del_boot_device_lchs(&dev->qdev, NULL); }
static void scsi_hd_realize(SCSIDevice *dev, Error **errp) @@ -2988,6 +2998,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data) SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
sc->realize = scsi_hd_realize; + sc->unrealize = scsi_unrealize; sc->alloc_req = scsi_new_request; sc->unit_attention_reported = scsi_disk_unit_attention_reported; dc->desc = "virtual SCSI disk"; @@ -3019,6 +3030,7 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data) SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
sc->realize = scsi_cd_realize; + sc->unrealize = scsi_unrealize; sc->alloc_req = scsi_new_request; sc->unit_attention_reported = scsi_disk_unit_attention_reported; dc->desc = "virtual SCSI CD-ROM"; @@ -3054,6 +3066,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data) SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
sc->realize = scsi_block_realize; + sc->unrealize = scsi_unrealize; sc->alloc_req = scsi_block_new_request; sc->parse_cdb = scsi_block_parse_cdb; sdc->dma_readv = scsi_block_dma_readv; @@ -3095,6 +3108,7 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data) SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
sc->realize = scsi_disk_realize; + sc->unrealize = scsi_unrealize; sc->alloc_req = scsi_new_request; sc->unit_attention_reported = scsi_disk_unit_attention_reported; dc->fw_name = "disk";
Move device name construction to a separate function.
We will reuse this function in the following commit to pass logical CHS parameters through fw_cfg much like we currently pass bootindex.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 bootdevice.c | 61 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/bootdevice.c b/bootdevice.c index bc5e1c2de4..2b12fb85a4 100644 --- a/bootdevice.c +++ b/bootdevice.c @@ -202,6 +202,39 @@ DeviceState *get_boot_device(uint32_t position) return res; }
+static char *get_boot_device_path(DeviceState *dev, bool ignore_suffixes, + char *suffix) +{ + char *devpath = NULL, *s = NULL, *d, *bootpath; + + if (dev) { + devpath = qdev_get_fw_dev_path(dev); + assert(devpath); + } + + if (!ignore_suffixes) { + if (dev) { + d = qdev_get_own_fw_dev_path_from_handler(dev->parent_bus, dev); + if (d) { + assert(!suffix); + s = d; + } else { + s = g_strdup(suffix); + } + } else { + s = g_strdup(suffix); + } + } + + bootpath = g_strdup_printf("%s%s", + devpath ? devpath : "", + s ? s : ""); + g_free(devpath); + g_free(s); + + return bootpath; +} + /* * This function returns null terminated string that consist of new line * separated device paths. @@ -218,36 +251,10 @@ char *get_boot_devices_list(size_t *size) bool ignore_suffixes = mc->ignore_boot_device_suffixes;
QTAILQ_FOREACH(i, &fw_boot_order, link) { - char *devpath = NULL, *suffix = NULL; char *bootpath; - char *d; size_t len;
- if (i->dev) { - devpath = qdev_get_fw_dev_path(i->dev); - assert(devpath); - } - - if (!ignore_suffixes) { - if (i->dev) { - d = qdev_get_own_fw_dev_path_from_handler(i->dev->parent_bus, - i->dev); - if (d) { - assert(!i->suffix); - suffix = d; - } else { - suffix = g_strdup(i->suffix); - } - } else { - suffix = g_strdup(i->suffix); - } - } - - bootpath = g_strdup_printf("%s%s", - devpath ? devpath : "", - suffix ? suffix : ""); - g_free(devpath); - g_free(suffix); + bootpath = get_boot_device_path(i->dev, ignore_suffixes, i->suffix);
if (total) { list[total-1] = '\n';
Using fw_cfg, supply logical CHS values directly from QEMU to the BIOS.
Non-standard logical geometries break under QEMU.
A virtual disk containing an operating system that depends on logical geometries (consistent values reported by BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has a non-standard logical geometry, for example 56 SPT (sectors per track). No matter what QEMU reports, SeaBIOS will, for large enough disks, use LBA translation, which reports 63 SPT instead.
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads; this is an artifact of virtualization.
By supplying the logical geometries directly we are able to support such "exotic" disks.
We serialize this information in a similar way to the "bootorder" interface. The fw_cfg entry is "bootdevices" and it serializes a struct. At the moment the struct holds only the logical CHS values, but it can be extended easily because the serialized data carries the struct size.
(In the future, we can pass the bootindex through "bootdevices" instead of "bootorder" - unifying all bootdevice information in one fw_cfg value)
The PV interface through fw_cfg could have also been implemented using device specific keys, e.g.: "/etc/bootdevice/%s/logical_geometry" where %s is the device name QEMU produces - but this implementation would require much more code refactoring, both in QEMU and SeaBIOS, so the current implementation was chosen.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 bootdevice.c            | 42 ++++++++++++++++++++++++++++++++++++++++++
 hw/nvram/fw_cfg.c       | 14 +++++++++++---
 include/sysemu/sysemu.h |  1 +
 3 files changed, 54 insertions(+), 3 deletions(-)
diff --git a/bootdevice.c b/bootdevice.c index 2b12fb85a4..84c2a83f25 100644 --- a/bootdevice.c +++ b/bootdevice.c @@ -405,3 +405,45 @@ void del_boot_device_lchs(DeviceState *dev, const char *suffix) } } } + +typedef struct QEMU_PACKED BootDeviceEntrySerialized { + /* Do not change field order - add new fields below */ + uint32_t lcyls; + uint32_t lheads; + uint32_t lsecs; +} BootDeviceEntrySerialized; + +/* Serialized as: struct size (4) + (device name\0 + device struct) x devices */ +char *get_boot_devices_info(size_t *size) +{ + FWLCHSEntry *i; + BootDeviceEntrySerialized s; + size_t total = 0; + char *list = NULL; + + list = g_malloc0(sizeof(uint32_t)); + *((uint32_t *)list) = (uint32_t)sizeof(s); + total = sizeof(uint32_t); + + QTAILQ_FOREACH(i, &fw_lchs, link) { + char *bootpath; + size_t len; + + bootpath = get_boot_device_path(i->dev, false, i->suffix); + s.lcyls = i->lcyls; + s.lheads = i->lheads; + s.lsecs = i->lsecs; + + len = strlen(bootpath) + 1; + list = g_realloc(list, total + len + sizeof(s)); + memcpy(&list[total], bootpath, len); + memcpy(&list[total + len], &s, sizeof(s)); + total += len + sizeof(s); + + g_free(bootpath); + } + + *size = total; + + return list; +} diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index 9f7b7789bc..008b21542f 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -916,13 +916,21 @@ void *fw_cfg_modify_file(FWCfgState *s, const char *filename,
static void fw_cfg_machine_reset(void *opaque) { + MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine()); + FWCfgState *s = opaque; void *ptr; size_t len; - FWCfgState *s = opaque; - char *bootindex = get_boot_devices_list(&len); + char *buf;
- ptr = fw_cfg_modify_file(s, "bootorder", (uint8_t *)bootindex, len); + buf = get_boot_devices_list(&len); + ptr = fw_cfg_modify_file(s, "bootorder", (uint8_t *)buf, len); g_free(ptr); + + if (!mc->legacy_fw_cfg_order) { + buf = get_boot_devices_info(&len); + ptr = fw_cfg_modify_file(s, "bootdevices", (uint8_t *)buf, len); + g_free(ptr); + } }
static void fw_cfg_machine_ready(struct Notifier *n, void *data) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 173dfbb539..f0552006f4 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -174,6 +174,7 @@ void validate_bootdevices(const char *devices, Error **errp); void add_boot_device_lchs(DeviceState *dev, const char *suffix, uint32_t lcyls, uint32_t lheads, uint32_t lsecs); void del_boot_device_lchs(DeviceState *dev, const char *suffix); +char *get_boot_devices_info(size_t *size);
/* handler to set the boot_device order for a specific type of MachineClass */ typedef void QEMUBootSetHandler(void *opaque, const char *boot_order,
Hi,
We serialize this information in a similar way to the "bootorder" interface. The fw_cfg entry is "bootdevices" and it serializes a struct.
Why "bootdevices"? I'd suggest to use "geometry" or "lchs" instead.
At the moment the struct holds only the logical CHS values, but it can be extended easily because the serialized data carries the struct size.
(In the future, we can pass the bootindex through "bootdevices" instead of "bootorder" - unifying all bootdevice information in one fw_cfg value)
I don't think deprecating bootorder is useful. Nobody cares about the disk geometry, except some legacy x86 bios guests. So seabios will be the only firmware using this new interface. Switching all firmware to a new fw_cfg file is pointless churn.
Why make this extendable? What possible extensions do you have in mind?
Also note that with a possible extension you might end up in a situation where you have info A for device 1, info B for device 2 and info A+B for device 3, while with your current patch there is no way to signal whether info A or B is available for a given device.
+/* Serialized as: struct size (4) + (device name\0 + device struct) x devices */
+char *get_boot_devices_info(size_t *size)
+{
+    FWLCHSEntry *i;
+    BootDeviceEntrySerialized s;
+    size_t total = 0;
+    char *list = NULL;

if (QTAILQ_EMPTY(&fw_lchs)) { return NULL; }

+    if (!mc->legacy_fw_cfg_order) {
              ^^^^^^^^^^^^^^^^^^^ Hmm?
cheers, Gerd
On 17 Jun 2019, at 10:20, Gerd Hoffmann kraxel@redhat.com wrote:
Hi,
We serialize this information in a similar way to the "bootorder" interface. The fw_cfg entry is "bootdevices" and it serializes a struct.
Why "bootdevices"? I'd suggest to use "geometry" or "lchs" instead.
True, if we don't think an extension will be required in the future we might as well call it "lchs" or "bios-geometry".
At the moment the struct holds only the logical CHS values, but it can be extended easily because the serialized data carries the struct size.
(In the future, we can pass the bootindex through "bootdevices" instead of "bootorder" - unifying all bootdevice information in one fw_cfg value)
I don't think deprecating bootorder is useful. Nobody cares about the disk geometry, except some legacy x86 bios guests. So seabios will be the only firmware using this new interface. Switching all firmware to a new fw_cfg file is pointless churn.
Why make this extendable? What possible extensions do you have in mind?
I’m not sure about this but if “bootorder” was written in the first place using such an extension this could have been useful. I don’t have anything specific in mind. I don’t think deprecating bootorder is useful either, just mentioned that it will be possible if we would like to unify all disk values someday.
Also note that with a possible extension you might end up in a situation where you have info A for device 1, info B for device 2 and info A+B for device 3, while with your current patch there is no way to signal whether info A or B is available for a given device.
Well for lchs (A) a non-existing value is 0, 0, 0 (uint32). So at the moment we’re good. We can signal other values with other magic numbers (such as -1 for bootorder) or prefix the value with an additional boolean value “is signaled”.
+/* Serialized as: struct size (4) + (device name\0 + device struct) x devices */
+char *get_boot_devices_info(size_t *size)
+{
+    FWLCHSEntry *i;
+    BootDeviceEntrySerialized s;
+    size_t total = 0;
+    char *list = NULL;

if (QTAILQ_EMPTY(&fw_lchs)) { return NULL; }

+    if (!mc->legacy_fw_cfg_order) {
              ^^^^^^^^^^^^^^^^^^^
Hmm?
This makes the file available only in non-legacy mode. Otherwise QEMU complains in get_fw_cfg_order() (fw_cfg.c):
warn_report("Unknown firmware file in legacy mode: %s", name);
Detected during qtests.
So overall, WDYT? Keep it extendible for a low price of ABI + “bootdevices” name. Or go strict and rename to “bios-geometries”?
(The ABI will not change much anyway: the struct_size field will disappear and a fixed 12-byte LCHS struct will be assumed.)
Sam
cheers, Gerd
Hi,
Keep it extendible for a low price of ABI + “bootdevices” name. Or go strict and rename to “bios-geometries”?
The name should reflect what is in there, so "bios-geometries" looks better to me. I'd also keep it strict, unless we have at least a vague idea what might be a useful future extension. I don't have any.
cheers, Gerd
Ok,
I’ll resubmit this patch series in v3, as well as v2 for SeaBIOS soon enough.
* Change "bootdevices" to "bios-geometry", and remove the struct size
* Add the cpu_to_le32 fix Laszlo suggested for big-endian hosts
* Fix the last qtest commit - the automatic docker tester for some reason does not have qemu-img set
Sam
On 17 Jun 2019, at 11:38, Gerd Hoffmann kraxel@redhat.com wrote:
Hi,
Keep it extendible for a low price of ABI + “bootdevices” name. Or go strict and rename to “bios-geometries”?
The name should reflect what is in there, so "bios-geometries" looks better to me. I'd also keep it strict, unless we have at least a vague idea what might be a useful future extension. I don't have any.
cheers, Gerd
On Mon, Jun 17, 2019 at 10:36:54AM +0300, Sam Eiderman wrote:
So overall, WDYT? Keep it extendible for a low price of ABI + “bootdevices” name. Or go strict and rename to “bios-geometries”?
If we add another QEMU-to-firmware interface I think the interface should be documented. I also think that a mix of an ascii and binary interface is going to be difficult to describe and document. I'd prefer a pure ascii interface - for example a newline separated list of four space separated fields: <device name> <cylinders> <heads> <spt>
-Kevin
On 17 Jun 2019, at 17:48, Kevin O'Connor kevin@koconnor.net wrote:
On Mon, Jun 17, 2019 at 10:36:54AM +0300, Sam Eiderman wrote:
So overall, WDYT? Keep it extendible for a low price of ABI + “bootdevices” name. Or go strict and rename to “bios-geometries”?
If we add another QEMU-to-firmware interface I think the interface should be documented. I also think that a mix of an ascii and binary interface is going to be difficult to describe and document. I'd prefer a pure ascii interface - for example a newline separated list of four space separated fields: <device name> <cylinders> <heads> <spt>
We can go pure ascii. In the meantime I sent v3 QEMU and v2 SeaBIOS patches for more comments.
Sam
-Kevin
Add QTest tests to check the logical geometry override option.
The tests in hd-geo-test are out of date - they only test IDE and do not test interesting MBRs.
I added a few helper functions which will make adding more tests easier.
QTest's fw_cfg helper functions support only legacy fw_cfg, so I had to read the new fw_cfg layout on my own.
Creating qcow2 disks with specific size and MBR layout is currently unused - we only use a default empty MBR.
Reviewed-by: Karl Heubaum karl.heubaum@oracle.com
Reviewed-by: Arbel Moshe arbel.moshe@oracle.com
Signed-off-by: Sam Eiderman shmuel.eiderman@oracle.com
---
 tests/Makefile.include |   2 +-
 tests/hd-geo-test.c    | 565 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 566 insertions(+), 1 deletion(-)
diff --git a/tests/Makefile.include b/tests/Makefile.include index 46a36c2c95..55ea165ed4 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -765,7 +765,7 @@ tests/ide-test$(EXESUF): tests/ide-test.o $(libqos-pc-obj-y) tests/ahci-test$(EXESUF): tests/ahci-test.o $(libqos-pc-obj-y) qemu-img$(EXESUF) tests/ipmi-kcs-test$(EXESUF): tests/ipmi-kcs-test.o tests/ipmi-bt-test$(EXESUF): tests/ipmi-bt-test.o -tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o +tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o $(libqos-obj-y) tests/boot-order-test$(EXESUF): tests/boot-order-test.o $(libqos-obj-y) tests/boot-serial-test$(EXESUF): tests/boot-serial-test.o $(libqos-obj-y) tests/bios-tables-test$(EXESUF): tests/bios-tables-test.o \ diff --git a/tests/hd-geo-test.c b/tests/hd-geo-test.c index 62eb624726..08eafeb81a 100644 --- a/tests/hd-geo-test.c +++ b/tests/hd-geo-test.c @@ -17,7 +17,11 @@
#include "qemu/osdep.h" #include "qemu-common.h" +#include "qemu/bswap.h" +#include "qapi/qmp/qlist.h" #include "libqtest.h" +#include "libqos/fw_cfg.h" +#include "standard-headers/linux/qemu_fw_cfg.h"
#define ARGV_SIZE 256
@@ -388,6 +392,557 @@ static void test_ide_drive_cd_0(void) qtest_quit(qts); }
+typedef struct { + bool active; + uint32_t head; + uint32_t sector; + uint32_t cyl; + uint32_t end_head; + uint32_t end_sector; + uint32_t end_cyl; + uint32_t start_sect; + uint32_t nr_sects; +} MBRpartitions[4]; + +static MBRpartitions empty_mbr = { {false, 0, 0, 0, 0, 0, 0, 0, 0}, + {false, 0, 0, 0, 0, 0, 0, 0, 0}, + {false, 0, 0, 0, 0, 0, 0, 0, 0}, + {false, 0, 0, 0, 0, 0, 0, 0, 0} }; + +static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors) +{ + const char *template = "/tmp/qtest.XXXXXX"; + char *raw_path = strdup(template); + char *qcow2_path = strdup(template); + char cmd[100 + 2 * PATH_MAX]; + uint8_t buf[512]; + int i, ret, fd, offset; + uint64_t qcow2_size = sectors * 512; + uint8_t status, parttype, head, sector, cyl; + + offset = 0xbe; + + for (i = 0; i < 4; i++) { + status = mbr[i].active ? 0x80 : 0x00; + g_assert(mbr[i].head < 256); + g_assert(mbr[i].sector < 64); + g_assert(mbr[i].cyl < 1024); + head = mbr[i].head; + sector = mbr[i].sector + ((mbr[i].cyl & 0x300) >> 2); + cyl = mbr[i].cyl & 0xff; + + buf[offset + 0x0] = status; + buf[offset + 0x1] = head; + buf[offset + 0x2] = sector; + buf[offset + 0x3] = cyl; + + parttype = 0; + g_assert(mbr[i].end_head < 256); + g_assert(mbr[i].end_sector < 64); + g_assert(mbr[i].end_cyl < 1024); + head = mbr[i].end_head; + sector = mbr[i].end_sector + ((mbr[i].end_cyl & 0x300) >> 2); + cyl = mbr[i].end_cyl & 0xff; + + buf[offset + 0x4] = parttype; + buf[offset + 0x5] = head; + buf[offset + 0x6] = sector; + buf[offset + 0x7] = cyl; + + (*(uint32_t *)&buf[offset + 0x8]) = cpu_to_le32(mbr[i].start_sect); + (*(uint32_t *)&buf[offset + 0xc]) = cpu_to_le32(mbr[i].nr_sects); + + offset += 0x10; + } + + fd = mkstemp(raw_path); + g_assert(fd); + close(fd); + + fd = open(raw_path, O_WRONLY); + g_assert(fd >= 0); + ret = write(fd, buf, sizeof(buf)); + g_assert(ret == sizeof(buf)); + close(fd); + + fd = mkstemp(qcow2_path); + g_assert(fd); + close(fd); + + ret = snprintf(cmd, sizeof(cmd), + "$QTEST_QEMU_IMG 
convert -f raw -O qcow2 %s %s > /dev/null", + raw_path, qcow2_path); + g_assert((0 < ret) && (ret <= sizeof(cmd))); + ret = system(cmd); + g_assert(ret == 0); + + ret = snprintf(cmd, sizeof(cmd), + "$QTEST_QEMU_IMG resize %s %" PRIu64 " > /dev/null", + qcow2_path, qcow2_size); + g_assert((0 < ret) && (ret <= sizeof(cmd))); + ret = system(cmd); + g_assert(ret == 0); + + unlink(raw_path); + free(raw_path); + + return qcow2_path; +} + +struct QemuCfgFile { + uint32_t size; /* file size */ + uint16_t select; /* write this to 0x510 to read it */ + uint16_t reserved; + char name[56]; +}; + +static uint16_t find_fw_cfg_file(QFWCFG *fw_cfg, + const char *filename) +{ + struct QemuCfgFile qfile; + uint32_t count, e; + uint16_t select; + + count = qfw_cfg_get_u32(fw_cfg, FW_CFG_FILE_DIR); + count = be32_to_cpu(count); + for (select = 0, e = 0; e < count; e++) { + qfw_cfg_read_data(fw_cfg, &qfile, sizeof(qfile)); + if (!strcmp(filename, qfile.name)) { + select = be16_to_cpu(qfile.select); + } + } + + return select; +} + +static void read_fw_cfg_file(QFWCFG *fw_cfg, + const char *filename, + void *data, + size_t len) +{ + uint16_t select = find_fw_cfg_file(fw_cfg, filename); + + g_assert(select); + + qfw_cfg_get(fw_cfg, select, data, len); +} + +#define BOOTDEVICES_MAX_SIZE 10000 + +typedef struct { + uint32_t c; + uint32_t h; + uint32_t s; +} CHS; + +typedef struct { + const char *dev_path; + CHS chs; +} CHSResult; + +static void read_bootdevices(QFWCFG *fw_cfg, CHSResult expected[]) +{ + uint32_t info_size; + char *buf = g_malloc0(BOOTDEVICES_MAX_SIZE); + void *cur; + char *name; + CHS *chs; + GList *results = NULL, *cur_result; + CHSResult *r; + int i; + bool found; + + read_fw_cfg_file(fw_cfg, "bootdevices", buf, BOOTDEVICES_MAX_SIZE); + + cur = buf; + + info_size = *((uint32_t *)cur); + + g_assert(info_size >= sizeof(*chs)); + + cur += 4; + + while (strlen(cur)) { + name = cur; + chs = cur + strlen(cur) + 1; + + r = g_malloc0(sizeof(*r)); + r->dev_path = name; + r->chs = 
*chs; + + results = g_list_prepend(results, r); + + cur += strlen(cur) + 1 + info_size; + } + + i = 0; + + while (expected[i].dev_path) { + found = false; + cur_result = results; + while (cur_result) { + r = cur_result->data; + if (!strcmp(r->dev_path, expected[i].dev_path) && + !memcmp(&(r->chs), &(expected[i].chs), sizeof(r->chs))) { + found = true; + break; + } + cur_result = g_list_next(cur_result); + } + g_assert(found); + g_free(cur_result->data); + results = g_list_delete_link(results, cur_result); + i++; + } + + g_assert(results == NULL); + + g_free(buf); +} + +#define MAX_DRIVES 30 + +typedef struct { + char **argv; + int argc; + char **drives; + int n_drives; + int n_scsi_disks; + int n_scsi_controllers; + int n_virtio_disks; +} TestArgs; + +static TestArgs *create_args(void) +{ + TestArgs *args = g_malloc0(sizeof(*args)); + args->argv = g_new0(char *, ARGV_SIZE); + args->argc = append_arg(args->argc, args->argv, + ARGV_SIZE, g_strdup("-nodefaults")); + args->drives = g_new0(char *, MAX_DRIVES); + return args; +} + +static void add_drive_with_mbr(TestArgs *args, + MBRpartitions mbr, uint64_t sectors) +{ + char *img_file_name; + char part[300]; + int ret; + + g_assert(args->n_drives < MAX_DRIVES); + + img_file_name = create_qcow2_with_mbr(mbr, sectors); + + args->drives[args->n_drives] = img_file_name; + ret = snprintf(part, sizeof(part), + "-drive file=%s,if=none,format=qcow2,id=disk%d", + img_file_name, args->n_drives); + g_assert((0 < ret) && (ret <= sizeof(part))); + args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, g_strdup(part)); + args->n_drives++; +} + +static void add_ide_disk(TestArgs *args, + int drive_idx, int bus, int unit, int c, int h, int s) +{ + char part[300]; + int ret; + + ret = snprintf(part, sizeof(part), + "-device ide-hd,drive=disk%d,bus=ide.%d,unit=%d," + "lcyls=%d,lheads=%d,lsecs=%d", + drive_idx, bus, unit, c, h, s); + g_assert((0 < ret) && (ret <= sizeof(part))); + args->argc = append_arg(args->argc, args->argv, 
ARGV_SIZE, g_strdup(part)); +} + +static void add_scsi_controller(TestArgs *args, + const char *type, + const char *bus, + int addr) +{ + char part[300]; + int ret; + + ret = snprintf(part, sizeof(part), + "-device %s,id=scsi%d,bus=%s,addr=%d", + type, args->n_scsi_controllers, bus, addr); + g_assert((0 < ret) && (ret <= sizeof(part))); + args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, g_strdup(part)); + args->n_scsi_controllers++; +} + +static void add_scsi_disk(TestArgs *args, + int drive_idx, int bus, + int channel, int scsi_id, int lun, + int c, int h, int s) +{ + char part[300]; + int ret; + + ret = snprintf(part, sizeof(part), + "-device scsi-hd,id=scsi-disk%d,drive=disk%d," + "bus=scsi%d.0," + "channel=%d,scsi-id=%d,lun=%d," + "lcyls=%d,lheads=%d,lsecs=%d", + args->n_scsi_disks, drive_idx, bus, channel, scsi_id, lun, + c, h, s); + g_assert((0 < ret) && (ret <= sizeof(part))); + args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, g_strdup(part)); + args->n_scsi_disks++; +} + +static void add_virtio_disk(TestArgs *args, + int drive_idx, const char *bus, int addr, + int c, int h, int s) +{ + char part[300]; + int ret; + + ret = snprintf(part, sizeof(part), + "-device virtio-blk-pci,id=virtio-disk%d," + "drive=disk%d,bus=%s,addr=%d," + "lcyls=%d,lheads=%d,lsecs=%d", + args->n_virtio_disks, drive_idx, bus, addr, c, h, s); + g_assert((0 < ret) && (ret <= sizeof(part))); + args->argc = append_arg(args->argc, args->argv, ARGV_SIZE, g_strdup(part)); + args->n_virtio_disks++; +} + +static void test_override(TestArgs *args, CHSResult expected[]) +{ + char *joined_args; + QFWCFG *fw_cfg; + int i; + + joined_args = g_strjoinv(" ", args->argv); + + qtest_start(joined_args); + fw_cfg = pc_fw_cfg_init(global_qtest); + + read_bootdevices(fw_cfg, expected); + + g_free(joined_args); + qtest_end(); + + g_free(fw_cfg); + + for (i = 0; i < args->n_drives; i++) { + unlink(args->drives[i]); + free(args->drives[i]); + } + g_free(args->drives); + 
g_strfreev(args->argv); + g_free(args); +} + +static void test_override_ide(void) +{ + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/ide@1,1/drive@0/disk@0", {10000, 120, 30} }, + {"/pci@i0cf8/ide@1,1/drive@0/disk@1", {9000, 120, 30} }, + {"/pci@i0cf8/ide@1,1/drive@1/disk@0", {0, 1, 1} }, + {"/pci@i0cf8/ide@1,1/drive@1/disk@1", {1, 0, 0} }, + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_ide_disk(args, 0, 0, 0, 10000, 120, 30); + add_ide_disk(args, 1, 0, 1, 9000, 120, 30); + add_ide_disk(args, 2, 1, 0, 0, 1, 1); + add_ide_disk(args, 3, 1, 1, 1, 0, 0); + test_override(args, expected); +} + +static void test_override_scsi(void) +{ + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/scsi@3/channel@0/disk@0,0", {10000, 120, 30} }, + {"/pci@i0cf8/scsi@3/channel@0/disk@1,0", {9000, 120, 30} }, + {"/pci@i0cf8/scsi@3/channel@0/disk@2,0", {1, 0, 0} }, + {"/pci@i0cf8/scsi@3/channel@0/disk@3,0", {0, 1, 0} }, + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_scsi_controller(args, "lsi53c895a", "pci.0", 3); + add_scsi_disk(args, 0, 0, 0, 0, 0, 10000, 120, 30); + add_scsi_disk(args, 1, 0, 0, 1, 0, 9000, 120, 30); + add_scsi_disk(args, 2, 0, 0, 2, 0, 1, 0, 0); + add_scsi_disk(args, 3, 0, 0, 3, 0, 0, 1, 0); + test_override(args, expected); +} + +static void test_override_scsi_2_controllers(void) +{ + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/scsi@3/channel@0/disk@0,0", {10000, 120, 30} }, + {"/pci@i0cf8/scsi@3/channel@0/disk@1,0", {9000, 120, 30} }, + {"/pci@i0cf8/scsi@4/channel@0/disk@0,1", {1, 0, 0} }, + {"/pci@i0cf8/scsi@4/channel@0/disk@1,2", {0, 1, 0} }, + {NULL, {0, 0, 0} } + }; + 
add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_scsi_controller(args, "lsi53c895a", "pci.0", 3); + add_scsi_controller(args, "virtio-scsi-pci", "pci.0", 4); + add_scsi_disk(args, 0, 0, 0, 0, 0, 10000, 120, 30); + add_scsi_disk(args, 1, 0, 0, 1, 0, 9000, 120, 30); + add_scsi_disk(args, 2, 1, 0, 0, 1, 1, 0, 0); + add_scsi_disk(args, 3, 1, 0, 1, 2, 0, 1, 0); + test_override(args, expected); +} + +static void test_override_virtio_blk(void) +{ + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/scsi@3/disk@0,0", {10000, 120, 30} }, + {"/pci@i0cf8/scsi@4/disk@0,0", {9000, 120, 30} }, + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_virtio_disk(args, 0, "pci.0", 3, 10000, 120, 30); + add_virtio_disk(args, 1, "pci.0", 4, 9000, 120, 30); + test_override(args, expected); +} + +static void test_override_zero_chs(void) +{ + TestArgs *args = create_args(); + CHSResult expected[] = { + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_ide_disk(args, 0, 1, 1, 0, 0, 0); + test_override(args, expected); +} + +static void test_override_scsi_hot_unplug(void) +{ + char *joined_args; + QFWCFG *fw_cfg; + QDict *response; + int i; + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/scsi@2/channel@0/disk@0,0", {10000, 120, 30} }, + {"/pci@i0cf8/scsi@2/channel@0/disk@1,0", {20, 20, 20} }, + {NULL, {0, 0, 0} } + }; + CHSResult expected2[] = { + {"/pci@i0cf8/scsi@2/channel@0/disk@1,0", {20, 20, 20} }, + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_scsi_controller(args, "virtio-scsi-pci", "pci.0", 2); + add_scsi_disk(args, 0, 0, 0, 0, 0, 10000, 120, 30); + add_scsi_disk(args, 1, 0, 0, 1, 0, 20, 20, 20); + + joined_args = g_strjoinv(" ", args->argv); + + 
qtest_start(joined_args); + fw_cfg = pc_fw_cfg_init(global_qtest); + + read_bootdevices(fw_cfg, expected); + + /* unplug device an restart */ + response = qmp("{ 'execute': 'device_del'," + " 'arguments': {'id': 'scsi-disk0' }}"); + g_assert(response); + g_assert(!qdict_haskey(response, "error")); + qobject_unref(response); + response = qmp("{ 'execute': 'system_reset', 'arguments': { }}"); + g_assert(response); + g_assert(!qdict_haskey(response, "error")); + qobject_unref(response); + + qtest_qmp_eventwait(global_qtest, "RESET"); + + read_bootdevices(fw_cfg, expected2); + + g_free(joined_args); + qtest_end(); + + g_free(fw_cfg); + + for (i = 0; i < args->n_drives; i++) { + unlink(args->drives[i]); + free(args->drives[i]); + } + g_free(args->drives); + g_strfreev(args->argv); + g_free(args); +} + +static void test_override_virtio_hot_unplug(void) +{ + char *joined_args; + QFWCFG *fw_cfg; + QDict *response; + int i; + TestArgs *args = create_args(); + CHSResult expected[] = { + {"/pci@i0cf8/scsi@2/disk@0,0", {10000, 120, 30} }, + {"/pci@i0cf8/scsi@3/disk@0,0", {20, 20, 20} }, + {NULL, {0, 0, 0} } + }; + CHSResult expected2[] = { + {"/pci@i0cf8/scsi@3/disk@0,0", {20, 20, 20} }, + {NULL, {0, 0, 0} } + }; + add_drive_with_mbr(args, empty_mbr, 1); + add_drive_with_mbr(args, empty_mbr, 1); + add_virtio_disk(args, 0, "pci.0", 2, 10000, 120, 30); + add_virtio_disk(args, 1, "pci.0", 3, 20, 20, 20); + + joined_args = g_strjoinv(" ", args->argv); + + qtest_start(joined_args); + fw_cfg = pc_fw_cfg_init(global_qtest); + + read_bootdevices(fw_cfg, expected); + + /* unplug device an restart */ + response = qmp("{ 'execute': 'device_del'," + " 'arguments': {'id': 'virtio-disk0' }}"); + g_assert(response); + g_assert(!qdict_haskey(response, "error")); + qobject_unref(response); + response = qmp("{ 'execute': 'system_reset', 'arguments': { }}"); + g_assert(response); + g_assert(!qdict_haskey(response, "error")); + qobject_unref(response); + + qtest_qmp_eventwait(global_qtest, 
"RESET"); + + read_bootdevices(fw_cfg, expected2); + + g_free(joined_args); + qtest_end(); + + g_free(fw_cfg); + + for (i = 0; i < args->n_drives; i++) { + unlink(args->drives[i]); + free(args->drives[i]); + } + g_free(args->drives); + g_strfreev(args->argv); + g_free(args); +} + int main(int argc, char **argv) { Backend i; @@ -413,6 +968,16 @@ int main(int argc, char **argv) qtest_add_func("hd-geo/ide/device/mbr/chs", test_ide_device_mbr_chs); qtest_add_func("hd-geo/ide/device/user/chs", test_ide_device_user_chs); qtest_add_func("hd-geo/ide/device/user/chst", test_ide_device_user_chst); + qtest_add_func("hd-geo/override/ide", test_override_ide); + qtest_add_func("hd-geo/override/scsi", test_override_scsi); + qtest_add_func("hd-geo/override/scsi_2_controllers", + test_override_scsi_2_controllers); + qtest_add_func("hd-geo/override/virtio_blk", test_override_virtio_blk); + qtest_add_func("hd-geo/override/zero_chs", test_override_zero_chs); + qtest_add_func("hd-geo/override/scsi_hot_unplug", + test_override_scsi_hot_unplug); + qtest_add_func("hd-geo/override/virtio_hot_unplug", + test_override_virtio_hot_unplug);
ret = g_test_run();
On Wed, Jun 12, 2019 at 02:59:31PM +0300, Sam Eiderman wrote:
v1:
Non-standard logical geometries break under QEMU.
A virtual disk which contains an operating system which depends on logical geometries (consistent values being reported from BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard logical geometries - for example 56 SPT (sectors per track). No matter what QEMU will guess - SeaBIOS, for large enough disks - will use LBA translation, which will report 63 SPT instead.
--verbose please.
As far as I know, SeaBIOS switches to LBA mode when the disk is simply too big for LCHS addressing. So I fail to see which problem is solved by this. If your guest needs LCHS, why do you assign a disk which can't be fully accessed using LCHS addressing?
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads - this is an artifact of virtualization.
Well, not really. Moving disks from one controller to another when the OS depends on LCHS addressing is never a good idea. That already caused problems in the 90s, when moving SCSI disks from one SCSI host adapter to another type, *way* before virtualization became a thing.
BTW: One possible way to figure out which LCHS layout a disk uses is to check the MBR partition table. With that we (a) don't need a new interface between QEMU and SeaBIOS and (b) don't need to specify the geometry manually.
cheers, Gerd
On 12 Jun 2019, at 16:06, Gerd Hoffmann kraxel@redhat.com wrote:
On Wed, Jun 12, 2019 at 02:59:31PM +0300, Sam Eiderman wrote:
v1:
Non-standard logical geometries break under QEMU.
A virtual disk which contains an operating system which depends on logical geometries (consistent values being reported from BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard logical geometries - for example 56 SPT (sectors per track). No matter what QEMU will guess - SeaBIOS, for large enough disks - will use LBA translation, which will report 63 SPT instead.
--verbose please.
As far as I know, SeaBIOS switches to LBA mode when the disk is simply too big for LCHS addressing. So I fail to see which problem is solved by this. If your guest needs LCHS, why do you assign a disk which can't be fully accessed using LCHS addressing?
The scenario is as follows:
A user has a disk with 56 spt. This disk was created under a BIOS that reported 56 spt. When migrating this disk to QEMU/SeaBIOS, SeaBIOS will report 63 spt (under LBA translation), which breaks boot for this guest.
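To make the failure mode concrete, here is a minimal sketch of the classic CHS-to-LBA mapping. The 255-head count is an assumed value for illustration only (the thread gives just the 56 vs. 63 spt difference), and `Geometry` and `chs_to_lba` are hypothetical names, not QEMU or SeaBIOS code.

```c
#include <assert.h>
#include <stdint.h>

/* Logical geometry as a BIOS would report it via INT13 AH=08. */
typedef struct {
    uint32_t heads; /* number of heads (assumed value in the example) */
    uint32_t spt;   /* sectors per track */
} Geometry;

/*
 * Classic CHS -> LBA mapping. Cylinders and heads are 0-based,
 * sectors are 1-based.
 */
static uint64_t chs_to_lba(Geometry g, uint32_t c, uint32_t h, uint32_t s)
{
    assert(s >= 1 && s <= g.spt && h < g.heads);
    return ((uint64_t)c * g.heads + h) * g.spt + (s - 1);
}
```

With 255 heads, CHS (0, 1, 1) resolves to LBA 56 under 56 spt but to LBA 63 under 63 spt, so a boot loader that computed its CHS addresses with one geometry reads the wrong sectors under the other.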
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads - this is an artifact of virtualization.
Well, not really. Moving disks from one controller to another when the OS depends on LCHS addressing is never a good idea. That already caused problems in the 90s, when moving SCSI disks from one SCSI host adapter to another type, *way* before virtualization became a thing.
I agree, but this is easily solvable in virtualized environments where the hypervisor can guess the correct LCHS values by inspecting the MBR, or letting the user set these values manually.
BTW: One possible way to figure out which LCHS layout a disk uses is to check the MBR partition table. With that we (a) don't need a new interface between QEMU and SeaBIOS and (b) don't need to specify the geometry manually.
In my opinion SeaBIOS is not the correct place for this change since “enhancing” the detection of LCHS values in SeaBIOS may cause it to suddenly report different values for already existing guests which rely on LCHS - thus, breaking compatibility. Much like SMBIOS, ACPI and mptables, I believe that the correct place for MBR guessing is QEMU (which already has such guessing, with some issues), with the guess passed to SeaBIOS using fw_cfg - this will allow using the compat system in QEMU itself.
Sam
cheers, Gerd
On Wed, Jun 12, 2019 at 04:30:03PM +0300, Sam Eiderman wrote:
On 12 Jun 2019, at 16:06, Gerd Hoffmann kraxel@redhat.com wrote:
On Wed, Jun 12, 2019 at 02:59:31PM +0300, Sam Eiderman wrote:
v1:
Non-standard logical geometries break under QEMU.
A virtual disk which contains an operating system which depends on logical geometries (consistent values being reported from BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard logical geometries - for example 56 SPT (sectors per track). No matter what QEMU will guess - SeaBIOS, for large enough disks - will use LBA translation, which will report 63 SPT instead.
--verbose please.
As far as I know, SeaBIOS switches to LBA mode when the disk is simply too big for LCHS addressing. So I fail to see which problem is solved by this. If your guest needs LCHS, why do you assign a disk which can't be fully accessed using LCHS addressing?
The scenario is as follows:
A user has a disk with 56 spt. This disk was created under a BIOS that reported 56 spt. When migrating this disk to QEMU/SeaBIOS, SeaBIOS will report 63 spt (under LBA translation), which breaks boot for this guest.
You said so already. I was looking for a real-world example. Guests which can't deal with LBA should be pretty rare these days. What kind of guest? What other BIOS? Or is this a purely theoretical issue?
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads - this is an artifact of virtualization.
Well, not really. Moving disks from one controller to another when the OS depends on LCHS addressing is never a good idea. That already caused problems in the 90s, when moving SCSI disks from one SCSI host adapter to another type, *way* before virtualization became a thing.
I agree, but this is easily solvable in virtualized environments where the hypervisor can guess the correct LCHS values by inspecting the MBR,
Yes. This is exactly what the more clever SCSI host adapter INT13 ROM implementations ended up doing too: look at the MBR to figure out which LCHS they should use.
or letting the user set these values manually.
Why? Asking the user to deal with the mess is pretty lame if there are better options. And IMO doing this fully automatically in SeaBIOS is better.
BTW: One possible way to figure out which LCHS layout a disk uses is to check the MBR partition table. With that we (a) don't need a new interface between QEMU and SeaBIOS and (b) don't need to specify the geometry manually.
In my opinion SeaBIOS is not the correct place for this change since “enhancing” the detection of LCHS values in SeaBIOS may cause it to suddenly report different values for already existing guests which rely on LCHS - thus, breaking compatibility.
I can't see how this can break guests. It should either have no effect (guests using LBA) or unbreak guests due to LCHS changing from "wrong" to "correct".
cheers, Gerd
On 12 Jun 2019, at 22:18, Gerd Hoffmann kraxel@redhat.com wrote:
On Wed, Jun 12, 2019 at 04:30:03PM +0300, Sam Eiderman wrote:
On 12 Jun 2019, at 16:06, Gerd Hoffmann kraxel@redhat.com wrote:
On Wed, Jun 12, 2019 at 02:59:31PM +0300, Sam Eiderman wrote:
v1:
Non-standard logical geometries break under QEMU.
A virtual disk which contains an operating system which depends on logical geometries (consistent values being reported from BIOS INT13 AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard logical geometries - for example 56 SPT (sectors per track). No matter what QEMU will guess - SeaBIOS, for large enough disks - will use LBA translation, which will report 63 SPT instead.
--verbose please.
As far as I know, SeaBIOS switches to LBA mode when the disk is simply too big for LCHS addressing. So I fail to see which problem is solved by this. If your guest needs LCHS, why do you assign a disk which can't be fully accessed using LCHS addressing?
The scenario is as follows:
A user has a disk with 56 spt. This disk was created under a BIOS that reported 56 spt. When migrating this disk to QEMU/SeaBIOS, SeaBIOS will report 63 spt (under LBA translation), which breaks boot for this guest.
You said so already. I was looking for a real-world example. Guests which can't deal with LBA should be pretty rare these days. What kind of guest? What other BIOS? Or is this a purely theoretical issue?
Yes they are pretty rare. Windows 2000 and Windows XP guests migrated from VMware to QEMU/KVM would not boot due to incorrect disk geometries (some had 32/56 spt instead of 56; the number of heads was also not entirely correct).
In addition, we cannot force SeaBIOS to rely on physical geometries at all. A virtio-blk-pci virtual disk with 255 physical heads cannot report more than 16 physical heads when moved to an IDE controller, since the ATA spec allows a maximum of 16 heads - this is an artifact of virtualization.
Well, not really. Moving disks from one controller to another when the OS depends on LCHS addressing is never a good idea. That already caused problems in the 90s, when moving SCSI disks from one SCSI host adapter to another type, *way* before virtualization became a thing.
I agree, but this is easily solvable in virtualized environments where the hypervisor can guess the correct LCHS values by inspecting the MBR,
Yes. This is exactly what the more clever SCSI host adapter INT13 ROM implementations ended up doing too: look at the MBR to figure out which LCHS they should use.
or letting the user set these values manually.
Why? Asking the user to deal with the mess is pretty lame if there are better options. And IMO doing this fully automatically in SeaBIOS is better.
I’m not against an automatic approach, however I do think that doing this in SeaBIOS might break compatibility for already existing guests that will suddenly see different LCHS values. (Explanation below)
Notice that it is already possible today to pass "cyls", "heads", "sectors" and even "chs-trans" (IDE only) for devices in QEMU, but these are only the physical geometries of the disks, which SeaBIOS might later use to determine the logical geometries. "chs-trans" is an already existing PV interface between QEMU and SeaBIOS for that matter (although it only supports 4 IDE disks).
I believe that the steps to bring this issue to a more stable state are:
1. Create a PV interface between QEMU and SeaBIOS to pass LCHS (implemented here).
2. Allow users to manually set LCHS values in QEMU (implemented here).
   (Up until here, we do not break any existing functionality.)
3. Implement a better LCHS guessing algorithm in QEMU - the existing one contains some issues.
4. On new machine versions, pass the guessed LCHS directly to SeaBIOS. At the moment QEMU does not propagate its MBR-guessed LCHS values, but only uses them to set PCHS values for disks, so SeaBIOS has to guess again.
   (Also here we will not break compatibility for older machine versions.)
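As a rough illustration of step 1, the sketch below parses the blob layout that `read_bootdevices()` in the hd-geo-test.c patch of this series expects: a 32-bit entry size, followed by NUL-terminated device paths, each trailed by a C/H/S triple, and terminated by an empty path. `parse_bootdevices`, `Lchs` and `LchsEntry` are hypothetical names, and host byte order is assumed, as in the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t c, h, s;
} Lchs;

typedef struct {
    const char *dev_path; /* points into the caller's buffer */
    Lchs lchs;
} LchsEntry;

/*
 * Layout assumed from the test in this series:
 *   uint32_t info_size;
 *   { char path[] (NUL-terminated); uint8_t info[info_size]; } ...
 *   empty path terminates the list.
 * Returns the number of entries parsed, or -1 on malformed input.
 */
static int parse_bootdevices(const uint8_t *buf, size_t len,
                             LchsEntry *out, int max_out)
{
    uint32_t info_size;
    size_t pos = sizeof(info_size);
    int n = 0;

    assert(buf != NULL && out != NULL);
    if (len < sizeof(info_size)) {
        return -1;
    }
    /* memcpy avoids unaligned access and pointer-cast aliasing issues */
    memcpy(&info_size, buf, sizeof(info_size));
    if (info_size < sizeof(Lchs)) {
        return -1;
    }

    while (pos < len && buf[pos] != '\0') {
        const uint8_t *nul = memchr(buf + pos, '\0', len - pos);
        size_t name_len;

        if (nul == NULL) {
            return -1; /* path is not terminated inside the buffer */
        }
        name_len = (size_t)(nul - (buf + pos));
        if (len - pos - name_len - 1 < info_size || n == max_out) {
            return -1; /* truncated entry or no room in the output array */
        }
        out[n].dev_path = (const char *)(buf + pos);
        memcpy(&out[n].lchs, nul + 1, sizeof(Lchs));
        n++;
        pos += name_len + 1 + info_size;
    }
    return n;
}
```

Carrying the entry size in the header lets a reader skip trailing fields it does not understand, which is presumably why the test only asserts `info_size >= sizeof(CHS)` rather than equality.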
In addition, QEMU allows the use of VMDKs. Some VMDK descriptors contain the following values:
ddb.geometry.biosHeads = "16"
ddb.geometry.biosHeads = "83257"
These override the guessing algorithm in VMware and request that the given values be set.
Providing such a PV interface will make it possible to support these VMDKs too.
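A minimal sketch of how such descriptor lines could be picked up is shown below. It assumes the descriptor uses plain ASCII double quotes around values and handles only the three `ddb.geometry.bios*` keys; `VmdkBiosGeometry` and `vmdk_parse_geometry_line` are hypothetical names and not part of QEMU's VMDK driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A field left at 0 means "not present in the descriptor". */
typedef struct {
    unsigned cylinders;
    unsigned heads;
    unsigned sectors;
} VmdkBiosGeometry;

/*
 * Parse one descriptor line such as:
 *   ddb.geometry.biosHeads = "16"
 * Returns 1 if the line set a geometry field, 0 otherwise.
 */
static int vmdk_parse_geometry_line(const char *line, VmdkBiosGeometry *g)
{
    char key[64];
    unsigned value;

    assert(line != NULL && g != NULL);
    /* key up to space or '=', then = "number" */
    if (sscanf(line, " %63[^ =] = \"%u\"", key, &value) != 2) {
        return 0;
    }
    if (strcmp(key, "ddb.geometry.biosCylinders") == 0) {
        g->cylinders = value;
    } else if (strcmp(key, "ddb.geometry.biosHeads") == 0) {
        g->heads = value;
    } else if (strcmp(key, "ddb.geometry.biosSectors") == 0) {
        g->sectors = value;
    } else {
        return 0;
    }
    return 1;
}
```

Values collected this way could then be supplied as the user-provided LCHS override instead of relying on a guess.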
BTW: One possible way to figure out which LCHS layout a disk uses is to check the MBR partition table. With that we (a) don't need a new interface between QEMU and SeaBIOS and (b) don't need to specify the geometry manually.
In my opinion SeaBIOS is not the correct place for this change since “enhancing” the detection of LCHS values in SeaBIOS may cause it to suddenly report different values for already existing guests which rely on LCHS - thus, breaking compatibility.
I can't see how this can break guests. It should either have no effect (guests using LBA) or unbreak guests due to LCHS changing from "wrong" to "correct".
I’m not sure what you mean by "unbreak guests". If you change an existing guest that uses LCHS from 56 spt to LBA (63 spt), it will stop booting. Your guessing algorithm will have to guess 56; if it fails to guess 56 correctly, the user cannot take any action besides downgrading SeaBIOS in order to run the guest.
Sam
cheers, Gerd
Typo: the second descriptor line should read ddb.geometry.biosCylinders = "83257".
Sam
Hi,
Yes they are pretty rare. Windows 2000 and Windows XP guests migrated from VMware to QEMU/KVM would not boot due to incorrect disk geometries (some had 32/56 spt instead of 56; the number of heads was also not entirely correct).
Ok.
Why? Asking the user to deal with the mess is pretty lame if there are better options. And IMO doing this fully automatically in SeaBIOS is better.
I’m not against an automatic approach, however I do think that doing this in SeaBIOS might break compatibility for already existing guests that will suddenly see different LCHS values. (Explanation below)
I can't see how this can break guests. It should either have no effect (guests using LBA) or unbreak guests due to LCHS changing from "wrong" to "correct".
I’m not sure what you mean by "unbreak guests". If you change an existing guest that uses LCHS from 56 spt to LBA (63 spt), it will stop booting.
Well, that LCHS change happens because you move the guest from VMware to QEMU, and SeaBIOS uses 63 spt no matter what if the disk is too big for CHS addressing.
When SeaBIOS is changed to look at the MBR to figure out what the LCHS of the disk is, that will make your guest boot.
Your guessing algorithm will have to guess 56; if it fails to guess 56 correctly, the user cannot take any action besides downgrading SeaBIOS in order to run the guest.
Sure, if the guess is wrong then the guest will not boot. That isn't worse than the situation we have today, where SeaBIOS will not even try to figure out what the LCHS of the disk is.
And, no, downgrading SeaBIOS will not make your VMware guest with 56 spt boot.
cheers, Gerd
On 13 Jun 2019, at 12:38, Gerd Hoffmann kraxel@redhat.com wrote:
Hi,
Yes they are pretty rare. Windows 2000 and Windows XP guests migrated from VMware to QEMU/KVM would not boot due to incorrect disk geometries (some had 32/56 spt instead of 56; the number of heads was also not entirely correct).
Ok.
Why? Asking the user to deal with the mess is pretty lame if there are better options. And IMO doing this fully automatically in SeaBIOS is better.
I’m not against an automatic approach, however I do think that doing this in SeaBIOS might break compatibility for already existing guests that will suddenly see different LCHS values. (Explanation below)
I can't see how this can break guests. It should either have no effect (guests using LBA) or unbreak guests due to LCHS changing from "wrong" to "correct".
I’m not sure what you mean by "unbreak guests". If you change an existing guest that uses LCHS from 56 spt to LBA (63 spt), it will stop booting.
Well, that LCHS change happens because you move the guest from VMware to QEMU, and SeaBIOS uses 63 spt no matter what if the disk is too big for CHS addressing.
When SeaBIOS is changed to look at the MBR to figure out what the LCHS of the disk is, that will make your guest boot.
See below
Your guessing algorithm will have to guess 56; if it fails to guess 56 correctly, the user cannot take any action besides downgrading SeaBIOS in order to run the guest.
Sure, if the guess is wrong then the guest will not boot. That isn't worse than the situation we have today, where SeaBIOS will not even try to figure out what the LCHS of the disk is.
And, no, downgrading SeaBIOS will not make your VMware guest with 56 spt boot.
I’m not talking about the vmware case here. If you introduce MBR guessing into SeaBIOS and change its default behaviour, you risk breaking operating systems such as Windows XP / 2003 / 2000 that were created on QEMU.
Example:
Consider a Windows XP guest that works with the following geometries on standard QEMU/SeaBIOS today. The disk is very large, therefore INT13 AH=08 reports:
255 heads, 63 spt
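For reference, the 255/63 values above are what a conventional LBA-assisted translation produces for a large disk. The sketch below is only an illustration under assumptions, not SeaBIOS code: the function name is made up, and the head thresholds follow the commonly documented LBA-assist table, which may differ in detail from what SeaBIOS actually implements. The 16777216-sector size is the one quoted in the partition table example further down.

```python
def lba_assist_geometry(total_sectors):
    """Return (cylinders, heads, spt) under a conventional LBA-assisted
    translation. Thresholds are an assumption, not copied from SeaBIOS."""
    spt = 63                       # this translation always reports 63 spt
    tracks = total_sectors // spt
    ratio = tracks // 1024         # heads needed to stay within 1024 cylinders
    if ratio > 128:
        heads = 255
    elif ratio > 64:
        heads = 128
    elif ratio > 32:
        heads = 64
    elif ratio > 16:
        heads = 32
    else:
        heads = 16
    cylinders = min(tracks // heads, 1024)   # INT13 CHS tops out at 1024 cylinders
    return cylinders, heads, spt


print(lba_assist_geometry(16777216))   # large disk
print(lba_assist_geometry(1000000))    # small disk
```

Whatever the disk size, the spt in this scheme is fixed at 63, which is why a guest installed with 56 spt cannot be satisfied by it.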
Now you change SeaBIOS to guess from the MBR. In some cases the MBR guess can be incorrect so now SeaBIOS will guess:
255 heads, 62 spt
The guest no longer boots with these geometries and you broke compatibility. Can there be a guest that will fail the MBR in such a way? Yes. Look at the following MBR partition table of a Windows XP guest in our production environment:
Disk size in sectors: 16777216
Binary (only one partition, 16 bytes): 80 01 01 00 07 fe ff ff 3f 00 00 00 d5 ea ff 00
Start: (0, 1, 1, 63)
End: (1023, 254, 63, 16771859)
As can be easily seen, any MBR guessing algorithm should guess:
255 heads (since a value of 254 appears), 63 spt (since a value of 63 appears)
Turns out that this image does not work with 255, 63 but actually requires
16 heads, 63 spt
to boot.
So relying on MBR partitions alone is not always enough and sometimes manual intervention is required.
(VMware solves this by specifying 16 heads, 63 spt in the descriptor file, overriding its default guessing algorithm, which also fails here)
(By the way this is not a VMware specific problem, the disk itself was imported to VMware in a P2V scenario, so that probably explains why the ddb.geometry.bios* values appear in the VMDK in the first place)
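To make the partition entry above easier to check, here is a minimal sketch that decodes the 16 bytes using the standard MBR entry layout and then applies a naive guess (largest head + 1, largest sector). The function names and the guessing rule are hypothetical illustrations, not the algorithm of SeaBIOS, QEMU or VMware.

```python
import struct


def decode_chs(raw):
    """Decode the packed 3-byte CHS field of an MBR partition entry."""
    head = raw[0]
    sector = raw[1] & 0x3F                       # low 6 bits
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]   # 2 high bits + 8 low bits
    return cylinder, head, sector


def decode_entry(entry):
    """Decode one 16-byte MBR partition entry into a dict."""
    status, chs_start, ptype, chs_end, lba_start, count = struct.unpack(
        "<B3sB3sII", entry)
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "start": decode_chs(chs_start),
        "end": decode_chs(chs_end),
        "lba_start": lba_start,
        "lba_end": lba_start + count - 1,
    }


def naive_guess(part):
    """Hypothetical guess: heads = end head + 1, spt = end sector."""
    _, end_head, end_sector = part["end"]
    return end_head + 1, end_sector


ENTRY = bytes.fromhex("80 01 01 00 07 fe ff ff 3f 00 00 00 d5 ea ff 00")
part = decode_entry(ENTRY)
print(part["start"], part["lba_start"])
print(part["end"], part["lba_end"])
print(naive_guess(part))
```

The decoded values match the Start/End tuples quoted above, and the naive guess comes out as 255 heads, 63 spt, while the image actually needs 16 heads, 63 spt.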
cheers, Gerd
Hi,
Can there be a guest that will fail the MBR in such a way? Yes. Look at the following MBR partition table of a Windows XP guest in our production environment:
Disk size in sectors: 16777216
Binary (only one partition, 16 bytes): 80 01 01 00 07 fe ff ff 3f 00 00 00 d5 ea ff 00
Start: (0, 1, 1, 63)
End: (1023, 254, 63, 16771859)
As can be easily seen, any MBR guessing algorithm should guess:
255 heads (since a value of 254 appears), 63 spt (since a value of 63 appears)
Turns out that this image does not work with 255, 63 but actually requires
16 heads, 63 spt
to boot.
So relying on MBR partitions alone is not always enough and sometimes manual intervention is required.
Ok, given that seabios has no setup, any manual configuration needs to be done via qemu.
But why do we need a new interface for that? IDE can pass the geometry to the guest. virtio-blk has support too (VIRTIO_BLK_F_GEOMETRY). Likewise scsi (MODE_PAGE_HD_GEOMETRY). So this should be doable without any qemu changes.
cheers, Gerd
On 14 Jun 2019, at 7:43, Gerd Hoffmann kraxel@redhat.com wrote:
Hi,
Can there be a guest that will fail the MBR in such a way? Yes. Look at the following MBR partition table of a Windows XP guest in our production environment:
Disk size in sectors: 16777216
Binary (only one partition, 16 bytes): 80 01 01 00 07 fe ff ff 3f 00 00 00 d5 ea ff 00
Start: (0, 1, 1, 63)
End: (1023, 254, 63, 16771859)
As can be easily seen, any MBR guessing algorithm should guess:
255 heads (since a value of 254 appears), 63 spt (since a value of 63 appears)
Turns out that this image does not work with 255, 63 but actually requires
16 heads, 63 spt
to boot.
So relying on MBR partitions alone is not always enough and sometimes manual intervention is required.
Ok, given that seabios has no setup, any manual configuration needs to be done via qemu.
But why do we need a new interface for that? IDE can pass the geometry to the guest. virtio-blk has support too (VIRTIO_BLK_F_GEOMETRY). Likewise scsi (MODE_PAGE_HD_GEOMETRY). So this should be doable without any qemu changes.
This was indeed considered (all 3 methods) but it has the following issues:
Physical geometries of devices must now also be logical geometries with translation=none. When the OS queries these devices, it will now see different physical geometries, adapted to be logical geometries. I’m not even sure how to implement this without breaking existing compatibility, since we don’t want to affect the logical geometries of currently used guests.
MODE_PAGE_HD_GEOMETRY does not contain the spt, only cylinders (as a 3-byte number) and heads (as a 1-byte number); the spt is computed using number_of_total_sectors / (heads * cylinders). This means that qemu must now report number_of_total_sectors as heads * cylinders * spt for SeaBIOS to correctly receive the spt. This is not optimal, since number_of_total_sectors then can not reflect the real number of sectors in the disk, which should be reported from CDB_CMD_READ_CAPACITY.
Moving a scsi-hd/virtio-blk with 255 physical heads to ide-hd, we will still need to report 255 heads. This is possible since a whole byte can be used in the “ide identify” command, but goes against the spec of a maximum of 16 heads for IDE.
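The MODE_PAGE_HD_GEOMETRY point can be illustrated with a little arithmetic. This is a sketch under assumptions: the spt is derived with integer division, the geometry is the 16 heads / 63 spt required by the Windows XP image discussed earlier in the thread, and the cylinder count is taken as 1024 (the INT13 CHS limit) purely for illustration. The function name is hypothetical.

```python
def derived_spt(total_sectors, heads, cylinders):
    """spt as a consumer of the mode page would derive it
    (integer division is an assumption)."""
    return total_sectors // (heads * cylinders)


REAL_TOTAL = 16777216                  # real capacity of the example disk
HEADS, CYLINDERS, SPT = 16, 1024, 63   # desired geometry (cylinder cap assumed)

# Reporting the real capacity gives the wrong spt.
print(derived_spt(REAL_TOTAL, HEADS, CYLINDERS))

# Reporting heads * cylinders * spt gives the right spt ...
consistent_total = HEADS * CYLINDERS * SPT
print(derived_spt(consistent_total, HEADS, CYLINDERS))

# ... but then the reported capacity no longer matches the disk.
print(REAL_TOTAL - consistent_total)
```

Under these assumptions the two requirements conflict: either the derived spt is wrong, or the reported capacity falls short of the real one by millions of sectors.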
Overall this approach is much more complicated.
Sam
cheers, Gerd
Hi,
Ok, given that seabios has no setup, any manual configuration needs to be done via qemu.
But why do we need a new interface for that? IDE can pass the geometry to the guest. virtio-blk has support too (VIRTIO_BLK_F_GEOMETRY). Likewise scsi (MODE_PAGE_HD_GEOMETRY). So this should be doable without any qemu changes.
This was indeed considered (all 3 methods) but it has the following issues:
Physical geometries of devices must now also be logical geometries with translation=none.
Yes.
When the OS will query these devices - It will now see different physical geometries, adapted to be logical geometries.
Yes.
I’m not sure even how to implement this without breaking existing compatibility - since we don’t want to affect logical geometries of currently used guests.
We can copy the logic which calculates lchs from seabios to qemu and use it for pchs.
The tricky part of this is how to do the switch without requiring a lockstep update of seabios and qemu. seabios can't easily know whether it should use the current logic (current qemu) or whether it should simply use pchs with translation=none (updated qemu).
Hmm ...
MODE_PAGE_HD_GEOMETRY does not contain the spts, only cylinders (as 3 byte number) and heads (as 1 byte number) and computes the spts using:
Well, there also is MODE_PAGE_FLEXIBLE_DISK_GEOMETRY.
Moving a scsi-hd/virtio-blk with 255 physical heads to ide-hd, we will still need to report 255 heads - this is possible since a whole byte can be used in the “ide identify” command, but goes against the spec of a maximum of 16 heads for IDE.
Why do you want to migrate _to_ IDE?
Overall this approach is much more complicated.
Well, adding new fw_cfg interfaces has a long term maintenance cost. So there should be a pretty good reason for them.
cheers, Gerd
On 17 Jun 2019, at 9:50, Gerd Hoffmann kraxel@redhat.com wrote:
Hi,
Ok, given that seabios has no setup, any manual configuration needs to be done via qemu.
But why do we need a new interface for that? IDE can pass the geometry to the guest. virtio-blk has support too (VIRTIO_BLK_F_GEOMETRY). Likewise scsi (MODE_PAGE_HD_GEOMETRY). So this should be doable without any qemu changes.
This was indeed considered (all 3 methods) but it has the following issues:
Physical geometries of devices must now also be logical geometries with translation=none.
Yes.
When the OS will query these devices - It will now see different physical geometries, adapted to be logical geometries.
Yes.
I’m not sure even how to implement this without breaking existing compatibility - since we don’t want to affect logical geometries of currently used guests.
We can copy the logic which calculates lchs from seabios to qemu and use it for pchs.
The tricky part of this is how to do the switch without requiring a lockstep update of seabios and qemu. seabios can't easily know whether it should use the current logic (current qemu) or whether it should simply use pchs with translation=none (updated qemu).
Hmm ...
MODE_PAGE_HD_GEOMETRY does not contain the spts, only cylinders (as 3 byte number) and heads (as 1 byte number) and computes the spts using:
Well, there also is MODE_PAGE_FLEXIBLE_DISK_GEOMETRY.
Moving a scsi-hd/virtio-blk with 255 physical heads to ide-hd, we will still need to report 255 heads - this is possible since a whole byte can be used in the “ide identify” command, but goes against the spec of a maximum of 16 heads for IDE.
Why do you want to migrate _to_ IDE?
Even without migration, most IDE disks under SeaBIOS today probably report 255 heads and 63 spt due to LBA translation, while exposing at most 16 physical heads (IDE spec). So you can’t really report the logical head count you want (255) in the ATA identify command.
This can be solved in a very complicated way:
For virtio-blk disks - report bios geometries as physical geometries. This might break current compatibility (showing different physical geometries).
For scsi disks - report bios geometries as physical geometries. Implement MODE_PAGE_FLEXIBLE_DISK_GEOMETRY and translation=none - this new interface will help with compatibility.
For IDE disks - specially craft valid physical geometries (heads <= 16) with a specific translation. This is super complicated: for example, to make an IDE disk report an lchs of 32 heads, 56 spt, you need a physical geometry of 16 heads, 56 spt and 2046 cylinders with a "large" translation, which will effectively halve the number of cylinders to 1023 and double the heads to 32, achieving the desired lchs. Also, we can not make an IDE disk report 255 heads with 56 spt with any translation (this is an actual value from production), so the disk must be moved to scsi-hd/virtio-blk, which also breaks compatibility.
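The IDE case can be sketched as follows. This is a simplified model of a bit-shift ("large") translation, assuming it halves the cylinders and doubles the heads until the cylinder count fits in 1024; the function name and stopping rule are illustrative and not copied from SeaBIOS.

```python
def large_translation(cylinders, heads, spt):
    """Simplified bit-shift translation: halve cylinders and double heads
    while the cylinder count exceeds 1024 (assumed stopping rule)."""
    while cylinders > 1024 and heads * 2 <= 255:
        cylinders >>= 1
        heads <<= 1
    return cylinders, heads, spt


# The example from the text: 2046/16/56 physical -> 1023/32/56 logical.
print(large_translation(2046, 16, 56))

# Head counts reachable by doubling from a spec-compliant physical
# head count (1..16): 255 is never among them.
reachable = {h << k for h in range(1, 17) for k in range(5) if (h << k) <= 255}
print(255 in reachable)
```

In this model the spt is passed through unchanged and the head count can only be a physical value (at most 16) times a power of two, so 255 heads with 56 spt is out of reach. LBA translation does yield 255 heads, but only together with 63 spt.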
This implementation creates 3 different non-conventional (PV) ways of reporting lchs directly between Qemu and SeaBIOS, which adds a lot of technical debt for resolving a legacy issue. A fw-cfg value named “bootdevices” (or “bios-geometry”) is much more straightforward and makes it more readable/explicit.
WDYT?
Overall this approach is much more complicated.
Well, adding new fw_cfg interfaces has a long term maintenance cost. So there should be a pretty good reason for them.
cheers, Gerd