On Tue, Dec 18, 2012 at 10:41 AM, Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> wrote:
This is v4 of the ACPI memory hotplug functionality. Only x86_64 target is supported (both i440fx and q35). There are still several issues, but it's been a while since v3 and I wanted to get some more feedback on the current state of the patchseries.
We are working on memory hotplug functionality for the pSeries machine. I'm wondering whether and how we can better integrate things. Do you think the DIMM abstraction is generic enough to be used in other machine types?
Overview:
Dimm device layout is modeled with a normal qemu device:
"-device dimm,id=name,size=sz,node=pxm,populated=on|off,bus=membus.0"
How will this handle non-hotpluggable memory, for example the memory passed in the '-m' parameter?
The starting physical address for all dimms is calculated from top of memory, during memory controller init, skipping the pci hole at [PCI_HOLE_START, 4G). e.g. "-device dimm,id=dimm0,size=512M,node=0,populated=off,bus=membus.0" will define a 512M memory dimm belonging to numa node 0, on bus membus.0.
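As a rough illustration of the address calculation described above (not the code from the patches), the following Python sketch assigns start addresses to dimms from the top of initial memory and skips the PCI hole. The value of PCI_HOLE_START, the function name and the placement rule for a dimm that would straddle the hole are assumptions made for the example.

```python
# Hypothetical sketch; the PCI_HOLE_START value and all names are assumptions.
PCI_HOLE_START = 0xE0000000  # assumed start of the PCI hole
FOUR_GIB = 1 << 32           # end of the PCI hole


def assign_dimm_addresses(ram_size, dimm_sizes):
    """Return a start address for each dimm, in command-line order."""
    # Top of initial memory: RAM beyond the hole start is relocated above 4G.
    if ram_size <= PCI_HOLE_START:
        next_addr = ram_size
    else:
        next_addr = FOUR_GIB + (ram_size - PCI_HOLE_START)

    starts = []
    for size in dimm_sizes:
        # A dimm that would overlap [PCI_HOLE_START, 4G) is moved above 4G.
        if next_addr < FOUR_GIB and next_addr + size > PCI_HOLE_START:
            next_addr = FOUR_GIB
        starts.append(next_addr)
        next_addr += size
    return starts
```

With 1G of initial memory, a 512M dimm would start at 1G under these assumptions, and a following 4G dimm would be placed at 4G because it cannot fit below the hole.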
Because dimm layout needs to be configured on machine-boot, all dimm devices need to be specified on startup command line (either with populated=on or with populated=off). The dimm information is stored in dimm configuration structures.
After machine startup, dimms are hot-added or removed with normal device_add and device_del operations, e.g.:
Hot-add syntax: "device_add dimm,id=mydimm0,bus=membus.0"
Hot-remove syntax: "device_del dimm,id=mydimm0"
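For anyone scripting tests against this series, the option and monitor strings above can be generated with small helpers. This is only a sketch: the helper names are hypothetical, and the string formats are copied from the examples in this mail.

```python
# Hypothetical helpers; the string formats follow the examples in this mail.
def dimm_option(dimm_id, size, node, populated, bus="membus.0"):
    """Build the value passed to -device for one dimm."""
    state = "on" if populated else "off"
    return (f"dimm,id={dimm_id},size={size},node={node},"
            f"populated={state},bus={bus}")


def hot_add_command(dimm_id, bus="membus.0"):
    """Monitor command that hot-adds a dimm declared at startup."""
    return f"device_add dimm,id={dimm_id},bus={bus}"


def hot_remove_command(dimm_id):
    """Monitor command that hot-removes a dimm."""
    return f"device_del dimm,id={dimm_id}"
```

Every dimm still has to be declared on the startup command line, so the id given to hot_add_command must match one that was passed through dimm_option with populated=off.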
Changes v3->v4
- Dimms added with normal -device argument (extra -dimm arg dropped).
- multiple memory buses can be registered. Memory buses of the real
hw/chipset or a paravirtual memory bus can be added.
- acpi implementation uses memory API instead of old ioports.
- Support for q35/ich9 added (still buggy, see patch 12/31).
- piix4/i440fx initialization code has been refactored to resemble q35.
This will allow memory map initialization at chipset qdev init time for both machines, as well as more similar code.
- Hot-remove functionality has been moved to separate patches. Hot-remove
no longer frees memory but unmaps the dimm/qdev device from the guest's view. Freeing the memory should happen when the last user unrefs/unmaps the memory, see also (work in progress): https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00728.html https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02697.html
- new qmp/hmp command for the state of each dimm (on/off)
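The hot-remove semantics in the list above (unmap now, free when the last user drops its reference) can be pictured with a toy reference-counting model. This is a sketch under assumed names, not the memory API used by the patches.

```python
# Toy model; class and method names are hypothetical.
class DimmMemory:
    """Backing memory that outlives the guest mapping until the last unref."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # reference held by the guest mapping
        self.mapped = True
        self.freed = False

    def ref(self):
        """Another user (e.g. an in-flight I/O request) pins the memory."""
        if self.freed:
            raise RuntimeError("memory already freed")
        self.refs += 1

    def unref(self):
        """Drop one reference; the memory is freed when none remain."""
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def hot_remove(self):
        """Unmap from the guest's view and drop only the mapping's reference."""
        if self.mapped:
            self.mapped = False
            self.unref()
```

In this model, a hot-remove issued while another user still holds a reference leaves the memory allocated, and the final unref releases it.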
Changes v2->v3
- qdev integration. Dimms are attached to a dimmbus. The dimmbus is a child of i440fx device in the pc machine. Hot-add and remove are done with
normal device_add / device_del operations on the dimmbus. New commands "dimm_add" and "dimm_del" are obsolete.
- Add _PS3 method to allow OSPM-induced hot operations.
- pci-window calculation in Seabios takes dimms into account (for both 32-bit and 64-bit windows)
- rename new qmp commands: query-memory-total and query-memory-hotplug
- balloon driver can see the hotplugged memory
Changes v1->v2
- memory map is automatically calculated for hotplug dimms. Dimms are
added from top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
- Renamed from "-memslot" to "-dimm". Commands changed to "dimm_add",
"dimm_del"
- Seabios ejection array reduced to a byte. Use extraction macros for dimm
ssdt.
- additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
- Documentation of new acpi_piix4 registers and paravirt data.
- add ACPI _OST support for _OST enabled guests. This allows qemu to
receive notification for success / failure of memory hot-add and hot-remove operations. Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
- add monitor info command to report total guest memory (initial +
hot-added)
Issues:
- hot-remove needs to only unmap the dimm device from guest's view.
Freeing the memory should happen when the last user of the device (e.g. virtio-blk) unrefs the device. A testcase is needed for this.
- Live Migration: Ramblocks are migrated before qdev VMStates are
migrated. So the DimmDevice is handled differently than other devices. Should this be reworked? (The DimmDevice structure currently does not define a VMStateDescription.) Live migration works as long as the dimm layout (command line args) is identical on the source and destination qemu command lines, and the destination takes into account hot-operations that have occurred on the source. (v3 patch 10/19 created the DimmDevice that corresponds to an unknown incoming ramblock, e.g. for a dimm that was hot-added on the source, but it has been dropped for the moment.)
- A main blocker issue is Windows guest functionality. The patchset does
not work for Windows currently. Testing on win2012 server RC or windows2008 consumer prerelease: when adding a DIMM, there is a BSOD with an ACPI_BIOS_ERROR message. After this, the VM keeps rebooting with ACPI_BIOS_ERROR. The Windows pnpmem driver obviously has a problem with the seabios dimm implementation (or the seabios dimm implementation is not fully ACPI-compliant). If someone can review the seabios patches or has any ideas on how to debug this, let me know.
- hot-operation notification lists need to be added to migration state.
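For the live migration issue above, one possible reading of "destination takes into account hot-operations" is that the destination keeps the same dimm layout but starts each dimm with the populated state it had on the source at migration time. The sketch below assumes that reading; the function name and the data shapes are hypothetical.

```python
# Hypothetical sketch of one reading of the migration requirement above.
def destination_dimm_state(startup_layout, hot_ops):
    """Compute the populated state the destination should start with.

    startup_layout: dict mapping dimm id -> populated (bool), as given on
                    the source command line.
    hot_ops:        sequence of ("add" | "del", dimm_id) operations that
                    were performed on the source after boot.
    """
    state = dict(startup_layout)
    for op, dimm_id in hot_ops:
        if dimm_id not in state:
            # The layout must be identical, so unknown dimms are an error.
            raise KeyError(f"dimm {dimm_id!r} is not in the startup layout")
        if op == "add":
            state[dimm_id] = True
        elif op == "del":
            state[dimm_id] = False
        else:
            raise ValueError(f"unknown hot operation {op!r}")
    return state
```

The set of dimm ids never changes here, which mirrors the requirement that the layout itself be identical on both sides.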
series is based on:
- qemu master (commit a8a826a3) + patch:
https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02699.html
- seabios master (commit a810e4e7)
Can also be found at:
http://github.com/vliaskov/qemu-kvm/commits/memhp-v4 http://github.com/vliaskov/seabios/commits/memhp-v4
Vasilis Liaskovitis (21):
  qapi: make visit_type_size fallback to type_int
  Add SIZE type to qdev properties
  qemu-option: export parse_option_number
  Implement dimm device abstraction
  vl: handle "-device dimm"
  acpi_piix4 : Implement memory device hotplug registers
  acpi_ich9 : Implement memory device hotplug registers
  piix_pci and pc_piix: refactor
  piix_pci: Add i440fx dram controller initialization
  q35: Add i440fx dram controller initialization
  pc: Add dimm paravirt SRAT info
  Introduce paravirt interface QEMU_CFG_PCI_WINDOW
  Implement "info memory-total" and "query-memory-total"
  balloon: update with hotplugged memory
  Implement dimm-info
  dimm: add hot-remove capability
  acpi_piix4: add hot-remove capability
  acpi_ich9: add hot-remove capability
  Implement qmp and hmp commands for notification lists
  Add _OST dimm support
  Implement _PS3 for dimm

 docs/specs/acpi_hotplug.txt |   54 ++++++
 docs/specs/fwcfg.txt        |   28 +++
 hmp-commands.hx             |    6 +
 hmp.c                       |   41 ++++
 hmp.h                       |    3 +
 hw/Makefile.objs            |    2 +-
 hw/acpi.h                   |    5 +
 hw/acpi_ich9.c              |  115 +++++++++++-
 hw/acpi_ich9.h              |   12 +-
 hw/acpi_piix4.c             |  126 ++++++++++++-
 hw/dimm.c                   |  444 +++++++++++++++++++++++++++++++++++++++++++
 hw/dimm.h                   |  102 ++++++++++
 hw/fw_cfg.h                 |    1 +
 hw/lpc_ich9.c               |    2 +-
 hw/pc.c                     |   28 +++-
 hw/pc.h                     |    1 +
 hw/pc_piix.c                |   74 ++++++--
 hw/pc_q35.c                 |   18 ++-
 hw/piix_pci.c               |  249 ++++++++-----------------
 hw/q35.c                    |   27 +++
 hw/q35.h                    |    5 +
 hw/qdev-properties.c        |   60 ++++++
 hw/qdev-properties.h        |    3 +
 hw/virtio-balloon.c         |   13 +-
 monitor.c                   |   21 ++
 qapi-schema.json            |   63 ++++++
 qapi/qapi-visit-core.c      |   11 +-
 qemu-option.c               |    4 +-
 qemu-option.h               |    4 +
 qmp-commands.hx             |   57 ++++++
 sysemu.h                    |    1 +
 vl.c                        |   60 ++++++
 32 files changed, 1432 insertions(+), 208 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h
Vasilis Liaskovitis (9):
  Add ACPI_EXTRACT_DEVICE* macros
  Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug
  acpi: generate hotplug memory devices
  q35: Add memory hotplug handler
  pci: Use paravirt interface for pcimem_start and pcimem64_start
  acpi: add _EJ0 operation and eject port for memory devices
  Add _OST dimm method
  Implement _PS3 method for memory device

 Makefile                      |    2 +-
 src/acpi-dsdt-mem-hotplug.dsl |  136 +++++++++++++++++++++++++++++++++++
 src/acpi-dsdt.dsl             |    5 +-
 src/acpi.c                    |  158 +++++++++++++++++++++++++++++++++++++++--
 src/paravirt.c                |    6 ++
 src/paravirt.h                |    2 +
 src/pciinit.c                 |    9 +++
 src/q35-acpi-dsdt.dsl         |    6 +-
 src/ssdt-mem.dsl              |   73 +++++++++++++++++++
 tools/acpi_extract.py         |   28 +++++++
 10 files changed, 415 insertions(+), 10 deletions(-)
 create mode 100644 src/acpi-dsdt-mem-hotplug.dsl
 create mode 100644 src/ssdt-mem.dsl
-- 1.7.9