Hi,
On Wed, Oct 24, 2012 at 12:15:17PM +0200, Stefan Hajnoczi wrote:
On Wed, Oct 24, 2012 at 10:06 AM, liu ping fan <qemulist@gmail.com> wrote:
On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote:
+static void dimm_populate(DimmDevice *s)
+{
+    DeviceState *dev = (DeviceState *)s;
+    MemoryRegion *new = NULL;
+
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, dev->id, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+}
+
+static void dimm_depopulate(DimmDevice *s)
+{
+    assert(s);
+    vmstate_unregister_ram(s->mr, NULL);
+    memory_region_del_subregion(get_system_memory(), s->mr);
+    memory_region_destroy(s->mr);
+    s->mr = NULL;
+}
How is dimm hot unplug protected against callers who currently have RAM mapped (from cpu_physical_memory_map())?
Emulated devices call cpu_physical_memory_map() directly or indirectly through DMA emulation code. The RAM pointer may be held for arbitrary lengths of time, across main loop iterations, etc.
It's not clear to me that it is safe to unplug a DIMM that has network or disk I/O buffers, for example. We also need to be robust against malicious guests who abuse the hotplug lifecycle. QEMU should never be left with dangling pointers.
Not sure about the block layer. But I think those threads already run outside the big lock, so there should be a MemoryListener to catch the RAM-unplug event and, if needed, call bdrv_flush.
Do we want bdrv_flush, or some kind of cancel request, e.g. bdrv_aio_cancel?
Here is the detailed scenario:
1. Emulated device does cpu_physical_memory_map() and gets a pointer
   to guest RAM.
2. Return to vcpu or iothread, continue processing...
3. Hot unplug of RAM causes the guest RAM to disappear.
4. Pending I/O completes and overwrites memory from dangling guest RAM
   pointer.
Any I/O device that does zero-copy I/O in QEMU faces this problem:
- The block layer is affected.
- The net layer is unaffected because it doesn't do zero-copy tx/rx
across returns to the main loop (#2 above).
- Not sure about other device classes (e.g. USB).
How should the MemoryListener callback work? For block I/O it may not be possible to cancel pending I/O asynchronously - if you try to cancel then your thread may block until the I/O completes.
E.g. paio_cancel does this. Is there already an API to asynchronously cancel all in-flight operations in a BlockDriverState? AFAICT block_job_cancel refers to streaming jobs only and doesn't help here.
Can we make the RAM unplug initiate async I/O cancellations, prevent further I/Os, and only free the memory in a callback, after all DMA I/O to the associated memory region has been cancelled or completed?
Also, IIUC, the MemoryListener should be registered by users of cpu_physical_memory_map, e.g. hw/virtio.c.
By the way dimm_depopulate only frees the qemu memory on an ACPI _EJ request, which means that a well-behaved guest will have already offlined the memory and is not using it anymore. If the guest still uses the memory e.g. for a DMA buffer, the logical memory offlining will fail and the _EJ/qemu memory freeing will never happen.
But in theory a malicious acpi guest driver could trigger _EJ requests to do step 3 above.
Or perhaps the backing block driver can finish an I/O request for a zero-copy block device that the guest doesn't care about anymore? I'll think about this a bit more.
Synchronous cancel behavior is not workable since it can lead to poor latency or hangs in the guest.
ok
thanks,
- Vasilis