On Thu, Oct 25, 2012 at 1:16 AM, Vasilis Liaskovitis
<vasilis.liaskovitis(a)profitbricks.com> wrote:
> Hi,
>
> On Wed, Oct 24, 2012 at 12:15:17PM +0200, Stefan Hajnoczi wrote:
>> On Wed, Oct 24, 2012 at 10:06 AM, liu ping fan <qemulist(a)gmail.com> wrote:
>> > On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi <stefanha(a)gmail.com> wrote:
>> >> On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote:
>> >>> +static void dimm_populate(DimmDevice *s)
>> >>> +{
>> >>> + DeviceState *dev= (DeviceState*)s;
>> >>> + MemoryRegion *new = NULL;
>> >>> +
>> >>> + new = g_malloc(sizeof(MemoryRegion));
>> >>> + memory_region_init_ram(new, dev->id, s->size);
>> >>> + vmstate_register_ram_global(new);
>> >>> + memory_region_add_subregion(get_system_memory(), s->start, new);
>> >>> + s->mr = new;
>> >>> +}
>> >>> +
>> >>> +static void dimm_depopulate(DimmDevice *s)
>> >>> +{
>> >>> + assert(s);
>> >>> + vmstate_unregister_ram(s->mr, NULL);
>> >>> + memory_region_del_subregion(get_system_memory(), s->mr);
>> >>> + memory_region_destroy(s->mr);
>> >>> + s->mr = NULL;
>> >>> +}
>> >>
>> >> How is dimm hot unplug protected against callers who currently have RAM
>> >> mapped (from cpu_physical_memory_map())?
>> >>
>> >> Emulated devices call cpu_physical_memory_map() directly or indirectly
>> >> through DMA emulation code. The RAM pointer may be held for arbitrary
>> >> lengths of time, across main loop iterations, etc.
>> >>
>> >> It's not clear to me that it is safe to unplug a DIMM that has network
>> >> or disk I/O buffers, for example. We also need to be robust against
>> >> malicious guests who abuse the hotplug lifecycle. QEMU should never be
>> >> left with dangling pointers.
>> >>
>> > Not sure about the block layer. But I think those thread are already
>> > out of big lock, so there should be a MemoryListener to catch the
>> > RAM-unplug event, and if needed, bdrv_flush.
>
> do we want bdrv_flush, or some kind of cancel request e.g. bdrv_aio_cancel?
>
My original meaning is that flush out the dangling pointer.
>>
>> Here is the detailed scenario:
>>
>> 1. Emulated device does cpu_physical_memory_map() and gets a pointer
>> to guest RAM.
>> 2. Return to vcpu or iothread, continue processing...
>> 3. Hot unplug of RAM causes the guest RAM to disappear.
>> 4. Pending I/O completes and overwrites memory from dangling guest RAM pointer.
>>
>> Any I/O device that does zero-copy I/O in QEMU faces this problem:
>> * The block layer is affected.
>> * The net layer is unaffected because it doesn't do zero-copy tx/rx
>> across returns to the main loop (#2 above).
>> * Not sure about other devices classes (e.g. USB).
>>
>> How should the MemoryListener callback work? For block I/O it may not
>> be possible to cancel pending I/O asynchronously - if you try to
>> cancel then your thread may block until the I/O completes.
>
For current code, I think to block on the listener to wait for the
completion of flushing out. But after mr->ops's ref/unref patchset
accept, we can release the ref of RAM device after we have done with
it (it is a very raw idea, need to improve).
> e.g. paio_cancel does this?
> is there already an API to asynchronously cancel all in flight operations in a
> BlockDriverState? Afaict block_job_cancel refers to streaming jobs only and
> doesn't help here.
>
> Can we make the RAM unplug initiate async I/O cancellations, prevent further I/Os,
> and only free the memory in a callback, after all DMA I/O to the associated memory
> region has been cancelled or completed?
>
> Also iiuc the MemoryListener should be registered from users of
> cpu_physical_memory_map e.g. hw/virtio.c
>
Yes.
> By the way dimm_depopulate only frees the qemu memory on an ACPI _EJ request, which
> means that a well-behaved guest will have already offlined the memory and is not
> using it anymore. If the guest still uses the memory e.g. for a DMA buffer, the
> logical memory offlining will fail and the _EJ/qemu memory freeing will never
> happen.
>
Yes.
> But in theory a malicious acpi guest driver could trigger _EJ requests to do step
> 3 above.
>
> Or perhaps the backing block driver can finish an I/O request for a zero-copy
> block device that the guest doesn't care for anymore? I 'll think about this a
> bit more.
>
The guest is one of the users of dimm device, and block layer is another one.
Regards,
pingfan
>> Synchronous cancel behavior is not workable since it can lead to poor
>> latency or hangs in the guest.
>
> ok
>
> thanks,
>
> - Vasilis
>