Laszlo Ersek lersek@redhat.com writes:
On 01/04/18 11:24, Vitaly Kuznetsov wrote:
Laszlo Ersek lersek@redhat.com writes:
Is it possible that the current barrier() is not sufficient for the intended purpose in an L2 guest?
What happens if you drop your current patch, but replace
__asm__ __volatile__("": : :"memory")
in the barrier() macro definition, with a real, heavy-weight barrier, such as
__asm__ __volatile__("mfence": : :"memory")
(See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)
Thanks for the suggestion,
unfortunately, it doesn't change anything :-(
... I think running in L2 could play a role here; see "Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS"; from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory barriers", 2016-01-12).
See also the commit message.
I see, thank you.
It seems, however, that the issue here is not about barriers: first, it is 100% reproducible, and second, surrounding '*(volatile u32 *)addr = val' with all sorts of barriers doesn't help. I *think* there is some sort of mis-assumption about this memory, which is handled through vmexits, so both the L0 and L1 hypervisors get involved. More debugging ...
Thank you for your ideas,
- Do you see the issue with both legacy-only (0.9.5) and modern-only (1.0) virtio devices?
Asking about this because legacy and modern virtio devices use registers in different address spaces (IO vs. MMIO).
This only affects 'modern' virtio-blk-pci device which I'm using for boot.
In fact, the only writew() that needs patching is in vp_notify(); when I replace it with 'asm volatile' everything works.
- Does it make a difference if you disable EPT in the L1 KVM configuration? (EPT is probably primarily controlled by the CPU features exposed by L0 Hyper-V, and secondarily by the "ept" parameter of the "kvm_intel" module in L1.)
Asking about EPT because the virtio rings and descriptors are in RAM, accessing which in L2 should "normally" never trap to L1/L0. However (I *guess*), when those pages are accessed for the very first time in L2, they likely do trap, and then the EPT setting in L1 might make a difference.
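(For anyone reproducing this: the L1-side knob is the "ept" parameter of kvm_intel; a sketch of toggling it, assuming no VMs are running in L1:)

```shell
# In L1: reload kvm_intel with EPT disabled (stop all VMs first)
modprobe -r kvm_intel
modprobe kvm_intel ept=0

# Confirm the setting took effect
cat /sys/module/kvm_intel/parameters/ept   # expect 'N' (or '0' on some kernels)
```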
Disabling EPT helps!
- Somewhat relatedly, can you try launching QEMU in L1 with "-realtime mlock=on"?
This doesn't seem to make a difference.
I also tried tracing L1 KVM and the difference between working and non-working cases seems to be:
1) Working:
...
 <...>-51387 [014] 64765.695019: kvm_page_fault: address fe007000 error_code 182
 <...>-51387 [014] 64765.695024: kvm_emulate_insn: 0:eca87: 66 89 14 30
 <...>-51387 [014] 64765.695026: vcpu_match_mmio: gva 0xfe007000 gpa 0xfe007000 Write GPA
 <...>-51387 [014] 64765.695026: kvm_mmio: mmio write len 2 gpa 0xfe007000 val 0x0
 <...>-51387 [014] 64765.695033: kvm_entry: vcpu 0
 <...>-51387 [014] 64765.695042: kvm_exit: reason EPT_VIOLATION rip 0xeae17 info 181 306
 <...>-51387 [014] 64765.695043: kvm_page_fault: address f0694 error_code 181
 <...>-51387 [014] 64765.695044: kvm_entry: vcpu 0
...
2) Broken:
...
 <...>-38071 [014] 63385.241117: kvm_page_fault: address fe007000 error_code 182
 <...>-38071 [014] 63385.241121: kvm_emulate_insn: 0:ecffb: 66 89 06
 <...>-38071 [014] 63385.241123: vcpu_match_mmio: gva 0xfe007000 gpa 0xfe007000 Write GPA
 <...>-38071 [014] 63385.241124: kvm_mmio: mmio write len 2 gpa 0xfe007000 val 0x0
 <...>-38071 [014] 63385.241143: kvm_entry: vcpu 0
 <...>-38071 [014] 63385.241162: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xecffe info 0 800000f6
 <...>-38071 [014] 63385.241162: kvm_entry: vcpu 0
...
The 'kvm_emulate_insn' difference is actually the different versions of 'mov' we get with the current code versus my 'asm volatile' version (assuming 32-bit flat mode, '66 89 14 30' decodes to 'mov %dx,(%eax,%esi,1)' and '66 89 06' to 'mov %ax,(%esi)'). What makes me wonder is where the 'EXTERNAL_INTERRUPT' (seen only in the broken version) comes from.