On Sat, 13 Feb 2016 19:20:32 -0500 "Kevin O'Connor" kevin@koconnor.net wrote:
On Sat, Feb 13, 2016 at 01:57:09PM -0700, Alex Williamson wrote:
On Sat, 13 Feb 2016 15:05:09 -0500 "Kevin O'Connor" kevin@koconnor.net wrote:
On Sat, Feb 13, 2016 at 11:51:51AM -0700, Alex Williamson wrote:
On Sat, 13 Feb 2016 13:18:39 -0500 "Kevin O'Connor" kevin@koconnor.net wrote:
On Sat, Feb 13, 2016 at 08:12:09AM -0700, Alex Williamson wrote:
On Fri, 12 Feb 2016 21:49:04 -0500 "Kevin O'Connor" kevin@koconnor.net wrote:
On Fri, Feb 12, 2016 at 05:23:18PM -0700, Alex Williamson wrote:
Intel IGD makes use of memory allocated and marked reserved by the BIOS as a stolen memory range. For the most part, guest drivers don't make use of this, but our Achilles' heel is the vBIOS. The vBIOS programs the device to use the host stolen memory range and it's used in the pre-boot environment. Generally the guest won't have access to the host stolen memory area, so these accesses land either in VM memory or in unassigned space, where they generate IOMMU faults. By allocating this range in SeaBIOS and programming it into the device, QEMU (via vfio) can make sure this guest-allocated stolen memory range is used instead.
What does "vBIOS" mean in this context? Is it the video device option rom or something else?
vBIOS = video BIOS, you're correct, it's just the video device option ROM.
Is the problem from when the host runs the video option rom, or is the problem from the guest (via SeaBIOS) running the video option rom? If the guest is running the option rom, is it the first time the option rom has been run for the machine (ie, was the option rom not executed on the host when the host machine first booted)?
FWIW, many of the chromebooks use coreboot with Intel graphics and a number of users have installed SeaBIOS (running natively) on their machines. Running the intel video option rom more than once has been known to cause issues.
The issue is in the VM and it occurs every time the option ROM is executed. Standard VGA text mode displays fine (e.g., the SeaBIOS version string and boot menu), but any sort of extended graphics mode (e.g., a live CD boot menu) tries to make use of the host memory area that corresponds to the stolen memory area of the physical device. We're not really sure how the ROM execution arrives at these addresses (not from the device, according to access traces), but we can see when the ROM is writing these addresses to the device and modify the addresses to point at a VM memory range which we've allocated. That's what this code attempts to do: allocate a buffer and tell QEMU about it via the BDSM (Base Data of Stolen Memory) register.
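A minimal C sketch of the firmware-side step described above: allocate a guest buffer for stolen memory, mark it reserved in the e820 map so the guest OS leaves it alone, and write its address to the BDSM register at config offset 0x5C. This is not SeaBIOS's actual code; the config-space array, the e820 table, `alloc_high_aligned()`, the 2 GiB low-RAM top and the 1 MiB alignment are all hypothetical stand-ins. The stolen memory size is simply passed in by the caller; how firmware learns it is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 256-byte image of the IGD's PCI config space. */
#define IGD_BDSM      0x5C  /* Base Data of Stolen Memory */
#define E820_RESERVED 2

static uint8_t igd_config[256];

/* PCI config space is little-endian; write/read one aligned dword. */
static void cfg_writel(uint8_t off, uint32_t val)
{
    assert(off % 4 == 0);
    for (int i = 0; i < 4; i++)
        igd_config[off + i] = (uint8_t)(val >> (8 * i));
}

static uint32_t cfg_readl(uint8_t off)
{
    assert(off % 4 == 0);
    uint32_t val = 0;
    for (int i = 0; i < 4; i++)
        val |= (uint32_t)igd_config[off + i] << (8 * i);
    return val;
}

/* Hypothetical e820 table so the guest OS never treats the buffer as free RAM. */
struct e820_entry { uint64_t start, size; uint32_t type; };
static struct e820_entry e820_map[8];
static int e820_count;

static int e820_reserve(uint64_t start, uint64_t size)
{
    if (e820_count >= 8)
        return -1;
    e820_map[e820_count++] = (struct e820_entry){ start, size, E820_RESERVED };
    return 0;
}

/* Hypothetical allocator: carve aligned blocks downward from the top of low RAM. */
static uint32_t low_ram_top = 0x80000000u;  /* assume 2 GiB below 4 GiB */

static uint32_t alloc_high_aligned(uint32_t size, uint32_t align)
{
    if (size == 0 || size > low_ram_top)
        return 0;
    uint32_t base = (low_ram_top - size) & ~(align - 1);
    low_ram_top = base;
    return base;
}

/* Allocate guest "stolen" memory, reserve it, and publish it via BDSM. */
static int igd_setup_stolen(uint32_t stolen_size)
{
    uint32_t base = alloc_high_aligned(stolen_size, 1024 * 1024);
    if (!base || e820_reserve(base, stolen_size) < 0)
        return -1;
    /* Under vfio this write lands in a virtualized copy of the register. */
    cfg_writel(IGD_BDSM, base);
    return 0;
}
```

For example, `igd_setup_stolen(32u << 20)` with the assumed 2 GiB RAM top publishes 0x7E000000 in BDSM and adds one reserved e820 entry.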
Forgive me if I'm not fully understanding this. If I read what you're saying correctly, the sequence is something like:
1 - the host system bios (or vgabios) programs the GTT/stolen memory base register at host system bootup time and reserves it in the host e820 map.
2 - upon running qemu, the guest reruns the vga bios option rom which seems to work (ie, text mode works)
3 - in the guest, upon running a bootloader that uses graphics mode, the bootloader calls the vgabios to switch to graphics mode, and the vgabios sends commands to the graphics hardware that somehow reference the host stolen memory
What exactly happens here isn't clear to me, but this is a plausible explanation. What we see in tracing access to the hardware is that a bunch of addresses are written to the device that fall within the host e820 reserved area and then the device starts generating IOMMU faults trying to access those addresses.
4 - your patch causes QEMU to catch these commands with references to the host stolen memory and replace them with references to the guest stolen memory (which seabios creates)
Am I understanding the above correctly?
Yes.
Is the only reason to run the intel option rom in the guest for bootloader graphic mode support? Last time I looked, the intel vga hardware could fully emulate a legacy vga device - if the device is in vga compatibility mode then it may be possible to have seavgabios drive mode changes.
I have a SandyBridge based laptop (Lenovo W520) where the LCD panel won't turn on without the vBIOS.
This confuses me - why didn't the host system BIOS turn on the LCD panel during host bootup?
It turns off when we reset the device between VM instances or between VM boots. IGD supports Function Level Reset (FLR).
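For conventional PCI functions, FLR can be advertised through the PCI Advanced Features (AF) capability (ID 0x13), where bit 1 of the AF Capabilities byte reports FLR support and bit 0 of the AF Control byte initiates it. The sketch below walks the capability list of a config-space image and requests an FLR that way. Whether a given IGD generation exposes FLR through AF rather than another capability is an assumption here, and on a real host the reset is issued by the host kernel through vfio, not by the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34
#define PCI_CAP_ID_AF       0x13  /* PCI Advanced Features */
#define PCI_AF_CAP          3     /* AF Capabilities byte */
#define PCI_AF_CAP_FLR      0x02
#define PCI_AF_CTRL         4     /* AF Control byte */
#define PCI_AF_CTRL_FLR     0x01

/* Walk the capability list in a config-space image; 0 means not found. */
static uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    for (int ttl = 48; pos && ttl > 0; ttl--) {  /* bound malformed, looping lists */
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xFC;
    }
    return 0;
}

/* Request an FLR if the function advertises it through the AF capability.
   Real code must then give the reset time to complete before touching the device. */
static bool af_request_flr(uint8_t cfg[256])
{
    uint8_t af = pci_find_cap(cfg, PCI_CAP_ID_AF);
    if (af == 0 || af > 256 - 6)  /* the AF capability is 6 bytes long */
        return false;
    if (!(cfg[af + PCI_AF_CAP] & PCI_AF_CAP_FLR))
        return false;
    cfg[af + PCI_AF_CTRL] |= PCI_AF_CTRL_FLR;  /* a config write on real hardware */
    return true;
}
```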
Another desktop IvyBridge system doesn't really care about the vBIOS so long as we don't ask it to output anything before the guest native drivers are loaded. If we could, I think we'd enable the vBIOS only for laptop panel support, but that's really not an option: it's going to run as a boot option ROM as well, so we need to fix the issues it generates there.
From my experience with coreboot, running the vga option rom multiple times during a given boot is very fragile. (By multiple times, I mean either the host running it and then a guest, or running it multiple times from multiple guests.) YMMV.
We do this regularly for graphics assignment: Nvidia, AMD, and now Intel. It generally works OK. Perhaps you've seen issues with the option ROM being run multiple times without resetting the device; I could certainly believe that. We only have one blacklisted Broadcom ROM in vfio, probably due to a missing or incomplete device reset method.
[...]
The write to 0x5C is used by QEMU code that traps writes to the device's I/O port BAR and replaces the host stolen memory address with the new guest address generated here. 0x5C is initialized to 0x0 by the kernel vfio code, so we can detect whether it has been written. If it has not been written, QEMU has no place to redirect stolen memory accesses to, and they will land either in VM memory or in an unassigned area. The former may corrupt VM memory; the latter throws host errors. We could have QEMU halt with a hardware error if 0x5C hasn't been programmed.
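The redirection described above amounts to an address translation on each trapped write. A minimal sketch, assuming the trapped value is a 4 KiB page address with flag bits in bits 11:0 (the thread does not spell out the format the vBIOS writes): values inside the host stolen range are rebased onto the guest buffer, everything else passes through, and a still-zero virtual BDSM is reported as an error so the caller can halt. `struct stolen_remap` and `remap_stolen()` are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state for the trapped I/O-BAR write path. */
struct stolen_remap {
    uint32_t host_base;   /* host stolen memory base (physical BDSM) */
    uint32_t size;        /* stolen memory size */
    uint32_t guest_base;  /* guest-written virtual BDSM; vfio initializes it to 0 */
};

/* Rewrite one trapped dword. Assumes 4 KiB page addresses with flag bits in
   bits 11:0. Returns false if the value targets host stolen memory but the
   guest BIOS never programmed BDSM (QEMU could raise a hardware error here). */
static bool remap_stolen(const struct stolen_remap *r, uint32_t val, uint32_t *out)
{
    uint32_t addr = val & ~0xFFFu;
    uint32_t flags = val & 0xFFFu;

    if (addr < r->host_base || addr - r->host_base >= r->size) {
        *out = val;  /* not a stolen-memory reference: pass it through */
        return true;
    }
    if (r->guest_base == 0)
        return false;
    *out = (r->guest_base + (addr - r->host_base)) | flags;
    return true;
}
```

With an illustrative host range at 0xCB000000, a write of 0xCB001001 becomes 0x7E001001 once the guest has programmed BDSM to 0x7E000000.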
So, if I understand correctly, 0x5C is not a "real" register on the hardware, but is instead just a mechanism to give QEMU the address of some guest visible ram?
It is a real register, BDSM, which vfio virtualizes, effectively turning it into a scratch register. On physical hardware the register is read-only.
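Turning a read-only register into a scratch register can be pictured as a per-VM shadow that intercepts accesses at fixed offsets. In the hypothetical sketch below, reads and writes of BDSM (0x5C) and ASLS (0xFC) hit shadow copies that start at zero, while every other dword reaches a stand-in hardware array. The real trap lives in the kernel vfio code; `struct vcfg` and its accessors are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define IGD_BDSM 0x5C  /* Base Data of Stolen Memory: read-only on hardware */
#define IGD_ASLS 0xFC  /* ASL Storage (OpRegion address): read-only on hardware */

/* Hypothetical virtual config space: most dwords pass through to the device,
   BDSM and ASLS are served from per-VM shadow copies that start at 0. */
struct vcfg {
    uint32_t hw[64];       /* stand-in for the physical config space */
    uint32_t shadow_bdsm;
    uint32_t shadow_asls;
};

/* Pick the backing storage for an aligned dword offset. */
static uint32_t *vcfg_slot(struct vcfg *v, uint8_t off)
{
    assert(off % 4 == 0);
    switch (off) {
    case IGD_BDSM: return &v->shadow_bdsm;
    case IGD_ASLS: return &v->shadow_asls;
    default:       return &v->hw[off / 4];
    }
}

static uint32_t vcfg_readl(struct vcfg *v, uint8_t off) { return *vcfg_slot(v, off); }
static void vcfg_writel(struct vcfg *v, uint8_t off, uint32_t val) { *vcfg_slot(v, off) = val; }
```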
BTW, is 0xFC a "real" register in the hardware? How does the guest find the location of the "OpRegion" if SeaBIOS allocates it?
0xFC is the ASL Storage register, the guest finds the location of the OpRegion using this register. This is another register that is read-only on real hardware but virtualized through vfio so we can relocate the OpRegion into the VM address space.
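On the guest side, a driver reads ASLS and checks that the memory it points to starts with the OpRegion signature `IntelGraphicsMem`; the 16-byte signature and 8 KiB size follow the Intel IGD OpRegion layout as used by, for example, the Linux i915 driver. The guest RAM array, its base address and `opregion_from_asls()` are hypothetical, a way to show the lookup without mapping real physical memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"  /* 16 bytes, not NUL-terminated in memory */
#define OPREGION_SIG_LEN   16
#define OPREGION_SIZE      (8 * 1024)

/* Hypothetical guest-physical memory window starting at guest_ram_base. */
static uint8_t guest_ram[64 * 1024];
static const uint32_t guest_ram_base = 0x7F000000u;

/* Resolve ASLS to an OpRegion pointer; NULL if unset, out of range, or unsigned. */
static const uint8_t *opregion_from_asls(uint32_t asls)
{
    if (asls == 0)
        return NULL;  /* ASLS never programmed */
    if (asls < guest_ram_base ||
        asls - guest_ram_base > sizeof(guest_ram) - OPREGION_SIZE)
        return NULL;  /* the whole 8 KiB region must fit in the window */
    const uint8_t *p = guest_ram + (asls - guest_ram_base);
    if (memcmp(p, OPREGION_SIGNATURE, OPREGION_SIG_LEN) != 0)
        return NULL;
    return p;
}
```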
I've found that allocating a dummy MMIO BAR does work as an alternative for mapping space for this stolen memory into the VM address space. For a Linux guest it works to allocate BAR5 on the IGD device. Windows 10 is not so happy with this, but does work if I allocate the BAR on something like the ISA bridge device. My guess is that the IGD driver in Windows freaks out at finding this strange new BAR on its device. So I'll need to come up with an algorithm for either creating a dummy PCI device to host this BAR or trying to add it to other existing devices. It's certainly a more self-contained solution this way, so I expect we'll only need patch 1/3 from this series. Thanks,
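A dummy MMIO BAR only works if the emulated register behaves like a real one during BAR sizing: address bits below the BAR size must read back as zero. A minimal sketch of that device-side behavior for a 32-bit, non-prefetchable memory BAR, assuming a power-of-two size of at least 16 bytes; `struct fake_bar` and its accessors are hypothetical, not QEMU's BAR emulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulated 32-bit, non-prefetchable memory BAR. */
struct fake_bar {
    uint32_t size;  /* must be a power of two, >= 16 */
    uint32_t reg;   /* current register value */
};

static void fake_bar_write(struct fake_bar *b, uint32_t val)
{
    assert(b->size >= 16 && (b->size & (b->size - 1)) == 0);
    /* Address bits below the size are hard-wired to zero; bits 3:0 therefore
       read as 0 (memory space, 32-bit, non-prefetchable). */
    b->reg = val & ~(b->size - 1);
}

static uint32_t fake_bar_read(const struct fake_bar *b) { return b->reg; }
```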
Okay. (I'm not saying patch 3 is bad, but okay.)
If you go through the trouble of mapping the BDSM through a PCI BAR, then why not do the same with ASLS too?
I suppose we could do that. There are a few nuances to the fake BAR solution:
The BAR needs to get mapped and not remapped while in use - usually not a problem.
The guest needs to not disable the device we attach the BARs to, which it might do if it doesn't recognize the device.
We need to be careful about adding BARs to devices the guest does have drivers for or we might overlap real functionality.
If we create a dummy device with bogus IDs, it will show up with an exclamation mark in device manager, which makes people unhappy.
So from a perspective of being self contained, the fake BAR solution is very good, but it's not without issue. I'll try to think of what sort of dummy device we could create that would always have a guest driver, but nothing that a couple extra BARs would interfere with. Maybe a generic PCI bridge. Thanks,
Okay. Again, I'm not stating a preferred direction.
BTW, I wonder if the recent discussion between Michael and Igor is relevant here: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg05602.html
I'm certainly open to rebuttals against this approach, but I do have it working. Being entirely self-contained is pretty intriguing. Theoretically this would allow us to work with OVMF with no modifications. Linux guests enable and disable devices several times during boot (per the spec, memory decode should be disabled any time a BAR is sized); Windows never seems to disable the device. The LPC/ISA bridge seems to be the best place to put these BARs: we need to create that anyway for pre-Broadwell/Skylake, and the device itself has no implemented BARs. The ISA bridge is just a shell device to keep the driver happy on those older chips, so squatting on a couple of BARs doesn't seem too terrible. Thanks,
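The sizing sequence mentioned above, seen from the guest: save the Command register, clear Memory Space Enable so a half-programmed BAR is never decoded, write all ones to the BAR, read back the mask, then restore both registers. The tiny device model and its accessors are hypothetical. A read-back of zero means the BAR is not implemented, which is the state of the ISA bridge's BAR slots before anything squats on them.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2  /* Memory Space Enable bit in the Command register */

/* Hypothetical config accessors backed by a tiny device model. */
static uint16_t cmd_reg = PCI_COMMAND_MEMORY;
static uint32_t bar_reg = 0xC0000000u;
static const uint32_t bar_size = 16u << 20;  /* device-side size, hidden from the sizer */

static uint16_t read_cmd(void) { return cmd_reg; }
static void write_cmd(uint16_t v) { cmd_reg = v; }
static uint32_t read_bar(void) { return bar_reg; }
static void write_bar(uint32_t v) { bar_reg = v & ~(bar_size - 1); }

/* Size a 32-bit memory BAR: disable decode, write all ones, read back, restore. */
static uint32_t size_mem_bar(void)
{
    uint16_t cmd = read_cmd();
    uint32_t orig = read_bar();

    write_cmd((uint16_t)(cmd & ~PCI_COMMAND_MEMORY));  /* stop decoding during sizing */
    write_bar(0xFFFFFFFFu);
    uint32_t mask = read_bar() & ~0xFu;  /* strip memory-BAR flag bits 3:0 */
    write_bar(orig);
    write_cmd(cmd);
    return mask ? ~mask + 1 : 0;  /* 0 means the BAR is not implemented */
}
```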
Alex