[SeaBIOS] [RFC PATCH v3 3/3] fw/pci: Allocate IGD stolen memory

Alex Williamson alex.williamson at redhat.com
Sat Feb 13 16:12:09 CET 2016

Hi Kevin,

On Fri, 12 Feb 2016 21:49:04 -0500
"Kevin O'Connor" <kevin at koconnor.net> wrote:

> On Fri, Feb 12, 2016 at 05:23:18PM -0700, Alex Williamson wrote:
> > Intel IGD makes use of memory allocated and marked reserved by the
> > BIOS as a stolen memory range.  For the most part, guest drivers don't
> > make use of this, but our achilles heel is the vBIOS.  The vBIOS
> > programs the device to use the host stolen memory range and it's used
> > in the pre-boot environment.  Generally the guest won't have access to
> > the host stolen memory area, so these accesses either land in VM
> > memory or unassigned space and generate IOMMU faults.  By allocating
> > this range in SeaBIOS and programming it into the device, QEMU (via
> > vfio) can make sure this guest allocated stolen memory range is used
> > instead.  
> What does "vBIOS" mean in this context?  Is it the video device option
> rom or something else?

vBIOS = video BIOS, you're correct, it's just the video device option
> > 
> > Signed-off-by: Alex Williamson <alex.williamson at redhat.com>
> > ---
> >  src/fw/pciinit.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
> > index 92170d5..c1ad5d4 100644
> > --- a/src/fw/pciinit.c
> > +++ b/src/fw/pciinit.c
> > @@ -260,7 +260,7 @@ static void ich9_smbus_setup(struct pci_device *dev, void *arg)
> >  static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
> >  {
> >      struct romfile_s *file = romfile_find("etc/igd-opregion");
> > -    void *opregion;
> > +    void *opregion, *bdsm;
> >      u16 bdf = dev->bdf;
> >  
> >      if (!file || !file->size)
> > @@ -281,6 +281,17 @@ static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
> >  
> >      dprintf(1, "Intel IGD OpRegion enabled on %02x:%02x.%x\n",
> >              pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
> > +
> > +    bdsm = memalign_high(1024 * 1024, 1024 * 1024);
> > +    if (!bdsm) {
> > +        warn_noalloc();
> > +        return;
> > +    }  
> The "high" memory pool is not a good fit for such a large allocation.
> For so much space, I'd use memalign_tmphigh() followed by
> e820_add(addr, size, E820_RESERVED).

Ok, as you saw on IRC I was looking for alternatives, I can easily
switch to this.
> > +    pci_config_writel(bdf, 0x5C, cpu_to_le32((u32)bdsm));
> > +
> > +    dprintf(1, "Allocated 1MB reserved memory for Intel IGD stolen memory at "
> > +            "0x%08x\n", (u32)bdsm);
> >  }  
> Does this make sense to do unconditionally in the firmware whenever
> the device is present?  The SeaBIOS release schedule is not in sync
> with QEMU, so changes in the firmware tend to take longer to deploy if
> something needs to be done differently in the future.
> What happens if this register is not set?  Does anything go wrong if
> register 0xFC is set, but 0x5C is not (eg, due to allocation failure).

It's not entirely unconditional, it's tagged onto the end of the
OpRegion setup, so we know that we're at least dealing with a QEMU that
has that support.  We could link it to the PCI ROM BAR for IGD being
present since this is necessary to support the code in that ROM, but
that doesn't give us any idea if the QEMU support is there.  Maybe we
could do both, OpRegion support and ROM BAR present.

The write to 0x5C is used by QEMU code that traps writes to the device
I/O port BAR and replaces the host stolen memory address with the new
guest address generated here.  0x5C is initialized to 0x0 by kernel
vfio code, so we can detect whether it has been written.  If not
written, QEMU has no place to redirect to for stolen memory and it will
either overlap VM memory or an unassigned area.  The former may corrupt
VM memory, the latter throws host errors.  We could in QEMU halt with a
hardware error if 0x5C hasn't been programmed.

One alternative I was considering was to simply use the unused BAR5
slot on IGD to expose a fake 1MB, 32bit MMIO BAR.  SeaBIOS would program
this normally, QEMU would back it with its own allocation and we could
easily find the address of it when we need it.  The downside being that
the PCI BAR can be disabled and remapped, but for the small window
where we need it, I'm not sure that matters.  It would certainly be
more transparent to the VM BIOS.  It would also make it much easier to
change the size if we found some devices need more/less/none space.  I
initially thought the fake BAR wasn't a great fix, but maybe its
advantages outweigh its downsides.  I'll give it a shot.  Thanks,


More information about the SeaBIOS mailing list