This patch series redesigns and adds new features to the pciinit.c code.
[Patches 1-4]
The new implementation no longer uses arrays of bases and counts. Instead,
it dynamically allocates pci_region_entry structures, which are grouped into
lists. Each pci_region_entry can represent either a PCI BAR or a downstream
PCI region (bridge). Each entry has a set of attributes: type (IO, MEM,
PREFMEM), is64bit, and size.
[Patch 5]
Bridge regions are no longer rounded up to the next highest size;
instead, alignment is tracked explicitly. This should improve the memory
layout of bridge regions.
Also, unused bridge regions will no longer be allocated any space.
[Patch 6]
This patch takes into account the PCI BAR and ROM regions of PCI bridges themselves.
[Patches 7-12]
Make pciinit.c 64bit aware. Support discovery and configuration of 64bit
BARs with non-zero upper 32 bits. The code allocates 64bit PCI BARs in the
high address range if they don't fit in the 32bit range.
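To make the description concrete, here is a minimal sketch of what such an entry structure might look like; the field names here are illustrative assumptions, not the actual contents of the patches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a pci_region_entry as described above.
 * All field names are assumptions for illustration only. */
enum region_type { REGION_TYPE_IO, REGION_TYPE_MEM, REGION_TYPE_PREFMEM };

struct pci_region_entry {
    int bar;                       /* BAR index, or -1 for a bridge region */
    uint64_t size;                 /* requested size of the BAR/region */
    uint64_t align;                /* required alignment, tracked explicitly */
    enum region_type type;         /* IO, MEM or PREFMEM */
    int is64;                      /* set for 64bit-capable memory BARs */
    struct pci_region_entry *next; /* entries are grouped into linked lists */
};
```

Because each entry carries its own size and alignment, a bridge window can be sized as the sum of its children rather than being rounded up to a power of two.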
Kevin O'Connor (6):
0001-pciinit-Introduction-of-pci_region_entry-structure.patch
0002-pciinit-Move-bus-bar-asignment.patch
0004-pciinit-Use-sorted-order-allocation.patch
0005-pciinit-Track-region-alignment-explicitly.patch
0006-pciinit-bridges-can-have-two-regions-too.patch
0007-pciinit-Switch-to-64bit-variable-types.patch
Alexey Korolev (6):
0003-pciinit-Remove-size-element-from-pci_bus-r-structure.patch
0008-pciinit-Add-pci_region-structure.patch
0009-pciinit-64bit-capability-discovery-for-pci-bridges.patch
0010-Do-not-store-pci-region-stats-instead-calulate-the-s.patch
0011-Migrate-64bit-entries-to-64bit-pci-regions.patch
0012-Fix-64bit-PCI-issues-on-Windows.patch
src/acpi-dsdt.dsl | 7 +
src/acpi-dsdt.hex | 64 +++++++-
src/config.h | 2 +
src/pci.h | 6 +-
src/pciinit.c | 464 ++++++++++++++++++++++++++++++-----------------------
5 files changed, 325 insertions(+), 218 deletions(-)
On Tue, Dec 06, 2011 at 07:32:55PM -0500, Amos Kong wrote:
> ----- Original Message -----
> > On Tue, Dec 06, 2011 at 01:39:35PM +0800, Amos Kong wrote:
> > > Only func 0 is registered with the guest driver (we can
> > > only find func 0 in the slot->funcs list of the driver);
> > > the other functions cannot be cleaned up when
> > > hot-removing the whole slot. This patch adds a
> > > device per function in the ACPI DSDT tables.
> > > Notify only when func 0 is added or removed.
> > >
> > > Tested with linux/winxp/win7, hot-adding/hot-removing,
> > > single/multiple function devices; they are all fine (all
> > > added devices can be removed).
> > >
> > This includes some bits I wrote but this is not an ack
> > of the patch yet :)
> >
> > I find it surprising that this works: each function
> > has _EJ0, so wouldn't the guest expect that ejecting
> > a single one removes only it, and not all functions?
>
> Removing a single func is not supported by the spec, and the current code
> (qemu/kernel pci driver) processes hot-remove on the whole slot.
> We cannot hot-remove a single func with or without this patch.
>
> Registering _EJ0() for each func means all the funcs will be recorded
> in the function list of the slot.
Just as an update - it's not clear to me what this patch does, and it
seems like Michael had some concerns.
Also, it doesn't seem right to hardcode the generation of that many
devices (248) in the AML code.
So, unless there are further comments I'm going to hold off on this
patch.
-Kevin
Add IASL definition to the Makefile.
Add IASL definition to Makefile so it can be passed by the user.
Signed-off-by: Marc Jones <marc.jones(a)se-eng.com>
--
http://se-eng.com
On Sat, Apr 28, 2012 at 06:17:24PM -0700, Jordan Justen wrote:
> On Sat, Apr 28, 2012 at 00:08, Gleb Natapov <gleb(a)redhat.com> wrote:
> > On Fri, Apr 27, 2012 at 02:55:48PM -0700, Jordan Justen wrote:
> >> But, if qemu could be changed,
> >> could it be made to match the PIIX4 datasheet?
> >>
> > We try not to change QEMU in a non-backwards-compatible way. We can
> > implement PMBA and start using the address there if FW writes into it,
Actually this is incorrect. QEMU implements PMBA and Seabios sets it to
0xb000. Sorry about the confusion.
> > but, as I noted before, OVMF will have to be changed anyway since the 0x400
> > address is already occupied and pci/cpu/memory hotplug/unplug
> > functionality uses additional IO space not documented by the PIIX4 spec.
>
> The "pci/cpu/memory hotplug/unplug functionality" is also using hard
> coded addresses in the 0x400 range?
>
No, the hardcoded addresses in the 0x400 range are used for FW debugging. There
are four of them: 0x400/0x401 are used to print a panic message on stderr and
exit the qemu process (nowadays they do nothing, but qemu still registers them);
0x402/0x403 print a character to stderr if qemu is compiled with
DEBUG_BIOS, but the ports are registered even if DEBUG_BIOS is not defined.
hotplug/unplug uses addresses 0xae00 and above.
> And, 0xb000 would be the recommended PM base address?
>
Yes. QEMU splits PM and GPE address spaces though. GPE starts at 0xafe0
and is hardcoded. I do not see a way to move it to PMBA + offset and
stay backwards compatible (at least not without adding a new fw_cfg value, but
what would the advantage be).
> > It would be better for OVMF to treat QEMU as a separate platform that
> > behaves almost, but not exactly, like real HW.
>
> Yes, it does seem like that will be required here.
>
Yes, since we do want to support functionality that does not exist in the
PIIX4 spec.
--
Gleb.
On Fri, Apr 27, 2012 at 02:55:48PM -0700, Jordan Justen wrote:
> On Fri, Apr 27, 2012 at 11:47, Laszlo Ersek <lersek(a)redhat.com> wrote:
> > This was how I interpreted our discussion with Jordan:
> >
> > L: Shouldn't qemu & OVMF agree on GPE0?
> > J: Why? Anyway, OVMF should be correct, because all ACPI registers are
> > in one tight bunch, starting from 0x400.
> > L: None of those two characteristics (1: "0x400", 2: "one tight bunch")
> > are required by the spec. For proof, see what SeaBIOS does: 1: it
> > doesn't use 0x400 as base, 2: GPE0 is not even above the base.
>
> This code was written to the PIIX4 datasheet. According to the
> datasheet, if you change PMBA, it changes the GPE blk too.
>
> It sounds like QEMU does not emulate this, and hardcodes the GPE blk.
>
Yes. QEMU does not really implement PIIX4, though. It implements PIIX3 +
PM from PIIX4 (but with functionality that is not present in the original
spec, like pci/cpu hotplug) + other differences that firmware has to take
into account. For instance, even if QEMU implemented PMBA, OVMF would
not be able to configure it to the value 0x400, since QEMU hardcodes
another PV device there.
> Regarding a solution, such as communicating the address in fw_cfg,
> that could work. OVMF has plans to add fw_cfg support.
>
Can it check that it is running on QEMU and use QEMU's values? Then there is no
need for a new fw_cfg.
> > Anyway, back to my original question: currently OVMF and qemu disagree
> > wrt. GPE0. Which one should I try to modify so that they agree?
>
> It does seem like qemu could be changed, doesn't it? It would require a
> simultaneous fix of qemu & seabios. But, if qemu could be changed,
> could it be made to match the PIIX4 datasheet?
>
We try not to change QEMU in a non-backwards-compatible way. We can
implement PMBA and start using the address there if FW writes into it,
but, as I noted before, OVMF will have to be changed anyway since the 0x400
address is already occupied and pci/cpu/memory hotplug/unplug
functionality uses additional IO space not documented by the PIIX4 spec.
It would be better for OVMF to treat QEMU as a separate platform that
behaves almost, but not exactly, like real HW.
--
Gleb.
On 04/27/12 17:12, Jordan Justen wrote:
> On Fri, Apr 27, 2012 at 07:31, Laszlo Ersek <lersek(a)redhat.com> wrote:
>> edk2's "OvmfPkg/AcpiTables/Platform.h" specifies GPE0_BLK at 0x40C,
>> while qemu's "hw/acpi_piix4.c" expects the guest to access it at 0xAFE0.
>> Which macro should be modified to get them in sync?
>
> Do they need to be in sync?
It appears so to me:
https://bugzilla.redhat.com/show_bug.cgi?id=653382#c22
> We set PMBA to 0x400 in OvmfPkg/Library/AcpiTimerLib/AcpiTimerLib.c,
> so isn't 0x40c correct?
Considering OVMF in isolation, I presume it's self-consistent.
However, is it necessary (a) to group these ACPI registers closely
together, (b) to base the "group" at 0x400?
From "5.2.9 Fixed ACPI Description Table (FADT)" in the ACPI spec (v5.0)
it would appear OVMF can freely choose where to put GPE0_BLK, in both
senses (ie. port address considered alone, and also in relation to the
other ACPI registers).
Considering SeaBIOS again (build_fadt()):
- PORT_ACPI_PM_BASE is 0xb000,
- PM1a_EVT_BLK, PM1a_CNT_BLK and PM_TMR_BLK are located consecutively
from this base,
- but GPE0_BLK is placed at 0xafe0 (build_fadt() -->
pci_init_device(fadt_init_tbl) --> piix4_fadt_init())
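The layout described above can be summarized in a small sketch; the structure and the block offsets here are simplified assumptions based on the description, not SeaBIOS' actual FADT code:

```c
#include <assert.h>
#include <stdint.h>

#define PORT_ACPI_PM_BASE 0xb000  /* SeaBIOS PM base, per the description */
#define PIIX4_GPE0_BLK    0xafe0  /* hardcoded by QEMU, below the PM base */

/* Only the port-address fields relevant to this discussion. */
struct fadt_sketch {
    uint32_t pm1a_evt_blk;
    uint32_t pm1a_cnt_blk;
    uint32_t pm_tmr_blk;
    uint32_t gpe0_blk;
};

/* PM1a_EVT, PM1a_CNT and PM_TMR are located consecutively from the PM
 * base (offsets assumed here), while GPE0 is not relative to it at all. */
static void fill_fadt(struct fadt_sketch *f)
{
    f->pm1a_evt_blk = PORT_ACPI_PM_BASE + 0x00;
    f->pm1a_cnt_blk = PORT_ACPI_PM_BASE + 0x04;
    f->pm_tmr_blk   = PORT_ACPI_PM_BASE + 0x08;
    f->gpe0_blk     = PIIX4_GPE0_BLK;
}
```

The point is simply that GPE0_BLK is neither relative to the PM base nor even above it, which the FADT permits.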
But I'm likely missing something ^W everything...
Thanks!
Laszlo
On Mon, Apr 23, 2012 at 03:51:10PM +0900, Daniel Castro wrote:
> Hello All,
>
> I have a small problem with call32, after I make the call and the
> 32bit code is executed the parameter I am sending is incorrect, so
> evidently I am doing something incorrect.
>
> Here is my 16bit code:
> dprintf(1,"Operation 16bit drive at %p\n",op->drive_g);
> extern void _cfunc32flat_xen_blk_op_read(struct disk_op_s *);
> return call32(_cfunc32flat_xen_blk_op_read,op,DISK_RET_EPARAM);
The 'op' structure is on the stack and the pointer is therefore
relative to the stack segment. Use MAKE_FLATPTR to construct a 32bit
pointer - see pci_writel() for an example.
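A sketch of the idea behind MAKE_FLATPTR (modeled here as a plain function; the real SeaBIOS code is a macro and uses GET_SEG(SS) to fetch the stack segment):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for SeaBIOS' MAKE_FLATPTR: a real-mode seg:off
 * pair corresponds to the flat linear address (seg << 4) + off.  A
 * pointer to a stack variable in 16bit code is only an offset relative
 * to %ss, so it must be converted like this before 32bit flat code can
 * dereference it. */
static inline uint32_t make_flatptr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

So the call above would become something along the lines of `call32(_cfunc32flat_xen_blk_op_read, MAKE_FLATPTR(GET_SEG(SS), op), DISK_RET_EPARAM)`, assuming the 32bit side takes the flat pointer.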
-Kevin
On Thu, Apr 19, 2012 at 04:08:38PM +0200, Vasilis Liaskovitis wrote:
> This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
> earlier work and comments from Gleb.
>
> Memslot devices are modeled with a new qemu command line
>
> "-memslot id=name,start=start_addr,size=sz,node=pxm"
>
> The user is responsible for defining memslots with meaningful start/size values,
> e.g. not defining a memory slot over a PCI hole. Alternatively, the start/size
> could also be handled/assigned automatically by the specific emulated hardware
> (for hw/pc.c the PCI hole is currently [below_4g_mem_size, 4G), so hotplugged memory
> should start from max(4G, above_4g_mem_size)).
>
> Node defines the numa proximity for this memslot. When not defined, it defaults
> to zero.
>
> e.g. "-memslot id=hot1,start=4294967296,size=536870912,node=0"
> will define a 512M memory slot starting at physical address 4G, belonging to numa node 0.
>
> Memory slots are added or removed with a new hmp command "memslot":
> Hot-add syntax: "memslot id add"
> Hot-remove syntax: "memslot id delete"
>
> - All memslots are initially unpopulated. Memslots are currently modeling only
We can add ,populated=true to -memslot to make the slot populated from the
start. We will need it for migration anyway.
> hotpluggable memory slots, i.e. initial system memory is not modeled with
> memslots. The concept could be generalized to include all memory though, or it
> could more closely follow kvm-memory slots.
OK, I hope the final version will allow memory < 4G to be hot-pluggable.
>
> - Memslots are abstracted as qdevices attached to the main system bus. However,
> memory hotplugging has its own side channel ignoring main_system_bus's hotplug
> incapability. A cleaner integration would be needed. What's the preferred
> way of modeling memory devices in qom? Would it be better to attach memory
> devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
> instead of the system bus?
>
> - Refcounting memory slots has been discussed (but is not included in this
> series yet). Depopulating a memory region happens on a guestOS _EJ callback,
> which means the guestOS will not be using the region anymore. However, guest
> addresses from the depopulated region need to also be unmapped from the qemu
> address space using cpu_physical_memory_unmap(). Does memory_region_del_subregion()
> or some other memory API call guarantee that a memoryregion has been unmapped
> from qemu's address space?
>
> - What is the expected behaviour of hotplugged memory after a reboot? Is it
> supposed to be persistent after reboots? The last 2 patches in the series try to
> make hotplugged memslots persistent after reboot by creating and consulting e820
> map entries. A better solution is needed for hot-remove after a reboot, because
> e820 entries can be merged.
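The merging problem can be illustrated with a sketch (standard e820 entry layout; the merge helper is hypothetical, just to show how the slot boundary is lost):

```c
#include <assert.h>
#include <stdint.h>

/* Standard e820 entry layout. */
struct e820entry {
    uint64_t start;
    uint64_t size;
    uint32_t type;   /* 1 = usable RAM, 2 = reserved, ... */
};

/* The merge problem described above: two adjacent RAM entries of the
 * same type collapse into one, losing the boundary between the two
 * hotplugged slots.  (Hypothetical helper for illustration.) */
static int maybe_merge(struct e820entry *a, const struct e820entry *b)
{
    if (a->type == b->type && a->start + a->size == b->start) {
        a->size += b->size;   /* slot boundary is gone */
        return 1;
    }
    return 0;
}
```

Once two hotplugged slots merge into a single RAM entry, the e820 map alone no longer records where one slot ends and the next begins, so hot-remove after reboot cannot identify an individual slot.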
>
> series is based on uq/master for qemu-kvm, and master for seabios. Can be found
> also at:
>
>
> Vasilis Liaskovitis (9):
> Seabios: Add SSDT memory device support
> Seabios, acpi: Implement acpi-dsdt functions for memory hotplug.
> Seabios, acpi: generate hotplug memory devices.
> Implement memslot device abstraction
> acpi_piix4: Implement memory device hotplug registers and handlers.
> pc: pass paravirt info for hotplug memory slots to BIOS
> Implement memslot command-line option and memslot hmp monitor command
> pc: adjust e820 map on hot-add and hot-remove
> Seabios, acpi: enable memory devices if e820 entry is present
>
> Makefile.objs | 2 +-
> hmp-commands.hx | 15 ++++
> hw/acpi_piix4.c | 103 +++++++++++++++++++++++++++-
> hw/memslot.c | 201 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/memslot.h | 44 ++++++++++++
> hw/pc.c | 87 ++++++++++++++++++++++--
> hw/pc.h | 1 +
> monitor.c | 8 ++
> monitor.h | 1 +
> qemu-config.c | 25 +++++++
> qemu-options.hx | 8 ++
> sysemu.h | 1 +
> vl.c | 44 ++++++++++++-
> 13 files changed, 528 insertions(+), 12 deletions(-)
> create mode 100644 hw/memslot.c
> create mode 100644 hw/memslot.h
>
> src/acpi-dsdt.dsl | 68 ++++++++++++++++++++++-
> src/acpi.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> src/memmap.c | 15 +++++
> src/ssdt-mem.dsl | 66 ++++++++++++++++++++++
> 4 files changed, 298 insertions(+), 6 deletions(-)
> create mode 100644 src/ssdt-mem.dsl
>
> --
> 1.7.9
--
Gleb.
On Tue, Apr 24, 2012 at 10:24:51AM +0200, Vasilis Liaskovitis wrote:
> Hi,
> On Tue, Apr 24, 2012 at 10:52:24AM +0300, Gleb Natapov wrote:
> > On Mon, Apr 23, 2012 at 02:31:15PM +0200, Vasilis Liaskovitis wrote:
> > > The 440fx spec mentions: "The address range from the top of main DRAM to 4
> > > Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
> > > mapped to PCI. The PMC forwards all accesses within this address range to PCI."
> > >
> > > What we probably want is that the initial memory map creation takes into account
> > > all dimms specified (both populated/unpopulated)
> > Yes.
> >
> > > So "-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false"
> > > would create a system map with top of memory and start of PCI-hole at 2G.
> > >
> > What -m 1G means on this command line? Isn't it redundant?
> yes, this was redundant with the original concept.
>
> > Maybe we should make -m create a non-unpluggable, populated slot starting
> > at address 0. Then your config above will specify 3G of memory with 2G
> > populated (the first of which is not removable) and 1G unpopulated. The PCI hole
> > starts above 3G.
>
> I agree -m should mean one big unpluggable slot.
>
> So in the new proposal,"-device dimm populated=true" means a hot-removable dimm
> that has already been hotplugged.
>
Yes.
> A question here is when exactly the initial hot-add event for this dimm should
> be played. If the relevant OSPM has not yet been initialized (e.g. the acpi_memhotplug
> module in a linux guest needs to be loaded), the guest may not see the event.
> This is a general issue of course, but with initially populated hot-removable
> dimms it may be a bigger one. Can OSPM ACPI initialization be detected?
>
> Or maybe you are suggesting "populated=true" is part of initial memory (i.e. not
> hot-added, but still hot-removable). Though in that case guestOS may use it for
> bootmem allocations, making hot-remove more likely to fail at the memory
> offlining stage.
>
If a memory slot is populated before OSPM is started, the BIOS will detect
that by reading mem_sts and will create the e820 map appropriately. OSPM
will detect it by evaluating _STA during boot. This is not unique to
memory hot-plug; any hotpluggable device has the same issue.
--
Gleb.