This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line.
One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully.
* This saves the need to provide a new interface for firmware to discover bus number to pci brige mapping * No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions: we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
I have also tried to address the concerns that Anthony has raised with this approach, please see below.
Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ
This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots.
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Things to note: - this is on top of acpi patchset posted a month ago, with a small patch adding a core function to walk all pci buses, on top. Can also be found in my git tree git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
- Extensive use of glib completely removes pointer math in new code: we use g_array_append_vals exclusively. There's no code patching in new code. This is in response to comments about security concerns with adding code to qemu. In this sense it is more secure than existing code in hw/acpi/core.c - that can be switched to glib if desired.
- New code (not imported from seabios) uses glib to reduce dependency on iasl. With time all code can be rewritted to use glib and avoid iasl, if desired.
- As was the case previously, systems that lack working iasl are detected at configure time, pre-generated hex files in source tree are used in this case. This addresses the concern with iasl/big-endian systems.
- Compile tested only. Migration is known to be broken - there are a bunch of TODO tags in code. It was agreed on the last qemu conf meeting that this code is posted without looking at migration, with understanding that it is addressed before being merged. Please do not mistake this limitation for a fundamental one - I have a very good idea how to fix it, including cross-version migration.
- Cross version migration: when running with -M 1.5 and older, all ACPI table generation should be disabled. We'll present FW_CFG interface compatible with 1.5.
Cc: Jordan Justen jljusten@gmail.com Cc: Anthony Liguori anthony@codemonkey.ws Cc: Laszlo Ersek lersek@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: seabios@seabios.org Cc: David Woodhouse dwmw2@infradead.org Cc: Kevin O'Connor kevin@koconnor.net Cc: Peter Maydell peter.maydell@linaro.org Signed-off-by: Michael S. Tsirkin mst@redhat.com --- docs/specs/acpi_pci_hotplug.txt | 8 + include/hw/i386/pc.h | 4 +- hw/i386/acpi-dsdt.dsl | 36 +++- hw/i386/ssdt-pcihp.dsl | 51 ----- hw/acpi/piix4.c | 145 ++++++++++---- hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- hw/i386/Makefile.objs | 2 +- hw/i386/ssdt-pcihp.hex.generated | 108 ---------- 8 files changed, 510 insertions(+), 255 deletions(-) delete mode 100644 hw/i386/ssdt-pcihp.dsl delete mode 100644 hw/i386/ssdt-pcihp.hex.generated
diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt index a839434..951b99a 100644 --- a/docs/specs/acpi_pci_hotplug.txt +++ b/docs/specs/acpi_pci_hotplug.txt @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
Used by ACPI BIOS _RMV method to indicate removability status to OS. One bit per slot. Read-only + +PCI bus selector (IO port 0xae10-0xae13, 4-byte access): +----------------------------------------------- + +Used by ACPI BIOS methods to select the current PCI bus. +When written, makes all the other PCI registers (0xae00 - 0xae0f) +to refer to the appropriate bus. +0 selects PCI root bus (default). diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index f8d0871..66ec787 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -38,7 +38,9 @@ struct PcGuestInfo { bool s3_disabled; bool s4_disabled; uint8_t s4_val; - DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX); +#define GUEST_INFO_MAX_HOTPLUG_BUS 256 + PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS]; + DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); uint16_t sci_int; uint8_t acpi_enable_cmd; uint8_t acpi_disable_cmd; diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 90efce0..dc05668 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -133,11 +133,18 @@ DefinitionBlock ( B0EJ, 32, }
+ OperationRegion(SEL, SystemIO, 0xae10, 0x04) + Field(SEL, DWordAcc, NoLock, WriteAsZeros) { + BSEL, 32, + } + /* Methods called by bulk generated PCI devices below */ + Name(BNUM, 0x00)
/* Methods called by hotplug devices */ - Method(PCEJ, 1, NotSerialized) { + Method(PCEJ, 2, NotSerialized) { // _EJ0 method - eject callback + Store(Arg1, BSEL) Store(ShiftLeft(1, Arg0), B0EJ) Return (0x0) } @@ -147,16 +154,25 @@ DefinitionBlock (
/* PCI hotplug notify method */ Method(PCNF, 0) { - // Local0 = iterator + // Local0 = bridge selector Store(Zero, Local0) - While (LLess(Local0, 31)) { - Increment(Local0) - If (And(PCIU, ShiftLeft(1, Local0))) { - PCNT(Local0, 1) - } - If (And(PCID, ShiftLeft(1, Local0))) { - PCNT(Local0, 3) - } + While (LLess(Local0, 255)) { + Store(Local0, BSEL) + // Local1 = slot iterator + Store(Zero, Local1) + + Store(PCIU, Local2) + Store(PCID, Local3) + While (LLess(Local1, 31)) { + Increment(Local1) + If (And(Local2, ShiftLeft(1, Local1))) { + PCNT(Add(Local1, Multiply(Local0, 32)), 1) + } + If (And(Local3, ShiftLeft(1, Local1))) { + PCNT(Add(Local1, Multiply(Local0, 32)), 3) + } + } + Increment(Local0) } } } diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl deleted file mode 100644 index d29a5b9..0000000 --- a/hw/i386/ssdt-pcihp.dsl +++ /dev/null @@ -1,51 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - - * You should have received a copy of the GNU General Public License along - * with this program; if not, see http://www.gnu.org/licenses/. - */ - -ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml - -DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) -{ - -/**************************************************************** - * PCI hotplug - ****************************************************************/ - - /* Objects supplied by DSDT */ - External(_SB.PCI0, DeviceObj) - External(_SB.PCI0.PCEJ, MethodObj) - - Scope(_SB.PCI0) { - - /* Bulk generated PCI hotplug devices */ - ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start - ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end - ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name - - // Method _EJ0 can be patched by BIOS to EJ0_ - // at runtime, if the slot is detected to not support hotplug. - // Extract the offset of the address dword and the - // _EJ0 name to allow this patching. - Device(SAA) { - ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id - Name(_SUN, 0xAA) - ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr - Name(_ADR, 0xAA0000) - ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0 - Method(_EJ0, 1) { - Return (PCEJ(_SUN)) - } - } - } -} diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 19a26f5..10106fd 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -29,6 +29,7 @@ #include "exec/ioport.h" #include "hw/nvram/fw_cfg.h" #include "exec/address-spaces.h" +#include "hw/pci/pci_bus.h"
//#define DEBUG
@@ -42,11 +43,12 @@ #define GPE_LEN 4
#define PCI_HOTPLUG_ADDR 0xae00 -#define PCI_HOTPLUG_SIZE 0x000f +#define PCI_HOTPLUG_SIZE 0x0014 #define PCI_UP_BASE 0xae00 #define PCI_DOWN_BASE 0xae04 #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c +#define PCI_SEL_BASE 0xae10
#define PIIX4_PROC_BASE 0xaf00 #define PIIX4_PROC_LEN 32 @@ -57,6 +59,8 @@ struct pci_status { uint32_t up; /* deprecated, maintained for migration compatibility */ uint32_t down; + uint32_t hotplug_enable; + uint32_t device_present; };
typedef struct CPUStatus { @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { Notifier powerdown_notifier;
/* for pci hotplug */ - struct pci_status pci0_status; - uint32_t pci0_hotplug_enable; - uint32_t pci0_slot_device_present; + struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS]; + uint32_t hotplug_select;
uint8_t disable_s3; uint8_t disable_s4; @@ -98,6 +101,17 @@ typedef struct PIIX4PMState { PcGuestInfo *guest_info; } PIIX4PMState;
+static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) +{ + int i; + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { + if (guest_info->hotplug_buses[i] == bus) { + return i; + } + } + return -1; +} + static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s);
@@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d,
static void vmstate_pci_status_pre_save(void *opaque) { +#if 0 + /* TODO */ struct pci_status *pci0_status = opaque; PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status);
@@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) * to a version that still does... of course these might get lost * by an old buggy implementation, but we try. */ pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; +#endif }
static int vmstate_acpi_post_load(void *opaque, int version_id) @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &temp); }
+#if 0 + TODO ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1); +#endif return ret; }
@@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), +#if 0 VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, struct pci_status), +#endif VMSTATE_END_OF_LIST() } };
-static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) { BusChild *kid, *next; - BusState *bus = qdev_get_parent_bus(&s->dev.qdev); + BusState *bus; int slot = ffs(slots) - 1; bool slot_free = true;
+ if (!s->guest_info->hotplug_buses[bsel]) { + return; + } + bus = &s->guest_info->hotplug_buses[bsel]->qbus; + /* Mark request as complete */ - s->pci0_status.down &= ~(1U << slot); + s->pci_status[bsel].down &= ~(1U << slot);
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) } } if (slot_free) { - s->pci0_slot_device_present &= ~(1U << slot); + s->pci_status[bsel].device_present &= ~(1U << slot); } }
-static void piix4_update_hotplug(PIIX4PMState *s) +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) { - PCIDevice *dev = &s->dev; - BusState *bus = qdev_get_parent_bus(&dev->qdev); BusChild *kid, *next; + BusState *bus;
+ if (!s->guest_info->hotplug_buses[bsel]) { + return; + } + bus = &s->guest_info->hotplug_buses[bsel]->qbus; /* Execute any pending removes during reset */ - while (s->pci0_status.down) { - acpi_piix_eject_slot(s, s->pci0_status.down); + while (s->pci_status[bsel].down) { + acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down); }
- s->pci0_hotplug_enable = ~0; - s->pci0_slot_device_present = 0; + s->pci_status[bsel].hotplug_enable = ~0; + s->pci_status[bsel].device_present = 0;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) { - s->pci0_hotplug_enable &= ~(1U << slot); + s->pci_status[bsel].hotplug_enable &= ~(1U << slot); }
- s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); } }
@@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) { PIIX4PMState *s = opaque; uint8_t *pci_conf = s->dev.config; + int i;
pci_conf[0x58] = 0; pci_conf[0x59] = 0; @@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) /* Mark SMM as already inited (until KVM supports SMM). */ pci_conf[0x5B] = 0x02; } - piix4_update_hotplug(s); + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { + piix4_update_hotplug(s, i); + } }
static void piix4_pm_powerdown_req(Notifier *n, void *opaque) @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) acpi_pm1_evt_power_down(&s->ar); }
-static void piix4_update_guest_info(PIIX4PMState *s) +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, + PCIHotplugState state); + +static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) { - PCIDevice *dev = &s->dev; - BusState *bus = qdev_get_parent_bus(&dev->qdev); + PIIX4PMState *s = opaque; + PcGuestInfo *guest_info = s->guest_info; BusChild *kid, *next; + BusState *bus = &pci_bus->qbus;
- memset(s->guest_info->slot_hotplug_enable, 0xff, - DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE)); + int i = piix4_find_hotplug_bus(guest_info, NULL); + assert(i >= 0); + + guest_info->hotplug_buses[i] = pci_bus;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) { - clear_bit(slot, s->guest_info->slot_hotplug_enable); + clear_bit(slot, guest_info->hotplug_enable[i]); } } + + pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev); +} + +static void piix4_update_guest_info(PIIX4PMState *s) +{ + PCIDevice *dev = &s->dev; + + memset(s->guest_info->hotplug_enable, 0xff, + sizeof s->guest_info->hotplug_enable); + + pci_for_each_bus(dev->bus, piix4_update_bus_guest_info, + s); }
static void piix4_pm_machine_ready(Notifier *n, void *opaque) @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) { PIIX4PMState *s = opaque; uint32_t val = 0; + int bsel = s->hotplug_select; + + if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) { + return 0; + }
switch (addr) { case PCI_UP_BASE - PCI_HOTPLUG_ADDR: /* Manufacture an "up" value to cause a device check on any hotplug * slot with a device. Extra device checks are harmless. */ - val = s->pci0_slot_device_present & s->pci0_hotplug_enable; + val = s->pci_status[bsel].device_present & + s->pci_status[bsel].hotplug_enable; PIIX4_DPRINTF("pci_up_read %x\n", val); break; case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR: - val = s->pci0_status.down; + val = s->pci_status[bsel].down; PIIX4_DPRINTF("pci_down_read %x\n", val); break; case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: @@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) PIIX4_DPRINTF("pci_features_read %x\n", val); break; case PCI_RMV_BASE - PCI_HOTPLUG_ADDR: - val = s->pci0_hotplug_enable; + val = s->pci_status[bsel].hotplug_enable; break; + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: + val = s->hotplug_select; default: break; } @@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) static void pci_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) { + PIIX4PMState *s = opaque; switch (addr) { case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: - acpi_piix_eject_slot(opaque, (uint32_t)data); + if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) { + break; + } + acpi_piix_eject_slot(s, s->hotplug_select, data); PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", addr, data); break; + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: + s->hotplug_select = data; default: break; } @@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) g->sts[id / 8] |= (1 << (id % 8)); }
-static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, - PCIHotplugState state); - static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s) { @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCI_HOTPLUG_SIZE); memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, &s->io_pci); - pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev); - qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", PIIX4_PROC_LEN); @@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, qemu_register_cpu_added_notifier(&s->cpu_added_notifier); }
-static void enable_device(PIIX4PMState *s, int slot) +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; - s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); }
-static void disable_device(PIIX4PMState *s, int slot) +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; - s->pci0_status.down |= (1U << slot); + s->pci_status[bsel].down |= (1U << slot); }
static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int slot = PCI_SLOT(dev->devfn); PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, PCI_DEVICE(qdev)); + int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus); + if (bsel < 0) { + return -1; + }
/* Don't send event when device is enabled during qemu machine creation: * it is present on boot, no hotplug event is necessary. We do send an * event when the device is disabled later. */ if (state == PCI_COLDPLUG_ENABLED) { - s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); return 0; }
if (state == PCI_HOTPLUG_ENABLED) { - enable_device(s, slot); + enable_device(s, bsel, slot); } else { - disable_device(s, slot); + disable_device(s, bsel, slot); }
pm_update_sci(s); diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 72606a8..43e4988 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -34,6 +34,7 @@ #include "hw/acpi/acpi.h" #include "hw/nvram/fw_cfg.h" #include "hw/i386/bios-linker-loader.h" +#include "hw/pci/pci_bus.h"
#define ACPI_BUILD_APPNAME "Bochs" #define ACPI_BUILD_APPNAME6 "BOCHS " @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
-/* 0x5B 0x82 DeviceOp PkgLength NameString */ -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) - #define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ #define ACPI_SSDT_HEADER_LENGTH 36
#include "hw/i386/ssdt-misc.hex" -#include "hw/i386/ssdt-pcihp.hex" + +static inline GArray *build_alloc_array(void) +{ + return g_array_new(false, true /* clear */, 1); +} + +static inline void build_free_array(GArray *array) +{ + g_array_free(array, true); +} + +static inline void build_prepend_byte(GArray *array, uint8_t val) +{ + g_array_prepend_val(array, val); +} + +static inline void build_append_byte(GArray *array, uint8_t val) +{ + g_array_append_val(array, val); +} + +static inline void build_prepend_array(GArray *array, GArray *val) +{ + g_array_prepend_vals(array, val->data, val->len); +} + +static inline void build_append_array(GArray *array, GArray *val) +{ + g_array_append_vals(array, val->data, val->len); +} + +static void build_append_nameseg(GArray *array, const char *format, ...) +{ + GString *s = g_string_new(""); + va_list args; + + va_start(args, format); + g_string_vprintf(s, format, args); + va_end(args); + + assert(s->len == 4); + g_array_append_vals(array, s->str, s->len); + g_string_free(s, true); +} + +static void build_prepend_nameseg(GArray *array, const char *format, ...) +{ + GString *s = g_string_new(""); + va_list args; + + va_start(args, format); + g_string_vprintf(s, format, args); + va_end(args); + + assert(s->len == 4); + g_array_prepend_vals(array, s->str, s->len); + g_string_free(s, true); +} + +static void build_prepend_devfn(GArray *name, uint8_t devfn) +{ + build_prepend_nameseg(name, "S%0.02x_", devfn); +} + +static void build_append_devfn(GArray *name, uint8_t devfn) +{ + GArray *array = build_alloc_array(); + build_prepend_devfn(array, devfn); + build_append_array(name, array); + build_free_array(array); +} + +static void build_prepend_name_string(GArray *name, PCIBus *bus) +{ + int segcount; + + for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) { + build_prepend_devfn(name, bus->parent_dev->devfn); + } + g_array_prepend_vals(name, "PCI0", 4); + g_array_prepend_vals(name, "_SB_", 4); + /* Name segment length is a multiple of 4 */ + assert(name->len % 4 == 0); + segcount = name->len / 4; + build_prepend_byte(name, segcount); + build_prepend_byte(name, 0x2f); /* MultiNamePrefix */ + build_prepend_byte(name, '\'); /* RootChar */ +} + +static void build_append_name_string(GArray *name, PCIBus *bus) +{ + GArray *append = build_alloc_array(); + build_prepend_name_string(append, bus); + build_append_array(name, append); + build_free_array(append); +} + +/* 5.4 Definition Block Encoding */ +enum { + PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */ + PACKAGE_LENGTH_2BYTE_SHIFT = 4, + PACKAGE_LENGTH_3BYTE_SHIFT = 12, + PACKAGE_LENGTH_4BYTE_SHIFT = 20, +}; + +static void build_prepend_package_length(GArray *package) +{ + unsigned bytes; + uint8_t byte; + unsigned package_len = package->len; + unsigned length = package_len; + + if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) { + byte = length; + build_prepend_byte(package, byte); + return; + } + + if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1; + } + if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1; + } + if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1; + } + + bytes = package->len - package_len; + byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length; + build_prepend_byte(package, byte); +} + +static void build_package(GArray *package, uint8_t op) +{ + build_prepend_package_length(package); + build_prepend_byte(package, op); +} + +static void build_extop_package(GArray *package, uint8_t op) +{ + build_package(package, op); + build_prepend_byte(package, 0x5B); /* ExtOpPrefix */ +} + +static void build_append_notify(GArray *method, GArray *target_name, + uint16_t value) +{ + GArray *notify = build_alloc_array(); + uint8_t op = 0xA0; /* IfOp */ + + build_append_byte(notify, 0x93); /* LEqualOp */ + build_append_byte(notify, 0x68); /* Arg0Op */ + build_append_byte(notify, 0x0A); /* BytePrefix */ + build_append_byte(notify, value); + build_append_byte(notify, value >> 8); + build_append_byte(notify, 0x86); /* NotifyOp */ + build_append_array(notify, target_name); + build_append_byte(notify, 0x69); /* Arg1Op */ + + /* Pack it up */ + build_package(notify, op); + + build_append_array(method, notify); + + build_free_array(notify); +} + +static void +build_append_notify_slots(GArray *method, PCIBus *bus, + unsigned bus_select, + DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX)) +{ + int i; + + if (!bus) { + return; + } + + for (i = 0; i < 32; i++) { + GArray *target; + if (!test_bit(i, hotplug_enable)) { + continue; + } + target = build_alloc_array(); + build_prepend_devfn(target, i); + build_prepend_name_string(target, bus); + /* *32 matches Multiply in PCNF */ + build_append_notify(method, target, bus_select * 32 + i); + build_free_array(target); + } +} + +static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) +{ + int i; + GArray *method = build_alloc_array(); + + uint8_t op = 0x14; /* MethodOp */ + + build_append_nameseg(method, "PCNT"); + build_append_byte(method, 0x02); /* MethodFlags: ArgCount */ + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { + build_append_notify_slots(method, guest_info->hotplug_buses[i], i, + guest_info->hotplug_enable[i]); + } + build_package(method, op); + + build_append_array(table, method); + build_free_array(method); +} + +static void build_append_device(GArray *table, unsigned devfn) +{ + uint8_t op, ej0_op; + + GArray *device = build_alloc_array(); + GArray *ej0 = build_alloc_array(); + + op = 0x82; /* DeviceOp */ + + build_append_devfn(device, devfn); + + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_SUN"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_ADR"); + build_append_byte(device, 0x0C); /* DWordPrefix */ + build_append_byte(device, PCI_FUNC(devfn)); + build_append_byte(device, 0x00); + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x00); + + ej0_op = 0x14; /* MethodOp */ + build_append_nameseg(ej0, "_EJ0"); + build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */ + build_append_byte(ej0, 0xA4); /* ReturnOp */ + build_append_nameseg(ej0, "PCEJ"); + build_append_nameseg(ej0, "_SUN"); + build_append_nameseg(ej0, "BNUM"); + build_package(ej0, ej0_op); + build_append_array(device, ej0); + + build_extop_package(device, op); + + build_append_array(table, device); + build_free_array(device); + build_free_array(ej0); +} + +static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS], + PCIBus *bus, unsigned devfn) +{ + int i; + for (i = 0; i < ARRAY_SIZE(buses); ++i) { + if (buses[i] && buses[i]->parent_dev && + buses[i]->parent_dev->bus == bus && + buses[i]->parent_dev->devfn == devfn) { + return true; + } + } + return false; +} + +static void +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) +{ + uint8_t op; + int slot; + unsigned devfn; + GArray *device; + PCIBus *bus = guest_info->hotplug_buses[bnum]; + + if (!bus) { + return; + } + + device = build_alloc_array(); + + /* Root device described in DSDT */ + if (bnum) { + devfn = bus->parent_dev->devfn; + + op = 0x82; /* DeviceOp */ + + build_append_name_string(device, bus); + + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_SUN"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "BNUM"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, bnum); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_ADR"); + build_append_byte(device, 0x0C); /* DWordPrefix */ + build_append_byte(device, PCI_FUNC(devfn)); + build_append_byte(device, 0x00); + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x00); + } else { + op = 0x10; /* ScopeOp */ + build_append_nameseg(device, "PCI0"); + } + + for (slot = 0; slot < PCI_SLOT_MAX; ++slot) { + if (!test_bit(slot, guest_info->hotplug_enable[bnum])) { + continue; + } + /* Bus device? We'll add it later */ + if (build_find_device_bus(guest_info->hotplug_buses, + bus, PCI_DEVFN(slot, 0))) { + continue; + } + build_append_device(device, PCI_DEVFN(slot, 0)); + } + + if (bnum) { + build_extop_package(device, op); + } else { + build_package(device, op); + } + + build_append_array(table, device); + build_free_array(device); +} + +static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) +{ + int i; + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { + build_append_bus(table, i, guest_info); + } +}
static uint8_t* build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, - const char *target, int ofs) + const char *target, int ofs, int step) { int i;
@@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, *(ssdt_ptr++) = i; *(ssdt_ptr++) = 0x86; /* NotifyOp */ memcpy(ssdt_ptr, target, 4); - ssdt_ptr[ofs] = acpi_get_hex(i >> 4); - ssdt_ptr[ofs + 1] = acpi_get_hex(i); + ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4); + ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); ssdt_ptr += 4; *(ssdt_ptr++) = 0x69; /* Arg1Op */ } return ssdt_ptr; }
-static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) -{ - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4); - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot); - ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot; - ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot; - - /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot, - * replace the method name: _EJ0 by ACPI_EJ0_. */ - /* Sanity check */ - assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4)); - - if (!eject) { - memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4); - } -} - static void build_ssdt(GArray *table_data, GArray *linker, FWCfgState *fw_cfg, PcGuestInfo *guest_info) @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ + (1+2+5+(12*acpi_cpus)) /* NTFY */ + (6+2+1+(1*acpi_cpus)) /* CPON */ - + (1+3+4) /* Scope(PCI0) */ - + ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */ - + (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */ - uint8_t *ssdt = acpi_data_push(table_data, length); - uint8_t *ssdt_ptr = ssdt; + ); + uint8_t *ssdt; + uint8_t *ssdt_ptr; + GArray *devices = build_alloc_array(); + GArray *notify = build_alloc_array(); + + /* TODO: rewrite this function using glib */ + build_append_device_buses(devices, guest_info); + length += devices->len; + build_append_notify_buses(notify, guest_info); + length += notify->len; + + ssdt = acpi_data_push(table_data, length); + ssdt_ptr = ssdt;
/* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml)); @@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker,
/* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ /* Arg0 = Processor ID = APIC ID */ - ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2); + ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1);
/* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ *(ssdt_ptr++) = 0x08; /* NameOp */ @@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, *(ssdt_ptr++) = 'I'; *(ssdt_ptr++) = '0';
- /* build Device object for each slot */ - for (i = 1; i < PCI_SLOT_MAX; i++) { - bool eject = test_bit(i, guest_info->slot_hotplug_enable); - memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF); - patch_pcihp(i, ssdt_ptr, eject); - ssdt_ptr += ACPI_PCIHP_SIZEOF; - } - - ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1); + memcpy(ssdt_ptr, devices->data, devices->len); + ssdt_ptr += devices->len; + memcpy(ssdt_ptr, notify->data, notify->len); + ssdt_ptr += notify->len;
build_header(linker, table_data, (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1); diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 2ab2572..c2e74e9 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o obj-y += kvmvapic.o obj-y += acpi-build.o obj-y += bios-linker-loader.o -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex
diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated deleted file mode 100644 index 0d32a27..0000000 --- a/hw/i386/ssdt-pcihp.hex.generated +++ /dev/null @@ -1,108 +0,0 @@ -static unsigned char ssdt_pcihp_name[] = { -0x33 -}; -static unsigned char ssdt_pcihp_adr[] = { -0x44 -}; -static unsigned char ssdt_pcihp_end[] = { -0x58 -}; -static unsigned char ssdp_pcihp_aml[] = { -0x53, -0x53, -0x44, -0x54, -0x58, -0x0, -0x0, -0x0, -0x1, -0x77, -0x42, -0x58, -0x50, -0x43, -0x0, -0x0, -0x42, -0x58, -0x53, -0x53, -0x44, -0x54, -0x50, -0x43, -0x1, -0x0, -0x0, -0x0, -0x49, -0x4e, -0x54, -0x4c, -0x28, -0x5, -0x10, -0x20, -0x10, -0x33, -0x5c, -0x2e, -0x5f, -0x53, -0x42, -0x5f, -0x50, -0x43, -0x49, -0x30, -0x5b, -0x82, -0x26, -0x53, -0x41, -0x41, -0x5f, -0x8, -0x5f, -0x53, -0x55, -0x4e, -0xa, -0xaa, -0x8, -0x5f, -0x41, -0x44, -0x52, -0xc, -0x0, -0x0, -0xaa, -0x0, -0x14, -0xf, -0x5f, -0x45, -0x4a, -0x30, -0x1, -0xa4, -0x50, -0x43, -0x45, -0x4a, -0x5f, -0x53, -0x55, -0x4e -}; -static unsigned char ssdt_pcihp_start[] = { -0x30 -}; -static unsigned char ssdt_pcihp_id[] = { -0x3d -}; -static unsigned char ssdt_pcihp_ej0[] = { -0x4a -};
"Michael S. Tsirkin" mst@redhat.com writes:
This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line.
One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully.
Since you've gone out of your way to make this difficult to actually review...
* This saves the need to provide a new interface for firmware to discover bus number to pci brige mapping
Do you mean fw_cfg? The BIOS normally does bus numbering. I see no reason for it not to.
* No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions:
We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions.
we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Adding a dynamic array to SeaBIOS isn't rocket science.
I have also tried to address the concerns that Anthony has raised with this approach, please see below.
Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ
This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots.
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Things to note: - this is on top of acpi patchset posted a month ago, with a small patch adding a core function to walk all pci buses, on top. Can also be found in my git tree git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
- Extensive use of glib completely removes pointer math in new code: we use g_array_append_vals exclusively. There's no code patching in new code. This is in response to comments about security concerns with adding code to qemu. In this sense it is more secure than existing code in hw/acpi/core.c - that can be switched to glib if desired. - New code (not imported from seabios) uses glib to reduce dependency on iasl. With time all code can be rewritted to use glib and avoid iasl, if desired. - As was the case previously, systems that lack working iasl are detected at configure time, pre-generated hex files in source tree are used in this case. This addresses the concern with iasl/big-endian systems. - Compile tested only. Migration is known to be broken - there are a bunch of TODO tags in code. It was agreed on the last qemu conf meeting that this code is posted without looking at migration, with understanding that it is addressed before being merged. Please do not mistake this limitation for a fundamental one - I have a very good idea how to fix it, including cross-version migration. - Cross version migration: when running with -M 1.5 and older, all ACPI table generation should be disabled. We'll present FW_CFG interface compatible with 1.5.
So TL;DR, you break a bunch of stuff and introduce a mess of code. It would be less code and wouldn't break anything to add this logic to SeaBIOS.
How is this even a discussion?
Regards,
Anthony Liguori
Cc: Jordan Justen jljusten@gmail.com Cc: Anthony Liguori anthony@codemonkey.ws Cc: Laszlo Ersek lersek@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: seabios@seabios.org Cc: David Woodhouse dwmw2@infradead.org Cc: Kevin O'Connor kevin@koconnor.net Cc: Peter Maydell peter.maydell@linaro.org Signed-off-by: Michael S. Tsirkin mst@redhat.com
docs/specs/acpi_pci_hotplug.txt | 8 + include/hw/i386/pc.h | 4 +- hw/i386/acpi-dsdt.dsl | 36 +++- hw/i386/ssdt-pcihp.dsl | 51 ----- hw/acpi/piix4.c | 145 ++++++++++---- hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- hw/i386/Makefile.objs | 2 +- hw/i386/ssdt-pcihp.hex.generated | 108 ---------- 8 files changed, 510 insertions(+), 255 deletions(-) delete mode 100644 hw/i386/ssdt-pcihp.dsl delete mode 100644 hw/i386/ssdt-pcihp.hex.generated
diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt index a839434..951b99a 100644 --- a/docs/specs/acpi_pci_hotplug.txt +++ b/docs/specs/acpi_pci_hotplug.txt @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
Used by ACPI BIOS _RMV method to indicate removability status to OS. One bit per slot. Read-only
+PCI bus selector (IO port 0xae10-0xae13, 4-byte access): +-----------------------------------------------
+Used by ACPI BIOS methods to select the current PCI bus. +When written, makes all the other PCI registers (0xae00 - 0xae0f) +to refer to the appropriate bus. +0 selects PCI root bus (default). diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index f8d0871..66ec787 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -38,7 +38,9 @@ struct PcGuestInfo { bool s3_disabled; bool s4_disabled; uint8_t s4_val;
- DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX);
+#define GUEST_INFO_MAX_HOTPLUG_BUS 256
- PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS];
- DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); uint16_t sci_int; uint8_t acpi_enable_cmd; uint8_t acpi_disable_cmd;
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 90efce0..dc05668 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -133,11 +133,18 @@ DefinitionBlock ( B0EJ, 32, }
OperationRegion(SEL, SystemIO, 0xae10, 0x04)
Field(SEL, DWordAcc, NoLock, WriteAsZeros) {
BSEL, 32,
}
/* Methods called by bulk generated PCI devices below */
Name(BNUM, 0x00) /* Methods called by hotplug devices */
Method(PCEJ, 1, NotSerialized) {
Method(PCEJ, 2, NotSerialized) { // _EJ0 method - eject callback
Store(Arg1, BSEL) Store(ShiftLeft(1, Arg0), B0EJ) Return (0x0) }
@@ -147,16 +154,25 @@ DefinitionBlock (
/* PCI hotplug notify method */ Method(PCNF, 0) {
// Local0 = iterator
// Local0 = bridge selector Store(Zero, Local0)
While (LLess(Local0, 31)) {
Increment(Local0)
If (And(PCIU, ShiftLeft(1, Local0))) {
PCNT(Local0, 1)
}
If (And(PCID, ShiftLeft(1, Local0))) {
PCNT(Local0, 3)
}
While (LLess(Local0, 255)) {
Store(Local0, BSEL)
// Local1 = slot iterator
Store(Zero, Local1)
Store(PCIU, Local2)
Store(PCID, Local3)
While (LLess(Local1, 31)) {
Increment(Local1)
If (And(Local2, ShiftLeft(1, Local1))) {
PCNT(Add(Local1, Multiply(Local0, 32)), 1)
}
If (And(Local3, ShiftLeft(1, Local1))) {
PCNT(Add(Local1, Multiply(Local0, 32)), 3)
}
}
}Increment(Local0) } }
diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl deleted file mode 100644 index d29a5b9..0000000 --- a/hw/i386/ssdt-pcihp.dsl +++ /dev/null @@ -1,51 +0,0 @@ -/*
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- You should have received a copy of the GNU General Public License along
- with this program; if not, see http://www.gnu.org/licenses/.
- */
-ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml
-DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) -{
-/****************************************************************
- PCI hotplug
- ****************************************************************/
- /* Objects supplied by DSDT */
- External(_SB.PCI0, DeviceObj)
- External(_SB.PCI0.PCEJ, MethodObj)
- Scope(_SB.PCI0) {
/* Bulk generated PCI hotplug devices */
ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start
ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end
ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name
// Method _EJ0 can be patched by BIOS to EJ0_
// at runtime, if the slot is detected to not support hotplug.
// Extract the offset of the address dword and the
// _EJ0 name to allow this patching.
Device(SAA) {
ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id
Name(_SUN, 0xAA)
ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr
Name(_ADR, 0xAA0000)
ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0
Method(_EJ0, 1) {
Return (PCEJ(_SUN))
}
}
- }
-} diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 19a26f5..10106fd 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -29,6 +29,7 @@ #include "exec/ioport.h" #include "hw/nvram/fw_cfg.h" #include "exec/address-spaces.h" +#include "hw/pci/pci_bus.h"
//#define DEBUG
@@ -42,11 +43,12 @@ #define GPE_LEN 4
#define PCI_HOTPLUG_ADDR 0xae00 -#define PCI_HOTPLUG_SIZE 0x000f +#define PCI_HOTPLUG_SIZE 0x0014 #define PCI_UP_BASE 0xae00 #define PCI_DOWN_BASE 0xae04 #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c +#define PCI_SEL_BASE 0xae10
#define PIIX4_PROC_BASE 0xaf00 #define PIIX4_PROC_LEN 32 @@ -57,6 +59,8 @@ struct pci_status { uint32_t up; /* deprecated, maintained for migration compatibility */ uint32_t down;
- uint32_t hotplug_enable;
- uint32_t device_present;
};
typedef struct CPUStatus { @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { Notifier powerdown_notifier;
/* for pci hotplug */
- struct pci_status pci0_status;
- uint32_t pci0_hotplug_enable;
- uint32_t pci0_slot_device_present;
struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS];
uint32_t hotplug_select;
uint8_t disable_s3; uint8_t disable_s4;
@@ -98,6 +101,17 @@ typedef struct PIIX4PMState { PcGuestInfo *guest_info; } PIIX4PMState;
+static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) +{
- int i;
- for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) {
if (guest_info->hotplug_buses[i] == bus) {
return i;
}
- }
- return -1;
+}
static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s);
@@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d,
static void vmstate_pci_status_pre_save(void *opaque) { +#if 0
- /* TODO */ struct pci_status *pci0_status = opaque; PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status);
@@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) * to a version that still does... of course these might get lost * by an old buggy implementation, but we try. */ pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; +#endif }
static int vmstate_acpi_post_load(void *opaque, int version_id) @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &temp); }
+#if 0
- TODO ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1);
+#endif return ret; }
@@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), +#if 0 VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, struct pci_status), +#endif VMSTATE_END_OF_LIST() } };
-static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) { BusChild *kid, *next;
- BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
BusState *bus; int slot = ffs(slots) - 1; bool slot_free = true;
if (!s->guest_info->hotplug_buses[bsel]) {
return;
}
bus = &s->guest_info->hotplug_buses[bsel]->qbus;
/* Mark request as complete */
- s->pci0_status.down &= ~(1U << slot);
s->pci_status[bsel].down &= ~(1U << slot);
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) } } if (slot_free) {
s->pci0_slot_device_present &= ~(1U << slot);
}s->pci_status[bsel].device_present &= ~(1U << slot);
}
-static void piix4_update_hotplug(PIIX4PMState *s) +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) {
- PCIDevice *dev = &s->dev;
- BusState *bus = qdev_get_parent_bus(&dev->qdev); BusChild *kid, *next;
BusState *bus;
if (!s->guest_info->hotplug_buses[bsel]) {
return;
}
bus = &s->guest_info->hotplug_buses[bsel]->qbus; /* Execute any pending removes during reset */
- while (s->pci0_status.down) {
acpi_piix_eject_slot(s, s->pci0_status.down);
- while (s->pci_status[bsel].down) {
}acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down);
- s->pci0_hotplug_enable = ~0;
- s->pci0_slot_device_present = 0;
s->pci_status[bsel].hotplug_enable = ~0;
s->pci_status[bsel].device_present = 0;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) {
s->pci0_hotplug_enable &= ~(1U << slot);
s->pci_status[bsel].hotplug_enable &= ~(1U << slot); }
s->pci0_slot_device_present |= (1U << slot);
}s->pci_status[bsel].device_present |= (1U << slot);
}
@@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) { PIIX4PMState *s = opaque; uint8_t *pci_conf = s->dev.config;
int i;
pci_conf[0x58] = 0; pci_conf[0x59] = 0;
@@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) /* Mark SMM as already inited (until KVM supports SMM). */ pci_conf[0x5B] = 0x02; }
- piix4_update_hotplug(s);
- for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) {
piix4_update_hotplug(s, i);
- }
}
static void piix4_pm_powerdown_req(Notifier *n, void *opaque) @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) acpi_pm1_evt_power_down(&s->ar); }
-static void piix4_update_guest_info(PIIX4PMState *s) +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
PCIHotplugState state);
+static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) {
- PCIDevice *dev = &s->dev;
- BusState *bus = qdev_get_parent_bus(&dev->qdev);
- PIIX4PMState *s = opaque;
- PcGuestInfo *guest_info = s->guest_info; BusChild *kid, *next;
- BusState *bus = &pci_bus->qbus;
- memset(s->guest_info->slot_hotplug_enable, 0xff,
DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE));
int i = piix4_find_hotplug_bus(guest_info, NULL);
assert(i >= 0);
guest_info->hotplug_buses[i] = pci_bus;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) {
clear_bit(slot, s->guest_info->slot_hotplug_enable);
}clear_bit(slot, guest_info->hotplug_enable[i]); }
- pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev);
+}
+static void piix4_update_guest_info(PIIX4PMState *s) +{
- PCIDevice *dev = &s->dev;
- memset(s->guest_info->hotplug_enable, 0xff,
sizeof s->guest_info->hotplug_enable);
- pci_for_each_bus(dev->bus, piix4_update_bus_guest_info,
s);
}
static void piix4_pm_machine_ready(Notifier *n, void *opaque) @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) { PIIX4PMState *s = opaque; uint32_t val = 0;
int bsel = s->hotplug_select;
if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) {
return 0;
}
switch (addr) { case PCI_UP_BASE - PCI_HOTPLUG_ADDR: /* Manufacture an "up" value to cause a device check on any hotplug * slot with a device. Extra device checks are harmless. */
val = s->pci0_slot_device_present & s->pci0_hotplug_enable;
val = s->pci_status[bsel].device_present &
case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR:s->pci_status[bsel].hotplug_enable; PIIX4_DPRINTF("pci_up_read %x\n", val); break;
val = s->pci0_status.down;
case PCI_EJ_BASE - PCI_HOTPLUG_ADDR:val = s->pci_status[bsel].down; PIIX4_DPRINTF("pci_down_read %x\n", val); break;
@@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) PIIX4_DPRINTF("pci_features_read %x\n", val); break; case PCI_RMV_BASE - PCI_HOTPLUG_ADDR:
val = s->pci0_hotplug_enable;
val = s->pci_status[bsel].hotplug_enable; break;
- case PCI_SEL_BASE - PCI_HOTPLUG_ADDR:
default: break; }val = s->hotplug_select;
@@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) static void pci_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) {
- PIIX4PMState *s = opaque; switch (addr) { case PCI_EJ_BASE - PCI_HOTPLUG_ADDR:
acpi_piix_eject_slot(opaque, (uint32_t)data);
if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) {
break;
}
acpi_piix_eject_slot(s, s->hotplug_select, data); PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", addr, data); break;
- case PCI_SEL_BASE - PCI_HOTPLUG_ADDR:
default: break; }s->hotplug_select = data;
@@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) g->sts[id / 8] |= (1 << (id % 8)); }
-static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
PCIHotplugState state);
static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s) { @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCI_HOTPLUG_SIZE); memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, &s->io_pci);
- pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
- qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", PIIX4_PROC_LEN);
@@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, qemu_register_cpu_added_notifier(&s->cpu_added_notifier); }
-static void enable_device(PIIX4PMState *s, int slot) +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
- s->pci0_slot_device_present |= (1U << slot);
- s->pci_status[bsel].device_present |= (1U << slot);
}
-static void disable_device(PIIX4PMState *s, int slot) +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
- s->pci0_status.down |= (1U << slot);
- s->pci_status[bsel].down |= (1U << slot);
}
static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int slot = PCI_SLOT(dev->devfn); PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, PCI_DEVICE(qdev));
int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus);
if (bsel < 0) {
return -1;
}
/* Don't send event when device is enabled during qemu machine creation:
- it is present on boot, no hotplug event is necessary. We do send an
- event when the device is disabled later. */
if (state == PCI_COLDPLUG_ENABLED) {
s->pci0_slot_device_present |= (1U << slot);
s->pci_status[bsel].device_present |= (1U << slot); return 0;
}
if (state == PCI_HOTPLUG_ENABLED) {
enable_device(s, slot);
} else {enable_device(s, bsel, slot);
disable_device(s, slot);
disable_device(s, bsel, slot);
}
pm_update_sci(s);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 72606a8..43e4988 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -34,6 +34,7 @@ #include "hw/acpi/acpi.h" #include "hw/nvram/fw_cfg.h" #include "hw/i386/bios-linker-loader.h" +#include "hw/pci/pci_bus.h"
#define ACPI_BUILD_APPNAME "Bochs" #define ACPI_BUILD_APPNAME6 "BOCHS " @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
-/* 0x5B 0x82 DeviceOp PkgLength NameString */ -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
#define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ #define ACPI_SSDT_HEADER_LENGTH 36
#include "hw/i386/ssdt-misc.hex" -#include "hw/i386/ssdt-pcihp.hex"
+static inline GArray *build_alloc_array(void) +{
return g_array_new(false, true /* clear */, 1);
+}
+static inline void build_free_array(GArray *array) +{
g_array_free(array, true);
+}
+static inline void build_prepend_byte(GArray *array, uint8_t val) +{
- g_array_prepend_val(array, val);
+}
+static inline void build_append_byte(GArray *array, uint8_t val) +{
- g_array_append_val(array, val);
+}
+static inline void build_prepend_array(GArray *array, GArray *val) +{
- g_array_prepend_vals(array, val->data, val->len);
+}
+static inline void build_append_array(GArray *array, GArray *val) +{
- g_array_append_vals(array, val->data, val->len);
+}
+static void build_append_nameseg(GArray *array, const char *format, ...) +{
- GString *s = g_string_new("");
- va_list args;
- va_start(args, format);
- g_string_vprintf(s, format, args);
- va_end(args);
- assert(s->len == 4);
- g_array_append_vals(array, s->str, s->len);
- g_string_free(s, true);
+}
+static void build_prepend_nameseg(GArray *array, const char *format, ...) +{
- GString *s = g_string_new("");
- va_list args;
- va_start(args, format);
- g_string_vprintf(s, format, args);
- va_end(args);
- assert(s->len == 4);
- g_array_prepend_vals(array, s->str, s->len);
- g_string_free(s, true);
+}
+static void build_prepend_devfn(GArray *name, uint8_t devfn) +{
- build_prepend_nameseg(name, "S%0.02x_", devfn);
+}
+static void build_append_devfn(GArray *name, uint8_t devfn) +{
- GArray *array = build_alloc_array();
- build_prepend_devfn(array, devfn);
- build_append_array(name, array);
- build_free_array(array);
+}
+static void build_prepend_name_string(GArray *name, PCIBus *bus) +{
- int segcount;
- for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) {
build_prepend_devfn(name, bus->parent_dev->devfn);
- }
- g_array_prepend_vals(name, "PCI0", 4);
- g_array_prepend_vals(name, "_SB_", 4);
- /* Name segment length is a multiple of 4 */
- assert(name->len % 4 == 0);
- segcount = name->len / 4;
- build_prepend_byte(name, segcount);
- build_prepend_byte(name, 0x2f); /* MultiNamePrefix */
- build_prepend_byte(name, '\'); /* RootChar */
+}
+static void build_append_name_string(GArray *name, PCIBus *bus) +{
- GArray *append = build_alloc_array();
- build_prepend_name_string(append, bus);
- build_append_array(name, append);
- build_free_array(append);
+}
+/* 5.4 Definition Block Encoding */ +enum {
- PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
- PACKAGE_LENGTH_2BYTE_SHIFT = 4,
- PACKAGE_LENGTH_3BYTE_SHIFT = 12,
- PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+static void build_prepend_package_length(GArray *package) +{
- unsigned bytes;
- uint8_t byte;
- unsigned package_len = package->len;
- unsigned length = package_len;
- if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) {
byte = length;
build_prepend_byte(package, byte);
return;
- }
- if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
- }
- if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
- }
- if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
- }
- bytes = package->len - package_len;
- byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
- build_prepend_byte(package, byte);
+}
+static void build_package(GArray *package, uint8_t op) +{
- build_prepend_package_length(package);
- build_prepend_byte(package, op);
+}
+static void build_extop_package(GArray *package, uint8_t op) +{
- build_package(package, op);
- build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
+}
+static void build_append_notify(GArray *method, GArray *target_name,
uint16_t value)
+{
- GArray *notify = build_alloc_array();
- uint8_t op = 0xA0; /* IfOp */
- build_append_byte(notify, 0x93); /* LEqualOp */
- build_append_byte(notify, 0x68); /* Arg0Op */
- build_append_byte(notify, 0x0A); /* BytePrefix */
- build_append_byte(notify, value);
- build_append_byte(notify, value >> 8);
- build_append_byte(notify, 0x86); /* NotifyOp */
- build_append_array(notify, target_name);
- build_append_byte(notify, 0x69); /* Arg1Op */
- /* Pack it up */
- build_package(notify, op);
- build_append_array(method, notify);
- build_free_array(notify);
+}
+static void +build_append_notify_slots(GArray *method, PCIBus *bus,
unsigned bus_select,
DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX))
+{
- int i;
- if (!bus) {
return;
- }
- for (i = 0; i < 32; i++) {
GArray *target;
if (!test_bit(i, hotplug_enable)) {
continue;
}
target = build_alloc_array();
build_prepend_devfn(target, i);
build_prepend_name_string(target, bus);
/* *32 matches Multiply in PCNF */
build_append_notify(method, target, bus_select * 32 + i);
build_free_array(target);
- }
+}
+static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) +{
- int i;
- GArray *method = build_alloc_array();
- uint8_t op = 0x14; /* MethodOp */
- build_append_nameseg(method, "PCNT");
- build_append_byte(method, 0x02); /* MethodFlags: ArgCount */
- for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) {
build_append_notify_slots(method, guest_info->hotplug_buses[i], i,
guest_info->hotplug_enable[i]);
- }
- build_package(method, op);
- build_append_array(table, method);
- build_free_array(method);
+}
+static void build_append_device(GArray *table, unsigned devfn) +{
- uint8_t op, ej0_op;
- GArray *device = build_alloc_array();
- GArray *ej0 = build_alloc_array();
- op = 0x82; /* DeviceOp */
- build_append_devfn(device, devfn);
- build_append_byte(device, 0x08); /* NameOp */
- build_append_nameseg(device, "_SUN");
- build_append_byte(device, 0x0A); /* BytePrefix */
- build_append_byte(device, PCI_SLOT(devfn));
- build_append_byte(device, 0x08); /* NameOp */
- build_append_nameseg(device, "_ADR");
- build_append_byte(device, 0x0C); /* DWordPrefix */
- build_append_byte(device, PCI_FUNC(devfn));
- build_append_byte(device, 0x00);
- build_append_byte(device, PCI_SLOT(devfn));
- build_append_byte(device, 0x00);
- ej0_op = 0x14; /* MethodOp */
- build_append_nameseg(ej0, "_EJ0");
- build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */
- build_append_byte(ej0, 0xA4); /* ReturnOp */
- build_append_nameseg(ej0, "PCEJ");
- build_append_nameseg(ej0, "_SUN");
- build_append_nameseg(ej0, "BNUM");
- build_package(ej0, ej0_op);
- build_append_array(device, ej0);
- build_extop_package(device, op);
- build_append_array(table, device);
- build_free_array(device);
- build_free_array(ej0);
+}
+static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS],
PCIBus *bus, unsigned devfn)
+{
- int i;
- for (i = 0; i < ARRAY_SIZE(buses); ++i) {
if (buses[i] && buses[i]->parent_dev &&
buses[i]->parent_dev->bus == bus &&
buses[i]->parent_dev->devfn == devfn) {
return true;
}
- }
- return false;
+}
+static void +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) +{
- uint8_t op;
- int slot;
- unsigned devfn;
- GArray *device;
- PCIBus *bus = guest_info->hotplug_buses[bnum];
- if (!bus) {
return;
- }
- device = build_alloc_array();
- /* Root device described in DSDT */
- if (bnum) {
devfn = bus->parent_dev->devfn;
op = 0x82; /* DeviceOp */
build_append_name_string(device, bus);
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "_SUN");
build_append_byte(device, 0x0A); /* BytePrefix */
build_append_byte(device, PCI_SLOT(devfn));
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "BNUM");
build_append_byte(device, 0x0A); /* BytePrefix */
build_append_byte(device, bnum);
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "_ADR");
build_append_byte(device, 0x0C); /* DWordPrefix */
build_append_byte(device, PCI_FUNC(devfn));
build_append_byte(device, 0x00);
build_append_byte(device, PCI_SLOT(devfn));
build_append_byte(device, 0x00);
- } else {
op = 0x10; /* ScopeOp */
build_append_nameseg(device, "PCI0");
- }
- for (slot = 0; slot < PCI_SLOT_MAX; ++slot) {
if (!test_bit(slot, guest_info->hotplug_enable[bnum])) {
continue;
}
/* Bus device? We'll add it later */
if (build_find_device_bus(guest_info->hotplug_buses,
bus, PCI_DEVFN(slot, 0))) {
continue;
}
build_append_device(device, PCI_DEVFN(slot, 0));
- }
- if (bnum) {
build_extop_package(device, op);
- } else {
build_package(device, op);
- }
- build_append_array(table, device);
- build_free_array(device);
+}
+static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) +{
- int i;
- for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) {
build_append_bus(table, i, guest_info);
- }
+}
static uint8_t* build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count,
const char *target, int ofs)
const char *target, int ofs, int step)
{ int i;
@@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, *(ssdt_ptr++) = i; *(ssdt_ptr++) = 0x86; /* NotifyOp */ memcpy(ssdt_ptr, target, 4);
ssdt_ptr[ofs] = acpi_get_hex(i >> 4);
ssdt_ptr[ofs + 1] = acpi_get_hex(i);
ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4);
} return ssdt_ptr;ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); ssdt_ptr += 4; *(ssdt_ptr++) = 0x69; /* Arg1Op */
}
-static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) -{
- ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4);
- ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot);
- ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot;
- ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot;
- /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot,
* replace the method name: _EJ0 by ACPI_EJ0_. */
- /* Sanity check */
- assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4));
- if (!eject) {
memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4);
- }
-}
static void build_ssdt(GArray *table_data, GArray *linker, FWCfgState *fw_cfg, PcGuestInfo *guest_info) @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ + (1+2+5+(12*acpi_cpus)) /* NTFY */ + (6+2+1+(1*acpi_cpus)) /* CPON */
+ (1+3+4) /* Scope(PCI0) */
+ ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */
+ (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */
- uint8_t *ssdt = acpi_data_push(table_data, length);
- uint8_t *ssdt_ptr = ssdt;
);
uint8_t *ssdt;
uint8_t *ssdt_ptr;
GArray *devices = build_alloc_array();
GArray *notify = build_alloc_array();
/* TODO: rewrite this function using glib */
build_append_device_buses(devices, guest_info);
length += devices->len;
build_append_notify_buses(notify, guest_info);
length += notify->len;
ssdt = acpi_data_push(table_data, length);
ssdt_ptr = ssdt;
/* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml));
@@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker,
/* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ /* Arg0 = Processor ID = APIC ID */
- ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2);
ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1);
/* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ *(ssdt_ptr++) = 0x08; /* NameOp */
@@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, *(ssdt_ptr++) = 'I'; *(ssdt_ptr++) = '0';
- /* build Device object for each slot */
- for (i = 1; i < PCI_SLOT_MAX; i++) {
bool eject = test_bit(i, guest_info->slot_hotplug_enable);
memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF);
patch_pcihp(i, ssdt_ptr, eject);
ssdt_ptr += ACPI_PCIHP_SIZEOF;
- }
- ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1);
memcpy(ssdt_ptr, devices->data, devices->len);
ssdt_ptr += devices->len;
memcpy(ssdt_ptr, notify->data, notify->len);
ssdt_ptr += notify->len;
build_header(linker, table_data, (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 2ab2572..c2e74e9 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o obj-y += kvmvapic.o obj-y += acpi-build.o obj-y += bios-linker-loader.o -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex
diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated deleted file mode 100644 index 0d32a27..0000000 --- a/hw/i386/ssdt-pcihp.hex.generated +++ /dev/null @@ -1,108 +0,0 @@ -static unsigned char ssdt_pcihp_name[] = { -0x33 -}; -static unsigned char ssdt_pcihp_adr[] = { -0x44 -}; -static unsigned char ssdt_pcihp_end[] = { -0x58 -}; -static unsigned char ssdp_pcihp_aml[] = { -0x53, -0x53, -0x44, -0x54, -0x58, -0x0, -0x0, -0x0, -0x1, -0x77, -0x42, -0x58, -0x50, -0x43, -0x0, -0x0, -0x42, -0x58, -0x53, -0x53, -0x44, -0x54, -0x50, -0x43, -0x1, -0x0, -0x0, -0x0, -0x49, -0x4e, -0x54, -0x4c, -0x28, -0x5, -0x10, -0x20, -0x10, -0x33, -0x5c, -0x2e, -0x5f, -0x53, -0x42, -0x5f, -0x50, -0x43, -0x49, -0x30, -0x5b, -0x82, -0x26, -0x53, -0x41, -0x41, -0x5f, -0x8, -0x5f, -0x53, -0x55, -0x4e, -0xa, -0xaa, -0x8, -0x5f, -0x41, -0x44, -0x52, -0xc, -0x0, -0x0, -0xaa, -0x0, -0x14, -0xf, -0x5f, -0x45, -0x4a, -0x30, -0x1, -0xa4, -0x50, -0x43, -0x45, -0x4a, -0x5f, -0x53, -0x55, -0x4e -}; -static unsigned char ssdt_pcihp_start[] = { -0x30 -}; -static unsigned char ssdt_pcihp_id[] = { -0x3d -}; -static unsigned char ssdt_pcihp_ej0[] = { -0x4a
-};
MST
On 10 June 2013 19:58, Anthony Liguori anthony@codemonkey.ws wrote:
We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions.
I'm confused, how does this work in cross-version migration? The BIOS is part of the guest, so I'd expect QEMU would have to support whatever it happens to be. (Plus how do you enforce the version-dependency?)
thanks -- PMM
On 06/10/13 20:58, Anthony Liguori wrote:
"Michael S. Tsirkin" mst@redhat.com writes:
This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line.
One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully.
Since you've gone out of your way to make this difficult to actually review...
I haven't tried to review the patch yet, but why would you say such a thing? Michael wants to convince you. There's no way he could sneak past you in this patch. You surely won't say "it's too hard to review, I'll yield", and he's obviously aware.
[snip]
* No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions:
We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions.
Kevin requires cross-compat between {old, new} qemu and {old, new} SeaBIOS:
http://thread.gmane.org/gmane.comp.emulators.qemu/202005/focus=202072
we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Adding a dynamic array to SeaBIOS isn't rocket science.
Please take a look at my series for OVMF that adds basic support for SMBIOS tables. It took me three days of basically uninterrupted coding and two approaches to throw away until I got something submittable (with default tables for only type 0 and type 1).
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037
I don't want to invite accusations like "this is a strawman argument, noone was speaking about SMBIOS", but the selective, dynamic patching is somewhat similar between AML and SMBIOS in any given boot firmware. You got a big bunch of named offsets that must be mangled individually.
Allowing the firmware to only install blobs verbatim, or maybe patch them but only in a completely programmatic manner (ie. without specific field names & offsets in the firmware, which I did try for SMBIOS in OVMF, and failed, (*)), would be a big help.
((*) because the fw_cfg format of individual SMBIOS fields doesn't distinguish formatted fields from strings in the unformatted area)
Not rocket science, just churn and busywork.
[snip]
So TL;DR, you break a bunch of stuff and introduce a mess of code. It would be less code and wouldn't break anything to add this logic to SeaBIOS.
How is this even a discussion?
Well it isn't for me any more; count me out. I'll go with the flow.
Cheers, Laszlo
Laszlo Ersek lersek@redhat.com writes:
On 06/10/13 20:58, Anthony Liguori wrote:
"Michael S. Tsirkin" mst@redhat.com writes:
This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line.
One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully.
Since you've gone out of your way to make this difficult to actually review...
I haven't tried to review the patch yet, but why would you say such a thing? Michael wants to convince you. There's no way he could sneak past you in this patch. You surely won't say "it's too hard to review, I'll yield", and he's obviously aware.
The patch is very large and depends on another large series. It's not split up as a logical series of changes and more importantly doesn't solve significant problems (like live migration).
It makes it very difficult to properly evaluate the approach. But even taking those sort cuts, it looks like a lot of code in QEMU to avoid what appears to be a significantly smaller amount of code in SeaBIOS.
There isn't deep dependency on information only available in QEMU. AFAICT the only advantage of being in QEMU is being able to use a couple data types from glib.
[snip]
* No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions:
We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions.
Kevin requires cross-compat between {old, new} qemu and {old, new} SeaBIOS:
http://thread.gmane.org/gmane.comp.emulators.qemu/202005/focus=202072
we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Adding a dynamic array to SeaBIOS isn't rocket science.
Please take a look at my series for OVMF that adds basic support for SMBIOS tables. It took me three days of basically uninterrupted coding and two approaches to throw away until I got something submittable (with default tables for only type 0 and type 1).
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037
I don't want to invite accusations like "this is a strawman argument, noone was speaking about SMBIOS", but the selective, dynamic patching is somewhat similar between AML and SMBIOS in any given boot firmware. You got a big bunch of named offsets that must be mangled individually.
Allowing the firmware to only install blobs verbatim, or maybe patch them but only in a completely programmatic manner (ie. without specific field names & offsets in the firmware, which I did try for SMBIOS in OVMF, and failed, (*)), would be a big help.
((*) because the fw_cfg format of individual SMBIOS fields doesn't distinguish formatted fields from strings in the unformatted area)
I don't understand this piece.
Other than TianoCore being a weird environment, what made this more difficult than it is to generate the tables in QEMU?
Regards,
Anthony Liguori
Not rocket science, just churn and busywork.
[snip]
So TL;DR, you break a bunch of stuff and introduce a mess of code. It would be less code and wouldn't break anything to add this logic to SeaBIOS.
How is this even a discussion?
Well it isn't for me any more; count me out. I'll go with the flow.
Cheers, Laszlo
On 06/10/13 21:57, Anthony Liguori wrote:
Laszlo Ersek lersek@redhat.com writes:
Please take a look at my series for OVMF that adds basic support for SMBIOS tables. It took me three days of basically uninterrupted coding and two approaches to throw away until I got something submittable (with default tables for only type 0 and type 1).
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037
I don't want to invite accusations like "this is a strawman argument, noone was speaking about SMBIOS", but the selective, dynamic patching is somewhat similar between AML and SMBIOS in any given boot firmware. You got a big bunch of named offsets that must be mangled individually.
Allowing the firmware to only install blobs verbatim, or maybe patch them but only in a completely programmatic manner (ie. without specific field names & offsets in the firmware, which I did try for SMBIOS in OVMF, and failed, (*)), would be a big help.
((*) because the fw_cfg format of individual SMBIOS fields doesn't distinguish formatted fields from strings in the unformatted area)
I don't understand this piece.
Other than TianoCore being a weird environment, what made this more difficult than it is to generate the tables in QEMU?
QEMU can pass down two kinds of SMBIOS config items over fw_cfg: verbatim tables and individual fields.
The verbatim table kind means "install this table as-is". I'm not sure how SeaBIOS solves it, in OVMF you call Smbios->Add().
Regarding the "field kind" config item: for each table type required by the SMBIOS spec, if qemu doesn't provide such a table in whole / in verbatim, then the firmware must install a default table for that type, *and* patch it with any referring field kind config items.
Given the current format of the "field kind" config item, it is currently not possible to just iterate over the "field kind" config items, and patch whatever field in whatever table they refer to. The information contained in the "field kind" config item is insufficient to do this.
The "field kind" config item designates a (table, field offset) pair. If that offset points at a string-typed SMBIOS field (basically string serial number into the unformatted area), one must update the unformatted string area after the actual table fields. (In OVMF one does this with a different call, Smbios->UpdateStrings(), and at that point the table even must have been installed already.)
If the offset points at a non-string-typed (= in-formatted-area) field, the field must be overwritten in-place. (In OVMF this must happen *before* table installation, ie. before Smbios->Add().)
So, the current fw_cfg representation is insufficient, and the firmware must extend (or qemu should have extended) the info contents with this distinction between formatted and unformatted. Again this differs from field to field and the origin of that classification is the SMBIOS spec.
Until this point in my email there has been no clear indication whether qemu or the firmware should do this. *Some* side will have to suffer shoveling the field/offset info, and a table template for each type, from the SMBIOS spec into code. I described the above solely as background, to set the landscape.
(a) The first thing that tips the scale is: of qemu there is one, and of firmware, there exist at least two, *wildly different* implementations. (Apologies, I forgot about coreboot; not sure if it's affected.) It's not a design purity argument, just what's cheaper.
(b) The second thing that further unbalances the scale: suppose you decide to add further field-kind overrides to qemu. For example, currently Type 3 (chassis information, table required by SMBIOS spec) cannot be overridden field-wise.
If a user is unhappy with her firmware's default Type 3 table, she's allowed to hex-edit a Type 3 blob, and pass it with "-smbios file=type3.blob" on the qemu command line. If you as developer want to provide her with more flexibility, you need to (i) patch qemu (of course), to recognize individual fields on the command line, and to populate fw_cfg with them, (ii) patch *every single firmware* supported by qemu to *look for* those (type=3,offset) field-kind config entries, and to effectuate them.
If you pass down only verbatim tables, then (ii) falls away. You don't have to update SeaBIOS, nor OVMF. As a bonus, you don't have to care about versions: age-old firmware that only knows how to install verbatim tables will immediately benefit from qemu's new-fangled Type 3 flexibility.
Case in point: https://bugzilla.redhat.com/show_bug.cgi?id=788142. (Non-public bug, sorry about that. I think I can disclose that it is about supporting the "field kind" config entry for Type 3 tables.)
For ACPI tables, especially for tables containing AML, multiply the cost of (ii) by twenty. Many more offsets to patch and vastly more dependencies via fw_cfg.
Laszlo
Sorry about replying to myself...
On 06/10/13 22:43, Laszlo Ersek wrote:
It's not a design purity argument, just what's cheaper.
You probably consider SMBIOS/ACPI table generation "guest code" that has no place in the machine emulator. You're likely willing to expose the needed bits on a case-by-case basis, in their barest representation.
I must consider the entire stack, from the libvirt top, down to the guest kernel or root shell, and see where it is cheapest to effect changes. The higher I can hoist something the better.
Modulo, it depends on what element of the stack people think of as central / indispensable -- that element should support the feature directly.
(Returning to my earlier example, it would be theoretically viable *I guess* to equip libvirt to pre-format a Type 3 SMBIOS blob in a file, and pass it with "-smbios file=...", but many people don't use libvirt. It also might not apply to other emulators supported by libvirt.)
Thanks, Laszlo
Laszlo Ersek lersek@redhat.com writes:
On 06/10/13 21:57, Anthony Liguori wrote:
Laszlo Ersek lersek@redhat.com writes:
I don't understand this piece.
Other than TianoCore being a weird environment, what made this more difficult than it is to generate the tables in QEMU?
QEMU can pass down two kinds of SMBIOS config items over fw_cfg: verbatim tables and individual fields.
The verbatim table kind means "install this table as-is". I'm not sure how SeaBIOS solves it, in OVMF you call Smbios->Add().
Regarding the "field kind" config item: for each table type required by the SMBIOS spec, if qemu doesn't provide such a table in whole / in verbatim, then the firmware must install a default table for that type, *and* patch it with any referring field kind config items.
Given the current format of the "field kind" config item, it is currently not possible to just iterate over the "field kind" config items, and patch whatever field in whatever table they refer to. The information contained in the "field kind" config item is insufficient to do this.
The "field kind" config item designates a (table, field offset) pair. If that offset points at a string-typed SMBIOS field (basically string serial number into the unformatted area), one must update the unformatted string area after the actual table fields. (In OVMF one does this with a different call, Smbios->UpdateStrings(), and at that point the table even must have been installed already.)
If the offset points at a non-string-typed (= in-formatted-area) field, the field must be overwritten in-place. (In OVMF this must happen *before* table installation, ie. before Smbios->Add().)
Okay, I understand the problem here. Thanks for the explanation. I had assumed that the smbios_field structure was the binary representation of the table. I didn't realize it was our own format used to pass information over.
So, the current fw_cfg representation is insufficient, and the firmware must extend (or qemu should have extended) the info contents with this distinction between formatted and unformatted. Again this differs from field to field and the origin of that classification is the SMBIOS spec.
Until this point in my email there has been no clear indication whether qemu or the firmware should do this. *Some* side will have to suffer shoveling the field/offset info, and a table template for each type, from the SMBIOS spec into code. I described the above solely as background, to set the landscape.
(a) The first thing that tips the scale is: of qemu there is one, and of firmware, there exist at least two, *wildly different* implementations. (Apologies, I forgot about coreboot; not sure if it's affected.) It's not a design purity argument, just what's cheaper.
The only reason there are two implementations is a licensing problem that must be solved anyway.
(b) The second thing that further unbalances the scale: suppose you decide to add further field-kind overrides to qemu. For example, currently Type 3 (chassis information, table required by SMBIOS spec) cannot be overridden field-wise.
The reason we expose blobs in SMBIOS is that blobs can be vendor specific and there was a strong desire to allow vendor specific blobs to be passed through. There really is no choice but to expose a blob interface for that.
If a user is unhappy with her firmware's default Type 3 table, she's allowed to hex-edit a Type 3 blob, and pass it with "-smbios file=type3.blob" on the qemu command line. If you as developer want to provide her with more flexibility, you need to (i) patch qemu (of course), to recognize individual fields on the command line, and to populate fw_cfg with them, (ii) patch *every single firmware* supported by qemu to *look for* those (type=3,offset) field-kind config entries, and to effectuate them.
If you pass down only verbatim tables, then (ii) falls away. You don't have to update SeaBIOS, nor OVMF.
OVMF is proprietary. It is not "supported" by QEMU. Firmware is a fundamental part of our stack. I'm not really convinced that QEMU<->firmware is a GPL boundary because of how tightly the two are linked.
Moving large chunks of firmware code into QEMU just to avoid solving licensing issues is a non-starter with me.
As a bonus, you don't have to care about versions: age-old firmware that only knows how to install verbatim tables will immediately benefit from qemu's new-fangled Type 3 flexibility.
This presume that (1) you only care about exposing this information to x86 guests (2) that the firmware will never have any need to patch these blobs.
Case in point: https://bugzilla.redhat.com/show_bug.cgi?id=788142. (Non-public bug, sorry about that. I think I can disclose that it is about supporting the "field kind" config entry for Type 3 tables.)
This information is *exactly* the kind of information that you want to be exposed across architectures. It makes lots of sense to model a device to expose this type of information in a structure way.
This discussion comes down to two things I think: (a) our existing firmware interface is pretty poor (b) we are duplicating work because of firmware licensing.
We can fix (a) and there's lots of value in doing that in terms of improving support for other architectures. We've discussed (b) in other threads and I've stated my opinion on the direction we need to take.
Regards,
Anthony Liguori
For ACPI tables, especially for tables containing AML, multiply the cost of (ii) by twenty. Many more offsets to patch and vastly more dependencies via fw_cfg.
Laszlo
On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote:
This discussion comes down to two things I think: (a) our existing firmware interface is pretty poor (b) we are duplicating work because of firmware licensing.
We can fix (a) and there's lots of value in doing that in terms of improving support for other architectures. We've discussed (b) in other threads and I've stated my opinion on the direction we need to take.
I'm not concerned about (b).
I'm quite curious how you are planning to solve (a). I think it would help move this conversation forward if you could take a couple acpi tables in use today (eg, madt, srat) and describe the future format and location for each field in those tables. I think it would also be useful if you could do the same for a couple DSDT entries (eg, _SB.PCI0, _SB.PCI0.ISA) and also describe how you plan to have the guest build the AML from that info.
-Kevin
Kevin O'Connor kevin@koconnor.net writes:
On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote:
This discussion comes down to two things I think: (a) our existing firmware interface is pretty poor (b) we are duplicating work because of firmware licensing.
We can fix (a) and there's lots of value in doing that in terms of improving support for other architectures. We've discussed (b) in other threads and I've stated my opinion on the direction we need to take.
I'm not concerned about (b).
I'm quite curious how you are planning to solve (a). I think it would help move this conversation forward if you could take a couple acpi tables in use today (eg, madt, srat) and describe the future format and location for each field in those tables.
Sure. Today we expose the device model through a graph that can be losely mapped to a filesystem. We expose two methods of interest as part of our management api:
{ 'type': 'ObjectPropertyInfo', 'data': { 'name': 'str', 'type': 'str' } }
{ 'command': 'qom-list', 'data': { 'path': 'str' }, 'returns': [ 'ObjectPropertyInfo' ] }
{ 'command': 'qom-get', 'data': { 'path': 'str', 'property': 'str' }, 'returns': 'visitor', 'gen': 'no' }
I am proposing that we replace fw_cfg with a device that exports the QOM tree (read-only) via firmware.
We have a utility called qom-fuse that can mount a fuse filesystem that exports this as a proper filesystem too.
Here's a small example:
[06:26 PM] anthony🐵 titi:~/git/qemu/QMP/tmp$ find . ./machine ./machine/i440fx ./machine/i440fx/ioapic ./machine/i440fx/ioapic/parent_bus ./machine/i440fx/ioapic/realized ./machine/i440fx/ioapic/type ./machine/i440fx/pci.0 ./machine/i440fx/pci.0/child[5] ./machine/i440fx/pci.0/child[4] ./machine/i440fx/pci.0/child[3] ./machine/i440fx/pci.0/child[2] ./machine/i440fx/pci.0/child[1] ./machine/i440fx/pci.0/child[0] ...
For the MADT table, right now SeaBIOS needs the CPU count. That can be found by counting the number of CPU nodes. Today cpus are unattached so they appear in /machine/unattached but we should put them in a /machine/cpu container for clarity.
Presumably it would be good to expose the apic id too. This is already in the QOM tree.
QOM is the full representation of the device state so we should not ever need to add additional things to fw_cfg. More likely than not, when SeaBIOS needs more information, it's already there so added functionality will probably Just Work with older QEMUs.
I think it would also be useful if you could do the same for a couple DSDT entries (eg, _SB.PCI0, _SB.PCI0.ISA) and also describe how you plan to have the guest build the AML from that info.
Likewise the slot count should be available too. We hard code slots today but it is something we should model properly. We would model it using QOM of course.
Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing.
Regards,
Anthony Liguori
-Kevin
On Mon, 2013-06-10 at 18:34 -0500, Anthony Liguori wrote:
Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing.
That's nice in theory, but I'm not sure how it works as things evolve and new things / new features get exposed. The firmware's *interpretation* of the QOM tree needs to be kept in sync with qemu.
Hm, make that: The firmwares' *interpretation*…
Let's take a specific, recent example. We fixed the PIIX4 code to actually handle the hard reset on port 0xcf9. We need to fix the ACPI tables to indicate a usable RESET_REG.
How is that exposed in the QOM tree, and how does it all work? With qemu exposing ACPI tables in their close-to-final form, it's just fine. Boot with a recent qemu and it's all nice and shiny; boot with an old qemu and it doesn't reset properly.
But if the firmware has to be updated to interpret the new feature advertised in the QOM tree and translate it into the ACPI table, then we haven't really got much of an improvement.
Please explain how this is supposed to work in *practice*.
David Woodhouse dwmw2@infradead.org writes:
On Mon, 2013-06-10 at 18:34 -0500, Anthony Liguori wrote:
Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing.
That's nice in theory, but I'm not sure how it works as things evolve and new things / new features get exposed. The firmware's *interpretation* of the QOM tree needs to be kept in sync with qemu.
Hm, make that: The firmwares' *interpretation*…
Let's take a specific, recent example. We fixed the PIIX4 code to actually handle the hard reset on port 0xcf9. We need to fix the ACPI tables to indicate a usable RESET_REG.
Very good example.
Normally, we try to be bug-for-bug compatible for guests such that -M pc-1.4 would behave exactly the same.
In this case, we failed to introduce a property to control this behavior like we should have. If this changes the guest ACPI tables, it definitely should definitely be set based on a property.
This is a good example of why this approach is good for QEMU, it helps us catch stuff like this.
How is that exposed in the QOM tree, and how does it all work? With qemu exposing ACPI tables in their close-to-final form, it's just fine. Boot with a recent qemu and it's all nice and shiny; boot with an old qemu and it doesn't reset properly.
If we did this right in QEMU, we'd have to introduce a QOM property anyway as that's how we trigger differences in machine behavior. And -M pc-1.4 ought to generate the same tables as qemu 1.4 did.
Regards,
Anthony Liguori
But if the firmware has to be updated to interpret the new feature advertised in the QOM tree and translate it into the ACPI table, then we haven't really got much of an improvement.
Please explain how this is supposed to work in *practice*.
-- dwmw2
On Mon, 2013-06-10 at 19:11 -0500, Anthony Liguori wrote:
If we did this right in QEMU, we'd have to introduce a QOM property anyway
... and then we'd have to update each firmware implementation to know about this new property, and the how it translates to the RESET_REG field in the DSDT, etc.
So we *still* end up having to update firmwares to keep up with qemu, much more than we would if we'd generated the tables on the qemu side.
On Mon, Jun 10, 2013 at 06:34:29PM -0500, Anthony Liguori wrote:
Kevin O'Connor kevin@koconnor.net writes:
On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote:
This discussion comes down to two things I think: (a) our existing firmware interface is pretty poor (b) we are duplicating work because of firmware licensing.
We can fix (a) and there's lots of value in doing that in terms of improving support for other architectures. We've discussed (b) in other threads and I've stated my opinion on the direction we need to take.
I'm not concerned about (b).
I'm quite curious how you are planning to solve (a). I think it would help move this conversation forward if you could take a couple acpi tables in use today (eg, madt, srat) and describe the future format and location for each field in those tables.
Sure. Today we expose the device model through a graph that can be losely mapped to a filesystem. We expose two methods of interest as part of our management api:
[...]
For the MADT table, right now SeaBIOS needs the CPU count. That can be found by counting the number of CPU nodes. Today cpus are unattached so they appear in /machine/unattached but we should put them in a /machine/cpu container for clarity.
SeaBIOS needs much more than the CPU count. The madt contains info on each interrupt - its global irq number, the legacy irq number, the acpi defined type of the irq, and the acpi defined flags for the irq. It also contains similar info on each cpu (including apic id), the io apic, and NMIs.
QOM is the full representation of the device state so we should not ever need to add additional things to fw_cfg. More likely than not, when SeaBIOS needs more information, it's already there so added functionality will probably Just Work with older QEMUs.
I think it would also be useful if you could do the same for a couple DSDT entries (eg, _SB.PCI0, _SB.PCI0.ISA) and also describe how you plan to have the guest build the AML from that info.
Likewise the slot count should be available too. We hard code slots today but it is something we should model properly. We would model it using QOM of course.
Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing.
I think we keep talking past each other. You seem to be pointing out that the information seabios uses to patch its hardcoded tables can be passed in via QOM. Agreed, it can. I'm pointing out all the info that is hardcoded in seabios - I don't see how that can be passed via QOM.
All the hardcoded data in seabios is a problem, because when it changes (and it frequently does) it requires changes to both QEMU and SeaBIOS and those changes have to be coordinated. The key reason for moving the tables into QEMU is not that it can better patch the tables - the advantage is that it can hardcode the tables that need patching.
I can cite several recent examples of seabios change requests that require changing of the hardcoded tables: liguang wants to add an "embedded controller" hardware device which requires changes to seabios' dsdt, Corey Minyard wants to add IPMI hardware support (both smbios changes and DSDT changes), and Corey Bryant wants to add TPM hardware support.
How do we solve the problem of seabios having a ton of hardcoded information about the qemu hardware, and seabios having to change with the hardware modifications that qemu makes?
-Kevin
Hi Kevin,
On Mon, Jun 10, 2013 at 7:23 PM, Kevin O'Connor kevin@koconnor.net wrote:
On Mon, Jun 10, 2013 at 06:34:29PM -0500, Anthony Liguori wrote:
Kevin O'Connor kevin@koconnor.net writes:
For the MADT table, right now SeaBIOS needs the CPU count. That can be found by counting the number of CPU nodes. Today cpus are unattached so they appear in /machine/unattached but we should put them in a /machine/cpu container for clarity.
SeaBIOS needs much more than the CPU count. The madt contains info on each interrupt - its global irq number, the legacy irq number, the acpi defined type of the irq, and the acpi defined flags for the irq. It also contains similar info on each cpu (including apic id), the io apic, and NMIs.
See below.
QOM is the full representation of the device state so we should not ever need to add additional things to fw_cfg. More likely than not, when SeaBIOS needs more information, it's already there so added functionality will probably Just Work with older QEMUs.
I think it would also be useful if you could do the same for a couple DSDT entries (eg, _SB.PCI0, _SB.PCI0.ISA) and also describe how you plan to have the guest build the AML from that info.
Likewise the slot count should be available too. We hard code slots today but it is something we should model properly. We would model it using QOM of course.
Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing.
I think we keep talking past each other. You seem to be pointing out that the information seabios uses to patch its hardcoded tables can be passed in via QOM. Agreed, it can. I'm pointing out all the info that is hardcoded in seabios - I don't see how that can be passed via QOM.
No, we aren't talking past each other. We both have the same goal.
A lot of what is hard coded in SeaBIOS is hard coded in QEMU too. IRQ routing is a good example of that. We want to make this information dynamic.
Part of the trouble is that we haven't had a need to not hard code it because this is only information consumed by the BIOS.
All the hardcoded data in seabios is a problem, because when it changes (and it frequently does) it requires changes to both QEMU and SeaBIOS and those changes have to be coordinated. The key reason for moving the tables into QEMU is not that it can better patch the tables
- the advantage is that it can hardcode the tables that need patching.
So let's fix it. It's very easy to add read-only properties to QOM so we can hard code the bits that are in SeaBIOS now as read-only properties. For the MADT table, the per-CPU and IOAPIC info can simply be properties of those devices. We already represent irqs in QOM as integers so I suspect much of the information SeaBIOS needs is already there.
I can cite several recent examples of seabios change requests that require changing of the hardcoded tables: liguang wants to add an "embedded controller" hardware device which requires changes to seabios' dsdt, Corey Minyard wants to add IPMI hardware support (both smbios changes and DSDT changes), and Corey Bryant wants to add TPM hardware support.
How do we solve the problem of seabios having a ton of hardcoded information about the qemu hardware, and seabios having to change with the hardware modifications that qemu makes?
I think that we can pretty much touch a table once pulling all of the info from QOM and then from a SeaBIOS point of view, never have to touch it again.
The benefit is that solving this problem for x86 solves it for other architectures too and also lays the ground work to let a user actually control these bits too.
Regards,
Anthony Liguori
-Kevin
On Mon, Jun 10, 2013 at 2:45 PM, Anthony Liguori anthony@codemonkey.ws wrote:
OVMF is proprietary.
I don't agree that not-OSI means proprietary.
I agree that the FAT driver is not 'free software' and I agree that is a problem for usage with free software projects, such as QEMU. This is a big deal, but unfortunately, as an Intel employee, I think I've done as much as I can to address this.
It couldn't hurt if more people that actually care about this spoke up on edk2-devel about the issue, or perhaps within a UEFI working group. Because, I know that they've stopped listening to me about it.
It is not "supported" by QEMU.
No, but I've always thought that QEMU was happy to have alternative firmware projects.
I'm not really convinced that QEMU<->firmware is a GPL boundary because of how tightly the two are linked.
Where has 'linked' in terms of the GPL ever been anything other than actual executable linking?
Moving large chunks of firmware code into QEMU just to avoid solving licensing issues is a non-starter with me.
Is this a licensing issue? I thought this was a "let's save time by doing it in one place" thing. I'm pretty ambivalent about this feature, really. I don't think it is even worth all this bickering.
I'm certain OVMF has ACPI issues on QEMU, but I don't think it is a huge deal to resolve them independently of this feature.
I was not a huge fan of supporting this type of thing for Xen in OVMF, but it does seem to work fine.
-Jordan
On Mon, Jun 10, 2013 at 01:58:41PM -0500, Anthony Liguori wrote:
"Michael S. Tsirkin" mst@redhat.com writes:
This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line.
One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully.
Since you've gone out of your way to make this difficult to actually review...
What's wrong?
* This saves the need to provide a new interface for firmware to discover bus number to pci brige mapping
Do you mean fw_cfg? The BIOS normally does bus numbering. I see no reason for it not to.
I did my best to explain this. ACPI spec explicitly states config access to devices not on bus 0 is forbidden. This is to allow OS to renumber the bus at any time. So you have pci bus number but can't use it.
* No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions:
We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions.
We do e.g. when we run with -M1.2 and migrate to old QEMU early in boot process.
we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Adding a dynamic array to SeaBIOS isn't rocket science.
Glib reimplementation in bios will be more code than using existing glib.
I have also tried to address the concerns that Anthony has raised with this approach, please see below.
Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ
This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots.
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Things to note: - this is on top of acpi patchset posted a month ago, with a small patch adding a core function to walk all pci buses, on top. Can also be found in my git tree git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi
- Extensive use of glib completely removes pointer math in new code: we use g_array_append_vals exclusively. There's no code patching in new code. This is in response to comments about security concerns with adding code to qemu. In this sense it is more secure than existing code in hw/acpi/core.c - that can be switched to glib if desired. - New code (not imported from seabios) uses glib to reduce dependency on iasl. With time all code can be rewritted to use glib and avoid iasl, if desired. - As was the case previously, systems that lack working iasl are detected at configure time, pre-generated hex files in source tree are used in this case. This addresses the concern with iasl/big-endian systems. - Compile tested only. Migration is known to be broken - there are a bunch of TODO tags in code. It was agreed on the last qemu conf meeting that this code is posted without looking at migration, with understanding that it is addressed before being merged. Please do not mistake this limitation for a fundamental one - I have a very good idea how to fix it, including cross-version migration. - Cross version migration: when running with -M 1.5 and older, all ACPI table generation should be disabled. We'll present FW_CFG interface compatible with 1.5.
So TL;DR, you break a bunch of stuff
Confused. When you asked me to write this up during last conf call, you explicitly said not to bother with migration? Almost any change affecting guest needs migration related work.
and introduce a mess of code.
What did you expect? New functionality with less code?
It would be less code and wouldn't break anything to add this logic to SeaBIOS.
It will be much more code - we would need to add a bunch of interfaces to qemu then code using that to seabios, worry about running this seabios on old qemu, etc etc.
How is this even a discussion?
Interface change is minimized.
Interface to bios is not changed at all. Interface to guests - by a single register.
Here we reuse existing fw cfg interface. That's the point.
Regards,
Anthony Liguori
Cc: Jordan Justen jljusten@gmail.com Cc: Anthony Liguori anthony@codemonkey.ws Cc: Laszlo Ersek lersek@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: seabios@seabios.org Cc: David Woodhouse dwmw2@infradead.org Cc: Kevin O'Connor kevin@koconnor.net Cc: Peter Maydell peter.maydell@linaro.org Signed-off-by: Michael S. Tsirkin mst@redhat.com
docs/specs/acpi_pci_hotplug.txt | 8 + include/hw/i386/pc.h | 4 +- hw/i386/acpi-dsdt.dsl | 36 +++- hw/i386/ssdt-pcihp.dsl | 51 ----- hw/acpi/piix4.c | 145 ++++++++++---- hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- hw/i386/Makefile.objs | 2 +- hw/i386/ssdt-pcihp.hex.generated | 108 ---------- 8 files changed, 510 insertions(+), 255 deletions(-) delete mode 100644 hw/i386/ssdt-pcihp.dsl delete mode 100644 hw/i386/ssdt-pcihp.hex.generated
diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt index a839434..951b99a 100644 --- a/docs/specs/acpi_pci_hotplug.txt +++ b/docs/specs/acpi_pci_hotplug.txt @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
Used by ACPI BIOS _RMV method to indicate removability status to OS. One bit per slot. Read-only
+PCI bus selector (IO port 0xae10-0xae13, 4-byte access): +-----------------------------------------------
+Used by ACPI BIOS methods to select the current PCI bus. +When written, makes all the other PCI registers (0xae00 - 0xae0f) +to refer to the appropriate bus. +0 selects PCI root bus (default). diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index f8d0871..66ec787 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -38,7 +38,9 @@ struct PcGuestInfo { bool s3_disabled; bool s4_disabled; uint8_t s4_val;
- DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX);
+#define GUEST_INFO_MAX_HOTPLUG_BUS 256
- PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS];
- DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); uint16_t sci_int; uint8_t acpi_enable_cmd; uint8_t acpi_disable_cmd;
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 90efce0..dc05668 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -133,11 +133,18 @@ DefinitionBlock ( B0EJ, 32, }
OperationRegion(SEL, SystemIO, 0xae10, 0x04)
Field(SEL, DWordAcc, NoLock, WriteAsZeros) {
BSEL, 32,
}
/* Methods called by bulk generated PCI devices below */
Name(BNUM, 0x00) /* Methods called by hotplug devices */
Method(PCEJ, 1, NotSerialized) {
Method(PCEJ, 2, NotSerialized) { // _EJ0 method - eject callback
Store(Arg1, BSEL) Store(ShiftLeft(1, Arg0), B0EJ) Return (0x0) }
@@ -147,16 +154,25 @@ DefinitionBlock (
/* PCI hotplug notify method */ Method(PCNF, 0) {
// Local0 = iterator
// Local0 = bridge selector Store(Zero, Local0)
While (LLess(Local0, 31)) {
Increment(Local0)
If (And(PCIU, ShiftLeft(1, Local0))) {
PCNT(Local0, 1)
}
If (And(PCID, ShiftLeft(1, Local0))) {
PCNT(Local0, 3)
}
While (LLess(Local0, 255)) {
Store(Local0, BSEL)
// Local1 = slot iterator
Store(Zero, Local1)
Store(PCIU, Local2)
Store(PCID, Local3)
While (LLess(Local1, 31)) {
Increment(Local1)
If (And(Local2, ShiftLeft(1, Local1))) {
PCNT(Add(Local1, Multiply(Local0, 32)), 1)
}
If (And(Local3, ShiftLeft(1, Local1))) {
PCNT(Add(Local1, Multiply(Local0, 32)), 3)
}
}
}Increment(Local0) } }
diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl deleted file mode 100644 index d29a5b9..0000000 --- a/hw/i386/ssdt-pcihp.dsl +++ /dev/null @@ -1,51 +0,0 @@ -/*
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- You should have received a copy of the GNU General Public License along
- with this program; if not, see http://www.gnu.org/licenses/.
- */
-ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml
-DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) -{
-/****************************************************************
- PCI hotplug
- ****************************************************************/
- /* Objects supplied by DSDT */
- External(_SB.PCI0, DeviceObj)
- External(_SB.PCI0.PCEJ, MethodObj)
- Scope(_SB.PCI0) {
/* Bulk generated PCI hotplug devices */
ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start
ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end
ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name
// Method _EJ0 can be patched by BIOS to EJ0_
// at runtime, if the slot is detected to not support hotplug.
// Extract the offset of the address dword and the
// _EJ0 name to allow this patching.
Device(SAA) {
ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id
Name(_SUN, 0xAA)
ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr
Name(_ADR, 0xAA0000)
ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0
Method(_EJ0, 1) {
Return (PCEJ(_SUN))
}
}
- }
-} diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 19a26f5..10106fd 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -29,6 +29,7 @@ #include "exec/ioport.h" #include "hw/nvram/fw_cfg.h" #include "exec/address-spaces.h" +#include "hw/pci/pci_bus.h"
//#define DEBUG
@@ -42,11 +43,12 @@ #define GPE_LEN 4
#define PCI_HOTPLUG_ADDR 0xae00 -#define PCI_HOTPLUG_SIZE 0x000f +#define PCI_HOTPLUG_SIZE 0x0014 #define PCI_UP_BASE 0xae00 #define PCI_DOWN_BASE 0xae04 #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c +#define PCI_SEL_BASE 0xae10
#define PIIX4_PROC_BASE 0xaf00 #define PIIX4_PROC_LEN 32 @@ -57,6 +59,8 @@ struct pci_status { uint32_t up; /* deprecated, maintained for migration compatibility */ uint32_t down;
- uint32_t hotplug_enable;
- uint32_t device_present;
};
typedef struct CPUStatus { @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { Notifier powerdown_notifier;
/* for pci hotplug */
- struct pci_status pci0_status;
- uint32_t pci0_hotplug_enable;
- uint32_t pci0_slot_device_present;
struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS];
uint32_t hotplug_select;
uint8_t disable_s3; uint8_t disable_s4;
@@ -98,6 +101,17 @@ typedef struct PIIX4PMState { PcGuestInfo *guest_info; } PIIX4PMState;
+static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) +{
- int i;
- for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) {
if (guest_info->hotplug_buses[i] == bus) {
return i;
}
- }
- return -1;
+}
static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s);
@@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d,
static void vmstate_pci_status_pre_save(void *opaque) { +#if 0
- /* TODO */ struct pci_status *pci0_status = opaque; PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status);
@@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) * to a version that still does... of course these might get lost * by an old buggy implementation, but we try. */ pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; +#endif }
static int vmstate_acpi_post_load(void *opaque, int version_id) @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &temp); }
+#if 0
- TODO ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1);
+#endif return ret; }
@@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), +#if 0 VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, struct pci_status), +#endif VMSTATE_END_OF_LIST() } };
-static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) { BusChild *kid, *next;
- BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
BusState *bus; int slot = ffs(slots) - 1; bool slot_free = true;
if (!s->guest_info->hotplug_buses[bsel]) {
return;
}
bus = &s->guest_info->hotplug_buses[bsel]->qbus;
/* Mark request as complete */
- s->pci0_status.down &= ~(1U << slot);
s->pci_status[bsel].down &= ~(1U << slot);
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) } } if (slot_free) {
s->pci0_slot_device_present &= ~(1U << slot);
}s->pci_status[bsel].device_present &= ~(1U << slot);
}
-static void piix4_update_hotplug(PIIX4PMState *s) +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) {
- PCIDevice *dev = &s->dev;
- BusState *bus = qdev_get_parent_bus(&dev->qdev); BusChild *kid, *next;
BusState *bus;
if (!s->guest_info->hotplug_buses[bsel]) {
return;
}
bus = &s->guest_info->hotplug_buses[bsel]->qbus; /* Execute any pending removes during reset */
- while (s->pci0_status.down) {
acpi_piix_eject_slot(s, s->pci0_status.down);
- while (s->pci_status[bsel].down) {
}acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down);
- s->pci0_hotplug_enable = ~0;
- s->pci0_slot_device_present = 0;
s->pci_status[bsel].hotplug_enable = ~0;
s->pci_status[bsel].device_present = 0;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) {
s->pci0_hotplug_enable &= ~(1U << slot);
s->pci_status[bsel].hotplug_enable &= ~(1U << slot); }
s->pci0_slot_device_present |= (1U << slot);
}s->pci_status[bsel].device_present |= (1U << slot);
}
@@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) { PIIX4PMState *s = opaque; uint8_t *pci_conf = s->dev.config;
int i;
pci_conf[0x58] = 0; pci_conf[0x59] = 0;
@@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) /* Mark SMM as already inited (until KVM supports SMM). */ pci_conf[0x5B] = 0x02; }
- piix4_update_hotplug(s);
- for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) {
piix4_update_hotplug(s, i);
- }
}
static void piix4_pm_powerdown_req(Notifier *n, void *opaque) @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) acpi_pm1_evt_power_down(&s->ar); }
-static void piix4_update_guest_info(PIIX4PMState *s) +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
PCIHotplugState state);
+static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) {
- PCIDevice *dev = &s->dev;
- BusState *bus = qdev_get_parent_bus(&dev->qdev);
- PIIX4PMState *s = opaque;
- PcGuestInfo *guest_info = s->guest_info; BusChild *kid, *next;
- BusState *bus = &pci_bus->qbus;
- memset(s->guest_info->slot_hotplug_enable, 0xff,
DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE));
int i = piix4_find_hotplug_bus(guest_info, NULL);
assert(i >= 0);
guest_info->hotplug_buses[i] = pci_bus;
QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child;
@@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn);
if (pc->no_hotplug) {
clear_bit(slot, s->guest_info->slot_hotplug_enable);
}clear_bit(slot, guest_info->hotplug_enable[i]); }
- pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev);
+}
+static void piix4_update_guest_info(PIIX4PMState *s) +{
- PCIDevice *dev = &s->dev;
- memset(s->guest_info->hotplug_enable, 0xff,
sizeof s->guest_info->hotplug_enable);
- pci_for_each_bus(dev->bus, piix4_update_bus_guest_info,
s);
}
static void piix4_pm_machine_ready(Notifier *n, void *opaque) @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) { PIIX4PMState *s = opaque; uint32_t val = 0;
int bsel = s->hotplug_select;
if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) {
return 0;
}
switch (addr) { case PCI_UP_BASE - PCI_HOTPLUG_ADDR: /* Manufacture an "up" value to cause a device check on any hotplug * slot with a device. Extra device checks are harmless. */
val = s->pci0_slot_device_present & s->pci0_hotplug_enable;
val = s->pci_status[bsel].device_present &
case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR:s->pci_status[bsel].hotplug_enable; PIIX4_DPRINTF("pci_up_read %x\n", val); break;
val = s->pci0_status.down;
case PCI_EJ_BASE - PCI_HOTPLUG_ADDR:val = s->pci_status[bsel].down; PIIX4_DPRINTF("pci_down_read %x\n", val); break;
@@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) PIIX4_DPRINTF("pci_features_read %x\n", val); break; case PCI_RMV_BASE - PCI_HOTPLUG_ADDR:
val = s->pci0_hotplug_enable;
val = s->pci_status[bsel].hotplug_enable; break;
- case PCI_SEL_BASE - PCI_HOTPLUG_ADDR:
default: break; }val = s->hotplug_select;
@@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) static void pci_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) {
- PIIX4PMState *s = opaque; switch (addr) { case PCI_EJ_BASE - PCI_HOTPLUG_ADDR:
acpi_piix_eject_slot(opaque, (uint32_t)data);
if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) {
break;
}
acpi_piix_eject_slot(s, s->hotplug_select, data); PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", addr, data); break;
- case PCI_SEL_BASE - PCI_HOTPLUG_ADDR:
default: break; }s->hotplug_select = data;
@@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) g->sts[id / 8] |= (1 << (id % 8)); }
-static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
PCIHotplugState state);
static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s) { @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCI_HOTPLUG_SIZE); memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, &s->io_pci);
- pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
- qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", PIIX4_PROC_LEN);
@@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, qemu_register_cpu_added_notifier(&s->cpu_added_notifier); }
-static void enable_device(PIIX4PMState *s, int slot) +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
- s->pci0_slot_device_present |= (1U << slot);
- s->pci_status[bsel].device_present |= (1U << slot);
}
-static void disable_device(PIIX4PMState *s, int slot) +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
- s->pci0_status.down |= (1U << slot);
- s->pci_status[bsel].down |= (1U << slot);
}
static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int slot = PCI_SLOT(dev->devfn); PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, PCI_DEVICE(qdev));
int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus);
if (bsel < 0) {
return -1;
}
/* Don't send event when device is enabled during qemu machine creation:
- it is present on boot, no hotplug event is necessary. We do send an
- event when the device is disabled later. */
if (state == PCI_COLDPLUG_ENABLED) {
s->pci0_slot_device_present |= (1U << slot);
s->pci_status[bsel].device_present |= (1U << slot); return 0;
}
if (state == PCI_HOTPLUG_ENABLED) {
enable_device(s, slot);
} else {enable_device(s, bsel, slot);
disable_device(s, slot);
disable_device(s, bsel, slot);
}
pm_update_sci(s);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 72606a8..43e4988 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -34,6 +34,7 @@ #include "hw/acpi/acpi.h" #include "hw/nvram/fw_cfg.h" #include "hw/i386/bios-linker-loader.h" +#include "hw/pci/pci_bus.h"
#define ACPI_BUILD_APPNAME "Bochs" #define ACPI_BUILD_APPNAME6 "BOCHS " @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
-/* 0x5B 0x82 DeviceOp PkgLength NameString */ -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
#define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ #define ACPI_SSDT_HEADER_LENGTH 36
#include "hw/i386/ssdt-misc.hex" -#include "hw/i386/ssdt-pcihp.hex"
+static inline GArray *build_alloc_array(void) +{
return g_array_new(false, true /* clear */, 1);
+}
+static inline void build_free_array(GArray *array) +{
g_array_free(array, true);
+}
+static inline void build_prepend_byte(GArray *array, uint8_t val) +{
- g_array_prepend_val(array, val);
+}
+static inline void build_append_byte(GArray *array, uint8_t val) +{
- g_array_append_val(array, val);
+}
+static inline void build_prepend_array(GArray *array, GArray *val) +{
- g_array_prepend_vals(array, val->data, val->len);
+}
+static inline void build_append_array(GArray *array, GArray *val) +{
- g_array_append_vals(array, val->data, val->len);
+}
+static void build_append_nameseg(GArray *array, const char *format, ...) +{
- GString *s = g_string_new("");
- va_list args;
- va_start(args, format);
- g_string_vprintf(s, format, args);
- va_end(args);
- assert(s->len == 4);
- g_array_append_vals(array, s->str, s->len);
- g_string_free(s, true);
+}
+static void build_prepend_nameseg(GArray *array, const char *format, ...) +{
- GString *s = g_string_new("");
- va_list args;
- va_start(args, format);
- g_string_vprintf(s, format, args);
- va_end(args);
- assert(s->len == 4);
- g_array_prepend_vals(array, s->str, s->len);
- g_string_free(s, true);
+}
+static void build_prepend_devfn(GArray *name, uint8_t devfn) +{
- build_prepend_nameseg(name, "S%0.02x_", devfn);
+}
+static void build_append_devfn(GArray *name, uint8_t devfn) +{
- GArray *array = build_alloc_array();
- build_prepend_devfn(array, devfn);
- build_append_array(name, array);
- build_free_array(array);
+}
+static void build_prepend_name_string(GArray *name, PCIBus *bus) +{
- int segcount;
- for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) {
build_prepend_devfn(name, bus->parent_dev->devfn);
- }
- g_array_prepend_vals(name, "PCI0", 4);
- g_array_prepend_vals(name, "_SB_", 4);
- /* Name segment length is a multiple of 4 */
- assert(name->len % 4 == 0);
- segcount = name->len / 4;
- build_prepend_byte(name, segcount);
- build_prepend_byte(name, 0x2f); /* MultiNamePrefix */
- build_prepend_byte(name, '\'); /* RootChar */
+}
+static void build_append_name_string(GArray *name, PCIBus *bus) +{
- GArray *append = build_alloc_array();
- build_prepend_name_string(append, bus);
- build_append_array(name, append);
- build_free_array(append);
+}
+/* 5.4 Definition Block Encoding */ +enum {
- PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */
- PACKAGE_LENGTH_2BYTE_SHIFT = 4,
- PACKAGE_LENGTH_3BYTE_SHIFT = 12,
- PACKAGE_LENGTH_4BYTE_SHIFT = 20,
+};
+static void build_prepend_package_length(GArray *package) +{
- unsigned bytes;
- uint8_t byte;
- unsigned package_len = package->len;
- unsigned length = package_len;
- if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) {
byte = length;
build_prepend_byte(package, byte);
return;
- }
- if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1;
- }
- if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1;
- }
- if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) {
byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT;
build_prepend_byte(package, byte);
length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1;
- }
- bytes = package->len - package_len;
- byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length;
- build_prepend_byte(package, byte);
+}
+static void build_package(GArray *package, uint8_t op) +{
- build_prepend_package_length(package);
- build_prepend_byte(package, op);
+}
+static void build_extop_package(GArray *package, uint8_t op) +{
- build_package(package, op);
- build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
+}
+static void build_append_notify(GArray *method, GArray *target_name,
uint16_t value)
+{
- GArray *notify = build_alloc_array();
- uint8_t op = 0xA0; /* IfOp */
- build_append_byte(notify, 0x93); /* LEqualOp */
- build_append_byte(notify, 0x68); /* Arg0Op */
- build_append_byte(notify, 0x0A); /* BytePrefix */
- build_append_byte(notify, value);
- build_append_byte(notify, value >> 8);
- build_append_byte(notify, 0x86); /* NotifyOp */
- build_append_array(notify, target_name);
- build_append_byte(notify, 0x69); /* Arg1Op */
- /* Pack it up */
- build_package(notify, op);
- build_append_array(method, notify);
- build_free_array(notify);
+}
+static void +build_append_notify_slots(GArray *method, PCIBus *bus,
unsigned bus_select,
DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX))
+{
- int i;
- if (!bus) {
return;
- }
- for (i = 0; i < 32; i++) {
GArray *target;
if (!test_bit(i, hotplug_enable)) {
continue;
}
target = build_alloc_array();
build_prepend_devfn(target, i);
build_prepend_name_string(target, bus);
/* *32 matches Multiply in PCNF */
build_append_notify(method, target, bus_select * 32 + i);
build_free_array(target);
- }
+}
+static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) +{
- int i;
- GArray *method = build_alloc_array();
- uint8_t op = 0x14; /* MethodOp */
- build_append_nameseg(method, "PCNT");
- build_append_byte(method, 0x02); /* MethodFlags: ArgCount */
- for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) {
build_append_notify_slots(method, guest_info->hotplug_buses[i], i,
guest_info->hotplug_enable[i]);
- }
- build_package(method, op);
- build_append_array(table, method);
- build_free_array(method);
+}
+static void build_append_device(GArray *table, unsigned devfn) +{
- uint8_t op, ej0_op;
- GArray *device = build_alloc_array();
- GArray *ej0 = build_alloc_array();
- op = 0x82; /* DeviceOp */
- build_append_devfn(device, devfn);
- build_append_byte(device, 0x08); /* NameOp */
- build_append_nameseg(device, "_SUN");
- build_append_byte(device, 0x0A); /* BytePrefix */
- build_append_byte(device, PCI_SLOT(devfn));
- build_append_byte(device, 0x08); /* NameOp */
- build_append_nameseg(device, "_ADR");
- build_append_byte(device, 0x0C); /* DWordPrefix */
- build_append_byte(device, PCI_FUNC(devfn));
- build_append_byte(device, 0x00);
- build_append_byte(device, PCI_SLOT(devfn));
- build_append_byte(device, 0x00);
- ej0_op = 0x14; /* MethodOp */
- build_append_nameseg(ej0, "_EJ0");
- build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */
- build_append_byte(ej0, 0xA4); /* ReturnOp */
- build_append_nameseg(ej0, "PCEJ");
- build_append_nameseg(ej0, "_SUN");
- build_append_nameseg(ej0, "BNUM");
- build_package(ej0, ej0_op);
- build_append_array(device, ej0);
- build_extop_package(device, op);
- build_append_array(table, device);
- build_free_array(device);
- build_free_array(ej0);
+}
+static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS],
PCIBus *bus, unsigned devfn)
+{
- int i;
- for (i = 0; i < ARRAY_SIZE(buses); ++i) {
if (buses[i] && buses[i]->parent_dev &&
buses[i]->parent_dev->bus == bus &&
buses[i]->parent_dev->devfn == devfn) {
return true;
}
- }
- return false;
+}
+static void +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) +{
- uint8_t op;
- int slot;
- unsigned devfn;
- GArray *device;
- PCIBus *bus = guest_info->hotplug_buses[bnum];
- if (!bus) {
return;
- }
- device = build_alloc_array();
- /* Root device described in DSDT */
- if (bnum) {
devfn = bus->parent_dev->devfn;
op = 0x82; /* DeviceOp */
build_append_name_string(device, bus);
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "_SUN");
build_append_byte(device, 0x0A); /* BytePrefix */
build_append_byte(device, PCI_SLOT(devfn));
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "BNUM");
build_append_byte(device, 0x0A); /* BytePrefix */
build_append_byte(device, bnum);
build_append_byte(device, 0x08); /* NameOp */
build_append_nameseg(device, "_ADR");
build_append_byte(device, 0x0C); /* DWordPrefix */
build_append_byte(device, PCI_FUNC(devfn));
build_append_byte(device, 0x00);
build_append_byte(device, PCI_SLOT(devfn));
build_append_byte(device, 0x00);
- } else {
op = 0x10; /* ScopeOp */
build_append_nameseg(device, "PCI0");
- }
- for (slot = 0; slot < PCI_SLOT_MAX; ++slot) {
if (!test_bit(slot, guest_info->hotplug_enable[bnum])) {
continue;
}
/* Bus device? We'll add it later */
if (build_find_device_bus(guest_info->hotplug_buses,
bus, PCI_DEVFN(slot, 0))) {
continue;
}
build_append_device(device, PCI_DEVFN(slot, 0));
- }
- if (bnum) {
build_extop_package(device, op);
- } else {
build_package(device, op);
- }
- build_append_array(table, device);
- build_free_array(device);
+}
+static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) +{
- int i;
- for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) {
build_append_bus(table, i, guest_info);
- }
+}
static uint8_t* build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count,
const char *target, int ofs)
const char *target, int ofs, int step)
{ int i;
@@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, *(ssdt_ptr++) = i; *(ssdt_ptr++) = 0x86; /* NotifyOp */ memcpy(ssdt_ptr, target, 4);
ssdt_ptr[ofs] = acpi_get_hex(i >> 4);
ssdt_ptr[ofs + 1] = acpi_get_hex(i);
ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4);
} return ssdt_ptr;ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); ssdt_ptr += 4; *(ssdt_ptr++) = 0x69; /* Arg1Op */
}
-static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) -{
- ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4);
- ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot);
- ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot;
- ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot;
- /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot,
* replace the method name: _EJ0 by ACPI_EJ0_. */
- /* Sanity check */
- assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4));
- if (!eject) {
memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4);
- }
-}
static void build_ssdt(GArray *table_data, GArray *linker, FWCfgState *fw_cfg, PcGuestInfo *guest_info) @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ + (1+2+5+(12*acpi_cpus)) /* NTFY */ + (6+2+1+(1*acpi_cpus)) /* CPON */
+ (1+3+4) /* Scope(PCI0) */
+ ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */
+ (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */
- uint8_t *ssdt = acpi_data_push(table_data, length);
- uint8_t *ssdt_ptr = ssdt;
);
uint8_t *ssdt;
uint8_t *ssdt_ptr;
GArray *devices = build_alloc_array();
GArray *notify = build_alloc_array();
/* TODO: rewrite this function using glib */
build_append_device_buses(devices, guest_info);
length += devices->len;
build_append_notify_buses(notify, guest_info);
length += notify->len;
ssdt = acpi_data_push(table_data, length);
ssdt_ptr = ssdt;
/* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml));
@@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker,
/* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ /* Arg0 = Processor ID = APIC ID */
- ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2);
ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1);
/* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ *(ssdt_ptr++) = 0x08; /* NameOp */
@@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, *(ssdt_ptr++) = 'I'; *(ssdt_ptr++) = '0';
- /* build Device object for each slot */
- for (i = 1; i < PCI_SLOT_MAX; i++) {
bool eject = test_bit(i, guest_info->slot_hotplug_enable);
memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF);
patch_pcihp(i, ssdt_ptr, eject);
ssdt_ptr += ACPI_PCIHP_SIZEOF;
- }
- ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1);
memcpy(ssdt_ptr, devices->data, devices->len);
ssdt_ptr += devices->len;
memcpy(ssdt_ptr, notify->data, notify->len);
ssdt_ptr += notify->len;
build_header(linker, table_data, (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 2ab2572..c2e74e9 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o obj-y += kvmvapic.o obj-y += acpi-build.o obj-y += bios-linker-loader.o -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex
diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated deleted file mode 100644 index 0d32a27..0000000 --- a/hw/i386/ssdt-pcihp.hex.generated +++ /dev/null @@ -1,108 +0,0 @@ -static unsigned char ssdt_pcihp_name[] = { -0x33 -}; -static unsigned char ssdt_pcihp_adr[] = { -0x44 -}; -static unsigned char ssdt_pcihp_end[] = { -0x58 -}; -static unsigned char ssdp_pcihp_aml[] = { -0x53, -0x53, -0x44, -0x54, -0x58, -0x0, -0x0, -0x0, -0x1, -0x77, -0x42, -0x58, -0x50, -0x43, -0x0, -0x0, -0x42, -0x58, -0x53, -0x53, -0x44, -0x54, -0x50, -0x43, -0x1, -0x0, -0x0, -0x0, -0x49, -0x4e, -0x54, -0x4c, -0x28, -0x5, -0x10, -0x20, -0x10, -0x33, -0x5c, -0x2e, -0x5f, -0x53, -0x42, -0x5f, -0x50, -0x43, -0x49, -0x30, -0x5b, -0x82, -0x26, -0x53, -0x41, -0x41, -0x5f, -0x8, -0x5f, -0x53, -0x55, -0x4e, -0xa, -0xaa, -0x8, -0x5f, -0x41, -0x44, -0x52, -0xc, -0x0, -0x0, -0xaa, -0x0, -0x14, -0xf, -0x5f, -0x45, -0x4a, -0x30, -0x1, -0xa4, -0x50, -0x43, -0x45, -0x4a, -0x5f, -0x53, -0x55, -0x4e -}; -static unsigned char ssdt_pcihp_start[] = { -0x30 -}; -static unsigned char ssdt_pcihp_id[] = { -0x3d -}; -static unsigned char ssdt_pcihp_ej0[] = { -0x4a
-};
MST
Hi,
* Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Nice.
Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ
This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots.
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Hmm. Is is possible to write a SHPC driver in AML? I think it would be alot better to have one guest/host interface for pci bridge hotplug instead of two.
cheers, Gerd
On Tue, Jun 11, 2013 at 07:33:02AM +0200, Gerd Hoffmann wrote:
Hi,
* Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching
Nice.
Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ
This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots.
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Hmm. Is is possible to write a SHPC driver in AML? I think it would be alot better to have one guest/host interface for pci bridge hotplug instead of two.
cheers, Gerd
No, it's not possible, SHPC is not designed to be used from ACPI.
Two reasons off the top of my head, there are likely others:
1. SHPC uses regular PCI interrupts to signal events. It does not signal GFE and SCI.
2. SHPC uses config accesses to get information from device. ACPI does not allow config access anywhere except the root bus from ACPI (This requirement is designed to give the OS freedom to reconfigure PCI in an arbitrary way).
Hi,
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Hmm. Is is possible to write a SHPC driver in AML? I think it would be alot better to have one guest/host interface for pci bridge hotplug instead of two.
cheers, Gerd
No, it's not possible, SHPC is not designed to be used from ACPI.
Two reasons off the top of my head, there are likely others:
SHPC uses regular PCI interrupts to signal events. It does not signal GFE and SCI.
SHPC uses config accesses to get information from device. ACPI does not allow config access anywhere except the root bus from ACPI (This requirement is designed to give the OS freedom to reconfigure PCI in an arbitrary way).
OK, so it's designed for OSes to have native SHPC support. Linux has that?
Quick googling found me Windows Vista+ has it too, correct? So that leaves Win2k + WinXP versions. Older Windows versions do not support pci hotplug at all. Win2k is EOL already. WinXP will follow soon.
More users?
/me wonders whenever it is worth hopping through the loops needed to support ACPI-based hotplug of devices behind bridges in the first place.
cheers, Gerd
On Tue, Jun 11, 2013 at 09:42:29AM +0200, Gerd Hoffmann wrote:
Hi,
Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did.
Hmm. Is is possible to write a SHPC driver in AML? I think it would be alot better to have one guest/host interface for pci bridge hotplug instead of two.
cheers, Gerd
No, it's not possible, SHPC is not designed to be used from ACPI.
Two reasons off the top of my head, there are likely others:
SHPC uses regular PCI interrupts to signal events. It does not signal GFE and SCI.
SHPC uses config accesses to get information from device. ACPI does not allow config access anywhere except the root bus from ACPI (This requirement is designed to give the OS freedom to reconfigure PCI in an arbitrary way).
OK, so it's designed for OSes to have native SHPC support. Linux has that?
Yes.
Quick googling found me Windows Vista+ has it too, correct? So that leaves Win2k + WinXP versions. Older Windows versions do not support pci hotplug at all. Win2k is EOL already. WinXP will follow soon.
More users?
googling lead you astray. No windows version supports SHPC.
/me wonders whenever it is worth hopping through the loops needed to support ACPI-based hotplug of devices behind bridges in the first place.
cheers, Gerd
Yes and not just because of windows guests. ACPI spec is also very explicit that native hotplug is an optional feature. Test suites such as WHQL are known to test spec compliance.
Hi,
Yes and not just because of windows guests. ACPI spec is also very explicit that native hotplug is an optional feature. Test suites such as WHQL are known to test spec compliance.
/me looks a bit surprised.
This pretty much implies that any shpc bridge needs a second interface to the hotplug functionality which can be driven via ACPI. Or the firmware somehow handles this using smm (not sure this works for the IRQ though).
Do you know how this is handled by real hardware?
cheers, Gerd
On Tue, Jun 11, 2013 at 10:00:36AM +0200, Gerd Hoffmann wrote:
Hi,
Yes and not just because of windows guests. ACPI spec is also very explicit that native hotplug is an optional feature. Test suites such as WHQL are known to test spec compliance.
/me looks a bit surprised.
This pretty much implies that any shpc bridge needs a second interface to the hotplug functionality which can be driven via ACPI. Or the firmware somehow handles this using smm (not sure this works for the IRQ though).
Do you know how this is handled by real hardware?
cheers, Gerd
SHPC is not very widely deployed on a PC.
Since most hardware vendors do care about windows support, the only way is a separate interface that is driven via ACPI. You then need an ACPI specific register to switch to standard SHPC. The SHPC spec even tells you as much.
On Tue, Jun 11, 2013 at 11:18:13AM +0300, Michael S. Tsirkin wrote:
On Tue, Jun 11, 2013 at 10:00:36AM +0200, Gerd Hoffmann wrote:
Hi,
Yes and not just because of windows guests. ACPI spec is also very explicit that native hotplug is an optional feature. Test suites such as WHQL are known to test spec compliance.
/me looks a bit surprised.
This pretty much implies that any shpc bridge needs a second interface to the hotplug functionality which can be driven via ACPI. Or the firmware somehow handles this using smm (not sure this works for the IRQ though).
Do you know how this is handled by real hardware?
cheers, Gerd
SHPC is not very widely deployed on a PC.
Since most hardware vendors do care about windows support, the only way is a separate interface that is driven via ACPI. You then need an ACPI specific register to switch to standard SHPC. The SHPC spec even tells you as much.
Just to give another example, interface in this patch scales trivially to multi-root configurations, if we ever want to support them.
-- MST