PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Signed-off-by: Michael S. Tsirkin mst@redhat.com Reported-by: Gal Hammer ghammer@redhat.com Reviewed-by: Marcel Apfelbaum marcel.a@redhat.com Tested-by: Marcel Apfelbaum marcel.a@redhat.com --- src/hw/pci.h | 1 + src/util.h | 1 + src/fw/pciinit.c | 5 +++++ src/hw/pci.c | 31 ++++++++++++++++++++----------- src/resume.c | 8 ++++++++ 5 files changed, 35 insertions(+), 11 deletions(-)
diff --git a/src/hw/pci.h b/src/hw/pci.h index 9c7351d..a64f7c5 100644 --- a/src/hw/pci.h +++ b/src/hw/pci.h @@ -66,6 +66,7 @@ extern u64 pcimem64_start, pcimem64_end; extern struct hlist_head PCIDevices; extern int MaxPCIBus; int pci_probe_host(void); +void pci_probe_device(int bdf, struct pci_device *dev); void pci_probe_devices(void); static inline u32 pci_classprog(struct pci_device *pci) { return (pci->class << 8) | pci->prog_if; diff --git a/src/util.h b/src/util.h index e6a6cb5..3a24dba 100644 --- a/src/util.h +++ b/src/util.h @@ -28,6 +28,7 @@ void boot_add_cbfs(void *data, const char *desc, int prio); void interactive_bootmenu(void); void bcv_prepboot(void); struct pci_device; +void pci_bios_resume_device(struct pci_device *pci); int bootprio_find_pci_device(struct pci_device *pci); int bootprio_find_scsi_device(struct pci_device *pci, int target, int lun); int bootprio_find_ata_device(struct pci_device *pci, int chanid, int slave); diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c index 34279a4..a4cedfb 100644 --- a/src/fw/pciinit.c +++ b/src/fw/pciinit.c @@ -293,6 +293,11 @@ static const struct pci_device_id pci_device_tbl[] = { PCI_DEVICE_END, };
+void pci_bios_resume_device(struct pci_device *pci) +{ + pci_init_device(pci_device_tbl, pci, NULL); +} + static void pci_bios_init_device(struct pci_device *pci) { u16 bdf = pci->bdf; diff --git a/src/hw/pci.c b/src/hw/pci.c index 6c9aa81..c2873c3 100644 --- a/src/hw/pci.c +++ b/src/hw/pci.c @@ -105,6 +105,24 @@ pci_probe_host(void) return 0; }
+void +pci_probe_device(int bdf, struct pci_device *dev) +{ + dev->bdf = bdf; + u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); + dev->vendor = vendev & 0xffff; + dev->device = vendev >> 16; + u32 classrev = pci_config_readl(bdf, PCI_CLASS_REVISION); + dev->class = classrev >> 16; + dev->prog_if = classrev >> 8; + dev->revision = classrev & 0xff; + dev->header_type = pci_config_readb(bdf, PCI_HEADER_TYPE); + u8 v = dev->header_type & 0x7f; + if (v == PCI_HEADER_TYPE_BRIDGE || v == PCI_HEADER_TYPE_CARDBUS) { + u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS); + dev->secondary_bus = secbus; + } +} // Find all PCI devices and populate PCIDevices linked list. void pci_probe_devices(void) @@ -145,21 +163,12 @@ pci_probe_devices(void) }
// Populate pci_device info. - dev->bdf = bdf; + pci_probe_device(bdf, dev); dev->parent = parent; dev->rootbus = rootbus; - u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); - dev->vendor = vendev & 0xffff; - dev->device = vendev >> 16; - u32 classrev = pci_config_readl(bdf, PCI_CLASS_REVISION); - dev->class = classrev >> 16; - dev->prog_if = classrev >> 8; - dev->revision = classrev & 0xff; - dev->header_type = pci_config_readb(bdf, PCI_HEADER_TYPE); u8 v = dev->header_type & 0x7f; if (v == PCI_HEADER_TYPE_BRIDGE || v == PCI_HEADER_TYPE_CARDBUS) { - u8 secbus = pci_config_readb(bdf, PCI_SECONDARY_BUS); - dev->secondary_bus = secbus; + u8 secbus = dev->secondary_bus; if (secbus > bus && !busdevs[secbus]) busdevs[secbus] = dev; if (secbus > MaxPCIBus) diff --git a/src/resume.c b/src/resume.c index fc2fee9..9aac853 100644 --- a/src/resume.c +++ b/src/resume.c @@ -101,6 +101,14 @@ s3_resume(void) pic_setup(); smm_setup();
+ int bdf; + foreachbdf(bdf, 0) { + // Create new pci_device struct and add to list. + struct pci_device pci; + pci_probe_device(bdf, &pci); + pci_bios_resume_device(&pci); + } + s3_resume_vga();
make_bios_readonly();
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.) That means either delaying the release, or implementing this change as part of a future stable release.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01); ... return; } } }
-Kevin
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
... return; } }
}
-Kevin
On Thu, Dec 19, 2013 at 08:40:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
If I recall correctly, the firmware is not required to restore device state. But, agreed, if the firmware restored some devices from the OS state and initialized other devices and that caused a conflict, then that would not be good.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
On CSM and coreboot, it's the responsibility of UEFI/coreboot to do pci init. Performing the qemu specific firmware init on real hardware would likely cause problems. The code is specific to qemu, so it should be wrapped in a "if (CONFIG_QEMU) ...".
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
If ultimately we need a significant change in behavior, then I'd lean torwards going forward with the current release and then make a follow up stable release for this change once it is ready.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
But, could the resume code be targetted to just PCI_DEVICE_ID_INTEL_82371AB_3, or do you think an init needs to be done to all/many PCI devices?
pci_config_writel(bdf, 0x40, PORT_ACPI_PM_BASE | 1); pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */ pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1); pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
-Kevin
On 12/19/13 20:39, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 08:40:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
If I recall correctly, the firmware is not required to restore device state. But, agreed, if the firmware restored some devices from the OS state and initialized other devices and that caused a conflict, then that would not be good.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
On CSM and coreboot, it's the responsibility of UEFI/coreboot to do pci init. Performing the qemu specific firmware init on real hardware would likely cause problems. The code is specific to qemu, so it should be wrapped in a "if (CONFIG_QEMU) ...".
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
If ultimately we need a significant change in behavior, then I'd lean torwards going forward with the current release and then make a follow up stable release for this change once it is ready.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
But, could the resume code be targetted to just PCI_DEVICE_ID_INTEL_82371AB_3, or do you think an init needs to be done to all/many PCI devices?
pci_config_writel(bdf, 0x40, PORT_ACPI_PM_BASE | 1); pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */ pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1); pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
(Sorry for butting in.)
Currently OVMF cares about the former only, and it seems to suffice.
Thanks, Laszlo
On Thu, Dec 19, 2013 at 09:52:39PM +0100, Laszlo Ersek wrote:
On 12/19/13 20:39, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 08:40:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
If I recall correctly, the firmware is not required to restore device state. But, agreed, if the firmware restored some devices from the OS state and initialized other devices and that caused a conflict, then that would not be good.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
On CSM and coreboot, it's the responsibility of UEFI/coreboot to do pci init. Performing the qemu specific firmware init on real hardware would likely cause problems. The code is specific to qemu, so it should be wrapped in a "if (CONFIG_QEMU) ...".
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
If ultimately we need a significant change in behavior, then I'd lean torwards going forward with the current release and then make a follow up stable release for this change once it is ready.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
But, could the resume code be targetted to just PCI_DEVICE_ID_INTEL_82371AB_3, or do you think an init needs to be done to all/many PCI devices?
pci_config_writel(bdf, 0x40, PORT_ACPI_PM_BASE | 1); pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */ pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1); pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
(Sorry for butting in.)
Currently OVMF cares about the former only, and it seems to suffice.
Thanks, Laszlo
One useful test to try is to dump full 256 byte device config before and after resume. It should match for all devices. I'll look into this next week if no one beats me to it.
On Fri, 2013-12-20 at 00:18 +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 09:52:39PM +0100, Laszlo Ersek wrote:
On 12/19/13 20:39, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 08:40:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
If I recall correctly, the firmware is not required to restore device state. But, agreed, if the firmware restored some devices from the OS state and initialized other devices and that caused a conflict, then that would not be good.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
On CSM and coreboot, it's the responsibility of UEFI/coreboot to do pci init. Performing the qemu specific firmware init on real hardware would likely cause problems. The code is specific to qemu, so it should be wrapped in a "if (CONFIG_QEMU) ...".
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
If ultimately we need a significant change in behavior, then I'd lean torwards going forward with the current release and then make a follow up stable release for this change once it is ready.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
But, could the resume code be targetted to just PCI_DEVICE_ID_INTEL_82371AB_3, or do you think an init needs to be done to all/many PCI devices?
pci_config_writel(bdf, 0x40, PORT_ACPI_PM_BASE | 1); pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */ pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1); pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
(Sorry for butting in.)
Currently OVMF cares about the former only, and it seems to suffice.
Thanks, Laszlo
One useful test to try is to dump full 256 byte device config before and after resume. It should match for all devices. I'll look into this next week if no one beats me to it.
I used a patch that modifies only the pm config registers, otherwise the os doesn't load and the pci configuration is not restored...
Before resume:
------------------------------------------------------ 0:0.0 0x00h: 0x12378086 0x00000103 0x06000002 0x00000000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 [...] 0x50h: 0x00000000 0xff000000 0x11111000 0x33111111 [...] 0x70h: 0x00020000 0x00000000 0x00000000 0x00000000 [...]
0:1.0 0x00h: 0x70008086 0x02000103 0x06010000 0x00800000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 [...] 0x40h: 0x00000000 0x00000000 0x00000000 0x0003004d [...] 0x60h: 0x0b0a0a8a 0x00000000 0x00000200 0x00000000 0x70h: 0x00000080 0x0c0c0000 0x00000002 0x00000000 0x80h: 0x00020000 0x00000000 0x00000000 0x00000000 [...] 0xa0h: 0x00000008 0x00000000 0x0000000f 0x00000000 [...]
0:1.1 0x00h: 0x70108086 0x02800107 0x01018000 0x00000000 [...] 0x20h: 0x0000c0a1 0x00000000 0x00000000 0x11001af4 [...] 0x40h: 0xe303c000 0x00000000 0x00000000 0x00000000 [...]
0:1.3 0x00h: 0x71138086 0x02800103 0x06800003 0x00000000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0x00000000 0x00000000 0x00000000 0x00000109 0x40h: 0x0000b001 0x00000000 0x00000000 0x00000000 0x50h: 0x00000000 0x00000000 0x02000000 0x90000000 0x60h: 0x60000000 0x98000000 0x00000000 0x00000000 [...] 0x80h: 0x00000001 0x00000000 0x00000000 0x00000000 0x90h: 0x0000b101 0x00000000 0x00000000 0x00000000 [...] 0xd0h: 0x00090000 0x00000000 0x00000000 0x00000000 [...]
0:2.0 0x00h: 0x01001b36 0x00000103 0x03000004 0x00000000 0x10h: 0xf4000000 0xf8000000 0xfc070000 0x0000c081 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0xfc060000 0x00000000 0x00000000 0x0000010a [...]
0:3.0 0x00h: 0x100e8086 0x00000107 0x02000003 0x00000000 0x10h: 0xfc040000 0x0000c001 0x00000000 0x00000000 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0xfc000000 0x00000000 0x00000000 0x0000010b [...]
0:4.0 0x00h: 0x10011af4 0x00100507 0x01000000 0x00000000 0x10h: 0x0000c041 0xfc072000 0x00000000 0x00000000 0x20h: 0x00000000 0x00000000 0x00000000 0x00021af4 0x30h: 0x00000000 0x00000040 0x00000000 0x0000010b 0x40h: 0x80010011 0x00000001 0x00000801 0x00000000 [...]
------------------------------------------------------------
After resume: 0:0.0 0x00h: 0x12378086 0x00000103 0x06000002 0x00000000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 [...] 0x50h: 0x00000000 0xff000000 0x11111000 0x11111111 [...] 0x70h: 0x00020000 0x00000000 0x00000000 0x00000000 [...]
0:1.0 0x00h: 0x70008086 0x02000103 0x06010000 0x00800000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 [...] 0x40h: 0x00000000 0x00000000 0x00000000 0x0003004d [...] 0x60h: 0x0b0a0a80 0x00000000 0x00000200 0x00000000 0x70h: 0x00000080 0x0c0c0000 0x00000002 0x00000000 0x80h: 0x00020000 0x00000000 0x00000000 0x00000000 [...] 0xa0h: 0x00000008 0x00000000 0x0000000f 0x00000000 [...]
0:1.1 0x00h: 0x70108086 0x02800107 0x01018000 0x00000000 [...] 0x20h: 0x0000c0a1 0x00000000 0x00000000 0x11001af4 [...] 0x40h: 0xe303c000 0x00000000 0x00000000 0x00000000 [...]
0:1.3 0x00h: 0x71138086 0x02800103 0x06800003 0x00000000 [...] 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0x00000000 0x00000000 0x00000000 0x00000109 0x40h: 0x0000b001 0x00000000 0x00000000 0x00000000 0x50h: 0x00000000 0x00000000 0x02000000 0x90000000 0x60h: 0x60000000 0x98000000 0x00000000 0x00000000 [...] 0x80h: 0x00000001 0x00000000 0x00000000 0x00000000 0x90h: 0x0000b101 0x00000000 0x00000000 0x00000000 [...] 0xd0h: 0x00090000 0x00000000 0x00000000 0x00000000 [...]
0:2.0 0x00h: 0x01001b36 0x00000103 0x03000004 0x00000000 0x10h: 0xf4000000 0xf8000000 0xfc070000 0x0000c081 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0xfc060000 0x00000000 0x00000000 0x0000010a [...]
0:3.0 0x00h: 0x100e8086 0x00000107 0x02000003 0x00000000 0x10h: 0xfc040000 0x0000c001 0x00000000 0x00000000 0x20h: 0x00000000 0x00000000 0x00000000 0x11001af4 0x30h: 0xfc000000 0x00000000 0x00000000 0x0000010b [...]
0:4.0 0x00h: 0x10011af4 0x00100507 0x01000000 0x00000000 0x10h: 0x0000c041 0xfc072000 0x00000000 0x00000000 0x20h: 0x00000000 0x00000000 0x00000000 0x00021af4 0x30h: 0x00000000 0x00000040 0x00000000 0x0000010b 0x40h: 0x80010011 0x00000001 0x00000801 0x00000000 [...]
---------------------------------------------------------------
We have 2 differences (besides the obvious pm config register) 1. 0:0.0 0x50h: 0x00000000 0xff000000 0x11111000 0x33111111 => 0x50h: 0x00000000 0xff000000 0x11111000 0x11111111 Analysis: - From Intel 440FX spec: - PAM0 (59h) - PAM6 (5Fh): - Default value: 0x0 - PAM0[3:0] - reserved, PAM0[7:4] points to BIOS Area attribute bits - Before resume, BIOS Area had RW attribute - After resume, BIOS Area had RE attribute Outcome: I am not an expert on PAM, but it seems that the value after resume (RE) is ok for BIOS region, no idea why before it was RW. So everything OK here.
2. 0:1.0 0x60h: 0x0b0a0a8a 0x00000000 0x00000200 0x00000000 => 0x60h: 0x0b0a0a80 0x00000000 0x00000200 0x00000000 Analysis: - From PIIX3 spec: - 60h (PIRQRCA#)—63h (PIRQRCD#): PIRQx ROUTE CONTROL REGISTERS - Default value: 0x80 - Bit 7 Interrupt Routing Enable. 0=Enable; 1=Disable - The value is different, 8a -> 80, but bit 7 is set, so it remains disabled - After resume it receives the default value, but before this is dirty Outcome: Everything OK here
Thanks, Marcel
On Thu, Dec 19, 2013 at 02:39:45PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 08:40:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 12:04:25PM -0500, Kevin O'Connor wrote:
On Thu, Dec 19, 2013 at 01:49:02PM +0200, Michael S. Tsirkin wrote:
PIIX spec says that most its registers are reset on suspend. However, QEMU didn't reset most of them properly. Now that it started doing that, resume is broken. Fix it up by probing system devices on bus 0 and re-initializing them.
Thanks. This patch runs part of the "pciinit" code and not all of it. What's the reasoning behind which code to run and which code not to run? Re-running parts of pciinit on resume is a change in behaviour and I'm concerned about unintended consequences. Is this definitely what should be setup for PCI on resume - nothing more should be done and nothing less?
I'm not sure about nothing less. It is possible that there are some registers here that we don't need to touch.
Nothing more, I am pretty sure.
We must not touch anything that OS might have changed, or there will be a conflict. This is why we don't touch interrupt registers and BARs - OS can change these and it must put them back exactly the way they were, if we enable some BARs there will be a conflict.
If I recall correctly, the firmware is not required to restore device state. But, agreed, if the firmware restored some devices from the OS state and initialized other devices and that caused a conflict, then that would not be good.
Also, we were just about to release SeaBIOS v1.7.4. I don't think this behaviour change would be good to add so late in the release cycle. (As a side note, the patch would not be correct for coreboot and CSM users.)
Interesting. Why won't it?
On CSM and coreboot, it's the responsibility of UEFI/coreboot to do pci init. Performing the qemu specific firmware init on real hardware would likely cause problems. The code is specific to qemu, so it should be wrapped in a "if (CONFIG_QEMU) ...".
I am not sure it's qemu specific. The hardware spec states explicitly that all PIIX registers are reset on suspend, and OS does not seem to restore them, so it seems that we have to. At this point it looks like the fact things worked without this is due to qemu specific behaviour of not reseting some registers on suspend.
That means either delaying the release, or implementing this change as part of a future stable release.
You decide :)
If ultimately we need a significant change in behavior, then I'd lean torwards going forward with the current release and then make a follow up stable release for this change once it is ready.
Finally, if the issue is just on the PIIX PM device, would a more targeted change be more appropriate? For example, something like:
void qemu_piix_resume_fixup(void) { if (!CONFIG_QEMU) return; int bdf; foreachbdf(bdf, 0) { u32 vendev = pci_config_readl(bdf, PCI_VENDOR_ID); u16 vendor = vendev & 0xffff, device = vendev >> 16; if (vendor == PCI_VENDOR_ID_INTEL && device == PCI_DEVICE_ID_INTEL_82371AB_3) { pci_config_writeb(bdf, 0x80, 0x01);
IMO this is definitely not enough. E.g. I'm pretty sure we should set the IO address before we enable IO.
But, could the resume code be targetted to just PCI_DEVICE_ID_INTEL_82371AB_3, or do you think an init needs to be done to all/many PCI devices?
I suspect we need to init more devices, but I need to look some more into this. Basically whatever OS does not restore, we have to. QEMU currently retains a lot of state across resets so we get away with not restoring everything neither from bios nor from OS, but I think it's a bug.
pci_config_writel(bdf, 0x40, PORT_ACPI_PM_BASE | 1); pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */ pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1); pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
-Kevin
On 12/19/13 23:16, Michael S. Tsirkin wrote:
I suspect we need to init more devices, but I need to look some more into this. Basically whatever OS does not restore, we have to.
I don't think that implication is necessarily true. Just because neither restores some device state currently doesn't imply the firmware should restore it. The OS is much better equipped to restore complex state; the firmware should only have to ensure that the OS gets a chance to run again.
The Platform Init spec 1.2.1 says in Vol5, 8.5.1.3.1 DXE and the S3 Resume Boot Path,
The ACPI specification only requires the BIOS to restore chipset and processor configuration. [...]
I'm not sure which part of the ACPI spec they refer to, and of course this statement may not be overly accurate either, but it does seem to limit the firmware's responsibility.
Laszlo
On Thu, Dec 19, 2013 at 11:32:03PM +0100, Laszlo Ersek wrote:
On 12/19/13 23:16, Michael S. Tsirkin wrote:
I suspect we need to init more devices, but I need to look some more into this. Basically whatever OS does not restore, we have to.
I don't think that implication is necessarily true. Just because neither restores some device state currently doesn't imply the firmware should restore it. The OS is much better equipped to restore complex state;
If you look at my patch, there's nothing complex there.
the firmware should only have to ensure that the OS gets a chance to run again.
The Platform Init spec 1.2.1 says in Vol5, 8.5.1.3.1 DXE and the S3 Resume Boot Path,
The ACPI specification only requires the BIOS to restore chipset and processor configuration. [...]
All of PIIX and Q35 is part if the chipset though.
I'm not sure which part of the ACPI spec they refer to, and of course this statement may not be overly accurate either, but it does seem to limit the firmware's responsibility.
Laszlo
Basically OSes are tested on real hardware. Whatever firmware for real boards restores, OS-es won't restore so we have to.
On Fr, 2013-12-20 at 00:40 +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 11:32:03PM +0100, Laszlo Ersek wrote:
On 12/19/13 23:16, Michael S. Tsirkin wrote:
I suspect we need to init more devices, but I need to look some more into this. Basically whatever OS does not restore, we have to.
I don't think that implication is necessarily true. Just because neither restores some device state currently doesn't imply the firmware should restore it. The OS is much better equipped to restore complex state;
If you look at my patch, there's nothing complex there.
You don't restore state, you re-initialize things. That is something different.
In case of the i440fx northbridge this probably is pretty much the same as the smbase + pmbase register don't change at runtime. That isn't true for other devices though.
The Platform Init spec 1.2.1 says in Vol5, 8.5.1.3.1 DXE and the S3 Resume Boot Path,
The ACPI specification only requires the BIOS to restore chipset and processor configuration. [...]
All of PIIX and Q35 is part if the chipset though.
I doubt this includes all piix/q35 onboard devices (ide/sata, usb, nic, ...).
The smbios/pmbase register bases are a special case as these registers are needed do handle suspend and resume correctly. I suspect this is the reason why they are kind-of hidden (i.e. not implemented as normal pci i/o bars) in the first place.
Also chipset-specific configuration goes into this category I expect. Basically all the stuff you can configure in the bios setup utility on physical hardware, i.e. which LPC devices (serial, parallel, ...) are enabled/disabled, whenever the storage controller operates in legacy (aka IDE) or native (AHCI) mode, ...
I think we should go go a minimum patch which specifically reinitializes the registers which actually need reinitialization.
I'm not sure which part of the ACPI spec they refer to, and of course this statement may not be overly accurate either, but it does seem to limit the firmware's responsibility.
Laszlo
Basically OSes are tested on real hardware. Whatever firmware for real boards restores, OS-es won't restore so we have to.
For USB host controllers I'm sure this is the job of the OS. At least the linux kernel handles it just fine. It'll figure whenever the uhci controller lost power, if not it resumes the controller, otherwise it does a full re-initialization and usb bus scan. On qemu the later is done as we do a full device reset on s3 resume.
xhci controllers have facilities to save/restore internal state to/from ram, so the OS doesn't need to do a full bus rescan on resume even in case the xhci controller is fully powered off.
cheers, Gerd
On Mon, 2014-01-13 at 09:58 +0100, Gerd Hoffmann wrote:
On Fr, 2013-12-20 at 00:40 +0200, Michael S. Tsirkin wrote:
On Thu, Dec 19, 2013 at 11:32:03PM +0100, Laszlo Ersek wrote:
On 12/19/13 23:16, Michael S. Tsirkin wrote:
I suspect we need to init more devices, but I need to look some more into this. Basically whatever OS does not restore, we have to.
I don't think that implication is necessarily true. Just because neither restores some device state currently doesn't imply the firmware should restore it. The OS is much better equipped to restore complex state;
If you look at my patch, there's nothing complex there.
You don't restore state, you re-initialize things. That is something different.
In case of the i440fx northbridge this probably is pretty much the same as the smbase + pmbase register don't change at runtime. That isn't true for other devices though.
The Platform Init spec 1.2.1 says in Vol5, 8.5.1.3.1 DXE and the S3 Resume Boot Path,
The ACPI specification only requires the BIOS to restore chipset and processor configuration. [...]
All of PIIX and Q35 is part if the chipset though.
I doubt this includes all piix/q35 onboard devices (ide/sata, usb, nic, ...).
The smbios/pmbase register bases are a special case as these registers are needed do handle suspend and resume correctly. I suspect this is the reason why they are kind-of hidden (i.e. not implemented as normal pci i/o bars) in the first place.
Also chipset-specific configuration goes into this category I expect. Basically all the stuff you can configure in the bios setup utility on physical hardware, i.e. which LPC devices (serial, parallel, ...) are enabled/disabled, whenever the storage controller operates in legacy (aka IDE) or native (AHCI) mode, ...
I think we should go go a minimum patch which specifically reinitializes the registers which actually need reinitialization.
I agree, I already sent a patch that does exactly that, I'll ping now. Thanks, Marcel
I'm not sure which part of the ACPI spec they refer to, and of course this statement may not be overly accurate either, but it does seem to limit the firmware's responsibility.
Laszlo
Basically OSes are tested on real hardware. Whatever firmware for real boards restores, OS-es won't restore so we have to.
For USB host controllers I'm sure this is the job of the OS. At least the linux kernel handles it just fine. It'll figure whenever the uhci controller lost power, if not it resumes the controller, otherwise it does a full re-initialization and usb bus scan. On qemu the later is done as we do a full device reset on s3 resume.
xhci controllers have facilities to save/restore internal state to/from ram, so the OS doesn't need to do a full bus rescan on resume even in case the xhci controller is fully powered off.
cheers, Gerd