On Mon, 2012-03-05 at 07:15 +0200, Michael S. Tsirkin wrote:
On Sun, Mar 04, 2012 at 08:30:00PM -0700, Alex Williamson wrote:
On Sun, 2012-03-04 at 20:53 +0200, Michael S. Tsirkin wrote:
On Fri, Feb 24, 2012 at 04:21:17PM -0700, Alex Williamson wrote:
When a Status method is provided on a slot, the OSPM evaluates _STA in response to the device check notify on the slot. This allows some degree of a handshake between the platform and the OSPM that the hotplug has been acknowledged.
In order to implement _STA, we need to know which slots have devices. A slot with a device returns 0x0F, a slot without a device returns Zero. We get this information from Qemu using the 0xae08 I/O port register. This was previously the read-side of the register written to commit a device eject and always returned 0 on read. It now returns a bitmap of present slots, so we know that reading 0 means we have an old Qemu and dynamically modify our SSDT to rename the _STA methods. This is necessary to allow backwards compatibility.
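For readers who don't speak ASL, here is a rough C sketch of the slot-status logic described above. It is only a model, not the patch itself: slot_sta() and qemu_reports_presence() are made-up names, and present_bitmap stands in for whatever a read of I/O port 0xae08 returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only; these names do not appear in the patch. */
#define STA_PRESENT 0x0Fu /* value _STA returns for an occupied slot */
#define STA_ABSENT  0x00u /* value _STA returns for an empty slot */

/* Return the _STA value for one slot (0..31), given the presence
 * bitmap that a read of I/O port 0xae08 is assumed to return. */
static uint32_t slot_sta(uint32_t present_bitmap, unsigned slot)
{
    assert(slot < 32);
    return (present_bitmap & (UINT32_C(1) << slot)) ? STA_PRESENT
                                                    : STA_ABSENT;
}

/* Per the description above, a read of 0 is taken to mean an old Qemu
 * that does not report presence, so _STA would be renamed away. */
static int qemu_reports_presence(uint32_t present_bitmap)
{
    return present_bitmap != 0;
}
```

With a bitmap of 0x8, slot 3 reports 0x0F and every other slot reports Zero.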
Interesting. Isn't the UP register sufficient for _STA?
No, UP only reports the current slot being added, so we'd only be able to report a "present" value for that slot and not static or previously added slots.
It's probably a bug in qemu - for example, if two slots are added quickly we just lose the notification about the first one, right?
The current design of UP/DOWN is that they're transient. A bit in UP is set to cause a hot-add notification, a bit in DOWN is set to cause a hot remove.
The fix might involve making e.g. the UP register write 1 to clear, but it needs to be cleared on Notify, not on _STA.
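To make the write-1-to-clear idea concrete, here is a minimal C sketch of such a register. It is an illustration under assumptions, not Qemu code: struct w1c_reg, w1c_raise() and w1c_write() are hypothetical names, and who performs the clearing write (Notify handling or _STA) is exactly what is being debated here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (W1C) pending register. */
struct w1c_reg {
    uint32_t pending; /* one bit per slot with an outstanding request */
};

/* Device side: latch a request for a slot. Bits that are already
 * pending are preserved, so a second quick hot-add loses nothing. */
static void w1c_raise(struct w1c_reg *r, unsigned slot)
{
    assert(slot < 32);
    r->pending |= UINT32_C(1) << slot;
}

/* Guest side: each 1 bit written clears that pending bit; 0 bits
 * leave the register untouched. */
static void w1c_write(struct w1c_reg *r, uint32_t val)
{
    r->pending &= ~val;
}
```

Clearing a bit that was never set is a no-op, so a stray write does no harm to other pending slots.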
The trouble with that, for my purposes, is that any OSPM will trigger the Notify, that's just part of GPE processing. Only hotplug capable OSPMs will respond to the Notify with a _STA check.
Assuming we want to implement _STA - for which the only motivation seems to be the handshake hack below.
The _STA method also writes the slot identifier to I/O port register 0xae00 as an acknowledgment of the hotplug request.
This part looks a bit like a hack. _STA is not intended as an acknowledgement - it's a query for state. ACPI spec 5.0 requires that _STA is called before _INI, but ACPI 1.0b doesn't. Did you try some 1.0 OSPMs (e.g. XP) to see what they do?
I did test with XP. Section 6.3 of ACPI spec 1.0b references the _STA method during hotplug. I also found this reference for Windows ACPI procedure for hotplug/unplug:
http://www.microsoft.com/china/whdc/system/pnppwr/hotadd/hotplugpci.mspx#EYH
I agree, _STA is not intended as an acknowledgment, but that doesn't mean we can't use it as one. The OSPM can call _STA at any point in time, however calling it after we've done a notify for device check is about the best indication we can get that the OSPM is processing it.
How about an actual access to the slot? The event that we send is just a change event. Guest accesses the slot to see whether any devices are present. No ACPI or interface changes are needed to detect this.
That's a possibility.
It doesn't hurt anything if _STA is called spuriously.
It makes its use as an acknowledgment unreliable: _STA called does *not* mean OSPM saw your hotplug event.
It seems to work well though. In practice, the OSPM doesn't go around randomly calling _STA on devices to poll if the status has changed. It does it in response to events, device check being one of those events. However, only an OSPM that intends to do something about a device check is going to call _STA in response to it. So the behavior I see is that a non-hotplug capable guest sits quietly and does nothing after the SCI is injected and GPE Notify is triggered, while hotplug capable guests respond by checking _STA.
I also think I see how this can cause a race, see below.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Your description of the qemu patches made me think that all you really want is detect an OS without OSPM. If that is the case, I would suggest adding an _INI method at top level as a simpler and more robust procedure.
No, having OSPM is a prerequisite, but does not imply supporting hotplug.
It does not? What are the OSPMs that ignore hotplug?
Linux: modprobe -r acpiphp (or just don't load it automatically)
Hotplug has been defined since ACPI 1.0 so this seems strange. But it will be much easier to discuss whether a specific hack is efficient if you define the specific bug this handshake tries to work around.
Well, there's the one above, not all guests support PCI hotplug. If you hot add a device, then remember acpiphp isn't loaded, the device is held hostage, unusable by the guest, un-removable by qemu. The only solution is to reboot the guest. There are clearly some bugs in the hotplug code blindly overwriting UP/DOWN bits that break hotplugging multiple devices or hotplugging at such a rate that we stomp on ourselves. That can result in trying to add a device, immediately removing it, clearing UP, setting DOWN, confusing the guest and again ending up with a device held hostage. That's also fixed by these patches. I also thought that since this adds a virtual "end" to a hot add, and we already have a beginning and end to a hot remove, we could think about adding parameters to device_add/del to wait for a user specified time and report success/failure. Failure to add can then immediately remove the device.
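The "wait, then report success/failure" idea could look roughly like the following C sketch. Everything in it is hypothetical (struct hotadd, begin_add(), guest_ack(), finish_add() are invented for illustration), and the timeout itself is left to the caller, who would call finish_add() once the user-specified time has elapsed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a hot-add with an acknowledgment deadline. */
enum add_result { ADD_OK, ADD_FAILED_REMOVED };

struct hotadd {
    uint32_t up;      /* slots with an unacknowledged hot-add */
    uint32_t present; /* slots that currently hold a device */
};

/* Host side: plug the device and mark the hot-add as pending. */
static void begin_add(struct hotadd *h, unsigned slot)
{
    assert(slot < 32);
    h->present |= UINT32_C(1) << slot;
    h->up |= UINT32_C(1) << slot;
}

/* Guest side: the acknowledgment (here, the write done from _STA)
 * clears the pending bit for that slot. */
static void guest_ack(struct hotadd *h, unsigned slot)
{
    assert(slot < 32);
    h->up &= ~(UINT32_C(1) << slot);
}

/* Host side, called after the timeout: if the guest never
 * acknowledged, back the device out instead of leaving it stranded. */
static enum add_result finish_add(struct hotadd *h, unsigned slot)
{
    uint32_t bit;

    assert(slot < 32);
    bit = UINT32_C(1) << slot;
    if (h->up & bit) {
        h->up &= ~bit;
        h->present &= ~bit;
        return ADD_FAILED_REMOVED;
    }
    return ADD_OK;
}
```

A guest without acpiphp loaded never calls guest_ack(), so finish_add() removes the device rather than leaving it held hostage.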
Otherwise, how about implementing _PS0 (and probably _PS3) to manage slot power? Maybe this is what you are really after, and it seems like a better interface than 'acknowledge' which does not seem to make sense for real hardware.
I tried this, _PS0/3 also requires _STA.
Interesting. Why does it require _STA?
Does the OSPM know to call _PS0/3 unless _STA reports the current state? The other serious problem that I remembered here is that Windows only powers up a slot if it has a driver. So if you hotplug a virtio-net-pci device but don't have the drivers, Windows has ownership of the device, but doesn't power it up. _STA detects this, using _PS0 as an ack would not.
Implementing both caused interrupts to stop working on Linux guests.
So there's some bug then? Let's fix?
Given the above, I didn't see any reason to pursue it.
Note that _PS0/3 is even less closely associated with device removal in 1.0b than _STA even though the MSFT document references it.
ACPI spec references it :) It seems clear that _PS0 must be called before device is used, otherwise the slot has no power.
See problem above.
There's also _OST - linux doesn't implement it but it seems modern windows versions do.
If we use _OST then we have a solution that only works on very, very new guests... that's not much of a solution.
src/acpi-dsdt.dsl | 36 ++-
src/acpi-dsdt.hex | 124 ++++++----
src/acpi.c | 27 ++
src/ssdt-pcihp.dsl | 3
src/ssdt-pcihp.hex | 658 ++++++++++++++++++++++++++++++++++++++++++++--------
5 files changed, 686 insertions(+), 162 deletions(-)
diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 7082b65..6b87086 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -119,17 +119,15 @@ DefinitionBlock (
prt_slot3(0x001f),
})
OperationRegion(PCST, SystemIO, 0xae00, 0x08)
OperationRegion(PCST, SystemIO, 0xae00, 0x0c) Field (PCST, DWordAcc, NoLock, WriteAsZeros) {
PCIU, 32,
PCID, 32,
}
OperationRegion(SEJ, SystemIO, 0xae08, 0x04)
Field (SEJ, DWordAcc, NoLock, WriteAsZeros)
{
B0EJ, 32,
// PCI Up/ACK
PUPA, 32,
// PCI Down
PDWN, 32,
// PCI Present/Eject
PPEJ, 32,
Note on the comment: this only affects bus0 not all of PCI.
As has always been the case.
But your comment implies otherwise :)
Hmm, hard to imagine how a 32bit register can signal anything more than a single bus.
} Name (_CRS, ResourceTemplate ()
@@ -462,10 +460,20 @@ DefinitionBlock ( /* Methods called by hotplug devices */ Method (PCEJ, 1, NotSerialized) { // _EJ0 method - eject callback
Store(ShiftLeft(1, Arg0), B0EJ)
Store(ShiftLeft(1, Arg0), PPEJ) Return (0x0) }
Method (PSTA, 1, NotSerialized) {
Store(ShiftLeft(1, Arg0), PUPA)
So this looks wrong to me.
Specifically _STA is also called at the end after _EJ0. If the device is ejected then inserted, you get a window where _STA is called and hardware will think insertion was acknowledged, while in fact ejection was acknowledged.
The qemu patch doesn't allow an insertion while an eject is pending.
I also think a request for the OS to rescan the bus will trigger _STA calls. Same race can get triggered.
Spurious _STA calls don't matter, they'll clear a bit that wasn't set in the UP register anyway. If there's a race with the hotplug SCI, i.e. we've set UP, but OSPM performs a rescan, they'll notice _STA now reports the device is present and I think that should lead to the proper result.
To be more explicit:
_EJ0
host adds new device
_STA triggered to check _EJ0 success
You now think OSPM acknowledged new device with _STA but it didn't. OSPM thinks that _EJ0 failed but it didn't.
Hmm, I wonder if we should clear DOWN from the qemu side of _STA too instead of from _EJ0.
Store(PPEJ, Local0)
If (And(Local0, ShiftLeft(1, Arg0))) {
Return(0x0F)
} Else {
Return(Zero)
}
}
- /* Hotplug notification method supplied by SSDT */ External (_SB.PCI0.PCNT, MethodObj)
@@ -473,12 +481,16 @@ DefinitionBlock ( Method(PCNF, 0) { // Local0 = iterator Store (Zero, Local0)
// Local1 = slots marked "up"
Store (PUPA, Local1)
// Local2 = slots marked "down"
Store (PDWN, Local2) While (LLess(Local0, 31)) { Increment(Local0)
If (And(PCIU, ShiftLeft(1, Local0))) {
If (And(Local1, ShiftLeft(1, Local0))) { PCNT(Local0, 1) }
If (And(PCID, ShiftLeft(1, Local0))) {
If (And(Local2, ShiftLeft(1, Local0))) { PCNT(Local0, 3) } }
Nothing wrong here but should be a separate patch?
It was pretty trivial, but I can split it if needed.
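As a reading aid for the PCNF change discussed above, here is the same loop restated as a C sketch: both registers are read once into locals, then slots 1 through 31 are walked (the ASL increments before testing, so slot 0 is never visited). scan_slots() and struct notification are hypothetical names; the codes 1 and 3 are the values passed to PCNT in the patch, which ACPI defines as Device Check and Eject Request.

```c
#include <assert.h>
#include <stdint.h>

#define NOTIFY_DEVICE_CHECK  1 /* second argument of PCNT for hot-add */
#define NOTIFY_EJECT_REQUEST 3 /* second argument of PCNT for hot-remove */

/* Hypothetical record of one PCNT(slot, code) call. */
struct notification {
    unsigned slot;
    unsigned code;
};

/* Walk slots 1..31 over snapshots of the UP and DOWN registers and
 * record the notifications PCNF would issue. Returns the number of
 * entries written; a cap of 62 covers every possible case. */
static unsigned scan_slots(uint32_t up, uint32_t down,
                           struct notification *out, unsigned cap)
{
    unsigned n = 0;

    assert(out || cap == 0);
    for (unsigned slot = 1; slot <= 31; slot++) {
        uint32_t bit = UINT32_C(1) << slot;

        if ((up & bit) && n < cap)
            out[n++] = (struct notification){ slot, NOTIFY_DEVICE_CHECK };
        if ((down & bit) && n < cap)
            out[n++] = (struct notification){ slot, NOTIFY_EJECT_REQUEST };
    }
    return n;
}
```

Working from snapshots means a register that changes mid-loop cannot yield an inconsistent set of notifications within one pass.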
diff --git a/src/acpi-dsdt.hex b/src/acpi-dsdt.hex index 5dc7bb4..6d99f53 100644 --- a/src/acpi-dsdt.hex +++ b/src/acpi-dsdt.hex @@ -3,12 +3,12 @@ static unsigned char AmlCode[] = { 0x53, 0x44, 0x54, -0xd3, +0xeb, 0x10, 0x0,
...
I'd rather not see this part on list.
Then it should be .gitignore'd.
No, it's in git so people without iasl can build the bios.
Then should we also commit binaries to git so people without gcc can "build" the bios?
I'm just following the lead of previous patches in this space.
People tend to forget these blobs (I do sometimes) but it makes review awkward.
Noted.
[adding in follow-up] ...
The _STA method also writes the slot identifier to I/O port register 0xae00 as an acknowledgment of the hotplug request.
To summarize my previous messages, my notes are
- not clear that we want to implement _STA: yes we can tell the hypervisor what _STA reported to OSPM but this won't be needed without _STA
/me tries to find the beginning of the circular logic.
- assuming we do, it seems clear that we want hypervisor to know what it is that we told OSPM about slot status
_STA always reports which slots have devices at the time it's called.
- the specific interface used for the above is fairly tricky so it needs documentation explaining how both sides cooperate
The field written from _STA is documented as an "ACK" and it's pretty clear that a write to that field clears the pending UP. There's also a pretty good comment in the commit log. It seems better documented than most of our ACPI interface. I can add more documentation, but this doesn't seem like your primary concern.
Perhaps Gleb is right that we should redesign the whole thing, removing UP/DOWN. Maybe we can come up with better synchronization logic that way. I'm not sure if a change like that can be made with any kind of backwards compatibility like this has though. I think we'd still depend on _STA for some kind of coordination too, so in the end, I'm not sure it's any better. Thanks,
Alex