Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems. One is the following: does a booting kernel inform the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence that BIOS calls will no longer be issued? On the last real mode usage we want to leave all Xen PV information in a clean state, which means closing the channel and ring between the newly created domain and the host system.
If you have any ideas please let me know.
Daniel
On 19/05/2011 06:33, "Daniel Castro" evil.dani@gmail.com wrote:
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
If you have any ideas please let me know.
There's no easy way. Best effort might be to hook off the guest OS setting up its PV drivers. One of the first steps of that would be getting a hypercall transfer page, and also setting up event-channel delivery. It may be necessary for the hypervisor to give the BIOS some help by delivering a pre-registered callback on one of those events, to clean up. This is made uglier by the fact you don't know what execution mode the OS might be in when it triggers the callback. Needs a bit more thought.
-- Keir
Daniel
On Thu, 2011-05-19 at 08:19 +0100, Keir Fraser wrote:
On 19/05/2011 06:33, "Daniel Castro" evil.dani@gmail.com wrote:
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
If you have any ideas please let me know.
There's no easy way. Best effort might be to hook off the guest OS setting up its PV drivers. One of the first steps of that would be getting a hypercall transfer page, and also setting up event-channel delivery.
Probably the first thing it will do is hit the Xen CPUID leaf.
It may be necessary for the hypervisor to give the BIOS some help by delivering a pre-registered callback on one of those events, to clean up. This is made uglier by the fact you don't know what execution mode the OS might be in when it triggers the callback.
Virtual SMM? :-(
Needs a bit more thought.
We had a bit of a brainstorm yesterday and someone suggested that perhaps qemu could deal with it when it sees the I/O ports for the emulated device unplug get hit. That would potentially mean communicating a bunch of frontend state to qemu though, which could get pretty ugly.
More thought indeed.
Ian.
On 05/19/2011 10:01 AM, Ian Campbell wrote:
We had a bit of a brainstorm yesterday and someone suggested that perhaps qemu could deal with it when it sees the I/O ports for the emulated device unplug get hit.
You could imagine doing the unplug even under an OS that uses INT 13h. PV drivers also aren't forced to do the unplug: RHEL5 guests (up to 5.7 at least) use blacklisting, and the Red Hat PV drivers for Windows use a filter driver because I never got the unplug to work.
If SeaBIOS is guaranteed to never reinitialize the callback via, trapping writes to HVM_PARAM_CALLBACK_IRQ with an SMI-like effect sounds like the way to go, as everybody probably agrees at this point. You could also initiate the shutdown from dom0, which you could easily do via ACPI even. ;)
STORE_PFN and STORE_EVTCHN parameters are usually written only by dom0 (in fact perhaps setting them could be made a privileged operation), which makes them less ideal.
Paolo
On 05/19/2011 10:01 AM, Ian Campbell wrote:
We had a bit of a brainstorm yesterday and someone suggested that perhaps qemu could deal with it when it sees the I/O ports for the emulated device unplug get hit.
You could imagine doing the unplug even under an OS that uses INT 13h. PV drivers also aren't forced to do the unplug: RHEL5 guests (up to 5.7 at least) use blacklisting, and the Red Hat PV drivers for Windows use a filter driver because I never got the unplug to work.
If SeaBIOS is guaranteed to never reinitialize the callback via, trapping writes to HVM_PARAM_CALLBACK_IRQ with an SMI-like effect sounds like the way to go, as everybody probably agrees at this point. You could also initiate the shutdown from dom0, which you could easily do via ACPI even. ;)
Am I understanding right that the OS would need to be completely put on hold in the middle of the hypercall while the BIOS took over and did its shutdown sequence, which presumably involves xenstore activity?
James
On Sat, 2011-05-21 at 09:44 +0100, James Harper wrote:
Am I understanding right that the OS would need to be completely put on hold in the middle of the hypercall while the BIOS took over and did its shutdown sequence, which presumably involves xenstore activity?
Correct, or at least that is one suggestion (probably the front runner at the moment).
In theory it's no worse than SMM, which Windows presumably already copes with, but perhaps the latency from having to do the xenstore interaction is far greater than any real SMM operation.
Ian.
On Sat, 2011-05-21 at 09:44 +0100, James Harper wrote:
Am I understanding right that the OS would need to be completely put on hold in the middle of the hypercall while the BIOS took over and did its shutdown sequence, which presumably involves xenstore activity?
Correct, or at least that is one suggestion (probably the front runner at the moment).
In theory it's no worse than SMM, which Windows presumably already copes with, but perhaps the latency from having to do the xenstore interaction is far greater than any real SMM operation.
How hard would it be to create a better mechanism for going forward and implement the above as a fallback if the DomU doesn't support the new mechanism?
James
On Mon, 2011-05-23 at 11:20 +0100, James Harper wrote:
On Sat, 2011-05-21 at 09:44 +0100, James Harper wrote:
Am I understanding right that the OS would need to be completely put on hold in the middle of the hypercall while the BIOS took over and did its shutdown sequence, which presumably involves xenstore activity?
Correct, or at least that is one suggestion (probably the front runner at the moment).
In theory it's no worse than SMM, which Windows presumably already copes with, but perhaps the latency from having to do the xenstore interaction is far greater than any real SMM operation.
How hard would it be to create a better mechanism for going forward and implement the above as a fallback if the DomU doesn't support the new mechanism?
I suspect the answer is, not all that hard...
In fact I think that would make sense regardless of how hard it is.
Ian.
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
If you have any ideas please let me know.
You can look at the Linux source code and see what the first thing it does is. With GPLPV, the first thing I do is set up logging to /var/log/qemu-dm-<domu name>.log (iowrites which are caught by qemu), but only under the checked drivers. The next thing is to balloon down the memory before Windows touches it too much. Then I disable the qemu devices (iowrites which are caught by qemu). Finally I check the CPUID for the xen signature (should probably do that first) and then set up the rights etc.
I think the cheapest way to do it would be to trap the iowrites and use that as the trigger to tear down the rings etc., as the iowrites are already processed in qemu, which should be easier to intercept. The xen guys would need to comment on whether you can guarantee that this is always done by any reasonably recent version of Linux with PV drivers. There may well be lots of current installations that pre-date those iowrites.
Next I guess you could look for the WriteMSR instruction to copy the hypercall pages in, or look for an OS querying the CPUIDs where the Xen signatures live, but then the Hyper-V signatures are there too and I don't know when Windows queries those. This is possibly harder to trap, as Xen would need to signal either qemu or SeaBIOS directly that this had happened.
Alternatively, seeing the HVM_PARAM_CALLBACK_IRQ, HVM_PARAM_STORE_PFN, and HVM_PARAM_STORE_EVTCHN hypercalls (hvm set op) is the definitive way to know that the OS is initialising the xenbus interface. SeaBIOS would need to trap the calls (all three I guess in case they were executed in an order you didn't expect) before they were executed, which would be harder as I think qemu never sees it. This early intervention would be required as you'd need to use xenbus to tear down the interfaces which is probably asking a bit much.
James
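A minimal sketch of the CPUID signature check James mentions, assuming the guest has already executed CPUID on the hypervisor base leaf (commonly 0x40000000) and has the EBX, ECX and EDX results in hand. The helper names are hypothetical. Since a Hyper-V signature can be exposed in the same range, the sketch compares the full 12-byte string rather than just testing that the leaf exists.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy one 32-bit register into 4 bytes, least significant byte first
 * (the order in which CPUID packs signature characters). */
static void put_reg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* Assemble the 12-character hypervisor signature from EBX, ECX, EDX. */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    put_reg(out + 0, ebx);
    put_reg(out + 4, ecx);
    put_reg(out + 8, edx);
    out[12] = '\0';
}

/* True only if the registers spell the Xen signature "XenVMMXenVMM". */
static bool is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[13];

    hv_signature(ebx, ecx, edx, sig);
    return strcmp(sig, "XenVMMXenVMM") == 0;
}
```

Keeping the comparison separate from the CPUID instruction itself means the same check can be used by whichever component ends up observing the query.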
On 19.05.11 at 07:33, Daniel Castro evil.dani@gmail.com wrote:
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).
There is INT15 AX=EC00 with BX specifying the target operating mode, but that's apparently being called only before entering long mode (i.e. wouldn't cover 32-bit OSes). Nor would it be a guarantee that the OS will not later return to real mode.
Jan
On 05/19/11 10:08, Jan Beulich wrote:
On 19.05.11 at 07:33, Daniel Castro evil.dani@gmail.com wrote:
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).
There is INT15 AX=EC00 with BX specifying the target operating mode, but that's apparently being called only before entering long mode (i.e. wouldn't cover 32-bit OSes). Nor would it be a guarantee that the OS will not later return to real mode.
Wouldn't it be possible for the BIOS to reestablish the connection to Xen in this case? This might be the best solution: close the channel and ring at some specific event (might even be timer based) and open them again if really needed.
Juergen
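The close-then-reopen-on-demand idea above could be sketched roughly as follows. This is only an illustration: the structure and function names are hypothetical, and the real connect and disconnect steps (xenstore negotiation, ring and event channel setup) are reduced to a flag and counters.

```c
#include <stdbool.h>

/* Hypothetical BIOS-side frontend state; not an actual SeaBIOS type. */
struct pv_frontend {
    bool connected;        /* ring and event channel currently open */
    unsigned connects;     /* how many times we had to (re)connect */
    unsigned io_requests;  /* disk requests served through the PV path */
};

/* Open the PV connection only if it is not already open. */
static void pv_connect(struct pv_frontend *fe)
{
    if (!fe->connected) {
        fe->connected = true;
        fe->connects++;
    }
}

/* Close the connection at some chosen event (timer, unplug write, ...). */
static void pv_disconnect(struct pv_frontend *fe)
{
    fe->connected = false;
}

/* Entry point for a BIOS disk call: reconnect lazily, then do the I/O. */
static void int13_request(struct pv_frontend *fe)
{
    pv_connect(fe);
    fe->io_requests++;
}
```

With this shape an OS that never returns to real mode pays nothing after the disconnect, while one that does issue a later disk call triggers a single reconnect.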
At 09:08 +0100 on 19 May (1305796117), Jan Beulich wrote:
How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).
You can't, but you could always try to re-establish PV connections if the guest starts making INT13h calls again. In any case the existing BIOS has this problem if the PV drivers have turned off the emulated devices.
As for how you tidy up cleanly, I can't think of anything better than a sort of virtual SMM, where you register an area of code to be run in a known sane environment and have Xen trigger it based on, e.g. the disable-my-devices ioport write. It's pretty ugly but at least it'd be fairly self-contained compared to having Xen or qemu try to tear down grant-table entries &c.
Tim.
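As a rough model of the "virtual SMM" idea above: the BIOS registers a cleanup routine once, and the hypervisor side runs it the first time it sees the disable-my-devices ioport write. Everything here is an assumption made for illustration, including the port number used for `UNPLUG_PORT`, the hook structure and the function names; the real mechanism would also have to set up a known execution environment before calling into BIOS code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNPLUG_PORT 0x10  /* assumed unplug port, for illustration only */

typedef void (*cleanup_fn)(void *ctx);

/* Hypothetical registration record kept on behalf of the BIOS. */
struct cleanup_hook {
    cleanup_fn fn;   /* BIOS cleanup entry point */
    void *ctx;       /* opaque argument passed back to the BIOS */
    bool fired;      /* ensures the cleanup runs at most once */
};

/* BIOS side: register the routine to run before the OS takes over. */
static void register_cleanup(struct cleanup_hook *h, cleanup_fn fn, void *ctx)
{
    h->fn = fn;
    h->ctx = ctx;
    h->fired = false;
}

/* Hypervisor side: called for every trapped port write.
 * Returns true if this write triggered the cleanup. */
static bool on_ioport_write(struct cleanup_hook *h, uint16_t port)
{
    if (port != UNPLUG_PORT || h->fn == NULL || h->fired)
        return false;
    h->fired = true;
    h->fn(h->ctx);
    return true;
}

/* Example cleanup routine: just counts invocations. */
static void count_cleanup(void *ctx)
{
    (*(int *)ctx)++;
}
```

The one-shot flag matters because a guest may write the unplug port more than once, and the cleanup must not run again after the OS has set up its own PV state.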
On Thu, 2011-05-19 at 09:20 +0100, Tim Deegan wrote:
As for how you tidy up cleanly, I can't think of anything better than a sort of virtual SMM, where you register an area of code to be run in a known sane environment and have Xen trigger it based on, e.g. the disable-my-devices ioport write. It's pretty ugly but at least it'd be fairly self-contained compared to having Xen or qemu try to tear down grant-table entries &c.
Tim and I just had a bit of a think about whether or not this could be done from AML }:-). (Let's ignore the fact that requiring ACPI support in the guest for this functionality would be a bit lame...)
Turns out it cannot (phew!) without adding some very hacky way to make hypercalls (e.g. via an I/O write); hypercalls are needed to kick the xenstore evtchn and also to close any other evtchns. The rest, such as clearing down grant entries and zeroing the xenstore ring, could be done from AML, we reckon.
FWIW the set of things which needs to be done seems to be:
* xenbus writes to move devices to state 5 (provoking backend reset), notify xenbus evtchn, wait for responses to complete (or otherwise interlock against the xenstore ring reset below).
* make hypercalls to close event channels
* clear grant table entries
* reset the xenstore ring ready for use by next OS.
So it looks like some sort of SMM alike thing is going to be the best answer here, although "real virtual" SMM looks like a complete snake/tar pit. A simpler callback with flat segments seems plausibly doable.
As an aside we will also need to handle the case where the guest is not PV aware and hence uses the emulated devices and never triggers any of the above activities. So we need to ensure that the backends are sync'd even if none of the above takes place. The PV devices will remain open but that needn't be a problem if the guest never uses them.
Possibly this means making sure all writes via this PV interface go straight to disk (using the appropriate barriers) or by having qemu do the necessary flush when the emulated device is first used.
Ian.
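The four teardown steps listed in the message above can be walked through against a mock, to make the ordering constraint explicit: the xenstore ring must not be reset until the backend has responded. This is a sketch only. The state numbers follow the xenbus state numbering (4 connected, 5 closing, 6 closed), but the structure, the mock backend and the function names are hypothetical, and the real steps involve xenstore writes and hypercalls rather than plain memory writes.

```c
#include <stdbool.h>
#include <string.h>

#define STATE_CONNECTED 4
#define STATE_CLOSING   5   /* "state 5" in the list above */
#define STATE_CLOSED    6

#define N_EVTCHN   4
#define N_GRANTS   8
#define RING_BYTES 64

/* Mock of what a BIOS frontend holds; not a real Xen or SeaBIOS type. */
struct mock_pv {
    int frontend_state;
    int backend_state;
    bool evtchn_open[N_EVTCHN];
    unsigned grant_flags[N_GRANTS];
    unsigned char store_ring[RING_BYTES];
};

/* Simplified mock backend: closes as soon as the frontend is closing. */
static void mock_backend_react(struct mock_pv *pv)
{
    if (pv->frontend_state == STATE_CLOSING)
        pv->backend_state = STATE_CLOSED;
}

/* Returns true once the guest-visible PV state is clean. */
static bool bios_pv_teardown(struct mock_pv *pv)
{
    /* 1. Move the device to state 5 and wait for the backend.
     *    Real code would kick the xenstore evtchn and poll for replies. */
    pv->frontend_state = STATE_CLOSING;
    mock_backend_react(pv);
    if (pv->backend_state != STATE_CLOSED)
        return false;   /* do not reset the ring with replies pending */

    /* 2. Close event channels (one hypercall each in real code). */
    for (int i = 0; i < N_EVTCHN; i++)
        pv->evtchn_open[i] = false;

    /* 3. Clear grant table entries. */
    memset(pv->grant_flags, 0, sizeof pv->grant_flags);

    /* 4. Reset the xenstore ring for the next OS. */
    memset(pv->store_ring, 0, sizeof pv->store_ring);
    return true;
}
```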
On 19/05/2011 10:36, "Ian Campbell" Ian.Campbell@citrix.com wrote:
On Thu, 2011-05-19 at 09:20 +0100, Tim Deegan wrote:
As for how you tidy up cleanly, I can't think of anything better than a sort of virtual SMM, where you register an area of code to be run in a known sane environment and have Xen trigger it based on, e.g. the disable-my-devices ioport write. It's pretty ugly but at least it'd be fairly self-contained compared to having Xen or qemu try to tear down grant-table entries &c.
Tim and I just had a bit of a think about whether or not this could be done from AML }:-). (Let's ignore the fact that requiring ACPI support in the guest for this functionality would be a bit lame...)
Turns out it cannot (phew!) without adding some very hacky way to make hypercalls (e.g. via an I/O write); hypercalls are needed to kick the xenstore evtchn and also to close any other evtchns. The rest, such as clearing down grant entries and zeroing the xenstore ring, could be done from AML, we reckon.
Yuk, no. The SMM type thing (maybe not really emulated SMM, but kind of inspired by the principle of SMM) is the best idea I have so far. That was the kind of thing in my mind when I replied yesterday.
-- Keir
FWIW the set of things which needs to be done seems to be:
* xenbus writes to move devices to state 5 (provoking backend reset), notify xenbus evtchn, wait for responses to complete (or otherwise interlock against the xenstore ring reset below).
* make hypercalls to close event channels
* clear grant table entries
* reset the xenstore ring ready for use by next OS.
So it looks like some sort of SMM alike thing is going to be the best answer here, although "real virtual" SMM looks like a complete snake/tar pit. A simpler callback with flat segments seems plausibly doable.
As an aside we will also need to handle the case where the guest is not PV aware and hence uses the emulated devices and never triggers any of the above activities. So we need to ensure that the backends are sync'd even if none of the above takes place. The PV devices will remain open but that needn't be a problem if the guest never uses them.
Possibly this means making sure all writes via this PV interface go straight to disk (using the appropriate barriers) or by having qemu do the necessary flush when the emulated device is first used.
Ian.
Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
As an aside we will also need to handle the case where the guest is not PV aware and hence uses the emulated devices and never triggers any of the above activities. So we need to ensure that the backends are sync'd even if none of the above takes place. The PV devices will remain open but that needn't be a problem if the guest never uses them.
Possibly this means making sure all writes via this PV interface go straight to disk (using the appropriate barriers) or by having qemu do
From a userspace perspective that is funneled via fdatasync. On 2.6.39 that becomes REQ_FLUSH|REQ_FUA which is correct-ish.
the necessary flush when the emulated device is first used.
Hm, what is the HVM backend for 'phy' in QEMU mode? I've been using 'file' which translates to qdisk (which does the proper fdatasync on flush, though not the proper flush on barrier as the 2.6.39 fdatasync does not do the old-style barrier flush).
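For reference, the userspace flush path referred to above (write, then fdatasync) looks roughly like this on a POSIX system. This is a generic sketch, not code from qemu or qdisk; the function name `write_durably` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path and ask the kernel to push the data to stable
 * storage before returning. Returns 0 on success, -1 on any failure. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush file data (and the metadata needed to read it back). */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

How that request is translated below the system call (flush versus barrier semantics) depends on the kernel version, which is the point being raised in the message.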
On 19.05.11 at 07:33, Daniel Castro evil.dani@gmail.com wrote:
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).
Well, that's a problem with any device with a real mode counterpart (e.g. INT13 for storage), isn't it? So just handle it in the same way.
James
On Thu, May 19, 2011 at 09:08:37AM +0100, Jan Beulich wrote:
On 19.05.11 at 07:33, Daniel Castro evil.dani@gmail.com wrote:
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued? We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).
I don't think any modern OS will call the int13 (disk drive) functions once the OS has booted. Doing so would be highly error prone even on real hardware. Once the OS has registered its own driver for a disk, it can't go back to the BIOS handlers for it.
-Kevin
On Thu, May 19, 2011 at 02:33:52PM +0900, Daniel Castro wrote:
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued?
In the general case, no.
The ACPI spec does define a mechanism for the OS to inform the BIOS that it is transitioning from "Legacy state" to "Working state" via an SMI. SeaBIOS does have code for this (see src/smm.c), but it doesn't currently do anything interesting. Unfortunately, this is only available for OSs that support ACPI.
We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
Is it required to close the channel? Can opening a new channel implicitly close the old channel (I believe this is what the virtio stuff does)?
-Kevin
On Sat, May 21, 2011 at 2:29 PM, Kevin O'Connor kevin@koconnor.net wrote:
On Thu, May 19, 2011 at 02:33:52PM +0900, Daniel Castro wrote:
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued?
In the general case, no.
The ACPI spec does define a mechanism for the OS to inform the BIOS that it is transitioning from "Legacy state" to "Working state" via an SMI. SeaBIOS does have code for this (see src/smm.c), but it doesn't currently do anything interesting. Unfortunately, this is only available for OSs that support ACPI.
We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
Is it required to close the channel? Can opening a new channel implicitly close the old channel (I believe this is what the virtio stuff does)?
There ought to be a way for the guest to reset the device. At least in virtio this can be done and allows an in-use device to be reused, no matter what state it is in.
Stefan
On Mon, 2011-05-23 at 06:24 +0100, Stefan Hajnoczi wrote:
On Sat, May 21, 2011 at 2:29 PM, Kevin O'Connor kevin@koconnor.net wrote:
On Thu, May 19, 2011 at 02:33:52PM +0900, Daniel Castro wrote:
Good Day,
In order to give PV Drivers to SeaBIOS we will need to solve a few problems, one is the following: Does a booting kernel informs the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence BIOS calls will not be issued?
In the general case, no.
The ACPI spec does define a mechanism for the OS to inform the BIOS that it is transitioning from "Legacy state" to "Working state" via an SMI. SeaBIOS does have code for this (see src/smm.c), but it doesn't currently do anything interesting. Unfortunately, this is only available for OSs that support ACPI.
We want upon last real mode usage to leave all Xen PV information in a clean state, this means, closing the channel and ring between the newly created domain and the host system.
Is it required to close the channel? Can opening a new channel implicitly close the old channel (I believe this is what the virtio stuff does)?
There ought to be a way for the guest to reset the device. At least in virtio this can be done and allows an in-use device to be reused, no matter what state it is in.
There probably ought to be, but currently there isn't :-(
It might be that some existing frontends manage to cope gracefully with this, but I'm doubtful it will be all of them.
We were hoping to be able to enable this BIOS feature automatically and transparently to the user/guest OS. If we can't figure out a clean/sane way to do that, the fallback is to expose it as an option to users, update the frontend drivers to cope by adding reset functionality, and use the PV emulated device unplug protocol (which includes a protocol version field) to make dangerous combinations of old drivers and this new feature fail.
Ian.