On Feb 17, 2017, at 2:06 AM, Laszlo Ersek <lersek@redhat.com> wrote:
CC Kevin
On 02/17/17 07:10, Ben Warren wrote:
Hi Laszlo
On Feb 9, 2017, at 12:24 AM, Laszlo Ersek <lersek@redhat.com> wrote:
On 02/09/17 09:17, Laszlo Ersek wrote:
Ben,
On 02/05/17 18:09, ben@skyportsystems.com wrote:
From: Ben Warren <ben@skyportsystems.com>
This command is similar to ADD_POINTER, but instead of patching memory, it writes the pointer back to QEMU over the DMA interface.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
src/fw/romfile_loader.c | 37 +++++++++++++++++++++++++++++++++++++
src/fw/romfile_loader.h | 16 ++++++++++------
2 files changed, 47 insertions(+), 6 deletions(-)
While working on the OVMF patches for WRITE_POINTER, something else occurred to me.
As I mentioned in the QEMU thread,
https://www.mail-archive.com/qemu-devel@nongnu.org/msg428495.html
the VMGENID device (or more generally, each device that produces WRITE_POINTER command(s)) should have a reset handler that makes the device forget the GPA(s) that the firmware communicated to it, via WRITE_POINTER. This is because after system reset, all earlier GPAs become meaningless.
What I want to mention here is the corollary to the above. ACPI S3 resume is basically a reset, but the GPAs remain valid, because system memory is preserved. Therefore, at S3 resume, the guest firmware has to replay all the WRITE_POINTER commands. The QEMU reset handler will forcibly forget about the GPAs (which is correct), so the firmware has to re-store them.
... By that I don't necessarily mean to re-run the exact same linker-loader logic; it should be okay to save the *results* of those operations in a simpler array (that is, write exactly what values to what fw_cfg files at what offsets).
And you can detect whether this is needed at all: the "etc/system-states" fw_cfg file can be used to see if ACPI S3 is enabled or disabled in QEMU. (Please grep QEMU for "disable_s3" and "etc/system-states".)
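For example, during normal boot you could do something like the following (an untested sketch: romfile_loadfile() is the existing SeaBIOS helper, but the byte layout of "etc/system-states" -- byte 3 describes S3, bit 7 set when the state is enabled -- is my assumption, matching what OVMF checks):

    /* Sketch: decide whether the WRITE_POINTER stash / replay machinery
     * is needed at all.  Assumes byte 3 of "etc/system-states" is S3,
     * with bit 7 marking it enabled. */
    int size;
    u8 *states = romfile_loadfile("etc/system-states", &size);
    int s3_enabled = states && size >= 6 && (states[3] & 0x80);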
I’ve made some changes to SeaBIOS that I believe should do this, but don’t really know how to test it. Here’s what I’ve tried:
- Instrumented QEMU to print a message when a fw_cfg write comes from the guest.
- Called QEMU with the additional arguments "-global PIIX4_PM.disable_s3=0". Note that my machine type is pc-i440fx-2.8.
- Booted a Linux guest. Once logged in, typed "pm-suspend".
- Noted the following in the QEMU monitor:
  (QEMU) {u'timestamp': {u'seconds': 1487310498, u'microseconds': 971046}, u'event': u'SUSPEND'}
- Woke the guest up from the QEMU monitor:
  (QEMU) system_wakeup
  {"return": {}}
  (QEMU) {u'timestamp': {u'seconds': 1487310558, u'microseconds': 135357}, u'event': u'WAKEUP'}
Yes, this is exactly how it should be tested.
Good
But I didn’t see the fw_cfg getting written. I don’t understand the ACPI states well enough to know whether the above sequence is exercising S3, so I don’t really want to spend more effort without knowing I’m doing the right thing.
You are doing the right thing.
For additional confirmation, you can instrument SeaBIOS too, by adding dprintf()s to handle_resume() in "resume.c":
https://www.seabios.org/Execution_and_code_flow#Resume_and_reboot
Actually, if you see
dprintf(1, "In resume (status=%d)\n", status);
in the log (from the QEMU debug port), then you can be sure you're resuming.
Sounds good
On resume, QEMU will set the CMOS_RESET_CODE register to 0xFE, so the SeaBIOS code will jump to handle_resume32(), and then to s3_resume(). This last function is where I think the commands should be replayed (likely from a more condensed form).
Roughly, what the OVMF code does is the following (omitting all the PI/UEFI terms for now):
- during normal (first) boot, when processing the WRITE_POINTER
commands, OVMF stashes them in a condensed form in a buffer that is allocated as "reserved memory" (so that the OS stays away from it)
- in addition, a small buffer is similarly pre-allocated in reserved
memory; this buffer provides storage for one fw_cfg DMA access object, plus one uint64_t data element to be transferred (written) by that appropriately formatted access object
- during S3 resume, the code iterates through the condensed
WRITE_POINTER commands; for each, the fw_cfg DMA request is formatted into the pre-allocated buffer, and the pointer value to write is also formatted into the appropriate part of that buffer. The idea is that we can replay the fw_cfg writes without any linker/loader script processing at this point, and without needing dynamic memory (there's no such thing during S3 resume).
Because the memory requirements are so strict for S3 resume -- i.e., you can only use (some) stack, and pre-reserved / static data -- it is quite possible that you cannot use the existing fw_cfg helper functions. You may have to factor out (or reimplement) very low level fw_cfg access.
About the "condensed" form, think something like:
- uint16_t dst_file_key
- uint32_t dst_file_offset
- uint64_t pointer_value
- uint8_t pointer_size
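In C, the condensed entry might look like this (a sketch; the struct name and comments are mine, not existing code):

    /* one WRITE_POINTER command, condensed for S3 replay (illustrative) */
    struct condensed_wp {
        u16 dst_file_key;     /* fw_cfg selector of dst_file */
        u32 dst_file_offset;  /* where the pointer lives inside dst_file */
        u64 pointer_value;    /* src blob base address + src offset */
        u8  pointer_size;     /* 1, 2, 4 or 8 bytes */
    };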
This is exactly what I’m saving, but I’m adding the entries to a linked list and calling the qemu_fw_cfg functions. Maybe I’m running up against resource constraints.
This contains all the information you need for two fw_cfg DMA operations: (1) a combined select + skip operation (based on the first two fields), and (2) a write operation (based on the last two fields). For each WRITE_POINTER command that you encounter during normal boot, you can "distill" such a structure (after the dst_file name has been resolved (--> dst_file_key), and the src blob's base address and the src offset from the command itself have been summed (--> pointer_value)).
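To make the two operations concrete, the replay could go roughly like this (a rough sketch under my assumptions, not existing SeaBIOS code: the function names and the local defines are made up, while the access struct layout, the trigger port and the control bits follow the QEMU fw_cfg DMA specification; PORT_QEMU_CFG_DMA_ADDR_LOW and barrier() are SeaBIOS's existing definitions):

    struct fwcfg_dma_access {
        u32 control;
        u32 length;
        u64 address;
    } PACKED;

    #define FWCFG_DMA_CTL_ERROR  0x01
    #define FWCFG_DMA_CTL_SKIP   0x04
    #define FWCFG_DMA_CTL_SELECT 0x08
    #define FWCFG_DMA_CTL_WRITE  0x10

    /* kick one DMA request and poll for completion */
    static void
    fwcfg_dma_kick(struct fwcfg_dma_access *access)
    {
        barrier();
        /* writing the low half of the big-endian address register
         * triggers the transfer; the access struct lives below 4G */
        outl(cpu_to_be32((u32)access), PORT_QEMU_CFG_DMA_ADDR_LOW);
        while (be32_to_cpu(access->control) & ~FWCFG_DMA_CTL_ERROR)
            ;  /* QEMU clears the request bits when done */
    }

    static void
    s3_replay_write_pointer(struct condensed_wp *wp)
    {
        /* static storage: no dynamic memory during S3 resume */
        static struct fwcfg_dma_access access;
        static u64 data;

        /* (1) combined select + skip: selector in bits 31..16 */
        access.control = cpu_to_be32((u32)wp->dst_file_key << 16
                                     | FWCFG_DMA_CTL_SELECT
                                     | FWCFG_DMA_CTL_SKIP);
        access.length = cpu_to_be32(wp->dst_file_offset);
        access.address = 0;
        fwcfg_dma_kick(&access);

        /* (2) write the (little-endian) pointer value at that offset */
        data = cpu_to_le64(wp->pointer_value);
        access.control = cpu_to_be32(FWCFG_DMA_CTL_WRITE);
        access.length = cpu_to_be32(wp->pointer_size);
        access.address = cpu_to_be64((u32)&data);
        fwcfg_dma_kick(&access);
    }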
Here's a recent example I recall for S3 memory management: commit 54e3a88609da ("smp: restore MSRs on S3 resume", 2016-07-07).
The "smp_msr" array (statically allocated with a fixed size, for 32 elements) is populated during normal boot with wrmsr_smp(). (The fact that it overlaps with "SMP" is a complication, but we can mostly ignore it for now; the point is the lifetime of the "smp_msr" array.) If client code tries to call wrmsr_smp() for saving more than 32 entries, during normal boot, the overflow is simply dropped (with warnings). Then, the array is "replayed" during S3 resume, on the following call chain -- and I'm ignoring SMP again --
... handle_resume32() -> s3_resume() -> smp_resume() -> smp_write_msrs()
Yes, I call my code from here.
I think a similar pattern should work here -- it should be okay to use a small, fixed-size, static array for stashing WRITE_POINTER commands in condensed form (static data in SeaBIOS will automatically end up in reserved memory, IIRC). And, because you are reprogramming devices, not CPU registers that need to be set on each CPU simultaneously, it should be okay to do the obvious thing -- just let the boot processor write the data.
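A sketch of that pattern, with illustrative names (warn_noalloc() is the existing SeaBIOS warning helper; the 32-entry limit just mirrors the "smp_msr" array):

    #define WP_STASH_MAX 32

    static struct condensed_wp wp_stash[WP_STASH_MAX];
    static int wp_count;

    /* during normal boot, for each WRITE_POINTER command */
    static void
    stash_write_pointer(u16 key, u32 offset, u64 value, u8 size)
    {
        if (wp_count >= WP_STASH_MAX) {
            warn_noalloc();  /* drop the overflow, as wrmsr_smp() does */
            return;
        }
        wp_stash[wp_count].dst_file_key = key;
        wp_stash[wp_count].dst_file_offset = offset;
        wp_stash[wp_count].pointer_value = value;
        wp_stash[wp_count].pointer_size = size;
        wp_count++;
    }

    /* called from s3_resume(), on the boot processor only */
    static void
    replay_write_pointers(void)
    {
        int i;
        for (i = 0; i < wp_count; i++)
            s3_replay_write_pointer(&wp_stash[i]);
    }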
Any advice you can give here would be great.
I’m posting the patch set without this change so that at least we have something, although I’d really like to get this S3 resume code in place too.
I agree, it would be a good thing.
Thanks, Laszlo
Thanks a lot. This has been very helpful.
—Ben