[SeaBIOS] [PATCH v3 2/2] QEMU fw_cfg: Add command to write back address of file

Fri Feb 17 11:06:54 CET 2017

CC Kevin

On 02/17/17 07:10, Ben Warren wrote:
> Hi Laszlo
>> On Feb 9, 2017, at 12:24 AM, Laszlo Ersek <lersek at redhat.com
>> <mailto:lersek at redhat.com>> wrote:
>>
>> On 02/09/17 09:17, Laszlo Ersek wrote:
>>> Ben,
>>>
>>> On 02/05/17 18:09, ben at skyportsystems.com
>>> <mailto:ben at skyportsystems.com> wrote:
>>>> From: Ben Warren <ben at skyportsystems.com
>>>> <mailto:ben at skyportsystems.com>>
>>>>
>>>> This command is similar to ADD_POINTER, but instead of patching
>>>> memory, it writes the pointer back to QEMU over the DMA interface.
>>>>
>>>> Signed-off-by: Ben Warren <ben at skyportsystems.com
>>>> <mailto:ben at skyportsystems.com>>
>>>> ---
>>>> src/fw/romfile_loader.c | 37 +++++++++++++++++++++++++++++++++++++
>>>> src/fw/romfile_loader.h | 16 ++++++++++------
>>>> 2 files changed, 47 insertions(+), 6 deletions(-)
>>>
>>> while working on the OVMF patches for WRITE_POINTER, something else
>>> occurred to me.
>>>
>>> As I mentioned in the QEMU thread,
>>>
>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg428495.html
>>>
>>> the VMGENID device (or more generally, each device that produces
>>> WRITE_POINTER command(s)) should have a reset handler that makes the
>>> device forget the GPA(s) that the firmware communicated to it, via
>>> WRITE_POINTER. This is because after system reset, all earlier GPAs
>>> become meaningless.
>>>
>>> What I want to mention here is the corollary to the above. ACPI S3
>>> resume is basically a reset, but the GPAs remain valid, because system
>>> memory is preserved. Therefore, at S3 resume, the guest firmware has to
>>> replay all the WRITE_POINTER commands. The QEMU reset handler will
>>> forcibly forget about the GPAs (which is correct), so the firmware has
>>> to re-store them.
>>
>> ... By that I don't necessarily mean to re-run the exact same
>> linker-loader logic; it should be okay to save the *results* of those
>> operations in a simpler array (that is, write exactly what values to
>> what fw_cfg files at what offsets).
>>
>> And, you can detect whether this is needed at all, the
>> "etc/system-states" fw_cfg file can be used to see if ACPI S3 is enabled
>> or disabled in QEMU. (Please grep QEMU for "disable_s3" and
>> "etc/system-states".)
>>
> I’ve made some changes to SeaBIOS that I believe should do this, but
> don’t really know how to test it.
> Here’s what I’ve tried:
> • Instrumented QEMU to print a message when a fw_cfg write comes from
> the guest
> • called QEMU with the additional arguments "-global
> PIIX4_PM.disable_s3=0”.  Note that my machine type is pc-i440fx-2.8
> • Booted a Linux guest.  Once logged in, typed “pm-suspend”
> • Noted the following in the QEMU monitor:
> • (QEMU) 
> {u'timestamp': {u'seconds': 1487310498, u'microseconds': 971046},
> u'event': u'SUSPEND’}
> • Woke the guest up from QEMU monitor:
> • (QEMU) system_wakeup
>           {"return": {}}
>           (QEMU) 
>           {u'timestamp': {u'seconds': 1487310558, u'microseconds':
> 135357}, u'event': u'WAKEUP’}

Yes, this is exactly how it should be tested.

> 
> But didn’t see the fw_cfg getting written.  I don’t understand the ACPI
> states well enough to know if the above sequence is exercising S3, so
> don’t really want to spend more effort without knowing I’m doing the
> right thing.

You are doing the right thing.

For additional confirmation, you can instrument SeaBIOS too, by adding
dprintf()s to handle_resume() in "resume.c":

https://www.seabios.org/Execution_and_code_flow#Resume_and_reboot

Actually, if you see

    dprintf(1, "In resume (status=%d)\n", status);

in the log (from the QEMU debug port), then you can be sure you're resuming.

On resume, QEMU will set the CMOS_RESET_CODE register to 0xFE, so the
SeaBIOS code will jump to handle_resume32(), and then to s3_resume().
This last function is which is where I think the commands should be
replayed (likely from a more condensed form).

Roughly, what the OVMF code does is the following (omitting all the
PI/UEFI terms for now):

- during normal (first) boot, when processing the WRITE_POINTER
commands, OVMF stashes them in a condensed form in a buffer that is
allocated as "reserved memory" (so that the OS stays away from it)

- in addition, a small buffer is similarly pre-allocated in reserved
memory; this buffer provides storage for one fw_cfg DMA access object,
plus one uint64_t data element, to be transferred (written) by an
appropriately formatted fw_cfg DMA access object

- during S3 resume, the code iterates through the condensed
WRITE_POINTER commands; for each, the fw_cfg DMA request is formatted
into the pre-allocated buffer, and the pointer value to write is also
formatted into the appropriate part of that buffer. The idea is that we
can replay the fw_cfg writes without any linker/loader script processing
at this point, and without needing dynamic memory (there's no such thing
during S3 resume).

Because the memory requirements are so strict for S3 resume -- i.e., you
can only use (some) stack, and pre-reserved / static data -- it is quite
possible that you cannot use the existing fw_cfg helper functions. You
may have to factor out (or reimplement) very low level fw_cfg access.

About the "condensed" form, think something like:
- uint16_t dst_file_key
- uint32_t dst_file_offset
- uint64_t pointer_value
- uint8_t pointer_size

This contains all the information you need for two fw_cfg DMA
operations: (1) a combined select + skip operation (based on the first
two fields), and (2) a write operation (based on the last two fields).
For each WRITE_POINTER command that you encounter during normal boot,
you can "distill" such a structure (after the dst_file name has been
resolved (--> dst_file_key), and the src blob's base address and the src
offset from the command itself have been summed (--> pointer_value)).

Here's a recent example I recall for S3 memory management: commit
54e3a88609da ("smp: restore MSRs on S3 resume", 2016-07-07).

The "smp_msr" array (statically allocated with a fixed size, for 32
elements) is populated during normal boot with wrmsr_smp(). (The fact
that it overlaps with "SMP" is a complication, but we can mostly ignore
it for now; the point is the lifetime of the "smp_msr" array.) If client
code tries to call wrmsr_smp() for saving more than 32 entries, during
normal boot, the overflow is simply dropped (with warnings). Then, the
array is "replayed" during S3 resume, on the following call chain -- and
I'm ignoring SMP again --

  ...
    handle_resume32()
      s3_resume()
        smp_resume()
          smp_write_msrs()

I think a similar pattern should work here -- it should be okay to use a
small, fixed size, static array for stashing WRITE_POINTER commands in
condensed form (static data in SeaBIOS will automatically end up
reserved memory IIRC). And, because you are reprogramming devices, not
CPU registers that need to be set on each CPU simultaneously, it should
be okay to do the obvious thing -- just let the boot processor write the
data.

> Any advice you can give here would be great.
> 
> I’m posting the patch set without this change so that at least we have
> something, although I’d really like to get this S3 resume code in place too.

I agree, it would be a good thing.

Thanks,
Laszlo