[SeaBIOS] [PATCH v3 2/2] QEMU fw_cfg: Add command to write back address of file

Igor Mammedov imammedo at redhat.com
Fri Feb 17 14:23:47 CET 2017


On Fri, 17 Feb 2017 11:06:54 +0100
Laszlo Ersek <lersek at redhat.com> wrote:

> CC Kevin
> 
> On 02/17/17 07:10, Ben Warren wrote:
> > Hi Laszlo  
> >> On Feb 9, 2017, at 12:24 AM, Laszlo Ersek <lersek at redhat.com
> >> <mailto:lersek at redhat.com>> wrote:
> >>
> >> On 02/09/17 09:17, Laszlo Ersek wrote:  
> >>> Ben,
> >>>
> >>> On 02/05/17 18:09, ben at skyportsystems.com
> >>> <mailto:ben at skyportsystems.com> wrote:  
> >>>> From: Ben Warren <ben at skyportsystems.com
> >>>> <mailto:ben at skyportsystems.com>>
> >>>>
> >>>> This command is similar to ADD_POINTER, but instead of patching
> >>>> memory, it writes the pointer back to QEMU over the DMA interface.
> >>>>
> >>>> Signed-off-by: Ben Warren <ben at skyportsystems.com
> >>>> <mailto:ben at skyportsystems.com>>
> >>>> ---
> >>>> src/fw/romfile_loader.c | 37 +++++++++++++++++++++++++++++++++++++
> >>>> src/fw/romfile_loader.h | 16 ++++++++++------
> >>>> 2 files changed, 47 insertions(+), 6 deletions(-)  
> >>>
> >>> while working on the OVMF patches for WRITE_POINTER, something else
> >>> occurred to me.
> >>>
> >>> As I mentioned in the QEMU thread,
> >>>
> >>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg428495.html
> >>>
> >>> the VMGENID device (or more generally, each device that produces
> >>> WRITE_POINTER command(s)) should have a reset handler that makes the
> >>> device forget the GPA(s) that the firmware communicated to it, via
> >>> WRITE_POINTER. This is because after system reset, all earlier GPAs
> >>> become meaningless.
> >>>
> >>> What I want to mention here is the corollary to the above. ACPI S3
> >>> resume is basically a reset, but the GPAs remain valid, because system
> >>> memory is preserved. Therefore, at S3 resume, the guest firmware has to
> >>> replay all the WRITE_POINTER commands. The QEMU reset handler will
> >>> forcibly forget about the GPAs (which is correct), so the firmware has
> >>> to re-store them.  
> >>
> >> ... By that I don't necessarily mean to re-run the exact same
> >> linker-loader logic; it should be okay to save the *results* of those
> >> operations in a simpler array (that is, write exactly what values to
> >> what fw_cfg files at what offsets).
> >>
> >> And, you can detect whether this is needed at all, the
> >> "etc/system-states" fw_cfg file can be used to see if ACPI S3 is enabled
> >> or disabled in QEMU. (Please grep QEMU for "disable_s3" and
> >> "etc/system-states".)
> >>  
> > I’ve made some changes to SeaBIOS that I believe should do this, but
> > don’t really know how to test it.
> > Here’s what I’ve tried:
> > • Instrumented QEMU to print a message when a fw_cfg write comes from
> > the guest
> > • called QEMU with the additional arguments "-global
> > PIIX4_PM.disable_s3=0”.  Note that my machine type is pc-i440fx-2.8
> > • Booted a Linux guest.  Once logged in, typed “pm-suspend”
> > • Noted the following in the QEMU monitor:
> > • (QEMU) 
> > {u'timestamp': {u'seconds': 1487310498, u'microseconds': 971046},
> > u'event': u'SUSPEND’}
> > • Woke the guest up from QEMU monitor:
> > • (QEMU) system_wakeup
> >           {"return": {}}
> >           (QEMU) 
> >           {u'timestamp': {u'seconds': 1487310558, u'microseconds':
> > 135357}, u'event': u'WAKEUP’}  
> 
> Yes, this is exactly how it should be tested.
> 
> > 
> > But didn’t see the fw_cfg getting written.  I don’t understand the ACPI
> > states well enough to know if the above sequence is exercising S3, so
> > don’t really want to spend more effort without knowing I’m doing the
> > right thing.  
> 
> You are doing the right thing.
> 
> For additional confirmation, you can instrument SeaBIOS too, by adding
> dprintf()s to handle_resume() in "resume.c":
> 
> https://www.seabios.org/Execution_and_code_flow#Resume_and_reboot
> 
> Actually, if you see
> 
>     dprintf(1, "In resume (status=%d)\n", status);
> 
> in the log (from the QEMU debug port), then you can be sure you're resuming.
> 
> On resume, QEMU will set the CMOS_RESET_CODE register to 0xFE, so the
> SeaBIOS code will jump to handle_resume32(), and then to s3_resume().
> This last function is which is where I think the commands should be
> replayed (likely from a more condensed form).
> 
> Roughly, what the OVMF code does is the following (omitting all the
> PI/UEFI terms for now):
> 
> - during normal (first) boot, when processing the WRITE_POINTER
> commands, OVMF stashes them in a condensed form in a buffer that is
> allocated as "reserved memory" (so that the OS stays away from it)
> 
> - in addition, a small buffer is similarly pre-allocated in reserved
> memory; this buffer provides storage for one fw_cfg DMA access object,
> plus one uint64_t data element, to be transferred (written) by an
> appropriately formatted fw_cfg DMA access object
> 
> - during S3 resume, the code iterates through the condensed
> WRITE_POINTER commands; for each, the fw_cfg DMA request is formatted
> into the pre-allocated buffer, and the pointer value to write is also
> formatted into the appropriate part of that buffer. The idea is that we
> can replay the fw_cfg writes without any linker/loader script processing
> at this point, and without needing dynamic memory (there's no such thing
> during S3 resume).
> 
> Because the memory requirements are so strict for S3 resume -- i.e., you
> can only use (some) stack, and pre-reserved / static data -- it is quite
> possible that you cannot use the existing fw_cfg helper functions. You
> may have to factor out (or reimplement) very low level fw_cfg access.
> 
> About the "condensed" form, think something like:
> - uint16_t dst_file_key
> - uint32_t dst_file_offset
> - uint64_t pointer_value
> - uint8_t pointer_size
> 
> This contains all the information you need for two fw_cfg DMA
> operations: (1) a combined select + skip operation (based on the first
> two fields), and (2) a write operation (based on the last two fields).
> For each WRITE_POINTER command that you encounter during normal boot,
> you can "distill" such a structure (after the dst_file name has been
> resolved (--> dst_file_key), and the src blob's base address and the src
> offset from the command itself have been summed (--> pointer_value)).
Here is an example on how it's been handled for similar albeit simpler case
023b1d0d6a5 ("add helpers to read etc/boot-cpus at resume time")


> 
> Here's a recent example I recall for S3 memory management: commit
> 54e3a88609da ("smp: restore MSRs on S3 resume", 2016-07-07).
> 
> The "smp_msr" array (statically allocated with a fixed size, for 32
> elements) is populated during normal boot with wrmsr_smp(). (The fact
> that it overlaps with "SMP" is a complication, but we can mostly ignore
> it for now; the point is the lifetime of the "smp_msr" array.) If client
> code tries to call wrmsr_smp() for saving more than 32 entries, during
> normal boot, the overflow is simply dropped (with warnings). Then, the
> array is "replayed" during S3 resume, on the following call chain -- and
> I'm ignoring SMP again --
> 
>   ...
>     handle_resume32()
>       s3_resume()
>         smp_resume()
>           smp_write_msrs()
> 
> I think a similar pattern should work here -- it should be okay to use a
> small, fixed size, static array for stashing WRITE_POINTER commands in
> condensed form (static data in SeaBIOS will automatically end up
> reserved memory IIRC). And, because you are reprogramming devices, not
> CPU registers that need to be set on each CPU simultaneously, it should
> be okay to do the obvious thing -- just let the boot processor write the
> data.
> 
> > Any advice you can give here would be great.
> > 
> > I’m posting the patch set without this change so that at least we have
> > something, although I’d really like to get this S3 resume code in place too.  
> 
> I agree, it would be a good thing.
> 
> Thanks,
> Laszlo
> 
> _______________________________________________
> SeaBIOS mailing list
> SeaBIOS at seabios.org
> https://www.coreboot.org/mailman/listinfo/seabios



More information about the SeaBIOS mailing list