[OpenBIOS] Grand Unified Tracing

Tue Aug 23 16:45:43 CEST 2011

On Tue, Aug 23, 2011 at 3:34 PM, Blue Swirl <blauwirbel at gmail.com> wrote:
> On Tue, Aug 23, 2011 at 5:35 AM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 9:59 PM, Blue Swirl <blauwirbel at gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 8:23 PM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 7:14 PM, Blue Swirl <blauwirbel at gmail.com> wrote:
>>>>> OpenBIOS uses traditional DPRINTFs for debugging. I was considering
>>>>> replacing those with tracepoints, which would output to serial device
>>>>> or whatever DPRINTFs are using currently. This would not be extremely
>>>>> useful by itself, except maybe that configuration for debugging would
>>>>> be concentrated to single 'trace-events' file, but this would not be a
>>>>> major improvement over current XML configuration.
>>>>>
>>>>> But developing this further, maybe OpenBIOS could also pass the
>>>>> tracepoint data back to QEMU? Then all tracepoint data would be
>>>>> synchronized, coherent and all gathered to single place.
>>>>>
>>>>> The implementation could be that fw_cfg would be used to pass
>>>>> simpletrace style data. An offset should be added to event IDs and
>>>>> data would then be output as usual. On OpenBIOS side, the
>>>>> implementation would be pretty similar to current QEMU tracepoints but
>>>>> in place of file output there would be fw_cfg output.
>>>>>
>>>>> Syntax for trace-events file should be augmented with include
>>>>> directive, so that QEMU knows also OpenBIOS tracepoints. I think the
>>>>> only change to simpletrace.py would be to parse this directive.
>>>>>
>>>>> Controlling OpenBIOS tracepoints from QEMU monitor would be cool too.
>>>>>
>>>>> Going even further, other targets like kernels could use something
>>>>> similar, probably not using fw_cfg though.
>>>>>
>>>>> What do you think?
>>>>
>>>> Dhaval showed me a demo of unified host/guest Linux tracing last week.
>>>>  He is doing something similar except using a hypercall to pass a
>>>> string to the host kernel.  In his case kvm.ko handles the hypercall
>>>> and qemu is not involved.
>>>
>>> So the events flow directly from guest kernel to host kernel?
>>
>> Yes.
>
> How is that used then, with dmesg?

When the host processes the trace hypercall it simply forwards the
trace event on to host Linux's trace.  It currently logs the string
from the guest and prefixes it with the guest program counter, I
believe.  You can dump out the trace using the usual Linux ftrace
debugfs files.

But the longer term goal, I believe, is to really integrate guest and
host tracing so that host userspace (e.g. perf(1)) can pull out trace
events from the guest and just sees them tagged as belonging to a
particular qemu-kvm process.

>
>>>> One issue with QEMU tracing is that trace call sites are static.  You
>>>> need to compile in a trace_*() call, which means that there are two
>>>> choices for how to tunnel OpenBIOS trace events:
>>>>
>>>> 1. Define a tunnel trace event:
>>>> openbios_event(uint64_t event_id, uint64_t arg1, uint64_t arg2, ...)
>>>>
>>>> QEMU only has one trace event to tunnel OpenBIOS trace events.  Then
>>>> the host is unable to pretty-print OpenBIOS traces automatically and
>>>> the max arguments becomes 6 - 1 (for the openbios_event tunnel event
>>>> id).
>>>>
>>>> 2. Generate a switch statement to demultiplex trace events:
>>>> void hypercall(uint64_t event_id, uint64_t arg1, ...)
>>>> {
>>>>    /* This is auto-generated by tracetool */
>>>>    switch (event_id) {
>>>>    case TRACE_EVENT_OPENBIOS_FOO:
>>>>        trace_openbios_foo(arg1, arg2, arg3);
>>>>        break;
>>>>    case TRACE_EVENT_OPENBIOS_BAR:
>>>>        trace_openbios_bar(arg1);
>>>>        break;
>>>>    ...
>>>>    }
>>>> }
>>>>
>>>> With this approach the user can toggle trace events at runtime and it
>>>> works out much nicer.
>>>
>>> Maybe I'm missing something, but why would we have to multiplex
>>> anything? Since the trace IDs are 64 bits, we can easily allocate a
>>> static range (starting from 0x8000000000000000) to non-native events
>>> so that QEMU IDs and OpenBIOS IDs do not overlap. QEMU would add this
>>> offset to IDs coming from fw_cfg. Then QEMU would just pass the data
>>> (8*64 bit words) to tracing back end. There obviously something needs
>>> to be changed so that OpenBIOS messages can be generated. For
>>> simpletrace this should be easy, for stderr and other back ends that
>>> may be more complicated.
>>
>> I meant that the interface generated by tracetool is a header file
>> with trace_*() functions.  If you want to support any tracing backend
>> then you need to use this interface and cannot call simpletrace.c
>> trace0(), trace1(), ... directly.
>>
>> Are you thinking of only supporting the simple trace backend?
>
> That would be simplest, then fw_cfg.c (or some intermediate entity)
> could just call trace6().
>
> But stderr would be useful too. I think that could be done by adding
> code to read trace-events file to grab the format string. Then a
> modified vfprintf would parse it but instead of va_arg(args, type) it
> would just grab the uint64_t arg. Your approach would be much simpler.

I see.  I think we should avoid introducing dependencies on specific
trace backends, so I would go for the generated switch statement
above.  That way you can even use DTrace/SystemTap or other backends.
Also if we called traceX() directly we'd have to duplicate the logic
to check whether the trace event is enabled/disabled, so it becomes a
bit messy.

Stefan