I've been investigating where time disappears to when booting Linux guests.
Initially I enabled DEBUG_BIOS in QEMU's hw/pc.c, and then hacked it so that it could print a timestamp before each new line of debug output. The problem with that is that it slowed down startup, so the timings I was examining all changed.
What I really wanted was to use QEMU's trace infrastructure with a simple SystemTap script. This is easy enough in the QEMU layer, but I also need to see where time goes inside the various BIOS functions, and the option ROMs such as LinuxBoot. So I came up with a small hack to insert "probes" into SeaBIOS and LinuxBoot, which trigger a write to a special I/O port (0x404), which in turn causes QEMU to emit a trace event.
The implementation is really very crude and does not allow any arguments to be passed to each probe, but since all I care about is timing information, it is good enough for my needs.
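Roughly, the idea looks like this - the guest side of a probe is just a single port write, and the QEMU side is a tiny ioport handler whose only job is to fire a trace event. This is only a sketch of the approach (the macro, handler and trace event names here are made up, not the attached patches); the only information carried is the probe number in AL, which matches the "no arguments" limitation above:

/* Guest side (SeaBIOS / LinuxBoot): a "probe" is a byte written to the
 * magic port; the value in AL identifies which probe fired. */
#define PROBE_PORT 0x404

static inline void probe_hit(uint8_t id)
{
    asm volatile("outb %0, %1" : : "a"(id), "Nd"((uint16_t)PROBE_PORT));
}

/* QEMU side: an ioport write handler on 0x404 that does nothing except
 * emit a trace event (trace_guest_probe is a hypothetical event name). */
static void probe_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
    trace_guest_probe(val);
}

static void probe_init(void)
{
    register_ioport_write(PROBE_PORT, 1, 1, probe_ioport_write, NULL);
}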
I'm not really expecting these patches to be merged into QEMU/SeaBIOS since they're just a crude hack & I don't have time to write something better. I figure they might be useful for someone else though...
With the attached patches applied to QEMU and SeaBios, the attached systemtap script can be used to debug timings in QEMU startup.
For example, one execution of QEMU produced the following log:
$ stap qemu-timing.stp
0.000 Start
0.036 Run
0.038 BIOS post
0.180 BIOS int 19
0.181 BIOS boot OS
0.181 LinuxBoot copy kernel
1.371 LinuxBoot copy initrd
1.616 LinuxBoot boot OS
2.489 Shutdown request
2.490 Stop
showing that LinuxBoot is responsible for by far the most execution time (~1500ms) in a test that runs for ~2500ms in total.
Regards, Daniel
On 10/10/2011 12:08 PM, Daniel P. Berrange wrote:
I've been investigating where time disappears to when booting Linux guests.
Initially I enabled DEBUG_BIOS in QEMU's hw/pc.c, and then hacked it so that it could print a timestamp before each new line of debug output. The problem with that is that it slowed down startup, so the timings I was examining all changed.
What I really wanted was to use QEMU's trace infrastructure with a simple SystemTap script. This is easy enough in the QEMU layer, but I also need to see where time goes inside the various BIOS functions, and the option ROMs such as LinuxBoot. So I came up with a small hack to insert "probes" into SeaBIOS and LinuxBoot, which trigger a write to a special I/O port (0x404), which in turn causes QEMU to emit a trace event.
The implementation is really very crude and does not allow any arguments to be passed to each probe, but since all I care about is timing information, it is good enough for my needs.
I'm not really expecting these patches to be merged into QEMU/SeaBIOS since they're just a crude hack & I don't have time to write something better. I figure they might be useful for someone else though...
With the attached patches applied to QEMU and SeaBios, the attached systemtap script can be used to debug timings in QEMU startup.
For example, one execution of QEMU produced the following log:
$ stap qemu-timing.stp
0.000 Start
0.036 Run
0.038 BIOS post
0.180 BIOS int 19
0.181 BIOS boot OS
0.181 LinuxBoot copy kernel
1.371 LinuxBoot copy initrd
Yeah, there was a thread a bit ago about the performance of the interface used to read the kernel/initrd. I think it was using single-byte access instructions, and there were patches to use string accessors instead? I can't remember where that thread ended up.
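For reference, the difference being talked about, sketched very roughly in C (0x511 is the x86 fw_cfg data port; the helpers below are purely illustrative, not the actual linuxboot option ROM code):

#include <stdint.h>

#define FW_CFG_PORT_DATA 0x511   /* x86 fw_cfg data register */

static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    asm volatile("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static inline void insb(uint16_t port, void *buf, uint32_t count)
{
    asm volatile("rep insb" : "+D"(buf), "+c"(count) : "d"(port) : "memory");
}

/* One port instruction - and typically one VM exit - per byte copied. */
static void read_blob_bytewise(uint8_t *dst, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        dst[i] = inb(FW_CFG_PORT_DATA);
}

/* A single rep insb; how much this actually helps depends on how
 * rep-aware the instruction emulation on the host side is. */
static void read_blob_string(uint8_t *dst, uint32_t len)
{
    insb(FW_CFG_PORT_DATA, dst, len);
}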
CC'ing Gleb and Alex who may recall more.
Regards,
Anthony Liguori
1.616 LinuxBoot boot OS
2.489 Shutdown request
2.490 Stop
showing that LinuxBoot is responsible for by far the most execution time (~1500ms) in a test that runs for ~2500ms in total.
Regards, Daniel
On 10.10.2011, at 20:53, Anthony Liguori wrote:
On 10/10/2011 12:08 PM, Daniel P. Berrange wrote:
I've been investigating where time disappears to when booting Linux guests.
Initially I enabled DEBUG_BIOS in QEMU's hw/pc.c, and then hacked it so that it could print a timestamp before each new line of debug output. The problem with that is that it slowed down startup, so the timings I was examining all changed.
What I really wanted was to use QEMU's trace infrastructure with a simple SystemTap script. This is easy enough in the QEMU layer, but I also need to see where time goes inside the various BIOS functions, and the option ROMs such as LinuxBoot. So I came up with a small hack to insert "probes" into SeaBIOS and LinuxBoot, which trigger a write to a special I/O port (0x404), which in turn causes QEMU to emit a trace event.
The implementation is really very crude and does not allow any arguments to be passed to each probe, but since all I care about is timing information, it is good enough for my needs.
I'm not really expecting these patches to be merged into QEMU/SeaBIOS since they're just a crude hack & I don't have time to write something better. I figure they might be useful for someone else though...
With the attached patches applied to QEMU and SeaBios, the attached systemtap script can be used to debug timings in QEMU startup.
For example, one execution of QEMU produced the following log:
$ stap qemu-timing.stp
0.000 Start
0.036 Run
0.038 BIOS post
0.180 BIOS int 19
0.181 BIOS boot OS
0.181 LinuxBoot copy kernel
1.371 LinuxBoot copy initrd
Yeah, there was a thread a bit ago about the performance of the interface used to read the kernel/initrd. I think it was using single-byte access instructions, and there were patches to use string accessors instead? I can't remember where that thread ended up.
IIRC we're already using string accessors, but it's still slow. Richard had a nice patch cooked up to basically have the fw_cfg interface be able to DMA its data to the guest. I like the idea. Avi did not.
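The rough shape of that idea, purely as an illustration (this is not Richard's actual patch, and the field layout is made up): the guest fills in a small descriptor and hands its address to the device, which then copies the selected blob straight into guest RAM.

#include <stdint.h>

/* Hypothetical guest-visible DMA descriptor for fw_cfg. */
struct fw_cfg_dma_request {
    uint32_t control;   /* which fw_cfg entry to select, read/skip flags */
    uint32_t length;    /* number of bytes to transfer */
    uint64_t address;   /* guest-physical destination address */
};

The whole kernel/initrd copy then costs roughly one exit plus a host-side memcpy, instead of one exit per byte (or per emulated rep iteration).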
And yes, bad -kernel performance does hurt in some workloads. A lot.
Alex
On 10/10/2011 09:01 PM, Alexander Graf wrote:
For example, one execution of QEMU produced the following log:
$ stap qemu-timing.stp
0.000 Start
0.036 Run
0.038 BIOS post
0.180 BIOS int 19
0.181 BIOS boot OS
0.181 LinuxBoot copy kernel
1.371 LinuxBoot copy initrd
Yeah, there was a thread a bit ago about the performance of the interface used to read the kernel/initrd. I think it was using single-byte access instructions, and there were patches to use string accessors instead? I can't remember where that thread ended up.
IIRC we're already using string accessors, but it's still slow. Richard had a nice patch cooked up to basically have the fw_cfg interface be able to DMA its data to the guest. I like the idea. Avi did not.
And yes, bad -kernel performance does hurt in some workloads. A lot.
The rep/ins implementation is still slow, optimizing it can help.
What does 'perf top' say when running this workload?
On Tue, Oct 11, 2011 at 11:08:33AM +0200, Avi Kivity wrote:
On 10/10/2011 09:01 PM, Alexander Graf wrote:
For example, one execution of QEMU produced the following log:
$ stap qemu-timing.stp
0.000 Start
0.036 Run
0.038 BIOS post
0.180 BIOS int 19
0.181 BIOS boot OS
0.181 LinuxBoot copy kernel
1.371 LinuxBoot copy initrd
Yeah, there was a thread a bit ago about the performance of the interface used to read the kernel/initrd. I think it was using single-byte access instructions, and there were patches to use string accessors instead? I can't remember where that thread ended up.
IIRC we're already using string accessors, but it's still slow. Richard had a nice patch cooked up to basically have the fw_cfg interface be able to DMA its data to the guest. I like the idea. Avi did not.
And yes, bad -kernel performance does hurt in some workloads. A lot.
The rep/ins implementation is still slow, optimizing it can help.
What does 'perf top' say when running this workload?
To ensure it only recorded the LinuxBoot code, I created a 100 MB kernel image which takes approx 30 seconds to copy. Here is the perf output for approx 15 seconds of that copy:
1906.00 15.0% read_hpet                        [kernel]
1029.00  8.1% x86_emulate_insn                 [kvm]
 863.00  6.8% test_cc                          [kvm]
 661.00  5.2% emulator_get_segment             [kvm]
 631.00  5.0% kvm_mmu_pte_write                [kvm]
 535.00  4.2% __linearize                      [kvm]
 431.00  3.4% do_raw_spin_lock                 [kernel]
 356.00  2.8% vmx_get_segment                  [kvm_intel]
 330.00  2.6% vmx_segment_cache_test_set       [kvm_intel]
 308.00  2.4% segmented_write                  [kvm]
 291.00  2.3% vread_hpet                       [kernel].vsyscall_fn
 251.00  2.0% vmx_get_cpl                      [kvm_intel]
 230.00  1.8% trace_kvm_mmu_audit              [kvm]
 207.00  1.6% kvm_write_guest                  [kvm]
 199.00  1.6% emulator_write_emulated          [kvm]
 187.00  1.5% emulator_write_emulated_onepage  [kvm]
 185.00  1.5% kvm_write_guest_page             [kvm]
 177.00  1.4% vmx_get_segment_base             [kvm_intel]
 158.00  1.2% fw_cfg_io_readb                  qemu-system-x86_64
 148.00  1.2% register_address_increment       [kvm]
 142.00  1.1% emulator_write_phys              [kvm]
 134.00  1.1% acpi_os_read_port                [kernel]
Daniel
On 10/11/2011 11:18 AM, Daniel P. Berrange wrote:
The rep/ins implementation is still slow, optimizing it can help.
What does 'perf top' say when running this workload?
To ensure it only recorded the LinuxBoot code, I created a 100 MB kernel image which takes approx 30 seconds to copy. Here is the perf output for approx 15 seconds of that copy:
1906.00 15.0% read_hpet [kernel]
Recent kernels are very clock intensive...
1029.00  8.1% x86_emulate_insn                 [kvm]
 863.00  6.8% test_cc                          [kvm]
test_cc() is weird - not called on this path at all.
 661.00  5.2% emulator_get_segment             [kvm]
 631.00  5.0% kvm_mmu_pte_write                [kvm]
 535.00  4.2% __linearize                      [kvm]
 431.00  3.4% do_raw_spin_lock                 [kernel]
 356.00  2.8% vmx_get_segment                  [kvm_intel]
 330.00  2.6% vmx_segment_cache_test_set       [kvm_intel]
 308.00  2.4% segmented_write                  [kvm]
 291.00  2.3% vread_hpet                       [kernel].vsyscall_fn
 251.00  2.0% vmx_get_cpl                      [kvm_intel]
 230.00  1.8% trace_kvm_mmu_audit              [kvm]
 207.00  1.6% kvm_write_guest                  [kvm]
 199.00  1.6% emulator_write_emulated          [kvm]
 187.00  1.5% emulator_write_emulated_onepage  [kvm]
 185.00  1.5% kvm_write_guest_page             [kvm]
 177.00  1.4% vmx_get_segment_base             [kvm_intel]
 158.00  1.2% fw_cfg_io_readb                  qemu-system-x86_64
This is where something gets done.
 148.00  1.2% register_address_increment       [kvm]
 142.00  1.1% emulator_write_phys              [kvm]
And here too. So 97.7% overhead, which could be reduced by a factor of 4096 if the code is made more rep-aware.
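Conceptually the difference looks something like this (pseudo-C with made-up helpers, not the actual kvm x86 emulator code): today every byte pays the full per-iteration cost, whereas a rep-aware path would do the bookkeeping once per burst.

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical helpers standing in for the real emulator plumbing. */
extern uint8_t port_in(uint16_t port);                             /* one 1-byte device read  */
extern void port_in_string(uint16_t port, void *buf, size_t n);    /* one batched device read */
extern void guest_write(uint64_t gpa, const void *buf, size_t n);  /* write to guest memory   */

/* Today: segment lookups, linearization and a separate guest memory
 * write for every single byte of the rep ins. */
static void rep_insb_naive(uint16_t port, uint64_t dst, uint64_t *rcx)
{
    while (*rcx) {
        uint8_t b = port_in(port);
        guest_write(dst++, &b, 1);
        (*rcx)--;
    }
}

/* Rep-aware: do the bookkeeping once per burst (say a page at a time),
 * so the per-byte work collapses to the data copy itself. */
static void rep_insb_batched(uint16_t port, uint64_t dst, uint64_t *rcx)
{
    uint8_t buf[PAGE_SIZE];

    while (*rcx) {
        size_t n = *rcx < PAGE_SIZE ? (size_t)*rcx : (size_t)PAGE_SIZE;
        port_in_string(port, buf, n);
        guest_write(dst, buf, n);
        dst += n;
        *rcx -= n;
    }
}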
On Mon, Oct 10, 2011 at 06:08:03PM +0100, Daniel P. Berrange wrote:
I've been investigating where time disappears to when booting Linux guests.
Initially I enabled DEBUG_BIOS in QEMU's hw/pc.c, and then hacked it so that it could print a timestamp before each new line of debug output. The problem with that is that it slowed down startup, so the timings I was examining all changed.
A lot of effort went into optimizing SeaBIOS boot time. There is a tool in seabios git to help with benchmarking - tools/readserial.py. The tool was designed for use with serial ports on real machines using coreboot, but it works with qemu too:
mkfifo seabioslog
./tools/readserial.py -nf seabioslog
qemu-system-x86_64 -chardev pipe,id=seabios,path=seabioslog -device isa-debugcon,iobase=0x402,chardev=seabios -hda myimage
This will show the SeaBIOS debug output with timing info.
-Kevin