On 10/11/2011 11:18 AM, Daniel P. Berrange wrote:
The rep/ins implementation is still slow; optimizing it can help.
What does 'perf top' say when running this workload?
To ensure it only recorded the LinuxBoot code, I created a 100 MB kernel image which takes approx 30 seconds to copy. Here is the perf output for approx 15 seconds of that copy:
1906.00 15.0% read_hpet [kernel]
Recent kernels are very clock intensive...
1029.00 8.1% x86_emulate_insn [kvm]
863.00 6.8% test_cc [kvm]
test_cc() is weird - it's not called on this path at all.
661.00 5.2% emulator_get_segment [kvm]
631.00 5.0% kvm_mmu_pte_write [kvm]
535.00 4.2% __linearize [kvm]
431.00 3.4% do_raw_spin_lock [kernel]
356.00 2.8% vmx_get_segment [kvm_intel]
330.00 2.6% vmx_segment_cache_test_set [kvm_intel]
308.00 2.4% segmented_write [kvm]
291.00 2.3% vread_hpet [kernel].vsyscall_fn
251.00 2.0% vmx_get_cpl [kvm_intel]
230.00 1.8% trace_kvm_mmu_audit [kvm]
207.00 1.6% kvm_write_guest [kvm]
199.00 1.6% emulator_write_emulated [kvm]
187.00 1.5% emulator_write_emulated_onepage [kvm]
185.00 1.5% kvm_write_guest_page [kvm]
177.00 1.4% vmx_get_segment_base [kvm_intel]
158.00 1.2% fw_cfg_io_readb qemu-system-x86_64
This is where something gets done.
148.00 1.2% register_address_increment [kvm]
142.00 1.1% emulator_write_phys [kvm]
And here too. So 97.7% overhead, which could be reduced by a factor of 4096 if the code is made more rep-aware.
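
For anyone checking the arithmetic: the 97.7% comes from treating fw_cfg_io_readb (1.2%) and emulator_write_phys (1.1%) as the only useful work in the profile. The sketch below redoes that sum and estimates the end-to-end gain. It rests on assumptions that are not measured here: that all of the overhead is a fixed cost per emulation pass, that it amortizes perfectly over a batch, and that a batch is 4096 bytes. The function names are made up for illustration.

```python
# Percentages of the two "useful work" entries, taken from the perf output above.
useful = {"fw_cfg_io_readb": 1.2, "emulator_write_phys": 1.1}


def overhead_pct(useful_pcts):
    """Share of the profile that is not useful work, in percent."""
    return 100.0 - sum(useful_pcts)


def amortized_speedup(useful_pct, batch):
    """Estimated speedup if the per-pass overhead is paid once per `batch`
    bytes instead of once per byte.

    Assumption: useful work still scales per byte, and the overhead
    shrinks by exactly a factor of `batch`.
    """
    overhead = 100.0 - useful_pct
    new_total = useful_pct + overhead / batch
    return 100.0 / new_total


useful_pct = sum(useful.values())
print(f"useful work: {useful_pct:.1f}%")
print(f"overhead:    {overhead_pct(useful.values()):.1f}%")
print(f"estimated speedup at 4096 bytes per pass: "
      f"{amortized_speedup(useful_pct, 4096):.1f}x")
```

Under those assumptions the overhead term nearly vanishes, so the total gain is capped by the useful work itself, which is far less than 4096x overall.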