[Side note: This is not the big timing code rewrite I originally wanted to post. I decided to focus on readable code instead of clever code. My other rewrite is available on request if anyone is interested.]
The current delay loop calculation is still from revision 1 of flashrom, and since then it had a logic bug which caused all delays to be twice as long as intended. Fix the delay duration.
Protect against delay loop overflows.
Detect a non-working delay loop.
Change the delay loop itself to ensure clever compiler optimizers won't eliminate it (as happens with clang/llvm in the current code). Some people suggested machine-specific asm, but the empty asm statement with the loop counter as register/memory input has the benefit of being perfectly cross-platform and working in gcc and clang.
If time goes backwards (catastrophical NTP time difference, manual time change), timing measurements were shot because the new-old time subtraction yielded negative numbers which weren't handled correctly because the variable is unsigned. Simply return 1 microsecond timing in that case.
If time goes forward too fast, pick the biggest possible timing measurement with a guaranteed overflow avoidance for all timing calculations.
Check four times if the calculated timing is at most 10% too fast. This addresses OS scheduler interactions, e.g. being scheduled out during measurement which inflates measurements.
If the timing is off, recalculate the timer values up to four times before giving up.
Avoid division by zero in rare cases where timing measurements for a 250 ms delay return 0 us elapsed.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: flashrom-delayloop_portable_fix/udelay.c =================================================================== --- flashrom-delayloop_portable_fix/udelay.c (Revision 988) +++ flashrom-delayloop_portable_fix/udelay.c (Arbeitskopie) @@ -24,13 +24,16 @@ #include <limits.h> #include "flash.h"
-// count to a billion. Time it. If it's < 1 sec, count to 10B, etc. +/* loops per microsecond */ unsigned long micro = 1;
-void myusec_delay(int usecs) +__attribute__ ((noinline)) void myusec_delay(int usecs) { - volatile unsigned long i; - for (i = 0; i < usecs * micro; i++) ; + unsigned long i; + for (i = 0; i < usecs * micro; i++) { + /* Make sure the compiler doesn't optimize the loop away. */ + asm volatile ("" : : "rm" (i) ); + } }
unsigned long measure_delay(int usecs) @@ -43,30 +46,62 @@ gettimeofday(&end, 0); timeusec = 1000000 * (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec); + /* Protect against time going forward too much. */ + if ((end.tv_sec > start.tv_sec) && + ((end.tv_sec - start.tv_sec) >= LONG_MAX / 1000000 - 1)) + timeusec = LONG_MAX; + /* Protect against time going backwards during leap seconds. */ + if ((end.tv_sec < start.tv_sec) || (timeusec > LONG_MAX)) + timeusec = 1;
return timeusec; }
void myusec_calibrate_delay(void) { - int count = 1000; + unsigned long count = 1000; unsigned long timeusec; - int ok = 0; + int i, tries = 0;
printf("Calibrating delay loop... ");
- while (!ok) { +recalibrate: + while (1) { timeusec = measure_delay(count); + if (timeusec > 1000000 / 4) + break; + if (count >= ULONG_MAX / 2) { + msg_pinfo("timer loop overflow, reduced precision. "); + break; + } count *= 2; - if (timeusec < 1000000 / 4) - continue; - ok = 1; } + tries ++;
- // compute one microsecond. That will be count / time - micro = count / timeusec; - msg_pdbg("%ldM loops per second, ", micro); + /* Avoid division by zero, but in that case the loop is shot anyway. */ + if (!timeusec) + timeusec = 1; + + /* Compute rounded up number of loops per microsecond. */ + micro = (count * micro) / timeusec + 1; + msg_pdbg("%luM loops per second, ", micro);
+ /* Did we try to recalibrate less than 5 times? */ + if (tries < 5) { + /* Recheck our timing to make sure we weren't just hitting + * a scheduler delay or something similar. + */ + for (i = 0; i < 4; i++) { + if (measure_delay(100) < 90) { + msg_pdbg("delay more than 10% too short, " + "recalculating... "); + goto recalibrate; + } + } + } else { + msg_perr("delay loop is unreliable, trying to continue "); + } + /* We're interested in the actual precision. */ timeusec = measure_delay(10); msg_pdbg("10 myus = %ld us, ", timeusec);
On Mon, Mar 29, 2010 at 06:50:20AM +0200, Carl-Daniel Hailfinger wrote:
+__attribute__ ((noinline)) void myusec_delay(int usecs)
Is this supported on all supported compilers? If not, maybe create a #define for that and use this instead of the explicit __attribute__((noinline)).
ciao Joerg
On 29.03.2010 12:44, Joerg Mayer wrote:
On Mon, Mar 29, 2010 at 06:50:20AM +0200, Carl-Daniel Hailfinger wrote:
+__attribute__ ((noinline)) void myusec_delay(int usecs)
Is this supported on all supported compilers? If not, maybe create a #define for that and use this instead of the explicit __attribute__((noinline)).
Hm. It is compatible with clang/LLVM, gcc and IBM XL C/C++.
Apparently MSVC (which can be tested for with _MSC_VER) wants |__declspec(noinline)
|Then again, I don't think anyone ever compiled and ran flashrom with a non-gcc compiler. Heck, without my delay loop fix flashrom won't work on clang.
If you have other compilers available to test compilation or even execution of flashrom, please report your test results to the list. I'm willing to fix anything that comes up and doesn't require major surgery. It should be possible to compile flashrom on any platform without hardware support with the following: make distclean make CONFIG_INTERNAL=no CONFIG_SERPROG=no CONFIG_SATASII=no CONFIG_DRKAISER=no CONFIG_NIC3COM=no CONFIG_BUSPIRATESPI=no CONFIG_FT2232SPI=no CONFIG_DEDIPROG=no CONFIG_DUMMY=yes
Regards, Carl-Daniel
On Mon, 29 Mar 2010, Carl-Daniel Hailfinger wrote:
[Side note: This is not the big timing code rewrite I originally wanted to post. I decided to focus on readable code instead of clever code. My other rewrite is available on request if anyone is interested.]
The current delay loop calculation is still from revision 1 of flashrom, and since then it had a logic bug which caused all delays to be twice as long as intended. Fix the delay duration.
Protect against delay loop overflows.
Detect a non-working delay loop.
Change the delay loop itself to ensure clever compiler optimizers won't eliminate it (as happens with clang/llvm in the current code). Some people suggested machine-specific asm, but the empty asm statement with the loop counter as register/memory input has the benefit of being perfectly cross-platform and working in gcc and clang.
If time goes backwards (catastrophical NTP time difference, manual time change), timing measurements were shot because the new-old time subtraction yielded negative numbers which weren't handled correctly because the variable is unsigned. Simply return 1 microsecond timing in that case.
If time goes forward too fast, pick the biggest possible timing measurement with a guaranteed overflow avoidance for all timing calculations.
Maybe in case of leap seconds or time changing simply detect, and either fail and restart whole timing calculation, from scratch (possibly noting attempt number). If after some fixed number of retries you can't get reliable result. simply fail?
Check four times if the calculated timing is at most 10% too fast. This addresses OS scheduler interactions, e.g. being scheduled out during measurement which inflates measurements.
this makes sense but if we want that good results, maybe odd number of repeats, leave off minimum and maximum ones, then calculate over remaining (and minimum sane number of remaining atempts is imo 5, thus seven is needed to make sufficient number of iterations).
Anyway, imo code is good and readable and gives more acurate results on my machines.
Acked-by: Maciej Pijanka maciej.pijanka@gmail.com
babol:~/flashrom# ./flashrom -V 2>&1 |grep loop Calibrating delay loop... 167M loops per second, 10 myus = 12 us, 100 myus = 103 us, 1000 myus = 1003 us, 10000 myus = 10054 us, OK. babol:~/flashrom# ./flashrom -V 2>&1 |grep loop Calibrating delay loop... 164M loops per second, 10 myus = 12 us, 100 myus = 101 us, 1000 myus = 1031 us, 10000 myus = 9877 us, OK. babol:~/flashrom# ./flashrom -V -p nic3com 2>&1 |grep loop Calibrating delay loop... 167M loops per second, 10 myus = 12 us, 100 myus = 102 us, 1000 myus = 1034 us, 10000 myus = 10052 us, OK. babol:~/flashrom# ./flashrom -V -p nic3com 2>&1 |grep loop Calibrating delay loop... 167M loops per second, 10 myus = 11 us, 100 myus = 103 us, 1000 myus = 1035 us, 10000 myus = 10052 us, OK.
On 31.03.2010 18:58, Maciej Pijanka wrote:
On Mon, 29 Mar 2010, Carl-Daniel Hailfinger wrote:
The current delay loop calculation is still from revision 1 of flashrom, and since then it had a logic bug which caused all delays to be twice as long as intended. Fix the delay duration.
Protect against delay loop overflows.
Detect a non-working delay loop.
Change the delay loop itself to ensure clever compiler optimizers won't eliminate it (as happens with clang/llvm in the current code). Some people suggested machine-specific asm, but the empty asm statement with the loop counter as register/memory input has the benefit of being perfectly cross-platform and working in gcc and clang.
If time goes backwards (catastrophical NTP time difference, manual time change), timing measurements were shot because the new-old time subtraction yielded negative numbers which weren't handled correctly because the variable is unsigned. Simply return 1 microsecond timing in that case.
If time goes forward too fast, pick the biggest possible timing measurement with a guaranteed overflow avoidance for all timing calculations.
Maybe in case of leap seconds or time changing simply detect, and either fail and restart whole timing calculation, from scratch (possibly noting attempt number). If after some fixed number of retries you can't get reliable result. simply fail?
The code does exactly what you suggested (unless I'm mistaken). The changelog needs improvement to actually tell people about this.
Check four times if the calculated timing is at most 10% too fast. This addresses OS scheduler interactions, e.g. being scheduled out during measurement which inflates measurements.
this makes sense but if we want that good results, maybe odd number of repeats, leave off minimum and maximum ones, then calculate over remaining (and minimum sane number of remaining atempts is imo 5, thus seven is needed to make sufficient number of iterations).
Hmmm... could be done, but I'd rather tell the user that timing is garbage than to add an elaborate algorithm to convert garbage to gold. ;-)
Anyway, imo code is good and readable and gives more acurate results on my machines.
Acked-by: Maciej Pijanka maciej.pijanka@gmail.com
Thanks, committed in r990.
Regards, Carl-Daniel