A busy wait is a loop of some number of NOP instructions, as opposed to relying on a CPU peripheral such as a timer to signal elapsed time. The number of NOPs has to be calculated from the current CPU frequency.
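To make that concrete, here is a minimal sketch of a NOP-counting busy wait. The CPU frequency here is a placeholder assumption (a real implementation would measure or configure it), and the one-NOP-per-cycle assumption is optimistic on modern superscalar CPUs, which is part of why this approach is fragile:

```c
#include <stdint.h>

/* Assumed CPU frequency -- in real firmware this would have to be
 * measured (e.g. via calibration) or known from clock configuration. */
#define CPU_HZ 1000000000ULL /* pretend 1 GHz */

/* Spin for roughly `us` microseconds by executing NOPs.
 * Assumes one NOP retires per cycle, which is not guaranteed. */
static void busy_udelay(unsigned int us)
{
	uint64_t loops = (CPU_HZ / 1000000ULL) * us;

	while (loops--)
		__asm__ volatile ("nop");
}
```

If `CPU_HZ` is wrong, every delay is wrong by the same factor, which is the core weakness being discussed.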
That seems more complicated than it needs to be. Here is what I am thinking: JEDEC specifies the wait/delay between memory initialization steps, measured in memory clock cycles. Currently we just guess and round up to whole microseconds (I guess it is better to wait too long than not long enough). But even if we only wait half a microsecond or so longer than needed on each step, that's 3 or 4 microseconds longer than needed overall. See where I am going with this?
It really can't be less complicated, but a lot of the work is already done. Take a look at delay_tsc.c, which uses the TSC for delays (a little nicer than counting NOPs). It does the TSC calibration against the PIT (or even against port 0x80 writes) to get the CPU frequency. delay_tsc has udelay now, but could fairly easily grow an ndelay too. Precision in the tens of nanoseconds should be possible.
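An ndelay built on the TSC could look roughly like the sketch below. The TSC frequency value is an assumption standing in for the PIT calibration result that delay_tsc.c would provide; this is not the actual delay_tsc.c code, just the shape of the idea (convert nanoseconds to TSC ticks, then spin until that many ticks have elapsed):

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() on GCC/Clang */

/* Calibrated TSC frequency in kHz. In delay_tsc.c this would come
 * from the PIT (or port 0x80) calibration; 1 GHz is a placeholder. */
static uint64_t tsc_khz = 1000000ULL;

/* Spin until `ns` nanoseconds' worth of TSC ticks have elapsed. */
static void ndelay(unsigned int ns)
{
	uint64_t ticks = ((uint64_t)ns * tsc_khz) / 1000000ULL;
	uint64_t start = __rdtsc();

	while (__rdtsc() - start < ticks)
		;
}
```

Since the loop only exits after the tick budget is spent, the delay can only err long (by up to one loop iteration plus calibration error), which matches the wait-too-long-rather-than-too-short preference above.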
Peter's point is that it probably does not matter much. Even if you rounded up 5 individual delays to 1 us each, that is only 5 us total. You could reclaim a lot of it, but it may be more work than it is worth.