You could use an I/O delay; one outb takes roughly 1us, IIRC.
On LPC, yes -- or 0.5us or something like that. On ISA it's a lot faster, and on PCI too -- better do 20 or so outb's to be safe. Or use a *real* timer instead; you'll have to abstract this for portability anyway...
Segher
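
For what it's worth, here is a rough sketch of what "abstract this" could look like: callers only ever see delay_us(), and the backend is either a real timer or a run of dummy port writes. Everything named here (delay_us, timer_delay_us, io_delay_us, port_write, OUTB_PER_USEC) is made up for illustration, and the timer backend assumes a POSIX clock_gettime(), which early firmware code would not have. The factor of 20 writes per microsecond is just the "20 or so" margin from above, not a measured value.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical delay interface: callers use delay_us() and never
 * touch outb or a timer directly. */
typedef void (*delay_fn)(uint32_t usec);

/* Monotonic time in nanoseconds (assumes POSIX clock_gettime). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Timer backend: spin until at least usec microseconds have passed. */
void timer_delay_us(uint32_t usec)
{
    uint64_t deadline = now_ns() + (uint64_t)usec * 1000ull;
    while (now_ns() < deadline)
        ; /* busy-wait */
}

/* Assumed safety margin: dummy writes issued per requested microsecond,
 * since a single write may take far less than 1us on a fast bus. */
#define OUTB_PER_USEC 20

/* Stand-in for a real dummy port write (e.g. outb to port 0x80), which
 * needs I/O privileges. This stub only counts calls so the sketch runs
 * in user space. */
unsigned long dummy_writes;
static void counting_port_write(void) { dummy_writes++; }

/* Hook the platform code would point at its real port write. */
void (*port_write)(void) = counting_port_write;

/* I/O backend: issue OUTB_PER_USEC dummy writes per microsecond. */
void io_delay_us(uint32_t usec)
{
    uint64_t n = (uint64_t)usec * OUTB_PER_USEC;
    for (uint64_t i = 0; i < n; i++)
        port_write();
}

/* Selected backend; defaults to the real timer. */
delay_fn delay_us = timer_delay_us;
```

A driver would then just call delay_us(5) and a port that has no usable timer yet could set delay_us = io_delay_us at init. The timer backend guarantees a minimum delay; the I/O backend only guarantees a number of bus cycles, so how long it really takes still depends on the bus.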