On Sun, Apr 28, 2013 at 10:29 PM, Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 27/04/13 07:06, Artyom Tarasenko wrote:
Is this a regression? If so, what was the last know good revision?
Doesn't look like a regression. Checked the OpenBIOS image packaged with QEMU (they didn't update it recently) and it fails too. Just haven't tested it before. Usually I boot from a CD/DVD image or load the kernel directly with the -kernel option.
Ha okay - I finally figured this one out with a reproducible test case like this:
- First drop the Linux filesystem cache:
echo 3 > /proc/sys/vm/drop_caches
- Start up QEMU - this now blows up in SILO:
./qemu-system-sparc64 -hda /home/build/src/qemu/image/sparc64/debian-wheezy.qcow2 -boot c -nographic -snapshot
- Load the debian-wheezy.qcow2 image into the filesystem cache
cat debian-wheezy.qcow2 > /dev/null
- Restart QEMU again and the image boots
./qemu-system-sparc64 -hda /home/build/src/qemu/image/sparc64/debian-wheezy.qcow2 -boot c -nographic -snapshot
Once I could reproduce this at will, I recompiled OpenBIOS with CONFIG_DEBUG_IDE enabled and got the following output:
Jumping to entry point 0000000000004000 for type 0000000000000005... switching to new context: entry point 0x4000 stack 0x00000000ffe86a39 SIDE - ob_ide_open: opening channel -1539032 unit 0 IDE DRIVE @ffe88428: unit: 0 present: 1 type: 1 media: 32 model: QEMU HARDDISK nr: 0 cyl: 8322 head: 16 sect: 63 bs: 512 IDE - ob_ide_block_size: ob_ide_block_size: block size 200 IDE - ob_ide_max_transfer: max_transfer 1fe00 IDE - ob_ide_read_blocks: ob_ide_read_blocks ffe86b20 block=0 n=1 IDE - ob_ide_read_sectors: ob_ide_read_sectors: block=0 sectors=1 IDE - ob_ide_read_blocks: ob_ide_read_blocks ffe86b20 block=0 n=1 IDE - ob_ide_read_sectors: ob_ide_read_sectors: block=0 sectors=1 IDE - ob_ide_read_blocks: ob_ide_read_blocks ffea9e98 block=2 n=2 IDE - ob_ide_read_sectors: ob_ide_read_sectors: block=2 sectors=2 IDE - ob_ide_read_blocks: ob_ide_read_blocks 41dc block=5248 n=1 IDE - ob_ide_read_sectors: ob_ide_read_sectors: block=5248 sectors=1 IDE - ob_ide_error: ob_ide_error drive<0>: timed out waiting for BUSY clear: IDE - ob_ide_error: cmd=20, stat=d0IDE - ob_ide_error: IDE - ob_ide_pio_data_in: bytes=512, stat=d0 IDE - ob_ide_read_blocks: ob_ide_read_blocks: error EXIT 0 >
Looking at drivers/ide.c, it was apparent that we were timing out waiting for the IDE event to finish:
/*
- wait for BUSY clear
*/ if (ob_ide_wait_stat(drive, 0, BUSY_STAT | ERR_STAT, &stat)) { ob_ide_error(drive, stat, "timed out waiting for BUSY clear"); cmd->stat = stat; break; }
And ob_ide_wait_stat() has a loop that looks like this:
for (i = 0; i < 5000; i++) { stat = ob_ide_pio_readb(drive, IDEREG_STATUS); if (!(stat & BUSY_STAT)) break;
udelay(1000);
}
This led me to wonder if there was a problem with udelay() returning too quickly and so I had a look at the implementation in arch/sparc64/openbios.c which looks like this:
void udelay(unsigned int usecs) { }
Ah. So the problem is that we are not waiting long enough for the IDE command to finish which is why we are getting the error. A quick patch to get things working again for me is:
--- a/openbios-devel/arch/sparc64/openbios.c +++ b/openbios-devel/arch/sparc64/openbios.c @@ -547,6 +547,9 @@ void setup_timers(void)
void udelay(unsigned int usecs) {
- volatile int i;
- for (i = 0; i < usecs * 100; i++);
}
Ha. Told ya, it's a race in OpenBIOS. :-) .
But we probably need a better solution for this that works for both SPARC32 and SPARC64. I've seen some code that waits for the RTC to tick 1s and measures the number of loop iterations within that time to find a basic timing constant which looks quite appealing as it is simple and will work across both SPARC32 and SPARC64 (SPARC32 doesn't have a %tick register).
AFAIK that's what OBP does on sun4m. Obviously worked good back then, but nowadays it's not so attractive: even bare metal CPUs may have frequency scaling. In the virtual world the number of instructions per millisecond is even more volatile. Also some day the TCG optimiser may just optimise the empty loops away. If we need a precise sleep, we need timer interrupts, or a paravirtual sleep device. Though in this particular case, why not just replace "5000" with some value large enough? Reading from a hardware register is a good delay itself.
Artyom
Blue - any thoughts on how to implement this?
-- Regards, Artyom Tarasenko
linux/sparc and solaris/sparc under qemu blog: http://tyom.blogspot.com/search/label/qemu