On Thu, Mar 15, 2007 at 07:37:07PM +0100, Stefan Reinauer wrote:
* Peter Stuge peter@stuge.se [070309 05:20]:
Hey,
On Tue, Mar 06, 2007 at 03:19:08PM -0500, Ward Vandewege wrote:
Stefan said you wrote a patch at the last symposium to make filo read several blocks at once from an ide device, and that that sped reading from IDE up by about 70%.
Do you happen to have that code lying around somewhere? I'd like to see it integrated in FILO.
Hmm.
I'm not sure that it ever happened.
I don't remember writing it.. And it's not in my filo source dir..
I remember us talking about it, but don't remember any code...
Ah, sorry, I thought you made something on the Symposium last year. I do remember you looked at the problem though. What was your suggestion? Reading multiple blocks at once or something?
It was someone else's suggestion (Ron maybe?) but I'm afraid I don't remember looking into it. Looking at the FILO code now it's not very familiar, but my memory may also have been drowned in malt beverage. :)
ide_read_sector_* only reads one sector at a time and could instead read up to 256. fs/blockdev.c:devread() always memcpy():s every sector as well. Lots of room for improvement here. :)
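For illustration only (not taken from any patch in this thread): a minimal sketch of how a request could be split into commands of up to 256 sectors. The helper names are hypothetical and are not FILO functions. The one hardware detail relied on is that the 8-bit Sector Count register encodes 256 sectors as 0 for READ SECTOR(S).

```c
#include <assert.h>
#include <stdint.h>

#define ATA_MAX_SECTORS_PER_CMD 256u  /* limit of the 8-bit Sector Count register */

/* Encode a sector count for the 8-bit Sector Count register.
 * For READ SECTOR(S), the value 0 means 256 sectors. */
static uint8_t ata_sector_count_reg(unsigned int nsectors)
{
    assert(nsectors >= 1 && nsectors <= ATA_MAX_SECTORS_PER_CMD);
    return (uint8_t)(nsectors & 0xffu);
}

/* Size of the next command when `remaining` sectors are still wanted. */
static unsigned int ata_next_chunk(unsigned int remaining)
{
    return remaining > ATA_MAX_SECTORS_PER_CMD ? ATA_MAX_SECTORS_PER_CMD
                                               : remaining;
}

/* Number of commands needed to read `total` consecutive sectors. */
static unsigned int ata_commands_needed(unsigned int total)
{
    return (total + ATA_MAX_SECTORS_PER_CMD - 1u) / ATA_MAX_SECTORS_PER_CMD;
}
```

With one sector per command, the number of commands equals the number of sectors; with this chunking it drops by up to a factor of 256 for fully consecutive data.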
I do have two patches for the EPIA LB code though, one for HARD_RESET and one for using rdtsc to calibrate the timer.
I should get them into the tree. Will try to do that next week.
please! whenever you find the time.
Lots of people still use FILO. (Even though I try to talk it down while trying to find someone to look into the Grub2 port)
Find attached a patch for timer2 and hard reset. I'm looking at FILO right now.
//Peter
On Thu, Mar 15, 2007 at 09:27:48PM +0100, Peter Stuge wrote:
Find attached a patch for timer2 and hard reset.
Ok?
I'm looking at FILO right now.
I've hacked FILO to read up to 256 blocks per ATA command now, but am repeatedly getting some bad data here and there already in the first 10kb at kern_addr.
I'll investigate more but maybe someone has an idea already?
The two hexdumps don't change after reset.
There's a note in ide.c:pio_data_in() : /* FIXME handle commands with multiple blocks */
..but looking at the ATA PDF the code looks OK as it is, plus I'm getting bad data already in the first sector. Dunno.
//Peter
* Peter Stuge stuge-linuxbios@cdy.org [070317 04:43]:
I'm looking at FILO right now.
I've hacked FILO to read up to 256 blocks per ATA command now, but am repeatedly getting some bad data here and there already in the first 10kb at kern_addr.
Can all devices read 256 blocks at once? Does the problem happen with 16 blocks, too?
The two hexdumps don't change after reset.
You mean, the error is always at the same position?
There's a note in ide.c:pio_data_in() : /* FIXME handle commands with multiple blocks */
..but looking at the ATA PDF the code looks OK as it is, plus I'm getting bad data already in the first sector. Dunno.
Do you ever get a timeout? Is the ndelay too short for multiple blocks?
try udelay(1) instead. The openbios ide driver reads IDEREG_ASTATUS 4 times before delaying. Also, it waits 1000 times for the drive to become ready, while the FILO driver only seems to try 1 time?
maybe it makes sense to port the openbios ide driver (ie clean out the device tree hooks)?
It knows how to do multiple blocks and it knows how to do ATAPI correctly. And cleaning this up might make it easily usable for GRUB2 as well..?
--- single_sector_read_kernel 2007-03-17 03:41:21.000000000 +0100
+++ multi_sector_read_kernel 2007-03-17 03:40:01.000000000 +0100
@@ -3,27 +3,27 @@
 00100020 32 00 00 00 02 00 00 00 45 4c 46 42 6f 6f 74 00 2.......ELFBoot.
 00100030 30 2e 35 20 28 73 74 75 67 65 40 63 61 72 65 70 0.5 (stuge@carep
 00100040 61 64 34 29 20 53 61 74 20 4d 61 72 20 31 37 20 ad4) Sat Mar 17
-00100050 30 33 3a 34 30 3a 30 38 20 43 45 54 20 32 30 30 03:40:08 CET 200
+00100050 30 33 3a 33 38 3a 34 38 20 43 45 54 20 32 30 30 03:38:48 CET 200
It's hard to say what is wrong and what is not, as it seems to load two different versions of filo. Can you try loading the same version of filo?
On Sun, Mar 18, 2007 at 06:25:30PM +0100, Stefan Reinauer wrote:
* Peter Stuge stuge-linuxbios@cdy.org [070317 04:43]:
I'm looking at FILO right now.
I've hacked FILO to read up to 256 blocks per ATA command now, but am repeatedly getting some bad data here and there already in the first 10kb at kern_addr.
Can all devices read 256 blocks at once?
It's the exact same command as reading a single sector.
Does the problem happen with 16 blocks, too?
Because of the fs, not all sectors are consecutive.
In my case it's first 7 sectors then 90. I compare 10kb, the first 10 fs (20 device) sectors and there are differences already in the first sector, even though the read code path is the same except for the sector_count parameter in pio_data_in(). :\
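To illustrate the point about non-consecutive sectors: a multi-sector read can only cover a run of sectors that are adjacent on the device, so the filesystem layer has to find those runs first. The sketch below is an assumption about how that could look, not FILO code; `consecutive_run` and the `blocks` array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the run of consecutive sector numbers starting at blocks[start].
 * `blocks` is a hypothetical list of the device sectors that make up a file,
 * in file order. Each run can be fetched with a single read command. */
static size_t consecutive_run(const uint32_t *blocks, size_t n, size_t start)
{
    size_t len = 1;

    assert(start < n);
    while (start + len < n &&
           (uint64_t)blocks[start + len] == (uint64_t)blocks[start] + len)
        len++;
    return len;
}
```

A layout like the one described above (7 sectors, then 90) would come out as two runs, so two commands instead of 97.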
The two hexdumps don't change after reset.
You mean, the error is always at the same position?
Right. I only tried a few resets, will try more.
There's a note in ide.c:pio_data_in() : /* FIXME handle commands with multiple blocks */
..but looking at the ATA PDF the code looks OK as it is, plus I'm getting bad data already in the first sector. Dunno.
Do you ever get a timeout? Is the ndelay too short for multiple blocks?
No timeouts or errors seen.
try udelay(1) instead. The openbios ide driver reads IDEREG_ASTATUS 4 times before delaying. Also, it waits 1000 times for the drive to become ready, while the FILO driver only seems to try 1 time?
I'm using CF so it should be ready immediately.
maybe it makes sense to port the openbios ide driver (ie clean out the device tree hooks)?
It knows how to do multiple blocks and it knows how to do ATAPI correctly. And cleaning this up might make it easily usable for GRUB2 as well..?
Sounds good. Is the code structure a lot different?
--- single_sector_read_kernel 2007-03-17 03:41:21.000000000 +0100
+++ multi_sector_read_kernel 2007-03-17 03:40:01.000000000 +0100
@@ -3,27 +3,27 @@
 00100020 32 00 00 00 02 00 00 00 45 4c 46 42 6f 6f 74 00 2.......ELFBoot.
 00100030 30 2e 35 20 28 73 74 75 67 65 40 63 61 72 65 70 0.5 (stuge@carep
 00100040 61 64 34 29 20 53 61 74 20 4d 61 72 20 31 37 20 ad4) Sat Mar 17
-00100050 30 33 3a 34 30 3a 30 38 20 43 45 54 20 32 30 30 03:40:08 CET 200
+00100050 30 33 3a 33 38 3a 34 38 20 43 45 54 20 32 30 30 03:38:48 CET 200
It's hard to say what is wrong and what is not, as it seems to load two different versions of filo. Can you try loading the same version of filo?
No, one version does what svn trunk does; read one sector at a time, the other tries to read multiple sectors.
I've restructured the code a bit and added some new multi-sector functions since single sector reads were assumed everywhere.
I realize it's difficult to suggest anything at this point.
//Peter
On Mon, Mar 19, 2007 at 02:10:25AM +0100, Peter Stuge wrote:
On Sun, Mar 18, 2007 at 06:25:30PM +0100, Stefan Reinauer wrote:
maybe it makes sense to port the openbios ide driver (ie clean out the device tree hooks)?
Sounds good. Is the code structure a lot different?
Turns out it wasn't very different. Basically I was just missing a proper loop for reading multiple blocks back from the controller.
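A rough sketch of what such a loop looks like, for readers following along. This is not the patch itself: the controller is replaced by a hypothetical in-memory `fake_ide` so the example can run anywhere, and `pio_data_in_sketch` is an invented name. The point is only that DRQ has to be checked again before every 512-byte block, not once per command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define STATUS_BSY  0x80
#define STATUS_DRQ  0x08

/* Hypothetical stand-in for the controller: a fake drive backed by memory. */
struct fake_ide {
    const uint8_t *media;   /* data the "drive" will return */
    size_t sectors_left;    /* sectors still pending for this command */
    size_t pos;             /* byte offset into media */
};

/* The fake drive asserts DRQ while it still has a sector to hand over. */
static uint8_t fake_status(const struct fake_ide *d)
{
    return d->sectors_left ? STATUS_DRQ : 0;
}

/* Stands in for reading 256 words from the data register. */
static void fake_read_sector(struct fake_ide *d, uint8_t *dst)
{
    memcpy(dst, d->media + d->pos, SECTOR_SIZE);
    d->pos += SECTOR_SIZE;
    d->sectors_left--;
}

/* Read `bytes` (a multiple of 512) one DRQ block at a time.
 * Returns 0 on success, -1 if the drive stops offering data early. */
static int pio_data_in_sketch(struct fake_ide *d, uint8_t *buffer, size_t bytes)
{
    assert(bytes % SECTOR_SIZE == 0);
    while (bytes) {
        uint8_t stat = fake_status(d);

        if ((stat & STATUS_BSY) || !(stat & STATUS_DRQ))
            return -1;
        fake_read_sector(d, buffer);
        buffer += SECTOR_SIZE;
        bytes -= SECTOR_SIZE;
    }
    return 0;
}
```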
Please test this patch. Note I have only implemented multiread support in fsys_ext2fs.c.
//Peter
Hi Peter,
On Tue, Mar 20, 2007 at 04:35:15AM +0100, Peter Stuge wrote:
On Mon, Mar 19, 2007 at 02:10:25AM +0100, Peter Stuge wrote:
On Sun, Mar 18, 2007 at 06:25:30PM +0100, Stefan Reinauer wrote:
maybe it makes sense to port the openbios ide driver (ie clean out the device tree hooks)?
Sounds good. Is the code structure a lot different?
Turns out it wasn't very different. Basically I was just missing a proper loop for reading multiple blocks back from the controller.
Please test this patch. Note I have only implemented multiread support in fsys_ext2fs.c.
Hmm, that doesn't seem to work. This is what's in my grub stanza to boot with an unmodified filo:
title Ubuntu LB, kernel 2.6.21-rc3
root (hd4,0)
kernel /boot/vmlinuz-2.6.21-rc3 root=/dev/sda1 ro apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq snd-hda-intel.enable_msi=1 console=tty0 console=ttyS0,115200
savedefault
boot
I've attached the boot log.
Thanks, Ward.
Sorry for the delays here..
On Tue, Mar 20, 2007 at 03:43:46PM +0100, Stefan Reinauer wrote:
* Peter Stuge stuge-linuxbios@cdy.org [070320 04:35]:
Please test this patch. Note I have only implemented multiread support in fsys_ext2fs.c.
I am getting "Can't read kernel" on a CF:
hda: LBA 128MB: TRANSCEND DOM128M
On Tue, Mar 20, 2007 at 10:09:39AM -0400, Ward Vandewege wrote:
Hmm, that doesn't seem to work. This is what's in my grub stanza to boot with an unmodified filo:
title Ubuntu LB, kernel 2.6.21-rc3
root (hd4,0)
kernel /boot/vmlinuz-2.6.21-rc3 root=/dev/sda1 ro apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq snd-hda-intel.enable_msi=1 console=tty0 console=ttyS0,115200
savedefault
boot
I've attached the boot log.
You're booting FILO from GRUB? Ok.
load_linux_kernel: offset=0x1e00 addr=0x100000 size=0x1d1473
Loading kernel...
Can't read kernel
Same error as Stefan; FILO did not get enough bytes back from the filesystem.
I've added debugging output with this patch, please test this patch with DEBUG_BLOCKDEV, DEBUG_IDE and DEBUG_EXT2 (or DEBUG_ALL) set so I can get more details.
Unfortunately my own EPIA-MII board doesn't run anymore (blown GSC capacitors) so I haven't been able to test it myself.
Otherwise no new code since last patch. Please enable this debugging output (DEBUG_ALL=1 is easiest) and send it to the list.
The debugging was added back in a hurry and since I can't test I may need to send yet another patch if there's something missing.
No sign-off this time.
//Peter
On Mon, Mar 26, 2007 at 03:42:08AM +0200, Peter Stuge wrote:
I've added debugging output with this patch, please test this patch with DEBUG_BLOCKDEV, DEBUG_IDE and DEBUG_EXT2 (or DEBUG_ALL) set so I can get more details.
Ping.
//Peter
On Sat, Mar 31, 2007 at 05:12:32AM +0200, Peter Stuge wrote:
On Mon, Mar 26, 2007 at 03:42:08AM +0200, Peter Stuge wrote:
I've added debugging output with this patch, please test this patch with DEBUG_BLOCKDEV, DEBUG_IDE and DEBUG_EXT2 (or DEBUG_ALL) set so I can get more details.
Ping.
See the attached boot log.
Thanks, Ward.
On Tue, Apr 03, 2007 at 05:43:30PM -0400, Ward Vandewege wrote:
please test this patch with DEBUG_BLOCKDEV, DEBUG_IDE and DEBUG_EXT2 (or DEBUG_ALL) set so I can get more details.
Ping.
See the attached boot log.
Thanks! Unfortunately DEBUG_BLOCKDEV and DEBUG_IDE don't seem to have been set in this build. Could you double-check?
And if you do a rebuild, please set only DEBUG_BLOCKDEV, _IDE and _EXT2 rather than _ALL to make the debug output more readable.
I've put a version up at http://stuge.se/filo.elf which should have the right debug flags but again not tested so I may still have mad badness.
//Peter
On Wed, Apr 04, 2007 at 12:51:46AM +0200, Peter Stuge wrote:
On Tue, Apr 03, 2007 at 05:43:30PM -0400, Ward Vandewege wrote:
please test this patch with DEBUG_BLOCKDEV, DEBUG_IDE and DEBUG_EXT2 (or DEBUG_ALL) set so I can get more details.
Ping.
See the attached boot log.
Thanks! Unfortunately DEBUG_BLOCKDEV and DEBUG_IDE don't seem to have been set in this build. Could you double-check?
I had DEBUG_ALL set, which I assumed included DEBUG_BLOCKDEV, DEBUG_IDE, and DEBUG_EXT2 (and from your initial text above, you seemed to assume that too). Is that not so?
And if you do a rebuild, please set only DEBUG_BLOCKDEV, _IDE and _EXT2 rather than _ALL to make the debug output more readable.
See attached output. I don't see much debug output at all now.
Thanks, Ward.
I've put a version up at http://stuge.se/filo.elf which should have the right debug flags but again not tested so I may still have mad badness.
//Peter
On Thu, Apr 05, 2007 at 10:43:36AM -0400, Ward Vandewege wrote:
On Wed, Apr 04, 2007 at 12:51:46AM +0200, Peter Stuge wrote:
Thanks! Unfortunately DEBUG_BLOCKDEV and DEBUG_IDE don't seem to have been set in this build. Could you double-check?
I had DEBUG_ALL set, which I assumed included DEBUG_BLOCKDEV, DEBUG_IDE, and DEBUG_EXT2 (and from your initial text above, you seemed to assume that too). Is that not so?
I did and it is.
And if you do a rebuild, please set only DEBUG_BLOCKDEV, _IDE and _EXT2 rather than _ALL to make the debug output more readable.
See attached output. I don't see much debug output at all now.
But it's enough, I found a bug that is biting.
Booting 'hde1:/boot/vmlinuz-2.6.21-rc3 root=/dev/sda1 ro apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq snd-hda-intel.enable_msi=1 console=tty0 console=ttyS0,115200'
devopen: already open
devopen: already open
Found Linux version 2.6.21-rc3 (root@neuromancer) #1 SMP PREEMPT Mon Mar 12 15:10:47 EDT 2007 bzImage.
Loading kernel... 10 consecutive blocks 43910301 - 43910310 len=1905267 ret=512
Can't read kernel
The kernel started 512 bytes into the 1024 bytes ext2fs block, but I didn't clear the offset after reading the first (partial) block and multiple sectors can't be read with non-zero offset.
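For anyone following along, the bug class looks roughly like this. The sketch below is not the ext2fs code from the patch; `read_from_blocks` and its flat `disk` array are hypothetical. The line that matters is the `offset = 0` after the first block: only the first block of a file range can be partial at its start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `len` bytes that start `offset` bytes into filesystem block
 * `first_block` and continue through the following consecutive blocks.
 * `disk` is a hypothetical flat image used instead of a real device.
 * Returns the number of bytes copied. */
static size_t read_from_blocks(const uint8_t *disk, size_t block_size,
                               size_t first_block, size_t offset,
                               uint8_t *dst, size_t len)
{
    size_t done = 0;
    size_t block = first_block;

    assert(offset < block_size);
    while (done < len) {
        size_t chunk = block_size - offset;

        if (chunk > len - done)
            chunk = len - done;
        memcpy(dst + done, disk + block * block_size + offset, chunk);
        done += chunk;
        block++;
        offset = 0;  /* only the first block is partial; forgetting this is the bug */
    }
    return done;
}
```

With 1024-byte blocks and a start offset of 512, the first iteration copies 512 bytes and every later one copies from the start of its block.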
New patch with bug fixed and more debugging output is attached.
//Peter
On Thu, Apr 05, 2007 at 07:18:49PM +0200, Peter Stuge wrote:
And if you do a rebuild, please set only DEBUG_BLOCKDEV, _IDE and _EXT2 rather than _ALL to make the debug output more readable.
See attached output. I don't see much debug output at all now.
But it's enough, I found a bug that is biting.
Booting 'hde1:/boot/vmlinuz-2.6.21-rc3 root=/dev/sda1 ro apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq snd-hda-intel.enable_msi=1 console=tty0 console=ttyS0,115200'
devopen: already open
devopen: already open
Found Linux version 2.6.21-rc3 (root@neuromancer) #1 SMP PREEMPT Mon Mar 12 15:10:47 EDT 2007 bzImage.
Loading kernel... 10 consecutive blocks 43910301 - 43910310 len=1905267 ret=512
Can't read kernel
The kernel started 512 bytes into the 1024 bytes ext2fs block, but I didn't clear the offset after reading the first (partial) block and multiple sectors can't be read with non-zero offset.
New patch with bug fixed and more debugging output is attached.
OK, output attached - it got further this time, but not far enough. Note that there were a few extra lines on the screen that were not present on the serial console:
---------------------------------------------------- . Decompressing Linux...
invalid compressed format (err=2)
-- System halted ----------------------------------------------------
Thanks, Ward.
On Thu, Apr 05, 2007 at 02:30:48PM -0400, Ward Vandewege wrote:
New patch with bug fixed and more debugging output is attached.
OK, output attached - it got further this time, but not far enough.
It's starting to look good.
Please apply the attached patch on top of your current code. This simply removes the check that tripped up the read process from the FILO IDE driver.
The OpenBIOS IDE driver doesn't have it but it was in the FILO IDE driver before and I wasn't sure if I should keep it but left it in for good measure. It seems to cause problems.
Note that there were a few extra lines on the screen that were not present on the serial console:
Incomplete error checking in my new code. I've added that to my tree now and will send a signed-off patch once you and Stefan test OK.
//Peter
On Thu, Apr 05, 2007 at 10:59:47PM +0200, Peter Stuge wrote:
On Thu, Apr 05, 2007 at 02:30:48PM -0400, Ward Vandewege wrote:
New patch with bug fixed and more debugging output is attached.
OK, output attached - it got further this time, but not far enough.
It's starting to look good.
Please apply the attached patch on top of your current code. This simply removes the check that tripped up the read process from the FILO IDE driver.
The OpenBIOS IDE driver doesn't have it but it was in the FILO IDE driver before and I wasn't sure if I should keep it but left it in for good measure. It seems to cause problems.
Hmm, that's not it yet, output attached. Looks *very* similar to last time. Again, a couple of extra lines (only on console, not on serial):
-------------------------------------------------------------------- . Decompressing Linux...
invalid compressed format (err=1)
-- System halted --------------------------------------------------------------------
Thanks, Ward.
On Thu, Apr 05, 2007 at 04:47:37PM -0400, Ward Vandewege wrote:
Hmm, that's not it yet, output attached. Looks *very* similar to last time.
It's identical. Did the change to drivers/ide.c really take?
Look around line 408 in pio_data_in() - there should just be
return 0;
after the while loop.
//Peter
On Thu, Apr 05, 2007 at 11:26:55PM +0200, Peter Stuge wrote:
On Thu, Apr 05, 2007 at 04:47:37PM -0400, Ward Vandewege wrote:
Hmm, that's not it yet, output attached. Looks *very* similar to last time.
It's identical. Did the change to drivers/ide.c really take?
Look around line 408 in pio_data_in() - there should just be
return 0;
after the while loop.
Yeah, verified:
		if (!(ctrl->stat & IDE_STATUS_DRQ)) {
			print_status(ctrl);
			return -1;
		}
		insw(IDE_REG_DATA(ctrl), buffer, count/2);
		buffer += count;
		bytes -= count;
		ndelay(400);
	} while(bytes);

	return 0;
}
I've built the image again to rule out any silly mistakes, and still get the same result.
Thanks, Ward.
On Thu, Apr 05, 2007 at 05:27:10PM -0400, Ward Vandewege wrote:
On Thu, Apr 05, 2007 at 11:26:55PM +0200, Peter Stuge wrote:
It's identical. Did the change to drivers/ide.c really take?
Yeah, verified:
Yep. Looks good.
I've built the image again to rule out any silly mistakes, and still get the same result.
Very strange.
Fresh LB buildtarget and make after rebuilding FILO still uses the old payload file.. How can that be?
Ok, I've added a little more debugging output and this patch also contains full error checking.
//Peter
* Peter Stuge stuge-linuxbios@cdy.org [070406 00:40]:
Fresh LB buildtarget and make after rebuilding FILO still uses the old payload file.. How can that be?
hm, broken dependency in the compression step?
On Fri, Apr 06, 2007 at 12:40:34AM +0200, Peter Stuge wrote:
On Thu, Apr 05, 2007 at 05:27:10PM -0400, Ward Vandewege wrote:
On Thu, Apr 05, 2007 at 11:26:55PM +0200, Peter Stuge wrote:
It's identical. Did the change to drivers/ide.c really take?
Yeah, verified:
Yep. Looks good.
I've built the image again to rule out any silly mistakes, and still get the same result.
Very strange.
Fresh LB buildtarget and make after rebuilding FILO still uses the old payload file.. How can that be?
I don't think that was the case - perhaps the patch just didn't fix the problem?
Ok, I've added a little more debugging output and this patch also contains full error checking.
Output attached; I assume you wanted me to patch this onto a fresh filo checkout. Seems like we regressed to the original problem. This time, though, all debug output *is* included with DEBUG_ALL (I forgot to disable it and enable the 3 specific debug settings).
Thanks, Ward.
Hi,
On Fri, Apr 06, 2007 at 10:51:00AM -0400, Ward Vandewege wrote:
Fresh LB buildtarget and make after rebuilding FILO still uses the old payload file.. How can that be?
I don't think that was the case - perhaps the patch just didn't fix the problem?
Of course you are right. Sorry about doubting that.
After having processed the read command, it seems the hard disk reports "not busy" before "there is data" but that's the wrong order according to the ATA-3 working draft that I'm using for reference. (From 1997 but it's the latest I've found available at no cost. T13 d2008r7b)
The print_status function read the status register again, and by then the DRQ ("data request") bit was set, causing my confusion.
I've added a wait for DRQ and the timeout should not be too long because the same code is used to detect that no disk is attached. I've used 50ms.
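A minimal sketch of a bounded wait for DRQ, for illustration only. It is not the code from the attached patch: `wait_for_drq`, the callback type, and the fake `late_drq` device are all hypothetical, and the delay is only counted here where a real driver would call its delay routine. The fake device mimics the behaviour described above, reporting BSY=0 DRQ=0 before DRQ finally shows up.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_BSY 0x80
#define STATUS_DRQ 0x08
#define STATUS_ERR 0x01

typedef uint8_t (*status_fn)(void *ctx);

/* Poll until BSY=0 and DRQ=1, or give up after roughly timeout_us.
 * Returns 0 when data is ready, -1 on error or timeout. */
static int wait_for_drq(status_fn read_status, void *ctx,
                        unsigned int timeout_us, unsigned int poll_us)
{
    unsigned int waited = 0;

    assert(poll_us > 0);
    for (;;) {
        uint8_t stat = read_status(ctx);

        if (!(stat & STATUS_BSY)) {     /* other bits are only valid if BSY=0 */
            if (stat & STATUS_ERR)
                return -1;
            if (stat & STATUS_DRQ)
                return 0;
        }
        if (waited >= timeout_us)
            return -1;
        /* A real driver would delay here, e.g. udelay(poll_us). */
        waited += poll_us;
    }
}

/* Fake device that reports BSY=0 DRQ=0 for the first `late_polls` reads. */
struct late_drq {
    unsigned int late_polls;
};

static uint8_t late_drq_status(void *ctx)
{
    struct late_drq *d = ctx;

    if (d->late_polls) {
        d->late_polls--;
        return 0;
    }
    return STATUS_DRQ;
}
```

Calling `wait_for_drq(late_drq_status, &dev, 50000, 1)` corresponds to the 50ms budget mentioned above; a device that never raises DRQ (or no device at all) falls out through the timeout path.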
The attached patch should apply over the last full patch.
The FIXME I've added is also from reviewing the PIO protocol in the same draft. It seems the real thing to wait for is not 50ms but rather BSY=0&RDY=1 after having written the device register. This could make a big difference also for ATAPI and filesystems doing single sector reads, but I don't want to change it until multi sector reads are working.
Ok, I've added a little more debugging output and this patch also contains full error checking.
Output attached; I assume you wanted me to patch this onto a fresh filo checkout.
Yes, thanks.
Seems like we regressed to the original problem.
It's still the same problem, but now the error is reported rather than an attempt made to start an incomplete kernel.
Another thing that needs to be investigated is why FILO reads 4096+1380 bytes from the disk _one byte at a time_ before the menu timeout, but that's also for later.
//Peter
A couple of comments on all of this:
After having processed the read command, it seems the hard disk reports "not busy" before "there is data" but that's the wrong order according to the ATA-3 working draft that I'm using for reference. (From 1997 but it's the latest I've found available at no cost. T13 d2008r7b)
All of the ATA drafts, up to and including ATA-8, are available for free download from t13.org, as they have been for several years. ATA-3 has actually been withdrawn. ATA-5 and ATA-6 are usually the best references, they are the most easily readable. ATA-7 and ATA-8 include SATA, which starts to convolute them.
IDE UDMA is always going to be faster than PIO. If your intent is only to speed up CF, then it doesn't really matter, UDMA and MDMA CFs are still pretty rare. If you want to make all filo loads faster, you may want to look into IDE DMA. It really isn't much harder to program than IDE PIO.
If you don't want to do that much work, you may want to at least try to use rep insd when you can. It is faster. All relatively modern IDE controllers support it just fine. (Some older controllers do not, however)
In any IDE driver, polling "Status" is usually bad form. You should poll "Alternate Status", until BSY goes away, wait for the other bits to get set properly, then read Status once to clear the interrupt. Polling Status will sometimes lead to spurious interrupts, which *should* be innocuous, but sometimes can have confusing effects.
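A small sketch of that polling pattern, under the assumption of a fake register file so it can run without hardware. `fake_regs`, `read_altstatus`, `read_status` and `wait_not_busy` are hypothetical names, not FILO or OpenBIOS functions. The counters make the pattern visible: Alternate Status is read many times, Status exactly once.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_BSY  0x80
#define STATUS_DRDY 0x40
#define STATUS_DRQ  0x08

/* Fake register file: stays busy for `busy_polls` reads, counts accesses. */
struct fake_regs {
    unsigned int busy_polls;
    unsigned int altstatus_reads;
    unsigned int status_reads;
};

/* Reading Alternate Status has no side effects on the device. */
static uint8_t read_altstatus(struct fake_regs *r)
{
    r->altstatus_reads++;
    if (r->busy_polls) {
        r->busy_polls--;
        return STATUS_BSY;
    }
    return STATUS_DRDY | STATUS_DRQ;
}

/* Reading Status also clears a pending interrupt on real hardware. */
static uint8_t read_status(struct fake_regs *r)
{
    r->status_reads++;
    return STATUS_DRDY | STATUS_DRQ;
}

/* Poll Alternate Status until BSY clears, then read Status once.
 * Returns 0 and stores the final status, or -1 after max_polls tries. */
static int wait_not_busy(struct fake_regs *r, unsigned int max_polls,
                         uint8_t *final_status)
{
    while (max_polls--) {
        if (!(read_altstatus(r) & STATUS_BSY)) {
            *final_status = read_status(r);
            return 0;
        }
    }
    return -1;
}
```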
Tom Sylla wrote:
IDE UDMA is always going to be faster than PIO. If your intent is only to speed up CF, then it doesn't really matter, UDMA and MDMA CFs are still pretty rare. If you want to make all filo loads faster, you may want to look into IDE DMA. It really isn't much harder to program than IDE PIO.
I've been following this thread...DMA might not be much harder to program, but it's not supported by some compact flash cards and/or the IDE adapters some of us use. FILO needs to have some sort of PIO mode capability ;)
-Corey
Corey Osgood wrote:
Tom Sylla wrote:
IDE UDMA is always going to be faster than PIO. If your intent is only to speed up CF, then it doesn't really matter, UDMA and MDMA CFs are still pretty rare. If you want to make all filo loads faster, you may want to look into IDE DMA. It really isn't much harder to program than IDE PIO.
I've been following this thread...DMA might not be much harder to program, but it's not supported by some compact flash cards and/or the IDE adapters some of us use. FILO needs to have some sort of PIO mode capability ;)
-Corey
Heh...disregard that, just read your message again. It's been a long day, thank god it's friday.
-Corey
On Fri, Apr 06, 2007 at 03:16:41PM -0400, Tom Sylla wrote:
A couple of comments on all of this:
Thanks for the comments!
After having processed the read command, it seems the hard disk reports "not busy" before "there is data" but that's the wrong order according to the ATA-3 working draft that I'm using for reference. (From 1997 but it's the latest I've found available at no cost. T13 d2008r7b)
All of the ATA drafts, up to and including ATA-8, are available for free download from t13.org, as they have been for several years. ATA-3 has actually been withdrawn. ATA-5 and ATA-6 are usually the best references, they are the most easily readable.
Thanks for the notice! I didn't find the drafts there last time but got them now.
ATA-7 and ATA-8 include SATA, which starts to convolute them.
Right. I've also seen some mentions of PATA/SATA translation and the effects, if any, it has on the PIO protocol.
IDE UDMA is always going to be faster than PIO. If your intent is only to speed up CF, then it doesn't really matter, UDMA and MDMA CFs are still pretty rare.
The intent is to read up to 256 consecutive sectors from ATA devices with one command rather than many commands reading one sector each.
This required changes in FILO and pio_data_in() is the central function that we're debugging now.
If you want to make all filo loads faster, you may want to look into IDE DMA.
Could be nice, but one step at a time. :)
If you don't want to do that much work, you may want to at least try to use rep insd when you can. It is faster. All relatively modern IDE controllers support it just fine. (Some older controllers do not, however)
Would insd be less compatible than insw? I don't want to compromise compatibility.
I see Hale Landis' ATA/ATAPI driver on ata-atapi.com also defaults to 16 bit transfers and he seems to have done pretty thorough research.
In any IDE driver, polling "Status" is usually bad form. You should poll "Alternate Status", until BSY goes away, wait for the other bits to get set properly, then read Status once to clear the interrupt. Polling Status will sometimes lead to spurious interrupts, which *should* be innocuous, but sometimes can have confusing effects.
FILO doesn't use interrupts at all, so there should not be any interrupt problems.
Anyway, thanks for making me go read ATA-6. It confirms what is in the old ATA-3 draft.
Apparently there can be a delay in devices so that BSY is cleared one PIO cycle before the other bits are valid, which makes no sense to me, and is not allowed by either draft.
Reading altstatus would see BSY clear, then reading status would of course still see BSY clear, but by then the other bits would also be valid. If this is deliberate device design my head may explode.
Ward's SATA drive/controller reports BSY=0 DRQ=0 at first and BSY=0 DRQ=1 one PIO cycle later.
--8<-- ATA-3 draft Page 27
The device shall not change the state of the DRQ bit unless the BSY bit is equal to one. When the last block of a PIO data in command has been transferred by the host, then the DRQ bit is cleared without the BSY bit being set.
-->8--
--8<-- ATA-6 draft Page 72
The BSY bit shall be cleared to zero by the device:
1) after setting DRQ to one to indicate the device is ready to transfer data;
-->8--
I think the 50ms timeout should work and is the best we can do.
//Peter
Peter Stuge wrote:
If you don't want to do that much work, you may want to at least try to use rep insd when you can. It is faster. All relatively modern IDE controllers support it just fine. (Some older controllers do not, however)
Would insd be less compatible than insw? I don't want to compromise compatibility.
Well, maybe, but no more than doing multiple sector reads without checking if the drive can do them :) You really should be checking word 47 of the IDENTIFY DEVICE data to know how many sectors per interrupt are transferred for READ/WRITE MULTIPLE. I didn't see that in your patch anywhere. In general, the various IDE speed-ups can only be done when the drive, the controller, and the chipset can do them. Sometimes, there is a defined way to detect if they are supported. The ID data is usually a good place to look first.
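As a hedged illustration of the word 47 check: the sketch below assumes the IDENTIFY DEVICE data has already been read into an array of 256 16-bit words, and that the low byte of word 47 holds the maximum sectors per interrupt for READ/WRITE MULTIPLE, with 0 treated as "not supported". That reading of the layout should be checked against the ATA-6 draft; `ata_id_max_multiple` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATA_ID_WORDS        256  /* IDENTIFY DEVICE returns 256 words */
#define ATA_ID_MAX_MULTSECT 47   /* word index discussed above */

/* Maximum sectors per interrupt for READ/WRITE MULTIPLE, taken from the
 * low byte of word 47. Returns 0 if the field is zero, which this sketch
 * treats as "not supported". */
static unsigned int ata_id_max_multiple(const uint16_t *id)
{
    assert(id != NULL);
    return id[ATA_ID_MAX_MULTSECT] & 0x00ffu;
}
```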
I see Hale Landis' ATA/ATAPI driver on ata-atapi.com also defaults to 16 bit transfers and he seems to have done pretty thorough research.
I have worked with Hale on several occasions, while debugging and validating a couple of IDE controllers. I know his software pretty well. He supports 8, 16, and 32 bit transfers. 32-bit is in ATAIOPIO.c line 402:
if ( pxw == 32 )
{
   // do REP INSD
   pio_rep_indword( addrDataReg, bufSeg, bufOff, wc / 2L );
}
We used ATACT and ATAMDT from Hale a lot, and always ran 32 bit mode. The default of 16 *is* probably to be safe, but you would have to have a pretty crusty IDE controller for it not to support 32-bit PIO.
You may want to try 32 bit mode, and see how much it speeds things up; if it isn't worth it, ignore me. If it is a reasonable gain, maybe make it a build option or whatever is equivalent for FILO. The "compatibility" depends on the southbridge, the FILO user could speed it up if they want.
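To make the trade-off concrete, here is a small sketch of the arithmetic plus what a build option could look like. `IDE_PIO_WIDTH_BITS` is a hypothetical name and not an existing FILO setting; the function only counts data-register accesses and does not touch hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build option: 16 (insw, the safe default) or 32 (insd). */
#ifndef IDE_PIO_WIDTH_BITS
#define IDE_PIO_WIDTH_BITS 16
#endif

/* Number of data-register reads needed to transfer `bytes` at a given
 * transfer width. */
static size_t pio_port_reads(size_t bytes, unsigned int width_bits)
{
    assert(width_bits == 16 || width_bits == 32);
    assert(bytes % (width_bits / 8) == 0);
    return bytes / (width_bits / 8);
}
```

One 512-byte sector takes 256 reads at 16 bits and 128 at 32 bits, which is where the speed-up comes from when the controller supports the wider access.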
Reading altstatus would see BSY clear, then reading status would of course still see BSY clear, but by then the other bits would also be valid. If this is deliberate device design my head may explode.
Ward's SATA drive/controller reports BSY=0 DRQ=0 at first and BSY=0 DRQ=1 one PIO cycle later.
You will find all sorts of inconsistencies and non-spec-compliant IDE devices brand new. Any one trying to write another IDE driver encounters these sorts of things. :) SATA drives and controllers are notorious for doing poor PIO emulation.
On Fri, Apr 06, 2007 at 06:07:58PM -0400, Tom Sylla wrote:
Peter Stuge wrote:
If you don't want to do that much work, you may want to at least try to use rep insd when you can. It is faster. All relatively modern IDE controllers support it just fine. (Some older controllers do not, however)
How old by the way? ISA old? VLB old? Early PCI old?
Would insd be less compatible than insw? I don't want to compromise compatibility.
Well, maybe, but no more than doing multiple sector reads without checking if the drive can do them :)
Hehe! :)
You really should be checking word 47 of the IDENTIFY DEVICE data to know how many sectors per interrupt are transferred for READ/WRITE MULTIPLE. I didn't see that in your patch anywhere.
Right, but I'm still using READ SECTOR(S), just with Sector Count > 1 so I don't think I need SET MULTIPLE MODE.
In general, the various IDE speed-ups can only be done when the drive, the controller, and the chipset can do them. Sometimes, there is a defined way to detect if they are supported. The ID data is usually a good place to look first.
Aye, but nothing seems relevant for READ SECTOR(S).
We used ATACT and ATAMDT from Hale a lot, and always ran 32 bit mode. The default of 16 *is* probably to be safe, but you would have to have a pretty crusty IDE controller for it not to support 32-bit PIO.
You may want to try 32 bit mode, and see how much it speeds things up; if it isn't worth it, ignore me. If it is a reasonable gain, maybe make it a build option or whatever is equivalent for FILO.
On a 1GHz CPU it reduced load time by 48%.
Excellent idea! Said and done. :)
Ward's SATA drive/controller reports BSY=0 DRQ=0 at first and BSY=0 DRQ=1 one PIO cycle later.
You will find all sorts of inconsistencies and non-spec-compliant IDE devices brand new. Any one trying to write another IDE driver encounters these sorts of things. :) SATA drives and controllers are notorious for doing poor PIO emulation.
Ack. :(
//Peter
Peter Stuge wrote:
On Fri, Apr 06, 2007 at 06:07:58PM -0400, Tom Sylla wrote:
Peter Stuge wrote:
If you don't want to do that much work, you may want to at least try to use rep insd when you can. It is faster. All relatively modern IDE controllers support it just fine. (Some older controllers do not, however)
How old by the way? ISA old? VLB old? Early PCI old?
A little later than that, but not much: around the time IDE controllers were being integrated. For example, PIIX4-based systems did 32-bit transfer modes, as does every ICH since.
You really should be checking word 47 of the IDENTIFY DEVICE data to know how many sectors per interrupt are transferred for READ/WRITE MULTIPLE. I didn't see that in your patch anywhere.
Right, but I'm still using READ SECTOR(S), just with Sector Count > 1 so I don't think I need SET MULTIPLE MODE.
Ah, yep. Well, then there is another optimization to make, support mult_count. :)
Hi Peter,
On Fri, Apr 06, 2007 at 08:54:59PM +0200, Peter Stuge wrote:
On Fri, Apr 06, 2007 at 10:51:00AM -0400, Ward Vandewege wrote:
Fresh LB buildtarget and make after rebuilding FILO still uses the old payload file.. How can that be?
I don't think that was the case - perhaps the patch just didn't fix the problem?
Of course you are right. Sorry about doubting that.
No problem at all. I really appreciate all your efforts.
So - it boots!! I've attached a boot log. I'm going to try to time things a bit now, disabling debug etc.
Thanks!! Ward.
* Ward Vandewege ward@gnu.org [070406 22:17]:
So - it boots!! I've attached a boot log. I'm going to try to time things a bit now, disabling debug etc.
Can you guys send a single complete patch when it's done?
On Fri, Apr 06, 2007 at 10:29:12PM +0200, Stefan Reinauer wrote:
So - it boots!! I've attached a boot log. I'm going to try to time things a bit now, disabling debug etc.
Can you guys send a single complete patch when it's done?
Yes, for sure. But I think Peter probably wants to speed it up some more :)
Thanks, Ward.
Hey,
On Fri, Apr 06, 2007 at 04:17:26PM -0400, Ward Vandewege wrote:
On Fri, Apr 06, 2007 at 08:54:59PM +0200, Peter Stuge wrote:
Of course you are right. Sorry about doubting that.
No problem at all. I really appreciate all your efforts.
I want to make this work well. :)
So - it boots!! I've attached a boot log.
Beautiful! :)
I'm going to try to time things a bit now, disabling debug etc.
On Fri, Apr 06, 2007 at 04:33:41PM -0400, Ward Vandewege wrote:
On Fri, Apr 06, 2007 at 10:29:12PM +0200, Stefan Reinauer wrote:
Can you guys send a single complete patch when it's done?
Yes, here's a complete patch with signoff.
Yes, for sure. But I think Peter probably wants to speed it up some more :)
Two things remain:
--8<-- minicom-20070406-file-speedup-patch2.log.gz
Mounted ext2fs
ext2fs_read_one: block 43935752 offset=0 len=1 ret=0
ext2fs_read_one: block 43935752 offset=1 len=1 ret=0
ext2fs_read_one: block 43935752 offset=2 len=1 ret=0
ext2fs_read_one: block 43935752 offset=3 len=1 ret=0
ext2fs_read_one: block 43935752 offset=4 len=1 ret=0
ext2fs_read_one: block 43935752 offset=5 len=1 ret=0
-->8--
This is GRUB code reading menu.lst one byte at a time because it is looking for a newline.
There's a simple but good enough sector cache in read_sector() that gets used here so this is not a problem.
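To show why a one-sector cache makes byte-at-a-time reads harmless, here is a minimal sketch. It is not the actual read_sector() code from FILO; `device_read_sector`, `cached_sector` and `read_byte` are hypothetical names, and the backing "device" is a stub that fills sector n with the byte value n.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Counts how often the (stub) device is actually accessed. */
unsigned device_reads;

/* Hypothetical backing device: sector n is filled with byte value n. */
void device_read_sector(unsigned long sector, uint8_t *buf)
{
    assert(buf != NULL);
    memset(buf, (int)(sector & 0xff), SECTOR_SIZE);
    device_reads++;
}

static uint8_t cache_buf[SECTOR_SIZE];
static unsigned long cache_sector;
static int cache_valid;

/* Return the cached copy of a sector, going to the device only on a miss. */
const uint8_t *cached_sector(unsigned long sector)
{
    if (!cache_valid || cache_sector != sector) {
        device_read_sector(sector, cache_buf);
        cache_sector = sector;
        cache_valid = 1;
    }
    return cache_buf;
}

/* Read a single byte at an absolute byte offset on the device. */
uint8_t read_byte(unsigned long long offset)
{
    const uint8_t *s = cached_sector((unsigned long)(offset / SECTOR_SIZE));
    return s[offset % SECTOR_SIZE];
}
```

Six consecutive one-byte reads within the same sector, like the ones in the log above, cost one device access here rather than six.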
The mdelay(50) that I was worried about is not executed for every IDE command, but only for commands going to a different device than the last command, so it is also a non-issue.
I think this patch is good to go in now but please feel free to test further.
Thanks to Ward for patient help with remote debugging! :)
//Peter
* Peter Stuge stuge-linuxbios@cdy.org [070407 05:34]:
This is GRUB code reading menu.lst one byte at a time because it is looking for a newline.
There's a simple but good enough sector cache in read_sector() that gets used here so this is not a problem.
Can we enhance this and use it for loading the kernel too ("readahead") so we don't have to patch all filesystems?
On Sat, Apr 07, 2007 at 11:09:46AM +0200, Stefan Reinauer wrote:
* Peter Stuge stuge-linuxbios@cdy.org [070407 05:34]:
There's a simple but good enough sector cache in read_sector() that gets used here so this is not a problem.
Can we enhance this and use it for loading the kernel too ("readahead") so we don't have to patch all filesystems?
Yes, but it would be difficult to find a good compromise between cache size and load time. Ward's kernel was in two chunks. My kernel is in 15 chunks.
Also, unless the filesystem caches needed metadata (journal) internally, every journal read will cause a cache miss and refill. :\
//Peter
* Peter Stuge stuge-linuxbios@cdy.org [070407 11:52]:
Yes, but it would be difficult to find a good compromise between cache size and load time. Ward's kernel was in two chunks. My kernel is in 15 chunks.
Who makes these chunks? Fragmented filesystem?
On Sat, Apr 14, 2007 at 07:27:23PM +0200, Stefan Reinauer wrote:
* Peter Stuge stuge-linuxbios@cdy.org [070407 11:52]:
Yes, but it would be difficult to find a good compromise between cache size and load time. Ward's kernel was in two chunks. My kernel is in 15 chunks.
Who makes these chunks? Fragmented filesystem?
Yes. 15 was on my flash ext2 that I've run rsync and scp to transfer files to. Also consider filesystems that need to read state somewhere else while reading one file. Perhaps reiser needs to look in the journal?
//Peter
* Peter Stuge stuge-linuxbios@cdy.org [070320 04:35]:
On Mon, Mar 19, 2007 at 02:10:25AM +0100, Peter Stuge wrote:
On Sun, Mar 18, 2007 at 06:25:30PM +0100, Stefan Reinauer wrote:
maybe it makes sense to port the openbios ide driver (ie clean out the device tree hooks)?
Sounds good. Is the code structure a lot different?
Turns out it wasn't very different. Basically I was just missing a proper loop for reading multiple blocks back from the controller.
Please test this patch. Note I have only implemented multiread support in fsys_ext2fs.c.
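A rough sketch of what such a multi-block read loop looks like, under stated assumptions: the command has already been issued, the Sector Count register encodes 256 sectors as 0, and the controller raises DRQ once per sector for READ SECTOR(S). The `ata_ops` hooks and the mock controller are hypothetical and are not taken from the patch or from the OpenBIOS driver.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_WORDS        256   /* 512 bytes as 16-bit words */
#define MAX_SECTORS_PER_CMD 256

/* Sector Count register value for READ SECTOR(S): 0 encodes 256 sectors. */
uint8_t ata_sector_count_reg(unsigned nsectors)
{
    assert(nsectors >= 1 && nsectors <= MAX_SECTORS_PER_CMD);
    return (uint8_t)(nsectors & 0xff);
}

/* Hypothetical controller hooks; a real driver would poll the status
 * register and read the data port here. */
struct ata_ops {
    int (*wait_drq)(void *ctx);        /* returns 0 once data is ready */
    uint16_t (*read_data)(void *ctx);  /* one 16-bit word from the data port */
    void *ctx;
};

/* Drain nsectors after the command was issued: wait for DRQ before every
 * sector, then pull 256 words. Returns the number of sectors read. */
unsigned ata_drain_sectors(const struct ata_ops *ops, uint16_t *dst,
                           unsigned nsectors)
{
    unsigned s, w;

    for (s = 0; s < nsectors; s++) {
        if (ops->wait_drq(ops->ctx) != 0)
            break;                      /* timeout or error: stop early */
        for (w = 0; w < SECTOR_WORDS; w++)
            *dst++ = ops->read_data(ops->ctx);
    }
    return s;
}

/* Mock controller for demonstration: counts DRQ waits and returns a
 * running word counter as data. */
struct mock_ctl {
    unsigned waits;
    uint16_t next;
};

int mock_wait_drq(void *ctx)
{
    ((struct mock_ctl *)ctx)->waits++;
    return 0;
}

uint16_t mock_read_data(void *ctx)
{
    return ((struct mock_ctl *)ctx)->next++;
}
```

The per-sector DRQ wait is the part that is easy to miss: reading all words back to back without it can return stale or bad data on some devices.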
I am getting "Can't read kernel" on a CF:
hda: LBA 128MB: TRANSCEND DOM128M
* Peter Stuge stuge-linuxbios@cdy.org [070315 21:27]:
please! whenever you find the time.
Lots of people still use FILO. (Even though I try to talk it down while trying to find someone to look into the Grub2 port)
Find attached a patch for timer2 and hard reset. I'm looking at FILO right now.
The below patch works fine on my system, but some older versions of the C3 lack support for the rdtsc instruction. (Nehemiah has it)
Whereas the Centaur/Wincore is said to not have rdtsc.
I would assume the patch is wrong for the epia and right for the epia-m.
Do you mind dropping the epia part?
Changes by Richard Smith and me from the LinuxBIOS symposium 2006.
Without CONFIG_TSC_X86RDTSC_CALIBRATE_WITH_TIMER2 1 million outb():s are used for timer calibration and that takes over one second. EPIA boards have the x86 timer2 so let's use it and make boot faster.
src/mainboard/via/epia*/reset.c is dead code so HARD_RESET should be 0. (entire file within #if 0)
Signed-off-by: Peter Stuge peter@stuge.se
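For readers unfamiliar with timer2 calibration, the arithmetic behind it can be sketched as follows. This is not the LinuxBIOS implementation, only the calculation: the PC interval timer runs at a fixed 1193182 Hz, so counting TSC cycles across a known number of timer ticks gives the TSC rate. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ 1193182ULL   /* input clock of the PC interval timer */

/* Number of timer ticks in a window of `ms` milliseconds (rounded). */
uint32_t pit_ticks_for_ms(uint32_t ms)
{
    return (uint32_t)((PIT_HZ * ms + 500) / 1000);
}

/* TSC cycles per microsecond, given the TSC delta observed while the
 * timer counted `pit_ticks` ticks.
 *   elapsed_us    = pit_ticks * 1000000 / PIT_HZ
 *   cycles_per_us = tsc_delta / elapsed_us
 */
uint32_t tsc_cycles_per_us(uint64_t tsc_delta, uint32_t pit_ticks)
{
    assert(pit_ticks != 0);
    return (uint32_t)((tsc_delta * PIT_HZ) /
                      ((uint64_t)pit_ticks * 1000000ULL));
}
```

A window of about 10 ms is 11932 timer ticks, which fits in the timer's 16-bit counter and is far shorter than the one-second-plus outb() loop described above.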
Index: src/mainboard/via/epia-m/Options.lb
--- src/mainboard/via/epia-m/Options.lb (revision 2570)
+++ src/mainboard/via/epia-m/Options.lb (working copy)
@@ -38,6 +38,7 @@
 uses MAXIMUM_CONSOLE_LOGLEVEL
 uses CONFIG_CONSOLE_SERIAL8250
 uses CONFIG_UDELAY_TSC
+uses CONFIG_TSC_X86RDTSC_CALIBRATE_WITH_TIMER2
 uses CONFIG_PCI_ROM_RUN
 uses CONFIG_CONSOLE_VGA
 uses CONFIG_MAX_PCI_BUSES
@@ -66,11 +67,12 @@
 ## Use TSC for udelay.
 ##
 default CONFIG_UDELAY_TSC=1
+default CONFIG_TSC_X86RDTSC_CALIBRATE_WITH_TIMER2=1

 ##
 ## Build code to reset the motherboard from linuxBIOS
 ##
-default HAVE_HARD_RESET=1
+default HAVE_HARD_RESET=0

 ##
 ## Build code to export a programmable irq routing table
Index: src/mainboard/via/epia/Options.lb
===================================================================
--- src/mainboard/via/epia/Options.lb (revision 2570)
+++ src/mainboard/via/epia/Options.lb (working copy)
@@ -10,7 +10,8 @@
 uses USE_FALLBACK_IMAGE
 uses HAVE_FALLBACK_BOOT
 uses HAVE_HARD_RESET
-uses CONFIG_UDELAY_IO
+uses CONFIG_UDELAY_TSC
+uses CONFIG_TSC_X86RDTSC_CALIBRATE_WITH_TIMER2
 uses HAVE_OPTION_TABLE
 uses USE_OPTION_TABLE
 uses CONFIG_ROM_PAYLOAD
@@ -81,12 +82,13 @@
 ##
 ## Build code to reset the motherboard from linuxBIOS
 ##
-default HAVE_HARD_RESET=1
+default HAVE_HARD_RESET=0

 ##
-## use io based udelay function
+## use TSC based udelay function
 ##
-default CONFIG_UDELAY_IO=1
+default CONFIG_UDELAY_TSC=1
+default CONFIG_TSC_X86RDTSC_CALIBRATE_WITH_TIMER2=1

 ##
 ## Build code to export a programmable irq routing table
-- linuxbios mailing list linuxbios@linuxbios.org http://www.openbios.org/mailman/listinfo/linuxbios
On Sun, Mar 18, 2007 at 06:36:44PM +0100, Stefan Reinauer wrote:
Find attached a patch for timer2 and hard reset. I'm looking at FILO right now.
The below patch works fine on my system, but some older versions of the C3 lack support for the rdtsc instruction. (Nehemiah has it)
Whereas the Centaur/Wincore is said to not have rdtsc.
Aha!
I would assume the patch is wrong for the epia and right for the epia-m.
Do you mind dropping the epia part?
Not at all, we should have working defaults, but I'll add a note to the wiki for people to enable it if they have Nehemiah.
Can CONFIG_UDELAY_TSC=1 in targets/via/epia/Config.lb also automatically set CONFIG_UDELAY_IO=0 ?
//Peter