I know of no supported SPI chip which would require delays for any operation. The delays in our SPI code are only there to reduce load on the SPI bus. As such, they might be a good idea for shared flash, but I'm not too convinced of that either. For all external programmers, delays between commands are 100% pointless unless required by the flash chip (and I don't know of any such requirements).
Michael, I'd like to hear your thoughts on the impact of reading the status register in a tight loop instead of sleeping in between for systems which use flash translation (e.g. by EC).
There are two options how we can handle this: - Remove the delays completely. - Mark the delays as optional, e.g. programmer_odelay() and let it default to a nop.
This patch removes the delays.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: flashrom-spi_nodelay/spi25.c =================================================================== --- flashrom-spi_nodelay/spi25.c (Revision 1048) +++ flashrom-spi_nodelay/spi25.c (Arbeitskopie) @@ -455,11 +455,11 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 1-85 s, so wait in 1 s steps. + * This usually takes 1-85 s. */ /* FIXME: We assume spi_read_status_register will never fail. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(1000 * 1000); + /* do nothing */; if (check_erased_range(flash, 0, flash->total_size * 1024)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -500,11 +500,11 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 1-85 s, so wait in 1 s steps. + * This usually takes 1-85 s. */ /* FIXME: We assume spi_read_status_register will never fail. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(1000 * 1000); + /* do nothing */; if (check_erased_range(flash, 0, flash->total_size * 1024)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -545,10 +545,10 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 100-4000 ms, so wait in 100 ms steps. + * This usually takes 100-4000 ms. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(100 * 1000); + /* do nothing */; if (check_erased_range(flash, addr, blocklen)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -594,10 +594,10 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 100-4000 ms, so wait in 100 ms steps. + * This usually takes 100-4000 ms. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(100 * 1000); + /* do nothing */; if (check_erased_range(flash, addr, blocklen)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -641,10 +641,10 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 100-4000 ms, so wait in 100 ms steps. + * This usually takes 100-4000 ms. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(100 * 1000); + /* do nothing */; if (check_erased_range(flash, addr, blocklen)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -709,10 +709,10 @@ return result; } /* Wait until the Write-In-Progress bit is cleared. - * This usually takes 15-800 ms, so wait in 10 ms steps. + * This usually takes 15-800 ms. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(10 * 1000); + /* do nothing */; if (check_erased_range(flash, addr, blocklen)) { msg_cerr("ERASE FAILED!\n"); return -1; @@ -977,7 +977,7 @@ if (rc) break; while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(10); + /* do nothing */; } if (rc) break; @@ -1010,7 +1010,7 @@ if (result) return 1; while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(10); + /* do nothing */; }
return 0; @@ -1044,13 +1044,13 @@ return result; spi_send_command(6, 0, w, NULL); while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(5); /* SST25VF040B Tbp is max 10us */ + /* do nothing */; while (pos < size) { w[1] = buf[pos++]; w[2] = buf[pos++]; spi_send_command(3, 0, w, NULL); while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(5); /* SST25VF040B Tbp is max 10us */ + /* do nothing */; } spi_write_disable(); return 0; Index: flashrom-spi_nodelay/it87spi.c =================================================================== --- flashrom-spi_nodelay/it87spi.c (Revision 1048) +++ flashrom-spi_nodelay/it87spi.c (Arbeitskopie) @@ -304,7 +304,7 @@ * This usually takes 1-10 ms, so wait in 1 ms steps. */ while (spi_read_status_register() & JEDEC_RDSR_BIT_WIP) - programmer_delay(1000); + /* do nothing */; return 0; }
Index: flashrom-spi_nodelay/wbsio_spi.c =================================================================== --- flashrom-spi_nodelay/wbsio_spi.c (Revision 1048) +++ flashrom-spi_nodelay/wbsio_spi.c (Arbeitskopie) @@ -163,7 +163,6 @@
OUTB(writearr[0], wbsio_spibase); OUTB(mode, wbsio_spibase + 1); - programmer_delay(10);
if (!readcnt) return 0;
Am Mittwoch, den 16.06.2010, 14:13 +0200 schrieb Carl-Daniel Hailfinger:
Michael, I'd like to hear your thoughts on the impact of reading the status register in a tight loop instead of sleeping in between for systems which use flash translation (e.g. by EC).
In shared flash environments, a short sleep between polls of the status register might be very useful. I don't see a big timing impact of a 10us sleep on flashrom, but it will greatly reduce SPI load. Also continous polling is a host CPU hog.
So I would suggest the following: - 10us delay where a quick response is expected - usleep(1) where a slow response is expected (Should give up a time slice, don't really know how it behaves on current tickless linux, though.)
There are two options how we can handle this:
- Remove the delays completely.
- Mark the delays as optional, e.g. programmer_odelay() and let it
default to a nop.
Third option: Make the delays so short that they don't matter. I'm all for the third option.
I would suggest to factor out the wait-WIP code and introduce three delay levels: near-immediate (10us), short (usleep(1)) and long (something like 50ms). Then the chip definition can state what polling delay is recommended for chip erase, sector erase and write.
It might be an even better idea to state the expected duration of the erase/write instruction in the chip definition. This would enable flashrom to make a coarse guess about total time, and make choosing of a small polling interval possible.
The reason I suggest this definitely more convoluted approach than just killing the delays is that I dislike the CPU maxing out at 100% for several minutes when waiting for chip-erase complete on a slow SPI chip. Especially if this is a shared flash system, I have no idea whether you could get thermal issues by that. One would hope that the "idle" mode of the EC used during flashing puts the fan to maximum - but who can be sure?
For these reasons, I am not going to ack this patch as is.
Regards, Michael Karcher
On 16/06/2010 15:56, Michael Karcher wrote:
In shared flash environments, a short sleep between polls of the status register might be very useful. I don't see a big timing impact of a 10us sleep on flashrom, but it will greatly reduce SPI load.
No, I don't think so, at least not because of the shared environment. Generally (always?) in a shared environment you first have to tell the other device to completely stop accessing the flash. Once you start any operation other than read then you cannot have anything else access that flash device or else one or the other will get corrupted results eg ID instead of code fetch. So the best thing to do is to complete all operations as quickly as possible and then signal the shared accesses to resume.
Andrew
Am Mittwoch, den 16.06.2010, 21:41 +0100 schrieb Andrew Goodbody:
On 16/06/2010 15:56, Michael Karcher wrote:
In shared flash environments, a short sleep between polls of the status register might be very useful. I don't see a big timing impact of a 10us sleep on flashrom, but it will greatly reduce SPI load.
No, I don't think so, at least not because of the shared environment. Generally (always?) in a shared environment you first have to tell the other device to completely stop accessing the flash.
Hmm, you are right for raw SPI access. For the LPC->SPI translation in many ECs (which we were not talking about here), there is hardware arbitration that coordinates flash access from the LPC bust and flash access from the EC core.
Once you start any operation other than read then you cannot have anything else access that flash device or else one or the other will get corrupted results eg ID instead of code fetch.
For SPI this is not the case. A read ID command always returns the ID and a read data command always returns data. For SPI it would be important to not interrupt between WREN and the program instruction, but as you already stated at the beginning: During flashing, the EC system is shut down so no interfering action will happen.
So the best thing to do is to complete all operations as quickly as possible and then signal the shared accesses to resume.
But the point on excessive CPU load still stands. A programmer_delay(10) of cause will also cause excessive CPU load, as it is busy waiting. So we should have the cases - no delay - short delay (give up timeslices) - long delay (several 10s of ms)
Regards, Michae Karcher