Fix out-of-bounds access if all erase functions fail.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: flashrom-fix_erasefunctions_nullpointer/flashrom.c =================================================================== --- flashrom-fix_erasefunctions_nullpointer/flashrom.c (Revision 1280) +++ flashrom-fix_erasefunctions_nullpointer/flashrom.c (Arbeitskopie) @@ -1514,8 +1514,12 @@ memcpy(curcontents, oldcontents, size);
for (k = 0; k< NUM_ERASEFUNCTIONS; k++) { + if (!usable_erasefunctions) { + msg_cdbg("No usable erase functions left.\n"); + break; + } msg_cdbg("Looking at blockwise erase function %i... ", k); - if (check_block_eraser(flash, k, 1)&& usable_erasefunctions) { + if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue; }
On Fri, 15 Apr 2011 07:31:29 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Index: flashrom-fix_erasefunctions_nullpointer/flashrom.c
--- flashrom-fix_erasefunctions_nullpointer/flashrom.c (Revision 1280) +++ flashrom-fix_erasefunctions_nullpointer/flashrom.c (Arbeitskopie) @@ -1514,8 +1514,12 @@ memcpy(curcontents, oldcontents, size);
for (k = 0; k< NUM_ERASEFUNCTIONS; k++) {
^ ^^ there is something broken with your old branch. spaces on the beginning of lines and after some binary operators. this does not apply to my git-svn head (or the r1280 equivalent) (could be a git-svn bug).
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions
left.\n");
break;
msg_cdbg("Looking at blockwise erase function %i...}
", k);
if (check_block_eraser(flash, k, 1)&&
^^^^ but it will probably fix the OOB segfault, if it applies. i dont understand the whole function though. why do we precheck the erase functions with an extra loop? just for nice logs and sparing us the memcpy? not justified imho: it just complicates things and that OOB failure is a typical symptom.
Am 15.04.2011 13:16 schrieb Stefan Tauner:
On Fri, 15 Apr 2011 07:31:29 +0200 Carl-Daniel Hailfingerc-d.hailfinger.devel.2006@gmx.net wrote:
Index: flashrom-fix_erasefunctions_nullpointer/flashrom.c
--- flashrom-fix_erasefunctions_nullpointer/flashrom.c (Revision 1280) +++ flashrom-fix_erasefunctions_nullpointer/flashrom.c (Arbeitskopie) @@ -1514,8 +1514,12 @@ memcpy(curcontents, oldcontents, size);
for (k = 0; k< NUM_ERASEFUNCTIONS; k++) {
^ ^^ there is something broken with your old branch. spaces on the beginning of lines and after some binary operators. this does not apply to my git-svn head (or the r1280 equivalent) (could be a git-svn bug).
This corruption is apparently a Seamonkey Mail regression present in Seamonkey 2.0.x. Please find the patch attached, hopefully without corruption.
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions
left.\n");
break;
} msg_cdbg("Looking at blockwise erase function %i...
", k);
if (check_block_eraser(flash, k, 1)&&
^^^^
but it will probably fix the OOB segfault, if it applies. i dont understand the whole function though. why do we precheck the erase functions with an extra loop? just for nice logs and sparing us the memcpy? not justified imho: it just complicates things and that OOB failure is a typical symptom.
We don't care about the memcpy (would be a micro-optimization and those are almost always a bad idea). However, we care about good debug and error messages.
Regards, Carl-Daniel
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
On Sat, 16 Apr 2011 00:13:50 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
We don't care about the memcpy (would be a micro-optimization and those are almost always a bad idea). However, we care about good debug and error messages.
what does the first loop get us in respect to what you said?
i would change the check_block_eraser method to return different values for "not suitable" and "not defined/no more methods" (there cant be holes in the eraser array, right?). i would then remove the first loop and just do checking and printing in the main loop. would you accept a patch for that or is there something i have missed?
Am 16.04.2011 13:37 schrieb Stefan Tauner:
On Sat, 16 Apr 2011 00:13:50 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
We don't care about the memcpy (would be a micro-optimization and those are almost always a bad idea). However, we care about good debug and error messages.
what does the first loop get us in respect to what you said?
You refactored that loop since then, so it's not a code readability issue anymore.
i would change the check_block_eraser method to return different values for "not suitable" and "not defined/no more methods" (there cant be holes in the eraser array, right?).
That would fix the bug and simplify the code at the cost of additional lines in the log for the following case: erase method 0 is suitable, but fails (no change) ... suitable but fails (no change) erase method n is not suitable (my version: not printed, your version: printed) ...not suitable (see above) erase method m is not defined (no change) ...not defined (no change) END OF ARRAY
A few LPC/FWH chips have a chip-erase function which is only available in parallel programmer mode (i.e. not in a mainboard), and those would get those additional lines.
i would then remove the first loop and just do checking and printing in the main loop. would you accept a patch for that or is there something i have missed?
I'm not opposed to it per se, but I'd like to know how you plan to address the additional lines printed in that approach.
Regards, Carl-Daniel
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 16.04.2011 13:37 schrieb Stefan Tauner:
i would change the check_block_eraser method to return different values for "not suitable" and "not defined/no more methods"
(there cant be holes in the eraser array, right?).
this is wrong. there can be holes. anyway, my current approach looks similar to yours.
That would fix the bug and simplify the code at the cost of additional lines in the log for the following case: erase method 0 is suitable, but fails (no change) ... suitable but fails (no change) erase method n is not suitable (my version: not printed, your version: printed) ...not suitable (see above) erase method m is not defined (no change) ...not defined (no change) END OF ARRAY
i am not 100% sure what you mean. i did not specify how/when i would print the state of erase functions... if the behavior of the attached patch has this problem please specify.
A few LPC/FWH chips have a chip-erase function which is only available in parallel programmer mode (i.e. not in a mainboard), and those would get those additional lines.
how and where are those en/disabled?
the first attached patch includes a fix similar to yours, but a little bit extended. some modification i have done to test it are in a second patch (not for merge of course). i have done testing with my spi-capable serprog programmer. is the dummy programmer able to simulate such things too? i didnt bother to look :)
my patch changes the output a bit... to the better imho. - it removes a line break after each call of walk_eraseregions. what's the purpose of this? it makes holes in the log in the case something in walk_eraseregions ends with \n (lots of things do). this does not fix all problems though... see -EV output below. - it informs the user about the (potentially long) read of the whole chip after an erase failed. this is very important for slow programmers imho. - it adds a message to check_block_eraser in the case of success. since each caller can decide if any output is printed in check_block_eraser this is ok and we can improve the user experience in verbose mode.
it also changes the long comment because it is no longer true. we may wanna make it true again by checking the other erasers too and continuing the loop after checking for walk_eraseregions > 0 afterwards, but with your/mine patch it is just not true anymore.
below is the output of erase runs with different verbosity levels (note that the last erase function succeeds, so there is no test for the actual oob problem... but that is obviously fixed anyway... and i have tested this too.):
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -E flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OK. serprog: Programmer name is "atmegaXXu2 " Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1454M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 994 us, 10000 myus = 10003 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: probe_spi_rdid_generic: id1 0xef, id2 0x4016 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Note: using internal_delay, because the attached programmer does not support the delay opcode. Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:ENote: using internal_delay, because the attached programmer does not support the delay opcode. ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EVV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1451M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 1009 us, 10000 myus = 10244 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: serprog_spi_send_command, writecnt=1, readcnt=3 RDID returned 0xef 0x40 0x16. probe_spi_rdid_generic: id1 0xef, id2 0x4016 serprog_spi_send_command, writecnt=1, readcnt=2 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. serprog_delay usecs=100000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:Eserprog_spi_send_command, writecnt=1, readcnt=0 serprog_spi_send_command, writecnt=4, readcnt=0 serprog_spi_send_command, writecnt=1, readcnt=2 serprog_delay usecs=10000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 serprog_spi_send_command, writecnt=4, readcnt=4096 ERASE FAILED!
Reading current chip content... serprog_spi_send_command, writecnt=4, readcnt=4194304<very long pause> done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done. serprog_shutdown
regarding our discussion yesterday...
a minor related bug is that erase_and_write_flash would report success if there is no single "ok" eraser (in the patched version). it would return the initial value of ret, which is 0. we should never reach this point, because we check for this in chip_safety_check, but it would not hurt if we initialize ret = 1 imho.
Am 16.07.2011 23:44 schrieb Stefan Tauner:
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 16.04.2011 13:37 schrieb Stefan Tauner:
i would change the check_block_eraser method to return different values for "not suitable" and "not defined/no more methods"
(there cant be holes in the eraser array, right?).
this is wrong. there can be holes. anyway, my current approach looks similar to yours.
Where did you find holes in that array? If a patch contains flash chip definitions with holes, it should be rejected.
That would fix the bug and simplify the code at the cost of additional lines in the log for the following case: erase method 0 is suitable, but fails (no change) ... suitable but fails (no change) erase method n is not suitable (my version: not printed, your version: printed) ...not suitable (see above) erase method m is not defined (no change) ...not defined (no change) END OF ARRAY
i am not 100% sure what you mean. i did not specify how/when i would print the state of erase functions... if the behavior of the attached patch has this problem please specify.
AFAICS your current patch does not implement the check_block_eraser change you talked about, so your current patch is OK.
A few LPC/FWH chips have a chip-erase function which is only available in parallel programmer mode (i.e. not in a mainboard), and those would get those additional lines.
how and where are those en/disabled?
That code is mostly unfinished. The bustype is special (A/A Mux) and mentioned for a few chips in flashchips.c, but we didn't add specific support yet.
the first attached patch includes a fix similar to yours, but a little bit extended. some modification i have done to test it are in a second patch (not for merge of course). i have done testing with my spi-capable serprog programmer. is the dummy programmer able to simulate such things too? i didnt bother to look :)
The dummy programmer can blacklist SPI commands, but that patch is unmerged because I didn't have time to finish the man page entry for that feature. I think Maciej Pijanka wrote some something for the man page, and I should merge that into my patch and commit it.
my patch changes the output a bit... to the better imho.
- it removes a line break after each call of walk_eraseregions. what's the purpose of this? it makes holes in the log in the case something in walk_eraseregions ends with \n (lots of things do). this does not fix all problems though... see -EV output below.
Odd. I think that additional linebreak appeared somtime in 2011, and was not present before.
- it informs the user about the (potentially long) read of the whole chip after an erase failed. this is very important for slow programmers imho.
OK.
- it adds a message to check_block_eraser in the case of success. since each caller can decide if any output is printed in check_block_eraser this is ok and we can improve the user experience in verbose mode.
it also changes the long comment because it is no longer true. we may wanna make it true again by checking the other erasers too and continuing the loop after checking for walk_eraseregions > 0 afterwards, but with your/mine patch it is just not true anymore.
below is the output of erase runs with different verbosity levels (note that the last erase function succeeds, so there is no test for the actual oob problem... but that is obviously fixed anyway... and i have tested this too.):
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -E flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OK. serprog: Programmer name is "atmegaXXu2 " Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
Sorry, I don't buy that argument. flashrom already emits too much information in non-verbose mode, and this is not going to improve the situation. If people freak out, there is a good chance they will freak out already during write, and won't wait for the scary "ERASE FAILED" error message.
You once said that you think flashrom floods the user with too much output. Did you change your mind?
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1454M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 994 us, 10000 myus = 10003 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: probe_spi_rdid_generic: id1 0xef, id2 0x4016 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Note: using internal_delay, because the attached programmer does not support the delay opcode. Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:ENote: using internal_delay, because the attached programmer does not support the delay opcode.
That "using internal delay" message should appear exactly once, not every time someone calls programmer_delay.
ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EVV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1451M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 1009 us, 10000 myus = 10244 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: serprog_spi_send_command, writecnt=1, readcnt=3 RDID returned 0xef 0x40 0x16. probe_spi_rdid_generic: id1 0xef, id2 0x4016 serprog_spi_send_command, writecnt=1, readcnt=2 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. serprog_delay usecs=100000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:Eserprog_spi_send_command, writecnt=1, readcnt=0 serprog_spi_send_command, writecnt=4, readcnt=0 serprog_spi_send_command, writecnt=1, readcnt=2 serprog_delay usecs=10000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 serprog_spi_send_command, writecnt=4, readcnt=4096 ERASE FAILED!
Reading current chip content... serprog_spi_send_command, writecnt=4, readcnt=4194304<very long pause> done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done. serprog_shutdown
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Sat, 16 Jul 2011 22:53:24 +0200 Subject: [PATCH 1/2] Fix out-of-bounds access if all erase functions fail
Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
flashrom.c | 29 +++++++++++++++++++---------- 1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/flashrom.c b/flashrom.c index 998a18f..25cd43e 100644 --- a/flashrom.c +++ b/flashrom.c @@ -1502,6 +1502,8 @@ static int check_block_eraser(const struct flashchip *flash, int k, int log) "eraseblock layout is not defined. "); return 1; }
- if (log)
msg_cdbg("looks ok. ");
Unneeded verbosity.
return 0; }
@@ -1522,34 +1524,41 @@ int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) {
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1) && usable_erasefunctions) {
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions left.\n");
break;
}
msg_cdbg("Looking at erase function %i... ", k);
} usable_erasefunctions--;if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue;
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
msg_cdbg("Trying... ");
Why did you change trying->Trying? Consistency?
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper,
/* If everything is OK, don't try another erase function. */ if (!ret) break; /* Write/erase failed, so try to find out what the current chipcurcontents, newcontents);
* contents are. If no usable erase functions remain, we could
* abort the loop instead of continuing, the effect is the same.
* The only difference is whether the reason for other unusable
* functions is printed or not. If in doubt, verbosity wins.
* contents are. If no usable erase functions remain, we can
*/ if (!usable_erasefunctions) continue;* skip this: the next iteration will break immediately anyway.
/* Reading the whole chip may take a while, inform the user even
* in non-verbose mode.
*/
msg_cinfo("Reading current chip content... ");
msg_cdbg("Reading current flash chip contents... ");
if (flash->read(flash, curcontents, 0, size)) { /* Now we are truly screwed. Read failed as well. */
msg_cerr("Can't read anymore!\n");
}msg_cerr("Can't read anymore! Aborting.\n"); /* We have no idea about the flash chip contents, so * retrying with another erase function is pointless. */ break;
msg_cinfo("done. Trying next erase function.\n");
msg_cdbg("Looking for another erase function.\n"); That would make the messages more consistent.
} /* Free the scratchpad. */ free(curcontents);
Regards, Carl-Daniel
On Sun, 17 Jul 2011 15:20:59 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 16.07.2011 23:44 schrieb Stefan Tauner:
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 16.04.2011 13:37 schrieb Stefan Tauner:
i would change the check_block_eraser method to return different values for "not suitable" and "not defined/no more methods"
(there cant be holes in the eraser array, right?).
this is wrong. there can be holes. anyway, my current approach looks similar to yours.
Where did you find holes in that array? If a patch contains flash chip definitions with holes, it should be rejected.
i did not find one in flashchips.c, but it is possible to introduce them via the probing function theoretically. should not happen, will probably never happen, would guard against it anyway if possible. :)
That would fix the bug and simplify the code at the cost of additional lines in the log for the following case: erase method 0 is suitable, but fails (no change) ... suitable but fails (no change) erase method n is not suitable (my version: not printed, your version: printed) ...not suitable (see above) erase method m is not defined (no change) ...not defined (no change) END OF ARRAY
i am not 100% sure what you mean. i did not specify how/when i would print the state of erase functions... if the behavior of the attached patch has this problem please specify.
AFAICS your current patch does not implement the check_block_eraser change you talked about, so your current patch is OK.
nope, ok.
A few LPC/FWH chips have a chip-erase function which is only available in parallel programmer mode (i.e. not in a mainboard), and those would get those additional lines.
how and where are those en/disabled?
That code is mostly unfinished. The bustype is special (A/A Mux) and mentioned for a few chips in flashchips.c, but we didn't add specific support yet.
i have seen the comments... ok.
the first attached patch includes a fix similar to yours, but a little bit extended. some modification i have done to test it are in a second patch (not for merge of course). i have done testing with my spi-capable serprog programmer. is the dummy programmer able to simulate such things too? i didnt bother to look :)
The dummy programmer can blacklist SPI commands, but that patch is unmerged because I didn't have time to finish the man page entry for that feature. I think Maciej Pijanka wrote some something for the man page, and I should merge that into my patch and commit it.
:)
my patch changes the output a bit... to the better imho.
- it removes a line break after each call of walk_eraseregions. what's the purpose of this? it makes holes in the log in the case something in walk_eraseregions ends with \n (lots of things do). this does not fix all problems though... see -EV output below.
Odd. I think that additional linebreak appeared somtime in 2011, and was not present before.
- it informs the user about the (potentially long) read of the whole chip after an erase failed. this is very important for slow programmers imho.
OK.
- it adds a message to check_block_eraser in the case of success. since each caller can decide if any output is printed in check_block_eraser this is ok and we can improve the user experience in verbose mode.
it also changes the long comment because it is no longer true. we may wanna make it true again by checking the other erasers too and continuing the loop after checking for walk_eraseregions > 0 afterwards, but with your/mine patch it is just not true anymore.
below is the output of erase runs with different verbosity levels (note that the last erase function succeeds, so there is no test for the actual oob problem... but that is obviously fixed anyway... and i have tested this too.):
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -E flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OK. serprog: Programmer name is "atmegaXXu2 " Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
Sorry, I don't buy that argument. flashrom already emits too much information in non-verbose mode, and this is not going to improve the situation.
5 lines + header for an erase cycle where one erase function fails is too much?
If people freak out, there is a good chance they will freak out already during write, and won't wait for the scary "ERASE FAILED" error message.
during write there the last characters are "..." not "FAILED". i hope we can agree that this is a major difference in the perspective of a user :) at least for me "..." indicates "it does something which might take a while, dont worry".
You once said that you think flashrom floods the user with too much output. Did you change your mind?
that was *only* about ichspi in verbose mode. for non-verbose mode i think it could inform the user more about what it does in general (alternative would be to have at least a progress bar.. i know wip).
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1454M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 994 us, 10000 myus = 10003 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: probe_spi_rdid_generic: id1 0xef, id2 0x4016 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Note: using internal_delay, because the attached programmer does not support the delay opcode. Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:ENote: using internal_delay, because the attached programmer does not support the delay opcode.
That "using internal delay" message should appear exactly once, not every time someone calls programmer_delay.
that one is from the serprog spi patch, so lets postpone this discussion. i agree with you though.
ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EVV flashrom v0.9.3-r1372 on Linux 2.6.35-30-generic (x86_64), built with libpci 3.1.7, GCC 4.4.5, little endian flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OS timer resolution is 1 usecs, 1451M loops per second, 10 myus = 10 us, 100 myus = 99 us, 1000 myus = 1009 us, 10000 myus = 10244 us, 4 myus = 4 us, OK. Initializing serprog programmer serprog: connected - attempting to synchronize . serprog: Synchronized serprog: Interface version ok. serprog: Bus support: parallel=off, LPC=off, FWH=off, SPI=on serprog: Maximum write-n length is 16777215 serprog: Maximum read-n length is 16777215 serprog: Programmer name is "atmegaXXu2 " serprog: Serial buffer size is 65535 Probing for Winbond W25Q32, 4096 kB: serprog_spi_send_command, writecnt=1, readcnt=3 RDID returned 0xef 0x40 0x16. probe_spi_rdid_generic: id1 0xef, id2 0x4016 serprog_spi_send_command, writecnt=1, readcnt=2 Chip status register is 00 Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. serprog_delay usecs=100000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:Eserprog_spi_send_command, writecnt=1, readcnt=0 serprog_spi_send_command, writecnt=4, readcnt=0 serprog_spi_send_command, writecnt=1, readcnt=2 serprog_delay usecs=10000 Note: using internal_delay, because the attached programmer does not support the delay opcode. serprog_spi_send_command, writecnt=1, readcnt=2 serprog_spi_send_command, writecnt=4, readcnt=4096 ERASE FAILED!
Reading current chip content... serprog_spi_send_command, writecnt=4, readcnt=4194304<very long pause> done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done. serprog_shutdown
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Sat, 16 Jul 2011 22:53:24 +0200 Subject: [PATCH 1/2] Fix out-of-bounds access if all erase functions fail
Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
flashrom.c | 29 +++++++++++++++++++---------- 1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/flashrom.c b/flashrom.c index 998a18f..25cd43e 100644 --- a/flashrom.c +++ b/flashrom.c @@ -1502,6 +1502,8 @@ static int check_block_eraser(const struct flashchip *flash, int k, int log) "eraseblock layout is not defined. "); return 1; }
- if (log)
msg_cdbg("looks ok. ");
Unneeded verbosity.
i beg to differ. i think it makes the verbose output more readable for someone who does not know the code (well). i think there should always be a terminating print for messages ending with "..." like this one. maybe just "ok."? :)
return 0; }
@@ -1522,34 +1524,41 @@ int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) {
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1) && usable_erasefunctions) {
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions left.\n");
break;
}
msg_cdbg("Looking at erase function %i... ", k);
} usable_erasefunctions--;if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue;
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
msg_cdbg("Trying... ");
Why did you change trying->Trying? Consistency?
to have correct output together with the "looks ok." print above: "Looking at erase function 4... looks ok. Trying..." vs. "Looking at erase function 4... looks ok. trying..."
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper,
/* If everything is OK, don't try another erase function. */ if (!ret) break; /* Write/erase failed, so try to find out what the current chipcurcontents, newcontents);
* contents are. If no usable erase functions remain, we could
* abort the loop instead of continuing, the effect is the same.
* The only difference is whether the reason for other unusable
* functions is printed or not. If in doubt, verbosity wins.
* contents are. If no usable erase functions remain, we can
*/ if (!usable_erasefunctions) continue;* skip this: the next iteration will break immediately anyway.
/* Reading the whole chip may take a while, inform the user even
* in non-verbose mode.
*/
msg_cinfo("Reading current chip content... ");
msg_cdbg("Reading current flash chip contents... ");
ok
if (flash->read(flash, curcontents, 0, size)) { /* Now we are truly screwed. Read failed as well. */
msg_cerr("Can't read anymore!\n");
}msg_cerr("Can't read anymore! Aborting.\n"); /* We have no idea about the flash chip contents, so * retrying with another erase function is pointless. */ break;
msg_cinfo("done. Trying next erase function.\n");
msg_cdbg("Looking for another erase function.\n"); That would make the messages more consistent.
yes, good idea
} /* Free the scratchpad. */ free(curcontents);
Am 17.07.2011 15:46 schrieb Stefan Tauner:
On Sun, 17 Jul 2011 15:20:59 +0200 Carl-Daniel Hailfinger wrote:
Am 16.07.2011 23:44 schrieb Stefan Tauner:
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger wrote:
Am 16.04.2011 13:37 schrieb Stefan Tauner:
[holes eraser array] Where did you find holes in that array? If a patch contains flash chip definitions with holes, it should be rejected.
i did not find one in flashchips.c, but it is possible to introduce them via the probing function theoretically. should not happen, will probably never happen, would guard against it anyway if possible. :)
We should guard against that in a selfcheck function on startup, not at write/erase time.
Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
Sorry, I don't buy that argument. flashrom already emits too much information in non-verbose mode, and this is not going to improve the situation.
5 lines + header for an erase cycle where one erase function fails is too much?
Not in itself, but I doubt the messages will fit in 5 lines in the near future when someone discovers that we need more default verbosity in other places, and by then a quiet mode will be needed for flashrom.
If people freak out, there is a good chance they will freak out already during write, and won't wait for the scary "ERASE FAILED" error message.
during write there the last characters are "..." not "FAILED". i hope we can agree that this is a major difference in the perspective of a user :) at least for me "..." indicates "it does something which might take a while, dont worry".
I am unhappy about the inconsistency this creates for messages about reading the flash chip. Depending on the flash chip and the operation, flashrom may read the full flash chip 8 times: Once during startup, once for each failed erase method (up to 6), once for final verify. Individual write methods may have internal verify/read steps as well. Only reads after a failed erase would be printed at default verbosity, and that inconsistency worries me.
You once said that you think flashrom floods the user with too much output. Did you change your mind?
that was *only* about ichspi in verbose mode. for non-verbose mode i think it could inform the user more about what it does in general (alternative would be to have at least a progress bar.. i know wip).
I think the Chromium branch of flashrom has a totally different user experience because it changed lots of output levels. We might want to take a look at their approach and check if it fits our needs as well.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV [...] Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:E ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Sat, 16 Jul 2011 22:53:24 +0200 Subject: [PATCH 1/2] Fix out-of-bounds access if all erase functions fail
Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
flashrom.c | 29 +++++++++++++++++++---------- 1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/flashrom.c b/flashrom.c index 998a18f..25cd43e 100644 --- a/flashrom.c +++ b/flashrom.c @@ -1502,6 +1502,8 @@ static int check_block_eraser(const struct flashchip *flash, int k, int log) "eraseblock layout is not defined. "); return 1; }
- if (log)
msg_cdbg("looks ok. ");
Unneeded verbosity.
i beg to differ. i think it makes the verbose output more readable for someone who does not know the code (well). i think there should always be a terminating print for messages ending with "..." like this one. maybe just "ok."? :)
Unless I'm mistaken, the original "trying..." message fulfilled exactly that purpose.
return 0; }
@@ -1522,34 +1524,41 @@ int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) {
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1) && usable_erasefunctions) {
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions left.\n");
break;
}
msg_cdbg("Looking at erase function %i... ", k);
} usable_erasefunctions--;if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue;
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
msg_cdbg("Trying... ");
Why did you change trying->Trying? Consistency?
to have correct output together with the "looks ok." print above: "Looking at erase function 4... looks ok. Trying..." vs. "Looking at erase function 4... looks ok. trying..."
Compare that to the current message:
"Looking at erase function 4... trying..."
The current message does not mislead users to assume that the erase function is OK. We will only know that after it has been tried, and the user does not care about flashrom internals.
My main objection to this patch is that it combines a bugfix with cosmetics, and the cosmetics distract from bugfixing. The bugfix part (and the ret=1 initialization you mentioned) would have been easily reviewable/ackable, yet we're investing hours into dicussing how the output should look like.
Here is my take on the bugfix part with minimal message changes:
Fix out-of-bounds access if all erase functions fail. Fix detection of unchanged chip contents on erase failure. Return error if no usable erase functions exist. Print a debug message for the chip read after erase failure.
Thanks to Stefan Tauner for spotting the last two problems.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
carldani: if you dont wanna ready anything about output printing skip to the paragraph/chunk before the last ;)
On Mon, 18 Jul 2011 02:34:57 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 17.07.2011 15:46 schrieb Stefan Tauner:
On Sun, 17 Jul 2011 15:20:59 +0200 Carl-Daniel Hailfinger wrote:
Am 16.07.2011 23:44 schrieb Stefan Tauner:
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger wrote:
Am 16.04.2011 13:37 schrieb Stefan Tauner:
[holes eraser array] Where did you find holes in that array? If a patch contains flash chip definitions with holes, it should be rejected.
i did not find one in flashchips.c, but it is possible to introduce them via the probing function theoretically. should not happen, will probably never happen, would guard against it anyway if possible. :)
We should guard against that in a selfcheck function on startup, not at write/erase time.
we can't guard against a chip probing function that modifies the flashchip struct and introduce holes (except by making them constants). i don't think it is a big problem though... we always iterate over the whole block_erasers array so we will just skip "holes".
Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
Sorry, I don't buy that argument. flashrom already emits too much information in non-verbose mode, and this is not going to improve the situation.
5 lines + header for an erase cycle where one erase function fails is too much?
Not in itself, but I doubt the messages will fit in 5 lines in the near future when someone discovers that we need more default verbosity in other places, and by then a quiet mode will be needed for flashrom.
which would be a good solution imho. i dont think that the user interface design of flashrom is appropriate because it is does a very delicate job. printing "ok" or "failed" only is not enough for something like that.
If people freak out, there is a good chance they will freak out already during write, and won't wait for the scary "ERASE FAILED" error message.
during write there the last characters are "..." not "FAILED". i hope we can agree that this is a major difference in the perspective of a user :) at least for me "..." indicates "it does something which might take a while, dont worry".
I am unhappy about the inconsistency this creates for messages about reading the flash chip. Depending on the flash chip and the operation, flashrom may read the full flash chip 8 times: Once during startup, once for each failed erase method (up to 6), once for final verify. Individual write methods may have internal verify/read steps as well. Only reads after a failed erase would be printed at default verbosity, and that inconsistency worries me.
that is very true. my main concerns are long waits where there is no output and where it is not clear that we did not hang. so the short reads for erase verification etc do not worry me. the one at startup should be announced though imo.
You once said that you think flashrom floods the user with too much output. Did you change your mind?
that was *only* about ichspi in verbose mode. for non-verbose mode i think it could inform the user more about what it does in general (alternative would be to have at least a progress bar.. i know wip).
I think the Chromium branch of flashrom has a totally different user experience because it changed lots of output levels. We might want to take a look at their approach and check if it fits our needs as well.
first we need to discuss what our needs are :)
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV [...] Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:E ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Sat, 16 Jul 2011 22:53:24 +0200 Subject: [PATCH 1/2] Fix out-of-bounds access if all erase functions fail
Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
flashrom.c | 29 +++++++++++++++++++---------- 1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/flashrom.c b/flashrom.c index 998a18f..25cd43e 100644 --- a/flashrom.c +++ b/flashrom.c @@ -1502,6 +1502,8 @@ static int check_block_eraser(const struct flashchip *flash, int k, int log) "eraseblock layout is not defined. "); return 1; }
- if (log)
msg_cdbg("looks ok. ");
Unneeded verbosity.
i beg to differ. i think it makes the verbose output more readable for someone who does not know the code (well). i think there should always be a terminating print for messages ending with "..." like this one. maybe just "ok."? :)
Unless I'm mistaken, the original "trying..." message fulfilled exactly that purpose.
partially. i think having dedicated opening and closing "terminals" around ellipses are better, because it is a pattern known by the users. the "trying..." message does not match that pattern exactly because it combines the closing terminal for "Looking at erase function x... " with a new opening terminal. also, it moves the output away from the printing of the closing terminals done in cases of errors. that's all a bit inconsistent in my view. but you are right that the trying message somewhat solves the basic need for a closing terminal.
return 0; }
@@ -1522,34 +1524,41 @@ int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) {
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1) && usable_erasefunctions) {
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions left.\n");
break;
}
msg_cdbg("Looking at erase function %i... ", k);
} usable_erasefunctions--;if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue;
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
msg_cdbg("Trying... ");
Why did you change trying->Trying? Consistency?
to have correct output together with the "looks ok." print above: "Looking at erase function 4... looks ok. Trying..." vs. "Looking at erase function 4... looks ok. trying..."
Compare that to the current message:
"Looking at erase function 4... trying..."
The current message does not mislead users to assume that the erase function is OK. We will only know that after it has been tried, and the user does not care about flashrom internals.
well, if you dont want the user to know about it, then it might be better to not even tell him that we are looking for well-defined erasers, but just that we are trying them. "Trying erase function x..." instead of the "Looking at" message and droping the "trying..." message altogether. the matching closing term would then be either one of the failure messages of check_block_eraser or the "ERASE FAILED!" of erase_and_write_block_helper. and.. there we are missing another closing terminal for successful runs. this is something i noticed to cause confusion too (at least for me :)
My main objection to this patch is that it combines a bugfix with cosmetics, and the cosmetics distract from bugfixing. The bugfix part (and the ret=1 initialization you mentioned) would have been easily reviewable/ackable, yet we're investing hours into dicussing how the output should look like.
true, this should not be mixed up, i am sorry. we should get the fix in and discuss the rest later when you have more time again (i am still not giving up hope :)
Here is my take on the bugfix part with minimal message changes:
i dunno if you are super busy atm or you just missed my private messages on irc, but you forgot to attach a patch...
Am 20.07.2011 16:20 schrieb Stefan Tauner:
carldani: if you dont wanna ready anything about output printing skip to the paragraph/chunk before the last ;)
Thanks for the info. I thought about your suggested output changes for a long time and now I'm convinced that your point is valid. We should reduce output elsewhere, though.
On Mon, 18 Jul 2011 02:34:57 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 17.07.2011 15:46 schrieb Stefan Tauner:
On Sun, 17 Jul 2011 15:20:59 +0200 Carl-Daniel Hailfinger wrote:
Am 16.07.2011 23:44 schrieb Stefan Tauner:
On Sat, 16 Jul 2011 02:13:44 +0200 Carl-Daniel Hailfinger wrote:
> Am 16.04.2011 13:37 schrieb Stefan Tauner: > >
[holes eraser array] Where did you find holes in that array? If a patch contains flash chip definitions with holes, it should be rejected.
i did not find one in flashchips.c, but it is possible to introduce them via the probing function theoretically. should not happen, will probably never happen, would guard against it anyway if possible. :)
We should guard against that in a selfcheck function on startup, not at write/erase time.
we can't guard against a chip probing function that modifies the flashchip struct and introduce holes (except by making them constants). i don't think it is a big problem though... we always iterate over the whole block_erasers array so we will just skip "holes".
Indeed, a programmatically constructed struct flashchip might behave odd in various ways.
Found chip "Winbond W25Q32" (4096 kB, SPI) on serprog. Erasing and writing flash chip... ERASE FAILED! Reading current chip content... <very long pause>done. Trying next erase function. Done.
without the new "Reading current chip content..." there would be only "ERASE FAILED!" be printed for a quite long time. i think that's scary for users, and that they may hit ctrl+c because of it.
Sorry, I don't buy that argument. flashrom already emits too much information in non-verbose mode, and this is not going to improve the situation.
5 lines + header for an erase cycle where one erase function fails is too much?
Not in itself, but I doubt the messages will fit in 5 lines in the near future when someone discovers that we need more default verbosity in other places, and by then a quiet mode will be needed for flashrom.
which would be a good solution imho. i dont think that the user interface design of flashrom is appropriate because it is does a very delicate job. printing "ok" or "failed" only is not enough for something like that.
We spent weeks trying to determine the correct message levels back when we migrated everything from printf to msg_*. That said, an audit of the current state would indeed be a good idea, but the biggest hurdle will be finding someone willing to review any changes.
If people freak out, there is a good chance they will freak out already during write, and won't wait for the scary "ERASE FAILED" error message.
during write there the last characters are "..." not "FAILED". i hope we can agree that this is a major difference in the perspective of a user :) at least for me "..." indicates "it does something which might take a while, dont worry".
I am unhappy about the inconsistency this creates for messages about reading the flash chip. Depending on the flash chip and the operation, flashrom may read the full flash chip 8 times: Once during startup, once for each failed erase method (up to 6), once for final verify. Individual write methods may have internal verify/read steps as well. Only reads after a failed erase would be printed at default verbosity, and that inconsistency worries me.
that is very true. my main concerns are long waits where there is no output and where it is not clear that we did not hang. so the short reads for erase verification etc do not worry me. the one at startup should be announced though imo.
Absolutely, yes.
You once said that you think flashrom floods the user with too much output. Did you change your mind?
that was *only* about ichspi in verbose mode. for non-verbose mode i think it could inform the user more about what it does in general (alternative would be to have at least a progress bar.. i know wip).
I think the Chromium branch of flashrom has a totally different user experience because it changed lots of output levels. We might want to take a look at their approach and check if it fits our needs as well.
first we need to discuss what our needs are :)
Agreed.
./flashrom -pserprog:dev=/dev/ttyU2flasher:115200 -c W25Q32 -EV [...] Erasing and writing flash chip... Looking at erase function 0... looks ok. Trying... 0x000000-0x000fff:E ERASE FAILED!
Reading current chip content... <very long pause>done. Trying next erase function. Looking at erase function 1... block erase function found, but eraseblock layout is not defined. Looking for another erase function. Looking at erase function 2... eraseblock layout is known, but matching block erase function is not implemented. Looking for another erase function. Looking at erase function 3... not defined. Looking for another erase function. Looking at erase function 4... looks ok. Trying... 0x000000-0x3fffff:S Done.
nice verbose output explaining what really happens.. the extra line break is from inside the erase_and_write_block_helper i think.
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Sat, 16 Jul 2011 22:53:24 +0200 Subject: [PATCH 1/2] Fix out-of-bounds access if all erase functions fail
Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
flashrom.c | 29 +++++++++++++++++++---------- 1 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/flashrom.c b/flashrom.c index 998a18f..25cd43e 100644 --- a/flashrom.c +++ b/flashrom.c @@ -1502,6 +1502,8 @@ static int check_block_eraser(const struct flashchip *flash, int k, int log) "eraseblock layout is not defined. "); return 1; }
- if (log)
msg_cdbg("looks ok. ");
Unneeded verbosity.
i beg to differ. i think it makes the verbose output more readable for someone who does not know the code (well). i think there should always be a terminating print for messages ending with "..." like this one. maybe just "ok."? :)
Unless I'm mistaken, the original "trying..." message fulfilled exactly that purpose.
partially. i think having dedicated opening and closing "terminals" around ellipses are better, because it is a pattern known by the users. the "trying..." message does not match that pattern exactly because it combines the closing terminal for "Looking at erase function x... " with a new opening terminal. also, it moves the output away from the printing of the closing terminals done in cases of errors. that's all a bit inconsistent in my view. but you are right that the trying message somewhat solves the basic need for a closing terminal.
"Trying" instead of "Looking at" sounds good.
return 0; }
@@ -1522,34 +1524,41 @@ int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) {
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1) && usable_erasefunctions) {
if (!usable_erasefunctions) {
msg_cdbg("No usable erase functions left.\n");
break;
}
msg_cdbg("Looking at erase function %i... ", k);
} usable_erasefunctions--;if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue;
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
msg_cdbg("Trying... ");
Why did you change trying->Trying? Consistency?
to have correct output together with the "looks ok." print above: "Looking at erase function 4... looks ok. Trying..." vs. "Looking at erase function 4... looks ok. trying..."
Compare that to the current message:
"Looking at erase function 4... trying..."
The current message does not mislead users to assume that the erase function is OK. We will only know that after it has been tried, and the user does not care about flashrom internals.
well, if you dont want the user to know about it, then it might be better to not even tell him that we are looking for well-defined erasers, but just that we are trying them. "Trying erase function x..." instead of the "Looking at" message and droping the "trying..." message altogether. the matching closing term would then be either one of the failure messages of check_block_eraser or the "ERASE FAILED!" of erase_and_write_block_helper. and.. there we are missing another closing terminal for successful runs. this is something i noticed to cause confusion too (at least for me :)
Mh. First and foremost I care about the output of non-verbose mode. That should be reasonably sane. Second priority is verbose mode output. Spew output is expected to be an unreadable badly formatted flood of mostly useless messages.
My main objection to this patch is that it combines a bugfix with cosmetics, and the cosmetics distract from bugfixing. The bugfix part (and the ret=1 initialization you mentioned) would have been easily reviewable/ackable, yet we're investing hours into dicussing how the output should look like.
true, this should not be mixed up, i am sorry. we should get the fix in and discuss the rest later when you have more time again (i am still not giving up hope :)
Heh.
Here is my take on the bugfix part with minimal message changes:
i dunno if you are super busy atm or you just missed my private messages on irc, but you forgot to attach a patch...
Sorry, I was not on IRC for the last few days, and now I'm trying to catch up on mail first, then move on to IRC.
Here's the fix with no message changes. I think that part is where we both agree. Please note that a separate followup patch with improved messages (either from you or from me) is also very desirable for 0.9.4, and by now I pretty much agree with your reasoning.
Fix out-of-bounds access if all erase functions fail. Fix detection of unchanged chip contents on erase failure. Return error if no usable erase functions exist.
Thanks to Stefan Tauner for spotting the last problem.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Index: flashrom-fix_erasefunctions_nullpointer/flashrom.c =================================================================== --- flashrom-fix_erasefunctions_nullpointer/flashrom.c (Revision 1373) +++ flashrom-fix_erasefunctions_nullpointer/flashrom.c (Arbeitskopie) @@ -1507,7 +1507,7 @@
int erase_and_write_flash(struct flashchip *flash, uint8_t *oldcontents, uint8_t *newcontents) { - int k, ret = 0; + int k, ret = 1; uint8_t *curcontents; unsigned long size = flash->total_size * 1024; unsigned int usable_erasefunctions = count_usable_erasers(flash); @@ -1522,8 +1522,12 @@ memcpy(curcontents, oldcontents, size);
for (k = 0; k < NUM_ERASEFUNCTIONS; k++) { + if (!usable_erasefunctions) { + msg_cdbg("No usable erase functions left.\n"); + break; + } msg_cdbg("Looking at blockwise erase function %i... ", k); - if (check_block_eraser(flash, k, 1) && usable_erasefunctions) { + if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue; } @@ -1535,10 +1539,8 @@ if (!ret) break; /* Write/erase failed, so try to find out what the current chip - * contents are. If no usable erase functions remain, we could - * abort the loop instead of continuing, the effect is the same. - * The only difference is whether the reason for other unusable - * functions is printed or not. If in doubt, verbosity wins. + * contents are. If no usable erase functions remain, we can + * skip this: the next iteration will break immediately anyway. */ if (!usable_erasefunctions) continue;
Am 21.07.2011 02:41 schrieb Carl-Daniel Hailfinger:
Here's the fix with no message changes. I think that part is where we both agree. Please note that a separate followup patch with improved messages (either from you or from me) is also very desirable for 0.9.4, and by now I pretty much agree with your reasoning.
And here are the pure message changes on top. AFAICS they should be mostly what you created yourself and a few small tweaks by me, so it should probably carry your signoff. In the meantime, this has my signoff to make sure nobody thinks the patch is restricted.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
--- flashrom-cosmetics_blockwalker_read_write_error/flashrom.c 2011-07-21 02:50:59.000000000 +0200 +++ flashrom-cosmetics_blockwalker_read_write_error/flashrom.c 2011-07-21 03:00:20.000000000 +0200 @@ -1526,15 +1526,14 @@ msg_cdbg("No usable erase functions left.\n"); break; } - msg_cdbg("Looking at blockwise erase function %i... ", k); + msg_cdbg("Trying erase function %i... ", k); if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue; } usable_erasefunctions--; - msg_cdbg("trying... "); - ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents); - msg_cdbg("\n"); + ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, + curcontents, newcontents); /* If everything is OK, don't try another erase function. */ if (!ret) break; @@ -1544,14 +1543,19 @@ */ if (!usable_erasefunctions) continue; + /* Reading the whole chip may take a while, inform the user even + * in non-verbose mode. + */ + msg_cinfo("Reading current flash chip contents... "); if (flash->read(flash, curcontents, 0, size)) { /* Now we are truly screwed. Read failed as well. */ - msg_cerr("Can't read anymore!\n"); + msg_cerr("Can't read anymore! Aborting.\n"); /* We have no idea about the flash chip contents, so * retrying with another erase function is pointless. */ break; } + msg_cinfo("done. Trying next erase function.\n"); } /* Free the scratchpad. */ free(curcontents); @@ -1938,13 +1942,13 @@ * preserved, but in that case we might perform unneeded erase which * takes time as well. */ - msg_cdbg("Reading old flash chip contents... "); + msg_cinfo("Reading old flash chip contents... "); if (flash->read(flash, oldcontents, 0, size)) { ret = 1; - msg_cdbg("FAILED.\n"); + msg_cinfo("FAILED.\n"); goto out; } - msg_cdbg("done.\n"); + msg_cinfo("done.\n");
// This should be moved into each flash part's code to do it // cleanly. This does the job.
On Thu, 21 Jul 2011 03:08:36 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Am 21.07.2011 02:41 schrieb Carl-Daniel Hailfinger:
Here's the fix with no message changes. I think that part is where we both agree. Please note that a separate followup patch with improved messages (either from you or from me) is also very desirable for 0.9.4, and by now I pretty much agree with your reasoning.
And here are the pure message changes on top. AFAICS they should be mostly what you created yourself and a few small tweaks by me, so it should probably carry your signoff. In the meantime, this has my signoff to make sure nobody thinks the patch is restricted.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
--- flashrom-cosmetics_blockwalker_read_write_error/flashrom.c 2011-07-21 02:50:59.000000000 +0200 +++ flashrom-cosmetics_blockwalker_read_write_error/flashrom.c 2011-07-21 03:00:20.000000000 +0200 @@ -1526,15 +1526,14 @@ msg_cdbg("No usable erase functions left.\n"); break; }
msg_cdbg("Looking at blockwise erase function %i... ", k);
if (check_block_eraser(flash, k, 1)) { msg_cdbg("Looking for another erase function.\n"); continue; } usable_erasefunctions--;msg_cdbg("Trying erase function %i... ", k);
msg_cdbg("trying... ");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper, curcontents, newcontents);
msg_cdbg("\n");
ret = walk_eraseregions(flash, k, &erase_and_write_block_helper,
/* If everything is OK, don't try another erase function. */ if (!ret) break;curcontents, newcontents);
@@ -1544,14 +1543,19 @@ */ if (!usable_erasefunctions) continue;
this case now is an exception: it does not print a "Looking for another erase function" message.
a possible fix is to add this at the top of the loop: if (k != 0) msg_cdbg("Looking for another erase function.\n");
/* Reading the whole chip may take a while, inform the user even
* in non-verbose mode.
*/
if (flash->read(flash, curcontents, 0, size)) { /* Now we are truly screwed. Read failed as well. */msg_cinfo("Reading current flash chip contents... ");
msg_cerr("Can't read anymore!\n");
}msg_cerr("Can't read anymore! Aborting.\n"); /* We have no idea about the flash chip contents, so * retrying with another erase function is pointless. */ break;
msg_cinfo("done. Trying next erase function.\n");
i am quoting you:
msg_cdbg("Looking for another erase function.\n"); That would make the messages more consistent.
and you are mixing dbg with info in your version. the done should be info, the trying thingy dbg. i almost missed that too... this loop is a hard nut :)
} /* Free the scratchpad. */ free(curcontents); @@ -1938,13 +1942,13 @@ * preserved, but in that case we might perform unneeded erase which * takes time as well. */
- msg_cdbg("Reading old flash chip contents... ");
- msg_cinfo("Reading old flash chip contents... "); if (flash->read(flash, oldcontents, 0, size)) { ret = 1;
msg_cdbg("FAILED.\n");
goto out; }msg_cinfo("FAILED.\n");
- msg_cdbg("done.\n");
msg_cinfo("done.\n");
// This should be moved into each flash part's code to do it // cleanly. This does the job.
i have fixed those issues in the attached patch and below are the dummy-simulated outputs of writes and erases.
verbose output with one error: Reading old flash chip contents... done. Erasing and writing flash chip... Trying erase function 0... 0x000000-0x007fff:EBLOCK ERASE 0xd8 fails because i said so! Invalid command sent to flash chip! spi_block_erase_d8 failed during command execution at address 0x0
Reading current flash chip contents... done. Looking for another erase function. Trying erase function 1... 0x000000-0x01ffff:EW Done. Verifying flash... VERIFIED.
non-verbose output with one error: Reading old flash chip contents... done. Erasing and writing flash chip... Invalid command sent to flash chip! spi_block_erase_d8 failed during command execution at address 0x0 Reading current flash chip contents... done. Done. Verifying flash... VERIFIED.
verbose output without errors: Reading old flash chip contents... done. Erasing and writing flash chip... Trying erase function 0... 0x000000-0x007fff:EW, 0x008000-0x00ffff:EW, 0x010000-0x017fff:EW, 0x018000-0x01ffff:EW Done. Verifying flash... VERIFIED.
new non-verbose output without errors: Reading old flash chip contents... done. Erasing and writing flash chip... Done. Verifying flash... VERIFIED.
non-verbose erase with one error: Erasing and writing flash chip... Invalid command sent to flash chip! spi_block_erase_d8 failed during command execution at address 0x0 Reading current flash chip contents... done. Done.
^ that one is a bit odd, because we dont tell the use why we do it and it is not obvious. OTOH one could argue that this is caused by sharing the erase_and_write_flash function as is. any ideas how this could be fixed?
another thing that is obvious in the case above but applies to all: the D in Done should be d (or vice-versa).
non-verbose erase with no errors: Erasing and writing flash chip... Done.
verbose erase with one error: Erasing and writing flash chip... Trying erase function 0... 0x000000-0x007fff:EBLOCK ERASE 0xd8 fails because i said so! Invalid command sent to flash chip! spi_block_erase_d8 failed during command execution at address 0x0
Reading current flash chip contents... done. Looking for another erase function. Trying erase function 1... 0x000000-0x01ffff:E Done.
verbose erase with no errors: Erasing and writing flash chip... Trying erase function 0... 0x000000-0x007fff:E, 0x008000-0x00ffff:E, 0x010000-0x017fff:E, 0x018000-0x01ffff:E Done.
we should also get rid of that extra \n in the case of errors. i think it is the one in walk_eraseregions in the if that checks do_something. this breaks consistency in that function, but because some/most/all(?) error messages in erasers have a line break at the end this is the right thing to do. i have commented it out in the attached patch (the outputs above are with it still included), so you can spot it easily.
in general i think this is a step forward and i am glad that all those "invested hours" show some return ;) regarding the sign-off... i dont really care, but the best solution is probably, that we both sign it... we actually have done that already for "our" parts respectively. i dont see a problem with that, but i am fine with any other solution too.
todo: - \n - Done
was acked on IRC by carldani (thanks) and committed with minor changes in r1392.
Am 21.07.2011 18:36 schrieb Stefan Tauner:
in general i think this is a step forward and i am glad that all those "invested hours" show some return ;)
Your patch is really excellent. It solves all the output formatting problems I saw.
regarding the sign-off... i dont really care, but the best solution is probably, that we both sign it... we actually have done that already for "our" parts respectively. i dont see a problem with that, but i am fine with any other solution too.
todo:
- \n
Kill as mentioned on IRC.
- Done
"Erase/write done"
From: Stefan Tauner stefan.tauner@student.tuwien.ac.at Date: Thu, 21 Jul 2011 18:33:20 +0200 Subject: [PATCH] fix output of erase_and_write_flash and surroundings
see http://www.flashrom.org/pipermail/flashrom/2011-July/007220.html for a discussion about the details.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net Signed-off-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
Acked-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Regards, Carl-Daniel
On Thu, 21 Jul 2011 02:41:02 +0200 Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
Fix out-of-bounds access if all erase functions fail. Fix detection of unchanged chip contents on erase failure. Return error if no usable erase functions exist.
Thanks to Stefan Tauner for spotting the last problem.
Signed-off-by: Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net
Acked-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
Am 21.07.2011 15:31 schrieb Stefan Tauner:
Acked-by: Stefan Tauner stefan.tauner@student.tuwien.ac.at
Thanks, committed in r1380.
Regards, Carl-Daniel