Anastasia Klimchuk submitted this change.

View Change

Approvals: build bot (Jenkins): Verified Anastasia Klimchuk: Looks good to me, approved Peter Marheine: Looks good to me, but someone else must approve
flashrom: Don't throw around "delay 1 second" so lightly

Waiting a full second is a very long time, especially when
default_delay() chooses to busy-loop. This code has been around for a
decade, with vague references to user reports:

commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169
Date: Wed Aug 19 13:55:34 2009 +0000
Disallow erase/write for known bad chips so people won't try without a clear understanding

Still, this logic does not belong in the high-level library logic, used
by all programmers and all chips. If there is a timing issue, it should
either be encoded in the appropriate programmer or flashchip timing.
However, we don't really know what chips were in use, as the above
commit doesn't have any links to reports. So in a feeble attempt at
avoiding breaking users here, we also surmise that...

* SPI chips weren't all that common in 2009;
* I'm mostly motivated by flashrom performance on Chromebooks, were SPI
chips (and linux_mtd / BUS_PROG) are the rule; and
* SPI chips have precise timing requirements and an appropriate BUSY
status. So we guess that this "calm down" magic delay wouldn't be
necessary there.

Thus, we allow this magic delay only on non-SPI (and non-BUG_PROG, used
by linux_mtd for one) buses as a compromise.

Now, this change has some (hopefully [1] tiny) chance of regression, so
we have the following considerations:

1. emergency_help_message() already provides documentation on how to
contact support, in case we need to handle any user-reported
regressions.
2. If there is any regression here, it's only in the --verify code; so
we can always provide workarounds for testing this, to determine
whether this change may have been at fault. For example, something
like:

flashrom --write /my/new/image.bin --noverify
sleep 1
flashrom --read /tmp/bar.bin
cmp /my/new/image.bin /tmp/bar.bin

If such problems occur, we can collect system/programmer/chip info to
try to encode a more targeted delay into the appropriate
chip/programmer implementation, and avoid penalizing the entire
project like this.
3. We already have (embedded in erase_write()) erase verification that
performs no such delay. So depending on the type of timing error that
this delay was attempting to cover, we may have some proof that this
delay is no longer necessary (or at least, that whatever systems were
needing this delay in the first place are no longer caring about
flashrom).
4. We've retained the delay for buses that were likely common in 2009
(per the above "feeble attempt").

NB: I avoid using the BUS_NONSPI macro, because I want to exclude any
future buses from this workaround, even in the event that the BUS_NONSPI
category grows in the future.

[1] Famous last words.

BUG=b:325539928
TEST=`flashrom_tester --flashrom_binary=$(which flashrom) \
internal Erase_and_Write Fail_to_verify`,
TEST=`vpd -i RW_VPD -s foo=bar; vpd -i RW_VPD -l; \
vpd -i RW_VPD -d foo; vpd -i RW_VPD -l`
TEST=`elogtool list; elogtool add 0xa7; elogtool list`

on (at least) 2 systems:
#1: Kukui/Kakadu rev2 - MTD programmer /
kernel 5.10.215-24542-g0515a679eb42 /
CrOS ~ 15857.0.0
#2: Zork/Dirinboz rev2 -
chip name: vendor="Winbond" name="W25Q128.JW.DTR" /
BIOS: Google_Dirinboz.13434.688.0 /
kernel 5.4.267-21940-g67f70e251a74 /
CrOS ~ 15753.43.0

Change-Id: Ie09651fede3f9f03425244c94a2da8bae00315fc
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://review.coreboot.org/c/flashrom/+/80807
Reviewed-by: Peter Marheine <pmarheine@chromium.org>
Reviewed-by: Anastasia Klimchuk <aklm@chromium.org>
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
---
M flashrom.c
1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/flashrom.c b/flashrom.c
index 630c69d..6b15ee5 100644
--- a/flashrom.c
+++ b/flashrom.c
@@ -2330,8 +2330,31 @@
if (verify && !all_skipped) {
msg_cinfo("Verifying flash... ");

- /* Work around chips which need some time to calm down. */
- programmer_delay(flashctx, 1000*1000);
+ /*
+ * Work around chips which "need some time to calm down."
+ *
+ * Frankly, it's not 100% clear why this delay is here at all,
+ * except for a terse message from 2009 of "a few reports where
+ * verify directly after erase had unpleasant side effects like
+ * corrupting flash or at least getting incorrect verify
+ * results". Ideally, if there were a few known problematic
+ * chips or programmers, we could add quirks flags for those
+ * specific implementations without penalizing all other
+ * flashrom users. But alas, we don't know which systems
+ * experienced those issues.
+ *
+ * Out of an extreme abundance of caution, we retain this
+ * delay, but only for a few non-SPI bus types that were the
+ * likely prevalent targets at the time. This is a complete
+ * guess, which conveniently avoids wasting time on common
+ * BUS_SPI and BUS_PROG systems.
+ *
+ * Background thread:
+ * Subject: RFC: removing 1 second verification delay
+ * https://mail.coreboot.org/hyperkitty/list/flashrom@flashrom.org/thread/SFV3OJBVVMDKRLI3FQA3DDDGEXJ7W4ED/
+ */
+ if (flashctx->chip->bustype & (BUS_PARALLEL | BUS_LPC | BUS_FWH))
+ programmer_delay(flashctx, 1000*1000);

if (verify_all)
combine_image_by_layout(flashctx, newcontents, oldcontents);

To view, visit change 80807. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: flashrom
Gerrit-Branch: main
Gerrit-Change-Id: Ie09651fede3f9f03425244c94a2da8bae00315fc
Gerrit-Change-Number: 80807
Gerrit-PatchSet: 7
Gerrit-Owner: Brian Norris <briannorris@chromium.org>
Gerrit-Reviewer: Anastasia Klimchuk <aklm@chromium.org>
Gerrit-Reviewer: Nikolai Artemiev <nartemiev@google.com>
Gerrit-Reviewer: Peter Marheine <pmarheine@chromium.org>
Gerrit-Reviewer: build bot (Jenkins) <no-reply@coreboot.org>
Gerrit-MessageType: merged