Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
This delay remains in the --verify path today, and IMO, it's a big waste of time. If there are truly chips that are, say, deasserting the BUSY line before they're truly finished programming ... well, we should track those chips down and add targeted quirk flags for them. We shouldn't penalize all flashrom users in all cases for all time.
Personally, I highly doubt that this delay is still relevant today -- there may have been a bug in some programmer that has since been fixed; there may have been some malfunctioning system that is no longer in use; ... or it's possible this is still hiding a real buggy chip somewhere out there.
I propose: we still remove the delay. There's plenty of description in the above Gerrit link about why, and how we can handle regressions. (For one, it's relatively simple to split up a --verify operation into its constituent --write/sleep/--read operations, for debugging.)
The request: 1. Tests: I've tested a few Chromebooks, but imagine this had to have been some more esoteric system. Extra testing is welcome. 2. Thoughts: does my proposal make sense? Am I missing something obvious? 3. Awareness: if nothing else, this email may serve to highlight in case we land the above patch and later hear back that there are some sketchy --verify reports from users.
Thanks, Brian
[1] The svn->git commit in question:
commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169 Date: Wed Aug 19 13:55:34 2009 +0000 Disallow erase/write for known bad chips so people won't try without a clear understanding
Hi Brian, list,
Thanks for bringing this up on the mailing list.
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
This delay remains in the --verify path today, and IMO, it's a big waste of time. If there are truly chips that are, say, deasserting the BUSY line before they're truly finished programming ... well, we should track those chips down and add targeted quirk flags for them. We shouldn't penalize all flashrom users in all cases for all time.
I agree that this delay shouldn't be unconditional.
Personally, I highly doubt that this delay is still relevant today -- there may have been a bug in some programmer that has since been fixed; there may have been some malfunctioning system that is no longer in use; ... or it's possible this is still hiding a real buggy chip somewhere out there.
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I propose: we still remove the delay. There's plenty of description in the above Gerrit link about why, and how we can handle regressions. (For one, it's relatively simple to split up a --verify operation into its constituent --write/sleep/--read operations, for debugging.)
The request:
- Tests: I've tested a few Chromebooks, but imagine this had to have
been some more esoteric system. Extra testing is welcome.
Do Chromebooks use non-SPI flash chips? They probably don't.
- Thoughts: does my proposal make sense? Am I missing something obvious?
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
- Awareness: if nothing else, this email may serve to highlight in
case we land the above patch and later hear back that there are some sketchy --verify reports from users.
I agree that awareness is important, but bear in mind that external programmers for non-SPI flash chips are increasingly rare, so recovering from a brick can be quite convoluted (e.g. having to desolder a TSOP32 flash chip to reflash it somehow).
Thanks, Brian
[1] The svn->git commit in question:
commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169 Date: Wed Aug 19 13:55:34 2009 +0000 Disallow erase/write for known bad chips so people won't try without a clear understanding _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
Best regards, Angel
Angel, great to see you here again! Also, you provided very useful information.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
This sounds interesting. If we check for non-SPI chips, it only adds one more line to the patch (if I understand your idea correctly):
if (flash->chip->bustype == BUS_NONSPI)
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
Hi Brian, list,
Thanks for bringing this up on the mailing list.
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
This delay remains in the --verify path today, and IMO, it's a big waste of time. If there are truly chips that are, say, deasserting the BUSY line before they're truly finished programming ... well, we should track those chips down and add targeted quirk flags for them. We shouldn't penalize all flashrom users in all cases for all time.
I agree that this delay shouldn't be unconditional.
Personally, I highly doubt that this delay is still relevant today -- there may have been a bug in some programmer that has since been fixed; there may have been some malfunctioning system that is no longer in use; ... or it's possible this is still hiding a real buggy chip somewhere out there.
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I propose: we still remove the delay. There's plenty of description in the above Gerrit link about why, and how we can handle regressions. (For one, it's relatively simple to split up a --verify operation into its constituent --write/sleep/--read operations, for debugging.)
The request:
- Tests: I've tested a few Chromebooks, but imagine this had to have
been some more esoteric system. Extra testing is welcome.
Do Chromebooks use non-SPI flash chips? They probably don't.
- Thoughts: does my proposal make sense? Am I missing something obvious?
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
- Awareness: if nothing else, this email may serve to highlight in
case we land the above patch and later hear back that there are some sketchy --verify reports from users.
I agree that awareness is important, but bear in mind that external programmers for non-SPI flash chips are increasingly rare, so recovering from a brick can be quite convoluted (e.g. having to desolder a TSOP32 flash chip to reflash it somehow).
Thanks, Brian
[1] The svn->git commit in question:
commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169 Date: Wed Aug 19 13:55:34 2009 +0000 Disallow erase/write for known bad chips so people won't try without a clear understanding _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
Best regards, Angel _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
Hi Anastasia, list,
On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk aklm@chromium.org wrote:
Angel, great to see you here again! Also, you provided very useful information.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
This sounds interesting. If we check for non-SPI chips, it only adds one more line to the patch (if I understand your idea correctly):
if (flash->chip->bustype == BUS_NONSPI)
I haven't looked into the specifics at all, but that's the idea. It would be good to test that the delay is still executed on non-SPI flash chips. I think `BUS_NONSPI` actually encompasses multiple buses, so a simple equality test might not work. I would try something like this:
/* * Work around chips which need some time to calm down. * Not needed for SPI flash chips because of the bus' strict timing. */ if (flash->chip->bustype != BUS_SPI) programmer_delay(flashctx, 1000*1000);
[The following is a textual representation of my thought processes; it's not particularly necessary but I felt it could be interesting]
My idea is to maintain backwards compatibility while still enabling new features and improvements. To do this, my very general approach is to look for "cut-off points", i.e. some criteria to differentiate between "preserve existing behaviour to avoid regressions" and "add new features and/or make existing stuff better", seeking to maximise the extent of the improvements (e.g. improving SPI support in general instead of artificially restricting the improvements to a few chips/programmers) while also trying to minimise the chance of regressions (which can be fatal in the case of flashrom: some LPC/FWH flash chips are TSOP32, externally reflashing them sounds very not-fun). Analytically solving this is most likely impossible, but for specific cases it's possible to find a reasonably acceptable ("good enough") balance with the usual problem-solving tools: reason (e.g. check against specifications, test things, etc.), intuition (useful as a heuristic to get "directions" to explore further), and communication (e.g. this very mailing list thread).
In this case (removing the delay in the verify path), my thoughts were (not necessarily in order): - I feel this delay was introduced for a good reason (even though I have no idea what the reason is), so removing it will probably break something. - But do we have to remove the delay for *everything*, or can we conditionally skip it in some circumstances? This isn't a strict yes/no scenario. - Looks like the messages proposing this change pretty much only talk about flashing SPI chips, maybe this delay relates to the bus somehow. Plus, the delay was introduced a long time ago. - The SPI bus has very strict timing requirements (emulating SPI flash isn't easy). However, Non-SPI flash chips don't have the same requirements (older laptop ECs used SPI flash chips, but exposed the flash contents to the chipset over LPC). - It's likely this delay isn't needed for SPI because of all of this, so let's consider skipping the delay for SPI. It's what most (but not all) people use these flashrom for these days. - Even if some non-SPI chips don't need the delay, it's much harder to test this. Plus, keeping the delay in there for non-SPI won't hurt.
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
Hi Brian, list,
Thanks for bringing this up on the mailing list.
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
This delay remains in the --verify path today, and IMO, it's a big waste of time. If there are truly chips that are, say, deasserting the BUSY line before they're truly finished programming ... well, we should track those chips down and add targeted quirk flags for them. We shouldn't penalize all flashrom users in all cases for all time.
I agree that this delay shouldn't be unconditional.
Personally, I highly doubt that this delay is still relevant today -- there may have been a bug in some programmer that has since been fixed; there may have been some malfunctioning system that is no longer in use; ... or it's possible this is still hiding a real buggy chip somewhere out there.
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I propose: we still remove the delay. There's plenty of description in the above Gerrit link about why, and how we can handle regressions. (For one, it's relatively simple to split up a --verify operation into its constituent --write/sleep/--read operations, for debugging.)
The request:
- Tests: I've tested a few Chromebooks, but imagine this had to have
been some more esoteric system. Extra testing is welcome.
Do Chromebooks use non-SPI flash chips? They probably don't.
- Thoughts: does my proposal make sense? Am I missing something obvious?
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
- Awareness: if nothing else, this email may serve to highlight in
case we land the above patch and later hear back that there are some sketchy --verify reports from users.
I agree that awareness is important, but bear in mind that external programmers for non-SPI flash chips are increasingly rare, so recovering from a brick can be quite convoluted (e.g. having to desolder a TSOP32 flash chip to reflash it somehow).
Thanks, Brian
[1] The svn->git commit in question:
commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169 Date: Wed Aug 19 13:55:34 2009 +0000 Disallow erase/write for known bad chips so people won't try without a clear understanding _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
Best regards, Angel _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
-- Anastasia.
Best regards, Angel
Hi Angel,
On Thu, Apr 25, 2024 at 11:44:47AM +0000, Angel Pons wrote:
On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk aklm@chromium.org wrote:
/*
- Work around chips which need some time to calm down.
- Not needed for SPI flash chips because of the bus' strict timing.
*/ if (flash->chip->bustype != BUS_SPI) programmer_delay(flashctx, 1000*1000);
For the record, linux_mtd is a primary target for all Arm Chromebooks, and it registers an opaque programme with BUG_PROG. Presumably an "opaque" chip qualifies as "safe enough" too, because the opaque abstraction should be taking care of timing details? If not, then this is not a sufficient solution for me.
I guess my proposal would probably enumerate the old buses (BUS_PARALLEL, BUS_LPC, BUS_FWH) specifically, to avoid further unintentional inclusion.
Anyway, I appreciate the general thrust of your thoughts and suggestions. I'll reply to a few bits below.
[The following is a textual representation of my thought processes; it's not particularly necessary but I felt it could be interesting]
Appreciated!
My idea is to maintain backwards compatibility while still enabling new features and improvements. [...]
I appreciate the attempt at a middle ground here.
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
Interesting callout. I've dealt with some parallel NOR as well in the past, and I don't really recall magical sleeps there either. Look around at Linux's drivers/mtd/chips/cfi* for example. There's some incidence of udelay()/msleep()/etc., but they're coupled with status checks and polling loops, similar to any SPI protocol. The only exception I found is a verbosely documented cfi_fixup_m29ew_delay_after_resume() sleep. And even that's only 0.5 milliseconds.
But I suppose the existence of good parallel flash drivers doesn't preclude the existence of poor ones elsewhere.
Do you have any kind of examples of what "weirdness" might be present here? Or is this just a general feeling and guess based on the year of the original change?
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I don't think that condition is possible to satisfy given sufficiently under-documented "fixes" from ancient history. I appreciate that we don't want to regress things, but when there's no evidence or details of the initial problem, it's logically impossible to prove its non-existence now.
But toward a non-guaranteed proof: see consideration #3 in my aforementioned patch. We already perform verify-after-erase steps without any such delay. Wouldn't this path be affected in the same way, if it was still a real problem? Similarly, we have no such protection against a sequence of `flashrom --noverify --write; flashrom --read`. So even if the delay was serving some purpose, it seems a wholly insufficient one that feels little better than a band-aid.
Still, maybe y'all consider the band-aid better than ripping it off. And if so, I'll consider incorporating your suggestion.
Do Chromebooks use non-SPI flash chips? They probably don't.
Not that I know of.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
I did not consider this yet, but I'm considering it now! If this is the best semi-conservative solution we can come up with, I suppose I'm fine with that. Per the above, I don't think non-SPI is relevant to anything I care about, so maybe it's good enough.
Thanks for your thoughts, Brian
Currently, BUS_NONSPI = BUS_PARALLEL | BUS_LPC | BUS_FWH
so it does not include BUS_PROG, and of course not BUS_SPI. It seems to cover what is needed?
I also need to correct my little piece of code, it actually should be if (flash->chip->bustype & BUS_NONSPI) It seems like something everyone can agree on?
The patch saves 1s for every flashrom run, for almost all users, so it's a good thing. If it can be unblocked now just by adding 1-line condition, let's unblock it now.
Perhaps there is a bigger question of Parallel/LPC/FWH flash chips which need special treatment, but rarely used nowadays. I want to discuss, but as a topic by itself.
I don't think we need to resolve all of the problems in one patch :)
On Fri, Apr 26, 2024 at 8:54 AM Brian Norris briannorris@chromium.org wrote:
Hi Angel,
On Thu, Apr 25, 2024 at 11:44:47AM +0000, Angel Pons wrote:
On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk aklm@chromium.org wrote:
/*
- Work around chips which need some time to calm down.
- Not needed for SPI flash chips because of the bus' strict timing.
*/ if (flash->chip->bustype != BUS_SPI) programmer_delay(flashctx, 1000*1000);
For the record, linux_mtd is a primary target for all Arm Chromebooks, and it registers an opaque programme with BUG_PROG. Presumably an "opaque" chip qualifies as "safe enough" too, because the opaque abstraction should be taking care of timing details? If not, then this is not a sufficient solution for me.
I guess my proposal would probably enumerate the old buses (BUS_PARALLEL, BUS_LPC, BUS_FWH) specifically, to avoid further unintentional inclusion.
Anyway, I appreciate the general thrust of your thoughts and suggestions. I'll reply to a few bits below.
[The following is a textual representation of my thought processes; it's not particularly necessary but I felt it could be interesting]
Appreciated!
My idea is to maintain backwards compatibility while still enabling new features and improvements. [...]
I appreciate the attempt at a middle ground here.
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
Interesting callout. I've dealt with some parallel NOR as well in the past, and I don't really recall magical sleeps there either. Look around at Linux's drivers/mtd/chips/cfi* for example. There's some incidence of udelay()/msleep()/etc., but they're coupled with status checks and polling loops, similar to any SPI protocol. The only exception I found is a verbosely documented cfi_fixup_m29ew_delay_after_resume() sleep. And even that's only 0.5 milliseconds.
But I suppose the existence of good parallel flash drivers doesn't preclude the existence of poor ones elsewhere.
Do you have any kind of examples of what "weirdness" might be present here? Or is this just a general feeling and guess based on the year of the original change?
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I don't think that condition is possible to satisfy given sufficiently under-documented "fixes" from ancient history. I appreciate that we don't want to regress things, but when there's no evidence or details of the initial problem, it's logically impossible to prove its non-existence now.
But toward a non-guaranteed proof: see consideration #3 in my aforementioned patch. We already perform verify-after-erase steps without any such delay. Wouldn't this path be affected in the same way, if it was still a real problem? Similarly, we have no such protection against a sequence of `flashrom --noverify --write; flashrom --read`. So even if the delay was serving some purpose, it seems a wholly insufficient one that feels little better than a band-aid.
Still, maybe y'all consider the band-aid better than ripping it off. And if so, I'll consider incorporating your suggestion.
Do Chromebooks use non-SPI flash chips? They probably don't.
Not that I know of.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
I did not consider this yet, but I'm considering it now! If this is the best semi-conservative solution we can come up with, I suppose I'm fine with that. Per the above, I don't think non-SPI is relevant to anything I care about, so maybe it's good enough.
Thanks for your thoughts, Brian
On Mon, Apr 29, 2024 at 5:58 AM Anastasia Klimchuk aklm@chromium.org wrote:
The patch saves 1s for every flashrom run, for almost all users, so it's a good thing. If it can be unblocked now just by adding 1-line condition, let's unblock it now.
I've updated the patch to approximately this suggestion.
Perhaps there is a bigger question of Parallel/LPC/FWH flash chips which need special treatment, but rarely used nowadays. I want to discuss, but as a topic by itself.
Honestly, I don't know how to move forward on assertions about "know[ing] for sure whether this delay is no longer needed". We have no actionable information about the original problem, so we can never be sure.
So for my purposes, I'll just drop the subject after this patch.
I don't think we need to resolve all of the problems in one patch :)
Great!
Brian, thank you for your work! The patch is submitted.
Honestly, I don't know how to move forward on assertions about "know[ing] for sure whether this delay is no longer needed". We have no actionable information about the original problem, so we can never be sure.
I have been thinking about it in the background. I feel it's a good topic to discuss in the meeting.