Hi Anastasia, list,
On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk aklm@chromium.org wrote:
Angel, great to see you here again! Also, you provided very useful information.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
This sounds interesting. If we check for non-SPI chips, it only adds one more line to the patch (if I understand your idea correctly):
if (flash->chip->bustype == BUS_NONSPI)
I haven't looked into the specifics at all, but that's the idea. It would be good to test that the delay is still executed on non-SPI flash chips. I think `BUS_NONSPI` actually encompasses multiple buses, so a simple equality test might not work. I would try something like this:
/* * Work around chips which need some time to calm down. * Not needed for SPI flash chips because of the bus' strict timing. */ if (flash->chip->bustype != BUS_SPI) programmer_delay(flashctx, 1000*1000);
[The following is a textual representation of my thought processes; it's not particularly necessary but I felt it could be interesting]
My idea is to maintain backwards compatibility while still enabling new features and improvements. To do this, my very general approach is to look for "cut-off points", i.e. some criteria to differentiate between "preserve existing behaviour to avoid regressions" and "add new features and/or make existing stuff better", seeking to maximise the extent of the improvements (e.g. improving SPI support in general instead of artificially restricting the improvements to a few chips/programmers) while also trying to minimise the chance of regressions (which can be fatal in the case of flashrom: some LPC/FWH flash chips are TSOP32, externally reflashing them sounds very not-fun). Analytically solving this is most likely impossible, but for specific cases it's possible to find a reasonably acceptable ("good enough") balance with the usual problem-solving tools: reason (e.g. check against specifications, test things, etc.), intuition (useful as a heuristic to get "directions" to explore further), and communication (e.g. this very mailing list thread).
In this case (removing the delay in the verify path), my thoughts were (not necessarily in order): - I feel this delay was introduced for a good reason (even though I have no idea what the reason is), so removing it will probably break something. - But do we have to remove the delay for *everything*, or can we conditionally skip it in some circumstances? This isn't a strict yes/no scenario. - Looks like the messages proposing this change pretty much only talk about flashing SPI chips, maybe this delay relates to the bus somehow. Plus, the delay was introduced a long time ago. - The SPI bus has very strict timing requirements (emulating SPI flash isn't easy). However, Non-SPI flash chips don't have the same requirements (older laptop ECs used SPI flash chips, but exposed the flash contents to the chipset over LPC). - It's likely this delay isn't needed for SPI because of all of this, so let's consider skipping the delay for SPI. It's what most (but not all) people use these flashrom for these days. - Even if some non-SPI chips don't need the delay, it's much harder to test this. Plus, keeping the delay in there for non-SPI won't hurt.
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
Hi Brian, list,
Thanks for bringing this up on the mailing list.
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
This delay remains in the --verify path today, and IMO, it's a big waste of time. If there are truly chips that are, say, deasserting the BUSY line before they're truly finished programming ... well, we should track those chips down and add targeted quirk flags for them. We shouldn't penalize all flashrom users in all cases for all time.
I agree that this delay shouldn't be unconditional.
Personally, I highly doubt that this delay is still relevant today -- there may have been a bug in some programmer that has since been fixed; there may have been some malfunctioning system that is no longer in use; ... or it's possible this is still hiding a real buggy chip somewhere out there.
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I propose: we still remove the delay. There's plenty of description in the above Gerrit link about why, and how we can handle regressions. (For one, it's relatively simple to split up a --verify operation into its constituent --write/sleep/--read operations, for debugging.)
The request:
- Tests: I've tested a few Chromebooks, but imagine this had to have
been some more esoteric system. Extra testing is welcome.
Do Chromebooks use non-SPI flash chips? They probably don't.
- Thoughts: does my proposal make sense? Am I missing something obvious?
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
- Awareness: if nothing else, this email may serve to highlight in
case we land the above patch and later hear back that there are some sketchy --verify reports from users.
I agree that awareness is important, but bear in mind that external programmers for non-SPI flash chips are increasingly rare, so recovering from a brick can be quite convoluted (e.g. having to desolder a TSOP32 flash chip to reflash it somehow).
Thanks, Brian
[1] The svn->git commit in question:
commit 8ab49e72af8465d4527de2ec37b22cd44f7a1169 Date: Wed Aug 19 13:55:34 2009 +0000 Disallow erase/write for known bad chips so people won't try without a clear understanding _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
Best regards, Angel _______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-leave@flashrom.org
-- Anastasia.
Best regards, Angel