Hi Angel,
On Thu, Apr 25, 2024 at 11:44:47AM +0000, Angel Pons wrote:
On Thu, Apr 25, 2024 at 10:26 AM Anastasia Klimchuk aklm@chromium.org wrote:
/*
- Work around chips which need some time to calm down.
- Not needed for SPI flash chips because of the bus' strict timing.
*/ if (flash->chip->bustype != BUS_SPI) programmer_delay(flashctx, 1000*1000);
For the record, linux_mtd is a primary target for all Arm Chromebooks, and it registers an opaque programme with BUG_PROG. Presumably an "opaque" chip qualifies as "safe enough" too, because the opaque abstraction should be taking care of timing details? If not, then this is not a sufficient solution for me.
I guess my proposal would probably enumerate the old buses (BUS_PARALLEL, BUS_LPC, BUS_FWH) specifically, to avoid further unintentional inclusion.
Anyway, I appreciate the general thrust of your thoughts and suggestions. I'll reply to a few bits below.
[The following is a textual representation of my thought processes; it's not particularly necessary but I felt it could be interesting]
Appreciated!
My idea is to maintain backwards compatibility while still enabling new features and improvements. [...]
I appreciate the attempt at a middle ground here.
On Thu, Apr 25, 2024 at 7:35 AM Angel Pons th3fanbus@gmail.com wrote:
On Wed, Apr 24, 2024 at 9:16 PM Brian Norris briannorris@chromium.org wrote:
Background: https://review.coreboot.org/c/flashrom/+/80807
A long time ago, in the pre-git times [1], flashrom added a 1 second delay to all verification, and claimed that some chips "need some time to calm down." The commit message claims it "fixes a few reports where verify directly after erase had unpleasant side effects like corrupting flash or at least getting incorrect verify results." It provides no details of what systems, chips, or programmers were involved.
Back then, SPI flash chips weren't as ubiquitous as they currently are. This workaround is most likely for Parallel/LPC/FWH flash chips, which can actually be quite weird.
Interesting callout. I've dealt with some parallel NOR as well in the past, and I don't really recall magical sleeps there either. Look around at Linux's drivers/mtd/chips/cfi* for example. There's some incidence of udelay()/msleep()/etc., but they're coupled with status checks and polling loops, similar to any SPI protocol. The only exception I found is a verbosely documented cfi_fixup_m29ew_delay_after_resume() sleep. And even that's only 0.5 milliseconds.
But I suppose the existence of good parallel flash drivers doesn't preclude the existence of poor ones elsewhere.
Do you have any kind of examples of what "weirdness" might be present here? Or is this just a general feeling and guess based on the year of the original change?
If we don't know for sure whether this delay is no longer needed, I wouldn't risk re-introducing issues in flashrom.
I don't think that condition is possible to satisfy given sufficiently under-documented "fixes" from ancient history. I appreciate that we don't want to regress things, but when there's no evidence or details of the initial problem, it's logically impossible to prove its non-existence now.
But toward a non-guaranteed proof: see consideration #3 in my aforementioned patch. We already perform verify-after-erase steps without any such delay. Wouldn't this path be affected in the same way, if it was still a real problem? Similarly, we have no such protection against a sequence of `flashrom --noverify --write; flashrom --read`. So even if the delay was serving some purpose, it seems a wholly insufficient one that feels little better than a band-aid.
Still, maybe y'all consider the band-aid better than ripping it off. And if so, I'll consider incorporating your suggestion.
Do Chromebooks use non-SPI flash chips? They probably don't.
Not that I know of.
Did you conside making it so that only SPI programmers / flash chips skip the delay? The SPI bus' strict timing leaves no room for this problem to occur, so it should be safe to skip this delay. And this would keep non-SPI unaffected (which is most likely what needs this extra delay).
I did not consider this yet, but I'm considering it now! If this is the best semi-conservative solution we can come up with, I suppose I'm fine with that. Per the above, I don't think non-SPI is relevant to anything I care about, so maybe it's good enough.
Thanks for your thoughts, Brian