Change in flashrom[master]: Add "gingerly" flashing mode for the unreliable ISP environments - flashrom-gerrit

8 Mar 2018


Mike Banon has posted comments on this change. (https://review.coreboot.org/23840)
Change subject: Add "gingerly" flashing mode for the unreliable ISP environments
......................................................................
Patch Set 2:
...
...
...
... though, as you already work with two masters on the
SPI bus, did you try to boot the SoC up and halt the OS?
Maybe it is possible to put this SoC into reset from the firmware,
but we need a universal method to read/flash the firmware without
relying on what is already flashed to the router's flash chip: for
example, what if my router is bricked and its firmware cannot be
loaded to halt the router?
Universal method: isolate VCC (e.g. desolder flash chip, or put a
diode on the VCC).
But that wouldn't be the ISP method ;) And many people are afraid of soldering on their densely packed boards and would like to avoid it at all costs, which is reasonable! My friend, who is quite experienced at soldering, accidentally broke a copper track while desoldering a SOIC8 chip from his coreboot board, and still couldn't repair it :(
...
...
I can't see how this new mode could be more dangerous than a
normal operation; also, dangerous to what: software or hardware?
Hardware. It encourages users to operate flashrom with multiple
masters on the SPI bus, which is generally not supported by the
hardware (it depends on how the masters drive their outputs, e.g.
open-drain vs. push-pull). If you do it nevertheless and flashrom
only runs for 1s and bails out, that is much safer than shorting
the masters' outputs for a longer period.
That it works for you and you haven't broken anything yet tells us
nothing about other possible hardware setups.
Sometimes the whole ISP thing can be very dangerous, even without any extra flashrom modes! This week I connected a CH341A USB programmer through a test clip to the SOIC8 flash chip of an HD capture board, of course without any power adapter connected to the board or the clip. Then I plugged the CH341A into my laptop's USB port and, 1 second later, pressed "Enter" on the "sudo ./flashrom -p ch341a_spi -V" command: no gingerly mode, as you can see, just a quick probe! But flashrom did not detect the programmer, and about 5 seconds later I noticed a horrible burning smell and instantly disconnected everything, almost shitting my pants in the process ;)
The smell definitely came from the CH341A, because the HD capture board still works 100% OK, while almost all the letters on the CH341A chip's label got dark from the heat: only the "WC" letters kept their original light-grey color, and the remaining "H CH341A" letters got very dark. The most wonderful thing is that the CH341A turned out to be very resilient and still works perfectly! :)
It seems this hungry board tried to draw too much current through the CH341A while trying to power itself! That could have killed a Bus Pirate or another more advanced but less resilient programmer without any protection against such things. (Maybe I will repost this experience to the mailing lists.)
However, even with such dangerous boards in existence, ISP is still much safer than desoldering a flash chip without an experienced, steady hand. That's why I believe that advanced modes for ISP flashing, be it this "gingerly" mode or any future invention, have a right to exist.
...
...
...
And... this might be the biggest issue: the possible
endless loop in spi_rw_gingerly(). For a mergeable solution,
you'd have to put some kind of timeout there.
Yes, it will hang flashrom if this chip never becomes available,
even for a split second; but I don't see that as a problem: the
user can terminate flashrom / restart his hardware programmer and
understand that perhaps, even with this mode, it's impossible to
read the firmware of his board with ISP, and desoldering the chip
can't be avoided.
I would generally agree, but all that without a progress
indication? How do you tell after 30min if it's nearly done or didn't
get anywhere so far?
Currently, if we run it with -V and there are any successful "probing for" messages, we can keep waiting and the whole process will eventually succeed. I agree that we need a better indication of what percentage of the chip has been done, although perhaps this feature could be added independently for all flashrom modes. It would be more informative than, e.g., the current simple "Reading flash" message, which, for turtle-speed chips like the KB9012 (about 15 minutes to read) run without the "-V" flag and without any percentage printed, almost gives the impression that flashrom is stuck :P
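A minimal sketch of the percentage indicator discussed above, written in C because flashrom is. The names `struct progress`, `progress_init()` and `progress_update()` are hypothetical and not part of flashrom. The only idea shown is to print an update when the integer percentage changes, so that a slow chip visibly advances without flooding the terminal.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical progress state; not a flashrom structure. */
struct progress {
	size_t total;      /* total bytes to process */
	size_t done;       /* bytes processed so far */
	int last_percent;  /* last percentage printed, -1 if none yet */
};

static void progress_init(struct progress *p, size_t total)
{
	p->total = total;
	p->done = 0;
	p->last_percent = -1;
}

/* Record that `len` more bytes were processed and return the current
 * percentage. Output is printed only when the percentage changes. */
static int progress_update(struct progress *p, size_t len)
{
	int percent;

	p->done += len;
	if (p->done > p->total)
		p->done = p->total;
	percent = p->total ? (int)(p->done * 100 / p->total) : 100;
	if (percent != p->last_percent) {
		printf("\rProgress: %3d%%", percent);
		if (percent == 100)
			printf("\n");
		fflush(stdout);
		p->last_percent = percent;
	}
	return percent;
}
```

A read loop would call `progress_init()` once with the chip size, then call `progress_update()` after each completed chunk.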
...
...
...
And that is where the problems start... if it can fail,
we have to handle the failure correctly, otherwise
flashrom (in its current implementation) would probably
fall back to erase the whole chip and make things worse.
If there is a timeout: we could just print some error message and
shut down flashrom; then it shouldn't do anything else... Also,
why would it try to erase the whole chip if we didn't specify
such an operation?
Flashrom defaults to try another erase function if one fails,
finally using one that erases the whole chip. Can be handled
more gracefully ofc, just something that we have to keep in
mind.
Perhaps there should be a way to just exit flashrom with some message without trying any other erase functions. Maybe by adding a new return value to the erase functions (perhaps "2" or "-2" if "1" or "-1" is already used for the failure case) and adjusting the flashrom algorithm to act accordingly when an erase function returns such a value.
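One way to sketch the proposed extra return value is with two hypothetical codes: `ERASE_FAILED` (-1) means "try the next erase function", and `ERASE_ABORT` (-2) means "the chip is unreachable, stop". These names, the `erase_fn` signature and the stub erasers are illustrative only and do not come from flashrom. They show how a fallback loop could stop early instead of eventually reaching a whole-chip eraser.

```c
#include <stddef.h>

/* Hypothetical return codes, not flashrom's actual values. */
#define ERASE_OK      0
#define ERASE_FAILED (-1)  /* this erase function failed, try the next one */
#define ERASE_ABORT  (-2)  /* chip unreachable (e.g. timeout): stop at once */

typedef int (*erase_fn)(unsigned int addr, unsigned int len);

/* Stub erase functions, for illustration only. */
static int stub_ok_calls;
static int stub_fail(unsigned int addr, unsigned int len)
{
	(void)addr; (void)len;
	return ERASE_FAILED;
}
static int stub_abort(unsigned int addr, unsigned int len)
{
	(void)addr; (void)len;
	return ERASE_ABORT;
}
static int stub_ok(unsigned int addr, unsigned int len)
{
	(void)addr; (void)len;
	stub_ok_calls++;
	return ERASE_OK;
}

/* Try each erase function in order. Fall back on ERASE_FAILED, but return
 * immediately on ERASE_ABORT so no later (possibly whole-chip) eraser runs.
 * Returns ERASE_OK, ERASE_ABORT, or the last failure code. */
static int erase_with_fallback(const erase_fn *fns, size_t count,
			       unsigned int addr, unsigned int len)
{
	int ret = ERASE_FAILED;
	size_t i;

	for (i = 0; i < count; i++) {
		ret = fns[i](addr, len);
		if (ret == ERASE_OK || ret == ERASE_ABORT)
			return ret;
	}
	return ret;
}
```

With `{ stub_abort, stub_ok }`, the loop returns `ERASE_ABORT` and never calls `stub_ok`.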
...
...
Also, if we introduce a timeout, it shouldn't be hardcoded. For
example: if I hardcode it to 5 seconds, it will block the boards
whose chip is available for 1 second out of every 10 seconds;
hardcode it to 15 seconds and it blocks those that are available
1 sec out of every 20 sec, and so on.
Most likely we'll have to introduce a new flag "-t | --timeout
<seconds>" ("t" is also not occupied), but let's have more
discussion to clarify our further steps.
I would prefer a static timeout, doesn't have to be a small one.
For instance 60s, then bail out. It's not that bad to wait 1min
but if single hunks take more than 60s, nobody would have the
patience to wait for the full run anyway.
Is it possible that some boards have lengthy available/unavailable periods, like "70 seconds unavailable / 30 seconds available"? If so, the full run could still take a reasonably short time despite the long wait between groups of many successful attempts.
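To make the static-vs-configurable timeout question concrete, here is a sketch of an idle timeout. All names are hypothetical, and time is a simulated one-second-per-attempt clock. The run fails only if no chunk has succeeded for `idle_timeout` seconds, and the timer restarts after every success, so the total run time is not capped. Under these assumptions, a 60 s limit gives up on the hypothetical "70 s unavailable / 30 s available" board, while an 80 s limit lets it finish.

```c
#include <stdbool.h>

/* Hypothetical callback: attempt one chunk at simulated time `t` (seconds).
 * Returns true if the chip answered and the chunk was transferred. */
typedef bool (*chunk_fn)(unsigned int chunk, unsigned long t);

/* Idle-timeout retry loop. Each attempt costs one simulated second; a real
 * implementation would use a monotonic clock and sleep between attempts.
 * Returns the number of chunks completed (== nchunks on success). */
static unsigned int run_with_idle_timeout(unsigned int nchunks,
					  chunk_fn try_chunk,
					  unsigned long idle_timeout)
{
	unsigned long t = 0, last_success = 0;
	unsigned int done = 0;

	while (done < nchunks) {
		if (try_chunk(done, t)) {
			done++;
			last_success = t;  /* restart the idle timer */
		} else if (t - last_success >= idle_timeout) {
			break;  /* chip unreachable for too long: bail out */
		}
		t++;
	}
	return done;
}

/* Example board: unavailable for 70 s, then available for 30 s, repeating. */
static bool board_70_30(unsigned int chunk, unsigned long t)
{
	(void)chunk;
	return t % 100 >= 70;
}
```

The trade-off is that a fixed idle limit still has to exceed the longest unavailable stretch, which is the case for the configurable value argued for above.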
...
...
...
Last but not least, why do it at the SPI level? Retries
due to unreliable connections should be handled at a higher
level, IMO.
Sorry, I could not see a way to do this low-level verification of
each chunk at a higher level than the one where these
individual chunks are being read/written
spi_write_chunked() and spi_read_chunked() can be called with
huge ranges, yes. But they don't have to be. You could just
generate smaller hunks at a higher level. For instance, you
could replace the flash chip's read() and write() pointers
with implementations that produce smaller hunks and "gingerly"
run the original read/write functions on them.
This is a good idea. Perhaps I will implement it after we discuss the other issues with this "gingerly" mode and its fate becomes clearer.
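A sketch of the wrapper idea suggested above: keep the chip's original read function and call it on small hunks, retrying each hunk until it verifies. The `read_fn` signature, `struct gingerly_reader` and the read-twice-and-compare check are assumptions for illustration. flashrom's real chip structures and the verification the patch actually performs are not reproduced here. A simulated chip that drops every third transfer is included so the example can be exercised.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a chip's read hook (not flashrom's signature). */
typedef int (*read_fn)(void *ctx, unsigned char *buf,
		       unsigned int start, unsigned int len);

struct gingerly_reader {
	read_fn orig_read;       /* the chip's original read implementation */
	void *ctx;               /* passed through to orig_read */
	unsigned int hunk;       /* hunk size in bytes, at most 4096 here */
	unsigned int max_tries;  /* attempts per hunk before giving up */
};

/* Read [start, start + len) in small hunks. Each hunk is read twice and
 * accepted only if both reads succeed and match. Returns 0 on success, or
 * -1 if the hunk size is invalid or a hunk never verified in max_tries. */
static int gingerly_read(const struct gingerly_reader *r, unsigned char *buf,
			 unsigned int start, unsigned int len)
{
	unsigned char check[4096];
	unsigned int off = 0;

	if (r->hunk == 0 || r->hunk > sizeof(check))
		return -1;
	while (off < len) {
		unsigned int n = len - off < r->hunk ? len - off : r->hunk;
		unsigned int tries;
		int ok = 0;

		for (tries = 0; tries < r->max_tries && !ok; tries++) {
			if (r->orig_read(r->ctx, buf + off, start + off, n))
				continue;  /* transfer failed, retry this hunk */
			if (r->orig_read(r->ctx, check, start + off, n))
				continue;
			ok = memcmp(buf + off, check, n) == 0;
		}
		if (!ok)
			return -1;
		off += n;
	}
	return 0;
}

/* Simulated chip for illustration: every third call fails, others copy
 * from an in-memory image. */
struct fake_chip {
	const unsigned char *image;
	unsigned int calls;
};

static int fake_read(void *ctx, unsigned char *buf,
		     unsigned int start, unsigned int len)
{
	struct fake_chip *c = ctx;

	if (++c->calls % 3 == 0)
		return -1;  /* simulated dropped transfer */
	memcpy(buf, c->image + start, len);
	return 0;
}
```

Because each hunk has its own bounded retry count, this also avoids the endless-loop concern. A timeout like the one discussed earlier could replace `max_tries`.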
-- 
To view, visit https://review.coreboot.org/23840

Gerrit-Project: flashrom
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f18276d9fb7233d082720cb29d154f31c77100
Gerrit-Change-Number: 23840
Gerrit-PatchSet: 2
Gerrit-Owner: Mike Banon mikebdp2@gmail.com
Gerrit-Reviewer: David Hendricks david.hendricks@gmail.com
Gerrit-Reviewer: Mike Banon mikebdp2@gmail.com
Gerrit-Reviewer: Nico Huber nico.h@gmx.de
Gerrit-Reviewer: Paul Menzel paulepanter@users.sourceforge.net
Gerrit-Reviewer: build bot (Jenkins) no-reply@coreboot.org
Gerrit-Comment-Date: Thu, 08 Mar 2018 19:02:24 +0000
Gerrit-HasComments: No
Gerrit-HasLabels: No