Bigger things like this should actually be discussed on
the mailing list first, e.g. to avoid spending time on
something that might not get merged. Doesn't matter now,
you already wrote it.

In any case it wouldn't be a waste of time: before this patch there were no existing methods to read a firmware in such an unreliable ISP environment while I desperately needed a method; - now there is one :) And so far it's very successful, although a little bit slow - read it many times and all my checksums are matching! Btw I'm sure I could erase/write back in a similar way - the code for "writing" is already there, just still-not-tested on the real hardware

Regarding your original problem, I guess it could be solved
with some tuning of your hardware setup. Did you try to get
the SoC into a better state? (reset preferred)

QCA 9533 SoC has the active low RESET_L pin, which theoretically could have be shortened to GND for the SoC to enter the "reset state" - but this pin isn't at the front row, its' under a chip at the rear row ( check out https://forum.openwrt.org/viewtopic.php?id=73100 for the pinout diagram ) so I couldn't touch it with multimeter to find the exposure of this pin at some convenient place on a board where it could have been shortened

Also, I can't find a datasheet for this router board - but a datasheet for this SoC tells: "It is recommended to leave this signal floating if resetting the chip externally is not required", so its quite likely this pin is simply floating - not connected to anything on the board - and, since its at the rear row under a chip, it is impossible to shorten it with GND

... though, as you already work with two masters on the
SPI bus, did you try to boot the SoC up and halt the OS?

Maybe it is possible to put this SoC to reset from the firmware, but we need a universal method to read/flash the firmware without relying on what is already flashed at the router's flash chip: in example, what if my router is bricked and its' firmware cannot be loaded to halt a router?

About this implementation, when I read "gingerly" I
expected something more sophisticated (e.g. some sort of
collision avoidance, listening on the bus instead of
jamming it). Though, I have nothing against the way
you do it, I would call it brute-force.

Could be renamed of course! "gingerly" was a temporary name I've chosen partially because it starts with unoccupied 'g' letter while giving an impression of what we're trying to do. If you'd like I could rename this mode to "-b | --brute-force" ("b" isn't occupied too) while freeing a "gingerly" name for the something more significant

Also, if I call `flashrom --gingerly` I would expect
something that is safer to use, but it kind of is
the opposite, isn't it?

I can't see how this new mode could be more dangerous than a normal operation; also, dangerous to what - software or hardware?


I don't like to clutter up spi_read_chunked() and
spi_write_chunked(), most changes to them seem to be there
only to avoid malloc() in your spi_rw_gingerly()?
I would rather move it there.

I agree with you, but can't see a good way of how this could be avoided. spi_rw_gingerly has to be called for each chunk, and right now there are just two malloc's for two buffers which are constantly being reused by spi_rw_gingerly. If I'd move the malloc's inside the "spi_rw_gingerly" - it will significantly raise the frequency of malloc/free calls, reducing the efficiency of flashrom. E.g. if we'd try to read a 4MB chip by those 256 byte chunks, there would be at least 32768 extra malloc's and free's

And... this might be the biggest issue: the possible
endless loop in spi_rw_gingerly(). For a mergeable solution,
you'd have to put some kind of timeout there

Yes, it will hang a flashrom if this chip never becomes available even for a split second; but I don't see it as a problem: the user can terminate flashrom / restart his hardware programmer and understand that perhaps even with this mode its' impossible to read a firmware of his board and the desoldering of a chip can't be avoided with ISP mode

Also, if we'd introduce a timeout - it shouldn't be hardcoded. In example: if I'd hardcode it to 5 seconds - it will block a way for the boards whose chip is available for 1 second per each 10 seconds; hardcode to 15 seconds - blocks those which are 1sec/20sec available, and so on

Most likely we'll have to introduce a new flag "-t | --timeout <seconds>" ("t" is also not occupied) but lets' have more discussions to clarify our further steps

And that is where the problems start... if it can fail,
we have to handle the failure correctly, otherwise
flashrom (in its current implementation) would probably
fall back to erase the whole chip and make things worse.

If there is a timeout: we could just print some error message and shutdown the flashrom, then it shouldn't do anything else... also, why would it try to erase the whole chip if we didn't specify such an operation?

Last but not least, why do it at the SPI level? Retries
due to unreliable connections should be handled at a higher
level, IMO.

Sorry, I could not see a way to do this low level verification of each chunk at the higher level than the same level where these individual chunks are being read/written

View Change

To view, visit change 23840. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: flashrom
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3f18276d9fb7233d082720cb29d154f31c77100
Gerrit-Change-Number: 23840
Gerrit-PatchSet: 2
Gerrit-Owner: Mike Banon <mikebdp2@gmail.com>
Gerrit-Reviewer: David Hendricks <david.hendricks@gmail.com>
Gerrit-Reviewer: Mike Banon <mikebdp2@gmail.com>
Gerrit-Reviewer: Nico Huber <nico.h@gmx.de>
Gerrit-Reviewer: Paul Menzel <paulepanter@users.sourceforge.net>
Gerrit-Reviewer: build bot (Jenkins) <no-reply@coreboot.org>
Gerrit-Comment-Date: Thu, 08 Mar 2018 15:47:15 +0000
Gerrit-HasComments: No
Gerrit-HasLabels: No