Attention is currently required from: Nico Huber, Caveh Jalali, Tim Wawrzynczak, Rizwan Qureshi, Edward O'Callaghan, Angel Pons, Nick Vaccaro, Alex Levin.
Subrata Banik has posted comments on this change. ( https://review.coreboot.org/c/flashrom/+/61854 )
Change subject: ichspi.c: Check SPI Cycle In-Progress prior start HW Seq
......................................................................
Patch Set 2:
(1 comment)
Patchset:
PS2:
While you are flashing the SPI using flashrom on DUT at S0, there are cases when we are seeing some flashrom failure issue and debug data that we have so far is not very conclusive to say that CSE is doing anything wrong.
Was upstream flashrom used?
Yup.
So is the issue easy to reproduce on the affected systems?
It's not easy for sure but we are able to replicate.
Why do you assume that the CSE is involved?
Because Chrome OS needs few features that involve CSE to start it's services post booting to OS. Those services might further need to fetch the CSE active partition residing in SPI at regular intervals. We suspect flashrom operation is conflicting with such access as flashrom doesn't sync based on the SCIP bit.
Please describe the actual symptoms, error messages? data corruption?
Something must have led to this suspicion, right?
I would expect the hardware to coordinate between the different masters.
Unfortunately, it seems doesn't support multimaster protocol.
I don't know any such thing. What I am referring to is that Intel calls the
hardware blocks with SPI access (CSE, host, GbE etc.) masters. Somehow these
must be coordinated. It's possible that one master can toggle the SCIP bits
of all masters at once. It's possible that this is all there is. I doubt it,
but it's possible. That there is no hardware synchronization at all would be
very odd.
We are waiting to understand how HW arbitration actually works between those agents. So, far we don't have much answer.
At least,
the SCIP bit can never be enough to synchronize masters. Even with the added
waiting loop, there is still much room for race conditions (e.g. second master
reading SCIP = 0 before the first posted its write to start a cycle).
I believe this sync might work like this.
When first master is creating the command and haven't set the FGO bit. There is no operation that is taking place so, 2nd master can read SCIP = 0 and if 2nd master wish to start a new command, it has still room for that. But immediately after FGO is set, the HW logic will flip the SCIP bit so, 2nd master if about to start the new command, it has to in wait loop till 1st master finishes the operation. And I have created the blocking logic. At based on the initial code analysis, CSE FW will ensure not to intrude into host cpu operation but host CPU (using flashrom) might run into issue because CSE is using SPI bus for read operation hence flashrom operation might fail.
It seems you ignore hardware concurrency. What do you think happens if
1st master reads SCIP = 0, 2nd master reads SCIP = 0, and both write
FGO = 1. There is no magic that would make any of the masters return
to the waiting loop.
Yes, this is valid concern. And even with this implementation, we still might have few % chance that we may see a failure. What I have suggested is to have a retry implementation inside flashrom for giving you an insurance. The example that you have given, if we assume that CSE attempt to get the bus and Flashrom fail, but it can retry again. On the next attempt, it might see the SPI cycle is busy and wait there. isn't it?
I guess a simple experiment could give a hint if it does: The hardware is also
able to start cycles on its own for memory-mapped flash access. You could test
if this toggles the SCIP bit in the host master interface. For instance, poll
SCIP and then read from the memory-mapped BIOS area. If polling too fast is an
issue, I'd measure the time of a short cycle, then take half of that as the > polling
delay.
I believe this scenario is valid while we are booting from SPI memory mapped memory (till we have resources are memory mapped) and not in the case of while we are running the flashrom from OS. There should not be any code which is getting fetched from XIP/SPI mapped memory while BIOS is done.
Doesn't matter what is valid. If it toggles the host master SCIP bit, than there
is definitely something to do in flashrom. If it doesn't, that says nothing, but
we could continue to investigate and see if CSE activity toggles the host SCIP bit.
CSE access the SPI for read hence, it will for sure flip the SCIP. No doubt about that.
--
To view, visit
https://review.coreboot.org/c/flashrom/+/61854
To unsubscribe, or for help writing mail filters, visit
https://review.coreboot.org/settings
Gerrit-Project: flashrom
Gerrit-Branch: master
Gerrit-Change-Id: Ib9265cc20513fd00f32f8fa22e28c312903ca484
Gerrit-Change-Number: 61854
Gerrit-PatchSet: 2
Gerrit-Owner: Subrata Banik
subratabanik@google.com
Gerrit-Reviewer: Alex Levin
levinale@chromium.org
Gerrit-Reviewer: Caveh Jalali
caveh@chromium.org
Gerrit-Reviewer: Edward O'Callaghan
quasisec@chromium.org
Gerrit-Reviewer: Nick Vaccaro
nvaccaro@google.com
Gerrit-Reviewer: Nico Huber
nico.h@gmx.de
Gerrit-Reviewer: Rizwan Qureshi
rizwan.qureshi@intel.com
Gerrit-Reviewer: Tim Wawrzynczak
twawrzynczak@chromium.org
Gerrit-Reviewer: build bot (Jenkins)
no-reply@coreboot.org
Gerrit-CC: Angel Pons
th3fanbus@gmail.com
Gerrit-Attention: Nico Huber
nico.h@gmx.de
Gerrit-Attention: Caveh Jalali
caveh@chromium.org
Gerrit-Attention: Tim Wawrzynczak
twawrzynczak@chromium.org
Gerrit-Attention: Rizwan Qureshi
rizwan.qureshi@intel.com
Gerrit-Attention: Edward O'Callaghan
quasisec@chromium.org
Gerrit-Attention: Angel Pons
th3fanbus@gmail.com
Gerrit-Attention: Nick Vaccaro
nvaccaro@google.com
Gerrit-Attention: Alex Levin
levinale@chromium.org
Gerrit-Comment-Date: Fri, 11 Feb 2022 14:50:19 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Nico Huber
nico.h@gmx.de
Comment-In-Reply-To: Subrata Banik
subratabanik@google.com
Gerrit-MessageType: comment