Attention is currently required from: Nico Huber, Caveh Jalali, Tim Wawrzynczak, Rizwan Qureshi, Edward O'Callaghan, Angel Pons, Nick Vaccaro, Alex Levin.
1 comment:
Patchset:
While flashing the SPI with flashrom on a DUT in S0, we sometimes see flashrom failures, and the debug data we have so far is not conclusive enough to say that the CSE is doing anything wrong.
Was upstream flashrom used?
Yup.
So is the issue easy to reproduce on the affected systems?
It's certainly not easy, but we are able to reproduce it.
> >
> > > Why do you assume that the CSE is involved?
> >
> > Because Chrome OS needs a few features that involve the CSE to start its services after booting to the OS. Those services might further need to fetch the CSE active partition residing in SPI at regular intervals. We suspect the flashrom operation conflicts with such accesses, as flashrom doesn't synchronize based on the SCIP bit.
Please describe the actual symptoms: error messages? Data corruption?
Something must have led to this suspicion, right?
I would expect the hardware to coordinate between the different masters.
Unfortunately, it seems it doesn't support a multi-master protocol.
I don't know any such thing. What I am referring to is that Intel calls the
hardware blocks with SPI access (CSE, host, GbE etc.) masters. Somehow these
must be coordinated. It's possible that one master can toggle the SCIP bits
of all masters at once. It's possible that this is all there is. I doubt it,
but it's possible. That there is no hardware synchronization at all would be
very odd.
We are waiting to understand how HW arbitration actually works between those agents. So far we don't have much of an answer.
> >
> > > At least,
> > > the SCIP bit can never be enough to synchronize masters. Even with the added
> > > waiting loop, there is still much room for race conditions (e.g. second master
> > > reading SCIP = 0 before the first posted its write to start a cycle).
> >
> > I believe this sync might work like this.
> > When the first master is building the command and hasn't set the FGO bit yet, no operation is taking place, so the 2nd master can read SCIP = 0 and still has room to start a new command. But immediately after FGO is set, the HW logic will flip the SCIP bit, so if the 2nd master is about to start a new command, it has to stay in the wait loop until the 1st master finishes its operation. And I have created that blocking logic. Based on the initial code analysis, CSE FW will ensure it does not intrude on host CPU operations, but the host CPU (using flashrom) might run into issues because the CSE is using the SPI bus for read operations, hence the flashrom operation might fail.
It seems you ignore hardware concurrency. What do you think happens if
1st master reads SCIP = 0, 2nd master reads SCIP = 0, and both write
FGO = 1? There is no magic that would make any of the masters return
to the waiting loop.
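
The interleaving described above can be replayed in a toy model. This is only a sketch: `SpiController`, `run`, and the master names are hypothetical, and the model assumes SCIP is set only once FGO is written, which is exactly the behaviour under discussion and not confirmed hardware behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class SpiController:
    """Toy model (assumption): SCIP becomes 1 only after a master writes FGO."""
    scip: bool = False
    started_by: list = field(default_factory=list)

    def read_scip(self) -> bool:
        return self.scip

    def write_fgo(self, master: str) -> None:
        # The model does not reject a second FGO; it only records it.
        self.started_by.append(master)
        self.scip = True


def run(schedule):
    """Replay an interleaving of ('read' | 'go', master) steps."""
    ctrl = SpiController()
    saw_idle = {}
    for step, master in schedule:
        if step == "read":
            saw_idle[master] = not ctrl.read_scip()
        elif step == "go" and saw_idle.get(master):
            ctrl.write_fgo(master)
    return ctrl.started_by


# Serialized: the second master reads SCIP = 1 and does not start a cycle.
serial = run([("read", "host"), ("go", "host"), ("read", "cse"), ("go", "cse")])
# Interleaved: both masters read SCIP = 0 before either one writes FGO.
racy = run([("read", "host"), ("read", "cse"), ("go", "host"), ("go", "cse")])
print(serial)  # ['host']
print(racy)    # ['host', 'cse']
```

In the interleaved schedule both masters start a cycle, because the check and the FGO write are separate steps with nothing atomic between them.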
Yes, this is a valid concern. Even with this implementation, there may still be a few percent chance of failure. What I have suggested is a retry implementation inside flashrom as insurance. In the example you have given, if we assume that the CSE gets the bus and flashrom fails, flashrom can retry. On the next attempt, it might see that an SPI cycle is in progress and wait there, wouldn't it?
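
A minimal sketch of the wait-then-retry idea, run against simulated callbacks. `wait_until_idle`, `run_with_retry`, `scip_busy`, and `start_cycle` are hypothetical names for illustration and are not flashrom functions; the timeout and attempt count are arbitrary assumptions.

```python
import time


def wait_until_idle(scip_busy, timeout_s=1.0, poll_s=0.0001):
    """Poll the (hypothetical) scip_busy() callback until it reports idle."""
    deadline = time.monotonic() + timeout_s
    while scip_busy():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def run_with_retry(start_cycle, scip_busy, max_attempts=3):
    """Wait for SCIP = 0, start the cycle, and retry if the cycle fails."""
    for attempt in range(1, max_attempts + 1):
        if not wait_until_idle(scip_busy):
            continue  # bus stayed busy; this counts as a failed attempt
        if start_cycle():
            return attempt
    return None


# Simulated hardware: the first cycle collides and fails, the second succeeds.
outcomes = iter([False, True])
busy_reads = iter([True, True, False, False])
attempts = run_with_retry(lambda: next(outcomes), lambda: next(busy_reads))
print(attempts)  # 2
```

A retry like this lowers the failure rate, but the window between reading SCIP and starting the cycle is still there on every attempt.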
>
> > I guess a simple experiment could give a hint if it does: The hardware is also
> > able to start cycles on its own for memory-mapped flash access. You could test
> > if this toggles the SCIP bit in the host master interface. For instance, poll
> > SCIP and then read from the memory-mapped BIOS area. If polling too fast is an
> > issue, I'd measure the time of a short cycle, then take half of that as the polling
> > delay.
>
> I believe this scenario is valid while we are booting from memory-mapped SPI (as long as the resources are memory mapped), and not while we are running flashrom from the OS. No code should be fetched from XIP/SPI-mapped memory once the BIOS is done.
Doesn't matter what is valid. If it toggles the host master SCIP bit, then there
is definitely something to do in flashrom. If it doesn't, that says nothing, but
we could continue to investigate and see if CSE activity toggles the host SCIP bit.
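
For the polling delay in the quoted experiment, a small worked example shows why half the cycle time is a safe choice: a busy window of length T always contains at least one poll when polls are T/2 apart. The numbers are arbitrary units chosen for illustration, and `polls_hit_cycle` is a hypothetical helper, not a measurement of real hardware.

```python
def polls_hit_cycle(cycle_start, cycle_len, poll_delay, n_polls):
    """Return True if any poll at k * poll_delay lands inside the busy window."""
    return any(
        cycle_start <= k * poll_delay < cycle_start + cycle_len
        for k in range(n_polls)
    )


cycle_len = 8.0             # assumed duration of a short cycle
poll_delay = cycle_len / 2  # half the cycle time

starts = [s / 10 for s in range(0, 1000, 7)]

# With the half-cycle delay, every start offset is caught by some poll.
caught = all(polls_hit_cycle(s, cycle_len, poll_delay, 100) for s in starts)
# With a delay longer than the cycle, some offsets fall between two polls.
missed = not all(polls_hit_cycle(s, cycle_len, 3 * cycle_len, 100) for s in starts)
print(caught, missed)  # True True
```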
The CSE accesses the SPI for reads, hence it will for sure flip the SCIP bit. No doubt about that.
To view, visit change 61854.