HI Nico,

Thanks for your email.
Please find my response below.

Thanks,
Subrata


On Thu, Mar 17, 2022 at 6:08 PM Nico Huber <nico.h@gmx.de> wrote:
Hi Subrata,

On 16.03.22 17:19, Subrata Banik wrote:
> Hope you are doing well.
> I would like bring this activity in your notice
> https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup and seek help to
> review the code.

thank you for bringing this to the mailing list. Not only but also
because `ichspi` is maybe the most important driver in flashrom it
is important to discuss changes very early. We have to avoid re-
gressions at all cost, so all changes need to be fully understood
before it makes sense to look into the implementation details.
[Subrata] Sure. I will try to explain the situation better and seek help from Anastasia to fill in if still some gap exists. 

> The reason for this refactoring of HW sequencing SPI driver code are:
> 1. We (Chrome OS team) recently ran into a firmware update issue on
> dogfooders (test device utilise by real users and share feedback) systems,
> where the attempt to perform SPI flash update operation from host-cpu
> (using underlying flashrom utility) is silently failing. Furthermore, we
> debug the issue with help of Intel and figure out the problem is related to
> multiple master is accessing the SPI flash and operation triggered by
> host-cpu side using flashrom (write and erase operation) is getting timed
> out (due to given lesser timeout boundary) due to underlying SPI bus is
> occupied by other master.

Does the issue persist with the following commit applied to libflashrom,
activated and integrated into futility?

  commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK

This would give us a hint if the issue is indeed due to multiple masters
or actually due to multiple programs trying to control the same master.
[Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes being integrated. The only code change that helps us to W/A this issue is here https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895 (increase timeout in flashrom operations). Also, Intel disable the CSE runtime access to SPI using a custom tool and that helped to fix this update failure as well. These are the evidence towards multi master theory in AU failure issue.

> This is the special case, starting from Tiger Lake Chrome Platform, where
> we have enable the PVAP (Protected Audio Video Path) which request CSE to
> perform some erase and write operation as needed.

We used flashrom on PAVP enabled systems for more than a decade without
issues. Did something change in that area for Tiger Lake in particular?
[Subrata]  I'm not sure about other platforms, but from TGL onwards, on Chrome devices we have enabled PAVP use case and Intel had tried to replicate this issue on older platforms as well and informed us it's replicated the issue on TGL platform as well where PAVP first being enabled and not on JSL (which doesn't enable PAVP). Right now the recommendation that is coming out from Intel is that, on latest Chrome OS devices, there might be concurrent SPI accesses between host CPU and CSE. But I will ask this question about the older generation platform as you have highlighted which has PAVP enabled. Can you please let me know those platform names ? Also,  do you know on those devices if we are trying to update the FW using flashrom tool as well or not?

> 2. Maintain the code symmetry between coreboot and flashrom SPI hardware
> sequencing operations.

Flashrom has tighter requirements compared to coreboot because we try to
support all compatible chipset generations with the same code. If you
want to align the two code-bases, please use flashrom as reference and
patch coreboot.
[Subrata] Let me understand this a little better, are you suggesting not to increase the SPI operation timing in flashrom upstream master ? knowing that there is a hidden issue acknowledged by Intel with their validation data, also, promised to update the SPI programming guide to capture the recommendation of timeout in multi master scenarios.

Apart from timeout increasing CLs, other code changes are to increase the code modularity (like using macro and helper function), are you suggesting to not touch the flashrom code? can't we all contribute towards validating the concerned target platforms. At Least in the coreboot community we all help each other to validate the platform if we are out of required devices.

Also, if you want to save some time in the long run: If we'd focus on
the flashrom code-base, e.g. prepare libflashrom for easier integration
into embedded environments, we could drop the redundant code in core-
boot.

[Subrata] Edward has some suggestions for increasing the code reusability (between coreboot and flaashrom) going forward, and will let him share his thoughts on this.
 
Cheers,
Nico