[SeaBIOS] Long delay: WARNING - Timeout at wait_reg8:81!

Mon Mar 12 18:38:52 CET 2018

On 03/09/2018 09:49 AM, Stephen Douthit wrote:
> On 03/09/2018 06:05 AM, Paul Menzel wrote:
>> Dear Stephen,
>>
>>
>> On 03/07/2018 07:24 PM, Stephen Douthit wrote:
>>> On 03/07/2018 12:41 PM, Kevin O'Connor wrote:
>>>> On Wed, Mar 07, 2018 at 12:33:36PM -0500, Stephen Douthit wrote:
>>>>> On 03/07/2018 10:33 AM, Paul Menzel wrote:
>>
>>>>>> On Tuesday, 06.03.2018, at 11:57 -0500, Stephen Douthit wrote:
>>>>>>> On 03/06/2018 11:04 AM, Paul Menzel wrote:
>>>>>>>> On 03/02/18 17:31, Kevin O'Connor wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Feb 27, 2018 at 02:17:08PM -0500, Stephen Douthit wrote:
>>>>>>>> […]
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.  I committed this series.
>>>>>>>> The second commit introduced a regression with coreboot on the
>>>>>>>> ASRock E350M1. There are a bunch of time-outs, causing the startup
>>>>>>>> to be really slow. Without a serial console, the user thinks it’s
>>>>>>>> not working and starts to debug.
>>>>>>>
>>>>>>> Looking through the user manual for that board I don't see that it
>>>>>>> has a TPM, or even the header for one, so a timeout seems correct.
>>>>>>
>>>>>> Indeed, no TPM is present.
>>>>>
>>>>> Thanks for confirming.
>>>>>
>>>>>>> Multiple 750ms timeouts do seem pretty painful though.  I hadn't
>>>>>>> considered that tis_probe() would be called multiple times if no TPM
>>>>>>> was present.
>>>>>>>
>>>>>>> What's the preferred way to have a probe function run and bail before
>>>>>>> rerunning the timeout?  Just put a static flag in tis_probe()?  The
>>>>>>> attached patch takes that approach.  Please let me know if that fixes
>>>>>>> the issue for you, or if there's some other preferred pattern I should
>>>>>>> use here.
>>>>>>
>>>>>> Unfortunately, that didn’t help.
>>>>>>
>>>>>> ```
>>>>>> $ git log --oneline -2
>>>>>> fd1cbb4 (HEAD -> master, origin/master, origin/HEAD) tpm: Save tis_probe() result to avoid a reun of lengthy timeouts
>>>>>> 5adc8bd tpm: Handle unimplemented TIS_REG_IFACE_ID in tis_get_tpm_version()
>>>>>> ```
>>>>>>
>>>>>> And the time-outs seem to be around 20 seconds or more. Please find the
>>>>>> log with time stamps attached (`sudo ./readserial.py /dev/ttyUSB0`).
>>>>>
>>>>> Yikes, 20 seconds is the medium duration timeout, not the default A
>>>>> timeout of 750ms.  I was poking the wrong area with the last patch.
>>>>> It looks like tis_probe() is propagating the return from
>>>>> tis_wait_access() in the no device present case.
>>>>
>>>> FYI, even adding 5ms to the boot time is unacceptable.  Is there
>>>> any way to verify the hardware exists before waiting for it to be
>>>> ready?
>>>
>>> The only way I know of would be to check if we have TCPA or TPM2 ACPI
>>> tables, and only attempt to probe for a TPM if those are present.
>>>
>>> Attached patch should do that, and it's probably a good idea
>>> independent of any of my other patches.
>>
>> I applied both of the latest commits and, from quick testing, I believe
>> the long delay is still there. I won’t be able to get to it until next
>> week, when I can also make the ACPI tables available. Maybe there is a
>> way to test this with QEMU? Kevin also owns the ASRock E350M1, to my
>> knowledge.
> 
> Thanks for the continued testing.  I don't have a good theory for
> what's going on at the moment.
> 
> It looks like there's a series resistor I can depop to isolate the TPM
> reset on the board I was testing on.  I should be able to jumper that
> so I can test the TPM and no-TPM cases on the same hardware.
> Hopefully I can reproduce the timeout that way.

I've got a board modded so I can jumper the TPM in and out.

What I found in the no-TPM case was that both tis_probe() and
crb_probe() incorrectly return 1 for device present if all Fs are read.

For tis_probe() that was because rc wasn't updated to 0 if didvid was
0xffffffff.  For crb_probe() the last three return statements are
inverted from what they should be, and the first 64-bit address check
returned the wrong value.  Fixing both probe functions got rid of the
timeout for me when the TPM was disconnected.
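
A minimal sketch of the class of bug described above, not the SeaBIOS
source.  The function names and the exact shape of the buggy version are
hypothetical; the only detail taken from this message is that the
result was not cleared to 0 when didvid read back as 0xffffffff.

```c
#include <stdint.h>

/* Hypothetical illustration only; names are not from SeaBIOS. */

/* Buggy pattern: the result starts out as "present" and is only
 * cleared for an all-0s read, so an all-Fs read (which is what an
 * access to a missing device typically returns) still reports 1. */
static int probe_buggy(uint32_t didvid)
{
    int rc = 1;
    if (didvid == 0)
        rc = 0;
    return rc;
}

/* Fixed pattern: assume "absent" and only report a device when the
 * ID register holds something other than all 0s or all Fs. */
static int probe_fixed(uint32_t didvid)
{
    int rc = 0;
    if (didvid != 0 && didvid != 0xffffffffu)
        rc = 1;
    return rc;
}
```

Defaulting to "absent" means any read pattern the check does not
explicitly recognize fails closed, so later code never waits on
hardware that is not there.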

It looks like there's a bit in the ACCESS register called Seize that
must always read '0' for the version 1.2/1.3 interfaces.  I'd like to
check that instead of didvid in tis_probe() to handle the case where an
aborted read returns all 0s/Fs.
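
A rough sketch of what such a check could look like, assuming Seize is
bit 3 and tpmRegValidSts is bit 7 of the ACCESS register (that is my
reading of the TIS layout and should be verified against the spec).
The macro, enum, and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the TIS specification. */
#define ACCESS_SEIZE          (1u << 3)  /* expected to always read 0 */
#define ACCESS_REG_VALID_STS  (1u << 7)  /* tpmRegValidSts */

enum access_state {
    ACCESS_ABSENT    = -1,  /* Seize reads 1: treat as no device */
    ACCESS_NOT_READY =  0,  /* valid bit not set yet: keep polling */
    ACCESS_READY     =  1   /* valid bit set and Seize clear */
};

/* Classify one snapshot of the ACCESS register. */
static enum access_state classify_access(uint8_t access)
{
    if (access & ACCESS_SEIZE)
        return ACCESS_ABSENT;
    if (!(access & ACCESS_REG_VALID_STS))
        return ACCESS_NOT_READY;
    return ACCESS_READY;
}
```

In this sketch an all-Fs read is rejected immediately, with no
timeout.  An all-0s read looks the same as a device that is not ready
yet, so it would still have to run out the poll timeout.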

I'd like to add a poll for tpmRegValidSts to crb_probe() similar to
what's in tis_probe() to avoid potential races on real hardware.
There's a Seize bit in TPM_LOC_CTRL_x which always reads 0 that we could
use as a sanity check against the no device all Fs case.
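
The proposed ordering could be sketched as below: check Seize first so
the no-device case bails out before any waiting, then poll for
tpmRegValidSts.  Everything here is an assumption for illustration:
the bit positions, the callback-based register access, the fake device
used for exercising it, and the iteration-count bound standing in for
a real timer-based timeout.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the CRB interface spec. */
#define LOC_CTRL_SEIZE           (1u << 2)
#define LOC_STATE_REG_VALID_STS  (1u << 7)

typedef uint32_t (*reg_read_fn)(void *ctx);

/* Returns 1 if a device appears present and ready, 0 otherwise. */
static int crb_probe_sketch(reg_read_fn read_loc_ctrl,
                            reg_read_fn read_loc_state,
                            void *ctx, unsigned max_polls)
{
    /* Seize should read 0 on real hardware; all Fs means no device. */
    if (read_loc_ctrl(ctx) & LOC_CTRL_SEIZE)
        return 0;

    /* Bounded poll for the valid bit (stand-in for a timed wait). */
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_loc_state(ctx) & LOC_STATE_REG_VALID_STS)
            return 1;
    }
    return 0;
}

/* Fake device: loc_state reads 0 until 'ready_after' reads have
 * happened, then reads back the configured value. */
struct fake_dev {
    uint32_t loc_ctrl;
    uint32_t loc_state;
    unsigned ready_after;
    unsigned reads;
};

static uint32_t fake_loc_ctrl(void *ctx)
{
    struct fake_dev *d = ctx;
    return d->loc_ctrl;
}

static uint32_t fake_loc_state(void *ctx)
{
    struct fake_dev *d = ctx;
    return (d->reads++ >= d->ready_after) ? d->loc_state : 0;
}
```

With a fake device whose registers all read 0xffffffff, the sketch
returns 0 without polling the state register at all, which is the
behavior wanted for boards with no TPM.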

Let me know if that sounds like a better way to catch the no device
case, or if there is some other check that would be better.

Thanks,
Steve