[coreboot] Weird problems with Intel Skylake based board

Christian Gmeiner christian.gmeiner at gmail.com
Tue Oct 16 09:46:06 CEST 2018


Hi

On Mon, Oct 15, 2018 at 3:26 PM Naresh G. Solanki
<naresh.solanki.2011 at gmail.com> wrote:
>
> Hi,
>
> There are two things here.
> 1.  System fails to boot i.e., hangs

Correct.

> 2.  FPGA's connected to root port are not detected in FW/OS

Wrong. The FPGAs are detected correctly and work most of the time. The
only problem we have is that the PCIe reference clock is not a constant
100 MHz because spread spectrum is active. This makes the internal clock
used for EtherCAT unreliable, so we fail to sync with the EtherCAT
clients and no communication over EtherCAT works.
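
To make the effect concrete, here is a minimal sketch in plain C (not
coreboot or FPGA code). It assumes a down-spread of 0.5% with a symmetric
triangular modulation profile, which is the usual PCIe SSC arrangement but
is an assumption here, not something measured on this board. The function
names and the 1/4 PLL ratio used in the note below are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest instantaneous frequency of a down-spread clock.
 * spread_ppm is the peak down-spread, e.g. 5000 for 0.5%. */
int64_t ssc_min_hz(int64_t nominal_hz, int64_t spread_ppm)
{
    assert(nominal_hz > 0 && spread_ppm >= 0 && spread_ppm < 1000000);
    return nominal_hz - nominal_hz * spread_ppm / 1000000;
}

/* Long-term average, assuming a symmetric triangular profile: the mean
 * sits halfway between the nominal and the minimum frequency. */
int64_t ssc_mean_hz(int64_t nominal_hz, int64_t spread_ppm)
{
    return (nominal_hz + ssc_min_hz(nominal_hz, spread_ppm)) / 2;
}

/* Frequency of a clock derived from the reference by a PLL ratio mul/div. */
int64_t derived_hz(int64_t ref_hz, int64_t mul, int64_t div)
{
    assert(ref_hz > 0 && mul > 0 && div > 0);
    return ref_hz * mul / div;
}
```

Under these assumptions ssc_min_hz(100000000, 5000) is 99.5 MHz and
ssc_mean_hz(100000000, 5000) is 99.75 MHz, so a clock derived with a
hypothetical ratio of 1/4 would average 24.9375 MHz instead of 25 MHz and
keep moving around that value.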

>
> For problem 1, can you give data on where exactly it hangs? Is it in
> the OS or FW? Can you provide the kernel/coreboot log and a port 80
> dump from when it hangs?

It only hangs if I change the following values:

    mem_cfg->PegDisableSpreadSpectrumClocking = 1;
    mem_cfg->PchPmPciePllSsc = 0;

The 'physical' cause of the hang is also known: PLT_RST# never goes
high again. Sometimes it does not hang, but then PCI devices (SATA,
USB HC, ...) do not work as expected, because in that case the PCIe
reference clock is at around 92 MHz.

The end goal is to disable PCIe spread spectrum and get a constant
PCIe reference clock of 100 MHz.
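
To put a measurement such as 92 MHz into numbers, here is a small sketch
in plain C (not coreboot code; all names are made up for illustration). It
assumes the commonly quoted PCIe reference clock limits of 100 MHz
+/-300 ppm without SSC and a down-spread of at most 0.5% with SSC. Please
check those figures against the spec revision the board is designed for.

```c
#include <assert.h>
#include <stdint.h>

#define REFCLK_NOMINAL_HZ  100000000LL /* 100 MHz PCIe reference clock */
#define REFCLK_TOL_PPM     300         /* assumed +/- tolerance without SSC */
#define SSC_DOWNSPREAD_PPM 5000        /* assumed max down-spread (0.5%) */

/* Deviation of a measured frequency from nominal, in parts per million. */
int64_t refclk_deviation_ppm(int64_t measured_hz)
{
    assert(measured_hz > 0);
    return (measured_hz - REFCLK_NOMINAL_HZ) * 1000000LL / REFCLK_NOMINAL_HZ;
}

/* 1 if the measurement fits a clock without SSC (within +/-300 ppm). */
int refclk_ok_without_ssc(int64_t measured_hz)
{
    int64_t ppm = refclk_deviation_ppm(measured_hz);
    return ppm >= -REFCLK_TOL_PPM && ppm <= REFCLK_TOL_PPM;
}

/* 1 if the measurement fits a clock with SSC: down-spread only,
 * plus the same tolerance on both ends. */
int refclk_ok_with_ssc(int64_t measured_hz)
{
    int64_t ppm = refclk_deviation_ppm(measured_hz);
    return ppm >= -(SSC_DOWNSPREAD_PPM + REFCLK_TOL_PPM) &&
           ppm <= REFCLK_TOL_PPM;
}
```

For 92 MHz, refclk_deviation_ppm(92000000) gives -80000 ppm, i.e. about
-8%, which is far outside either window. A clock at 99.5 MHz would pass
the SSC check but fail the check for a clock without SSC.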

The system does not hang if I only change

    mem_cfg->PegDisableSpreadSpectrumClocking = 1;

But it has no effect on the PCIe reference clock and it looks like
spread spectrum is still used.
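
The variant that boots reliably can be written down as a sketch like the
one below. It is only an illustration: the struct is a stand-in for the
two UPD fields (the real FSP_M_CONFIG comes from the FSP headers), printf
stands in for printk, and the names ssc_upd, ssc_upd_dump,
ssc_upd_apply_peg_only and PCIE_PLL_SSC_NO_OVERRIDE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the two FSP-M UPD fields discussed in this thread. */
struct ssc_upd {
    uint8_t PegDisableSpreadSpectrumClocking;
    uint8_t PchPmPciePllSsc;
};

/* Default value, documented as 'No BIOS override'. */
#define PCIE_PLL_SSC_NO_OVERRIDE 0xff

/* Print both fields so the values handed to FSP end up in the log. */
void ssc_upd_dump(const char *tag, const struct ssc_upd *upd)
{
    assert(tag != NULL && upd != NULL);
    printf("%s: PegDisableSpreadSpectrumClocking: %x\n", tag,
           (unsigned int)upd->PegDisableSpreadSpectrumClocking);
    printf("%s: PchPmPciePllSsc: %x\n", tag,
           (unsigned int)upd->PchPmPciePllSsc);
}

/* Only touch the PEG setting. PchPmPciePllSsc is deliberately left at
 * whatever the binary provides, because forcing it to 0 is the change
 * that leads to the hang. */
void ssc_upd_apply_peg_only(struct ssc_upd *upd)
{
    assert(upd != NULL);
    ssc_upd_dump("before", upd);
    upd->PegDisableSpreadSpectrumClocking = 1;
    ssc_upd_dump("after", upd);
}
```

Starting from the defaults 0 and ff, this leaves PchPmPciePllSsc at ff and
only flips the PEG setting, which is the case that boots but does not
change the measured clock.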

> For problem 2, Can you try setting UPD PcieRpHotPlug, & check the
> behaviour. reference:
> https://review.coreboot.org/cgit/coreboot.git/tree/src/soc/intel/skylake/chip_fsp20.c#n300
>
> Can you please check the FPGA datasheet for the frequency tolerance it
> specifies for the PCIE_CLK_REF signal?
> Also, can you re-verify the board layout for any PDG (Platform Design
> Guide) violations such as impedance, length matching, and limits for
> the differential PCIE_CLK_REF and PCIe lanes?
> Sometimes noise generated on the board can cause huge EMI. I assume
> noisy circuits (power supply, backlight driver etc.) are kept away
> from the high-speed PCIe signals.
>
> Regards,
> Naresh G Solanki
> On Mon, Oct 15, 2018 at 5:37 PM Christian Gmeiner
> <christian.gmeiner at gmail.com> wrote:
> >
> > On Fri, Oct 12, 2018 at 10:15 AM Nico Huber <nico.h at gmx.de> wrote:
> > >
> > > On 10/11/18 11:29 AM, Christian Gmeiner wrote:
> > > > During the last few weeks I found the root cause of my problem: PCIe
> > > > spread spectrum.
> > > >
> > > > Our FPGAs need a stable 100 MHz PCIe clock to work. The FSP
> > > > configuration we used looked like this:
> > > >
> > > > void mainboard_memory_init_params(FSPM_UPD *mupd)
> > > > {
> > > >     FSP_M_CONFIG *mem_cfg;
> > > >     struct spd_block blk = {
> > > >         .addr_map = { 0x50 },
> > > >     };
> > > >
> > > >     mem_cfg = &mupd->FspmConfig;
> > > >
> > > >     mem_cfg->PegDisableSpreadSpectrumClocking = 1;
> > > >     mem_cfg->PchPmPciePllSsc = 0;
> > > >
> > > >     ...
> > > > }
> > > >
> > > > With this configuration the PCIe reference clock was off by more than 8%,
> > > > which caused the system to hang during cold and warm boots.
> > > >
> > > > In the next step I removed the assignment of PchPmPciePllSsc as it is
> > > > documented as 'No BIOS override'. With this change I got more than 1000
> > > > soft and 2000 hard reboots without any problem. Keep in mind we started
> > > > with only 10 successful reboots.
> > >
> > > Please be more specific about the final setting of this UPD. `No BIOS
> > > override` is the documentation for the default value of 0xff. But is
> > > this set to the default in the binary? Who knows...
> > >
> >
> > void mainboard_memory_init_params(FSPM_UPD *mupd)
> > {
> >     FSP_M_CONFIG *mem_cfg;
> >     struct spd_block blk = {
> >         .addr_map = { 0x50 },
> >     };
> >
> >     mem_cfg = &mupd->FspmConfig;
> >
> >     /* Disable PCIe Spread Spectrum Clocking */
> >     printk(BIOS_ERR, "PegDisableSpreadSpectrumClocking: %x\n",
> >            mem_cfg->PegDisableSpreadSpectrumClocking);
> >     printk(BIOS_ERR, "PchPmPciePllSsc: %x\n", mem_cfg->PchPmPciePllSsc);
> >     mem_cfg->PegDisableSpreadSpectrumClocking = 1;
> >
> >     get_spd_smbus(&blk);
> >     dump_spd_info(&blk);
> >     assert(blk.spd_array[0][0] != 0);
> >
> >     mainboard_fill_dq_map_data(&mem_cfg->DqByteMapCh0);
> >     mainboard_fill_dqs_map_data(&mem_cfg->DqsMapCpu2DramCh0);
> >     mainboard_fill_rcomp_res_data(&mem_cfg->RcompResistor);
> >     mainboard_fill_rcomp_strength_data(&mem_cfg->RcompTarget);
> >
> >     mem_cfg->DqPinsInterleaved = TRUE;
> >     mem_cfg->MemorySpdDataLen = blk.len;
> >     mem_cfg->MemorySpdPtr00 = (uintptr_t) blk.spd_array[0];
> > }
> >
> > And here is the output taken from cbmem -1:
> >
> > ..
> > FMAP: base = ff000000 size = 1000000 #areas = 4
> > FMAP: area RW_MRC_CACHE found @ a50000 (65536 bytes)
> > MRC: no data in 'RW_MRC_CACHE'
> > PegDisableSpreadSpectrumClocking: 0
> > PchPmPciePllSsc: ff
> > SPD @ 0x50
> > SPD: module type is DDR4
> > ..
> >
> > > >
> > > > The big problem is that PegDisableSpreadSpectrumClocking has no effect
> > > > at all. I measured the frequency and it is not 100 MHz as expected.
> > > > I need a stable 100 MHz because this clock source is used by the FPGA
> > > > to drive its internal clocks. The end result is that EtherCAT is not
> > > > able to sync.
> > >
> > > This setting is about a different clock, I guess. Can you please clarify
> > > what is connected to which clock on your board.
> > >
> >
> > Our FPGAs use the 100 MHz PCIe clock as input to drive internal
> > clocks etc. One of these clocks is used for EtherCAT. If spread
> > spectrum is active we are only roughly at 100 MHz (the worst measured
> > frequency was ~92 MHz), and as a result the internal FPGA clock is not
> > stable/reliable --> EtherCAT sync fails.
> >
> > If I use the following pattern:
> > mem_cfg->PegDisableSpreadSpectrumClocking = 1;
> > mem_cfg->PchPmPciePllSsc = 0;
> >
> > I get a stable 100 MHz PCIe clock signal and everything works, except
> > that the device hangs after fewer than 10 warm reboots. It looks like
> > PCIe link training fails countless times (seen with a protocol
> > analyzer).
> >
> > --
> > greets
> > --
> > Christian Gmeiner, MSc
> >
> > https://christian-gmeiner.info
> >
> > --
> > coreboot mailing list: coreboot at coreboot.org
> > https://mail.coreboot.org/mailman/listinfo/coreboot
>
>
>
> --
> Best regards,
> Naresh G. Solanki



-- 
greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info


