Hi,
There are two separate problems here:
1. The system fails to boot, i.e. it hangs.
2. The FPGAs connected to the root port are not detected in FW/OS.
For problem 1, can you tell us where exactly it hangs? Is it in the OS
or in FW? Can you provide the kernel/coreboot log and a port 80 dump
from when it hangs?
For problem 2, can you try setting the UPD PcieRpHotPlug and check the
behaviour? Reference:
https://review.coreboot.org/cgit/coreboot.git/tree/src/soc/intel/skylake/chi...
Can you please check the FPGA datasheet for the frequency tolerance the
FPGA specifies for the PCIE_CLK_REF signal?
Also, can you re-verify the board layout for any PDG (Platform Design
Guide) violations, such as impedance, length matching, and limits for
the differential PCIE_CLK_REF and the PCIe lanes?
Sometimes noise generated on the board can cause significant EMI. I
assume noisy circuits (power supply, backlight driver, etc.) are kept
away from the high-speed PCIe signals.
Regards,
Naresh G Solanki
On Mon, Oct 15, 2018 at 5:37 PM Christian Gmeiner
christian.gmeiner@gmail.com wrote:
On Fri, Oct 12, 2018 at 10:15 AM Nico Huber nico.h@gmx.de wrote:
On 10/11/18 11:29 AM, Christian Gmeiner wrote:
During the last weeks I found the root cause of my problem: PCIe
spread spectrum.
Our FPGAs need a stable 100 MHz PCIe clock to work. The FSP config I
used looked like this:
void mainboard_memory_init_params(FSPM_UPD *mupd)
{
	FSP_M_CONFIG *mem_cfg;
	struct spd_block blk = {
		.addr_map = { 0x50 },
	};
	mem_cfg = &mupd->FspmConfig;
	mem_cfg->PegDisableSpreadSpectrumClocking = 1;
	mem_cfg->PchPmPciePllSsc = 0;
	...
}
With this configuration the PCIe reference clock was off by more than 8%,
which caused the system to hang during cold and warm boots.
In the next step I removed the assignment of PchPmPciePllSsc, as it is
documented as 'No BIOS override'. With this change I got more than 1000
soft and 2000 hard reboots without any problem. Keep in mind we started
with only 10 successful reboots.
Please be more specific about the final setting of this UPD. `No BIOS
override` is the documentation for the default value of 0xff. But is
this actually set to the default in the binary? Who knows...
void mainboard_memory_init_params(FSPM_UPD *mupd)
{
	FSP_M_CONFIG *mem_cfg;
	struct spd_block blk = {
		.addr_map = { 0x50 },
	};
	mem_cfg = &mupd->FspmConfig;
	/* Disable PCIe Spread Spectrum Clocking */
	printk(BIOS_ERR, "PegDisableSpreadSpectrumClocking: %x\n",
		mem_cfg->PegDisableSpreadSpectrumClocking);
	printk(BIOS_ERR, "PchPmPciePllSsc: %x\n", mem_cfg->PchPmPciePllSsc);
	mem_cfg->PegDisableSpreadSpectrumClocking = 1;
	get_spd_smbus(&blk);
	dump_spd_info(&blk);
	assert(blk.spd_array[0][0] != 0);
	mainboard_fill_dq_map_data(&mem_cfg->DqByteMapCh0);
	mainboard_fill_dqs_map_data(&mem_cfg->DqsMapCpu2DramCh0);
	mainboard_fill_rcomp_res_data(&mem_cfg->RcompResistor);
	mainboard_fill_rcomp_strength_data(&mem_cfg->RcompTarget);
	mem_cfg->DqPinsInterleaved = TRUE;
	mem_cfg->MemorySpdDataLen = blk.len;
	mem_cfg->MemorySpdPtr00 = (uintptr_t) blk.spd_array[0];
}
And here is the output taken from cbmem -1:
..
FMAP: base = ff000000 size = 1000000 #areas = 4
FMAP: area RW_MRC_CACHE found @ a50000 (65536 bytes)
MRC: no data in 'RW_MRC_CACHE'
PegDisableSpreadSpectrumClocking: 0
PchPmPciePllSsc: ff
SPD @ 0x50
SPD: module type is DDR4
..
The big problem is that PegDisableSpreadSpectrumClocking has no effect
at all. I measured the frequency and it is not the expected 100 MHz. I
need a stable 100 MHz, because this clock source is used by the FPGA to
drive its internal clocks. The end result is that EtherCAT is not able
to sync.
This setting is about a different clock, I guess. Can you please clarify
what is connected to which clock on your board?
Our FPGAs use the 100 MHz PCIe clock as input to drive internal clocks
etc. One of these clocks is used for EtherCAT. If spread spectrum is
active, we are only roughly at 100 MHz (the worst measured frequency was
~92 MHz), and as a result the internal FPGA clock is not stable/reliable,
so EtherCAT sync fails.
If I use the following pattern:
mem_cfg->PegDisableSpreadSpectrumClocking = 1;
mem_cfg->PchPmPciePllSsc = 0;
I get a stable 100 MHz PCIe clock signal and everything works, except
that the device hangs after fewer than 10 warm reboots. It looks like
PCIe link training fails countless times (seen with a protocol
analyzer).
--
greets
--
Christian Gmeiner, MSc
https://christian-gmeiner.info