Hi,
There are two issues here:
1. The system fails to boot, i.e. it hangs.
2. The FPGAs connected to the root port are not detected in FW/OS.
For problem 1, can you give data on where exactly it hangs? Is it in the OS or in FW? Can you provide the kernel/coreboot log and a port 80 dump from when it hangs?
For problem 2, can you try setting the UPD PcieRpHotPlug and check the behaviour? Reference: https://review.coreboot.org/cgit/coreboot.git/tree/src/soc/intel/skylake/chi...
Can you please check the FPGA datasheet for the frequency tolerance of the PCIE_CLK_REF signal? Also, can you re-verify the board layout for any PDG (Platform Design Guide) violations such as impedance, length matching, and limits for the differential PCIE_CLK_REF and the PCIe lanes? Sometimes noise generated on the board can produce significant EMI. I assume noisy circuits (power supply, backlight driver, etc.) are kept away from the high-speed PCIe signals.
Regards,
Naresh G Solanki

On Mon, Oct 15, 2018 at 5:37 PM Christian Gmeiner christian.gmeiner@gmail.com wrote:
On Fri, Oct 12, 2018 at 10:15 AM Nico Huber nico.h@gmx.de wrote:
On 10/11/18 11:29 AM, Christian Gmeiner wrote:
During the last weeks I found the root cause of my problem: PCIe spread spectrum.
Our FPGAs need a stable 100 MHz PCIe clock to work. The FSP configuration we used looked like this:
    void mainboard_memory_init_params(FSPM_UPD *mupd)
    {
            FSP_M_CONFIG *mem_cfg;
            struct spd_block blk = {
                    .addr_map = { 0x50 },
            };

            mem_cfg = &mupd->FspmConfig;
            mem_cfg->PegDisableSpreadSpectrumClocking = 1;
            mem_cfg->PchPmPciePllSsc = 0;
            ...
    }
With this configuration the PCIe reference clock was off by more than 8%, which caused the system to hang during cold and warm boots.
In the next step I removed the assignment of PchPmPciePllSsc, as it is documented as 'No BIOS override'. With this change I got more than 1000 soft and 2000 hard reboots without any problem. Keep in mind we started with only 10 successful reboots.
Please be more specific about the final setting of this UPD. `No BIOS override` is the documentation for the default value of 0xff. But is it actually set to that default in the binary? Who knows...
    void mainboard_memory_init_params(FSPM_UPD *mupd)
    {
            FSP_M_CONFIG *mem_cfg;
            struct spd_block blk = {
                    .addr_map = { 0x50 },
            };

            mem_cfg = &mupd->FspmConfig;

            /* Disable PCIe Spread Spectrum Clocking */
            printk(BIOS_ERR, "PegDisableSpreadSpectrumClocking: %x\n",
                   mem_cfg->PegDisableSpreadSpectrumClocking);
            printk(BIOS_ERR, "PchPmPciePllSsc: %x\n", mem_cfg->PchPmPciePllSsc);
            mem_cfg->PegDisableSpreadSpectrumClocking = 1;

            get_spd_smbus(&blk);
            dump_spd_info(&blk);
            assert(blk.spd_array[0][0] != 0);

            mainboard_fill_dq_map_data(&mem_cfg->DqByteMapCh0);
            mainboard_fill_dqs_map_data(&mem_cfg->DqsMapCpu2DramCh0);
            mainboard_fill_rcomp_res_data(&mem_cfg->RcompResistor);
            mainboard_fill_rcomp_strength_data(&mem_cfg->RcompTarget);

            mem_cfg->DqPinsInterleaved = TRUE;
            mem_cfg->MemorySpdDataLen = blk.len;
            mem_cfg->MemorySpdPtr00 = (uintptr_t) blk.spd_array[0];
    }
And here is the output taken from cbmem -1:
    ..
    FMAP: base = ff000000 size = 1000000 #areas = 4
    FMAP: area RW_MRC_CACHE found @ a50000 (65536 bytes)
    MRC: no data in 'RW_MRC_CACHE'
    PegDisableSpreadSpectrumClocking: 0
    PchPmPciePllSsc: ff
    SPD @ 0x50
    SPD: module type is DDR4
    ..
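For reading the two UPD values in that log, a minimal sketch of the check may help. It assumes, as stated earlier in this thread, that 0xff is the documented 'No BIOS override' default of PchPmPciePllSsc. The struct mock_fspm_cfg and both helper functions are hypothetical stand-ins for illustration only; they are not part of FSP or coreboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Documented default per this thread: 'No BIOS override'. */
#define UPD_NO_BIOS_OVERRIDE 0xff

/* Hypothetical stand-in for the two FSP_M_CONFIG fields discussed here. */
struct mock_fspm_cfg {
	uint8_t PegDisableSpreadSpectrumClocking;
	uint8_t PchPmPciePllSsc;
};

/* Returns 1 if PchPmPciePllSsc holds anything other than the default. */
int pcie_pll_ssc_overridden(const struct mock_fspm_cfg *cfg)
{
	assert(cfg != NULL);
	return cfg->PchPmPciePllSsc != UPD_NO_BIOS_OVERRIDE;
}

/* Set the PEG UPD only and leave the PLL SSC UPD at whatever FSP ships. */
void apply_ssc_settings(struct mock_fspm_cfg *cfg)
{
	assert(cfg != NULL);
	cfg->PegDisableSpreadSpectrumClocking = 1;
}
```

Starting from the logged values (0 and ff), apply_ssc_settings() leaves PchPmPciePllSsc at 0xff, so pcie_pll_ssc_overridden() reports 0. Setting the field to 0 by hand would make it report 1.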
The big problem is that PegDisableSpreadSpectrumClocking has no effect at all. I measured the frequency and it is not the expected 100 MHz. I need a stable 100 MHz because this clock source is used internally by the FPGA to drive internal clocks. The end result is that EtherCAT is not able to sync.
This setting is about a different clock, I guess. Can you please clarify what is connected to which clock on your board?
Our FPGAs use the 100 MHz PCIe clock as input to drive internal clocks etc. One of these clocks is used for EtherCAT. If spread spectrum is active, the clock is only roughly 100 MHz (the worst measured frequency was ~92 MHz), and as a result the internal FPGA clock is not stable/reliable, so EtherCAT sync fails.
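To put the ~92 MHz figure in perspective, here is a minimal sketch of the deviation arithmetic. The limits are my assumption of the usual PCIe reference clock budget (100 MHz within +/-300 ppm, plus a down-spread of up to 0.5% when SSC is on); please verify them against the spec and the FPGA datasheet. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits, not taken from this thread; verify for your parts. */
#define REFCLK_TOL_PPM     300  /* +/-300 ppm around nominal */
#define SSC_DOWNSPREAD_PPM 5000 /* 0 to -0.5% down-spread with SSC on */

/* Signed deviation of a measured frequency from nominal, in ppm. */
int64_t deviation_ppm(int64_t measured_hz, int64_t nominal_hz)
{
	assert(nominal_hz > 0);
	return (measured_hz - nominal_hz) * 1000000 / nominal_hz;
}

/* Returns 1 if the deviation fits the assumed window, 0 otherwise. */
int refclk_in_window(int64_t ppm, int ssc_enabled)
{
	int64_t low = -REFCLK_TOL_PPM - (ssc_enabled ? SSC_DOWNSPREAD_PPM : 0);

	return ppm >= low && ppm <= REFCLK_TOL_PPM;
}
```

With these assumptions, deviation_ppm(92000000, 100000000) gives -80000 ppm, i.e. 8% low, which is far outside the assumed window even with SSC enabled.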
If I use the following pattern:

    mem_cfg->PegDisableSpreadSpectrumClocking = 1;
    mem_cfg->PchPmPciePllSsc = 0;

I get a stable 100 MHz PCIe clock signal and everything works, except that the device hangs after fewer than 10 warm reboots. It looks like PCIe link training fails countless times (seen with a protocol analyzer).
--
greets
--
Christian Gmeiner, MSc
https://christian-gmeiner.info