Hi
On Mon, 15 Oct 2018 at 15:26, Naresh G. Solanki naresh.solanki.2011@gmail.com wrote:
Hi,
There are two things here.
- System fails to boot i.e., hangs
Correct.
- FPGA's connected to root port are not detected in FW/OS
Wrong. The FPGAs are detected correctly and work most of the time. The only problem we have is that the PCIe reference clock is not at 100 MHz because spread spectrum is activated. As a result, the internal clock used for EtherCAT is not reliable, so we fail to sync with EtherCAT clients and no communication over EtherCAT works.
For problem 1, can you give data on where exactly it hangs? Is it in the OS or the FW? Can you provide the kernel/coreboot log and a port 80 dump from when it hangs?
It only hangs if I change the following values:
mem_cfg->PegDisableSpreadSpectrumClocking = 1;
mem_cfg->PchPmPciePllSsc = 0;
The 'physical' cause for the hang is also known: PLT_RST# never goes high again. There is a chance that it does not hang, but then PCI devices (SATA, USB HC, ...) do not work as expected, because the PCIe reference clock is at around 92 MHz in that case.
The end goal is to disable PCIe spread spectrum and get a constant PCIe reference clock of 100 MHz.
The system does not hang if I only change
mem_cfg->PegDisableSpreadSpectrumClocking = 1;
But it has no effect on the PCIe reference clock and it looks like spread spectrum is still used.
For problem 2, can you try setting the UPD PcieRpHotPlug and check the behaviour? Reference: https://review.coreboot.org/cgit/coreboot.git/tree/src/soc/intel/skylake/chi...
Can you please check the FPGA datasheet for the frequency tolerance of the PCIE_CLK_REF signal? Also, can you re-verify the board layout for any PDG (Platform Design Guide) violations such as impedance, length matching, and limits for the differential PCIE_CLK_REF and the PCIe lanes? Noise generated on the board can sometimes cause significant EMI. I assume noisy circuits (power supply, backlight driver, etc.) are kept away from the high-speed PCIe signals.
Regards,
Naresh G Solanki

On Mon, Oct 15, 2018 at 5:37 PM Christian Gmeiner christian.gmeiner@gmail.com wrote:
On Fri, 12 Oct 2018 at 10:15, Nico Huber nico.h@gmx.de wrote:
On 10/11/18 11:29 AM, Christian Gmeiner wrote:
During the last weeks I found the root cause of my problem: PCIe spread spectrum.
Our FPGAs need a stable 100 MHz PCIe clock to work. The FSP configuration in use looked like this:
void mainboard_memory_init_params(FSPM_UPD *mupd)
{
    FSP_M_CONFIG *mem_cfg;
    struct spd_block blk = {
        .addr_map = { 0x50 },
    };

    mem_cfg = &mupd->FspmConfig;
    mem_cfg->PegDisableSpreadSpectrumClocking = 1;
    mem_cfg->PchPmPciePllSsc = 0;
    ...
}
With this configuration the PCIe reference clock was off by more than 8%, which caused the system to hang during cold and warm boots.
In the next step I removed the assignment of PchPmPciePllSsc, as it is documented as 'No BIOS override'. With this change I got more than 1000 soft and 2000 hard reboots without any problem. Keep in mind we started with only 10 successful reboots.
Please be more specific about the final setting of this UPD. `No BIOS override` is the documentation for the default value of 0xff. But is this set to the default in the binary? Who knows...
void mainboard_memory_init_params(FSPM_UPD *mupd)
{
    FSP_M_CONFIG *mem_cfg;
    struct spd_block blk = {
        .addr_map = { 0x50 },
    };

    mem_cfg = &mupd->FspmConfig;
    /* Disable PCIe Spread Spectrum Clocking */
    printk(BIOS_ERR, "PegDisableSpreadSpectrumClocking: %x\n",
           mem_cfg->PegDisableSpreadSpectrumClocking);
    printk(BIOS_ERR, "PchPmPciePllSsc: %x\n", mem_cfg->PchPmPciePllSsc);
    mem_cfg->PegDisableSpreadSpectrumClocking = 1;

    get_spd_smbus(&blk);
    dump_spd_info(&blk);
    assert(blk.spd_array[0][0] != 0);
    mainboard_fill_dq_map_data(&mem_cfg->DqByteMapCh0);
    mainboard_fill_dqs_map_data(&mem_cfg->DqsMapCpu2DramCh0);
    mainboard_fill_rcomp_res_data(&mem_cfg->RcompResistor);
    mainboard_fill_rcomp_strength_data(&mem_cfg->RcompTarget);
    mem_cfg->DqPinsInterleaved = TRUE;
    mem_cfg->MemorySpdDataLen = blk.len;
    mem_cfg->MemorySpdPtr00 = (uintptr_t) blk.spd_array[0];
}
And here is the output taken from cbmem -1:
..
FMAP: base = ff000000 size = 1000000 #areas = 4
FMAP: area RW_MRC_CACHE found @ a50000 (65536 bytes)
MRC: no data in 'RW_MRC_CACHE'
PegDisableSpreadSpectrumClocking: 0
PchPmPciePllSsc: ff
SPD @ 0x50
SPD: module type is DDR4
..
The big problem is that PegDisableSpreadSpectrumClocking has no effect at all. I measured the frequency, and it is not the expected 100 MHz. I need a stable 100 MHz because this clock source is used by the FPGA to drive its internal clocks. The end result is that EtherCAT is not able to sync.
This setting is about a different clock, I guess. Can you please clarify what is connected to which clock on your board?
Our FPGAs use the 100 MHz PCIe clock as input to drive internal clocks etc. One of these clocks is used for EtherCAT. If spread spectrum is active, the clock is only roughly 100 MHz (the worst measured frequency was ~92 MHz), and as a result the internal FPGA clock is not stable/reliable --> EtherCAT sync fails.
If I use the following pattern:
mem_cfg->PegDisableSpreadSpectrumClocking = 1;
mem_cfg->PchPmPciePllSsc = 0;
I get a stable 100 MHz PCIe clock signal and everything works, except that the device hangs after fewer than 10 warm reboots. It looks like PCIe link training fails countless times (seen with a protocol analyzer).
-- greets -- Christian Gmeiner, MSc
https://christian-gmeiner.info
-- Best regards, Naresh G. Solanki