yanvasilij yan wrote:
I have two Intel Atom E3800 based boards. The first one is the older version, which works properly,
..
The second one is a modernized version: we added an 89HPES5T5 PCIe I/O expansion switch, and the WGI210IT based Ethernet ports are connected to the switch.
So the good news is that your log clearly shows the PCIe switch working correctly and both NICs behind it reachable by software.
We also reworked the power-up circuit of the E3800 SoC a bit.
This is quite possibly the root cause, but I wouldn't exclude any other possibilities.
Boot stops when payload loading starts. The boot log of this board is in the attached “not_working_board_log.txt”.
The log is very clear about why: (down near the end)
--8<-- SELF segment doesn't target RAM: 0x00800000, 4259840 bytes -->8--
Looking at the coreboot table a little further up, we see:
--8<--
Writing coreboot table at 0x3add3000
 0. 0000000000000000-0000000000000fff: CONFIGURATION TABLES
 1. 0000000000e00000-0000000000e39fff: RAMSTAGE
 2. 000000003ad9e000-000000003adfffff: CONFIGURATION TABLES
 3. 00000000feb00000-00000000fec00fff: RESERVED
 4. 00000000fed01000-00000000fed01fff: RESERVED
 5. 00000000fed03000-00000000fed03fff: RESERVED
 6. 00000000fed05000-00000000fed05fff: RESERVED
 7. 00000000fed08000-00000000fed08fff: RESERVED
 8. 00000000fed0c000-00000000fed0ffff: RESERVED
 9. 00000000fed1c000-00000000fed1cfff: RESERVED
10. 00000000fef00000-00000000feffffff: RESERVED
-->8--
Compare that with your working board:
--8<--
Writing coreboot table at 0x3add3000
 0. 0000000000000000-0000000000000fff: CONFIGURATION TABLES
 1. 0000000000001000-000000000009ffff: RAM
 2. 00000000000a0000-00000000000fffff: RESERVED
 3. 0000000000100000-0000000000dfffff: RAM
 4. 0000000000e00000-0000000000e39fff: RAMSTAGE
 5. 0000000000e3a000-000000003ad9dfff: RAM
 6. 000000003ad9e000-000000003adfffff: CONFIGURATION TABLES
 7. 000000003ae00000-000000003fffffff: RESERVED
 8. 00000000e0000000-00000000efffffff: RESERVED
 9. 00000000feb00000-00000000fec00fff: RESERVED
10. 00000000fed01000-00000000fed01fff: RESERVED
11. 00000000fed03000-00000000fed03fff: RESERVED
12. 00000000fed05000-00000000fed05fff: RESERVED
13. 00000000fed08000-00000000fed08fff: RESERVED
14. 00000000fed0c000-00000000fed0ffff: RESERVED
15. 00000000fed1c000-00000000fed1cfff: RESERVED
16. 00000000fee00000-00000000fee00fff: RESERVED
17. 00000000fef00000-00000000feffffff: RESERVED
-->8--
The new board ends up with no RAM regions in the coreboot table.
That results in the payload loader not finding RAM where the payload is to be loaded, so boot stops.
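To make the failure concrete, here is a minimal sketch of the kind of check behind that message: a segment is accepted only if it lies entirely inside one region marked RAM. This is not coreboot's actual loader code; `struct mem_range`, `segment_targets_ram()` and the two arrays are hypothetical names for illustration, and the ranges are copied from the two logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum range_type { RANGE_RAM, RANGE_RESERVED, RANGE_TABLES, RANGE_RAMSTAGE };

struct mem_range {
	uint64_t start;
	uint64_t end;		/* inclusive, as printed in the log */
	enum range_type type;
};

/* Low part of the working board's table (entries 0 to 5 of the log). */
const struct mem_range working_board[] = {
	{ 0x00000000, 0x00000fff, RANGE_TABLES },
	{ 0x00001000, 0x0009ffff, RANGE_RAM },
	{ 0x000a0000, 0x000fffff, RANGE_RESERVED },
	{ 0x00100000, 0x00dfffff, RANGE_RAM },
	{ 0x00e00000, 0x00e39fff, RANGE_RAMSTAGE },
	{ 0x00e3a000, 0x3ad9dfff, RANGE_RAM },
};

/* Low part of the failing board's table: no RAM entries at all. */
const struct mem_range failing_board[] = {
	{ 0x00000000, 0x00000fff, RANGE_TABLES },
	{ 0x00e00000, 0x00e39fff, RANGE_RAMSTAGE },
	{ 0x3ad9e000, 0x3adfffff, RANGE_TABLES },
};

/* True if [base, base + size) fits completely inside a single RAM range. */
bool segment_targets_ram(const struct mem_range *map, size_t count,
			 uint64_t base, uint64_t size)
{
	assert(map != NULL || count == 0);

	/* Reject empty segments and address wrap-around. */
	if (size == 0 || base + size - 1 < base)
		return false;

	uint64_t last = base + size - 1;
	for (size_t i = 0; i < count; i++) {
		if (map[i].type == RANGE_RAM &&
		    base >= map[i].start && last <= map[i].end)
			return true;
	}
	return false;
}
```

With the segment from the log (0x00800000, 4259840 bytes, so it ends at 0x00c0ffff), the working table accepts it because it fits inside the RAM range 0x00100000-0x00dfffff, while the failing table has nothing to match against.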
Why are there no RAM regions? I don't know.
Looking near the beginning of the log about FSP memory init:
--8<--
Memory Down Data Existed : Enabled
 - Speed (0: 800, 1: 1066, 2: 1333, 3: 1600): 1
 - Type (0: DDR3, 1: DDR3L) : 1
 - DIMM0 : Enabled
 - DIMM1 : Disabled
 - Width : x16
 - Density : 2Gbit
 - BudWidth : 64bit
 - Rank # : 1
 - tCL : 0B
 - tRPtRCD : 0B
 - tWR : 0C
 - tWTR : 06
 - tRRD : 06
 - tRTP : 06
 - tFAW : 14
Using 1066 MHz DDR3 settings.
1 GB Minnowboard Max detected.
romstage_main_continue status: 0 hob_list_ptr: 3ae20000
FSP Status: 0x0
PM1_STS = 0x1 PM1_CNT = 0x0 GEN_PMCON1 = 0x1001808
romstage_main_continue: prev_sleep_state = S0
Baytrail Chip Variant: Bay Trail-I (ISG/embedded)
MRC v0.102
1 channels of DDR3 @ 1066MHz
-->8--
It appears OK - but do check that those numbers actually match the DRAM chips assembled on the board. Are DRAM parts identical between old and new?
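As a quick sanity check when comparing those numbers with the parts on the board, the capacity implied by the memory-down parameters can be worked out by hand: a 64-bit bus built from x16 devices needs 4 chips per rank, and 4 chips of 2 Gbit give 8 Gbit, i.e. 1 GiB, which agrees with the "1 GB" line in the log. A small sketch of that arithmetic, with a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total DRAM size in MiB, computed from the bus width, the data width of
 * one DRAM device, the density of one device in Mbit, and the rank count.
 */
uint64_t dram_size_mib(unsigned bus_width_bits, unsigned device_width_bits,
		       uint64_t density_mbit, unsigned ranks)
{
	/* The bus must be made up of a whole number of devices. */
	assert(device_width_bits != 0 &&
	       bus_width_bits % device_width_bits == 0);

	unsigned chips_per_rank = bus_width_bits / device_width_bits;

	/* Mbit to MiB: divide by 8. */
	return chips_per_rank * density_mbit * ranks / 8;
}
```

For the values in the log, `dram_size_mib(64, 16, 2048, 1)` gives 1024. If the chips actually fitted have a different width or density, the result will no longer match what the log reports.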
Were there *any* hardware changes between SoC and RAM?
That's worth checking, but..
nico_h in the IRC chat noticed that on the non-working board a strange device appears with vid/did PCI: 00:00.0 [8086/0000].
The 0000 is a HUGE red sign, screaming to be thoroughly investigated.
This also hints that the power up changes may be the problem.
It's VERY unlikely that Intel has suddenly released a variant of this particular SoC with PCI DID=0000 when it used to be DID=0f00. In fact it's really unlikely that 0000 would be used in correct operation at all.
Very likely, on the other hand, is that the SoC isn't being powered on correctly and so ends up in some half-initialized state, with the memory controller not working. Some part of coreboot seems to notice (hence no RAM regions in the coreboot table), but clearly that isn't causing a fatal error, which I think is a bug. Oh well.
If you go through every single powerup hardware change together with a hardware engineer, starting with the previous circuit and manually applying one change at a time, maybe you can find one or even more changes causing that device ID symptom. It depends on how many changes you have there, but with a good hardware engineer you could perhaps go through them all in a couple days, which would be really fast results for a problem like this.
Maybe even simpler, hack this into some early part of the code, maybe even console_init() works, if pci_early is available there.
	while (1) {
		/* The device ID is the 16-bit field at config offset 0x02. */
		u16 did = pci_read_config16(PCI_DEV(0, 0, 0), PCI_DEVICE_ID);

		if (did == 0x0f00)
			printk(BIOS_INFO, "PASS\n");
		else
			printk(BIOS_INFO, "FAIL: want 0f00 is %04x\n", did);
	}
Then hardware engineering can do analysis on their own. But make sure to confirm that your test is reliable, using the hardware you have (old and new) before you give a flash image to them.
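One detail to keep in mind when adapting a check like this: in PCI configuration space the vendor ID is the 16-bit field at offset 0x00 and the device ID is the 16-bit field at offset 0x02, so a 32-bit read at offset 0 returns both, with the device ID in the upper half. A hedged, standalone sketch of the decoding follows. It is plain C rather than coreboot code, and the macro and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTEL_VENDOR_ID		0x8086
#define EXPECTED_SOC_DID	0x0f00	/* DID reported on the working board */

/* Vendor ID: low 16 bits of the dword at config offset 0. */
uint16_t id_vendor(uint32_t id_dword)
{
	return (uint16_t)(id_dword & 0xffff);
}

/* Device ID: high 16 bits of the dword at config offset 0. */
uint16_t id_device(uint32_t id_dword)
{
	return (uint16_t)(id_dword >> 16);
}

/* True only when both halves match the expected SoC identification. */
bool soc_id_ok(uint32_t id_dword)
{
	return id_vendor(id_dword) == INTEL_VENDOR_ID &&
	       id_device(id_dword) == EXPECTED_SOC_DID;
}
```

A dword of 0x0f008086 passes, while 0x00008086 (the 8086/0000 case from the failing log) does not. Comparing the whole dword against 0x0f00 would never match, since the vendor ID occupies the low half.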
Oh, and test on multiple new boards, a single unit in a new batch isn't representative. New PCB; potentially the process has to be tuned. I don't know how early in bringup you are.
Good luck and have fun! :)
//Peter
On 10/28/18 12:25 AM, Peter Stuge wrote:
Why are there no RAM regions? I don't know.
Quite simple, because the code that adds them is tied to 8086:0f00.
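Roughly, the mechanism looks like this: a chipset driver is bound to a device by matching vendor and device ID, and only a bound driver gets to report resources such as RAM. The following is a simplified, self-contained sketch, not the actual coreboot code; the struct, the table and the function name are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct pci_driver_entry {
	uint16_t vendor;
	uint16_t device;
	const char *name;
};

/* Hypothetical driver table with a single entry for the SoC. */
const struct pci_driver_entry drivers[] = {
	{ 0x8086, 0x0f00, "SoC host bridge (would add the RAM regions)" },
};

/* Return the matching driver, or NULL if no entry has this vid/did. */
const struct pci_driver_entry *find_driver(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (drivers[i].vendor == vendor && drivers[i].device == device)
			return &drivers[i];
	}
	return NULL;
}
```

With the device reading back as 8086:0000, the lookup returns NULL, so the code that would have added the RAM regions never runs.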
nico_h in the IRC chat noticed that on the non-working board a strange device appears with vid/did PCI: 00:00.0 [8086/0000].
The 0000 is a HUGE red sign, screaming to be thoroughly investigated.
This also hints that the power up changes may be the problem.
I agree, but note that in the datasheet (or EDS at least), the DID is not read-only. A strap (that I couldn't find more about) is supposed to set the initial value, plus there is a message mentioned (to be sent to something I didn't look into) that may change it.
So alternatively to diagnosing the hardware changes, you could also follow the bread crumbs in the documentation.
Nico
Thanks a lot to all!
Sorry for the late response!
You were right, the problem was in the power-up sequence. After fixing it, everything works correctly. The 0x8086/0x0000 device disappeared, and the payload loads successfully.
Peter Stuge, thank you for detailed comments!
Nico Huber, thank you for active participation!