Since it is a race condition, many factors determine whether or not it locks up the SoC, so my statement that it's a problem on SKUs with more than 4 cores is just a generalization. Anything that changes the timing of the SMM relocation code on each core during MP init can mask the problem. Frankly, it's just a fluke that it wasn't caught during the original development of the Broadwell-DE solution, and only because it can't be reproduced on the Camelback Mountain CRBs with the stock 4-core SKU that all of them were built with back in the day.
Serialization of the SMM relocation code, which otherwise runs in parallel on all cores, is what masks the race condition. That serialization can come either from turning up the console output so that the printks in the code path are executed, or from a call to wbinvd() in function smm_relocation_handler() in file src/soc/intel/fsp_broadwell_de/smmrelocate.c. But again, this only masks the problem; it does not fix the underlying root cause.
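To make "serialization masks the race" concrete, here is a minimal toy model in plain C. It is not the Broadwell-DE bug: the root cause is still unknown, and the shared word, the function names, and the worst-case interleaving are all assumptions made purely for illustration. Each simulated core does a two-step read-modify-write on one shared value.

```c
#include <stddef.h>

/* Toy model only: the real shared resource in smmrelocate.c is unknown. */
enum { MAX_CORES = 64 };

/*
 * Worst-case parallel interleaving: every core reads the shared word
 * before any core writes it back, so all but one update is lost.
 */
unsigned int run_unserialized(size_t cores)
{
	unsigned int shared = 0;
	unsigned int snapshot[MAX_CORES];
	size_t i;

	if (cores > MAX_CORES)
		cores = MAX_CORES;

	for (i = 0; i < cores; i++)
		snapshot[i] = shared;		/* step 1: every core reads */
	for (i = 0; i < cores; i++)
		shared = snapshot[i] + 1;	/* step 2: every core writes */

	return shared;
}

/*
 * Serialized execution: each core finishes its read-modify-write before
 * the next core starts, so no update is lost.
 */
unsigned int run_serialized(size_t cores)
{
	unsigned int shared = 0;
	size_t i;

	for (i = 0; i < cores; i++) {
		unsigned int snapshot = shared;	/* read */
		shared = snapshot + 1;		/* write */
	}

	return shared;
}
```

With 12 simulated cores, run_unserialized() ends with the shared value at 1 while run_serialized() ends at 12. The unsafe read-modify-write is the same in both functions; only the ordering differs, which is why a side effect that happens to serialize the cores can hide a race without removing it.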
My point is that the race condition exists in the code regardless of which SKU you are using. The number of cores that execute the SMM relocation code in parallel just increases the chance of a lockup. Our experience with a 12-core SKU was that it locked up 100% of the time when the console level was dialed down, before we introduced the wbinvd() workaround. Others on this list have reported similar results with other SKUs with more than 4 cores. But again, with race conditions, many factors affect whether or not a lockup occurs.
I immediately recognized the last POST code dumped to the console in the original message before it hung: it is the same one we saw with our hang. That's the last POST code you get before the MP init code takes off and ultimately runs the SMM relocation handler code.
Bottom line is that at some point somebody should investigate the underlying root cause and fix it right. At the moment, we are busy with other things and don't have the bandwidth to look into it deeper, but at least we have a pretty good idea of where the problem lies.
- Jay
Jay Talbott Principal Consulting Engineer SysPro Consulting, LLC 3057 E. Muirfield St. Gilbert, AZ 85298 (480) 704-8045 (480) 445-9895 (FAX) JayTalbott@sysproconsulting.com http://www.sysproconsulting.com
-----Original Message-----
From: Frans Hendriks [mailto:fhendriks@eltan.com]
Sent: Thursday, February 28, 2019 1:13 AM
To: coreboot@coreboot.org
Cc: 'Jay Talbott'; aurelio@platinasystems.com
Subject: [coreboot] Re: Intel Xeon D-1577 (16-core)
This relocation is performed in a later stage, so some output is expected. Is (correct) microcode included for D-1577?
I assume wbinvd() must be included in the smm_relocation() function. (We have implemented coreboot on several Broadwell-DE boards with >4 cores without using wbinvd() or 8:SPEW.)
Best regards, Frans Hendriks Eltan B.V.
-----Original Message-----
From: Jay Talbott [mailto:JayTalbott@sysproconsulting.com]
Sent: Wednesday, February 27, 2019 22:19
To: aurelio@platinasystems.com; coreboot@coreboot.org
Subject: [coreboot] Re: Intel Xeon D-1577 (16-core)
There's a bug in the SMM relocation code for Broadwell-DE that causes a race condition resulting in the SoC locking up during the MP init. With the stock 4-core SKU that comes in most of the Camelback Mountain CRBs, it's not a problem, which is why Intel didn't find it when they developed the original coreboot implementation for the CRB. But as has been reported previously on this list, it becomes a problem with more than 4 cores. We actually have an open ticket with Intel to see if they are willing to diagnose the root cause and fix it, but I have zero expectation that any action will ever be done on their part at this point.
If you turn up your console output level to 8:SPEW, the problem will go away, as the extra printks that get enabled in that case result in serialization of the SMM relocation code on each core, thus masking the race condition (try this first!).
Another workaround is to insert a call to wbinvd() in function smm_relocation_handler() in file src/soc/intel/fsp_broadwell_de/smmrelocate.c. This will also result in serialization that masks the race condition (fixed it for us on a 12-core SKU) without needing to have the console turned all the way up.
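As a rough sketch of what that workaround looks like, the fragment below shows a wbinvd() call at the top of a relocation handler. Everything here apart from the names wbinvd() and smm_relocation_handler() is an assumption: the handler signature is simplified, relocate_one_cpu() is a hypothetical placeholder for the existing body, the exact position of the call inside the handler is a guess, and wbinvd() is stubbed out because the real WBINVD instruction is privileged and cannot run in a user-space test build.

```c
#include <stdint.h>

/* Stub standing in for the real wbinvd(); it only counts invocations. */
unsigned int wbinvd_calls;

void wbinvd(void)
{
	wbinvd_calls++;
}

/* Hypothetical placeholder for the handler's existing per-core work. */
uint32_t relocated_mask;

void relocate_one_cpu(int cpu)
{
	relocated_mask |= (uint32_t)1 << cpu;	/* mark this core as done */
}

/*
 * Simplified handler (assumed signature). The only change the workaround
 * makes is the wbinvd() call; placing it first is an assumption.
 */
void smm_relocation_handler(int cpu)
{
	wbinvd();		/* workaround: masks the race, does not fix it */
	relocate_one_cpu(cpu);
}
```

In the real tree the call would go into the existing smm_relocation_handler() rather than a rewritten one, and it should be treated as a stopgap until the actual race is found.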
At some point somebody needs to dig into the actual code in smmrelocate.c and identify the root cause of the actual race condition. We just haven't had the time to do any further investigation into the root cause since we have a working workaround.
Hope that helps...
- Jay
-----Original Message-----
From: aurelio@platinasystems.com [mailto:aurelio@platinasystems.com]
Sent: Wednesday, February 27, 2019 12:46 PM
To: coreboot@coreboot.org
Subject: [coreboot] Intel Xeon D-1577 (16-core)
Hello,
I have a daughter card (DC) based on Intel's Camelback Mountain CRB. Coreboot won't boot when a DC is populated with a 16-core Xeon D-1577 processor. Nothing is printed in the boot process, so it doesn't seem to be getting very far. However, if I program the boot SPI with AMI BIOS instead of coreboot, then everything is hunky-dory. It boots up all the way into Linux (see below for the platform information when AMI is loaded). In addition, if the DC is populated with a 4-core D-1527 or 2-core D-1508, then coreboot has no issues (see below for info).
Is there any configuration that I need to change in coreboot to support the D-1577?
thanks!
## AMI BIOS ##
@unassigned:~$ inxi -F
System:    Host: unassigned Kernel: 4.13-platina-mk1 x86_64 (64 bit) Console: tty 0 Distro: Debian GNU/Linux 8
Machine:   Mobo: Default string model: Default string v: Default string
           Bios: American Megatrends v: 5.11 date: 05/31/2017
CPU:       Octa core Intel Xeon D-1577 (-HT-MCP-) cache: 24576 KB
           Clock Speeds: 1: 1300 MHz 2: 1300 MHz 3: 1300 MHz 4: 1300 MHz 5: 1300 MHz 6: 1300 MHz 7: 1300 MHz 8: 1300 MHz
Graphics:  Card: Failed to Detect Video Card!
           Display Server: N/A driver: N/A tty size: 80x24 Advanced Data: N/A out of X
Network:   Card-1: Broadcom Device b960
           IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
           Card-2: Broadcom Device b960
           IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
           Card-3: Intel Device 15ab driver: ixgbe
           IF: eth1 state: down mac: 00:a0:c9:00:00:00
           Card-4: Intel Device 15ab driver: ixgbe
           IF: eth2 state: down mac: 34:12:78:56:01:00
           Card-5: Intel I210 Gigabit Network Connection driver: igb
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 50:18:4c:00:16:a1
Drives:    HDD Total Size: 520.1GB (4.0% used)
           ID-1: /dev/sda model: TS512ZBTDM1500T size: 512.1GB
           ID-2: USB /dev/sdb model: Echo size: 8.0GB
Partition: ID-1: / size: 451G used: 942M (1%) fs: ext4 dev: /dev/sda1
           ID-2: swap-1 size: 20.75GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 56.0C mobo: N/A
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 118 Uptime: 17:49 Memory: 110.9/32087.7MB Init: systemd runlevel: 5
           Client: Shell (bash) inxi: 2.1.28
## Coreboot on 4-core D-1527: ##
root@invader0:~#
POST: 0x4a
romstage_main_continue status: 0 hob_list_ptr: 7f100000
FSP Status: 0x0
POST: 0x4b
POST: 0x4c
POST: 0x4d
CBMEM:
IMD: root @ 7efff000 254 entries.
IMD: root @ 7effec00 62 entries.
POST: 0x4e
CBFS: 'Master Header Locator' located CBFS at [800100:ffffc0)
CBFS: Locating 'fallback/ramstage'
CBFS: Found @ offset 33b80 size d857

coreboot-v0.4-5-g0e4829a5b5 Wed Jun 20 18:38:46 UTC 2018 ramstage starting...
POST: 0x39
Moving GDT to 7effe9e0...ok
POST: 0x80
##
root@invader0:~# inxi -F
System:    Host: invader0 Kernel: 4.13-platina-mk1 x86_64 (64 bit) Console: tty 0 Distro: Debian GNU/Linux 8
Machine:   Mobo: Intel model: Camelback Mountain Platina DC v: 1.0 serial: 123456789
           Bios: coreboot v: v0.4-5-g0e4829a5b5 date: 06/20/2018
CPU:       Quad core Intel Xeon D-1527 (-HT-MCP-) cache: 6144 KB
           Clock Speeds: 1: 2194 MHz 2: 2194 MHz 3: 2194 MHz 4: 2194 MHz 5: 2194 MHz 6: 2194 MHz 7: 2194 MHz 8: 2194 MHz
Graphics:  Card: Failed to Detect Video Card!
           Display Server: N/A driver: N/A tty size: 80x24 Advanced Data: N/A for root out of X
Network:   Card-1: Broadcom Device b960 driver: vfio-pci
           IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
           Card-2: Broadcom Device b960 driver: vfio-pci
           IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
           Card-3: Intel Device 15ab driver: ixgbe
           IF: eth1 state: down mac: 50:18:4c:00:16:a2
           Card-4: Intel Device 15ab driver: ixgbe
           IF: eth2 state: down mac: 50:18:4c:00:16:a3
           Card-5: Intel I210 Gigabit Network Connection driver: igb
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 50:18:4c:00:16:a1
Drives:    HDD Total Size: 136.1GB (5.0% used)
           ID-1: /dev/sda model: SanDisk_SD8SMAT1 size: 128.0GB
           ID-2: USB /dev/sdb model: Echo size: 8.0GB
Partition: ID-1: / size: 98G used: 1.4G (2%) fs: ext4 dev: /dev/sda6
           ID-2: swap-1 size: 5.70GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 56.0C mobo: N/A
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 133 Uptime: 1 min Memory: 149.1/16078.1MB Init: systemd runlevel: 5
           Client: Shell (bash) inxi: 2.1.28
root@invader0:~#
_______________________________________________
coreboot mailing list -- coreboot@coreboot.org
To unsubscribe send an email to coreboot-leave@coreboot.org