Dear coreboot folks,
On the Lenovo T60 (with AMD/ATI graphics) the Linux kernel (4.9, 4.19, 5.3, 5.4) hangs after starting user space. As SeaBIOS, GRUB, payloads and FreeDOS work, I tried to limit the number of CPUs, and booting Linux with `nosmp` gave me a booting system. It worked with older coreboot versions, so I think it’s a regression. I am able to reproduce this with coreboot 4.11 and 4.11-422-g1a5c3bb7fa.
Is somebody else seeing this issue? Maybe on some i945 desktop board, so it would be easier to bisect?
Kind regards,
Paul
Dear coreboot folks,
On 2019-12-15 11:54, Paul Menzel wrote:
On the Lenovo T60 (with AMD/ATI graphics) the Linux kernel (4.9, 4.19, 5.3, 5.4) hangs after starting user space. As SeaBIOS, GRUB, payloads and FreeDOS work, I tried to limit the number of CPUs, and booting Linux with `nosmp` gave me a booting system. It worked with older coreboot versions, so I think it’s a regression. I am able to reproduce this with coreboot 4.11 and 4.11-422-g1a5c3bb7fa.
Is somebody else seeing this issue? Maybe on some i945 desktop board, so it would be easier to bisect?
Just as an update, here is the description.
nosmp [SMP] Tells an SMP kernel to act as a UP kernel, and disable the IO APIC. legacy for "maxcpus=0".
So, after seeing some IRC discussion in #coreboot@irc.freenode.net, the tests below were done.
System *boots* with one of:
1. maxcpus=0 (equivalent nosmp)
2. maxcpus=1
3. nolapic (with e1000 warning about missing MSI-X)
System does *not* boot with one of:
1. maxcpus=2
2. noapic
But first, it’d be great if other i945 device users could confirm this.
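To make such confirmations comparable, it may help to record which of the tested parameters a given boot actually used. The following is only a minimal sketch: the function name `tested_params` is hypothetical, and it assumes the parameters appear as whitespace-separated tokens, as they do in `/proc/cmdline`.

```shell
#!/bin/sh
# Hypothetical helper: print the SMP/APIC-related parameters from this thread
# (nosmp, maxcpus=N, nolapic, noapic) that occur in a kernel command line.
tested_params() {
    # $1 is the full command line; it is left unquoted on purpose so the
    # shell splits it into whitespace-separated tokens.
    for tok in $1; do
        case "$tok" in
            nosmp|nolapic|noapic|maxcpus=*) echo "$tok" ;;
        esac
    done
}
```

On a running system it could be called as `tested_params "$(cat /proc/cmdline)"`, and the output attached to a report together with the coreboot version.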
Kind regards,
Paul
Hi,
Well, my system only has coreboot version 4.10-637, so I don't know if it is relevant.
My Kontron 986LCD-M (supported by coreboot) does SMP without a problem. The kernel is 4.20.0-rc2 (I didn't see any problem with the current Slackware kernel either). The GPU is a Radeon RX460, and the kernel parameters are "amdgpu.ppfeaturemask=0xfffffffb amdgpu.dc=1 earlyprintk=serial,ttyS0,115200,keep pci=assign-busses,pcie_scan_all,realloc raid=noautodetect acpi_enforce_resources=lax video=1440x900MR fbcon=map:0 memory_corruption_check=0 resume=/dev/sda1 resume_offset=260096" (but some are most likely redundant, accumulated from testing). Suspend to HDD doesn't work (I think).
I had to modify coreboot's devicetree, as some Super I/O devices need to be defined even if unused (otherwise the resource allocator goes mad).
Petr
System *boots* with one of:
- maxcpus=0 (equivalent nosmp)
- maxcpus=1
- nolapic (with e1000 warning about missing MSI-X)
System does *not* boot with one of:
- maxcpus=2
- noapic
I thought SMP was generally not compatible with PIC IRQ routing (which noapic enforces?) and this would explain case 2.
As for case 1, maybe I missed some detail with my commit [1] when switching from LAPIC to TSC timers. For example, leaving the LAPIC timers running at different rates, or generally having the timer counters too far out of sync across CPU #0 and #1. You could check whether that one is the commit with the regression.
[1] https://review.coreboot.org/c/coreboot/+/34200
Kyösti
Dear Kyösti,
On 2020-01-25 00:56, Kyösti Mälkki wrote:
System *boots* with one of:
- maxcpus=0 (equivalent nosmp)
- maxcpus=1
- nolapic (with e1000 warning about missing MSI-X)
System does *not* boot with one of:
- maxcpus=2
- noapic
Booting with `maxcpus=1` and then starting the second CPU also results in a hang.
echo 1 | sudo tee /sys/devices/system/cpu/cpu1/online
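The hotplug step can also be wrapped in a small helper. This is a minimal sketch, assuming the standard Linux sysfs CPU hotplug file `cpuN/online`; the function name `cpu_online` and its optional second argument (an alternative sysfs root, useful for dry runs) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to bring CPU number $1 online.
# $2 optionally overrides the sysfs root (assumption, for dry runs only).
cpu_online() {
    cpu="$1"
    root="${2:-/sys}"
    ctl="$root/devices/system/cpu/cpu$cpu/online"
    if [ ! -e "$ctl" ]; then
        # The boot CPU often has no control file, and absent CPUs never do.
        echo "cpu$cpu has no online control file" >&2
        return 1
    fi
    # Writing 1 requests that the CPU be started; needs root on a real system.
    echo 1 > "$ctl" || return 1
    # Print the resulting state (1 = online, 0 = offline).
    cat "$ctl"
}
```

Run as root after booting with `maxcpus=1`, `cpu_online 1` would attempt to start the second CPU, which is the step that hangs on the affected system.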
I thought SMP was generally not compatible with PIC IRQ routing (which noapic enforces?) and this would explain case 2.
As for case 1, maybe I missed some detail with my commit [1] when switching from LAPIC to TSC timers. Like leaving LAPIC timers running at different rate or generally having the timer counters too much out-of-sync across CPU #0 and #1. You could try if that one is the commit with regression.
Yes, you are spot on.
Building the parent commit c00e2fb996 (cpu/intel: Use CPU_INTEL_COMMON_TIMEBASE) the system boots with both CPUs.
Kind regards,
Paul
[1]: https://review.coreboot.org/c/coreboot/+/34200
[2]: https://review.coreboot.org/c/coreboot/+/31342
Dear Kyösti, dear coreboot folks,
On 11.02.20 at 16:11, Paul Menzel wrote:
On 2020-01-25 00:56, Kyösti Mälkki wrote:
System *boots* with one of:
- maxcpus=0 (equivalent nosmp)
- maxcpus=1
- nolapic (with e1000 warning about missing MSI-X)
System does *not* boot with one of:
- maxcpus=2
- noapic
Booting with `maxcpus=1` and then starting the second CPU also results in a hang.
echo 1 | sudo tee /sys/devices/system/cpu/cpu1/online
I thought SMP was generally not compatible with PIC IRQ routing (which noapic enforces?) and this would explain case 2.
As for case 1, maybe I missed some detail with my commit [1] when switching from LAPIC to TSC timers. Like leaving LAPIC timers running at different rate or generally having the timer counters too much out-of-sync across CPU #0 and #1. You could try if that one is the commit with regression.
Yes, you are spot on.
Building the parent commit c00e2fb996 (cpu/intel: Use CPU_INTEL_COMMON_TIMEBASE) the system boots with both CPUs.
It turns out that my conclusion was incorrect, and this wasn’t enough to actually find the change that introduced the regression. Yesterday, a Lenovo T60 user finished bisecting the problem at Nico’s encouragement, and it turns out that commit c1dc2d5e68 (mb/lenovo/t60: Switch to override tree) [3] was incomplete. Nico found the problem and submitted a fix [4]. That also explains why the problem was only seen on the Lenovo T60 (and possibly its variants), and not by users of other Intel 945 devices.
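For anyone repeating such a bisect, the workflow can be sketched as follows. This is only an illustration: the helper name `bisect_plan` is hypothetical, the revisions are placeholders to be supplied by the tester, and the helper prints the standard `git bisect` commands instead of running them, since every step needs a manual build, flash and boot test.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the git commands for a manual bisect.
# $1 = a revision known to boot with both CPUs, $2 = a revision that hangs.
bisect_plan() {
    good="$1"
    bad="$2"
    echo "git bisect start"
    echo "git bisect bad $bad"
    echo "git bisect good $good"
    echo "# build, flash, boot; then run: git bisect good (or: git bisect bad)"
    echo "# repeat until git reports the first bad commit, then:"
    echo "git bisect reset"
}
```

For example, `bisect_plan KNOWN_GOOD 4.11` prints the plan for a bisect between a placeholder good revision and the 4.11 release reported as affected.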
Sorry for not going all the way in the beginning.
Kind regards,
Paul
[3]: https://review.coreboot.org/c/coreboot/+/34779
[4]: https://review.coreboot.org/c/coreboot/+/43150 "mb/lenovo/t60: Fix override devicetrees"