Hi,
this is my first time posting here and it is quite possible that I've overlooked something obvious. In that case please just point me to whatever I should have read and accept my apologies.
On my ThinkPad T430 running Coreboot-4.8.1 as part of a Heads install, I see these error messages when turning on the PC:
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 7: ee20000000 3110a
mce: [Hardware Error]: TSC 0 ADDR fefe78c0 MISC 3880000086
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 8: ee20000000 3110a
mce: [Hardware Error]: TSC 0 ADDR fefe7880 MISC 3880000086
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0 APIC 0 microcode 1f
I do not think it is an issue with the actual RAM:
[i7-3520M]  [2x8GB Silicon Power Hynix] ... no error
[i7-3632QM] [2x8GB Silicon Power Hynix] ... mce errors
[i7-3520M]  [2x8GB Crucial] ... no error
[i7-3740QM] [2x8GB Crucial] ... mce errors
While looking at this (but for other reasons) I replaced the original i7-3520M with an i7-3632QM and then went back to the original before ultimately settling on an i7-3740QM. I also went through three different sets of RAM, one of them obviously corrupt, so I didn't even include it above.
All this is to say that the above and some searching lead me to think this might be more of a software / timing issue -- but I am far out of my depth. Hence my asking here for ideas and pointers on what to do next.
Thanks & Cheers, /Sven
Hi Sven,
On 11.06.21 00:55, Sven Semmler wrote:
> this is my first time posting here and it is quite possible that I've overlooked something obvious. In that case please just point me to whatever I should have read and accept my apologies.
don't worry. If this were documented, I would have missed it too :)
> On my ThinkPad T430 running Coreboot-4.8.1 as part of a Heads install, I see these error messages when turning on the PC:
> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 7: ee20000000 3110a
> mce: [Hardware Error]: TSC 0 ADDR fefe78c0 MISC 3880000086
> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0
> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 8: ee20000000 3110a
> mce: [Hardware Error]: TSC 0 ADDR fefe7880 MISC 3880000086
> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0 APIC 0 microcode 1f
> I do not think it is an issue with the actual RAM:
Indeed, this is not about the actual (D)RAM. One can already tell by the address: 0xfefe.... is part of what we call the I/O hole, a region reserved from the memory address space for different purposes.
More specifically, 0xfefe0000..0xfeffffff is a range used for cache-as-ram (CAR), which is a mode where the processor cache is used as RAM before the actual DRAM is available.
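As a rough illustration of the range argument above, the following Python sketch checks the two ADDR values from the log against the CAR window quoted here. The helper name `in_car` and the 64-byte cache-line size are assumptions made for this example, not something taken from coreboot.

```python
# CAR window as quoted in the thread: 0xfefe0000..0xfeffffff (inclusive).
CAR_BASE = 0xFEFE0000
CAR_END = 0xFEFFFFFF
CACHE_LINE = 64  # assumed cache-line size in bytes


def in_car(addr: int) -> bool:
    """Return True if addr lies inside the quoted CAR window."""
    return CAR_BASE <= addr <= CAR_END


# ADDR fields of the two reported MCEs (banks 7 and 8).
addrs = [0xFEFE78C0, 0xFEFE7880]

for a in addrs:
    print(f"{a:#x}: in CAR={in_car(a)}, offset={a - CAR_BASE:#x}, "
          f"line-aligned={a % CACHE_LINE == 0}")

# Distance between the two addresses, in bytes.
print("distance:", abs(addrs[0] - addrs[1]))
```

Both addresses fall inside the window and sit 64 bytes apart, so under the assumed line size they would be neighbouring cache lines.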
I have seen these MCEs before, but never investigated. They might affect the stability of coreboot, but it seems less likely that they affect the running OS once the system has booted successfully.
Intel's x86 Software Developer's Manual (SDM) should explain how to decode these MCEs.
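For orientation, here is a minimal Python sketch of such a decode, limited to the flag bits in the upper part of an IA32_MCi_STATUS value as the SDM lays them out (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57). The function name `decode_status_flags` is made up for this example. Because the status value in the log above appears split into two groups, the sample input uses only its leading byte 0xee; the lower bits, which carry the actual error code, are deliberately left out.

```python
# Flag bits of IA32_MCi_STATUS (bit positions per the Intel SDM).
FLAGS = {
    "VAL": 63,    # register contains valid information
    "OVER": 62,   # a further error occurred while this register was still valid
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting of this error was enabled in IA32_MCi_CTL
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context may be corrupted
}


def decode_status_flags(status: int) -> dict:
    """Map each flag name to True/False for a 64-bit status value."""
    return {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}


# Only the leading byte of the logged status (0xee) is used here;
# the remaining 56 bits are zeroed on purpose.
sample = 0xEE << 56

for name, is_set in decode_status_flags(sample).items():
    print(f"{name:5} = {int(is_set)}")
```

With the leading byte 0xee, every flag except EN comes out set, which is at least consistent with the log printing both an ADDR and a MISC value.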
Some things to test come to mind: Does it report the same addresses on every boot? If so, one could try a write-read test of these addresses early, right after CAR is set up.
Nico
Hi Nico, Sven,
On Fri, Jun 11, 2021 at 9:19 AM Nico Huber <nico.h@gmx.de> wrote:
> Hi Sven,
> On 11.06.21 00:55, Sven Semmler wrote:
>> On my ThinkPad T430 running Coreboot-4.8.1 as part of a Heads install, I see these error messages when turning on the PC:
>> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 7: ee20000000 3110a
>> mce: [Hardware Error]: TSC 0 ADDR fefe78c0 MISC 3880000086
>> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0
>> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 8: ee20000000 3110a
>> mce: [Hardware Error]: TSC 0 ADDR fefe7880 MISC 3880000086
>> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0 APIC 0 microcode 1f
>> I do not think it is an issue with the actual RAM:
> Indeed, this is not about the actual (D)RAM. One can already tell by the address: 0xfefe.... is part of what we call the I/O hole, a region reserved from the memory address space for different purposes.
> More specifically, 0xfefe0000..0xfeffffff is a range used for cache-as-ram (CAR), which is a mode where the processor cache is used as RAM before the actual DRAM is available.
> I have seen these MCEs before, but never investigated. They might affect the stability of coreboot, but it seems less likely that they affect the running OS once the system has booted successfully.
They look a lot like what https://review.coreboot.org/28443 fixed. You could check if commit dfaff4d18a711f764c9198f488435fdc553dcea2 exists on 4.8.1 (if it does not, there's your problem).
Also, have you considered using a more recent coreboot version? 4.8.1 is over three years old, and the ThinkPad T430 is still supported on all newer releases (latest is 4.14).
> Nico
Best regards, Angel
Dear Angel,
On 11.06.21 at 12:05, Angel Pons wrote:
> On Fri, Jun 11, 2021 at 9:19 AM Nico Huber <nico.h@gmx.de> wrote:
>> On 11.06.21 00:55, Sven Semmler wrote:
>>> On my ThinkPad T430 running Coreboot-4.8.1 as part of a Heads install, I see these error messages when turning on the PC:
>>> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 7: ee20000000 3110a
>>> mce: [Hardware Error]: TSC 0 ADDR fefe78c0 MISC 3880000086
>>> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0
>>> mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank: 8: ee20000000 3110a
>>> mce: [Hardware Error]: TSC 0 ADDR fefe7880 MISC 3880000086
>>> mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1622589409 SOCKET 0 APIC 0 microcode 1f
>>> I do not think it is an issue with the actual RAM:
>> Indeed, this is not about the actual (D)RAM. One can already tell by the address: 0xfefe.... is part of what we call the I/O hole, a region reserved from the memory address space for different purposes.
>> More specifically, 0xfefe0000..0xfeffffff is a range used for cache-as-ram (CAR), which is a mode where the processor cache is used as RAM before the actual DRAM is available.
>> I have seen these MCEs before, but never investigated. They might affect the stability of coreboot, but it seems less likely that they affect the running OS once the system has booted successfully.
> They look a lot like what https://review.coreboot.org/28443 fixed. You could check if commit dfaff4d18a711f764c9198f488435fdc553dcea2 exists on 4.8.1 (if it does not, there's your problem).
$ git tag --contains dfaff4d18a711f764c9198f488435fdc553dcea2
4.10
4.11
4.12
4.13
4.14
4.9
So, it’s not included in 4.8.1.
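The same conclusion can be reached mechanically. The short Python sketch below takes the tag list printed above as a literal string (it does not call git itself) and checks whether a given release is among the tags containing the commit. `release_has_commit` and `version_key` are hypothetical helper names for this example.

```python
# Tags reported by `git tag --contains dfaff4d1...` in this message.
TAGS_WITH_FIX = "4.10 4.11 4.12 4.13 4.14 4.9".split()


def release_has_commit(release: str, tags=TAGS_WITH_FIX) -> bool:
    """True if the release tag is listed among the tags containing the commit."""
    return release in tags


def version_key(tag: str):
    """Sort key that orders tags numerically, so 4.9 sorts before 4.10."""
    return tuple(int(part) for part in tag.split("."))


print("4.8.1 has fix:", release_has_commit("4.8.1"))
print("earliest tag with fix:", min(TAGS_WITH_FIX, key=version_key))
```

Sorting numerically rather than as strings matters here, because git lists 4.9 after 4.14 even though it is the earliest tag in the list.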
> Also, have you considered using a more recent coreboot version? 4.8.1 is over three years old, and the ThinkPad T430 is still supported on all newer releases (latest is 4.14).
That’s a good idea. Looking at the Heads configuration files, the Lenovo systems are still at coreboot 4.8.1 [2]. There is a draft merge/pull request for coreboot 4.13: *(BRICKS T430 because OPTION_TABLE) WiP: build xx30 boards against coreboot 4.13 #944* [3].
Kind regards,
Paul
[1]: https://github.com/osresearch/heads/blob/c3b0bd6ffbe816430dd41ef54e649af52ed...
[2]: https://github.com/osresearch/heads/blob/c3b0bd6ffbe816430dd41ef54e649af52ed...
[3]: https://github.com/osresearch/heads/pull/944
Hi Angel,
Long time no speak. Hope you are well. Sven is using 4.8.1 as it's a Heads-based installation. I see the patch landed in Sept 2018, which is a fair few months after 4.8.1. I'll port a Heads build with 4.11 for Sven (and myself, as I have the same issue) and see if that resolves it.
cheers
Simon
On 6/11/21 4:18 AM, Nico Huber wrote:
> 0xfefe0000..0xfeffffff is a range used for cache-as-ram (CAR), which is a mode where the processor cache is used as RAM before the actual DRAM is available.
Thank you Nico for the insight!
On 6/11/21 5:05 AM, Angel Pons wrote:
> They look a lot like what https://review.coreboot.org/28443 fixed. You could check if commit dfaff4d18a711f764c9198f488435fdc553dcea2 exists on 4.8.1 (if it does not, there's your problem).
I manually applied the linked patch to my local coreboot-4.8.1 and no longer see those mce messages. Thank you Angel!
On 6/11/21 5:25 AM, Paul Menzel wrote:
> There is a draft merge/pull request for coreboot 4.13: *(BRICKS T430 because OPTION_TABLE) WiP: build xx30 boards against coreboot 4.13 #944* [3].
Yeah, that worries me. So I'll stay away from that until I understand the subject matter a bit deeper.
On 6/11/21 5:26 AM, Simon Newton wrote:
> I'll port a Heads build with 4.11 for Sven (and myself, as I have the same issue) and see if that resolves it.
That might still be worthwhile, but if you just want to get rid of the mce errors, the patch linked by Angel does the trick.
Cheers, /Sven
PS: I'll stay around and learn. Thanks everyone!