I have been looking into some thermal shutdown issues with our board, and discovered that LinuxBIOS is not enabling the P4's ability to throttle back when the internal sensors indicate that the processor is getting too hot.
According to the System Programming Guide, Vol. 3 of the IA-32 Intel Architecture Software Developer's Manual (p. 13-24):
"BIOS is required to enable only one automatic thermal monitoring modes (sic). Operating systems and applications must not disable the operation of these mechanisms."
Has this been considered for LB? I didn't see any discussions in the mailing list archives.
It would be relatively easy to implement support for "TM1" thermal control, since that's just setting an MSR bit if the proper CPUID flag is set. "TM2" control would be a little more complicated, since there is the added ability to control the operating frequency and voltage when the processor trips into thermal management mode.
Steve www.digidescorp.com
On Wed, 27 Jul 2005, Steven J. Magnani wrote:
> I have been looking into some thermal shutdown issues with our board, and discovered that LinuxBIOS is not enabling the P4's ability to throttle back when the internal sensors indicate that the processor is getting too hot.
Yes, this is purposeful. We let Linux do that here. It is a bad deal if 1 node out of 1022 decides to slow down on its own.
We do this type of thing in Linux. See the LLNL p4therm module for a piece of kernel code that can let you talk to the hardware.
I think Intel's emphasis on having the BIOS do this kind of thing is a real mistake.
> It would be relatively easy to implement support for "TM1" thermal control, since that's just setting an MSR bit if the proper CPUID flag is set. "TM2" control would be a little more complicated, since there is the added ability to control the operating frequency and voltage when the processor trips into thermal management mode.
It's easy, and the kernel (directed by a user-mode program) should do it.
ron
Ronald G. Minnich writes:
> the kernel (directed by a user-mode program) should do it.
Ahh, but if the payload is memtest86, there IS no kernel. And that's where we're having serious shutdown issues.
> See the LLNL p4therm module for a piece of kernel code that can let you talk to the hardware.
All I can determine from the internet is that this magical piece of software handles Xeon thermal management. I can't see any links to source code.
I would argue that, since this is a lurking "feature" of LinuxBIOS (you have to know about it to avoid being bitten, and you have to actively do something in the kernel to avoid thermal issues), having LB enable thermal monitoring should be configurable. That would solve both our issues.
Steve
-----Original Message-----
From: Ronald G. Minnich [mailto:rminnich@lanl.gov]
Sent: Wednesday, July 27, 2005 1:11 PM
To: Steven J. Magnani
Cc: linuxbios@openbios.org
Subject: Re: [LinuxBIOS] Thermal monitoring of Pentium 4s
On Wed, 27 Jul 2005, Steven J. Magnani wrote:
> Ronald G. Minnich writes:
>> the kernel (directed by a user-mode program) should do it.
> Ahh, but if the payload is memtest86, there IS no kernel. And that's where we're having serious shutdown issues.
ah, ok.
> I would argue that, since this is a lurking "feature" of LinuxBIOS (you have to know about it to avoid being bitten, and you have to actively do something in the kernel to avoid thermal issues), having LB enable thermal monitoring should be configurable. That would solve both our issues.
You are right. We'll need to set up an enable method for the P4 CPU resource, and then allow you to set "registers" commands to turn on TM1, etc.
Um, volunteers :-)
ron