On Wed, Jun 24, 2009 at 12:30 AM, David Hendricks dhendrix@google.com wrote:
On Tue, Jun 23, 2009 at 2:28 AM, Alois Schlögl alois.schloegl@tugraz.at wrote:
The reason for asking is the bug as described here: http://bugzilla.kernel.org/show_bug.cgi?id=13573
The bug is affecting my research at the university. It was suggested that a BIOS update could solve the problem.
IIRC you can disable thermal throttling, but it's usually not a good idea if you expect to keep your machine running with reasonable performance under load. If the vendor BIOS had the proper tables (see section 2.4.2 on P-States in the AMD BIOS and Kernel Developer's Guide for Family 10h processors), your CPU would slow itself down to avoid generating too much heat. If you disable thermal throttling and continue to run your workload, your CPU will hit "Tjunction" at around 116 degrees C and shut itself off abruptly, possibly after physical damage has already been done to the CPU or the socket.
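If you want to watch how close the machine gets to that point while the workload runs, something like the following rough sketch may help. It assumes a Linux kernel that exposes thermal zones under /sys/class/thermal with a "temp" file in millidegrees C; on many boards the CPU sensors only show up through hwmon drivers instead, in which case this will simply find nothing. The function names and the limit passed to zones_above() are made up for illustration.

```python
import glob
import os


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            # The kernel reports the value in millidegrees Celsius.
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            # Skip zones that are missing, unreadable, or report garbage.
            continue
        temps[os.path.basename(zone)] = millideg / 1000.0
    return temps


def zones_above(temps, limit_c):
    """Return the sorted names of zones at or above limit_c degrees C."""
    return sorted(name for name, t in temps.items() if t >= limit_c)


if __name__ == "__main__":
    for name, t in read_zone_temps().items():
        print(f"{name}: {t:.1f} C")
```

Calling zones_above(read_zone_temps(), 100.0) in a loop during the workload would flag zones that are getting uncomfortably hot; the 100.0 there is an arbitrary margin below the shutdown temperature mentioned above, not a value from any datasheet.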
I would suggest starting with something much simpler, like making sure you have quality thermal transfer compound applied in the proper quantity for your CPUs. I know it sounds stupid, but I have seen many machines from many datacenters with very powerful rack cooling overheat under heavy loads due to improperly applied thermal grease. There are many tutorials and videos on how to do this. Make sure you clean off the old thermal grease first with high-concentration isopropyl alcohol (>90%).
Oh, and while you're at it, make sure the heatsinks are securely fastened. After you re-apply thermal grease, tighten the screws until they will not turn any more. The mounting points on the motherboard will ensure the maximum threshold is not exceeded, though I suggest tightening one screw to about 80-90%, then the second one to 100%, then finishing the first one, so the pressure is applied more evenly.
Just another very silly thing that can cause unexpected behavior under heavy workloads...