-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
David Hendricks wrote:
On Wed, Jun 24, 2009 at 12:30 AM, David Hendricks dhendrix@google.com wrote:
On Tue, Jun 23, 2009 at 2:28 AM, Alois Schlögl alois.schloegl@tugraz.at wrote:
The reason for asking is the bug as described here: http://bugzilla.kernel.org/show_bug.cgi?id=13573
The bug is affecting my research at the university. It was suggested that a BIOS update could solve the problem.
IIRC you can disable thermal throttling, but it's usually not a good idea if you expect to keep your machine running with reasonable performance under load. If the vendor BIOS had the proper tables (See section 2.4.2 on P-States in the AMD BIOS and Kernel Developer's Guide for Fam10 processors), your CPU would slow itself down to avoid generating too much heat. If you disable thermal throttling and continue to run your workload, your CPU will hit "Tjunction" at around 116 degrees C and shut itself off abruptly, possibly after physical damage has been done to the CPU or the socket.
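To see how close a machine gets to that limit under load, the temperatures the kernel exposes can be polled. The following is only a rough sketch, assuming a Linux system that exposes sensors under /sys/class/thermal (values in millidegrees C). Whether those zones reflect the CPU die temperature on this particular board is an assumption, and the function names and the 15-degree margin are made up for illustration.

```python
from pathlib import Path

# Approximate shutdown temperature quoted in the message above.
TJUNCTION_C = 116.0


def read_zone_temps(root="/sys/class/thermal"):
    """Return {zone_name: degrees C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius.
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Skip zones that are missing or do not return a number.
            continue
    return temps


def zones_near_limit(temps, limit_c=TJUNCTION_C, margin_c=15.0):
    """List zones that are within margin_c degrees of limit_c (hypothetical margin)."""
    return [name for name, t in temps.items() if t >= limit_c - margin_c]
```

Calling `zones_near_limit(read_zone_temps())` in a loop while the workload runs would show whether the temperature is creeping toward the shutdown point before the machine powers off.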
I would suggest starting with something much simpler, like making sure you have quality thermal transfer compound applied in the proper quantity for your CPUs. I know it sounds stupid, but I have seen many machines from many datacenters with very powerful rack cooling overheat under heavy loads due to improperly applied thermal grease. There are many tutorials and videos on how to do this. Make sure you first clean off the old thermal grease with high-concentration isopropyl alcohol (>90%).
Oh, and while you're at it, make sure the heatsinks are securely fastened. After you re-apply thermal grease, tighten the screws until they will not turn any more. The mounting points on the motherboard will ensure the maximum threshold is not exceeded, though I suggest tightening the first screw to about 80-90%, then the second one to 100%, then finishing the first one so that the pressure is applied more evenly.
Just another very silly thing that can cause unexpected behavior under heavy workloads...
Thanks for these hints. That sounds very reasonable to me. If I understood you correctly, the shutdown occurs because the current BIOS is missing the _PSS table. Updating the BIOS would resolve this, and re-applying thermal grease would make the CPU throttle later or not at all.
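One indirect way to check this from a running Linux system is to look at whether the kernel exposes any frequency-scaling states for the CPU. The sketch below assumes a cpufreq driver that publishes `scaling_available_frequencies` in sysfs (acpi-cpufreq and powernow-k8 do; other drivers may not), and the function name is made up for illustration. A missing cpufreq directory is consistent with a missing _PSS table but does not prove it, since the driver may simply not be loaded.

```python
from pathlib import Path


def available_frequencies(cpu=0, root="/sys/devices/system/cpu"):
    """Return the scaling frequencies (kHz) the kernel lists for one CPU.

    Returns None when no cpufreq directory exists for that CPU, and an
    empty list when the directory exists but the frequency table is
    absent or unreadable.
    """
    cpufreq = Path(root) / f"cpu{cpu}" / "cpufreq"
    if not cpufreq.is_dir():
        return None
    table = cpufreq / "scaling_available_frequencies"
    try:
        return [int(field) for field in table.read_text().split()]
    except (OSError, ValueError):
        return []
```

If `available_frequencies(0)` returns None or a single entry, the CPU has no lower P-state to drop into, which would match the behaviour described in the bug report.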
In order to update the BIOS, I followed this approach http://ubuntuforums.org/showthread.php?p=2195208 as well as this one http://manual.sidux.com/en/bios-freedos-en.htm
The final test showed that my USB stick is bootable. Unfortunately, the system never boots from USB. I set the 1st, 2nd and 3rd boot devices to USB-floppy, USB-ZIP and USB-CDROM and turned off the HDD, but no boot device was found. I guess this is another problem with the current BIOS. Therefore, coreboot would be really appreciated.
Carl-Daniel Hailfinger wrote:
Judging from experience, the legal review happens faster if we can show more (or more interesting) reference customers, so if you plan to use coreboot on 780G/SB700 as part of university research, we'd tell AMD about this immediately.
Carl,
you can tell AMD that I'm working on some numerical methods that can efficiently handle missing values (encoded as NaN)
http://hci.tugraz.at/schloegl/matlab/NaN/ http://hci.tugraz.at/schloegl/matlab/tsa/
These methods are quite useful in various applications of biomedical signal processing (like electroencephalography) http://biosig.sourceforge.net/ http://hci.tugraz.at/schloegl/
These methods could be useful for Brain-Computer interface research, and for a better understanding of the human brain. Perhaps the methods will also be useful in other application areas.
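To illustrate what "missing values encoded as NaN" means in practice, here is a minimal sketch of a mean that skips NaN entries instead of letting them propagate into the result. It is written in Python for brevity and is not the implementation used in the toolboxes linked above; the function name is made up for illustration.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if there are no valid entries."""
    total = 0.0
    count = 0
    for v in values:
        if not math.isnan(v):
            # Only valid samples contribute to the sum and the count.
            total += v
            count += 1
    return total / count if count else math.nan
```

With a plain mean, a single NaN sample (for example a dropped EEG reading) would make the whole result NaN; here `nan_mean([1.0, float("nan"), 3.0])` averages only the two valid samples.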
Best regards, Alois