One side comment from a lurker on this list. Your experience with the hang after your login prompt (when the board prints a message like the one below)
    LinuxBIOS-1.1.8_s2881_Fallback Fri Apr 7 18:06:33 EDT 2006 starting...
    (0,1) link=00
    (1,0) link=00
    02 nodes initialized.
    SBLink=02
    NC node|link=02
reminded me of something I found and fixed a couple of weeks ago.
The board hangs when it does a warm reset on a bus error condition. This is due to a typo in the code in src/northbridge/amdk8/incoherent_ht.c in the function ht_setup_chainx (base indentation removed for clarity):
    if (ctrl & ((1 << 4) | (1 << 8))) {
            /*
             * Either the link has failed, or we have
             * a CRC error.
             * Sometimes this can happen due to link
             * retrain, so lets knock it down and see
             * if its transient
             */
-->         ctrl |= ((1 << 6) | (1 <<8)); // Link fail + Crc
            pci_write_config16(udev, upos + LINK_CTRL(uoffs), ctrl);
            ctrl = pci_read_config16(udev, upos + LINK_CTRL(uoffs));
            if (ctrl & ((1 << 4) | (1 << 8))) {
                    print_err("Detected error on Hypertransport Link\n");
                    break;
            }
    }
The line marked with "-->" above, where the error bits to be cleared are set to '1':

    ctrl |= ((1 << 6) | (1 <<8)); // Link fail + Crc

SHOULD be:

    ctrl |= ((1 << 4) | (1 <<8)); // Link fail + Crc
The code should set the 1<<4 bit (to clear one of the error bits) instead of 1<<6 (the read-only bit that signals the end of the HT bus chain).
After making this change, your board won't hang when the BIOS or Linux detects a need to reboot due to some fatal error.
This is a pretty trivial change to an error case that hardly ever occurs but was crucial for us. I claim this change is safe and correct and should be merged into the main line code.
Cheers, and thanks to all of you who work so hard on all of the good and useful LinuxBIOS code. We have been very successful with it thus far in our project.
Alan Mimms, Senior Architect
F5 Networks, Inc.
Spokane Development Center
1322 North Whitman Lane
Liberty Lake, Washington 99019
v: 509-343-3524
f: 509-343-3501
-----Original Message-----
From: linuxbios-bounces@linuxbios.org [mailto:linuxbios-bounces@linuxbios.org] On Behalf Of Ward Vandewege
Sent: Friday, April 07, 2006 3:15 PM
To: Lu, Yinghai
Cc: linuxbios@linuxbios.org
Subject: Re: [LinuxBIOS] CONFIG_LB_MEM_TOPK
On Fri, Apr 07, 2006 at 02:02:47PM -0700, Lu, Yinghai wrote:
- Can you make one ELF image with mkelfImage (combining the kernel and initrd)?
OK; this is what I've done:
1. Retrieved the mkelfImage program (v2.7) from ftp://ftp.lnxi.com
2.

    mkelfImage --command-line="ro root=/dev/md3 quiet splash console=tty0 console=ttyS0,115200n8" \
        --kernel="/boot/vmlinuz-2.6.12-9-amd64-generic" \
        --initrd="/boot/initrd.img-2.6.12-9-amd64-generic" \
        --output="/boot/linuxbios_2.6.12-9-amd64-generic.elf"
3. Adjusted etherboot-5.4.1/file/Config
AUTOBOOT_FILE = "hde1:/linuxbios_2.6.12-9-amd64-generic.elf"
4.

    cd etherboot-5.4.1/src
    make bin/tg3--filo.elf
5. Adjusted targets/tyan/s2881/Config.lb to point to the new payload (tg3--filo.elf)
6. Tried booting; it actually got to a login prompt this time, but as I hit Enter on the keyboard, it crashed again. The log is attached.
So that did not help. I'm going to try your next suggestion now.
Thanks, Ward.
- use the latest kernel + SUSE rescue disk to make one ELF image to check your /dev/md3...
YH
-----Original Message-----
From: Ward Vandewege [mailto:ward@gnu.org]
Sent: Friday, April 07, 2006 1:55 PM
To: Lu, Yinghai
Cc: linuxbios@linuxbios.org
Subject: Re: [LinuxBIOS] CONFIG_LB_MEM_TOPK
On Fri, Apr 07, 2006 at 01:37:06PM -0700, Lu, Yinghai wrote:
I fixed one merge typo in src/southbridge/amd/amd8111/amd8111_early_ctrl.c
Please change 15 to 11 at line 13.
OK; did that, but it did not help. Same problem still.
Ward.
-- Ward Vandewege ward@fsf.org Free Software Foundation - Senior System Administrator
-- linuxbios mailing list linuxbios@linuxbios.org http://www.openbios.org/mailman/listinfo/linuxbios
On Fri, Apr 07, 2006 at 04:00:28PM -0700, Alan Mimms wrote:
The code should set the 1<<4 bit (to clear one of the error bits) instead of 1<<6 (the read-only bit that signals the end of the HT bus chain).
After making this change, your board won't hang when the BIOS or Linux detects a need to reboot due to some fatal error.
This is a pretty trivial change to an error case that hardly ever occurs but was crucial for us. I claim this change is safe and correct and should be merged into the main line code.
Yes, please merge that change!
Our box now keeps rebooting itself instead of crashing; have a look at the attached boot log. Maybe this will give more of a clue as to what exactly is happening? Anyone got any more ideas?
Thanks, Alan!! Ward.