We're in the process of deploying LinuxBIOS to boot a cluster of Opteron-based blade servers. I built a new kernel, put it in /tftpboot after using:
mkelf-linux --rootdir=rom --ip=rom ./bzImage > vmlinuz.tyan
and cycled the blade. When booting, I get:
Searching for server (DHCP)
Me: 192.168.1.200, Server: 192.168.1.200, Gateway 192.168.1.1
Loading 192.168.1.200:vmlinuz.tyan (ELF)... done
LinuxBIOS-1.1.62.0_Fallback [date date] starting...
Unknown Trwt
Looking at an old message to this list, there's this snippet: http://www.mail-archive.com/linuxbios@clustermatic.org/msg03174.html
where the following line(s) are, in set_Trwt()...
    if ((clocks < DTH_TRWT_MIN) || (clocks > DTH_TRWT_MAX)) {
        die("Unknown Trwt");
    }
What is this, and why is it happening? Is there a patch needed for LinuxBIOS to work properly on modern Opterons?
Thanks, I'm in dire straits here, any help would be appreciated.
-dbs
Is this on the S2880? The only patch you might need is for integrated ATi VGA support on some mainboards. Other than that a current CVS tree should do the trick.
On Thu, 2004-08-19 at 18:34, Hendricks David W. wrote:
Is this on the S2880? The only patch you might need is for integrated ATi VGA support on some mainboards. Other than that a current CVS tree should do the trick.
Nope, this is on an S2882...
Let me try to clarify what you're saying... the version we're running (1.1.62.0) will not work properly, and we need a current version from CVS to fix this problem?
No problem either way, I just want to be clear before I start down this road :)
-dbs
Dave Belfer-Shevett dbs@stonekeep.com writes:
We're in the process of deploying Linuxbios to boot a cluster of Opteron based blade servers. I built a new kernel, put it in /tftpboot after using:
mkelf-linux --rootdir=rom --ip=rom ./bzImage > vmlinuz.tyan
This is your first problem.
mkelf-linux does not know how to bypass the BIOS calls so it will not work under LinuxBIOS.
Please get mkelfImage 2.5
ftp://ftp.lnxi.com/pub/mkelfImage.
and cycled the blade. When booting, I get:
Searching for server (DHCP) Me: 192.168.1.200, Server: 192.168.1.200, Gateway 192.168.1.1 Loading 192.168.1.200:vmlinuz.tyan (ELF)... done
LinuxBIOS-1.1.62.0_Fallback [date date] starting... Unknown Trwt
I would not expect to see this message after a triple fault, but stranger things have happened.
Looking at an old message to this list, there's this snippet: http://www.mail-archive.com/linuxbios@clustermatic.org/msg03174.html
where the following line(s) are, in set_Trwt()... if ((clocks < DTH_TRWT_MIN) || (clocks > DTH_TRWT_MAX)) { die("Unknown Trwt"); }
What is this, and why is it happening?
I don't know the why. This bit is simply a sanity check that the information coming from your serial EEPROM on your dimms is fine.
Is there a patch needed for LinuxBIOS to work properly on modern Opterons?
One should not be needed.
Thanks, I'm in dire straits here, any help would be appreciated.
If you still have problems after you start using mkelfImage ask again.
Eric
On 19 Aug 2004, Eric W. Biederman wrote:
where the following line(s) are, in set_Trwt()... if ((clocks < DTH_TRWT_MIN) || (clocks > DTH_TRWT_MAX)) { die("Unknown Trwt"); }
What is this, and why is it happening?
I don't know the why. This bit is simply a sanity check that the information coming from your serial EEPROM on your dimms is fine.
I have seen this when SMBUS is broken and "clocks" is 0xff (i.e. > DTH_TRWT_MAX). You might want to add a print here to see what the value of clocks is.
ron
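As an illustration of the suggestion to print the value of "clocks", here is a minimal standalone sketch of the same range check with the value reported. The limits below are placeholders chosen for illustration, not the real DTH_TRWT_MIN and DTH_TRWT_MAX from the LinuxBIOS source, and printf stands in for whatever console print routine the firmware build actually provides. The hypothetical check_trwt() returns a status instead of calling die() so it can be exercised on a normal host.

```c
#include <stdio.h>

/* Placeholder limits for illustration only; the real values live in the
 * LinuxBIOS Opteron memory setup code. */
#define DTH_TRWT_MIN 1
#define DTH_TRWT_MAX 6

/* Hypothetical helper: returns 1 if clocks is in range, 0 otherwise.
 * On failure it prints the offending value before reporting the error. */
int check_trwt(unsigned clocks)
{
    if ((clocks < DTH_TRWT_MIN) || (clocks > DTH_TRWT_MAX)) {
        printf("Unknown Trwt: clocks = 0x%02x\n", clocks);
        return 0;
    }
    return 1;
}
```

With a broken SMBus read returning 0xff, as described above, this sketch would report the value 0xff and fail the check, which would distinguish a bus problem from a DIMM that genuinely reports an unsupported timing.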
On Thu, 2004-08-19 at 19:29, Eric W. Biederman wrote:
mkelf-linux --rootdir=rom --ip=rom ./bzImage > vmlinuz.tyan
This is your first problem.
mkelf-linux does not know how to bypass the BIOS calls so it will not work under LinuxBIOS.
Ahhhh.
Please get mkelfImage 2.5 ftp://ftp.lnxi.com/pub/mkelfImage.
Done, compiled, and installed. I'm a little worried here though:
./mkelfImage --kernel=/tftpboot/bzImage --output=/tftpboot/vmlinuz-2.tyan
When I do a 'file' on the new image, I see:
/tftpboot/vmlinuz-2.tyan: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, corrupted section header size
This image (bzImage) is an Opteron compiled kernel, which should be 64 bit, unless there's no real difference in the elf header.
Sorry if this is sounding pretty whiny - I'm under a lot of pressure to get this all going :)
BTW - if folks are live and have suggestions, -please- feel free to IM me:
AIM: bostonshayde Jabber: dbs@jabber.stonekeep.com irc.freenode.net: shayde
-dbs
* Dave Belfer-Shevett dbs@stonekeep.com [040820 16:35]:
When I do a 'file' on the new image, I see:
/tftpboot/vmlinuz-2.tyan: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, corrupted section header size
This image (bzImage) is an Opteron compiled kernel, which should be 64 bit, unless there's no real difference in the elf header.
Sorry if this is sounding pretty whiny - I'm under a lot of pressure to get this all going :)
The kernel normally switches from 16bit real mode to protected mode, and from 32bit protected mode to 64bit long mode.
After applying mkelfImage, the resulting kernel image is a 32bit image that switches to 64bit long mode itself. Since LinuxBIOS on Opteron is 32bit only, this is exactly what you want.
No idea what goes wrong with the section header size though..
Stefan
Stefan Reinauer stepan@openbios.org writes:
* Dave Belfer-Shevett dbs@stonekeep.com [040820 16:35]:
When I do a 'file' on the new image, I see:
/tftpboot/vmlinuz-2.tyan: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, corrupted section header size
This image (bzImage) is an Opteron compiled kernel, which should be 64 bit, unless there's no real difference in the elf header.
Sorry if this is sounding pretty whiny - I'm under a lot of pressure to get this all going :)
The kernel normally switches from 16bit real mode to protected mode, and from 32bit protected mode to 64bit long mode.
After applying mkelfImage, the resulting kernel image is a 32bit image that switches to 64bit long mode itself. Since LinuxBIOS on Opteron is 32bit only, this is exactly what you want.
No idea what goes wrong with the section header size though..
Some tools get upset when they see an ELF file with the optional section header missing. They report that a section header size of 0 is "corruption".
Eric
The elfimage looks kind of like an ELF file to many Linux tools, but they can't quite grok it, so ignore the error from 'file'.
ron
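The "section header size of 0" point can be checked by hand. The sketch below reads the e_shentsize field from a little-endian 32-bit ELF header, using the field offsets from the ELF specification. The helper name elf32_shentsize() and the macro names are made up for this example, and it is only an assumption that this is the field 'file' is complaining about.

```c
#include <stddef.h>
#include <string.h>

/* Byte offsets inside a 32-bit ELF header (Elf32_Ehdr in the ELF spec). */
#define EI_CLASS_OFF    4   /* 1 = 32-bit, 2 = 64-bit */
#define EI_DATA_OFF     5   /* 1 = little-endian */
#define E_SHENTSIZE_OFF 46  /* size of one section header entry */
#define ELF32_EHDR_SIZE 52  /* total size of the 32-bit ELF header */

/* Hypothetical helper: returns e_shentsize from a little-endian 32-bit ELF
 * header, or -1 if the buffer does not hold such a header. */
int elf32_shentsize(const unsigned char *hdr, size_t len)
{
    if (len < ELF32_EHDR_SIZE)
        return -1;
    if (memcmp(hdr, "\177ELF", 4) != 0)
        return -1;
    if (hdr[EI_CLASS_OFF] != 1 || hdr[EI_DATA_OFF] != 1)
        return -1;
    /* e_shentsize is a 16-bit little-endian field. */
    return hdr[E_SHENTSIZE_OFF] | (hdr[E_SHENTSIZE_OFF + 1] << 8);
}
```

A result of 0 corresponds to the case described above, an image with no section header table. A conventionally linked 32-bit ELF file would normally report 40, the size of one Elf32_Shdr entry.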
Dave Belfer-Shevett dbs@stonekeep.com writes:
On Thu, 2004-08-19 at 19:29, Eric W. Biederman wrote:
mkelf-linux --rootdir=rom --ip=rom ./bzImage > vmlinuz.tyan
This is your first problem.
mkelf-linux does not know how to bypass the BIOS calls so it will not work under LinuxBIOS.
Ahhhh.
Please get mkelfImage 2.5 ftp://ftp.lnxi.com/pub/mkelfImage.
Done, compiled, and installed. I'm a little worried here though:
./mkelfImage --kernel=/tftpboot/bzImage --output=/tftpboot/vmlinuz-2.tyan
When I do a 'file' on the new image, I see:
/tftpboot/vmlinuz-2.tyan: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, corrupted section header size
This image (bzImage) is an Opteron compiled kernel, which should be 64 bit, unless there's no real difference in the elf header.
Stefan already gave the practical explanation. The small bit beyond that is the x86_64 kernel does not even have a valid 64bit entry point.
The code is actually present in the 32bit etherboot to cope with a 64bit image for x86_64. But except for test programs I have not been able to find any native ones.
Eric
On Thu, 2004-08-19 at 19:29, Eric W. Biederman wrote:
mkelf-linux does not know how to bypass the BIOS calls so it will not work under LinuxBIOS.
Please get mkelfImage 2.5
ftp://ftp.lnxi.com/pub/mkelfImage.
Thanks Eric, but I think the problem is deeper than this.
I just tried to start LinuxBIOS on the machine, with no network cable, no drives, no nothing. It starts up and comes to the "Boot from (N)etwork (D)isk (F)loppy or (Q)uit" prompt.
I hit 'Q' here.
And I get LinuxBIOS-1.1.62.0_Fallback [date date] starting... Unknown Trwt
So this is happening way before the image is even downloaded.
I'm experiencing other problems with this BIOS image as well. I can't boot off the hard drive or off floppy. For [D]isk boot, it just shows:
probing pci disk [IDE] LBA48mode, disk-1... Searching for image...
<abort> Probing pci disk... [IDE] Probing isa disk... <sleep> Boot from (N)etwork...
When I replace the BIOS chip back with the AMI original image, I'm able to boot from floppy without a problem.
So, my next step is to rebuild my linuxbios image, and flash it again... but... it appears that cvs.sourceforge.net is down. Does anyone have a recent checkout of the CVS tree I can wget or ftp? Please IM or mail me...
-dbs
Dave Belfer-Shevett dbs@stonekeep.com writes:
On Thu, 2004-08-19 at 19:29, Eric W. Biederman wrote:
mkelf-linux does not know how to bypass the BIOS calls so it will not work under LinuxBIOS.
Please get mkelfImage 2.5
ftp://ftp.lnxi.com/pub/mkelfImage.
Thanks Eric, but I think the problem is deeper than this.
I just tried to start LinuxBIOS on the machine, with no network cable, no drives, no nothing. It starts up and comes to the "Boot from (N)etwork (D)isk (F)loppy or (Q)uit" prompt.
I hit 'Q' here.
The 'Q' option is generic in etherboot and it does not do anything sensible under LinuxBIOS. So you are probably triggering a triple fault again. It does look like the code you have does not know how to reboot properly but that is a minor issue.
And I get LinuxBIOS-1.1.62.0_Fallback [date date] starting... Unknown Trwt
So this is happening way before the image is even downloaded.
I'm experiencing other problems with this BIOS image as well. I can't boot off the hard drive or off floppy. For [D]isk boot, it just shows: probing pci disk [IDE] LBA48mode, disk-1... Searching for image...
<abort> Probing pci disk... [IDE] Probing isa disk... <sleep> Boot from (N)etwork...
Unless you have an ELF header at the start of your disk that is likely the culprit.
The simple path is to get etherboot working with an image created by mkelfImage.
When I replace the BIOS chip back with the AMI original image, I'm able to boot from floppy without a problem.
You have a cluster node with a floppy drive?
So, my next step is to rebuild my linuxbios image, and flash it again... but... it appears that cvs.sourceforge.net is down. Does anyone have a recent checkout of the CVS tree I can wget or ftp? Please IM or mail me...
On Fri, 2004-08-20 at 13:03, Eric W. Biederman wrote:
I hit 'Q' here.
The 'Q' option is generic in etherboot and it does not do anything sensible under LinuxBIOS. So you are probably triggering a triple fault again. It does look like the code you have does not know how to reboot properly but that is a minor issue.
Okay, sounds fine.
I'm experiencing other problems with this BIOS image as well. I can't boot off the hard drive or off floppy. For [D]isk boot, it just shows: probing pci disk [IDE] LBA48mode, disk-1... Searching for image...
<abort> Probing pci disk... [IDE] Probing isa disk... <sleep> Boot from (N)etwork...
Unless you have an ELF header at the start of your disk that is likely the culprit.
Ummm. okay, so I need to make an elf-enabled floppy disk image?
The simple path is to get etherboot working with an image created by mkelfImage.
I agree 8)
When I replace the BIOS chip back with the AMI original image, I'm able to boot from floppy without a problem.
You have a cluster node with a floppy drive?
Only when I pull the blade out and manually hook a floppy drive to it so I can re-flash the bios.
But getting back on track, am I doing this sequence improperly?
On the boot host...
    cd /usr/src/linux[whatever]
    make bzImage
    [compilecompilecompile]
I end up with an arch/x86_64/boot/bzImage bootable image.
    cp arch/x86_64/boot/bzImage /tftpboot
    ~/src/mkelfImage-2.5/objdir/sbin/mkelfImage --kernel=/tftpboot/bzImage --output=/tftpboot/test1
I alter dhcpd.conf to have 'filename "test1"'
I restart dhcpd, restart the blade.
Note the Etherboot header says:
Loading Etherboot version: 5.2.4
ROM segment 0x0000 length 0x0000 reloc 0x00000000
CPU 2063mhz
Etherboot 4.2.5( (GPL)...
Tagged ELF for [IDE][TG3] <-- okay?
Relocating _text from...
Boot from (N)etwork....
After selecting 'boot from Network'
I get
Probing pci nic... [ptg3-5704]Ethernet add... Tigon3 [partno...
Link is up..
Searching for server
Me: (addresses)
Loading 192.168.1.200:test1 (ELF)... done
Firmware type: LinuxBIOS _
(note the ' ' before the cursor) - at this point it's hung.
Is this the right process?
Dave Belfer-Shevett dbs@stonekeep.com writes:
On Fri, 2004-08-20 at 13:03, Eric W. Biederman wrote:
I hit 'Q' here.
The 'Q' option is generic in etherboot and it does not do anything sensible under LinuxBIOS. So you are probably triggering a triple fault again. It does look like the code you have does not know how to reboot properly but that is a minor issue.
Okay, sounds fine.
I'm experiencing other problems with this BIOS image as well. I can't boot off the hard drive or off floppy. For [D]isk boot, it just shows: probing pci disk [IDE] LBA48mode, disk-1... Searching for image...
<abort> Probing pci disk... [IDE] Probing isa disk... <sleep> Boot from (N)etwork...
Unless you have an ELF header at the start of your disk that is likely the culprit.
Ummm. okay, so I need to make an elf-enabled floppy disk image?
dd the output of mkelfImage onto the floppy should work. Assuming you have a valid floppy image.
The simple path is to get etherboot working with an image created by mkelfImage.
I agree 8)
When I replace the BIOS chip back with the AMI original image, I'm able to boot from floppy without a problem.
You have a cluster node with a floppy drive?
Only when I pull the blade out and manually hook a floppy drive to it so I can re-flash the bios.
But getting back on track, am I doing this sequence improperly?
On the boot host...
    cd /usr/src/linux[whatever]
    make bzImage
    [compilecompilecompile]
I end up with an arch/x86_64/boot/bzImage bootable image.
    cp arch/x86_64/boot/bzImage /tftpboot
    ~/src/mkelfImage-2.5/objdir/sbin/mkelfImage --kernel=/tftpboot/bzImage --output=/tftpboot/test1
I alter dhcpd.conf to have 'filename "test1"'
I restart dhcpd, restart the blade.
Hmm. You have not specified any command line arguments. --append="console=ttyS0,115200" or something like that is likely needed.
Note the Etherboot header says:
Loading Etherboot version: 5.2.4
ROM segment 0x0000 length 0x0000 reloc 0x00000000
CPU 2063mhz
Etherboot 4.2.5( (GPL)...
Tagged ELF for [IDE][TG3] <-- okay?
Relocating _text from...
Boot from (N)etwork....
After selecting 'boot from Network'
I get
Probing pci nic... [ptg3-5704]Ethernet add... Tigon3 [partno...
Link is up..
Searching for server
Me: (addresses)
Loading 192.168.1.200:test1 (ELF)... done
Firmware type: LinuxBIOS _
(note the ' ' before the cursor) - at this point it's hung.
Is this the right process?
Yes. And the Firmware type: LinuxBIOS comes from the prefix mkelfImage prepended. So it is even running.
My best guess is that you don't have serial console output configured.
Eric