Hi,
I'm trying to revive a Linux Netwox eVelocity 2 Cluster with LinuxBIOS-1.1.7.8 and Etherboot 5.2 on the compute nodes. After some trouble, I have access to a shell after installing the node system over the network. My problem now is that I can't get localboot to work.
I need to know whether I can reflash the BIOS with a newer version of coreboot or something else, so that I can use PXE and use GRUB to boot locally.
Here is my lspci output. Thanks in advance for any help.
lspci -tv
-+-[0000:01]-+-01.0-[0000:02]--+-03.0  Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet
 |           |                 \-04.0  Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet
 |           +-01.1  Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC
 |           +-02.0-[0000:03-04]----01.0-[0000:04]----00.0  Mellanox Technologies MT23108 InfiniHost
 |           +-02.1  Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC
 |           +-03.0-[0000:05]--+-00.0  Advanced Micro Devices [AMD] AMD-8111 USB
 |           |                 +-00.1  Advanced Micro Devices [AMD] AMD-8111 USB
 |           |                 \-06.0  ATI Technologies Inc Rage XL
 |           +-04.0  Advanced Micro Devices [AMD] AMD-8111 LPC
 |           +-04.1  Advanced Micro Devices [AMD] AMD-8111 IDE
 |           +-04.2  Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0
 |           +-04.3  Advanced Micro Devices [AMD] AMD-8111 ACPI
 |           \-04.6  Advanced Micro Devices [AMD] AMD-8111 MC97 Modem
 \-[0000:00]-+-18.0  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
             +-18.1  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
             +-18.2  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
             +-18.3  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
             +-19.0  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
             +-19.1  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
             +-19.2  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
             \-19.3  Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
sh-3.2# lspci -tvnn
-+-[0000:01]-+-01.0-[0000:02]--+-03.0  14e4:16a6
 |           |                 \-04.0  14e4:16a6
 |           +-01.1  1022:7451
 |           +-02.0-[0000:03-04]----01.0-[0000:04]----00.0  15b3:5a44
 |           +-02.1  1022:7451
 |           +-03.0-[0000:05]--+-00.0  1022:7464
 |           |                 +-00.1  1022:7464
 |           |                 \-06.0  1002:4752
 |           +-04.0  1022:7468
 |           +-04.1  1022:7469
 |           +-04.2  1022:746a
 |           +-04.3  1022:746b
 |           \-04.6  1022:746e
 \-[0000:00]-+-18.0  1022:1100
             +-18.1  1022:1101
             +-18.2  1022:1102
             +-18.3  1022:1103
             +-19.0  1022:1100
             +-19.1  1022:1101
             +-19.2  1022:1102
             \-19.3  1022:1103
Sebastian Lara wrote:
Hi,
I'm trying to revive a Linux Netwox eVelocity 2 Cluster with LinuxBIOS-1.1.7.8 and Etherboot 5.2 on the compute nodes. After some trouble, I have access to a shell after installing the node system over the network. My problem now is that I can't get localboot to work.
What's the error?
I need to know whether I can reflash the BIOS with a newer version of coreboot or something else, so that I can use PXE and use GRUB to boot locally.
What mainboard type is that? Can you post a boot log from serial console?
Hi,
2009/9/3 Stefan Reinauer stepan@coresystems.de:
Sebastian Lara wrote:
Hi,
I'm trying to revive a Linux Netwox eVelocity 2 Cluster with LinuxBIOS-1.1.7.8 and Etherboot 5.2 on the compute nodes. After some trouble, I have access to a shell after installing the node system over the network. My problem now is that I can't get localboot to work.
What's the error?
Boot from (N)etwork (D)isk or (Q)uit? D
Probing pci disk... [IDE]LBA48 mode
disk-1 78150744k cap: 2f00
Searching for image... ................................<abort>
Probing pci disk... [IDE]
Probing isa disk...
<sleep>
Etherboot searches for an ELF header in the first 8K. I don't really know how to do this. I tried using an ELF image of the kernel installed on the node, but it doesn't work.
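As a sanity check, the 8K scan Etherboot does can be reproduced by hand. This is a sketch against a scratch file standing in for the partition (the file name is illustrative; on a real node you would read the raw first-partition device instead):

```shell
# Build a scratch file that begins with the 4-byte ELF magic (0x7f 'E' 'L' 'F'),
# standing in for the start of the first partition.
printf '\177ELF' > /tmp/fakepart.img

# Etherboot only scans the first 8K, so only inspect that much.
if head -c 8192 /tmp/fakepart.img | od -An -tx1 -N4 | grep -q '7f 45 4c 46'; then
    echo "ELF magic found in first 8K"
else
    echo "no ELF magic in first 8K"
fi
```

If the real partition fails this check, Etherboot will never find an image on it, no matter what the filesystem contains.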
I need to know whether I can reflash the BIOS with a newer version of coreboot or something else, so that I can use PXE and use GRUB to boot locally.
What mainboard type is that? Can you post a boot log from serial console?
Is there some way to know the mainboard type without opening a node case? Getting permission to do that may take some time.
Here is the boot log:
LinuxBIOS-1.1.7.8Normal Fri Jan 7 13:55:58 MST 2005 starting...
setting up resource map....done.
02 nodes initialized.
ht reset -
LinuxBIOS-1.1.7.8Normal Fri Jan 7 13:55:58 MST 2005 starting...
setting up resource map....done.
02 nodes initialized.
Ram1.00
Ram1.01
Ram2.00
Ram2.01
Ram3
Initializing memory: done
Initializing memory: done
Clearing initial memory region: done
Ram4
Copying LinuxBIOS to ram.
Jumping to LinuxBIOS.
LinuxBIOS-1.1.7.8Normal Fri Jan 7 13:55:58 MST 2005 booting...
Enumerating buses...
PCI_DOMAIN: 0000 enabled
APIC_CLUSTER: 0 enabled
PCI: pci_scan_bus for bus 0
PCI: 00:18.0 [1022/1100] bus ops
PCI: 00:18.0 [1022/1100] enabled
PCI: 00:18.1 [1022/1101] enabled
PCI: 00:18.2 [1022/1102] enabled
PCI: 00:18.3 [1022/1103] ops
PCI: 00:18.3 [1022/1103] enabled
PCI: 00:19.0 [1022/1100] bus ops
PCI: 00:19.0 [1022/1100] enabled
PCI: 00:19.1 [1022/1101] enabled
PCI: 00:19.2 [1022/1102] enabled
PCI: 00:19.3 [1022/1103] ops
PCI: 00:19.3 [1022/1103] enabled
PCI: 01:01.0 [1022/7450] enabled
next_unitid: 0003
PCI: 01:03.0 [1022/7460] enabled
next_unitid: 0007
HyperT reset not needed
PCI: pci_scan_bus for bus 1
PCI: 01:01.0 [1022/7450] bus ops
PCI: 01:01.0 [1022/7450] enabled
PCI: 01:01.1 [1022/7451] ops
PCI: 01:01.1 [1022/7451] enabled
PCI: 01:02.0 [1022/7450] bus ops
PCI: 01:02.0 [1022/7450] enabled
PCI: 01:02.1 [1022/7451] ops
PCI: 01:02.1 [1022/7451] enabled
PCI: 01:03.0 [1022/7460] bus ops
PCI: 01:03.0 [1022/7460] enabled
PCI: 01:04.0 [1022/7468] bus ops
PCI: 01:04.0 [1022/7468] enabled
PCI: 01:04.1 [1022/7469] ops
PCI: 01:04.1 [1022/7469] enabled
PCI: 01:04.2 [1022/746a] bus ops
PCI: 01:04.2 [1022/746a] enabled
PCI: 01:04.3 [1022/746b] bus ops
PCI: 01:04.3 [1022/746b] enabled
PCI: 01:04.5 No device operations
PCI: 01:04.6 [1022/746e] ops
PCI: 01:04.6 [1022/746e] enabled
PCI: pci_scan_bus for bus 2
PCI: 02:03.0 [14e4/16a6] enabled
PCI: 02:04.0 [14e4/16a6] enabled
PCI: pci_scan_bus returning with max=02
PCI: pci_scan_bus for bus 3
PCI: 03:01.0 [15b3/5a46] enabled
PCI: pci_scan_bus for bus 4
PCI: 04:00.0 [15b3/5a44] enabled
PCI: pci_scan_bus returning with max=04
PCI: pci_scan_bus returning with max=04
PCI: pci_scan_bus for bus 5
PCI: 05:00.0 [1022/7464] bus ops
PCI: 05:00.0 [1022/7464] enabled
PCI: 05:00.1 [1022/7464] bus ops
PCI: 05:00.1 [1022/7464] enabled
PCI: 05:00.2 No device operations
PCI: 05:01.0 No device operations
PCI: 05:06.0 [1002/4752] enabled
PCI: pci_scan_bus returning with max=05
PNP: 002e.0 disabled
PNP: 002e.1 disabled
PNP: 002e.2 disabled
PNP: 002e.3 enabled
PNP: 002e.4 disabled
PNP: 002e.5 disabled
PNP: 002e.6 enabled
PNP: 002e.7 disabled
PNP: 002e.8 disabled
PNP: 002e.9 disabled
PNP: 002e.a disabled
I2C: 70 enabled
I2C: 50 enabled
I2C: 51 enabled
I2C: 52 enabled
I2C: 53 enabled
I2C: 54 enabled
I2C: 55 enabled
I2C: 56 enabled
I2C: 57 enabled
PCI: pci_scan_bus returning with max=05
PCI: pci_scan_bus returning with max=05
CPU: APIC: 00 enabled
CPU: APIC: 01 enabled
done
Allocating resources...
PCI: 01:01.0 1c <- [0x00fffff000 - 0x00ffffefff] bus 2 io
PCI: 01:01.0 24 <- [0xfffffffffff00000 - 0xffffffffffefffff] bus 2 prefmem
PCI: 03:01.0 1c <- [0x00fffff000 - 0x00ffffefff] bus 4 io
PCI: 01:02.0 1c <- [0x00fffff000 - 0x00ffffefff] bus 3 io
PCI: 01:03.0 24 <- [0x00fff00000 - 0x00ffefffff] bus 5 prefmem
Allocating VGA resource PCI: 05:06.0
PCI: 00:18.0 1b8 <- [0x00e8000000 - 0x00f07fffff] prefmem <node 0 link 0>
PCI: 00:18.0 1c0 <- [0x0000001000 - 0x0000002fff] io <node 0 link 0>
PCI: 00:18.0 1b0 <- [0x00f8000000 - 0x00f93fffff] mem <node 0 link 0>
PCI: 01:01.0 20 <- [0x00f9100000 - 0x00f91fffff] bus 2 mem
PCI: 02:03.0 10 <- [0x00f9100000 - 0x00f910ffff] mem
PCI: 02:04.0 10 <- [0x00f9110000 - 0x00f911ffff] mem
PCI: 01:01.1 10 <- [0x00f9300000 - 0x00f9300fff] mem
PCI: 01:02.0 24 <- [0x00e8000000 - 0x00f07fffff] bus 3 prefmem
PCI: 01:02.0 20 <- [0x00f9200000 - 0x00f92fffff] bus 3 mem
PCI: 03:01.0 24 <- [0x00e8000000 - 0x00f07fffff] bus 4 prefmem
PCI: 03:01.0 20 <- [0x00f9200000 - 0x00f92fffff] bus 4 mem
PCI: 04:00.0 10 <- [0x00f9200000 - 0x00f92fffff] mem
PCI: 04:00.0 18 <- [0x00f0000000 - 0x00f07fffff] prefmem
PCI: 04:00.0 20 <- [0x00e8000000 - 0x00efffffff] prefmem
PCI: 01:02.1 10 <- [0x00f9301000 - 0x00f9301fff] mem
PCI: 01:03.0 1c <- [0x0000001000 - 0x0000001fff] bus 5 io
PCI: 01:03.0 20 <- [0x00f8000000 - 0x00f90fffff] bus 5 mem
PCI: 05:00.0 10 <- [0x00f9000000 - 0x00f9000fff] mem
PCI: 05:00.1 10 <- [0x00f9001000 - 0x00f9001fff] mem
PCI: 05:06.0 10 <- [0x00f8000000 - 0x00f8ffffff] mem
PCI: 05:06.0 14 <- [0x0000001000 - 0x00000010ff] io
PCI: 05:06.0 18 <- [0x00f9002000 - 0x00f9002fff] mem
PNP: 002e.3 60 <- [0x00000003f8 - 0x00000003ff] io
PNP: 002e.3 70 <- [0x0000000004 - 0x0000000004] irq
PNP: 002e.6 60 <- [0x0000000060 - 0x0000000067] io
PNP: 002e.6 62 <- [0x0000000064 - 0x000000006b] io
PNP: 002e.6 70 <- [0x0000000001 - 0x0000000001] irq
PCI: 01:04.1 20 <- [0x00000028a0 - 0x00000028af] io
PCI: 01:04.2 10 <- [0x0000002880 - 0x000000289f] io
PCI: 01:04.3 58 <- [0x0000002000 - 0x00000020ff] io
PCI: 01:04.6 10 <- [0x0000002400 - 0x00000024ff] io
PCI: 01:04.6 14 <- [0x0000002800 - 0x000000287f] io
PCI: 00:18.3 94 <- [0x00f4000000 - 0x00f7ffffff] mem <gart>
PCI: 00:19.3 94 <- [0x00f4000000 - 0x00f7ffffff] mem <gart>
done.
Enabling resourcess...
PCI: 00:18.0 cmd <- 140
PCI: 01:01.0 bridge ctrl <- 0003
PCI: 01:01.0 cmd <- 146
PCI: 02:03.0 cmd <- 142
PCI: 02:04.0 cmd <- 142
PCI: 01:01.1 subsystem <- 161f/3016
PCI: 01:01.1 cmd <- 146
PCI: 01:02.0 bridge ctrl <- 0003
PCI: 01:02.0 cmd <- 146
PCI: 03:01.0 bridge ctrl <- 0003
PCI: 03:01.0 cmd <- 146
PCI: 04:00.0 cmd <- 142
PCI: 01:02.1 subsystem <- 161f/3016
PCI: 01:02.1 cmd <- 146
PCI: 01:03.0 bridge ctrl <- 000b
PCI: 01:03.0 cmd <- 147
PCI: 05:00.0 subsystem <- 161f/3016
PCI: 05:00.0 cmd <- 142
PCI: 05:00.1 subsystem <- 161f/3016
PCI: 05:00.1 cmd <- 142
PCI: 05:06.0 cmd <- 1c3
PCI: 01:04.0 subsystem <- 161f/3016
PCI: 01:04.0 cmd <- 14f
PCI: 01:04.1 subsystem <- 161f/3016
PCI: 01:04.1 cmd <- 141
PCI: 01:04.2 subsystem <- 161f/3016
PCI: 01:04.2 cmd <- 141
PCI: 01:04.3 subsystem <- 161f/3016
PCI: 01:04.3 cmd <- 141
PCI: 01:04.6 subsystem <- 161f/3016
PCI: 01:04.6 cmd <- 141
PCI: 00:18.1 subsystem <- 161f/3016
PCI: 00:18.1 cmd <- 140
PCI: 00:18.2 subsystem <- 161f/3016
PCI: 00:18.2 cmd <- 140
PCI: 00:18.3 cmd <- 140
PCI: 00:19.0 cmd <- 140
PCI: 00:19.1 subsystem <- 161f/3016
PCI: 00:19.1 cmd <- 140
PCI: 00:19.2 subsystem <- 161f/3016
PCI: 00:19.2 cmd <- 140
PCI: 00:19.3 cmd <- 140
done.
Initializing devices...
Root Device init
PCI: 00:18.0 init
PCI: 01:01.0 init
PCI: 01:02.0 init
PCI: 01:03.0 init
PCI: 01:04.0 init
RTC Init
enabling HPET @0xfed00000
PNP: 002e.3 init
PNP: 002e.6 init
PCI: 01:04.1 init
IDE1 IDE0
PCI: 01:04.3 init
set power on after power fail
PCI: 00:18.3 init
NB: Function 3 Misc Control.. done.
PCI: 00:19.0 init
PCI: 00:19.3 init
NB: Function 3 Misc Control.. done.
APIC_CLUSTER: 0 init
Initializing CPU #0
CPU: vendor AMD device f5a
Enabling cache
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB
Setting fixed MTRRs(24-88) Type: WB
DONE fixed MTRRs
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB
Setting fixed MTRRs(24-88) Type: WB
DONE fixed MTRRs
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
DONE variable MTRRs
Clear out the extra MTRR's
MTRR check
Fixed MTRRs : Enabled
Variable MTRRs: Enabled
Clearing memory 0K - 1048576K: --------------- done
Setting up local apic... apic_id: 0 done.
CPU #0 Initialized
Initializing CPU #1
Waiting for 1 CPUS to stop
CPU: vendor AMD device f5a
Enabling cache
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB
Setting fixed MTRRs(24-88) Type: WB
DONE fixed MTRRs
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB
Setting fixed MTRRs(24-88) Type: WB
DONE fixed MTRRs
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
DONE variable MTRRs
Clear out the extra MTRR's
MTRR check
Fixed MTRRs : Enabled
Variable MTRRs: Enabled
Clearing memory 1048576K - 2097152K: ---------------- done
Setting up local apic... apic_id: 1 done.
CPU #1 Initialized
All AP CPUs stopped
Devices initialized
Copying IRQ routing tables to 0xf0000...done.
Verifing copy of IRQ routing tables at 0xf0000...done
Checking IRQ routing table consistency...
/home/cwxbuild/modules/linuxbios/hdama/freebios2/src/arch/i386/boot/pirq_routing.c: 28:check_pirq_routing_table() - irq_routing_table located at: 0x000f0000
done.
Wrote the mp table end at: 00000020 - 00000224
Wrote linuxbios table at: 00000500 - 00000d88 checksum 9e78
Welcome to elfboot, the open sourced starter.
January 2002, Eric Biederman.
Version 1.3
Found ELF candiate at offset 0
Loading Etherboot version: 5.2.4js1
Dropping non PT_LOAD segment
New segment addr 0x20000 size 0x41e65 offset 0xc0 filesize 0x993a
(cleaned up) New segment addr 0x20000 size 0x41e65 offset 0xc0 filesize 0x993a
Loading Segment: addr: 0x000000007ffc4000 memsz: 0x0000000000010000 filesz: 0x000000000000993a
Clearing Segment: addr: 0x000000007ffcd93a memsz: 0x00000000000066c6
Loading Segment: addr: 0x0000000000030000 memsz: 0x0000000000031e65 filesz: 0x0000000000000000
Clearing Segment: addr: 0x0000000000030000 memsz: 0x0000000000031e65
Jumping to boot code at 0x20000
ROM segment 0x0000 length 0x0000 reloc 0x00020000
CPU 2486 Mhz
Etherboot 5.2.4js1 (GPL) http://etherboot.org
ELF64 ELF with TFTP SLAM LACP for [EEPRO100][E1000][3C90X][TG3][IDE]
Relocating _text from: [00029930,00063080) to [7fec68b0,7ff00000)
Probing pci nic... (D)isk or (Q)uit? [tg3-5702X]
Ethernet addr: 00:50:45:5C:34:1A
Tigon3 [partno(BCM95702A20) rev 1002 PHY(5703)] (PCI:66MHz:32-bit)
Link is up at 100 Mbps, full duplex.
Searching for server (DHCP)...
...Me: 10.1.255.254, Server: 10.1.1.1, Gateway 10.1.1.1
Loading 10.1.1.1:vmlinuz.vo ...(ELF)... ................
Thanks for the help.
Sebastian Lara wrote:
Hi,
2009/9/3 Stefan Reinauer stepan@coresystems.de:
Sebastian Lara wrote:
Hi,
I'm trying to revive a Linux Netwox eVelocity 2 Cluster with LinuxBIOS-1.1.7.8 and Etherboot 5.2 on the compute nodes. After some trouble, I have access to a shell after installing the node system over the network. My problem now is that I can't get localboot to work.
What's the error?
Boot from (N)etwork (D)isk or (Q)uit? D
Probing pci disk... [IDE]LBA48 mode
disk-1 78150744k cap: 2f00
Searching for image... ................................<abort>
Probing pci disk... [IDE]
Probing isa disk...
<sleep>
Etherboot searches for an ELF header in the first 8K. I don't really know how to do this. I tried using an ELF image of the kernel installed on the node, but it doesn't work.
I think you need to run "mkelfimage" on a Linux kernel in order to get it booted, and then dump the result directly onto the first partition.
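A rough sketch of that procedure follows. The mkelfimage flags, kernel path, command line, and target device are all illustrative assumptions, not values from this thread; check them against `mkelfimage --help` and your actual disk layout before touching a real device. The dd step itself is demonstrated against plain files so the block is safe to run as-is:

```shell
# Hypothetical real-world invocation (do not run blindly; paths and flags
# are assumptions for illustration):
#
#   mkelfimage --command-line="root=/dev/hda2 console=ttyS0,115200" \
#       /boot/vmlinuz kernel.elf
#   dd if=kernel.elf of=/dev/hda1
#
# Safe demonstration of the raw-dump step using ordinary files:
printf 'pretend-ELF-image' > kernel.elf
dd if=kernel.elf of=first-partition.img 2>/dev/null

# The partition must start with the image byte-for-byte, since Etherboot
# reads it raw rather than through a filesystem.
cmp -s kernel.elf first-partition.img && echo "image written verbatim"
```

The point of the raw dd is that Etherboot has no filesystem support here: the ELF image has to sit at the very start of the partition, not inside an ext2 /boot.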
I need to know whether I can reflash the BIOS with a newer version of coreboot or something else, so that I can use PXE and use GRUB to boot locally.
What mainboard type is that? Can you post a boot log from serial console?
Is there some way to know the mainboard type without opening a node case? Getting permission to do that may take some time.
If you can run "nvramtool" from a current coreboot tree on that node, it can dump the fields.
Here is the boot log:
Found ELF candiate at offset 0
Loading Etherboot version: 5.2.4js1
Dropping non PT_LOAD segment
New segment addr 0x20000 size 0x41e65 offset 0xc0 filesize 0x993a
(cleaned up) New segment addr 0x20000 size 0x41e65 offset 0xc0 filesize 0x993a
Loading Segment: addr: 0x000000007ffc4000 memsz: 0x0000000000010000 filesz: 0x000000000000993a
Clearing Segment: addr: 0x000000007ffcd93a memsz: 0x00000000000066c6
Loading Segment: addr: 0x0000000000030000 memsz: 0x0000000000031e65 filesz: 0x0000000000000000
Clearing Segment: addr: 0x0000000000030000 memsz: 0x0000000000031e65
Jumping to boot code at 0x20000
ROM segment 0x0000 length 0x0000 reloc 0x00020000
CPU 2486 Mhz
Etherboot 5.2.4js1 (GPL) http://etherboot.org
ELF64 ELF with TFTP SLAM LACP for [EEPRO100][E1000][3C90X][TG3][IDE]
Relocating _text from: [00029930,00063080) to [7fec68b0,7ff00000)
Probing pci nic... (D)isk or (Q)uit? [tg3-5702X]
Ethernet addr: 00:50:45:5C:34:1A
Tigon3 [partno(BCM95702A20) rev 1002 PHY(5703)] (PCI:66MHz:32-bit)
Link is up at 100 Mbps, full duplex.
Searching for server (DHCP)...
...Me: 10.1.255.254, Server: 10.1.1.1, Gateway 10.1.1.1
Loading 10.1.1.1:vmlinuz.vo ...(ELF)... ................
Does booting over the network work?
2009/9/4 Stefan Reinauer stepan@coresystems.de:
Sebastian Lara wrote:
Hi, [...]
[...]
I think you need to run "mkelfimage" on a Linux kernel in order to get it booted, and then dump the result directly onto the first partition.
How should I dump that image? Should I use dd, or just make /boot the first partition of the disk?
Does booting over the network work?
Network booting works fine. I can install the node without any problem.
quick meta-question: why do you want to boot from local? Reason I ask is that the newest compute clusters are moving away from local disks for many reasons.
Using a local disk from storage is fine, people still do that, but booting from local can make life harder when errors crop up.
ron
2009/9/4 ron minnich rminnich@gmail.com:
quick meta-question: why do you want to boot from local? Reason I ask is that the newest compute clusters are moving away from local disks for many reasons.
Using a local disk from storage is fine, people still do that, but booting from local can make life harder when errors crop up.
Just because we started using ROCKS Clusters. Are there any cluster distributions that can run without local disks on the nodes?
Thanks
On Fri, Sep 4, 2009 at 6:40 PM, Sebastian Lara <slara@udec.cl> wrote:
Just because we started using ROCKS Clusters. Are there any cluster distributions that can run without local disks on the nodes?
Yes. I strongly suggest you take a look at this: http://onesis.org/
No local disks required. How big is your cluster? If it's fewer than 128 nodes, just run NFS root with one NFS root server. onesis is incredibly clever, in that you can easily configure it to put each node's /tmp, /var, and so on in local ramfs or on a local disk.
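The layout ron describes (a read-only NFS root plus per-node scratch space in RAM) can be sketched with two config fragments. The export path, network range, and tmpfs sizes below are illustrative assumptions, not values taken from onesis documentation:

```
# /etc/exports on the NFS root server (path and network are illustrative):
/export/onesis/root  10.1.0.0/255.255.0.0(ro,no_root_squash,sync,no_subtree_check)

# fstab fragment on each node: writable scratch areas live in local RAM,
# so the shared root can stay read-only (sizes are illustrative):
tmpfs  /tmp  tmpfs  defaults,size=256m  0 0
tmpfs  /var  tmpfs  defaults,size=256m  0 0
```

With this shape, every node boots the same root image over the network, and only throwaway state ever touches local storage.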
We use it to run a 4400 node (not a typo) system at Sandia: it scales. We have many different types of installations, and it runs well on even very small systems, like my Geode clusters.
And, it's very network oriented, but allows you to have data on local disks. I'm really sold on it. One of our newer interns, Chris Kinney, who is also on this list, can tell you more. He set up an 80-node cluster, with no previous experience, in an afternoon.
I think if you went with onesis you could avoid having to reflash your BIOS. Your life would be easier.
For compute node clusters, in fact, the best thing you can do is yank the disks and throw them away -- unless you need them for local data storage. They tend to cause trouble. I have not built a disk-based cluster in 10 years, and I've built clusters that range in size from 4 nodes to 2048 nodes. Local disks are just trouble.
While I respect the work the Rocks guys have done, I think onesis is a good way to go. So does Sun: they use onesis for their commercial cluster offerings.
Thanks
ron
2009/9/4 ron minnich rminnich@gmail.com:
On Fri, Sep 4, 2009 at 6:40 PM, Sebastian Lara <slara@udec.cl> wrote:
Just because we started using ROCKS Clusters. Are there any cluster distributions that can run without local disks on the nodes?
Yes. I strongly suggest you take a look at this: http://onesis.org/
No local disks required. How big is your cluster? If it's fewer than 128 nodes, just run NFS root with one NFS root server. onesis is incredibly clever, in that you can easily configure it to put each node's /tmp, /var, and so on in local ramfs or on a local disk.
We use it to run a 4400 node (not a typo) system at Sandia: it scales. We have many different types of installations, and it runs well on even very small systems, like my Geode clusters.
And, it's very network oriented, but allows you to have data on local disks. I'm really sold on it. One of our newer interns, Chris Kinney, who is also on this list, can tell you more. He set up an 80-node cluster, with no previous experience, in an afternoon.
I think if you went with onesis you could avoid having to reflash your BIOS. Your life would be easier.
For compute node clusters, in fact, the best thing you can do is yank the disks and throw them away -- unless you need them for local data storage. They tend to cause trouble. I have not built a disk-based cluster in 10 years, and I've built clusters that range in size from 4 nodes to 2048 nodes. Local disks are just trouble.
While I respect the work the Rocks guys have done, I think onesis is a good way to go. So does Sun: they use onesis for their commercial cluster offerings.
Thanks
This sounds really good. I will definitely try this. Thanks.
Hi,
2009/9/6 Sebastian Lara slara@udec.cl:
2009/9/4 ron minnich rminnich@gmail.com:
On Fri, Sep 4, 2009 at 6:40 PM, Sebastian Lara <slara@udec.cl> wrote:
Just because we started using ROCKS Clusters. Are there any cluster distributions that can run without local disks on the nodes?
Yes. I strongly suggest you take a look at this: http://onesis.org/
Thanks for the suggestion. My cluster is finally up. The installation went fine.
On Thu, Sep 10, 2009 at 3:22 PM, Sebastian Lara <slara@udec.cl> wrote:
Thanks for the suggestion. My cluster is finally up. The installation went fine.
If you have onesis questions, don't hesitate to contact me. Also, what size cluster? Just curious.
thanks
ron
2009/9/10 ron minnich rminnich@gmail.com:
On Thu, Sep 10, 2009 at 3:22 PM, Sebastian Lara <slara@udec.cl> wrote:
Thanks for the suggestion. My cluster is finally up. The installation went fine.
If you have onesis questions, don't hesitate to contact me. Also, what size cluster? Just curious.
It's 10 nodes (20 cores), at 2.35 GHz per core, with 2 GB of RAM per node, and InfiniBand.
Thanks ;-)