The LinuxBIOS boot doesn't work for me 100% of the time. Sometimes I get an Exception 6 in LB itself (causing a halt); other times the kernel gets an Exception 6, or it can't find the hard drive.
Here's the weird part: I flip back to the factory BIOS, it says "CMOS checksum invalid, using defaults". I hit DEL, save the settings to CMOS, and power down. I flip back to LB, and all of a sudden, it's working fine for a while. The kernel is a stock Ubuntu kernel with initrd that always works with the stock BIOS + grub loading.
1) Does LB store anything in the CMOS?
2) If yes, is there anything in there that could become corrupted and cause "weird issues" as described above?
3) If no, any other guesses?
-- Eric
P.S. I haven't had any issues since I got VGA working, but I've only powered up the machine twice since that flashing.
Eric Poulsen wrote:
> 1) Does LB store anything in the CMOS?
It can. It does store a few bits about the type of boot that occurred (fallback or normal).
> 2) If yes, is there anything in there that could become corrupted and
> cause "weird issues" as described above?
What's exception 6? I don't recall.
sometimes clocking info is stored in CMOS. There could be a collision here.
If you can tell more it would be good to know.
ron
* Ronald G Minnich rminnich@lanl.gov [060425 20:44]:
> > 2) If yes, is there anything in there that could become corrupted and
> > cause "weird issues" as described above?
> What's exception 6? I don't recall.
Illegal opcode.
I think I saw this on Opteron before as well, but I don't remember the issue. I seem to remember that writing all 0 to CMOS and populating it from scratch made it work reliably.
> sometimes clocking info is stored in CMOS. There could be a collision here.
My kernel crash issue is attached below. This one isn't an Exception 6 (that I can tell), but it's one I've seen before. This happened twice in a row, after the machine had been off for a while. As usual, flipping to the factory BIOS, seeing the "corrupt CMOS" message, and re-writing the CMOS fixed the issue. I immediately flipped back to LB, and it worked as expected.
I looked at the CMOS code in src/pc80/mc146818rtc.c. The code that prints "Invalid CMOS LB checksum" is found immediately after a call to CMOS_WRITE, then it sets the checksum. If you remove the comments in the source, the following two code snippets are contiguous, but I've inserted comments.
The following works ok:
===
checksum_invalid = !rtc_checksum_valid(PC_CKS_RANGE_START,
		PC_CKS_RANGE_END, PC_CKS_LOC);

if (invalid || cmos_invalid || checksum_invalid) {
	printk_warning("RTC:%s%s%s zeroing cmos\n",
		invalid ? " Clear requested" : "",
		cmos_invalid ? " Power Problem" : "",
		checksum_invalid ? " Checksum invalid" : "");
===
The following fails. You can see at the end of this block, there is a routine to set the checksum. Here's my guess: the CMOS_WRITE commands below do not work correctly, causing the checksum error. A valid checksum is computed and written, which prevents the error from occurring above on the next reboot. OR, 'Invalid CMOS LB checksum' is perfectly normal because the checksum should be invalid after a write. In the case of the latter, the checksumming that occurs below is pointless, since it's about to be written anyway.
===
/* Setup the real time clock */
CMOS_WRITE(RTC_CONTROL_DEFAULT, RTC_CONTROL);
/* Setup the frequency it operates at */
CMOS_WRITE(RTC_FREQ_SELECT_DEFAULT, RTC_FREQ_SELECT);

/* See if there is a LB CMOS checksum error */
checksum_invalid = !rtc_checksum_valid(LB_CKS_RANGE_START,
		LB_CKS_RANGE_END, LB_CKS_LOC);
if (checksum_invalid)
	printk_debug("Invalid CMOS LB checksum\n");

/* Make certain we have a valid checksum */
rtc_set_checksum(PC_CKS_RANGE_START, PC_CKS_RANGE_END, PC_CKS_LOC);
===
-- Eric
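The two snippets above deal with two separate checksummed ranges (PC_* and LB_*). As a rough illustration only, here is a minimal sketch that models CMOS as a plain in-memory byte array. It is not the code from src/pc80/mc146818rtc.c: the DEMO_* offsets, the helper names (`cks_sum`, `cks_set`, `cks_valid`) and the checksum formula (a 16-bit byte sum stored high byte first) are all assumptions made for the example. The point it tries to show is that rewriting the checksum of one range does nothing for the other, which seems relevant because the failing snippet tests the LB_* range but then rewrites the PC_* one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration, NOT the real LinuxBIOS constants. */
enum {
	CMOS_SIZE = 128,
	DEMO_PC_START = 16, DEMO_PC_END = 45, DEMO_PC_CKS = 46,
	DEMO_LB_START = 49, DEMO_LB_END = 125, DEMO_LB_CKS = 126
};

/* 16-bit sum over cmos[start..end], both ends inclusive. */
static uint16_t cks_sum(const uint8_t *cmos, int start, int end)
{
	uint16_t sum = 0;

	assert(start <= end);
	for (int i = start; i <= end; i++)
		sum = (uint16_t)(sum + cmos[i]);
	return sum;
}

/* Store the checksum, high byte first, at cmos[loc] and cmos[loc + 1]. */
static void cks_set(uint8_t *cmos, int start, int end, int loc)
{
	uint16_t sum = cks_sum(cmos, start, end);

	cmos[loc] = (uint8_t)(sum >> 8);
	cmos[loc + 1] = (uint8_t)(sum & 0xff);
}

/* Return 1 if the stored checksum matches the bytes in the range. */
static int cks_valid(const uint8_t *cmos, int start, int end, int loc)
{
	uint16_t stored = (uint16_t)((cmos[loc] << 8) | cmos[loc + 1]);

	return stored == cks_sum(cmos, start, end);
}
```

In this model, corrupting one byte inside the DEMO_LB range and then calling `cks_set` on the DEMO_PC range leaves `cks_valid` failing for the LB range, so an "invalid LB checksum" style message would keep appearing until something rewrites the LB checksum itself.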
LinuxBIOS-1.1.8.0Fallback Tue Apr 25 20:17:47 PDT 2006 starting...
Enabling mainboard devices
Enabling shadow ram
vt8623 init starting
Detecting Memory
Number of Banks 04
Number of Rows 0d
Priamry DRAM width08
No Columns 0a
MA type e0
Bank 0 (*16 Mb) 10
No Physical Banks 01
Total Memory (*16 Mb) 10
CAS Supported 2 2.5 3
Cycle time at CL X (nS)50
Cycle time at CL X-0.5 (nS)60
Cycle time at CL X-1 (nS)75
Starting at CAS 3
We can do CAS 2.5
We can do CAS 2
tRP 48
tRCD 48
tRAS 28
Low Bond 00
High Bondc7
Setting DQS delay84vt8623 done
00:06 11 23 31 06 00 30 22 00 00 00 06 00 00 00 00
10:08 00 00 d0 00 00 00 00 00 00 00 00 00 00 00 00
20:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30:00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40:00 18 88 80 82 44 00 00 18 99 88 80 82 44 00 00
50:c8 de cf 88 e0 07 00 00 e0 00 10 10 10 10 00 00
60:02 ff 00 30 d6 32 01 20 42 2d 43 58 00 44 00 00
70:82 48 00 01 01 08 50 00 01 00 00 00 00 00 00 02
80:0f 65 00 00 80 00 00 00 02 00 00 00 00 00 00 00
90:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0:02 c0 20 00 07 02 00 1f 04 00 00 00 2f 02 04 00
b0:00 00 00 00 40 00 00 00 a8 00 00 00 00 00 00 00
c0:01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0:00 dd 00 00 00 00 01 00 40 00 00 00 00 00 00 00
f0:00 00 00 00 00 00 12 13 00 00 00 00 00 00 00 00
AGP
Doing MTRR init.
Copying LinuxBIOS to ram.
Jumping to LinuxBIOS.
LinuxBIOS-1.1.8.0Fallback Tue Apr 25 20:17:47 PDT 2006 booting...
clocks_per_usec: 838
Enumerating buses...
Finding PCI configuration type.
PCI: Using configuration type 1
PCI_DOMAIN: 0000 enabled
APIC_CLUSTER: 0 enabled
PCI: pci_scan_bus for bus 0
PCI: 00:00.0 [1106/3123] enabled
PCI: 00:01.0 [1106/b091] enabled
Disabling static device: PCI: 00:0a.0
Disabling static device: PCI: 00:0a.1
In vt8235_enable 1106 3038.
PCI: 00:10.0 [1106/3038] enabled
In vt8235_enable 1106 3038.
PCI: 00:10.1 [1106/3038] enabled
In vt8235_enable 1106 3038.
PCI: 00:10.2 [1106/3038] enabled
In vt8235_enable 1106 3104.
PCI: 00:10.3 [1106/3104] enabled
In vt8235_enable 1106 3177.
Initialising Devices
PCI: 00:11.0 [1106/3177] enabled
In vt8235_enable 1106 0571.
PCI: 00:11.1 [1106/0571] enabled
In vt8235_enable 1106 3059.
PCI: 00:11.5 [1106/3059] enabled
In vt8235_enable ffff ffff.
In vt8235_enable 1106 3065.
PCI: 00:12.0 [1106/3065] enabled
PCI: pci_scan_bus for bus 1
PCI: 01:00.0 [1106/3122] enabled
PCI: pci_scan_bus returning with max=01
vt1211 enabling PNP devices.
PNP: 002e.0 enabled
vt1211 enabling PNP devices.
PNP: 002e.1 enabled
vt1211 enabling PNP devices.
PNP: 002e.2 enabled
vt1211 enabling PNP devices.
PNP: 002e.3 enabled
vt1211 enabling PNP devices.
PNP: 002e.b enabled
PCI: pci_scan_bus returning with max=01
done
Allocating resources...
Reading resources...
Done reading resources.
Setting resources...
I would set ram size to 0x40000 Kbytes
PCI: 00:10.0 20 <- [0x0000001800 - 0x000000181f] io
PCI: 00:10.1 20 <- [0x0000001820 - 0x000000183f] io
PCI: 00:10.2 20 <- [0x0000001840 - 0x000000185f] io
PCI: 00:10.3 10 <- [0x00febff000 - 0x00febff0ff] mem
PNP: 002e.0 60 <- [0x00000003f0 - 0x00000003f7] io
PNP: 002e.0 70 <- [0x0000000006 - 0x0000000006] irq
PNP: 002e.0 74 <- [0x0000000002 - 0x0000000002] drq
PNP: 002e.1 60 <- [0x0000000378 - 0x000000037f] io
PNP: 002e.1 70 <- [0x0000000007 - 0x0000000007] irq
PNP: 002e.1 74 <- [0x0000000003 - 0x0000000003] drq
PNP: 002e.2 60 <- [0x00000003f8 - 0x00000003ff] io
PNP: 002e.2 70 <- [0x0000000004 - 0x0000000004] irq
PNP: 002e.3 60 <- [0x00000002f8 - 0x00000002ff] io
PNP: 002e.3 70 <- [0x0000000003 - 0x0000000003] irq
PNP: 002e.b 60 <- [0x000000ec00 - 0x000000ecff] io
PCI: 00:11.1 20 <- [0x0000001860 - 0x000000186f] io
PCI: 00:11.5 10 <- [0x0000001000 - 0x00000010ff] io
PCI: 00:12.0 10 <- [0x0000001400 - 0x00000014ff] io
PCI: 00:12.0 14 <- [0x00fec00000 - 0x00fec000ff] mem
Done setting resources.
Done allocating resources.
Enabling resources...
PCI: 00:00.0 cmd <- 146
PCI: 00:01.0 bridge ctrl <- 000f
PCI: 00:01.0 cmd <- 147
PCI: 01:00.0 cmd <- 140
PCI: 00:10.0 subsystem <- 00/00
PCI: 00:10.0 cmd <- 141
PCI: 00:10.1 subsystem <- 00/00
PCI: 00:10.1 cmd <- 141
PCI: 00:10.2 subsystem <- 00/00
PCI: 00:10.2 cmd <- 141
PCI: 00:10.3 subsystem <- 00/00
PCI: 00:10.3 cmd <- 142
PCI: 00:11.0 cmd <- 147
PNP: 002e.0 - enabling
PNP: 002e.1 - enabling
PNP: 002e.2 - enabling
PNP: 002e.3 - enabling
PNP: 002e.b - enabling
PCI: 00:11.1 cmd <- 147
PCI: 00:11.5 subsystem <- 00/00
PCI: 00:11.5 cmd <- 141
PCI: 00:12.0 cmd <- 1c3
done.
Initializing devices...
Root Device init
PCI: 00:10.0 init
PCI: 00:10.1 init
PCI: 00:10.2 init
PCI: 00:10.3 init
PCI: 00:11.0 init
vt8235 init
RTC Init
Invalid CMOS LB checksum
pci_routing_fixup: dev is 00010fa0
setting firewire
setting usb
Assigning IRQ 5 to 0:10.0
Readback = 5
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
Assigning IRQ 9 to 0:10.1
Readback = 9
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
Assigning IRQ 9 to 0:10.2
Readback = 9
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
Assigning IRQ 5 to 0:10.3
Readback = 5
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
setting vt8235
Assigning IRQ 5 to 0:11.1
Readback = 5
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
Assigning IRQ 9 to 0:11.5
Readback = 9
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
setting ethernet
Assigning IRQ 5 to 0:12.0
Readback = 5
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
setting vga
Assigning IRQ 5 to 1:0.0
Readback = 5
pci_level_irq: lower order bits are wrong: want 0x0, got 0x20
setting pci slot
setting cardbus slot
setting riser slot
PNP: 002e.0 init
PNP: 002e.1 init
PNP: 002e.2 init
PNP: 002e.3 init
PNP: 002e.b init
PCI: 00:11.1 init
Enabling VIA IDE.
ide_init: enabling compatibility IDE addresses
enables in reg 0x42 0x0
enables in reg 0x42 read back as 0x0
enables in reg 0x40 0x13
enables in reg 0x40 read back as 0x13
enables in reg 0x9 0x8a
enables in reg 0x9 read back as 0x8a
command in reg 0x4 0x7
command in reg 0x4 reads back as 0x7
PCI: 00:11.5 init
PCI: 00:12.0 init
Configuring VIA Rhine LAN
APIC_CLUSTER: 0 init
Initializing CPU #0
CPU: vendor Centaur device 673
Enabling cache
Setting fixed MTRRs(0-88) type: UC
Setting fixed MTRRs(0-16) Type: WB
Setting fixed MTRRs(24-88) Type: WB
DONE fixed MTRRs
Setting variable MTRR 0, base: 0MB, range: 128MB, type WB
Setting variable MTRR 1, base: 128MB, range: 64MB, type WB
Setting variable MTRR 2, base: 192MB, range: 32MB, type WB
DONE variable MTRRs
Clear out the extra MTRR's
MTRR check
Fixed MTRRs : Enabled
Variable MTRRs: Enabled
Disabling local apic...done.
CPU #0 Initialized
PCI: 00:00.0 init
VT8623 random fixup ...
Frame buffer at d0000000
PCI: 00:01.0 init
VT8623 AGP random fixup ...
PCI: 01:00.0 init
VGA random fixup ...
INSTALL REAL-MODE IDT
DO THE VGA BIOS
found VGA: vid=1106, did=3122
rom base, size: fffc0000
write_protect_vgabios
bus/devfn = 0x100
biosint: INT# 0x15
biosint: eax 0x5f00 ebx 0x18538 ecx 0x17fa0 edx 0xa
biosint: ebp 0x17f70 esp 0xff2 edi 0xecf0 esi 0x18538
biosint: ip 0x637f cs 0xc000 flags 0x46
biosint: INT# 0x1a
biosint: eax 0xb108 ebx 0x10000 ecx 0x10000 edx 0x3d5
biosint: ebp 0x17f70 esp 0xfcc edi 0xf6 esi 0x155eb
biosint: ip 0x40da cs 0xc000 flags 0x46
0xb108: bus 0 devfn 0x0 reg 0xf6 val 0x12
biosint: INT# 0x15
biosint: eax 0x5f0f ebx 0x18538 ecx 0x7fa0 edx 0x3d5
biosint: ebp 0x17f70 esp 0xfee edi 0x44 esi 0x18538
biosint: ip 0x647e cs 0xc000 flags 0x7
biosint: INT# 0x15
biosint: eax 0x5f02 ebx 0x18538 ecx 0x7f01 edx 0x3d5
biosint: ebp 0x17f70 esp 0xfdc edi 0x44 esi 0x18538
biosint: ip 0x63cb cs 0xc000 flags 0x46
biosint: INT# 0x15
biosint: eax 0x5f18 ebx 0x18501 ecx 0x7fa0 edx 0x3d5
biosint: ebp 0x17f70 esp 0xfde edi 0x44 esi 0x18538
biosint: ip 0x6496 cs 0xc000 flags 0x46
biosint: INT# 0x15
biosint: eax 0x5f06 ebx 0x18001 ecx 0x1 edx 0x0
biosint: ebp 0x10fd6 esp 0xfa4 edi 0x0 esi 0x14672
biosint: ip 0x63dc cs 0xc000 flags 0x246
biosint: INT# 0x15
biosint: eax 0x5f08 ebx 0x10d01 ecx 0x8301 edx 0xd4
biosint: ebp 0x10fd6 esp 0xfa4 edi 0x0 esi 0x10d0e
biosint: ip 0x63e8 cs 0xc000 flags 0x246
Devices initialized
Copying IRQ routing tables to 0xf0000...done.
Verifing copy of IRQ routing tables at 0xf0000...done
Checking IRQ routing table consistency...
check_pirq_routing_table() - irq_routing_table located at: 0x000f0000
done.
ACPI: Writing ACPI tables at f0400...
ACPI: * FACS
ACPI: * DSDT @ 000f049e Length 3f0
ACPI: * FADT
ACPI: added table 1/5 Length now 40
ACPI: done.
Moving GDT to 0x500...ok
Wrote linuxbios table at: 00000530 - 00000b80 checksum 68b7
Welcome to elfboot, the open sourced starter.
January 2002, Eric Biederman.
Version 1.3
33:stream_init() - rom_stream: 0xfffd0000 - 0xfffeffff
Found ELF candiate at offset 0
New segment addr 0x100000 size 0x23760 offset 0xc0 filesize 0x96e8
(cleaned up) New segment addr 0x100000 size 0x23760 offset 0xc0 filesize 0x96e8
New segment addr 0x123760 size 0x48 offset 0x97c0 filesize 0x48
(cleaned up) New segment addr 0x123760 size 0x48 offset 0x97c0 filesize 0x48
Dropping non PT_LOAD segment
Dropping non PT_LOAD segment
Loading Segment: addr: 0x0000000000100000 memsz: 0x0000000000023760 filesz: 0x00000000000096e8
Clearing Segment: addr: 0x00000000001096e8 memsz: 0x000000000001a078
Loading Segment: addr: 0x0000000000123760 memsz: 0x0000000000000048 filesz: 0x0000000000000048
Jumping to boot code at 0x107860
FILO version 0.4.2 (root@embedded) Tue Apr 25 20:15:07 PDT 2006
boot: hda1:/vmlinuz root=/dev/hda1 console=tty0 console=ttyS0,115200
hda: LBA 80GB: WDC WD800JB-00FMA0
Mounted ext2fs
Found Linux version 2.6.16.5 (root@Proteus) #3 Sun Apr 16 21:06:34 PDT 2006 bzImage.
Loading kernel... ok
Jumping to entry point...
[17179569.184000] Linux version 2.6.16.5 (root@Proteus) (gcc version 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)) #3 Sun Apr 16 21:06:34 PDT 2006
[17179569.184000] BIOS-provided physical RAM map:
[17179569.184000] BIOS-e820: 0000000000000be0 - 00000000000a0000 (usable)
[17179569.184000] BIOS-e820: 0000000000100000 - 000000000e000000 (usable)
[17179569.184000] 0MB HIGHMEM available.
[17179569.184000] 224MB LOWMEM available.
[17179569.184000] DMI not present or invalid.
[17179569.184000] ACPI: PM-Timer IO Port: 0x408
[17179569.184000] Allocating PCI resources starting at 10000000 (gap: 0e000000:f2000000)
[17179569.184000] Built 1 zonelists
[17179569.184000] Kernel command line: root=/dev/hda1 console=tty0 console=ttyS0,115200
[17179569.184000] No local APIC present or hardware disabled
[17179569.184000] Initializing CPU#0
[17179569.184000] PID hash table entries: 1024 (order: 10, 16384 bytes)
[17179569.184000] Detected 533.193 MHz processor.
[17179569.184000] Using pmtmr for high-res timesource
[17179569.184000] Console: colour VGA+ 80x25
[17179571.984000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[17179571.992000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[17179572.052000] Memory: 223968k/229376k available (1628k kernel code, 4992k reserved, 604k data, 232k init, 0k highmem)
[17179572.064000] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[17179572.152000] Calibrating delay using timer specific routine.. 1068.40 BogoMIPS (lpj=2136811)
[17179572.160000] Security Framework v1.0.0 initialized
[17179572.164000] SELinux: Disabled at boot.
[17179572.168000] Mount-cache hash table entries: 512
[17179572.172000] CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
[17179572.180000] CPU: L2 Cache: 64K (32 bytes/line)
[17179572.184000] CPU: Centaur VIA Samuel 2 stepping 03
[17179572.192000] Checking 'hlt' instruction... OK.
[17179572.268000] ACPI: setting ELCR to 0020 (from 0220)
[17179572.276000] NET: Registered protocol family 16
[17179572.280000] EISA bus registered
[17179572.284000] ACPI: bus type pci registered
[17179572.340000] Unable to handle kernel paging request at virtual address 6de8d753
[17179572.340000] printing eip:
[17179572.340000] c00fab46
[17179572.340000] *pde = 00000000
[17179572.340000] Oops: 0002 [#1]
[17179572.340000] Modules linked in:
[17179572.340000] CPU: 0
[17179572.340000] EIP: 0060:[<c00fab46>] Not tainted VLI
[17179572.340000] EFLAGS: 00010087 (2.6.16.5 #3)
[17179572.340000] EIP is at 0xc00fab46
[17179572.340000] eax: 00008701 ebx: 000f0000 ecx: 0000d5b4 edx: 00000000
[17179572.340000] esi: 00000206 edi: c02f9874 ebp: 000fa960 esp: c121ffb6
[17179572.340000] ds: 007b es: 0000 ss: 0068
[17179572.340000] Process swapper (pid: 1, threadinfo=c121e000 task=c1217a70)
[17179572.340000] Stack: <0>c00fa99d 00000000 a2350006 0060c034 20440000 83c8c02b 0000c036 00000000
[17179572.340000] 00000000 03070000 0290c010 0000c010 10050000 0000c010 00000000 00000000
[17179572.340000] 00000000 00000000
[17179572.340000] Call Trace:
[17179572.340000] Code: 00 f8 c3 87 db 52 50 e8 95 01 00 00 72 12 66 51 66 8b df 8a eb e8 23 02 00 00 66 59 8a c8 b4 00 8a d4 58 8a e2 5a 0a e4 74 01 f5 <d3> 90 52 50 e8 6d 01 10 00 72 1a 66 f7 c7 01 00 74 04 b4 87 eb
[17179572.340000] <0>Kernel panic - not syncing: Attempted to kill init!
[17179572.340000]
On 4/26/06, Eric Poulsen eric@zyxod.com wrote:
> My kernel crash issue is attached below. This one isn't an Exception 6 (that I can tell), but it's one I've seen before. This happened twice
Have you used memtest86 as a payload and verified that your RAM is solid?
-- Richard A. Smith
Richard Smith wrote:
> On 4/26/06, Eric Poulsen eric@zyxod.com wrote:
> > My kernel crash issue is attached below. This one isn't an Exception 6 (that I can tell), but it's one I've seen before. This happened twice
> Have you used memtest86 as a payload and verified that your RAM is solid?
> -- Richard A. Smith
No, I haven't. I have memtest86 as a boot option from factory BIOS / Grub -- I'll fire it up at lunch and let it go for a while.
On Wed, Apr 26, 2006 at 10:41:27AM -0700, Eric Poulsen wrote:
> No, I haven't. I have memtest86 as a boot option from factory BIOS / Grub -- I'll fire it up at lunch and let it go for a while.
If it works there then please also set it up to run with LinuxBIOS to see if anything is different with the RAM setup.
//Peter
On Wed, Apr 26, 2006 at 09:02:12AM -0700, Eric Poulsen wrote:
> As usual, flipping to the factory BIOS, seeing the "corrupt CMOS" message, and re-writing the CMOS fixed the issue.
Are you sure this is actually the case, as opposed to "after rebooting with the factory BIOS the system does not crash immediately on the next boot with LinuxBIOS" - they are quite different.
> I immediately flipped back to LB, and it worked as expected.
Worked reliably or did not crash while you were looking?
Can you reliably reproduce the crash? If not there's no way to tell if the problem has been fixed or merely isn't manifesting itself at that particular point in time.
Does just rebooting with LinuxBIOS produce different results than factory(resetCMOS)->LinuxBIOS?
I second Richard on running memtest86, RAM problems can cause all sorts of funny things.
> I looked at the CMOS code in src/pc80/mc146818rtc.c.
Any system that requires special data to be in CMOS or anywhere else and does not validate this data before using it is broken.
If one of the OS/mainboard combinations LinuxBIOS works with requires data in CMOS I guess it could just as well be the OS' responsibility to validate/create it, but such a dependency would be kind of stupid IMHO..
//Peter
Peter Stuge wrote:
> On Wed, Apr 26, 2006 at 09:02:12AM -0700, Eric Poulsen wrote:
> > As usual, flipping to the factory BIOS, seeing the "corrupt CMOS" message, and re-writing the CMOS fixed the issue.
> Are you sure this is actually the case, as opposed to "after rebooting with the factory BIOS the system does not crash immediately on the next boot with LinuxBIOS" - they are quite different.
I'm not sure I fully understand your definition / distinction. Here are some options:
1) Use factory BIOS, re-save CMOS, boot OS, reboot later using LB
2) Use factory BIOS, NOT re-save CMOS, boot OS, reboot later using LB
3) Use factory BIOS, re-save CMOS, power down, boot using LB
4) Use factory BIOS, NOT re-save CMOS, power down, boot using LB
5) Other?
"re-save CMOS" means entering BIOS menu and choosing "save changes and exit"
When I have the crash problem, I have been using option #3. I'm not sure if that answers your question =)
If the "using defaults" message from the Factory BIOS does NOT re-write the CMOS, I suspect that #2 and #4 WON'T fix the problem. I'm fairly certain that #1 fixes the issue.
Actually booting the OS after the CMOS reset doesn't seem to be necessary.
> > I immediately flipped back to LB, and it worked as expected.
> Worked reliably or did not crash while you were looking?
The crash _always_ occurs during initial kernel execution, before 'init' starts. It never crashes once it fully boots. I'm not sure what "reliably" vs "while looking" means. Once it goes into 'crash mode', it never fully boots.
> Can you reliably reproduce the crash? If not there's no way to tell if the problem has been fixed or merely isn't manifesting itself at that particular point in time.
> Does just rebooting with LinuxBIOS produce different results than factory(resetCMOS)->LinuxBIOS?
Rebooting with LB crashes every time, until I reset the CMOS with the Factory BIOS. This is why I think it might be a CMOS issue -- the crashing seems stateful.
> I second Richard on running memtest86, RAM problems can cause all sorts of funny things.
I'll hit the ram test ASAP. I've had other weird issues, such as the kernel taking a REALLY LONG time to initialize stuff. This is new RAM, so hopefully still under warranty.
> > I looked at the CMOS code in src/pc80/mc146818rtc.c.
> Any system that requires special data to be in CMOS or anywhere else and does not validate this data before using it is broken.
If by "system" you mean the BIOS, then I agree. AFAIK, the only setting that Linux (or any OS) uses from the CMOS is the RTC.
> If one of the OS/mainboard combinations LinuxBIOS works with requires data in CMOS I guess it could just as well be the OS' responsibility to validate/create it, but such a dependency would be kind of stupid IMHO..
> //Peter
On Wed, 26 Apr 2006, Eric Poulsen wrote:
> I'll hit the ram test ASAP. I've had other weird issues, such as the kernel taking a REALLY LONG time to initialize stuff. This is new RAM, so hopefully still under warranty.
[..] You might try replacing the CMOS battery. If its voltage is a little low it would cause the CMOS to lose a bit or two over time.
Russ
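As a rough illustration of what "losing a bit or two" could mean for a checksummed CMOS area, here is a small sketch. It assumes a plain 16-bit additive byte checksum, which may not match the formula LinuxBIOS or the factory BIOS actually uses; `byte_sum` and `undetected_single_flips` are hypothetical helper names, and the buffer is ordinary memory rather than real CMOS. Under that assumption, any single flipped bit changes the sum and is caught, while two flips that cancel each other out can slip through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit additive checksum over a byte buffer (an assumed formula). */
static uint16_t byte_sum(const uint8_t *buf, size_t len)
{
	uint16_t sum = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < len; i++)
		sum = (uint16_t)(sum + buf[i]);
	return sum;
}

/*
 * Flip every bit of buf in turn, restoring it afterwards, and count
 * the flips that leave the checksum unchanged (i.e. go undetected).
 */
static size_t undetected_single_flips(uint8_t *buf, size_t len)
{
	const uint16_t good = byte_sum(buf, len);
	size_t missed = 0;

	for (size_t i = 0; i < len; i++) {
		for (int bit = 0; bit < 8; bit++) {
			buf[i] ^= (uint8_t)(1u << bit);	/* simulate a decayed bit */
			if (byte_sum(buf, len) == good)
				missed++;
			buf[i] ^= (uint8_t)(1u << bit);	/* restore the byte */
		}
	}
	return missed;
}
```

With this kind of checksum, `undetected_single_flips` returns 0 for any buffer, since one flipped bit always moves the sum by a power of two between 1 and 128. Setting bit 0 in one byte while clearing bit 0 in another leaves the sum untouched, so a pair of flips is not guaranteed to be noticed.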
Richard Smith wrote:
> On 4/26/06, Eric Poulsen eric@zyxod.com wrote:
> > My kernel crash issue is attached below. This one isn't an Exception 6 (that I can tell), but it's one I've seen before. This happened twice
> Have you used memtest86 as a payload and verified that your RAM is solid?
> -- Richard A. Smith
Factory BIOS: Ran memtest86+ for four hours -- no errors.
Walltime  Cached  RsvdMem  MemMap     Cache  ECC  Test
          224M    76K      e820-std   on     off  std
LinuxBIOS + filo: Errors from C0000 to EFEFC
Walltime  Cached  RsvdMem  MemMap     Cache  ECC  Test
          224M    0        LinuxBIOS  on     off  std
Obviously, memtest86 isn't skipping the cmos -- I managed to pause it just as it hit C000, and looked at the error result -- AA 55 was in there. It's reading the cmos, and failing to test it, as it should.
I'm not sure if this is a real issue with LB or not ...
* Eric Poulsen eric@zyxod.com [060427 05:31]:
> Factory BIOS: Ran memtest86+ for four hours -- no errors.
> Walltime  Cached  RsvdMem  MemMap     Cache  ECC  Test
>           224M    76K      e820-std   on     off  std
> LinuxBIOS + filo: Errors from C0000 to EFEFC
> Walltime  Cached  RsvdMem  MemMap     Cache  ECC  Test
>           224M    0        LinuxBIOS  on     off  std
> Obviously, memtest86 isn't skipping the cmos -- I managed to pause it just as it hit C000, and looked at the error result -- AA 55 was in there. It's reading the cmos, and failing to test it, as it should.
Is the factory bios specially marking the C segment in the e820 table?
I doubt they mark it as overwritable ram.
Stefan
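For reference, the question of whether the C segment is marked as usable can be checked against the only memory map that actually appears in this thread: the two BIOS-e820 "usable" lines in the kernel log from the LinuxBIOS boot. The sketch below hard-codes those two entries and tests whether an address range sits entirely inside one of them. The struct and function names are made up for the example, and it assumes the second number on each BIOS-e820 line is an exclusive end address. It says nothing about which map memtest86+ itself used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;	/* first byte of the region */
	uint64_t end;	/* one past the last byte (assumed exclusive) */
};

/* The two "usable" entries copied from the BIOS-e820 lines in the log. */
static const struct mem_range usable_map[] = {
	{ 0x0000000000000be0ULL, 0x00000000000a0000ULL },
	{ 0x0000000000100000ULL, 0x000000000e000000ULL },
};

/* Return 1 if [start, end) lies entirely inside one usable entry. */
static int range_is_usable(uint64_t start, uint64_t end)
{
	size_t n = sizeof(usable_map) / sizeof(usable_map[0]);

	assert(start < end);
	for (size_t i = 0; i < n; i++) {
		if (start >= usable_map[i].start && end <= usable_map[i].end)
			return 1;
	}
	return 0;
}
```

Calling `range_is_usable(0xC0000, 0xEFEFC + 4)` returns 0 with this map, so by the kernel's view of things the range where memtest86+ reported errors is not ordinary usable RAM.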
Richard Smith wrote:
> On 4/26/06, Eric Poulsen eric@zyxod.com wrote:
> > My kernel crash issue is attached below. This one isn't an Exception 6 (that I can tell), but it's one I've seen before. This happened twice
> Have you used memtest86 as a payload and verified that your RAM is solid?
> -- Richard A. Smith
Oh yeah, one other thing.
I booted LB, got a kernel crash, rebooted and ran memtest86 in LB, got the errors, rebooted again to a kernel, and it worked -- didn't need to reset the cmos using the factory BIOS.