I've dived into the northbridge/southbridge fixup code in LB in response to Richard Smith's suggestion that I check for differences in PCI configuration space between the factory BIOS and LB boots. I've learned a lot about how PCI works, but I still have a long way to go.
Here's what I've noticed:
Some of the data in the standard header (0x00-0x3f) is different. I'm guessing this isn't a big deal, since (in theory) the kernel knows how to handle the "standard" PCI registers, and some of them (IRQ and I/O port settings) can and will legitimately differ. Is my assumption correct?
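One way to separate the expected header differences from the interesting ones is to diff the two full 256-byte dumps (for example from `lspci -xxx`, which needs root) and label each difference by region. This is a minimal C sketch. `diff_config_space` and the 0x40 boundary constant are illustrative names, not LinuxBIOS or pciutils code:

```c
#include <stdint.h>
#include <stdio.h>

/* Offsets 0x00-0x3f hold the standard type-0 PCI header (IDs, BARs,
   IRQ line, ...); 0x40-0xff are device-specific registers, which is
   where chipset fixups normally write. */
#define PCI_STD_HEADER_END 0x40
#define PCI_CFG_SIZE       256

/* Print each byte that differs between two config-space dumps (one
   taken under the factory BIOS, one under LB) and return how many of
   the differences fall in the device-specific region. */
int diff_config_space(const uint8_t factory[PCI_CFG_SIZE],
                      const uint8_t lb[PCI_CFG_SIZE])
{
    int device_specific = 0;

    for (int off = 0; off < PCI_CFG_SIZE; off++) {
        if (factory[off] == lb[off])
            continue;

        int in_header = off < PCI_STD_HEADER_END;
        printf("0x%02x: factory %02x, LB %02x (%s)\n", off,
               factory[off], lb[off],
               in_header ? "standard header" : "device-specific");
        if (!in_header)
            device_specific++;
    }
    return device_specific;
}
```

Differences flagged as device-specific are the ones a missing or wrong fixup would produce. Header differences, such as the IRQ line at 0x3c, are expected between BIOSes.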
Some of the values at higher (device-specific) offsets are also different. Some of those values are written by the PCI fixup, and many match the values seen under the factory BIOS. Oddly, some values the fixup writes differ from what lspci reports after boot, yet lspci shows essentially the same value under both BIOSes. For example, the fixup calls:
pci_write_config32(dev, 0x88, 0x00000002);
However, after boot, the configuration space under the factory BIOS and under LB reads:
Factory: 80: 0f 65 00 00 80 00 00 00 _03 00 d6 0c_ 00 00 00 00
LB:      80: 0f 65 00 00 80 00 00 00 _03 00 98 0c_ 00 00 00 00
Both dumps have 0x03 at offset 0x88 rather than the 0x02 the fixup wrote, and the upper word (0x8a-0x8b) is nonzero in both. This leads me to believe that either the chip itself changed the value or the kernel did.
Of course, there may be other LB code sections that put more/different values into the northbridge PCI config.
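To compare what the fixup wrote with what was read back, it helps to reassemble the four bytes at 0x88-0x8b into the 32-bit value that `pci_write_config32` wrote. PCI configuration space is little-endian. This is a minimal C sketch, and `config_dword` and `changed_bits` are hypothetical helpers, not LinuxBIOS functions:

```c
#include <stdint.h>

/* PCI configuration space is little-endian: the byte at the lowest
   offset is the least significant byte of a 32-bit register. */
uint32_t config_dword(const uint8_t *bytes, unsigned offset)
{
    return (uint32_t)bytes[offset]
         | (uint32_t)bytes[offset + 1] << 8
         | (uint32_t)bytes[offset + 2] << 16
         | (uint32_t)bytes[offset + 3] << 24;
}

/* Bits that differ between the value written and the value read back. */
uint32_t changed_bits(uint32_t written, uint32_t readback)
{
    return written ^ readback;
}
```

Take the 0x80 row from each dump, so that offset 0x88 is index 8. With those rows, `config_dword` gives 0x0cd60003 for the factory BIOS and 0x0c980003 for LB. Applying `changed_bits` against the written 0x00000002 shows that bit 0 plus several bits in the upper word differ, while bit 1 stayed set. Those bits may be read-only or updated by hardware, which would need checking in the VT8623 datasheet. If so, the mismatch wouldn't require any other code to have rewritten the register.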
Can anyone give me some tips on extracting the "magic fixups" from a factory BIOS? I suspect newer factory BIOS releases have come out since LB was ported to the Epia M.
Eric Poulsen wrote:
In response to the instability issues I've had with LB (it's not RAM: memtest86+ ran for hours under LB with no errors), someone here suggested that the factory BIOS sets some chipset registers that LB isn't (re)setting correctly, and that LB works well right after running the factory BIOS because those registers hold their values for a while.
I'm convinced this is the case. I have too many weird issues that can be fixed simply by switching back to the factory BIOS, powering the system on, getting a "bad CMOS" error, then immediately powering off and switching back to LB, which suddenly works again.
I'm pretty sure the Linux DMA transfer bug is showing up. This is a chipset bug specific to some Epia models that can, in theory, be fixed with a BIOS upgrade. I get hard locks with the HD light on when transferring larger files, and it's really easy to reproduce. I tried to trigger the lockup under the factory BIOS, but it wouldn't lock. After I rebooted into LB, the problem went away under LB as well. This supports the "chipset register remembrance" theory.
Here's the latest. The full serial captures are at the bottom, but here are the differences. I set all the kernel timestamps to zero so that diff would work.
Differences in LB output:
Crash:
Low Bond 00
High Bondc0
Setting DQS delay80vt8623 done

Worked:
Low Bond 00
High Bondc1
Setting DQS delay80vt8623 done
I'm no expert, but this appears to be searching for a range of usable RAM under 1M. Should the High Bond value ever differ between boots?
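These messages might instead come from a calibration sweep, though this reading is an assumption that hasn't been checked against the vt8623 raminit source. In that kind of sweep, the code tries each DQS delay setting, records the lowest and highest settings for which a memory test passes, and then programs a delay inside that window. This is a minimal C sketch of that kind of sweep. `find_delay_window` and the `passes` array are hypothetical stand-ins, not LinuxBIOS code:

```c
#include <stdbool.h>

/* Hypothetical: passes[d] is the result of a memory test run with the
   delay register set to d, for d = 0 .. count-1.  Reports the first and
   last passing settings (the "low" and "high" bounds) and returns false
   if no setting passed.  It does not check that the passing settings
   form one contiguous window. */
bool find_delay_window(const bool passes[], unsigned count,
                       unsigned *low, unsigned *high)
{
    bool found = false;

    for (unsigned d = 0; d < count; d++) {
        if (!passes[d])
            continue;
        if (!found) {
            *low = d;
            found = true;
        }
        *high = d;
    }
    return found;
}
```

Under this reading, a high bound of c0 on one boot and c1 on another would just mean the last passing setting moved by one step, which marginal timing could cause. Whether that matters would depend on how far the programmed delay (80 in both logs) sits from the edges of the window.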
Differences in kernel output:
Crash:  [00000.00000] DMI 2.2 present.
Worked: [00000.00000] DMI not present or invalid.
I'm not sure whether this is relevant.
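Roughly, the kernel's legacy DMI probe scans the 0xF0000-0xFFFFF BIOS area on 16-byte boundaries for a 15-byte "_DMI_" entry point whose bytes sum to zero. It prints "DMI x.y present." when it finds one, decoding the version from a BCD revision byte. This is a simplified C sketch of that check on a memory buffer, not the kernel's own dmi_scan.c. `find_dmi` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legacy DMI entry point: "_DMI_", checksum, table length, table
   address, structure count, BCD revision (15 bytes in total).  All 15
   bytes must sum to zero modulo 256. */
#define DMI_EP_LEN 15

static int dmi_checksum_ok(const uint8_t *p)
{
    uint8_t sum = 0;
    for (int i = 0; i < DMI_EP_LEN; i++)
        sum += p[i];
    return sum == 0;
}

/* Scan a copy of the BIOS area on 16-byte boundaries.  Returns the
   offset of the first valid entry point and stores the version as
   (major << 8) | minor, e.g. 0x0202 for 2.2; returns -1 if none. */
long find_dmi(const uint8_t *area, size_t len, unsigned *version)
{
    for (size_t off = 0; off + DMI_EP_LEN <= len; off += 16) {
        const uint8_t *p = area + off;
        if (memcmp(p, "_DMI_", 5) != 0 || !dmi_checksum_ok(p))
            continue;
        uint8_t bcd = p[14];           /* BCD revision, 0x22 = 2.2 */
        *version = ((unsigned)(bcd >> 4) << 8) | (bcd & 0x0fu);
        return (long)off;
    }
    return -1;
}
```

Seen this way, the log difference is concrete. A valid DMI 2.2 entry point was sitting in the F segment during the crashing boot and was absent during the working one, so that memory held different contents in the two boots. Whether the entry point is leftover factory BIOS data is still an open question.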
And finally, the crash itself:
[00000.00000] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[00000.00000] general protection fault: 3120 [#1]
[00000.00000] Modules linked in:
[00000.00000] CPU: 0
[00000.00000] EIP: 0060:[<c00faa2c>] Not tainted VLI
[00000.00000] EFLAGS: 00010013 (2.6.16.5 #4)
[00000.00000] EIP is at 0xc00faa2c
[00000.00000] eax: 0000b102 ebx: c13f7400 ecx: 00003123 edx: 00001106
[00000.00000] esi: 00000000 edi: c03016cc ebp: 00000000 esp: c11fff76
[00000.00000] ds: 007b es: 007b ss: 0068
[00000.00000] Process swapper (pid: 1, threadinfo=c11fe000 task=c11fba70)
[00000.00000] Stack: <0>31230000 c03016cc c00fa97c 0000b102 00001106 072c0246 0060c024 74000000
[00000.00000] 0000c13f 00000000 09fb0000 1106c024 31230000 00000000 ffba0000 ffbbc11f
[00000.00000] 0000c11f ffbc0000 ffbcc11f 1274c11f 0000c036 00000000 7d150000 06fbc035
[00000.00000] Call Trace:
[00000.00000] Code: cb 87 db b4 81 f9 c3 e8 aa 03 00 00 ba 50 43 49 20 66 b8 11 00 66 bb 10 02 f8 c3 57 66 51 66 56 66 83 fa ff 75 05 b4 83 f9 eb 53 <66> cb f9 c1 e7 10 66 8b fa e8 7d 03 00 00 8a d9 32 ff 32 ed 80
[00000.00000] <0>Kernel panic - not syncing: Attempted to kill init!
[00000.00000]
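The register dump in the oops can be read against the PCI BIOS calling convention. This assumes the kernel was calling the BIOS's PCI services when it faulted. In that convention AX=B102h is FIND PCI DEVICE, with the device ID in CX, the vendor ID in DX and the device index in SI. The Code: line also contains `ba 50 43 49 20`, which disassembles as `mov edx, 0x20494350` (ASCII "PCI "). That is the signature a PCI BIOS returns from its installation check, so this does look like PCI BIOS code. This is a small C sketch that decodes those values. The helper names are hypothetical, and it assumes the default i386 3G/1G split (PAGE_OFFSET 0xc0000000):

```c
#include <stdint.h>

/* Assumption: default i386 3G/1G split, so a lowmem kernel virtual
   address is the physical address plus 0xc0000000. */
#define KERNEL_PAGE_OFFSET 0xc0000000u

/* The registers from the oops that matter for a PCI BIOS call. */
struct pcibios_regs {
    uint32_t eax, ecx, edx, esi;
};

/* PCI BIOS functions are selected by AX, with AH = 0xB1. */
const char *pcibios_function(uint32_t eax)
{
    switch (eax & 0xffffu) {
    case 0xb101: return "PCI BIOS PRESENT";
    case 0xb102: return "FIND PCI DEVICE";
    case 0xb103: return "FIND PCI CLASS CODE";
    case 0xb10a: return "READ CONFIG DWORD";
    default:     return "unknown";
    }
}

/* FIND PCI DEVICE arguments: DX = vendor ID, CX = device ID, SI = index. */
void pcibios_find_device_args(const struct pcibios_regs *r, uint16_t *vendor,
                              uint16_t *device, uint16_t *index)
{
    *vendor = (uint16_t)(r->edx & 0xffffu);
    *device = (uint16_t)(r->ecx & 0xffffu);
    *index  = (uint16_t)(r->esi & 0xffffu);
}

/* Nonzero if a kernel virtual address maps into the 0xF0000-0xFFFFF
   system BIOS area. */
int eip_in_bios_area(uint32_t eip)
{
    uint32_t phys = eip - KERNEL_PAGE_OFFSET;
    return phys >= 0xf0000u && phys <= 0xfffffu;
}
```

With the oops values, the sketch reports FIND PCI DEVICE for vendor 0x1106 (VIA), device 0x3123 (the VT8623 northbridge's ID) and index 0. It also places EIP 0xc00faa2c at physical 0xfaa2c, inside the BIOS area. Under this reading, the kernel asked the BIOS's PCI services to find the northbridge and faulted inside that BIOS code. That would make the contents of 0xF0000-0xFFFFF worth comparing between working and crashing boots.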