[LinuxBIOS] Need: PCI FIXUP HOWTO (was: LB vs Factory BIOS -- more weirdness)

Mon May 8 17:49:20 CEST 2006

I've dived into the northbridge/southbridge fixup code in LB, in 
response to Richard Smith's suggestion that I check for differences in 
the PCI configuration space between the Factory and LB boots.  I've 
learned a lot about how PCI works, but I've got a long way to go.

Here's what I've noticed:

Some of the data in 0x00 through 0x3f (the standard header) is 
different.  I'm guessing this isn't a big deal, since (in theory) the 
kernel knows how to deal with the "standard" PCI registers, and some of 
them (IRQ/IO port settings) can/will be different.  Is my assumption 
correct?
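
For anyone who wants to repeat the comparison, here's a minimal sketch of 
how two dumps could be diffed byte by byte, with the differences split 
into the standard header (offsets below 0x40) and the device-specific 
area (0x40 and up).  The names cfg_diff, cfg_diff_count and the two 
macros are made up for this example and are not anything in LB; it also 
assumes each dump has already been parsed into a 256-byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_CFG_SIZE       256   /* conventional PCI config space size */
#define PCI_STD_HEADER_END 0x40  /* offsets below this are the standard header */

/* Hypothetical result type: how many bytes differ in each region. */
struct cfg_diff_count {
    unsigned header;           /* differences in 0x00..0x3f */
    unsigned device_specific;  /* differences in 0x40..0xff */
};

/* Compare two config space dumps byte by byte.  If `out` is non-NULL,
 * print one line per differing offset.  Returns the per-region counts. */
static struct cfg_diff_count cfg_diff(const uint8_t *a, const uint8_t *b,
                                      FILE *out)
{
    struct cfg_diff_count n = { 0, 0 };

    assert(a != NULL && b != NULL);

    for (unsigned off = 0; off < PCI_CFG_SIZE; off++) {
        if (a[off] == b[off])
            continue;

        int in_header = off < PCI_STD_HEADER_END;
        if (out)
            fprintf(out, "%02x: %02x -> %02x (%s)\n", off, a[off], b[off],
                    in_header ? "standard header" : "device-specific");

        if (in_header)
            n.header++;
        else
            n.device_specific++;
    }
    return n;
}
```

Calling cfg_diff(factory, lb, stdout) on the two arrays would list every 
differing offset, so the header differences can be set aside and the 
device-specific ones looked at first.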

Some of the values in the higher addresses are different.  Some of the 
values are placed there by the PCI fixup, and many match the values 
found when running the factory BIOS.  Oddly, some of the values placed 
by the fixup are different, but the 'lspci' output is the same.  For 
example, the fixup calls:

pci_write_config32(dev, 0x88, 0x00000002);

However, the value in the configuration space for LB and Factory after boot is:

Factory:  80: 0f 65 00 00 80 00 00 00 _03 00 d6 0c_ 00 00 00 00
     LB:  80: 0f 65 00 00 80 00 00 00 _03 00 98 0c_ 00 00 00 00

There's a 0x03 byte at offset 0x88 (not the 0x02 that was written), and 
the upper word is NOT zero in either dump.  This leads me to believe 
that either the chip itself changed the value, or the kernel did.
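
To make the byte order explicit, here's a small sketch that reassembles 
the 32-bit register at 0x88 from the dump rows.  It assumes the dumps 
list bytes in ascending address order (as 'lspci -x' does) and that 
config registers are little-endian; dump_read32 and the two array names 
are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the little-endian 32-bit register at config offset `reg` from
 * one 16-byte dump row whose first byte sits at config offset `row_base`. */
static uint32_t dump_read32(const uint8_t row[16], unsigned row_base,
                            unsigned reg)
{
    /* The whole register must lie inside this row. */
    assert(reg >= row_base && reg + 4 <= row_base + 16);

    const uint8_t *p = row + (reg - row_base);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The 0x80 rows copied from the Factory and LB dumps. */
static const uint8_t factory_row80[16] = {
    0x0f, 0x65, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00,
    0x03, 0x00, 0xd6, 0x0c, 0x00, 0x00, 0x00, 0x00
};
static const uint8_t lb_row80[16] = {
    0x0f, 0x65, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x98, 0x0c, 0x00, 0x00, 0x00, 0x00
};
```

Read this way, dump_read32(factory_row80, 0x80, 0x88) gives 0x0cd60003 
and dump_read32(lb_row80, 0x80, 0x88) gives 0x0c980003, against the 
0x00000002 that the fixup writes.  The two boots differ only in bits 
16-23 (0xd6 vs 0x98).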

Of course, there may be other LB code sections that put more/different 
values into the northbridge PCI config.

Is there anyone who can give me some tips on extracting the "magic 
fixups" from a factory BIOS?  I know that there are probably newer 
factory BIOSes since LB was ported to the Epia M.

Eric Poulsen wrote:
> In response to the instability issues I've had with LB (it's not RAM -- 
> memtest86+ ran for hours under LB with no errors), someone here 
> suggested that there are chipset registers that are reset by the factory 
> BIOS that LB isn't (re)setting correctly, and that LB works well right 
> after using the factory BIOS because those registers hold their values 
> for a while.
>
> I'm convinced this is the case -- I have too many weird issues that can 
> be fixed by simply flipping back to the factory BIOS, turning the system 
> on, getting a "bad CMOS" error, then immediately powering off and 
> switching to LB, which suddenly works again.
>
> I'm pretty sure the Linux DMA transfer bug (a chipset bug specific to 
> some Epia models that can, in theory, be fixed with a BIOS upgrade) is 
> showing up -- I get hard locks with the HD light on when transferring 
> larger files.  It's really easy to reproduce.  I tried to lock it up 
> under the factory BIOS, but it wouldn't lock.  After I rebooted using 
> LB, the problem went away in LB as well.  This tends to support the 
> "chipset register remembrance" theory.
>
> Here's the latest.  The full serial captures are at the bottom, but 
> here are the differences.  I set all the kernel timestamps to zero so 
> that diff would work.
>
> Differences in LB output:
>
>  Crash: Low Bond 00  High Bondc0  Setting DQS delay80vt8623 done
> Worked: Low Bond 00  High Bondc1  Setting DQS delay80vt8623 done
>
> I'm no expert, but this appears to be looking for a range of usable RAM 
> under 1M.  Should the High Bond values be different, ever?
>
> Differences in kernel output:
>  Crash: [00000.00000] DMI 2.2 present.
> Worked: [00000.00000] DMI not present or invalid.
>
> I'm not sure if this is relevant or not ...
>
>
> And finally, the crash itself:
>
>   [00000.00000] PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
>   [00000.00000] general protection fault: 3120 [#1]
>   [00000.00000] Modules linked in:
>   [00000.00000] CPU:    0
>   [00000.00000] EIP:    0060:[<c00faa2c>]    Not tainted VLI
>   [00000.00000] EFLAGS: 00010013   (2.6.16.5 #4)
>   [00000.00000] EIP is at 0xc00faa2c
>   [00000.00000] eax: 0000b102   ebx: c13f7400   ecx: 00003123   edx: 00001106
>   [00000.00000] esi: 00000000   edi: c03016cc   ebp: 00000000   esp: c11fff76
>   [00000.00000] ds: 007b   es: 007b   ss: 0068
>   [00000.00000] Process swapper (pid: 1, threadinfo=c11fe000 task=c11fba70)
>   [00000.00000] Stack: <0>31230000 c03016cc c00fa97c 0000b102 00001106 072c0246 0060c024 74000000
>   [00000.00000]        0000c13f 00000000 09fb0000 1106c024 31230000 00000000 ffba0000 ffbbc11f
>   [00000.00000]        0000c11f ffbc0000 ffbcc11f 1274c11f 0000c036 00000000 7d150000 06fbc035
>   [00000.00000] Call Trace:
>   [00000.00000] Code: cb 87 db b4 81 f9 c3 e8 aa 03 00 00 ba 50 43 49 20 66 b8 11 00 66 bb 10 02 f8 c3 57 66 51 66 56 66 83 fa ff 75 05 b4 83 f9 eb 53 <66> cb f9 c1 e7 10 66 8b fa e8 7d 03 00 00 8a d9 32 ff 32 ed 80
>   [00000.00000]  <0>Kernel panic - not syncing: Attempted to kill init!
>   [00000.00000]
>