I'm extracting this from a different thread hoping for more help :) Thanks Rudolf for all the help so far.
This is the last "funny" snippet from a Linux boot log with ACPI enabled:
irq 9: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27-11-generic #1
Call Trace:
 <IRQ>  [<ffffffff8029e8ab>] __report_bad_irq+0x2b/0x90
 [<ffffffff8029ea47>] note_interrupt+0x137/0x170
 [<ffffffff8029f1dd>] handle_fasteoi_irq+0xed/0x110
 [<ffffffff80215b16>] do_IRQ+0x86/0x100
 [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
 <EOI>  [<ffffffff8022d236>] ? native_safe_halt+0x6/0x10
 [<ffffffff805068ca>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8021ac35>] ? default_idle+0x55/0x60
 [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
 [<ffffffff804fe845>] ? start_secondary+0x97/0xc2
handlers:
[<ffffffff803d2b90>] (acpi_irq+0x0/0x2b)
Disabling IRQ #9
Freeing initrd memory: 8460k freed
audit: initializing netlink socket (disabled)
This IRQ is very active
9: 1 276 15 99709 IO-APIC-fasteoi acpi
Huh, quite a big number. Is it from coreboot or the legacy BIOS?
It's from Coreboot.
Here's the same line from the factory BIOS:
9: 0 0 0 0 IO-APIC-fasteoi acpi
Maybe some ACPI GP timer is generating the IRQ9?
How do you find an interrupt source that's going crazy like that? When I boot with acpi=off, IRQ9 doesn't even get registered.
Thanks, Myles
I'm extracting this from a different thread hoping for more help :)
How do you find an interrupt source that's going crazy like that? When I boot with acpi=off, IRQ9 doesn't even get registered.
This matches what's going on. The shared IRQ handler for IRQ9 calls every function that has registered via request_irq. Each such function returns IRQ_HANDLED, or IRQ_NONE when it detects that the interrupt is not its own.
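The dispatch described above can be modelled in user space. This is only a sketch: the type names mirror the kernel's, but `struct irq_line`, `deliver()` and `storm()` are made up for illustration, and the window of 100000 interrupts with a limit of 99900 unhandled ones is an assumption about how the kernel decides that "nobody cared".

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; nothing here is real kernel code. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* Assumed thresholds for the "nobody cared" check (not from the thread). */
enum { WINDOW = 100000, UNHANDLED_LIMIT = 99900 };

/* Hypothetical per-line state: the handlers sharing one IRQ, plus counters. */
struct irq_line {
    irq_handler_t handlers[4];
    size_t nhandlers;
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
    int disabled;
};

/* A handler whose device did not raise the interrupt. */
static irqreturn_t not_mine(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return IRQ_NONE;
}

/* A handler whose device did raise the interrupt. */
static irqreturn_t always_mine(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return IRQ_HANDLED;
}

/* Deliver one interrupt: every registered handler is asked in turn. */
static void deliver(struct irq_line *line, int irq)
{
    irqreturn_t ret = IRQ_NONE;
    size_t i;

    assert(line->nhandlers <= 4);
    if (line->disabled)
        return;
    for (i = 0; i < line->nhandlers; i++)
        if (line->handlers[i](irq, NULL) == IRQ_HANDLED)
            ret = IRQ_HANDLED;
    if (ret == IRQ_NONE)
        line->irqs_unhandled++;
    if (++line->irq_count < WINDOW)
        return;
    /* End of a window: give up on the line if almost nothing was claimed. */
    if (line->irqs_unhandled > UNHANDLED_LIMIT)
        line->disabled = 1; /* the "nobody cared" case */
    line->irq_count = 0;
    line->irqs_unhandled = 0;
}

/* Deliver n interrupts in a row. */
static void storm(struct irq_line *line, int irq, unsigned int n)
{
    while (n--)
        deliver(line, irq);
}
```

For what it's worth, the four per-CPU counts on the coreboot line (1 + 276 + 15 + 99709) add up to 100001, which would be consistent with the line being disabled after roughly one such window, if the assumed thresholds hold.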
To find the source, look at:
1) superIO config
2) PCI IRQ router inside the SB (it is used to route the IRQ to the 8259; it's just a slightly more complex multiplexer which decides if the IRQ goes to the APIC, the 8259, or both).
I cannot find anything about nvidia IRQ routers :/ I hate the no-docs state!
3) by observation
   a) boot the kernel with an initramfs filesystem (or initrd)
   b) mount /proc/
   c) observe if there is any activity on that IRQ
   d) if not, load some drivers for PCI devices (network etc...)
   e) or even better, try without ethernet plugged in, USB...
Or a better method is to hack the "disabling IRQ" handler and printk the interrupt counter there to see if it matches some other count.
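The observation step boils down to reading the counters for one IRQ from /proc/interrupts and comparing them over time or against other lines. A minimal sketch of the counting part, assuming lines shaped like the ones quoted earlier in this thread; `irq_total()` is a hypothetical helper, not an existing tool:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line such as
 *   "9: 1 276 15 99709 IO-APIC-fasteoi acpi"
 * Returns -1 if the line does not start with the requested IRQ number.
 */
static long irq_total(const char *line, int irq)
{
    char *end;
    long total = 0;
    long n;

    assert(line != NULL);
    n = strtol(line, &end, 10);
    if (end == line || *end != ':' || n != irq)
        return -1;
    end++; /* skip the ':' */

    for (;;) {
        const char *p = end;
        long count;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break; /* reached the controller name */
        count = strtol(p, &end, 10);
        /* A token that merely starts with digits is not a counter. */
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;
        total += count;
    }
    return total;
}
```

Calling this on the same line from two reads a second apart gives the interrupt rate, which can then be compared with the rate of the other IRQs to look for a matching source.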
Rudolf
http://www.coreboot.org/Nvidia_CK804_Porting_Notes
Maybe this will help a bit. So check the suggested registers.
Rudolf
On Tue, Mar 10, 2009 at 5:06 PM, Rudolf Marek r.marek@assembler.cz wrote:
I'm extracting this from a different thread hoping for more help :)
How do you find an interrupt source that's going crazy like that? When I boot with acpi=off, IRQ9 doesn't even get registered.
This matches what's going on. The shared IRQ handler for IRQ9 calls every function that has registered via request_irq. Each such function returns IRQ_HANDLED, or IRQ_NONE when it detects that the interrupt is not its own.
To find the source, look at:
- superIO config
I'm assuming that since I have the problem on my Tyan s2895 and s2892, and they have different superIOs that that's not the problem. What do you think?
- PCI IRQ router inside the SB (it is used to route the IRQ to the 8259; it's just a slightly more complex multiplexer which decides if the IRQ goes to the APIC, the 8259, or both).
Ah. I'm sorry I think you've tried to tell me this multiple times but I've missed it. You're saying that the IRQ is getting sent to two different IRQs and one of them has a handler, but the other doesn't?
In that case we can narrow it down because only a few IRQs have counts >= IRQ9. Right?
I cannot find anything about nvidia IRQ routers :/ I hate no-docs state!
Yes. I don't have another option right now. There aren't very many boards with multiple PCI root buses on different Opterons.
- by observation
  a) boot the kernel with an initramfs filesystem (or initrd)
  b) mount /proc/
  c) observe if there is any activity on that IRQ
  d) if not, load some drivers for PCI devices (network etc...)
  e) or even better, try without ethernet plugged in, USB...
Or a better method is to hack the "disabling IRQ" handler and printk the interrupt counter there to see if it matches some other count.
Thanks for your help!
Myles
Rudolf
I'm assuming that since I have the problem on my Tyan s2895 and s2892, and they have different superIOs that that's not the problem. What do you think?
OK then. But it is worth checking that there is no misconfiguration. The IRQ registers are in the standard places.
Ah. I'm sorry I think you've tried to tell me this multiple times but I've missed it. You're saying that the IRQ is getting sent to two different IRQs and one of them has a handler, but the other doesn't?
Yes, that's what I think. The second option is that it is something else, like some nVidia ACPI timer, but this is unlikely because we would have seen this before the changes.
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
Rudolf
On Wed, Mar 11, 2009 at 11:29 AM, Rudolf Marek r.marek@assembler.cz wrote:
I'm assuming that since I have the problem on my Tyan s2895 and s2892, and they have different superIOs that that's not the problem. What do you think?
OK then. But it is worth checking that there is no misconfiguration. The IRQ registers are in the standard places.
OK. I'll compare superiotool output from factory and Coreboot.
Ah. I'm sorry I think you've tried to tell me this multiple times but I've missed it. You're saying that the IRQ is getting sent to two different IRQs and one of them has a handler, but the other doesn't?
Yes, that's what I think. The second option is that it is something else, like some nVidia ACPI timer, but this is unlikely because we would have seen this before the changes.
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
Sure. I'll try that too.
Thanks, Myles
On Wed, Mar 11, 2009 at 11:32 AM, Myles Watson mylesgw@gmail.com wrote:
On Wed, Mar 11, 2009 at 11:29 AM, Rudolf Marek r.marek@assembler.cz wrote:
I'm assuming that since I have the problem on my Tyan s2895 and s2892, and they have different superIOs that that's not the problem. What do you think?
OK then. But it is worth checking that there is no misconfiguration. The IRQ registers are in the standard places.
OK. I'll compare superiotool output from factory and Coreboot.
Ah. I'm sorry I think you've tried to tell me this multiple times but I've missed it. You're saying that the IRQ is getting sent to two different IRQs and one of them has a handler, but the other doesn't?
I booted again, and IRQ9 has 10x the interrupts as any other source. I guess that means it's not a shared one?
Yes, that's what I think. The second option is that it is something else, like some nVidia ACPI timer, but this is unlikely because we would have seen this before the changes.
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
Sure. I'll try that too.
Won't I have to add IRQ 9 to the mptable? Will it find it otherwise?
Thanks, Myles
I booted again, and IRQ9 has 10x the interrupts as any other source. I guess that means it's not a shared one?
Hmm, and the MPtable has no entries for this IRQ? So it must be something else. It makes me wonder what changes trigger the IRQ9 storm. Perhaps the kernel would complain even when there is no handler at all.
Any ideas what it could be? Maybe booting with irqroute=pic or something like this could force the old way of PCI routing... but leaving IRQ9 for ACPI. Should be an interesting test.
Maybe you have something wrong in the FADT? Don't know.
Won't I have to add IRQ 9 to the mptable? Will it find it otherwise?
If IRQ 9 is not used then it must be something else. Does the old coreboot have MPtable entries >15? Or write a simple kernel driver that just requests IRQ9 and returns IRQ_NONE, and run it with the old coreboot. See if the IRQ is busy there too.
Rudolf
On Wed, Mar 11, 2009 at 5:20 PM, Rudolf Marek r.marek@assembler.cz wrote:
I booted again, and IRQ9 has 10x the interrupts as any other source. I guess that means it's not a shared one?
Hmm, and the MPtable has no entries for this IRQ? So it must be something else. It makes me wonder what changes trigger the IRQ9 storm. Perhaps the kernel would complain even when there is no handler at all.
I think it says Spurious IRQ then.
Any ideas what it could be? Maybe booting with irqroute=pic or something like this could force the old way of PCI routing... but leaving IRQ9 for ACPI. Should be an interesting test.
I'll try it.
Maybe you have something wrong in the FADT? Don't know.
Could be. I've tried matching the factory and yours.
Won't I have to add IRQ 9 to the mptable? Will it find it otherwise?
If IRQ 9 is not used then it must be something else. Does the old coreboot have MPtable entries >15?
Yes. 21,22,23... I copied the interrupts for the DSDT from the MPtable.
Or write a simple kernel driver that just requests IRQ9 and returns IRQ_NONE, and run it with the old coreboot. See if the IRQ is busy there too.
I'll look around, someone must have already written one.
Thanks, Myles
On Wed, Mar 11, 2009 at 11:29 AM, Rudolf Marek r.marek@assembler.cz wrote:
I'm assuming that since I have the problem on my Tyan s2895 and s2892, and they have different superIOs that that's not the problem. What do you think?
OK then. But it is worth checking that there is no misconfiguration. The IRQ registers are in the standard places.
Ah. I'm sorry I think you've tried to tell me this multiple times but I've missed it. You're saying that the IRQ is getting sent to two different IRQs and one of them has a handler, but the other doesn't?
Yes, that's what I think. The second option is that it is something else, like some nVidia ACPI timer, but this is unlikely because we would have seen this before the changes.
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
With coreboot without ACPI tables no one listens on IRQ 9. With or without ACPI tables I get a spurious IRQ 7.
These are interesting things from the boot log:
Nvidia board detected. Ignoring ACPI timer override.
If you got timer trouble try acpi_use_timer_override
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: TYAN
MPTABLE: Product ID: S2895
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
Processor #1
Processor #2
Processor #3
I/O APIC #4 Version 17 at 0xF5200000.
I/O APIC #5 Version 17 at 0xFC200000.
I/O APIC #6 Version 17 at 0xFC201000.
I/O APIC #7 Version 17 at 0xFC100000.
Setting APIC routing to flat
...
spurious 8259A interrupt: IRQ7.
...
ExtINT in hardware and MP table differ
...
PCI: Discovered primary peer bus 80 [IRQ]
pci 0000:00:09.0: default IRQ router [10de/005c]
pci 0000:00:01.1: PCI->APIC IRQ transform: INT A -> IRQ 10
pci 0000:00:02.0: PCI->APIC IRQ transform: INT A -> IRQ 21
pci 0000:00:02.1: PCI->APIC IRQ transform: INT B -> IRQ 20
pci 0000:00:04.0: PCI->APIC IRQ transform: INT A -> IRQ 20
pci 0000:00:07.0: PCI->APIC IRQ transform: INT A -> IRQ 23
pci 0000:00:08.0: PCI->APIC IRQ transform: INT A -> IRQ 22
pci 0000:00:0a.0: PCI->APIC IRQ transform: INT A -> IRQ 21
pci 0000:01:05.0: PCI->APIC IRQ transform: INT A -> IRQ 19
pci 0000:02:00.0: PCI->APIC IRQ transform: INT A -> IRQ 18
pci 0000:80:0a.0: PCI->APIC IRQ transform: INT A -> IRQ 53
pci 0000:81:00.0: PCI->APIC IRQ transform: INT A -> IRQ 50
pci 0000:81:00.1: PCI->APIC IRQ transform: INT B -> IRQ 51
Thanks, Myles
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
With coreboot without ACPI tables no one listens on IRQ 9. With or without ACPI tables I get a spurious IRQ 7.
Hmm OK. Maybe there is some glitch on int line?
These are interesting things from the boot log:
Nvidia board detected. Ignoring ACPI timer override. If you got timer trouble try acpi_use_timer_override Intel MultiProcessor Specification v1.4
Hmm, this is because the reference BIOS has some timer override bug. It's OK (the override when there is HPET).
ExtINT in hardware and MP table differ
This means that the pin for the 8259 IRQ (the ExtINT) is configured somewhere other than where the MPTABLE says it is. It may be some other bug?
This does not move us any further. It makes me wonder why the unchanged coreboot is not having the IRQ9 issue.
The only explanation could be that with ACPI the IRQ9 interrupt is in use and set to level, active low, but without ACPI it is unused and set to edge trigger. Maybe changing the ELCR (0x4d0/0x4d1) in unmodified coreboot from edge to level may expose the IRQ9 problem again.
Can you dump 0x4d0 and 0x4d1 here so we can check how the edge/level trigger is set?
Rudolf
On Sun, Mar 15, 2009 at 1:43 PM, Rudolf Marek r.marek@assembler.cz wrote:
Maybe you can boot orig coreboot sources and see what device listens on IRQ 9 ;)
With coreboot without ACPI tables no one listens on IRQ 9. With or without ACPI tables I get a spurious IRQ 7.
Hmm OK. Maybe there is some glitch on int line?
These are interesting things from the boot log:
Nvidia board detected. Ignoring ACPI timer override. If you got timer trouble try acpi_use_timer_override Intel MultiProcessor Specification v1.4
Hmm, this is because the reference BIOS has some timer override bug. It's OK (the override when there is HPET).
ExtINT in hardware and MP table differ
This means that the pin for the 8259 IRQ (the ExtINT) is configured somewhere other than where the MPTABLE says it is. It may be some other bug?
This does not move us any further. It makes me wonder why the unchanged coreboot is not having the IRQ9 issue.
The only explanation could be that with ACPI the IRQ9 interrupt is in use and set to level, active low, but without ACPI it is unused and set to edge trigger. Maybe changing the ELCR (0x4d0/0x4d1) in unmodified coreboot from edge to level may expose the IRQ9 problem again.
All of the other interrupts from 0x0-0xf are set as Edge triggered in the mptable, so that seems likely.
Can you dump 0x4d0 and 0x4d1 here so we can check how the edge/level trigger is set?
Again I'm ignorant. How do I dump those values?
Thanks, Myles
Hi,
All of the other interrupts from 0x0-0xf are set as Edge triggered in the mptable, so that seems likely.
Fine. A second reason might be that the ExtINT pin is wrong for the APIC, I mean that it is set wrongly. To which pin does the original BIOS set the ExtINT IRQ?
Can you dump 0x4d0 and 0x4d1 here so we can check how the edge/level trigger is set?
Again I'm ignorant. How do I dump those values?
You aren't; it's important to ask!
#isadump -f 0x4d0
You should see the two bytes at 0x4d0 and 0x4d1.
Bit 0 at 0x4d0 is IRQ0, bit 7 is IRQ7; 0x4d1 covers IRQ8 to IRQ15 the same way.
0 is edge, 1 is level.
This is set by Linux.
Rudolf
On Mon, Mar 16, 2009 at 1:23 PM, Rudolf Marek r.marek@assembler.cz wrote:
Hi,
All of the other interrupts from 0x0-0xf are set as Edge triggered in the mptable, so that seems likely.
Fine. A second reason might be that the ExtINT pin is wrong for the APIC, I mean that it is set wrongly. To which pin does the original BIOS set the ExtINT IRQ?
Do I need to decode the mptable to know that? Is there a utility to do that? I've gotten as far as "ISA is bus 86", but I'm struggling to find ExtINT in there. I don't see anything that matches what I expect.
Can you dump 0x4d0 and 0x4d1 here so we can check how the edge/level trigger is set?
Again I'm ignorant. How do I dump those values?
You aren't; it's important to ask!
#isadump -f 0x4d0
You should see the two bytes at 0x4d0 and 0x4d1.
Bit 0 at 0x4d0 is IRQ0, bit 7 is IRQ7; 0x4d1 covers IRQ8 to IRQ15 the same way.
0 is edge, 1 is level.
04d0: 20 0c
IRQ 7, IRQ a, IRQ b are edge?
Thanks, Myles
Myles Watson wrote:
On Mon, Mar 16, 2009 at 1:23 PM, Rudolf Marek r.marek@assembler.cz wrote:
Hi,
All of the other interrupts from 0x0-0xf are set as Edge triggered in the mptable, so that seems likely.
Fine. A second reason might be that the ExtINT pin is wrong for the APIC, I mean that it is set wrongly. To which pin does the original BIOS set the ExtINT IRQ?
Do I need to decode the mptable to know that? Is there a utility to do that? I've gotten as far as "ISA is bus 86", but I'm struggling to find ExtINT in there. I don't see anything that matches what I expect.
It should be interrupt type 3. Maybe you can dump the table somewhere?
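To make the search for that entry concrete, here is a rough sketch of walking the MP table entries by hand. It assumes the layout of the MP specification as I understand it (processor entries are 20 bytes, all other base entries 8 bytes, and an I/O interrupt entry carries the interrupt type in byte 1 and the destination INTIN# in byte 7); `mp_find_extint_pin()` and the sample bytes are hypothetical and are not taken from the attached hexdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MP_PROCESSOR = 0, MP_BUS = 1, MP_IOAPIC = 2, MP_IOINT = 3, MP_LINT = 4 };
enum { MP_INT = 0, MP_NMI = 1, MP_SMI = 2, MP_EXTINT = 3 };

/*
 * Walk the entries that follow the MP configuration table header and
 * return the destination I/O APIC pin of the first ExtINT I/O interrupt
 * entry, or -1 if there is none (or the buffer looks malformed).
 */
static int mp_find_extint_pin(const uint8_t *p, size_t len)
{
    size_t off = 0;

    assert(p != NULL);
    while (off < len) {
        uint8_t type = p[off];
        size_t size = (type == MP_PROCESSOR) ? 20 : 8;

        if (type > MP_LINT || off + size > len)
            return -1;
        /* byte 1: interrupt type, byte 7: destination INTIN# */
        if (type == MP_IOINT && p[off + 1] == MP_EXTINT)
            return p[off + 7];
        off += size;
    }
    return -1;
}

/* Hypothetical sample: one ISA bus entry and two I/O interrupt entries. */
static const uint8_t sample_entries[] = {
    MP_BUS,   86, 'I', 'S', 'A', ' ', ' ', ' ',   /* bus ID 86, type "ISA" */
    MP_IOINT, MP_EXTINT, 0, 0, 86, 0, 4, 0,       /* ExtINT -> APIC 4, pin 0 */
    MP_IOINT, MP_INT,    0, 0, 86, 1, 4, 1,       /* IRQ1   -> APIC 4, pin 1 */
};
```

With this layout an ExtINT entry starts with the bytes 03 03, and the bus ID only shows up in the fifth byte, after the two flag bytes.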
mp_ExtINT: looking into the mptable for the 2895, it seems that ExtINT is not pin0 but pin2!
Maybe you can change the code here:
--- ck804_lpc.c (revision 3613)
+++ ck804_lpc.c (working copy)
@@ -43,9 +43,9 @@
 #define INT (1 << 8)
 /* IO-APIC virtual wire mode configuration */
 /* mask, trigger, polarity, destination, delivery, vector */
-	{ 0, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE},
+	{ 0, DISABLED, NONE},
 	{ 1, DISABLED, NONE},
-	{ 2, DISABLED, NONE},
+	{ 2, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE},
 	{ 3, DISABLED, NONE},
 	{ 4, DISABLED, NONE},
 	{ 5, DISABLED, NONE},
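To see what the change does at the bit level, the sketch below encodes the same table entries and finds the pin that ends up in ExtINT mode. The macro names follow the diff, but their values are my assumption based on the usual I/O APIC redirection-entry layout (mask in bit 16, trigger in bit 15, polarity in bit 13, destination mode in bit 11, delivery mode in bits 8 to 10); `struct ioapicreg` and `extint_pin()` are hypothetical here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values for the low word of a redirection entry. */
#define NONE          0u
#define DISABLED      (1u << 16) /* mask bit set */
#define ENABLED       (0u << 16)
#define TRIGGER_EDGE  (0u << 15)
#define TRIGGER_LEVEL (1u << 15)
#define POLARITY_HIGH (0u << 13)
#define POLARITY_LOW  (1u << 13)
#define PHYSICAL_DEST (0u << 11)
#define LOGICAL_DEST  (1u << 11)
#define ExtINT        (7u << 8)  /* delivery mode 111b */
#define DELIVERY_MASK (7u << 8)  /* hypothetical helper, not in the diff */

struct ioapicreg {
    unsigned int reg;    /* pin number */
    uint32_t value_low;  /* mask, trigger, polarity, destination mode, delivery */
    uint32_t value_high; /* destination */
};

/* Virtual wire configuration before the change: ExtINT on pin 0. */
static const struct ioapicreg before[] = {
    { 0, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE },
    { 1, DISABLED, NONE },
    { 2, DISABLED, NONE },
    { 3, DISABLED, NONE },
};

/* After the change: ExtINT moved to pin 2. */
static const struct ioapicreg after[] = {
    { 0, DISABLED, NONE },
    { 1, DISABLED, NONE },
    { 2, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE },
    { 3, DISABLED, NONE },
};

/* Return the first unmasked pin programmed for ExtINT delivery, or -1. */
static int extint_pin(const struct ioapicreg *tbl, size_t n)
{
    size_t i;

    assert(tbl != NULL);
    for (i = 0; i < n; i++) {
        uint32_t low = tbl[i].value_low;

        if (!(low & DISABLED) && (low & DELIVERY_MASK) == ExtINT)
            return (int)tbl[i].reg;
    }
    return -1;
}
```

Under these assumptions the enabled ExtINT entry is just 0x700 in the low word, and the only difference between the two tables is which pin carries it.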
04d0: 20 0c
IRQ 7, IRQ a, IRQ b are edge?
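Nope, the set bits are IRQ5 (0x20 in 0x4d0) and IRQ10 and IRQ11 (0x0c in 0x4d1). A set ELCR bit means level, so those three are level triggered and the rest are edge.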
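The decoding of the two bytes can be double-checked with a few lines of C. This sketch assumes the conventional ELCR semantics, where a set bit selects level triggering and a clear bit selects edge; `elcr_is_level()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the two ELCR bytes read from ports 0x4d0 (IRQ0-7) and
 * 0x4d1 (IRQ8-15). Returns 1 if the given IRQ is level triggered and
 * 0 if it is edge triggered, assuming a set bit means level.
 */
static int elcr_is_level(uint8_t elcr_4d0, uint8_t elcr_4d1, unsigned int irq)
{
    uint16_t elcr = (uint16_t)(elcr_4d0 | ((unsigned int)elcr_4d1 << 8));

    assert(irq < 16);
    return (elcr >> irq) & 1;
}
```

With the dump "04d0: 20 0c" this marks IRQ5, IRQ10 and IRQ11, and the bit for IRQ9 is clear. Whether that matters for the IRQ9 storm once the interrupt is routed through the I/O APIC is not something this decoding alone can tell.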
But I think that Linux trusts the APIC configuration more than the MPTable, so after you fix it, it may actually start to work.
Rudolf
On Mon, Mar 16, 2009 at 2:36 PM, Rudolf Marek r.marek@assembler.cz wrote:
Myles Watson wrote:
On Mon, Mar 16, 2009 at 1:23 PM, Rudolf Marek r.marek@assembler.cz wrote:
Hi,
All of the other interrupts from 0x0-0xf are set as Edge triggered in the mptable, so that seems likely.
Fine. A second reason might be that the ExtINT pin is wrong for the APIC, I mean that it is set wrongly. To which pin does the original BIOS set the ExtINT IRQ?
Do I need to decode the mptable to know that? Is there a utility to do that? I've gotten as far as "ISA is bus 86", but I'm struggling to find ExtINT in there. I don't see anything that matches what I expect.
It should be interrupt type 3. Maybe you can dump the table somewhere?
That's what I was looking for. I thought 03 03 .... 86, but I didn't see anything like that.
I'm attaching the hexdump.
mp_ExtINT: looking into the mptable for the 2895, it seems that ExtINT is not pin0 but pin2!
Maybe you can change the code here:
--- ck804_lpc.c (revision 3613)
+++ ck804_lpc.c (working copy)
@@ -43,9 +43,9 @@
 #define INT (1 << 8)
 /* IO-APIC virtual wire mode configuration */
 /* mask, trigger, polarity, destination, delivery, vector */
-	{ 0, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE},
+	{ 0, DISABLED, NONE},
 	{ 1, DISABLED, NONE},
-	{ 2, DISABLED, NONE},
+	{ 2, ENABLED | TRIGGER_EDGE | POLARITY_HIGH | PHYSICAL_DEST | ExtINT, NONE},
 	{ 3, DISABLED, NONE},
 	{ 4, DISABLED, NONE},
 	{ 5, DISABLED, NONE},
I'll try it.
04d0: 20 0c
IRQ 7, IRQ a, IRQ b are edge?
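Nope, the set bits are IRQ5 (0x20 in 0x4d0) and IRQ10 and IRQ11 (0x0c in 0x4d1). A set ELCR bit means level, so those three are level triggered and the rest are edge.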
Right.
But I think that Linux trusts the APIC configuration more than the MPTable, so after you fix it, it may actually start to work.
Thanks, Myles
I'm attaching the hexdump.
Could you please use the util/.... something for dumping the mptable? It would be easier for me to verify that. On the other hand here it seems that the ExtINT is in fact pin0 :/ But it may be wrong too.
Rudolf