I have been on other things for a while but had a chance to look at this again.
Recap: on the smartcore P3, etherboot 5.0.7 fails because all inw() operations return 0. Two inw()s in a row sometimes return the right value on the second one.
I thought this might be a memory configuration problem, so I brought down memtest 3.0 and am now running it. Unfortunately, it runs just fine: judging by the memtest results, at least, the memory configuration on this machine is correct. I'm up to test 4 after 7 minutes, and memory problems, if they existed, would usually have shown up by now.
So, back to the original issue: inw operations acting wrong. The first inw() always reads 0; the second reads what looks like the right value. Anybody have an idea what kind of north/south bridge configuration problem could make this happen?
ron
On Mon, 2002-09-23 at 12:34, Ronald G Minnich wrote:
So, back to the original issue: inw operations acting wrong. The first inw() always reads 0; the second reads what looks like the right value. Anybody have an idea what kind of north/south bridge configuration problem could make this happen?
ron
Slow (bad) hardware? I've seen that happen on a PCI device at 80 deg. C, and I've seen it happen on a device which was only specced up to 28 MHz at 3.3 V.
On 23 Sep 2002, Christopher Stutts wrote:
Slow (bad) hardware? I've seen that happen on a PCI device at 80 deg. C, and I've seen it happen on a device which was only specced up to 28 MHz at 3.3 V.
maybe. But, here is the thing: serial I/O has been working on memtest for 3.5 hours. The ethernet hardware drives packets out just fine -- I receive them and respond to them. outb/w/l works. inw() does not.
This is really weird.
ron
So, back to the original issue: inw operations acting wrong. The first inw() always reads 0; the second reads what looks like the right value. Anybody have an idea what kind of north/south bridge configuration problem could make this happen?
is this all ports, just bridge registers, or external I/O (ISA or PCI)? I agree that it sounds like a timing problem, but it depends on which I/O ports.
-Steve
On Mon, 23 Sep 2002, Steve M. Gehlbach wrote:
is this all ports, just bridge registers, or external I/O (ISA or PCI)? I agree that it sounds like a timing problem, but it depends on which I/O ports.
it is PCI ports. I don't seem to be able to inw() from any PCI ports but outw is fine.
I'm going to check the northbridge manual for hints. This is the first 440bx we have seen this problem on.
ron
Hello again from Gregg C Levine
"northbridge manual"? You've lost me there, Ron. Please explain. I know (I think) which part of the PCI layout is which, but that reference eludes me.
-------------------
Gregg C Levine hansolofalcon@worldnet.att.net
"The Force will be with you...Always." Obi-Wan Kenobi
"Use the Force, Luke." Obi-Wan Kenobi
(This company dedicates this E-Mail to General Obi-Wan Kenobi )
(This company dedicates this E-Mail to Master Yoda )
-----Original Message----- From: linuxbios-admin@clustermatic.org [mailto:linuxbios- admin@clustermatic.org] On Behalf Of Ronald G Minnich Sent: Monday, September 23, 2002 6:07 PM To: Steve M. Gehlbach Cc: linuxbios@clustermatic.org Subject: RE: more news on the smartcore P3 and etherboot failures.
On Mon, 23 Sep 2002, Steve M. Gehlbach wrote:
is this all ports, just bridge registers, or external I/O (ISA or PCI)? I agree that it sounds like a timing problem, but it depends on which I/O ports.
it is PCI ports. I don't seem to be able to inw() from any PCI ports but outw is fine.
I'm going to check the northbridge manual for hints. This is the first 440bx we have seen this problem on.
ron
Linuxbios mailing list Linuxbios@clustermatic.org http://www.clustermatic.org/mailman/listinfo/linuxbios
On Tue, 24 Sep 2002, Gregg C Levine wrote:
"northbridge manual"? You've lost me there, Ron. Please explain. I know, (I think), which part of the PCI layout, is which, but that reference eludes me.
northbridge is the common term for "that big buggy chip which connects CPUs to memory and PCI bus"
ron
Hello from Gregg C Levine
Okay. That I will agree on. Indeed there are some bugs inside that thing. However, I was referring to the term you used, "northbridge manual". Is there actually a manual for the particular chipsets you were discussing or insulting in this thread? If it's a 440 family part, I think I know where to find it.
-------------------
Gregg C Levine hansolofalcon@worldnet.att.net
-----Original Message----- From: linuxbios-admin@clustermatic.org [mailto:linuxbios- admin@clustermatic.org] On Behalf Of Ronald G Minnich Sent: Tuesday, September 24, 2002 10:18 AM To: Gregg C Levine Cc: Linuxbios Subject: RE: more news on the smartcore P3 and etherboot failures.
On Tue, 24 Sep 2002, Gregg C Levine wrote:
"northbridge manual"? You've lost me there, Ron. Please explain. I know, (I think), which part of the PCI layout, is which, but that reference eludes me.
northbridge is the common term for "that big buggy chip which connects CPUs to memory and PCI bus"
ron
I have the manual, but found nothing useful for this particular problem.
ron
I have an interesting problem with the pcchips m787cl+ motherboard: it won't do a reboot. If you use the three finger salute or issue an init 6, it hangs at the "Restarting system" message at the bottom of the shutdown cycle.
I traced the code to the machine_restart sub in arch/i386/kernel/process.c. It seems that Linux tries to reset using a keyboard command (out 0xfe,0x64), and if that fails it forces a triple fault. Neither seems to work. I tried setting b0 of reg 0x46 in the ISA bridge (-d 1039:8) on the sis630e, which is labeled "Enable Keyboard Hardware Reset", but it didn't work. So I put in a patch to process.c that sets b6,7 of reg 0x46 of the ISA bridge, and this works: it causes a reset (per the data sheet). But of course that is a kernel patch, and I would like to avoid that.
Anyone run into this issue before, or know if I should blame the sis630 or the via C3? Seems like a poor design if you can't reboot without a kernel patch, although it is a simple one. Hanging on a reboot is a problem for my embedded system, not sure about large clusters but it seems like that would be a problem. For my project, I can live with the patch, but would like to find a more robust solution for the linuxbios project.
The three finger salute does work with the orig BIOS and DOS, but I suspect that is because DOS makes a BIOS call that sets the correct bits in the ISA bridge. Linux with the orig BIOS hangs at the same place, though.
-Steve
On Mon, 23 Sep 2002, Steve M. Gehlbach wrote:
I traced the code to the machine_restart sub in arch/i386/kernel/process.c. It seems that Linux tries to reset using a keyboard command (out 0xfe,0x64), and if that fails it forces a triple fault. Neither seems to work. I tried setting b0 of reg 0x46 in the ISA bridge (-d 1039:8) on the sis630e, which is labeled "Enable Keyboard Hardware Reset", but it didn't work. So I put in a patch to process.c that sets b6,7 of reg 0x46 of the ISA bridge, and this works: it causes a reset (per the data sheet). But of course that is a kernel patch, and I would like to avoid that.
ah yes. Linux restart. It never actually worked for me. In fact I'm dubious that on most modern machines it ever worked at all. I learned that once while tracing Linux restart with an ICE. Lots of careful code in there, but the thing that makes reset happen is ... the triple fault. It's funny in a way.
What Linux seems to be moving toward for reset is to set the watchdog timer and then just sit there, tapping its toes, until the timer expires and resets the machine. If we can get WDT support for the 630 hardware in, that would be the way to do it.
Anyone run into this issue before, or know if I should blame the sis630 or the via C3? Seems like a poor design if you can't reboot without a kernel patch, although it is a simple one.
The mistake here, in my view, is Intel's. At one time, microprocessors had a reset instruction that would reset the machine. For some reason x86 boxes never had it -- they relied on magic external stuff. I have never figured out why Intel did this. Maybe to save a pin? Not sure.
So we have a zillion ways to reset the machine that are in essence "side effects". You do this strange thing and somehow a reset happens. I find it very weird.
I think we need to get WDT support in and make it work that way. That seems to be the approved method nowadays, and there is support for lots of chips in the Linux WDT code.
ron
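To make the watchdog approach concrete, here is a minimal user-space sketch. It assumes the present-day Linux watchdog character-device interface (`/dev/watchdog`, `WDIOC_SETTIMEOUT` from `<linux/watchdog.h>`); whether a 2.4.19 kernel exposes exactly this, and whether any driver exists for the 630 hardware, are open questions in this thread. The function name `arm_watchdog_reset` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Open the watchdog device, which starts the timer on typical drivers,
 * and ask for a short timeout. Returns the open descriptor on success,
 * or -1 if the device cannot be opened. The caller then simply stops
 * feeding the watchdog and waits for the hardware to reset the machine. */
static int arm_watchdog_reset(const char *path, int timeout_s)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Not every driver allows the timeout to be changed. A failure here
     * is ignored, which leaves the driver's default timeout in effect. */
    (void)ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s);
    return fd;
}
```

A caller would do something like `arm_watchdog_reset("/dev/watchdog", 1)` and then loop forever without writing to the descriptor. The descriptor is deliberately left open, since closing it may stop the timer on some drivers.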
What Linux seems to be moving toward for reset is to set the watchdog timer and then just sit there, tapping its toes, until the timer expires and resets the machine. If we can get WDT support for the 630 hardware in, that would be the way to do it.
ron
Good idea, I tried it with a little test code, and the wdt reset definitely works. I guess what is needed is a driver, since I did not see any support for the sis630 wdt in drivers/char.
I'll probably go with my 2 line patch for now, since it does a perfect reset, without toe tapping, and maybe someday SiS can write a driver to support the chip. I may do it at some point if my code otherwise needs it, and it may, but it is too much of an investment of time just for the reset.
Thanks for the help.
-Steve
PS: if anyone is interested, the patch to 2.4.19 is pretty simple; it goes in just before the triple fault:

diff arch/i386/kernel/process.c arch/i386/kernel/process.c.orig
416,421d415
<
< // S. Gehlbach: 3-fault doesn't work for SiS630/C3,
< // so reset via the SiS630 ISA Bridge 0xc0 -> reg 0x46
< outl(0x80000844,0xCF8);
< outb(0xc0, 0xCFE);
<
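The two magic numbers in the patch follow from the standard PCI configuration mechanism #1: a 32-bit address written to port 0xCF8 selects bus, device, function, and a dword-aligned register, and the data window at 0xCFC..0xCFF then gives byte access within that dword. The sketch below shows the arithmetic; the helper names are hypothetical, and the read-modify-write helper `with_reset_bits` is a suggested variation (the patch itself writes 0xc0 directly).

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONF1_ADDR_PORT 0xCF8u /* configuration address port */
#define PCI_CONF1_DATA_PORT 0xCFCu /* start of the 4-byte data window */

/* Build the configuration address for mechanism #1. Bit 31 enables the
 * access; the low two bits of the register number are dropped because
 * the address always selects a whole dword. */
static uint32_t pci_conf1_address(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}

/* I/O port for a byte access to register 'reg' inside the selected dword. */
static uint16_t pci_conf1_data_port(unsigned reg)
{
    return (uint16_t)(PCI_CONF1_DATA_PORT + (reg & 3u));
}

/* Set bits 6 and 7 while preserving whatever else is in the register. */
static uint8_t with_reset_bits(uint8_t old_value)
{
    return (uint8_t)(old_value | 0xC0u);
}
```

For bus 0, device 1, function 0, register 0x46 this yields address 0x80000844 and data port 0xCFE, which matches the outl and outb in the patch.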
steve, my power patches are a bit cleaner than that, if you want to take a look.
They actually suggest a framework for this kind of thing but the WDT stuff came along and it makes a bit more sense.
ron
On Tue, 2002-09-24 at 06:19, Steve M. Gehlbach wrote:
I have an interesting problem with the pcchips m787cl+ motherboard: it won't do a reboot. If you use the three finger salute or issue an init 6, it hangs at the "Restarting system" message at the bottom of the shutdown cycle.
I traced the code to the machine_restart sub in arch/i386/kernel/process.c. It seems that Linux tries to reset using a keyboard command (out 0xfe,0x64), and if that fails it forces a triple fault. Neither seems to work. I tried setting b0 of reg 0x46 in the ISA bridge (-d 1039:8) on the sis630e, which is labeled "Enable Keyboard Hardware Reset", but it didn't work. So I put in a patch to process.c that sets b6,7 of reg 0x46 of the ISA bridge, and this works: it causes a reset (per the data sheet). But of course that is a kernel patch, and I would like to avoid that.
The ACPI watchdog code is in the SiS LinuxBIOS patch available on CVS.
Ollie
The ACPI watchdog code is in the SiS LinuxBIOS patch available on CVS.
Ollie
Thanks Ollie, that's a lot of great code. I did not realize this was there.
I'm curious, though: do you think the watchdog timer reset is better, i.e., less likely to have race conditions, than my simple setting of b6,7 in the INIT Enable Register (Hardware Reset Initiated by Software) at reg 0x46 in the LPC bridge? By the name, it seems this is what it is for.
-Steve
On Tue, 2002-09-24 at 11:19, Steve M. Gehlbach wrote:
The ACPI watchdog code is in the SiS LinuxBIOS patch available on CVS.
Ollie
Thanks Ollie, that's a lot of great code. I did not realize this was there.
I'm curious, though: do you think the watchdog timer reset is better, i.e., less likely to have race conditions, than my simple setting of b6,7 in the INIT Enable Register (Hardware Reset Initiated by Software) at reg 0x46 in the LPC bridge? By the name, it seems this is what it is for.
What kind of race condition are you afraid of ??
I don't have much idea about the difference between these two methods. AFAIK, there are various kinds of "Reset" or "INIT" from the HW point of view. A reset through the LPC bridge only resets the CPU. You have to use the ACPI WDT to reset the "whole" system.
Ollie
I'm curious, though: do you think the watchdog timer reset is better, i.e., less likely to have race conditions, than my simple setting of b6,7 in the INIT Enable Register (Hardware Reset Initiated by Software) at reg 0x46 in the LPC bridge? By the name, it seems this is what it is for.
What kind of race condition are you afraid of ??
I don't have much idea about the difference between these two methods. AFAIK, there are various kinds of "Reset" or "INIT" from the HW point of view. A reset through the LPC bridge only resets the CPU. You have to use the ACPI WDT to reset the "whole" system.
I had no specific problem in mind, just speculating as to why it might be better to use the WDT. And you answered my question, that the WDT reset is more complete.
But in fact, I put a scope on it, and the Hardware Reset Initiated by Software also activates RESET# on the PCI bus (~10-15ms, A15), so it appears that both PCIRST# and CPURST# are activated.
-Steve
Steve,
The ACPI watchdog code is in the SiS LinuxBIOS patch available on CVS.
Ollie
Thanks Ollie, that's a lot of great code. I did not realize this was there. I'm curious, though: do you think the watchdog timer reset is better, i.e., less likely to have race conditions, than my simple setting of b6,7 in the INIT Enable Register (Hardware Reset Initiated by Software) at reg 0x46 in the LPC bridge? By the name, it seems this is what it is for.
I also have a kernel source with all the required linuxbios patches (e.g., sis, fb reset twice, kexec) along with XFS, JFS, LVM, EVMS, and pre-emptive. It should save you some time. You can get it at ftp://ftp.cwlinux.com/pub/downloads/kernel/2.4.19/kernel-source-linuxbios-2.4.19-CWLINUX_4.i386.rpm
-Andrew