What could be the problem when a board passes the RAM test but hangs immediately after jumping to RAM?
I'm developing new raminit code and this happens only with one particular DIMM installation. The DIMM itself is not bad: it works when moved to another slot, or when another slot is filled with another DIMM. Award BIOS runs just fine with this DIMM installation.
LinuxBIOS-1.0.0 Thu May 15 07:27:04 JST 2003 starting...
Enabled first bank of RAM: 0x10000000 bytes
Testing SDRAM : 00000000-0009ffff
SDRAM fill: 0009ffff
SDRAM verify: 0009ffff
Done.
Testing SDRAM : 00100000-01000000
SDRAM fill: 01000000
SDRAM verify: 01000000
Done.
Copying LinuxBIOS to ram.
Jumping to LinuxBIOS.
(hangs up here)
Do you have option CONFIG_COMPRESS=0?
-Andrew
On Thu, May 15, 2003 at 06:37:17PM +0900, SONE Takeshi wrote:
-- Takeshi _______________________________________________ Linuxbios mailing list Linuxbios@clustermatic.org http://www.clustermatic.org/mailman/listinfo/linuxbios
On Thu, May 15, 2003 at 06:02:49PM +0800, Andrew Ip wrote:
Do you have option CONFIG_COMPRESS=0?
No. I didn't think it mattered, but I'll try it anyway.
-- Takeshi
On Thu, 15 May 2003, SONE Takeshi wrote:
What could be the problem when a board passes the RAM test but hangs immediately after jumping to RAM?
I'm developing new raminit code and this happens only with one particular DIMM installation. The DIMM itself is not bad: it works when moved to another slot, or when another slot is filled with another DIMM. Award BIOS runs just fine with this DIMM installation.
Every time I've seen this it is either drive strength settings or timing (esp. on the 8601, it is VERY sensitive to buffer strength).
You really need to dump the northbridge and make sure you know exactly, for each setting, why Award set it one way and why LinuxBIOS set it the same way or differently. This is very tedious but it is your only choice. With the 8601 I saw one case where all the bits except one were correct on a data read. It only takes one wrong bit, however, to wreak havoc.
ron
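The register-by-register comparison described above can be partly mechanized. The following is only a minimal sketch, not part of LinuxBIOS: it assumes both configuration-space dumps are already available as equal-length byte sequences, and the function name, the offset, and the sample values are all made up for illustration.

```python
def diff_registers(award, linuxbios):
    """Return (offset, award_value, linuxbios_value, differing_bits) per mismatch."""
    if len(award) != len(linuxbios):
        raise ValueError("dumps must have the same length")
    diffs = []
    for offset, (a, b) in enumerate(zip(award, linuxbios)):
        if a != b:
            # XOR leaves a 1 in every bit position where the two values disagree.
            diffs.append((offset, a, b, a ^ b))
    return diffs


# Made-up sample data: two 256-byte dumps that differ in a single register.
award_dump = bytearray(256)
linuxbios_dump = bytearray(256)
award_dump[0x50] = 0x01
linuxbios_dump[0x50] = 0x03

for offset, a, b, bits in diff_registers(award_dump, linuxbios_dump):
    print(f"reg 0x{offset:02X}: award=0x{a:02X} "
          f"linuxbios=0x{b:02X} differing bits=0b{bits:08b}")
```

A list like this only says where the two setups differ; working out why each register differs still needs the datasheet.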
On Thu, May 15, 2003 at 08:01:01AM -0600, ron minnich wrote:
Every time I've seen this it is either drive strength settings or timing (esp. on the 8601, it is VERY sensitive to buffer strength).
I've found that 0x6B is first assigned the same value as Award uses, then rewritten later with a different value (this is in raminit.inc from CVS). I thought this had something to do with drive strength (it changes the slew rate), but removing the later write doesn't really change things.
I also tried CONFIG_COMPRESS=0, then Steve M. Gehlbach's copy-verify hack (found here: http://www.clustermatic.org/pipermail/linuxbios/2002-October/000527.html ), and totally disabling the cache. Every time I change something, execution stops at a different point.
You really need to dump the northbridge and make sure you know exactly, for each setting, why Award set it one way and why LinuxBIOS set it the same way or differently. This is very tedious but it is your only choice. With the 8601 I saw one case where all the bits except one were correct on a data read. It only takes one wrong bit, however, to wreak havoc.
Thanks. I'll again have to look at those hexadecimal dumps and the book. It really exhausts my eyes...
-- Takeshi
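The general idea of verifying a copy can be illustrated as follows. This is not Steve Gehlbach's code, which is not reproduced in this thread; it is a small simulation under stated assumptions. The `StuckBitMemory` class is a hypothetical model of RAM in which one bit at one offset always reads back as 0, and all names and values are invented for the example.

```python
def copy_and_verify(src, dst):
    """Copy src into dst byte by byte, then read dst back and list mismatches.

    Each mismatch is (offset, expected, actual, differing_bits).
    """
    for offset, value in enumerate(src):
        dst[offset] = value
    mismatches = []
    for offset, expected in enumerate(src):
        actual = dst[offset]
        if actual != expected:
            mismatches.append((offset, expected, actual, expected ^ actual))
    return mismatches


class StuckBitMemory:
    """Hypothetical faulty RAM: the masked bit at bad_offset always reads as 0."""

    def __init__(self, size, bad_offset, bad_mask):
        self._cells = bytearray(size)
        self._bad_offset = bad_offset
        self._bad_mask = bad_mask

    def __setitem__(self, offset, value):
        self._cells[offset] = value

    def __getitem__(self, offset):
        value = self._cells[offset]
        if offset == self._bad_offset:
            value &= ~self._bad_mask & 0xFF  # force the faulty bit to 0 on reads
        return value


image = bytes(range(256))  # stand-in for the code being copied to RAM
ram = StuckBitMemory(256, bad_offset=0x4C, bad_mask=0x04)
for offset, expected, actual, bits in copy_and_verify(image, ram):
    print(f"offset 0x{offset:02X}: wrote 0x{expected:02X}, "
          f"read 0x{actual:02X}, differing bits 0x{bits:02X}")
```

In this model the fault only shows up when the written byte has the affected bit set, so whether a check catches it depends on the data being written.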
The symptoms you are describing sound almost certainly like RAM problems.
ron
On Fri, May 16, 2003 at 08:40:14AM -0600, ron minnich wrote:
The symptoms you are describing sound almost certainly like RAM problems.
I think I solved the problem now! Now my code automatically detects DIMM presence, size, and MA mapping type correctly for my 3 different types of DIMMs, in whichever slot, or any combination of these DIMMs.
I'll do some cleanup and extensive testing with memtest86, then post the patch.
-- Takeshi
Takeshi,
I think I solved the problem now! Now my code automatically detects DIMM presence, size, and MA mapping type correctly for my 3 different types of DIMMs, in whichever slot, or any combination of these DIMMs. I'll do some cleanup and extensive testing with memtest86, then post the patch.
I'm just testing your latest code here with LinuxBIOS + ADLO. It is able to start VGA and GRUB sometimes, but not all the time. FYI, I'm running an EPIA with an IDE-to-CF adapter. It may be related to memory again.
-Andrew
On Wed, May 21, 2003 at 11:18:09PM +0800, Andrew Ip wrote:
I'm just testing your latest code here with LinuxBIOS + ADLO. It is able to start VGA and GRUB sometimes, but not all the time. FYI, I'm running an EPIA with an IDE-to-CF adapter. It may be related to memory again.
Could you send me the output of lspci -xxx -s0:0.0 under the Award BIOS and the serial output of LinuxBIOS? If I find something interesting there, maybe I can help.
Otherwise, it's difficult for me to deal with, since the only EPIA board I have access to is working quite happily with the current code.
I assume you don't need to be asked to do the basic things to track it down (like memtest86, replacing DIMMs, disabling framebuffer/VGABIOS, etc.).
The only possible solution that comes to mind is to introduce more "Award compatible" register values. I tried many of them, then dropped most of them once it began to work, to keep them to a minimum. I don't like having things I can't explain.
-- Takeshi
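For anyone comparing such dumps by script rather than by eye, the hex lines that lspci prints with -x/-xxx (an offset, a colon, then up to 16 byte values per line) are easy to turn into bytes. The parser below is a rough sketch that assumes the text holds the dump of a single device, as -s selects here; the function name is made up, and the sample text is fabricated and is not a real EPIA dump.

```python
import re

# One hex-dump line: a 2-3 digit hex offset, a colon, then 1-16 byte values.
_HEX_LINE = re.compile(r"^([0-9a-fA-F]{2,3}):((?: [0-9a-fA-F]{2}){1,16})\s*$")


def parse_lspci_hexdump(text):
    """Return the config-space bytes from the hex-dump lines of one device."""
    data = bytearray()
    for line in text.splitlines():
        match = _HEX_LINE.match(line)
        if not match:
            continue  # skip the device description line and blank lines
        offset = int(match.group(1), 16)
        if offset != len(data):
            raise ValueError(f"unexpected offset 0x{offset:02x}")
        data.extend(int(byte, 16) for byte in match.group(2).split())
    return bytes(data)


# Fabricated sample in the same layout; the byte values are just a counter.
sample = """\
00:00.0 Host bridge: (made-up example device)
00: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
10: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
"""

registers = parse_lspci_hexdump(sample)
print(len(registers), hex(registers[0x1B]))
```

Two dumps parsed this way can then be compared byte by byte to find the registers that Award BIOS and LinuxBIOS program differently.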
Sounds like you may have a signal integrity problem. Try hanging a scope on the address and data lines and look for transients...
On Fri, 16 May 2003, Frank wrote:
Sounds like you may have a signal integrity problem. Try hanging a scope on the address and data lines and look for transients...
To show you how bad this can get: the last problem I had with an 8601 before I gave up was that a function got called from, e.g., address f8048. The return PC got pushed on the stack as something like f804c. These are not the exact numbers, as this was a long time ago.
The data error was such that most of linuxbios was working, but in this case, bit 3 got corrupted. When the RET was executed, the return address was f8048.
Infinite loop. At that point I gave up.
These data corruption problems can be very difficult.
ron
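The arithmetic behind that infinite loop can be checked with an XOR. The sketch below uses the illustrative addresses from the message, which the author says are not the exact numbers; the helper name is made up. With these particular values the single differing bit is bit 2 when counting from zero, which is the third bit when counting from one.

```python
def flipped_bits(expected, actual):
    """Return the zero-based positions of the bits that differ between two values."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if diff & (1 << bit)]


pushed = 0xF804C  # return address as pushed by the call (illustrative value)
popped = 0xF8048  # return address as read back by RET (illustrative value)

print(hex(pushed ^ popped), flipped_bits(pushed, popped))
```

Because the corrupted return address equals the address of the call itself, every RET lands back on the same call and the code never gets past it.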