Please excuse this blast. Here's the problem: CBFS is breaking something it can't break. If you turn on CBFS, then very early startup in the Opteron code fails. This is verified across several mainboards. Any wild ideas welcome. I can't even figure out where to start ...
ron
Forwarded conversation
Subject: s2892 + CBFS strange failure
------------------------
From: *Myles Watson* mylesgw@gmail.com Date: Wed, Apr 22, 2009 at 9:05 AM To: ron minnich rminnich@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com
On Wed, Apr 22, 2009 at 9:56 AM, ron minnich rminnich@gmail.com wrote:
Can I bring in Patrick and Stefan and marcj? This is getting too weird.
:)
Probably part of it is miscommunication on my part, but I'd be glad for any help.
Here's the summary:
With CONFIG_CBFS = 0 it works fine.
With CONFIG_CBFS = 1 I get (warm reset):
INIT detected from --- { APICID = 00 NODEID = 00 COREID = 00} ---
Issuing SOFT_RESET...
Then nothing else. Post code 0xf0
With CONFIG_CBFS = 1 I get (cold reset):
Nothing. Post code 0xf0
I've been inserting post codes, and it always makes it to real_main. It just doesn't make it out of init_cpus. On a warm reset I get the serial output. Otherwise there is none.
We've tried using a different compiler. Same results. We've tried no payload and no VGA ROM.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Wed, Apr 22, 2009 at 9:12 AM To: Myles Watson mylesgw@gmail.com Cc: Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com
Also, myles, this all works on serengeti, right?
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Wed, Apr 22, 2009 at 9:13 AM To: ron minnich rminnich@gmail.com Cc: Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com
I was in the middle of writing that :)
I forgot an interesting point:
The broken image works on SimNOW until it can't find the SMBUS. But it always gets far enough that there is some serial output.
Thanks, Myles
---------- From: *Marc Jones* marcj303@gmail.com Date: Wed, Apr 22, 2009 at 10:28 AM To: Myles Watson mylesgw@gmail.com Cc: ron minnich rminnich@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
That is very strange. Can you attempt to track when it starts to fail? Does it have to boot all the way into Linux before the reset stops working, or does it happen before it loads any payloads?
I can't think of anything that would cause that kind of problem. Can you narrow it down in cpu_init?
Marc
-- http://marcjonesconsulting.com
---------- From: *Myles Watson* mylesgw@gmail.com Date: Wed, Apr 22, 2009 at 10:48 AM To: Marc Jones marcj303@gmail.com Cc: ron minnich rminnich@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
Sorry I was unclear again. I'll try to explain better.
When I said it happens on warm reset, I meant from a working image.
1. boot a working image
2. switch to cbfs image
3. warm reset gives some output

It seems like it hangs on the first call to printk that it reaches. I tried moving console_init ahead of init_cpus in real_main, but it didn't change the behavior.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Wed, Apr 22, 2009 at 1:44 PM To: Myles Watson mylesgw@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
so cbfs works with qemu, kontron (yes or no? I think yes), and serengeti
and it fails with this board.
are these older CPUs? What stepping?
I have to admit I'm stumped.
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Wed, Apr 22, 2009 at 1:47 PM To: ron minnich rminnich@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
I may have been chasing the wrong thing here. When I was helping Samuel with the dl145 he said that somewhere after 4030 cold boot broke for him. He's bisecting now.
Thanks, Myles
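A bisection like the one mentioned above can be sketched as a binary search over revision numbers. This is only an illustration: `is_good` is a hypothetical stand-in for "build this revision, flash it, and see whether cold boot works", and the upper bound and break point used in the example are made-up placeholders, not results from this thread.

```python
def first_bad_revision(good, bad, is_good):
    """Binary search for the first revision where is_good() turns False.

    Assumes is_good(good) is True, is_good(bad) is False, and that there
    is a single good -> bad transition somewhere in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced at or before mid
    return bad


if __name__ == "__main__":
    # Hypothetical predicate: pretend the break was introduced in r4100.
    pretend_test = lambda rev: rev < 4100
    print(first_bad_revision(4030, 4200, pretend_test))
```

Each step halves the range, so a span of a couple hundred revisions needs roughly eight build-and-boot cycles.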
---------- From: *Myles Watson* mylesgw@gmail.com Date: Thu, Apr 23, 2009 at 6:15 AM To: ron minnich rminnich@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
Just to add a wrinkle, my onboard graphics died. That's why things were flaky yesterday. It just stopped responding to config reads and gets disabled by coreboot.
I added a video card and I'm back up. Cold boot works for me with 4193 (No CBFS), so the Config changes were fine. It's still broken for CBFS for me. Unless someone has an idea of how to track it down I'm just going to not use CBFS for now, even though I like the CBFS option much better.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Thu, Apr 23, 2009 at 7:57 AM To: Myles Watson mylesgw@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
we really need to track this down because whatever this may be, it's unlikely to be cbfs. Not if you're not getting any prints at all.
It would still be interesting if you could try the very first version where cbfs was introduced.
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Thu, Apr 23, 2009 at 12:04 PM To: ron minnich rminnich@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
There were some fixes put in pretty quickly. I just tried 4113 (the rename.) Which one would you suggest next?
Thanks, Myles
---------- From: *Myles Watson* mylesgw@gmail.com Date: Thu, Apr 23, 2009 at 3:09 PM To: ron minnich rminnich@gmail.com
4061 fails with CBFS but not without.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 7:23 AM To: Myles Watson mylesgw@gmail.com
no serial output and SPEW?
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 7:38 AM To: ron minnich rminnich@gmail.com
For a warm boot. Nothing from a cold boot. 4061 no CBFS works fine.
Thanks, Myles
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 7:40 AM To: ron minnich rminnich@gmail.com
SPEW is definitely enabled for the working one.
Thanks, Myles
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 8:18 AM To: ron minnich rminnich@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de
4061 on my s2892 with SPEW:
On my s2895 I am having problems with warm reset, and a cold boot powers itself off quickly with post code 0xf0.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 9:47 AM To: Myles Watson mylesgw@gmail.com Cc: Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Patrick Georgi patrick.georgi@coresystems.de, Ward Vandewege ward@gnu.org
OK, this is nuts. CBFS is in the ram code. It can't affect the ROM code, can it? And this is really early! Here are the only things I can think of:
1. CBFS changes layout somehow
2. Turning off ELFBOOT turned off something hidden
3. it's changing the way gcc works
I just don't know. Somehow we've got to find this. I will set up my dbm board tonight.
Patrick, Stefan, have you tested CBFS with the kontron?
ron
---------- From: *Patrick Georgi* patrick.georgi@coresystems.de Date: Fri, Apr 24, 2009 at 9:49 AM To: ron minnich rminnich@gmail.com Cc: Myles Watson mylesgw@gmail.com, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
On 24.04.2009 18:47, ron minnich wrote:
That's where my lzma.c patch came from. I'm debugging the bounce buffer code right now. It seems to copy correctly into the buffer, but I'm not convinced yet that it correctly copies back.
Patrick
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 9:55 AM To: Patrick Georgi patrick.georgi@coresystems.de Cc: Myles Watson mylesgw@gmail.com, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
but Myles's failure is WAY before any of that. His machine dies in the very earliest C code.
I do not really like the bounce buffer ... it's just too fragile for my taste. If anything goes wrong, well, you're in assembly code with no way out.
ron
---------- From: *Patrick Georgi* patrick.georgi@coresystems.de Date: Fri, Apr 24, 2009 at 10:35 AM To: ron minnich rminnich@gmail.com Cc: Myles Watson mylesgw@gmail.com, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
OK, this is nuts. CBFS is in the ram code. It can't affect the ROM
I'm not quite sure at which point in the boot process the last message before the reboot comes up, so this is just a guess.
Could it be that it tries to jump into the normal image? I'm not quite certain that we get that entirely correct (and the layout might change in that dark corner of the build system). Replacing that "jmp __normal_image" with "jmp __fallback_image" might help then (for testing).
Patrick
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 10:42 AM To: Patrick Georgi patrick.georgi@coresystems.de Cc: Myles Watson mylesgw@gmail.com, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
On Fri, Apr 24, 2009 at 10:35 AM, Patrick Georgi wrote:
Now *that* is a pretty smart guess. Myles, were you running fallback/normal?
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 10:42 AM To: ron minnich rminnich@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
fallback only. I'll try it.
Thanks, Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 10:57 AM To: Myles Watson mylesgw@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
src/lib/cbfs.c src/include/cbfs.h src/devices/pci_rom.c src/boot/selfboot.c src/boot/hardwaremain.c
But none of these are involved in the early CAR code.
There is another possibility: are we somehow messing up the HT configuration space? That would explain why you die after init_cpus.
ron
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 11:11 AM To: Myles Watson mylesgw@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
Well, this problem just became urgent. I've got no idea where to start and no time right now to work on it :-(
And it doesn't break on SimNOW, right, Myles? Patrick, any progress on kontron?
oh, !@#$@!#$@!#$!@$#
ron
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 11:12 AM To: ron minnich rminnich@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
And why there's only output on a warm reset.
I don't know. I tried removing the normal image jump in cache_as_ram_auto.c. I guess I should have remembered that it got past there before init_cpus.
Myles
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 11:13 AM To: ron minnich rminnich@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
Right. The serengeti image works fine, and the s2892 image runs until it notices it has a different chipset and dies with "SMBUS not found."
Myles
---------- From: *ron minnich* rminnich@gmail.com Date: Fri, Apr 24, 2009 at 11:15 AM To: Myles Watson mylesgw@gmail.com Cc: Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de, Ward Vandewege ward@gnu.org
Anyone mind if I just take this to the list?
ron
---------- From: *Ward Vandewege* ward@gnu.org Date: Fri, Apr 24, 2009 at 11:16 AM To: ron minnich rminnich@gmail.com Cc: Myles Watson mylesgw@gmail.com, Patrick Georgi patrick.georgi@coresystems.de, Marc Jones marcj303@gmail.com, Stefan Reinauer stepan@coresystems.de
Please do.
Thanks, Ward.
-- Ward Vandewege ward@fsf.org Free Software Foundation - Senior Systems Administrator
---------- From: *Myles Watson* mylesgw@gmail.com Date: Fri, Apr 24, 2009 at 11:21 AM To: ron minnich rminnich@gmail.com
No problem here. We probably should have done it a while ago. I just didn't want to make too big of a stink if we could fix it quickly.
Thanks, Myles
I guess you hit some ROMSTRAPs?
Rudolf
On Fri, Apr 24, 2009 at 11:59 AM, Rudolf Marek r.marek@assembler.cz wrote:
I guess you hit some ROMSTRAPs?
hmm. Not sure I understand. how would that affect it?
ron
hmm. Not sure I understand. how would that affect it?
At particular addresses in flash there are some magic values for the chipset (HT setup, etc.). These are fetched by the chipset before the CPU is on, so that it can fetch the code at all. Maybe, just maybe, CBFS changed those ROM locations.
Rudolf
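To check an idea like this against a ROM image file, a fixed flash address has to be translated into a file offset. The sketch below assumes the usual x86 arrangement where the flash is mapped so that its last byte sits just below 4 GiB; the 1 MiB image size and the address 0xffffffa0 are illustrative values, and `flash_addr_to_file_offset` is a hypothetical helper name, not something from coreboot.

```python
FOUR_GIB = 1 << 32


def flash_addr_to_file_offset(addr, rom_size):
    """Map a physical address in the flash window to an offset in the image.

    Assumes the image is mapped so that it ends exactly at 4 GiB.
    """
    base = FOUR_GIB - rom_size          # first address covered by the ROM
    if not base <= addr < FOUR_GIB:
        raise ValueError(f"address {addr:#x} is outside the ROM window")
    return addr - base


if __name__ == "__main__":
    # Illustrative: a 1 MiB image and an address 0x60 bytes below 4 GiB.
    print(hex(flash_addr_to_file_offset(0xFFFFFFA0, 0x100000)))
```

Under those assumptions, anything the chipset expects at a fixed address near the top of flash lives at a fixed distance from the end of the image file, so comparing the tails of two images is enough to see whether it moved.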
On Fri, Apr 24, 2009 at 12:05 PM, Rudolf Marek r.marek@assembler.cz wrote:
At particular addresses in flash there are some magic values for the chipset (HT setup, etc.). These are fetched by the chipset before the CPU is on, so that it can fetch the code at all. Maybe, just maybe, CBFS changed those ROM locations.
I think you just nailed it! That makes sense. I just totally forgot about that aspect of the nvidia stuff.
Ward and Myles, can you check that?
ron
On Fri, Apr 24, 2009 at 1:07 PM, ron minnich rminnich@gmail.com wrote:
On Fri, Apr 24, 2009 at 12:05 PM, Rudolf Marek r.marek@assembler.cz wrote:
At particular addresses in flash there are some magic values for the chipset (HT setup, etc.). These are fetched by the chipset before the CPU is on, so that it can fetch the code at all. Maybe, just maybe, CBFS changed those ROM locations.
I think you just nailed it! That makes sense. I just totally forgot about that aspect of the nvidia stuff.
Ward and Myles, can you check that?
here's the tail end of the hexdump for the normal image:
*
000fff60 ff ff ff ff ff ff ff ff ff 54 79 61 6e 00 73 32 |.........Tyan.s2|
000fff70 38 39 32 00 97 00 00 00 92 00 00 00 00 70 0f 00 |892..........p..|
000fff80 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
000fffa0 65 d0 16 2b 00 00 00 00 00 00 00 00 b0 ff ff ff |e..+............|
000fffb0 1c 00 03 00 00 00 00 08 00 00 00 00 ff ff ff ff |................|
000fffc0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
000fffd0 66 32 54 81 e0 00 00 00 ce 09 23 00 00 81 e0 00 |f2T.......#.....|
000fffe0 a0 ff ff ff a0 ff ff ff a0 ff ff ff a0 ff ff ff |................|
000ffff0 e9 b5 79 ff ff 00 00 00 e9 03 7a ff ff 00 00 00 |..y.......z.....|
00100000
And the cbfs image:
*
000fff60 ff ff ff ff ff ff ff ff ff 54 79 61 6e 00 73 32 |.........Tyan.s2|
000fff70 38 39 32 00 97 00 00 00 92 00 00 00 00 00 08 00 |892.............|
000fff80 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
000fffa0 65 d0 16 2b 00 00 00 00 00 00 00 00 b0 ff ff ff |e..+............|
000fffb0 1c 00 03 00 00 00 00 08 00 00 00 00 ff ff ff ff |................|
000fffc0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
000fffd0 4f 52 42 43 00 00 00 00 00 08 00 00 00 02 00 00 |ORBC............|
000fffe0 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000ffff0 e9 85 01 ff ff 00 00 00 e9 d3 01 ff d0 ff ff ff |................|
00100000
It looks like romstrap is fffa0-ffff0. Definitely different.
Thanks Rudolf!
Thanks, Myles
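For anyone who wants to see exactly where the two tails diverge, here is a rough sketch that compares the rows from fffa0 onward. The byte values are copied from the two hexdumps above; the dictionary layout and the `differing_rows` helper are just illustrative choices.

```python
# Rows copied from the hexdumps above: file offset -> 16 bytes as hex text.
NORMAL = {
    0xFFFA0: "65 d0 16 2b 00 00 00 00 00 00 00 00 b0 ff ff ff",
    0xFFFB0: "1c 00 03 00 00 00 00 08 00 00 00 00 ff ff ff ff",
    0xFFFC0: "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff",
    0xFFFD0: "66 32 54 81 e0 00 00 00 ce 09 23 00 00 81 e0 00",
    0xFFFE0: "a0 ff ff ff a0 ff ff ff a0 ff ff ff a0 ff ff ff",
    0xFFFF0: "e9 b5 79 ff ff 00 00 00 e9 03 7a ff ff 00 00 00",
}
CBFS = {
    0xFFFA0: "65 d0 16 2b 00 00 00 00 00 00 00 00 b0 ff ff ff",
    0xFFFB0: "1c 00 03 00 00 00 00 08 00 00 00 00 ff ff ff ff",
    0xFFFC0: "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff",
    0xFFFD0: "4f 52 42 43 00 00 00 00 00 08 00 00 00 02 00 00",
    0xFFFE0: "00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00",
    0xFFFF0: "e9 85 01 ff ff 00 00 00 e9 d3 01 ff d0 ff ff ff",
}


def differing_rows(a, b):
    """Return {row offset: [offsets of differing bytes]} for rows in both dumps."""
    result = {}
    for off in sorted(a.keys() & b.keys()):
        row_a = bytes.fromhex(a[off])
        row_b = bytes.fromhex(b[off])
        diffs = [off + i for i, (x, y) in enumerate(zip(row_a, row_b)) if x != y]
        if diffs:
            result[off] = diffs
    return result


if __name__ == "__main__":
    for off, diffs in differing_rows(NORMAL, CBFS).items():
        print(f"{off:08x}: {len(diffs)} byte(s) differ")
```

With the bytes as pasted, the rows at fffa0, fffb0 and fffc0 come out identical, and the differences are all in the rows at fffd0, fffe0 and ffff0.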