attached. It's ugly. But car is complex.
note that the stage1_main code calls disable_car, then stage1_next. But disable_car calls stage1_next too! how can this works?
The redundant call ensures this code works no matter whether the chip-dependent disable_car calls stage1_next or not.
ron
attached for real :-)
BTW, with this patch, and a slight fix to it, I am getting to the payload!
ron
On 11.12.2007 05:43, ron minnich wrote:
note that the stage1_main code calls disable_car, then stage1_next. But disable_car calls stage1_next too! how can this works?
The redundant call ensures this code works no matter whether the chip-dependent disable_car calls stage1_next or not.
I don't like this. disable_car() should just be able to return without problems. If it can't do that, fix it. Otherwise we have to document the call to stage1_end() as a requirement of any diable_car() function and *justify* it.
This change makes it possible for an lx to get through CAR to stage2.
It's kind of ugly, but what it does is require the CAR code to call a function, stage1_end. Why might this be needed? Because, at the end of CAR, we invalidate the stack, which in turn *on some systems* will lose the return address. We are thus losing our stack in disable_car, and to compensate, we are jumping.
Why exactly do we invalidate the stack? If the stack is clobbered, the contents of variables should be in a clobbered state as well. In that case, I suspect parts of our v3 code would misbehave as well.
Can we make sure the stack contents are preserved?
Signed-off-by: Ronald G. Minnich rminnich@gmail.com
Index: arch/x86/stage1.c
--- arch/x86/stage1.c (revision 542) +++ arch/x86/stage1.c (working copy) @@ -73,23 +73,12 @@ */ void __attribute__((stdcall)) stage1_main(u32 bist) {
- void stage1_end(void); int ret;
- struct mem_file archive, result;
- struct mem_file archive; int elfboot_mem(struct lb_memory *mem, void *where, int size);
void *entry;
/* we can't statically init this hack. */
unsigned char faker[64];
struct lb_memory *mem = (struct lb_memory*) faker;
mem->tag = LB_TAG_MEMORY;
mem->size = 28;
mem->map[0].start.lo = mem->map[0].start.hi = 0;
mem->map[0].size.lo = (32*1024*1024);
mem->map[0].size.hi = 0;
mem->map[0].type = LB_MEM_RAM;
post_code(0x02);
// before we do anything, we want to stop if we dont run
@@ -159,7 +148,27 @@
/* Turn off Cache-As-Ram */ disable_car();
- stage1_end();
+}
+void stage1_end(void) +{
- int ret;
- struct mem_file archive, result;
- int elfboot_mem(struct lb_memory *mem, void *where, int size);
- void *entry;
- /* we can't statically init this hack. */
- unsigned char faker[64];
- struct lb_memory *mem = (struct lb_memory*) faker;
- mem->tag = LB_TAG_MEMORY;
- mem->size = 28;
- mem->map[0].start.lo = mem->map[0].start.hi = 0;
- mem->map[0].size.lo = (32*1024*1024);
- mem->map[0].size.hi = 0;
- mem->map[0].type = LB_MEM_RAM;
- entry = load_file_segments(&archive, "normal/stage2"); if (entry == (void *)-1) die("FATAL: Failed loading stage2.");
Index: arch/x86/geodelx/stage1.c
--- arch/x86/geodelx/stage1.c (revision 542) +++ arch/x86/geodelx/stage1.c (working copy) @@ -55,9 +55,11 @@ void disable_car(void) { int i;
- void stage1_end(void);
If we really want to call stage1_end from disable_car(), we have to move that to a header file.
for (i = 0; i < ARRAY_SIZE(msr_table); i++) wrmsr(msr_table[i].msrnum, msr_table[i].msr);
__asm__("wbinvd\n");
- stage1_end();
Disagreed, se above.
}
Carl-Daniel
On Dec 12, 2007 4:50 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I don't like this. disable_car() should just be able to return without problems. If it can't do that, fix it.
I've never found the fix. What if it is a hardware limitation on LX? Note that on several LX platforms on V2, we have this: src/mainboard/digitallogic/msm800sev/cache_as_ram_auto.c: done_cache_as_ram_main();
And it's because I could never get it to work any other way. CAR is very tricky sometimes. Just take a deep look at the opteron code -- you'll see that we do some things that make CAR fail on most systems, due to a known limitation in the K8.
Otherwise we have to document the call to stage1_end() as a requirement of any diable_car() function and *justify* it.
only those architectures that we can't get it to work on.
Why exactly do we invalidate the stack? If the stack is clobbered, the contents of variables should be in a clobbered state as well. In that case, I suspect parts of our v3 code would misbehave as well.
no, because what happens here is you are essentially dropping all local variables etc. -- this is a jump to a new context, and all the variables in that context are re-initialized.
Anyway, maybe Marc can tell us what is wrong. I just don't know. I never solved it on the LX either. I'm happy to see it solved but, at the same time, I don't want to burn my next month of limited cycles on something i have a workaround for.
ron
ron minnich wrote:
On Dec 12, 2007 4:50 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I don't like this. disable_car() should just be able to return without problems. If it can't do that, fix it.
I agree.
Why exactly do we invalidate the stack? If the stack is clobbered, the contents of variables should be in a clobbered state as well. In that case, I suspect parts of our v3 code would misbehave as well.
no, because what happens here is you are essentially dropping all local variables etc. -- this is a jump to a new context, and all the variables in that context are re-initialized.
Anyway, maybe Marc can tell us what is wrong. I just don't know. I never solved it on the LX either. I'm happy to see it solved but, at the same time, I don't want to burn my next month of limited cycles on something i have a workaround for.
I assume you are asking about the wbind. The important part being the wb, writeback. This pushes anything that was in the cache to memory. In LX we maintain the same stack location (same pointers) so we don't need to copy the cache to memory. We can just write it back. This works on the Norwich platform. Any non-working LX systems should be debugged.
The K8 is a bit different. The CAR is specified to be in the C0000 region and the CAR configuration works a little bit differently than the LX and Intel way. The result of those requirements is that at the end of CAR the stack and other contents of CAR have to be copied. Once copied the pointers are adjusted and then the function returns.
That about sums it up.... Marc
OK, will get some debug prints in there for you all and you can tell me what to do.
ron
Hi Marc,
that's funny. We both wrote almost the same mail at the same time.
On 13.12.2007 02:14, Marc Jones wrote:
ron minnich wrote:
On Dec 12, 2007 4:50 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I don't like this. disable_car() should just be able to return without problems. If it can't do that, fix it.
I agree.
Why exactly do we invalidate the stack? If the stack is clobbered, the contents of variables should be in a clobbered state as well. In that case, I suspect parts of our v3 code would misbehave as well.
no, because what happens here is you are essentially dropping all local variables etc. -- this is a jump to a new context, and all the variables in that context are re-initialized.
Anyway, maybe Marc can tell us what is wrong. I just don't know. I never solved it on the LX either. I'm happy to see it solved but, at the same time, I don't want to burn my next month of limited cycles on something i have a workaround for.
I assume you are asking about the wbind. The important part being the wb, writeback. This pushes anything that was in the cache to memory. In LX we maintain the same stack location (same pointers) so we don't need to copy the cache to memory. We can just write it back. This works on the Norwich platform. Any non-working LX systems should be debugged.
Yes, I saw that wbinvd and wondered how it could fail.
The K8 is a bit different. The CAR is specified to be in the C0000 region and the CAR configuration works a little bit differently than the LX and Intel way. The result of those requirements is that at the end of CAR the stack and other contents of CAR have to be copied. Once copied the pointers are adjusted and then the function returns.
You mean, the stack is "moved"? Or why do we need to adjust the pointers? Won't that leave pointers to variables on stack dangling?
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
You mean, the stack is "moved"? Or why do we need to adjust the pointers? Won't that leave pointers to variables on stack dangling?
I meant that ss, esp, ebp are adjusted to point at the new stack. The contents of the stack don't change.
On 13.12.2007 02:29, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
You mean, the stack is "moved"? Or why do we need to adjust the pointers? Won't that leave pointers to variables on stack dangling?
I meant that ss, esp, ebp are adjusted to point at the new stack. The contents of the stack don't change.
Uhhhh. That will break v3 badly unless we change the code flow in major ways, similar to what Ron has done to work around the issue he is seeing.
Is there any chance to have K8 CAR at a location besides C0000? Or, asked another way, is the C0000 a "hard" spec or just a recommendation?
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
On 13.12.2007 02:29, Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
You mean, the stack is "moved"? Or why do we need to adjust the pointers? Won't that leave pointers to variables on stack dangling?
I meant that ss, esp, ebp are adjusted to point at the new stack. The contents of the stack don't change.
Uhhhh. That will break v3 badly unless we change the code flow in major ways, similar to what Ron has done to work around the issue he is seeing.
Is there any chance to have K8 CAR at a location besides C0000? Or, asked another way, is the C0000 a "hard" spec or just a recommendation?
That is how it is spec'd and things are a little different for each CPU revision and feature set. I think that changing the stack registers should be fine.
Marc
On 13.12.2007 01:58, ron minnich wrote:
On Dec 12, 2007 4:50 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
I don't like this. disable_car() should just be able to return without problems. If it can't do that, fix it.
I've never found the fix. What if it is a hardware limitation on LX? Note that on several LX platforms on V2, we have this: src/mainboard/digitallogic/msm800sev/cache_as_ram_auto.c: done_cache_as_ram_main();
Oh. I admit that my cycles belong to v3 only, so I had not researched the v2 code.
And it's because I could never get it to work any other way. CAR is very tricky sometimes. Just take a deep look at the opteron code -- you'll see that we do some things that make CAR fail on most systems, due to a known limitation in the K8.
Dumb question: We know the location of the stack when we enter disable_car(). We also know that all memory belongs to us. Can we copy CAR stack contents to some safe location and restore them after wbinvd()?
Otherwise we have to document the call to stage1_end() as a requirement of any diable_car() function and *justify* it.
only those architectures that we can't get it to work on.
OK, but my goal is to reduce the amount of these architectures to zero.
Why exactly do we invalidate the stack? If the stack is clobbered, the contents of variables should be in a clobbered state as well. In that case, I suspect parts of our v3 code would misbehave as well.
no, because what happens here is you are essentially dropping all local variables etc. -- this is a jump to a new context, and all the variables in that context are re-initialized.
But the reinitialization is not equivalent to keeping the old values. I'd go to great lengths to keep the old values.
Anyway, maybe Marc can tell us what is wrong. I just don't know. I never solved it on the LX either. I'm happy to see it solved but, at the same time, I don't want to burn my next month of limited cycles on something i have a workaround for.
Marc?
Regards, Carl-Daniel
Carl-Daniel Hailfinger wrote:
Dumb question: We know the location of the stack when we enter disable_car(). We also know that all memory belongs to us. Can we copy CAR stack contents to some safe location and restore them after wbinvd()?
This shouldn't be needed. If the stack needed to be moved you would want to do it before the wbinvd so you were copying from cache to memory. Much faster than memory to memory.
Marc
Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Dumb question: We know the location of the stack when we enter disable_car(). We also know that all memory belongs to us. Can we copy CAR stack contents to some safe location and restore them after wbinvd()?
This shouldn't be needed. If the stack needed to be moved you would want to do it before the wbinvd so you were copying from cache to memory. Much faster than memory to memory.
Marc
and I am too smart for my own good.....
Here is the deal. Look in cpu/amd/model_lx/cache_as_ram.inc. /* If you wanted to maintain the stack in memory you would need to set the tags as dirty so the wbinvd would push out the old stack contents to memory */ /* Clear the cache, the following code from crt0.S.lb will setup a new stack*/ wbinvd
For LX we didn't need to maintain the stack or any variables beyond CAR. That meant that we didn't need to do any stack copy like the K8 post_cache_as_ram()(that never returns). The LX cache_as_ram_main returns and the LB copy function is back in cache_as_ram.inc. Since there is nothing on the stack and everything is running out of the ROM a new stack is easily created.
Sooo, if you want to maintain the stack in v3... I left a comment in the v2 code. Recall that they cache is disabled so the way doesn't get reallocated and thus the tags are never marked dirty and written back. Since the tags are never marked dirty the wbind won't wb but it will invd.
There are a couple ways to address this. 1. copy the stack to a new location. 2. Set the tags dirty with by writing the way MSRs. 3. enable the cache and copy the stack back on it's self to dirty the tags.
I am least sure about number three.
I think that sums it up. I still think that it is best/fastest if the code returned to the CAR function and it setup a new stack in memory and then launched the next stage.
Marc
On Dec 12, 2007 6:00 PM, Marc Jones marc.jones@amd.com wrote:
There are a couple ways to address this.
- copy the stack to a new location.
- Set the tags dirty with by writing the way MSRs.
- enable the cache and copy the stack back on it's self to dirty the tags.
I am least sure about number three.
Here is one idea.
Copy the stack to "somewhere" in real memory, not "CAR memory". Disable CAR. Enable the cache. Copy the stack back.
in fact, we could define disable_car as follows: void disable_car(void *savestack) { memcpy(savestack, &savestack, 4096); /* copy stack to "savestack" in real memory */ /* turn off car */ /* enable cache */ /* problem: how do we invd without losing savestack variable? */ /* one option is to ALWAYS save the stack to page 0 */ memcopy(&savestack, savestack, 4096);
Something like this might work. We're saving the stack to memory, enabling cache, and copying it back. What do you think?
ron
ron minnich wrote:
On Dec 12, 2007 6:00 PM, Marc Jones marc.jones@amd.com wrote:
There are a couple ways to address this.
- copy the stack to a new location.
- Set the tags dirty with by writing the way MSRs.
- enable the cache and copy the stack back on it's self to dirty the tags.
I am least sure about number three.
Here is one idea.
Copy the stack to "somewhere" in real memory, not "CAR memory". Disable CAR. Enable the cache. Copy the stack back.
in fact, we could define disable_car as follows: void disable_car(void *savestack) { memcpy(savestack, &savestack, 4096); /* copy stack to "savestack" in real memory */ /* turn off car */ /* enable cache */ /* problem: how do we invd without losing savestack variable? */ /* one option is to ALWAYS save the stack to page 0 */ memcopy(&savestack, savestack, 4096);
Something like this might work. We're saving the stack to memory, enabling cache, and copying it back. What do you think?
I don't think that moving the stack should be a problem. All access should be push/pop or ss/esp/ebp relative. I also don't think you need to copy the stack back (like K8) and I would only copy the amount of stack that is being used. Not the entire space.
The savestack variable won't be a problem. It will either be in a register or on the stack. Once the stack is move to real memory and the registers adjusted the invd won't change anything. The register adjustment will have to be inline asm with no push, pops, or stack accesses. For and example, look in cpu/amd/car/post_cache_as_ram.c for %esp.
Marc
On Dec 12, 2007 9:49 PM, Marc Jones marc.jones@amd.com wrote:
I don't think that moving the stack should be a problem. All access should be push/pop or ss/esp/ebp relative. I also don't think you need to copy the stack back (like K8) and I would only copy the amount of stack that is being used. Not the entire space.
actually it could be a huge problem. If at some future time someone does something such as & a stack variable, that's no longer an esp-relative reference. You can't move the stack at that point. Or you can but that pointer is now wrong.
Also, in general, it is hard to know how much of the stack is being used ... you really have to save from %esp to top of stack, whatever that is.
Here's my concern. If we return from disable_car(), and we have moved the stack, we're making a guarantee that the state of the stack is good for anything on the stack. We either meet that guarantee or we don't. If we meet it halfway, it's going to cause trouble for somebody in future, and the error is going to be quite obscure, since some things work and some don't. Possibly a comment in the calling code would help ("don't use pointers").
You really should copy the whole stack back IMHO as somebody might in future put a big struct on the stack. They have to put it on stack, since memory is not working at that point.
Or, more simply, disable_car is this:
void disable_car(void (*chain)(void), void *newstack)
1. disable CAR 2. set up new stack and execution environment in ram; if newstack is NULL, a machine-specific default is used 3. call chain(), which does not return.
disable_car does not return. This disable_car is simple and involves a lot less magic. People don't have unrealistic expectations of what is going to happen. I realize the other magic seems easier, but it's a lot harder to debug when things start to go wrong. We need code that we don't have to try to debug 2 years from now.
The savestack variable won't be a problem. It will either be in a register or on the stack. Once the stack is move to real memory and the registers adjusted the invd won't change anything. The register adjustment will have to be inline asm with no push, pops, or stack accesses. For and example, look in cpu/amd/car/post_cache_as_ram.c for %esp.
As I say, we can do this, but it won't really preserve the stack, and the likely result is a strange error the first time a pointer is used. It worries me. But let's see what people think.
ron
ron minnich wrote:
On Dec 12, 2007 9:49 PM, Marc Jones marc.jones@amd.com wrote:
I don't think that moving the stack should be a problem. All access should be push/pop or ss/esp/ebp relative. I also don't think you need to copy the stack back (like K8) and I would only copy the amount of stack that is being used. Not the entire space.
actually it could be a huge problem. If at some future time someone does something such as & a stack variable, that's no longer an esp-relative reference. You can't move the stack at that point. Or you can but that pointer is now wrong.
Yes, someone could do that. This is why I think it would be a good idea to return to CAR init code to disable CAR. It forces you back to a known state. Not much has happened prior to CAR being enabled.
Also, in general, it is hard to know how much of the stack is being used ... you really have to save from %esp to top of stack, whatever that is.
Yes, so don't copy 4KB but just up to %esp (or esp rounded to the next dword).
Here's my concern. If we return from disable_car(), and we have moved the stack, we're making a guarantee that the state of the stack is good for anything on the stack. We either meet that guarantee or we don't. If we meet it halfway, it's going to cause trouble for somebody in future, and the error is going to be quite obscure, since some things work and some don't. Possibly a comment in the calling code would help ("don't use pointers").
You really should copy the whole stack back IMHO as somebody might in future put a big struct on the stack. They have to put it on stack, since memory is not working at that point.
A similar thing is done with the K8 sysinfo structure. It isn't on the stack but it is in CAR and has to be moved to memory. As I mentioned above, if you return to the CAR code nothing can really be on the stack. This forces platform maintainers to design structures and explicitly move them to memory. They can worry about the pointer problem. Yes, it is work for them but they are in control and it shouldn't be that hard.
Or, more simply, disable_car is this:
void disable_car(void (*chain)(void), void *newstack)
- disable CAR
- set up new stack and execution environment in ram; if newstack is
NULL, a machine-specific default is used 3. call chain(), which does not return.
I think that this would work. I don't think that disable CAR has to return. I just don't think it should be called from a mainboard file like it is in V2.
Marc
Marc, should we call disable_car from the initram bits? That makes sense to me anyway.
ron
ron minnich wrote:
Marc, should we call disable_car from the initram bits? That makes sense to me anyway.
ron
So, I should go see where it is right now.... (I've been a bit busy....)
I think the location in stage1.c is good. It is controlled by stage1 core code rather than in the mainboard code. The stuff below it should be a stage1_done() or start_stage2() in stage1.c that gets called from disable_car() as you proposed. I think that would work and isn't too complicated. Everything stays together... In fact, I think I like it.
Marc
BTW, I am really happy with these kinds of discussions we're having. I think the quality of V3 is already very good as a result.
Thanks again
ron