On 04.01.2008 02:43, ron minnich wrote:
I am restarting the CAR discussion a bit. I want to get a resync.
Thanks.
One part of the discussion involves the idea that I am trying to avoid return from disable_car because of some "bug" in our code. It is not that simple. The issue is that, when the code enters disable_car,the CPU has been using cache as ram, and using "memory" at non-memory addresses. In disable_car, the CPU is supposed to disable cache as ram, and continue -- should it return? The return is on the stack. The function that called disable_car has its own stack variables. They're actually in cache, not ram. So if the code is going to return, it will need a working stack. It will need to restore the stack from cache to ram. But where is the stack? at c000 in one case. Where is ram? Somewhere else.
That's something I don't understand. Are you saying that there might be no RAM at address 0xC000? (Or at 0xC0000, IIRC.) That's well below 1MB and on my local system 0xC0000-0xCFFFF is claimed by the Video ROM of the embedded ATI graphics. Then again, I seem to remember that this area can be RAM as well.
So it is possible that variables on stack are wrong, if they are pointers to variables on the stack, and there is the problem of changing esp, ss, and then fixing up the stack for the RET.
Here's a useful comment: x86/car/cache_as_ram_post.c: /* FIXME: I hope we don't need to change esp and ebp value here, so we can restore value from mmx sse back But the problem is the range is some io related, So don't go back */
If you look at the cache as ram for the amd and x86, you will find something interesting: it does not go back. x86: leal _iseg, %edi jmp *%edi
For AMD, at the end of the mainboard cache as ram main, it calls post_cache_as_ram(void). At the end of that function is this: /*copy and execute linuxbios_ram */ copy_and_run(); /* We will not return */
in other words, once CAR is disabled, we chain to copy_and_run, and never return. We've also set up a new ESP.
To put it simply, the most current CAR code *does not return* --because getting that right, for all CPUs, is tricky and not always possible.
OK.
So, expecting a return from disable car is not really in line with what we do now. In fact, what we do now on most of v2 -- certainly on opteron -- is merely what I'm proposing we do on V3.
I'm pasting in some old mail now, with my comments in[[[]]]
Thanks for that. It seems I missed the essence of that mail back then.
================================================================= Marc Jones wrote:
Carl-Daniel Hailfinger wrote:
Dumb question: We know the location of the stack when we enter disable_car(). We also know that all memory belongs to us. Can we copy CAR stack contents to some safe location and restore them after wbinvd()?
This shouldn't be needed. If the stack needed to be moved you would want to do it before the wbinvd so you were copying from cache to memory. Much faster than memory to memory.
Marc
and I am too smart for my own good.....
Here is the deal. Look in cpu/amd/model_lx/cache_as_ram.inc. /* If you wanted to maintain the stack in memory you would need to set the tags as dirty [[[ Good point here! For the wbinvd to push the stack out, we need to set cache tags! This is going to be tough on some CPUs -- how do we do it on LX? ]]]
I assumed from one of the mails in that old thread that setting the cache tags would be easy.
so the wbinvd would push out the old stack contents to memory */
/* Clear the cache, the following code from crt0.S.lb will setup a new stack*/ wbinvd
For LX we didn't need to maintain the stack or any variables beyond CAR. [[[[ But on v3 we do .... ]]]] That meant that we didn't need to do any stack copy like the K8 post_cache_as_ram()(that never returns). The LX cache_as_ram_main returns and the LB copy function is back in cache_as_ram.inc. Since there is nothing on the stack and everything is running out of the ROM a new stack is easily created.
[[[ Please note the comment: "that never returns"]]]
OK.
[[[ and, note: if you look at that code, round about line 313, you'll notice that it does not return either]]]
Sooo, if you want to maintain the stack in v3... I left a comment in the v2 code. Recall that they cache is disabled so the way doesn't get reallocated and thus the tags are never marked dirty and written back. Since the tags are never marked dirty the wbind won't wb but it will invd.
[[[meaning: stack cache lines will be invalidated but NOT written back ...]]]
And that means the void * parameter to the running functions will be invalidated as well because it is passed on the stack. BOOM. You might be lucky enough that gcc generates code which loads the parameter into a register before the wbinvd and does not try to reload that parameter from your now-defunct old stack, but depending on luck is not an option here. If we kill our stack by making the wbinvd essentially an invd (because tags are not dirty), no function arguments nor any local variables will be available directly after the wbinvd. The only way out is to directly call a function (not by pointer) whose address is known during link time. Something like the call to copy_and_run() from v2 is needed here as well.
There are a couple ways to address this.
- copy the stack to a new location.
- Set the tags dirty with by writing the way MSRs.
[[[ This is how you set the tags ]]]
That's the way I'd prefer to handle things.
- enable the cache and copy the stack back on it's self to dirty the tags.
[[[This might work too to set the tags BUT -- the stack really has to be running at the right location. We would have to change this.]]]
I am least sure about number three.
I think that sums it up. I still think that it is best/fastest if the code returned to the CAR function and it setup a new stack in memory and then launched the next stage.
[[[ Which is pretty much what I am proposing]]]
I hope this additional discussion is helpful.
Indeed, thanks!
Now, getting back to where we were: I am still arguing for what Marc argues in the last paragraph, stating "I still think that it is best/fastest if the code returned to the CAR function and it setup a new stack in memory and then launched the next stage.".
As long as I can salvage the old stack contents in some way, that would be acceptable for me as well.
Carl-Daniel, I believe you are hoping we can do the return from disable_car, but as Marc points out in options 1-3 above, it's really not simple. This is not a "bug" per se. Fixing it will be complex. It is possible that you will need different solutions for each CPU *implementation* in some cases, not just each CPU type -- for example, the Intel CAR had to be changed for AMD, and different types of Opteron may well need different CAR return code.
An explosion of code variants for CAR disabling would seriously compromise maintainability of our code.
Regards, Carl-Daniel