First of all, sorry for the very late reply; however, I thought I really ought to offer my perspective on this:
On 11/25/2010 06:24 AM, Kevin O'Connor wrote:
It's difficult to have a uniform view of the stack when transitioning modes, so pass the return address in a register. As a result, the transition functions only access memory via the %cs selector now.
I think this assertion is rather unfortunate, because my own experience with thunking is that it is actually a very useful thing to have access to the real-mode stack.
This is simply accomplished by computing a 32-bit register containing the value (ss << 4) + sp, for example:
    movzwl %sp, %eax
    movl %ss, %ecx
    shrl $4, %ecx
    addl %ecx, %eax
This is particularly handy if there is a push/pop of the 16-bit register set in the entry/exit sequence. Furthermore, pushing the target address onto the stack rather than stuffing it into a register allows a 32-bit routine to have full access to the 16-bit register image, whereas burning a register means that that register is going to have to be handled differently.
In Syslinux I have this formalized so that the sequence:
    pushl $func32
    callw _pm_call
... turns into the C function call:
void func32(com32sys_t *regs)
... where com32sys_t is a structure which contains the 16-bit register image:
typedef struct {
    uint16_t gs;            /* Offset  0 */
    uint16_t fs;            /* Offset  2 */
    uint16_t es;            /* Offset  4 */
    uint16_t ds;            /* Offset  6 */

    reg32_t edi;            /* Offset  8 */
    reg32_t esi;            /* Offset 12 */
    reg32_t ebp;            /* Offset 16 */
    reg32_t _unused_esp;    /* Offset 20 */
    reg32_t ebx;            /* Offset 24 */
    reg32_t edx;            /* Offset 28 */
    reg32_t ecx;            /* Offset 32 */
    reg32_t eax;            /* Offset 36 */

    reg32_t eflags;         /* Offset 40 */
} com32sys_t;
This is simply the image created on the stack by the sequence (in NASM syntax):
_pm_call:
    pushfd
    pushad
    push ds
    push es
    push fs
    push gs
This has been shown to be amazingly versatile, especially since the 16-bit register image can be not just observed but written directly.
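As a rough illustration of the layout above, the following C sketch checks the stated offsets at compile time and shows a handler writing into the register image. The definition of reg32_t as a union of dword/word/byte views is an assumption (it is not given in this message), and example_func32 is a hypothetical handler, not part of Syslinux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: reg32_t is a 32-bit register viewable as dword, words or bytes. */
typedef union {
    uint32_t l;
    uint16_t w[2];
    uint8_t  b[4];
} reg32_t;

/* Same field order as the image built by pushfd/pushad/push ds,es,fs,gs. */
typedef struct {
    uint16_t gs, fs, es, ds;
    reg32_t edi, esi, ebp, _unused_esp, ebx, edx, ecx, eax;
    reg32_t eflags;
} com32sys_t;

/* The offsets quoted in the struct comments, verified by the compiler. */
static_assert(offsetof(com32sys_t, ds) == 6, "ds at offset 6");
static_assert(offsetof(com32sys_t, edi) == 8, "edi at offset 8");
static_assert(offsetof(com32sys_t, eax) == 36, "eax at offset 36");
static_assert(offsetof(com32sys_t, eflags) == 40, "eflags at offset 40");
static_assert(sizeof(com32sys_t) == 44, "whole image is 44 bytes");

/* Hypothetical 32-bit handler: returns AX = BX + CX and clears CF in the
 * saved flags. The 16-bit caller sees both changes once the image is popped.
 * w[0] is the low word on a little-endian machine such as x86. */
void example_func32(com32sys_t *regs)
{
    regs->eax.w[0] = (uint16_t)(regs->ebx.w[0] + regs->ecx.w[0]);
    regs->eflags.l &= ~UINT32_C(1);
}
```

Because the handler only receives a pointer to the image, returning a value to the 16-bit caller is just a store into the structure.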
One can implement this either with or without a stack switch (to do so without a stack switch, the protected-mode ESP is computed from SS:SP). However, since real-mode stacks tend to be very small -- often only a few hundred bytes -- running the 32-bit code without a stack switch is probably a bad idea.
In Syslinux this is actually implemented in the form of a lower-level function which does indeed take an address in a register, so the two approaches are not mutually exclusive. The actual full implementation of the _pm_call routine looks like this (note: this code assumes CS = 0):
;
; _pm_call: call PM routine in low memory from RM
;
;       on stack        = PM routine to call (a 32-bit address)
;
;       ECX, ESI, EDI passed to the called function;
;       EAX = EBP in the called function points to the stack frame
;       which includes all registers (which can be changed if desired.)
;
;       All registers and the flags saved/restored
;
;       This routine is invoked by the pm_call macro.
;
_pm_call:
        pushfd
        pushad
        push ds
        push es
        push fs
        push gs
        mov bp,sp
        mov ax,cs
        mov ebx,.pm
        mov ds,ax
        jmp enter_pm

        bits 32
        section .textnr
.pm:
        ; EAX points to the top of the RM stack, which is EFLAGS
        test RM_FLAGSH,02h              ; RM EFLAGS.IF
        jz .no_sti
        sti
.no_sti:
        call [ebp+4*2+9*4+2]            ; Entrypoint on RM stack
        mov bx,.rm
        jmp enter_rm

        bits 16
        section .text16
.rm:
        pop gs
        pop fs
        pop es
        pop ds
        popad
        popfd
        ret 4                           ; Drop entrypoint
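The displacement in "call [ebp+4*2+9*4+2]" can be read as a sum of what sits between the frame pointer and the pushed entry point. The C sketch below spells that arithmetic out; the enum names and the frame_entrypoint helper are made up for illustration and do not appear in Syslinux.

```c
#include <stdint.h>

enum {
    SEG_REGS_BYTES    = 4 * 2, /* gs, fs, es, ds: four 16-bit pushes      */
    GP_FLAGS_BYTES    = 9 * 4, /* pushad (8 dwords) plus pushfd (1 dword)  */
    RET_ADDR_BYTES    = 2,     /* near return address left by callw        */
    ENTRYPOINT_OFFSET = SEG_REGS_BYTES + GP_FLAGS_BYTES + RET_ADDR_BYTES
};

/* Read the 32-bit entry point stored just above the saved frame.
 * `frame` points at the lowest byte of the image (the saved GS).
 * The bytes are assembled little-endian, as x86 stores them. */
static uint32_t frame_entrypoint(const uint8_t *frame)
{
    const uint8_t *p = frame + ENTRYPOINT_OFFSET;
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

The same accounting explains the final "ret 4": once the register image has been popped, the 2-byte return address is consumed by the return itself and the extra 4 bytes discarded are the entry point.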
The entire file including the enter_pm/enter_rm functions can be seen at:
-hpa
On 12/07/2010 05:14 PM, H. Peter Anvin wrote:
    movzwl %sp, %eax
    movl %ss, %ecx
    shrl $4, %ecx
    addl %ecx, %eax
The third instruction should of course be shll, not shrl.
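For reference, the corrected computation, (ss << 4) + sp, can be sketched in C as follows. The function name rm_stack_linear is hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Flat (linear) address of a real-mode SS:SP pair.
 * The segment is shifted LEFT by 4 (multiplied by 16) and the 16-bit
 * offset is added; the result can exceed 20 bits when SS is near 0xffff. */
static uint32_t rm_stack_linear(uint16_t ss, uint16_t sp)
{
    return ((uint32_t)ss << 4) + sp;
}
```

For example, SS = 0x9000 and SP = 0xfff0 give the flat address 0x9fff0.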
-hpa
On Tue, Dec 07, 2010 at 05:14:11PM -0800, H. Peter Anvin wrote:
First of all, sorry for the very late reply; however, I thought I really ought to offer my perspective on this:
On 11/25/2010 06:24 AM, Kevin O'Connor wrote:
It's difficult to have a uniform view of the stack when transitioning modes, so pass the return address in a register. As a result, the transition functions only access memory via the %cs selector now.
I think this assertion is rather unfortunate, because my own experience with thunking is that it is actually a very useful thing to have access to the real-mode stack.
This is simply accomplished by computing a 32-bit register containing the value (ss << 4) + sp, for example:
    movzwl %sp, %eax
    movl %ss, %ecx
    shrl $4, %ecx
    addl %ecx, %eax
This is basically what the stacks.c:call32() code does.
This is particularly handy if there is a push/pop of the 16-bit register set in the entry/exit sequence. Furthermore, pushing the target address onto the stack rather than stuffing it into a register allows a 32-bit routine to have full access to the 16-bit register image, whereas burning a register means that that register is going to have to be handled differently.
In Syslinux I have this formalized so that the sequence:
    pushl $func32
    callw _pm_call
This is similar to what SeaBIOS used to do - it had: "pushl $func32; jmp transition32" and "pushl $func16; jmp transition16".
The problem with this is that I can't use "popl" to get the destination address in transition16 because a popl in 16bit mode only looks at %sp and not %esp. So, if %esp==0x90000 and I do "pushl $func16; transition16", then when transition16 does a "retl" (or "popl") then it ends up pulling the address at 0x0000 instead of 0x90000.
You might ask why transition16 doesn't restore %ss/%sp - but that's what the caller (stacks.c:call32) does. So, the real mode stack is available, it's just not the task of transition16.
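The truncation behind the popl problem can be illustrated with a small C sketch. It assumes a stack segment with a 16-bit stack-address size, so only the low 16 bits of %esp take part in the access; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Linear address touched by a push/pop.
 * With a 32-bit stack the full ESP is used as the offset; with a 16-bit
 * stack only SP (the low 16 bits of ESP) is used, so the upper bits of
 * ESP are silently ignored. */
static uint32_t stack_access_addr(uint32_t ss_base, uint32_t esp,
                                  int stack_is_32bit)
{
    uint32_t offset = stack_is_32bit ? esp : (esp & 0xffffu);
    return ss_base + offset;
}
```

With a zero segment base and %esp == 0x90000, the 32-bit access lands at 0x90000 while the 16-bit access lands at 0x0000, which is the mismatch described above.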
[...]
typedef struct {
    uint16_t gs;            /* Offset  0 */
    uint16_t fs;            /* Offset  2 */
[...]
reg32_t eflags; /* Offset 40 */
} com32sys_t;
That's basically the same thing as SeaBIOS's "struct bregs" in bregs.h. The 16bit entry points back up the register state, pass it into the C code (which can then modify the registers if needed), and then restore the register state on return.
[...]
_pm_call:
[...]
        mov ebx,.pm
        mov ds,ax
        jmp enter_pm
This looks very similar to what SeaBIOS has now - _pm_call is the equivalent of the inline asm in call32() and enter_pm is the equivalent of transition32. See:
http://git.linuxtogo.org/?p=kevin/seabios.git;a=blob;f=src/stacks.c;h=c77ba1...
and:
http://git.linuxtogo.org/?p=kevin/seabios.git;a=blob;f=src/romlayout.S;h=d83...
-Kevin
On 12/07/2010 06:46 PM, Kevin O'Connor wrote:
In Syslinux I have this formalized so that the sequence:
    pushl $func32
    callw _pm_call
This is similar to what SeaBIOS used to do - it had: "pushl $func32; jmp transition32" and "pushl $func16; jmp transition16".
The problem with this is that I can't use "popl" to get the destination address in transition16 because a popl in 16bit mode only looks at %sp and not %esp. So, if %esp==0x90000 and I do "pushl $func16; transition16", then when transition16 does a "retl" (or "popl") then it ends up pulling the address at 0x0000 instead of 0x90000.
Right, the code needs to compute the 32-bit flat version and look at it. You don't want to use popl at all. In my code I just use the stored reference on the stack as the target address of the call once we're well within the 32-bit code; at the very end it is dropped by a simple "ret 4".
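A toy simulation of that approach, written in C purely as a sketch: the entry point is pushed, the register image is stacked on top of it, and the 32-bit side finds the target by reading through the flat address instead of popping. The 64 KiB mem array, the rm_stack structure and every function name here are assumptions made for the illustration; none of this is Syslinux or SeaBIOS code.

```c
#include <stdint.h>

#define MEM_SIZE 0x10000u
static uint8_t mem[MEM_SIZE];          /* toy stand-in for low memory */

struct rm_stack { uint16_t ss, sp; };  /* real-mode SS:SP */

/* Flat address of the current top of stack: (ss << 4) + sp. */
static uint32_t flat(const struct rm_stack *s)
{
    return ((uint32_t)s->ss << 4) + s->sp;
}

/* Push `size` bytes of `value`, little-endian, the way a 16-bit stack
 * would. The caller must keep flat(s) inside the toy memory. */
static void push_bytes(struct rm_stack *s, uint32_t value, unsigned size)
{
    s->sp = (uint16_t)(s->sp - size);
    for (unsigned i = 0; i < size; i++)
        mem[flat(s) + i] = (uint8_t)(value >> (8 * i));
}

static uint32_t read32(uint32_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Mimic "pushl $func32; callw _pm_call" followed by the register saves,
 * then return the address the 32-bit side would call. */
static uint32_t simulate_pm_call(struct rm_stack *s, uint32_t func32,
                                 uint16_t ret_ip)
{
    push_bytes(s, func32, 4);          /* pushl $func32           */
    push_bytes(s, ret_ip, 2);          /* callw: return address   */
    push_bytes(s, 0x202, 4);           /* pushfd (placeholder)    */
    for (int i = 0; i < 8; i++)
        push_bytes(s, 0, 4);           /* pushad                  */
    for (int i = 0; i < 4; i++)
        push_bytes(s, 0, 2);           /* push ds, es, fs, gs     */
    /* No pop: index the frame through its flat address. */
    return read32(flat(s) + 4 * 2 + 9 * 4 + 2);
}
```

Starting from, say, SS = 0x0800 and SP = 0x1000, the function returns the pushed entry point and leaves SP lowered by 50 bytes: the 44-byte register image, the 2-byte return address and the 4-byte entry point.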
Anyway, sounds like there might be a fuller version and I'm only seeing part of it.
-hpa