Hello folks,
I am working on Minix, and we detected a problem which might be related to SeaBIOS.
In short, Minix' boot monitor (first written in 1992, and which supports 80286 architectures), while in 16-bit real mode, uses BIOS service 15/87 to move blocks into high memory.
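For context, here is a rough sketch of the descriptor table a caller hands to service 15/87. The layout (six 8-byte descriptors: null, GDT, source, destination, BIOS CS at 0x20, BIOS SS at 0x28) follows the usual PC BIOS documentation; the helper names and the addresses in the usage note are only illustrative assumptions, not taken from the Minix or SeaBIOS sources.

```c
#include <stdint.h>
#include <string.h>

/* Fill one 8-byte segment descriptor (hypothetical helper).
 * 0x93 = present, ring 0, writable data segment. */
void make_desc(uint8_t out[8], uint32_t base, uint16_t limit)
{
    out[0] = limit & 0xff;          /* limit bits 0..7   */
    out[1] = limit >> 8;            /* limit bits 8..15  */
    out[2] = base & 0xff;           /* base bits 0..7    */
    out[3] = (base >> 8) & 0xff;    /* base bits 8..15   */
    out[4] = (base >> 16) & 0xff;   /* base bits 16..23  */
    out[5] = 0x93;                  /* access rights     */
    out[6] = 0x00;                  /* zero on a 286     */
    out[7] = (base >> 24) & 0xff;   /* base bits 24..31 (386 and later) */
}

/* Build the 48-byte table passed in ES:SI (hypothetical helper).
 * Entries 0, 1, 4 and 5 are left zeroed for the BIOS to fill in. */
void build_move_table(uint8_t table[48], uint32_t src, uint32_t dst)
{
    memset(table, 0, 48);
    make_desc(table + 0x10, src, 0xffff);   /* source, loaded into DS */
    make_desc(table + 0x18, dst, 0xffff);   /* destination, loaded into ES */
}
```

Calling build_move_table(table, 0x97cec, 0x100000) would describe a move from a source that is not paragraph-aligned to the start of high memory. Nothing in the interface requires the source base to be a multiple of 16.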
It stopped working recently with kvm on Intel.
On 03/10/2010 12:26 PM, Erik van der Kouwe wrote on kvm mailing list:
I've submitted this bug report a week ago: http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180...
To which Avi Kivity wrote on 2010-03-10 13:03:25 +0200:
MINIX is using big real mode which is currently not well supported by
kvm on Intel hardware:
(qemu) info registers
EIP=0000f4a7 EFL=00023002 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 0000f300
CS =f000 000f0000 0000ffff 0000f300
SS =9492 00094920 0000ffff 0000f300
DS =97ce 00097cec 0000ffff 0000f300
A ds.base of 0x97cec cannot be translated to a real mode segment.
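The arithmetic behind that statement can be sketched as follows. In real mode (and in virtual-8086 mode) a segment base is always the selector shifted left by four bits, so only multiples of 16 are representable. The function names are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base address a real-mode segment load produces for a selector. */
uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* True if the cached base is what a real-mode load would have produced. */
bool base_matches_selector(uint16_t selector, uint32_t base)
{
    return base == real_mode_base(selector);
}

/* True if some real-mode selector could produce this base at all. */
bool base_is_paragraph_aligned(uint32_t base)
{
    return (base & 0xf) == 0;
}
```

With the values from the dump, SS (selector 0x9492, base 0x94920) is consistent, while DS (selector 0x97ce, base 0x97cec) is not: 0x97ce << 4 is 0x97ce0, and 0x97cec is not a multiple of 16.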
I searched the issue; if you are interested in the details, please read http://groups.google.com/group/minix3/msg/40f44df0c434cfa6
I notice it might be related to the switch from Bochs to SeaBIOS.
http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c has:

3555     case 0x87:
...
3640       mov eax, cr0
3641       or al, #0x01
3642       mov cr0, eax
3643       ;; far jump to flush CPU queue after transition to prot. mode
3644       JMP_AP(0x0020, protected_mode)
3645
3646 protected_mode:
3647       ;; GDT points to valid descriptor table, now load SS, DS, ES
...
3657       rep
3658         movsw ;; move CX words from DS:SI to ES:DI
3659
3660       ;; make sure DS and ES limits are 64KB
3661       mov ax, #0x28
3662       mov ds, ax
3663       mov es, ax
3664
3665       ;; reset PG bit in CR0 ???
3666       mov eax, cr0
3667       and al, #0xFE
3668       mov cr0, eax
In SeaBIOS, the applicable code is in src/system.c, and looks like (now this is AT&T assembly):

 83 static void
 84 handle_1587(struct bregs *regs)
....
127         // Enable protected mode
128         " movl %%cr0, %%eax\n"
129         " orl $" __stringify(CR0_PE) ", %%eax\n"
130         " movl %%eax, %%cr0\n"
131
132         // far jump to flush CPU queue after transition to prot. mode
133         " ljmpw $(4<<3), $1f\n"
134
135         // GDT points to valid descriptor table, now load DS, ES
...
144         " rep movsw\n"
145
146         // Disable protected mode
147         " movl %%cr0, %%eax\n"
148         " andl $~" __stringify(CR0_PE) ", %%eax\n"
149         " movl %%eax, %%cr0\n"
Note that while the basic scheme is the same, the "cleaning up" of lines 3660-3663 "make sure DS and ES limits are 64KB" is not present.
IIUC, the virtualized CPU goes back to real mode with those segments set as they were in protected mode, and yes, with the Minix boot monitor they happened to NOT be paragraph-aligned. And aligning them kills the bug...
Avi Kivity seems to think this is a possible explanation; he does not see any harm in adding that cleaning up, and it could have the beneficial property of avoiding possible faults while virtualizing the CPU (which is a bit tricky with Intel hardware in real mode).
Is it possible to add such "cleaning up" to SeaBIOS too?
Antoine
On Mon, Mar 15, 2010 at 04:28:02PM +0100, Antoine Leca wrote:
http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c has:
[...]
3660       ;; make sure DS and ES limits are 64KB
3661       mov ax, #0x28
3662       mov ds, ax
3663       mov es, ax
[...]
In SeaBIOS, the applicable code is in src/system.c, and looks like
[...]
Note that while the basic scheme is the same, the "cleaning up" of lines 3660-3663 "make sure DS and ES limits are 64KB" is not present.
That does appear to be a SeaBIOS error. I'll commit a fix (see below).
[...]
(qemu) info registers
EIP=0000f4a7 EFL=00023002 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 0000f300
CS =f000 000f0000 0000ffff 0000f300
SS =9492 00094920 0000ffff 0000f300
DS =97ce 00097cec 0000ffff 0000f300
A ds.base of 0x97cec cannot be translated to a real mode segment.
However, it's not clear why it would make a difference. The segment limit is shown as 0xffff here - it's the segment base which is not aligned. On return to real mode, the segment base should have been reloaded..
-Kevin
--- a/src/system.c
+++ b/src/system.c
@@ -143,6 +143,11 @@ handle_1587(struct bregs *regs)
         " xorw %%di, %%di\n"
         " rep movsw\n"
 
+        // Restore DS and ES segment limits to 0xffff
+        " movw $(5<<3), %%ax\n" // 5th descriptor in table (SS)
+        " movw %%ax, %%ds\n"
+        " movw %%ax, %%es\n"
+
         // Disable protected mode
         " movl %%cr0, %%eax\n"
         " andl $~" __stringify(CR0_PE) ", %%eax\n"
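The selector written as 5<<3 in the patch is the same value as the 0x28 loaded by the Bochs BIOS, and 4<<3 is the 0x0020 used for the far jump. A small sketch of how a protected-mode selector splits into its fields makes this easy to check; the function names are only illustrative.

```c
#include <stdint.h>

/* Index of the descriptor within the table (bits 3..15). */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* Table indicator (bit 2): 0 = GDT, 1 = LDT. */
unsigned selector_ti(uint16_t sel)
{
    return (sel >> 2) & 1;
}

/* Requested privilege level (bits 0..1). */
unsigned selector_rpl(uint16_t sel)
{
    return sel & 3;
}
```

So 0x28 decodes to index 5 in the GDT with RPL 0, which is the BIOS-provided SS descriptor in the 15/87 table, and 0x20 decodes to index 4, the BIOS CS descriptor.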
On Mon, Mar 15, 2010 at 07:37:56PM -0400, Kevin O'Connor wrote:
On Mon, Mar 15, 2010 at 04:28:02PM +0100, Antoine Leca wrote:
http://bochs.sourceforge.net/cgi-bin/lxr/source/bios/rombios.c has:
[...]
3660       ;; make sure DS and ES limits are 64KB
3661       mov ax, #0x28
3662       mov ds, ax
3663       mov es, ax
[...]
In SeaBIOS, the applicable code is in src/system.c, and looks like
[...]
Note that while the basic scheme is the same, the "cleaning up" of lines 3660-3663 "make sure DS and ES limits are 64KB" is not present.
That does appear to be a SeaBIOS error. I'll commit a fix (see below).
[...]
(qemu) info registers
EIP=0000f4a7 EFL=00023002 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 0000f300
CS =f000 000f0000 0000ffff 0000f300
SS =9492 00094920 0000ffff 0000f300
DS =97ce 00097cec 0000ffff 0000f300
A ds.base of 0x97cec cannot be translated to a real mode segment.
However, it's not clear why it would make a difference. The segment limit is shown as 0xffff here - it's the segment base which is not aligned. On return to real mode, the segment base should have been reloaded..
What part of Intel SDM says so?
-Kevin
--- a/src/system.c
+++ b/src/system.c
@@ -143,6 +143,11 @@ handle_1587(struct bregs *regs)
         " xorw %%di, %%di\n"
         " rep movsw\n"
 
+        // Restore DS and ES segment limits to 0xffff
+        " movw $(5<<3), %%ax\n" // 5th descriptor in table (SS)
+        " movw %%ax, %%ds\n"
+        " movw %%ax, %%es\n"
+
         // Disable protected mode
         " movl %%cr0, %%eax\n"
         " andl $~" __stringify(CR0_PE) ", %%eax\n"
SeaBIOS mailing list SeaBIOS@seabios.org http://www.seabios.org/mailman/listinfo/seabios
-- Gleb.
On Tue, Mar 16, 2010 at 08:34:33AM +0200, Gleb Natapov wrote:
On Mon, Mar 15, 2010 at 07:37:56PM -0400, Kevin O'Connor wrote:
However, it's not clear why it would make a difference. The segment limit is shown as 0xffff here - it's the segment base which is not aligned. On return to real mode, the segment base should have been reloaded..
What part of Intel SDM says so?
SeaBIOS had an explicit segment load of DS and ES in real mode. The segment loads in real mode should have loaded new segment bases. I haven't had a chance to find a reference in the SDM - do you think this is not so?
In any case, it's a SeaBIOS bug because the segment limits and flags would be off - the fix is commit c35e1e50. If this also makes things work from KVM, that's all the better.
-Kevin
On Thu, Mar 18, 2010 at 11:07:44AM -0400, Kevin O'Connor wrote:
On Tue, Mar 16, 2010 at 08:34:33AM +0200, Gleb Natapov wrote:
On Mon, Mar 15, 2010 at 07:37:56PM -0400, Kevin O'Connor wrote:
However, it's not clear why it would make a difference. The segment limit is shown as 0xffff here - it's the segment base which is not aligned. On return to real mode, the segment base should have been reloaded..
What part of Intel SDM says so?
SeaBIOS had an explicit segment load of DS and ES in real mode. The segment loads in real mode should have loaded new segment bases. I haven't had a chance to find a reference in the SDM - do you think this is not so?
An explicit segment load in real mode should reset base and limit to correct values, but you wrote "On return to real mode, the segment base should have been reloaded" and this is not the case AFAIK. Maybe I misunderstood what you meant.
In any case, it's a SeaBIOS bug because the segment limits and flags would be off - the fix is commit c35e1e50. If this also makes things work from KVM, that's all the better.
Thanks!
-- Gleb.
Kevin O'Connor wrote:
On Mon, Mar 15, 2010 at 04:28:02PM +0100, Antoine Leca wrote:
3660       ;; make sure DS and ES limits are 64KB
3661       mov ax, #0x28
3662       mov ds, ax
3663       mov es, ax
[...]
Note that while the basic scheme is the same, the "cleaning up" of lines 3660-3663 "make sure DS and ES limits are 64KB" is not present.
That does appear to be a SeaBIOS error. I'll commit a fix
Thanks.
(qemu) info registers
SS =9492 00094920 0000ffff 0000f300
DS =97ce 00097cec 0000ffff 0000f300
A ds.base of 0x97cec cannot be translated to a real mode segment.
However, it's not clear why it would make a difference.
Because when you return to real mode, if you do not clean up DS (and ES) beforehand, you end up with the previous (cached) bases and limits, the ones used for the move. The problem occurs with KVM (hardware virtualization) on Intel VT, because on that platform real mode is not really possible, so it is faked... and they have no good way to fake a misaligned segment (a not-present segment is not available in V8086 mode, for example).
The segment limit is shown as 0xffff here - it's the segment base which is not aligned.
You are correct: the comment in the Bochs BIOS is slightly misleading here. My guess is that they had a similar problem with big segments (so-called unreal mode) which caused some overflow or something, so they implemented that "fix", which also cures our problem with the misaligned base (and appears to be a Good Move, IMHO).
On return to real mode, the segment base should have been reloaded..
No: it is well known (and used here and there) that on return from PM to RM, the segment caches are not cleared; and I believe it is even documented in Intel manuals that it is up to the application programmer to reload the segments.
Now, what we are doing here is reloading _before_ the mode change, because kvm is unable to "reload" the CPU segment caches with the adequate values.
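To make the argument concrete, here is a toy model of the hidden segment cache, written as a sketch under simplifying assumptions: only DS is modelled, a real-mode load changes the base but not the limit, and clearing CR0.PE touches nothing in the cache. The GDT contents reuse the bases from the register dump purely as an illustration; every name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hidden ("cached") part of a segment register. */
struct seg_cache {
    uint16_t selector;
    uint32_t base;
    uint32_t limit;
};

/* Minimal descriptor: only the fields this model needs. */
struct descriptor {
    uint32_t base;
    uint32_t limit;
};

struct cpu {
    bool pe;                /* CR0.PE */
    struct seg_cache ds;
};

/* Segment load: the cache is only ever refreshed here. */
void load_ds(struct cpu *c, uint16_t sel, const struct descriptor *gdt)
{
    c->ds.selector = sel;
    if (c->pe) {
        c->ds.base = gdt[sel >> 3].base;
        c->ds.limit = gdt[sel >> 3].limit;
    } else {
        c->ds.base = (uint32_t)sel << 4;    /* limit is left alone */
    }
}

/* Mode switch: flips the bit and leaves the cache as it was. */
void set_pe(struct cpu *c, bool on)
{
    c->pe = on;
}

/* Run the block-move sequence and report DS.base on return to real mode. */
uint32_t ds_base_after_block_move(bool reload_before_leaving_pm)
{
    struct descriptor gdt[6] = {{0, 0}};
    gdt[2].base = 0x97cec; gdt[2].limit = 0xffff;   /* source (illustrative) */
    gdt[5].base = 0x94920; gdt[5].limit = 0xffff;   /* BIOS SS (illustrative) */

    struct cpu c = { .pe = false, .ds = { 0, 0, 0xffff } };
    set_pe(&c, true);
    load_ds(&c, 2 << 3, gdt);           /* DS = source for rep movsw */
    if (reload_before_leaving_pm)
        load_ds(&c, 5 << 3, gdt);       /* the "cleaning up" step */
    set_pe(&c, false);
    return c.ds.base;
}
```

In this model ds_base_after_block_move(false) leaves DS.base at 0x97cec, which no real-mode selector can describe, while ds_base_after_block_move(true) leaves it at 0x94920, a paragraph-aligned base that a virtual-8086 emulation of real mode can represent.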
Antoine
On 03/16/2010 01:31 PM, Antoine Leca wrote:
(qemu) info registers
SS =9492 00094920 0000ffff 0000f300
DS =97ce 00097cec 0000ffff 0000f300
A ds.base of 0x97cec cannot be translated to a real mode segment.
However, it's not clear why it would make a difference.
Because when you return to real mode, if you do not clean up DS (and ES) beforehand, you end up with the previous (cached) bases and limits, the ones used for the move. The problem occurs with KVM (hardware virtualization) on Intel VT, because on that platform real mode is not really possible, so it is faked... and they have no good way to fake a misaligned segment (a not-present segment is not available in V8086 mode, for example).
Correct. To be clear, I regard this as a kvm bug, however fixing it will take a while, so I would appreciate a workaround in seabios.