With a few additional checks it's possible to emulate all the leal cases without requiring a function call. Although this makes the vgafixup.py code a little more complex it eliminates the need for the "emulate_leal" function and it produces better code. In my tests, almost all "leal" instructions are replaced with 4 (or fewer) instructions.
-Kevin
Kevin O'Connor (2): vgabios: Add config option for assembler fixups vgabios: Emulate "leal" instruction
Makefile | 21 +++++++++--------- scripts/vgafixup.py | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++--- vgasrc/Kconfig | 10 +++++++++ vgasrc/vgaentry.S | 28 +++++------------------ 4 files changed, 87 insertions(+), 36 deletions(-)
Add a kconfig build option (CONFIG_VGA_FIXUP_ASM) to allow users to build the vgabios without the complex assembler fixups that work around emulator bugs.
Signed-off-by: Kevin O'Connor kevin@koconnor.net --- Makefile | 21 +++++++++++---------- vgasrc/Kconfig | 10 ++++++++++ vgasrc/vgaentry.S | 4 ++++ 3 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/Makefile b/Makefile index a84a5f7..a4d945c 100644 --- a/Makefile +++ b/Makefile @@ -56,7 +56,7 @@ COMMONCFLAGS := -I$(OUT) -Isrc -Os -MD -g \ -Wall -Wno-strict-aliasing -Wold-style-definition \ $(call cc-option,$(CC),-Wtype-limits,) \ -m32 -march=i386 -mregparm=3 -mpreferred-stack-boundary=2 \ - -minline-all-stringops \ + -minline-all-stringops -fomit-frame-pointer \ -freg-struct-return -ffreestanding -fno-delete-null-pointer-checks \ -ffunction-sections -fdata-sections -fno-common -fno-merge-constants COMMONCFLAGS += $(call cc-option,$(CC),-nopie,) @@ -64,15 +64,14 @@ COMMONCFLAGS += $(call cc-option,$(CC),-fno-stack-protector,) COMMONCFLAGS += $(call cc-option,$(CC),-fno-stack-protector-all,) COMMA := ,
-CFLAGS32FLAT := $(COMMONCFLAGS) -DMODE16=0 -DMODESEGMENT=0 -fomit-frame-pointer +CFLAGS32FLAT := $(COMMONCFLAGS) -DMODE16=0 -DMODESEGMENT=0 CFLAGSSEG := $(COMMONCFLAGS) -DMODESEGMENT=1 -fno-defer-pop \ $(call cc-option,$(CC),-fno-jump-tables,-DMANUAL_NO_JUMP_TABLE) \ $(call cc-option,$(CC),-fno-tree-switch-conversion,) -CFLAGS32SEG := $(CFLAGSSEG) -DMODE16=0 -fomit-frame-pointer -CFLAGS16INC := $(CFLAGSSEG) -DMODE16=1 \ +CFLAGS32SEG := $(CFLAGSSEG) -DMODE16=0 +CFLAGS16 := $(CFLAGSSEG) -DMODE16=1 \ $(call cc-option,$(CC),-m16,-Wa$(COMMA)src/code16gcc.s) \ $(call cc-option,$(CC),--param large-stack-frame=4,-fno-inline) -CFLAGS16 := $(CFLAGS16INC) -fomit-frame-pointer
# Run with "make V=1" to see the actual compile commands ifdef V @@ -210,23 +209,25 @@ SRCVGA=src/output.c src/string.c src/hw/pci.c src/hw/serialio.c \ vgasrc/clext.c vgasrc/bochsvga.c vgasrc/geodevga.c \ src/fw/coreboot.c vgasrc/cbvga.c
-CFLAGS16VGA = $(CFLAGS16INC) -Isrc - -$(OUT)vgaccode16.raw.s: $(OUT)autoconf.h $(patsubst %.c, $(OUT)%.o,$(SRCVGA)) ; $(call whole-compile, $(CFLAGS16VGA) -S, $(SRCVGA),$@) +ifeq "$(CONFIG_VGA_FIXUP_ASM)" "y" +$(OUT)vgaccode16.raw.s: $(OUT)autoconf.h $(patsubst %.c, $(OUT)%.o,$(SRCVGA)) ; $(call whole-compile, $(filter-out -fomit-frame-pointer,$(CFLAGS16)) -fno-omit-frame-pointer -S -Isrc, $(SRCVGA),$@)
$(OUT)vgaccode16.o: $(OUT)vgaccode16.raw.s scripts/vgafixup.py @echo " Fixup VGA rom assembler" $(Q)$(PYTHON) ./scripts/vgafixup.py $< $(OUT)vgaccode16.s $(Q)$(AS) --32 src/code16gcc.s $(OUT)vgaccode16.s -o $@ +else +$(OUT)vgaccode16.o: $(OUT)autoconf.h $(patsubst %.c, $(OUT)%.o,$(SRCVGA)) ; $(call whole-compile, $(CFLAGS16) -Isrc, $(SRCVGA),$@) +endif
$(OUT)vgaentry.o: vgasrc/vgaentry.S $(OUT)autoconf.h $(OUT)asm-offsets.h @echo " Compiling (16bit) $@" - $(Q)$(CC) $(CFLAGS16VGA) -c -D__ASSEMBLY__ $< -o $@ + $(Q)$(CC) $(CFLAGS16) -c -D__ASSEMBLY__ $< -o $@
$(OUT)vgarom.o: $(OUT)vgaccode16.o $(OUT)vgaentry.o $(OUT)vgasrc/vgalayout.lds scripts/buildversion.sh @echo " Linking $@" $(Q)./scripts/buildversion.sh $(OUT)vgaversion.c VAR16 - $(Q)$(CC) $(CFLAGS16VGA) -c $(OUT)vgaversion.c -o $(OUT)vgaversion.o + $(Q)$(CC) $(CFLAGS16) -c $(OUT)vgaversion.c -o $(OUT)vgaversion.o $(Q)$(LD) --gc-sections -T $(OUT)vgasrc/vgalayout.lds $(OUT)vgaccode16.o $(OUT)vgaentry.o $(OUT)vgaversion.o -o $@
$(OUT)vgabios.bin.raw: $(OUT)vgarom.o diff --git a/vgasrc/Kconfig b/vgasrc/Kconfig index 400e8da..27a24c9 100644 --- a/vgasrc/Kconfig +++ b/vgasrc/Kconfig @@ -90,6 +90,16 @@ menu "VGA ROM" Support emulating text mode features when only a framebuffer is available.
+ config VGA_FIXUP_ASM + bool "Fixup assembler to work with broken emulators" + default y + help + This option will cause the build to attempt to avoid + certain x86 machine instructions that are known to confuse + some emulators. In particular, it works around + deficiencies in the Windows vgabios emulator and the + x86emu vgabios emulator (frequently used in Xorg). + config VGA_ALLOCATE_EXTRA_STACK depends on BUILD_VGABIOS bool "Allocate an internal stack for 16bit interrupt entry point" diff --git a/vgasrc/vgaentry.S b/vgasrc/vgaentry.S index e0ab954..7ca550d 100644 --- a/vgasrc/vgaentry.S +++ b/vgasrc/vgaentry.S @@ -64,6 +64,7 @@ x86emu_fault: // This macro implements a call while avoiding instructions // that old versions of x86emu have problems with. .macro VGA_CALLL cfunc +#if CONFIG_VGA_FIXUP_ASM // Make sure leal instruction works. movl $0x8000, %ecx leal (%ecx, %ecx, 1), %ecx @@ -72,6 +73,9 @@ x86emu_fault: // Use callw instead of calll push %ax callw \cfunc +#else + calll \cfunc +#endif .endm
// This macro is the same as ENTRY_ARG except VGA_CALLL is used.
Kevin O'Connor wrote:
+++ b/vgasrc/vgaentry.S @@ -64,6 +64,7 @@ x86emu_fault: // This macro implements a call while avoiding instructions // that old versions of x86emu have problems with. .macro VGA_CALLL cfunc +#if CONFIG_VGA_FIXUP_ASM // Make sure leal instruction works.
Isn't the logic backwards here?
Very nice work on this patchset! :)
//Peter
On Sat, Apr 11, 2015 at 05:36:53PM +0200, Peter Stuge wrote:
Kevin O'Connor wrote:
+++ b/vgasrc/vgaentry.S @@ -64,6 +64,7 @@ x86emu_fault: // This macro implements a call while avoiding instructions // that old versions of x86emu have problems with. .macro VGA_CALLL cfunc +#if CONFIG_VGA_FIXUP_ASM // Make sure leal instruction works.
Isn't the logic backwards here?
Are you pointing out the trap only being enabled when CONFIG_VGA_FIXUP_ASM is on? The trap is only useful if doing asm fixups as some x86emu versions do support leal, but will silently crash on other instructions (such as calll). In any case, the next patch completely eliminates the trap.
I only expect CONFIG_VGA_FIXUP_ASM to be off for those debugging something.
Very nice work on this patchset! :)
Thanks, -Kevin
Kevin O'Connor wrote:
+++ b/vgasrc/vgaentry.S @@ -64,6 +64,7 @@ x86emu_fault: // This macro implements a call while avoiding instructions // that old versions of x86emu have problems with. .macro VGA_CALLL cfunc +#if CONFIG_VGA_FIXUP_ASM // Make sure leal instruction works.
Isn't the logic backwards here?
Are you pointing out the trap only being enabled when CONFIG_VGA_FIXUP_ASM is on?
Right.
The trap is only useful if doing asm fixups
Isn't that backwards? The trap is most important if *not* doing fixups?
(Because then the generated binary may have known problems. Trapping them in that case would be nice.)
as some x86emu versions do support leal, but will silently crash on other instructions (such as calll).
In case the trap is important also when doing fixups, why not just always enable it?
I only expect CONFIG_VGA_FIXUP_ASM to be off for those debugging something.
If you say so.. But the always-on trap was helpful in this case.
//Peter
Emulate the "leal" instruction so that the vgabios can run on older versions of x86emu. (This removes the previous "leal" trap.)
Signed-off-by: Kevin O'Connor kevin@koconnor.net --- scripts/vgafixup.py | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++--- vgasrc/vgaentry.S | 24 +------------------- 2 files changed, 62 insertions(+), 26 deletions(-)
diff --git a/scripts/vgafixup.py b/scripts/vgafixup.py index a981bbf..1d3aea4 100644 --- a/scripts/vgafixup.py +++ b/scripts/vgafixup.py @@ -7,8 +7,8 @@
# The x86emu code widely used in Linux distributions when running Xorg # in vesamode is known to have issues with "retl", "leavel", "entryl", -# and some variants of "calll". This code modifies those instructions -# (ret and leave) that are known to be generated by gcc to avoid +# "leal", and some variants of "calll". This code modifies those +# instructions that are known to be generated by gcc to avoid # triggering the x86emu bugs.
# It is also known that the Windows vgabios emulator has issues with @@ -16,7 +16,62 @@ # worked around by not using the gcc parameter "-fomit-frame-pointer" # when compiling.
-import sys +import sys, re + +# leal parameter regex - example string: -3(%edx,%eax,8), %eax +re_leal = re.compile( + r'^\s*(?P<offset>[^(]*?)\s*' + r'(\s*(?P<base>[^,)]*?)\s*(?:,\s*(?P<index>[^,)]*?)\s*)?' + r'(?:,\s*(?P<scale>[^,)]*?)\s*)?)\s*' + r',\s*(?P<dest>.*?)\s*$') + +# Find an alternate set of instructions for a given "leal" instruction +def handle_leal(sline): + m = re_leal.match(sline[5:]) + if m is None or m.group('index') == '%esp': + print("Unable to fixup leal instruction: %s" % (sline,)) + sys.exit(-1) + offset, base, index, scale, dest = m.group( + 'offset', 'base', 'index', 'scale', 'dest') + if dest == '%esp': + # If destination is %esp then just use 16bit leaw instead + return 'leaw %s\n' % (sline[5:].replace('%e', '%'),) + if not offset: + offset = '0' + offset = int(offset, 0) + if not scale: + scale = '1' + scale = {1: 0, 2: 1, 4: 2, 8: 3}[int(scale, 0)] + # Try to rearrange arguments to simplify 'base' (to improve code gen) + if not scale and base == index: + base, index, scale = '', index, 1 + elif not index or (not scale and base in (dest, '%esp') and index != dest): + base, index, scale = index, base, 0 + # Produce instructions to calculate "leal" + insns = ['pushfl'] + if base != dest: + # Calculate "leal" directly in dest register + if index != dest: + insns.insert(0, 'movl %s, %s' % (index, dest)) + if scale: + insns.append('shll $%d, %s' % (scale, dest)) + if base: + if base == '%esp': + offset += 4 + insns.append('addl %s, %s' % (base, dest)) + elif base == index: + # Use "imull" method + insns.append('imull $%d, %s' % ((1<<scale)+1, dest)) + else: + # Backup/restore index register and do scaling in index register + insns.append('pushl %s' % (index,)) + insns.append('shll $%d, %s' % (scale, index)) + insns.append('addl %s, %s' % (index, dest)) + insns.append('popl %s' % (index,)) + if offset: + insns.append('addl $%d, %s' % (offset, dest)) + insns.append('popfl\n') + return ' ; '.join(insns)
def main(): infilename, outfilename = sys.argv[1:] @@ -30,6 +85,9 @@ def main(): out.append('movl %ebp, %esp ; popl %ebp\n') elif sline.startswith('call'): out.append('pushw %ax ; callw' + sline[4:] + '\n') + elif sline.startswith('leal'): + out.append(handle_leal(sline)) + #print "-> %s\n %s" % (sline, out[-1].strip()) else: out.append(line) infile.close() diff --git a/vgasrc/vgaentry.S b/vgasrc/vgaentry.S index 7ca550d..53be2b3 100644 --- a/vgasrc/vgaentry.S +++ b/vgasrc/vgaentry.S @@ -45,33 +45,11 @@ _rom_header_signature: * Entry points ****************************************************************/
- // Force a fault if found to be running on broken x86emu versions. - DECLFUNC x86emu_fault -msg: .ascii "SeaVGABIOS: x86emu leal trap!\n" -x86emu_fault: -#if CONFIG_DEBUG_IO - movw %cs:DebugOutputPort, %dx - movw $msg, %si -1: movb %cs:(%si), %al - outb %al, (%dx) - incw %si - cmpw $x86emu_fault, %si - jl 1b -#endif -1: hlt - jmp 1b - // This macro implements a call while avoiding instructions // that old versions of x86emu have problems with. .macro VGA_CALLL cfunc #if CONFIG_VGA_FIXUP_ASM - // Make sure leal instruction works. - movl $0x8000, %ecx - leal (%ecx, %ecx, 1), %ecx - cmpl $0x10000, %ecx - jne x86emu_fault - // Use callw instead of calll - push %ax + pushw %ax callw \cfunc #else calll \cfunc
11.04.2015 18:08, Kevin O'Connor wrote:
With a few additional checks it's possible to emulate all the leal cases without requiring a function call. Although this makes the vgafixup.py code a little more complex it eliminates the need for the "emulate_leal" function and it produces better code. In my tests, almost all "leal" instructions are replaced with 4 (or fewer) instructions.
Maybe it is possible to tell gcc to not produce this instruction in the first place?
Thanks,
/mjt
On Sat, Apr 11, 2015 at 08:28:06PM +0300, Michael Tokarev wrote:
11.04.2015 18:08, Kevin O'Connor wrote:
With a few additional checks it's possible to emulate all the leal cases without requiring a function call. Although this makes the vgafixup.py code a little more complex it eliminates the need for the "emulate_leal" function and it produces better code. In my tests, almost all "leal" instructions are replaced with 4 (or fewer) instructions.
Maybe it is possible to tell gcc to not produce this instruction in the first place?
That would certainly be better. Any idea how to do that?
-Kevin
On 11/04/2015 20:07, Kevin O'Connor wrote:
With a few additional checks it's possible to emulate all the leal cases without requiring a function call. Although this makes the vgafixup.py code a little more complex it eliminates the need for the "emulate_leal" function and it produces better code. In my tests, almost all "leal" instructions are replaced with 4 (or fewer) instructions.
Maybe it is possible to tell gcc to not produce this instruction in the first place?
That would certainly be better. Any idea how to do that?
Not really. IIRC, GCC really likes having a way to do additions that doesn't modify the flags.
When porting to an architectures that does not support that (e.g. some old CISC ones), you have to use a completely different mechanism to describe conditional branches and stores---which is legacy and much less optimizable than the newer mechanism. x86 uses the new one.
Paolo