It seems that romcc with -mcpu=p3 -O produces incorrect code for cpu/x86/16bit/reset16.inc. -mcpu=p2 -O does work fine.
I will try to provide more hard data later, but it looks like the following code is compiled into a (slightly) incorrect jump, which of course breaks things terribly:

        .section ".reset"
        .code16
    .globl reset_vector
    reset_vector:
        .byte 0xe9
        .int _start - ( . + 2 )

        . = 0x8;
        .code32
Jump seems to be off by -4 bytes.
on 30/10/2008 11:16 Andriy Gapon said the following:
> It seems that romcc with -mcpu=p3 -O produces incorrect code for cpu/x86/16bit/reset16.inc. -mcpu=p2 -O does work fine.
> I will try to provide more hard data later but it looks like the following code is compiled into a (slightly) incorrect jump which breaks things terribly, of course.
>
>         .section ".reset"
>         .code16
>     .globl reset_vector
>     reset_vector:
>         .byte 0xe9
>         .int _start - ( . + 2 )
>
>         . = 0x8;
>         .code32
>
> Jump seems to be off by -4 bytes.
My original analysis was quite incorrect. I have now done a more thorough analysis, and here are the main differences. There are indeed some changes in offsets, but the jumps seem to be correct:

--- fallback-p2/coreboot.map 2008-10-30 23:29:14.000000000 +0200
+++ fallback-p3/coreboot.map 2008-10-30 23:39:44.000000000 +0200
@@ -112,10 +112,10 @@
 00004000 A HEAP_SIZE
 00004000 A _RAMBASE
 00004000 A _iseg
-0000b889 A _binary_coreboot_ram_rom_size
-0000b890 A _start_offset
-0000b8c8 A gdtptr16_offset
-0000f889 A _eiseg
+0000b887 A _binary_coreboot_ram_rom_size
+0000b88c A _start_offset
+0000b8c4 A gdtptr16_offset
+0000f887 A _eiseg
 00010000 A ROM_IMAGE_SIZE
 00010000 A XIP_ROM_SIZE
 0001c200 A TTYS0_BAUD

I wonder where these small 2-4 byte shifts came from, but this is not important.
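To make the shifts explicit, here is a small sketch in C that just subtracts the addresses from the two map excerpts above. The struct and function names (map_sym, syms, sym_shift) are made up for illustration and are not part of coreboot.

```c
#include <stdint.h>

/* Symbol addresses copied from the two coreboot.map excerpts above.
 * The type and variable names here are hypothetical. */
struct map_sym {
    const char *name;
    uint32_t p2_addr;   /* address in the -mcpu=p2 build */
    uint32_t p3_addr;   /* address in the -mcpu=p3 build */
};

const struct map_sym syms[] = {
    { "_binary_coreboot_ram_rom_size", 0x0000b889, 0x0000b887 },
    { "_start_offset",                 0x0000b890, 0x0000b88c },
    { "gdtptr16_offset",               0x0000b8c8, 0x0000b8c4 },
    { "_eiseg",                        0x0000f889, 0x0000f887 },
};

/* Signed shift of one symbol when going from the p2 build to the p3 build. */
int32_t sym_shift(const struct map_sym *s)
{
    return (int32_t)(s->p3_addr - s->p2_addr);
}
```

The shifts come out as -2, -4, -4 and -2 bytes. The -4 on _start_offset is probably what I first misread as a jump that was "off by -4 bytes": the target itself moved, and the jump followed it.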
What is much more important is that all mmN registers were replaced with xmmN registers.
So here are qemu execution logs for comparison (best done side-by-side).

With cpu=p2:
----------------
IN:
0xfffffff0:  jmp    0xb890

----------------
IN:
0xffffb890:  cli
0xffffb891:  mov    %eax,%ebp
0xffffb894:  xor    %eax,%eax
0xffffb897:  mov    %eax,%cr3

----------------
IN:
0xffffb89a:  mov    %cs,%ax
0xffffb89c:  shl    $0x4,%ax
0xffffb89f:  mov    $0xb8c8,%bx
0xffffb8a2:  sub    %ax,%bx
0xffffb8a4:  lgdtl  %cs:(%bx)
0xffffb8a9:  mov    %cr0,%eax
0xffffb8ac:  and    $0x7ffaffd1,%eax
0xffffb8b2:  or     $0x60000001,%eax
0xffffb8b8:  mov    %eax,%cr0

----------------
IN:
0xffffb8bb:  mov    %ebp,%eax
0xffffb8be:  ljmpl  $0x8,$0xffffb8f7

----------------
IN:
0xffffb8f7:  mov    %eax,%ebp
0xffffb8f9:  mov    $0x10,%al
0xffffb8fb:  out    %al,$0x80
0xffffb8fd:  mov    $0x10,%ax
0xffffb901:  mov    %eax,%ds

outb: 0080 10
dma: extra page register write addr=0x80 data=0x10
ioport80_write 0X10
----------------
IN:
0xffffb903:  mov    %eax,%es

----------------
IN:
0xffffb905:  mov    %eax,%ss

----------------
IN:
0xffffb907:  mov    %eax,%fs

----------------
IN:
0xffffb909:  mov    %eax,%gs
0xffffb90b:  mov    %ebp,%eax
0xffffb90d:  jmp    0xffffb919

----------------
IN:
0xffffb919:  movd   %eax,%mm0
0xffffb91c:  mov    $0xd,%al
. . .
With cpu=p3:
----------------
IN:
0xfffffff0:  jmp    0xb88c

----------------
IN:
0xffffb88c:  cli
0xffffb88d:  mov    %eax,%ebp
0xffffb890:  xor    %eax,%eax
0xffffb893:  mov    %eax,%cr3

----------------
IN:
0xffffb896:  mov    %cs,%ax
0xffffb898:  shl    $0x4,%ax
0xffffb89b:  mov    $0xb8c4,%bx
0xffffb89e:  sub    %ax,%bx
0xffffb8a0:  lgdtl  %cs:(%bx)
0xffffb8a5:  mov    %cr0,%eax
0xffffb8a8:  and    $0x7ffaffd1,%eax
0xffffb8ae:  or     $0x60000001,%eax
0xffffb8b4:  mov    %eax,%cr0

----------------
IN:
0xffffb8b7:  mov    %ebp,%eax
0xffffb8ba:  ljmpl  $0x8,$0xffffb8f3

----------------
IN:
0xffffb8f3:  mov    %eax,%ebp
0xffffb8f5:  mov    $0x10,%al
0xffffb8f7:  out    %al,$0x80
0xffffb8f9:  mov    $0x10,%ax
0xffffb8fd:  mov    %eax,%ds

outb: 0080 10
dma: extra page register write addr=0x80 data=0x10
ioport80_write 0X10
----------------
IN:
0xffffb8ff:  mov    %eax,%es

----------------
IN:
0xffffb901:  mov    %eax,%ss

----------------
IN:
0xffffb903:  mov    %eax,%fs

----------------
IN:
0xffffb905:  mov    %eax,%gs
0xffffb907:  mov    %ebp,%eax
0xffffb909:  jmp    0xffffb915

----------------
IN:
0xffffb915:  movd   %eax,%xmm0

qemu: fatal: triple fault
EAX=00000000 EBX=0000b8c4 ECX=00000000 EDX=00000673
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=ffffb915 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300
CS =0008 00000000 ffffffff 00cf9b00
SS =0010 00000000 ffffffff 00cf9300
DS =0010 00000000 ffffffff 00cf9300
FS =0010 00000000 ffffffff 00cf9300
GS =0010 00000000 ffffffff 00cf9300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     ffffb8cc 00000017
IDT=     00000000 0000ffff
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
CCS=00000000 CCD=60000011 CCO=LOGICL
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 FPR1=0000000000000000
FPR2=0000000000000000 FPR3=0000000000000000
FPR4=0000000000000000 FPR5=0000000000000000
FPR6=0000000000000000 FPR7=0000000000000000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
So the logs are practically identical apart from the small changes in addresses. The only real difference is that in the p3 case there is a movd to an SSE register (%xmm0) instead of an MMX register (%mm0), and that instruction results in a triple fault.
I wonder if it was my mistake to specify p3 for the romcc compilation of fallback.c. I searched through the code and it seems that p3 is only used for normal images; fallback images typically have no cpu option, with p2 in one or two cases.
Maybe access to SSE registers requires some prior initialization? I would be very grateful for my education :)
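As a tentative answer to my own question: as far as I understand the Intel manuals, SSE instructions raise #UD unless CR0.EM is clear and CR4.OSFXSR is set, while MMX instructions only need CR0.EM clear (and CR0.TS clear to avoid #NM). The sketch below checks those bits against the CR0/CR4 values from the register dump above. The function names are hypothetical, and the check assumes the CPU actually reports MMX/SSE support via CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM      (1u << 2)   /* emulation: must be 0 for MMX and SSE */
#define CR0_TS      (1u << 3)   /* task switched: if 1, raises #NM */
#define CR4_OSFXSR  (1u << 9)   /* must be 1 before any SSE instruction */

/* True if an MMX instruction such as "movd %eax,%mm0" should not fault
 * (assumes the CPU supports MMX). */
bool mmx_insn_ok(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) == 0;
}

/* True if an SSE instruction such as "movd %eax,%xmm0" should not fault
 * (assumes the CPU supports the instruction). */
bool sse_insn_ok(uint32_t cr0, uint32_t cr4)
{
    return (cr0 & (CR0_EM | CR0_TS)) == 0 && (cr4 & CR4_OSFXSR) != 0;
}
```

With CR0=60000011 and CR4=00000000 from the dump, mmx_insn_ok() is true but sse_insn_ok() is false. If that reading is right, the movd to %xmm0 raises #UD, and with no usable IDT at that point the fault escalates to a triple fault. So either CR4.OSFXSR would have to be set before the first xmm access, or the fallback image should stay with p2.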