[coreboot] Disassembly of coreboot binaries

Aaron Durbin adurbin at google.com
Mon Dec 19 19:10:19 CET 2016


On Mon, Dec 19, 2016 at 11:18 AM, Chauhan, Himanshu
<hschauhan at nulltrace.org> wrote:
> On Mon, Dec 19, 2016 at 10:03 PM, Aaron Durbin <adurbin at google.com> wrote:
>> On Mon, Dec 19, 2016 at 9:55 AM, Chauhan, Himanshu
>> <hschauhan at nulltrace.org> wrote:
>>> On Mon, Dec 19, 2016 at 9:09 PM, Aaron Durbin <adurbin at google.com> wrote:
>>>> On Sun, Dec 18, 2016 at 11:04 PM, Chauhan, Himanshu
>>>> <hschauhan at nulltrace.org> wrote:
>>>>> On Mon, Dec 19, 2016 at 12:40 AM, Aaron Durbin <adurbin at google.com> wrote:
>>>>>> On Sun, Dec 18, 2016 at 9:37 AM, Chauhan, Himanshu
>>>>>> <hschauhan at nulltrace.org> wrote:
>>>>>>> Hi Aaron,
>>>>>>>
>>>>>>> I figured out the crash. It wasn't because of a wrong load of the
>>>>>>> ROM image (thanks to the nifty post_code which I could trap on IO).
>>>>>>> I see that the page fault I am getting is in the following code:
>>>>>>> (gdb) list *(((0xfff81e41 - 0xfff80000)-200)+0x2000000)
>>>>>>
>>>>>> I'm curious about the 200 and 16MiB offset being applied.
>>>>>
>>>>> 0x2000000 is the new address where romstage is linked. Earlier
>>>>> (at least in 2014) the linked address used to be 0xfff80000. This is
>>>>> the same address (guest physical) where I map the ROM code. In the
>>>>> above calculation I am taking the offset from 0xfff80000 and adding
>>>>> it to the link address of romstage (0x2000000). The 0x200 is the
>>>>> difference I see to map the addresses correctly. This calculation
>>>>> seems fine to me because with it I am able to pinpoint all the
>>>>> earlier faults and the post_code trap rIP.
>>>>>
>>>>
>>>> If you provide 'cbfstool print -k' output, I could most likely provide
>>>> the exact offset mapping. Alternatively you could extract the
>>>> romstage.elf from the image using 'cbfstool extract -m x86', but it
>>>> won't have debug info. But it'd provide the information to compare
>>>> against the pre-relocated image for the correct mapping.
>>>>
>>> How exactly do I run it? It says unknown option -k (cbfstool in the build directory).
>>
>>
>> ./coreboot-builds/sharedutils/cbfstool/cbfstool
>> coreboot-builds/GOOGLE_REEF/coreboot.rom print -k
>>
>
> hchauhan at panini:build$ ./cbfstool coreboot.rom print -k
>
> Performing operation on 'COREBOOT' region...
>
> Name                Offset   Type         Metadata Size  Data Size  Total Size
> cbfs master header  0x0      cbfs header  0x38           0x20       0x58
> fallback/romstage   0x80     stage        0x64           0x320c     0x3270
> fallback/ramstage   0x3300   stage        0x38           0x99d7     0x9a0f
> config              0xcd40   raw          0x38           0x238      0x270
> revision            0xcfc0   raw          0x38           0x239      0x271
> cmos_layout.bin     0xd240   cmos_layout  0x38           0x304      0x33c
> fallback/dsdt.aml   0xd580   raw          0x48           0xfb5      0xffd
> fallback/payload    0xe580   payload      0x38           0x6b85     0x6bbd
> (empty)             0x15140  null         0x28           0x6a998    0x6a9c0
> bootblock           0x7fb00  bootblock    0x40           0x3c0      0x400


What is CONFIG_ROM_SIZE? When I build qemu-i440fx my ROM_SIZE is 4MiB.
It seems like you are changing it from the default.

$ ./coreboot-builds/sharedutils/cbfstool/cbfstool
./coreboot-builds/EMULATION_QEMU_X86_I440FX/coreboot.rom extract -m
x86 -n fallback/romstage -f extracted_romstage.elf

$ diff -up <(readelf -h
./coreboot-builds/EMULATION_QEMU_X86_I440FX/cbfs/fallback/romstage.elf)
 <(readelf -h extracted_romstage.elf )
--- /dev/fd/63  2016-12-19 11:42:16.459336682 -0600
+++ /dev/fd/62  2016-12-19 11:42:16.459336682 -0600
@@ -8,13 +8,13 @@ ELF Header:
   Type:                              EXEC (Executable file)
   Machine:                           Intel 80386
   Version:                           0x1
-  Entry point address:               0x2000020
-  Start of program headers:          52 (bytes into file)
-  Start of section headers:          21496 (bytes into file)
+  Entry point address:               0xfffc0220
+  Start of program headers:          252 (bytes into file)
+  Start of section headers:          52 (bytes into file)
   Flags:                             0x0
   Size of this header:               52 (bytes)
   Size of program headers:           32 (bytes)
   Number of program headers:         1
   Size of section headers:           40 (bytes)
-  Number of section headers:         8
-  Section header string table index: 5
+  Number of section headers:         5
+  Section header string table index: 1

You can perform the translation based on the difference of the entry
point offsets.

After doing that, does the faulting RIP still point to imd_recover()? A
high RIP like that would indicate we're still in romstage. I still
don't see how we'd ever be calling into that function. Having a
serial console would be extremely helpful in tracking down
where things are falling over.

>
>
>> That's an example after me building reef with abuild. How old is your
>> coreboot checkout?
>>
> Pulled just a few days back.
>
>>>
>>>>>>
>>>>>>> 0x2001d79 is in imd_recover (src/lib/imd.c:139).
>>>>>>> 134
>>>>>>> 135     static void imdr_init(struct imdr *ir, void *upper_limit)
>>>>>>> 136     {
>>>>>>> 137             uintptr_t limit = (uintptr_t)upper_limit;
>>>>>>> 138             /* Upper limit is aligned down to 4KiB */
>>>>>>> 139             ir->limit = ALIGN_DOWN(limit, LIMIT_ALIGN);
>>>>>>> 140             ir->r = NULL;
>>>>>>> 141     }
>>>>>>> 142
>>>>>>> 143     static int imdr_create_empty(struct imdr *imdr, size_t root_size,
>>>>>>>
>>>>>>> I see that this function is being called multiple times (I added some
>>>>>>> more post_code and see them being trapped). I get a series of page
>>>>>>> faults which I am able to honour all but last.
>>>>>>
>>>>>> I don't see how imdr_init would be faulting. That's just assigning
>>>>>> fields of a struct sitting on the stack. What's your stack pointer
>>>>>> value at the time of the faults?
>>>>>
>>>>> "ir" should be on the stack or on top of RAM. Right now it looks
>>>>> like it's on top of RAM. That area is not mapped initially. On a
>>>>> page fault, I map a 4K page. For reference, the following is the
>>>>> register dump of coreboot. RSP is 0x9fe54.
>>>>>
>>>>
>>>> The values should not be striding. That object is always on the stack.
>>>> Where the stack is located could be in low or high memory. I still
>>>> need to know what platform you are targeting for the image to provide
>>>> details. However, it would not be striding.
>>>
>>> I am building this for qemu i440-fx.
>>
>> OK. What is your cmos emulation returning at addresses 0x34, 0x35,
>> 0x5d, 0x5c and 0x5b?


An answer to the question above would be helpful.

>>
>> I also don't understand why we're adding 16MiB to
>> qemu_get_memory_size() unconditionally.
>>
>>>
>>>>
>>>>> GUEST guest0/vcpu0 dump state:
>>>>>
>>>>> RAX: 0x9fe80 RBX: 0xfffff8 RCX: 0x1b RDX: 0x53a11439
>>>>> R08: 0x0 R09: 0x0 R10: 0x0 R11: 0x0
>>>>> R12: 0x0 R13: 0x0 R14: 0x0 R15: 0x0
>>>>> RSP: 0x9fe54 RBP: 0xa0000 RDI: 0xfff801e4 RSI: 0x9fe80
>>>>> RIP: 0xfff81e41
>>>>>
>>>>> CR0: 0xe0000011 CR2: 0x0 CR3: 0xa23000 CR4: 0x0
>>>>> CS    : Sel: 0x00000008 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type: 11)
>>>>> DS    : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type:  3)
>>>>> ES    : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type:  3)
>>>>> SS    : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type:  3)
>>>>> FS    : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type:  3)
>>>>> GS    : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G:  1 DB:
>>>>> 1 L:  0 AVL:  0 P:  1 DPL:  0 S:  1 Type:  3)
>>>>> GDT   : Sel: 0x00000000 Limit: 0x0000001f Base: 0xfff80200 (G:  0 DB:
>>>>> 0 L:  0 AVL:  0 P:  0 DPL:  0 S:  0 Type:  0)
>>>>> LDT   : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G:  0 DB:
>>>>> 0 L:  0 AVL:  0 P:  0 DPL:  0 S:  0 Type:  0)
>>>>> IDT   : Sel: 0x00000000 Limit: 0x00000000 Base: 0x00000000 (G:  0 DB:
>>>>> 0 L:  0 AVL:  0 P:  0 DPL:  0 S:  0 Type:  0)
>>>>> TR    : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G:  1 DB:
>>>>> 0 L:  1 AVL:  1 P:  0 DPL:  0 S:  0 Type:  0)
>>>>> RFLAGS: 0xa    [ ]
>>>>>
>>>>>
>>>>>>>
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7fffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7effc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7dffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7cffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7bffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f7affc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f79ffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f78ffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f77ffc (rIP: 00000000FFF81E41)
>>>>>>> (__handle_vm_exception:543) Guest fault: 0x7f76ffc (rIP: 00000000FFF81E41)
>>>>>>> <snip>


What's the full sequence of faults?

>>>>>>
>>>>>> Are those non-rIP addresses the page fault address?
>>>>>
>>>>> Guest fault: 0x7f7fffc is the address which I think is pointing to
>>>>> "ir". If you look, all the faulting addresses are 4K apart, which is
>>>>> my default page size for mapping all the guest pages. It also means
>>>>> that each time "imdr_init" is called it faults on a different
>>>>> address, hence the same rIP.
>>>>
>>>> I just don't see how we're using that much stack. That doesn't seem
>>>> right at all.
>>>>
>>>
>>> Yes. Something is terribly wrong. I had this working back in 2014.
>>> Please take a look at this video that I created at that time.
>>> https://www.youtube.com/watch?v=jPAzzLQ0NgU
>>
>> I see you do have a serial port. It'd be interesting to get full logs
>> when the thing is booting to see where it goes off the rails.
>>>
>>> I couldn't work on it for quite some time and in the meantime coreboot
>>> changed a lot. I have one question. In earlier coreboot images,
>>> romstage was linked at 0xfff80000 and now it's at 0x2000000. Any reason?
>>
>> It's just linked at CONFIG_ROMSTAGE_ADDR to avoid a double link step.
>> It's linked once and cbfstool relocates the image when placing it into
>> CBFS. It previously was linked at a specific address then the xip
>> address was calculated by performing a pseudo CBFS add operation. Then
>> romstage was re-linked and added to CBFS.
>>
>> The offset for address translation is the entry point difference
>> between the two ELF files. You can extract the one in coreboot.rom to
>> get the entry point of the romstage being run.
>>
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> handle_guest_realmode_page_fault: offset: 0x3ffc fault: 0x1003ffc reg: 0x1000000
>>>>>>> handle_guest_realmode_page_fault: offset: 0x2ffc fault: 0x1002ffc reg: 0x1000000
>>>>>>> handle_guest_realmode_page_fault: offset: 0x1ffc fault: 0x1001ffc reg: 0x1000000
>>>>>>> handle_guest_realmode_page_fault: offset: 0xffc fault: 0x1000ffc reg: 0x1000000
>>>>>>
>>>>>> What is the above detailing? I'm not sure what the 'fault' value means.
>>>>>
>>>>> These are same as Guest fault above. You can disregard them.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> (__handle_vm_exception:561) ERROR: No region mapped to guest physical: 0xfffffc
>>>>>>>
>>>>>>>
>>>>>>> I want to understand why imd_recover gets called multiple times,
>>>>>>> starting from the top of memory (128MB is what I have assigned to
>>>>>>> the guest) down to 16MB (after which I can't honour the faults).
>>>>>>> There is something amiss in my understanding of the coreboot
>>>>>>> memory map.
>>>>>>>
>>>>>>> Could you please help?
>>>>>>
>>>>>> The imd library contains the implementation of cbmem. See
>>>>>> include/cbmem.h for more details, but how it works is that the
>>>>>> platform needs to supply the implementation of cbmem_top() which
>>>>>> defines the exclusive upper boundary to start growing entries downward
>>>>>> from. There are large and small object sizes, with large blocks being
>>>>>> 4KiB in size and small blocks being 32 bytes. I don't understand why
>>>>>> the faulting addresses are offset from 128MiB by 512KiB with a 4KiB
>>>>>> stride.
>>>>>>
>>>>>> What platform are you targeting for your coreboot build? Are you
>>>>>> restarting the instruction that faults? I'm really curious about the
>>>>>> current fault patterns. It looks like things are faulting around
>>>>>> accessing the imd_root_pointer root_offset field. Are these faults
>>>>>> reads or writes? However, that's assuming cbmem_top() is returning
>>>>>> 128MiB-512KiB. However, it doesn't explain the successive strides. Do
>>>>>> you have serial port emulation to get the console messages out?
>>>>>>
>>>>>> So in your platform code ensure 2 things are happening:
>>>>>>
>>>>>> 1. cbmem_top() returns the highest address in 'ram' of the guest once
>>>>>> it's online. 128MiB if that's your expectation. The value cbmem_top()
>>>>>> returns should never change across successive calls, aside from NULL
>>>>>> being returned when RAM is not yet available.
>
> It will always return 0x6f8. This is decided when the guest is created.
>

That doesn't seem right. You mean that's what gets returned from CMOS
before adding the 16MiB?


