On Mon, Dec 19, 2016 at 11:18 AM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
On Mon, Dec 19, 2016 at 10:03 PM, Aaron Durbin adurbin@google.com wrote:
On Mon, Dec 19, 2016 at 9:55 AM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
On Mon, Dec 19, 2016 at 9:09 PM, Aaron Durbin adurbin@google.com wrote:
On Sun, Dec 18, 2016 at 11:04 PM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
On Mon, Dec 19, 2016 at 12:40 AM, Aaron Durbin adurbin@google.com wrote:
On Sun, Dec 18, 2016 at 9:37 AM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
> Hi Aaron,
>
> I figured out the crash. It wasn't because of a wrong load of the ROM image
> (thanks to the nifty post_code which I could trap on I/O). I see that
> the page fault I am getting is in the following code:
> (gdb) list *(((0xfff81e41 - 0xfff80000)-200)+0x2000000)
I'm curious about the 200 and 16MiB offset being applied.
0x2000000 is the new address where romstage is linked. Earlier (at least in 2014) the linked address used to be 0xfff80000, which is the same (guest physical) address where I map the ROM code. In the above calculation I take the offset from 0xfff80000 and add it to the link address of romstage (0x2000000). The 0x200 is the difference I see to map the addresses correctly. This calculation seems fine to me, because with it I am able to pinpoint all the earlier faults and the post_code trap rIP.
If you provide 'cbfstool print -k' output, I could most likely provide the exact offset mapping. Alternatively you could extract the romstage.elf from the image using 'cbfstool extract -m x86', but it won't have debug info. But it'd provide the information to compare against the pre-relocated image for the correct mapping.
How exactly to run it? It says unknown option -k (cbfstool in build directory).
./coreboot-builds/sharedutils/cbfstool/cbfstool coreboot-builds/GOOGLE_REEF/coreboot.rom print -k
hchauhan@panini:build$ ./cbfstool coreboot.rom print -k
Performing operation on 'COREBOOT' region...
Name                 Offset   Type          Metadata Size  Data Size  Total Size
cbfs master header   0x0      cbfs header   0x38           0x20       0x58
fallback/romstage    0x80     stage         0x64           0x320c     0x3270
fallback/ramstage    0x3300   stage         0x38           0x99d7     0x9a0f
config               0xcd40   raw           0x38           0x238      0x270
revision             0xcfc0   raw           0x38           0x239      0x271
cmos_layout.bin      0xd240   cmos_layout   0x38           0x304      0x33c
fallback/dsdt.aml    0xd580   raw           0x48           0xfb5      0xffd
fallback/payload     0xe580   payload       0x38           0x6b85     0x6bbd
(empty)              0x15140  null          0x28           0x6a998    0x6a9c0
bootblock            0x7fb00  bootblock     0x40           0x3c0      0x400
What is CONFIG_ROM_SIZE? When I build qemu-i440fx my ROM_SIZE is 4MiB. It seems like you are changing it from the default.
$ ./coreboot-builds/sharedutils/cbfstool/cbfstool ./coreboot-builds/EMULATION_QEMU_X86_I440FX/coreboot.rom extract -m x86 -n fallback/romstage -f extracted_romstage.elf
$ diff -up <(readelf -h ./coreboot-builds/EMULATION_QEMU_X86_I440FX/cbfs/fallback/romstage.elf) <(readelf -h extracted_romstage.elf)
--- /dev/fd/63  2016-12-19 11:42:16.459336682 -0600
+++ /dev/fd/62  2016-12-19 11:42:16.459336682 -0600
@@ -8,13 +8,13 @@ ELF Header:
   Type:                              EXEC (Executable file)
   Machine:                           Intel 80386
   Version:                           0x1
-  Entry point address:               0x2000020
-  Start of program headers:          52 (bytes into file)
-  Start of section headers:          21496 (bytes into file)
+  Entry point address:               0xfffc0220
+  Start of program headers:          252 (bytes into file)
+  Start of section headers:          52 (bytes into file)
   Flags:                             0x0
   Size of this header:               52 (bytes)
   Size of program headers:           32 (bytes)
   Number of program headers:         1
   Size of section headers:           40 (bytes)
-  Number of section headers:         8
-  Section header string table index: 5
+  Number of section headers:         5
+  Section header string table index: 1
You can perform the translation based on the difference between the two entry point addresses.
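For example, with the entry points in the diff above, 0xfffc0220 - 0x2000020 = 0xfdfc0200, so an address observed at runtime maps back into the linked romstage.elf as linked_address = runtime_address - 0xfdfc0200. Those particular numbers are from this i440fx build; your image will have different entry points and therefore a different offset.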
After doing that, does the faulting RIP still point to imd_recover()? A high RIP like that would indicate we're still running in romstage. I still don't see how we'd ever be calling into that function. Having a serial console would be extremely helpful for tracking down where things are falling over.
That's an example from building reef with abuild. How old is your coreboot checkout?
Pulled just a few days back.
> 0x2001d79 is in imd_recover (src/lib/imd.c:139).
> 134
> 135     static void imdr_init(struct imdr *ir, void *upper_limit)
> 136     {
> 137             uintptr_t limit = (uintptr_t)upper_limit;
> 138             /* Upper limit is aligned down to 4KiB */
> 139             ir->limit = ALIGN_DOWN(limit, LIMIT_ALIGN);
> 140             ir->r = NULL;
> 141     }
> 142
> 143     static int imdr_create_empty(struct imdr *imdr, size_t root_size,
>
> I see that this function is being called multiple times (I added some
> more post_code calls and see them being trapped). I get a series of page
> faults, all of which I am able to honour except the last.
I don't see how imdr_init would be faulting. That's just assigning fields of a struct sitting on the stack. What's your stack pointer value at the time of the faults?
"ir" should be on stack or on top of the RAM. Right now it looks like its on top of the RAM. That area is not mapped initially. On a page fault, I map a 4K page. For the reference, the following is the register dump of coreboot. RSP is 0x9fe54.
The values should not be striding. That object is always on the stack. Where the stack is located could be in low or high memory. I still need to know what platform you are targeting for the image to provide details. However, it would not be striding.
I am building this for qemu-i440fx.
OK. What is your cmos emulation returning at addresses 0x34, 0x35, 0x5d, 0x5c and 0x5b?
An answer to the question above would be helpful.
I also don't understand why we're adding 16MiB to qemu_get_memory_size() unconditionally.
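For context, the i440fx memory probe is CMOS based; below is a sketch of roughly what I believe the upstream qemu_get_memory_size() does (paraphrased, not copied verbatim; check src/mainboard/emulation/qemu-i440fx/memory.c in your tree). CMOS 0x34/0x35 traditionally report RAM between 16MiB and 4GiB in 64KiB units, which is where the unconditional '+ 16MiB' comes from; 0x5b..0x5d report RAM above 4GiB and, as far as I know, are handled by a separate helper.

    #include <pc80/mc146818rtc.h>   /* cmos_read() */

    /* Sketch only: total RAM below 4GiB, in KiB, as reported by the
     * emulated CMOS. Not a verbatim copy of the upstream function. */
    unsigned long qemu_get_memory_size(void)
    {
            unsigned long tomk;

            /* 0x34/0x35 hold RAM between 16MiB and 4GiB in 64KiB units,
             * so convert to KiB and add the 16MiB back. */
            tomk = ((cmos_read(0x35) << 8) | cmos_read(0x34)) * 64 + 16 * 1024;
            return tomk;
    }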
GUEST guest0/vcpu0 dump state:
RAX: 0x9fe80    RBX: 0xfffff8   RCX: 0x1b       RDX: 0x53a11439
R08: 0x0        R09: 0x0        R10: 0x0        R11: 0x0
R12: 0x0        R13: 0x0        R14: 0x0        R15: 0x0
RSP: 0x9fe54    RBP: 0xa0000    RDI: 0xfff801e4 RSI: 0x9fe80
RIP: 0xfff81e41
CR0: 0xe0000011 CR2: 0x0 CR3: 0xa23000 CR4: 0x0
CS : Sel: 0x00000008 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 11)
DS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
ES : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
SS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
FS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GDT : Sel: 0x00000000 Limit: 0x0000001f Base: 0xfff80200 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
LDT : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
IDT : Sel: 0x00000000 Limit: 0x00000000 Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
TR : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 1 DB: 0 L: 1 AVL: 1 P: 0 DPL: 0 S: 0 Type: 0)
RFLAGS: 0xa [ ]
>
> (__handle_vm_exception:543) Guest fault: 0x7f7fffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f7effc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f7dffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f7cffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f7bffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f7affc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f79ffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f78ffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f77ffc (rIP: 00000000FFF81E41)
> (__handle_vm_exception:543) Guest fault: 0x7f76ffc (rIP: 00000000FFF81E41)
> <snip>
What's the full sequence of faults?
Are those non-rIP addresses the page fault addresses?
Guest fault 0x7f7fffc is the address which I think points to "ir". If you look, all the faulting addresses are 4K apart, which is my default page size for mapping guest pages. It also means that each time "imdr_init" is called, it faults on a different address, hence the same rIP.
I just don't see how we're using that much stack. That doesn't seem right at all.
Yes. Something is terribly wrong. I had this working back in 2014. Please take a look at this video that I created at that time. https://www.youtube.com/watch?v=jPAzzLQ0NgU
I see you do have a serial port. It'd be interesting to get full logs while the thing is booting to see where it goes off the rails.
I couldn't work on it for quite some time, and in the meantime coreboot changed a lot. I have one question: in earlier coreboot images, romstage was linked at 0xfff80000, and now it's 0x2000000. Any reason?
It's just linked at CONFIG_ROMSTAGE_ADDR to avoid a double link step. It's linked once and cbfstool relocates the image when placing it into CBFS. It previously was linked at a specific address then the xip address was calculated by performing a pseudo CBFS add operation. Then romstage was re-linked and added to CBFS.
The offset for address translation is the difference between the entry points of the two ELF files. You can extract the one in coreboot.rom to get the entry point of the romstage being run.
>
> handle_guest_realmode_page_fault: offset: 0x3ffc fault: 0x1003ffc reg: 0x1000000
> handle_guest_realmode_page_fault: offset: 0x2ffc fault: 0x1002ffc reg: 0x1000000
> handle_guest_realmode_page_fault: offset: 0x1ffc fault: 0x1001ffc reg: 0x1000000
> handle_guest_realmode_page_fault: offset: 0xffc fault: 0x1000ffc reg: 0x1000000
What is the above detailing? I'm not sure what the 'fault' value means.
These are the same as the Guest fault messages above. You can disregard them.
>
> (__handle_vm_exception:561) ERROR: No region mapped to guest physical: 0xfffffc
>
> I want to understand why imd_recover gets called multiple times,
> starting from the top of memory (128MB is what I have assigned to the
> guest) down to 16MB at the end (which I can't honour). There is
> something amiss in my understanding of the coreboot memory map.
>
> Could you please help?
The imd library contains the implementation of cbmem. See include/cbmem.h for more details, but the way it works is that the platform needs to supply the implementation of cbmem_top(), which defines the exclusive upper boundary that entries grow downward from. There are large and small object sizes, with large blocks being 4KiB in size and small blocks being 32 bytes. I don't understand why the faulting addresses are offset from 128MiB by 512KiB with a 4KiB stride.
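Concretely, from the fault log you quoted: 128MiB is 0x8000000, and 0x8000000 - 0x80000 (512KiB) = 0x7f80000. The first fault is at 0x7f7fffc, i.e. 4 bytes below that boundary, and each subsequent fault is exactly 0x1000 lower.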
What platform are you targeting for your coreboot build? Are you restarting the instruction that faults? I'm really curious about the current fault pattern. It looks like things are faulting while accessing the imd_root_pointer root_offset field, though that assumes cbmem_top() is returning 128MiB - 512KiB, and it doesn't explain the successive strides. Are these faults reads or writes? Do you have serial port emulation to get the console messages out?
So in your platform code ensure 2 things are happening:
- cbmem_top() returns the highest address in 'ram' of the guest once it's online (128MiB if that's your expectation). The value cbmem_top() returns should never change across successive calls, aside from NULL being returned when ram is not yet available. See the sketch below.
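A minimal sketch of that contract for a 128MiB guest might look like the following. Note that guest_ram_is_online() and the hard-coded limit are hypothetical placeholders for illustration only; the real qemu-i440fx implementation derives the top of RAM from the emulated hardware rather than hard-coding it.

    #include <stddef.h>
    #include <stdint.h>
    #include <cbmem.h>

    void *cbmem_top(void)
    {
            /* Hypothetical helper: report NULL until RAM is usable. */
            if (!guest_ram_is_online())
                    return NULL;

            /* Exclusive upper bound for cbmem; must be identical on
             * every subsequent call. */
            return (void *)(uintptr_t)(128UL * 1024 * 1024);
    }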
It will always return 0x6f8. This is decided when the guest is created.
That doesn't seem right. You mean that's what gets returned from CMOS before adding the 16MiB?
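(Sanity check on that number: if 0x6f8 is the raw CMOS 0x34/0x35 value, then 0x6f8 * 64KiB = 111.5MiB, and adding the 16MiB gives 127.5MiB, i.e. 128MiB - 512KiB, which would line up with the 512KiB offset seen in the fault addresses above.)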