On Mon, Dec 19, 2016 at 12:40 AM, Aaron Durbin adurbin@google.com wrote:
On Sun, Dec 18, 2016 at 9:37 AM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
Hi Aaron,
I figured out the crash. It wasn't because of a wrong load of the ROM image (thanks to the nifty post_code, which I could trap on IO). I see that the page fault I am getting is in the following code:
(gdb) list *(((0xfff81e41 - 0xfff80000)-200)+0x2000000)
I'm curious about the 200 and 16MiB offset being applied.
0x2000000 is the new address where romstage is linked. Earlier (at least in 2014) the linked address used to be 0xfff80000. This is the same address (guest physical) where I map the ROM code. In the above calculation I am taking the offset from 0xfff80000 and adding it to the link address of romstage (0x2000000). The 200 is the difference I see to map the addresses correctly. This calculation seems fine to me because with it I am able to pinpoint all the earlier faults and the post_code trap rIP.
0x2001d79 is in imd_recover (src/lib/imd.c:139).
134
135     static void imdr_init(struct imdr *ir, void *upper_limit)
136     {
137             uintptr_t limit = (uintptr_t)upper_limit;
138             /* Upper limit is aligned down to 4KiB */
139             ir->limit = ALIGN_DOWN(limit, LIMIT_ALIGN);
140             ir->r = NULL;
141     }
142
143     static int imdr_create_empty(struct imdr *imdr, size_t root_size,
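The address translation in the gdb expression above can be checked with a minimal sketch. The function and parameter names here are illustrative, not from coreboot or gdb. The bare literal 200 is read as decimal, which is how gdb parses it by default; read that way (0xc8), the arithmetic reproduces the 0x2001d79 that gdb reports.

```python
def runtime_to_link(rip, load_base, link_base, adjust):
    """Translate a guest rIP inside the ROM mapping to a romstage link address.

    All names are hypothetical; `adjust` is the empirical correction
    quoted in the thread, not a value derived from the coreboot build.
    """
    offset = rip - load_base            # offset of rIP inside the ROM mapping
    return link_base + offset - adjust  # same offset, relative to the link base


addr = runtime_to_link(
    rip=0xFFF81E41,        # faulting rIP from the register dump
    load_base=0xFFF80000,  # guest-physical address where the ROM is mapped
    link_base=0x2000000,   # address romstage is linked at, per the thread
    adjust=200,            # decimal, as gdb reads the bare literal
)
print(hex(addr))  # 0x2001d79
```

The 200-byte correction is an observed value only; nothing in the thread explains where it comes from.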
I see that this function is being called multiple times (I added some more post_code calls and see them being trapped). I get a series of page faults, all of which I am able to honour except the last.
I don't see how imdr_init would be faulting. That's just assigning fields of a struct sitting on the stack. What's your stack pointer value at the time of the faults?
"ir" should be on the stack or at the top of RAM. Right now it looks like it's at the top of RAM. That area is not mapped initially. On a page fault, I map a 4K page. For reference, the following is the register dump of coreboot. RSP is 0x9fe54.
GUEST guest0/vcpu0 dump state:
RAX: 0x9fe80 RBX: 0xfffff8 RCX: 0x1b RDX: 0x53a11439
R08: 0x0 R09: 0x0 R10: 0x0 R11: 0x0
R12: 0x0 R13: 0x0 R14: 0x0 R15: 0x0
RSP: 0x9fe54 RBP: 0xa0000 RDI: 0xfff801e4 RSI: 0x9fe80
RIP: 0xfff81e41
CR0: 0xe0000011 CR2: 0x0 CR3: 0xa23000 CR4: 0x0
CS : Sel: 0x00000008 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 11)
DS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
ES : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
SS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
FS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GDT : Sel: 0x00000000 Limit: 0x0000001f Base: 0xfff80200 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
LDT : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
IDT : Sel: 0x00000000 Limit: 0x00000000 Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
TR : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 1 DB: 0 L: 1 AVL: 1 P: 0 DPL: 0 S: 0 Type: 0)
RFLAGS: 0xa [ ]
(__handle_vm_exception:543) Guest fault: 0x7f7fffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7effc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7dffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7cffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7bffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7affc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f79ffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f78ffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f77ffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f76ffc (rIP: 00000000FFF81E41)
<snip>
Are those non-rIP addresses the page fault address?
Guest fault: 0x7f7fffc is the address which I think is pointing to "ir". If you look, all the faulting addresses are 4K apart, which is my default page size for mapping all the guest pages. It also means that each time "imdr_init" is called it faults on a different address, hence the same rIP.
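The 4K spacing described here can be confirmed mechanically from the hypervisor log. The sketch below is only an illustration: the regular expression and helper name are made up for this example, and the sample holds just the first four "Guest fault" lines quoted in this thread.

```python
import re

# First four "Guest fault" lines from the log quoted in the thread.
LOG = """\
(__handle_vm_exception:543) Guest fault: 0x7f7fffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7effc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7dffc (rIP: 00000000FFF81E41)
(__handle_vm_exception:543) Guest fault: 0x7f7cffc (rIP: 00000000FFF81E41)
"""

FAULT_RE = re.compile(r"Guest fault: (0x[0-9a-fA-F]+) \(rIP: ([0-9a-fA-F]+)\)")


def parse_faults(log):
    """Return (fault_address, rip) pairs in the order they were logged."""
    return [(int(addr, 16), int(rip, 16)) for addr, rip in FAULT_RE.findall(log)]


faults = parse_faults(LOG)
addrs = [addr for addr, _ in faults]

# Distance between consecutive faults (earlier minus later).
strides = {prev - cur for prev, cur in zip(addrs, addrs[1:])}
# Offset of each fault inside its 4KiB page.
page_offsets = {addr & 0xFFF for addr in addrs}
rips = {rip for _, rip in faults}

print([hex(s) for s in strides])       # ['0x1000']
print([hex(o) for o in page_offsets])  # ['0xffc']
print([hex(r) for r in rips])          # ['0xfff81e41']
```

Each fault is one 4KiB page below the previous one, always at offset 0xffc within the page and always from the same rIP.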
handle_guest_realmode_page_fault: offset: 0x3ffc fault: 0x1003ffc reg: 0x1000000
handle_guest_realmode_page_fault: offset: 0x2ffc fault: 0x1002ffc reg: 0x1000000
handle_guest_realmode_page_fault: offset: 0x1ffc fault: 0x1001ffc reg: 0x1000000
handle_guest_realmode_page_fault: offset: 0xffc fault: 0x1000ffc reg: 0x1000000
What is the above detailing? I'm not sure what the 'fault' value means.
These are the same as the "Guest fault" lines above. You can disregard them.
(__handle_vm_exception:561) ERROR: No region mapped to guest physical: 0xfffffc
I want to understand why imd_recover gets called multiple times, starting from the top of memory (128MB is what I have assigned to the guest) and going down to 16MB, where the last fault occurs (the one I can't honour). There is something amiss in my understanding of the coreboot memory map.
Could you please help?
The imd library contains the implementation of cbmem. See include/cbmem.h for more details, but how it works is that the platform needs to supply the implementation of cbmem_top(), which defines the exclusive upper boundary to start growing entries downward from. There are large and small object sizes, with large blocks being 4KiB in size and small blocks being 32 bytes. I don't understand why the faulting addresses are offset from 128MiB by 512KiB with a 4KiB stride.
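The offsets mentioned here follow directly from the logged addresses. The sketch below only restates that arithmetic. `align_down` is a stand-in for the ALIGN_DOWN macro in the imd.c listing, and the 4KiB alignment is taken from the comment in that listing; both are assumptions for illustration, not coreboot code.

```python
KIB = 1 << 10
MIB = 1 << 20


def align_down(value, alignment):
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


ram_top = 128 * MIB      # RAM assigned to the guest, per the thread
first_fault = 0x7F7FFFC  # first "Guest fault" address in the log
last_fault = 0xFFFFFC    # the address the hypervisor could not map

gap = ram_top - first_fault
print(hex(gap), gap - 512 * KIB)              # 0x80004 4
print(hex(align_down(first_fault, 4 * KIB)))  # 0x7f7f000
print(16 * MIB - last_fault)                  # 4
```

The first fault sits 4 bytes below 128MiB - 512KiB, and the unmapped access sits 4 bytes below 16MiB.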
What platform are you targeting for your coreboot build? Are you restarting the instruction that faults? I'm really curious about the current fault patterns. It looks like things are faulting around accessing the imd_root_pointer root_offset field. Are these faults reads or writes? That assumes cbmem_top() is returning 128MiB-512KiB, though, and it still doesn't explain the successive strides. Do you have serial port emulation to get the console messages out?
So in your platform code ensure 2 things are happening:
1. cbmem_top() returns the highest address in 'ram' of the guest once it's online (128MiB, if that's your expectation). The value cbmem_top() returns should never change between successive calls, aside from NULL being returned when ram is not yet available.
2. cbmem_initialize_empty() is called exactly once, after the 'ram' is online, in the non-S3 resume path, and cbmem_initialize() is called in the S3 resume path. If S3 isn't supported in your guest then just use cbmem_initialize_empty().
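One way to test the first requirement is to log every value cbmem_top() returns in the guest and then check the sequence offline. The checker below is a hypothetical helper, not part of coreboot. It models NULL as None, and it assumes that once RAM is online cbmem_top() should not go back to returning NULL, which is one reading of the rule above.

```python
def cbmem_top_is_stable(observed):
    """Check a logged sequence of cbmem_top() return values.

    Valid sequences are zero or more None entries (RAM not yet online)
    followed by a single address repeated for every later call.
    """
    top = None
    for value in observed:
        if value is None:
            if top is not None:
                return False  # went back to NULL after RAM came online
            continue
        if top is None:
            top = value       # first real answer fixes the boundary
        elif value != top:
            return False      # boundary moved between calls
    return True


print(cbmem_top_is_stable([None, None, 0x8000000, 0x8000000]))  # True
print(cbmem_top_is_stable([None, 0x8000000, 0x7FFF000]))        # False
```

A sequence that steps down by 4KiB on each call would fail this check.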
I will look into it. I see that RAM top is being provided by the CMOS emulator. I will look at cbmem_initialize_empty().
Regards Himanshu
On Wed, Dec 14, 2016 at 9:27 PM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
Hi Aaron,
Yes, I am mapping the memory where coreboot.rom is loaded to the top of the 4GiB address space. I create a fixed shadow page table entry for the reset vector.
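For reference, the guest-physical base of a ROM mapped so that it ends at 4GiB is a single subtraction. In the sketch below the 512KiB ROM size is an assumption, inferred only from the 0xfff80000 base mentioned in this thread. 0xFFFFFFF0 is the standard x86 reset vector, and the helper names are illustrative.

```python
FOUR_GIB = 1 << 32
PAGE_SIZE = 4096
RESET_VECTOR = 0xFFFFFFF0  # first instruction fetch after reset on x86


def rom_base(rom_size):
    """Guest-physical base of a ROM image that ends exactly at 4GiB."""
    return FOUR_GIB - rom_size


def page_of(addr):
    """Base address of the 4KiB page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


print(hex(rom_base(512 * 1024)))    # 0xfff80000 (assumed 512KiB image)
print(hex(page_of(RESET_VECTOR)))   # 0xfffff000
```

With that assumed size, the page holding the reset vector is the last 4KiB page of the mapped ROM.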
Coreboot doesn't have a link address matching the RIP that I shared. I think that with the increase in the size of coreboot (compared to the previous tag I was using) the load address (guest physical) has changed. I used to calculate the load address manually. I will check this and get back.
Thanks.
On Wed, Dec 14, 2016 at 8:17 PM, Aaron Durbin adurbin@google.com wrote:
On Wed, Dec 14, 2016 at 3:11 AM, Chauhan, Himanshu hschauhan@nulltrace.org wrote:
Hi,
I am working on a hypervisor and am using coreboot + FILO as the guest BIOS. While things were fine a while back, it has stopped working. I see that my hypervisor can't handle address 0xFFFFFC while coreboot's RIP is at 0xfff81e41.
How are you loading up coreboot.rom in the VM? Are you just memory mapping it at the top of 4GiB address space? If so, what does 'cbfstool coreboot.rom print' show?
The exact register dump of the guest is as follows:
[guest0/uart0] (__handle_vm_exception:558) ERROR: No region mapped to guest physical: 0xfffffc
GUEST guest0/vcpu0 dump state:
RAX: 0x9fe80 RBX: 0xfffff8 RCX: 0x1b RDX: 0x53a11439
R08: 0x0 R09: 0x0 R10: 0x0 R11: 0x0
R12: 0x0 R13: 0x0 R14: 0x0 R15: 0x0
RSP: 0x9fe54 RBP: 0xa0000 RDI: 0xfff801e4 RSI: 0x9fe80
RIP: 0xfff81e41
CR0: 0xe0000011 CR2: 0x0 CR3: 0xa23000 CR4: 0x0
CS : Sel: 0x00000008 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 11)
DS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
ES : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
SS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
FS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GS : Sel: 0x00000010 Limit: 0xffffffff Base: 0x00000000 (G: 1 DB: 1 L: 0 AVL: 0 P: 1 DPL: 0 S: 1 Type: 3)
GDT : Sel: 0x00000000 Limit: 0x0000001f Base: 0xfff80200 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
LDT : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
IDT : Sel: 0x00000000 Limit: 0x00000000 Base: 0x00000000 (G: 0 DB: 0 L: 0 AVL: 0 P: 0 DPL: 0 S: 0 Type: 0)
TR : Sel: 0x00000000 Limit: 0x0000ffff Base: 0x00000000 (G: 1 DB: 0 L: 1 AVL: 1 P: 0 DPL: 0 S: 0 Type: 0)
RFLAGS: 0xa [ ]
I want to know which binary file (.o) I should disassemble to look at the RIP. I was looking at
objdump -D -mi386 -Maddr16,data16 generated/ramstage.o
but this is prior to linking and thus only has offsets.
--
Regards [Himanshu Chauhan]
-- coreboot mailing list: coreboot@coreboot.org https://www.coreboot.org/mailman/listinfo/coreboot