On 08/08/11 22:40, Kenneth Salerno wrote:
Hi,
I finally got around to running QEMU in gdb to debug the AIX V6.1 boot hang and was able to get past where I got stuck previously:
Welcome to AIX. boot image timestamp: 00:39 35/2D
NULL ihandle
The current time and date: 00:00:00 228784/00/0008
processor count: 1; memory size: 1024MB; kernel size: 2293829
boot device: cd:\ppc\chrp\bootfile.exe
Validation failed: the "/rtas" device node does not exist.
EXIT
It used to hang after "boot device: cd:\ppc\chrp\bootfile.exe" with OpenBIOS stuck at line 175 of libopenbios/ofmem_common.c:
for( pp=&ofmem->mfree; *pp&& (**pp).size< d->size ; pp =&(**pp).next ) { }
I made the following hack to get it to progress to the RTAS validation:
--- ofmem_common.c.ORIG 2011-08-08 17:04:25.375000000 -0400
+++ ofmem_common.c 2011-08-08 17:04:45.875000000 -0400
@@ -172,8 +172,9 @@
 d->next = ofmem->mfree;

 /* insert in the (sorted) freelist */
-for( pp=&ofmem->mfree; *pp&& (**pp).size< d->size ; pp =&(**pp).next ) {
-}
+/* for( pp=&ofmem->mfree; *pp&& (**pp).size< d->size ; pp =&(**pp).next ) {
+} */
+pp=&ofmem->mfree;

 d->next = *pp;
 *pp = d;
Before I made the above change, the following is what I saw in gdb and qemu console:
(qemu) info cpus
- CPU #0: nip=0x00000000fff91a84 thread_id=6828
(gdb) 0x00000000fff91a84 in ofmem_free (ptr=0x3fca1774) at ../libopenbios/ofmem_common.c:175
175 for( pp=&ofmem->mfree; *pp&& (**pp).size< d->size ; pp =&(**pp).next ) {
(gdb) print /x pp
$1 = 0x3fc9e0ac
The value 0x3fc9e0ac can be found in registers GPR03 and GPR09 (the last value on the GPR00 row and the second value on the GPR08 row):
(qemu) info registers
NIP 00000000fff91a84 LR 00000000fff91a58 CTR 00000000fff93784 XER 0000000020000000
MSR 0000000000003032 HID0 0000000060000000 HF 0000000000002000 idx 1
TB 00000001 7638146663 DECR 951787926
GPR00 000000003fca1764 000000003fdf69e0 00000000fffc68e8 000000003fc9e0ac
GPR04 00000000fffc0088 000000003fc5bc68 00000000fffc0860 0000000000044200
GPR08 0000000000000002 000000003fc9e0ac 0000000000000024 0000000000000810
GPR12 00000000000088ac 0000000000000000 00000000fffb6703 00000000fffb7b65
GPR16 00000000fffb8331 00000000fffb6706 0000000004000000 00000000fffbf6b8
GPR20 00000000fffbf634 00000000fffc68e8 00000000fffbf634 00000000fffb650a
GPR24 00000000fffb64f8 00000000fffb6478 00000000fffb6500 00000000fffb6505
GPR28 00000000fffb707e 0000000000000027 0000000000000027 000000003fca1774
CR 48000084 [ G L - - - - L G ] RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 00000000fffaa590 SRR1 0000000000003032 PVR 00000000003c0301 VRSAVE 0000000000000000
SPRG0 000000003fe00000 SPRG1 000000003fdf6630 SPRG2 0000000000000000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
SDR1 000000003fe00000
Here is my QEMU command:
./testing/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -L ./testing/qemu/pc-bios \
    -m 1024 \
    -bios ./testing/openbios-devel/obj-ppc64/openbios-qemu.elf \
    -drive file=images/aix.img,index=0,media=disk,cache=writeback \
    -cdrom images/ibmaix.iso \
    -boot d \
    -nographic \
    -rtc base=localtime,clock=host \
    -uuid 17202d0a-45f8-4159-a8e1-78b866f50aa7 \
    -serial tcp::9979,server,nowait \
    -monitor tcp::9980,server,nowait \
    -gdb tcp::1234
powerpc64-unknown-linux-gnu-gdb testing/openbios-devel/obj-ppc64/openbios-qemu.elf-nostrip
I don't really know what I'm doing so any help explaining this function in ofmem_common.c would be appreciated.
Hi Ken,
I didn't write the original version of this code, however I have played around with it enough to get an idea of what it does, so hopefully the explanation below will make sense ;)
ofmem_malloc() and ofmem_free() are mapped to the malloc() and free() calls used within OpenBIOS. They are not used by any client program. If an area of memory is requested via the CIF, then the allocation is handled separately by the arch/*/ OFMEM code, which handles memory allocation and mapping across the entire address range.
As per the code in libopenbios/ofmem_common.c, ofmem_malloc() allocates memory within the OpenBIOS binary image between ofmem_arch_get_malloc_base() and ofmem_arch_get_heap_top(). These limits are also specified per architecture within the arch/*/ OFMEM handler.
In order to facilitate memory re-use, if a chunk of memory is ofmem_free()d then it is added to a free memory linked-list starting at ofmem->mfree ordered by size. The ofmem_malloc() code at line 105 consults this list first when trying to allocate memory in order to try and ensure that repeated ofmem_malloc() and ofmem_free() calls on the same size area of memory don't deplete the available memory store.
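To make the size-ordered freelist concrete, here is a minimal standalone sketch in C. It is not the OpenBIOS source: desc_t, freelist_insert() and freelist_take() are hypothetical names, and the "take the first chunk that is big enough" policy is an assumption about how a sorted list would typically be consulted, not a statement of what ofmem_malloc() does at line 105.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenBIOS descriptor; field names are assumed. */
typedef struct desc {
    struct desc *next;  /* next free chunk, or NULL at the end of the list */
    size_t size;        /* size of the chunk in bytes */
} desc_t;

/* Insert d into the list at *head, keeping it in ascending size order. */
static void freelist_insert(desc_t **head, desc_t *d)
{
    desc_t **pp;

    assert(head != NULL && d != NULL);

    /* Skip every chunk that is strictly smaller than d. */
    for (pp = head; *pp && (*pp)->size < d->size; pp = &(*pp)->next)
        ;

    /* Splice d in front of the first chunk that is at least as large. */
    d->next = *pp;
    *pp = d;
}

/* Unlink and return the smallest chunk of at least `size` bytes, or NULL. */
static desc_t *freelist_take(desc_t **head, size_t size)
{
    desc_t **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->size >= size) {
            desc_t *d = *pp;
            *pp = d->next;
            d->next = NULL;
            return d;
        }
    }
    return NULL;
}
```

The loop in freelist_insert() has the same shape as the one at line 175: it only terminates if following next pointers eventually reaches either NULL or a chunk whose size is not smaller than the one being freed.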
Finally, ofmem_malloc() must ensure that the returned pointer is aligned both physically and virtually, so the memory descriptor for an entry is stored just before it in memory, like this:
               | <- aligned ptr
----------------------------------------
| alloc_desc_t | region (<size> bytes) |
----------------------------------------
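The layout above can be sketched in C as follows. This is only an illustration under stated assumptions: the alloc_desc_t fields shown here are guessed, and place_region() and desc_of() are hypothetical helpers, not functions from ofmem_common.c. The sketch also assumes the requested alignment is a power of two and at least as strict as the descriptor's own alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; the real alloc_desc_t layout may differ. */
typedef struct alloc_desc {
    struct alloc_desc *next;
    size_t size;
} alloc_desc_t;

/*
 * Starting from raw storage, pick the first address that is aligned to
 * `align` and still leaves room for a descriptor immediately before it.
 * Returns the aligned pointer that would be handed to the caller.
 */
static void *place_region(void *raw, size_t align, size_t size)
{
    uintptr_t p;
    alloc_desc_t *d;

    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Reserve space for the descriptor, then round up to the alignment. */
    p = (uintptr_t)raw + sizeof(alloc_desc_t);
    p = (p + align - 1) & ~(uintptr_t)(align - 1);

    /* The descriptor sits directly in front of the aligned region. */
    d = (alloc_desc_t *)(p - sizeof(alloc_desc_t));
    d->next = NULL;
    d->size = size;

    return (void *)p;
}

/* Recover the descriptor from a pointer returned by place_region(). */
static alloc_desc_t *desc_of(void *ptr)
{
    return (alloc_desc_t *)((char *)ptr - sizeof(alloc_desc_t));
}
```

A free routine built this way only needs the pointer it is given: stepping back by sizeof(alloc_desc_t) yields the descriptor, which can then be linked into the freelist. It also means that a write just below an allocated region would overwrite that region's descriptor.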
The code you are having trouble with is the code that tries to add the ofmem_free() location back into the freelist starting at ofmem->mfree.
Now if ofmem->mfree is getting clobbered somehow then that would cause the linked list to jump to random locations in memory and hence cause a crash. In these cases the tactic is to add a watchpoint in gdb at ofmem->mfree so that gdb breaks whenever that memory location is written to. Assuming you compile OpenBIOS with -O0 -g then you can obtain a backtrace in order to find out where the error is occurring and post it back to the list if you need further help.
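As a complement to the watchpoint, a small consistency check over the freelist can help narrow down when it first goes bad. The sketch below is a hypothetical debugging aid, not part of OpenBIOS: desc_t and freelist_check() are made-up names. It reports a list that loops back on itself (which would make a sorted-insert loop spin forever) or that is no longer in ascending size order. It cannot detect a next pointer that has been overwritten with an arbitrary invalid address.

```c
#include <stddef.h>

/* Hypothetical stand-in for the OpenBIOS descriptor; field names are assumed. */
typedef struct desc {
    struct desc *next;
    size_t size;
} desc_t;

enum list_status { LIST_OK = 0, LIST_UNSORTED = 1, LIST_CYCLE = 2 };

/* Check that the list starting at head is finite and sorted by size. */
static enum list_status freelist_check(const desc_t *head)
{
    const desc_t *slow = head;
    const desc_t *fast = head;
    const desc_t *p;

    /* Floyd's tortoise and hare: the two pointers meet only if there is a cycle. */
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return LIST_CYCLE;
    }

    /* No cycle, so a plain walk terminates; verify ascending size order. */
    for (p = head; p && p->next; p = p->next) {
        if (p->size > p->next->size)
            return LIST_UNSORTED;
    }
    return LIST_OK;
}
```

Calling a check like this at the top of the free and malloc paths in a debug build would flag the first call that sees a damaged list, which gives a much narrower window for the watchpoint and backtrace.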
HTH,
Mark.