Hello,
With or without the pending ofmem patches, ppc64 boot currently hangs after "Trying cd:,\:tbxi..." (before "Trying cd:,\ppc\chrp\bootfile.exe..."). The symptom is that the 0x700 program exception vector (not 0xfff00700) is being called with SRR1 pointing to an address that apparently lies neither in the low vector range nor in OpenBIOS itself. I noticed that branching relatively to unexpected_excep from there is wrong and patched it to use bctr instead (which unfortunately appears to break 32-bit ppc64), but usually it does not manage to do the printk() properly.
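A possible reason a relative branch from the low vector misbehaves only on ppc64, sketched in C under stated assumptions: the vector stub executes at 0x700, unexpected_excep is linked into the ROM image near 0xfff00000 (0xfff08000 below is a made-up address), and the vector runs in 64-bit mode. A PowerPC `b` instruction can only reach about ±32 MiB. In 32-bit mode effective addresses wrap at 2^32, so 0xfff08000 is just under 1 MiB "below" 0x700 and reachable; in 64-bit mode they do not wrap, so the same target is roughly 4 GB away and out of range. branch_disp(), branch_reaches() and encode_b() are hypothetical helpers, not OpenBIOS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The I-form "b" instruction holds a signed 24-bit word offset (LI),
 * i.e. a byte displacement in [-2^25, 2^25 - 4]. */
#define B_MIN_DISP (-(INT64_C(1) << 25))
#define B_MAX_DISP ((INT64_C(1) << 25) - 4)

/* Displacement a relative branch at `from` needs to reach `to`.
 * In 32-bit mode effective addresses wrap modulo 2^32, so the
 * difference is interpreted as a signed 32-bit value. */
static int64_t branch_disp(uint64_t from, uint64_t to, bool mode64)
{
    if (mode64)
        return (int64_t)(to - from);
    return (int64_t)(int32_t)(uint32_t)(to - from);
}

/* True if a single "b" can encode the displacement. */
static bool branch_reaches(uint64_t from, uint64_t to, bool mode64)
{
    int64_t d = branch_disp(from, to, mode64);
    return (d & 3) == 0 && d >= B_MIN_DISP && d <= B_MAX_DISP;
}

/* Encode "b to" (AA=0, LK=0): primary opcode 18 plus the LI field. */
static uint32_t encode_b(uint64_t from, uint64_t to, bool mode64)
{
    int64_t d = branch_disp(from, to, mode64);
    assert(branch_reaches(from, to, mode64));
    return (UINT32_C(18) << 26) | ((uint32_t)d & UINT32_C(0x03fffffc));
}
```

An absolute jump through CTR (mtctr/bctr), as in the patch mentioned above, avoids the range limit altogether.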
Here's what I found out so far:
- a breakpoint for bootinfo_loader_init() or so is not reached
- The "Trying" comes from (encode-bootpath) in forth/debugging/client.fs
- `debug (encode-bootpath) boot` does not return from open-dev
- `debug open-dev` does not return from path-resolution
- path-resolution gets called "endlessly" (5+ times single-stepping it); the hang occurred after successfully returning from some instance (after having successfully done so for a previous instance)
Does anyone have a hunch what might be going wrong? Or tips how to further debug?
Thanks, Andreas
Andreas Färber wrote:
Here's what I found out so far:
- a breakpoint for bootinfo_loader_init() or so is not reached
- The "Trying" comes from (encode-bootpath) in forth/debugging/client.fs
- `debug (encode-bootpath) boot` does not return from open-dev
- `debug open-dev` does not return from path-resolution
- path-resolution gets called "endlessly" (5+ times single-stepping it); the hang occurred after successfully returning from some instance (after having successfully done so for a previous instance)
Does anyone have a hunch what might be going wrong? Or tips how to further debug?
Hi Andreas,
Do you mean path-resolution or (path-resolution)? IIRC (path-resolution) is called recursively for each level of the device so this could potentially happen depending upon the device tree.
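To make the "once per level" point concrete, here is a toy C model (not OpenBIOS's Forth code) in which each call consumes one node of a device path and recurses on the remainder. The path used below is a made-up example; the real one depends on what the cd alias expands to. Under this model a handful of calls is normal for a deep path, whereas a stream that never terminates is not. resolve_levels() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve "/a/b/c" one node per call, recursing on the rest of the
 * path. Prints each node indented by depth and returns the number of
 * levels visited. */
static int resolve_levels(const char *path, int depth)
{
    assert(path != NULL);
    while (*path == '/')
        path++;                       /* skip separators */
    if (*path == '\0')
        return depth;                 /* nothing left: leaf reached */

    size_t len = strcspn(path, "/");  /* length of this node's name */
    printf("%*s%.*s\n", depth * 2, "", (int)len, path);
    return resolve_levels(path + len, depth + 1);
}
```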
The "cd:,\:tbxi" device is a reference to finding a file on a HFS file system with a particular filesystem label/type in the MacOS System folder to boot. So given that CONFIG_HFS and CONFIG_HFSP are set for PPC64, if you're trying to access a HFS file system then it should be hitting hfs_fs.c::hfs_files_open() or hfsp_fs.c::hfsp_files_open() - maybe there are some 64-bit related errors in the code there? Tracing through libopenbios/load.c may help here too.
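A rough C sketch of how a boot path like cd:,\:tbxi breaks down, assuming the usual Open Firmware "device:partition,file" split and assuming, per the description above, that a file spec of the form \:XXXX asks for the file with type XXXX in the System folder rather than for a named file. struct bootpath and parse_bootpath() are made up for illustration and deliberately simplified (only the first ':' is treated as the argument separator); this is not how load.c or hfs_fs.c actually parse it.

```c
#include <assert.h>
#include <string.h>

/* Pieces of a boot path such as "cd:,\:tbxi". */
struct bootpath {
    char device[32];     /* alias or device path, e.g. "cd" */
    char partition[16];  /* empty means the default partition */
    char file[128];      /* file spec after the comma */
    char type[5];        /* four-character file type, if requested */
};

/* Split "device:partition,file". Returns 0 on success, -1 if there is
 * no ':' or a field does not fit its buffer. */
static int parse_bootpath(const char *s, struct bootpath *bp)
{
    assert(s != NULL && bp != NULL);
    const char *colon = strchr(s, ':');
    if (colon == NULL)
        return -1;

    const char *args = colon + 1;
    const char *comma = strchr(args, ',');
    const char *file = comma ? comma + 1 : "";
    size_t dev_len = (size_t)(colon - s);
    size_t part_len = comma ? (size_t)(comma - args) : strlen(args);

    memset(bp, 0, sizeof(*bp));      /* also NUL-terminates every field */
    if (dev_len >= sizeof(bp->device) ||
        part_len >= sizeof(bp->partition) ||
        strlen(file) >= sizeof(bp->file))
        return -1;
    memcpy(bp->device, s, dev_len);
    memcpy(bp->partition, args, part_len);
    strcpy(bp->file, file);

    /* Assumed convention: "\:XXXX" selects a file by its HFS type. */
    if (strncmp(file, "\\:", 2) == 0 && strlen(file + 2) == 4)
        memcpy(bp->type, file + 2, 4);
    return 0;
}
```

With this split, "cd:,\ppc\chrp\bootfile.exe" yields an ordinary file name and no type, which is the second path OpenBIOS tries.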
Alternatively if this is not the case, you may be hitting some generic memory corruption. I had a similar error on SPARC64 with strange behaviour caused by the dictionary being accidentally overwritten.
HTH,
Mark.
Hi,
On 09.12.2010 at 11:30, Mark Cave-Ayland wrote:
Andreas Färber wrote:
Here's what I found out so far:
- a breakpoint for bootinfo_loader_init() or so is not reached
- The "Trying" comes from (encode-bootpath) in forth/debugging/client.fs
- `debug (encode-bootpath) boot` does not return from open-dev
- `debug open-dev` does not return from path-resolution
- path-resolution gets called "endlessly" (5+ times single-stepping it); the hang occurred after successfully returning from some instance (after having successfully done so for a previous instance)
Does anyone have a hunch what might be going wrong? Or tips how to further debug?
Do you mean path-resolution or (path-resolution)? IIRC (path-resolution) is called recursively for each level of the device so this could potentially happen depending upon the device tree.
path-resolution.
The "cd:,\:tbxi" device is a reference to finding a file on a HFS file system with a particular filesystem label/type in the MacOS System folder to boot. So given that CONFIG_HFS and CONFIG_HFSP are set for PPC64, if you're trying to access a HFS file system then it should be hitting hfs_fs.c::hfs_files_open() or hfsp_fs.c::hfsp_files_open() - maybe there are some 64-bit related errors in the code there? Tracing through libopenbios/load.c may help here too.
I did end up in iso9660_files_open() and got as far as the "return NULL;" in iso9660_opendir() for the iso9660_get_node() == NULL case, then hit an error, seemingly during the function epilogue (restgpr...).
So I tried using the grubfs implementation instead and got to "Trying cd:,\ppc\chrp\bootfile.exe..." (I had also seen that once while single-stepping through Forth, though).
I got to start.S:call_elf() and applied some fixes there; the same goes for start.S:of_client_callback. client.c carelessly assumed that the prom_args_t struct can hold a char* and longs, but for ppc64 both need to be int. I've drafted an accessor function; however, the DEBUG_CIF code still does not compile due to lots of format string assumptions that now break, so I'm considering an FMT_prom_argx macro to fix that. A prom_arg_t typedef for the int vs. long issue also sounds appealing.
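A sketch of what the prom_arg_t typedef, FMT_prom_argx and accessor idea could look like, assuming 32-bit cells in the client interface array even when OpenBIOS itself is built for ppc64. The struct follows the IEEE 1275 argument-array layout (service, number of arguments, number of returns, then arguments followed by returns); PROM_MAX_ARGS = 10 and all helper names are assumptions, not the code in the branch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed: one client interface cell is 32 bits wide on ppc64 too. */
typedef uint32_t prom_arg_t;
#define FMT_prom_argx "%08" PRIx32   /* printf format for one cell */

#define PROM_MAX_ARGS 10

typedef struct {
    prom_arg_t service;              /* client address of service name */
    prom_arg_t nargs;                /* number of input cells */
    prom_arg_t nret;                 /* number of return cells */
    prom_arg_t args[PROM_MAX_ARGS];  /* inputs, then returns */
} prom_args_t;

/* Widen a cell that holds a client address into a host pointer. */
static inline void *prom_arg_to_ptr(prom_arg_t cell)
{
    return (void *)(uintptr_t)cell;
}

/* Read input argument i. */
static inline prom_arg_t prom_arg_get(const prom_args_t *pa, unsigned int i)
{
    assert(i < PROM_MAX_ARGS);
    return pa->args[i];
}

/* Store return value i; returns follow the nargs input cells. */
static inline void prom_ret_set(prom_args_t *pa, unsigned int i, prom_arg_t val)
{
    assert(pa->nargs + i < PROM_MAX_ARGS);
    pa->args[pa->nargs + i] = val;
}

/* DEBUG_CIF-style dump that needs no per-architecture casts. */
void prom_args_dump(const prom_args_t *pa)
{
    for (prom_arg_t i = 0; i < pa->nargs && i < PROM_MAX_ARGS; i++)
        printf("arg%" PRIu32 " = " FMT_prom_argx "\n", i, pa->args[i]);
}
```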
load.c:load() is not called.
Alternatively if this is not the case, you may be hitting some generic memory corruption. I had a similar error on SPARC64 with strange behaviour caused by the dictionary being accidentally overwritten.
Yeah, same thing here. When called back from the client, r2 was not set up as the TOC pointer, so client.c:of_client_interface() would cause 'unpredictable' damage to memory.
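Background on why a stale r2 corrupts memory, assuming the ppc64 build uses the 64-bit PowerPC ELF ABI v1: there, a C function pointer designates a three-doubleword function descriptor rather than code, and compiled code reaches its globals relative to the TOC pointer in r2. The C model below only illustrates the bookkeeping an entry stub such as of_client_callback has to do in assembly; struct c_entry_regs and c_entry_from_desc() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF ABI v1 function descriptor. */
struct ppc64_func_desc {
    uint64_t entry;  /* address of the function's first instruction */
    uint64_t toc;    /* TOC base the function expects in r2 */
    uint64_t env;    /* environment pointer, unused by C */
};

/* Registers an entry stub must set before branching into C code. */
struct c_entry_regs {
    uint64_t pc;
    uint64_t r2;
};

/* Take BOTH the branch target and r2 from the descriptor. A stub that
 * only branches to fd->entry leaves the caller's r2 in place, so every
 * TOC-relative access to a global in the C code uses the wrong base. */
static struct c_entry_regs c_entry_from_desc(const struct ppc64_func_desc *fd)
{
    assert(fd != NULL);
    struct c_entry_regs regs = { fd->entry, fd->toc };
    return regs;
}
```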
The following branch boots AIX as before (not finding /rtas). Debian starts to boot, but the serial console complains ">> out of malloc memory (11d28)!" (the same happens with -m 1024) and it appears to stop shortly after; the last line is "returning from prom_init".
http://repo.or.cz/w/openbios/afaerber.git/shortlog/refs/heads/ppc64-boot
Thanks for pointing me in the right direction! I'll try to clean this up, but it'll take some days.
Andreas
Andreas Färber wrote:
The following branch boots AIX as before (not finding /rtas). Debian starts to boot, but the serial console complains ">> out of malloc memory (11d28)!" (the same happens with -m 1024) and it appears to stop shortly after; the last line is "returning from prom_init".
Good work! The malloc memory region for OFMEM on PPC should lie between ofmem_arch_get_malloc_base() and ofmem_arch_get_heap_top(), which are tweakable in arch/ppc/qemu/ofmem.c. According to the comments, it should be set to 2MB, which I would have thought is enough, but AIX could be doing something clever; try enabling the CONFIG_DEBUG_OFMEM option to confirm this.
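A toy model of the point above: a bump allocator confined to a fixed [base, top) window, standing in for the region between ofmem_arch_get_malloc_base() and ofmem_arch_get_heap_top(). Assuming the real allocator likewise checks only that window (and assuming the number in the message is the requested size in hex, 0x11d28 = 73,000 bytes), raising -m would not enlarge the window, which would be consistent with -m 1024 changing nothing. The names, the 8-byte rounding and the base address in the test are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fixed allocation window; guest RAM size plays no part in it. */
struct malloc_region {
    uintptr_t base;
    uintptr_t top;
    uintptr_t next;   /* first free byte */
};

static void region_init(struct malloc_region *r, uintptr_t base, size_t size)
{
    assert(size > 0);
    r->base = base;
    r->top = base + size;
    r->next = base;
}

/* Returns the allocated address, or 0 once the window is exhausted. */
static uintptr_t region_alloc(struct malloc_region *r, size_t size)
{
    size = (size + 7) & ~(size_t)7;   /* keep 8-byte alignment */
    if (r->top - r->next < size) {
        printf("out of malloc memory (%zx)!\n", size);
        return 0;
    }
    uintptr_t p = r->next;
    r->next += size;
    return p;
}
```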
http://repo.or.cz/w/openbios/afaerber.git/shortlog/refs/heads/ppc64-boot
Thanks for pointing me in the right direction! I'll try to clean this up, but it'll take some days.
No worries - glad that you managed to find out what was happening :)
ATB,
Mark.
On Tue, Dec 7, 2010 at 9:06 PM, Andreas Färber <andreas.faerber@web.de> wrote:
Hello,
With or without the pending ofmem patches, ppc64 boot currently hangs after "Trying cd:,\:tbxi..." (before "Trying cd:,\ppc\chrp\bootfile.exe..."). The symptom is that the 0x700 program exception vector (not 0xfff00700) is being called with SRR1 pointing to an address that apparently lies neither in the low vector range nor in OpenBIOS itself. I noticed that branching relatively to unexpected_excep from there is wrong and patched it to use bctr instead (which unfortunately appears to break 32-bit ppc64), but usually it does not manage to do the printk() properly.
Here's what I found out so far:
- a breakpoint for bootinfo_loader_init() or so is not reached
- The "Trying" comes from (encode-bootpath) in forth/debugging/client.fs
- `debug (encode-bootpath) boot` does not return from open-dev
- `debug open-dev` does not return from path-resolution
- path-resolution gets called "endlessly" (5+ times single-stepping it); the hang occurred after successfully returning from some instance (after having successfully done so for a previous instance)
Does anyone have a hunch what might be going wrong? Or tips how to further debug?
Does running QEMU with the -d in_asm,int flag reveal anything? How about recompiling with DEBUG_SOFTWARE_TLB enabled in target-ppc/op_helper.c?