Hi,
On 09.12.2010 at 11:30, Mark Cave-Ayland wrote:
Andreas Färber wrote:
Here's what I found out so far:
- a breakpoint for bootinfo_loader_init() or so is not reached
- The "Trying" comes from (encode-bootpath) in forth/debugging/client.fs
- `debug (encode-bootpath) boot` does not return from open-dev
- `debug open-dev` does not return from path-resolution
- path-resolution gets called "endlessly" (5+ times while single-stepping it); the hang occurred after it had returned successfully from some instance (and had already done so for a previous instance)
Does anyone have a hunch what might be going wrong? Or tips on how to debug this further?
Do you mean path-resolution or (path-resolution)? IIRC (path-resolution) is called recursively for each level of the device, so this could potentially happen depending upon the device tree.
path-resolution.
The "cd:,\:tbxi" device is a reference to finding a file on an HFS file system with a particular filesystem label/type in the MacOS System folder to boot. So given that CONFIG_HFS and CONFIG_HFSP are set for PPC64, if you're trying to access an HFS file system then it should be hitting hfs_fs.c::hfs_files_open() or hfsp_fs.c::hfsp_files_open(). Maybe there are some 64-bit related errors in the code there? Tracing through libopenbios/load.c may help here too.
I did end up in iso9660_files_open() and got into iso9660_opendir(), as far as the "return NULL;" for the iso9660_get_node() == NULL case, and then hit an error, seemingly during the epilogue (restgpr...).
So I tried using the grubfs implementation instead and now get to "Trying cd:,\ppc\chrp\bootfile.exe..." (I got that once while single-stepping through Forth, too, though).
I get to start.S:call_elf() and applied some fixes there. Same for start.S:of_client_callback. client.c carelessly assumed that the prom_args_t struct can hold char* and long, but for ppc64 both need to be int. I've drafted an accessor function, but the DEBUG_CIF code still does not compile due to lots of format string assumptions that now break. I'm considering a FMT_prom_argx to fix that. A prom_arg_t typedef for the int vs. long thing also sounds intriguing.
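To illustrate the idea, here is a minimal sketch of how a prom_arg_t typedef, a FMT_prom_argx format macro and an accessor could fit together. This is not the code from my branch: the struct layout, PROM_MAX_ARGS, prom_get_arg(), prom_arg_to_ptr() and prom_format_arg() are hypothetical and only assume that every argument cell is 32 bits wide, as described above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: every client interface cell is 32 bits wide, even on ppc64. */
typedef uint32_t prom_arg_t;

/* Hypothetical format macro so debug output no longer hardcodes %lx or %p. */
#define FMT_prom_argx "%08" PRIx32

/* Hypothetical limit, chosen only for this sketch. */
#define PROM_MAX_ARGS 10

/* Assumed layout: all fields are cells; pointers travel as 32-bit addresses. */
typedef struct prom_args {
    prom_arg_t service;              /* address of the service name string */
    prom_arg_t nargs;                /* number of input cells */
    prom_arg_t nret;                 /* number of return cells */
    prom_arg_t args[PROM_MAX_ARGS];  /* inputs followed by return slots */
} prom_args_t;

/* Accessor: fetch cell i, checking that it lies inside the array and
 * inside the range the client declared via nargs + nret. */
static inline prom_arg_t prom_get_arg(const prom_args_t *pb, unsigned int i)
{
    assert(i < PROM_MAX_ARGS);
    assert(i < pb->nargs + pb->nret);
    return pb->args[i];
}

/* Widen a cell to a host pointer; on ppc64 this zero-extends to 64 bits. */
static inline void *prom_arg_to_ptr(prom_arg_t cell)
{
    return (void *)(uintptr_t)cell;
}

/* Format a cell as hex via the macro; returns what snprintf returns. */
static inline int prom_format_arg(char *buf, size_t len, prom_arg_t cell)
{
    return snprintf(buf, len, FMT_prom_argx, cell);
}
```

With something like this, the DEBUG_CIF printf calls would use FMT_prom_argx together with prom_get_arg() instead of assuming long-sized arguments, and the int vs. long decision would live in one place.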
load.c:load() is not called.
Alternatively if this is not the case, you may be hitting some generic memory corruption. I had a similar error on SPARC64 with strange behaviour caused by the dictionary being accidentally overwritten.
Yeah, same thing here. When called back from the client, r2 was not set up as TOC pointer so that client.c:of_client_interface() would cause 'unpredictable' damage to memory.
The following branch boots AIX as before (not finding /rtas). Debian starts to boot, but the serial console complains ">> out of malloc memory (11d28)!" (same if increased to -m 1024) and it appears to stop shortly after; the last line is "returning from prom_init".
http://repo.or.cz/w/openbios/afaerber.git/shortlog/refs/heads/ppc64-boot
Thanks for pointing me in the right direction! I'll try to clean this up, but it'll take some days.
Andreas