Julius Werner has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/35409 )
Change subject: arch/arm: Allow program loading of Linux kernels ......................................................................
Patch Set 5: -Code-Review
The first issue is that coreboot compiles ARMv7 code to thumb by default (-mthumb) so while the inline assembly here generates ARM instructions (with .arm), there's nothing to switch the processor to actually run in ARM mode. From what I understand of the ARM reference manual, I don't believe the `bl` branch will do it (is this the intention of that instruction?). I think it has to be a `blx` instead, otherwise the processor, while in thumb mode, will try to execute ARM encoded instructions and throw an undefined instruction exception.
Right, that probably needs to be a BLX. I think there is sometimes some assembler magic where the assembler turns a BL into a BLX automatically depending on the mode of the current and the target function -- but that might rely on declaring the other code as a separate function to the assembler which this doesn't quite do. So maybe that's why it doesn't translate correctly? (Note that it actually says BLX in the disassembly you provided, is that from you fixing it manually?)
The second issue is that GCC seems happy to choose registers for input operands that will clobber registers used in the inline assembly. Below is the output generated assembly for the arch_prog_run function. You can see that the fdt_ptr is loaded into r3 and the kernel_ptr is loaded into r2 at 0x80007bde but then because of the instruction at 80007bea, r2 gets overwritten with the fdt_ptr. So when the code later tries to jump to the kernel, it instead jumps to the fdt and then gets another undefined instruction exception.
Yeah, probably need to add r0 through r4 to the clobber list there.
I'm guessing there's probably a way to tell GCC to use registers for input operands that won't interfere with registers used in the inline assembly, but perhaps this might be all more straight forward if we used non-inline assembly instead?
I agree, with all these intricacies, just making it a real assembly function in a separate file is probably better.