Sam Lewis has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/35409 )
Change subject: arch/arm: Allow program loading of Linux kernels ......................................................................
Patch Set 5:
So I finally had a bit of time to dig into why this didn't work for me - sorry, has been a bit hectic at my day job!
The inline assembly has a few issues, I think. I haven't done that much ARM inline assembly and have been learning as I'm going, so please correct anything I've misunderstood.
The first issue is that coreboot compiles ARMv7 code to thumb by default (-mthumb) so while the inline assembly here generates ARM instructions (with .arm), there's nothing to switch the processor to actually run in ARM mode. From what I understand of the ARM reference manual, I don't believe the `bl` branch will do it (is this the intention of that instruction?). I think it has to be a `blx` instead, otherwise the processor, while in thumb mode, will try to execute ARM encoded instructions and throw an undefined instruction exception.
The second issue is that GCC seems happy to choose registers for input operands that will clobber registers used in the inline assembly. Below is the output generated assembly for the arch_prog_run function. You can see that the fdt_ptr is loaded into r3 and the kernel_ptr is loaded into r2 at 0x80007bde but then because of the instruction at 80007bea, r2 gets overwritten with the fdt_ptr. So when the code later tries to jump to the kernel, it instead jumps to the fdt and then gets another undefined instruction exception.
void arch_prog_run(struct prog *prog) { 80007bcc: b510 push {r4, lr} 80007bce: 4604 mov r4, r0 void (*doit)(void *); cache_sync_instructions(); 80007bd0: f7ff fe00 bl 800077d4 <cache_sync_instructions> /* The Linux kernel takes 3 dword's as argument */ switch (prog_cbfs_type(prog)) { 80007bd4: 6863 ldr r3, [r4, #4] 80007bd6: 2b21 cmp r3, #33 ; 0x21 80007bd8: d113 bne.n 80007c02 <arch_prog_run+0x36> case CBFS_TYPE_FIT: /* Flattened image tree */ dcache_mmu_disable(); 80007bda: f7ff fdef bl 800077bc <dcache_mmu_disable> asm ( 80007bde: e9d4 2307 ldrd r2, r3, [r4, #28] 80007be2: f04f 0000 mov.w r0, #0 80007be6: f04f 31ff mov.w r1, #4294967295 ; 0xffffffff 80007bea: 461a mov r2, r3 80007bec: 4613 mov r3, r2 80007bee: f04f 04d3 mov.w r4, #211 ; 0xd3 80007bf2: f384 8b00 msr CPSR_fxc, r4 80007bf6: f000 e802 blx 80007bfc <arch_prog_run+0x30> 80007bfa: 0000 .short 0x0000 80007bfc: e1a0f003 mov pc, r3 break; default: doit = prog_entry(prog); doit(prog_entry_arg(prog)); } }
I'm guessing there's probably a way to tell GCC to use registers for input operands that won't interfere with registers used in the inline assembly, but perhaps this might be all more straight forward if we used non-inline assembly instead?
I saw that it has already been discussed a little, but what do people think about adapting this change to use an assembly function similar to how depthcharge does it? (https://chromium.googlesource.com/chromiumos/platform/depthcharge/+/refs/hea... and https://chromium.googlesource.com/chromiumos/platform/depthcharge/+/refs/hea...). I think (but correct me if I'm wrong) this should make the swap to ARM mode automatic (through the function call), and should make the input operand part a bit simpler (can use normal calling convention). I'm happy to give this a shot, but equally happy if Arthur or somebody else wants to do it.