An assumption going into v3 was that we could run out of ROM and still be fast, since caches are our friend.
That assumption is not working out. Here is another possible design.
stage 0, running in ROM, turns on CAR and runs initram in the LAR.
initram disables CAR, copies ALL of LAR to top of memory (defined as Top Of RAM - size of LAR)
initram finds stage2 in LAR, uncompresses to RAM, jumps to it.
stage2 finds stage3 in LAR, uncompresses to RAM, runs it.
stage3 finds payload in LAR, uncompresses to RAM, runs it.
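
To make the copy step concrete, here is a rough sketch of the address arithmetic and the copy itself. The function names (lar_copy_base, lar_copy_to_ram) and the flat memcpy are my assumptions for illustration only; they are not existing v3 code, and a real initram would need to know RAM is up before touching the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: where initram would place its copy of the LAR.
 * "Top of memory" is taken as top of RAM minus the size of the LAR. */
uintptr_t lar_copy_base(uintptr_t top_of_ram, size_t lar_size)
{
	/* The archive has to fit below the top of RAM. */
	assert(lar_size <= top_of_ram);
	return top_of_ram - lar_size;
}

/* Hypothetical helper: copy the whole archive out of ROM in one pass,
 * so every later lookup and decompression reads from RAM. */
void *lar_copy_to_ram(void *ram_dest, const void *rom_lar, size_t lar_size)
{
	return memcpy(ram_dest, rom_lar, lar_size);
}
```

For example, with 1 GiB of RAM (top at 0x40000000) and a 1 MiB LAR, lar_copy_base gives 0x3FF00000 as the start of the copy.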
So we go to a chain model instead of call/return.
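
A toy model of the chain, to show the control flow I have in mind. Everything here is hypothetical (load_stage, chain_to, the trace array); load_stage stands in for "find it in the LAR and uncompress it to RAM", and in real firmware the hand-off would be a jump that never comes back, not a C call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
typedef void (*stage_entry)(void);

/* Records the order in which stages ran. */
const char *boot_trace[8];
size_t boot_trace_len;

static void trace(const char *name)
{
	if (boot_trace_len < sizeof(boot_trace) / sizeof(boot_trace[0]))
		boot_trace[boot_trace_len++] = name;
}

static stage_entry load_stage(const char *name);

/* Hand control to the next stage. The caller does no work afterwards. */
static void chain_to(const char *name)
{
	stage_entry next = load_stage(name);
	assert(next != NULL);
	next();
}

static void payload(void) { trace("payload"); }
static void stage3(void)  { trace("stage3");  chain_to("payload"); }
static void stage2(void)  { trace("stage2");  chain_to("stage3"); }
static void initram(void) { trace("initram"); chain_to("stage2"); }

/* stage0 is the only stage that runs from ROM in this model. */
void stage0(void) { trace("stage0"); chain_to("initram"); }

/* Stand-in for the LAR lookup plus decompression to RAM. */
static stage_entry load_stage(const char *name)
{
	static const struct { const char *name; stage_entry entry; } lar[] = {
		{ "initram", initram },
		{ "stage2",  stage2  },
		{ "stage3",  stage3  },
		{ "payload", payload },
	};

	for (size_t i = 0; i < sizeof(lar) / sizeof(lar[0]); i++)
		if (strcmp(lar[i].name, name) == 0)
			return lar[i].entry;
	return NULL;
}
```

Calling stage0() walks the chain in order: stage0, initram, stage2, stage3, payload. Each stage only knows the name of the next one, so no stage depends on a caller still being resident in ROM.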
We stop using ROM-based code due to performance problems.
Comments?
ron