On Tue, Feb 14, 2017 at 11:56 AM, ron minnich <rminnich(a)gmail.com> wrote:
> Just a reminder about times past. This discussion has been ongoing since
> 2000. In my view the questions come down to how much the ramstage does, how
> that impacts code complexity and performance, and when the ramstage gets so
> much capability that it ought to be a kernel.
>
> In the earliest iteration, there was no ramstage per se. What we now call
> the ramstage was a Linux kernel.
>
> We had lots of discussions in the early days with LNXI and others about what
> would boot fastest, a dedicated boot loader like etherboot or a general
> purpose kernel like Linux. In all the cases we measured at Los Alamos, Linux
> always won, easily: yes, slower to load than etherboot, more startup
> overhead, but once started Linux support for concurrency and parallelism
> always won the day. Loaders like etherboot (and its descendant, iPXE) spend
> most of their time doing nothing (as measured at the time). It was fun to
> boot 1000 nodes in the time it took PXE on one node to find a connected NIC.
>
> The arguments over payload ended when the FLASH sockets changed to QFP and
> maxed at 256K and Linux could no longer fit.
>
> But if your goal is fast boot, in fact if your goal is 800 milliseconds, we
> know this can work on slow ARMs with Linux, as was shown in 2006.
>
> The very first ramstage was created because Linux could not correctly
> configure a PCI bus in 2000. The core of the ramstage as we know it was the
> PCI config.
>
> We wanted to have ramstage only do PCI setup. We initially put SMP startup
> in Linux, which worked on all but K7, at which point ramstage took on SMP
> startup too. And ramstage started to grow. The growth has never stopped.
>
> At what point is ramstage a kernel? I think at the point we add file systems
> or preemptive scheduling. We're getting dangerously close. If we really
> start to cross that boundary, it's time to rethink the ramstage in my view.
> It's not a good foundation for a kernel.
>
> I've experimented with kernel-as-ramstage with harvey on the riscv and it
> worked. In this case, I manually removed the ramstage from coreboot.rom and
> replaced it with a kernel. It would be interesting, to me at least, to have
> a Kconfig option whereby we can replace the ramstage with some other ELF
> file, to aid such exploration.
>
> I also wonder if we're not at a fork in the road in some ways. There are
> open systems, like RISCV, in which we have full control and can really get
> flexibility in how we boot. We can influence the RISCV vendors not to
> implement hardware designs that have negative impact on firmware and boot
> time performance. And then there are closed systems, like x86, in which many
> opportunities for optimization are lost, and we have little opportunity to
> impact hardware design. We also can't get very smart on x86 because the FSP
> boulder blocks the road.
>
> Where do we go from here?
That I'm not sure about, and it very much depends on the goals of the
project. I will say this, though: not all architectures are the same,
so you can't compare them apples to apples. With ARM punting
almost all of its initialization to ATF or the kernel, it's not
surprising that coreboot's current architecture is simple and easy for
it. The work has just been pushed into other places. For some reason
Intel continually decides to place a large amount of work in the
firmware, but I think that decision is usually taken because it
keeps the kernel simpler. The complexity just got moved to a different
place in the stack. Coupling that with the decision to hide the SoC
support in a closed-off blob just makes things worse. When comparing an
Intel solution to an ARM vendor's, the ARM SoC bits for bring-up are
much more open and thus easier to optimize, if needed. As noted before,
you can't punt things out on x86, where device visibility needs to be
configured prior to resource allocation, so there's definitely
intertwining involved in bringing up the Intel SoCs. Firmware is
inherently exposed to the micro-architecture of the underlying device.
There's not a good way around that. Acting like it isn't doesn't solve
that problem.
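
To make the ordering point concrete, here is a rough toy model in C. It is
only a sketch and not coreboot code: the struct, the function names, the
device list, and the window sizes are all made up for illustration. It
hands out naturally aligned address windows to whatever devices are visible
at allocation time, and compares hiding an unused device before allocation
with leaving it visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types and names, not coreboot's. */
struct toy_dev {
	const char *name;
	bool visible;	/* does the device show up during enumeration? */
	uint32_t size;	/* size of its one memory window (power of two) */
	uint32_t base;	/* assigned base address, 0 if none */
};

/* Give every currently visible device a naturally aligned window. */
static uint32_t toy_allocate(struct toy_dev *devs, size_t n, uint32_t start)
{
	uint32_t next = start;

	for (size_t i = 0; i < n; i++) {
		devs[i].base = 0;
		if (!devs[i].visible)
			continue;
		/* Round up to the window size, as a BAR would require. */
		next = (next + devs[i].size - 1) & ~(devs[i].size - 1);
		devs[i].base = next;
		next += devs[i].size;
	}
	return next - start;	/* address space consumed */
}

/*
 * Return the address space consumed when the unused device is hidden
 * before allocation (hide_first == true) or left visible (false).
 */
static uint32_t toy_space_used(bool hide_first)
{
	struct toy_dev devs[] = {
		{ "uart",         true, 0x1000,   0 },
		{ "unused_audio", true, 0x100000, 0 },
		{ "nic",          true, 0x10000,  0 },
	};

	if (hide_first)
		devs[1].visible = false;	/* SoC-specific step goes first */

	return toy_allocate(devs, sizeof(devs) / sizeof(devs[0]), 0xd0000000u);
}
```

In this model toy_space_used(true) consumes 0x20000 bytes of address space,
while toy_space_used(false) consumes 0x210000. Hiding the device after the
allocator has run would leave that larger layout in place with a hole in
it, which is why the SoC-specific visibility step can't simply be deferred
to a later stage.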
>
> ron