Joshua Wise <joshua(a)joshuawise.com> writes:
On Friday 19 December 2003 7:15 pm, Eric W. Biederman
Yes, we are reaching the point where we can converge on some of these
things. LAB might be the right framework. And if it is something good it
will save me the trouble of starting my own project. But it takes more
than a hyperactive 2 year old to convince me. It might take a hyperactive
2 year old to remind me about interesting ideas though.
Right, well then you should see it in action. If you're in the Boston area
sometime soon I can give you a demo on an iPAQ, perhaps.
Salt Lake City, and Illinois with my family for Christmas. Though a serial
console logfile might be interesting.
512k is with a few ARM-specific drivers, and jffs2. It does not have
networking. This is with kernel 2.6.
Hmm. I am pretty certain I have gotten 2.6 down some smaller. Our
practical limit with LinuxBIOS etc is in the neighborhood of 384KB.
2.4 in 256k, but it's rather useless like that. If you do not plan
to load modules at runtime, you can shave a good bit more off of it. If you
write bzip2 compression support (or up-port the stuff from kernel 2.4), you
can shave even more off of it. I've pulled off 50k with bzip2 (not actually
written the code, just did a bzip2 -9 < piggy > piggy.bz2).
The problem is that the bzip2 decompressor is huge; usually bzip2 is
a net loss because of the decompressor. But it may be possible to
write a tuned version. The cases I have typically worried about are
much smaller, and I have made huge gains by switching to nrv2b from upx
because the decompressor is something like 100 bytes, and the
compression is roughly as good as gzip.
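
The trade-off above is plain arithmetic: what matters is the compressed
payload plus the decompressor stub that has to sit next to it. A minimal
sketch in C, where the function names are made up for illustration and no
real sizes are implied:

```c
#include <stddef.h>

/* Total bytes the image occupies: compressed payload plus the
 * decompressor stub that must be stored alongside it. */
static size_t image_footprint(size_t compressed_payload, size_t decompressor)
{
    return compressed_payload + decompressor;
}

/* Bytes gained by switching from a baseline compressor to a candidate.
 * Positive means the candidate image is smaller overall; negative means
 * the larger decompressor ate the payload savings (a net loss). */
static long footprint_gain(size_t base_payload, size_t base_decomp,
                           size_t cand_payload, size_t cand_decomp)
{
    return (long)image_footprint(base_payload, base_decomp)
         - (long)image_footprint(cand_payload, cand_decomp);
}
```

With placeholder numbers, a candidate that saves 50000 bytes of payload
but needs a decompressor 60000 bytes larger comes out 10000 bytes worse,
while one with equal payload and a 100 byte stub wins by the whole
difference in stub size.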
If you don't plan to have a framebuffer, you can shave some off of
it. If you don't plan to have jffs2 you can shave a lot more
off. Little tidbits here and there make the world go 'round.
> Well, I think I have finally been convinced to use the MTD drivers...
> Mostly I prefer to flash from a production kernel rather than a
> bootloader, there are more recovery options but anyway.
Ah yes, the ancient problem. Instead of read/modify/erase/write, it often
turns into read/modify/erase/poweroff. That's Bad.
I will see.
Does LAB restrict its kernel to a very small subset of
memory? Or do you use something like kexec?
To boot a secondary kernel I use some code I wrote called armboot, although
it's not very ARM-specific. It does something like this:
1) Load the new kernel into a contiguous vmalloced block.
2) We allocate 64k for a list of things that need to be relocated. We call
this a pointer of type "struct physlist", which is 32 bytes. It has four
ints: the new address, the old address, the block size, and whether this is
the last block.
2) In blocks of the maximum kmalloc size (these blocks have to be contiguous),
we kmalloc space for the kernel, and memcpy the kernel into those blocks. We
then fill in a struct physlist, and move on to the next struct physlist. We
can do this because kmalloc is always contiguous, and we can always map it
3) We set up another kmalloced block for the tagged list of boot parameters
that you need on ARM.
4) We set up one more kmalloced block and copy an assembler function into it,
to make sure we don't wipe ourselves out while relocating.
5) We flush our data caches.
6) We call the relocated assembler function, which turns off the MMU, jumps
into the relocated assembler function's physical address, and does actual
relocating. Then we jump into our newly moved zImage. Confused yet?
Nope. Having implemented something similar it sounds sane.
7) If at any point we failed, the system could be in an inconsistent state.
You will want to panic() if you fail, because you're leaking memory like a
sieve, and if you failed there's probably something bigger wrong.
This looks more difficult than it actually is. The C segment is only about 170
lines, and the assembler bit is 90 lines.
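
For readers following along, the relocation list from the steps above can
be sketched in user-space C. This is not the armboot code: the field names
and the relocate() helper are assumptions, addresses are modelled as
offsets into a byte array, and the real final copy runs in assembler with
the MMU off.

```c
#include <stdint.h>
#include <string.h>

/* One relocation record: four ints, as described in the steps above.
 * Field names are hypothetical. */
struct physlist {
    uint32_t newaddr; /* where the block must end up */
    uint32_t oldaddr; /* where the block currently sits */
    uint32_t size;    /* block size in bytes */
    uint32_t last;    /* nonzero on the final record */
};

/* Walk the list and move each block to its final place. "mem" stands in
 * for physical memory. memmove is used so that a block overlapping its
 * own destination is still copied correctly. */
static void relocate(uint8_t *mem, const struct physlist *pl)
{
    for (;;) {
        memmove(mem + pl->newaddr, mem + pl->oldaddr, pl->size);
        if (pl->last)
            break;
        pl++;
    }
}
```

The sketch only works if no block's destination covers a block that has
not been moved yet, which is the same constraint the real code relies on
by keeping the sources high in memory.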
The reason that this works is that kmalloc should allocate from the top of
memory down. You need a fair bit of ram - say, 8MB - to prevent the tail from
running over other important structures, such as the list of addresses to
relocate. But it seems to work well enough, and it looks like it should be
fairly portable. The important code is in handhelds.org
linux/kernel26, files drivers/bootldr/armboot.c and
Ok. I need to get into that kernel tree and take a look. But it sounds similar
to my kexec stuff. Which I discuss at least part of the time on fastboot(a)osdl.org.
It sounds compatible enough that we could productively merge implementations.
That, plus the fact that my kexec stuff is still on Andrew Morton's todo list,
means it has a fair shot of getting into the stock kernel.
On the practical side I think I can boost its priority high enough after I
get back to actually do something.
A recent version of kexec patch is at:
Kexec as it is currently structured is actually two system calls
callable from user space.
sys_kexec_load() loads the kernel into a linked list of pages, making
certain that when those pages are copied to their final destination
nothing will be stomped. And it allocates a chunk of memory with kmalloc
for the bit of code that copies the kernel to its final resting
place. This can fail at any time and the system is in a consistent state.
sys_reboot(LINUX_REBOOT_CMD_KEXEC) initiates the transfer to the new
kernel. The new kernel is started in physical mode.
sys_kexec_load() is passed an entry point to jump to, and an array
of physical destination address, virtual process space address, and virtual
length regions to load. Which allows us to load arbitrary things.
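
As a rough illustration of that region array, and of one check a loader
would want before copying anything, here is a sketch in C. The struct and
function names are hypothetical and are not the actual kexec interface;
the check shown is only the simplest "nothing gets stomped" test, namely
that no two destination ranges overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor: where the data lives in the calling
 * process, where it must land in physical memory, and how long it is. */
struct load_region {
    const void *user_buf;  /* source, virtual address in the caller */
    uintptr_t   phys_dest; /* destination physical address */
    size_t      len;       /* length in bytes */
};

/* Return 1 if no two destination ranges overlap, 0 otherwise.
 * Ranges are half-open: [phys_dest, phys_dest + len). */
static int regions_disjoint(const struct load_region *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            uintptr_t a0 = r[i].phys_dest, a1 = a0 + r[i].len;
            uintptr_t b0 = r[j].phys_dest, b1 = b0 + r[j].len;
            if (a0 < b1 && b0 < a1)
                return 0;
        }
    }
    return 1;
}
```

A kernel plus ramdisk would simply be two entries in such an array, which
is what makes the interface able to load arbitrary things.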
The only requirement is that you have enough memory for both kernels
simultaneously. For truly high end machines there are some other
restrictions because physical mode does not allow access to all of
their memory but anyway...
From the descriptions my kexec stuff is a little more general and a little
more robust than your armboot, so I'd like to merge your code with mine if
possible. Now that 2.6.0 is out I can start sending patches.
- I deliberately avoid vmalloc; the vmalloc area is a limited
resource, and I support loading large kernel ramdisk combinations.
- I avoid greater than page size memory allocations because random
memory fragmentation can make that fail.
- I don't leak memory.
- I have an implementation that works on SMP machines.
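
The page-at-a-time and no-leak points can be pictured with a user-space
sketch in C. malloc stands in for the kernel page allocator, CHUNK_SIZE is
an assumed page size, and all names are made up for illustration; this is
not the kexec code.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096 /* assumed page size */

/* One node per page-sized buffer; nothing larger is ever allocated. */
struct chunk {
    struct chunk  *next;
    size_t         used; /* valid bytes in data */
    unsigned char *data; /* CHUNK_SIZE bytes */
};

static void free_chunks(struct chunk *head)
{
    while (head) {
        struct chunk *next = head->next;
        free(head->data);
        free(head);
        head = next;
    }
}

/* Copy an image into a linked list of page-sized chunks. On any
 * allocation failure everything allocated so far is released, so the
 * caller is left in a consistent state. Returns NULL on failure (and
 * also for len == 0). */
static struct chunk *load_chunks(const unsigned char *src, size_t len)
{
    struct chunk *head = NULL, **tail = &head;

    while (len > 0) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
        struct chunk *c = malloc(sizeof(*c));
        if (!c)
            goto fail;
        c->next = NULL;
        c->used = n;
        c->data = malloc(CHUNK_SIZE);
        if (!c->data) {
            free(c);
            goto fail;
        }
        memcpy(c->data, src, n);
        *tail = c;
        tail = &c->next;
        src += n;
        len -= n;
    }
    return head;

fail:
    free_chunks(head);
    return NULL;
}
```

Because every allocation is the same small size, fragmentation cannot make
one big request fail halfway through a load.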
The worst part is getting all of the drivers to shut themselves down
properly. But I have the appropriate hooks and it doesn't take an
extension of the kernel api just lots of driver bug fixes.
If even the ARM guys are up to using a kernel it should be easier
to make progress in this area. On several projects the people
I have talked to are too size constrained to even try using a kernel.