Not long ago I worked my way through the UPX code and figured out what I needed to do to use it in another project. I then started using it as the decompressor for Etherboot.
Today I went through and replaced the copy from ROM to RAM in crt0.base with a decompression step. This lets me keep all of the C code compressed while it is stored in the ROM chip, and decompress it only as it is copied to RAM. The UPX decompressor is roughly 180 bytes, and I get a compression ratio of about 2:1.
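To make the idea concrete, here is a minimal sketch in C of a "decompress instead of copy" step. It is not the UPX decompressor and not the code in crt0.base: the toy byte format and the name toy_decompress are invented for illustration, and the real step runs from startup code before C is available. The shape is the same, though: read the compressed image from ROM and write the expanded image to RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy format (NOT the format UPX uses):
 *   ctrl <  0x80 : copy (ctrl + 1) literal bytes from the input
 *   ctrl >= 0x80 : copy (ctrl - 0x80 + 3) bytes starting `dist` bytes
 *                  back in the output; dist is the next input byte (1..255)
 *
 * Returns the number of bytes written to dst, or 0 if the stream is
 * malformed or dst is too small.
 */
static size_t toy_decompress(const uint8_t *src, size_t src_len,
                             uint8_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);

    while (in < src_len) {
        uint8_t ctrl = src[in++];

        if (ctrl < 0x80) {
            /* Literal run: bytes come straight from the "ROM" image. */
            size_t n = (size_t)ctrl + 1;
            if (n > src_len - in || n > dst_cap - out)
                return 0;
            memcpy(dst + out, src + in, n);
            in += n;
            out += n;
        } else {
            /* Match: bytes come from data already written to "RAM". */
            size_t n = (size_t)(ctrl - 0x80) + 3;
            size_t dist;
            if (in >= src_len)
                return 0;
            dist = src[in++];
            if (dist == 0 || dist > out || n > dst_cap - out)
                return 0;
            /* Byte by byte, so a match may overlap its own output. */
            for (size_t i = 0; i < n; i++) {
                dst[out] = dst[out - dist];
                out++;
            }
        }
    }
    return out;
}
```

In this sketch the caller passes the compressed image where it used to pass the source of the plain copy, and the destination buffer must be sized for the uncompressed image rather than the stored one.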
When I start compiling in a lot of debug messages, or the Intel microcode updates, this compressor gives me a lot more room to work with. I reduced my footprint on the E7500 port by 20KB :)
This interacts badly with my implementation of the cache-as-RAM trick, but that code wasn't stable enough to really trust. :(
And in theory, since I am moving less data off of the ROM chip, things should go even faster.
Eric