Hi,
what was the motivation to select nrv2b for compression? Speed? Size of decompressor? Coding style?
Size    File
843910  vmlinux.bin
437684  vmlinux.bin.nrv2b
430890  vmlinux.bin.gz
414679  vmlinux.bin.bz2
357609  vmlinux.bin.7z
If the 7z decompressor is less than 80 kb bigger than the nrv2b decompressor, would it make sense to switch?
Regards, Carl-Daniel
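For reference, the arithmetic behind the "80 kb" figure can be redone in a few lines of C. This is only a sketch over the five sizes listed in the mail; the struct and function names are made up for illustration and do not come from the LinuxBIOS tree.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the size table from the mail (sizes in bytes). */
struct image {
    const char *name;
    long size;
};

static const struct image images[] = {
    { "vmlinux.bin",       843910 },  /* uncompressed */
    { "vmlinux.bin.nrv2b", 437684 },
    { "vmlinux.bin.gz",    430890 },
    { "vmlinux.bin.bz2",   414679 },
    { "vmlinux.bin.7z",    357609 },
};

/* Bytes saved by storing `candidate` instead of `baseline`. */
static long bytes_saved(long baseline, long candidate)
{
    return baseline - candidate;
}

/* Compressed size as a whole-number percentage of the original (truncated). */
static long percent_of(long compressed, long original)
{
    assert(original > 0);
    return compressed * 100 / original;
}

/* Print every compressed image relative to the original and to nrv2b. */
static void print_comparison(void)
{
    const long original = images[0].size;
    const long nrv2b = images[1].size;
    size_t i;

    for (i = 1; i < sizeof(images) / sizeof(images[0]); i++)
        printf("%-18s %7ld bytes, %ld%% of original, %ld bytes smaller than nrv2b\n",
               images[i].name, images[i].size,
               percent_of(images[i].size, original),
               bytes_saved(nrv2b, images[i].size));
}
```

The 7z image comes out 80075 bytes smaller than the nrv2b one, which is the budget the question above allows for a bigger decompressor.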
Carl-Daniel Hailfinger wrote:
> Hi,
> what was the motivation to select nrv2b for compression? Speed? Size of decompressor? Coding style?
> Size    File
> 843910  vmlinux.bin
> 437684  vmlinux.bin.nrv2b
> 430890  vmlinux.bin.gz
> 414679  vmlinux.bin.bz2
> 357609  vmlinux.bin.7z
> If the 7z decompressor is less than 80 kb bigger than the nrv2b decompressor, would it make sense to switch?
It's fine with me. As I said, I had a gzip decompressor in early linuxbios, and we can look at getting it back. I just stole the one from the kernel. nrv2b is nice, I understand, for the romcc section where you don't have many variables to play with, but it's been so long since I looked at the gunzip code that I have no idea if this is true.
ron
* Ronald G Minnich rminnich@lanl.gov [060522 07:22]:
> It's fine with me. As I said, I had a gzip decompressor in early linuxbios, and we can look at getting it back. I just stole the one from the kernel. nrv2b is nice, I understand, for the romcc section where you don't have many variables to play with, but it's been so long since I looked at the gunzip code that I have no idea if this is true.
We never actually compile nrv2b with romcc, do we? At the point we run nrv2b ram has to be enabled anyways - we uncompress the code somewhere. So our limits are far bigger than that.
It is in crt0.s, and it is used to uncompress linuxbios_ram when using romcc.
YH
On 5/22/06, Stefan Reinauer stepan@coresystems.de wrote:
> * Ronald G Minnich rminnich@lanl.gov [060522 07:22]:
> > It's fine with me. As I said, I had a gzip decompressor in early linuxbios, and we can look at getting it back. I just stole the one from the kernel. nrv2b is nice, I understand, for the romcc section where you don't have many variables to play with, but it's been so long since I looked at the gunzip code that I have no idea if this is true.
> We never actually compile nrv2b with romcc, do we? At the point we run nrv2b ram has to be enabled anyways - we uncompress the code somewhere. So our limits are far bigger than that.
> --
> coresystems GmbH • Brahmsstr. 16 • D-79104 Freiburg i. Br.
> Tel.: +49 761 7668825 • Fax: +49 761 7664613
> Email: info@coresystems.de • http://www.coresystems.de/
> --
> linuxbios mailing list
> linuxbios@linuxbios.org
> http://www.openbios.org/mailman/listinfo/linuxbios
On 5/22/06, Stefan Reinauer stepan@coresystems.de wrote:
> We never actually compile nrv2b with romcc, do we? At the point we run nrv2b ram has to be enabled anyways - we uncompress the code somewhere. So our limits are far bigger than that.
yhlu wrote:
> It is in crt0.s, and it is used to uncompress linuxbios_ram when using romcc.
You mean copy_and_run() is compiled with romcc? Can we change that?
What's the difference between CONFIG_COMPRESSED_ROM_STREAM and CONFIG_COMPRESS? Can we use lzma compression for both?
Regards, Carl-Daniel
* Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net [060522 21:29]:
> On 5/22/06, Stefan Reinauer stepan@coresystems.de wrote:
> > We never actually compile nrv2b with romcc, do we? At the point we run nrv2b ram has to be enabled anyways - we uncompress the code somewhere. So our limits are far bigger than that.
> yhlu wrote:
> > It is in crt0.s, and it is used to uncompress linuxbios_ram when using romcc.
> You mean copy_and_run() is compiled with romcc? Can we change that?
No, it is not. But src/arch/i386/init/crt0.S.lb contains another assembly written version of an nrv2b decompressor.
crt0.s should be changed to call the C version of unrv2b.
Then it can easily be changed later to use any other algorithm.
Using assembly at this point even though ram is enabled already must be some code fragment from old times when we did a lot more assembly.
This might require though that we have
1. auto.c stage (romcc)
2. uncompression stage (gcc)
3. compressed linuxbios stage (gcc, initially compressed)
This reads ugly in a mail but is by no means uglier than keeping assembler versions of decompressors in the code.
> What's the difference between CONFIG_COMPRESSED_ROM_STREAM and CONFIG_COMPRESS? Can we use lzma compression for both?
CONFIG_COMPRESSED_ROM_STREAM says: the payload is compressed.
CONFIG_COMPRESS says: LinuxBIOS is compressed.
> > What's the difference between CONFIG_COMPRESSED_ROM_STREAM and CONFIG_COMPRESS? Can we use lzma compression for both?
> CONFIG_COMPRESSED_ROM_STREAM says: the payload is compressed.
> CONFIG_COMPRESS says: LinuxBIOS is compressed.
Perhaps we should rename those to something more indicative of what they do? Say:
CONFIG_COMPRESSED_ROM_STREAM -> CONFIG_COMPRESSED_PAYLOAD
CONFIG_COMPRESS -> CONFIG_COMPRESSED_LINUXBIOS
Stefan Reinauer wrote:
> 1. auto.c stage (romcc)
> 2. uncompression stage (gcc)
> 3. compressed linuxbios stage (gcc, initially compressed)
What I think of as the long term goal for OLPC:
1. auto.c stage: CAR, gunzip gcc code to ram
2. gcc stage: turn on the hardware, gunzip vga bios, other pieces, and payload to ram
3. payload runs, may do further gunzip work
the gunzip was removed from V1 because linux had its own decompressor in the bzimage. Given that there was, at the time, only one gunzip step, using the kernel's gunzip was deemed to save space. On OLPC, which is representative of the new systems, there are several distinct compressed bits:
1. nrv2b for gcc-compiled linuxbios
2. nrv2b for OLPC VSA binary
3. nrv2b for linux+initrd
4. nrv2b for VGA BIOS
it's clear that the balance of compression is now in linuxbios. So, in my view, it's time to move the decompressor back into linuxbios and dump the one we use in the kernel. This will help Plan 9 as well. We should also move to a better compression algorithm, I think.
I've already modified mkelfimage to just create 3 segments:
1. kernel
2. cmdline+params (at 0x90000)
3. initrd (at 0x8000000)
So this is no real problem.
ron
Ronald G Minnich wrote:
> Stefan Reinauer wrote:
> > 1. auto.c stage (romcc)
> > 2. uncompression stage (gcc)
> > 3. compressed linuxbios stage (gcc, initially compressed)
> What I think of as the long term goal for OLPC:
> 1. auto.c stage: CAR, gunzip gcc code to ram
> 2. gcc stage: turn on the hardware, gunzip vga bios, other pieces, and payload to ram
> 3. payload runs, may do further gunzip work
Do we really want to gunzip anything in CAR stage? How big is the code to turn on RAM? Will it really benefit from being compressed? I would turn on RAM first and only then start uncompression.
At what stage am I allowed to use malloc? gcc stage? payload stage? Can I abuse the function stack before?
Another thing that bothers me: It seems we include the uncompression code multiple times. Can we include it only once and make all stages use it?
> the gunzip was removed from V1 because linux had its own decompressor in the bzimage. Given that there was, at the time, only one gunzip step, using the kernel's gunzip was deemed to save space. On OLPC, which is representative of the new systems, there are several distinct compressed bits:
> 1. nrv2b for gcc-compiled linuxbios
> 2. nrv2b for OLPC VSA binary
> 3. nrv2b for linux+initrd
> 4. nrv2b for VGA BIOS
Can you upload all these files somewhere so I can make a size comparison and work out the best compression parameters for lzma?
I have a (not completely finished) script called squeeze_lzma which will run lzma with different parameters and figure out which yield the best compression for the unlimited RAM case and for the small RAM case. So we can use compression which needs only 5k RAM for decompression in stage 1 and 60k RAM in stage 2+3.
Regards, Carl-Daniel
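The squeeze_lzma script itself is not shown in the thread. The sketch below only illustrates why the lzma parameters change the RAM needed for decompression: the decoder's probability table grows with the literal context bits (lc) and literal position bits (lp). The two constants are quoted from memory of the LZMA SDK decoder and the 16-bit counter width is an assumption, so please check both against the SDK copy you actually use. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as recalled from the LZMA SDK decoder; verify before relying on them. */
#define LZMA_BASE_SIZE 1846u
#define LZMA_LIT_SIZE  768u

/* Number of probability counters for the given lc and lp settings. */
static size_t lzma_num_probs(unsigned lc, unsigned lp)
{
    assert(lc <= 8 && lp <= 4);   /* limits of the LZMA properties byte */
    return LZMA_BASE_SIZE + ((size_t)LZMA_LIT_SIZE << (lc + lp));
}

/* Workspace in bytes, assuming 16-bit probability counters. */
static size_t lzma_workspace_bytes(unsigned lc, unsigned lp)
{
    return lzma_num_probs(lc, lp) * sizeof(uint16_t);
}
```

With lc = 0 and lp = 0 this gives 5228 bytes, in the same range as the 5k figure mentioned for stage 1; lc = 3 and lp = 0 gives 15980 bytes.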
We need the decompressor at least twice:
1. One in the CAR stage; its code is in ROM, and it is used to uncompress linuxbios_ram.
2. One in linuxbios_ram; it is in RAM, and it is used to uncompress the payload.
In the CAR stage, before copy_and_run is called, the stack is already in RAM but the code is still in ROM. So at this point we need to balance the benefit of lzma-compressing linuxbios_ram against the increase in the decompressor's own code size in ROM. We also need to set up one extra buffer in RAM for every AP.
Hope we can squash car stage code + linuxbios_ram code into 64k again.
YH
yhlu wrote:
> We need the decompressor at least twice:
> 1. One in the CAR stage; its code is in ROM, and it is used to uncompress linuxbios_ram.
> 2. One in linuxbios_ram; it is in RAM, and it is used to uncompress the payload.
> In the CAR stage, before copy_and_run is called, the stack is already in RAM but the code is still in ROM. So at this point we need to balance the benefit of lzma-compressing linuxbios_ram against the increase in the decompressor's own code size in ROM. We also need to set up one extra buffer in RAM for every AP.
OK, so we have to check which of the two solutions is smaller.
1) lzma decompressor + lzma compressed linuxbios_ram
2) nrv2b decompressor + nrv2b compressed (linuxbios_ram + lzma decompressor)
I just checked this for all abuild targets. The second variant wins for half of the targets. That brings me to a third option:
3) nrv2b decompressor + nrv2b compressed lzma decompressor + lzma compressed linuxbios_ram
nrv2b decompressor needs IIRC 512 bytes, lzma decompressor needs 4096 bytes, so if nrv2b only compresses the lzma decompressor and we assume ~50% savings by compression, option 3 is always 1.5k smaller than option 1 and option 3 is 1.5k-7.3k smaller (2k for OLPC) than option 2.
How difficult would it be for us to build option 3?
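The comparison of the three options can be written down as a small size model. This is a sketch under the assumptions stated in the mail (512 bytes for the nrv2b decompressor, 4096 bytes for the lzma decompressor, about 50% savings when nrv2b packs the lzma decompressor), plus one simplification of mine: in option 2 the two nrv2b-compressed parts are treated as compressing independently. The struct and function names are hypothetical, and the per-board sizes have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Inputs for one board, all in bytes. */
struct board_sizes {
    size_t nrv2b_decomp;  /* 512 "IIRC" in the mail */
    size_t lzma_decomp;   /* 4096 in the mail */
    size_t ram_nrv2b;     /* linuxbios_ram compressed with nrv2b (measured per board) */
    size_t ram_lzma;      /* linuxbios_ram compressed with lzma (measured per board) */
};

/* Assumption from the mail: nrv2b shrinks the lzma decompressor by about 50%. */
static size_t lzma_decomp_packed(const struct board_sizes *s)
{
    assert(s != NULL);
    return s->lzma_decomp / 2;
}

/* Option 1: lzma decompressor + lzma compressed linuxbios_ram. */
static size_t option1(const struct board_sizes *s)
{
    return s->lzma_decomp + s->ram_lzma;
}

/* Option 2: nrv2b decompressor + nrv2b compressed (linuxbios_ram + lzma decompressor). */
static size_t option2(const struct board_sizes *s)
{
    return s->nrv2b_decomp + s->ram_nrv2b + lzma_decomp_packed(s);
}

/* Option 3: nrv2b decompressor + nrv2b compressed lzma decompressor
 * + lzma compressed linuxbios_ram. */
static size_t option3(const struct board_sizes *s)
{
    return s->nrv2b_decomp + lzma_decomp_packed(s) + s->ram_lzma;
}
```

Under these assumptions option 1 minus option 3 is 4096 - 512 - 2048 = 1536 bytes for every board, which is the "always 1.5k" above, while option 2 minus option 3 equals ram_nrv2b - ram_lzma and therefore varies per board.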
> Hope we can squash car stage code + linuxbios_ram code into 64k again.
Depending on the board, that might be difficult. For OLPC, everything fits nicely into 32k. For most Tyan boards, we suffer from the fact that auto.o is uncompressed and ~28k alone. That is really horrible. Any reasons for that?
Regards, Carl-Daniel
On 5/23/06, Carl-Daniel Hailfinger c-d.hailfinger.devel.2006@gmx.net wrote:
> Depending on the board, that might be difficult. For OLPC, everything fits nicely into 32k. For most Tyan boards, we suffer from the fact that auto.o is uncompressed and ~28k alone. That is really horrible. Any reasons for that?
That is because romcc cannot make function calls. With CAR (cache_as_ram_auto.c) you can make that smaller.
YH
OK, then we may let linuxbios_ram.rom, linuxbios_apc.rom and the vsm code stay with nrv2b if lzma has more overhead in the CAR stage.
YH
* yhlu yinghailu@gmail.com [060523 21:49]:
> 2. One in linuxbios_ram; it is in RAM, and it is used to uncompress the payload.
> In the CAR stage, before copy_and_run is called, the stack is already in RAM but the code is still in ROM. So at this point we need to balance the benefit of lzma-compressing linuxbios_ram against the increase in the decompressor's own code size in ROM.
We have to be careful here, indeed.
> We also need to set up one extra buffer in RAM for every AP.
Can't the APs share the code?
The uncompress code is shared because it is in ROM, and every CPU has its own stack in its cache.
YH
yhlu wrote:
> The uncompress code is shared because it is in ROM, and every CPU has its own stack in its cache.
So a decompressor is in ROM, reads from ROM, writes to RAM and has its stack in cache? Or do we uncompress to cache?
How big is that cache? I can modify lzma to only need 6000 bytes stack and no additional memory allocations.
Are reads from ROM special (alignment problems etc)? Can I treat stack like normal RAM?
Regards, Carl-Daniel
Just let the uncompress function take one more parameter for the properties buffer, and let every CPU find its own buffer in the BSP's RAM. That should be OK, because the RAM on the first node is ready at that time.
YH
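One way to read "every CPU finds its buffer" is a RAM region cut into equally sized slots, indexed by CPU. The sketch below shows only that indexing. The names `region`, `slot_size` and `cpu_index` are hypothetical, and how a CPU learns its index (APIC ID or otherwise) is left open because the thread does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the decompressor workspace for one CPU inside a RAM region that
 * is split into equally sized slots, one slot per CPU. The caller must
 * make sure the region holds at least (cpu_index + 1) * slot_size bytes.
 */
static void *cpu_workspace(void *region, size_t slot_size, unsigned cpu_index)
{
    assert(region != NULL);
    return (uint8_t *)region + (size_t)cpu_index * slot_size;
}
```

The decompressor code stays shared in ROM; only the pointer handed to it differs per CPU.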
yhlu wrote:
> Just let the uncompress function take one more parameter for the properties buffer, and let every CPU find its own buffer in the BSP's RAM. That should be OK, because the RAM on the first node is ready at that time.
How about
int uncompress_generic(void *src, void *dst, void *buf, int bufsize);
and uncompress_generic can assume that buf can be treated like RAM. If bufsize is too small, it can return an error. An additional function
int needed_bufsize(void *src);
could be written to compute the minimum buffer size for a given compressed image.
What do you think?
Regards, Carl-Daniel
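To make the proposed calling convention concrete, here is a toy sketch of the two functions. The prototypes are the ones from the mail. Everything else is an assumption for illustration only: the 8-byte header layout is invented, and a plain "stored" copy stands in for a real nrv2b or lzma decoder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value byte by byte, so alignment of src does not matter. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical header: bytes 0..3 = workspace needed, bytes 4..7 = payload length. */
int needed_bufsize(void *src)
{
    assert(src != NULL);
    return (int)read_le32((const uint8_t *)src);
}

/* Returns the number of bytes written to dst, or -1 if buf is too small. */
int uncompress_generic(void *src, void *dst, void *buf, int bufsize)
{
    const uint8_t *in = src;
    int need = needed_bufsize(src);
    uint32_t len = read_le32(in + 4);

    assert(dst != NULL && buf != NULL);
    if (need < 0 || bufsize < need)
        return -1;

    /* A real decoder would keep its state in buf; the toy only clears it. */
    memset(buf, 0, (size_t)need);
    memcpy(dst, in + 8, len);   /* "stored" payload: copied unchanged */
    return (int)len;
}
```

A caller would first ask needed_bufsize() how much workspace the image wants, pick or reject a buffer, and only then call uncompress_generic().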
Carl-Daniel Hailfinger wrote:
> yhlu wrote:
> > Just let the uncompress function take one more parameter for the properties buffer, and let every CPU find its own buffer in the BSP's RAM. That should be OK, because the RAM on the first node is ready at that time.
> How about
> int uncompress_generic(void *src, void *dst, void *buf, int bufsize);
> and uncompress_generic can assume that buf can be treated like RAM. If bufsize is too small, it can return an error. An additional function
> int needed_bufsize(void *src);
> could be written to compute the minimum buffer size for a given compressed image.
> What do you think?
> Regards, Carl-Daniel
int needed_bufsize(void *src);
good. And in most cases it is: size = *(unsigned long *)src;
int uncompress_generic(void *src, void *dst, void *buf)
If the programmer is too stupid to call needed_bufsize, they're probably too stupid to pass in a correct bufsize to this function, so I think the bufsize parameter is not needed.
ron
int uncompress_generic(void *src, void *dst, void *buf)
==>
int uncompress_generic(void *src, void *dst, uint32_t *ilen, void *op_buf)
YH