On Fri, Apr 13, 2012 at 15:56, Mark Cave-Ayland mark.cave-ayland@ilande.co.uk wrote:
On 10/04/12 18:41, Blue Swirl wrote:
Yes, but again, image size savings are not very interesting. Savings in image loading time are easily lost during decompression.
Well, the use case I'm considering at the moment is that if you build OpenBIOS SPARC32 with -O0 -g and then try to load it into QEMU, QEMU reports that the image is too large. So I'm guessing this is a limitation of the memory layout of the machine, rather than the space occupied by the binary.
I'm not sure we should have this restriction on ROM size in QEMU. Currently PROM_SIZE_MAX in QEMU is 1M, but since the CPU starts in boot mode, where the ROM is visible for instruction access from address 0 upwards rather than at the end of memory, it could be much larger. The next device is located 16MB higher on both SS-5 and SS-20, so the maximum could easily be 16MB.
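(As a rough illustration of the check in question: the refusal Mark is hitting boils down to something of this shape. This is a simplified sketch rather than the actual QEMU source; PROM_SIZE_MAX is the real 1MB constant, while the helper name and message are placeholders.)

#include <stdio.h>
#include <stdlib.h>

#define PROM_SIZE_MAX (1024 * 1024)  /* the current 1MB limit on the PROM image */

/* Placeholder helper: refuse any image that does not fit the PROM window.
 * An OpenBIOS image built with -O0 -g can easily exceed 1MB, so loading
 * fails here before OpenBIOS ever runs. */
static void check_prom_size(const char *filename, long image_size)
{
    if (image_size < 0 || image_size > PROM_SIZE_MAX) {
        fprintf(stderr, "qemu: could not load prom '%s'\n", filename);
        exit(1);
    }
}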
However, the ROM is soon copied to RAM by OpenBIOS, and the RAM copy is mapped from 0xffd00000 upwards (a 3MB hole). In this hole we also have to reserve virtual address space (VA) for devices, especially the frame buffer (1MB), so the VA available at run time is limited. So I don't see how any compression would help, because after decompression the decompressed image (which should actually be slightly bigger than the original because of the decompression code) must still fit into the available area.
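(To put a rough number on it: of the 3MB hole starting at 0xffd00000, at least 1MB goes to the frame buffer mapping, so the RAM copy of OpenBIOS, the dictionary in whatever form it ends up, and any decompression code all have to share what remains, roughly 2MB at most once the other device mappings are also taken out.)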
It might be possible to perform some kind of bank switching to avoid using more address space, or, for example, the frame buffer could be unmapped on Forth entry and remapped afterwards, but this is IMHO too tricky.
We could also use the other VA areas mentioned in the standard, or just blatantly use more VA than we should; probably everything would still work, because the OS shouldn't make assumptions about what allocations OF has made anyway.
Maybe converting Forth to C or vice versa would help; at the very least, fword("Forth stuff in a string") should be more efficient as either pure Forth or pure C, rather than the current combination of the string plus the C code for the function call.
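(As a rough sketch of the pattern being described, assuming the usual OpenBIOS C bindings push_str(), fword() and feval() are in scope; the header path, device path and node name below are made up purely for illustration.)

#include "libopenbios/bindings.h"  /* assumed location of the C-to-Forth bindings */

/* The mixed style: Forth driven from C one step at a time, paying for both
 * the embedded source strings and the C call overhead on every step. */
static void example_node_setup(void)
{
    push_str("/openprom");              /* hypothetical device path */
    fword("find-device");
    feval("\" example\" device-name");  /* hypothetical node name */
}

/* The same logic written purely in Forth (compiled into the dictionary at
 * build time) needs neither the C strings nor the per-call overhead:
 *
 *   " /openprom" find-device
 *   " example" device-name
 */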
Secondly, if we're copying the data to an OFMEM-allocated area, then why don't we just compress it at build time and decompress it during the copy using DEFLATE (see my previous email)? Then we can further reduce the dictionary payload size from ~180K to around ~37K. As you rightly point out, there could be a small delay on startup, but given the small size involved (and the fact that we can lock the TLB entry as we do with the Forth machine), I don't think the penalty will be too bad.
But it's only the decompressed result that matters wrt memory usage, isn't it? After Forth has started, there shouldn't be any difference in RAM use, except that the decompressor code takes more space.
Again for the use case above, this would not be a problem if we were to decompress and relocate the dictionary into RAM.
I'm fairly confident I can come up with an experimental patch for this - would people mind if we added zlib as a build-time dependency, and puff.c to the OpenBIOS codebase? At the very least, if the decompression appears too expensive, the first stage on its own would still be a good idea.
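(As a rough sketch of what the decompress-during-copy step might look like: puff() is the real entry point from zlib's contrib/puff and expects a raw DEFLATE stream, so the build side would need to emit one, e.g. via zlib's deflateInit2() with a negative windowBits. The payload symbols and sizes below are made-up placeholders for whatever the build ends up embedding.)

#include "puff.h"  /* zlib contrib/puff: small, allocation-free inflate */

/* Placeholder symbols marking the deflate-compressed dictionary that the
 * build embeds in the image: */
extern unsigned char _dict_deflate_start[], _dict_deflate_end[];

/* Inflate the dictionary into an OFMEM-allocated buffer instead of just
 * memcpy()ing it. 'destlen' is the uncompressed size recorded at build
 * time; puff() returns 0 on success. */
static int dict_decompress(unsigned char *dest, unsigned long destlen)
{
    unsigned long srclen = _dict_deflate_end - _dict_deflate_start;

    return puff(dest, &destlen, _dict_deflate_start, &srclen);
}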
For maximum compression, a bzImage-style (still remember those?) approach could be used for the whole image.
I can't say I've ever looked at that. But if you're happy to at least consider a patch, then I'd like to invest some time looking at this.
ATB,
Mark.