Attention is currently required from: Arthur Heymans, Martin L Roth, Shelley Chen, Wonkyu Kim.
Julius Werner has posted comments on this change. ( https://review.coreboot.org/c/coreboot/+/69753?usp=email )
Change subject: util/cbfstool: Add zstd support
Patch Set 3:
(1 comment)
File src/commonlib/zstd-1.5.2/lib/common/mem.h:
https://review.coreboot.org/c/coreboot/+/69753/comment/06e8a8eb_4800f5a6 : PS1, Line 174: return one.c[0];
Hi, […]
Thanks for the numbers, those are a good data point. It looks like your input size for ZSTD is larger than for the others, though. I assume that's because you compiled the ZSTD code into ramstage for that run but not for the LZMA run? That's not really a fair comparison, because if we found that ZSTD is a lot better than LZMA we could drop LZMA instead. I would suggest running the compression ratio tests with all algorithms always compiled in, so that every algorithm compresses the same input, and maybe evaluating code size separately.
Also, can you clarify which compression level you used for ZSTD? (Did you play with compression levels at all? Do they affect decompression speed?)
Otherwise, the decompression time results here seem a bit disappointing compared to the numbers you can find on the internet. Those published numbers are a bit all over the place, but in general it seems like ZSTD is supposed to be about 5-10 times faster than LZMA (not just 2 times), and should only be 4-8 times slower than LZ4 (not 25 times). Looks like there may still be some optimization potential in the implementation itself here? Does it compile low-level primitives down to the right instructions? Does the benchmark version maybe use special instructions like SSE that we don't (and if so, could we)?
Still, the net result here may be that even with full optimization, SPI transfer speeds on recent Intel platforms are so fast that ZSTD isn't going to outcompete LZ4. If that's the case I would encourage looking into https://github.com/inikep/lizard, which promises to be an algorithm that settles in between LZ4 and ZSTD on the compression ratio vs. decompression speed scale.
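To make that trade-off concrete, here is a rough back-of-the-envelope model. It assumes the flash read and the decompression run back to back (no overlap), and the function names and all numbers in it are hypothetical placeholders, not measurements from this change. It only shows that there is a break-even SPI throughput above which the faster but weaker compressor wins.

```python
def load_time_ms(raw_bytes, ratio, decomp_mb_per_s, spi_mb_per_s):
    """Estimate the time to read a compressed stage from flash and unpack it.

    ratio           -- compressed size / raw size (smaller is better)
    decomp_mb_per_s -- decompressor throughput, counted in output bytes
    spi_mb_per_s    -- sustained flash read throughput
    Assumes the read and the decompression are serial, not overlapped.
    """
    read_s = raw_bytes * ratio / (spi_mb_per_s * 1e6)
    decomp_s = raw_bytes / (decomp_mb_per_s * 1e6)
    return (read_s + decomp_s) * 1000.0


def break_even_spi_mb_per_s(fast, dense):
    """SPI throughput at which both algorithms give the same load time.

    Each argument is a (ratio, decomp_mb_per_s) tuple. `fast` decompresses
    faster but compresses worse than `dense`. Below the returned throughput
    `dense` wins, above it `fast` wins.
    """
    fast_ratio, fast_speed = fast
    dense_ratio, dense_speed = dense
    return (fast_ratio - dense_ratio) / (1.0 / dense_speed - 1.0 / fast_speed)


# Placeholder numbers for illustration only, NOT measurements from this change.
FAST = (0.6, 400.0)   # LZ4-like: weaker ratio, quick decompression
DENSE = (0.4, 100.0)  # ZSTD-like: better ratio, slower decompression

if __name__ == "__main__":
    print("break-even SPI speed: %.1f MB/s" % break_even_spi_mb_per_s(FAST, DENSE))
    for spi in (10.0, 50.0):
        print("SPI %5.1f MB/s: fast %.1f ms, dense %.1f ms" % (
            spi,
            load_time_ms(1_000_000, *FAST, spi),
            load_time_ms(1_000_000, *DENSE, spi),
        ))
```

Plugging in the measured ratio and decompression throughput for each algorithm, plus the real SPI read speed of the board, should show quickly which side of the break-even point a given platform is on.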
(BTW I don't really understand your FSP results. Are you saying that the ZSTD compression ratio is lower than the LZ4 one for that file? Are you sure that's not a measurement error? The compression ratios are also very low compared to the ramstage... I wonder if the reason for that, if those numbers are accurate, is that a large part of the FSP is either already compressed or encrypted, and LZ4 is just able to skip incompressible data with less overhead.)
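The incompressible-data hypothesis is easy to sanity-check in isolation. The sketch below uses Python's bundled zlib and lzma modules purely as stand-ins (they are not the LZ4 or ZSTD implementations under review), and pseudo-random bytes as a stand-in for encrypted or already-compressed regions. The helper names are made up for illustration. It does not reproduce any LZ4 vs. ZSTD ordering, it only shows how a blob that is half incompressible caps the achievable ratio for any algorithm.

```python
import lzma
import random
import zlib


def ratio(compress, data):
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(compress(data)) / len(data)


def sample_blobs(size=64 * 1024, seed=0):
    """Build three inputs: compressible, incompressible, and half of each."""
    rng = random.Random(seed)
    pattern = b"coreboot ramstage payload "
    text = (pattern * (size // len(pattern) + 1))[:size]
    # Pseudo-random bytes approximate encrypted or already-compressed data.
    noise = bytes(rng.getrandbits(8) for _ in range(size))
    mixed = text[: size // 2] + noise[: size // 2]
    return {"text": text, "noise": noise, "mixed": mixed}


if __name__ == "__main__":
    for name, blob in sample_blobs().items():
        print("%-5s zlib %.3f  lzma %.3f" % (
            name, ratio(zlib.compress, blob), ratio(lzma.compress, blob)))
```

If the FSP binary behaves like the "mixed" input here, running each compressor over the individual FSP components separately would show which parts are responsible for the low overall ratio.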