Re: Broader discussion - limit dictionary encoding to one compression algorithm?

On Tue, May 21, 2024 at 5:05 PM Patrick Meenan <patmeenan@gmail.com> wrote:

> - Brotli is limited to 50MB dictionaries, Zstandard can go up to 128MB.
>

These limits are artificial. We could change Brotli's to a gigabyte if we
wanted to.

Allowing it to use larger dictionaries naturally increases memory use.


> - Brotli uses 16MB of ram for the window while compressing/decompressing
> independent of the dictionary size, Zstandard requires a window (RAM) as
> large as the resource being compressed (for the delta case).
>

The same applies here. Brotli has a large-window mode with no artificial
16 MB limitation. I only added that limitation originally because Chrome
made it a launch criterion.
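For a rough sense of the memory involved (a back-of-the-envelope sketch: the
figures below are just 2**lgwin for the standard and large-window caps, not
the total encoder state of any real implementation):

```python
# Sliding-window RAM for an LZ77-style codec is roughly 2**lgwin bytes.
# Standard Brotli caps lgwin at 24 (16 MiB); large-window mode raises
# the cap to 30 (1 GiB). This counts only the window, not other state.

def window_bytes(lgwin: int) -> int:
    """Window size in bytes for a given log2 window size."""
    return 1 << lgwin

standard_max = window_bytes(24)  # standard Brotli maximum
large_max = window_bytes(30)     # large-window-mode maximum

print(standard_max // (1 << 20), "MiB")  # 16 MiB
print(large_max // (1 << 20), "MiB")     # 1024 MiB
```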


> - Brotli at max compression is ~10-20% smaller than Zstandard at max
> compression with dictionary (current implementations).
>

This is partially because of context modelling.

Brotli's dictionary mechanism has a human-readable-dictionary mode that is
not yet used in dictionary generation (outside of the internal dictionary).
Once we use it, it should improve dictionary efficiency for human-readable
languages not covered by Brotli's static dictionary (such as Armenian,
Vietnamese, etc.) by about 25% over an ordinary dictionary.


> - Zstandard benefits from dictionary use across all compression levels,
> Brotli only benefits from dictionaries at level 5 and above (current
> implementations).
>

These are encoder-only decisions and can be changed.

As things stand right now, if you have resources > 50MB and < 128MB you
> can't use brotli to delta-encode them (even in the web case we have already
> seen this with some large WASM apps).
>

We can easily change this.


> If you have static resources < 50MB and can do the compression at build
> time you would benefit from an additional 10-20% savings by using brotli
> (current cli anyway).
>

Brotli has context modeling, which helps compression quite a bit. It also
has a more elaborate 'entropy code dance' that lets it switch between
entropy codes very cheaply.
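This is not Brotli's actual context model, but a toy sketch of why context
modeling helps at all: conditioning the symbol distribution on the previous
byte (an order-1 model) never yields higher entropy than a single order-0
model, so a format that can switch entropy codes cheaply per context has
room to win.

```python
# Toy illustration (not Brotli's real model): compare order-0 entropy
# with order-1 entropy, where the previous byte is the "context".
from collections import Counter, defaultdict
from math import log2

def entropy(counts: Counter) -> float:
    """Shannon entropy in bits/symbol of an empirical distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

data = b"the quick brown fox jumps over the lazy dog " * 20

# Order-0: one distribution over all bytes.
order0 = entropy(Counter(data))

# Order-1: a separate distribution per previous byte, weighted by how
# often each context occurs.
by_context = defaultdict(Counter)
for prev, cur in zip(data, data[1:]):
    by_context[prev][cur] += 1
total = len(data) - 1
order1 = sum(sum(c.values()) / total * entropy(c)
             for c in by_context.values())

print(f"order-0: {order0:.2f} bits/byte, order-1: {order1:.2f} bits/byte")
```

On repetitive text like this, the conditional model is dramatically cheaper
(e.g. after 'q' the next byte is always 'u', costing zero bits).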

If you are compressing dynamic responses and need to limit CPU, you may
> benefit from using Zstandard at low compression levels (the amount of
> brotli level-1 that is on the web may indicate this is a common constraint).
>

There are no technical blockers that I know of that would prevent Brotli
compression from being optimized similarly. The representations are very
similar.

Luca Versari is working on a Rust-based Brotli level-5 encoder that is
reportedly about 2x faster than the current C++ version.

If you have existing infrastructure plumbed (security approved, etc) to
> support one or the other, your preference might be to use the dictionary
> version of the same algorithm rather than pull in a new library.
>
> Thanks,
>
> -Pat
>

Received on Wednesday, 22 May 2024 10:41:10 UTC