Broader discussion - limit dictionary encoding to one compression algorithm?

The last open issue on the work for the compression dictionary transport
draft is around the compression algorithms (content-encodings) that should
be defined in the draft. Specifically, whether we should spec both Brotli
and Zstandard or pick one.

It would be helpful to get the wider working group's opinion (and to hear
any prior experience from similar situations with br/zstd and
deflate/gzip).

I have a reasonably faithful summary of the two options below but if you'd
like to read the full discussion, the issue is here:
https://github.com/httpwg/http-extensions/issues/2756


** The case for a single content-encoding:

A single option will result in broader interoperability.

Multiple choices may lead to fragmentation: clients could be forced to
support both Zstandard and Brotli if there is a wide mix of servers
implementing only one or the other. Conversely, if different clients each
implement only one (and there is no intersection of support), then
servers/sites will need to implement both to get broad benefit.

If Brotli and Zstandard have similar capabilities then converging to a
single encoding would be better for scaling adoption and interop.


** The case for both Brotli and Zstandard:

Zstandard and Brotli are both already dictionary-aware outside of a
specified content-encoding, and we already have content-encoding support
for both of them, so why should the dictionary variants be given special
treatment? Specifying the content-encoding format for the dictionary
variants of both allows for interop with either format, reducing the need
for a one-off private implementation if someone wants to use the format
that wasn't picked.
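For context, this is roughly what the negotiation looks like under the
current draft if both encodings are specced. The header names
(Use-As-Dictionary, Available-Dictionary) and the dcb/dcz content-encoding
names are taken from the draft and may still change; the hash value is a
placeholder:

```
# Initial fetch: the server marks the response as usable as a dictionary
GET /app.js HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Use-As-Dictionary: match="/app.js"
Content-Encoding: br

# Later fetch: the client advertises the stored dictionary by hash and the
# dictionary-aware encodings it supports; the server delta-encodes against it
GET /app.js HTTP/1.1
Host: example.com
Accept-Encoding: gzip, br, zstd, dcb, dcz
Available-Dictionary: :<base64 SHA-256 of the stored dictionary>:

HTTP/1.1 200 OK
Content-Encoding: dcb
```

If only one encoding is specced, the client would advertise only dcb or
only dcz in the second request.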

There are some tangible differences that may lead someone to choose one
over the other. Some are implementation differences in the current
libraries that may improve over time, and some are fundamental to the
formats:

For the delta-encoding case, the "dictionary" is the previous version of
the resource, so any limits on the dictionaries end up being limits on the
size of resources that can be delta-compressed.

- Brotli is limited to 50MB dictionaries, Zstandard can go up to 128MB.
- Brotli uses 16MB of RAM for the window while compressing/decompressing,
independent of the dictionary size; Zstandard requires a window (RAM) as
large as the resource being compressed (for the delta case).
- Brotli at max compression is ~10-20% smaller than Zstandard at max
compression with dictionary (current implementations).
- Zstandard benefits from dictionary use across all compression levels,
Brotli only benefits from dictionaries at level 5 and above (current
implementations).

As things stand right now, if you have resources > 50MB and < 128MB you
can't use Brotli to delta-encode them (even on the web we have already
seen this with some large WASM apps).

If you have static resources < 50MB and can do the compression at build
time, you would benefit from an additional 10-20% savings by using Brotli
(with the current CLI, anyway).

If you are compressing dynamic responses and need to limit CPU, you may
benefit from using Zstandard at low compression levels (the amount of
Brotli level-1 content on the web suggests this is a common constraint).

If you have existing infrastructure plumbed (security-approved, etc.) to
support one or the other, your preference might be to use the dictionary
version of the same algorithm rather than pull in a new library.

Thanks,

-Pat

Received on Tuesday, 21 May 2024 15:01:35 UTC