- From: Patrick Meenan <patmeenan@gmail.com>
- Date: Wed, 22 May 2024 10:03:03 -0400
- To: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAJV+MGwWLuujKUG1vAz8e8F0SoYWxN3KFv-nWfRYiFVk1Q9U2Q@mail.gmail.com>
It's probably worth noting that the draft is not specifying "Brotli" and "Zstandard" but, rather, "dcb" and "dcz", which fix specific parameters for each (window size in particular) that lead to the restrictions I mentioned. They are effectively the dictionary equivalents of "zstd" and "br", which use the same 8 MB and 16 MB windows, respectively, that "dcz" and "dcb" define.

Dictionary compression for delta updates is more likely to benefit from large-window variants when you want to use HTTP to deliver delta updates of large files. The window and the other parameters of each encoding directly determine how effective the delta encoding is and how large a resource it can be applied to. I would not be surprised to see large/huge variants of the content encodings defined and used outside of the browser case. They could still leverage the same dictionary mechanism and would only need to define an appropriate new content-encoding.

There are also compression algorithms specific to particular resource types that can do MUCH better delta encoding than Zstandard and Brotli do in the general case. Courgette, for example: https://www.chromium.org/developers/design-documents/software-updates-courgette/

I wouldn't be surprised if a better diff-update scheme were developed for ML models, one that does better than generic pattern matching by knowing the format of the file (a giant collection of weights). This seems especially likely given the size of Gen AI models, where even the smallest are multiple gigabytes.

I don't expect dictionary updates over HTTP (using the compression dictionary transport mechanism) to be limited to one or two content-encodings for very long. So the main question is whether we define both "dcb" and "dcz" now, or only one of them and let other content-encodings for different use cases follow in future RFCs.
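To make the window-size point concrete, here is a minimal Python sketch of a "dcz"-style window check. It assumes the draft's rule is that a client must support a window of at least 8 MB or 1.25 times the dictionary size, whichever is greater, up to a maximum of 128 MB. It also treats "MB" as 2^20 bytes. The constant and function names are hypothetical and only illustrate the arithmetic.

```python
MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes

# Assumed "dcz" limits (hedged reading of the draft, not normative text).
DCZ_MIN_WINDOW = 8 * MB
DCZ_MAX_WINDOW = 128 * MB


def dcz_required_window(dictionary_size: int) -> int:
    """Window a dcz client is assumed to need: max(8 MB, 1.25 x dictionary)."""
    # Integer arithmetic for 1.25x to avoid floating-point rounding.
    return max(DCZ_MIN_WINDOW, dictionary_size * 5 // 4)


def dcz_can_use_dictionary(dictionary_size: int) -> bool:
    """True if the required window stays within the assumed 128 MB cap."""
    return dcz_required_window(dictionary_size) <= DCZ_MAX_WINDOW


if __name__ == "__main__":
    for size in (2 * MB, 100 * MB, 2 * 1024 * MB):
        print(size // MB, "MB dictionary ->",
              dcz_required_window(size) // MB, "MB window,",
              "ok" if dcz_can_use_dictionary(size) else "too large")
```

Under these assumptions a 100 MB dictionary needs a 125 MB window and still fits. A multi-gigabyte model used as a dictionary does not fit, which is the kind of case a separate large-window content-encoding would have to cover.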
I think it makes sense to specify the dictionary-aware versions of both "zstd" and "br". We already have both of them, they are both in broad use, and their parameters map directly to what is currently defined for "dcz" and "dcb". This is effectively defining how the existing encodings should behave when using dictionaries.

On Tue, May 21, 2024 at 1:02 PM Patrick Meenan <patmeenan@gmail.com> wrote:
>
>
> On Tue, May 21, 2024 at 12:41 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
> wrote:
>
>> Patrick Meenan writes:
>>
>> > ** The case for a single content-encoding:
>> > […]
>> > ** The case for both Brotli and Zstandard:
>>
>> First, those are not really the two choices before us.
>>
>> Option one is: Pick one single algorithm
>>
>> Option two is: Add a negotiation mechanism and seed a new IANA registry
>> with those two algorithms
>>
>> As far as I can tell, there are no credible data which shows any
>> performance difference between the two, and no of reason to think that any
>> future compression algorithm will do significantly better.
>>
>
> We already have a negotiation mechanism. It uses "Accept-Encoding" and
> "Content-Encoding" and the existing registry. Nothing about the negotiation
> changes if we use one, two or more. The question is if we specify and
> register the "dcb" content-encoding as well as the "dcz" content encoding
> as part of this draft or if we only register one (or if we also add a
> restriction that no other content encodings can use the dictionary
> negotiation).
>
> As far as future encodings, we don't know if any algorithms will do better
> but there is the potential for content-aware delta encodings to do better
> (with things like reallocated addresses in WASM, etc). More likely, there
> will probably come a time where someone wants to delta-encode
> multi-gigabyte resources where the 50/128MB limitations laid out for "dcb"
> and "dcz" won't work and a "large window" variant may need to be specified
> (as a new content encoding).
>
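The point that negotiation does not change with the number of encodings can be sketched as ordinary Accept-Encoding selection. This is a minimal Python sketch, not taken from the draft. The server preference list, the `have_dictionary` flag, and the function names are all hypothetical. Here `have_dictionary` stands in for the server having matched the dictionary the client advertised in its `Available-Dictionary` request header. The parser also ignores the `*` wildcard and finer `identity` rules from RFC 9110.

```python
from typing import Dict, Optional

# Hypothetical server preference: dictionary-aware encodings first.
SERVER_PREFERENCE = ["dcz", "dcb", "zstd", "br", "gzip"]
DICTIONARY_ENCODINGS = {"dcz", "dcb"}


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each content-coding token to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[token.lower()] = q
    return prefs


def choose_content_encoding(accept_encoding: str,
                            have_dictionary: bool) -> Optional[str]:
    """Pick the first server-preferred coding the client accepts (q > 0)."""
    prefs = parse_accept_encoding(accept_encoding)
    for coding in SERVER_PREFERENCE:
        # Dictionary-aware codings only apply if a dictionary was matched.
        if coding in DICTIONARY_ENCODINGS and not have_dictionary:
            continue
        if prefs.get(coding, 0.0) > 0:
            return coding
    return None  # fall back to sending the response uncompressed
```

Adding another dictionary-aware content-encoding, such as a hypothetical large-window variant, would only mean adding its token to `SERVER_PREFERENCE` and `DICTIONARY_ENCODINGS`. The selection logic itself stays the same.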
Received on Wednesday, 22 May 2024 14:03:21 UTC