Re: Broader discussion - limit dictionary encoding to one compression algorithm?

> On May 21, 2024, at 11:01 AM, Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com> wrote:
> 
> On Tue, May 21, 2024 at 01:02:30PM -0400, Patrick Meenan wrote:
>> On Tue, May 21, 2024 at 12:41 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
>> wrote:
>> 
>>> Patrick Meenan writes:
>>> 
>>>> ** The case for a single content-encoding:
>>>> […]
>>>> ** The case for both Brotli and Zstandard:
>>> 
>>> First, those are not really the two choices before us.
>>> 
>>> Option one is:  Pick one single algorithm
>>> 
>>> Option two is:  Add a negotiation mechanism and seed a new IANA registry
>>> with those two algorithms
>>> 
>>> As far as I can tell, there are no credible data showing any
>>> performance difference between the two, and no reason to think that
>>> any future compression algorithm will do significantly better.
>>> 
>> 
>> We already have a negotiation mechanism.  It uses "Accept-Encoding" and
>> "Content-Encoding" and the existing registry. Nothing about the negotiation
>> changes if we use one, two, or more. The question is whether we specify and
>> register the "dcb" content-encoding as well as the "dcz" content encoding
>> as part of this draft or if we only register one (or if we also add a
>> restriction that no other content encodings can use the dictionary
>> negotiation).
>> 
>> As for future encodings, we don't know whether any algorithm will do better
>> but there is the potential for content-aware delta encodings to do better
>> (with things like reallocated addresses in WASM, etc.). More likely,
>> there will come a time when someone wants to delta-encode
>> multi-gigabyte resources where the 50/128MB limitations laid out for "dcb"
>> and "dcz" won't work and a "large window" variant may need to be specified
>> (as a new content encoding).
> 
> A practical approach is to allow for future unknowns as you describe
> above, and to pick one of (brotli, zstd) to be required to be
> implemented in this version of the standard, with the other optional.
> Future versions might have a different required algorithm, and include
> the intention that an algorithm required in a prior version will remain
> an option in future versions.
> 
> If a content-provider spends the time to build procedures and
> infrastructure to deploy compression dictionaries, they will probably
> experiment with their content and their CPU, memory, and other resource
> limitations; and along with client capabilities, balance their choices
> based on all of those inputs and more.
> 
> Proxies and CDNs may make different choices on what they support
> receiving from origin servers, how they store it, and what they support
> sending to clients.  Intermediate security scanners and possibly
> alternate corporate malware will add other limitations.
> 
> Compression dictionaries are an optional feature and one that an origin
> server might choose not to implement (or might predate the
> specification entirely).

I think the problem is that this choice is being characterized as a
negotiation, as if the client is able to choose from multiple encodings,
whereas in common practice the origin server chooses one encoding
or none at all.

IOW, the client's communication is advising the server on
what will work for this response, not which one is best.
It's not really a negotiation -- we just call it that in HTTP.
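That advisory model can be sketched in a few lines of Python. This is an
illustrative sketch, not code from any spec or server: the "dcb" and "dcz"
tokens come from the draft under discussion, while the helper names and the
server's preference order are made up for the example. The point it shows is
that the client's Accept-Encoding only says what will work; the server applies
its own preference order, and may pick nothing at all.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into a {token: qvalue} map."""
    accepted = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        accepted[token.strip().lower()] = q
    return accepted

def choose_content_encoding(accept_encoding, server_preference):
    """Return the first server-preferred encoding the client can accept,
    or None (i.e., send the response unencoded)."""
    accepted = parse_accept_encoding(accept_encoding)
    for enc in server_preference:
        if accepted.get(enc, 0.0) > 0.0:
            return enc
    return None

# The client lists what works (even ranking dcz first); the server still
# chooses according to its own preferences -- here it prefers "dcb":
print(choose_content_encoding("dcz;q=1.0, dcb;q=0.9, br",
                              ["dcb", "dcz", "br"]))
```

Note that the server deliberately ignores the client's q-value ordering
beyond "acceptable or not" -- a real server would weigh many more inputs
(deployment, resource type, anticipated client), which is exactly why the
choice belongs on the server side.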

The server will choose what is best based on its expectations.
That includes deployment, type of client, type of resource, type
of content, and likely changes to the representation over time
(e.g., efficiency of the encoding).

The chosen encoding may change over time, as new encodings are
deployed or new things are learned about likely changes, and
the choice may differ by anticipated client type. That's why
we need labels for the chosen encoding.

I don't think anyone can know that there is only one algorithm
needed for dictionary compression. I agree with the prior comment
on window-size differences, particularly for brotli.

What a browser wants in terms of reasonable window sizes is
totally different from what a continuous-integration platform
or a software-update platform might want.

It's incredibly important to keep in mind that ALL of them
are using HTTP. We cannot fall into the trap of thinking
that communication over HTTP has to be tailored specifically for
certain clients, like the few remaining general-purpose browsers.

We cannot allow HTTP itself to be carved up into application-specific
battles over syntax or named parameters, since its purpose is to be
that one uniform interface which isn't application-specific.
HTTP is extensible, by design.

Also, I don't believe the two existing encodings are equivalent
for all content types. I think brotli is better for some and
zstd is better for others, largely depending on the data format
and the nature of each resource's changes over time.

If the current browser vendors want to deliberately choose
only one encoding to implement, at least for now, that's
fine with me. The IETF doesn't need to make that decision
for them. I don't even need to know why such a choice is made,
so long as it doesn't prevent other implementations from
making their own choices, for their own reasons.

We use self-descriptive protocols to enable extensibility,
in terms of both our limited imagination of what can be done
now and our limited ability to anticipate what will be needed
long into the future. I see no difference here, and no need to
pick a winner when the default is to not use it at all.

....Roy

Received on Tuesday, 21 May 2024 23:36:27 UTC