RE: Working Group Last Call: Compression Dictionary Transport

If I’m distilling this correctly, the current state of implementations is:

  *   Several origin servers
     *   All using the same implementation, or multiple independent implementations?
  *   Multiple CDNs in pass-through mode (i.e. don’t break, let origin send diffs)
  *   Zero CDNs performing the diff themselves
  *   One browser

Is that accurate?

From: Patrick Meenan <patmeenan@gmail.com>
Sent: Thursday, June 13, 2024 9:51 AM
To: Martin Thomson <mt@lowentropy.net>
Cc: Yoav Weiss <yoav.weiss@shopify.com>; ietf-http-wg@w3.org
Subject: Re: Working Group Last Call: Compression Dictionary Transport

Sorry, this is my first foray into the standards process, but from reading over RFC 2026 on the standards track (proposed -> draft -> standard), it looked like Proposed was appropriate, and Draft is the point where multiple independent implementations become the defining factor.

Pulling out the relevant section for proposed standard:

   A Proposed Standard specification is generally stable, has resolved
   known design choices, is believed to be well-understood, has received
   significant community review, and appears to enjoy enough community
   interest to be considered valuable.  However, further experience
   might result in a change or even retraction of the specification
   before it advances.

   Usually, neither implementation nor operational experience is
   required for the designation of a specification as a Proposed
   Standard.  However, such experience is highly desirable, and will
   usually represent a strong argument in favor of a Proposed Standard
   designation.

And for experimental:

   The "Experimental" designation typically denotes a specification that
   is part of some research or development effort.  Such a specification
   is published for the general information of the Internet technical
   community and as an archival record of the work, subject only to
   editorial considerations and to verification that there has been
   adequate coordination with the standards process (see below).  An
   Experimental specification may be the output of an organized Internet
   research effort (e.g., a Research Group of the IRTF), an IETF Working
   Group, or it may be an individual contribution.

Maybe I haven't been transparent enough about the process of Chrome's origin trials, but the work was effectively already experimental when we adopted the draft into the WG, with the research and internal testing done beforehand.

The origin trials started with Chrome 117 last March with the draft-00 design. There have been three rounds of trials covering three different revisions of the draft, with the current V3 trial implementing the features in the current draft-05.

The trials included different types of sites, from the largest properties (Google and others) to sites of various sizes, spanning rich applications, ecommerce, and published-content sites, to make sure the developer ergonomics worked the way we expected and that the design failed safe when exposed to the web at scale. This included testing through most of the popular CDNs to make sure each either worked out of the box as a pass-through cache or could be configured to work (and, more importantly, that it didn't break anything). The trials have been hugely successful, with the expected 80%+ reduction in bytes for static content and significant performance wins for dynamic content (even for the most latency-sensitive sites).
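
If it helps to see where those numbers come from, here is a minimal sketch of the underlying technique using the Python zstandard bindings, with hypothetical sample data, treating the previous version of a resource as the dictionary. (It shows raw dictionary compression only; the actual dcb/dcz response formats also prepend a header carrying the dictionary hash.)

   import zstandard as zstd

   # Hypothetical old and new versions of a static resource; in the
   # deployed feature, the client's cached copy of the old version
   # serves as the dictionary.
   old_version = b"function main() { console.log('v1'); }" * 100
   new_version = b"function main() { console.log('v2'); }" * 100

   # Use the old version as a raw-content dictionary.
   dict_data = zstd.ZstdCompressionDict(
       old_version, dict_type=zstd.DICT_TYPE_RAWCONTENT)

   # Compress the new version against it; most of the bytes match the
   # dictionary, which is where the large size reductions come from.
   delta = zstd.ZstdCompressor(dict_data=dict_data).compress(new_version)

   # The receiving side reverses the process with the same dictionary.
   restored = zstd.ZstdDecompressor(dict_data=dict_data).decompress(delta)
   assert restored == new_version
   print(len(new_version), "->", len(delta), "bytes")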

As far as breakage goes, the only issue discovered was with some security devices (middleboxes) that inspect traffic but don't modify the Accept-Encoding header passing through them to ensure that only encodings they understand are advertised. We are planning to "fix" the ecosystem when the Chrome feature rolls out by providing a time-locked enterprise policy that will make admins aware of the issue and put pressure on the device vendors to fix their interception.
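
To make the failure mode concrete, here is a sketch of the request side (header values illustrative). A client that has a stored dictionary for the resource sends something like:

   GET /app.js HTTP/1.1
   Host: example.com
   Accept-Encoding: gzip, deflate, br, zstd, dcb, dcz
   Available-Dictionary: :<base64 of the dictionary's SHA-256 hash>:

An intermediary that can't decode dcb/dcz is supposed to rewrite Accept-Encoding to remove them; the problematic devices pass the header through untouched and then can't inspect the dictionary-compressed response that comes back.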

There haven't been any fundamental changes to the design since the original draft. We moved a few things around, but the basic negotiation and encoding have been stable, and we've converged on the current, tested design. That gives us quite a bit of both implementation and operational experience deploying it, and it feels pretty solidly at the Proposed Standard level of maturity.

It's possible that further experience, once CDNs or servers start implementing features to automate the encoding, will show that the standard would benefit from revision but, as far as I can tell, that's exactly the purpose of Proposed Standard before it matures to Draft Standard.


Stage aside: for "Use-As-Dictionary" specifically, and the risks of matching every fetch, clients can decide the constraints around when they think it would be advantageous to check for a match and when they would be better off ignoring it and falling back to non-dictionary compression. Chrome, for example, has a limit of 1000 dictionaries per origin in an LRU store (and 200 MB per origin). Those limits may change, but there are no MUSTs around using the advertised dictionaries.
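
For reference, the server's opt-in looks something like this (values illustrative, per the current draft), and nothing obligates the client to store the resource as a dictionary or to match against it later:

   HTTP/1.1 200 OK
   Content-Type: text/javascript
   Cache-Control: max-age=604800
   Use-As-Dictionary: match="/js/app-*.js", match-dest=("script")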

For that matter, there is no requirement that the Use-As-Dictionary header be the only way to seed dictionaries in the client. It's entirely possible to embed a dictionary in a client and still use the Available-Dictionary/Content-Encoding part of the spec. The same can apply to a CDN when it is configured to talk to an origin: there's nothing stopping a CDN from providing a config where dictionaries can be uploaded (or otherwise provided) and certain types of requests back to the origin advertise the configured dictionaries as available.
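
As a sketch of that out-of-band case (values illustrative): an edge or client with an embedded dictionary could send

   GET /api/data HTTP/1.1
   Host: origin.example
   Accept-Encoding: gzip, br, zstd, dcb, dcz
   Available-Dictionary: :<base64 of the dictionary's SHA-256 hash>:

and an origin that recognizes the hash could respond with

   HTTP/1.1 200 OK
   Content-Encoding: dcb

without Use-As-Dictionary ever having been involved.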

I'm hopeful that what we have designed and tested is flexible enough to allow for a lot of use cases beyond the ones we have already deployed, but exploring that is largely what the progression from Proposed Standard to Draft Standard allows for.

On Thu, Jun 13, 2024 at 1:42 AM Martin Thomson <mt@lowentropy.net> wrote:
On Thu, Jun 13, 2024, at 15:36, Yoav Weiss wrote:
> Yeah, I don't think this is the way to go.

As I said, obviously.  But your strategy only really addresses the serving end.

>> All of which is to say, I think this needs time as an experiment.
>
> I'll let Pat chime in with his thoughts, as I don't have strong
> opinions on that particular front.

I should have said before: I'm supportive of experimentation in this area.  Even to the extent of publishing an RFC with the code points and whatnot.  But I don't think that this meets the bar for Proposed Standard.

Received on Friday, 14 June 2024 19:32:43 UTC