- From: Patrick Meenan <patmeenan@gmail.com>
- Date: Fri, 18 Aug 2023 09:43:27 -0400
- To: Stefan Eissing <stefan@eissing.org>
- Cc: Roy Fielding <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
- Message-ID: <CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>
Content-Encoding is end-to-end and travels with the content (though some
reverse proxies decode/re-encode it if you are relying on the reverse proxy
for your compression or are modifying the payload). Transfer-Encoding, or
anything at the HTTP/2 or HTTP/3 layer, would be hop-by-hop.

The main requirements for a reverse proxy to "work" with an origin using
dictionary compression are:

- Pass unknown "Accept-Encoding" values through (if they are stripped, the
  responses will still work but dictionary compression won't be used).
- Treat responses with unknown "Content-Encoding" values as opaque
  responses (most proxies that I have tested already do this).
- Support "Vary" in cache keys (if it is a caching proxy) for the
  "Accept-Encoding" and "Sec-Available-Dictionary" request headers (this
  may require some config depending on the proxy).

Here are some notes from April when I tested it on Fastly, CloudFront and
Cloudflare, all of which are reverse proxies:
https://github.com/pmeenan/compression-dictionary-notes/blob/main/CDN.md

The basic flow looks something like this:

- A request comes in to the reverse proxy from c1 for
  https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip, br,
  zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz".
- The resource isn't found in cache, so the reverse proxy makes a request
  to b1 for the URL with the same request headers.
- The response from b1 comes back with "Content-Encoding: br-d" and "Vary:
  accept-encoding, sec-available-dictionary" (and appropriate cache headers
  making it cache-eligible).
- The proxy stores it in cache, keyed by the URL, the Accept-Encoding
  string and the xxxyyyzzz dictionary.
- The proxy responds with the dictionary-compressed resource (it doesn't
  try to re-compress it, since it already carries a content-encoding,
  possibly one the proxy doesn't understand).
- A request comes in to the reverse proxy from c2 for
  https://example.com/v2/main.js with the same "Accept-Encoding" and
  "Sec-Available-Dictionary: xxxyyyzzz" headers.
- The proxy finds the resource in cache, keyed by the URL, Accept-Encoding
  and the xxxyyyzzz dictionary, and serves the dictionary-compressed
  resource from cache.

On Fri, Aug 18, 2023 at 6:25 AM Stefan Eissing <stefan@eissing.org> wrote:
>
> > On 17.08.2023 at 22:39, Patrick Meenan <patmeenan@gmail.com> wrote:
> >
> > Probably worth continuing the discussion in a dedicated thread if
> > adopted but hopefully it won't hurt to take a first pass (inline)...
> >
> > On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com> wrote:
> > > I think implementation of such through content-codings is fundamentally
> > > misguided because it changes the resource itself and impacts all caching
> > > along the chain of requests in ways that are non-recoverable. That is due
> > > to the lost metadata and variance on whatever request field is used to
> > > indicate that some downstream client can grok some possible dictionary.
> >
> > The decoded version of the resource is unchanged. It's not fundamentally
> > different than brotli, which happens to include a default dictionary, and
> > the caching is guaranteed to be maintained in a consistent way as long as
> > "Vary" works on "Accept-Encoding" as well as whatever header negotiates
> > the dictionary. Even without the dictionary, if something in the middle
> > doesn't know how to process one of the content-encodings (and needs to be
> > able to access the content) then the accept-encoding should be modified
> > to only include encodings that it knows how to work with. This isn't
> > really notably different than "br" or "zstd".
>
> How would a caching reverse proxy work here? Assume there are frontend
> connections c1 and c2 and a backend connection b1.
>
> Can there be dictionary state shared between the clients and the backend?
> If not, and the reverse proxy would need to decode/re-encode content, this
> looks like a hop-by-hop thing, which transfer-encoding seems to suit
> better, e.g. better suited to work with the existing infra.
>
> Maybe I just have an incomplete understanding of how this is supposed to
> work.
>
> Kind Regards,
> Stefan
>
> > > In short, it looks like an easy solution for a browser, but will wreak
> > > havoc with the larger architecture of the Web.
> > >
> > > The right way to do this is to implement it as a transfer encoding that
> > > can be decoded without loss or confusion with the unencoded resource,
> > > which would require extending h2 and h3 to support that feature of
> > > HTTP/1.1.
> > >
> > > For the existing draft, there is a lot of unnecessary confusion
> > > regarding features of fetch, like CORS, that don't make any sense from
> > > a security perspective. That's not what CORS is capable of covering,
> > > nor how it is implemented in practice, so reusing it doesn't make any
> > > sense. The same goes for use of the Sec- prefix on header fields.
> >
> > CORS covers privacy from a browser perspective as far as the readability
> > of responses relative to the origin of the containing document, which is
> > exactly the context that it is needed for here. The concern that it
> > takes care of is to make sure that responses that shouldn't be readable
> > from the document context of the client can't be exposed to oracle
> > timing attacks (because there won't be any client-opaque responses).
> > HTTP itself doesn't really have the same document framing context and
> > need for protecting read access of individual responses on a shared
> > connection by clients running in different document contexts.
> >
> > > Allowing a response from one origin to define a compression dictionary
> > > for responses received from some other origin would clearly violate the
> > > assumptions of https in so many ways (space, time, and cross-analysis).
> > > I don't see how we could possibly allow that even if both origins were
> > > covered by the same certificate.
> > > It would be far easier to require that
> > > everything have the same origin (as defined in RFC 9110, not fetch) or
> > > by having the response origin define specifically which dictionary is
> > > being used (identifying both the dictionary URL and hash). In the
> > > latter case, it would be possible to pre-define common dictionaries
> > > and thus reduce or remove the need to download them.
> >
> > Maybe we crossed wires somewhere, but the dictionaries and the responses
> > they apply to MUST be same-origin to each other in this ID. Where CORS
> > comes into play is the dictionary's or compressed response's relation to
> > the document context that they are being fetched from (in a browser case
> > anyway).
> >
> > Moving the compression down into the transport layer is what we tried
> > before, but we failed to navigate the browser security issues because
> > the transport layer doesn't have the context of which responses need to
> > be opaque, which responses are partitioned across document or frame
> > boundaries, etc., and that the dictionary compression could be used to
> > perform oracle attacks across those boundaries.
> >
> > > Likewise, using * as a wildcard in arbitrary URL references is a foot
> > > gun. It would make more sense to have two attributes, prefix and
> > > suffix, and have them only match within the URL path (i.e., exclude
> > > the origin and query portions, preventing matches on full URIs or
> > > user-supplied query parameters). That is far more likely to get right
> > > than allowing things like "//example.com/*/*/*/*/****"
> >
> > The origin is already excluded from being configurable. There is some
> > discussion about only supporting relative paths, but allowing for full
> > URLs just made it easier to reference the existing URL RFC without
> > having to re-define just the parts we need to support.
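[As an aside on the `*` matching debated in the quoted exchange above: a minimal sketch of this style of wildcard matching, in Python, illustrates why it stays simpler than URLPattern. The `compile_match` helper and the patterns are hypothetical, for illustration only, and are not the draft's actual matching algorithm.]

```python
import re

def compile_match(pattern):
    # Hypothetical helper: escape everything except `*`, which matches any
    # run of characters (including `/` and query-string bytes).
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + ".*".join(literal_parts) + "$")

rule = compile_match("/app/*/main.js?build=*")
assert rule.match("/app/v2/main.js?build=1234")
assert not rule.match("/static/main.js?build=1234")
```

[Because `*` crosses path separators, `/app/a/b/main.js?build=x` also matches the pattern above, which is the flexibility in site URL structure being discussed.]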
> >
> > Query params can't necessarily be excluded, and some sites are going to
> > want to allow for either fixed query param matching or wildcards (and
> > maybe for both the static and dynamic use cases). Allowing for * allows
> > for some flexibility in site URL structure while still keeping the
> > matching relatively simple and without the complexity of URLPattern (
> > https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md
> > ).
> >
> > Anyway, I look forward to shaking these issues out. I'll see about
> > creating issues in the GitHub repo that I have been using for the ID for
> > all of the questions and concerns raised, to make sure we don't lose
> > track of any of them (repo is here:
> > https://github.com/pmeenan/i-d-compression-dictionary ).
> >
> > Thanks,
> >
> > -Pat
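[Editor's sketch: the reverse-proxy caching flow described at the top of Patrick's message can be made concrete with a small, hypothetical Python model. zlib's `zdict` parameter stands in for the `br-d`/`zstd-d` dictionary codings (the stdlib has no brotli/zstd bindings), and `origin_fetch`/`proxy_fetch` are illustration only, not any real proxy's API.]

```python
import zlib

# Stand-ins for the shared dictionary and the resource compressed against it.
dictionary = b"function main() { console.log('hello'); }"      # e.g. v1/main.js
resource = b"function main() { console.log('hello, world'); }"  # v2/main.js

cache = {}        # (url, *vary-header-values) -> (headers, body)
vary_by_url = {}  # a real proxy learns Vary from the first response per URL

def cache_key(url, request_headers, vary):
    # Key the cache on the URL plus every request header named by Vary.
    names = [h.strip().lower() for h in vary.split(",")]
    return (url,) + tuple(request_headers.get(n, "") for n in names)

def origin_fetch(url, request_headers):
    # Backend b1: compress against the client's advertised dictionary and
    # name both negotiation headers in Vary ("deflate" stands in for br-d).
    comp = zlib.compressobj(zdict=dictionary)
    body = comp.compress(resource) + comp.flush()
    headers = {"content-encoding": "deflate",
               "vary": "accept-encoding, sec-available-dictionary"}
    return headers, body

def proxy_fetch(url, request_headers):
    # Reverse proxy: serve from cache on a Vary-aware hit (the c2 path),
    # otherwise forward upstream and store the response as-is, never
    # re-compressing an already content-encoded body (the c1 path).
    vary = vary_by_url.get(url)
    if vary is not None:
        hit = cache.get(cache_key(url, request_headers, vary))
        if hit is not None:
            return hit
    headers, body = origin_fetch(url, request_headers)
    vary_by_url[url] = headers["vary"]
    cache[cache_key(url, request_headers, headers["vary"])] = (headers, body)
    return headers, body

request = {"accept-encoding": "deflate, gzip, br, zstd, br-d, zstd-d",
           "sec-available-dictionary": "xxxyyyzzz"}
h1, b1 = proxy_fetch("https://example.com/v2/main.js", request)  # c1: miss
h2, b2 = proxy_fetch("https://example.com/v2/main.js", request)  # c2: hit

# Only a client holding the same dictionary can decode the cached body.
assert zlib.decompressobj(zdict=dictionary).decompress(b2) == resource
```

[A client advertising a different (or no) `Sec-Available-Dictionary` value would produce a different cache key and miss, which is exactly the behavior the `Vary` requirement above is meant to guarantee.]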
Received on Friday, 18 August 2023 13:43:46 UTC