Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary from Patrick Meenan on 2023-08-17 (ietf-http-wg@w3.org from July to September 2023)

From: Patrick Meenan <patmeenan@gmail.com>
Date: Thu, 17 Aug 2023 16:39:19 -0400
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
Message-ID: <CAJV+MGxySTUmQqp++OxeVACt8zSFTo=ETjO=PHeB7o1HshpkAg@mail.gmail.com>
This is probably worth continuing in a dedicated thread if the draft is
adopted, but hopefully it won't hurt to take a first pass (inline)...

On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com> wrote:

> I think implementation of such through content-codings is fundamentally
> misguided because it changes the resource itself and impacts all caching
> along the chain of requests in ways that are non-recoverable. That is due
> to the lost metadata and variance on whatever request field is used to
> indicate that some downstream client can grok some possible dictionary.
>

The decoded version of the resource is unchanged. It's not fundamentally
different from brotli, which happens to include a default dictionary, and
caching is guaranteed to stay consistent as long as "Vary" works on
"Accept-Encoding" as well as on whatever header negotiates the dictionary.
Even without the dictionary, if something in the middle doesn't know how to
process one of the content-encodings (and needs to be able to access the
content), then it should modify Accept-Encoding to only include encodings
that it knows how to work with.  This isn't notably different from "br" or
"zstd".
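
As a rough illustration of the two points above (a sketch only, not taken
from the draft): a cache can key stored responses on the request headers
named in "Vary", and an intermediary can trim Accept-Encoding down to the
codings it understands. The function names are hypothetical, and
"Sec-Available-Dictionary" is used in the usage note only as a stand-in for
whatever header ends up negotiating the dictionary.

```python
def vary_cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers listed in Vary.

    Hypothetical helper: header names are compared case-insensitively and a
    missing header is treated as an empty value.
    """
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


def restrict_accept_encoding(value, supported):
    """Keep only the Accept-Encoding entries whose coding is in `supported`.

    Hypothetical helper: any parameters (such as q-values) on the kept
    entries are passed through unchanged.
    """
    kept = []
    for item in value.split(","):
        item = item.strip()
        coding = item.split(";", 1)[0].strip().lower()
        if coding in supported:
            kept.append(item)
    return ", ".join(kept)
```

With "Vary: Accept-Encoding, Sec-Available-Dictionary", two requests that
advertise different dictionaries get different cache keys, so a response
compressed against one dictionary is never served to a client holding
another.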


> In short, it looks like an easy solution for a browser, but will wreak
> havoc with the larger architecture of the Web.
>
> The right way to do this is to implement it as a transfer encoding that
> can be decoded without loss or confusion with the unencoded resource,
> which would require extending h2 and h3 to support that feature of
> HTTP/1.1.
>
> For the existing draft, there is a lot of unnecessary confusion regarding
> features of fetch, like CORS, that don't make any sense from a security
> perspective. That's not what CORS is capable of covering, nor how it is
> implemented in practice, so reusing it doesn't make any sense.
> The same goes for use of the Sec- prefix on header fields.
>

CORS covers privacy from a browser perspective as far as the readability of
responses relative to the origin of the containing document, which is
exactly the context in which it is needed here. The concern that it takes
care of is making sure that responses that shouldn't be readable from the
document context of the client can't be exposed to oracle timing attacks
(because there won't be any client-opaque responses). HTTP itself doesn't
really have the same document framing context, or the same need to protect
read access to individual responses on a shared connection by clients
running in different document contexts.


> Allowing a response from one origin to define a compression dictionary
> for responses received from some other origin would clearly violate the
> assumptions of https in so many ways (space, time, and cross-analysis).
> I don't see how we could possibly allow that even if both origins were
> covered by the same certificate. It would be far easier to require that
> everything have the same origin (as defined in RFC9110, not fetch) or
> by having the response origin define specifically which dictionary is
> being used (identifying both the dictionary URL and hash).  In the latter
> case, it would be possible to pre-define common dictionaries and thus
> reduce or remove the need to download them.
>

Maybe we crossed wires somewhere, but the dictionaries and the responses
they apply to MUST be same-origin to each other in this ID. Where CORS
comes into play is the dictionary or compressed response's relation to the
document context that they are being fetched from (in a browser case
anyway).
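
A minimal sketch of the same-origin requirement described above, assuming
the usual (scheme, host, port) comparison; the helper names and the
default-port table are illustrative and are not taken from the draft:

```python
from urllib.parse import urlsplit

# Assumed default ports for the two schemes this sketch cares about.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(dictionary_url, response_url):
    """True when the dictionary and the response it applies to share an origin."""
    return origin(dictionary_url) == origin(response_url)
```

Under this check a dictionary fetched from https://example.com/ could be
used for https://example.com:443/app.js but not for anything on
https://static.example.com/.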

Moving the compression down into the transport layer is what we tried
before, but we failed to navigate the browser security issues: the
transport layer doesn't have the context of which responses need to be
opaque, which responses are partitioned across document or frame
boundaries, etc., and dictionary compression could be used to perform
oracle attacks across those boundaries.


> Likewise, using * as a wildcard in arbitrary URL references is a foot gun.
> It would make more sense to have two attributes, prefix and suffix, and
> have them only match within the URL path (i.e., exclude the origin and
> query portions, preventing matches on full URIs or user-supplied
> query parameters). That is far more likely to get right than allowing
> things like "//example.com/*/*/*/*/****"
>

The origin is already excluded from being configurable. There is some
discussion about only supporting relative paths but allowing for full URLs
just made it easier to reference the existing URL RFC without having to
re-define just the parts we need to support.

Query params can't necessarily be excluded, and some sites are going to want
to allow for either fixed query param matching or wildcard matching (and
maybe for both the static and dynamic use cases).  Supporting * allows for
some flexibility in site URL structure while still keeping the matching
relatively simple and without the complexity of URLPattern (
https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md).
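
To give a feel for how simple the * matching can stay, here is a sketch
under stated assumptions: * matches any run of characters (including none),
everything else is literal, and the pattern is matched against the path
plus query only, since the origin is not configurable. The function name
and these exact semantics are assumptions for illustration, not the draft's
definition.

```python
import re


def wildcard_match(pattern, target):
    """Match `target` (path plus optional query) against a `*` wildcard pattern.

    Each `*` matches zero or more characters; all other characters are
    matched literally. The whole target must match.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, target, flags=re.DOTALL) is not None
```

For example, "/app/*/main.js" would match "/app/v2/main.js", and
"/search?q=*" would match "/search?q=cats", while "/app/*.js" would not
match "/app/main.css".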

Anyway, I look forward to shaking these issues out.  I'll see about
creating issues in the github repo that I have been using for the ID for
all of the questions and concerns raised to make sure we don't lose track
of any of them (repo is here:
https://github.com/pmeenan/i-d-compression-dictionary ).

Thanks,

-Pat
Received on Thursday, 17 August 2023 20:39:38 UTC