Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary from Stefan Eissing on 2023-08-18 (ietf-http-wg@w3.org from July to September 2023)

From: Stefan Eissing <stefan@eissing.org>
Date: Fri, 18 Aug 2023 12:25:04 +0200
To: Patrick Meenan <patmeenan@gmail.com>
Cc: Fielding Roy <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
Message-Id: <A10BBDFB-044D-4446-80FA-B9985B2FF783@eissing.org>
> On 17.08.2023 at 22:39, Patrick Meenan <patmeenan@gmail.com> wrote:
> 
> Probably worth continuing the discussion in a dedicated thread if adopted, but hopefully it won't hurt to take a first pass (inline)...
> 
> On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com> wrote:
> I think implementation of such through content-codings is fundamentally
> misguided because it changes the resource itself and impacts all caching
> along the chain of requests in ways that are non-recoverable. That is due
> to the lost metadata and variance on whatever request field is used to indicate
> that some downstream client can grok some possible dictionary.
> 
> The decoded version of the resource is unchanged. It's not fundamentally different from brotli, which happens to include a default dictionary, and caching is guaranteed to remain consistent as long as "Vary" works on "Accept-Encoding" as well as on whatever header negotiates the dictionary. Even without the dictionary, if something in the middle doesn't know how to process one of the content-encodings (and needs to be able to access the content), then the Accept-Encoding should be modified to include only the encodings it knows how to work with. This isn't really notably different from "br" or "zstd".
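Pat's point that caching stays consistent as long as Vary covers both Accept-Encoding and the dictionary-negotiation header can be illustrated with a minimal cache-key sketch. It assumes a cache that keys stored responses on the URL plus the value of every request field named in Vary; the field name `Sec-Available-Dictionary` and the hash values are assumptions used as placeholders, not quoted from the draft.

```python
from typing import Mapping, Optional, Tuple


def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> Optional[Tuple]:
    """Build a cache key from the URL plus the value of every request field named in Vary.

    Field names are matched case-insensitively; a field missing from the request
    is keyed as None. "Vary: *" can't be satisfied by this simple key, so None is
    returned to mark the response as not reusable here.
    """
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    fields = sorted({f.strip().lower() for f in vary.split(",") if f.strip()})
    if "*" in fields:
        return None
    return (url,) + tuple((field, lowered.get(field)) for field in fields)


# Hypothetical dictionary-negotiation field name (assumption, for illustration only).
VARY = "Accept-Encoding, Sec-Available-Dictionary"
URL = "https://example.com/js/app.js"

with_dict = cache_key(URL, VARY, {"Accept-Encoding": "br", "Sec-Available-Dictionary": "abc123"})
without_dict = cache_key(URL, VARY, {"Accept-Encoding": "br"})
other_dict = cache_key(URL, VARY, {"accept-encoding": "br", "sec-available-dictionary": "def456"})

# Each dictionary state gets its own cache entry, so a dictionary-compressed
# response is never reused for a client that did not advertise that dictionary.
assert len({with_dict, without_dict, other_dict}) == 3
```

The design choice here is simply that the dictionary header is treated like any other varying request field; a real cache would also normalize field values more carefully than this sketch does.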

How would a caching reverse proxy work here? Assume there are frontend connections c1 and c2 and a backend connection b1.

Can dictionary state be shared between the clients and the backend? If not, and the reverse proxy would need to decode and re-encode content, this looks like a hop-by-hop thing, which a transfer-encoding seems to suit better, i.e. it would work better with the existing infrastructure.

Maybe I just have an incomplete understanding of how this is supposed to work.

Kind Regards,
Stefan

> In short, it looks like an easy solution for a browser, but will wreak
> havoc with the larger architecture of the Web.
> 
> The right way to do this is to implement it as a transfer encoding that
> can be decoded without loss or confusion with the unencoded resource,
> which would require extending h2 and h3 to support that feature of HTTP/1.1.
> 
> For the existing draft, there is a lot of unnecessary confusion regarding
> features of fetch, like CORS, that don't make any sense from a security
> perspective. That's not what CORS is capable of covering, nor how it is
> implemented in practice, so reusing it doesn't make any sense. 
> The same goes for use of the Sec- prefix on header fields.
> 
> CORS covers privacy from a browser's perspective in terms of whether responses are readable relative to the origin of the containing document, which is exactly the context in which it is needed here. Its job is to make sure that responses that shouldn't be readable from the client's document context can't be exposed to oracle timing attacks (because there won't be any client-opaque responses). HTTP itself doesn't have the same document-framing context, or the same need to protect read access to individual responses on a shared connection used by clients running in different document contexts.
> 
> Allowing a response from one origin to define a compression dictionary
> for responses received from some other origin would clearly violate the
> assumptions of https in so many ways (space, time, and cross-analysis).
> I don't see how we could possibly allow that even if both origins were
> covered by the same certificate. It would be far easier to require that
> everything have the same origin (as defined in RFC 9110, not fetch) or
> to have the response origin define specifically which dictionary is
> being used (identifying both the dictionary URL and hash). In the latter
> case, it would be possible to pre-define common dictionaries and thus
> reduce or remove the need to download them.
> 
> Maybe we crossed wires somewhere, but in this I-D the dictionaries and the responses they apply to MUST be same-origin with each other. CORS comes into play in the relationship between the dictionary (or the compressed response) and the document context it is being fetched from (in the browser case, anyway).
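The same-origin requirement Pat describes, and Roy's reference to origin as defined in RFC 9110 (scheme, host, port), can be sketched as a small check. This is a minimal illustration using Python's `urllib.parse`, assuming only `http` and `https` URLs; it is not the draft's normative matching algorithm, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports for the schemes this sketch supports (assumption: http/https only).
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    host = parts.hostname or ""  # urllib already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """True only if both URLs share scheme, host, and port."""
    return origin(a) == origin(b)


# A dictionary and a response it could apply to: same origin despite case and explicit port.
print(same_origin("https://example.com/dict/v1.dat", "https://EXAMPLE.com:443/js/app.js"))
```

Note that two hosts covered by the same certificate (say `example.com` and `cdn.example.com`) are still different origins under this check, which matches Roy's concern.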
> 
> We tried moving the compression down into the transport layer before, but failed to navigate the browser security issues: the transport layer lacks the context of which responses need to be opaque, which responses are partitioned across document or frame boundaries, etc., and dictionary compression could be used to perform oracle attacks across those boundaries.
> 
> Likewise, using * as a wildcard in arbitrary URL references is a foot gun.
> It would make more sense to have two attributes, prefix and suffix, and
> have them only match within the URL path (i.e., exclude the origin and
> query portions, preventing matches on full URIs or user-supplied
> query parameters). That is far more likely to get right than allowing
> things like "//example.com/*/*/*/*/****"
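Roy's alternative of separate prefix and suffix attributes that match only within the URL path can be sketched as follows. The function and parameter names are hypothetical, and the rule that prefix and suffix must not overlap is an assumption made for this sketch; the message doesn't specify it.

```python
from urllib.parse import urlsplit


def prefix_suffix_match(url: str, prefix: str, suffix: str) -> bool:
    """Match only the URL path against a fixed prefix and suffix.

    The origin and query are ignored, so neither can be targeted by the
    pattern. The length check stops the prefix and suffix from overlapping
    (an assumption), e.g. prefix "/js/app." and suffix ".js" don't match "/js/app.js".
    """
    path = urlsplit(url).path or "/"
    return (
        len(path) >= len(prefix) + len(suffix)
        and path.startswith(prefix)
        and path.endswith(suffix)
    )


# The query string is ignored, so user-supplied parameters can't influence the match.
print(prefix_suffix_match("https://example.com/js/app.v2.js?user=1", "/js/app.", ".js"))
```

Because only two literal comparisons are involved, there is no way to express something like `//example.com/*/*/*/*/****`, which is the property Roy is after.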
> 
> The origin is already excluded from being configurable. There is some discussion about supporting only relative paths, but allowing full URLs made it easier to reference the existing URL RFC without having to re-define just the parts we need to support.
> 
> Query params can't necessarily be excluded: some sites will want either fixed query-param matching or wildcards (possibly for both the static and dynamic use cases). Supporting * allows some flexibility in site URL structure while keeping the matching relatively simple and avoiding the complexity of URLPattern (https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md).
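The `*` matching Pat describes can be sketched as a simple glob: each `*` matches any run of characters, everything else matches literally, and the pattern is applied to the path plus query while the origin stays out of reach. This is an illustrative sketch only; the function names are hypothetical, escaping a literal `*` is not supported, and it does not attempt URLPattern-style semantics.

```python
from urllib.parse import urlsplit


def wildcard_match(pattern: str, target: str) -> bool:
    """Return True if target matches pattern, where each '*' matches any run of
    characters (possibly empty) and every other character must match literally."""
    pieces = pattern.split("*")
    if len(pieces) == 1:  # no wildcard: exact match only
        return pattern == target
    first, *middle, last = pieces
    if not target.startswith(first):
        return False
    pos = len(first)
    for piece in middle:  # take the leftmost match for each inner literal
        idx = target.find(piece, pos)
        if idx < 0:
            return False
        pos = idx + len(piece)
    # The final literal must fit after everything already consumed.
    return len(target) - pos >= len(last) and target.endswith(last)


def match_url(pattern: str, url: str) -> bool:
    """Apply the pattern to path plus query only; the origin is never matchable."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return wildcard_match(pattern, target)


print(match_url("/js/app.*.js", "https://example.com/js/app.v2.js"))
print(match_url("/api/data?v=*", "https://example.com/api/data?v=42"))
```

Taking the leftmost occurrence of each inner literal is sufficient for patterns whose only metacharacter is `*`, so matching stays linear-ish and free of backtracking, which is the "relatively simple" property argued for above.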
> 
> Anyway, I look forward to shaking these issues out. I'll see about creating issues in the GitHub repo I have been using for the I-D for all of the questions and concerns raised, to make sure we don't lose track of any of them (repo is here: https://github.com/pmeenan/i-d-compression-dictionary).
> 
> Thanks,
> 
> -Pat
Received on Friday, 18 August 2023 10:25:36 UTC