- From: Patrick Meenan <patmeenan@gmail.com>
- Date: Fri, 18 Aug 2023 09:43:27 -0400
- To: Stefan Eissing <stefan@eissing.org>
- Cc: Roy Fielding <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
- Message-ID: <CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>
Content-Encoding is end-to-end and travels with the content (though some
reverse proxies decode/re-encode it if you are relying on the reverse proxy
for your compression or are modifying the payload). Transfer-Encoding, or
anything at the HTTP/2 or HTTP/3 layer, would be hop-by-hop.

The main requirements for a reverse proxy to "work" with an origin using
dictionary compression are:

- Pass unknown "Accept-Encoding" values through (if they are stripped, the
  responses will still work but dictionary compression won't be used).
- Treat responses with unknown "Content-Encoding" values as opaque
  responses (most proxies that I have tested already do this).
- Support "Vary" in cache keys (if it is a caching proxy) for the
  "Accept-Encoding" and "Sec-Available-Dictionary" request headers (this
  may require some config depending on the proxy).

Here are some notes from April when I tested it on Fastly, CloudFront and
Cloudflare, all of which are reverse proxies:
https://github.com/pmeenan/compression-dictionary-notes/blob/main/CDN.md

The basic flow looks something like this:

- A request comes in to the reverse proxy from c1 for
  https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip, br,
  zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz".
- The resource isn't found in cache, so the reverse proxy makes a request
  to b1 for the URL with the same request headers.
- The response from b1 comes back with "Content-Encoding: br-d" and "Vary:
  accept-encoding, sec-available-dictionary" (and appropriate cache headers
  making it cache-eligible).
- The proxy stores it in cache, keyed by the URL, the Accept-Encoding
  string and the xxxyyyzzz dictionary.
- The proxy responds with the dictionary-compressed resource (it doesn't
  try to re-compress it, since it already carries a content-encoding,
  possibly one the proxy doesn't understand).
- A request comes in to the reverse proxy from c2 for
  https://example.com/v2/main.js with the same "Accept-Encoding" and
  "Sec-Available-Dictionary: xxxyyyzzz" headers.
- The proxy finds the resource in cache, keyed by the URL, Accept-Encoding
  and the xxxyyyzzz dictionary, and serves the dictionary-compressed
  resource from cache.

On Fri, Aug 18, 2023 at 6:25 AM Stefan Eissing <stefan@eissing.org> wrote:
>
> > On 17.08.2023 at 22:39, Patrick Meenan <patmeenan@gmail.com> wrote:
> >
> > Probably worth continuing the discussion in a dedicated thread if
> > adopted but hopefully it won't hurt to take a first pass (inline)...
> >
> > On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com> wrote:
> > > I think implementation of such through content-codings is fundamentally
> > > misguided because it changes the resource itself and impacts all caching
> > > along the chain of requests in ways that are non-recoverable. That is due
> > > to the lost metadata and variance on whatever request field is used to
> > > indicate that some downstream client can grok some possible dictionary.
> >
> > The decoded version of the resource is unchanged. It's not fundamentally
> > different than brotli, which happens to include a default dictionary, and
> > the caching is guaranteed to be maintained in a consistent way as long as
> > "Vary" works on "Accept-Encoding" as well as whatever header negotiates
> > the dictionary. Even without the dictionary, if something in the middle
> > doesn't know how to process one of the content-encodings (and needs to be
> > able to access the content) then the accept-encoding should be modified
> > to only include encodings that it knows how to work with. This isn't
> > really notably different than "br" or "zstd".
>
> How would a caching reverse proxy work here? Assume there are frontend
> connections c1 and c2 and a backend connection b1.
>
> Can there be dictionary state shared between the clients and the backend?
> If not, and the reverse proxy would need to decode/re-encode content, this
> looks like a hop-by-hop thing, which transfer-encoding seems to suit
> better, e.g. better suited to work with the existing infra.
>
> Maybe I just have an incomplete understanding of how this is supposed to
> work.
>
> Kind Regards,
> Stefan
>
> > > In short, it looks like an easy solution for a browser, but will wreak
> > > havoc with the larger architecture of the Web.
> > >
> > > The right way to do this is to implement it as a transfer encoding that
> > > can be decoded without loss or confusion with the unencoded resource,
> > > which would require extending h2 and h3 to support that feature of
> > > HTTP/1.1.
> > >
> > > For the existing draft, there is a lot of unnecessary confusion
> > > regarding features of fetch, like CORS, that don't make any sense from
> > > a security perspective. That's not what CORS is capable of covering,
> > > nor how it is implemented in practice, so reusing it doesn't make any
> > > sense. The same goes for use of the Sec- prefix on header fields.
> >
> > CORS covers privacy from a browser perspective as far as the readability
> > of responses relative to the origin of the containing document, which is
> > exactly the context that it is needed for here. The concern that it
> > takes care of is to make sure that responses that shouldn't be readable
> > from the document context of the client can't be exposed to oracle
> > timing attacks (because there won't be any client-opaque responses).
> > HTTP itself doesn't really have the same document framing context and
> > need for protecting read access of individual responses on a shared
> > connection by clients running in different document contexts.
> >
> > > Allowing a response from one origin to define a compression dictionary
> > > for responses received from some other origin would clearly violate the
> > > assumptions of https in so many ways (space, time, and cross-analysis).
> > > I don't see how we could possibly allow that even if both origins were
> > > covered by the same certificate.
> > > It would be far easier to require that
> > > everything have the same origin (as defined in RFC 9110, not fetch) or
> > > by having the response origin define specifically which dictionary is
> > > being used (identifying both the dictionary URL and hash). In the
> > > latter case, it would be possible to pre-define common dictionaries
> > > and thus reduce or remove the need to download them.
> >
> > Maybe we crossed wires somewhere, but the dictionaries and the responses
> > they apply to MUST be same-origin to each other in this ID. Where CORS
> > comes into play is the dictionary's or compressed response's relation to
> > the document context that they are being fetched from (in a browser case
> > anyway).
> >
> > Moving the compression down into the transport layer is what we tried
> > before, but we failed to navigate the browser security issues because
> > the transport layer doesn't have the context of which responses need to
> > be opaque, which responses are partitioned across document or frame
> > boundaries, etc., and that the dictionary compression could be used to
> > perform oracle attacks across those boundaries.
> >
> > > Likewise, using * as a wildcard in arbitrary URL references is a foot
> > > gun. It would make more sense to have two attributes, prefix and
> > > suffix, and have them only match within the URL path (i.e., exclude
> > > the origin and query portions, preventing matches on full URIs or
> > > user-supplied query parameters). That is far more likely to get right
> > > than allowing things like "//example.com/*/*/*/*/****"
> >
> > The origin is already excluded from being configurable. There is some
> > discussion about only supporting relative paths, but allowing for full
> > URLs just made it easier to reference the existing URL RFC without
> > having to re-define just the parts we need to support.
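[As an aside on the `*` matching debated in the quoted exchange above: a minimal sketch of this style of wildcard matching, in Python, illustrates why it stays simpler than URLPattern. The `compile_match` helper and the patterns are hypothetical, for illustration only, and are not the draft's actual matching algorithm.]

```python
import re

def compile_match(pattern):
    # Hypothetical helper: escape everything except `*`, which matches any
    # run of characters (including `/` and query-string bytes).
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + ".*".join(literal_parts) + "$")

rule = compile_match("/app/*/main.js?build=*")
assert rule.match("/app/v2/main.js?build=1234")
assert not rule.match("/static/main.js?build=1234")
```

[Because `*` crosses path separators, `/app/a/b/main.js?build=x` also matches the pattern above, which is the flexibility in site URL structure being discussed.]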
> >
> > Query params can't necessarily be excluded, and some sites are going to
> > want to allow for either fixed query param matching or wildcards (and
> > maybe for both the static and dynamic use cases). Allowing for * allows
> > for some flexibility in site URL structure while still keeping the
> > matching relatively simple and without the complexity of URLPattern (
> > https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md
> > ).
> >
> > Anyway, I look forward to shaking these issues out. I'll see about
> > creating issues in the GitHub repo that I have been using for the ID for
> > all of the questions and concerns raised, to make sure we don't lose
> > track of any of them (repo is here:
> > https://github.com/pmeenan/i-d-compression-dictionary ).
> >
> > Thanks,
> >
> > -Pat
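[Editor's sketch: the reverse-proxy caching flow described at the top of Patrick's message can be made concrete with a small, hypothetical Python model. zlib's `zdict` parameter stands in for the `br-d`/`zstd-d` dictionary codings (the stdlib has no brotli/zstd bindings), and `origin_fetch`/`proxy_fetch` are illustration only, not any real proxy's API.]

```python
import zlib

# Stand-ins for the shared dictionary and the resource compressed against it.
dictionary = b"function main() { console.log('hello'); }"      # e.g. v1/main.js
resource = b"function main() { console.log('hello, world'); }"  # v2/main.js

cache = {}        # (url, *vary-header-values) -> (headers, body)
vary_by_url = {}  # a real proxy learns Vary from the first response per URL

def cache_key(url, request_headers, vary):
    # Key the cache on the URL plus every request header named by Vary.
    names = [h.strip().lower() for h in vary.split(",")]
    return (url,) + tuple(request_headers.get(n, "") for n in names)

def origin_fetch(url, request_headers):
    # Backend b1: compress against the client's advertised dictionary and
    # name both negotiation headers in Vary ("deflate" stands in for br-d).
    comp = zlib.compressobj(zdict=dictionary)
    body = comp.compress(resource) + comp.flush()
    headers = {"content-encoding": "deflate",
               "vary": "accept-encoding, sec-available-dictionary"}
    return headers, body

def proxy_fetch(url, request_headers):
    # Reverse proxy: serve from cache on a Vary-aware hit (the c2 path),
    # otherwise forward upstream and store the response as-is, never
    # re-compressing an already content-encoded body (the c1 path).
    vary = vary_by_url.get(url)
    if vary is not None:
        hit = cache.get(cache_key(url, request_headers, vary))
        if hit is not None:
            return hit
    headers, body = origin_fetch(url, request_headers)
    vary_by_url[url] = headers["vary"]
    cache[cache_key(url, request_headers, headers["vary"])] = (headers, body)
    return headers, body

request = {"accept-encoding": "deflate, gzip, br, zstd, br-d, zstd-d",
           "sec-available-dictionary": "xxxyyyzzz"}
h1, b1 = proxy_fetch("https://example.com/v2/main.js", request)  # c1: miss
h2, b2 = proxy_fetch("https://example.com/v2/main.js", request)  # c2: hit

# Only a client holding the same dictionary can decode the cached body.
assert zlib.decompressobj(zdict=dictionary).decompress(b2) == resource
```

[A client advertising a different (or no) `Sec-Available-Dictionary` value would produce a different cache key and miss, which is exactly the behavior the `Vary` requirement above is meant to guarantee.]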
Received on Friday, 18 August 2023 13:43:46 UTC