- From: <gs-lists-ietf-http-wg@gluelogic.com>
- Date: Thu, 13 Jun 2024 21:54:02 -0400
- To: Jeremy Roman <jbroman@chromium.org>
- Cc: ietf-http-wg@w3.org
On Fri, Jun 14, 2024 at 11:09:17AM +1000, Mark Nottingham wrote:
> From a chair's perspective -- the important thing to establish at this
> point is whether the WG thinks there's a problem to solve here, and
> whether this is _approximately_ the right starting point. Document
> formatting and details of the specification aren't as relevant at this
> early stage.

As documented in the original message, numerous CDNs provide a feature
to treat query-string as no-vary. I am commenting from experience at a
previous $job where this feature was desired, and at least at the time,
not supported by Squid.

Yes, I believe this is approximately the right starting point, though
my preference would be "Vary-Search" or "Vary-Query".

... naming it Vary-Query might dovetail nicely with QUERY method being
discussed in
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-safe-method-w-body

Cheers, Glenn

>
> On 14 Jun 2024, at 9:56 AM, gs-lists-ietf-http-wg@gluelogic.com wrote:
> >
> > On Thu, Jun 13, 2024 at 05:50:05PM -0400, Jeremy Roman wrote:
> >> This is a little hard to read in the plaintext rendering, I'll admit. It's
> >> marginally better in the HTML rendering (though even then, the stylesheet
> >> makes it only slightly distinguishable).
> >
> > Yes, I read the plaintext draft.
> >
> >> WHATWG/W3C specs tend to be read in the HTML rendering, where these links
> >> would have been blue and underlined. If quotation marks would help with
> >> readability in addition to the section cross-link, that's certainly a
> >> typographical change I can make.
> >
> > I believe that IETF desires RFCs to be readable in more than HTML,
> > but I am not the person to provide guidance on that. Would the chair
> > advise?
> >
> >> There are many references in the doc to WHATWG specs rather than IETF
> >>> specifications for URLs. Is this intentional?
> >>>
> >>
> >> The embodiment we (Google Chrome) have been working on is in a web browser
> >> which implements the WHATWG URL specification, and we want this to be
> >> useful in web browsers (and HTTP servers which are interacting with web
> >> browsers), so being compatible with the way browsers deal with query
> >> strings (namely, the application/x-www-form-urlencoded parser, which is
> >> used for, for instance, the URLSearchParams object exposed to JavaScript
> >> code).
> >>
> >> I'm less familiar with the IETF specifications that other uses of HTTP use,
> >> though for instance RFC 3986 (Uniform Resource Identifier: Generic Syntax)
> >> doesn't say much beyond "query components are often used to carry
> >> identifying information in the form of 'key=value' pairs", and RFC 9110
> >> (HTTP Semantics) §4.1 simply specifies that this is an optional URL
> >> component in the "http" and "https" URI schemes. This doesn't provide
> >> enough for the purposes of this document, even if the two are otherwise
> >> compatible (which I'm not sure they are).
> >
> > If WHATWG spec provides a stricter query string format variant,
> > and this RFC draft requires use of that stricter or different
> > formatting variant, then the differences should be highlighted.
> >
> > e.g.
> > "No-Vary-Search uses a variant of query string format defined in WHATWG
> > (reference) which is stricter than the varieties of query string syntax
> > allowed in RFC (reference)."
> >
> > A reference to the WHATWG ABNF for the stricter format variant of the
> > query string is also recommended.
> >
> >>> The document does not mention the implication of the union of variants
> >>> between Vary and No-Vary-Search response headers. A CDN or browser
> >>> might have to limit the number of variants cached.
> >>>
> >>
> >> At present the limitations on the cache are not present here, just logic to
> >> determine whether a response is suitable to be used. RFC 9111 (HTTP
> >> Caching) §4.1 deals with the analogous question for existing variants to a
> >> limited extent:
> >>
> >> """
> >>
> >> If multiple stored responses match, the cache will need to choose one to
> >> use. When a nominated request header field has a known mechanism for
> >> ranking preference (e.g., qvalues on Accept and similar request header
> >> fields), that mechanism MAY be used to choose a preferred response. If such
> >> a mechanism is not available, or leads to equally preferred responses, the
> >> most recent response (as determined by the Date header field) is chosen, as
> >> per Section 4
> >> <https://www.rfc-editor.org/rfc/rfc9111#constructing.responses.from.caches>.
> >> <https://www.rfc-editor.org/rfc/rfc9111#section-4.1-6>
> >>
> >> Some resources mistakenly omit the Vary header field from their default
> >> response (i.e., the one sent when the request does not express any
> >> preferences), with the effect of choosing it for subsequent requests to
> >> that resource even when more preferable responses are available. When a
> >> cache has multiple stored responses for a target URI and one or more omits
> >> the Vary header field, the cache SHOULD choose the most recent (see Section
> >> 4.2.3 <https://www.rfc-editor.org/rfc/rfc9111#age.calculations>) stored
> >> response with a valid Vary field value.
> >>
> >> """
> >> I anticipated that any discussion of this issue would make most sense as
> >> part of any (not yet present) text discussing how this supplements RFC
> >> 9111. Are there novel considerations here about CDN and browser limits that
> >> merit specification?
> >
> > A reference to that might be sufficient to acknowledge that this RFC
> > extends caching concerns for clients and caching intermediaries.
> >
> >> Overall, this document uses idioms I am less familiar seeing in RFCs.
> >>> Maybe these idioms are more typical in WHATWG documents, but the
> >>> pseudo-code is different than what I typically see in RFCs.
> >>> Perhaps I am not familiar with the pseudo-markup variant, but it does
> >>> not look like markdown to me.
> >>>
> >>
> >> This is actually the RFC <em> element
> >> <https://authors.ietf.org/en/rfcxml-vocabulary#em> (semantic emphasis) as
> >> submitted, which is rendered in the plaintext rendering as leading and
> >> trailing underscores, but in the HTML and PDF renderings as italics. It's
> >> used here to set apart variable names from other text, because italics is
> >> the typographical convention for doing so in WHATWG/W3C algorithm
> >> pseudo-code. Not setting it apart at all might make it harder to skim the
> >> pseudocode (e.g., to clearly tell which values are used where), but it's
> >> certainly a bit noisy.
> >>
> >> e.g. To _parse a URL search variance_ given _value_:
> >>> See also my confusion above reading
> >>> "The obtain a URL search variance algorithm"
> >>> which could have been
> >>> "The _obtain a URL search variance_ algorithm"
> >>> using the _every-other-word_ idiom from Section 4,
> >>> though _I_ _am_ _personally_ _not_ _a_ _fan_ _of_ _this_ formatting.
> >>> My preference would suggest using a real language (any one), instead of
> >>> pseudo-code, to create a reference implementation, if that is the goal.
> >>> Add comments to describe required behavior to clarify the reference
> >>> implementation.
> >>>
> >>
> >> It's a pseudo-code that is typical in WHATWG/W3C documents (
> >> https://infra.spec.whatwg.org/#algorithms). Those bodies tend to prefer
> >> explicit pseudo-code algorithms, as I understand it, because it forces the
> >> algorithm to be unambiguous about some precise aspects of the required
> >> behavior that are easy to gloss over in natural language (though of course,
> >> it's not the only way of doing so).
> >>
> >> While the particular flavour of pseudo-code may not be familiar to this
> >> group, I have seen similar pseudo-code in documents such as RFC 8941
> >> (Structured Field Values for HTTP) which are relevant to this venue.
> >
> > Again, I read the plaintext draft. Let's wait for guidance from others
> > about document formatting and instead discuss content.
> >
> >> The No-Vary-Search syntax with "except" reads to me as a double-negative:
> >>> No-Vary-Search: params, except=("x")
> >>>
> >>> Not knowing how far along this spec document is, was naming the header
> >>> "Vary-Search" considered? With "Vary-Search", inverting the logic would
> >>> suggest "params" to default to all params varying (same as not
> >>> specifying Vary-Search), and "except" could be "no-vary"
> >>> Vary-Search: params, no-vary=("x")
> >>> to indicate no-vary for "x", or
> >>> Vary-Search: params, no-vary
> >>> to indicate all search params are no-vary (wildcard).
> >>>
> >>
> >> Yes, it was considered. This choice was made because it means that an
> >> absent header, or empty header value, should reflect that existing HTTP
> >> semantics are used.
> >
> > The same applies to the theoretical "Vary-Search" I described.
> >
> >> Since the default behavior is to vary on *all* parameters,
> >> their order, and in fact even the way those parameters are encoded in the
> >> URL, that means that the behavior of the header is naturally opposite to
> >> Vary (which starts from a default behavior of varying on *no* header
> >> fields).
> >
> > That is more a definition than a reason. Ok, I get that choice was
> > made, but why is it better than non-inverted logic?
> >
> >> This does lead to a double negative, unfortunately, with the use of the
> >> term "vary". Conceptualizing it instead of as "not varying" as "is the same
> >> resource" addresses that (i.e., "This resource *is the same resource* as if
> >> it had been requested with other query parameters, except if x differs."
> >> has no double negative). Drawing the clear connection with Vary (which is
> >> well-known), though, seemed worth the double negative.
> >
> > Respectfully, I disagree.
> >
> > This header needs to be processed by caching intermediaries and client
> > for caching, just like "Vary" needs to be processed for caching
> > variants. Caching variants will have to process and *invert* the logic
> > of "No-Vary-Search" to produce the "Vary"-set of varying parameters.
> >
> > Put another way, storing the variant requires construction of a cache
> > key for the variant, which is some sort of encoding of the varying
> > parameters. Since the set of varying parameters needs to be collected,
> > it makes more sense to me to have Vary and Vary-Search, rather than
> > Vary and invert-the-logic-of No-Vary-Search. However, since
> > No-Vary-Search supports both positive and a negative ("except") ways to
> > define the varying search parameters or non-varying search parameters,
> > you might argue the opposite, depending on which version (positive or
> > negative) of No-Vary-Search you think may be used more frequently.
> >
> >> 7. Privacy Considerations
> >>>
> >>> The ability to cache variants based on search parameters could possibly
> >>> compromise privacy due to fingerprinting and the ability to detect cache
> >>> hit versus cache miss even with coarse timing resolution.
> >>>
> >>
> >> Can you elaborate? If anything, I would have expected that disregarding
> >> certain query parameters would mean that someone probing the cache can
> >> learn *less* about which values of that query parameter have been seen by
> >> the cache previously.
> >>
> >> In the context of web browsers, this sort of attack is also mitigated by HTTP
> >> cache partitioning
> >> <https://developer.chrome.com/blog/http-cache-partitioning> which is now
> >> specified (incompletely) by the WHATWG Fetch standard
> >> <https://fetch.spec.whatwg.org/#http-cache-partitions>.
> >
> > If I have 10 different sets of 16-urls each, can I use caching to create
> > a tracking identifier if I assign the client one url from each of those
> > 10 sets and create a 10-hexdigit identifier? Fetching all the URLs and
> > detecting which ones are cached might reveal the identifier? When I
> > assign the URLs, I can assign a non-vary parameter 'tracker=1'. When
> > a different page on a different site is requesting all the URLs to
> > detect which are cached, the client can add non-vary parameter
> > 'tracker=0'. The server responding can assign HTTP caching headers
> > based on the response. Anyway, this is not my area of expertise.
> > Yes, CORS headers factor in, but takes effort to set up properly, and
> > "properly" might be malicious if those headers are from malicious sites.
> >
> > https://coveryourtracks.eff.org/
> >
> > Some of your colleagues at Google would be able to better explain all the
> > creative ways Google is fingerprinting clients in addition to cookies.
> >
> > Cheers, Glenn
> >
> >> Cheers, Glenn
> >>>
> >>>
> >>> On Wed, Jun 12, 2024 at 01:23:23PM -0400, Jeremy Roman wrote:
> >>>> In the interest of continuing discussion on this list, the WICG draft has
> >>>> been reformatted in RFC format and reported to the Datatracker:
> >>>>
> >>>> https://datatracker.ietf.org/doc/draft-wicg-http-no-vary-search/01/
> >>>> or directly on GitHub
> >>>>
> >>> https://jeremyroman.github.io/http-no-vary-search/draft-wicg-http-no-vary-search.html
> >>>>
> >>>> The text has been left mostly unchanged so far (modulo very small
> >>> editorial
> >>>> changes), and does not yet reflect any change to RFC 9111 behavior
> >>> (though
> >>>> hopefully it's clear what those changes would be, from the existing
> >>> text).
> >>>>
> >>>> On Tue, Mar 19, 2024 at 2:26 AM Mark Nottingham <mnot@mnot.net> wrote:
> >>>>
> >>>>> Hi Jeremy,
> >>>>>
> >>>>>> On 19 Mar 2024, at 11:44, Jeremy Roman <jbroman@chromium.org> wrote:
> >>>>>>
> >>>>>> Unfortunately it is not possible for me to join personally (time
> >>> zones
> >>>>> and personal complications). We might be able to brief a Chrome team
> >>> member
> >>>>> who is attending if there is interest (depending when this is), though
> >>> as
> >>>>> you point out it would necessarily be a fairly brief overview on short
> >>>>> notice (so it might not be possible).
> >>>>>
> >>>>> It doesn't look likely that we'll have time for additional
> >>> presentations.
> >>>>> I'd suggest continuing the discussion on the list.
> >>>>>
> >>>>> Just for some context -- we found this kind of capability useful when I
> >>>>> was at Yahoo! way back in 2010:
> >>>>> https://www.mnot.net/talks/pdf/Stupid_Web_Caching_Tricks.pdf#page=36
> >>>>>
> >>>>> Cloudflare supports configuration to ignore the whole query string, as
> >>>>> well as specific arguments in it:
> >>>>> https://developers.cloudflare.com/cache/how-to/cache-keys/
> >>>>>
> >>>>> As does Fastly:
> >>>>> https://docs.fastly.com/en/guides/making-query-strings-agnostic
> >>>>>
> >>>>>
> >>> https://www.fastly.com/documentation/solutions/examples/manipulate-query-string/
> >>>>>
> >>>>> As does Akamai (apparently, based upon the information available):
> >>>>>
> >>>>>
> >>> https://community.akamai.com/customers/s/article/Remove-query-strings-from-forward-request-and-cache-key?language=en_US
> >>>>>
> >>>>> I know Varnish supports this as well; I've done it with Squid (using a
> >>>>> helper) too. Not sure about eg nginx or Apache httpd.
> >>>>>
> >>>>> So I suspect it's safe to say there's interest in this general feature
> >>>>> from people who use HTTP caches.
> >>>>>
> >>>>> The difference here is the control mechanism to invoke that behaviour
> >>> --
> >>>>> putting it in a response header is really nice because it's a)
> >>>>> standardised, so (eventually) interoperable across implementations,
> >>> and b)
> >>>>> driven by the resource on the origin server, who has the most
> >>> information
> >>>>> about the URL's semantics (rather than relying on out-of-band
> >>>>> configuration).
> >>>>>
> >>>>> However, when a cache has multiple stored responses and they have
> >>>>> conflicting information about the cache key, we need to be careful
> >>> about
> >>>>> specifying the interaction. In a way, this is similar to Vary -- it
> >>> faced a
> >>>>> similar question, and the decisions made in its design made
> >>> implementation
> >>>>> difficult. We chose a different approach in Key and Variants to address
> >>>>> that; we should probably have a similar discussion here.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Mark Nottingham https://www.mnot.net/
> >>>>>
> >>>>>
> >>>
> >
>
> --
> Mark Nottingham https://www.mnot.net/
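
For readers following the cache-key argument in this thread, here is a
minimal Python sketch of how a cache might collect the varying search
parameters once a No-Vary-Search value has already been parsed. The
function names and the no_vary / all_except arguments are hypothetical,
header parsing is omitted, and this is not the algorithm from the draft:
it leaves parameter order significant and only illustrates the positive
and "except" forms discussed in the thread.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def varying_params(query, no_vary=frozenset(), all_except=None):
    """Return the (name, value) pairs that still distinguish variants.

    Hypothetical helper, not taken from the draft.
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    if all_except is not None:
        # Models the "except" form: only the listed names vary.
        return [(k, v) for k, v in pairs if k in all_except]
    # Models the positive form: every name varies unless it is listed.
    return [(k, v) for k, v in pairs if k not in no_vary]


def cache_key(url, no_vary=frozenset(), all_except=None):
    """Build an illustrative cache key from the varying parameters only."""
    parts = urlsplit(url)
    kept = varying_params(parts.query, no_vary, all_except)
    return (parts.scheme, parts.netloc, parts.path, urlencode(kept))


# Two URLs that differ only in a non-varying parameter share one key.
a = cache_key("https://example.com/p?x=1&tracker=0", all_except={"x"})
b = cache_key("https://example.com/p?tracker=1&x=1", all_except={"x"})
print(a == b)
```

Either form ends with the cache holding the set of parameters that do
vary, which is the collection step the thread describes; the sketch only
shows that both spellings reduce to the same filtering operation.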
Received on Friday, 14 June 2024 01:54:18 UTC