- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 30 Oct 2024 10:58:40 +0000
- To: Wim Leers <work@wimleers.com>
- Cc: Working Group HTTP <ietf-http-wg@w3.org>
Hi Wim, thanks very much for the feedback! Some responses below.

> On 20 Oct 2024, at 11:49 pm, Wim Leers <work@wimleers.com> wrote:
>
> Hi!
>
> First time ever participating here. And >15 years since I last e-mailed to a mailing list — last one was development@drupal.org 😅
>
> I am not familiar with the conventions here, but encouraged by Mark Nottingham, I’m posting the comments I previously sent to him after he reached out. (I met some of you at the 2019 Amsterdam Workshop!)
>
>
> # Perspective
>
> My perspective is that from having authored significant parts of the Drupal (https://drupal.org) rendering pipeline and caching infrastructure, and being employed by a major Drupal hosting & services provider. I offer these comments as an individual Drupal contributor with a broad perspective, not as an employee of Acquia.
>
> Crucially: Drupal is cheaply and easily deployable, and makes very few infrastructure assumptions. Reverse proxies are not guaranteed. Which is why for … does git archeology … at least 22 years now, Drupal has a built-in “Page Cache” aka a very basic reverse proxy: https://git.drupalcode.org/project/drupal/-/blob/4.0.x/includes/common.inc#L490-503 — it is more advanced since Drupal 8 (shipped ~9 years ago, in 2015).
>
> Since Drupal 8, “cache tags” are natively supported, which is the very same concept being standardized here. The official docs for cache tags: https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags.
>
> To fully leverage cache tags, virtually all concepts and layers in Drupal have been evolved to support awareness of "cache tags" (and "cache contexts” and max-age). Which is how in Drupal 8, we were able to ship with Page Cache enabled by default, because stale responses are impossible: https://wimleers.com/blog/drupal-8-page-caching-enabled-by-default.
>
> Ask any experienced Drupal developer about caching, and they’ll start talking to you about cache tags at least, hopefully also cache contexts.
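The cache-tag idea described here works like this: each stored response is tagged with the things it was built from, and when one of those things changes, every response carrying that tag is purged. The small in-memory sketch below illustrates the mechanism. The class and method names are hypothetical. This is not Drupal's cache API and not the draft's normative invalidation algorithm, and the tag strings only imitate Drupal's naming style.

```python
from __future__ import annotations

from collections import defaultdict


class TaggedCache:
    """Toy response cache (hypothetical): every entry carries a set of tags.

    Invalidating a tag evicts every stored entry that carries it, so one
    content change can purge many URLs without the origin listing them.
    """

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}  # cache key (e.g. URL) -> body
        self._tags: dict[str, set[str]] = {}  # cache key -> tags it carries
        self._index: defaultdict[str, set[str]] = defaultdict(set)  # tag -> keys

    def store(self, key: str, body: bytes, tags: set[str]) -> None:
        self._evict(key)  # forget old tags if this key is being replaced
        self._entries[key] = body
        self._tags[key] = set(tags)
        for tag in tags:
            self._index[tag].add(key)

    def get(self, key: str) -> bytes | None:
        return self._entries.get(key)

    def invalidate(self, tags: set[str]) -> set[str]:
        """Evict every entry carrying any of the given tags; return the evicted keys."""
        victims: set[str] = set()
        for tag in tags:
            victims |= self._index.get(tag, set())
        for key in victims:
            self._evict(key)
        return victims

    def _evict(self, key: str) -> None:
        self._entries.pop(key, None)
        for tag in self._tags.pop(key, set()):
            self._index[tag].discard(key)
            if not self._index[tag]:
                del self._index[tag]


# Usage with Drupal-style tag names (illustrative only).
cache = TaggedCache()
cache.store("/node/1", b"<article>1</article>", {"node:1", "user:7"})
cache.store("/frontpage", b"<ul>...</ul>", {"node:1", "node:2", "node_list"})
cache.store("/node/2", b"<article>2</article>", {"node:2"})

evicted = cache.invalidate({"node:1"})  # e.g. node 1 was edited
print(sorted(evicted))  # ['/frontpage', '/node/1']
print(cache.get("/node/2") is not None)  # True
```

The reverse index (tag to keys) keeps a purge proportional to the number of entries carrying the tag. It also suggests one plausible reading of the failure modes mentioned in the thread: a tag attached to most responses evicts most of the cache in a single call.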
>
> I’d *love* to see this standardized! It’s a powerful concept. It’s benefited Drupal immensely. It was proudly found elsewhere. I wasn’t even the one to find it; I was just one of a few people driving it to the point where it was actually usable in production.
>
> It's not without its challenges either: any cache tags that appear on a significant portion/majority of (cached) HTTP responses can cause multiple failure modes, depending on the infrastructure’s architecture.

Thanks - it's helpful to have data from another existing implementation, as well as confirmation that both the use case and standardisation are important.

> # Comments
>
> 1. “groups” is IMHO a non-ideal term choice. The word “group” comes with connotations that do not match the semantics, because usually things are (exclusively) in one of N groups, not in multiple. “sets” would be better (as in math). Or, “tags”, which is the popularized equivalent of “sets” thanks to multiple social media platforms.

I know. 'Groups' was a somewhat political decision to avoid using a term that was already used by some vendors but not others. If folks don't worry about this I'm happy to change the name (except that we tend to get into ratholes naming things!). To keep it simple, perhaps we should just choose between 'cache groups' and 'cache tags' ('sets' begs the question of 'sets of _what_?'). Anyone have strong opinions?

> 2. 128 groups of up to 128 characters each is 16 KiB. This is the same limit Drupal core chose: https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags#reverse-proxies — but my employer, Acquia, is finding this to be a problem/challenge in the (ongoing) infrastructure overhaul, because Apache and others have a default limit of 8 KiB.

I don't think the specification should be limited by one implementation, especially when it can be tuned.

> 3. Why would an opaque string of 128 characters ever be necessary? Why lean that much towards human readability? Why not 256 groups of 64 characters?
> Or 512 of 32? IOW: what’s the rationale behind this choice, what trade-offs were considered to standardize these particular trade-offs?

It's a good question. See below.

> 4. Why wrap each opaque string in double quotes? This feels like wasted characters, and actually would push the number of *transmitted* bytes to 128*130=16.25 KiB — not even counting the commas and spaces.

My recollection is that number was chosen because it was the lowest supported by all of the existing implementations. Unfortunately I don't have the data at hand any more, but it should be possible to find. That doesn't mean we need to choose _that_ number -- I could see 64, for example. What do other folks think?

> 5. What if a Cache-Group-Invalidation response header reaches only a subset of the reverse proxies? E.g. I update an article from Belgium, and the response to my PATCH request contains a Cache-Group-Invalidation header, passing through Belgian reverse proxies, but you’re in Australia. Your reverse proxy has no way to know. This is called out in the last paragraph before 1.1 👍, but this alone severely limits real-world applicability 😅

It does, in uncoordinated caches. The thinking here is that a) in use cases where it's most important that the cache a user is sending requests through has up-to-date information, such as when their POST changes things, it's not necessary to have global distribution of the invalidation, and b) when caches _are_ coordinated (such as in a CDN), mechanisms like Targeted Cache Control (RFC 9213 - <https://www.rfc-editor.org/rfc/rfc9213.html>) can be used to select them, while giving other caches more conservative caching policies. This could be better described in the draft though; I'll try to do so.

> 6.
> … even just in Drupal *itself*, 2 basic reverse proxies are built in: "Page Cache” (for anon users only, so no variations, but guarantees instantaneous updates thanks to cache tags) and “Dynamic Page Cache” (for all users, only the efficiently cacheable parts, the poorly cacheable parts are rendered per request, see this from nearly a decade ago: https://wimleers.com/article/drupal-8-dynamic-page-cache). Both of these would be able to work by relying solely on this RFC. But e.g. in Acquia Cloud, there’s a Varnish reverse proxy at the Acquia Cloud level (to minimize Drupal bootstraps/PHP execution/origin hits to the multiple web servers per site), and then a CDN in front of *that*. That means you need to invalidate both of those too. Well-established solution, with >30K sites using it: https://www.drupal.org/project/purge (many of these sites are *very* big web properties).

I'm not sure if this changes the spec -- but I agree that more work on distributing invalidations is necessary to make it functional in cases like this.

> P.S.: Please assume my non-participation in the near future, because I’m about to be a dad in *days* 😅

Hope it goes/went well, and get some sleep!

Cheers,

>
>> On 19 Oct 2024, at 01:39, Tommy Pauly <tpauly@apple.com> wrote:
>>
>> Hello HTTP,
>>
>> This email starts a working group last call for draft-ietf-httpbis-cache-groups-02. This draft is stable at this point, and doesn’t have any open issues.
>>
>> You can find the draft here:
>> https://www.ietf.org/archive/id/draft-ietf-httpbis-cache-groups-02.html
>> https://datatracker.ietf.org/doc/draft-ietf-httpbis-cache-groups/
>>
>> Please send your review and comments in response to this email, and file issues to https://github.com/httpwg/http-extensions/issues.
>>
>> This call will be open until Monday, November 4. (Which is also our first meeting day for IETF 121 in Dublin, see folks there!)
>>
>> Best,
>> Tommy

--
Mark Nottingham   https://www.mnot.net/
Received on Wednesday, 30 October 2024 10:58:46 UTC
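Items 2 to 4 of the thread contain header-size arithmetic: 128 × 128 bytes = 16 KiB of group names, and 128 × 130 bytes = 16.25 KiB once each name is quoted. The short sketch below checks those figures. It assumes that group names are serialized as a Structured Field List of Strings (RFC 8941). Under that format, each member is wrapped in double quotes (with `\` and `"` escaped) and members are separated by a comma and a single space. The helper names are hypothetical, and the count excludes the field name itself.

```python
def sf_string(value: str) -> str:
    """Serialize a Structured Field String (hypothetical helper, per RFC 8941 rules)."""
    if any(ord(c) < 0x20 or ord(c) > 0x7E for c in value):
        raise ValueError("sf-string must be printable ASCII")
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sf_string_list(values: list[str]) -> str:
    """Serialize a List of Strings: members joined by a comma plus one space."""
    return ", ".join(sf_string(v) for v in values)


# Worst case discussed in the thread: 128 groups of 128 characters each.
groups = [f"{i:03d}" + "x" * 125 for i in range(128)]  # 128 distinct 128-char names
value = sf_string_list(groups)

raw = sum(len(g) for g in groups)  # bytes of group names alone
print(raw)  # 16384 (16 KiB)
print(len(value))  # 16894 = 128 * 130 quoted bytes + 127 * 2 separator bytes
```

Under these assumptions, the commas and spaces that item 4 sets aside add another 254 bytes. The full field value therefore comes to 16,894 bytes, which is roughly twice the 8 KiB default limit mentioned in item 2.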