Re: Working Group Last Call: draft-ietf-httpbis-cache-groups-02

Hi!


This is my first time ever participating here, and it has been >15 years since I last e-mailed a mailing list (the last one was development@drupal.org) 😅

I am not familiar with the conventions here, but encouraged by Mark Nottingham, I’m posting the comments I previously sent to him after he reached out. (I met some of you at the 2019 Amsterdam Workshop!)


# Perspective

My perspective comes from having authored significant parts of the Drupal (https://drupal.org) rendering pipeline and caching infrastructure, and from being employed by a major Drupal hosting & services provider. I offer these comments as an individual Drupal contributor with a broad perspective, not as an employee of Acquia.

Crucially: Drupal is cheaply and easily deployable, and makes very few infrastructure assumptions. Reverse proxies are not guaranteed. Which is why for … does git archeology … at least 22 years now, Drupal has had a built-in “Page Cache”, aka a very basic reverse proxy: https://git.drupalcode.org/project/drupal/-/blob/4.0.x/includes/common.inc#L490-503. It has been more advanced since Drupal 8 (shipped ~9 years ago, in 2015).

Since Drupal 8, “cache tags” are natively supported, which is the very same concept being standardized here. The official docs for cache tags: https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags.

To fully leverage cache tags, virtually all concepts and layers in Drupal have evolved to be aware of “cache tags” (and “cache contexts” and max-age). That is how we were able to ship Drupal 8 with Page Cache enabled by default, because stale responses are impossible: https://wimleers.com/blog/drupal-8-page-caching-enabled-by-default.
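
For readers unfamiliar with the concept, here is a minimal sketch of tag-based invalidation in Python. It is a toy in-memory cache, not Drupal's actual implementation: the `TaggedCache` class and its method names are hypothetical, and the tag strings merely follow the style of Drupal's cache tags.

```python
from collections import defaultdict


class TaggedCache:
    """Toy in-memory cache whose entries can be invalidated by tag (hypothetical API)."""

    def __init__(self):
        self._entries = {}                     # key -> cached response body
        self._keys_by_tag = defaultdict(set)   # tag -> keys carrying that tag

    def set(self, key, value, tags):
        # Store the entry and remember which tags it depends on.
        self._entries[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        # Returns None on a cache miss.
        return self._entries.get(key)

    def invalidate_tag(self, tag):
        # Drop every entry that carries the tag, however many there are.
        for key in self._keys_by_tag.pop(tag, set()):
            self._entries.pop(key, None)


cache = TaggedCache()
cache.set("/node/1", "<article 1>", tags={"node:1"})
cache.set("/node/2", "<article 2>", tags={"node:2"})
cache.set("/frontpage", "<listing of 1 and 2>", tags={"node:1", "node:2"})

# Editing article 1 invalidates its own page and every page that embeds it.
cache.invalidate_tag("node:1")
```

After the last call, `/node/1` and `/frontpage` are gone while `/node/2` is still cached. The point is that the code doing the invalidation only needs to know what changed, not which responses contain it.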

Ask any experienced Drupal developer about caching, and they’ll start talking to you about cache tags at least, hopefully also cache contexts.

I’d *love* to see this standardized! It’s a powerful concept. It’s benefited Drupal immensely. It was proudly found elsewhere. I wasn’t even the one to find it; I was just one of a few people driving it to the point where it was actually usable in production.

It's not without its challenges either: any cache tags that appear on a significant portion/majority of (cached) HTTP responses can cause multiple failure modes, depending on the infrastructure’s architecture.


# Comments

1. “groups” is IMHO a non-ideal term choice. The word “group” comes with connotations that do not match the semantics, because usually things are (exclusively) in one of N groups, not in multiple. “sets” would be better (as in math). Or, “tags”, which is the popularized equivalent of “sets” thanks to multiple social media platforms.

2. 128 groups of up to 128 characters each is 16 KiB. This is the same limit Drupal core chose: https://www.drupal.org/docs/drupal-apis/cache-api/cache-tags#reverse-proxies — but my employer, Acquia, is finding this to be a problem/challenge in the (ongoing) infrastructure overhaul, because Apache and others have a default limit of 8 KiB.

3. Why would an opaque string of 128 characters ever be necessary? Why lean that much towards human readability? Why not 256 groups of 64 characters? Or 512 of 32? IOW: what’s the rationale behind this choice, and what trade-offs were considered in standardizing these particular limits?

4. Why wrap each opaque string in double quotes? This feels like wasted characters, and actually would push the number of *transmitted* bytes to 128*130 = 16,640 bytes = 16.25 KiB, not even counting the commas and spaces.

5. What if a Cache-Group-Invalidation response header reaches only a subset of the reverse proxies? E.g. I update an article from Belgium, and the response to my PATCH request contains a Cache-Group-Invalidation header, passing through Belgian reverse proxies, but you’re in Australia. Your reverse proxy has no way to know. This is called out in the last paragraph before 1.1 👍, but this alone severely limits real-world applicability 😅

6. … even just in Drupal *itself*, 2 basic reverse proxies are built in: “Page Cache” (for anon users only, so no variations, but guarantees instantaneous updates thanks to cache tags) and “Dynamic Page Cache” (for all users, only the efficiently cacheable parts, the poorly cacheable parts are rendered per request, see this from nearly a decade ago: https://wimleers.com/article/drupal-8-dynamic-page-cache). Both of these would be able to work by relying solely on this RFC. But e.g. in Acquia Cloud, there’s a Varnish reverse proxy at the Acquia Cloud level (to minimize Drupal bootstraps/PHP execution/origin hits to the multiple web servers per site), and then a CDN in front of *that*. That means you need to invalidate both of those too. Well-established solution, with >30K sites using it: https://www.drupal.org/project/purge (many of these sites are *very* big web properties).
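
The size arithmetic behind comments 2 to 4 can be sketched as follows. This is a rough back-of-the-envelope calculation in Python, assuming ASCII-only group names (1 byte per character) and a `, ` separator between members, and ignoring the header name itself. The `header_value_bytes` helper is hypothetical.

```python
def header_value_bytes(count, length, quoted=True, separator=", "):
    """Worst-case byte size of a list of `count` strings of `length` ASCII chars."""
    per_member = length + (2 if quoted else 0)  # 2 extra bytes for the double quotes
    return count * per_member + (count - 1) * len(separator)


EIGHT_KIB = 8 * 1024  # assumption: the 8 KiB default limit mentioned in comment 2

# The three count/length combinations raised in comment 3.
for count, length in [(128, 128), (256, 64), (512, 32)]:
    size = header_value_bytes(count, length)
    print(f"{count} x {length}: {size} bytes ({size / 1024:.2f} KiB), "
          f"fits in 8 KiB: {size <= EIGHT_KIB}")
```

Under these assumptions, the 128 x 128 case comes to 16,894 bytes with quotes and separators, versus 16,384 bytes for the bare strings. Trading length for count (256 x 64, 512 x 32) makes the relative overhead of quotes and separators larger, and none of the three combinations fits in 8 KiB at the maximum.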


P.S.: Please assume my non-participation in the near future, because I’m about to be a dad in *days* 😅


Wim



> On 19 Oct 2024, at 01:39, Tommy Pauly <tpauly@apple.com> wrote:
> 
> Hello HTTP,
> 
> This email starts a working group last call for draft-ietf-httpbis-cache-groups-02. This draft is stable at this point, and doesn’t have any open issues.
> 
> You can find the draft here:
> https://www.ietf.org/archive/id/draft-ietf-httpbis-cache-groups-02.html
> https://datatracker.ietf.org/doc/draft-ietf-httpbis-cache-groups/
> 
> Please send your review and comments in response to this email, and file issues to https://github.com/httpwg/http-extensions/issues.
> 
> This call will be open until *Monday, November 4*. (Which is also our first meeting day for IETF 121 in Dublin, see folks there!)
> 
> Best,
> Tommy

Received on Tuesday, 22 October 2024 09:26:20 UTC