Re: Hyperlinks and content negotiation from Aryeh Gregor on 2009-10-18 (public-html@w3.org from October 2009)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Sun, 18 Oct 2009 11:53:53 -0400
To: Mike Kelly <mike@mykanjo.co.uk>
Cc: Smylers <Smylers@stripey.com>, public-html@w3.org
Message-ID: <7c2a12e20910180853l20e8ad37j97cb5f121a888c88@mail.gmail.com>

On Fri, Oct 16, 2009 at 10:11 PM, Mike Kelly <mike@mykanjo.co.uk> wrote:
> The benefits are realized in terms of automated cache invalidation.
>
> Modifying a resource should automatically invalidate all of its
> representations.
> (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10)
>
> In a server-side reverse proxy cache scenario (a common use case for large
> scale web applications), being able to rely on this automatic mechanism as
> the sole method of cache invalidation ensures that the cache is refreshed as
> infrequently and simply as possible, and that destination server usage is
> kept to a minimum. This kind of efficiency gain can dramatically reduce
> operating costs, particularly in new 'pay-as-you-process' elastic
> computing infrastructures.

This is an interesting point.  For instance, Wikipedia relies on a
complicated mechanism where the software computes all the needed cache
invalidations on the server side and sends them to its reverse
proxies.  The same goes for any complex application that supports HTTP
caching.  A simpler, standard way of doing that would certainly be
valuable.  In fact, the large majority of web applications don't
support HTTP caching for most of their content, partly because of the
difficulty of purging caches correctly (and partly because caching
imposes serious limitations on locality of data).
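For comparison, here is a rough sketch of the automatic mechanism Mike cites. It models a cache that keeps one entry per negotiated variant of a URI and drops all of them when an unsafe method (POST, PUT, DELETE) is sent to that URI. This is a toy in-memory model and not Squid or any real proxy. The class and method names are hypothetical, and it ignores the section 13.10 rules about Location and Content-Location headers.

```python
# Toy model of RFC 2616 section 13.10-style invalidation in a reverse-proxy
# cache. Class and method names are hypothetical, for illustration only.

UNSAFE_METHODS = {"POST", "PUT", "DELETE"}


class VariantCache:
    """Caches responses per URI, one entry per negotiated variant."""

    def __init__(self):
        # uri -> {variant key (here: the negotiated media type) -> body}
        self._entries = {}

    def store(self, uri, variant, body):
        self._entries.setdefault(uri, {})[variant] = body

    def lookup(self, uri, variant):
        return self._entries.get(uri, {}).get(variant)

    def handle_request(self, method, uri):
        # An unsafe request to a URI invalidates every cached representation
        # of that URI, whatever its variant. Safe methods leave the cache alone.
        # (Location/Content-Location invalidation is deliberately not modelled.)
        if method.upper() in UNSAFE_METHODS:
            self._entries.pop(uri, None)


cache = VariantCache()
cache.store("/blog/", "text/html", "<html>...</html>")
cache.store("/blog/", "application/atom+xml", "<feed>...</feed>")
cache.handle_request("PUT", "/blog/")
print(cache.lookup("/blog/", "text/html"))             # None
print(cache.lookup("/blog/", "application/atom+xml"))  # None
```

Note that only the Request-URI itself is invalidated; nothing else on the site is touched.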

However, in addition to the usability problems that have been pointed
out (bookmarking/copy-paste failure), I don't think your proposal is a
flexible enough solution to be very useful in practice for cache
invalidation.  The case you suggest is an HTML version of a page plus
a feed.

But in practice, blogs and so on will often have many pages that need
to be invalidated.  If the front page of a blog changes, then both the
HTML and RSS/Atom versions will have to be purged from cache, it's
true.  But so will a variety of other resources.  If the blog has a
"latest posts" menu on every page, for instance, every page will have
to be purged from cache.

I don't know much about blogs, so to be more concrete, I'll talk about
wikis and forums.  MediaWiki often needs to purge many pages whenever
one page changes, since one page can include another as a template or
link to it (and links can be given CSS classes based on properties of
the linked-to page), among other dependencies.  This logic is
complicated and certainly couldn't reasonably be moved out of the
application.
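To illustrate why a wiki edit fans out so widely, here is a rough sketch of the kind of dependency walk involved. It is not MediaWiki's actual code. The function name and the two dependency maps are hypothetical. Template inclusion is treated as transitive, while pages that merely link to the changed page are purged one hop out, because their link styling may depend on the changed page's properties.

```python
from collections import deque


def pages_to_purge(changed, transcluded_by, linked_from):
    """Return every page whose cached HTML may be stale after `changed` is edited.

    transcluded_by: page -> pages that include it as a template (hypothetical map)
    linked_from:    page -> pages that link to it (hypothetical map)
    """
    purge = {changed}
    queue = deque([changed])
    # Template inclusion is transitive: if A is included in B and B in C,
    # editing A changes the rendered output of both B and C.
    while queue:
        page = queue.popleft()
        for includer in transcluded_by.get(page, ()):
            if includer not in purge:  # also guards against inclusion cycles
                purge.add(includer)
                queue.append(includer)
    # Pages linking to the changed page may style that link differently
    # (e.g. via a CSS class), so they are purged too, one hop out only.
    purge.update(linked_from.get(changed, ()))
    return purge


transcluded_by = {"Template:Infobox": ["Paris", "Berlin"], "Paris": ["Portal:Europe"]}
linked_from = {"Template:Infobox": ["Help:Templates"]}
print(sorted(pages_to_purge("Template:Infobox", transcluded_by, linked_from)))
# ['Berlin', 'Help:Templates', 'Paris', 'Portal:Europe', 'Template:Infobox']
```

None of these pages is a representation of the edited page, so invalidation keyed on a single resource's URI would not reach them.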

As for forums, the usual type of change that occurs is a new post.
When a new post is made on a forum, a number of pages typically must
be purged.  For instance, the page for the thread itself needs to be
purged, but so does the page for the subforum (so that the thread is
bumped to the top).  The page for every forum containing the subforum
might also need to be purged, to display a correct "last thread" entry
next to the link to the subforum that the thread is in.  Again, this
is not a case that would benefit much from your suggestion.
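The forum case can be sketched the same way. This hypothetical helper takes a new post in a thread and walks up the forum hierarchy, collecting cache keys for the thread page, its subforum and every ancestor forum. The URL scheme and the data maps are assumptions for illustration. It conservatively purges all ancestors, although, as noted above, some of them might not strictly need it.

```python
def pages_to_purge_for_new_post(thread_id, forum_of_thread, parent_forum):
    """Cache keys to purge when a post is added to `thread_id`.

    forum_of_thread: thread id -> id of the subforum containing it
    parent_forum:    forum id -> id of its parent forum (absent for the root)
    The "/thread/..." and "/forum/..." paths are a hypothetical URL scheme.
    """
    purge = [f"/thread/{thread_id}"]  # the thread page itself
    forum = forum_of_thread[thread_id]
    while forum is not None:
        # The subforum (the thread is bumped to the top) and each ancestor
        # (its "last thread" entry next to the child forum's link changes).
        purge.append(f"/forum/{forum}")
        forum = parent_forum.get(forum)
    return purge


print(pages_to_purge_for_new_post(
    42,
    {42: "python-help"},
    {"python-help": "programming", "programming": "index"},
))
# ['/thread/42', '/forum/python-help', '/forum/programming', '/forum/index']
```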

Do you have an example of a specific application that currently uses
server-side cache purging but could rely on your automated mechanism
instead?  It seems to be of very narrow utility.

Received on Sunday, 18 October 2009 15:54:28 UTC