Re: SHOULD-level requirements in p6-caching from Mark Nottingham on 2011-04-08 (ietf-http-wg@w3.org from April to June 2011)

From: Mark Nottingham <mnot@mnot.net>
Date: Fri, 8 Apr 2011 15:10:17 +1000
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <87AB210C-782E-4475-BD4B-A552A549588E@mnot.net>
Thanks for the feedback, Poul-Henning. Responses below.


On 07/04/2011, at 7:35 PM, Poul-Henning Kamp wrote:

> In message <90400372-C89F-4E9C-92F6-D8F1A6AAD631@mnot.net>, Mark Nottingham writes:
> 
>> In 2.5, 
>> 
>>>   A cache that passes through requests with methods it does not
>>>   understand SHOULD invalidate the effective request URI (Section 4.3
>>>   of [Part1]).
>> 
>> I'm not sure why this is a SHOULD when all of the other invalidation 
>> side effects are MUST-level requirements. Can we raise this to a MUST as 
>> well?
> 
> First off, what does "not understand" mean here ?
> 
> Does that cover a cache which goes "Ohh, POST, I don't do those:
> pass it through" ?

POST is explicitly covered elsewhere in the section, so there's an overlap here; all caches are expected to do this (and more) for POST.

> Or does it only cover "XYZZY / HTTP/1.1" style requests ?

It does that as well.

> Second: Are we sure this complies with Principle Of Least Astonishment ?

Can you say a bit more here? 

> Third: do we really want to give script kiddies their own private
> standards-mandated cache-invalidation button ?

Please read the entire section; this is not new text, and has been present in HTTP for over a decade. In short, there is a mechanism to prevent this kind of attack.
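
To make the behaviour under discussion concrete, here is a minimal sketch, not taken from any real cache. It assumes a toy in-memory store keyed on the effective request URI, and it simplifies by treating every method other than GET and HEAD as invalidating, which covers both POST and an unknown method like XYZZY. The names TinyCache, handle and forward are hypothetical, and the protection against hostile invalidation mentioned above is deliberately left out.

```python
# Hypothetical sketch; TinyCache, handle and forward are illustrative names.
SAFE_METHODS = {"GET", "HEAD"}  # assumption: everything else may invalidate


class TinyCache:
    def __init__(self):
        # effective request URI -> stored response body
        self.store = {}

    def handle(self, method, uri, forward):
        """Serve GET from the store when possible; otherwise pass through."""
        if method == "GET" and uri in self.store:
            return self.store[uri]
        response = forward(method, uri)
        if method not in SAFE_METHODS:
            # Covers POST as well as methods the cache does not understand
            # (e.g. "XYZZY"): pass through, then drop the stored response.
            self.store.pop(uri, None)
        elif method == "GET":
            self.store[uri] = response
        return response
```

With this sketch, a GET for /a followed by "XYZZY /a" leaves nothing stored for /a, so the next GET goes back to the origin.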


>> In 3.2.1 (only-if-cached),
>> 
>>>      If it receives this
>>>      directive, a cache SHOULD either respond using a stored response
>>>      that is consistent with the other constraints of the request, or
>>>      respond with a 504 (Gateway Timeout) status code.
>> 
>> MUST?
> 
> I must confess I have never understood this directive, nor been able to
> come up with a non-hostile intent for using it.  Can anybody enlighten me ?

I actually know of lots of people using it; it's useful to get something from cache in a low-cost way, without incurring the latency/back-end overhead of a request if it's not in cache.
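
As a rough illustration of that use, the sketch below shows the two outcomes the quoted text allows: answer from the store, or answer 504 without touching the back end. It assumes a plain dict as the store and ignores freshness and the "other constraints of the request"; respond, store and fetch are hypothetical names.

```python
# Hypothetical sketch; respond, store and fetch are illustrative names.
def respond(cache_control, uri, store, fetch):
    """Return (status, body) for a request carrying the given Cache-Control."""
    directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    stored = store.get(uri)
    if stored is not None:
        # Assumption: anything in the store is usable (no freshness check).
        return 200, stored
    if "only-if-cached" in directives:
        # Not in cache and the client does not want a back-end request.
        return 504, "Gateway Timeout"
    body = fetch(uri)
    store[uri] = body
    return 200, body
```

A client can thus probe the cache cheaply: a 504 tells it the response is not stored, and no origin request was made on its behalf.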


> Nit:
> 	"The only-if-cached request directive indicates that the client
> 	only wishes to return a stored response."
> 
> 	s/return/receive/ ?

ack.


>> In 3.3,
>> 
>>>   A server SHOULD NOT send Expires dates more than one year in the
>>>   future.
>> 
>> Prose.
> 
> Why this policy restriction ?
> 
> Remove entirely ?

Anyone have history on this one before I dig through the cache list archives?


>> In 3.4,
>> 
>>>   When the no-cache directive is present in a request message, a cache
>>>   SHOULD forward the request toward the origin server even if it has a
>>>   stored copy of what is being requested.
>> 
>> Prose.
> 
> Have you discussed the future of "Pragma:" before ?
> 
> I would like to see the text say that if there is a "Cache-Control:"
> header "Pragma:" MUST be ignored, to resolve the possible conflicts
> between them.

We can't say it MUST be ignored, because I suspect that'd break a lot of existing implementations. However, I agree it'd be nice to disambiguate the relative precedence of Pragma and Cache-Control (just as has been done for Cache-Control and Expires).

I also think we can drop "A client SHOULD include both header fields when a no-cache request is sent to a server not known to be HTTP/1.1 compliant."


> 
>> In 3.5,
>> 
>>>   A server SHOULD include a Vary header field with any cacheable
>>>   response that is subject to server-driven negotiation.
>> 
>> I can't decide if this needs to be a requirement; if it does, I think it 
>> should be a MUST; if not, it should be prose. Thoughts?
> 
> This one is a nasty one.
> 
> Vary is a very crude mechanism in today's web, where objects are
> customized to specific (browser, platform, extension) triplets.
> 
> For instance Varnish experience is that "Vary: User-Agent" reduces
> cache hit rate by about two orders of magnitude.
> 
> Some sites "solve" this problem using a cookie instead.  However,
> you cannot say "Vary: Cookie/my_idea_of_your_browser" to vary only
> on a single cookie, so that doesn't really solve the cache problem.
> 
> This may be a good case for the genuine SHOULD:  You really should,
> unless you shouldn't because it breaks caching.

I think we need a lot more advice about Vary. This is one of those cases where the spec currently talks about implementation details, when really it should be talking about expected behaviour on the wire. I.e., if you want to enable a client (including one that has a cache) to consider this response specific to the values of certain request headers, you need to include a Vary. That really isn't a SHOULD, it's advisory text, IMO.

This trips people up just as much because they forget to include Vary when they really need to.
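
To illustrate what "specific to the values of certain request headers" means for a cache, here is a minimal sketch of a cache key that folds in the request headers named by Vary. It is an assumption-laden toy: cache_key is a hypothetical helper, header values are compared as raw strings, and "Vary: *" is not handled.

```python
# Hypothetical sketch; cache_key is an illustrative helper, not a real API.
def cache_key(uri, request_headers, vary):
    """Build a key from the URI plus the request headers listed in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # A missing request header is treated as an empty value (assumption).
    return (uri, tuple((n, lowered.get(n, "")) for n in names))
```

With "Vary: User-Agent", two requests for the same URI from different browsers get different keys, which is where the hit-rate loss Poul-Henning describes comes from. With no Vary at all they share one key, which is the failure mode when the response really does depend on the request headers.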


> Related to this, we should think a moment about Google Analytics
> here, since it tends to pollute all objects with Cookies, and "Vary:
> Cookie" is implied (but not specified as I recall)
> 
> It would be incredibly beneficial for caching, if it were possible
> to say "Vary: !Cookie" to indicate that the Cookies are not really
> important anyway.


There's been a bit of discussion about defining a new mechanism for refining the cache key; there might be a draft soon. That's not a WG item, however (but it can still be talked about on-list).

Cheers,


--
Mark Nottingham   http://www.mnot.net/
Received on Friday, 8 April 2011 05:10:48 UTC