Re: Query string cacheability from Mark Nottingham on 2010-05-20 (ietf-http-wg@w3.org from April to June 2010)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 20 May 2010 10:43:03 +1000
To: Eric Lawrence <ericlaw@exchange.microsoft.com>, Roy Fielding <roy.fielding@day.com>
Cc: "Julian F. Reschke" <julian.reschke@gmx.de>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <E056C83D-0977-49E7-A557-79EAB63FD712@mnot.net>
Interesting. My tests were purely using XHR, and they found that most caches honoured this requirement (IE being the exception); <http://www.mnot.net/blog/2006/05/11/browser_caching>.

I know that Squid honours it as well (or at least did until very recently).

All things being equal, I agree with Roy that it would be good to get rid of this requirement. However, we can't just remove requirements because we don't like them.

If there really isn't existing interop over this requirement -- i.e., a significant number of caches don't honour it -- then removing it makes more sense. Eric, do you have anything to help reproduce your results that can be made public?

Taking a completely different tack: It's always bothered me that the heuristic is so ill-defined; someone could say that a heuristic is "if it's an HTTP request, I'll cache it."

Is anyone aware of an implementation that uses a heuristic that *isn't* based upon the Last-Modified header? Tightening that up might make loosening the query string requirement easier (because script-served query URIs don't emit Last-Modified unless they really mean it).
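
As a rough illustration of what a tightened heuristic could look like, the Python sketch below combines the rule proposed in the quoted text further down (status codes 200, 203, 206, 300, 301 or 410; no query component) with a lifetime derived from Last-Modified. The function name heuristic_lifetime, the plain headers dict, and the 10% fraction are assumptions made for illustration. RFC 2616 mentions 10% only as a typical setting, and nothing in this thread confirms that any particular cache works this way. The sketch also assumes the caller has already established that the response carries no explicit expiration time.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Status codes for which the proposed text allows heuristic freshness.
HEURISTIC_STATUS_CODES = {200, 203, 206, 300, 301, 410}

# Assumption: fraction of the time since Last-Modified used as the lifetime.
LM_FRACTION = 0.1


def heuristic_lifetime(url, status, headers, now=None):
    """Return a heuristic freshness lifetime, or None if none may be used.

    `headers` is a hypothetical dict keyed by exact header name.
    """
    now = now or datetime.now(timezone.utc)

    if status not in HEURISTIC_STATUS_CODES:
        return None

    # A "?" before any fragment means the URL has a query component.
    if "?" in url.partition("#")[0]:
        return None

    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        # Tightened heuristic: no Last-Modified, no heuristic freshness.
        return None

    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return None
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)

    age = now - modified
    if age <= timedelta(0):
        # A Last-Modified date in the future tells us nothing useful.
        return None
    return age * LM_FRACTION
```

Under these assumptions, a stylesheet last modified ten days ago would be treated as fresh for about one day, while the same URL with "?v=2" appended would get no heuristic lifetime at all and would have to be revalidated.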

Cheers,

On 20/05/2010, at 5:10 AM, Eric Lawrence wrote:

> FWIW, my research shows that most current-version browsers do not meet this requirement. For a resource that carries no explicit lifetime information but whose URL contains a query string:
> 
> - Firefox will conditionally request/revalidate for LINK HREF (e.g. CSS) and SCRIPT SRC tags. For IMG tags, Firefox appears to revalidate the resource only once per browser session.
> 
> - Internet Explorer 8 and below revalidate such resources once per browser session, regardless of the context in which the resource is used.
> 
> - Chrome and Opera appear to ignore the query string and reuse the cached resource without validation, both during navigation and across browser restarts.
> 
> - Safari for Windows does not revalidate a page's resources during hyperlink navigation. However, it does not appear to cache heuristically cacheable content across multiple browser sessions. Safari 4.0.5 appears to always re-request the direct target of a navigation unconditionally, regardless of whether or not the resource was delivered with headers indicating it was still fresh.
> 
> Eric Lawrence
> IE Program Management
> 
> -----Original Message-----
> From: ietf-http-wg-request@w3.org [mailto:ietf-http-wg-request@w3.org] On Behalf Of Julian Reschke
> Sent: Wednesday, May 19, 2010 6:49 AM
> To: Mark Nottingham
> Cc: HTTP Working Group
> Subject: Re: Query string cacheability
> 
> On 19.05.2010 14:31, Mark Nottingham wrote:
>> One of the things that I did in the big caching rewrite was to remove the text about the effect of query strings on cacheability:
>> 
>>> Section 13.9
>> [...]
>>> 
>>>    We note one exception to this rule: since some applications have
>>>    traditionally used GETs and HEADs with query URLs (those containing a
>>>    "?" in the rel_path part) to perform operations with significant side
>>>    effects, caches MUST NOT treat responses to such URIs as fresh unless
>>>    the server provides an explicit expiration time. This specifically
>>>    means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
>>>    be taken from a cache.
>> 
>> replacing it with, in p6 2.3.1.1:
>> 
>>>    [[REVIEW-query-string-heuristics: took away HTTP/1.0 query string
>>>    heuristic uncacheability.]]
>> 
>> Looking at this with somewhat fresh (but also a bit sleepy) eyes, I think we can re-introduce this text, but wonder if we need the last sentence; it's somewhat of a non sequitur, AFAICT, since RFC1945 had Expires to determine an explicit expiration time, and anyway it should probably say "origin server," which as discussed before is sometimes difficult to tell, given the lack of Via support in many intermediaries.
>> 
>> I propose we address this by changing the beginning of 2.3.1.1 to:
>> 
>> """
>>    If no explicit expiration time is present in a stored response that
>>    has a status code of 200, 203, 206, 300, 301 or 410, a heuristic
>>    expiration time can be calculated.  Heuristics MUST NOT be used for
>>    other response status codes.
>> 
>>    Also, heuristic freshness MUST NOT be used for responses
>>    to requests with a query component, because
>>    some applications have traditionally used queries on URLs to
>>    perform operations with significant side effects.
>> 
>>    [ remaining paragraphs as in -09]
>> """
>> 
>> Thoughts?
> 
> Sounds good to me.
> 
> Maybe replace
> 
> "some applications have traditionally used queries on URLs to perform operations with significant side effects"
> 
> with
> 
> "some historic, non-compliant applications have implemented non-safe operations in this case"
> 
> (points being: what's in error is the server, and it always has been a compliance issue, no?)
> 
> Best regards, Julian
> 


--
Mark Nottingham     http://www.mnot.net/
Received on Thursday, 20 May 2010 00:43:38 UTC