RE: Query string cacheability from Eric Lawrence on 2010-05-21 (ietf-http-wg@w3.org from April to June 2010)

From: Eric Lawrence <ericlaw@exchange.microsoft.com>
Date: Fri, 21 May 2010 22:04:50 +0000
To: Mark Nottingham <mnot@mnot.net>, Roy Fielding <roy.fielding@day.com>
CC: "Julian F. Reschke" <julian.reschke@gmx.de>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <479CAD406474484E8FA0E39E694732C00864CF@DF-M14-03.exchange.corp.microsoft.com>
My test case covers JS, CSS, and IMAGE resources that are referenced directly from the HTML. I wouldn't be surprised at all to learn that some browsers behave differently for XHR. It's worth noting that I only checked the Windows versions of the browsers mentioned.

Unfortunately, my test case might not be easily accessible to non-Windows users, as it's built as a Meddler Script. Meddler (www.fiddler2.com/meddler/) is a trivial little program which allows you to package a web site repro into a single file (the sort of thing that I'd imagine most folks would do in Perl). 

The test script is here: www.debugtheweb.com/dl/heuristicexpiration.ms. Basically, you need only load the script in Meddler, then visit http://127.0.0.1:8088/lm/ and then watch your HTTP traffic.

As to the question of whether anything will "break" without this requirement: I agree that for the browser world the answer is "probably not", largely because the majority of clients never implemented it. 

I only bring up the topic now because some webdevs have complained that IE6, 7, and 8 do not enforce the requirement, and thus they consider IE "broken" because they have to put a randomized query string on requests to avoid caching. Obviously, these webdevs don't really understand how response headers containing cache directives are meant to work, but it would be great to get the spec clarified on this particular issue.
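
To illustrate the point about cache directives, here is a minimal Python sketch of a server choosing an explicit Cache-Control response header instead of relying on a randomized query string. The function name and its parameter are hypothetical, chosen only for this example; a real site would pick its own policy.

```python
def cache_headers(max_age_seconds):
    """Return explicit caching headers for a response (illustrative helper).

    With an explicit directive, a cache never needs to fall back on
    heuristic freshness, whether or not the URL carries a query string.
    """
    if max_age_seconds <= 0:
        # "no-cache" lets caches store the response but requires
        # revalidation with the origin server before each reuse.
        return {"Cache-Control": "no-cache"}
    # Otherwise state the freshness lifetime explicitly, in seconds.
    return {"Cache-Control": f"max-age={max_age_seconds}"}


if __name__ == "__main__":
    print(cache_headers(0))
    print(cache_headers(3600))
```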

Thanks!

-Eric


-----Original Message-----
From: Mark Nottingham [mailto:mnot@mnot.net] 
Sent: Wednesday, May 19, 2010 5:43 PM
To: Eric Lawrence; Roy Fielding
Cc: Julian F. Reschke; HTTP Working Group
Subject: Re: Query string cacheability

Interesting. My tests were purely using XHR, and they found that most caches honoured this requirement (IE being the exception); <http://www.mnot.net/blog/2006/05/11/browser_caching>.

I know that Squid honours it as well (or at least did until very recently).

All things being equal, I agree with Roy that it would be good to get rid of this requirement. However, we can't just remove requirements because we don't like them.

If there really isn't existing interop over this requirement -- i.e., a significant number of caches don't honour it -- then it makes more sense. Eric, do you have anything to help reproduce your results that can be made public?

Taking a completely different tack: It's always bothered me that the heuristic is so ill-defined; someone could say that a heuristic is "if it's an HTTP request, I'll cache it."

Is anyone aware of an implementation that uses a heuristic that *isn't* based upon the Last-Modified header? Tightening that up might make loosening this easier (because script-served query URIs don't emit LM unless they really mean it).
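
As a rough illustration of the kind of heuristic under discussion, here is a minimal Python sketch of a Last-Modified-based freshness calculation that also applies the query-string restriction. It assumes the caller has already established that the response carries no explicit expiration time. The function name, the headers-as-dict representation, and the 10% fraction (the typical value RFC 2616 suggests) are assumptions for this example, not a description of any particular cache.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Status codes for which the proposed text allows heuristic freshness.
HEURISTIC_STATUSES = {200, 203, 206, 300, 301, 410}


def heuristic_lifetime(url, status, headers, fraction=0.1):
    """Return a heuristic freshness lifetime, or None if none may be used.

    Assumes the response has no explicit expiration (no Expires header
    and no max-age directive); the caller must check that first.
    """
    if status not in HEURISTIC_STATUSES:
        return None
    # The requirement being debated: no heuristic freshness for URLs
    # containing a "?" (ignoring any fragment identifier).
    if "?" in url.split("#", 1)[0]:
        return None
    last_modified = headers.get("Last-Modified")
    date = headers.get("Date")
    if not last_modified or not date:
        # Without Last-Modified this sketch declines to guess.
        return None
    age = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    if age <= timedelta(0):
        return None
    # Treat the response as fresh for a fraction of the time since it
    # was last modified.
    return age * fraction


if __name__ == "__main__":
    hdrs = {
        "Date": "Fri, 21 May 2010 22:04:50 GMT",
        "Last-Modified": "Tue, 11 May 2010 22:04:50 GMT",
    }
    print(heuristic_lifetime("http://example.com/style.css", 200, hdrs))
    print(heuristic_lifetime("http://example.com/style.css?v=1", 200, hdrs))
```

Under this sketch, a script-served query URI gets no heuristic lifetime even if it does emit Last-Modified; dropping the "?" check would be the loosening discussed above.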

Cheers,




On 20/05/2010, at 5:10 AM, Eric Lawrence wrote:

> FWIW, my research shows that the current versions of most browsers do not meet this requirement. For a resource which contains no explicit lifetime information but whose URL contains a query string:
> 
> - Firefox will conditionally request/revalidate for LINK HREF (e.g. CSS) and SCRIPT SRC tags. For IMG tags, Firefox appears to revalidate the resource only once per browser session.
> 
> - Internet Explorer 8 and below revalidate such resources once per browser session, regardless of the context in which the resource is used.
> 
> - Chrome and Opera appear to ignore the query string and reuse the cached resource without validation, both during navigation and across browser restarts.
> 
> - Safari for Windows does not revalidate a page's resources during hyperlink navigation. However, it does not appear to cache heuristically cacheable content across multiple browser sessions. Safari 4.0.5 always appears to unconditionally re-request the direct target of a navigation, regardless of whether or not the resource was delivered with headers indicating it was still fresh.
> 
> Eric Lawrence
> IE Program Management
> 
> -----Original Message-----
> From: ietf-http-wg-request@w3.org [mailto:ietf-http-wg-request@w3.org] On Behalf Of Julian Reschke
> Sent: Wednesday, May 19, 2010 6:49 AM
> To: Mark Nottingham
> Cc: HTTP Working Group
> Subject: Re: Query string cacheability
> 
> On 19.05.2010 14:31, Mark Nottingham wrote:
>> One of the things that I did in the big caching rewrite was to remove the text about the effect of query strings on cacheability:
>> 
>>> Section 13.9
>> [...]
>>> 
>>>    We note one exception to this rule: since some applications have
>>>    traditionally used GETs and HEADs with query URLs (those containing a
>>>    "?" in the rel_path part) to perform operations with significant side
>>>    effects, caches MUST NOT treat responses to such URIs as fresh unless
>>>    the server provides an explicit expiration time. This specifically
>>>    means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
>>>    be taken from a cache.
>> 
>> replacing it with, in p6 2.3.1.1:
>> 
>>>    [[REVIEW-query-string-heuristics: took away HTTP/1.0 query string
>>>    heuristic uncacheability.]]
>> 
>> Looking at this with somewhat fresh (but also a bit sleepy) eyes, I think we can re-introduce this text, but wonder if we need the last sentence; it's somewhat of a non sequitur, AFAICT, since RFC1945 had Expires to determine an explicit expiration time, and anyway it should probably say "origin server," which as discussed before is sometimes difficult to tell, given the lack of Via support in many intermediaries.
>> 
>> I propose we address this by changing the beginning of 2.3.1.1 to:
>> 
>> """
>>    If no explicit expiration time is present in a stored response that
>>    has a status code of 200, 203, 206, 300, 301 or 410, a heuristic
>>    expiration time can be calculated.  Heuristics MUST NOT be used for
>>    other response status codes.
>> 
>>    Also, heuristic freshness MUST NOT be used for responses
>>    to requests with a query component, because
>>    some applications have traditionally used queries on URLs to
>>    perform operations with significant side effects.
>> 
>>    [ remaining paragraphs as in -09]
>> """
>> 
>> Thoughts?
> 
> Sounds good to me.
> 
> Maybe replace
> 
> "some applications have traditionally used queries on URLs to perform operations with significant side effects"
> 
> with
> 
> "some historic, non-compliant applications have implemented non-safe operations in this case"
> 
> (points being: what's in error is the server, and it always has been a compliance issue, no?)
> 
> Best regards, Julian
> 


--
Mark Nottingham     http://www.mnot.net/
Received on Friday, 21 May 2010 22:05:26 UTC