- From: Adrien de Croy <adrien@qbik.com>
- Date: Mon, 08 Jun 2009 15:01:55 +1200
- To: Mark Nottingham <mnot@mnot.net>
- CC: "Yngve N. Pettersen (Developer Opera Software ASA)" <yngve@opera.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
There is also the question of what to do with seemingly conflicting
cache directives.
Combining no-store with any other directive that implies the result can
be stored is a contradiction.
If you can't store something, how can you revalidate it? So any
revalidation directive implies the result may be stored.
The safe option is to honour no-store, but that dishonours the
other directives.
A quick search throws up a few sites expounding the virtues of setting
no-store, no-cache, pre-check=0 etc. on all responses, even though
pre-check seems to be a non-standard IE extension.
Perhaps some wording about which ones should be ignored when they
conflict could be useful in the spec.
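
As a rough illustration of the "honour no-store, ignore the rest" option,
here is a minimal sketch in Python. The function names and the
IMPLIES_STORAGE set are hypothetical, chosen only for this example; which
directives count as "implying storage" is an assumption, not something
any spec defines.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument}.

    Simplification: splits on commas, so quoted arguments that themselves
    contain commas (e.g. no-cache="a, b") are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


# Assumption for illustration: directives that only make sense if a
# stored copy exists. Not an exhaustive or normative list.
IMPLIES_STORAGE = {
    "no-cache", "must-revalidate", "proxy-revalidate",
    "max-age", "s-maxage", "public",
}


def resolve(value):
    """Let no-store win and report which conflicting directives are ignored."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        ignored = sorted(d for d in directives if d in IMPLIES_STORAGE)
        return {"may_store": False, "ignored": ignored}
    return {"may_store": True, "ignored": []}


if __name__ == "__main__":
    print(resolve("no-store, no-cache, must-revalidate, post-check=0, pre-check=0"))
```

Unknown extensions such as pre-check and post-check are neither honoured
nor reported here; they simply fall through.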
Regards
Adrien
Mark Nottingham wrote:
> Hi Yngve,
>
> I think your question should be posed as: what comprises the
> cache subsystem?
>
> In a browser, the HTML parser component would dispatch requests to a
> cache, which then either satisfies the requests or forwards them. If
> an HTML page has three references to <http://example.com/foo.gif>, for
> example, there are two ways of approaching this problem:
>
> 1) asking the cache for foo.gif once (perhaps by piggybacking the
> callbacks for subsequent images into the first instance), or
>
> 2) asking the cache for foo.gif three times.
>
> In my reading, #1 is conformant even if the response contains
> no-store; technically (and probably mostly theoretically), #2 is not.
>
> However, I don't think that matters much.
>
> A much greater concern here is that HTTP authors need stable semantics
> for cache directives. By "creatively" interpreting them, or trying to
> guess what the authors intend based upon a survey of Web sites, a
> great disservice is done; authors will need to begin to second guess
> what cache vendors do when they encounter different directives. This
> already happens to a degree (although it's not as bad as it used to
> be, I think, and hopefully HTTPbis will improve the situation a bit
> more), but no-store has always been one of the more unambiguous
> directives.
>
> Please don't dilute it. If sites choose to set it, the only rational
> choice is to assume that they want you to honour it; indeed, there may
> even be legal implications here (although IANAL). If it makes their
> site appear slower, well, they're reaping the benefits -- or
> detriments -- of what they've done.
>
> As far as establishing new directives go, you're absolutely right to
> characterise this as something "beyond HTTPbis"; assuming you want to
> replace (rather than augment) the current directives, the only effort
> that could do this would be HTTP/2.0.
>
> Cheers,
>
>
> On 08/06/2009, at 11:08 AM, Yngve N. Pettersen (Developer Opera
> Software ASA) wrote:
>
>> Hello all,
>>
>> When reading the p6-06 draft it seemed to me that the new phrasing
>> forbids the client's own cache from using the received response
>> again under any circumstance, which I think is slightly different
>> from my interpretation of what RFC 2616 says.
>>
>>
>> RFC 2616 says:
>>
>> 14.9.2
>>
>> no-store
>> The purpose of the no-store directive is to prevent the
>> inadvertent release or retention of sensitive information (for
>> example, on backup tapes). The no-store directive applies to the
>> entire message, and MAY be sent either in a response or in a
>> request. If sent in a request, a cache MUST NOT store any part of
>> either this request or any response to it. If sent in a response,
>> a cache MUST NOT store any part of either this response or the
>> request that elicited it. This directive applies to both non-
>> shared and shared caches. "MUST NOT store" in this context means
>> that the cache MUST NOT intentionally store the information in
>> non-volatile storage, and MUST make a best-effort attempt to
>> remove the information from volatile storage as promptly as
>> possible after forwarding it.
>>
>> Even when this directive is associated with a response, users
>> might explicitly store such a response outside of the caching
>> system (e.g., with a "Save As" dialog). History buffers MAY store
>> such responses as part of their normal operation.
>>
>> The purpose of this directive is to meet the stated requirements
>> of certain users and service authors who are concerned about
>> accidental releases of information via unanticipated accesses to
>> cache data structures. While the use of this directive might
>> improve privacy in some cases, we caution that it is NOT in any
>> way a reliable or sufficient mechanism for ensuring privacy. In
>> particular, malicious or compromised caches might not recognize or
>> obey this directive, and communications networks might be
>> vulnerable to eavesdropping.
>>
>> p6-cache says:
>>
>> 3.2.2
>>
>> no-store
>>
>> The no-store response directive indicates that a cache MUST NOT
>> store any part of either the immediate request or response. This
>> directive applies to both non-shared and shared caches. "MUST NOT
>> store" in this context means that the cache MUST NOT intentionally
>> store the information in non-volatile storage, and MUST make a
>> best-effort attempt to remove the information from volatile
>> storage as promptly as possible after forwarding it.
>>
>> This directive is NOT a reliable or sufficient mechanism for
>> ensuring privacy. In particular, malicious or compromised caches
>> might not recognize or obey this directive, and communications
>> networks may be vulnerable to eavesdropping.
>>
>> To me it seems that the new phrasing forbids the client's own
>> cache from using the received response even when the resource is
>> referenced multiple times from the same document, which is common for
>> some sites using small spacer images or other small icons, or by
>> multiple documents, like style sheets and images. (The text also
>> seems to have lost the history reference, though Sec. 4 may make up
>> for that.)
>>
>> I agree that for proxies the requirement to discard immediately makes
>> sense.
>>
>> But for clients it is IMO not just a waste of bandwidth (particularly
>> on performance-restricted devices) to reload such resources multiple
>> times, even for the same document; it would also probably require
>> significant changes in how clients handle resources. It also
>> essentially duplicates the no-cache directive in some respects about
>> reuse, although it does go a little further ("must not reuse"). I'll
>> remind you of sec 1.1 "Caching would be useless if it did not
>> significantly improve performance", and the above text will
>> significantly reduce performance in clients if implemented according
>> to my current understanding of it, and IMO such a reduction is
>> unnecessary even from a security perspective.
>>
>> Opera's implementation of this directive, since we first implemented
>> it, has been "Do not store to filesystem, keep in RAM, discard quickly
>> when it is no longer in use". Such resources are re-used just like any
>> other resource in the cache that is not specially treated (as POST
>> form results are), and if necessary re-validated when expired. The only
>> difference is that they are not written to the disk cache part of our
>> caching system (this does not prevent virtual memory swapping from
>> writing them to disk; other measures are being considered for that;
>> but that problem applies to all uses of these data, also for display).
>>
>> Another aspect of this is that quite a lot of sites, as well as the
>> default configuration of several Wiki packages, in my experience,
>> automatically send the no-store directive, along with
>> must-revalidate, even when there is no need for it.
>>
>> A while back MAMA, our structural web search engine, see
>> http://dev.opera.com/articles/view/mama/ , did a crawl of the Alexa
>> top million sites and other sites, and while the crawl was still
>> underway I asked for a list of sites using the no-store directive.
>>
>> The resulting list contained ~300000 unique sites of over 4 million
>> URLs scanned (total), of which ~50000 (5%) were on the Alexa list,
>> some of them quite high on the list. As the scan was not complete,
>> the actual numbers are probably higher.
>>
>> Examples included these URLs (checked early April) :
>>
>> http://www.mediabox.fr/
>> http://wiki.mediabox.fr/
>> http://www.tayloryourevent.com/
>> http://joomla-wiki.de/doku.php
>> http://sourceforge.net/
>> http://technorati.com/
>> http://secondlife.com/
>> http://www.alltheweb.com/
>> http://babynames.com/
>>
>> http://broadwayworld.com/article/Photo_Coverage_reasons_to_be_pretty_Opening_Night_Celebration_20000101
>>
>>
>> As you will see, many of these are well known sites, and almost all
>> of them are front pages, which are unlikely to be sensitive, or
>> changing very frequently (as in: every few seconds or minutes, and
>> that could be handled using no-cache).
>>
>> A point about broadwayworld.com : Their *articles* are using
>>
>> Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
>> pre-check=0
>>
>> while the *front* pages (which are the dynamic ones) don't.
>>
>> Additionally, the default in PHP (at least my copy of v5.0.4) seems
>> to be "Cache-Control: no-store, no-cache, must-revalidate,
>> post-check=0, pre-check=0", and I seem to recall that the MoinMoin
>> wiki had that, too, at least last year (but in that case I may be
>> misremembering).
>>
>> Given the extensive use of no-store in situations where it does not
>> seem necessary, I have started wondering if Opera needs to start
>> ignoring the no-store directive in non-HTTPS responses, just as we
>> currently accept must-revalidate (interpreted as re-validate on
>> history navigation) only for HTTPS responses. No decision has been
>> reached yet.
>>
>>
>> My recommendation is that the text describing the no-store response
>> directive be phrased so that all caches are forbidden from storing
>> the response to non-volatile media, and must clear it away ASAP after
>> use (as it is currently phrased), and that caches that are not part of
>> the client MUST NOT use the response when responding to another
>> request, while allowing *clients* to use their locally stored copy as
>> long as they can according to other cache policies.
>>
>>
>> Looking forward, past http-bis, given the apparent amount of
>> misunderstanding about the current cache directives (I receive
>> regular questions from customers and bug reports claiming that
>> no-cache means "do not use again", while it only means "revalidate
>> each time you load the document") I am starting to reach the
>> conclusion that no-cache, no-store and must-revalidate should be
>> discarded and replaced with more descriptive names (which should
>> include the context of when they are to be used), for example,
>> on-load-revalidate, sensitive-content-storage,
>> on-navigate-revalidate, respectively, or words to that effect. If a
>> must-not-reuse indication is needed, then it should also directly say
>> so, e.g. single-use-response or unique-response.
>>
>> Also, while only distantly related, as I've pointed out earlier, HTTP
>> is currently missing a mechanism to let servers invalidate a group of
>> cache entries, for example during logout. I have suggested such a
>> cache context mechanism in a draft (the most recent version is
>> currently expired, but I am planning to refresh it; the most recent
>> version is available at
>> http://my.opera.com/yngve/blog/2008/11/06/refreshed-internet-drafts) .
>>
>> --
>> Sincerely,
>> Yngve N. Pettersen
>>
>> ********************************************************************
>> Senior Developer Email: yngve@opera.com
>> Opera Software ASA http://www.opera.com/
>> Phone: +47 24 16 42 60 Fax: +47 24 16 40 01
>> ********************************************************************
>>
>
>
> --
> Mark Nottingham http://www.mnot.net/
>
>
--
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Monday, 8 June 2009 02:59:21 UTC