- From: Adrien de Croy <adrien@qbik.com>
- Date: Mon, 08 Jun 2009 15:01:55 +1200
- To: Mark Nottingham <mnot@mnot.net>
- CC: "Yngve N. Pettersen (Developer Opera Software ASA)" <yngve@opera.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
There is possibly the question of what to do with seemingly conflicting
cache directives.

Combining no-store with any other directive that implies the result can be
stored is a contradiction. If you can't store something, how can you
revalidate it? So any sort of revalidation directive implies the result
may be stored.

The safe option is to honour the no-store; however, that dishonours the
other directives.

A quick search throws up a few sites expounding the virtues of setting
no-store, no-cache, pre-check=0 etc. on all responses, even though
pre-check seems to be a non-standard IE extension.

Perhaps some wording about which ones should be ignored when they conflict
could be useful in the spec.

Regards

Adrien

Mark Nottingham wrote:
> Hi Yngve,
>
> I think your question should be posed as: what comprises the
> cache subsystem?
>
> In a browser, the HTML parser component would dispatch requests to a
> cache, which then either satisfies the requests or forwards them. If
> an HTML page has three references to <http://example.com/foo.gif>, for
> example, there are two ways of approaching this problem:
>
> 1) asking the cache for foo.gif once (perhaps by piggybacking the
> callbacks for subsequent images into the first instance), or
>
> 2) asking the cache for foo.gif three times.
>
> In my reading, #1 is conformant even if the response contains
> no-store; technically (and probably mostly theoretically), #2 is not.
>
> However, I don't think that matters much.
>
> A much greater concern here is that HTTP authors need stable semantics
> for cache directives. By "creatively" interpreting them, or trying to
> guess what the authors intend based upon a survey of Web sites, a
> great disservice is done; authors will need to begin to second guess
> what cache vendors do when they encounter different directives.
> This already happens to a degree (although it's not as bad as it used to
> be, I think, and hopefully HTTPbis will improve the situation a bit
> more), but no-store has always been one of the more unambiguous
> directives.
>
> Please don't dilute it. If sites choose to set it, the only rational
> choice is to assume that they want you to honour it; indeed, there may
> even be legal implications here (although IANAL). If it makes their
> site appear slower, well, they're reaping the benefits -- or
> detriments -- of what they've done.
>
> As far as establishing new directives goes, you're absolutely right to
> characterise this as something "beyond HTTPbis"; assuming you want to
> replace (rather than augment) the current directives, the only effort
> that could do this would be HTTP/2.0.
>
> Cheers,
>
>
> On 08/06/2009, at 11:08 AM, Yngve N. Pettersen (Developer Opera
> Software ASA) wrote:
>
>> Hello all,
>>
>> When reading the p6-06 draft it seemed to me that the new phrasing
>> seems to forbid the client's own cache from using the received response
>> again under any circumstance, which I think is slightly different
>> from my interpretation of what RFC 2616 says.
>>
>>
>> RFC 2616 says:
>>
>> 14.9.2
>>
>> no-store
>> The purpose of the no-store directive is to prevent the
>> inadvertent release or retention of sensitive information (for
>> example, on backup tapes). The no-store directive applies to the
>> entire message, and MAY be sent either in a response or in a
>> request. If sent in a request, a cache MUST NOT store any part of
>> either this request or any response to it. If sent in a response,
>> a cache MUST NOT store any part of either this response or the
>> request that elicited it. This directive applies to both
>> non-shared and shared caches. "MUST NOT store" in this context means
>> that the cache MUST NOT intentionally store the information in
>> non-volatile storage, and MUST make a best-effort attempt to
>> remove the information from volatile storage as promptly as
>> possible after forwarding it.
>>
>> Even when this directive is associated with a response, users
>> might explicitly store such a response outside of the caching
>> system (e.g., with a "Save As" dialog). History buffers MAY store
>> such responses as part of their normal operation.
>>
>> The purpose of this directive is to meet the stated requirements
>> of certain users and service authors who are concerned about
>> accidental releases of information via unanticipated accesses to
>> cache data structures. While the use of this directive might
>> improve privacy in some cases, we caution that it is NOT in any
>> way a reliable or sufficient mechanism for ensuring privacy. In
>> particular, malicious or compromised caches might not recognize or
>> obey this directive, and communications networks might be
>> vulnerable to eavesdropping.
>>
>> p6-cache says:
>>
>> 3.2.2
>>
>> no-store
>>
>> The no-store response directive indicates that a cache MUST NOT
>> store any part of either the immediate request or response. This
>> directive applies to both non-shared and shared caches. "MUST NOT
>> store" in this context means that the cache MUST NOT intentionally
>> store the information in non-volatile storage, and MUST make a
>> best-effort attempt to remove the information from volatile
>> storage as promptly as possible after forwarding it.
>>
>> This directive is NOT a reliable or sufficient mechanism for
>> ensuring privacy. In particular, malicious or compromised caches
>> might not recognize or obey this directive, and communications
>> networks may be vulnerable to eavesdropping.
>>
>> To me it seems that the new phrasing forbids the client's own
>> cache from using the received response even when the resource is
>> referenced multiple times from the same document, which is common for
>> some sites using small spacer images or other small icons, or by
>> multiple documents, like style sheets and images. (The text also
>> seems to have lost the history reference, though Sec. 4 may make up
>> for that.)
>>
>> I agree that for proxies the requirement to discard immediately makes
>> sense.
>>
>> But for clients it is IMO not just a waste of bandwidth (particularly
>> on performance-restricted devices) to reload such resources multiple
>> times, even for the same document; it would probably also require
>> significant changes in how clients handle resources. It also
>> essentially duplicates the no-cache directive in some respects about
>> reuse, although it does go a little further ("must not reuse"). I'll
>> remind you of sec 1.1 "Caching would be useless if it did not
>> significantly improve performance", and the above text will
>> significantly reduce performance in clients if implemented according
>> to my current understanding of it, and IMO such a reduction is
>> unnecessary even from a security perspective.
>>
>> Opera's implementation of this directive since we implemented it has
>> been "Do not store to filesystem, keep in RAM, discard quickly when
>> it is no longer in use". Such resources are re-used just like any
>> other resource in the cache that is not specially treated, like POST
>> form results, and if necessary re-validated when expired. The only
>> difference is that they are not written to the disk cache part of our
>> caching system (this does not prevent virtual memory swapping from
>> writing them to disk; other measures are being considered for that;
>> but that problem applies to all use of these data, also for display).
>>
>> Another aspect of this is that quite a lot of sites, as well as the
>> default configuration of several Wiki packages, in my experience,
>> automatically send the no-store directive, along with
>> must-revalidate, even when there is no need for it.
>>
>> A while back MAMA, our structural web search engine, see
>> http://dev.opera.com/articles/view/mama/ , did a crawl of the Alexa
>> top million sites and other sites, and while the crawl was still
>> underway I asked for a list of sites using the no-store directive.
>>
>> The resulting list contained ~300000 unique sites of over 4 million
>> URLs scanned (total), of which ~50000 (5%) were on the Alexa list,
>> some of them quite high on the list. As the scan was not complete,
>> the actual numbers are probably higher.
>>
>> Examples included these URLs (checked early April):
>>
>> http://www.mediabox.fr/
>> http://wiki.mediabox.fr/
>> http://www.tayloryourevent.com/
>> http://joomla-wiki.de/doku.php
>> http://sourceforge.net/
>> http://technorati.com/
>> http://secondlife.com/
>> http://www.alltheweb.com/
>> http://babynames.com/
>>
>> http://broadwayworld.com/article/Photo_Coverage_reasons_to_be_pretty_Opening_Night_Celebration_20000101
>>
>>
>> As you will see, many of these are well known sites, and almost all
>> of them are front pages, which are unlikely to be sensitive, or
>> changing very frequently (as in: every few seconds or minutes, and
>> that could be handled using no-cache).
>>
>> A point about broadwayworld.com: Their *articles* are using
>>
>> Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
>> pre-check=0
>>
>> while the *front* pages (which are the dynamic ones) don't.
>>
>> Additionally, the default in PHP (at least my copy of v5.0.4) seems
>> to be "Cache-Control: no-store, no-cache, must-revalidate,
>> post-check=0, pre-check=0", and I seem to recall that the MoinMoin
>> wiki had that, too, at least last year (but in that case I may be
>> misremembering).
>>
>> Given the extensive use of no-store in situations where it does not
>> seem necessary, I have started wondering if Opera needs to start
>> ignoring the no-store header in non-HTTPS responses, just like we
>> currently accept must-revalidate (interpreted as re-validate on
>> history navigation) only for HTTPS responses. No decision has been
>> reached yet.
>>
>>
>> My recommendation is that the text describing the no-store response
>> directive be phrased so that all caches are forbidden from storing
>> the response to non-volatile media, and must clear it away ASAP after
>> use (as it is currently phrased), and that caches that are not part of
>> the client MUST NOT use the response when responding to another
>> request, while allowing *clients* to use their locally stored copy as
>> long as they can according to other cache policies.
>>
>>
>> Looking forward, past http-bis, given the apparent amount of
>> misunderstanding about the current cache directives (I receive
>> regular questions from customers and bug reports claiming that
>> no-cache means "do not use again", while it only means "revalidate
>> each time you load the document") I am starting to reach the
>> conclusion that no-cache, no-store and must-revalidate should be
>> discarded and replaced with more descriptive names (which should
>> include the context of when they are to be used), for example,
>> on-load-revalidate, sensitive-content-storage,
>> on-navigate-revalidate, respectively, or words to that effect. If a
>> must-not-reuse indication is needed, then it should also directly say
>> so, e.g. single-use-response or unique-response.
>>
>> Also, while only distantly related, as I've pointed out earlier, HTTP
>> is currently missing a mechanism to let servers invalidate a group of
>> cache entries, for example during logout. I have suggested such a
>> cache context mechanism in a draft (the most recent version is
>> currently expired, but I am planning to refresh it; the most recent
>> version is available at
>> http://my.opera.com/yngve/blog/2008/11/06/refreshed-internet-drafts) .
>>
>> --
>> Sincerely,
>> Yngve N. Pettersen
>>
>> ********************************************************************
>> Senior Developer                     Email: yngve@opera.com
>> Opera Software ASA                   http://www.opera.com/
>> Phone: +47 24 16 42 60               Fax: +47 24 16 40 01
>> ********************************************************************
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>
>

--
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Monday, 8 June 2009 02:59:21 UTC
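
The conflict discussed in this thread (no-store sent together with
directives that presuppose a stored copy) can be illustrated with a small
sketch. This is only an illustration under stated assumptions, not the
behaviour of any spec text or of any vendor's cache: the names Storage,
parse_cache_control, storage_policy and the client_memory_reuse flag are
all hypothetical. The default follows the strict reading (no-store
overrides everything else); the flag models the "keep in RAM, never write
to disk" client behaviour described in the quoted message.

```python
from enum import Enum
from typing import Dict, Optional


class Storage(Enum):
    NONE = "none"           # do not retain the response at all
    MEMORY_ONLY = "memory"  # keep in RAM only, never write to disk
    DISK = "disk"           # may be written to the persistent cache


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into lower-cased directives.

    Naive on purpose: it splits on commas, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def storage_policy(value: str, client_memory_reuse: bool = False) -> Storage:
    """Decide where a response may be kept (hypothetical policy).

    no-store takes precedence over must-revalidate, no-cache and any
    unknown extensions such as pre-check/post-check, which are ignored.
    """
    directives = parse_cache_control(value)
    if "no-store" in directives:
        # Assumption: the flag models the client-side proposal in the thread.
        return Storage.MEMORY_ONLY if client_memory_reuse else Storage.NONE
    return Storage.DISK


if __name__ == "__main__":
    header = "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    print(storage_policy(header))
    print(storage_policy(header, client_memory_reuse=True))
```

Under this sketch the header quoted in the thread resolves to "do not
retain" by default, and to memory-only retention when the hypothetical
client flag is set; revalidation directives are simply moot in the first
case because there is no stored copy to revalidate.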