
Re: httpbis-p6-cache-06 and no-store response directive

From: Mark Nottingham <mnot@mnot.net>
Date: Mon, 8 Jun 2009 12:09:16 +1000
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-Id: <FFAF5093-9BF1-4CBC-B8FC-B7A0141161A0@mnot.net>
To: Yngve N. Pettersen (Developer Opera Software ASA) <yngve@opera.com>

Hi Yngve,

I think your question should be posed as: what comprises the
cache subsystem?

In a browser, the HTML parser component would dispatch requests to a  
cache, which then either satisfies the requests or forwards them. If  
an HTML page has three references to <http://example.com/foo.gif>, for  
example, there are two ways of approaching this problem:

1) asking the cache for foo.gif once (perhaps by piggybacking the  
callbacks for subsequent images into the first instance), or

2) asking the cache for foo.gif three times.

In my reading, #1 is conformant even if the response contains
no-store; technically (and probably mostly theoretically), #2 is not.

However, I don't think that matters much.
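
To make the two approaches concrete, here is a minimal sketch. The names (CountingCache, load_coalesced, load_each) are hypothetical and do not come from any browser; the cache is assumed to forward every request it receives, as it would for a no-store response.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Callback = Callable[[bytes], None]


class CountingCache:
    """Hypothetical cache that stores nothing and forwards every request."""

    def __init__(self) -> None:
        self.requests = 0

    def get(self, url: str) -> bytes:
        # Each call stands in for one request forwarded to the origin.
        self.requests += 1
        return b"body of " + url.encode("ascii")


def load_coalesced(cache: CountingCache, refs: List[Tuple[str, Callback]]) -> None:
    """Approach 1: ask the cache once per distinct URL; later callbacks piggyback."""
    waiting: Dict[str, List[Callback]] = defaultdict(list)
    for url, callback in refs:
        waiting[url].append(callback)
    for url, callbacks in waiting.items():
        body = cache.get(url)
        for callback in callbacks:
            callback(body)


def load_each(cache: CountingCache, refs: List[Tuple[str, Callback]]) -> None:
    """Approach 2: ask the cache once per reference."""
    for url, callback in refs:
        callback(cache.get(url))
```

With three references to the same image, load_coalesced issues one cache request and load_each issues three; in both cases every reference receives a body.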

A much greater concern here is that HTTP authors need stable semantics  
for cache directives. By "creatively" interpreting them, or trying to  
guess what the authors intend based upon a survey of Web sites, a  
great disservice is done; authors will need to begin to second guess  
what cache vendors do when they encounter different directives. This  
already happens to a degree (although it's not as bad as it used to  
be, I think, and hopefully HTTPbis will improve the situation a bit  
more), but no-store has always been one of the more unambiguous  
directives.

Please don't dilute it. If sites choose to set it, the only rational  
choice is to assume that they want you to honour it; indeed, there may  
even be legal implications here (although IANAL). If it makes their  
site appear slower, well, they're reaping the benefits -- or  
detriments -- of what they've done.
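
As an illustration of honouring the directive as sent, a cache's storage decision might look roughly like this. It is only a sketch: parse_cache_control and may_store are hypothetical helpers, and the comma split is deliberately naive (it assumes no quoted argument contains a comma).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into lower-cased directive names and arguments.

    Naive on purpose: assumes no quoted argument contains a comma.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(cache_control: str) -> bool:
    """Return False whenever no-store is present, without second-guessing the sender."""
    return "no-store" not in parse_cache_control(cache_control)
```

For the header quoted below ("no-store, no-cache, must-revalidate, post-check=0, pre-check=0"), may_store returns False regardless of which site sent it or whether the content looks sensitive.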

As far as establishing new directives goes, you're absolutely right to  
characterise this as something "beyond HTTPbis"; assuming you want to  
replace (rather than augment) the current directives, the only effort  
that could do this would be HTTP/2.0.

Cheers,


On 08/06/2009, at 11:08 AM, Yngve N. Pettersen (Developer Opera  
Software ASA) wrote:

> Hello all,
>
> When reading the p6-06 draft it seemed to me that the new phrasing  
> seems to forbid the client's own cache from using the received response  
> again under any circumstance, which I think is slightly different  
> from my interpretation of what RFC 2616 says.
>
>
> RFC 2616 says:
>
>  14.9.2
>
>   no-store
>      The purpose of the no-store directive is to prevent the
>      inadvertent release or retention of sensitive information (for
>      example, on backup tapes). The no-store directive applies to the
>      entire message, and MAY be sent either in a response or in a
>      request. If sent in a request, a cache MUST NOT store any part of
>      either this request or any response to it. If sent in a response,
>      a cache MUST NOT store any part of either this response or the
>      request that elicited it. This directive applies to both non-
>      shared and shared caches. "MUST NOT store" in this context means
>      that the cache MUST NOT intentionally store the information in
>      non-volatile storage, and MUST make a best-effort attempt to
>      remove the information from volatile storage as promptly as
>      possible after forwarding it.
>
>      Even when this directive is associated with a response, users
>      might explicitly store such a response outside of the caching
>      system (e.g., with a "Save As" dialog). History buffers MAY store
>      such responses as part of their normal operation.
>
>      The purpose of this directive is to meet the stated requirements
>      of certain users and service authors who are concerned about
>      accidental releases of information via unanticipated accesses to
>      cache data structures. While the use of this directive might
>      improve privacy in some cases, we caution that it is NOT in any
>      way a reliable or sufficient mechanism for ensuring privacy. In
>      particular, malicious or compromised caches might not recognize or
>      obey this directive, and communications networks might be
>      vulnerable to eavesdropping.
>
> p6-cache says:
>
>  3.2.2
>
>   no-store
>
>      The no-store response directive indicates that a cache MUST NOT
>      store any part of either the immediate request or response.  This
>      directive applies to both non-shared and shared caches.  "MUST NOT
>      store" in this context means that the cache MUST NOT intentionally
>      store the information in non-volatile storage, and MUST make a
>      best-effort attempt to remove the information from volatile
>      storage as promptly as possible after forwarding it.
>
>      This directive is NOT a reliable or sufficient mechanism for
>      ensuring privacy.  In particular, malicious or compromised caches
>      might not recognize or obey this directive, and communications
>      networks may be vulnerable to eavesdropping.
>
> To me it seems that the new phrasing forbids the client's own  
> cache from using the received response even when the resource is  
> referenced multiple times from the same document, which is common for  
> some sites using small spacer images or other small icons, or by  
> multiple documents, like style sheets and images. (The text also  
> seems to have lost the history reference, though Sec. 4 may make up  
> for that)
>
> I agree that for proxies the requirement to discard immediately makes  
> sense.
>
> But for clients it is IMO not just a waste of bandwidth (particularly  
> on performance restricted devices) to reload such resources multiple  
> times, even for the same document, but it would probably require  
> significant changes in how clients handle resources. It also  
> essentially duplicates the no-cache directive in some respects about  
> reuse, although it does go a little further ("must not reuse"). I'll  
> remind you of sec 1.1 "Caching would be useless if it did not  
> significantly improve performance", and the above text will  
> significantly reduce performance in clients if implemented according  
> to my current understanding of it, and IMO such a reduction is  
> unnecessary even from a security perspective.
>
> Opera's implementation of this directive since we implemented it has  
> been "Do not store to filesystem, keep in RAM, discard quickly when  
> it is no longer in use". Such resources are re-used just like any  
> other resource in the cache that are not specially treated, like  
> POST form results, and if necessary re-validated when expired. The  
> only difference is that they are not written to the disk cache part  
> of our caching system (this does not prevent virtual memory swapping  
> from writing them to disk; other measures are being considered for  
> that; but that problem applies to all use of these data, also for  
> display).
>
> Another aspect of this is that quite a lot of sites, as well as the  
> default configuration of several Wiki packages, in my experience,  
> automatically send the no-store directive, along with  
> must-revalidate, even when there is no need for it.
>
> A while back MAMA, our structural web search engine (see
> http://dev.opera.com/articles/view/mama/), did a crawl of the Alexa
> top million sites and other sites, and while the crawl was still
> underway I asked for a list of sites using the no-store directive.
>
> The resulting list contained ~300000 unique sites of over 4 million  
> URLs scanned (total), of which ~50000 (5%) were on the Alexa list,  
> some of them quite high on the list. As the scan was not complete,  
> the actual numbers are probably higher.
>
> Examples included these URLs (checked early April) :
>
>   http://www.mediabox.fr/
>   http://wiki.mediabox.fr/
>   http://www.tayloryourevent.com/
>   http://joomla-wiki.de/doku.php
>   http://sourceforge.net/
>   http://technorati.com/
>   http://secondlife.com/
>   http://www.alltheweb.com/
>   http://babynames.com/
>   http://broadwayworld.com/article/Photo_Coverage_reasons_to_be_pretty_Opening_Night_Celebration_20000101
>
> As you will see, many of these are well known sites, and almost all  
> of them are front pages, which are unlikely to be sensitive, or  
> changing very frequently (as in: every few seconds or minutes, and  
> that could be handled using no-cache).
>
> A point about broadwayworld.com : Their *articles* are using
>
>    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
>
> while the *front* pages (which are the dynamic ones) don't.
>
> Additionally, the default in PHP (at least my copy of v5.0.4) seems  
> to be "Cache-Control: no-store, no-cache, must-revalidate,  
> post-check=0, pre-check=0", and I seem to recall that the MoinMoin wiki  
> had that, too, at least last year (but in that case I may be  
> misremembering).
>
> Given the extensive use of no-store in situations where it does not  
> seem necessary, I have started wondering if Opera needs to start  
> ignoring the no-store directive in non-HTTPS responses, just like we  
> currently accept must-revalidate (interpreted as re-validate on  
> history navigation) only for HTTPS responses. No decision has been  
> reached yet.
>
>
> My recommendation is that the text describing the no-store response  
> directive be phrased so that all caches are forbidden from storing  
> the response to non-volatile media and must clear it away ASAP after  
> use (as it is currently phrased), and that caches that are not part  
> of the client MUST NOT use the response when responding to another  
> request, while allowing *clients* to use their locally stored copy  
> for as long as other cache policies permit.
>
>
> Looking forward, past http-bis, given the apparent amount of  
> misunderstanding about the current cache directives (I receive  
> regular questions from customers and bug reports claiming that  
> no-cache means "do not use again", while it only means "revalidate each  
> time you load the document") I am starting to reach the conclusion  
> that no-cache, no-store and must-revalidate should be discarded and  
> replaced with more descriptive names (which should include the  
> context of when they are to be used), for example,  
> on-load-revalidate, sensitive-content-storage, on-navigate-revalidate,  
> respectively, or words to that effect. If a must-not-reuse  
> indication is needed, then it should also directly say so, e.g.  
> single-use-response or unique-response.
>
> Also, while only distantly related, as I've pointed out earlier,  
> HTTP is currently missing a mechanism to let servers invalidate a  
> group of cache entries, for example during logout. I have suggested  
> such a cache context mechanism in a draft (the most recent version  
> is currently expired, but I am planning to refresh it; it is  
> available at http://my.opera.com/yngve/blog/2008/11/06/refreshed-internet-drafts).
>
> -- 
> Sincerely,
> Yngve N. Pettersen
>
> ********************************************************************
> Senior Developer                     Email: yngve@opera.com
> Opera Software ASA                   http://www.opera.com/
> Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
> ********************************************************************
>
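
For reference, the client behaviour described in the quoted message ("Do not store to filesystem, keep in RAM, discard quickly when it is no longer in use") could be sketched as below. This illustrates that description only, not what p6-cache requires; ClientCache and its methods are hypothetical names, and a plain dict stands in for the disk cache.

```python
from typing import Dict, Optional, Set


class ClientCache:
    """Hypothetical client cache that keeps no-store responses in memory only."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}
        self.disk: Dict[str, bytes] = {}  # stand-in for non-volatile storage
        self.memory_only: Set[str] = set()

    def put(self, url: str, body: bytes, no_store: bool) -> None:
        self.memory[url] = body
        if no_store:
            # Never write a no-store response to the disk part of the cache.
            self.memory_only.add(url)
            self.disk.pop(url, None)
        else:
            self.memory_only.discard(url)
            self.disk[url] = body

    def get(self, url: str) -> Optional[bytes]:
        if url in self.memory:
            return self.memory[url]
        return self.disk.get(url)

    def release(self, url: str) -> None:
        """Called when no document uses the resource any more."""
        if url in self.memory_only:
            # Discard no-store entries promptly once they are unused.
            self.memory.pop(url, None)
            self.memory_only.discard(url)
```

Under this sketch a spacer image marked no-store can be reused while a page still references it, and it disappears from the cache as soon as release is called.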


--
Mark Nottingham     http://www.mnot.net/
Received on Monday, 8 June 2009 02:09:55 GMT
