Re: httpbis-p6-cache-06 and no-store response directive

On Mon, 08 Jun 2009 04:09:16 +0200, Mark Nottingham <mnot@mnot.net> wrote:

> Hi Yngve,
>
> I think your question should be posed as: what comprises the cache
> subsystem?
>
> In a browser, the HTML parser component would dispatch requests to a  
> cache, which then either satisfies the requests or forwards them. If an  
> HTML page has three references to <http://example.com/foo.gif>, for  
> example, there are two ways of approaching this problem;
>
> 1) asking the cache for foo.gif once (perhaps by piggybacking the  
> callbacks for subsequent images into the first instance), or
>
> 2) asking the cache for foo.gif three times.

Mark, from my point of view, the document parsing code has to do #2
before it can do #1, as #2 produces the index key used by #1.
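
To make the two approaches concrete, here is a minimal Python sketch of a
loader that piggybacks later references onto the first one. It is only an
illustration under my own assumptions: the names CoalescingLoader, request,
flush and fetch are hypothetical, and this is not how any particular browser
is implemented.

```python
from typing import Callable, Dict, List

Callback = Callable[[bytes], None]


class CoalescingLoader:
    """Hypothetical loader: one network fetch per distinct URL per flush."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # stand-in for the network layer (assumption)
        self._pending: Dict[str, List[Callback]] = {}
        self.fetch_count = 0

    def request(self, url: str, callback: Callback) -> None:
        # Every reference still asks the loader (approach #2); the URL is the
        # index key that lets later references piggyback (approach #1).
        waiters = self._pending.get(url)
        if waiters is not None:
            waiters.append(callback)
            return
        self._pending[url] = [callback]

    def flush(self) -> None:
        # One fetch per distinct URL; the body is handed to every waiter.
        pending, self._pending = self._pending, {}
        for url, waiters in pending.items():
            body = self._fetch(url)
            self.fetch_count += 1
            for deliver in waiters:
                deliver(body)
        # Nothing is retained by the loader after delivery.


received: List[bytes] = []
loader = CoalescingLoader(lambda url: b"GIF89a")
for _ in range(3):
    loader.request("http://example.com/foo.gif", received.append)
loader.flush()
```

In this sketch the three references to foo.gif result in a single fetch, and
the loader keeps no copy once the callbacks have run.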

> In my reading, #1 is conformant even if the response contains no-store;  
> technically (and probably mostly theoretically), #3 is not.
>
> However, I don't think that matters much.
>
> A much greater concern here is that HTTP authors need stable semantics  
> for cache directives. By "creatively" interpreting them, or trying to  
> guess what the authors intend based upon a survey of Web sites, a great  
> disservice is done; authors will need to begin to second guess what  
> cache vendors do when they encounter different directives. This already  
> happens to a degree (although it's not as bad as it used to be, I think,  
> and hopefully HTTPbis will improve the situation a bit more), but  
> no-store has always been one of the more unambiguous directives.
>
> Please don't dilute it. If sites choose to set it, the only rational  
> choice is to assume that they want you to honour it; indeed, there may

Assuming the author even chose it, and didn't have the choice made for  
them, as appears to be the case with PHP and several wikis.

I think it is necessary to distinguish between two uses of no-store (and
must-revalidate): with unencrypted content, and with encrypted content.

As I said, using no-store with unencrypted content as an indication to
proxies that they must not re-use the response makes sense, because it
restricts distribution to other clients using the same proxy. It does
not, IMO, make sense to apply the same restriction to a client cache (or,
if you will, index) for either encrypted or unencrypted content. Doing so
causes a possibly significant performance reduction when accessing the
website, and for unencrypted connections it confers no extra protection,
since the information is already sent in the clear. Client-side, there
are other mechanisms at work to prevent any information leaks that may be
of concern.
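
As a rough illustration of the distinction I am arguing for, the following
Python sketch encodes it as a policy table. This reflects my proposal only,
not what RFC 2616 or the p6-06 draft requires, and the names CacheKind,
StoragePolicy and policy_for are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class CacheKind(Enum):
    SHARED = "shared"    # e.g. a proxy serving several clients
    PRIVATE = "private"  # the client's own cache


@dataclass(frozen=True)
class StoragePolicy:
    may_write_to_disk: bool
    may_keep_in_memory: bool  # beyond the time needed to forward the response
    may_reuse_for_other_requests: bool


def policy_for(kind: CacheKind, no_store: bool) -> StoragePolicy:
    """Proposed handling of a response, given whether it carries no-store."""
    if not no_store:
        # Reuse is still subject to the other cache directives (not modeled).
        return StoragePolicy(True, True, True)
    if kind is CacheKind.SHARED:
        # Proxies: forward, then discard; never answer another request.
        return StoragePolicy(False, False, False)
    # Client cache: never on disk, but the in-memory copy may be reused.
    return StoragePolicy(False, True, True)
```

The only difference between the two no-store rows is whether an in-memory
copy may be kept and reused, which is the point under discussion.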

My point, though, is that I think the no-store text in the updated draft
goes beyond what 2616 said.

> even be legal implications here (although IANAL). If it makes their site  
> appear slower, well, they're reaping the benefits -- or detriments -- of  
> what they've done.

As I said above: If they made the choice. In many cases I don't think they  
did more than select a development environment that made the choice for  
them, based on what is supposed to provide a "revalidate each time the  
user clicks on a link to this document"-functionality, that is, the same  
as "Cache-Control: max-age=0" and "no-cache".
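
To show what such a default header actually asks for, here is a small Python
sketch that splits a Cache-Control value into directives. It is deliberately
naive (it does not handle quoted strings that contain commas), the function
names are hypothetical, and treating max-age=0 like no-cache follows my
reading above rather than any normative text.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Naive split of a Cache-Control value into {directive: argument}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" get None as their argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def requires_revalidation(directives: Dict[str, Optional[str]]) -> bool:
    # Assumption: no-cache and max-age=0 both mean "revalidate on each load".
    return "no-cache" in directives or directives.get("max-age") == "0"


def forbids_storage(directives: Dict[str, Optional[str]]) -> bool:
    return "no-store" in directives


php_default = parse_cache_control(
    "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
)
revalidate_only = parse_cache_control("max-age=0")
```

With these definitions, the PHP-style default both requires revalidation and
forbids storage, while "max-age=0" alone only requires revalidation.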

> As far as establishing new directives go, you're absolutely right to  
> characterise this as something "beyond HTTPbis"; assuming you want to  
> replace (rather than augment) the current directives, the only effort  
> that could do this would be HTTP/2.0.
>
> Cheers,
>
>
> On 08/06/2009, at 11:08 AM, Yngve N. Pettersen (Developer Opera Software  
> ASA) wrote:
>
>> Hello all,
>>
>> When reading the p6-06 draft it seemed to me that the new phrasing seems
>> to forbid the client's own cache from using the received response again
>> under any circumstance, which I think is slightly different from my  
>> interpretation of what RFC 2616 says.
>>
>>
>> RFC 2616 says:
>>
>>  14.9.2
>>
>>   no-store
>>      The purpose of the no-store directive is to prevent the
>>      inadvertent release or retention of sensitive information (for
>>      example, on backup tapes). The no-store directive applies to the
>>      entire message, and MAY be sent either in a response or in a
>>      request. If sent in a request, a cache MUST NOT store any part of
>>      either this request or any response to it. If sent in a response,
>>      a cache MUST NOT store any part of either this response or the
>>      request that elicited it. This directive applies to both non-
>>      shared and shared caches. "MUST NOT store" in this context means
>>      that the cache MUST NOT intentionally store the information in
>>      non-volatile storage, and MUST make a best-effort attempt to
>>      remove the information from volatile storage as promptly as
>>      possible after forwarding it.
>>
>>      Even when this directive is associated with a response, users
>>      might explicitly store such a response outside of the caching
>>      system (e.g., with a "Save As" dialog). History buffers MAY store
>>      such responses as part of their normal operation.
>>
>>      The purpose of this directive is to meet the stated requirements
>>      of certain users and service authors who are concerned about
>>      accidental releases of information via unanticipated accesses to
>>      cache data structures. While the use of this directive might
>>      improve privacy in some cases, we caution that it is NOT in any
>>      way a reliable or sufficient mechanism for ensuring privacy. In
>>      particular, malicious or compromised caches might not recognize or
>>      obey this directive, and communications networks might be
>>      vulnerable to eavesdropping.
>>
>> p6-cache says:
>>
>>  3.2.2
>>
>>   no-store
>>
>>      The no-store response directive indicates that a cache MUST NOT
>>      store any part of either the immediate request or response.  This
>>      directive applies to both non-shared and shared caches.  "MUST NOT
>>      store" in this context means that the cache MUST NOT intentionally
>>      store the information in non-volatile storage, and MUST make a
>>      best-effort attempt to remove the information from volatile
>>      storage as promptly as possible after forwarding it.
>>
>>      This directive is NOT a reliable or sufficient mechanism for
>>      ensuring privacy.  In particular, malicious or compromised caches
>>      might not recognize or obey this directive, and communications
>>      networks may be vulnerable to eavesdropping.
>>
>> To me it seems that the new phrasing forbids the client's own
>> cache from using the received response even when the resource is
>> referenced multiple times from the same document, which is common for
>> some sites using small spacer images or other small icons, or by  
>> multiple documents, like style sheets and images. (The text also seems  
>> to have lost the history reference, though Sec. 4 may make up for that)
>>
>> I agree that for proxies the requirement to discard immediately makes
>> sense.
>>
>> But for clients it is IMO not just a waste of bandwidth (particularly on
>> performance restricted devices) to reload such resources multiple  
>> times, even for the same document, but it would probably require  
>> significant changes in how clients handle resources. It also  
>> essentially duplicates the no-cache directive in some respects about  
>> reuse, although it does go a little further ("must not reuse"). I'll  
>> remind you of sec 1.1 "Caching would be useless if it did not  
>> significantly improve performance", and the above text will  
>> significantly reduce performance in clients if implemented according to  
>> my current understanding of it, and IMO such a reduction is unnecessary  
>> even from a security perspective.
>>
>> Opera's implementation of this directive since we implemented it has  
>> been "Do not store to filesystem, keep in RAM, discard quickly when it  
>> is no longer in use". Such resources are re-used just like any other
>> resource in the cache that is not specially treated, like POST form
>> results, and if necessary re-validated when expired. The only
>> difference is that they are not written to the disk cache part of our
>> caching system (this does not prevent virtual memory swapping from
>> writing them to disk; other measures are being considered for that, but
>> that problem applies to all uses of these data, including display).
>>
>> Another aspect of this is that quite a lot of sites, as well as the  
>> default configuration of several Wiki packages, in my experience,  
>> automatically send the no-store directive, along with must-revalidate,  
>> even when there is no need for it.
>>
>> A while back MAMA, our structural web search engine, see  
>> http://dev.opera.com/articles/view/mama/ , did a crawl of the Alexa top  
>> million sites and other sites, and while the crawl was still underway I  
>> asked for a list of sites using the no-store directive.
>>
>> The resulting list contained ~300000 unique sites of over 4 million  
>> URLs scanned (total), of which ~50000 (5%) were on the Alexa list, some  
>> of them quite high on the list. As the scan was not complete, the  
>> actual numbers are probably higher.
>>
>> Examples included these URLs (checked early April) :
>>
>>   http://www.mediabox.fr/
>>   http://wiki.mediabox.fr/
>>   http://www.tayloryourevent.com/
>>   http://joomla-wiki.de/doku.php
>>   http://sourceforge.net/
>>   http://technorati.com/
>>   http://secondlife.com/
>>   http://www.alltheweb.com/
>>   http://babynames.com/
>>   http://broadwayworld.com/article/Photo_Coverage_reasons_to_be_pretty_Opening_Night_Celebration_20000101
>>
>> As you will see, many of these are well known sites, and almost all of  
>> them are front pages, which are unlikely to be sensitive, or changing  
>> very frequently (as in: every few seconds or minutes, and that could be  
>> handled using no-cache).
>>
>> A point about broadwayworld.com : Their *articles* are using
>>
>>    Cache-Control: no-store, no-cache, must-revalidate, post-check=0,  
>> pre-check=0
>>
>> while the *front* pages (which are the dynamic ones) don't.
>>
>> Additionally, the default in PHP (at least my copy of v5.0.4) seems to  
>> be "Cache-Control: no-store, no-cache, must-revalidate, post-check=0,  
>> pre-check=0", and I seem to recall that the MoinMoin wiki had that,  
>> too, at least last year (but in that case I may be misremembering).
>>
>> Given the extensive use of no-store in situations where it does not  
>> seem necessary, I have started wondering if Opera needs to start
>> ignoring the no-store directive in non-HTTPS responses, just as we
>> currently accept must-revalidate (interpreted as re-validate on
>> history navigation) only for HTTPS responses. No decision has been
>> reached yet.
>>
>>
>> My recommendation is that the text describing the no-store response
>> directive be phrased so that all caches are forbidden from storing the
>> response to non-volatile media and must clear it away ASAP after use
>> (as it is currently phrased), and that caches that are not part of the
>> client MUST NOT use the response when responding to another request,
>> while allowing *clients* to use their locally stored copy for as long
>> as other cache policies permit.
>>
>>
>> Looking forward, past http-bis, given the apparent amount of  
>> misunderstanding about the current cache directives (I receive regular  
>> questions from customers and bug reports claiming that no-cache means  
>> "do not use again", while it only means "revalidate each time you load  
>> the document") I am starting to reach the conclusion that no-cache,  
>> no-store and must-revalidate should be discarded and replaced with more  
>> descriptive names (which should include the context of when they are
>> to be used), for example, on-load-revalidate,  
>> sensitive-content-storage, on-navigate-revalidate, respectively, or  
>> words to that effect. If a must-not-reuse indication is needed, then it  
>> should also directly say so, e.g. single-use-response or  
>> unique-response.
>>
>> Also, while only distantly related, as I've pointed out earlier, HTTP  
>> is currently missing a mechanism to let servers invalidate a group of  
>> cache entries, for example during logout. I have suggested such a cache
>> context mechanism in a draft (the most recent version is currently  
>> expired, but I am planning to refresh it; the most recent version is  
>> available at  
>> http://my.opera.com/yngve/blog/2008/11/06/refreshed-internet-drafts) .
>>
>> --Sincerely,
>> Yngve N. Pettersen
>>
>> ********************************************************************
>> Senior Developer                     Email: yngve@opera.com
>> Opera Software ASA                   http://www.opera.com/
>> Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
>> ********************************************************************
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>



-- 
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************

Received on Monday, 15 June 2009 14:43:16 UTC