
httpbis-p6-cache-06 and no-store response directive

From: Yngve N. Pettersen (Developer Opera Software ASA) <yngve@opera.com>
Date: Mon, 08 Jun 2009 03:08:59 +0200
To: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <op.uu6lg9vrqrq7tp@nimisha.oslo.opera.com>
Hello all,

When reading the p6-06 draft, it seemed to me that the new phrasing
forbids a client's own cache from using the received response again under
any circumstance, which I think is slightly different from my
interpretation of what RFC 2616 says.


RFC 2616 says:

   14.9.2

    no-store
       The purpose of the no-store directive is to prevent the
       inadvertent release or retention of sensitive information (for
       example, on backup tapes). The no-store directive applies to the
       entire message, and MAY be sent either in a response or in a
       request. If sent in a request, a cache MUST NOT store any part of
       either this request or any response to it. If sent in a response,
       a cache MUST NOT store any part of either this response or the
       request that elicited it. This directive applies to both non-
       shared and shared caches. "MUST NOT store" in this context means
       that the cache MUST NOT intentionally store the information in
       non-volatile storage, and MUST make a best-effort attempt to
       remove the information from volatile storage as promptly as
       possible after forwarding it.

       Even when this directive is associated with a response, users
       might explicitly store such a response outside of the caching
       system (e.g., with a "Save As" dialog). History buffers MAY store
       such responses as part of their normal operation.

       The purpose of this directive is to meet the stated requirements
       of certain users and service authors who are concerned about
       accidental releases of information via unanticipated accesses to
       cache data structures. While the use of this directive might
       improve privacy in some cases, we caution that it is NOT in any
       way a reliable or sufficient mechanism for ensuring privacy. In
       particular, malicious or compromised caches might not recognize or
       obey this directive, and communications networks might be
       vulnerable to eavesdropping.

p6-cache says:

   3.2.2

    no-store

       The no-store response directive indicates that a cache MUST NOT
       store any part of either the immediate request or response.  This
       directive applies to both non-shared and shared caches.  "MUST NOT
       store" in this context means that the cache MUST NOT intentionally
       store the information in non-volatile storage, and MUST make a
       best-effort attempt to remove the information from volatile
       storage as promptly as possible after forwarding it.

       This directive is NOT a reliable or sufficient mechanism for
       ensuring privacy.  In particular, malicious or compromised caches
       might not recognize or obey this directive, and communications
       networks may be vulnerable to eavesdropping.

To me, the new phrasing seems to forbid the client's own cache from using
the received response even when the resource is referenced multiple times
from the same document (common for sites using small spacer images or
other small icons) or by multiple documents (as with style sheets and
images). The text also seems to have lost the history reference, though
Sec. 4 may make up for that.

I agree that for proxies the requirement to discard immediately makes sense.

But for clients, reloading such resources multiple times, even for the
same document, is IMO not just a waste of bandwidth (particularly on
performance-restricted devices); it would probably also require
significant changes in how clients handle resources. The new phrasing also
essentially duplicates the no-cache directive in some respects regarding
reuse, although it goes a little further ("must not reuse"). I'll remind
you of Sec. 1.1, "Caching would be useless if it did not significantly
improve performance": the above text will significantly reduce performance
in clients if implemented according to my current understanding of it, and
IMO such a reduction is unnecessary even from a security perspective.

Opera's implementation of this directive has, ever since we implemented
it, been "Do not store to the filesystem, keep in RAM, discard quickly
when it is no longer in use". Such resources are reused just like any
other resource in the cache that is not specially treated (as, for
example, POST form results are), and if necessary revalidated when
expired. The only difference is that they are not written to the disk
cache part of our caching system (this does not prevent virtual memory
swapping from writing them to disk; other measures are being considered
for that, but that problem applies to all use of these data, including
display).
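
As an illustration only, the behaviour described above might be sketched
roughly as follows. This is not Opera's code; the names ClientCache, Entry,
put, get and release are hypothetical, and a plain dict stands in for the
disk cache.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Entry:
    body: bytes
    expires_at: float  # absolute time after which revalidation is needed
    no_store: bool


@dataclass
class ClientCache:
    """Hypothetical two-tier client cache: RAM plus a stand-in for disk."""
    memory: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)  # stand-in for the disk cache

    def put(self, url, body, max_age, no_store, now=None):
        now = time.time() if now is None else now
        entry = Entry(body, now + max_age, no_store)
        self.memory[url] = entry
        # The only difference for no-store: never write to the disk tier.
        if not no_store:
            self.disk[url] = entry

    def get(self, url, now=None):
        """Return (entry, needs_revalidation), or (None, True) on a miss."""
        now = time.time() if now is None else now
        entry = self.memory.get(url) or self.disk.get(url)
        if entry is None:
            return None, True
        return entry, now >= entry.expires_at

    def release(self, url):
        """Discard a no-store entry once no document uses it any more."""
        entry = self.memory.get(url)
        if entry is not None and entry.no_store:
            del self.memory[url]
```

With this sketch, a spacer image sent with no-store is fetched once, reused
from RAM for every reference while the document is open, and dropped by
release() afterwards, without ever reaching the disk tier.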

Another aspect of this is that quite a lot of sites, as well as the  
default configuration of several Wiki packages, in my experience,  
automatically send the no-store directive, along with must-revalidate,  
even when there is no need for it.

A while back, MAMA, our structural web search engine (see
http://dev.opera.com/articles/view/mama/), crawled the Alexa top million
sites and other sites, and while the crawl was still underway I asked for
a list of sites using the no-store directive.

The resulting list contained ~300000 unique sites, out of over 4 million
URLs scanned in total; ~50000 of those sites (5% of the Alexa list) were
on the Alexa list, some of them quite high on the list. As the scan was
not complete, the actual numbers are probably higher.

Examples included these URLs (checked early April) :

    http://www.mediabox.fr/
    http://wiki.mediabox.fr/
    http://www.tayloryourevent.com/
    http://joomla-wiki.de/doku.php
    http://sourceforge.net/
    http://technorati.com/
    http://secondlife.com/
    http://www.alltheweb.com/
    http://babynames.com/
    http://broadwayworld.com/article/Photo_Coverage_reasons_to_be_pretty_Opening_Night_Celebration_20000101

As you will see, many of these are well-known sites, and almost all of
them are front pages, which are unlikely to be sensitive or to change very
frequently (as in every few seconds or minutes; and even that could be
handled using no-cache).

A point about broadwayworld.com: Their *articles* are using

     Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

while the *front* pages (which are the dynamic ones) don't.

Additionally, the default in PHP (at least in my copy of v5.0.4) seems to
be "Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
pre-check=0", and I seem to recall that the MoinMoin wiki had that, too,
at least last year (but in that case I may be misremembering).
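
To make the header value quoted above concrete, here is a minimal sketch of
splitting such a Cache-Control value into its directives. The function name
parse_cache_control is hypothetical, and the parser is deliberately
simplified: it assumes no quoted argument contains a comma.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}.

    Simplification (assumption): quoted arguments containing commas
    are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; normalise to lower case.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


header = "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
parsed = parse_cache_control(header)
print("no-store" in parsed)   # True
print(parsed["post-check"])   # 0
```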

Given the extensive use of no-store in situations where it does not seem
necessary, I have started wondering whether Opera needs to start ignoring
the no-store directive in non-HTTPS responses, just as we currently accept
must-revalidate (interpreted as re-validate on history navigation) only
for HTTPS responses. No decision has been reached yet.
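
Purely as a sketch of the policy under consideration (no decision has been
made, and this is not Opera's code), filtering the directives by scheme
could look like this. The name effective_directives and the HTTPS_ONLY set
are hypothetical.

```python
from urllib.parse import urlsplit

# Directives honoured only over HTTPS in this sketch. must-revalidate is
# already treated this way; no-store is the addition being considered.
HTTPS_ONLY = {"no-store", "must-revalidate"}


def effective_directives(url, directives):
    """Return the set of directives the client would actually honour."""
    if urlsplit(url).scheme == "https":
        return set(directives)
    return set(directives) - HTTPS_ONLY
```

For a plain-HTTP front page sending "no-store, no-cache, must-revalidate",
only no-cache would remain in effect under this sketch.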


My recommendation is that the text describing the no-store response
directive be phrased so that all caches are forbidden from storing the
response on non-volatile media and must clear it away ASAP after use (as
it is currently phrased), and that caches that are not part of the client
MUST NOT use the response when responding to another request, while
*clients* are allowed to use their locally stored copy for as long as
other cache policies permit.
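
The recommended rule can be summarised in a small sketch. The function
names are hypothetical, and "reuse" here only means that no-store itself
does not forbid it; freshness and other cache policies still apply.

```python
def may_store_non_volatile(directives):
    """No cache, client or proxy, may write a no-store response to disk."""
    return "no-store" not in directives


def no_store_permits_reuse(directives, cache_is_part_of_client):
    """Whether no-store allows answering another request from this copy.

    Under the proposal, only a cache that is part of the client may reuse
    a no-store response; other caches (e.g. proxies) must not.
    """
    if "no-store" in directives:
        return cache_is_part_of_client
    return True
```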


Looking forward, past http-bis: given the apparent amount of
misunderstanding about the current cache directives (I receive regular
questions from customers and bug reports claiming that no-cache means "do
not use again", while it only means "revalidate each time you load the
document"), I am starting to reach the conclusion that no-cache, no-store
and must-revalidate should be discarded and replaced with more descriptive
names (which should include the context in which they are to be used), for
example on-load-revalidate, sensitive-content-storage and
on-navigate-revalidate, respectively, or words to that effect. If a
must-not-reuse indication is needed, then it should also say so directly,
e.g. single-use-response or unique-response.

Also, while only distantly related, as I've pointed out earlier, HTTP is
currently missing a mechanism to let servers invalidate a group of cache
entries, for example during logout. I have suggested such a cache context
mechanism in a draft (the most recent version is currently expired, but I
am planning to refresh it; the most recent version is available at
http://my.opera.com/yngve/blog/2008/11/06/refreshed-internet-drafts).

-- 
Sincerely,
Yngve N. Pettersen
 
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 24 16 42 60              Fax:    +47 24 16 40 01
********************************************************************
Received on Monday, 8 June 2009 01:09:42 GMT
