Re: policy-uri is slow from Mark Nottingham on 2011-04-19 (public-web-security@w3.org from April 2011)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 19 Apr 2011 13:38:59 +1000
To: Aryeh Gregor <Simetrical+w3c@gmail.com>
Cc: Adam Barth <w3c@adambarth.com>, public-web-security@w3.org
Message-Id: <F6735E8D-7302-4EE1-A6FC-50A051051CC5@mnot.net>

On 18/04/2011, at 8:01 AM, Aryeh Gregor wrote:

> On Fri, Apr 15, 2011 at 7:51 PM, Mark Nottingham <mnot@mnot.net> wrote:
>> You seem to be saying that a site that's going through the effort of setting CSP headers for all of its resources will shy away from setting caching headers as well. Why?
> 
> Because the person setting up CSP isn't thinking about performance, or
> maybe even has little to no idea what they're doing and is just
> following a dodgy tutorial they found on some website.  It's not a
> good idea to make assumptions about authors' competence, if it's
> possible to design features that work as well and don't make such
> assumptions.

That may or may not be the case. Regardless, specifying a heuristic for client caches can avoid this problem.
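
As a rough illustration of what such a heuristic could look like, here is a minimal Python sketch. It assumes the session-lifetime default suggested later in this message; the class name, method names and the max_age parameter are all hypothetical and not taken from any browser or spec.

```python
import time


class SessionPolicyCache:
    """Hypothetical client-side cache for fetched policy documents.

    Heuristic assumed here: a policy with no explicit freshness
    information is kept until the browser session ends.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # policy URL -> (policy text, expiry time or None)

    def store(self, url, policy_text, max_age=None):
        # Explicit freshness (e.g. from Cache-Control: max-age) wins;
        # otherwise fall back to the session-lifetime heuristic.
        expires_at = None if max_age is None else self._clock() + max_age
        self._entries[url] = (policy_text, expires_at)

    def lookup(self, url):
        entry = self._entries.get(url)
        if entry is None:
            return None
        policy_text, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[url]  # explicitly stale: drop and refetch
            return None
        return policy_text

    def end_session(self):
        # The heuristic lifetime ends with the session.
        self._entries.clear()
```

With something like this, an author who never sets caching headers still gets one policy fetch per session rather than one per page view.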

>> What does revalidation have to do with cache eviction?
> 
> If revalidation is required on every hit, caching is of limited
> utility for small resources, because a 304 Not Modified response will
> take about as much effort as resending the entire resource.

Sure -- but that doesn't have anything to do with what you were responding to.

>> Please re-read the thread; the idea is to use a well-known URI, which is indeed known ahead of time.
> 
> But you still don't know *if* you need to fetch it, unless you plan to
> preemptively try fetching that URL for every site you visit -- which
> makes no sense unless you expect the vast majority of sites to use
> CSP.

That's the discussion that's happening elsewhere in-thread.

>>> But that sort of thing only makes sense if the resource is quite
>>> large, several kilobytes at least.  If we expect the large majority of
>>> CSP policies to be only a handful of bytes long, I really think
>>> linking to a policy will hurt performance in nearly all cases, not
>>> help it.
>> 
>> ... and I think will help.
> 
> It will *always* hurt on a page view with a cold cache, to the tune of a
> full round-trip.  And that's a *lot* of views.  This research by
> Yahoo! from 2007 says that 40-60% of their users have an empty cache
> at least sometimes, and about 20% of all their page views have an
> empty cache: <http://www.yuiblog.com/blog/2007/01/04/performance-research-part-2/>
> That's an extra round-trip on all of those page views.

It adds a potential round-trip if you have to wait for the HTML response, yes. 

> By contrast, sending your CSP policy on every request will only
> increase network transfer time appreciably if it happens to cause a
> TCP window to fill up, so that the server has to wait for an
> acknowledgement from the client before proceeding.

You mean congestion window?

>  With current
> efforts to increase default window sizes by a large margin, plus the
> use of pipelining to try keeping large window sizes going, this is
> going to be fairly uncommon even if the policy is a kilobyte in
> length.

It's going to take a fair amount of time to get larger congestion windows rolled out, and that work is still somewhat controversial. Adding bytes to every response on the assumption that it will just work out in all cases (e.g., mobile) isn't a good idea.

>  Surely not 25% of the time, which is what it would have to be
> to outweigh a 20% empty cache rate.

You're basing that on common browser cache behaviour in 2007. Lots has changed and will change.
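
For anyone wanting to redo the arithmetic with different numbers: the thread does not show how the 25% figure was derived, so the following Python sketch is only one reading that happens to reproduce it. It assumes a linked policy costs one full extra round trip on every cold-cache view, and saves one round trip on a warm-cache view whenever an inline policy would have overflowed the window. The function name is hypothetical.

```python
def break_even_overflow_rate(cold_cache_rate):
    """Overflow rate at which linking and inlining cost the same.

    Assumed model (not stated in the thread):
      cost of linking    = cold_cache_rate * 1 round trip
      benefit of linking = (1 - cold_cache_rate) * overflow_rate * 1 round trip
    Setting cost equal to benefit and solving for overflow_rate gives
    the ratio returned below.
    """
    if not 0 <= cold_cache_rate < 1:
        raise ValueError("cold_cache_rate must be in [0, 1)")
    return cold_cache_rate / (1 - cold_cache_rate)


# With the 20% empty-cache rate quoted above, the break-even rate
# under this model is about 0.25.
rate = break_even_overflow_rate(0.20)
```

The result is quite sensitive to the empty-cache rate: at 10% the same model breaks even at roughly 11%.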

> So why do you think the tradeoff will typically be worth it even
> *with* good caching headers, given how common it is to have a cold
> cache?  With a policy-uri over regular HTTP with a cold cache, it will
> take four round-trips before the user can start seeing resources
> included in the page instead of three.  (One for TCP handshake, one to
> start receiving the page, one to receive the policy, one to retrieve
> additional content.)
> 
>> If this is *really* a concern (and I don't think it is), one could specify that the policy file has a default cacheability of the browser session (for example). HTTP allows heuristics to be used to cache responses, and this is just one example of how we could do this for CSP.
> 
> Still doesn't help the cold cache case, which is very common.

Rather than arguing the details in circles, it seems that the crux here is a) how common CSP (and other policy that leverages CSP) will be, and b) how likely it is that the CSP mechanism's vocabulary will grow over time. If we can get a sense of the answers to those questions, the right thing to do should become clear.

If it's to be commonly deployed (e.g., as much as favicon), it's worth considering pre-emptively accessing a well-known URI as the initial request goes out. If it's going to be uncommon (i.e., most sites won't have CSP), pre-emptive fetching may not be worthwhile, but having the policy-uri mechanism (or similar) is still useful, as some Web sites may want to use it, especially as the vocabulary grows.
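
To make the round-trip counts in this thread easier to compare, here is a small Python sketch of the critical path. It uses the simplified model from the quoted text (one round trip each for the TCP handshake, the page, and the additional content) and further assumes that a pre-emptive well-known-URI fetch runs on a parallel connection and completes no later than the page itself. The function and mode names are hypothetical.

```python
def round_trips_until_subresources(mode, policy_cached=False):
    """Round trips before resources included in the page arrive.

    mode is one of:
      "inline"              policy sent in the page's response headers
      "policy-uri"          policy linked from the page, fetched afterwards
      "well-known-prefetch" policy fetched in parallel with the page
    """
    handshake = 1     # TCP handshake
    page = 1          # request and start receiving the page
    subresources = 1  # retrieve additional content

    if mode not in ("inline", "policy-uri", "well-known-prefetch"):
        raise ValueError("unknown mode: " + mode)

    if mode == "policy-uri" and not policy_cached:
        # The policy URI is only learned from the page, so on a cold
        # cache the fetch is serialized after it.
        policy = 1
    else:
        # Inline, already cached, or overlapped with the page fetch
        # (assumption: the parallel fetch is off the critical path).
        policy = 0

    return handshake + page + policy + subresources
```

Under these assumptions the pre-emptive fetch matches the inline case on a cold cache, at the price of a wasted request to every site that has no policy.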

> If long policies are such an issue, you could do something along the
> lines of a 304 response for CSP.  So instead of policy-uri, allow a
> rule called policy-id, say.  It would take an arbitrary opaque string
> as its argument, which might be a number or timestamp or something in
> practice.  When the browser receives such a policy, it can cache it
> somewhere, remembering the site name and id.  When it next sends an
> HTTP request, it can include a header like
> Remembered-Content-Security-Policy: (insert id here).  When the server
> sees that, it can include a special placeholder like
> Content-Security-Policy: reuse-cached to instruct the browser to just
> reuse the old policy, or it can include a full policy if the id is
> out-of-date.

That seems really convoluted and non-optimal. If it's a normal URI with a normal response, I can reuse normal HTTP caching for it (e.g., reverse proxies, ISP proxies, mobile proxies, browser caches etc.). This mechanism reinvents the wheel.
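
For concreteness, the policy-id exchange quoted above could be sketched roughly as follows in Python. The header names and the reuse-cached placeholder come from the proposal; the "policy-id <id>; <policy>" serialization, the function and class names, and the example policy strings are assumptions made for illustration only.

```python
REUSE = "reuse-cached"
REMEMBERED = "Remembered-Content-Security-Policy"


def server_csp_header(request_headers, policy_id, policy):
    """Value the server would send in Content-Security-Policy."""
    if request_headers.get(REMEMBERED) == policy_id:
        return REUSE  # browser's copy is current
    # Missing or out-of-date id: send the full policy with its id.
    return f"policy-id {policy_id}; {policy}"


class BrowserPolicyStore:
    """Hypothetical per-site store for remembered policies."""

    def __init__(self):
        self._by_site = {}  # site name -> (policy id, policy text)

    def request_headers(self, site):
        if site in self._by_site:
            return {REMEMBERED: self._by_site[site][0]}
        return {}

    def effective_policy(self, site, csp_header):
        if csp_header == REUSE:
            return self._by_site[site][1]
        id_rule, _, policy = csp_header.partition(";")
        policy_id = id_rule.split(None, 1)[1]
        policy = policy.strip()
        self._by_site[site] = (policy_id, policy)
        return policy
```

Written out like this, it is essentially a validator plus a conditional request, i.e. what ETag and If-None-Match already provide, except that only the browser and the origin server can take part in it.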

Cheers,

--
Mark Nottingham   http://www.mnot.net/
Received on Tuesday, 19 April 2011 03:39:27 UTC