Re: Header size and policy delivery from Martin Thomson on 2016-01-07 (public-webappsec@w3.org from January 2016)

From: Martin Thomson <martin.thomson@gmail.com>
Date: Fri, 8 Jan 2016 09:38:11 +1100
To: Patrick Toomey <patrick.toomey@github.com>
Cc: Jonathan Kingston <jonathan@jooped.co.uk>, WebAppSec WG <public-webappsec@w3.org>
Message-ID: <CABkgnnWyYUrJUnwqmMCJY1HmXaA6Qn6kd5wDCwS0Z7KXoXefnQ@mail.gmail.com>
OK, both baseline and source dictionary, got it.

On 8 January 2016 at 02:41, Patrick Toomey <patrick.toomey@github.com> wrote:
> I don't necessarily see the "baseline" CSP being orthogonal to "sources
> lists that chew up all the bytes". Yes, we have a fair number of directives
> (and growing each time directive support is added to browsers) and each of
> those directives may contain a fair number of sources. However, the baseline
> could encompass both aspects. For example, here is our current policy:
>
> default-src *; base-uri 'self'; connect-src 'self' live.github.com
> wss://live.github.com uploads.github.com status.github.com api.github.com
> www.google-analytics.com api.braintreegateway.com
> client-analytics.braintreegateway.com github-cloud.s3.amazonaws.com;
> font-src assets-cdn.github.com; form-action 'self' github.com
> gist.github.com; frame-src 'self' render.githubusercontent.com
> gist.github.com checkout.paypal.com; img-src 'self' data:
> assets-cdn.github.com identicons.github.com www.google-analytics.com
> checkout.paypal.com collector.githubapp.com *.githubusercontent.com
> *.gravatar.com *.wp.com; media-src 'none'; object-src assets-cdn.github.com;
> script-src assets-cdn.github.com; style-src 'self' 'unsafe-inline'
> 'unsafe-eval' assets-cdn.github.com
>
> Of the above, here is the part that I'd anticipate we would leave mostly
> static across pages:
>
> default-src *; base-uri 'self'; font-src assets-cdn.github.com; script-src
> assets-cdn.github.com; style-src 'self' 'unsafe-inline' 'unsafe-eval'
> assets-cdn.github.com; img-src 'self' data: assets-cdn.github.com
> identicons.github.com www.google-analytics.com checkout.paypal.com
> collector.githubapp.com *.githubusercontent.com *.gravatar.com *.wp.com;
> media-src 'none'; object-src assets-cdn.github.com; connect-src 'self'
> live.github.com wss://live.github.com status.github.com api.github.com
> www.google-analytics.com; frame-src 'self' render.githubusercontent.com
> gist.github.com;
>
> And here are the directives I foresee us wanting to customize in the future
> based on the specific request:
>
> connect-src
> form-action
>
> So, it would be nice to have a manifest that allows us to store/reference
> the base policy (directives plus source lists) and have some means of
> customizing/overriding the policy for specific pages. I was thinking of
> something like this:
>
> CSP manifest:
>
> baseline-policy default-src *; base-uri 'self'; font-src
> assets-cdn.github.com; script-src assets-cdn.github.com; style-src
> csp-manifest-baseline-style-srcs; img-src csp-manifest-baseline-img-srcs;
> media-src 'none'; connect-src csp-manifest-baseline-connect-srcs;
> frame-src csp-manifest-baseline-frame-src
>
> baseline-style-srcs 'self' 'unsafe-inline' 'unsafe-eval'
> assets-cdn.github.com
>
> baseline-img-srcs 'self' data: assets-cdn.github.com identicons.github.com
> www.google-analytics.com checkout.paypal.com collector.githubapp.com
> *.githubusercontent.com *.gravatar.com *.wp.com
>
> baseline-connect-srcs 'self' live.github.com wss://live.github.com
> status.github.com api.github.com www.google-analytics.com
>
> baseline-frame-src 'self' render.githubusercontent.com gist.github.com;
>
> Then, the CSP header in a typical non-customized response would look
> something like:
>
> Content-Security-Policy: default-policy csp-manifest-baseline-policy (maybe
> some sort of hash makes sense too)
>
> And, let's say we want to customize a specific page to allow an additional
> connect-src. We could do that by overriding connect-src specifically:
>
> Content-Security-Policy: default-policy csp-manifest-baseline-policy;
> connect-src csp-manifest-baseline-connect-srcs uploads.github.com
>
> That is what I had in my head in very broad strokes. For the bulk of our
> requests that would take our CSP header size from 759 bytes to 40-50 bytes.
> Assuming we customize form-action on each page, we would still only be
> looking at something closer to 100 bytes. And, I anticipate that the
> majority of CSP additions would be added to our baseline policy. So, the
> growth of the actual header should be nominal.
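
A rough sketch of how a client might resolve the scheme quoted above. This is
not part of any specification: the manifest dictionary, the handling of the
"csp-manifest-" prefix, and the function names expand_sources,
parse_directives and resolve are all assumptions made for illustration, and
the source lists are shortened. It assumes that a directive in the header
replaces the baseline directive of the same name, as in the connect-src
example.

```python
# Hypothetical manifest: label -> value, mirroring the labels in the proposal.
MANIFEST = {
    "baseline-policy": (
        "default-src *; base-uri 'self'; "
        "connect-src csp-manifest-baseline-connect-srcs"
    ),
    "baseline-connect-srcs": "'self' live.github.com api.github.com",
}

PREFIX = "csp-manifest-"  # assumed reference prefix, taken from the examples


def expand_sources(value, manifest):
    """Replace each csp-manifest-<label> token with its labeled source set."""
    out = []
    for token in value.split():
        if token.startswith(PREFIX):
            out.extend(manifest[token[len(PREFIX):]].split())
        else:
            out.append(token)
    return " ".join(out)


def parse_directives(policy):
    """Split a policy string into an ordered {directive: value} mapping."""
    directives = {}
    for part in policy.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition(" ")
        directives[name] = value.strip()
    return directives


def resolve(header, manifest):
    """Expand a compact header value into a full policy string."""
    compact = parse_directives(header)
    result = {}
    base_ref = compact.pop("default-policy", None)
    if base_ref is not None:
        base = manifest[base_ref[len(PREFIX):]]
        for name, value in parse_directives(base).items():
            result[name] = expand_sources(value, manifest)
    # Directives given in the header override the baseline ones.
    for name, value in compact.items():
        result[name] = expand_sources(value, manifest)
    return "; ".join(f"{n} {v}".rstrip() for n, v in result.items())


if __name__ == "__main__":
    print(resolve(
        "default-policy csp-manifest-baseline-policy; "
        "connect-src csp-manifest-baseline-connect-srcs uploads.github.com",
        MANIFEST,
    ))
```

Whether an override should replace or extend the baseline directive is an
open design question; replacement keeps the header self-describing for the
directives it names.
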
>
>
>
> On Thu, Jan 7, 2016 at 1:21 AM Martin Thomson <martin.thomson@gmail.com>
> wrote:
>>
>> That might work.  How much do you think that you would benefit from a
>> "baseline" CSP policy in that document?  That is, rules that were
>> universal, or is it source lists that chew up all the bytes?
>>
>> On 7 January 2016 at 16:08, Patrick Toomey <patrick.toomey@github.com>
>> wrote:
>> > We have started customizing our policy per endpoint and have plans to
>> > do so even more in the future. It feels like "CSP as a resource" would
>> > be a bit trickier if one customized their policy per response (maybe I
>> > missed someone already addressing this concern). If I look at our CSP
>> > policy (or the Twitter one someone showed in the original thread), the
>> > bulk of the size is taken up with various source lists. What if the
>> > cacheable CSP resource was mostly used to provide a place to
>> > collect/label/cache sets of these values? For example, rather than
>> > having to send down something like "connect-src 'self' foo.com bar.com
>> > foobar.com" on each response, it could be a reference to a "source set"
>> > from the cacheable resource. So, the policy would be more like
>> > "connect-src csp-manifest-my-connect-srcs", where "my-connect-srcs"
>> > would be a labeled set of sources from the cached CSP resource. I guess
>> > there is inevitably a point where a sufficient number of CSP directives
>> > overwhelms the header, but maybe there is a way to handle that too. I
>> > haven't thought about it much, but maybe one could also use the CSP
>> > resource to collect/label/cache collections of commonly used directives.
>> > Even though we customize our policy per response, the majority of the
>> > policy stays the same. So, there could be a reference in the CSP
>> > response header that pulls in a collection of directives that you intend
>> > to have on any page...something like "base-policy
>> > csp-manifest-my-base-policy", where "my-base-policy" would have the
>> > parts of your CSP policy that don't really change across the site.
>> > On Wed, Jan 6, 2016 at 9:31 PM Martin Thomson <martin.thomson@gmail.com>
>> > wrote:
>> >>
>> >> A CSP resource sounds appealing, but I'm not sure about the latency
>> >> situation: are people OK with the notion that this is a separate
>> >> fetch?  We could use HTTP/2 server push to address the latency
>> >> problem.
>> >>
>> >> On 7 January 2016 at 13:48, Jonathan Kingston <jonathan@jooped.co.uk>
>> >> wrote:
>> >> > Creating a new thread to discuss a solution to header bloat, if at
>> >> > all possible.
>> >> >
>> >> > Also relevant is:
>> >> >
>> >> > https://lists.w3.org/Archives/Public/public-webappsec/2015Mar/0148.html
>> >> >
>> >> > Taken out from the discussion in: [CSP] "sri" source expression to
>> >> > enforce SRI
>> >> >
>> >> > On Tue, Jan 5, 2016 at 1:59 AM Nottingham, Mark <mnotting@akamai.com>
>> >> > wrote:
>> >> >>
>> >> >> Catching up after holidays -- I've been wanting to talk about this.
>> >> >>
>> >> >> In HTTP/2, the default of SETTINGS_HEADER_TABLE_SIZE is 4k.
>> >> >>
>> >> >> From what I've seen, Chrome and Firefox both stick with the default.
>> >> >>
>> >> >> While 4k of header compression context can help performance
>> >> >> considerably, it's important to understand that HPACK's compression
>> >> >> scheme is coarse-grained, so when the encoder is faced with a large
>> >> >> header, it has to choose between putting it into the dynamic table
>> >> >> -- thereby denying use of that space to other headers -- or
>> >> >> repeatedly putting it out onto the wire.
>> >> >>
>> >> >> For example, Twitter's response headers already get close to this
>> >> >> limit, mostly thanks to CSP:
>> >> >> https://redbot.org/?id=w5yLyD
>> >> >>
>> >> >> Their server has to choose between putting that ~3K CSP header into
>> >> >> the dynamic table, leaving them only about 1k to play with for other
>> >> >> headers per connection, or leaving it out and sending it verbatim on
>> >> >> EVERY response. They'll get a small benefit from static Huffman
>> >> >> coding (which reduces the numbers above a bit), but that's it.
>> >> >>
>> >> >> If a single header value exceeds SETTINGS_HEADER_TABLE_SIZE, it can't
>> >> >> be encoded by reference, and the sender has no choice but to emit it
>> >> >> on every message.
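
To put rough numbers on the trade-off described above: HPACK (RFC 7541,
section 4.1) counts a dynamic table entry as the name length plus the value
length plus 32 octets of overhead. A minimal sketch of that arithmetic
follows; the helper names entry_size and remaining_after and the placeholder
header values are assumptions for illustration, not taken from any
implementation.

```python
DEFAULT_TABLE_SIZE = 4096  # default SETTINGS_HEADER_TABLE_SIZE, in octets
ENTRY_OVERHEAD = 32        # per-entry overhead from RFC 7541, section 4.1


def entry_size(name, value):
    """Size of one dynamic table entry, as HPACK accounts for it."""
    return len(name.encode()) + len(value.encode()) + ENTRY_OVERHEAD


def remaining_after(name, value, table_size=DEFAULT_TABLE_SIZE):
    """Octets left for other entries, or None if the entry cannot fit."""
    size = entry_size(name, value)
    if size > table_size:
        return None  # too large to index: sent as a literal on every message
    return table_size - size


if __name__ == "__main__":
    # Placeholder values standing in for a ~3K policy and an oversized one.
    big_policy = "x" * 3000
    huge_policy = "x" * 5000
    print(remaining_after("content-security-policy", big_policy))
    print(remaining_after("content-security-policy", huge_policy))
```

The sizes are computed on the uncompressed strings; Huffman coding reduces
what goes on the wire but not what the table entry is charged.
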
>> >> >>
>> >> >> Things get even nastier if there are several large versions of CSP
>> >> >> on a single connection.
>> >> >>
>> >> >> Clients could start advertising a larger SETTINGS_HEADER_TABLE_SIZE,
>> >> >> but that means a larger state commitment (both client-side and
>> >> >> server-side, where it can hurt a lot more, offers more DoS exposure,
>> >> >> etc.).
>> >> >>
>> >> >> Given that we're already seeing popular sites brush up against this,
>> >> >> PLEASE don't assume that HTTP/2 == free compression, and that we can
>> >> >> continue to merrily add headers.
>> >> >>
>> >> >> Also - when a header is both large and monolithic like CSP (i.e., it
>> >> >> doesn't allow multiple values to be combined into a comma-separated
>> >> >> value), it makes it much harder to optimise for compression, because
>> >> >> of HPACK's granularity (again). I realise that there are security
>> >> >> motivations behind this for CSP, but I wonder if the cost is
>> >> >> justified (because once somebody can append headers, there's a lot of
>> >> >> other damage they can do).
>> >> >>
>> >> >> Cheers,
>> >> >
>> >> >
>> >> > On Tue, Jan 5, 2016 at 11:29 AM Mike O'Neill
>> >> > <michael.oneill@baycloud.com>
>> >> > wrote:
>> >> >>
>> >> >> I don’t know if this has already been talked about, but maybe long
>> >> >> headers like CSP could be put in a well-known resource. It would cost
>> >> >> another roundtrip but save bandwidth in the end because the resource
>> >> >> would be cached. The CSP header would only need to contain a hash of
>> >> >> the resource to confirm
>> >> >>
>> >> >
>> >> > On Tue, Jan 5, 2016 at 11:52 AM Jonathan Kingston
>> >> > <jonathan@jooped.co.uk>
>> >> > wrote:
>> >> >>
>> >> >> Yup Mike, I had suggested the use of SRI in the header, pointing to
>> >> >> some form of manifest file.
>> >> >>
>> >> >> I think this addresses some of Mark's concerns about header size;
>> >> >> however, it creates other issues such as cache management and extra
>> >> >> round trips.
>> >> >>
>> >> >> The manifest would also allow separation of concerns between CSP and
>> >> >> SRI within the policy.
>> >> >>
>> >> >
>> >> > Kind regards
>> >> > Jonathan
>> >>
>> >
Received on Thursday, 7 January 2016 22:38:41 UTC