- From: Bryan McQuade <bmcquade@google.com>
- Date: Sat, 16 Feb 2013 10:50:50 -0500
- To: "Hill, Brad" <bhill@paypal-inc.com>
- Cc: Ian Melven <imelven@mozilla.com>, Jacob Hoffman-Andrews <jsha@twitter.com>, Eric Chen <eric.chen@sv.cmu.edu>, Nicholas Green <ngreen@twitter.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Yoav Weiss <yoav@yoav.ws>
- Message-ID: <CADLGQyBddMX_D6rtfvjJdJ4jqF4oAhx1VDR-FMCaz-85i-b2Tw@mail.gmail.com>
Quick update here: I chatted with Adam and he suggested a prototype
implementation in WebKit to figure out which issues are actually likely to
present problems (rather than speculating). Makes sense to me. I'm happy to
dive into this but I don't have time to get to it for a week or two.
Alternatively, if anyone else on the list wants to try a prototype
implementation to get started, I'd be happy to pitch in as I find free
cycles.

Thanks,
Bryan

On Tue, Feb 12, 2013 at 8:07 PM, Hill, Brad <bhill@paypal-inc.com> wrote:

> This sounds good – but the point that Mountie raised about UTF-8’s not
> being suitable or common for some East Asian languages is important.
>
> My main concern in suggesting a UTF-8-only requirement was to avoid any
> issues (security, performance, etc.) around the content-encoding sniffing
> and re-parsing rules. Perhaps this could be adequately addressed by just
> requiring an explicit charset in the Content-Type HTTP header or (slightly
> weaker against injections) as a <meta> in the <head>.
>
> -Brad
>
> From: Bryan McQuade [mailto:bmcquade@google.com]
> Sent: Tuesday, February 12, 2013 4:56 PM
> To: Hill, Brad
> Cc: Ian Melven; Jacob Hoffman-Andrews; Eric Chen; Nicholas Green;
> public-webappsec@w3.org; Yoav Weiss
> Subject: Re: CSP script hashes
>
> Thanks Brad. I agree very much with your summary and your points,
> especially being aware of not designing something that is brittle going
> forward with respect to computing hashes in the client. My intent is for
> us to come up with a basic proposal, then speak with browser implementors
> to get feedback on the feasibility of implementing that proposal.
>
> I do not think we can realistically expect each UA to be able to compute
> hashes of inline script blocks in the document, with the document in its
> original encoding. The tokenization, tree construction, etc. subsystems
> almost certainly all expect the document to have been converted to a
> single well-known character encoding (likely UTF-8 or UTF-16/UCS-2).
>
> I like your suggestion to restrict analysis to UTF-8, but perhaps instead
> of requiring the document to be UTF-8 encoded when served from the origin,
> we instead require that the process for computing hashes of inline blocks
> on the server, as part of the process of constructing the contents of the
> CSP header, goes something like this:
>
> 1. identify the allowed inline blocks in the document
> 2. convert each inline block's contents to UTF-8
> 3. compute the hash of the UTF-8 encoded block
> 4. serve the original response in its native encoding, whatever the
>    content author chose, but send the content hashes of the UTF-8 encoded
>    blocks
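(To make the four quoted steps concrete: a rough sketch of the server-side
process in Python. The choice of SHA-256, base64 digests, the regex-based
block extraction, and the header syntax in the final comment are all
assumptions for illustration, not anything settled in this thread.)

```python
# Rough sketch of the four-step server-side process quoted above.
# Assumptions (not settled in the thread): SHA-256 as the hash,
# base64-encoded digests, and the header syntax in the final comment.
import base64
import hashlib
import re

def inline_script_hashes(html_bytes, source_charset):
    # 1. Identify the allowed inline script blocks in the document.
    #    (A real implementation would use an HTML parser, not a regex.)
    blocks = re.findall(r'<script[^>]*>(.*?)</script>',
                        html_bytes.decode(source_charset),
                        flags=re.IGNORECASE | re.DOTALL)
    hashes = []
    for block in blocks:
        # 2. Convert each block's contents to UTF-8, regardless of the
        #    encoding the document is served in.
        utf8_bytes = block.encode('utf-8')
        # 3. Compute the hash of the UTF-8-encoded block.
        digest = hashlib.sha256(utf8_bytes).digest()
        hashes.append(base64.b64encode(digest).decode('ascii'))
    # 4. The caller serves the original response in its native encoding,
    #    but sends these hashes of the UTF-8-encoded blocks, e.g.:
    #    Content-Security-Policy: script-src 'sha256-<digest>' ...
    return hashes
```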
> I believe we can expect the client to be capable of converting to UTF-8
> and computing the hashes in the same way.
>
> This does violate the priority of constituencies as you note (we are
> putting implementors before authors here, adding complexity to the work
> authors have to do to generate these hashes), but I think it is the right
> tradeoff given the constraints of this specific problem. For authors
> using UTF-8, no additional work is required.
>
> My biggest open concern with this approach is in verifying that there is
> a single canonical way to convert any given character stream into a UTF-8
> byte stream. If there is more than one way to encode a given character
> stream into UTF-8 and there is not a clear canonical encoding, then this
> is clearly problematic. I'm going to speak with some encoding experts
> about this, but if anyone on the list happens to know, that'd save me
> some time. This page suggests that there is one canonical way to
> represent any given character stream as a UTF-8 byte stream, which is
> promising:
> http://stackoverflow.com/questions/4166094/can-i-get-a-single-canonical-utf-8-string-from-a-unicode-string
>
> What do you think of this potential approach? I believe it does not
> introduce brittleness in user agent implementations, as it should be very
> reasonable to expect each UA to be capable of converting the contents of
> script blocks to UTF-8. This conversion would only be necessary if the
> CSP header includes one or more hashes for inline scripts/styles.
>
> On Tue, Feb 12, 2013 at 3:20 PM, Hill, Brad <bhill@paypal-inc.com> wrote:
>
> > what is the rationale for preventing this beyond difficulty of
> > implementation?
>
> [Hill, Brad] I'm always the first one to invoke the priority of
> constituencies, but I think there's a real sense in which difficulty of
> implementation is the only interesting problem here, and directly related
> to the use-case goals of the feature.
>
> How do we create a canonical set of bytes to represent script content
> inline in an HTML document that is unambiguous and yet not brittle across
> multiple implementations and (importantly) future implementations?
>
> We're taking dependencies on a core and complex part of HTML here. We
> should expect HTML to continue to evolve, and for the pressures on it to
> be stronger than any back-pressure we can put on it on behalf of
> script-hash.
>
> If we design something that is brittle, constrictive or otherwise
> problematic in the face of the evolution of core document parsing, we
> should expect script-nonce will fail and get left behind.
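(A footnote on the canonical-encoding question raised above: a fixed
sequence of Unicode code points has exactly one well-formed UTF-8 byte
sequence, so the remaining ambiguity is upstream, in whether server and
client decode the served bytes to the same code points in the first place,
which is what an explicit charset declaration would pin down. A minimal
sketch of the client-side check the proposal implies, with hypothetical
names and the same SHA-256/base64 assumptions as the server-side sketch:)

```python
# Hypothetical sketch of the matching client-side check: the UA, having
# already decoded the document to Unicode for parsing, re-encodes each
# inline script block as UTF-8 and compares its hash against the policy.
import base64
import hashlib

def block_allowed(block_text, allowed_hashes):
    """block_text: decoded (Unicode) contents of an inline script block.
    allowed_hashes: base64 SHA-256 digests carried in the CSP header."""
    digest = hashlib.sha256(block_text.encode('utf-8')).digest()
    return base64.b64encode(digest).decode('ascii') in allowed_hashes
```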
Received on Saturday, 16 February 2013 15:51:18 UTC