Re: CSP script hashes

Thanks Brad. I agree very much with your summary and your points,
especially the point about not designing something that is brittle going
forward with respect to computing hashes in the client. My intent is for
us to come up with a basic proposal, then speak with browser implementors
to get feedback on the feasibility of implementing it.

I do not think we can realistically expect each UA to be able to compute
hashes of inline script blocks with the document in its original encoding.
The tokenization, tree-construction, and related subsystems almost
certainly expect the document to have already been converted to a single
well-known character encoding (likely UTF-8 or UTF-16/UCS-2).

I like your suggestion to restrict analysis to UTF-8, but perhaps instead
of requiring the document to be UTF-8 encoded when served from the origin,
we instead require that the server-side process for computing the hashes
of inline blocks, as part of constructing the contents of the CSP header,
go something like this (a rough code sketch follows the list):
1. identify the allowed inline blocks in the document
2. convert each inline block's contents to UTF-8
3. compute the hash of the UTF-8 encoded block
4. serve the original response in its native encoding, whatever the content
author chose, but send the content hashes of the UTF-8 encoded blocks
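
To make this concrete, here is a rough sketch (in Python) of what the
server side might do. The choice of SHA-256 and the 'sha256-...' token
syntax are assumptions on my part; we have not settled on digest
algorithms or header grammar yet:

    import base64
    import hashlib

    def hash_inline_block(block_text):
        # Step 2: convert the block's characters to UTF-8, regardless
        # of the encoding the document itself will be served in.
        utf8_bytes = block_text.encode('utf-8')
        # Step 3: hash the UTF-8 bytes (SHA-256 assumed here).
        digest = hashlib.sha256(utf8_bytes).digest()
        return base64.b64encode(digest).decode('ascii')

    # Steps 1 and 4: given the allowed inline blocks, serve the document
    # in its native encoding but advertise hashes of the UTF-8 bytes.
    # The directive/token syntax below is purely illustrative.
    blocks = [u"alert('hello');"]
    tokens = " ".join("'sha256-%s'" % hash_inline_block(b) for b in blocks)
    header_value = "script-src %s" % tokens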

I believe we can expect the client to be capable of converting to UTF-8 and
computing the hashes in the same way.

This does violate the priority of constituencies, as you note (we are
putting implementors before authors here, adding complexity to the work
authors must do to generate these hashes), but I think it is the right
tradeoff given the constraints of this specific problem. For authors
already serving UTF-8, no additional work is required.

My biggest open concern with this approach is verifying that there is a
single canonical way to convert any given character stream into a UTF-8
byte stream. If there is more than one way to encode a given character
stream as UTF-8 and no clear canonical choice among them, then this
approach is clearly problematic. I'm going to speak with some encoding
experts about this, but if anyone on the list happens to know, that would
save me some time. This page suggests that there is one canonical way to
represent any given character stream as a UTF-8 byte stream, which is
promising:
http://stackoverflow.com/questions/4166094/can-i-get-a-single-canonical-utf-8-string-from-a-unicode-string
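
For what it's worth, my current understanding (which I will still verify
with the encoding experts) is that UTF-8 itself is unambiguous: each code
point maps to exactly one byte sequence, and RFC 3629 forbids decoders
from accepting the overlong alternatives. A quick Python illustration of
that property:

    # U+00E9 has exactly one valid UTF-8 byte sequence.
    assert u'\u00e9'.encode('utf-8') == b'\xc3\xa9'

    # The overlong two-byte form of '/' (0xC0 0xAF) is not legal UTF-8;
    # a conforming decoder must reject it rather than accept it as an
    # alternate spelling of U+002F.
    try:
        b'\xc0\xaf'.decode('utf-8')
    except UnicodeDecodeError:
        pass  # rejected, as RFC 3629 requires

The separate question of Unicode normalization (NFC vs. NFD spellings of
the same visible text) should not matter here, so long as the server and
the client both hash the decoded code-point stream as-is, without
normalizing it.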

What do you think of this potential approach? I believe it does not
introduce brittleness into user-agent implementations, since it should be
very reasonable to expect each UA to be capable of converting the contents
of script blocks to UTF-8. The conversion would only be necessary when the
CSP header includes one or more hashes for inline scripts/styles.


On Tue, Feb 12, 2013 at 3:20 PM, Hill, Brad <bhill@paypal-inc.com> wrote:

> > what is the rationale for preventing this beyond difficulty of
> implementation?
>
> [Hill, Brad] I'm always the first one to invoke the priority of
> constituencies, but I think there's a real sense in which difficulty of
> implementation is the only interesting problem here, and directly related
> to the use-case goals of the feature.
>
> How do we create a canonical set of bytes to represent script content
> inline in an HTML document that is unambiguous and yet not brittle across
> multiple implementations and (importantly) future implementations?
>
> We're taking dependencies on a core and complex part of HTML here.   We
> should expect HTML to continue to evolve, and for the pressures on it to be
> stronger than any back-pressure we can put on it on behalf of script-hash.
>
> If we design something that is brittle, constrictive or otherwise
> problematic in the face of the evolution of core document parsing, we
> should expect script-nonce will fail and get left behind.
>

Received on Wednesday, 13 February 2013 00:56:35 UTC