- From: Bryan McQuade <bmcquade@google.com>
- Date: Sat, 16 Feb 2013 10:50:50 -0500
- To: "Hill, Brad" <bhill@paypal-inc.com>
- Cc: Ian Melven <imelven@mozilla.com>, Jacob Hoffman-Andrews <jsha@twitter.com>, Eric Chen <eric.chen@sv.cmu.edu>, Nicholas Green <ngreen@twitter.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Yoav Weiss <yoav@yoav.ws>
- Message-ID: <CADLGQyBddMX_D6rtfvjJdJ4jqF4oAhx1VDR-FMCaz-85i-b2Tw@mail.gmail.com>
Quick update here: I chatted with Adam and he suggested a prototype
implementation in WebKit to figure out which issues are actually likely to
present problems (rather than speculating). Makes sense to me. I'm happy to
dive into this but I don't have time to get to it for a week or two.
Alternatively, if anyone else on the list wants to try a prototype
implementation to get started, I'd be happy to pitch in as I find free
cycles.

Thanks,
Bryan

On Tue, Feb 12, 2013 at 8:07 PM, Hill, Brad <bhill@paypal-inc.com> wrote:

> This sounds good – but the point that Mountie raised about UTF-8’s not
> being suitable or common for some East Asian languages is important.
>
> My main concern in suggesting a UTF-8-only requirement was to avoid any
> issues (security, performance, etc.) around the content-encoding sniffing
> and re-parsing rules. Perhaps this could be adequately addressed by just
> requiring an explicit charset in the Content-Type HTTP header or (slightly
> weaker against injections) as a <meta> in the <head>.
>
> -Brad
>
> From: Bryan McQuade [mailto:bmcquade@google.com]
> Sent: Tuesday, February 12, 2013 4:56 PM
> To: Hill, Brad
> Cc: Ian Melven; Jacob Hoffman-Andrews; Eric Chen; Nicholas Green;
> public-webappsec@w3.org; Yoav Weiss
> Subject: Re: CSP script hashes
>
> Thanks Brad. I agree very much with your summary and your points,
> especially being aware of not designing something that is brittle going
> forward with respect to computing hashes in the client. My intent is for
> us to come up with a basic proposal, then speak with browser implementors
> to get feedback on the feasibility of implementing that proposal.
>
> I do not think we can realistically expect each UA to be able to compute
> hashes of inline script blocks in the document, with the document in its
> original encoding. The tokenization, tree construction, etc. subsystems
> almost certainly all expect the document to have been converted to a
> single well-known character encoding (likely UTF-8 or UTF-16/UCS-2).
>
> I like your suggestion to restrict analysis to UTF-8, but perhaps instead
> of requiring the document to be UTF-8 encoded when served from the origin,
> we instead require that the process for computing hashes of inline blocks
> on the server, as part of the process of constructing the contents of the
> CSP header, goes something like this:
>
> 1. identify the allowed inline blocks in the document
> 2. convert each inline block's contents to UTF-8
> 3. compute the hash of the UTF-8 encoded block
> 4. serve the original response in its native encoding, whatever the
>    content author chose, but send the content hashes of the UTF-8 encoded
>    blocks
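(To make the four quoted steps concrete: a rough sketch of the server-side
process in Python. The choice of SHA-256, base64 digests, the regex-based
block extraction, and the header syntax in the final comment are all
assumptions for illustration, not anything settled in this thread.)

```python
# Rough sketch of the four-step server-side process quoted above.
# Assumptions (not settled in the thread): SHA-256 as the hash,
# base64-encoded digests, and the header syntax in the final comment.
import base64
import hashlib
import re

def inline_script_hashes(html_bytes, source_charset):
    # 1. Identify the allowed inline script blocks in the document.
    #    (A real implementation would use an HTML parser, not a regex.)
    blocks = re.findall(r'<script[^>]*>(.*?)</script>',
                        html_bytes.decode(source_charset),
                        flags=re.IGNORECASE | re.DOTALL)
    hashes = []
    for block in blocks:
        # 2. Convert each block's contents to UTF-8, regardless of the
        #    encoding the document is served in.
        utf8_bytes = block.encode('utf-8')
        # 3. Compute the hash of the UTF-8-encoded block.
        digest = hashlib.sha256(utf8_bytes).digest()
        hashes.append(base64.b64encode(digest).decode('ascii'))
    # 4. The caller serves the original response in its native encoding,
    #    but sends these hashes of the UTF-8-encoded blocks, e.g.:
    #    Content-Security-Policy: script-src 'sha256-<digest>' ...
    return hashes
```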
> I believe we can expect the client to be capable of converting to UTF-8
> and computing the hashes in the same way.
>
> This does violate the priority of constituencies as you note (we are
> putting implementors before authors here, adding complexity to the work
> authors have to do to generate these hashes), but I think it is the right
> tradeoff given the constraints of this specific problem. For authors
> using UTF-8, no additional work is required.
>
> My biggest open concern with this approach is in verifying that there is
> a single canonical way to convert any given character stream into a UTF-8
> byte stream. If there is more than one way to encode a given character
> stream into UTF-8 and there is not a clear canonical encoding, then this
> is clearly problematic. I'm going to speak with some encoding experts
> about this, but if anyone on the list happens to know, that'd save me
> some time. This page suggests that there is one canonical way to
> represent any given character stream as a UTF-8 byte stream, which is
> promising:
> http://stackoverflow.com/questions/4166094/can-i-get-a-single-canonical-utf-8-string-from-a-unicode-string
>
> What do you think of this potential approach? I believe it does not
> introduce brittleness in user agent implementations, as it should be very
> reasonable to expect each UA to be capable of converting the contents of
> script blocks to UTF-8. This conversion would only be necessary if the
> CSP header includes one or more hashes for inline scripts/styles.
>
> On Tue, Feb 12, 2013 at 3:20 PM, Hill, Brad <bhill@paypal-inc.com> wrote:
>
> > what is the rationale for preventing this beyond difficulty of
> > implementation?
>
> [Hill, Brad] I'm always the first one to invoke the priority of
> constituencies, but I think there's a real sense in which difficulty of
> implementation is the only interesting problem here, and directly related
> to the use-case goals of the feature.
>
> How do we create a canonical set of bytes to represent script content
> inline in an HTML document that is unambiguous and yet not brittle across
> multiple implementations and (importantly) future implementations?
>
> We're taking dependencies on a core and complex part of HTML here. We
> should expect HTML to continue to evolve, and for the pressures on it to
> be stronger than any back-pressure we can put on it on behalf of
> script-hash.
>
> If we design something that is brittle, constrictive or otherwise
> problematic in the face of the evolution of core document parsing, we
> should expect script-nonce will fail and get left behind.
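(A footnote on the canonical-encoding question raised above: a fixed
sequence of Unicode code points has exactly one well-formed UTF-8 byte
sequence, so the remaining ambiguity is upstream, in whether server and
client decode the served bytes to the same code points in the first place,
which is what an explicit charset declaration would pin down. A minimal
sketch of the client-side check the proposal implies, with hypothetical
names and the same SHA-256/base64 assumptions as the server-side sketch:)

```python
# Hypothetical sketch of the matching client-side check: the UA, having
# already decoded the document to Unicode for parsing, re-encodes each
# inline script block as UTF-8 and compares its hash against the policy.
import base64
import hashlib

def block_allowed(block_text, allowed_hashes):
    """block_text: decoded (Unicode) contents of an inline script block.
    allowed_hashes: base64 SHA-256 digests carried in the CSP header."""
    digest = hashlib.sha256(block_text.encode('utf-8')).digest()
    return base64.b64encode(digest).decode('ascii') in allowed_hashes
```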
Received on Saturday, 16 February 2013 15:51:18 UTC