- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Thu, 06 Sep 2012 17:39:55 -0400
- To: public-webappsec@w3.org
Dear all,

I was just reading through the CSP draft, and I'm very concerned by the handling of non-ASCII characters in CSP. Specifically, I'm concerned about four things:

A) Lack of description for how one goes from an IRI or partial IRI to a host-source expression.
B) Lack of description for how one compares a source expression to an IRI.
C) Lack of description for how one goes from a Unicode string to a policy.
D) The fact that the current setup is likely to cause interop problems.

As far as I can tell, the current setup is as follows:

1) All CSP policies are made up of bytes in the ASCII range (and in particular, a subset of that range). Non-ASCII hostnames are expected to be encoded as punycode, I guess (though this is not actually stated anywhere; see concern A above). Non-ASCII characters in paths are presumably expected to be %-encoded, but the specification doesn't say which character encoding should be used for this (concern A again). In practice, by the way, at least one implementation allows non-ASCII bytes in paths, though I think the spec is pretty clear that as things stand this is not allowed.

2) When comparing a source expression to an IRI, the IRI needs to first be converted to a URI, presumably per RFC 3987. If the presumption is correct, this should probably be explicitly called out (concern B above).

3) When converting a Unicode string to a policy, presumably one does it by taking the numeric value of each codepoint and treating it as an ASCII character index? If so, this should be explicitly called out (concern C above). In practice, I expect people to just call their favorite escape() method on their strings if they have to shoehorn them into an ASCII format, which means that we'll get a mix of %-encoding based on ISO-8859-1 and on UTF-8 at the very least, and very possibly on other encodings too. The result will be lack of interop (concern D).
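As a rough illustration of concerns A, B and D, here is a minimal Python sketch. It is not taken from the CSP draft: the function names `percent_encode_path` and `iri_to_uri` and the `example` hostnames are hypothetical, and the IRI-to-URI mapping only approximates RFC 3987 (host via Python's built-in IDNA 2003 codec, path as %-encoded UTF-8, userinfo and query ignored). It shows that the same path yields two different ASCII policies depending on which encoding the author's escape routine happens to use.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(path: str, encoding: str) -> str:
    """Percent-encode a path after converting it to bytes with `encoding`."""
    return quote(path, safe="/", encoding=encoding)


def iri_to_uri(iri: str) -> str:
    """Approximate IRI-to-URI mapping (assumption: IDNA host, UTF-8 path).

    Hypothetical helper for illustration only; it ignores userinfo,
    query strings and fragments.
    """
    parts = urlsplit(iri)
    # Non-ASCII host labels become punycode ("xn--...") labels.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Keep existing %-escapes intact; encode everything else as UTF-8.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    return urlunsplit((parts.scheme, netloc, path, "", ""))


# The same Unicode path, shoehorned into ASCII in two different ways:
print(percent_encode_path("/caf\u00e9", "latin-1"))  # /caf%E9
print(percent_encode_path("/caf\u00e9", "utf-8"))    # /caf%C3%A9

# One possible IRI-to-URI conversion before comparing against a policy:
print(iri_to_uri("http://b\u00fccher.example/caf\u00e9"))
# http://xn--bcher-kva.example/caf%C3%A9
```

A policy written with the ISO-8859-1 form of the path would never match a URI produced by the UTF-8 conversion, even though both were derived from the same string.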
It seems to me that a lot of these problems would be alleviated if CSP policies were defined as sequences of Unicode codepoints, together with a function for comparing them to IRIs. The spec would also need to define how to construct such a sequence of Unicode codepoints from a Content-Security-Policy HTTP header or a Content-Security-Policy-Report-Only HTTP header, but the result would be to allow authors to use strings that actually make sense to them in CSP policies instead of shoehorning them into an ASCII-only format in likely-broken ways.

Thank you for taking the time to read all that,
Boris
Received on Thursday, 6 September 2012 21:40:24 UTC