- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 09 Feb 2015 16:49:24 +0100
- To: Brian Smith <brian@briansmith.org>
- Cc: Mike West <mkwst@google.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Brad Hill <hillbrad@gmail.com>, Dan Veditz <dveditz@mozilla.com>, Wendy Seltzer <wseltzer@w3.org>
* Brian Smith wrote:
>Mike West <mkwst@google.com> wrote:
>>> 2. As I mentioned previously, I think it is really unfortunate that
>>> CSP2 isn't properly Unicode-enabled. I know that nobody is
>>> intentionally trying to discriminate against any group of people, but
>>> IMO this incidental discrimination shouldn't be accepted either. I
>>> think this issue deserves the same level of consideration as
>>> accessibility for people with visual impairments. (Note I'm not trying
>>> to diminish the importance of accessibility work.)
>>
>> To be sure I understand what needs to be done here, you'd like us to:
>>
>> * Remove the recommendation to use punycode (what should we do with
>> punycode? should it match its unicode equiv?)
>
>In the ASCII encoding of an internationalized URL, two different
>encoding mechanisms are used: punycode for domain labels, and
>URL-escaped UTF-8 (IIRC) for everything else. So, it isn't just an
>issue with punycode.
>
>Yes, a URL should be considered equal to its ASCII-ified (IRI-to-URI)
>equivalent. So, for example,
>
>> * Allow unicode characters as part of the grammar
>
>> * Recommend that folks %-encode unicode characters when delivered as an HTTP
>> header
>
>Not just %-encoded, but convert the IRI to a URI. In particular,
>punycode should be used for the domain labels in the authority, and
>the path and query string should be converted to UTF-8 and then
>normalized and URL-encoded.

I am intimately familiar with the relevant standards here, but I don't
really understand your comment. Could you take a step back and describe
the problems you see? Some things to note:

  * HTTP generally does not use "non-ASCII octets" in headers
  * host names in URIs can use UTF-8+%xx-encoding
  * CSP uses bare host names in some protocol elements
  * urlencode(normalize(utf8encode(...))) is most probably wrong,
    whatever that is trying to do.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
Available for hire in Berlin (early 2015) · http://www.websitedev.de/
Received on Monday, 9 February 2015 15:50:06 UTC
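
For readers following the thread, here is a minimal Python sketch of the two encodings under discussion: Punycode (A-labels) for the host, and percent-encoded UTF-8 for the path and query. It is an illustration only, not text from the CSP specification or from either author. The function names `iri_to_uri` and `host_percent_encoded` are hypothetical, userinfo is ignored, and Python's built-in "idna" codec implements IDNA 2003 rather than IDNA2008 or UTS #46, so results for some labels may differ from what browsers produce. Any Unicode normalization is assumed to happen on the character string before UTF-8 encoding; the sketch does not normalize at all.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when escaping; "%" is kept so that
# existing escapes are not double-encoded (an assumption of this sketch).
_SAFE_PATH = "/%:@!$&'()*+,;=~-._"
_SAFE_QUERY = _SAFE_PATH + "?"


def iri_to_uri(iri: str) -> str:
    """Sketch of an IRI-to-URI mapping (hypothetical helper, no userinfo)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Domain labels: Punycode via the stdlib "idna" codec (IDNA 2003).
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Everything else: UTF-8 octets, then %xx escapes.
    path = quote(parts.path, safe=_SAFE_PATH, encoding="utf-8")
    query = quote(parts.query, safe=_SAFE_QUERY, encoding="utf-8")
    fragment = quote(parts.fragment, safe=_SAFE_QUERY, encoding="utf-8")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def host_percent_encoded(host: str) -> str:
    """The other host form mentioned above: UTF-8 plus %xx-encoding."""
    return quote(host, safe=".-", encoding="utf-8")


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/straße?q=ü"))
    print(host_percent_encoded("bücher.example"))
```

The two host forms are both plain ASCII but are different strings, which is why matching a bare host name, as CSP does in some protocol elements, needs an agreed canonical form before comparison.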