Re: CfC: Transition CSP2 to CR. from Bjoern Hoehrmann on 2015-02-09 (public-webappsec@w3.org from February 2015)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 09 Feb 2015 16:49:24 +0100
To: Brian Smith <brian@briansmith.org>
Cc: Mike West <mkwst@google.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>, Brad Hill <hillbrad@gmail.com>, Dan Veditz <dveditz@mozilla.com>, Wendy Seltzer <wseltzer@w3.org>
Message-ID: <q3lhda564k6gu378ijc209s6feile1ct9v@hive.bjoern.hoehrmann.de>

* Brian Smith wrote:
>Mike West <mkwst@google.com> wrote:
>>> 2. As I mentioned previously, I think it is really unfortunate that
>>> CSP2 isn't properly Unicode-enabled. I know that nobody is
>>> intentionally trying to discriminate against any group of people, but
>>> IMO this incidental discrimination shouldn't be accepted either. I
>>> think this issue deserves the same level of consideration as
>>> accessibility for people with visual impairments. (Note I'm not trying
>>> to diminish the importance of accessibility work.)
>>
>> To be sure I understand what needs to be done here, you'd like us to:
>>
>> * Remove the recommendation to use punycode (what should we do with
>> punycode? should it match its unicode equiv?)
>
>In the ASCII encoding of an internationalized URL, two different
>encoding mechanisms are used: punycode for domain labels, and
>URL-escaped UTF-8 (IIRC) for everything else. So, it isn't just an
>issue with punycode.
>
>Yes, a URL should be considered equal to its ASCII-ified (IRI-to-URI)
>equivalent. So, for example,
>
>> * Allow unicode characters as part of the grammar
>
>> * Recommend that folks %-encode unicode characters when delivered as an HTTP
>> header
>
>Not just %-encoded, but convert the IRI to a URI. In particular,
>punycode should be used for the domain labels in the authority, and
>the path and query string should be converted to UTF-8 and then
>normalized and URL-encoded.

I am intimately familiar with the relevant standards here, but I don't
really understand your comment. Could you take a step back and describe
the problems you see? Some things to note:

  * HTTP generally does not use "non-ASCII octets" in headers
  * host names in URIs can use UTF-8+%xx-encoding
  * CSP uses bare host names in some protocol elements
  * urlencode(normalize(utf8encode(...))) is most probably wrong,
    whatever that is trying to do.
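
To make the notes above concrete, here is a minimal Python sketch of the
two ASCII spellings of a non-ASCII host name, plus UTF-8+%xx for a path.
It is an illustration only, not what CSP2 specifies: the function names,
the host "bücher.example" and the path are made up, and Python's built-in
"idna" codec (IDNA 2003) is assumed as a stand-in for whatever host-name
processing a real implementation would use.

```python
from urllib.parse import quote


def iri_parts_to_ascii(host, path):
    """Hypothetical helper: ASCII forms of an IRI's host and path."""
    # Host labels: Punycode ("xn--...") via the built-in IDNA 2003 codec.
    ascii_host = host.encode("idna").decode("ascii")
    # Path: encode as UTF-8, then write each non-ASCII octet as %XX.
    # "/" is kept literal so the segment structure survives.
    ascii_path = quote(path, safe="/")
    return ascii_host, ascii_path


def host_as_percent_encoded_utf8(host):
    """Hypothetical helper: the UTF-8+%xx spelling of a host name."""
    # "." is unreserved and is left as is; non-ASCII octets become %XX.
    return quote(host, safe=".")


if __name__ == "__main__":
    print(iri_parts_to_ascii("bücher.example", "/käse"))
    print(host_as_percent_encoded_utf8("bücher.example"))
```

Both spellings are pure ASCII, so either could travel in an HTTP header;
which of them a bare host name in a CSP source expression should match is
the open question.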
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
 Available for hire in Berlin (early 2015)  · http://www.websitedev.de/

Received on Monday, 9 February 2015 15:50:06 UTC