Re: [CSP] URI/IRI normalization and comparison

Mike West <mkwst@google.com> wrote:
> https://github.com/w3c/webappsec/commit/ae22342195ef00120c2a0d1ec1edb47b03bc5681.
> WDYT?

I recommend that Anne look at this, since he's more familiar with the
WHATWG URL spec.

>> 2. Don't require double-escaping. Double-escaping is required in order
>> to allow paths to include "," and ";", but it causes unintuitive
>> behavior for many other situations (any path that contains '%'). I
>> suggest for CSP2 that you simply don't allow paths to contain "," and
>> ";". In a future version, we can define a new escaping syntax that
>> would allow paths to contain those two characters, e.g.
>> "urlencoded:<url>".
>
> Hrm. Given the limited number of source expressions that we'd expect to
> contain either of those characters, it's not clear that this is actually a
> better thing to confuse developers about than the encoding related to URLs
> containing '%'.

I agree that URLs containing "," and ";" are so rare that we shouldn't
optimize for them. That's why I suggest removing the
double-URL-encoding requirement: we're only double-URL-encoding in
order to support "," and ";".
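For context on why those two characters are special: in a CSP header,
";" separates directives within a policy and "," separates multiple
policies in one header value. A heavily simplified splitter (the helper
split_policies is hypothetical, not the spec's parsing algorithm) shows
what happens to a raw ";" or "," inside a path:

```python
def split_policies(header_value):
    """Simplified sketch: split a CSP header value into policies on ','
    and each policy into directives on ';'. Not the spec's full algorithm."""
    policies = []
    for policy in header_value.split(","):
        directives = [d.strip() for d in policy.split(";") if d.strip()]
        policies.append(directives)
    return policies


# A raw ';' in a path ends the directive early:
print(split_policies("img-src https://example.com/a;b"))
# [['img-src https://example.com/a', 'b']]

# A raw ',' in a path starts a new policy:
print(split_policies("img-src https://example.com/x,y"))
# [['img-src https://example.com/x'], ['y']]
```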

I think it would be a very unfortunate usability issue for CSP source
expressions to be double-URL-encoded when URLs everywhere else in HTML
are single-URL-encoded. This is especially problematic for non-ASCII
URLs: without UTF-8 URL support (see below), we require all of them to
be written as percent-encoded UTF-8, so they will all contain lots of
"%" characters, each of which must itself be escaped.
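To make the cost concrete, here is a hypothetical non-ASCII path run
through Python's urllib.parse.quote once (the form used in an ordinary
URL) and then again (the form a double-escaped source expression would
need). This only illustrates the escaping arithmetic; it is not CSP's
matching algorithm.

```python
from urllib.parse import quote

path = "/café/menu"  # hypothetical example path

single = quote(path)    # UTF-8 percent-encoding, as in an ordinary URL
double = quote(single)  # escaping again turns every '%' into '%25'

print(single)  # /caf%C3%A9/menu
print(double)  # /caf%25C3%25A9/menu
```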

>> 3. Allow IRIs (unescaped unicode characters), but recommend (not
>> require) that non-ASCII characters be escaped when the policy appears
>> in an HTTP header.
>
> I'd like to defer this to CSP3.

Note that the WHATWG URL spec already supports UTF-8-encoded URLs
(IRIs). So, AFAICT, it is actually more work to remove support for
natively encoded IRIs from CSP than it is to support them.
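One way to see why supporting IRIs is cheap: if a source-expression
path written with native characters is mapped to the same
percent-encoded UTF-8 form that a URL parser produces for the request
URL, the two can be compared directly. The sketch below approximates
that mapping with urllib.parse.quote, leaving existing "%" escapes and
common path punctuation untouched. to_ascii_path and paths_match are
hypothetical helpers, and the safe set is an assumption, not the WHATWG
path percent-encode set.

```python
from urllib.parse import quote

# Characters left as-is. '%' is kept so existing escapes survive;
# non-ASCII characters are UTF-8 percent-encoded.
# (Assumption: an approximation, not the WHATWG path percent-encode set.)
_SAFE = "/%:@!$&'()*+,;="


def to_ascii_path(path):
    """Map a path that may contain non-ASCII characters to percent-encoded UTF-8."""
    return quote(path, safe=_SAFE)


def paths_match(source_path, request_path):
    """Compare two paths after mapping both to the same ASCII form."""
    return to_ascii_path(source_path) == to_ascii_path(request_path)


print(to_ascii_path("/café/menu"))                   # /caf%C3%A9/menu
print(paths_match("/café/menu", "/caf%C3%A9/menu"))  # True
```

A real implementation would also have to decide how to treat
differences the sketch ignores, such as lowercase hex digits
(%c3%a9).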

Ethically, we shouldn't unnecessarily restrict people's use of their
native languages, especially in security specifications. Users of
non-English/non-Latin-based languages are already at a tremendous
disadvantage because all the specifications for these standards are in
English. We shouldn't compound that by deferring the fix for this issue
to a later date, just as we wouldn't defer other accessibility issues,
such as support for users with impaired vision.

Also, don't some implementations already support IRIs in CSP? Anne
seemed to suggest so, and I think my tests showed that IRIs are
supported in Chrome and Firefox in at least some situations.

> For the moment, the spec requires entry of
> internationalized domain names as Punycode; that seems like a good baseline
> of support that we can build upon.

Punycode is terrible, and it applies only to hostnames, not to paths.
The ASCII syntax for non-ASCII paths is URL-encoded UTF-8, which is
unreadable to everybody (i.e. terrible), and it is even worse as CSP
is currently specified, because of the double-escaping issue.
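For comparison, here are the two ASCII forms side by side, using
Python's built-in "idna" codec (which implements IDNA 2003) for a
hypothetical hostname and urllib.parse.quote for a hypothetical path:

```python
from urllib.parse import quote

host = "bücher.example"  # hypothetical internationalized hostname
path = "/日本語/"          # hypothetical non-ASCII path

# Hostnames: Punycode via the IDNA codec (Python's codec is IDNA 2003).
ascii_host = host.encode("idna").decode("ascii")
# Paths: Punycode doesn't apply; the ASCII form is percent-encoded UTF-8.
ascii_path = quote(path)

print(ascii_host)  # xn--bcher-kva.example
print(ascii_path)  # /%E6%97%A5%E6%9C%AC%E8%AA%9E/
```

Under the current double-escaping rule, every "%" in that path would
additionally have to be written as "%25".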

Note that if you allow UTF-8 paths and hostnames in CSP source
expressions, there will be much less need for the "%" character in
source expressions, so the double-escaping issue would matter less.

Cheers,
Brian

Received on Monday, 19 January 2015 05:22:10 UTC