Re: [CSP] URI/IRI normalization and comparison from Anne van Kesteren on 2015-01-20 (public-webappsec@w3.org from January 2015)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Tue, 20 Jan 2015 13:37:31 +0100
To: Mike West <mkwst@google.com>
Cc: Brian Smith <brian@briansmith.org>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CADnb78jigzcvn3KHr2yOodkpQL35+o=UhiS5o7GeQe9rfH3cyg@mail.gmail.com>

On Fri, Jan 16, 2015 at 11:06 AM, Mike West <mkwst@google.com> wrote:
> WDYT?

Thanks. Reviewing
https://w3c.github.io/webappsec/specs/content-security-policy/#match-a-source-expression
these parts appear to have issues:

* Step 2 talks about a URL's scheme type, while there is no such
primitive. You could talk about URL's origin perhaps, though note that
the origin of a blob URL is typically not a globally unique
identifier.

* Step 3.1, you want to clarify that this is an ASCII case-insensitive
comparison (see e.g. HTML for a definition). This applies elsewhere
too, I recommend searching through the document.

* Step 4.2, this should reference the definition of a URL's origin I
think (from the URL Standard). Also, a URL's origin won't include the
default port (the URL parser stores a port equal to the scheme's
default as null), so that note is wrong.

* Step 4.3, a URL's path is a list of segments, not a string. Also, a
URL's path (once a URL is parsed and has a relative scheme) always
serializes as at least "/", so that statement is superfluous.

* Step 4.5.1 talks about "protected resource’s URL", which is a
variable not introduced earlier in this algorithm. Is this actually
meant to refer to a part of the source expression? This applies
elsewhere too.

* Step 4.7, should this not perform IPv6 normalization similar to how
URL parsers do it? Or at least parse the IPv6 source expression input
in a way that it can be compared with the URL's IPv6 address without
syntax getting in the way? This might also apply to IPv4 to some
extent. (Given that you avoid Unicode it seems you are IDNA-safe here
and ASCII case-insensitive comparisons will work fine in that case.
The URL parser outputs ASCII domains by default.)

* Step 4.10 talks about the URL being the result of a redirect, but it
is unclear how that information can be obtained from a simple
comparison operation. It also has the same problem with the path
mentioned earlier: it is a list, not a string.
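A rough Python sketch of the comparison points above: ASCII case-insensitive matching, ports normalized the way the URL parser does, IP literals compared as parsed addresses, and paths handled as segment lists. It illustrates the review comments and is not the CSP algorithm. All function names are hypothetical. `DEFAULT_PORTS` lists the URL Standard's special-scheme defaults. The path rule is a simplified reading of CSP (exact match, or prefix match when the expression path ends in "/"), with no percent-decoding or dot-segment handling. The redirect status is assumed to be passed in explicitly by the caller, because a plain comparison cannot discover it. Python's `ipaddress` accepts only dotted-quad IPv4, so the looser IPv4 forms the URL parser accepts are not covered.

```python
import ipaddress
from typing import List, Optional

# Default ports of the URL Standard's special schemes ("file" has none).
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}

_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")


def ascii_lower(s: str) -> str:
    """Lowercase A-Z only; unlike str.lower(), non-ASCII is left alone."""
    return s.translate(_ASCII_LOWER)


def ascii_ci_equal(a: str, b: str) -> bool:
    """ASCII case-insensitive comparison."""
    return ascii_lower(a) == ascii_lower(b)


def normalize_port(scheme: str, port: Optional[int]) -> Optional[int]:
    """Mirror the URL parser: a port equal to the scheme default becomes None."""
    if port is not None and DEFAULT_PORTS.get(ascii_lower(scheme)) == port:
        return None
    return port


def _as_ip(host: str):
    """Parse an IP literal (IPv6 may be bracketed); None if not an IP."""
    try:
        return ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return None


def hosts_equal(expr_host: str, url_host: str) -> bool:
    """Compare IP literals as addresses, everything else ASCII case-insensitively."""
    a, b = _as_ip(expr_host), _as_ip(url_host)
    if a is not None and b is not None:
        return a == b
    return ascii_ci_equal(expr_host, url_host)


def path_segments(path: str) -> List[str]:
    """Turn a serialized path into a list of segments ("" -> [""] -> "/")."""
    if not path.startswith("/"):
        path = "/" + path
    return path[1:].split("/")


def path_matches(expr_path: str, url_path: str, after_redirect: bool) -> bool:
    """Simplified path rule; after_redirect is a hypothetical caller-supplied flag."""
    if after_redirect:
        return True  # path is ignored once a redirect has happened
    expr, url = path_segments(expr_path), path_segments(url_path)
    if expr[-1] == "":  # expression ends in "/": prefix match on segments
        prefix = expr[:-1]
        return url[:len(prefix)] == prefix
    return expr == url  # otherwise the segment lists must be identical
```

The Kelvin sign (U+212A) shows why the comparison should be ASCII-only: `"\u212a".lower()` is `"k"`, so a Unicode-aware lowercase would treat it as equal to an ASCII "k", while `ascii_ci_equal` does not.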

Was this the only section that changed?

>> 2. Don't require double-escaping. Double-escaping is required in order
>> to allow paths to include "," and ";", but it causes unintuitive
>> behavior for many other situations (any path that contains '%'). I
>> suggest for CSP2 that you simply don't allow paths to contain "," and
>> ";". In a future version, we can define a new escaping syntax that
>> would allow paths to contain those two characters, e.g.
>> "urlencoded:<url>".
>
> Hrm. Given the limited number of source expressions that we'd expect to
> contain either of those characters, it's not clear that this is actually a
> better thing to confuse developers about than the encoding related to URLs
> containing '%'.

What is the problem with allowing "," and ";" in paths, percent-encoded?
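To make that question concrete, here is a small Python sketch. In a policy, "," separates policies and ";" separates directives. If those characters are written percent-encoded (%2C, %3B) inside a path, they survive the split and come back after a single decode. A literal "%" is written "%25", so no double-escaping is needed. The helper names are hypothetical, the split is a simplification of real CSP parsing, and `urlsplit` stands in for a proper source-expression parser.

```python
from typing import List
from urllib.parse import unquote, urlsplit


def split_policies(header_value: str) -> List[List[List[str]]]:
    """Simplified split: "," separates policies, ";" separates directives,
    whitespace separates tokens within a directive."""
    policies = []
    for policy in header_value.split(","):
        directives = [d.split() for d in policy.split(";") if d.strip()]
        policies.append(directives)
    return policies


def decoded_source_path(source: str) -> str:
    """Percent-decode the path of a source expression exactly once."""
    return unquote(urlsplit(source).path)


policy = split_policies("script-src https://example.com/a%2Cb%3Bc; img-src 'self'")
print(len(policy), len(policy[0]))                          # 1 2
print(decoded_source_path(policy[0][0][1]))                 # /a,b;c
print(decoded_source_path("https://example.com/100%25"))    # /100%
```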

>> 3. Allow IRIs (unescaped unicode characters), but recommend (not
>> require) that non-ASCII characters be escaped when the policy appears
>> in an HTTP header.
>
> I'd like to defer this to CSP3. For the moment, the spec requires entry of
> internationalized domain names as Punycode; that seems like a good baseline
> of support that we can build upon. Filed
> https://github.com/w3c/webappsec/issues/145 to track it.

Are implementations rejecting policies that include bytes 0x80 and
over? What is the expected error handling?
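Detecting such bytes is simple. The open question is what to do afterwards: reject the policy, drop the affected source expression, or decode it. This hypothetical Python sketch therefore only reports where those bytes are. It also shows the Punycode form the spec currently requires for an internationalized domain name. For that it uses Python's built-in `idna` codec, which implements IDNA 2003 and is only a stand-in here.

```python
from typing import List


def non_ascii_offsets(header_value: bytes) -> List[int]:
    """Return the offsets of bytes 0x80 and over in a raw header value."""
    return [i for i, b in enumerate(header_value) if b >= 0x80]


raw = "img-src https://bücher.example".encode("utf-8")
print(non_ascii_offsets(raw))             # [17, 18]  (UTF-8 bytes of "ü")

# The same host written as Punycode, as the spec currently requires.
print("bücher.example".encode("idna"))    # b'xn--bcher-kva.example'
```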

-- 
https://annevankesteren.nl/

Received on Tuesday, 20 January 2015 12:37:54 UTC