Re: [CSP] URI/IRI normalization and comparison from Mike West on 2015-01-20 (public-webappsec@w3.org from January 2015)

From: Mike West <mkwst@google.com>
Date: Tue, 20 Jan 2015 15:32:59 +0100
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Brian Smith <brian@briansmith.org>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAKXHy=eF8+zN+1J-XubTWuZm_Ux34J+wt8m9oCuZFHohcNXYTA@mail.gmail.com>
On Tue, Jan 20, 2015 at 1:37 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Fri, Jan 16, 2015 at 11:06 AM, Mike West <mkwst@google.com> wrote:
> > WDYT?
>
> Thanks. Reviewing
>
> https://w3c.github.io/webappsec/specs/content-security-policy/#match-a-source-expression
> these parts appear to have issues:
>
> * Step 2 talks about a URL's scheme type, while there is no such
> primitive. You could talk about URL's origin perhaps, though note that
> the origin of a blob URL is typically not a globally unique
> identifier.
>

I'll change this to explicitly list `blob`, `data`, and `filesystem`. It
seems like there should be some concept that encompases these kinds of URLs
that we don't want to match against '*', but I don't see one lying around.

https://github.com/w3c/webappsec/commit/8cbfd691843b110f97b522e06c2990b532d477dd


> * Step 3.1, you want to clarify that this is an ASCII case-insensitive
> comparison (see e.g. HTML for a definition). This applies elsewhere
> too, I recommend searching through the document.
>

https://github.com/w3c/webappsec/commit/788107732af4db10ecfb3be50de9ae7a2a6b60a5


> * Step 4.2, this should reference the definition of a URL's origin I
> think (from the URL Standard). Also, a URL's origin won't have a
> default port included so that note is wrong.
>

https://url.spec.whatwg.org/#origin reads as though the default port is
included.


> * Step 4.3, a URL's path is a list of segments, not a string. Also, a
> URL's path (once a URL is parsed and has a relative scheme) will end
> up as "/" serialized so that is a superfluous statement.
>

Rewrote all of this:
https://github.com/w3c/webappsec/commit/d2925a289c9c6aa39c5600b325aed4923c89455a


> * Step 4.5.1, talks about "protected resource’s URL" which is a
> variable not introduced in this algorithm before. Is this actually
> meant to refer to a part of the source expression? This applies
> elsewhere too.
>

The "protected resource" is the document/worker which the policy applies
to. We can either accept that as a global, or grab it from the incumbent
settings object. I prefer the former for the moment, but will prefer
something else once we rewrite the whole thing in terms of Fetch.

* Step 4.7, should this not perform IPv6 normalization similar to how
> URL parsers do it? Or at least parse the IPv6 source expression input
> in a way that it can be compared with the URL's IPv6 address without
> syntax getting in the way? This might also apply to IPv4 to some
> extent. (Given that you avoid Unicode it seems you are IDNA-safe here
> and ASCII case-insensitive comparisons will work fine in that case.
> URL parser defaults to outputting ASCII domains.)
>

1. Wildcard handling is a broken concept for IPv4; it simply doesn't make
sense as written. Maybe moving the '*' to the right side would make sense,
but honestly I'd rather remove that than somehow extend the concept to IPv6.

2. We accidentally support IPv4 by virtue of supporting numerics in
hostnames. It's not at all clear to me that we _should_ have extended the
grammar to support IPv6. I think we instead should encourage folks not to
tie security checks to IP-address-based hosts. Perhaps we should revert the
decision in https://github.com/w3c/webappsec/issues/49?


> * Step 4.10 talks about the URL being the result of a redirect but it
> is unclear how that information can be obtained from a simple
> comparison operation. It also has the same problems with path as
> mentioned earlier. It's not a string, but a list.
>

This will be clearer with a Fetch-based rewrite. It's not clear how we can
obtain this information at the moment given the way things are structured,
so this bit is seriously hand-wavey.

Was this the only section that changed?
>

Yes. Thanks!


> >> 2. Don't require double-escaping. Double-escaping is required in order
> >> to allow paths to include "," and ";", but it causes unintuitive
> >> behavior for many other situations (any path that contains '%'). I
> >> suggest for CSP2 that you simply don't allow paths to contain "," and
> >> ";". In a future version, we can define a new escaping syntax that
> >> would allow paths to contain those two characters, e.g.
> >> "urlencoded:<url>".
> >
> > Hrm. Given the limited number of source expressions that we'd expect to
> > contain either of those characters, it's not clear that this is actually
> a
> > better thing to confuse developers about than the encoding related to
> URLs
> > containing '%'.
>
> What is the problem with allowing "," and ";" in paths, percent-encoded?
>

Brian; I confused myself trying to explain the problem to Anne. Could you
give us some examples to work with where the current mechanisms fall down?

> I'd like to defer this to CSP3. For the moment, the spec requires entry of
> > internationalized domain names as Punycode; that seems like a good
> baseline
> > of support that we can build upon. Filed
> > https://github.com/w3c/webappsec/issues/145 to track it.
>
> Are implementations rejecting policies that include bytes 0x80 and
> over? What is the expected error handling?
>

Chrome matches the grammar in the document, which rejects embedded UTF-8
characters.

-mike

--
Mike West <mkwst@google.com>, @mikewest

Google Germany GmbH, Dienerstrasse 12, 80331 München,
Germany, Registergericht und -nummer: Hamburg, HRB 86891, Sitz der
Gesellschaft: Hamburg, Geschäftsführer: Graham Law, Christine Elizabeth
Flores
(Sorry; I'm legally required to add this exciting detail to emails. Bleh.)
Received on Tuesday, 20 January 2015 14:33:47 UTC