Re: [CSP] URI/IRI normalization and comparison

On Thu, Nov 6, 2014 at 2:24 PM, Brian Smith <brian@briansmith.org> wrote:
> 1. In section 4.2.2, the first step is "Normalize the URI according to
> Section 6 of RFC3986." However, there is no step for normalizing the
> source expression. I think the source expression should be normalized
> too.

Here is an example:

<!DOCTYPE html>
<meta http-equiv=Content-Security-Policy content="script-src /%3b.js">
<script src="/%3B.js">

The script's src attribute is already normalized, but the path in the
policy isn't (its percent-encoding uses a lowercase hex digit), so the
two won't match. They should probably be considered to match, which is
why the source expression should be normalized too.
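
As a rough illustration, here is a minimal Python sketch of the case
normalization from RFC 3986 section 6.2.2.1 applied to both sides before
comparing. The function name is made up for this example, and it covers
only the hex-digit case rule, not the rest of section 6.

```python
import re

def uppercase_pct_triplets(s: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet.

    Hypothetical helper: implements only the case rule of
    RFC 3986 section 6.2.2.1, nothing else from section 6.
    """
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), s)

# Path from the policy and path from the script's src attribute.
policy_path = uppercase_pct_triplets("/%3b.js")
request_path = uppercase_pct_triplets("/%3B.js")

# Once both sides are normalized, a plain string comparison matches.
print(policy_path == request_path)
```

If only the request URL were normalized, the comparison would be
"/%3b.js" against "/%3B.js" and would fail.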

Further, the CSP 2 draft says that the grammar for paths in source
expressions is

      path-part = <path production from RFC 3986, section 3.3>

Later, the draft says "Note: Characters like U+003B SEMICOLON (;) and
U+002C COMMA (,) cannot appear in source expressions directly: if
you’d like to include these characters in a source expression, they
must be percent encoded as %3B and %2C respectively."

Note that the path production from RFC 3986 allows both "," and ";" in
paths, so these two parts contradict each other. It would be better to
define the path-part rule so that it doesn't contradict the note,
similarly to how the rest of the URI productions were copied from RFC
3986 and changed. That's just a minor editorial change.
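
To make the contradiction concrete, here is a small Python sketch that
checks paths against the pchar rule from RFC 3986 section 3.3 and
against a restricted variant without "," and ";". The restricted variant
and all names are assumptions for illustration, not text from the CSP
draft, and the path shape is simplified to one or more "/"-prefixed
segments.

```python
import re

# pchar per RFC 3986: unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# Hypothetical restricted pchar: same as above minus "," and ";".
CSP_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+=:@]|%[0-9A-Fa-f]{2})"

def matches_path(pchar: str, path: str) -> bool:
    """True if path is one or more "/"-prefixed segments of pchar."""
    return re.fullmatch(rf"(?:/{pchar}*)+", path) is not None

print(matches_path(PCHAR, "/a.js,b.js"))        # RFC 3986 grammar
print(matches_path(CSP_PCHAR, "/a.js,b.js"))    # restricted grammar
print(matches_path(CSP_PCHAR, "/a.js%2Cb.js"))  # percent-encoded form
```

The unmodified RFC 3986 rule accepts the literal comma that the note
forbids, whereas the restricted rule accepts only the percent-encoded
form.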

There's also a trickier limitation. The normalization rules in RFC 3986
say "These URIs should be normalized by decoding any percent-encoded
octet that corresponds to an unreserved character, as described in
Section 2.3." But "," and ";" are not unreserved characters. To see why
this is tricky, consider the following:

<!DOCTYPE html>
<meta http-equiv=Content-Security-Policy
    content="script-src /combined-a.js%2Cb.js%2Cc.js">
<script src="/combined-a.js,b.js,c.js">

Again, the script's src attribute is already normalized, and it is a
valid URL by both the IETF and WHATWG standards.

But the path in the CSP is **ALSO** already normalized according to the
IETF rules, because %2C does not correspond to an unreserved character.

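Consequently, it is impossible to write a CSP path expression that
matches any URI containing a literal "," or ";": the raw characters
can't appear in the policy, and the percent-encoded forms are never
decoded by normalization.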
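
Here is a minimal Python sketch of the decoding rule quoted above
(decode only the percent-encoded octets that correspond to unreserved
characters), applied to both paths from the example. The helper name is
hypothetical and the sketch ignores the other normalization steps.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def decode_unreserved(s: str) -> str:
    """Decode %XX only when it encodes an unreserved character."""
    def repl(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        # Reserved characters such as "," and ";" stay encoded.
        return ch if ch in UNRESERVED else m.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, s)

policy_path = decode_unreserved("/combined-a.js%2Cb.js%2Cc.js")
request_path = decode_unreserved("/combined-a.js,b.js,c.js")

# Both inputs come back unchanged, so they still differ.
print(policy_path == request_path)
```

By contrast, decode_unreserved("/a%2Db") yields "/a-b", because "-" is
unreserved.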

To fix this, I think that a new normalization rule based on the WHATWG
URL standard's "percent-decode" [1] algorithm is needed.
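
For reference, here is a rough Python sketch of what a byte-level
percent-decode in the spirit of [1] might look like. It is my reading of
that algorithm, not the normative text, and the function name is only
for this example.

```python
HEX = b"0123456789ABCDEFabcdef"

def percent_decode(data: bytes) -> bytes:
    """Decode every valid %XX triplet; leave anything else untouched."""
    out = bytearray()
    i = 0
    while i < len(data):
        # A "%" is decoded only when two hex digits follow it.
        if (data[i] == 0x25 and i + 2 < len(data)
                and data[i + 1] in HEX and data[i + 2] in HEX):
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

policy_path = percent_decode(b"/combined-a.js%2Cb.js%2Cc.js")
request_path = percent_decode(b"/combined-a.js,b.js,c.js")

# After a full decode, both sides are the same byte sequence.
print(policy_path == request_path)
```

With this rule, "/%3b.js" and "/%3B.js" also decode to the same bytes.
A full decode turns %2F into "/" as well, so where it runs in the
matching algorithm would need some care.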

Cheers,
Brian

[1] https://url.spec.whatwg.org/#percent-decode

Received on Monday, 10 November 2014 01:43:50 UTC