Re: [CSP] URI/IRI normalization and comparison from Brian Smith on 2014-11-11 (public-webappsec@w3.org from November 2014)

From: Brian Smith <brian@briansmith.org>
Date: Tue, 11 Nov 2014 15:51:34 -0800
To: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAFewVt5Y_MftNFJjn5FQQCMchN+-0S7hJ0t+_kyawahMbsryrg@mail.gmail.com>
On Sun, Nov 9, 2014 at 5:43 PM, Brian Smith <brian@briansmith.org> wrote:
> On Thu, Nov 6, 2014 at 2:24 PM, Brian Smith <brian@briansmith.org> wrote:
>> 1. In section 4.2.2, the first step is "Normalize the URI according to
>> Section 6 of RFC3986." However, there is no step for normalizing the
>> source expression. I think the source expression should be normalized
>> too.
>
> Here is an example:
>
> <!DOCTYPE html>
> <meta http-equiv=Content-Security-Policy content="script-src /%3b.js">
> <script src="/%3B.js">
>
> the script src attribute is already normalized, but the path in the
> policy isn't, thus there won't be a match. But, these two should
> probably be considered to match. That is why it is good to normalize
> the source expression.

The above example wasn't the best. Instead of demonstrating why the
source expression should be normalized, it actually demonstrates why
the RFC 3986 normalization doesn't result in a match like one would
expect. Also, my example is missing the required uri-host component:

<!DOCTYPE html>
<meta http-equiv=Content-Security-Policy
    content="script-src example.com/%3b.js">
<script src="/%3B.js">

"example.com/%3b.js" will get transformed to "example.com/;.js"
according to the "Let decoded-path be the result of decoding
path-part’s percent-encoded characters" rule in CSP. But, there still
won't be a match with "/%3B.js" because the URI decoding for the
script's src attribute is done using RFC3986 normalization, which
only decodes unreserved characters and so won't decode %3B to ';'.
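
To make the mismatch concrete, here is a minimal Python sketch of the two
transformations. It is an illustration only: the function names are
hypothetical, urllib's unquote() is assumed to be a reasonable stand-in
for CSP's "decode percent-encoded characters" step, and only the
percent-encoding parts of RFC 3986 Section 6 (6.2.2.1 and 6.2.2.2) are
modeled, not dot-segment removal.

```python
import re
from urllib.parse import unquote

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def rfc3986_normalize_escapes(path: str) -> str:
    """Uppercase hex digits in escapes and decode only unreserved characters."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. %7E becomes "~"
        return "%" + match.group(1).upper()  # e.g. %3b becomes %3B
    return _PCT.sub(fix, path)


def csp_decode_path(path: str) -> str:
    """Assumed stand-in for CSP's decoded-path step: decode every escape."""
    return unquote(path)


# Path from the source expression vs. path from the script's src attribute.
policy_path = csp_decode_path("/%3b.js")
resource_path = rfc3986_normalize_escapes("/%3B.js")
print(policy_path, resource_path, policy_path == resource_path)
# prints: /;.js /%3B.js False
```

The two sides end up in different forms ("/;.js" versus "/%3B.js"), so a
plain string comparison between them fails.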

> Later, the draft says "Note: Characters like U+003B SEMICOLON (;) and
> U+002C COMMA (,) cannot appear in source expressions directly: if
> you’d like to include these characters in a source expression, they
> must be percent encoded as %3B and %2C respectively."
>
> Note that the path production from RFC 3986 allows both "," and ";" in
> paths, so these two parts contradict each other.

Sorry. There is actually no contradiction in *this* part of the spec.
The spec says that when parsing the directive

My mistake was assuming that the ABNF grammar in the spec describes
the syntax. It does not, at least not completely. *Some* parts of the
ABNF in the spec specify *some* parts of the syntax, and some other
parts of the ABNF do not seem to agree with the algorithm specified in
the prose. In other words, in one part it is said that the syntax is
"defined by the following ABNF" but that doesn't mean anything,
because a conformant parser cannot use the ABNF (exclusively) to parse
a policy. On the other hand, it seems like the parsing and processing
of some parts of CSP are only defined by the ABNF, not in the prose
(e.g. the parsing of multiple comma-separated policies).

> <!DOCTYPE html>
> <meta http-equiv=Content-Security-Policy
>     content="script-src /combined-a.js%2Cb.js%2Cc.js">
> <script src="/combined-a.js,b.js,c.js">
>
> Again, the script-src is already normalized and is a valid URL by both
> the IETF and WHATWG standards.
>
> But, the path in the CSP is **ALSO** already normalized according to
> the IETF rules, because %2C is not an unreserved character.

Again, my mistake: The "Let decoded-path be the result of decoding
path-part’s percent-encoded characters" part of CSP will transform
"/combined-a.js%2Cb.js%2Cc.js" to "/combined-a.js,b.js,c.js" and the
directive WILL match. Also, my example was missing the required
host-part. The problem is actually the opposite, as shown by this new
example:

> <!DOCTYPE html>
> <meta http-equiv=Content-Security-Policy
>     content="script-src https://example.com/combined-a.js%2Cb.js%2Cc.js">
> <script src="https://example.com/combined-a.js%2Cb.js%2Cc.js">

Note that the script-src directive and the src attribute of the script
element are byte-for-byte identical, but CSP matching does NOT treat
them as a match. In particular, the CSP source expression will get
transformed to "https://example.com/combined-a.js,b.js,c.js" by the
"Let decoded-path be the result of decoding path-part’s
percent-encoded characters" step, but the script's src attribute will
NOT change as a result of the "Normalize the URI according to Section
6 of RFC3986" step.

> Consequently, it is impossible to write a CSP path expression for any
> URI containing "," or ";".

Again, I was wrong: The problem with escaping is that many URIs
containing escape sequences cannot be matched using the CSP syntax.

> To fix this, I think that a new normalization rule based on the WHATWG
> URL standard's "percent-decode" [1] algorithm is needed.

Despite being repeatedly wrong about everything else I wrote in that
message, I think this is still right, or at least this is part of the
right thing to do.
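
As a rough sketch of what such a rule could look like, the Python below
applies one and the same byte-level decode to both the source expression
path and the resource path before comparing them. The percent_decode()
function follows the general shape of the WHATWG "percent-decode"
algorithm as I understand it; paths_match() is a hypothetical helper and
is not taken from any spec.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_decode(data: bytes) -> bytes:
    """Decode every %XX escape; leave a '%' not followed by two hex digits as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (
            byte == 0x25  # "%"
            and i + 2 < len(data)
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        ):
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(byte)
            i += 1
    return bytes(out)


def paths_match(source_expression_path: str, resource_path: str) -> bool:
    """Hypothetical comparison: decode BOTH sides the same way, then compare."""
    return percent_decode(source_expression_path.encode("utf-8")) == \
        percent_decode(resource_path.encode("utf-8"))


# Byte-for-byte identical paths now match, as do the mixed forms.
print(paths_match("/combined-a.js%2Cb.js%2Cc.js", "/combined-a.js%2Cb.js%2Cc.js"))
print(paths_match("/combined-a.js%2Cb.js%2Cc.js", "/combined-a.js,b.js,c.js"))
print(paths_match("/%3b.js", "/%3B.js"))
# prints True three times
```

One caveat with decoding everything: "%2F" and "/" become
indistinguishable, so this alone may not be the whole rule.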

Cheers,
Brian
Received on Tuesday, 11 November 2014 23:52:02 UTC