Re: [CSP] URI/IRI normalization and comparison

On Wed, Nov 12, 2014 at 12:08 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
> On Tue, Nov 11, 2014 at 11:36 PM, Brian Smith <brian@briansmith.org> wrote:
>> I think you may be looking at the obsolete version of the spec (RFC
>> 2616). This was fixed (not as completely as I would like) in the new
>> version (RFC 7230).
>
> Not really. RFCs tend to rarely pay attention to the level of detail
> that is required to implement a browser.

I agree with you; that's what I meant by "not as completely as I would like."

>> http://tools.ietf.org/html/rfc7230#section-3.2.4:
>>
>>    Historically, HTTP has allowed field content with text in the
>>    ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
>>    through use of [RFC2047] encoding.  In practice, most HTTP header
>>    field values use only a subset of the US-ASCII charset [USASCII].
>>    Newly defined header fields SHOULD limit their field values to
>>    US-ASCII octets.  A recipient SHOULD treat other octets in field
>>    content (obs-text) as opaque data.
>
> Sure, but what does this mean for implementations? E.g. how we handle
> (using \0xXX to denote a byte)
>
>   Location: /\0x80
>
> is surely not going to change, or is it?
>
> As far as I know that needs to be treated *identical* to
>
>   Location: /%80
>
> Now maybe that matches "treat as opaque data", but it does mean that
> \0x80 needs to become U+0080 before being handed to the URL parser (as
> in "the real world" it operates on code points).

If you get a garbage Location like that for anything other than a
redirect, you just ignore it. When you get a garbage Location like
that for a redirect, you probably should just show an error page,
though you'd have to do a survey of browser implementations to know
for sure what to do.
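
For reference, the byte-to-code-point step described in the quoted text
can be sketched as below. This is only an illustration: the function
names are hypothetical, and mapping each byte to the code point with the
same value (an ISO-8859-1 decode) is an assumption about how a header
value gets turned into a string, not a statement of what any browser
does.

```python
def has_obs_text(raw_value: bytes) -> bool:
    """Return True if the header value contains any octet >= 0x80
    (what RFC 7230 calls obs-text)."""
    return any(b >= 0x80 for b in raw_value)


def header_bytes_to_code_points(raw_value: bytes) -> str:
    """Map every byte to the code point with the same numeric value.

    Decoding as ISO-8859-1 (latin-1) never fails and is reversible,
    so the original octets can always be recovered.
    """
    return raw_value.decode("latin-1")


location = b"/\x80"
if has_obs_text(location):
    # Assumption: a caller might choose to reject the redirect here.
    pass
as_text = header_bytes_to_code_points(location)
```

Here `as_text` holds "/" followed by U+0080. Whether that string should
then be treated the same as "/%80" is exactly the open question.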

As far as parsing the Content-Security-Policy header is concerned, I
think the CSP specification is generally doing something reasonable
for invalid characters (as defined by the RFC 3986 syntax). In
particular, if the URL isn't a valid RFC 3986 URL then the browser
will skip the CSP directive without ever feeding the URL to the HTML5
URL normalizer/decoder.
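
A rough sketch of that "reject before normalizing" step follows. It is a
simplification: it only checks that every character is one RFC 3986
allows somewhere in a URI (unreserved, gen-delims, sub-delims, or a
well-formed percent-encoded triplet), not the full grammar, and the
function names are made up for illustration rather than taken from the
CSP spec.

```python
import re

# Characters RFC 3986 permits somewhere in a URI, plus well-formed
# percent-encoded triplets. This is a character-level check only.
_RFC3986_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def uses_only_rfc3986_chars(url: str) -> bool:
    """True if the string contains no character outside RFC 3986 syntax."""
    return _RFC3986_CHARS.fullmatch(url) is not None


def directive_is_usable(sources: list[str]) -> bool:
    """Hypothetical policy: skip the whole directive if any source
    contains a character that RFC 3986 does not allow."""
    return all(uses_only_rfc3986_chars(s) for s in sources)
```

With this check, a source containing a raw U+0080 or a dangling "%8" is
rejected up front and never reaches the HTML5 URL parser.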

In other words, when processing URLs in HTTP headers, in general you
need to deal with the URL according to RFC 3986 rules at the HTTP
level, and deal with the URL using HTML5 rules at the HTML level. That
means, in particular, that the HTML5 URL parsing/decoding algorithms
need to be able to handle all RFC 3986 URLs, even if such URLs are not
possible in HTML5. It also means that there needs to be a way to
convert every HTML5 URL into a valid RFC 3986 URL for the cases where
you need to emit an HTML5 URL in an HTTP request.
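
One possible shape for that conversion, as a minimal sketch: percent-
encode (as UTF-8) every character that RFC 3986 does not allow and leave
everything else alone. The function name is hypothetical, existing "%"
characters are passed through untouched even if malformed, and
internationalized host names (which would need IDNA handling) are out of
scope here.

```python
from urllib.parse import quote

# RFC 3986 gen-delims and sub-delims, plus "%" so that existing
# percent-encoded triplets are not double-encoded. Letters, digits and
# "-._~" are never encoded by quote().
_SAFE = "/:?#[]@!$&'()*+,;=%"


def to_rfc3986(url: str) -> str:
    """Percent-encode (UTF-8) any character outside RFC 3986 syntax."""
    return quote(url, safe=_SAFE)
```

For example, to_rfc3986("/caf\u00e9") yields "/caf%C3%A9", while a URL
that is already plain ASCII and valid comes back unchanged.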

I think the main question is whether normalization (including URL
decoding) and comparison should be done in the HTTP layer (using RFC
3986 rules) or in the HTML layer (using HTML5 rules). I believe the
answer, for this particular case, is that normalization and comparison
need to be done in the HTML layer and not in the HTTP layer, but
right now the CSP spec is wrongly mixing the two.
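
To make "normalization and comparison" concrete, here is a small sketch
of the RFC 3986 flavor only (section 6.2.2: uppercase the hex digits of
percent-encoded triplets, and decode triplets that encode unreserved
characters). The function names are invented for the example, and this
is not what the HTML5 rules do; it is here to show the kind of operation
whose layer is in question.

```python
import re
import string

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _fix_triplet(match: re.Match) -> str:
    """Decode the triplet if it encodes an unreserved character,
    otherwise keep it but uppercase the hex digits."""
    ch = chr(int(match.group(1), 16))
    if ch in _UNRESERVED:
        return ch
    return "%" + match.group(1).upper()


def normalize_pct(url: str) -> str:
    """Apply RFC 3986 percent-encoding normalization to a URL string."""
    return re.sub(r"%([0-9A-Fa-f]{2})", _fix_triplet, url)


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two URLs after percent-encoding normalization."""
    return normalize_pct(a) == normalize_pct(b)
```

Under these rules "/%41" and "/A" compare equal, but "/%2F" and "/" do
not, because "/" is a reserved character and must stay encoded.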

Cheers,
Brian

Received on Wednesday, 12 November 2014 08:41:10 UTC