Re: [CSP] URI/IRI normalization and comparison from Anne van Kesteren on 2014-11-12 (public-webappsec@w3.org from November 2014)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Wed, 12 Nov 2014 09:08:29 +0100
To: Brian Smith <brian@briansmith.org>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CADnb78hkSgMkxv3eR6EwyCX8rX+0fKxzg1u38Wb5oEKS5b8nsA@mail.gmail.com>

On Tue, Nov 11, 2014 at 11:36 PM, Brian Smith <brian@briansmith.org> wrote:
> I think you may be looking at the obsolete version of the spec (RFC
> 2616). This was fixed (not as completely as I would like) in the new
> version (RFC 7230).

Not really. RFCs rarely go into the level of detail that is required
to implement a browser.


> http://tools.ietf.org/html/rfc7230#section-3:
>
>    A recipient MUST parse an HTTP message as a sequence of octets in an
>    encoding that is a superset of US-ASCII [USASCII].  Parsing an HTTP
>    message as a stream of Unicode characters, without regard for the
>    specific encoding, creates security vulnerabilities due to the
>    varying ways that string processing libraries handle invalid
>    multibyte character sequences that contain the octet LF (%x0A).
>
> http://tools.ietf.org/html/rfc7230#section-3.2.4:
>
>    Historically, HTTP has allowed field content with text in the
>    ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
>    through use of [RFC2047] encoding.  In practice, most HTTP header
>    field values use only a subset of the US-ASCII charset [USASCII].
>    Newly defined header fields SHOULD limit their field values to
>    US-ASCII octets.  A recipient SHOULD treat other octets in field
>    content (obs-text) as opaque data.

Sure, but what does this mean for implementations? E.g., the way we
handle (using \0xXX to denote a byte)

  Location: /\0x80

is surely not going to change, or is it?

As far as I know, that needs to be treated *identically* to

  Location: /%80

Now maybe that matches "treat as opaque data", but it does mean that
the byte \0x80 needs to become the code point U+0080 before being
handed to the URL parser (since in "the real world" the parser
operates on code points).
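
A minimal Python sketch of that byte-to-code-point step, under stated
assumptions: the function names are hypothetical, ISO-8859-1 decoding
stands in for "each byte \0xXX becomes U+00XX", and the
percent-encoding step is only one reading of "treat as opaque data",
not what any particular URL parser does.

```python
from urllib.parse import quote_from_bytes


def header_bytes_to_code_points(raw: bytes) -> str:
    """Map each byte 0xXX to the code point U+00XX (ISO-8859-1 decoding)."""
    return raw.decode("latin-1")


def percent_encode_opaque(value: str) -> str:
    """Assumed handling: restore the original bytes, then percent-encode
    every non-ASCII octet. "/" and existing "%" escapes are left alone.

    Only valid for strings produced by header_bytes_to_code_points, since
    code points above U+00FF cannot be encoded as latin-1.
    """
    return quote_from_bytes(value.encode("latin-1"), safe="/%")


raw_form = header_bytes_to_code_points(b"/\x80")      # Location: /\0x80
escaped_form = header_bytes_to_code_points(b"/%80")   # Location: /%80

# The raw byte 0x80 becomes the single code point U+0080.
assert raw_form == "/\u0080"

# Under the assumed handling, both header values end up identical.
assert percent_encode_opaque(raw_form) == "/%80"
assert percent_encode_opaque(escaped_form) == "/%80"
```

Decoding as ISO-8859-1 is lossless for arbitrary octets, which is why
the original bytes can be recovered after the value has passed through
code that works on code points.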


-- 
https://annevankesteren.nl/

Received on Wednesday, 12 November 2014 08:08:57 UTC