Re: [CSP] URI/IRI normalization and comparison from Brian Smith on 2014-11-11 (public-webappsec@w3.org from November 2014)

From: Brian Smith <brian@briansmith.org>
Date: Tue, 11 Nov 2014 14:36:41 -0800
To: Anne van Kesteren <annevk@annevk.nl>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAFewVt6UtkODzdQSv3FtdSUz87Yq1tgBErTks-NLpbvR-ZaKzw@mail.gmail.com>

Anne van Kesteren <annevk@annevk.nl> wrote:
> On Mon, Nov 10, 2014 at 2:18 AM, Brian Smith <brian@briansmith.org> wrote:
>> Header encoding is defined in the HTTP specification. Also, there are
>> about 3 million emails on the HTTP WG mailing list about this topic.
>
> As far as I know that's false. Legacy headers are decoded per
> "original latin1". For new headers you need to specify it. Of course,
> that completely fails with generic APIs, I'm not sure if they
> considered that.

I think you may be looking at the obsolete version of the spec (RFC
2616). This was fixed (not as completely as I would like) in the new
version (RFC 7230).

http://tools.ietf.org/html/rfc7230#section-3:

   A recipient MUST parse an HTTP message as a sequence of octets in an
   encoding that is a superset of US-ASCII [USASCII].  Parsing an HTTP
   message as a stream of Unicode characters, without regard for the
   specific encoding, creates security vulnerabilities due to the
   varying ways that string processing libraries handle invalid
   multibyte character sequences that contain the octet LF (%x0A).
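
As a rough illustration of that paragraph (my own sketch, not text from the RFC), a recipient can split the header block into lines while it is still a byte string and decode only the field name afterwards. The function names split_header_lines and parse_field are hypothetical, and the sketch ignores obsolete line folding and other details a real parser must handle:

```python
def split_header_lines(raw: bytes) -> list:
    """Split the header section on CRLF while it is still raw octets."""
    # Everything after the first empty line is the message body.
    head, _, _ = raw.partition(b"\r\n\r\n")
    return head.split(b"\r\n")


def parse_field(line: bytes) -> tuple:
    """Return (lowercased ASCII field name, field value as raw bytes)."""
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("header line has no colon")
    # Field names are ASCII tokens, so decoding them as ASCII is safe;
    # a non-ASCII name raises UnicodeDecodeError instead of being guessed at.
    return name.decode("ascii").lower(), value.strip(b" \t")
```

Because the split happens on octets, no character decoder gets a chance to swallow or reinterpret an LF that sits inside an invalid multibyte sequence.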

http://tools.ietf.org/html/rfc7230#section-3.2.4:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.
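
One possible reading of "treat other octets as opaque data", sketched in Python under my own assumptions (the helper name field_value_text is hypothetical, and returning None is just one way to signal "do not interpret this"):

```python
from typing import Optional


def field_value_text(value: bytes) -> Optional[str]:
    """Return the field value as text only when it is pure US-ASCII.

    Values containing obs-text octets (0x80-0xFF) are not decoded at all;
    the caller keeps the original bytes and treats them as opaque.
    """
    if value.isascii():
        return value.decode("ascii")
    return None
```

For example, field_value_text(b"default-src 'self'") gives back the string, while a value such as b"caf\xe9" yields None rather than a guess at ISO-8859-1 or UTF-8.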

Cheers,
Brian

Received on Tuesday, 11 November 2014 22:37:08 UTC