- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Wed, 12 Nov 2014 09:08:29 +0100
- To: Brian Smith <brian@briansmith.org>
- Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
On Tue, Nov 11, 2014 at 11:36 PM, Brian Smith <brian@briansmith.org> wrote:
> I think you may be looking at the obsolete version of the spec (RFC
> 2616). This was fixed (not as completely as I would like) in the new
> version (RFC 7230).

Not really. RFCs tend to rarely pay attention to the level of detail
that is required to implement a browser.

> http://tools.ietf.org/html/rfc7230#section-3:
>
>    A recipient MUST parse an HTTP message as a sequence of octets in an
>    encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP
>    message as a stream of Unicode characters, without regard for the
>    specific encoding, creates security vulnerabilities due to the
>    varying ways that string processing libraries handle invalid
>    multibyte character sequences that contain the octet LF (%x0A).
>
> http://tools.ietf.org/html/rfc7230#section-3.2.4:
>
>    Historically, HTTP has allowed field content with text in the
>    ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
>    through use of [RFC2047] encoding. In practice, most HTTP header
>    field values use only a subset of the US-ASCII charset [USASCII].
>    Newly defined header fields SHOULD limit their field values to
>    US-ASCII octets. A recipient SHOULD treat other octets in field
>    content (obs-text) as opaque data.

Sure, but what does this mean for implementations? E.g. how we handle
(using \0xXX to denote a byte)

  Location: /\0x80

is surely not going to change, or is it? As far as I know that needs
to be treated *identical* to

  Location: /%80

Now maybe that matches "treat as opaque data", but it does mean that
\0x80 needs to become U+0080 before being handed to the URL parser (as
in "the real world" it operates on code points).

-- 
https://annevankesteren.nl/
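The byte-to-code-point step described in the message can be sketched in Python as follows. This is only an illustration under stated assumptions, not a description of any browser's actual code: `isomorphic_decode` and `normalize_location` are hypothetical names, and the percent-encoding step stands in for whatever a real URL parser does, chosen here solely to show the two header values from the message ending up identical.

```python
from urllib.parse import quote


def isomorphic_decode(raw: bytes) -> str:
    """Map each byte 0xNN to the code point U+00NN.

    Decoding as ISO-8859-1 (latin-1) does exactly this and never fails,
    so the header bytes stay "opaque" while becoming a string of code points.
    """
    return raw.decode("latin-1")


def normalize_location(raw: bytes) -> str:
    """Hypothetical helper: decode a Location value, then percent-encode it.

    Assumption: code points U+0080..U+00FF are written back as the single
    byte they came from (encoding="latin-1"), so byte 0x80 becomes "%80".
    Existing "%" escapes and "/" separators are left untouched.
    """
    text = isomorphic_decode(raw)
    return quote(text, safe="/%", encoding="latin-1")


# The raw byte 0x80 first becomes the code point U+0080 ...
assert isomorphic_decode(b"/\x80") == "/\u0080"

# ... and under the assumption above, both header values from the message
# end up as the same string.
assert normalize_location(b"/\x80") == normalize_location(b"/%80")
```

The design point is that the header is never interpreted as UTF-8 or any other multibyte encoding: each octet maps to exactly one code point, which keeps the value usable by a parser that operates on code points.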
Received on Wednesday, 12 November 2014 08:08:57 UTC