- From: Brian Smith <brian@briansmith.org>
- Date: Tue, 11 Nov 2014 14:36:41 -0800
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Anne van Kesteren <annevk@annevk.nl> wrote:
> On Mon, Nov 10, 2014 at 2:18 AM, Brian Smith <brian@briansmith.org> wrote:
>> Header encoding is defined in the HTTP specification. Also, there are
>> about 3 million emails on the HTTP WG mailing list about this topic.
>
> As far as I know that's false. Legacy headers are decoded per
> "original latin1". For new headers you need to specify it. Of course,
> that completely fails with generic APIs, I'm not sure if they
> considered that.

I think you may be looking at the obsolete version of the spec (RFC
2616). This was fixed (not as completely as I would like) in the new
version (RFC 7230).

http://tools.ietf.org/html/rfc7230#section-3:

   A recipient MUST parse an HTTP message as a sequence of octets in an
   encoding that is a superset of US-ASCII [USASCII]. Parsing an HTTP
   message as a stream of Unicode characters, without regard for the
   specific encoding, creates security vulnerabilities due to the
   varying ways that string processing libraries handle invalid
   multibyte character sequences that contain the octet LF (%x0A).

http://tools.ietf.org/html/rfc7230#section-3.2.4:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding. In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets. A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

Cheers,
Brian
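
P.S. A minimal sketch of what the RFC 7230 text quoted above could look
like in code, assuming Python. The helper names parse_field_line and
field_value_text are hypothetical, not taken from any library, and the
sketch does not attempt full field-name (token) validation. The idea is
only that the field line is handled as octets, and that a value is
turned into text only when it is pure US-ASCII; anything else stays as
opaque bytes.

```python
from typing import Optional, Tuple


def parse_field_line(raw: bytes) -> Tuple[str, bytes]:
    """Split one header field line (CRLF already removed) into (name, value).

    The line is handled as octets throughout; only the field name,
    which must be ASCII, is decoded to text.
    """
    name, sep, value = raw.partition(b":")
    # Reject a missing colon, an empty name, or whitespace around the name.
    if not sep or not name or name != name.strip():
        raise ValueError("malformed field line")
    # decode("ascii") raises UnicodeDecodeError (a ValueError) on non-ASCII.
    return name.decode("ascii").lower(), value.strip(b" \t")


def field_value_text(value: bytes) -> Optional[str]:
    """Return the value as text if it is pure US-ASCII, else None.

    None means the value contains obs-text octets and the caller
    should keep treating it as opaque bytes rather than guess a charset.
    """
    try:
        return value.decode("ascii")
    except UnicodeDecodeError:
        return None
```

With this sketch, parse_field_line(b"X-Example: caf\xe9") yields the
name "x-example" and the raw value b"caf\xe9", and field_value_text on
that value returns None instead of assuming latin1 or UTF-8.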
Received on Tuesday, 11 November 2014 22:37:08 UTC