- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Fri, 28 Mar 2008 16:33:01 +0100
- To: ietf-http-wg@w3.org
Jamie Lokier wrote:

> You are sort of making my point for me.

Maybe :-) But I'd say ASCII gibberish =?...?.?...?= is one thing, while a bunch of unidentified non-ASCII octets in header fields is another story. For e-mail, I know almost nothing about HTTP apart from "please fix the syntax in RFCs 2616 and 2617".

> Is there a problem with transmitting binary UTF-8? It's just an
> "odd way to say" some i18n text. Some receivers will decode it
> as intended; some will show gibberish. How is that different
> from your example?

RFC 2616 unfortunately says that it is Latin-1. Jumping from ASCII to UTF-8 could work if existing implementations have no issues with non-ASCII octets. Jumping from Latin-1 to UTF-8, or maybe not (legacy), isn't secure: implementations won't know what it really is (UTF-8, Latin-1, windows-1252), they'd pass it on with a "guess" to applications, and I'm not confident that it cannot cause havoc.

For ASCII gibberish with odd =?...?.?...?= words I'm ready to bet on "no problem", but random octets go against my instincts. That is why I proposed to exclude 0x80..0x9F from Mark's 2(b).

> I thought the IETF was moving to recommend UTF-8 wherever
> possible nowadays?

At least since RFC 2277 over ten years ago. The last addition to the club was RFC 5198 (net-utf8) yesterday. With roots in RFC 2068, HTTP is a "grandfathered" case, like mail and NetNews wrt US-ASCII message headers.

Tons of legacy software, where nobody can prove a negative in the sense of "just send 8, no known problems", are tricky. If we were working on HTTP/1.2, proposing to ignore RFC 5198 would be madness, but we are supposed to improve the HTTP/1.1 spec.

Frank
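To illustrate the "guess" problem described above, here is a minimal Python sketch showing that one and the same octet string reads differently depending on which charset a receiver assumes. The helper names `decode_guesses` and `has_c1_octets` are hypothetical and come neither from this thread nor from any HTTP library. The second helper only checks for octets in the 0x80..0x9F range mentioned above, which are C1 control codes in Latin-1 but mostly printable characters in windows-1252.

```python
def decode_guesses(octets: bytes) -> dict:
    """Decode the same octets under three candidate charsets.

    A value of None means the octets are not valid in that charset.
    """
    guesses = {}
    for charset in ("utf-8", "latin-1", "cp1252"):
        try:
            guesses[charset] = octets.decode(charset)
        except UnicodeDecodeError:
            guesses[charset] = None
    return guesses


def has_c1_octets(octets: bytes) -> bool:
    """Return True if any octet falls in the range 0x80..0x9F."""
    return any(0x80 <= b <= 0x9F for b in octets)


# Hypothetical header field value: the euro sign sent as raw UTF-8.
value = "\u20ac".encode("utf-8")  # b'\xe2\x82\xac'
print(decode_guesses(value))
print(has_c1_octets(value))
```

A receiver that assumes Latin-1 or windows-1252 decodes these octets without any error, just to a different string, so the receiver has no signal that its guess was wrong.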
Received on Friday, 28 March 2008 15:31:28 UTC