Re: PROPOSAL: i74: Encoding for non-ASCII headers from Frank Ellermann on 2008-03-28 (ietf-http-wg@w3.org from January to March 2008)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Fri, 28 Mar 2008 16:33:01 +0100
To: ietf-http-wg@w3.org
Message-ID: <fsj2uq$ej1$1@ger.gmane.org>

Jamie Lokier wrote:

> You are sort of making my point for me.

Maybe :-)  But I'd say ASCII gibberish like =?...?.?...?= is one
thing, while a bunch of unidentified non-ASCII octets in header
fields is another story.  That's for e-mail, though; about HTTP
I know almost nothing apart from "please fix the syntax in RFCs
2616 and 2617".
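The "=?...?.?...?=" gibberish is an RFC 2047 encoded-word,
=?charset?encoding?encoded-text?=.  A minimal Python sketch of the
"B" (base64) form shows why it stays pure ASCII on the wire while
carrying non-ASCII text.  The helper name encode_word_b is
hypothetical, and the sketch ignores the 75-character limit on
encoded-words that RFC 2047 imposes:

```python
import base64
from email.header import decode_header


def encode_word_b(text: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: build an RFC 2047 "B" encoded-word.

    Does not split long text into several encoded-words, so it can
    exceed the 75-character limit from RFC 2047.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


word = encode_word_b("Grüße")
print(word)  # =?utf-8?B?R3LDvMOfZQ==?=

# A receiver that understands RFC 2047 recovers the text; one that
# doesn't just shows the ASCII "gibberish" above.
raw, charset = decode_header(word)[0]
print(raw.decode(charset))  # Grüße
```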

> Is there a problem with transmitting binary UTF-8?  It's just an
> "odd way to say" some i18n text.  Some receivers will decode it
> as intended; some will show gibberish.  How is that different
> from your example?

RFC 2616 unfortunately says that it is Latin-1.  Jumping from
ASCII to UTF-8 could work if existing implementations have no
issues with non-ASCII octets.  Jumping from Latin-1 to "UTF-8,
or maybe not (legacy)" isn't secure: implementations won't know
what it really is (UTF-8, Latin-1, windows-1252), they'd pass it
on to applications with a "guess", and I'm not confident that
this cannot cause havoc.
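To make the "won't know what it really is" point concrete, here is
a small Python sketch (the function name interpretations is
hypothetical) that decodes the same header octets under the three
candidate charsets named above.  Python's "cp1252" codec stands in
for windows-1252:

```python
def interpretations(raw: bytes) -> dict:
    """Hypothetical helper: decode the same octets as each charset."""
    results = {}
    for charset in ("utf-8", "latin-1", "cp1252"):
        try:
            results[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            results[charset] = None  # octets are invalid in this charset
    return results


print(interpretations(b"caf\xc3\xa9"))
# {'utf-8': 'café', 'latin-1': 'cafÃ©', 'cp1252': 'cafÃ©'}
print(interpretations(b"\x805"))
# {'utf-8': None, 'latin-1': '\x805', 'cp1252': '€5'}
```

Each result is a plausible string, so a receiver that "guesses"
gets no error signal when it picks the wrong charset.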

For ASCII gibberish with odd =?...?.?...?= words I'm ready to
bet on "no problem", but random octets go against my instincts.
That is why I proposed to exclude 0x80..0x9F from Mark's 2(b).
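A rough sketch of what a field-value check with that exclusion
could look like.  This message doesn't quote Mark's 2(b), so the
sketch assumes it allowed octets along the lines of RFC 2616 TEXT
(no CTLs, HTAB and SP permitted, non-ASCII octets permitted); the
names octet_ok and field_value_ok are hypothetical:

```python
def octet_ok(b: int) -> bool:
    """Assumed rule: HTAB, SP, visible ASCII, or 0xA0..0xFF."""
    if b in (0x09, 0x20):      # HTAB, SP
        return True
    if 0x21 <= b <= 0x7E:      # visible US-ASCII
        return True
    return 0xA0 <= b <= 0xFF   # non-ASCII, excluding 0x80..0x9F


def field_value_ok(value: bytes) -> bool:
    """True if every octet of a header field value passes octet_ok."""
    return all(octet_ok(b) for b in value)


print(field_value_ok(b"caf\xe9"))           # True  (Latin-1 "café")
print(field_value_ok(b"\x80"))              # False (windows-1252 euro sign)
print(field_value_ok("€".encode("utf-8")))  # False (contains 0x82)
print(field_value_ok("é".encode("utf-8")))  # True  (0xC3 0xA9)
```

Note that this filter rejects windows-1252 octets in 0x80..0x9F
and some UTF-8 sequences, but not all of UTF-8.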

> I thought the IETF was moving to recommend UTF-8 wherever 
> possible nowadays?

At least since RFC 2277, over ten years ago.  The latest addition
to the club was RFC 5198 (net-utf8), yesterday.  With its roots in
RFC 2068, HTTP is a "grandfathered" case, like mail and NetNews
with respect to US-ASCII message headers.

With tons of legacy software out there, nobody can prove a
negative in the sense of "just send 8-bit, no known problems",
and that is what makes this tricky.

If we were working on HTTP/1.2, proposing to ignore RFC 5198
would be madness, but we are supposed to improve the HTTP/1.1
spec.

 Frank

Received on Friday, 28 March 2008 15:31:28 UTC