Re: IRIs, IDNAbis, and HTTP from Martin Duerst on 2008-03-15 (ietf-http-wg@w3.org from January to March 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Sat, 15 Mar 2008 09:27:18 +0900
To: Julian Reschke <julian.reschke@gmx.de>, Brian Smith <brian@briansmith.org>
Cc: "'HTTP Working Group'" <ietf-http-wg@w3.org>
Message-Id: <6.0.0.20.2.20080315092059.08440ec0@localhost>

(a side issue only)

At 21:57 08/03/14, Julian Reschke wrote:

>It seems the only way to improve RFC-2047 would be by introducing a new encoding that is sane. Such as:
>
>"Any octet sequence starting with EF BB BF (the UTF-8 BOM) is to be interpreted as Unicode, encoded in UTF-8."

If we are speaking about RFC 2047 itself, then indeed no special sentinel
(such as a UTF-8 BOM) would be needed. Any byte with the most significant
bit set would be enough. Also, even for HTTP, mixing iso-8859-1 and UTF-8
might be fine in practice, because it's very easy to distinguish them.
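
A minimal sketch of why no sentinel is needed, assuming a header value
arrives as raw octets that are either US-ASCII, ISO-8859-1, or UTF-8. The
function name decode_header_value is hypothetical and not part of any HTTP
library: a value with no high-bit bytes is plain US-ASCII, and otherwise
well-formed UTF-8 is tried first, with ISO-8859-1 as the fallback.

```python
def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: decode a header value that may be US-ASCII,
    UTF-8, or ISO-8859-1, without any sentinel such as a UTF-8 BOM."""
    # No byte has the most significant bit set: plain US-ASCII.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")
    # High-bit bytes present: text in ISO-8859-1 rarely forms valid
    # UTF-8 by accident, so try the strict UTF-8 decoder first.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every octet sequence is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")
```

For example, "Dürst" encoded as UTF-8 and "Dürst" encoded as ISO-8859-1
(where 0xFC followed by "r" is not valid UTF-8) both decode to "Dürst". The
heuristic is not perfect: the ISO-8859-1 text "Ã©" is the octets C3 A9,
which is also valid UTF-8 for "é", so such (rare) values are misread.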

But all this would only make the mess bigger. It's much better to sort this
out on a per-header basis (unless we can confirm that the current cruft isn't
used at all in practice, which would then allow us to go for UTF-8 in all
cases where we need something more than US-ASCII).

Regards,   Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Monday, 17 March 2008 03:57:04 UTC