Re: IRIs, IDNAbis, and HTTP

(a side issue only)

At 21:57 08/03/14, Julian Reschke wrote:

>It seems the only way to improve RFC-2047 would be by introducing a new encoding that is sane. Such as:
>
>"Any octet sequence starting with EF BB BF (the UTF-8 BOM) is to be interpreted as Unicode, encoded in UTF-8."

If we are speaking about RFC 2047 itself, then indeed no special sentinel
(such as a UTF-8 BOM) would be needed. Any byte with the most significant
bit set would be enough as a signal. Also, even for HTTP, mixing iso-8859-1
and UTF-8 might be fine in practice, because it's very easy to distinguish them.
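To illustrate why the two are easy to tell apart, here is a minimal sketch of the usual heuristic. It is an assumption for illustration, not something specified by RFC 2047 or HTTP, and `decode_header_value` is a hypothetical helper name. If the raw bytes are pure US-ASCII, nothing needs deciding. Otherwise, try strict UTF-8 and fall back to iso-8859-1. The check works because UTF-8 multi-byte sequences follow a rigid lead/continuation byte pattern that typical iso-8859-1 text with accented letters does not match.

```python
def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: interpret header bytes as UTF-8 when they form
    valid UTF-8, otherwise fall back to ISO-8859-1 (heuristic, not a spec)."""
    # No byte has the most significant bit set: plain US-ASCII,
    # which decodes identically under both encodings.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")
    try:
        # Strict UTF-8 decoding rejects byte sequences that break the
        # lead/continuation pattern, as most Latin-1 text with accents does.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value is a valid ISO-8859-1 character, so this never fails.
        return raw.decode("iso-8859-1")
```

For example, both `"Dürst".encode("utf-8")` and `"Dürst".encode("iso-8859-1")` come back as `"Dürst"`. The heuristic is not airtight: some rare iso-8859-1 strings (such as `"Ã©"`) happen to be valid UTF-8 and would be misread.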

But all this would only make the mess bigger. It's much better to sort this
out on a per-header basis (unless we can confirm that the current cruft isn't
used at all in practice, which would then allow us to go for UTF-8 in all
cases where we need something more than US-ASCII).

Regards,   Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Monday, 17 March 2008 03:57:04 UTC