Re: IRIs, IDNAbis, and HTTP

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Sat, 15 Mar 2008 08:40:35 +0900
Message-Id: <>
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, ietf-http-wg@w3.org

At 00:29 08/03/14, Frank Ellermann wrote:
>Brian Smith wrote:

>> RFC 2277 applies to any updates to an existing protocol, as
>> far as I can tell.
>It talks about UTF-8 "for all text".  We can ask Harald what
>that means precisely; my first guess is the "SDU" (body), not
>the complete "PDU" (header + body).

It covers all protocol fields that carry human-readable
text. The Subject: of an email very clearly qualifies as
text, so it's not only the body.

>> I think that HTTPbis should explain how to encode UTF-8
>> text in newly registered header fields. The de-facto 
>> mechanism for this, used by Atom and WebDAV, is percent-
>> encoded UTF-8.
>For a draft standard, MIME (RFC 2047) comes to mind; for a BCP,
>one of the two mechanisms recommended in BCP 113 (RFC 5137).
>BCP 113 says that using UTF-8 is typically a bad idea when
>looking for an ASCII-compatible encoding.  I'm not hot about
>what to use in 2616bis, if anything, but if even a single case
>ends up remotely requiring IDNAbis punycode, I'll scream :-)
>For EAI they are testing the UTF-8 waters with *experimental*
>RFCs; 2616bis will be a draft standard or better if all goes well.
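For concreteness, the two candidates discussed above can be sketched side by side. This is a minimal Python illustration, assuming a hypothetical header field `X-Title` and a short value that fits in one encoded-word; it shows percent-encoded UTF-8 and an RFC 2047 "B" encoded-word, and is not taken from any HTTPbis draft.

```python
import base64
from email.header import decode_header
from urllib.parse import quote, unquote


def percent_encode_utf8(text: str) -> str:
    """Percent-encode the UTF-8 octets of text; the result is pure ASCII."""
    return quote(text, safe="")


def rfc2047_b_word(text: str) -> str:
    """Build one RFC 2047 encoded-word using the "B" (base64) encoding.

    Assumes the text is short enough for a single encoded-word (RFC 2047
    caps an encoded-word at 75 characters); longer text would need splitting.
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


title = "Grüße"
print("X-Title:", percent_encode_utf8(title))  # X-Title: Gr%C3%BC%C3%9Fe
print("X-Title:", rfc2047_b_word(title))

# Both forms are ASCII on the wire and round-trip to the original text.
assert unquote(percent_encode_utf8(title)) == title
[(raw, charset)] = decode_header(rfc2047_b_word(title))
assert raw.decode(charset) == title
```

Either form keeps the header field 7-bit; the difference is mainly which existing parsers already understand it.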

>NAK, but FWIW Harald was the USEFOR WG Chair when what is 
>now RFC.usefor-usefor was Last Called and approved.  He did
>not insist on allowing UTF-8 in NetNews header fields, and
>several years of the USEFOR WG were wasted pulling this
>feature out of an earlier set of Usefor drafts, because it
>would break the entire installed base.

That may have been the case for USEFOR. Can you show how
allowing e.g. new HTTP header fields to use UTF-8 would
break anything in the installed base?

>Of course RFC.usefor-usefor has I18N considerations, and
>you can use UTF-8 in NetNews roughly in the same way (MIME)
>as in mail, or roughly as in HTTP (NetNews is 8bit clean):

There is a big difference between MIME (ASCII+RFC 2047)
and HTTP (iso-8859-1+RFC 2047).
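The practical effect of that difference can be modelled in a few lines. In this minimal Python sketch, the helpers `read_as_ascii` and `read_as_latin1` are hypothetical and apply each rule strictly: MIME header fields are 7-bit ASCII, while RFC 2616 (section 2.2) lets header TEXT default to ISO-8859-1. Raw UTF-8 octets are rejected by the first reader but accepted, as mojibake, by the second.

```python
# Octets a sender would put on the wire for a UTF-8 header value.
raw = "Grüße".encode("utf-8")


def read_as_ascii(octets: bytes):
    """Hypothetical MIME-style reader: header octets must be 7-bit ASCII."""
    try:
        return octets.decode("ascii")
    except UnicodeDecodeError:
        return None  # 8-bit octets are simply not allowed


def read_as_latin1(octets: bytes) -> str:
    """Hypothetical HTTP/1.1-style reader: header TEXT defaults to ISO-8859-1."""
    return octets.decode("iso-8859-1")


print(read_as_ascii(raw))          # None
print(repr(read_as_latin1(raw)))   # 'GrÃ¼Ã\x9fe'

# Python's Latin-1 codec maps every octet, so no information is lost:
# re-encoding yields the original octets, which a UTF-8-aware reader can decode.
assert read_as_latin1(raw).encode("iso-8859-1").decode("utf-8") == "Grüße"
```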

>But NOT "as is" in header fields (NetNews uses RFC 2231, an
>update to RFC 2047 that also supports language tags, for
>this purpose).  Another problem with percent-encoded UTF-8 is
>that it offers no way to indicate language (BCP 47 tags).
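The language-tag point can be seen in the shape of an RFC 2231 extended value, `charset'language'percent-encoded-octets`, compared with bare percent-encoded UTF-8. Below is a minimal Python sketch, assuming a hypothetical parameter name `title` and the tag `de`. RFC 2231 also allows a tag inside an encoded-word (for example `=?UTF-8*de?...?=`); only the parameter form is shown here.

```python
from urllib.parse import quote, unquote


def rfc2231_ext_value(text: str, language: str, charset: str = "utf-8") -> str:
    """Build an RFC 2231 extended value: charset'language'percent-encoded octets."""
    encoded = quote(text, safe="", encoding=charset)
    return f"{charset}'{language}'{encoded}"


def parse_rfc2231_ext_value(value: str) -> tuple[str, str, str]:
    """Split an extended value back into (charset, language, text)."""
    charset, language, encoded = value.split("'", 2)
    return charset, language, unquote(encoded, encoding=charset)


# Bare percent-encoded UTF-8: the octets survive, but there is no slot for a tag.
plain = quote("Grüße", safe="")

# RFC 2231 carries a language tag next to the same octets ("title" is hypothetical).
param = "title*=" + rfc2231_ext_value("Grüße", "de")

print(plain)  # Gr%C3%BC%C3%9Fe
print(param)  # title*=utf-8'de'Gr%C3%BC%C3%9Fe
print(parse_rfc2231_ext_value(param.split("=", 1)[1]))
```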

I have yet to see a case where the absence of language information
in a header is a problem in practice. Do you know of any?

Regards,   Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
