Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

At 15:56 07/08/20, Keith Moore wrote:
>Mark Nottingham wrote:
>> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>>> - RFC 2616 prescribes that headers containing non-ASCII have to use
>>>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>>>   not necessarily followed. At the least, new extensions should be
>>>   allowed to specify that UTF-8 is used.
>>
>> My .02;
>>
>> I'm concerned about allowing UTF-8; it may break existing
>> implementations.
>concur.  though at least it is possible to distinguish utf-8 from 8859-1. 

Indeed, in practice this can be done with high reliability; please
see http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf
for details. For iso-8859-1, see in particular p. 21.
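
As a rough illustration of the kind of check involved (a minimal
sketch, not taken from the paper; the function names are hypothetical):
a byte string that contains non-ASCII bytes and still decodes as strict
UTF-8 is treated as UTF-8, and everything else falls back to
iso-8859-1. The heuristic can misfire on iso-8859-1 text that happens
to form a valid UTF-8 sequence (e.g. the bytes C3 A9), which is rare in
real text.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw has non-ASCII bytes and is valid UTF-8."""
    if raw.isascii():
        # Pure ASCII is identical in both encodings; nothing to decide.
        return False
    try:
        raw.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: prefer UTF-8, fall back to iso-8859-1."""
    if looks_like_utf8(raw):
        return raw.decode("utf-8")
    # iso-8859-1 maps every byte to a character, so this cannot fail.
    return raw.decode("iso-8859-1")
```

For example, decode_header_value() gives the same string for
"Dürst" whether the header bytes were sent as UTF-8 or as iso-8859-1.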

>also, I'll note that supporting utf-8 in a way that is backward
>compatible with existing implementations is almost certainly more
>complex (and thus more costly, error-prone, etc) than supporting rfc 2047.

Well, if "backwards compatible" means also supporting RFC 2047,
then that's a tautology. If the choice is between UTF-8 and RFC 2047,
however, then I'd take UTF-8 any time, because RFC 2047 includes
UTF-8 as well as many other encodings, so an RFC 2047 implementation
has to handle UTF-8 anyway.
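
To make the point concrete, here is a small sketch using Python's
standard email.header module (the helper name decode_rfc2047 is made
up for illustration). Each RFC 2047 encoded word carries its own
charset label, so a decoder has to be prepared for UTF-8 and for every
other charset a sender may pick.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Hypothetical helper: decode RFC 2047 encoded words in a header value."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # The charset comes from the encoded word itself; unlabeled
            # byte chunks are the unencoded (ASCII) parts of the value.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # decode_header returns str when there are no encoded words.
            parts.append(chunk)
    return "".join(parts)
```

The same name can arrive as =?UTF-8?Q?D=C3=BCrst?= or as
=?ISO-8859-1?Q?D=FCrst?=, and the receiver has to cope with both.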

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Monday, 20 August 2007 08:21:21 UTC