
Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 20 Aug 2007 17:10:27 +0900
To: Keith Moore <moore@cs.utk.edu>, Mark Nottingham <mnot@mnot.net>
Cc: Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>

At 15:56 07/08/20, Keith Moore wrote:
>Mark Nottingham wrote:
>> On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
>>> - RFC 2616 prescribes that headers containing non-ASCII have to use
>>>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>>>   not necessarily followed. At the least, new extensions should be
>>>   allowed to specify that UTF-8 is used.
>> My .02;
>> I'm concerned about allowing UTF-8; it may break existing
>> implementations.
>concur.  though at least it is possible to distinguish utf-8 from 8859-1. 

In practice, this can indeed be done with high reliability; please
see http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf
for details. For iso-8859-1, see in particular p. 21.
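As a rough illustration of the kind of check meant here (a minimal sketch, not the method from the paper): a header value that is pure ASCII needs no decision, one that decodes as strict UTF-8 is taken to be UTF-8, and anything else falls back to iso-8859-1. The function names are hypothetical. The heuristic can misfire on iso-8859-1 text that happens to form valid UTF-8, such as the two characters "Ã©" (bytes C3 A9, which are also UTF-8 for "é"); such sequences are uncommon in real text.

```python
def guess_header_charset(raw: bytes) -> str:
    """Heuristically classify a raw header value (hypothetical helper)."""
    if raw.isascii():
        # Plain ASCII is valid in both encodings; nothing to decide.
        return "us-ascii"
    try:
        # Strict decoding rejects malformed or truncated multibyte sequences.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Typical iso-8859-1 text (an accented letter followed by an ASCII
        # letter) is not well-formed UTF-8, so fall back to iso-8859-1.
        return "iso-8859-1"
    return "utf-8"


def decode_header_value(raw: bytes) -> str:
    """Decode a raw header value using the guessed charset (hypothetical)."""
    return raw.decode(guess_header_charset(raw))


print(decode_header_value("Dürst".encode("utf-8")))
print(decode_header_value("Dürst".encode("iso-8859-1")))
```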

>also, I'll note that supporting utf-8 in a way that is backward
>compatible with existing implementations is almost certainly more
>complex (and thus more costly, error-prone, etc) than supporting rfc 2047.

Well, if "backwards compatible" also means supporting RFC 2047,
then that's a tautology. If the choice is between UTF-8 and RFC 2047,
however, I'd take UTF-8 any time, because RFC 2047 includes
UTF-8 as well as many other encodings, so a complete RFC 2047
implementation has to handle UTF-8 anyway.
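To make the comparison concrete, here is a small sketch using Python's standard `email.header` module (just one decoder among many; the wrapper names are hypothetical). An RFC 2047 decoder has to parse encoded-words of the form `=?charset?encoding?text?=`, where the charset may be UTF-8 or any other registered charset and the encoding is either B (base64) or Q (a quoted-printable variant); raw UTF-8 needs only a single decode.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    # decode_header splits the value into (bytes, charset) chunks; each
    # encoded-word may use its own charset and either B or Q encoding.
    # make_header reassembles the chunks so str() yields Unicode text.
    return str(make_header(decode_header(value)))


def decode_raw_utf8(raw: bytes) -> str:
    # With raw UTF-8 in the header, one strict decode is all that is needed.
    return raw.decode("utf-8")


examples = [
    "=?UTF-8?Q?D=C3=BCrst?=",     # UTF-8 charset, Q encoding
    "=?ISO-8859-1?B?RPxyc3Q=?=",  # iso-8859-1 charset, B (base64) encoding
]
for ex in examples:
    print(decode_rfc2047(ex))
print(decode_raw_utf8("Dürst".encode("utf-8")))
```

Both encoded-words above carry the same name, once as UTF-8 in Q encoding and once as iso-8859-1 in B encoding, which shows that an RFC 2047 decoder must already cope with UTF-8 plus every other charset and both transfer encodings.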

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 20 August 2007 08:21:21 UTC
