Re: Character encodings in headers [i74][was: Straw-man charter for http-bis] from Keith Moore on 2007-08-21 (ietf-http-wg@w3.org from July to September 2007)

From: Keith Moore <moore@cs.utk.edu>
Date: Tue, 21 Aug 2007 12:51:01 -0400
To: Stefanos Harhalakis <v13@priest.com>
CC: Martin Duerst <duerst@it.aoyama.ac.jp>, Paul Hoffman <phoffman@imc.org>, Felix Sasaki <fsasaki@w3.org>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <46CB17F5.8070702@cs.utk.edu>

>
> My 2c:
>
>   UTF-8 introduces a requirement that ISO8859-X encodings don't have. UTF-8 
> strings may be invalid, in which case a proper action may be needed (drop ?). 
> Thus, all UTF-8 strings need to be validated.
>   
no.  the last thing we need in HTTP (or any protocol IMHO) is for
intermediaries to try to be smarter than their endpoints.
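
For readers unsure what "invalid UTF-8" means in the quoted claim, here is a minimal sketch. Every byte sequence decodes as ISO-8859-1, but not every byte sequence decodes as UTF-8. The helper name is_valid_utf8 and the sample bytes are made up for illustration. The sketch only shows what a check at an endpoint could look like; it does not argue that intermediaries should run one.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xE9 is "e acute" in ISO-8859-1, but in UTF-8 it is the lead byte of a
# three-byte sequence, so on its own at the end of a string it is invalid.
latin1_bytes = b"caf\xe9"
utf8_bytes = b"caf\xc3\xa9"

# ISO-8859-1 maps all 256 byte values, so this decode cannot fail.
as_latin1 = latin1_bytes.decode("iso-8859-1")

latin1_is_utf8 = is_valid_utf8(latin1_bytes)  # False
utf8_is_utf8 = is_valid_utf8(utf8_bytes)      # True
```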
>   Apart from that, implementations may do various tricks like logging etc, 
> where:
> a) strlen() is used - not unicode aware
>   
strlen works the same for utf-8 as for ascii, as long as what you care
about is number of bytes in the string rather than, say, the amount of
space it will take up when displayed.
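
A small sketch of the distinction, in Python rather than C. len() on the encoded bytes plays the role of strlen() for a NUL-free buffer. The sample string and the display_columns helper are assumptions made for illustration, and the column count is only a rough approximation (it ignores combining marks and terminal quirks).

```python
import unicodedata

# Sample value: "cafe" with an acute accent, a space, then two CJK characters.
header_value = "caf\u00e9 \u65e5\u672c"

utf8_bytes = header_value.encode("utf-8")

# Byte count: what strlen() would report for this UTF-8 buffer.
byte_count = len(utf8_bytes)

# Code point count: what a Unicode-aware length function would report.
code_point_count = len(header_value)


def display_columns(text: str) -> int:
    """Rough terminal width: wide and fullwidth characters take two columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# For pure ASCII all three measures agree, which is why byte-oriented
# logging code keeps working as long as it only needs the size in bytes.
ascii_value = "text/html"
ascii_bytes = len(ascii_value.encode("utf-8"))
```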
> b) iconv() is used to convert ISO8859-1 to UTF-8 either for presentation or 
> for internal storage (python or java perhaps?)
valid point.
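
One way to see why this matters, sketched in Python with its built-in codecs instead of iconv(): code that assumes its input is ISO-8859-1 and converts it to UTF-8 will silently double-encode input that was already UTF-8. The helper name latin1_to_utf8 and the sample bytes are made up for illustration, and this is one reading of the concern, not necessarily the only one.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert bytes assumed to be ISO-8859-1 into UTF-8 (hypothetical helper)."""
    return raw.decode("iso-8859-1").encode("utf-8")


# A header value that really is ISO-8859-1: 0xE9 is "e acute".
latin1_header = b"caf\xe9"
converted = latin1_to_utf8(latin1_header)  # b"caf\xc3\xa9", correct UTF-8

# The same conversion applied to input that is already UTF-8. It cannot
# fail, because every byte is valid ISO-8859-1, but each byte of the
# two-byte sequence is re-encoded separately and the text is mangled.
double_encoded = latin1_to_utf8(converted)  # b"caf\xc3\x83\xc2\xa9"

mangled_text = double_encoded.decode("utf-8")
```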

Keith

Received on Tuesday, 21 August 2007 16:51:49 UTC