Re: Character encodings in headers [i74][was: Straw-man charter for http-bis] from Stefanos Harhalakis on 2007-08-20 (ietf-http-wg@w3.org from July to September 2007)

From: Stefanos Harhalakis <v13@priest.com>
Date: Mon, 20 Aug 2007 16:52:26 +0300
To: Martin Duerst <duerst@it.aoyama.ac.jp>
Cc: Mark Nottingham <mnot@mnot.net>, John C Klensin <john-ietf@jck.com>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>
Message-Id: <200708201652.26863.v13@priest.com>

On Monday 20 August 2007, Martin Duerst wrote:
> At 17:55 07/08/20, Mark Nottingham wrote:
> >The (potential) problem is that an intermediary (for example) needs
> >to be able to handle headers that it doesn't understand. If it's been
> >built to store headers as iso-8859-1 strings as they pass through (a
> >reasonable assumption, considering 2616), an unknown header with
> >another encoding -- no matter how specified or flagged -- may break it.
>
> I think you present a valid scenario. However, storing headers as
> iso-8859-1 essentially means storing (and resending) them as bytes.
> If such an implementation gets UTF-8, it will just store and
> resend that as iso-8859-1, which means store and resend as bytes,
> which, from the viewpoint of that implementation, will be GIGO,
> but overall, will not cause any damage.
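Martin's point, that storing and resending a header "as iso-8859-1" is really storing and resending bytes, can be checked with a minimal Python sketch. The function names are hypothetical, and Python's codecs stand in for whatever an intermediary actually does internally:

```python
def store_as_latin1(raw: bytes) -> str:
    # ISO-8859-1 maps every byte 0x00-0xFF to exactly one code point
    # (U+0000-U+00FF), so decoding never fails, whatever the bytes are.
    return raw.decode("iso-8859-1")


def resend_as_latin1(stored: str) -> bytes:
    # Encoding back to ISO-8859-1 reverses that mapping byte for byte.
    return stored.encode("iso-8859-1")


wire = b"caf\xc3\xa9"                    # "café" encoded as UTF-8
stored = store_as_latin1(wire)           # reads as "cafÃ©": garbage as text ...
assert resend_as_latin1(stored) == wire  # ... but the bytes are resent intact
```

The round trip holds for any byte string, including `bytes(range(256))`, so under this assumption the garbage only shows up in components that interpret the stored value as text.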

My 2c:

  UTF-8 introduces a requirement that ISO-8859-X encodings don't have: a UTF-8
string may be invalid (contain malformed byte sequences), in which case some
defined action is needed (drop the header?). Thus, every UTF-8 string needs to
be validated.
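A minimal sketch of that validation step in Python, assuming the chosen action is to drop a header whose value is not valid UTF-8 (the message leaves the action open; `decode_utf8_header` is a hypothetical name):

```python
from typing import Optional


def decode_utf8_header(raw: bytes) -> Optional[str]:
    """Return the decoded header value, or None if raw is not valid UTF-8.

    Returning None stands in for "drop the header"; that policy is an
    assumption, not something HTTP specifies.
    """
    try:
        # The strict error handler rejects malformed sequences, overlong
        # encodings (e.g. b"\xc0\x80") and encoded surrogates.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


ok = decode_utf8_header(b"caf\xc3\xa9")   # "café" in UTF-8: accepted
bad = decode_utf8_header(b"caf\xe9")      # "café" in ISO-8859-1: None
```

Note the asymmetry the message points at: the second input is a perfectly valid ISO-8859-1 string, but it fails as UTF-8, so a UTF-8-aware implementation has to check every value it handles.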

  Apart from that, implementations may do various things with header values,
such as logging, where:
a) strlen() is used, which counts bytes rather than characters (it is not
Unicode-aware)
b) iconv() is used to convert ISO-8859-1 to UTF-8, either for presentation or
for internal storage (Python or Java, perhaps?)
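Both pitfalls can be reproduced in a short Python sketch. Here `len()` on a bytes object stands in for C's strlen() on a buffer without NUL bytes, and decoding as ISO-8859-1 then encoding as UTF-8 mimics `iconv -f ISO-8859-1 -t UTF-8`; the helper names are hypothetical:

```python
def byte_length(raw: bytes) -> int:
    # Like strlen() on a NUL-free buffer: counts bytes, not characters.
    return len(raw)


def char_length(raw: bytes) -> int:
    # Counts Unicode code points after decoding the value as UTF-8.
    return len(raw.decode("utf-8"))


def latin1_to_utf8(raw: bytes) -> bytes:
    # Treat each byte as one ISO-8859-1 character and re-encode it as
    # UTF-8, as `iconv -f ISO-8859-1 -t UTF-8` would.
    return raw.decode("iso-8859-1").encode("utf-8")


value = b"caf\xc3\xa9"          # "café" in UTF-8
n_bytes = byte_length(value)    # 5
n_chars = char_length(value)    # 4
logged = latin1_to_utf8(value)  # b"caf\xc3\x83\xc2\xa9", shown as "cafÃ©"
```

In case (b) a value that was already UTF-8 gets double-encoded, so logs or internal storage end up holding mojibake even though the bytes on the wire were correct.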

Received on Monday, 20 August 2007 13:53:05 UTC