- From: Stefanos Harhalakis <v13@priest.com>
- Date: Mon, 20 Aug 2007 16:52:26 +0300
- To: Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: Mark Nottingham <mnot@mnot.net>, John C Klensin <john-ietf@jck.com>, Richard Ishida <ishida@w3.org>, Apps Discuss <discuss@apps.ietf.org>, Felix Sasaki <fsasaki@w3.org>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Paul Hoffman <phoffman@imc.org>
On Monday 20 August 2007, Martin Duerst wrote:
> At 17:55 07/08/20, Mark Nottingham wrote:
> >The (potential) problem is that an intermediary (for example) needs
> >to be able to handle headers that it doesn't understand. If it's been
> >built to store headers as iso-8859-1 strings as they pass through (a
> >reasonable assumption, considering 2616), an unknown header with
> >another encoding -- no matter how specified or flagged -- may break it.
>
> I think you present a valid scenario. However, storing headers as
> iso-8859-1 essentially means storing (and resending) them as bytes.
> If such an implementation gets UTF-8, it will just store and
> resend that as iso-8859-1, which means store and resend as bytes,
> which, from the viewpoint of that implementation, will be GIGO,
> but overall, will not cause any damage.

My 2c:

UTF-8 introduces a requirement that ISO8859-X encodings don't have. UTF-8
strings may be invalid, in which case a proper action may be needed (drop ?).
Thus, all UTF-8 strings need to be validated.

Apart from that, implementations may do various tricks like logging etc,
where:

a) strlen() is used - not unicode aware
b) iconv() is used to convert ISO8859-1 to UTF-8 either for presentation or
   for internal storage (python or java perhaps?)
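
To make the three points above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the header value is made up, len() on a bytes object stands in for C's strlen(), and Python's encode()/decode() stand in for an iconv() call. The helper name is_valid_utf8 is hypothetical.

```python
# Hypothetical header value: "café" as a sender might put it on the wire in UTF-8.
raw = "caf\u00e9".encode("utf-8")        # five bytes: c a f 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Validation: any byte sequence is acceptable as ISO-8859-1, but not as UTF-8.
truncated = raw[:-1]                     # cut in the middle of a multi-byte character
truncated.decode("iso-8859-1")           # never raises
valid_full = is_valid_utf8(raw)
valid_truncated = is_valid_utf8(truncated)

# (a) A byte count, which is what strlen() reports, is not a character count.
byte_len = len(raw)
char_len = len(raw.decode("utf-8"))

# (b) Treating the bytes as ISO-8859-1 and converting them to UTF-8
#     double-encodes anything that was already UTF-8.
latin1_view = raw.decode("iso-8859-1")
double_encoded = latin1_view.encode("utf-8")

# Pure pass-through (store and resend as ISO-8859-1) keeps the bytes intact.
passthrough = latin1_view.encode("iso-8859-1")
```

With this input the pass-through case returns the original bytes unchanged, which matches Martin's point, while the conversion in (b) produces a longer, different byte sequence.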
Received on Monday, 20 August 2007 13:53:05 UTC