Re: Character encodings in headers [i74][was: Straw-man charter for http-bis]

On Monday 20 August 2007, Martin Duerst wrote:
> At 17:55 07/08/20, Mark Nottingham wrote:
> >The (potential) problem is that an intermediary (for example) needs
> >to be able to handle headers that it doesn't understand. If it's been
> >built to store headers as iso-8859-1 strings as they pass through (a
> >reasonable assumption, considering 2616), an unknown header with
> >another encoding -- no matter how specified or flagged -- may break it.
>
> I think you present a valid scenario. However, storing headers as
> iso-8859-1 essentially means storing (and resending) them as bytes.
> If such an implementation gets UTF-8, it will just store and
> resend that as iso-8859-1, which means store and resend as bytes,
> which, from the viewpoint of that implementation, will be GIGO,
> but overall, will not cause any damage.

My 2c:

  UTF-8 introduces a requirement that the ISO 8859-x encodings don't have: a
byte string may be invalid UTF-8, in which case some action has to be taken
(drop the header?). Thus, every header value treated as UTF-8 needs to be
validated.
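
  To illustrate the validation point, here is a rough Python sketch. The
function name is_valid_utf8 is made up for this example, and what to do with a
rejected value (drop, pass through, log) is left open, as above.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        # A strict decode rejects truncated sequences, stray continuation
        # bytes, and overlong forms.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_latin1(raw: bytes) -> str:
    """ISO 8859-1 maps each byte to one code point, so this never fails."""
    return raw.decode("iso-8859-1")


if __name__ == "__main__":
    print(is_valid_utf8(b"caf\xc3\xa9"))  # well-formed UTF-8
    print(is_valid_utf8(b"caf\xe9"))      # ISO 8859-1 bytes, not valid UTF-8
    print(decode_latin1(b"caf\xe9"))
```

  The asymmetry is that the ISO 8859-1 decode succeeds for any input, while
the UTF-8 decode can fail and so forces a decision.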

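  Apart from that, implementations may process header values in other ways,
such as logging, where:
a) strlen() is used, which counts bytes rather than characters, so it is not
Unicode-aware
b) iconv() is used to convert ISO 8859-1 to UTF-8, either for presentation or
for internal storage (Python or Java perhaps?); input that is already UTF-8
would then be encoded a second time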
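
  Both cases can be shown with a short Python sketch. This is only an
illustration: the helper names are invented, len() on bytes stands in for
strlen(), and a decode/encode pair stands in for an iconv() call from
ISO 8859-1 to UTF-8.

```python
def byte_and_char_lengths(value: str) -> tuple[int, int]:
    """Return (bytes on the wire as UTF-8, number of code points)."""
    wire = value.encode("utf-8")
    # len(wire) is what a byte-counting routine such as strlen() would see.
    return len(wire), len(value)


def convert_latin1_to_utf8(wire: bytes) -> bytes:
    """Stand-in for an ISO 8859-1 to UTF-8 conversion step.

    It assumes its input is ISO 8859-1. If the input is already UTF-8,
    each non-ASCII byte is encoded again.
    """
    return wire.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    print(byte_and_char_lengths("caf\u00e9"))
    already_utf8 = "caf\u00e9".encode("utf-8")
    print(convert_latin1_to_utf8(already_utf8))
```

  For the four-character value above, the UTF-8 form is 5 bytes, and running
it through the conversion yields 7 bytes that no longer decode to the original
text.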

Received on Monday, 20 August 2007 13:53:05 UTC