On Monday 20 August 2007, Martin Duerst wrote:
> At 17:55 07/08/20, Mark Nottingham wrote:
> >The (potential) problem is that an intermediary (for example) needs
> >to be able to handle headers that it doesn't understand. If it's been
> >built to store headers as iso-8859-1 strings as they pass through (a
> >reasonable assumption, considering 2616), an unknown header with
> >another encoding -- no matter how specified or flagged -- may break it.
>
> I think you present a valid scenario. However, storing headers as
> iso-8859-1 essentially means storing (and resending) them as bytes.
> If such an implementation gets UTF-8, it will just store and
> resend that as iso-8859-1, which means store and resend as bytes,
> which, from the viewpoint of that implementation, will be GIGO,
> but overall, will not cause any damage.

My 2c:

UTF-8 introduces a requirement that ISO8859-X encodings don't have.
UTF-8 strings may be invalid, in which case a proper action may be
needed (drop?). Thus, all UTF-8 strings need to be validated.

Apart from that, implementations may do various tricks like logging etc,
where:
a) strlen() is used - not Unicode aware
b) iconv() is used to convert ISO8859-1 to UTF-8 either for presentation
   or for internal storage (python or java perhaps?)

Received on Monday, 20 August 2007 13:53:05 GMT
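
To make the points in the message above concrete, here is a rough Python sketch. It is not taken from any real intermediary: the sample value and the helper name is_valid_utf8 are made up for illustration. It shows the lossless iso-8859-1 round trip, the validation step that UTF-8 requires, the byte-length versus character-count mismatch, and what happens when bytes that are already UTF-8 get converted "from ISO8859-1" a second time.

```python
# Hypothetical header value, encoded as UTF-8 the way it might arrive on the wire.
raw = "caf\u00e9".encode("utf-8")

# Storing as iso-8859-1 is a lossless byte round trip: every byte value maps
# to exactly one code point, so the bytes come back out unchanged (GIGO).
stored = raw.decode("iso-8859-1")
resent = stored.encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A byte count (what strlen() would report in C) is not a character count.
byte_len = len(raw)
char_len = len(raw.decode("utf-8"))

# Treating UTF-8 bytes as iso-8859-1 and converting them to UTF-8 again
# (the iconv() case) double-encodes every non-ASCII character.
double_encoded = raw.decode("iso-8859-1").encode("utf-8")
```

What to do with a value that fails validation (drop it, pass it through, log it) is a policy decision that this sketch leaves open.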