- From: Keith Moore <moore@cs.utk.edu>
- Date: Mon, 20 Aug 2007 15:29:46 -0400
- To: der Mouse <mouse@Rodents.Montreal.QC.CA>
- CC: discuss@apps.ietf.org, Felix Sasaki <fsasaki@w3.org>, ietf-http-wg@w3.org, Richard Ishida <ishida@w3.org>
der Mouse wrote:
>> I think you present a valid scenario. However, storing headers as
>> iso-8859-1 essentially means storing (and resending) them as bytes.
>>
>
> Depends on how much checking is done. The C0 and C1 ranges are not
> valid 8859-x text (except for a few codes in C0, like HT), but, as
> Clive points out, C1 does, in general, occur in UTF-8-encoded text.
>
> I recognize there's a "who would bother to check" tendency. While I
> share it, I also believe the number of distinct implementations out
> there is large enough that anything permitted by the spec has probably
> been done (and, of course, a great many things not permitted by the
> spec, but I see no reason to care about compatibility with them). In
> particular, any implementation whose native text encoding is not 8859-1
> may be recoding headers into its native encoding for storage and back
> again on output, and that is almost certain to corrupt C1 octets.

I suspect that the problem is not so much transparency as presentation.
The larger set of things broken by allowing UTF-8 in existing header
fields (and, to a lesser extent, in new fields) will not be things that
forbid C1 octet values, but rather things that try to display those
fields as if they were 8859/1. Translation of the presumed 8859/1 into
other charsets is another version of the same problem.
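The two failure modes in this exchange (recoding that destroys C1 octets, and displaying UTF-8 octets as if they were 8859-1) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the sample header value, the helper names `c1_octets`, `latin1_roundtrip` and `native_roundtrip`, and the use of cp1252 as a stand-in for some implementation's "native" encoding are all hypothetical choices, not anything taken from a real HTTP implementation.

```python
# Hypothetical illustration: UTF-8 header bytes handled by code that
# presumes ISO-8859-1 (Latin-1). Sample value and codec choices are assumptions.

def c1_octets(data: bytes) -> list[int]:
    """Return the octets that fall in the C1 control range 0x80-0x9F."""
    return [b for b in data if 0x80 <= b <= 0x9F]

def latin1_roundtrip(data: bytes) -> bytes:
    """Store as Latin-1 text and resend: each octet maps to one code point."""
    return data.decode("latin-1").encode("latin-1")

def native_roundtrip(data: bytes, native: str = "cp1252") -> bytes:
    """Recode presumed Latin-1 into a 'native' charset and back again.

    cp1252 is an assumed stand-in for a native encoding. It has no slots
    for C1 control characters, so errors="replace" turns them into '?'.
    """
    stored = data.decode("latin-1").encode(native, errors="replace")
    return stored.decode(native).encode("latin-1")

raw = "10 €, café".encode("utf-8")   # "€" (U+20AC) encodes as E2 82 AC
print(raw.hex(" "))
print(c1_octets(raw))                # 0x82 is a C1 octet inside UTF-8 text
print(latin1_roundtrip(raw) == raw)  # transparent: octets survive
print(native_roundtrip(raw) == raw)  # recoding loses the C1 octet
print(repr(raw.decode("latin-1")))   # what a Latin-1 display would show
```

The Latin-1 round trip is lossless because ISO-8859-1 assigns a code point to every octet value, which is why "storing headers as iso-8859-1" amounts to storing bytes. Whether C1 octets survive a trip through another charset depends on that charset and on the error handling chosen, and the mojibake in the last line is the presentation problem regardless of transparency.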
Received on Monday, 20 August 2007 19:30:28 UTC