- From: der Mouse <mouse@Rodents.Montreal.QC.CA>
- Date: Mon, 20 Aug 2007 14:37:13 +0000
- To: discuss@apps.ietf.org, Felix Sasaki <fsasaki@w3.org>, ietf-http-wg@w3.org, Richard Ishida <ishida@w3.org>
> I think you present a valid scenario. However, storing headers as
> iso-8859-1 essentially means storing (and resending) them as bytes.

Depends on how much checking is done. The C0 and C1 ranges are not
valid 8859-x text (except for a few codes in C0, like HT), but, as
Clive points out, C1 octets do, in general, occur in UTF-8-encoded
text.

I recognize there's a "who would bother to check" tendency. While I
share it, I also believe the number of distinct implementations out
there is large enough that anything permitted by the spec has probably
been done (and, of course, a great many things not permitted by the
spec, but I see no reason to care about compatibility with them). In
particular, any implementation whose native text encoding is not 8859-1
may be recoding headers into its native encoding for storage and back
again on output, and that is almost certain to corrupt C1 octets.

The only fix I can see for that is to do something like UTF-8, but
tweaked to keep all octets in the ISO-8859-x printable space. I've
been unable to come up with a way of doing this by just changing the
fixed bits in UTF-8; it seems to me to require putting only five
(rather than six) bits of data in the second and later octets. (I
suspect this wouldn't fly, simply because UTF-8 is too entrenched, but
it's the only way I can see to be strictly compatible. It also has the
disadvantage that part of the BMP needs four octets rather than the
three that UTF-8 needs.)

/~\ The ASCII                             der Mouse
\ / Ribbon Campaign
 X  Against HTML               mouse@rodents.montreal.qc.ca
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
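
For illustration only, here is a minimal sketch of one way the
five-bits-per-octet idea above could be laid out. The message gives no
bit layout, so everything concrete here is an assumption: continuation
octets are taken to be 101xxxxx (0xA0-0xBF), lead octets reuse UTF-8's
110xxxxx / 1110xxxx / 11110xxx / 111110xx prefixes, and the names
`encode`, `decode` and `_FORMS` are hypothetical. Overlong forms are
not rejected, which a real decoder would probably want to do.

```python
# Hypothetical layout (an assumption, not taken from the message):
#   ASCII (0x00-0x7F) is passed through unchanged.
#   Continuation octets are 101xxxxx (0xA0-0xBF): five data bits each.
#   Lead octets are 110xxxxx, 1110xxxx, 11110xxx, 111110xx (0xC0-0xFB).
# Every non-ASCII octet therefore lands in 0xA0-0xFF, never in C1.

# (largest code point, total octets, lead-octet prefix)
_FORMS = [
    (0x3FF, 2, 0xC0),     # 5 + 5  = 10 bits
    (0x3FFF, 3, 0xE0),    # 4 + 10 = 14 bits
    (0x3FFFF, 4, 0xF0),   # 3 + 15 = 18 bits
    (0x3FFFFF, 5, 0xF8),  # 2 + 20 = 22 bits
]


def encode(text: str) -> bytes:
    """Encode text using the hypothetical five-bit layout above."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
            continue
        for limit, length, prefix in _FORMS:
            if cp <= limit:
                tail = []
                for _ in range(length - 1):
                    # Peel off five bits at a time, lowest first.
                    tail.append(0xA0 | (cp & 0x1F))
                    cp >>= 5
                out.append(prefix | cp)
                out.extend(reversed(tail))
                break
    return bytes(out)


def decode(data: bytes) -> str:
    """Inverse of encode(); raises ValueError on malformed input."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, length = b, 1
        elif 0xC0 <= b <= 0xDF:
            cp, length = b & 0x1F, 2
        elif 0xE0 <= b <= 0xEF:
            cp, length = b & 0x0F, 3
        elif 0xF0 <= b <= 0xF7:
            cp, length = b & 0x07, 4
        elif 0xF8 <= b <= 0xFB:
            cp, length = b & 0x03, 5
        else:
            raise ValueError(f"unexpected octet 0x{b:02X} at offset {i}")
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(not 0xA0 <= t <= 0xBF for t in tail):
            raise ValueError(f"bad continuation after offset {i}")
        for t in tail:
            cp = (cp << 5) | (t & 0x1F)
        chars.append(chr(cp))
        i += length
    return "".join(chars)
```

Under this assumed layout, three octets reach only U+3FFF, so BMP
characters from U+4000 up take four octets (and code points above
U+3FFFF take five), which is the size penalty mentioned above. By
contrast, plain UTF-8 encodes U+0100 as C4 80, where 0x80 is a C1
octet; the sketch yields C8 A0 for the same character.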
Received on Monday, 20 August 2007 15:39:26 UTC