Re: Charset support (was: Accept-Charset support)
On Dec 16, 3:37pm, Martin J. Duerst wrote:
> Chris (and everybody who might have had any doubts or concerns
> about this): UTF-8 leaves all octet values between 00 and 7F
> untouched. Any ASCII character or C0 control character, converted
> to UTF-8, looks exactly the same as before.
Yes, I was aware of that part.
> And whatever exotic
> character you take from ISO 10646, there is never any chance
> that some of the octets that represent it in UTF-8 may
> be mistaken for C0 or ASCII,
Ah, thanks for the clarification.
> If we had such basic problems, I would
> never have dared to suggest UTF-8 in the first place.
I was somewhat surprised by what you suggested; I am glad to be
reassured that you have considered the string termination.
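Martin's point about octet values can be checked mechanically. The Python sketch below is an illustration added here, not code from the thread. `utf8_encode` and `multibyte_octets_all_high` are hypothetical helper names. The encoder assumes the later RFC 3629 limits (code points up to U+10FFFF, at most four octets) rather than the 31-bit, up-to-six-octet form current in 1996. In both forms every lead octet of a multi-octet sequence is 0xC0 or above and every continuation octet lies in 0x80-0xBF. A NUL or any other octet in 0x00-0x7F can therefore only ever stand for itself, which is why string termination is not at risk.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (RFC 3629 ranges); illustrative only."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable scalar value: U+{cp:04X}")
    if cp < 0x80:
        # ASCII and C0 controls: one octet, value unchanged.
        return bytes([cp])
    if cp < 0x800:
        # Lead octet 110xxxxx, then one continuation octet 10xxxxxx.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Lead octet 1110xxxx, then two continuation octets.
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Lead octet 11110xxx, then three continuation octets.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def multibyte_octets_all_high() -> bool:
    """Check every non-ASCII scalar value: no octet falls in 0x00-0x7F."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not encodable
        encoded = utf8_encode(cp)
        # Cross-check the sketch against Python's built-in codec.
        if encoded != chr(cp).encode("utf-8"):
            return False
        if any(octet < 0x80 for octet in encoded):
            return False
    return True


if __name__ == "__main__":
    print(utf8_encode(0x41))    # b'A'
    print(utf8_encode(0x65E5))  # b'\xe6\x97\xa5'
    print(multibyte_octets_all_high())  # True
```

The exhaustive loop takes a second or so; it is there to show the property holds for every encodable character, not just a few samples.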
> Chris:
> > Response codes are for human
> > debugging [...] multilingual response codes are really just
> > icing.
> Also, the http warnings might be the first place where anything
> except 7-bit is allowed *officially* in internet application protocol
> headers. Having such a lopsided spec as "ISO-8859-1 or RFC1522",
> at a place that is just made for UTF-8 (and for which UTF-8 was
> made), creates a very bad precedent.
I accept this argument.
> Accepting the argument that
> ISO-8859-1 was used for "consistency" also creates a very bad precedent.
Agreed.
> Overall, I think that if it is a small issue, there should not
> be much resistance getting it right. There seems to be virtually
> no installed base, and the current discussion has not shown
> any good arguments for ISO-8859-1. The main issues seem to be
> procedural concerns, on which I am open to any reasonable
> solution whatsoever (be it a last-minute change to the RFC
> on request of the wg, a separate RFC, a mutual understanding,
> or whatever).
As I understand it, the RFC has not been issued yet.
--
Chris (sorry my .sig is on the blink)