
Re: Charset support (was: Accept-Charset support)



On Dec 16,  3:37pm, Martin J. Duerst wrote:

> Chris (and everybody who might have had any doubts or concerns
> about this): UTF-8 leaves all octet values between 00 and 7F
> untouched. Any ASCII character or C0 control character, converted
> to UTF-8, looks exactly the same as before.

Yes, I was aware of that part.

> And whatever exotic
> character you take from ISO 10646, there is never any chance
> that some of the octets that represent it in UTF-8 may
> be mistaken for C0 or ASCII,

Ah, thanks for the clarification.

> If we had such basic problems, I would
> never have dared to suggest UTF-8 in the first place.

I was somewhat surprised by what you suggested; I am glad to be
reassured that you have considered the string termination.
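
For anyone who wants to check the property for themselves, here is a
minimal sketch in Python. The function name and the sample strings are
illustrative only and do not come from any spec. It assumes the text
contains no unpaired surrogates, which cannot be encoded as UTF-8.

```python
def utf8_is_ascii_transparent(text: str) -> bool:
    """Return True if UTF-8 leaves ASCII/C0 octets untouched and never
    emits an octet below 0x80 for a non-ASCII character in `text`."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII and C0 controls must encode to the same single octet.
            if encoded != bytes([ord(ch)]):
                return False
        else:
            # Every octet of a multi-octet sequence must have the high bit
            # set, so none can be mistaken for NUL, a C0 control, or ASCII.
            if any(octet < 0x80 for octet in encoded):
                return False
    return True


# Hypothetical samples: a NUL-terminated ASCII string and some non-ASCII text.
print(utf8_is_ascii_transparent("Warning: 10\x00"))
print(utf8_is_ascii_transparent("D\u00fcrst \u65e5\u672c\u8a9e"))
```

In particular, a zero octet in the encoded form can only come from U+0000
itself, which is the string termination concern.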

> Chris:
> > Response codes are for human
> > debugging [...]  multilingual response codes are really just
> > icing.

> Also, the http warnings might be the first place where anything
> except 7-bit is allowed *officially* in internet application protocol
> headers. Having such a lopsided spec as "ISO-8859-1 or RFC1522",
> at a place that is just made for UTF-8 (and for which UTF-8 was
> made), creates a very bad precedent.

I accept this argument.

> Accepting the argument that
> ISO-8859-1 was used for "consistency" also creates a very bad precedent.

Agreed.


> Overall, I think that if it is a small issue, there should not
> be much resistance getting it right. There seems to be virtually
> no installed base, and the current discussion has not shown
> any good arguments for ISO-8859-1. The main issues seem to be
> procedural concerns, on which I am open to any reasonable
> solution whatsoever (be it a last-minute change to the RFC
> on request of the wg, a separate RFC, a mutual understanding,
> or whatever).

As I understand it, the RFC has not been issued yet.

--
Chris (sorry my .sig is on the blink)

