- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Mon, 16 Dec 1996 15:37:06 +0100 (MET)
- To: Chris Lilley <Chris.Lilley@sophia.inria.fr>
- cc: Klaus Weide <kweide@tezcat.com>, www-international@w3.org
On Mon, 16 Dec 1996, Chris Lilley wrote:

> On Dec 15, 1:49pm, Klaus Weide wrote:
>
> > On Fri, 13 Dec 1996, Martin J. Duerst wrote:
>
> > > Then let's make this file (and a little bit of code to
> > > extract the desired warning) available to implementors,
> > good so far
>
> > > and let's ask them to just send the strings out as is,
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > and just silently ignore the antiquated ISO-8859-1 default
> > ^^^^^^^^^^^^^^^^^^^^
> > > for warnings, and silently change that to UTF-8.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is it possible for any Unicode character, when converted to
> UTF-8, to contain a byte which is 0D or 0A? I suspect, looking at
> the UTF-8 algorithm, that this is the case. How then should a
> compliant HTTP/1.1 implementation tell when the HTTP reason code is
> finished, if it can contain bytes that look like CR and LF?

Chris (and everybody who might have had any doubts or concerns
about this): UTF-8 leaves all octet values between 00 and 7F
untouched. Any ASCII character or C0 control character, converted
to UTF-8, looks exactly the same as before. And whatever exotic
character you take from ISO 10646, there is never any chance that
some of the octets that represent it in UTF-8 may be mistaken for
C0 or ASCII, even if your UTF-8 parser gets hopelessly out of sync.
If we had such basic problems, I would never have dared to suggest
UTF-8 in the first place.

> > If you think that
> > part of it is really unacceptable, you should try to take that up with
> > the http-wg
>
> Yes, there is always room for a compelling and well argued case. However,
> is this really the number one i18N issue? Response codes are for human
> debugging; it is probably more important to ensure that multilingual
> content can be delivered, multilingual response codes are really just
> icing.

I agree that it's not the number one i18n issue. That's probably
why it has not received serious attention.
On the other hand, I think that for i18n, we have to stop letting
bad or antiquated design go by without being concerned. Also, the
http warnings might be the first place where anything except 7-bit
is allowed *officially* in internet application protocol headers.
Having such a lopsided spec as "ISO-8859-1 or RFC1522", at a place
that is just made for UTF-8 (and for which UTF-8 was made), creates
a very bad precedent. Accepting the argument that ISO-8859-1 was
used for "consistency" also creates a very bad precedent.

Overall, I think that if it is a small issue, there should not be
much resistance to getting it right. There seems to be virtually no
installed base, and the current discussion has not shown any good
arguments for ISO-8859-1. The main issues seem to be procedural
concerns, on which I am open to any reasonable solution whatsoever
(be it a last-minute change to the RFC on request of the wg, a
separate RFC, a mutual understanding, or whatever).

Regards, Martin.
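
The UTF-8 property discussed in this message (octets 00 to 7F only ever stand for themselves, so CR and LF cannot appear inside the encoding of another character) can be checked exhaustively. The following is a minimal sketch in Python, not part of the original message. The function names and the sample header text are hypothetical, and the check covers only the code point range that Python's codec accepts (U+0000 to U+10FFFF, minus surrogates).

```python
def utf8_octets_are_safe() -> bool:
    """Check that UTF-8 never reuses octets 00..7F for non-ASCII characters."""
    # ASCII and C0 control characters encode to the identical single octet.
    for cp in range(0x80):
        if chr(cp).encode("utf-8") != bytes([cp]):
            return False
    # Every other code point must encode to octets in the range 80..FF only.
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def find_line_end(raw: bytes) -> int:
    """Return the offset of the first CRLF, scanning raw octets without decoding."""
    return raw.find(b"\r\n")


# Illustrative header text only; not taken from any specification.
text = 'Warning: 10 proxy "Réponse périmée"'
raw = text.encode("utf-8") + b"\r\nNext-Header: x\r\n"

print(utf8_octets_are_safe())
print(raw[:find_line_end(raw)].decode("utf-8") == text)
```

Because the terminator search works on raw octets, a header parser written this way needs no knowledge of UTF-8 to find the end of the line.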
Received on Monday, 16 December 1996 09:37:38 UTC