- From: Francois Yergeau <yergeau@alis.com>
- Date: Thu, 12 Dec 1996 22:15:30 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: mduerst@ifi.unizh.ch, Drazen.Kacar@public.srce.hr, Chris.Lilley@sophia.inria.fr, www-international@w3.org, Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com, bobj@netscape.com, wjs@netscape.com, erik@netscape.com, Ed_Batutis/CAM/Lotus@crd.lotus.com
At 12:31 12-12-96 PST, Larry Masinter wrote:
>The exception for ISO-8859-1 for warning messages in HTTP is based on
>the fact that there is an exception for ISO-8859-1 for text documents,
>and that it made no sense for the protocol to be inconsistent.

Do you mean that unsupported, fictitious statement that ISO 8859-1 is
somehow the default for text documents transmitted through HTTP? I hope
everyone on this list realizes that this is wrong: everybody and his
brother sets his browser to default to whatever charset is appropriate
to his own language, and every server assumes the data it gets from a
form is in the same charset as the form itself. Web history
notwithstanding.

Or do you mean the statement that one can assume that all HTTP clients
support ISO 8859-1? Again, this is patently false; try Lynx on a
non-Latin-1 terminal. I voiced those concerns at the HTTP-WG meeting at
the last IETF, starting what was later called the "charset flap". It
ended when Roy Fielding's poorly thought-out assertions overruled my
verifiable arguments.

The argument of being consistent with a false statement about a
fictitious, universally disregarded default seems rather weak to me.
IMHO, the love affair with 8859-1 is due to the fact that it neatly
solved the incompatible charset problem that the Web faced in the
early, Western-language-only days. The problem has now come back on a
larger, global scale, and we have to face it, not stick stubbornly to a
solution that evidently doesn't work at that scale.

>heuristic. The 12-byte overhead for the "=?UTF-8?Q?" and "?=" suffix
>in the warning message isn't so big,

It's much more than 12 bytes: the Q means you have to encode each
non-ASCII byte into 3 ASCII bytes. A ten-kanji warning that comes out
at 30 bytes in plain UTF-8 grows to 102 in RFC 1522'ed UTF-8. RFC 1522
(or rather its replacement, RFC 2047) would need an 8-bit mode, but it
belongs to the mail folks, who do not seem to have noticed that it is
used by other, 8-bit protocols.

>and isn't really "Clogging up the
>8-bit channel".

It's ISO 8859-1, not RFC 1522, that's clogging up the 8-bit channel.
Spec'ing 8859-1 is an obstacle to proper i18n, since it cannot be
*completely reliably* distinguished from UTF-8, as you argue yourself.
Anyway, UTF-8 is illegal in the current spec, despite the IAB charset
committee's recommendation.

>Perhaps by the time Unicode is widespread -- in the next 3-5 years --
>we'll have a new version of HTTP 2.x or HTTP-NG. I would certainly
>propose that in the future, new versions of HTTP default to UTF-8.

Why do I have this nagging feeling of a tendency to push i18n issues
away to the indefinite future?

Regards,

--
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561
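[To make the 30-to-102-byte arithmetic above concrete, here is a
minimal Python sketch. It is simplified to Q-encode every byte as =XX;
real RFC 2047 Q encoding passes most printable ASCII through, but for
an all-kanji warning the result is the same. The ten-kanji sample
string is illustrative only.]

    # Minimal sketch of RFC 1522/2047 Q encoding, simplified to encode
    # every byte as =XX (real Q encoding passes printable ASCII through).
    def q_encode(text):
        body = "".join("=%02X" % b for b in text.encode("utf-8"))
        return "=?UTF-8?Q?" + body + "?="

    warning = "\u6F22" * 10                  # ten kanji, 3 UTF-8 bytes each
    print(len(warning.encode("utf-8")))      # 30 bytes in plain UTF-8
    print(len(q_encode(warning)))            # 102 bytes once Q-encoded

[The 102 bytes break down as 10 bytes of "=?UTF-8?Q?" prefix, 90 bytes
of =XX triplets for the 30 UTF-8 bytes, and 2 bytes of "?=" suffix.]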
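[As for telling the two charsets apart, a sketch of the usual
decode-and-see heuristic shows why the distinction can only ever be
probabilistic: every byte sequence is valid ISO 8859-1, so a successful
UTF-8 decode proves nothing by itself. The function name is made up for
illustration.]

    # Heuristic sketch: a byte string that decodes as UTF-8 is probably
    # UTF-8, but every byte string is also valid ISO 8859-1, so the test
    # can never be completely reliable.
    def looks_like_utf8(data):
        try:
            data.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    print(looks_like_utf8("é".encode("utf-8")))    # True: bytes C3 A9
    print(looks_like_utf8("é".encode("latin-1")))  # False: lone E9 byte
    # The Latin-1 text "Ã©" is the bytes C3 A9, which also decode as
    # the UTF-8 text "é" -- hence the ambiguity.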
Received on Thursday, 12 December 1996 22:21:38 UTC