Re: Accept-Charset support from Francois Yergeau on 1996-12-13 (www-international@w3.org from October to December 1996)

From: Francois Yergeau <yergeau@alis.com>
Date: Thu, 12 Dec 1996 22:15:30 -0500
To: Larry Masinter <masinter@parc.xerox.com>
Cc: mduerst@ifi.unizh.ch, Drazen.Kacar@public.srce.hr, Chris.Lilley@sophia.inria.fr, www-international@w3.org, Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com, bobj@netscape.com, wjs@netscape.com, erik@netscape.com, Ed_Batutis/CAM/Lotus@crd.lotus.com
Message-Id: <2.2.32.19961213031530.0070b9ac@genstar.alis.ca>

At 12:31 12-12-96 PST, Larry Masinter wrote:
>The exception for ISO-8859-1 for warning messages in HTTP is based on
>the fact that there is an exception for ISO-8859-1 for text documents,
>and that it made no sense for the protocol to be inconsistent.

Do you mean that unsupported, fictitious statement that ISO 8859-1 is
somehow the default for text documents transmitted through HTTP?  I hope
everyone on this list realizes that this is wrong: everybody and his
brother sets his browser to default to whatever charset suits his
language, and every server assumes that the data it gets from a form is in
the same charset as the form itself.  Web history notwithstanding.

Or do you mean the statement that one can assume that all HTTP clients
support ISO 8859-1?  Again, this is patently false; try Lynx on a
non-Latin-1 terminal.

I voiced those concerns at the HTTP-WG meeting at the last IETF, starting
what was later called the "charset flap".  It ended when Roy Fielding's
poorly thought-out assertions overruled my verifiable arguments.

The argument of being consistent with a false statement about a fictitious,
universally disregarded default seems rather weak to me.

IMHO, the love affair with 8859-1 is due to the fact that it neatly solved
the incompatible charset problem that the Web faced in the early,
Western-language-only days.  The problem has now come back on a larger,
global scale and we have to face it, not stick stubbornly to a solution that
evidently doesn't work at that scale.

>heuristic. The 12-byte overhead for the "=?UTF-8?Q?" and "?=" suffix
>in the warning message isn't so big, 

It's much more than 12 bytes: the Q means you have to encode each 8-bit
byte as 3 ASCII bytes.  A ten-Kanji warning that comes out at 30 bytes in
plain UTF-8 grows to 102 in RFC1522'ed UTF-8 (90 for the encoded bytes plus
the 12 of overhead).  RFC1522 (or rather its replacement, RFC2047) would
need an 8-bit mode, but it belongs to the mail folks, who do not seem to
have noticed that it is used by other, 8-bit protocols.
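
The byte counts above can be checked with a minimal Python sketch. The
helper name q_encode_word and the sample Kanji string are hypothetical,
chosen only for illustration, and the sketch is a simplification: it emits
one encoded-word and ignores the RFC 2047 limit of 75 characters per
encoded-word, which would force splitting and add further overhead.

```python
def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Wrap text in a single RFC 2047-style Q encoded-word (simplified)."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)              # ASCII letters and digits pass through
        elif ch == " ":
            out.append("_")             # a space is written as an underscore
        else:
            out.append(f"={byte:02X}")  # every other byte becomes =XX
    return f"=?{charset}?Q?{''.join(out)}?="


# Hypothetical sample: ten Kanji, each three bytes long in UTF-8.
warning = "漢字" * 5
plain = warning.encode("utf-8")
encoded = q_encode_word(warning)
print(len(plain), len(encoded))
```

Every byte of a Kanji in UTF-8 has its high bit set, so none of them
survives unencoded; the whole payload triples before the 12-byte wrapper
is added.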

>and isn't really "Clogging up the
>8-bit channel".

It's ISO 8859-1, not RFC 1522, that's clogging up the 8-bit channel.
Spec'ing 8859-1 is an obstacle to proper i18n, since it cannot be
*completely reliably* distinguished from UTF-8, as you argue yourself.
Anyway, UTF-8 is illegal in the current spec, despite the IAB charset
committee recommendation.
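
To illustrate why the two cannot be told apart with complete reliability,
here is a small Python sketch of the usual heuristic: try UTF-8 first and
fall back to ISO 8859-1. The function name guess_decode and the sample
byte strings are assumptions made for the example, not anything taken
from the HTTP spec.

```python
def guess_decode(data: bytes) -> tuple:
    """Heuristic: accept the bytes as UTF-8 if they decode, else ISO 8859-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO 8859-1, so this never fails.
        return data.decode("iso-8859-1"), "iso-8859-1"


latin1_bytes = b"caf\xe9"         # "café" in ISO 8859-1; not valid UTF-8
ambiguous_bytes = b"caf\xc3\xa9"  # "café" in UTF-8, but also legal ISO 8859-1

print(guess_decode(latin1_bytes))
print(guess_decode(ambiguous_bytes))
print(ambiguous_bytes.decode("iso-8859-1"))
```

The second input is the awkward case: the same bytes are well-formed in
both charsets, so the heuristic can only guess which one the sender meant.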

>Perhaps by the time Unicode is widespread -- in the next 3-5 years --
>we'll have a new version of HTTP 2.x or HTTP-NG. I would certainly
>propose that in the future, new versions of HTTP default to UTF-8.

Why do I have this nagging feeling of a tendency to push i18n issues away to
the indefinite future?

Regards,

-- 
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561

Received on Thursday, 12 December 1996 22:21:38 UTC