Internationalization

Lee Daniel Crocker (lcrocker@calweb.com)
Tue, 7 May 1996 09:38:58 -0700 (PDT)


Message-Id: <199605071638.JAA06656@web1.calweb.com>
To: www-html@w3.org

While I applaud W3's efforts at internationalizing HTML, I can't
find a browser that handles even the existing Latin-1 set
correctly, and there's a remaining issue in English that hasn't
been addressed at all: dashes and curly quotes.

Microsoft's own site just breaks the rules and uses &#151; to get
em dashes, which happens to work in Netscape as well (on the
Windows platform, anyway; it chokes on Unix).  HTML 3.0 at least
had the &emdash; and &endash; entities, but it appears that even
those didn't make it into 3.2, let alone the quotation marks.
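To make the &#151; problem concrete, here is a small Python sketch
(the function name `two_readings` is purely illustrative) that reads
the same number two ways: as a character number in ISO-8859-1 or
Unicode, and as a position in Windows code page 1252, which is the
layout the Windows browsers appear to be using.  It relies only on
the standard `unicodedata` module and Python's built-in `cp1252`
codec.

```python
import unicodedata


def two_readings(n):
    """Return how &#n; reads as a Unicode code point vs. a cp1252 byte."""
    # Read as an ISO-8859-1 / Unicode character number, positions
    # 128-159 are C1 control characters (general category "Cc").
    as_unicode = unicodedata.category(chr(n))
    # Windows instead treats the number as a slot in code page 1252.
    as_windows = unicodedata.name(bytes([n]).decode("cp1252"))
    return as_unicode, as_windows


print(two_readings(151))  # ('Cc', 'EM DASH')
```

Only call it with numbers that code page 1252 actually defines;
positions such as 129 are unassigned there, and the decode raises
`UnicodeDecodeError`.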

The few extra characters that many machines (read: Mac and
Windows) have, as they must to do any serious publishing, are
stuck wherever it's convenient to put them (on Windows, into the
32 slots from 128 to 159 that ISO-8859-1 and Unicode reserve for
control characters).  If we are to make HTML truly universal, we
must provide these characters in a clean, standard way and
actively discourage incorrect use.
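One way to discourage the incorrect form is to rewrite it
mechanically.  The sketch below is a hypothetical cleanup helper,
not anything a spec defines: it assumes the offending pages were
written with Windows code page 1252 in mind, and maps each decimal
reference from &#128; to &#159; to the Unicode reference for the
same character.  Hexadecimal references are not handled.

```python
import re

# Decimal numeric references in the range 128-159, e.g. "&#151;".
_C1_REF = re.compile(r"&#(12[89]|1[34][0-9]|15[0-9]);")


def to_unicode_references(text):
    """Rewrite cp1252-style references (&#128;-&#159;) as Unicode ones."""
    def fix(match):
        position = int(match.group(1))
        try:
            # Assumption: the author meant the cp1252 character here.
            char = bytes([position]).decode("cp1252")
        except UnicodeDecodeError:
            # Unassigned in cp1252 too; leave it rather than guess.
            return match.group(0)
        return "&#%d;" % ord(char)

    return _C1_REF.sub(fix, text)


print(to_unicode_references("1996&#151;the year"))  # 1996&#8212;the year
```

Of the 32 positions, code page 1252 assigns characters to all but
five (129, 141, 143, 144 and 157); the helper leaves those alone.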

In particular, the standard should forbid, in no uncertain terms,
non-Unicode references like &#151;.  Second, it should include
character entities for the extra characters (as it already does
for &copy;, and as 3.0 did for &emdash; and &endash;).  Also
needed are &ldquo;, &rdquo;, &lsquo;, &rsquo;, etc.  Finally,
since there are only a dozen or so of these characters, it is no
burden for a browser to translate the proper Unicode references
(&#8212; for the em dash, etc.) into its local character set with
a linear search of a small lookup table, and the spec should
encourage browsers to do that correctly.
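The browser-side lookup described above might look something like
this.  It is a sketch under stated assumptions: the table is
hypothetical, it assumes the browser's fonts use the Windows code
page 1252 layout, and it lists only a dozen common punctuation
marks.  ASCII and the Latin-1 range 160-255 need no table, because
code page 1252 agrees with ISO-8859-1 at those positions.

```python
# Hypothetical table: (Unicode code point, position in a cp1252 font).
PUNCTUATION_TABLE = [
    (0x2013, 0x96),  # EN DASH
    (0x2014, 0x97),  # EM DASH
    (0x2018, 0x91),  # LEFT SINGLE QUOTATION MARK
    (0x2019, 0x92),  # RIGHT SINGLE QUOTATION MARK
    (0x201C, 0x93),  # LEFT DOUBLE QUOTATION MARK
    (0x201D, 0x94),  # RIGHT DOUBLE QUOTATION MARK
    (0x2020, 0x86),  # DAGGER
    (0x2021, 0x87),  # DOUBLE DAGGER
    (0x2022, 0x95),  # BULLET
    (0x2026, 0x85),  # HORIZONTAL ELLIPSIS
    (0x2030, 0x89),  # PER MILLE SIGN
    (0x2122, 0x99),  # TRADE MARK SIGN
]


def local_code_for(code_point):
    """Map a Unicode code point to its cp1252 font position, or None."""
    # ASCII and Latin-1 160-255 sit at the same positions in cp1252.
    if code_point < 0x80 or 0xA0 <= code_point <= 0xFF:
        return code_point
    # Everything else, including the C1 controls 128-159 (so &#151;
    # is NOT drawn as a dash): a linear search of the small table.
    for unicode_cp, local in PUNCTUATION_TABLE:
        if unicode_cp == code_point:
            return local
    return None


print(hex(local_code_for(8212)))  # 0x97
```

A linear search is deliberate here: with a dozen entries it is as
fast as anything fancier and trivial to get right.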

In short, internationalization is great, but let's get English
right, too.