Message-Id: <199605071638.JAA06656@web1.calweb.com>
Subject: Internationalization
To: firstname.lastname@example.org
Date: Tue, 7 May 1996 09:38:58 -0700 (PDT)
From: "Lee Daniel Crocker" <email@example.com>

While I applaud W3's efforts at internationalizing HTML, there is no browser I can find that even handles the existing Latin1 set correctly, and there's a remaining issue in English that hasn't even been addressed yet--dashes and curly quotes. Microsoft's own site just breaks the rules and uses &#151; to get em dashes, which happens to work on Netscape as well (on the Windows platform, anyway--it chokes on Unix). HTML 3.0 at least had the &emdash; and &endash; entities, but it appears that even those didn't make it into 3.2, much less the quotation marks.

The few extra characters that many machines (read Mac & Windows) have--as they must have to do any serious publishing--are stuck into the 16 unused slots of ISO-8859-1 and Unicode, simply because it's convenient to put them there. But if we are to make HTML truly universal, we must provide these features in a clean, standard way and actively discourage incorrect use.

In particular, the standard should in no uncertain terms forbid the use of non-Unicode encodings like &#151;. Secondly, it should include character entities for the extra characters (as it already does for &copy;, and in 3.0 at least, for &emdash; and &endash;). Also needed are entities for “, ‘, ”, etc. Finally, since there are only a dozen or so of these characters, it is no burden for a browser to do a linear search on a lookup table to translate the proper Unicode encodings (&#8212;, etc.) into whatever the local platform uses, and the spec should encourage doing that correctly.

In short, internationalization is great, but let's get English right, too.
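
As a rough illustration of the lookup-table translation suggested above, here is a minimal sketch in Python. It assumes the local platform is Windows (code page 1252) and that the document text is otherwise plain Latin1; the names TYPOGRAPHIC_TABLE, platform_byte and translate_refs are made up for this example and are not taken from any browser or spec.

```python
import re

# Hypothetical table: Unicode code point -> Windows code page 1252 byte,
# covering only the dashes and curly quotes discussed in the message.
TYPOGRAPHIC_TABLE = [
    (0x2013, 0x96),  # en dash
    (0x2014, 0x97),  # em dash (0x97 is decimal 151)
    (0x2018, 0x91),  # left single quotation mark
    (0x2019, 0x92),  # right single quotation mark
    (0x201C, 0x93),  # left double quotation mark
    (0x201D, 0x94),  # right double quotation mark
]

# Matches decimal numeric character references such as "&#8212;".
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def platform_byte(code_point):
    """Linear search of the table; returns None when there is no entry."""
    for unicode_value, local_byte in TYPOGRAPHIC_TABLE:
        if unicode_value == code_point:
            return local_byte
    return None


def translate_refs(text):
    """Replace known numeric references with the local platform byte.

    References that are not in the table are passed through unchanged.
    The surrounding text is assumed to be encodable as Latin1.
    """
    out = bytearray()
    pos = 0
    for match in NUMERIC_REF.finditer(text):
        out += text[pos:match.start()].encode("latin-1")
        local_byte = platform_byte(int(match.group(1)))
        if local_byte is not None:
            out.append(local_byte)
        else:
            out += match.group(0).encode("ascii")
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)
```

With this table, translate_refs("yes&#8212;no") yields the bytes for "yes", then 0x97, then "no", so the author writes the portable Unicode value and only the browser needs to know the platform-specific slot. A Mac or Unix browser would carry a different table of the same size.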