- From: Foteos Macrides <MACRIDES@SCI.WFBR.EDU>
- Date: Mon, 09 Feb 1998 14:11:00 -0500 (EST)
- To: DPawson@rnib.org.uk
- Cc: w3c-wai-ig@w3.org
"Pawson, David" <DPawson@rnib.org.uk> wrote: >> to follow up on what Charles said: >> >> > Please refer me to exactly what needs to be corrected in the >> > next version of Internet Explorer. Thanks, >> >> There is also an issue with the programs that originate, HTML, as >> opposed to interpreting in. That is to say, don't represent an >> — as —, etc, but use the SGML entity names or ISO >> character numbers for them. >> >> Al Gilman > [Pawson, David] > > Surely the simple need is for IE4.xx and netscape to implement >ISO latin 1? [I.e. be capable of displaying the correct glyphs for >each entity in the set]. In deference to those out there who only live >in MSDOS, perhaps this should be a switchable option? > > My logic says that the html generator programs will follow the >browsers fairly rapidly. I.e. the html editor software programs. Is >that reasonable? > > My only real concern is that the single ISO latin 1 is only one >of a number needed for true internationalisation. A Unicode shift would >give a real move forward, permitting a wider use of the other entity sets. The codepages 850 for DOS and 1252 for Windows adequately (IMHO) encompass the Latin 1 (iso-8859-1) character set. The problem has two aspects: (1) The HTML editor software programs are not respecting that the values of numeric character references are for the HTML Document Character Set, which it iso-10646 (essentially, Unicode) as of HTML 4.0, and iso-8859-1 (a subset of iso-10646) in previous HTML specs, and are generating numeric character references in the range reserved for control characters -- disallowed for HTML -- but corresponding to intended characters such as fancy dashes and quotation marks in the Windows codepage; (2) that the browsers are treating these character references in that range as references to the corresponding characters in the Windows codepage, rather than as disallowed values in the HTML Document Character Set. 
Both aspects of the problem need to be addressed simultaneously,
or you are likely to find yourself in the position of hoping that the
tail can wag the dog.

The HTML 4.0 specs now include named character references for all
of the characters which are presently being handled via invalid numeric
character references (except smiling face :). Here is a list of the
invalid numeric character references being encountered on today's Web,
and their correct numeric (in hex notation) and named references:

Conversions of invalid numeric (Microsoft codepage) character references
to valid Unicode numeric or named character reference (names as in HTML 4.0).

INVALID     Numeric    Named      Character
-------     --------   -------    -----------------------------------------
        ->  &#x263A;   (none)     WHITE SMILING FACE
&#130;  ->  &#x201A;   &sbquo;    SINGLE LOW-9 QUOTATION MARK
&#132;  ->  &#x201E;   &bdquo;    DOUBLE LOW-9 QUOTATION MARK
&#133;  ->  &#x2026;   &hellip;   HORIZONTAL ELLIPSIS
&#134;  ->  &#x2020;   &dagger;   DAGGER
&#135;  ->  &#x2021;   &Dagger;   DOUBLE DAGGER
&#137;  ->  &#x2030;   &permil;   PER MILLE SIGN
&#139;  ->  &#x2039;   &lsaquo;   SINGLE LEFT-POINTING ANGLE QUOTATION MARK
&#145;  ->  &#x2018;   &lsquo;    LEFT SINGLE QUOTATION MARK
&#146;  ->  &#x2019;   &rsquo;    RIGHT SINGLE QUOTATION MARK
&#147;  ->  &#x201C;   &ldquo;    LEFT DOUBLE QUOTATION MARK
&#148;  ->  &#x201D;   &rdquo;    RIGHT DOUBLE QUOTATION MARK
&#149;  ->  &#x2022;   &bull;     BULLET
&#150;  ->  &#x2013;   &ndash;    EN DASH
&#151;  ->  &#x2014;   &mdash;    EM DASH
&#152;  ->  &#x02DC;   &tilde;    SMALL TILDE
&#153;  ->  &#x2122;   &trade;    TRADE MARK SIGN
&#155;  ->  &#x203A;   &rsaquo;   SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

As I noted in a previous message, for accessibility reasons
Lynx 2.7.2 is performing the above conversions of invalid numeric
character references, but that's a catch-22. We're now seeing messages
like this from people who rely on the empirical behavior of browsers
instead of understanding and complying with the standards for
interoperability:

   "Well gee, &#145; and &#146; get me the quotation marks in Lynx,
    and its developers are fussy about standards, so that must be OK."

Sigh...
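For authoring tools, the conversion table earlier in this message could be applied mechanically. The following Python sketch is only an illustration: the names `INVALID_TO_VALID` and `fix_references` are hypothetical, only decimal references of the form `&#NNN;` are handled, and the smiling-face row is left out because it has no named reference. It is not how Lynx implements its conversion.

```python
import re

# Invalid decimal value -> (valid hex numeric reference, HTML 4.0 named reference),
# transcribed from the conversion table in this message.
INVALID_TO_VALID = {
    130: ("&#x201A;", "&sbquo;"),
    132: ("&#x201E;", "&bdquo;"),
    133: ("&#x2026;", "&hellip;"),
    134: ("&#x2020;", "&dagger;"),
    135: ("&#x2021;", "&Dagger;"),
    137: ("&#x2030;", "&permil;"),
    139: ("&#x2039;", "&lsaquo;"),
    145: ("&#x2018;", "&lsquo;"),
    146: ("&#x2019;", "&rsquo;"),
    147: ("&#x201C;", "&ldquo;"),
    148: ("&#x201D;", "&rdquo;"),
    149: ("&#x2022;", "&bull;"),
    150: ("&#x2013;", "&ndash;"),
    151: ("&#x2014;", "&mdash;"),
    152: ("&#x02DC;", "&tilde;"),
    153: ("&#x2122;", "&trade;"),
    155: ("&#x203A;", "&rsaquo;"),
}

# Matches decimal numeric character references such as &#151;
_DECIMAL_REF = re.compile(r"&#([0-9]+);")


def fix_references(html, prefer_named=True):
    """Replace invalid references listed in the table; leave all others alone."""

    def replace(match):
        entry = INVALID_TO_VALID.get(int(match.group(1)))
        if entry is None:
            return match.group(0)  # not in the table, keep as written
        numeric, named = entry
        return named if prefer_named else numeric

    return _DECIMAL_REF.sub(replace, html)


print(fix_references("&#145;quoted&#146; &#151; done"))
```

Passing `prefer_named=False` emits the hex numeric form instead, which may be preferable for user agents that predate the HTML 4.0 entity names.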
☺ Note that a number of smiley and frowney characters are available
in iso-10646, and I hope the named entities for them are added to the
HTML specs soon, because the present situation on the Web is a serious
threat to world peace. :)

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
Received on Monday, 9 February 1998 14:15:06 UTC