- From: Thanasis Kinias <tkinias@asu.edu>
- Date: Mon, 04 Jun 2001 13:37:35 -0700
- To: "'www-validator@w3.org'" <www-validator@w3.org>
- Cc: "'tkinias@optimalco.com'" <tkinias@optimalco.com>
- Message-id: <A021872EC2BDD411AB3600902746A055016048D6@mainex4.asu.edu>
Greetings,

The validator complains about "non-SGML character" references (e.g., &#147; instead of the correct &#8220;) only when validating as XHTML. That implies that &#147; and the other Microsoft characters from decimal 128-159 (hex 80-9F) _are_ valid in HTML.

However, the HTML 4.01 spec [1] reads:

    Numeric character references specify the code position of a
    character in the document character set. [...] The syntax "&#D;",
    where D is a decimal number, refers to the ISO 10646 decimal
    character number D.

The characters from decimal 128-159 are non-printing controls in UCS/Unicode.

From the SGML declaration of HTML 4.01 [2]:

    CHARSET
      BASESET  "ISO Registration Number 177//CHARSET
                ISO/IEC 10646-1:1993 UCS-4 with
                implementation level 3//ESC 2/5 2/15 4/6"
      DESCSET  0       9       UNUSED
               9       2       9
               11      2       UNUSED
               13      1       13
               14      18      UNUSED
               32      95      32
               127     1       UNUSED
               128     32      UNUSED
               160     55136   160
               55296   2048    UNUSED  -- SURROGATES --
               57344   1056768 57344

As I read that, the 32 characters starting at decimal 128 are UNUSED. So the validator should flag an error on character references like &#147; in HTML 4 as well as in XHTML.

For an example of a page that uses such invalid references under HTML 4 Transitional, but where the references are not flagged as invalid, see [3].

The WDG validator [4], BTW, does catch this.

1. <http://www.w3.org/TR/html4/charset.html#h-5.3.1>
2. <http://www.w3.org/TR/html4/sgml/sgmldecl.html>
3. <http://my.asu.edu/>
4. <http://www.htmlhelp.com/tools/validator/>

Regards,

Thanasis Kinias
Information Dissemination Team, Information Technology
Arizona State University
Tempe, Ariz., U.S.A.

Qui nos rodunt confundantur et cum iustis non scribantur.
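
The reading of the DESCSET given in the message can be checked mechanically. The following is a minimal sketch, not the validator's actual logic: it assumes each DESCSET row is a (start, count, base) triple, treats UNUSED as "no character assigned", and uses the hypothetical helper names `is_unused` and `invalid_refs` to list numeric character references that fall in an UNUSED range.

```python
import re

# Rows copied from the DESCSET quoted in the message: (start, count, base).
# A base of None stands for UNUSED (assumption of this sketch).
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),      # surrogates
    (57344, 1056768, 57344),
]

# Matches decimal (&#147;) and hexadecimal (&#x93;) character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def is_unused(code_point):
    """Return True if the code point is UNUSED or outside every declared range."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return base is None
    return True


def invalid_refs(markup):
    """Return the numeric character references in markup that point at UNUSED positions."""
    bad = []
    for match in NUMERIC_REF.finditer(markup):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_unused(code_point):
            bad.append(match.group(0))
    return bad
```

Under these assumptions, `invalid_refs("&#147;quoted&#148;")` reports both references, while `invalid_refs("&#8220;quoted&#8221;")` reports none, which matches the behaviour the message asks the validator to have for HTML 4.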
Received on Monday, 4 June 2001 16:38:39 UTC