Non-SGML Char Refs

Greetings,

The validator complains about "non-SGML character" references (e.g., &#147;
instead of the correct &#8220;) only when validating as XHTML. That would
imply that &#147; and the other Microsoft characters from decimal 128-159
(hex 80-9f) _are_ valid in HTML.

However, the HTML 4.01 spec [1] reads:

	Numeric character references specify the code position of a character
	in the document character set.
	[...]
	The syntax "&#D;", where D is a decimal number, refers to the ISO
	10646 decimal character number D.
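
Taking that rule literally, &#147; names code point 147 (U+0093), not the
curly quote. A minimal Python sketch to illustrate, using only the standard
unicodedata module; the helper name ncr_code_point is made up for this
example and handles decimal references only:

```python
import unicodedata

def ncr_code_point(ref: str) -> int:
    """Return the code point named by a decimal reference such as '&#147;'."""
    # Per the quoted spec text, "&#D;" refers to ISO 10646 character number D.
    if not (ref.startswith("&#") and ref.endswith(";")):
        raise ValueError(f"not a decimal character reference: {ref!r}")
    return int(ref[2:-1], 10)

for ref in ("&#147;", "&#8220;"):
    cp = ncr_code_point(ref)
    ch = chr(cp)
    # Category 'Cc' is a control character; 'Pi' is opening (initial) punctuation.
    print(ref, f"U+{cp:04X}", unicodedata.category(ch),
          unicodedata.name(ch, "<no name>"))

# The curly quote only comes out of 147 if the number is read as a
# Windows-1252 byte, which is not what the spec says.
print(bytes([147]).decode("cp1252") == "\u201c")
```

This prints U+0093 with category Cc and no character name, then U+201C
(LEFT DOUBLE QUOTATION MARK) with category Pi, then True.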

The characters from decimal 128-159 are non-printing controls (the C1
control range) in UCS/Unicode. From the SGML declaration of HTML 4.01 [2]:

	CHARSET
	    BASESET  "ISO Registration Number 177//CHARSET
	              ISO/IEC 10646-1:1993 UCS-4 with
	              implementation level 3//ESC 2/5 2/15 4/6"
	    DESCSET  0       9       UNUSED
	             9       2       9
	             11      2       UNUSED
	             13      1       13
	             14      18      UNUSED
	             32      95      32
	             127     1       UNUSED
	             128     32      UNUSED
	             160     55136   160
	             55296   2048    UNUSED  -- SURROGATES --
	             57344   1056768 57344

As I read that, the 32 characters starting at decimal 128 are UNUSED.
So the validator should flag an error on character references like &#147;
in HTML 4 as well as in XHTML.
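
Each DESCSET row reads as (first described character number, number of
characters, base-set character number or UNUSED). A simplified Python model
of that lookup, with the rows copied from the declaration above; the name
describe is hypothetical, and this is not how the W3C validator's SGML
parser is actually implemented:

```python
# DESCSET rows from the HTML 4.01 SGML declaration:
# (first described number, count, base-set number or None for UNUSED).
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),  # surrogates
    (57344, 1056768, 57344),
]

def describe(code: int):
    """Map a document character number to its ISO 10646 number.

    Returns None if the number falls in an UNUSED row; raises ValueError
    if no row covers it at all.
    """
    for first, count, base in DESCSET:
        if first <= code < first + count:
            return None if base is None else base + (code - first)
    raise ValueError(f"character number {code} is not described")

print(describe(147))   # None: inside the 128-159 UNUSED row
print(describe(8220))  # 8220: inside the 160-55295 row
```

Under this reading a reference to any number for which describe returns
None (such as 147) points at a non-SGML character, whichever HTML version
is being validated.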

For an example of a page that uses such invalid references under HTML 4
Transitional without their being flagged, see [3].
The WDG validator [4], BTW, does catch this.
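
A rough sketch of this kind of check in Python, scanning text for decimal
or hex references in the 128-159 range. It is an illustration only, not
the WDG validator's code; the function name find_c1_refs is hypothetical,
and the suggested replacement assumes the author meant the Windows-1252
character at that byte value:

```python
import re

# Matches decimal (&#147;) and hexadecimal (&#x93;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def find_c1_refs(text: str):
    """Yield (reference, code point, suggested fix) for refs in 128-159.

    The suggestion assumes Windows-1252 intent; it is None for the bytes
    Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    """
    for m in NCR.finditer(text):
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 128 <= cp <= 159:
            try:
                fixed = bytes([cp]).decode("cp1252")
                suggestion = f"&#{ord(fixed)};"
            except UnicodeDecodeError:
                suggestion = None
            yield m.group(0), cp, suggestion

sample = "He said &#147;hi&#148; &#8220;ok&#8221; &#x97; &#129;"
for ref, cp, fix in find_c1_refs(sample):
    print(ref, cp, fix)
```

For the sample string this reports &#147; (suggesting &#8220;), &#148;
(suggesting &#8221;), &#x97; (suggesting &#8212;) and &#129; (no
suggestion), and skips the already-correct &#8220; and &#8221;.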

1. <http://www.w3.org/TR/html4/charset.html#h-5.3.1>
2. <http://www.w3.org/TR/html4/sgml/sgmldecl.html>
3. <http://my.asu.edu/>
4. <http://www.htmlhelp.com/tools/validator/>

Regards,

Thanasis Kinias
Information Dissemination Team, Information Technology
Arizona State University
Tempe, Ariz., U.S.A.

Qui nos rodunt confundantur
et cum iustis non scribantur.

Received on Monday, 4 June 2001 16:38:39 UTC