- From: Thanasis Kinias <tkinias@asu.edu>
- Date: Mon, 04 Jun 2001 13:37:35 -0700
- To: "'www-validator@w3.org'" <www-validator@w3.org>
- Cc: "'tkinias@optimalco.com'" <tkinias@optimalco.com>
- Message-id: <A021872EC2BDD411AB3600902746A055016048D6@mainex4.asu.edu>
Greetings,
The validator complains about "non-SGML character" references (e.g., &#147;
instead of the correct &#8220;) only when validating as XHTML. That implies
that &#147; and the other Microsoft characters from decimal 128-159 (hex
80-9F) _are_ valid in HTML.
However, the HTML 4.01 spec [1] reads:
    Numeric character references specify the code position of a character
    in the document character set.
    [...]
    The syntax "&#D;", where D is a decimal number, refers to the ISO 10646
    decimal character number D.
The characters from decimal 128-159 are non-printing controls in
UCS/Unicode.
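A quick way to check this is with Python's standard unicodedata module. The snippet below is only an illustrative sketch; the variable names are mine and nothing here comes from the validator itself. It shows that every code point from 128 to 159 is in the control category ("Cc"), and that byte 147 is a left double quotation mark only when read as windows-1252, where it maps to code point 8220.

```python
import unicodedata

# Collect the Unicode general category of every code point in 128-159.
# All of them are C1 controls, so the set should contain only "Cc".
c1_categories = {unicodedata.category(chr(cp)) for cp in range(128, 160)}

# Byte 147 (hex 93) is a curly quote only in the windows-1252 encoding.
# Decoding it there yields the real Unicode character and its code point.
cp1252_quote = bytes([147]).decode("cp1252")
quote_code_point = ord(cp1252_quote)

print(c1_categories)
print(quote_code_point, unicodedata.name(cp1252_quote))
```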
From the SGML declaration of HTML4.01 [2]:
    CHARSET
        BASESET  "ISO Registration Number 177//CHARSET
                  ISO/IEC 10646-1:1993 UCS-4 with
                  implementation level 3//ESC 2/5 2/15 4/6"
        DESCSET  0       9        UNUSED
                 9       2        9
                 11      2        UNUSED
                 13      1        13
                 14      18       UNUSED
                 32      95       32
                 127     1        UNUSED
                 128     32       UNUSED
                 160     55136    160
                 55296   2048     UNUSED  -- SURROGATES --
                 57344   1056768  57344
As I read that, it means that the 32 characters starting at decimal 128
are UNUSED. So the validator should flag an error on char refs like &#147;
in HTML4 as well as in XHTML.
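To make that reading concrete, here is a minimal sketch of the check I have in mind. It is not how either validator is implemented; the names DESCSET, is_unused and find_unused_refs are hypothetical, the table rows are copied from the declaration quoted above, and only decimal references of the form &#D; are handled.

```python
import re

# (start, count, base) rows from the DESCSET in the HTML 4.01 SGML
# declaration; base is None where the declaration says UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),  # surrogates
    (57344, 1056768, 57344),
]


def is_unused(code_point):
    """Return True if the code point falls in a range marked UNUSED."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return base is None
    # Assumption: anything outside the described ranges is not usable.
    return True


def find_unused_refs(markup):
    """List the decimal character references that point at UNUSED positions."""
    numbers = (int(digits) for digits in re.findall(r"&#(\d+);", markup))
    return [n for n in numbers if is_unused(n)]


print(find_unused_refs("He said &#147;hello&#148; and &#8220;bye&#8221;."))
```

Under this reading, &#147; and &#148; are reported while &#8220; and &#8221; pass, whatever the document type.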
For an example of a page which uses such invalid references under HTML4
Transitional, but where the refs are not flagged as invalid, see [3].
The WDG validator [4], BTW, does catch this.
1. <http://www.w3.org/TR/html4/charset.html#h-5.3.1>
2. <http://www.w3.org/TR/html4/sgml/sgmldecl.html>
3. <http://my.asu.edu/>
4. <http://www.htmlhelp.com/tools/validator/>
Regards,
Thanasis Kinias
Information Dissemination Team, Information Technology
Arizona State University
Tempe, Ariz., U.S.A.
Qui nos rodunt confundantur
et cum iustis non scribantur.
Received on Monday, 4 June 2001 16:38:39 UTC